On-Page Boot Camp: What Is TF-IDF And How To Use It

A.J. Ghergich

In the previous videos in the On-Page Boot Camp series, A.J. Ghergich showed us how he used SEMrush to find the best keywords to work with, win featured snippets and make the content even better with the SEO Ideas tool.  

In the last video of the series, A.J. unveils the secrets of on-page optimization and presents the secret weapon of content writing, the TF-IDF metric, which allows you to find the best semantically related keywords for your content. — SEMrush Team.

Video Transcription

Hey, SEMrushers! A.J. Ghergich of Ghergich & Co. We are on our fourth and final video, and I am obviously in a pretty festive mood because we are going to talk about TF-IDF. I will explain what the heck it is, how you use it to enhance your SEO and kinda take it to the next level of your on-page optimization with semantically related keywords. And then at the end of the video, I am gonna show you some new features in keyword tracking that SEMrush just released. I have been beta testing it, love it, and I know you’re gonna love it as well.

What Is TF-IDF

TF-IDF means ‘Term Frequency-Inverse Document Frequency’. Let’s clear this up and make it easier to understand.

TF-IDF description

The overall goal of TF-IDF is to statistically measure how important a word is to a document within a collection of documents. It’s like a really useful keyword density tool on steroids. It gets less complicated when we break it down.

1. What Is TF

So let’s look at TF: term frequency. It’s exactly what it sounds like — how often the term occurs. That is what it is measuring — occurrence. That is only going to take you so far, so typically term frequency is then divided by the length of the document to account for longer or shorter documents.

Let me give you an example: let’s say you have a 500-word article that says ‘horse’ 4 times, and a 2000-word article that says ‘horse’ 5 times. By raw count, the longer article looks more horse-focused, but dividing by document length accounts for that: 4/500 is 0.8%, while 5/2000 is only 0.25%. So that is why they divide by document length. We now have a good measure of occurrence.
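To make the arithmetic concrete, here is a minimal Python sketch of length-normalized term frequency for the two ‘horse’ articles. The function name `term_frequency` is made up for this illustration, and real tools may normalize in other ways.

```python
def term_frequency(term_count: int, doc_length: int) -> float:
    """Occurrences of a term divided by the total number of words in the document."""
    if doc_length <= 0:
        raise ValueError("doc_length must be positive")
    return term_count / doc_length


short_article = term_frequency(4, 500)   # 'horse' 4 times in a 500-word article
long_article = term_frequency(5, 2000)   # 'horse' 5 times in a 2000-word article

print(short_article)  # 0.008
print(long_article)   # 0.0025
```

Even though the longer article says ‘horse’ more often, the shorter one uses it more densely once length is taken into account.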

2. What Is IDF

Let’s go and look at IDF: inverse document frequency. Ooooh, scary! It is not actually that hard to understand. This is telling you how important a term is. So it essentially has two jobs.

  • The first one: It is going to weigh down terms that appear frequently, like ‘is’ or ‘the,’ and a lot of the stop-words that we all use.

  • The second goal is to scale up the more unique and less-used terms.
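Both jobs fall out of one common textbook formula, idf = log(N / df), where N is the number of documents and df is the number of documents that contain the term. The sketch below assumes that plain variant; the video does not say which variant SEMrush uses, and many tools add smoothing. The toy corpus and the function names are invented for illustration.

```python
import math
from typing import List


def inverse_document_frequency(term: str, documents: List[List[str]]) -> float:
    """log(N / df): N = number of documents, df = documents containing the term."""
    containing = sum(1 for doc in documents if term in doc)
    if containing == 0:
        raise ValueError(f"{term!r} does not appear in any document")
    return math.log(len(documents) / containing)


def tf_idf(term: str, doc: List[str], documents: List[List[str]]) -> float:
    """Length-normalized term frequency multiplied by inverse document frequency."""
    tf = doc.count(term) / len(doc)
    return tf * inverse_document_frequency(term, documents)


# Toy corpus (invented), already lowercased and split into words.
docs = [
    "the horse is in the barn".split(),
    "the dog is in the yard".split(),
    "the cat is on the mat".split(),
]

print(inverse_document_frequency("the", docs))    # 0.0, appears in every document
print(inverse_document_frequency("horse", docs))  # log(3), about 1.0986
print(tf_idf("the", docs[0], docs))               # 0.0 despite appearing twice
print(tf_idf("horse", docs[0], docs))             # about 0.1831
```

A word that shows up in every document gets weighed down to zero, while a word found in only one document is scaled up.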

Let me show you how SEMrush is implementing TF-IDF into their toolset and doing these calculations, so you don’t have to.

Make Use Of Semantic Ideas

Alright, I am over in Optimization Ideas again and we are looking under examples at energizer.com. I have selected Semantic from the drop-down menu, and I have keyed in on this hands-free lighting page.

Semantic Ideas

It has some rankings, but the page itself is a little sparse. When we look at the page before we dive in, you can see it is a very thin page. It is just a little bit of text and not a lot of optimization other than branding names.

SEMrush is obviously picking up on that as well, and they are saying ‘Hey, we have a semantically related idea.’ Semantic keywords are essentially based on your top-10 rivals’ rankings data, but they are showing you things that are related topically.

For instance, if you were trying to optimize for ‘car,’ SEMrush is gonna pick up on the fact that maybe you are not saying ‘automobiles’ or ‘used truck’ and those kinds of things. The point is that Google is way beyond just ‘Let’s stuff in 4 or 5 keywords here that are an exact match and call it a day.’

Focus On The Overall Concepts

You need to look at the concepts that others are talking about and the topical relevance of your page in general compared to others. Is it as beneficial to the end-user as it could possibly be? Am I talking about the concepts that Google deems important? You can see here that SEMrush has a detailed analysis we can click into, and it shows us pretty much everything you’re going to want to know.

Semantic Ideas detailed view

In this column, a check mark tells you ‘Hey, am I using this word or not?’ Here are my rivals using this word, then the more traditional percentage-based figure, and then TF-IDF, which is awesome. So we have the numbers and don’t have to run the calculations or worry about the math of it. It is already done for us. I tend to key in on two-word phrases because they are a little easier to wrap your head around. For example, we are not saying ‘hands-free’, and everyone else is.
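The kind of gap check described here can be roughly approximated by hand. The sketch below is not SEMrush’s method: it simply lists two-word phrases that several rival pages use and your own page does not. The sample texts, the function names and the `min_rivals` threshold are all invented for illustration.

```python
import re
from collections import Counter


def bigrams(text: str) -> set:
    """Return the set of adjacent two-word phrases in the text, lowercased."""
    # Splitting on non-alphanumerics turns 'hands-free' into 'hands', 'free'.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {f"{a} {b}" for a, b in zip(words, words[1:])}


def missing_phrases(my_text: str, rival_texts: list, min_rivals: int = 2) -> list:
    """Phrases used by at least `min_rivals` rival pages but absent from my page."""
    mine = bigrams(my_text)
    rival_counts = Counter()
    for text in rival_texts:
        rival_counts.update(bigrams(text))  # a set, so each rival counts once
    gaps = [(phrase, n) for phrase, n in rival_counts.items()
            if n >= min_rivals and phrase not in mine]
    # Most widely used phrases first, ties broken alphabetically.
    return sorted(gaps, key=lambda item: (-item[1], item[0]))


my_page = "Energizer headlamps and lanterns for work and play."
rivals = [
    "A hands-free headlamp with 300 lumens of light.",
    "Hands-free lighting for camping, rated at 200 lumens.",
    "Compare hands-free work lights by lumens and battery life.",
]

print(missing_phrases(my_page, rivals))  # [('hands free', 3)]
```

A list like this is only a starting point for deciding which concepts are worth adding to the page.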

Don't Rush In!

I do want to caution, though: just because it is on here doesn’t mean you need to go and do every one of these. You are mainly looking to say ‘Ok, how do I flesh this out? How do I round this out and make this content better?’ We are not talking about some of these concepts: we don’t mention ‘power,’ and we don’t talk about ‘lumens.’ So maybe there should be some of these in there.

If something doesn’t make sense for your page, don’t worry about it and don’t add it, even if everyone else is doing it. But, in general, this will point you to the concepts that you are not talking about enough.

You can also go over to Keyword Usage. What you get is a little bit more of the traditional ‘Are my rivals using these keywords in their title tag, their meta description, their H1, and so on?’ But there is now a TF-IDF section here as well, so check that out.

Keyword ideas with TF-IDF Section

Bonus: Check The Opportunities To Get Into SERP Features

In the intro, I mentioned I would show you some cool bonus features that SEMrush is rolling out. Let’s go and take a look at those now.

We are over in Position Tracking for Energizer, the example site we have been using, and if you click on Rank count, you will notice you can pick Local pack or Top stories, so you have more control over what you want to track for yourself or your competitors. If you step down, there are several new things.

Position Tracking with SERP features filters

There is a SERP Features column that shows all the different types of SERP features.

SERP Features column

And then there is this really cool bar that allows you to select the types of features and discover what your competitors might have that you don’t have.

Featured Snippet Filter Bar in Position Tracking

So let me give a quick example of that. With Energizer we talked about review stars in one of the videos, so let’s look at reviews.

Now, here are some keywords that have reviews on the SERP. But we have selected ‘Any domain across the web.’ So watch what happens when we change that to just ‘Energizer’... nothing.

No Reviews implemented by Energizer

And that is because they haven’t really implemented review stars as part of their markup. So there is a great way to use this for discovery. It is a really cool addition; I am really excited to use it.

Keep It Up!

These videos have been really fun to make. It is fun to share how I use SEMrush.

I would love some feedback on how you use it! Maybe you are using it in a whole different way. And hit me up on Twitter, let me know some of the things that you’re doing that are a little bit different with SEMrush and maybe I can point those out in a future video. 

A.J. Ghergich is the founder of Ghergich & Co., a marketing agency focused on creating and promoting high-end content to improve SEO. Follow him at @SEO on Twitter for SEO tips and cat gifs.

Comments

I hate your scrolling-up effect so much
Alex Tsygankov
Hi, we are aware of this problem and will fix it ASAP.
Some things that are missed in SEMrush with TF-IDF analysis:

1) A SERP might have different intents, and for each intent there are different key terms
2) SEMrush doesn't show which specific URLs in the SERP are using these semantic terms (check OnPage.org or SEOlyze.com)
3) Mixing one-word and two-word phrases together in TF-IDF is awful; they are calculated differently
4) A SERP doesn't always include 10 results, and sometimes not all of them can be accessed by the SEMrush bot (for various reasons), so it should be stated which URLs have been analyzed and which have not.
A.J. Ghergich, replying to Boris Nikator:
Hi Boris, I just made a video on search intent on my YT channel and totally agree that it is the key thing to understand before you ever even start creating content in the first place.

The TF-IDF analysis is really new and I know from chatting with the team that they have plans for expanding its features. I also pay ryte.com (onpage) for their TF-IDF tool. Your point #4 is on my request list as well.

Additionally, I'm requesting the ability to input text and have that analyzed. This way you can easily analyze your text prior to publishing. SEMrush has been very responsive to my and others' suggestions, so I think a lot of the items on your wishlist will eventually be implemented :)
Yulia Shevy, replying to A.J. Ghergich:
Hey, Boris and A.J.!

I’ll quickly jump in on behalf of our development team. 
First of all, thank you for your feedback! It is indeed a nice idea to add specific URLs to the Semantic detailed report. It would be easier for our dev team to implement it if you could tell us more about how you plan to use it and work with this add-on. What would your workflow be like?
As for the pages that are not accessible for our bot, if you click on the drop-down menu in the last column of the Keyword Usage detailed report, you’ll see all the URLs that were analysed. Does it work for you? Or did you mean something else?
As always, thanks a lot for your bright ideas, guys, seriously!
Feel free to answer me here or drop me an email, you both know it :)
A.J. Ghergich, replying to Yulia Shevy:
For me, I like to check TF-IDF before I hit publish. So the ability to run your TF-IDF tool on copy that I can simply paste into an input box would be awesome!