
TF-IDF: Advanced On-Page Optimization

Christina Sanders

Google's algorithm has dramatically evolved over the years. In 2013, the Google Hummingbird update transformed Google's ability to provide the most relevant results by interpreting the searcher's intent rather than relying on a specific keyword.

For searchers, this meant Google turned into a knowledge assistant, helping to close knowledge gaps that would otherwise have made it difficult to find a relevant search result. For example, Google was now able to recognize the intent of a query for "president of Canada" and return information on Canada's prime minister. For SEOs, this meant no longer trying to account for every synonym or keyword variation and stuffing them onto a page. It also sparked a call (once again) to focus on creating high-quality, relevant content.

While creating quality content is the goal, understanding how Google identifies quality content is crucial to staying competitive as Google's SERPs continue to evolve. Tying together synonyms and similar phrases was the beginning of a smarter Google algorithm. Now Google can also tie together related concepts to understand which content provides the greatest breadth, and calculate how often those concepts appear on a page to identify which piece provides the most depth. This in-depth content analysis is called term frequency-inverse document frequency (TF-IDF) analysis.

What is TF-IDF?

TF-IDF is Google's way of determining the quality of a piece of content based on an established expectation of what an in-depth piece of content contains.

(TF-IDF) measures the importance of a keyword phrase by comparing it to the frequency of the term in a large set of documents.

- Cyrus Shepard, More than Keywords: 7 Concepts of Advanced On-Page SEO

In a previous article about TF-IDF, A.J. Ghergich tells us, "The overall goal of TF-IDF is to statistically measure how important a word is in a collection of documents."
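To make that definition concrete, here is a minimal sketch of the textbook TF-IDF calculation in Python. It is only an illustration of the statistic itself, not a description of how Google or any SEO tool computes it. The three sample documents and the function names are made up for this example, and real tools typically add smoothing and handle multi-word phrases.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf(documents):
    """Return one {term: score} dict per document.

    tf  = term count / total terms in the document
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero on empty text
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical mini-corpus, invented for illustration only.
docs = [
    "keyword research is the first step of seo",
    "a site audit is the second step of seo",
    "seo tools help with keyword research",
]
scores = tf_idf(docs)

# List the terms of the second document from most to least distinctive.
for term, score in sorted(scores[1].items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term}: {score:.3f}")
```

A word that appears in every document (here, "seo") gets a score of zero because it does nothing to distinguish one document from another, while a word that appears in only one document (such as "audit") scores highest.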

For example, if you are a small business owner wanting to learn how to use SEO to drive more traffic to your website, there are several topics that a complete SEO guide would cover including:

  • Keyword Research
  • Meta Data
  • Site Audit
  • Crawlability
  • Google Bots

Other topics that would also be relevant but would likely appear less frequently than those on the list above include:

  • Moz
  • Ahrefs
  • SEMrush
  • Panda Update
  • H1 Tag

When evaluating a piece of content, the Google algorithm would calculate how often each of the above terms appears across all of the content currently associated with "SEO guide," relative to all of the other terms. This data then serves as a baseline "score" against which any one piece of content can be measured.
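One way to picture that baseline is as an average term frequency across the pages that already rank, which a single page can then be compared against. The sketch below assumes exactly that simplification. The function names and sample strings are hypothetical, and nothing here reflects Google's actual scoring.

```python
import re
from collections import Counter, defaultdict


def term_frequencies(text):
    """Relative frequency of each word in a text (count / total words)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    total = len(tokens) or 1
    return {term: count / total for term, count in Counter(tokens).items()}


def baseline(ranking_pages):
    """Average each term's relative frequency across all ranking pages."""
    totals = defaultdict(float)
    for page in ranking_pages:
        for term, freq in term_frequencies(page).items():
            totals[term] += freq
    return {term: value / len(ranking_pages) for term, value in totals.items()}


def compare_to_baseline(page, ranking_pages):
    """Return {term: (page_frequency, baseline_frequency)} for every baseline term."""
    base = baseline(ranking_pages)
    own = term_frequencies(page)
    return {term: (own.get(term, 0.0), freq) for term, freq in base.items()}


# Hypothetical stand-ins for the text of pages ranking for "SEO guide".
ranking_pages = ["seo audit", "seo guide"]
report = compare_to_baseline("seo tips", ranking_pages)
for term, (mine, expected) in report.items():
    print(f"{term}: page={mine:.2f} baseline={expected:.2f}")
```

Terms where the page's frequency sits well below the baseline are the candidates worth a closer look.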

When to Use TF-IDF Analysis

SEOs and content creators can use TF-IDF to identify gaps in their existing content, based on the content currently ranking in the top 10 search results. It can also be used when creating new content so that the content ranks higher, faster. However, marketers have limited time, so which content pieces should you focus on first to get the most benefit?

1. High Potential Content Stuck on the 2nd Page

Start by identifying content that has been live on your site for a while but is struggling to break onto the first page. If that content has already been optimized for technical SEO considerations and has some authority pointing to it, it would likely benefit from further content optimization.

2. Content Slowly Losing Traffic (and Rankings) Over the Past Year

Whenever I see a site that has slowly dropped from the top of the first page to the bottom of the first page, it's typically due to increasing competition or to Google's algorithm changing which content it considers most relevant to that SERP. A quick way to check is to pull up a screenshot of the SERP from a year ago, using a tool like SpyFu, and compare it to the current SERP. In either case, revisiting your content to ensure that it is still relevant, and still the most relevant result, helps you recover and maintain those rankings.

3. Product Pages Struggling to Rank

While it is more common for top-of-funnel content to benefit from TF-IDF, if your product pages are struggling to rank for your money terms, critical content is likely missing from that page.

How to Complete TF-IDF Analysis

Collecting the data necessary for TF-IDF is relatively easy. I start by pulling the top 10 results for my target keyword and running them through Screaming Frog to get an average word count. This number helps me determine whether I need to add large sections of content to my page or whether I'm covering too much of the wrong subject. I then run the analysis with a TF-IDF tool. Several are available, including Ryte and Link Assistant. Ryte, which offers free accounts, compares a live URL to the top 10 results and includes a text editor that gives optimization recommendations as you create new content.
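If you already have the body text of the top results in hand, the word-count comparison from the first step can be roughed out in a few lines of Python. This is only a sketch: it assumes the page text has already been extracted (the crawling is what Screaming Frog handles), and the function names and sample strings are hypothetical.

```python
from statistics import mean


def word_count(text):
    """Count whitespace-separated words in a block of text."""
    return len(text.split())


def word_count_gap(my_text, competitor_texts):
    """Difference between my word count and the competitors' average.

    A negative result means my page is shorter than the average ranking page.
    """
    average = mean(word_count(text) for text in competitor_texts)
    return word_count(my_text) - average


# Hypothetical page texts, kept tiny for illustration.
competitors = [
    "brand templates keep teams consistent",
    "asset libraries store approved logos",
]
gap = word_count_gap("our brand software", competitors)
print(f"Word count gap vs. top results: {gap}")
```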

Ryte provides you with a list of the most important keywords and scores your website based on that list.

TF-IDF Results

How to Optimize With the User in Mind

The tricky part comes next. How do you take this list of terms and add them to your content, so the content is more useful to the user?

1. Edit the List

Start by using common sense to narrow down your list. In the analysis above, Squarespace shows up as a relevant keyword. Competitors who use their brand name frequently throughout their site tend to show up in these analyses. Unless the SERP is geared toward product or vendor comparisons, mentioning competitors will typically not make your content more relevant.

2. Identify Missing Subjects

Many SEOs see a list of TF-IDF terms and immediately go back to their keyword density days. While adding variations of a keyword to copy can still be valuable, the goal of TF-IDF isn't merely to stuff each word into the copy a couple of times. Instead, TF-IDF should help you identify missing subjects that belong in your document. That could be as small as providing sizing on a product page or as big as adding a paragraph or two to a blog post to make the piece more comprehensive. Reviewing how competitors use your missing terms helps you identify the best way to go about optimizing your content.

Start by pulling up the top 10 pages for your target keyword and searching for the TF-IDF term within each competitor's content. Look for patterns of content that your competitors have and you don't. Ryte also identifies which page uses the TF-IDF term the most, so you can click directly through to that competitor's page.
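The first two steps (dropping competitor brand names, then finding terms your page is missing and the competitor that uses each one most) can be sketched as a small script. This is an assumption-laden illustration rather than how Ryte works: the function names, URLs and page texts are invented, and it only handles single-word terms.

```python
import re
from collections import Counter


def count_terms(text):
    """Count occurrences of each lowercase word in a text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def missing_terms(my_text, competitor_pages, terms, exclude=()):
    """Map each term missing from my_text to (top competitor URL, its count).

    competitor_pages: {url: page_text}
    exclude: terms to skip, e.g. competitor brand names
    """
    mine = count_terms(my_text)
    excluded = {term.lower() for term in exclude}
    counts_by_url = {url: count_terms(text) for url, text in competitor_pages.items()}

    gaps = {}
    for term in terms:
        term = term.lower()
        if term in excluded or mine[term] > 0:
            continue  # a brand name, or already covered on my page
        best_url = max(counts_by_url, key=lambda url: counts_by_url[url][term])
        best_count = counts_by_url[best_url][term]
        if best_count > 0:
            gaps[term] = (best_url, best_count)
    return gaps


# Hypothetical data for illustration only.
competitor_pages = {
    "https://example.com/a": "templates pricing templates squarespace",
    "https://example.com/b": "pricing hosting",
}
gaps = missing_terms(
    "pricing and design",
    competitor_pages,
    terms=["templates", "pricing", "squarespace", "hosting"],
    exclude=["squarespace"],
)
for term, (url, count) in gaps.items():
    print(f"{term}: used {count}x on {url}")
```

The output is a reading list, not a to-do list: open the page that uses each term most and see what subject it covers before deciding whether your content needs it.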

ryte-1.png

3. Adapt Format if Necessary

Changing the design and layout of a website takes time and resources that aren't always available or necessarily worth it for every SEO update. However, if you've experimented with several similar pages and found that changing the overall content is useful, updating the design to match creates a much better user experience and helps you optimize additional content in the future.

When to update your design:

  • Page structure doesn't allow for new content sections.
  • Page was originally built for the wrong search intent and/or audience.
  • Content has become too bulky for the current sections.
  • Page template doesn't include design components that effectively break up the text.
  • The page is too long and needs more interactive components to be effective.

Once you have identified a page that needs to be updated, remember the following best practices:

  • Except for e-commerce sites and image or template galleries, the content you're adding to the page should be information the searcher is actively looking for, so make it easy and compelling to read. In the case of product descriptions, a section of small text at the bottom of the page is universally understood as "ignore this section" text.
  • Remember your hierarchy. Keep your value proposition and messaging up top and add supplemental content below.
  • As you add more content to a page, add additional CTAs throughout.
  • For extensive, in-depth content, add sticky menus and interactive elements to keep the reader engaged.
  • Keep the content scannable with subheadings, bold text, bullet points, and imagery.

An Example

Does this stuff work and how will I know if it does? Great questions!

Last year, Lucidpress created this brand management software page to promote its new enterprise features. While the page was optimized, crawlable and relevant, it was still struggling to rank months later. We used Ryte to pull a TF-IDF analysis:

www-lucidpress-com-content-report-for-your-keyword-brand-management-software-ryte.png

In the chart, the higher the orange bar is, the more relevant the keyword is. As you can see, digital assets are considered nearly as relevant as brand assets in this SERP. From here, we needed to determine what topic other pages were including that ours wasn't. To do this, go to the SERP for your original keyword and review how your competitors use that term.

A look at the title tags provided the first clue:

brand-asset-management-google-search.png

Digital asset management and brand asset management are technically two different product categories, but they tend to be used interchangeably, and the same sites rank for both terms (see Brandfolder above). Lucidpress does not currently have all of the features of a digital asset management solution, but there is significant overlap, so we added the topic by addressing that overlap:

online-marketing-brand-asset-management-software-lucidpress.png

The chart below shows the resulting keyword ranking increase. Before the content updates, the page either didn't rank (where the line drops off suddenly) or averaged a ranking of #50. After the content updates, the page ranks consistently around position #25.

rank-tracker.png

Our niche, long-tail keywords were ranking at the bottom of the second page. Since making updates, those rankings have moved to the first page.

rank-tracker-2.png

Remember, the goal of TF-IDF is to help you approach content quality in the same way that a machine (Google) does, but the ultimate goal for both you and Google is to create the best piece of content for the user.

Questions? Comment below, and good luck!

Currently manage SEO and content strategy for LucidPress. Previously a digital marketing agency girl.

Comments

Great Post about TF/IDF - i use this concept for a few years now and the results are amazing - for my optimization and research i use SEOlyze.com, a tool providing extensive analysis based on the TF/IDF Algorithm.
BALACHANDAR I
I never heard about TF-IDF. I have gained knowledge after reading this article and developing my skills in advanced SEO techniques.
TF-IDF - True value of your keyword
Akhilesh Singh
Great post, I really thanks to you for sharing this informative post with us. As some peoples mention here already, I also struggling to understand the importance of this concept "TF-IDF" but as you explained so deeply/ nicely about it, anyone can understand this is about relevancy. Thanks!
Tammy Ane
Great written.Thank your nice idea sharing . I know every SEO marketer know advance seo technique. Your blog really help to SEO learner .Many many thanks
Jason Barnard
Lovely stuff Christina.
Thanks !
Hi,

Thanks for sharing this article with us. I'm working hard on my blog to achieve some success and to get some audience. to it. Hopefully, Health blog [link removed by moderator] can grab attention but it needs hard work and smart work as well. So, I will follow your points to make mine on page SEO successful.

Thanks and Regards
Incredibly insightful! I'm sure every SEO has hit that content rut or plateau and this was such a crisp, clear approach to getting through it using TF-IDF. Will be adding Ryte to my toolbelt now. Thank you!
Kelechi Ibe
Struggling to understand the importance of this concept, TF-IDF. Seems it just means we should cover topics that Google finds relevant. And it seems the Google AutoComplete and "Searches Related To" is sufficient for this.

Plus the SEMRush SEO Writing assistant shows semantically related terms that are generated from the 10 top search results. Isn't this the TF-IDF in a nutshell? What am I missing here?
Nikola Roza
Kelechi Ibe
Not quite sure, but perhaps TF-IDF analysis can show you terms and concepts you never thought of, and ones that traditional research, (like Google suggest and LSI) can't show you.

Because Google suggest shows the phrases people already search for, but TF-IDF shows what algorithm knows is related. That is how I understood it, hope it makes sense.
Simon Cox
Thats a nice peice Christina! Lots to try out here and I love the idea of targetting the 2nd page SERP layabouts! I have TF-IDF in Website Auditor and can now understand the benefit of it. Thanks!
Christina Sanders
Simon Cox
Thanks Simon! I'm glad it helped!
Saad Ali Khan
Wow, This is something new for me. TF-IDF 💜...An Awesome piece of content to get the idea of TF-IDF...
Great article interesting to learn from for my business
Yo Patel
Wow! An Advanced TF-IDF guide that I have ever read!
Balkonhotel
now I understand more, thank you for the article