Google's algorithm has dramatically evolved over the years. In 2013, the Google Hummingbird update transformed Google's ability to provide the most relevant results by interpreting the searcher's intent rather than relying on a specific keyword.
For searchers, this meant Google turned into a knowledge assistant, helping to close knowledge gaps that would have made it difficult for the searcher to find a relevant search result. For example, Google was now able to recognize the intent of a query for "president of Canada" and return information on Canada's prime minister.
For SEOs, this meant no longer trying to account for every synonym or keyword variation and stuffing them onto a page. It also sparked a call (once again) to focus on creating high-quality, relevant content. While creating quality content is the goal, understanding how Google identifies quality content is crucial to staying competitive as Google's SERPs continue to evolve. Tying together synonyms and similar phrases was the beginning of a smarter Google algorithm, but now Google can tie together related concepts to understand which content provides the greatest breadth, and calculate how often those concepts appear on a page to identify which piece provides the most depth.
What is TF-IDF?
TF-IDF stands for term frequency-inverse document frequency and is a way of determining the quality of a piece of content based on an established expectation of what an in-depth piece of content contains.
TF-IDF measures the importance of a keyword phrase by comparing how often it appears on a page to the frequency of the term in a large set of documents.
In a previous article about TF-IDF, A.J. Ghergich tells us, "The overall goal of TF-IDF is to statistically measure how important a word is in a collection of documents."
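To make that definition concrete, here is a minimal Python sketch of one common textbook formulation: term frequency multiplied by the log of inverse document frequency. The function names and sample documents are made up for illustration, the tokenization is a plain whitespace split that only handles single words, and nothing here is meant to represent how Google actually weights terms.

```python
import math

def term_frequency(term, document):
    """Share of the document's words that are `term`."""
    words = document.lower().split()
    return words.count(term.lower()) / len(words) if words else 0.0

def inverse_document_frequency(term, documents):
    """log(N / number of documents containing the term); 0.0 if no document contains it."""
    containing = sum(1 for doc in documents if term.lower() in doc.lower().split())
    return math.log(len(documents) / containing) if containing else 0.0

def tf_idf(term, document, documents):
    """Weight of `term` in one document relative to the whole collection."""
    return term_frequency(term, document) * inverse_document_frequency(term, documents)

# Hypothetical three-document collection
docs = [
    "keyword research is the first step of seo",
    "a site audit finds technical seo problems",
    "bake the bread for forty minutes",
]
print(tf_idf("seo", docs[0], docs))    # appears in two of three documents
print(tf_idf("audit", docs[1], docs))  # appears in only one document
```

A word that shows up in fewer documents of the collection gets a larger inverse document frequency, so "audit" ends up weighted higher than "seo" in this toy collection.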
For example, if you are a small business owner wanting to learn how to use SEO to drive more traffic to your website, there are several topics that a complete SEO guide would cover including:
- Keyword Research
- Meta Data
- Site Audit
- Google Bots
Other topics that would also be relevant but would likely appear less frequently than those on the list above include:
- SEO Tools
- Core Update
- Panda Update
- H1 Tag
When evaluating a piece of content, the Google algorithm would calculate how often each of the above terms appears across all of the content currently associated with "SEO guide" in comparison to all of the other terms. This data is then used as a baseline "score" that any one piece of content can be scored against. TF-IDF can help you determine which keywords you are missing.
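The baseline idea can be sketched in a few lines of Python. This is an illustration under simple assumptions, not a description of Google's scoring: it uses raw phrase frequency per 100 words (no inverse document frequency), naive substring matching, and hypothetical page text and function names.

```python
def term_rates(text, terms):
    """Occurrences of each phrase per 100 words of `text` (naive substring match)."""
    lowered = text.lower()
    word_count = max(len(lowered.split()), 1)
    return {t: 100 * lowered.count(t.lower()) / word_count for t in terms}

def content_gaps(my_page, ranking_pages, terms):
    """Terms whose average rate on the ranking pages exceeds the rate on my page.

    Returns {term: (my_rate, baseline_rate)}.
    """
    mine = term_rates(my_page, terms)
    rates = [term_rates(page, terms) for page in ranking_pages]
    gaps = {}
    for t in terms:
        baseline = sum(r[t] for r in rates) / len(rates)
        if baseline > mine[t]:
            gaps[t] = (round(mine[t], 2), round(baseline, 2))
    return gaps

# Hypothetical page text
ranking_pages = [
    "Keyword research and a site audit come first.",
    "Start with keyword research, then fix meta data.",
]
my_page = "Our guide covers keyword research in detail."
terms = ["keyword research", "site audit", "meta data"]

gaps = content_gaps(my_page, ranking_pages, terms)
print(gaps)
```

Here "keyword research" is already covered, while "site audit" and "meta data" appear on the ranking pages but not on the example page, so they are reported as gaps.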
When to Use TF-IDF Analysis
SEOs and content creators can use TF-IDF to identify gaps in their current content based on the content currently ranking in the top 10 search results. It can also be used when creating new content so that the content ranks higher, faster. However, marketers also have limited time, so which content pieces should you focus on first to get the most benefit?
1. High Potential Content Stuck on the 2nd Page
Start by identifying content that has been live on your site for a while but is struggling to break the first page. If that content has already been optimized for technical SEO considerations and has some authority going to it, it likely would benefit from further content optimization.
2. Content Slowly Losing Traffic (and Rankings) Over the Past Year
Whenever I see a site that has slowly dropped from the top of the first page to the bottom of the first page, it's typically due to increasing competition or Google's algorithm changing which content is most relevant to that SERP. A quick way to check this is to pull up a screenshot of the SERP from a year ago using a tool like SpyFu and compare it to the current SERP. In either case, revisiting your content to ensure that it is still relevant, and still the most relevant result for that SERP, helps you recover and maintain those rankings.
3. Product Pages Struggling to Rank
While it is more common for top-of-funnel content to benefit from TF-IDF, if your product pages are struggling to rank for your money terms, critical content is likely missing from that page.
How to Complete TF-IDF Analysis
Collecting the data necessary for TF-IDF is relatively easy. I start by pulling the top 10 results for my target keyword and putting them in Screaming Frog to get an average word count.
This number helps me determine whether I'm going to need to add large sections of content to my page or if I am covering too much of the wrong subject. I then run the analysis with a TF-IDF tool.
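As a small illustration of that comparison, the sketch below averages a set of word counts and reports how far a page is from the average. The numbers are hypothetical placeholders for whatever your crawl reports, and the function name is made up for this example.

```python
from statistics import mean

def word_count_gap(my_word_count, competitor_word_counts):
    """Return (average competitor word count, my page's difference from it)."""
    average = mean(competitor_word_counts)
    return average, my_word_count - average

# Hypothetical word counts for the top 10 results
top_10 = [1850, 2100, 1600, 2400, 1950, 2250, 1700, 2050, 1900, 2200]
average, difference = word_count_gap(900, top_10)
print(f"Top-10 average: {average:.0f} words; my page differs by {difference:+.0f}")
```

A large negative difference suggests whole sections are missing. A large positive one may mean the page covers too much of the wrong subject.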
There are several available, including Ryte and Link Assistant. Ryte, which offers free accounts, compares a live URL to the top 10 results and includes a text editor that gives optimization recommendations as you are creating new content.
Ryte provides you with a list of the most important keywords and scores your website based on that list.
Note: Many people like to use Python for TF-IDF analysis.
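For readers who want to try that route, here is a rough standard-library sketch that ranks words across a set of pages by their average TF-IDF weight. It assumes the smoothed inverse document frequency, log((1 + N) / (1 + df)) + 1, so that words shared by every page still keep a weight. The function name and sample pages are hypothetical. Real tools also filter stop words, handle multi-word phrases, and may use a much larger reference collection, none of which is done here.

```python
import math
from collections import Counter

def top_terms(pages, limit=5):
    """Rank words by their average smoothed TF-IDF weight across `pages`."""
    tokenized = [page.lower().split() for page in pages]
    n = len(tokenized)
    # Number of pages each word appears in
    doc_freq = Counter(word for words in tokenized for word in set(words))
    totals = Counter()
    for words in tokenized:
        for word, count in Counter(words).items():
            tf = count / len(words)
            idf = math.log((1 + n) / (1 + doc_freq[word])) + 1  # smoothed IDF
            totals[word] += tf * idf
    return [(word, total / n) for word, total in totals.most_common(limit)]

# Hypothetical page text
pages = ["brand brand brand", "brand assets"]
for word, weight in top_terms(pages):
    print(f"{word}: {weight:.3f}")
```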
How to Optimize TF-IDF with the User In Mind
The tricky part comes next. How do you take this list of terms and add them to your content, so the content is more useful to the user?
1. Edit the List
Start by using common sense to narrow down your list. In the analysis above, SquareSpace shows up as a relevant keyword. Competitors who use their brand name frequently throughout their site show up in these analyses.
Unless Google is looking for a product or vendor comparison, mentioning competitors will typically not help your content to be more relevant.
2. Identify Missing Subjects
Many SEOs see a list of TF-IDF terms and immediately go back to their keyword density days. While adding variations of a keyword to copy can still be valuable, the goal of TF-IDF isn't merely to stuff each word into the copy somewhere a couple of times.
Instead, TF-IDF should help you identify missing subjects that should be in your document, which could be as small as providing sizing on a product page or as big as adding a paragraph or two to a blog post that makes the piece more comprehensive. Reviewing how competitors are using your missing terms helps you identify the best way to go about optimizing your content.
Start by pulling up the top 10 pages for your target keyword and searching for the TF-IDF term within each competitor's content. Identify patterns of content that your competitors have that you don't. Ryte also identifies which page uses the TF-IDF term the most, so you can click directly to that competitor's page.
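If you already have the text of the competing pages saved, a short Python sketch can do the same kind of lookup: count how often a term appears on each page and show a bit of surrounding context. The URLs, page text, and function name below are hypothetical, and this is not how Ryte itself works.

```python
import re

def term_usage(term, pages):
    """Return (url, count, first_snippet) per page, most frequent use first."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    rows = []
    for url, text in pages.items():
        matches = list(pattern.finditer(text))
        snippet = ""
        if matches:
            # Up to 40 characters of context on each side of the first match
            start = max(matches[0].start() - 40, 0)
            end = min(matches[0].end() + 40, len(text))
            snippet = text[start:end]
        rows.append((url, len(matches), snippet))
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Hypothetical competitor pages keyed by URL
pages = {
    "https://example.com/a": "Digital asset management keeps every digital asset in one place.",
    "https://example.com/b": "Brand templates for teams.",
}
rows = term_usage("digital asset", pages)
for url, count, snippet in rows:
    print(url, count, snippet)
```

Reading the snippets from the pages with the highest counts is a quick way to see what subject the term belongs to, rather than just how many times it appears.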
3. Adapt Format if Necessary
Changing the design and layout of a website takes time and resources that aren't always available or necessarily worth it for every SEO update. However, if you have experimented with several similar pages and found that changing the overall content is useful, updating the design to match creates a much better user experience and helps you optimize additional content in the future.
When to update your design:
- Page structure doesn't allow for new content sections.
- Page was originally built for the wrong search intent and/or audience.
- Content has become too bulky for the current sections.
- Page template doesn't include design components that effectively break up the text.
- The page is too long and needs more interactive components to be effective.
Once you have identified a page that needs to be updated, remember the following best practices:
- Except for e-commerce sites and image or template galleries, the content you're adding to the page should be information the searcher is actively looking for, so make it easy and compelling to read. In the case of product descriptions, a section at the bottom of the page with small text is universally understood as "ignore this section" text.
- Remember your hierarchy. Keep your value proposition and messaging up top and add supplemental content below.
- As you add more content to a page, add additional CTAs throughout.
- For extensive, in-depth content, add sticky menus and interactive elements to keep the reader engaged.
- Keep the content scannable with subheadings, bold text, bullet points, and imagery.
A TF-IDF Example
Does this stuff work and how will I know if it does? Great questions!
Last year, Lucidpress created this brand management software page to promote its new enterprise features. While the page was optimized, crawlable, and relevant, it was struggling to rank months later. We used Ryte to pull a TF-IDF analysis:
In the chart, the higher the orange bar is, the more relevant the keyword is. As you can see, "digital assets" is considered nearly as relevant as "brand assets" in this SERP. From here, we needed to determine which topics other pages were including that ours wasn't. To do this, go to the SERP for your original keyword and review how your competitors use that term.
A look at the title tags provided the first clue:
Digital asset management and brand asset management are technically two different product categories, but they tend to be used interchangeably, and the same sites rank for both terms (see Brandfolder above). Lucidpress currently does not have all of the features of a digital asset management solution, but there is much overlap, so we added the topic by addressing that overlap:
The chart below shows the resulting keyword ranking increase. Before the content updates, the page either didn't rank (where the line drops off suddenly) or averaged a ranking of #50. After the content updates, the page ranks consistently around position #25.
Our niche, long-tail keywords were ranking at the bottom of the second page. Since making updates, those rankings have moved to the first page.
Remember, the goal of TF-IDF is to help you approach content quality in the same way that a machine (Google) does, but the ultimate goal of both Google and yourself is to create the best piece of content for the user.