Nick will join SEMrush on February 26 for a free webinar, “Data Mining with SEMrush — Unlocking SEMrush for the Serious SEO'er.” Learn more about the webinar after the jump!

If you want to be a mini SEO data scientist, it's a case of simply shifting your mindset from accepting what you're told to challenging assumptions based on rational thinking.

Irrational thinking is accepting everything Google tells you without validating it. Most people never question why something is the way it is. As a result, they carry on with outdated assumptions and end up being the source of a lot of misdirection.

In SEO, there is no definitive answer to many huge questions. Google doesn't tell us exactly what signals the algorithm responds to; they just say there are 200 ranking factors and leave us to work out which ones are most important. As a result, SEO is largely guesswork, with the person who has the best track record and the most experience being the thought leader on any SEO question.

This is where being comfortable with data analysis really matters, because the data gives you correlations. And with some depth of insight, you can work out what is driving those correlations.

A classic case is with social shares. There are lots of people who say social shares will make you rank, and as a result, there's a whole universe of sellers who will give you Facebook likes, Google+ plusses and countless other social shares.

But do social shares actually get you rankings? Or is there some other force in play here?

The only way you're going to work it out is to analyze the best data you can get your hands on, and start to piece together the answers.

What It Takes to Be a Mini SEO Data Scientist

Being a mini SEO data scientist is about using the right data and manipulating it the right way around a particular hypothesis. That gives you insights, which help you make better decisions and, therefore, be more successful in SEO.

So, what do you need to do to be a mini SEO data scientist?

The first thing is to understand Excel because it's the fastest and easiest way to manipulate information. Of course, not everyone likes Excel or has the aptitude for it. In this case, it's a question of getting a hold of the right tools that give you the insights you need.

The second thing is to understand what the metrics actually mean and how they relate to various websites. It's a bit like if you want to buy a car; you get to understand the costs of various cars and their features, and from there you can make a buying decision.

To understand the relative value and importance of metrics, it comes down to looking at lots of websites and associating metrics with them. After a while you will get a feel for what is good or bad based on a metric.

The third thing is to have good hypotheses. A typical hypothesis might be, "Do these social shares help with my rankings?" Since a hypothesis is ultimately a question, the better the question, the better the hypothesis.

Let's test a hypothesis around social shares and rankings: "Social shares get you ranked." To test this hypothesis we need a list of posts from a website with their social shares.

Here's a good example:


Lee Odden has given us a list of 25 of his top posts for 2014 with their respective shares. From there, we can do some data mining and correlation analysis.

Step 1: Find out how each of these pages has done on Google search. It so happens that 90 DataGrabber can give me all the key phrases that each of these URLs ranks for within the SEMrush database.

Step 2: See how the URLs have done on link acquisition by running them through 90 DataGrabber once again, and get all of the relevant Majestic metrics for them.

Step 3: Aggregate the social shares, Google ranking data and link data, then start to look for correlations.
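For readers who prefer code to Excel, Step 3 can be sketched in a few lines of Python. This is a minimal sketch, not the workflow used for this article: the URLs, metric names and numbers are made-up placeholders rather than Lee Odden's data, and it assumes the correlation in question is the Pearson coefficient, which is what Excel's CORREL function returns.

```python
from math import sqrt

# Hypothetical per-URL exports; every value here is a placeholder, not real data.
social_shares = {"/post-a": 1200, "/post-b": 450, "/post-c": 3100, "/post-d": 800}
ranking_keyphrases = {"/post-a": 35, "/post-b": 12, "/post-c": 60, "/post-d": 30}
referring_domains = {"/post-a": 14, "/post-b": 5, "/post-c": 40, "/post-d": 9}


def aggregate(**sources):
    """Join per-URL metrics, keeping only URLs present in every source."""
    shared_urls = set.intersection(*(set(s) for s in sources.values()))
    return {
        url: {name: source[url] for name, source in sources.items()}
        for url in sorted(shared_urls)
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def correlate(table, metric_a, metric_b):
    """Correlate two named metrics across all rows of the aggregated table."""
    rows = list(table.values())
    return pearson([r[metric_a] for r in rows], [r[metric_b] for r in rows])


table = aggregate(
    shares=social_shares,
    keyphrases=ranking_keyphrases,
    domains=referring_domains,
)
r = correlate(table, "shares", "keyphrases")
print(f"shares vs. key phrases: {r:.0%}")
```

Joining on the URL plays the same role as a VLOOKUP across the three exports, and any pair of metric names can be passed to `correlate` to test a different hypothesis.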


Insights from the Correlations

The first question is, "What relationship is there between social shares and visibility on Google search results?" Interestingly, if you use the SEMrush search volume metric (which is an estimation of search traffic that comes into your site for a given key phrase), the correlation is only 1%.

However, if you compare social shares with the number of key phrases that rank, the correlation is 59%, which is a good, solid number.

Taken on its own, that number could be used to argue that social shares affect rankings directly. However, Google has repeatedly denied that social shares have a direct effect on the algorithm, and I agree. The link data suggests another explanation.

There's a 70% correlation between the number of referring domains and social shares. The correlation climbs to 81% between social shares and the number of followed links with a Majestic Trust Flow of one or above.

Then, if you compare the number of key phrases a page ranks for with that same count of followed links at Trust Flow one or above, you get a remarkable 76% correlation.
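The link metric used in these comparisons can be counted per page before correlating. The sketch below reads it as links that are followed and have a Trust Flow of one or above, which is an interpretation of the wording here. It also assumes a hypothetical backlink export with `target_url`, `trust_flow` and `nofollow` fields; these names are illustrative and are not the actual columns of a Majestic or 90 DataGrabber export, and the rows are placeholders.

```python
from collections import Counter

# Hypothetical backlink rows; field names and values are placeholders.
backlinks = [
    {"target_url": "/post-a", "trust_flow": 12, "nofollow": False},
    {"target_url": "/post-a", "trust_flow": 0, "nofollow": False},
    {"target_url": "/post-a", "trust_flow": 25, "nofollow": True},
    {"target_url": "/post-b", "trust_flow": 3, "nofollow": False},
    {"target_url": "/post-b", "trust_flow": 1, "nofollow": False},
]


def count_trusted_followed_links(rows, min_trust_flow=1):
    """Count, per target URL, links that are followed and meet the Trust Flow floor."""
    counts = Counter()
    for row in rows:
        if row["trust_flow"] >= min_trust_flow and not row["nofollow"]:
            counts[row["target_url"]] += 1
    return counts


trusted_links = count_trusted_followed_links(backlinks)
for url, count in sorted(trusted_links.items()):
    print(url, count)
```

Raising `min_trust_flow` shows how sensitive the count is to the threshold, which is worth checking before reading too much into any single correlation.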

This data also supports a general hypothesis I have about rankings: Google will simply rank a domain across more key phrases, generally irrespective of competition, if it trusts that domain.

The only real exceptions are highly competitive money key phrases. In those circumstances it does seem to be a game of link volume and strength, combined with user engagement metrics associated with search results.

Final Summary of the Data

Based on this data, it seems links get you rankings and social shares are indicative of the link-ability of a page. So, if you're writing content, you could say that a great KPI for linkable content is getting more shares. Once you have those shares, you can get out and acquire links for those pages.

In Conclusion

As far as being a mini SEO data scientist goes, it's really a question of having good questions, getting the best data you can get your hands on and then developing a nice structured way of organizing it to get answers.

I make no claims about being very skilled at Excel. But even with my limited knowledge, which goes as far as pivot tables and VLOOKUPs, I can get a lot of insight.

One of the reasons I can do this kind of data analysis is because we've built a tool called 90 DataGrabber that allows us to mine data in new ways really easily. You can download a free copy and play around with it here.

Welcome to the exciting world of SEO enlightenment!

Join Nick Garner and SEMrush on Thursday, February 26, at 12 p.m. (EST) for “Data Mining with SEMrush — Unlocking SEMrush for the Serious SEO'er.” Learn how to:

  • Simplify identification of potential link partners with massive link databases such as Majestic SEO and Ahrefs
  • Rapidly use multiple factors to evaluate a domain's organic success
  • Easily apply SEMrush API to qualify the best link sources at massive scale
  • Walk away with a huge list of highly-qualified link sources with which to dominate any industry, product or service

Register today!

Nick Garner is founder of 90 Digital, a specialist agency for SEO in competitive verticals, digital PR and enterprise web development. Coincidentally, 90 Digital has a team of just over 90 people. His last article for SEMrush was "Data Mining with SEMrush."