A few weeks back, OpenAI finally released the full version of GPT-2, the state-of-the-art text generator the company had called “too dangerous to release publicly” back in January.
New tools like this one mark a new era in digital content creation with massive implications for organic search and SEO in general. The ability to generate unlimited text that readers (and likely search engines) think is genuine human-written text presents a potentially existential threat to SERP quality.
Given this, I decided to run an experiment. I created a blog entirely generated by AI called This Marketing Blog Does Not Exist.
I wondered: Could it rank? Could it drive traffic? Would it be caught and penalized?
And most importantly: Is this a viable new blackhat technique that we will see exploited by bad actors for financial gain?
Here is What Happened
After publication, through highly targeted digital PR outreach, we were able to secure interviews and press coverage with top-tier online publishers.
As a result, we accumulated high domain authority links right off the bat, from publications like:
These initial media stories generated a decent number of additional syndications, as well as at least one major pickup months later at VentureBeat (due to publishing schedules).
We can see this accumulation of root linking domains reflected in the charts below:
In the roughly 4-5 months since the blog was launched, it acquired links from almost 220 unique linking domains, with 50+ of them having a domain authority of 50 or better.
This laid the groundwork for relatively fast indexing of the entire site (it took about a month) and built enough early domain authority for the site to begin ranking in the long tail almost immediately.
I want to note that I did no onsite optimization. I simply uploaded the text generated by the AI model and did no additional optimization for specific keywords, article comprehensiveness (with Clearscope or similar tools), titles/headings, meta tags, page interlinking (aside from the WordPress category links in the sidebar), etc.
Even with no optimization, the site ended up ranking for nearly 300 terms in the 4 months since its launch, according to SEMrush.
Total Ranking Keywords: 292
First Page Rankings: 3
Organic Search Traffic: ~60 visits per month
So What Does This Mean?
It is clear that press coverage generated a great deal of early high-domain-authority links, which provided the basis for the ranking ability of the ~600 generated pages.
But because this was an entirely new domain, the press links could only take rankings so far: 215 ULDs (unique linking domains) is a great start, but it is certainly not enough to allow for many first-page rankings outside the extreme long tail.
Had I put more planning and effort into onsite SEO optimization and implemented a hub/spoke content organization, I am confident we would have seen better keyword/traffic results.
The blog’s 600 pages took only 20 minutes to generate. Adding more pages would most likely create a corresponding increase in total ranking keywords and organic search traffic.
Extrapolating from the results seen with 600 pages, we might roughly expect the following.
Increasing to 6,000 AI-generated pages could mean:
~3,000 keywords ranking
~600 visits per month
Increasing to 60,000 AI-generated pages could mean:
~30,000 keyword rankings
~6,000 visits per month
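The projections above are simple linear scaling of the observed 600-page numbers. A minimal sketch of that arithmetic, assuming rankings and traffic scale proportionally with page count (an optimistic assumption the article itself treats as a rough estimate):

```python
# Linear extrapolation of the observed 600-page results to larger site sizes.
# Assumes rankings and traffic grow proportionally with page count; in
# practice, keyword cannibalization and crawl-budget limits would likely
# bend the curve at larger scales.

BASE_PAGES = 600
BASE_KEYWORDS = 292   # total ranking keywords observed
BASE_VISITS = 60      # monthly organic visits observed

def extrapolate(pages: int) -> tuple[int, int]:
    """Project ranking keywords and monthly organic visits for a given page count."""
    scale = pages / BASE_PAGES
    return round(BASE_KEYWORDS * scale), round(BASE_VISITS * scale)

for pages in (600, 6_000, 60_000):
    keywords, visits = extrapolate(pages)
    print(f"{pages:>6} pages -> ~{keywords} keywords, ~{visits} visits/month")
```

At 6,000 pages this yields roughly 2,900 keywords and 600 visits per month, matching the rounded figures above; the real curve would almost certainly be sublinear.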
Improving onsite SEO, site structure, internal linking, and so on could have massive effects, with exponential improvements realized on theoretical sites of ever-greater size.
In essence, it does appear possible for someone to create a highly trafficked blog in a day without a single word of human-written content.
Where Do We Go From Here?
As anticipated, some companies have already begun to monetize AI-generated text for SEO.
KafkAI is an early example; it seems likely we will see more companies trying to cash in, further disrupting the current marketplace of low to mid-quality content creation companies, article spinners, etc.
I see the advent of this technology as a potential existential risk for web content and for Google’s ability to continue serving relevant content in the SERPs. The recent incorporation of BERT (built on the same Transformer architecture as the models discussed in this article) into Google Search is telling.
Google is indeed looking to incorporate more and more state-of-the-art AI into its algorithm, but so far, those efforts do not appear to include identifying or filtering content created by AI text generators.
Perhaps continued discussion around the risks to SERP quality and the potential for overall degradation of the web’s content ecosystem will spur new efforts by Google to find — and filter out — this next generation of webspam.