Drizly is the world's largest alcohol marketplace and the best way to shop beer, wine and spirits. Our customers trust us to be part of their lives: their celebrations, parties, dinners and quiet nights at home. We are there when it matters, committed to life's moments and the people who create them. We partner with the best retail stores in over 1,200 cities across North America to serve up the best buying experience. Drizly offers a huge selection and competitive pricing with a side of personalized content.
Until 2021, Drizly, like many other companies, relied on industry best practices and intuition for its SEO changes, but the team always felt like they were operating in a gray area.
“Someone did a test somewhere and said 60 characters was perfect for a title, so that’s what we did,” says Jake Jamieson, Senior SEO Manager at Drizly.
“I got sick of having to rely just on word of mouth and best practices that are coming from the industry when I know that every website is different and every content type that we have is different. I wanted to be able to point at something and say: ‘if we did this test and it made change X, we’d get Y’, rather than saying, ‘well, you know, I read an article that said this change should happen’”.
While experimenting with manual SEO A/B testing, Jake faced many challenges:
1. Manual SEO split-testing takes too much time
At the peak of it, Jake was spending 15-20 hours a week on SEO testing.
“I'd been spending a lot of my time testing title and meta descriptions: manually splitting two sets of URLs, putting them into our SEO platform to track rankings, forecasting at a page level. I would come up with a number and then nothing ever matched up with that.”
“I didn't understand how many man-hours it would take, and it wasn't the kind of thing I could hand to my team because they'd never done it. I'm the guy who's supposed to be running this entire SEO program and I'm spending close to half my time for at least a few months just trying to build something out.”
2. The test results of manual SEO testing are unreliable and unscalable
“After every manual test using Google Analytics and Google Search Console, the question that kept coming up was ‘what's your confidence level in this test?’ All I could say was ‘between zero and 100%’. I couldn't even tell if I was at statistical significance when I was stopping my tests.”
This is a common challenge among SEO professionals who run tests manually. According to the most recent research on SEO split-testing, 51% of respondents struggle to decide when a test has reached statistical significance and, therefore, whether their SEO test was successful.
3. Implementing unreliable test results wasted the development team’s resources
“Most of our changes didn’t bring results. I ended up starting to look for a split-testing platform because I felt I was reinventing the wheel,” says Jake.
4. Proving the ROI of SEO was nearly impossible
“We're a very data-driven organization. I wanted to be able to present hard data at our product team meeting and include it in my monthly report for SEO, and none of that stuff was happening when I was doing it manually.”
“I wanted to make decisions based on data that said we did this test on a smaller set of pages. We saw this increase. If we apply it, that's exactly how it will work.”
Drizly decided to start a pilot with SplitSignal, a stand-alone tool for statistical SEO split-testing powered by Semrush.
The test can be broken down into the following stages:
The first step in SEO testing is to formulate a hypothesis that aligns with your SEO roadmap.
Many of Drizly’s product pages were built 5-6 years ago, and their default titles include just the product, price, and reviews. This doesn't reflect the search intent people have when they're looking for these products.
“I was like, oh, we just got to get ‘60-minute delivery’ into the title, we need to help people understand that we're going to deliver to them. That’s a no-brainer, we barely have to test it. And those tests tanked, like any of the tests where we talked about the local stuff and the speed of delivery,” Jake explains.
What he and the team wanted to test next was adding more eCommerce modifiers like the words “shop” and “browse”. This ended up giving a significant boost to Drizly’s traffic.
But first, they needed to set the KPI:
“We jumped around a lot on KPIs at first because it was so new to us. We were contemplating if we should be optimizing to improve click-through rate or average ranking. Then we realized we needed to figure out if this test increases traffic to the specific page.”
The next step was to choose the group of pages Drizly wanted to run the SEO split-test on. SplitSignal splits the pages for you: it uses 100 days of historical traffic data to create two statistically similar groups of pages. Splitting this way increases the accuracy of the test and controls for external factors such as seasonality and traffic spikes.
Drizly chose to run the test on a group of 476 category pages (split into 238 control and 228 variant pages), adding the word “shop” to the beginning of the beer brand page titles in the variant group.
“Just seeing the tool interface had me say, ‘Yes, this is exactly what I wanted to be able to run these tests.’”
The graph below shows the relationship between the control and variant groups over the previous 100 days, as well as during the test period. The model is only reliable if the ‘real’ traffic and the ‘predicted’ traffic lines match as closely as possible during the historical, pre-launch period. After launch, a positive result shows up as the ‘real’ traffic line staying consistently above the ‘predicted’ traffic line.
Drizly’s test ran for 21 days.
SplitSignal lets you see the changes between the control and variant groups in real time. You can set the test duration anywhere from 14 to 42 days; the choice usually depends on the number of clicks and pages available for the test.
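To see why clicks matter for duration, here is a rough, hypothetical Python estimate; it is not SplitSignal's rule. It treats each group's total clicks as Poisson-distributed, asks how many clicks are needed to detect a chosen minimum uplift (5% here, with a 5% significance level and 80% power, all illustrative assumptions), converts that into days, and clamps the result to the 14 to 42 day range. Real click data usually vary more from day to day than a Poisson model assumes, so treat the output as an optimistic lower bound.

```python
import math
from statistics import NormalDist

MIN_DAYS, MAX_DAYS = 14, 42  # duration range SplitSignal offers


def estimate_test_days(daily_clicks_per_group, min_uplift=0.05,
                       alpha=0.05, power=0.8):
    """Rough number of days needed to detect a relative click uplift.

    Returns (days, long_enough): days is clamped to the 14-42 day range,
    and long_enough is False when even 42 days falls short.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    # Treat each group's total clicks N as Poisson, so the log ratio of
    # the two totals has a variance of roughly 2 / N.
    clicks_needed = 2 * z ** 2 / math.log(1 + min_uplift) ** 2
    raw_days = math.ceil(clicks_needed / daily_clicks_per_group)
    days = min(max(raw_days, MIN_DAYS), MAX_DAYS)
    return days, raw_days <= MAX_DAYS


if __name__ == "__main__":
    for clicks in (100, 300, 500):
        print(clicks, estimate_test_days(clicks))
```

Under these assumptions, a group with 300 daily clicks needs about 22 days, while a group with only 100 daily clicks can't reach the target even in 42 days, which is where adding more pages to the test helps.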
“I check the test results at least every couple of days. I want to know how we are trending and sometimes people will just ping me within the organization. Being able to say something specific like we're doing a test on brand pages and they're currently up by about 3% and we're only about halfway through the test. That's the kind of stuff that gives people confidence.”
After 21 days, the test showed a significant improvement in the variant group’s performance: the pages with “shop” in the title received 5.3% more clicks.
What matters is not only which words you use but also where on the site you use them. “Shop” might work better on brand pages, for instance, while “browse” works better on category pages.
“I was surprised at how much of an impact it had. These sort of nuance tests are what gets Drizly beyond the vague best practice methods,” Jake says.
Once the test proved to be a success, it was time to roll out the update to the entire group of pages.
SplitSignal can also pull additional insights from the connected Google Search Console account. Besides clicks, it can provide other metrics on request, such as changes in rankings and impressions.
But the most important benefit for Jake is being able to translate SEO metrics into business language:
“Everyone wants to talk about rank changes and that stuff is important. But at the end of the day, all that really matters is how many people showed up on the site. Now I can say ‘hey, an extra 2000 people came to these 100 brand pages during the time we were testing. If I track it out, this is what it will be in 3 weeks, 3 months, a year.’ That's something that clicks for people rather than saying ‘well, the average ranking went from 4.7 to 4.5’ - this sort of SEO smoke and mirrors that can happen.”
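Jake's "track it out" step can be sketched as a back-of-the-envelope linear projection in Python. The function name is hypothetical, and the sketch assumes the daily gain seen during the test stays constant, which ignores seasonality and any growth or decay in the effect. The 2,000 extra visitors come from Jake's example; the 21-day test window and the 91-day "3 months" are our assumptions.

```python
def project_extra_visitors(extra_visitors, test_days, horizons):
    """Linearly extrapolate extra visitors from a test window to longer horizons.

    Assumes the daily gain seen during the test stays constant, which
    ignores seasonality and any decay or growth in the effect.
    """
    return {label: round(extra_visitors * days / test_days)
            for label, days in horizons.items()}


# Horizon lengths in days; 91 days for "3 months" is an approximation.
HORIZONS = {"3 weeks": 21, "3 months": 91, "1 year": 365}

if __name__ == "__main__":
    # 2,000 extra visitors is from Jake's example; the 21-day window is an
    # assumption borrowed from the length of Drizly's test.
    print(project_extra_visitors(2000, 21, HORIZONS))
    # -> {'3 weeks': 2000, '3 months': 8667, '1 year': 34762}
```

A straight-line projection is easy to explain in a meeting, but it is only as good as the assumption that the uplift holds steady after rollout.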
Proving the value of the change to dev teams has become easy as well:
“I take a screenshot of the real versus projected trend line, and that little box up at the top that tells the confidence and the traffic difference. That's pretty much all the information that I need to hand off to people internally to talk about implementing results.”
“I'm actually able to make declarative recommendations based on these tests. Even if the test isn't successful, I can say okay, the next step for this test is to test another variant of that thing. I actually have data that I can share rather than just promising good results soon.”
Summing up, Jake and the Drizly team are now able to: