Elena Terenteva

Crawl Optimization Issues and How to Fix Them #semrushchat


Crawlers (or “spiders”) are scripts that Google and other search engines use to look in every nook and cranny of your website.

By doing this, they are able to understand the general purpose of a page and decide whether it is worthy of being included in the SERPs. Unfortunately, crawling often surfaces issues and errors that block spiders from doing their jobs and hurt your rankings.

In SEMrush Chat, we discussed different crawling optimization issues and ways to fix them with Dawn Anderson @dawnieando and our other participants.

SEO techniques that help crawlers find content faster

A robots.txt file is a document that allows you to control how crawlers behave by restricting them from accessing certain pages. It helps when a website has a lot of content that doesn’t perform well in organic search: download pages, specific “thank you” pages, etc.

Be careful with these directives, though: an overly broad Disallow rule in robots.txt can block your whole website from being crawled, and a misplaced “noindex” tag can remove important pages from the index.
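As a quick illustration (the domain and paths below are hypothetical placeholders), a minimal robots.txt might look like this:

```text
# Applies to all crawlers
User-agent: *
# Keep low-value pages out of the crawl (hypothetical paths)
Disallow: /downloads/
Disallow: /thank-you/
# Point crawlers at your sitemap
Sitemap: https://www.example.com/sitemap.xml
```

A single stray `Disallow: /` in this file would block the entire site, which is why robots.txt should be reviewed carefully after every change.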

Adding new content to your webpage is always important, because it signals to Google that your site is developing. But let’s not forget about sitemaps – where all this new content should be mentioned – and other important techniques.
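For reference, a sitemap is a simple XML file listing the URLs you want crawled; a minimal sketch (the URL and date are placeholders) looks like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/new-article/</loc>
    <lastmod>2016-10-10</lastmod>
  </url>
</urlset>
```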

The best way to think about internal links is as pathways for crawlers, so it’s important to optimize your website’s internal link structure. Avoid long redirect chains, because they can stop crawlers from indexing linked pages.

A clean and simple URL structure is also considered to be one of the main ranking factors used by Google and other search engines.

Sometimes during the website development phase, multiple copies of some pages can occur, which may result in errors because crawlers don’t know which copy to trust.

Using the correct “rel=canonical” attribute helps avoid such problems.
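For example (the URL below is a placeholder), the preferred copy of a page is declared with a link element in the head of each duplicate:

```html
<!-- Placed in the <head> of every duplicate or parameterized copy -->
<link rel="canonical" href="https://www.example.com/products/blue-widget/" />
```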

Headlines are important. They’re, well, not the first, but definitely the second or third thing crawlers look at to determine how useful a page would be to someone who types a query into a search box. Remember to include the keywords you’re targeting in your headlines and use appropriate heading tags for your pages and publications.

Fetch as Google is a tool inside the Search Console; it’s very useful when a marketer wants to evaluate a page as Google would. Don’t forget to send new pages to be indexed so crawlers can find them quicker.

Considering all the points above, the best techniques for helping crawlers index pages faster are: submitting sitemaps, creating plain SEO-friendly URLs, internal linking and using the correct rel=canonical attributes, redirects and robots.txt.

SEO techniques that help crawlers

Before fixing common crawling issues, each marketer must identify them first. So how exactly do you do that? We asked our participants.

How can SEO specialists analyze the Googlebot's performance?

The very first things to consider are the Google Search Console tools that work in conjunction with crawlers and support marketers with fresh information about the crawling process. The next thing you should do is use some third-party crawlers that simulate GoogleBot’s behavior.

Ryan Johnson @rsj8000 added that Webmaster Tools is a great place to start, followed by the Fetch as Google tool, and that Screaming Frog is nice too.

Most of our participants agree that Google’s native tools are almost godlike when it comes to determining issues and errors, and that if you also use third-party bots, it will make your webpage as crawler-friendly as possible.

A more technical approach is to check server logs and trace spiders when they visit your webpage. What are your spider traps?

Stephen Kenwright @stekenwright added that Search Console is best for qualitative feedback, while server logs are best for quantitative feedback: spot the problem in WMT, then see its scale in the logs to improve your website for crawlers.
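As a minimal sketch of that log-based approach (the IPs, paths, and dates below are fabricated for illustration), you can filter an access log for Googlebot’s user agent to see what it requested and which status codes it received:

```shell
# Create a tiny sample access log (all values are hypothetical)
cat > access.log <<'EOF'
66.249.66.1 - - [10/Oct/2016:13:55:36 +0000] "GET /products HTTP/1.1" 200 "Mozilla/5.0 (compatible; Googlebot/2.1)"
203.0.113.5 - - [10/Oct/2016:13:55:40 +0000] "GET /about HTTP/1.1" 200 "Mozilla/5.0"
66.249.66.1 - - [10/Oct/2016:13:56:02 +0000] "GET /old-page HTTP/1.1" 404 "Mozilla/5.0 (compatible; Googlebot/2.1)"
EOF

# How many hits came from Googlebot?
grep -c 'Googlebot' access.log

# Which URLs did it request, and with what status codes?
grep 'Googlebot' access.log | awk '{print $7, $9}'
```

Here the second command surfaces `/old-page 404`, exactly the kind of crawl waste you would then fix or redirect.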

Let’s recap all these opinions and techniques. To identify spider traps, SEOs should use GSC and Screaming Frog to crawl their websites themselves.

Google bot performance

Above we talked about what companies should do to get search engines to index as many pages as possible. The next thing we’re about to cover is what they should do to avoid being indexed. In what situations do we need this? Let’s find out.

Avoiding search engine indexing

Agent Palmer (@AgentPalmer) was one of the first to answer, saying that temporary content (like promotional or landing pages) used for a short time on social media might not need to be crawled.

As we mentioned above, duplicate content can be a big problem for companies that are growing quickly, as they may not be able to manage all the content on their site at the same time. Some of us choose to use rel=canonical tags, but the easiest way is to add a “noindex” meta robots tag to the page itself.
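Note that a “noindex” directive belongs in the page’s HTML head (or in an X-Robots-Tag HTTP header), not in robots.txt; a minimal example:

```html
<!-- In the <head> of the page you want kept out of the index -->
<meta name="robots" content="noindex, follow" />
```

The “follow” value lets crawlers continue through the page’s links even though the page itself stays out of the index.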

Companies that respect your personal data are usually concerned with security, which is why some of them restrict search crawlers from accessing certain pages. Chris Desadoy @EliteYouTubePro noted that companies also may not want their contact pages deep-crawled for another obvious reason: spam. But it depends on the company.

Indeed, aggregation pages are often not worth including in an index, because too many links and related pieces of content can confuse spiders and make your site’s profile look spammy to Google.

So why avoid being indexed by search engine crawlers? There are several reasons: duplicate content, confidential user data, technical issues, or temporary promotional pages.

Reasons why companies want to avoid being indexed

A “crawl budget” is the number of pages Google or other search engines will crawl during each visit. The bigger your crawl budget, the more crawler visits you get, and the better your chances of appearing near the top of search engine results pages. So how do you measure your crawl budget, and what should you do to increase it?

Measuring and optimizing crawl budget

Dan O Brian @DanBlueChief was the first to answer this question by sharing a useful concept packed into six simple words: Test and tweak, test and tweak. And Google Search Console, which was mentioned by Sam Barnes @Sam_Barnes90, will help you track crawls after making changes in order to test your results.

Always keep in mind the structure and hierarchy of your URLs – a few targeted pages are better than thousands of useless ones. Allow minor articles and other content to help your main pages grow.

Redirects are not advised, and neither are broken links that return a 404 error or duplicate content. Another thing to pay attention to is the loading speed of your pages. Keep your website fast and Google will like it.

Martin Kelly @MartinKSEO suggested using Site Search, one of the commands that force Google to return all results for a certain website’s URL in the search box.
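For reference (example.com is a placeholder), the site: operator is typed straight into Google’s search box and returns the pages Google has indexed for a domain or path, which you can compare against what you expect to be crawled:

```text
site:example.com
site:example.com/blog
```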

A rel="nofollow" attribute restricts spiders from following a link. While it may seem strange, it’s used by marketers who want to improve their crawl budget: the purpose of adding the attribute is to keep search weight on targeted pages. Next comes a checklist from Martin Kůra.
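In markup, the attribute sits on the individual link (the URL below is a placeholder):

```html
<!-- A link crawlers are asked not to follow -->
<a href="https://www.example.com/login" rel="nofollow">Log in</a>
```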

In order to improve their crawl budget, marketers should optimize their site’s structure and URLs, remove 404s, improve page speed and minimize redirects.

Crawl Issues Q6

Above we mentioned Screaming Frog and Google Search Console, but are there any other tools marketers can use to predict crawler behavior and fix possible issues?

Crawl optimization toolkit

Here’s what our participants added to the list.

Tools for crawl optimization

After a recent change in design or usability, you may need crawlers to revisit your site. What techniques should you use to attract them? The next question reveals the best practices for redesigning and tweaking your site.

How to encourage Googlebot to recrawl a website

Adding a new sitemap is a simple yet effective decision. You can also use an automated sitemap generator, but you should always check your automated sitemaps manually to avoid critical issues or mistakes.
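The generate-then-verify step can be sketched in a few lines of shell (the URLs are hypothetical placeholders): build a minimal sitemap from a URL list, then sanity-check it before submitting it in Search Console.

```shell
# List the URLs you want crawled (hypothetical addresses)
cat > urls.txt <<'EOF'
https://www.example.com/
https://www.example.com/blog/
https://www.example.com/contact/
EOF

# Wrap them in minimal sitemap XML
{
  echo '<?xml version="1.0" encoding="UTF-8"?>'
  echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
  while read -r url; do
    printf '  <url><loc>%s</loc></url>\n' "$url"
  done < urls.txt
  echo '</urlset>'
} > sitemap.xml

# Manual sanity check: every URL should appear exactly once
grep -c '<loc>' sitemap.xml
```

Real sites will want lastmod dates and validation against the sitemap schema, but even a check this simple catches truncated or empty sitemaps before they reach Google.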

Fetch as Google is a first-rate tool here, because it can submit new or redesigned pages to the index, and Googlebot will find them very quickly. Peter Nikolow suggests the following checklist.

According to WildShark SEO @wildsharkseo, marketers also shouldn’t forget to share new content on social media. Stephen Kenwright @stekenwright suggested Google+ as the best platform to use, because GoogleBot also visits it.

In conclusion, we’d like to sum up all the steps for getting a website recrawled as quickly as possible after implementing a new design or adding new content: submit a new sitemap, add links to any new content, promote your new content on social media, and finally, fetch and render using Google Search Console.

Encouraging Google to recrawl a website

That’s it for today! Thank you for your attention and brilliant answers.

Special thanks goes to Dawn Anderson @dawnieando for her expertise. See you at the next SEMrush Chat on Wednesday to discuss a new topic: “SEO and UX.”


Elena Terenteva, Product Marketing Manager at SEMrush. Elena has eight years of public relations and journalism experience, having worked as a broadcast journalist and PR/content manager for IT and finance companies.
Bookworm, poker player, good swimmer.