If you run a small or new website, one thing you will always struggle with is your crawl rate within Google. If you add your 10-page website to Google with a few targeted keywords in the title tags and some cleverly placed variations in the content, you're going to rank well, right? Well, that's not always the case.
You may get the initial crawl from the many spiders out there, followed by being indexed for your own brand term within the SERPs, which is great. But then what? You are leaving your online success in the hands of Google, Bing, Yahoo and the hundreds of other search engines out there. What happens when, a week later, you add a new product or service, or update a page after noticing a spelling mistake? Will this be picked up by the crawlers and reflected straight away in the cached version of the page? Unfortunately, again, this is not always the case.
Not all is doom and gloom though. Google gets a lot of criticism for how its crawlers work – but ultimately their aim is to go out into the web and bring back the best information possible. I see too much negativity towards search engines (mainly Google), but they have tried to help us and make things as simple as possible when it comes to improving your site's crawl rate.
So with that in mind, here are some best practices every website should be following – especially if you are new to the SERPs or run a small website that needs an extra boost in overall visibility.
1. Sitemaps
Sitemaps are pretty much self-explanatory. They help all the crawlers out there understand how many pages a website has. As standard, you will create an XML sitemap that lists your pages, usually with a priority value reflecting how important each one is and the frequency with which you wish the page to be crawled. There are thousands of XML sitemap generators out there that will do this for you – a quick Google search for "XML sitemap generator" will find one. Once the sitemap is generated, upload it and hey presto, you have a complete, easy-to-follow map for any crawler to find the pages on your site, and how often they should be crawled.
A few things to remember with sitemaps:
- If you add a new page, add it to the sitemap; if you delete one, make sure to remove it
- Make sure you add a link to your sitemap in your robots.txt file
- If you block a page on your website from the crawlers, take it out of the sitemap as well
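As a minimal sketch of what a sitemap generator produces (the domain and page list here are hypothetical placeholders), building the XML by hand is straightforward:

```python
# Minimal XML sitemap generator sketch.
# The domain and page list below are hypothetical placeholders.
from xml.sax.saxutils import escape

def build_sitemap(base_url, pages):
    """pages: list of (path, priority, changefreq) tuples."""
    entries = []
    for path, priority, changefreq in pages:
        entries.append(
            "  <url>\n"
            f"    <loc>{escape(base_url + path)}</loc>\n"
            f"    <changefreq>{changefreq}</changefreq>\n"
            f"    <priority>{priority}</priority>\n"
            "  </url>"
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        + "\n".join(entries)
        + "\n</urlset>"
    )

sitemap = build_sitemap("https://www.example.com", [
    ("/", "1.0", "daily"),         # homepage: highest priority
    ("/products", "0.8", "weekly"),
    ("/contact", "0.5", "monthly"),
])
print(sitemap)
```

Save the output as sitemap.xml in your site root, and point crawlers at it from robots.txt with a line like `Sitemap: https://www.example.com/sitemap.xml`.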
2. Free Services from Google and Bing
Google and Bing both offer a free webmaster service. Both are very simple to sign up to: add your site and verify it through a simple file upload. Within 24 hours they will have started to pull in crawl data and errors that will help you understand how your website is crawled. Google and Bing Webmaster Tools present this data differently and offer different levels of control. With Google you can find some good in-depth data on how many pages Google will crawl in a day and what the average is.
Use some simple investigation work when looking at these numbers. If you have a 20-page website and your high number is 10, average is 5 and low is 2, then potentially 10 pages of your site have not been crawled in the last 90 days. If you have blocked these pages then that's fine, but if not, start to look at the reasons why they aren't being crawled. This could be because they're not included in the sitemap, have accidentally been marked nofollow, have canonical tags pointing to other pages, and so on.
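One quick way to run that investigation is to diff the pages in your sitemap against the pages you know have been crawled (from crawl stats or server logs). A sketch, with hypothetical URL lists:

```python
# Hypothetical example: find pages listed in the sitemap that
# have not shown up in recent crawl stats or server logs.
sitemap_pages = {"/", "/products", "/services", "/about", "/contact"}
crawled_pages = {"/", "/products", "/about"}

# Set difference: listed but never crawled.
uncrawled = sorted(sitemap_pages - crawled_pages)
print(uncrawled)  # pages to investigate: blocked? nofollowed? canonicalised away?
```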
Google Webmaster Tools will also show you your index status within Google. This is a really useful feature that you can compare against the crawl rate of your website to see how much is actually indexed within Google.
On top of this it will show you the total number of pages indexed, what's blocked by robots.txt and whether any pages have been removed. For smaller sites this is really useful, as it shows you any pages that may have been removed by Google and highlights where work is needed on the content or the page itself to get it ranking again within Google.
The final nugget that Google Webmaster Tools offers is the Fetch as Google feature. You can fetch individual pages as Google, or fetch the homepage along with all pages linked from it. This is a really useful tool for getting deeper pages crawled that may have been missed, or for prompting Google to recrawl pages you have changed but that it hasn't revisited yet.
Bing Webmaster Tools offers you a little more control, in that you can select when Bing crawls your website. They ask you to work this out from your peak traffic times. Bing doesn't offer a full-blown analytics service like Google, so I suggest you use Google's analytics platform to work out when these are. Then you can set the crawler to whatever time fits.
The data it shows includes pages blocked in robots.txt and other errors, which can help you identify and correct issues on the website.
Both tools allow you to submit your XML sitemap to make it easier for their crawlers to find it, and then crawl your website and the desired pages.
3. Content, Content, Content
At some point you will have heard that content is king, and to a degree that is correct when it comes to Google and others. So when it comes to content, make sure you tick all the right boxes. You've optimized all the pages you want ranking; next, take a close look at each page you are going to put out on the Web.
Do a few standard checks; take a look at the content on the page:
- Does it match, or better, what's already on the first page of Google?
- Are you going to keep your users on the page and make sure they find what they are looking for?
- Will they convert?
- Does it need a video?
The idea is that Google wants to focus on the user, so the content on your page needs to give the user the best experience possible. Nail that process and you're going to win not only good rankings, but also repeat visits from Google and other search engines coming back to your site for more.
Do You Have a Blog?
Another thing you need to consider is adding a blog. As much as it can be a strain on time and resources, a blog is one of the best ways to give these search engines what they want: fresh, unique, relevant and up-to-date content or news. Google's Matt Cutts, who is famous for being the middle man between us SEOs and the search engine giant (and now famous for being given probably one of the best holidays/leaves of absence you could ask for), even said during a webmaster video:
"Now typically if the site does have more pages, it might have more links pointing to it, which means it has higher PageRank... If that is the case we might be willing to crawl a little bit deeper into the website, and if it has higher PageRank, then we might think it's a little bit of a better match for users' queries.

So those are some of the factors involved. Just having a big website with a lot of pages by itself does not automatically confer a boost, but if you have a lot of links or a lot of PageRank that is leading to deeper crawling within your site, then that might be the sort of indicator that perhaps those sites would rank just a little bit higher.

Again, just having the number of pages doesn't give you a boost. It might give you a few more opportunities, but normally the only reason you get that opportunity is because we see more links to your website, so we are willing to crawl a little bit deeper and find more pages to index."
He is not saying it directly, but to a degree larger websites will have better overall visibility, crawl rates and links going to the site, as they can target more keywords. This is where a blog comes in. For smaller or newer websites it's simple: find a way of adding a news or blog section to your website and keep it updated with good content, and you're going to find the crawl rate of your site increases. One very big thing to avoid, though, is duplicated content. Over time you will see your crawl rate decrease, as Google doesn't like duplicated content, and it can result in a certain zoo animal coming down hard on your site. If you have to duplicate content, use canonical tags (or a noindex meta tag) on the duplicated pages.
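One rough way to spot exact duplicates before the crawlers do is to hash the normalised text of each page and group matching hashes. A sketch, with made-up page contents:

```python
# Hypothetical duplicate-content check: group pages whose
# normalised text hashes to the same value.
import hashlib
from collections import defaultdict

pages = {
    "/red-widgets": "Buy our red widgets. Free shipping on all orders.",
    "/blue-widgets": "Buy our blue widgets. Free shipping on all orders.",
    "/red-widgets-sale": "Buy our red widgets. Free shipping on all orders.",
}

groups = defaultdict(list)
for url, text in pages.items():
    # Normalise whitespace and case so trivially different copies still match.
    digest = hashlib.sha256(" ".join(text.lower().split()).encode()).hexdigest()
    groups[digest].append(url)

duplicates = [urls for urls in groups.values() if len(urls) > 1]
print(duplicates)  # each group should share one canonical URL
```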
4. Technical Checks
By utilizing all the techniques above, you are going to find that the crawlers out there will find your website one way or another. When they do, make sure they find a good clean site that is easy to crawl, with all content there for them to see. The main things I would be looking out for technically are to make sure that:
- Content isn't hidden: Google has famously said don't hide content, as it can harm your site. Plus, if you have a good page that users like, you want Google and others to be able to see that
- You use structured data/markup: anything that helps crawlers understand the data on the page more easily, and lets it be displayed better to the user, will help
- You are interlinking correctly: for small websites, link to all your important pages from your homepage
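On the structured data point, one common approach is a JSON-LD block in the page head. A minimal sketch (the organisation details below are made up):

```python
# Build a minimal JSON-LD structured-data snippet (schema.org Organization).
# All details below are hypothetical placeholders.
import json

organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Widgets Ltd",
    "url": "https://www.example.com",
    "logo": "https://www.example.com/logo.png",
}

snippet = (
    '<script type="application/ld+json">\n'
    + json.dumps(organization, indent=2)
    + "\n</script>"
)
print(snippet)  # paste into the page <head>
```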
The main goal is to make sure your website is easily crawled by following all the current best practices. The above are a few ways to ensure the crawlers understand your site, find any deeper pages you may have, and index all the pages you want to be found.
5. Website Speed
This could fall under technical checks, but it's an important factor that you shouldn't be ignoring. Smaller websites have no excuse when it comes to website speed. You need to check both individual pages and the server itself. For pages, it is best practice to keep your page size under 150K, as larger pages are not fully cached by search engines (let's put it like this: the longer a search crawler waits, the less it likes visiting your page).
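A quick sketch of checking a page's weight against that 150K rule of thumb (the HTML string here is a stand-in for a real fetched page body):

```python
# Check a page's byte size against a ~150KB budget.
# The HTML string below stands in for a real fetched page body.
PAGE_SIZE_BUDGET = 150 * 1024  # bytes

html = (
    "<html><head><title>Contact us</title></head><body>"
    + "x" * 500
    + "</body></html>"
)

size = len(html.encode("utf-8"))
over_budget = size > PAGE_SIZE_BUDGET
print(size, over_budget)
```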
Use Google's PageSpeed tool to assess pages. It will give you the high-priority tasks that need doing, and you can work on the rest from there. Don't just run the homepage through the tool; make sure all pages are put through – even the contact page or pages with contact forms. A slow-loading contact form could be the difference between a user submitting a request for a quote, service or help, and leaving without getting in touch.
When it comes to servers, use tools like Pingdom to check server load time and uptime. If you see a lot of drop-offs or downtime, it may be time to pay that bit extra for a better-performing server.
Another thing to note is that crawlers work on a budget: if a crawler spends too much time on your larger images or PDFs, there will be no time left to visit your other pages.
So there you have it. I know there is a lot more that will help your pages get crawled, such as offsite content or link building, but a big issue I see with a lot of websites is that these standard checks are not put in place to start with, so that Google, Bing and everyone else can easily access the site and place it in their listings correctly.
I can confidently say that if you follow these checks, you will find your crawl rates increase and your SEO campaign becomes a lot easier.