The latest update to our Site Audit tool introduces two much-requested features that let users fine-tune crawler settings: advanced URL parameter exclusion and crawl delay.
Advanced URL Parameters Exclusion
You can now specify which URL parameters the Site Audit crawler should ignore. For convenience, the new parameter removal feature works much like the URL parameter settings in Google Search Console. For instance, if you set up Site Audit to ignore all UTM codes, the crawler will treat http://example.com/webpage.html and http://example.com/webpage.html?utm_medium=blog as the same URL.
You can create or edit your list of excluded URL parameters by clicking on the gear icon in the Site Audit interface and then selecting the 'Removed parameters' option in the dropdown menu, as shown in the picture below.
Then, you can specify URL parameters you want removed from URLs before they are crawled by typing in one parameter per line. When you're done, click the 'Save' button and re-run your Site Audit to apply all changes.
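To illustrate the idea, here is a minimal Python sketch of parameter removal. It is not Site Audit's actual implementation: the `REMOVED_PARAMETERS` list and the `strip_parameters` helper are hypothetical names, and exact-name matching of parameters is an assumption made for this example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical exclusion list, written one parameter per line,
# mirroring how the list is typed into the settings box.
REMOVED_PARAMETERS = """\
utm_source
utm_medium
utm_campaign
"""


def strip_parameters(url, removed_text=REMOVED_PARAMETERS):
    """Return the URL with every listed query parameter removed."""
    # Parse the one-per-line list, skipping blank lines.
    removed = {line.strip() for line in removed_text.splitlines() if line.strip()}
    parts = urlsplit(url)
    # Keep only the query parameters whose names are not on the list
    # (assumption: names are matched exactly).
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in removed
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

With this sketch, `strip_parameters("http://example.com/webpage.html?utm_source=newsletter")` yields `http://example.com/webpage.html`, so both forms of the address would be counted as one page, while parameters not on the list (such as a hypothetical `id`) are left in place.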
Crawl Delay
By default, the Site Audit crawler ignores the Crawl-delay directive specified in a robots.txt file to speed up crawling. In rare cases, however, this crawling intensity can overload a website. To prevent that, you can now enable the 'Respect robots.txt crawl delay' rule in the Site Audit settings.
You can do so by clicking on the gear icon in the Site Audit interface and selecting the 'Respect robots.txt crawl delay' option in the dropdown menu.
This will open the Site Audit settings. From there, tick the 'Respect robots.txt crawl delay' option and click the 'Save' button to apply the new settings.
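For readers curious how a crawler might honor this setting, the following Python sketch reads a Crawl-delay value with the standard library's `urllib.robotparser`. It is only an illustration under stated assumptions: the sample robots.txt content, the `ExampleBot` user agent, and the `crawl_delay_seconds` helper are hypothetical and do not describe how the Site Audit crawler is built.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content used only for this example.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2
Disallow: /private/
"""


def crawl_delay_seconds(robots_txt, user_agent, respect_delay):
    """Return how long to wait between requests, in seconds."""
    # With the option switched off, the delay is ignored entirely.
    if not respect_delay:
        return 0
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # crawl_delay() returns None when no delay applies to this user agent.
    delay = parser.crawl_delay(user_agent)
    return delay if delay is not None else 0
```

A crawler built along these lines would pause for the returned number of seconds (for example with `time.sleep`) between requests when the option is ticked, and send requests without any pause otherwise.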
We hope these new features improve your Site Audit experience! Please feel free to send your comments, suggestions, and questions to [email protected]. Thank you for helping us make SEMrush better!