When it comes to blocking certain pages within your Site Audit crawl, you can choose to avoid crawling specific URLs or subfolders. If you are setting up your project for the first time, you will find these options in the third and fourth tabs of the setup wizard:
This option allows you to block subfolders. Include everything in the URL after the TLD. For example, to block the subfolder http://www.example.com/shoes/, enter “/shoes/” in the disallow box on the right.
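As a rough illustration of what "everything after the TLD" means, the sketch below extracts that part of a URL and checks other URLs against it. The helper names `mask_from_url` and `is_blocked` are hypothetical, and simple prefix matching is an assumption; the tool's actual matching rules may differ.

```python
from urllib.parse import urlsplit

def mask_from_url(url):
    """Return the part of the URL after the domain, e.g. '/shoes/'."""
    return urlsplit(url).path

def is_blocked(url, disallow_masks):
    """Assumed rule: a URL is blocked if its path starts with any disallow mask."""
    path = urlsplit(url).path
    return any(path.startswith(mask) for mask in disallow_masks)

mask = mask_from_url("http://www.example.com/shoes/")
print(mask)  # /shoes/
print(is_blocked("http://www.example.com/shoes/boots", [mask]))  # True
print(is_blocked("http://www.example.com/hats/", [mask]))        # False
```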
This option allows you to remove certain parameters from your URL strings. For example, if you add “page” to this box, the crawler ignores the “page” parameter in any URL that includes it, such as URLs with page=1, page=2, etc. This avoids crawling the same page twice in the crawling process (for example, “/shoes” and “/shoes/page=1” are treated as one URL).
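The effect of removing a parameter can be sketched as follows. This is only an illustration, not the crawler's actual logic: it assumes the parameter appears in a standard query string (for example “?page=1”), and the helper name `strip_params` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def strip_params(url, removed):
    """Drop the named query parameters so URL variants collapse into one."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in removed]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))

urls = [
    "http://www.example.com/shoes",
    "http://www.example.com/shoes?page=1",
    "http://www.example.com/shoes?page=2",
]
# With "page" removed, all three variants reduce to a single URL.
unique = sorted({strip_params(u, {"page"}) for u in urls})
print(unique)  # ['http://www.example.com/shoes']
```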
If you already have a project set up and would like to change your settings, you can do so using the Settings gear:
Follow the same directions listed above by selecting the “Masks” and “Removed Parameters” options.
To track a subfolder, specify it under the “Masks” section of the setup wizard. As shown in the screenshot below, enter the names of the subfolders in the left box as they appear in the URL.
For example, to track only the shop, men’s section, and shoes section of your ecommerce website, you would enter the following:
Similarly, you can avoid crawling specific subfolders on your site by entering the corresponding subfolder names in the right-hand box.
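One way to picture how the two boxes interact is the sketch below. It rests on assumptions that the article does not confirm: masks are matched as path prefixes, an empty allow list means "crawl everything", and a disallow mask takes precedence over an allow mask. The function `should_crawl` and the sample masks are hypothetical.

```python
from urllib.parse import urlsplit

def should_crawl(url, allow_masks, disallow_masks):
    """Decide whether a URL falls inside the crawl scope (assumed rules)."""
    path = urlsplit(url).path
    # Assumption: a disallow match always wins.
    if any(path.startswith(mask) for mask in disallow_masks):
        return False
    # Assumption: with no allow masks, every remaining URL is in scope.
    if not allow_masks:
        return True
    return any(path.startswith(mask) for mask in allow_masks)

allow = ["/shoes/"]          # left box: subfolders to track (hypothetical)
disallow = ["/shoes/sale/"]  # right box: subfolders to skip (hypothetical)

print(should_crawl("http://www.example.com/shoes/boots", allow, disallow))   # True
print(should_crawl("http://www.example.com/shoes/sale/1", allow, disallow))  # False
print(should_crawl("http://www.example.com/hats/", allow, disallow))         # False
```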