How to Untangle the Website Architecture of a Site With 500,000+ Pages

Nick Brown

A business’s website is, in many ways, like a digital shop or office. It is where customers or clients go to interact with the business. It is where they make first contact and where they ultimately buy the company’s products or services. It is also what they often use to form their view of the business’s brand. Just as an office or shop should be attractive and welcoming to visitors, so should a commercial website.

A website needs to be easy to navigate, allowing visitors to find what they want quickly. A straightforward structure matters even more for search engines, because of the negative impact poor site architecture can have on SEO. If Googlebot and other search engine crawlers can’t understand or navigate a site, it can cause big issues.

Unfortunately, it is all too easy for site architecture to become confused and muddled — even if a site owner or webmaster started with the best of intentions. The day-to-day operations of a business often get in the way of website structure best practice.

Our Experience With a Client Website

An ad hoc content strategy can see pages and elements added to a site with no thought for the overall architecture. Staff turnover can lead to individuals with no understanding of a site’s overall structure suddenly being in charge of maintaining it. The end result is often an ungainly and inefficient site architecture.

That is exactly the situation we faced with one of our clients. Over time, their site had grown to 500,000+ pages. The sheer number of pages, and the way they had been added to the site, had created some serious SEO issues. We were faced with the challenge of untangling the site architecture of a truly gargantuan website.

Even the biggest and most confusing site can be brought into line. What you need is a solid strategy and the will to follow through. We are going to share our experience of working with a client whose name has been anonymized. By doing so, we should hopefully give you some pointers on how to meet your own structural SEO challenges.

Analysis

As an external expert coming in to advise on or solve technical SEO issues with a site, the first step is understanding the scope of the project. Only once you appreciate the full picture can you begin to develop a strategy to move forward. Even if you are working on improving your own site, analyzing the site as a whole is still a good place to start.

With smaller sites, it is possible to review site architecture and spot issues manually. As a site gets larger, this becomes increasingly difficult. Once you get to a site with over 500,000 pages, it is simply not viable. You could devote every hour of every day for weeks to the process and still not scratch the surface. That is where technical SEO tools come in.

There is a wide variety of technical SEO and Google Analytics tools around today. Some are free to use, others have to be paid for, and a few have both free and paid versions. These tools can help you do everything from checking page speed to testing your structured data for implementation errors.

Website Crawl Using SEMrush

The SEMrush Bot is an efficient way to perform the kind of deep crawl required to get to grips with a huge technical SEO project. That deep crawl will help identify both basic and more advanced technical SEO issues plaguing any site.

Some of the more basic issues it can diagnose are as follows:

  • Errors in URLs

  • Missing page titles

  • Missing metadata

  • Errored response codes

  • Errors in canonicals

On top of that, a deep crawl with SEMrush can help you with some more advanced tasks:

  • Identifying issues with pagination

  • Evaluating internal linking

  • Visualizing and diagnosing other problems with site architecture
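To make the basic checks in the first list concrete, here is a minimal sketch (in Python, not SEMrush itself) of how a crawler might flag a missing title, missing meta description, errored response codes, and a missing canonical on a single page. The URLs are placeholders, and it assumes the `requests` and `beautifulsoup4` packages are installed.

```python
import requests
from bs4 import BeautifulSoup

def basic_page_checks(url):
    """Run a handful of the basic checks a site crawl performs on one URL."""
    issues = []
    response = requests.get(url, timeout=10)

    # Errored response codes (4xx / 5xx)
    if response.status_code >= 400:
        return [f"HTTP {response.status_code}"]

    soup = BeautifulSoup(response.text, "html.parser")

    # Missing page title
    if soup.title is None or not soup.title.get_text(strip=True):
        issues.append("missing <title>")

    # Missing meta description
    if soup.find("meta", attrs={"name": "description"}) is None:
        issues.append("missing meta description")

    # Missing canonical tag
    has_canonical = any("canonical" in (link.get("rel") or [])
                        for link in soup.find_all("link"))
    if not has_canonical:
        issues.append("missing canonical")

    return issues

# Placeholder URLs -- in practice these would come from a crawl queue or sitemap.
for page in ["https://www.example.com/", "https://www.example.com/category/widgets"]:
    print(page, basic_page_checks(page) or "OK")
```

A dedicated crawler does far more than this, of course, but the sketch shows the kind of page-level signals a deep crawl aggregates across hundreds of thousands of URLs.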

When we performed our deep crawl of the site, it highlighted several serious issues. Identifying these issues helped us develop our overall strategy for untangling the site’s architecture. Below is an overview of some of the issues we discovered.

Issue One: Distance From Homepage

One of the most apparent issues highlighted by the deep crawl was just how far some pages were from the homepage. Some content on the site was found to be as many as 15 clicks away. As a rule, you should try to ensure that content is never more than three clicks from your homepage.

The reason for that ‘three click rule’ is twofold: it makes sense both for your website visitors and for SEO purposes. Visitors are unlikely to be willing to click through 15 pages to find the content they need. If they can’t quickly find what they are looking for, they will bounce from your site and search elsewhere.

Limiting click distance also makes sense from an SEO standpoint. The distance a page is from a site’s homepage is taken into account by search engine algorithms. The further away it is, the less important a page is seen to be. What’s more, pages far from the home page are unlikely to get many benefits from the homepage’s higher link authority.
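To illustrate how click distance can be measured at scale, below is a minimal sketch of a breadth-first search over a site’s internal links, starting from the homepage. It is not the tool we used; `get_internal_links` is a hypothetical helper that returns the internal URLs found on a page.

```python
from collections import deque

def click_depths(homepage, get_internal_links, max_depth=15):
    """Breadth-first search that records how many clicks each URL is from the homepage."""
    depths = {homepage: 0}
    queue = deque([homepage])
    while queue:
        url = queue.popleft()
        if depths[url] >= max_depth:
            continue
        for link in get_internal_links(url):
            if link not in depths:            # record only the first (shortest) path found
                depths[link] = depths[url] + 1
                queue.append(link)
    return depths

# Pages deeper than three clicks break the 'three click rule':
# too_deep = [url for url, depth in click_depths(home, get_internal_links).items() if depth > 3]
```

Because the search always records the shortest path it finds, the resulting depth for each URL is the minimum number of clicks needed to reach it from the homepage.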

Issue Two: Only 10% of Pages Indexed

From Google Search Console, we found that only 10% of the pages were being indexed. As part of our technical SEO analysis, we also reviewed the log files of the site; that revealed many other issues.

A page which is unindexed is basically ‘unread’ by Google and other search engines. There is absolutely nothing you can do for a page from an SEO standpoint if the search engines don’t find and ‘read’ it. It will not rank for any search query and is useless for SEO.

Our deep crawl revealed that a whopping 90% of the site’s pages fell into that category. On a site of more than 500,000 pages, that equated to around 450,000 pages with little to no SEO value. As we said, it was a major issue.

Issue Three: Unclear URL Structure

Through the site audit, it also became clear that content lacked a clear URL structure. Pages that should have been on the same level of the site did not have URLs to reflect that. Confusing signals were being sent to Google about how the content was categorized; this is an issue you need to consider as you build out your blog or website.

The best way to explain this issue is with an example. Say you have a site which includes a variety of products. Your URLs should flow logically from domain to category to sub-category to product. An example would be ‘website/category/sub-category/product’. All product pages should then share that consistent URL structure.

Problems arise if some of the products have a different URL structure. For instance, if a product has a URL like ‘website/product’, that product sits on a different level of the site to the rest. That creates confusion for search engines and users alike.
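As a rough illustration of how such inconsistencies can be caught automatically, the sketch below flags product URLs that do not follow the expected category/sub-category/product pattern. The pattern and the example paths are purely illustrative.

```python
import re

# Expected pattern: /category/sub-category/product (purely illustrative)
EXPECTED_PRODUCT_PATH = re.compile(r"^/[^/]+/[^/]+/[^/]+/?$")

def inconsistent_product_urls(paths):
    """Return product paths that do not sit at the category/sub-category/product level."""
    return [path for path in paths if not EXPECTED_PRODUCT_PATH.match(path)]

print(inconsistent_product_urls([
    "/garden/tools/steel-spade",  # consistent: category/sub-category/product
    "/steel-spade",               # inconsistent: sits directly under the domain
]))
```

Running a check like this over a full URL export makes it easy to see which pages have drifted out of the intended hierarchy.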

Issue Four: Too Many Links On the Homepage & Menu

In addition to the issues with URL structure, another factor that needed to be addressed was the number of links found on a page; this was, in part, a sitewide issue. The main menu, for example, had 400+ links, while the footer menu contained 42. That is far more links than a person is ever likely to use. It was clear that a large proportion of these links were not being used enough to justify their place in either the menu or the footer.

Through the site crawl, we also identified several pages, including the homepage, that had 100+ links. The more links a page has, including the menu, the less internal PageRank each of those links passes. It is also indicative of a confusing site structure.

Overall, it was clear that there were serious issues with the site’s internal linking strategy. This affected how Google indexed the site and resulted in a bad user experience for visitors.
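A simple way to surface this kind of problem is to count the links on each page of a crawl. The sketch below is an illustrative check rather than our exact process; the URL and the 150-link threshold are assumptions for the example.

```python
import requests
from bs4 import BeautifulSoup

def count_links(url, threshold=150):
    """Count the <a href> links on a page and warn when a template carries too many."""
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    links = [a["href"] for a in soup.find_all("a", href=True)]
    if len(links) > threshold:
        print(f"{url}: {len(links)} links (over the illustrative {threshold}-link threshold)")
    return links

count_links("https://www.example.com/")  # placeholder URL
```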

Additional Issues

Through the site audit, a number of other issues were identified. Some of these issues would not impact the website architecture, but if addressed, would improve search rankings. Below is an overview of some of the key factors that we needed to act on:

  • Robots.txt File: There were thousands of tags used on the site, the majority of which had only a few pieces of content associated with them. In addition, there was an opportunity to optimize the crawl budget by using the robots.txt file to deny bots access to low-value sections of the site; this also had the potential to improve page speed.

  • Improve Metadata: The metadata for ranking pages could be reviewed and improved to increase organic click-through rates.

  • Page Load Time: There was an opportunity to improve the load time of the pages. Again, this is a ranking factor for Google.

  • Relic Domain Pages: As a result of the server log analysis, we identified a number of relic domain pages that were still getting a lot of Googlebot activity; this is far from ideal. The activity may have meant that those expired pages were still appearing in search results. (A simple log scan of the kind involved is sketched below.)
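To give an idea of what that server log analysis involves, here is a hedged sketch of tallying Googlebot hits per URL from an access log. It assumes the standard combined log format and an illustrative file name; a production check would also verify Googlebot by reverse DNS rather than trusting the user-agent string alone.

```python
import re
from collections import Counter

# Matches the request path and user agent in a combined-format access log line.
LOG_LINE = re.compile(r'"(?:GET|POST|HEAD) (?P<path>\S+) HTTP/[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"')

def googlebot_hits(log_path="access.log"):
    hits = Counter()
    with open(log_path) as log:
        for line in log:
            match = LOG_LINE.search(line)
            if match and "Googlebot" in match.group("agent"):
                hits[match.group("path")] += 1
    return hits

# Top URLs Googlebot keeps requesting -- relic pages often show up here.
# for path, count in googlebot_hits().most_common(10):
#     print(count, path)
```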

Solution

Our analysis and research defined the scope of the project ahead of us. It spelled out the issues that needed solving. That allowed us to develop a strategy to work step-by-step through those issues. Below is an overview of five of the key areas that we focused on to improve the website structure. The list excludes other tasks that we completed, like updating meta descriptions and content on core pages, which form a core part of any website audit.

Step One: Redirects & Other Tweaks

The first task to get our teeth into was redirecting the relic pages, starting with those we identified as having the most Googlebot activity. We made sure that pages were redirected to the most relevant content. Sometimes that meant a page that had replaced the original; on other occasions, it meant the homepage.

This strategy gave us a quick win to get started. It ensured that any traffic resulting from the expired domains wasn’t met with a broken page. Another simple tweak we made at the outset was to the layout of the PR page; this removed Google’s confusion as to whether it was actually the site’s archive page. That led to an immediate improvement in the indexing of content.
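For the redirect work described above, it helps to manage the mappings as data rather than hand-writing rules. The sketch below is illustrative rather than the exact process we used: it turns an old-to-new URL mapping into Apache `Redirect 301` rules, and both the mapping and the output file name are made up.

```python
# Illustrative old-to-new mapping; in practice this could be loaded from a CSV
# exported from the crawl or the server log analysis.
redirects = {
    "/old-press-release": "/news/press-release",  # replaced by a newer page
    "/expired-offer": "/",                        # no close match left: send to the homepage
}

with open("redirects.htaccess", "w") as out:
    for old_path, new_path in redirects.items():
        out.write(f"Redirect 301 {old_path} {new_path}\n")
```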

In addition to this, the site had thousands of tags. We didn't want tags with limited content to appear in the search results as it provided a poor user experience for people accessing the site. For this reason, we deindexed those pages. 

Step Two: Reduce the Number of Links in the Menu

Alongside the task of redirecting expired pages, we also worked on improving the menu structure. As we mentioned previously, the website had 400+ links in the menu; this was far more than site visitors needed. Moreover, it was roughly four times the recommended number of links on a page.

While we understood that the number of links in the menu was an issue that needed to be dealt with, we still had to choose which links to remove. Our solution was to first analyze what links people were clicking on, so we understood which ones were most useful for readers. We used a combination of Google Analytics and heatmap software to generate the data we then analyzed.

Once we had identified the most useful links, we then worked out which category and sub-category pages needed to be on the menu. Our approach was to create a site structure where all content was within three clicks of the homepage.

Step Three: Clustering & Restructuring

After our initial quick wins, it was time to turn to the larger job of reshaping the site architecture. We needed to solve the issue of click distance and give the site a more logical structure. That way, it could be more easily navigated by both crawlers and actual users.

One step we took was to use the idea of clustering content. That meant bringing related pages together and linking out to them from a ‘pillar topic’ page. That page would sit within a couple of clicks of the site’s homepage, which meant that a huge array of related pages could be brought within three clicks. Content clusters like this make it much easier for visitors and crawlers to navigate a site’s content.

Clusters were how we started to organize the vast number of pages. When it came to the overall structure, we focused on the idea of creating a content pyramid. Such a pyramid is the accepted gold standard for site structure. The site’s homepage sits atop the pyramid. Category pages are beneath and sub-categories a further level down. Individual pages then make up the wide base of the pyramid.  
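One way to keep a cluster honest is to check its internal links programmatically. The sketch below assumes a hypothetical `pillars` mapping (pillar URL to its cluster pages) and a hypothetical `links_on` helper that returns the internal links found on a page, and reports any missing pillar-to-cluster or cluster-to-pillar links.

```python
def cluster_link_gaps(pillars, links_on):
    """Report pillar/cluster pages that are missing links in either direction."""
    gaps = []
    for pillar, cluster_pages in pillars.items():
        for page in cluster_pages:
            if page not in links_on(pillar):
                gaps.append(f"{pillar} does not link to {page}")
            if pillar not in links_on(page):
                gaps.append(f"{page} does not link back to {pillar}")
    return gaps

# Example mapping (hypothetical): one pillar page and two pages in its cluster.
# pillars = {"/garden-tools/": ["/garden-tools/spades", "/garden-tools/rakes"]}
```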

Once the content clusters and pyramid took shape, the issue of links was also easily solved. Having a defined structure made it much more straightforward to interlink pages. No longer were there any pages with hundreds of excessive links.

Step Four: Improving URL Structure

Restructuring the site made it a cinch to improve the URL structure. The content pyramid meant that the site's URLs could achieve the logical flow we mentioned earlier. No longer were pages sitting at different site levels on an ad hoc basis.

The URLs of the site’s myriad of pages were brought in line with one defined structure; that removed the mixed messages being sent to Google. The search engine could now far more easily understand – and therefore index – the site’s content.

Step Five: Adding Dimensions to Images

The file size of an image is a significant issue when it comes to page speed for a lot of sites. Images are often improperly formatted, if they are formatted at all. By adding dimensions to images, you can significantly improve page load time by ensuring that a correctly sized and formatted image is displayed the first time.
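As a rough example of how this can be automated, the sketch below uses the Pillow library to resize an oversized source image down to an assumed display width and returns the dimensions to declare on the `<img>` tag. The 800px width, file paths, and JPEG quality setting are illustrative assumptions, not the values used on the client’s site.

```python
from PIL import Image

def prepare_image(source_path, output_path, max_width=800):
    """Resize an oversized image to the display width and return its final dimensions."""
    with Image.open(source_path) as img:
        if img.width > max_width:
            ratio = max_width / img.width
            img = img.resize((max_width, int(img.height * ratio)))
        img.save(output_path, optimize=True, quality=85)  # quality applies to JPEG output
        return img.width, img.height

# width, height = prepare_image("raw/category-hero.jpg", "static/category-hero.jpg")
# print(f'<img src="/static/category-hero.jpg" width="{width}" height="{height}" alt="">')
```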

By adding dimensions to the images, we reduced category pages from often in excess of 25 MB down to a more acceptable 2-4 MB per page. This significantly improved page speed and user experience while reducing the strain on the server.

Results

The impact of our work on the site was clear to see. Improvements in the site's structure had a profound and rapid effect on indexing. As a result, it had a similarly pleasing effect on site traffic. Within the first three months, the percentage of pages indexed by Google had risen from 10% to 93%. Moreover, the percentage of URLs submitted that were approved also improved.

Unsurprisingly, the fact that 350,000 pages were suddenly indexed led to an increase in site traffic. The volume of visitors rose 27% in the first three months, and 120% after nine months.

Conclusion

The exact steps we took to improve the site’s architecture may not work for your site. What will work, however, is the general strategy we applied. It is a strategy that can deliver impressive results, even for the largest and most confusing of sites.

To begin, you must be thorough in researching the project at hand. Comprehensive research and analysis of your site are what should define the scope of your work. You will then have a roadmap for your improvements. If you follow through with the work according to that roadmap, you are sure to see results.

Nick Brown

Nick Brown is the co-founder of Accelerate Agency, an SEO agency based in Bristol. He has over 12 years’ experience in digital marketing and works with large companies, advising them on SEO, CRO, and content marketing.

Comments

QUANG DAN
Hello Nick Brown, I want to change all the links on my website. Example: Mywebsite(.)com => Mywebsite/category/name-post(.)com. How can I do this very fast? Please help!

Nick Brown
Hi Quang, use your htaccess file to add the redirects.

Alison Ver Halen
Thanks for the great info! I couldn't agree more that analyzing every aspect of your content strategy is necessary for success, and I think the tools you offer are awesome!

Nick Brown
Thanks Alison, much appreciated.

Jeffrey Peltier
Distance from the homepage is highly inaccurate. That is a very old (10+ years) SEO concept from when the homepage was considered the main point of entry to a website. With modern SEO, visitors land on exactly the page that relates to their search. With strong internal webpage content linking (not menus or sidebars), visitors and Google spiders are guided on a highly relevant browse of the site.

Nick Brown
Hi Jeffrey, the homepage was not the main source of activity for this website, only a minor one. The issue was that content was hard for users to find, as it was too far from any page, not just the homepage; that is a bad user experience, which affects SEO. After we fixed this issue there was a large increase in organic non-branded traffic to the site. So the proof is in the results.

Jeffrey Peltier
If you are doing SEO the correct way, the user will land on the page and not need to navigate. Page distance is NOT part of Google's ranking algorithm. The proof is in the testing for 10+ years and in ensuring the testing is not affected by the Street Light methodology.

Robert Emanuel
Good article. Could you post some links to the "how to" section of how you ran some of these reports? That would be great, thanks!

Nick Brown
Hi Robert, most are a combination of GSC and GA: GSC for sitemaps and indexation, GA for page speed.

Paul Lovell
Great post, Nick. I like the fact you went into the log analysis; a lot of SEOs avoid the subject.

Nick Brown
Thanks Paul, it's a big part of our job.

faizan
Thanks for the blog. It is really helpful. Would love to read more.

Nick Brown
Thanks - will endeavour to write more here.

I'm Abdullaeff
Great work.

Nick Brown
Thanks Alesker.

Ricci M.
Thanks for sharing your experience of using the SEMrush tools for technical SEO, Nick. Restructuring can have such an impact on SEO, and often people just don't see it. I really liked the way you explained it all.

Nick Brown
Thanks Ricci.

Raja @indreshraja
Hi Nick, great article. Too many links on the home page is really bad in the sense of interlinking. Can you give an ideal number for home page internal links? This would help me.

Nick Brown
Hi, we would recommend between 70 and 150, depending on the size of the site.

Meghan DuCille
Hi Nick, brilliant article, thank you very much! I am wondering how you handled step five. Which tool/method did you use to add dimensions to images on such a large website?

Nick Brown
Hi Meghan, we worked with the developers and found a solution: we Googled it, found a few ideas, then implemented them on the staging site and it worked.

Nick Brown
When working with a client, we make sure to get the buy-in of the key stakeholders; this way implementation is a lot easier.

Tristan Bailey
How do you get on with possibly multiple teams or managers to get these things improved? Or was the site large rather than the company?

Nick Brown
Hi Tristan, it was a company with about 60 employees, and a very large publishing/ecommerce site.

Sebastian Hovv
Great insights!

Nick Brown
Thanks Sebastian! I put a lot of effort into this one.
