In SEO, backlinks (i.e. links to a website from other websites) get a lot of attention. Clients and SEOs alike are well aware of their importance, and some people will still do very dodgy things to try to get them (my advice on this – do not do those things).
The links within a website itself, however, are often overlooked. While they may not have the power, or the bragging rights, of a followed link from a major global media source, they serve an important function as far as search engines and their crawlers are concerned. As more pages are added and the structure of the website evolves, it is easy for errors to creep in. It's therefore important to keep checking your internal link structure to make sure it's doing its job: helping people, and crawlers, get around your site and find the things you want them to find.
With this in mind, it is well worth auditing your internal linking structure about once a year, or whenever you've made major changes to your site architecture or added a lot of new pages.
I recommend using a site crawling tool like Screaming Frog to start off. These tools crawl your site in much the same way as Googlebot and other search engine crawlers do: by following the internal link structure of your site. Doing this will quickly highlight any problems, such as broken URLs, redirect chains or orphan pages. I'll explain each of these issues in more detail below, but they can all be caused by errors in internal links to a page, or by a lack of them.
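The core logic of a crawl is simple to sketch. Here's a toy illustration in Python – the site structure is an invented in-memory map, standing in for the live HTML a real tool like Screaming Frog would fetch and parse – showing how following links naturally surfaces both broken URLs and orphan pages:

```python
from collections import deque

# Toy model of a site: each URL maps to the list of URLs it links to.
# (A real crawler fetches and parses live HTML instead; these paths are made up.)
site = {
    "/": ["/products", "/about"],
    "/products": ["/products/widgets", "/old-page"],  # "/old-page" was deleted
    "/products/widgets": ["/"],
    "/about": ["/"],
    "/orphan": [],  # nothing links here, so a crawl never reaches it
}

def crawl(start="/"):
    """Follow internal links from the home page, like a search engine bot."""
    seen, broken = set(), set()
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        for link in site.get(url, []):
            if link not in site:        # link target doesn't exist: a broken URL
                broken.add(link)
            elif link not in seen:
                queue.append(link)
    orphans = set(site) - seen          # pages that exist but were never reached
    return seen, broken, orphans

reached, broken, orphans = crawl()
print(broken)   # {'/old-page'}
print(orphans)  # {'/orphan'}
```

The same traversal that a crawler uses to discover your pages is what exposes the gaps: anything it can't reach, search engines can't either.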
The most likely issue you'll come across when carrying out an internal audit is links that just don't work. These can be the result of mistakes made when the link was originally created – a typo in a URL will send users and crawlers to a page that doesn't exist – or of the linked URL having since been changed or deleted. Often when a URL has been amended or removed, a redirect will have been put in place. This essentially fixes the issue but can lead to problems of its own, which I'll get onto shortly.
If a link is trying to send people to a page that doesn't exist and there's no redirect in place, however, then the problem is obvious. In most cases, your web server will give a 404 – Page Not Found response, which will take visitors (and search engine bots) to a page that tells them the URL they just tried to visit doesn't exist. I strongly recommend creating a custom 404 page for these instances. It gives you the opportunity to personalise the message and link back to important pages, increasing your chances of keeping visitors and crawlers on the site – but that's a whole topic in itself.
404 responses aren't problematic in and of themselves. This is the best-practice, industry-standard way of dealing with an incorrect URL, whether the error comes from the site itself or from someone mistyping an address into their browser. But it can give a visitor the impression that the site is "broken", and if a crawler starts finding multiple 404 errors it will take them as a red flag that there are more serious issues with the site, which means it's unlikely to rank as well as it might in search results. Google won't direct its users to a site that's going to give them a poor experience.
It will also look bad for the site if these links trigger a different type of error response for some reason, as that suggests even bigger issues. If you come across any error responses other than a 404 – basically any of the other 4xx or 5xx HTTP response codes – then you or your web developer should investigate and fix the cause of them as quickly as possible.
Internal links that lead to 404 errors, however, are usually easy to fix. If the URL is simply wrong, or has changed since the link was created, it's just a case of correcting the URL in the link so that it points to a live page on the site.
If the page has been deleted but there is no redirect in place, it is a good idea to set one up, so that visitors are taken to the next most relevant page automatically. This will help the new page to benefit from any authority the old one might have built up in search engines.
If a page no longer exists and there isn’t a close equivalent, it’s better to remove the link altogether.
Something else to look out for is chains of redirects. These can build up when a URL is redirected to another, which is later redirected to yet another, and so on. The longer these chains are, the more likely they are to cause issues, as each extra hop adds to the time it takes for a "live" page to load.
Worse still are redirect loops, where a URL appears twice in a chain, creating an endless cycle of redirects. Visitors and crawlers will never reach a working page, and search engines will view your site as having technical issues – and therefore not a great candidate to feature in search results.
If you find any URLs in your internal link structure that are being redirected, it's best to update those links to point at the final destination URL. That way you'll avoid creating any redirect chains or loops, now or in future, and make sure visitors and crawlers go straight to a working page of the website.
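If you can export your redirect map (old URL → new URL), a few lines of Python can flag chains and loops before you go through updating the links. The URLs below are made up for illustration:

```python
# Hypothetical redirect map: old URL -> the URL it redirects to.
redirects = {
    "/a": "/b",
    "/b": "/c",      # /a -> /b -> /c is a chain of two hops
    "/x": "/y",
    "/y": "/x",      # /x -> /y -> /x is a loop
}

def resolve(url, max_hops=10):
    """Follow redirects to the final URL; report chains and loops."""
    path = [url]
    while url in redirects:
        url = redirects[url]
        if url in path:                 # we've been here before: a loop
            return url, path, "loop"
        path.append(url)
        if len(path) > max_hops:        # suspiciously long; give up
            return url, path, "too long"
    status = "chain" if len(path) > 2 else "ok"
    return url, path, status

print(resolve("/a"))  # ('/c', ['/a', '/b', '/c'], 'chain')
print(resolve("/x"))  # ('/x', ['/x', '/y'], 'loop')
```

For every link flagged as a chain, the fix is to point the internal link at the final URL in the path; loops need the redirect rules themselves untangled.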
Another problem, which can be hard to find with just a crawling tool, is pages that don’t have any links from anywhere else on the site. We call these orphan pages. They are a problem because, without links, it’s impossible for crawlers to find those pages, which means they’ll never be included in search results. It’s also impossible for visitors to find those pages, of course, unless they happen to know the exact address of the page they’re looking for.
This problem may be caused by links being removed from the website navigation, thereby “orphaning” the page itself. Sometimes orphan pages are prototype pages that were never linked to, or pages that were under construction but weren’t meant to be published, in which case they can probably be removed.
You can use Google Analytics to look for orphan pages. In the Behaviour section, go to Site Content > All Pages. As orphan pages are unlikely to have had many visits, look over as long a time period as possible and sort the data by page views in ascending order. Then cross-reference this list against the list of crawled URLs you have. The quickest way is to paste one list below the other in a spreadsheet (leaving a gap so you can see where one ends and the other starts) and use conditional formatting to highlight any duplicates. You can then filter out or delete those rows from the Analytics list, leaving just the URLs that didn't appear in the crawl.
If a URL is in the Analytics list but didn’t get crawled, it may be because there were no links for the crawler to follow. Check that page is still live; if it is, then it may be an orphan, or was at least so hard to find that it wasn’t crawled.
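The spreadsheet cross-reference above is really just a set difference, so if you can export both lists, a short Python sketch does the same job (the URLs here are invented examples):

```python
# URLs seen in Google Analytics (pages that received at least one visit)
# and URLs found by the crawler. Both lists are hypothetical.
analytics_urls = {"/", "/products", "/about", "/summer-promo", "/old-draft"}
crawled_urls = {"/", "/products", "/about"}

# Pages with recorded traffic that the crawler never reached
# are candidates for orphan pages.
orphan_candidates = analytics_urls - crawled_urls
print(sorted(orphan_candidates))  # ['/old-draft', '/summer-promo']
```

Each candidate still needs a manual check, as in the paragraph above: confirm the page is live before concluding it's an orphan rather than a deleted URL.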
If the orphan pages you find are important to the site, make sure they are linked to from at least a few places in the site. The best solution is to add them to a navigation menu for a relevant section of your site. That way they will be linked to from every page in that section. If there are pages you find that aren’t of any value, you can delete them. There’s no real need to set up redirects in this case, as no one could find those pages anyway.
Deep Pages & Pages With Very Few Links
Search engines pay attention to the hierarchical structure of a website. The further away a page is from the home page (the "top" page in the hierarchy), the less important it will be assumed to be. This usually makes logical sense. For example, on an ecommerce site the home page will link to product category pages, which then have subcategories which link to the products themselves. This means the product pages are two or three "clicks" away from the top page of the site. While the products are important, of course, each category or subcategory contains a greater number of products and is crucial in helping search engines understand exactly what types of products the website offers, so to a search engine it carries more importance.
If a page is buried very deep in the site – four or more clicks away from the home page – search engines won't view it as very valuable. It may take a long time for a crawler to find such deep pages, and when it does, it won't revisit them as frequently as the higher-up pages of the site.
Try to make sure your internal link structure keeps all the pages you want to be found in search results no more than three clicks away from the home page, and that all those pages have as many links from other pages as possible.
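Click depth is just the shortest-path distance from the home page, which a breadth-first search computes directly. A sketch over an invented link graph:

```python
from collections import deque

# Toy link graph: page -> pages it links to (hypothetical URLs).
links = {
    "/": ["/category"],
    "/category": ["/subcategory"],
    "/subcategory": ["/product"],
    "/product": ["/product/spec-sheet"],
    "/product/spec-sheet": [],
}

def click_depths(start="/"):
    """Breadth-first search gives each page's minimum clicks from home."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in links.get(page, []):
            if nxt not in depth:
                depth[nxt] = depth[page] + 1
                queue.append(nxt)
    return depth

depths = click_depths()
too_deep = [page for page, d in depths.items() if d > 3]
print(too_deep)  # ['/product/spec-sheet'] – 4 clicks from the home page
```

Adding a link to a deep page from somewhere higher up (a category page, or the main navigation) immediately reduces its depth, which is exactly what the advice below amounts to.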
If a page is really important, make sure it’s included in the main navigation of the site, or the footer menu, so that there is a link to it from as many other pages as possible, including the home page.
Check Your Internal Link Structure Regularly
I recommend doing an audit of your internal links at least once a year. It’s easy for errors to creep in. A link might be removed from a menu which leaves a page orphaned, or a page might be deleted or redirected without the associated links being altered. All of these things can add up to search engine crawlers finding a slew of “broken” or missing pages, which can hold your site back in search results. Fixing most of them is quick and easy and will help you stay as visible as possible in search.