Your site architecture, meaning the way you structure and organize internal links (e.g., a link to the About Us section of your website from your main navigation), plays a vital role in how both users and search engines navigate your website, and ultimately affects your website’s rankings.
Modern search engines use links to crawl the web. Their crawlers follow every link that appears on a page, both internal and external, then every link on each subsequent page, and so on. This is how search engines discover your pages and add them to their indices, where they can be ranked.
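As a simplified model of that process, the R sketch below crawls a hypothetical site breadth-first: it starts at the home page, follows every link it has not seen yet, and records each page it reaches. Real crawlers are far more involved, but the model shows why a page that no other page links to (the hypothetical "orphan" page here) is never found by following links.

```r
# Hypothetical link map: each element lists the pages that page links to.
# "orphan" links to home, but nothing links to "orphan".
site <- list(
  home     = c("about", "blog"),
  about    = c("home", "contact"),
  blog     = c("post-1", "home"),
  contact  = character(0),
  "post-1" = c("blog"),
  orphan   = c("home")
)

# Breadth-first crawl: take the next queued page, queue every unseen link.
crawl <- function(links, start) {
  seen  <- start
  queue <- start
  while (length(queue) > 0) {
    page  <- queue[1]
    queue <- queue[-1]
    for (target in links[[page]]) {
      if (!(target %in% seen)) {
        seen  <- c(seen, target)
        queue <- c(queue, target)
      }
    }
  }
  seen
}

discovered <- crawl(site, "home")
print(discovered)
```

The orphan page is missing from the result even though it links to the home page, because discovery only happens through links pointing *to* a page.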
Search engines such as Google also use links to rank query results, treating each link to a page as a vote for that page’s importance (the idea behind PageRank).
For this reason, the way you link the pages on your website plays a big role in how search engines crawl, understand and rank your site. As an SEO practitioner, how do you make sure your site architecture is optimal and that internal links are organized correctly? Let’s explore how calculating a metric I call Internal PageRank can help us with this task.
Basic site architecture and navigation-based internal links
There are two basic types of internal links:
- The internal links that form your site’s navigational structure
- The secondary internal links that appear in context throughout your site (in articles and other places that aren’t necessarily a product of your site’s navigational structure)
Let’s look at the former. The first step to getting your internal links in order is to organize common navigation elements and adhere to a well-organized site structure. I recommend creating a classic internal linking structure and using Bruce Clay’s silo architecture as a foundation for internal links. Both are tried-and-tested, logical site structures that work. Here’s an example from Portent:
Now that your site has a solid foundation for internal links, let’s look at how these navigational links, as well as the internal links that appear in context, might affect how search engines crawl and rank your pages. To gauge the overall impact of internal linking, we will examine the Internal PageRank of every page.
What is PageRank?
Before we continue, let’s take a moment to discuss what PageRank is. PageRank is one of the algorithms Google uses to rank web pages in its search results. It is named after Larry Page, one of the company’s co-founders.
The PageRank algorithm, put simply by Google, “works by counting the number and quality of links to a page to determine a rough estimate of how important the website is.”
Google calculates PageRank for every page in its index, taking into account both the links between pages within a site and the links from other websites to those pages. But the idea behind PageRank, determining the importance of a page based on links from other pages, can be applied across a large network (like the one uncovered by Google’s crawler) or across a smaller subset of that network.
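For reference, the commonly cited textbook form of the calculation, normalized so that all scores sum to 1, is shown below. Google’s production ranking is far more complex and not public, so treat this as the underlying idea rather than what Google computes today. Here N is the number of pages in the network, B(p) is the set of pages linking to page p, L(q) is the number of outbound links on page q, and d is the damping factor (0.85 is the conventional value, and the one the R script in Step 2 uses). Pages with no outbound links are usually treated as linking to every page.

```latex
PR(p) = \frac{1 - d}{N} + d \sum_{q \in B(p)} \frac{PR(q)}{L(q)}
```

Because each linking page q splits its score across its L(q) outbound links, one link from an important page that links to few others can be worth more than several links from weak pages.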
For the purpose of examining internal links, we will utilize the idea of PageRank to look at the relative importance of each page on a single website.
By “Internal PageRank,” I am referring not to Google’s PageRank algorithm, but to a similar calculation based on the internal links of a single website. Let’s get started and calculate Internal PageRank for your site.
Note: To be clear, I’m not talking about or advocating for PageRank sculpting. I’m talking about using a PageRank-like metric to diagnose any issues within your site architecture. This will become clearer when I run through an example.
Step 1: Crawl with Screaming Frog
Before we can actually calculate Internal PageRank, we need to crawl our website. For this example, I use Screaming Frog, as it is a standard tool in an SEO practitioner’s arsenal.
Start by launching Screaming Frog and crawling your website. When the crawl is finished, select Bulk Export > All Outlinks from the top menu, and save the CSV file to your desired location.
The CSV lists every link found on your crawled pages, including links that point to other websites. We will use this list to build a network of pages and calculate Internal PageRank.
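To make the data concrete, here is a hypothetical miniature of that export in R, limited to the four columns the script in Step 2 uses (Type, Source, Destination and Follow; a real export has many more). Filtering it down to followed HREF links leaves an edge list, one row per link from a Source page to a Destination page, which is what the network is built from. All URLs and values are made up.

```r
# Hypothetical miniature of an "All Outlinks" export (made-up rows).
# Follow is text here; read.csv() may instead parse it as logical TRUE/FALSE.
outlinks <- data.frame(
  Type        = c("HREF", "HREF", "HREF", "IMG", "HREF"),
  Source      = c("https://domain.com/", "https://domain.com/",
                  "https://domain.com/about/", "https://domain.com/",
                  "https://domain.com/about/"),
  Destination = c("https://domain.com/about/", "https://domain.com/contact/",
                  "https://domain.com/contact/", "https://domain.com/logo.png",
                  "https://twitter.com/domain"),
  Follow      = c("true", "true", "false", "true", "true"),
  stringsAsFactors = FALSE
)

# Keep followed hyperlinks only; lowercasing the text form of Follow
# works whether it was read as "true" or as TRUE.
edges <- subset(outlinks,
                Type == "HREF" & tolower(as.character(Follow)) == "true",
                select = c(Source, Destination))

print(edges)
```

The nofollowed link and the image are dropped, while the external Twitter link survives as an edge; it only disappears from the final report when the output is filtered by domain.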
Step 2: Calculate Internal PageRank with R
If you’re not familiar with R, it’s a free software environment for statistical computing and graphics that runs on a wide variety of platforms. Download and install it if you don’t already have it.
Install the igraph package by launching the R console and running `install.packages("igraph")`, then load it with `library(igraph)`.
Once the package is installed, you can run the following code against the Screaming Frog crawl of your site:
```r
# Load igraph (install it first with install.packages("igraph")).
library(igraph)

# Swap out the path to your Screaming Frog All Outlinks CSV. On Windows,
# change backslashes to forward slashes.
# skip = 1 skips a title row above the column headers; use skip = 0 if
# your file starts directly with the headers.
links <- read.csv("C:/Documents/screaming-frog-all-outlinks.csv", skip = 1) # CSV path

links <- subset(links, Type == "HREF") # Optional line. Keep only hyperlinks.
# read.csv() may parse Follow as logical (TRUE/FALSE) or keep it as text
# ("true"/"false"); comparing lowercase text handles both cases.
links <- subset(links, tolower(as.character(Follow)) == "true")
links <- subset(links, select = c(Source, Destination))

# Build a directed graph (one edge per link) and compute PageRank.
g <- graph_from_data_frame(links, directed = TRUE)
pr <- page_rank(g, algo = "prpack", vids = V(g), directed = TRUE, damping = 0.85)

# One row per URL, highest Internal PageRank first.
values <- data.frame(url = names(pr$vector), pr = unname(pr$vector))
values <- values[order(-values$pr), ]

# Swap out 'domain' and 'com' to represent your website address.
values <- values[grepl("^https?://([^/]*\\.)?domain\\.com(/|$)", values$url), ] # Domain filter.

# Replace with your desired filename for the output file.
write.csv(values, file = "output-pagerank.csv", row.names = FALSE) # Output file.
```
Simply follow the code comments (denoted by #) and don’t forget to:
- Specify the path to your Screaming Frog CSV file.
- Specify your domain name and top-level domain (TLD).
- Name your output file, which will contain the Internal PageRank of each individual page on your website.
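The domain filter is the easiest line to get wrong. The sketch below tests one possible pattern, an assumption rather than the only correct choice, against a few hypothetical URLs. Anchoring the pattern at the start of the URL and requiring a slash or the end of the string after the domain keeps subdomains while rejecting look-alike hosts and external URLs that merely contain your address in a query string. It does not handle URLs with an explicit port.

```r
# "domain\\.com" is a placeholder: swap in your own domain and TLD.
own_domain <- "^https?://([^/]*\\.)?domain\\.com(/|$)"

# Hypothetical URLs covering the common cases.
urls <- c(
  "https://domain.com/",                                # home page
  "https://www.domain.com/about/",                      # www subdomain
  "http://blog.domain.com",                             # other subdomain
  "https://notdomain.com/",                             # look-alike host
  "https://twitter.com/share?url=https://domain.com/",  # your URL in a query
  "https://domain.com.example.net/"                     # domain as a prefix
)

keep <- grepl(own_domain, urls)
print(data.frame(url = urls, keep = keep))
```

Only the first three URLs are kept; an unanchored pattern would also keep the Twitter share link, because it contains your address.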
Let’s run through a couple of examples on some real websites.
Our agency, Catalyst Digital, recently relaunched our website after a rebrand, and we are still working out some of the kinks. So I decided to crawl the new site and examine its Internal PageRank.
Here is a sample of the output:
Looking at the site pages in terms of Internal PageRank, we see that our top page is our contact page. That doesn’t look right!
You wouldn’t be able to see this from a typical site crawl. For example, Screaming Frog shows that the contact page actually has one inbound link fewer than the home page, despite its higher Internal PageRank. Internal PageRank, like Google’s PageRank algorithm, takes into account which pages link to a page (and how important those pages are), rather than just the number of links.
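To see how a page with fewer inbound links can still come out ahead, the sketch below runs a hand-rolled PageRank power iteration in base R on a hypothetical five-page site. It uses the textbook formulation with the same 0.85 damping factor; it is not igraph’s prpack solver, though the two should agree closely. Contact has one inbound link and home has two, but contact’s single link comes from a services page that links to nothing else, while home’s links come from two weak blog posts that split their votes.

```r
# Base-R power iteration for PageRank on a directed edge list (a sketch).
# Pages without outlinks spread their score evenly over all pages.
pagerank_sketch <- function(edges, damping = 0.85, tol = 1e-12, max_iter = 1000) {
  pages <- unique(c(edges$Source, edges$Destination))
  n <- length(pages)
  # M[i, j] = share of page j's vote that goes to page i
  M <- matrix(0, n, n, dimnames = list(pages, pages))
  for (k in seq_len(nrow(edges))) {
    M[edges$Destination[k], edges$Source[k]] <-
      M[edges$Destination[k], edges$Source[k]] + 1
  }
  out_deg <- colSums(M)
  has_out <- out_deg > 0
  M[, has_out] <- sweep(M[, has_out, drop = FALSE], 2, out_deg[has_out], "/")

  pr <- setNames(rep(1 / n, n), pages)
  for (iter in seq_len(max_iter)) {
    dangling_share <- sum(pr[!has_out]) / n
    new_pr <- (1 - damping) / n + damping * (as.vector(M %*% pr) + dangling_share)
    names(new_pr) <- pages
    converged <- sum(abs(new_pr - pr)) < tol
    pr <- new_pr
    if (converged) break
  }
  pr
}

# Hypothetical five-page site; page names are made up for illustration.
edges <- data.frame(
  Source      = c("home", "post-1", "post-1", "post-2", "post-2",
                  "services", "contact", "contact"),
  Destination = c("services", "home", "services", "home", "services",
                  "contact", "post-1", "post-2"),
  stringsAsFactors = FALSE
)

pr <- pagerank_sketch(edges)
inlinks <- sapply(names(pr), function(p) sum(edges$Destination == p))
report <- data.frame(page = names(pr), inlinks = inlinks,
                     pagerank = round(pr, 3), row.names = NULL)
print(report[order(-report$pagerank), ])
```

On this toy graph, contact finishes at roughly 0.27 against roughly 0.15 for home, despite having half as many inbound links.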
Now, let’s search for our brand name in Google:
Our Google search confirms we have a problem. Our agency’s contact page is ranking above our home page in organic results, likely due to how we have structured our internal page links.
Now that we are aware of this problem, we can take a look at our site architecture and start to craft a solution. Knowledge is power.
Let’s run a similar test on Online Geniuses, an internet marketing Slack community that I moderate, and see if anything comes up.
Here’s a sample of the output from R:
The website has a job board page that has a higher Internal PageRank value than our home page. It’s not causing a problem for us yet, likely due to the number of external links pointing to the home page and the difference in our keyword usage, but it’s probably something we should look into to maintain site integrity.
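Both examples come down to the same check: which pages have a higher Internal PageRank than the home page? A small helper like the hypothetical pages_outranking_home() below could run that check on the output file, for example after loading it with read.csv("output-pagerank.csv"). It assumes the url and pr columns the script writes and a home page URL you supply; the scores in the example are made up.

```r
# Hypothetical helper: return pages whose Internal PageRank exceeds the
# home page's, highest first. Assumes columns named url and pr.
pages_outranking_home <- function(values, home_url) {
  home_pr <- values$pr[values$url == home_url]
  if (length(home_pr) != 1) stop("home_url must match exactly one row")
  flagged <- values[values$pr > home_pr, , drop = FALSE]
  flagged[order(-flagged$pr), , drop = FALSE]
}

# Made-up scores for illustration only.
values <- data.frame(
  url = c("https://domain.com/", "https://domain.com/contact/",
          "https://domain.com/jobs/", "https://domain.com/about/"),
  pr  = c(0.12, 0.15, 0.13, 0.05),
  stringsAsFactors = FALSE
)

flagged <- pages_outranking_home(values, home_url = "https://domain.com/")
print(flagged)
```

Any page this returns is a candidate for a closer look at the internal links pointing to it, as with the contact page and job board above.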
You should now have a sense of how to structure the internal links on your website. Once you have established a basic structure for your navigation-based internal links, you can audit your site for internal linking issues by crawling it and calculating Internal PageRank with R.