How Monsido Scans Your Website: A Comprehensive Overview

Monsido offers a powerful website scanning tool that thoroughly analyzes your website's content and structure. By understanding how Monsido scans your website, you can make the most of its features and improve your website's content quality and performance. In this article, we explore the different aspects of the scanning process, including links, page counting, exclusions and constraints, canonical links, the scan process, sitemaps, and the various types of pages in a CMS.

Frequency of Scans

By default, Monsido's web crawler scans your website weekly, so you can catch issues and errors promptly, especially as new content is added. This regular schedule keeps your view of your website's performance up to date and helps maintain its quality. If you work on the site less often or prefer a different schedule, you can switch to a bi-weekly or monthly scan frequency instead. For details on customizing your scan frequency, refer to the Help Center.

Now let's take a look at how the scans work.

Links in Monsido

Monsido's scanning process revolves around following links on your website to discover and assess all the content within your domain. In Monsido, a link is a unique URL, which can point to an HTML page or to an asset such as an image, a JavaScript file, a CSS file, or a document. Even a slight difference between two URLs, such as a single added character, makes them distinct links. For example:

  • https://domain.tld/webaccessibility
  • https://domain.tld/web-accessibility
Each of the above links is considered separate and unique within Monsido's scanning process.
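A minimal Python sketch of this identity rule, assuming links are compared as exact strings with no normalization (the function name unique_links is hypothetical, not part of Monsido):

```python
def unique_links(urls: list[str]) -> set[str]:
    """Collect distinct links, comparing URLs as exact strings (assumed)."""
    return set(urls)

links = unique_links([
    "https://domain.tld/webaccessibility",
    "https://domain.tld/web-accessibility",
    "https://domain.tld/webaccessibility",  # found again on another page
])
print(len(links))  # 2: one hyphen is enough to make a separate link
```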

How Monsido Counts Pages

To provide accurate metrics and insights, Monsido counts each unique link to an HTML page on your primary domain, on a subdomain (if you have added it), or at an internal URL as one page. A link that appears multiple times, or on multiple pages, is therefore counted only once.
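The counting rule can be sketched as follows. This is an illustration under stated assumptions, not Monsido's implementation: the content type of each URL is assumed to be known already, and SCANNED_DOMAINS stands in for your primary domain plus any subdomains you have added. All names and URLs are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical scan settings: the primary domain plus one added subdomain.
SCANNED_DOMAINS = {"domain.tld", "blog.domain.tld"}

# Hypothetical links found during a scan, with their content types.
LINKS = [
    "https://domain.tld/",
    "https://domain.tld/about",
    "https://domain.tld/about",       # same link found on another page
    "https://domain.tld/logo.png",    # an asset, not an HTML page
    "https://blog.domain.tld/post",
    "https://other.tld/",             # outside the scanned domains
]
CONTENT_TYPES = {
    "https://domain.tld/": "text/html",
    "https://domain.tld/about": "text/html",
    "https://domain.tld/logo.png": "image/png",
    "https://blog.domain.tld/post": "text/html",
    "https://other.tld/": "text/html",
}

def count_pages(links, content_types, domains):
    """Count each unique link to an HTML page on a scanned domain once."""
    pages = {
        url
        for url in links
        if urlsplit(url).hostname in domains
        and content_types.get(url) == "text/html"
    }
    return len(pages)

print(count_pages(LINKS, CONTENT_TYPES, SCANNED_DOMAINS))  # 3
```

Here the duplicate /about link is counted once, the image is skipped because it is not an HTML page, and other.tld is skipped because it is not a scanned domain.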

Exclusions and Constraints

Monsido provides features that let you exclude specific links and URL paths from the scanning process, so that the crawler does not follow, scan, or count links and pages you consider unnecessary for your analysis. Let's explore the two mechanisms that handle this:

  • Link Excludes enable you to define patterns that instruct the crawler to ignore specific URLs during the scan. Although these excluded links are still recorded as present on the page, Monsido will not check or follow them. To learn more about Link Excludes and how to implement them effectively, you can refer to our dedicated resource on the Monsido website.
Link Exclude Example 1: Language Translation Links - Suppose your website has language translation links for different regions or languages. These links typically lead to translated versions of the same content, so scanning them may not add meaningful insight to your analysis. With Link Excludes, you can specify patterns that cover these translation URLs so that Monsido skips them, streamlining the analysis.

Link Exclude Example 2: External Affiliate Links - Websites often include external affiliate links that send visitors to partner websites for specific products or services. Because these links are not part of your website's own content, scanning them may add little value to your analysis. With Link Excludes, you can instruct the crawler to bypass these affiliate links and focus on your core content and internal links.

  • Path Constraints give you control over which pages Monsido scans. Using regular expressions, you can include or exclude content from the scan based on URL path patterns. Here are two examples of how Path Constraints work:
Path Constraint Example 1: Suppose you only want to scan the news section of your website, located at https://domain.tld/news. To do this, add a constraint with the regular expression ^/news, which limits the crawler to paths beginning with /news. Note that the start URL must also satisfy the constraints. You can ensure this either by changing the start URL to https://domain.tld/news or by adding an extra constraint, ^/$, so that the front page is included.

Path Constraint Example 2: Suppose you have a search results page and want to exclude all results from the scan. The URL of a results page could look like https://domain.tld/search/results?query=test. To exclude these pages, create a negative constraint by prefixing the pattern with !, for example !search/results.
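The two mechanisms above can be modeled in a short sketch. It assumes that Link Exclude patterns are regular expressions matched against the full URL, that Path Constraints are matched against the URL path, and that a path is in scope when it matches no negative constraint and at least one positive constraint (or there are no positive constraints). These semantics, the pattern lists, and the partner.example domain are hypothetical; check the Monsido documentation for the exact rules.

```python
import re
from urllib.parse import urlsplit

# Hypothetical Link Exclude patterns, matched against the full URL.
LINK_EXCLUDES = [
    r"/(de|fr|es)/",                # translated copies of the same content
    r"^https://partner\.example/",  # external affiliate links
]

# Hypothetical Path Constraint sets, matched against the URL path.
# A leading "!" marks a negative (exclude) constraint.
NEWS_ONLY = [r"^/$", r"^/news"]
NO_SEARCH_RESULTS = [r"!search/results"]

def is_excluded(url: str, excludes: list[str]) -> bool:
    """True if the link should be recorded but not checked or followed."""
    return any(re.search(pattern, url) for pattern in excludes)

def in_scope(url: str, constraints: list[str]) -> bool:
    """True if the URL's path passes the constraints (assumed semantics)."""
    path = urlsplit(url).path or "/"
    positive = [c for c in constraints if not c.startswith("!")]
    negative = [c[1:] for c in constraints if c.startswith("!")]
    if any(re.search(pattern, path) for pattern in negative):
        return False
    # With no positive constraints, every non-excluded path is in scope.
    return not positive or any(re.search(pattern, path) for pattern in positive)
```

For instance, in_scope("https://domain.tld/about", NEWS_ONLY) is False, while in_scope("https://domain.tld/", NEWS_ONLY) is True thanks to the ^/$ constraint. Loosely anchored patterns can match more than intended: ^/news, for example, also matches /newsletter.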

Canonical Links

Canonical links play a crucial role in indicating the preferred version of a webpage when there are duplicate or similar versions under different URLs. These links primarily assist search engines in understanding which version to index and display in search results, ultimately improving your website's search engine optimization (SEO). Here are a couple of examples illustrating the use of canonical links:

Example 1: Print Version of a Page: Suppose you have a webpage with the URL https://domain.tld/page_id=32. When a print version of this page is created, many CMSs add a print parameter to the URL, resulting in a URL such as https://domain.tld/page_id=32?print=yes. Although the two pages have essentially the same content, web crawlers and search engines treat them as separate pages. To address this, add a canonical tag to the print page that points to the primary page URL:

<link rel="canonical" href="https://domain.tld/page_id=32">

This canonical tag informs web crawlers and search engines that these pages contain duplicate content, with the URL lacking the print parameter being the primary version.

Example 2: Sortable Lists: Another scenario involves a page that displays a sortable list of items, such as a news site with a list of articles or a store with a list of products. Assume the URL https://domain.tld/list contains a list where users can sort by color, price, or size. Although the content remains the same, each sorted version of the page has a unique URL:

  • https://domain.tld/list?sort=colors
  • https://domain.tld/list?sort=price
  • https://domain.tld/list?sort=size
To address this, you can add a canonical link to each sorted version that points to the main list page:

<link rel="canonical" href="https://domain.tld/list">

This canonical link informs search engines and web crawlers, including Monsido, that all these URLs lead to pages with the same content, with the default sort version considered the primary version.

Monsido uses canonical tags to exclude URLs that point to identical content. For directions on how to make the scan ignore canonical tags, refer to the relevant section of the Monsido Broken Links FAQ.
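One way a crawler could apply canonical tags is to read each page's rel="canonical" link and group crawled URLs by the canonical target, treating each group as one piece of content. The sketch below, using only Python's standard library, shows that idea for the sortable-list example; the grouping strategy and function names are assumptions, not a description of Monsido's internals.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class CanonicalParser(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "canonical" in rel_values and href:
            self.canonical = href

def canonical_of(url: str, html: str) -> str:
    """Return the page's canonical URL, or the URL itself if none is declared."""
    parser = CanonicalParser()
    parser.feed(html)
    # Resolve relative href values against the page URL.
    return urljoin(url, parser.canonical) if parser.canonical else url

def group_by_canonical(pages: dict[str, str]) -> dict[str, set[str]]:
    """Group crawled URLs by the canonical URL their HTML declares."""
    groups: dict[str, set[str]] = {}
    for url, html in pages.items():
        groups.setdefault(canonical_of(url, html), set()).add(url)
    return groups

CANONICAL_LIST = '<link rel="canonical" href="https://domain.tld/list">'
PAGES = {
    "https://domain.tld/list": CANONICAL_LIST,
    "https://domain.tld/list?sort=colors": CANONICAL_LIST,
    "https://domain.tld/list?sort=price": CANONICAL_LIST,
    "https://domain.tld/list?sort=size": CANONICAL_LIST,
    "https://domain.tld/about": "<html><head><title>About</title></head></html>",
}

groups = group_by_canonical(PAGES)
```

All four list URLs land in one group keyed by https://domain.tld/list, while the about page, which declares no canonical, forms its own group.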

How the Monsido Scan Works

Monsido's scanner discovers webpages dynamically: by systematically following links from one page to another, it identifies and analyzes all the pages within your domain. The scan works breadth-first, starting with the initial webpage and exploring all links on a page before moving on to the next depth level. The crawler can scan up to 10 pages of the same domain simultaneously. It generally respects the depth order of links, although the response and processing latency of each page can affect the exact order in which pages are handled. When a sitemap is found, all the pages it lists are treated as depth level 0 (the top level). Note that the crawler is limited to a depth of 100 links from the start page.
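The traversal described above can be approximated with a level-by-level breadth-first search. In this hypothetical sketch, an in-memory SITE dictionary replaces real HTTP requests, a thread pool fetches up to 10 pages at once, sitemap URLs join the start page at depth 0, and crawling stops at a configurable depth limit. Unlike the real crawler, where page latency can shift the order somewhat, this sketch finishes each level before starting the next; treating pages at the depth limit as recorded but not followed is also an assumption.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL_PAGES = 10  # up to 10 pages of one domain at once
MAX_DEPTH = 100          # depth limit counted from the start page

# Hypothetical in-memory site: each path maps to the links found on it.
SITE = {
    "/": ["/about", "/news"],
    "/about": ["/team"],
    "/news": ["/news/1", "/"],
    "/team": [],
    "/news/1": [],
    "/orphan": ["/orphan/child"],  # no other page links here
    "/orphan/child": [],
}

def fetch_links(path: str) -> list[str]:
    """Stand-in for downloading a page and extracting its links."""
    return SITE.get(path, [])

def crawl(start: str, sitemap_urls=(), max_depth: int = MAX_DEPTH) -> dict[str, int]:
    """Breadth-first crawl returning the depth at which each page was found."""
    # The start page and every sitemap URL are placed at depth 0.
    depth = {url: 0 for url in [start, *sitemap_urls]}
    frontier = list(depth)
    level = 0
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL_PAGES) as pool:
        # Pages at depth max_depth are recorded, but their links are not followed.
        while frontier and level < max_depth:
            # Fetch every page on this level concurrently before going deeper.
            next_frontier = []
            for links in pool.map(fetch_links, frontier):
                for link in links:
                    if link not in depth:
                        depth[link] = level + 1
                        next_frontier.append(link)
            frontier = next_frontier
            level += 1
    return depth
```

Without a sitemap, crawl("/") never finds /orphan because no page links to it; crawl("/", sitemap_urls=["/orphan"]) places it at depth 0, and its child is then found at depth 1.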

Monsido's crawler also reads robots.txt files and detects any sitemaps declared in them. Once a sitemap is detected, Monsido automatically scans all the links listed in the sitemap's XML, including any linked PDFs.
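Detecting sitemaps in robots.txt and reading the URLs from a sitemap can be sketched with Python's standard library: urllib.robotparser exposes declared Sitemap lines through site_maps() (Python 3.8+), and xml.etree.ElementTree reads the <loc> entries. The file contents below are hypothetical, and sitemap index files (sitemaps that list other sitemaps) are not handled.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt declaring one sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Sitemap: https://domain.tld/sitemap.xml
"""

# Hypothetical sitemap listing two pages and a PDF.
SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://domain.tld/</loc></url>
  <url><loc>https://domain.tld/news</loc></url>
  <url><loc>https://domain.tld/files/annual-report.pdf</loc></url>
</urlset>
"""

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemaps_from_robots(text: str) -> list[str]:
    """Return the sitemap URLs declared in a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser.site_maps() or []  # site_maps() returns None if none exist

def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return every <loc> URL listed in a <urlset> sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]
```

Every <loc> entry is returned, including the PDF, in line with the sitemap behavior described above.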

Sitemaps

In addition to using a start page, you can make the Monsido domain scan more effective by adding a sitemap. Sitemaps are especially valuable for larger, more complex websites with abundant multimedia content. A sitemap acts as a roadmap for the Monsido crawler, giving it an organized view of your website's structure and helping it discover URLs across your site. On larger websites, where linking every page from at least one other page can be difficult, a sitemap guides the crawler to pages it might otherwise miss.
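If you need to create a sitemap yourself, the sketch below builds a minimal one following the public sitemaps.org XML protocol. The URLs are hypothetical, and the example omits optional tags such as <lastmod>; many CMSs can also generate a sitemap for you.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls: list[str]) -> str:
    """Return a minimal <urlset> sitemap (sitemaps.org protocol) as a string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&", as the protocol requires.
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical pages, including one that no other page links to.
sitemap = build_sitemap([
    "https://domain.tld/",
    "https://domain.tld/events/2023/open-day",
    "https://domain.tld/list?sort=price&page=2",
])
```

ElementTree writes the & in the last URL as &amp;, which keeps the sitemap valid XML.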

Types of Pages in a CMS

Many content management systems (CMSs) sort pages into specific types, such as news pages, event pages, and other pages created by CMS modules. Typically, users create normal content pages themselves, while pages generated by CMS modules are categorized separately. These categorized pages may be a collection of pages or a specific content type within the CMS. Regardless of type, every piece of content in a CMS has a unique URL and can be accessed by users.

In Summary

To get the most out of Monsido and its features, it helps to understand how it navigates and examines your website. By walking through the facets of its scanning process, including links, page counting, exclusions and constraints, canonical links, the scan procedure, sitemaps, and the page types found in a CMS, you've gained valuable insight into how Monsido works.

With this knowledge, you can take control of your website's analysis and make sure Monsido provides the accurate metrics and insights your organization needs. Explore what Monsido offers and put these features to work improving your website.

Want to see how your website stacks up for 2023? Get a free scan now to see what we uncover on your website.