Unable to crawl my HubSpot pages with an external crawler

Last updated: July 9, 2018

Applies to:

Marketing Hub
Professional, Enterprise
Legacy Marketing Hub Basic

If you try to crawl your HubSpot pages with an external SEO tool such as Moz, OnPage, or SEMRush and the crawl fails, there are a few things you can check:

  1. Robots.txt: check whether your pages have been added to the robots.txt file in your content settings, which would prevent them from being crawled or indexed.
  2. Meta tags: check whether a tag such as noindex has been added to the Head HTML of your pages, which would prevent them from being indexed or crawled.
  3. Googlebot: HubSpot blocks requests that identify themselves as Googlebot but originate from non-Google IP addresses. If you crawl your HubSpot site with your crawler set to the Googlebot user agent, you will likely see a 403 error.
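To rule out the first two causes before re-running an external crawler, you can check a page yourself. The following is a minimal Python sketch using only the standard library. It assumes you have already copied the robots.txt content from your content settings and the page's HTML into strings; the helper names (`is_blocked_by_robots`, `has_noindex`) are hypothetical, and the sketch cannot detect the Googlebot case in item 3.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def is_blocked_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the robots.txt rules disallow user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; values may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page's HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```

For example, `is_blocked_by_robots(robots_txt, "https://mydomain.com/private/page")` returns `True` when `robots_txt` contains `Disallow: /private/` under `User-agent: *`.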

You can also adjust your settings to prevent specific pages from being indexed or crawled.

Please note: if you are auditing your site with SEMRush and receive a timeout error, make sure you are auditing the specific subdomain you host with HubSpot (for example, www.mydomain.com or blog.mydomain.com), not the root domain (mydomain.com).

Why am I seeing SEO errors for my HubSpot-hosted content?

External SEO tools often return errors when crawling HubSpot-hosted content. For example, you may see 401 errors or warnings for your blog listing page or your blog RSS feeds. The links to both change whenever a new post is published, so they are set to expire. External SEO tools can't re-crawl these links after they expire, and flag them as errors as a result. There is also no need to index the RSS feed in particular, because it contains the same content that is live on your blog posts. This error might look like the following:

Blocked Resources > https://mydomain.com/_hcms/rss/feed?feedId=

Additionally, external SEO tools may report blocked resource errors for HubSpot resources that do not need to be indexed. For example, the scripts that load the HubSpot sprocket shortcut menu (which takes you to the page editor) and your HubSpot tracking code don't need to be crawled: they would never surface as search results, and they are not critical to understanding a page's content. Even if these resources are blocked or flagged, that does not mean your page itself was not crawled. This error might look like the following:

Blocked Resources > https://js.hs-scripts.com
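If your SEO tool lets you export its flagged URLs, a short script can separate the expected HubSpot entries described above from issues worth investigating. This is a minimal Python sketch; the function names are hypothetical, and the two patterns it checks (the `/_hcms/rss/feed` path and the `js.hs-scripts.com` host) come only from the examples in this article, so they are not an exhaustive list of resources a tool might flag.

```python
from urllib.parse import urlparse

# Assumption: patterns taken from the examples in this article only;
# extend them if your tool flags other HubSpot resources.
EXPECTED_HOSTS = {"js.hs-scripts.com"}
EXPECTED_PATH_PREFIXES = ("/_hcms/rss/feed",)


def is_expected_hubspot_flag(url: str) -> bool:
    """Return True if a flagged URL matches a resource that doesn't need crawling."""
    parsed = urlparse(url)
    if parsed.hostname in EXPECTED_HOSTS:
        return True
    return parsed.path.startswith(EXPECTED_PATH_PREFIXES)


def split_flagged_urls(urls):
    """Split an SEO tool's flagged URLs into (expected, needs_review) lists."""
    expected, needs_review = [], []
    for url in urls:
        target = expected if is_expected_hubspot_flag(url) else needs_review
        target.append(url)
    return expected, needs_review
```

Anything left in `needs_review` is worth checking against the robots.txt and meta tag items earlier in this article.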