There are a few ways to prevent search engines from indexing specific pages on your website. It's recommended to carefully research each of these methods before implementing any changes to ensure that only the desired pages are blocked from search engines.
Please note: these instructions will block a page URL from being indexed for search. Learn how to customize a file URL in the files tool to block it from search engines.
The robots.txt file method can't retroactively remove pages that are already in search results. It tells bots not to crawl a page, but search engines can still index your content (e.g., if other websites link to your page). If your page has already been indexed and you want it removed from search results, use the "No Index" meta tag method instead.
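For reference, a robots.txt rule that blocks crawling of a single page generally looks like the following sketch. The path /example-page is a placeholder; substitute the path of the page you want to block.

```
# Block all crawlers from a single page (placeholder path)
User-agent: *
Disallow: /example-page
```

Remember that this only prevents crawling; it does not remove a page that search engines have already indexed.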
"No index" meta tag
Please note: if you choose to use the "No Index" meta tag method, do not combine it with the robots.txt file method. Search engines need to crawl the page in order to see the "No Index" meta tag, and the robots.txt file prevents crawling altogether.
A "no index" meta tag is a string of code entered into the head section of a page's HTML that tells search engines not to index the page.
Navigate to your content:
Website Pages: In your HubSpot account, navigate to Marketing > Website > Website Pages.
Landing Pages: In your HubSpot account, navigate to Marketing > Landing Pages.
Blog: In your HubSpot account, navigate to Marketing > Website > Blog.
Click the name of a specific page or blog post.
In the content editor, click the Settings tab.
Click Advanced Options.
In the Head HTML section, copy and paste the following code:
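The snippet itself did not survive in this copy of the article; the standard "no index" meta tag, which tells all crawlers not to index the page, reads as follows:

```html
<!-- Tells search engine crawlers not to index this page -->
<meta name="robots" content="noindex">
```

This tag belongs inside the page's head section, which is what the Head HTML field controls.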
If you want to block files in your HubSpot file manager (e.g., a PDF document) from being indexed by search engines, you must select a connected subdomain for the file(s) and use the file URL to block web crawlers.
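If you block a file URL with robots.txt, the rule would follow the same pattern as for a page. The subdomain and filename below are hypothetical placeholders for illustration:

```
# Hypothetical example: block crawlers from one hosted PDF on a connected subdomain
User-agent: *
Disallow: /hubfs/example-document.pdf
```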
How HubSpot handles requests from a user agent
If you set a user agent string to test crawl your website and see an "access denied" message, this is expected behavior. Google can still crawl and index your site.
You see this message because HubSpot only allows requests from the googlebot user agent when they come from IP addresses owned by Google. To protect HubSpot-hosted sites from attackers and spoofers, requests from other IP addresses are denied. HubSpot applies the same policy to other search engine crawlers, such as BingBot, MSNBot, and Baiduspider.