There are a few options to prevent search engines from indexing specific pages on your website. We recommend carefully researching each of these options before implementing any changes to ensure that only the desired pages are blocked from search engines.
Please note that Google and other search engines may not retroactively remove pages from results if you implement the robots.txt file method. While this tells bots not to crawl a page, search engines can still index your content if, for example, there are inbound links to your page from other websites. If your page has already been indexed and you'd like it to be removed from search engines retroactively, you'll likely want to use the "No Index" meta tag method below.
If you choose to use the "No Index" meta tag method, please be aware that it should not be combined with the robots.txt file method. Search engines need to crawl the page in order to see the "No Index" meta tag, and the robots.txt file prevents crawling altogether, so the tag would never be read.
Robots.txt file
- This is a file on your website that search engine crawlers read to see which pages they should and should not crawl.
- Learn more about setting up a robots.txt file in HubSpot.
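As a rough illustration of how crawlers interpret robots.txt rules, the sketch below parses a small sample file with Python's standard urllib.robotparser module and checks whether a given URL may be crawled. The file contents, the paths, and the is_crawl_allowed helper are hypothetical placeholders, not HubSpot defaults.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are placeholders for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /thank-you
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent crawl url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://www.example.com/private/report",
        "https://www.example.com/blog/post",
    ):
        print(url, is_crawl_allowed(ROBOTS_TXT, "Googlebot", url))
```

Keep in mind that a result of False here only means the page will not be crawled; as noted above, it may still be indexed if other sites link to it.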
"No index" meta tag
A "no index" meta tag is a string of code entered into the head section of a page's HTML that tells search engines not to index the page.
- Navigate to Content > Website Pages or Landing Pages > Edit > Settings > Edit/Add Head HTML.
- Copy and paste the following code into the head HTML section of the page (do not include any trailing punctuation): <meta name="robots" content="noindex">
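After publishing the change, you may want to confirm that the tag is actually present in the page's HTML. The sketch below is one possible check using Python's standard html.parser module; the RobotsMetaParser class and has_noindex function are hypothetical names, and the HTML is passed in as a string rather than fetched from a live page.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values keep their original case.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() == "robots":
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    sample = '<html><head><meta name="robots" content="noindex"></head></html>'
    print(has_noindex(sample))
```

This only inspects the markup you give it. It does not tell you whether a search engine has recrawled the page and seen the tag yet.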
Google Webmaster Tools
- If you have a Google Webmaster Tools (now called Google Search Console) account, you can submit a URL to be removed from Google search results. See Google's instructions for further details.
Note: this will only apply to Google's search results.
If you want to prevent files in your HubSpot file manager, such as a PDF document, from being indexed by search engines, you need to select a connected subdomain for the file(s) and then use the resulting file URL when blocking crawlers.
"It looks like Google (or another search engine) isn't crawling my website..."
If you set a crawler's user-agent string to test crawl your website and see an "access denied" message, this is normal. Google is still crawling and indexing your site.
You see this message because, to protect HubSpot-hosted sites from attackers, HubSpot only allows requests that use the Googlebot user agent when they come from IP addresses owned by Google. Requests from other IP addresses (for example, from someone spoofing the user-agent string) are denied. HubSpot applies the same check to other search engine crawlers, such as BingBot, MSNBot, and Baiduspider.
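HubSpot's own verification mechanism is not described here, so the following is only a sketch of one commonly used approach to telling a genuine Googlebot request from a spoofed one: a reverse DNS lookup on the requesting IP, a check of the hostname's domain, and a forward lookup to confirm the hostname resolves back to the same IP. The function names and the suffix list are assumptions for illustration; check Google's current crawler-verification guidance for the authoritative list of domains. The lookup functions used here handle IPv4 addresses only and need network access.

```python
import socket

# Assumed hostname suffixes for Google crawlers; verify against Google's
# current documentation before relying on this list.
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def is_google_hostname(hostname: str) -> bool:
    """Return True if hostname ends with one of the assumed Google suffixes."""
    return hostname.rstrip(".").lower().endswith(GOOGLE_CRAWLER_SUFFIXES)


def is_verified_googlebot(ip_address: str) -> bool:
    """Reverse-resolve the IP, check the domain, then forward-confirm it."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip_address)
    except OSError:
        return False  # no reverse DNS record, or lookup failed
    if not is_google_hostname(hostname):
        return False
    try:
        _name, _aliases, forward_ips = socket.gethostbyname_ex(hostname)
    except OSError:
        return False  # hostname did not resolve
    # The hostname must resolve back to the IP that made the request.
    return ip_address in forward_ips
```

A request that merely sets its user-agent string to Googlebot from an unrelated IP address would fail this kind of check, which is consistent with the "access denied" message described above.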