There are certain pages of a site that can lower a domain’s ranking on Google. Here, we establish what types of pages these are and why you should block Google from crawling them using robots.txt.
Why should I block Google from my site?
Good question. At first, blocking Googlebot from crawling some of your site’s pages might seem counterproductive if you’re trying to rank higher. However, robots.txt is often used to prevent crawling of search results pages or auto-generated pages that offer little value to users arriving from search engines.
Search results pages are a prime example. Each time someone searches your site, they are served a search results page. If you don’t block these pages, Google may index them even though they aren’t content pages. This can hurt your ranking, as a seven-page site can suddenly become a thousand-page site.
It’s far better to have a smaller site with informative, focused and valuable content than a bigger site which seems (to Google at least) to have very little focus on any particular area. Simply put, a few high-performing pages are more beneficial than many pages that receive no traffic. Google views thousands of low-quality content pages as uninformative; they dilute your site’s value, and your ranking drops because Google no longer views your site as valuable.
What pages should I block from Google?
This reinforces the importance of every content page having a specific purpose and being useful in some way. For each page, assess its function, what it adds to the site and whether it fulfils its intention as well as it possibly can. Every page of your site should be high quality; if it isn’t, what purpose does it serve, and do you want your audience, or even Google, to see it? If a page adds minimal or no value to your site, cut it. Typical candidates for blocking include:
- Search result pages (useful only to the person who ran the search, not the general public)
- Tool result pages (useful only to the person using the tool, not the general public)
- Auto-generated pages (text translated by an automated tool, or unoriginal content stitched together from other sites)
- Pages built from affiliate databases (content supplied to you by other sources rather than written by you)
Low rankings are often due to sites not providing original content, so it makes sense to block the areas of your site that aren’t unique. Like human visitors, Google doesn’t like pages that duplicate content found elsewhere, so avoid this at all costs.
How do I block Google from site pages?
Website owners use robots.txt, also known as the robots exclusion standard, to send instructions to web crawlers and other web robots. The standard specifies to the robot which areas of a site should not be scanned.
Using robots.txt
The simplest robots.txt file uses two keywords, User-agent and Disallow. User-agent names the search engine robot a rule applies to, while Disallow is a command that tells that robot not to access a particular URL path. The two lines below, for example, tell Googlebot not to crawl any page on the site:
User-agent: Googlebot
Disallow: /
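Blocking the entire site is rarely what you actually want, though. A more targeted robots.txt disallows only the low-value sections discussed above. The paths in the sketch below are placeholders, not a standard; substitute whatever directories your own search results, tool results and auto-generated pages live under.

User-agent: Googlebot
# Placeholder paths - replace with the directories your low-value pages actually use
Disallow: /search/
Disallow: /tool-results/
Disallow: /translated/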
To ensure web crawlers can find and identify your robots.txt file, you must save it as a plain text file and place it in the highest-level directory (or root) of your site. A more in-depth explanation of this process, and of how to block directories or images rather than whole pages, is available in the Google Search Console Help section.
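Once the file is in place, it’s worth confirming it is publicly reachable by requesting it directly. Assuming your site is www.example.com (a placeholder domain), the check would look like this:

curl https://www.example.com/robots.txt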
Block Googlebot with .htaccess
Add the following to your .htaccess file, replacing yourdomain.com with your own domain:
RewriteEngine on
# Match requests whose User-Agent header contains "Googlebot" (case-insensitive)
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
# Send those requests back to the site root with a permanent (301) redirect
RewriteRule ^.*$ http://yourdomain.com/ [R=301,L]
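A redirect of this kind can bounce Googlebot around your own domain, so an alternative is simply to refuse the request. The sketch below is one way to do that rather than the only correct configuration; it assumes the same Apache mod_rewrite setup as above and returns a 403 Forbidden to any client identifying as Googlebot.

RewriteEngine on
# Match any User-Agent containing "Googlebot", case-insensitively
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
# "-" means no substitution; [F] returns 403 Forbidden, [L] stops processing further rules
RewriteRule ^ - [F,L]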