The robots meta tag gives crawlers instructions about a particular page on the site. Like other meta tags, it appears in the page's head section. Its main job is to tell crawlers whether to index the page and whether to follow the links on it, but it supports other values as well.
The possible values of the robots meta tag are:
noindex: Prevents the page from being indexed in the search engines' databases.
nofollow: Prevents crawlers from following the outgoing links on this page. In effect, the nofollow meta value turns every link on the page into a nofollow link. This is in contrast to the nofollow value at the single-link level (rel="nofollow"), which applies only to that one specific link.
noarchive: Prevents a saved version of the page (cached copy) from appearing in the search results.
nosnippet: Prevents the page description (the text that appears under the page title) from being displayed in the search results. That is, neither the content of the meta description tag, nor a piece of text taken from the page (a snippet), nor the site's description in the Dmoz directory will be shown as the page description. Using nosnippet also implies noarchive, so adding nosnippet also prevents the cached copy of the page from appearing in the search results.
noodp: Prevents the page title and description from the Dmoz directory (Dmoz: Open Directory Project) from being used in the page details in the search results.
all: Allows all possible values, that is: index, follow, archive, etc.
none: Blocks all values, that is: noindex, nofollow, noarchive, etc.
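To make the list above concrete, here is a minimal Python sketch of how a crawler might turn a content attribute into a set of restrictions. The function name parse_robots_content is hypothetical, and expanding none to every restrictive value in the list follows this article's description rather than any particular search engine's documentation. Values are lowercased because the tag is not case-sensitive.

```python
# Hypothetical sketch: how a crawler might interpret the content attribute
# of a robots meta tag. Not the implementation of any real search engine.

# Restrictive values from the list above.
RESTRICTIVE = {"noindex", "nofollow", "noarchive", "nosnippet", "noodp"}


def parse_robots_content(content: str) -> set[str]:
    """Return the restrictive directives implied by a robots content string."""
    directives: set[str] = set()
    for raw in content.split(","):
        value = raw.strip().lower()  # values are not case-sensitive
        if value == "none":
            # Assumption (following the article): "none" blocks every value.
            directives |= RESTRICTIVE
        elif value in RESTRICTIVE:
            directives.add(value)
        # "all", "index", "follow" and unrecognized values add no restriction.
    if "nosnippet" in directives:
        # Per the article, nosnippet also implies noarchive.
        directives.add("noarchive")
    return directives


print(sorted(parse_robots_content("NOINDEX, follow")))  # ['noindex']
print(sorted(parse_robots_content("nosnippet")))        # ['noarchive', 'nosnippet']
```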
Robots meta tag – did you know?
The robots meta tag can contain several comma-separated values, and it is recommended to combine them in a single tag to improve readability and avoid conflicting instructions.
If crawlers encounter contradictory instructions on the same page, for example:
<meta name="robots" content="noindex, nofollow">
<meta name="robots" content="index, follow">
They will apply the more restrictive instruction, that is, they will treat the page as noindex, nofollow.
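The "more restrictive instruction wins" rule can be sketched in a few lines of Python. This is a simplified illustration, not any crawler's actual implementation: only index and follow are tracked, and effective_rules is a hypothetical name. It also encodes the default behavior that with no robots tags at all, the page is indexed and its links are followed.

```python
def effective_rules(tag_contents: list[str]) -> dict[str, bool]:
    """Combine the content values of every robots meta tag on one page."""
    # With no tags at all, crawlers default to indexing and following links.
    rules = {"index": True, "follow": True}
    for content in tag_contents:
        values = {v.strip().lower() for v in content.split(",")}
        # A restriction in any tag overrides a permission in another tag.
        if "noindex" in values or "none" in values:
            rules["index"] = False
        if "nofollow" in values or "none" in values:
            rules["follow"] = False
    return rules


conflicting = ["noindex, nofollow", "index, follow"]
print(effective_rules(conflicting))  # {'index': False, 'follow': False}
print(effective_rules([]))           # {'index': True, 'follow': True}
```

Because restrictions only ever switch a flag off, the order in which the two tags appear on the page does not change the result.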
By default, crawlers index web pages and follow their outgoing links (after all, this is exactly their purpose…), so there is no need to add a robots tag with values such as index or follow.
When should you avoid indexing pages?
For example: pages with private content, pages that require authorization, login pages for the content management system, and duplicate-content pages.
Note that the robots meta tag is not case-sensitive, so its values can be written as noindex, NOINDEX, or Noindex.
From a crawler's point of view, the difference between robots.txt instructions and robots meta tag instructions is this: if robots.txt blocks a page from being crawled, crawlers will not fetch it and therefore will not read its meta tags. Conversely, if robots.txt allows the page to be crawled but its meta tag blocks, for example, indexing, crawlers will fetch the page and read the meta tag, but will not index the page.
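The order of checks described above can be sketched with Python's standard urllib.robotparser module. The robots.txt content, the example.com URLs and the crawler_outcome function are hypothetical, and a real crawler's logic is far more involved; the sketch only shows that the meta tag is consulted after, and only if, robots.txt allows the page to be fetched.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a placeholder site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def crawler_outcome(url: str, meta_content: str) -> str:
    """Simplified view of what a well-behaved crawler does with one URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    if not parser.can_fetch("*", url):
        # Blocked by robots.txt: the page is never fetched,
        # so its robots meta tag is never read.
        return "not crawled"
    # Allowed by robots.txt: the page is fetched and its meta tag is read.
    values = {v.strip().lower() for v in meta_content.split(",")}
    if "noindex" in values or "none" in values:
        return "crawled, not indexed"
    return "crawled and indexed"


print(crawler_outcome("https://example.com/private/page.html", "noindex"))
# not crawled
print(crawler_outcome("https://example.com/blog/post.html", "noindex"))
# crawled, not indexed
```

In the first call the page carries noindex, but the crawler never sees it, because robots.txt stops it before the page is fetched.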
Some differences between the robots.txt file and the robots meta tag
- Preventing crawling: the robots.txt file prevents crawling, while the robots meta tag does not.
- Preventing indexing: both the robots.txt file and the robots meta tag can prevent indexing, although with robots.txt, blocked pages may still appear in the index without a title and without a description text (meta description).
- Preventing links from being recorded: the robots.txt file does not prevent a link from being recorded, while the robots meta tag removes the link from the link map, at least from Google's.
- Differences in use: use the robots.txt file mainly when you want to block access to an entire directory; use the robots meta tag when you have no access to the site's root directory (where robots.txt is placed), or when you want to prevent indexing of a specific page, or following of links from that page only.
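To illustrate the directory-level use of robots.txt, the sketch below uses Python's urllib.robotparser to show that a single Disallow rule covers every URL under a directory. The /admin/ path, the example.com URLs and the blocked_by_robots_txt helper are hypothetical. Blocking a single page without touching robots.txt would instead mean adding a tag such as <meta name="robots" content="noindex"> to that page's head section.

```python
from urllib.robotparser import RobotFileParser


def blocked_by_robots_txt(robots_lines: list[str], urls: list[str]) -> list[str]:
    """Return the URLs that robots.txt disallows for every user agent ("*")."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return [url for url in urls if not parser.can_fetch("*", url)]


# Hypothetical robots.txt: one rule covers the whole /admin/ directory.
robots_lines = ["User-agent: *", "Disallow: /admin/"]
urls = [
    "https://example.com/admin/login.php",
    "https://example.com/admin/users/list.php",
    "https://example.com/blog/post.html",
]
for url in blocked_by_robots_txt(robots_lines, urls):
    print("blocked:", url)
# blocked: https://example.com/admin/login.php
# blocked: https://example.com/admin/users/list.php
```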