New robots.txt protocol

As they did earlier with the Sitemap protocol, Google, Yahoo! and Live Search have agreed on a joint robots.txt exclusion protocol that also covers meta tags.

This new protocol covers the following elements:

1. Robots.txt Directives

Disallow
  Impact: Tells a crawler not to index your site. Your site’s robots.txt file still needs to be crawled to find this directive, but disallowed pages will not be crawled.
  Use case: ‘No Crawl’ pages from a site. In its default syntax, this directive prevents specific path(s) of a site from being crawled.

Allow
  Impact: Tells a crawler the specific pages on your site you want indexed, so you can use this in combination with Disallow.
  Use case: Particularly useful in conjunction with Disallow clauses, where a large section of a site is disallowed except for a small section within it.

$ Wildcard Support
  Impact: Tells a crawler to match everything from the end of a URL, covering a large number of directories without specifying specific pages.
  Use case: ‘No Crawl’ files with specific patterns, for example files of a certain type that always have a certain extension, say pdf.

* Wildcard Support
  Impact: Tells a crawler to match a sequence of characters.
  Use case: ‘No Crawl’ URLs with certain patterns, for example disallowing URLs with session IDs or other extraneous parameters.

Sitemaps Location
  Impact: Tells a crawler where it can find your Sitemaps.
  Use case: Point to other locations where feeds exist to help crawlers find URLs on a site.
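
To make the wildcard and Allow/Disallow behaviour described above more concrete, here is a minimal Python sketch of how a crawler might evaluate such rules. The function names, the sample rules and the "longest matching pattern wins" tie-break are assumptions made for illustration; the post does not say how each search engine resolves conflicts between Allow and Disallow.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any sequence of characters, and a trailing '$' anchors
    the pattern to the end of the URL. Everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_allowed(path, rules):
    """Return True if `path` may be crawled under `rules`.

    ASSUMPTION: the longest matching pattern wins; real crawlers may
    use a different precedence.
    """
    best = None
    for directive, pattern in rules:
        if pattern and pattern_to_regex(pattern).match(path):
            if best is None or len(pattern) > len(best[1]):
                best = (directive, pattern)
    return best is None or best[0] == "allow"


# Hypothetical rules, equivalent to these robots.txt lines:
#   Disallow: /private/
#   Allow: /private/press/
#   Disallow: /*.pdf$
#   Disallow: /*?sessionid=
RULES = [
    ("disallow", "/private/"),
    ("allow", "/private/press/"),
    ("disallow", "/*.pdf$"),
    ("disallow", "/*?sessionid="),
]

if __name__ == "__main__":
    for path in ["/index.html", "/private/data.html",
                 "/private/press/release.html", "/docs/manual.pdf",
                 "/shop/item?sessionid=abc"]:
        print(path, is_allowed(path, RULES))
```

In this sketch the Allow rule carves a small crawlable section out of a larger disallowed one, which is the combination the Allow row describes, while the `$` rule only blocks URLs that actually end in `.pdf`.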

2. HTML META Directives

NOINDEX META Tag
  Impact: Tells a crawler not to index a given page.
  Use case: Don’t index the page. This allows pages that are crawled to be kept out of the index.

NOFOLLOW META Tag
  Impact: Tells a crawler not to follow a link to other content on a given page.
  Use case: Prevent publicly writeable areas from being abused by spammers looking for link credit. By using NOFOLLOW you let the robot know that you are discounting all outgoing links from this page.

NOSNIPPET META Tag
  Impact: Tells a crawler not to display snippets in the search results for a given page.
  Use case: Present no snippet for the page on Search Results.

NOARCHIVE META Tag
  Impact: Tells a search engine not to show a “cached” link for a given page.
  Use case: Do not make available to users a copy of the page from the Search Engine cache.

NOODP META Tag
  Impact: Tells a crawler not to use a title and snippet from the Open Directory Project for a given page.
  Use case: Do not use the ODP (Open Directory Project) title and snippet for this page.
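
As an illustration of how the META directives above appear in a page, the following Python sketch collects the values of any `<meta name="robots">` tags using the standard library's `html.parser`. The class name, helper function and sample page are hypothetical, and the sketch only reads the directives; what a given search engine does with them is up to that engine.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values keep their case.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_directives(html):
    """Return the set of lowercased robots directives found in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page used only for this example.
PAGE = """<html><head>
<meta name="ROBOTS" content="NOINDEX, NOFOLLOW">
<meta name="robots" content="noarchive">
</head><body>Hello</body></html>"""

if __name__ == "__main__":
    found = robots_directives(PAGE)
    print("index this page:", "noindex" not in found)
    print("follow its links:", "nofollow" not in found)
    print("show cached copy:", "noarchive" not in found)
```

Directives are lowercased before comparison because the tag is commonly written in either case, as in the sample page.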

As well as other REP directives such as:


Everything related to robots.txt can be found on its official page.

  1. Is the "2. HTML META Directives" part new? I don't understand what is actually new here. I think Allow was not official before, but I thought everything else already was.

