The ultimate guide to bot herding and spider wrangling — Part Two

In Part One of our three-part series, we learned what bots are and why crawl budgets are important. Let's take a look at how to let search engines know what's important, along with some common coding issues.

How to let search engines know what's important

When a bot crawls your site, there are a variety of cues that direct it through your files.

Like humans, bots follow links to get a sense of the information on your site. But they're also looking through your code and directories for specific files, tags and elements. Let's take a look at a number of these elements.

Robots.txt

The first thing a bot will look for on your site is your robots.txt file.

For complex sites, a robots.txt file is essential. For smaller sites with just a handful of pages, a robots.txt file may not be necessary; without it, search engine bots will simply crawl everything on your site.

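If you do want to keep a placeholder robots.txt on a small site, a minimal file that allows all crawling is enough. This is just an illustrative sketch, not taken from any particular site:

# Allow every crawler; an empty Disallow value blocks nothing
User-agent: *
Disallow:
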
There are two main ways you can guide bots using your robots.txt file.

1. First, you can use the "disallow" directive. This instructs bots to ignore specific uniform resource locators (URLs), files, file extensions, or even whole sections of your site:

User-agent: Googlebot
Disallow: /example/
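
To cover the other cases mentioned above, here is a hedged sketch (the paths are purely hypothetical) that uses the wildcard matching supported by major search engines such as Google and Bing:

User-agent: Googlebot
# Block an entire section of the site (hypothetical path)
Disallow: /archive/
# Block a single file (hypothetical path)
Disallow: /private/old-page.html
# Block a file extension; the * and $ wildcards are supported by Google and Bing
Disallow: /*.pdf$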

Even though the disallow directive will stop bots from crawling particular parts of your site (therefore saving on crawl budget), it won't necessarily stop those pages from being indexed and showing up in search results.

The cryptic and unhelpful "No information is available for this page" message is not something you'll want to see in your search listings.

The above example came about because of this disallow directive in census.gov/robots.txt:

User-agent: Googlebot
Crawl-delay: 3

Disallow: /cgi-bin/

2. Another way is to use the noindex directive. Noindexing a particular page or file will not prevent it from being crawled; however, it will prevent it from being indexed (or remove it from the index). This robots.txt directive is unofficially supported by Google, and is not supported at all by Bing (so be sure to have a User-agent: * set of disallows for Bingbot and other bots other than Googlebot):

User-agent: Googlebot
Noindex: /example/
User-agent: *
Disallow: /example/

Obviously, since these types of pages are stil…

[Read the full article on Search Engine Land.]


Opinions expressed in this article are those of the guest author and not necessarily Marketing Land. Staff authors are listed here.


About The Author

Stephan Spencer is the creator of the 3-day immersive SEO seminar Traffic Control; an author of the O'Reilly books The Art of SEO, Google Power Search, and Social eCommerce; founder of the SEO agency Netconcepts (acquired in 2010); inventor of the SEO proxy technology GravityStream; and the host of two podcast shows, The Optimized Geek and Marketing Speak.

