
Monday, March 31, 2008

Making harmonious use of Webmaster Tools and Analytics

Written by Reid Yokoyama, Search Quality Team

Occasionally in the discussion group, webmasters ask, "Should I be using Google Webmaster Tools or Google Analytics?" Our answer is: use both! Here are three scenarios that really highlight the power of both tools.

1. Make the most of your impressions
One of my favorite features of Webmaster Tools is that it will show you the Top 20 search queries your site appeared for along with the Top 20 clicked queries. The data from the Top Search Queries allows you to quickly pinpoint what searches your site appears for and which of those searches are resulting in clicks. Let's look at last week's data for www.google.com/webmasters as an example.


As you can see, Google Webmaster Central is receiving a great number of impressions for the query [gadgets] but may not be fully capitalizing on these impressions with user clicks. Click on [gadgets] to see how your site appears in our search results. Do your title and snippet look appealing to users? As my colleague Michael recently wrote, it might be time to do some "housekeeping" on your website -- it's a great, low-to-no-cost way to catch the attention of your users. For example, we could work to improve our snippet from:

To something more readable such as "Use gadgets to easily add cool, dynamic content to your site..." by adding a meta description to the URL.

And what are users doing when they visit your site? Are they browsing your content or bouncing off your site quickly? To find out, Google Analytics will calculate your site's "bounce rate," or the percentage of single-page visits (e.g. someone just visiting your homepage and then leaving). This can be a helpful measure of the quality of your site's landing page and the traffic your site receives. After all, once you've worked hard to get your users to visit your site, you want to keep them there! Check out the Analytics blog for further information about "bounce rate."

2. Perform smart geo-targeting
Let's imagine you have a .com that you want to target at a Japanese market. Webmaster Tools allows you to set a geographic target for your site, where you would probably pick Japan. But, doing so is not an immediate solution. You can confirm the location of your visitors using the map overlay of Analytics, right up to the city level. You can also discover what types of users are accessing your site - including their browser and connection speed. If users cannot access your website due to an incompatible browser or slower connection speeds, you may need to rethink your website's design. Doing so can go a long way toward achieving the level of relevant traffic you would like.

3. Control access to sensitive content
One day, you log into Analytics and look at your "Content by Title" data. To your shock, you discover that users are visiting your /privatedata pages. Have no fear! Go into Webmaster Tools and use the URL removal tool to remove those pages from Google's search results. Modifying your robots.txt file to disallow that section will also block Googlebot from crawling it in the future.
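To double-check that a robots.txt change really covers the section in question, a short script can help. The following is a minimal sketch using Python's standard `urllib.robotparser`; the rule text, the function name `is_crawlable` and the example URLs are assumptions chosen for illustration, and the parser only approximates how any particular crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content that blocks the /privatedata section.
ROBOTS_TXT = """\
User-agent: *
Disallow: /privatedata
"""


def is_crawlable(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Calling `is_crawlable(ROBOTS_TXT, "Googlebot", "http://www.example.com/privatedata/report.html")` should report the page as blocked, while the homepage stays crawlable. Keep in mind that this only checks crawling rules; pages already in the search results still need the URL removal tool.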

For more tips and tricks on Analytics, check out the Analytics Help Center. If you have any more suggestions, feel free to comment below or in our Webmaster Help Group.

Thursday, March 27, 2008

Speaking the language of robots



We all know how friendly Googlebot is. And like all benevolent robots, he listens to us and respects our wishes about parts of our site that we don't want crawled. We can just give him a robots.txt file explaining what we want, and he'll happily comply. But what if you're intimidated by the idea of communicating directly with Googlebot? After all, not all of us are fluent in the language of robots.txt. This is why we're pleased to introduce you to your personal robot translator: the Robots.txt Generator in Webmaster Tools. It's designed to give you an easy and interactive way to build a robots.txt file. It can be as simple as entering the files and directories you don't want crawled by any robots.

Or, if you need to, you can create fine-grained rules for specific robots and areas of your site.
Once you're finished with the generator, feel free to test the effects of your new robots.txt file with our robots.txt analysis tool. When you're done, just save the generated file to the top level (root) directory of your site, and you're good to go. There are a couple of important things to keep in mind about robots.txt files:
  • Not every search engine will support every extension to robots.txt files
The Robots.txt Generator creates files that Googlebot will understand, and most other major robots will understand them too. But it's possible that some robots won't understand all of the robots.txt features that the generator uses.
  • Robots.txt is simply a request
Although it's highly unlikely from a major search engine, there are some unscrupulous robots that may ignore the contents of robots.txt and crawl blocked areas anyway. If you have sensitive content that you need to protect completely, you should put it behind password protection rather than relying on robots.txt.
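To make the idea of fine-grained rules concrete, here is a minimal sketch of a robots.txt file with one group for a specific robot and one for everyone else, checked with Python's standard `urllib.robotparser`. The directory names are hypothetical, this is not the generator's actual output, and other robots may interpret the same file differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep the image crawler out of /photos/,
# and keep every other robot out of /cgi-bin/.
RULES = """\
User-agent: Googlebot-Image
Disallow: /photos/

User-agent: *
Disallow: /cgi-bin/
"""


def build_parser(robots_txt):
    """Parse robots.txt text and return a ready-to-query parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser
```

With `parser = build_parser(RULES)`, a call such as `parser.can_fetch("Googlebot-Image", "http://www.example.com/photos/cat.jpg")` tells you whether that robot may fetch that URL. Note that in this parser a robot that matches its own group is checked against that group only, not against the `*` group as well.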

We hope this new tool helps you communicate your wishes to Googlebot and other robots that visit your site. If you want to learn more about robots.txt files, check out our Help Center. And if you'd like to discuss robots.txt and robots with other webmasters, visit our Google Webmaster Help Group.

Wednesday, March 26, 2008

Bionic Posters help webmasters worldwide



Setting up and running a website is getting easier and easier, and it's wonderful to see so many new webmasters sharing their voices with the world! For you as a webmaster it's pretty easy going... until you run into an issue that you just can't seem to solve on your own. Maybe some technical issues were flagged in your Webmaster Tools account; maybe you're just trying to get your robots.txt to block a certain part of your site; or maybe someone reported that they got a virus while visiting your site (gasp!). All of these issues can come up, and sometimes it helps to have a helping hand when diagnosing and solving them.

Our Google Webmaster Help Group is a great place to get help. There are many webmasters active in our group, friendly and ready to help others, often with first-hand experience. They can show you what might be wrong, show you how you can find answers in the future, and point you towards a solution that you'll be able to use.

Just recently a webmaster came into the groups with a website that was having strange problems. Less than 20 minutes later, one of our dedicated members replied and pointed the webmaster to hidden content that was placed on their site by someone else. Finding that is bad enough; but not finding it is even more frustrating.

While there are lots of helpful people in our groups, we have some that really stand out as being exceptionally active, helpful, competent and friendly. They volunteer time and energy to help build a great community and to help webmasters all around the world. In order to more publicly recognize their contributions, we're calling them our Bionic Posters. We want to highlight their outstanding efforts and thank them for the sound advice they've offered to so many.

We wanted to take a minute and send a shout out to our Bionic Posters:
Thank you all for helping to make the Webmaster Help Group such a success!
Come and visit the Webmaster Help Groups and see how you can make a difference as well. Be bionic!

Tuesday, March 25, 2008

Taking advantage of universal search, part 2



Universal search and personalized search were two of the hot topics at SMX West last month. Many webmasters wanted to know how these evolutions in search influence the way their content appears in search results, and how they can use these features to gain more relevant search traffic. We posted several recommendations on how to take advantage of universal search last year. Here are a few additional tips:
  1. Local search: Help nearby searchers find your business.
    Of the various search verticals, local search was the one we heard the most questions about. Here are a few tips to help business owners get the most out of local search:
  2. Video search: Enhance your video results.
    Several site owners asked whether they could specify a preferred thumbnail image for videos when they appear in search results. Good news: our Video Sitemaps protocol lets you suggest a thumbnail for each video.
  3. Personalized search basics
    A few observations from Googler Phil McDonnell:
    • Personalization of search results is usually accomplished through subtle ranking changes, rather than a drastic rearrangement of results. You shouldn't worry about personalization radically altering your site's ranking for a particular query.
    • Targeting a niche, or filling a very specific need, may be a good way to stand out in personalized results. For example, rather than creating a site about "music," you could create a site about the musical history of Haiti. Or about musicians who recorded with Elton John between 1969 and 1979.
    • Some personalization is based on the geographic location of the searcher; for example, a user searching for [needle] in Seattle is more likely to get search results about the Space Needle than, say, a searcher in Florida. Take advantage of features like Local Business Center and geographic targeting to let us know whether your website is especially relevant to searchers in a particular location.
    • As always, create interesting, unique and compelling content or tools.
  4. Image search: Increase your visibility.
    One panelist presented a case study in which a client's images were being filtered out of search results by SafeSearch because they had been classified as explicit. If you find yourself in this situation and believe your site should not be filtered by SafeSearch, use this contact form to let us know. Select the Report a problem > Inappropriate or irrelevant search results option and describe your situation.
Feel free to leave a comment if you have other tips to share!

Monday, March 24, 2008

Join us for an online live chat this Friday!

Many of us Webmaster Help Group guides have happily gotten to meet Group members at various functions around the world. Over sandwiches in San Jose. Salads in Stockholm. Sweets in Sydney. And it's been super!

But some of you are hard to find and so we haven't had the pleasure of chatting with you. Therefore, we've decided to visit you right through your beloved monitor screen.

On Friday, March 28 at 9am PDT / noon EDT / 16:00 GMT, we'll be having our first-ever all-group live chat, where you'll have a chance to hear and see us answer some of your most pressing questions. All that's required is a phone (we'll pay for the call), a sufficiently-modern web browser, and an internet connection.

We'll be posting a "sticky note" with more details in the Random Chitchat section of the Group a day or two before this online meetup, and we're looking forward to chatting with you soon!

Talkatively yours,
Adam and the English Webmaster Help Guides

EDITED ON MARCH 26 TO ADD:
More information about the chat is now available on this page.

Thursday, March 20, 2008

Good housekeeping



Today's the first day of spring in the Northern Hemisphere, so now is a perfect time to start your spring cleaning. But as a webmaster, your chores don't end after you've cleaned the garage -- you'll probably want to do some cleaning on your server as well.

Exterior
Before we get to the interior, step outside, and see how your site looks from the street -- or in Google search results. Just head on over to your nearest Google search box, and do a site search on your site using the query format [site:example.com]. Just like you keep your street number visible on your house, and maybe even your name on the mailbox, check to see that your visitors can easily identify your site and its contents from the title and snippet listed in Google. If you'd like to improve your current appearance, try out the content analysis feature in Webmaster Tools, and read up on how to influence your snippets.


Speaking of making your address visible, how are you listed? My name is Michael, but I'll also answer to Mike or even Wysz. However, I only expect to be listed once in the phone book. Similarly, your site may have pages that can be accessed from multiple URLs: for instance, www.example.com and example.com. To consolidate your site's listings in Google, use 301 redirects to tell Google (and other search engines) how you'd prefer your pages to be listed. You can also easily let Google know about your preferred domain via Webmaster Tools. And just like I'd want my bank to understand that deposits to Mike and Michael should route to the same account, those redirects can help Google appropriately consolidate link properties (like PageRank) to the destination page.
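The redirect decision itself is simple to express. Below is a minimal sketch in Python, assuming www.example.com is the preferred host and example.com is the only alias; the names `PREFERRED_HOST`, `ALIASES` and `canonical_redirect` are hypothetical, and in practice this logic usually lives in your web server's configuration rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit

PREFERRED_HOST = "www.example.com"  # assumed preferred domain
ALIASES = {"example.com"}           # assumed alternate hosts for the same site


def canonical_redirect(url):
    """Return (301, target_url) if url uses an alias host, otherwise None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in ALIASES:
        return None
    # Keep scheme, path and query; swap only the host and drop any fragment.
    target = urlunsplit(
        (parts.scheme, PREFERRED_HOST, parts.path or "/", parts.query, "")
    )
    return 301, target
```

A request for `http://example.com/about?x=1` would then be answered with a 301 pointing at the same path and query on the preferred host, while requests that already use the preferred host are served normally.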

Interior
No matter how clean your home is, all that work may go unnoticed if your visitors can't get in the door or find their way around. Review your site's appearance and functionality on multiple browsers to make sure that all of your visitors get the experience you've worked so hard to design. Not everyone uses Internet Explorer, so it's a good idea to test using browsers representing different layout engines. Firefox, Safari, and Opera all see things differently, and these three browsers likely control how at least 20% of your users are experiencing the web. For some sites it can be dramatically higher -- The New York Times recently reported that around 38% of their online readers used either Firefox or Safari.

If your site requires the use of plug-ins, check to see how this additional content behaves across different operating systems. Keep in mind that many people only update their operating system with the purchase of a new computer, so go back a version or two and see how your site works on yesterday's OS. And to make sure you're not completely shutting out visitors with limited capabilities, try to navigate your site without using images, Flash or JavaScript. If you want to see where Google may be having trouble getting in, check Webmaster Tools to see if there have been any crawl errors reported for your site.

Taking out the trash
Unfortunately, many of us have hosted unwelcome guests. If they left a mess behind, do your future visitors a favor and get rid of the garbage. Tear out spammed guestbook pages. Pull out those weeds in your forum that were planted by an off-topic advertiser. And while you're throwing stuff away, look out for any blank or abandoned pages. We've all had projects in the basement that never got finished. If your site still shows URLs with one of those circa-1997 "under construction" graphics or templates showing "Products > Shirts > Graphic T's: There are no graphic t's at this time" and they're just gathering dust, it's probably safe to say you'll never get around to finishing it. After you've collected the junk and corrected any broken links on your site, make sure you let everyone know it's really gone by using the 404 HTTP status code. You can check to see which code your server is returning by using the Live HTTP Headers extension for Firefox.
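The status code check can also be scripted. Here is a minimal sketch using Python's standard `http.client`, assuming plain HTTP on port 80 unless told otherwise; the function names `fetch_status` and `still_served` are hypothetical. Because `http.client` does not follow redirects, you see the code your server actually returned.

```python
import http.client


def fetch_status(host, path, port=80, timeout=5.0):
    """Return the HTTP status code the server sends for a GET on path."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()


def still_served(host, removed_paths, port=80):
    """Return the removed paths that do not answer with a 404."""
    return [p for p in removed_paths if fetch_status(host, p, port) != 404]
```

For example, `still_served("www.example.com", ["/guestbook-old.html"])` would return an empty list once the removed page really answers with a 404.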

Security and preventive maintenance
To prevent problems with future visitors, especially those who may try to come in your back door at night, go through our checklist to verify you've covered security basics.

If your site's maintenance tasks, such as upgrading software packages, make your content temporarily unavailable, let your visitors know to "pardon the dust" by using the 503 HTTP status code. This will let Google know to check back later, and not index your error page as part of your site's content. If you're using WordPress, you can easily set up your message along with the status code using the Maintenance Mode plug-in.
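If you are not using WordPress, the same idea can be sketched as a tiny WSGI application in Python. This is a minimal illustration, not a recommended production setup: the function name `maintenance_app` and the message text are hypothetical, and the `Retry-After` header (a standard HTTP header that hints when to come back) is an optional addition with an assumed value of one hour.

```python
def maintenance_app(environ, start_response):
    """WSGI app that answers every request with a temporary 503."""
    body = b"Pardon the dust: we are upgrading and will be back shortly.\n"
    start_response(
        "503 Service Unavailable",
        [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
            ("Retry-After", "3600"),  # assumed value: suggest retrying in an hour
        ],
    )
    return [body]
```

To try it locally, you could serve it with the standard library: `from wsgiref.simple_server import make_server; make_server("", 8000, maintenance_app).serve_forever()`.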

And speaking of intruders and software updates, you just never know when something will go wrong. Before something does happen, now is a great time to evaluate your backup strategy. Like insurance for your home, the effort and expense put into it is well worth the peace of mind alone, not to mention if you ever actually need it. A good backup system archives your backups in a different location than the working site, and happens automatically to avoid the problems of forgetfulness. It's a great idea to make a backup of your site (including databases) right before running any software updates or making a major change.

Tuesday, March 18, 2008

SES London Calling!


February is that time of the year: the Search Engine Strategies conference hits London! A few of us were there to meet webmasters and search engine representatives to talk about the latest trends and issues in the search engine world.

It was a three-day marathon full of interesting talks - and of course, we heard a lot of good questions in between the sessions! If you didn't get a chance to talk with us, fear not: we've pulled together some of the best questions we encountered. You can find a few of them below, and an additional set in our Webmaster Help Group. Please join the discussion!

Why should I upload a Sitemap to Google Webmaster Tools, if my site is crawled just fine?

All sites can benefit from submitting a Sitemap to Google Webmaster Tools. You may help us to do a better job of crawling and understanding your site, especially if it has dynamic content or a complicated architecture.

Besides, you will have access to more information about your site, for example the number of pages from your Sitemaps that are indexed by Google, any errors Google found with your Sitemap, as well as warnings about potential problems. Also, you can submit specialized Sitemaps for certain types of content including Video, Mobile, News and Code.
More information about the benefits of submitting a Sitemap to Google Webmaster Tools can be found here.
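For a small site, a Sitemap file can be generated with a few lines of code. The following is a minimal sketch in Python using the standard `xml.etree.ElementTree` module and the sitemaps.org namespace; the function name `build_sitemap` and the example URLs are hypothetical, and only the `loc` and optional `lastmod` elements are written.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build Sitemap XML from (url, last_modified) pairs.

    last_modified is a date string such as "2008-03-18", or None to omit it.
    """
    ET.register_namespace("", SITEMAP_NS)  # serialize without a prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod:
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")
```

Calling `build_sitemap([("http://www.example.com/", "2008-03-18"), ("http://www.example.com/about", None)])` returns the XML as a string without an XML declaration, so you may want to add one before saving the file and submitting it in Webmaster Tools.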

How do you detect paid links? If I want to stay on the safe side, should I use the "nofollow" attribute on all links?

We blogged about our position on paid links and the use of nofollow a few months ago. You may also find it interesting to read this thread in our Help Group about appropriate uses of the nofollow attribute.

How do I associate my site with a particular country/region using Google Webmaster Tools? Can I do this for a dynamic website?

The instructions in our Help Center explain that you can associate a country or region to an entire domain, individual subdomains or subdirectories. A quick tip: if, for instance, you are targeting the UK market, better ways of structuring your site would be example.co.uk, uk.example.com, or example.com/uk/. Google can geolocate all of those patterns.

If your domain name has no regional significance, such as www.example.com, you can still associate your website with a country or region. To do that you will need to verify the domain, or the subdomains and/or subdirectories one by one in your Webmaster Tools account and then associate each of them with a country/region. However, for the moment we don't support setting a geographical target for patterns that can't be verified such as, for example, www.example.com/?region=countrycode.
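To see which of these patterns a given URL follows, a rough check is enough. The sketch below, in Python, is only an illustration of the four patterns named above; the function `region_signal` and its labels are hypothetical, and it is in no way how Google itself evaluates URLs.

```python
from urllib.parse import urlsplit, parse_qsl


def region_signal(url, code):
    """Say where the region code appears in url: domain, subdomain,
    subdirectory, query parameter, or nowhere."""
    code = code.lower()
    parts = urlsplit(url)
    labels = (parts.hostname or "").lower().split(".")
    segments = [s for s in parts.path.split("/") if s]
    if labels[-1] == code:
        return "country-code domain"   # e.g. example.co.uk
    if len(labels) > 2 and labels[0] == code:
        return "subdomain"             # e.g. uk.example.com
    if segments and segments[0].lower() == code:
        return "subdirectory"          # e.g. example.com/uk/
    if any(value.lower() == code for _, value in parse_qsl(parts.query)):
        return "query parameter"       # cannot be verified as its own site
    return "none"
```

The first three results correspond to structures that can be verified and targeted in Webmaster Tools; a "query parameter" result corresponds to the pattern that is not supported for the moment.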

I have a news site and it is not entirely crawled. Why? Other crawlers had no problem crawling us...

First off, make sure that nothing prevents us from crawling your news site - the architecture of your site or the robots.txt file. Also, we suggest you sign up for Webmaster Tools and submit your content. We specifically have the News Sitemap protocol for sites offering this type of content. If you take advantage of this feature, we can give you more information on which URLs we had trouble with and why. It really rocks!

A quick note to conclude: the lively, international environment of SES is always incredible. I have had a lot of interesting conversations in English, as well as in Italian, French and Spanish. Fellow Googlers chatted with webmasters in English, Danish, Dutch, German and Hungarian. That's amazing - and a great opportunity to get to know each other better, in the language you speak! So next time you wonder how Google Universal Search works in English or you're concerned about Google News Search in German, don't hesitate; grab us for a chat or write to us!

Saturday, March 15, 2008

Tips for making information universally accessible





Many people talk about the effect the Internet has on democratizing access to information, but as someone who has been visually impaired since my teenage years, I can certainly speak to the profound impact it has had on my life.

In everyday life, things like a sheet of paper—and anything written on it—are completely inaccessible to a blind or visually impaired user. But with the Internet a new world has opened up for me and so many others. Thanks to modern technology like screen readers, web pages, books, and web applications are now at our fingertips.

In order to help the visually impaired find the most relevant, useful information on the web, and as quickly as possible, we developed Accessible Search. Google Accessible Search identifies and prioritizes search results that are more easily used by blind and visually impaired users – that means pages that are clean and simple (think of the Google homepage!) and that can load without images.

Why should you take the time to make your site more accessible? In addition to the service you'll be doing for the visually-impaired community, accessible sites are more easily crawled, which is a first step in your site's ability to appear in search results.

So what can you do to make your sites more accessible? Well first of all, think simple. In its current version, Google Accessible Search looks at a number of signals by examining the HTML markup found on a web page. It tends to favor pages that degrade gracefully: pages with few visual distractions and that are likely to render well with images turned off. Flashing banners and dancing animals are probably the worst thing you could put on your site if you want its content to be read by an adaptive technology like a screen reader.

Here are some basic tips:
  1. Keep web pages easy to read, avoiding visual clutter and ensuring that the primary purpose of the web page is immediately accessible with full keyboard navigation.

  2. There are many organizations and online resources that offer website owners and authors guidance on how to make websites and pages more accessible for the blind and visually impaired. The W3C publishes numerous guidelines, including the Web Content Accessibility Guidelines, that are helpful for website owners and authors.

  3. As with regular search, the best thing you can do with respect to making your site rank highly is to create unique, compelling content. In fact, you can think of the Google crawler as the world's most influential blind user. The content that matters most to the Googlebot is the content that matters most to the blind user: good, quality text.

  4. It's also worth reviewing your content to see how accessible it is for other end users. For example, try browsing your site on a monochrome display or try using your site without a mouse. You may also consider your site's usability through a mobile device like a Blackberry or iPhone.

Fellow webmasters, thanks for taking the time to better understand principles of accessibility. In my next post I'll talk about how to make sure that critical site features, like site navigation, are accessible. Until then!

Friday, March 14, 2008

German Webmaster Blog turns one

Written by Juliane Stiller, Search Quality


Our German Webmaster Central Blog celebrates its first birthday and we'd like to raise our glasses to 57 published posts in the last year! We enjoy looking back at an exciting first year of blogging and communicating with webmasters. It's the growing webmaster community that made this blog a success. Thanks to our readers for providing feedback on our blog posts and posting in the German Webmaster Help group.

Over the past year, we published numerous articles specifically targeted for the German market - topics varying from affiliate programs to code snippets. We also translated many of the applicable English posts for the German blog. If you speak German (Hallo!) come check out the German Webmaster Blog and subscribe to our feed or email alert.

Hope to see you soon,
Juliane Stiller on behalf of the German Webmaster Communication Team

Tuesday, March 11, 2008

Webmaster Tools keeps your "messages waiting"



We’re happy to announce that the Message Center supports a new “messages waiting” feature. Previously, it could only store penalty notifications for existing verified site owners (webmasters who had already verified their sites). However, the Message Center now has the ability to keep these waiting for future owners, i.e. those who haven’t previously registered with Google's Webmaster Tools.

Creating a new Webmaster Tools account and verifying your site gives you access to any message from Google concerning violations of our Webmaster Guidelines. Messages sent after the launch of this feature can now be retrieved for one year and will remain in your account until you choose to delete them.

Some questions you might be asking:

Q: What happens to old messages when a site changes ownership?
A: The same applies when a site changes ownership: new verified owners will be able to retrieve messages as noted above.

Q: If a site has more than one verified owner and one of them deletes a message, will it be deleted for all the other site owners as well?
A: No, each owner gets his or her own copy of the message when retrieving the message. Deleting one does not affect any past, current, or future message retrievals.

Just as before, if you've received a message alerting you to Webmaster Guidelines violations, you can make the necessary changes so that your site is in line with our guidelines. Then, sign in to Webmaster Tools and file a reconsideration request.

Wednesday, March 5, 2008

First date with the Googlebot: Headers and compression




Name/User-Agent: Googlebot
IP Address: Verify it here
Looking For: Websites with unique and compelling content
Major Turn Off: Violations of the Webmaster Guidelines
Googlebot -- what a dreamboat. It's like he knows us <head>, <body>, and soul.  He's probably not looking for anything exclusive; he sees billions of other sites (though we share our data with other bots as well :), but tonight we'll really get to know each other as website and crawler.

I know, it's never good to over-analyze a first date. We're going to get to know Googlebot a bit more slowly, in a series of posts:
  1. Our first date (tonight!): Headers Googlebot sends, file formats he "notices," whether it's better to compress data
  2. Judging his response: Response codes (301s, 302s), how he handles redirects and If-Modified-Since
  3. Next steps: Following links, having him crawl faster or slower (so he doesn't come on too strong)
And tonight is just the first date...

***************
Googlebot:  ACK
Website:  Googlebot, you're here!
Googlebot:  I am.

GET / HTTP/1.1
Host: example.com
Connection: Keep-alive
Accept: */*
From: googlebot(at)googlebot.com
User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Accept-Encoding: gzip,deflate

Website:  Those headers are so flashy! Would you crawl with the same headers if my site were in the U.S., Asia or Europe? Do you ever use different headers?

Googlebot:  My headers are typically consistent world-wide. I'm trying to see what a page looks like for the default language and settings for the site. Sometimes the User-Agent is different, for instance AdSense fetches use "Mediapartners-Google":
  User-Agent: Mediapartners-Google

Or for image search:
  User-Agent: Googlebot-Image/1.0

Wireless fetches often have carrier-specific user agents, whereas Google Reader RSS fetches include extra info such as number of subscribers.
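Webmasters who want to tell these fetches apart in their own logs could start from something like the following. It is a minimal sketch in Python that relies only on the User-Agent strings quoted above; the function name `classify_crawler` and the category labels are hypothetical. Since a User-Agent header is easy to forge, treat the result as a hint and verify the IP address as well.

```python
def classify_crawler(user_agent):
    """Guess which Google fetcher sent a request, based on its User-Agent."""
    ua = user_agent.lower()
    if "mediapartners-google" in ua:
        return "adsense"
    # Check the more specific image crawler before the generic name.
    if "googlebot-image" in ua:
        return "image search"
    if "googlebot" in ua:
        return "web search"
    return "unknown"
```

The order of the checks matters: "Googlebot-Image/1.0" also contains the text "Googlebot", so the more specific name is tested first.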

I usually avoid cookies (so no "Cookie:" header) since I don't want the content affected too much by session-specific info. And, if a server uses a session id in a dynamic URL rather than a cookie, I can usually figure this out, so that I don't end up crawling your same page a million times with a million different session ids.


Website:  I'm very complex. I have many file types. Your headers say "Accept: */*". Do you index all URLs or are certain file extensions automatically filtered?

Googlebot:  That depends on what I'm looking for.

If I'm indexing for regular web search, and I see links to MP3s and videos, I probably won't download those. Similarly, if I see a JPG, I will treat it differently than an HTML or PDF link. For instance, JPG is much less likely to change frequently than HTML, so I will check the JPG for changes less often to save bandwidth. Meanwhile, if I'm looking for links as Google Scholar, I'm going to be far more interested in the PDF article than the JPG file. Downloading doodles (like JPGs) and videos of skateboarding dogs is distracting for a scholar—do you agree?

Website:  Yes, they can be distracting. I'm in awe of your dedication. I love doodles (JPGs) and find them hard to resist.

Googlebot:  Me, too; I'm not always so scholarly. When I crawl for image search, I'm very interested in JPGs. And for news, I'm mostly looking at HTML and nearby images.

There are also plenty of extensions (exe, dll, zip, dmg...) that tend to be big and less useful for a search engine.


Website:  If you saw my URL, http://www.example.com/page1.LOL111, would you (whimper whimper) reject it just because it contains an unknown file extension?

Googlebot:  Website, let me give a bit more background. After actually downloading a file, I use the Content-Type header to check whether it really is HTML, an image, text, or something else. If it's a special data type like a PDF file, Word document, or Excel spreadsheet, I'll make sure it's in the valid format and extract the text content. Maybe it has a virus; you never know. If the document or data type is really garbled, there's usually not much to do besides discard the content.

So, if I'm crawling http://www.example.com/page1.LOL111 with an unknown file extension, it's likely that I would start to download it. If I can't figure out the content type from the header, or it's a format that we don't index (e.g. mp3), then it'll be put aside. Otherwise, we proceed to index the file.
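The takeaway for webmasters is that the Content-Type header matters more than the file extension. As a rough illustration only, and certainly not Googlebot's actual logic, the decision described above could be sketched in Python like this; the function name `plan_for`, the returned labels and the exact set of media types are assumptions.

```python
# Media types for the "special" document formats named above (assumed list).
EXTRACT_TEXT = {
    "application/pdf",
    "application/msword",
    "application/vnd.ms-excel",
}


def plan_for(content_type_header):
    """Decide how to treat a downloaded file based on its Content-Type."""
    if not content_type_header:
        return "set aside"  # no usable content type
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    if media_type in EXTRACT_TEXT:
        return "validate format and extract text"
    if media_type in ("text/html", "text/plain"):
        return "index as text"
    if media_type.startswith("image/"):
        return "index as image"
    return "set aside"  # e.g. audio/mpeg
```

Under this sketch, http://www.example.com/page1.LOL111 served with `Content-Type: text/html; charset=UTF-8` is handled like any other HTML page, whatever its extension says.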


Website:  My apologies for scrutinizing your style, Googlebot, but I noticed your Accept-Encoding headers say:
Accept-Encoding: gzip,deflate

Can you explain these headers to me?

Googlebot:  Sure. All major search engines and web browsers support gzip compression for content to save bandwidth. Other entries that you might see here include "x-gzip" (the same as "gzip"), "deflate" (which we also support), and "identity" (none).


Website:  Can you talk more about file compression and "Accept-Encoding: gzip,deflate"? Many of my URLs consist of big Flash files and stunning images, not just HTML. Would it help you to crawl faster if I compressed my larger files?

Googlebot:  There's not a simple answer to this question. First of all, many file formats, such as swf (Flash), jpg, png, gif, and pdf are already compressed (there are also specialized Flash optimizers).

Website: Perhaps I've been compressing my Flash files and I didn't even know? I'm obviously very efficient.

Googlebot:  Both Apache and IIS have options to enable gzip and deflate compression, though there's a CPU cost involved for the bandwidth saved. Typically, it's only enabled for easily compressible text HTML/CSS/PHP content. And it only gets used if the user's browser or I (a search engine crawler) allow it. Personally, I prefer "gzip" over "deflate". Gzip is a slightly more robust encoding — there is consistently a checksum and a full header, giving me less guess-work than with deflate. Otherwise they're very similar compression algorithms.

If you have some spare CPU on your servers, it might be worth experimenting with compression (links: Apache, IIS). But, if you're serving dynamic content and your servers are already heavily CPU loaded, you might want to hold off.
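To get a feel for the trade-off before touching your server configuration, you can compress a sample page yourself. The following minimal sketch uses Python's standard `gzip` and `zlib` modules; the function names and the sample markup are hypothetical. It compares gzip, which wraps the compressed stream in a header and a checksum trailer, with a bare deflate stream that has neither.

```python
import gzip
import zlib


def raw_deflate(data, level=9):
    """Compress data as a bare DEFLATE stream (no header, no checksum)."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, -zlib.MAX_WBITS)
    return compressor.compress(data) + compressor.flush()


def compare_sizes(data):
    """Return the size in bytes of data as-is, gzipped, and raw-deflated."""
    return {
        "original": len(data),
        "gzip": len(gzip.compress(data, compresslevel=9)),
        "raw deflate": len(raw_deflate(data)),
    }


# Hypothetical sample: repetitive HTML compresses very well.
SAMPLE_PAGE = b"<li><a href='/item'>Another list item</a></li>\n" * 200
```

Running `compare_sizes(SAMPLE_PAGE)` shows a large saving for repetitive markup, with gzip only slightly bigger than the bare stream because of its header and checksum. Feeding it data that is already compressed, such as a JPG, typically shows little or no saving, which is why compression is usually enabled for text content only.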


Website:  Great information. I'm really glad you came tonight — thank goodness my robots.txt allowed it. That file can be like an over-protective parent!

Googlebot:  Ah yes; meeting the parents, the robots.txt. I've met plenty of crazy ones. Some are really just HTML error pages rather than valid robots.txt. Some have infinite redirects all over the place, maybe to totally unrelated sites, while others are just huge and have thousands of different URLs listed individually. Here's one unfortunate pattern. The site is normally eager for me to crawl:
  User-Agent: *
  Allow: /


Then, during a peak time with high user traffic, the site switches the robots.txt to something restrictive:
  # Can you go away for a while? I'll let you back
  # again in the future. Really, I promise!
  User-Agent: *
  Disallow: /


The problem with the above robots.txt file-swapping is that once I see the restrictive robots.txt, I may have to start throwing away content in the index that I've already crawled. And then I have to recrawl a lot of content once I'm allowed to hit the site again. At least a 503 response code would've been temporary.

I typically only re-check robots.txt once a day (otherwise on many virtual hosting sites, I'd be spending a large fraction of my fetches just getting robots.txt, and no date wants to "meet the parents" that often). For webmasters, trying to control crawl rate through robots.txt swapping usually backfires. It's better to set the rate to "slower" in Webmaster Tools.


Googlebot:  Website, thanks for all of your questions, you've been wonderful, but I'm going to have to say "FIN, my love."

Website:  Oh, Googlebot... ACK/FIN. :)

***************