Tuesday, August 26, 2008

The Impact of User Feedback, Part 2 (and more Popular Picks!)

As a follow-up to my recent post about how user reports of webspam and paid links help improve Google's search results for millions of users, I wanted to highlight one of the most essential parts of Google Webmaster Central: our Webmaster Help Group. With over 37,000 members in our English group and support in 15 other languages, the group is the place to get your questions answered regarding crawling and indexing or Webmaster Tools. We're thankful for a fabulous group of Bionic Posters who have dedicated their time and energy to making the Webmaster Help Group a great place to be. When appropriate, Googlers, including myself, jump in to clarify issues or participate in the dialogue. One thing to note: we try hard to read most posts in the group, and although we may not respond to each one, your feedback and concerns help drive the features we work on. Here are a few examples:

Sitemap details
Submitting a Sitemap through Webmaster Tools is one way to let Google know which pages exist on your site. Users were quick to note that even though they submitted a Sitemap of all the pages on their site, they found only a sampling of those URLs indexed through a site: search. In response, the Webmaster Tools team created a Sitemaps details page that explains in more detail how your Sitemap was processed. You can read a refresher about the Sitemaps details page in Jonathan's blog post.

Contextual help
One request we received early on with Webmaster Tools was for better documentation on the data displayed. We saw several questions about the meta description and title tag issues reported by our Content Analysis tool, which led us to beef up the documentation for that tool and link to the relevant Help Center article directly from the page. Similarly, we discovered that users needed clarification on the distinction between "top search queries" and "top clicked queries" and on how that data can be used. We added an expandable section entitled "How do I use this data?" and placed contextual help information across Webmaster Tools to explain what each feature is and where to get more information about it.

Blog posts
The Webmaster Help Group is also a way for us to keep our finger on the pulse of the overarching questions on webmasters' minds, so we can address some of those concerns through this blog. Whether it's how to submit a reconsideration request using Webmaster Tools, deal with duplicate content, move a site, or design for accessibility, we're always open to hearing more about your concerns in the Group. Which reminds me...

It's time for more Popular Picks!
Last year, we devoted two weeks to soliciting and answering five of your most pressing webmaster-related questions. These Popular Picks covered the following topics:
Since this was a well-received initiative, I'm happy to announce that we're going to do it again. Head on over to this thread to ask your webmaster-related questions. See you there!

Friday, August 22, 2008


Since both tennis and table tennis are in the Olympics, perhaps you're wondering: if there's soccer, why not "table soccer?" Of course, we know table soccer by another name; and while foosball may not be an Olympic sport, we still cheered Nathan Johns and Jan Backes—two members of our Search Quality team—as they brought home the foosball silver medal at the search engine foosball smackdown at SES San Jose.

"Smackdown" doesn't quite equate to "Olympics," but check out the intensity—you could hear a pin drop!

silver medalists at foosball

The gold medal (cup) went to the search engine down the road. :)

gold medalists at foosball
Yahoo's first place winners Daniel Wong and Jake Rosenberg.

Just to be sure they weren't ringers, I quizzed Daniel and Jake, "How can you prevent a file from being crawled?" They correctly answered, "robots.txt."

Gold cup well deserved.

Thursday, August 21, 2008

Hey Google, I no longer have badware

This post is for anyone who has been emailed or notified by Google about badware, received a badware warning when browsing their own site using Firefox, or has come across malware-labeled search results for their own site(s).  As you know, these warnings are produced by our automated scanning systems, which we've put in place to ensure the quality of our results by protecting our users.  Whatever the case, if you are dealing with badware, here are a few recommendations that can help you out. 

1.  If you have badware, it usually means that your web server, your website, or a database used by your website has been compromised. We have a nifty post on how to handle being hacked.  Be very careful when inspecting for malware on your site so as to avoid exposing your computer to infection.

2. Once everything is fine and dandy, you can follow the steps in our post about malware reviews via Webmaster Tools. Please note that the screenshot in the previous post is outdated; the new malware review form is on the Overview page and looks like this:

  • Other programs, such as Firefox, also use our badware data and may not recognize the change immediately due to their caching of the data.  So even if the badware label in search is removed, it may take some time for that to be visible in such programs.

3. Lastly, if you believe that your rankings were somehow affected by the malware, such as by compromised content that violated our Webmaster Guidelines (e.g., hacked pages with hidden pharmacy text links), you should fill out a reconsideration request. To clarify, reconsideration requests are usually used when you notice issues stemming from violations of our Webmaster Guidelines, and they are separate from malware review requests.

If you have additional questions, please review our documentation or post to the discussion group with the URL of your site. We hope you find this updated feature in Webmaster Tools useful in discovering and fixing any malware-related problems. 

Tuesday, August 19, 2008

Make your 404 pages more useful

Your visitors may stumble into a 404 "Not found" page on your website for a variety of reasons:
  • A mistyped URL, or a copy-and-paste mistake
  • Broken or truncated links on web pages or in an email message
  • Moved or deleted content
Confronted by a 404 page, they may then attempt to manually correct the URL, click the back button, or even navigate away from your site. As hinted in an earlier post for "404 week at Webmaster Central", there are various ways to help your visitors get out of the dead-end situation. In our quest to make 404 pages more useful, we've just added a section in Webmaster Tools called "Enhance 404 pages". If you've created a custom 404 page, this section lets you embed a widget in it that helps your visitors find what they're looking for by providing suggestions based on the incorrect URL.

Example: Jamie receives the link in an email message. Because a bad email client breaks the formatting, the URL is truncated to As a result, it returns a 404 page. With the 404 widget added, however, she could instead see the following:

In addition to attempting to correct the URL, the 404 widget also suggests the following, if available:
  • a link to the parent subdirectory
  • a sitemap webpage
  • site search query suggestions and search box

How do you add the widget? Visit the "Enhance 404 pages" section in Webmaster Tools, which allows you to generate a JavaScript snippet. You can then copy and paste this into your custom 404 page's code. As always, don't forget to return a proper 404 code.

Can you change the way it looks? Sure. We leave the HTML unstyled initially, but you can edit the CSS block that we've included. For more information, check out our guide on how to customize the look of your 404 widget.

This feature is currently experimental -- we might not provide corrections and suggestions for your site, but we'll be working to improve the coverage. In the meantime, let us know what you think in the comments below or in our group discussion. Thanks for helping us make the Internet a more friendly place!

Friday, August 15, 2008

More on 404

Now that we've bid farewell to soft 404s, in this post for 404 week we'll answer your burning 404 questions.

How do you treat the response code 410 "Gone"?
Just like a 404.

Do you index content or follow links from a page with a 404 response code?
We aim to understand as much as possible about your site and its content. So while we wouldn't want to show a hard 404 to users in search results, we may use a 404 page's content or links, if we detect them, as a signal to help us better understand your site.

Keep in mind that if you want links crawled or content indexed, it's far more beneficial to include them in a non-404 page.

What about 404s with a 10-second meta refresh?
Yahoo! currently utilizes this method on their 404s. They respond with a 404, but the 404 content also shows:

<meta http-equiv="refresh" content="10;url=">

We feel this technique is fine because it reduces confusion: users get 10 seconds to make a new selection, and the page offers the homepage only if they haven't acted by then.

Should I 301-redirect misspelled 404s to the correct URL?
Redirecting/301-ing 404s is a good idea when it's helpful to users (i.e., not confusing like soft 404s). For instance, if you notice that the Crawl Errors section of Webmaster Tools shows a 404 for a misspelled version of your URL, feel free to 301-redirect the misspelled URL to the correct version.

For example, if we saw this 404 in Crawl Errors:  <-- typo for "webmasters"

we may first correct the typo if it exists on our own site, then 301 the URL to the correct version (as the broken link may occur elsewhere on the web):
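A minimal Python sketch of that decision, assuming you keep a hand-maintained map from misspelled paths to correct ones. The paths, `EXISTING_PATHS`, `TYPO_REDIRECTS`, and `resolve` are hypothetical stand-ins, not the URLs from the example above.

```python
from http import HTTPStatus
from typing import Optional, Tuple

# Hypothetical paths: pages that exist, plus misspellings spotted in Crawl Errors.
EXISTING_PATHS = {"/", "/webmasters/"}
TYPO_REDIRECTS = {"/webmaster/": "/webmasters/"}


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Return (status code, Location header value or None) for a requested path."""
    if path in EXISTING_PATHS:
        return HTTPStatus.OK, None
    if path in TYPO_REDIRECTS:
        # A known misspelling: permanently redirect to the correct URL.
        return HTTPStatus.MOVED_PERMANENTLY, TYPO_REDIRECTS[path]
    # Everything else is genuinely missing, so answer with a hard 404.
    return HTTPStatus.NOT_FOUND, None
```

Here `resolve("/webmaster/")` yields a 301 pointing at `/webmasters/`, while an unknown path still gets a 404. Keeping the map explicit, rather than redirecting every unknown URL somewhere plausible, avoids the kind of confusing redirects the answer above warns about.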

Have you guys seen any good 404s?
Yes, we have! (Confession: no one asked us this question, but few things are as fun to discuss as response codes. :) We've put together a list of some of our favorite 404 pages. If you have more 404-related questions, let us know, and thanks for joining us for 404 week!
"If you're looking for an item that's no longer stocked (as I was), this makes it really easy to find an alternative."
-Riona, domestigeek
"Blame the robot monkeys"
-Reid, tells really bad jokes
"Boost your 'Time on site' metrics with a 404 page like this."
-Susan, dabbler in music and Analytics
"It's not reassuring, but it's definitive."
-Jonathan, has trained actual spiders to build websites, ants handle the 404s
"Good with respect to usability."
"At least there's a mailbox."
-JohnMu, adventurous
"It's pretty cute. :)"
-Jessica, likes cute things
"Flow charts rule."
-Sahala, internet traveller
"I can has useful links and even e-mail address for questions! But they could have added 'OH NOES! IZ MISSING PAGE! MAYBE TIPO OR BROKN LINKZ?' so folks'd know what's up."
-Adam, lindy hop geek

Tuesday, August 12, 2008

Farewell to soft 404s

We see two kinds of 404 ("File not found") responses on the web: "hard 404s" and "soft 404s." We discourage the use of so-called "soft 404s" because they can be a confusing experience for users and search engines. Instead of returning a 404 response code for a non-existent URL, websites that serve "soft 404s" return a 200 response code. The content of the 200 response is often the homepage of the site, or an error page.

How does a soft 404 look to the user? Here's a mockup of a soft 404: This site returns a 200 response code and the site's homepage for URLs that don't exist.

As exemplified above, soft 404s are confusing for users, and furthermore search engines may spend much of their time crawling and indexing non-existent, often duplicative URLs on your site. This can negatively impact your site's crawl coverage—because of the time Googlebot spends on non-existent pages, your unique URLs may not be discovered as quickly or visited as frequently.

What should you do instead of returning a soft 404?
It's much better to return a 404 response code and clearly explain to users that the file wasn't found. This makes search engines and many users happy.

Return 404 response code

Return clear message to users

Can your webserver return 404, but send a helpful "Not found" message to the user?
Of course! More info as "404 week" continues!
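As a rough preview, here is one way a server can do both, sketched with Python's built-in `http.server`. The `PAGES` dictionary, the message text, and the `make_server` helper are hypothetical placeholders for a real site.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site content: URL path -> HTML body.
PAGES = {"/": b"<html><body><h1>Home</h1></body></html>"}

NOT_FOUND_BODY = (
    b"<html><head><title>Page not found</title></head><body>"
    b"<h1>Sorry, we couldn't find that page.</h1>"
    b'<p>Check the URL for typos, or start again from our <a href="/">homepage</a>.</p>'
    b"</body></html>"
)


class SiteHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGES.get(self.path)
        if body is None:
            # Hard 404: the status code tells crawlers the page is missing,
            # while the body still gives people a way forward.
            status, body = 404, NOT_FOUND_BODY
        else:
            status = 200
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8000) -> HTTPServer:
    """Build a local test server; call serve_forever() on the result to run it."""
    return HTTPServer(("127.0.0.1", port), SiteHandler)
```

Running `make_server().serve_forever()` and requesting a missing path should show the friendly message in a browser while the response status stays 404.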

Monday, August 11, 2008

It's 404 week at Webmaster Central

This week we're publishing several blog posts dedicated to helping you with one response code: 404.

A response code is a numeric status (like 200 for "OK" or 301 for "Moved Permanently") that a webserver returns in response to a request for a URL. The 404 response code should be returned when a file is "Not Found".
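Python's standard library keeps these codes and their standard reason phrases in the `http.HTTPStatus` enum. This short sketch simply looks up a few of the codes discussed during 404 week.

```python
from http import HTTPStatus

# Print the standard reason phrase for a few common response codes.
for code in (200, 301, 404, 410):
    print(code, HTTPStatus(code).phrase)
```

This prints `200 OK`, `301 Moved Permanently`, `404 Not Found`, and `410 Gone`, one per line.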

When a user sends a request for your webpage, your webserver looks for the corresponding file for the URL. If a file exists, your webserver likely responds with a 200 response code along with a message (often the content of the page, such as the HTML).

200 response code flow chart

So what's a 404? Let's say that the "Visit Google Apps" link above is broken because of a typing error made when coding the page. Now when a user clicks "Visit Google Apps", the webserver can't locate the requested webpage/file, so it should return a 404 response code, meaning "Not Found".

404 response code flow chart

Now that we're all on board with the basics of 404s, stay tuned 4 even more information on making 404s good 4 users and 4 search engines.

Thursday, August 7, 2008

How to start a multilingual site

Have you ever thought of creating one or several sites in different languages? Let's say you want to start a travel site about backpacking in Europe, and you want to offer your content to English, German, and Spanish speakers. You'll want to keep in mind factors like site structure, geographic as well as language targeting, and content organization.

Site structure
The first thing you'll want to consider is whether it makes sense for you to buy country-specific top-level domains (TLD) for all the countries you plan to serve. So your domains might be,, and This option is beneficial if you want to target the countries that each TLD is associated with, a method known as geo targeting. Note that this is different from language targeting, which we will get into a little more later. Let's say your German content is specifically for users from Germany and not as relevant for German-speaking users in Austria or Switzerland. In this case, you'd want to register a domain on the .de TLD. German users will identify your site as a local one they are more likely to trust. On the other hand, it can be pretty expensive to buy domains on the country-specific TLDs, and it's more of a pain to update and maintain multiple domains. So if your time and resources are limited, consider buying one non-country-specific domain, which hosts all the different versions of your website. In this case, we recommend either of these two options:
  1. Put the content of every language in a different subdomain. For our example, you would have,, and
  2. Put the content of every language in a different subdirectory. This is easier to handle when updating and maintaining your site. For our example, you would have,, and
Matt Cutts wrote a substantial post on subdirectories and subdomains, which may help you decide which option to go with.
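To make the two layouts concrete, here is a small Python sketch that builds URLs under each option and maps any URL back to exactly one language. The domain `example.com`, the language list, and the helper names are hypothetical, chosen only for the backpacking example.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical values for the backpacking example; substitute your own.
DOMAIN = "example.com"
LANGUAGES = ("en", "de", "es")


def subdomain_url(lang: str, path: str = "/") -> str:
    """Option 1: one subdomain per language, e.g. https://de.example.com/."""
    return f"https://{lang}.{DOMAIN}{path}"


def subdirectory_url(lang: str, path: str = "/") -> str:
    """Option 2: one subdirectory per language, e.g. https://example.com/de/."""
    return f"https://{DOMAIN}/{lang}{path}"


def language_of(url: str) -> Optional[str]:
    """Recover the language code from a URL built with either option."""
    parts = urlsplit(url)
    host_label = (parts.hostname or "").split(".")[0]
    if host_label in LANGUAGES:
        return host_label
    first_segment = parts.path.lstrip("/").split("/")[0]
    return first_segment if first_segment in LANGUAGES else None


for lang in LANGUAGES:
    print(subdomain_url(lang, "/hostels/"), subdirectory_url(lang, "/hostels/"))
```

If every page on the site resolves to a single language this way, each language version lives in its own clearly separated section, which also makes it straightforward to set a geographic target per subdirectory or subdomain.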

Geographic targeting vs. Language targeting
As mentioned above, if your content is especially targeted towards a particular region in the world, you can use the Set Geographic Target tool in Webmaster Tools. It allows you to set different geographic targets for different subdirectories or subdomains (e.g., /de/ for Germany).

If you want to reach all speakers of a particular language around the world, you probably don't want to limit yourself to a specific geographic location. This is known as language targeting, and in this case, you don't want to use the geographic target tool.

Content organization
The same content in different languages is not considered duplicate content. Just make sure you keep things organized. If you follow one of the site structure recommendations mentioned above, this should be pretty straightforward. Avoid mixing languages on a single page, as this may confuse Googlebot as well as your users. Keep navigation and content in the same language on each page.

If you want to check how many of your pages are recognized in a certain language, you can perform a language-specific site search. For example, if you go to and do a site search on, choose the option below the search box to only display German results.
If you have more questions on this topic, you can join our Webmaster Help Group to get more advice.

Tuesday, August 5, 2008

To infinity and beyond? No!

When Googlebot crawls the web, it often finds what we call an "infinite space". These are very large numbers of links that usually provide little or no new content for Googlebot to index. If this happens on your site, crawling those URLs may use unnecessary bandwidth, and could result in Googlebot failing to completely index the real content on your site.

Recently, we started notifying site owners when we discover this problem on their web sites. Like most messages we send, you'll find them in Webmaster Tools in the Message Center. You'll probably want to know right away if Googlebot has this problem - or other problems - crawling your sites. So verify your site with Webmaster Tools, and check the Message Center every now and then.

Examples of an infinite space

The classic example of an "infinite space" is a calendar with a "Next Month" link. It may be possible to keep following those "Next Month" links forever! Of course, that's not what you want Googlebot to do. Googlebot is smart enough to figure out some of those on its own, but there are a lot of ways to create an infinite space and we may not detect all of them.

Another common scenario is websites that let users filter a set of search results in many ways. A shopping site might allow for finding clothing items by filtering on category, price, color, brand, style, etc. The number of possible combinations of filters can grow exponentially. This can produce thousands of URLs, all finding some subset of the items sold. This may be convenient for your users, but it is not so helpful for Googlebot, which just wants to find everything - once!
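A quick back-of-the-envelope calculation shows how fast this adds up. The filter names and value counts below are hypothetical; the sketch assumes each filter is either unset or set to exactly one value, so a filter with n values contributes n + 1 choices and the choices multiply.

```python
from math import prod
from typing import Mapping

# Hypothetical filters for a clothing shop: filter name -> number of values.
FILTERS = {"category": 12, "price": 6, "color": 15, "brand": 40, "style": 8}


def filter_url_count(filters: Mapping[str, int]) -> int:
    """Count distinct filter combinations, each filter being unset or set to one value."""
    # A filter with n values offers n + 1 choices (including "not applied"),
    # and independent choices multiply.
    return prod(n + 1 for n in filters.values())


print(filter_url_count(FILTERS))  # 537264
```

Five modest filters already yield 13 × 7 × 16 × 41 × 9 = 537,264 distinct URLs over the same catalogue. With k filters of v values each the count is (v + 1)^k, which is the exponential growth described above; if the URL also varies with parameter order or sort options, the number grows further.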

Correcting infinite space issues

Our Webmaster Tools Help article describes more ways infinite spaces can arise, and provides recommendations on how to avoid the problem. One fix is to eliminate whole categories of dynamically generated links using your robots.txt file. The Help Center has lots of information on how to use robots.txt. If you do that, don't forget to verify that Googlebot can find all your content some other way. Another option is to block those problematic links with a "nofollow" link attribute. If you'd like more information on "nofollow" links, check out the Webmaster Help Center.
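Before deploying a robots.txt change like this, it can help to check which URLs it actually blocks, and that your real content stays crawlable. Here is a sketch using Python's standard `urllib.robotparser`; the rules and URLs are hypothetical. Note that this parser does simple prefix matching and does not implement the `*` and `$` wildcard extensions Googlebot understands, so the sketch sticks to prefix rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt blocking dynamically generated calendar and filter URLs.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /calendar/
Disallow: /shop/filter
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The last URL stands in for real content that should remain crawlable.
for url in (
    "https://www.example.com/calendar/2008/09/",
    "https://www.example.com/shop/filter?color=red&size=m",
    "https://www.example.com/shop/items/blue-shirt",
):
    allowed = parser.can_fetch("Googlebot", url)
    print(("crawlable " if allowed else "blocked   ") + url)
```

The first two URLs are reported as blocked and the product page as crawlable. Since prefix rules also catch any path that merely starts with the same text, it's worth running a list of your important URLs through a check like this.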