Wednesday, August 29, 2007

Update on penalty notifications

First, a brief recap: In late 2005, we started emailing webmasters to let them know that their site is violating our Webmaster Guidelines and that we have temporarily removed some of their pages from our index. A few months ago we put these emails on hold due to a number of spoofed messages being sent from outside Google, primarily to German webmasters. Then, in mid-July, we launched Message Center in our webmaster console, which allows us to send messages to verified site owners.

While Message Center is great for verified site owners, it doesn't allow us to notify webmasters who aren't registered in Google's Webmaster Tools. For this reason, we plan to resume sending emails in addition to the Message Center notifications. Please note that, as before, our emails will not include attachments. Currently, the Message Center won't keep messages waiting if you haven't previously registered, but we hope to add that feature in the next few months. We'll keep you posted as things change.

Monday, August 27, 2007

Register non-English domain names with Webmaster Tools

I'm happy to announce that Webmaster Tools is expanding support for webmasters outside of the English-speaking world, by supporting Internationalizing Domain Names in Applications (IDNA). IDNA provides a way for site owners to have domains that go beyond the domain name system's limitations of English letters and numbers. Prior to IDNA, Internet host names could only be in the 26 letters of the English alphabet, the numbers 0-9, and the hyphen character. With IDNA support, you'll now be able to add your sites that use other character sets, and organize them easily on your Webmaster Tools Dashboard.

Let's say you wanted to add http://北京大学.cn/ (Peking University) to your Webmaster Tools account before we launched IDNA support. If you typed that in to the "Add Site" box, you'd get back an error message that looks like this:

Some webmasters discovered a workaround. Internally, IDNA converts the nicely encoded http://北京大学.cn/ to an ASCII-only format called Punycode, and they added their sites in that form. This allowed them to diagnose and view information about their site, but it looked pretty ugly. Also, if they had more than one IDNA site, you can imagine it would be pretty hard to tell them apart.
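If you're curious what that conversion looks like, here is a minimal sketch using Python's built-in "idna" codec, which implements the original IDNA specification (RFC 3490). It is only an illustration of the encoding and is not how Webmaster Tools itself handles it; the function names are ours.

```python
# Sketch: convert an internationalized host name to its ASCII (Punycode)
# form and back, using the "idna" codec from Python's standard library.

def to_ascii_host(host: str) -> str:
    """Return the ASCII-compatible form of a host name (labels get an xn-- prefix)."""
    return host.encode("idna").decode("ascii")


def to_unicode_host(ascii_host: str) -> str:
    """Return the human-readable Unicode form of an ASCII-encoded host name."""
    return ascii_host.encode("ascii").decode("idna")


if __name__ == "__main__":
    ascii_host = to_ascii_host("北京大学.cn")
    print(ascii_host)                   # the "ugly" xn-- form
    print(to_unicode_host(ascii_host))  # back to the readable form
```

Only the non-ASCII labels are rewritten; a plain label such as "cn" passes through unchanged.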

Since we now support IDNA throughout Webmaster Tools, all you need to do is type in the name of your site, and we will add it correctly. Here is what it looks like if you attempt to add http://北京大学.cn/ to your account:

If you are one of the webmasters who discovered the workaround previously (i.e., you have sites listed in your account in their Punycode form), those sites will now automatically display correctly.

We'd love to hear your questions and feedback on this new feature; you can write a comment below or post in the Google Webmaster Tools section of our Webmaster Help Group. We'd also appreciate suggestions for other ways we can improve our international support.

Friday, August 17, 2007

Join us at cool SES San Jose - it'll be hot!

As summer inches towards fall and in many places the temperature is still rising, you're probably thinking the best place to be right now is on the beach, by a pool or inside somewhere that's air-conditioned. These are all good choices, but next week there's somewhere else to be that's both hot and cool: the Search Engine Strategies conference in San Jose. In addition to the many tantalizing conference sessions covering diverse topics related to search, there will be refreshments, food, and of course, air-conditioning.

[Image: Googlers attending SES San Jose]

Additionally, on Tuesday evening at our Mountain View ‘plex we're hosting the “Google Dance” -- where conference attendees can eat, drink, play, dance, and talk about search. During the Google Dance be sure to attend the “Meet the Engineers” event where you’ll be able to meet and have a conversation with 25 or more engineers including Webmaster Central’s own Amanda Camp. Also, if you get a spare minute from merry-making, head over to the Webmaster Tools booth, where you’ll find Maile Ohye offering lots of good advice.

If you’re a night owl, you’ll probably also be interested in the unofficial late-night SES after-parties that you only know about if you talk to the right person. To stem the potential barrage of “where’s the party” questions, I'd like to make it clear that I unfortunately am not the right person. But if you happen to be someone who’s organizing a late night party, please consider inviting me. ;)

"Enough about the parties -- what about the conference?" you ask. As you would expect, Google will be well-represented at the conference. Here is a sampling of the Search-related sessions at which Googlers will be speaking:

Universal & Blended Search
Monday, August 20
David Baile

Personalization, User Data & Search
Monday, August 20
2:00 - 3:30pm
Sep Kamvar

Searcher Behavior Research Update
Monday, August 20
4:00 - 5:30pm
Oliver Deighton

Are Paid Links Evil?
Tuesday, August 21
4:45 - 6:00pm
Matt Cutts

Keynote Conversation
Wednesday, August 22
9:00 - 9:45am
Marissa Mayer

Search APIs
Wednesday, August 22
10:30am - 12:00pm
Jon Diorio

SEO Through Blogs & Feeds
Wednesday, August 22
10:30am - 12:00pm
Rick Klau

Duplicate Content & Multiple Site Issues
Wednesday, August 22
1:30 - 2:45pm
Greg Grothaus

CSS, AJAX, Web 2.0 & Search Engines
Wednesday, August 22
3:15 - 4:30pm
Amanda Camp

Search Engine Q&A On Links
Wednesday, August 22
4:45 - 6:00pm
Shashi Thakur

Meet the Crawlers
Thursday, August 23
10:45am - 12:00pm
Evan Roseman

We will also have a large presence in the conference expo hall where members of the Webmaster Central Team like Susan Moskwa and I will be present at the Webmaster Tools booth to answer questions, listen to your thoughts and generally be there to chat about all things webmaster related. Bergy and Wysz, two more of us who tackle tough questions in the Webmaster Help Groups, will be offering assistance at the Google booth (live and in person, not via discussion thread).

If you're reading this and thinking, "I should go and grab the last frozen juice bar in the freezer," I suggest that you save that frozen juice bar for when you return from the conference and find that your brain's overheating from employing all the strategies you've learned and networking with all the people you've met.

Joking aside, we are psyched about the conference and hope to see you there. Save a cold beverage for me!

Wednesday, August 15, 2007

New robots.txt feature and REP Meta Tags

We've improved Webmaster Central's robots.txt analysis tool to recognize Sitemap declarations and relative URLs. Earlier versions weren't aware of Sitemaps at all, and understood only absolute URLs; anything else was reported as "Syntax not understood." The improved version now tells you whether your Sitemap's URL and scope are valid. You can also test against relative URLs with a lot less typing.

Reporting is better, too. You'll now be told of multiple problems per line if they exist, unlike earlier versions which only reported the first problem encountered. And we've made other general improvements to analysis and validation.
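To get a feel for this kind of check on your own machine, here is a rough sketch using Python's standard urllib.robotparser module. It is not the Webmaster Central tool and does not reproduce its reports; the example.com URLs, the robots.txt content, and the function name are assumptions made up for illustration. Note that site_maps() requires Python 3.8 or later.

```python
# Sketch: test URLs against a robots.txt and list its Sitemap declarations
# with Python's standard library (not Google's analysis tool).
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: allow everything except /images/, declare a Sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/

Sitemap: http://www.example.com/sitemap.xml
"""


def check_robots(robots_txt, urls, agent="Googlebot"):
    """Return ({url: allowed?}, [sitemap URLs] or None) for the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = {url: parser.can_fetch(agent, url) for url in urls}
    return allowed, parser.site_maps()


if __name__ == "__main__":
    results, sitemaps = check_robots(
        ROBOTS_TXT,
        [
            "http://www.example.com/index.html",
            "http://www.example.com/images/logo.png",
        ],
    )
    for url, ok in results.items():
        print("allowed" if ok else "blocked", url)
    print("sitemaps:", sitemaps)
```

Unlike the analysis tool, this parser silently ignores lines it doesn't understand, so it won't point out a misspelled directive.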

Imagine that you're responsible for a domain and you want search engines to index everything on your site, except for your /images folder. You also want to make sure your Sitemap gets noticed, so you save the following as your robots.txt file:

disalow images

user-agent: *


You visit Webmaster Central to test your site against the robots.txt analysis tool using these two test URLs:

Earlier versions of the tool would have reported this:

The improved version tells you more about that robots.txt file:

We also want to make sure you've heard about the new unavailable_after meta tag announced by Dan Crow on the Official Google Blog a few weeks ago. This allows for a more dynamic relationship between your site and Googlebot. Just think, with unavailable_after, any time you have a temporarily available news story or limited offer sale or promotion page, you can specify the exact date and time you want specific pages to stop being crawled and indexed.

Let's assume you're running a promotion that expires at the end of 2007. In the head section of the promotion page, you would use the following:

<META NAME="GOOGLEBOT" CONTENT="unavailable_after: 31-Dec-2007 23:59:59 EST">

The second piece of exciting news: the new X-Robots-Tag directive, which adds Robots Exclusion Protocol (REP) META tag support for non-HTML pages! Finally, you can have the same control over your videos, spreadsheets, and other indexed file types. Using the example above, let's say your promotion page is in PDF format. In the HTTP response for that PDF, you would send the following header:

X-Robots-Tag: unavailable_after: 31 Dec 2007 23:59:59 EST
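If your pages are generated by a script, the header value can be built from the expiry date rather than typed by hand. Here's a small sketch of that idea; the function names are hypothetical, the time zone label is passed in as plain text, and how you attach the header to a response depends on your server.

```python
# Sketch: build an X-Robots-Tag header carrying an unavailable_after date.
# Month names are hard-coded so the output doesn't depend on the system locale.
from datetime import datetime

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def unavailable_after_value(expiry: datetime, tz_label: str) -> str:
    """Format the directive value, e.g. 'unavailable_after: 31 Dec 2007 23:59:59 EST'."""
    date_part = f"{expiry.day:02d} {MONTHS[expiry.month - 1]} {expiry.year}"
    return f"unavailable_after: {date_part} {expiry:%H:%M:%S} {tz_label}"


def x_robots_tag_header(expiry: datetime, tz_label: str) -> tuple:
    """Return a (name, value) pair suitable for adding to an HTTP response."""
    return ("X-Robots-Tag", unavailable_after_value(expiry, tz_label))


if __name__ == "__main__":
    name, value = x_robots_tag_header(datetime(2007, 12, 31, 23, 59, 59), "EST")
    print(f"{name}: {value}")
```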

Remember, REP meta tags can be useful for implementing noarchive, nosnippet, and now unavailable_after tags for page-level instruction, as opposed to robots.txt, which is controlled at the domain root. We get requests from bloggers and webmasters for these features, so enjoy. If you have other suggestions, keep them coming. Any questions? Please ask them in the Webmaster Help Group.

Monday, August 13, 2007

Malware reviews via Webmaster Tools

In the past year, the number of sites affected by malware/badware grew from a handful a week to thousands per week. We noted your suggestions to improve communication for webmasters of affected sites -- suggestions mentioned in our earlier blog post "About badware warnings" as well as the stopbadware discussion group. Now, Webmaster Tools provides malware reviews.

If you find that your site is affected by malware, either through malware-labeled search results or in the summary for your site in Webmaster Tools, we've streamlined the process to review your site and return it malware-label-free in our search results:
  1. View a sample of the dangerous URLs on your site in Webmaster Tools.
  2. Make any necessary changes to your site according to Stopbadware's Security tips.
  3. New: Request a malware review from Google and we'll evaluate your site.
  4. New: Check the status of your review.
    • If we feel the site is still harmful, we'll provide an updated list of remaining dangerous URLs.
    • If we've determined the site to be clean, you can expect removal of malware messages in the near future (usually within 24 hours).

We encourage all webmasters to become familiar with Stopbadware's malware prevention tips. If you have additional questions, please review our documentation or post to the discussion group. We hope you find this new feature in Webmaster Tools useful in discovering and fixing any malware-related problems, and thank you for your diligence in malware awareness and prevention.

Thursday, August 2, 2007

Server location, cross-linking, and Web 2.0 technology thoughts

Held on June 27th, Searchnomics 2007 gave us (Greg Grothaus and Shashi Thakur) a chance to meet webmasters and answer some of their questions. As we're both engineers focused on improving search quality, the feedback was extremely valuable. Here's our take on the conference and a recap of some of what we talked about there.

Shashi: While I've worked at Google for over a year, this was my first time speaking at a conference. I spoke on the "Search Engine Friendly Design" panel. The exchanges were hugely valuable, helping me grasp some of the concerns of webmasters. Greg and I thought it would be valuable to share our responses to a few questions:

Does location of server matter? I use a .com domain but my content is for customers in the UK.

In our understanding of web content, Google considers both the IP address and the top-level domain (e.g. .com, .co.uk). Because we attempt to serve geographically relevant content, we factor in domains that have a regional significance. For example, ".co.uk" domains are likely very relevant for user queries originating from the UK. In the absence of a significant top-level domain, we often use the web server's IP address as an added hint in our understanding of content.

I have many different sites. Can I cross-link between them?

Before you begin cross-linking sites, consider the user's perspective and whether the crosslinks provide value. If the sites are related in business (e.g., an auto manual site linking to an auto parts retail site), then it could make sense: the links are organic and useful. Cross-linking between dozens or hundreds of sites, however, probably doesn't provide value, and I would not recommend it.

Greg: Like Shashi, this was also my first opportunity to speak at a conference as a Googler. It was refreshing to hear feedback from the people who use the software we work every day to perfect. The session also underscored the argument that we're just at the beginning of search and have a long way to go. I spoke on the subject of Web 2.0 technologies. It was clear that many people are intimidated by the challenges of building a Web 2.0 site with respect to search engines. We understand these concerns. You should expect to see more feedback from us on this subject, both at conferences and through our blog.

Any special guidance for DHTML/AJAX/Flash documents?

It's important to make sure that content and navigation can be rendered/negotiated using only HTML. So long as the content and navigation are the same for search crawlers and end users, you're more than welcome to use advanced technologies such as Flash and/or JavaScript to improve the user experience with a richer presentation. In "Best uses of Flash," we wrote in more detail about this, and we are working on a post about AJAX technology.