Friday, August 14, 2009

Specifying an image's license using RDFa

Webmaster Level: All

We recently introduced a new feature on Google Image Search which allows you to restrict your search results to images that have been tagged for free reuse. As a webmaster, you may be interested in how you can let Google know which licenses your images are released under, so I've prepared a brief video explaining how to do this using RDFa markup.

If you have any questions about how to mark up your images, please ask in our Webmaster Help Forum.

Tuesday, August 11, 2009

New tools for Google Services for Websites

Webmaster Level: All
(A nearly duplicate version :) cross-posted on the Official Google Blog)

Earlier this year, we launched Google Services for Websites, a program that helps partners, e.g., web hoster and access providers, offer useful and powerful tools to their customers. By making services, such as Webmaster Tools, Custom Search, Site Search and AdSense, easily accessible via the hoster control panel, hosters can easily enable these services for their webmasters. The tools help website owners understand search performance, improve user retention and monetize their content — in other words, run more effective websites.

Since we launched the program, several hosting platforms have enhanced their offerings by integrating with the appropriate APIs. Webmasters can configure accounts, submit Sitemaps with Webmaster Tools, create Custom Search Boxes for their sites and monetize their content with AdSense, all with a few clicks at their hoster control panel. More partners are in the process of implementing these enhancements.

We've just added new tools to the suite:
  • Web Elements allows your customers to enhance their websites with the ease of cut-and-paste. Webmasters can provide maps, real-time news, calendars, presentations, spreadsheets and YouTube videos on their sites. With the Conversation Element, websites can create more engagement with their communities. The Custom Search Element provides inline search over your own site (or others you specify) without having to write any code and various options to customize further.
  • Page Speed allows webmasters to measure the performance of their websites. Snappier websites help users find things faster; the recommendations from these latency tools allow hosters and webmasters to optimize website speed. These techniques can help hosters reduce resource use and optimize network bandwidth.
  • The Tips for Hosters page offers a set of tips for hosters for creating a richer website hosting platform. Hosters can improve the convenience and accessibility of tools, while at the same time saving platform costs and earning referral fees. Tips include the use of analytics tools such as Google Analytics to help webmasters understand their traffic and linguistic tools such as Google Translate to help websites reach a broader audience.
If you're a hoster and would like to participate in the Google Services for Websites program, please apply here. You'll have to integrate with the service APIs before these services can be made available to your customers, so the earlier you start that process, the better.

And if your hosting service doesn't have Google Services for Websites yet, send them to this page. Once they become a partner, you can quickly configure the services you want at your hoster's control panel (without having to come to Google).

As always, we'd love to get feedback on how the program is working for you, and what improvements you'd like to see.

Monday, August 10, 2009

Help test some next-generation infrastructure

Webmaster Level: All

To build a great web search engine, you need to:
  1. Crawl a large chunk of the web.
  2. Index the resulting pages and compute how reputable those pages are.
  3. Rank and return the most relevant pages for users' queries as quickly as possible.
For the last several months, a large team of Googlers has been working on a secret project: a next-generation architecture for Google's web search. It's the first step in a process that will let us push the envelope on size, indexing speed, accuracy, comprehensiveness and other dimensions. The new infrastructure sits "under the hood" of Google's search engine, which means that most users won't notice a difference in search results. But web developers and power searchers might notice a few differences, so we're opening up a web developer preview to collect feedback.

Some parts of this system aren't completely finished yet, so we'd welcome feedback on any issues you see. We invite you to visit the web developer preview of Google's new infrastructure at and try searches there.

Right now, we only want feedback on the differences between Google's current search results and our new system. We're also interested in higher-level feedback ("These types of sites seem to rank better or worse in the new system") in addition to "This specific site should or shouldn't rank for this query." Engineers will be reading the feedback, but we won't have the cycles to send replies.

Here's how to give us feedback: Do a search at and look on the search results page for a link at the bottom of the page that says "Dissatisfied? Help us improve." Click on that link, type your feedback in the text box and then include the word caffeine somewhere in the text box. Thanks in advance for your feedback!

Update on August 11, 2009: [ If you have language or country specific feedback on our new system's search results, we're happy to hear from you. It's a little more difficult to obtain these results from the sandbox URL, though, because you'll need manually alter the query parameters.

You can change these two values appropriately:
hl = language
gl = country code

German language in Germany: &hl=de&gl=de

Spanish language in Mexico: &hl=es&gl=mx

And please don't forget to add the word "caffeine" in the feedback text box. :) ]

Sunday, August 9, 2009

Optimize your crawling & indexing

Webmaster Level: Intermediate to Advanced

Many questions about website architecture, crawling and indexing, and even ranking issues can be boiled down to one central issue: How easy is it for search engines to crawl your site? We've spoken on this topic at a number of recent events, and below you'll find our presentation and some key takeaways on this topic.

The Internet is a big place; new content is being created all the time. Google has a finite number of resources, so when faced with the nearly-infinite quantity of content that's available online, Googlebot is only able to find and crawl a percentage of that content. Then, of the content we've crawled, we're only able to index a portion.

URLs are like the bridges between your website and a search engine's crawler: crawlers need to be able to find and cross those bridges (i.e., find and crawl your URLs) in order to get to your site's content. If your URLs are complicated or redundant, crawlers are going to spend time tracing and retracing their steps; if your URLs are organized and lead directly to distinct content, crawlers can spend their time accessing your content rather than crawling through empty pages, or crawling the same content over and over via different URLs.

In the slides above you can see some examples of what not to do—real-life examples (though names have been changed to protect the innocent) of homegrown URL hacks and encodings, parameters masquerading as part of the URL path, infinite crawl spaces, and more. You'll also find some recommendations for straightening out that labyrinth of URLs and helping crawlers find more of your content faster, including:
  • Remove user-specific details from URLs.
    URL parameters that don't change the content of the page—like session IDs or sort order—can be removed from the URL and put into a cookie. By putting this information in a cookie and 301 redirecting to a "clean" URL, you retain the information and reduce the number of URLs pointing to that same content.
  • Rein in infinite spaces.
    Do you have a calendar that links to an infinite number of past or future dates (each with their own unique URL)? Do you have paginated data that returns a status code of 200 when you add &page=3563 to the URL, even if there aren't that many pages of data? If so, you have an infinite crawl space on your website, and crawlers could be wasting their (and your!) bandwidth trying to crawl it all. Consider these tips for reining in infinite spaces.
  • Disallow actions Googlebot can't perform.
    Using your robots.txt file, you can disallow crawling of login pages, contact forms, shopping carts, and other pages whose sole functionality is something that a crawler can't perform. (Crawlers are notoriously cheap and shy, so they don't usually "Add to cart" or "Contact us.") This lets crawlers spend more of their time crawling content that they can actually do something with.
  • One man, one vote. One URL, one set of content.
    In an ideal world, there's a one-to-one pairing between URL and content: each URL leads to a unique piece of content, and each piece of content can only be accessed via one URL. The closer you can get to this ideal, the more streamlined your site will be for crawling and indexing. If your CMS or current site setup makes this difficult, you can use the rel=canonical element to indicate the preferred URL for a particular piece of content.

If you have further questions about optimizing your site for crawling and indexing, check out some of our previous writing on the subject, or stop by our Help Forum.

Thursday, August 6, 2009

How do you use Webmaster Tools? Share your stories and become a YouTube star!

Our greatest resource is the webmaster community, and here at Webmaster Central we're constantly impressed by the knowledge and expertise we see among webmasters: real-life SEOs, bloggers, online retailers, and all those other people creating great online content.
How do real-life webmasters actually use Webmaster Tools? We'd love to know, and we'd like to showcase some real-life examples for the rest of the community. Create a video telling your story, and upload it via the gadget in our Help Center. We'll highlight the best on our Webmaster Central YouTube channel, and even embed some in relevant Help Center articles (with full credit to you, of course).

To share your stories: Make a video, upload it to YouTube, then go to our Help Center, and submit your vid via our Help Center gadget. Our full guidelines give more information, but here is a summary of the key points:

  • Keep the video short; 3-5 minutes is ideal. Think small: a short video is a good way to showcase your use of - for example - Top Search Queries, but not long enough to highlight your whole SEO strategy.
  • Focus on a real-life example of how you used a particular feature. For example, you could show how you used link data to research your brand, or crawl errors to diagnose problems with your site structure. Do you have a great tip or recommendation?
  • Upload your video before September 30.
  • White hats are recommended. They show up better on screen.

Advanced Q&A from (the appropriately-named) SMX Advanced

Webmaster Level: Intermediate to Advanced

Earlier this summer SMX Advanced landed once again in our fair city—Seattle—and it was indeed advanced. I got a number of questions at some Q&A panels that I had to go back and do a little research on. Here, as promised, are answers:

Q. We hear that Google's now doing a better job of indexing Flash content. If I have a Flash file that pulls in content from an external file and the external file is blocked by robots.txt, will that content be indexed in the Flash (which is not blocked by robots.txt)? Or will Google not be able to index that content?

A. We won't be able to access that content if it's in a file that's disallowed by robots.txt; so even though that content would be visible to humans (via the Flash), search engine crawlers wouldn't be able to access it. For more details, see our blog post on indexing Flash that loads external resources.

Q. Sites that customize content based on user behavior or clickstream are becoming more common. If a user clicks through to my site from a search results page, can I customize the content of that page or redirect the user based on the terms in their search query? Or is that considered cloaking? For example, if someone searches for [vintage cameo pendants] but clicks on my site's general vintage jewelry page, can I redirect that user to my vintage cameo-specific page since I know that's what they were searching for?

A. If you're redirecting or returning different content to the user than what Googlebot would see on that URL (e.g., based on the referrer or query string), we consider that cloaking. If the searcher decided to click on the 'vintage jewelry' result, you should show them the page they clicked on even if you think a different page might be better. You can always link between related pages on your website (i.e., link to your 'vintage jewelry' page from your 'vintage cameos' page and vice versa, so that anyone landing on those pages from any source can cross-navigate); but we don't believe you should make that decision for the searcher.

Q. Even though it involves showing different content to different visitors, Google considers ethical website testing (such as A/B or multivariate testing) a legitimate practice that does not violate Google's guidelines. One reason for this is because, while search engines may only see the original content of the page and not the variations, there's also a percentage of human users who see that same content; so the technique doesn't specifically target search engines.

However, some testing services recommend running 100% of a site's traffic through the winning combination for awhile after an experiment has completed, to verify that conversion rates stay high. How does this fit in with Google's view of cloaking?

A. Running 100% of traffic through one combination for a brief period of time in order to verify your experiment's results is fine. However, as our article on this subject states, "if we find a site running a single non-original combination at 100% for a number of months... we may remove that site from our index." If you want to confirm the results of your experiment but are worried about "how long is too long," consider running a follow-up experiment in which you send most of your traffic through your winning combination while still sending a small percentage to the original page as a control. This is what Google recommends with its own testing tool, Website Optimizer.

Q. If the character encoding specified in a page's HTTP header is different from that specified in the <meta equiv="Content-Type"> tag, which one will Google pay attention to?

A. We take a look at both of these, and also do a bit of processing/guessing on our own based on the content of the page. Most major browsers prioritize the encoding specified in the HTTP header over that specified in the HTML, if both are valid but different. However, if you're aware that they're different, the best answer is to fix one of them!

Q. How does Google handle triple-byte UTF-8-encoded international characters in a URL (such as Chinese or Japanese characters)? These types of URLs break in some applications; is Google able to process them correctly? Does Google understand keywords that are encoded this way—that is, can you understand that is just as relevant to shoes as is?

A. We can correctly handle %-escaped UTF-8 characters in the URL path and in query parameters, and we understand keywords that are encoded in this way. For international characters in a domain name, we recommend using punycode rather than %-encoding, because some older browsers (such as IE6) don't support non-ASCII domain names.

Have a question of your own? Join our discussion forum.