Tuesday, June 29, 2010

Sitemaps: One file, many content types

Webmaster Level: All

Have you ever wanted to submit your various content types (video, images, etc.) in one Sitemap? Now you can! If your site contains videos, images, mobile URLs, code or geo information, you can now create—and submit—a Sitemap with all the information.

Site owners have been leveraging Sitemaps to let Google know about their sites’ content since Sitemaps were first introduced in 2005. Since that time additional specialized Sitemap formats have been introduced to better accommodate video, images, mobile, code or geographic content. With the increasing number of specialized formats, we’d like to make it easier for you by supporting Sitemaps that can include multiple content types in the same file.

The structure of a Sitemap with multiple content types is similar to a standard Sitemap, with the additional ability to contain URLs referencing different content types. Here's an example of a Sitemap that contains a reference to a standard web page for Web search, image content for Image search and a video reference to be included in Video search:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns=""
<video:title>Grilling tofu for summer</video:title>
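
As a rough sketch of what a combined Sitemap could look like, the Python below builds one `<url>` entry that carries a page location, an image, and a video. The namespace URIs are the ones published in the public Sitemap, image, and video schemas; the `build_sitemap` function name and the example.com URLs are hypothetical. A real video entry needs additional tags (such as a thumbnail and a description), so treat this as an outline rather than a complete file.

```python
import xml.etree.ElementTree as ET

# Namespace URIs from the public Sitemap, image, and video schemas.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"
VIDEO_NS = "http://www.google.com/schemas/sitemap-video/1.1"

# Register prefixes so the output uses xmlns, xmlns:image and xmlns:video.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)
ET.register_namespace("video", VIDEO_NS)


def build_sitemap(page_url, image_url, video_url, video_title):
    """Return Sitemap XML (as str) with one URL carrying image and video data.

    Hypothetical helper for illustration; required video tags such as the
    thumbnail and description are omitted for brevity.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url

    # Image content for Image search.
    image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
    ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url

    # Video reference for Video search.
    video = ET.SubElement(url, f"{{{VIDEO_NS}}}video")
    ET.SubElement(video, f"{{{VIDEO_NS}}}content_loc").text = video_url
    ET.SubElement(video, f"{{{VIDEO_NS}}}title").text = video_title

    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap(
        "http://www.example.com/foo.html",      # hypothetical page
        "http://www.example.com/image.jpg",     # hypothetical image
        "http://www.example.com/video123.flv",  # hypothetical video
        "Grilling tofu for summer",
    ))
```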

Here's an example of what you'll see in Webmaster Tools when a Sitemap containing multiple content types is submitted:

We hope the capability to include multiple content types in one Sitemap simplifies your Sitemap submission. The rest of the Sitemap rules, like the maximum of 50,000 URLs in one file and the 10 MB uncompressed file size limit, still apply. If you have questions or other feedback, please visit the Webmaster Help Forum.
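
One way to check a file against these two limits before submitting it is sketched below. The function name `sitemap_limit_problems` is hypothetical, and reading "10MB" as 10 × 1024 × 1024 bytes is an assumption.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10MB" read as 10 MiB


def sitemap_limit_problems(xml_bytes):
    """Return a list of human-readable problems (empty if within limits).

    Hypothetical helper: expects the uncompressed Sitemap file as bytes.
    """
    problems = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append(f"file is {len(xml_bytes)} bytes, limit is {MAX_BYTES}")

    # Count the <url> entries directly under <urlset>.
    root = ET.fromstring(xml_bytes)
    url_count = len(root.findall(f"{{{SITEMAP_NS}}}url"))
    if url_count > MAX_URLS:
        problems.append(f"file lists {url_count} URLs, limit is {MAX_URLS}")
    return problems
```

A site that exceeds either limit can split its URLs across several Sitemap files.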

Monday, June 21, 2010

Quality links to your site

A popular question on our Webmaster Help Forum is in regard to best practices for organic link building. There seems to be some confusion, especially among less experienced webmasters, on how to approach the topic. Different perspectives have been shared, and we would also like to explain our viewpoint on earning quality links.

If your site is rather new and still unknown, a good marketing technique is to get involved in the community around your topic. Interact and contribute on forums and blogs. Just keep in mind to contribute in a positive way, rather than spamming or soliciting for your site. Just building a reputation can drive people to your site. And they will keep on visiting it and linking to it. If you offer long-lasting, unique and compelling content -- something that lets your expertise shine -- people will want to recommend it to others. Great content can serve this purpose as much as providing useful tools.

A promising way to create value for your target group and earn great links is to think of issues or problems your users might encounter. Visitors are likely to appreciate your site and link to it if you publish a short tutorial or a video providing a solution, or a practical tool. Survey or original research results can serve the same purpose, if they turn out to be useful for the target audience. Both methods grow your credibility in the community and increase visibility. This can help you gain lasting, merit-based links and loyal followers who generate direct traffic and "spread the word." Offering a number of solutions for different problems could evolve into a blog which can continuously affect the site's reputation in a positive way.

Humor can be another way both to gain great links and to get people talking about your site. With Google Buzz and other social media services constantly growing, entertaining content is being shared now more than ever. We've seen all kinds of amusing content, from ASCII art embedded in a site's source code to funny downtime messages used as a viral marketing technique to increase the visibility of a site. However, we do not recommend counting only on short-lived link-bait tactics. Their appeal wears off quickly and, as powerful as marketing stunts can be, you shouldn't rely on them as a long-term strategy or as your only marketing effort.

It's important to clarify that any legitimate link building strategy is a long-term effort. There are those who advocate for short-lived, often spammy methods, but these are not advisable if you care about your site's reputation. Buying PageRank-passing links or randomly exchanging links are the worst ways of attempting to gather links, and they're likely to have no positive impact on your site's performance over time. If your site's visibility in the Google index is important to you, it's best to avoid them.

Directory entries are often mentioned as another way to promote young sites in the Google index. There are great, topical directories that add value to the Internet. But there are not many of them in proportion to those of lower quality. If you decide to submit your site to a directory, make sure it's on topic, moderated, and well structured. Mass submissions, which are sometimes offered as a quick work-around SEO method, are mostly useless and not likely to serve your purposes.

It can be a good idea to take a look at similar sites in other markets and identify the elements of those sites that might work well for yours, too. However, it's important not to just copy success stories but to adapt them, so that they provide unique value for your visitors.

Social bookmarks on YouTube enable users to share content easily

Finally, consider making linking to your site easier for less tech-savvy users. Similar to the way we do it on YouTube, offering bookmarking services for social sites like Twitter or Facebook can help spread the word about the great content on your site and draw users' attention.

As usual, we'd like to hear your opinion. You're welcome to comment here in the blog, or join our Webmaster Help Forum community.

Friday, June 11, 2010

Google Videos best practices

Webmaster Level: All

We'd like to highlight three best practices that address some of the most common problems found when crawling and indexing video content: ensuring your video URLs are crawlable, stating which countries your videos may be played in, and clearly indicating to search engines when your videos have been removed.

  • Best Practice 1: Verify your video URLs are crawlable: check your robots.txt
    • Sometimes publishers unknowingly include video URLs in their Sitemap that are robots.txt disallowed. Please make sure your robots.txt file isn't blocking any of the URLs specified in your Sitemap. This includes URLs for the:
      • Playpage
      • Content and player
      • Thumbnail
      More information about robots.txt.

  • Best Practice 2: Tell us what countries the video may be played in
    • Is your video only available in some locales? The optional attribute “restriction” has recently been added (see the documentation), which you can use to tell us whether the video can only be played in certain territories. Using this tag, you have the option of either including a list of all countries where it can be played, or just telling us the countries where it can't be played. If your videos can be played everywhere, then you don't need to include this.

  • Best Practice 3: Indicate clearly when videos are removed -- protect the user experience
    • Sometimes publishers take videos down but don't signal to search engines that they've done so. This can result in the search engine's index not accurately reflecting the content of the web. Then, when users click on a search result, they're taken either to a page indicating that the video doesn't exist or to a different video. Users find this experience dissatisfying. Although we have mechanisms to detect when search results are no longer available, we strongly encourage following community standards.

      To signal that a video has been removed:
      1. Return a 404 (Not Found) HTTP response code. You can still return a helpful page to be displayed to your users. Check out these guidelines for creating useful 404 pages.
      2. Indicate expiration dates for each video listed in a Video Sitemap (use the <video:expiration_date> element) or mRSS feed (<dcterms:valid> tag) submitted to Google.
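
For Best Practice 1, a quick local check along the following lines can flag Sitemap URLs that your robots.txt disallows. This is a minimal sketch using Python's standard `urllib.robotparser`; the `blocked_urls` helper and the example.com URLs are hypothetical, and the standard-library parser uses simple prefix matching, so its verdicts may differ from Googlebot's in edge cases such as wildcard rules.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs from `urls` that `robots_txt` disallows for `user_agent`.

    Hypothetical helper: `robots_txt` is the text of the robots.txt file and
    `urls` are the play page, content, player, and thumbnail URLs listed in
    the Sitemap.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /thumbs/\n"
    sitemap_urls = [
        "http://www.example.com/watch/1",        # play page
        "http://www.example.com/thumbs/1.jpg",   # thumbnail
    ]
    for url in blocked_urls(robots, sitemap_urls):
        print("Blocked by robots.txt:", url)
```
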
For more information on Google Videos please visit our Help Center, and to post questions and search answers check out our Help Forum.

Tuesday, June 8, 2010

Our new search index: Caffeine

(Cross-posted on the Official Google Blog)

Today, we're announcing the completion of a new web indexing system called Caffeine. Caffeine provides 50 percent fresher results for web searches than our last index, and it's the largest collection of web content we've offered. Whether it's a news story, a blog or a forum post, you can now find links to relevant content much sooner after it is published than was ever possible before.

Some background for those of you who don't build search engines for a living like us: when you search Google, you're not searching the live web. Instead you're searching Google's index of the web which, like the list in the back of a book, helps you pinpoint exactly the information you need. (Here's a good explanation of how it all works.)

So why did we build a new search indexing system? Content on the web is blossoming. It's growing not just in size and numbers but with the advent of video, images, news and real-time updates, the average webpage is richer and more complex. In addition, people's expectations for search are higher than they used to be. Searchers want to find the latest relevant content and publishers expect to be found the instant they publish.

To keep up with the evolution of the web and to meet rising user expectations, we've built Caffeine. The image below illustrates how our old indexing system worked compared to Caffeine:
Our old index had several layers, some of which were refreshed at a faster rate than others; the main layer would update every couple of weeks. To refresh a layer of the old index, we would analyze the entire web, which meant there was a significant delay between when we found a page and made it available to you.

With Caffeine, we analyze the web in small portions and update our search index on a continuous basis, globally. As we find new pages, or new information on existing pages, we can add these straight to the index. That means you can find fresher information than ever before — no matter when or where it was published.

Caffeine lets us index web pages on an enormous scale. In fact, every second Caffeine processes hundreds of thousands of pages in parallel. If this were a pile of paper it would grow three miles taller every second. Caffeine takes up nearly 100 million gigabytes of storage in one database and adds new information at a rate of hundreds of thousands of gigabytes per day. You would need 625,000 of the largest iPods to store that much information; if these were stacked end-to-end they would go for more than 40 miles.
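
As a quick sanity check on the iPod comparison, the arithmetic below divides the quoted storage figure by the quoted number of devices. The variable names are illustrative, and because the post says "nearly" 100 million gigabytes, the result is approximate.

```python
# Figures quoted in the paragraph above.
total_gigabytes = 100_000_000  # "nearly 100 million gigabytes"
ipod_count = 625_000           # "625,000 of the largest iPods"

# Implied capacity of each device, in gigabytes.
gigabytes_per_ipod = total_gigabytes / ipod_count
print(gigabytes_per_ipod)  # 160.0
```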

We've built Caffeine with the future in mind. Not only is it fresher, it's a robust foundation that makes it possible for us to build an even faster and more comprehensive search engine that scales with the growth of information online, and delivers even more relevant search results to you. So stay tuned, and look for more improvements in the months to come.

Monday, June 7, 2010

Crawl Errors now reports soft 404s

Webmaster Level: All

Today we’re releasing a feature to help you discover if your site serves undesirable “soft” or “crypto” 404s. A “soft 404” occurs when a webserver responds with a 200 OK HTTP response code for a page that doesn't exist rather than the appropriate 404 Not Found. Soft 404s can limit a site's crawl coverage by search engines because these duplicate URLs may be crawled instead of pages with unique content.

The web is infinite, but the time search engines spend crawling your site is limited. Properly reporting non-existent pages with a 404 or 410 response code can improve the crawl coverage of your site’s best content. Additionally, soft 404s can potentially be confusing for your site's visitors as described in our past blog post, Farewell to Soft 404s.

You can find the new soft 404s reporting feature under the Crawl errors section in Webmaster Tools.

Here’s a list of steps to correct soft 404s to help both Google and your users:
  1. Check whether you have soft 404s listed in Webmaster Tools
  2. For the soft 404s, determine whether the URL:
    1. Contains the correct content and properly returns a 200 response (not actually a soft 404)
    2. Should 301 redirect to a more accurate URL
    3. Doesn’t exist and should return a 404 or 410 response
  3. Confirm that you’ve configured the proper HTTP response by using Fetch as Googlebot in Webmaster Tools
  4. If you now return 404s, you may want to customize your 404 page to aid your users. Our custom 404 widget can help.
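
Alongside the Webmaster Tools report, a rough self-check is to request a URL that almost certainly doesn't exist and see which status code comes back. The sketch below is one way to do that and is not how Webmaster Tools detects soft 404s; `http_status` and `serves_soft_404s` are hypothetical helper names, and the probe path is made up.

```python
import urllib.error
import urllib.request
import uuid
from urllib.parse import urljoin


def http_status(url):
    """Return the final HTTP status code for `url` (redirects are followed)."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as HTTPError; the code is what we want.
        return err.code


def serves_soft_404s(site_root, fetch_status=http_status):
    """Return True if a random, non-existent URL on the site answers 200 OK."""
    probe_url = urljoin(site_root, f"/no-such-page-{uuid.uuid4().hex}")
    return fetch_status(probe_url) == 200


if __name__ == "__main__":
    # Replace with your own site; example.com is only a placeholder.
    print(serves_soft_404s("http://www.example.com/"))
```

Passing `fetch_status` as a parameter keeps the check easy to exercise without network access, for example with a stub that always returns 404.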

We hope that you’re now better equipped to find and correct soft 404s on your site. If you have feedback or questions about the new "soft 404s" reporting feature or any other Webmaster Tools feature, please share your thoughts with us in the Webmaster Help Forum.

Tuesday, June 1, 2010

Grab bag videos are back!

We’re kicking off June with the start of a new round of webmaster Q&A on the Webmaster Central YouTube channel. You submitted and voted on questions for Matt Cutts to answer, and Matt sat in the studio for a full day sharing advice for webmasters.

For those of you who watch each video (and who doesn’t?), we’ve worked hard to keep things interesting. Not only did Matt wear different colored shirts, we changed the backgrounds as well! Just don’t submit any screen grabs to We Have Lasers, okay?

To get you started, here’s the first video, which addresses a question about geographic targeting in Webmaster Tools:

We’ll be posting links to new videos on our Twitter account as they’re published, so follow us there or subscribe to our YouTube channel to be notified of new answers.