Tuesday, August 31, 2010

Google now indexes SVG

Webmaster Level: All

You can now use Google search to find SVG documents. SVG is an open, XML-based format for vector graphics with support for interactive elements. We’re big fans of open standards, and our mission is to organize the world’s information, so indexing SVG is a natural step.

We index SVG content whether it is in a standalone file or embedded directly in HTML. The web is big, so it may take some time before we crawl and index most SVG files, but as of today you may start seeing them in your search results. If you want to see this for yourself, try searching for [sitemap] or [HideShow].

If you host SVG files and you wish to exclude them from Google’s search results, you can use the “X-Robots-Tag: noindex” directive in the HTTP header.
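
As a rough illustration of the header approach, here is a minimal Python sketch that decides which extra response headers to send for a requested URL. The function name robots_headers_for and the rule of detecting SVG files by their ".svg" extension are assumptions made for this example; how you attach the headers depends on your own server or framework.

```python
from urllib.parse import urlsplit


def robots_headers_for(url: str) -> dict[str, str]:
    """Return extra HTTP response headers for the requested URL.

    Assumption: SVG files are recognized by a ".svg" path extension.
    """
    path = urlsplit(url).path  # ignore any query string or fragment
    if path.lower().endswith(".svg"):
        # Ask search engines not to index this file.
        return {"X-Robots-Tag": "noindex"}
    return {}


print(robots_headers_for("/images/logo.svg?v=2"))
print(robots_headers_for("/index.html"))
```

Only the SVG request receives the header here, so other content on the site stays indexable.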

Check out Webmaster Central for a full list of file types we support.

Friday, August 20, 2010

Showing more results from a domain

Webmaster Level: All

Today we’ve launched a change to our ranking algorithm that will make it much easier for users to find a large number of results from a single site. For queries that indicate a strong user interest in a particular domain, like [exhibitions at amnh], we’ll now show more results from the relevant site.

Prior to today’s change, only two results from the museum’s site would have appeared for this query. Now, we determine that the user is likely interested in the Museum of Natural History’s website, so seven results from the domain appear. Since the user is looking for exhibitions at the museum, it’s far more likely that they’ll find what they’re looking for, faster. The last few results for this query are from other sites, preserving some diversity in the results.

We’re always reassessing our ranking and user interface, making hundreds of changes each year. We expect today’s improvement will help users find deeper results from a single site, while still providing diversity on the results page.

Wednesday, August 18, 2010

Verification time savers: Analytics included!

Webmaster Level: All

Nobody likes to duplicate effort. Unfortunately, sometimes it's a fact of life. If you want to use Google Analytics, you need to add a JavaScript tracking code to your pages. When you're ready to verify ownership of your site in other Google products (such as Webmaster Tools), you have to add a meta tag, HTML file or DNS record to your site. They're very similar tasks, but also completely independent. Until today.

You can now use a Google Analytics JavaScript snippet to verify ownership of your website. If you already have Google Analytics set up, verifying ownership is as simple as clicking a button.

This only works with the newer asynchronous Analytics JavaScript, so if you haven't migrated yet, now is a great time. If you haven't set up Google Analytics or verified yet, go ahead and set up Google Analytics first, then come verify ownership of your site. It'll save you a little time — who doesn't like that? Just as with all of our other verification methods, the Google Analytics JavaScript needs to stay in place on your site, or your verification will expire. You also need to remain an administrator on the Google Analytics account associated with the JavaScript snippet.

Don't forget that once you've verified ownership, you can add other verified owners quickly and easily through the Verification Details page. There's no need for each owner to manually verify ownership. More effort and time saved!

We've also introduced an improved interface for verification. The new verification page gives you more information about each verification method. In some cases, we can now provide detailed instructions about how to complete verification with your specific domain registrar or provider. If your provider is included, there's no need to dig through their documentation to figure out how to add a verification DNS record — we'll walk you through it.

The time you save using these new verification features might not be enough to let you take up a new hobby, but we hope it makes the verification process a little bit more pleasant. As always, please visit the Webmaster Help Forum if you have any questions.

Monday, August 16, 2010

To err is human, Video Sitemap feedback is divine!

Webmaster Level: All

You can now check your Video Sitemap for even more errors right in Webmaster Tools! It’s a new Labs feature to signal issues in your Video Sitemap such as:
  • URLs disallowed by robots.txt
  • Thumbnail size errors (160x120px is ideal. Anything smaller than 90x50 will be rejected.)
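
If you would like to catch thumbnail size problems before submitting, a check along these lines may help. This is only a sketch: the function name check_thumbnail is hypothetical, and treating a thumbnail as too small when either dimension falls below 90x50 is our reading of the rule above, not a documented algorithm.

```python
IDEAL_SIZE = (160, 120)  # width, height in pixels
MIN_SIZE = (90, 50)      # anything smaller is rejected


def check_thumbnail(width: int, height: int) -> str:
    """Classify a thumbnail as 'rejected', 'ideal' or 'accepted'."""
    # Assumption: a thumbnail is "smaller than 90x50" when either
    # dimension is below the minimum.
    if width < MIN_SIZE[0] or height < MIN_SIZE[1]:
        return "rejected"
    if (width, height) == IDEAL_SIZE:
        return "ideal"
    return "accepted"


for size in [(160, 120), (80, 60), (320, 240)]:
    print(size, check_thumbnail(*size))
```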

Video Sitemaps help us to better crawl and extract information about your videos, so we can appropriately feature them in search results.

Totally new to Video Sitemaps? Check out the Video Sitemaps center for more information. Otherwise, take a look at this new Labs feature in Webmaster Tools.

Sunday, August 15, 2010

Video Sitemaps: Understanding location tags

Webmaster Level: All

If you want to add video information to a Sitemap or mRSS feed, you must specify the location of the video. This means you must include at least one of two tags: video:player_loc or video:content_loc. In an mRSS feed, the equivalent tags are media:player and media:content, respectively. We need this information to verify that there is actually a live video on your landing page and to extract metadata and signals from the video bytes for ranking. If neither tag is included, we will not be able to verify the video and your Sitemap/mRSS feed will not be crawled. To reduce confusion, here is some more detail about these elements.

Video Locations Defined

Player Location/URL: the player (e.g., .swf) URL with corresponding arguments that load and play the actual video.

Content Location/URL: the actual raw video bytes (e.g., .flv, .avi) containing the video content.

The Requirements

At least one of the player location (video:player_loc) or the content location (video:content_loc) is required. However, we strongly suggest you provide both, as they serve distinct purposes: the player location is primarily used to help verify that a video exists on the page, and the content location helps us extract more signals and metadata to accurately rank your videos.
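
To make the requirement concrete, here is a minimal Python sketch that builds a single Sitemap entry and refuses to do so when neither location is given. The helper name build_video_entry, the example.com URLs and the video namespace version are assumptions for illustration, and the other tags a real Video Sitemap entry needs are left out for brevity.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumption: namespace URI for the video extension; check the
# Video Sitemaps documentation for the version you should use.
VIDEO_NS = "http://www.google.com/schemas/sitemap-video/1.1"

ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("video", VIDEO_NS)


def build_video_entry(page_url, player_url=None, content_url=None):
    """Build a <url> element with at least one video location tag."""
    if not (player_url or content_url):
        raise ValueError("need video:player_loc or video:content_loc")
    url = ET.Element(f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
    video = ET.SubElement(url, f"{{{VIDEO_NS}}}video")
    if player_url:
        ET.SubElement(video, f"{{{VIDEO_NS}}}player_loc").text = player_url
    if content_url:
        ET.SubElement(video, f"{{{VIDEO_NS}}}content_loc").text = content_url
    return url


entry = build_video_entry(
    "http://www.example.com/videos/play.html",        # playpage URL
    player_url="http://www.example.com/player.swf?video=123",
    content_url="http://www.example.com/video123.flv",
)
print(ET.tostring(entry, encoding="unicode"))
```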

URL extensions at a glance:

Sitemap tag            mRSS tag                           Contents
<loc>                  <link>                             The playpage URL
<video:player_loc>     <media:player> (url attribute)     The SWF URL
<video:content_loc>    <media:content> (url attribute)    The FLV or other raw video URL

NOTE: Every URL in your entire Video Sitemap and mRSS feed should be unique.

If you would like to better ensure that only Googlebot accesses your content, you can perform a reverse DNS lookup on the IP address making the request.
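
A reverse DNS check could be sketched in Python roughly as follows. Beyond what this post states, the sketch assumes that genuine Googlebot hostnames end in googlebot.com or google.com and adds a forward lookup to confirm the hostname maps back to the same IP. The function name is_googlebot and the injectable lookup parameters are hypothetical, added so the logic can be tried without network access, and the forward lookup used here covers IPv4 only.

```python
import socket


def is_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if the IP reverse-resolves to a Google crawler host
    and that host resolves back to the same IP."""
    # Default to real DNS lookups; callers may inject stubs instead.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip)
        # Assumption: crawler hostnames end in one of these domains.
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        # Forward-confirm so a spoofed reverse record is not enough.
        return ip in forward_lookup(host)
    except OSError:  # covers socket.herror and socket.gaierror
        return False


# Illustrative values with stubbed lookups (no network needed).
print(is_googlebot(
    "66.249.66.1",
    reverse_lookup=lambda addr: "crawl-66-249-66-1.googlebot.com",
    forward_lookup=lambda host: ["66.249.66.1"],
))
```

Calling is_googlebot(ip) without the stub arguments performs the real lookups through the standard socket module.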

For more information on Google Videos please visit our Help Center, and to post questions and search for answers check out our Help Forum.

Friday, August 6, 2010

URL removals explained, part II: Removing sensitive text from a page

Webmaster Level: All

Change can happen—sometimes, as we saw in our previous post on URL removals, you may completely block or remove a page from your site. Other times you might only change parts of a page, or remove certain pieces of text. Depending on how frequently a page is being crawled, it can take some time before these changes get reflected in our search results. In this blog post we'll look at the steps you can take if we're still showing old, removed content in our search results, either in the form of a "snippet" or on the cached page that's linked to from the search result. Doing this makes sense when the old content contains sensitive information that needs to be removed quickly—it's not necessary to do this when you just update a website normally.

As an example, let's look at the following fictitious search result:

Walter E. Coyote < Title
Chief Development Officer at Acme Corp 1948-2003: worked on the top secret velocitus incalculii capturing device which has shown potential ... < Snippet
... - Cached < URL + link to cached page

To change the content shown in the snippet (or on the linked cached page), you'll first need to change the content on the actual (live) page. Unless a page's publicly visible content is changed, Google's automatic processes will continue to show parts of the original content in our search results.

Once the page's content has been changed, there are several options available to make those changes visible in our search results:

  1. Wait for Googlebot to re-crawl and re-index the page
    This is the natural method for how most content is updated at Google. Sometimes it can take a fairly long time, depending on how frequently Googlebot currently crawls the page in question. Once we've re-crawled and re-indexed the page, the old content will usually not be visible as it'll be replaced by the current content. Provided Googlebot is not blocked from crawling the page in question (either by robots.txt or by not being able to access the server properly), you don't have to do anything special for this to take place. It's generally not possible to speed up crawling and indexing, as these processes are fully automated and depend on many external factors.
  2. Use Google's public URL removal tool to request removal of content that has been removed from someone else's webpage
    Using this tool, it's necessary to enter the exact URL of the page that has been modified, select the "Content has been removed from the page" option, and then specify one or more words that have been completely removed from that page.

    Note that none of the words you enter can appear on the page; even if a word has been removed from one part of the page, your request will be denied if that word still appears on another part of the page. Be sure to choose a word (or words) that no longer appear anywhere on the page. If, in the above example, you removed "top secret velocitus incalculii capturing device," you should submit those words and not something like "my project." However, if the word "top" or "device" still exists anywhere on the page, the request would be denied. To maximize your chances of success, it's often easiest to just enter one word that you're sure no longer appears anywhere on the page.

    Once your request has been processed and it's found that the submitted word(s) no longer appear on the page, the search result will no longer show a snippet, nor will the cached page be available. The title and the URL of the page will still be visible, and the entry may still appear in search results for searches related to the content that has been removed (such as searches for [velocitus incalculii]), even if those words no longer appear in the snippet. However, once the page has been re-crawled and re-indexed, the new snippet and cached page can be visible in our search results.

    Keep in mind that we will need to verify removal of the word(s) by viewing the page. If the page no longer exists and the server is returning a proper 404 or 410 HTTP result code, making us unable to view the page, you may be better off requesting removal of the page altogether.
  3. Use the Google Webmaster Tools URL removal tool to request removal of information on a page from your website
    If you have access to the website in question and have verified ownership of it in Google Webmaster Tools, you can use the URL removal tool there (under Site Configuration > Crawler access) to request that the snippet and the cached page be removed until the page has been re-crawled. To use this tool, you only need to submit the exact URL of the page (you won't need to specify any removed words). Once your request has been processed, we'll remove the snippet and the cached page from search results. The title and the URL of the page will still be visible, and the page may also continue to rank in search results for queries related to content that has been removed. After the page has been re-crawled and re-indexed, the search result with an updated snippet and cached page (based on the new content) can be visible.
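
Before submitting words through the public removal tool (option 2), you may want to check for yourself that none of them still appears anywhere on the live page. The sketch below shows one way to do that in Python. The function name words_still_present and the whole-word, case-insensitive matching are assumptions made for this example; the tool's actual matching rules are not described here.

```python
import re


def words_still_present(page_text, removed_words):
    """Return the submitted words that still occur in the page text."""
    # Assumption: compare whole words, ignoring case.
    page_words = set(re.findall(r"\w+", page_text.lower()))
    return [word for word in removed_words if word.lower() in page_words]


# Hypothetical updated page text for the fictitious example above.
page_text = ("Chief Development Officer at Acme Corp 1948-2003: "
             "worked on a device for the top floor.")
submitted = ["top", "secret", "velocitus", "incalculii", "capturing", "device"]

print(words_still_present(page_text, submitted))  # ['top', 'device']
```

An empty list means every submitted word is gone from the page. In this run "top" and "device" remain, so a request listing them would likely be denied, while submitting only "velocitus" would be the safer choice.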

Google indexes and ranks items based not only on the content of a page, but also on other external factors, such as the inbound links to the URL. Because of this, it's possible for a URL to continue to appear in search results for content that no longer exists on the page, even after the page has been re-crawled and re-indexed. While the URL removal tool can remove the snippet and the cached page from a search result, it will not change or remove the title of the search result, change the URL that is shown, or prevent the page from being shown for searches based on any current or previous content. If this is important to you, you should make sure that the URL fulfills the requirements for a complete removal from our search results.

Removing non-HTML content

If the changed content is not in (X)HTML (for example, if an image, a Flash file or a PDF file has been changed), you won't be able to use the cache removal tool. So if it's important that the old content no longer be visible in search results, the fastest solution would be to change the URL of the file so that the old URL returns a 404 HTTP result code and use the URL removal tool to remove the old URL. Otherwise, if you choose to allow Google to naturally refresh your information, know that previews of non-HTML content (such as Quick View links for PDF files) can take longer to update after recrawling than normal HTML pages would.

Proactively preventing the appearance of snippets or cached versions

As a webmaster, you have the option to use robots meta tags to proactively prevent the appearance of snippets or cached versions without using our removal tools. While we don't recommend this as a default approach (the snippet can help users recognize a relevant search result faster, and a cached page gives them the ability to view your content even in the unexpected event of your server not being available), you can use the "nosnippet" robots meta tag to prevent showing of a snippet, or the "noarchive" robots meta tag to disable caching of a page. Note that if this is changed on existing and known pages, Googlebot will need to re-crawl and re-index those pages before this change becomes visible in search results.
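
For illustration, here is a small Python sketch that builds the robots meta tag for a page template. The helper name robots_meta_tag is hypothetical; the directive names "nosnippet" and "noarchive" are the ones described above, and the resulting tag belongs in the head section of the page.

```python
def robots_meta_tag(nosnippet=False, noarchive=False):
    """Return a robots meta tag for the chosen directives, or "" if none."""
    directives = []
    if nosnippet:
        directives.append("nosnippet")   # do not show a snippet
    if noarchive:
        directives.append("noarchive")   # do not offer a cached page
    if not directives:
        return ""
    return '<meta name="robots" content="{}">'.format(", ".join(directives))


print(robots_meta_tag(nosnippet=True, noarchive=True))
# <meta name="robots" content="nosnippet, noarchive">
```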

We hope this blog post helps to make some of the processes behind the URL removal tool for updated pages a bit clearer. In our next blog post we'll look at ways to request removal of content that you don't own; stay tuned!

As always, we welcome your feedback and questions in our Webmaster Help Forum.

Edit: Read the rest of this series:
Part I: Removing URLs & directories
Part III: Removing content you don't own
Part IV: Tracking requests, what not to remove

Companion post: Managing what information is available about you online