Thursday, December 22, 2011

Download search queries data using Python

Webmaster level: Advanced

For all the developers who have expressed interest in getting programmatic access to the search queries data for their sites in Webmaster Tools, we've got some good news. You can now get access to your search queries data in CSV format using an open source Python script from the webmaster-tools-downloads project. Search queries data is not currently available via the Webmaster Tools API; adding it has been a common request from API users, and we're considering it for the next API update. For those of you who need access to search queries data right now, let's look at an example of how the search queries downloader Python script can be used to download your search queries data and upload it to a Google Spreadsheet in Google Docs.

Example usage of the search queries downloader Python script
1) If Python is not already installed on your machine, download and install Python.
2) Download and install the Google Data APIs Python Client Library.
3) Create a folder and add the script to the newly created folder.
4) Copy the script to the same folder as and edit it to replace the example values for “website,” “email” and “password” with valid values for your Webmaster Tools verified site.
5) Open a Terminal window and run the script by entering "python" at the Terminal window command line:
6) Visit Google Docs to see a new spreadsheet containing your search queries data.

If you just want to download your search queries data in a .csv file without uploading the data to a Google spreadsheet use instead of in the example above.

You could easily configure these scripts to be run daily or monthly to archive and view your search queries data across larger date ranges than the current one month of data that is available in Webmaster Tools, for example, by setting up a cron job or using Windows Task Scheduler.

An important point to note is that this script example includes user name and password credentials within the script itself. If you plan to run this in a production environment you should follow security best practices like using encrypted user credentials retrieved from a secure data storage source. The script itself uses HTTPS to communicate with the API to protect these credentials.
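One small step away from hard-coded credentials is to keep them out of the script file entirely. The sketch below reads them from environment variables; the variable names (WMT_WEBSITE, WMT_EMAIL, WMT_PASSWORD) and the load_credentials helper are assumptions made for illustration and are not part of the downloader script. Environment variables are not encrypted storage, so treat this as a starting point rather than a complete solution.

```python
import os


def load_credentials(environ=None):
    """Return (website, email, password) read from environment variables.

    The variable names below are hypothetical; the downloader script
    itself does not define them.
    """
    env = os.environ if environ is None else environ
    names = ("WMT_WEBSITE", "WMT_EMAIL", "WMT_PASSWORD")
    # Fail early with a clear message instead of sending empty credentials.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in names)
```

The returned values could then be passed to the script in place of the example values for “website,” “email” and “password”.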

Take a look at the search queries downloader script and start using search queries data in your own scripts or tools. Let us know if you have questions or feedback in the Webmaster Help Forum.

Tuesday, December 20, 2011

Website user research and testing on the cheap

Webmaster level: Intermediate

As the team responsible for tens of thousands of Google’s informational web pages, the Webmaster Team is here to offer tips and advice based on their experiences as hands-on webmasters.

If you’ve never tested or analyzed usage of your website, ask yourself if you really know whether your site is useful for your target audience. If you’re unsure, why not find out? For example, did you know that on average users scroll down 5.9 times as often as they scroll up, meaning that once page content is scrolled past, it is often “lost”? (See Jakob Nielsen’s findings on scrolling, where he advises that users don’t mind scrolling, but within limits.)

Also, check your analytics—are you curious about high bounce rates from any of your pages, or very short time-on-page metrics?

First, think about your user

The start of a web project—whether it’s completely new or a revamp of an existing site—is a great time to ask questions like:

  • How might users access your site—home, office, on-the-go?
  • How tech-savvy are your visitors?
  • How familiar are users with the subject matter of your website?

The answers to some of these questions can be valuable when making initial design decisions.

For instance, if the user is likely to be on the road, they might be short on time to find the information they need from your site, or be in a distracting environment with a slow data connection, so a simple layout with a single purpose would work best. Additionally, if you’re providing content for a less technical audience, make sure the content is not too difficult to access: animation might provide a “wow” factor, but only if your user appreciates it and it doesn’t get in the way of the content.

Even without testing, building a basic user profile (or “persona”) can help shape your designs for the benefit of the user—this doesn’t have to be an exhaustive biography, but just some basic considerations of your user’s behavior patterns.

Simple testing

Testing doesn’t have to be a costly operation – friends and family can be a great resource. Some pointers:

  • Sample size: Just five people can be a large enough number of users to find common problems in your layouts and navigation (see Jakob Nielsen’s article on why using a small sample size is sufficient).
  • Choosing your testers: A range of technical abilities can be useful, but be sure to focus only on trends rather than on individual issues encountered. For example, if more than 50% of your testers have the same usability issue, it’s likely a real problem.
  • Testing location: If possible, visit the user in their home and watch how they use the site, observing how they normally navigate the web when relaxed and in their natural environment. Remote testing is also a possibility if you can’t make it in person; we’ve heard that Google+ hangouts can be used effectively for this (find out more about using Google+ hangouts).
  • How to test: Based on your site’s goals, define 4 or 5 simple tasks to do on your website, and let the user try to complete the tasks. Ask your testers to speak aloud so you can better understand their experiences and thought processes.
  • What to test: Basic prototypes in clickable image or document format (for example, PDF) or HTML can be used to test the basic interactions, without having to build out a full site for testing. This way, you can test out different options for navigation and layouts to see how they perform before implementing them.
  • What not to test: Focus on functionality rather than graphic design elements; viewpoints are often subjective. You would only get useful feedback on design from quantitative testing with large (200+) numbers of users (unless, for example, the colors you use on your site make the content unreadable, which would be good feedback!). One format for getting some useful feedback on the design can be to offer 5-6 descriptive keywords and ask your user to choose the most representative ones.

Overall, basic testing is most useful for seeing how your website’s functionality is working: the ease of finding information and common site interactions.

Lessons learned

In case you’re still wondering whether research and testing are really worth it, here are a few simple things that we wouldn’t have known if we hadn’t sat with actual users and watched them use our pages, or analyzed our web traffic.

  • Take care when using layouts that hide/show content: We found when using scripts to expand and collapse long text passages, the user often didn’t realize the extra content was available—effectively “hiding” the JavaScript-rendered content when the user searches within the page (for example, using Control + F, which we’ve seen often).

    Wireframe of layout tested, showing “zipped”
    content on the bottom left

    Final page design showing anchor links in the top
    and content laid out in the main body of the page

  • Check your language: Headings, link text and button text are what catch the user’s eye the most when scanning the page. Avoid using “Learn more…” in link text; users seem averse to clicking on a link which implies they will need to learn something. Instead, try to use a literal description of the content the user will get behind the link, and make sure link text makes sense and is easy to understand out of context, because that is often how it will be scanned. Be mindful about language and try to make button text descriptive, inviting and interesting.
  • Test pages on a slower connection: Try out your pages using different networks (for example, try browsing your website using the wifi at your local coffee shop or a friend’s house), especially if your target users are likely to be viewing your pages from a home connection that’s not as fast as your office network. We found a considerable improvement in CTR and time-on-site metrics in some cases when we made scripted animations much simpler and faster (hint: use Google’s Page Speed Online to check performance if you don’t have access to a slower Internet connection).

So if you’re caught up in a seemingly never-ending redevelopment cycle, save yourself some time in the future by investing a little up front through user profiling and basic testing, so that you’re more likely to choose the right approach for your site layout and architecture.

We’d love to hear from you in the comments: have you tried out website usability testing? If so, how did you get on, and what are your favorite simple and low-cost tricks to get the most out of it?

Friday, December 16, 2011

Rich Snippets Instructional Videos

Webmaster level: All

When users come to Google, they have a pretty good idea of what they’re looking for, but they need help deciding which result might have the information that best suits their needs. So, the challenge for Google is to make it very clear to our users what content exists on a page in both a useful and concise manner. That’s why we have rich snippets.

Essentially, rich snippets provide you with the ability to help Google highlight aspects of your page. Whether your site contains information about products, recipes, events or apps, a few simple additions to your markup can result in more engagement with your content -- and potentially more traffic to your site.

To help you get started or fine tune your rich snippets, we’ve put together a series of tutorial videos for webmasters of all experience levels. These videos provide guidance as you mark up your site so that Google is better able to understand your content. We can use that content to power the rich snippets we display for your pages. Check out the videos below to get started:

For more information on how to use rich snippets markup for your site, visit our Help Center.

Thursday, December 15, 2011

Introducing smartphone Googlebot-Mobile

Webmaster level: All

With the number of smartphone users rapidly rising, we’re seeing more and more websites providing content specifically designed to be browsed on smartphones. Today we are happy to announce that Googlebot-Mobile now crawls with a smartphone user-agent in addition to its previous feature phone user-agents. This is to increase our coverage of smartphone content and to provide a better search experience for smartphone users.

Here are the main user-agent strings that Googlebot-Mobile now uses:

  • Feature phones Googlebot-Mobile:

    • SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/ (GUI) MMP/2.0 (compatible; Googlebot-Mobile/2.1; +
    • DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.1; +
  • Smartphone Googlebot-Mobile:

    • Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8B117 Safari/6531.22.7 (compatible; Googlebot-Mobile/2.1; +

The content crawled by smartphone Googlebot-Mobile will be used primarily to improve the user experience on mobile search. For example, the new crawler may discover content specifically optimized to be browsed on smartphones as well as smartphone-specific redirects.

One new feature we’re also launching that uses these signals is Skip Redirect for Smartphone-Optimized Pages. When we discover a URL in our search results that redirects smartphone users to another URL serving smartphone-optimized content, we change the link target shown in the search results to point directly to the final destination URL. This removes the extra latency the redirect introduces, saving 0.5-1 seconds on average when visiting the landing page for such search results.

Since all Googlebot-Mobile user-agents identify themselves as a specific kind of mobile device, please treat each Googlebot-Mobile request as you would a human user with the same phone user-agent. This and other guidelines are described in our previous blog post and still apply, except for those referring to smartphones, which we are updating today. If your site has treated Googlebot-Mobile specially based on the fact that it only crawls with feature phone user-agents, we strongly recommend reviewing this policy and serving the appropriate content based on Googlebot-Mobile’s user-agent, so that both your feature phone and smartphone content will be indexed properly.
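As a rough illustration of serving content based on the phone part of the user-agent rather than on the crawler name, here is a minimal Python sketch. The token lists and the classify_user_agent function are assumptions for illustration only; they are not exhaustive and are not an official detection method. Note that the function never looks for “Googlebot-Mobile”, so the crawler is classified exactly like a human user with the same phone.

```python
# Illustrative substrings only; real device detection needs a fuller list.
SMARTPHONE_TOKENS = ("iPhone", "Android")
FEATURE_PHONE_TOKENS = ("MIDP", "UP.Browser", "DoCoMo")


def classify_user_agent(user_agent):
    """Return 'smartphone', 'feature_phone', or 'desktop' for a UA string."""
    # Check smartphone tokens first so smartphone browsers are not
    # misclassified by any older tokens they may also carry.
    if any(token in user_agent for token in SMARTPHONE_TOKENS):
        return "smartphone"
    if any(token in user_agent for token in FEATURE_PHONE_TOKENS):
        return "feature_phone"
    return "desktop"
```

With this approach, the smartphone Googlebot-Mobile user-agent listed above is classified as a smartphone because it contains “iPhone”, and both feature phone user-agents are classified as feature phones.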

If you have more questions, please ask on our Webmaster Help forums.

Wednesday, December 14, 2011

Clicks and impressions for authors

Webmaster Level: All
(Cross-posted on the Inside Search Blog)

With the latest improvements to the way authorship annotations look in search and the addition of authorship to Google News, authors have been really excited about getting more visibility, and users benefit from seeing the name, photo, and way to connect with the person who created the content.

Authors have also been giving us a lot of feedback on what else they'd like to see, so today we're introducing “Author Stats” in Webmaster Tools that shows you how often your content is showing up on the Google search results page. If you associate your content with your Google Profile either via e-mail verification or a simple link, you can visit Webmaster Tools to see how many impressions and clicks your content got on the Google search results page. Check out what Matt Cutts would see for his content:

To see your information, go to Webmaster Tools and log in with the same username you use for your Google+ Profile. In the left-hand panel, you can see “Author Stats” under the “Labs” section. This is an experimental feature so we’re continuing to iterate and improve, but we wanted to get early feedback from you. You can e-mail us at if you run into any issues or have feedback.

If you’re a content creator interested in learning more about authorship, check out our Help Center.

Tuesday, December 6, 2011

Tips for hosting providers and webmasters

Webmaster level: All

Some webmasters on our forums ask about hosting-related issues affecting their sites. To help both hosting providers and webmasters recognize, diagnose, and fix such problems, we’d like to share with you some of the common problems we’ve seen and suggest how you can fix them.

  • Blocking of Googlebot crawling. This is a very common issue, usually due to a misconfiguration in a firewall or DoS protection system and sometimes due to the content management system the site runs. Protection systems are an important part of good hosting and are often configured to block unusually high levels of server requests, sometimes automatically. However, because Googlebot often performs more requests than a human user, these protection systems may decide to block Googlebot and prevent it from crawling your website. To check for this kind of problem, use the Fetch as Googlebot function in Webmaster Tools, and check for other crawl errors shown in Webmaster Tools.

    We offer several tools to webmasters and hosting providers who want more control over Googlebot’s crawling, and to improve crawling efficiency:

    We have more information in our crawling and indexing FAQ.

  • Availability issues. A related type of problem we see is websites being unavailable when Googlebot (and users) attempt to access the site. This includes DNS issues, overloaded servers leading to timeouts and refused connections, misconfigured content distribution networks (CDNs), and many other kinds of errors. When Googlebot encounters such issues, we report them in Webmaster Tools as either URL unreachable errors or crawl errors.

  • Invalid SSL certificates. For SSL certificates to be valid for your website, they need to match the name of the site. Common problems include expired SSL certificates and servers misconfigured such that all websites on that server use the same certificate. Most web browsers will try to warn users in these situations, and Google tries to alert webmasters of this issue by sending a message via Webmaster Tools. The fix for these problems is to use SSL certificates that are valid for all of your website’s domains and subdomains that your users will interact with.

  • Wildcard DNS. Websites can be configured to respond to all subdomain requests. For example, the website at can be configured to respond to requests to, and all other subdomains.

    In some cases this is desirable to have; for example, a user-generated content website may choose to give each account its own subdomain. However, in some cases, the webmaster may not wish to have this behavior as it may cause content to be duplicated unnecessarily across different hostnames and it may also affect Googlebot’s crawling.

    To minimize problems in wildcard DNS setups, either configure your website to not use them, or configure your server to not respond successfully to non-existent hostnames, either by refusing the connection or by returning an HTTP 404 header.

  • Misconfigured virtual hosting. The symptom of this problem is that multiple hosts and/or domain names hosted on the same server always return the contents of only one site. To rephrase, although the server hosts multiple sites, it returns only one site regardless of what is being requested. To diagnose the issue, you need to check that the server responds correctly to the Host HTTP header.

  • Content duplication through hosting-specific URLs. Many hosts helpfully offer URLs for your website for testing/development purposes. For example, if you’re hosting the website on the hosting provider, the host may offer access to your site through a URL like or Our recommendation is to have these hosting-specific URLs not publicly accessible (by password protecting them); and even if these URLs are accessible, our algorithms usually pick the URL webmasters intend. If our algorithms select the hosting-specific URLs, you can influence our algorithms to pick your preferred URLs by implementing canonicalization techniques correctly.

  • Soft error pages. Some hosting providers show error pages using an HTTP 200 status code (meaning “Success”) instead of an HTTP error status code. For example, a “Page not found” error page could return HTTP 200 instead of 404, making it a soft 404 page; or a “Website temporarily unavailable” message might return a 200 instead of correctly returning a 503 HTTP status code. We try hard to detect soft error pages, but when our algorithms fail to detect a web host’s soft error pages, these pages may get indexed with the error content. This may cause ranking or cross-domain URL selection issues.

    It’s easy to check the status code returned: simply check the HTTP headers the server returns using any one of a number of tools, such as Fetch as Googlebot. If an error page is returning HTTP 200, change the configuration to return the correct HTTP error status code. Also, keep an eye out for soft 404 reports in Webmaster Tools, on the Crawl errors page in the Diagnostics section.

  • Content modification and frames. Webmasters may be surprised to see their page contents modified by hosting providers, typically by injecting scripts or images into the page. Web hosts may also serve your content by embedding it in other pages using frames or iframes. To check whether a web host is changing your content in unexpected ways, simply check the source code of the page as served by the host and compare it to the code you uploaded.

    Note that some server-side code modifications may be very useful. For example, a server using Google’s mod_pagespeed Apache module or other tools may be returning your code minified for page speed optimization.

  • Spam and malware. We’ve seen some web hosts and bulk subdomain services become major sources of malware and spam. We try hard to be granular in our actions when protecting our users and search quality, but if we see a very large fraction of sites on a specific web host that are spammy or are distributing malware, we may be forced to take action on the web host as a whole. To help you keep on top of malware, we offer:

We hope this list helps both hosting providers and webmasters diagnose and fix these issues. Beyond this list, also think about the qualitative aspects of hosting like quality of service and the helpfulness of support. As always, if you have questions or need more help, please ask in our Webmaster Help Forum.
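To make the wildcard DNS, virtual hosting, and soft error page items above more concrete, here is a minimal Python WSGI sketch. The hostnames, page contents, and the SITES table are hypothetical, and a production server would normally handle this in its own configuration. The idea shown is that the response depends on the Host header, and that an unknown hostname or a missing page gets a real 404 status rather than a 200.

```python
# Hypothetical hostnames and pages, for illustration only.
SITES = {
    "www.example.com": {"/": b"Home of www.example.com"},
    "blog.example.com": {"/": b"Home of blog.example.com"},
}


def app(environ, start_response):
    """Minimal WSGI app that dispatches on the Host header."""
    # Drop any port suffix and normalize case before the lookup.
    host = environ.get("HTTP_HOST", "").split(":")[0].lower()
    pages = SITES.get(host)
    body = pages.get(environ.get("PATH_INFO", "/")) if pages else None
    if body is None:
        # Unknown hostname or missing page: send a real 404, not a 200.
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]
```

For local experimentation, this could be served with the standard library, for example wsgiref.simple_server.make_server("", 8000, app).serve_forever(), and the returned status codes checked for different Host headers.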

Monday, December 5, 2011

New markup for multilingual content

Many websites serve users from around the world. There are different approaches to serving content appropriate to your users' language and/or region. Last year, we launched support for explicit annotations for web pages rendering the same content with different language templates.
Today we're going further with our support for multilingual content with improved handling for these two scenarios:
  • Multiregional websites using substantially the same content. Example: English webpages for Australia, Canada and USA, differing only in price
  • Multiregional websites using fully translated content, or substantially different monolingual content targeting different regions. Example: a product webpage in German, English and French

Specifying language and location

We've expanded our support of the rel="alternate" hreflang link element to handle content that is translated or provided for multiple geographic regions. The hreflang attribute can specify the language, optionally the country, and URLs of equivalent content. By specifying these alternate URLs, our goal is to be able to consolidate signals for these pages, and to serve the appropriate URL to users in search. Alternative URLs can be on the same site or on another domain.

Annotating pages as substantially similar content

Optionally, for pages that have substantially the same content in the same language and are targeted at multiple countries, you may use the rel="canonical" link element to specify your preferred version. We’ll use that signal to focus on that version in search, while showing the local URLs to users where appropriate. For example, you could use this if you have the same product page in German, but want to target it separately to users searching on the Google properties for Germany, Austria, and Switzerland.
Update: to simplify implementation, we no longer recommend using rel=canonical.

Example usage

To explain how it works, let’s look at some example URLs:
  • - contains the general homepage of a website, in Spanish
  • - is the version for users in Spain, in Spanish
  • - is the version for users in Mexico, in Spanish
  • - is the generic English language version
On all of these pages, we could use the following markup to specify language and optionally the region:

<link rel="alternate" hreflang="es" href="" />
<link rel="alternate" hreflang="es-ES" href="" />
<link rel="alternate" hreflang="es-MX" href="" />
<link rel="alternate" hreflang="en" href="" />

If you specify a regional subtag, we’ll assume that you want to target that region.
Keep in mind that all of these annotations are to be used on a per-URL basis. You should take care to use the specific URL, not the homepage, for both of these link elements.
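If you generate pages from templates, the annotations can be built from a single list of alternates so that every version of a page carries the same set of link elements. The following Python sketch assumes hypothetical example.com URLs and a hypothetical hreflang_links helper; neither comes from the example above.

```python
from html import escape


def hreflang_links(alternates):
    """Build rel="alternate" hreflang link elements from (hreflang, url) pairs."""
    return "\n".join(
        '<link rel="alternate" hreflang="{}" href="{}" />'.format(
            escape(lang, quote=True), escape(url, quote=True)
        )
        for lang, url in alternates
    )


# Hypothetical URLs, used only to make the sketch runnable.
ALTERNATES = [
    ("es", "http://www.example.com/"),
    ("es-ES", "http://es-es.example.com/"),
    ("es-MX", "http://es-mx.example.com/"),
    ("en", "http://en.example.com/"),
]

print(hreflang_links(ALTERNATES))
```

Because the annotations are per-URL, a real template would build the list from the equivalents of the specific page being rendered, not from the homepages.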

More help

As always, if you need more help correctly implementing multiregional and multilingual websites, please see our Help Center article about this topic, or ask in our Webmaster Help Forum.