
Friday, October 31, 2008

Spookier than malware


hotdog

lion king
...and infinitely more fun: webmasters and their pets incognito! Happy Halloween, everyone! If you see any costumes that would pass the SafeSearch filter :), feel like sharing a gripe or telling a good story, please join the chat!

Take care, and don't forget to brush your teeth.
 Yours scarily,
  The Webmaster Central Team


Our glasses-wearing, no vampire-teeth vampire (Ryan), zombie Mur, Holiday Fail (Tiffany Lane), Colbert Hipster (Dan Vanderkam), Rick Astley Cutts, Homeboy Ben D'Angelo, Me -- pinker & poofier, Investment Bank CEO Shyam Jayaraman (though you can't see the golden parachute in his backpack)



Chark as Juno, Wysz as Beah Burger (our co-worker), Adi and Matt Dougherty as yellow ninja, red ninja!


Heroes come in all shapes and sizes...

Powdered toast man, Mike Leotta

Adam Lasnik as, let me see if I get this right, a "secret service agent masquerading as a backstage tech" :)

Wednesday, October 29, 2008

Reflections on the "Tricks and Treats" webmaster event

What featured over 750 webmasters and a large number of Googlers from around the world, hundreds of questions, and over one hundred answers over the course of nearly two hours?  If you guessed "the Tricks and Treats webmaster event from earlier this month!" well, you're either absolutely brilliant, you read the title of this post, or both!

How did it go?
It was an exhilarating, exhausting, and educational event, if we may say so ourselves, even though there were a few snafus.  We're aware that the sound quality wasn't great for some folks, and we've also appreciated quite-helpful constructive criticisms in this feedback thread.  Last but not least, we are bummed to admit that someone (whose name starts with 'A' and ends with 'M') uncharacteristically forgot to hit the record button (really!), so there's unfortunately no audio recording to share :-(.

But on more positive notes, we're delighted that so many of you enjoyed our presentations (embedded below), our many answers, and even some of our bad jokes (mercifully not to be repeated).

What next?
Well, for starters, all of us Webmaster Central Googlers will be spending quite some time taking in your feedback.  Some of you have requested sessions exclusively covering particular (pre-announced) topics or tailored to specific experience levels, and we've also heard from many webmasters outside of the U.S. who would love online events in other languages and at more convenient times.  No promises, but you can bet we're eager to please!  Stay tuned on this blog (and, as a hint and hallo to our German-speaking webmasters, do make sure to follow our German webmaster blog  ;-).  

And finally, a big thank you!
A heartfelt thank you to my fellow Googlers, many of whom got up at the crack of dawn to get to the office early for the chat and previous day's runthrough or stayed at work late in Europe.  But more importantly, major props to all of you (from New Delhi, New York, New Zealand and older places) who asked great questions and hung out with us online for up to two hours.  You webmasters are the reason we love coming to work each day, and we look forward to our next chat!

*  *  *

The presentations...
We had presentations from John, Jonathan, Maile, and Wysz.  Presentations from the first three are embedded below (Wysz didn't have a written presentation this time).


John's slides on "Frightening Webmastering Myths"


Jonathan's slides on "Using the Not Found errors report in Webmaster Tools"


Maile's slides on "Where We're Coming From"


Edited on Wednesday, October 29 at 6:00pm to update number of participants

Friday, October 24, 2008

Malware? We don't need no stinking malware!

(Cross-posted from the Google Online Security Blog.)

"This site may harm your computer"
You may have seen those words in Google search results — but what do they mean? If you click the search result link you get another warning page instead of the website you were expecting. But if the web page was your grandmother's baking blog, you're still confused. Surely your grandmother hasn't been secretly honing her l33t computer hacking skills at night school. Google must have made a mistake and your grandmother's web page is just fine...

I work with the team that helps put the warning in Google's search results, so let me try to explain. The good news is that your grandmother is still kind and loves turtles. She isn't trying to start a botnet or steal credit card numbers. The bad news is that her website or the server that it runs on probably has a security vulnerability, most likely from some out-of-date software. That vulnerability has been exploited and malicious code has been added to your grandmother's website. It's most likely an invisible script or iframe that pulls content from another website that tries to attack any computer that views the page. If the attack succeeds, then viruses, spyware, key loggers, botnets, and other nasty stuff will get installed.

If you see the warning on a site in Google's search results, it's a good idea to pay attention to it. Google has automatic scanners that are constantly looking for these sorts of web pages. I help build the scanners and continue to be surprised by how accurate they are. There is almost certainly something wrong with the website even if it is run by someone you trust. The automatic scanners make unbiased decisions based on the malicious content of the pages, not the reputation of the webmaster.

Servers are just like your home computer and need constant updating. There are lots of tools that make building a website easy, but each one adds some risk of being exploited. Even if you're diligent and keep all your website components updated, your web host may not be. They control your website's server and may not have installed the most recent OS patches. And it's not just innocent grandmothers that this happens to. There have been warnings on the websites of banks, sports teams, and corporate and government websites.

Uh-oh... I need help!
Now that we understand what the malware label means in search results, what do you do if you're a webmaster and Google's scanners have found malware on your site?

There are some resources to help clean things up. The Google Webmaster Central blog has some tips and a quick security checklist for webmasters. Stopbadware.org has great information, and their forums have a number of helpful and knowledgeable volunteers who may be able to help (sometimes I'm one of them). You can also use the Google SafeBrowsing diagnostics page for your site (http://www.google.com/safebrowsing/diagnostic?site=<site-name-here>) to see specific information about what Google's automatic scanners have found. If your site has been flagged, Google's Webmaster Tools lists some of the URLs that were scanned and found to be infected.
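If you look after more than one site, you may want to build the diagnostics page address for each of them in code. Here is a minimal sketch that fills in the URL pattern quoted above; the class and method names are our own invention for illustration, and the site name is URL-encoded on the assumption that it may contain a path.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class SafeBrowsingDiagnostic {
    // URL pattern quoted in the post; the site name goes after "site=".
    private static final String BASE =
        "http://www.google.com/safebrowsing/diagnostic?site=";

    /** Builds the diagnostics page URL for one site name (hypothetical helper). */
    static String diagnosticUrl(String site) {
        try {
            // Encode the site name so that slashes or spaces survive as a query value.
            return BASE + URLEncoder.encode(site, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(diagnosticUrl("example.com"));
    }
}
```

Open the resulting address in a browser to see what the automatic scanners have reported for that site.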

Once you've cleaned up your website, use Google's Webmaster Tools to request a malware review. The automatic systems will rescan your website and the warning will be removed if the malware is gone.

Advance warning
I often hear webmasters asking Google for advance warning before a malware label is put on their website. When the label is applied, Google usually emails the website owners and then posts a warning in Google's Webmaster Tools. But no warning is given ahead of time - before the label is applied - so a webmaster can't quickly clean up the site before a warning is applied.

But, look at the situation from the user's point of view. As a user, I'd be pretty annoyed if Google sent me to a site it knew was dangerous. Even a short delay would expose some users to that risk, and it doesn't seem justified. I know it's frustrating for a webmaster to see a malware label on their website. But, ultimately, protecting users against malware makes the internet a safer place and everyone benefits, both webmasters and users.

Google's Webmaster Tools has started a test to provide warnings to webmasters that their server software may be vulnerable. Responding to that warning and updating server software can prevent your website from being compromised with malware. The best way to avoid a malware label is to never have any malware on the site!

Reviews
You can request a review via Google's Webmaster Tools and you can see the status of the review there. If you think the review is taking too long, make sure to check the status. Finding all the malware on a site is difficult and the automated scanners are far more accurate than humans. The scanners may have found something you've missed and the review may have failed. If your site has a malware label, Google's Webmaster Tools will also list some sample URLs that have problems. This is not a full list of all of the problem URLs (because that's often very, very long), but it should get you started.

Finally, don't confuse a malware review with a request for reconsideration. If Google's automated scanners find malware on your website, the site will usually not be removed from search results. There is also a different process that removes spammy websites from Google search results. If that's happened and you disagree with Google, you should submit a reconsideration request. But if your site has a malware label, a reconsideration request won't do any good — for malware you need to file a malware review from the Overview page.

How long will a review take?
Webmasters are eager to have a Google malware label removed from their site and often ask how long a review of the site will take. Both the original scanning and the review process are fully automated. The systems analyze large portions of the internet, which is a big place, so the review may not happen immediately. Ideally, the label will be removed within a few hours. At its longest, the process should take a day or so.

Tuesday, October 21, 2008

Webmaster chat event: Vote early and often!


No matter where in the world you are, you can vote right now on webmaster-oriented questions by registering for our free Webmaster chat  ("Tricks and Treats") which is scheduled for tomorrow at 9am PDT (5pm GMT).  Even better: you can suggest your own questions that you'd like Webmaster Central Googlers to answer.


We're using the new Google Moderator tool, so posting questions and voting on your favorites is fun and easy; you'll receive an e-mail with a link to the webmaster chat questions right after you register.  Click on the check mark next to questions you find particularly interesting and important. Click on the X next to questions that seem less relevant or useful.  From your votes, Google Moderator will surface the best questions, helping us spend more time in the chat on issues you really care about.

Feel free to review our post from yesterday for more details on this event.

See you there!


P.S. - Speaking of voting:  If you're an American citizen, we hope you're also participating in the upcoming presidential election! Our friends in Google Maps have even prepared a handy lookup tool to help you find your voting place -- check it out!



Monday, October 20, 2008

Join us for our third live online webmaster chat!


You know how some myths just won't die?  Well, do we have some great news for you!  A not-so-scary bunch of Gooooooooooooglers will be on hand to drive a stake through the most ghoulish webmastering myths and misconceptions in our live online "Tricks and Treats" chat this coming Wednesday.

That's right!  You'll be treated to some brief presentations and then have the chance to ask lots of questions to Googlers ranging from Matt Cutts in Mountain View to John Mueller in Zurich to Kaspar Szymanski in Dublin (and many more folks as well).


Here's what you'll need
  • About an hour of free time
  • A computer with audio capabilities that is connected to the Internet and has these additional specifications
    (We'll be broadcasting via the Internet tubes this time rather than over the phone lines)
  • A URL for the chat, which you can only get when you register for the event (don't worry -- it's fast and painless!)
  • Costumes: optional

What will our Tricks and Treats chat include?
  • INTRO:  A quick hello from some of your favorite Help Group Guides
  • PRESO:  A 15 minute presentation on "Frightening Myths and Misconceptions" by John Mueller
  • FAQs:  A return of our popular "Three for Three," in which we'll have three different Googlers tackling three different issues we've seen come up in the Group recently... in under three minutes each!
  • And lots of Q&A!  You'll have a chance to type questions during the entire session (actually, starting an hour prior!) using our hunky-dory new Google Moderator tool.  Ask, then vote!  With this tool and your insights, we expect the most interesting questions to quickly float to the top.

When and how can you join in?
  1. Mark the date on your calendar now:  Wednesday, October 22, at 9am PDT, noon EDT, and 5pm GMT
  2. Register right now for this event.  Please note that you'll need to click on the "register" link on the bottom lefthand side.
  3. Optionally post questions via Google Moderator one hour prior to the start of the event.  The link will be mailed to all registrants.
  4. Log in 5-10 minutes prior to the start of the chat, using the link e-mailed to you by WebEx (the service hosting the event).
  5. Interact!  During the event, you'll be able to chat (by typing) with your fellow attendees, and also post questions and vote on your favorite questions via Google Moderator.

We look forward to seeing you online!  In the meantime, if you have any questions, feel free to post a note in this thread of our friendly Webmaster Help Group.

Edited on October 21st at 12:15pm and 12:29pm PDT to add:
We've decided to open up the Google Moderator page early.  Everyone who registered for this event previously and everyone registering from this moment on will receive the link in e-mail.  Also, the event is scheduled for *5pm* GMT (correctly listed on the registration page and in the followup e-mails).

Sunday, October 19, 2008

Where's my data?

Today we're going back to basics. We'll be answering the question: What is a website?

...Okay, not exactly. But we will be looking into what a "website" means in the context of Webmaster Tools, what kind of sites you can add to your Webmaster Tools account, and what data you can get from different types of sites.

Why should you care? Well, the following are all questions that we've gotten from webmasters recently:
  • "I know my site has lots of incoming links; why don't I see any in my Webmaster Tools account?"
  • "I see sitelinks for my site in Google's search results, but when I look in Webmaster Tools it says 'No sitelinks have been generated for your site.'"
  • "Why does my Top search queries report still say 'Data is not available at this time'? My site has been verified for months."
In each of these cases, the answer was the same: the data was there, but the webmaster was looking at the wrong "version" of their domain in Webmaster Tools.


A little background
The majority of tools and settings in Webmaster Tools operate on a per-site basis. This means that when you're looking at, say, the Top search queries report, you're only seeing the top search queries for a particular site. Looking at the top queries for www.example.com will show you different data than looking at the top queries for www.example.org. Makes sense, right?

Not all websites have URLs in the form www.example.com, though. Your root URL may not include the www subdomain (example.com); it may include a custom subdomain (rollergirl.example.com); or your site may live in a subfolder, for example if it's hosted on a free hosting site (www.example.com/rollergirl/). Since we want webmasters to be able to access our tools regardless of how their site is hosted, you can add any combination of domain, subdomain(s), and/or subfolder(s) as a "site" on your Webmaster Tools dashboard. Once you've verified your ownership of that site, we'll show you the information we have for that particular piece of the web, however big or small it may be. If you've verified your domain at the root level, we'll show you data for that whole domain; if you've only verified a particular subfolder or subdomain, we'll only show you data for that subfolder or subdomain. Take Blogger as an example—someone who blogs with Blogger should only be able to have access to the data for their own subdomain (googlewebmastercentral.blogspot.com), not the entire blogspot.com domain.

What some people overlook is the fact that www is actually a subdomain. It's a very, very common subdomain, and many sites serve the same content whether you access them with or without the www; but the fact remains that example.com and www.example.com are two different URLs and have the potential to serve different content. For this reason, they're considered different sites in Webmaster Tools. Since they're different sites, just like www.example.com and www.example.org, they can have different data. When you're looking at the data for www.example.com (with the www subdomain) you're not seeing the data for example.com (without the subdomain), and vice versa.

What can I do to make sure I'm seeing all my data?
  • If you feel like you're missing some data, add both the www and the non-www version of your domain to your Webmaster Tools account. Take a look at the data for both sites.
  • Do a site: search for your domain without the www (e.g. [site:example.com]). This should return pages from your domain and any of your indexed subdomains (www.example.com, rollergirl.example.com, etc.). You should be able to tell from the results whether your site is mainly indexed with or without the www subdomain. The version that's indexed is likely to be the version that shows the most data in your Webmaster Tools account.
  • Tell us whether you prefer for your site to be indexed with or without the www by setting your preferred domain.
  • Let everyone else know which version you prefer by doing a site-wide 301 redirect.
Even though example.com and www.example.com may look like identical twins, any twins will be quick to tell you that they're not actually the same person. :-) Now that you know, we urge you to give both your www and non-www sites some love in Webmaster Tools, and—as usual—to post any follow-up questions in our Webmaster Help Group.
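For the site-wide 301 redirect mentioned in the list above, the core decision is small: if a request arrives on the "other" twin, send it to the preferred one and keep the path. Here is a rough sketch of that decision only. The class and method names are hypothetical, it assumes plain http, and it leaves the actual sending of the 301 status and Location header to whatever web server or framework you use.

```java
import java.util.Locale;
import java.util.Optional;

final class PreferredHostRedirect {
    /**
     * Returns the value for the Location header of a 301 response, or an
     * empty Optional when no redirect is needed. Only the www/non-www pair
     * of the preferred host is handled; other subdomains are left alone.
     */
    static Optional<String> redirectTarget(String requestHost,
                                           String pathAndQuery,
                                           String preferredHost) {
        String requestBare = stripWww(requestHost.toLowerCase(Locale.ROOT));
        String preferredBare = stripWww(preferredHost.toLowerCase(Locale.ROOT));

        boolean sameDomain = requestBare.equals(preferredBare);
        boolean alreadyPreferred = requestHost.equalsIgnoreCase(preferredHost);
        if (!sameDomain || alreadyPreferred) {
            return Optional.empty();
        }
        // Assumption: the site is served over plain http.
        return Optional.of("http://" + preferredHost + pathAndQuery);
    }

    private static String stripWww(String host) {
        return host.startsWith("www.") ? host.substring(4) : host;
    }

    public static void main(String[] args) {
        System.out.println(
            redirectTarget("example.com", "/rollergirl/", "www.example.com"));
    }
}
```

Keeping the path and query in the redirect means visitors and crawlers land on the matching page of the preferred version rather than on its homepage.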

Friday, October 17, 2008

First Click Free for Web Search

While working on our mission to organize the world's information and make it universally accessible and useful, we sometimes run into situations where important content is not publicly available. In order to help users find and access content that may require registration or a subscription, Google offers an option to web and news publishers called "First Click Free." First Click Free has two main goals:
  1. To include highly relevant content in Google's search index. This provides a better experience for Google users who may not have known that content existed.
  2. To provide a promotion and discovery opportunity for publishers with restricted content.

First Click Free is designed to protect your content while allowing you to include it in Google's search index. To implement First Click Free, you must allow all users who find your page through Google search to see the full text of the document that the user found in Google's search results and that Google's crawler found on the web without requiring them to register or subscribe to see that content. The user's first click to your content is free and does not require logging in. You may, however, block the user with a login or payment or registration request when he tries to click away from that page to another section of your content site.

Guidelines
Webmasters wishing to implement First Click Free should follow these guidelines:
  • All users who click a Google search result to arrive at your site should be allowed to see the full text of the content they're trying to access.
  • The page displayed to all users who visit from Google must be identical to the content that is shown to Googlebot.
  • If a user clicks to a multi-page article, the user must be able to view the entire article. To allow this, you could display all of the content on a single page—you would need to do this for both Googlebot and for users. Alternately, you could use cookies to make sure that a user can visit each page of a multi-page article before being asked for registration or payment.

Implementation Suggestions
To include your restricted content in Google's search index, our crawler needs to be able to access that content on your site. Keep in mind that Googlebot cannot access pages behind registration or login forms. You need to configure your website to serve the full text of each document when the request is identified as coming from Googlebot via the user-agent and IP-address. It's equally important that your robots.txt file allows access of these URLs by Googlebot.
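One way to sketch the "user-agent and IP-address" identification is a two-step check: first see whether the request claims to be Googlebot, then confirm the IP address with a reverse DNS lookup followed by a forward lookup. The sketch below assumes that genuine Googlebot host names end in googlebot.com; the class and method names are hypothetical, and the DNS step needs network access, so treat it as a starting point rather than a finished implementation.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Locale;

final class GooglebotCheck {
    /** True if the User-Agent header claims to be Googlebot (easy to fake). */
    static boolean userAgentClaimsGooglebot(String userAgent) {
        return userAgent != null && userAgent.contains("Googlebot");
    }

    /** True if a reverse-DNS host name is in the googlebot.com domain (assumption). */
    static boolean isGooglebotHostName(String hostName) {
        if (hostName == null) {
            return false;
        }
        return hostName.toLowerCase(Locale.ROOT).endsWith(".googlebot.com");
    }

    /** Reverse lookup, domain check, then forward lookup back to the same IP. */
    static boolean verifyByDns(String ip) {
        try {
            InetAddress address = InetAddress.getByName(ip);
            // Falls back to the textual IP when no reverse record exists.
            String hostName = address.getCanonicalHostName();
            if (!isGooglebotHostName(hostName)) {
                return false;
            }
            for (InetAddress forward : InetAddress.getAllByName(hostName)) {
                if (forward.equals(address)) {
                    return true;
                }
            }
            return false;
        } catch (UnknownHostException e) {
            return false;
        }
    }
}
```

Because DNS lookups are slow, you would probably want to cache the result per IP address instead of repeating the check on every request.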

When users click a Google search result to access your content, your web server will need to check the "Referer" HTTP request-header field. When the referring URL is on a Google domain, like www.google.com or www.google.de, your site will need to display the full text version of the page instead of the protected version of the page that is otherwise shown. Most web servers have instructions for implementing this type of behavior.
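The Referer check described above could look roughly like the following sketch. The class name, method name, and the short domain list are our own assumptions for illustration; a real list would need to cover every Google domain you care about. Note that it compares the host of the referring URL rather than searching the whole string, so a URL that merely mentions google.com in its path or query does not qualify.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

final class FirstClickFreeCheck {
    // Illustrative only: extend with the other Google domains you want to honor.
    private static final String[] GOOGLE_DOMAINS = {"google.com", "google.de"};

    /** True if the Referer header value points at a host on a listed Google domain. */
    static boolean isGoogleReferer(String referer) {
        if (referer == null || referer.isEmpty()) {
            return false;
        }
        String host;
        try {
            host = new URI(referer).getHost();
        } catch (URISyntaxException e) {
            return false; // Unparseable header: show the protected version.
        }
        if (host == null) {
            return false;
        }
        host = host.toLowerCase(Locale.ROOT);
        for (String domain : GOOGLE_DOMAINS) {
            if (host.equals(domain) || host.endsWith("." + domain)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isGoogleReferer("http://www.google.com/search?q=baking"));
    }
}
```

When the method returns true, your server would display the full text version of the page; otherwise it would show the protected version as usual. Keep in mind that some browsers and proxies omit the Referer header, in which case this check simply returns false.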

Frequently Asked Questions
Q: Can I allow Googlebot to access some restricted content pages but not others?
A: Yes.

Q: Can I limit the number of restricted content pages that an individual user can access on my site via First Click Free?
A: No. Any user arriving at your site from a Google search results page should be shown the full text of the requested page.

Q: Can First Click Free URLs be submitted using Sitemap files?
A: Yes. Simply create and submit your Sitemap file as usual.

Q: Is First Click Free content guaranteed inclusion in the Google Index?
A: No. Google does not guarantee inclusion in the web index.


Do you have any more questions or comments? Come on over to the Google Webmaster Help forum and join the discussion!


Thursday, October 16, 2008

Message Center warnings for hackable sites

Recently we've seen more websites get hacked because of various security holes. In order to help webmasters with this issue, we plan to run a test that will alert some webmasters if their content management system (CMS) or publishing platform looks like it might have a security hole or be hackable. This is a test, so we're starting out by alerting five to six thousand webmasters. We will be leaving messages for owners of potentially vulnerable sites in the Google Message Center that we provide as a free service as part of Webmaster Tools. If you manage a website but haven't signed up for Webmaster Tools, don't worry. The messages will be saved and if you sign up later on, you'll still be able to access any messages that Google has left for your site.

One of the most popular pieces of software on the web is WordPress, so we're starting our test with a specific version (2.1.1) that is known to be vulnerable to exploits. If the test goes well, we may expand these messages to include other types of software on the web. The message that a webmaster will see in their Message Center if they run WordPress 2.1.1 will look like this:


Quick note from Matt: In general, it's a good idea to make sure that your webserver's software is up-to-date. For example, the current version of WordPress is 2.6.2; not only is that version more secure than previous versions, but it will also alert you when a new version of WordPress is available for downloading. If you run an older version of WordPress, I highly encourage you to upgrade to the latest version.

Wednesday, October 15, 2008

Video Tutorial: Google for Webmasters

We're always looking for new ways to help educate our fellow webmasters. While you may already be familiar with Webmaster Tools, Webmaster Help Discussion Groups, this blog, and our Help Center, we've added another tutorial to help you understand how Google works. Hence we've made this video of a soon-to-come presentation titled "Google for Webmasters." This video will introduce how Google discovers, crawls, and indexes your site's pages, and how Google displays them in search results. It also touches lightly upon challenges webmasters and search engines face, such as duplicate content, and the effective indexing of Flash and AJAX content. Lastly, it also talks about the benefits of offerings in Webmaster Central and other useful Google products.


Take a look for yourself.

Discoverability:



Accessibility - Crawling and Indexing:


Ranking:


Webmaster Central Overview:


Other Resources:



Google Presentations Version:
http://docs.google.com/Presentation?id=dc5x7mrn_245gf8kjwfx

Important links from this presentation as they chronologically appear in the video:
Add your URL to Google
Help Center: Sitemaps
Sitemaps.org
Robots.txt
Meta tags
Best uses of Flash
Best uses of Ajax
Duplicate content
Google's Technology
Google's History
PigeonRank
Help Center: Link Schemes
Help Center: Cloaking
Webmaster Guidelines
Webmaster Central
Google Analytics
Google Website Optimizer
Google Trends
Google Reader
Google Alerts
More Google Products


Special thanks to Wysz, Chark, and Alissa for the voices.

Tuesday, October 14, 2008

Helping you break the language barrier

When webmasters put content out on the web it's there for the world to see. Unfortunately, most content on the web is only published in a single language, understandable by only a fraction of the world's population.

In a continued effort to make the world's information universally accessible, Google Translate has a number of tools for you to automatically translate your content into the languages of the world.


Users may already be translating your webpage using Google Translate, but you can make it even easier by including our "Translate My Page" gadget, available at http://translate.google.com/translate_tools.

The gadget will be rendered in the user's language, so if they come to your page and can't understand anything else, they'll be able to read the gadget, and translate your page into their language.

Sometimes there may be some content on your page that you don't want us to translate. You can now add class=notranslate to any HTML element to prevent that element from being translated. For example, you may want to do something like:
Email us at <span class="notranslate">sales at mydomain dot com</span>
And if you have an entire page that should not be translated, you can add:
<meta name="google" value="notranslate">
to the <head> of your page and we won't translate any of the content on that page.

Update on 12/15/2008: We also support:
<meta name="google" content="notranslate">
Thanks to chaoskaizer for pointing this out in the comments. :)

Lastly, if you want to do some fancier automatic translation integrated directly into your page, check out the AJAX Language API we launched last March.

With these tools we hope you can more easily make your content available in all the languages we support, including Arabic, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Filipino, Finnish, French, German, Greek, Hebrew, Hindi, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Ukrainian, and Vietnamese.

Monday, October 13, 2008

Webmaster Tools API updated with Site Settings

The Webmaster Tools GData API has been updated to allow you to get even more out of Webmaster Tools, such as setting a geographic location or your preferred domain. For those of you that aren't familiar with GData, it's a protocol for reading and writing data on the web. GData makes it very easy to communicate with many Google services, like Webmaster Tools. The Webmaster Tools GData API already allows you to add and verify sites for your account and to submit Sitemaps programmatically. Now you can also access and update site-specific information. This is especially useful if you have a large number of sites. With the Webmaster Tools API, you can perform hundreds of operations in the time that it would take to add and verify a single site through the web interface.
What can I do?
We've included four new features in the API. You can see and update these settings for each site that you have verified. The features are:
  • Crawl Rate: You can request that Googlebot crawl your site slower or faster than it normally would (the details can be found in our Help Center article about crawl rate control). If many of your sites are hosted on the same server and you know your server's capacity, you may want to update all sites at the same time. This is now a trivial task using the Webmaster Tools GData API.
  • Geographic Location: If your site is targeted towards a particular geographic location but your domain doesn't reflect that (for example with a .com domain), you can provide information to help us determine where your target users are located.
  • Preferred Domain: You can select which is the canonical domain to use to index your pages. For example, if you have a site like www.example.com, you can set either example.com or www.example.com as the preferred domain to use. This avoids the risk of treating both sites differently.
  • Enhanced Image Search: Tools like the Google Image Labeler allow users to tag images in order to improve Image Search results. Now you can opt in or out for all your sites in a breeze using the Webmaster Tools API.
How do I do it?
We provide you with Java code samples for all the current Webmaster Tools API functionality. Here's a sample snippet of code that takes a list of sites and updates the geographic location of all of them:

  // Authenticate against the Webmaster Tools service
  WebmasterToolsService service;
  try {
    service = new WebmasterToolsService("exampleCo-exampleApp-1");
    service.setUserCredentials(USERNAME, PASSWORD);
  } catch (AuthenticationException e) {
    System.out.println("Error while authenticating.");
    return;
  }

  // Read sites and geolocations from your database
  readSitesAndGeolocations(sitesList, geolocationsList);

  // Update all sites
  Iterator<String> sites = sitesList.iterator();
  Iterator<String> geolocations = geolocationsList.iterator();
  while (sites.hasNext() && geolocations.hasNext()) {
    // Create a blank entry and add the updated information
    SitesEntry updateEntry = new SitesEntry();
    updateEntry.setGeolocation(geolocations.next());

    // Get the URL to update the site
    String encodedSiteId = URLEncoder.encode(sites.next(),
        "UTF-8");
    URL siteUrl = new URL(
        "http://www.google.com/webmasters/tools/feeds/sites/"
        + encodedSiteId);

    // Update the site
    service.update(siteUrl, updateEntry);
  }

Where do I get it?
The main page for the Webmaster Tools GData API explains all the details of the API. It has a detailed reference guide and also many code snippets that explain how to use the Java client library, which is available for download. You can find more details about GData and all the different Google APIs in the Google Data API homepage.

Webmaster Tools shows Crawl error sources

Ever since we released the crawl errors feature in Webmaster Tools, webmasters have asked for the sources of the URLs causing the errors. Well, we're listening! We know it was difficult for those of you who wanted to identify the cause of a particular "Not found" error, in order to prevent it in the future or even to request a correction, without knowing the source URL. Now, Crawl error sources makes the process of tracking down the causes of "Not found" errors a piece of cake. This helps you improve the user experience on your site and gives you a jump start for links week (check out our updated post on "Good times with inbound links" to get the scoop).

In our "Not Found" and "Errors for URLs in Sitemaps" reports, we've added the "Linked From" column. For every error in these reports, the "Linked From" column now lists the number of pages that link to a specific "Not found" URL.



Clicking on an item in the "Linked From" column opens a separate dialog box which lists each page that linked to this URL along with the date it was discovered. The source URL for the 404 can be within or external to your site.





For those of you who just want the data, we've also added the ability to download all your crawl error sources at once. Just click the "Download all sources of errors on this site" link to download all your site's crawl error sources.



Again, if we report crawl errors for your website, you can use crawl error sources to quickly determine whether the cause is on your site or someone else's. You'll have the information you need to contact the other webmaster to get it fixed, and if needed, you can still put redirects in place on your own site to the appropriate URL. Just sign in to Webmaster Tools and check it out for your verified site. You can help people visiting your site, from anywhere on the web, find what they're looking for.
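
Once you have the list of sources, one way to triage it is to split the linking pages into those on your own site (which you can fix yourself) and those hosted elsewhere (where you'd contact the webmaster). Here's a minimal Java sketch of that split. It assumes you've already pulled the linking URLs out of the download into a list; the class and method names are made up for illustration and aren't part of Webmaster Tools.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CrawlErrorSources {
  /** Returns true when the linking page is on the same host as the site. */
  static boolean isInternal(String siteHost, String sourceUrl) {
    String host = URI.create(sourceUrl).getHost();
    return host != null && host.equalsIgnoreCase(siteHost);
  }

  /** Keeps only the linking pages that are hosted somewhere else. */
  static List<String> externalSources(String siteHost, List<String> sourceUrls) {
    List<String> external = new ArrayList<>();
    for (String url : sourceUrls) {
      if (!isInternal(siteHost, url)) {
        external.add(url);
      }
    }
    return external;
  }

  public static void main(String[] args) {
    // Hypothetical "Linked From" URLs for one "Not found" error.
    List<String> sources = Arrays.asList(
        "http://www.example.com/old-index.html",
        "http://blog.example.org/favorite-templates");
    System.out.println(externalSources("www.example.com", sources));
  }
}
```

Note that this compares hosts exactly, so www.example.com and example.com count as different sites; adjust the comparison if your site answers on both.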

Thursday, October 9, 2008

Good times with inbound links

Inbound links are links from pages on external sites linking back to your site. Inbound links can bring new users to your site, and when the links are merit-based and freely-volunteered as an editorial choice, they're also one of the positive signals to Google about your site's importance. Other signals include things like our analysis of your site's content, its relevance to a geographic location, etc. As many of you know, relevant, quality inbound links can affect your PageRank (one of many factors in our ranking algorithm). And quality links often come naturally to sites with compelling content or offering a unique service.

How do these signals factor into ranking?

Let's say I have a site, example.com, that offers users a variety of unique website templates and design tips. One of the strongest ranking factors is my site's content. Additionally, perhaps my site is also linked from three sources -- however, one inbound link is from a spammy site. As far as Google is concerned, we want only the two quality inbound links to contribute to the PageRank signal in our ranking.

Given the user's query, over 200 signals (including the analysis of the site's content and inbound links as mentioned above) are applied to return the most relevant results to the user.


So how can you engage more users and potentially increase merit-based inbound links?

Many webmasters have written about their success in growing their audience. We've compiled several ideas and resources that can improve the web for all users.
Create unique and compelling content on your site and the web in general
  • Start a blog: make videos, do original research, and post interesting stuff on a regular basis. If you're passionate about your site's topic, there are lots of great avenues to engage more users.

    If you're interested in blogging, see our Help Center for specific tips for bloggers.

  • Teach readers new things, uncover new news, be entertaining or insightful, show your expertise, interview different personalities in your industry and highlight their interesting side. Make your site worthwhile.

  • Participate thoughtfully in blogs and user reviews related to your topic of interest. Offer your knowledgeable perspective to the community.

  • Provide a useful product or service. If visitors to your site get value from what you provide, they're more likely to link to you.

  • For more actionable ideas, see one of my favorite interviews with Matt Cutts for no-cost tips to help increase your traffic. It's a great primer for webmasters. (Even before this post, I forwarded the URL to many of my friends. :)
Pursue business development opportunities
Use the "Links > Pages with external links" feature in Webmaster Tools to learn about others interested in your site. Expand the web community by figuring out who links to you and how they're linking. You may have new audiences or demographics you didn't realize were interested in your niche. For instance, if the webmasters for example.com noticed external links coming from art schools, they may start to engage with the art community -- receiving new feedback and promoting their site and ideas.

Of course, be responsible when pursuing possible opportunities in this space. Don't engage in mass link-begging; no one likes form letters, and few webmasters of quality sites are likely to respond positively to such solicitations. In general, many of the business development techniques that are successful in human relationships can also be reflected online for your site.
Now that you've read more information about internal links, outbound links, and inbound links (today's post :), we'll see you in the blog comments! Thanks for joining us for links week.

Update -- Here's one more business development opportunity:
Investigate your "Diagnostics > Web/mobile crawl > Crawl error sources" to not only correct broken links, but also to cultivate relationships with external webmasters who share an interest in your site. (And while you're chatting, see if they'll correct the broken link. :) This is a fantastic way to turn broken links into free links to important parts of your site.

In addition to contacting these webmasters, you may also wish to use 301 redirects to redirect incoming traffic from old pages to their new locations. This is good for users who may still have bookmarks with links to your old pages... and you'll be happy to know that Google appropriately flows PageRank and related signals through these redirects.
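
If you're wondering what a 301 redirect looks like in practice, here's a rough Java sketch using the small HTTP server that ships with the JDK (com.sun.net.httpserver). The old and new paths are hypothetical, and most sites would configure redirects in their web server rather than in application code, so treat this as an illustration of the idea rather than a recommended setup.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import java.io.IOException;
import java.util.Map;
import java.util.Optional;

class MovedPages implements HttpHandler {
  // Hypothetical old-path -> new-path pairs; replace with your own.
  static final Map<String, String> MOVED = Map.of(
      "/templates.html", "/design/templates/",
      "/tips.html", "/design/tips/");

  /** Looks up where an old page now lives, if it has moved. */
  static Optional<String> newLocation(String oldPath) {
    return Optional.ofNullable(MOVED.get(oldPath));
  }

  @Override
  public void handle(HttpExchange exchange) throws IOException {
    Optional<String> target = newLocation(exchange.getRequestURI().getPath());
    if (target.isPresent()) {
      // 301 Moved Permanently: send the new address and no body.
      exchange.getResponseHeaders().set("Location", target.get());
      exchange.sendResponseHeaders(301, -1);
    } else {
      exchange.sendResponseHeaders(404, -1);
    }
    exchange.close();
  }

  public static void main(String[] args) {
    System.out.println(newLocation("/tips.html"));
  }
}
```

To try it on a live server, you could register the handler with HttpServer.create(...).createContext("/", new MovedPages()).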

Wednesday, October 8, 2008

Linking out: Often it's just applying common sense

Creating outbound links on your site, or "linking out", is our topic for Day 3 of Links Week. Linking out happens naturally, and for most webmasters, it's not something you have to worry about. Nonetheless, in case you're interested in an otherwise simple topic that's fundamental to the web, here's the good, the bad, and answers to more advanced questions asked by our fellow webmasters. First, let's start with the good...

Relevant outbound links can help your visitors.
  • Provide your readers in-depth information about similar topics
  • Offer readers your unique commentary on existing resources
Thoughtful outbound links can help your credibility.
  • Show that you've done your research and have expertise in the subject matter
  • Make visitors want to come back for more analysis on future topics
  • Build relationships with other domain experts (e.g. sending visitors can get you on the radar of other successful bloggers and begin a business relationship)
When it comes to the less-than-ideal practices of linking out, there shouldn't be too many surprises, but we'll go on record to avoid any confusion...

The bad: Unmonitored (especially user-generated) links and undisclosed paid advertising outbound links can reduce your site's credibility.
  • Including too many links on one page confuses visitors (we usually encourage webmasters to not have much more than 100 links per page)
  • Hurts your credibility—turns off savvy visitors and reduces your authority with search engines. If you accept payment for outbound links, it's best to rel="nofollow" them or otherwise ensure that they don't pass PageRank for search engines. (As a user, I prefer to see disclosure to maintain my loyalty as well.)
  • Allows comment spam, which provides little benefit for users. Also, from a search engine perspective, comment spam can connect your site with bad neighborhoods instead of legitimate resources. Webmasters often add the nofollow attribute (rel="nofollow") to links that are user-generated, such as spammable blog comments, unless the comments are responsibly reviewed and thus vouched for.

    See Jason Morrison's recent blog post about keeping comment spam off your site to prevent spam in the first place.
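
For user-generated links, here's one way the nofollow step might look in Java. This is a minimal sketch under a few assumptions: comments arrive as small HTML fragments, a regular expression is good enough for them, and links that already carry a rel attribute are left untouched. The class and method names are hypothetical, and a real HTML sanitizer is a safer choice for production use.

```java
import java.util.regex.Pattern;

class NofollowComments {
  // Matches the start of an <a ...> tag that has no rel attribute yet.
  private static final Pattern ANCHOR_WITHOUT_REL =
      Pattern.compile("<a\\s+(?![^>]*\\brel\\s*=)", Pattern.CASE_INSENSITIVE);

  /** Adds rel="nofollow" to every link that doesn't already set rel. */
  static String addNofollow(String commentHtml) {
    return ANCHOR_WITHOUT_REL.matcher(commentHtml)
        .replaceAll("<a rel=\"nofollow\" ");
  }

  public static void main(String[] args) {
    System.out.println(addNofollow(
        "Nice post! <a href=\"http://spam.example/\">cheap stuff</a>"));
  }
}
```

One known limitation: a link whose URL happens to contain "rel=" (for example in a query string) is treated as if it already had a rel attribute and is skipped.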
Answers to advanced questions about outbound links

When linking out, am I sending visitors away forever?!
Hmmm... visitors may initially leave your site to check out relevant information. But can you recall your behavior on sites that link to good articles outside their domain? Personally, I always come back to sites I feel provide commentary and additional resources. Sometimes I stay on the original site and just open up the interesting link in a different tab. It's likely that with relevant outbound links you'll gain repeat visitors, and you won't lose them forever.
Yesterday's post mentioned that descriptive anchor text is helpful in internal links. Is it still important for outbound links?
Descriptive anchor text (the visible text in a hyperlink) helps accurately inter-connect the web. It allows both users and Googlebot to better understand what they're likely to find when following a link to another page. So if it's not too much trouble, try making anchor text descriptive.
Should I worry about the sites I choose to link to? What if their PageRank may be lower than mine?
If you're linking to content you believe your users will enjoy, then please don't worry about the site's perceived PageRank. As a webmaster, the things to be wary of regarding outbound links are listed above, such as losing credibility by linking to spammy sites. Otherwise, consider outbound links as a common sense way to provide more value to your users, not a complicated formula.

Monday, October 6, 2008

Importance of link architecture

In Day 2 of links week, we'd like to discuss the importance of link architecture and answer more advanced questions on the topic. Link architecture—the method of internal linking on your site—is a crucial step in site design if you want your site indexed by search engines. It plays a critical role in Googlebot's ability to find your site's pages and ensures that your visitors can navigate and enjoy your site.

Keep important pages within several clicks from the homepage

Although you may believe that users prefer a search box on your site rather than category navigation, it's uncommon for search engine crawlers to type into search boxes or navigate via pulldown menus. So make sure your important pages are clickable from the homepage and are easy for Googlebot to find throughout your site. It's best to create a link architecture that's intuitive for users and crawlable for search engines. Here are more ideas to get started:
Intuitive navigation for users

Create common user scenarios, get "in character," then try working through your site. For example, if your site is about basketball, imagine being a visitor (in this case a "baller" :) trying to learn the best dribbling technique.
  • Starting at the homepage, if the user doesn't use the search box on your site or a pulldown menu, can they easily find the desired information (ball handling like a superstar) from the navigation links?

  • Let's say a user found your site through an external link, but they didn't land on the homepage. Starting from any (sub-/child) page on your site, make sure they can easily find their way to the homepage and/or other relevant sections. In other words, make sure users aren't trapped or stuck. Was the "best dribbling technique" easy for your imaginary user to find? Often breadcrumbs such as "Home > Techniques > Dribbling" help users to understand where they are.
Crawlable links for search engines
  • Text links are easily discovered by search engines and are often the safest bet if your priority is having your content crawled. While you're welcome to try the latest technologies, keep in mind that when text-based links are available and easily navigable for users, chances are that search engines can crawl your site as well.

    This <a href="new-page.html">text link</a> is easy for search engines to find.

  • Sitemap submission is also helpful for major search engines, though it shouldn't be a substitute for crawlable link architecture. If your site utilizes newer techniques, such as AJAX, see "Verify that Googlebot finds your internal links" below.
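
If you'd like to check how many clicks separate your pages from the homepage, a breadth-first search over your internal links gives the answer. The Java sketch below assumes you already have a map from each page to the pages it links to (for example, from your own site crawl); the paths and the class name are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

class ClickDepth {
  /** Breadth-first search: fewest clicks from start to each reachable page. */
  static Map<String, Integer> depths(Map<String, List<String>> links, String start) {
    Map<String, Integer> depth = new HashMap<>();
    Queue<String> queue = new ArrayDeque<>();
    depth.put(start, 0);
    queue.add(start);
    while (!queue.isEmpty()) {
      String page = queue.remove();
      for (String target : links.getOrDefault(page, List.of())) {
        if (!depth.containsKey(target)) {
          depth.put(target, depth.get(page) + 1);
          queue.add(target);
        }
      }
    }
    return depth;
  }

  public static void main(String[] args) {
    // Hypothetical internal links: page -> pages it links to.
    Map<String, List<String>> links = Map.of(
        "/", List.of("/techniques/", "/videos.html"),
        "/techniques/", List.of("/techniques/dribbling.html"),
        "/orphan.html", List.of("/"));
    System.out.println(depths(links, "/"));
  }
}
```

Pages missing from the result (here, /orphan.html) can't be reached by following links from the homepage, which is a hint that neither visitors nor crawlers are likely to find them that way.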
Use descriptive anchor text

Descriptive anchor text (the clickable words in a link) is a useful signal that helps search engines and users alike better understand your content. The more Google knows about your site (through your content, page titles, anchor text, and so on), the more relevant results we can return for users (and your potential search visitors). For example, if you run a basketball site and you have videos to accompany the textual content, a not-very-optimal way of linking would be:

To see all our basketball videos, <a href="videos.html">click here</a> for the entire listing.

However, instead of the generic "click here," you could rewrite the anchor text more descriptively as:

Feel free to browse all of our <a href="videos.html">basketball videos</a>.
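
To spot generic anchor text across many pages, you could scan your HTML for links whose text is on a short list of vague phrases. Here's a small Java sketch of that idea. The phrase list is our own guess at what counts as "generic", the regular expression assumes simple links with no tags nested inside them, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorTextCheck {
  // Hypothetical list of phrases treated as "generic"; adjust to taste.
  static final Set<String> GENERIC =
      Set.of("click here", "here", "read more", "link");

  // Captures the text between <a ...> and </a>; assumes no nested tags.
  static final Pattern ANCHOR =
      Pattern.compile("<a\\s[^>]*>([^<]*)</a>", Pattern.CASE_INSENSITIVE);

  /** Returns the anchor texts in the HTML that are on the generic list. */
  static List<String> genericAnchors(String html) {
    List<String> found = new ArrayList<>();
    Matcher m = ANCHOR.matcher(html);
    while (m.find()) {
      String text = m.group(1).trim();
      if (GENERIC.contains(text.toLowerCase(Locale.ROOT))) {
        found.add(text);
      }
    }
    return found;
  }

  public static void main(String[] args) {
    System.out.println(genericAnchors(
        "To see all our basketball videos, "
        + "<a href=\"videos.html\">click here</a> for the entire listing."));
  }
}
```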

Verify that Googlebot finds your internal links

For verified site owners, Webmaster Tools has the feature "Links > Pages with internal links" that's great for verifying that Googlebot finds most of the links you'd expect. This is especially useful if your site uses navigation involving JavaScript (which Googlebot doesn't always execute)—you'll want to make sure that Googlebot is finding other internal links as expected.

Here's an abridged snapshot of our internal links to the introductory post for "404 week at Webmaster Central." Our internal links are discovered as we had hoped.


Feel free to ask more internal linking questions
Here are some to get you started...

Q: What about using rel="nofollow" for maximizing PageRank flow in my internal link architecture (such as PageRank sculpting, or PageRank siloing)?
A: It's not something we, as webmasters who also work at Google, would really spend time or energy on. In other words, if your site already has strong link architecture, it's far more productive to work on keeping users happy with fresh and compelling content rather than to worry about PageRank sculpting.

Matt Cutts answered more questions about "appropriate uses of nofollow" in our webmaster discussion group.
Q: Let's say my website is about my favorite hobbies: biking and camping. Should I keep my internal linking architecture "themed" and not cross-link between the two?
A: We haven't found a case where a webmaster would benefit by intentionally "theming" their link architecture for search engines. And, keep in mind, if a visitor to one part of your site can't easily reach other parts of your site, that may be a problem for search engines as well.
Perhaps it's cliche, but at the end of the day, and at the end of this post, :) it's best to create solid link architecture (making navigation intuitive for users and crawlable for search engines)—implementing what makes sense for your users and their experience on your site.

Thanks for your time today! Information about outbound links will soon be available in Day 3 of links week. And, if you have helpful tips about internal links or questions for our team, please share them in the comments below.

Links information straight from the source

We hope that you're able to focus on helping users (and improving the web) by creating great content or providing a great service on your site. In between creating content and working on your site, you may have read some of the (often conflicting) link discussions circling the web. If you're asking, "What's going on -- what do I need to know about links?" then welcome to the first day of links week!

Day 2: Internal links (links within your site)
Internal linking is your homepage linking to your "Contact us" page, or your "Contact us" page linking to your "About me" page. Internal linking (also known as link architecture) is important because it's a major factor in how easily visitors can navigate your site. Additionally, internal linking contributes to your site's "crawlability" -- how easily a spider can reach your pages. More in Day 2 of links week.
Day 3: Outbound links (sites you link to)
Outbound links are external sites that you're linking to. For example, www.google.com/webmasters links to the domain googlewebmastercentral.blogspot.com (our lovely blog!). Outbound links allow us to surf the web -- they're a big reason why the web is so exciting and collaborative. Without outbound links, your site can seem isolated from the community because each page becomes "brochure-ware." Most sites include outbound links naturally and it shouldn't be a big concern. If you still have questions, we'll be covering outbound linking in more detail on Day 3.
Day 4: Inbound links (sites linking to you)
Inbound links are external sites linking to you. There are many webmasters who (rightfully) aren't preoccupied by the subject of inbound links. So why do some webmasters care? It's likely because merit-based or volunteered inbound links may seem like a quick way to increase rankings and traffic. Answers to your questions like, "Are there no-cost methods to maximize my merit-based links?" are provided on Day 4.
Update: Included references to blog posts as they were published throughout links week.