Friday, March 30, 2007

BlogHer 2007: Building your audience

Last week, I spoke at BlogHer Business about search engine optimization issues. I presented with Elise Bauer, who talked about the power of community in blogging. She made great points about the linking patterns of blogs. Link out to sites that would be relevant and useful for your readers. Comment on blogs that you like to continue the conversation and provide a link back to your blog. Write useful content that other bloggers will want to link to. Blogging connects readers and writers and creates real communities where valuable content can be exchanged. I talked more generally about search and a few things you might consider when developing your site and blog.

Why is search important for a business?
With search, your potential customers are telling you exactly what they are looking for. Search can be a powerful tool to help you deliver content that is relevant and useful and meets your customers' needs. For instance, do keyword research to find out the most common types of searches that are relevant to your brand. Does your audience most often search for "houses for sale" or "real estate"? Check your referrer logs to see what searches are bringing visitors to your site (you can find a list of the most common searches that return your site in the results from the Query stats page of webmaster tools). Does your site include valuable content for those searches? A blog is a great way to add this content. You can write unique, targeted articles that provide exactly what the searcher wanted.
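As a rough sketch of the referrer-log check described above, the following Python snippet counts the search queries found in a list of referrer URLs. The sample referrers and the assumption that the query is carried in a `q` parameter are illustrative only; your own logs may look different.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

def search_terms(referrer):
    """Return the search query from a referrer URL, or None if it has no `q` parameter."""
    params = parse_qs(urlparse(referrer).query)
    terms = params.get("q")
    return terms[0].strip().lower() if terms else None

# Hypothetical referrer values, as they might appear in a server log.
referrers = [
    "http://www.google.com/search?q=houses+for+sale",
    "http://www.google.com/search?q=Houses+for+sale&hl=en",
    "http://www.google.com/search?q=real+estate",
    "http://www.example.com/some-page",
]

# Tally the queries, skipping referrers that are not search results.
counts = Counter(t for t in map(search_terms, referrers) if t)
for term, n in counts.most_common():
    print(n, term)
```

Queries are lowercased before counting so that "Houses for sale" and "houses for sale" are tallied together.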

How do search engines index sites?
The first step in the indexing process is discovery. A search engine has to know the pages exist. Search engines generally learn about pages from following links, and this process works great. If you have new pages, ensure relevant sites link to them, and provide links to them from within your site. For instance, if you have a blog for your business, you could provide a link from your main site to the latest blog post. You can also let search engines know about the pages of your site by submitting a Sitemap file. Google, Yahoo!, and Microsoft all support the Sitemaps protocol and if you have a blog, it couldn't be easier! Simply submit your blog's RSS feed. Each time you update your blog and your RSS feed is updated, the search engines can extract the URL of the latest post. This ensures search engines know about the updates right away.
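If your site doesn't have an RSS feed, a basic Sitemap file is simple to generate. The sketch below builds a minimal Sitemap with Python's standard library; the example.com URLs and the `build_sitemap` helper are placeholders for illustration, and a real Sitemap can carry optional fields that are left out here.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return a Sitemap XML document (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical pages; replace with the URLs of your own site.
pages = [
    "http://www.example.com/",
    "http://www.example.com/blog/latest-post",
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```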

Once a search engine knows about the pages, it has to be able to access those pages. You can use the crawl errors reports in webmaster tools to see if we're having any trouble crawling your site. These reports show you exactly what pages we couldn't crawl, when we tried to crawl them, and what the error was.

Once we access the pages, we extract the content. You want to make sure that what your page is about is represented by text. What does the page look like with JavaScript, Flash, and images turned off in the browser? Use ALT text and descriptive filenames for images. For instance, if your company name is in a graphic, the ALT text should be the company name rather than "logo". Put text in HTML rather than in Flash or images. This not only helps search engines index your content, but also makes your site more accessible to visitors with mobile browsers, screen readers, or older browsers.
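One way to spot-check ALT text is to scan a page for images whose ALT attribute is missing or too generic. This is a minimal sketch using Python's built-in HTML parser; the list of "generic" values and the sample markup are assumptions made for the example, not an official list.

```python
from html.parser import HTMLParser

class AltTextChecker(HTMLParser):
    """Collect the src of every <img> whose ALT text is missing or generic."""

    # Illustrative list of unhelpful ALT values; adjust it for your own site.
    GENERIC = {"", "logo", "image", "picture"}

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        alt = (attributes.get("alt") or "").strip().lower()
        if alt in self.GENERIC:
            self.problems.append(attributes.get("src"))

# Hypothetical page fragment.
page = """
<img src="header.gif" alt="logo">
<img src="inigo-montoya-sword.jpg" alt="Inigo Montoya's sword collection">
<img src="company-name.gif">
"""
checker = AltTextChecker()
checker.feed(page)
print(checker.problems)
```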

What is your site about?
Does each page have unique title and meta description tags that describe the content? Are the words that visitors search for represented in your content? Do a search of your pages for the queries you expect searchers to do most often and make sure that those words do indeed appear in your site. Which of the following tells visitors and search engines what your site is about?

Option 1
If you're plagued by the cliffs of insanity or the pits of despair, sign up for one of our online classes! Learn the meaning of the word inconceivable. Find out the secret to true love overcoming death. Become skilled in hiding your identity with only a mask. And once you graduate, you'll get a peanut. We mean it.

Option 2
See our class schedule here. We provide extensive instruction and valuable gifts upon graduation.

When you link to other pages in your site, ensure that the anchor text (the text used for the link) is descriptive of those pages. For instance, you might link to your products page with the text "Inigo Montoya's sword collection" or "Buttercup's dresses" rather than "products page" or the ever-popular "click here".
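A similar check works for anchor text. The sketch below lists links whose text is on a small, made-up list of generic phrases; both the phrase list and the sample links are assumptions for illustration.

```python
from html.parser import HTMLParser

class AnchorTextChecker(HTMLParser):
    """Collect (href, text) for links whose anchor text is generic."""

    # Illustrative list of non-descriptive anchor text.
    GENERIC = {"click here", "here", "products page", "read more"}

    def __init__(self):
        super().__init__()
        self._href = None   # href of the link currently being read, if any
        self._text = []     # text fragments seen inside that link
        self.generic_links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace and ignore case before comparing.
            text = " ".join("".join(self._text).split()).lower()
            if text in self.GENERIC:
                self.generic_links.append((self._href, text))
            self._href = None

# Hypothetical page fragment.
page = """
<a href="/swords">Inigo Montoya's sword collection</a>
<a href="/products">click here</a>
<a href="/dresses">Products page</a>
"""
checker = AnchorTextChecker()
checker.feed(page)
print(checker.generic_links)
```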

Why are links important?
Links are important for a number of reasons. They are a key way to drive traffic to your site. Visitors of other sites can learn about your site through links to it. You can use links to other sites to provide valuable information to your visitors. And just as links let visitors know about your site, they also let search engines know about it. Links also tell search engines and potential visitors what your site is about: the anchor text describes your site, and the number of relevant links to your pages is an indicator of how popular and useful those pages are. (You can find a list of the links to your site and the most common anchor text used in those links in webmaster tools.)

A blog is a great way to build links, because it enables you to create new content on a regular basis. The more useful content you have, the greater the chances someone else will find that content valuable to their readers and link to it. Several people at the BlogHer session asked about linking out to other sites. Won't this cause your readers to abandon your site? Won't this cause you to "leak out" your PageRank? No, and no. Readers will appreciate that you are letting them know about resources they might be interested in and will remember you as a valuable source of information (and keep coming back for more!). And PageRank isn't a set of scales, where incoming links are weighted against outgoing ones and cancel each other out. Links are content, just as your words are. You want your site to be as useful to your readers as possible, and providing relevant links is a way, just as writing content is, to do that.

The key is compelling content
Google's main goal is to provide the most useful and relevant search results possible. That's the key thing to keep in mind as you look at optimizing your site. How can you make your site the most useful and relevant result for the queries you care about? This won't just help you in the search results, which, after all, are just the means to the end. What you are really interested in is keeping your visitors happy and coming back. And creating compelling and useful content is the best way to do that.

Wednesday, March 28, 2007

An update on spam reporting

(Note: this post has been translated into English from our German blog.)

In 2006 one of our initiatives in the area of communication was to notify some webmasters in case of a violation of our Webmaster Guidelines (e.g. by using a "particular search engine friendly" software that generates doorways as an extra). No small number of these good-will emails to webmasters have been brought about by spam reports from our users.

We are proud of our users who alert us to potential abuses for the sake of the whole internet community. We appreciate this even more, as PageRank™ (and thus Google search) is based on a democratic principle, i.e. a webmaster gives other sites a "vote" of approval by linking to them.

In 2007, as an extension and complement of this democratic principle, we want to further increase our users' awareness of webmaster practices that do or do not conform to Google's standards. Such informed users are then able to take counter-action against webspam by filing spam reports. By doing so, a mutually beneficial process can be initiated. Ultimately, not only will all Google users benefit from the best possible search quality, but spammy webmasters will also realize that their attempts to unfairly manipulate their site's ranking will pay off less and less.

Our spam report forms are provided in two different flavors: an authenticated form that requires registration in Webmaster Tools, and an unauthenticated form. Currently, we investigate every spam report from a registered user. Spam reports to the unauthenticated form are assessed in terms of impact, and a large fraction of those are reviewed as well.

So, the next time you can't help thinking that the ranking of a search result was not earned by virtue of its content and legitimate SEO, it is the perfect moment for a spam report. Each of them can give us crucial information for the continual optimization of our search algorithms.

Interested in learning more? Below you'll find answers to the three most frequent questions.

FAQs concerning spam reports:

Q: What happens to an authenticated spam report at Google?
A: An authenticated spam report is analyzed and then used for evaluating new spam-detecting algorithms, as well as to identify trends in webspam. Our goal is to detect all the sites engaging in similar manipulation attempts automatically in the future and to make sure our algorithms rank those sites appropriately. We don't want to get into an inefficient game of cat and mouse with individual webmasters who have reached into the wrong bag of tricks.

Q: Why are there sometimes no immediately noticeable consequences of a spam report?
A: Google is always seeking to improve its algorithms for countering webspam, but we also take action on individual spam reports. Sometimes that action will not be immediately visible to an outside user, so there is no need to submit a site multiple times in order for Google to evaluate a URL. There are different reasons that might account for a user's false impression that a particular spam report went unnoticed. Here are a few of those reasons:

  • Sometimes, Google might already be handling the situation appropriately. For example, if you are reporting a site that seems to engage in excessive link exchanging, it could be the case that we are already discounting the weight of those unearned backlinks correctly, and the site is showing up for other reasons. Note that changes in how Google handles backlinks for a site are not immediately obvious to outside users. Or it may be the case that we already deal with a phenomenon such as keyword stuffing correctly in our scoring, and therefore we are not quite as concerned about something that might not look wonderful, but that isn't affecting rankings.
  • A complete exclusion from Google's SERPs is only one possible consequence of a spam report. Google might also choose to give a site a "yellow card" so that the site cannot be found in the index for a short time. However, if a webmaster ignores this signal, then a "red card" with a longer-lasting effect might follow. So it's possible that Google is already aware of an issue and communicating with the webmaster about that issue, or that we have taken action other than a removal on a spam report.
  • Sometimes, simple patience is the answer, because it takes time for algorithmic changes to be thoroughly checked out, or for the externally displayed PageRank to be updated.
  • It can also be the case that Google is working on solving the more general instance of an issue, and so we are reluctant to take action on an individual situation.
  • A spam report may also just have been considered unjustified. For example, this may be true for a report whose sole motivation appears to be an attempt to harm a direct competitor with a better ranking.

Q: Can a user expect to receive feedback for a spam report?
A: This is a common request, and we know that our users might like verification of the reported URLs or simple confirmation that the spam report has been taken care of. Given the choice of how to spend our time, we have decided to invest our efforts in taking action on spam reports and improving our algorithms to be more robust. But we are open to considering how to scale communication with our users going forward.

Monday, March 26, 2007

Tips for Eastern European webmasters

In 2006 we ramped up on international webmaster issues and particularly tried to support Eastern Europe. We opened several offices in the region, improved our algorithms with respect to these languages, and localized many of our products. If I had to pick one word to describe these markets, it would be diverse. Still, they have two things in common: their online markets are currently in a developing phase and a high number of webmasters and search engine optimizers work there in a variety of languages. We are aware that a certain amount of webspam is generated in this region and we would like to reinforce that we have been working hard to take action on it both algorithmically and manually. Since I have seen some common phenomena in a bunch of these markets, here are a couple of suggestions for Eastern European webmasters and SEOs:
  • Avoid link exchanges. If a fellow webmaster approaches you with some sketchy offer, just refuse. Instead, work on the content of your site. Once you have the quality content, you can use the buzzing blogger community and social web services in your language to get nice linkbaits. Creating good content for your language community will pay off. Help the high-quality people in your language community and they will re-power you.
  • Use regional and geographical domains in line with their purpose. First, a sidenote for the Western webmasters: some Eastern European countries like Poland and Russia have so-called regional or geographical domains. Imagine that all the states in the U.S. had their official second level domain and if you wanted to open your webshop delivering to Kentucky, you could do it cheaply or for free on, e.g., ky.us. This could help Google serve geographically relevant search results. In case you wish to sell organic soaps to people in Szczecin, do open your webshop on szczecin.pl. If you are from Kalmykia and would like to show the world the beauty of your area, go ahead and set up your Kalmykia travel site on kalmykia.ru. If you like a region, support it by hosting your site on the related regional or geographical domain. Be aware that webspam on these regional domains violates the correct use of them and prevents the development of your country's web culture.
  • Say no to cybersquatting! Sneakily registering strong online brands under Belarusian, Estonian or Slovak top level domains is just bad. While it will not particularly help you boost the ranking of your site, cybersquatting has often led to disappointed users and legal action as side effects.
  • Think long-term. You have your share of responsibility for the development of your market. Creating quality sites that target users who search for highly specific content in your particular language will help you get your market into a more mature status -- and mature markets mean mature publisher revenue too.

Friday, March 16, 2007

Site content and use of web catalogues

Sites with more content can have more opportunities to rank well in Google. It makes sense that having more pages of good content represents more chances to rank in search engine result pages (SERPs). Some SEOs, however, do not focus on the user’s needs, but instead create pages solely for search engines. This approach is based on the false assumption that increasing the volume of web pages with random, irrelevant content is a good long-term strategy for a site. These techniques are usually accomplished by abusing qlweb style catalogues or by scraping content from sources known for good, valid content, like Wikipedia or the Open Directory Project.

These methods violate Google's webmaster guidelines. Purely scraped content, even from high quality sources, does not provide any added value to your users. It's worthwhile to take the time to create original content that sets your site apart. This will keep your visitors coming back and will provide useful search results.

In order to provide the best results possible to our Polish and non-Polish users, Google continues to improve its algorithms for validating web content.

Google is willing to take action against domains that try to rank more highly by just showing scraped or other autogenerated pages that don't add any value to users. Companies, webmasters, and domain owners who consider SEO consultation should take care not to spend time on methods which will not have worthwhile long-term results. Choosing the right SEO consultant requires in-depth background research, and their reputation and past work should be important factors in your decision.

PS: Head on over to our Polish discussion forum, where we're monitoring the posts and chiming in when we can!

Treść oraz katalogi na serwisach internetowych

Serwisy o dużej ilości stron mają szanse na wyższe pozycje w indeksie Google. Oznacza to, że oferując wiele stron z niepowtarzalną treścią można polepszyć notowania w wynikach wyszukiwarek (SERP). Fakt ten jest znany i wykorzystywany przez przedsiębiorstwa oferujące usługi pozycjonowania witryn internetowych. Często jednak nie jest brane pod uwagę, że treść strony powinna być tworzona dla użytkowników, a nie dla wyszukiwarek (w tym Google). Takie podejście prowadzi do błędnego założenia, że wystarczy zwiększyć ilość stron konkretnej domeny, dodając na przykład katalogi z dowolną, niejednokrotnie zupełnie nieistotną treścią, aby na dłuższy okres czasu wypozycjonować domenę. Przejawia się to między innymi nadużywaniem katalogów typu qlweb lub kopiowaniem znanych z jakościowo dobrej treści serwisów, jak Wikipedia lub Open Directory Project.

Takie metody są bez wątpliwości rozbieżne z wytycznymi Google dla webmasterów. Dowolnie skopiowane treści, nawet jeżeli dobrej jakości, nie stanowią większej wartości informacyjnej dla użytkowników. Aby wyróżnić serwis internetowy, warto poświęcić czas na tworzenie nowej treści, dzięki czemu można zwiększyć lojalność użytkowników i dostarczyć przydatnych wyników w wyszukiwarce.

W trosce o naszych polskich użytkowników (i nie tylko) Google konsekwentnie ulepsza algorytmy weryfikujące merytoryczną wartość serwisów internetowych.

Google jest skłonny podejmować działania przeciwko domenom, których webmasterzy usiłują osiągnąć lepsze pozycje w wynikach poprzez dodawanie skopiowanej lub automatycznie wygenerowanej treści, która nie stanowi żadnej wartości dla użytkowników. Przedsiębiorstwa, webmasterzy oraz właściciele domen biorący pod uwagę konsultacje specjalistów SEO, powinni zadbać o to, żeby ich czas nie był wykorzystywany na stosowanie metod nieprzynoszących długoterminowych rezultatów. Przy wyborze doradców oraz firm oferujących pozycjonowanie, ich reputacja jest kluczowym czynnikiem i powinna zostać dokładnie zweryfikowana przed podjęciem ostatecznej decyzji.

PS: Zapraszamy na naszą polską grupę dyskusyjną, na której z zainteresowaniem czytamy Wasze wpisy i staramy się na nie reagować.

Posted by Kaspar Szymanski, Search Quality

Thursday, March 15, 2007

Get a more complete picture about how other sites link to you

For quite a while, you've been able to see a list of the most common words used in anchor text to your site. This information is useful, because it helps you know what others think your site is about. How sites link to you has an impact on your traffic from those links, because it describes your site to potential visitors. In addition, anchor text influences the queries your site ranks for in the search results.

Now we've enhanced the information we provide and will show you the complete phrases sites use to link to you, not just individual words. And we've expanded the number we show to 100. To make this information as useful as possible, we're aggregating the phrases by eliminating capitalization and punctuation. For instance, if several sites have linked to your site using the following anchor text:

Site 1 "Buffy, blonde girl, pointy stick"
Site 2 "Buffy blonde girl pointy stick"
Site 3 "buffy: Blonde girl; Pointy stick."

we would aggregate that anchor text and show it as one phrase, as follows:

"buffy blonde girl pointy stick"
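To get a feel for this kind of aggregation, here is a minimal Python sketch that lowercases each phrase, drops punctuation, and counts the results. It only illustrates the idea described above under those simple assumptions; it is not the code webmaster tools actually runs, and the `normalize_anchor` helper is a made-up name.

```python
import string
from collections import Counter

def normalize_anchor(text):
    """Lowercase the text, drop ASCII punctuation, and collapse runs of whitespace."""
    without_punct = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(without_punct.lower().split())

anchors = [
    "Buffy, blonde girl, pointy stick",
    "Buffy blonde girl pointy stick",
    "buffy: Blonde girl; Pointy stick.",
]

# All three variants collapse into the same phrase.
phrase_counts = Counter(normalize_anchor(a) for a in anchors)
print(phrase_counts)
```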

You can find this list of phrases by logging into webmaster tools, accessing your site, then going to Statistics > Page analysis. You can view this data in a table and can download it as a CSV file.

And as we told you last month, you can see the individual links to pages of your site by going to Links > External links. We hope these details give you additional insight into your site traffic.

Wednesday, March 14, 2007

Brand new German Webmaster Central Blog

For those German-speaking folks among our readers of this English Webmaster Central Blog we have exciting news: We have just launched the German Webmaster-Zentrale Blog! This is a tribute to the fact that the German-speaking webmaster community is our second biggest audience of this blog. The German Webmaster Blog will provide you with first-hand information tailored towards our German-speaking webmasters. The blog will contain a mix of German versions of postings from this blog as well as unique postings about market-specific issues.

So, German speakers around the world, check out this new resource for questions about indexing, ranking, quality guidelines for webmasters, and how to design websites with the user in mind. We'll also be participating in the German discussion forum, so head over there if you have questions or other things you'd like to talk about.

Don't speak German? We want to talk to webmasters all over the world, so stay tuned for more!

Tuesday, March 6, 2007

All about robots

Search engine robots, including our very own Googlebot, are incredibly polite. They work hard to respect your every wish regarding what pages they should and should not crawl. How can they tell the difference? You have to tell them, and you have to speak their language, which is an industry standard called the Robots Exclusion Protocol.
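If you'd like to see how a set of rules is read, Python's standard library includes a simple robots.txt parser. The sketch below feeds it a small, made-up robots.txt and asks whether two hypothetical URLs may be crawled. It shows the protocol in general, under the assumption that the standard-library parser is good enough for a quick check; it is not a model of how Googlebot itself evaluates rules.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that blocks one directory for all robots.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

blocked = parser.can_fetch("Googlebot", "http://www.example.com/private/report.html")
allowed = parser.can_fetch("Googlebot", "http://www.example.com/blog/")
print(blocked, allowed)
```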

Dan Crow has written about this on the Google Blog recently, including an introduction to setting up your own rules for robots and a description of some of the more advanced options. His first two posts in the series are:
Controlling how search engines access and index your website
The Robots Exclusion Protocol
Stay tuned for the next installment.

While we're on the topic, I'd also like to point you to the robots section of our help center and our earlier posts on this topic:
Debugging Blocked URLs
All About Googlebot
Using a robots.txt File

Update: For more information, please see our robots.txt documentation.

Monday, March 5, 2007

Using the robots meta tag

Recently, Danny Sullivan brought up good questions about how search engines handle meta tags. Here are some answers about how we handle these tags at Google.

Multiple content values
We recommend that you place all content values in one meta tag. This keeps the meta tags easy to read and reduces the chance for conflicts. For instance:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

If the page contains multiple meta tags of the same type, we will aggregate the content values. For instance, we will interpret

<META NAME="ROBOTS" CONTENT="NOINDEX">
<META NAME="ROBOTS" CONTENT="NOFOLLOW">

the same way as:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

If content values conflict, we will use the most restrictive. So, if the page has these meta tags:

<META NAME="ROBOTS" CONTENT="NOINDEX">
<META NAME="ROBOTS" CONTENT="INDEX">

we will obey the NOINDEX value.

Unnecessary content values
By default, Googlebot will index a page and follow the links on it. So there's no need to tag pages with content values of INDEX or FOLLOW.

Directing a robots meta tag specifically at Googlebot
To provide instruction for all search engines, set the meta name to "ROBOTS". To provide instruction for only Googlebot, set the meta name to "GOOGLEBOT". If you want to provide different instructions for different search engines (for instance, if you want one search engine to index a page, but not another), it's best to use a specific meta tag for each search engine rather than use a generic robots meta tag combined with a specific one. You can find a list of bots at robotstxt.org.

Casing and spacing
Googlebot understands any combination of lowercase and uppercase. So each of these meta tags is interpreted in exactly the same way:

<meta name="ROBOTS" content="NOODP">
<meta name="robots" content="noodp">
<meta name="Robots" content="NoOdp">

If you have multiple content values, you must place a comma between them, but it doesn't matter if you also include spaces. So the following meta tags are interpreted the same way:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">
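The rules so far (aggregating multiple tags, preferring the most restrictive value, ignoring case and spaces) can be sketched in a few lines of Python. This is an illustration of the behavior described in this post, not Googlebot's actual implementation, and the `parse_robots_content` function is a hypothetical helper.

```python
def parse_robots_content(contents):
    """Combine the content values of several robots meta tags into one set.

    Values are case-insensitive and comma-separated; spaces around
    the commas are ignored.
    """
    values = set()
    for content in contents:
        for value in content.split(","):
            value = value.strip().upper()
            if value:
                values.add(value)
    # INDEX and FOLLOW are the defaults, so they add nothing. Dropping them
    # also means a conflicting NOINDEX or NOFOLLOW (the more restrictive
    # value) is the one that remains.
    values.discard("INDEX")
    values.discard("FOLLOW")
    return values

print(parse_robots_content(["NOINDEX", "NOFOLLOW"]))
print(parse_robots_content(["noindex,nofollow"]))
print(parse_robots_content(["NOINDEX", "INDEX"]))
```

An empty set means no restrictions apply, which matches the default behavior.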

If you use both a robots.txt file and robots meta tags
If the robots.txt and meta tag instructions for a page conflict, Googlebot follows the most restrictive. More specifically:
  • If you block a page with robots.txt, Googlebot will never crawl the page and will never read any meta tags on the page.
  • If you allow a page with robots.txt but block it from being indexed using a meta tag, Googlebot will access the page, read the meta tag, and subsequently not index it.

Valid meta robots content values
Googlebot interprets the following robots meta tag values:
  • NOINDEX - prevents the page from being included in the index.
  • NOFOLLOW - prevents Googlebot from following any links on the page. (Note that this is different from the link-level NOFOLLOW attribute, which prevents Googlebot from following an individual link.)
  • NOARCHIVE - prevents a cached copy of this page from being available in the search results.
  • NOSNIPPET - prevents a description from appearing below the page in the search results, as well as prevents caching of the page.
  • NOODP - blocks the Open Directory Project description of the page from being used in the description that appears below the page in the search results.
  • NONE - equivalent to "NOINDEX, NOFOLLOW".

A word about content value "NONE"
As defined by robotstxt.org, the following direction means NOINDEX, NOFOLLOW.

<META NAME="ROBOTS" CONTENT="NONE">

However, some webmasters use this tag to indicate no robots restrictions and inadvertently block all search engines from their content.

Update: For more information, please see our robots meta tag documentation.

Friday, March 2, 2007

Using the site: command

The site: command enables you to search through a particular site. For instance, a searcher could look for references to [Buffy] in this blog by doing the following search:

site:googlewebmastercentral.blogspot.com buffy

Webmasters sometimes use this command to see a list of indexed pages for a site, like this:

site:www.google.com

Note that with this command, there's no space between the colon and the URL. A site: search for www.site.com returns URLs that begin with www, and a site: search for site.com returns URLs for all subdomains. (So, site:google.com returns URLs such as www.google.com, checkout.google.com, and finance.google.com.) You can do this search from Google or you can go to your webmaster tools account and use the link under Statistics > Index stats. Note that whether this link includes the www depends on how you have added the site to your account.
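The subdomain behavior can be pictured with a small Python sketch: a URL falls under a site: value if its host is that value or a subdomain of it. The `matches_site` function is a hypothetical helper written for this illustration and says nothing about how the search itself is implemented.

```python
from urllib.parse import urlparse

def matches_site(site, url):
    """Return True if the URL's host equals `site` or is a subdomain of it."""
    host = (urlparse(url).hostname or "").lower()
    site = site.lower()
    return host == site or host.endswith("." + site)

# site:google.com covers every subdomain...
print(matches_site("google.com", "http://checkout.google.com/"))
# ...while site:www.google.com covers only the www host.
print(matches_site("www.google.com", "http://checkout.google.com/"))
```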

Historically, Google has avoided showing pages that appear to be duplicate (e.g., pages with the same title and description) in search results. Our goal is to provide useful results to the searcher. However, with a site: command, searchers are likely looking for a full list of results from that site, so we are making a change to do that. In some cases, a site: search doesn't show a full list of results even when the pages are different, and we are resolving that issue as well. Note that this is a display issue only and doesn't in any way affect search rankings. If you see this behavior, simply click the "repeat the search with omitted results included" link to see the full list. The pages that initially don't display continue to show up for regular queries. The display issue affects only a site: search with no associated query. In addition, this display issue is unrelated to supplemental results. Any pages in supplemental results display "Supplemental Result" beside the URL.

Because this change to show all results for site: queries doesn't affect search rankings at all, it will probably happen in the normal course of events, as we merge it into the next push of a new executable for handling the site: command. As a result, it may be several weeks or so before you start to see this change, but we'll keep monitoring it to make sure the change goes out.