Wednesday, February 25, 2009

Canonical Link Element: presentation from SMX West

A little while ago, Google and other search engines announced support for a canonical link element that can help site owners with duplicate content issues. I recreated my presentation from SMX West and you can watch it below:

You can access the slides directly or follow along here:

By the way, Ask just announced that they will support the canonical link element. Read all about it in the blog entry.

Thanks again to Wysz for turning this into a great video.

In fact, you might not have seen it, but we recently created a webmaster videos channel on YouTube. If you're interested, you can watch the new webmaster channel. If you subscribe to that channel, you'll always find out about new webmaster-related videos from Google.

Monday, February 23, 2009

Introducing the Google Webmaster Central YouTube Channel

In his State of the Index presentation, Matt Cutts said that one of the things to look for from Google in 2009 is continued communication with webmasters. On the Webmaster Central team, we've found that using video is a great way to reach people. We've shown step-by-step instructions on how to use features of Webmaster Tools, shared our presentations with folks who were unable to make it to conferences, and even taken you through a day in the life of our very own Maile Ohye as she meets with many Googlers involved in webmaster support.

We plan on releasing more videos like these in the future, so we've opened up our own channel on YouTube to host webmaster-related videos. Our first video is already up, and we'll have more to share with you soon. If you want to be the first to know when we release something new, you can subscribe to us using your YouTube account, or grab this RSS feed if you'd like to keep track in your feed reader. Please let us know how you like the channel, and use the comments in this post to share your ideas for future videos.

And while we'll all do our best to make sure Matt Cutts understands that Rick Rolling is so last year, be careful where you click on April 1st.

Friday, February 20, 2009

Best practices against hacking

These days, the majority of websites are built around applications to provide good services to their users. In particular, content management systems (CMSs) are widely used to create, edit and administer content. Due to the interactive nature of these systems, where the input of users is fundamental, it's important to think about security in order to avoid exploits by malicious third parties and to ensure the best user experience.

Some types of hacking attempts and how to prevent them

There are many different types of attacks hackers can conduct in order to take partial or total control of a website. In general, the most common and dangerous ones are SQL injection and cross-site scripting (XSS).

SQL injection is a technique to inject a piece of malicious code in a web application, exploiting a security vulnerability at the database level to change its behavior. It is a really powerful technique, considering that it can manipulate URLs (query string) or any form (search, login, email registration) to inject malicious code. You can find some examples of SQL injection at the Web Application Security Consortium.

There are definitely some precautions that can be taken to avoid this kind of attack. For example, it's a good practice to add a layer between a form on the front end and the database in the back end. In PHP, the PDO extension is often used to work with parameters (sometimes called placeholders or bind variables) instead of embedding user input in the statement. Another really easy technique is character escaping, where all the dangerous characters that can have a direct effect on the database structure are escaped. For instance, every occurrence of a single quote ['] in a parameter must be replaced by two single quotes [''] to form a valid SQL string literal. These are only two of the most common actions you can take to improve the security of a site and avoid SQL injections. Online you can find many other specific resources that can fit your needs (programming languages, specific web applications ...).
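
As a minimal sketch of the parameter approach described above, here is the same idea in Python with the standard-library sqlite3 module rather than PHP's PDO. The users table, its columns, and the find_user helper are hypothetical and exist only for illustration.

```python
import sqlite3


def find_user(conn, name):
    """Look up a user by name using a bound parameter (hypothetical helper)."""
    # The ? placeholder keeps user input out of the SQL text itself,
    # so quotes in `name` cannot change the structure of the statement.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()


# Hypothetical in-memory table, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

# A normal lookup finds the row.
print(find_user(conn, "alice"))
# A classic injection string is treated as a plain value and matches nothing.
print(find_user(conn, "' OR '1'='1"))
```

Because the driver passes the value separately from the statement, no manual quote escaping is needed in this variant.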

The other technique that we're going to introduce here is cross-site scripting (XSS). XSS is a technique used to inject malicious code in a webpage, exploiting security vulnerabilities of web applications. This kind of attack is possible where the web application processes data obtained through user input without any further check or validation before returning it to the final user. You can find some examples of cross-site scripting at the Web Application Security Consortium.

There are many ways of securing a web application against this technique. Some easy actions that can be taken include:
  • Stripping the input that can be inserted in a form (for example, see the strip_tags function in PHP);
  • Using data encoding to avoid direct injection of potentially malicious characters (for example, see the htmlspecialchars function in PHP);
  • Creating a layer between data input and the back end to avoid direct injection of code in the application.
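
The data-encoding option in the list above could look roughly like this in Python, where html.escape plays a role similar to PHP's htmlspecialchars. The render_comment helper and the sample input are hypothetical.

```python
import html


def render_comment(user_text):
    """Return an HTML fragment with user input encoded (hypothetical helper)."""
    # html.escape converts &, <, > and (by default) quotes into entities,
    # so the browser displays the text instead of interpreting it as markup.
    return "<p>" + html.escape(user_text) + "</p>"


print(render_comment('<script>alert("xss")</script>'))
# prints: <p>&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;</p>
```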

Some resources about CMS security

SQL injection and cross-site scripting are only two of the many techniques used by hackers to attack and exploit innocent sites. As a general security guideline, it's important to always stay updated on security issues and, in particular when using third-party software, to make sure you've installed the latest available version. Many web applications are built around big communities, offering constant support and updates.

To give a few examples, four of the biggest communities of Open Source content management systems (Joomla, WordPress, PHP-Nuke, and Drupal) offer useful guidelines on security on their websites and host big community-driven forums where users can escalate issues and ask for support. For instance, in the Hardening WordPress section of its website, WordPress offers comprehensive documentation on how to strengthen the security of its CMS. Joomla offers many resources regarding security, in particular a Security Checklist with a comprehensive list of actions webmasters should take to improve the security of a website based on Joomla. On Drupal's site, you can access information about security issues by going to their Security section. You can also subscribe to their security mailing list to be constantly updated on ongoing issues. PHP-Nuke offers some documentation about Security in chapter 23 of their How to section, dedicated to the system management of this CMS platform. They also have a section called Hacked - Now what? that offers guidelines to solve issues related to hacking.

Some ways to identify the hacking of your site

As mentioned above, there are many different types of attacks hackers can perform on a site, and there are different methods of exploiting an innocent site. When hackers are able to take complete control of a site, they can deface it (changing the homepage), erase all the content (dropping the tables of your database), or insert malware or cookie stealers. They can also exploit a site for spamming, such as by hiding links pointing to spammy resources or creating pages that redirect to malware sites. When these changes in your application are evident (like defacing), you can easily spot the hacking activity; but for other types of exploits, in particular those with spammy intent, it won't be so obvious.

Google, through some of its products, offers webmasters some ways of spotting if a site has been hacked or modified by a third party without permission. For example, by using Google Search you can spot typical keywords added by hackers to your website and identify the pages that have been compromised. Just open Google and run a site: search query on your website, looking for commercial keywords that hackers commonly use for spammy purposes (such as viagra, porn, mp3, gambling, etc.):

[site:example.com viagra]

If you're not already familiar with the site: search operator, it's a way to query Google by restricting your search to a specific site. For example, a site: search on the address of the Official Google Blog will only return results from that blog. When you add spammy keywords to this type of query, Google will return all the indexed pages of your website that contain those spammy keywords and that are, with high probability, hacked. To check these suspicious pages, just open the cached version proposed by Google and you will be able to spot the hacked behavior, if any. You could then clean up your compromised pages and also check for any anomalies in the configuration files of your server (for example on Apache web servers: .htaccess and httpd.conf).

If your site doesn't show up in Google's search results anymore, it could mean that Google has already spotted bad practices on your site as a result of the hacking and may have temporarily removed it from our index, due to infringement of our webmaster quality guidelines.

In order to constantly keep an eye on the presence of suspicious keywords on your website, you could also use Google Alerts to monitor queries like: site:example.com viagra OR casino OR porn OR ringtones

You will receive an email alert whenever these keywords are found in the content of your site.
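
If you monitor several sites, you may want to generate these query strings rather than type them by hand. The sketch below only builds the text of a site: query; the build_site_query helper and the example.com domain are placeholders, and it does not call any Google service.

```python
def build_site_query(domain, keywords):
    """Build a site:-restricted query that matches any of the keywords."""
    # Keywords are joined with OR so that a page containing any one of them matches.
    return "site:" + domain + " " + " OR ".join(keywords)


# example.com is a placeholder; substitute your own domain.
print(build_site_query("example.com", ["viagra", "casino", "porn", "ringtones"]))
# prints: site:example.com viagra OR casino OR porn OR ringtones
```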

You can also use Google's Webmaster Tools to spot any hacking activity on your site. Webmaster Tools provides statistics about top search queries for your site. This data will help you to monitor if your site is ranking for suspicious unrelated spammy keywords. The 'What Googlebot sees' data is also useful, since you'll see whether Google is detecting any unusual keywords on your site, regardless of whether you're ranking for them or not.

If you have a Webmaster Tools account and Google believes that your site has been hacked, often you will be notified according to the type of exploit on your site:
  • If a malicious third party is using your site for spammy behaviors (such as hiding links or creating spammy pages) and it has been detected by our crawler, often you will be notified in the Message Center with detailed information (a sample of hacked URLs or anchor text of the hidden links);
  • If your site is exploited to place malicious software such as malware, you will see a malware warning on the 'Overview' page of your Webmaster Tools account.

Hacked behavior removed, now what?

Your site has been hacked or is serving malware? First, clean up the malware mess and then do one of the following:
  • If your site was hacked for spammy purposes, please visit our reconsideration request page through Webmaster Tools to request reconsideration of your site;
  • If your site was serving malware to users, please submit a malware review request on the 'Overview' page of Webmaster Tools.

We hope that you'll find these tips helpful. If you'd like to share your own advice or experience, we encourage you to leave a comment to this blog post. Thanks!

Wednesday, February 18, 2009

State of the Index: my presentation from PubCon Vegas

It seems like people enjoyed it when I recreated my Virtual Blight talk from the Web 2.0 Summit late last year, so we decided to post another video. This video recreates the "State of the Index" talk that I did at PubCon in Las Vegas late last year as well.

Here's the video of the presentation:

and if you'd like to follow along, here are the slides:

You can also access the presentation directly. Thanks again to Wysz for recording this video and splicing the slides into the video.

Thursday, February 12, 2009

Specify your canonical

Carpe diem on any duplicate content worries: we now support a format that allows you to publicly specify your preferred version of a URL. If your site has identical or vastly similar content that's accessible through multiple URLs, this format provides you with more control over the URL returned in search results. It also helps to make sure that properties such as link popularity are consolidated to your preferred version.

Let's take our old example of a site selling Swedish fish. Imagine that your preferred version of the URL and its content looks like this:

However, users (and Googlebot) can access Swedish fish through multiple (not as simple) URLs. Even if the key information on these URLs is the same as your preferred version, they may show slight content variations due to things like sort parameters or category navigation:

Or they have completely identical content, but with different URLs due to things such as tracking parameters or a session ID:

Now, you can simply add this <link> tag to specify your preferred version:

<link rel="canonical" href="" />

inside the <head> section of the duplicate content URLs:

and Google will understand that the duplicates all refer to the canonical URL. Additional URL properties, like PageRank and related signals, are transferred as well.

This standard can be adopted by any search engine when crawling and indexing your site.
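
If you want to check which canonical a given page declares, a short script can read it back out of the markup. This is a rough sketch using Python's standard html.parser module; the CanonicalFinder class, the sample page, and the www.example.com URL are placeholders, and the code says nothing about how any search engine parses the tag.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of the first <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens; compare case-insensitively.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attr_map.get("href")


def find_canonical(html_text):
    """Return the declared canonical href, or None if the page has none."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical


# Placeholder page and URL, for illustration only.
page = """<html><head>
<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish" />
</head><body>Swedish fish</body></html>"""
print(find_canonical(page))
```

For simplicity the sketch does not verify that the tag sits inside the <head> section, which is where the tag belongs.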

Of course you may have more questions. Joachim Kupke, an engineer from our Indexing Team, is here to provide us with the answers:

Is rel="canonical" a hint or a directive?
It's a hint that we honor strongly. We'll take your preference into account, in conjunction with other signals, when calculating the most relevant page to display in search results.

Can I use a relative path to specify the canonical, such as <link rel="canonical" href="product.php?item=swedish-fish" />?
Yes, relative paths are recognized as expected with the <link> tag. Also, if you include a <base> link in your document, relative paths will resolve according to the base URL.
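
To see how a relative canonical resolves, you can reproduce standard URL resolution with Python's urllib.parse.urljoin. The page URL and the base URL below are placeholders chosen for illustration.

```python
from urllib.parse import urljoin

# Placeholder URL of a duplicate page that declares a relative canonical.
page_url = "http://www.example.com/shop/product.php?item=swedish-fish&sort=price"
href = "product.php?item=swedish-fish"

# Without a <base> element, the href resolves against the page's own URL.
print(urljoin(page_url, href))
# prints: http://www.example.com/shop/product.php?item=swedish-fish

# With a hypothetical <base href="http://www.example.com/">, it resolves against that.
print(urljoin("http://www.example.com/", href))
# prints: http://www.example.com/product.php?item=swedish-fish
```

The two results differ, which is why an absolute URL is the less error-prone choice when pages live at different directory depths.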

Is it okay if the canonical is not an exact duplicate of the content?
We allow slight differences, e.g., in the sort order of a table of products. We also recognize that we may crawl the canonical and the duplicate pages at different points in time, so we may occasionally see different versions of your content. All of that is okay with us.

What if the rel="canonical" returns a 404?
We'll continue to index your content and use a heuristic to find a canonical, but we recommend that you specify existent URLs as canonicals.

What if the rel="canonical" hasn't yet been indexed?
As with all public content on the web, we strive to discover and crawl a designated canonical URL quickly. As soon as we index it, we'll immediately reconsider the rel="canonical" hint.

Can rel="canonical" be a redirect?
Yes, you can specify a URL that redirects as a canonical URL. Google will then process the redirect as usual and try to index it.

What if I have contradictory rel="canonical" designations?
Our algorithm is lenient: We can follow canonical chains, but we strongly recommend that you update links to point to a single canonical page to ensure optimal canonicalization results.
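
To audit your own site for chains or contradictory designations, you could follow the declared canonicals yourself. The sketch below is not Google's algorithm; the resolve_canonical helper, the max_hops limit, and the sample mapping of page paths to declared canonicals are all hypothetical.

```python
def resolve_canonical(url, declared, max_hops=10):
    """Follow declared canonicals from url until a page names no other target.

    `declared` maps each URL to the canonical it declares. Returns the final
    URL, or None if the designations form a cycle or the chain is too long.
    """
    seen = {url}
    for _ in range(max_hops):
        target = declared.get(url)
        if target is None or target == url:
            return url
        if target in seen:
            # Cycle: the designations contradict each other.
            return None
        seen.add(target)
        url = target
    return None


# Hypothetical site: /a points to /b, which points to /c.
chain = {"/a": "/b", "/b": "/c", "/c": "/c"}
print(resolve_canonical("/a", chain))

# Hypothetical contradiction: /x and /y each name the other.
loop = {"/x": "/y", "/y": "/x"}
print(resolve_canonical("/x", loop))
```

Any page whose declared canonical differs from the resolved one is a candidate for updating so that it points directly at the single canonical page.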

Can this link tag be used to suggest a canonical URL on a completely different domain?
**Update on 12/17/2009: The answer is yes! We now support a cross-domain rel="canonical" link element.**

Previous answer below:
No. To migrate to a completely different domain, permanent (301) redirects are more appropriate. Google currently will take canonicalization suggestions into account across subdomains (or within a domain), but not across domains. So site owners can suggest a preferred version among URLs on the same domain and its subdomains, but not between two different domains.

Sounds great—can I see a live example?
Yes, helped us as a trusted tester. For example, you'll notice that the source code on the URL specifies its rel="canonical" as:

The two URLs are nearly identical to each other, except that Nelvana_Limited, the first URL, contains a brief message near its heading. It's a good example of using this feature. With rel="canonical", properties of the two URLs are consolidated in our index and search results display the site's intended version.

Feel free to ask additional questions in our comments below. And if you're unable to implement a canonical designation link, no worries; we'll still do our best to select a preferred version of your duplicate content URLs, and transfer linking properties, just as we did before.

Update: this link-tag is currently also supported by Ask, Microsoft Live Search and Yahoo!.

Update: for more information, please see our Help Center articles on canonicalization and rel=canonical.

Wednesday, February 11, 2009

Help us help you

You're a webmaster, right? Well, we love webmasters! To ensure we give you the best support possible, we've set up a survey to get your thoughts on Webmaster Central and our related support efforts. If you have a few extra minutes this week, please click here to give us your honest feedback.

Thanks from all of us on the Webmaster Central Team.

Google Friend Connect introduces the social bar

Update: The described product or service is no longer available.

In our previous Google Friend Connect posts, we've enjoyed connecting with you, the webmasters, and hearing your feedback about Friend Connect. We're now standing on our own two feet -- find us over at the new Social Web Blog where we just announced the new social bar feature.

The social bar packages many of the basic social functions -- sign-in, site activities, comments, and members -- into a single strip that appears at the top or bottom of your website. You can use it alone, or use it to complement your existing social gadgets, by putting it on the top or bottom of as many of your webpages as you want.

For anyone visiting your site, the social bar offers a snapshot of the activity taking place within your website's community. One click on any of these features produces a convenient, interactive drop-down gadget, so users get all the functionality of the Friend Connect gadgets, while you save real estate on your website. With the social bar, visitors can:
  • Join or sign in to your site, view and edit their profiles, and change their personal settings.
  • View recent activity on your website, including new members and posts on any of your pages.
  • Post on your wall or read and reply to others' comments.
  • See the other members of your site, check out other people's profiles, and become friends. Users can also find out if any of their existing friends are members of your site.

Watch this quick video to learn how easy it is to add a social bar to your website:

To try out the social bar before deciding whether to add it to your website, visit: