External Source

On top of our own content, Avenue Web Media brings you the best articles online from other sources we endorse. The following article is from an external source and was not written by a member of our team. If you like it, visit the source for more quality content.

Unwritten Google Webmaster Guideline: Don't End URLs in .0

Posted by Jane Copland

Many of you saw this post from seoco.co.uk this morning (or its Sphinn thread) about our Web 2.0 Awards being removed from Google's index. We noticed the same thing late last night and spent some time this morning going through what could have happened. We were relatively sure that we must have inadvertently linked to a bad neighborhood; the page is very link-heavy and includes some lesser-known sites as well as big guns like Google Blog Search and Last.fm.

For those of you who haven't heard about it, this is what we saw this morning:

That pointless little bar at the bottom of the screen that we constantly tell people not to worry about had gone from full of green (7/10) to sadly gray overnight. The story at Google was even worse.

The page you see ranking first is the full list of winners and honorable mentions, but it is the "shortened" version. The main page, http://www.seomoz.org/web.2.0, was gone.

To make a long story short, Rand got in touch with Google this morning and was advised that changing the URL so it doesn't end in ".0" would be a wise decision. Google would prefer not to make an official or public comment, but they did give us permission to share this tidbit. Naturally, we investigated further and found that it's not just inadvisable but literally impossible to get a URL indexed in Google's engine if it ends in .0 (similar to how Google won't index files with extensions such as .exe or .tgz).

Whilst there is plenty of evidence that URLs ending in .0 often belong to spam pages (wild guess here, but let's say there are 800,000 or so URLs on the web ending in ".0" and maybe, oh... I don't know, 0.5% of them are worth indexing), I'm not sure this is a good metric on which to base an immediate penalty. Other decent pages hit in a similar way include http://en.wikipedia.org/wiki/Windows_1.0, which enjoys a healthy number of backlinks but won't appear in Google. Likewise, http://en.wikipedia.org/wiki/Web_2.0 appears in Google's index as http://en.wikipedia.org/wiki/Web_2. None of the URLs that redirect to a version ending in a slash are flagged.

Increasingly fascinated, we did some more investigating. We discovered that this penalty is indeed limited to the number zero: URLs ending in .n, where "n" is any other number, are not removed. If Google finds a version of the page that resolves with a trailing slash, you'll avoid the penalty. In one instance, a page that resolved with underscores in place of the full stop was indexed.
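To make the pattern described above concrete, here is a minimal Python sketch of a check you could run during a site review. It assumes the behaviour we observed (a path ending in ".0" gets dropped, while other digits, a trailing slash, or a trailing character such as "-" do not) and, as a further assumption, inspects only the URL path, ignoring any query string or fragment. The function name `ends_in_dot_zero` is hypothetical; it models our observations, not any documented Google rule.

```python
import re
from urllib.parse import urlsplit

# Hypothetical model of the behaviour observed in this post, not a Google rule.
# Assumption: only the path matters; query string and fragment are ignored.
_DOT_ZERO_END = re.compile(r"\.0$")


def ends_in_dot_zero(url: str) -> bool:
    """Return True if the URL's path ends in '.0' (e.g. '/web.2.0')."""
    return bool(_DOT_ZERO_END.search(urlsplit(url).path))


if __name__ == "__main__":
    examples = [
        "http://www.seomoz.org/web.2.0",                        # dropped, per this post
        "http://www.seomoz.org/web.2.0/",                       # trailing slash
        "http://en.opensuse.org/Bugs:Most_Annoying_Bugs_10.3",  # digit other than zero
        "http://en.wikipedia.org/wiki/Windows_1.0",             # dropped, per this post
    ]
    for url in examples:
        print(ends_in_dot_zero(url), url)
```

Because the check only looks at the final two characters of the path, "/web.2.0/" and "WinZip-9.0-" both pass, which matches what we saw in the index.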

Below is an assortment of URLs which are indexed in Yahoo! (and many also in Live), but which show no PageRank and do not appear in Google's index. Below those, I've listed very similar pages that are indexed, but which do not end in .0.

Out of Google's Index (but in Yahoo!):

In the index:

This page has PageRank (it shows a PR 3), but didn't show up in a Google search: http://www.fileplanet.com/62709/60000/fileinfo/WinZip-9.0-
http://www.fileplanet.com/62709/60000/fileinfo/WinZip-9.0 is not indexed and has no PageRank. Call this duplicate content if you will, but it still shows the same trend in action.

You'll notice some interesting things, such as the fact that en.opensuse.org/Bugs:Most_Annoying_Bugs_10.3 is indexed but en.opensuse.org/Bugs:Most_Annoying_Bugs_10.0 is not.

Quite simply, making sure a page resolves with a trailing slash will avoid this problem. I'm of the opinion that this is a pretty silly thing to penalise without some sort of human review, but it's important that we pick up on things like this so that we can avoid such "false positive" penalties. Make sure to add "check for URLs ending in .0" to your next site-review checklist, and please share in the comments any other filename extensions you've found that trigger similar behaviour in any of the engines.
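One way to make a page resolve with a trailing slash is a permanent (301) redirect from the ".0" form. The sketch below assumes a Python site running under WSGI; `dot_zero_redirect` and `page_app` are hypothetical names, and the wrapped application is assumed to already serve the page at the slash URL (the middleware only redirects, it does not serve content). Other server stacks would need their own equivalent rule.

```python
from urllib.parse import quote

# Reserved characters we leave unescaped in the Location path
# (letters, digits and "-._~" are always left alone by quote()).
_SAFE_PATH_CHARS = "/:@!$&'()*+,;="


def dot_zero_redirect(app):
    """Wrap a WSGI app so any path ending in '.0' gets a 301 to the same path plus '/'."""
    def middleware(environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path.endswith(".0"):
            # WSGI exposes PATH_INFO as latin-1-decoded bytes; re-encode before quoting.
            raw = (environ.get("SCRIPT_NAME", "") + path + "/").encode("latin-1")
            location = quote(raw, safe=_SAFE_PATH_CHARS)
            query = environ.get("QUERY_STRING", "")
            if query:
                location += "?" + query  # keep any query string intact
            start_response("301 Moved Permanently",
                           [("Location", location), ("Content-Length", "0")])
            return [b""]
        return app(environ, start_response)
    return middleware


def page_app(environ, start_response):
    """Stand-in for the real site: returns the same page for every path."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"page content"]


if __name__ == "__main__":
    def show_status(status, headers):
        print(status, dict(headers).get("Location", ""))

    site = dot_zero_redirect(page_app)
    site({"PATH_INFO": "/web.2.0", "QUERY_STRING": ""}, show_status)
    site({"PATH_INFO": "/web.2.0/", "QUERY_STRING": ""}, show_status)
```

A 301 is used rather than a 302 because the slash URL is meant to be the permanent address that gets indexed.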

UPDATE: en.wikipedia.org/wiki/SAML_1.1 also seems to be penalised, so it would be useful to go through more URLs that end in .n to gauge whether or not they're affected. Most of the examples we saw that didn't involve a zero had not been hit in any way. I'd love to know how extensive this filter really is.
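To gauge how far the filter reaches, one approach is to record whether each URL you test is indexed and group the results by the digits after the final full stop. The Python sketch below does that; `tally_by_suffix` is a hypothetical helper, and you supply the indexed/not-indexed observations yourself (for example from manual searches). The sample input uses only the examples reported in this post, not a new survey.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Digits after the final "." at the very end of the path, e.g. "SAML_1.1" -> "1".
_TRAILING_SUFFIX = re.compile(r"\.(\d+)$")


def tally_by_suffix(observations):
    """Group (url, is_indexed) pairs by the trailing '.n' digits of each URL's path.

    Returns {suffix: (indexed_count, total_count)}; URLs with no trailing
    '.n' are grouped under the key None.
    """
    counts = defaultdict(lambda: [0, 0])
    for url, is_indexed in observations:
        match = _TRAILING_SUFFIX.search(urlsplit(url).path)
        suffix = match.group(1) if match else None
        counts[suffix][1] += 1
        if is_indexed:
            counts[suffix][0] += 1
    return {suffix: (hit, total) for suffix, (hit, total) in counts.items()}


if __name__ == "__main__":
    # Observations taken from the examples in this post.
    reported = [
        ("http://en.wikipedia.org/wiki/SAML_1.1", False),
        ("http://en.wikipedia.org/wiki/Windows_1.0", False),
        ("http://en.opensuse.org/Bugs:Most_Annoying_Bugs_10.3", True),
        ("http://en.opensuse.org/Bugs:Most_Annoying_Bugs_10.0", False),
    ]
    for suffix, (hit, total) in tally_by_suffix(reported).items():
        print(f".{suffix}: {hit}/{total} indexed")
```

With a larger sample, a suffix whose indexed count stays near zero would point to the filter covering more than just ".0".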

