I ran across a hot topic on Sphinn titled “Google Showing Bias Towards .org TLDs” and decided to look at how other sites and people are testing what matters to Google. I always appreciate seeing tests that go beyond pure guesswork, as I’ve said before about statistical SEO. Below are some of the findings from The Google Cache on which TLDs matter:

Preliminary Results:

The results were quite shocking. The .org subdomains outranked all other extensions. As you can see, the .nets and .coms are intermixed, some not ranking at all, but the .orgs are stacked at the top. While these results must be taken with a grain of salt until they can be verified on a much larger scale, it does indicate that there may be some bias towards the .org top level domain. (many have suspected this) These results have shown true on appx 80% of datacenters we have tested.

[...]

Implications:

  1. Further study is definitely needed. Virante will be expanding the number of test subjects greatly and testing with and without subdomains.
  2. Considering the costs are quite similar, it may make sense to begin using .orgs, like our good friends at SEOMoz

I’ve gone through my share of econometric papers, so my eyes usually go straight to how the experiment was set up. First, let’s look at the methodology:

There are several ranking factors we need to control.

  1. domain age (purchase new domains at same time)
  2. link profile (use Google sitemaps for indexing)
  3. indexing age (randomize ordering of multiple subdomains in sitemaps submissions)
  4. on-site factors (identical text, content)

So, for the preliminary examination, we purchased 3 domains, identically named, with different top level extensions (.org,.com,.net). We then created 3 separate subdomains on each of these domains so that we could create some sort of result duplication and randomize the order of submissions to Google sitemaps. Finally, we created identical content on each site and identical sitemaps.
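As a rough sketch of the design described above (3 TLDs times 3 subdomains, submitted in random order), something like the following could generate the submission order. The domain name `example`, the subdomain labels `a`, `b`, `c`, and the function name are hypothetical placeholders, not details from the original test.

```python
import random


def submission_order(domain, tlds, subdomains, seed=None):
    """Build every subdomain/TLD combination and shuffle the order.

    All names passed in are illustrative; the original test did not
    publish its actual hostnames.
    """
    hosts = [f"{sub}.{domain}.{tld}" for tld in tlds for sub in subdomains]
    rng = random.Random(seed)  # a fixed seed makes the order reproducible
    rng.shuffle(hosts)
    return hosts


order = submission_order("example", ["org", "com", "net"], ["a", "b", "c"], seed=42)
for position, host in enumerate(order, start=1):
    print(position, host)
```

Recording the seed alongside the results would let someone else reproduce the exact submission order later.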

A good start on controls, but it misses a few other factors that could influence the results (these may turn out to have no impact, but they should be controlled for to see whether they matter):

  • Interaction effects
  • Time lag (beyond a few minutes)
  • Google server that is indexing
  • Random indexing
  • Google bot (link crawler)
  • Other TLDs

Let’s assume for now that the above effects play no part in whether TLDs matter. The largest gap is the statistical significance of the test: exactly how many trials were run? The test seems to have run just 9 trials, and with 4 controls that leaves only 4 degrees of freedom (9 - 4 - 1). That is so far below what any level of statistical significance requires that claiming any kind of implication is pointless. Still, they did claim that about 80% of the data centers showed similar results (although that in and of itself does not establish a confidence level or say how many tests they ran).
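The degrees-of-freedom arithmetic can be sketched in a couple of lines. The function names are mine, and the target of 30 degrees of freedom is only a common rule of thumb used here for illustration, not a threshold taken from the test.

```python
def residual_df(n_trials, n_controls):
    """Residual degrees of freedom: trials minus controls minus one."""
    return n_trials - n_controls - 1


def trials_needed(target_df, n_controls):
    """Trials required to reach a given number of residual degrees of freedom."""
    return target_df + n_controls + 1


print(residual_df(9, 4))     # the 9-trial, 4-control setup discussed above
print(trials_needed(30, 4))  # assumed rule-of-thumb target of 30
```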

So, now that I’ve stated that, let’s look at the implications:

  1. Further study is definitely needed. Virante will be expanding the number of test subjects greatly and testing with and without subdomains.
  2. Considering the costs are quite similar, it may make sense to begin using .orgs, like our good friends at SEOMoz

I look forward to seeing the tests expanded to a scale large enough to reach statistical significance. Large tests, though cumbersome, are the main way to keep random errors from driving the results.
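To give a feel for why size matters: if the trials are independent and the noise in a ranking has standard deviation sigma, the standard error of the average shrinks as sigma divided by the square root of the number of trials. The value of sigma below (3 rank positions) is an assumption picked only to make the numbers easy to read.

```python
from math import sqrt


def standard_error(sigma, n_trials):
    """Standard error of a mean over n independent trials."""
    return sigma / sqrt(n_trials)


SIGMA = 3.0  # assumed noise in rank position, purely illustrative

for n in (9, 90, 900):
    print(n, round(standard_error(SIGMA, n), 3))
```

Cutting the random error by a factor of ten takes a hundred times as many trials, which is why small tests rarely settle anything.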

Another missing factor is how large the potential boost actually is. You can jump up and down to prove that Matt Cutts is lying when he says TLDs do not matter (which, I’ll admit, would be worth it for kicks), but statistically speaking, if the boost is smaller than the error in the regression, it provides very little benefit in the end.
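A minimal sketch of that comparison, using a simple difference in mean rank rather than a full regression: the rank lists are made-up numbers for illustration, not data from the test, and the "two standard errors" cutoff is a rough rule of thumb I am assuming here.

```python
from math import sqrt
from statistics import mean, stdev


def boost_vs_noise(org_ranks, com_ranks):
    """Return (boost, standard_error) for the difference in mean rank.

    A positive boost means the .org sites sit higher in the results
    (a smaller rank number) on average.
    """
    boost = mean(com_ranks) - mean(org_ranks)
    se = sqrt(
        stdev(org_ranks) ** 2 / len(org_ranks)
        + stdev(com_ranks) ** 2 / len(com_ranks)
    )
    return boost, se


# Hypothetical rank positions, invented for this example only
org = [3, 5, 4, 6, 2]
com = [4, 6, 5, 7, 3]

boost, se = boost_vs_noise(org, com)
worth_acting_on = abs(boost) > 2 * se  # assumed rule-of-thumb cutoff
print(boost, se, worth_acting_on)
```

In this invented case the .orgs rank one position better on average, but the noise is just as large, so the boost would not be worth acting on.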

Lastly, and most importantly, even when you get results that show one thing, you have to be very careful about how you explain what the data shows. Assuming the data is correct, it would make sense to say “begin using .orgs,” but one cannot extrapolate beyond that. The test was not run long enough to determine whether having a .org matters over the long term: it could matter purely for indexing at the start, but may stop mattering, or may even hurt, as time goes on and other factors play a part in getting sites to rank in Google SERPs.

All is not lost, though: any good experimenter caveats their work, as The Google Cache does:

Possible Causes Aside from Overt Bias and Further Caveats:

  1. Google’s shifting algorithm is built on profiling characteristics - while a .org bias might exist today, it could easily shift tomorrow if spammers start hoarding .orgs
  2. Google gives bias to .org subdomains, but not .org domains
  3. Google likes to group .org subdomains in the search results, but does not care about grouping .coms or .nets
  4. MattCutts is screwing with me

Gotta love #4–although I would have added the fun “-### Penalty” that WMW forums are always spouting.