28 Nov
Search Engine Optimization has been around for about a decade or so, yet it is still a relatively new field of work. It is new enough that I would still qualify SEO as a soft science: there are no publicly detailed statistical studies that use regression analysis or other econometric methods to determine which factors really help sites rank at the top of search engines. (This is not to say that there are no statistical analyses out there, just none using proper regression analysis outside of Excel.)
Try searching the following phrases in Google:
the lack of any papers on these topics? I’m not surprised, as I doubt even 5% of the people I chat with randomly at parties know what SEO or a Search Strategist is, and I live in the heart of Silicon Valley with a large number of computer nerd friends.
Yes, there are a lot of assumptions and well-known concepts about what will obviously work (e.g., links, title tags, etc.), but most of this does not account for various other factors that may or may not actually affect how you rank (e.g., age of domain, traffic, etc.) and, even more acutely, to what degree.
Additionally, I am not referring to the various white papers or case studies that describe what was done before some changes were later seen. What fascinates me is by what percentage various SEO strategies affect how sites rank, whether overall or in niche markets.
Of course, one needs a darn good business reason to spend the time and resources to actually develop these analyses, which unfortunately would likely only matter for a month or two at most before the next algorithm update changes what matters and by how much. So until there are a lot of academic classes teaching SEO, we are highly unlikely to see true econometric regression analysis. And I mean panel data regressions, not time-series or cross-sectional ones: in my view you would be better off just doing correlations instead, since non-panel regression analyses are about as accurate.
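To make the panel-data point a little more concrete, here is a minimal sketch of the simplest panel estimator I know of, a fixed-effects ("within") regression, next to a pooled regression that ignores which site each observation came from. Everything in it is an assumption for illustration: the site names, the numbers, and the single factor (inbound links) are made up, and no real study or search engine behaves this neatly.

```python
# Hypothetical panel: for each site, a list of (inbound_links, rank_score)
# observations taken on different dates. All numbers are made up.
panel = {
    "site_a": [(10, 5.0), (12, 6.0), (14, 7.0)],
    "site_b": [(30, 3.0), (32, 4.0), (34, 5.0)],
    "site_c": [(50, 1.0), (52, 2.0), (54, 3.0)],
}


def ols_slope(points):
    """Slope of a simple least-squares line through (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


def demean(points):
    """Subtract a site's own averages, which removes its fixed level."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return [(x - mean_x, y - mean_y) for x, y in points]


# Pooled regression: treats all observations as if they came from one site.
pooled = [p for obs in panel.values() for p in obs]
pooled_slope = ols_slope(pooled)

# Fixed-effects ("within") regression: demean per site, then pool.
within = [p for obs in panel.values() for p in demean(obs)]
within_slope = ols_slope(within)

print(f"pooled slope: {pooled_slope:.3f}")
print(f"within slope: {within_slope:.3f}")
```

In this toy data every site gains half a point of score for each extra link, but the sites with more links start from a lower level. The pooled slope comes out negative while the within slope recovers the 0.5, which is the kind of site-level effect that only shows up when you follow the same sites over time.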
However, it is entirely possible that the SEO for Firefox and SeoQuake plugins could go down this route if they can afford some high-level analysts (and given that their products are free, I would be surprised if they could). Still, SeoQuake did put out an article on “How search query niche determines the behavior of Google SERP”, and I applaud the effort to use their plugin to analyze their database of information. That said, I personally believe they need to do a better job of it, as their analysis is quite flawed.
Unfortunately, their core definition of a “white hat” site, namely a site that stayed in the top 20 of the Google search result pages (SERPs) for 18 days within an analysis period of one month (July 12 to August 19), is very short-sighted. Essentially, they are taking a snapshot of one month, in which a number of factors could add a lot of noise to the data. I’ve seen everything from horrible situations where clients inadvertently told Googlebot to de-index their pages in their robots.txt file, to situations where a brand new client hit the top of the results (temporarily) as a brand new business. To account for these factors, the time span has to be far longer than one month in order to properly separate “black hat” sites from new businesses or from “white hat” sites that made a mistake causing them to drop on a few keywords until it was noticed and fixed. Even with two separate databases to deal with a site’s position fluctuation (which is a great way to try to deal with that problem), one month is not enough to distinguish “white hat” fluctuations from true “black hat” sites.
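The window-length problem can be sketched with a few lines of code. This is only my own illustration, not SeoQuake's method: I am assuming the "18 days in the top 20" rule can be read as a share of the observation window (18 out of roughly 38 days), and the two ranking histories below are hypothetical.

```python
from typing import List, Optional

TOP_N = 20           # "top 20" cutoff from the SeoQuake definition
MIN_SHARE = 18 / 38  # assumption: 18 days out of a roughly 38-day window


def share_in_top(daily_positions: List[Optional[int]], top_n: int = TOP_N) -> float:
    """Fraction of observed days on which the site ranked within top_n."""
    hits = sum(1 for p in daily_positions if p is not None and p <= top_n)
    return hits / len(daily_positions)


def looks_white_hat(daily_positions: List[Optional[int]],
                    min_share: float = MIN_SHARE) -> bool:
    """Apply the share-of-days rule to whatever window is passed in."""
    return share_in_top(daily_positions) >= min_share


# Hypothetical histories, one entry per day; None means "not ranked".
# A legitimate site that blocked Googlebot by mistake for 25 days:
robots_mistake = [5] * 10 + [None] * 25 + [5] * 85
# A site that ranks for 20 days and then disappears for good:
short_lived = [3] * 20 + [None] * 100

for name, history in [("robots_mistake", robots_mistake),
                      ("short_lived", short_lived)]:
    short_window = looks_white_hat(history[:38])  # first 38 days only
    long_window = looks_white_hat(history)        # all 120 days
    print(name, short_window, long_window)
```

With the 38-day window the site that made a robots.txt mistake fails the test and the short-lived site passes it; over 120 days the two verdicts flip. The exact threshold is debatable, but the sketch shows how a one-month snapshot can misclassify in both directions.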
My only other quibble (minor compared to a huge flaw in the core assumption) is that it would be far more useful to compare bad-neighborhood keywords (adult, dating, etc.) against what could be considered pure-neighborhood keywords (schools, government, etc.). A drastic difference between totally opposite ends would truly prove their point, although even this would require a lot more keyword neighborhoods than just six, whose results could essentially have been the luck of the draw.
That all said and done, I do hope they continue to try more analyses (correlations at least, please!) and improve upon them in order to actually hit upon the research opportunities they mention:
- Detecting techniques which help actively attract SE traffic;
- Comparing keyword quality and volume for different sites;
- Analyzing a niche considering its SERP competition and chances for high rankings;
- Inventing up-to-date optimization techniques for other niches;
- Finding niches where “white hat” or spamdexing sites prevail;
- Spotting “privileged” sites;
- and many more.
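As for the "correlations at least" request above, a rank correlation is about the cheapest analysis one could run on this kind of data. Below is a minimal sketch of a Spearman correlation between one factor and SERP position. The factor (domain age), the numbers, and the function names are all hypothetical, and a single correlation says nothing about how much a factor matters once other factors are accounted for.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data for five sites on one keyword.
domain_age_years = [1, 3, 5, 7, 9]
serp_position = [10, 8, 6, 4, 2]  # lower is better

rho = spearman(domain_age_years, serp_position)
print(f"Spearman rho: {rho:.2f}")
```

In this made-up sample older domains always sit higher in the results, so rho is exactly -1 (position numbers fall as age rises). Real SERP data would be far noisier, which is exactly why it would be interesting to see the numbers.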
3 Responses for "Econometrics and SEO – The Statistical Unknown Factors of Search Engine Optimization Regression Analysis & SeoQuake Analysis"
Great site. This is the first time I have come across a site that talks about the statistical side of SEO. Coming from a statistical background and now conducting research in online marketing, I find this site quite useful and a good reference point.
Thanks, and I do have to admit it is quite an odd mixture, as it commonly seems in SEO that few people really want to do any kind of statistical research, preferring quasi-guesses and assumptions about what works.
Good day,
Thank you very much for your interest in our service SEODigger.
Now we are preparing to launch a new service for specialists in SEO and SEM area and we’d like to invite you to participate in beta testing.
Your questions and proposals are really important to us, so if you have any, don’t hesitate to contact us.
If you are interested in being involved in beta testing, just inform us.
Our email is: mail(at)semrush.com
Best regards,
SEODigger, SEOQuake and SEMRush Team