The Blackhats have finally caught on (publicly, at least) to the fact that Google is not doing a very good job of dealing with scraper sites, let alone authority sites that copy and paste content from your own site. Interestingly, Quadzilla has noted that Yahoo, on the other hand, is doing a relatively good job with its search results:

In the meantime, while Google (and in fact all the search engines) continues to serve up scraper after scraper, scrape away! It’s the blackhat technique that just won’t die and continues to drive traffic and make money.

What’s even worse is that scraping is almost considered acceptable, since Google says it cannot determine whether or not you gave those sites (say, the Associated Press) permission to re-post your content.

It’s the same idea when you see Aaron Wall posting his own content around the web nearly verbatim and having each copy rank quite well. This tolerance for duplicate content is rather ridiculous from a user’s perspective, but from Google’s perspective it spares them the burden of trying to determine what is “truth” and what is a good search result.

But the question remains: what happens when the two need to be one and the same?

Update (11/13/2008):

SEO ROI has found one site in particular that is already exploiting this loophole at Aimclear’s expense:

Trademark Productions steal other people’s content, edit it for the sake of passing through search engine duplicate content filters, and try to pass themselves off as experts you should trust? They’re stealing from Aimclear, Clickz and others.

Here’s the kicker, from a comment by sockmoney on the Sphinn post, showing that the problem is still going on today:

I battled a site last year that had copied over 10,000 pages of content from my site. It was brought to my attention by a competitor who was also being scraped.

I wrote a program that would crawl their site, pull each duplicate URL, and report it against the matching page on my site.

Now keep in mind: my site is 10 years old; the scraper site was less than one year old.

I filed my report as a DMCA violation with Google (my crawler was only able to match 6,000+ of the copied pages, so that is what I sent to Google, in the format they required).

Google contacted the site owners; they countered that they had broken no copyright laws. Google said, “We cannot do anything else, sorry.” In my mind, Google should be able to see my content was there first, and they do in most cases, so why can’t they assign a scraper penalty to the site(s)? Instead, in this case they chose to let the site continue to grow, “acting” like a legit site with legit content, while all along it was simply living off our content.

My only option was to hire an attorney. I was advised that copyright law is federal law, and that enforcing it would require going to federal court, which would cost me a minimum of $30k to $50k in legal fees.
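sockmoney doesn’t say how the crawler matched scraped pages to the originals. One common way to flag near-duplicate text is to break each page into overlapping word “shingles” and compare the sets with Jaccard similarity. Below is a minimal sketch, assuming you already have each page’s text extracted. The function names, the 5-word shingle size, the 0.5 threshold, and the example.com URLs are hypothetical choices for illustration, not the commenter’s actual program.

```python
import re


def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Very short text: treat the whole word sequence as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_copies(mine: dict[str, str], theirs: dict[str, str],
                 k: int = 5, threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """Pair each scraped URL with its most similar original page.

    mine and theirs map URL -> page text (hypothetical inputs). Only pairs
    scoring at or above threshold are reported, as (their_url, my_url, score).
    """
    my_shingles = {url: shingles(text, k) for url, text in mine.items()}
    report = []
    for their_url, text in theirs.items():
        s = shingles(text, k)
        best_url, best_score = None, 0.0
        # Brute-force comparison against every original page.
        for my_url, ms in my_shingles.items():
            score = jaccard(s, ms)
            if score > best_score:
                best_url, best_score = my_url, score
        if best_url is not None and best_score >= threshold:
            report.append((their_url, best_url, best_score))
    return report
```

Because a light rewording only disturbs the few shingles that touch the changed words, lightly edited copies (like the ones described in the Trademark Productions quote above) still score high. The brute-force loop compares every pair of pages, which is workable for a few thousand pages; a technique such as MinHash would scale better for much larger sites.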