the news frontier

The Growing Problem of Search Engine Spam

And what Google says it’s doing about it
January 25, 2011

Last week, Google News’s Krishna Bharat spoke at Columbia University about what makes his search engine so helpful and efficient for journalists. Reporters and editors don’t need to spend time thinking about marketing the news they produce to the whole world’s audience, he argued—they can concentrate on the production side, and Google’s algorithm will take care of delivering readers to them. Nevertheless, most online writers and editors do take that extra, buzzword-y step of “search engine optimization” before they publish. Thoughtful, specific headlines help; so do tags and keywords.

But what about when that very effective system gets taken advantage of? Especially in the past few months, there has been an explosion of sites that exist for the sole purpose of waylaying a curious Googler for just a second, enough to win a few page views and flash a few ads in her face. “Scraper sites” or “mirror sites” that copy and paste whole articles from reputable sites into ad-happy spam sites are one problem. Content farms like Demand Media and Associated Content are another. Some Google users have noticed a difference in search quality lately, and they say that smart spammers are learning to game the system, making it harder for the rest of us to find what we want online.

“Google is being infiltrated on a vast scale by content farms,” wrote ReadWriteWeb. “Google has become a jungle: a tropical paradise for spammers and marketers,” wrote TechCrunch. “Google has become a snake that too readily consumes its own keyword tail,” wrote blogger Paul Kedrosky.

Jeff Atwood of programming help site Stack Overflow writes about an annoying phenomenon wherein readers searching for the site’s content on Google would be directed to “scraper sites” or “mirror sites” that had copied and pasted the relevant pages onto an ad-happy spam site. Worse yet, the original content wouldn’t show up on Google searches at all, or it would be so far down the list that people would give up. For a similar story (with pictures!), check out this shoe blog post, in which the author searches Google for a specific blog post, with title and source, but doesn’t find it until scrolling through an entire page of irrelevant results.

On Friday, Google’s principal engineer Matt Cutts wrote on the company blog that he and his team are aware of the criticism and are working to respond to the challenge. For instance:

The new classifier is better at detecting spam on individual web pages, e.g., repeated spammy words—the sort of phrases you tend to see in junky, automated, self-promoting blog comments. We’ve also radically improved our ability to detect hacked sites, which were a major source of spam in 2010. And we’re evaluating multiple changes that should help drive spam levels even lower, including one change that primarily affects sites that copy others’ content and sites with low levels of original content.


Cutts previously told CNET that over 200 factors determine where a website will rank in Google’s search results, and that tweaks and changes to the algorithm happen every day. Google has also launched a Google Chrome extension that allows users to quickly provide feedback about spam they encounter online, similar to the “Report Spam” button within the Gmail interface. (Download the extension, learn how to use it, and chuckle at the many spam comments below the post here.) Search Engine Land reports that Cutts has also hinted at giving users the power to block or “blacklist” entire domain names from their Google searches. Would you block Answers.com or eHow.com from your Google searches if you could?

Joseph Tartakoff at paidContent makes a great point:

Any move by Google that could give less prominence to results from sites like the Yahoo Contributor Network or Demand Media would be a major blow to those companies’ business models, which in large part depend on being ranked highly on the search engine. The timing of the announcement could not be worse for Demand Media, which is expected to go public next week.

Too true. Any business model that depends on the whims of another company to make money—such as, in this case, a search engine with a proprietary algorithm and the financial and technical resources to do anything necessary to uphold its reputation for efficiency and user-friendliness—is probably not a good idea. Or, rather, it’s an okay idea for a little while, but as soon as Google discovers that something is gumming up the machine, that’s the end of that business model.

Parenthetically, when I started writing this post I was trying to remember the name of that little bird that rides on top of the rhinoceros. It turned out not to be the best metaphor to use, because, as I remembered, the rhino and the tickbird have a symbiotic, rather than parasitic, relationship. A more fitting metaphor for spammers might be a tapeworm, or some kind of viral infection. In any case, I mention it just to point out that when I typed in “bird on rhinocerous,” Google led me to this truly illuminating page on Answers.com:

Lauren Kirchner is a freelance writer covering digital security for CJR. Find her on Twitter at @lkirchner