cloud control

The secretive business of fighting content piracy

Private automated systems are sweeping the internet for illegal content, but they sometimes catch legitimate media coverage, too
June 19, 2014

Nate Glass fell into the takedown business almost by accident. Back in 2008, as a sales and marketing guy in the “adult industry,” he was driving in an RV, showing stores around the country how to market the products his employer was producing. It was, he says, the worst three years of his life; the charms of living out of an RV can wear off quickly. And he kept hearing the same thing from the stores: people just weren’t buying as much product as they used to. Everyone was getting their porn for free, online.

With little to entertain him in the evening, Glass decided to do some research. “I read the whole DMCA act,” he says. The Digital Millennium Copyright Act, passed in 1998, was meant to address exactly the problem the adult industry was facing: illegal use of copyrighted content on the internet. The act created a mechanism for copyright holders to send notice to the sites infringing on their rights and demand that the content be taken down. “It seemed like there was a pretty straightforward process,” says Glass.

He convinced his studio to let him try fighting back against pirated videos. When it worked, other studios started clamoring for his services, and soon he opened a business, Takedown Piracy, entirely dedicated to providing them. Originally, he would just plug the name of a video into a search engine and send the notices by hand. Now, he says, “You have to automate a big chunk of this. There’s no way you could keep up with how fast things are pirated if you had to look at every single thing.” His company sends notices targeting “at least 100,000 infringements every day.”

Since Glass started his company, the business of policing piracy has exploded. Big content companies increasingly outsource that job to independent companies–enforcement vendors–which depend on a suite of proprietary techniques to seek out and flag pirated content. These are companies few people have ever heard of: Degban, MarkMonitor Anti Piracy, Remove Your Media, DMCA Force, and Digimarc are just a few.

These companies play a key role in the churn of the bootleg internet: pirate groups have jiggered up systems that swiftly disperse copyrighted content across the Web, proprietary algorithms seek out those infringing uses and automatically generate DMCA takedown notices, and ISPs and search engines like Google automatically process them.
Much of the time, this system catches pirated material and tries to limit its spread. But it’s also generated takedown notices for sites that aren’t doing anything wrong–including work from newspapers and other media engaged in legitimate criticism and reporting on copyrighted work.

This wasn’t how the DMCA takedown system was supposed to work. “The presumption was that there would be an element of human judgment involved in evaluating if something was infringing. There’s a lot of gray area,” says Joe Karaganis, vice president of Columbia University’s American Assembly, a collaborator on the Takedown Project, which is attempting to better understand this system as it works in practice. “No one knows what the automated system does to that space of judgment and what the practical impact is on freedom of speech, and freedom of expression.”

And outside of pirate groups, enforcement vendors, and ISPs, no one really knows exactly how the automated system works. “What this looks like day to day is still shrouded in mystery,” Karaganis says.

To an extent, what’s happening is clear: These companies write algorithms that search out certain content–on sites that are extremely unlikely to be using this content in any way that qualifies as fair use–and generate takedown notices, which are sent to ISPs. Notices flagging sites that are more likely to be using content legitimately might be reviewed by a human being before being sent on. Some companies create digital fingerprints for content–code that they can search for across the Web and easily flag.
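To make that triage concrete, here is a minimal Python sketch built on loudly stated assumptions. Vendors’ actual methods are proprietary and unpublished, so the domain list, the function names (`fingerprint`, `triage`), and the three outcomes are all hypothetical. An exact SHA-256 hash also stands in for a real content fingerprint, which would have to survive re-encoding, cropping, and compression.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical list of sites whose URLs alone suggest piracy.
# Real vendors' criteria are proprietary and not public.
KNOWN_PIRATE_DOMAINS = {"moviesdownload24.com", "moviesfofree.org"}


def fingerprint(data: bytes) -> str:
    # Simplified stand-in: an exact hash only matches byte-identical copies,
    # unlike the robust fingerprints described in the article.
    return hashlib.sha256(data).hexdigest()


@dataclass
class Decision:
    url: str
    action: str  # "send_notice", "human_review", or "ignore"


def host_matches(host: str, domains: set[str]) -> bool:
    # Match the domain itself or any subdomain of it (e.g. www.).
    return any(host == d or host.endswith("." + d) for d in domains)


def triage(url: str, content: bytes, protected: set[str]) -> Decision:
    # Step 1: does the page carry a known copy of protected content?
    if fingerprint(content) not in protected:
        return Decision(url, "ignore")
    host = urlparse(url).hostname or ""
    # Step 2: on sites very unlikely to qualify as fair use, notice goes out automatically.
    if host_matches(host, KNOWN_PIRATE_DOMAINS):
        return Decision(url, "send_notice")
    # Step 3: everything else (blogs, news sites) is routed to a person.
    return Decision(url, "human_review")
```

For example, with `protected = {fingerprint(b"movie")}`, a match on `moviesdownload24.com` yields `"send_notice"`, while the same match on a `blogspot.com` page yields `"human_review"`. The design choice being illustrated is only the split the article describes, automatic notices for obvious cases and human review for ambiguous ones; the mistakes the article goes on to describe arise when cases like the second are sent without that review.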

And this automation has increased the volume of takedown notices exponentially.

Back in 2006, nearly eight years after the Digital Millennium Copyright Act passed, legal scholars Jennifer Urban and Laura Quilter thought that “a review of the law seems in order.” Takedown notices aren’t public records, but some recipients, most notably Google, had made a practice of releasing them. Urban and Quilter decided to look at whatever notices they could find to get a sense of how the DMCA was being used. Their project included “all notices submitted to Google Inc.” between March 2002 and August 2005. The total: 734.

In 2012 alone, by contrast, Google received more than 441,000 takedown notices. And a single notice might list dozens of copyright claims and point out hundreds of offending URLs. In the past year, Google says, it has received requests from 4,622 copyright owners to remove 24,440,925 URLs, in 44,078 specified domains.

The number of notices has grown fastest in the past few years, after both senders and receivers of notices began automating the process. A study released this spring found that “the year on year increase in the number of Google’s notices is 304% (for 2010), 305% (for 2011) and 524% (for 2012).” Twitter, which also makes takedown notices public, saw an increase of “1,248% (for 2011) and 61% (for 2012),” according to the study. In 2012, the social media company fielded more than 6,600 takedown requests. In 2013, it received almost twice that: a total of more than 12,400.

To some extent, this system does discourage piracy. Data from one enforcement vendor, for instance, showed that takedown efforts increase sales of ebooks. But the automation has also meant that perfectly legal content has been flagged for takedown. This can be amusing when copyright owners flag their own content. But it’s less funny when legitimate work gets caught in automated sweeps. Techdirt’s Mike Masnick flags the example of Warner Bros.’s Wrath of the Titans: Takedown notices went out for the movie’s IMDB page–and also for articles from BBC America and the Charleston Post & Courier.

It’s hard to say how often these sorts of mistakes happen or what sort of impact they’re having on people who are trying to use copyrighted content legitimately online, because there’s little transparency from anyone involved in this system: not from ISPs and search engines, not from content creators and enforcement vendors, and certainly not from content pirates. That’s part of what the Takedown Project, a collaboration led by Berkeley Law School, where Urban now works, and the American Assembly, is meant to address. The project’s researchers are trying to look comprehensively at “the impact of automat[ing] both sending and receiving process of notice and takedown” and to survey online service providers about their half of this system.

On the receiving end, few ISPs have the same incentive as Google to release takedown notices. Some smaller ISPs receive few enough that they continue to process them by hand. Since relatively few companies make these notices public, it’s hard to say how much the increasing volume of notices is a problem for Google alone. It may be that Google receives as many as half of all the notices sent; one vendor told CJR that maybe two-thirds of their notices are Google-bound.

Big content companies and the organizations that represent them are quick to point out that there’s a limit to Google’s transparency, too. On other counts–how, exactly, it processes takedown requests, what effect they might have on Google’s search algorithms–the company has been much quieter.

If content creators want transparency from Google, though, Google wants the same from enforcement vendors.

“We need more transparency from…the enforcement vendors community,” Fred von Lohmann, Google’s legal director for copyright, said at a government-run conference this spring. “We need to understand their cost structure, their business models and the technical procedures they have in place for generating notices and ensuring accuracy.”

Google does run a “Trusted Copyright Removal Program,” which essentially speeds up the takedown process for the companies that have made this their business. There’s little information publicly available about this program, but 95 percent of the takedown notices Google receives come from these “sophisticated submitters,” von Lohmann said at the conference.

One difficulty for enforcement vendors is dealing with sites like Blogger or WordPress, where, unlike, say, moviesdownload24.com or moviesfofree.org, the URL alone doesn’t indicate that any content or links to content are very, very likely to be illegitimate.

“Blogspot is very problematic in terms of catching what’s legitimate and what’s not legitimate,” says Eric Green, who runs Remove Your Media. “It’s like going in a war zone to do your business.” (His work hasn’t exactly gone over well with Blogspot users, either.)

How enforcement vendors deal with these grayer areas, though, is itself obscure. Beyond the basics, it’s hard to say how vendors actually do business. They won’t talk much about it, both because the methods they use are proprietary (and sometimes patented) and because, they say, they don’t want to tip off content pirates.

“I don’t want to give out specifics about anything out there that’s proprietary,” says Green. These methodologies, he said, “come from our programmers. Our programmers cost money.”

“We have a different method for each type of piracy”–streaming, file-hosting services, torrents, says Glass, the Takedown Piracy founder. Some of what the company does is “almost like investigatory work,” he says: “Part of what I’m doing is lurking among pirate communities and watching to see what they’re saying and fine-tuning our system to find the stuff that they’re uploading.”

But beyond that, he says, “I can’t show all my cards.”

Sarah Laskow is a writer and editor in New York City. Her work has appeared in print and online in Grist, Good, The American Prospect, Salon, The New Republic, and other publications.