Book ’em

Piracy.lab is gathering data on digital book sharing

In anticipation of Congress’ next big fight over copyright, legal academics are working to gather data and learn how copyright actually works in the real world. But lawyers aren’t the only academics who have been using empirical techniques to gather information on the people who work within—and outside of—copyright law in its current form. Scholars of business, anthropology, and literature have also been collecting data on intellectual property, creativity and innovation.

One new project that’s looking at how creative work moves around the real world is Columbia University’s piracy.lab, which its principal investigator, Professor Dennis Tenen, describes as “a scrappy little outfit…actively seeking funds for empirical research on IP.” It aims to look at “illicit knowledge, information ‘leaks,’ and underground archives.” In other words, content piracy.

The lab’s first project, begun last semester, focuses on a little-studied area of modern copyright violation. While plenty has been written on illegally shared movies and music, there’s been less attention to an older form of information technology—books.

Tenen’s interest in what he calls “underground libraries” grew out of his experience studying and working in comparative literature departments. “I started hearing the story more and more from people from India or China or Russia or even Europe, and they say that I am here, in this program, because I had access to this particular book-sharing site.” These are places that offer PDFs of expensive books for free. Tenen heard his colleagues say, again and again, that without these resources, they wouldn’t have had access to the academic texts they needed to do their research. These books, he says, are “usually in the English language; they’re very difficult to get; they’re even more expensive overseas, and sometimes they’re just not available.”

On a practical level, pirating books is a different business than pirating music or movies. Hosting and downloading pirated books is easy: The files are small, and book pirates sometimes bundle them into large lots—2,500 books with a single click. But creating pirated books can be more difficult. As e-readers have pumped up digital publishing, it’s become possible for pirates to strip book files of protective measures and shop them around. (Some publishers have found that they endure about the same levels of piracy whether they protect their e-book files or make them easily sharable.)

But for many books, sharing a copy online still means scanning in the pages of a physical book to create a digital copy. It can be a labor of love. Or it can be a statement of principle.

“There’s very little research, empirical or otherwise, on these book-sharing communities,” says Tenen. A good bit of what is out there is about book piracy’s long legacy: in its early days, the entire American publishing industry, for instance, was largely based on republishing works created and copyrighted on the other side of the Atlantic Ocean.

To start creating their first data set, Tenen and his colleagues, a group that includes other professors and a handful of grad students, have been looking at a site called Library Genesis. It’s based in Russia, where there’s a vigorous market for pirated books, and it describes itself as “a scientific community targeting collection of books on natural science disciplines and engineering.” It’s said to have more than 800,000 works—some of them public domain, but many of them not—available for download.

So far, piracy.lab has downloaded more than a million posts from the Library Genesis forums, with the aim of better understanding the social dynamics of piracy. They’re asking questions like: Are there 10 people pirating everything? Or is it 100? A thousand? A million? Is there a core group that’s driving everything? What’s driving them? Are they ideological? Do they have a political agenda about freedom of information? Or are they acting more like collectors?

The lab is just starting to tease out answers to these questions: It’s clear that there is a core group of people populating Library Genesis with content (who, Tenen says, “never seem to sleep”). But it’s less clear why they’re doing it.

Understanding these book pirating communities isn’t just an academic exercise. Answering these questions can help inform the copyright debates in this country, particularly about how academic work should be made available to the public. It’s an issue that won’t go away. Many of Aaron Swartz’s supporters speculated that he was downloading JSTOR articles in bulk on the principle that academic work should not be hidden behind an expensive paywall. The White House, earlier this year, said that any taxpayer-funded research should be distributed freely a year after its first publication. One aim of piracy.lab is to give publishers and university libraries information about what there’s demand for, how these book pirates have succeeded at fulfilling that demand, and how more licit book-providers might learn from them.

“Basically, they’re librarians,” says Tenen. “They’re doing the same thing a library does. They’re saying, ‘Okay, we have a new digital library. How do we archive things?’ They created a distribution system…Already that’s really interesting, and I think a library could learn from it.”

Disclosure: CJR has received funding from the Motion Picture Association of America (MPAA) to cover intellectual-property issues, but the organization has no influence on the content.

Sarah Laskow is a writer and editor in New York City. Her work has appeared in print and online in Grist, Good, The American Prospect, Salon, The New Republic, and other publications.