In anticipation of Congress’ next big fight over copyright, legal academics are working to gather data and learn how copyright actually works in the real world. But lawyers aren’t the only academics who have been using empirical techniques to gather information on the people who work within, and outside of, copyright law in its current form. Scholars of business, anthropology, and literature have also been collecting data on intellectual property, creativity, and innovation.
One new project that’s looking at how creative work moves around the real world is Columbia University’s piracy.lab, which its principal investigator, Professor Dennis Tenen, describes as “a scrappy little outfit…actively seeking funds for empirical research on IP.” It aims to look at “illicit knowledge, information ‘leaks,’ and underground archives.” In other words, content piracy.
The lab’s first project, begun last semester, focuses on a little-studied area of modern copyright violation. While plenty has been written on illegally shared movies and music, there’s been less attention to an older form of information technology—books.
Tenen’s interest in what he calls “underground libraries” grew out of his experience studying and working in comparative literature departments. “I started hearing the story more and more from people from India or China or Russia or even Europe, and they say that I am here, in this program, because I had access to this particular book-sharing site.” These are places that offer PDFs of expensive books for free. Tenen heard his colleagues say, again and again, that without these resources, they wouldn’t have had access to the academic texts they needed to do their research. These books, he says, are “usually in the English language; they’re very difficult to get; they’re even more expensive overseas, and sometimes they’re just not available.”
On a practical level, pirating books is a different business from pirating music or movies. Hosting and downloading pirated books is easy: The files are small, and book pirates sometimes bundle them into large lots (2,500 books with a single click). But creating pirated books can be more difficult. As e-readers have pumped up digital publishing, it has become possible for pirates to strip e-book files of protective measures and shop them around. (Some publishers have found that they endure about the same levels of piracy whether they protect their e-book files or make them easily shareable.)
But for many books, sharing a copy online still means scanning in the pages of a physical book to create a digital copy. It can be a labor of love. Or it can be a statement of principle.
“There’s very little research, empirical or otherwise, on these book-sharing communities,” says Tenen. A good bit of what is out there is about book piracy’s long legacy: In its early days, for instance, the American publishing industry was largely based on republishing works created and copyrighted on the other side of the Atlantic Ocean.
To start creating their first data set, Tenen and his colleagues, a group that includes other professors and a handful of grad students, have been looking at a site called Library Genesis. It’s based in Russia, where there’s a vigorous market for pirated books, and it describes itself as “a scientific community targeting collection of books on natural science disciplines and engineering.” It’s said to have more than 800,000 works—some of them public domain, but many of them not—available for download.
So far, piracy.lab has downloaded more than a million posts from the Library Genesis forums, with the aim of better understanding the social dynamics of piracy. They’re asking questions like: Are there 10 people pirating everything? Or is it 100? A thousand? A million? Is there a core group that’s driving everything? What’s driving them? Are they ideological? Do they have a political agenda about freedom of information? Or are they acting more like collectors?
The lab is just starting to tease out answers to these questions: It’s clear that there is a core group of people (who, Tenen says, “never seem to sleep”) populating Library Genesis with content. But it’s less clear why they’re doing it.
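One way to picture the "core group" question is to count how many posts each forum user wrote and see how concentrated the activity is. The short Python sketch below is only an illustration, not the lab's actual method: the function names and the sample data are invented, and it assumes each post has already been reduced to an author handle.

```python
from collections import Counter


def top_share(authors, k):
    """Return the fraction of all posts written by the k most active authors."""
    counts = Counter(authors)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in counts.most_common(k)) / total


def authors_needed(authors, fraction):
    """Return the smallest number of authors whose posts cover `fraction` of the total."""
    counts = Counter(authors)
    total = sum(counts.values())
    covered = 0
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered >= fraction * total:
            return rank
    return 0  # no posts at all


# Hypothetical sample: one author handle per post (made-up data).
sample_posts = ["a"] * 6 + ["b"] * 2 + ["c", "d"]

print(top_share(sample_posts, 1))         # share of posts from the single busiest author
print(authors_needed(sample_posts, 0.5))  # how many authors account for half the posts
```

If a handful of authors account for most of the posts, that points toward a small core group; a flatter spread would suggest a broader community. Counting alone says nothing about motives, which is the harder question the researchers raise.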