For at least a decade, librarians, technologists, and academics have been discussing an idea that seems as inevitable as it is challenging: a centralized, digitized public library that would contain all of the country’s books, images, and archival materials, and be accessible to anyone with an Internet connection. It would be like having the Library of Congress—plus every local library branch and museum and archive in the country—right on your laptop, smartphone, or tablet. Last December, the Sloan Foundation funded an initiative to study this proposal in earnest, and the Digital Public Library of America committee began meeting at Harvard’s Berkman Center for Internet and Society.
A steering committee met again last week to discuss what needs to happen next to make the DPLA a reality; the meeting was off the record, but the moderator blogged about some of the big questions that were raised, such as, What will this mean for the public library system as it exists now? What will the infrastructure look like, and how will they actually go about scanning and digitizing everything in the country? Who will be in charge?
In addition to the questions listed there, we had some more: What would this mean for the authors of copyrighted material? How is the project different from what Google is trying to do by scanning the world's books? And who's going to pay for it? To find out more, assistant editor Lauren Kirchner spoke with Maura Marx, a fellow at the Berkman Center who is also executive director of The Open Knowledge Commons. What follows is an edited version of that conversation.
What kind of consensus did the conference last week bring among all the various groups involved? Did you decide what the DPLA would look like on the user end, who might actually be in charge of the organization, and so on?
Some rough consensus came out. For instance, a DPLA has multiple uses, not distinct users. We had the conversation: Who are our users? Are they research users, or public users? And what the group said loud and clear was that we shouldn't be building something aimed at specific users. We should be building an open platform so that any type of use, any type of service, could be built on top of this infrastructure. We should be exposing data and content in such a way that unforeseen uses can be accommodated in the future. That's an important architectural distinction.
How decisions will be made hasn't been scoped out yet. Currently there is a Steering Committee to steer the project, and one of the major work-streams we've identified [to look at next] is governance: How are people going to be represented? How will decisions be made? All of these things will be looked at, and we're going to use the period from April to June to plan out the next eighteen months of work in each of those work-streams.
What are other countries doing in this area, and what have you learned from looking at those projects?
Europeana is a pan-European aggregator, a portal you can go to in order to search all of the European digital libraries. It's a kind of meta-aggregator: it aggregates from local aggregators, like the National Library of Austria (which aggregates from libraries in Salzburg, Vienna, and Innsbruck). So you can search, in theory, the entire European output through Europeana. We have learned from them: they've built this portal, and now they're trying to transition into the kind of open platform that I mentioned earlier. They want to make sure that people can pull content into where they live, so they want their content to be available to Flickr and all kinds of social media, rather than users having to go to the portal to search for it. That was a very valuable lesson. Part of it is timing: they started working on Europeana five years ago, and that's how the world looked then.
Is the assumption that this project would involve public funding at some point, as the European version does?
I don't think there are any assumptions about funding at all, except that I know there are foundations that are very interested in and very supportive of this effort, and the Sloan Foundation has supported us thus far in getting this work off the ground. I think it's important for all of us to think about sustainability. Many libraries still look at digitization or digital operations as a special, separate thing that needs extra support from somewhere. And even though a project like this would certainly need some kind of infusion of money to get started and to build infrastructure, I think it's everyone's goal to stop treating digital as this special, extra thing. It can't be! It is the way people find information now.
So if anything, hopefully part of what the DPLA accomplishes is to be an advocacy platform, and a way for libraries to start shifting their budgets toward operations that benefit the whole community rather than just the local one. There is still so much duplication of effort, and that just has to stop; it's such a waste of money. Public libraries have a lot of services that you won't be able to duplicate in the digital world, of course, but a large part of what they do is digital and could benefit from being more closely networked with other digital libraries.
So what would this mean for traditional public libraries, when and if this project becomes a reality—in terms of the funding they might get, or how they’ll be used in the future? Is that part of the discussion?
Yes, absolutely. We had a lot of public librarians at this meeting who were able to speak to those issues. Hopefully, it wouldn't have any effect on their funding, because public libraries provide so many services that are local and important to their communities. This would be a complement to the huge collections that local libraries already have. In addition, a lot of libraries have wonderful local history collections that would be an important part of a DPLA. That's one of the challenges we're looking at: local history collections and genealogy materials are always among the most sought-after materials in libraries. So how do you get that material digitized, uploaded, and made part of the whole? Those are some of the really fun and interesting problems that we have to solve.
What do you imagine this would look like on the user end?
I hope that there's not one predefined front end. I hope that we think about serving mobile users, and I hope that we think about incorporating a large degree of interactivity from people out there: something like the Library of Congress's Flickr project, where the Library posted a lot of its photographs, users created metadata and tags for them, and there was a real exchange of data back and forth. I think you have to be able to tap into what people know about, and what interests people. There are also so many naturally occurring communities of interest, and each one of them should be able to tap into materials that serve its interests. I hope we don't have just one way in to all this.
How is this project different from Google Books?
Google has done a great job in digitizing millions of books, and, actually, they’ve helped by showing us what’s possible. But there are other materials: there are images, there are manuscripts, there are audio/visual materials—it’s not all books. Who knows, maybe Google will be a part of this, too. There’s been an enormous amount of work that’s been done already. Many of the libraries involved in this effort have worked closely with Google as well. I think that public/private partnerships are going to be a part of this in lots of ways. There’s open content that’s totally open and totally free, and then there’s content that might appear free to the user, but that has something happening on the back end—someone is paying for it somehow—which is how libraries have been paying for content already anyway.
The Google settlement has been held up for a very long time now, and who knows what's going to happen to it. It's a great pity that we don't have an answer to the orphan works problem. Part of what the DPLA will do is try to help create legislative solutions to this problem. I mean, Google created a private contract around a problem that really should be solved through legislation. No one has given up on creating a true legislative solution to the orphan works problem, one that works for everyone.
What would need to happen, in terms of legislation, before DPLA could become a reality?
Well, I'm not a lawyer; my colleagues could speak much more eloquently on this. But generally speaking, we certainly need to be able to provide access to things that are truly orphaned. We need to know that we're not going to be sued for providing access to something that has no locatable rights holder. We also need a way to tease out the different layers of content: there's Mickey Mouse and Harry Potter, and then there's the book an academic wrote five years ago that they would be delighted to have someone read. It's out of print; it's not really going to generate any more revenue. So can we not find a way to change the one-size-fits-all nature of copyright, so that the scholarly publishing community can do what it was supposed to do, which was to spread knowledge and increase scholarship? It was not supposed to just publish books that would be locked up for the author's lifetime plus seventy years, that no one would be able to locate in an online environment, much less read or analyze.
But then what would this mean for the authors of content that’s still current, still under copyright, and would still be making money on its own? Would those types of content not be included in the DPLA?
Many are already being included through your public library, via a company called OverDrive that provides access to in-copyright digital books: you download them, and then they disappear after two weeks, for instance. But no one at this meeting (I can say that with one hundred thousand percent certainty) was ever advocating for rights holders to be in any way slighted or disadvantaged. Everyone supports copyright and its purpose of creating an environment where innovation can thrive and creators are rewarded for their creations. Everyone is clear that this should be a vital part of the DPLA.
On the user end, would access to the DPLA be free to everyone? Or is that still up for discussion?
That's the ideal, I think. But I don't know how things will look, because you may have different tiers of content. For example, as a resident of Cambridge and of Massachusetts, I have access to the OverDrive database through my Boston library card, and there are things I have access to because of where I live. On top of that, I have access to other materials because I am part of an academic community. So I could see the DPLA having a similar tiered access mechanism.
Then there are different types of content. There's content that's wide open in the public domain and has no restrictions on access. There's another type, in the scholarly arena, that has a short period of generating revenue, after which it might become more broadly available. Then there's the "Disney" type of content that would have to be paid for, either on the front end or the back end. But again, who knows. There's a lot of work to be done on this, so I can't say at this point.
Has there been any discussion about including news products, such as current or archived magazines or newspapers?
Yes, we had the National Digital Newspaper Program here, because, again, no one wants to create some dusty vat of stuff that no one wants to look at. Newspaper content is another highly requested body of material. So, yes, we're definitely thinking about that.