Print stories can be lost, but digital stories last forever, captured for eternity in some nebulous internet ether or on a hard drive in a desk drawer. At least, that’s the vague theory assumed by many producers and consumers of digital news. Once something is posted or backed up, it never really disappears—and if that’s true, archiving digital work seems less urgent. That line of thinking is exactly why so many news organizations risk losing years’ worth of stories.
As we move deeper into the digital era, we've recognized the need to preserve and digitize print content, but we're still in the early stages of understanding how to safely archive our digital news. A survey released last week by The Missouri School of Journalism's Donald W. Reynolds Institute shows how much outlets lose when they fail to archive their work effectively, and many of them do fail.
Among the 476 digital and hybrid news organizations that participated in the survey, 27 percent of hybrid news organizations and 17 percent of online-only enterprises said they've experienced a significant loss of news content due to technical failure. To Edward McCain, the digital curator of journalism at the institute, these numbers confirm a basic but largely overlooked fact about digital media: Digital content is fragile and easily lost.
Take The Columbia Missourian, the newspaper run out of The Missouri School of Journalism. In 2002, a single server crash cost it 15 years' worth of stories and seven years' worth of images. Although a backup did exist, the system holding the material had become obsolete, rendering the information irretrievable. It was this experience that inspired the school to invest in digital preservation, and it has since launched several projects through its Journalism Digital News Archive initiative, including last week's survey.
The Missourian crash is “a textbook example of what can and does happen,” says McCain. When the preservation of digital news is under-prioritized, what’s at risk is not only an individual journalist’s work, or the news enterprise’s backstory and legacy, but also our cultural heritage, he adds.
“There’s a growing level of awareness, but overall, from talking to people and from the survey, I think we have a very long way to go,” McCain explains. “I don’t think there’s a significant number of people who understand the difference between a backup and a preservation system.”
The difference is this: A backup system is a short-term strategy for retrieving information within days, weeks, or at most a few years. It can't be trusted for longer than that because the technology can become obsolete and storage devices can degrade or be damaged. Far too many news organizations rely on single backups for storage of their digital content, says McCain: "We're still kind of thinking that [storage devices] are like paper, that you can put a hard drive on a shelf and come back in one year and find it in the same condition."
For the individual journalist, a good backup system is probably the best he or she can realistically do to protect a portfolio, but news institutions with a back catalog of thousands of articles, photos, and videos can take far more extensive measures.
An archive, then, is a long-term preservation system that ensures both the survival of content and easy access to it through descriptions and cataloging. The data is monitored for sudden changes that might signal a loss of content, and it is reformatted when it migrates to new, updated systems. The information is organized and tagged with search terms so old articles can easily be found and used for reference, or even republished, a strategy several media organizations already use.
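To make the monitoring step more concrete: archivists often call it fixity checking, and one common approach is to record a cryptographic checksum for every file when it enters the archive, then periodically recompute the checksums and compare. The Python sketch below is a minimal illustration under that assumption only. The function names (`sha256_of`, `build_manifest`, `check_fixity`) are hypothetical and not taken from any particular archive product, and a real preservation system would do much more, such as storing the manifest separately, logging every check, and handling format migration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 64 KB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, keyed by relative path.

    Hypothetical helper: in practice the manifest would be saved somewhere
    other than the archive it describes.
    """
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files currently under root against a stored manifest.

    Returns relative paths that have gone missing, whose contents changed,
    or that were added since the manifest was built.
    """
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "changed": sorted(
            name
            for name in manifest.keys() & current.keys()
            if manifest[name] != current[name]
        ),
        "added": sorted(set(current) - set(manifest)),
    }
```

In this sketch, a newsroom would build the manifest when stories are first ingested and rerun `check_fixity` on a schedule; any "missing" or "changed" entry is the kind of sudden change the monitoring is meant to catch, ideally while a good copy still exists elsewhere.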
In short, an archive is a comprehensive system that needs to be developed and monitored by a professional, which means it isn't cheap. That's exactly why digital preservation isn't a priority for most outlets, and why some are even getting rid of archives that are too expensive to maintain.
“The economics just aren’t there,” says Victoria McCargar, an archivist, lecturer, and consultant on digital management with a background in journalism. She explains that some news outlets will drop old material rather than spend the money to incorporate it into an archive.
US News, for example, deleted its pre-2007 archives of digitized and native digital content in February, leaving copies of those stories with the database services LexisNexis and EBSCO.
While the cost of building an archive deters many news outlets, those same outlets miss the chance to earn revenue from their archives, whether through paywalls or by reusing and repurposing old content.
Smaller and online-only news organizations lag especially far behind because of the cost and effort involved, says McCargar, while some of the bigger news organizations do have real archives in place.