Covering the databases

Data journalism and information visualization form a burgeoning field. Every week, Between the Spreadsheets will analyze, interrogate, and explore emerging work in this area. Between the Spreadsheets is brought to you by CJR and Columbia’s Tow Center for Digital Journalism.

Remember that kid in second grade who wouldn’t let you play with his toy truck? Chances are he grew up to be a journalist. Sharing doesn’t come naturally to journalists; scooping stories and securing sources require a degree of secrecy. But when it comes to data, journalists and media organizations need to make a concerted effort to support open data policies. And that means they have to share.

Open data is information that’s available in an accessible format and free from license restrictions, so it can be easily reused. In practice, this means online access. The UK government is moving towards passing laws that require public bodies to release more of their data in this way. An open data white paper, currently making its way through Parliament, sets out proposals to increase government transparency by requiring agencies to release their data. In the US, Data.gov, an online resource of federal government data, is the result of President Barack Obama’s Open Government Initiative.

Journalists who want to use this governmental data will have to do some finessing. First, they need to get hold of it. In the absence of open data, this means working through bureaucratic channels, occasionally paying fees and writing requests in order to get their hands on the spreadsheets they’re after.

The Texas Tribune uses public data to create features such as its searchable database of government employee salaries. The database looks straightforward, but it is the result of a lot of work by Ryan Murphy, who works on the majority of the data stories for the nonprofit website. Murphy told CJR that he submitted various requests to state agencies under the Texas Public Information Act, waited for their responses, had to pay a few of them an administrative fee, and only then scrutinized the data he finally received. His job would be a lot easier if he could simply download an Excel file from any given state agency.

The Tribune used this data for several subsequent stories on public employee salaries, which might not have been possible without the searchable database of all state government employees’ salaries that Murphy built. Murphy spent most of his time “cleaning” the data. Because there is no standardized format for the way these agencies hold their information, he had to create one and fit all the data into it. The fields included employee names, salaries, and job titles.
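For readers curious what that kind of cleaning involves, here is a minimal sketch in Python. It is not the Tribune's actual process. The column labels, the agency names, and the helper functions are all hypothetical, chosen only to illustrate the idea of mapping differently labeled spreadsheets onto one common set of fields (name, salary, job title).

```python
import csv
import io

# Hypothetical labels: each agency may name the same field differently.
COLUMN_ALIASES = {
    "name": {"name", "employee name", "employee"},
    "salary": {"salary", "annual salary", "annual pay"},
    "job_title": {"job title", "title", "position"},
}


def standardize_header(header):
    """Map one raw column header to a common field name, or None if unknown."""
    key = header.strip().lower()
    for field, aliases in COLUMN_ALIASES.items():
        if key in aliases:
            return field
    return None


def parse_salary(raw):
    """Turn a string such as '$52,000.50' into a float."""
    return float(raw.replace("$", "").replace(",", "").strip())


def clean_rows(csv_text, agency):
    """Return one agency's CSV rows as dicts in the common format."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = {"agency": agency}
        for header, value in row.items():
            # Skip malformed cells (extra or missing columns in a row).
            if header is None or value is None:
                continue
            field = standardize_header(header)
            if field is not None:
                record[field] = value.strip()
        record["salary"] = parse_salary(record["salary"])
        cleaned.append(record)
    return cleaned


# Two made-up agency files with different layouts and salary formats.
agency_a = 'Employee Name,Annual Salary,Title\nJane Doe,"$52,000.50",Analyst\n'
agency_b = "NAME, POSITION ,SALARY\nJohn Roe,Clerk,41000\n"

combined = clean_rows(agency_a, "Agency A") + clean_rows(agency_b, "Agency B")
```

Once every file has passed through the same function, the combined rows share one layout and can be searched or sorted together. Real agency files would need many more aliases, and a row with no recognizable salary column would raise an error here rather than being silently dropped.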

But the Tribune didn’t just build the database; it also made all the numbers that went into it available to download, as it does with all the data it obtains. The Texas Tribune does this because it believes this information should be in the public domain, Murphy said. All anyone who wants to use the Tribune’s data has to do is credit the site in accordance with its Creative Commons license.

One reason The Texas Tribune can be so open with its data is that it’s a nonprofit. Unlike its commercial counterparts, its success doesn’t depend on page views and ad sales, so it doesn’t need to rely on the exclusivity of its data. The Tribune asks users to credit its data because it also wants to keep track of what happens to it: who is using it and what they are doing with it. This helps create a community of collaborators, allowing journalists to pool resources, learn new techniques and technologies, and disseminate their work.

The Texas Tribune isn’t the only publication that supports open data. ProPublica (another nonprofit), The Guardian, and The New York Times (both commercial entities, which makes their generosity with data particularly notable) all make large data sets available, including federal and global-level figures. The more transparent the data is, the more it invites the public to scrutinize it.

Data sharing is good news for journalism. Advocates for open data hinge their argument on the democratic need for transparency. If journalists are able to get their hands on data quickly and easily, they can work with it and reveal the stories behind the numbers to the public. By publishing the spreadsheets in which they found those stories and allowing others to use that data, they’re also acting as platforms for hosting data. They’re walking the walk, supporting what they themselves are asking for: easily accessible data.

Anna Codrea-Rado is a digital media associate at the Tow Center for Digital Journalism at the Columbia University Graduate School of Journalism. Follow her on Twitter @annacod.
