In February, ProPublica launched the “Data Store,” a site from which journalists outside the organization can purchase data sets put together by the ProPublica staff, often after months of painstaking work.
The store has had modest offerings in its first few weeks, selling a handful of cleaned-up data sets for $200 to $1,000 each to journalists. At the moment, the data sets are centered on the organization’s investigations into healthcare, but there are plans to grow the store further. The organization is planning to make available nearly every data set used in its reporting, according to Scott Klein, ProPublica’s senior editor for news applications.
News outlets have, to varying degrees, made data sets available to the public, but ProPublica’s Data Store is the most ambitious acknowledgement that data can in itself be a valuable product of journalism. By offering data alongside traditional stories and interactive features, the nonprofit is not only testing a new revenue stream but also expanding the impact of information that could only have been assembled by journalists.
“The data sets we’ve typically kept private - highly crunched, cleaned and analyzed data sets and the data that powers our high-impact online databases - are now available at a small cost,” Klein said via email.
Aside from those cases in which the organization agreed not to release the raw data in order to get it, ProPublica is planning to release “all of the data we use in our stories and interactive databases, either free or for a small fee,” Klein said. While it’s still early in the experiment, Klein said there has been a lot of interest.
Klein acknowledges that the sale of data sets is very unlikely to offset the entire cost of acquiring and putting the data together. But it gives ProPublica a chance to test the market with a new product in a way that could help defray some costs.
Data that is unaccompanied by packaged narrative has always held allure in online journalism, largely because the data has usually been simple enough for readers to find their own stories in it. Between 2009 and 2013, The Guardian posted hundreds of data sets on its Datablog, covering everything from where British citizens are most likely to get arrested abroad to every Doctor Who villain since 1963. The Washington Post has maintained a similar landing page for its data sets. Earlier this month, the New York Times shared its database of more than 800,000 healthcare providers and the amount of money Medicare reimbursed them in 2012.
In those examples, organizations have essentially curated data that audiences could spin into their own stories. But larger organizations have long been providing raw data, most notably in realms such as business and politics.
“If you look at newsrooms like the AP, Bloomberg, and Reuters, you’ll see that at their core are data products, some of which are very profitable indeed,” Klein said. “There’s no question that selling data is a rich opportunity for many newsrooms.”
Revenue potential aside, the store also magnifies the effect of ProPublica’s legwork. Klein pointed to the fact that his organization’s “Dollars for Docs” data was used in work by more than 175 other newsrooms. By spreading its data sets through the Data Store, ProPublica has essentially expanded the network of journalists who can make the kind of impact that is at the center of the organization’s mission.
It “means that if another newsroom does great journalism using our material, it’s a great day in the office,” Klein said. “In a traditional model that would be called ‘getting scooped by our own story,’ but that’s a huge success for us.”