The story was already great, even before Daniel Gilbert opened his first spreadsheet. Thousands of citizens in the southern Virginia area Gilbert covered for the Bristol Herald Courier (daily circulation: 30,000) had leased their mineral rights to oil and gas companies in exchange for royalties. Twenty years later, they alleged, the companies had not paid, adding up to potentially millions of dollars owed. As Gilbert learned, the complaint was complicated. It involved esoteric oil and gas practices and regulations, a virtually unknown state oversight agency, the rules of escrow accounts—and finally, some very angry people and a handful of very big companies. With these facts alone, he could have written a stellar story giving voice to citizens’ complaints, and shining a light on a little-known regulatory agency. That, in many newsrooms, would have been plenty.
But Gilbert, who officially covered the courts for the paper, wasn’t satisfied simply to raise the specter of noncompliance. Whenever a well produced natural gas, the energy company was supposed to make a monthly payment into a corresponding escrow account. These payment schedules were public. So were the production records. All Gilbert had to do was match the production records with the payment schedules to see who had—and had not—been paid.
Easier said than done. Gilbert requested the information he needed and received spreadsheets with thousands of rows of information. In Excel, a typical computer monitor displays fewer than a hundred rows and about ten columns at a time. Gilbert's data was far too large to cram into this relatively modest window. So he started with one month's worth of information, using the program's "find" function to match wells and their corresponding accounts. One by one. Control-F, Control-F, Control-F. It was tedious and time-consuming. There was a story there, he was certain. But Control-F would not find it.
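What Gilbert was doing by hand is, in programming terms, a join: lining up two tables on a shared key and flagging the rows that fail to match. The sketch below shows the idea in Python with invented data. The column names (well_id, month, gas_produced_mcf, escrow_payment) and the sample values are assumptions for illustration, not the actual format of the Virginia records.

```python
import csv
from io import StringIO

# Hypothetical stand-ins for the two spreadsheets: monthly gas
# production per well, and monthly payments into escrow accounts.
production_csv = """well_id,month,gas_produced_mcf
VA-001,2009-01,1200
VA-002,2009-01,800
VA-003,2009-01,950
"""

payments_csv = """well_id,month,escrow_payment
VA-001,2009-01,350.00
VA-003,2009-01,275.00
"""

def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(StringIO(text)))

production = load_rows(production_csv)

# Index payments by (well, month) so every lookup is instant --
# the programmatic equivalent of one Ctrl-F per well, done all at once.
paid = {(row["well_id"], row["month"]) for row in load_rows(payments_csv)}

# Any producing well with no matching payment is a lead worth checking.
unpaid = [row for row in production
          if (row["well_id"], row["month"]) not in paid]

for row in unpaid:
    print(f"{row['well_id']}: produced {row['gas_produced_mcf']} mcf "
          f"in {row['month']}, no escrow payment on record")
```

With real files, the same logic scales from three rows to thousands without any extra effort, which is precisely the advantage over searching one well at a time.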
What would you do? Could you navigate, process, and make sense of thousands of rows of data? If you have not yet had to ask yourself this question, there is no time like the present.
Most journalists are just like Gilbert, with daily computer skills that include Internet searches, word processing, and maybe some basic calculations in Excel, none of which enables journalists to truly mine large collections of data. Meanwhile, the amount of raw data available to journalists has mushroomed. At the federal level, the Obama administration’s “open government” initiative has given rise to new sources like Data.gov, a website devoted to the aggregation and easy dissemination of national data sets. State and local governments have followed suit, making much of the data they collect available online. More elusive tranches of data have been pried loose by nonprofit organizations courtesy of the Freedom of Information Act; an inquisitive journalist can download them in minutes. “I’m constantly amazed and surprised about what’s out there,” said Thomas Hargrove, a national correspondent for Scripps-Howard News Service who often leads data-based research projects for the chain’s fourteen newspapers and nine television stations.
Against this backdrop, the ability to find, manipulate, and analyze data has become increasingly important, not only for teams of investigative journalists, but for beat reporters. It is hard to conceive of a beat that doesn’t generate data—even arts reporters evaluate budgets and have access to nonprofit organizations’ tax returns. What’s more, because the universe of data is vast and growing, and the stories that use it are rare, data-based journalism has become a powerful way to stand out in the crowded news cycle. “When you acquire a certain level of data skills and literacy, you can punch way above your weight,” said Derek Willis, a web developer at The New York Times and author of The Scoop, a blog about computer-assisted reporting. “Simply put, you can do things others can’t.”
And last but certainly not least, readers like data. They like charts and interactive graphics and searchable databases. At The Texas Tribune, which has published more than three dozen interactive databases and adds or updates about one a week, the data sets account for 75 percent of the site’s overall traffic.
