This week’s data journalism wins and fails

Darts to WSJ and Slate; laurels to Medium, NYT, and CIR

“Data journalism” is an increasingly visible storytelling form that puts numbers, statistics, and databases front and center. As with all journalism, there are good examples of data journalism that illuminate news stories or provide a new way of looking at things. And there are bad examples that use numbers ineffectively, or simply incorrectly. We’re going to explore these types of stories on a recurring basis, explaining what works, what doesn’t, and why.

The Center for Investigative Reporting gets a LAUREL for acquiring and distilling 14 years’ worth of US Coast Guard accident records in a piece published June 28. The news organization explained its approach: “Instead of relying on the graphs and charts that officials elected to include in those reports, CIR sought the raw data behind them.” Combined with narratives of some of the Coast Guard’s deadly incidents, the data helps show the enhanced role of the fifth branch of the military since the September 11 attacks and the inherent risks that come with its increased engagement. Without CIR’s dogged attempts to get the data, stories such as those about a Coast Guard helicopter flying into wires in 2010 could be dismissed as isolated incidents. Making the data public provides empirical evidence that Coast Guard injuries have increased over the past several years.

So many DARTS to news organizations that tried to explain the Supreme Court’s Hobby Lobby ruling through data. The Wall Street Journal’s Politics Counts blog found that the customers of the craft store, whose conservative Christian owners won the right to deny employees contraceptive coverage on religious grounds, are heavily Republican. While this may be true, the finding provides no insight into the court’s ruling or the company’s stance on birth control. Is the Journal implying that Republicans shop at Hobby Lobby because of the owners’ politics? This data analysis doesn’t prove that.

Slate posted a piece with the headline “How Many People Could the Hobby Lobby Ruling Affect?” The story notes that 90 percent of companies in the United States are “closely held” like Hobby Lobby, meaning they are owned by five or fewer people. That may be true, but the author’s conclusion has nothing to do with that stat. She notes, “it’s extremely unlikely that all of those companies are about to claim a religious exemption from providing coverage of contraception.” The question in the headline, which implies the story will conclude with statistical analysis, remains unanswered. A CNN Money piece addressing essentially the same question handles it better by acknowledging that no one knows how broadly the ruling will apply and that it’s too soon to put a number on the impact.

The New York Times’ data blog The Upshot gets a LAUREL for explaining an algorithm that predicts the success of a tweet. The algorithm, the center of an April 2014 paper published by three computer scientists, is based on the analysis of 11,000 pairs of tweets. It isn’t all that accurate, predicting which message will get retweeted more than the other only 67 percent of the time. That’s only slightly better than the average rate of a human guessing. An accompanying interactive guessing game, “Can You Tell What Makes a Good Tweet?” helps show readers that not everything can, in fact, be predicted through data analysis. While the package itself didn’t analyze any data, it explored data as a compelling topic in an extremely engaging way.

A LAUREL to Medium’s “i ❤ data” blog for explaining that, statistically speaking, 29 tends to be the best-performing length for listicles. Running statistical tests on three months of BuzzFeed lists, data scientist Gilad Lotan explains that the number 29 consistently outperforms lists of other lengths with regard to audience score. (Other odd numbers do well too.) “Numerous folks are claiming that odd-length listicles, especially on BuzzFeed, are the preferred length-du-jour,” Lotan writes. And after explaining some complex statistical analysis, he confirms such an approach is effective. Impressively, Lotan ends his post not by offering editorial recommendations to list-based journalists. Instead, he starts a debate about the merits of using certain tactics, such as controlling list length, to drive clicks. The number 29 might be effective, but there should be some thought before instituting it as a pillar of editorial policy, he says.


Tanveer Ali is a Chicago-based journalist who is Chicago's data reporter and social media producer. He has reported for the Chicago News Cooperative, WBEZ, and GOOD Magazine, among others. A former staff writer at the Detroit News, he received a master's in journalism from the Medill School of Journalism.