Just as computer scientists figured out search algorithms that sorted information, taking away a part of journalism’s role, others are now writing algorithms that assemble data into stories. The most high-profile exponent of this practice is Narrative Science, a collaboration between computer scientists and the Medill Journalism School at Northwestern University. Kristian Hammond, the data scientist leading the company, envisions a world in which everything from your cholesterol level to the state of your garbage bin creates continual streams of information that can be reassembled in story form. Narrative Science uses algorithms to produce basic, bread-and-butter stories that don’t require much flair in the writing—high-school-sports reports, local-government-meeting recaps, company financial results. Since these sorts of stories can be produced using unprecedented levels of automation, they offer a realistic chance of cutting newsroom costs. And although Hammond has a vested interest in predicting that vast amounts of data will be turned into personal, local, national, and international stories, his vision is also a logical extension of current trends.
Javaun Moradi, a digital strategist and product developer for NPR, is one of a new breed of digital journalists who are working to weave the use of algorithms and new kinds of data into the arsenal of skills in the newsroom. In particular, he sees sensor networks—low-cost devices that civic-interest groups use to monitor things like air quality—as a potential data source. “It’s coming at us whether we like it or not,” he says. “A lot of inexpensive devices will start sending us a great deal more information.” Moradi can easily imagine journalists building and maintaining their own networks of information. “Up until now,” he notes, “journalists have had really very little data, and mostly other people’s data, acquired from elsewhere.” At the same time, Moradi points out, there are bound to be new dilemmas and challenges around the ownership and control of information.
Alex Howard, who writes about data journalism, government, and the open-data movement for O’Reilly Media, also flags the ownership and control of data as a key issue. “For lots of types of data—finance, for instance—there are laws that say who can obtain it and who can use it,” Howard notes. “But new kinds of information don’t necessarily have legal and regulatory frameworks.” How newsrooms obtain and handle information—what their standards and practices are—is likely to become an important part of differentiating news brands.
Journalism by numbers does not mean ceding human process to the bots. Every algorithm, however it is written, contains human, and therefore editorial, judgments. The decisions made about what data to include and exclude add a layer of perspective to the information provided. There must be transparency and a set of editorial standards underpinning the data collection.
The truth is, those streams of numbers are going to be as big a transformation for journalism as the rise of the social Web. Newsrooms will rise and fall on the documentation of real-time information and the ability to gather and share it. Yet while social media demands skills of conversation and dissemination familiar to most journalists, the ability to work with data is a much less central skill in most newsrooms, and still completely absent in many. Automation of stories and ownership of newly collected data could both reduce production costs and create new revenue sources, so both ought to be at the heart of exploration and experimentation for newsrooms. But news executives have missed the cues before. The industry shot itself in the foot 15 years ago by failing to recognize that search and information filtering would be a core challenge and opportunity for journalism; this time, there is an awareness that data will be similarly significant, but once again the major innovations appear destined to come from outside the field.