UNIVAC, one of the first commercially produced general-purpose digital computers, was put into production in 1951. By 1952 journalists at CBS were already using it to predict the outcome of the presidential election based on early vote returns. They even featured it live on air.
Almost seventy years later, journalists are still at it.
In the past decade the early successes of Nate Silver and his site FiveThirtyEight have helped solidify data-driven election prediction as an important—even expected—product from major newsrooms. We’ve also seen predictive data journalism applied to a growing array of other topics, from sports to culture and business. The New York Times recently published an interactive model exploring how much worse the novel coronavirus could get in the absence of swift action.
And yet a recent study found that only about 5 percent of data journalism projects have some kind of projective outlook—an eye on the future. As predictive journalism expands beyond its roots in elections into a variety of other social domains, what should journalists be thinking about to use it ethically in their practice?
Predicting aspects of social life is quite a different challenge from, say, forecasting the weather. Weather predictions don’t have to take into account the idiosyncratic behavior of individuals, who make their own choices but are also subject to influence. This creates a new ethical dilemma for journalists, who must reckon with how their news organizations’ behavior—publishing or framing information in particular ways—might influence how predicted events unfold.
Consider the potential impact of election predictions in 2016, many of which had Hillary Clinton as the clear favorite. It’s possible that individual Hillary supporters saw those predictions and thought something like, “She’s got this one in the bag. I’m busy on Tuesday, and my vote won’t be decisive anyway, so I don’t need to vote.” According to one study, election predictions may indeed depress voter turnout, depending on how those predictions are presented to people.
The important point here is that the act of publication may create a feedback loop that dampens (or amplifies) the likelihood of something actually happening. News organizations that publish predictions need to be aware of their own role in influencing the outcome they are predicting.
Of course, publishing a prediction about a sporting event may influence behavior (e.g., betting) that affects individuals but doesn’t rise to the level of affecting society, while an election or public health prediction could be far more influential.
News organizations should be thinking carefully about how they expect predictions to be used by readers. How might a published prediction change individual behavior in the future? What individual decisions might it affect, and what are the implications of a mistaken prediction that misleads someone? Journalists need to think through the social dynamics of their projections.
Let’s look at a recent, timely example through this lens: the Times’ piece on “How Much Worse the Coronavirus Could Get, in Charts.”
Those charts depict the projected peak number of infections, the peak number of ICU cases, and the total number of deaths as a function of when and how severely interventions are taken. Interactivity allows the user to explore how an earlier, later, milder, or more aggressive intervention strategy might change those outcomes.
The article clearly articulates its goals, quoting epidemiologist Ashleigh Tuite, who helped develop the model. “The point of a model like this is not to try to predict the future but to help people understand why we may need to change our behaviors or restrict our movements, and also to give people a sense of the sort of effect these changes can have,” Tuite says. Here, the model’s predictions are explicitly about changing behavior—helping readers (both citizens and policymakers) to see the positive implications of acting immediately and aggressively to “flatten the curve.”
The article hedges about the uncertainty in the model, suggesting that warmer spring weather could affect outcomes in unknown ways. Communicating the uncertainty of a prediction, while challenging, can help soften the ostensible authority of a mathematical model. Conveying uncertainty can take various forms, including textual cues of contingency, confidence bounds on charts, explanations of probabilities, and articulation of multiple potential outcomes. Journalists need to develop more ways to do this well.
The Times article could go further in examining the implications of the model's assumptions and in considering competing interests (e.g., individual liberty and freedom of movement, the health of the economy) at varying levels of intervention. And while there is a fair degree of transparency in terms of how the model was parameterized (e.g., a 1 percent case fatality rate was assumed), the authors could provide another layer of higher-fidelity transparency for the true wonks. Layering transparency information as a pyramid, from a brief summary at the top to full technical detail at the base, can serve readers with different levels of interest.
Although the accuracy of the model is only truly knowable in retrospect, making the nuts and bolts of its process visible can at least help readers put predictions in perspective. If a model is built on a set of flimsy assumptions, readers can be appropriately skeptical of what it tells them.
One organization practicing exceptional transparency in predictive modeling for the 2020 elections is the Washington Post. In addition to blog posts detailing its election modeling, the Post publishes in-depth academic papers, and even the code for some models. As data scientist Lenny Bronner writes, “It’s important to explain what our models can and cannot do.” It’s not that everyone will look at all that information, but that it’s available for inspection to the few who really want to kick the tires. (Disclosure: I spent the fall of 2019 on sabbatical at the Post.)
Notably, the Times’ covid-19 model sits in the opinion section, as do models from the Post related to predicting the Democratic primary. Statistical models, and their predictions, are interpretations of data that contain a variety of subjective decisions. At the same time, this isn’t really the same kind of individual subjectivity you typically find in an opinion piece. Careful modeling is closer to a form of analysis—a well-grounded interpretation based on evidence and data. Additional transparency can expose the subjectivity in that interpretation, as can end-user interactivity with some of the subjective modeling decisions or parameters.
As predictions grow into and beyond their journalistic roots in elections, transparency, uncertainty communication, and careful consideration of the social dynamics of predictive information will be essential to their ethical use. We should expect the experiences of data journalists to coalesce into a set of ethical expectations and norms. We’re not there yet, but perhaps one day there will even be a style guide for predictive journalism.
Nicholas Diakopoulos is an assistant professor at the Northwestern University School of Communication, the author of the book Automating the News: How Algorithms Are Rewriting the Media, and a regular contributor to CJR.