You may not realize it, but you’ve probably already used machine learning technology in your journalism. Perhaps you used a service like Trint to transcribe your interviews, punched in some text for Google to translate, or converted the Mueller Report into machine-readable text. And if you haven’t used it yourself, machine learning is probably at work in the bowels of your news organization: tagging text or photos so they can be found more easily, recommending articles on the company website or social media to optimize their reach or stickiness, or trying to predict whom to target for subscription discounts.
Machine learning has already infiltrated some of the most prosaic tasks in journalism, speeding up some stories and making possible others that might otherwise have been too onerous to report. We’re already living the machine-learning future. But, particularly on the editorial side, we’ve only begun to scratch the surface.
To be clear: I’m not here to sell you on a fabulous new technology. Sorry, machine learning is probably not going to save the news industry from its financial woes. But journalists still have a lot to gain from it. What else can machine learning do for the newsroom? How can journalists use it to enhance their editorial work in new ways? And what should they be wary of as they take up these powerful new tools?
The phrase “machine learning” describes a family of carefully engineered tools. Trint, for example, can transcribe an audio clip because its algorithm has learned how patterns of sound correspond to patterns of letters and words. Such algorithms are trained on many hours of manually transcribed audio; once trained, they can transcribe new audio they have never encountered.
More formally, machine learning refers to the use of algorithms that learn patterns from data and can then perform tasks (like transcription) without being explicitly programmed to do so. Machine learning approaches, and the specific algorithms that implement them, come in several flavors, each suited to different uses. These approaches are often distinguished by the amount and type of human feedback they require:
- In supervised learning, a dataset of carefully annotated examples is provided for the algorithm to study. Journalists might tag documents as “interesting” or “uninteresting” to an investigation, and once trained, the algorithm can classify new documents into those categories. This has proven valuable to investigative journalists who want to filter large volumes of documents or data based on known patterns of interest.
- Another variant, weakly supervised learning, also provides the algorithm with examples, but rather than annotating each item individually, humans specify filtering rules that define datasets that are large but “noisy” (containing a lot of useless information alongside the useful pieces). The International Consortium of Investigative Journalists is working with a machine learning group at Stanford to explore how this might apply in journalistic scenarios.
- Unsupervised learning approaches, on the other hand, don’t require annotation. Instead, they let the algorithm find patterns in data, such as groups of entries that share a characteristic, and are typically used to cluster or link similar records. The Associated Press used one such technique in a story analyzing unintentional child shootings, to find cases whose noisy data clustered together with data from clearer-cut cases. At The New York Times, such techniques help campaign finance reporters link multiple donation records to the same donor, as detailed in my forthcoming book, Automating the News: How Algorithms Are Rewriting the Media.
Unsupervised approaches to grouping or clustering can sometimes be made more efficient by giving the machine-learning system targeted feedback. For instance, Dedupe, a tool for grouping and linking noisy records, was used by investigative journalists at the Minneapolis Star Tribune for the paper’s “Shielded by the Badge” series. Dedupe uses an approach called active learning: as the system tries to cluster items together, it asks a human trainer for feedback on the items it’s least confident about. This gets the most improvement out of each piece of human feedback over time.
- Reinforcement learning is yet another type of machine learning. Like unsupervised learning, it doesn’t need labeled data, but it does rely on feedback to the algorithm over time. Headline testing uses this method: a click on a headline provides positive reinforcement, from which the algorithm learns which version of the headline to show the next user.
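The supervised workflow described in the first bullet can be sketched in a few lines. What follows is a minimal, hypothetical example using a multinomial naive Bayes classifier written with only the Python standard library; the documents and the “interesting”/“uninteresting” labels are invented for illustration, and a real investigation would use far more training examples and a more careful tokenizer.

```python
import math
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Deliberately crude: lowercase and split on whitespace.
    return text.lower().split()

def train_naive_bayes(docs: list[str], labels: list[str]):
    """Count words per label from hand-annotated example documents."""
    label_counts = Counter(labels)
    word_counts = {label: Counter() for label in label_counts}
    for doc, label in zip(docs, labels):
        word_counts[label].update(tokenize(doc))
    vocab = {word for counts in word_counts.values() for word in counts}
    return label_counts, word_counts, vocab

def classify(model, doc: str) -> str:
    """Return the label with the highest naive Bayes log-probability."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(doc):
            if word in vocab:  # skip words never seen in training
                # Add-one (Laplace) smoothing so an unseen word/label pair can't zero out the score.
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical annotated training set.
docs = [
    "wire transfer to offshore account",
    "offshore shell company payment",
    "lunch menu for friday",
    "office party schedule friday",
]
labels = ["interesting", "interesting", "uninteresting", "uninteresting"]
model = train_naive_bayes(docs, labels)
print(classify(model, "payment to offshore shell account"))  # interesting
```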
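For weak supervision, here is a rough sketch that assumes the rules are simple keyword tests. Each hypothetical rule either votes for a label or abstains, and a majority vote turns the votes into noisy training labels. Majority voting is only the simplest way to combine rules; dedicated weak-supervision systems instead estimate how much to trust each rule. The rule names and documents are invented.

```python
from collections import Counter

ABSTAIN = None

# Hypothetical labeling rules a reporter might write; each returns a label or ABSTAIN.
def rule_offshore(doc: str):
    return "relevant" if "offshore" in doc.lower() else ABSTAIN

def rule_invoice(doc: str):
    return "relevant" if "invoice" in doc.lower() else ABSTAIN

def rule_newsletter(doc: str):
    return "irrelevant" if "newsletter" in doc.lower() else ABSTAIN

RULES = [rule_offshore, rule_invoice, rule_newsletter]

def weak_label(doc: str):
    """Combine rule outputs by majority vote; abstain when no rule fires or votes tie."""
    votes = [v for v in (rule(doc) for rule in RULES) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]

corpus = [
    "Invoice for offshore consulting",
    "Monthly staff newsletter",
    "Offshore newsletter roundup",
    "Meeting notes",
]
# Keep only documents the rules could label; these become (noisy) training data.
training_set = [(doc, weak_label(doc)) for doc in corpus if weak_label(doc) is not ABSTAIN]
print(training_set)
```

The conflicting and unlabeled documents simply drop out, which is the trade-off weak supervision makes: many cheap, imperfect labels instead of a few careful ones.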
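Grouping records without any annotation can be illustrated with a bare-bones sketch: compare every pair of (hypothetical) donor names with the character-level similarity from Python’s `difflib`, and merge pairs that clear a threshold using a union-find structure. This is not the method used at the Times or the AP. The 0.85 threshold is an arbitrary assumption, and real record-linkage tools compare many fields, not just a name string.

```python
from difflib import SequenceMatcher
from itertools import combinations

def normalize(name: str) -> str:
    # Case-fold and collapse whitespace so trivial formatting differences don't matter.
    return " ".join(name.lower().split())

def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] from difflib."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def cluster_records(records: list[str], threshold: float = 0.85) -> list[list[str]]:
    """Group records whose pairwise similarity clears `threshold` (union-find)."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # Compare every pair; production tools typically add "blocking" to avoid this O(n^2) step.
    for i, j in combinations(range(len(records)), 2):
        if similarity(records[i], records[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters: dict[int, list[str]] = {}
    for i, record in enumerate(records):
        clusters.setdefault(find(i), []).append(record)
    return list(clusters.values())

# Hypothetical donation records.
donations = [
    "Jane Q. Smith, Springfield",
    "Jane Q Smith, Springfield",
    "JANE Q. SMITH,  SPRINGFIELD",
    "Robert Jones, Shelbyville",
]
clusters = cluster_records(donations)
print(clusters)
```

Because matches are merged transitively, a chain of borderline matches can pull two quite different records into one cluster, which is one reason targeted human feedback helps.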
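The core idea behind active learning, asking the human only about the cases the model is least sure of, is often implemented as uncertainty sampling. Here is a minimal sketch that assumes some model has already assigned a match probability to each candidate pair. The scores, names, and answers are hypothetical, and this is not Dedupe’s code or API. The pairs closest to 0.5 go to the trainer first.

```python
# Hypothetical match probabilities that some record-linkage model assigned to candidate pairs.
scores = {
    ("Jane Q. Smith", "Jane Smith"): 0.55,
    ("Jane Q. Smith", "Jane Q Smith"): 0.98,
    ("Robert Jones", "Bob Jones"): 0.48,
    ("Robert Jones", "Jane Smith"): 0.03,
}

def select_queries(scores: dict, labeled: dict, k: int = 2) -> list:
    """Uncertainty sampling: return the k unlabeled pairs whose score is closest to 0.5."""
    unlabeled = [pair for pair in scores if pair not in labeled]
    return sorted(unlabeled, key=lambda pair: abs(scores[pair] - 0.5))[:k]

labeled: dict = {}
first_round = select_queries(scores, labeled)

# Hypothetical answers from a human trainer to "same person?" for the queried pairs.
human_answers = {
    ("Robert Jones", "Bob Jones"): True,
    ("Jane Q. Smith", "Jane Smith"): True,
}
for pair in first_round:
    labeled[pair] = human_answers[pair]

second_round = select_queries(scores, labeled)
print(first_round)
print(second_round)
```

In a real loop the model would be retrained on `labeled` after each round, so the scores, and therefore the next questions, would change; here the scores are held fixed to keep the sketch short.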
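Headline testing is often framed as a multi-armed bandit problem, a simplified relative of full reinforcement learning. One of the simplest bandit strategies is epsilon-greedy: show the headline with the best click-through rate so far most of the time, and a random one with small probability epsilon. The sketch below assumes that strategy and invented click-through rates; the headline-testing tools newsrooms actually use may rely on different algorithms.

```python
import random

class EpsilonGreedyHeadlines:
    """Epsilon-greedy bandit: usually show the best-performing headline, sometimes explore."""

    def __init__(self, headlines, epsilon=0.1, seed=None):
        self.headlines = list(headlines)
        self.epsilon = epsilon
        self.shows = [0] * len(self.headlines)
        self.clicks = [0] * len(self.headlines)
        self.rng = random.Random(seed)

    def choose(self) -> int:
        """Pick the index of the headline to show the next reader."""
        untried = [i for i, n in enumerate(self.shows) if n == 0]
        if untried:
            return untried[0]  # show each headline at least once
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.headlines))  # explore
        # Exploit: the headline with the best observed click-through rate so far.
        return max(range(len(self.headlines)), key=lambda i: self.clicks[i] / self.shows[i])

    def record(self, i: int, clicked: bool) -> None:
        """Feed back the reader's response; a click is the positive reinforcement."""
        self.shows[i] += 1
        self.clicks[i] += int(clicked)

# Simulated readers with hypothetical true click-through rates.
true_ctr = [0.02, 0.05]
bandit = EpsilonGreedyHeadlines(["Headline A", "Headline B"], epsilon=0.1, seed=42)
reader_rng = random.Random(7)
for _ in range(5000):
    i = bandit.choose()
    bandit.record(i, reader_rng.random() < true_ctr[i])
print(bandit.shows)  # impressions per headline after the simulated run
```

A small epsilon keeps a trickle of readers seeing the apparently weaker headline, so the test can still notice if early results were a fluke.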
Practitioners should still be aware of a few problems as they consider how to use these techniques.
First: bias. Duke University’s Tech & Check Collaborative uses the ClaimBuster system to monitor text. ClaimBuster relies on a machine-learned model to identify so-called check-worthy factual claims, which are then sent to fact-checkers. The model was trained on 20,000 hand-labeled sentences from past US presidential debates.
Academic researchers evaluated ClaimBuster’s analysis of 21 transcripts from the 2016 US presidential debates, comparing the topics of the factual claims the algorithm identified with the topics of the claims selected by human fact-checkers at CNN and PolitiFact. Compared with the human fact-checkers, ClaimBuster picked up more claims about the economy and fewer about social issues. If human fact-checkers relied solely on ClaimBuster, its biases would steer them away from social-issue claims, an outcome that may not be desirable from a public-interest standpoint.
Another aspect of the evaluation showed that the system marked fewer of Donald Trump’s claims as “check-worthy” than Hillary Clinton’s. Trump’s rhetorical style may have made his statements less likely to be flagged by the algorithm: since ClaimBuster heavily weights the presence of numbers when selecting claims, Trump’s lack of specifics might partly explain his results.
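ClaimBuster’s actual model is not reproduced here. The toy scorer below, with hypothetical weights and keywords, only illustrates the mechanism suggested above: when the mere presence of a digit carries heavy weight, a vague version of a claim scores lower than a specific one, even though both may deserve checking.

```python
import re

# Hypothetical weights and keywords, chosen only to illustrate the mechanism.
NUMBER_WEIGHT = 0.6
KEYWORD_WEIGHT = 0.2
CLAIM_WORDS = {"percent", "jobs", "taxes", "million", "billion", "crime"}

def check_worthiness(sentence: str) -> float:
    """Toy score in [0, 1]: heavily rewards any digit, lightly rewards topic keywords."""
    has_number = re.search(r"\d", sentence) is not None
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    score = (NUMBER_WEIGHT if has_number else 0.0) + KEYWORD_WEIGHT * len(words & CLAIM_WORDS)
    return min(1.0, score)

specific = "We lost 70,000 factory jobs last year."
vague = "We lost a tremendous number of factory jobs."
print(check_worthiness(specific), check_worthiness(vague))
```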
Such machine-learned systems might systematically orient (or divert) attention toward some kinds of claims and away from others. Journalists using these tools ought to be aware of that possibility and prepare to fill the gaps as needed. Editors in particular should oversee, monitor, and set bounds on how coverage might be shaped by such systems.
There is also the problem of uncertainty. Machine-learned models that predict scores or classify documents into categories are rarely 100 percent certain of their outputs; they’re statistical in nature. That means journalists need to remain skeptical of them and verify their outputs rigorously.
Various validation methodologies can be used to assess a model’s overall quality, but any individual output could still be an anomaly. If a model effectively accuses a specific individual or organization of wrongdoing, and publishing that could have severe negative consequences for them, then caution is warranted before publishing the model’s outputs. Awareness of uncertainty is key. But if the output of a machine-learning system is used only within the newsroom, and a journalist always checks it before anything is published, then there’s less cause for concern. Journalists need to ask: What is the chance that a model’s prediction or categorization is wrong, and what’s at stake if it is?
Journalists who have used machine learning in their work acknowledge that these algorithms are imperfect. They can miss potentially newsworthy documents, meaning that an investigation of a trove of records might not be as comprehensive as it would be if someone manually checked every document. But sometimes a story doesn’t require an exhaustive accounting of cases, and finding most or even some newsworthy documents in a massive pile is all that’s needed for a solid journalistic contribution.
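How likely a model’s output is to be wrong depends not only on the model’s accuracy but also on how rare the thing it looks for is. Here is a small worked example using Bayes’ rule, with hypothetical rates: if wrongdoing appears in 1 percent of records, a model that catches 90 percent of it and correctly clears 95 percent of innocent records still produces flags that are wrong about 85 percent of the time.

```python
def chance_flag_is_wrong(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Share of flagged items that are false positives, i.e. 1 - precision (Bayes' rule)."""
    true_pos = prevalence * sensitivity               # wrongdoing correctly flagged
    false_pos = (1 - prevalence) * (1 - specificity)  # innocent records wrongly flagged
    return false_pos / (true_pos + false_pos)

# Hypothetical numbers: 1% of records involve wrongdoing; the model catches 90% of them
# and correctly clears 95% of the rest.
print(round(chance_flag_is_wrong(0.01, 0.90, 0.95), 3))  # 0.846
```

The same model applied to a pile where wrongdoing is common would be wrong far less often, which is why the base rate belongs in any discussion of what’s at stake.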
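Missed documents are what a model’s recall measures, while precision measures how many of its flags are actually relevant. Here is a minimal sketch, assuming reporters hand-check a sample of documents so they know which ones are truly relevant (the document IDs are hypothetical):

```python
def precision_recall(flagged: set, relevant: set) -> tuple[float, float]:
    """Precision: share of flagged docs that are relevant. Recall: share of relevant docs flagged."""
    true_pos = len(flagged & relevant)
    precision = true_pos / len(flagged) if flagged else 0.0
    recall = true_pos / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical hand-checked sample: documents 1-4 are newsworthy; the model flagged 1, 2, and 5.
p, r = precision_recall(flagged={1, 2, 5}, relevant={1, 2, 3, 4})
print(p, r)
```

Here the model flagged three documents, two of them relevant (precision 2/3), but found only two of the four relevant ones (recall 0.5). Whether that recall is good enough depends on whether the story needs an exhaustive count.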
If the output of a machine-learned model can be corroborated through other sources and still remain newsworthy, that’s ideal. At the Los Angeles Times, an investigation used machine learning to evaluate the quality of police data. The model’s main result, that the LAPD had been systematically under-reporting serious assaults in its crime statistics, was corroborated by the department, which had itself just concluded an internal audit of its data, according to the Times’s reporting. That triangulation of evidence helped boost the reporters’ confidence in their machine learning result.
Finally, because of the wide variety of machine-learning approaches available, part of the challenge for journalism is figuring out which techniques are appropriate (and useful) for particular journalistic tasks. One way to tackle this challenge would be to invite experts in machine learning to take up residence in newsrooms where they could determine which strains of machine learning could be most useful to the journalists there. Another possibility might be to invite editorial thinkers to do fellowships in computing environments. With more collaboration over time, we can flesh out where and when machine learning is most useful in journalism, and thus broaden the capacities of even the largest newsrooms to investigate the secrets hidden in the vastness of digital data.
In summary, I’m bullish on the capabilities and opportunities that machine learning presents to editorial work, but also cautious enough to remind readers that machine learning is not the answer to every journalistic task. The grand challenge moving forward is to experiment with when and where the different flavors of machine learning truly do bring new editorial value, and when, in fact, we may just want to rely on good ol’ human learning.