A few weeks ago BuzzFeed published a video of former President Barack Obama making some decidedly out-of-character quips. What’s most noteworthy about the video, however, isn’t that Obama made the remarks, but rather, that he didn’t. Instead, a voice recording of actor Jordan Peele impersonating Obama was fed into a media synthesis algorithm that generated the video. The result appears astoundingly authentic.
Nothing online is quite as it appears, now less than ever. Thanks to a new breed of neural network machine-learning algorithms, compelling yet fictitious video, images, voice, and text can be synthesized whole cloth. Photos of imaginary faces can be realistically fabricated by computers—their emotions, skin, age, and gender dialed in by a knob on a machine. Style transfer can change the environmental context of an image, portraying winter as summer, or what was sunny as rainy. Videos of politicians can be produced as you might control a puppet. And faces can be swapped from one body to another, creating what are popularly known as “deepfakes,” opening up an array of threats to reputation, security, and privacy. Harrowing stuff.
But in a way, this technological leap could actually be good news for journalists—and might also provide an opportunity for the kind of goodwill gesture that tech platforms ought to extend to a suspicious public.
Sure, photos have been manipulated basically since photographic technology was invented. And the media itself is a simulacrum of reality, in which each selection, edit, highlight, or turn of phrase shapes the audience’s interpretation of events. What’s new here is that media-synthesis algorithms further fracture any expectation of authenticity for recorded media while enabling a whole new scale, pervasiveness, potential for personalization, and ease of use for everyone from comedians to spies. Faked videos could upset and alter people’s formation of accurate memories around events. And visual evidence may largely lose its teeth as strategic misinformers invoke the specter of the technology to cast doubt on even authentic footage.
So what happens when the public can no longer trust any media they encounter online? How can a society have an informed understanding of world events when media can so easily be polluted by algorithmic media synthesis?
Dire as the case may be, it could offer a great comeback opportunity for mainstream media. As the public learns that it can no longer trust what it sees online, few intermediaries are better placed to function as trusted validators and assessors of mediated reality than professionally trained journalists with access to advanced forensics tools. To capture this opportunity, journalists and news organizations should pursue strategies like forensics training, technical tool development, and process standardization and transparency.
News organizations and educational institutions need to ramp up training in media forensics techniques. There are telltale signs of altered and synthesized media that an expert eye can pick out—Hany Farid’s book on photo forensics catalogs many such techniques, for instance. Statistical analysis of pixel colors, intensities, and their regularities may indicate editing or splicing of images; reflections and vanishing points can expose geometric aberrations; and sensor noise or compression artifacts can also be giveaways. In video, the mouths of synthesized faces can sometimes flicker or look unnatural; eyes can take on the glazed look of zombies. The algorithms aren’t perfect, but journalists, like all investigators, need trained eyes to see the flaws.
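To make the noise-statistics idea concrete, here is a toy sketch in Python (using NumPy). It assumes a grayscale image as a 2-D array and flags blocks whose residual noise level is a statistical outlier, a weak splicing cue: a region pasted in from another photo often carries different sensor noise than its surroundings. The function names, block size, and threshold are invented for illustration, not drawn from any real forensics tool.

```python
import numpy as np

def block_noise_levels(gray, block=16):
    """Estimate per-block noise as the standard deviation of a
    high-pass residual (the image minus a 3x3 box blur)."""
    gray = np.asarray(gray, dtype=float)
    padded = np.pad(gray, 1, mode="edge")
    h, w = gray.shape
    # Sum the nine shifted copies to get a 3x3 box blur.
    blur = sum(padded[i:i + h, j:j + w]
               for i in range(3) for j in range(3)) / 9.0
    residual = gray - blur
    levels = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            levels.append(residual[y:y + block, x:x + block].std())
    return np.array(levels)

def suspicious_blocks(levels, z=3.0):
    """Flag blocks whose noise level is a z-score outlier relative
    to the rest of the image; those blocks merit a closer look."""
    mu, sigma = levels.mean(), levels.std()
    if sigma == 0:
        return np.zeros(len(levels), dtype=bool)
    return np.abs(levels - mu) > z * sigma
```

On a clean photo, every block should fall within the normal range; a patch spliced in with mismatched noise stands out. Real tools rely on far more robust statistics than a plain z-score.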
The development and integration of computational forensics tools are certain to be just as important as media forensics training. Even if synthesized content might sometimes fool the human eye, a forensic algorithm’s statistical eye may know it’s faked. A recent research project called FaceForensics uses machine learning to detect whether a video of a face is real with 98.1 percent accuracy. Another approach looks for blood flow in video of a person’s face in order to see if pixels periodically get redder when the heart pumps blood. The National Institute of Standards and Technology (NIST) is stimulating additional research on the topic with its Media Forensics Challenge, and, in fact, there are hundreds of digital forensics research papers published annually.
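The blood-flow check can be sketched in miniature. Assume face detection has already reduced each video frame to one number, the average red-channel value inside the face region. A crude test (an illustration, not the published method) asks whether that signal contains a strong periodic component in the human heart-rate band:

```python
import numpy as np

def has_pulse_signal(red_means, fps, band=(0.7, 4.0), power_ratio=0.5):
    """Return True if the per-frame facial redness signal has a strong
    periodic component in the heart-rate band (roughly 40 to 240 bpm).
    The 0.5 power-ratio threshold is arbitrary, chosen for illustration."""
    signal = np.asarray(red_means, dtype=float)
    signal = signal - signal.mean()                # remove the DC offset
    power = np.abs(np.fft.rfft(signal)) ** 2       # power spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    total = power[1:].sum()                        # ignore the DC bin
    if total == 0:
        return False
    return bool(power[in_band].sum() / total > power_ratio)
```

A real face concentrates noticeable power at its pulse frequency; a synthesized face whose redness never pulses spreads its power across the spectrum instead.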
Much of this technology is still several hops away from the kind of cheap public availability that would make it practical in reporting, however. While there are a few integrated tools, such as InVid, that aid media verification, most computational forensics methods are still research prototypes and are far from accessible to the workflow of quotidian journalism. More translational work needs to be done.
It’s also incumbent on parties other than news organizations to police faked video material. Some of the other stakeholders have research resources of their own, to say nothing of deep pockets: The information platforms that often end up hosting synthesized media could help with that much-needed translation. If Facebook and YouTube were to integrate the FaceForensics algorithm, for instance, they could flag and visibly mark videos that are suspected fakes. That would be another cue to users and the media to be cautious about the authenticity of a video, and it might demonstrate tech platforms’ willingness to act in the best interests of society, rather than solely in pursuit of short-term financial gains.
To build that much-needed trust, the platforms would also need to be transparent about what such an “authentication” meant. If the process were integrated into something like YouTube’s restricted-mode filter, end users could then control whether flagged videos are automatically hidden. And if tech companies were to make media verification algorithms freely available via APIs, computational journalists could integrate verification signals into their larger workflows however they see fit, much as they currently do for now-prosaic tasks like geocoding street addresses into latitudes and longitudes.
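As a sketch of what that kind of API integration could look like in a newsroom workflow, consider a pluggable checker that annotates each story asset with an authenticity signal. The verification service itself is stubbed out here, since no such public endpoint exists yet; the Asset shape, score scale, and threshold are all hypothetical:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Asset:
    url: str
    kind: str                          # e.g. "image" or "video"
    signals: Dict[str, object] = field(default_factory=dict)

def annotate_assets(assets: List[Asset],
                    check: Callable[[str], float],
                    threshold: float = 0.5) -> List[Asset]:
    """Attach an authenticity score from a pluggable checker to each
    asset, flagging those below the threshold for human review."""
    for asset in assets:
        score = check(asset.url)       # stand-in for a platform API call
        asset.signals["authenticity"] = score
        asset.signals["flagged"] = score < threshold
    return assets
```

Swapping in a real platform endpoint for the `check` callable would be a one-line change, which is exactly the kind of low-friction integration an open API makes possible.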
Ultimately, though, media forensics techniques can only take us so far. They can be difficult to use, require a high level of training to interpret, often aren’t definitive, and, like any other form of information security, will need ongoing and sustained support and attention. Another layer of forensics considers the context of media in determining authenticity: If an image can be so easily synthesized, metadata about the time, place, social setting, or other context will become increasingly important for proper verification. If a suspiciously compelling image is uploaded by an account created yesterday and with what appear to be legions of bot followers, that’s another clue in the calculus. Interpreting context for the sake of verification is a new form of media literacy where journalists, again, will need training, expertise, and tools that help make sense of that cloud of context.
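Those contextual cues could, in principle, be combined mechanically. The toy scorer below is illustrative only; the weights are invented, the signals far from exhaustive, and no verification desk would substitute it for human judgment.

```python
from datetime import datetime, timezone

def context_suspicion(account_created, followers, bot_like_followers,
                      has_capture_metadata, now=None):
    """Toy contextual-cue score (higher means more suspicious).
    The weights are invented for illustration, not calibrated."""
    now = now or datetime.now(timezone.utc)
    score = 0.0
    if (now - account_created).days < 7:
        score += 0.4                   # brand-new uploader account
    if followers and bot_like_followers / followers > 0.5:
        score += 0.4                   # follower graph looks automated
    if not has_capture_metadata:
        score += 0.2                   # no time/place provenance
    return score
```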
Just as verification practices for social media have been codified and are practiced by organizations like Storyful and Bellingcat—which follow rigorous procedures to triangulate, corroborate, and substantiate content and its provenance—journalists need to extend and codify workflows for assessing whether an image, video, or text is a result of some media synthesis algorithm. News organizations should double down on transparency of method. Robust and standardized processes for the verification and debunking of synthesized media need to be developed and openly published. Then, news organizations need to publicly commit to adhering to those standards. It’s about trust. People might flock to media brands they know are following meticulous and exhaustive procedures.
If we all can’t trust our eyes on the internet, perhaps we can trust that a media outlet is following a rigorous process to ensure that whatever they do publish is authentic. Synthesized media could be just the thing that drives the public back into the arms of mainstream news organizations.