In recent weeks, the spotlight has fallen on new generative AI tools such as OpenAI’s ChatGPT, Microsoft’s Bing chatbot, and Google’s Bard, which have prompted debate about their potential to reshape how journalists work.
For data and computational journalists in particular, AI tools such as ChatGPT have the potential to assist with a variety of tasks, such as writing code, scraping PDF files, and translating between programming languages. But tools like ChatGPT are far from perfect and have been shown to ‘hallucinate’ data and scatter errors throughout the text they generate.
We talked with Nicholas Diakopoulos, associate professor in communication studies and computer science at Northwestern University and a former Tow Fellow, about how to navigate those risks, whether ChatGPT can be a helpful tool for journalism students and novice programmers, and how journalists can track their steps when using AI.
Diakopoulos recently launched the Generative AI in the Newsroom Project, which explores how journalists can responsibly use generative AI.
As part of the project, news producers are encouraged to submit a pitch on how they envision using the technology for news production. This conversation has been edited and condensed for clarity.
SG: For journalists and students who are novices to programming, how useful do you think ChatGPT can be in helping with computational tasks?
ND: As a user myself, I’ve noticed that ChatGPT can certainly be useful for solving certain kinds of programming challenges. But I’m also aware that you need a fairly high level of competence in programming already to make sense of it and write the right queries, and then be able to synthesize the responses into an actual solution. It could potentially be useful for intermediate coders as long as you know the basics, how to evaluate the responses, and how to put things together. But if you don’t know how to read code, it’s going to give you a response, and you’re not really going to know if it’s doing what you wanted.
There’s a reason why we have programming languages. It’s because you need to precisely state how a problem needs to be solved in code. Whereas when you say it in natural language, there’s a lot of ambiguity. So, obviously, ChatGPT is good at trying to guess how to disambiguate what your question is and give you the code that you want, but it might not always get it right.
I’m wondering if journalism students will lose some fundamental knowledge if they use ChatGPT for assignments. When it comes to learning how to program, are students better off learning how to write code from scratch rather than rely on ChatGPT?
One lens that I look at this problem through is substitution versus complementarity of AI. People get afraid when you start talking about AI substituting someone’s labor. But in reality, most of what we see is AI complementing expert labor. So you have someone who already is an expert and then AI gets kind of married into that person and augments them so that they’re smarter and more efficient. I think ChatGPT is a great complement for human coders who know something about what they’re doing with code, and it can really accelerate your ability.
You started a project called Generative AI in the Newsroom, where journalists can submit case studies of how they’ve used ChatGPT in the newsroom. How is that project going?
People have been submitting; I’ve had contact with more than a dozen people with ideas at various levels of maturity, from different kinds of organizations: local news media, national, international, and regional publications, and startups. There’s such a range of people who are interested in exploring the technology and seeing how far they can take it with their particular use case. I also have contact with some legal scholars here at the University of Amsterdam Institute for Information Law, where I’m on sabbatical. They’re looking at issues of copyright and terms of use, which I know are quite relevant and important for practitioners to be aware of.
I’m also exploring different use cases myself with the technology. I’ve been writing a blog about it and putting the pilot projects out there to help folks in the community understand what the capabilities and limitations are. So, overall, I’m pretty pleased with the project. I think it’s progressing well. Hopefully, we’ll see some of these projects mature over the next month and start getting published.
Now that you’ve been looking at what journalists are submitting, do you have a better intuition about what ChatGPT could help with in the newsroom?
There are just so many different use cases that people are exploring. I don’t even know if there’s going to be one thing that it’s really good at. People are exploring rewriting content, summarization and personalization, news discovery, translation, and engaged journalism. To me, part of the appeal of the project is exploring that range. Hopefully, in a couple of months, these projects can start to mature and get more feedback. I’m really pushing people to evaluate their use case. Like, how do you know that it’s working at a level of accuracy and reliability where you feel comfortable rolling it out as part of your workflow?
A main concern among computational journalists is that ChatGPT will sometimes ‘hallucinate’ data. For instance, maybe you use it to extract data from a PDF and everything works fine on the first page. But when you do it with 2,000 PDFs, suddenly errors are scattered throughout. How do you navigate that risk?
Accuracy is a core value of journalism. With AI systems and machine learning systems, there’s a statistical element of uncertainty which means it’s basically impossible to guarantee 100 percent accuracy. So you want to get your system to be as accurate as possible. But at the end of the day, even though that is a core journalistic value and something to strive for, whether or not something needs to be 100 percent accurate depends on the kinds of claims that you want to make using the information that’s generated from the AI system.
So if you want a system that’s going to identify people who are committing fraud based on analyzing a bunch of PDF documents, and you plan to publicly accuse those individuals based on that analysis, you’d better be damn sure it’s accurate. From years of talking to journalists about stuff like this, I know they’re probably not going to rely only on a machine learning tool to come up with that evidence. They might use it as a starting point, but then they’ll triangulate it with other sources of evidence to raise their level of certainty.
There might be other use cases, though, where it doesn’t really matter if there’s a 2 or 5 percent error rate, because maybe you’re looking at a trend so big that a 5 percent error rate doesn’t obscure it. So it’s important to think about the use case and how much error it can tolerate. Then you can figure out: how much error does this generative AI tool produce? Does it actually meet my needs in terms of the kinds of evidence I want to produce for the kinds of claims I want to make?
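One practical way to answer that question is to spot-check a random sample of the model’s output against the source documents and estimate an error rate before trusting the full run. Below is a minimal sketch in Python; the record format and field names are hypothetical, and the verification step is simply a reporter confirming each value by hand.

```python
import random

# Hypothetical records: each pairs a value the model extracted with the PDF it
# came from, so a reporter can check it against the original document.
extracted = [
    {"doc": "filings/0001.pdf", "field": "amount", "model_value": "12,400"},
    # ... one entry per value pulled from the 2,000 PDFs
]

def spot_check(records, sample_size=100, seed=42):
    """Draw a random sample and ask a human to confirm each extracted value."""
    random.seed(seed)
    sample = random.sample(records, min(sample_size, len(records)))
    errors = 0
    for rec in sample:
        print(f"{rec['doc']}: model says {rec['field']} = {rec['model_value']}")
        if input("Correct? [y/n] ").strip().lower() != "y":
            errors += 1
    rate = errors / len(sample)
    print(f"Estimated error rate: {rate:.1%} ({errors}/{len(sample)} wrong)")
    return rate

# If the estimated rate exceeds what the story can tolerate, the tool's output
# shouldn't stand as evidence on its own.
```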
Do you imagine some kind of AI class or tutorial for journalists in the future on how to use AI responsibly?
I’d like to avoid a future where people feel like they can be fully reliant on automation. There may be some hard-and-fast rules about situations where you need to go through and manually check the output and situations where you don’t. But I’d like to think that a lot falls in between those two extremes. The Society of Professional Journalists puts out a book called Media Ethics, which is basically all of their case studies and reflections around different types of journalism ethics. It could be interesting to think about it that way: maybe that book needs a chapter on AI to start parsing out which situations are more prone to problems and which are less so.
Maybe it’s not all that different from how it’s done now, where we have these core journalism constructs like accuracy or the Do No Harm principle. When you’re publishing information, your goal is to balance the public interest value of the information against the potential harm it could cause to someone innocent. So you have to put those two things in balance. When you think about errors from AI, or generative AI summarizing something, applying that kind of rubric might make sense. Like, what is the potential harm that could come from this error? Who might be hurt by that information? What damage might that information cause?
Yeah, and journalists make errors, too, when dealing with data.
There’s a difference, though, and it comes back to the accountability question. When a human being makes a mistake, you have a very clear line of accountability. Someone can explain their process and realize why they missed this thing or made an error. Now, that’s not to say that AI shouldn’t be accountable. It’s just that to trace human accountability through the AI system is much more complex.
If a generative AI system makes an error in a summary, you could blame OpenAI, if they made the AI system. Although when you use their system, you also agree to their terms of use and assume responsibility for the accuracy of the output. So OpenAI says it’s your responsibility as the user; they don’t want to be accountable for the mistake, and contractually, they’ve obligated you to be responsible for it. So now it’s your problem. Are you willing to take responsibility and be accountable as, let’s say, the journalist or the news organization that uses that tool?
How would a journalist keep track of their use of AI in case they had to trace an error back to its source?
That’s a great question. Keeping track of prompts is one way to think about it, so that, as the user of the technology, there’s a record of what my role was: what were the parameters I used to prompt the technology? That’s at least a starting point. If I did something irresponsible in my prompt, there would be an example of negligence. For instance, if I prompt the model to summarize a document but set the temperature at 0.9, a high temperature means there’s a lot more randomness in the output.
You should know that if you’re going to use these models: if you set the temperature high, it’s going to introduce a lot more noise into the output. So maybe you do bear some responsibility if there’s an error in that output. Maybe you should have set the temperature to zero, or much lower, to reduce that potential for randomness. I do think that, as a user, you should be responsible for how you’re prompting and what parameters you’re choosing, and be prepared to explain how you used the technology.
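For reporters who want to put that advice into practice, one lightweight approach is to log every prompt alongside the model name, the temperature, and the output at the moment of each call, so there is an audit trail to consult if an error surfaces later. Here is a minimal sketch using the OpenAI Python client; the model name, log file name, and helper function are illustrative assumptions, not a prescribed workflow.

```python
import json
from datetime import datetime, timezone

from openai import OpenAI  # assumes the openai Python package is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def logged_completion(prompt, model="gpt-4o-mini", temperature=0.0,
                      log_path="prompt_log.jsonl"):
    """Call the model and append the prompt, parameters, and output to a log."""
    response = client.chat.completions.create(
        model=model,
        temperature=temperature,  # kept low to reduce randomness in the output
        messages=[{"role": "user", "content": prompt}],
    )
    output = response.choices[0].message.content
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "temperature": temperature,
        "prompt": prompt,
        "output": output,
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return output

# Example: a low-temperature summarization call that leaves an audit trail.
# summary = logged_completion("Summarize this council report in 100 words: ...")
```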