The Media Today

The Times sued OpenAI. OpenAI says that isn’t the whole story.

January 11, 2024

In late December, the New York Times filed a lawsuit against OpenAI, the maker of the artificial-intelligence chatbot ChatGPT. According to a Times story, the suit alleges that OpenAI used millions of Times articles to train “automated chatbots that now compete with the news outlet as a source of reliable information,” and in so doing has tried to “free-ride on the Times’s massive investment in its journalism.” Although the lawsuit doesn’t request a specific amount from OpenAI, it says that the company should hand over “billions of dollars” in damages. The Times also wants OpenAI to destroy any AI models, databases, and training data that were based on copyrighted material taken from the paper.

The Times’s parent company said that it approached OpenAI with its concerns earlier in 2023 and tried to find an “amicable solution,” but failed. An OpenAI spokesperson, however, told the Times that the company was “surprised and disappointed” by the lawsuit—of which, the company said, it had no advance notice—and that it believed the two sides had been “moving forward constructively” toward some kind of licensing agreement when the suit was filed. The spokesperson added that OpenAI respects the rights of content creators and owners, and is “committed to working with them to ensure they benefit from AI technology and new revenue models.” Apart from that, the company didn’t respond to the claims in the suit.

That changed this week, when OpenAI released a thousand-plus-word commentary on the lawsuit. Its response suggests that any partnership between the company and the newspaper is, if not out of the question, then at least some way off. OpenAI argued that the lawsuit is “without merit” for a number of reasons, including the company’s belief that incorporating content from publishers such as the Times in its AI training data is covered by the fair-use exemption in US copyright law, and its claim that, in spite of this belief, it lets publishers and other creators opt out of having their content ingested by its tools, because that is “the right thing to do.” (OpenAI says that the Times only opted out last August.)

The OpenAI statement alleges that the Times “is not telling the full story” in its lawsuit—specifically, in its claim that ChatGPT has re-created entire articles taken from the paper when prompted to do so. OpenAI said that such “regurgitation” of entire articles is “a rare bug that we are working to drive to zero,” and that the company expects its users to “act responsibly” when they make a request of its software; “intentionally manipulating our models to regurgitate,” the company says, “is not an appropriate use of our technology and is against our terms of use.” OpenAI also claims that the Times raised such verbatim-quotation cases during the negotiations between the two companies, but that the newspaper “repeatedly refused to share any examples, despite our commitment to investigate and fix any issues.” And OpenAI says that these kinds of results can be achieved only by using “intentionally manipulated prompts” that, for example, include lengthy excerpts of existing articles. (Some tech analysts and ChatGPT users have also said that reproducing such results is now difficult, if not impossible.)

The core of OpenAI’s response to the lawsuit revolves around whether its activities—pulling in large quantities of content from the internet to use as training data for its “large language model” AI engine—are covered by the fair-use exception in copyright law. As I wrote in October, following a number of other lawsuits aimed at AI companies, the question of whether fair use applies to this practice is as yet untested, though a number of experts in both law and media believe that it should. OpenAI notes in its statement that “a wide range of academics, library associations, civil society groups, creators, authors, and others” have argued in submissions to the US Copyright Office that this sort of activity should be covered by the fair-use principle.

As I explained in October, determining whether an AI engine’s ingestion of copyrighted content is covered by fair use is complicated, not only because AI systems and their mechanics are opaque, but because the fair-use principle itself is not a black-and-white concept. To adjudicate a fair-use case, courts have to weigh four separate, and in some cases competing, factors: the purpose of the use and whether it is “transformative,” the nature of the copyrighted work, the amount of the work used, and the effect of the use on the market for the original. Nonprofit uses are more likely to be found fair, but the Copyright Office notes that this doesn’t make all nonprofit uses fair, or all commercial uses unfair. Using an entire work also doesn’t necessarily invalidate a claim of fair use.


Google has already won two cases based on the argument that ingesting large quantities of copyrighted content is fair use. In 2006, an adult-entertainment site claimed that Google infringed its copyright by creating small thumbnail photos of its content, but the court ultimately decided that allowing images to be searched easily was “fundamentally different” from simply creating a copy. The other case involved a project in which Google scanned more than twenty million books into a searchable database. The Authors Guild, a professional organization that represents the interests of writers, argued that this constituted copyright infringement, but in 2013, a judge ruled that this, too, was fair use.

After the Times filed its lawsuit against OpenAI, Will Oremus noted in the Washington Post that judges have so far been wary of the argument that simply ingesting copyrighted works amounts to copyright infringement. Jason Bloom, an intellectual-property lawyer, told the Post that this kind of activity “is more likely to be considered fair use, based on precedent, because you’re not publicly displaying the work” in the search result itself. Bloom also said he believes that a court will look not at the inputs to OpenAI’s training processes, but rather at the outputs returned by ChatGPT and similar tools in response to user prompts. “If the outputs aren’t infringing,” Bloom argued, “then anything that took place before isn’t infringing” either.

The Times, of course, is not the only actor concerned about the impact that the burgeoning generative-AI industry could have on journalism—and yesterday, the Senate Judiciary Committee convened a hearing on that exact topic. In prereleased remarks, Roger Lynch, the CEO of Condé Nast, said that AI companies “copy and display our content without permission or compensation in order to build massive commercial businesses that directly compete with us.” However, Jeff Jarvis, a journalism professor at the City University of New York, said in his prereleased remarks that while the Times lawsuit invokes a long tradition of copyright protections for newspapers, newspapers were not actually covered by US copyright law until 1909—and even then, there was debate as to whether news articles should be added to the law. Before that date, Jarvis noted, newspapers often employed “scissor editors” to copy one another’s articles.

Back in the present day, other major outlets have sought to protect their content using means other than copyright litigation. Last year, the Associated Press struck a licensing deal that allows OpenAI to use AP stories as part of its training database. Last month, OpenAI signed a more extensive deal with Axel Springer, the German publisher that owns Politico and Business Insider, that will present ChatGPT users with summaries of selected stories from Axel Springer–owned properties, with attribution and links to the full articles. (The financial terms of these agreements have not been disclosed.)

The Times itself is exploring how it might benefit from AI technology: the paper recently hired an editorial director of artificial-intelligence initiatives to “establish protocols for the newsroom’s use of AI and examine ways to integrate the technology” into the paper’s journalism. Some observers, who see the lawsuit as a gambit in a broader negotiation over how much OpenAI will eventually pay to license the paper’s content, have suggested that the Times may end up following in the footsteps of the AP and Axel Springer. That may be so. But the lawsuit could also become an important turning point in how we—and the law—look at copyright and AI.


Other notable stories:

  • Last night, Nikki Haley and Ron DeSantis faced off in a one-on-one Republican primary debate hosted by CNN; Donald Trump, the front-runner, once again refused to participate, but did counterprogram the debate with a simultaneous town hall on Fox News (to CNN’s apparent chagrin—Jake Tapper suggested that Fox was chasing “ratings > journalism”). While his rivals slugged it out, Trump enjoyed “a charmed night,” the Times reports, facing only a few “gently skeptical questions” from members of the audience (one of whom shouted “love you!”). Meanwhile, Chris Christie, the only candidate to consistently take aim at Trump during the campaign, dropped out. His exit could benefit Haley in New Hampshire, in particular—though prior to his announcement, he could be heard on a hot mic predicting that Haley is “going to get smoked.”
  • In 2022, a group of philanthropic funders made a splash when they announced a twenty-million-dollar commitment to set up a nonprofit news startup in Houston, a venture that launched last year as the Houston Landing. Now the outlet appears to be in turmoil after Peter Bhatia, the CEO, fired Mizanur Rahman, the editor in chief, and Alex Stuckey, an experienced investigative reporter. Bhatia told the pair that the site needs “a fairly comprehensive reset” if it is to thrive in the “digital space,” but remaining staffers have reacted poorly to the firings—in a letter to the board, they said that the decision “blindsided” them, that it appeared unjustified, and that they remain “baffled” despite Bhatia’s explanations. Nieman Lab’s Sophie Culpepper has more details.
  • For CJR, Seth Stern, of the Freedom of the Press Foundation, notes a disturbing trend of lower courts ordering “prior restraints” that block news organizations from reporting certain information—despite clear-cut precedent holding that such rulings are unconstitutional—and argues that affected outlets should consider simply ignoring them, particularly since appeals can be costly and time-consuming. “It seems the only way judges are going to stop is if they learn that the press will disregard their orders, shame them on editorial pages, and dare them to imprison journalists for doing their jobs,” Stern writes. “Is that contempt of court? Maybe. But censorial judges deserve contempt.”
  • Last year, Tamara Kay, a professor at the University of Notre Dame, sued the Irish Rover, a student newspaper, over two stories detailing her supposed support for abortion rights. Kay said that the stories were false and that she had been “harassed, threatened, and experienced damage to her residential property” as a result. This week, however, a judge dismissed the case, ruling that Kay had failed to show that the Rover doubted the truth of its stories. The AP notes that the case “raised questions about press freedom and academic freedom at one of the nation’s preeminent Catholic universities.”
  • And Aaron Rodgers, the injured New York Jets quarterback, ignited a media firestorm last week when he appeared on Pat McAfee’s show on ESPN—which has frequently counted Rodgers as a guest, despite his conspiratorial rhetoric—and made unfounded claims tying Jimmy Kimmel, a late-night host on ABC (which, like ESPN, is owned by Disney), to the late sex trafficker Jeffrey Epstein. Yesterday, McAfee announced that Rodgers will not appear on his show for the remainder of the football season.

ICYMI: Johnnie Kallas of Cornell’s Labor Action Tracker on the labor beat in 2024

Mathew Ingram is CJR’s chief digital writer. Previously, he was a senior writer with Fortune magazine. He has written about the intersection between media and technology since the earliest days of the commercial internet. His writing has been published in the Washington Post and the Financial Times as well as by Reuters and Bloomberg.