This fall, the Big Ten and the Southeastern Conference offered college football fans more original content than ever, posting preview stories, in-game quarterly recaps, and immediate post-game analysis online—and all of it was created by a computer. The company behind the effort, Narrative Science, was born out of collaboration between the Medill School of Journalism and the McCormick School of Engineering and Applied Science at Northwestern University, and readers apparently never noticed the absence of a human scribe. Janet Paskin asked Kristian Hammond, a professor of computer science at Northwestern and the chief technology officer of Narrative Science, how a computer program generates a sports story and what that means for the future of journalism.
Whose idea was it to write a program to create sports stories?
We set out to build something that could write a genuine story based upon data. There’s a tradition of database journalism—using census data, crime data, financial statements to create stories. Sports just happened to be first.
I’m not sure sportswriters are very happy about that. Doesn’t it imply that game stories are formulaic?
No, the opposite. If this were formulaic, it’d be easy. We have to make everything that’s implicit in a writer’s skill set explicit to a machine. The balance of ‘what happened’ with what makes what happened interesting, and the figuring of the priorities, the structure of the narrative, all those things participate in the system of building a story. It’s complex, and we love that. We have tremendous respect for sports writing.
What kind of data does the program need to create a game story?
The same data a reporter would use to write a recap if he weren’t at the game: box score and play-by-play, player stats, player trends, team statistics. We use a whole range of statistical information and draw in the things that end up being interesting. If a player is close to breaking a record, within the conference, or a team record, or a personal best, we can notice those things and articulate them.
How do you go from there to a narrative?
You have to characterize the structure of the game. We know that the facts that are in the story are in the data, so how do we pull the interesting facts out? First, we characterized the plays: Did it change the score? Did it change who was winning? Did it set up a play that changed the score or who was winning? And then we created our idea of an “angle”: Was this a surprise victory? Was this a rout? Was this a back-and-forth? Once we’ve figured out what’s important, we can apply the angles and generate a story.
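The two steps Hammond describes, tagging each play and then choosing an angle, can be sketched roughly as follows. This is an illustrative sketch only, not Narrative Science's code: the `Play` class, the function names, and the numeric thresholds (`rout_margin`, `many_lead_changes`) are all assumptions made up for the example, and the "did it set up a scoring play" check is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Play:
    """Hypothetical play record: points scored by each side on this play."""
    description: str
    home_points: int = 0
    away_points: int = 0


def leader(home, away):
    """Return which side is ahead, or 'tied'."""
    if home > away:
        return "home"
    if away > home:
        return "away"
    return "tied"


def characterize(plays):
    """Tag each play: did it change the score, did it change who was winning?"""
    home = away = 0
    tagged = []
    for play in plays:
        before = leader(home, away)
        home += play.home_points
        away += play.away_points
        tagged.append({
            "play": play,
            "changed_score": bool(play.home_points or play.away_points),
            "changed_leader": leader(home, away) != before,
        })
    return tagged, (home, away)


def pick_angle(tagged, final, favorite=None, rout_margin=21, many_lead_changes=3):
    """Choose one angle for the story. Thresholds are invented placeholders."""
    home, away = final
    winner = leader(home, away)
    if favorite is not None and winner not in ("tied", favorite):
        return "surprise victory"
    if abs(home - away) >= rout_margin:
        return "rout"
    if sum(t["changed_leader"] for t in tagged) >= many_lead_changes:
        return "back-and-forth"
    return "straightforward win"


# Usage: a game in which the lead changes hands repeatedly.
game = [Play("TD", home_points=7), Play("TD", away_points=7),
        Play("FG", away_points=3), Play("TD", home_points=7)]
tagged, final = characterize(game)
angle = pick_angle(tagged, final)
```

In this sketch the angle is picked by a fixed priority order (surprise, then rout, then back-and-forth); a real system would presumably weigh many more signals than these three.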
How did you teach the program to “speak sportswriter”?
We have writers who know what the structure of a story is and what kind of language people are genuinely used to. One of our guys is a stringer for the Chicago Tribune; he covers high school baseball, high school basketball, and he writes for us. We go for that language.
As you broke game stories down into their essential features and components, what did you learn?
Beyond the angle on a game, there are also what we call “compulsories.” For example, in baseball, it doesn’t matter what happened in the game, you have to say something about the pitcher’s performance.
What are the other requirements?
We had to consider and program for point of view and tone. When we did our first set of stories, one of the editorial comments from the Big Ten Network was, when a Big Ten team plays a non-conference team, even if the Big Ten team loses, we would appreciate it if you would say something nice about the Big Ten team.
Also, when we’re looking at things like what constitutes a blowout, that’s different for high school football, college football, and professional football. Finding those inflection points ended up being an important learning experience. In high school football, you have some embarrassing moments where you don’t want to call it a blowout because it’s a large margin.
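The idea that a "blowout" depends on the level of play, and that tone should soften for younger players, might look something like the sketch below. The margins and the `describe_margin` helper are hypothetical placeholders chosen for illustration; the interview does not give actual numbers or wording rules.

```python
# Hypothetical point margins at which a football game counts as lopsided.
BLOWOUT_MARGIN = {"high_school": 35, "college": 28, "professional": 21}

# Levels where the harsher word is avoided even for a lopsided score.
SOFTENED_LEVELS = {"high_school"}


def describe_margin(level, winner_points, loser_points):
    """Pick a label for the result, adjusted for the level of play."""
    margin = winner_points - loser_points
    if margin < BLOWOUT_MARGIN[level]:
        return "win"
    return "decisive win" if level in SOFTENED_LEVELS else "blowout"
```

Keeping the thresholds and the tone rules in separate tables lets an editor adjust either one without touching the other.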
Is that about understanding the game, or being sensitive to teenagers?
If you put a Little League game through a system that was writing stories for college or professional baseball, the result would be completely inappropriate for that audience. That we learned from Medill: who’s the audience and what’s important to them. If you’ve got a Little League game, you don’t say, “Timmy Jones disappointed everybody,” but in a college game, that’s completely acceptable.
Once the program has the data, how long does it take to write the story?
A matter of seconds.
And then, is there an editor? At what point do human hands touch the story?
When we have a new kind of sport or a new kind of story, a recap versus a quarterly update versus a preview, there are eyes on every story in the beginning. But by the end of the season in baseball, things were generated and published, because we’d taken care of all the glitches.
Could this put people out of work?
Our goal was always to generate stories in places where publishing organizations simply didn’t have the manpower anymore to write these stories. Local newspapers are fighting for survival. If we can provide highly localized content inexpensively, they can fill two more pages and sell ads. With that little bit of extra money, they can hire someone to write the stories we can’t touch. We’re providing a possible set of solutions that will help an industry, and if you help an industry, it will create more jobs.
Was that intuitive to your colleagues at Medill?
It was not their first response. But they’re very forward-looking. They understand that technology is not the answer, but it’s going to be part of the solution.
What’s next, after sports?
Finance is a wonderful area where there’s a tremendous amount of data. If you think about financial reporting, there aren’t that many companies where reporters are really paying close attention. We can pay attention to all of them.