The News Frontier

Robot Journalism and the Future of Digital Media

More on Columbia’s new dual degree in journalism and computer science
April 19, 2010

Starting in 2011, Columbia University will be offering a new dual degree run jointly by its journalism school and engineering school, which will aim to blaze a trail in the future of online journalism and bridge the divide between computer techs and journalists, who increasingly work together in this digital media world but don’t always speak the same language. CJR’s Alexandra Fenwick checked in (separately) with Bill Grueskin, dean of academic affairs at Columbia’s Graduate School of Journalism, and Julia Hirschberg, professor of computer science at Columbia’s Fu Foundation School of Engineering & Applied Science, to find out how computer engineering and journalism can combine to create a whole new breed of cross-disciplinary techno/journalist ninjas.

Alexandra Fenwick: Are the people in this new program going to be engineers first or journalists first?

Bill Grueskin: In order to be part of this program, they’re going to have to meet the admissions standards of both schools. To get in the building here, you have to write well, think cogently, and have a real appreciation for what journalists do. That’s a little more subjective. In the engineering school it’s a little more objective: there are certain courses, and there are more clearly delineated definitions of what you need to have taken to get into the master’s level. It’s more about whether you have the academic and practical background to handle the load at the engineering school. So it will be a fairly small number of people; we envision twelve to fifteen students a year. They will take the RW1 journalism essentials class, but will also take a seminar and a workshop specifically designed for them… Both sides wanted this to be a real dual program. Our plan is to have the students living in both schools every semester, so you take some courses here and some courses over there each semester.

AF: So you’re looking for that rare breed of engineer who can communicate well and a journalist who can speak in computer code?

BG: I’m not sure it’s quite as rare as we think it is. I don’t think we can cull a class of 200 students, but worldwide I think we can find twelve or fifteen people. And once it gets running, if you know this is out there, it might affect the classes you take earlier on.

AF: Is this an attempt to fix the broken business model of journalism?


BG: This is not a partnership with the business school, but I think some of the innovations that are going to help journalism through this rough patch are going to be more technologically based than economically based. If journalism is going to survive, it has to deliver information to readers, viewers, and listeners in a much more effective way. And one way you go about doing that is to find better ways of surfacing information. [In tech speak, “surfacing” is a verb that means bringing something to the forefront, or to the surface.]

AF: Database mining has been discussed as one area this hybrid degree could focus on. But sometimes it takes someone to actually sit there and read through every single document in a database, especially a poorly organized one. (See: the Los Angeles Times analysis of sudden-acceleration complaints that Toyota, Lexus, and Scion drivers registered with the National Highway Traffic Safety Administration.) Can that hard work, that sweat equity of journalism, which doesn’t come cheap and takes a long time, be replaced or shortcut by a quick computer program?

Julia Hirschberg: There are well-known technologies for text classification that let you sort information into bins so you can spend less time reading it. There’s a whole field, computational linguistics, that studies how to get computers to understand language and generate it.
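For readers curious what “putting information in bins” can look like in practice, here is a minimal sketch of one textbook technique, a multinomial naive Bayes classifier, written in Python. It is not a tool Hirschberg describes. The complaint snippets and the two bins (“acceleration” and “other”) are invented for illustration, loosely in the spirit of the vehicle-complaint example above, and a real project would need far more hand-labeled examples before its sorting could be trusted.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep only runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesBinner:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.label_counts = Counter(labels)      # training documents per bin
        self.word_counts = defaultdict(Counter)  # word frequencies per bin
        self.vocab = set()
        for doc, label in zip(docs, labels):
            words = tokenize(doc)
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, doc):
        """Return the bin with the highest log-probability for this document."""
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # prior probability of the bin
            for word in tokenize(doc):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical, hand-labeled complaint snippets (invented for this sketch).
train_docs = [
    "car accelerated suddenly while braking",
    "engine surged and vehicle sped up unexpectedly",
    "pedal stuck and car accelerated",
    "radio stopped working",
    "paint peeling on hood",
    "air conditioning blows warm air",
]
train_labels = ["acceleration"] * 3 + ["other"] * 3

binner = NaiveBayesBinner().fit(train_docs, train_labels)
print(binner.predict("vehicle accelerated and pedal stuck"))  # acceleration
print(binner.predict("air conditioning stopped"))             # other
```

The point is the division of labor: a reporter still reads the documents that land in the “acceleration” bin, but no longer has to wade through every complaint about peeling paint to find them.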

AF: What is something that is already out there that could be an example of the sort of thing that could come out of this dual degree program?

JH: Newsblaster (developed by my colleague Kathleen McKeown) is a program that uses automatic summarization to help you decide whether you want to read something. It looks at lots of news sites every day, clusters the articles into stories, and then summarizes each thread by picking out sentences from different articles.
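Newsblaster’s actual algorithms aren’t described in this interview, so what follows is only a rough Python sketch of the general idea of extractive summarization: scoring sentences drawn from several articles on the same story and keeping the best ones. It assumes the articles have already been clustered together. The scoring rule is a simple word-frequency heuristic chosen for illustration (sentences about what most sources agree on rank highest), and the rule of taking at most one sentence per article is likewise an assumption.

```python
import re
from collections import Counter

# Tiny hand-picked stopword list (an assumption; real systems use larger ones).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for",
             "is", "was", "were", "are", "it", "that", "this", "with",
             "by", "at", "as"}


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize_cluster(articles, max_sentences=2):
    """Pick representative sentences from a cluster of articles on one story."""
    # Document frequency: in how many articles does each word appear?
    doc_freq = Counter()
    for article in articles:
        doc_freq.update(set(content_words(article)))

    # Score each sentence by the mean document frequency of its content words.
    scored = []
    for idx, article in enumerate(articles):
        for sentence in split_sentences(article):
            words = content_words(sentence)
            if words:
                score = sum(doc_freq[w] for w in words) / len(words)
                scored.append((score, idx, sentence))

    # Greedily keep the best sentences, at most one per source article.
    # The sort is stable, so ties keep their original article order.
    scored.sort(key=lambda item: item[0], reverse=True)
    summary, used_articles = [], set()
    for _score, idx, sentence in scored:
        if idx not in used_articles:
            summary.append(sentence)
            used_articles.add(idx)
        if len(summary) == max_sentences:
            break
    return summary


# Invented articles that are assumed to have been clustered as one story.
articles = [
    "The city council approved the new budget on Monday. Critics said the vote was rushed.",
    "On Monday the council approved a budget that raises transit spending. The mayor praised it.",
    "Weather was mild downtown. The council approved the budget after a long debate.",
]
for line in summarize_cluster(articles, max_sentences=2):
    print(line)
```

Because the summary is stitched together from sentences written by different outlets, a single misleading sentence can slip through unchanged, which is one reason human fact-checking of such summaries still matters.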

(Reporter’s note: This isn’t the first collaboration between the ink-stained wretches at Columbia’s journalism school and the calculator-wielding set at the university’s engineering school. Back in 2002, upon Newsblaster’s launch, journalism students were called in to fact check the program’s summaries, which, though largely accurate, admittedly sometimes included major errors.)

AF: That could be useful for news aggregators, but I can also see how it could be useful for media criticism, which often depends on reading the archived body of work on a single topic to see where the press got it right and got it wrong.

JH: Right, it tracks stories over time to see how information gets added or revised. There’s also automatic question answering, which goes beyond search to answer a question like “Name the participants in event X” or “Describe a certain trial.” With search, a lot of retrieval systems rank results by relevance, and you can only hope the most relevant item rises to the top.

AF: Isn’t that what librarians do?

JH: I take my hat off to every librarian, but they’re not always available and may not have the expertise or resources. Certainly some people are better at Web search than others, but Web search as it exists today is not the state of the art. The hope of this collaboration is to develop better technologies to manage and search information.

AF: How do you make sure that the practical applications that come out of this program aren’t just innovative and “cool,” with lots of bells and whistles, but really journalism, or something that helps journalists?

BG: These students will be taking RW1, law, ethics, and the history of journalism: real meat-and-potatoes issues that concern journalism. But at a certain point, it’s almost wrong to look at this like training someone in law to practice law the way it’s always been practiced, or training someone in medicine to practice medicine the way it’s always been practiced. A lot of this is giving really smart young people the background skills they need to go develop and innovate things that you and I sitting in this office can’t imagine.

AF: What about the kinds of innovations that are new and useful but aren’t pure journalism, in that they don’t involve reporting, or the journalism of verification?

BG: In its strictest sense, it may not involve any original reporting. But suppose you could come up with some app that could tell you in real time what crimes have been committed on the street that you’re walking down? Would that be journalism or not? Well, people read newspapers to find out what crime stories are going on in their neighborhood, and it’s all very random—if some reporter gets sick there might not be any stories about crime that day in the newspaper. So it may not be a 6,000-word magazine article, but it’s information that really helps people.

One of the things I really like to analyze is the property tax burden, which is really unequally shared, not here in the city but in the suburbs. So could you create a database that lets people see how property taxes are levied across a community (which is all publicly available), play around with it, and size it up against the purchase prices of those houses and what they’re probably worth now? If a tool like that were successful, everybody in the neighborhood would use it, but it would also give you really useful information about whether the tax burden is being equitably or inequitably distributed across a community. A reporter could sit down with all this paper, put together their own little Excel spreadsheet, and come up with something. Everybody would find it really interesting, and the next day it would be obsolete, because the database is constantly getting updated.

So is that journalism? If your editor asked you to do a story on unfair property tax distribution, I think you would agree that is a journalistically valid thing. But should you sit there with two big stacks of paper for six months to do it? Probably not. And if you create a feed that pulls the information from the tax assessor’s office, then you actually build it, and whoosh [sound of rocket taking off].
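One way to make Grueskin’s property tax idea concrete is the kind of sales ratio study that assessors themselves use: divide each home’s assessed value by its recent sale price, then measure how uneven those ratios are. The Python sketch below computes two common ratio-study measures, the coefficient of dispersion (COD) and the price-related differential (PRD). The parcel records and the `Sale` record layout are hypothetical. A real tool would pull them from an assessor’s feed, and the format of that feed isn’t specified here.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class Sale:
    parcel_id: str         # hypothetical parcel identifier
    assessed_value: float  # value on the assessor's roll
    sale_price: float      # recent sale price of the same parcel


def ratios(sales):
    """Assessment-to-sales ratio for each parcel."""
    return [s.assessed_value / s.sale_price for s in sales]


def coefficient_of_dispersion(sales):
    """Average percentage deviation of the ratios from their median.

    Higher values mean similar homes are assessed less uniformly.
    """
    r = ratios(sales)
    med = median(r)
    return 100 * mean([abs(x - med) for x in r]) / med


def price_related_differential(sales):
    """Mean ratio divided by the sale-price-weighted mean ratio.

    Values above 1 suggest cheaper homes are assessed at a higher share
    of their value than expensive ones (a regressive pattern).
    """
    weighted = sum(s.assessed_value for s in sales) / sum(s.sale_price for s in sales)
    return mean(ratios(sales)) / weighted


# Invented sample data for illustration only.
sales = [
    Sale("A-1", 90_000, 100_000),
    Sale("B-2", 180_000, 200_000),
    Sale("C-3", 280_000, 400_000),
    Sale("D-4", 100_000, 100_000),
]
print(f"COD: {coefficient_of_dispersion(sales):.1f}")   # COD: 8.3
print(f"PRD: {price_related_differential(sales):.2f}")  # PRD: 1.08
```

In this invented sample, the most expensive house is assessed at 70 percent of its sale price while the others sit at 90 to 100 percent, so the PRD comes out near 1.08. Assessors’ ratio-study guidelines commonly treat a PRD between roughly 0.98 and 1.03 as acceptable. A reading well above that range hints that lower-priced homes carry a proportionally heavier burden. Because the functions only need a list of records, rerunning them whenever the feed updates keeps the analysis from going stale.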

AF: So robot journalists—we’re being replaced by robots?

BG: Right, robots. [Laughs.] But you could also use it as a tool for writing a lot of really interesting stories, for years down the road.

Alexandra Fenwick is an assistant editor at CJR.