In 2006 Adrian Holovaty, then a programmer and journalist of some reputation, wrote a blog post entitled, “A fundamental way newspaper sites need to change.” In the five years since he published it, Holovaty went on to win a Knight News Challenge grant, launch EveryBlock, sell it to MSNBC.com, and become one of the leading programmer/journalists working today. As the years passed, his post crystallized to become one of the more important, and prophetic, pieces of writing about what we now call data journalism.
So much of what local journalists collect day-to-day is structured information: the type of information that can be sliced-and-diced, in an automated fashion, by computers. Yet the information gets distilled into a big blob of text — a newspaper story — that has no chance of being repurposed.
What I mean by structured data: information with attributes that are consistent across a domain. Every fire has those attributes, just as every reported crime has many attributes, just as every college basketball game has many attributes.
This view has come to be accepted and championed by many important people and organizations. There are today many efforts to bring structure to all manner of information, and there’s of course lots of work left to be done. Notably, one slice of data that still lacks structure in the United States relates to journalists themselves.
We each have attributes like a phone number, e-mail address, title, beat, employment history, voting history, education history, Twitter username, published articles and reporting, frequently quoted sources… The list goes on.
These attributes don’t tell the whole story of a journalist, just as a box score doesn’t encapsulate a sports game. But they are material to the whole. And they are, for the most part, unavailable or at the very least disorganized and distributed. Unstructured, as Holovaty might say.
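To make the idea concrete, here is a minimal sketch in Python of what a structured record for a journalist might look like. The class name `JournalistProfile`, every field name, and the sample person are assumptions made up for illustration. None of this is a schema used by any site mentioned in this column.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class JournalistProfile:
    """Hypothetical record; the field names are illustrative assumptions."""
    name: str
    title: str = ""
    beat: str = ""
    email: str = ""
    twitter_username: str = ""
    employment_history: List[str] = field(default_factory=list)
    education_history: List[str] = field(default_factory=list)
    frequently_quoted_sources: List[str] = field(default_factory=list)


# An invented example profile, not a real person.
profile = JournalistProfile(
    name="Jane Example",
    title="Staff Writer",
    beat="City Hall",
    education_history=["Example State University"],
)

# Because each attribute is a named field, it can be read on its own
# or exported as a plain dictionary for storage and exchange.
record = asdict(profile)
print(record["beat"])
```

The point of the sketch is only that each attribute lives in its own named slot rather than inside a paragraph of bio text.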
A recently launched effort by Ira Stoll, the former New York Sun vice president and managing editor who now runs FutureOfCapitalism.com, highlights this reality. His new project is News Transparency, a website that seeks to act as a central database for the attributes of American journalists. Anyone can create a profile for a journalist or add to an existing profile. People can also offer feedback on the quality of a journalist’s work, or make note of a prediction made by the journalist.
“This site aims to improve the accuracy, quality, and transparency of journalism by making it easier to find out about the individual human beings who produce the news — human beings with opinions, relationships, history, and agendas,” reads the site’s about page. “That information should help readers, viewers, and listeners put what they are reading in better context, and it may even prompt some improvements by the journalists.”
News Transparency’s launch received a decent amount of press attention, with Poynter, Forbes, AFP and others writing about it. When I called Stoll recently to check in on the launch, he was on the other line with a French reporter. He said traffic on the site has been more international than expected.
That isn’t entirely surprising. The U.K. has been home to a similar site, Journalisted, since 2007. Journalisted describes itself as “an independent, not-for-profit website built to make it easier for you, the public, to find out more about journalists and what they write about.” Perhaps there are folks out there who have been waiting to see a little structure brought to American journalists.
Overall, the online trend is towards disclosing more information about journalists. News sites are putting journalists’ photos, e-mails, Twitter accounts, and other contact and connection information with a byline. Forbes’s new article page includes prominent merchandising of the reporter’s information and most popular work. Google recently announced it will use the Google+ profiles of journalists to add byline information in Google News.
These are examples of very basic profile and contact information. In contrast, Journalisted offers analytics about what journalists cover, whom they write about, and how frequently they publish. These data have the potential to tell an interesting story about the people who cover the news, and therefore about the news itself. Imagine having access to a journalist’s commonly cited sources, basic information about their financial holdings, their most commonly covered topics, corrections to their work, their voting history, and so on.
One concern is that putting this information out there without context could unfairly cloud the perception of a journalist’s work. Not every piece of reporting requires people to know what a person’s spouse does for a living, or which investments they hold. Whether those details matter depends on the story. That’s a great argument for why we need this information in a structured format. That way, it can be gathered, stored, and updated in an efficient and useful way. It can be disclosed in the right way, right away. It can then also be aggregated and analyzed to reveal trends and information.
News organizations are not, to my knowledge, doing this right now. (Am I wrong? Share an example in the comments.) Some operations have created topic and profile pages for prominent people that mix structured data with human curation. Here’s the New York Times topic page for Silvio Berlusconi, for example. In contrast, here’s a profile page for a Times reporter. No surprise that a world leader gets a more robust page than a reporter. Fair enough.
But imagine if that reporter bio were broken up into individual attributes. Suddenly it would be easy to see how many Times reporters went to Harvard, or were born in Florida. Maybe there would be a tag cloud of the topics they write about and the names of the sources they cite. We’d be able to see which sources are perhaps overexposed by the paper, and which topics are getting the most and least coverage. Wouldn’t Times editors find that interesting information to have? I bet readers would, too. Perhaps it would help us better understand trends in coverage and also see patterns and potential conflicts of interest.
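Here is a rough sketch of how those questions could be answered once bios are stored as individual attributes. The records below are invented placeholders, and the field names (`education`, `birthplace`, `sources_cited`) and the two helper functions are assumptions for illustration only.

```python
from collections import Counter

# Invented sample records; names, schools, and sources are placeholders.
reporters = [
    {"name": "Reporter A", "education": ["Harvard"], "birthplace": "Florida",
     "sources_cited": ["Source X", "Source Y", "Source X"]},
    {"name": "Reporter B", "education": ["Example College"], "birthplace": "Ohio",
     "sources_cited": ["Source X"]},
    {"name": "Reporter C", "education": ["Harvard"], "birthplace": "Florida",
     "sources_cited": ["Source Z"]},
]


def count_by_school(records):
    """Count reporters per school, counting each reporter once per school."""
    return Counter(school for r in records for school in set(r["education"]))


def count_source_citations(records):
    """Count how often each source is cited across all reporters."""
    return Counter(s for r in records for s in r["sources_cited"])


school_counts = count_by_school(reporters)
source_counts = count_source_citations(reporters)

print(school_counts["Harvard"])       # reporters who attended Harvard
print(source_counts.most_common(1))   # the most frequently cited source
```

With free-text bios, each of these tallies would mean reading every page by hand. With attributes, each is a few lines.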
Think of it this way: this is the kind of data we’d love to have about the people, governments, agencies and organizations we cover. Why shouldn’t we offer the same level of transparency? Wouldn’t we benefit from a similar level of disclosure and transparency?
Stoll agreed the opportunity for his site exists in part because news organizations aren’t doing this themselves.
“I hope that this motivates them to do that,” he said. “If I were an editor I would say, ‘Gee, we ought to have our people on our platform and have this information on our platform, not his platform.’ But it’s not in there in most cases. A lot of these organizations make it difficult to access their reporters, and that’s why people pay tens of thousands of dollars to fancy PR firms.”
Stoll’s initiative invites the participation of the public and journalists to build out profiles and add information. (Journalisted is automated.) He has bitten off something of an engagement challenge for himself, but so far Stoll said he’s been happy with the participation level. At one point soon after launch, he said, he worried he’d need to hire someone to help him review the submissions and edits being made on the site. Aside from what can be offered on his site, he’s entertaining thoughts about how a database like his could be put to use for news consumers.
“I think once the data profiles are a little more developed there are all kinds of ways you can apply it,” he said. “You can have an app or a toolbar so if you’re browsing news and you see a name of a journalist it throws up an overlay that links to their profile.”
That kind of overlay could add interesting context to the reporting we encounter online. But, at the risk of sounding like I’m trying to move people away from Stoll’s site, there is an opportunity for news organizations to build internal databases of this information and take a role in offering a new, meaningful level of disclosure and information about their journalists and the topics and people they cover.
If we could create standards for the structure of journalist attributes, then the databases at different organizations could talk to each other, and suddenly we would have a very interesting and valuable database of journalists and their work. Of course, news organizations will differ in how much information they ask for and are willing to disclose.
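As a sketch of what "talking to each other" could mean in practice, suppose organizations agreed on a small set of field names and exchanged records as JSON. Everything here is hypothetical: the required and optional field lists, the `validate` and `combine` helpers, and the two made-up newsrooms. No such standard exists that I know of.

```python
import json

# Hypothetical shared standard: field names every organization agrees to use.
REQUIRED_FIELDS = {"name", "organization", "beat"}
# Optional fields let each outlet decide how much it is willing to disclose.
OPTIONAL_FIELDS = {"email", "twitter_username", "education_history"}


def validate(record):
    """Return True if the record has all required fields and no unknown ones."""
    keys = set(record)
    return REQUIRED_FIELDS <= keys and keys <= (REQUIRED_FIELDS | OPTIONAL_FIELDS)


def combine(*exports):
    """Merge JSON exports from several organizations, keeping valid records only."""
    combined = []
    for export in exports:
        combined.extend(r for r in json.loads(export) if validate(r))
    return combined


# Invented exports from two made-up newsrooms.
paper_a = json.dumps([
    {"name": "Reporter A", "organization": "Paper A", "beat": "Politics",
     "email": "a@example.com"},
])
paper_b = json.dumps([
    {"name": "Reporter B", "organization": "Paper B", "beat": "Sports"},
    {"name": "Reporter C", "organization": "Paper B"},  # missing "beat"
])

database = combine(paper_a, paper_b)
print([r["name"] for r in database])
```

Splitting the fields into required and optional is one way to handle the differences in disclosure: every outlet shares the same core, and each chooses what to add beyond it.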
Some outlets may feel comfortable listing phone numbers and e-mail addresses and the employment and education history of their journalists. Others may feel that’s too intrusive. Should they ask their journalists to disclose voting history if they cover politics? What about disclosures regarding any family relations that could create conflicts of interest?
The possibilities for what could be built are intriguing, though not without challenges and questions. Stoll’s idea is a good one, as is Journalisted in the U.K. What I wonder is whether we can move towards some generally agreed-upon elements of disclosure that would bring more consistency to what news organizations share about their journalists. Can we create a structure for these attributes that enables them to be liberated from traditional bios and internal documents and to become live, useful, and contextual?
The biggest question right now is: will news organizations that have made use of open data and voiced their support for the movement be willing to put their attributes where their mouths, and bylines, are?
Correction of the Week
CORRECTION - An earlier version of this story referred to Cain having a ‘cedar-quality’ mustache. The proper term is ‘theater-quality.’ — Politico