Data journalism and information visualization make up a burgeoning field. Every week, Between the Spreadsheets will analyze, interrogate, and explore emerging work in this area. Between the Spreadsheets is brought to you by CJR and Columbia’s Tow Center for Digital Journalism.
Data journalism is often a collaborative process. When Nathan Matias, a research assistant at the MIT Media Lab Center for Civic Media, worked on a project about gender and the media that ran this week, he collaborated with the Guardian’s Datablog to publish the work.
The Datablog team had wanted to run a piece about gender and media. When they began their research, they found Matias was also working on a project in that field. Researchers at the Center for Civic Media regularly blog about their nascent ideas rather than keeping them close so nobody can steal them, and this openness fosters an environment that encourages collaboration. Guardian reporter Lisa Evans, who now works at the Open Knowledge Foundation, got in touch with Matias, he told CJR.
The project was born out of Matias’s interest in gender representation in the news, Matias said. As he began to research, he discovered that women have significantly fewer bylines than men. His research plan from there had three parts. He began by gathering a year’s worth of news stories from three major UK papers: the Daily Mail, the Telegraph, and the Guardian. To do this he created a scraper, a piece of code that takes, or “scrapes,” data from a website. The Daily Mail and Telegraph data are downloadable from their websites, but it was the Guardian’s Open Platform, an open tool that lets developers use the Guardian’s data, that made this step very easy for him, he said.
Matias then obtained demographic information from the UK’s Office for National Statistics, allowing him to determine the gender identification of the authors whose articles he’d collected.
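The article does not spell out how the demographic data was matched to authors. One common approach, assumed here purely for illustration, is to look up each byline's first name in a table of name counts by sex and assign a label only when the name is strongly associated with one sex. The table `NAME_COUNTS`, the function `infer_gender`, and the 0.9 threshold are all hypothetical, and the numbers are made up rather than taken from ONS data.

```python
# Hypothetical counts of people by first name and sex.
# Illustrative numbers only; this is NOT real ONS data.
NAME_COUNTS = {
    "mary": {"F": 980, "M": 20},
    "john": {"F": 5, "M": 995},
    "alex": {"F": 400, "M": 600},
}


def infer_gender(byline, threshold=0.9):
    """Return 'F', 'M', or 'unknown' based on the byline's first name.

    A label is assigned only when at least `threshold` of the people
    with that first name fall under one label in NAME_COUNTS.
    """
    parts = byline.strip().split()
    if not parts:
        return "unknown"
    counts = NAME_COUNTS.get(parts[0].lower())
    if not counts:
        return "unknown"
    total = counts["F"] + counts["M"]
    if total == 0:
        return "unknown"
    for label in ("F", "M"):
        if counts[label] / total >= threshold:
            return label
    # Ambiguous names are left unlabeled rather than guessed.
    return "unknown"


if __name__ == "__main__":
    for name in ["Mary Example", "John Example", "Alex Example"]:
        print(name, infer_gender(name))
```

Leaving ambiguous or unseen names as "unknown" keeps the method conservative: it shrinks the labeled sample rather than introducing guesses into the counts.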
Matias wanted to look at social data too (Facebook likes and shares, for example) because he was primarily interested in how online journalism was changing the gender balance in the media. He wanted to look not only at whether women were writing the articles, but also at whether audiences were sharing them in an unbalanced way. He added social data to his collection using “Amo,” an app built by Knight-Mozilla Fellow Cole Gillespie. Amo retrieves all the social data associated with any URL.
All of this data was then combined using a piece of software Matias wrote specifically for the project. Its final form was presented on the Datablog on Tuesday with an accompanying article that broke it down and explained the display to a general audience.
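Matias's software is not described in detail, so the following is only a rough sketch of what combining the three kinds of data might look like: article records that already carry a paper name, an inferred author gender, and a share count are grouped so that byline counts and average shares can be compared. The record fields, the function `summarize_by_gender`, and the sample values are assumptions made for this example, not his actual code or data.

```python
from collections import defaultdict


def summarize_by_gender(articles):
    """Group article records by (paper, gender).

    Each record is a dict with the hypothetical keys 'paper',
    'gender', and 'shares'. Returns the article count and the
    mean number of shares for every (paper, gender) pair.
    """
    totals = defaultdict(lambda: {"articles": 0, "shares": 0})
    for article in articles:
        key = (article["paper"], article["gender"])
        totals[key]["articles"] += 1
        totals[key]["shares"] += article["shares"]
    return {
        key: {
            "articles": value["articles"],
            "mean_shares": value["shares"] / value["articles"],
        }
        for key, value in totals.items()
    }


if __name__ == "__main__":
    # Made-up sample records for illustration only.
    sample = [
        {"paper": "Guardian", "gender": "F", "shares": 10},
        {"paper": "Guardian", "gender": "F", "shares": 30},
        {"paper": "Guardian", "gender": "M", "shares": 20},
        {"paper": "Telegraph", "gender": "M", "shares": 40},
    ]
    for key, stats in sorted(summarize_by_gender(sample).items()):
        print(key, stats)
```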
By collaborating with the editors at the Datablog, Matias published the piece much faster than he could have in a peer-reviewed journal. Publishing on a newspaper website also allowed his research to be presented in an accessible format. Matias said that Evans, the reporter, worked closely with him on the project, making sure the piece was crafted with a public audience in mind.
Gender inequality in reporter bylines has public interest written all over it. A data dream team is one made up of researchers and journalists—the journalists provide the platform and ability to distill information that comes from a rigorous, researched source.
Disclosure: Anna Codrea-Rado is a former employee of Guardian News and Media.
Anna Codrea-Rado is a digital media associate at the Tow Center for Digital Journalism at the Columbia University Graduate School of Journalism. Follow her on Twitter @annacod.