
Visualizing Data, Telling a Story

Behind the scenes of The Guardian’s interactive WikiLeaks coverage
July 27, 2010

Of the three news outlets that broke the WikiLeaks story on Sunday, The Guardian, on its Web site, incorporated the most interactive and visual elements to help put the mass of data into context for its readers. The Guardian team, like The New York Times and Der Spiegel, had less than a month to analyze, distill, and present as many of the 92,000 classified incident reports as it chose. But how to choose?

According to a post by news editor Simon Rogers on his DataBlog, he and his staff received the data as a 92,201-row Excel spreadsheet, an incredibly unwieldy pile of information. Their first task was to whittle it down to a more manageable size so that they could see more clearly what they were dealing with. Luckily, the programmers on staff had gained experience with an even bigger dataset earlier this year, when they combed through COINS, a record of millions of items in UK government spending, for story ideas.

The editors realized pretty early in the process that certain parts of the data that WikiLeaks provided should not be published, for security reasons. They removed the spreadsheet rows that held sensitive information about individual informants. They also removed incident reports that were too incomplete to be of use. What was left, programmers plugged into an internal database that reporters could easily search by keyword, date, location, or severity.
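The Guardian has not published the code behind its internal database, so the following is only a minimal sketch of the filter-then-search workflow described above. The field names, the `names_informant` flag, and the sample rows are all hypothetical, invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Report:
    # All field names here are assumptions, not the real spreadsheet columns.
    report_id: str
    day: date
    location: str
    severity: int
    summary: str
    names_informant: bool  # hypothetical flag set during editorial review


def publishable(reports):
    """Drop rows flagged as sensitive and rows too incomplete to be useful."""
    return [
        r for r in reports
        if not r.names_informant and r.summary.strip() and r.location.strip()
    ]


def search(reports, keyword=None, start=None, end=None,
           location=None, min_severity=None):
    """Return reports matching every filter that was supplied."""
    results = []
    for r in reports:
        if keyword and keyword.lower() not in r.summary.lower():
            continue
        if start and r.day < start:
            continue
        if end and r.day > end:
            continue
        if location and r.location.lower() != location.lower():
            continue
        if min_severity is not None and r.severity < min_severity:
            continue
        results.append(r)
    return results


# Made-up sample rows, not taken from the leaked data.
reports = [
    Report("r1", date(2009, 5, 1), "Province A", 3, "IED found on road", False),
    Report("r2", date(2009, 6, 2), "Province B", 1, "Meeting with local source", True),
    Report("r3", date(2008, 1, 9), "", 2, "", False),
]

clean = publishable(reports)
hits = search(clean, keyword="ied", start=date(2009, 1, 1))
```

In this sketch the sensitive row and the incomplete row are removed before any reporter can search them, which mirrors the order of operations the editors describe.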

In an interview from London on Tuesday, Rogers recalled how the WikiLeaks assignment started with just a handful of investigative journalists on staff. As the scope of the project and the size of the dataset became clear, though, more and more programmers, designers, and technical staff were pulled in to help.

“News organizations are always terribly unfocused things,” Rogers said by phone. “So it was quite good to have something that was unique and interesting, but one that also really galvanized the whole organization.”

As soon as the data had been wrangled into a more usable format, the next step was to figure out what reporters and graphic designers should focus on to develop into print stories and an online package. At first, they had one reporter assigned to comb through each type of incident: IED, friendly fire, et cetera. It soon became clear that the reports involving IEDs were the most complete set of data, and tended to contain very exact geographic information. The editors decided that this would be a good focus for an easy-to-read visual element for the paper and its Web site.


Rogers also noted that homing in on this type of incident made journalistic sense. “We had to pick a key area, which obviously was IEDs,” he said. “IED use in Afghanistan has been the story of the war, how it’s spiraled out of control, and that could be told visually.”

So the editors handed over a limited dataset of IED reports to a small team of graphic designers: Alastair Dant, Paddy Allen, and Mark McCormick. Dant, who led the project, explained by phone on Tuesday that there was not enough time to actually read through all the incident reports to determine which ones might contain sensitive information, and so, to be safe, they simply cut the summary text from each one. What they were left with was purely quantitative: a list of about 16,500 IED explosions, each with its own time stamp, latitude and longitude, and number and type of associated casualties.
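Cutting the summary text from every report amounts to keeping a fixed list of quantitative columns and discarding everything else. A rough sketch of that idea follows; the column names and the sample row are hypothetical, not the real field names or data.

```python
# Columns assumed safe to publish; these names are illustrative only.
KEEP = ("report_id", "timestamp", "latitude", "longitude",
        "casualty_type", "casualties")


def strip_narrative(row):
    """Return a copy of the row containing only the allow-listed columns."""
    return {key: row[key] for key in KEEP if key in row}


# Made-up sample row, not taken from the leaked data.
row = {
    "report_id": "r1",
    "timestamp": "2009-05-01T07:15:00",
    "latitude": 31.5,
    "longitude": 64.2,
    "casualty_type": "civilian",
    "casualties": 2,
    "summary": "Free-text narrative that might identify someone",
}

stripped = strip_narrative(row)
```

An allow-list like this errs on the side of caution: any column nobody has reviewed is dropped by default, which suits a team that has no time to read each report.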

Designer Paul Scruton illustrated a static image and series of graphs for the print version of The Guardian to show the total number of IED attacks for each year of the war so far. Meanwhile, Dant, Allen, and McCormick worked to build two interactive features for the Web. One is a clickable Google map embedded with 300 “key incidents,” selected by the editors for military significance or narrative content, whether it’s “Marine guards shoot civilian in leg” or “Two border tribes fighting.” The second is an animated Flash timeline of all of the 16,500 IEDs from January 2004 to January 2010.
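The yearly totals behind the print graphs could be produced by a simple tally of time stamps by year. The sketch below assumes ISO-formatted time stamps and uses made-up sample values; it is not the designers' actual method.

```python
from collections import Counter
from datetime import datetime


def attacks_per_year(timestamps):
    """Count incidents per calendar year, returned in chronological order."""
    counts = Counter(datetime.fromisoformat(ts).year for ts in timestamps)
    return dict(sorted(counts.items()))


# Made-up sample time stamps, not taken from the leaked data.
timestamps = [
    "2004-01-15T08:30:00",
    "2004-07-02T14:00:00",
    "2009-11-20T06:45:00",
]

yearly = attacks_per_year(timestamps)
```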

Each format provides a slightly different experience for the reader. “It seems that the simplest piece is the one that has reached people most effectively,” Dant said, judging by the feedback they have gotten from readers so far. “With every item you click on on that map, you’re guaranteed to get something potentially interesting to read. Whereas with the raw data, sixteen and a half thousand items, it can seem far more mundane, obviously in an almost shocking way.”

Dant said that he was slightly worried that piling all 16,500 blips onto a screen would desensitize the reader. Each one of those circles on the timeline has a story, has its own consequences attached, but with so many of them in one place, one could start to feel detached. It was slightly problematic to remove them from their associated narratives, he admitted. On the other hand, there is something to be said for the Flash timeline’s unique emotional effect, an experience caused by the gradual and exponential increase in explosions as the cursor reaches the present day.

The timeline can also illuminate patterns in violence, clustered in time and by geographic area. As Dant pointed out, when curious readers (and reporters) see such clusters, they can pause the playback and click on each explosion to identify it by report number. Then, if they are so inclined, they can look it up in the publicly available data on WikiLeaks and investigate it for themselves.

Dant said he knows that readers are still probably struggling to digest this “flood of information” that has come out since the story broke on Sunday. As of Tuesday, The Guardian’s Web site alone has about fifty stories analyzing the database’s contents. The interactive tools, he hopes, will help both Guardian reporters and readers in the future. The key is to be inspired to look further into the material now available, rather than to be merely overwhelmed by it.

“My hope is that, maybe in time, people will have the chance to sit down and use the interactive tools, and people will start to use it to mine the data a bit…. Potentially people are going to find more stories that way,” said Dant. “I wonder whether this can provide some kind of tool for people to find stories that haven’t really been told yet.”

Update: The first mention of Simon Rogers previously misstated his surname. It has since been corrected.

Lauren Kirchner is a freelance writer covering digital security for CJR. Find her on Twitter at @lkirchner