Of the three news outlets that broke the WikiLeaks story on Sunday, The Guardian made the most use of interactive and visual elements on its Web site to help readers put the mass of data into context. The Guardian team, like The New York Times and Der Spiegel, had less than a month to analyze, distill, and present as many of the roughly 92,000 classified incident reports as it chose. But how to choose?
According to a post by news editor Simon Rogers on The Guardian's DataBlog, which he edits, he and his staff received the data as a 92,201-row Excel spreadsheet, far too unwieldy to work with as it stood. Their first task was to whittle it down to a manageable size so they could see what they were dealing with. Luckily, the programmers on staff already had experience with an even bigger dataset. Earlier this year they had combed through COINS, a database of millions of items of UK government spending, looking for story ideas.
Early in the process, the editors realized that some of the data WikiLeaks provided should not be published, for security reasons. They removed the spreadsheet rows that held sensitive information about individual informants, as well as incident reports too incomplete to be useful. Programmers then loaded what remained into an internal database that reporters could search by keyword, date, location, or severity.
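The following minimal Python sketch shows what that kind of triage might look like in practice. The Guardian has not published its code, so everything here is an assumption. The column names, the `mentions_informant` flag, the numeric severity scale, and the helpers `is_publishable`, `build_index`, and `search` are all hypothetical, and the sample rows are invented. The sketch drops flagged or incomplete rows and loads the rest into an in-memory SQLite table that can be filtered by keyword, date range, region, and severity.

```python
import sqlite3

# Hypothetical column layout; the real WikiLeaks spreadsheet used its own fields.
COLUMNS = ("report_key", "date", "region", "category", "severity", "summary")

# Toy rows invented for illustration only; "mentions_informant" is a hypothetical flag.
SAMPLE_ROWS = [
    {"report_key": "A1", "date": "2008-03-14", "region": "South", "category": "IED",
     "severity": 3, "summary": "IED strike on patrol", "mentions_informant": False},
    {"report_key": "A2", "date": "2009-07-02", "region": "East", "category": "Friendly fire",
     "severity": 2, "summary": "Friendly fire incident", "mentions_informant": False},
    {"report_key": "A3", "date": "2009-08-10", "region": "South", "category": "IED",
     "severity": 1, "summary": "Local source reports IED cache", "mentions_informant": True},
    {"report_key": "A4", "date": "2009-09-01", "region": "South", "category": "IED",
     "severity": 4, "summary": "", "mentions_informant": False},
]


def is_publishable(row: dict) -> bool:
    """Reject rows flagged as naming informants, or missing any required field."""
    if row.get("mentions_informant"):
        return False
    return all(row.get(col) not in (None, "") for col in COLUMNS)


def build_index(rows) -> sqlite3.Connection:
    """Load only publishable rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reports (report_key TEXT PRIMARY KEY, date TEXT, region TEXT, "
        "category TEXT, severity INTEGER, summary TEXT)"
    )
    placeholders = ", ".join("?" * len(COLUMNS))
    conn.executemany(
        f"INSERT INTO reports VALUES ({placeholders})",
        [tuple(row[col] for col in COLUMNS) for row in rows if is_publishable(row)],
    )
    return conn


def search(conn, keyword=None, start=None, end=None, region=None, min_severity=None):
    """Return matching report keys by date; every filter is optional and they combine with AND."""
    clauses, params = [], []
    if keyword:
        clauses.append("summary LIKE ?")  # SQLite LIKE is case-insensitive for ASCII
        params.append(f"%{keyword}%")
    if start:
        clauses.append("date >= ?")  # ISO-format dates compare correctly as strings
        params.append(start)
    if end:
        clauses.append("date <= ?")
        params.append(end)
    if region:
        clauses.append("region = ?")
        params.append(region)
    if min_severity is not None:
        clauses.append("severity >= ?")
        params.append(min_severity)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    sql = f"SELECT report_key FROM reports{where} ORDER BY date"
    return [key for (key,) in conn.execute(sql, params)]
```

Here, `search(build_index(SAMPLE_ROWS), keyword="ied")` returns `["A1"]`. The informant-flagged row and the row with an empty summary never reach the table, so no later query can surface them.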
In an interview from London on Tuesday, Rogers recalled how the WikiLeaks assignment started with just a handful of investigative journalists on staff. As the scope of the project and the size of the dataset became clear, though, more and more programmers, designers, and technical staff were pulled in to help.
“News organizations are always terribly unfocused things,” Rogers said by phone. “So it was quite good to have something that was unique and interesting, but one that also really galvanized the whole organization.”
Once the data had been wrangled into a more usable format, the next step was to decide what reporters and graphic designers should develop into print stories and an online package. At first, the editors assigned one reporter to each type of incident: IEDs (improvised explosive devices), friendly fire, and so on. It soon became clear that the IED reports were the most complete part of the dataset and tended to contain precise geographic information. The editors decided that they would make a good focus for an easy-to-read graphic in the paper and on its Web site.
Rogers noted that homing in on this type of incident also made journalistic sense. “We had to pick a key area, which obviously was IEDs,” he said. “IED use in Afghanistan has been the story of the war, how it’s spiraled out of control, and that could be told visually.”
So the editors handed a limited dataset of IED reports to a small team of graphic designers: Alastair Dant, Paddy Allen, and Mark McCormick. Dant, who led the project, explained by phone on Tuesday that there was not enough time to read every incident report to determine which ones might contain sensitive information. To be safe, the team simply cut the summary text from each one. What remained was purely quantitative: a list of about 16,500 IED explosions, each with a time stamp, a latitude and longitude, and the number and type of associated casualties.
Designer Paul Scruton produced a static graphic and a series of charts for the print edition of The Guardian showing the total number of IED attacks in each year of the war so far. Meanwhile, Dant, Allen, and McCormick built two interactive features for the Web. The first is a clickable Google map marked with 300 “key incidents,” selected by the editors for their military significance or narrative content, ranging from “Marine guards shoot civilian in leg” to “Two border tribes fighting.” The second is an animated Flash timeline of all 16,500 IED explosions from January 2004 to January 2010.
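The reduction Dant describes, and the per-year totals behind the print graphics, can be sketched in a few lines of Python. This is an illustrative assumption, not The Guardian's code. The field names (`timestamp`, `latitude`, `longitude`, `casualties`, `summary`), the helpers, and the toy records are all hypothetical. `monthly_frames` shows one plausible way to bin points for an animated timeline; the actual Flash piece may have worked quite differently.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical quantitative fields kept after the free-text summary is cut.
KEEP = ("timestamp", "latitude", "longitude", "casualties")

# Toy records invented for illustration; coordinates and counts are not real data.
RAW = [
    {"timestamp": "2004-06-01T10:00", "latitude": 31.6, "longitude": 65.7,
     "casualties": {"civilian_wounded": 1}, "summary": "free text"},
    {"timestamp": "2008-03-14T09:30", "latitude": 34.5, "longitude": 69.2,
     "casualties": {"host_nation_killed": 2}, "summary": "free text"},
    {"timestamp": "2008-11-02T17:45", "latitude": 31.6, "longitude": 64.4,
     "casualties": {}, "summary": "free text"},
]


def strip_text(record: dict) -> dict:
    """Keep only the quantitative fields, discarding the summary and anything else."""
    return {key: record[key] for key in KEEP}


def attacks_per_year(records) -> dict:
    """Count records per calendar year, the totals a per-year chart would plot."""
    years = Counter(datetime.fromisoformat(r["timestamp"]).year for r in records)
    return dict(sorted(years.items()))


def monthly_frames(records) -> dict:
    """Group (lat, lon) points by (year, month): one possible frame per month of an animation."""
    frames = defaultdict(list)
    for r in records:
        t = datetime.fromisoformat(r["timestamp"])
        frames[(t.year, t.month)].append((r["latitude"], r["longitude"]))
    return dict(sorted(frames.items()))


clean = [strip_text(r) for r in RAW]
print(attacks_per_year(clean))  # {2004: 1, 2008: 2}
```

Cutting the text first means the counting and mapping steps only ever see numbers and coordinates. The same safety margin applies to whatever is drawn from them.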