Data journalism and information visualization is a burgeoning field. Every week, Between the Spreadsheets will analyze, interrogate, and explore emerging work in this area. Between the Spreadsheets is brought to you by CJR and Columbia’s Tow Center for Digital Journalism.
Behind every great interactive is an even greater chart. Charting, the graphical representation of data, is the crux of any effective information visualization. In March, ProPublica built an interactive diagram about election results that at first glance looks complicated and intricate but is in fact a rather straightforward chart. A linked explanatory blog post helps readers understand the interactive. ProPublica’s, like any other visualization, shouldn’t merely dazzle with its graphic savvy—first and foremost, it must inform.
The ProPublica piece, created by Al Shaw, Kim Barker, and Justin Elliott, untangles a web of Federal Election Commission data. The reporters gathered data from The New York Times’s campaign API, tidied it up, and plotted it as a flowchart. The reason it looks so jazzy (multicolored lines and shapes go to and fro between the columns) is that they used a specific type of flowchart, a Sankey diagram, in which the width of each stream is proportional to the quantity it represents. A flowchart is the natural choice to illustrate the ProPublica piece, because it shows the flow of money from point A (the campaigns and super PACs) to point B (the payees). Whether ProPublica should have used a Sankey diagram specifically, however, is another issue.
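For readers who want to see the proportional-width rule spelled out, here is a minimal Python sketch. It is not ProPublica’s code: the flow names, the dollar amounts, and the `stream_widths` and `outflow_totals` helpers are all hypothetical placeholders, chosen only to show how a stream’s drawn width can be scaled to the amount of money it carries.

```python
# Hypothetical flows of money: (source, payee, dollars).
# These are placeholder values for illustration, not real FEC data.
flows = [
    ("Campaign A", "Payee X", 300_000),
    ("Campaign A", "Payee Y", 100_000),
    ("Super PAC B", "Payee X", 600_000),
]


def stream_widths(flows, max_width_px=60.0):
    """Scale each flow so its width is proportional to its dollar amount.

    The largest flow gets max_width_px; every other flow gets the same
    fraction of that width as its share of the largest amount.
    """
    largest = max(amount for _, _, amount in flows)
    return [
        (source, payee, max_width_px * amount / largest)
        for source, payee, amount in flows
    ]


def outflow_totals(flows):
    """Sum the money leaving each source (the height of its left-hand node)."""
    totals = {}
    for source, _, amount in flows:
        totals[source] = totals.get(source, 0) + amount
    return totals


widths = stream_widths(flows)
totals = outflow_totals(flows)

for source, payee, width in widths:
    print(f"{source} -> {payee}: {width:.1f}px wide")
```

In this sketch a stream carrying half as much money as the largest one is drawn half as wide, which is the property that lets a reader compare flows at a glance.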
Edward Tufte—perhaps the most haloed graphic statistician among the design and tech community—once described a Sankey diagram as “probably the best statistical graphic ever drawn.” Tufte, who is professor emeritus of political science, statistics, and computer science at Yale, wrote a seminal book, The Visual Display of Quantitative Information, which was described by the Boston Globe as “a visual Strunk and White.”
In the book, Tufte referenced Charles Joseph Minard’s chart of Napoleon’s Russian campaign. A French engineer, Minard was one of the earliest pioneers of data visualization. His graph, “Losses of the French Army in the Russian campaign 1812-1813,” is a significant historical document.
Minard’s use of the Sankey diagram to convey his information makes sense. It shows Napoleon’s army’s losses, reconciling multiple variables at once and drawing correlations between them. Minard’s diagram visualizes the change in army size over time, charted against the change in temperature and the army’s route. Because a Sankey diagram is supposed to show the direction of flow as well as rate, it becomes clear when looking at Minard’s chart that Napoleon’s army rapidly dwindled in size as it fought its way to Moscow. This gets the point across to the viewer (more soldiers died as the weather got colder) more quickly and dramatically than a write-up ever could.
The ProPublica piece doesn’t plot as many different variables as Minard’s. Nor do the sizes of the individual flows of money change. The primary statistical reason for this choice of diagram is to emphasize which campaign or super PAC is pumping out the most money. A Sankey diagram isn’t necessary to show that, but it does reflect the nature of the content: right in front of us, visually, is the “tangled web” of campaign and PAC donations the accompanying article describes.
In her book The Wall Street Journal Guide to Information Graphics, Dona Wong boils effective charting down to four principles: research, edit, plot, and review. This means following a process of using independent data sources; identifying the message in the numbers; choosing the right chart to present it; and then finally going back to cross-reference the data with the original source. It’s not as easy as it sounds, and when laid out in such a methodical way, it shows that choosing a chart is just one small piece of the pie.
But it is an important piece. A chart must illuminate facts, draw parallels, and explain something to the reader in a visual vocabulary. If it falls short of this, it will be lost, just like the French army in the winter of 1812.
Anna Codrea-Rado is a digital media associate at the Tow Center for Digital Journalism at the Columbia University Graduate School of Journalism. Follow her on Twitter @annacod.