How reporters used data to cover the World Cup

Some pieces were revealing; others threw out numbers without saying anything

With the 2014 World Cup set to end on Sunday, this week’s edition of Data Darts and Laurels will focus on how journalists used data to explain everything from what’s going on in the games to fan behavior.

To start, we’re going to pay special attention to FiveThirtyEight, given its position as the quantitative storyteller within the ESPN family. A LAUREL to FiveThirtyEight for its ability to analyze the sport without making predictions about the future. Benjamin Morris asserted that Argentina’s Lionel Messi is “impossible”: he is the best player, with numbers so unlikely that he seems to break the laws of statistics. Morris backs up that assertion effectively through a complex, technical analysis showing that Messi is unparalleled in everything from scoring with his weak foot to shooting from afar.

The number-based approach that Nate Silver is known for can prove useful in arguments about sports that are otherwise based on observation or gut feelings. In another piece on the site, Silver explained why the meme-inducing US goalkeeper Tim Howard objectively had the best match of the tournament in the team’s loss to Belgium. His piece about why extra time favors the favorites in a match, while pretty much anything goes once a tied game reaches penalty kicks, is also convincing. Especially in sports, data journalism seems to work best when it helps explain why something might happen rather than when it yields a prediction.

Speaking of predictions, a mix of LAURELS and DARTS to FiveThirtyEight for picking Brazil as 2014 World Cup champions. (To be fair, they weren’t the only ones.) In 2010, shortly before Silver joined The New York Times, he and ESPN developed the Soccer Power Index (SPI), an algorithm that factors in player strength, past performance, and even geography. This year, the algorithm found Brazil to be the overwhelming favorite, and no one can really fault it for that. However, even as the Brazilians made their way to the semifinals through less than stellar play and lost their two best players, the model still predicted Brazil would win ahead of its epic drubbing by Germany. Following the 7-1 loss, Silver could offer little other than the understatement that the match was “the most shocking result in World Cup history.” Objectively, it’s impossible to say what led to the collapse, but the SPI model does not account for things like the minutiae of play and team morale. Brazil’s result is a reminder that the future cannot be fully predicted through data analysis.

LAURELS to pieces that recognize that social media can show how fans are responding to each match as they watch. Twitter’s own data team found that chatter gets quieter during each kick in a penalty shootout. During the United States-Germany game, Deadspin’s Regressing blog wrote about how “Nazi” was used more than 30,000 times on Twitter, especially in the minutes surrounding a goal from Germany, suggesting that stereotypes are common crutches when it comes to trash talk. From a journalism standpoint, these pieces of data don’t require much thoughtful analysis, but they still reveal facets of spectators’ real-time behavior that would otherwise be impossible to see. It will be interesting to see how social media data develops as a real-time source to explain people’s behavior with regard to sports and beyond.

A DART to Bloomberg Businessweek for coming to an incomplete conclusion that “soccer concussions are more frequent than you think.” The piece, by Eric Chemi, the magazine’s head of research, connects some high-profile injuries in the tournament in Brazil with the fact that in American high school athletics, concussions are frequent in soccer, particularly among girls. However, it is unclear in the piece whether the frequency of concussions is due to the popularity of the sport in high school or to soccer players being more concussion-prone than other athletes. This piece exhibits a recurring issue with data journalism that tries to explain a news peg: Rather than truly explaining the issue through statistical analysis, the author simply provides potentially relevant numbers without fully explaining why they are relevant.
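The gap between "how many" and "how likely" can be sketched in a few lines. The figures below are invented purely for illustration and come from neither the Businessweek piece nor any real injury survey; the sport names, the counts, and the `rate_per_10k` helper are all hypothetical. The sketch only shows that the sport with the most concussions in raw terms need not be the one with the highest rate once participation is taken into account.

```python
# Hypothetical figures for illustration only; NOT real injury data.
# Each entry: sport -> (concussions reported, athlete exposures),
# where one exposure is one athlete taking part in one practice or game.
sports = {
    "sport_a": (300, 1_000_000),  # very popular, many total concussions
    "sport_b": (200, 400_000),    # less popular, fewer total concussions
}


def rate_per_10k(concussions, exposures):
    """Return concussions per 10,000 athlete exposures."""
    return concussions * 10_000 / exposures


rates = {name: rate_per_10k(c, e) for name, (c, e) in sports.items()}

# Ranking by raw count and ranking by rate can point to different sports.
most_by_count = max(sports, key=lambda name: sports[name][0])
most_by_rate = max(rates, key=rates.get)

print("Most concussions in total:", most_by_count)
print("Highest rate per 10,000 exposures:", most_by_rate)
```

With these made-up numbers the two rankings disagree, which is exactly the ambiguity the piece leaves unresolved: a raw count alone cannot say whether a sport is risky or merely popular.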

LAURELS to The Wall Street Journal and to The New York Times’ Upshot for offering some offbeat data journalism in their World Cup coverage. WSJ’s Geoff Foster sat through the first 32 of 64 World Cup matches and compiled statistics for “flopping,” or displaying an injury on the field in a way that might benefit the team. This was an interesting example of how reporters can compile their own data sets and offer cogent analyses that add to the surrounding discussion.

Using nearly 20,000 online interviews in 19 countries, the Upshot found that most respondents thought Brazil would win it all. The analysis offered a fascinating glimpse of how “politics, geography and good old schadenfreude” play into rooting interests. Mexicans dislike the Americans. The Greeks hate the Germans. The Japanese and South Koreans hate each other. It also found that most Americans, at least at the beginning of the tournament, didn’t consider themselves “very interested in soccer.” Given the tournament’s high viewership in the United States, parts of this survey would be worth revisiting to figure out whether soccer is truly becoming mainstream here.

Tanveer Ali is a Chicago-based journalist who is Chicago's data reporter and social media producer. He has reported for the Chicago News Cooperative, WBEZ, and GOOD Magazine, among others. A former staff writer at the Detroit News, he received a master's in journalism from the Medill School of Journalism.