Look past the 2016 campaign’s gaudier elements, and you’ll notice a less sexy development across political media: Digital behavior is being quantified. The Associated Press has launched a real-time dashboard charting out political conversation and searches on Twitter and Google, while USA Today has partnered on a similar venture with Facebook. Outlets like The Washington Post and Politico now frequently cite the number of mentions candidates receive on social and traditional media during debates or leading up to votes. And forecasters have begun analyzing Google results as yet another data point in the ongoing quest to predict elections accurately.
Scroll through live coverage of debates—or of voters going to the polls, as they will in a handful of primary contests today—to see this new breed of information in the wild. Such digital metrics offer media a seemingly unprecedented ability to make bird’s-eye observations of wide swaths of Americans. Often published by internet giants or broken down by analytics firms, these data are increasingly accessible, easily digestible, and highly shareable—an enticing cocktail for news organizations constantly craving new content. But they should be handled with care.
Even FiveThirtyEight, the numbers-driven site that often taps the brakes on others’ use of data, ran a “Facebook primary” last month. A series of interactive maps showed what proportion of users “Liked” candidates’ pages in zip codes nationwide. But the short writeup that accompanied the project included an important caveat: “Anything seems possible this year, but, still, be careful how you interpret these numbers: Facebook likes are not votes.” (Emphasis theirs.)
Such is the inherent challenge of using this type of information in political coverage. The data are interesting, certainly, but it remains unclear how far they can be extrapolated to say anything about real-world behavior. Academic research on these metrics is still in its infancy, and journalists with limited time or resources tend to gravitate toward top-line numbers. The combination presents fertile ground for misusing these new data streams.
Among the success stories so far: the use of search trends to predict GOP primary outcomes. When news organizations began crowning winners and losers from four contests last Tuesday, they seemed to confirm what Google search activity the day before had suggested. Ted Cruz led queries for candidates in Idaho, John Kasich saw unexpectedly high interest from users in Michigan, and Marco Rubio posted lackluster Google numbers across all states in play. The final vote tallies would mirror those trends.
Google activity’s accuracy last week came after it similarly helped forecast outcomes in earlier GOP primaries, as economist Justin Wolfers pointed out in a March 1 piece for The New York Times.
But Wolfers and others have also warned of the limitations of this data. National search activity, for example, will show Donald Trump piquing Google users’ interest by large margins. But the real estate mogul draws more ambient interest than his GOP competitors—Have you heard what Trump said this time? To cut through this noise, former Google analyst Seth Stephens-Davidowitz says, prudent observers can compare national data against information from specific states. “If it’s a statewide bump, it’s more likely to be people voting for him, not just people interested in a news story,” he writes in an email.
Localized trend lines, then, are key in analyzing this data. Proximity to voting is another important qualifier. “On the day of the election, [Google searches] turn out to be a pretty close match to the results,” says Patrick Ruffini, co-founder of the opinion research firm Echelon Insights. “Farther out, there’s sort of a decoupling of the Google trends data from the results. But there’s still a correlation.”
Search activity’s predictive accuracy, though, hasn’t necessarily held across partisan lines. In the Democratic primary, Bernie Sanders’ campaign has been fueled by millennial backing, while Hillary Clinton draws a greater share of her support from older generations. “If some of those populations aren’t using Google as much,” Ruffini says, “those numbers could be skewed. On the Republican side, you don’t see as many demographic differences in the vote.”
Other uses of Google Trends would seem to carry fewer risks. Take the explosion in “How to move to Canada” searches after Trump’s Super Tuesday victories, which made for compelling coverage of a particular public sentiment. Still, without absolute numbers readily available, journalists’ observations were limited to the relative growth of such searches, not their prevalence nationwide. How serious those users might be is even more difficult to gauge.
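To see why relative growth says little about prevalence, consider a rough sketch in Python. It assumes search interest is reported on an index where each query's busiest day is scaled to 100, which is how Google Trends presents its figures. The daily counts and the helper `index_to_peak` are invented for illustration; they are not real search data or any official API.

```python
def index_to_peak(counts):
    """Rescale raw counts so the busiest day becomes 100."""
    peak = max(counts)
    return [round(100 * c / peak) for c in counts]

# Hypothetical daily search counts, made up for this example.
niche_query = [20, 40, 60, 400]                    # a few hundred searches
popular_query = [20_000, 40_000, 60_000, 400_000]  # 1,000 times as many

niche_index = index_to_peak(niche_query)
popular_index = index_to_peak(popular_query)

print(niche_index)
print(popular_index)
print(niche_index == popular_index)
```

Both queries produce the same indexed curve, even though one reflects a thousand times more searches than the other. A spike on such a chart shows that interest grew, not how many people were searching.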
Whereas search activity sheds light on Americans’ private curiosities, social media conversation is public by definition. But academics are likewise still learning how to measure and analyze that political speech. A 2013 study by then-Indiana University researcher Joseph DiGrazia suggested that the share of Twitter attention received by nearly 800 congressional candidates in 2010 and 2012 was a statistically significant indicator of their eventual vote shares.
“This holds true even when most of the sentiment is negative,” DiGrazia, now at Dartmouth College, adds in an interview. “You’re not going to go on Twitter to bash someone who’s unlikely to win.”
Still, he cautions against drawing conclusions from Twitter data rather than more traditional polling. “There haven’t been many presidential candidates since we’ve had Twitter, so the sample size [for research] is incredibly small,” DiGrazia says.
For Ruffini, a former digital strategist for the Republican National Committee, Twitter can be a useful comparison of interest between groups. He recalls the first GOP primary debate in August, when Beltway media tweeted relatively sparingly about Ben Carson. At the same time, however, the neurosurgeon drew a huge share of both Google searches and chatter across Twitter. August would be the month Carson began his steady ascent in national polls.
“We use that as an example of elite-public disconnect,” Ruffini says. “If you can isolate who political reporters talked about, and who is talked about more broadly, you can kind of isolate whether a candidate is getting more attention [from voters] than traditional media.”
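One hedged way to picture that comparison, sketched in Python: convert each group's mentions into shares of attention, then subtract. The candidate labels, the counts, and the function names below are all hypothetical, and this is not a description of how Echelon Insights actually runs its analysis.

```python
def attention_share(mentions):
    """Convert raw mention counts into each candidate's share of the total."""
    total = sum(mentions.values())
    return {name: count / total for name, count in mentions.items()}

def disconnect(elite_mentions, public_mentions):
    """Public share minus elite share; a positive gap means the public
    is paying more attention to a candidate than reporters are."""
    elite = attention_share(elite_mentions)
    public = attention_share(public_mentions)
    return {name: public[name] - elite[name] for name in elite}

# Hypothetical mention counts, invented for illustration only.
reporter_tweets = {"Candidate A": 600, "Candidate B": 300, "Candidate C": 100}
public_tweets = {"Candidate A": 5000, "Candidate B": 2000, "Candidate C": 3000}

gaps = disconnect(reporter_tweets, public_tweets)
most_underplayed = max(gaps, key=gaps.get)
print(most_underplayed, round(gaps[most_underplayed], 2))
```

In this made-up case, Candidate C gets 10 percent of reporters' mentions but 30 percent of the public's, a gap of 20 percentage points. The sketch uses shares rather than raw counts because the two groups differ enormously in size.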
Of course, such comparisons are difficult for journalists to pull off in real time. Many instead parrot top-line numbers passed around by the social networks themselves, often in easily embeddable graphics.
Sharing such data is no great harm, as long as there are caveats. Social media conversations provide a small window into which issues people are concerned about at particular moments in time. But user bases aren’t representative of the American public. What’s more, news organizations go a step too far when they portray quantitative measures of Twitter or Facebook reach as evidence that candidates are driving home their message.
While Trump has dominated social media conversation for much of the campaign, early findings by George Washington University researchers suggest Cruz has more efficiently leveraged it for tangible engagement. Cruz’s most widely shared tweet was the announcement of his campaign, which included a video that presented his message clearly. Trump, by contrast, often draws thousands of retweets with naked insults of his competitors or shallow criticisms of individual journalists: scattershot exclamations with the punctuation to prove it. Jeb Bush, meanwhile, drew tens of thousands of retweets by apologizing to his mother for smoking weed.
“Volume is surely going to mean something: you don’t want to be a tree that falls and nobody hears it,” says Lara Brown, one of the George Washington professors who head the project. “At the same time, volume can’t be everything, especially when volume is off-message.”
Then again, perhaps Trump’s sheer dominance of Twitter conversation overcomes the negative sentiment from hordes of critics, or his lack of coherent messaging. That may also be the case with news media mentions, which have similarly begun to find their way into meta-analyses in recent months. Trump draws a huge amount of negative attention from the press en route to dominating its attention overall. It’s hard to say which matters more: the tone of that coverage or its sheer volume.
Journalists shouldn’t over-promise when they cite such data in stories. The known unknowns of these metrics call for a measure of caution, even if the political media environment doesn’t often reward it.