Big data, in the dark

This fall, two compelling stories about politics and “big data” are playing out in the media. The first one you’ve already heard about: in the wake of Barack Obama’s re-election victory, The Atlantic, Time, and many other outlets have introduced readers to the hackers-turned-campaign-aides whose sophisticated algorithms underpinned the Obama campaign’s data-driven voter turnout, persuasion, and fund-raising efforts.

The second story is lower-profile, though potentially no less significant: a congressional Bipartisan Privacy Caucus led by Reps. Ed Markey and Joe Barton is conducting an inquiry into the largely self-regulated companies that collect, analyze, and buy and sell personal data—much the same types of data the Obama campaign relied upon for its celebrated digital successes. After soliciting information from nine of those “data brokers” earlier this year, the caucus has asked those companies and federal regulators to attend a congressional briefing this week. (The Senate Commerce Committee opened a similar probe last month.) “I’m hoping to ratchet up the transparency so we can foster a system of oversight and consumer control over their data,” Markey told The New York Times over the summer.

But there’s less overlap between these two stories than you might expect. As Ad Age’s Kate Kaye pointed out in a November 16 article, even as members of the House and Senate examine data-handling practices that have much in common with those that played a prominent role during the election, the major data firms employed by the campaigns are not among those being scrutinized. “NGP Van, the Democratic data powerhouse favored by the Obama team was not part of the inquiry,” wrote Kaye, who is one of the most experienced reporters on the online political advertising beat. “Other political data firms left out of the inquiry include Catalist, another Democratic data firm; Campaign Grid, which offers Republican data and online ad targeting; and Aristotle, a well-established non-partisan political data company.”

That gap—between the growing importance of sophisticated data manipulation to high-level campaigns, and the scant oversight of the companies and practices that make that data-crunching possible—is at the root of an important challenge now confronting reporters. Some of the core functions of political journalism are to explain to readers what campaigns are doing, and to track—and, as needed, push back against—the messages campaigns disseminate. In a campaign that uses big data to deliver tailored messages, those tasks get harder, for reasons both technical and logistical. And when the use of that data isn’t transparent, they get harder still.

The tension between the rules that politicians propose for commercial data handlers and the ones they abide by themselves has been flagged a few times by reporters on the lookout for it. As Kaye pointed out in the lead of her November 16 article, the Obama administration supports a Do-Not-Track protocol for Internet browsers “that, if pervasive, would throw a wrench into the data-collection tactics that empowered the campaign.” And back in March, a prescient piece by Politico’s Dave Levinthal noted the friction between the administration’s “privacy bill of rights” for Internet users and its own practices. The situation online mirrors that of the do-not-call registry overseen by the Federal Trade Commission: the registry exempts calls made by political campaigns, which are outside the FTC’s jurisdiction.

It’s a dynamic that has some observers predicting that efforts to regulate commercial data mining, by the FTC or other agencies, are doomed to falter. It’s also one that galls privacy advocates, who worry that the political sphere actually creates special concerns warranting heightened scrutiny.

“There’s a unique danger with respect to heightened political data gathering as opposed to run-of-the-mill data for advertising,” said Dan Auerbach of the Electronic Frontier Foundation, an advocacy and litigation group. Political campaigns tend to want to keep data for longer time periods, Auerbach said, and to build more sophisticated personal profiles. He also pointed out that the lack of control citizens have over their own data when used by a campaign is “not in keeping with the principle of letting a user delete their own information” outlined in the administration’s Privacy Bill of Rights.

In part because the campaigns’ practices are not fully transparent, it’s unclear just how much there is to worry about from a privacy perspective. Journalists who have led the way in covering this story say it is important to keep in mind the limits, mostly self-imposed rather than legally required, on how the campaigns use data to send targeted messages. “There’s a lot of talk about how they’re directing specific messages to you [as an individual], but typically they’re not doing that,” Kaye said in an interview. “They put people in groups.” And Slate’s Sasha Issenberg devoted an April column to the tough restrictions the campaigns, wary of “the creepiness factor,” impose on their online advertising efforts. (In response to privacy complaints, the data director for Obama for America penned a Dec. 6 op-ed in The New York Times that ran under the headline “I am not Big Brother”; critics were not all persuaded.)

What is clear, though, is that the current standards for disclosure and transparency around the use of political data and digital campaigning—even on such basic questions as how money is being allocated—make reporting more challenging.

For example, while the Federal Communications Commission has long required TV stations to maintain public files of political ad buys (and those records are now online), there is no equivalent for online ad buys. Campaigns “don’t have to report anything about what localities they’re buying in for online media,” said Kaye. “That’s a huge gap in information.”

“Campaigns only show their X dollars with a consulting firm to buy media—that’s all I know,” she said. “What I’d love, ideally, would be if the FEC would require standards in how things are reported.” That way, it would be clearer if spending went to buy advertisements or towards other media-related purchases, such as outside consulting. “I’d love to know who all the providers are—all the companies at work here,” she said.

As the news site ProPublica tried to tackle similar challenges, it reverse-engineered a solution. Reporter Lois Beckett explained how the site’s Message Machine, a project on online ad targeting, and a piece about a Crossroads GPS online ad relied on crowdsourcing, leaning on readers to report what they were seeing.

“If the campaigns won’t tell you what they’re doing, then work with your readers or people out there to try to get examples,” she said. “Then go back to some of the organizations and ask them, ‘Here’s an ad and here’s who got it in X way.’”

That approach had some success, Beckett said, but “it was limited,” because the campaigns get to decide how much they disclose, and “part of their strategy is no one has the right to know except for them.” Take a look at Beckett’s regularly updated primer on the Obama campaign’s data practices, and the limits of disclosure are apparent. The answers to the first two questions, on the campaign’s data collection practices and on what will happen to the data it harvested, both begin: “It’s still not clear.”

Beckett doesn’t expect a new regulatory regime to fill in the blanks about specific campaign practices anytime soon. “When you ask political data and targeting specialists about the future of data, people who are insiders in this kind of political technology assume that no matter what the regulations are for consumer data tracking, politicians passing these laws will exempt political campaigns,” she said.

If campaigns are going to remain self-regulated, one way for reporters to get up to speed will be to educate themselves about how self-regulation now operates in the commercial sphere. On the privacy issue, for example, Beckett pointed to research by Carnegie Mellon professor Lorrie Cranor, who found that the standards industry groups created to regulate online advertisements may not meet Internet users’ common-sense expectations.

Reporting about big data and campaigns, then, can combine several daunting journalistic tasks: deciphering byzantine disclosure laws and wrestling with data practices that are at times based on high-level computer science. The good news is that there are some familiar points of entry: finding the players involved in decision-making, seeking out the relevant companies, and comparing what we know about campaigns’ actions to existing legislative proposals and to the expectations of ordinary Web users.

This material can get pretty technical, and far afield of the regular campaign beat; debates among international working groups over Do-Not-Track standards that affect cookies’ abilities to anonymize data for microtargeting may not offer much for reporters to grab onto. But in an era when politics consists of pairing church-going habits with online shopping histories to deliver messages designed by behavioral scientists to stir voters’ passions, the press needs to develop new tools and seek out different types of information as it works to hold campaigns accountable. Whether the necessary doors will open to allow journalists to get at that information, though, is still anyone’s guess.

Sam Petulla is a journalist based in New York City and a graduate of the Columbia University Graduate School of Journalism. He covers technology, policy, and the awkward relationship between the two, and has written for The Atlantic, The American Prospect, Wired, and other publications.
