Miami has deep ties to the Caribbean. So when a devastating earthquake struck Haiti on January 12, The Miami Herald mobilized for one of its biggest stories of the year. Reporters were on a flight to the Dominican Republic that night and filing from Haiti the next day. The sense of mission extended to the paper’s Web site, where a special Haiti channel pulled together print coverage as well as video pieces, photo archives, and Twitter feeds from correspondents. Multimedia editor Rick Hirsch thought his site could open a window onto the tragedy for audiences around the world. “Haiti really is a local story for us,” he explains.

According to the Herald’s server logs, his hunch was right: traffic leapt by more than a third in January, to 35 million page views, the only time it broke 30 million in the six months before or after. Nearly 5.9 million different people visited the site that month, another high-water mark for the year.

But not according to comScore, the media measurement firm that, along with rival Nielsen, purports to be the objective word on what Americans do online. ComScore recorded fewer than 9 million page views for the Herald, and barely 1.6 million “unique visitors.” Even more distressing, comScore—whose clients include major advertisers and ad agencies—had the paper’s page views actually declining by 40 percent the month of the earthquake. “Those trends just don’t make sense,” insists Hirsch, whose newspaper subscribes to comScore as well. “We know our traffic went through the roof.”

The open secret of online publishing is that such wild discrepancies are routine. Whether you ask The Washington Post or a stand-alone site like Talking Points Memo (TPM), you’ll hear the same refrain: publishers looking at their own server data (via software like Omniture or Google Analytics) always see much more traffic than is reported by Nielsen and comScore, both of which extrapolate a site’s audience by tracking a small “panel” of Web users, just as Nielsen does for its famous TV ratings.
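To make the panel approach concrete, here is a minimal sketch of how a panel count might be scaled up to a national audience. The function name, the panel size, and the online-population figure are all hypothetical, chosen only for illustration; the actual weighting used by Nielsen and comScore is not described here.

```python
def extrapolate_audience(panel_visitors, panel_size, online_population):
    """Scale the share of panelists who visited a site up to the whole population."""
    if panel_size <= 0:
        raise ValueError("panel_size must be positive")
    # Each panelist stands in for (online_population / panel_size) real people.
    return round(panel_visitors * online_population / panel_size)


# Hypothetical numbers: 200 of 100,000 panelists visited the site,
# and the online population is assumed to be 200 million.
estimate = extrapolate_audience(200, 100_000, 200_000_000)
print(estimate)
```

Under these assumed figures every panelist represents 2,000 people, so one panelist more or fewer moves a site's estimate by 2,000 visitors. That is one reason a panel can diverge sharply from a publisher's own logs.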

“The panel-based numbers are atrocious,” says Kourosh Karimkhany, TPM’s chief operating officer, pointing out that Nielsen and comScore have a hard time measuring workplace Web surfing. “But as long as they’re equally inaccurate for our competitors, it’s okay. It’s something we live with.”

For that matter, the two ratings firms frequently disagree with each other. In May, for example, Gannett’s various properties commanded 37.5 million unique visitors according to comScore, but only 25.6 million according to Nielsen. ComScore gave Washingtonpost.com an audience of 17 million people that month, but Nielsen recorded fewer than 10 million. And so on.

It’s fair to ask how business gets done amid such uncertainty. Who should a site’s sponsors—or for that matter, its journalists—believe?

Publishers say the cacophony scares away advertisers, a conclusion supported by a 2009 McKinsey & Company study commissioned by the Interactive Advertising Bureau (IAB). Executives from Newser and MLB.com told The Wall Street Journal’s “Numbers Guy” columnist last February that undercounting by Nielsen and comScore keeps them off the radar of major advertisers, and hurts their bottom lines.

This messy situation has yielded any number of white papers and task forces; reform efforts are currently under way at the IAB, the Media Ratings Council, and the Newspaper Association of America, among others. Last year CBS, NBC, and Disney led the formation of a “Coalition for Innovative Media Measurement” that seeks to establish a cross-platform standard to gauge total media usage.

In response, comScore has unveiled a new “hybrid” approach that claims to mash up panel results with server-side data for a more accurate count. This is a little ironic, since the raison d’être for the user panels is that server data can’t be trusted because it counts computers, not people, who may visit a site from more than one machine. Whatever the technical merits, one comparison found the “hybrid” counts boost audiences by 30 percent on average; some sites, like The Onion, saw traffic nearly triple. Nielsen has a similar system in the works.
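The objection to server data can be illustrated with a small sketch. The log entries below are invented, and the "person" column is something a real server log never contains; it is included only to show what the server cannot see when it counts cookies.

```python
# Hypothetical visit records: (cookie_id, person).
# The server sees only the cookie; the person is shown for illustration.
visits = [
    ("cookie-home-laptop", "alice"),
    ("cookie-office-pc", "alice"),
    ("cookie-phone", "alice"),
    ("cookie-home-laptop", "alice"),  # a repeat visit from the same machine
    ("cookie-desktop", "bob"),
]


def count_unique(records):
    """Return (distinct cookies, distinct people) for a list of visit records."""
    cookies = {cookie for cookie, _ in records}
    people = {person for _, person in records}
    return len(cookies), len(people)


unique_cookies, unique_people = count_unique(visits)
print(unique_cookies, unique_people)
```

In this made-up log the server would report four "unique visitors" where only two people were reading, because one of them used three machines.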

Does this mean that finally, after fifteen years of mounting chaos in online metrics, a single standard will take hold? That something like the relative clarity of TV ratings will be achieved? Don’t bet on it. No trade group or task force can address the fundamental problem—if it is a problem—of counting online audiences: too much information.

The “banner ad” was standardized by the site HotWired in late 1994. The next step was obvious: HotWired began to report what share of people clicked on each banner, i.e. the “click-through rate,” giving advertisers a new way to think about the impact of their campaigns.

That origin story goes a long way toward explaining the informational mayhem that afflicts online media today. Every visit to, say, Salon or Nytimes.com yields a blizzard of things to measure and count: not just “click-throughs” but “usage intensity,” “engagement time,” “interaction rates,” and of course “page views” and “unique visitors,” to name a few. How deep into the site do visitors go? How long do they stay? Match any numerator to any denominator to make a new metric.

The statistics accumulate not only at the sites you visit, but also in the servers of every advertiser or “content partner” whose material loads on the same Web page. Any of these servers can attach a “cookie” to your browser to recognize when you visit other sites in the same editorial or advertising networks. Data at each tier can be collected and analyzed (thus, measurement firms like Quantcast and Hitwise pull traffic figures from ISPs to come up with their own audience figures).

The Web has been hailed as the most measurable medium ever, and it lives up to the hype. The mistake was to assume that everyone measuring everything would produce clarity. On the contrary, clear media standards emerge where there’s a shortage of real data about audiences.

Nothing illustrates this better than Nielsen’s TV ratings system, which has enjoyed a sixty-year reign despite persistent doubts about its methodology. The company has responded to some critics over the years, for instance by increasing the number of Nielsen households and relying less on error-prone viewer “diaries.” It can’t do much about the most serious charge, that the panel is not a truly random sample and thus fails a basic statistical requirement.

But Nielsen’s numbers are better than nothing at all, and that’s what radio or TV broadcasting offers: no way to detect whether 5,000 people tuned in, or 5 million. With nothing to go on, accuracy matters less than consensus—having an agreed-upon count, however flawed, as long as it skews all networks equally.

Print publications have more hard data—a newspaper knows how many copies it distributes, though not how many people actually read them. So publishers rely on third-party auditors like the Audit Bureau of Circulations to certify the squishy “pass-along” multiples that magically transform a circulation of 192,000 at The Miami Herald, for instance, into a total “readership” of 534,000.

By comparison, computer networks are a paradise of audience surveillance. Why expect media outlets, agencies, and advertisers to abide by the gospel of one ratings firm, to talk about only one number, with so much lovely data pouring in from so many sources? “People use whatever numbers look good that month. It gives publishers some flexibility,” says Kate Downey, director of “audience analytics” at The Wall Street Journal, which subscribes to Nielsen, comScore, Omniture, and Hitwise. “I think if everybody had the same numbers, we would hate that even more.”

There’s another reason for the lack of consensus about audiences on the Web: the numbers don’t matter as much to advertisers. As any Mad Men fan knows, Nielsen’s TV ratings are a kind of currency on Madison Avenue. An extra point or two of penetration translates into millions of dollars over a season. That’s why plot lines peak and the news gets trashier during “Sweeps Week,” when local ad rates are set.

Not so online. In May, comScore gave Yahoo 34 million more unique visitors (167 million) than Nielsen did (133 million). But it probably won’t cost Yahoo a penny if everyone believes the lower number, because Yahoo isn’t selling its total reach. Instead, Yahoo and other sites sell “ad impressions,” or sometimes actual “clicks,” which tally up one by one. Every time a banner loads up in front of you, the advertiser owes a little more money.

Advertisers and agencies still use third-party ratings to plan their campaigns. And sites with demographically appealing audiences, like the Times and the Journal, will flaunt those statistics to entice marketing departments. But this sort of planning is less decisive since advertisers can watch their campaigns play out live and make adjustments on the fly, based on which Web sites send more customers their way.

This is not to say that accuracy is passé. Some number of people was drawn to The Miami Herald’s Haiti coverage, and it would be helpful to know what that number is. “There are a lot of optional, high-cost, high-effort editorial projects a newspaper can choose to pursue,” says Rick Hirsch. “I wish I had the data to guide these editorial choices. Ironically, it’s still like being a traditional editor, making calls based on your gut instinct—you have more data, but it’s conflicting.”

One way through the morass is for publishers to learn to ignore the numbers they don’t trust. It seems inevitable that, over time, this will mean more emphasis on mining their own server stats. For the last year, the Times, Gawker, TPM, and other outlets have been testing a site-analysis tool called ChartBeat that focuses on the last fifteen seconds of activity at their sites: what people are reading, commenting on, searching for, linking to, and Twittering about. One startling revelation at TPM: almost all of the audience drops off before the halfway point of longer pieces. Such real-time diagnostics raise thorny journalistic questions, but they also make monthly site rankings seem irrelevant.

And what about the clarity the industry yearns for? The only way to imbue an audience number with anything like the authority of the old TV ratings is with a new monopoly—if either Nielsen or comScore folds or, more likely, they merge. That kind of authority won’t mean greater accuracy, just less argument. Advertisers don’t need it, and Web sites shouldn’t want it.

This article was adapted from “Chaos Online: How Faulty Metrics Affect Digital Journalism,” a report written by Graves, John Kelly, and Marissa Gluck. It was commissioned by Columbia’s Graduate School of Journalism, and funding for the research was provided by Mary Graham, a member of the school’s Board of Visitors. The full report is available at www.journalism.columbia.edu/onlinedata.

Lucas Graves is an assistant professor in the school of journalism and mass communication at the University of Wisconsin. Follow him on Twitter at @gravesmatter.