How the COVID Tracking Project fills the public health data gap

Alexis Madrigal was in his kitchen on Wednesday, March 4, when his phone rang. It was a warm, sunny evening in Oakland, California, and Madrigal, a staff writer at The Atlantic, was cooking dinner for his two children. On the line was Robinson Meyer, a colleague at The Atlantic, who wanted to talk about the rapid spread of COVID-19.

Madrigal and Meyer had previously discussed how they might help people understand how little testing had been done. This time, Madrigal recalls, Meyer spoke with a new sense of urgency.

“Imagine that we are on the Army Corps of Engineers beat, and it’s five days before [Hurricane] Katrina,” Meyer told him then. “Like, what the eff are we doing here? We don’t have a testing number in the US.”

Madrigal walked outside with his computer and sat down in the sunshine. As he and Meyer talked, they started a Google Doc and created a form letter they could send to health officials. Then they divided the states between them and went to work. Throughout that night, they scoured health department websites and reached out to all fifty states to get the most up-to-date numbers on how many people had been tested so far. In the morning, after a little sleep, they started writing.

Their story, “The Strongest Evidence Yet That America Is Botching Coronavirus Testing,” went live on The Atlantic’s website on March 6.

Among other details that turned up in their research was a verified total of 1,895 people who had been tested for COVID-19 in the US up to that point. 

Moments after the story ran, Madrigal received an email from Jeff Hammerbacher, a data scientist and founder of Related Science, a medication-discovery platform. Madrigal and Hammerbacher had met as freshmen in college, but had since fallen out of regular contact. Hammerbacher was surprised by what Madrigal and Meyer had done; he, too, had independently built a spreadsheet to track the numbers. Madrigal and Hammerbacher agreed to merge their data into a joint effort, which has since snowballed into the COVID Tracking Project, an intensive effort to update the numbers—including a state-by-state count of positive and negative results, among other information—on a daily basis. 

Since its launch, the project has roped in dozens of volunteers—journalists, researchers, developers, and graduate students among them—who pour hours into the project in addition to their full-time jobs and other responsibilities.


Among the project’s goals, Madrigal says, is to put pressure on the decision-makers responsible for getting out tests and reporting results. Although the CDC publishes testing numbers and results for other illnesses, including influenza, the agency has not been providing the same level of detail for COVID-19. At press time, the total number of tests done nationwide, according to the CDC, is less than half what the Tracking Project has compiled.

“It’s just classic accountability journalism,” Madrigal says. “In the old days, you would have published a big feature with these things and then the government would have been like, ‘Okay, fine. We’ll put the numbers out.’ That’s not what has happened.”

 

THE IMPORTANCE OF TESTING PEOPLE for COVID-19 is clear: without testing, there’s no way to know how widespread the virus is, where it is most concentrated, or how great anyone’s risk for infection is. There are clinical reasons, too, says William Schaffner, an infectious disease expert at Vanderbilt University in Nashville. “Clinicians like to know what they’re dealing with when taking care of sick patients, particularly those that are admitted to the hospital,” he says. “Whether it’s influenza or COVID-19, you have to use appropriate infection-control precautions, so when we can care for them, we don’t get sick ourselves.”

It’s also important to know how many tests have been done, because that number offers perspective on prevalence rates, Madrigal says. A handful of positive cases in a city means something different if the city has tested ninety people or nine thousand people. Without that denominator, it may look like one state has more cases simply because it has done more testing.
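The arithmetic behind that point can be sketched in a few lines of Python. This is an illustration only, not the project's own code: the function name `positive_share` is hypothetical, and the figure of nine positive cases is invented to stand in for "a handful."

```python
def positive_share(positives: int, total_tested: int) -> float:
    """Return the fraction of tested people who tested positive."""
    if total_tested <= 0:
        raise ValueError("total_tested must be greater than zero")
    return positives / total_tested


# Hypothetical example: the same nine positive cases, two different denominators.
small_sample = positive_share(9, 90)
large_sample = positive_share(9, 9000)

print(f"9 positives out of 90 tested:   {small_sample:.1%}")
print(f"9 positives out of 9,000 tested: {large_sample:.1%}")
```

With ninety people tested, the nine cases are a tenth of the sample; with nine thousand tested, they are a tenth of one percent.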

Producing an accurate count turned out to be more complicated than asking each state for a tally, as Madrigal discovered when he and Hammerbacher started comparing notes. Madrigal and Meyer had been thinking like journalists: their original spreadsheet, which represented a snapshot in time, was full of quotes from officials and estimates of testing capacities. Hammerbacher, on the other hand, was thinking like a data scientist. He had collected counts for several days in a row, offering a more longitudinal perspective that he updated every afternoon. He had also been keeping extensive notes that explained quirks in the data. For example, many states report how many specimens have been tested, but it is standard to test two specimens for each person. Determining an accurate count of people tested required dividing the total specimen number by two, which he accounted for in his numbers.
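A minimal sketch of that adjustment might look like the following. It assumes, as described above, two specimens per person; the `StateReport` structure, the unit labels, and the sample figures are all hypothetical and are not drawn from Hammerbacher's spreadsheet.

```python
from dataclasses import dataclass

# Assumption taken from the article: two specimens are tested per person.
SPECIMENS_PER_PERSON = 2


@dataclass
class StateReport:
    state: str
    count: int
    unit: str  # hypothetical labels: "people" or "specimens"


def estimated_people_tested(report: StateReport) -> int:
    """Convert a state's reported count into an estimate of people tested."""
    if report.unit == "people":
        return report.count
    if report.unit == "specimens":
        # Integer division: an odd specimen count rounds down.
        return report.count // SPECIMENS_PER_PERSON
    # A state that does not say what it is counting cannot be converted safely.
    raise ValueError(f"unknown unit for {report.state}: {report.unit!r}")


# Invented figures, for illustration only.
reports = [
    StateReport("Example State A", 200, "specimens"),
    StateReport("Example State B", 150, "people"),
]
for r in reports:
    print(r.state, estimated_people_tested(r))
```

Raising an error for an unlabeled count, rather than guessing, mirrors the kind of quirk that would otherwise need an explanatory note.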

Hammerbacher’s version was more robust, Madrigal says, and became the framework for the COVID Tracking Project, which includes data for every state along with extensive notes that explain nuances in the data and reasons why the project’s totals might differ from published state data. Alaska, for example, doesn’t report whether its totals represent specimens or people. And as of March 16, according to the notes, Michigan was only reporting positive tests.

The project quickly took on a life of its own. Too busy to keep updating the database without help, the creators distributed a Google Doc asking for volunteers. The first person to sign up was Erin Kissane, a friend of Meyer’s in Astoria, Oregon, whose background included editing technical books and working as editorial director at OpenNews, where she facilitated collaborations between newsrooms. When COVID-19 started spreading, she went looking for things she could do to help. After a week with the COVID Tracking Project that felt like ten years, she took on the role of managing editor.

The project now includes about sixty-five volunteers, with thirty to forty core contributors communicating daily by Slack. The volunteers—a group that includes journalists, researchers, developers, and graduate students, among others—aren’t paid for their efforts, nor does the COVID Tracking Project receive any funding. Still, they’ve poured hours into the project. After a week on the platform, the group came close to hitting its ten-thousand-message archivable limit—which, Kissane says, Slack generously extended without fee.

Team members compile data throughout the day and enter the numbers along with notes and questions, such as what to do about states that record deaths within their totals of positive tests. Then two double-checkers come through to verify numbers and make decisions about ambiguities. An official update happens at 4 p.m. every day, and smaller tweaks happen throughout the day. Data on the site is open for use by anyone who wants to cite it or use it to create visualizations. Plenty of publications have dived in, including the New York Times, Vox, Politico, and Minnesota Public Radio.

Other test-tracking sites exist, Kissane says, but they use scraping technology to grab numbers from around the Web. The COVID Tracking Project is the only one she knows of that relies on human power to collect and make sense of the numbers. It is an unusual combination of rigorous journalism and technological expertise. “We’re trying to apply journalistic principles of accuracy and human verification over speed, while relying on this pool of heavily technical volunteers who are handling everything that can be automated,” Kissane says. 

 


FOR THE MOST PART, the volunteers have never met in real life. Still, Kissane says, they share a sense of purpose during an uncertain time. “I think for a lot of people, working on what is a kind of dark and frustrating database would not be the best choice, psychologically. And then, for some of us, this seems to be how we are managing our anxieties,” she says. “The idea that we are doing something that may be useful to consumers, to newsrooms, and ultimately to public health authorities who desperately want to publish this information and can’t—that’s kind of weirdly soothing.” 

Collaboration goes beyond the group itself. In the process of collecting data from states, the COVID Tracking team has forged unexpected relationships with public health departments. The team recently sent around a document to state public health departments that explained what they should be publishing to make it easier for the project to compile and compare numbers. Occasionally, public health authorities respond to the team on Twitter. And in some cases, phone calls from the team, along with pressure from other journalists, have helped push states to change the way they report data or the amount of data they release. The group is starting to identify ways that it might be able to help state agencies with IT support and hosting assistance in cases where sites go down.

“It has been super encouraging to see states move from really opaque and patchy reporting to more comprehensive and clear reporting after we’ve asked,” Kissane says. “States are working so hard and, as far as we can tell, with very little support on this specific reporting piece. If we can help them clean up their data, understand what people need to know, and maybe keep their websites from falling over, that’s a piece we definitely want to be doing.”

The project’s future is still unclear. At first, Madrigal says, he thought the CDC might see his and Meyer’s original story and post the numbers he feels sure the agency has. It still hasn’t done so, and so the project goes on. The team is now moving its information from Slack channels into documents it can preserve. And many times a day, Kissane says, cool ideas for side projects come up, like collecting county-level data or policy-related information. If the CDC eventually starts reporting testing data, Kissane says, there might be room to pursue some of those ideas.

“We didn’t mean to be doing this work over the long term, and every day we hope the CDC will put us out of business,” she says. “But until that happens, it appears that we are the closest thing to a human-validated, comprehensive source of this information.”

The kind of transparency the project offers has the potential to reduce collective anxiety, Schaffner believes. “People are anxious about this,” he says, “and one reason they are anxious is that they don’t have very much information about what’s happening in their neighborhood.”

For Madrigal, the phone call from Meyer that started the whole project is an emotional memory. He didn’t know at the time how much was going to happen next, but he realizes now that the call was the beginning of something new. “At that point, no schools were closed. The rest of the world was carrying on as per normal,” he says. “And over the next twenty-four to forty-eight hours—and now for everybody else over the last two weeks—everything kind of changed.”

Only now, he says, is the outside world coming into alignment with where his interior thoughts were in early March. “It feels like the world forked for me after getting that phone call,” he says. “Ever since then, everything has been different.”


Emily Sohn is a freelance journalist in Minneapolis who has written for the Washington Post, New York Times, Nature, National Geographic, NPR, and many other outlets.
