From a lawyer’s perspective, there’s not much difference between what Auernheimer and Spitler did and what Scripps did—both used fairly simple scraping techniques to acquire information that wasn’t protected by a password, firewall, or other security precaution. Spitler’s code generated random numbers in order to test out all possible SIM card identifiers; Wolf emphasized that Scripps didn’t use unscrambling or random number generation in its code. “Nothing we used was sophisticated or required guesswork or isn’t used in other newsrooms in the most basic capacity,” he says. A spokesman for TerraCom emphasized that only a few hundred of the documents Scripps accessed were available through Google, and that accessing the rest required messing around with the URL to find “non-public directories.”

But, as a matter of law, none of these distinctions matter much. “The real issue is: If information is publicly available on the Web, does accessing that information violate the CFAA?” says EFF staff attorney Hanni Fakhoury.

It’s a slippery enough issue that the behavior of the reporter, researcher, or troll accessing the information makes a big difference.

“Ultimately it comes down to the way you disclose the information. We’ve liked the idea that people should responsibly disclose and that they try to go to the company first to resolve the issue,” says EFF’s Fakhoury.

And any reporter using scraping should pay attention to how they approach the task.

“A smart reporter will get in touch—if it’s government data—will get in touch with the agency first,” says Steve Doig, the Knight Chair in Journalism at Arizona State University, who has consulted with a host of publications on computer-assisted reporting. The best approach may be to avoid the issue altogether by asking for a copy of the database. Failing that, reporters should scrape with restraint. “At least set up the script in a way that it doesn’t overload the server,” Doig says. “Have it run in the small hours of the night and have a reasonable rate of requests, so that you’re not doing what’s basically a denial of service attack.”
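What a "reasonable rate of requests" might look like in practice can be sketched in a few lines of Python. This is an illustration only, not the code used by Scripps or any newsroom mentioned here. The class name `PoliteFetcher`, the five-second pause, and the contact address in the User-Agent header are all assumptions made for the example. The right interval depends on the server and on what its operator will tolerate.

```python
import time
import urllib.request


class PoliteFetcher:
    """Fetch pages one at a time, pausing between requests.

    Hypothetical helper for illustration. The default five-second
    interval is an assumed value, not a recommendation from the article.
    """

    def __init__(self, min_interval=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # minimum seconds between requests
        self._clock = clock               # injectable so the timing can be tested
        self._sleep = sleep
        self._last_request = None         # time of the previous request, if any

    def wait_turn(self):
        """Block until min_interval has passed since the last request."""
        now = self._clock()
        if self._last_request is not None:
            remaining = self.min_interval - (now - self._last_request)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_request = now

    def fetch(self, url, timeout=30):
        """Wait for our turn, then download one URL and return its bytes."""
        self.wait_turn()
        request = urllib.request.Request(
            url,
            # Identify the script and give the site a way to reach you.
            # The address below is a placeholder.
            headers={"User-Agent": "newsroom-research-script (reporter@example.org)"},
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()


# Example use (not run here; the URL is a placeholder):
# fetcher = PoliteFetcher(min_interval=5.0)
# page = fetcher.fetch("https://example.org/records/1")
```

The sketch sends one request at a time and never more than one per interval, which keeps the load on the server predictable. Running it in the small hours, as Doig suggests, is a scheduling matter that would typically be left to a tool such as cron rather than handled in the script.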

Doig also cautioned, though, that reporters should pay attention to the terms of service of the website they’re accessing. “Companies that have gathered information may make it available, but their terms of use make it clear that they consider their data proprietary and valuable and they would not take well to scraping what all they have,” he says.

And that’s exactly why groups like EFF are pushing for reform to the CFAA. They argue that it’s so broadly written that violating terms of service like those could, in theory, land a person in prison for years. Operating in good faith and in the service of the public helps. But if a company or a government agency decided to go after a reporter for this type of document diving, it could.

Disclosure: CJR has received funding from the Motion Picture Association of America (MPAA) to cover intellectual-property issues, but the organization has no influence on the content.

Sarah Laskow is a writer and editor in New York City. Her work has appeared in print and online in Grist, Good, The American Prospect, Salon, The New Republic, and other publications.