CW: Far and away the skill that I’ve gotten the most use out of, and the first thing I recommend to anybody who’s interested in this kind of stuff, is “screen-scraping.” There’s a free program called screen-scraper that lets you go through a bunch of Web pages, pull out the data you want, and put it into a spreadsheet. If I had tried to do that by hand, it would have taken days and days of tedious copying and pasting. But with this very simple piece of software, you can teach a computer to do the copying and pasting for you.
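For readers curious what “teaching a computer to do the copying and pasting” can look like in code, here is a minimal sketch in Python using only the standard library. It is not how the screen-scraper program itself works. The class and function names are hypothetical, the step that fetches pages from the Web (for example with urllib.request) is left out, and the two HTML snippets are made-up stand-ins for real pages. The sketch pulls the cell text out of each table row, skips header rows, and writes everything to CSV, which any spreadsheet program can open.

```python
import csv
import io
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Hypothetical helper: collects <td> cell text, grouped by <tr> row.

    Rows made only of <th> header cells end up empty and are skipped.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # ignore header-only rows
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def scrape_pages_to_csv(pages, header):
    """Scrape table rows from every HTML page into one CSV string."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(header)
    for html in pages:
        scraper = TableScraper()
        scraper.feed(html)
        scraper.close()
        writer.writerows(scraper.rows)
    return out.getvalue()


# Made-up sample pages standing in for downloaded Web pages.
pages = [
    "<table><tr><th>City</th><th>Jobs lost</th></tr>"
    "<tr><td>Akron</td><td>1,200</td></tr></table>",
    "<table><tr><th>City</th><th>Jobs lost</th></tr>"
    "<tr><td>Dayton</td><td>950</td></tr></table>",
]
print(scrape_pages_to_csv(pages, header=["City", "Jobs lost"]))
```

Because the csv module handles quoting, a value such as “1,200” stays in a single spreadsheet column instead of splitting at the comma.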
When you build the data yourself, you can be fairly certain that no one else is going to have the story, because no one else has the data. We’ve done that a couple of times, and it becomes proprietary because it’s very hard for another publication to replicate what you’ve done.
Another project, your “Ideological Media Map,” would be the opposite kind of process: you took raw data from a research project and made it visual in a way that would be much more accessible to readers than a spreadsheet. That seems to align with a lot of what Slate does, such as “The Explainer” column.
CW: Sure, that’s taking something that’s out there and presenting it in an appealing way. To me, that’s a perfectly successful project if it lets readers access and understand the data, even if they could download it themselves.
Even after you collect the data, building these interactive elements must be incredibly time-consuming.
DP: One of the things I hope Labs is going to do is develop templates for particular kinds of projects, so that even if it’s a different set of data, you’ve already got the map function and you already know how it’s going to work. So you can create templates and then plug in different kinds of data sets. So we’ll have the job-loss map, and then maybe next time it’s not a map of the country, it’s a map of the world, and it’s not jobs, it’s McDonald’s.
CW: These things do take a fairly long time, much longer than writing an article. But we aspire to build a code library, so that we have all these different tools that are unique to Slate that we can then deploy very quickly if we want to use them again.
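The template idea can be illustrated with a small Python sketch that separates the reusable part of a map (turning numbers into colors) from the data that gets plugged into it. This is a hypothetical illustration, not Slate’s code: the function name, the equal-width binning scheme, the color values, and the sample data are all assumptions made for the example.

```python
from typing import Dict, List


def choropleth_bins(values: Dict[str, float], palette: List[str]) -> Dict[str, str]:
    """Hypothetical map template: give each region a color from `palette`.

    The range between the smallest and largest value is split into
    len(palette) equal-width bins; works for any {region: number} data set.
    """
    if not values:
        return {}
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    n = len(palette)
    colors = {}
    for region, value in values.items():
        if span == 0:
            idx = 0  # every region has the same value
        else:
            # The maximum value would land in bin n, so clamp it to the last bin.
            idx = min(int((value - lo) / span * n), n - 1)
        colors[region] = palette[idx]
    return colors


palette = ["#fee0d2", "#fc9272", "#de2d26"]

# Made-up data: the same template serves a national map and a world map.
job_loss_by_state = {"state_a": 120, "state_b": 480, "state_c": 900}
restaurants_by_country = {"country_x": 14, "country_y": 2, "country_z": 7}

print(choropleth_bins(job_loss_by_state, palette))
print(choropleth_bins(restaurants_by_country, palette))
```

Only the dictionary passed in changes between projects; the binning and coloring logic is written once, which is the kind of reuse a shared code library is meant to provide.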
Do you think these experiments are worth spending time on even if they don’t necessarily have journalistic merit? Some things I see on the Labs server seem to have no real purpose; they’re just fun to play around with. For instance, there’s this “Facebook Name Explorer,” charting all of the first and last names of Facebook users.
DP: I tend to be very liberal about this. When you have people like Chris and Jeremy, they’re going to have lots and lots of ideas. Some of them are going to be hardcore investigative journalism; some are going to be playful. If there are things they want to play around with that aren’t necessarily going to win us a National Magazine Award, that’s okay.
CW: The name map thing is actually pretty interesting. It certainly doesn’t have any news peg, or any argument attached to it, but I do think that names have a kind of sociological importance. For instance, you can see that some last names are associated with first names from a lot of different national origins, and other last names are more strictly tied to very common Biblical names like David or Christopher. I think this could be a tool for people to play around with and come to different conclusions; it’s more than a curiosity in my mind.
So I guess some of these data projects can start out as experiments without specific goals attached, but can then actually generate story ideas, maybe with the help of your readers.