Valuch admitted the process wasn’t perfect, but it showcases some of the techniques that can be used in crowdsourced verification. It’s notable how the team used a mass of unverified reports to achieve accuracy. Ushahidi is a map-driven project, so it chose to cluster the unverified reports to look for patterns, but there are other ways of collecting, analyzing, and presenting this information. The challenge is to find a way to quickly and accurately sort and evaluate a mass of incoming reports according to your preferences. This is a core element of distributed verification, which I called “the best way to engineer trust in today’s information environment” in a previous column about WikiLeaks’ Afghanistan documents.
“The big motivation behind SwiftRiver, to be quite frank, was to solve two problems Ushahidi was having,” he told me by e-mail. “One, how to verify crowd sourced information, and two, how to filter realtime streams of data when it became overwhelming, without sacrificing the integrity of the stream. In other words, how can you speed up the process of vetting information from Twitter, RSS feeds, SMS and email.”
When put in those terms, it’s clear that SwiftRiver has uses beyond the crisis and incident mapping pursued by Ushahidi. Gosier said the goal “is to use algorithms to make humans more efficient at sifting through data. This means using semantic technology to summarize content, the social graph to measure reputation, interaction with content to calculate exactly what type of content the user wants to see more of.”
From his comments and the Haiti example above, you can begin to see the different elements that can aid in the verification of data: location (is the report coming from the right place?); reputation (is the source trusted by me, or by people who are themselves trusted?); content comparison/aggregation (via clustering or other methods that discover patterns); timing (is the report coming at the right time?).
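Those four signals lend themselves to a simple weighted score. The sketch below is only an illustration of the idea, not SwiftRiver’s actual model: the report fields, weights, and distance thresholds are all invented for the example.

```python
from dataclasses import dataclass
from math import exp

# Hypothetical report shape -- invented for illustration.
@dataclass
class Report:
    lat: float
    lon: float
    source: str
    minutes_after_event: float

def location_score(report, event_lat, event_lon, radius_deg=0.5):
    """1.0 at the event location, decaying with crude coordinate distance."""
    d = ((report.lat - event_lat) ** 2 + (report.lon - event_lon) ** 2) ** 0.5
    return max(0.0, 1.0 - d / radius_deg)

def reputation_score(report, trusted):
    """Trusted sources score 1.0; unknown sources get a neutral 0.5."""
    return 1.0 if report.source in trusted else 0.5

def timing_score(report, window_minutes=120.0):
    """Reports arriving closer to the event are weighted higher."""
    return exp(-report.minutes_after_event / window_minutes)

def corroboration_score(report, others, radius_deg=0.1):
    """Fraction of other reports clustered near this one (the pattern signal)."""
    if not others:
        return 0.0
    near = sum(1 for o in others
               if abs(o.lat - report.lat) < radius_deg
               and abs(o.lon - report.lon) < radius_deg)
    return near / len(others)

def verification_score(report, others, event_lat, event_lon, trusted):
    """Weighted blend of the four signals; the weights here are arbitrary."""
    return (0.3 * location_score(report, event_lat, event_lon)
            + 0.3 * reputation_score(report, trusted)
            + 0.2 * timing_score(report)
            + 0.2 * corroboration_score(report, others))
```

Sorting an incoming stream by this score would surface the reports most worth a human’s attention first, which is the hybrid approach Gosier describes below.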
The complicated nature of sifting and verifying a river of information, especially under time constraints, means that total automation is unlikely, or perhaps impossible. “With Swift our goal isn’t to completely automate verification … but Swift tries to help the user deal with preferred content first, and everything else after,” Gosier said.
As for the human element, “We’re betting the [farm] on hybrids. Using algorithms to optimize human interaction. We don’t feel humans can be removed from the process.”
Gosier said “a few” newsrooms are testing out the software. I asked him to explain how a news organization might make use of SwiftRiver. Here’s what he sent back:
In the case of the newsroom, a group of [reporters] can aggregate as much realtime info as they want and trust that the sources the group finds to be most accurate will be the sources that are prioritized. If a newsroom were to run a campaign where they crowdsource, like CNN does with iReport, they can then find those citizen journalists in the crowd who actually add value.
At the core of SwiftRiver is an acknowledgement that accuracy can be a matter of perception, or situation. The tool is meant to enable people to define what accuracy means to them, and then filter based on those parameters.
“If you are a user researching a specific subject or event, certain sources of information and certain types of information are going to be more relevant to you,” Gosier said. “Swift learns from what you prefer, in a given context, and helps you curate information based on what it learns. If you were to use SwiftRiver to curate information in a different context, the results would be different.”
Or, as Valuch put it, “These days, forget about having 100 percent verified information—but you can have trusted sources or things with high probability.”
Correction of the Week