Reporting on large-scale financial decisions is difficult for any journalist, but a team of German reporters has crowdsourced a major investigative story revealing flaws in a closely guarded credit scoring algorithm.
Most citizens in Germany have a Schufa score, which is something like the FICO score in the US. Various bits of consumer data are put through a proprietary algorithm and out comes a risk assessment score indicating creditworthiness. The scores are used to inform financial decisions in all kinds of contexts: from banking and insurance to real estate rentals and other service contracts. But you don’t have to look very hard to find cracks in the accuracy of the scores. Anecdotal reports of discrepant scores, as well as a general lack of transparency and explanation around how the proprietary system operates, raise questions.
Late last November, the results of an almost year-long investigation into Schufa were published by Der Spiegel and the Bavarian Public Broadcaster. The investigation is an example of algorithmic accountability reporting: an attempt to uncover the power wielded by algorithmic decision-making systems and shed light on their biases, mistakes, or misuse. Algorithmic accountability entails understanding how and when people exercise power within and through an algorithmic system, and on whose behalf. Some outlets are introducing coverage that amounts to an emerging algorithms beat, oriented around reverse engineering, auditing, and otherwise critiquing algorithms in society.
I spoke to data journalist Patrick Stotz at Der Spiegel about his team’s process for getting the story. Two nonprofits, the Open Knowledge Foundation and Algorithm Watch, initially partnered to collect data. They crowdsourced thousands of personal credit reports from consumers, which were then passed on to and analyzed en masse by investigative journalists.
The journalists found that Schufa scores privilege older and female individuals, as well as individuals who change addresses less frequently. The analysis also revealed that since 1997 (when the Schufa score was first implemented), there have been four versions of the scoring algorithm. Some people inexplicably received lower scores with newer versions.
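One way to picture the cross-version finding is a check that flags any report whose newest score is lower than its oldest. The sketch below is illustrative only: the function name, the data layout (a mapping from score version to score value), and the toy numbers are all assumptions, not the reporters' actual method or data.

```python
from typing import Dict, List


def dropped_in_newer_version(reports: List[Dict[int, float]]) -> List[int]:
    """Return the indexes of reports whose newest score is below their oldest.

    Each report is assumed to map a score version number to a score value.
    """
    flagged = []
    for index, scores in enumerate(reports):
        if len(scores) < 2:
            # A single score version gives nothing to compare against.
            continue
        oldest, newest = min(scores), max(scores)
        if scores[newest] < scores[oldest]:
            flagged.append(index)
    return flagged


# Invented toy values for illustration; these are not real Schufa scores.
toy_reports = [{1: 97.5, 2: 95.3}, {2: 90.0, 3: 92.1}, {3: 88.0}]
flagged = dropped_in_newer_version(toy_reports)
```

A check like this only surfaces cases worth a closer look; it says nothing about why a score fell.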
Crowdsourcing, data, and unintended consequences
Initially, Stotz says, one of the most challenging aspects was just getting all of the crowdsourced reports into a uniform structured format for analysis. Consumers receive their free reports as a paper print-out in the mail, and then have to digitize the documents themselves (typically by taking a photo on their phone) to upload to the database. Once uploaded, data journalists used optical character recognition (OCR) to extract text information from the forms.
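The step from OCR text to a uniform structure can be sketched roughly as follows. This is a minimal illustration, not the team's pipeline: the article does not describe the report layout, so the "Version N: value %" line pattern, the function name, and the sample text are hypothetical.

```python
import re
from typing import Dict

# Hypothetical line layout. The real Schufa report format is not described
# in the article, so this pattern is an assumption for illustration.
SCORE_LINE = re.compile(r"Version\s+(\d+)\s*:\s*(\d{1,3}(?:[.,]\d+)?)\s*%")


def parse_scores(ocr_text: str) -> Dict[int, float]:
    """Return {version: score_percent} for every score line found in OCR text."""
    scores = {}
    for match in SCORE_LINE.finditer(ocr_text):
        version = int(match.group(1))
        # German documents typically use a decimal comma; normalize to a point.
        scores[version] = float(match.group(2).replace(",", "."))
    return scores


# Invented sample text standing in for OCR output from one uploaded photo.
sample = "Score Version 1: 97,50 %\nScore Version 2: 95,34 %\nunreadable line"
records = parse_scores(sample)
```

Lines the pattern cannot match are simply skipped here; in practice, such cases would likely need manual review, since OCR on phone photos is error-prone.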
The whole process is inefficient: of the 30,000 people who requested their free Schufa report (which takes several weeks to arrive by mail), only about 10 percent eventually uploaded the form. The gap between print and digital creates a substantial roadblock for this type of investigative journalism. (That difficulty should ease as data providers come into belated compliance with the General Data Protection Regulation [GDPR], which requires that information requested by electronic means “shall be provided in a commonly used electronic form.”)
An additional challenge to the digitization process was that no mechanism reminded any of those initial 30,000 people to come back after they had received the report. Out of a concern for data protection and a desire to maintain anonymity, Open Schufa didn’t collect participants’ email addresses. In retrospect, Stotz says that it might have been better to make email notifications optional and allow interested participants to opt in to sharing contact information. That would have also created a channel for additional reporting on the most interesting cases.
The reporters also discovered that, rather than broadening consumer access to information, GDPR may have increased friction in crowdsourced data collection. After the law came into effect last May, investigators noticed that there was actually a reduction in the information provided to consumers in their free credit reports. The new reports were missing, for instance, multiple scores for different versions of the Schufa algorithm, scores that ended up being a vital part of Der Spiegel’s project. (Whether Schufa is interpreting GDPR appropriately by providing less information on credit reports without providing the reports digitally warrants additional attention.)
The ongoing use of different Schufa score versions is one of the more compelling findings exposed via the investigation. Unlike the slow evolution of nature, algorithms can be updated whenever the people controlling them deem it prudent. The capricious nature of updates to algorithms suggests an interesting line of questioning for proponents of algorithmic accountability: When and why do algorithms change versions? Should older versions be assumed inferior and retired? If some stakeholders are still using earlier (inferior) versions, should they be held accountable for misusing the scores?
Journalists should be increasingly attuned to the age of the algorithms they are investigating. When they were created, and how often they are changed, may be pertinent to their ongoing use, especially in light of evolving social contexts and new data availability—algorithmic infrastructure built today could reverberate in society for decades. In this example, the Schufa company decided that version 1 of the score should be retired in June 2018, and version 2 should be retired in June 2019. But perhaps some form of regulation around algorithm versioning should be considered to formalize expectations around the retirement of older, presumably obsolete versions.
Whether another investigation of Schufa will be necessary the next time it introduces a new score remains to be seen. The initial political response to the investigation came from Germany’s Minister of Consumer Protection, who called for more transparency around the variables used in the score and how they’re weighted. If more transparency becomes mandated through new regulations, the path forward might not be additional audits, but rather a focus of journalistic attention on monitoring the information provided in transparency reports.