Tow Center

Q&A: When automation in government services fails

Human Rights Watch researcher Amos Toh talks about how public sector automation has sometimes left people destitute, and how to cover it.

July 25, 2023
 

The public sector has typically been slower than the private sector to adopt new technologies, but automation is increasingly being deployed across a range of areas: replacing manual tasks like document scanning and toll collecting, integrating data systems, and automating complex decision-making processes. A recent report from the University of California, Berkeley’s Labor Center, however, has stressed that public sector automation is a high-stakes challenge. “Many government processes determine fundamental quality of life issues: liberty versus incarceration, essential financial assistance, public safety, and custody of children,” the author, Sara Hinkley, writes. It’s not just about getting automation right for the twenty million people in the US who work for local, state, or federal government—fifteen percent of the workforce; adoption must also work for the far greater number of people who interact with government services.

And in many instances, it’s not working. A rush to find economic efficiencies and a desire to be seen as ahead of the curve have led to the adoption of automated processes plagued with errors. Amos Toh, a senior researcher on artificial intelligence and human rights at Human Rights Watch, has investigated the use of automation in places including Jordan, Texas, and the UK. His findings highlight the human costs of introducing automation that is prone to glitches—or, in some cases, whose errors are less glitches than features. The UK Department for Work and Pensions (DWP), for instance, has introduced an automated assessment period for its flagship “Universal Credit” welfare system—five weeks of unpaid benefits while an automated system quantifies a person’s income—that has been criticized for incorrect measurement and for leaving families destitute. In that case, “the government held up automation to be one of their main goals in and of itself, which was revealing in the sense it affirmed this ethos of techno-solutionism that is baked into the development of Universal Credit and the modernization of the country’s social security system,” Toh told me. The toll of such tech-first thinking in government services, Toh’s research has revealed, is that often “people have to experience quite intense hardship.”

The challenge for reporters is to cover these systems in depth—often a struggle, given their stubborn opacity and officials’ unwillingness to release the underlying data or code—and to document the resulting human costs. Recently, I spoke to Toh over the phone about his work, the future of automation in the public and private sectors, and how journalists can hold algorithms—and the systems they are embedded in—to account. Our conversation has been edited for clarity and length.

JB: In “Automated Neglect,” you looked at the World Bank-funded Unified Cash Transfer Program in Jordan, which applies an algorithm that uses fifty-seven socioeconomic indicators to assess households’ income and wealth, ranking them from least poor to poorest. But you found there were significant errors. Can you summarize the problems that have led to people below the poverty line missing out on vital support?

AT: What we found is that the algorithm is simply unable to capture and do justice to the economic complexity of people’s lives. Essentially, it tries to distill people’s financial hardships into a crude ranking system that pits one household’s needs against another’s for cash support. But the very nature of poverty and economic hardship is that it can be cyclical and unpredictable, and the ranking the algorithm spits out only provides a snapshot in time. This has led to many people being excluded, even though they are suffering from very real poverty. And when we went deeper, we found the algorithm embraces fairly stereotypical and inaccurate notions of what poverty should and shouldn’t look like. If you own a car that is less than five years old, for example, you’re automatically excluded from the program.
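For readers who want a concrete sense of how such a ranking can go wrong, here is a minimal, hypothetical sketch of a proxy-based scoring system with a hard exclusion rule. The National Aid Fund has not released its fifty-seven indicators or their weights, so every indicator name, weight, and score below is invented for illustration; the only detail drawn from Toh’s account is the automatic disqualification for owning a car less than five years old.

```python
# A minimal, hypothetical sketch of a proxy-based ranking with a hard exclusion rule.
# The indicator names and weights are invented; the only rule taken from the interview
# is the automatic exclusion for a car under five years old.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Household:
    monthly_income: float           # self-reported income (hypothetical indicator)
    household_size: int             # counted members (hypothetical indicator)
    car_age_years: Optional[float]  # None if the household owns no car

def welfare_score(h: Household) -> Optional[float]:
    """Return a score (higher = less poor), or None if the household is excluded outright."""
    # Hard exclusion described in the interview: a car newer than five years
    # disqualifies the household, whatever its actual circumstances.
    if h.car_age_years is not None and h.car_age_years < 5:
        return None
    # Hypothetical weighted indicators standing in for the undisclosed fifty-seven.
    return 0.6 * h.monthly_income - 0.2 * h.household_size

applicants = {
    "rural family of six with a three-year-old car used to haul water": Household(90, 6, 3),
    "urban family of three with no car": Household(140, 3, None),
}

# Excluded households never enter the ranking at all, which is how a single
# snapshot rule can override real, ongoing hardship.
for name, h in applicants.items():
    score = welfare_score(h)
    if score is None:
        print(f"EXCLUDED  {name}")
    else:
        print(f"{score:8.1f}  {name}")
```

In this toy version, the rural family Toh describes is dropped before its poverty is ever weighed, while the household with fewer needs is ranked and paid.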

For this report you conducted seventy interviews. Are there people that particularly stood out and highlighted the human cost of this dysfunctional system?

A woman we spoke to, in one of Jordan’s poorest villages, gestured towards the car that her family owns and said, ‘Look at this car. It has destroyed us.’ Even though they often don’t have petrol, and they really just use it to transport water, firewood and other essentials in a rural area with no reliable public transport, it counts against their application. This is the multidimensional reality the algorithm is unable to capture.

You found a strong gendered element, with only heads of household—who in Jordan tend to be men—being awarded benefits. Can you say more about the discriminatory practices the system reproduces?

One of the algorithm’s indicators looks at household size. The more household members, the greater your need, so that seems fairly logical, right? But what we found is that it’s actually a proxy for gender and citizenship discrimination. Jordan’s citizenship law does not allow women to pass on citizenship to their spouse or children on an equal basis with Jordanian men. For example, if you are a family of five headed by a Jordanian woman and a non-Jordanian man, then under this indicator, because only one of you holds citizenship, you would be counted as a family of one. So these types of families end up finding it a lot more difficult to qualify for support—or they get a benefit payment that is unable to meet their needs.

In the UK, the DWP has also deployed automation. In your 2020 report “Automated Hardship” you found this led to systematic errors, one of which was its inability to accurately assess how often people are paid. Can you say more about the problems with this program?

It’s first important to understand that, as in the Jordan case, the UK program is means-tested. What the UK government automated was many parts of this means-testing system, including the calculation of how much income a claimant has coming in during what’s known as the monthly assessment period. What we found is that the system was inaccurately taking people’s income into account, because their pay cycles did not align with the monthly period for which the benefit is assessed. One of the more egregious examples—eventually rectified by the Court of Appeal—was that people paid on a monthly basis ended up having two monthly paychecks taken into account during a single assessment period, which led to a drastic and inaccurate lowering of their benefit. Unfortunately, the Court of Appeal left intact other problems, such as those affecting people on non-monthly pay cycles. The fundamental problem is a misalignment between how the system calculates income and how people actually earn it. And the burden of that misalignment ends up being placed on the claimant.
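To make that mismatch concrete, here is a small, hypothetical sketch. The dates, the four-weekly pay cycle, and the assessment window running from the fifteenth of one month to the fourteenth of the next are all invented for illustration, not taken from the DWP’s actual system; the point is simply that a rigid monthly window can capture two paychecks in one period and none in another.

```python
# A hypothetical sketch of the pay-cycle / assessment-period mismatch.
# The dates, the four-weekly pay cycle, and the 15th-to-14th window are all invented.

from datetime import date, timedelta

PAY_INTERVAL = timedelta(weeks=4)                                    # hypothetical four-weekly wages
paydays = [date(2020, 1, 3) + i * PAY_INTERVAL for i in range(13)]   # thirteen paychecks in a year

def assessment_period(d: date) -> str:
    """Label the monthly assessment period (15th to 14th) that a payday falls into."""
    if d.day >= 15:
        return f"{d.year}-{d.month:02d}"
    last_of_prev = d.replace(day=1) - timedelta(days=1)  # last day of the previous month
    return f"{last_of_prev.year}-{last_of_prev.month:02d}"

# Count how many paychecks land in each fixed monthly window.
counts: dict[str, int] = {}
for payday in paydays:
    period = assessment_period(payday)
    counts[period] = counts.get(period, 0) + 1

for period, n in sorted(counts.items()):
    note = " <- two wages counted; the benefit for this period is slashed" if n > 1 else ""
    print(f"{period}: {n} paycheck(s){note}")
```

Thirteen paychecks spread over twelve fixed windows means that, sooner or later, one window sees two wages and the claimant’s benefit for that month collapses, which is the burden of misalignment Toh describes falling on claimants.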

One thing that struck me was how the UK government used automation to legitimize the five-week wait for Universal Credit applicants. To me that seemed disingenuous: using the fact that people don’t understand these automated systems to justify what was quite an unpopular policy. Do you think that’s fair?

The opacity and the rigid insistence on automation both go hand in hand with entrenching policy approaches that are deeply contested. In the Jordan context, the policy approach is what’s known as ‘poverty targeting,’ a way of directing very specific support to specific groups of people living under the official poverty line. What automation ends up doing in those situations is to justify that approach, and policymakers hold onto automation as a way to address its chronic problems. In the Jordan context, automation also diverted conversations away from far more fundamental issues—like the fact that the official poverty line is not an adequate measure of the number of people struggling to fulfill their economic and social rights.

The discussion around automation also distracts from the fact that the poverty rate, even by the government’s own official measurements (which we contest), has gone up: it was 15.7 percent in 2018; now it’s at 24.1 percent. What that shows is that this poverty-targeted approach has not been an adequate buffer against economic hardship. But both the World Bank and the Jordanian government are holding out better data and better technology as a way to address chronic flaws with poverty targeting, when it is the very approach of poverty targeting that is flawed.

Earlier this year, WIRED and Lighthouse Reports investigated the use of algorithms in Rotterdam to assess who should be investigated for benefit fraud, finding the system was discriminatory on race and gender grounds. But WIRED had unprecedented access to the underlying data, whereas Jordan’s National Aid Fund refused to give you its fifty-seven indicators. When it comes to reporting on automation, how can journalists increase algorithmic accountability?

In algorithmic accountability reporting, there is a very justifiable desire to get hold of the code, to do testing and really pinpoint what is discriminatory or biased or flawed or abusive about an algorithm. But the question is: if you can’t get hold of the algorithm, what can you do? There are two ways of approaching that question. One is to conclude that, if we can’t get hold of the algorithm, maybe there’s nothing worth reporting. The other is to recognize that there are ways to peel back the layers of a technological system that is abusive and problematic. An algorithm doesn’t just exist on its own doing harm. An algorithm sits within a system of large databases and a system of human beings—office workers and staff who are feeding in information or trying to make sense of the algorithm and its underlying data. In the Jordan case, that approach paid off: we pieced together what we knew about the policy and the system in order to see how it was affecting people on the ground. So there are many starting points for accessing a technological system other than obtaining the code of an algorithm.

You’ve documented the impact on mental health, particularly around lack of human contact with these automated systems. What is the psychological cost on the people using them?

What we found [in the UK] is that the opacity of the system, as well as its unpredictability, led to huge variations in people’s benefit amounts, making it very difficult for people to budget. This created a lot of mental distress, because you’re already trying to make do with very little, and any change in your benefit amount has an outsized impact on whether you can feed your children one day or whether you have to go to a food bank the next. That took a huge psychological toll on the people we interviewed.

Introducing automation has often gone hand-in-hand with austerity measures and cost-cutting, which raises the question of access to the internet, to public libraries and so on. Do these digital systems shut some people out?

Whenever there is automation of a system, it’s really helpful to look at how the system is digitized—how applications move online, and what that actually means for vulnerable populations. With both Universal Credit and Jordan’s Unified Cash Transfer Program, the governments ultimately failed to take into account how moving systems online would disproportionately affect people in rural areas, older people, and people who lack digital literacy. In Jordan, this created an informal economy of intermediaries: people who want to apply go to stationery shops that have internet service, known as maktabehs, or to mobile phone shops to cash out their e-wallets, and rely on them to apply. So people are traveling long distances and paying a number of fees that cut into their benefits. This is the hidden toll of digitization and the digital divide.

Automation is used in commercial settings, too, like Uber verifications, and I’ve reported on how these systems can fail precarious workers in the gig economy. How should we think about automation in the private sector? Are we moving towards a system where people close to the bottom of society are going to have a digitally-mediated experience while those at the top can afford more human contact in their services?

Interesting. I think that’s true. It’s really about agency. People who are more privileged economically will be able to exercise more autonomy over whether they interact with automated systems, whereas low-income workers, for example, don’t have nearly as much say. Two years ago, we were in Texas interviewing gig workers. One was a single mother in Houston, who told us about the precarity of gig work and how everything is mediated through an algorithm that controls her wages. She travels long distances for unpredictable and sometimes very low pay. And in Texas at the time, there was a pandemic aid program for independent contractors. She had to verify her identity with the facial recognition software the Texas unemployment agency was using, but there was an error. The agency told her, ‘Sorry, we can’t verify you using the software,’ so she didn’t get access. The fact that both people’s wages and their access to state support—the right to social security—are dictated by an algorithm shows how low-income workers are particularly affected by the evolution towards automation, in both the public and private sectors.

It’s common for policymakers to praise these systems for their adaptability, speed, and frictionlessness. Where do you see us heading on the balance between efficiency and error that policymakers are navigating?

I think efficiency is often a euphemism for cuts to social security programs. It’s true that with automated systems there’s always a risk of error. But I think a lot of the inefficiency of automated systems is actually by design; it serves policy objectives. When you put an automated system between a person and the benefits they’re supposed to get, it creates an appearance of statistical objectivity but can lead to very unjust outcomes. In Jordan, there is debate about people erroneously excluded by the algorithm. But that distracts from the broader problem: that this algorithm is simply incapable of capturing the way people experience economic hardship.

In Jordan’s case, you’ve said “Using better technology and data to try and fix poverty targeting is like trying to replace the hubcaps when the wheels are falling off,” and you propose universalism of social services as a solution. Is universalism something you’d also support beyond Jordan?

I would support universalism globally. And I encourage reporters to look at it. There is sometimes a wariness about reporting on policy solutions, because of a view that it might cross the line of journalistic objectivity. But part of any reporting should look at alternatives. Our position is that universal, category-based schemes—such as universal child benefit, universal unemployment benefits, universal maternity benefits, and so on—should really be the anchor of any social security system. The question for journalists is: if not automation, could universalism—which many civil society organizations, policymakers, and economists put forward—be a meaningful alternative? I think that’s a really good journalistic inquiry to pursue.

 

Header photo: A Department for Work and Pensions Jobcentre Plus. The UK deploys automation as part of its main welfare benefit, Universal Credit. Photo credit: Max. G. / Flickr

