The Apple Card Didn't 'See' Gender—and That's the Problem

The way its algorithm determines credit lines makes the risk of bias more acute.
Goldman Sachs headquarters in New York City. Photograph: Christopher Lee/Bloomberg/Getty Images

The Apple credit card, launched in August, ran into major problems last week when users noticed that it seemed to offer smaller lines of credit to women than to men. The scandal spread on Twitter, with influential techies branding the Apple Card “fucking sexist,” “beyond f’ed up,” and so on. Even Apple’s amiable cofounder, Steve Wozniak, wondered, more politely, whether the card might harbor some misogynistic tendencies. It wasn’t long before a Wall Street regulator waded into the timeline of outrage, announcing that it would investigate how the card works to determine whether it breaches any financial rules.

The response from Apple just added confusion and suspicion. No one from the company seemed able to describe how the algorithm even worked, let alone justify its output. While Goldman Sachs, the issuing bank for the Apple Card, insisted right away that there was no gender bias in the algorithm, it failed to offer any proof. Then, finally, Goldman landed on what sounded like an ironclad defense: The algorithm, it said, had been vetted for potential bias by a third party; moreover, it doesn’t even use gender as an input. How could the bank discriminate if no one ever tells it which customers are women and which are men?

This explanation is doubly misleading. For one thing, it is entirely possible for algorithms to discriminate on gender, even when they are programmed to be “blind” to that variable. For another, imposing willful blindness to something as critical as gender only makes it harder for a company to detect, prevent, and reverse bias on exactly that variable.

The first point is more obvious. A gender-blind algorithm could end up biased against women as long as it’s drawing on any inputs that happen to correlate with gender. There’s ample research showing how such “proxies” can lead to unwanted biases in different algorithms. Studies have shown, for example, that creditworthiness can be predicted by something as simple as whether you use a Mac or a PC. Other variables, such as home address, can serve as a proxy for race. Similarly, where a person shops might conceivably overlap with information about their gender. The book Weapons of Math Destruction, by Cathy O’Neil, a former Wall Street quant, describes many situations where proxies have helped create horribly biased and unfair automated systems, not just in finance but also in education, criminal justice, and health care.
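To make the proxy effect concrete, here is a minimal sketch in Python on synthetic data. It is an illustration of the general mechanism, not a reconstruction of the Apple Card's actual model: a classifier is trained with no gender column at all, yet because one of its inputs correlates with gender (and with past, biased decisions), its approval rates still split along gender lines.

```python
# Minimal sketch on synthetic data: a "gender-blind" model can still produce
# gender-skewed outcomes when one of its inputs correlates with gender.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

# Gender is never shown to the model; it exists here only to generate the data.
is_woman = rng.random(n) < 0.5

# Income is drawn identically for everyone; the proxy feature (think spending
# mix at certain retailers) is strongly correlated with gender.
income = rng.normal(60_000, 15_000, n)
proxy = rng.normal(loc=np.where(is_woman, 1.0, -1.0), scale=0.7)

# Historical approvals were partly driven by the proxy -- i.e., past bias.
past_approval = (0.00003 * (income - 60_000) - 0.8 * proxy
                 + rng.normal(0, 1, n)) > 0

# The model sees only standardized income and the proxy -- no gender column.
X = np.column_stack([(income - income.mean()) / income.std(), proxy])
model = LogisticRegression().fit(X, past_approval)
approved = model.predict(X)

# The gender gap survives even though gender was never an input.
print("approval rate, women:", approved[is_woman].mean())
print("approval rate, men:  ", approved[~is_woman].mean())
```

The point of the toy example is that removing the gender column changes nothing about the outcome; the correlation simply flows through the proxy instead.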

The idea that removing an input eliminates bias is “a very common and dangerous misconception,” says Rachel Thomas, a professor at the University of San Francisco and the cofounder of Fast.ai, a project that teaches people about AI.

This will only become a bigger headache for consumer companies as they become more reliant on algorithms to make critical decisions about customers—and as the public becomes more suspicious of the practice. We’ve seen Amazon pull an algorithm used in hiring due to gender bias, Google criticized for a racist autocomplete, and both IBM and Microsoft embarrassed by facial recognition algorithms that turned out to be better at recognizing men than women, and white people than those of other races.

What this means is that algorithms need to be carefully audited to make sure bias hasn’t somehow crept in. Yes, Goldman said it did just that in last week’s statement. But the very fact that customers’ gender is not collected would make such an audit less effective. According to Thomas, companies must, in fact, “actively measure protected attributes like gender and race” to be sure their algorithms are not biased on them.

The Brookings Institution published a useful report in May on algorithmic bias detection and mitigation. It recommends examining the data fed to an algorithm as well as its output to check whether it treats, say, females differently from males, on average, or whether there are different error rates for men and women.
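An output audit of that kind can be sketched in a few lines, assuming the auditor has the model's decisions, the eventual repayment outcomes, and (crucially) each customer's gender. The field names below are hypothetical placeholders, not columns from any real credit file.

```python
# Sketch of a group-wise output audit: compare approval rates and error
# rates for women vs. men. All arguments are hypothetical boolean arrays.
import numpy as np

def audit(approved, repaid, is_woman):
    """Print approval rate, false-negative rate, and false-positive rate
    for women and for men."""
    approved = np.asarray(approved, dtype=bool)
    repaid = np.asarray(repaid, dtype=bool)
    is_woman = np.asarray(is_woman, dtype=bool)
    for label, mask in (("women", is_woman), ("men", ~is_woman)):
        a, r = approved[mask], repaid[mask]
        approval_rate = a.mean()
        # False negatives: creditworthy applicants who were denied.
        fnr = ((~a) & r).sum() / max(r.sum(), 1)
        # False positives: approvals that later defaulted.
        fpr = (a & ~r).sum() / max((~r).sum(), 1)
        print(f"{label}: approval={approval_rate:.2%}  FNR={fnr:.2%}  FPR={fpr:.2%}")
```

Run against data like the synthetic set in the earlier sketch, a check like this surfaces the gap immediately; without a gender column, it cannot be run at all.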

Without knowing a person’s gender, though, such tests are far more difficult. It may be possible for an auditor to infer gender from known variables and then test for bias on that. But this would not be 100-percent accurate, and it couldn’t show whether any particular person was subject to bias, according to O’Neil, author of Weapons of Math Destruction.
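A toy version of the inference approach O’Neil describes, with made-up name-to-gender probabilities standing in for the census-style frequency tables auditors actually use, might look like the following. It can estimate group-level gaps in aggregate but, as she notes, says little about any individual case.

```python
# Toy proxy-inference audit: infer P(woman) from a known field (first name,
# with made-up probabilities) and compute probability-weighted approval rates.
import numpy as np

# Hypothetical P(woman | first name); real audits would use name-frequency tables.
P_WOMAN_GIVEN_NAME = {"alex": 0.45, "jamie": 0.55, "maria": 0.95, "james": 0.03}

def weighted_approval_rates(names, approved):
    """Probability-weighted approval rates for inferred women vs. inferred men."""
    p_woman = np.array([P_WOMAN_GIVEN_NAME.get(n.lower(), 0.5) for n in names])
    approved = np.asarray(approved, dtype=float)
    rate_women = (p_woman * approved).sum() / p_woman.sum()
    rate_men = ((1 - p_woman) * approved).sum() / (1 - p_woman).sum()
    return rate_women, rate_men

# Tiny made-up example: the aggregate gap shows up, but no single row proves bias.
names = ["Maria", "James", "Alex", "Jamie", "Maria", "James"]
approved = [0, 1, 1, 0, 0, 1]
print(weighted_approval_rates(names, approved))
```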

The Brookings report also recommends hiring legal as well as technical experts to monitor algorithms for unintended bias after they’ve been deployed.

The fact that financial businesses are prohibited by the Equal Credit Opportunity Act from using information such as gender or race in algorithmic decisions may actually make this problem worse by deterring those businesses from collecting this important information in the first place, says Paul Resnick, coauthor of the Brookings report and a professor at the University of Michigan’s School of Information. “It’s not simple to do a meaningful algorithm audit,” he says. “But it is important to do.”

