(Machine) Learning Biases

Cross posting from Ali Minai’s excellent “Barbarikon” blog (this is, of course, Ali Minai’s writing, not mine)

In a recent tweet, Congresswoman Alexandria Ocasio-Cortez – widely known as AOC – responded to a report from Amazon that facial recognition technology sometimes identified women as men when they have darker skin. She said:

When you don’t address human bias, that bias gets automated. Machines are reflections of their creators, which means they are flawed, & we should be mindful of that. It’s one good reason why diversity isn’t just “nice,” it’s a safeguard against trends like this

While I agree with the sentiment underlying her tweet, she is profoundly wrong about what is at play here, which can happen when you apply your worldview (i.e. biases) to things you’re not really familiar with. To be fair, we all do it, but here it is AOC, who is an opinion-maker and should be more careful. The error she makes here, though, is an interesting one, and gets at some deep issues in AI.

The fact that machine learning algorithms misclassify people with respect to gender, or even confuse them with animals, is not because they are picking up human biases as AOC claims here. In fact, it is because they are not picking up human biases – those pesky intuitions gained from instinct and experience that allow us to perceive subtle cues and make correct decisions. The machine, lacking both instinct and experience, focuses only on visual correlations in the data used to train it, making stupid errors such as associating darker skin with male gender. This is also why machine learning algorithms end up identifying humans as apes, dogs, or pigs – with all of whom humans do share many visual similarities. As humans, we have a bias to look past those superficial similarities in deciding whether someone is a human. Indeed, it is when we decide to override our natural biases and sink (deliberately) to the same superficial level as the machine that we start calling people apes and pigs. The errors being made by machines do not reflect human biases; they expose the superficial and flimsy nature of human bigotry.
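As a rough illustration of that point (not taken from the Amazon report or any real system), here is a toy sketch in Python. The feature names `dark_skin` and `cue`, the group counts, and the helper functions are all invented for the example. The learner is deliberately crude: it picks whichever single feature, taken at face value as the prediction, gives the lowest error on a skewed training sample.

```python
# Toy sketch: an error-minimizing learner latching onto a superficial correlation.
# All names and counts below are hypothetical and chosen only for illustration.

def build_dataset():
    """Build a skewed training sample of face records.

    Each group is (dark_skin, is_male, size, number of records where the
    genuinely relevant but noisy 'cue' feature points the wrong way).
    """
    groups = [
        (1, 1, 45, 9),  # darker-skinned men (over-represented)
        (0, 0, 45, 9),  # lighter-skinned women (over-represented)
        (1, 0, 5, 1),   # darker-skinned women (under-represented)
        (0, 1, 5, 1),   # lighter-skinned men (under-represented)
    ]
    rows = []
    for dark, male, size, n_wrong in groups:
        for i in range(size):
            # The cue matches the true label except for the first n_wrong records.
            cue = male if i >= n_wrong else 1 - male
            rows.append({"dark_skin": dark, "cue": cue, "is_male": male})
    return rows


def error_rate(rows, feature):
    """Fraction of rows where predicting is_male = feature value is wrong."""
    wrong = sum(1 for r in rows if r[feature] != r["is_male"])
    return wrong / len(rows)


def fit_stump(rows, features):
    """Pick the single feature with the lowest training error."""
    return min(features, key=lambda f: error_rate(rows, f))


train = build_dataset()
chosen = fit_stump(train, ["dark_skin", "cue"])

# Evaluate the chosen rule on the under-represented subgroup.
dark_women = [r for r in train if r["dark_skin"] == 1 and r["is_male"] == 0]
subgroup_error = error_rate(dark_women, chosen)

print("feature chosen by the learner:", chosen)
print("overall training error:", error_rate(train, chosen))
print("error on darker-skinned women:", subgroup_error)
```

With these made-up counts, the skin-tone rule is right 90% of the time on the training sample, against 80% for the noisier but genuinely relevant cue, so a learner that only minimizes error prefers skin tone and then gets every darker-skinned woman wrong. Nothing in the procedure encodes anyone's prejudice; the failure comes from the skewed sample and the shallowness of the rule.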

There is a deeper lesson in this for humans as well. Our “good” biases are not all just coded in our genes. They are mostly picked up through experience. When human experience becomes limited, we can end up having the same problem as the machine. If a human has never seen a person of a race other than their own, it is completely natural for them to initially identify such a person as radically different or even non-human. That is the result of a bias in the data (experience, in this case), not a fundamental bias in the mind. This is why travelers in ancient times brought back stories of alien beings in distant lands, which were then exaggerated into monstrous figures on maps etc. This situation no longer exists in the modern world, except when humans try to create it artificially through racist policies.

The machine too is at the mercy of data bias, but its situation is far worse than that of a human. Even if it is given an “unbiased” data set that includes faces of all races, genders, etc., fairly, it is being asked to learn to recognize gender (in this instance) purely from pictures. We recognize gender not only from a person’s looks, but also from how they sound, how they behave, what they say, their name, their expressions, and a thousand other things. We deprive the machine of all this information and then ask it to make the right choice. That is a huge data bias, comparable to learning about the humanity of people from distant lands through travelers’ tales. On top of that, the machine also has much simpler learning mechanisms. It is simply trying to minimize its error based on the data it was given. Human learning involves much more complicated things that we cannot even fully describe yet except in the most simplistic or metaphorical terms.

The immediate danger in handing over important decision-making to intelligent machines is not so much that they will replicate human bigotries, but that, within their limited capacities and limited data, they will fail to replicate the biases that make us fair, considerate, compassionate, and, well, human.
