How Math Can Be Racist: Giraffing

You may have heard about AOC catching a lot of flak from conservatives for claiming that computer algorithms can be biased – in the sense of being racist, sexist, et cetera. How, these people asked, can something made of math be biased? It’s math, so it must be objectively correct, right?

Well, any computer scientist or experienced programmer knows right away that being “made of math” guarantees nothing about the accuracy or utility of a program. Math is a lot more of a social construct than most people think. But we don’t need to spend years taking classes in algorithms to understand how and why the types of algorithms used in artificial intelligence systems today can be tremendously biased. Here, look at these four photos. What do they have in common?

[Four photos: ordinary outdoor scenes, each dominated by a strong vertical element. None of them contains a giraffe.]

You’re probably thinking “they’re all outdoors, I guess…?” But they have something much more profound in common than that. They’re all photos of giraffes!

At least, that’s what Microsoft’s world-class, state-of-the-art artificial intelligence claimed when shown each of these pictures. You don’t see any giraffes? Well, the computer said so. It used math to come to this conclusion. Lots of math. And data! This AI learns from photographs, which of course depict the hard truth of reality. Right?

It turns out that mistaking things for giraffes is a very common issue with computer vision systems. How? Why? It’s quite simple. Humans universally find giraffes very interesting. How many depictions of a giraffe have you seen in your life? And how many actual giraffes have you seen? Most people have seen one or two at most, if they’re lucky. But can you imagine seeing a real giraffe and not stopping to take a photo? Everyone takes a photo if they see a giraffe. It’s a giraffe!

The end result is that giraffes are vastly overrepresented in photo databases compared to the real world. Artificial intelligence systems are trained on massive amounts of “real world data” such as labeled photos. This means the learning algorithms see a lot of giraffes… and they come to the mathematically correct conclusion: giraffes are everywhere. One should reasonably expect there might be a giraffe in any random image.
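
To see how that happens, here’s a minimal sketch – a toy model with made-up numbers, not Microsoft’s actual system – of how a simple classifier’s belief about giraffes just tracks the label frequencies in its training data:

```python
from collections import Counter

# Hypothetical label counts standing in for a giraffe-heavy photo
# database. (These numbers are invented purely for illustration.)
training_labels = ["giraffe"] * 500 + ["dog"] * 300 + ["tree"] * 200

counts = Counter(training_labels)
total = sum(counts.values())

# A classifier with no outside knowledge estimates each class's
# prior probability from its frequency in the training set.
for label, n in counts.most_common():
    print(f"P({label}) = {n / total:.2f}")
# P(giraffe) = 0.50 -- the model "learns" that half of all photos
# contain a giraffe, because that's exactly what its data says.
```

Nothing in the math is wrong; the conclusion faithfully reflects the data. The data just doesn’t reflect the world.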

Look at the four photos again. Each of them contains a strong vertical element. The computer vision system has incorrectly come to the belief that long, near-vertical lines in general are very likely to be a giraffe’s neck. This might be a “correct” adaptation if the vision system’s only task were sorting pictures of zoo animals. But since its goal is to recognize everything in the real world, it’s a very bad adaptation. In reality, a giraffe appearing in any given photo is extremely unlikely.
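
You can put numbers on this with a quick Bayes’-rule sketch. The probabilities below are invented for illustration, but the shape of the problem is real: the same “strong vertical line” evidence supports opposite conclusions depending on the prior the model absorbed from its training data:

```python
def p_giraffe_given_vertical(p_giraffe, p_vert_given_giraffe=0.9,
                             p_vert_given_other=0.2):
    """P(giraffe | strong vertical line), via Bayes' rule.
    All probabilities here are hypothetical, chosen for illustration."""
    numerator = p_vert_given_giraffe * p_giraffe
    evidence = numerator + p_vert_given_other * (1.0 - p_giraffe)
    return numerator / evidence

# Prior learned from a giraffe-heavy photo database:
print(p_giraffe_given_vertical(p_giraffe=0.5))   # ~0.82: "probably a giraffe!"

# A more realistic prior for arbitrary photos:
print(p_giraffe_given_vertical(p_giraffe=1e-6))  # ~0.0000045: almost certainly not
```

With the inflated prior, “vertical line, therefore giraffe” is a perfectly sound inference. With a realistic prior, the very same evidence is nearly worthless.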

Now, here’s the clincher: there are thousands and thousands of things that are overrepresented or underrepresented in photo databases. The AI is thoroughly giraffed in more ways than we could possibly guess or anticipate. How do you even measure such a thing? You only have the data you have – the dataset you trained the AI with in the first place.

This is how computer algorithms “made of math” can be sexist, racist, or prejudiced in any other way a human can be. Face photo datasets are highly biased towards certain types of appearances. Datasets about which demographics are most likely to commit crimes were assembled by humans who may have made fundamentally racist decisions about who did and didn’t commit a crime. All datasets have their giraffes. Here’s a real-world example where the giraffe was the name “Jared.”

Any time “a computer” or “math” is involved in making decisions, you need to ask yourself: what’s been giraffed up this time?


Thanks to Janelle Shane, whose tweet showing her asking an AI how many giraffes are in the photograph of The Dress prompted this post.

Please note that Microsoft does try to take steps to correct their computer vision system’s errors, so the photos above may be detected more accurately now than when they were first evaluated by @picdescbot. (They did all still register as giraffes on 31 Jan 2019.)