We Haven’t Come a Long Way, Baby: How Gender Bias Is Rampant in AI

Last year I attended a Women in Technology breakfast at an industry event, and asked my usual question when I see “women in” anything attached to an event: Why? Why are we still talking about this? Why is it necessary to distinguish genders, to recognize a woman-sighting like we are a rare bird or an endangered animal? But here’s the thing — even as I ask those questions, I’m aware that they have a real and valid answer: It’s still necessary. The truth is that there are not enough women in leadership positions, in the STEM fields, or in a lot of other places, and until we reach a point where there is no place we aren’t, we must recognize where we are.

Dr. Judith Spitz of the Initiative for Women in Technology and Entrepreneurship in New York (WiTNY) was the speaker at the breakfast I attended, and she was fantastic — enough to make showing up for a presentation at 7 a.m. worthwhile. But some of the information she relayed was pretty startling. Specifically, she mentioned a study titled “Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings” by a team of researchers from Microsoft Research and Boston University, and yes, it’s exactly what it sounds like. Based on the data being used to train it (Google News articles), artificial intelligence software “exhibit[ed] female/male gender stereotypes to a disturbing extent.”

Just like humans, artificial intelligence learns what it lives — or what it is taught. Natural language understanding, defined by Gartner as “the comprehension by computers of the structure and meaning of human language (e.g., English, Spanish, Japanese), allowing users to interact with the computer using natural sentences,” is essential to AI because of the nuance involved in words and speech patterns. It’s not enough for a computer program to recognize words; to truly function on an advanced level, it must parse those words for meaning, context and intent. Therefore, AI programs must be trained. I got a great lesson in how this works last year when I interviewed Gordon Flesch’s Mike Adams for this article. His example of how the company’s “AskGordy” software, based on IBM’s Watson, was trained on USDA documents explains AI training simply and succinctly:

“USDA documents frequently contain the abbreviation ‘FV.’ Type ‘FV’ into Google and you’ll get a lot of results pertaining to ‘future value’ — the most typical meaning of FV. In agriculture, however, FV usually stands for ‘fruits and vegetables,’ and so for the USDA use-case scenario, Watson was trained to understand that its search is being performed in an agriculture series of documents. Additionally, Watson would know that the term ‘vegetables’ includes individual vegetables — it would also return responses relating to discrete vegetables (zucchini, artichoke, peas) rather than just the aggregate term ‘vegetables.’”
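Watson learns this kind of context statistically rather than from hand-written rules, but a deliberately simplified, rule-based sketch shows what domain awareness does to a query. Everything below — the function name, the term lists, the “agriculture” label — is invented for illustration and is not the actual AskGordy or Watson implementation.

```python
# Hypothetical sketch of domain-aware query expansion; not the actual
# AskGordy/Watson implementation. Term lists are illustrative only.

# Domain-specific expansions a trained system would effectively "know"
AGRICULTURE_ABBREVIATIONS = {
    "fv": ["fruits and vegetables"],  # not "future value" in this corpus
}
AGGREGATE_TERMS = {
    "vegetables": ["zucchini", "artichoke", "peas"],  # discrete members of the aggregate
}

def expand_query(query: str, domain: str = "agriculture") -> list[str]:
    """Expand a raw query into domain-appropriate search terms."""
    terms = [query]
    lowered = query.lower()
    if domain == "agriculture":
        for abbrev, meanings in AGRICULTURE_ABBREVIATIONS.items():
            if abbrev in lowered.split():
                terms.extend(meanings)
        for aggregate, members in AGGREGATE_TERMS.items():
            if aggregate in lowered:
                terms.extend(members)
    return terms

print(expand_query("FV subsidies"))
# ['FV subsidies', 'fruits and vegetables']
print(expand_query("vegetables storage guidance"))
# ['vegetables storage guidance', 'zucchini', 'artichoke', 'peas']
```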

Now, apply this explanation to the previously mentioned study, which used word embedding software — which finds word correlations and patterns in the way words appear next to each other — to analyze a “dictionary” of 3 million words drawn from Google News texts. Word embedding is an essential tool for machine learning, and is probably better illustrated than explained — let’s look at this example from the Microsoft/BU team’s paper: “Given an analogy puzzle, ‘man is to king as woman is to x’ (denoted as man:king :: woman:x), simple arithmetic of the embedding vectors finds that x=queen is the best answer. … Similarly, x=Japan is returned for Paris:France :: Tokyo:x.”
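You can try that analogy arithmetic yourself. The gensim library can download the same publicly available Google News word2vec embedding and solve the puzzles with vector addition and subtraction; here is a minimal sketch, assuming gensim is installed and you’re willing to wait for a model download of well over a gigabyte.

```python
# Minimal sketch of analogy arithmetic over the Google News word2vec embedding.
# Assumes gensim is installed; the pretrained model download is very large.
import gensim.downloader as api

model = api.load("word2vec-google-news-300")  # 3 million words/phrases, 300 dimensions

# man : king :: woman : x  ->  x is the word closest to (king - man + woman)
print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# expected top answer: 'queen'

# Paris : France :: Tokyo : x
print(model.most_similar(positive=["France", "Tokyo"], negative=["Paris"], topn=1))
# expected top answer: 'Japan'
```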

The problem is, the same logic that drew those conclusions found, as the title of the study implies, that when man = computer programmer, woman = homemaker. Further correlations included “father is to doctor as mother is to nurse.”

“One might have hoped that the Google News embedding would exhibit little gender bias because many of its authors are professional journalists,” said the Microsoft/BU team’s paper. Sadly, that was not the case. Other publicly available embeddings trained using other algorithms yielded similar results, leading to the conclusion that word embeddings contain bias that not only reflects, but amplifies stereotypes, posing “a significant risk and challenge for machine learning and its applications.”
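One rough way to see that bias for yourself — loosely in the spirit of the Microsoft/BU team’s analysis, though not their exact method — is to ask how much closer an occupation word sits to “she” than to “he” in the embedding space. A hedged sketch, reusing the same gensim model loaded above:

```python
# Rough probe of gender association in word embeddings, loosely modeled on the
# gender-direction idea in the Microsoft/BU paper; not their exact methodology.
import gensim.downloader as api

model = api.load("word2vec-google-news-300")

def she_minus_he(word: str) -> float:
    """Positive values: the word sits closer to 'she'; negative: closer to 'he'."""
    return model.similarity(word, "she") - model.similarity(word, "he")

for occupation in ["homemaker", "nurse", "programmer", "engineer", "doctor"]:
    print(f"{occupation:12s} {she_minus_he(occupation):+.3f}")
```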

This isn’t the first time computers have discovered the inherent gender bias that permeates our society. A computer science professor who was programming image recognition software found that the software associated pictures of kitchens with women far more often than with men. This led to the research paper “Men Also Like Shopping: Reducing Gender Bias Amplification Using Corpus-level Constraints,” which demonstrates how existing bias can become amplified once it’s fed into a model’s training data. For example, “cooking is over 33 percent more likely to involve females than males in a training set, and a trained model further amplifies the disparity to 68 percent at test time.” It’s like a very sophisticated game of “telephone” in which the players are computers, and instead of misquoting words they’re mis-extrapolating data, to the point that a computer identified a picture of a man at a stove as a woman.
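The amplification measurement itself is just a before-and-after comparison: how often an activity co-occurs with women in the training labels versus in the model’s own predictions. A toy sketch of that bookkeeping follows; the records are invented placeholders, not data from the paper.

```python
# Toy illustration of measuring bias amplification; the example records are
# invented placeholders, not data from the "Men Also Like Shopping" paper.

def female_ratio(records):
    """Fraction of 'cooking' records whose agent is labeled female."""
    cooking = [r for r in records if r["activity"] == "cooking"]
    return sum(r["gender"] == "female" for r in cooking) / len(cooking)

# Ground-truth labels in a (tiny, invented) training set
training_labels = [
    {"activity": "cooking", "gender": "female"},
    {"activity": "cooking", "gender": "female"},
    {"activity": "cooking", "gender": "male"},
]

# What a trained model predicted for similar images at test time
model_predictions = [
    {"activity": "cooking", "gender": "female"},
    {"activity": "cooking", "gender": "female"},
    {"activity": "cooking", "gender": "female"},  # the man at the stove, mislabeled
]

train_bias = female_ratio(training_labels)
test_bias = female_ratio(model_predictions)
print(f"training-set ratio: {train_bias:.2f}, model ratio: {test_bias:.2f}")
print(f"amplification: {test_bias - train_bias:+.2f}")
```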

More recently, Reuters reported that Amazon had scrapped an automated recruiting tool because it demonstrated gender bias, failing to rate resumes neutrally. The big problem with this is that, once again, it was learned behavior. “Amazon’s computer models were trained to vet applicants by observing patterns in resumes submitted to the company over a 10-year period,” reported Reuters. “Most came from men, a reflection of male dominance across the tech industry.” The article notes that this led the tool to penalize resumes that included the word “women’s” as well as graduates of women’s colleges.
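The mechanism Reuters describes is easy to picture: a model trained on historical hiring outcomes learns whatever tokens correlate with past decisions, including a word like “women’s.” The toy bag-of-words classifier below is a deliberately tiny, invented illustration of that failure mode; it is not Amazon’s system, data or methodology.

```python
# Invented toy example of how a resume screener can learn a biased token weight.
# This is not Amazon's system or data; it only illustrates the failure mode.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny synthetic "historical" data: the resumes hired in the past skewed male
resumes = [
    "captain of chess club, software engineering intern",        # hired
    "software engineering intern, hackathon winner",             # hired
    "women's chess club captain, software engineering intern",   # not hired
    "women's coding society lead, hackathon winner",             # not hired
]
hired = [1, 1, 0, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(resumes)
clf = LogisticRegression().fit(X, hired)

# Inspect the learned weight for the token "women" (CountVectorizer lowercases
# and drops the possessive); a negative weight means the word is penalized.
token_index = vectorizer.vocabulary_["women"]
print("weight for 'women':", clf.coef_[0][token_index])
```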

While horrifically problematic, this last example is also sadly emblematic of the tech industry (and, let’s face it, to an extent the business world in general). Computers — which are machines — are not inherently biased; they learn what they are taught. The software studying the Google News articles learned from millions of words written by humans, shaped into phrases that exhibited gender bias. If computers extrapolated from photographs that women belong in the kitchen, you can be sure that humans have done the same.

WiTNY’s website has plenty of statistics illustrating just how that problem presents itself. “While the number of women attending college is at an all-time high, the percentage of those women who graduate with degrees in technology-related disciplines is less than 1 percent, compared with 6 percent of men,” reads one stat. Another: “Over the past 20 years, the percentage of computer science degrees awarded to women has been declining steeply — from 37 percent to 18 percent.”

Fortunately, there are a number of programs seeking to change that. WiTNY focuses on the New York metropolitan area, but other areas have similar organizations. The DreamIT program from CompTIA’s Advancing Women in Technology Community goes even further, with program materials available in the U.S., U.K., Australia and New Zealand. The acronym STEM was virtually unheard of when I was in high school and college; now it’s part of our standard vocabulary.

Is all of this going to make me stop complaining about “Women in Something” events? No. It’s made me realize, though, that I’m bothered not by their existence, but by the problems that they signify. Ever since Majel Barrett created the voice of the computer in Star Trek, we’ve assumed that there is a certain gender equity inherent in technology and that the future would somehow naturally include equality. But what we’ve failed to recognize is that computers are not sentient — even the most intelligent artificial intelligence is created and programmed by humans, and that carries with it years and zettabytes of bias. Even if computers become self-aware (and have you ever seen a movie that makes you think that’s something to look forward to?), they’ll be doing so based on what they’ve learned from humans. Let’s start giving them a better database to draw from.

is BPO Media and Research’s editorial director. As a writer and editor, she has specialized in the office technology industry for more than 20 years, focusing on areas including print and imaging hardware and supplies, workflow automation, software, digital transformation, document management and cybersecurity.