
As employers increasingly use digital tools to process job applications, a new study from the University of Washington highlights the potential for significant racial and gender bias when using AI to screen resumes.

The UW researchers tested three open-source, large language models (LLMs) and found they favored resumes from white-associated names 85% of the time, and female-associated names 11% of the time. Over the 3 million job, race and gender combinations tested, Black men fared the worst, with the models preferring other candidates nearly 100% of the time.

Why do machines have such an outsized bias for picking white male job candidates? The answer is a digital take on the old adage “you are what you eat.”

“These groups have existing privileges in society that show up in training data, [the] model learns from that training data, and then either reproduces or amplifies the exact same patterns in its own decision-making tasks,” said Kyra Wilson, a doctoral student at the UW’s Information School.

Wilson conducted the research with Aylin Caliskan, a UW assistant professor in the iSchool. They presented their results last week at the AAAI/ACM Conference on Artificial Intelligence, Ethics and Society in San Jose, Calif.

The experiment used 554 resumes and 571 job descriptions taken from real-world documents.

The researchers then doctored the resumes, swapping in 120 first names generally associated with people who are male, female, Black and/or white. The jobs included were chief executive, marketing and sales manager, miscellaneous manager, human resources worker, accountant and auditor, miscellaneous engineer, secondary school teacher, designer, and miscellaneous sales and related worker.

The results demonstrated gender and race bias, said Wilson, as well as intersectional bias when gender and race are combined.

One surprising result: the technology preferred white men even for roles that employment data show are more commonly held by women, such as HR workers.

This is just the latest study to reveal troubling biases with AI models — and how to fix them is “a huge, open question,” Wilson said.

It’s difficult for researchers to probe commercial models as most are proprietary black boxes, she said. And companies don’t have to disclose patterns or biases in their results, creating a void of information around the problem.

Simply removing names from resumes won’t fix the issue because the technology can infer someone’s identity from their educational history, cities they live in, and even word choices for describing their professional experiences, Wilson said. An important part of the solution will be model developers producing training datasets that don’t contain biases in the first place.

The UW scientists focused on open-source LLMs from Salesforce, Contextual AI and Mistral. The models chosen for the study were top-performing Massive Text Embedding (MTE) models, a specific type of LLM trained to produce numerical representations of documents so that the documents can be more easily compared to each other. That’s in contrast to LLMs like ChatGPT that are trained for generating language.
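
To make the idea of comparing documents by their numerical representations concrete, here is a minimal sketch in Python. It is an illustration only, not the researchers' code: the short vectors are made-up stand-ins for what an embedding model would output, the function names are hypothetical, and cosine similarity is assumed as the comparison measure because it is a common choice for embeddings. The article does not say which measure the study used.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_resumes(job_vec, resume_vecs):
    """Sort resumes from most to least similar to the job description."""
    scored = [
        (resume_id, cosine_similarity(job_vec, vec))
        for resume_id, vec in resume_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors (assumed values). Real embedding models
# produce vectors with hundreds or thousands of dimensions.
job = [0.9, 0.1, 0.3]
resumes = {
    "resume_a": [0.8, 0.2, 0.4],
    "resume_b": [0.1, 0.9, 0.2],
}

ranking = rank_resumes(job, resumes)
for resume_id, score in ranking:
    print(f"{resume_id}: {score:.3f}")
```

In a screening setup like this, whichever resume scores highest against the job description is the one the system "prefers," so anything that shifts the embedding, such as a name, can shift the ranking.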

The authors noted that numerous previous studies have investigated foundation LLMs for bias, but few have looked at MTEs in this application, “adding further novelty and importance to this study.”

Spokespeople for Salesforce and Contextual AI said the LLMs used in the UW research were not intended for this sort of application by actual employers.

The Salesforce model included in the study was released “to the open source community for research purposes only, not for use in real world production scenarios. Any models offered for production use go through rigorous testing for toxicity and bias before they’re released, and our AI offerings include guardrails and controls to protect customer data and prevent harmful outputs,” said a Salesforce spokesperson by email.

Jay Chen, vice president of marketing for Contextual AI, said the LLM used was based on technology from Mistral and is not a commercial Contextual AI product.

“That being said, we agree that bias and ethical use of AI is an important issue today, and we work with all of our customers to mitigate sources of bias in our commercial AI solutions,” Chen said by email.

Mistral did not respond to GeekWire’s request for a comment.

While the prevalence of bias in different software solutions for screening resumes is not known, some elected leaders are taking initial steps to help address the issue.

In a move to provide more comprehensive safeguards against discrimination, California passed a state law making intersectionality a protected characteristic, in addition to identities such as race and gender alone. The rule is not specific to AI-related biases.

New York City has a new law requiring companies using AI hiring systems to disclose how they perform. There are exemptions, however, if humans are still involved in the process.

But in an ironic twist, keeping humans in the process can potentially make the selections even more biased, Wilson said, as people will sometimes put more trust in a decision from technology than in one from another person. Her next research will focus on how human decision makers are interacting with these AI systems.