A PhD student from Oregon State University and researchers at Adobe have developed a new, cost-effective training technique for artificial intelligence systems that aims to make them less socially biased.
Eric Slyman of the OSU College of Engineering and the Adobe researchers call the new method FairDeDup, short for fair deduplication. Deduplication means removing redundant information from the data used to train AI systems, thus reducing the high computational costs of training.
Datasets collected from the Internet often contain biases that are present in society, the researchers said. When these biases are captured in trained AI models, they can serve to perpetuate unfair ideas and behavior.
By understanding how deduplication affects the prevalence of bias, it is possible to mitigate negative effects, such as an AI system that automatically returns only photos of white men when asked for a picture of a CEO or doctor, when the intended use is to show diverse representations of people.
“We called it FairDeDup as a play on words for a previous cost-effective method, SemDeDup, which we improved by incorporating fairness considerations,” Slyman said. “While previous research has shown that removing this redundant data can enable accurate AI training with fewer resources, we find that this process can also exacerbate the harmful social biases that AI often learns.”
Slyman presented the FairDeDup algorithm last week in Seattle at the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
FairDeDup works by thinning image-caption datasets collected from the Internet through a process known as pruning. Pruning refers to choosing a subset of the data that is representative of the entire dataset. When done in a content-aware manner, pruning enables informed decisions about which parts of the data stay and which go.
“FairDeDup removes redundant data and integrates auditable, human-defined dimensions of diversity to reduce bias,” Slyman said. “Our approach enables AI training that is not only cost-effective and accurate, but also fairer.”
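To make the idea concrete, below is a minimal, hypothetical sketch of fairness-aware pruning in the spirit described above: near-duplicate samples are clustered, and one representative is kept per cluster, with ties broken in favor of whichever group along a human-defined diversity dimension is least represented so far. The function name `fair_prune`, the `groups` labels, and the clustering choice are all illustrative assumptions, not the authors' actual FairDeDup implementation.

```python
# Hypothetical sketch of fairness-aware dataset pruning; not the authors' code.
import numpy as np
from sklearn.cluster import KMeans

def fair_prune(embeddings, groups, n_clusters, rng=None):
    """Keep one representative per cluster of near-duplicates,
    preferring samples from the group least represented so far.

    embeddings : (N, D) array of image-caption embeddings (e.g. from CLIP)
    groups     : length-N labels along a human-defined diversity dimension
    """
    rng = rng or np.random.default_rng(0)
    # Group near-duplicate samples; SemDeDup-style methods cluster embeddings.
    clusters = KMeans(n_clusters=n_clusters, n_init=10,
                      random_state=0).fit_predict(embeddings)
    kept, counts = [], {}
    for c in range(n_clusters):
        members = np.flatnonzero(clusters == c)
        if members.size == 0:
            continue
        # Score each member by how many samples of its group are already kept;
        # plain deduplication would instead pick a representative arbitrarily.
        scores = np.array([counts.get(groups[i], 0) for i in members])
        ties = members[np.flatnonzero(scores == scores.min())]
        best = int(rng.choice(ties))
        kept.append(best)
        counts[groups[best]] = counts.get(groups[best], 0) + 1
    return kept
```

The fairness-aware tie-breaking is the only change from plain semantic deduplication here, which mirrors the paper's framing: the pruning step that cuts cost is also the step where diversity considerations can be enforced.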
In addition to occupation, race, and gender, other biases perpetuated during training may include age, geography, and culture.
“By addressing biases during data set pruning, we can create AI systems that are more socially just,” Slyman said. “Our work does not force AI to follow our own prescribed idea of fairness, but rather creates a path to prompt AI to act fairly when contextualized within particular settings and user groups in which it is deployed. We let people define what is fair in their environment instead of letting the internet or other large-scale data sets decide that.”
Working with Slyman were Stefan Lee, an assistant professor in the OSU College of Engineering, and Adobe's Scott Cohen and Kushal Kafle.