Research shows why AI models that analyze medical images can be biased

July 1, 2024

Artificial intelligence models often play a role in medical diagnosis, especially when it comes to analyzing images such as X-rays. However, research has shown that these models don't always perform well across all demographics, and they tend to do worse among women and people of color.

These models also appear to develop some surprising skills. In 2022, MIT researchers reported that AI models can make accurate predictions about a patient's race based on their chest X-rays — something the most skilled radiologists cannot do.

That research team has now found that the models that are most accurate at making demographic predictions also exhibit the largest “fairness gaps” — that is, discrepancies in their ability to accurately diagnose images of people of different races or genders. The findings suggest that these models may be using “demographic shortcuts” when making their diagnostic evaluations, leading to incorrect results for women, Black people, and other groups, the researchers say.

“It is well known that high-capacity machine learning models are good predictors of human demographics such as self-reported race, gender, or age. This paper demonstrates that capacity again, and then links that capacity to the lack of performance across different groups, which has never been done before,” says Marzyeh Ghassemi, an associate professor of electrical engineering and computer science at MIT, a member of MIT's Institute for Medical Engineering and Science, and the study's senior author.

The researchers also found that they could retrain the models in a way that improved their fairness. However, their “debiasing” approach worked best when the models were tested on the same types of patients they were trained on, such as patients from the same hospital. When these models were applied to patients from different hospitals, the fairness gaps reappeared.

“I think the main takeaways are, first, you should thoroughly evaluate any external model on your own data, because any fairness guarantees that model developers provide on their training data may not transfer to your population. Second, whenever sufficient data is available, you should train models on your own data,” says Haoran Zhang, an MIT graduate student and one of the lead authors of the new paper. MIT graduate student Yuzhe Yang is also a lead author of the paper, which appears in Nature Medicine. Judy Gichoya, an associate professor of radiology and imaging sciences at Emory University School of Medicine, and Dina Katabi, the Thuan and Nicole Pham Professor of Electrical Engineering and Computer Science at MIT, are also authors of the paper.

Removing bias

As of May 2024, the FDA has approved 882 AI-enabled medical devices, 671 of which are designed for use in radiology. Since 2022, when Ghassemi and her colleagues showed that these diagnostic models can accurately predict race, she and other researchers have shown that such models are also very good at predicting gender and age, even though the models are not trained to do those tasks.

“A lot of popular machine learning models have superhuman demographic prediction capabilities — radiologists can’t detect self-reported race in a chest X-ray,” Ghassemi says. “These are models that are good at predicting disease, but as they train, they learn to predict other things that may not be desirable.” In this study, the researchers wanted to see why these models don’t work as well for certain groups. In particular, they wanted to see if the models were taking demographic shortcuts to make predictions that ended up being less accurate for some groups. These shortcuts can arise in AI models when they use demographic features to determine whether a medical condition is present, instead of relying on other features of the images.

Using publicly available chest X-ray datasets from Beth Israel Deaconess Medical Center (BIDMC) in Boston, the researchers trained models to predict whether patients had one of three medical conditions: fluid buildup in the lungs, collapsed lung, or enlargement of the heart. They then tested the models on X-rays held out from the training data.

Overall, the models performed well, but most showed “fairness gaps”: discrepancies between accuracy rates for men and women, and for white and Black patients.

The models were also able to predict the gender, race, and age of the X-ray subjects. Furthermore, there was a significant correlation between each model's accuracy in making demographic predictions and the size of its fairness gap. This suggests that the models may be using demographic categorizations as a shortcut to make their disease predictions.
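To make that measurement concrete, here is a minimal sketch of how such a fairness gap could be quantified: the spread between the best and worst per-group AUROC of a diagnostic classifier. The metric choice, the grouping variable, and the synthetic data are illustrative assumptions, not the paper's exact evaluation protocol.

```python
# Minimal sketch: a "fairness gap" as the spread in per-group AUROC.
# The metric, grouping variable, and synthetic data are assumptions
# for illustration, not the paper's exact protocol.
import numpy as np
from sklearn.metrics import roc_auc_score

def fairness_gap(y_true, y_score, group):
    """Best minus worst per-group AUROC for a binary classifier."""
    per_group = {
        g: roc_auc_score(y_true[group == g], y_score[group == g])
        for g in np.unique(group)
    }
    return max(per_group.values()) - min(per_group.values()), per_group

# Hypothetical example: labels, model scores, and a binary group variable.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=2000)
group = rng.integers(0, 2, size=2000)
# Simulate a model whose scores are systematically noisier for group 1.
noise_scale = np.where(group == 1, 0.45, 0.25)
scores = y + rng.normal(0.0, noise_scale)
gap, per_group = fairness_gap(y, scores, group)
print(f"per-group AUROC: {per_group}")
print(f"fairness gap: {gap:.3f}")
```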

The researchers then attempted to reduce the fairness gaps using two types of strategies. For one set of models, they trained them to optimize “subgroup robustness,” meaning that the models are rewarded for performing better on the subgroup for which they perform worst and are penalized if the error rate for one group is higher than for the others.
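A rough sketch of what a subgroup-robustness objective can look like, in the spirit of group distributionally robust optimization: each update is driven by the subgroup with the highest loss in the current batch. PyTorch and integer group labels are assumptions here, not the paper's exact training procedure.

```python
# Sketch of a subgroup-robustness objective (group-DRO flavor): optimize
# the loss of the worst-performing subgroup in each batch, so the model
# is penalized whenever one group's error runs higher than the others'.
import torch
import torch.nn.functional as F

def worst_group_loss(logits, labels, groups):
    """Return the mean cross-entropy of the worst subgroup in the batch."""
    per_sample = F.cross_entropy(logits, labels, reduction="none")
    group_losses = [per_sample[groups == g].mean() for g in torch.unique(groups)]
    return torch.stack(group_losses).max()

# Hypothetical usage inside a training step:
#   loss = worst_group_loss(model(x), y, demographic_group)
#   loss.backward(); optimizer.step()
```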

In another set of models, the researchers forced them to remove all demographic information from the images, using “group adversarial” approaches. Both strategies worked reasonably well, the researchers found.

“For in-distribution data, you can use existing state-of-the-art methods to reduce the fairness differences without significantly compromising overall performance,” Ghassemi says. “Subgroup robustness methods force models to be sensitive to mispredicting a specific group, and group-adversarial methods try to remove group information entirely.”
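For the second family of methods, a common construction (assumed here for illustration, not confirmed as the paper's exact implementation) is a gradient-reversal layer: an auxiliary head tries to predict the demographic group from the shared features, while reversed gradients push the backbone to strip that information out of its representation.

```python
# Sketch of a group-adversarial debiaser using gradient reversal. The
# architecture and the lam weight are illustrative assumptions.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)  # identity on the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None  # flip gradients on the way back

class AdversarialDebiaser(nn.Module):
    def __init__(self, backbone, feat_dim, n_diseases, n_groups, lam=1.0):
        super().__init__()
        self.backbone = backbone              # e.g. a chest X-ray CNN encoder
        self.disease_head = nn.Linear(feat_dim, n_diseases)
        self.group_head = nn.Linear(feat_dim, n_groups)
        self.lam = lam

    def forward(self, x):
        feats = self.backbone(x)
        disease_logits = self.disease_head(feats)
        # The group head learns to predict the group, but the reversed
        # gradients update the backbone to make that prediction harder.
        group_logits = self.group_head(GradReverse.apply(feats, self.lam))
        return disease_logits, group_logits

# Hypothetical training loss: predict disease well while the adversary's
# success at predicting the group is turned against the backbone:
#   d_logits, g_logits = model(x)
#   loss = F.cross_entropy(d_logits, disease) + F.cross_entropy(g_logits, group)
```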

Not always fairer

However, these approaches only worked when the models were tested on data from the same types of patients they were trained on—for example, only patients from the Beth Israel Deaconess Medical Center dataset.

When the researchers tested the models that had been debiased using the BIDMC data on patients from five other hospital datasets, they found that the overall accuracy of the models remained high, but some of them showed large fairness gaps.

“If you debias the model on one set of patients, that fairness doesn't necessarily hold when you move to a new set of patients from a different hospital in a different location,” says Zhang.

This is concerning because hospitals often use models developed on data from other hospitals, especially when they purchase off-the-shelf models, the researchers say.

“We found that even state-of-the-art models that perform optimally on data similar to their training sets are suboptimal – that is, they do not make the best trade-off between overall and subgroup performance – in new settings,” says Ghassemi. “Unfortunately, this is actually how a model is likely to be deployed. Most models are trained and validated with data from a single hospital or source, and then deployed broadly.”

The researchers found that models debiased using group-adversarial approaches were slightly fairer when tested on new patient groups than those debiased with subgroup robustness methods. They now plan to develop and test additional methods to see whether they can create models that make fairer predictions on new datasets.

The findings suggest that hospitals using these types of AI models should evaluate them on their own patient populations before implementing them, to ensure they are not giving inaccurate results for certain groups.
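Such a local evaluation could be as simple as a threshold check on the hospital's own labeled data before deployment. In the sketch below, the AUROC metric and the numeric thresholds are illustrative choices, not a validated acceptance criterion.

```python
# Minimal sketch of a pre-deployment audit on a hospital's own labeled
# data: accept the model only if overall AUROC is high enough AND the
# per-group gap is small enough. Thresholds and metric are illustrative.
import numpy as np
from sklearn.metrics import roc_auc_score

def audit_model(y_true, y_score, group, min_auroc=0.80, max_gap=0.05):
    overall = roc_auc_score(y_true, y_score)
    per_group = {
        g: roc_auc_score(y_true[group == g], y_score[group == g])
        for g in np.unique(group)
    }
    gap = max(per_group.values()) - min(per_group.values())
    passed = (overall >= min_auroc) and (gap <= max_gap)
    return passed, {"overall": overall, "per_group": per_group, "gap": gap}
```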

The research was funded by a Google Research Scholar Award, the Robert Wood Johnson Foundation Harold Amos Medical Faculty Development Program, RSNA Health Disparities, the Lacuna Fund, the Gordon and Betty Moore Foundation, the National Institute of Biomedical Imaging and Bioengineering, and the National Heart, Lung, and Blood Institute.
