New study highlights problems that can arise when data published for one task is used to train algorithms for another
Significant advances in artificial intelligence (AI) over the past decade have relied on extensive algorithm training using massive, open-source databases. But when such datasets are used "off-label," applied to tasks other than the ones they were published for, the results are subject to machine learning bias that compromises the integrity of the AI algorithm, according to a new study by researchers from the University of California, Berkeley, and the University of Texas at Austin.
The results, published this week in the Proceedings of the National Academy of Sciences, highlight the problems that arise when data published for one task is used to train algorithms for another.
Researchers noticed this problem when they failed to replicate promising results from a medical imaging study. "After several months of work, we realized that the image data used in the article had been preprocessed," said the study's principal investigator, Michael Lustig, professor of electrical engineering and computer science at UC Berkeley. "We wanted to raise awareness of the problem so that researchers could be more careful and publish more realistic results."
The proliferation of free online databases over the years has helped support the development of AI algorithms in medical imaging. For magnetic resonance imaging (MRI), in particular, improvements in algorithms can result in faster scanning. Obtaining an MR image first involves acquiring raw measurements that encode a representation of the image. Image reconstruction algorithms then decode the measurements to produce the images that clinicians use for diagnoses.
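The acquisition-and-reconstruction relationship described above can be sketched with a toy model. This is a minimal illustration, not the paper's method: it assumes the common simplification that raw MRI measurements (k-space) are the 2D Fourier transform of the image, so fully sampled data can be decoded with an inverse transform.

```python
import numpy as np

# Toy "anatomy": a 2D image (real MRI data are complex-valued).
image = np.zeros((64, 64))
image[24:40, 24:40] = 1.0  # a bright square standing in for an organ

# Acquisition: the scanner records raw frequency-domain measurements
# (k-space), modeled here as the 2D Fourier transform of the image.
kspace = np.fft.fft2(image)

# Reconstruction: decoding the measurements back into an image.
# With fully sampled k-space, this is just the inverse transform.
reconstructed = np.fft.ifft2(kspace)

print(np.allclose(reconstructed.real, image))  # True
```

Faster scanning means acquiring fewer k-space samples, which is where reconstruction algorithms, including the AI methods the study examines, come in.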
Some datasets, such as the famous ImageNet, include millions of images. Datasets that include medical images can be used to train AI algorithms used to decode measurements obtained during a scan. The study's lead author, Efrat Shimron, a postdoctoral researcher in Lustig's lab, said researchers new to AI may not be aware that the files in these medical databases are often preprocessed, not raw.
As many digital photographers know, raw image files contain more data than their compressed counterparts, so it's important to train AI algorithms on databases of raw MRI measurements. But such databases are rare, so software developers sometimes download databases with processed MR images, synthesize seemingly raw measurements from them, and then use them to develop their image reconstruction algorithms.
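Why synthesized measurements differ from genuinely raw ones can be seen in a small sketch. This is a hedged toy example, not the study's code: it assumes the standard facts that MRI data are complex-valued while database images typically store only the magnitude, so Fourier-transforming a processed image does not recover the real measurements.

```python
import numpy as np

rng = np.random.default_rng(0)

# True MRI data are complex-valued: magnitude plus spatially varying phase.
magnitude = np.ones((64, 64))
phase = rng.uniform(-np.pi, np.pi, size=(64, 64))
true_image = magnitude * np.exp(1j * phase)

# Genuinely raw k-space encodes both magnitude and phase.
raw_kspace = np.fft.fft2(true_image)

# A processed database image typically keeps only the magnitude.
processed_image = np.abs(true_image)

# "Seemingly raw" measurements synthesized from the processed image.
synthetic_kspace = np.fft.fft2(processed_image)

# The synthetic measurements are not the real ones: the phase is gone,
# and for this flat-magnitude toy all energy collapses to the DC sample.
print(np.allclose(synthetic_kspace, raw_kspace))  # False
```

The synthetic k-space here is unrealistically simple compared with the real measurements, which is one way the "off-label" practice makes reconstruction look easier than it is.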
Researchers have coined the term "implicit data crimes" to describe the biased research results that come from developing algorithms using this flawed methodology. "This is an easy mistake to make because data processing pipelines are applied by data custodians before data is stored online, and these pipelines are not always described. Thus, it is not always clear which images are processed and which are raw," Shimron said. "This leads to a problematic mix-and-match approach when developing AI algorithms."
Too good to be true
To demonstrate how this practice can lead to performance bias, Shimron and colleagues applied three well-known MRI reconstruction algorithms to raw and processed images based on the fastMRI dataset. When processed data was used, the algorithms produced images that were up to 48% better – visibly clearer and sharper – than images produced from raw data.
“The problem is that these results were too good to be true,” Shimron said.
The study’s other co-authors are Jonathan Tamir, an assistant professor of electrical and computer engineering at the University of Texas at Austin, and Ke Wang, a UC Berkeley Ph.D. student in Lustig’s lab. The researchers performed further tests to demonstrate the effects of the processed image files on the image reconstruction algorithms.
Starting from raw files, the researchers processed the images in controlled steps using two common data processing pipelines that plague many open-access MRI databases: the use of commercial scanner software and data storage with JPEG compression. They trained three image reconstruction algorithms using these datasets, and then measured the accuracy of the reconstructed images against the extent of the data processing.
“Our results showed that all the algorithms behave the same way: when implemented on processed data, they generate images that look good, but they appear different from the original unprocessed images,” Shimron said. “The difference is strongly correlated to the extent of data processing.”
“Too optimistic” results
The researchers also investigated the potential risk of using pre-trained algorithms in a clinical setting, taking algorithms that had been pre-trained on processed data and applying them to real-world raw data.
“The results were striking,” Shimron said. “Algorithms that had been adapted to the processed data performed poorly when they had to deal with raw data.”
The images may look great, but they’re inaccurate, the study authors said. “In some extreme cases, small, clinically important details related to pathology might be completely missing,” Shimron said.
While the algorithms may promise sharper images and faster image acquisitions, those results cannot be reproduced with raw clinical data. These "overly optimistic" results reveal the risk of translating biased algorithms into clinical practice, the researchers said.
“No one can predict how these methods will work in clinical practice, and that creates a barrier to clinical adoption,” said Tamir, who earned his Ph.D. in electrical engineering and computer science at UC Berkeley and is a former member of Lustig’s lab. “It also makes it difficult to compare various competing methods, as some may report performance on clinical data, while others may report performance on processed data.”
Shimron said it was important to expose such "data crimes" as industry and academia rapidly develop new AI methods for medical imaging. She said data curators could help by providing a full description on their websites of the techniques used to process the files in their datasets. Additionally, the study offers specific guidelines to help MRI researchers design future studies without introducing these machine learning biases.
Funding from the National Institute of Biomedical Imaging and Bioengineering and the National Science Foundation Institute for Foundations of Machine Learning helped support this research.