Self-driving car algorithms may have a problem with racism
Tesla recently announced the latest version of its self-driving car software, despite the software’s role in ten reported collisions with emergency vehicles that are now the subject of a federal agency investigation. While these collisions happened for a variety of reasons, a major factor is that the artificial intelligence driving the car is not used to seeing flashing lights and vehicles pulled over on the shoulder, so the underlying algorithms react in ways that are unpredictable and catastrophic.
Modern AI systems are “trained” on massive datasets of photographs and video footage from a variety of sources, and use that training to determine appropriate behavior. But if the footage doesn’t include many examples of specific situations, like how to slow down near emergency vehicles, the AI won’t learn the appropriate behaviors. Thus, they crash into ambulances.
Given these types of disastrous failures, a recent trend in machine learning is to identify such overlooked cases and create “synthetic” training data to help the AI learn. Using the same algorithms that Hollywood used to assemble the Incredible Hulk in “Avengers: Endgame” from a stream of ones and zeros, photorealistic images of emergency vehicles that never existed in real life are conjured from the digital ether and fed to the AI.
I have designed and used these algorithms for the past 20 years, starting with the software used to generate the sorting hat in “Harry Potter and the Sorcerer’s Stone,” up through recent Pixar movies, where I was a senior research scientist.
Using these algorithms to train AIs is extremely dangerous, because they were specifically designed to depict white humans. All the sophisticated physics, computer science, and statistics behind this software were designed to realistically render the diffuse glow of pale, white skin and the sleek highlights of long, straight hair. In contrast, computer graphics researchers have not systematically studied the sheen and luster that characterize dark and black skin, or the characteristics of afro-textured hair. As a result, the physics of these visual phenomena is not encoded in Hollywood’s algorithms.
True, synthetic black humans have been depicted in movies, such as in last year’s Pixar film “Soul.” But behind the scenes, the lighting artists found they had to push the software far beyond its default settings and learn entirely new lighting techniques to create these characters. These tools were not designed to make non-white humans; even the most technically sophisticated artists in the world struggled to use them effectively.
Nevertheless, these same white-human generation algorithms are currently being used by start-up companies like Datagen and Synthesis AI to generate “diverse” human datasets specifically intended for consumption by the AIs of tomorrow. A critical examination of some of their results reveals the same patterns. White skin is faithfully depicted, but the characteristic sheen of black skin is either worryingly missing or painfully overlit.
Once the data from these flawed algorithms is ingested by AIs, the provenance of their malfunctions will become almost impossible to diagnose. When Tesla Roadsters start disproportionately running over black paramedics, or Oakland residents with natural hairstyles, the cars won’t be able to report that “nobody told me what black skin looks like in real life.” The behavior of artificial neural networks is notoriously difficult to trace back to specific problems in their training sets, which makes the source of the issue extremely opaque.
Synthetic training data is a convenient shortcut when real-world data collection is too expensive. But AI practitioners should ask themselves: Given the possible consequences, is it worth it? If the answer is no, they should push to do things the hard way: by collecting real-world data.
Hollywood should do its part and invest in the research and development of algorithms that are rigorously, measurably, and demonstrably capable of depicting all of humanity. Not only will this expand the range of stories that can be told, it could literally save someone’s life. Otherwise, even if you recognize that black lives matter, your car soon won’t.
Theodore Kim is Associate Professor of Computer Science at Yale University. @TheodoreKim