How to Hack an Intelligent Machine

AI scientists try to trick smart systems into making dumb gaffes

AI researchers from Google easily spoofed an image-recognition system so that it mistakenly identified a banana as a toaster.

This week Microsoft and Alibaba stoked new fears that robots will soon take our jobs. The two companies independently revealed that their artificial intelligence systems beat humans at a test of reading comprehension. The test, known as the Stanford Question Answering Dataset (SQuAD), was designed to train AI to answer questions about a set of Wikipedia articles.

Like the image-recognition software already deployed in commercial photo apps, these systems lend the impression that machines have become increasingly capable of replicating human cognition: identifying images or sounds, and now speed reading text passages and spewing back answers with human-level accuracy.

Machine smarts, though, are not always what they seem. The tech mavens who develop deep-learning networks and other AI systems are finding out just how fragile their creations are by drilling down to see if the machines really know anything. Stress-testing software—before it is loaded into a self-driving car, for instance—will be crucial to avoid the blunders that could lead to catastrophic accidents. “In some domains neural nets are actually superhuman, like they’re beating human performance,” says Anish Athalye, a Massachusetts Institute of Technology graduate student who researches AI. “But they have this weird property that it seems that we can trick them pretty easily.”


Two preprint articles by Athalye and other students at MIT, collectively known as LabSix, demonstrated that they could fool a deep-learning system—one trained to recognize objects from thousands of examples—into thinking a picture of a skier was a dog (pdf) and a turtle was a rifle (pdf). A December paper from the Google Brain Team, the company’s AI research contingent, used a different approach to trick a system into classifying a banana as a toaster.

Placing the adversarial patch, which resembles a psychedelic toaster, next to the banana caused the Google image-recognition system to identify the picture as a toaster rather than a piece of fruit. Credit: “Adversarial Patch,” by Tom Brown et al. [cs.CV], December 27, 2017

In LabSix’s method an algorithm slightly modifies the color or brightness of every pixel in the image. Although the picture looks the same to you or me, these subtle changes cause the system to interpret it as something entirely different. Camouflaging the image modification “makes it more relevant to a real-world attack,” Athalye says. “If you see somebody put up a road sign in the real world that looks all psychedelic, people might think, ‘oh something fishy is going on here,’ and it will be investigated. But if you have something that looks like a speed limit sign to you but your self-driving car thinks it’s something completely different, that’s a much scarier scenario.”
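For readers curious about the mechanics, the sketch below shows the general idea in a few lines of Python. It assumes a pretrained PyTorch image classifier and uses a standard gradient-sign perturbation to nudge every pixel by an imperceptible amount; it illustrates this general class of attack, not LabSix’s exact algorithm, and skips details such as input normalization.

    # Illustrative pixel-level adversarial perturbation (a standard gradient-sign step),
    # not LabSix's exact method. Assumes PyTorch and torchvision are installed.
    import torch
    import torch.nn.functional as F
    from torchvision import models

    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()

    def perturb(image, true_label, epsilon=2 / 255):
        """Nudge every pixel slightly so the classifier changes its answer.

        image: a (1, 3, 224, 224) tensor with values in [0, 1]
        true_label: a tensor like torch.tensor([class_index]) for the correct class
        """
        image = image.clone().requires_grad_(True)
        loss = F.cross_entropy(model(image), true_label)  # the model's error on the true class
        loss.backward()
        # Step each pixel a tiny amount in the direction that most increases that error.
        adversarial = image + epsilon * image.grad.sign()
        return adversarial.clamp(0, 1).detach()           # still looks unchanged to a person

Because epsilon is tiny, the altered picture is indistinguishable to a human viewer even though the classifier’s answer can flip.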

With the toaster, Google Brain took a different tack. Instead of changing images individually, they wanted to develop a foil that could be placed in any scene. This meant creating a new, unique image—an adversarial patch—that confuses the deep-learning system and distracts it from focusing on other items. Instead of blending in, the toaster patch needed to stand out. “Given that the patch only has control of pixels within the small circle that it’s in, it turned out that the best way for the patch to fool the classifiers was to become very salient,” Googler Tom Brown wrote in an e-mail. “A traditional adversarial attack changes all the pixels in a single image by a small amount. For the adversarial patch, we change a few pixels by a large amount.”
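In code, the patch idea amounts to treating a small block of pixels as trainable parameters and optimizing them so the classifier reports “toaster” for whatever scene they are pasted into. The simplified sketch below is a stand-in for the method described in the paper; the patch size, its position, the learning rate and the ImageNet class index are illustrative assumptions.

    # Simplified adversarial-patch training: only the pixels inside a small square
    # change, but they may change by a large amount. Sizes and indices are illustrative.
    import torch
    import torch.nn.functional as F
    from torchvision import models

    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
    for p in model.parameters():
        p.requires_grad_(False)                        # keep the classifier frozen

    TOASTER = 859                                      # assumed "toaster" index in the ImageNet labels
    patch = torch.rand(3, 64, 64, requires_grad=True)  # the trainable patch pixels
    optimizer = torch.optim.Adam([patch], lr=0.05)

    def apply_patch(scene, patch, y=20, x=20):
        """Differentiably paste the 64x64 patch onto the scene at position (y, x)."""
        _, _, H, W = scene.shape
        pad = (x, W - x - 64, y, H - y - 64)           # left, right, top, bottom
        canvas = F.pad(patch.clamp(0, 1), pad)         # patch on an otherwise black canvas
        mask = F.pad(torch.ones_like(patch), pad)      # 1 where the patch sits, 0 elsewhere
        return scene * (1 - mask) + canvas * mask

    def train_step(scene):
        """One optimization step: make the patched scene score highly as 'toaster'."""
        optimizer.zero_grad()
        logits = model(apply_patch(scene, patch))
        loss = F.cross_entropy(logits, torch.tensor([TOASTER]))
        loss.backward()
        optimizer.step()                               # only the patch pixels are updated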

To work outside a lab, the patch also had to be resilient to the visual noise in the real world. In earlier studies, just changing the orientation or brightness of the altered image could defeat the adversarial technique. A doctored picture of a cat viewed straight on is classified as guacamole, but turn the cat sideways and the system knows it’s looking at a cat again. The toaster patch, by contrast, can be presented in any lighting or orientation and still sabotage the system. “This was more difficult to develop because it meant training the patch in a wide variety of simulated scenes so that we could find a single patch that is successful in all of them,” Brown wrote.
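In this simplified picture, that resilience comes from randomizing how the patch is rendered on every training step, so the optimizer can only succeed with pixels that work across many viewpoints and lighting conditions, an approach related to what researchers call expectation over transformation. The sketch below builds on the previous one; the particular transformations and ranges are illustrative assumptions, not the paper’s exact recipe.

    # Render the patch with a random rotation, brightness and location before each
    # training step. Builds on apply_patch() and patch from the previous sketch.
    import random
    import torchvision.transforms.functional as TF

    def random_render(scene, patch):
        angle = random.uniform(-45, 45)                # random viewpoint, in degrees
        brightness = random.uniform(0.7, 1.3)          # random lighting
        jittered = TF.adjust_brightness(patch.clamp(0, 1), brightness)
        rotated = TF.rotate(jittered, angle)
        y = random.randint(0, scene.shape[-2] - 64)    # random placement in the scene
        x = random.randint(0, scene.shape[-1] - 64)
        return apply_patch(scene, rotated, y=y, x=x)

Swapping random_render in for apply_patch in the earlier training step would, in principle, push the optimizer toward a single set of patch pixels that fools the classifier across all of those renderings.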

Although the examples are silly, the potential real-world implications are deadly serious. Athalye speculated that an adversarial attack could trick a self-driving car into ignoring a stop sign, or disguise the x-ray image of a bomb during airport baggage screening. One goal of Athalye’s and Brown’s research is to help identify such weaknesses in the technology before it is deployed.

Gary Marcus, a professor of psychology at New York University, says AI is susceptible to being duped in this way because “the machine doesn't understand the scene as a whole.” AI can recognize objects, but it fails to comprehend what an object is or what it is used for; the technology is not “truly understanding the causal relationships between things, truly understanding who’s doing what to whom and why,” he told me.

After the headlines about AI systems acing the reading-comprehension test, Marcus disparaged the results, saying what the machines were doing had nothing to do with true comprehension. “The SQuAD test shows that machines can highlight relevant passages in text, not that they understand those passages,” he tweeted.

Instead of training an AI system on hundreds of thousands of examples, Marcus thinks the field should take its cues from cognitive psychology to develop software with a deeper understanding. Whereas deep learning can identify a dog and even classify its breed from an image it has never seen before, it does not know that the person should be walking the dog rather than the dog walking the person. It does not comprehend what a dog really is or how it is supposed to interact with the world. “We need a different kind of AI architecture that’s about explanation, not just about pattern recognition,” Marcus says.

Until it can do that, our jobs are safe—at least for a while.

Dana Smith is a freelance science writer specializing in brains and bodies. She has written for Scientific American, the Atlantic, the Guardian, NPR, Discover, and Fast Company, among other outlets. In a previous life, she earned a Ph.D. in experimental psychology from the University of Cambridge.
