Deep-Learning Networks Rival Human Vision

AI now matches or exceeds the ability of experts in medicine and other fields to interpret what they see

For most of the past 30 years, computer-vision technologies have struggled to help humans with visual tasks, even those as mundane as accurately recognizing faces in photographs. Recently, though, breakthroughs in deep learning, an emerging field of artificial intelligence, have finally enabled computers to interpret many kinds of images as successfully as, or better than, people do. Companies are already selling products that exploit the technology, which is likely to take over or assist in a wide range of tasks that people now perform, from driving trucks to reading scans for diagnosing medical disorders.

Recent progress in a deep-learning approach known as the convolutional neural network (CNN) is key to the latest strides. To give a simple example of its prowess, consider images of animals. Whereas humans can easily distinguish between a cat and a dog, CNNs allow machines to categorize specific breeds more successfully than people can. The approach excels because it is better able to learn, and draw inferences from, the subtle, telling patterns in the images.

Convolutional neural networks do not need to be programmed to recognize specific features in images—for example, the shape and size of an animal’s ears. Instead they learn to spot such features on their own, through training. To train a CNN to separate an English springer spaniel from a Welsh springer, for instance, you start with thousands of images of animals, including examples of each breed. Like most deep-learning networks, CNNs are organized in layers. In the lower layers, they learn simple shapes and edges from the images; in the higher layers, they learn complex and abstract concepts—in this case, features of ears, tails, tongues, fur textures and so on. Once trained, a CNN can easily decide whether a new image of an animal shows a breed of interest.
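To make that description concrete, here is a minimal sketch of such a network in PyTorch. It is illustrative rather than a detail from the article: the layer sizes, the 128-by-128 input resolution, the BreedClassifier name and the randomly generated stand-in batch are all assumptions chosen for brevity.

```python
import torch
import torch.nn as nn

# A minimal sketch of a CNN for the two-breed example in the text.
# Layer sizes, the 128x128 input resolution, and the names here are
# illustrative assumptions, not details from the article.

class BreedClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            # Lower layers: small filters that learn edges and simple shapes.
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            # Higher layers: combine those shapes into more abstract features
            # (ear outlines, fur textures, and so on).
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(64 * 16 * 16, 2)  # two breeds

    def forward(self, x):
        # x: (batch, 3, 128, 128) -> (batch, 64, 16, 16) after pooling
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = BreedClassifier()
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step on a dummy batch; in practice the images and labels
# would come from the thousands of labeled photos the text describes.
images = torch.randn(8, 3, 128, 128)   # stand-in for a batch of photos
labels = torch.randint(0, 2, (8,))     # 0 = English springer, 1 = Welsh springer
optimizer.zero_grad()
loss = loss_fn(model(images), labels)
loss.backward()
optimizer.step()
```

Repeating that last step over many labeled batches is the "training" the paragraph above refers to; the network never receives a hand-written rule about ears or fur, only images and breed labels.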


CNNs were made possible by the tremendous progress in graphics processing units and parallel processing in the past decade. But the Internet has made a profound difference as well by feeding CNNs’ insatiable appetite for digitized images.

Computer-vision systems powered by deep learning are being developed for a range of applications. The technology is making self-driving cars safer by enhancing their ability to recognize pedestrians. Insurers are starting to apply deep-learning tools to assess damage to cars. In the security camera industry, CNNs are making it possible to understand crowd behavior, which will make public places and airports safer. In agriculture, deep-learning applications can be used to predict crop yields, monitor water levels and help detect crop diseases before they spread.

Deep learning for visual tasks is making some of its broadest inroads in medicine, where it can speed experts’ interpretation of scans and pathology slides and provide critical information in places that lack professionals trained to read the images—be it for screening, diagnosis, or monitoring of disease progression or response to therapy. This year, for instance, the U.S. Food and Drug Administration approved a deep-learning approach from the start-up Arterys for visualizing blood flow in the heart; the purpose is to help diagnose heart disease. Also this year, Sebastian Thrun of Stanford University and his colleagues described a system in Nature that classified skin cancer as well as dermatologists did. The researchers noted that such a program installed on smartphones, which are ubiquitous around the world, could provide “low-cost universal access to vital diagnostic care.” Systems are also being developed to assess diabetic retinopathy (a cause of blindness), stroke, bone fractures, Alzheimer’s disease and other maladies.

Apurv Mishra, an inventor and TED fellow, is chief technology officer at doc.ai, an artificial-intelligence company focused on health care. Previously he was COO of Datawallet, founder of Glavio Wearable Computing and vice president of Hypios. He has served on the World Economic Forum’s Global Agenda Council on Emerging Technologies. He holds a master’s degree in technology policy from the University of Cambridge.
