
No Bones about It: People Recognize Objects by Visualizing Their “Skeletons”

This basic ability gives humans a leg up on computers

Do humans learn the same way as computers? Cognitive psychologists have debated this question for decades, but in the past few years the remarkable accomplishments of deep-learning computer systems have fanned the flames, particularly among researchers who study object recognition.

Humans effortlessly know that a tree is a tree and a dog is a dog no matter the size, color or angle at which they’re viewed. In fact, identifying such visual elements is one of the earliest tasks children learn. But researchers have struggled to determine how the brain does this simple evaluation. As deep-learning systems have come to master this ability, scientists have started to ask whether computers analyze data—and particularly images—similarly to the human brain. “The way that the human mind, the human visual system, understands shape is a mystery that has baffled people for many generations, partly because it is so intuitive and yet it’s very difficult to program,” says Jacob Feldman, a psychology professor at Rutgers University.

A paper published in Scientific Reports in June, comparing various object-recognition models, concluded that people evaluate an object not the way a computer processes pixels but by reference to an imagined internal skeleton. In the study, researchers at Emory University, led by associate professor of psychology Stella Lourenco, wanted to know whether people judge object similarity based on the objects’ skeletons—an invisible axis below the surface that runs through the middle of the object’s shape. The scientists generated 150 unique three-dimensional shapes built around 30 different skeletons and asked participants to judge whether two of the objects were the same. Sure enough, the more similar the skeletons were, the more likely participants were to label the objects as the same. The researchers also compared how well other models, such as neural networks (artificial intelligence–based systems) and pixel-based evaluations of the objects, predicted people’s decisions. Although the other models predicted performance on the task relatively well, the skeletal model always won.
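
To make the contrast concrete, here is a minimal sketch in Python of the difference between comparing two shapes pixel by pixel and comparing their extracted skeletons. It is an illustration only, not the study’s actual models: the L-shaped test images, scikit-image’s skeletonize function, the dilation tolerance and the intersection-over-union score are all assumptions chosen for this demonstration.

import numpy as np
from skimage.morphology import skeletonize, dilation, disk

def make_L(thickness, size=100):
    """Binary image of an L shape whose limbs are centered on fixed midlines,
    so a thin L and a fat L share roughly the same internal skeleton."""
    h = thickness // 2
    img = np.zeros((size, size), dtype=bool)
    img[10:80 + h, 20 - h:20 + h] = True   # vertical limb, centered on column 20
    img[80 - h:80 + h, 20 - h:80] = True   # horizontal limb, centered on row 80
    return img

def iou(a, b):
    """Intersection-over-union of two binary masks (1.0 = identical)."""
    return np.logical_and(a, b).sum() / np.logical_or(a, b).sum()

thin, thick = make_L(6), make_L(20)

# Pixel-based comparison: low score, because one L is much fatter than the other.
print("pixel IoU:", round(iou(thin, thick), 2))

# Skeleton-based comparison: thin each shape to its one-pixel-wide skeleton,
# then dilate slightly so small positional jitter does not dominate the score.
skel_thin = dilation(skeletonize(thin), disk(4))
skel_thick = dilation(skeletonize(thick), disk(4))
print("skeleton IoU:", round(iou(skel_thin, skel_thick), 2))

The exact numbers depend on the skeletonization details, but the two Ls, which share a midline skeleton despite very different surface forms, score markedly higher on skeleton overlap than on raw pixel overlap. That, in miniature, is the intuition behind judging objects by their internal skeletons rather than by their pixels.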


“There’s a big emphasis on deep neural networks for solving these problems [of object recognition]. These are networks that require lots and lots of training to even learn a single object category, whereas the model that we investigated, a skeletal model, seems to be able to do this without this experience,” says Vladislav Ayzenberg, a doctoral student in Lourenco’s lab. “What our results show is that humans might be able to recognize objects by their internal skeletons, even when you compare skeletal models to these other well-established neural net models of object recognition.”

Next, the researchers pitted the skeletal model against other models of shape recognition, such as ones that focus on the outline. To do so, Ayzenberg and Lourenco manipulated the objects in certain ways, such as shifting the placement of an arm in relation to the rest of the body or changing how skinny, bulging, or wavy the outlines were. People once again judged the objects as being similar based on their skeletons, not their surface qualities.

“This is top-flight work, and I was very impressed with the result,” says Feldman, who was not involved in the research. “They really give empirical evidence—I would say it demonstrated more convincingly than anything I’ve previously seen that shape similarity is computed in the human mind via similarity of shape skeletons.”

One concern with the study is that the authors generated the objects specifically from skeletons rather than deriving them from shapes, either natural or human-made, covered by skin, metal or other materials that people encounter in their day-to-day lives. “The shapes that they generated are directly related to the hypothesis they’re testing and the conclusions they’re drawing,” says James Elder, a professor of human and computer vision at York University in Toronto. “If we’re interested in how important skeletons are to shape and object perception, we can’t really answer that question by only looking at the perception of skeleton-generated shapes. Because obviously in a world of skeleton-generated shapes, skeletons are probably fairly important because that’s the way those shapes were made.”

Elder suggests that while the model may explain people’s interpretation of shapes with clearly defined skeletons, such as animals or trees, it is not appropriate for all types of shapes, such as a rock or a crumpled-up newspaper. Ayzenberg says that he and Lourenco are addressing this issue in follow-up studies using traditional shapes and naturalistic objects.

The researchers now wonder whether the skeletal model could be incorporated into deep-learning systems so that instead of exploring whether humans learn like computers, scientists could help a computer learn like a human.

“We’re optimistic that it will also speak to and inform artificial neural networks that are trying to simulate human perception,” Lourenco says. “There are shocking ways in which they break down that humans don’t, and so being informed by how humans recognize objects is also going to be very important for them.”

Dana Smith is a freelance science writer specializing in brains and bodies. She has written for Scientific American, the Atlantic, the Guardian, NPR, Discover, and Fast Company, among other outlets. In a previous life, she earned a Ph.D. in experimental psychology from the University of Cambridge.

This article was originally published with the title “No Bones about It: People Recognize Objects by Visualizing Their ‘Skeletons’” in SA Mind Vol. 30 No. 6 (November 2019), p. 17
doi:10.1038/scientificamericanmind1119-17