AI versus AI: Self-Taught AlphaGo Zero Vanquishes Its Predecessor

DeepMind’s go game-playing AI, which dominated its human competition, just got better

Earlier this year the AlphaGo artificial intelligence program ended humanity’s 2,500 years of supremacy at the board game go. Not content with its 3–0 victory over the world’s top player, AlphaGo creator DeepMind Technologies on Wednesday unveiled an enhanced version—AlphaGo Zero—which the company says soundly thumped its predecessor program in an AI face-off, winning all 100 games played. But perhaps even more significant than these victories is how AlphaGo Zero became so dominant. Unlike the original AlphaGo, which DeepMind trained over time using large quantities of human knowledge and supervision, the new system’s algorithm taught itself to master the game.

AI lets computers recognize faces, make online purchasing recommendations and even parallel park cars. Computers gain these abilities from “learning algorithms,” written by humans who feed massive amounts of training data into an artificial neural network (so named because it processes information in a way loosely modeled on the brain’s nerve cell structure). This process is called machine learning. In AlphaGo’s case that meant analyzing millions of moves made by human go experts and playing many, many games against itself to reinforce what it learned. AlphaGo defeated Ke Jie, the world’s top human go player, in May. In March 2016 it beat another top master, Lee Sedol, with the aid of multiple neural networks running on 48 tensor processing units (TPUs), microchips designed specifically for neural network computation.
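
To make the distinction concrete, here is a minimal, hypothetical sketch of the kind of supervised imitation step described above: a toy one-layer “network” is nudged toward the moves in a batch of labeled examples. The data, dimensions and names are illustrative stand-ins, not DeepMind’s architecture or code.

```python
# Hypothetical sketch of supervised move prediction: learn to
# imitate "expert" moves from labeled examples. Everything here
# (data, one-layer model, sizes) is a stand-in for illustration.
import numpy as np

rng = np.random.default_rng(0)
BOARD = 19 * 19                                  # flattened go board
positions = rng.normal(size=(512, BOARD))        # fake board positions
expert_moves = rng.integers(0, BOARD, size=512)  # fake expert labels

W = np.zeros((BOARD, BOARD))                     # a single linear layer

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)         # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

for step in range(200):                          # cross-entropy descent
    probs = softmax(positions @ W)
    grad = probs.copy()
    grad[np.arange(len(expert_moves)), expert_moves] -= 1.0
    W -= 0.01 * (positions.T @ grad) / len(expert_moves)
```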

AlphaGo Zero’s training involved four TPUs and a single neural network that initially knew nothing about go. The AI learned without supervision—it simply played against itself, and soon was able to anticipate its own moves and how they would affect a game’s outcome. “This technique is more powerful than previous versions of AlphaGo because it is no longer constrained by the limits of human knowledge,” according to a blog post authored by DeepMind co-founder Demis Hassabis and David Silver, who leads the company’s reinforcement learning research group. (DeepMind is a division of Alphabet, Inc., Google’s parent company.) One problem with AI that always relies on human knowledge is that such information may be too expensive, too unreliable or simply nonexistent in certain situations. “If similar techniques can be applied to other structured problems such as protein folding, reducing energy consumption or searching for revolutionary new materials, the resulting breakthroughs have the potential to positively impact society,” the blog post says.
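
The blog post’s description boils down to a simple loop: play against yourself, then shift your policy toward whatever won. The sketch below captures only that bare idea with a made-up two-move “game”; the names and the game are hypothetical, and the real system pairs a deep neural network with Monte Carlo tree search rather than a lookup table.

```python
# Hedged sketch of the self-play idea: sample moves from the current
# policy, play out a "game," and reinforce moves that led to wins.
# The two-move game and the table-based policy are toy stand-ins.
import random

policy = {0: 0.5, 1: 0.5}    # probability of playing move 0 or move 1

def self_play_game():
    """One 'game': sample a move; move 1 happens to win more often."""
    move = random.choices([0, 1], weights=[policy[0], policy[1]])[0]
    reward = 1.0 if random.random() < 0.3 + 0.4 * move else -1.0
    return move, reward

for _ in range(5000):                            # self-play loop
    move, reward = self_play_game()
    policy[move] = max(0.01, policy[move] + 0.01 * reward)
    total = sum(policy.values())                 # keep it a distribution
    policy = {m: p / total for m, p in policy.items()}

print(policy)    # the policy drifts toward the stronger move over time
```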


AlphaGo Zero even devised its own unconventional strategies. The game of go is typically played using black and white “stones” on a board with a 19-by-19 grid. Each player places stones with the objective of surrounding an opponent’s stones. “In training, AlphaGo Zero discovered, played and ultimately learned to prefer a series of new joseki [corner sequence] variants that were previously unknown,” says DeepMind spokesperson Jon Fildes. Go games typically start with plays in the grid’s corners, allowing one player to gain a better overall position on the board. “Like move 37 in the second game against Lee Sedol, these moments of algorithmic inspiration give us a glimpse of the creativity of AlphaGo and the potential of AI,” the spokesperson adds. An Young-gil, a South Korean professional go player of 8-dan rank (9-dan is the highest), singled out move 37 as a “rare and intriguing” play shortly after the March 2016 match.

DeepMind’s study describes “a very impressive technical result; and both their ability to do it—and their ability to train the system in 40 days, on four TPUs—is remarkable,” says Oren Etzioni, chief executive officer of the Allen Institute for Artificial Intelligence (AI2), an organization that Microsoft co-founder Paul Allen formed in 2014 to focus on AI’s potential benefits. “While many have used [reinforcement learning] before, the technical aspects of the work are novel.”

AlphaGo Zero’s success bodes well for AI’s mastery of games, Etzioni says. Still, “I think it would be a mistake to believe that we’ve learned something general about thinking and about learning for general intelligence,” he adds. “This approach won’t work in more ill-structured problems like natural-language understanding or robotics, where the state space is more complex and there isn’t a clear objective function.”

Unsupervised training is the key to ultimately creating AI that can think for itself, Etzioni says, but “more research is needed outside of the confines of board games and predefined objective functions” before computers can really begin to think outside the box.