During training, the players first face simple one-player games, such as finding a purple cube or placing a yellow ball on a red floor, and then advance to more complex multiplayer games like hide and seek or capture the flag, where teams compete to be the first to find and grab their opponent’s flag. The playground manager has no specific goal, but aims to improve the general capability of its players over time.
Why is this cool? AIs like DeepMind’s AlphaZero have beaten the world’s best human players at chess and Go. But they can only learn one game at a time. As DeepMind’s co-founder Shane Legg put it when I spoke to him last year, it’s like having to swap out your chess brain for your Go brain each time you want to switch games.
Researchers are now trying to build AIs that can learn multiple tasks at once, which means teaching them general skills that make it easier to adapt to new tasks.
One exciting trend in this direction is open-ended learning, where AIs are trained on many different tasks without a specific goal. In many ways, this is how humans and other animals seem to learn, via aimless play. But this requires a vast amount of data. XLand generates that data automatically, in the form of an endless stream of challenges. It is similar to POET, an AI training dojo where two-legged bots learn to navigate obstacles in a 2D landscape. XLand’s world is much more complex and detailed, however.
XLand is also an example of AI learning to make itself, or what Jeff Clune, who helped develop POET and leads a team working on this topic at OpenAI, calls AI-generating algorithms (AI-GAs). “This work pushes the frontiers of AI-GAs,” says Clune. “It is very exciting to see.”