Researchers at Google’s DeepMind AI division have unveiled an algorithm that has not only taught itself to play 49 different Atari video games, but has also managed to work out an optimal strategy for winning at one of them.
Described in the journal Nature, DeepMind’s deep Q-network (DQN) algorithm can play 49 computer games originally designed for the Atari 2600, including Breakout, River Raid, Boxing, and Enduro. DQN is not only comfortable playing them; it has also managed to beat human players at quite a few of them.
“This… is the first time that anyone has built a single general learning system that can learn directly from experience to master a wide range of challenging tasks,” said co-developer Demis Hassabis.
The researcher added that this achievement brings them a step closer to a future of smart, general-purpose robots that can teach themselves how to perform a task, store a “memory” of trial and error, and adapt their actions for a better outcome next time around.
The team added that DQN-powered machines may, in future, be able to perform highly complex tasks, including driving cars, planning holidays and even conducting scientific research.
The team tested their algorithm, with minimal hand-programming, on games built for an Atari console from the 1980s. The only information fed to the system was the pixels on the screen and the game score, along with the goal of maximising that score.
“Apart from that, they have no idea about what kind of game they are playing, what their controls do, what they’re even controlling in the game”, Hassabis added.
Without any previous experience, the DQN-driven system was presented with an on-screen paddle and ball, just as a human player would be. It then learned through trial and error, pressing keys at random until it started to score.
“The system kind of learns, adapts and gets better and better incrementally until eventually it becomes almost perfect on some of the games,” said Hassabis.
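The loop Hassabis describes, act, observe the score, adjust and repeat, is the core idea of reinforcement learning. As a rough illustration only (this is not DeepMind’s actual DQN, which learns from raw screen pixels with a convolutional neural network, experience replay and a target network), the sketch below runs the same trial-and-error recipe as simple tabular Q-learning on a made-up toy game; the environment and all names are assumptions for the example.

```python
# Minimal tabular Q-learning sketch -- illustrative only, not DeepMind's DQN.
# DQN replaces the lookup table below with a deep convolutional network over
# raw pixels, and adds experience replay and a target network for stability.
import random
from collections import defaultdict

# Toy "game": walk along a 10-cell corridor; reaching the far end scores 1.
N_STATES, ACTIONS = 10, [0, 1]           # action 0 = step left, 1 = step right

def step(state, action):
    """Hypothetical environment: returns (next_state, reward, done)."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

Q = defaultdict(float)                   # Q[(state, action)] -> expected score
alpha, gamma, epsilon = 0.1, 0.99, 0.1   # learning rate, discount, exploration

for episode in range(500):
    state, done = 0, False
    while not done:
        # Trial and error: usually take the best-known action, sometimes a random one.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        nxt, reward, done = step(state, action)
        # Nudge the estimate toward (reward + discounted best future value).
        best_next = max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = nxt
```

In the Atari setting the “state” would be the last few screen frames and the reward the change in game score, so the table becomes a deep network, but the pattern of random exploration gradually giving way to score-maximising play is the same.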
The team said that DQN managed to outperform professional human players at quite a few games. However, DQN is far from perfect: it fared quite poorly at some games, including Pac-Man.
DeepMind’s Vlad Mnih said: “It’s very difficult to get your first points or first rewards, so if the game involves solving a maze then pressing keys randomly will not actually get you any points, and then the system has nothing to learn from.”
Learning the unexpected
Beyond just playing the games and beating human players, DQN managed to figure out an optimal strategy in Breakout. It discovered a tactic of digging a tunnel through one side of the wall and sending the ball in to bounce behind it, breaking the bricks from the back.
Better than IBM’s Watson
DQN’s creators claimed that their system is well ahead of IBM’s Watson and Deep Blue because it doesn’t rely on pre-programming; Watson and Deep Blue, the researchers said, had largely been preprogrammed with their particular abilities.
“Whereas what we’ve done is build algorithms that learn from the ground up, so literally you give them perceptual experience and they learn how to do things directly from that perceptual experience,” Hassabis told journalists.
The main advantage of DQN is that such systems can learn and adapt to unexpected situations, scenarios its creators may not even have thought about. Further, the creators of such learning systems aren’t required to know the solution themselves for the machine to work.
General-purpose machines that are smart and capable of learning for themselves are the ultimate goal of the research, the researchers said. “We are many decades off from doing that. But I do think that this is the first significant rung of the ladder that we’re on”, Hassabis said.