You have to know when to hold ’em and when to fold ’em.
Now a program developed by computer scientists at the University of Alberta can do both – and do it much better than an entire cohort of professional poker players.
The achievement marks a new milestone for artificial intelligence (AI) involving deep learning, a style of programming that mimics certain aspects of how human brains acquire expertise. But while the program, dubbed DeepStack, represents a significant step, a rival U.S. team said that the method by which it was tested against humans was insufficient to reveal the true extent of its capabilities.
Games provide an important testbed for artificial intelligence because they offer a well-defined arena where programming approaches can be evaluated and compared. Last year, another deep learning system developed by Google DeepMind managed to beat the world champion at Go, a board game that is fiendishly complex despite its simple rules because of the number of possible decisions a player can make.
Poker – specifically the version known as Heads-Up No-Limit Texas Hold’em – presents a different kind of challenge. Unlike Go or chess, where both players can assess the state of the game simply by looking at the board, poker players must deal with incomplete knowledge because of cards that are hidden from view.
“The essence of poker is being able to make decisions when you don’t have all of the information that you need,” said Michael Bowling, who leads the university’s computer poker research group. Dr. Bowling added that the same kind of reasoning is often required when computers have to solve real-world problems, which makes poker an attractive hurdle for designers of intelligent systems.
The Alberta group has been working for 20 years on programs that try to “solve” poker. In 2008, it developed an algorithm that could defeat top human players at the heads-up limit version of the game, in which all bets are of fixed size. There are one thousand billion different decision points that can arise in such a game, a numerical challenge that Dr. Bowling compares to checkers. While not trivial, it’s a game that a computer can be hardwired to win.
In comparison, the no-limit version of the game is astronomically more complicated because players can choose to bet any amount up to the number of chips in their possession. A winning strategy often involves betting high when the opponent believes – incorrectly – that his or her hand is the stronger one.
In designing DeepStack, Dr. Bowling’s team, together with colleagues at the Czech Technical University in Prague, had to create a system that could not only gauge the strength of its own hand and make an informed guess about its opponent’s, but also weigh what its opponent might be thinking in order to bluff and conceal its own intentions.
“People think of bluffing as this very human, psychological thing, but it pretty much falls out of the mathematics of the game,” Dr. Bowling said. He added that DeepStack had to be able to learn how to bluff, otherwise “it would be a terrible player.”
The team developed a deep-learning system that tried to make the best choice by looking only a few actions ahead; otherwise, it would be overwhelmed by the mathematical possibilities. The system was trained using an army of lesser computers that played through a multitude of game scenarios, gradually building up DeepStack’s intuition for what to do. An overview of how the program works, along with the results of its matchups against human players, was published Thursday in the journal Science.
To test the system, the team recruited 33 professional poker players from 17 countries to go toe-to-toe with DeepStack, with an offer of cash prizes up to $5,000 awarded to the top three players. The players were each given four weeks in late 2016 to complete 3,000 games against the program. Only a third of the human players went the full distance. Of those, all but one were beaten by a significant enough margin to rule out luck.
Tuomas Sandholm, who leads the Alberta group’s chief competitor at Carnegie Mellon University in Pittsburgh, said that DeepStack featured a new combination of programming methods that made it a potent player.
However, he cited several weaknesses in the way the system was tested, including the fact that the players DeepStack faced were not the world’s best and the prizes they were offered were likely not sufficient to motivate the players to perform at their sharpest. He also said that 3,000 matches would not provide humans with enough experience to learn to adjust and potentially outsmart the program.
Dr. Sandholm’s team has been working with a different system called Libratus that runs on a supercomputer and does not employ deep learning, but instead uses a trial-and-error approach called reinforcement learning. In a sign of how close the competition has become, last month Libratus beat a team of four top human Heads-Up No-Limit Texas Hold’em players. There are no plans as yet for a tournament that would pit the two systems against each other.
In the meantime, Dr. Bowling said there was plenty of scope to beef up DeepStack’s capabilities and also to try it out on different variations of no-limit poker that more closely resemble a human championship game.