    Building a poker playing agent based on game logs using supervised learning

    Integrated master's thesis. Informatics and Computing Engineering. Faculdade de Engenharia. Universidade do Porto. 201

    Using a high-level language to build a poker playing agent

    Integrated master's thesis. Informatics and Computing Engineering. Faculdade de Engenharia. Universidade do Porto. 200

    A Profitable Online Poker Agent

    Games of incomplete information, such as poker, are a continuous source of research and study in artificial intelligence. Poker presents challenging problems such as opponent modeling, risk management and bluff detection. Developing agents that can perform the probabilistic calculations these problems require is considered difficult, because a robust computer poker player must adapt dynamically. This thesis focuses on developing a poker agent that plays against human players and aims to achieve the dynamic adaptation needed to beat some human players online. To do this, the agent uses sets of information about each player it faces. Holdem Manager, a tool that records the hands played in online poker rooms, provides statistics about every player the agent plays against. The agent exploits some of these statistics to decide better which action to take. Factors such as how aggressive an opponent is, the position held at the table, how many players are involved, how much money is at stake, and the hand dealt to the agent are a small part of the information used to determine the agent's behavior. The agent follows a short-stack strategy and models its opponents using the information gathered through Holdem Manager. For the first time in the Computer Poker literature, this work presents results of an agent playing online poker against human players in a controlled environment, without the players being aware that their opponent was a computer agent. The agent can play live online poker against human players and shows a small profit in No-Limit Texas Hold'em at the 0.01/0.02 micro stakes.
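
The abstract says the agent uses opponent statistics such as how aggressive a player is. It does not list the exact statistics or how they feed into decisions. The following is a minimal sketch, assuming two tracker-style statistics with their common definitions:

- VPIP: the share of hands in which a player voluntarily puts money in the pot.
- Aggression factor: (bets + raises) / calls, counted over postflop actions.

The `HandActions` record, the `classify_opponent` helper and its thresholds are hypothetical. They are not Holdem Manager's data schema and not the thesis's actual rules.

```python
from dataclasses import dataclass


@dataclass
class HandActions:
    """Hypothetical per-hand summary of one opponent's actions (not Holdem Manager's schema)."""
    voluntarily_entered_preflop: bool  # called or raised preflop; posting a blind does not count
    postflop_bets: int                 # number of bets made after the flop
    postflop_raises: int               # number of raises made after the flop
    postflop_calls: int                # number of calls made after the flop


def vpip(hands: list[HandActions]) -> float:
    """Percentage of hands in which the player voluntarily put money in the pot."""
    if not hands:
        return 0.0
    return 100.0 * sum(h.voluntarily_entered_preflop for h in hands) / len(hands)


def aggression_factor(hands: list[HandActions]) -> float:
    """(bets + raises) / calls over postflop actions; inf if the player bets or raises but never calls."""
    aggressive = sum(h.postflop_bets + h.postflop_raises for h in hands)
    calls = sum(h.postflop_calls for h in hands)
    if calls == 0:
        return float("inf") if aggressive else 0.0
    return aggressive / calls


def classify_opponent(hands: list[HandActions],
                      loose_vpip: float = 30.0,
                      aggressive_af: float = 2.0) -> tuple[str, str]:
    """Label an opponent with illustrative thresholds (hypothetical, not taken from the thesis)."""
    style = "loose" if vpip(hands) > loose_vpip else "tight"
    attitude = "aggressive" if aggression_factor(hands) > aggressive_af else "passive"
    return style, attitude
```

Take an opponent observed over four hands who entered three pots and made three bets or raises against three calls. That opponent has a VPIP of 75.0 and an aggression factor of 1.0, which these hypothetical thresholds label ("loose", "passive").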

    Machine learning applied to the context of Poker

    Combining game theory principles with machine learning methodologies to find optimal strategies for games is attracting interest from an increasingly large portion of the scientific community. Poker has become a popular subject of study because it is a game of imperfect information. Advances in this area have a wide array of real-world applications, and research in artificial intelligence shows that interest in the subject has yet to fade. In 2019, researchers from Facebook and Carnegie Mellon presented the world's first autonomous Poker-playing agent proven to be profitable against multiple players at a time. This was an advance over the previous state of the art, which was designed for two-player games only. This study explores the characteristics of stochastic games of imperfect information and surveys the methodological advances made available by researchers, with the ultimate goal of developing an autonomous agent that qualifies as a utility-maximizing decision-maker.

    Computing card probabilities in Texas Hold'em

    Developing Poker agents that can compete at the level of a human expert is challenging, because an agent's strategy must deal with hidden information, deception and risk management. One way to address this is to model opponents' behavior, estimate their game plan, and make decisions based on those estimates. In this paper, several hand evaluation and classification techniques are compared, and conclusions are drawn on their respective applicability and scope. We also suggest improvements to current techniques through Monte Carlo sampling. Current risk-management methods were found to be pertinent to the agent's decision-making process. Nevertheless, integrating these methods with opponent modeling techniques in future work could greatly improve the overall performance of Poker agents.
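
The abstract mentions Monte Carlo sampling for hand evaluation but does not give the procedure. The following is a minimal sketch under these assumptions:

- Cards are strings such as "As" or "Td".
- A naive 5-card evaluator scores each hand.
- Opponents hold uniformly random hole cards, with no opponent modeling.

The sketch estimates equity, meaning the hero's expected share of the pot, by repeatedly dealing the unknown cards. The names `rank5`, `best7` and `estimate_equity` are illustrative and not taken from the paper.

```python
import random
from collections import Counter
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [r + s for r in RANKS for s in SUITS]  # cards as strings such as "As", "Td", "2c"


def rank5(cards):
    """Return a comparable tuple (category, tiebreakers) for a 5-card poker hand."""
    values = sorted((RANKS.index(c[0]) + 2 for c in cards), reverse=True)
    flush = len({c[1] for c in cards}) == 1
    # Group equal ranks, largest group first, then by rank: e.g. K K K 2 2 -> (13, 2)
    groups = sorted(Counter(values).items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
    shape = [count for _, count in groups]
    by_group = tuple(value for value, _ in groups)
    straight_high = None
    if len(groups) == 5:
        if values[0] - values[4] == 4:
            straight_high = values[0]
        elif values == [14, 5, 4, 3, 2]:  # the wheel, A-2-3-4-5, is a 5-high straight
            straight_high = 5
    if straight_high and flush:
        return (8, (straight_high,))  # straight flush
    if shape == [4, 1]:
        return (7, by_group)          # four of a kind
    if shape == [3, 2]:
        return (6, by_group)          # full house
    if flush:
        return (5, tuple(values))
    if straight_high:
        return (4, (straight_high,))
    if shape == [3, 1, 1]:
        return (3, by_group)          # three of a kind
    if shape == [2, 2, 1]:
        return (2, by_group)          # two pair
    if shape == [2, 1, 1, 1]:
        return (1, by_group)          # one pair
    return (0, tuple(values))         # high card


def best7(cards):
    """Best 5-card rank among all 21 subsets of 7 cards."""
    return max(rank5(combo) for combo in combinations(cards, 5))


def estimate_equity(hole, board=(), n_opponents=1, trials=10_000, rng=None):
    """Monte Carlo estimate of the hero's share of the pot against random opponent hands."""
    if rng is None:
        rng = random.Random()
    known = set(hole) | set(board)
    remaining = [c for c in DECK if c not in known]
    missing = 5 - len(board)
    total = 0.0
    for _ in range(trials):
        # Deal the rest of the board plus two hole cards per opponent, without replacement
        draw = rng.sample(remaining, missing + 2 * n_opponents)
        full_board = list(board) + draw[:missing]
        opp_cards = draw[missing:]
        hero = best7(list(hole) + full_board)
        opponents = [best7(opp_cards[2 * i:2 * i + 2] + full_board) for i in range(n_opponents)]
        best_opp = max(opponents)
        if hero > best_opp:
            total += 1.0
        elif hero == best_opp:
            total += 1.0 / (1 + opponents.count(hero))  # split pot among tied players
    return total / trials
```

Called with hole cards ["Ac", "Ad"] and no board against one opponent, the estimate should land near the commonly quoted figure of roughly 85% for pocket aces against a random hand. Evaluating all 21 five-card subsets is slow. A practical agent would more likely use a lookup-table evaluator and sample opponent hands from a modeled range instead of uniformly.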

    A Study on Cognitive Biases in Gambling: Hot Hand and Gamblers' Fallacy

    People who appear to believe in the hot hand expect winning streaks to continue, whereas those suffering from the gamblers’ fallacy unreasonably expect losing streaks to reverse. The analysis used 565,915 sports bets made by 776 online gamblers in 2010. People who won were more likely to win again, whereas those who lost were more likely to lose again. However, gamblers selected safer odds after winning and riskier odds after losing, which indicates that they expected their luck to reverse: they suffered from the gamblers’ fallacy. By falling prey to the gamblers’ fallacy, they created their own hot hands. Some gamblers consistently outperformed their peers and consistently made higher profits or smaller losses. They show real expertise, and the key to real expertise is the ability to control losses.
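
The study's findings rest on two conditional comparisons:

- the win rate after a win versus after a loss;
- the odds chosen after a win versus after a loss.

The following is a minimal sketch of both computations for a single gambler. It assumes that gambler's bets are available as a time-ordered list of (decimal odds, won) pairs. It is not the study's actual analysis pipeline, and the function names are illustrative. When pooling many gamblers, consecutive pairs should be formed within each gambler's own history, never across gamblers.

```python
def _mean(xs):
    """Arithmetic mean, or NaN when there is nothing to average."""
    return sum(xs) / len(xs) if xs else float("nan")


def conditional_win_rates(outcomes: list[bool]) -> tuple[float, float]:
    """P(win | previous bet won) and P(win | previous bet lost) for one gambler's time-ordered bets."""
    pairs = list(zip(outcomes, outcomes[1:]))
    after_win = [cur for prev, cur in pairs if prev]
    after_loss = [cur for prev, cur in pairs if not prev]
    return _mean(after_win), _mean(after_loss)


def mean_odds_after_outcome(bets: list[tuple[float, bool]]) -> tuple[float, float]:
    """Mean decimal odds chosen right after a win and right after a loss.

    Each bet is (decimal_odds, won); lower odds are safer, higher odds are riskier.
    """
    pairs = list(zip(bets, bets[1:]))
    after_win = [odds for (_, prev_won), (odds, _) in pairs if prev_won]
    after_loss = [odds for (_, prev_won), (odds, _) in pairs if not prev_won]
    return _mean(after_win), _mean(after_loss)
```

Two patterns in the output would match the abstract's description. A hot hand shows up as a first rate above the second in `conditional_win_rates`. The gamblers' fallacy shows up as a first value below the second in `mean_odds_after_outcome`, meaning safer odds are chosen after wins.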

    A Deep Reinforcement Learning Neural Network Folding Proteins

    Despite considerable progress, ab initio protein structure prediction remains an unsolved problem. One crowdsourcing approach is the online puzzle video game Foldit [1], which has produced several useful results that matched or even outperformed algorithmically computed solutions [2]. Using Foldit, the WeFold [3] crowd participated successfully several times in the Critical Assessment of Techniques for Protein Structure Prediction. Based on the recent Foldit standalone version [4], we trained a deep reinforcement learning neural network called DeepFoldit to improve the score assigned to an unfolded protein, using the Q-learning method [5] with experience replay. This thesis focuses on improving the model through hyperparameter tuning. We examined various implementations with different model architectures and hyperparameter values, and arrived at a model that achieves better accuracy than the original implementation. The new model's hyperparameters also improved its ability to generalize. Initial results from the latest implementation show that, given a set of small unfolded training proteins, DeepFoldit quickly learns action sequences that improve the score both on the training set and on novel test proteins. This is important because improving the game score means obtaining a better folding, which takes us one step closer to the solution. Our approach combines the intuitive user interface of Foldit with the efficiency of deep reinforcement learning.
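
The abstract names Q-learning with experience replay but does not describe DeepFoldit's network or its interface to Foldit. The following sketch therefore swaps in two deliberate simplifications:

- a tabular Q-table instead of a neural network;
- a hypothetical six-state chain environment instead of the protein scoring game.

It only illustrates the replay-based Q-learning update and where tunable hyperparameters enter: `alpha`, `gamma`, `epsilon`, `buffer_size` and `batch_size`. All names and values are assumptions, not DeepFoldit's settings.

```python
import random
from collections import deque

N_STATES = 6      # hypothetical chain of states 0..5; reaching state 5 ends the episode
ACTIONS = (0, 1)  # 0 = step left, 1 = step right


def step(state, action):
    """Toy deterministic environment: reward 1.0 only when the last state is reached."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(episodes=300, alpha=0.1, gamma=0.9, epsilon=0.2,
          buffer_size=500, batch_size=16, seed=0):
    """Tabular Q-learning where every update is drawn from an experience-replay buffer."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    replay = deque(maxlen=buffer_size)  # oldest transitions are discarded first
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:
            # epsilon-greedy action selection, breaking ties at random
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            nxt, reward, done = step(state, action)
            replay.append((state, action, reward, nxt, done))
            # Replay a random minibatch of stored transitions with the Q-learning update
            batch = rng.sample(list(replay), min(batch_size, len(replay)))
            for s, a, r, s2, terminal in batch:
                target = r if terminal else r + gamma * max(q[s2])
                q[s][a] += alpha * (target - q[s][a])
            state, steps = nxt, steps + 1
    return q


def greedy_policy(q):
    """Best action in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

With the default values, `train()` should learn to step right in every state, so `greedy_policy` returns [1, 1, 1, 1, 1]. Varying the keyword arguments of `train` is a small-scale analogue of the hyperparameter tuning the abstract describes. Replaying random past transitions, rather than only the latest one, breaks the correlation between consecutive updates.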

    Game theoretic modeling and analysis: A co-evolutionary, agent-based approach

