Search CORE

879 research outputs found

Machine learning applied to the context of Poker

Author: Martins Tiago Silva
Publication venue
Publication date: 01/01/2020
Field of study

A combinação de princípios da teoria de jogo e metodologias de machine learning aplicados ao contexto de formular estratégias ótimas para jogos está a angariar interesse por parte de uma porção crescentemente significativa da comunidade científica, tornando-se o jogo do Poker num candidato de estudo popular devido à sua natureza de informação imperfeita. Avanços nesta área possuem vastas aplicações em cenários do mundo real, e a área de investigação de inteligência artificial demonstra que o interesse relativo a este objeto de estudo está longe de desaparecer, com investigadores do Facebook e Carnegie Mellon a apresentar, em 2019, o primeiro agente de jogo autónomo de Poker provado como ganhador num cenário com múltiplos jogadores, uma conquista relativamente à anterior especificação do estado da arte, que fora desenvolvida para jogos de apenas 2 jogadores. Este estudo pretende explorar as características de jogos estocásticos de informação imperfeita, recolhendo informação acerca dos avanços nas metodologias disponibilizados por parte de investigadores de forma a desenvolver um agente autónomo de jogo que se pretende inserir na classificação de "utility-maximizing decision-maker".The combination of game theory principles and machine learning methodologies applied to encountering optimal strategies for games is garnering interest from an increasing large portion of the scientific community, with the game of Poker being a popular study subject due to its imperfect information nature. Advancements in this area have a wide array of applications in real-world scenarios, and the field of artificial intelligent studies show that the interest regarding this object of study is yet to fade, with researchers from Facebook and Carnegie Mellon presenting, in 2019, the world’s first autonomous Poker playing agent that is proven to be profitable while confronting multiple players at a time, an achievement in relation to the previous state of the art specification, which was developed for two player games only. This study intends to explore the characteristics of stochastic games of imperfect information, gathering information regarding the advancements in methodologies made available by researchers in order to ultimately develop an autonomous agent intended to adhere to the classification of a utility-maximizing decision-maker

Repositório Científico do Instituto Politécnico do Porto

Learning to Identify Bugs in Video Games

Author: Wilkins Ben
Publication venue
Publication date: 01/01/2023
Field of study

The use of intelligent software agents promises to revolutionise video game testing. While agents automate the time-consuming task of repeatedly playing a game in search of issues, humans can spend their time on the more creative aspects of game development. Despite the substantial advancements in game-playing that have made this possible, agents are reliant on humans, or hand-crafted guards, to determine whether there are issues with the game's design or functioning. This thesis aimed to develop testing agents that can identify issues with a game's function or bugs with minimal human involvement by learning from their prior experiences. The problem is framed as one of anomaly detection, where bugs correspond to abnormality or novelty in an agent's experience. A series of approaches based on Self-Supervised Learning and Causal Inference have been developed to enable an agent to measure abnormality or otherwise model the game to subsequently identify bugs. The focus was on laying the foundations for testing agents that operate over the same input/output modalities as human testers. The approaches were evaluated by testing a diverse collection of purpose-built video games, where they successfully identified bugs from a broad class. This thesis is among the first work to investigate the use of machine learning in the context of video game bug identification. It presents an exposition of the problem of learning intended behaviour, and then endeavours to develop solutions that demonstrate the benefits of using agents with learning capabilities for testing. Namely, ease of reuse across projects (reusability) and in identifying bugs that would otherwise require human involvement to be found (capability). The use of agents equipped with sophisticated game-playing algorithms and the identification tools outlined in this thesis offers a new framework for video game testing

Royal Holloway - Pure

Bandit algorithms for searching large spaces

Author: Dorard L.R.M.
Publication venue: UCL (University College London)
Publication date: 28/04/2012
Field of study

Bandit games consist of single-state environments in which an agent must sequentially choose actions to take, for which rewards are given. The objective being to maximise the cumulated reward, the agent naturally seeks to build a model of the relationship between actions and rewards. The agent must both choose uncertain actions in order to improve its model (exploration), and actions that are believed to yield high rewards according to the model (exploitation). The choice of an action to take is called a play of an arm of the bandit, and the total number of plays may or may not be known in advance. Algorithms designed to handle the exploration-exploitation dilemma were initially motivated by problems with rather small numbers of actions. But the ideas they were based on have been extended to cases where the number of actions to choose from is much larger than the maximum possible number of plays. Several problems fall into this setting, such as information retrieval with relevance feedback, where the system must learn what a user is looking for while serving relevant documents often enough, but also global optimisation, where the search for an optimum is done by selecting where to acquire potentially expensive samples of a target function. All have in common the search of large spaces. In this thesis, we focus on an algorithm based on the Gaussian Processes probabilistic model, often used in Bayesian optimisation, and the Upper Confidence Bound action-selection heuristic that is popular in bandit algorithms. In addition to demonstrating the advantages of the GP-UCB algorithm on an image retrieval problem, we show how it can be adapted in order to search tree-structured spaces. We provide an efficient implementation, theoretical guarantees on the algorithm's performance, and empirical evidence that it handles large branching factors better than previous bandit-based algorithms, on synthetic trees

UCL Discovery

Recommended from our members

People centred eco-design: consumer adoption of low and zero carbon products and systems

Author: Caird Sally
Potter Stephen
Roy Robin
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2007
Field of study

Literature review, research model and findings of exploratory empirical research on consumer adoption and effective use of low and zero carbon technologies ranging from a hybrid car to solar water heating systems

Open Research Online (The Open University)

Label Efficient 3D Scene Understanding

Author: Griffiths David
Publication venue: UCL (University College London)
Publication date: 28/10/2022
Field of study

3D scene understanding models are becoming increasingly integrated into modern society. With applications ranging from autonomous driving, Augmented Real- ity, Virtual Reality, robotics and mapping, the demand for well-behaved models is rapidly increasing. A key requirement for training modern 3D models is high- quality manually labelled training data. Collecting training data is often the time and monetary bottleneck, limiting the size of datasets. As modern data-driven neu- ral networks require very large datasets to achieve good generalisation, finding al- ternative strategies to manual labelling is sought after for many industries. In this thesis, we present a comprehensive study on achieving 3D scene under- standing with fewer labels. Specifically, we evaluate 4 approaches: existing data, synthetic data, weakly-supervised and self-supervised. Existing data looks at the potential of using readily available national mapping data as coarse labels for train- ing a building segmentation model. We further introduce an energy-based active contour snake algorithm to improve label quality by utilising co-registered LiDAR data. This is attractive as whilst the models may still require manual labels, these labels already exist. Synthetic data also exploits already existing data which was not originally designed for training neural networks. We demonstrate a pipeline for generating a synthetic Mobile Laser Scanner dataset. We experimentally evalu- ate if such a synthetic dataset can be used to pre-train smaller real-world datasets, increasing the generalisation with less data. A weakly-supervised approach is presented which allows for competitive per- formance on challenging real-world benchmark 3D scene understanding datasets with up to 95% less data. We propose a novel learning approach where the loss function is learnt. Our key insight is that the loss function is a local function and therefore can be trained with less data on a simpler task. Once trained our loss function can be used to train a 3D object detector using only unlabelled scenes. Our method is both flexible and very scalable, even performing well across datasets. Finally, we propose a method which only requires a single geometric represen- tation of each object class as supervision for 3D monocular object detection. We discuss why typical L2-like losses do not work for 3D object detection when us- ing differentiable renderer-based optimisation. We show that the undesirable local- minimas that the L2-like losses fall into can be avoided with the inclusion of a Generative Adversarial Network-like loss. We achieve state-of-the-art performance on the challenging 6DoF LineMOD dataset, without any scene level labels

UCL Discovery

Reinforcement Learning and Game Theory for Smart Grid Security

Author: Paul Shuva
Publication venue: Open PRAIRIE: Open Public Research Access Institutional Repository and Information Exchange
Publication date: 01/01/2019
Field of study

This dissertation focuses on one of the most critical and complicated challenges facing electric power transmission and distribution systems which is their vulnerability against failure and attacks. Large scale power outages in Australia (2016), Ukraine (2015), India (2013), Nigeria (2018), and the United States (2011, 2003) have demonstrated the vulnerability of power grids to cyber and physical attacks and failures. These incidents clearly indicate the necessity of extensive research efforts to protect the power system from external intrusion and to reduce the damages from post-attack effects. We analyze the vulnerability of smart power grids to cyber and physical attacks and failures, design different gametheoretic approaches to identify the critical components vulnerable to attack and propose their associated defense strategy, and utilizes machine learning techniques to solve the game-theoretic problems in adversarial and collaborative adversarial power grid environment. Our contributions can be divided into three major parts:Vulnerability identification: Power grid outages have disastrous impacts on almost every aspect of modern life. Despite their inevitability, the effects of failures on power grids’ performance can be limited if the system operator can predict and identify the vulnerable elements of power grids. To enable these capabilities we study machine learning algorithms to identify critical power system elements adopting a cascaded failure simulator as a threat and attack model. We use generation loss, time to reach a certain percentage of line outage/generation loss, number of line outages, etc. as evaluation metrics to evaluate the consequences of threat and attacks on the smart power grid.Adversarial gaming in power system: With the advancement of the technologies, the smart attackers are deploying different techniques to supersede the existing protection scheme. In order to defend the power grid from these smart attackers, we introduce an adversarial gaming environment using machine learning techniques which is capable of replicating the complex interaction between the attacker and the power system operators. The numerical results show that a learned defender successfully narrows down the attackers’ attack window and reduce damages. The results also show that considering some crucial factors, the players can independently execute actions without detailed information about each other.Deep learning for adversarial gaming: The learning and gaming techniques to identify vulnerable components in the power grid become computationally expensive for large scale power systems. The power system operator needs to have the advanced skills to deal with the large dimensionality of the problem. In order to aid the power system operator in finding and analyzing vulnerability for large scale power systems, we study a deep learning technique for adversary game which is capable of dealing with high dimensional power system state space with less computational time and increased computational efficiency. Overall, the results provided in this dissertation advance power grids’ resilience and security by providing a better understanding of the systems’ vulnerability and by developing efficient algorithms to identify vulnerable components and appropriate defensive strategies to reduce the damages of the attack

Public Research Access Institutional Repository and Information Exchange