    Open-ended Learning in Symmetric Zero-sum Games

    Zero-sum games such as chess and poker are, abstractly, functions that evaluate pairs of agents, for example labeling them 'winner' and 'loser'. If the game is approximately transitive, then self-play generates sequences of agents of increasing strength. However, nontransitive games, such as rock-paper-scissors, can exhibit strategic cycles, and there is no longer a clear objective: we want agents to increase in strength, but against whom is unclear. In this paper, we introduce a geometric framework for formulating agent objectives in zero-sum games, in order to construct adaptive sequences of objectives that yield open-ended learning. The framework allows us to reason about population performance in nontransitive games, and enables the development of a new algorithm (rectified Nash response, PSRO_rN) that uses game-theoretic niching to construct diverse populations of effective agents, producing a stronger set of agents than existing algorithms. We apply PSRO_rN to two highly nontransitive resource allocation games and find that PSRO_rN consistently outperforms the existing alternatives. Comment: ICML 2019, final version
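The strategic cycle mentioned in the abstract above can be illustrated with rock-paper-scissors written as an antisymmetric payoff matrix. This is only a minimal sketch and is not taken from the paper: the matrix encoding and the helper names `is_symmetric_zero_sum` and `beats` are assumptions chosen for illustration.

```python
import numpy as np

# Assumed encoding: A[i, j] = +1 if strategy i beats strategy j,
# -1 if it loses, and 0 for a tie. Index order: rock, paper, scissors.
STRATEGIES = ["rock", "paper", "scissors"]
A = np.array([
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
])


def is_symmetric_zero_sum(payoff):
    """A symmetric zero-sum game has an antisymmetric payoff matrix."""
    return np.array_equal(payoff, -payoff.T)


def beats(payoff, i):
    """Return the indices of the strategies that strategy i defeats."""
    return [j for j in range(len(payoff)) if payoff[i, j] > 0]


# The uniform mixture has zero expected payoff against every pure strategy,
# so no single strategy is "strongest" in this game.
uniform = np.ones(3) / 3
value_vs_each = uniform @ A

for i, name in enumerate(STRATEGIES):
    print(name, "beats", [STRATEGIES[j] for j in beats(A, i)])
```

Each strategy beats exactly one other and loses to the remaining one, which is the kind of cycle that leaves "increasing strength" without a fixed opponent to measure against.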

    A Simple Generative Model of Collective Online Behaviour

    Human activities increasingly take place in online environments, providing novel opportunities for relating individual behaviours to population-level outcomes. In this paper, we introduce a simple generative model for the collective behaviour of millions of social networking site users who are deciding between different software applications. Our model incorporates two distinct components: one is associated with recent decisions of users, and the other reflects the cumulative popularity of each application. Importantly, although various combinations of the two mechanisms yield long-time behaviour that is consistent with data, the only models that reproduce the observed temporal dynamics are those that strongly emphasize the recent popularity of applications over their cumulative popularity. This demonstrates, even when using purely observational data without experimental design, that temporal data-driven modelling can effectively distinguish between competing microscopic mechanisms, allowing us to uncover new aspects of collective online behaviour. Comment: Updated, with new figures and Supplementary Information

    Has The World Changed? My Neighbor Might Know: Effects of Social Context on Routine Deviation

    In two experiments we studied the effects of behavioral models on routine deviation decisions in observers. Participants repeatedly chose among four card-deck lotteries together with a human model (confederate, Exp. 1) or a non-human model (computer, Exp. 2) that made correct decisions in the majority of the trials. In a learning phase, participants acquired a choice routine (preferring one deck over others). In a subsequent test phase, participants had to adapt to changes in the payoff structure that required them to deviate from their routine. We found a strong tendency to maintain the routine despite negative feedback (routine effect). In a social situation (Exp. 1), models reduced routine effects more strongly than in a non-social situation (Exp. 2). The process of adaptation follows a belief updating process. Results indicate that the model effect is due neither to an increase in the sample of relevant information nor to the application of a simple copy heuristic. Rather, deviation models may provide a cue for change that fosters reevaluation of the situation in the observer. Keywords: experience-based decision making, routine, habit, adaptation, social influence, Bayesian updating, novelty

    A Generalized Training Approach for Multiagent Learning

    This paper investigates a population-based training regime based on game-theoretic principles called Policy-Space Response Oracles (PSRO). PSRO is general in the sense that it (1) encompasses well-known algorithms such as fictitious play and double oracle as special cases, and (2) in principle applies to general-sum, many-player games. Despite this, prior studies of PSRO have focused on two-player zero-sum games, a regime wherein Nash equilibria are tractably computable. In moving from two-player zero-sum games to more general settings, computation of Nash equilibria quickly becomes infeasible. Here, we extend the theoretical underpinnings of PSRO by considering an alternative solution concept, α-Rank, which is unique (thus faces no equilibrium selection issues, unlike Nash) and applies readily to general-sum, many-player settings. We establish convergence guarantees in several game classes, and identify links between Nash equilibria and α-Rank. We demonstrate the competitive performance of α-Rank-based PSRO against an exact Nash solver-based PSRO in 2-player Kuhn and Leduc Poker. We then go beyond the reach of prior PSRO applications by considering 3- to 5-player poker games, yielding instances where α-Rank achieves faster convergence than approximate Nash solvers, thus establishing it as a favorable general games solver. We also carry out an initial empirical validation in MuJoCo soccer, illustrating the feasibility of the proposed approach in another complex domain.

    A Configurable Event-Driven Convolutional Node with Rate Saturation Mechanism for Modular ConvNet Systems Implementation

    Convolutional Neural Networks (ConvNets) are a type of neural network widely used for applications such as image recognition, video analysis, and natural language processing. They are inspired by the human brain, following a specific organization of the connectivity pattern between layers of neurons known as the receptive field. These networks have traditionally been implemented in software, but they become more computationally expensive as they scale up, which limits real-time processing of high-speed stimuli. Hardware implementations, on the other hand, are difficult to reuse across applications because of their reduced flexibility. In this paper, we propose a fully configurable event-driven convolutional node with a rate saturation mechanism that can be used to implement arbitrary ConvNets on FPGAs. The node includes a convolutional processing unit and a routing element, which allows large 2D arrays to be built in which any multilayer structure can be implemented. The rate saturation mechanism emulates the refractory behavior of biological neurons, guaranteeing a minimum separation in time between consecutive events. A 4-layer ConvNet with 22 convolutional nodes trained for poker card symbol recognition has been implemented in a Spartan6 FPGA. This network has been tested with a stimulus in which 40 poker cards were observed by a Dynamic Vision Sensor (DVS) in 1 s. Different slow-down factors were applied to characterize the behavior of the system for high-speed processing. For slow stimulus playback, a 96% recognition rate is obtained with a power consumption of 0.85 mW. At maximum playback speed, a traffic control mechanism downsamples the input stimulus, obtaining a recognition rate above 63% when less than 20% of the input events are processed, demonstrating the robustness of the network. Funding: European Union 644096, 687299; Gobierno de España TEC2016-77785-P, TEC2015-63884-C2-1-P; Junta de Andalucía TIC-6091, TICP120
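The minimum separation between consecutive events described in the abstract above can be sketched in software as a simple refractory filter. This is an illustrative assumption, not the paper's FPGA design: the function name `apply_refractory`, its parameters, and the use of plain timestamp lists for a single neuron are all hypothetical.

```python
def apply_refractory(timestamps, min_gap):
    """Drop events that arrive less than `min_gap` after the last kept event.

    `timestamps` holds the event times of one (hypothetical) neuron, in any
    consistent time unit. The returned list is sorted, and consecutive
    entries are at least `min_gap` apart.
    """
    kept = []
    last = None
    for t in sorted(timestamps):
        # Keep the first event, then only events outside the refractory window.
        if last is None or t - last >= min_gap:
            kept.append(t)
            last = t
    return kept


# Six input events with a refractory period of 5 time units.
filtered = apply_refractory([0, 1, 2, 5, 6, 11], 5)
print(filtered)
```

Because the window is measured from the last event that was kept, a burst of closely spaced inputs produces at most one output per refractory period, which caps the output event rate.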

    Policy Space Diversity for Non-Transitive Games

    Policy-Space Response Oracles (PSRO) is an influential algorithm framework for approximating a Nash Equilibrium (NE) in multi-agent non-transitive games. Many previous studies have tried to promote policy diversity in PSRO. A major weakness of existing diversity metrics is that a more diverse population (according to those metrics) does not necessarily yield a better approximation to an NE, as we prove in the paper. To alleviate this problem, we propose a new diversity metric, the improvement of which guarantees a better approximation to an NE. Meanwhile, we develop a practical and well-justified method to optimize our diversity metric using only state-action samples. By incorporating our diversity regularization into the best response solving in PSRO, we obtain a new PSRO variant, Policy Space Diversity PSRO (PSD-PSRO). We present the convergence property of PSD-PSRO. Empirically, extensive experiments on various games demonstrate that PSD-PSRO is more effective in producing significantly less exploitable policies than state-of-the-art PSRO variants.

    Skyfall and Global Casino Culture

    Exploring social gambling: scoping, classification and evidence review

    The aim of this report is to speculate on the level of concern we might have regarding consumer risk in relation to ‘social gambling.’ In doing so, this report is intended to help form the basis to initiate debate around a new and under-researched social issue; assist in setting a scientific research agenda; and, where appropriate, highlight concerns about any potential areas that need to be considered in terms of precautionary regulation. This report does not present a set of empirical research findings regarding ‘social gambling’ but rather gathers information to improve stakeholder understanding.

    Understanding the convergence of online sports betting markets

    Betting on sports via online platforms has rapidly become a popular form of gambling in many countries. Despite the growing body of research investigating the psychosocial and individual psychological factors determining gambling behaviour, much less attention has been devoted to understanding the market characteristics of online sports betting and its intersection with products from adjacent industries. From an economic convergence perspective, the present paper explores the integration of online sports betting within the digital, sporting and gambling sectors, examining how data markets, eSports, virtual sports, social gaming, immersive reality tools, sports media, sport sponsorship, fantasy sports, in-venue and in-stadium betting, poker and trading are all converging around betting activity. Through this convergence process, it is argued that internet-based sports gambling is colonizing different forms of entertainment, and expanding marketing opportunities, as well as raising psychosocial concerns about the influence of such an integration process.

    Effort Estimation in Agile Software Development: A Systematic Map Study

    Introduction: Making effort estimation as accurate and suitable as possible is fundamental to the success of software development projects. It is a difficult task, since applying estimation techniques in constantly changing agile development projects requires evaluating different methods frequently. Objectives: This study provides a state of the art on effort estimation techniques in agile software development (ASD), the evaluation of their performance, and the drawbacks that arise in their application. Method: A systematic mapping was carried out, involving the creation of research questions to structure the study, the analysis of related words to build a search query for related studies, the application of exclusion, inclusion, and quality criteria to filter out unrelated studies, and finally the organization and extraction of the necessary information from each study. Results: 25 studies were selected. The most applied estimation techniques in agile contexts are Story Points (SP) estimation, followed by Planning Poker (PP) and Expert Judgment (EJ). The most frequent solutions are supported by computational techniques such as Naive Bayes, regression algorithms, and hybrid systems. The most commonly used performance evaluation measures are Mean Magnitude of Relative Error (MMRE), Prediction Assessment (PRED), and Mean Absolute Error (MAE). Additionally, the most frequent challenges when estimating software in ASD are parameters such as feasibility, experience, and the delivery of expert knowledge, as well as the constant particularity and lack of data in the process of creating models that apply to a limited number of environments. Conclusions: The number of articles addressing effort estimation in agile development has increased; however, there is an evident need to improve estimation accuracy by using techniques supported by machine learning, which have been shown to facilitate and improve estimation performance. Keywords: Effort Estimation; Agile Software Development; Issues and Challenges; Automatic Learning; Performance Metrics
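The three evaluation measures named in the abstract above (MMRE, PRED, and MAE) can be sketched as follows, using their commonly cited definitions. The function names, the sample effort values, and the 0.25 threshold for PRED are assumptions made for illustration; the abstract does not specify them.

```python
def mae(actual, predicted):
    """Mean Absolute Error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mre(actual, predicted):
    """Magnitude of Relative Error per item: |actual - predicted| / actual."""
    return [abs(a - p) / a for a, p in zip(actual, predicted)]


def mmre(actual, predicted):
    """Mean Magnitude of Relative Error: average of the per-item MRE."""
    errors = mre(actual, predicted)
    return sum(errors) / len(errors)


def pred(actual, predicted, level=0.25):
    """PRED(level): fraction of estimates whose MRE is at most `level`."""
    errors = mre(actual, predicted)
    return sum(e <= level for e in errors) / len(errors)


# Hypothetical actual vs. estimated effort, e.g. in story points.
actual_effort = [10, 20, 40]
estimated_effort = [8, 30, 40]

print("MAE: ", mae(actual_effort, estimated_effort))
print("MMRE:", mmre(actual_effort, estimated_effort))
print("PRED(0.25):", pred(actual_effort, estimated_effort))
```

Lower MAE and MMRE indicate better estimates, while a higher PRED value indicates that more estimates fall within the chosen relative-error tolerance.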