2,157 research outputs found

    SAI, a Sensible Artificial Intelligence that plays Go

    We propose a multiple-komi modification of the AlphaGo Zero/Leela Zero paradigm. The winrate as a function of the komi is modeled with a two-parameter sigmoid function, so that the neural network must predict just one more variable to assess the winrate for all komi values. A second novel feature is that training is based on self-play games that occasionally branch, with changed komi, when the position is uneven. With this setting, reinforcement learning is shown to work on 7x7 Go, obtaining very strong playing agents. As a useful byproduct, the sigmoid parameters given by the network make it possible to estimate the score difference on the board and to evaluate how far the game is decided. Comment: Updated for IJCNN 2019 conference.
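The two-parameter sigmoid in this abstract can be illustrated with a minimal sketch. The parametrization below is an assumption made for illustration, not necessarily the one used in the paper: `alpha` is taken as the komi at which the game is balanced, and `beta` as the steepness of the curve. The function name `winrate` is hypothetical.

```python
import math


def winrate(komi: float, alpha: float, beta: float) -> float:
    """Estimated winrate for the player who gives komi.

    Assumed form (illustrative only): sigmoid(beta * (alpha - komi)).
    alpha: komi at which the winrate is exactly 0.5.
    beta:  steepness; larger values mean the game is more decided.
    """
    return 1.0 / (1.0 + math.exp(-beta * (alpha - komi)))


if __name__ == "__main__":
    # Sweep komi for one hypothetical network output (alpha=2.0, beta=1.0).
    for komi in (0.0, 2.0, 4.0):
        print(komi, round(winrate(komi, alpha=2.0, beta=1.0), 3))
```

Under this assumed form, a single pair of predicted parameters gives the winrate for every komi value, and `alpha` doubles as an estimate of the score difference on the board.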

    SAI: A sensible artificial intelligence that plays with handicap and targets high scores in 9x9 Go

    We develop a new framework for the game of Go to target a high score, and thus a perfect play. We integrate this framework into the Monte Carlo tree search - policy iteration learning pipeline introduced by Google DeepMind with AlphaGo. Training on 9×9 Go produces a superhuman Go player, thus proving that this framework is stable and robust. We show that this player can be used to effectively play with both positional and score handicap. We develop a family of agents that can target high scores against any opponent, recover from very severe disadvantage against weak opponents, and avoid suboptimal moves.

    Maine Campus March 02 1987

    Adaptive evolution in static and dynamic environments

    This thesis provides a framework for describing a canonical evolutionary system. Populations of individuals are envisaged as traversing a search space structured by genetic and developmental operators under the influence of selection. Selection acts on individuals' phenotypic expressions, guiding the population over an evaluation landscape, which describes an idealised evaluation surface over the phenotypic space. The corresponding valuation landscape describes evaluations over the genotypic space and may be transformed by within-generation adaptive (learning) or maladaptive (fault induction) local search. Populations subjected to particular genetic and selection operators are claimed to evolve towards a region of the valuation landscape with a characteristic local ruggedness, as given by the runtime operator correlation coefficient. This corresponds to the view of evolution discovering an evolutionarily stable population, or quasi-species, held in a state of dynamic equilibrium by the operator set and evaluation function. This is demonstrated by genetic algorithm experiments using the NK landscapes and a novel, evolvable evaluation function, The Tower of Babel. In fluctuating environments of varying temporal ruggedness, different operator sets are correspondingly more or less adapted. Quantitative genetics analyses of populations in sinusoidally fluctuating conditions are shown to describe certain well-known electronic filters. This observation suggests the notion of Evolutionary Signal Processing. Genetic algorithm experiments in which a population tracks a sinusoidally fluctuating optimum support this view. Using a self-adaptive mutation rate, it is possible to tune the evolutionary filter to the environmental frequency. For a time-varying frequency, the mutation rate reacts accordingly. With local search, the valuation landscape is transformed through temporal smoothing.
    By coevolving modifier genes for individual learning and the rate at which the benefits may be directly transmitted to the next generation, the relative adaptedness of individual learning and cultural inheritance according to the rate of environmental change is demonstrated.

    AI-based smart sensing and AR for gait rehabilitation assessment

    Health monitoring is crucial in hospitals and rehabilitation centers. Challenges can affect the reliability and accuracy of health data. Human error, patient compliance concerns, time, money, technology, and environmental factors might cause these issues. In order to improve patient care, healthcare providers must address these challenges. We propose a non-intrusive smart sensing system that uses a SensFloor smart carpet and an inertial measurement unit (IMU) wearable sensor on the user’s back to monitor position and gait characteristics. Furthermore, we implemented machine learning (ML) algorithms to analyze the data collected from the SensFloor and IMU sensors. The system generates real-time data that are stored in the cloud and are accessible to physical therapists and patients. Additionally, the system’s real-time dashboards provide a comprehensive analysis of the user’s gait and balance, enabling personalized training plans with tailored exercises and better rehabilitation outcomes. Using non-invasive smart sensing technology, our proposed solution enables healthcare facilities to monitor patients’ health and enhance their physical rehabilitation plans.

    Future state maximisation as an intrinsic motivation for decision making

    The concept of an “intrinsic motivation” is used in the psychology literature to distinguish between behaviour which is motivated by the expectation of an immediate, quantifiable reward (“extrinsic motivation”) and behaviour which arises because it is inherently useful, interesting or enjoyable. Examples of the latter can include curiosity-driven behaviour such as exploration and the accumulation of knowledge, as well as developing skills that might not be immediately useful but that have the potential to be re-used in a variety of different future situations. In this thesis, we examine a candidate for an intrinsic motivation with wide-ranging applicability which we refer to as “future state maximisation”. Loosely speaking, this is the idea that, taking everything else to be equal, decisions should be made so as to maximally keep one's options open, or to give the maximal amount of control over what one can potentially do in the future. Our goal is to study how this principle can be applied in a quantitative manner, as well as identifying examples of systems where doing so could be useful in either explaining or generating behaviour. We consider a number of examples; however, our primary application is to a model of collective motion in which we consider a group of agents equipped with simple visual sensors, moving around in two dimensions. In this model, agents aim to make decisions about how to move so as to maximise the amount of control they have over the potential visual states that they can access in the future. We find that with each agent following this simple, low-level motivational principle a swarm spontaneously emerges in which the agents exhibit rich collective behaviour, remaining cohesive and highly aligned. Remarkably, the emergent swarm also shares a number of features which are observed in real flocks of starlings, including scale-free correlations and marginal opacity.
    We go on to explore how the model can be developed to allow us to manipulate and control the swarm, as well as looking at heuristics which are able to mimic future state maximisation whilst requiring significantly less computation, and so which could plausibly operate under animal cognition.
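The idea of keeping one's options open can be illustrated with a toy sketch. This is not the thesis's swarm model, which uses visual states of agents moving in two dimensions. It is a hypothetical gridworld in which an agent picks the move that maximises the number of distinct cells it could reach within a fixed horizon. All names (`step`, `reachable`, `best_move`, `MOVES`) are assumptions introduced for illustration.

```python
from typing import FrozenSet, Set, Tuple

State = Tuple[int, int]
MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # the four grid directions


def step(state: State, move: State, size: int, walls: FrozenSet[State]) -> State:
    """Apply a move; the agent stays put if the target is a wall or off-grid."""
    x, y = state[0] + move[0], state[1] + move[1]
    if 0 <= x < size and 0 <= y < size and (x, y) not in walls:
        return (x, y)
    return state


def reachable(state: State, horizon: int, size: int, walls: FrozenSet[State]) -> Set[State]:
    """All cells the agent could occupy within `horizon` further moves."""
    frontier = {state}
    seen = {state}
    for _ in range(horizon):
        frontier = {step(s, m, size, walls) for s in frontier for m in MOVES}
        seen |= frontier
    return seen


def best_move(state: State, horizon: int, size: int, walls: FrozenSet[State]) -> State:
    """Pick the move whose successor keeps the most future cells reachable."""
    return max(
        MOVES,
        key=lambda m: len(reachable(step(state, m, size, walls), horizon, size, walls)),
    )


if __name__ == "__main__":
    # From a corner of an empty 5x5 grid, moving away from the corner
    # keeps more cells reachable than bumping into the boundary.
    print(best_move((0, 0), horizon=2, size=5, walls=frozenset()))
```

Counting reachable cells is only a crude proxy for the amount of control over future states; it is used here because it is cheap to compute and easy to check by hand.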