
    General self-motivation and strategy identification: Case studies based on Sokoban and Pac-Man

    (c) 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
    In this paper, we use empowerment, a recently introduced biologically inspired measure, to allow an AI player to assign utility values to potential future states within a previously unencountered game without requiring explicit specification of goal states. We further introduce strategic affinity, a method of grouping action sequences together to form "strategies," by examining the overlap in the sets of potential future states following each such action sequence. We also demonstrate an information-theoretic method of predicting future utility. Combining these methods, we extend empowerment to soft-horizon empowerment, which enables the player to select a repertoire of action sequences that aim to maintain anticipated utility. We show how this method provides a proto-heuristic for nonterminal states prior to specifying concrete game goals, and propose it as a principled candidate model for "intuitive" strategy selection, in line with other recent work on "self-motivated agent behavior." We demonstrate that the technique, despite being generically defined independently of scenario, performs quite well in relatively disparate scenarios, such as a Sokoban-inspired box-pushing scenario and a Pac-Man-inspired predator game, suggesting novel and principle-based candidate routes toward more general game-playing algorithms.
    Peer reviewed. Final Accepted Version.
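A minimal sketch of n-step empowerment, assuming a deterministic, hypothetical grid world (the grid size, the five-action set and the `step`/`empowerment` names are illustrative, not taken from the paper). Because the dynamics are deterministic, the channel capacity from n-step action sequences to final states reduces to log2 of the number of distinct reachable states; stochastic worlds would instead need an iterative capacity computation such as the Blahut-Arimoto algorithm.

```python
from itertools import product
from math import log2

# Hypothetical action set: four moves plus "stay", as (dx, dy) offsets.
ACTIONS = {"N": (0, -1), "S": (0, 1), "E": (1, 0), "W": (-1, 0), "stay": (0, 0)}

def step(state, action, width, height, walls):
    """Deterministic transition: move unless blocked by the border or a wall."""
    dx, dy = ACTIONS[action]
    nx, ny = state[0] + dx, state[1] + dy
    if 0 <= nx < width and 0 <= ny < height and (nx, ny) not in walls:
        return (nx, ny)
    return state

def empowerment(state, n, width, height, walls=frozenset()):
    """n-step empowerment in bits, valid only for deterministic dynamics.

    Every n-step action sequence maps to exactly one final state, so the
    channel capacity equals log2 of the number of distinct final states.
    """
    finals = set()
    for seq in product(ACTIONS, repeat=n):
        s = state
        for a in seq:
            s = step(s, a, width, height, walls)
        finals.add(s)
    return log2(len(finals))
```

For example, the centre cell (2, 2) of an empty 5x5 grid can reach 13 cells in two steps, giving log2(13) ≈ 3.70 bits, whereas the corner (0, 0) reaches only 6 cells (≈ 2.58 bits), so an empowerment-driven agent would prefer the centre without being given any explicit goal.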

    Self-Motivated Composition of Strategic Action Policies

    In the last 50 years computers have made dramatic progress in their capabilities, but at the same time their failings have demonstrated that we, as designers, do not yet understand the nature of intelligence. Chess playing, for example, was long held up as a domain in which the human mind was unassailable by Artificial Intelligence, but now a chess engine on a smartphone can beat a grandmaster. Yet, at the same time, computers struggle to beat amateur players in simpler games, such as Stratego, where sheer processing power cannot substitute for a lack of deeper understanding. The task of developing that deeper understanding is overwhelming, and has previously been underestimated. There are many threads and all must be investigated. This dissertation explores one of those threads, namely asking the question “How might an artificial agent decide on a sensible course of action, without being told what to do?”. To this end, this research builds upon empowerment, a universal utility which provides an entirely general method for allowing an agent to measure the preferability of one state over another. Empowerment requires no explicit goals, and instead favours states that maximise an agent’s control over its environment. Several extensions to the empowerment framework are proposed, which drastically increase the array of scenarios to which it can be applied, and allow it to evaluate actions in addition to states. These extensions are motivated by concepts such as bounded rationality, sub-goals, and anticipated future utility. In addition, the novel concept of strategic affinity is proposed as a general method for measuring the strategic similarity between two (or more) potential sequences of actions. It does this in a general fashion, by examining how similar the distribution of future possible states would be in the case of enacting either sequence. This allows an agent to group action sequences, even in an unknown task space, into ‘strategies’.
    Strategic affinity is combined with the empowerment extensions to form soft-horizon empowerment, which is capable of composing action policies in a variety of unknown scenarios. A Pac-Man-inspired prey game and the Gambler’s Problem are used to demonstrate this self-motivated action selection, and a Sokoban-inspired box-pushing scenario is used to highlight the capability to pick strategically diverse actions. Taken together, soft-horizon empowerment demonstrates a variety of ‘intuitive’ behaviours, not dissimilar to what we might expect a human to try. This line of thinking yields compelling results, and a couple of avenues for immediate further research are suggested. One of the most promising would be to apply the self-motivated methodology and the strategic affinity method to a wider range of scenarios, with a view to developing improved heuristic approximations that generate similar results. Replicating these results while reducing the computational overhead could help drive an improved understanding of how we may get closer to replicating a human-like approach.
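Strategic affinity groups action sequences by how much the sets of states reachable after them overlap. The sketch below illustrates that idea under stated assumptions: a hypothetical 1-D corridor, Jaccard overlap as the similarity measure, and a greedy, order-dependent clustering with a 0.5 threshold. These are illustrative stand-ins, not the dissertation's exact measure, which compares distributions of future states.

```python
from itertools import product

# Hypothetical 1-D corridor with cells 0..8; actions step left, stay, or right.
ACTIONS = (-1, 0, 1)
LENGTH = 9

def run(pos, seq):
    """Apply a sequence of moves, clamping at the corridor ends."""
    for a in seq:
        pos = min(max(pos + a, 0), LENGTH - 1)
    return pos

def future_states(pos, seq, horizon):
    """States reachable `horizon` steps after performing `seq` from `pos`."""
    start = run(pos, seq)
    return {run(start, tail) for tail in product(ACTIONS, repeat=horizon)}

def jaccard(a, b):
    """Overlap of two state sets (an assumed similarity measure)."""
    return len(a & b) / len(a | b)

def group_strategies(pos, n, horizon, threshold=0.5):
    """Greedily cluster n-step sequences whose future-state sets overlap.

    Each group is represented by the future-state set of its first member,
    so the result depends on enumeration order.
    """
    groups = []  # list of (representative future-state set, member sequences)
    for seq in product(ACTIONS, repeat=n):
        fut = future_states(pos, seq, horizon)
        for rep, members in groups:
            if jaccard(rep, fut) >= threshold:
                members.append(seq)
                break
        else:
            groups.append((fut, [seq]))
    return [members for _, members in groups]
```

From position 4 with two-step sequences and a one-step lookahead, `group_strategies(4, 2, 1)` splits the nine sequences into three groups: those ending at cell 2 or 3, those ending at cell 4 or 5, and the single sequence (1, 1) ending at cell 6.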

    Self-organisation of internal models in autonomous robots

    Internal Models (IMs) play a significant role in autonomous robotics. They are mechanisms able to represent the input-output characteristics of the sensorimotor loop. In developmental robotics, open-ended learning of skills and knowledge serves to react to unexpected inputs, to explore the environment and to acquire new behaviours. The development of the robot includes self-exploration of the state-action space and learning of the environmental dynamics. In this dissertation, we explore the properties and benefits of the self-organisation of robot behaviour based on the homeokinetic learning paradigm. A homeokinetic robot explores the environment in a coherent way without prior knowledge of its configuration or the environment itself. First, we propose a novel approach to self-organisation of behaviour by artificial curiosity in the sensorimotor loop. Second, we study how different forward-model settings alter the behaviour of both exploratory and goal-oriented robots. Models of differing complexity, size and learning rule are compared to assess their influence on the robot’s exploratory behaviour. We define self-organised behaviour performance in terms of simultaneous environment coverage and best prediction of future sensory inputs. We found that models with a fast response that minimise the prediction error by local gradients achieve the best performance. Third, we study how self-organisation of behaviour can be exploited to learn IMs for goal-oriented tasks. An IM acquires coherent self-organised behaviours that are then used to achieve high-level goals by reinforcement learning (RL). Our results demonstrate that learning an inverse model in this context yields faster reward maximisation and a higher final reward. We show that an initial exploration of the environment in a goal-less yet coherent way improves learning.
    In the same context, we analyse the self-organisation of central pattern generators (CPGs) by reward maximisation. Our results show that CPGs can learn favourable reward behaviour on high-dimensional robots using the self-organised interaction between degrees of freedom. Finally, we examine an online dual control architecture that combines Actor-Critic RL with the homeokinetic controller. In this configuration, the probing signal is generated by the embodied robot’s own interaction with the environment. This set-up solves the problem of designing task-dependent probing signals through the emergence of intrinsically motivated, comprehensible behaviour. With this configuration, the reward signal improves faster than with classic RL.
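The finding that forward models which minimise prediction error by local gradients perform well can be illustrated with a plain online least-mean-squares (LMS) update. This is a hedged sketch, not the homeokinetic controller itself: it assumes a scalar, noise-free linear plant x' = a*x + b*y with hypothetical parameters `a_true` and `b_true`, random exploratory motor commands y, and a fixed learning rate.

```python
import random

def learn_forward_model(a_true=0.8, b_true=0.5, lr=0.1, steps=2000, seed=0):
    """Online LMS fit of a scalar linear forward model x' = a*x + b*y.

    a_true and b_true describe a hypothetical noise-free plant; the model
    parameters a_hat and b_hat are updated by a local gradient step on the
    squared one-step prediction error.
    """
    rng = random.Random(seed)
    a_hat, b_hat = 0.0, 0.0
    x = 0.0
    for _ in range(steps):
        y = rng.uniform(-1.0, 1.0)               # exploratory motor command
        x_next = a_true * x + b_true * y         # plant response (sensor value)
        err = (a_hat * x + b_hat * y) - x_next   # forward-model prediction error
        # Gradient of 0.5 * err**2 with respect to each parameter.
        a_hat -= lr * err * x
        b_hat -= lr * err * y
        x = x_next
    return a_hat, b_hat
```

With the default seed, both estimates converge to the plant parameters. A larger learning rate gives a faster response but can destabilise the update when the inputs are large, which is one way to read the trade-off between response speed and prediction quality.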

    The free energy principle induces neuromorphic development

    We show how any finite physical system with morphological, i.e. three-dimensional embedding or shape, degrees of freedom and locally limited free energy will, under the constraints of the free energy principle, evolve over time towards a neuromorphic morphology that supports hierarchical computations in which each ‘level’ of the hierarchy enacts a coarse-graining of its inputs, and dually, a fine-graining of its outputs. Such hierarchies occur throughout biology, from the architectures of intracellular signal transduction pathways to the large-scale organization of perception and action cycles in the mammalian brain. The close formal connections between cone-cocone diagrams (CCCDs) as models of quantum reference frames on the one hand, and between CCCDs and topological quantum field theories on the other, allow the representation of such computations in the fully general quantum-computational framework of topological quantum neural networks.

    Anticipation in multiple-criteria decision-making under uncertainty

    Advisor: Fernando José Von Zuben. PhD thesis, Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação. Abstract: The presence of uncertainty in future outcomes can lead to indecision in choice processes, especially when eliciting the relative importance of multiple decision criteria and of long-term vs. near-term performance. Some decisions, however, must be taken under incomplete information, which may result in hasty actions with unforeseen consequences. When a solution must be selected under multiple conflicting views for operating in time-varying and noisy environments, implementing flexible provisional alternatives can be critical to circumvent the lack of complete information by keeping future options open. Anticipatory engineering can then be regarded as the strategy of designing flexible solutions that enable decision makers to respond robustly to unpredictable scenarios.
This strategy can thus mitigate the risks of strong unintended commitments to uncertain alternatives, while increasing adaptability to future changes. In this thesis, the roles of anticipation and of flexibility in automating sequential multiple-criteria decision-making processes under uncertainty are investigated. The dilemma of assigning relative importance to decision criteria and to immediate rewards under incomplete information is then handled by autonomously anticipating flexible decisions predicted to maximally preserve the diversity of future choices. An online anticipatory learning methodology is then proposed for improving the range and quality of future trade-off solution sets. This goal is achieved by predicting maximal expected hypervolume sets, for which the anticipation capabilities of multi-objective metaheuristics are augmented with Bayesian tracking in both the objective and search spaces. The methodology has been applied to obtain investment decisions that are shown to significantly improve the future hypervolume of trade-off financial portfolios for out-of-sample stock data, when compared to a myopic strategy. Moreover, implementing flexible portfolio rebalancing decisions was confirmed as a significantly better strategy than randomly choosing an investment decision from the evolved stochastic efficient frontier, in all tested artificial and real-world markets. Finally, the results suggest that anticipating flexible choices led to portfolio compositions that are significantly correlated with the observed improvements in out-of-sample future expected hypervolume.
Doctorate. Computer Engineering. Doctor of Electrical Engineering.
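The hypervolume indicator measures the region of objective space that a solution set dominates, bounded by a reference point. As an illustration only, the sketch below computes the exact hypervolume of a two-objective set with both objectives minimised (for portfolios one might, hypothetically, use risk and negated return). It does not model the expected hypervolume or the Bayesian tracking described in the thesis; `hypervolume_2d` is a hypothetical helper name.

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by `ref`, both objectives minimised.

    Points are swept in ascending order of the first objective; each point that
    improves on the best second objective seen so far adds a horizontal strip
    from its first objective out to the reference point.
    """
    # Ignore points that do not strictly improve on the reference point.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    best_f2 = ref[1]
    for f1, f2 in pts:
        if f2 < best_f2:  # otherwise the point is dominated by an earlier one
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area
```

For the set {(1, 3), (2, 2), (3, 1)} and reference point (4, 4), the dominated area is 6.0; adding the dominated point (3, 3) leaves it unchanged, which is why a larger hypervolume signals a better-spread, better-converged trade-off set.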

    Principles of sensorimotor control and learning in complex motor tasks

    The brain coordinates a continuous coupling between perception and action in the presence of uncertainty and incomplete knowledge about the world. This mapping is enabled by control policies, and motor learning can be viewed as the update of such policies to improve performance given some task objectives. Despite substantial progress in computational sensorimotor control and empirical approaches to motor adaptation, it remains unclear how the brain learns motor control policies while updating its internal model of the world. In light of this challenge, we propose a computational framework that employs error-based learning and exploits the brain’s inherent link between forward models and feedback control to compute dynamically updated policies. The framework merges optimal feedback control (OFC) policy learning with a steady system identification of task dynamics so as to explain behavior in complex object manipulation tasks. Its formalization encompasses our empirical findings that action is learned and generalised with regard to both a body-based and an object-based frame of reference. Importantly, our approach successfully predicts how the brain makes continuous decisions for the generation of complex trajectories in an experimental paradigm of unfamiliar task conditions. A complementary method extends the motor learning perspective from the level of policy optimisation to the level of policy exploration. It employs computational analysis to reverse engineer and subsequently assess the control process in a whole-body manipulation paradigm. Another contribution of this thesis is to link motor psychophysics and computational motor control to their underlying neural foundation; this link calls for further advancement in motor neuroscience and can inform our theoretical insight into sensorimotor processes in the context of physiological constraints.
    To this end, we design, build and test an fMRI-compatible haptic object manipulation system to relate closed-loop motor control studies to neurophysiology. The system is clinically adjusted and employed to host a naturalistic object manipulation paradigm with healthy human subjects and Friedreich’s ataxia patients. We present methodology that elicits neuroimaging correlates of sensorimotor control and learning and extracts longitudinal neurobehavioral markers of disease progression (i.e. neurodegeneration). Our findings enhance the understanding of sensorimotor control and learning mechanisms that underlie complex motor tasks. They furthermore provide a unified methodological platform to bridge the divide between behavior, computation and neural implementation, with promising clinical and technological implications (e.g. diagnostics, robotics, BMI).
    Open Access
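Optimal feedback control is commonly formalised as a linear-quadratic regulator (LQR). The sketch below, assuming a scalar linear plant x[t+1] = a*x[t] + b*u[t] and quadratic costs q*x^2 + r*u^2 with all values hypothetical, computes finite-horizon feedback gains by the backward Riccati recursion. In a certainty-equivalent reading of the framework above, a and b could be replaced by system-identification estimates, but that coupling, as well as sensory noise and state estimation, is omitted here.

```python
def lqr_gains(a, b, q, r, q_final, horizon):
    """Finite-horizon discrete LQR gains for x[t+1] = a*x[t] + b*u[t].

    The control law is u[t] = -gains[t] * x[t]. The recursion runs backwards
    from the terminal cost weight q_final (the cost-to-go coefficient p).
    """
    p = q_final
    gains = []
    for _ in range(horizon):
        k = (b * p * a) / (r + b * p * b)   # optimal gain for this step
        p = q + a * p * a - a * p * b * k   # cost-to-go one step earlier
        gains.append(k)
    gains.reverse()  # gains[0] applies at t = 0, gains[-1] at the last step
    return gains
```

With a = b = q = r = 1, a terminal weight of 1 and a horizon of 50, the last-step gain is 0.5 while the early gains approach (sqrt(5) - 1)/2 ≈ 0.618, the infinite-horizon solution; the policy is thus time-varying only near the end of the movement.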

    Intrinsic Motivation in Computational Creativity Applied to Videogames

    PhD thesis. Computational creativity (CC) seeks to endow artificial systems with creativity. Although human creativity is known to be substantially driven by intrinsic motivation (IM), most CC systems are extrinsically motivated. This restricts their actual and perceived creativity and autonomy, and consequently their benefit to people. In this thesis, we demonstrate, via theoretical arguments and through applications in videogame AI, that computational intrinsic reward and models of IM can advance core CC goals. We introduce a definition of IM to contextualise related work. Via two systematic reviews, we develop typologies of the benefits and applications of intrinsic reward and IM models in CC and game AI. Our reviews highlight that related work is limited to few reward types and motivations, and we thus investigate the usage of empowerment, a little-studied, information-theoretic intrinsic reward, in two novel models applied to game AI. We define coupled empowerment maximisation (CEM), a social IM model, to enable general co-creative agents that support or challenge their partner through emergent behaviours. Via two qualitative, observational vignette studies on a custom-made videogame, we explore CEM’s ability to drive general and believable companion and adversary non-player characters which respond creatively to changes in their abilities and the game world. We moreover propose to leverage intrinsic reward to estimate people’s experience of interactive artefacts in an autonomous fashion. We instantiate this proposal in empowerment-based player experience prediction (EBPXP) and apply it to videogame procedural content generation. By analysing think-aloud data from an experiential vignette study on a dedicated game, we identify several experiences that EBPXP could predict. Our typologies serve as inspiration and reference for CC and game AI researchers to harness the benefits of IM in their work.
    Our new models can increase the generality, autonomy and creativity of next-generation videogame AI, and of CC systems in other domains.