5 research outputs found

    Programming Robosoccer agents by modelling human behavior

    Get PDF
    The Robosoccer simulator is a challenging environment for artificial intelligence: a human programs a team of agents and introduces it into a virtual soccer environment. Robosoccer agents are usually programmed by hand. In some cases agents use machine learning (ML) to adapt to and predict the behavior of the opposing team, but the bulk of the agent is still preprogrammed. The main aim of this paper is to turn Robosoccer into an interactive game and let a human control a Robosoccer agent. ML techniques can then be used to model his/her behavior from training instances generated during play. This model is later used to control a Robosoccer agent, thus imitating the human's behavior. We have focused our research on low-level behaviors, such as looking for the ball, driving the ball towards the goal, or scoring in the presence of opponent players. Results show that Robosoccer agents can indeed be controlled by programs that model human play.
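
    As a concrete illustration of the recipe this abstract describes, here is a minimal Python sketch of imitation from logged play: record (state, action) pairs while a human controls the agent, fit a classifier, then let the classifier act. The environment interface (env.observe, env.last_human_action, env.act) and the example feature/action names are hypothetical stand-ins, not the paper's actual API.

        from sklearn.tree import DecisionTreeClassifier

        def record_human_play(env, n_steps):
            """Collect training instances (state features, human's action) during play."""
            states, actions = [], []
            for _ in range(n_steps):
                states.append(env.observe())             # e.g. ball distance/angle, goal angle
                actions.append(env.last_human_action())  # e.g. 'turn', 'dash', 'kick'
            return states, actions

        def clone_and_control(env, states, actions, n_steps):
            """Fit a model of the human's low-level skill and let it drive the agent."""
            model = DecisionTreeClassifier().fit(states, actions)
            for _ in range(n_steps):
                env.act(model.predict([env.observe()])[0])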

    Harnessing Mixed Offline Reinforcement Learning Datasets via Trajectory Weighting

    Full text link
    Most offline reinforcement learning (RL) algorithms return a target policy maximizing a trade-off between (1) the expected performance gain over the behavior policy that collected the dataset, and (2) the risk stemming from the out-of-distribution-ness of the induced state-action occupancy. It follows that the performance of the target policy is strongly related to the performance of the behavior policy and, thus, to the trajectory return distribution of the dataset. We show that in mixed datasets consisting of mostly low-return trajectories and a minority of high-return trajectories, state-of-the-art offline RL algorithms are overly restrained by the low-return trajectories and fail to exploit the high-performing trajectories to the fullest. To overcome this issue, we show that, in deterministic MDPs with stochastic initial states, the dataset sampling can be re-weighted to induce an artificial dataset whose behavior policy has a higher return. This re-weighted sampling strategy can be combined with any offline RL algorithm. We further show that the opportunity for performance improvement over the behavior policy correlates with the positive-sided variance of the returns of the trajectories in the dataset. We empirically show that while CQL, IQL, and TD3+BC achieve only part of this potential policy improvement, the same algorithms combined with our re-weighted sampling strategy fully exploit the dataset. Furthermore, we empirically demonstrate that, despite its theoretical limitation, the approach can still be efficient in stochastic environments. The code is available at https://github.com/Improbable-AI/harness-offline-rl
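
    To make the re-weighting idea concrete, below is a minimal Python sketch of return-weighted trajectory sampling. The softmax-over-returns weighting and the temperature parameter are illustrative assumptions, not necessarily the paper's exact scheme; see the linked repository for the authors' implementation.

        import numpy as np

        def trajectory_weights(returns, temperature=1.0):
            """Weight each trajectory by exp(return / T): higher-return trajectories
            are drawn more often, so the resampled dataset's implicit behavior
            policy has a higher return than the original dataset's."""
            r = np.asarray(returns, dtype=np.float64)
            z = (r - r.max()) / temperature  # subtract max for numerical stability
            w = np.exp(z)
            return w / w.sum()

        def sample_batch(trajectories, returns, batch_size, rng=None):
            """Draw transitions for any off-the-shelf offline RL algorithm:
            pick a trajectory by weight, then a transition uniformly within it."""
            rng = rng or np.random.default_rng()
            probs = trajectory_weights(returns)
            batch = []
            for _ in range(batch_size):
                traj = trajectories[rng.choice(len(trajectories), p=probs)]
                batch.append(traj[rng.integers(len(traj))])
            return batch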

    Inducing models of human control skills

    Full text link

    Learning obstacle avoidance by a mobile robot

    Full text link

    Exploring participative learner modelling and its effects on learner behaviour

    Get PDF
    The educational benefits of involving learners as active players in the learner modelling process have been an important motivation for research on this form of learner modelling, henceforth referred to as participative learner modelling. Such benefits, conceived as the promotion of learners' reflection on and awareness of their own knowledge, have in most cases been asserted on the grounds of system design and supported only by anecdotal evidence. This dissertation explores whether participative learner modelling actually promotes learners' reflection and awareness. It does so by first interpreting 'reflection' and 'awareness' in light of "classical" theories of human cognitive architecture, skill acquisition and meta-cognition, in order to infer changes in learner abilities (and therefore behaviour) amenable to empirical corroboration. The occurrence of such changes is then tested for an implementation of a paradigmatic form of participative learner modelling: allowing learners to inspect and modify their learner models. The domain of application centres on the sensorimotor skill of controlling a pole on a cart and represents a novel type of domain for participative learner modelling. Special attention is paid to evaluating the method developed for constructing learner models and the form of presenting them to learners: the former is based on behavioural cloning, a method for acquiring expert knowledge by means of machine learning; the latter concerns the modularity of the learner models and the modality and interactivity of their presentation. The outcome of this research suggests that participative learner modelling may increase learners' ability to report their problem-solving knowledge accurately and to carry out novel tasks in the same domain, which is the sort of behavioural change expected from increased awareness and reflection. More importantly perhaps, the research suggests a viable methodology for examining the educational benefits of participative learner modelling, and it exemplifies the difficulties that such endeavours will face.
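
    Since the abstract names behavioural cloning as the model-construction method, the following Python sketch shows one way such an inspectable learner model could be built from pole-and-cart traces. The feature names, the (state, action) trace format, and the shallow-decision-tree choice are assumptions made for illustration; a tree is used here because its rules can be rendered in a form a learner can inspect and modify.

        from sklearn.tree import DecisionTreeClassifier, export_text

        FEATURES = ["cart_pos", "cart_vel", "pole_angle", "pole_ang_vel"]

        def build_learner_model(traces):
            """traces: list of (state_vector, action) pairs logged from the
            learner's own pole-balancing runs; action is e.g. 'left'/'right'."""
            states = [s for s, _ in traces]
            actions = [a for _, a in traces]
            model = DecisionTreeClassifier(max_depth=3).fit(states, actions)
            # Render the cloned skill as rules the learner can inspect and modify.
            print(export_text(model, feature_names=FEATURES))
            return model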