5 research outputs found
Programming Robosoccer agents by modelling human behavior
The Robosoccer simulator is a challenging environment for artificial intelligence, in which a human has to program a team of agents and introduce it into a virtual soccer environment. Usually, Robosoccer agents are programmed by hand. In some cases, agents use machine learning (ML) to adapt to and predict the behavior of the opposing team, but the bulk of the agent is preprogrammed. The main aim of this paper is to transform Robosoccer into an interactive game and let a human control a Robosoccer agent. ML techniques can then be used to model his/her behavior from training instances generated during play. This model is later used to control a Robosoccer agent, thus imitating the human's behavior. We have focused our research on low-level behaviors, like looking for the ball, dribbling the ball towards the goal, or scoring in the presence of opponent players. Results show that Robosoccer agents can indeed be controlled by programs that model human play.
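The abstract's pipeline starts by pairing the agent's perception with the human's command at each simulation tick. A minimal sketch of such instance logging is shown below; the perception fields (`ball_dist`, `ball_angle`, `goal_angle`) and the command set are illustrative assumptions, not the paper's actual feature set.

```python
# Hypothetical sketch: log one (perception, command) training instance per
# tick while a human plays, producing a table for a later behavior model.
import csv
import io

def log_instance(writer, perception, command):
    """Append one (perception, command) training instance as a CSV row."""
    writer.writerow([perception["ball_dist"], perception["ball_angle"],
                     perception["goal_angle"], command])

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["ball_dist", "ball_angle", "goal_angle", "command"])
# Two example ticks of human play (values are made up for illustration).
log_instance(writer, {"ball_dist": 12.4, "ball_angle": -30.0,
                      "goal_angle": 5.0}, "turn")
log_instance(writer, {"ball_dist": 0.8, "ball_angle": 2.0,
                      "goal_angle": 10.0}, "kick")
rows = buf.getvalue().strip().splitlines()
```

Each row then serves as a supervised training instance for the behavior model described in the abstract.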
Harnessing Mixed Offline Reinforcement Learning Datasets via Trajectory Weighting
Most offline reinforcement learning (RL) algorithms return a target policy
maximizing a trade-off between (1) the expected performance gain over the
behavior policy that collected the dataset, and (2) the risk stemming from the
out-of-distribution-ness of the induced state-action occupancy. It follows that
the performance of the target policy is strongly related to the performance of
the behavior policy and, thus, the trajectory return distribution of the
dataset. We show that in mixed datasets consisting of mostly low-return
trajectories and minor high-return trajectories, state-of-the-art offline RL
algorithms are overly restrained by low-return trajectories and fail to exploit
high-performing trajectories to the fullest. To overcome this issue, we show
that, in deterministic MDPs with stochastic initial states, the dataset
sampling can be re-weighted to induce an artificial dataset whose behavior
policy has a higher return. This re-weighted sampling strategy may be combined
with any offline RL algorithm. We further show that the opportunity for
performance improvement over the behavior policy correlates with the
positive-sided variance of the returns of the trajectories in the dataset. We
empirically show that while CQL, IQL, and TD3+BC achieve only a part of this
potential policy improvement, these same algorithms combined with our
reweighted sampling strategy fully exploit the dataset. Furthermore, we
empirically demonstrate that, despite its theoretical limitation, the approach
may still be efficient in stochastic environments. The code is available at
https://github.com/Improbable-AI/harness-offline-rl
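The core idea of the abstract, re-weighting trajectory sampling so the induced behavior policy has a higher return, can be sketched as follows. The softmax weighting and `temperature` parameter are illustrative assumptions for this sketch, not the paper's exact scheme.

```python
import math
import random

def reweighted_sampler(trajectories, temperature=1.0, seed=0):
    """Sample trajectories with probability increasing in their return.

    Trajectories are lists of (state, action, reward) tuples. Weighting by
    a softmax over trajectory returns is an illustrative choice; the point
    is only that high-return trajectories dominate the induced dataset.
    """
    rng = random.Random(seed)
    returns = [sum(r for _, _, r in traj) for traj in trajectories]
    m = max(returns)  # subtract max for numerical stability
    weights = [math.exp((g - m) / temperature) for g in returns]
    total = sum(weights)
    probs = [w / total for w in weights]

    def sample():
        return rng.choices(trajectories, weights=probs, k=1)[0]

    return sample, probs

# Toy mixed dataset: many low-return trajectories, one high-return one.
low = [[("s", "a", 0.0)] for _ in range(9)]
high = [[("s", "a", 10.0)]]
sample, probs = reweighted_sampler(low + high, temperature=1.0)
```

With uniform sampling the high-return trajectory would be drawn 10% of the time; under the re-weighted distribution it dominates, which is the effect the abstract attributes to its re-weighted sampling strategy.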
Exploring participative learner modelling and its effects on learner behaviour
The educational benefits of involving learners as active players in the learner modelling process
have been an important motivation for research on this form of learner modelling, henceforth
referred to as participative learner modelling. Such benefits, conceived as the promotion of learners' reflection on and awareness of their own knowledge, have in most cases been asserted
on the grounds of system design and supported only by anecdotal evidence. This dissertation explores the issue of whether participative learner modelling actually promotes
learners' reflection and awareness. It does so by firstly interpreting 'reflection' and
'awareness' in light of "classical" theories of human cognitive architecture, skill acquisition
and meta-cognition, in order to infer changes in learner abilities (and therefore behaviour)
amenable to empirical corroboration. The occurrence of such changes is then tested for an implementation of a paradigmatic form of participative learner modelling: allowing learners to
inspect and modify their learner models.

The domain of application centres on the sensorimotor skill of controlling a pole on a cart and represents a novel type of domain for participative learner modelling. Special attention is paid to evaluating the method developed for constructing learner models and the form of presenting them to learners: the former is based on behavioural cloning, a machine learning technique for acquiring expert knowledge from demonstrations; the latter deals with the modularity of the learner models and the modality and interactivity of their presentation.

The outcome of this research suggests that participative learner modelling may increase learners' ability to report their problem-solving knowledge accurately and to carry out novel tasks in the same domain: the sort of behavioural changes expected from increased learner awareness and reflection. Perhaps more importantly, the research suggests a viable methodology for examining the educational benefits of participative learner modelling. It also exemplifies the difficulties that such endeavours will face.
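Behavioural cloning, mentioned above as the method for constructing learner models, fits a policy to (state, action) traces of demonstrated behaviour and then replays it. The 1-nearest-neighbour learner and the pole-balancing state features below are illustrative assumptions for this sketch, not the dissertation's exact method.

```python
def clone_policy(traces):
    """Return a policy imitating demonstrated (state, action) pairs."""
    def policy(state):
        # Pick the action whose demonstrated state is closest (1-NN by
        # squared Euclidean distance); a real system would fit a model.
        best = min(traces, key=lambda sa: sum((a - b) ** 2
                                              for a, b in zip(sa[0], state)))
        return best[1]
    return policy

# Demonstrations: (pole_angle, angular_velocity) -> control command.
traces = [((-0.2, -0.1), "push_left"),
          ((0.3, 0.2), "push_right"),
          ((0.0, 0.0), "no_push")]
policy = clone_policy(traces)
action = policy((0.25, 0.15))  # nearest demonstration is (0.3, 0.2)
```

Comparing such a cloned model against a learner's actual behaviour is one way the modelling step described in the abstract could be realised.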