27 research outputs found
Toward Real-Time Decentralized Reinforcement Learning using Finite Support Basis Functions
This paper addresses the design and implementation of complex Reinforcement
Learning (RL) behaviors where multi-dimensional action spaces are involved, as
well as the need to execute the behaviors in real-time using robotic platforms
with limited computational resources and training times. For this purpose, we
propose the use of decentralized RL, in combination with finite support basis
functions as alternatives to Gaussian RBF, in order to alleviate the effects of
the curse of dimensionality on the action and state spaces respectively, and to
reduce the computation time. As testbed, a RL based controller for the in-walk
kick in NAO robots, a challenging and critical problem for soccer robotics, is
used. The reported experiments show empirically that our solution saves up to
99.94% of execution time and 98.82% of memory consumption during execution,
without diminishing performance compared to classical approaches.Comment: Accepted in the RoboCup Symposium 2017. Final version will be
published at Springe
Adaptive action supervision in reinforcement learning from real-world multi-agent demonstrations
Modeling of real-world biological multi-agents is a fundamental problem in
various scientific and engineering fields. Reinforcement learning (RL) is a
powerful framework to generate flexible and diverse behaviors in cyberspace;
however, when modeling real-world biological multi-agents, there is a domain
gap between behaviors in the source (i.e., real-world data) and the target
(i.e., cyberspace for RL), and the source environment parameters are usually
unknown. In this paper, we propose a method for adaptive action supervision in
RL from real-world demonstrations in multi-agent scenarios. We adopt an
approach that combines RL and supervised learning by selecting actions of
demonstrations in RL based on the minimum distance of dynamic time warping for
utilizing the information of the unknown source dynamics. This approach can be
easily applied to many existing neural network architectures and provide us
with an RL model balanced between reproducibility as imitation and
generalization ability to obtain rewards in cyberspace. In the experiments,
using chase-and-escape and football tasks with the different dynamics between
the unknown source and target environments, we show that our approach achieved
a balance between the reproducibility and the generalization ability compared
with the baselines. In particular, we used the tracking data of professional
football players as expert demonstrations in football and show successful
performances despite the larger gap between behaviors in the source and target
environments than the chase-and-escape task.Comment: 14 pages, 5 figure
Accelerate Multi-Agent Reinforcement Learning in Zero-Sum Games with Subgame Curriculum Learning
Learning Nash equilibrium (NE) in complex zero-sum games with multi-agent
reinforcement learning (MARL) can be extremely computationally expensive.
Curriculum learning is an effective way to accelerate learning, but an
under-explored dimension for generating a curriculum is the difficulty-to-learn
of the subgames -- games induced by starting from a specific state. In this
work, we present a novel subgame curriculum learning framework for zero-sum
games. It adopts an adaptive initial state distribution by resetting agents to
some previously visited states where they can quickly learn to improve
performance. Building upon this framework, we derive a subgame selection metric
that approximates the squared distance to NE values and further adopt a
particle-based state sampler for subgame generation. Integrating these
techniques leads to our new algorithm, Subgame Automatic Curriculum Learning
(SACL), which is a realization of the subgame curriculum learning framework.
SACL can be combined with any MARL algorithm such as MAPPO. Experiments in the
particle-world environment and Google Research Football environment show SACL
produces much stronger policies than baselines. In the challenging
hide-and-seek quadrant environment, SACL produces all four emergent stages and
uses only half the samples of MAPPO with self-play. The project website is at
https://sites.google.com/view/sacl-rl
Recommended from our members
Hypernetworks Analysis of RoboCup Interactions
Robotic soccer simulations are controlled environments in which the rich variety of interactions among agents make them good candidates to be studied as complex adaptive systems. The challenge is to create an autonomous team of soccer agents that can adapt and improve its behaviour as it plays other teams. By analogy with chess, the movements of the soccer agents and the ball form ever-changing networks as players in one team form structures that give their team an advantage. For example, the Defender’s Dilemma involves relationships between an attacker with the ball, a team-mate and a defender. The defender must choose between tackling the player with the ball, or taking a position to intercept a pass to the other attacker. Since these structures involve more that two interacting entities it is necessary to go beyond networks to multidimensional hypernetworks. In this context, this thesis investigates (i) is it possible to identify patterns of play, that lead a team to obtain an advantage ?, (ii) is it possible to forecast with a good degree of accuracy if a certain game action or sequence of game actions is going to be successful, before it has been completed ?, and (iii) is it possible to make behavioural patterns emerge in the game without specifying the behavioural rules in detail ? To investigate these research questions we devised two methods to analyse the interactions between robotic players, one based on traditional programming and one based on Deep Learning. The first method identified thousands of Defender’s Dilemma configurations from RoboCup 2D simulator games and found a statistically significant association between winning and the creation of the defender’s dilemma by the attackers of the winning team. The second method showed that a feedforward Artificial Neural Network trained on thousands of games can take as input the current game configuration and forecast to a high degree of accuracy if the current action will end up in a goal or not. Finally, we designed our own fast and simple robotic soccer simulator for investigating Reinforcement Learning. This showed that Reinforcement Learning using Proximal Policy Optimization could train two agents in the task of scoring a goal, using only basic actions without using pre-built hand-programmed skills. These experiments provide evidence that it is possible: to identify advantageous patterns of play; to forecast if an action or sequence of actions will be successful; and to make behavioural patterns emerge in the game without specifying the behavioural rules in detail
Motion Synthesis and Control for Autonomous Agents using Generative Models and Reinforcement Learning
Imitating and predicting human motions have wide applications in both graphics and robotics, from developing realistic models of human movement and behavior in immersive virtual worlds and games to improving autonomous navigation for service agents deployed in the real world. Traditional approaches for motion imitation and prediction typically rely on pre-defined rules to model agent behaviors or use reinforcement learning with manually designed reward functions. Despite impressive results, such approaches cannot effectively capture the diversity of motor behaviors and the decision making capabilities of human beings. Furthermore, manually designing a model or reward function to explicitly describe human motion characteristics often involves laborious fine-tuning and repeated experiments, and may suffer from generalization issues. In this thesis, we explore data-driven approaches using generative models and reinforcement learning to study and simulate human motions. Specifically, we begin with motion synthesis and control of physically simulated agents imitating a wide range of human motor skills, and then focus on improving the local navigation decisions of autonomous agents in multi-agent interaction settings. For physics-based agent control, we introduce an imitation learning framework built upon generative adversarial networks and reinforcement learning that enables humanoid agents to learn motor skills from a few examples of human reference motion data. Our approach generates high-fidelity motions and robust controllers without needing to manually design and finetune a reward function, allowing at the same time interactive switching between different controllers based on user input. Based on this framework, we further propose a multi-objective learning scheme for composite and task-driven control of humanoid agents. Our multi-objective learning scheme balances the simultaneous learning of disparate motions from multiple reference sources and multiple goal-directed control objectives in an adaptive way, enabling the training of efficient composite motion controllers. Additionally, we present a general framework for fast and robust learning of motor control skills. Our framework exploits particle filtering to dynamically explore and discretize the high-dimensional action space involved in continuous control tasks, and provides a multi-modal policy as a substitute for the commonly used Gaussian policies. For navigation learning, we leverage human crowd data to train a human-inspired collision avoidance policy by combining knowledge distillation and reinforcement learning. Our approach enables autonomous agents to take human-like actions during goal-directed steering in fully decentralized, multi-agent environments. To inform better control in such environments, we propose SocialVAE, a variational autoencoder based architecture that uses timewise latent variables with socially-aware conditions and a backward posterior approximation to perform agent trajectory prediction. Our approach improves current state-of-the-art performance on trajectory prediction tasks in daily human interaction scenarios and more complex scenes involving interactions between NBA players. We further extend SocialVAE by exploiting semantic maps as context conditions to generate map-compliant trajectory prediction. Our approach processes context conditions and social conditions occurring during agent-agent interactions in an integrated manner through the use of a dual-attention mechanism. We demonstrate the real-time performance of our approach and its ability to provide high-fidelity, multi-modal predictions on various large-scale vehicle trajectory prediction tasks
Scaling multi-agent reinforcement learning to eleven aside simulated robot soccer
Electrical and Electronic Engineerin
Robotics 2010
Without a doubt, robotics has made an incredible progress over the last decades. The vision of developing, designing and creating technical systems that help humans to achieve hard and complex tasks, has intelligently led to an incredible variety of solutions. There are barely technical fields that could exhibit more interdisciplinary interconnections like robotics. This fact is generated by highly complex challenges imposed by robotic systems, especially the requirement on intelligent and autonomous operation. This book tries to give an insight into the evolutionary process that takes place in robotics. It provides articles covering a wide range of this exciting area. The progress of technical challenges and concepts may illuminate the relationship between developments that seem to be completely different at first sight. The robotics remains an exciting scientific and engineering field. The community looks optimistically ahead and also looks forward for the future challenges and new development
Talent Identification and Development in Sports Performance
The identification and development of talent have always been a relevant topic in sports performance. In fact, a significant body of research is available worldwide discussing this longitudinal process, the qualities that underpin elite sports performance, and how coaches can facilitate the developmental process of talented athletes. Despite the continued interest given to issues of talent identification and development, recent literature highlights the low predictive value of applied and theoretical talent identification models.
Talent is the expression of a complex and multidimensional phenomenon, where, despite the existing practical recommendations, many coaches and stakeholders continue to fail to adequately value the distinction between growth, maturation, and training age. Technological resources have enabled important advances, however, this has been limited essentially to defining or validating motor skills variables or genetic markers that characterize the most talented athletes. Emerging technological resources and recent methodological advances are enabling integrated assessment and monitoring to include maturational, physiological, biomechanical, and perceptual skills while also creating optimal environments for performance and dealing with injury prevention and recovery
The frequency of falls in children judo training
Purpose: Falling techniques are inseparable part of youth judo training. Falling techniques are related to avoiding injuries exercises (Nauta et al., 2013). There is not good evidence about the ratio of falling during the training in children. Methods: 26 children (age 8.88±1.88) were video recorded on ten training sessions for further indirect observation and performance analysis. Results: Research protocol consisted from recording falls and falling techniques (Reguli et al., 2015) in warming up, combat games, falling techniques, throwing techniques and free fighting (randori) part of the training session. While children were taught almost exclusively forward slapping roll, backward slapping roll and sideward direct slapping fall, in other parts of training also other types of falling, as forward fall on knees, naturally occurred. Conclusions: Judo coaches should stress also on teaching unorthodox falls adding to standard judo curriculum (Koshida et al., 2014). Various falling games to teach children safe falling in different conditions should be incorporated into judo training. Further research to gain more data from groups of different age in various combat and non-combat sports is needed