
    Data-driven action-value functions for evaluating players in professional team sports

    As more and larger event stream datasets for professional sports become available, there is growing interest in modeling the complex play dynamics to evaluate player performance. Among these models, a common player evaluation method is assigning values to player actions. Traditional action-value metrics, however, consider very limited game context and player information. Furthermore, they value only actions directly related to goals (e.g., shots), not all actions. Recent work has shown that reinforcement learning provides powerful methods for quantifying the value of player actions in sports. This dissertation develops deep reinforcement learning (DRL) methods for estimating action values in sports. We make several contributions to DRL for sports. First, we develop neural network architectures that learn an action-value Q-function from sports event logs to estimate each team's expected success given the current match context. Specifically, our architecture models the game history with a recurrent network and predicts the probability that a team scores the next goal. From the learned Q-values, we derive a Goal Impact Metric (GIM) for evaluating a player's performance over a game season. We show that the resulting player rankings are consistent with standard player metrics and temporally consistent within and across seasons. Second, we address the interpretability of the learned Q-values. While neural networks provide accurate estimates, their black-box structure prohibits understanding the influence of different game features on the action values. To interpret the Q-function and understand the influence of game features on action values, we design an interpretable mimic learning framework for the DRL model. The framework is based on a Linear Model U-Tree (LMUT) as a transparent mimic model, which facilitates extracting the function's rules and computing the feature importance for action values. Third, we incorporate information about specific players into the action values by introducing a deep player representation framework. In this framework, each player is assigned a latent feature vector called an embedding, with the property that statistically similar players are mapped to nearby embeddings. To compute embeddings that summarize the statistical information about players, we implement a Variational Recurrent Ladder Agent Encoder (VaRLAE) to learn a contextualized representation of when and how players are likely to act. We learn and evaluate deep Q-functions from event data for both ice hockey and soccer. These are challenging continuous-flow games where game context and medium-term consequences are crucial for properly assessing the impact of a player's actions.
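    The recurrent Q-network and the impact metric described above can be illustrated with a short sketch. The PyTorch code below is a hypothetical rendering, not the dissertation's code: class names, feature sizes, and the exact impact formula are illustrative assumptions. An LSTM encodes the match's event history, a sigmoid head outputs per-action probabilities of scoring the next goal, and an action's impact is taken as the change in Q-value across consecutive events.

```python
import torch
import torch.nn as nn

class RecurrentQNetwork(nn.Module):
    """Hypothetical sketch: encode the event history of a match with an
    LSTM and output, per action, the probability that the acting team
    scores the next goal (the Q-values the abstract describes)."""

    def __init__(self, n_event_features: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(n_event_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, events: torch.Tensor) -> torch.Tensor:
        # events: (batch, time, n_event_features) -- one row per game event
        out, _ = self.lstm(events)
        return torch.sigmoid(self.head(out[:, -1]))  # (batch, n_actions)

def goal_impact(q_net: RecurrentQNetwork,
                history_t: torch.Tensor, action_t: int,
                history_prev: torch.Tensor, action_prev: int) -> float:
    """Impact of one action: Q after the event minus Q before it."""
    with torch.no_grad():
        q_t = q_net(history_t)[0, action_t]
        q_prev = q_net(history_prev)[0, action_prev]
    return (q_t - q_prev).item()
```

    A season-level GIM-style score for a player would then sum goal_impact over every event in which that player acted.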

    Efficient Deep Reinforcement Learning via Planning, Generalization, and Improved Exploration

    Reinforcement learning (RL) is a general-purpose machine learning framework in which an agent makes sequential decisions in an environment to maximize its reward. Deep reinforcement learning (DRL) approaches use deep neural networks as non-linear function approximators that parameterize policies or value functions directly from raw observations. Although DRL approaches have been shown to be successful on many challenging RL benchmarks, much of the prior work has focused on learning a single task in a model-free setting, which is often sample-inefficient. On the other hand, humans have the ability to acquire knowledge by learning a model of the world in an unsupervised fashion, to use such knowledge to plan ahead for decision making, to transfer knowledge between many tasks, and to generalize to previously unseen circumstances from the pre-learned knowledge. Developing such abilities is among the fundamental challenges for building RL agents that can learn as efficiently as humans. As a step towards developing the aforementioned capabilities in RL, this thesis develops new DRL techniques to address three important challenges in RL: 1) planning via prediction, 2) rapidly generalizing to new environments and tasks, and 3) efficient exploration in complex environments. The first part of the thesis discusses how to learn a dynamics model of the environment using deep neural networks and how to use such a model for planning in complex domains where observations are high-dimensional. Specifically, we present neural network architectures for action-conditional video prediction and demonstrate improved exploration in RL. In addition, we present a neural network architecture that performs lookahead planning by predicting the future only in terms of rewards and values, without predicting observations. We then discuss why this approach is beneficial compared to conventional model-based planning approaches. The second part of the thesis considers generalization to unseen environments and tasks. We first introduce a set of cognitive tasks in a 3D environment and present memory-based DRL architectures that generalize better to previously unseen 3D environments than existing baselines. In addition, we introduce a new multi-task RL problem where the agent should learn to execute different tasks depending on given instructions and generalize to new instructions in a zero-shot fashion. We present a new hierarchical DRL architecture that learns to generalize over previously unseen task descriptions with minimal prior knowledge. The third part of the thesis discusses how exploiting past experiences can indirectly drive deep exploration and improve sample efficiency. In particular, we propose a new off-policy learning algorithm, called self-imitation learning, which learns a policy to reproduce past good experiences. We empirically show that self-imitation learning indirectly encourages the agent to explore reasonably good state spaces and thus significantly improves sample efficiency on RL domains where exploration is challenging. Overall, the main contributions of this thesis are to explore several fundamental challenges in RL in the context of DRL and to develop new DRL architectures and algorithms to address such challenges. This allows us to understand how deep learning can be used to improve sample efficiency and thus come closer to human-like learning abilities.
    PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/145829/1/junhyuk_1.pd
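    Of the contributions above, the self-imitation learning objective is compact enough to sketch. The snippet below follows the published formulation of the idea, imitating a stored action only when its observed return exceeded the current value estimate; tensor shapes and the value-loss coefficient are illustrative assumptions.

```python
import torch

def self_imitation_loss(log_probs: torch.Tensor,
                        values: torch.Tensor,
                        returns: torch.Tensor,
                        value_coef: float = 0.01) -> torch.Tensor:
    """Sketch of the self-imitation objective described above.

    log_probs: log pi(a|s) for (s, a) pairs replayed from past episodes
    values:    current value estimates V(s), shape (batch,)
    returns:   discounted returns R actually observed, shape (batch,)
    """
    # (R - V)+ : positive only for experiences better than expected.
    advantage = (returns - values).clamp(min=0)
    policy_loss = -(log_probs * advantage.detach()).mean()  # imitate good actions
    value_loss = 0.5 * (advantage ** 2).mean()              # pull V up toward R
    return policy_loss + value_coef * value_loss
```

    Because the advantage is clipped at zero, experiences that performed worse than expected contribute nothing, which is what biases exploration toward reasonably good regions of the state space.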

    Deep learning for gait prediction: an application to exoskeletons for children with neurological disorders

    Cerebral Palsy, a non-progressive neurological disorder, is a lifelong condition. While it has no cure, clinical intervention aims to minimise the impact of the disability on individuals' lives. Wearable robotic devices, like exoskeletons, have been rapidly advancing and proving to be effective in rehabilitating individuals with gait pathologies. The utilization of artificial intelligence (AI) algorithms in controlling exoskeletons, particularly at the supervisory level, has emerged as a valuable approach. These algorithms rely on input from onboard sensors to predict gait phase, user intention, or joint kinematics. Using AI to improve the control of robotic devices not only enhances human-robot interaction but also has the potential to improve user comfort and functional outcomes of rehabilitation, and to reduce accidents and injuries. In this research study, a comprehensive systematic literature review is conducted, exploring the various applications of AI in lower-limb robotic control. This review focuses on methodological parameters such as sensor usage, training demographics, sample size, and types of models while identifying gaps in the existing literature. Building on the findings of the review, subsequent research leveraged the power of deep learning to predict gait trajectories for the application of rehabilitative exoskeleton control. This study addresses a gap in the existing literature by focusing on predicting pathological gait trajectories, which exhibit higher inter- and intra-subject variability compared to the gait of healthy individuals. The research focused on the gait of children with neurological disorders, particularly Cerebral Palsy, as they stand to benefit greatly from rehabilitative exoskeletons. State-of-the-art deep learning algorithms, including transformers, fully connected neural networks, convolutional neural networks, and long short-term memory networks, were implemented for gait trajectory prediction. This research presents findings on the performance of these models for short-term and long-term recursive predictions, the impact of varying input and output window sizes on prediction errors, the effect of adding variable levels of Gaussian noise, and the robustness of the models in predicting gait at speeds within and outside the speed range of the training set. Moreover, the research outlines a methodology for optimising the stability of long-term forecasts and provides a comparative analysis of gait trajectory forecasting for typically developing children and children with Cerebral Palsy. A novel approach to generating adaptive trajectories for children with Cerebral Palsy, which can serve as reference trajectories for position-controlled exoskeletons, is also presented.
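    As a concrete illustration of the windowed, recursive forecasting the study evaluates, the sketch below pairs a small LSTM one-step predictor with a long-horizon rollout that feeds each prediction back into the input window. Joint counts, hidden sizes, and the noise level are assumptions for illustration, not the study's configuration.

```python
import torch
import torch.nn as nn

class GaitForecaster(nn.Module):
    """One-step joint-trajectory predictor over a sliding input window."""

    def __init__(self, n_joints: int = 6, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_joints, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_joints)

    def forward(self, window: torch.Tensor) -> torch.Tensor:
        # window: (batch, input_len, n_joints) of joint angles
        h, _ = self.lstm(window)
        return self.out(h[:, -1])  # next time step: (batch, n_joints)

def recursive_forecast(model: GaitForecaster, window: torch.Tensor,
                       horizon: int) -> torch.Tensor:
    """Long-term prediction: feed each one-step output back into the
    window, the recursive scheme whose stability the study analyses."""
    preds = []
    with torch.no_grad():
        for _ in range(horizon):
            step = model(window)
            preds.append(step)
            window = torch.cat([window[:, 1:], step.unsqueeze(1)], dim=1)
    return torch.stack(preds, dim=1)  # (batch, horizon, n_joints)

# Gaussian-noise augmentation of the kind tested above (sigma assumed):
# noisy_window = window + 0.01 * torch.randn_like(window)
```

    Because each predicted step becomes part of the next input, small one-step errors compound over the horizon, which is why the input/output window sizes and noise levels studied above matter for long-term stability.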

    Deep learning for autonomous control of robot's flippers in simulation

    Neural networks have seen increasing use in various robotic tasks such as locomotion, largely due to advances in deep learning techniques and reinforcement learning algorithms. We examine several deep learning approaches to learning a semi-autonomous locomotion policy for a ground-based search-and-rescue robot using only a front-facing RGBD camera and proprioceptive data. A supervised learning approach is suggested and implemented for the case where we only have a real robot and no simulated environment. We also suggest a method to deal with potential issues of multimodal action distributions using an alternative loss proxy based on Generative Adversarial Networks (GANs). Reactive as well as recurrent policies implemented using RNNs are compared. A simulator is used to train policies for the robot using deep reinforcement learning. All policies are trained end-to-end, using convolutional neural networks for the high-dimensional image inputs. We examine the performance of policies trained with variously shaped rewards, such as rewards for low control effort and smooth locomotion. Experiments on the real robot using a recurrent policy learned in the simulator show that the policy transfers to the real environment with no fine-tuning, albeit with some performance degradation. We also suggest two potential methods of domain transfer based on image modification using Gram matrix matching and Generative Adversarial Networks.
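    The GAN-based loss proxy for multimodal actions mentioned above can be sketched as a small conditional GAN: a generator maps (observation, noise) to an action and a discriminator scores (observation, action) pairs, so the generator is never forced to average across expert modes the way an MSE regression is. Network sizes and the training step below are illustrative assumptions, not the thesis's implementation.

```python
import torch
import torch.nn as nn

class ActionGenerator(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, noise_dim: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + noise_dim, 64), nn.ReLU(),
            nn.Linear(64, act_dim))

    def forward(self, obs, noise):
        # Different noise samples can yield different valid actions,
        # which is how multimodality is preserved.
        return self.net(torch.cat([obs, noise], dim=-1))

class ActionDiscriminator(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))  # real/fake logit

def gan_losses(G, D, obs, expert_act, noise_dim: int = 8):
    """One training step's losses: D separates expert actions from
    generated ones; G tries to make its actions indistinguishable."""
    bce = nn.BCEWithLogitsLoss()
    fake = G(obs, torch.randn(obs.size(0), noise_dim))
    ones = torch.ones(obs.size(0), 1)
    zeros = torch.zeros(obs.size(0), 1)
    d_loss = bce(D(obs, expert_act), ones) + bce(D(obs, fake.detach()), zeros)
    g_loss = bce(D(obs, fake), ones)
    return d_loss, g_loss
```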

    Advanced Techniques for Design and Manufacturing in Marine Engineering

    Modern engineering design processes are driven by the extensive use of numerical simulations; naval architecture and ocean engineering are no exception. Computational power has improved over the last few decades; therefore, the integration of different tools such as CAD, FEM, CFD, and CAM has enabled complex modeling and manufacturing problems to be solved in a more feasible way. Classical naval design methodology can take advantage of this integration, giving rise to more robust designs in terms of shape, structural and hydrodynamic performance, and the manufacturing process. This Special Issue invites researchers and engineers from both academia and industry to publish the latest progress in design and manufacturing techniques in marine engineering and to debate the current issues and future perspectives in this research area. Suitable topics for this issue include, but are not limited to, the following:
    CAD-based approaches for designing the hull and appendages of sailing and engine-powered boats, and comparisons with traditional techniques;
    Finite element method applications to predict the structural performance of the whole boat or of a portion of it, with particular attention to the modeling of the material used;
    Embedded measurement systems for structural health monitoring;
    Determination of hydrodynamic efficiency using experimental, numerical, or semi-empirical methods for displacement and planing hulls;
    Topology optimization techniques to overcome traditional scantling criteria based on international standards;
    Applications of additive manufacturing to derive innovative shapes for internal reinforcements or sandwich hull structures.

    Human-in-the-Loop Methods for Data-Driven and Reinforcement Learning Systems

    Recent successes in machine learning combine reinforcement learning algorithms with deep neural networks, yet reinforcement learning is still not widely applied to robotics and real-world scenarios. This can be attributed to the fact that current state-of-the-art, end-to-end reinforcement learning approaches still require thousands or millions of data samples to converge to a satisfactory policy and are subject to catastrophic failures during training. Conversely, in real-world scenarios and after just a few data samples, humans are able to either provide demonstrations of the task, intervene to prevent catastrophic actions, or simply evaluate if the policy is performing correctly. This research investigates how to integrate these human interaction modalities into the reinforcement learning loop, increasing sample efficiency and enabling real-time reinforcement learning in robotics and real-world scenarios. This novel theoretical foundation is called the Cycle-of-Learning, a reference to how the different human interaction modalities, namely task demonstration, intervention, and evaluation, are cycled and combined with reinforcement learning algorithms. Results presented in this work show that a reward signal learned from human interaction accelerates the rate of learning of reinforcement learning algorithms, and that learning from a combination of human demonstrations and interventions is faster and more sample-efficient than traditional supervised learning algorithms. Finally, the Cycle-of-Learning provides an effective transition from policies learned using human demonstrations and interventions to reinforcement learning. The theoretical foundation developed by this research opens new research paths for human-agent teaming scenarios where autonomous agents are able to learn from human teammates and adapt to mission performance metrics in real time and in real-world scenarios.
    Comment: PhD thesis, Aerospace Engineering, Texas A&M (2020). For more information, see https://vggoecks.com
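    A hedged sketch of how demonstration and intervention data might be folded into a reinforcement learning update, in the spirit of the Cycle-of-Learning described above: a behavior-cloning term on human data is combined with standard actor-critic terms on the agent's own experience. The specific loss terms, weights, and discount factor are illustrative assumptions, not the thesis's exact formulation.

```python
import torch
import torch.nn.functional as F

def cycle_of_learning_update(policy, critic, human_batch, agent_batch,
                             bc_weight: float = 1.0, gamma: float = 0.99):
    """Mix human data into an actor-critic update (illustrative sketch).

    human_batch: (obs, act) pairs from demonstrations and interventions
    agent_batch: (obs, act, reward, next_obs) from the agent's rollouts
    """
    # Behavior-cloning term: imitate human demonstrations/interventions.
    h_obs, h_act = human_batch
    bc_loss = F.mse_loss(policy(h_obs), h_act)

    # One-step TD term on the agent's own experience.
    obs, act, rew, next_obs = agent_batch
    with torch.no_grad():
        target = rew + gamma * critic(next_obs, policy(next_obs)).squeeze(-1)
    td_loss = F.mse_loss(critic(obs, act).squeeze(-1), target)

    # Actor term: prefer actions the critic currently scores highly.
    actor_loss = -critic(obs, policy(obs)).mean()

    return actor_loss + td_loss + bc_weight * bc_loss
```

    Annealing bc_weight over training would give the transition the abstract describes, from policies shaped by human demonstrations and interventions toward pure reinforcement learning.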