Learning Temporal Strategic Relationships using Generative Adversarial Imitation Learning
This paper presents a novel framework for automatic learning of complex
strategies in human decision making. Our goal is to better facilitate
long-term planning for complex, multi-step events. We observe
temporal relationships at the subtask level of expert demonstrations, and
determine the different strategies employed in order to successfully complete a
task. To capture the relationship between the subtasks and the overall goal, we
utilise two external memory modules: one that captures dependencies within a
single expert demonstration, such as the sequential relationship among
different subtasks, and a global memory module that models task-level
characteristics, such as the best practices employed by different humans based
on their domain expertise. Furthermore, we demonstrate how the hidden state
representation of the memory can be used as a reward signal to smooth the state
transitions, suppressing subtle fluctuations. We evaluate the effectiveness of the
proposed model for an autonomous highway driving application, where we
demonstrate its capability to learn different expert policies and outperform
state-of-the-art methods. Industrial applications extend to any robotics or
automation application that requires learning from complex demonstrations
containing a series of subtasks.
Comment: International Foundation for Autonomous Agents and Multiagent
Systems, 201
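The abstract does not state how the memory hidden state enters the reward. As a minimal sketch under stated assumptions, the Python below pairs the common GAIL-style surrogate reward, -log(1 - D(s, a)) with D the discriminator's probability that a state-action pair came from an expert, with a hypothetical smoothness penalty on the squared distance between consecutive memory hidden states. The penalty form, the weight `beta`, and the function names are illustrative assumptions, not the paper's formulation.

```python
import math
from typing import Sequence


def gail_reward(d_prob: float, eps: float = 1e-8) -> float:
    """GAIL-style surrogate reward -log(1 - D(s, a)).

    Convention assumed here: d_prob is the discriminator's probability that
    (s, a) came from the expert, so expert-like pairs earn a larger reward.
    """
    return -math.log(max(1.0 - d_prob, eps))


def smoothness_penalty(h_prev: Sequence[float], h_curr: Sequence[float]) -> float:
    """Hypothetical term: squared L2 distance between consecutive hidden states."""
    return sum((a - b) ** 2 for a, b in zip(h_prev, h_curr))


def shaped_reward(d_prob: float, h_prev: Sequence[float],
                  h_curr: Sequence[float], beta: float = 0.1) -> float:
    """Imitation reward minus a (hypothetical) penalty on abrupt memory changes."""
    return gail_reward(d_prob) - beta * smoothness_penalty(h_prev, h_curr)


# Usage: the same discriminator score yields a lower reward when the memory jumps.
steady = shaped_reward(0.8, [0.1, 0.2], [0.1, 0.2])
jumpy = shaped_reward(0.8, [0.1, 0.2], [0.9, -0.5])
```

Penalising the change in the hidden state, rather than in the raw vehicle state, is one way to read "smooth the state transitions"; other choices are equally plausible.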
VPE: Variational Policy Embedding for Transfer Reinforcement Learning
Reinforcement Learning methods are capable of solving complex problems, but the
resulting policies might perform poorly in environments that differ even slightly.
different. In robotics especially, training and deployment conditions often
vary and data collection is expensive, making retraining undesirable.
Training in simulation keeps training times feasible, but it suffers from a
reality gap when the resulting policies are applied in real-world settings. This
raises the need for efficient adaptation of policies acting in new environments. We
consider this as a problem of transferring knowledge within a family of similar
Markov decision processes.
To this end, we assume that Q-functions are generated by some
low-dimensional latent variable. Given such a Q-function, we can find a master
policy that adapts to different values of this latent variable. Our
method learns both the generative mapping and an approximate posterior of the
latent variables, enabling identification of policies for new tasks by
searching only in the latent space, rather than the space of all policies. The
low-dimensional space and master policy found by our method enable policies
to adapt quickly to new environments. We demonstrate the method both on a
pendulum swing-up task in simulation and on simulation-to-real transfer for a
pushing task.
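The core idea of adapting by searching a low-dimensional latent space instead of the full policy space can be sketched in a few lines. The Python below is a toy illustration under loud assumptions: a 1-D task family where the best action is `theta * state` for an unknown `theta`, a hypothetical master policy whose latent `z` scales a linear controller, and a plain grid search over `z` in place of the learned approximate posterior that VPE actually uses.

```python
import random
from typing import List, Sequence


def master_policy(state: float, z: float) -> float:
    """Hypothetical master policy: the latent z scales a linear controller."""
    return z * state


def episode_return(z: float, theta: float, states: Sequence[float]) -> float:
    """Toy environment: reward -(a - theta * s)^2, so z == theta is optimal."""
    return -sum((master_policy(s, z) - theta * s) ** 2 for s in states)


def adapt_latent(theta: float, candidates: List[float],
                 n_states: int = 20, seed: int = 0) -> float:
    """Pick the latent value with the highest return in the new environment.

    Only the 1-D latent grid is searched; the policy itself stays fixed.
    """
    rng = random.Random(seed)
    states = [rng.uniform(-1.0, 1.0) for _ in range(n_states)]
    return max(candidates, key=lambda z: episode_return(z, theta, states))


# Usage: a grid over the latent space, evaluated in an unseen task with theta = 0.7.
candidates = [i / 10 for i in range(-20, 21)]
best_z = adapt_latent(theta=0.7, candidates=candidates)
```

The search touches 41 candidate latents rather than every possible controller, which is the efficiency argument the abstract makes; a learned posterior would replace the grid in practice.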
Deep Decision Trees for Discriminative Dictionary Learning with Adversarial Multi-Agent Trajectories
With the explosion in the availability of spatio-temporal tracking data in
modern sports, there is an enormous opportunity to better analyse, learn and
predict important events in adversarial group environments. In this paper, we
propose a deep decision tree architecture for discriminative dictionary
learning from adversarial multi-agent trajectories. We first build the tree
hierarchy layer by layer, performing feature-weight-based clustering in the
forward pass. We then fine-tune the player role weights using backpropagation.
The hierarchical architecture ensures the
interpretability and the integrity of the group representation. The resulting
architecture is a decision tree, with leaf-nodes capturing a dictionary of
multi-agent group interactions. Because of the large volume of data available,
we focus on soccer tracking data, although our approach can be used in any
adversarial multi-agent domain. We present applications of the proposed method
for simulating soccer games as well as for evaluating and quantifying team
strategies.
Comment: To appear in 4th International Workshop on Computer Vision in Sports
(CVsports) at CVPR 201
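The forward-pass construction, clustering layer by layer with feature weights so that leaves form a dictionary of group behaviours, can be approximated by recursive weighted 2-means. This Python sketch is an assumption-laden stand-in: each trajectory is a fixed-length feature vector, the per-feature weights `w` are fixed (the paper fine-tunes role weights by backpropagation, which is omitted), and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Point = Sequence[float]


def weighted_dist(a: Point, b: Point, w: Point) -> float:
    """Squared Euclidean distance with per-feature (hypothetical role) weights."""
    return sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w))


def centroid(points: List[Point]) -> List[float]:
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def two_means(points: List[Point], w: Point, iters: int = 10) -> Tuple[list, list]:
    """Split points into two clusters with weighted k-means (k = 2)."""
    c0 = list(points[0])  # deterministic init: first point ...
    c1 = list(max(points, key=lambda p: weighted_dist(p, c0, w)))  # ... and farthest
    left, right = list(points), []
    for _ in range(iters):
        left = [p for p in points if weighted_dist(p, c0, w) <= weighted_dist(p, c1, w)]
        right = [p for p in points if weighted_dist(p, c0, w) > weighted_dist(p, c1, w)]
        if not left or not right:
            break  # degenerate split: all points identical under w
        c0, c1 = centroid(left), centroid(right)
    return left, right


@dataclass
class Node:
    points: List[Point]
    children: List["Node"] = field(default_factory=list)


def build_tree(points: List[Point], w: Point, depth: int) -> Node:
    """Add one layer per recursion level by splitting each node's cluster."""
    node = Node(points)
    if depth > 0 and len(points) >= 2:
        left, right = two_means(points, w)
        if left and right:
            node.children = [build_tree(left, w, depth - 1),
                             build_tree(right, w, depth - 1)]
    return node


def leaves(node: Node) -> List[Node]:
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def dictionary(root: Node) -> List[List[float]]:
    """Each leaf centroid serves as one dictionary entry."""
    return [centroid(leaf.points) for leaf in leaves(root)]
```

Raising a feature's weight makes splits more sensitive to that player's movement, which is the intuition behind weighting roles before clustering.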
Pedestrian Trajectory Prediction with Structured Memory Hierarchies
This paper presents a novel framework for human trajectory prediction based
on multimodal data (video and radar). Motivated by recent neuroscience
discoveries, we propose incorporating a structured memory component in the
human trajectory prediction pipeline to capture historical information and
improve performance. We introduce structured LSTM cells for modelling the
memory content hierarchically, preserving the spatiotemporal structure of the
information and enabling us to capture both short-term and long-term context.
We demonstrate how this architecture can be extended to integrate salient
information from multiple modalities to automatically store and retrieve
important information for decision making without any supervision. We evaluate
the effectiveness of the proposed models on a novel multimodal dataset that we
introduce, consisting of 40,000 pedestrian trajectories, acquired jointly from
a radar system and a CCTV camera system installed in a public place. The
performance is also evaluated on the publicly available New York Grand Central
pedestrian database. In both settings, the proposed models anticipate future
pedestrian motion better than existing state-of-the-art methods.
Comment: To appear in ECML-PKDD 201
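To make the idea of hierarchical short-term and long-term memory concrete, here is a deliberately small Python sketch built on the standard LSTM update equations. The hierarchy is an assumption, not the paper's structured LSTM design: a lower scalar LSTM updates at every time step (short-term context), and an upper LSTM reads the lower hidden state only every `k` steps (long-term context). Weights are arbitrary placeholders, and the spatial structure the paper preserves is omitted.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


@dataclass
class ScalarLSTM:
    # (input weight, recurrent weight, bias) per gate; placeholder values.
    wi: Tuple[float, float, float] = (0.5, 0.5, 0.0)   # input gate
    wf: Tuple[float, float, float] = (0.5, 0.5, 1.0)   # forget gate
    wo: Tuple[float, float, float] = (0.5, 0.5, 0.0)   # output gate
    wg: Tuple[float, float, float] = (1.0, 1.0, 0.0)   # candidate cell value
    h: float = 0.0
    c: float = 0.0

    def step(self, x: float) -> float:
        """Standard LSTM update; all gates use the previous hidden state."""
        def pre(w: Tuple[float, float, float]) -> float:
            return w[0] * x + w[1] * self.h + w[2]

        i, f, o = sigmoid(pre(self.wi)), sigmoid(pre(self.wf)), sigmoid(pre(self.wo))
        g = math.tanh(pre(self.wg))
        self.c = f * self.c + i * g
        self.h = o * math.tanh(self.c)
        return self.h


def hierarchical_encode(xs: Sequence[float], k: int = 4) -> Tuple[float, float]:
    """Return (short-term, long-term) hidden states for a 1-D input sequence."""
    low, high = ScalarLSTM(), ScalarLSTM()
    for t, x in enumerate(xs, start=1):
        h_low = low.step(x)        # short-term memory: every step
        if t % k == 0:
            high.step(h_low)       # long-term memory: every k-th step
    return low.h, high.h
```

Running the upper level at a slower rate is one simple way to separate time scales; the paper's cells may organise memory differently.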
A brief survey of deep reinforcement learning
Deep reinforcement learning (DRL) is poised to revolutionize the field of artificial intelligence (AI) and represents a step toward building autonomous systems with a higher-level understanding of the visual world. Currently, deep learning is enabling reinforcement learning (RL) to scale to problems that were previously intractable, such as learning to play video games directly from pixels. DRL algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of RL, then progress to the main streams of value-based and policy-based methods. Our survey covers central algorithms in deep RL, including the deep Q-network (DQN), trust region policy optimization (TRPO), and asynchronous advantage actor-critic (A3C). In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via RL. To conclude, we describe several current areas of research within the field.
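As a concrete anchor for the value-based stream, the Python below sketches the standard DQN regression target y = r + γ max_a' Q_target(s', a'), with y = r at terminal transitions, and the mean squared TD loss. It is a minimal illustration of the textbook formulation, not code from the survey; the function names and the list-of-lists representation of Q-values are assumptions.

```python
from typing import List, Sequence


def dqn_targets(rewards: Sequence[float],
                next_q_values: Sequence[Sequence[float]],
                dones: Sequence[bool],
                gamma: float = 0.99) -> List[float]:
    """Bootstrapped targets from a (separate) target network's Q(s', .) values."""
    return [r if done else r + gamma * max(q_next)
            for r, q_next, done in zip(rewards, next_q_values, dones)]


def td_loss(q_taken: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between Q(s, a) for the actions taken and the targets."""
    return sum((q - y) ** 2 for q, y in zip(q_taken, targets)) / len(targets)


# Usage: one non-terminal and one terminal transition.
targets = dqn_targets([1.0, 0.0], [[0.5, 2.0], [3.0, 1.0]], [False, True], gamma=0.5)
loss = td_loss([1.0, 1.0], targets)
```

Terminal transitions drop the bootstrap term because no future return follows them.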
Trustworthy Edge Machine Learning: A Survey
The convergence of Edge Computing (EC) and Machine Learning (ML), known as
Edge Machine Learning (EML), has become a prominent research area; it uses
distributed network resources to perform training and inference jointly and
cooperatively. However, EML faces various challenges due to resource
constraints, heterogeneous network environments, and diverse service
requirements of different applications, which together affect the
trustworthiness of EML in the eyes of its stakeholders. This survey provides a
comprehensive summary of definitions, attributes, frameworks, techniques, and
solutions for trustworthy EML. Specifically, we first emphasize the importance
of trustworthy EML within the context of Sixth-Generation (6G) networks. We
then discuss the necessity of trustworthiness from the perspective of
challenges encountered during deployment and real-world application scenarios.
Subsequently, we provide a preliminary definition of trustworthy EML and
explore its key attributes. Following this, we introduce fundamental frameworks
and enabling technologies for trustworthy EML systems, and provide an in-depth
literature review of the latest solutions to enhance trustworthiness of EML.
Finally, we discuss corresponding research challenges and open issues.
Comment: 27 pages, 7 figures, 10 tables