A Survey on Dialogue Systems: Recent Advances and New Frontiers
Dialogue systems have attracted more and more attention. Recent advances in
dialogue systems are overwhelmingly driven by deep learning techniques,
which have been employed to enhance a wide range of big data applications such
as computer vision, natural language processing, and recommender systems. For
dialogue systems, deep learning can leverage a massive amount of data to learn
meaningful feature representations and response generation strategies, while
requiring a minimum amount of hand-crafting. In this article, we give an
overview of these recent advances in dialogue systems from various perspectives
and discuss some possible research directions. In particular, we generally
divide existing dialogue systems into task-oriented and non-task-oriented
models, then detail how deep learning techniques help them with representative
algorithms and finally discuss some appealing research directions that can
bring the dialogue system research into a new frontier.
Comment: 13 pages. arXiv admin note: text overlap with arXiv:1703.01008 by
other author
Inverse Reinforcement Learning via Deep Gaussian Process
We propose a new approach to inverse reinforcement learning (IRL) based on
the deep Gaussian process (deep GP) model, which is capable of learning
complicated reward structures with few demonstrations. Our model stacks
multiple latent GP layers to learn abstract representations of the state
feature space, which is linked to the demonstrations through the Maximum
Entropy learning framework. Incorporating the IRL engine into the nonlinear
latent structure renders existing deep GP inference approaches intractable. To
tackle this, we develop a non-standard variational approximation framework
which extends previous inference schemes. This allows for approximate Bayesian
treatment of the feature space and guards against overfitting. Carrying out
representation and inverse reinforcement learning simultaneously within our
model outperforms state-of-the-art approaches, as we demonstrate with
experiments on standard benchmarks ("object world", "highway driving") and a new
benchmark ("binary world").
Deep Generative Models with Learnable Knowledge Constraints
The broad set of deep generative models (DGMs) has achieved remarkable
advances. However, it is often difficult to incorporate rich structured domain
knowledge with the end-to-end DGMs. Posterior regularization (PR) offers a
principled framework to impose structured constraints on probabilistic models,
but has limited applicability to the diverse DGMs that can lack a Bayesian
formulation or even explicit density evaluation. PR also requires constraints
to be fully specified a priori, which is impractical or suboptimal for complex
knowledge with learnable uncertain parts. In this paper, we establish
mathematical correspondence between PR and reinforcement learning (RL), and,
based on the connection, expand PR to learn constraints as the extrinsic reward
in RL. The resulting algorithm is model-agnostic, so it applies to any DGM, and
is flexible enough to adapt arbitrary constraints jointly with the model.
Experiments on human image generation and templated sentence generation show
that models with knowledge constraints learned by our algorithm greatly improve
over base generative models.
Comment: Neural Information Processing Systems (NeurIPS) 201
DARLA: Improving Zero-Shot Transfer in Reinforcement Learning
Domain adaptation is an important open problem in deep reinforcement learning
(RL). In many scenarios of interest data is hard to obtain, so agents may learn
a source policy in a setting where data is readily available, with the hope
that it generalises well to the target domain. We propose a new multi-stage RL
agent, DARLA (DisentAngled Representation Learning Agent), which learns to see
before learning to act. DARLA's vision is based on learning a disentangled
representation of the observed environment. Once DARLA can see, it is able to
acquire source policies that are robust to many domain shifts - even with no
access to the target domain. DARLA significantly outperforms conventional
baselines in zero-shot domain adaptation scenarios, an effect that holds across
a variety of RL environments (Jaco arm, DeepMind Lab) and base RL algorithms
(DQN, A3C and EC).
Comment: ICML 201
InfoGAIL: Interpretable Imitation Learning from Visual Demonstrations
The goal of imitation learning is to mimic expert behavior without access to
an explicit reward signal. Expert demonstrations provided by humans, however,
often show significant variability due to latent factors that are typically not
explicitly modeled. In this paper, we propose a new algorithm that can infer
the latent structure of expert demonstrations in an unsupervised way. Our
method, built on top of Generative Adversarial Imitation Learning, can not only
imitate complex behaviors, but also learn interpretable and meaningful
representations of complex behavioral data, including visual demonstrations. In
the driving domain, we show that a model learned from human demonstrations is
able to both accurately reproduce a variety of behaviors and accurately
anticipate human actions using raw visual inputs. Compared with various
baselines, our method can better capture the latent structure underlying expert
demonstrations, often recovering semantically meaningful factors of variation
in the data.
Comment: 14 pages, NIPS 201
Using Task Descriptions in Lifelong Machine Learning for Improved Performance and Zero-Shot Transfer
Knowledge transfer between tasks can improve the performance of learned
models, but requires an accurate estimate of the inter-task relationships to
identify the relevant knowledge to transfer. These inter-task relationships are
typically estimated based on training data for each task, which is inefficient
in lifelong learning settings where the goal is to learn each consecutive task
rapidly from as little data as possible. To reduce this burden, we develop a
lifelong learning method based on coupled dictionary learning that utilizes
high-level task descriptions to model the inter-task relationships. We show
that using task descriptors improves the performance of the learned task
policies, providing both theoretical justification for the benefit and
empirical demonstration of the improvement across a variety of learning
problems. Given only the descriptor for a new task, the lifelong learner is
also able to accurately predict a model for the new task through zero-shot
learning using the coupled dictionary, eliminating the need to gather training
data before addressing the task.
Comment: 28 page
How Generative Adversarial Networks and Their Variants Work: An Overview
Generative Adversarial Networks (GANs) have received wide attention in the
machine learning field for their potential to learn high-dimensional, complex
real data distributions. Specifically, they do not rely on any assumptions about
the distribution and can generate real-like samples from latent space in a
simple manner. This powerful property has led GANs to be applied to various
applications such as image synthesis, image attribute editing, image
translation, and domain adaptation, as well as in other academic fields. In this
paper, we aim to discuss the details of GANs for readers who are familiar with
GANs but do not comprehend them deeply, or who wish to view GANs from various
perspectives. In addition, we explain how GANs operate and the fundamental
meaning of various objective functions that have been suggested recently. We
then focus on how GANs can be combined with an autoencoder framework. Finally,
we enumerate the GAN variants that are applied to various tasks and other
fields for those who are interested in exploiting GANs for their research.
Comment: 41 pages, 16 figures, Published in ACM Computing Surveys (CSUR
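For orientation, the abstract above refers to GAN objective functions without stating one. The widely cited original minimax formulation is reproduced below as a reference point. It is not quoted from this survey, and the symbols (generator G, discriminator D, data distribution p_data, latent prior p_z) are the conventional ones rather than notation taken from the listing.

```latex
\min_{G} \max_{D} \; V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)} \bigl[ \log D(x) \bigr]
  + \mathbb{E}_{z \sim p_{z}(z)} \bigl[ \log \bigl( 1 - D(G(z)) \bigr) \bigr]
```

The discriminator D is trained to raise V by telling real samples from generated ones, while the generator G is trained to lower it by producing samples that D accepts as real.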
Universal Planning Networks
A key challenge in complex visuomotor control is learning abstract
representations that are effective for specifying goals, planning, and
generalization. To this end, we introduce universal planning networks (UPN).
UPNs embed differentiable planning within a goal-directed policy. This planning
computation unrolls a forward model in a latent space and infers an optimal
action plan through gradient descent trajectory optimization. The
plan-by-gradient-descent process and its underlying representations are learned
end-to-end to directly optimize a supervised imitation learning objective. We
find that the representations learned are not only effective for goal-directed
visual imitation via gradient-based trajectory optimization, but can also
provide a metric for specifying goals using images. The learned representations
can be leveraged to specify distance-based rewards to reach new target states
for model-free reinforcement learning, resulting in substantially more
effective learning when solving new tasks described via image-based goals. We
were able to achieve successful transfer of visuomotor planning strategies
across robots with significantly different morphologies and actuation
capabilities.
Comment: Videos available at https://sites.google.com/view/upn-public/hom
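The plan-by-gradient-descent idea described in the abstract above can be illustrated with a deliberately small sketch. This is not the authors' implementation: the forward model here is a hand-written additive rule (x_{t+1} = x_t + a_t) rather than a learned latent dynamics model, the gradient is derived by hand for that rule, and the names `rollout` and `plan_by_gradient_descent` are hypothetical.

```python
def rollout(state, actions):
    """Unroll the toy forward model x_{t+1} = x_t + a_t and return the final state.

    Assumption: additive dynamics stand in for the learned latent forward model.
    """
    for action in actions:
        state = [s + a for s, a in zip(state, action)]
    return state


def plan_by_gradient_descent(start, goal, horizon=5, steps=100, lr=0.05):
    """Optimize an action sequence so the rolled-out final state reaches the goal."""
    dim = len(start)
    actions = [[0.0] * dim for _ in range(horizon)]  # start from an all-zero plan
    for _ in range(steps):
        final = rollout(start, actions)
        # Loss is ||x_T - goal||^2. With additive dynamics the gradient with
        # respect to every action a_t is the same vector: 2 * (x_T - goal).
        grad = [2.0 * (f - g) for f, g in zip(final, goal)]
        actions = [
            [a_i - lr * g_i for a_i, g_i in zip(action, grad)]
            for action in actions
        ]
    return actions


if __name__ == "__main__":
    plan = plan_by_gradient_descent([0.0, 0.0], [1.0, -2.0])
    print(rollout([0.0, 0.0], plan))
```

In the method described in the abstract, both the state encoder and the forward model are learned end-to-end and the gradients flow through them; the sketch only shows the inner optimization loop over actions under the simplifying assumptions stated above.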
Taskonomy: Disentangling Task Transfer Learning
Do visual tasks have a relationship, or are they unrelated? For instance,
could having surface normals simplify estimating the depth of an image?
Intuition answers these questions positively, implying the existence of a
structure among visual tasks. Knowing this structure has notable value; it is the
concept underlying transfer learning and provides a principled way for
identifying redundancies across tasks, e.g., to seamlessly reuse supervision
among related tasks or solve many tasks in one system without piling up the
complexity.
We propose a fully computational approach for modeling the structure of the
space of visual tasks. This is done via finding (first and higher-order)
transfer learning dependencies across a dictionary of twenty-six 2D, 2.5D, 3D,
and semantic tasks in a latent space. The product is a computational taxonomic
map for task transfer learning. We study the consequences of this structure,
e.g., nontrivial emergent relationships, and exploit them to reduce the demand
for labeled data. For example, we show that the total number of labeled
datapoints needed for solving a set of 10 tasks can be reduced by roughly 2/3
(compared to training independently) while keeping the performance nearly the
same. We provide a set of tools for computing and probing this taxonomical
structure including a solver that users can employ to devise efficient
supervision policies for their use cases.
Comment: CVPR 2018 (Oral). See project website and live demos at
http://taskonomy.vision
OptionGAN: Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning
Reinforcement learning has shown promise in learning policies that can solve
complex problems. However, manually specifying a good reward function can be
difficult, especially for intricate tasks. Inverse reinforcement learning
offers a useful paradigm to learn the underlying reward function directly from
expert demonstrations. Yet in reality, the corpus of demonstrations may contain
trajectories arising from a diverse set of underlying reward functions rather
than a single one. Thus, in inverse reinforcement learning, it is useful to
consider such a decomposition. The options framework in reinforcement learning
is specifically designed to decompose policies in a similar light. We therefore
extend the options framework and propose a method to simultaneously recover
reward options in addition to policy options. We leverage adversarial methods
to learn joint reward-policy options using only observed expert states. We show
that this approach works well in both simple and complex continuous control
tasks and shows significant performance increases in one-shot transfer
learning.
Comment: Accepted to the Thirty-Second AAAI Conference On Artificial
Intelligence (AAAI), 201