Training with Exploration Improves a Greedy Stack-LSTM Parser
We adapt the greedy Stack-LSTM dependency parser of Dyer et al. (2015) to
support a training-with-exploration procedure using dynamic oracles (Goldberg
and Nivre, 2013) instead of cross-entropy minimization. This form of training,
which accounts for model predictions at training time rather than assuming an
error-free action history, improves parsing accuracies for both English and
Chinese, obtaining very strong results for both languages. We discuss some
modifications needed in order to get training with exploration to work well for
a probabilistic neural network.
Comment: In Proceedings of EMNLP 2016.
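As a rough illustration of the training-with-exploration idea above, the sketch
below follows the model's own (possibly erroneous) predictions during training
while updating toward the actions a dynamic oracle still considers optimal; the
parser, oracle, and state interfaces are hypothetical stand-ins, not the
authors' implementation.

    import random

    def train_with_exploration(parser, sentences, oracle, p_explore=0.9):
        # Error exploration: with probability p_explore follow the model's own
        # prediction instead of an oracle action, so training states resemble
        # the (imperfect) states encountered at test time.
        for sent in sentences:
            state = parser.initial_state(sent)
            while not state.is_terminal():
                scores = parser.score_actions(state)          # model's action scores
                optimal = oracle.optimal_actions(state, sent)  # dynamic-oracle action set
                parser.update(state, scores, optimal)          # push mass toward optimal actions
                if random.random() < p_explore:
                    action = max(scores, key=scores.get)       # model's choice (may be wrong)
                else:
                    action = random.choice(list(optimal))      # an oracle-optimal action
                state = state.apply(action)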
An Imitation Game for Learning Semantic Parsers from User Interaction
Despite widely successful applications, bootstrapping and fine-tuning semantic
parsers remains a tedious process, with challenges such as costly data
annotation and privacy risks. In this paper, we suggest an alternative,
human-in-the-loop methodology for learning semantic parsers directly from
users. A semantic parser should be introspective of its uncertainties and
prompt for user demonstrations when uncertain. In doing so, it also gets to
imitate the user's behavior and continue improving itself autonomously, with
the hope that it may eventually become as good as the user at interpreting
their questions. To combat the sparsity of demonstrations, we propose a novel
annotation-efficient imitation learning algorithm, which iteratively collects
new datasets by mixing demonstrated states and confident predictions and
re-trains the semantic parser in a Dataset Aggregation fashion (Ross et al.,
2011). We provide a theoretical analysis of its cost bound and also empirically
demonstrate its promising performance on the text-to-SQL problem. Code will be
available at https://github.com/sunlab-osu/MISP.
Comment: Accepted to EMNLP 2020. 20 pages, 6 figures.
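The Dataset Aggregation fashion mentioned above can be pictured roughly as the
loop below, a minimal DAgger-style sketch assuming a parser that reports
confidence and a user who can demonstrate the correct parse; this is an
illustration under those assumptions, not the MISP algorithm itself.

    def interactive_aggregation(parser, questions, user, threshold=0.8, rounds=5):
        # Iteratively mix user demonstrations (for uncertain cases) with the
        # parser's confident predictions, aggregate them, and re-train.
        dataset = []
        for _ in range(rounds):
            for q in questions:
                parse, confidence = parser.parse_with_confidence(q)
                if confidence < threshold:
                    parse = user.demonstrate(q)   # ask for a demonstration only when uncertain
                dataset.append((q, parse))        # aggregate across all rounds
            parser.retrain(dataset)
        return parser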
Improving a Neural Semantic Parser by Counterfactual Learning from Human Bandit Feedback
Counterfactual learning from human bandit feedback describes a scenario where
user feedback on the quality of outputs of a historic system is logged and used
to improve a target system. We show how to apply this learning framework to
neural semantic parsing. From a machine learning perspective, the key challenge
lies in a proper reweighting of the estimator so as to avoid known degeneracies
in counterfactual learning, while still being applicable to stochastic gradient
optimization. To conduct experiments with human users, we devise an easy-to-use
interface to collect human feedback on semantic parses. Our work is the first
to show that semantic parsers can be improved significantly by counterfactual
learning from logged human feedback data.
Comment: Conference of the Association for Computational Linguistics (ACL), 2018, Melbourne, Australia.
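To make the reweighting issue concrete, the snippet below shows a generic
self-normalized importance-weighting objective for learning from logged
feedback; the estimator actually used in the paper differs, and the tensor
names here are assumptions.

    import torch

    def counterfactual_loss(target_log_probs, logging_log_probs, rewards):
        # Reweight logged outputs by the ratio of target-policy probability to
        # logging-policy probability; self-normalization counters the known
        # degeneracy of simply inflating importance weights.
        weights = torch.exp(target_log_probs - logging_log_probs)
        return -(weights * rewards).sum() / weights.sum()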
Learning Executable Semantic Parsers for Natural Language Understanding
For building question answering systems and natural language interfaces,
semantic parsing has emerged as an important and powerful paradigm. Semantic
parsers map natural language into logical forms, the classic representation for
many important linguistic phenomena. The modern twist is that we are interested
in learning semantic parsers from data, which introduces a new layer of
statistical and computational issues. This article lays out the components of a
statistical semantic parser, highlighting the key challenges. We will see that
semantic parsing is a rich fusion of the logical and the statistical world, and
that this fusion will play an integral role in the future of natural language
understanding systems.
Comment: Accepted to the Communications of the ACM.
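A skeletal view of the components such an article lays out might look like the
following sketch: candidate logical forms are scored with a log-linear model
and the best one is executed to obtain a denotation. The helper names are
illustrative assumptions only.

    import math

    def parse(utterance, candidates, features, weights, execute):
        # Score candidate logical forms with a log-linear model, normalize,
        # and return the best parse together with its denotation.
        def score(lf):
            return sum(weights.get(f, 0.0) * v for f, v in features(utterance, lf).items())
        scores = {lf: score(lf) for lf in candidates}
        log_z = math.log(sum(math.exp(s) for s in scores.values()))  # log partition function
        best = max(scores, key=scores.get)
        return best, execute(best), scores[best] - log_z             # parse, denotation, log-probability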
Learning to Understand Goal Specifications by Modelling Reward
Recent work has shown that deep reinforcement-learning agents can learn to
follow language-like instructions from infrequent environment rewards. However,
this places on environment designers the onus of designing language-conditional
reward functions which may not be easily or tractably implemented as the
complexity of the environment and the language scales. To overcome this
limitation, we present a framework within which instruction-conditional RL
agents are trained using rewards obtained not from the environment, but from
reward models which are jointly trained from expert examples. As reward models
improve, they learn to accurately reward agents for completing tasks for
environment configurations---and for instructions---not present amongst the
expert data. This framework effectively separates the representation of what
instructions require from how they can be executed. In a simple grid world, it
enables an agent to learn a range of commands requiring interaction with blocks
and understanding of spatial relations and underspecified abstract
arrangements. We further show the method allows our agent to adapt to changes
in the environment without requiring new expert examples.
Comment: 19 pages, 9 figures.
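The training setup described above can be caricatured as follows: a reward
model fit on expert examples stands in for the hand-written,
language-conditional environment reward. The agent, environment, and
reward-model interfaces below are assumptions for illustration, not the
paper's code.

    def train_with_learned_reward(env, agent, reward_model, expert_examples, episodes=1000):
        # The environment provides no reward signal; the learned reward model,
        # trained on expert examples of completed instructions, supplies it.
        reward_model.fit(expert_examples)
        for _ in range(episodes):
            instruction, state = env.reset()
            done = False
            while not done:
                action = agent.act(state, instruction)
                state, done = env.step(action)
                reward = reward_model.score(state, instruction)  # model-based reward
                agent.observe(state, action, reward, done)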
Reinforcement Learning
Reinforcement learning (RL) is a general framework for adaptive control,
which has proven to be efficient in many domains, e.g., board games, video
games or autonomous vehicles. In such problems, an agent faces a sequential
decision-making problem where, at every time step, it observes its state,
performs an action, receives a reward and moves to a new state. An RL agent
learns by trial and error a good policy (or controller) based on observations
and numeric reward feedback on the previously performed action. In this
chapter, we present the basic framework of RL and recall the two main families
of approaches that have been developed to learn a good policy. The first one,
which is value-based, consists of estimating the value of an optimal policy,
from which a policy can then be recovered, while the other, called policy
search, works directly in a policy space. Actor-critic methods can be seen as a
policy search technique where the policy value that is learned guides the
policy improvement. In addition, we give an overview of some extensions of the
standard RL framework, notably when risk-averse behavior needs to be taken into
account or when rewards are not available or not known.
Comment: Chapter in "A Guided Tour of Artificial Intelligence Research", Springer.
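As a concrete instance of the value-based family mentioned in the abstract,
here is a tabular Q-learning sketch under an assumed minimal environment
interface (reset/step/actions): it estimates optimal action values and
recovers a policy by acting greedily with respect to them.

    import random
    from collections import defaultdict

    def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
        # Value-based RL: learn Q(s, a) by temporal-difference updates and act
        # greedily (with epsilon-exploration) with respect to the estimates.
        Q = defaultdict(float)
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                actions = env.actions(state)
                if random.random() < epsilon:
                    action = random.choice(actions)                     # explore
                else:
                    action = max(actions, key=lambda a: Q[(state, a)])  # exploit
                next_state, reward, done = env.step(action)
                if done:
                    target = reward
                else:
                    target = reward + gamma * max(Q[(next_state, a)] for a in env.actions(next_state))
                Q[(state, action)] += alpha * (target - Q[(state, action)])
                state = next_state
        return Q

Policy-search methods, by contrast, parameterize the policy directly and follow
a gradient of expected return; actor-critic combines the two by letting a
learned value function guide the policy update.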
Extracting Action Sequences from Texts Based on Deep Reinforcement Learning
Extracting action sequences from natural language texts is challenging, as it
requires commonsense inferences based on world knowledge. Although there has
been work on extracting action scripts, instructions, navigation actions, etc.,
these approaches require that either the set of candidate actions be provided
in advance or that action descriptions be restricted to a specific form, e.g.,
description templates. In this paper, we aim to extract action sequences from
texts in free natural language, i.e., without any restricted templates,
even when the candidate set of actions is unknown. We propose to extract action
sequences from texts based on the deep reinforcement learning framework.
Specifically, we view "selecting" or "eliminating" words from texts as
"actions", and the texts associated with actions as "states". We then build
Q-networks to learn the policy of extracting actions and extract plans from the
labeled texts. We demonstrate the effectiveness of our approach on several
datasets with comparison to state-of-the-art approaches, including online
experiments interacting with humans.
Comment: 7 pages, 6 figures.
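In the select-or-eliminate framing described above, a greedy roll-out of the
learned policy could look like the toy sketch below; the paper's actual setup
uses trained Q-networks over a richer state, and q_network here is a
hypothetical stand-in.

    SELECT, ELIMINATE = 0, 1

    def extract_action_sequence(words, q_network):
        # State: the text plus the decisions made so far; action: select or
        # eliminate the next word. Selected words form the extracted sequence.
        decisions = []
        for i in range(len(words)):
            state = (tuple(words), tuple(decisions))
            q_select, q_eliminate = q_network.predict(state, i)
            decisions.append(SELECT if q_select >= q_eliminate else ELIMINATE)
        return [w for w, d in zip(words, decisions) if d == SELECT]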
Observational Learning by Reinforcement Learning
Observational learning is a type of learning that occurs as a function of
observing, retaining and possibly replicating or imitating the behaviour of
another agent. It is a core mechanism appearing in various instances of social
learning and has been found to be employed in several intelligent species,
including humans. In this paper, we investigate to what extent the explicit
modelling of other agents is necessary to achieve observational learning
through machine learning. In particular, we argue that observational learning can
emerge from pure Reinforcement Learning (RL), potentially coupled with memory.
Through simple scenarios, we demonstrate that an RL agent can leverage the
information provided by the observations of another agent performing a task in
a shared environment. The other agent is only observed through the effect of
its actions on the environment and never explicitly modeled. Two key aspects
are borrowed from observational learning: i) the observer behaviour needs to
change as a result of viewing a 'teacher' (another agent) and ii) the observer
needs to be motivated somehow to engage in making use of the other agent's
behaviour. The latter is naturally modeled by RL by correlating the learning
agent's reward with the teacher agent's behaviour.
VRGym: A Virtual Testbed for Physical and Interactive AI
We propose VRGym, a virtual reality testbed for realistic human-robot
interaction. Different from existing toolkits and virtual reality environments,
VRGym emphasizes building and training both physical and interactive
agents for robotics, machine learning, and cognitive science. VRGym leverages
mechanisms that can generate diverse 3D scenes with high realism through
physics-based simulation. We demonstrate that VRGym is able to (i) collect
human interactions and fine manipulations, (ii) accommodate various robots with
a ROS bridge, (iii) support experiments for human-robot interaction, and (iv)
provide toolkits for training the state-of-the-art machine learning algorithms.
We hope VRGym can help to advance general-purpose robotics and machine learning
agents, as well as assist human studies in the field of cognitive science.
Model-Free Imitation Learning with Policy Optimization
In imitation learning, an agent learns how to behave in an environment with
an unknown cost function by mimicking expert demonstrations. Existing imitation
learning algorithms typically involve solving a sequence of planning or
reinforcement learning problems. Such algorithms are therefore not directly
applicable to large, high-dimensional environments, and their performance can
significantly degrade if the planning problems are not solved to optimality.
Under the apprenticeship learning formalism, we develop alternative model-free
algorithms for finding a parameterized stochastic policy that performs at least
as well as an expert policy on an unknown cost function, based on sample
trajectories from the expert. Our approach, based on policy gradients, scales
to large continuous environments with guaranteed convergence to local minima.
Comment: In Proceedings of the 33rd International Conference on Machine Learning, 2016.
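A generic, model-free sketch in the spirit of this apprenticeship-learning
setting is shown below: pick the linear cost on which the current policy looks
worst relative to the expert's feature expectations, then take a
policy-gradient step against that cost. The policy and feature interfaces are
assumptions, and this is not the paper's exact algorithm.

    import numpy as np

    def apprenticeship_policy_gradient(policy, env, expert_trajectories, features,
                                       iterations=100, batch=16, lr=0.01):
        # Match expert feature expectations: repeatedly choose the adversarial
        # cost direction and reduce expected cost with a REINFORCE-style step.
        mu_expert = np.mean([features(t) for t in expert_trajectories], axis=0)
        for _ in range(iterations):
            trajectories = [env.rollout(policy) for _ in range(batch)]
            mu_policy = np.mean([features(t) for t in trajectories], axis=0)
            w = mu_policy - mu_expert                     # cost on which the policy looks worst
            w /= np.linalg.norm(w) + 1e-8
            grads = [policy.grad_log_prob(t) * float(w @ features(t)) for t in trajectories]
            policy.params -= lr * np.mean(grads, axis=0)  # descend the expected cost
        return policy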