2,600 research outputs found
Inferring Robot Task Plans from Human Team Meetings: A Generative Modeling Approach with Logic-Based Prior
We aim to reduce the burden of programming and deploying autonomous systems
to work in concert with people in time-critical domains, such as military field
operations and disaster response. Deployment plans for these operations are
frequently negotiated on-the-fly by teams of human planners. A human operator
then translates the agreed upon plan into machine instructions for the robots.
We present an algorithm that reduces this translation burden by inferring the
final plan from a processed form of the human team's planning conversation. Our
approach combines probabilistic generative modeling with logical plan
validation used to compute a highly structured prior over possible plans. This
hybrid approach enables us to overcome the challenge of performing inference
over the large solution space with only a small amount of noisy data from the
team planning session. We validate the algorithm through human subject
experimentation and show we are able to infer a human team's final plan with
83% accuracy on average. We also describe a robot demonstration in which two
people plan and execute a first-response collaborative task with a PR2 robot.
To the best of our knowledge, this is the first work that integrates a logical
planning technique within a generative model to perform plan inference.Comment: Appears in Proceedings of the Twenty-Seventh AAAI Conference on
Artificial Intelligence (AAAI-13
Implicit learning of recursive context-free grammars
Context-free grammars are fundamental for the description of linguistic syntax. However, most artificial grammar learning
experiments have explored learning of simpler finite-state grammars, while studies exploring context-free grammars have
not assessed awareness and implicitness. This paper explores the implicit learning of context-free grammars employing
features of hierarchical organization, recursive embedding and long-distance dependencies. The grammars also featured
the distinction between left- and right-branching structures, as well as between centre- and tail-embedding, both
distinctions found in natural languages. People acquired unconscious knowledge of relations between grammatical classes
even for dependencies over long distances, in ways that went beyond learning simpler relations (e.g. n-grams) between
individual words. The structural distinctions drawn from linguistics also proved important as performance was greater for
tail-embedding than centre-embedding structures. The results suggest the plausibility of implicit learning of complex
context-free structures, which model some features of natural languages. They support the relevance of artificial grammar
learning for probing mechanisms of language learning and challenge existing theories and computational models of
implicit learning
Joint Video and Text Parsing for Understanding Events and Answering Queries
We propose a framework for parsing video and text jointly for understanding
events and answering user queries. Our framework produces a parse graph that
represents the compositional structures of spatial information (objects and
scenes), temporal information (actions and events) and causal information
(causalities between events and fluents) in the video and text. The knowledge
representation of our framework is based on a spatial-temporal-causal And-Or
graph (S/T/C-AOG), which jointly models possible hierarchical compositions of
objects, scenes and events as well as their interactions and mutual contexts,
and specifies the prior probabilistic distribution of the parse graphs. We
present a probabilistic generative model for joint parsing that captures the
relations between the input video/text, their corresponding parse graphs and
the joint parse graph. Based on the probabilistic model, we propose a joint
parsing system consisting of three modules: video parsing, text parsing and
joint inference. Video parsing and text parsing produce two parse graphs from
the input video and text respectively. The joint inference module produces a
joint parse graph by performing matching, deduction and revision on the video
and text parse graphs. The proposed framework has the following objectives:
Firstly, we aim at deep semantic parsing of video and text that goes beyond the
traditional bag-of-words approaches; Secondly, we perform parsing and reasoning
across the spatial, temporal and causal dimensions based on the joint S/T/C-AOG
representation; Thirdly, we show that deep joint parsing facilitates subsequent
applications such as generating narrative text descriptions and answering
queries in the forms of who, what, when, where and why. We empirically
evaluated our system based on comparison against ground-truth as well as
accuracy of query answering and obtained satisfactory results
Mapping Instructions and Visual Observations to Actions with Reinforcement Learning
We propose to directly map raw visual observations and text input to actions
for instruction execution. While existing approaches assume access to
structured environment representations or use a pipeline of separately trained
models, we learn a single model to jointly reason about linguistic and visual
input. We use reinforcement learning in a contextual bandit setting to train a
neural network agent. To guide the agent's exploration, we use reward shaping
with different forms of supervision. Our approach does not require intermediate
representations, planning procedures, or training different models. We evaluate
in a simulated environment, and show significant improvements over supervised
learning and common reinforcement learning variants.Comment: In Proceedings of the Conference on Empirical Methods in Natural
Language Processing (EMNLP), 201
- …