Parsing Coordination for Spoken Language Understanding
Typical spoken language understanding systems provide narrow semantic parses
using a domain-specific ontology. The parses contain intents and slots that are
directly consumed by downstream domain applications. In this work we discuss
expanding such systems to handle compound entities and intents by introducing a
domain-agnostic shallow parser that handles linguistic coordination. We show
that our model for parsing coordination learns domain-independent and
slot-independent features and is able to segment conjunct boundaries of many
different phrasal categories. We also show that using adversarial training can
be effective for improving generalization across different slot types for
coordination parsing.
Comment: The paper was published in the SLT 2018 conference.
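The abstract does not spell out the adversarial setup. One common way to obtain slot-independent features is domain-adversarial training with a gradient reversal layer, sketched below in PyTorch as a rough illustration; the module and parameter names (AdversarialCoordinationTagger, num_slot_types, the three-way boundary tag set) are assumptions for this example, not the authors' implementation.

import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; reverses (and scales) gradients on the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class AdversarialCoordinationTagger(nn.Module):
    """BiLSTM tagger for conjunct boundaries with an adversarial slot-type head."""
    def __init__(self, vocab_size, num_slot_types, emb=128, hidden=256, lam=0.1):
        super().__init__()
        self.lam = lam
        self.embed = nn.Embedding(vocab_size, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        # Main head: per-token conjunct boundary tags (e.g. B / I / O).
        self.boundary_head = nn.Linear(2 * hidden, 3)
        # Adversary: tries to recover the slot type from the shared features;
        # the reversed gradient pushes the encoder toward slot-independent features.
        self.slot_discriminator = nn.Linear(2 * hidden, num_slot_types)

    def forward(self, tokens):
        feats, _ = self.encoder(self.embed(tokens))
        boundary_logits = self.boundary_head(feats)
        slot_logits = self.slot_discriminator(GradReverse.apply(feats.mean(dim=1), self.lam))
        return boundary_logits, slot_logits

Training would then minimize the boundary-tagging loss together with the slot-classification loss, with the reversed gradient discouraging slot-specific features in the shared encoder.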
Human-Robot Collaboration: From Psychology to Social Robotics
With the advances in robotic technology, research in human-robot
collaboration (HRC) has gained in importance. For robots to interact with
humans autonomously, they need active decision making that takes human partners
into account. However, state-of-the-art research in HRC often assumes a
leader-follower division, in which one agent leads the interaction. We believe
that this is caused by the lack of a reliable representation of the human and
the environment to allow autonomous decision making. This problem can be
overcome by an embodied approach to HRC which is inspired by psychological
studies of human-human interaction (HHI). In this survey, we review
neuroscientific and psychological findings of the sensorimotor patterns that
govern HHI and view them in a robotics context. Additionally, we study the
advances made by the robotics community in the direction of embodied HRC. We
focus on the mechanisms that are required for active, physical human-robot
collaboration. Finally, we discuss the similarities and differences between the two
fields of study, which pinpoint directions for future research.
Microplanning with Communicative Intentions: The SPUD System
The process of microplanning encompasses a range of problems in Natural
Language Generation (NLG), such as referring expression generation, lexical
choice, and aggregation, problems in which a generator must bridge underlying
domain-specific representations and general linguistic representations. In this
paper, we describe a uniform approach to microplanning based on declarative
representations of a generator's communicative intent. These representations
describe the results of NLG: communicative intent associates the concrete
linguistic structure planned by the generator with inferences that show how the
meaning of that structure communicates needed information about some
application domain in the current discourse context. Our approach, implemented
in the SPUD (sentence planning using description) microplanner, uses the
lexicalized tree-adjoining grammar formalism (LTAG) to connect structure to
meaning and uses modal logic programming to connect meaning to context. At the
same time, communicative intent representations provide a resource for the
process of NLG. Using representations of communicative intent, a generator can
augment the syntax, semantics and pragmatics of an incomplete sentence
simultaneously, and can assess its progress on the various problems of
microplanning incrementally. The declarative formulation of communicative
intent translates into a well-defined methodology for designing grammatical and
conceptual resources which the generator can use to achieve desired
microplanning behavior in a specified domain.
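As a rough illustration of the LTAG side of such a pipeline (not SPUD's actual representations, and leaving out adjunction and the modal-logic inference entirely), the toy Python sketch below combines lexicalized elementary trees by substitution; the grammar fragment and class names are invented for the example.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    label: str
    word: Optional[str] = None                   # lexical anchor, if any
    children: List["Node"] = field(default_factory=list)
    subst: bool = False                          # open substitution site

    def words(self) -> List[str]:
        out = [self.word] if self.word else []
        for c in self.children:
            out.extend(c.words())
        return out

def substitute(tree: Node, label: str, filler: Node) -> bool:
    """Plug an elementary tree into the first open substitution site with this label."""
    for i, child in enumerate(tree.children):
        if child.subst and child.label == label:
            tree.children[i] = filler
            return True
        if substitute(child, label, filler):
            return True
    return False

# Elementary trees anchored by "take", "you", and "the book".
take = Node("S", children=[
    Node("NP", subst=True),
    Node("VP", children=[Node("V", word="take"), Node("NP", subst=True)]),
])
you = Node("NP", word="you")
the_book = Node("NP", children=[Node("Det", word="the"), Node("N", word="book")])

substitute(take, "NP", you)
substitute(take, "NP", the_book)
print(" ".join(take.words()))   # -> "you take the book"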
Conversation as Action Under Uncertainty
Conversations abound with uncertainties of various kinds. Treating
conversation as inference and decision making under uncertainty, we propose a
task-independent, multimodal architecture for supporting robust continuous
spoken dialog called Quartet. We introduce four interdependent levels of
analysis, and describe representations, inference procedures, and decision
strategies for managing uncertainties within and between the levels. We
highlight the approach by reviewing interactions between a user and two spoken
dialog systems developed using the Quartet architecture: Presenter, a prototype
system for navigating Microsoft PowerPoint presentations, and the Bayesian
Receptionist, a prototype system for dealing with tasks typically handled by
front desk receptionists at the Microsoft corporate campus.
Comment: Appears in Proceedings of the Sixteenth Conference on Uncertainty in
Artificial Intelligence (UAI2000).
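To make the framing of a dialog move as decision making under uncertainty concrete, here is a minimal, generic expected-utility calculation over candidate system actions given a belief distribution over user intents; the intents, actions, and utility values are hypothetical placeholders, not Quartet's actual levels of analysis or models.

# Belief over what the user meant (hypothetical values).
belief = {"navigate_next_slide": 0.6, "go_to_slide_5": 0.3, "end_presentation": 0.1}

# Utility of each system action under each possible intent (hypothetical values).
utility = {
    "execute_next": {"navigate_next_slide": 10, "go_to_slide_5": -5, "end_presentation": -8},
    "ask_clarify":  {"navigate_next_slide": 2,  "go_to_slide_5": 2,  "end_presentation": 2},
    "do_nothing":   {"navigate_next_slide": -3, "go_to_slide_5": -3, "end_presentation": 0},
}

def expected_utility(action):
    """Expected utility of an action under the current belief."""
    return sum(p * utility[action][intent] for intent, p in belief.items())

best_action = max(utility, key=expected_utility)
print(best_action, expected_utility(best_action))   # execute_next 3.7

With a flatter belief (say 0.4/0.4/0.2), "ask_clarify" would win instead, which is how uncertainty can drive clarification behavior in a dialog system of this kind.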
Grounding Spatio-Semantic Referring Expressions for Human-Robot Interaction
Human language is one of the most natural interfaces for humans to
interact with robots. This paper presents a robot system that retrieves
everyday objects with unconstrained natural language descriptions. A core issue
for the system is semantic and spatial grounding, which is to infer objects and
their spatial relationships from images and natural language expressions. We
introduce a two-stage neural-network grounding pipeline that maps natural
language referring expressions directly to objects in the images. The first
stage uses visual descriptions in the referring expressions to generate a
candidate set of relevant objects. The second stage examines all pairwise
relationships between the candidates and predicts the most likely referred
object according to the spatial descriptions in the referring expressions. A
key feature of our system is that by leveraging a large dataset of images
labeled with text descriptions, it allows unrestricted object types and natural
language referring expressions. Preliminary results indicate that our system
outperforms a near state-of-the-art object comprehension system on standard
benchmark datasets. We also present a robot system that follows voice commands
to pick and place previously unseen objects.
Comment: 8 pages, 4 figures, Accepted at RSS 2017 Workshop on Spatial-Semantic
Representations in Robotics.
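As a schematic view of the two-stage pipeline described above, with the neural scorers abstracted behind callables, here is a hedged Python sketch; the function names, signatures, and the additive combination of the two stages' scores are assumptions for illustration, not the paper's model.

from typing import Callable, List

def ground_referring_expression(
    objects: List[dict],                             # detected objects (boxes, features, ...)
    visual_score: Callable[[dict], float],           # stage 1: appearance vs. visual words
    relation_score: Callable[[dict, dict], float],   # stage 2: (target, landmark) vs. spatial words
    top_k: int = 5,
) -> dict:
    # Stage 1: keep the objects whose appearance best matches the visual description.
    candidates = sorted(objects, key=visual_score, reverse=True)[:top_k]

    # Stage 2: score every (target, landmark) pair against the spatial description
    # and return the target of the best-scoring pair.
    best_target, _ = max(
        ((t, l) for t in candidates for l in candidates if t is not l),
        key=lambda pair: visual_score(pair[0]) + relation_score(*pair),
    )
    return best_target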
Situational Grounding within Multimodal Simulations
In this paper, we argue that simulation platforms enable a novel type of
embodied spatial reasoning, one facilitated by a formal model of object and
event semantics that renders the continuous quantitative search space of an
open-world, real-time environment tractable. We provide examples for how a
semantically-informed AI system can exploit the precise, numerical information
provided by a game engine to perform qualitative reasoning about objects and
events, facilitate learning novel concepts from data, and communicate with a
human to improve its models and demonstrate its understanding. We argue that
simulation environments, and game engines in particular, bring together many
different notions of "simulation" and many different technologies to provide a
highly effective platform for developing both AI systems and tools to
experiment with both machine and human intelligence.
Comment: AAAI-19 Workshop on Games and Simulations for Artificial Intelligence.
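As a small, concrete example of the kind of mapping sketched here, from precise numerical engine state to qualitative predicates, consider the Python fragment below; the thresholds, relation names, and object encoding are invented for illustration and are not tied to any particular game engine.

def qualitative_relations(a, b, touch_eps=0.05):
    """Map two objects' numeric state (pos = (x, y, z), size = radius) to qualitative relations."""
    ax, ay, az = a["pos"]; bx, by, bz = b["pos"]
    gap = ((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2) ** 0.5 - (a["size"] + b["size"])
    rels = []
    if gap <= touch_eps:
        rels.append("touching")
    if az - bz > b["size"]:
        rels.append("above")
    if ax < bx:
        rels.append("left_of")
    return rels

cup = {"pos": (0.0, 0.0, 1.0), "size": 0.05}
table = {"pos": (0.1, 0.0, 0.5), "size": 0.45}
print(qualitative_relations(cup, table))   # ['touching', 'above', 'left_of']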
The Role of Artificial Intelligence Technologies in Crisis Response
Crisis response poses many of the most difficult information technology challenges
in crisis management. It requires information- and communication-intensive
efforts, aimed at reducing uncertainty, calculating and comparing costs and
benefits, and managing resources in a fashion beyond those regularly available
to handle routine problems. In this paper, we explore the benefits of
artificial intelligence technologies in crisis response. This paper discusses
the role of artificial intelligence technologies, namely robotics, ontology and
the semantic web, and multi-agent systems, in crisis response.
Comment: 6 pages, 5 figures, 1 table, accepted for MENDEL 2008, the 14th
International Conference on Soft Computing, June 18-20, Brno, Czech Republic.
Evaluating Personal Assistants on Mobile Devices
The iPhone was introduced only a decade ago in 2007 but has fundamentally
changed the way we interact with online information. Mobile devices differ
radically from classic command-based and point-and-click user interfaces, now
allowing for gesture-based interaction using fine-grained touch and swipe
signals. Due to the rapid growth in the use of voice-controlled intelligent
personal assistants on mobile devices, such as Microsoft's Cortana, Google Now,
and Apple's Siri, mobile devices have become personal, allowing us to be online
all the time and assisting us in any task, both at work and in our daily lives,
making context a crucial factor to consider.
Mobile usage is now exceeding desktop usage, and is still growing at a rapid
rate, yet our main ways of training and evaluating personal assistants are
still based on (and framed in) classical desktop interactions, focusing on
explicit queries, clicks, and dwell time. However, modern user
interaction with mobile devices is radically different due to touch screens
with gesture- and voice-based control and the varying context of use, e.g.,
in a car or on a bike, often invalidating the assumptions underlying today's user
satisfaction evaluation.
There is an urgent need to understand voice- and gesture-based interaction,
taking all interaction signals and context into account in appropriate ways. We
propose a research agenda for developing methods to evaluate and improve
context-aware user satisfaction with mobile interactions using gesture-based
signals at scale.
Online Semi-Supervised Learning with Deep Hybrid Boltzmann Machines and Denoising Autoencoders
Two novel deep hybrid architectures, the Deep Hybrid Boltzmann Machine and
the Deep Hybrid Denoising Auto-encoder, are proposed for handling
semi-supervised learning problems. The models combine experts that model
relevant distributions at different levels of abstraction to improve overall
predictive performance on discriminative tasks. Theoretical motivations and
algorithms for joint learning for each are presented. We apply the new models
to the domain of data streams in work towards lifelong learning. The proposed
architectures show improved performance compared to a pseudo-labeled, drop-out
rectifier network.
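The abstract does not give architectural details, so as a generic illustration of the hybrid idea, pairing a generative (denoising reconstruction) expert with a discriminative (classification) expert on a shared encoder, here is a minimal PyTorch sketch; the layer sizes, noise level, and loss weighting are assumptions, not the paper's models.

import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridDenoisingAE(nn.Module):
    """Shared encoder with a reconstruction (generative) head and a classification head."""
    def __init__(self, in_dim=784, hidden=256, num_classes=10, noise_std=0.3):
        super().__init__()
        self.noise_std = noise_std
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, in_dim)           # generative "expert"
        self.classifier = nn.Linear(hidden, num_classes)   # discriminative "expert"

    def forward(self, x):
        noisy = x + self.noise_std * torch.randn_like(x)   # denoising corruption
        h = self.encoder(noisy)
        return self.decoder(h), self.classifier(h)

def hybrid_loss(model, x_unlabeled, x_labeled, y_labeled, alpha=1.0):
    """Reconstruction loss on all data plus classification loss on the labeled subset."""
    recon_u, _ = model(x_unlabeled)
    recon_l, logits_l = model(x_labeled)
    recon_loss = F.mse_loss(recon_u, x_unlabeled) + F.mse_loss(recon_l, x_labeled)
    return recon_loss + alpha * F.cross_entropy(logits_l, y_labeled)

In an online, data-stream setting this loss would be minimized mini-batch by mini-batch as labeled and unlabeled examples arrive.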
Theory of mind and decision science: Towards a typology of tasks and computational models
The ability to form a Theory of Mind (ToM), i.e., to theorize about others’ mental states to explain and predict behavior in relation to attributed intentional states, constitutes a hallmark of human cognition. These abilities are multi-faceted and include a variety of different cognitive sub-functions. Here, we focus on decision processes in social contexts and review a number of experimental and computational modeling approaches in this field. We provide an overview of experimental accounts and formal computational models with respect to two dimensions: interactivity and uncertainty. Thereby, we aim at capturing the nuances of ToM functions in the context of social decision processes. We suggest that ToM engagement and multiplexing increase as social cognitive decision-making tasks become more interactive and uncertain. We propose that representing others as intentional, goal-directed agents who perform consequential actions is elicited only at the edges of these two dimensions. Further, we argue that computational models of valuation and beliefs follow these dimensions to best allow researchers to effectively model sophisticated ToM processes. Finally, we relate this typology to neuroimaging findings in neurotypical (NT) humans, studies of persons with autism spectrum (AS), and studies of nonhuman primates.