Introduction: Multimodal interaction
That human social interaction involves the intertwined cooperation of different modalities is uncontroversial. Researchers in several allied fields have, however, only recently begun to document the precise ways in which talk, gesture, gaze, and aspects of the material surround are brought together to form coherent courses of action. The papers in this volume are attempts to develop this line of inquiry. Although the authors draw on a range of analytic, theoretical, and methodological traditions (conversation analysis, ethnography, distributed cognition, and workplace studies), all are concerned to explore and illuminate the inherently multimodal character of social interaction. Recent studies, including those collected in this volume, suggest that different modalities work together not only to elaborate the semantic content of talk but also to constitute coherent courses of action. In this introduction we present evidence for this position. We begin by reviewing some select literature focusing primarily on communicative functions and interactive organizations of specific modalities before turning to consider the integration of distinct modalities in interaction.
Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Multimodal machine learning is a vibrant multi-disciplinary research field
that aims to design computer agents with intelligent capabilities such as
understanding, reasoning, and learning through integrating multiple
communicative modalities, including linguistic, acoustic, visual, tactile, and
physiological messages. With the recent interest in video understanding,
embodied autonomous agents, text-to-image generation, and multisensor fusion in
application domains such as healthcare and robotics, multimodal machine
learning has brought unique computational and theoretical challenges to the
machine learning community given the heterogeneity of data sources and the
interconnections often found between modalities. However, the breadth of
progress in multimodal research has made it difficult to identify the common
themes and open questions in the field. By synthesizing a broad range of
application domains and theoretical frameworks from both historical and recent
perspectives, this paper is designed to provide an overview of the
computational and theoretical foundations of multimodal machine learning. We
start by defining two key principles of modality heterogeneity and
interconnections that have driven subsequent innovations, and propose a
taxonomy of 6 core technical challenges: representation, alignment, reasoning,
generation, transference, and quantification covering historical and recent
trends. Recent technical achievements will be presented through the lens of
this taxonomy, allowing researchers to understand the similarities and
differences across new approaches. We end by motivating several open problems
for future research as identified by our taxonomy.
Planning and scheduling research at NASA Ames Research Center
Planning and scheduling is the area of artificial intelligence research that focuses on the determination of a series of operations to achieve some set of (possibly) interacting goals and the placement of those operations in a timeline that allows them to be accomplished given available resources. Work in this area at the NASA Ames Research Center, ranging from basic research in constraint-based reasoning and machine learning, to the development of efficient scheduling tools, to the application of such tools to complex agency problems, is described.
Towards an Indexical Model of Situated Language Comprehension for Cognitive Agents in Physical Worlds
We propose a computational model of situated language comprehension based on
the Indexical Hypothesis that generates meaning representations by translating
amodal linguistic symbols to modal representations of beliefs, knowledge, and
experience external to the linguistic system. This Indexical Model incorporates
multiple information sources, including perceptions, domain knowledge, and
short-term and long-term experiences during comprehension. We show that
exploiting diverse information sources can alleviate ambiguities that arise
from contextual use of underspecific referring expressions and unexpressed
argument alternations of verbs. The model is being used to support linguistic
interactions in Rosie, an agent implemented in Soar that learns from
instruction.
Comment: Advances in Cognitive Systems 3 (2014)
The Mechanics of Embodiment: A Dialogue on Embodiment and Computational Modeling
Embodied theories are increasingly challenging traditional views of cognition by arguing that conceptual representations that constitute our knowledge are grounded in sensory and motor experiences, and processed at this sensorimotor level, rather than being represented and processed abstractly in an amodal conceptual system. Given the established empirical foundation, and the relatively underspecified theories to date, many researchers are extremely interested in embodied cognition but are clamouring for more mechanistic implementations. What is needed at this stage is a push toward explicit computational models that implement sensory-motor grounding as intrinsic to cognitive processes. In this article, six authors from varying backgrounds and approaches address issues concerning the construction of embodied computational models, and illustrate what they view as the critical current and next steps toward mechanistic theories of embodiment. The first part has the form of a dialogue between two fictional characters: Ernest, the "experimenter", and Mary, the "computational modeller". The dialogue consists of an interactive sequence of questions, requests for clarification, challenges, and (tentative) answers, and touches the most important aspects of grounded theories that should inform computational modeling and, conversely, the impact that computational modeling could have on embodied theories. The second part of the article discusses the most important open challenges for embodied computational modelling.
Introduction: temporality in interaction
The authors establish a phenomenological perspective on the temporal constitution of experience and action. Retrospection and projection (i.e. backward as well as forward orientation of everyday action), sequentiality and the sequential organization of activities as well as simultaneity (i.e. participants' simultaneous coordination) are introduced as key concepts of a temporalized approach to interaction. These concepts are used to capture that every action is produced as an inter-linked step in the succession of adjacent actions, being sensitive to the precise moment where it is produced. The adoption of a holistic, multimodal and praxeological perspective additionally shows that action in interaction is organized according to several temporal orders simultaneously in operation. Each multimodal resource used in interaction has its own temporal properties.