Fillers in Spoken Language Understanding: Computational and Psycholinguistic Perspectives
Disfluencies (i.e., interruptions in the regular flow of speech) are
ubiquitous in spoken discourse. Fillers ("uh", "um") are the most frequent
kind of disfluency. Yet, to the best of our knowledge, there is no resource
that brings together the research perspectives on these speech events that
influence Spoken Language Understanding (SLU). The aim of this article is to
synthesise a breadth of perspectives in a holistic way, i.e., from the
underlying (psycho)linguistic theory, to their annotation and treatment in
Automatic Speech Recognition (ASR) and SLU systems, to, lastly, their study
from a generation standpoint. This article aims to present these perspectives
in an approachable way to the SLU and Conversational AI community, and to
discuss what we believe are the trends and challenges in each area moving
forward.
Comment: To appear in TAL Journal
Conversing with a devil’s advocate: Interpersonal coordination in deception and disagreement
This study investigates the presence of dynamical patterns of interpersonal coordination in extended deceptive conversations across multimodal channels of behavior. Using a novel "devil’s advocate" paradigm, we experimentally elicited deception and truth across topics on which conversational partners either agreed or disagreed, and where one partner was surreptitiously asked to argue an opinion opposite to what he or she really believed. We focus on interpersonal coordination as an emergent behavioral signal that captures interdependencies between conversational partners, both as the coupling of head movements over the span of milliseconds, measured via a windowed lagged cross-correlation (WLCC) technique, and as more global temporal dependencies in speech rate, measured using cross recurrence quantification analysis (CRQA). Moreover, we considered how interpersonal coordination might be shaped by strategic, adaptive conversational goals associated with deception. We found that deceptive conversations displayed more structured speech rate and higher head movement coordination, the latter peaking in deceptive disagreement conversations. Together, the results allow us to posit an adaptive account, whereby interpersonal coordination is not beholden to any single functional explanation but can strategically adapt to diverse conversational demands.
The article is published at http://journals.plos.org/plosone/article?id=10.1371/journal.pone.017814
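The two coordination measures named in this abstract can be sketched roughly as follows. This is a minimal, simplified illustration and not the authors' pipeline: all function names are hypothetical, the window size, lag range, step, and radius are assumed parameters, and the CRQA part skips the phase-space embedding (it uses embedding dimension 1 with no delay) that full CRQA normally applies. WLCC correlates one partner's signal (e.g., head movement) with the other's over a range of lags inside sliding windows; recurrence rate and determinism are two standard measures computed from a cross-recurrence matrix, with determinism commonly read as an index of how structured the shared dynamics are.

```python
import math
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def wlcc(x: Sequence[float], y: Sequence[float],
         window: int, max_lag: int, step: int) -> List[List[float]]:
    """Windowed lagged cross-correlation (hypothetical helper).

    Each row is one window position; column k holds corr(x[t], y[t + lag])
    with lag = k - max_lag, so a positive lag means y trails x (x leads).
    """
    rows = []
    start = max_lag  # leave room for negative lags at the start
    while start + window + max_lag <= min(len(x), len(y)):
        xs = x[start:start + window]
        rows.append([pearson(xs, y[start + lag:start + lag + window])
                     for lag in range(-max_lag, max_lag + 1)])
        start += step
    return rows


def cross_recurrence(x, y, radius):
    """Binary cross-recurrence matrix, no embedding: CR[i][j] = 1 if |x_i - y_j| <= radius."""
    return [[1 if abs(a - b) <= radius else 0 for b in y] for a in x]


def recurrence_rate(cr):
    """Share of recurrent points in the matrix."""
    return sum(map(sum, cr)) / sum(len(row) for row in cr)


def determinism(cr, min_line=2):
    """Share of recurrent points lying on diagonal lines of length >= min_line."""
    n, m = len(cr), len(cr[0])
    recurrent = sum(map(sum, cr))
    if recurrent == 0:
        return 0.0
    on_lines = 0
    for offset in range(-(n - 1), m):  # walk every diagonal j - i = offset
        run = 0
        for i in range(n):
            j = i + offset
            if 0 <= j < m and cr[i][j]:
                run += 1
            else:
                if run >= min_line:
                    on_lines += run
                run = 0
        if run >= min_line:
            on_lines += run
    return on_lines / recurrent


# Toy check: y is x delayed by 2 samples, so every window should peak at lag +2.
x = [float((t * 7) % 11) for t in range(40)]
y = [0.0, 0.0] + x[:-2]
rows = wlcc(x, y, window=10, max_lag=3, step=5)
peak_lags = [row.index(max(row)) - 3 for row in rows]  # → [2, 2, 2, 2, 2]

cr = cross_recurrence([1, 2, 3, 4], [1, 2, 3, 4], radius=0.5)
# recurrence_rate(cr) → 0.25, determinism(cr) → 1.0
```

Starting the windows at `max_lag` keeps every negative-lag slice in range; a real analysis would typically use a library implementation with embedding and a principled radius choice.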
Symbol Emergence in Robotics: A Survey
Humans can learn the use of language through physical interaction with their
environment and semiotic communication with other people. It is very important
to obtain a computational understanding of how humans can form a symbol system
and obtain semiotic skills through their autonomous mental development.
Recently, many studies have been conducted on the construction of robotic
systems and machine-learning methods that can learn the use of language through
embodied multimodal interaction with their environment and other systems.
Understanding the dynamics of symbol systems is crucially important for
understanding human social interactions and for developing a robot that can
smoothly communicate with human users over the long term. The
embodied cognition and social interaction of participants gradually change a
symbol system in a constructive manner. In this paper, we introduce a field of
research called symbol emergence in robotics (SER). SER is a constructive
approach towards an emergent symbol system. The emergent symbol system is
socially self-organized through both semiotic communications and physical
interactions with autonomous cognitive developmental agents, i.e., humans and
developmental robots. Specifically, we describe some state-of-the-art research
topics concerning SER, e.g., multimodal categorization, word discovery, and
double articulation analysis, which enable a robot to obtain words and their
embodied meanings from raw sensory-motor information, including visual
information, haptic information, auditory information, and acoustic speech
signals, in a totally unsupervised manner. Finally, we suggest future
directions of research in SER.
Comment: submitted to Advanced Robotics
Speech timing cues reveal deceptive speech in social deduction board games
The faculty of language allows humans to state falsehoods in their choice of words. However, while what is said might easily uphold a lie, how it is said may reveal deception. Hence, some features of the voice that are difficult for liars to control may keep speech mostly, if not always, honest. Previous research has identified that speech timing and voice pitch cues can predict the truthfulness of speech, but this evidence has come primarily from laboratory experiments, which sacrifice ecological validity for experimental control. We obtained ecologically valid recordings of deceptive speech by observing natural utterances from players of a popular social deduction board game, in which players are assigned roles that induce either honest or dishonest interactions. When speakers chose to lie, they were prone to longer and more frequent pauses in their speech. This finding is in line with theoretical predictions that lying is more cognitively demanding. However, lying was not reliably associated with vocal pitch. This contradicts predictions that increased physiological arousal from lying might increase muscular tension in the larynx, but is consistent with human specialisations that grant Homo sapiens sapiens an unusual degree of control over the voice relative to other primates. The present study demonstrates the utility of social deduction board games as a means of making naturalistic observations of human behaviour from semi-structured social interactions.
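The two timing cues reported here, pause frequency and pause length, can be computed from word-level timestamps along the lines below. This is a hedged sketch and not the study's procedure: it assumes each utterance is available as (start, end) times per word in seconds (for instance from a forced aligner), and both the helper `pause_stats` and the 0.25 s minimum pause are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PauseStats:
    count: int            # number of silent gaps >= min_pause
    mean_duration: float  # mean length of those gaps, in seconds
    per_minute: float     # pauses per minute of the utterance span


def pause_stats(words: List[Tuple[float, float]], min_pause: float = 0.25) -> PauseStats:
    """Pause metrics from (start, end) word timestamps sorted by start time.

    min_pause is an assumed threshold; studies differ in the cut-off they use.
    """
    if len(words) < 2:
        return PauseStats(0, 0.0, 0.0)
    # Silent gap between the end of one word and the start of the next.
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(words, words[1:])]
    pauses = [g for g in gaps if g >= min_pause]
    span = words[-1][1] - words[0][0]  # utterance duration, first onset to last offset
    count = len(pauses)
    mean = sum(pauses) / count if count else 0.0
    per_minute = count / span * 60.0 if span > 0 else 0.0
    return PauseStats(count, mean, per_minute)


# Gaps here are 0.1 s, 0.6 s and 1.0 s, so two count as pauses.
stats = pause_stats([(0.0, 0.4), (0.5, 0.9), (1.5, 2.0), (3.0, 3.5)])
```

Comparing such per-utterance statistics between honest and deceptive turns is one simple way to operationalise "longer and more frequent pauses"; the study itself may define and model these cues differently.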
Artificial Intelligence for Suicide Assessment using Audiovisual Cues: A Review
Suicide is the seventh leading cause of death worldwide. Recent
advances in Artificial Intelligence (AI), specifically AI applications in
image and voice processing, have created a promising opportunity to
revolutionize suicide risk assessment. Consequently, we have witnessed a
fast-growing body of research that applies AI to extract audiovisual
non-verbal cues for mental illness assessment. However, the majority of
recent works focus on depression, despite the evident differences between
depression and suicidal behavior in their symptoms and non-verbal cues. This
paper reviews recent works that study suicidal ideation and suicidal behavior
detection through audiovisual feature analysis, mainly the analysis of
acoustic voice/speech features and of visual cues associated with suicidality.
Automatic suicide assessment is a promising research direction that is still
in its early stages. Accordingly, there is a lack of large datasets that can be
used to train the machine learning and deep learning models that have proven
effective in other, similar tasks.
Comment: Manuscript submitted to Artificial Intelligence Reviews (2022)
Continuous Interaction with a Virtual Human
Attentive Speaking and Active Listening require that a Virtual Human be capable of simultaneous perception/interpretation and production of communicative behavior. A Virtual Human should be able to signal its attitude and attention while it is listening to its interaction partner, and be able to attend to its interaction partner while it is speaking, modifying its communicative behavior on the fly based on what it perceives from its partner. This report presents the results of a four-week summer project that was part of eNTERFACE’10. The project resulted in progress on several aspects of continuous interaction, such as scheduling and interrupting multimodal behavior, automatic classification of listener responses, generation of response-eliciting behavior, and models for appropriate reactions to listener responses. A pilot user study was conducted with ten participants. In addition, the project yielded a number of deliverables that have been released for public access.