12 research outputs found

    Combining heterogeneous inputs for the development of adaptive and multimodal interaction systems

    In this paper we present a novel framework for the integration of visual sensor networks and speech-based interfaces. Our proposal follows the standard reference architecture for fusion systems (JDL) and combines techniques from Artificial Intelligence, Natural Language Processing, and User Modeling to provide enhanced interaction with users. First, the framework integrates a Cooperative Surveillance Multi-Agent System (CS-MAS), which includes several types of autonomous agents working in a coalition to track targets and make inferences about their positions. Second, enhanced conversational agents facilitate human-computer interaction by means of speech. Third, a statistical methodology models the user's conversational behavior, which is learned from an initial corpus and improved with the knowledge acquired from successive interactions. A technique is proposed to fuse these multimodal information sources and take the result into account when deciding the next system action. This work was supported in part by projects MEyC TEC2012-37832-C02-01, CICYT TEC2011-28626-C02-02, and CAM CONTEXTS S2009/TIC-1485.
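
    A minimal late-fusion sketch in Python, assuming a JDL-style pipeline in which each modality delivers a scored hypothesis; the paper does not publish its fusion algorithm, so every name, threshold, and rule below is illustrative only.

        from dataclasses import dataclass

        @dataclass
        class VisualHypothesis:
            target_zone: str   # zone reported by the surveillance agents (hypothetical)
            confidence: float  # tracker confidence in [0, 1]

        @dataclass
        class SpeechHypothesis:
            dialogue_act: str  # e.g. "ask_location", from the conversational agent
            confidence: float  # ASR/NLU confidence in [0, 1]

        def next_action(visual: VisualHypothesis, speech: SpeechHypothesis) -> str:
            """Choose the next system action from both modalities.

            A full fusion system would maintain probability distributions over
            targets and dialogue states; this toy rule simply prefers the more
            confident modality.
            """
            if speech.confidence < 0.4:
                return "request_clarification"  # low speech confidence: re-prompt the user
            if speech.dialogue_act == "ask_location" and visual.confidence >= 0.5:
                return f"report_position:{visual.target_zone}"
            return "continue_dialogue"

        print(next_action(VisualHypothesis("corridor_B", 0.82),
                          SpeechHypothesis("ask_location", 0.91)))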

    Audiovisual Correlates of Interrogativity: A Comparative Analysis of Catalan and Dutch

    Languages employ different strategies to mark an utterance as a polar (yes-no) question, including syntax, intonation, and gestures. This study analyzes the production and perception of information-seeking questions and broad-focus statements in Dutch and Catalan. Both languages use intonation to mark questionhood, but Dutch also exploits syntactic variation for this purpose. A production task revealed the expected language-specific auditory differences, but also showed that gaze and eyebrow raising are used in this distinction. A follow-up perception experiment revealed that perceivers relied greatly on auditory information in determining whether an utterance is a question or a statement, but accuracy was further enhanced when visual information was added. Finally, the study demonstrates that the concentration of several response-mobilizing cues in a sentence is positively correlated with perceivers' ratings of these utterances as interrogatives.
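
    The reported positive correlation between cue concentration and interrogative ratings can be expressed in a few lines of Python; the data below are fabricated stand-ins for the study's per-utterance cue counts and perception ratings.

        from scipy.stats import pearsonr

        # Per utterance: number of response-mobilizing cues present (rising
        # intonation, gaze, eyebrow raise, inversion, ...) and the mean
        # "is this a question?" rating on a 1-5 scale. Toy values only.
        cue_counts = [0, 1, 1, 2, 2, 3, 3, 4]
        question_ratings = [1.8, 2.9, 3.1, 3.6, 4.0, 4.4, 4.3, 4.8]

        r, p = pearsonr(cue_counts, question_ratings)
        print(f"r = {r:.2f}, p = {p:.3f}")  # a positive r mirrors the reported finding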

    The face is central to primate multicomponent signals

    A wealth of experimental and observational evidence suggests that faces have become increasingly important in the communication system of primates over evolutionary time and that both the static and moveable aspects of faces convey considerable information. Therefore, whenever there is a visual component to any multicomponent signal, the face is potentially relevant. However, the role of the face is not always considered in primate multicomponent communication research. We review the literature and make a case for a greater focus on the face going forward. We propose that the face can be overlooked for two main reasons. First, methodological difficulty: examination of multicomponent signals in primates is hard, so scientists tend to examine a limited number of signals in combination, and detailed examination of the subtle and dynamic components of facial signals is particularly difficult to achieve in studies of primates. Second, a common assumption that the face carries “emotional” content: a priori categorisation of facial behavior as “emotional” ignores the potentially communicative and predictive information present in the face that might contribute to signals. In short, we argue that the face is central to multicomponent signals (and also to many multimodal signals), and we suggest future directions for investigating this phenomenon.

    The importance of studying prosody in the comprehension of spontaneous spoken discourse

    The study of the role of prosodic breaks and pitch accents in comprehension has usually focused on sentence processing, using laboratory speech produced by both trained and untrained speakers. In comparison, little attention has been paid to their role in the comprehension and production of spontaneous discourse, or to the interplay between prosodic cues and pitch accents and the generation of inferences. This article reviews studies of the effects of prosodic boundaries and pitch accents on sentence comprehension. Their results suggest that prosody has an early influence on the parsing of sentences as well as on the processing of the information structure of a statement. The article also presents a new model of spontaneous discourse comprehension that can accommodate paralinguistic factors, such as pitch and prosody, and other communication channels, and relate them to cognitive processes. Stemming from this model, future research directions are suggested, and the importance of using spontaneous spoken discourse materials and of examining the role of prosodic cues and pitch accents in establishing connections among spoken statements is highlighted.

    Sound-Action Symbolism

    Recent evidence has shown linkages between actions and segmental elements of speech. For instance, close-front vowels are sound-symbolically associated with the precision grip, and front vowels are associated with forward-directed limb movements. The current review article presents a variety of such sound-action effects and proposes that they compose a category of sound symbolism that is based on grounding conceptual knowledge of a referent in articulatory and manual action representations. The article further proposes that even some widely known sound symbolism phenomena, such as sound-magnitude symbolism, can be partially based on similar sensorimotor grounding. It also discusses how the meaning of suprasegmental speech elements is, in many instances, similarly grounded in body actions. Sound symbolism, prosody, and body gestures might originate from the same embodied mechanisms that enable a vivid and iconic expression of a referent's meaning to the recipient.

    Speaker Eyebrow Raises in the Transition Space Pursuing a Shared Understanding

    In this article, we examine a distinctive multimodal phenomenon: a participant, gazing at a recipient, raising both eyebrows upon the completion of their own turn at talk – that is, in the transition space between turns at talk (Sacks, Schegloff and Jefferson, 1974). We find that speakers deploy eyebrow raises in two related but distinct practices. In the first, the eyebrows are raised and held as the speaker presses the recipient to respond to a disaffiliative action (e.g. a challenge); in the second, the eyebrows are raised and quickly released in a so-called eyebrow flash as the speaker invites a response to an affiliative action (e.g. a joke). The former practice is essentially combative, the latter collusive. Although the two practices differ in their durational properties and in the kinds of actions they serve, they also have something in common: both invoke shared knowledge or understanding between speaker and recipient.
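
    Since the two practices are distinguished partly by duration (a held raise versus a quickly released flash), a coder's decision rule could be sketched as below; the 0.5 s threshold is an assumption for illustration, not a value taken from the article.

        def classify_raise(onset_s: float, offset_s: float) -> str:
            """Label an eyebrow raise by its duration (threshold is hypothetical)."""
            duration = offset_s - onset_s
            if duration > 0.5:
                return "held raise (pressing a response to a disaffiliative action)"
            return "eyebrow flash (inviting a response to an affiliative action)"

        print(classify_raise(12.1, 13.4))  # -> held raise
        print(classify_raise(20.0, 20.3))  # -> eyebrow flash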

    Eyebrow movements as signals of communicative problems in human face-to-face interaction

    Repair is a core building block of human communication, allowing us to address problems of understanding in conversation. Past research has uncovered the basic mechanisms by which interactants signal and solve such problems. However, the focus has been on verbal interaction, neglecting the fact that human communication is inherently multimodal. Here, we focus on visual signals particularly prevalent in signaling problems of understanding: eyebrow frowns and raises. We present a corpus study showing that verbal repair initiations with eyebrow furrows are more likely to be responded to with clarifications as repair solutions, that repair initiations preceded by eyebrow actions as preliminaries are repaired faster (by around 230 ms), and that eyebrow furrows alone can be sufficient to occasion clarification. We also present an experiment based on virtual reality technology, revealing that addressees' eyebrow frowns have a striking effect on speakers' speech, leading them to produce answers to questions that are several seconds longer than when no addressee eyebrow furrows are perceived. Together, the findings demonstrate that eyebrow movements play a communicative role in initiating repair in spoken language rather than being merely epiphenomenal. Thus, they should be considered core coordination devices in human conversational interaction.
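
    The corpus finding about faster repair can be expressed as a simple grouped comparison; the latencies below are invented for illustration and are not the study's data.

        from statistics import mean

        # Latency (ms) from repair initiation to repair solution, grouped by
        # whether an eyebrow action preceded the initiation as a preliminary.
        with_brow = [540, 610, 480, 590]     # toy values
        without_brow = [760, 820, 700, 790]  # toy values

        speedup = mean(without_brow) - mean(with_brow)
        print(f"mean speed-up with eyebrow preliminary: {speedup:.0f} ms")
        # The paper reports repairs arriving around 230 ms faster in such cases.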

    Tailored perception: individuals’ speech and music perception strategies fit their perceptual abilities

    Perception involves the integration of multiple dimensions that often serve overlapping, redundant functions, e.g. pitch, duration, and amplitude in speech. Individuals tend to prioritize these dimensions differently (stable, individualized perceptual 'strategies'), but the reason for this has remained unclear. Here we show that perceptual strategies relate to perceptual abilities. In a speech cue-weighting experiment (trial N = 990), we first demonstrate that individuals with a severe deficit for pitch perception (congenital amusics; N = 11) categorize linguistic stimuli similarly to controls (N = 11) when the main distinguishing cue is duration, which they perceive normally. In contrast, in a prosodic task where pitch cues are the main distinguishing factor, amusics place less importance on pitch and instead rely more on duration cues, even when the pitch differences in the stimuli are large enough for amusics to discern. In a second experiment testing musical and prosodic phrase interpretation (N = 16 amusics; 15 controls), we found that relying on duration allowed amusics to overcome their pitch deficits and perceive speech and music successfully. We conclude that auditory signals, because of their redundant nature, are robust to impairments in specific dimensions, and that optimal speech and music perception strategies depend not only on invariant acoustic dimensions (the physical signal) but also on perceptual dimensions whose precision varies across individuals. Computational models of speech perception (indeed, of all types of perception involving redundant cues, e.g. vision and touch) should therefore aim to account for the precision of perceptual dimensions and characterize individuals as well as groups.
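
    Cue weights in experiments of this kind are commonly estimated by regressing binary category responses on normalized cue values, so that the fitted coefficients act as per-listener weights; the sketch below uses simulated responses, and all coefficient values are illustrative rather than the study's results.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        n = 200
        pitch = rng.uniform(-1, 1, n)     # normalized pitch cue per stimulus
        duration = rng.uniform(-1, 1, n)  # normalized duration cue per stimulus

        # Simulate a control-like listener who weights pitch heavily; an amusic
        # listener would show a small pitch weight and a larger duration weight.
        p = 1 / (1 + np.exp(-(2.5 * pitch + 1.0 * duration)))
        responses = rng.random(n) < p

        model = LogisticRegression().fit(np.column_stack([pitch, duration]), responses)
        w_pitch, w_duration = model.coef_[0]
        print(f"pitch weight = {w_pitch:.2f}, duration weight = {w_duration:.2f}")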