12 research outputs found

    Combining heterogeneous inputs for the development of adaptive and multimodal interaction systems

    In this paper we present a novel framework for the integration of visual sensor networks and speech-based interfaces. Our proposal follows the standard reference architecture for fusion systems (JDL) and combines techniques from Artificial Intelligence, Natural Language Processing, and User Modeling to provide enhanced interaction with users. First, the framework integrates a Cooperative Surveillance Multi-Agent System (CS-MAS), which includes several types of autonomous agents working in a coalition to track targets and make inferences about their positions. Second, enhanced conversational agents facilitate human-computer interaction by means of speech. Third, a statistical methodology models the user's conversational behavior, which is learned from an initial corpus and improved with the knowledge acquired from successive interactions. A technique is proposed to fuse these multimodal information sources and take the result into account when deciding the next system action. This work was supported in part by projects MEyC TEC2012-37832-C02-01, CICYT TEC2011-28626-C02-02, and CAM CONTEXTS S2009/TIC-1485.
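
    A minimal late-fusion sketch in Python, assuming a JDL-style pipeline in which each modality delivers a scored hypothesis; the paper does not publish its fusion algorithm, so every name, threshold, and rule below is illustrative only.

        from dataclasses import dataclass

        @dataclass
        class VisualHypothesis:
            target_zone: str   # zone reported by the surveillance agents (hypothetical)
            confidence: float  # tracker confidence in [0, 1]

        @dataclass
        class SpeechHypothesis:
            dialogue_act: str  # e.g. "ask_location", from the conversational agent
            confidence: float  # ASR/NLU confidence in [0, 1]

        def next_action(visual: VisualHypothesis, speech: SpeechHypothesis) -> str:
            """Choose the next system action from both modalities.

            A full fusion system would maintain probability distributions over
            targets and dialogue states; this toy rule simply prefers the more
            confident modality.
            """
            if speech.confidence < 0.4:
                return "request_clarification"  # low speech confidence: re-prompt the user
            if speech.dialogue_act == "ask_location" and visual.confidence >= 0.5:
                return f"report_position:{visual.target_zone}"
            return "continue_dialogue"

        print(next_action(VisualHypothesis("corridor_B", 0.82),
                          SpeechHypothesis("ask_location", 0.91)))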

    Audiovisual Correlates of Interrogativity: A Comparative Analysis of Catalan and Dutch

    Languages employ different strategies to mark an utterance as a polar (yes-no) question, including syntax, intonation, and gestures. This study analyzes the production and perception of information-seeking questions and broad-focus statements in Dutch and Catalan. Both languages use intonation to mark questionhood, but Dutch also exploits syntactic variation for this purpose. A production task revealed the expected language-specific auditory differences, but also showed that gaze and eyebrow raising are used in this distinction. A follow-up perception experiment revealed that perceivers relied greatly on auditory information in determining whether an utterance is a question or a statement, but accuracy was further enhanced when visual information was added. Finally, the study demonstrates that the concentration of several response-mobilizing cues in a sentence is positively correlated with perceivers' ratings of these utterances as interrogatives.
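
    The reported positive correlation between cue concentration and interrogative ratings can be expressed in a few lines of Python; the data below are fabricated stand-ins for the study's per-utterance cue counts and perception ratings.

        from scipy.stats import pearsonr

        # Per utterance: number of response-mobilizing cues present (rising
        # intonation, gaze, eyebrow raise, inversion, ...) and the mean
        # "is this a question?" rating on a 1-5 scale. Toy values only.
        cue_counts = [0, 1, 1, 2, 2, 3, 3, 4]
        question_ratings = [1.8, 2.9, 3.1, 3.6, 4.0, 4.4, 4.3, 4.8]

        r, p = pearsonr(cue_counts, question_ratings)
        print(f"r = {r:.2f}, p = {p:.3f}")  # a positive r mirrors the reported finding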

    The face is central to primate multicomponent signals

    A wealth of experimental and observational evidence suggests that faces have become increasingly important in the communication system of primates over evolutionary time and that both the static and moveable aspects of faces convey considerable information. Therefore, whenever there is a visual component to any multicomponent signal, the face is potentially relevant. However, the role of the face is not always considered in primate multicomponent communication research. We review the literature and make a case for a greater focus on the face going forward. We propose that the face can be overlooked for two main reasons. First, methodological difficulty: examination of multicomponent signals in primates is hard, so scientists tend to examine a limited number of signals in combination, and detailed examination of the subtle and dynamic components of facial signals is particularly difficult to achieve in studies of primates. Second, a common assumption that the face carries “emotional” content: a priori categorisation of facial behavior as “emotional” ignores the potentially communicative and predictive information present in the face that might contribute to signals. In short, we argue that the face is central to multicomponent signals (and also to many multimodal signals), and we suggest future directions for investigating this phenomenon.

    The importance of studying prosody in the comprehension of spontaneous spoken discourse

    The study of the role of prosodic breaks and pitch accents in comprehension has usually focused on sentence processing, using laboratory speech produced by both trained and untrained speakers. In comparison, little attention has been paid to their role in the comprehension and production of spontaneous discourse, or to the interplay between prosodic cues and pitch accents and the generation of inferences. This article reviews studies of the effects of prosodic boundaries and pitch accents on sentence comprehension. Their results suggest that prosody has an early influence on the parsing of sentences as well as on the processing of the information structure of a statement. The article also presents a new model of spontaneous discourse comprehension that can accommodate paralinguistic factors, such as pitch and prosody, and other communication channels, and relate them to cognitive processes. Stemming from this model, future research directions are suggested, and the importance of using spontaneous spoken discourse materials and of examining the role of prosodic cues and pitch accents in establishing connections among spoken statements is highlighted.

    Sound-Action Symbolism

    Recent evidence has shown linkages between actions and segmental elements of speech. For instance, close-front vowels are sound-symbolically associated with the precision grip, and front vowels are associated with forward-directed limb movements. The current review article presents a variety of such sound-action effects and proposes that they compose a category of sound symbolism that is based on grounding conceptual knowledge of a referent in articulatory and manual action representations. The article further proposes that even some widely known sound symbolism phenomena, such as sound-magnitude symbolism, can be partially based on similar sensorimotor grounding. It also discusses how the meaning of suprasegmental speech elements is, in many instances, similarly grounded in body actions. Sound symbolism, prosody, and body gestures might originate from the same embodied mechanisms that enable a vivid and iconic expression of a referent's meaning to the recipient.

    Speaker Eyebrow Raises in the Transition Space Pursuing a Shared Understanding

    In this article, we examine a distinctive multimodal phenomenon: a participant, gazing at a recipient, raising both eyebrows upon the completion of their own turn at talk – that is, in the transition space between turns at talk (Sacks, Schegloff and Jefferson, 1974). We find that speakers deploy eyebrow raises in two related but distinct practices. In the first, the eyebrows are raised and held as the speaker presses the recipient to respond to a disaffiliative action (e.g. a challenge); in the second, the eyebrows are raised and quickly released in a so-called eyebrow flash as the speaker invites a response to an affiliative action (e.g. a joke). The former practice is essentially combative, the latter collusive. Although the two practices differ in their durational properties and in the kinds of actions they serve, they also have something in common: both invoke shared knowledge or understanding between speaker and recipient.
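
    Since the two practices are distinguished partly by duration (a held raise versus a quickly released flash), a coder's decision rule could be sketched as below; the 0.5 s threshold is an assumption for illustration, not a value taken from the article.

        def classify_raise(onset_s: float, offset_s: float) -> str:
            """Label an eyebrow raise by its duration (threshold is hypothetical)."""
            duration = offset_s - onset_s
            if duration > 0.5:
                return "held raise (pressing a response to a disaffiliative action)"
            return "eyebrow flash (inviting a response to an affiliative action)"

        print(classify_raise(12.1, 13.4))  # -> held raise
        print(classify_raise(20.0, 20.3))  # -> eyebrow flash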

    Eyebrow movements as signals of communicative problems in human face-to-face interaction

    Repair is a core building block of human communication, allowing us to address problems of understanding in conversation. Past research has uncovered the basic mechanisms by which interactants signal and solve such problems. However, the focus has been on verbal interaction, neglecting the fact that human communication is inherently multimodal. Here, we focus on visual signals particularly prevalent in signaling problems of understanding: eyebrow frowns and raises. We present a corpus study showing that verbal repair initiations with eyebrow furrows are more likely to be responded to with clarifications as repair solutions, that repair initiations preceded by eyebrow actions as preliminaries are repaired faster (by around 230 ms), and that eyebrow furrows alone can be sufficient to occasion clarification. We also present an experiment based on virtual reality technology, revealing that addressees' eyebrow frowns have a striking effect on speakers' speech, leading them to produce answers to questions that are several seconds longer than when no addressee eyebrow furrows are perceived. Together, the findings demonstrate that eyebrow movements play a communicative role in initiating repair in spoken language rather than being merely epiphenomenal. Thus, they should be considered core coordination devices in human conversational interaction.
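
    The corpus finding about faster repair can be expressed as a simple grouped comparison; the latencies below are invented for illustration and are not the study's data.

        from statistics import mean

        # Latency (ms) from repair initiation to repair solution, grouped by
        # whether an eyebrow action preceded the initiation as a preliminary.
        with_brow = [540, 610, 480, 590]     # toy values
        without_brow = [760, 820, 700, 790]  # toy values

        speedup = mean(without_brow) - mean(with_brow)
        print(f"mean speed-up with eyebrow preliminary: {speedup:.0f} ms")
        # The paper reports repairs arriving around 230 ms faster in such cases.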

    Tailored perception: individuals’ speech and music perception strategies fit their perceptual abilities

    Perception involves the integration of multiple dimensions that often serve overlapping, redundant functions, e.g. pitch, duration, and amplitude in speech. Individuals tend to prioritize these dimensions differently (stable, individualized perceptual 'strategies'), but the reason for this has remained unclear. Here we show that perceptual strategies relate to perceptual abilities. In a speech cue-weighting experiment (trial N = 990), we first demonstrate that individuals with a severe deficit for pitch perception (congenital amusics; N = 11) categorize linguistic stimuli similarly to controls (N = 11) when the main distinguishing cue is duration, which they perceive normally. In contrast, in a prosodic task where pitch cues are the main distinguishing factor, amusics place less importance on pitch and instead rely more on duration cues, even when the pitch differences in the stimuli are large enough for amusics to discern. In a second experiment testing musical and prosodic phrase interpretation (N = 16 amusics; 15 controls), we found that relying on duration allowed amusics to overcome their pitch deficits and perceive speech and music successfully. We conclude that auditory signals, because of their redundant nature, are robust to impairments in specific dimensions, and that optimal speech and music perception strategies depend not only on invariant acoustic dimensions (the physical signal) but also on perceptual dimensions whose precision varies across individuals. Computational models of speech perception (indeed, of all types of perception involving redundant cues, e.g. vision and touch) should therefore aim to account for the precision of perceptual dimensions and characterize individuals as well as groups.
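
    Cue weights in experiments of this kind are commonly estimated by regressing binary category responses on normalized cue values, so that the fitted coefficients act as per-listener weights; the sketch below uses simulated responses, and all coefficient values are illustrative rather than the study's results.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        n = 200
        pitch = rng.uniform(-1, 1, n)     # normalized pitch cue per stimulus
        duration = rng.uniform(-1, 1, n)  # normalized duration cue per stimulus

        # Simulate a control-like listener who weights pitch heavily; an amusic
        # listener would show a small pitch weight and a larger duration weight.
        p = 1 / (1 + np.exp(-(2.5 * pitch + 1.0 * duration)))
        responses = rng.random(n) < p

        model = LogisticRegression().fit(np.column_stack([pitch, duration]), responses)
        w_pitch, w_duration = model.coef_[0]
        print(f"pitch weight = {w_pitch:.2f}, duration weight = {w_duration:.2f}")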