
    PRESENCE: A human-inspired architecture for speech-based human-machine interaction

    Recent years have seen steady improvements in the quality and performance of speech-based human-machine interaction, driven by a significant convergence in the methods and techniques employed. However, the quantity of training data required to improve state-of-the-art systems seems to be growing exponentially, and performance appears to be asymptotic to a level that may be inadequate for many real-world applications. This suggests that there may be a fundamental flaw in the underlying architecture of contemporary systems, as well as a failure to capitalize on the combinatorial properties of human spoken language. This paper addresses these issues and presents a novel architecture for speech-based human-machine interaction inspired by recent findings in the neurobiology of living systems. Called PRESENCE ("PREdictive SENsorimotor Control and Emulation"), this new architecture blurs the distinction between the core components of a traditional spoken language dialogue system and instead focuses on a recursive, hierarchical feedback control structure. Cooperative and communicative behavior emerges as a by-product of an architecture founded on a model of interaction in which the system has in mind the needs and intentions of the user, and the user has in mind the needs and intentions of the system.
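
    The recursive, hierarchical feedback control structure described above can be illustrated with a minimal sketch of a predictive control loop, in which each level emulates its expected input and passes only the prediction error upward. This is an illustrative reading of the abstract, not the authors' implementation; all class names, layer labels and the update rule are assumptions.

        # Minimal sketch of a hierarchical predictive control loop in the spirit of
        # PRESENCE; names, layer labels and the update rule are illustrative assumptions.
        class PredictiveLayer:
            def __init__(self, name, gain=0.5):
                self.name = name
                self.gain = gain      # how strongly prediction errors update the internal state
                self.state = 0.0      # the layer's current estimate (emulation) of its input

            def step(self, observation):
                prediction = self.state              # emulate the expected input
                error = observation - prediction     # prediction error (feedback)
                self.state += self.gain * error      # correct the internal model
                return error                         # pass the residual up the hierarchy

        def run_hierarchy(layers, observations):
            """Higher layers only see what lower layers failed to predict."""
            for obs in observations:
                signal = obs
                for layer in layers:
                    signal = layer.step(signal)

        layers = [PredictiveLayer("acoustic"), PredictiveLayer("lexical"), PredictiveLayer("intentional")]
        run_hierarchy(layers, [1.0, 0.9, 1.1, 1.0])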

    Gender Ambiguity in Voice-Based Assistants: Gender Perception and Influences of Context

    Recently emerging synthetic, acoustically gender-ambiguous voices could contribute to dissolving the still prevailing genderism. Yet, are we indeed perceiving these voices as “unassignable”? Or are we trying to assimilate them into existing genders? To investigate the perceived ambiguity, we conducted an explorative 3 (male, female, ambiguous voice) × 3 (male, female, ambiguous topic) experiment. We found that, although participants perceived the gender-ambiguous voice as ambiguous, they used a profoundly wide range of the scale, indicating tendencies toward a gender. We uncovered a mild dissolution of gender roles. Neither the listener’s gender nor personal gender stereotypes affected perception. However, the perceived gender of the topic predicted the perceived gender of the voice, and younger people tended to perceive a more male-like gender.
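
    The experiment described above fully crosses voice gender and topic gender. The sketch below simply enumerates the nine cells of that 3 × 3 design; trial structure, stimuli and counterbalancing are assumptions for illustration, not details from the study.

        # Enumerate the 3 (voice gender) x 3 (topic gender) cells of the design.
        from itertools import product

        voices = ["male", "female", "ambiguous"]
        topics = ["male", "female", "ambiguous"]

        for voice, topic in product(voices, topics):
            # each cell would collect perceived-gender ratings on a bipolar scale
            print(f"voice={voice:9s} topic={topic}")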

    Do (and say) as I say: Linguistic adaptation in human-computer dialogs

    © Theodora Koulouri, Stanislao Lauria, and Robert D. Macredie. This article has been made available through the Brunel Open Access Publishing Fund. There is strong research evidence showing that people naturally align to each other’s vocabulary, sentence structure, and acoustic features in dialog, yet little is known about how the alignment mechanism operates in the interaction between users and computer systems, let alone how it may be exploited to improve the efficiency of the interaction. This article provides an account of lexical alignment in human–computer dialogs, based on empirical data collected in a simulated human–computer interaction scenario. The results indicate that alignment is present, resulting in the gradual reduction and stabilization of the vocabulary-in-use, and that it is also reciprocal. Further, the results suggest that when system and user errors occur, the development of alignment is temporarily disrupted and users tend to introduce novel words to the dialog. The results also indicate that alignment in human–computer interaction may have a strong strategic component and is used as a resource to compensate for less optimal (visually impoverished) interaction conditions. Moreover, lower alignment is associated with less successful interaction, as measured by user perceptions. The article distills the results of the study into design recommendations for human–computer dialog systems and uses them to outline a model of dialog management that supports and exploits alignment through mechanisms for in-use adaptation of the system’s grammar and lexicon.
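
    One way to make the notion of lexical alignment concrete is to track how much of the user's vocabulary the system re-uses across turns. The sketch below is an illustrative overlap measure under simple whitespace tokenisation; it is not the measure used in the article.

        # Illustrative measure of lexical alignment as vocabulary overlap between
        # user turns and system turns; tokenisation and metric are assumptions.
        def tokens(turn):
            return set(turn.lower().split())

        def lexical_overlap(user_turns, system_turns):
            """Fraction of the user's vocabulary that the system also uses."""
            user_vocab = set().union(*(tokens(t) for t in user_turns))
            system_vocab = set().union(*(tokens(t) for t in system_turns))
            return len(user_vocab & system_vocab) / len(user_vocab) if user_vocab else 0.0

        user = ["move the red box left", "put the box on the shelf"]
        system = ["moving the red box left", "the box is on the shelf"]
        print(f"overlap = {lexical_overlap(user, system):.2f}")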

    Human-Machine Communication: Complete Volume 5. Gender and Human-Machine Communication

    This is the complete volume of Human-Machine Communication, Volume 5: Gender and Human-Machine Communication.

    Model Based Teleoperation to Eliminate Feedback Delay NSF Grant BCS89-01352 - 3rd Report

    We are conducting research in the area of teleoperation with feedback delay. Significant delays occur when performing space teleoperation from the earth, as well as in subsea teleoperation, where the operator is typically on a surface vessel and communication is via acoustic links. These delays make teleoperation extremely difficult and lead to very low operator productivity. We have combined computer graphics with manipulator programming to provide a solution to the delay problem. A teleoperator master arm is interfaced to a graphical simulation of the remote environment. Synthetic fixtures are used to guide the operator's motions and to provide kinesthetic feedback. The operator's actions are monitored and used to generate symbolic motion commands for transmission to, and execution by, the remote slave robot. While much of a task proceeds error-free, when an error does occur, the slave system transmits data back to the master environment, where the operator can then experience the motion of the slave manipulator in actual task execution. We have also provided for the use of tools such as an impact wrench and a winch at the slave site. In all cases the tools are unencumbered by sensors; the slave uses a compliant instrumented wrist to monitor tool operation in terms of resulting motions and reaction forces.
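
    The command flow described above, where the operator interacts with a local simulation and only symbolic commands traverse the delayed link, can be sketched roughly as follows. The command format, delay value and queue structure are assumptions for illustration, not the project's actual interface.

        # Sketch: operator motions in the local simulation become symbolic commands
        # that are queued for the delayed link and executed later by the remote slave.
        import collections
        import time

        LINK_DELAY_S = 2.0   # assumed one-way transmission delay

        def to_symbolic_command(pose):
            """Collapse a monitored master-arm pose into a symbolic motion command."""
            x, y, z = pose
            return f"MOVE_TO {x:.2f} {y:.2f} {z:.2f}"

        outbox = collections.deque()

        def operator_motion(pose):
            # the operator gets immediate kinesthetic feedback in the simulation;
            # the remote slave only receives the command after the link delay
            outbox.append((time.time() + LINK_DELAY_S, to_symbolic_command(pose)))

        def remote_slave_tick(now):
            while outbox and outbox[0][0] <= now:
                _, cmd = outbox.popleft()
                print("slave executes:", cmd)

        operator_motion((0.10, 0.25, 0.30))
        remote_slave_tick(time.time() + LINK_DELAY_S + 0.1)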

    Embodied Cognitive Science of Music. Modeling Experience and Behavior in Musical Contexts

    Recently, the role of corporeal interaction has gained wide recognition within cognitive musicology. This thesis reviews evidence from different directions in music research supporting the importance of body-based processes for the understanding of music-related experience and behaviour. Given the synthetic focus of cognitive science, cognitive science of music is discussed as a modeling approach that takes these processes into account and may theoretically be embedded within the theory of dynamic systems. In particular, arguments are presented for the use of robotic devices as tools for the investigation of processes underlying human music-related capabilities (musical robotics).

    Emergence of articulatory-acoustic systems from deictic interaction games in a "vocalize to localize" framework

    Since the 1970s and Lindblom's proposal to "derive language from non-language", phoneticians have developed a number of "substance-based" theories. The starting point is Lindblom's Dispersion Theory and Stevens's Quantal Theory, which open the way to a rich tradition of works attempting to determine, and possibly model, how phonological systems could be shaped by the perceptuo-motor substance of speech communication. These works seek to derive the shapes of human languages from constraints arising from perceptual (auditory and perhaps visual) and motor (articulatory and cognitive) properties of the speech communication system: we call them "Morphogenesis Theories". More recently, a number of proposals were introduced in order to connect pre-linguistic primate abilities (such as vocalization, gestures, mastication or deixis) to human language. For instance, in the "Vocalize-to-Localize" framework that we adopt in the present work (Abry et al., 2004), human language is supposed to derive from a precursor deictic function, considering that language could have provided at the beginning an evolutionary development of the ability to "show with the voice". We call this type of theories "Origins Theories". We propose that the principles of Morphogenesis Theories (such as dispersion principles or the quantal nature of speech) can be incorporated into, and to a certain extent derived from, Origins Theories. While Morphogenesis Theories raise questions such as "why are vowel systems shaped the way they are?" and answer that it is to increase auditory dispersion in order to prevent confusion between vowels, we ask questions such as "why do humans attempt to prevent confusion between percepts?" and answer that it could be to "show with the voice", that is, to improve the pre-linguistic deictic function. In this paper, we present a computational Bayesian model incorporating the Dispersion and Quantal Theories of speech sounds inside the Vocalize-to-Localize framework, and show how realistic simulations of vowel systems can emerge from this model.
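
    The dispersion principle that the model builds on can be illustrated independently of the full Bayesian framework: vowel prototypes in a normalised formant space are moved so that the smallest pairwise distance between them grows. The following toy sketch only illustrates that principle; it is not the paper's Vocalize-to-Localize model.

        # Toy illustration of vowel dispersion: nudge prototypes apart in a
        # normalised (F1, F2) space so the minimum pairwise distance increases.
        import math
        import random

        def min_pairwise_distance(vowels):
            return min(math.dist(a, b) for i, a in enumerate(vowels) for b in vowels[i + 1:])

        def disperse(n_vowels=5, steps=2000, step_size=0.02, seed=0):
            rng = random.Random(seed)
            vowels = [(rng.random(), rng.random()) for _ in range(n_vowels)]
            for _ in range(steps):
                i = rng.randrange(n_vowels)
                candidate = list(vowels)
                # propose a small move inside the unit square (the "vowel space")
                candidate[i] = tuple(min(1.0, max(0.0, c + rng.uniform(-step_size, step_size)))
                                     for c in vowels[i])
                if min_pairwise_distance(candidate) > min_pairwise_distance(vowels):
                    vowels = candidate   # keep moves that increase dispersion
            return vowels

        print(disperse())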

    Towards Modelling Trust in Voice at Zero Acquaintance

    Trust is essential in many human relationships, especially where there is an element of inter-dependency. However, humans tend to make quick judgements about trusting other individuals, even those met at zero acquaintance. Past studies have shown the significance of voice in perceived trustworthiness, but research associating trustworthiness with different vocal features such as speech rate and fundamental frequency (f0) has yet to yield consistent results. Therefore, this paper proposes a method to 1) investigate the association between trustworthiness and different vocal features, 2) identify the vocal characteristics on which Malaysian ethnic groups base their judgement of trustworthiness, and 3) build a neural network model that predicts the degree of trustworthiness in a human voice. In the method proposed, a reliable set of audio clips will be obtained and analyzed with SoundGen to determine their acoustic characteristics. The audio clips will then be distributed to a large group of untrained respondents, who will rate their degree of trust in the speaker of each clip. The participants will be able to choose from 30 sets of audio clips, each consisting of 6 clips. The acoustic characteristics will be analyzed and compared with the ratings to determine whether there are any correlations between the acoustic characteristics and the trustworthiness ratings. After that, a neural network model will be built based on the collected data; this model will be able to predict the trustworthiness of a person's voice. Keywords: prosody, trust, voice, vocal cues, zero acquaintance
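
    The final step proposed above, a neural network that maps acoustic features to a trustworthiness score, could look roughly like the sketch below. The feature set, toy data and model size are placeholders, and the example assumes scikit-learn is available; the study's own features would come from the SoundGen analysis and listener ratings.

        # Placeholder regression model: acoustic features -> trustworthiness rating.
        import numpy as np
        from sklearn.neural_network import MLPRegressor

        # toy feature matrix: [mean_f0_hz, speech_rate_syllables_per_s]
        X = np.array([[110.0, 3.2], [180.0, 4.5], [210.0, 5.1], [130.0, 3.8]])
        y = np.array([3.1, 4.0, 4.4, 3.5])   # mean trustworthiness ratings (1-5 scale)

        model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
        model.fit(X, y)
        print(model.predict([[160.0, 4.0]]))  # predicted rating for an unseen voice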

    Explicit feedback from users attenuates memory biases in human-system dialogue

    In human–human dialogue, the way in which a piece of information is added to the partners’ common ground (i.e., presented and accepted) is an important determinant of subsequent dialogue memory. The aim of this study was to determine whether this is also the case in human-system dialogue. An experiment was conducted in which naïve participants and a simulated dialogue system took turns to present references to various landmarks featured on a list. The kind of feedback used to accept these references (verbatim repetition vs. implicit acceptance) was manipulated. The participants then performed a recognition test during which they attempted to identify the references mentioned previously. Self-presented references were recognised better than references presented by the system; however, this presentation bias was attenuated when the initial presentation of these references was followed by verbatim repetition. Implications for the design of automated dialogue systems are discussed.
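
    The recognition analysis implied above compares hit rates by who presented a reference (self vs. system) and by the feedback that accepted it (verbatim repetition vs. implicit acceptance). The sketch below shows one hypothetical way to tabulate such data; the trial layout is an assumption, not the study's materials.

        # Tabulate recognition hit rates per (presenter, feedback) condition.
        from collections import defaultdict

        # each trial: (presenter, feedback, recognised)
        trials = [
            ("self",   "verbatim", True),
            ("self",   "implicit", True),
            ("system", "verbatim", True),
            ("system", "implicit", False),
        ]

        totals, hits = defaultdict(int), defaultdict(int)
        for presenter, feedback, recognised in trials:
            totals[(presenter, feedback)] += 1
            hits[(presenter, feedback)] += recognised

        for key in sorted(totals):
            print(key, f"hit rate = {hits[key] / totals[key]:.2f}")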

    The impact of voice on trust attributions

    Trust and speech are both essential aspects of human interaction. On the one hand, trust is necessary for vocal communication to be meaningful. On the other hand, humans have developed a way to infer someone’s trustworthiness from their voice, as well as to signal their own. Yet, research on trustworthiness attributions to speakers is scarce and contradictory, and very often uses explicit measures, which do not predict actual trusting behaviour. However, measuring behaviour is very important to obtain an accurate representation of trust. This thesis contains 5 experiments aimed at examining the influence of various voice characteristics, including accent, prosody, emotional expression and naturalness, on trusting behaviours towards virtual players and robots. The experiments use the "investment game", a method derived from game theory that makes it possible to measure implicit trustworthiness attributions over time, as their main methodology. Results show that standard accents, high pitch, slow articulation rate and smiling voice generally increase trusting behaviours towards a virtual agent, and that a synthetic voice generally elicits higher trustworthiness judgments towards a robot. The findings also suggest that different voice characteristics influence trusting behaviours with different temporal dynamics. Furthermore, the actual behaviour of the various speaking agents was modified to be more or less trustworthy, and results show that people’s trusting behaviours develop over time accordingly. Also, people reinforce their trust towards speakers that they deem particularly trustworthy when these speakers are indeed trustworthy, but punish them when they are not. This suggests that people’s trusting behaviours might also be influenced by the congruency of their first impressions with the actual experience of the speaker’s trustworthiness (a "congruency effect"). This has important implications in the context of Human–Machine Interaction, for example for assessing users’ reactions to speaking machines that might not always function properly. Taken together, the results suggest that voice influences trusting behaviour, that first impressions of a speaker’s trustworthiness based on vocal cues might not be indicative of future trusting behaviours, and that trust should therefore be measured dynamically.
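
    The investment game mentioned above is a standard paradigm from behavioural game theory: the amount a participant sends to a (here, speaking) trustee is multiplied, the trustee returns some share, and the size of the investment over repeated rounds serves as the behavioural measure of trust. The sketch below illustrates one round with the commonly used multiplier of three; the exact parameters of the thesis' experiments are not reproduced here.

        # One round of a basic investment (trust) game.
        MULTIPLIER = 3   # invested units are tripled before reaching the trustee

        def play_round(investment, return_fraction):
            """Return the investor's net payoff for one round."""
            received_by_trustee = investment * MULTIPLIER
            returned = received_by_trustee * return_fraction
            return returned - investment

        # a trustworthy agent returns half of what it received; an untrustworthy one, nothing
        print(play_round(10, 0.5))   # 5: trust was rewarded
        print(play_round(10, 0.0))   # -10: trust was exploited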