1,697 research outputs found
PRESENCE: A human-inspired architecture for speech-based human-machine interaction
Recent years have seen steady improvements in the quality and performance of speech-based human-machine interaction driven by a significant convergence in the methods and techniques employed. However, the quantity of training data required to improve state-of-the-art systems seems to be growing exponentially and performance appears to be asymptotic to a level that may be inadequate for many real-world applications. This suggests that there may be a fundamental flaw in the underlying architecture of contemporary systems, as well as a failure to capitalize on the combinatorial properties of human spoken language. This paper addresses these issues and presents a novel architecture for speech-based human-machine interaction inspired by recent findings in the neurobiology of living systems. Called PRESENCE ("PREdictive SENsorimotor Control and Emulation"), this new architecture blurs the distinction between the core components of a traditional spoken language dialogue system and instead focuses on a recursive hierarchical feedback control structure. Cooperative and communicative behavior emerges as a by-product of an architecture that is founded on a model of interaction in which the system has in mind the needs and intentions of a user and a user has in mind the needs and intentions of the system.
Gender Ambiguity in Voice-Based Assistants: Gender Perception and Influences of Context
Recently emerging synthetic acoustically gender-ambiguous voices could contribute to dissolving the still prevailing genderism. Yet, are we indeed perceiving these voices as "unassignable"? Or are we trying to assimilate them into existing genders? To investigate the perceived ambiguity, we conducted an explorative 3 (male, female, ambiguous voice) × 3 (male, female, ambiguous topic) experiment. We found that, although participants perceived the gender-ambiguous voice as ambiguous, they used a notably wide range of the scale, indicating tendencies toward a gender. We uncovered a mild dissolve of gender roles. Neither the listener's gender nor the personal gender stereotypes impacted the perception. However, the perceived topic gender predicted the perceived voice gender, and younger people tended to perceive a more male-like gender.
Do (and say) as I say: Linguistic adaptation in human-computer dialogs
© Theodora Koulouri, Stanislao Lauria, and Robert D. Macredie. This article has been made available through the Brunel Open Access Publishing Fund. There is strong research evidence showing that people naturally align to each other's vocabulary, sentence structure, and acoustic features in dialog, yet little is known about how the alignment mechanism operates in the interaction between users and computer systems, let alone how it may be exploited to improve the efficiency of the interaction. This article provides an account of lexical alignment in human-computer dialogs, based on empirical data collected in a simulated human-computer interaction scenario. The results indicate that alignment is present, resulting in the gradual reduction and stabilization of the vocabulary-in-use, and that it is also reciprocal. Further, the results suggest that when system and user errors occur, the development of alignment is temporarily disrupted and users tend to introduce novel words to the dialog. The results also indicate that alignment in human-computer interaction may have a strong strategic component and is used as a resource to compensate for less optimal (visually impoverished) interaction conditions. Moreover, lower alignment is associated with less successful interaction, as measured by user perceptions. The article distills the results of the study into design recommendations for human-computer dialog systems and uses them to outline a model of dialog management that supports and exploits alignment through mechanisms for in-use adaptation of the system's grammar and lexicon.
Human-Machine Communication: Complete Volume 5. Gender and Human-Machine Communication
This is the complete volume of HMC Volume 5.
Model Based Teleoperation to Eliminate Feedback Delay NSF Grant BCS89-01352 - 3rd Report
We are conducting research in the area of teleoperation with feedback delay. Significant delays occur when performing space teleoperation from the earth as well as in subsea teleoperation, where the operator is typically on a surface vessel and communication is via acoustic links. These delays make teleoperation extremely difficult and lead to very low operator productivity. We have combined computer graphics with manipulator programming to provide a solution to the delay problem. A teleoperator master arm is interfaced to a graphical simulation of the remote environment. Synthetic fixtures are used to guide the operator's motions and to provide kinesthetic feedback. The operator's actions are monitored and used to generate symbolic motion commands for transmission to, and execution by, the remote slave robot. Much of a task proceeds error-free; when an error does occur, the slave system transmits data back to the master environment, where the operator can then experience the motion of the slave manipulator in actual task execution. We have also provided for the use of tools such as an impact wrench and a winch at the slave site. In all cases the tools are unencumbered by sensors; the slave uses a compliant instrumented wrist to monitor tool operation in terms of resulting motions and reaction forces.
Embodied Cognitive Science of Music. Modeling Experience and Behavior in Musical Contexts
Recently, the role of corporeal interaction has gained wide recognition within cognitive musicology. This thesis reviews evidence from different directions in music research supporting the importance of body-based processes for the understanding of music-related experience and behaviour. Stressing the synthetic focus of cognitive science, cognitive science of music is discussed as a modeling approach that takes these processes into account and may theoretically be embedded within the theory of dynamic systems. In particular, arguments are presented for the use of robotic devices as tools for the investigation of processes underlying human music-related capabilities (musical robotics).
Emergence of articulatory-acoustic systems from deictic interaction games in a "vocalize to localize" framework
Since the 1970s and Lindblom's proposal to "derive language from non-language", phoneticians have developed a number of "substance-based" theories. The starting point is Lindblom's Dispersion Theory and Stevens's Quantal Theory, which open the way to a rich tradition of works attempting to determine and possibly model how phonological systems could be shaped by the perceptuo-motor substance of speech communication. These works seek to derive the shapes of human languages from constraints arising from perceptual (auditory and perhaps visual) and motor (articulatory and cognitive) properties of the speech communication system: we call them "Morphogenesis Theories". More recently, a number of proposals were introduced in order to connect pre-linguistic primate abilities (such as vocalization, gestures, mastication or deixis) to human language. For instance, in the "Vocalize-to-Localize" framework that we adopt in the present work (Abry et al., 2004), human language is supposed to derive from a precursor deictic function, considering that language could have provided at the beginning an evolutionary development of the ability to "show with the voice". We call this type of theories "Origins Theories". We propose that the principles of Morphogenesis Theories (such as dispersion principles or the quantal nature of speech) can be incorporated and to a certain extent derived from Origins Theories. While Morphogenesis Theories raise questions such as "why are vowel systems shaped the way they are?" and answer that it is to increase auditory dispersion in order to prevent confusion between vowels, we ask questions such as "why do humans attempt to prevent confusion between percepts?" and answer that it could be to "show with the voice", that is, to improve the pre-linguistic deictic function.
In this paper, we present a computational Bayesian model incorporating the Dispersion and Quantal Theories of speech sounds inside the Vocalize-to-Localize framework, and show how realistic simulations of vowel systems can emerge from this model.
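The dispersion principle invoked above (vowel prototypes spreading out in perceptual space to stay distinct) can be illustrated with a toy simulation. This is a minimal sketch of dispersion alone, not the authors' Bayesian model; the function names, the normalised (F1, F2) unit square, and the hill-climbing loop are all illustrative assumptions.

```python
import itertools
import math
import random

def min_pairwise_distance(vowels):
    """Smallest perceptual distance between any two vowels in the system."""
    return min(math.dist(a, b) for a, b in itertools.combinations(vowels, 2))

def disperse_vowels(n, steps=2000, seed=0):
    """Place n vowel prototypes in a normalised (F1, F2) square so that the
    minimum pairwise distance is (approximately) maximised, via random
    hill climbing: perturb one vowel at a time, keep non-worsening moves."""
    rng = random.Random(seed)
    vowels = [(rng.random(), rng.random()) for _ in range(n)]
    best = min_pairwise_distance(vowels)
    for _ in range(steps):
        i = rng.randrange(n)
        old = vowels[i]
        # Propose a small perturbation, clipped to the unit square.
        vowels[i] = (min(1.0, max(0.0, old[0] + rng.gauss(0, 0.05))),
                     min(1.0, max(0.0, old[1] + rng.gauss(0, 0.05))))
        score = min_pairwise_distance(vowels)
        if score >= best:
            best = score
        else:
            vowels[i] = old  # reject moves that reduce dispersion
    return vowels

three_vowel_system = disperse_vowels(3)
```

With three vowels the dispersed system tends toward the corners of the space, loosely mirroring why /i/, /a/, /u/ recur across the world's three-vowel inventories.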
Towards Modelling Trust in Voice at Zero Acquaintance
Trust is essential in many human relationships, especially where there is an element of inter-dependency. However, humans tend to make quick judgements about trusting other individuals, even those met at zero acquaintance. Past studies have shown the significance of voice in perceived trustworthiness, but research associating trustworthiness and different vocal features such as speech rate and fundamental frequency (f0) has yet to yield consistent results. Therefore, this paper proposes a method to 1) investigate the association between trustworthiness and different vocal features, 2) identify the vocal characteristics that Malaysian ethnic groups base their judgement of trustworthiness on, and 3) build a neural network model that predicts the degree of trustworthiness in a human voice. In the proposed method, a reliable set of audio clips will be obtained and analyzed with SoundGen to determine their acoustic characteristics. The audio clips will then be distributed to a large group of untrained respondents, who will rate their degree of trust in the speaker of each clip. The participants will be able to choose from 30 sets of audio clips, each consisting of 6 clips. The acoustic characteristics will be analyzed and compared with the ratings to determine whether there are any correlations between the acoustic characteristics and the trustworthiness ratings. A neural network model will then be built on the collected data, able to predict the trustworthiness of a person's voice.
Keywords: prosody, trust, voice, vocal cues, zero acquaintance
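The last step of the proposed pipeline (acoustic measurements in, predicted trustworthiness rating out) can be sketched as a small regression network. Everything below is an illustrative assumption, not the study's actual model or data: the function names, the one-hidden-layer architecture, and the synthetic feature-rating pairs are invented for the sketch.

```python
import math
import random

def train_trust_model(features, ratings, hidden=4, epochs=500, lr=0.05, seed=0):
    """Fit a one-hidden-layer network mapping acoustic feature vectors
    (e.g. [normalised f0, normalised speech rate]) to a trustworthiness
    rating, trained with plain stochastic gradient descent."""
    rng = random.Random(seed)
    n_in = len(features[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(w1, b1)]
        return h, sum(w * hi for w, hi in zip(w2, h)) + b2

    for _ in range(epochs):
        for x, y in zip(features, ratings):
            h, y_hat = forward(x)
            err = y_hat - y  # gradient of 0.5 * squared error w.r.t. output
            for j in range(hidden):
                grad_h = err * w2[j] * (1.0 - h[j] ** 2)  # back through tanh
                for i in range(n_in):
                    w1[j][i] -= lr * grad_h * x[i]
                b1[j] -= lr * grad_h
                w2[j] -= lr * err * h[j]
            b2 -= lr * err
    return lambda x: forward(x)[1]

# Illustrative synthetic data: trust ratings rise with the first feature.
features = [[i / 10.0, j / 10.0] for i in range(11) for j in range(11)]
ratings = [0.2 + 0.5 * x[0] for x in features]
model = train_trust_model(features, ratings)
```

In the study itself, the feature vectors would come from SoundGen's acoustic analysis and the targets from the respondents' trust ratings; any off-the-shelf regressor would serve the same role as this hand-rolled network.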
Explicit feedback from users attenuates memory biases in human-system dialogue
In human-human dialogue, the way in which a piece of information is added to the partners' common ground (i.e., presented and accepted) constitutes an important determinant of subsequent dialogue memory. The aim of this study was to determine whether this is also the case in human-system dialogue. An experiment was conducted in which naïve participants and a simulated dialogue system took turns to present references to various landmarks featured on a list. The kind of feedback used to accept these references (verbatim repetition vs. implicit acceptance) was manipulated. The participants then performed a recognition test during which they attempted to identify the references mentioned previously. Self-presented references were recognised better than references presented by the system; however, this presentation bias was attenuated when the initial presentation of these references was followed by verbatim repetition. Implications for the design of automated dialogue systems are discussed.
The impact of voice on trust attributions
Trust and speech are both essential aspects of human interaction. On the one hand, trust is necessary for vocal communication to be meaningful. On the other hand, humans have developed ways to infer someone's trustworthiness from their voice, as well as to signal their own. Yet research on trustworthiness attributions to speakers is scarce and contradictory, and very often relies on explicit measures, which do not predict actual trusting behaviour. Measuring behaviour, however, is essential for an accurate picture of trust. This thesis contains 5 experiments examining the influence of various voice characteristics (including accent, prosody, emotional expression and naturalness) on trusting behaviours towards virtual players and robots. The experiments use the "investment game" (a method derived from game theory that makes it possible to measure implicit trustworthiness attributions over time) as their main methodology. Results show that standard accents, high pitch, slow articulation rate and smiling voice generally increase trusting behaviours towards a virtual agent, and that a synthetic voice generally elicits higher trustworthiness judgments towards a robot. The findings also suggest that different voice characteristics influence trusting behaviours with different temporal dynamics. Furthermore, the actual behaviour of the various speaking agents was manipulated to be more or less trustworthy, and results show that people's trusting behaviours develop over time accordingly. People also reinforce their trust towards speakers they deem particularly trustworthy when these speakers are indeed trustworthy, but punish them when they are not. This suggests that people's trusting behaviours may also be influenced by the congruency of their first impressions with the actual experience of the speaker's trustworthiness (a "congruency effect"). This has important implications for Human-Machine Interaction, for example for assessing users' reactions to speaking machines that might not always function properly. Taken together, the results suggest that voice influences trusting behaviour, that first impressions of a speaker's trustworthiness based on vocal cues may not predict future trusting behaviour, and that trust should therefore be measured dynamically.
- …