
    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised, at least for some listeners, by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns in response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work on improving the robustness of speech output.

    Clear Speech strategies and speech perception in adverse listening conditions

    The study investigated the impact of different types of clear speech on speech perception in an adverse listening condition. Tokens were extracted from spontaneous speech dialogues in which participants completed a problem-solving task either in good listening conditions or while experiencing a one-sided ‘communication barrier’: a real-time vocoder or multibabble noise. These two adverse conditions induced the ‘unimpaired’ participant to produce clear speech. When tokens from these three conditions were presented in multibabble noise, listeners were quicker at processing clear tokens produced to counter the effects of multibabble noise than clear tokens produced to counteract the vocoder, or tokens produced in good communicative conditions. A clarity rating experiment using the same tokens presented in quiet showed that listeners do not distinguish between different types of clear speech. Together, these results suggest that clear speaking styles produced in different communicative conditions have acoustic-phonetic characteristics adapted to the needs of the listener, even though they may be perceived as being of similar clarity.

    Effect of being seen on the production of visible speech cues. A pilot study on Lombard speech

    Speech produced in noise (or Lombard speech) is characterized by increased vocal effort, but also by amplified lip gestures. The current study examines whether this enhancement of visible speech cues may be sought by the speaker, even unconsciously, in order to improve his visual intelligibility. One subject played an interactive game in a quiet situation and then in 85 dB of cocktail-party noise, under three conditions of interaction: without interaction, in face-to-face interaction, and in a situation of audio-only interaction. The audio signal was recorded simultaneously with articulatory movements, using 3D electromagnetic articulography. The results showed that acoustic modifications of speech in noise were greater when the interlocutor could not see the speaker. Furthermore, tongue movements, which are hardly visible, were not particularly amplified in noise. Lip movements, which are highly visible, were not more enhanced in noise when the interlocutors could see each other; in fact, they were more enhanced in the audio-only interaction condition. These results support the idea that this speaker did not make use of the visual channel to improve his intelligibility, and that his hyperarticulation was simply an indirect correlate of increased vocal effort.

    Just A Little Respect: Authority And Competency In Women’s Speech

    Young women have conflicting motivations directing how they use pitch, vocal fry, and uptalk intonation. High pitch and uptalk may emphasize their femininity, but low pitch and vocal fry are associated with better leadership. Thus, it is difficult to predict how young women will speak in a particular situation. This thesis measures how 16 young women used pitch, vocal fry, and uptalk in three different speech styles collected through videoconferencing calls. Surveys determined how the changes in speech affected listeners’ judgments of the speaker. The lowest average pitch occurred in interview-style speech, and the largest pitch range in casual-style speech. The young women used more uptalk in interview-style speech than in presentation or casual speech. The highest amount of fry occurred in presentation-style speech. Male participants were more likely than female participants to judge a speaker using uptalk as less competent.

    The effect of age and hearing loss on partner-directed gaze in a communicative task

    The study examined the partner-directed gaze patterns of old and young talkers in a task (DiapixUK) that involved two people (a lead talker and a follower) engaging in a spontaneous dialogue. The aim was (1) to determine whether older adults engage less in partner-directed gaze than younger adults, by measuring mean gaze frequency and mean total gaze duration; and (2) to examine the effect that mild hearing loss may have on older adults’ partner-directed gaze. These were tested in various communication conditions: a no-barrier condition; a BAB2 condition, in which the lead talker and the follower spoke and heard each other in multitalker babble noise; and two barrier conditions, in which the lead talker could hear the follower clearly but the follower could not hear the lead talker very clearly (i.e., the lead talker’s voice was degraded by babble (BAB1) or by a hearing loss simulation (HLS)). 57 single-sex pairs (19 older adults with mild hearing loss, 17 older adults with normal hearing, and 21 younger adults) participated in the study. We found that older adults with normal hearing produced fewer partner-directed gazes (and gazed less overall) than either the older adults with hearing loss or the younger adults in the BAB1 and HLS conditions. We propose that this may be due to a decline in older adults’ attention to cues signaling how well a conversation is progressing. Older adults with hearing loss, however, may attend more to visual cues because they give greater weighting to these for understanding speech.

    Before they can teach they must talk: on some aspects of human-computer interaction


    Ethical Challenges in Data-Driven Dialogue Systems

    The use of dialogue systems as a medium for human-machine interaction is an increasingly prevalent paradigm. A growing number of dialogue systems use conversation strategies that are learned from large datasets. There are well-documented instances where interactions with these systems have resulted in biased or even offensive conversations due to the data-driven training process. Here, we highlight potential ethical issues that arise in dialogue systems research, including: implicit biases in data-driven systems, the rise of adversarial examples, potential sources of privacy violations, safety concerns, special considerations for reinforcement learning systems, and reproducibility concerns. We also suggest areas stemming from these issues that deserve further investigation. Through this initial survey, we hope to spur research leading to robust, safe, and ethically sound dialogue systems. Comment: In submission to the AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society.

    Language acquisition in a post-pandemic context: the impact of measures against COVID-19 on early language development

    Language acquisition is influenced by the quality and quantity of input that language learners receive. In particular, early language development has been said to rely on the acoustic speech stream, as well as on language-related visual information, such as the cues provided by interlocutors’ mouths. Furthermore, children’s expressive language skills are also influenced by the variability of the interlocutors that provide the input. The COVID-19 pandemic has offered an unprecedented opportunity to explore the way these input factors affect language development. On the one hand, the pervasive use of masks diminishes the quality of speech while also reducing visual cues to language. On the other hand, lockdowns and restrictions on social gatherings have considerably limited interlocutor variability in children’s input. The present study aims to analyze the effects of the pandemic measures against COVID-19 on early language development. To this end, 41 children born in 2019 and 2020 were compared with 41 children born before 2012, using the Catalan adaptation of the MacArthur-Bates Communicative Development Inventories (MB-CDIs). Results do not show significant differences in vocabulary between pre- and post-Covid children, although there is a tendency for children with lower vocabulary levels to be in the post-Covid group. Furthermore, a relationship was found between interlocutor variability and participants’ vocabulary, indicating that participants with fewer opportunities for socio-communicative diversity showed lower expressive vocabulary scores. These results reinforce other recent findings regarding input factors and their impact on early language learning.