2,126 research outputs found

    Acoustic and visual adaptations in speech produced to counter adverse listening conditions

    This study investigated whether communication modality affects talkers’ speech adaptation to an interlocutor exposed to background noise. It was predicted that adaptations to lip gestures would be greater, and acoustic ones reduced, when communicating face-to-face. We video recorded 14 Australian-English talkers (Talker A) speaking in a face-to-face or auditory-only setting with interlocutors who were either in quiet or in noise. Focusing on keyword productions, acoustic-phonetic adaptations were examined via measures of vowel intensity, pitch, keyword duration, vowel F1/F2 space and VOT, and visual adaptations via measures of vowel interlip area. The interlocutor’s adverse listening conditions led Talker A to reduce speech rate, increase pitch and expand vowel space. These adaptations were not significantly reduced in the face-to-face setting, although there was a trend towards a smaller degree of vowel space expansion than in the auditory-only setting. Visible lip gestures were more enhanced overall in the face-to-face setting, but also increased in the auditory-only setting when countering the effects of noise. This study therefore showed only small effects of communication modality on speech adaptations.

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns in response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.

    Individual and environment-related acoustic-phonetic strategies for communicating in adverse conditions

    In many situations it is necessary to produce speech in ‘adverse conditions’: that is, conditions that make speech communication difficult. Research has demonstrated that speaker strategies, as described by a range of acoustic-phonetic measures, can vary both at the individual level and according to the environment, and are argued to facilitate communication. There has been debate as to the environmental specificity of these adaptations, and their effectiveness in overcoming communication difficulty. Furthermore, the manner and extent to which adaptation strategies differ between individuals is not yet well understood. This thesis presents three studies that explore the acoustic-phonetic adaptations of speakers in noisy and degraded communication conditions and their relationship with intelligibility. Study 1 investigated the effects of temporally fluctuating maskers on global acoustic-phonetic measures associated with speech in noise (Lombard speech). The results replicated findings of increased power in the modulation spectrum in Lombard speech, but showed little evidence of adaptation to masker fluctuations via the temporal envelope. Study 2 collected a larger corpus of semi-spontaneous communicative speech in noise and in other degradations perturbing specific acoustic dimensions. Speakers showed different adaptations across the environments, likely suited to overcoming noise (steady and temporally fluctuating), spectral and pitch information restricted by a noise-excited vocoder, and a simulated sensorineural hearing loss. Analyses of inter-speaker variation in both Studies 1 and 2 showed that behaviour was highly variable, although some strategy combinations were identified. 
Study 3 investigated the intelligibility of strategies ‘tailored’ to specific environments and the relationship between intelligibility and speaker acoustics, finding a benefit of tailored speech adaptations and discussing the potential roles of speaker flexibility, adaptation level, and intrinsic intelligibility. The overall results are discussed in relation to models of communication in adverse conditions, and a model accounting for individual variability in these conditions is proposed.
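
Study 1's modulation-spectrum analysis can be illustrated with a minimal sketch: extract a crude amplitude envelope from a speech signal, then take the spectrum of that envelope. This is a generic simplification under stated assumptions (rectify-and-smooth envelope extraction, a plain FFT), not the thesis's actual pipeline; the function name and parameters are invented for illustration.

```python
import numpy as np

def modulation_spectrum(signal, fs, env_cutoff=32.0):
    """Crude modulation spectrum: full-wave rectify, smooth with a
    moving average to get an amplitude envelope, remove the DC
    component, then take the magnitude spectrum of the envelope.
    A simplified illustration, not the study's actual analysis."""
    env = np.abs(signal)                       # full-wave rectification
    win = max(1, int(fs / env_cutoff))         # smoothing window length
    env = np.convolve(env, np.ones(win) / win, mode="same")
    env = env - np.mean(env)                   # remove DC offset
    spec = np.abs(np.fft.rfft(env))
    freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
    return freqs, spec
```

For a tone amplitude-modulated at 4 Hz, the largest low-frequency peak of the returned spectrum falls at the 4 Hz modulation rate, which is the kind of low-rate envelope energy that the Lombard literature reports as enhanced.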

    Investigating Clear Speech Adaptations in Spontaneous Speech Produced in Communicative Settings

    In order to investigate the clear speech adaptations that individuals make when communicating in intelligibility-challenging conditions, it would seem essential to examine speech that is produced in interaction with a conversational partner. However, much of the literature on clear speech adaptations has been based on the analysis of sentences that talkers were instructed to read clearly. In this chapter, we review methods for eliciting spontaneous speech in interaction for the purpose of investigating clear speech phenomena. We describe in more detail the Diapix task (Van Engen et al., 2010) and DiapixUK picture pairs (Baker & Hazan, 2011), which have been used in the production of large corpora investigating clear speech adaptations. We present an overview of the analysis of spontaneous speech and clear speech adaptations from the LUCID corpora, which include spontaneous speech recordings from children, young adults and older adults.

    The impact of the Lombard effect on audio and visual speech recognition systems

    When producing speech in noisy backgrounds, talkers reflexively adapt their speaking style in ways that increase speech-in-noise intelligibility. This adaptation, known as the Lombard effect, is likely to have an adverse effect on the performance of automatic speech recognition systems that have not been designed to anticipate it. However, previous studies of this impact have used very small amounts of data and recognition systems that lack modern adaptation strategies. This paper aims to rectify this by using a new audio-visual Lombard corpus containing speech from 54 different speakers – significantly larger than any previously available – and modern state-of-the-art speech recognition techniques. The paper is organised as three speech-in-noise recognition studies. The first examines the case in which a system is presented with Lombard speech having been exclusively trained on normal speech. It was found that the Lombard mismatch caused a significant decrease in performance, even if the level of the Lombard speech was normalised to match the level of normal speech. However, the size of the mismatch was highly speaker-dependent, thus explaining conflicting results presented in previous, smaller studies. The second study compares systems trained in matched conditions (i.e., training and testing with the same speaking style). Here the Lombard speech affords a large increase in recognition performance. Part of this is due to the greater energy leading to a reduction in noise masking, but performance improvements persist even after the effect of the signal-to-noise level difference is compensated. An analysis across speakers shows that the Lombard speech energy is spectro-temporally distributed in a way that reduces energetic masking, and this reduction in masking is associated with an increase in recognition performance. The final study repeats the first two using a recognition system trained on visual speech. 
In the visual domain, performance differences are not confounded by differences in noise masking. It was found that in matched conditions, Lombard speech supports better recognition performance than normal speech. The benefit was consistently present across all speakers, but to a varying degree. Surprisingly, the Lombard benefit was observed to a small degree even when training on mismatched non-Lombard visual speech, i.e., the increased clarity of the Lombard speech outweighed the impact of the mismatch. The paper presents two generally applicable conclusions: i) systems that are designed to operate in noise will benefit from being trained on well-matched Lombard speech data; ii) the results of speech recognition evaluations that employ artificial speech and noise mixing need to be treated with caution: they are overly optimistic to the extent that they ignore a significant source of mismatch, but at the same time overly pessimistic in that they do not anticipate the potential increased intelligibility of the Lombard speaking style.
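
The level normalisation and artificial speech-and-noise mixing referred to above can be sketched as follows. These helpers are hypothetical illustrations of the general techniques (RMS level matching and mixing at a target SNR), not the paper's actual code; the function names are invented.

```python
import numpy as np

def rms(x):
    """Root-mean-square level of a signal."""
    return np.sqrt(np.mean(np.square(x)))

def match_level(lombard, reference):
    """Scale a Lombard utterance so its RMS level matches a plain-speech
    reference, removing the overall energy difference before mixing
    with noise. Hypothetical helper, not from the paper."""
    return lombard * (rms(reference) / rms(lombard))

def mix_at_snr(speech, noise, snr_db):
    """Mix speech and noise at a target SNR in dB by rescaling the
    noise relative to the speech level."""
    noise = noise[: len(speech)]
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    return speech + noise * (target_noise_rms / rms(noise))
```

Level matching of this kind isolates the spectro-temporal redistribution of Lombard energy from its simple level increase, which is the distinction the paper's masking analysis relies on.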

    Children's acoustic and linguistic adaptations to peers with hearing impairment

    Purpose: This study aims to examine the clear speaking strategies used by older children when interacting with a peer with hearing loss, focusing on both acoustic and linguistic adaptations in speech. Method: The Grid task, a problem-solving task developed to elicit spontaneous interactive speech, was used to obtain a range of global acoustic and linguistic measures. Eighteen 9- to 14-year-old children with normal hearing (NH) performed the task in pairs, once with a friend with NH, and once with a friend with a hearing impairment (HI). Results: In HI-directed speech, children increased their fundamental frequency range and mid-frequency intensity, decreased the number of words per phrase, and expanded their vowel space area by increasing F1 and F2 range, relative to NH-directed speech. However, participants did not appear to change their articulation rate, the lexical frequency of content words, or lexical diversity when talking to their friend with HI compared to their friend with NH. Conclusions: Older children show evidence of listener-oriented adaptations to their speech production; although their speech production systems are still developing, they are able to make speech adaptations to meet the needs of a peer with HI, even without being given specific instruction to do so.

    Visual Speech Enhancement and its Application in Speech Perception Training

    This thesis investigates methods for visual speech enhancement to support auditory and audiovisual speech perception. Normal-hearing non-native listeners receiving cochlear implant (CI) simulated speech are used as ‘proxy’ listeners for CI users, a proposed user group who could benefit from such enhancement methods in speech perception training. Both CI users and non-native listeners share similarities with regard to audiovisual speech perception, including increased sensitivity to visual speech cues. Two enhancement methods are proposed: (i) an appearance-based method, which modifies the appearance of a talker’s lips using colour and luminance blending to apply a ‘lipstick effect’ that increases the saliency of mouth shapes; and (ii) a kinematics-based method, which amplifies the kinematics of the talker’s mouth to create the effect of more pronounced speech (an ‘exaggeration effect’). The application used to test the enhancements is speech perception training, or audiovisual training, which can be used to improve listening skills. An audiovisual training framework is presented which structures the evaluation of the effectiveness of these methods. It is used in two studies. The first study, which evaluates the effectiveness of the lipstick effect, found a significant improvement in audiovisual and auditory perception. The second study, which evaluates the effectiveness of the exaggeration effect, found improvement in the audiovisual perception of a number of phoneme classes; no evidence was found of improvements in subsequent auditory perception, as audiovisual recalibration to visually exaggerated speech may have impeded learning when used in audiovisual training. The thesis also investigates an example of kinematics-based enhancement observed in Lombard speech, by studying the behaviour of visual Lombard phonemes in different contexts. Due to the lack of suitable datasets for this analysis, the thesis presents a novel audiovisual Lombard speech dataset recorded at high SNR, which offers two fixed head-pose, synchronised views of each talker.
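
A ‘lipstick effect’ of the kind described above can be approximated with a simple alpha blend over a lip-region mask. This is a generic sketch under the assumption that the enhancement amounts to blending a saturated colour into the lip pixels; the thesis's actual colour and luminance blending may differ, and `lipstick_effect` with its default colour and opacity is invented for illustration.

```python
import numpy as np

def lipstick_effect(frame, lip_mask, lip_colour=(150, 40, 60), alpha=0.4):
    """Alpha-blend a saturated colour into the lip region of a video
    frame to raise the saliency of mouth shapes. `frame` is an HxWx3
    uint8 RGB image, `lip_mask` an HxW boolean mask of lip pixels.
    Generic sketch, not the thesis's actual blending method."""
    out = frame.astype(np.float32)
    colour = np.array(lip_colour, dtype=np.float32)
    # Blend only inside the mask; pixels outside are left untouched.
    out[lip_mask] = (1.0 - alpha) * out[lip_mask] + alpha * colour
    return np.clip(out, 0, 255).astype(np.uint8)
```

In practice the mask would come from a lip-tracking step on each video frame; here it is simply assumed as an input.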

    The influence of channel and source degradations on intelligibility and physiological measurements of effort

    Despite the fact that everyday listening is compromised by acoustic degradations, individuals show a remarkable ability to understand degraded speech. However, recent trends in speech perception research emphasise the cognitive load imposed by degraded speech on both normal-hearing and hearing-impaired listeners. The perception of degraded speech is often studied through channel degradations such as background noise. However, source degradations determined by talkers’ acoustic-phonetic characteristics have been studied to a lesser extent, especially in the context of listening effort models. Similarly, little attention has been given to speaking effort, i.e., the effort experienced by talkers when producing speech under channel degradations. This thesis aims to provide a holistic understanding of communication effort, i.e., taking into account both listener and talker factors. Three pupillometry studies are presented. In the first study, speech was recorded from 16 Southern British English speakers and presented to normal-hearing listeners in quiet and in combination with three degradations: noise-vocoding, masking and time-compression. Results showed that acoustic-phonetic talker characteristics predicted the intelligibility of degraded speech, but not listening effort, as likely indexed by pupil dilation. In the second study, older hearing-impaired listeners were presented with fast time-compressed speech under simulated room acoustics. Intelligibility was kept at high levels. Results showed that both fast speech and reverberant speech were associated with higher listening effort, as suggested by pupillometry. Discrepancies between pupillometry and perceived effort ratings suggest that both methods should be employed in speech perception research to pinpoint processing effort. 
While findings from the first two studies support models of degraded speech perception, emphasising the relevance of source degradations, they also have methodological implications for pupillometry paradigms. In the third study, pupillometry was combined with a speech production task, aiming to establish an equivalent to listening effort for talkers: speaking effort. Normal-hearing participants were asked to read and produce speech in quiet or in the presence of different types of masking: stationary and modulated speech-shaped noise, and competing-talker masking. Results indicated that while talkers acoustically enhance their speech more under stationary masking, the larger pupil dilation associated with competing-talker masking reflected higher speaking effort. Results from all three studies are discussed in conjunction with models of degraded speech perception and production. Listening effort models are revisited to incorporate pupillometry results from speech production paradigms. Given the new approach of investigating source factors using pupillometry, methodological issues are discussed as well. The main insight provided by this thesis, i.e., the feasibility of applying pupillometry to situations involving both listener and talker factors, is suggested as a guide for future research employing naturalistic conversations.
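
Pupillometry analyses of the kind described above typically express dilation relative to a pre-stimulus baseline. The sketch below shows subtractive baseline correction, a common preprocessing step in the pupillometry literature; it is an assumption about the general method, not this thesis's specific analysis, and the function name is invented.

```python
import numpy as np

def baseline_corrected_dilation(trace, fs, baseline_s=1.0):
    """Subtractive baseline correction of a pupil-size trace: the mean
    pupil size over the pre-stimulus window is subtracted from the
    whole trial, so dilation is expressed relative to baseline.
    `trace` is pupil size per sample, `fs` the sampling rate in Hz."""
    n = int(baseline_s * fs)
    baseline = np.mean(trace[:n])
    return trace - baseline
```

Divisive (percent-change) correction is an alternative convention; which one is appropriate depends on the recording setup and the comparisons being made.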

    Vowel space area in later childhood and adolescence: Effects of age, sex and ease of communication

    This study investigated vowel space area (VSA) development in childhood and adolescence and its impact on the ability to hyperarticulate vowels. In experiment 1, 96 participants aged 9-14 years carried out an interactive task when communication was easy (no barrier, 'NB') and difficult (the speech of one participant was filtered through a vocoder, 'VOC'). Previous recordings from 20 adults were used as a reference. Measures of VSA (ERB²), F1 and F2 ranges (ERB) and articulation rate were obtained. Children's VSA were significantly larger than adults'. From the age of 11, vowel hyperarticulation was evident in VOC, but only because VSA were gradually reducing with age in NB. The results suggest that whilst large VSA do not prevent children from hyperarticulating vowels, the manner in which this is achieved may not be adult-like. Experiment 2 was conducted to verify that large VSA were not a by-product of children being unable to see each other. Thirteen participants carried out the same task face-to-face with their interlocutor. Comparisons with matched participants from experiment 1 showed no differences in VSA, indicating that the audio-only modality did not influence the results. Possible reasons for larger VSA in the spontaneous speech of children and adolescents are discussed.
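
The study reports VSA in ERB². One common way to compute such an area (assumed here for illustration; the study's exact procedure may differ) is the shoelace formula over mean vowel positions in F1/F2 space:

```python
def vowel_space_area(points):
    """Area of the polygon spanned by vowel means in (F1, F2) space,
    computed with the shoelace formula. `points` are (F1, F2) pairs
    ordered around the polygon, e.g. corner vowels in ERB units,
    which yields an area in ERB^2."""
    area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap back to the first vertex
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0
```

With formant values converted to ERB before this step, a larger returned value corresponds to a more expanded (hyperarticulated) vowel space.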