408 research outputs found

Cortical tracking of unheard formant modulations derived from silently presented lip movements and its decline with age

Author: Hauswald Anne
Keitel Anne
Reisinger Patrick
Rösch Sebastian
Suess Nina
Weisz Nathan
Publication venue: BioRxiv
Publication date: 11/05/2021

University of Dundee Online Publications

The listening talker: A review of human and algorithmic context-induced modifications of speech

Author: Martin Cooke
Maëva Garnier
Simon King
Vincent Aubanel
Publication venue: Elsevier BV
Publication date: 01/01/2014

Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.

Crossref

Hal - Université Grenoble Alpes

Edinburgh Research Explorer

Western Sydney ResearchDirect

MEG, PSYCHOPHYSICAL AND COMPUTATIONAL STUDIES OF LOUDNESS, TIMBRE, AND AUDIOVISUAL INTEGRATION

Author: Jenkins III Julian
Publication date: 01/01/2011

Natural scenes and ecological signals are inherently complex and understanding of their perception and processing is incomplete. For example, a speech signal contains not only information at various frequencies, but is also not static; the signal is concurrently modulated temporally. In addition, an auditory signal may be paired with additional sensory information, as in the case of audiovisual speech. In order to make sense of the signal, a human observer must process the information provided by low-level sensory systems and integrate it across sensory modalities and with cognitive information (e.g., object identification information, phonetic information). The observer must then create functional relationships between the signals encountered to form a coherent percept. The neuronal and cognitive mechanisms underlying this integration can be quantified in several ways: by taking physiological measurements, assessing behavioral output for a given task and modeling signal relationships. While ecological tokens are complex in a way that exceeds our current understanding, progress can be made by utilizing synthetic signals that encompass specific essential features of ecological signals. The experiments presented here cover five aspects of complex signal processing using approximations of ecological signals: (i) auditory integration of complex tones comprised of different frequencies and component power levels; (ii) audiovisual integration approximating that of human speech; (iii) behavioral measurement of signal discrimination; (iv) signal classification via simple computational analyses and (v) neuronal processing of synthesized auditory signals approximating speech tokens. To investigate neuronal processing, magnetoencephalography (MEG) is employed to assess cortical processing non-invasively. Behavioral measures are employed to evaluate observer acuity in signal discrimination and to test the limits of perceptual resolution. Computational methods are used to examine the relationships in perceptual space and physiological processing between synthetic auditory signals, using features of the signals themselves as well as biologically-motivated models of auditory representation. Together, the various methodologies and experimental paradigms advance the understanding of ecological signal analytics concerning the complex interactions in ecological signal structure.

Digital Repository at the University of Maryland

Models and Analysis of Vocal Emissions for Biomedical Applications

Publication venue: Firenze University Press
Publication date: 31/05/2022

The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) was established in 1999 out of a strongly felt need to share know-how, objectives and results between areas that until then seemed quite distinct, such as bioengineering, medicine and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the neonate to the adult and elderly. Over the years the initial topics have grown and spread to other areas of research, such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years, always in Firenze, Italy.

Directory of Open Access Books (DOAB)

Effects of spatial separation on across-frequency grouping in narrowband speech

Author: Cepeda Miguel D.
Publication venue: Boston University
Publication date: 01/01/2013

Thesis (M.S.)--Boston University. Understanding how we perceive speech in the face of competing sound sources coming from a variety of directions is an important goal in psychoacoustics. In everyday situations, noisy interference can obscure the content of a conversation and require listeners to integrate speech information across different frequency regions. Two studies are described that investigate the effects of spatial separation on the grouping of two spectrally separated, narrow bands of target speech with a variety of filler stimuli centered in between these bands. Target sentences taken from the IEEE corpus were broken into two 3/4-octave bands, with the lowest centered around 370 Hz and the highest centered around 6 kHz. The first study explored the spatial influences of spectral restoration. The primary experiment measured speech intelligibility of the speech bands (presented diotically) with a single band of noise between 700 Hz and 3 kHz used as the filler and then with the same noise band modulated by the target speech envelope as the filler. These fillers were presented diotically as well as with an ITD of 600 µs leading to the left ear. Performance was worse for the unmodulated noise condition when the filler was separated spatially from the speech bands. Across-frequency grouping was not observed with the modulated noise conditions. The second study explored the effect of attention on intelligibility of speech bands presented from the left with related fillers. The filler objects used in this study were dual bands of vocoded or narrowband speech presented either from left or right. The fillers were derived from either the same target speech token (matched) or an independent sentence (conflicting). In a key experimental block, listeners were instructed to attend to the target speech on the left while either conflicting bands or, infrequently, matched bands were presented on the right. The infrequently presented matching trials were physically identical to trials in another block where listeners were instructed to attend to both ears. Results showed that splitting the target and filler across the ears degraded intelligibility; however, directed spatial attention had no effect on performance. These results demonstrate that speech elements group together strongly, overcoming spatial attention, even for degraded speech.
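
As a rough illustration of the band layout described in this abstract, the sketch below computes the edges of a 3/4-octave band from its centre frequency. It assumes each band is geometrically centred, so that the centre frequency is the geometric mean of the two edges; the abstract does not state this, and the helper name `band_edges` is hypothetical.

```python
def band_edges(center_hz, width_octaves):
    """Return (low, high) edges in Hz of a band geometrically centred on center_hz.

    Assumption: the centre frequency is the geometric mean of the edges,
    so each edge lies half the bandwidth (in octaves) away from the centre.
    """
    half = width_octaves / 2.0
    return center_hz * 2.0 ** (-half), center_hz * 2.0 ** half


# Centre frequencies taken from the abstract (approximate, "around" values).
low_band = band_edges(370.0, 0.75)
high_band = band_edges(6000.0, 0.75)

print("low band edges (Hz):", low_band)
print("high band edges (Hz):", high_band)
```

Under this assumption the low band spans roughly 285 to 480 Hz and the high band roughly 4.6 to 7.8 kHz, which leaves the 700 Hz to 3 kHz filler noise between them.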

Boston University Institutional Repository (OpenBU)

On The Way To Linguistic Representation: Neuromagnetic Evidence of Early Auditory Abstraction in the Perception of Speech and Pitch

Author: Monahan Philip Joseph
Publication date: 01/01/2009

The goal of this dissertation is to show that even at the earliest (non-invasive) recordable stages of auditory cortical processing, we find evidence that cortex is calculating abstract representations from the acoustic signal. Looking across two distinct domains (inferential pitch perception and vowel normalization), I present evidence demonstrating that the M100, an automatic evoked neuromagnetic component that localizes to primary auditory cortex, is sensitive to abstract computations. The M100 typically responds to physical properties of the stimulus in auditory and speech perception and integrates only over the first 25 to 40 ms of stimulus onset, providing a reliable dependent measure that allows us to tap into early stages of auditory cortical processing. In Chapter 2, I briefly present the episodicist position on speech perception and discuss research indicating that the strongest episodicist position is untenable. I then review findings from the mismatch negativity literature, where proposals have been made that the MMN allows access into linguistic representations supported by auditory cortex. Finally, I conclude the chapter with a discussion of the previous findings on the M100/N1. In Chapter 3, I present neuromagnetic data showing that the response properties of the M100 are sensitive to the missing fundamental component using well-controlled stimuli. These findings suggest that listeners are reconstructing the inferred pitch by 100 ms after stimulus onset. In Chapter 4, I propose a novel formant ratio algorithm in which the third formant (F3) is the normalizing factor. The goal of formant ratio proposals is to provide an explicit algorithm that successfully "eliminates" speaker-dependent acoustic variation of auditory vowel tokens. Results from two MEG experiments suggest that auditory cortex is sensitive to formant ratios and that the perceptual system shows heightened sensitivity to tokens located in more densely populated regions of the vowel space. In Chapter 5, I report MEG results that suggest early auditory cortical processing is sensitive to violations of a phonological constraint on sound sequencing, suggesting that listeners make highly specific, knowledge-based predictions about rather abstract anticipated properties of the upcoming speech signal and that violations of these predictions are evident in early cortical processing.

Digital Repository at the University of Maryland

Models and Analysis of Vocal Emissions for Biomedical Applications

Publication venue: Firenze University Press
Publication date: 31/05/2022

The proceedings of the MAVEBA Workshop, which is held on a biennial basis, collect the scientific papers presented as both oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, and biomedical engineering methods for the analysis of voice signals and images in support of clinical diagnosis and classification of vocal pathologies.

Directory of Open Access Books (DOAB)

Analysis by synthesis of engine sounds for the design of dynamic auditory feedback of electric vehicles

Author: Aramaki Mitsuko
Denjean Sébastien
Dupré Théophile
Kronland-Martinet Richard
Publication venue: EDP Sciences
Publication date: 01/01/2023

In traditional combustion engine vehicles, the sound of the engine plays an important role in enhancing the driver’s experience of the vehicle’s dynamics, and contributes to both comfort and safety. However, with the development of quieter electric vehicles, drivers no longer receive this important auditory feedback, and this can lead to a less satisfying acoustic environment in the vehicle cabin. To address this issue, sonification strategies have been developed for electric vehicles to provide similar auditory feedback to the driver, but feedback from users has suggested that the sounds produced by these strategies do not blend seamlessly with the other sounds in the vehicle cabin. This study focuses on identifying the key acoustic parameters that create a sense of cohesion between the synthetic sounds and the vehicle’s natural soundscape, based on the characteristics of traditional combustion engine vehicles. Through analyzing the time and frequency of the noises produced by combustion engine vehicles, the presence of micro-modulations in both frequency and amplitude was identified, as well as resonances caused by the transfer of sound between the engine and the cabin. These parameters were incorporated into a synthesis model for the sonification of electric vehicle dynamics, based on the Shepard-Risset illusion. A perceptual test was conducted, and the results showed that the inclusion of resonances in the synthesized sounds significantly enhanced their naturalness, while micro-modulations had no significant impact.

Directory of Open Access Journals

Representation of speech in the primary auditory cortex and its implications for robust speech processing

Author: Mesgarani Nima
Publication date: 05/08/2008

Speech has evolved as a primary form of communication between humans. This most used means of communication has been the subject of intense study for years, but there is still a lot that we do not know about it. It is an oft-repeated fact that even the performance of the best speech processing algorithms still lags far behind that of the average human. It seems inescapable that unless we know more about the way the brain performs this task, our machines cannot go much further. This thesis focuses on the question of speech representation in the brain, both from a physiological and technological perspective. We explore the representation of speech through the encoding of its smallest elements - phonemic features - in the primary auditory cortex. We report on how populations of neurons with diverse tuning properties respond discriminately to phonemes, resulting in explicit encoding of their parameters. Next, we show that this sparse encoding of the phonemic features is a simple consequence of the linear spectro-temporal properties of the auditory cortical neurons and that a spectro-temporal receptive field model can predict similar patterns of activation. This is an important step toward the realization of systems that operate based on the same principles as the cortex. Using an inverse method of reconstruction, we shall also explore the extent to which phonemic features are preserved in the cortical representation of noisy speech. The results suggest that the cortical responses are more robust to noise and that the important features of phonemes are preserved in the cortical representation even in noise. Finally, we explain how a model of this cortical representation can be used for speech processing and enhancement applications to improve their robustness and performance.

Digital Repository at the University of Maryland

Formant-frequency variation and informational masking of speech by extraneous formants: evidence against dynamic and speech-specific acoustical constraints

Author: Bailey Peter J.
Roberts Brian
Summers Robert J.
Publication venue: American Psychological Association (APA)
Publication date: 01/01/2014

How speech is separated perceptually from other speech remains poorly understood. Recent research indicates that the ability of an extraneous formant to impair intelligibility depends on the variation of its frequency contour. This study explored the effects of manipulating the depth and pattern of that variation. Three formants (F1+F2+F3) constituting synthetic analogues of natural sentences were distributed across the 2 ears, together with a competitor for F2 (F2C) that listeners must reject to optimize recognition (left = F1+F2C; right = F2+F3). The frequency contours of F1 to F3 were each scaled to 50% of their natural depth, with little effect on intelligibility. Competitors were created either by inverting the frequency contour of F2 about its geometric mean (a plausibly speech-like pattern) or using a regular and arbitrary frequency contour (triangle wave, not plausibly speech-like) matched to the average rate and depth of variation for the inverted F2C. Adding a competitor typically reduced intelligibility; this reduction depended on the depth of F2C variation, being greatest for 100%-depth, intermediate for 50%-depth, and least for 0%-depth (constant) F2Cs. This suggests that competitor impact depends on overall depth of frequency variation, not depth relative to that for the target formants. The absence of tuning (i.e., no minimum in intelligibility for the 50% case) suggests that the ability to reject an extraneous formant does not depend on similarity in the depth of formant-frequency variation. Furthermore, triangle-wave competitors were as effective as their more speech-like counterparts, suggesting that the selection of formants from the ensemble also does not depend on speech-specific constraints.
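
The contour manipulations named in this abstract (inversion about the geometric mean, and scaling to 100%, 50% or 0% depth) can be sketched as follows. This is only an illustration under a stated assumption: that both operations act on a logarithmic frequency scale, so inversion maps each value f to gm²/f. The abstract does not confirm the scale used, the function names are hypothetical, and the F2 values are made up.

```python
import math


def geometric_mean(contour_hz):
    """Geometric mean of a frequency contour given in Hz."""
    return math.exp(sum(math.log(f) for f in contour_hz) / len(contour_hz))


def scale_depth(contour_hz, depth):
    """Scale log-frequency excursions about the geometric mean.

    depth = 1.0 leaves the contour unchanged, 0.5 halves its depth,
    0.0 flattens it to a constant, and -1.0 inverts it.
    Assumption: scaling is done on a log-frequency axis.
    """
    gm = geometric_mean(contour_hz)
    return [gm * (f / gm) ** depth for f in contour_hz]


def invert(contour_hz):
    """Invert the contour about its geometric mean (f -> gm**2 / f)."""
    return scale_depth(contour_hz, -1.0)


# Made-up F2 contour in Hz, for illustration only.
f2 = [1200.0, 1500.0, 1800.0, 1400.0]

f2c_100 = invert(f2)                # inverted competitor at full depth
f2c_50 = scale_depth(f2c_100, 0.5)  # half-depth competitor
f2c_0 = scale_depth(f2c_100, 0.0)   # constant-frequency competitor

print(f2c_100)
print(f2c_50)
print(f2c_0)
```

With this definition the inverted contour keeps the same geometric mean as the original, and the 0%-depth competitor sits at that mean throughout.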

Crossref

PubMed Central

Aston Publications Explorer