
    Resilience of English vowel perception across regional accent variation

    In two categorization experiments using phonotactically legal nonce words, we tested Australian English listeners’ perception of all vowels in their own accent as well as in four less familiar regional varieties of English that differ in how their vowel realizations diverge from Australian English: London, Yorkshire, Newcastle (UK), and New Zealand. Results of Experiment 1 indicated that, amongst the vowel differences described in sociophonetic studies and attested in our stimulus materials, only a small subset caused greater perceptual difficulty for Australian listeners than the corresponding Australian English vowels. We discuss this perceptual tolerance for vowel variation in terms of how perceptual assimilation of phonetic details into abstract vowel categories may contribute to recognizing words across variable pronunciations. Experiment 2 determined whether short-term multi-talker exposure would facilitate accent adaptation, particularly for those vowels that proved more difficult to categorize in Experiment 1. For each accent separately, participants listened to a pre-test passage in the nonce-word accent, but told by novel talkers, before completing the same task as in Experiment 1. In contrast to previous studies showing rapid adaptation to talker-specific variation, our listeners’ subsequent vowel assimilations were largely unaffected by exposure to other talkers’ accent-specific variation.

    Absolute Pitch: Effects of Timbre on Note-Naming Ability

    Background: Absolute pitch (AP) is the ability to identify or produce isolated musical tones. It is evident primarily among individuals who started music lessons in early childhood. Because AP requires memory for specific pitches as well as learned associations with verbal labels (i.e., note names), it represents a unique opportunity to study interactions in memory between linguistic and nonlinguistic information. One untested hypothesis is that the pitch of voices may be difficult for AP possessors to identify. A musician’s first instrument may also affect performance and extend the sensitive period for acquiring accurate AP. Methods/Principal Findings: A large sample of AP possessors was recruited on-line. Participants were required to identify test tones presented in four different timbres: piano, pure tone, natural (sung) voice, and synthesized voice. Note-naming accuracy was better for non-vocal (piano and pure tones) than for vocal (natural and synthesized voices) test tones. This difference could not be attributed solely to vibrato (pitch variation), which was more pronounced in the natural voice than in the synthesized voice. Although starting music lessons by age 7 was associated with enhanced note-naming accuracy, equivalent abilities were evident among listeners who started music lessons on piano at a later age. Conclusions/Significance: Because the human voice is inextricably linked to language and meaning, it may be processed automatically by voice-specific mechanisms that interfere with note naming among AP possessors. Lessons on piano …
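Note naming in such experiments amounts to mapping a tone's frequency onto one of twelve pitch-class labels. A minimal sketch of that mapping under equal temperament with A4 = 440 Hz (an assumption for illustration — the study's actual tuning reference and stimulus set are not stated here):

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def note_name(freq_hz: float) -> str:
    """Map a frequency to the nearest equal-tempered note name (A4 = 440 Hz)."""
    # MIDI note number: 69 corresponds to A4; 12 semitones per octave.
    midi = round(69 + 12 * math.log2(freq_hz / 440.0))
    return NOTE_NAMES[midi % 12]

print(note_name(440.0))   # A
print(note_name(261.63))  # C (middle C)
```

An AP possessor's task is effectively to compute this mapping from the sound alone, without a reference tone.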

    Features of hearing: applications of machine learning to uncover the building blocks of hearing

    Recent advances in machine learning have instigated a renewed interest in using machine learning approaches to better understand human sensory processing. This line of research is particularly interesting for speech research since speech comprehension is uniquely human, which complicates obtaining detailed neural recordings. In this thesis, I explore how machine learning can be used to uncover new knowledge about the auditory system, with a focus on discovering robust auditory features. The resulting increased understanding of the noise robustness of human hearing may help to better assist those with hearing loss and improve Automatic Speech Recognition (ASR) systems. First, I show how computational neuroscience and machine learning can be combined to generate hypotheses about auditory features. I introduce a neural feature detection model with a modest number of parameters that is compatible with auditory physiology. By testing feature detector variants in a speech classification task, I confirm the importance of both well-studied and lesser-known auditory features. Second, I investigate whether ASR software is a good candidate model of the human auditory system. By comparing several state-of-the-art ASR systems to the results from humans on a range of psychometric experiments, I show that these ASR systems diverge markedly from humans in at least some psychometric tests. This implies that none of these systems act as a strong proxy for human speech recognition, although some may be useful when asking more narrowly defined questions. For neuroscientists, this thesis exemplifies how machine learning can be used to generate new hypotheses about human hearing, while also highlighting the caveats of investigating systems that may work fundamentally differently from the human brain. For machine learning engineers, I point to tangible directions for improving ASR systems. 
To motivate the continued cross-fertilization between these fields, a toolbox that allows researchers to assess new ASR systems has been released.

    Abstracts and analysis of recent research in speech-hearing testing.

    Thesis (Ed.M.)--Boston University.

    Impact of a directional microphone on speech recognition in noise in a BICROS hearing aid

    A double-blinded investigation of the performance of a directional microphone on the receiver side of a BICROS hearing aid.

    Echoes of echoes? An episodic theory of lexical access.


    Differences in the semantic structure of the speech experienced by late talkers, late bloomers, and typical talkers

    The present study investigates the relation between language environment and language delay in 63 British-English-speaking children (19 typical talkers (TT), 22 late talkers (LT), and 22 late bloomers (LB)) aged 13 to 18 months. Families audio recorded daily routines and marked the new words their child produced over a period of 6 months. To investigate how language environments differed between talker types and how environments corresponded with children’s developing lexicons, we evaluated contextual diversity—a word property that measures semantic richness—and network properties of language environments in tandem with developing vocabularies. The language environments experienced by the three talker types differed in their structural properties, with LT environments being least contextually diverse and least well-connected in relation to network properties. Notably, LBs’ language environments were more like those of TTs. Network properties of language environments also correlated with the rate of vocabulary growth over the study period. By comparing differences between language environments and lexical network development, we also observe results consistent with contributions to lexical development from different learning strategies for expressive vocabularies and different environments for receptive vocabularies. We discuss the potential consequences that structural differences in parental speech might have on language development and the contribution of this work to the debate on quantity versus quality.
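The two measures the abstract leans on can be illustrated with a toy example: contextual diversity as the number of distinct recording contexts a word appears in, and network connectedness as a word's degree in a co-occurrence graph. The transcripts, words, and the specific graph measure below are hypothetical stand-ins, not the study's actual corpus or metrics:

```python
from collections import defaultdict
from itertools import combinations

# Toy "daily routine" transcripts; each inner list is one recording context.
contexts = [
    ["ball", "dog", "throw"],
    ["dog", "eat", "bowl"],
    ["ball", "bounce"],
]

# Contextual diversity: the number of distinct contexts a word occurs in.
diversity = defaultdict(int)
for ctx in contexts:
    for word in set(ctx):
        diversity[word] += 1

# Co-occurrence network: link words heard in the same context,
# then take each word's degree as a simple connectedness measure.
edges = set()
for ctx in contexts:
    for a, b in combinations(sorted(set(ctx)), 2):
        edges.add((a, b))

degree = defaultdict(int)
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

print(diversity["dog"])  # 2 (appears in two contexts)
print(degree["dog"])     # 4 (linked to ball, throw, eat, bowl)
```

On this picture, a "least contextually diverse, least well-connected" environment is one whose words cluster in few contexts and share few neighbours.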

    Single-Microphone Speech Enhancement and Separation Using Deep Learning

    The cocktail party problem comprises the challenging task of understanding a speech signal in a complex acoustic environment, where multiple speakers and background noise signals simultaneously interfere with the speech signal of interest. A signal processing algorithm that can effectively increase the intelligibility and quality of speech signals in such complicated acoustic situations is highly desirable, especially for applications involving mobile communication devices and hearing assistive devices. Due to the re-emergence of machine learning techniques, today known as deep learning, the challenges involved with such algorithms might be overcome. In this PhD thesis, we study and develop deep learning-based techniques for two sub-disciplines of the cocktail party problem: single-microphone speech enhancement and single-microphone multi-talker speech separation. Specifically, we conduct an in-depth empirical analysis of the generalizability of modern deep learning-based single-microphone speech enhancement algorithms. We show that the performance of such algorithms is closely linked to the training data, and that good generalizability can be achieved with carefully designed training data. Furthermore, we propose uPIT (utterance-level permutation invariant training), a deep learning-based algorithm for single-microphone speech separation, and we report state-of-the-art results on a speaker-independent multi-talker speech separation task. Additionally, we show that uPIT works well for joint speech separation and enhancement without explicit prior knowledge about the noise type or number of speakers. Finally, we show that deep learning-based speech enhancement algorithms designed to minimize the classical short-time spectral amplitude mean squared error lead to enhanced speech signals that are essentially optimal in terms of STOI, a state-of-the-art speech intelligibility estimator. (PhD thesis, 233 pages.)
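The core idea behind permutation invariant training is that the loss must not depend on the arbitrary order in which a separator emits its sources: every pairing of estimated sources to reference sources is scored, and the cheapest assignment is used for training. A toy sketch of that assignment step (pure Python; the signals, values, and per-source MSE criterion are illustrative, not the thesis's actual network or training setup):

```python
from itertools import permutations

def mse(est, ref):
    """Mean squared error between two equal-length signals."""
    return sum((e - r) ** 2 for e, r in zip(est, ref)) / len(ref)

def pit_loss(estimates, references):
    """Permutation-invariant loss: try every assignment of estimated
    sources to reference sources and keep the one with lowest total MSE."""
    best = None
    for perm in permutations(range(len(references))):
        total = sum(mse(estimates[i], references[j])
                    for i, j in enumerate(perm))
        if best is None or total < best[0]:
            best = (total, perm)
    return best  # (loss, chosen assignment)

refs = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
ests = [[0.1, 0.9, 0.0], [0.9, 0.1, 1.0]]  # speakers come out swapped
loss, perm = pit_loss(ests, refs)
print(perm)  # (1, 0): estimate 0 is matched to reference 1
```

In uPIT the assignment is resolved once per utterance rather than per time frame, which keeps each output stream tied to one speaker across the whole utterance; the exhaustive search above is only practical for small speaker counts.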