
    Resilience of English vowel perception across regional accent variation

    In two categorization experiments using phonotactically legal nonce words, we tested Australian English listeners’ perception of all vowels in their own accent as well as in four less familiar regional varieties of English that differ in how their vowel realizations diverge from Australian English: London, Yorkshire, Newcastle (UK), and New Zealand. Results of Experiment 1 indicated that, amongst the vowel differences described in sociophonetic studies and attested in our stimulus materials, only a small subset caused greater perceptual difficulty for Australian listeners than the corresponding Australian English vowels. We discuss this perceptual tolerance for vowel variation in terms of how perceptual assimilation of phonetic details into abstract vowel categories may contribute to recognizing words across variable pronunciations. Experiment 2 determined whether short-term multi-talker exposure would facilitate accent adaptation, particularly for those vowels that proved more difficult to categorize in Experiment 1. For each accent separately, participants listened to a pre-test passage in the nonce-word accent, but told by novel talkers, before completing the same task as in Experiment 1. In contrast to previous studies showing rapid adaptation to talker-specific variation, our listeners’ subsequent vowel assimilations were largely unaffected by exposure to other talkers’ accent-specific variation.

    Absolute Pitch: Effects of Timbre on Note-Naming Ability

    Background: Absolute pitch (AP) is the ability to identify or produce isolated musical tones. It is evident primarily among individuals who started music lessons in early childhood. Because AP requires memory for specific pitches as well as learned associations with verbal labels (i.e., note names), it represents a unique opportunity to study interactions in memory between linguistic and nonlinguistic information. One untested hypothesis is that the pitch of voices may be difficult for AP possessors to identify. A musician’s first instrument may also affect performance and extend the sensitive period for acquiring accurate AP. Methods/Principal Findings: A large sample of AP possessors was recruited on-line. Participants were required to identify test tones presented in four different timbres: piano, pure tone, natural (sung) voice, and synthesized voice. Note-naming accuracy was better for non-vocal (piano and pure tones) than for vocal (natural and synthesized voices) test tones. This difference could not be attributed solely to vibrato (pitch variation), which was more pronounced in the natural voice than in the synthesized voice. Although starting music lessons by age 7 was associated with enhanced note-naming accuracy, equivalent abilities were evident among listeners who started music lessons on piano at a later age. Conclusions/Significance: Because the human voice is inextricably linked to language and meaning, it may be processed automatically by voice-specific mechanisms that interfere with note naming among AP possessors. Lessons on piano …
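Note naming in such experiments amounts to mapping a tone's frequency onto one of twelve pitch-class labels. A minimal sketch of that mapping under equal temperament with A4 = 440 Hz (an assumption for illustration — the study's actual tuning reference and stimulus set are not stated here):

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def note_name(freq_hz: float) -> str:
    """Map a frequency to the nearest equal-tempered note name (A4 = 440 Hz)."""
    # MIDI note number: 69 corresponds to A4; 12 semitones per octave.
    midi = round(69 + 12 * math.log2(freq_hz / 440.0))
    return NOTE_NAMES[midi % 12]

print(note_name(440.0))   # A
print(note_name(261.63))  # C (middle C)
```

An AP possessor's task is effectively to compute this mapping from the sound alone, without a reference tone.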

    Features of hearing: applications of machine learning to uncover the building blocks of hearing

    Recent advances in machine learning have instigated a renewed interest in using machine learning approaches to better understand human sensory processing. This line of research is particularly interesting for speech research since speech comprehension is uniquely human, which complicates obtaining detailed neural recordings. In this thesis, I explore how machine learning can be used to uncover new knowledge about the auditory system, with a focus on discovering robust auditory features. The resulting increased understanding of the noise robustness of human hearing may help to better assist those with hearing loss and improve Automatic Speech Recognition (ASR) systems. First, I show how computational neuroscience and machine learning can be combined to generate hypotheses about auditory features. I introduce a neural feature detection model with a modest number of parameters that is compatible with auditory physiology. By testing feature detector variants in a speech classification task, I confirm the importance of both well-studied and lesser-known auditory features. Second, I investigate whether ASR software is a good candidate model of the human auditory system. By comparing several state-of-the-art ASR systems to the results from humans on a range of psychometric experiments, I show that these ASR systems diverge markedly from humans in at least some psychometric tests. This implies that none of these systems act as a strong proxy for human speech recognition, although some may be useful when asking more narrowly defined questions. For neuroscientists, this thesis exemplifies how machine learning can be used to generate new hypotheses about human hearing, while also highlighting the caveats of investigating systems that may work fundamentally differently from the human brain. For machine learning engineers, I point to tangible directions for improving ASR systems. 
To motivate the continued cross-fertilization between these fields, a toolbox that allows researchers to assess new ASR systems has been released.

    Abstracts and analysis of recent research in speech-hearing testing.

    Thesis (Ed.M.)--Boston University.

    Impact of a directional microphone on speech recognition in noise in a BICROS hearing aid

    A double-blinded investigation of the performance of a directional microphone on the receiver side of a BICROS hearing aid.

    Echoes of echoes? An episodic theory of lexical access.


    Differences in the semantic structure of the speech experienced by late talkers, late bloomers, and typical talkers

    The present study investigates the relation between language environment and language delay in 63 British-English-speaking children (19 typical talkers (TT), 22 late talkers (LT), and 22 late bloomers (LB)) aged 13 to 18 months. Families audio recorded daily routines and marked the new words their child produced over a period of 6 months. To investigate how language environments differed between talker types and how environments corresponded with children’s developing lexicons, we evaluated contextual diversity—a word property that measures semantic richness—and network properties of language environments in tandem with developing vocabularies. The language environments experienced by the three talker types differed in their structural properties, with LT environments being least contextually diverse and least well-connected in relation to network properties. Notably, LBs’ language environments were more like those of TTs. Network properties of language environments also correlated with the rate of vocabulary growth over the study period. By comparing differences between language environments and lexical network development, we also observe results consistent with contributions to lexical development from different learning strategies for expressive vocabularies and different environments for receptive vocabularies. We discuss the potential consequences that structural differences in parental speech might have on language development and the contribution of this work to the debate on quantity versus quality.
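The two measures the abstract leans on can be illustrated with a toy example: contextual diversity as the number of distinct recording contexts a word appears in, and network connectedness as a word's degree in a co-occurrence graph. The transcripts, words, and the specific graph measure below are hypothetical stand-ins, not the study's actual corpus or metrics:

```python
from collections import defaultdict
from itertools import combinations

# Toy "daily routine" transcripts; each inner list is one recording context.
contexts = [
    ["ball", "dog", "throw"],
    ["dog", "eat", "bowl"],
    ["ball", "bounce"],
]

# Contextual diversity: the number of distinct contexts a word occurs in.
diversity = defaultdict(int)
for ctx in contexts:
    for word in set(ctx):
        diversity[word] += 1

# Co-occurrence network: link words heard in the same context,
# then take each word's degree as a simple connectedness measure.
edges = set()
for ctx in contexts:
    for a, b in combinations(sorted(set(ctx)), 2):
        edges.add((a, b))

degree = defaultdict(int)
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

print(diversity["dog"])  # 2 (appears in two contexts)
print(degree["dog"])     # 4 (linked to ball, throw, eat, bowl)
```

On this picture, a "least contextually diverse, least well-connected" environment is one whose words cluster in few contexts and share few neighbours.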

    Single-Microphone Speech Enhancement and Separation Using Deep Learning

    The cocktail party problem comprises the challenging task of understanding a speech signal in a complex acoustic environment, where multiple speakers and background noise signals simultaneously interfere with the speech signal of interest. A signal processing algorithm that can effectively increase the intelligibility and quality of speech signals in such complicated acoustic situations is highly desirable, especially for applications involving mobile communication devices and hearing assistive devices. Due to the re-emergence of machine learning techniques, today known as deep learning, the challenges involved with such algorithms might be overcome. In this PhD thesis, we study and develop deep learning-based techniques for two sub-disciplines of the cocktail party problem: single-microphone speech enhancement and single-microphone multi-talker speech separation. Specifically, we conduct an in-depth empirical analysis of the generalizability of modern deep learning-based single-microphone speech enhancement algorithms. We show that the performance of such algorithms is closely linked to the training data, and that good generalizability can be achieved with carefully designed training data. Furthermore, we propose uPIT (utterance-level permutation invariant training), a deep learning-based algorithm for single-microphone speech separation, and we report state-of-the-art results on a speaker-independent multi-talker speech separation task. Additionally, we show that uPIT works well for joint speech separation and enhancement without explicit prior knowledge about the noise type or number of speakers. Finally, we show that deep learning-based speech enhancement algorithms designed to minimize the classical short-time spectral amplitude mean squared error lead to enhanced speech signals that are essentially optimal in terms of STOI, a state-of-the-art speech intelligibility estimator. (PhD thesis, 233 pages.)
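The core idea behind permutation invariant training is that the loss must not depend on the arbitrary order in which a separator emits its sources: every pairing of estimated sources to reference sources is scored, and the cheapest assignment is used for training. A toy sketch of that assignment step (pure Python; the signals, values, and per-source MSE criterion are illustrative, not the thesis's actual network or training setup):

```python
from itertools import permutations

def mse(est, ref):
    """Mean squared error between two equal-length signals."""
    return sum((e - r) ** 2 for e, r in zip(est, ref)) / len(ref)

def pit_loss(estimates, references):
    """Permutation-invariant loss: try every assignment of estimated
    sources to reference sources and keep the one with lowest total MSE."""
    best = None
    for perm in permutations(range(len(references))):
        total = sum(mse(estimates[i], references[j])
                    for i, j in enumerate(perm))
        if best is None or total < best[0]:
            best = (total, perm)
    return best  # (loss, chosen assignment)

refs = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
ests = [[0.1, 0.9, 0.0], [0.9, 0.1, 1.0]]  # speakers come out swapped
loss, perm = pit_loss(ests, refs)
print(perm)  # (1, 0): estimate 0 is matched to reference 1
```

In uPIT the assignment is resolved once per utterance rather than per time frame, which keeps each output stream tied to one speaker across the whole utterance; the exhaustive search above is only practical for small speaker counts.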