958 research outputs found

    Specific Language Impairments and Possibilities of Classification and Detection from Children's Speech

    Many young children have speech disorders. My research focused on one such disorder, known as specific language impairment or developmental dysphasia. A major problem in treating this disorder is the fact that specific language impairment is detected in children at a relatively late age. For successful speech therapy, early diagnosis is critical. I present two different approaches to this issue using a very simple test that I have devised for diagnosing this disorder. In this thesis, I describe a new method for detecting specific language impairment based on the number of pronunciation errors in utterances. An advantage of this method is its simplicity; anyone can use it, including parents. The second method is based on the acoustic features of the speech signal. An advantage of this method is that it could be used to develop an automatic detection system. Katedra teorie obvod

    Vowel-initial glottalization as a prominence cue in speech perception and online processing

    Disentangling the role of biphone probability from neighborhood density in the perception of nonwords

    In six experiments we explored how biphone probability and lexical neighborhood density influence listeners' categorization of vowels embedded in nonword sequences. We found independent effects of each. Listeners shifted categorization of a phonetic continuum to create a higher probability sequence, even when neighborhood density was controlled. Similarly, listeners shifted categorization to create a nonword from a denser neighborhood, even when biphone probability was controlled. Next, using a visual world eye-tracking task, we determined that biphone probability information is used rapidly by listeners in perception. In contrast, task complexity and irrelevant variability in the stimuli interfere with neighborhood density effects. These results support a model in which both biphone probability and neighborhood density independently affect word recognition, but only biphone probability effects are observed early in processing
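
    As a rough illustration of the two measures contrasted in this abstract, the sketch below computes a mean biphone probability and a neighborhood density for a nonword over a tiny made-up lexicon. The lexicon, the function names, and the exact definitions (position-independent biphone relative frequencies; neighbors as items differing by one substitution, insertion, or deletion) are assumptions for illustration and are not taken from the study.

```python
from collections import Counter

# Hypothetical toy lexicon; each word is a tuple of phoneme symbols.
LEXICON = [
    ("k", "ae", "t"),
    ("b", "ae", "t"),
    ("k", "ae", "p"),
    ("k", "ah", "t"),
    ("ae", "t"),
]


def biphone_counts(lexicon):
    """Count every adjacent phoneme pair across the lexicon."""
    counts = Counter()
    for word in lexicon:
        for pair in zip(word, word[1:]):
            counts[pair] += 1
    return counts


def mean_biphone_probability(item, lexicon):
    """Average relative frequency of the item's biphones (assumed definition)."""
    counts = biphone_counts(lexicon)
    total = sum(counts.values())
    pairs = list(zip(item, item[1:]))
    if not pairs or total == 0:
        return 0.0
    return sum(counts[p] / total for p in pairs) / len(pairs)


def is_neighbor(a, b):
    """True if a and b differ by one substitution, insertion, or deletion."""
    if a == b:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def neighborhood_density(item, lexicon):
    """Number of lexicon entries that are neighbors of the item."""
    return sum(is_neighbor(item, word) for word in lexicon)


nonword = ("g", "ae", "t")
print(mean_biphone_probability(nonword, LEXICON))
print(neighborhood_density(nonword, LEXICON))
```

    Because the two quantities are computed separately, a stimulus set can in principle hold one constant while varying the other, which is the kind of control the abstract describes.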

    Analytical Study of CV Type Bodo Words using Formant Frequency Measure

    Words can be categorized into types according to the positions of vowels and consonants within them. Accordingly, most languages have CV (Consonant-Vowel), VC (Vowel-Consonant), CVC (Consonant-Vowel-Consonant), CVCC (Consonant-Vowel-Consonant-Consonant), CVVC (Consonant-Vowel-Vowel-Consonant), and other word types. As a first step towards recognizing any speech signal, it is important to study these word types using available techniques. Approaches that produce reliable results include the formant frequency measure and Mel-frequency cepstral coefficients (MFCC). In this paper, the formant frequencies of CV-type Bodo words are measured to identify their distinctive features. Under the Formant Tracking Model, a formant frequency can be defined as a spectral peak of the sound spectrum |P(f)|. Keywords: MFCC, Formant Tracking Model, FFT, Formant Frequency, Resonance Frequenc
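
    The idea of a formant as a peak of the magnitude spectrum |P(f)| can be sketched as follows. This is plain peak picking on a naive DFT of a synthetic two-component signal, not the Formant Tracking Model named in the abstract; the sample rate, the signal, and the function names are assumptions chosen for illustration. Real speech would normally need windowing and a smoothed spectral envelope before peaks are read off.

```python
import cmath
import math


def magnitude_spectrum(signal, sample_rate):
    """Return (frequencies, magnitudes) for the non-negative half of a naive DFT."""
    n = len(signal)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)
        )
        freqs.append(k * sample_rate / n)
        mags.append(abs(acc))
    return freqs, mags


def spectral_peaks(freqs, mags, count=2):
    """Return the frequencies of the `count` largest local maxima, in ascending order."""
    candidates = [
        (mags[i], freqs[i])
        for i in range(1, len(mags) - 1)
        if mags[i] > mags[i - 1] and mags[i] > mags[i + 1]
    ]
    candidates.sort(reverse=True)
    return sorted(freq for _, freq in candidates[:count])


# Synthetic stand-in for a vowel: two sinusoids at assumed "formant" frequencies.
SAMPLE_RATE = 8000  # Hz (assumed)
N_SAMPLES = 400     # 50 ms, giving 20 Hz frequency resolution
signal = [
    math.sin(2 * math.pi * 700 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 1200 * t / SAMPLE_RATE)
    for t in range(N_SAMPLES)
]

freqs, mags = magnitude_spectrum(signal, SAMPLE_RATE)
peaks = spectral_peaks(freqs, mags)
print(peaks)
```

    For this synthetic input the two strongest peaks coincide with the 700 Hz and 1200 Hz components, because both fall exactly on DFT bins.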

    A robust sound perception model suitable for neuromorphic implementation

    Coath M, Sheik S, Chicca E, Indiveri G, Denham S, Wennekers T. A robust sound perception model suitable for neuromorphic implementation. Neuromorphic Engineering. 2014;7(278):1-10. We have recently demonstrated the emergence of dynamic feature sensitivity through exposure to formative stimuli in a real-time neuromorphic system implementing a hybrid analog/digital network of spiking neurons. This network, inspired by models of auditory processing in mammals, includes several mutually connected layers with distance-dependent transmission delays and learning in the form of spike timing dependent plasticity, which effects stimulus-driven changes in the network connectivity. Here we present results that demonstrate that the network is robust to a range of variations in the stimulus pattern, such as are found in naturalistic stimuli and neural responses. This robustness is a property critical to the development of realistic, electronic neuromorphic systems. We analyze the variability of the response of the network to “noisy” stimuli which allows us to characterize the acuity in information-theoretic terms. This provides an objective basis for the quantitative comparison of networks, their connectivity patterns, and learning strategies, which can inform future design decisions. We also show, using stimuli derived from speech samples, that the principles are robust to other challenges, such as variable presentation rate, that would have to be met by systems deployed in the real world. Finally we demonstrate the potential applicability of the approach to real sounds.

    On the design of visual feedback for the rehabilitation of hearing-impaired speech

    Features of hearing: applications of machine learning to uncover the building blocks of hearing

    Recent advances in machine learning have instigated a renewed interest in using machine learning approaches to better understand human sensory processing. This line of research is particularly interesting for speech research since speech comprehension is uniquely human, which complicates obtaining detailed neural recordings. In this thesis, I explore how machine learning can be used to uncover new knowledge about the auditory system, with a focus on discovering robust auditory features. The resulting increased understanding of the noise robustness of human hearing may help to better assist those with hearing loss and improve Automatic Speech Recognition (ASR) systems. First, I show how computational neuroscience and machine learning can be combined to generate hypotheses about auditory features. I introduce a neural feature detection model with a modest number of parameters that is compatible with auditory physiology. By testing feature detector variants in a speech classification task, I confirm the importance of both well-studied and lesser-known auditory features. Second, I investigate whether ASR software is a good candidate model of the human auditory system. By comparing several state-of-the-art ASR systems to the results from humans on a range of psychometric experiments, I show that these ASR systems diverge markedly from humans in at least some psychometric tests. This implies that none of these systems act as a strong proxy for human speech recognition, although some may be useful when asking more narrowly defined questions. For neuroscientists, this thesis exemplifies how machine learning can be used to generate new hypotheses about human hearing, while also highlighting the caveats of investigating systems that may work fundamentally differently from the human brain. For machine learning engineers, I point to tangible directions for improving ASR systems. 
    To motivate the continued cross-fertilization between these fields, a toolbox that allows researchers to assess new ASR systems has been released.

    Speech Communication

    Contains reports on five research projects. National Institutes of Health (Grant 5 RO1 NS04332-12); National Institutes of Health (Grant HD05168-04); U.S. Navy Office of Naval Research (Contract N00014-67-A-0204-0069); Joint Services Electronics Program (Contract DAAB07-74-C-0630); National Science Foundation (Grant SOC74-22167)

    Was That a Bag or a Bug? Perceptual Measures, Euclidean Distance, Mahalanobis Distance, and Pillai Scores in the Assessment of L2 Pronunciation

    Master's in Applied Linguistics and Language Acquisition in Multilingual Contexts, Department of Modern Languages and Literatures and English Studies, Universitat de Barcelona. Academic year: 2020-2021. Supervisor: Joan C. Mora. Researchers employ a variety of techniques to measure the accuracy of second-language pronunciation. Little research has been done on certain measures that have been used more in recent studies, such as Mahalanobis distance and Pillai scores, and how they compare to perceptual measures. Using pre- and post-test recordings of 23 Spanish/Catalan learners of English that were obtained using a delayed word repetition task in a previous, high-variability phonetic training study on the English phonemes /æ/ and /ʌ/, this thesis examines the relationship between native-speaking judges’ word identification and goodness ratings, Euclidean distances, Mahalanobis distances, and Pillai scores in their evaluation of pronunciation accuracy and improvement between test times. For each acoustic metric, measures between native- and non-native speakers’ productions are taken as well as measures between non-native speakers’ realizations of /æ/ and /ʌ/. An experimental way of computing perceptual ratings for items that are incorrectly identified by raters is also investigated and compared to existing measures.
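
    The three acoustic metrics named in this abstract can be sketched for two-dimensional vowel tokens as follows. Treating each token as a pair of formant values, the toy data, and all function names are assumptions for illustration; the thesis may compute these measures differently. The Pillai score here is the one-factor, two-group MANOVA trace, trace(H (H + E)^-1), which ranges from 0 (complete overlap) to 1 (complete separation).

```python
import math


def mean2(points):
    """Centroid of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter2(points, center):
    """2x2 sum-of-squares-and-cross-products matrix around `center`."""
    sxx = sum((p[0] - center[0]) ** 2 for p in points)
    syy = sum((p[1] - center[1]) ** 2 for p in points)
    sxy = sum((p[0] - center[0]) * (p[1] - center[1]) for p in points)
    return ((sxx, sxy), (sxy, syy))


def inv2(m):
    """Inverse of a 2x2 matrix (raises ZeroDivisionError if singular)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def euclidean(p, q):
    """Straight-line distance between two points, e.g. two category centroids."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def mahalanobis(point, cloud):
    """Distance from `point` to the centroid of `cloud`, scaled by its covariance."""
    mu = mean2(cloud)
    s = scatter2(cloud, mu)
    n = len(cloud)
    inv = inv2(tuple(tuple(v / (n - 1) for v in row) for row in s))
    dx, dy = point[0] - mu[0], point[1] - mu[1]
    return math.sqrt(
        dx * (inv[0][0] * dx + inv[0][1] * dy)
        + dy * (inv[1][0] * dx + inv[1][1] * dy)
    )


def pillai(group_a, group_b):
    """Pillai trace for two groups: trace(H T^-1), with T = H + E."""
    e_a = scatter2(group_a, mean2(group_a))
    e_b = scatter2(group_b, mean2(group_b))
    pooled = group_a + group_b
    t = scatter2(pooled, mean2(pooled))
    # Between-group matrix H is total scatter minus within-group scatter E.
    h = tuple(
        tuple(t[i][j] - e_a[i][j] - e_b[i][j] for j in range(2)) for i in range(2)
    )
    t_inv = inv2(t)
    return sum(h[i][j] * t_inv[j][i] for i in range(2) for j in range(2))


# Made-up token clouds standing in for two vowel categories.
vowel_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
vowel_b = [(x + 10.0, y + 10.0) for x, y in vowel_a]

print(euclidean(mean2(vowel_a), mean2(vowel_b)))
print(mahalanobis((1.5, 0.5), vowel_a))
print(pillai(vowel_a, vowel_b))
```

    Unlike the Euclidean distance between centroids, the Mahalanobis distance and the Pillai score both take the spread of the token clouds into account, which is one reason they can rank speakers differently.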