
    Cortical auditory processing of informational masking effects by target-masker similarity and stimulus uncertainty

    Purpose: Understanding speech in a background of other people talking is one of the most difficult listening challenges for hearing-impaired individuals, and even for those with normal hearing. Speech-on-speech masking is known to contribute to increased perceptual difficulty over non-speech background noise because of informational masking provided over and above the energetic masking effect. While informational masking research has identified factors of similarity and uncertainty between target and masker that contribute to reduced behavioral performance in speech background noise, critical gaps in knowledge remain, including the underlying neural-perceptual processes. By systematically manipulating aspects of similarity and uncertainty in the same auditory paradigm, the current study examined the time course of these informational masking effects and objectively quantified them at both early and late stages of auditory processing using auditory evoked potentials (AEPs) in a two-factor repeated-measures paradigm. Method: Thirty participants were included in this cross-sectional repeated-measures design. Target-masker similarity was manipulated by varying the linguistic/phonetic similarity (i.e., language) of the talkers in the noise maskers. Specifically, four levels representing hypothesized increasing levels of informational masking were implemented: (1) no masker (quiet), (2) Mandarin (linguistically and phonetically dissimilar), (3) Dutch (linguistically dissimilar, but phonetically similar), and (4) English (linguistically and phonetically similar). Stimulus uncertainty was manipulated by task complexity, specifically the target-to-target interval (TTI) of the auditory paradigm. Participants had to discriminate between English word stimuli (/bæt/ and /pæt/) presented in an oddball paradigm in each masker condition at +3 dB SNR by pressing buttons to either the target or the standard stimulus (pseudo-randomized between /bæt/ and /pæt/ across participants). Responses were recorded simultaneously for P1-N1-P2 (standard waveform) and P3 (target waveform). This design allowed simultaneous recording of multiple AEP peaks, including analysis of amplitude, area, and latency characteristics, as well as accuracy, reaction time, and d' behavioral discrimination for button-press responses. Finally, AEP measures were compared to performance on a behavioral word recognition task (NU-6 25-word lists) in the proposed language maskers and at multiple signal-to-noise ratios (SNRs) to further explore whether AEP amplitude/area and latency components correlate with behavioral outcomes across the proposed maskers. Results: Several trends in AEP and behavioral outcomes were consistent with the hypothesized hierarchy of increasing linguistic/phonetic similarity from Mandarin to Dutch to English, but not all differences were significant. The most supported findings for this factor were that all babble maskers significantly affected outcomes compared to quiet, and that the native-language English masker had the largest effect on outcomes in the AEP paradigm, including N1 amplitude, P3 amplitude and area, as well as decreased reaction time, accuracy, and d' behavioral discrimination for target word responses. AEP outcomes for the Mandarin and Dutch maskers, however, were not significantly different across all measured components.
    AEP latency outcomes for both N1 and P3 also supported an effect of stimulus uncertainty, consistent with a hypothesized increase in processing time related to increased task complexity when target stimulus timing was randomized. In addition, this effect was stronger, as evidenced by larger effect sizes, at the P3 level of auditory processing than at the N1 level. An unanticipated result was the absence of the expected additive effect between linguistic/phonetic similarity and stimulus uncertainty. Finally, trends in behavioral word recognition performance were generally consistent with those observed for AEP component measures, in that no differences between the Dutch and Mandarin maskers were found, but the English masker yielded the lowest percent correct scores. Furthermore, correlations between behavioral word recognition and AEP component measures yielded some moderate correlations, but no single AEP component accounted for a majority of the variance in behavioral word recognition. Conclusions: The results of this study add to our understanding of auditory perception under informational masking in four ways. First, observable effects of both similarity and uncertainty were evident at both early and late levels of auditory cortical processing. This supports the use of AEPs to better understand the informational masking deficit by providing a window into the auditory pathway. Second, stronger effects were found for the P3 response, an active, top-down level of auditory processing, suggesting that while informational masking degradation occurs at lower levels, higher-level active auditory processing is more sensitive to informational masking deficits. Third, the lack of an interaction between the main effects points to a linear interpretation of the combination of similarity and uncertainty, with an equal effect across listening conditions. Fourth, even though correlations with behavioral word recognition were few and only moderate, behavioral performance followed the same trends as AEP measures across similarity conditions. Across both auditory neural and behavioral testing, language maskers degraded AEPs and reduced word recognition, particularly the native-language masker. The behavioral and objective results from this study provide a foundation for further investigation of how the linguistic content of target and masker and task difficulty contribute to difficulty understanding speech in noise.
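
    Behavioral discrimination in the oddball task above is summarized with d'. As a minimal sketch (not the study's analysis code), d' can be computed from button-press hit and false-alarm counts as follows; the example counts and the log-linear correction are illustrative assumptions:

        from scipy.stats import norm

        def d_prime(hits, misses, false_alarms, correct_rejections):
            """d' from response counts, with a log-linear correction so that
            hit/false-alarm rates of exactly 0 or 1 do not produce infinite z-scores."""
            hit_rate = (hits + 0.5) / (hits + misses + 1)
            fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
            return norm.ppf(hit_rate) - norm.ppf(fa_rate)

        # Hypothetical example: 85 hits on 100 targets, 12 false alarms on 400 standards
        print(d_prime(85, 15, 12, 388))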

    Enhanced multiclass SVM with thresholding fusion for speech-based emotion classification

    As an essential approach to understanding human interactions, emotion classification is a vital component of behavioral studies as well as being important in the design of context-aware systems. Recent studies have shown that speech contains rich information about emotion, and numerous speech-based emotion classification methods have been proposed. However, classification performance still falls short of what is needed for the algorithms to be used in real systems. We present an emotion classification system using several one-against-all support vector machines with a thresholding fusion mechanism to combine the individual outputs, which can effectively increase emotion classification accuracy at the expense of rejecting some samples as unclassified. Results show that the proposed system outperforms three state-of-the-art methods and that the thresholding fusion mechanism can effectively improve the emotion classification, which is important for applications that require very high accuracy but do not require that all samples be classified. We evaluate the system performance for several challenging scenarios, including speaker-independent tests, tests on noisy speech signals, and tests using non-professional acted recordings, in order to demonstrate the performance of the system and the effectiveness of the thresholding fusion mechanism in real scenarios.
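
    A hedged sketch of the one-against-all arrangement with thresholding fusion described above, assuming scikit-learn, precomputed acoustic feature vectors, and more than two emotion classes; the RBF kernel and the threshold value are placeholders rather than the paper's actual configuration:

        import numpy as np
        from sklearn.multiclass import OneVsRestClassifier
        from sklearn.svm import SVC

        def classify_with_rejection(X_train, y_train, X_test, threshold=0.0):
            """Fuse one-vs-all SVM scores by taking the highest-scoring class,
            but leave a sample unclassified if that score falls below the threshold."""
            clf = OneVsRestClassifier(SVC(kernel="rbf", gamma="scale"))
            clf.fit(X_train, y_train)
            scores = clf.decision_function(X_test)           # (n_samples, n_classes)
            best = np.argmax(scores, axis=1)
            labels = clf.classes_[best].astype(object)
            rejected = scores[np.arange(len(best)), best] < threshold
            labels[rejected] = None                          # rejected as unclassified
            return labels

    Raising the threshold rejects more samples but tends to raise accuracy on the samples that are classified, which is the trade-off the abstract describes.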

    Automatic vocal recognition of a child's perceived emotional state within the Speechome corpus

    Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 137-149). With over 230,000 hours of audio/video recordings of a child growing up in the home setting from birth to the age of three, the Human Speechome Project has pioneered a comprehensive, ecologically valid observational dataset that introduces far-reaching new possibilities for the study of child development. By offering in vivo observation of a child's daily life experience at ultra-dense, longitudinal time scales, the Speechome corpus holds great potential for discovering developmental insights that have thus far eluded observation. The work of this thesis aspires to enable the use of the Speechome corpus for empirical study of emotional factors in early child development. To fully harness the benefits of Speechome for this purpose, an automated mechanism must be created to perceive the child's emotional state within this medium. Due to the latent nature of emotion, we sought objective, directly measurable correlates of the child's perceived emotional state within the Speechome corpus, focusing exclusively on acoustic features of the child's vocalizations and surrounding caretaker speech. Using Partial Least Squares regression, we applied these features to build a model that simulates human perceptual heuristics for determining a child's emotional state. We evaluated the perceptual accuracy of models built across child-only, adult-only, and combined feature sets within the overall sampled dataset, as well as controlling for social situations, vocalization behaviors (e.g. crying, laughing, babble), individual caretakers, and developmental age between 9 and 24 months. Child and combined models consistently demonstrated high perceptual accuracy, with overall adjusted R-squared values of 0.54 and 0.58, respectively, and an average of 0.59 and 0.67 per month. Comparative analysis across longitudinal and socio-behavioral contexts yielded several notable developmental and dyadic insights. In the process, we have developed a data mining and analysis methodology for modeling perceived child emotion and quantifying caretaker intersubjectivity that we hope to extend to future datasets across multiple children, as new deployments of the Speechome recording technology are established. Such large-scale comparative studies promise an unprecedented view into the nature of emotional processes in early childhood and potentially enlightening discoveries about autism and other developmental disorders. by Sophia Yuditskaya.
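
    A minimal, hypothetical sketch of the Partial Least Squares modelling step described above, assuming a matrix of acoustic features X and continuous perceived-emotion ratings y; the number of components and the adjusted R-squared bookkeeping are illustrative, not the thesis's settings:

        import numpy as np
        from sklearn.cross_decomposition import PLSRegression
        from sklearn.metrics import r2_score

        def fit_pls(X, y, n_components=5):
            """Fit a PLS regression of perceived emotion on acoustic features
            and report the adjusted R-squared of the in-sample fit."""
            pls = PLSRegression(n_components=n_components)
            pls.fit(X, y)
            y_pred = pls.predict(X).ravel()
            n, p = X.shape
            r2 = r2_score(y, y_pred)
            adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
            return pls, adj_r2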

    Articulatory and bottleneck features for speaker-independent ASR of dysarthric speech

    The rapid aging of the population has stimulated the development of assistive devices that provide personalized medical support to people in need suffering from various etiologies. One prominent clinical application is a computer-assisted speech training system, which enables personalized speech therapy for patients impaired by communicative disorders in the patient's home environment. Such a system relies on robust automatic speech recognition (ASR) technology to be able to provide accurate articulation feedback. With the long-term aim of developing off-the-shelf ASR systems that can be incorporated in a clinical context without prior speaker information, we compare the ASR performance of speaker-independent bottleneck and articulatory features on dysarthric speech, used in conjunction with dedicated neural network-based acoustic models that have been shown to be robust against spectrotemporal deviations. We report ASR performance of these systems on two dysarthric speech datasets with different characteristics to quantify the achieved performance gains. Despite the remaining performance gap between dysarthric and normal speech, significant improvements are reported on both datasets using speaker-independent ASR architectures. Comment: to appear in Computer Speech & Language - https://doi.org/10.1016/j.csl.2019.05.002 - arXiv admin note: substantial text overlap with arXiv:1807.1094
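
    As a rough illustration of the bottleneck-feature idea discussed above (not the authors' architecture), a feed-forward frame classifier with one deliberately narrow hidden layer can be trained on an auxiliary task and the narrow-layer activations reused as ASR features; the PyTorch sketch below uses placeholder dimensions:

        import torch
        import torch.nn as nn

        class BottleneckNet(nn.Module):
            """Frame classifier whose narrow hidden layer provides bottleneck features."""
            def __init__(self, n_in=440, n_hidden=1024, n_bottleneck=40, n_targets=2000):
                super().__init__()
                self.encoder = nn.Sequential(
                    nn.Linear(n_in, n_hidden), nn.ReLU(),
                    nn.Linear(n_hidden, n_bottleneck),    # the bottleneck layer
                )
                self.classifier = nn.Sequential(
                    nn.ReLU(),
                    nn.Linear(n_bottleneck, n_targets),   # e.g. context-dependent phone targets
                )

            def forward(self, x):
                return self.classifier(self.encoder(x))

            def bottleneck_features(self, x):
                """Narrow-layer activations used as input features for the ASR acoustic model."""
                with torch.no_grad():
                    return self.encoder(x)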

    Audio diarization for LENA data and its application to computing language behavior statistics for individuals with autism

    The objective of this dissertation is to develop diarization algorithms for LENA data and study their application to computing language behavior statistics for individuals with autism. The LENA device is one of the most commonly used devices to collect audio data in autism and language development studies. LENA child and adult detector algorithms were evaluated on two different datasets: i) an older children dataset consisting of children already diagnosed with autism spectrum disorder and ii) an infants dataset consisting of infants at risk for autism. I-vector based diarization algorithms were developed for the two datasets to tackle two scenarios: a) some amount of labeled data is present for every speaker in the audio recording and b) no labeled data is present for the audio recording to be diarized. Further, i-vector based diarization methods were applied to compute objective measures of assessment. These objective measures were analyzed to show that they can reveal some aspects of autism severity. Also, a method to extract a 5-minute high child vocalization audio window from a 16-hour day-long recording was developed, which was then used to compute canonical babble statistics using human annotation.
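
    A hedged sketch of the windowing step described above (extracting the 5-minute window with the most child vocalization from a day-long recording), assuming the diarizer has already produced per-second child-vocalization counts; the function and variable names are illustrative:

        import numpy as np

        def best_window(voc_counts_per_sec, window_sec=300):
            """Start and end second of the 5-minute window containing
            the most child vocalization in a day-long recording."""
            counts = np.asarray(voc_counts_per_sec, dtype=float)
            csum = np.concatenate(([0.0], np.cumsum(counts)))       # prefix sums
            window_sums = csum[window_sec:] - csum[:-window_sec]    # every window scored in O(1)
            start = int(np.argmax(window_sums))
            return start, start + window_sec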

    Speech Modeling and Robust Estimation for Diagnosis of Parkinson’s Disease


    Characterisation of disordered auditory processing in adults who present to audiology with hearing difficulties in presence of normal hearing thresholds: Correlation between auditory tests and symptoms

    The diagnosis of auditory processing disorder (APD) remains controversial. Quantifying symptoms in individuals with APD by using validated questionnaires may help better understand the disorder and inform appropriate diagnostic evaluation. Aims: This study was aimed at characterising the symptoms of APD and correlating them with the results of auditory processing (AP) tests. Methods: Phase 1: Normative data for a speech-in-babble test, to be used as part of the APD test battery, were collected for 69 normal volunteers aged 20–57 years. Phase 2: Sixty adult subjects with hearing difficulties and a normal audiogram and 38 healthy age-matched controls completed three validated questionnaires (Amsterdam Inventory for Auditory Disability; Speech, Spatial and Qualities of Hearing Scale; hyperacusis questionnaire) and underwent AP tests, including dichotic digits, frequency and duration patterns, gaps-in-noise, speech-in-babble and suppression of otoacoustic emissions by contralateral noise. The subjects were categorised into the clinical APD group or the clinical non-APD group depending on whether they met the criterion of two failed tests. The questionnaire scores in the three groups were compared. Phase 3: The questionnaire scores were correlated with the APD test results in 58/60 clinical subjects and 38 of the normal subjects. Results: Phase 1: Normative data for the speech-in-babble test yielded an upper cut-off mean value of 4.4 dB for both ears. Phase 2: Adults with APD presented with hearing difficulties in quiet and in noise; difficulties in localising, recognising and detecting sounds; and hyperacusis, with significantly poorer scores compared to clinical non-APD subjects and normal controls. Phase 3: Weak to moderate correlations were noted between the scores of the three questionnaires and the APD tests. Correlations were strongest for the gaps-in-noise, speech-in-babble and dichotic digits tests with all three questionnaires. Conclusions: The three validated questionnaires may help identify adults with normal hearing who need referral for APD assessment.
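
    A minimal, illustrative sketch of the two-failed-tests categorisation and the questionnaire-test correlations described above; the pass/fail encoding and the use of Spearman rank correlation are assumptions, not the study's exact analysis:

        from scipy.stats import spearmanr

        def apd_group(failed_tests):
            """Clinical APD group if at least two auditory processing tests are failed."""
            return "clinical APD" if sum(failed_tests) >= 2 else "clinical non-APD"

        def questionnaire_test_correlation(questionnaire_scores, test_scores):
            """Rank correlation between a questionnaire score and an AP test score across subjects."""
            rho, p_value = spearmanr(questionnaire_scores, test_scores)
            return rho, p_value

        # Hypothetical subject failing gaps-in-noise and speech-in-babble only
        print(apd_group([True, True, False, False, False]))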

    Learning and Retention of Novel Words in Musicians and Non-Musicians: The Impact of Enriched Auditory Experience on Behavioral Performance and Electrophysiologic Measures

    Music training is associated with measurable physiologic changes in the auditory pathway. Benefits of music training have also been demonstrated in the areas of working memory, auditory attention, and speech perception in noise. The purpose of this study was to determine whether long-term auditory experience secondary to music training enhances the ability to detect, learn, and recall new words. Participants consisted of 20 young adult musicians and 20 age-matched non-musicians. In addition to completing word recognition and non-word detection tasks, each participant learned 10 nonsense words in a rapid word-learning task. All tasks were completed in quiet and in multi-talker babble. Next-day retention of the learned words was examined in isolation and in context. Cortical auditory evoked responses to vowel stimuli were recorded to obtain latencies and amplitudes for the N1, P2, and P3a components. Performance was compared across groups and listening conditions. Correlations between the behavioral tasks and the cortical auditory evoked responses were also examined. No differences were found between groups (musicians vs. non-musicians) on any of the behavioral tasks, nor did the groups differ in cortical auditory evoked response latencies or amplitudes, with the exception of P2 latencies, which were significantly longer in musicians than in non-musicians. Performance was significantly poorer in babble than in quiet on word recognition and non-word detection, but not on word learning, learned-word retention, or learned-word detection. CAEP latencies collapsed across group were significantly longer, and amplitudes significantly smaller, in babble than in quiet. P2 latencies in quiet were positively correlated with word recognition in quiet, while P3a latencies in babble were positively correlated with word recognition and learned-word detection in babble. No other significant correlations were observed between CAEPs and performance on behavioral tasks. These results indicate that, for young normal-hearing adults, auditory experience resulting from long-term music training did not provide an advantage for learning new information in either favorable (quiet) or unfavorable (babble) listening conditions. Results of the present study suggest that the relationship between music training and the strength of cortical auditory evoked responses may be more complex, or too weak, to be observed in this population.
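
    For the CAEP measures above (N1, P2, and P3a latencies and amplitudes), a common approach is to pick the peak within a component-specific search window of the averaged waveform; the sketch below is a generic illustration with assumed window bounds, not the study's measurement procedure:

        import numpy as np

        def peak_measures(waveform, times_ms, window_ms, positive=True):
            """Peak amplitude and latency of an evoked component inside a search
            window (e.g. roughly 150-250 ms for P2); set positive=False for
            negative-going components such as N1."""
            mask = (times_ms >= window_ms[0]) & (times_ms <= window_ms[1])
            seg_wave, seg_times = waveform[mask], times_ms[mask]
            idx = np.argmax(seg_wave) if positive else np.argmin(seg_wave)
            return seg_wave[idx], seg_times[idx]   # amplitude (µV), latency (ms)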

    Voice and speech perception in autism: a systematic review

    Autism spectrum disorders (ASD) are characterized by persistent impairments in social communication and interaction, and restricted and repetitive behavior. In the original description of autism by Kanner (1943), the presence of emotional impairments was already emphasized (self-absorbed, emotionally cold, distanced, and retracted). However, little research has focused on auditory perception of vocal emotional cues; audio-visual comprehension has more commonly been explored instead. Like faces, voices play an important role in the social interaction contexts in which individuals with ASD show impairments. The aim of the current systematic review was to integrate evidence from behavioral and neurobiological studies for a more comprehensive understanding of voice processing abnormalities in ASD. Among the different types of information that the human voice may provide, we hypothesize particular deficits in the processing of vocal affect information by individuals with ASD. The relationship between vocal stimuli impairments and disrupted Theory of Mind in autism is discussed. Moreover, because ASD is characterized by deficits in social reciprocity, the abnormal oxytocin system in individuals with ASD is further discussed as a possible biological marker for abnormal vocal affect information processing and social interaction skills in the ASD population.