431 research outputs found

    OBJECTIVE AND SUBJECTIVE EVALUATION OF DEREVERBERATION ALGORITHMS

    Reverberation significantly impacts the quality and intelligibility of speech. Several dereverberation algorithms have been proposed in the literature to combat this problem. Most of these algorithms use a single channel and are developed for monaural applications, and as such do not preserve the cues necessary for sound localization. This thesis describes a blind two-channel dereverberation technique that improves the quality of speech corrupted by reverberation while preserving the cues that affect localization. The method is based on combining a short-term (2 ms) and a long-term (20 ms) weighting function of the linear prediction (LP) residual of the input signal. The developed algorithm and other dereverberation algorithms are evaluated objectively and subjectively in terms of sound quality and localization accuracy. The binaural adaptation provides a significant increase in sound quality while removing the loss in localization ability found in the bilateral implementation.
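
    As a rough illustration of the core idea, the sketch below weights the LP residual with short-term (2 ms) and long-term (20 ms) envelope estimates and resynthesizes through the all-pole filter. It is a single-channel toy version under assumed parameters (LP order, blend weight alpha), not the thesis's binaural method.

```python
# Minimal single-channel sketch of LP-residual weighting for dereverberation.
# The 2 ms / 20 ms window lengths follow the abstract; the LP order and the
# blend weight alpha are assumptions.
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.ndimage import uniform_filter1d
from scipy.signal import lfilter

def lpc(x, order):
    """LP coefficients a (a[0] = 1) via the autocorrelation method."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = solve_toeplitz((r[:-1], r[:-1]), -r[1:])     # Yule-Walker solve
    return np.concatenate(([1.0], a))

def dereverb(x, fs, order=12, alpha=0.5):
    a = lpc(x, order)
    residual = lfilter(a, [1.0], x)                  # A(z) x -> prediction residual
    short = uniform_filter1d(residual**2, int(0.002 * fs))   # 2 ms envelope
    long_ = uniform_filter1d(residual**2, int(0.020 * fs))   # 20 ms envelope
    w = (alpha * short / (short.max() + 1e-12)
         + (1 - alpha) * long_ / (long_.max() + 1e-12))
    return lfilter([1.0], a, w * residual)           # resynthesize through 1/A(z)
```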

    Predicting Speech Recognition using the Speech Intelligibility Index (SII) for Cochlear Implant Users and Listeners with Normal Hearing

    Although the AzBio test is well validated, has effective standardization data available, and is highly recommended for Cochlear Implant (CI) evaluation, no attempt has been made to derive a Frequency Importance Function (FIF) for its stimuli. In the first phase of this dissertation, we derived FIFs for the AzBio sentence lists using listeners with normal hearing. Traditional procedures described in studies by Studebaker and Sherbecoe (1991) were applied for this purpose. Fifteen participants with normal hearing listened to a large number of AzBio sentences that were high- and low-pass filtered in speech-spectrum-shaped noise at various signal-to-noise ratios. Frequency weights for the AzBio sentences were greatest in the 1.5 to 2 kHz frequency regions, as is the case with other speech materials. A cross-procedure comparison was conducted between the traditional procedure (Studebaker and Sherbecoe, 1991) and the nonlinear optimization procedure (Kates, 2013). Subsequent data analyses provided speech recognition scores for the AzBio sentences in relation to the Speech Intelligibility Index (SII). Our findings provide empirically derived FIFs for the AzBio test that can be used for future studies. It is anticipated that the accuracy of predicting SIIs for CI patients will be improved when using these derived FIFs for the AzBio test. In the second study, the SII for CI recipients was calculated to investigate whether the SII is an effective tool for predicting speech perception performance in a CI population. A total of fifteen CI adults participated. The FIFs obtained from the first study were used to compute the SII in these CI listeners. The obtained SIIs were compared with predicted SIIs using a transfer function curve derived from the first study. Due to the considerably poor hearing and large individual variability in performance in the CI population, the SII failed to predict speech perception performance for this CI group. Other predictive factors that have been associated with speech perception performance were also examined using a multiple regression analysis. Gap detection thresholds and duration of deafness were found to be significant predictive factors. These predictor factors and SIIs are discussed in relation to speech perception performance in CI users.
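
    For reference, the band-audibility form of the SII that such predictions rest on is straightforward to compute. The sketch below uses placeholder importance weights (not the AzBio FIFs derived in this work) and the standard linear mapping of band SNR into audibility.

```python
# Band-audibility form of the SII (ANSI S3.5 style): SII = sum_i I_i * A_i,
# where I_i are frequency-importance weights and A_i maps band SNR into [0, 1].
# The weights and SNRs below are illustrative placeholders.
import numpy as np

def sii(snr_db, importance):
    importance = np.asarray(importance, float)
    importance = importance / importance.sum()       # weights sum to 1
    audibility = np.clip((np.asarray(snr_db, float) + 15.0) / 30.0, 0.0, 1.0)
    return float(np.sum(importance * audibility))

# Example: six bands with made-up importance peaking near 1.5-2 kHz.
print(sii(snr_db=[5, 10, 12, 8, 0, -5],
          importance=[0.05, 0.15, 0.25, 0.30, 0.15, 0.10]))
```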

    Effect of Prolonged Non-Traumatic Noise Exposure on Unvoiced Speech Recognition

    Animal models in the past decade have shown that noise exposure may affect temporal envelope processing at supra-threshold levels while the absolute hearing threshold remains in the normal range. However, human studies have failed to find such effects consistently, owing to poor control of participants’ noise exposure histories and to limited measurement sensitivity. The current study operationally defined non-traumatic noise exposure (NTNE) as noise exposure at dental schools because of its distinctive high-pass spectral feature, non-traumatic nature, and systematic exposure schedule across dental students of different years. Temporal envelope processing was examined through recognition of unvoiced speech interrupted by noise or by silence. The results showed that people who had systematic exposure to dental noise performed more poorly on tasks of temporal envelope processing than unexposed people. The effect of high-frequency NTNE on temporal envelope processing was more robust inside than outside the spectral band of dental noise and was more obvious in conditions that required finer temporal resolution (e.g., faster noise modulation rates) than in those requiring less fine temporal resolution (e.g., slower noise modulation rates). Furthermore, there was a significant performance difference between the exposed and the unexposed groups on tasks of spectral envelope processing at low frequency. Meanwhile, the two groups performed similarly on tasks near threshold. Additional analyses showed that factors such as age, years of musical training, non-dental noise exposure history, and peripheral auditory function were not able to explain the variance in performance on tasks of temporal or spectral envelope processing. The findings from the current study support the general assumptions from animal models of NTNE that temporal and spectral envelope processing issues related to NTNE likely occur in retro-cochlear sites, at supra-threshold levels, and could be easily overlooked by routine clinical audiologic screening.
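
    The interrupted-speech paradigm described above can be sketched as square-wave gating of the signal, with the gaps filled by silence or noise. The modulation rate and duty cycle below are illustrative assumptions.

```python
# Sketch of the interrupted-speech manipulation: gate speech on and off with a
# square wave at a given modulation rate; fill the off intervals with either
# silence or noise. Rate and duty cycle are assumptions, not the study's values.
import numpy as np

def interrupt(speech, fs, rate_hz=8.0, duty=0.5, filler="silence"):
    t = np.arange(len(speech)) / fs
    gate = ((t * rate_hz) % 1.0) < duty              # on/off square wave
    if filler == "noise":
        fill = np.random.randn(len(speech)) * speech.std()
    else:
        fill = np.zeros(len(speech))
    return np.where(gate, speech, fill)
```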

    Comparison of Four Approaches to Age and Gender Recognition for Telephone Applications

    This paper presents a comparative study of four different approaches to automatic age and gender classification using seven classes on a telephony speech task, and compares the results with human performance on the same data. The automatic approaches compared are based on (1) a parallel phone recognizer, derived from an automatic language identification system; (2) a system using dynamic Bayesian networks to combine several prosodic features; (3) a system based solely on linear prediction analysis; and (4) Gaussian mixture models based on MFCCs for separate recognition of age and gender. On average, the parallel phone recognizer performs as well as human listeners do, while losing performance on short utterances. The system based on prosodic features, however, shows very little dependence on the length of the utterance. Index Terms — speech processing, acoustic signal analysis, speaker classification, age, gender
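
    Approach (4) is the most conventional of the four and easy to sketch: one Gaussian mixture model per class over MFCC frames, with classification by total log-likelihood. The feature and model settings below are assumptions, not the paper's configuration.

```python
# Sketch of a GMM-MFCC classifier: fit one GMM per age/gender class, then
# assign a test utterance to the class whose model gives the highest
# average frame log-likelihood.
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_frames(y, sr):
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T    # (frames, 13)

def train(class_frames, n_components=32):
    # class_frames: {class_label: stacked MFCC frames from training utterances}
    return {c: GaussianMixture(n_components, covariance_type="diag").fit(f)
            for c, f in class_frames.items()}

def classify(models, frames):
    return max(models, key=lambda c: models[c].score(frames))
```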

    DEVELOPMENT AND EVALUATION OF ENVELOPE, SPECTRAL AND TIME ENHANCEMENT ALGORITHMS FOR AUDITORY NEUROPATHY

    Auditory neuropathy (AN) is a hearing disorder that reduces the ability to detect temporal cues in speech, leading to degraded speech perception. Traditional amplification and frequency shifting techniques used in modern hearing aids are not suitable to assist individuals with AN due to the unique symptoms that result from the disorder. This study proposes a method that combines speech envelope enhancement and time scaling to exploit the proven benefits of each algorithm. In addition, spectral enhancement is cascaded with envelope and time enhancement to address the poor frequency discrimination in AN. The proposed speech enhancement strategy was evaluated using an AN simulator with normal-hearing listeners under varying degrees of AN severity. The results showed a significant increase in word recognition scores for time scaling plus envelope enhancement over envelope enhancement alone. Furthermore, the addition of spectral enhancement resulted in a further increase in word recognition at profound AN severity.
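
    A minimal sketch of the envelope-plus-time-scaling cascade is shown below, using Hilbert-envelope expansion and an off-the-shelf time stretcher. The expansion exponent and stretch rate are assumptions, and the spectral-enhancement stage is omitted.

```python
# Sketch of envelope enhancement cascaded with time scaling. The expansion
# exponent (power) and stretch rate are assumptions; the study's actual
# algorithms are not reproduced here.
import numpy as np
import librosa
from scipy.signal import hilbert

def enhance_envelope(y, power=1.5):
    env = np.abs(hilbert(y)) + 1e-12                 # Hilbert amplitude envelope
    # Gain < 1 where the envelope is low, ~1 near its peaks: expands contrast.
    return y * (env / env.max()) ** (power - 1.0)

def enhance(y, sr, power=1.5, rate=0.8):
    y = enhance_envelope(y, power)
    return librosa.effects.time_stretch(y, rate=rate)  # rate < 1 slows speech
```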

    Sleep Breath

    Purpose: Diagnosis of obstructive sleep apnea by the gold standard of polysomnography (PSG), or by home sleep testing (HST), requires numerous physical connections to the patient, which may restrict the use of these tools for early screening. We hypothesized that normal and disturbed breathing may be detected by a consumer smartphone without physical connections to the patient, using novel algorithms to analyze ambient sound.
    Methods: We studied 91 patients undergoing clinically indicated PSG. Phase I: In a derivation cohort (n = 32), we placed an unmodified Samsung Galaxy S5 without external microphone near the bed to record ambient sounds. We analyzed 12,352 discrete breath/non-breath sounds (386 per patient), from which we developed algorithms to remove noise and detect breaths as envelopes of spectral peaks. Phase II: In a distinct validation cohort (n = 59), we tested the ability of the acoustic algorithms to detect an AHI ≥ 15 on PSG.
    Results: Smartphone-recorded sound analyses detected the presence, absence, and types of breath sounds. Phase I: In the derivation cohort, spectral analysis identified breaths and apneas with a c-statistic of 0.91, and loud obstruction sounds with a c-statistic of 0.95, on receiver operating characteristic analyses relative to adjudicated events. Phase II: In the validation cohort, automated acoustic analysis provided a c-statistic of 0.87 compared to whole-night PSG.
    Conclusions: Ambient sounds recorded from a smartphone during sleep can identify apnea and abnormal breathing verified on PSG. Future studies should determine whether this approach may facilitate early screening of sleep-disordered breathing (SDB) to identify at-risk patients for definitive diagnosis and therapy.
    Clinical trial: NCT03288376; clinicaltrials.gov
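
    The phrase "breaths as envelopes of spectral peaks" suggests a simple pipeline: band-limit the spectrogram, track its energy envelope, and pick peaks at plausible breathing intervals. The sketch below follows that reading; the band limits and thresholds are assumptions, not the study's algorithm.

```python
# Sketch of breath detection from ambient audio: sum spectrogram energy in an
# assumed breath-sound band, normalize, and pick peaks spaced at breathing
# rates. The 100-1000 Hz band, height, and spacing are assumptions.
import numpy as np
from scipy.signal import spectrogram, find_peaks

def detect_breaths(audio, fs):
    f, t, S = spectrogram(audio, fs, nperseg=1024, noverlap=512)
    band = (f >= 100) & (f <= 1000)
    env = S[band].sum(axis=0)                        # band-energy envelope
    env = env / (np.median(env) + 1e-12)             # crude noise normalization
    frame_rate = 1.0 / (t[1] - t[0])
    peaks, _ = find_peaks(env, height=2.0,
                          distance=int(2.0 * frame_rate))  # >= 2 s apart
    return t[peaks]                                  # breath times in seconds
```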

    Analysis and Detection of Pathological Voice using Glottal Source Features

    Automatic detection of voice pathology enables objective assessment and earlier intervention in diagnosis. This study provides a systematic analysis of glottal source features and investigates their effectiveness in voice pathology detection. Glottal source features are extracted using glottal flows estimated with the quasi-closed phase (QCP) glottal inverse filtering method, using approximate glottal source signals computed with the zero frequency filtering (ZFF) method, and using acoustic voice signals directly. In addition, we propose to derive mel-frequency cepstral coefficients (MFCCs) from the glottal source waveforms computed by QCP and ZFF to effectively capture the variations in the glottal source spectra of pathological voice. Experiments were carried out using two databases, the Hospital Universitario Principe de Asturias (HUPA) database and the Saarbrucken Voice Disorders (SVD) database. Analysis of the features revealed that the glottal source contains information that discriminates normal from pathological voice. Pathology detection experiments were carried out using a support vector machine (SVM). From the detection experiments it was observed that the performance achieved with the studied glottal source features is comparable to or better than that of conventional MFCCs and perceptual linear prediction (PLP) features. The best detection performance was achieved when the glottal source features were combined with the conventional MFCC and PLP features, which indicates the complementary nature of the features.
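
    A simplified version of the feature pipeline is sketched below, substituting plain LP inverse filtering for the QCP and ZFF glottal estimates and feeding mean glottal-source MFCCs to an SVM. It mirrors the paper's back end only in outline.

```python
# Sketch of glottal-source MFCCs for pathology detection. QCP/ZFF are replaced
# by plain LP inverse filtering (a simplifying assumption); the LP-order
# rule of thumb and n_mfcc are also assumptions.
import numpy as np
import librosa
from scipy.signal import lfilter
from sklearn.svm import SVC

def glottal_residual(y, sr, order=None):
    order = order or int(2 + sr / 1000)              # rule-of-thumb LP order
    a = librosa.lpc(y, order=order)
    return lfilter(a, [1.0], y)                      # inverse-filtered source estimate

def features(y, sr):
    g = glottal_residual(y, sr)
    m = librosa.feature.mfcc(y=g, sr=sr, n_mfcc=13)
    return m.mean(axis=1)                            # one vector per utterance

# clf = SVC(kernel="rbf").fit(X_train, y_train)      # X: stacked feature vectors
```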

    Energy flows in gesture-speech physics: The respiratory-vocal system and its coupling with hand gestures

    Expressive moments in communicative hand gestures often align with emphatic stress in speech. It has recently been found that acoustic markers of emphatic stress arise naturally during steady-state phonation when upper-limb movements impart physical impulses on the body, most likely affecting acoustics via respiratory activity. In this confirmatory study, participants (N = 29) repeatedly uttered consonant-vowel (/pa/) monosyllables while moving in particular phase relations with speech, or while not moving the upper limbs. This study shows that respiration-related activity is affected by (especially high-impulse) gesturing when vocalizations occur near peaks in physical impulse. This study further shows that gesture-induced moments of bodily impulse increase the amplitude envelope of speech, while not similarly affecting the fundamental frequency (F0). Finally, tight relations between respiration-related activity and vocalization were observed even in the absence of movement, but even more so when upper-limb movement was present. The current findings expand a developing line of research showing that speech is modulated by functional biomechanical linkages between hand gestures and the respiratory system. This identification of gesture-speech biomechanics promises to provide an alternative phylogenetic, ontogenetic, and mechanistic explanatory route to why communicative upper-limb movements co-occur with speech in humans.
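
    The two acoustic measures at issue, the amplitude envelope (which the impulses modulate) and the F0 track (which they reportedly do not), can be extracted as sketched below; the smoothing width and pitch range are assumptions.

```python
# Sketch of the two acoustic measures: a smoothed Hilbert amplitude envelope
# and a pYIN F0 track. The 15 ms smoothing window and 75-400 Hz pitch range
# are assumptions, not the study's analysis settings.
import numpy as np
import librosa
from scipy.ndimage import uniform_filter1d
from scipy.signal import hilbert

def amplitude_envelope(y, sr, smooth_ms=15):
    env = np.abs(hilbert(y))
    return uniform_filter1d(env, int(sr * smooth_ms / 1000))

def f0_track(y, sr):
    f0, _, _ = librosa.pyin(y, fmin=75, fmax=400, sr=sr)
    return f0                                        # NaN where unvoiced
```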