
    A computational model of the relationship between speech intelligibility and speech acoustics

    abstract: Speech intelligibility measures how well a speaker can be understood by a listener. Traditional measures of intelligibility, such as word accuracy, are not sufficient to reveal the reasons for intelligibility degradation. This dissertation investigates the underlying sources of intelligibility degradation from the perspectives of both the speaker and the listener. Segmental phoneme errors and suprasegmental lexical boundary errors are developed to reveal the perceptual strategies of the listener. A comprehensive set of automated acoustic measures is developed to quantify variations in the acoustic signal along three perceptual dimensions: articulation, prosody, and vocal quality. The developed measures have been validated on a dysarthric speech dataset covering a range of severity levels. Multiple regression analysis is employed to show that the developed measures can predict perceptual ratings reliably. The relationship between the acoustic measures and the listening errors is investigated to show the interaction between speech production and perception. The hypothesis is that segmental phoneme errors are mainly caused by imprecise articulation, while suprasegmental lexical boundary errors are due to unreliable phonemic information as well as abnormal rhythm and prosody patterns. To test the hypothesis, within-speaker variations are simulated in different speaking modes. Significant changes are detected in both the acoustic signals and the listening errors. Results of the regression analysis support the hypothesis by showing that changes in articulation-related acoustic features are important in predicting changes in listening phoneme errors, while changes in both articulation- and prosody-related features are important in predicting changes in lexical boundary errors. Moreover, significant correlation is achieved in the cross-validation experiment, indicating that it is possible to predict intelligibility variations from the acoustic signal.
Doctoral Dissertation, Speech and Hearing Science, 201
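The multiple-regression step described above can be sketched as follows. The feature matrix, rating values, and coefficient magnitudes are synthetic stand-ins for illustration, not the dissertation's data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical acoustic measures for 40 utterances (columns stand in
# for articulation, prosody, and vocal-quality scores).
X = rng.normal(size=(40, 3))
# Simulated perceptual intelligibility ratings, driven mostly by the
# articulation column, plus measurement noise.
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=40)

# Ordinary least squares with an intercept term.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

pred = A @ coef
r2 = 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"coefficients={np.round(coef[1:], 2)}  R^2={r2:.3f}")
```

With real data, the fitted coefficients indicate which perceptual dimension carries most of the predictive weight for the ratings.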

    Acoustic measurement of overall voice quality in sustained vowels and continuous speech

    Measurement of dysphonia severity involves auditory-perceptual evaluations and acoustic analyses of sound waves. A meta-analysis of proportional associations between these two methods showed that many popular perturbation metrics, noise-to-harmonics ratios, and other ratios do not yield reasonable results. However, this meta-analysis demonstrated that the validity of specific autocorrelation- and cepstrum-based measures was much more convincing, and identified 'smoothed cepstral peak prominence' as the most promising metric of dysphonia severity. Original research confirmed this inferiority of perturbation measures and superiority of cepstral indices in dysphonia measurement of laryngeal-vocal and tracheoesophageal voice samples. However, to be truly representative of daily voice use patterns, measurement of overall voice quality is ideally founded on the analysis of both sustained vowels and continuous speech. A customized method for including both sample types and calculating the multivariate Acoustic Voice Quality Index (AVQI) was constructed for this purpose. An original study of the AVQI revealed acceptable results in terms of initial concurrent validity, diagnostic precision, internal and external cross-validity, and responsiveness to change. It was thus concluded that the AVQI can track changes in dysphonia severity across the voice therapy process. There are many freely and commercially available computer programs and systems for acoustic metrics of dysphonia severity. We investigated agreements and differences between two commonly available programs (i.e., Praat and the Multi-Dimensional Voice Program) and systems. The results indicated that clinicians should not compare frequency perturbation data across systems and programs, or amplitude perturbation data across systems. Finally, acoustic information can also be utilized as a biofeedback modality during voice exercises.
Based on a systematic literature review, it was cautiously concluded that acoustic biofeedback can be a valuable tool in the treatment of phonatory disorders. When applied with caution, acoustic algorithms (particularly cepstrum-based measures and the AVQI) merit a special role in the assessment and/or treatment of dysphonia severity.
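The cepstral peak prominence idea can be sketched as below: a strong peak in the cepstrum at the expected pitch quefrency, measured above a linear trend, separates periodic (voiced) signals from noise. This is a minimal illustration on synthetic signals, not the published smoothed-CPP (CPPS) algorithm, which adds time- and quefrency-domain smoothing; the frame length and pitch search range are assumptions:

```python
import numpy as np

def cepstral_peak_prominence(x, fs, f0_min=60.0, f0_max=330.0):
    """CPP of one frame: height of the cepstral peak in the expected
    pitch range above a linear trend fit to that quefrency band."""
    log_spec = np.log(np.abs(np.fft.rfft(x * np.hanning(len(x)))) + 1e-12)
    cepstrum = np.fft.irfft(log_spec)            # real cepstrum
    quef = np.arange(len(cepstrum)) / fs         # quefrency in seconds
    lo, hi = int(fs / f0_max), int(fs / f0_min)  # pitch search band
    peak = lo + np.argmax(cepstrum[lo:hi])
    slope, intercept = np.polyfit(quef[lo:hi], cepstrum[lo:hi], 1)
    return cepstrum[peak] - (slope * quef[peak] + intercept)

fs = 16000
t = np.arange(800) / fs
# A harmonic complex at 120 Hz stands in for a sustained vowel;
# white noise stands in for a severely dysphonic extreme.
voiced = sum(np.sin(2 * np.pi * 120 * k * t) for k in range(1, 9))
noise = np.random.default_rng(0).normal(size=800)

cpp_voiced = cepstral_peak_prominence(voiced, fs)
cpp_noise = cepstral_peak_prominence(noise, fs)
print(f"CPP voiced={cpp_voiced:.2f}  noise={cpp_noise:.2f}")
```

The voiced frame yields a markedly larger prominence, which is the property that makes cepstral measures robust indices of dysphonia severity.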

    Independent Component Analysis of Event-Related Electroencephalography During Speech and Non-Speech Discrimination: Implications for the Sensorimotor µ Rhythm in Speech Processing

    Background: The functional significance of sensorimotor integration in acoustic speech processing is unclear despite more than three decades of neuroimaging research. Constructivist theories have long speculated that listeners make predictions about articulatory goals functioning to weight sensory analysis toward expected acoustic features (e.g., analysis-by-synthesis; internal models). Direct-realist accounts posit that sensorimotor integration is achieved via a direct match between incoming acoustic cues and articulatory gestures. A method capable of favoring one account over the other requires an ongoing, high-temporal-resolution measure of sensorimotor cortical activity prior to and following acoustic input. Although scalp-recorded electroencephalography (EEG) provides a measure of cortical activity on a millisecond time scale, it has low spatial resolution due to the blurring or mixing of cortical signals on the scalp surface. Recently proposed solutions to the low spatial resolution of EEG, known as blind source separation (BSS) algorithms, have made the identification of distinct cortical signals possible. The µ rhythm of the EEG is known to briefly suppress (i.e., decrease in spectral power) over the sensorimotor cortex during the performance, imagination, and observation of biological movements, suggesting that it may provide a sensitive index of sensorimotor integration during speech processing. Neuroimaging studies have traditionally investigated speech perception in two-alternative forced-choice designs in which participants discriminate between pairs of speech and nonspeech control stimuli.
As such, this classical design was employed in the current dissertation work to address the following specific aims: 1) isolate independent components with traditional EEG signatures within the dorsal sensorimotor stream network; 2) identify components with features of the sensorimotor µ rhythm; and 3) investigate changes in time-frequency activation of the µ rhythm relative to stimulus type, onset, and discriminability (i.e., perceptual performance). In light of constructivist predictions, it was hypothesized that the µ rhythm would show significant suppression for syllable stimuli prior to and following stimulus onset, with significant differences between correct discrimination trials and those discriminated at chance levels. Methods: The current study employed millisecond-temporal-resolution EEG to measure ongoing decreases and increases in spectral power (event-related spectral perturbations; ERSPs) prior to, during, and after the onset of acoustic speech and tone-sweep stimuli embedded in white noise. Sixteen participants were asked to passively listen to or actively identify speech and tone signals in a two-alternative forced-choice same/different discrimination task. To investigate the role of ERSPs in perceptual identification performance, high signal-to-noise ratios (SNRs) in which speech and tone identification was significantly better than chance (+4 dB) and low SNRs in which performance was below chance (-6 dB and -18 dB) were compared to a baseline of passive noise. Independent component analysis (ICA) of the EEG was used to reduce artifact and source mixing due to volume conduction. Independent components were clustered using measure product methods and cortical source modeling, including spectra, scalp distribution, equivalent current dipole estimation (ECD), and standardized low-resolution brain electromagnetic tomography (sLORETA).
Results: Data analysis revealed six component clusters consistent with a bilateral dorsal-stream sensorimotor network, including component clusters localized to the precentral and postcentral gyri, cingulate cortex, supplementary motor area, and posterior temporal regions. Time-frequency analysis of the left and right lateralized µ component clusters revealed significant (pFDR < .05) suppression in the traditional beta frequency range (13-30 Hz) prior to, during, and following stimulus onset. No significant differences from baseline were found for passive listening conditions. Tone discrimination differed from passive noise in the time period following stimulus onset only. No significant differences were found for correct relative to chance tone stimuli. For both left and right lateralized clusters, early suppression (i.e., prior to stimulus onset) compared to the passive noise baseline was found for the syllable discrimination task only. Significant differences between correct trials and trials identified at chance level were found in the time period following stimulus offset for the syllable discrimination task in the left lateralized cluster. Conclusions: As this is the first study to employ BSS methods to isolate components of the EEG during acoustic speech and non-speech discrimination, the findings have important implications for the functional role of sensorimotor integration in speech processing. Consistent with expectations, the current study revealed component clusters associated with source models within the sensorimotor dorsal stream network. Beta suppression of the µ component clusters in both the left and right hemispheres is consistent with activity in the precentral gyrus prior to and following acoustic input. As early suppression of the µ was found prior to stimulus onset in the syllable discrimination task, the present findings favor internal model concepts of speech processing over mechanisms proposed by direct realists.
Significant differences between correct and chance syllable discrimination trials are also consistent with internal model concepts, suggesting that sensorimotor integration is related to perceptual performance at the point in time when initial articulatory hypotheses are compared with acoustic input. The relatively inexpensive, noninvasive EEG methodology used in this study may have translational value in the future as a brain-computer interface (BCI) approach. As deficits in sensorimotor integration are thought to underlie cognitive-communication impairments in a number of communication disorders, the development of neuromodulatory feedback approaches may provide a novel avenue for augmenting current therapeutic protocols.
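The ERSP measure central to this design can be illustrated on synthetic data: short-time spectral power, averaged over trials and expressed in dB relative to the mean pre-stimulus baseline, so that suppression (desynchronization) shows up as negative values. The sampling rate, window length, and simulated beta suppression below are hypothetical, not the study's parameters:

```python
import numpy as np

def ersp(trials, fs, baseline_s, nperseg=64):
    """Trial-averaged short-time power in dB relative to the mean
    pre-stimulus baseline.  trials: (n_trials, n_samples)."""
    hop, win = nperseg // 2, np.hanning(nperseg)
    n_frames = (trials.shape[1] - nperseg) // hop + 1
    power = np.zeros((n_frames, nperseg // 2 + 1))
    for trial in trials:
        for i in range(n_frames):
            seg = trial[i * hop:i * hop + nperseg] * win
            power[i] += np.abs(np.fft.rfft(seg)) ** 2
    power /= len(trials)
    times = (np.arange(n_frames) * hop + nperseg / 2) / fs  # frame centers
    base = power[times < baseline_s].mean(axis=0)           # baseline power per freq
    return times, 10 * np.log10(power / base)

fs, onset = 256, 256                      # 1 s baseline, 1 s post-stimulus
t = np.arange(512) / fs
rng = np.random.default_rng(0)
trials = 0.3 * rng.normal(size=(30, 512))
# Simulate beta-band suppression: 20 Hz activity present only before onset.
trials[:, :onset] += np.sin(2 * np.pi * 20 * t[:onset])

times, db = ersp(trials, fs, baseline_s=1.0)
beta_bin = 5                              # 20 Hz at fs/nperseg = 4 Hz per bin
print(f"post-stimulus 20 Hz change: {db[-1, beta_bin]:.1f} dB")
```

The strongly negative post-stimulus value at the 20 Hz bin is the signature that, in the real data, is read as event-related desynchronization of the µ rhythm.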

    An Investigation of Auditory and Visual Temporal Processing in Children with Reading Disorders

    Several lines of research have revealed a relationship between reading disorders (RD) and auditory temporal processing deficits. That is, subtle yet rapid changes within an acoustic message are more difficult for individuals with RD to perceive than for individuals with normal reading abilities, which negatively impacts accurate speech perception and, in turn, phonological processing and decoding abilities (Cestnick & Jerger, 2000; De Jong et al., 2000; Fink et al., 2006; Walker et al., 2006). However, researchers investigating a pansensory temporal processing deficit theory of RD have found conflicting evidence for the relationship between visual temporal processing and reading, specifically with regard to the magnocellular deficit theory of dyslexia (Chase & Jenner, 1993; Farmer & Klein, 1993; Lehmkuhle et al., 1993; Lovegrove, 1993). The purpose of the current study was to further investigate the relationship between pansensory processing deficits and subtypes of reading disorders. Participants included 27 children (ages 10-13) divided into three reading ability groups (i.e., normal reading, dysphonetic, and dysphoneidetic) based on performance on the WRMT-R and the Word/Nonword Test. Experimental tasks included gap detection, duration discrimination, and duration temporal order judgment tasks presented in both the auditory and visual modalities. When controlling for verbal ability (PPVT-IV), due to significant group differences, both RD groups (dysphonetic and dysphoneidetic deficits) demonstrated poorer performance than the control group on both the within- and between-channel gap paradigms of the auditory gap detection task. No significant differences were found between normal, dysphonetic, and dysphoneidetic readers on any of the visual temporal processing tasks. The current study failed to support the pansensory deficit theory of RD when reading groups were dichotomized across experimental tasks.
However, when reading abilities were considered as a continuum, several significant correlations were found between performance on the auditory and visual experimental tasks and standardized reading decoding measures, suggesting that pansensory temporal processing is strongly associated with reading abilities. Results suggest that auditory temporal processing abilities are closely linked to phonological decoding skills, in addition to sight-word recognition abilities, for young adolescents with reading disorders.
Ph.D.
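Gap-detection thresholds of the kind tested here are commonly estimated with an adaptive staircase. The sketch below is a generic 2-down/1-up procedure run against a simulated listener; it is an illustration of the task logic, not necessarily the procedure this study used, and the threshold, step size, and psychometric slope are invented:

```python
import numpy as np

rng = np.random.default_rng(1)
true_threshold_ms = 5.0          # hypothetical listener's gap threshold

def listener_detects(gap_ms):
    # Logistic psychometric function centered on the true threshold.
    p = 1.0 / (1.0 + np.exp(-(gap_ms - true_threshold_ms) / 0.5))
    return rng.random() < p

gap, step = 20.0, 2.0            # start well above threshold
reversals, correct_run, direction = [], 0, -1
while len(reversals) < 8:
    if listener_detects(gap):
        correct_run += 1
        if correct_run == 2:     # 2-down: two correct -> harder (shorter gap)
            correct_run = 0
            if direction == +1:
                reversals.append(gap)
            direction = -1
            gap = max(gap - step, 0.5)
    else:
        correct_run = 0          # 1-up: one miss -> easier (longer gap)
        if direction == -1:
            reversals.append(gap)
        direction = +1
        gap += step

threshold = np.mean(reversals[2:])   # average the late reversals
print(f"estimated gap-detection threshold: {threshold:.1f} ms")
```

A 2-down/1-up rule converges on the gap duration detected about 70.7% of the time, so the averaged late reversals land near the simulated listener's threshold.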

    Predicting room acoustical behavior with the ODEON computer model


    The effect of multitalker background noise on speech intelligibility in Parkinson's disease and controls

    This study investigated the effect of multi-talker background noise on speech intelligibility in participants with hypophonia due to Parkinson’s disease (PD). Ten individuals with PD and 10 geriatric controls were tested on four speech intelligibility tasks at the single word, sentence, and conversation level in various conditions of background noise. Listeners assessed speech intelligibility using word identification or orthographic transcription procedures. Results revealed non-significant differences between groups when intelligibility was assessed without background noise. PD speech intelligibility decreased significantly relative to controls in the presence of background noise. A phonetic error analysis revealed a distinct error profile for PD speech in background noise. The four most frequent phonetic errors were glottal-null, consonant-null in final position, stop place of articulation, and initial position cluster-singleton. The results demonstrate that individuals with PD have significant and distinctive deficits in speech intelligibility and phonetic errors in the presence of background noise.

    Sensorimotor Modulations by Cognitive Processes During Accurate Speech Discrimination: An EEG Investigation of Dorsal Stream Processing

    Internal models mediate the transmission of information between anterior and posterior regions of the dorsal stream in support of speech perception, though it remains unclear how this mechanism responds to cognitive processes in service of task demands. The purpose of the current study was to identify the influences of attention and working memory on sensorimotor activity across the dorsal stream during speech discrimination, with set size and signal clarity employed to modulate stimulus predictability and the time course of increased task demands, respectively. Independent Component Analysis of 64-channel EEG data identified bilateral sensorimotor mu and auditory alpha components from a cohort of 42 participants, indexing activity from anterior (mu) and posterior (auditory) aspects of the dorsal stream. Time-frequency (ERSP) analysis evaluated task-related changes in focal activation patterns, with phase coherence measures employed to track patterns of information flow across the dorsal stream. ERSP decomposition of mu clusters revealed event-related desynchronization (ERD) in the beta and alpha bands, which was interpreted as evidence of forward (beta) and inverse (alpha) internal modeling across the time course of perception events. Stronger pre-stimulus mu alpha ERD in small set discrimination tasks was interpreted as more efficient attentional allocation due to the reduced sensory search space enabled by predictable stimuli. Mu-alpha and mu-beta ERD in the peri- and post-stimulus periods were interpreted within the framework of Analysis by Synthesis as evidence of working memory activity for stimulus processing and maintenance, with weaker activity in degraded conditions suggesting that covert rehearsal mechanisms are sensitive to the quality of the stimulus being retained in working memory. Similar ERSP patterns across conditions, despite the differences in stimulus predictability and clarity, suggest that subjects may have adapted to the tasks.
In light of this, future studies of sensorimotor processing should consider the ecological validity of the tasks employed, as well as the larger cognitive environment in which tasks are performed. The absence of interpretable patterns of mu-auditory coherence modulation across the time course of speech discrimination highlights the need for more sensitive analyses to probe dorsal stream connectivity.
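One common way to quantify the mu-auditory coherence examined above is the phase-locking value (PLV) across trials: the magnitude of the trial-averaged unit vector of the phase difference between two components. The sketch below runs on synthetic narrow-band signals; the frequencies, trial counts, and component labels are illustrative assumptions, not the study's data or pipeline:

```python
import numpy as np
from scipy.signal import hilbert

def phase_locking_value(a, b):
    """PLV at each time point: |mean over trials of exp(i * phase
    difference)|.  a, b: arrays of shape (n_trials, n_samples)."""
    phase_a = np.angle(hilbert(a, axis=1))
    phase_b = np.angle(hilbert(b, axis=1))
    return np.abs(np.mean(np.exp(1j * (phase_a - phase_b)), axis=0))

rng = np.random.default_rng(2)
n_trials, fs, n_samp = 50, 256, 512
t = np.arange(n_samp) / fs
offsets = rng.uniform(0, 2 * np.pi, n_trials)[:, None]

# "mu" component: 10 Hz with a different phase on every trial.
mu = np.sin(2 * np.pi * 10 * t + offsets)
# Coupled "auditory" component: the same phase plus a fixed lag.
aud_locked = np.sin(2 * np.pi * 10 * t + offsets + 0.5)
# Uncoupled control: an unrelated random phase on every trial.
aud_random = np.sin(2 * np.pi * 10 * t + rng.uniform(0, 2 * np.pi, n_trials)[:, None])

plv_locked = phase_locking_value(mu, aud_locked).mean()
plv_random = phase_locking_value(mu, aud_random).mean()
print(f"PLV locked={plv_locked:.2f}  random={plv_random:.2f}")
```

A PLV near 1 indicates a stable phase relationship across trials (candidate connectivity), while values near 1/sqrt(n_trials) are what chance alone produces, which is why interpretable coherence modulation requires clearly separated values.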