11,865 research outputs found

    Effect of Changing the Vocal Tract Shape on the Sound Production of the Recorder: An Experimental and Theoretical Study

    Changing the vocal tract shape is one of the techniques that players of wind instruments can use to modify the quality of the sound. It has been intensely studied in the case of reed instruments but has received little attention in the case of air-jet instruments. This paper presents a first study focused on changes in the vocal tract shape in recorder playing techniques. Measurements carried out with recorder players make it possible to identify techniques involving changes of the mouth shape, as well as their consequences on the sound. A second experiment, performed in the laboratory, mimics the coupling with the vocal tract on an artificial mouth. The phase of the transfer function between the instrument and the mouth of the player is identified as the relevant parameter of the coupling. It is shown to have consequences on the spectral content in terms of energy distribution among the even and odd harmonics, as well as on the stability of the first two oscillating regimes. The results gathered from the two experiments make it possible to develop a simplified model of sound production that includes the effect of changing the vocal tract shape. It is based on the modification of the jet instabilities due to the pulsating emerging jet. Two kinds of instabilities, symmetric and anti-symmetric with respect to the stream axis, are controlled by the coupling with the vocal tract and the acoustic oscillation within the pipe, respectively. The symmetry properties of the flow are mapped onto the temporal formulation of the source term, predicting a change in the even/odd harmonic energy distribution. The predictions are in qualitative agreement with the experimental observations.

    Parallel Reference Speaker Weighting for Kinematic-Independent Acoustic-to-Articulatory Inversion

    Acoustic-to-articulatory inversion, the estimation of articulatory kinematics from an acoustic waveform, is a challenging but important problem. Accurate estimation of articulatory movements has the potential for significant impact on our understanding of speech production, on our capacity to assess and treat pathologies in a clinical setting, and on speech technologies such as computer-aided pronunciation assessment and audio-video synthesis. However, because of the complex and speaker-specific relationship between articulation and acoustics, existing approaches for inversion do not generalize well across speakers. As acquiring speaker-specific kinematic data for training is not feasible in many practical applications, this remains an important and open problem. This paper proposes a novel approach to acoustic-to-articulatory inversion, Parallel Reference Speaker Weighting (PRSW), which requires no kinematic data for the target speaker and only a small amount of acoustic adaptation data. PRSW hypothesizes that acoustic and kinematic similarities are correlated and uses speaker-adapted articulatory models derived from acoustically derived weights. The system was assessed using a 20-speaker data set of synchronous acoustic and Electromagnetic Articulography (EMA) kinematic data. Results demonstrate that by restricting the reference group to a subset consisting of speakers with strong individual speaker-dependent inversion performance, the PRSW method is able to attain kinematic-independent acoustic-to-articulatory inversion performance nearly matching that of the speaker-dependent model, with an average correlation of 0.62 versus 0.63. This indicates that given a sufficiently complete and appropriately selected reference speaker set for adaptation, it is possible to create effective articulatory models without kinematic training data.

    Involvement of the cortico-basal ganglia-thalamocortical loop in developmental stuttering

    Stuttering is a complex neurodevelopmental disorder that has to date eluded a clear explication of its pathophysiological bases. In this review, we utilize the Directions Into Velocities of Articulators (DIVA) neurocomputational modeling framework to mechanistically interpret relevant findings from the behavioral and neurological literatures on stuttering. Within this theoretical framework, we propose that the primary impairment underlying stuttering behavior is malfunction in the cortico-basal ganglia-thalamocortical (hereafter, cortico-BG) loop that is responsible for initiating speech motor programs. This theoretical perspective predicts three possible loci of impaired neural processing within the cortico-BG loop that could lead to stuttering behaviors: impairment within the basal ganglia proper; impairment of axonal projections between cerebral cortex, basal ganglia, and thalamus; and impairment in cortical processing. These theoretical perspectives are presented in detail, followed by a review of empirical data that make reference to these three possibilities. We also highlight any differences that are present in the literature based on examining adults versus children, which give important insights into potential core deficits associated with stuttering versus compensatory changes that occur in the brain as a result of having stuttered for many years in the case of adults who stutter. We conclude with outstanding questions in the field and promising areas for future studies that have the potential to further advance mechanistic understanding of neural deficits underlying persistent developmental stuttering. R01 DC007683 - NIDCD NIH HHS; R01 DC011277 - NIDCD NIH HHS. Published version.

    A Vowel Analysis of the Northwestern University-Children's Perception of Speech Evaluation Tool

    In an analysis of the speech perception evaluation tool, the Northwestern University – Children’s Perception of Speech (NU-CHIPS) test, the goal was to determine whether the foil words and the target word were phonemically balanced across each page of test Book A, as it corresponds to the target words presented in Test Form 1 and Test Form 2 independently. Based on vowel sounds alone, the vowels that appear on a test page vary on the majority of pages. The corresponding formant frequencies, at all three resonance levels for both the average adult male speaker and the average adult female speaker, revealed that the target word could be easily distinguished from the foil words on the basis of percent differences calculated between the formants of the target vowel and the foil vowels. For the population of children with hearing impairments, especially those with limited or no access to the high frequencies, the NU-CHIPS evaluation tool may not be the best indicator of the child’s speech perception ability due to significant vowel variations.

    Physiologically-Motivated Feature Extraction Methods for Speaker Recognition

    Speaker recognition has received a great deal of attention from the speech community, and significant gains in robustness and accuracy have been obtained over the past decade. However, the features used for identification are still primarily representations of overall spectral characteristics, and thus the models are primarily phonetic in nature, differentiating speakers based on overall pronunciation patterns. This creates difficulties in terms of the amount of enrollment data and the complexity of the models required to cover the phonetic space, especially in tasks such as identification where enrollment and testing data may not have similar phonetic coverage. This dissertation introduces new features based on vocal source characteristics intended to capture physiological information related to the laryngeal excitation energy of a speaker. These features, including RPCC, GLFCC and TPCC, represent the unique characteristics of speech production not represented in current state-of-the-art speaker identification systems. The proposed features are evaluated through three experimental paradigms: cross-lingual speaker identification, cross song-type avian speaker identification and mono-lingual speaker identification. The experimental results show that the proposed features provide information about speaker characteristics that is significantly different in nature from the phonetically-focused information present in traditional spectral features. The incorporation of the proposed glottal source features offers significant overall improvement to the robustness and accuracy of speaker identification tasks.

    Speaker Independent Acoustic-to-Articulatory Inversion

    Acoustic-to-articulatory inversion, the determination of articulatory parameters from acoustic signals, is a difficult but important problem for many speech processing applications, such as automatic speech recognition (ASR) and computer-aided pronunciation training (CAPT). In recent years, several approaches have been successfully implemented for speaker-dependent models with parallel acoustic and kinematic training data. However, in many practical applications inversion is needed for new speakers for whom no articulatory data is available. To address this problem, this dissertation introduces a novel speaker adaptation approach called Parallel Reference Speaker Weighting (PRSW), based on parallel acoustic and articulatory Hidden Markov Models (HMM). This approach uses a robust normalized articulatory space and palate-referenced articulatory features combined with speaker-weighted adaptation to form an inversion mapping for new speakers that can accurately estimate articulatory trajectories. The proposed PRSW method is evaluated on the newly collected Marquette electromagnetic articulography - Mandarin Accented English (EMA-MAE) corpus using 20 native English speakers. Cross-speaker inversion results show that given a good selection of reference speakers with consistent acoustic and articulatory patterns, the PRSW approach gives good speaker-independent inversion performance even without kinematic training data.

    Cepstral peak prominence: a comprehensive analysis

    An analytical study of cepstral peak prominence (CPP) is presented, intended to provide insight into its meaning and its relation to voice perturbation parameters. To carry out this analysis, a parametric approach is adopted in which voice production is modelled using the traditional source-filter model and the first cepstral peak is assumed to have a Gaussian shape. It is concluded that the meaning of CPP is very similar to that of the first rahmonic, and some insights are provided on its dependence on fundamental frequency and vocal tract resonances. It is further shown that CPP integrates measures of voice waveform and periodicity perturbations, whether of amplitude, frequency or noise.
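
    For readers unfamiliar with the measure, the following is a minimal sketch of how CPP is commonly computed for a single frame: take the cepstrum of the log-magnitude spectrum, locate the peak in the quefrency range of plausible pitch periods, and measure its height above a regression line fitted to the cepstrum. It is not the parametric analysis of the paper above. The function name, the 60 to 330 Hz search range, the 1 ms regression start, and the synthetic test signals are all assumptions chosen for illustration.

```python
import numpy as np

def cepstral_peak_prominence(frame, fs, f0_min=60.0, f0_max=330.0):
    """Return (CPP in dB, f0 estimate in Hz) for one frame (illustrative sketch)."""
    n = len(frame)
    windowed = frame * np.hanning(n)
    # Log-magnitude spectrum in dB (small offset avoids log of zero).
    log_spectrum_db = 20.0 * np.log10(np.abs(np.fft.fft(windowed)) + 1e-12)
    # Magnitude cepstrum of the log spectrum, also expressed in dB.
    cepstrum_db = 20.0 * np.log10(np.abs(np.fft.ifft(log_spectrum_db)) + 1e-12)
    quefrency = np.arange(n) / fs  # seconds

    # Search for the cepstral peak among plausible pitch periods.
    lo, hi = int(fs / f0_max), int(fs / f0_min)
    peak = lo + int(np.argmax(cepstrum_db[lo:hi + 1]))

    # Regression line over quefrencies from 1 ms (assumed) up to the search limit.
    start = int(0.001 * fs)
    slope, intercept = np.polyfit(quefrency[start:hi + 1],
                                  cepstrum_db[start:hi + 1], 1)
    cpp = cepstrum_db[peak] - (slope * quefrency[peak] + intercept)
    return cpp, 1.0 / quefrency[peak]

# Hypothetical test signals: a 100 Hz harmonic tone and white noise.
fs = 16000
t = np.arange(2048) / fs
rng = np.random.default_rng(0)
voiced = sum(np.sin(2 * np.pi * 100.0 * k * t) / k for k in range(1, 41))
voiced = voiced + 0.01 * rng.standard_normal(t.size)
noise = rng.standard_normal(t.size)

cpp_voiced, f0_voiced = cepstral_peak_prominence(voiced, fs)
cpp_noise, _ = cepstral_peak_prominence(noise, fs)
print(f"voiced: CPP = {cpp_voiced:.1f} dB, f0 = {f0_voiced:.1f} Hz")
print(f"noise:  CPP = {cpp_noise:.1f} dB")
```

    A strongly periodic signal should yield a clearly higher CPP than noise, which is in line with the abstract's point that CPP reflects periodicity perturbations.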