
11,865 research outputs found

Effect of Changing the Vocal Tract Shape on the Sound Production of the Recorder: An Experimental and Theoretical Study

Author: Auvray R
Ernoult Augustin
Fabre B
Terrien S
Vergez C
Publication venue: 'S. Hirzel Verlag'
Publication date: 01/01/2015

Changing the vocal tract shape is one of the techniques that players of wind instruments can use to modify the quality of the sound. It has been studied intensively in the case of reed instruments but has received little attention in the case of air-jet instruments. This paper presents a first study focused on changes in the vocal tract shape in recorder playing techniques. Measurements carried out with recorder players make it possible to identify techniques involving changes of the mouth shape, as well as their consequences on the sound. A second experiment, performed in the laboratory, mimics the coupling with the vocal tract on an artificial mouth. The phase of the transfer function between the instrument and the mouth of the player is identified as the relevant parameter of the coupling. It is shown to have consequences on the spectral content in terms of energy distribution among the even and odd harmonics, as well as on the stability of the first two oscillating regimes. The results gathered from the two experiments make it possible to develop a simplified model of sound production that includes the effect of changing the vocal tract shape. It is based on the modification of the jet instabilities due to the pulsating emerging jet. Two kinds of instabilities, symmetric and anti-symmetric with respect to the stream axis, are controlled by the coupling with the vocal tract and the acoustic oscillation within the pipe, respectively. The symmetry properties of the flow are mapped onto the temporal formulation of the source term, predicting a change in the even/odd harmonics energy distribution. The predictions are in qualitative agreement with the experimental observations.

arXiv.org e-Print Archive

HAL AMU

Parallel Reference Speaker Weighting for Kinematic-Independent Acoustic-to-Articulatory Inversion

Author: Berry Jeffrey J.
Ji An
Johnson Michael T.
Publication venue: e-Publications@Marquette
Publication date: 01/10/2016

Acoustic-to-articulatory inversion, the estimation of articulatory kinematics from an acoustic waveform, is a challenging but important problem. Accurate estimation of articulatory movements has the potential for significant impact on our understanding of speech production, on our capacity to assess and treat pathologies in a clinical setting, and on speech technologies such as computer aided pronunciation assessment and audio-video synthesis. However, because of the complex and speaker-specific relationship between articulation and acoustics, existing approaches for inversion do not generalize well across speakers. As acquiring speaker-specific kinematic data for training is not feasible in many practical applications, this remains an important and open problem. This paper proposes a novel approach to acoustic-to-articulatory inversion, Parallel Reference Speaker Weighting (PRSW), which requires no kinematic data for the target speaker and a small amount of acoustic adaptation data. PRSW hypothesizes that acoustic and kinematic similarities are correlated and uses speaker-adapted articulatory models derived from acoustically derived weights. The system was assessed using a 20-speaker data set of synchronous acoustic and Electromagnetic Articulography (EMA) kinematic data. Results demonstrate that by restricting the reference group to a subset consisting of speakers with strong individual speaker-dependent inversion performance, the PRSW method is able to attain kinematic-independent acoustic-to-articulatory inversion performance nearly matching that of the speaker-dependent model, with an average correlation of 0.62 versus 0.63. This indicates that given a sufficiently complete and appropriately selected reference speaker set for adaptation, it is possible to create effective articulatory models without kinematic training data

epublications@Marquette

Involvement of the cortico-basal ganglia-thalamocortical loop in developmental stuttering

Author: Chang Soo-Eun
Guenther Frank H.
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2019

Stuttering is a complex neurodevelopmental disorder that has to date eluded a clear explication of its pathophysiological bases. In this review, we utilize the Directions Into Velocities of Articulators (DIVA) neurocomputational modeling framework to mechanistically interpret relevant findings from the behavioral and neurological literatures on stuttering. Within this theoretical framework, we propose that the primary impairment underlying stuttering behavior is malfunction in the cortico-basal ganglia-thalamocortical (hereafter, cortico-BG) loop that is responsible for initiating speech motor programs. This theoretical perspective predicts three possible loci of impaired neural processing within the cortico-BG loop that could lead to stuttering behaviors: impairment within the basal ganglia proper; impairment of axonal projections between cerebral cortex, basal ganglia, and thalamus; and impairment in cortical processing. These theoretical perspectives are presented in detail, followed by a review of empirical data that make reference to these three possibilities. We also highlight any differences that are present in the literature based on examining adults versus children, which give important insights into potential core deficits associated with stuttering versus compensatory changes that occur in the brain as a result of having stuttered for many years in the case of adults who stutter. We conclude with outstanding questions in the field and promising areas for future studies that have the potential to further advance mechanistic understanding of neural deficits underlying persistent developmental stuttering. R01 DC007683 - NIDCD NIH HHS; R01 DC011277 - NIDCD NIH HHS. Published version

Boston University Institutional Repository (OpenBU)

A Vowel Analysis of the Northwestern University-Children's Perception of Speech Evaluation Tool

Author: Zukowski Kassie Nicole
Publication venue: University of New Hampshire Scholars' Repository
Publication date: 01/01/2017

In an analysis of the speech perception evaluation tool, the Northwestern University – Children’s Perception of Speech test, the goal was to determine whether the foil words and the target word were phonemically balanced across each page of test Book A, as it corresponds to the target words presented in Test Form 1 and Test Form 2 independently. Based on vowel sounds alone, variation exists in the vowels that appear on a test page on the majority of pages. The corresponding formant frequencies, at all three resonance levels for both the average adult male speaker and the average adult female speaker, revealed that the target word could be easily distinguished from the foil words on the basis of percent differences calculated between the formants of the target vowel and the foil vowels. For the population of children with hearing impairments, especially those with limited or no access to the high frequencies, the NU-CHIPS evaluation tool may not be the best indicator of the child’s speech perception ability due to significant vowel variations.

UNH Scholars' Repository
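
The abstract above compares target and foil vowels through percent differences between their formants. A minimal sketch of that arithmetic follows. The formula (absolute difference relative to the target formant), the function name `percent_difference`, and the formant values are all assumptions made for illustration; none of them is taken from the thesis.

```python
def percent_difference(target_hz, foil_hz):
    """Absolute difference between two formants, as a percentage of the target.

    Assumed definition; the thesis may normalise differently.
    """
    return abs(foil_hz - target_hz) / target_hz * 100.0


# Placeholder formant values in Hz, chosen only to illustrate the calculation.
target = {"F1": 270.0, "F2": 2290.0, "F3": 3010.0}
foil = {"F1": 390.0, "F2": 1990.0, "F3": 2550.0}

differences = {name: percent_difference(target[name], foil[name]) for name in target}
for name, value in differences.items():
    print(f"{name}: {value:.1f}% difference")
```

Larger percentages at F1 to F3 would suggest that a foil vowel is acoustically easier to tell apart from the target, which is the kind of comparison the abstract describes.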

Physiologically-Motivated Feature Extraction Methods for Speaker Recognition

Author: Wang Jianglin
Publication venue: e-Publications@Marquette
Publication date: 01/10/2013

Speaker recognition has received a great deal of attention from the speech community, and significant gains in robustness and accuracy have been obtained over the past decade. However, the features used for identification are still primarily representations of overall spectral characteristics, and thus the models are primarily phonetic in nature, differentiating speakers based on overall pronunciation patterns. This creates difficulties in terms of the amount of enrollment data and complexity of the models required to cover the phonetic space, especially in tasks such as identification where enrollment and testing data may not have similar phonetic coverage. This dissertation introduces new features based on vocal source characteristics intended to capture physiological information related to the laryngeal excitation energy of a speaker. These features, including RPCC, GLFCC and TPCC, represent the unique characteristics of speech production not represented in current state-of-the-art speaker identification systems. The proposed features are evaluated through three experimental paradigms including cross-lingual speaker identification, cross song-type avian speaker identification and mono-lingual speaker identification. The experimental results show that the proposed features provide information about speaker characteristics that is significantly different in nature from the phonetically-focused information present in traditional spectral features. The incorporation of the proposed glottal source features offers significant overall improvement to the robustness and accuracy of speaker identification tasks

epublications@Marquette

Speaker Independent Acoustic-to-Articulatory Inversion

Author: Ji An
Publication venue: e-Publications@Marquette
Publication date: 01/10/2014

Acoustic-to-articulatory inversion, the determination of articulatory parameters from acoustic signals, is a difficult but important problem for many speech processing applications, such as automatic speech recognition (ASR) and computer aided pronunciation training (CAPT). In recent years, several approaches have been successfully implemented for speaker dependent models with parallel acoustic and kinematic training data. However, in many practical applications inversion is needed for new speakers for whom no articulatory data is available. In order to address this problem, this dissertation introduces a novel speaker adaptation approach called Parallel Reference Speaker Weighting (PRSW), based on parallel acoustic and articulatory Hidden Markov Models (HMM). This approach uses a robust normalized articulatory space and palate referenced articulatory features combined with speaker-weighted adaptation to form an inversion mapping for new speakers that can accurately estimate articulatory trajectories. The proposed PRSW method is evaluated on the newly collected Marquette electromagnetic articulography - Mandarin Accented English (EMA-MAE) corpus using 20 native English speakers. Cross-speaker inversion results show that given a good selection of reference speakers with consistent acoustic and articulatory patterns, the PRSW approach gives good speaker independent inversion performance even without kinematic training data

epublications@Marquette

Cepstral peak prominence: a comprehensive analysis

Author: Abramowitz
Alpan
Awan
Balasubramanium
Blankenship
Cannito
Chen
Childers
Clapham
Dejonckere
Eadie
Esposito
Ferrer
Fraile
Fraj
Haderlein
Halberstam
Hartl
Haykin
Heman-Ackah
Hillenbrand
Howard
Juan Ignacio Godino-Llorente
Kumar
Leong
Lowell
Maryn
Medhurst
Mehta
Merk
Moers
Murphy
Nagle
Noll
Oppenheim
Peterson
Rabiner
Rosa
Rubén Fraile
Samlan
Shanmugan
Shrivastav
Shue
Solomon
Story
Vasilakis
Vipperla
Watts
Wolfe
Yap
Publication venue: 'Elsevier BV'
Publication date: 01/01/2014

An analytical study of cepstral peak prominence (CPP) is presented, intended to provide insight into its meaning and its relation to voice perturbation parameters. To carry out this analysis, a parametric approach is adopted in which voice production is modelled using the traditional source-filter model and the first cepstral peak is assumed to have a Gaussian shape. It is concluded that the meaning of CPP is very similar to that of the first rahmonic, and some insights are provided into its dependence on fundamental frequency and vocal tract resonances. It is further shown that CPP integrates measures of voice waveform and periodicity perturbations, whether these are amplitude, frequency or noise perturbations.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Archivo Digital UPM
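
For readers unfamiliar with the quantity analysed in the CPP entry above, the following sketch shows one common way to compute it for a single frame: take the cepstrum of the log-magnitude spectrum, locate the peak within a plausible pitch range, and measure its height above a regression line. This is not the parametric Gaussian-peak analysis from the paper. The function names, the Hann window, the 60 to 300 Hz search range, and fitting the line over the search range only are all assumptions made for illustration.

```python
import cmath
import math


def dft(values):
    """Naive O(n^2) discrete Fourier transform (adequate for short frames)."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * m / n) for k, v in enumerate(values))
        for m in range(n)
    ]


def linear_fit(xs, ys):
    """Least-squares slope and intercept of ys against xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cepstral_peak_prominence(frame, fs, f0_min=60.0, f0_max=300.0):
    """Return (CPP in dB, F0 estimate in Hz) for one frame sampled at fs Hz."""
    n = len(frame)
    # Hann window to reduce spectral leakage (assumed choice).
    windowed = [
        s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(frame)
    ]
    # Log-magnitude spectrum in dB; the small constant avoids log(0).
    log_spectrum = [20 * math.log10(abs(c) + 1e-12) for c in dft(windowed)]
    # Cepstrum magnitude in dB: transform of the log spectrum.
    cepstrum_db = [20 * math.log10(abs(c) / n + 1e-12) for c in dft(log_spectrum)]
    # Quefrency indices (in samples) that correspond to the allowed F0 range.
    lo, hi = int(fs / f0_max), int(fs / f0_min)
    indices = list(range(lo, hi + 1))
    peak = max(indices, key=lambda i: cepstrum_db[i])
    # Regression line over the search range (assumed fitting range).
    slope, intercept = linear_fit(
        [i / fs for i in indices], [cepstrum_db[i] for i in indices]
    )
    cpp = cepstrum_db[peak] - (slope * peak / fs + intercept)
    return cpp, fs / peak


# Synthetic test signal: an impulse train with a 100 Hz period at 8 kHz.
fs = 8000
frame = [1.0 if i % 80 == 0 else 0.0 for i in range(512)]
cpp, f0 = cepstral_peak_prominence(frame, fs)
print(f"CPP = {cpp:.1f} dB, F0 estimate = {f0:.1f} Hz")
```

An idealised impulse train gives a far larger prominence than real voice recordings would. Practical implementations usually use an FFT and average over many frames.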