433 research outputs found

    Speaking Rate Effects on Locus Equation Slope

    A locus equation describes a first-order regression fit to a scatter of vowel steady-state frequency values used to predict vowel onset frequency values. Locus equation coefficients are often interpreted as indices of coarticulation. Speaking rate variations with a constant consonant–vowel form are thought to induce changes in the degree of coarticulation. In the current work, the hypothesis that locus slope is a transparent index of coarticulation is examined through the analysis of acoustic samples of large-scale, nearly continuous variations in speaking rate. Following the methodological conventions for locus equation derivation, data pooled across ten vowels yield locus equation slopes that are mostly consistent with the hypothesis that locus equations vary systematically with coarticulation. Comparable analyses between different four-vowel pools reveal variations in the locus slope range and changes in locus slope sensitivity to rate change. Analyses across rate but within vowels are substantially less consistent with the locus hypothesis. Taken together, these findings suggest that the practice of vowel pooling exerts a non-negligible influence on locus outcomes. Results are discussed within the context of articulatory accounts of locus equations and the effects of speaking rate change.
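
    The core computation is a simple linear regression of F2 at vowel onset on F2 at the vowel steady state. The sketch below illustrates the fit with made-up F2 values; the variable names and numbers are assumptions for illustration, not data from the study.

        # Sketch of a locus equation: first-order regression predicting F2 at
        # vowel onset from F2 at the vowel steady state (illustrative values only).
        import numpy as np

        f2_steady = np.array([2300.0, 1900.0, 1500.0, 1200.0, 900.0])   # Hz, vowel steady state
        f2_onset  = np.array([2100.0, 1850.0, 1600.0, 1450.0, 1300.0])  # Hz, CV transition onset

        # np.polyfit with degree 1 returns [slope, intercept].
        slope, intercept = np.polyfit(f2_steady, f2_onset, 1)

        # Slopes near 1 are conventionally read as strong coarticulation of the
        # consonant with the vowel, slopes near 0 as weak coarticulation.
        print(f"F2_onset = {slope:.2f} * F2_steady + {intercept:.1f} Hz")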

    Silent speech: restoring the power of speech to people whose larynx has been removed

    Every year, some 17,500 people in Europe and North America lose the power of speech after undergoing a laryngectomy, normally as a treatment for throat cancer. Several research groups have recently demonstrated that it is possible to restore speech to these people by using machine learning to learn the transformation from articulator movement to sound. In our project, articulator movement is captured by a technique developed by our collaborators at Hull University called Permanent Magnet Articulography (PMA), which senses the changes in the magnetic field caused by movements of small magnets attached to the lips and tongue. This solution, however, requires synchronous PMA-and-audio recordings for learning the transformation and, hence, it cannot be applied to people who have already lost their voice. Here we propose to investigate a variant of this technique in which the PMA data are used to drive an articulatory synthesiser, which generates speech acoustics by simulating the airflow through a computational model of the vocal tract. The project goals, participants, current status, and achievements of the project are discussed below.
    Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech.
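
    As a rough illustration of the direct approach described above, the sketch below fits a frame-wise regression from PMA sensor channels to acoustic feature vectors. The array shapes, the ridge regressor, and the placeholder random data are assumptions for illustration; they do not reproduce the project's actual models or recordings.

        # Sketch: learn a frame-wise mapping from magnetic-sensor (PMA) channels
        # to acoustic features, using placeholder data in place of real recordings.
        import numpy as np
        from sklearn.linear_model import Ridge

        n_frames, n_pma_channels, n_acoustic_dims = 1000, 9, 25

        # Synchronous PMA and acoustic frames are required for training;
        # random arrays stand in for them here.
        pma = np.random.randn(n_frames, n_pma_channels)
        acoustic = np.random.randn(n_frames, n_acoustic_dims)

        model = Ridge(alpha=1.0).fit(pma, acoustic)

        # At run time, new PMA frames are mapped to acoustic frames, which a vocoder
        # (or, in the proposed variant, an articulatory synthesiser) turns into sound.
        predicted = model.predict(np.random.randn(10, n_pma_channels))
        print(predicted.shape)  # (10, 25)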

    Speech synthesis, Speech simulation and speech science

    Speech synthesis research has been transformed in recent years through the exploitation of speech corpora, both for statistical modelling and as a source of signals for concatenative synthesis. This methodological revolution and the new techniques it brings call into question the received wisdom that better computer voice output will come from a better understanding of how humans produce speech. This paper discusses the relationship between this new technology of simulated speech and the traditional aims of speech science. The paper suggests that the goal of speech simulation frees engineers from inadequate linguistic and physiological descriptions of speech. But at the same time, it leaves speech scientists free to return to their proper goal of building a computational model of human speech production.

    Book Notice: Taylor, Paul - Text-to-Speech Synthesis

    Published or submitted for publication; peer reviewed.

    Articulatory Synthesis for Data Augmentation in Phoneme Recognition

    While numerous studies on automatic speech recognition have been published in recent years describing data augmentation strategies based on time- or frequency-domain signal processing, few works exist on the artificial extension of training data sets using purely synthetic speech data. In this work, the German KIEL corpus was augmented with synthetic data generated with the state-of-the-art articulatory synthesizer VOCALTRACTLAB. It is shown that the additional synthetic data can lead to significantly better performance in single-phoneme recognition in certain cases, while at the same time, the performance can also decrease in other cases, depending on the degree of acoustic naturalness of the synthetic phonemes. As a result, this work can potentially guide future studies to improve the quality of articulatory synthesis via the link between synthetic speech production and automatic speech recognition.
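
    The augmentation step itself amounts to pooling real and synthetic phoneme tokens into one training set. The sketch below shows that pooling with placeholder feature vectors and a generic classifier; the feature dimensionality, classifier, and data are assumptions, not the paper's setup.

        # Sketch: augment a real phoneme training set with synthetic tokens and
        # train a classifier on the pooled data (placeholder features and labels).
        import numpy as np
        from sklearn.neural_network import MLPClassifier

        n_real, n_synth, n_features, n_phonemes = 500, 200, 39, 10

        X_real  = np.random.randn(n_real,  n_features)
        y_real  = np.random.randint(0, n_phonemes, n_real)
        X_synth = np.random.randn(n_synth, n_features)   # tokens from an articulatory synthesizer
        y_synth = np.random.randint(0, n_phonemes, n_synth)

        # Pooled training set: real corpus data plus synthetic tokens.
        X_train = np.vstack([X_real, X_synth])
        y_train = np.concatenate([y_real, y_synth])

        clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300).fit(X_train, y_train)

        # Whether the extra tokens help or hurt depends, as noted above, on how
        # acoustically natural the synthetic phonemes are.
        print(clf.score(X_real, y_real))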

    Asymmetry in vowel perception in L1: evidence from articulatory synthesis of an [i]-[e] continuum

    Also available at: http://www.geocities.com/ch_karypidis/docs/conferences/Karypidis_et_alii_AISV2005_en.pdf
    For the past 25 years, a debate on whether vowel discrimination is affected by stimulus presentation order has been raised, and the role of peripheral vowels in our perception has been under careful examination. In earlier studies, the method used to synthesize a vowel continuum has been to fragment, at equidistant points, the F1/F2 Euclidean distance between two prototypes (best exemplars of two different vowel categories). Nonetheless, the resulting sounds were rather unrealistic, inasmuch as some of them were assigned formant-value combinations that cannot be produced by a human vocal tract. Furthermore, the assignment of fixed F3 and F4 values generated a false spectral peak (around 3100 Hz and thus close to that of [i]) which induced the listeners to identify more [i]'s than they should have. Evidence from a recent study on vowel prototypes suggests that [i] has a very narrow perception zone, despite its acoustic stability and peripherality and notwithstanding the absence of a mid-close [e] in the system. Bearing these methodological inconsistencies in mind, we opted to prepare our stimuli using articulatory synthesis. We therefore synthesized a prototypic French [i] (stimulus no. 1, the most extreme) and then modified its parameters (jaw height and tongue position), gradually and in 9 steps, towards a prototypic French [e] (stimulus no. 10, the least extreme). We subsequently submitted the 10-vowel continuum to 34 native French listeners by conducting: a) an identification test in which listeners were requested to identify as [i] or [e] seven repetitions of each stimulus, presented in random order; and b) a discrimination test in which listeners were presented with 34 stimulus combinations [18 one-step pairs (9 stimulus combinations, 2 orders) and 16 two-step pairs (8 stimulus combinations, 2 orders)] and were asked whether the two vowels were the same or different. The ISI (Inter-Stimulus Interval) was fixed at 250 ms and every pair was presented five times. Results from the identification test reveal a clear quantal perception of the two categories. The discrimination results demonstrate that: a) discrimination is more difficult when a more extreme (on the F2' dimension) stimulus is presented second, and b) discrimination is significantly easier in the 2-step condition, in both orders of presentation.
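
    The continuum construction described above amounts to stepping the synthesizer's articulatory parameters from the [i] prototype to the [e] prototype in equal increments. The sketch below shows such an interpolation; the parameter names and values are assumptions for illustration, not the settings used in the study.

        # Sketch: build a 10-step articulatory continuum by linear interpolation
        # between two prototype parameter settings (illustrative values only).
        params_i = {"jaw_height": 0.90, "tongue_position": 0.95}  # prototypic [i] (assumed)
        params_e = {"jaw_height": 0.60, "tongue_position": 0.80}  # prototypic [e] (assumed)

        n_stimuli = 10
        continuum = []
        for k in range(n_stimuli):
            t = k / (n_stimuli - 1)   # 0.0 for stimulus 1 ([i]), 1.0 for stimulus 10 ([e])
            continuum.append({name: (1 - t) * params_i[name] + t * params_e[name]
                              for name in params_i})

        for idx, step in enumerate(continuum, start=1):
            print(idx, step)   # each step would be rendered by the articulatory synthesizer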

    Palate-referenced Articulatory Features for Acoustic-to-Articulator Inversion

    The selection of effective articulatory features is an important component of tasks such as acoustic-to-articulator inversion and articulatory synthesis. Although it is common to use direct articulatory sensor measurements as feature variables, this approach fails to incorporate important physiological information such as palate height and shape, and thus is not as representative of the vocal tract cross-section as desired. We introduce a set of articulatory feature variables that are palate-referenced and normalized with respect to the articulatory working space in order to improve the quality of the vocal tract representation. These features include the normalized horizontal positions plus the normalized palatal height of two midsagittal and one lateral tongue sensor, as well as normalized lip separation and lip protrusion. The quality of the feature representation is evaluated subjectively by comparing the variances and vowel separation in the working space, and quantitatively through measurement of acoustic-to-articulator inversion error. Results indicate that the palate-referenced features have reduced variance, increased separation between vowel spaces, and substantially lower inversion error than direct sensor measures.
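
    For one midsagittal tongue sensor, the palate-referenced features reduce to the sensor's horizontal position and its distance below the palate, each rescaled by the speaker's articulatory working-space range. The sketch below illustrates that computation; the palate trace, coordinate ranges, and variable names are assumptions for illustration only.

        # Sketch: palate-referenced, range-normalized features for one tongue sensor.
        import numpy as np

        # Assumed midsagittal palate trace: horizontal position (mm) -> palate height (mm).
        palate_x = np.linspace(-40.0, 10.0, 50)
        palate_y = 20.0 - 0.01 * (palate_x + 15.0) ** 2

        def palate_referenced(x, y, x_min=-40.0, x_max=10.0, gap_max=15.0):
            """Return (normalized horizontal position, normalized palatal height)."""
            palate_here = np.interp(x, palate_x, palate_y)   # palate height above this sensor
            gap = palate_here - y                            # sensor's distance below the palate
            x_norm = (x - x_min) / (x_max - x_min)           # 0..1 over the working space
            gap_norm = np.clip(gap / gap_max, 0.0, 1.0)
            return x_norm, gap_norm

        print(palate_referenced(x=-20.0, y=12.0))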

    Model-based exploration of linking between vowel articulatory space and acoustic space

    While the acoustic vowel space has been extensively studied in previous research, little is known about the high-dimensional articulatory space of vowels. Articulatory imaging techniques are limited to tracking only a few key articulators, leaving the rest of the articulators unmonitored. In the present study, we attempted to develop a detailed articulatory space by training a 3D articulatory synthesizer to learn eleven British English vowels. An analysis-by-synthesis strategy was used to acoustically optimize vocal tract parameters representing twenty articulatory dimensions. The results show that tongue height and retraction, larynx location, and lip roundness are the most perceptually distinctive articulatory dimensions. Yet, even for these dimensions, there is a fair amount of articulatory overlap between vowels, unlike in the fine-grained acoustic space. This method opens up the possibility of using modelling to investigate the link between speech production and perception.
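
    The analysis-by-synthesis strategy can be pictured as a search over articulatory parameters that minimizes the distance between synthesized and target acoustics. The sketch below uses a toy forward model in place of the 3D synthesizer; the parameter count, the formant targets, and the cost function are assumptions for illustration only.

        # Sketch: analysis-by-synthesis as acoustic optimization of articulatory
        # parameters, with a toy forward model standing in for a real synthesizer.
        import numpy as np
        from scipy.optimize import minimize

        def synthesize_formants(params):
            # Toy forward model: maps 20 articulatory parameters to (F1, F2) in Hz.
            f1 = 300.0 + 500.0 * abs(np.tanh(params[:10].mean()))
            f2 = 900.0 + 1500.0 * abs(np.tanh(params[10:].mean()))
            return np.array([f1, f2])

        target = np.array([300.0, 2250.0])   # assumed formant targets for an [i]-like vowel

        def acoustic_cost(params):
            return float(np.sum((synthesize_formants(params) - target) ** 2))

        result = minimize(acoustic_cost, x0=np.zeros(20), method="Nelder-Mead",
                          options={"maxiter": 4000})
        print(synthesize_formants(result.x), result.fun)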

    Praat Tutorial: Goldman
