
    Voice morphing using the generative topographic mapping

    In this paper we address the problem of voice morphing. We attempt to transform the spectral characteristics of a source speaker's speech signal so that the listener would believe that the speech was uttered by a target speaker. The voice morphing system transforms the spectral envelope as represented by a Linear Prediction model. The transformation is achieved by codebook mapping using the Generative Topographic Mapping, a non-linear, latent-variable, parametrically constrained Gaussian Mixture Model.
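    As a rough illustration of codebook mapping between speakers (a hard nearest-neighbour sketch, not the paper's GTM formulation, which yields a soft probabilistic mapping), one might map each source spectral-envelope vector to the target codeword paired with its nearest source codeword. The vector dimensions and codebook contents below are hypothetical:

    ```python
    import numpy as np

    def morph_frame(frame, src_codebook, tgt_codebook):
        """Map one source spectral-envelope vector (e.g. LP-derived
        parameters) to the target speaker's space.

        src_codebook, tgt_codebook: (N, D) arrays of paired codewords
        learned from parallel source/target training speech. This is a
        hard 1-nearest-neighbour mapping; the GTM approach described in
        the abstract replaces it with a constrained Gaussian mixture.
        """
        dists = np.linalg.norm(src_codebook - frame, axis=1)
        return tgt_codebook[np.argmin(dists)]
    ```

    At synthesis time each analysis frame of the source utterance would be mapped this way and the morphed envelope re-imposed on the excitation signal.
    
    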

    Experiments on the DCASE Challenge 2016: Acoustic Scene Classification and Sound Event Detection in Real Life Recording

    In this paper we present our work on Task 1, Acoustic Scene Classification, and Task 3, Sound Event Detection in Real Life Recordings. Our experiments cover low-level and high-level features, classifier optimization, and other heuristics specific to each task. Our performance on both tasks improved on the DCASE baselines: for Task 1 we achieved an overall accuracy of 78.9% compared to the baseline of 72.6%, and for Task 3 we achieved a segment-based error rate of 0.76 compared to the baseline of 0.91.
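    The segment-based error rate used for Task 3 can be sketched as follows, assuming the standard DCASE formulation in which the timeline is split into fixed-length segments and, per segment, substitutions S = min(FN, FP), deletions D = FN − S, and insertions I = FP − S are accumulated and normalized by the number of active reference events N:

    ```python
    def segment_based_error_rate(ref_segments, est_segments):
        """Segment-based error rate for sound event detection.

        ref_segments / est_segments: lists of sets, one set of active
        event labels per fixed-length segment (e.g. 1 s).
        ER = (S + D + I) / N accumulated over all segments.
        """
        S = D = I = N = 0
        for ref, est in zip(ref_segments, est_segments):
            fn = len(ref - est)   # reference events that were missed
            fp = len(est - ref)   # detected events with no reference match
            s = min(fn, fp)       # a miss paired with a false alarm counts
            S += s                # as one substitution
            D += fn - s
            I += fp - s
            N += len(ref)
        return (S + D + I) / N if N else 0.0
    ```

    An ER of 0 means perfect per-segment agreement; values can exceed 1 when insertions outnumber reference events.
    
    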

    Biomechanics of the orofacial motor system: Influence of speaker-specific characteristics on speech production

    Orofacial biomechanics has been shown to influence the time signals of speech production and to impose constraints with which the central nervous system has to contend in order to achieve the goals of speech production. After a short explanation of the concept of biomechanics and its link with the variables usually measured in phonetics, two modeling studies are presented which exemplify the influence of speaker-specific vocal tract morphology and muscle anatomy on speech production. First, speaker-specific 2D biomechanical models of the vocal tract were used that accounted for inter-speaker differences in head morphology. In particular, speakers have different main fiber orientations in the styloglossus muscle. Focusing on the vowel /i/, it was shown that these differences induce speaker-specific susceptibility to changes in this muscle's activation. Second, the study by Stavness et al. (2013) is summarized. These authors investigated the role of potential inter-speaker variability in the implementation of the orbicularis oris muscle with a 3D biomechanical face model. A deeper implementation tends to reduce lip aperture; an increase in peripheralness tends to increase lip protrusion. With these studies, we illustrate the fact that speaker-specific orofacial biomechanics influences the patterns of articulatory and acoustic variability, and the emergence of speech control strategies.

    Towards a Multimodal Silent Speech Interface for European Portuguese

    Automatic Speech Recognition (ASR) in the presence of environmental noise is still a hard problem to tackle in speech science (Ng et al., 2000). Another problem well described in the literature concerns elderly speech production. Studies (Helfrich, 1979) have shown evidence of a slower speech rate, more breaks, more speech errors, and a reduced speech volume when comparing elderly speech with that of teenagers or adults at the acoustic level. This makes elderly speech hard to recognize using currently available stochastic-based ASR technology. To tackle these two problems in the context of ASR for Human-Computer Interaction, a novel Silent Speech Interface (SSI) in European Portuguese (EP) is envisioned.

    Relating Objective and Subjective Performance Measures for AAM-based Visual Speech Synthesizers

    We compare two approaches for synthesizing visual speech using Active Appearance Models (AAMs): one that utilizes acoustic features as input, and one that utilizes a phonetic transcription as input. Both synthesizers are trained using the same data, and their performance is measured using both objective and subjective testing. We investigate the impact of likely sources of error in the synthesized visual speech by introducing typical errors into real visual speech sequences and subjectively measuring the perceived degradation. When only a small region (e.g. a single syllable) of ground-truth visual speech is incorrect, we find that the subjective score for the entire sequence is lower than that of sequences generated by our synthesizers. This observation motivates further consideration of an often ignored issue: to what extent are subjective measures correlated with objective measures of performance? Significantly, we find that the most commonly used objective measures of performance are not necessarily the best indicators of viewer perception of quality. We empirically evaluate alternatives and show that the cost of a dynamic time warp of synthesized visual speech parameters to the respective ground-truth parameters is a better indicator of subjective quality.
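    The DTW-cost measure favoured in the abstract can be sketched with a generic textbook dynamic time warp over parameter trajectories (the authors' exact distance, step pattern, and normalization are assumptions here):

    ```python
    import numpy as np

    def dtw_cost(synth, truth):
        """Length-normalized cost of warping a synthesized visual speech
        parameter trajectory onto the ground truth.

        synth, truth: (T, D) and (U, D) arrays of AAM parameter vectors,
        one row per video frame. Lower cost = closer match after
        allowing local timing differences.
        """
        T, U = len(synth), len(truth)
        acc = np.full((T + 1, U + 1), np.inf)
        acc[0, 0] = 0.0
        for i in range(1, T + 1):
            for j in range(1, U + 1):
                d = np.linalg.norm(synth[i - 1] - truth[j - 1])
                # classic symmetric step pattern: match, insert, or delete
                acc[i, j] = d + min(acc[i - 1, j],
                                    acc[i, j - 1],
                                    acc[i - 1, j - 1])
        return acc[T, U] / (T + U)
    ```

    Unlike a frame-by-frame distance, this remains small when the synthesized sequence is articulated correctly but slightly mistimed, which may be why it tracks subjective judgments better.
    
    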

    Motor Equivalence in Speech Production

    The first section provides a description of the concepts of "motor equivalence" and "degrees of freedom". It is illustrated with a few examples of motor tasks in general and of speech production tasks in particular. In the second section, the methodology used to experimentally investigate motor equivalence phenomena in speech production is presented. It is mainly based on paradigms that perturb the perception-action loop during ongoing speech, either by limiting the degrees of freedom of the speech motor system, by changing the physical conditions of speech production, or by modifying the feedback information. Examples are provided for each of these approaches. Implications of these studies for a better understanding of speech production and its interactions with speech perception are presented in the last section. The implications mainly relate to the characterization of the mechanisms underlying interarticulatory coordination and to the analysis of the goals of speech production.