
    Voice morphing using the generative topographic mapping

    In this paper we address the problem of voice morphing. We attempt to transform the spectral characteristics of a source speaker's speech signal so that the listener would believe that the speech was uttered by a target speaker. The voice morphing system transforms the spectral envelope as represented by a Linear Prediction model. The transformation is achieved by codebook mapping using the Generative Topographic Mapping, a non-linear, latent-variable, parametrically constrained Gaussian Mixture Model.
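    As a rough illustration of codebook mapping between speakers (a hard nearest-neighbour sketch, not the paper's GTM formulation, which yields a soft probabilistic mapping), one might map each source spectral-envelope vector to the target codeword paired with its nearest source codeword. The vector dimensions and codebook contents below are hypothetical:

    ```python
    import numpy as np

    def morph_frame(frame, src_codebook, tgt_codebook):
        """Map one source spectral-envelope vector (e.g. LP-derived
        parameters) to the target speaker's space.

        src_codebook, tgt_codebook: (N, D) arrays of paired codewords
        learned from parallel source/target training speech. This is a
        hard 1-nearest-neighbour mapping; the GTM approach described in
        the abstract replaces it with a constrained Gaussian mixture.
        """
        dists = np.linalg.norm(src_codebook - frame, axis=1)
        return tgt_codebook[np.argmin(dists)]
    ```

    At synthesis time each analysis frame of the source utterance would be mapped this way and the morphed envelope re-imposed on the excitation signal.
    
    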

    Experiments on the DCASE Challenge 2016: Acoustic Scene Classification and Sound Event Detection in Real Life Recording

    In this paper we present our work on Task 1, Acoustic Scene Classification, and Task 3, Sound Event Detection in Real Life Recordings. Our experiments cover low-level and high-level features, classifier optimization, and other heuristics specific to each task. Our performance on both tasks improved on the DCASE baselines: for Task 1 we achieved an overall accuracy of 78.9% compared to the baseline of 72.6%, and for Task 3 we achieved a segment-based error rate of 0.76 compared to the baseline of 0.91.
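    The segment-based error rate used for Task 3 can be sketched as follows, assuming the standard DCASE formulation in which the timeline is split into fixed-length segments and, per segment, substitutions S = min(FN, FP), deletions D = FN − S, and insertions I = FP − S are accumulated and normalized by the number of active reference events N:

    ```python
    def segment_based_error_rate(ref_segments, est_segments):
        """Segment-based error rate for sound event detection.

        ref_segments / est_segments: lists of sets, one set of active
        event labels per fixed-length segment (e.g. 1 s).
        ER = (S + D + I) / N accumulated over all segments.
        """
        S = D = I = N = 0
        for ref, est in zip(ref_segments, est_segments):
            fn = len(ref - est)   # reference events that were missed
            fp = len(est - ref)   # detected events with no reference match
            s = min(fn, fp)       # a miss paired with a false alarm counts
            S += s                # as one substitution
            D += fn - s
            I += fp - s
            N += len(ref)
        return (S + D + I) / N if N else 0.0
    ```

    An ER of 0 means perfect per-segment agreement; values can exceed 1 when insertions outnumber reference events.
    
    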

    Biomechanics of the orofacial motor system: Influence of speaker-specific characteristics on speech production

    Orofacial biomechanics has been shown to influence the time signals of speech production and to impose constraints with which the central nervous system has to contend in order to achieve the goals of speech production. After a short explanation of the concept of biomechanics and its link with the variables usually measured in phonetics, two modeling studies are presented which exemplify the influence of speaker-specific vocal tract morphology and muscle anatomy on speech production. First, speaker-specific 2D biomechanical models of the vocal tract were used that accounted for inter-speaker differences in head morphology. In particular, speakers have different main fiber orientations in the styloglossus muscle. Focusing on the vowel /i/, it was shown that these differences induce speaker-specific susceptibility to changes in this muscle's activation. Second, the study by Stavness et al. (2013) is summarized. These authors investigated the role of potential inter-speaker variability in the implementation of the orbicularis oris muscle with a 3D biomechanical face model. A deeper implementation tends to reduce lip aperture; an increase in peripheralness tends to increase lip protrusion. With these studies, we illustrate the fact that speaker-specific orofacial biomechanics influences the patterns of articulatory and acoustic variability, and the emergence of speech control strategies.

    Towards a Multimodal Silent Speech Interface for European Portuguese

    Automatic Speech Recognition (ASR) in the presence of environmental noise is still a hard problem to tackle in speech science (Ng et al., 2000). Another problem well described in the literature concerns elderly speech production. Studies (Helfrich, 1979) have shown evidence of a slower speech rate, more breaks, more speech errors, and a reduced speech volume when comparing elderly speech with that of teenagers or adults at the acoustic level. This makes elderly speech hard to recognize using currently available stochastic-based ASR technology. To tackle these two problems in the context of ASR for Human-Computer Interaction, a novel Silent Speech Interface (SSI) in European Portuguese (EP) is envisioned.

    Relating Objective and Subjective Performance Measures for AAM-based Visual Speech Synthesizers

    We compare two approaches for synthesizing visual speech using Active Appearance Models (AAMs): one that utilizes acoustic features as input, and one that utilizes a phonetic transcription as input. Both synthesizers are trained using the same data, and their performance is measured using both objective and subjective testing. We investigate the impact of likely sources of error in the synthesized visual speech by introducing typical errors into real visual speech sequences and subjectively measuring the perceived degradation. When only a small region (e.g. a single syllable) of ground-truth visual speech is incorrect, we find that the subjective score for the entire sequence is lower than that of sequences generated by our synthesizers. This observation motivates further consideration of an often ignored issue: to what extent are subjective measures correlated with objective measures of performance? Significantly, we find that the most commonly used objective measures of performance are not necessarily the best indicators of viewer perception of quality. We empirically evaluate alternatives and show that the cost of a dynamic time warp of synthesized visual speech parameters to the respective ground-truth parameters is a better indicator of subjective quality.
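    The DTW-cost measure favoured in the abstract can be sketched with a generic textbook dynamic time warp over parameter trajectories (the authors' exact distance, step pattern, and normalization are assumptions here):

    ```python
    import numpy as np

    def dtw_cost(synth, truth):
        """Length-normalized cost of warping a synthesized visual speech
        parameter trajectory onto the ground truth.

        synth, truth: (T, D) and (U, D) arrays of AAM parameter vectors,
        one row per video frame. Lower cost = closer match after
        allowing local timing differences.
        """
        T, U = len(synth), len(truth)
        acc = np.full((T + 1, U + 1), np.inf)
        acc[0, 0] = 0.0
        for i in range(1, T + 1):
            for j in range(1, U + 1):
                d = np.linalg.norm(synth[i - 1] - truth[j - 1])
                # classic symmetric step pattern: match, insert, or delete
                acc[i, j] = d + min(acc[i - 1, j],
                                    acc[i, j - 1],
                                    acc[i - 1, j - 1])
        return acc[T, U] / (T + U)
    ```

    Unlike a frame-by-frame distance, this remains small when the synthesized sequence is articulated correctly but slightly mistimed, which may be why it tracks subjective judgments better.
    
    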

    Motor Equivalence in Speech Production

    The first section provides a description of the concepts of "motor equivalence" and "degrees of freedom". It is illustrated with a few examples of motor tasks in general and of speech production tasks in particular. In the second section, the methodology used to experimentally investigate motor equivalence phenomena in speech production is presented. It is mainly based on paradigms that perturb the perception-action loop during ongoing speech, either by limiting the degrees of freedom of the speech motor system, by changing the physical conditions of speech production, or by modifying the feedback information. Examples are provided for each of these approaches. Implications of these studies for a better understanding of speech production and its interactions with speech perception are presented in the last section. The implications mainly relate to the characterization of the mechanisms underlying interarticulatory coordination and to the analysis of the goals of speech production.