245 research outputs found

    Modeling the production of VCV sequences via the inversion of a biomechanical model of the tongue

    Get PDF
    A control model of the production of VCV sequences is presented, which consists in three main parts: a static forward model of the relations between motor commands and acoustic properties; the specification of targets in the perceptual space; a planning procedure based on optimization principles. Examples of simulations generated with this model illustrate how it can be used to assess theories and models of coarticulation in speech

    DNN adaptation by automatic quality estimation of ASR hypotheses

    Full text link
    In this paper we propose to exploit the automatic Quality Estimation (QE) of ASR hypotheses to perform the unsupervised adaptation of a deep neural network modeling acoustic probabilities. Our hypothesis is that significant improvements can be achieved by: i)automatically transcribing the evaluation data we are currently trying to recognise, and ii) selecting from it a subset of "good quality" instances based on the word error rate (WER) scores predicted by a QE component. To validate this hypothesis, we run several experiments on the evaluation data sets released for the CHiME-3 challenge. First, we operate in oracle conditions in which manual transcriptions of the evaluation data are available, thus allowing us to compute the "true" sentence WER. In this scenario, we perform the adaptation with variable amounts of data, which are characterised by different levels of quality. Then, we move to realistic conditions in which the manual transcriptions of the evaluation data are not available. In this case, the adaptation is performed on data selected according to the WER scores "predicted" by a QE component. Our results indicate that: i) QE predictions allow us to closely approximate the adaptation results obtained in oracle conditions, and ii) the overall ASR performance based on the proposed QE-driven adaptation method is significantly better than the strong, most recent, CHiME-3 baseline.Comment: Computer Speech & Language December 201

    Autonomous Learning of Representations

    Get PDF
    Walter O, Häb-Umbach R, Mokbel B, Paaßen B, Hammer B. Autonomous Learning of Representations. KI - Künstliche Intelligenz. 2015;29(4):339–351.Besides the core learning algorithm itself, one major question in machine learning is how to best encode given training data such that the learning technology can efficiently learn based thereon and generalize to novel data. While classical approaches often rely on a hand coded data representation, the topic of autonomous representation or feature learning plays a major role in modern learning architectures. The goal of this contribution is to give an overview about different principles of autonomous feature learning, and to exemplify two principles based on two recent examples: autonomous metric learning for sequences, and autonomous learning of a deep representation for spoken language, respectively

    Wavelet-based voice morphing

    Get PDF
    This paper presents a new multi-scale voice morphing algorithm. This algorithm enables a user to transform one person's speech pattern into another person's pattern with distinct characteristics, giving it a new identity, while preserving the original content. The voice morphing algorithm performs the morphing at different subbands by using the theory of wavelets and models the spectral conversion using the theory of Radial Basis Function Neural Networks. The results obtained on the TIMIT speech database demonstrate effective transformation of the speaker identity
    • …
    corecore