Modeling the production of VCV sequences via the inversion of a biomechanical model of the tongue
A control model of the production of VCV sequences is presented, which
consists of three main parts: a static forward model of the relations between
motor commands and acoustic properties; the specification of targets in the
perceptual space; and a planning procedure based on optimization principles.
Examples of simulations generated with this model illustrate how it can be
used to assess theories and models of coarticulation in speech.
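The three ingredients named above can be sketched as a toy optimization problem. This is a minimal illustration, not the paper's biomechanical model: the forward model, target values, and cost weights below are all invented for the example.

```python
import numpy as np

# Toy version of planning by inversion: a static forward model maps
# motor commands to perceptual features, and the planner minimizes
# distance to perceptual targets plus a small motor-effort penalty.

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2)) * 0.5    # 3 motor dims -> 2 perceptual dims

def forward_model(M):
    """Hypothetical static forward model: motor commands -> percepts."""
    return np.tanh(M @ W)

targets = np.array([[0.5, -0.2],     # V
                    [-0.3, 0.4],     # C
                    [0.5, -0.2]])    # V

def cost(m_flat):
    """Target-reaching error plus an effort cost on command changes."""
    M = m_flat.reshape(3, 3)
    err = np.sum((forward_model(M) - targets) ** 2)
    effort = 0.01 * np.sum(np.diff(M, axis=0) ** 2)
    return err + effort

# Plain finite-difference gradient descent as the planning procedure.
M = np.zeros(9)
for _ in range(500):
    g = np.zeros_like(M)
    for i in range(M.size):
        d = np.zeros_like(M)
        d[i] = 1e-5
        g[i] = (cost(M + d) - cost(M - d)) / 2e-5
    M -= 0.1 * g
```

After optimization, the planned commands reach the V and C targets while keeping command changes small, which is the mechanism the model uses to produce coarticulation-like behaviour.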
DNN adaptation by automatic quality estimation of ASR hypotheses
In this paper we propose to exploit the automatic Quality Estimation (QE) of
ASR hypotheses to perform the unsupervised adaptation of a deep neural network
modeling acoustic probabilities. Our hypothesis is that significant
improvements can be achieved by: i) automatically transcribing the evaluation
data we are currently trying to recognise, and ii) selecting from it a subset
of "good quality" instances based on the word error rate (WER) scores predicted
by a QE component. To validate this hypothesis, we run several experiments on
the evaluation data sets released for the CHiME-3 challenge. First, we operate
in oracle conditions in which manual transcriptions of the evaluation data are
available, thus allowing us to compute the "true" sentence WER. In this
scenario, we perform the adaptation with variable amounts of data, which are
characterised by different levels of quality. Then, we move to realistic
conditions in which the manual transcriptions of the evaluation data are not
available. In this case, the adaptation is performed on data selected according
to the WER scores "predicted" by a QE component. Our results indicate that: i)
QE predictions allow us to closely approximate the adaptation results obtained
in oracle conditions, and ii) the overall ASR performance based on the proposed
QE-driven adaptation method is significantly better than the strong, most
recent, CHiME-3 baseline.
Comment: Computer Speech & Language, December 201
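The selection step the abstract describes can be sketched as a simple filter over ASR hypotheses scored by a QE component. The class, field names, and threshold value below are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    utt_id: str
    text: str
    predicted_wer: float  # sentence-level WER predicted by the QE component

def select_for_adaptation(hyps, max_predicted_wer=0.15):
    """Keep only "good quality" hypotheses (low predicted WER) and
    order them from most to least reliable for unsupervised adaptation."""
    good = [h for h in hyps if h.predicted_wer <= max_predicted_wer]
    return sorted(good, key=lambda h: h.predicted_wer)

hyps = [
    Hypothesis("u1", "turn on the light", 0.05),
    Hypothesis("u2", "turn of the lite", 0.40),
    Hypothesis("u3", "close the door", 0.10),
]
selected = select_for_adaptation(hyps)
print([h.utt_id for h in selected])  # → ['u1', 'u3']
```

In the oracle condition the same filter would use true sentence WER computed from manual transcriptions; the paper's finding is that filtering on predicted WER closely approximates that result.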
Autonomous Learning of Representations
Walter O, Häb-Umbach R, Mokbel B, Paaßen B, Hammer B. Autonomous Learning of Representations. KI - Künstliche Intelligenz. 2015;29(4):339–351.
Besides the core learning algorithm itself, one major question in machine learning is how best to encode given training data so that the learning technology can learn from it efficiently and generalize to novel data. While classical approaches often rely on a hand-coded data representation, autonomous representation or feature learning plays a major role in modern learning architectures. The goal of this contribution is to give an overview of different principles of autonomous feature learning, and to exemplify two of these principles with recent examples: autonomous metric learning for sequences, and autonomous learning of a deep representation for spoken language, respectively.
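The contrast between hand-coded and autonomously learned representations can be illustrated with a minimal linear autoencoder: instead of choosing features by hand, gradient descent learns a low-dimensional code from the data. This is a generic sketch of the principle, not one of the architectures discussed in the paper; the data and dimensions are invented.

```python
import numpy as np

# A one-hidden-layer linear autoencoder learns a 2-D code for 5-D
# correlated data by minimizing reconstruction error.

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # correlated data

W_enc = rng.normal(scale=0.1, size=(5, 2))  # data -> learned code
W_dec = rng.normal(scale=0.1, size=(2, 5))  # code -> reconstruction

lr = 0.005
for _ in range(4000):
    Z = X @ W_enc                 # learned 2-D representation
    X_hat = Z @ W_dec             # reconstruction of the input
    E = X_hat - X
    # Gradient steps on the squared reconstruction error.
    W_dec -= lr * Z.T @ E / len(X)
    W_enc -= lr * X.T @ (E @ W_dec.T) / len(X)

mse = np.mean((X @ W_enc @ W_dec - X) ** 2)
```

The learned code `Z` is the autonomously discovered representation; downstream learners would consume it in place of a hand-designed feature vector.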
Wavelet-based voice morphing
This paper presents a new multi-scale voice morphing algorithm. This algorithm enables a user to transform one person's speech pattern into another person's pattern with distinct characteristics, giving it a new identity, while preserving the original content. The voice morphing algorithm performs the morphing at different subbands by using the theory of wavelets, and models the spectral conversion using the theory of Radial Basis Function Neural Networks. The results obtained on the TIMIT speech database demonstrate effective transformation of the speaker identity.
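The two ingredients the abstract names, a wavelet subband split and a per-subband RBF conversion function, can be sketched as follows. This is a toy illustration with a one-level Haar transform and invented source/target data, not the published algorithm.

```python
import numpy as np

def haar_split(x):
    """One-level Haar DWT: approximation and detail subbands."""
    e, o = x[0::2], x[1::2]
    return (e + o) / np.sqrt(2), (e - o) / np.sqrt(2)

def haar_merge(a, d):
    """Inverse one-level Haar DWT."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def rbf_fit(x, y, centers, width=0.5, ridge=1e-6):
    """Fit RBF network weights by ridge-regularised least squares."""
    Phi = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * width**2))
    return np.linalg.solve(Phi.T @ Phi + ridge * np.eye(len(centers)),
                           Phi.T @ y)

def rbf_predict(x, centers, w, width=0.5):
    Phi = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * width**2))
    return Phi @ w

# Toy data: the "target speaker" band is a nonlinear map of the source band.
rng = np.random.default_rng(0)
src = rng.uniform(-1, 1, size=64)
a_src, d_src = haar_split(src)
a_tgt = np.sin(2 * a_src)                  # pretend target-speaker subband

centers = np.linspace(-2, 2, 12)
w = rbf_fit(a_src, a_tgt, centers)
a_conv = rbf_predict(a_src, centers, w)    # spectral conversion, one band
morphed = haar_merge(a_conv, d_src)        # convert one subband, keep detail
```

The real algorithm would train one such conversion per subband on aligned source/target speech features and then invert the full wavelet decomposition to synthesise the morphed voice.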