Automatic speech segmentation: why and what segments?
I present and discuss the SAPHO (Segmentation by Acoustico-Phonetic knowledge) model, implemented in the Awk language under the Unix system on a MASSCOMP computer. The system is devised as a speaker-independent ASS (automatic speech segmentation) procedure, based on a prior recognition of the phonetic manner of articulation. In all ASR systems, phonetic knowledge is used at least implicitly; it should be referred to explicitly. The phonemic units cannot be built directly from the acoustic signal and are not yet available at the output of SAPHO. Following the Level Building procedure, SAPHO supplies a hierarchized set of acoustic properties and segments, and of phonetic properties and segments, congruent with the phonetic units and their internal structure. The flexibility of this system follows from its modularity, which leaves open a further architecture of distributed tasks. The processors are conceived either as data-driven activities with numeric computation or as expectation-driven activities with symbolic computation. The recursivity of the acoustic and phonetic supervisors at each step of the parsing ensures the soundness of the decisions. The suitability and reliability of SAPHO are corroborated by the accuracy of the results.
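The data-driven, numeric side of such an acoustic segmentation can be illustrated with a minimal sketch: a short-time energy detector that emits acoustic segment boundaries. This is my own illustration, not SAPHO's Awk implementation; the function name, window length, and threshold are all assumptions.

```python
import numpy as np

def segment_by_energy(signal, rate, win_ms=20, threshold_db=-30):
    """Crude data-driven segmentation: classify frames by short-time
    energy relative to the loudest frame, then emit contiguous
    high-energy runs as (start_s, end_s) segments.
    Illustrative only -- real acoustic-phonetic cues are much richer."""
    win = int(rate * win_ms / 1000)
    n = len(signal) // win
    frames = signal[: n * win].reshape(n, win)
    energy = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    voiced = energy > energy.max() + threshold_db  # relative threshold
    segments, start = [], None
    for i, v in enumerate(voiced):
        if v and start is None:
            start = i
        elif not v and start is not None:
            segments.append((start * win / rate, i * win / rate))
            start = None
    if start is not None:
        segments.append((start * win / rate, n * win / rate))
    return segments

# toy signal: 0.3 s of near-silence, 0.3 s of a 440 Hz tone, 0.3 s of near-silence
rate = 8000
t = np.arange(int(0.3 * rate)) / rate
sig = np.concatenate([1e-4 * np.random.randn(len(t)),
                      np.sin(2 * np.pi * 440 * t),
                      1e-4 * np.random.randn(len(t))])
print(segment_by_energy(sig, rate))  # one segment, roughly (0.3, 0.6)
```

A real segmenter would combine several cues (zero-crossing rate, spectral balance, duration constraints) before labeling the manner of articulation.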
Mathematical methods of signal processing
The aim of this project is to present in a systematic way the most relevant mathematical methods of signal processing, and to explore how they are applied to speech and image processing. After explaining the more common parts of a standard course in signal processing, we place special emphasis on two new tools that have played a significant role in signal processing in the past few years: pattern theory and wavelet theory. Finally, we use all these techniques to implement an algorithm that detects the wallpaper group of a plane mosaic, taking an image of it as input, and an algorithm that returns the phoneme sequence of a speech signal.
The material in this report can be grouped into two parts. The first part, consisting of the first six chapters, deals with the theoretical foundations of signal processing; it also includes material related to plane symmetry groups. The second part, consisting of the last two chapters, focuses on the applications.
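As a taste of the wavelet toolbox the abstract mentions, here is a minimal, self-contained sketch of one level of the discrete Haar wavelet transform and its inverse. It illustrates the general technique only; it is not code from the project, and the orthonormal 1/sqrt(2) normalization is one common convention among several.

```python
import numpy as np

def haar_step(x):
    """One level of the discrete Haar wavelet transform:
    split a length-2n signal into n (scaled) averages and n details."""
    x = np.asarray(x, dtype=float)
    avg = (x[0::2] + x[1::2]) / np.sqrt(2)
    det = (x[0::2] - x[1::2]) / np.sqrt(2)
    return avg, det

def haar_inverse(avg, det):
    """Invert one Haar step, reconstructing the original signal exactly."""
    x = np.empty(2 * len(avg))
    x[0::2] = (avg + det) / np.sqrt(2)
    x[1::2] = (avg - det) / np.sqrt(2)
    return x

sig = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
avg, det = haar_step(sig)
rec = haar_inverse(avg, det)
print(np.allclose(rec, sig))  # prints True: perfect reconstruction
```

Applying `haar_step` recursively to the averages yields the full multiresolution decomposition used in wavelet-based signal analysis.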
Discriminative and generative approaches for long- and short-term speaker characteristics modeling: application to speaker verification
The speaker verification problem can be stated as follows: given two speech recordings, determine whether or not they have been uttered by the same speaker. Most current speaker verification systems are based on Gaussian mixture models. This probabilistic representation makes it possible to model the complex distribution of the underlying speech feature parameters adequately. It is, however, an inadequate basis for discriminating between speakers, which is the key issue in speaker verification. In the first part of this thesis, we attempt to overcome these difficulties by combining support vector machines, a well-established discriminative modeling technique, with two generative approaches based on Gaussian mixture models. In the first generative approach, a target speaker is represented by a Gaussian mixture model obtained by Maximum A Posteriori adaptation of a large Gaussian mixture model, coined the universal background model, to the target speaker's data. The second generative approach is Joint Factor Analysis, which has become the state of the art in speaker verification during the last three years. The advantage of this technique is that it provides a framework of powerful tools for modeling the inter-speaker and channel variabilities. We propose and test several kernel functions that are integrated in the design of both of the previous combinations. The best results are obtained when the support vector machines are applied within a new space called the "total variability space", defined using factor analysis. In this novel modeling approach, the channel effect is treated through a combination of linear discriminant analysis and kernel normalization based on the inverse of the within-speaker covariance matrix.
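The first generative approach above, MAP adaptation of a universal background model to target speaker data, can be sketched in a few lines of NumPy. This is an illustrative relevance-MAP adaptation of the component means only (diagonal covariances assumed); the function name, toy data, and relevance factor are assumptions for the sketch, not the thesis's code.

```python
import numpy as np

def map_adapt_means(ubm_w, ubm_mu, ubm_var, X, r=16.0):
    """Relevance-MAP adaptation of UBM component means to speaker data X.
    ubm_w: (C,) weights; ubm_mu, ubm_var: (C, D) means and diagonal
    variances; X: (T, D) feature frames; r: relevance factor."""
    # per-frame log-likelihood under each diagonal Gaussian component
    diff = X[:, None, :] - ubm_mu[None, :, :]                  # (T, C, D)
    ll = -0.5 * np.sum(diff ** 2 / ubm_var + np.log(2 * np.pi * ubm_var),
                       axis=2) + np.log(ubm_w)                 # (T, C)
    post = np.exp(ll - ll.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)                    # responsibilities
    n_c = post.sum(axis=0)                                     # soft counts
    xbar = post.T @ X / np.maximum(n_c, 1e-10)[:, None]        # data means
    alpha = (n_c / (n_c + r))[:, None]                         # adaptation weights
    return alpha * xbar + (1 - alpha) * ubm_mu                 # adapted means

# toy UBM with two components; "speaker" data lies near the first one
ubm_w = np.array([0.5, 0.5])
ubm_mu = np.array([[0.0, 0.0], [5.0, 5.0]])
ubm_var = np.ones((2, 2))
rng = np.random.default_rng(0)
X = rng.normal(loc=[0.5, 0.5], scale=0.3, size=(200, 2))
mu_map = map_adapt_means(ubm_w, ubm_mu, ubm_var, X)
print(mu_map)
```

Components with many assigned frames are pulled toward the speaker's data, while unobserved components stay at the UBM prior, which is exactly the behavior that makes the adapted supervector a useful speaker representation.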
In the second part of this thesis, we present a new approach to modeling the speaker's long-term prosodic and spectral characteristics. This novel approach is based on continuous approximations of the prosodic and cepstral contours contained in a pseudo-syllabic segment of speech. Each of these contours is fitted with a Legendre polynomial, whose coefficients are modeled by a Gaussian mixture model. Joint factor analysis is used to treat the speaker and channel variabilities. Finally, we perform a score fusion between systems based on long-term speaker characteristics and the systems described above that use short-term speaker features.
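The contour-fitting step can be sketched with NumPy's Legendre utilities: expand a per-segment contour in the Legendre basis and keep the coefficients as features. The toy F0 contour, the degree, and the variable names here are assumptions for illustration, not the thesis's configuration.

```python
import numpy as np
from numpy.polynomial import legendre

# Fit a pseudo-syllabic pitch contour with a low-order Legendre polynomial;
# the coefficient vector is what a GMM would subsequently model.
t = np.linspace(-1.0, 1.0, 50)            # normalized time axis over the segment
f0 = 120 + 15 * t - 10 * t ** 2           # toy rising-falling F0 contour (Hz)
coeffs = legendre.legfit(t, f0, deg=4)    # Legendre expansion coefficients
approx = legendre.legval(t, coeffs)       # reconstructed (smoothed) contour
print(np.max(np.abs(approx - f0)))        # near-zero residual for this toy contour
```

Because the Legendre polynomials are orthogonal on [-1, 1], the low-order coefficients capture the contour's mean, slope, and curvature as nearly independent features, which is what makes them attractive as inputs to a Gaussian mixture model.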