
Robust ASR using Support Vector Machines

Author: A. Gallardo-Antolín
Allwein
Bengio
Bourlard
Burges
C. Peláez-Moreno
Clarkson
Crammer
D. Martín-Iglesias
F. Díaz-de-María
Fürnkranz
Ganapathiraju
Glass
Hsu
Jiang
Joachims
Navia-Vázquez
R. Solera-Ureña
Rabiner
Schölkopf
Shimodaira
Thubthong
Trentin
Vapnik
Vicente-Peña
Weiss
Wu
Publication venue: 'Elsevier BV'
Publication date: 01/01/2007

The theoretical advantages of Support Vector Machines (SVMs) over other machine learning alternatives, which stem from their max-margin training paradigm, have led us to propose them as a good technique for robust speech recognition. However, important shortcomings had to be circumvented, the most important being the normalisation of the time duration of different realisations of the acoustic speech units. In this paper, we compare two approaches in noisy environments: first, a hybrid HMM–SVM solution in which a fixed number of frames is selected by means of an HMM segmentation; and second, a normalisation kernel called the Dynamic Time Alignment Kernel (DTAK), first introduced in Shimodaira et al. [Shimodaira, H., Noma, K., Nakai, M., Sagayama, S., 2001. Support vector machine with dynamic time-alignment kernel for speech recognition. In: Proc. Eurospeech, Aalborg, Denmark, pp. 1841–1844] and based on DTW (Dynamic Time Warping). Special attention has been paid to adapting both alternatives to noisy environments, comparing two types of parameterisation and applying suitable feature normalisation operations. The results show that the DTA Kernel provides important advantages over the baseline HMM system in medium to bad noise conditions, and also outperforms the hybrid system.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Universidad Carlos III de Madrid e-Archivo
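The two duration-normalisation strategies compared in the DTAK abstract above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `fixed_length` keeps evenly spaced frames as a hypothetical stand-in for the HMM segmentation used in the hybrid system, and `dtak` assumes the symmetric DTW recurrence with step weights 1 (horizontal or vertical) and 2 (diagonal), an RBF frame-level kernel with an arbitrary `gamma`, and division by the total length n + m. All function names and parameter values are illustrative.

```python
import numpy as np


def fixed_length(X, n_frames=5):
    """Map a (T x d) sequence to a fixed-size vector by keeping n_frames frames.

    Hypothetical stand-in: evenly spaced frames replace the HMM segmentation
    used in the hybrid HMM-SVM system.
    """
    idx = np.linspace(0, len(X) - 1, n_frames).round().astype(int)
    return X[idx].reshape(-1)


def rbf(x, v, gamma=0.1):
    """Frame-level RBF kernel; gamma is an arbitrary illustrative value."""
    return float(np.exp(-gamma * np.sum((x - v) ** 2)))


def dtak(X, V, gamma=0.1):
    """DTAK-style kernel between sequences X (n x d) and V (m x d).

    Assumed recurrence: symmetric DTW with step weights 1 (horizontal or
    vertical) and 2 (diagonal), maximising accumulated frame similarity,
    then dividing by n + m (the total weight of any alignment path).
    """
    n, m = len(X), len(V)
    G = np.empty((n, m))
    for i in range(n):
        for j in range(m):
            s = rbf(X[i], V[j], gamma)
            if i == 0 and j == 0:
                G[i, j] = 2 * s  # start cell carries the diagonal weight
                continue
            best = -np.inf
            if i > 0:
                best = max(best, G[i - 1, j] + s)
            if j > 0:
                best = max(best, G[i, j - 1] + s)
            if i > 0 and j > 0:
                best = max(best, G[i - 1, j - 1] + 2 * s)
            G[i, j] = best
    return G[-1, -1] / (n + m)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=(12, 13))  # 12 frames of 13-dim features
    b = rng.normal(size=(20, 13))  # 20 frames: a different duration
    print(fixed_length(a).shape, fixed_length(b).shape)  # (65,) (65,)
    print(dtak(a, a))  # 1.0
    print(0.0 < dtak(a, b) <= 1.0)  # True
```

Because every alignment path accumulates a total weight of n + m and the RBF frame kernel never exceeds 1, this normalisation keeps the score in (0, 1], with a sequence compared to itself scoring exactly 1. The fixed-length route instead discards frames so that utterances of any duration map to vectors of the same dimension.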

PROSODY PREDICTION FOR TAMIL TEXT-TO-SPEECH SYNTHESIZER USING SENTIMENT ANALYSIS

Author: Kumar G Bharadwaja
Rajendran Vaibhavi
Publication venue: 'Innovare Academic Sciences Pvt Ltd'
Publication date: 01/04/2017

A speech synthesizer that sounds like a human voice is preferred over a robotic one, so an effective prosody model is essential for increasing the naturalness of a speech synthesizer. This paper therefore focuses on developing a prosody prediction model using sentiment analysis for a Tamil speech synthesizer. Two variants of the prosody prediction model using SentiWordNet are evaluated: one without a stemmer and one with a stemmer. The model with a stemmer performs considerably better than the one without, because it handles the highly agglutinative and inflectional words of the Tamil language more effectively, as the paper illustrates. The prosody prediction model with a stemmer reaches a classification accuracy of 77% on the test set, compared with 57% for the model without a stemmer.

Innovare Academic Sciences: E-Journals

Language Identification in Short Utterances Using Long Short-Term Memory (LSTM) Recurrent Neural Networks

Author: A Graves
A Lozano-Diez
A rahman Mohamed
Alicia Lozano-Diez
CM Bishop
D Martinez
D Reynolds
D Yu
Doroteo T. Toledano
F Gers
F Richardson
F Weninger
FA Gers
G Hinton
H Li
Ian McLoughlin
J Gonzalez-Dominguez
J Schmidhuber
Javier Gonzalez-Dominguez
Joaquin Gonzalez-Rodriguez
M Van Segbroeck
N Dehak
P Kenny
PA Torres-Carrasquillo
Ruben Zazo
Y Song
YK Muthusamy
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2016

Zazo R, Lozano-Diez A, Gonzalez-Dominguez J, T. Toledano D, Gonzalez-Rodriguez J (2016) Language Identification in Short Utterances Using Long Short-Term Memory (LSTM) Recurrent Neural Networks. PLoS ONE 11(1): e0146917. doi:10.1371/journal.pone.0146917

Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) have recently outperformed other state-of-the-art approaches, such as i-vectors and Deep Neural Networks (DNNs), in automatic Language Identification (LID), particularly when dealing with very short utterances (~3 s). In this contribution we present an open-source, end-to-end LSTM RNN system running on limited computational resources (a single GPU) that outperforms a reference i-vector system by up to 26% on a subset of the NIST Language Recognition Evaluation (8 target languages, 3 s task). This result is in line with previously published research that used proprietary LSTM implementations and huge computational resources, which made those earlier results hard to reproduce. Further, we extend those previous experiments by modeling unseen languages (out-of-set, OOS, modeling), which is crucial in real applications. Results show that an LSTM RNN with OOS modeling is able to detect these languages and generalizes robustly to unseen OOS languages. Finally, we also analyze the effect of even more limited test data (from 2.25 s to 0.1 s), showing that an accuracy of over 50% can be achieved with as little as 0.5 s.

This work has been supported by project CMC-V2: Caracterizacion, Modelado y Compensacion de Variabilidad en la Señal de Voz (Characterization, Modeling and Compensation of Variability in the Speech Signal) (TEC2012-37585-C02-01), funded by Ministerio de Economia y Competitividad, Spain.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Directory of Open Access Journals

PubMed Central

Biblos-e Archivo

The CSTR System for Multilingual and Code-Switching ASR Challenges for Low Resource Indian Languages

Author: Bell Peter
Klejch Ondřej
Wallington Electra
Publication venue: 'International Speech Communication Association'
Publication date: 30/08/2021

Edinburgh Research Explorer