
    Adaptation of Hybrid ANN/HMM Models using Linear Hidden Transformations and Conservative Training

    A technique is proposed for the adaptation of automatic speech recognition systems using hybrid models combining Artificial Neural Networks with Hidden Markov Models. The application of linear transformations not only to the input features, but also to the outputs of the internal layers is investigated. The motivation is that the outputs of an internal layer represent a projection of the input pattern into a space where it should be easier to learn the classification or transformation expected at the output of the network. A new solution, called Conservative Training, is proposed that compensates for the lack of adaptation samples in certain classes. Supervised adaptation experiments with different corpora and for different adaptation types are described. The results show that the proposed approach always outperforms the use of transformations in the feature space and yields even better results when combined with linear input transformations.
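    Below is a minimal PyTorch sketch of the core idea of a trainable linear transformation applied to the outputs of an internal layer: a square linear layer, initialised to the identity, inserted after a frozen hidden layer of the speaker-independent network and trained on the adaptation data alone. Class names, variable names and layer sizes are illustrative assumptions, not taken from the paper.

import torch
import torch.nn as nn

class LinearHiddenTransform(nn.Module):
    """Square linear transform inserted after an internal layer (identity at start)."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.linear = nn.Linear(hidden_dim, hidden_dim)
        nn.init.eye_(self.linear.weight)   # start as an identity mapping
        nn.init.zeros_(self.linear.bias)

    def forward(self, h):
        return self.linear(h)

# Usage: adapt only the inserted transform, keeping the original network frozen.
hidden_dim = 512
si_hidden = nn.Sequential(nn.Linear(440, hidden_dim), nn.Sigmoid())  # assumed SI hidden layer
for p in si_hidden.parameters():
    p.requires_grad = False
adapted_front_end = nn.Sequential(si_hidden, LinearHiddenTransform(hidden_dim))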

    Porting concepts from DNNs back to GMMs

    Deep neural networks (DNNs) have been shown to outperform Gaussian Mixture Models (GMMs) on a variety of speech recognition benchmarks. In this paper we analyze the differences between the DNN and GMM modeling techniques and port the best ideas from DNN-based modeling to a GMM-based system. By going both deep (multiple layers) and wide (multiple parallel sub-models) and by sharing model parameters, we are able to close the gap between the two modeling techniques on the TIMIT database. Since the 'deep' GMMs retain the maximum-likelihood trained Gaussians as the first layer, advanced techniques such as speaker adaptation and model-based noise robustness can be readily incorporated. Despite their similarities, the DNNs and the deep GMMs still show a sufficient amount of complementarity to allow effective system combination.
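    As a very loose illustration of keeping maximum-likelihood trained Gaussians as a first layer whose scores feed a further modelling stage, the scikit-learn sketch below fits a GMM on frame features and trains a simple discriminative layer on its per-component posteriors. The shapes, the logistic-regression second stage and all names are assumptions for illustration; the paper's deep and wide GMM architecture is considerably more elaborate.

import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 39))          # placeholder acoustic frames (e.g. MFCCs + deltas)
y = rng.integers(0, 3, size=2000)        # placeholder frame-level state labels

# "Layer 1": maximum-likelihood trained Gaussian components, kept as-is so that
# standard GMM techniques (speaker adaptation, noise robustness) still apply.
gmm = GaussianMixture(n_components=16, covariance_type='diag', random_state=0).fit(X)
component_posteriors = gmm.predict_proba(X)

# "Layer 2": a simple discriminative stage on top of the Gaussian scores.
clf = LogisticRegression(max_iter=1000).fit(component_posteriors, y)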

    Feature extraction and feature reduction for spoken letter recognition

    The complexity of finding the relevant features for the classification of spoken letters is due to the phonetic similarities between letters and their high dimensionality. In the machine learning literature, spoken letter classification has often led to very convoluted algorithms. The success of this work lies in the high classification rate as well as the relatively small amount of computation required from signal retrieval to feature selection. The relevant features spring from an analysis of the sequential properties of the vectors produced by a Fourier transform. The study mainly focuses on the classification of the fricative letters f and s, the letters m and n, and the E-set (b, c, d, e, g, p, t, v, z), which are difficult to distinguish, especially when transmitted over modern VoIP devices. Another feature of this research is that the dataset was produced without noise-reducing signal processing, which is shown to give equivalent and sometimes better results. All pops and static noises that appear were kept as part of the sound files. This is in contrast to other research whose datasets were recorded with high-grade equipment and noise-reduction algorithms. The audio files were classified with the random forest algorithm, which was successful because the features produced were largely separable in relatively few dimensions. Classification accuracies were in the 92%-97% range, depending on the dataset.
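    The sketch below, using SciPy and scikit-learn, illustrates the overall pipeline the abstract describes: magnitude spectra of successive frames of each recording are stacked into a feature vector and fed to a random forest classifier. The frame sizes, number of frames and placeholder data are assumptions, not the paper's exact settings.

import numpy as np
from scipy.fft import rfft
from sklearn.ensemble import RandomForestClassifier

def spectral_features(signal, frame_len=512, hop=256, n_frames=20):
    """Stack magnitude spectra of successive frames into one feature vector."""
    frames = [signal[i * hop:i * hop + frame_len]
              for i in range(n_frames) if i * hop + frame_len <= len(signal)]
    return np.concatenate([np.abs(rfft(f)) for f in frames])

# Usage with already-loaded, equal-length waveforms and letter labels.
rng = np.random.default_rng(0)
waveforms = rng.normal(size=(100, 16000))       # placeholder audio
letters = rng.choice(list("fsmn"), size=100)    # placeholder letter labels
X = np.stack([spectral_features(w) for w in waveforms])
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, letters)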

    Adapting Hybrid ANN/HMM to Speech Variations

    A technique is proposed for the adaptation of automatic speech recognition systems using hybrid models combining Artificial Neural Networks with Hidden Markov Models. In this paper we investigate an extension of the classical approach, applying linear transformations not only to the input features but also to the outputs of the internal layers. The motivation is that the outputs of an internal layer represent a projection of the input pattern into a space where it should be easier to learn the classification or transformation expected at the output of the network. To reduce the risk that the network focuses on the new data only, losing its generalization capability (catastrophic forgetting), an original solution, Conservative Training, is proposed. We illustrate the problem of catastrophic forgetting using an artificial test-bed, and apply our techniques to a set of adaptation tasks in the domain of Automatic Speech Recognition (ASR) based on Artificial Neural Networks. We report on the adaptation potential of different techniques, and on the generalization capability of the adapted networks. The results show that the combination of the proposed approaches mitigates the catastrophic forgetting effects, and always outperforms the use of the classical linear transformation in the feature space.
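    As a hedged sketch of the Conservative Training idea discussed above, the function below builds adaptation targets in which classes absent from the adaptation set keep the posterior assigned by the original, unadapted network, and the remaining probability mass goes to each frame's correct class; this discourages the adapted network from collapsing the outputs of unseen classes (catastrophic forgetting). Function and variable names are mine, not the paper's.

import torch

def conservative_targets(hard_one_hot, si_posteriors, missing_mask):
    """
    hard_one_hot  : (batch, n_classes) one-hot labels from the adaptation data
    si_posteriors : (batch, n_classes) outputs of the frozen original network
    missing_mask  : (n_classes,) bool, True for classes absent from the adaptation set
    """
    targets = torch.zeros_like(hard_one_hot)
    # Classes with no adaptation samples keep the original network's posterior.
    targets[:, missing_mask] = si_posteriors[:, missing_mask]
    # The remaining probability mass is assigned to each frame's correct class.
    remaining = (1.0 - targets.sum(dim=1, keepdim=True)).clamp(min=0.0)
    return targets + hard_one_hot * remaining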

    Empirical Evaluation of Speaker Adaptation on DNN based Acoustic Model

    Speaker adaptation aims to estimate a speaker-specific acoustic model from a speaker-independent one, to minimize the mismatch between the training and testing conditions arising from speaker variability. A variety of neural network adaptation methods have been proposed since deep learning models became mainstream, but an experimental comparison between different methods is still lacking, especially now that DNN-based acoustic models have advanced greatly. In this paper, we aim to close this gap by providing an empirical evaluation of three typical speaker adaptation methods: LIN, LHUC and KLD. Adaptation experiments, with different sizes of adaptation data, are conducted on a strong TDNN-LSTM acoustic model. More challengingly, the source and target models we consider are a standard Mandarin speaker model and an accented Mandarin speaker model. We compare the performances of the different methods and their combinations. Speaker adaptation performance is also examined as a function of the speaker's accent degree. (Comment: Interspeech 201)
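    Of the three methods compared, KLD regularization has a particularly compact formulation: the adaptation target is an interpolation between the hard label and the speaker-independent model's posterior. The PyTorch sketch below illustrates that objective; the function name, the interpolation weight rho and the tensor shapes are assumptions for illustration, not the paper's exact setup.

import torch
import torch.nn.functional as F

def kld_adaptation_loss(adapted_logits, si_logits, labels, rho=0.5):
    """Cross-entropy against targets interpolated with the SI model's posteriors."""
    with torch.no_grad():
        si_post = F.softmax(si_logits, dim=-1)            # speaker-independent posteriors
    hard = F.one_hot(labels, num_classes=adapted_logits.size(-1)).float()
    targets = (1.0 - rho) * hard + rho * si_post          # KLD-regularised soft targets
    log_probs = F.log_softmax(adapted_logits, dim=-1)
    return -(targets * log_probs).sum(dim=-1).mean()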