Search CORE

5,811 research outputs found

Evaluation of preprocessors for neural network speaker verification

Author: Salleh Sheikh-Hussain
Publication venue: The University of Edinburgh
Publication date: 01/01/1997

End-to-end Phoneme Sequence Recognition using Convolutional Neural Networks

Author: Magimai.-Doss Mathew
Collobert Ronan
Palaz Dimitri
Publication date: 07/12/2013

Most state-of-the-art phoneme recognition systems rely on classical neural network classifiers fed with highly tuned features, such as MFCC or PLP features. Recent advances in "deep learning" approaches have questioned such systems, but while some attempts were made with simpler features such as spectrograms, state-of-the-art systems still rely on MFCCs. This might be viewed as a kind of failure of deep learning approaches, which are often claimed to be able to train on raw signals, alleviating the need for hand-crafted features. In this paper, we investigate a convolutional neural network approach for raw speech signals. While convolutional architectures have had tremendous success in computer vision and text processing, they seem to have been neglected in recent years in the speech processing field. We show that it is possible to learn an end-to-end phoneme sequence classifier system directly from the raw signal, with performance on the TIMIT and WSJ datasets similar to that of existing systems based on MFCCs, questioning the need for complex hand-crafted features on large datasets.

Comment: NIPS Deep Learning Workshop, 201

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Hidden Markov models and neural networks for speech recognition

Author: Riis Søren Kamaric
Publication venue: Technical University of Denmark
Publication date: 01/01/1998

The Hidden Markov Model (HMM) is one of the most successful modeling approaches for acoustic events in speech recognition, and more recently it has proven useful for several problems in biological sequence analysis. Although the HMM is good at capturing the temporal nature of processes such as speech, it has a very limited capacity for recognizing complex patterns involving more than first-order dependencies in the observed data sequences. This is due to the first-order state process and the assumption of state-conditional independence between observations. Artificial Neural Networks (NNs) are almost the opposite: they cannot model dynamic, temporally extended phenomena very well, but are good at static classification and regression tasks. Combining the two frameworks in a sensible way can therefore lead to a more powerful model with better classification abilities. The overall aim of this work has been to develop a probabilistic hybrid of hidden Markov models and neural networks and ..

CiteSeerX

Online Research Database In Technology

Combining in-domain and out-of-domain speech data for automatic recognition of disordered speech

Author: Aniol M.
Bell P.
Christensen H.
Green P.
Hain T.
King S.
Swietojanski P.
Publication date: 01/08/2013

Edinburgh Research Explorer

Articulatory and bottleneck features for speaker-independent ASR of dysarthric speech

Author: Franco Horacio
Mitra Vikramjit
Sivaraman Ganesh
Yılmaz Emre
Publication venue: 'Elsevier BV'
Publication date: 01/01/2019

Rapid population aging has stimulated the development of assistive devices that provide personalized medical support to people in need who suffer from conditions of various etiologies. One prominent clinical application is a computer-assisted speech training system, which enables personalized speech therapy in the home environment for patients impaired by communicative disorders. Such a system relies on robust automatic speech recognition (ASR) technology to provide accurate articulation feedback. With the long-term aim of developing off-the-shelf ASR systems that can be incorporated in a clinical context without prior speaker information, we compare the ASR performance of speaker-independent bottleneck and articulatory features on dysarthric speech, used in conjunction with dedicated neural network-based acoustic models that have been shown to be robust against spectrotemporal deviations. We report the ASR performance of these systems on two dysarthric speech datasets with different characteristics to quantify the achieved performance gains. Despite the remaining performance gap between dysarthric and normal speech, significant improvements have been reported on both datasets using speaker-independent ASR architectures.

Comment: to appear in Computer Speech & Language - https://doi.org/10.1016/j.csl.2019.05.002 - arXiv admin note: substantial text overlap with arXiv:1807.1094

arXiv.org e-Print Archive

Radboud Repository

ScholarBank@NUS

Semi-continuous hidden Markov models for speech recognition

Author: Huang Xuedong
Publication venue: The University of Edinburgh
Publication date: 01/01/1989

Edinburgh Research Archive

Optoelectronic Reservoir Computing

Author: A Rodan
AF Atiya
D Verstraeten
F Triefenbach
H Jaeger
HJ Caulfield
K Vandoorne
L Appeltant
M Lukoševičius
M Peil
T Larger
VJ Mathews
W Maass
YK Chembo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 30/11/2011

Reservoir computing is a recently introduced, highly efficient bio-inspired approach for processing time-dependent data. The basic scheme of reservoir computing consists of a nonlinear recurrent dynamical system coupled to a single input layer and a single output layer. Within these constraints many implementations are possible. Here we report an optoelectronic implementation of reservoir computing based on a recently proposed architecture consisting of a single nonlinear node and a delay line. Our implementation is sufficiently fast for real-time information processing. We illustrate its performance on tasks of practical importance such as nonlinear channel equalization and speech recognition, and obtain results comparable to state-of-the-art digital implementations.

Comment: Contains main paper and two Supplementary Material

arXiv.org e-Print Archive

Crossref

Ghent University Academic Bibliography

PubMed Central

DI-fusion