5,811 research outputs found
End-to-end Phoneme Sequence Recognition using Convolutional Neural Networks
Most phoneme recognition state-of-the-art systems rely on a classical neural
network classifiers, fed with highly tuned features, such as MFCC or PLP
features. Recent advances in ``deep learning'' approaches questioned such
systems, but while some attempts were made with simpler features such as
spectrograms, state-of-the-art systems still rely on MFCCs. This might be
viewed as a kind of failure from deep learning approaches, which are often
claimed to have the ability to train with raw signals, alleviating the need of
hand-crafted features. In this paper, we investigate a convolutional neural
network approach for raw speech signals. While convolutional architectures got
tremendous success in computer vision or text processing, they seem to have
been let down in the past recent years in the speech processing field. We show
that it is possible to learn an end-to-end phoneme sequence classifier system
directly from raw signal, with similar performance on the TIMIT and WSJ
datasets than existing systems based on MFCC, questioning the need of complex
hand-crafted features on large datasets.Comment: NIPS Deep Learning Workshop, 201
Hidden Markov models and neural networks for speech recognition
The Hidden Markov Model (HMMs) is one of the most successful modeling approaches for acoustic events in speech recognition, and more recently it has proven useful for several problems in biological sequence analysis. Although the HMM is good at capturing the temporal nature of processes such as speech, it has a very limited capacity for recognizing complex patterns involving more than first order dependencies in the observed data sequences. This is due to the first order state process and the assumption of state conditional independence between observations. Artificial Neural Networks (NNs) are almost the opposite: they cannot model dynamic, temporally extended phenomena very well, but are good at static classification and regression tasks. Combining the two frameworks in a sensible way can therefore lead to a more powerful model with better classification abilities. The overall aim of this work has been to develop a probabilistic hybrid of hidden Markov models and neural networks and ..
Articulatory and bottleneck features for speaker-independent ASR of dysarthric speech
The rapid population aging has stimulated the development of assistive
devices that provide personalized medical support to the needies suffering from
various etiologies. One prominent clinical application is a computer-assisted
speech training system which enables personalized speech therapy to patients
impaired by communicative disorders in the patient's home environment. Such a
system relies on the robust automatic speech recognition (ASR) technology to be
able to provide accurate articulation feedback. With the long-term aim of
developing off-the-shelf ASR systems that can be incorporated in clinical
context without prior speaker information, we compare the ASR performance of
speaker-independent bottleneck and articulatory features on dysarthric speech
used in conjunction with dedicated neural network-based acoustic models that
have been shown to be robust against spectrotemporal deviations. We report ASR
performance of these systems on two dysarthric speech datasets of different
characteristics to quantify the achieved performance gains. Despite the
remaining performance gap between the dysarthric and normal speech, significant
improvements have been reported on both datasets using speaker-independent ASR
architectures.Comment: to appear in Computer Speech & Language -
https://doi.org/10.1016/j.csl.2019.05.002 - arXiv admin note: substantial
text overlap with arXiv:1807.1094
Optoelectronic Reservoir Computing
Reservoir computing is a recently introduced, highly efficient bio-inspired
approach for processing time dependent data. The basic scheme of reservoir
computing consists of a non linear recurrent dynamical system coupled to a
single input layer and a single output layer. Within these constraints many
implementations are possible. Here we report an opto-electronic implementation
of reservoir computing based on a recently proposed architecture consisting of
a single non linear node and a delay line. Our implementation is sufficiently
fast for real time information processing. We illustrate its performance on
tasks of practical importance such as nonlinear channel equalization and speech
recognition, and obtain results comparable to state of the art digital
implementations.Comment: Contains main paper and two Supplementary Material
- …