6,920 research outputs found
End-to-end Phoneme Sequence Recognition using Convolutional Neural Networks
Most phoneme recognition state-of-the-art systems rely on a classical neural
network classifiers, fed with highly tuned features, such as MFCC or PLP
features. Recent advances in ``deep learning'' approaches questioned such
systems, but while some attempts were made with simpler features such as
spectrograms, state-of-the-art systems still rely on MFCCs. This might be
viewed as a kind of failure from deep learning approaches, which are often
claimed to have the ability to train with raw signals, alleviating the need of
hand-crafted features. In this paper, we investigate a convolutional neural
network approach for raw speech signals. While convolutional architectures got
tremendous success in computer vision or text processing, they seem to have
been let down in the past recent years in the speech processing field. We show
that it is possible to learn an end-to-end phoneme sequence classifier system
directly from raw signal, with similar performance on the TIMIT and WSJ
datasets than existing systems based on MFCC, questioning the need of complex
hand-crafted features on large datasets.Comment: NIPS Deep Learning Workshop, 201
Recommended from our members
Gait recognition using HMMs and dual discriminative observations for sub-dynamics analysis
This is the author's accepted manuscript. The final published article is available from the link below. Copyright @ 2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.We propose a new gait recognition method that combines holistic and model-based features. Both types of features are extracted automatically from gait silhouette sequences and their combination takes place by means of a pair of hidden Markov models. In the proposed system, the holistic features are initially used for capturing general gait dynamics whereas, subsequently, the model-based features are deployed for capturing more detailed sub-dynamics by refining upon the preceding general dynamics. Furthermore, the holistic and model-based features are suitably processed in order to improve the discriminatory capacity of the final system. The experimental results show that the proposed method exhibits performance advantages in comparison with popular existing methods
A Unified Multilingual Handwriting Recognition System using multigrams sub-lexical units
We address the design of a unified multilingual system for handwriting
recognition. Most of multi- lingual systems rests on specialized models that
are trained on a single language and one of them is selected at test time.
While some recognition systems are based on a unified optical model, dealing
with a unified language model remains a major issue, as traditional language
models are generally trained on corpora composed of large word lexicons per
language. Here, we bring a solution by con- sidering language models based on
sub-lexical units, called multigrams. Dealing with multigrams strongly reduces
the lexicon size and thus decreases the language model complexity. This makes
pos- sible the design of an end-to-end unified multilingual recognition system
where both a single optical model and a single language model are trained on
all the languages. We discuss the impact of the language unification on each
model and show that our system reaches state-of-the-art methods perfor- mance
with a strong reduction of the complexity.Comment: preprin
Bayesian Information Extraction Network
Dynamic Bayesian networks (DBNs) offer an elegant way to integrate various
aspects of language in one model. Many existing algorithms developed for
learning and inference in DBNs are applicable to probabilistic language
modeling. To demonstrate the potential of DBNs for natural language processing,
we employ a DBN in an information extraction task. We show how to assemble
wealth of emerging linguistic instruments for shallow parsing, syntactic and
semantic tagging, morphological decomposition, named entity recognition etc. in
order to incrementally build a robust information extraction system. Our method
outperforms previously published results on an established benchmark domain.Comment: 6 page
Generative Image Modeling Using Spatial LSTMs
Modeling the distribution of natural images is challenging, partly because of
strong statistical dependencies which can extend over hundreds of pixels.
Recurrent neural networks have been successful in capturing long-range
dependencies in a number of problems but only recently have found their way
into generative image models. We here introduce a recurrent image model based
on multi-dimensional long short-term memory units which are particularly suited
for image modeling due to their spatial structure. Our model scales to images
of arbitrary size and its likelihood is computationally tractable. We find that
it outperforms the state of the art in quantitative comparisons on several
image datasets and produces promising results when used for texture synthesis
and inpainting
Automated speech and audio analysis for semantic access to multimedia
The deployment and integration of audio processing tools can enhance the semantic annotation of multimedia content, and as a consequence, improve the effectiveness of conceptual access tools. This paper overviews the various ways in which automatic speech and audio analysis can contribute to increased granularity of automatically extracted metadata. A number of techniques will be presented, including the alignment of speech and text resources, large vocabulary speech recognition, key word spotting and speaker classification. The applicability of techniques will be discussed from a media crossing perspective. The added value of the techniques and their potential contribution to the content value chain will be illustrated by the description of two (complementary) demonstrators for browsing broadcast news archives
- …