Meta-Learning for Phonemic Annotation of Corpora
We apply rule induction, classifier combination and meta-learning (stacked
classifiers) to the problem of bootstrapping high accuracy automatic annotation
of corpora with pronunciation information. The task we address in this paper
consists of generating phonemic representations reflecting the Flemish and
Dutch pronunciations of a word on the basis of its orthographic representation
(which in turn is based on the actual speech recordings). We compare several
possible approaches to achieve the text-to-pronunciation mapping task:
memory-based learning, transformation-based learning, rule induction, maximum
entropy modeling, combination of classifiers in stacked learning, and stacking
of meta-learners. We are interested both in optimal accuracy and in obtaining
insight into the linguistic regularities involved. As far as accuracy is
concerned, an already high accuracy level (93% for Celex and 86% for Fonilex at
word level) for single classifiers is boosted significantly with additional
error reductions of 31% and 38% respectively using combination of classifiers,
and a further 5% using combination of meta-learners, bringing overall word
level accuracy to 96% for the Dutch variant and 92% for the Flemish variant. We
also show that the application of machine learning methods indeed leads to
increased insight into the linguistic regularities determining the variation
between the two pronunciation variants studied. Comment: 8 page
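As a rough check on how the error reductions quoted in the abstract above compound, the following sketch applies a relative error reduction to an accuracy figure. It assumes that each percentage is relative to the error remaining after the previous step, and that the 31% and 38% figures pair with Celex and Fonilex in that order. The function name is hypothetical, and because the quoted inputs are rounded, the results only approximate the reported final accuracies.

```python
def apply_error_reduction(accuracy, reduction):
    """Return the accuracy after removing a fraction `reduction` of the remaining errors."""
    error = 1.0 - accuracy
    return 1.0 - error * (1.0 - reduction)


# Rounded figures quoted in the abstract: (word-level accuracy, first-stage reduction).
baselines = {"Celex": (0.93, 0.31), "Fonilex": (0.86, 0.38)}

results = {}
for name, (accuracy, first_reduction) in baselines.items():
    combined = apply_error_reduction(accuracy, first_reduction)
    # Assumption: the further 5% applies to the error left after classifier combination.
    stacked = apply_error_reduction(combined, 0.05)
    results[name] = (combined, stacked)
    print(f"{name}: {accuracy:.1%} -> {combined:.1%} -> {stacked:.1%}")
```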
End-to-end Phoneme Sequence Recognition using Convolutional Neural Networks
Most state-of-the-art phoneme recognition systems rely on classical neural
network classifiers fed with highly tuned features, such as MFCC or PLP
features. Recent advances in "deep learning" approaches have called such
systems into question, but while some attempts have been made with simpler
features such as spectrograms, state-of-the-art systems still rely on MFCCs.
This might be viewed as a kind of failure of deep learning approaches, which
are often claimed to be able to train on raw signals, alleviating the need for
hand-crafted features. In this paper, we investigate a convolutional neural
network approach for raw speech signals. While convolutional architectures
have had tremendous success in computer vision and text processing, they seem
to have been neglected in recent years in the speech processing field. We show
that it is possible to learn an end-to-end phoneme sequence classifier system
directly from the raw signal, with performance on the TIMIT and WSJ datasets
similar to that of existing systems based on MFCC, questioning the need for
complex hand-crafted features on large datasets. Comment: NIPS Deep Learning Workshop, 201
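The basic operation a convolutional network applies to a raw waveform is a strided 1D convolution over the samples, followed by a nonlinearity. A minimal sketch in plain Python follows. The filter values, stride, and sample values are hypothetical and do not reflect the architecture evaluated in the paper.

```python
def conv1d(signal, kernel, stride=1):
    """Valid (no padding) 1D cross-correlation, the form used in CNN layers."""
    width = len(kernel)
    return [
        sum(signal[start + i] * kernel[i] for i in range(width))
        for start in range(0, len(signal) - width + 1, stride)
    ]


def relu(values):
    """Rectified linear unit applied elementwise."""
    return [max(0.0, v) for v in values]


# Hypothetical raw waveform samples and a hand-picked (not learned) filter.
raw_samples = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
edge_filter = [1.0, 0.0, -1.0]

# One feature value per filter position; stride 2 halves the temporal resolution.
features = relu(conv1d(raw_samples, edge_filter, stride=2))
print(features)
```

In a trained network the filter weights would be learned from data rather than fixed by hand, and many such filters would run in parallel.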
Memory-Based Lexical Acquisition and Processing
Current approaches to computational lexicology in language technology are
knowledge-based (competence-oriented) and try to abstract away from specific
formalisms, domains, and applications. This results in severe complexity,
acquisition and reusability bottlenecks. As an alternative, we propose a
particular performance-oriented approach to Natural Language Processing based
on automatic memory-based learning of linguistic (lexical) tasks. The
consequences of the approach for computational lexicology are discussed, and
the application of the approach to a number of lexical acquisition and
disambiguation tasks in phonology, morphology and syntax is described. Comment: 18 page
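Memory-based learning is commonly realized as nearest-neighbor classification over stored examples. The sketch below illustrates that idea with a simple overlap distance on symbolic features. The letter-window examples and phoneme labels are hypothetical toy data, and the function names are illustrative rather than taken from the paper.

```python
from collections import Counter


def overlap_distance(a, b):
    """Count the feature positions at which two examples differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def classify(memory, query, k=1):
    """Label `query` by majority vote among its k nearest stored examples."""
    ranked = sorted(memory, key=lambda example: overlap_distance(example[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical memory: (left context, focus letter, right context) -> phoneme.
memory = [
    (("_", "c", "a"), "k"),
    (("_", "c", "e"), "s"),
    (("a", "c", "e"), "s"),
    (("_", "c", "o"), "k"),
]

print(classify(memory, ("i", "c", "e")))
print(classify(memory, ("_", "c", "u"), k=3))
```

No abstraction step takes place here: the training examples themselves are the model, and generalization comes from similarity to stored cases.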
A Subband-Based SVM Front-End for Robust ASR
This work proposes a novel support vector machine (SVM) based robust
automatic speech recognition (ASR) front-end that operates on an ensemble of
the subband components of high-dimensional acoustic waveforms. The key issues
of selecting the appropriate SVM kernels for classification in frequency
subbands and the combination of individual subband classifiers using ensemble
methods are addressed. The proposed front-end is compared with state-of-the-art
ASR front-ends in terms of robustness to additive noise and linear filtering.
Experiments performed on the TIMIT phoneme classification task demonstrate the
benefits of the proposed subband-based SVM front-end: it outperforms the
standard cepstral front-end in the presence of noise and linear filtering for
signal-to-noise ratios (SNR) below 12 dB. A combination of the proposed
front-end with a conventional front-end such as MFCC yields further
improvements over the individual front-ends across the full range of noise
levels.
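For readers unfamiliar with the SNR figures used in such robustness experiments, the sketch below computes SNR in decibels as ten times the base-10 logarithm of the signal-to-noise power ratio, and rescales a noise sequence to reach a target SNR. The sample values and function names are hypothetical, and this is not the paper's experimental setup.

```python
import math


def power(samples):
    """Mean power of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels."""
    return 10.0 * math.log10(power(signal) / power(noise))


def scale_noise_to_snr(signal, noise, target_db):
    """Rescale `noise` so that mixing it with `signal` gives the target SNR."""
    gain = math.sqrt(power(signal) / (power(noise) * 10.0 ** (target_db / 10.0)))
    return [gain * n for n in noise]


# Hypothetical clean signal and additive noise.
signal = [1.0, -1.0, 1.0, -1.0]
noise = [0.1, -0.1, 0.1, -0.1]

scaled = scale_noise_to_snr(signal, noise, 12.0)
noisy = [s + n for s, n in zip(signal, scaled)]
print(round(snr_db(signal, scaled), 2))
```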