4,115 research outputs found
Discriminative Segmental Cascades for Feature-Rich Phone Recognition
Discriminative segmental models, such as segmental conditional random fields
(SCRFs) and segmental structured support vector machines (SSVMs), have had
success in speech recognition via both lattice rescoring and first-pass
decoding. However, such models suffer from slow decoding, hampering the use of
computationally expensive features, such as segment neural networks or other
high-order features. A typical solution is to use approximate decoding, either
by beam pruning in a single pass or by beam pruning to generate a lattice
followed by a second pass. In this work, we study discriminative segmental
models trained with a hinge loss (i.e., segmental structured SVMs). We show
that beam search is not suitable for learning rescoring models in this
approach, though it gives good approximate decoding performance when the model
is already well-trained. Instead, we consider an approach inspired by
structured prediction cascades, which use max-marginal pruning to generate
lattices. We obtain a high-accuracy phonetic recognition system with several
expensive feature types: a segment neural network, a second-order language
model, and second-order phone boundary features
Active Learning for Dialogue Act Classification
Active learning techniques were employed for classification of dialogue acts over two dialogue corpora, the English human-human Switchboard corpus and the Spanish human-machine Dihana corpus. It is shown clearly that active learning improves on a baseline obtained through a passive learning approach to tagging the same data sets. An error reduction of 7% was obtained on Switchboard, while a factor 5 reduction in the amount of labeled data needed for classification was achieved on Dihana. The passive Support Vector Machine learner used as baseline in itself significantly improves the state of the art in dialogue act classification on both corpora. On Switchboard it gives a 31% error reduction compared to the previously best reported result
Multi-facet classification of e-mails in a helpdesk scenario
Helpdesks have to manage a huge amount of
support requests which are usually submitted
via e-mail. In order to be assigned to experts
e ciently, incoming e-mails have to be classi-
ed w. r. t. several facets, in particular topic,
support type and priority. It is desirable to
perform these classi cations automatically.
We report on experiments using Support Vector
Machines and k-Nearest-Neighbours, respectively,
for the given multi-facet classi -
cation task. The challenge is to de ne suitable
features for each facet. Our results suggest
that improvements can be gained for all
facets, and they also reveal which features are
promising for a particular facet
Real-time robust automatic speech recognition using compact support vector machines
In the last years, support vector machines (SVMs) have shown excellent performance in many applications, especially in the presence of noise. In particular, SVMs offer several advantages over artificial neural networks (ANNs) that have attracted the attention of the speech processing community. Nevertheless, their high computational requirements prevent them from being used in practice in automatic speech recognition (ASR), where ANNs have proven to be successful. The high complexity of SVMs in this context arises from the use of huge speech training databases with millions of samples and highly overlapped classes. This paper suggests the use of a weighted least squares (WLS) training procedure that facilitates the possibility of imposing a compact semiparametric model on the SVM, which results in a dramatic complexity reduction. Such a complexity reduction with respect to conventional SVMs, which is between two and three orders of magnitude, allows the proposed hybrid WLS-SVC/HMM system to perform real-time speech decoding on a connected-digit recognition task (SpeechDat Spanish database). The experimental evaluation of the proposed system shows encouraging performance levels in clean and noisy conditions, although further improvements are required to reach the maturity level of current context-dependent HMM based recognizers.Spanish Ministry of Science and Innovation TEC 2008-06382 and TEC 2008-02473 and Comunidad AutĂłnoma de Madrid-UC3M CCG10-UC3M/TIC-5304.Publicad
- …