Discriminative Segmental Cascades for Feature-Rich Phone Recognition
Discriminative segmental models, such as segmental conditional random fields
(SCRFs) and segmental structured support vector machines (SSVMs), have had
success in speech recognition via both lattice rescoring and first-pass
decoding. However, such models suffer from slow decoding, hampering the use of
computationally expensive features, such as segment neural networks or other
high-order features. A typical solution is to use approximate decoding, either
by beam pruning in a single pass or by beam pruning to generate a lattice
followed by a second pass. In this work, we study discriminative segmental
models trained with a hinge loss (i.e., segmental structured SVMs). We show
that beam search is not suitable for learning rescoring models in this
approach, though it gives good approximate decoding performance when the model
is already well-trained. Instead, we consider an approach inspired by
structured prediction cascades, which use max-marginal pruning to generate
lattices. We obtain a high-accuracy phonetic recognition system with several
expensive feature types: a segment neural network, a second-order language
model, and second-order phone boundary features.
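As a rough illustration of the max-marginal pruning mentioned in the abstract above, the following minimal sketch prunes a toy segment lattice. It is not the authors' implementation: the edge format, the function name `max_marginal_prune`, and the threshold (a convex combination of the best path score and the mean max-marginal, the form commonly associated with structured prediction cascades) are all assumptions made for this example.

```python
from collections import defaultdict

NEG_INF = float("-inf")

def max_marginal_prune(edges, start, end, alpha=0.5):
    """Keep lattice edges whose max-marginal reaches a cascade-style threshold.

    edges: list of (src, dst, label, score) with src < dst, so that integer
    node ids give a topological order (an assumption of this sketch).
    """
    fwd = defaultdict(lambda: NEG_INF)  # best score from start to each node
    bwd = defaultdict(lambda: NEG_INF)  # best score from each node to end
    fwd[start] = 0.0
    bwd[end] = 0.0

    # Forward pass: visit edges in order of increasing source node.
    for src, dst, _, score in sorted(edges, key=lambda e: e[0]):
        fwd[dst] = max(fwd[dst], fwd[src] + score)

    # Backward pass: visit edges in order of decreasing destination node.
    for src, dst, _, score in sorted(edges, key=lambda e: -e[1]):
        bwd[src] = max(bwd[src], score + bwd[dst])

    # Max-marginal of an edge: score of the best full path that uses it.
    mm = [fwd[s] + score + bwd[d] for s, d, _, score in edges]
    finite = [m for m in mm if m > NEG_INF]
    best = fwd[end]
    threshold = alpha * best + (1.0 - alpha) * sum(finite) / len(finite)
    return [e for e, m in zip(edges, mm) if m >= threshold]

# Toy lattice over three frames; labels and scores are made up.
toy_edges = [
    (0, 1, "a", 1.0), (1, 2, "b", 1.0), (2, 3, "c", 1.0),
    (0, 2, "d", 0.5), (1, 3, "e", 2.5), (0, 3, "f", -1.0),
]
kept = max_marginal_prune(toy_edges, start=0, end=3, alpha=0.5)
print([label for _, _, label, _ in kept])
```

Unlike beam pruning, which discards hypotheses based on partial-path scores, this criterion keeps an edge only if some complete path through it scores close to the best path, so the best path itself is never pruned.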
Bio-Inspired Multi-Layer Spiking Neural Network Extracts Discriminative Features from Speech Signals
Spiking neural networks (SNNs) enable power-efficient implementations due to
their sparse, spike-based coding scheme. This paper develops a bio-inspired SNN
that uses unsupervised learning to extract discriminative features from speech
signals, which can subsequently be used in a classifier. The architecture
consists of a spiking convolutional/pooling layer followed by a fully connected
spiking layer for feature discovery. The convolutional layer of leaky
integrate-and-fire (LIF) neurons represents primary acoustic features. The
fully connected layer is equipped with a probabilistic spike-timing-dependent
plasticity learning rule. This layer represents the discriminative features
through probabilistic LIF neurons. To assess the discriminative power of the
learned features, they are used in a hidden Markov model (HMM) for spoken digit
recognition. The experimental results show performance above 96%, which compares
favorably with popular statistical feature extraction methods. Our results
provide a novel demonstration of unsupervised feature acquisition in an SNN
- …
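For readers unfamiliar with the leaky integrate-and-fire (LIF) neurons named in the abstract above, here is a minimal sketch of a single deterministic LIF neuron using a discrete Euler update. It is only an illustration: the function name `simulate_lif` and all parameter values are assumptions, and the probabilistic neurons and spike-timing-dependent plasticity rule described in the abstract are not modeled.

```python
def simulate_lif(input_current, dt=1.0, tau=10.0,
                 v_rest=0.0, v_reset=0.0, v_thresh=1.0):
    """Simulate one leaky integrate-and-fire neuron.

    input_current: sequence of input values, one per time step.
    Returns the list of time-step indices at which the neuron spiked.
    All parameter defaults are illustrative, not taken from the paper.
    """
    v = v_rest
    spike_times = []
    for t, i_t in enumerate(input_current):
        # Leak toward the resting potential while integrating the input.
        v += (dt / tau) * (-(v - v_rest) + i_t)
        if v >= v_thresh:
            spike_times.append(t)  # emit a spike
            v = v_reset            # reset the membrane potential
    return spike_times

weak = simulate_lif([0.5] * 100)    # steady state stays below threshold
strong = simulate_lif([2.0] * 100)  # steady state exceeds threshold
print(len(weak), len(strong))
```

With a constant input, the membrane potential settles toward the input value, so the neuron fires only when the input exceeds the threshold, and a stronger input produces a higher spike rate.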