Phoneme Recognition Using Acoustic Events
This paper presents a new approach to phoneme recognition using nonsequential
sub-phoneme units. These units are called acoustic events and are
phonologically meaningful as well as recognizable from speech signals. Acoustic
events form a phonologically incomplete representation as compared to
distinctive features. This problem may partly be overcome by incorporating
phonological constraints. Currently, 24 binary events describing manner and
place of articulation, vowel quality and voicing are used to recognize all
German phonemes. Phoneme recognition in this paradigm consists of two steps:
After the acoustic events have been determined from the speech signal, a
phonological parser is used to generate syllable and phoneme hypotheses from
the event lattice. Results obtained on a speaker-dependent corpus are
presented.
Comment: 4 pages, to appear at ICSLP'94, PostScript version (compressed and
uuencoded)
Fast and Accurate OOV Decoder on High-Level Features
This work proposes a novel approach to the out-of-vocabulary (OOV) keyword search
(KWS) task. The proposed approach uses high-level features from an
automatic speech recognition (ASR) system, so-called phoneme posterior based
(PPB) features, for decoding. These features are obtained by calculating
time-dependent phoneme posterior probabilities from word lattices and then
smoothing them. For the PPB features we developed a novel OOV decoder that is
very fast, simple and efficient. Experimental results are presented on the
Georgian language from the IARPA Babel Program, which was the test language in
the OpenKWS 2016 evaluation campaign. The results show that, in terms of the maximum
term weighted value (MTWV) metric and computational speed, for single ASR
systems, the proposed approach significantly outperforms the state-of-the-art
approach based on using in-vocabulary proxies for OOV keywords in the indexed
database. The comparison of the two OOV KWS approaches on the fusion results of
the nine different ASR systems demonstrates that the proposed OOV decoder
outperforms the proxy-based approach in terms of the MTWV metric at
comparable processing speed. Other important advantages of the OOV decoder
include extremely low memory consumption and simplicity of its implementation
and parameter optimization.
Comment: Interspeech 2017, August 2017, Stockholm, Sweden. 201
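The MTWV metric mentioned above is the maximum, over decision thresholds, of the term weighted value (TWV). A minimal Python sketch of TWV at one fixed threshold follows, assuming the usual definition TWV = 1 - mean(P_miss + beta * P_FA) with beta = 999.9. The function name, the input layout and the handling of keywords without reference occurrences are illustrative assumptions, not taken from the paper.

```python
from typing import Iterable, Tuple

BETA = 999.9  # assumed weight on false alarms (common KWS evaluation setting)


def term_weighted_value(
    counts: Iterable[Tuple[int, int, int]],
    speech_seconds: float,
    beta: float = BETA,
) -> float:
    """TWV at a single decision threshold.

    counts holds one (n_true, n_correct, n_false_alarm) triple per keyword;
    speech_seconds is the duration of the searched audio in seconds.
    """
    total = 0.0
    n_keywords = 0
    for n_true, n_correct, n_false_alarm in counts:
        if n_true == 0:
            continue  # assumption: keywords with no reference hits are skipped
        p_miss = 1.0 - n_correct / n_true
        # Non-target trials are approximated as one per second of speech.
        p_fa = n_false_alarm / (speech_seconds - n_true)
        total += p_miss + beta * p_fa
        n_keywords += 1
    if n_keywords == 0:
        raise ValueError("no keyword with reference occurrences")
    return 1.0 - total / n_keywords
```

MTWV would then be obtained by evaluating this function over a sweep of thresholds and taking the maximum.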
Quaternion Convolutional Neural Networks for End-to-End Automatic Speech Recognition
Recently, the connectionist temporal classification (CTC) model coupled with
recurrent (RNN) or convolutional neural networks (CNN) made it easier to train
speech recognition systems in an end-to-end fashion. However, in real-valued
models, time frame components such as mel-filter-bank energies and the cepstral
coefficients obtained from them, together with their first and second order
derivatives, are processed as individual elements, while a natural alternative
is to process such components as composed entities. We propose to group such
elements in the form of quaternions and to process these quaternions using the
established quaternion algebra. Quaternion numbers and quaternion neural
networks have shown their efficiency in processing multidimensional inputs as
entities, encoding internal dependencies, and solving many tasks with fewer
learning parameters than real-valued models. This paper proposes to integrate
multiple feature views in a quaternion-valued convolutional neural network
(QCNN), to be used for sequence-to-sequence mapping with the CTC model.
Promising results are reported using simple QCNNs in phoneme recognition
experiments with the TIMIT corpus. More precisely, QCNNs obtain a lower phoneme
error rate (PER) with fewer learning parameters than a competing model based on
real-valued CNNs.
Comment: Accepted at INTERSPEECH 201
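The quaternion algebra referred to above rests on the Hamilton product. A minimal Python sketch is given below. The helper `pack_feature`, which places an energy value and its first and second derivatives in the three imaginary components with a zero real part, is one plausible grouping assumed for illustration; the abstract does not specify the exact layout.

```python
from typing import Tuple

# A quaternion r + x*i + y*j + z*k stored as (r, x, y, z).
Quaternion = Tuple[float, float, float, float]


def hamilton_product(p: Quaternion, q: Quaternion) -> Quaternion:
    """Multiply two quaternions (non-commutative)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k component
    )


def pack_feature(energy: float, delta: float, delta2: float) -> Quaternion:
    """Hypothetical grouping: one filter-bank energy and its derivatives."""
    return (0.0, energy, delta, delta2)
```

In a quaternion layer, a weight quaternion multiplies such an input through `hamilton_product`, so the four components are mixed jointly rather than treated as independent real values.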
Radio Oranje: Enhanced Access to a Historical Spoken Word Collection
Access to historical audio collections is typically very restricted:
content is often only available on physical (analog) media and the
metadata is usually limited to keywords, giving access at the level
of relatively large fragments, e.g., an entire tape. Many spoken
word heritage collections are now being digitized, which allows the
introduction of more advanced search technology. This paper presents
an approach that supports online access and search for recordings of
historical speeches. A demonstrator has been built, based on the
so-called Radio Oranje collection, which contains radio speeches by
the Dutch Queen Wilhelmina that were broadcast during World War II.
The audio has been aligned with its original 1940s manual
transcriptions to create a time-stamped index that enables the speeches to be
searched at the word level. Results are presented together with
related photos from an external database.
A spiking neural network for real-time Spanish vowel phonemes recognition
This paper explores the capabilities of a neuromorphic approach applied to real-time speech processing. A spiking
recognition neural network composed of three types of neurons is proposed. These neurons are based on an
integrate-and-fire model and are capable of recognizing auditory frequency patterns, such as vowel phonemes;
words are recognized as sequences of vowel phonemes. To demonstrate real-time operation, a complete
spiking recognition neural network has been described in VHDL for detecting certain Spanish words, and it has
been tested on an FPGA platform. This is a stand-alone, fully hardware system that can be embedded in a
mobile system. To stimulate the network, a spiking digital-filter-based cochlea has been implemented in VHDL.
In the implementation, an Address Event Representation (AER) is used for transmitting information between
neurons.
Ministerio de Economía y Competitividad TEC2012-37868-C04-02/0
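The neuron model named in the abstract above can be illustrated with a generic discrete-time integrate-and-fire unit. The Python sketch below is a software illustration only and does not reproduce the paper's VHDL design or its three neuron types. The threshold, the optional leak factor and the reset-to-zero rule are assumptions.

```python
from typing import List


def integrate_and_fire(
    inputs: List[float],
    threshold: float = 1.0,
    leak: float = 0.0,
) -> List[int]:
    """Return a spike train (1 = spike) for a sequence of input currents.

    The membrane potential accumulates each input; leak is the assumed
    fraction of potential lost per step (0.0 means no leak).
    """
    potential = 0.0
    spikes: List[int] = []
    for current in inputs:
        potential = potential * (1.0 - leak) + current
        if potential >= threshold:
            spikes.append(1)
            potential = 0.0  # assumed reset rule after firing
        else:
            spikes.append(0)
    return spikes
```

Driving the unit with a constant input of 0.5 and the default threshold produces a spike on every second step, which shows how the firing rate encodes input strength.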