
    Phoneme Recognition Using Acoustic Events

    This paper presents a new approach to phoneme recognition using nonsequential sub-phoneme units. These units are called acoustic events and are phonologically meaningful as well as recognizable from speech signals. Acoustic events form a phonologically incomplete representation as compared to distinctive features. This problem may partly be overcome by incorporating phonological constraints. Currently, 24 binary events describing manner and place of articulation, vowel quality and voicing are used to recognize all German phonemes. Phoneme recognition in this paradigm consists of two steps: after the acoustic events have been determined from the speech signal, a phonological parser is used to generate syllable and phoneme hypotheses from the event lattice. Results obtained on a speaker-dependent corpus are presented.
    Comment: 4 pages, to appear at ICSLP'94, PostScript version (compressed and uuencoded
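
    The abstract only outlines the two-step pipeline; purely as an illustration of the first idea, the minimal Python sketch below matches a set of binary acoustic events detected for a speech segment against per-phoneme event definitions to produce phoneme hypotheses. The event names, the tiny phoneme table and the matching rule are hypothetical stand-ins for the paper's 24 German events and its phonological parser.

        # Illustrative sketch: hypothetical binary event inventory and phoneme definitions.
        PHONEME_EVENTS = {
            # phoneme: set of binary acoustic events expected to be "on"
            "a": {"vocalic", "open", "voiced"},
            "i": {"vocalic", "close", "front", "voiced"},
            "s": {"fricative", "alveolar"},
            "z": {"fricative", "alveolar", "voiced"},
            "m": {"nasal", "labial", "voiced"},
        }

        def phoneme_hypotheses(detected_events, max_missing=1):
            """Return phonemes whose defining events are (almost) all present."""
            hypotheses = []
            for phoneme, required in PHONEME_EVENTS.items():
                missing = required - detected_events
                if len(missing) <= max_missing:
                    hypotheses.append((phoneme, len(missing)))
            return sorted(hypotheses, key=lambda h: h[1])

        print(phoneme_hypotheses({"fricative", "alveolar", "voiced"}))
        # -> [('s', 0), ('z', 0)]: the events alone leave /s/ and /z/ competing,
        #    which is where phonological constraints and the parser come in.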

    Fast and Accurate OOV Decoder on High-Level Features

    This work proposes a novel approach to the out-of-vocabulary (OOV) keyword search (KWS) task. The proposed approach is based on using high-level features from an automatic speech recognition (ASR) system, so-called phoneme posterior based (PPB) features, for decoding. These features are obtained by calculating time-dependent phoneme posterior probabilities from word lattices and then smoothing them. For these PPB features we developed a novel, very fast and simple OOV decoder. Experimental results are presented on the Georgian language from the IARPA Babel Program, which was the test language in the OpenKWS 2016 evaluation campaign. The results show that, in terms of the maximum term weighted value (MTWV) metric and computational speed, for single ASR systems the proposed approach significantly outperforms the state-of-the-art approach based on using in-vocabulary proxies for OOV keywords in the indexed database. A comparison of the two OOV KWS approaches on the fusion results of nine different ASR systems demonstrates that the proposed OOV decoder outperforms the proxy-based approach in terms of the MTWV metric at comparable processing speed. Other important advantages of the OOV decoder include extremely low memory consumption and the simplicity of its implementation and parameter optimization.
    Comment: Interspeech 2017, August 2017, Stockholm, Sweden. 201
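
    As a rough illustration of how PPB features of this kind could be computed, the sketch below accumulates arc posteriors from a hypothetical, heavily simplified word-lattice representation into per-frame phoneme posteriors and smooths them along time with a moving average. The data layout and the smoothing window are assumptions made for the example, not the paper's actual implementation.

        import numpy as np

        def phoneme_posteriors(arcs, phones, num_frames):
            """arcs: iterable of (start_frame, end_frame, phone, posterior) taken
            from a lattice. Returns a (num_frames, len(phones)) posterior matrix."""
            idx = {p: i for i, p in enumerate(phones)}
            post = np.zeros((num_frames, len(phones)))
            for start, end, phone, p in arcs:
                post[start:end, idx[phone]] += p          # accumulate arc posterior mass
            post /= np.maximum(post.sum(axis=1, keepdims=True), 1e-10)  # renormalize per frame
            return post

        def smooth(post, window=5):
            """Moving-average smoothing of the posteriors along the time axis."""
            kernel = np.ones(window) / window
            return np.apply_along_axis(
                lambda col: np.convolve(col, kernel, mode="same"), 0, post)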

    Quaternion Convolutional Neural Networks for End-to-End Automatic Speech Recognition

    Recently, the connectionist temporal classification (CTC) model, coupled with recurrent (RNN) or convolutional neural networks (CNN), has made it easier to train speech recognition systems in an end-to-end fashion. However, in real-valued models, time frame components such as mel-filter-bank energies and the cepstral coefficients obtained from them, together with their first and second order derivatives, are processed as individual elements, while a natural alternative is to process such components as composed entities. We propose to group such elements in the form of quaternions and to process these quaternions using the established quaternion algebra. Quaternion numbers and quaternion neural networks have shown their efficiency at processing multidimensional inputs as entities, encoding internal dependencies, and solving many tasks with fewer learning parameters than real-valued models. This paper proposes to integrate multiple feature views in a quaternion-valued convolutional neural network (QCNN), to be used for sequence-to-sequence mapping with the CTC model. Promising results are reported using simple QCNNs in phoneme recognition experiments with the TIMIT corpus. More precisely, QCNNs obtain a lower phoneme error rate (PER) with fewer learning parameters than a competing model based on real-valued CNNs.
    Comment: Accepted at INTERSPEECH 201
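
    The key operation behind a quaternion layer is the Hamilton product, which lets the four components of each input quaternion share one set of weights. The sketch below shows that product applied to a single quaternion-packed feature bin; the particular grouping of views into the four components (static, delta, delta-delta, energy) is an assumption made for the example, not necessarily the paper's configuration.

        import numpy as np

        def hamilton_product(q, w):
            """Hamilton product of two quaternions given as (r, x, y, z) arrays."""
            r1, x1, y1, z1 = q
            r2, x2, y2, z2 = w
            return np.array([
                r1 * r2 - x1 * x2 - y1 * y2 - z1 * z2,
                r1 * x2 + x1 * r2 + y1 * z2 - z1 * y2,
                r1 * y2 - x1 * z2 + y1 * r2 + z1 * x2,
                r1 * z2 + x1 * y2 - y1 * x2 + z1 * r2,
            ])

        # One time-frequency bin packed as a quaternion, multiplied by one quaternion
        # weight: all four feature views share the same parameters, which is why
        # quaternion models can match real-valued ones with fewer learning parameters.
        x = np.array([0.8, 0.1, -0.05, 0.3])   # (static, delta, delta-delta, energy) -- assumed grouping
        w = np.array([0.5, 0.2, -0.1, 0.05])   # a single quaternion weight
        print(hamilton_product(w, x))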

    Radio Oranje: Enhanced Access to a Historical Spoken Word Collection

    Access to historical audio collections is typically very restricted: content is often only available on physical (analog) media and the metadata is usually limited to keywords, giving access at the level of relatively large fragments, e.g., an entire tape. Many spoken word heritage collections are now being digitized, which allows the introduction of more advanced search technology. This paper presents an approach that supports online access and search for recordings of historical speeches. A demonstrator has been built, based on the so-called Radio Oranje collection, which contains radio speeches by the Dutch Queen Wilhelmina that were broadcast during World War II. The audio has been aligned with its original 1940s manual transcriptions to create a time-stamped index that enables the speeches to be searched at the word level. Results are presented together with related photos from an external database.
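
    The word-level search described here amounts to turning the alignment output into a time-stamped inverted index. Below is a minimal sketch assuming a simple, hypothetical list of aligned words with start and end times per speech; the field names and example entries are made up for illustration.

        from collections import defaultdict

        def build_index(aligned_words):
            """aligned_words: iterable of (speech_id, word, start_sec, end_sec).
            Returns an inverted index: word -> list of (speech_id, start_sec, end_sec)."""
            index = defaultdict(list)
            for speech_id, word, start, end in aligned_words:
                index[word.lower()].append((speech_id, start, end))
            return index

        # Made-up example entries; real entries would come from aligning the audio
        # with the 1940s manual transcriptions.
        index = build_index([
            ("speech-1941-12-07", "nederland", 12.4, 13.1),
            ("speech-1941-12-07", "vrijheid", 58.0, 58.7),
        ])
        print(index["vrijheid"])   # jump straight to 58.0 s in that recording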

    A spiking neural network for real-time Spanish vowel phonemes recognition

    This paper explores the capabilities of a neuromorphic approach to real-time speech processing. A spiking recognition neural network composed of three types of neurons is proposed. These neurons are based on an integrate-and-fire model and are capable of recognizing auditory frequency patterns, such as vowel phonemes; words are recognized as sequences of vowel phonemes. To demonstrate real-time operation, a complete spiking recognition neural network has been described in VHDL for detecting certain Spanish words, and it has been tested on an FPGA platform. This is a stand-alone, fully hardware system that can be embedded in a mobile system. To stimulate the network, a spiking digital-filter-based cochlea has been implemented in VHDL. In the implementation, an Address Event Representation (AER) is used for transmitting information between neurons.
    Ministerio de Economía y Competitividad TEC2012-37868-C04-02/0
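
    The paper's neurons are implemented in VHDL; purely as an illustration of the underlying integrate-and-fire behaviour, here is a minimal software sketch of a (leaky) integrate-and-fire neuron, with arbitrary parameters not taken from the paper.

        def integrate_and_fire(input_spikes, weight=1.0, leak=0.95, threshold=10.0):
            """Accumulate weighted input spikes with leakage; emit a spike (1) when
            the membrane potential crosses the threshold, then reset it."""
            potential = 0.0
            output = []
            for spike in input_spikes:        # one 0/1 entry per time step
                potential = potential * leak + weight * spike
                if potential >= threshold:
                    output.append(1)
                    potential = 0.0           # reset after firing
                else:
                    output.append(0)
            return output

        # A neuron listening to one cochlear frequency channel fires only when that
        # channel stays active, which is how sustained vowel formant patterns can be
        # picked out from the spike stream.
        print(sum(integrate_and_fire([1] * 50)))   # -> 3 output spikes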