Search CORE

17 research outputs found

Phonetic Searching

Author
Publication venue
Publication date: 12/11/2006
Field of study

An improved method and apparatus is disclosed which uses probabilistic techniques to map an input search string to a prestored audio file and to recognize certain portions of a search string phonetically. An improved interface is disclosed which permits users to input search strings linguistically, phonetically, or as a combination of both, and also allows logic functions to be specified by indicating how far apart in time specific phonemes are.

Georgia Tech Research Corporation
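The time-separation operator described in this abstract can be illustrated by searching a time-stamped phoneme stream for one phoneme followed by another within a maximum gap. This is a minimal sketch under stated assumptions, not the patented method: the stream format (a list of `(phoneme, start_time)` pairs sorted by time), the function name `find_within`, and the `max_gap` parameter are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical index format: (phoneme, start_time_in_seconds), sorted by time.
PhonemeStream = List[Tuple[str, float]]


def find_within(stream: PhonemeStream, first: str, second: str,
                max_gap: float) -> List[Tuple[float, float]]:
    """Return (t_first, t_second) pairs where `second` starts at most
    `max_gap` seconds after `first` (a hypothetical proximity operator)."""
    hits = []
    for i, (p, t) in enumerate(stream):
        if p != first:
            continue
        # Scan forward from the anchor phoneme until the gap is exceeded.
        for q, u in stream[i + 1:]:
            if u - t > max_gap:
                break  # stream is time-sorted, so later phonemes are farther away
            if q == second:
                hits.append((t, u))
    return hits
```

For example, with `stream = [("k", 0.10), ("ae", 0.18), ("t", 0.30), ("s", 0.90), ("k", 1.00), ("aa", 1.40), ("t", 1.60)]`, calling `find_within(stream, "k", "t", 0.25)` returns `[(0.1, 0.3)]`: the second `k` is followed by a `t` only after 0.6 s, which exceeds the allowed gap. The early `break` relies on the stream being sorted by time.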

Scholarly Materials And Research @ Georgia Tech

Hierarchical duration modeling for a speech recognition system

Author: Chung Grace Yuet-Chee
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1997
Field of study

Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1997. Includes bibliographical references (p. 102-105). By Grace Chung. M.S.

DSpace@MIT

Fast Approximate Spoken Term Detection from Sequence of Phonemes

Author: Hermansky Hynek
Pinto Joel Praveen
Prasanna S. R. Mahadeva
Szoke Igor
Publication venue
Publication date: 11/02/2010
Field of study

We investigate the detection of spoken terms in conversational speech using phoneme recognition, with the objective of achieving a smaller index size as well as faster search. Speech is processed and indexed as a one-best phoneme sequence. We propose the use of a probabilistic pronunciation model for the search term to compensate for errors in phoneme recognition. This model is derived from the pronunciation of the word and the phoneme confusion matrix. Experiments are performed on the conversational telephone speech database distributed by NIST for the 2006 Spoken Term Detection evaluation. Compared to the state-of-the-art system using phoneme lattices, we achieve an index about 1500 times smaller and search about 14 times faster, at the cost of relatively lower detection performance.
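Searching a one-best phoneme index with confusion-derived costs can be pictured as a confusion-weighted approximate substring match. The sketch below is a stand-in under stated assumptions, not the authors' probabilistic pronunciation model: substitution costs are negative log confusion probabilities from a hypothetical `conf` dictionary keyed by `(true, recognized)` pairs, unlisted mismatches fall back to a hypothetical `floor` probability, and insertions and deletions share a fixed, hypothetical `indel` cost.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical confusion probabilities P(recognized | true), keyed by (true, recognized).
Confusion = Dict[Tuple[str, str], float]


def sub_cost(true_p: str, rec_p: str, conf: Confusion, floor: float = 1e-3) -> float:
    """Negative log probability of recognizing `true_p` as `rec_p`."""
    if (true_p, rec_p) in conf:
        return -math.log(conf[(true_p, rec_p)])
    # Unlisted pairs: identical phonemes cost nothing, others get the floor probability.
    return 0.0 if true_p == rec_p else -math.log(floor)


def best_match(query: List[str], index: List[str], conf: Confusion,
               indel: float = 4.0) -> Tuple[float, int]:
    """Lowest-cost alignment of `query` against any substring of `index`.
    Returns (cost, end), where `end` is the exclusive end position in `index`."""
    n = len(index)
    # prev[j]: best cost of aligning the query prefix so far, ending just before index[j].
    prev = [0.0] * (n + 1)  # zeros: a match may start anywhere in the index
    for q in query:
        cur = [prev[0] + indel] + [0.0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j - 1] + sub_cost(q, index[j - 1], conf),  # substitution/match
                         prev[j] + indel,      # query phoneme missing from the index
                         cur[j - 1] + indel)   # extra phoneme in the index
        prev = cur
    end = min(range(n + 1), key=lambda j: prev[j])
    return prev[end], end
```

With `conf = {("ae", "eh"): 0.2}`, matching the query `["k", "ae", "t"]` against the index `["dh", "ax", "k", "eh", "t", "s"]` returns a cost of `-log(0.2)` (about 1.61) ending at position 5, because the recognized `eh` is treated as a likely confusion for `ae`. A lower cost means a more plausible hit; a detection system would compare it against a threshold.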

Infoscience - École polytechnique fédérale de Lausanne

Unsupervised Pre-Training for Voice Activation

Author: Kolesau Aliaksei
Šešok Dmitrij
Publication venue: MDPI AG
Publication date: 01/01/2020
Field of study

The problem of voice activation is to find a predefined word in an audio stream. Keyword spotters such as “Ok, Google” on Android devices or “Alexa” on Amazon devices are trained on tens of thousands to millions of keyword examples. In this paper, we explore the possibility of using pre-trained audio features to build voice activation with a small number of keyword examples. The contribution of this article is twofold. First, we investigate how the quality of a voice activation system depends on the number of training examples for English and Russian, and show that pre-trained audio features such as wav2vec increase the accuracy of the system by up to 10% when only seven examples per keyword are available for training. At the same time, the benefit of such features diminishes and eventually disappears as the dataset size increases. Second, we prepare and release for general use a dataset for training and testing voice activation for the Lithuanian language, and we report training results on this dataset. This article belongs to the Section Computing and Artificial Intelligence.

Vilniaus Gedimino Technikos Universitetas: VGTU Talpykla / Vilnius Gediminas Technical University: VGTU Repository

The Design and Application of an Acoustic Front-End for Use in Speech Interfaces

Author: Gerber Christoph
Publication venue: ProQuest Dissertations & Theses
Publication date: 01/01/1996
Field of study

This thesis describes the design, implementation, and application of an acoustic front-end. Such front-ends constitute the core of automatic speech recognition systems. The front-end reported here has been designed for speaker-independent large-vocabulary recognition. The thesis emphasizes design more than application. The work exploits the current state of the art in speech recognition research, for example the use of Hidden Markov Models. It describes the steps taken to build a speaker-independent large-vocabulary system, from signal processing through pattern matching to language modelling. An acoustic front-end can be considered a multi-stage process, each stage of which requires the specification of many parameters. Some parameters have fundamental consequences for the ultimate application of the front-end, so a major part of this thesis is concerned with their analysis and specification. Experiments were carried out to determine the characteristics of individual parameters, and the results were then used to motivate particular parameter settings. The thesis concludes with applications that demonstrate not only the power of the resulting acoustic front-end but also its limitations.
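The first stages of a typical acoustic front-end (pre-emphasis, splitting the signal into overlapping frames, and windowing) can be sketched as below. The 25 ms frame length, 10 ms frame shift, pre-emphasis coefficient 0.97, and Hamming window are common textbook defaults assumed here, not the parameter settings determined in the thesis; they simply show the kind of parameters whose choice such a front-end must specify. The function name `frame_signal` is hypothetical.

```python
import math
from typing import List


def frame_signal(samples: List[float], sample_rate: int,
                 frame_ms: float = 25.0, shift_ms: float = 10.0,
                 preemph: float = 0.97) -> List[List[float]]:
    """Pre-emphasize a non-empty signal, split it into overlapping frames,
    and apply a Hamming window to each frame (assumed default parameters)."""
    # Pre-emphasis: y[n] = x[n] - a * x[n-1], which boosts high frequencies.
    emph = [samples[0]] + [samples[n] - preemph * samples[n - 1]
                           for n in range(1, len(samples))]
    frame_len = int(round(sample_rate * frame_ms / 1000))
    shift = int(round(sample_rate * shift_ms / 1000))
    # Hamming window tapers frame edges to reduce spectral leakage.
    window = [0.54 - 0.46 * math.cos(2 * math.pi * k / (frame_len - 1))
              for k in range(frame_len)]
    frames = []
    # Only complete frames are kept; a trailing partial frame is dropped.
    for start in range(0, len(emph) - frame_len + 1, shift):
        frames.append([emph[start + k] * window[k] for k in range(frame_len)])
    return frames
```

At 16 kHz these defaults give 400-sample frames advanced by 160 samples, so one second of audio (16,000 samples) yields 98 complete frames. Changing the frame length or shift alters both the time resolution and the number of feature vectors passed to later stages.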

Glasgow Theses Service

Evaluation of preprocessors for neural network speaker verification

Author: Salleh Sheikh-Hussain
Publication venue: The University of Edinburgh
Publication date: 01/01/1997
Field of study

Edinburgh Research Archive

Language technology in multimedia information retrieval: Proceedings of the fourteenth International Twente Workshop on Language Technology

Author
Publication venue: University Library/University of Twente
Publication date: 01/12/1998
Field of study

University of Twente Research Information