
A Speech Recognition System Based Improved Algorithm of Dual-template HMM

Authors: Zhang Jing, Zhang Min
Publication venue: Published by Elsevier Ltd.
Publication date: 31/12/2011

Abstract: The hidden Markov model (HMM) and a speech recognition algorithm based on this model were studied in the paper. In addition, the model and the recognition algorithm were improved relative to the traditional HMM. In the modeling process, training on multiple observation sequences is used to achieve recognition of non-specific (speaker-independent) speakers; a double template, one rough and one high-precision, is established using different numbers of HMM states; and a second matching pass is used to achieve a higher recognition rate. A speech recognition system combining MFCC parameters with the improved HMM algorithm was constructed. Experimental results showed that the recognition rate for large-vocabulary speech from non-specific speakers was greatly improved.
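The abstract above describes a rough template, a high-precision template, and a second matching pass, but gives no algorithmic detail. One plausible reading is a coarse-to-fine search: a cheap model shortlists candidate words and a more detailed model rescores only the shortlist. The sketch below illustrates that general idea only; the function `two_pass_recognize`, the `top_k` parameter, and the toy scoring functions are hypothetical stand-ins, not the paper's HMMs.

```python
def two_pass_recognize(observation, coarse_scorers, fine_scorers, top_k=2):
    """Coarse-to-fine matching sketch (hypothetical, not the paper's method).

    coarse_scorers / fine_scorers map each word to a callable that returns
    a score for the observation (higher is better), standing in for the
    log-likelihood of a low-state-count and a high-state-count HMM.
    """
    # First pass: rank every word with the cheap (rough) model.
    ranked = sorted(coarse_scorers,
                    key=lambda word: coarse_scorers[word](observation),
                    reverse=True)
    shortlist = ranked[:top_k]
    # Second pass: rescore only the shortlisted words with the detailed model.
    return max(shortlist, key=lambda word: fine_scorers[word](observation))


# Toy scorers (assumptions for illustration): closeness to a scalar "template".
coarse = {
    "yes": lambda o: -abs(o - 1.0),
    "no": lambda o: -abs(o - 2.0),
    "go": lambda o: -abs(o - 9.0),
}
fine = {
    "yes": lambda o: -abs(o - 1.2),
    "no": lambda o: -abs(o - 1.6),
    "go": lambda o: 0.0,  # never consulted: "go" is pruned in the first pass
}

result = two_pass_recognize(1.5, coarse, fine, top_k=2)
```

The design trade-off is the usual one for shortlisting: the second pass costs less because it evaluates fewer candidates, but a word pruned by the rough model cannot be recovered later.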

Elsevier - Publisher Connector

Integrate template matching and statistical modeling for continuous speech recognition

Author: Sun Xie
Publication venue: 'University of Missouri Libraries'

Title from PDF of title page (University of Missouri--Columbia, viewed on May 30, 2012). The entire thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file; a non-technical public abstract appears in the public.pdf file. Dissertation advisor: Dr. Yunxin Zhao. Vita. Ph.D., University of Missouri--Columbia, 2011. "December 2011."
In this dissertation, a novel approach of integrating template matching with statistical modeling is proposed to improve continuous speech recognition. Commonly used Hidden Markov Models (HMMs) are ineffective in modeling details of speech temporal evolutions, a weakness that template-based methods can overcome. However, template-based methods are difficult to extend to large vocabulary continuous speech recognition (LVCSR). Our proposed approach takes advantage of both statistical modeling and template matching to overcome the weaknesses of traditional HMMs and conventional template-based methods. We use multiple Gaussian Mixture Model indices to represent each frame of speech templates. The local distances of log likelihood ratio and Kullback-Leibler divergence are proposed for dynamic time warping based template matching. In order to reduce computational complexity and storage space, we propose methods of minimum distance template selection and maximum log-likelihood template selection, and investigate a template compression method on top of template selection to further improve recognition performance. Experimental results on the TIMIT phone recognition task and an LVCSR task of telehealth captioning demonstrated that the proposed approach significantly improved recognition accuracy over the HMM baselines, and on the TIMIT task, the proposed method showed consistent performance improvements over progressively enhanced HMM baselines. Moreover, the template selection methods largely reduced computation and storage complexities. Finally, an investigation was made to combine acoustic scores in triphone template matching with scores of prosodic features, which showed positive effects on vowels in LVCSR.
Includes bibliographical references.
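The abstract above names Kullback-Leibler divergence as one local distance for template matching but does not give the formula used. As a rough illustration only, the sketch below computes the standard closed-form KL divergence between two diagonal-covariance Gaussians. Treating each frame as a single Gaussian is an assumption made here for simplicity (the dissertation represents frames by multiple Gaussian mixture indices), and the function name `kl_diag_gaussians` is hypothetical.

```python
import math


def kl_diag_gaussians(mu0, var0, mu1, var1):
    """KL(N0 || N1) for Gaussians with diagonal covariances.

    mu0, mu1: per-dimension means; var0, var1: per-dimension variances (> 0).
    Uses the closed form
        0.5 * sum(log(var1/var0) + (var0 + (mu0 - mu1)^2) / var1 - 1).
    """
    total = 0.0
    for m0, v0, m1, v1 in zip(mu0, var0, mu1, var1):
        total += math.log(v1 / v0) + (v0 + (m0 - m1) ** 2) / v1 - 1.0
    return 0.5 * total


# Identical distributions have zero divergence; shifting the mean by one
# standard deviation in one dimension gives 0.5.
same = kl_diag_gaussians([0.0], [1.0], [0.0], [1.0])
shifted = kl_diag_gaussians([0.0], [1.0], [1.0], [1.0])
```

KL divergence is not symmetric, so a system using it as a frame-to-frame distance has to fix the argument order or symmetrise it; the abstract does not say which choice was made.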

University of Missouri: MOspace

Design and implementation of a user-oriented speech recognition interface: the synergy of technology and human factors

Author: Kloosterman Sietse H.
Publication venue: Elsevier
Publication date: 01/01/1994

The design and implementation of a user-oriented speech recognition interface are described. The interface enables the use of speech recognition in so-called interactive voice response systems, which can be accessed via a telephone connection. In the design of the interface a synergy of technology and human factors is achieved. This synergy is very important for making speech interfaces a natural and acceptable form of human-machine interaction. Important concepts such as interfaces, human factors and speech recognition are discussed. Additionally, a sketch of the interface's implementation indicates how the synergy of human factors and technology can be realised. An explanation is also provided of how the interface might be fruitfully integrated into different applications.

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

University of Twente Research Information

Dissertations of the University of Groningen

Continuous Action Recognition Based on Sequence Alignment

Authors: Cech Jan, Evangelidis Georgios, Horaud Radu, Kulkarni Kaustubh
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 02/06/2014

Continuous action recognition is more challenging than isolated recognition because classification and segmentation must be carried out simultaneously. We build on the well-known dynamic time warping (DTW) framework and devise a novel visual alignment technique, namely dynamic frame warping (DFW), which performs isolated recognition based on a per-frame representation of videos and on aligning a test sequence with a model sequence. Moreover, we propose two extensions that enable recognition to be performed concomitantly with segmentation, namely one-pass DFW and two-pass DFW. These two methods have their roots in the domain of continuous recognition of speech and, to the best of our knowledge, their extension to continuous visual action recognition has been overlooked. We test and illustrate the proposed techniques with a recently released dataset (RAVEL) and with two public-domain datasets widely used in action recognition (Hollywood-1 and Hollywood-2). We also compare the performance of the proposed isolated and continuous recognition algorithms with several recently published methods.
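For readers unfamiliar with the DTW framework that the abstract above builds on, here is a minimal sketch of classic dynamic time warping between two sequences. It is the textbook algorithm, not the paper's dynamic frame warping; the function name `dtw_distance` and the default absolute-difference local distance on scalar frames are assumptions chosen for brevity.

```python
def dtw_distance(seq_a, seq_b, dist=lambda x, y: abs(x - y)):
    """Classic DTW cost between two sequences (textbook form).

    cost[i][j] holds the cheapest cumulative cost of aligning the first i
    frames of seq_a with the first j frames of seq_b.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = dist(seq_a[i - 1], seq_b[j - 1])
            # Allowed predecessors: advance in seq_a only, in seq_b only,
            # or in both sequences at once.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


# A sequence and a time-stretched copy of it align at zero cost.
stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
offset = dtw_distance([0, 0], [1, 1])
```

Isolated recognition with this kind of alignment typically picks the model sequence with the lowest alignment cost; continuous recognition additionally has to decide where one action ends and the next begins, which is the problem the one-pass and two-pass variants in the abstract address.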

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Statistical assessment of speech system performance

Author: Moshier Stephen L.

Methods for normalizing the performance test results of speech recognition systems are presented. Technological accomplishments in speech recognition systems, as well as planned research activities, are described.

NASA Technical Reports Server