
9,337 research outputs found

Wavelet transforms for non-uniform speech recognition

Author: Javier L
Lleida E
Martí J
Nadeu Camprubí Climent
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1996

An algorithm for nonuniform speech segmentation and its application in speech recognition systems is presented. A method based on the Modulated Gaussian Wavelet Transform based Speech Analyser (MGWTSA) and the subsequent parametrization block is used to transform a uniform signal into a set of nonuniformly separated frames, with the accurate information being fed into a speech recognition system. The algorithm uses a frame to characterize the signal only where necessary, reducing the number of frames per signal as much as possible without an appreciable reduction in the recognition rate of the system. Peer reviewed. Postprint (published version).

UPCommons. Portal del coneixement obert de la UPC

Integrating user-centred design in the development of a silent speech interface based on permanent magnetic articulography

Author: Bai Jie
Cheah Lam A.
Ell Stephen R.
Fagan Michael J.
Gilbert James M.
Gonzalez Jose A.
Green Phil D.
Moore Roger K.
Rychenko Sergey I.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 31/12/2015

A new wearable silent speech interface (SSI) based on Permanent Magnetic Articulography (PMA) was developed with the involvement of end users in the design process. Hence, desirable features such as appearance, portability, ease of use and light weight were integrated into the prototype. The aim of this paper is to address the challenges faced and the design considerations addressed during the development. Evaluations of both hardware and speech recognition performance are presented. The new prototype shows comparable performance to its predecessor in terms of speech recognition accuracy (i.e. ~95% word accuracy and ~75% sequence accuracy), but significantly improved appearance, portability and hardware features in terms of miniaturization and cost.

Repository@Hull - Worktribe

Speech systems research at Texas Instruments

Author: Doddington George R.

An assessment of automatic speech processing technology is presented. Fundamental problems in the development and the deployment of automatic speech processing systems are defined, and a technology forecast for speech systems is presented.

NASA Technical Reports Server

Information fusion for subband-HMM speaker recognition

Author: Damper R. I.
Dodd T. J.
Higgins J. E.
Publication date: 01/01/2001

Southampton (e-Prints Soton)

Multimodal One-Shot Learning of Speech and Images

Author: Eloff Ryan
Engelbrecht Herman A.
Kamper Herman
Publication date: 15/04/2019

Imagine a robot is shown new concepts visually together with spoken tags, e.g. "milk", "eggs", "butter". After seeing one paired audio-visual example per class, it is shown a new set of unseen instances of these objects, and asked to pick the "milk". Without receiving any hard labels, could it learn to match the new continuous speech input to the correct visual instance? Although unimodal one-shot learning has been studied, where one labelled example in a single modality is given per class, this example motivates multimodal one-shot learning. Our main contribution is to formally define this task, and to propose several baseline and advanced models. We use a dataset of paired spoken and visual digits to specifically investigate recent advances in Siamese convolutional neural networks. Our best Siamese model achieves twice the accuracy of a nearest neighbour model using pixel-distance over images and dynamic time warping over speech in 11-way cross-modal matching. Comment: 5 pages, 1 figure, 3 tables; accepted to ICASSP 201
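The nearest neighbour baseline mentioned in this abstract relies on dynamic time warping (DTW) over speech. The sketch below illustrates only that speech half, assuming each utterance is a list of equal-dimension feature vectors and that one labelled support utterance exists per class. The function names `dtw_distance` and `nearest_neighbour` and the data layout are hypothetical, chosen for illustration, and are not taken from the paper.

```python
import math


def dtw_distance(a, b):
    """Cumulative DTW cost between two sequences of feature vectors."""
    n, m = len(a), len(b)
    # cost[i][j] holds the best alignment cost of a[:i] against b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local cost: Euclidean distance between the two frames.
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


def nearest_neighbour(query, support):
    """Return the label whose single support utterance is closest under DTW."""
    return min(support, key=lambda label: dtw_distance(query, support[label]))


# Toy one-dimensional "features"; real systems would use e.g. spectral frames.
support = {
    "one": [[0.0], [1.0], [2.0]],
    "two": [[5.0], [5.0], [6.0]],
}
query = [[0.0], [0.0], [1.0], [2.0]]  # a time-stretched version of "one"
print(nearest_neighbour(query, support))
```

In a cross-modal setting, the support item matched this way would then be compared against candidate images with a pixel distance; that second step is omitted here.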

arXiv.org e-Print Archive

Crossref

Robust speech recognition based on a Bayesian prediction approach

Author: Hirose K
Huo Q
Jiang H
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1999

We study a category of robust speech recognition problems in which mismatches exist between training and testing conditions, and no accurate knowledge of the mismatch mechanism is available. The only available information is the test data along with a set of pretrained Gaussian mixture continuous density hidden Markov models (CDHMMs). We investigate the problem from the viewpoint of Bayesian prediction. A simple prior distribution, namely a constrained uniform distribution, is adopted to characterize the uncertainty of the mean vectors of the CDHMMs. Two methods, namely a model compensation technique based on Bayesian predictive density and a robust decision strategy called Viterbi Bayesian predictive classification, are studied. The proposed methods are compared with the conventional Viterbi decoding algorithm in speaker-independent recognition experiments on isolated digits and TI connected digit strings (TIDIGITS), where the mismatches between training and testing conditions are caused by: (1) additive Gaussian white noise, (2) each of 25 types of actual additive ambient noises, and (3) gender difference. The experimental results show that the adopted prior distribution and the proposed techniques help to improve the performance robustness under the examined mismatch conditions.
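As an illustration of what a Bayesian predictive density looks like under a uniform prior on a Gaussian mean, consider a simplified scalar case. The symbols here are assumptions introduced for this sketch and are not taken from the paper: a nominal mean \mu_0, a known standard deviation \sigma, a uniform prior on the true mean \mu over [\mu_0 - C, \mu_0 + C], and the standard normal cumulative distribution function \Phi. Averaging the Gaussian likelihood over the prior gives:

```latex
\tilde{p}(x)
  = \int_{\mu_0 - C}^{\mu_0 + C} \frac{1}{2C}\, \mathcal{N}(x;\, \mu,\, \sigma^2)\, d\mu
  = \frac{1}{2C}\left[
      \Phi\!\left(\frac{x - \mu_0 + C}{\sigma}\right)
      - \Phi\!\left(\frac{x - \mu_0 - C}{\sigma}\right)
    \right]
```

The resulting density is flatter around \mu_0 than the original Gaussian, which is one way to see why such a prior can tolerate a shift in the mean between training and testing conditions.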

CiteSeerX

HKU Scholars Hub

Voice input/output capabilities at Perception Technology Corporation

Author: Ferber Leon A.

Condensed resumes of key company personnel at the Perception Technology Corporation are presented. The staff possesses expertise in speech recognition, speech synthesis, speaker authentication, and language identification. Hardware and software engineers' capabilities are included.

NASA Technical Reports Server

Video augmentation for improving audio speech recognition under noise

Author: Cavallaro A
Gong S
Pachoud S
Publication venue: British Machine Vision Conference (BMVC)
Publication date: 23/02/2015

Queen Mary Research Online