Syntactic error modeling and scoring normalization in speech recognition: Error modeling and scoring normalization in the speech recognition task for adult literacy training

Author: Olorenshaw Lex
Trawick David

The purpose was to develop a speech recognition system that detects incorrectly pronounced speech, given that the text of the spoken speech is known to the recognizer. Better mechanisms are provided for using speech recognition in a literacy tutor application. Using a combination of scoring normalization techniques and cheater-mode decoding, a reasonable acceptance/rejection threshold was provided. In tests on continuous speech, the system correctly accepted over 80% of words while correctly rejecting over 80% of incorrectly pronounced words.

NASA Technical Reports Server

Privacy-Preserving iVector-Based Speaker Verification

Author: Lu R.
Rahulamathavan Y.
Rajarajan M.
Ray I. G.
Sutharsini K. R.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/03/2019

This paper introduces an efficient algorithm for privacy-preserving voice verification based on iVector and linear discriminant analysis techniques. This research considers a scenario in which users enrol their voice biometric to access different services (e.g., banking). Once enrolment is completed, users can verify themselves using their voice print instead of alphanumeric passwords. Since a voice print is unique for everyone, storing it with a third-party server raises several privacy concerns. To address this challenge, this paper proposes a novel technique based on randomization to carry out voice authentication, which allows the user to enrol and verify their voice in the randomized domain. To achieve this, the iVector-based voice verification technique has been redesigned to work on the randomized domain. The proposed algorithm is validated using a well-known speech dataset. The proposed algorithm neither compromises the authentication accuracy nor adds additional complexity due to the randomization operations.

City Research Online

An end-to-end framework for real-time automatic sleep stage classification.

Author: Ancoli-Israel Sonia
Chee Michael WL
Gooley Joshua J
Ong Ju Lynn
Patanaik Amiya
Publication venue: eScholarship, University of California
Publication date: 01/05/2018

Sleep staging is a fundamental but time consuming process in any sleep laboratory. To greatly speed up sleep staging without compromising accuracy, we developed a novel framework for performing real-time automatic sleep stage classification. The client-server architecture adopted here provides an end-to-end solution for anonymizing and efficiently transporting polysomnography data from the client to the server and for receiving sleep stages in an interoperable fashion. The framework intelligently partitions the sleep staging task between the client and server in a way that multiple low-end clients can work with one server, and can be deployed both locally as well as over the cloud. The framework was tested on four datasets comprising ≈1700 polysomnography records (≈12000 hr of recordings) collected from adolescents, young, and old adults, involving healthy persons as well as those with medical conditions. We used two independent validation datasets: one comprising patients from a sleep disorders clinic and the other incorporating patients with Parkinson's disease. Using this system, an entire night's sleep was staged with an accuracy on par with expert human scorers but much faster (≈5 s compared with 30-60 min). To illustrate the utility of such real-time sleep staging, we used it to facilitate the automatic delivery of acoustic stimuli at targeted phase of slow-sleep oscillations to enhance slow-wave sleep

eScholarship - University of California

MISPRONUNCIATION DETECTION AND DIAGNOSIS IN MANDARIN ACCENTED ENGLISH SPEECH

Author: Khanal Subash
Publication venue: UKnowledge
Publication date: 01/01/2020

This work presents the development, implementation, and evaluation of a Mispronunciation Detection and Diagnosis (MDD) system, with application to pronunciation evaluation of Mandarin-accented English speech. A comprehensive detection and diagnosis of errors in the Electromagnetic Articulography corpus of Mandarin-Accented English (EMA-MAE) was performed by using the expert phonetic transcripts and an Automatic Speech Recognition (ASR) system. Articulatory features derived from the parallel kinematic data available in the EMA-MAE corpus were used to identify the most significant articulatory error patterns seen in L2 speakers during common mispronunciations. Using both acoustic and articulatory information, an ASR based Mispronunciation Detection and Diagnosis (MDD) system was built and evaluated across different feature combinations and Deep Neural Network (DNN) architectures. The MDD system captured mispronunciation errors with a detection accuracy of 82.4%, a diagnostic accuracy of 75.8% and a false rejection rate of 17.2%. The results demonstrate the advantage of using articulatory features in revealing the significant contributors of mispronunciation as well as improving the performance of MDD systems

University of Kentucky

Very Fast Keyword Spotting System with Real Time Factor below 0.01

Author: J Foote
J Málek
J Nouza
X Zhou
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 21/07/2020

In the paper we present an architecture of a keyword spotting (KWS) system that is based on modern neural networks, yields good performance on various types of speech data and can run very fast. We focus mainly on the last aspect and propose optimizations for all the steps required in a KWS design: signal processing and likelihood computation, Viterbi decoding, spot candidate detection and confidence calculation. We present time and memory efficient modelling by bidirectional feedforward sequential memory networks (an alternative to recurrent nets) either by standard triphones or so called quasi-monophones, and an entirely forward decoding of speech frames (with minimal need for look back). Several variants of the proposed scheme are evaluated on 3 large Czech datasets (broadcast, internet and telephone, 17 hours in total) and their performance is compared by Detection Error Tradeoff (DET) diagrams and real-time (RT) factors. We demonstrate that the complete system can run in a single pass with a RT factor close to 0.001 if all optimizations (including a GPU for likelihood computation) are applied. Comment: 11 pages, 3 figure

arXiv.org e-Print Archive

Crossref

Enhancing posterior based speech recognition systems

Author: Ketabdar Hamed
Publication venue: Lausanne, EPFL
Publication date: 05/09/2008

The use of local phoneme posterior probabilities has been increasingly explored for improving speech recognition systems. Hybrid hidden Markov model / artificial neural network (HMM/ANN) and Tandem are the most successful examples of such systems. In this thesis, we present a principled framework for enhancing the estimation of local posteriors, by integrating phonetic and lexical knowledge, as well as long contextual information. This framework allows for hierarchical estimation, integration and use of local posteriors from the phoneme up to the word level. We propose two approaches for enhancing the posteriors. In the first approach, phoneme posteriors estimated with an ANN (particularly multi-layer Perceptron – MLP) are used as emission probabilities in HMM forward-backward recursions. This yields new enhanced posterior estimates integrating HMM topological constraints (encoding specific phonetic and lexical knowledge), and long context. In the second approach, a temporal context of the regular MLP posteriors is post-processed by a secondary MLP, in order to learn inter and intra dependencies among the phoneme posteriors. The learned knowledge is integrated in the posterior estimation during the inference (forward pass) of the second MLP, resulting in enhanced posteriors. The use of resulting local enhanced posteriors is investigated in a wide range of posterior based speech recognition systems (e.g. Tandem and hybrid HMM/ANN), as a replacement or in combination with the regular MLP posteriors. The enhanced posteriors consistently outperform the regular posteriors in different applications over small and large vocabulary databases

Infoscience - École polytechnique fédérale de Lausanne

Support Vector Machine classification of strong gravitational lenses

Author: Flamary R.
Hartley P.
Jackson N.
Metcalf R. B.
Tagore A. S.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2017

The imminent advent of very large-scale optical sky surveys, such as Euclid and LSST, makes it important to find efficient ways of discovering rare objects such as strong gravitational lens systems, where a background object is multiply gravitationally imaged by a foreground mass. As well as finding the lens systems, it is important to reject false positives due to intrinsic structure in galaxies, and much work is in progress with machine learning algorithms such as neural networks in order to achieve both these aims. We present and discuss a Support Vector Machine (SVM) algorithm which makes use of a Gabor filterbank in order to provide learning criteria for separation of lenses and non-lenses, and demonstrate using blind challenges that under certain circumstances it is a particularly efficient algorithm for rejecting false positives. We compare the SVM engine with a large-scale human examination of 100000 simulated lenses in a challenge dataset, and also apply the SVM method to survey images from the Kilo-Degree Survey. Comment: Accepted by MNRA

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

The University of Manchester - Institutional Repository

Survey on Leveraging Uncertainty Estimation Towards Trustworthy Deep Neural Networks: The Case of Reject Option and Post-training Processing

Author: Abdar Moloud
Aickelin Uwe
Hasan Mehedi
Hossain Ibrahim
Khosravi Abbas
Lio' Pietro
Nahavandi Saeid
Rahman Ashikur
Publication date: 10/04/2023

Although neural networks (especially deep neural networks) have achieved better-than-human performance in many fields, their real-world deployment is still questionable due to the lack of awareness about the limitation in their knowledge. To incorporate such awareness in the machine learning model, prediction with reject option (also known as selective classification or classification with abstention) has been proposed in literature. In this paper, we present a systematic review of the prediction with the reject option in the context of various neural networks. To the best of our knowledge, this is the first study focusing on this aspect of neural networks. Moreover, we discuss different novel loss functions related to the reject option and post-training processing (if any) of network output for generating suitable measurements for knowledge awareness of the model. Finally, we address the application of the rejection option in reducing the prediction time for the real-time problems and present a comprehensive summary of the techniques related to the reject option in the context of extensive variety of neural networks. Our code is available on GitHub: https://github.com/MehediHasanTutul/Reject_option

arXiv.org e-Print Archive