A new robust algorithm for isolated word endpoint detection
Teager Energy and Energy-Entropy Features are two approaches that have recently been used for locating the endpoints of an utterance. However, each has drawbacks for speech in noisy environments. This paper proposes a novel method that combines the two approaches to locate candidate endpoint intervals, and then makes the final decision based on energy alone, which requires far less time than the feature-based methods. After the algorithm description, an experimental evaluation is presented, comparing the automatically determined endpoints with those determined by skilled personnel. It is shown that the accuracy of this algorithm is satisfactory and acceptable.
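The two-stage idea above can be sketched in a few lines: candidate intervals are flagged with the Teager energy operator and a frame-entropy estimate, and the final decision inside those intervals falls back to cheap short-time energy. This is a minimal illustration, not the paper's exact feature definitions; the thresholds and the histogram-based entropy are assumptions.

```python
import math

def teager_energy(frame):
    # Teager energy operator: psi[n] = x[n]^2 - x[n-1]*x[n+1], averaged
    return sum(frame[n]**2 - frame[n-1]*frame[n+1]
               for n in range(1, len(frame) - 1)) / max(len(frame) - 2, 1)

def short_time_energy(frame):
    return sum(s * s for s in frame) / len(frame)

def frame_entropy(frame):
    # crude amplitude-distribution entropy as a stand-in for the paper's
    # energy-entropy feature (an assumption, not the published definition)
    mags = [abs(s) for s in frame]
    total = sum(mags) or 1.0
    probs = [m / total for m in mags if m > 0]
    return -sum(p * math.log(p) for p in probs)

def detect_endpoints(frames, te_thresh, ent_thresh, en_thresh):
    # Stage 1: Teager energy and entropy mark candidate speech frames.
    # Stage 2: the final decision uses plain short-time energy only.
    candidates = [i for i, f in enumerate(frames)
                  if teager_energy(f) > te_thresh and frame_entropy(f) < ent_thresh]
    return [i for i in candidates if short_time_energy(frames[i]) > en_thresh]
```

The cost saving comes from stage 2: once intervals are localized, only the trivial energy sum is evaluated per frame.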
A Robust Multiple Feature Approach To Endpoint Detection In Car Environment Based On Advanced Classifiers
In this paper we propose an endpoint detection system based on several features extracted from each speech frame, followed by a robust classifier (AdaBoost, bagging of decision trees, or a multilayer perceptron) and a finite-state automaton (FSA). The FSA module consists of a 4-state decision logic that filters false alarms and false positives. We compare four different classifiers on this task. The look-ahead of the proposed method is 7 frames, the number of frames that maximized the accuracy of the system. The system was tested with real signals recorded inside a car, with signal-to-noise ratios ranging from 6 dB to 30 dB. Experimental results demonstrate that the system yields robust endpoint detection.
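A 4-state decision logic of the kind described is typically a small state machine that smooths the per-frame classifier labels. The state names and the two counters below are assumptions (the abstract does not enumerate its states); they illustrate the common pattern of discarding speech bursts that are too short and bridging brief pauses.

```python
# Hypothetical states for a 4-state endpoint decision logic
SILENCE, ONSET, SPEECH, OFFSET = range(4)

def smooth(labels, min_on=3, min_off=5):
    """Run per-frame speech/non-speech labels (1/0) through the FSA:
    speech bursts shorter than min_on frames are dropped as false alarms,
    and pauses shorter than min_off frames are bridged."""
    state, count, out = SILENCE, 0, []
    for lab in labels:
        if state == SILENCE:
            if lab:
                state, count = ONSET, 1
        elif state == ONSET:
            if lab:
                count += 1
                if count >= min_on:
                    state = SPEECH
            else:
                state = SILENCE          # burst too short: false alarm
        elif state == SPEECH:
            if not lab:
                state, count = OFFSET, 1
        elif state == OFFSET:
            if lab:
                state = SPEECH           # short pause: bridge it
            else:
                count += 1
                if count >= min_off:
                    state = SILENCE
        out.append(1 if state in (SPEECH, OFFSET) else 0)
    return out
```

Note how an isolated positive label is absorbed in the ONSET state and never reaches the output.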
Exploring Non-linear Transformations for an Entropy-based Voice Activity Detector
In this paper we explore the use of non-linear transformations to improve the performance of an entropy-based voice activity detector (VAD). The idea of using a non-linear transformation comes from previous work in the speech linear prediction (LPC) field based on source-separation techniques, where the score function was added to the classical equations to take the real distribution of the signal into account. We explore estimating the entropy of frames after computing their score function, instead of using the original frames. We observe that if the signal is clean, the estimated entropy is essentially the same; but if the signal is noisy, the transformed frames (with the score function) yield different entropy for voiced frames than for unvoiced ones. Experimental results show that this makes it possible to detect voice activity under high noise, where the simple entropy method fails.
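The pipeline the abstract describes, entropy computed on score-transformed frames rather than raw samples, can be sketched as follows. The Laplacian score function used here is an assumption for illustration (the paper derives its score function from LPC/source-separation theory), and the histogram entropy estimator is likewise a stand-in.

```python
import math

def frame_entropy(frame, n_bins=32):
    """Histogram-based entropy estimate of the sample amplitudes."""
    lo, hi = min(frame), max(frame)
    width = (hi - lo) or 1.0
    counts = [0] * n_bins
    for s in frame:
        idx = min(int((s - lo) / width * n_bins), n_bins - 1)
        counts[idx] += 1
    n = len(frame)
    return -sum(c / n * math.log(c / n) for c in counts if c)

def score_transform(frame, b=0.1):
    """Score function psi(x) = -p'(x)/p(x) of an assumed Laplacian
    density with scale b, which reduces to sign(x)/b."""
    return [math.copysign(1.0 / b, s) if s else 0.0 for s in frame]

def transformed_entropy(frame):
    # entropy is computed on the transformed frame instead of the raw one
    return frame_entropy(score_transform(frame))
```

Comparing `frame_entropy(frame)` against `transformed_entropy(frame)` per frame is the decision quantity the paper investigates under noise.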
Malay articulation system for early screening diagnostic using hidden markov model and genetic algorithm
Speech recognition is an important technology and can serve as a great aid for individuals with sight or hearing disabilities. There has been extensive research interest and development in this area over the past decades. However, the prospects in Malaysia regarding usage and exposure are still immature, even though there is demand from the medical and healthcare sector. The aim of this research is to assess the quality and impact of using a computerized method for early screening of speech articulation disorders among Malaysians, such as omission, substitution, addition and distortion in their speech. In this study, a statistical probabilistic approach using a Hidden Markov Model (HMM) has been adopted with a newly designed Malay corpus for articulation disorder cases, following the SAMPA and IPA guidelines. Improvement is made at the front-end processing stage for feature vector selection by applying a silence-region calibration algorithm for start and end point detection. The classifier has also been modified significantly by incorporating Viterbi search with a Genetic Algorithm (GA) to obtain high recognition accuracy and for lexical unit classification. The results were evaluated following National Institute of Standards and Technology (NIST) benchmarking. The tests show that recognition accuracy improved by 30% to 40% using the Genetic Algorithm technique compared with the conventional technique. A new corpus was built with verification and justification from a medical expert. In conclusion, a computerized method for early screening can ease human effort in tackling speech disorders, and the proposed Genetic Algorithm technique has been proven to improve recognition performance in search and classification tasks.
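Silence-region calibration for start/end point detection is commonly done by estimating the noise floor from the leading frames of the recording, which are assumed to be silence, and placing the speech threshold a few standard deviations above it. The abstract does not give its exact algorithm, so the following is a hedged sketch of that standard scheme.

```python
import math

def calibrate_threshold(energies, n_lead=10, k=3.0):
    """Estimate the noise floor from the first n_lead frame energies
    (assumed silence) and set the threshold k std. devs. above it."""
    lead = energies[:n_lead]
    mu = sum(lead) / len(lead)
    var = sum((e - mu) ** 2 for e in lead) / len(lead)
    return mu + k * math.sqrt(var)

def endpoints(energies, n_lead=10, k=3.0):
    # first and last frame whose energy exceeds the calibrated threshold
    t = calibrate_threshold(energies, n_lead, k)
    above = [i for i, e in enumerate(energies) if e > t]
    return (above[0], above[-1]) if above else None
```

The calibration step is what makes the detector adapt to the recording conditions instead of relying on a fixed global threshold.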
Recognition of in-ear microphone speech data using multi-layer neural networks
Speech collected through a microphone placed in front of the mouth has been the primary source of data collection for speech recognition. There are only a few speech recognition studies using speech collected from the human ear canal. In this study, a speech recognition system is presented, specifically an isolated word recognizer which uses speech collected from the external auditory canals of the subjects via an in-ear microphone. Currently, the vocabulary is limited to seven words that can be used as control commands for a wide variety of applications. The speech segmentation task is achieved by using the short-time signal energy parameter and the short-time energy-entropy feature (EEF), and by incorporating some heuristic assumptions. Multi-layer feedforward neural networks with two-layer and three-layer network configurations are selected for the word recognition task and use the real cepstrum (RC) and mel-frequency cepstral coefficients (MFCCs) extracted from each segmented utterance as characteristic features for the word recognizer. Results show that the neural network configurations investigated are viable choices for this specific recognition task, as the average recognition rates obtained with the MFCCs as input features for the two-layer and three-layer networks are 94.731% and 94.61%, respectively, on the data investigated. Average recognition rates obtained using the RCs as features on the same network configurations are 86.252% and 86.7%, respectively. http://archive.org/details/recognitionofine109452848. Approved for public release; distribution is unlimited.
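The real cepstrum feature used above is the inverse transform of the log magnitude spectrum of a frame (MFCCs additionally warp the spectrum onto the mel scale before the final transform). A minimal, stdlib-only sketch, using a direct O(n^2) DFT for clarity rather than an FFT; the coefficient count of 12 is an assumption:

```python
import cmath
import math

def dft(x):
    # direct discrete Fourier transform (O(n^2); fine for short frames)
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n)) for k in range(n)]

def real_cepstrum(frame, n_coeffs=12):
    """Real cepstrum: inverse transform of the log magnitude spectrum,
    keeping the first n_coeffs coefficients as the feature vector."""
    spec = dft(frame)
    log_mag = [math.log(abs(c) + 1e-10) for c in spec]  # floor avoids log(0)
    n = len(frame)
    return [sum(log_mag[k] * math.cos(2 * math.pi * k * q / n)
                for k in range(n)) / n
            for q in range(n_coeffs)]
```

The resulting low-order coefficients summarize the spectral envelope, which is what the word recognizer's neural network consumes.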
Speech Endpoint Detection: An Image Segmentation Approach
Speech Endpoint Detection, also known as Speech Segmentation, is an unsolved problem in speech processing that affects numerous applications, including robust speech recognition. The task is not as trivial as it appears, and most existing algorithms degrade at low signal-to-noise ratios (SNRs). Most previous research has focused on the development of robust algorithms, with special attention paid to the derivation and study of noise-robust features and decision rules. This research tackles the endpoint detection problem in a different way and proposes a novel speech endpoint detection algorithm derived from the Chan-Vese algorithm for image segmentation. The proposed algorithm has the ability to fuse multiple features extracted from the speech signal to enhance detection accuracy. The algorithm's performance has been evaluated and compared to two widely used speech detection algorithms under various noise environments with SNR levels ranging from 0 dB to 30 dB. Furthermore, the proposed algorithm has also been applied to different types of American English phonemes. The experiments show that, even under conditions of severe noise contamination, the proposed algorithm is more efficient than the reference algorithms.
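The Chan-Vese model segments by fitting a piecewise-constant function: it seeks the region boundary that minimizes the squared deviation of the data from its inside mean and outside mean. A one-dimensional analogue over a frame-level feature curve can be shown by brute force; this is only an illustration of the energy being minimized, since the paper itself evolves a level set and fuses several features.

```python
def chan_vese_1d(feature):
    """Find the single interval [s, e) whose piecewise-constant fit
    (mean inside vs. mean outside) minimizes the total squared error --
    a 1-D analogue of the Chan-Vese segmentation energy."""
    n = len(feature)
    best, best_cost = None, float('inf')
    for s in range(n):
        for e in range(s + 1, n + 1):
            inside = feature[s:e]
            outside = feature[:s] + feature[e:]
            c1 = sum(inside) / len(inside)
            c2 = sum(outside) / len(outside) if outside else 0.0
            cost = (sum((x - c1) ** 2 for x in inside) +
                    sum((x - c2) ** 2 for x in outside))
            if cost < best_cost:
                best, best_cost = (s, e), cost
    return best
```

Because the criterion compares region statistics rather than individual samples against a threshold, it tends to tolerate noise better than pointwise energy thresholding, which is the intuition behind adapting it to endpoint detection.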