
    A Computation Efficient Voice Activity Detector for Low Signal-to-Noise Ratio in Hearing Aids

    This paper proposes a spectral entropy-based voice activity detection method that is computationally efficient for hearing aids. The method remains highly accurate at low SNR levels because spectral entropy is more robust to changes in the noise power. Compared with traditional fast-Fourier-transform-based spectral entropy approaches, the proposed method of calculating the spectral entropy from the outputs of a hearing-aid filter-bank significantly reduces the computational complexity. The performance of the proposed method was evaluated and compared against two other computationally efficient methods. At negative SNR levels, the proposed method achieves an accuracy more than 5% higher than the power-based method while requiring only about 1/100 of the floating-point operations of the statistical-model-based method.
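    As a rough illustration of the core idea, the sketch below computes a normalized spectral entropy directly from per-band filter-bank powers and thresholds it. The function name, the 1e-12 floor, and the 0.85 threshold are illustrative assumptions, not values from the paper.

```python
import numpy as np

def spectral_entropy_vad(band_powers, threshold=0.85):
    """Flag a frame as speech from hearing-aid filter-bank band powers.

    band_powers : 1-D array with the signal power of each filter-bank band
                  for a single frame.
    threshold   : assumed decision threshold on the normalized entropy.
    """
    p = band_powers / (np.sum(band_powers) + 1e-12)   # normalize to a probability mass
    entropy = -np.sum(p * np.log2(p + 1e-12))         # spectral entropy of the frame
    entropy /= np.log2(len(band_powers))              # scale into [0, 1]
    # Speech spectra tend to be peaky (low entropy); broadband noise is flat (high entropy).
    return entropy < threshold
```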

    A voice activity detection algorithm with sub-band detection based on time-frequency characteristics of mandarin

    Voice activity detection algorithms are widely used in voice compression, speech synthesis, speech recognition, and speech enhancement. In this paper, an efficient voice activity detection algorithm with sub-band detection based on the time-frequency characteristics of Mandarin is proposed. The proposed sub-band detection consists of two parts, crosswise detection and lengthwise detection, and both energy detection and pitch detection are taken into consideration. For better performance, a double-threshold criterion is used to reduce the misjudgment rate of the detection. Performance was evaluated in six noise environments at different SNRs. Experimental results indicate that the proposed algorithm detects voice regions effectively in non-stationary and low-SNR environments and has potential for further improvement.
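    The double-threshold criterion mentioned above is a classic endpoint-detection device. A minimal sketch of one common formulation follows, assuming per-frame energies and two illustrative thresholds; the paper's actual thresholds and its sub-band combination rule are not given in the abstract.

```python
import numpy as np

def double_threshold_vad(frame_energy, high=0.5, low=0.2):
    """Mark speech frames with the classic double-threshold rule.

    frame_energy : 1-D array of short-time energies, one value per frame.
    high, low    : assumed thresholds; frames above `high` seed a speech
                   region, which is then extended while energy stays above `low`.
    """
    speech = np.zeros(len(frame_energy), dtype=bool)
    for seed in np.flatnonzero(frame_energy > high):
        i = seed
        while i >= 0 and frame_energy[i] > low:                  # extend backwards
            speech[i] = True
            i -= 1
        i = seed
        while i < len(frame_energy) and frame_energy[i] > low:   # extend forwards
            speech[i] = True
            i += 1
    return speech
```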

    Intelligent CCTV Surveillance Based on Sound Recognition and Sound Localization

    CCTV is used for many purposes, especially for surveillance and for traffic-condition monitoring. This paper proposes an intelligent CCTV system that tracks sound events based on sound recognition and sound localization. The experimental results show that the proposed method can be used successfully in an intelligent CCTV surveillance system.

    Variance of spectral entropy (VSE): an SNR estimator for speech enhancement in hearing aids

    In everyday situations an individual can encounter a variety of acoustic environments. For a hearing-aid user, following speech in different types of background noise can often present a challenge. For this reason, estimating the signal-to-noise ratio (SNR) is a key factor to consider in hearing-aid design. The ability to adjust a noise-reduction algorithm according to the SNR could provide the flexibility required to improve speech intelligibility in varying levels of background noise. However, most current high-accuracy SNR estimation methods are relatively complex and may inhibit the performance of hearing aids. This study investigates the advantages of incorporating a spectral entropy method to estimate SNR for speech enhancement in hearing aids, in particular a variance of spectral entropy (VSE) measure. The VSE approach avoids some of the complex computational steps of traditional statistical-model-based SNR estimation methods by measuring the spectral entropy only among the frequency channels of interest within the hearing aid. For this study, the SNR was estimated using the spectral entropy method in different types of noise: the variance of the spectral entropy in a hearing-aid model with 10 peripheral frequency channels was used to measure the SNR. By measuring the variance of the spectral entropy at input SNR levels between -10 dB and 20 dB, the relationship function between the SNR and the VSE was estimated. The VSE for speech in noise was measured at temporal intervals of 1.5 s. The VSE method demonstrates more reliable performance in different types of background noise, in particular for babble noise with a small number of talkers, when compared to the US National Institute of Standards and Technology (NIST) or Waveform Amplitude Distribution Analysis (WADA) methods. The VSE method may also reduce additional computational steps (reducing system delays), making it more appropriate for implementation in hearing aids, where system delays should be minimized as much as possible.
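    A minimal sketch of the VSE measure as described here (spectral entropy per frame across the hearing-aid channels, then its variance over 1.5 s intervals) is given below. The frame hop and the array layout are assumptions for illustration; mapping the resulting VSE to an SNR would use the pre-estimated relationship function.

```python
import numpy as np

def variance_of_spectral_entropy(band_powers, frame_hop=0.016, window_s=1.5):
    """Variance of spectral entropy (VSE) over fixed temporal intervals.

    band_powers : 2-D array, shape (n_frames, n_channels), e.g. the output
                  power of 10 peripheral frequency channels per frame.
    frame_hop   : assumed hop between frames in seconds (illustrative).
    window_s    : interval over which the variance is taken (1.5 s here).
    """
    p = band_powers / (band_powers.sum(axis=1, keepdims=True) + 1e-12)
    entropy = -(p * np.log2(p + 1e-12)).sum(axis=1)      # spectral entropy per frame
    frames_per_window = max(1, int(window_s / frame_hop))
    n_windows = len(entropy) // frames_per_window
    # One VSE value per 1.5 s interval; noisier speech yields lower variance.
    return np.array([np.var(entropy[k * frames_per_window:(k + 1) * frames_per_window])
                     for k in range(n_windows)])
```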

    Speech Endpoint Detection: An Image Segmentation Approach

    Speech endpoint detection, also known as speech segmentation, is an unsolved problem in speech processing that affects numerous applications, including robust speech recognition. The task is not as trivial as it appears, and most existing algorithms degrade at low signal-to-noise ratios (SNRs). Most previous research has focused on the development of robust algorithms, with special attention paid to the derivation and study of noise-robust features and decision rules. This research tackles the endpoint detection problem in a different way and proposes a novel speech endpoint detection algorithm derived from the Chan-Vese algorithm for image segmentation. The proposed algorithm can fuse multiple features extracted from the speech signal to enhance detection accuracy. Its performance has been evaluated and compared to two widely used speech detection algorithms under various noise environments with SNR levels ranging from 0 dB to 30 dB. Furthermore, the proposed algorithm has also been applied to different types of American English phonemes. The experiments show that, even under conditions of severe noise contamination, the proposed algorithm is more efficient than the reference algorithms.
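    For context, Chan-Vese segmentation minimizes the following energy over a contour C and the region means c_1, c_2 of the data u_0; how the proposed algorithm adapts this 2-D image formulation to 1-D speech features is not detailed in the abstract.

```latex
F(c_1, c_2, C) = \mu\,\mathrm{Length}(C) + \nu\,\mathrm{Area}\big(\mathrm{inside}(C)\big)
  + \lambda_1 \int_{\mathrm{inside}(C)} \lvert u_0(x) - c_1 \rvert^2 \, dx
  + \lambda_2 \int_{\mathrm{outside}(C)} \lvert u_0(x) - c_2 \rvert^2 \, dx
```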

    Auditory filter-bank compression improves estimation of signal-to-noise ratio for speech in noise

    Signal-to-noise ratio (SNR) estimation is necessary for many speech processing applications and is often challenged by nonstationary noise. The authors have previously demonstrated that the variance of spectral entropy (VSE) is a reliable estimate of SNR in nonstationary noise: based on pre-estimated VSE-SNR relationship functions, the SNR of unseen acoustic environments can be estimated from the measured VSE. This study predicts that introducing a compressive function based on cochlear processing will increase the stability of the pre-estimated VSE-SNR relationship functions, and demonstrates that calculating the VSE with a nonlinear filter-bank simulating cochlear compression reduces the VSE-based SNR estimation errors. VSE-SNR relationship functions were estimated using speech tokens presented in babble noise comprising different numbers of speakers. Results showed that the coefficient of determination (R^2) of the estimated VSE-SNR relationship functions improves by more than 26% in absolute terms when using a filter-bank with a compressive function, compared with a linear filter-bank without compression. In 2-talker babble noise, the estimation accuracy is more than 3 dB better than that of other published methods.
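    A power-law nonlinearity is a common first-order stand-in for cochlear compression; the sketch below applies such a compression to the band powers before the VSE computation sketched earlier. The 0.3 exponent is an illustrative assumption, not the paper's auditory model.

```python
import numpy as np

def compress_band_powers(band_powers, exponent=0.3):
    """Power-law compression of filter-bank band powers.

    Applied before the spectral-entropy step, compression flattens level
    differences between bands, which is the kind of change this abstract
    credits with stabilizing the VSE-SNR relationship functions.
    exponent : assumed compressive exponent (illustrative).
    """
    return np.power(band_powers, exponent)
```

    The compressed powers would then feed the same VSE computation shown above, e.g. variance_of_spectral_entropy(compress_band_powers(band_powers)).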

    Recognition of in-ear microphone speech data using multi-layer neural networks

    Speech collected through a microphone placed in front of the mouth has been the primary source of data for speech recognition; only a few speech recognition studies have used speech collected from the human ear canal. In this study, a speech recognition system is presented, specifically an isolated word recognizer that uses speech collected from the external auditory canals of the subjects via an in-ear microphone. Currently, the vocabulary is limited to seven words that can be used as control commands for a wide variety of applications. The speech segmentation task is achieved by using the short-time signal energy parameter and the short-time energy-entropy feature (EEF), and by incorporating some heuristic assumptions. Multi-layer feedforward neural networks with two-layer and three-layer configurations are selected for the word recognition task, using the real cepstrum (RC) and mel-frequency cepstral coefficients (MFCCs) extracted from each segmented utterance as characteristic features. Results show that the neural network configurations investigated are viable choices for this recognition task: the average recognition rates obtained with MFCC input features for the two-layer and three-layer networks are 94.731% and 94.61% respectively, while the average recognition rates obtained using RCs as features on the same configurations are 86.252% and 86.7% respectively.
    http://archive.org/details/recognitionofine109452848
    Approved for public release; distribution is unlimited.
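    As a rough sketch of the recognizer's shape, the snippet below runs a two-layer feedforward network (one hidden layer plus a softmax over the seven-word vocabulary) on an MFCC feature vector. The layer sizes, tanh activation, and random weights are illustrative assumptions; the thesis's exact topology and training procedure are not given in the abstract.

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Two-layer feedforward pass: hidden layer + softmax output.

    x : feature vector, e.g. MFCCs summarizing a segmented utterance.
    The weights here are random placeholders; in the study the network
    would be trained on the seven-word in-ear vocabulary.
    """
    h = np.tanh(x @ W1 + b1)            # hidden layer
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max())   # numerically stable softmax
    return e / e.sum()                  # class probabilities over the 7 words

rng = np.random.default_rng(0)
n_mfcc, n_hidden, n_words = 13, 32, 7   # assumed dimensions
W1 = rng.normal(scale=0.1, size=(n_mfcc, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_words)); b2 = np.zeros(n_words)
probs = mlp_forward(rng.normal(size=n_mfcc), W1, b1, W2, b2)
```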