
    A Computation Efficient Voice Activity Detector for Low Signal-to-Noise Ratio in Hearing Aids

    This paper proposes a spectral entropy-based voice activity detection method that is computationally efficient for hearing aids. The method remains highly accurate at low SNR levels because spectral entropy is more robust to changes in the noise power. Compared with traditional fast-Fourier-transform-based spectral entropy approaches, the proposed method of calculating the spectral entropy from the outputs of a hearing-aid filter-bank significantly reduces the computational complexity. The performance of the proposed method was evaluated and compared against two other computationally efficient methods. At negative SNR levels, the proposed method achieves an accuracy more than 5% higher than the power-based method while requiring only about 1/100 of the floating-point operations of the statistical-model-based method.
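    As a rough illustration of the core idea, the sketch below computes a normalized spectral entropy directly from per-band filter-bank powers and thresholds it. The function name, the 1e-12 floor, and the 0.85 threshold are illustrative assumptions, not values from the paper.

```python
import numpy as np

def spectral_entropy_vad(band_powers, threshold=0.85):
    """Flag a frame as speech from hearing-aid filter-bank band powers.

    band_powers : 1-D array with the signal power of each filter-bank band
                  for a single frame.
    threshold   : assumed decision threshold on the normalized entropy.
    """
    p = band_powers / (np.sum(band_powers) + 1e-12)   # normalize to a probability mass
    entropy = -np.sum(p * np.log2(p + 1e-12))         # spectral entropy of the frame
    entropy /= np.log2(len(band_powers))              # scale into [0, 1]
    # Speech spectra tend to be peaky (low entropy); broadband noise is flat (high entropy).
    return entropy < threshold
```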

    A voice activity detection algorithm with sub-band detection based on time-frequency characteristics of mandarin

    Voice activity detection algorithms are widely used in voice compression, speech synthesis, speech recognition, and speech enhancement. In this paper, an efficient voice activity detection algorithm with sub-band detection based on the time-frequency characteristics of Mandarin is proposed. The proposed sub-band detection consists of two parts, crosswise detection and lengthwise detection, and both energy detection and pitch detection are taken into consideration. For better performance, a double-threshold criterion is used to reduce the misjudgment rate of the detection. Performance was evaluated in six noise environments at different SNRs. Experimental results indicate that the proposed algorithm detects voice regions effectively in non-stationary and low-SNR environments and has potential for further improvement.
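    The double-threshold criterion mentioned above is a classic endpoint-detection device. A minimal sketch of one common formulation follows, assuming per-frame energies and two illustrative thresholds; the paper's actual thresholds and its sub-band combination rule are not given in the abstract.

```python
import numpy as np

def double_threshold_vad(frame_energy, high=0.5, low=0.2):
    """Mark speech frames with the classic double-threshold rule.

    frame_energy : 1-D array of short-time energies, one value per frame.
    high, low    : assumed thresholds; frames above `high` seed a speech
                   region, which is then extended while energy stays above `low`.
    """
    speech = np.zeros(len(frame_energy), dtype=bool)
    for seed in np.flatnonzero(frame_energy > high):
        i = seed
        while i >= 0 and frame_energy[i] > low:                  # extend backwards
            speech[i] = True
            i -= 1
        i = seed
        while i < len(frame_energy) and frame_energy[i] > low:   # extend forwards
            speech[i] = True
            i += 1
    return speech
```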

    Intelligent CCTV Surveillance Based on Sound Recognition and Sound Localization

    CCTV is used for many purposes, especially for surveillance and for traffic-condition monitoring. This paper proposes an intelligent CCTV system that tracks sound events based on sound recognition and sound localization. The experimental results show that the proposed method can be used successfully in an intelligent CCTV surveillance system.

    Variance of spectral entropy (VSE): an SNR estimator for speech enhancement in hearing aids

    In everyday situations an individual can encounter a variety of acoustic environments. For a hearing-aid user, following speech in different types of background noise can often present a challenge. For this reason, estimating the signal-to-noise ratio (SNR) is a key factor to consider in hearing-aid design. The ability to adjust a noise-reduction algorithm according to the SNR could provide the flexibility required to improve speech intelligibility in varying levels of background noise. However, most current high-accuracy SNR estimation methods are relatively complex and may inhibit the performance of hearing aids. This study investigates the advantages of incorporating a spectral entropy method to estimate SNR for speech enhancement in hearing aids, in particular a variance of spectral entropy (VSE) measure. The VSE approach avoids some of the complex computational steps of traditional statistical-model-based SNR estimation methods by measuring the spectral entropy only among the frequency channels of interest within the hearing aid. For this study, the SNR was estimated using the spectral entropy method in different types of noise: the variance of the spectral entropy in a hearing-aid model with 10 peripheral frequency channels was used to measure the SNR. By measuring the variance of the spectral entropy at input SNR levels between -10 dB and 20 dB, the relationship function between the SNR and the VSE was estimated. The VSE for speech in noise was measured at temporal intervals of 1.5 s. The VSE method demonstrates more reliable performance in different types of background noise, in particular for babble noise with a small number of talkers, when compared to the US National Institute of Standards and Technology (NIST) or Waveform Amplitude Distribution Analysis (WADA) methods. The VSE method may also reduce additional computational steps (reducing system delays), making it more appropriate for implementation in hearing aids, where system delays should be minimized as much as possible.
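    A minimal sketch of the VSE measure as described here (spectral entropy per frame across the hearing-aid channels, then its variance over 1.5 s intervals) is given below. The frame hop and the array layout are assumptions for illustration; mapping the resulting VSE to an SNR would use the pre-estimated relationship function.

```python
import numpy as np

def variance_of_spectral_entropy(band_powers, frame_hop=0.016, window_s=1.5):
    """Variance of spectral entropy (VSE) over fixed temporal intervals.

    band_powers : 2-D array, shape (n_frames, n_channels), e.g. the output
                  power of 10 peripheral frequency channels per frame.
    frame_hop   : assumed hop between frames in seconds (illustrative).
    window_s    : interval over which the variance is taken (1.5 s here).
    """
    p = band_powers / (band_powers.sum(axis=1, keepdims=True) + 1e-12)
    entropy = -(p * np.log2(p + 1e-12)).sum(axis=1)      # spectral entropy per frame
    frames_per_window = max(1, int(window_s / frame_hop))
    n_windows = len(entropy) // frames_per_window
    # One VSE value per 1.5 s interval; noisier speech yields lower variance.
    return np.array([np.var(entropy[k * frames_per_window:(k + 1) * frames_per_window])
                     for k in range(n_windows)])
```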

    Speech Endpoint Detection: An Image Segmentation Approach

    Speech endpoint detection, also known as speech segmentation, is an unsolved problem in speech processing that affects numerous applications, including robust speech recognition. The task is not as trivial as it appears, and most existing algorithms degrade at low signal-to-noise ratios (SNRs). Most previous research has focused on the development of robust algorithms, with special attention paid to the derivation and study of noise-robust features and decision rules. This research tackles the endpoint detection problem in a different way and proposes a novel speech endpoint detection algorithm derived from the Chan-Vese algorithm for image segmentation. The proposed algorithm can fuse multiple features extracted from the speech signal to enhance detection accuracy. Its performance has been evaluated and compared to two widely used speech detection algorithms under various noise environments with SNR levels ranging from 0 dB to 30 dB. Furthermore, the proposed algorithm has also been applied to different types of American English phonemes. The experiments show that, even under conditions of severe noise contamination, the proposed algorithm is more efficient than the reference algorithms.
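    For context, Chan-Vese segmentation minimizes the following energy over a contour C and the region means c_1, c_2 of the data u_0; how the proposed algorithm adapts this 2-D image formulation to 1-D speech features is not detailed in the abstract.

```latex
F(c_1, c_2, C) = \mu\,\mathrm{Length}(C) + \nu\,\mathrm{Area}\big(\mathrm{inside}(C)\big)
  + \lambda_1 \int_{\mathrm{inside}(C)} \lvert u_0(x) - c_1 \rvert^2 \, dx
  + \lambda_2 \int_{\mathrm{outside}(C)} \lvert u_0(x) - c_2 \rvert^2 \, dx
```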

    Auditory filter-bank compression improves estimation of signal-to-noise ratio for speech in noise

    Signal-to-noise ratio (SNR) estimation is necessary for many speech processing applications and is often challenged by nonstationary noise. The authors have previously demonstrated that the variance of spectral entropy (VSE) is a reliable estimate of SNR in nonstationary noise: based on pre-estimated VSE-SNR relationship functions, the SNR of unseen acoustic environments can be estimated from the measured VSE. This study predicts that introducing a compressive function based on cochlear processing will increase the stability of the pre-estimated VSE-SNR relationship functions, and demonstrates that calculating the VSE with a nonlinear filter-bank simulating cochlear compression reduces the VSE-based SNR estimation errors. VSE-SNR relationship functions were estimated using speech tokens presented in babble noise comprising different numbers of speakers. Results showed that the coefficient of determination (R^2) of the estimated VSE-SNR relationship functions improves by more than 26% in absolute terms when using a filter-bank with a compressive function, compared with a linear filter-bank without compression. In 2-talker babble noise, the estimation accuracy is more than 3 dB better than that of other published methods.
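    A power-law nonlinearity is a common first-order stand-in for cochlear compression; the sketch below applies such a compression to the band powers before the VSE computation sketched earlier. The 0.3 exponent is an illustrative assumption, not the paper's auditory model.

```python
import numpy as np

def compress_band_powers(band_powers, exponent=0.3):
    """Power-law compression of filter-bank band powers.

    Applied before the spectral-entropy step, compression flattens level
    differences between bands, which is the kind of change this abstract
    credits with stabilizing the VSE-SNR relationship functions.
    exponent : assumed compressive exponent (illustrative).
    """
    return np.power(band_powers, exponent)
```

    The compressed powers would then feed the same VSE computation shown above, e.g. variance_of_spectral_entropy(compress_band_powers(band_powers)).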

    Recognition of in-ear microphone speech data using multi-layer neural networks

    Speech collected through a microphone placed in front of the mouth has been the primary source of data for speech recognition; only a few speech recognition studies have used speech collected from the human ear canal. In this study, a speech recognition system is presented, specifically an isolated word recognizer that uses speech collected from the external auditory canals of the subjects via an in-ear microphone. Currently, the vocabulary is limited to seven words that can be used as control commands for a wide variety of applications. The speech segmentation task is achieved by using the short-time signal energy parameter and the short-time energy-entropy feature (EEF), and by incorporating some heuristic assumptions. Multi-layer feedforward neural networks with two-layer and three-layer configurations are selected for the word recognition task, using the real cepstrum (RC) and mel-frequency cepstral coefficients (MFCCs) extracted from each segmented utterance as characteristic features. Results show that the neural network configurations investigated are viable choices for this recognition task: the average recognition rates obtained with MFCC input features for the two-layer and three-layer networks are 94.731% and 94.61% respectively, while the average recognition rates obtained using RCs as features on the same configurations are 86.252% and 86.7% respectively.
    http://archive.org/details/recognitionofine109452848
    Approved for public release; distribution is unlimited.
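    As a rough sketch of the recognizer's shape, the snippet below runs a two-layer feedforward network (one hidden layer plus a softmax over the seven-word vocabulary) on an MFCC feature vector. The layer sizes, tanh activation, and random weights are illustrative assumptions; the thesis's exact topology and training procedure are not given in the abstract.

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Two-layer feedforward pass: hidden layer + softmax output.

    x : feature vector, e.g. MFCCs summarizing a segmented utterance.
    The weights here are random placeholders; in the study the network
    would be trained on the seven-word in-ear vocabulary.
    """
    h = np.tanh(x @ W1 + b1)            # hidden layer
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max())   # numerically stable softmax
    return e / e.sum()                  # class probabilities over the 7 words

rng = np.random.default_rng(0)
n_mfcc, n_hidden, n_words = 13, 32, 7   # assumed dimensions
W1 = rng.normal(scale=0.1, size=(n_mfcc, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_words)); b2 = np.zeros(n_words)
probs = mlp_forward(rng.normal(size=n_mfcc), W1, b1, W2, b2)
```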