574 research outputs found

A Computation Efficient Voice Activity Detector for Low Signal-to-Noise Ratio in Hearing Aids

Author: Demosthenous A
Liu F
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 13/09/2021

This paper proposes a spectral entropy-based voice activity detection method that is computationally efficient enough for hearing aids. The method remains accurate at low SNR levels because spectral entropy is robust against changes in the noise power. Compared with traditional fast Fourier transform based spectral entropy approaches, calculating the spectral entropy from the outputs of the hearing aid filter-bank significantly reduces the computational complexity. The performance of the proposed method was evaluated and compared with two other computationally efficient methods. At negative SNR levels, the proposed method achieves an accuracy more than 5% higher than that of the power-based method while requiring only about 1/100 of the floating-point operations of the statistical model based method.

UCL Discovery
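
The abstract above relies on spectral entropy computed from filter-bank outputs. The following is a minimal Python sketch of that general idea, not the authors' implementation: it assumes each frame is already available as a list of per-channel powers, and the names `spectral_entropy`, `is_speech`, and `threshold_bits` are hypothetical. Because the powers are normalized before the entropy is taken, scaling every channel by the same factor leaves the result unchanged, which is one way to see the robustness to noise power that the abstract mentions.

```python
import math


def spectral_entropy(band_powers):
    """Shannon entropy, in bits, of the normalized band-power distribution."""
    total = sum(band_powers)
    if total <= 0.0:
        # Assumption: an all-zero frame is treated as maximally flat (noise-like).
        return math.log2(len(band_powers))
    entropy = 0.0
    for power in band_powers:
        p = power / total
        if p > 0.0:  # 0 * log(0) is taken as 0
            entropy -= p * math.log2(p)
    return entropy


def is_speech(band_powers, threshold_bits):
    """Flag a frame as speech when its spectrum is peaky (low entropy)."""
    return spectral_entropy(band_powers) < threshold_bits
```

A flat spectrum gives the maximum entropy of log2(number of channels), while a spectrum dominated by one channel gives a value near zero. The decision threshold is an assumption here and would need tuning on real data.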

A voice activity detection algorithm with sub-band detection based on time-frequency characteristics of mandarin

Author: Huang Shaoguang
Wang Yinfeng
Wei Ying
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013

Voice activity detection algorithms are widely used in voice compression, speech synthesis, speech recognition, speech enhancement, and related areas. This paper proposes an efficient voice activity detection algorithm with sub-band detection based on the time-frequency characteristics of Mandarin. The proposed sub-band detection consists of two parts: crosswise detection and lengthwise detection. Both energy detection and pitch detection are taken into account. For better performance, a double-threshold criterion is used to reduce the misjudgment rate of the detection. Performance is evaluated in six noise environments with different SNRs. Experimental results indicate that the proposed algorithm detects voice regions effectively in non-stationary and low-SNR environments and has potential for further development.

Ghent University Academic Bibliography
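
The abstract above mentions a double-threshold criterion but does not spell it out. One common reading is hysteresis on frame energy: a higher threshold to enter the speech state and a lower one to leave it. The Python sketch below illustrates that generic technique only; it is not the paper's sub-band algorithm, and `double_threshold_vad`, `low`, and `high` are hypothetical names.

```python
def double_threshold_vad(frame_energies, low, high):
    """Label frames as speech using two thresholds (hysteresis).

    A frame sequence enters the speech state when the energy rises above
    `high` and leaves it only when the energy falls below `low`.
    """
    flags = []
    in_speech = False
    for energy in frame_energies:
        if in_speech:
            if energy < low:
                in_speech = False
        elif energy > high:
            in_speech = True
        flags.append(in_speech)
    return flags
```

Using two thresholds keeps energies that hover between `low` and `high` from toggling the decision on every frame, which is the kind of misjudgment a single threshold tends to produce.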

Variance of spectral entropy (VSE): an SNR estimator for speech enhancement in hearing aids

Author: Demosthenous AC
Liu F
Yasin I
Publication venue: 24th International Congress on Sound and Vibration
Publication date: 01/09/2017

In everyday situations, an individual can encounter a variety of acoustic environments. For an individual with a hearing aid, following speech in different types of background noise can often be a challenge. For this reason, estimating the signal-to-noise ratio (SNR) is a key consideration in hearing-aid design. The ability to adjust a noise reduction algorithm according to the SNR could provide the flexibility required to improve speech intelligibility in varying levels of background noise. However, most current high-accuracy SNR estimation methods are relatively complex and may inhibit the performance of hearing aids. This study investigates the advantages of using a spectral entropy method, in particular a variance of spectral entropy (VSE) measure, to estimate SNR for speech enhancement in hearing aids. The VSE approach avoids some of the complex computational steps of traditional statistical-model based SNR estimation methods by measuring the spectral entropy only among the frequency channels of interest within the hearing aid. In this study, the SNR was estimated using the spectral entropy method in different types of noise. The variance of the spectral entropy in a hearing-aid model with 10 peripheral frequency channels was used to measure the SNR. By measuring the variance of the spectral entropy at input SNR levels between -10 dB and 20 dB, the relationship function between the SNR and the VSE was estimated. The VSE for the speech-in-noise was measured at temporal intervals of 1.5 s. Compared with the US National Institute of Standards and Technology (NIST) and Waveform Amplitude Distribution Analysis (WADA) methods, the VSE method performs more reliably in different types of background noise, in particular babble noise with a low number of speakers. The VSE method may also reduce additional computational steps (and thus system delays), making it more appropriate for implementation in hearing aids, where system delays should be minimized as much as possible.

UCL Discovery
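
The abstract above does not say exactly how the variance is formed, so the Python sketch below shows only one plausible reading: compute the spectral entropy of each frame from its per-channel powers, then take the variance of those entropies over an analysis interval (1.5 s in the study). The function names are hypothetical, and the mapping from VSE to SNR, which the study estimates empirically, is deliberately left out.

```python
import math
from statistics import pvariance


def frame_entropy(channel_powers):
    """Shannon entropy, in bits, of one frame's normalized channel powers."""
    total = sum(channel_powers)
    if total <= 0.0:
        # Assumption: a silent frame is treated as maximally flat.
        return math.log2(len(channel_powers))
    entropy = 0.0
    for power in channel_powers:
        p = power / total
        if p > 0.0:
            entropy -= p * math.log2(p)
    return entropy


def variance_of_spectral_entropy(frames):
    """Population variance of the per-frame entropies in one interval.

    `frames` is a non-empty list of frames, each a list of channel powers.
    """
    return pvariance([frame_entropy(frame) for frame in frames])
```

Under this reading, stationary noise yields nearly identical entropies from frame to frame and therefore a small variance, while speech alternates between peaky and flat spectra and pushes the variance up.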

An efficient voice activity detection algorithm by combining statistical model and energy detection

Author: A Benyassine
A Davis
B Schölkopf
B-F Wu
D Kim
E Nemer
G Evangelopoulos
G Ying
ITU-T Rec P.48
ITU-T Rec P.56
J Garofolo
J Ramírez
J Shen
J Sohn
JG Wilpon
JH Chang
JW Shin
K Li
L Huang
LR Rabiner
Q Jo
Q Li
R Chengalvarayan
R Le Bouquin-Jeannès
R Tahmasbi
S Gazor
S Kang
S Kay
S Kuroiwa
T Yu
TV Pham
Y Ephraim
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Speech Endpoint Detection: An Image Segmentation Approach

Author: Faris Nesma
Publication venue: 'University of Waterloo'
Publication date: 01/01/2013

Speech Endpoint Detection, also known as Speech Segmentation, is an unsolved problem in speech processing that affects numerous applications, including robust speech recognition. The task is not as trivial as it appears, and most existing algorithms degrade at low signal-to-noise ratios (SNRs). Most previous research has focused on developing robust algorithms, with special attention paid to the derivation and study of noise-robust features and decision rules. This research tackles the endpoint detection problem in a different way and proposes a novel speech endpoint detection algorithm derived from the Chan-Vese algorithm for image segmentation. The proposed algorithm can fuse multiple features extracted from the speech signal to enhance the detection accuracy. Its performance has been evaluated and compared with two widely used speech detection algorithms under various noise environments with SNR levels ranging from 0 dB to 30 dB. Furthermore, the proposed algorithm has also been applied to different types of American English phonemes. The experiments show that, even under conditions of severe noise contamination, the proposed algorithm is more efficient than the reference algorithms.

University of Waterloo's Institutional Repository

The voice activity detection (VAD) recorder and VAD network recorder : a thesis presented in partial fulfilment of the requirements for the degree of Master of Science in Computer Science at Massey University

Author: Liu Feng
Publication venue: 'Massey University'
Publication date: 01/01/2001

The project provides a feasibility study for the AudioGraph tool, focusing on two application areas: the VAD (voice activity detector) recorder and the VAD network recorder. The first achieves low bit-rate speech recording on the fly, using a GSM compression coder with a simple VAD algorithm; the second provides two-way speech over IP, achieving echo cancellation with a simplex channel. The latter is required for implementing a synchronous AudioGraph. The first chapter introduces the background of the project, specifically VoIP technology, the AudioGraph tool, and VAD algorithms, and discusses the problems set for the project. The second chapter presents the relevant techniques in detail, including sound representation, speech-coding schemes, sound file formats, PowerPlant and Macintosh programming issues, and the simple VAD algorithm we have developed. The third chapter discusses implementation issues, including the systems' objectives and architecture, the problems encountered, and the solutions used. The fourth chapter presents the results of the two applications. It gives the user documentation for the applications and then analyses the parameters based on the results. It also presents default parameter settings that could be used in the AudioGraph system. The last chapter provides conclusions and future work.

Massey Research Online

A simple but efficient voice activity detection algorithm through Hilbert transform and dynamic threshold for speech pathologies

Author: Atal B. S.
Bachu R. G.
Carlos Salazar
D. Ortiz P.
Germain F. G.
Luisa F. Villa
O.L. Quintero
Ortiz D.
Pichot P.
Prasad R. V.
Saha G.
Skahnov K.
Tanyer S. G.
Verteletskaya E.
Publication venue: 'IOP Publishing'
Publication date: 11/05/2016

A simple but efficient voice activity detector based on the Hilbert transform and a dynamic threshold is presented for use in the pre-processing of audio signals. The algorithm that defines the dynamic threshold is a modification of a convex combination found in the literature. This scheme allows the detection of prosodic and silence segments in speech under non-ideal conditions such as spectrally overlapping noise. The present work shows preliminary results on a database built from political speech. The tests were performed by adding artificial noise to the natural noise in the audio signals, and several algorithms are compared. In future work, the results will be extrapolated to adaptive filtering of monophonic signals and to the analysis of speech pathologies.

20th Argentinean Bioengineering Society Congress, SABI 2015 (XX Congreso Argentino de Bioingeniería y IX Jornadas de Ingeniería Clínica), 28–30 October 2015, San Nicolás de los Arroyos, Argentina

Crossref

Repositorio Institucional Universidad EAFIT
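
The abstract above combines a Hilbert-transform envelope with a threshold built from a convex combination, without giving the details. The Python sketch below is an illustration under stated assumptions, not the paper's algorithm: the threshold is a convex combination of the envelope's minimum and maximum over the whole signal, the weight `lam` is hypothetical, and the analytic signal is computed with a naive O(n^2) DFT to stay within the standard library. A truly dynamic variant would recompute the threshold block by block.

```python
import cmath
import math


def envelope(samples):
    """Magnitude of the analytic signal, computed with a naive O(n^2) DFT."""
    n = len(samples)
    spectrum = [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist for even n), double the positive frequencies,
    # and zero the negative ones: this yields x + j * Hilbert{x}.
    for k in range(n):
        if k == 0 or 2 * k == n:
            weight = 1.0
        elif k < n / 2:
            weight = 2.0
        else:
            weight = 0.0
        spectrum[k] *= weight
    return [
        abs(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n)
        for t in range(n)
    ]


def detect_activity(samples, lam=0.3):
    """Flag samples whose envelope exceeds lam * max + (1 - lam) * min."""
    env = envelope(samples)
    threshold = lam * max(env) + (1.0 - lam) * min(env)
    return [value > threshold for value in env]
```

For a pure tone that fits the analysis window exactly, the envelope is constant and equal to the tone's amplitude. For real recordings, an FFT-based routine would replace the naive DFT, which is only practical for short inputs.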

New Advances in Voice Activity Detection using HOS and Optimization Strategies

Author: C.G. Puntonet
J. Ramirez
J.M. Gorriz
Publication venue: 'IntechOpen'
Publication date: 01/06/2007

IntechOpen