
3 research outputs found

AN INTEGRATED FRAMEWORK FOR MULTI-CHANNEL MULTI-SOURCE LOCALIZATION AND VOICE ACTIVITY DETECTION

Author: Abutalebi Hamid Reza
Asaei Afsaneh
Bourlard Hervé
Garner Philip N.
Taghizadeh Mohammad J.
Publication venue: Idiap
Publication date: 01/01/2011

Two of the major challenges in microphone array based adaptive beamforming, speech enhancement and distant speech recognition, are robust and accurate source localization and voice activity detection. This paper introduces a spatial gradient steered response power using the phase transform (SRP-PHAT) method which is capable of localization of competing speakers in overlapping conditions. We further investigate the behavior of the SRP function and characterize theoretically a fixed point in its search space for the diffuse noise field. We call this fixed point the null position in the SRP search space. Building on this evidence, we propose a technique for multi-channel voice activity detection (MVAD) based on detection of a maximum power corresponding to the null position. The gradient SRP-PHAT in tandem with the MVAD form an integrated framework of multi-source localization and voice activity detection. The experiments carried out on real data recordings show that this framework is very effective in practical applications of hands-free communication.

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

Crossref

An Integrated Framework for Multi-Channel Multi-Source Localization and Voice Activity Detection

Author: Abutalebi Hamid Reza
Asaei Afsaneh
Bourlard Hervé
Garner Philip N.
Taghizadeh Mohammad J.
Publication date: 19/12/2013

Two of the major challenges in microphone array based adaptive beamforming, speech enhancement and distant speech recognition, are robust and accurate source localization and voice activity detection. This paper introduces a spatial gradient steered response power using the phase transform (SRP-PHAT) method which is capable of localization of competing speakers in overlapping conditions. We further investigate the behaviour of the SRP function and characterize theoretically a fixed point in its search space for the diffuse noise field. We call this fixed point the null position in the SRP search space. Building on this evidence, we propose a technique for multi-channel voice activity detection (MVAD) based on detection of a maximum power corresponding to the null position. The gradient SRP-PHAT in tandem with the MVAD form an integrated framework of multi-source localization and voice activity detection. The experiments carried out on real data recordings show that this framework is very effective in practical applications of hands-free communication.

Infoscience - École polytechnique fédérale de Lausanne

Voice Activity Detection Using Source Separation Techniques

Author: Nikos Doukas
Patrick Naylor
Tania Stathaki

A novel Voice Activity Detector is presented that is based on Source Separation techniques applied to single sensor signals. It offers very accurate estimation of the endpoints in very low Signal to Noise ratio conditions, while maintaining low complexity. Since the procedure is totally iterative, it is suitable for use in real-time applications and is capable of operating in dynamically adapting situations. Results are presented for both White Gaussian and Car Engine background noise. The performance of the new technique is compared with that of the GSM Voice Activity Detector.

1. Introduction

Voice Activity Detection (VAD) is important in many areas of speech processing technology, such as noise reduction, voice recognition, speech coding etc., and has been extensively studied ([7], [5], [1]). Most of the existing techniques focus on relatively mild noise conditions (small positive SNR, for example the conditions found in an office environment). The work presented in this paper focus..

CiteSeerX