3,429 research outputs found

Voice Activity Detection. Fundamentals and Speech Recognition System Robustness

Author: J. C. Segura
J. M. Gorriz
J. Ramirez
Publication venue: IntechOpen
Publication date: 01/01/2007

IntechOpen

CiteSeerX

Features for voice activity detection: a comparative analysis

Author: Gerhard Schmidt
Markus Buck
Simon Graf
Tobias Herbig
Publication venue: Springer Nature
Publication date: 01/01/2015

Springer - Publisher Connector

Robust Speech Detection for Noisy Environments

Author: Hernández Luis A.
San Segundo Hernández Rubén
Varela Serrano Oscar
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2011

This paper presents a robust voice activity detector (VAD) based on hidden Markov models (HMM) to improve speech recognition systems in stationary and non-stationary noise environments: inside motor vehicles (such as cars or planes) or inside buildings close to high-traffic places (such as a control tower for air traffic control (ATC)). In these environments, there is a high stationary noise level caused by vehicle motors, and there may also be people speaking at a certain distance from the main speaker, producing non-stationary noise. The VAD presented in this paper is characterized by a new front-end and a noise level adaptation process that significantly increases the VAD robustness for different signal-to-noise ratios (SNRs). The feature vector used by the VAD includes the most relevant Mel Frequency Cepstral Coefficients (MFCC), normalized log energy and delta log energy. The proposed VAD has been evaluated and compared to other well-known VADs using three databases containing different noise conditions: speech in clean environments (SNRs greater than 20 dB), speech recorded in stationary noise environments (inside or close to motor vehicles), and finally, speech in non-stationary environments (including noise from bars, television and far-field speakers). In all three cases, the detection error obtained with the proposed VAD is the lowest for all SNRs compared to Acero's VAD (the reference for this work) and other well-known VADs such as AMR, AURORA or G.729 Annex B.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

ViVoVAD: a Voice Activity Detection Tool based on Recurrent Neural Networks

Author: Artiaga Antonio Miguel
Bailo Ignacio Viñals
Gimeno Jordán Pablo
Giménez Alfonso Ortega
Solano Eduardo Lleida
Publication venue: Universidad de Zaragoza
Publication date: 20/05/2019

Voice Activity Detection (VAD) aims to distinguish correctly those audio segments containing human speech. In this paper we present our latest approach to the VAD task that relies on the modelling capabilities of Bidirectional Long Short Term Memory (BLSTM) layers to classify every frame in an audio signal as speech or non-speech.

Universidad Zaragoza: Open Journal Systems

Decision fusion of voice activity detectors

Author: Nasibov Zaur
Publication venue: University of Eastern Finland

UEF Electronic Publications

Pre-processing of Speech Signals for Robust Parameter Estimation

Author: Esquivel Jaramillo Alfredo
Publication venue: Aalborg Universitetsforlag
Publication date: 01/01/2021

VBN

Online Localization and Tracking of Multiple Moving Speakers in Reverberant Environments

Author: Alameda-Pineda Xavier
Ban Yutong
Girin Laurent
Horaud Radu
Li Xiaofei
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 26/02/2019

We address the problem of online localization and tracking of multiple moving speakers in reverberant environments. The paper has the following contributions. We use the direct-path relative transfer function (DP-RTF), an inter-channel feature that encodes acoustic information robust against reverberation, and we propose an online algorithm well suited for estimating DP-RTFs associated with moving audio sources. Another crucial ingredient of the proposed method is its ability to properly assign DP-RTFs to audio-source directions. Towards this goal, we adopt a maximum-likelihood formulation and we propose to use an exponentiated gradient (EG) to efficiently update source-direction estimates starting from their currently available values. The problem of multiple speaker tracking is computationally intractable because the number of possible associations between observed source directions and physical speakers grows exponentially with time. We adopt a Bayesian framework and we propose a variational approximation of the posterior filtering distribution associated with multiple speaker tracking, as well as an efficient variational expectation-maximization (VEM) solver. The proposed online localization and tracking method is thoroughly evaluated using two datasets that contain recordings performed in real environments. Comment: IEEE Journal of Selected Topics in Signal Processing, 201

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

A robust sequential hypothesis testing method for brake squeal localisation

Author: Attia F.
Cong F.
DiBiase J. H.
Elko G.
Gerkmann T.
Hou J.
Liang Y.
Madhu N.
Mauer G.
Nilesh Madhu
Philippen B.
Rainer Martin
Scheuing J.
Sebastian Gergen
Shannon C.
Thiergart O.
van Trees H. L.
Ward D. B.
Publication venue: Acoustical Society of America (ASA)
Publication date: 01/01/2019

This contribution deals with the in situ detection and localisation of brake squeal in an automobile. As brake squeal is emitted from regions known a priori, i.e., near the wheels, the localisation is treated as a hypothesis testing problem. Distributed microphone arrays, situated under the automobile, are used to capture the directional properties of the sound field generated by a squealing brake. The spatial characteristics of the sampled sound field are then used to formulate the hypothesis tests. However, in contrast to standard hypothesis testing approaches of this kind, the propagation environment is complex and time-varying. Coupled with inaccuracies in the knowledge of the sensor and source positions as well as sensor gain mismatches, this makes modelling the sound field difficult, and standard approaches fail in this case. A previously proposed approach implicitly tried to account for such incomplete system knowledge and was based on ad hoc likelihood formulations. The current paper builds upon this approach and proposes a second approach, based on more solid theoretical foundations, that can systematically account for the model uncertainties. Results from tests in a real setting show that the proposed approach is more consistent than the prior state of the art. In both approaches, the tasks of detection and localisation are decoupled for complexity reasons. The localisation (hypothesis testing) is subject to a prior detection of brake squeal and identification of the squeal frequencies. The approaches used for the detection and identification of squeal frequencies are also presented. The paper also briefly addresses some practical issues related to array design and placement. © 2019 Author(s).

Crossref

Ghent University Academic Bibliography