63,003 research outputs found

    Robust Speech Detection for Noisy Environments

    Get PDF
    This paper presents a robust voice activity detector (VAD) based on hidden Markov models (HMM) to improve speech recognition systems in stationary and non-stationary noise environments: inside motor vehicles (like cars or planes) or inside buildings close to high traffic places (like in a control tower for air traffic control (ATC)). In these environments, there is a high stationary noise level caused by vehicle motors and additionally, there could be people speaking at certain distance from the main speaker producing non-stationary noise. The VAD presented in this paper is characterized by a new front-end and a noise level adaptation process that increases significantly the VAD robustness for different signal to noise ratios (SNRs). The feature vector used by the VAD includes the most relevant Mel Frequency Cepstral Coefficients (MFCC), normalized log energy and delta log energy. The proposed VAD has been evaluated and compared to other well-known VADs using three databases containing different noise conditions: speech in clean environments (SNRs mayor que 20 dB), speech recorded in stationary noise environments (inside or close to motor vehicles), and finally, speech in non stationary environments (including noise from bars, television and far-field speakers). In the three cases, the detection error obtained with the proposed VAD is the lowest for all SNRs compared to AceroÂżs VAD (reference of this work) and other well-known VADs like AMR, AURORA or G729 annex b

    Adaptive Multi-Class Audio Classification in Noisy In-Vehicle Environment

    Full text link
    With ever-increasing number of car-mounted electric devices and their complexity, audio classification is increasingly important for the automotive industry as a fundamental tool for human-device interactions. Existing approaches for audio classification, however, fall short as the unique and dynamic audio characteristics of in-vehicle environments are not appropriately taken into account. In this paper, we develop an audio classification system that classifies an audio stream into music, speech, speech+music, and noise, adaptably depending on driving environments including highway, local road, crowded city, and stopped vehicle. More than 420 minutes of audio data including various genres of music, speech, speech+music, and noise are collected from diverse driving environments. The results demonstrate that the proposed approach improves the average classification accuracy up to 166%, and 64% for speech, and speech+music, respectively, compared with a non-adaptive approach in our experimental settings

    Acoustic Scene Classification

    Get PDF
    This work was supported by the Centre for Digital Music Platform (grant EP/K009559/1) and a Leadership Fellowship (EP/G007144/1) both from the United Kingdom Engineering and Physical Sciences Research Council
    • …
    corecore