Audio-visual speech enhancement with AVCDCN (audio-visual codebook dependent cepstral normalization)

Abstract

Although current automatic speech recognition (ASR) systems perform remarkably well on a variety of recognition tasks in clean audio conditions, their accuracy degrades with increasing levels of environmental noise. New approaches are needed to address this lack of robustness to noise. In this paper, we propose a multi-sensor approach to ASR, where visual information, in addition to the standard audio information, is obtained from the speaker's face in a second channel. Audio-visual ASR, where both an audio channel and a visual channel are input to the recognition system, has already been demonstrated to outperform traditional audio-only ASR in noisy conditions [5, 6]. In addition to audio-visual ASR, the visual modality has been investigated as a means of enhancement, where clean audio features are estimated from audio-visual speech when the audio channel is corrupted by noise [3, 4]. However, in [4] for example, the ASR performance of linear audio-visual enhancement (where clean audio features are estimated via linear filtering of the noisy audio-visual features) remains significantly inferior to the performance of audio-visual ASR. In this paper, we introduce a non-linear enhancement technique called Audio-Visual Codebook Dependent Cepstral Normalization (AVCDCN) and consider its use with both audio-only ASR and audio-visual ASR. AVCDCN is inspired by CDCN (codebook dependent cepstral normalization), an audio-only enhancement technique.
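For readers who want a concrete picture of the codebook-dependent estimator that CDCN-style methods build on, below is a minimal NumPy sketch, not the paper's implementation. It assumes a diagonal-covariance Gaussian codebook over audio-visual features, per-class correction vectors learned offline, and the common CDCN-style form x_hat = y - sum_k p(k | f_av) r_k; all names, shapes, and the sign convention are illustrative assumptions.

    import numpy as np

    def gaussian_log_likelihoods(f_av, means, variances):
        # Log-likelihood of one audio-visual feature vector under each of the
        # K diagonal-covariance Gaussian codebook classes (means, variances: K x D).
        diff = f_av - means
        return -0.5 * np.sum(diff ** 2 / variances + np.log(2.0 * np.pi * variances), axis=1)

    def avcdcn_enhance(y_audio, f_av, means, variances, priors, corrections):
        # Estimate clean audio cepstra for one frame.
        #   y_audio     : (Da,)   noisy audio cepstral vector
        #   f_av        : (D,)    concatenated noisy audio-visual feature vector
        #   priors      : (K,)    codebook class priors
        #   corrections : (K, Da) per-class correction vectors r_k (assumed learned offline)
        log_post = gaussian_log_likelihoods(f_av, means, variances) + np.log(priors)
        log_post -= log_post.max()              # numerical stability
        post = np.exp(log_post)
        post /= post.sum()                      # class posteriors p(k | f_av)
        # Non-linear enhancement: subtract the posterior-weighted correction.
        # (The subtraction sign is a common CDCN convention, assumed here.)
        return y_audio - post @ corrections

    # Toy usage with random parameters, purely to show the shapes involved.
    rng = np.random.default_rng(0)
    K, D, Da = 8, 30, 24                        # codebook size, AV dim, audio dim
    means = rng.standard_normal((K, D))
    variances = np.ones((K, D))
    priors = np.full(K, 1.0 / K)
    corrections = 0.1 * rng.standard_normal((K, Da))
    x_hat = avcdcn_enhance(rng.standard_normal(Da), rng.standard_normal(D),
                           means, variances, priors, corrections)

Note that the visual information enters only through the class posteriors p(k | f_av), which is what makes the correction non-linear in the observed features; a linear audio-visual filter, by contrast, applies one fixed transform to every frame.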
