Audiovisual speech source separation: a regularization method based on visual voice activity detection

By Bertrand Rivet, Laurent Girin, Christine Serviere, Dinh-Tuan Pham and Christian Jutten


Audio-visual speech source separation combines visual speech processing techniques (e.g., lip parameter tracking) with source separation methods to improve and/or simplify the extraction of a speech signal from a mixture of acoustic signals. In this paper, we present a new approach to this problem: visual information is used as a voice activity detector (VAD). Results show that, in the difficult case of realistic convolutive mixtures, the classic problem of the permutation of the output frequency channels can be solved using the visual information, with simpler processing than when using only audio information.
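As an illustration of how a visual VAD can resolve the per-frequency permutation, the following is a minimal sketch and not the authors' implementation. It assumes that a frequency-domain separation stage has already produced outputs in every frequency bin, in arbitrary order, and that the visual VAD supplies a boolean mask of frames in which the target speaker is silent. The criterion used here (in each bin, the output with the lowest power during those silent frames is taken as the target) is a simplifying assumption; the function name `align_permutations` and the array layout are hypothetical.

```python
import numpy as np

def align_permutations(Y, silent):
    """Reorder separated outputs so that index 0 is the target in every bin.

    Y      : complex array, shape (n_freq, n_src, n_frames), separated
             STFT outputs whose source order may differ from bin to bin.
    silent : boolean array, shape (n_frames,), True where the (assumed)
             visual VAD reports that the target speaker is silent.
    Returns the reordered array and the index chosen in each bin.
    """
    Y = np.asarray(Y)
    # Mean power of each output in each bin, over silent frames only.
    power = np.mean(np.abs(Y[:, :, silent]) ** 2, axis=2)  # (n_freq, n_src)
    # Assumption: the target estimate is the quietest output while the
    # target speaker is silent.
    picked = np.argmin(power, axis=1)                       # (n_freq,)
    out = Y.copy()
    for f, k in enumerate(picked):
        if k != 0:
            out[f, [0, k]] = Y[f, [k, 0]]  # swap target into position 0
    return out, picked

# Synthetic check: a target that is silent in the first 50 frames and an
# interferer that is always active, with the order swapped in bins 1 and 3.
rng = np.random.default_rng(0)
n_freq, n_frames = 4, 100
target_sig = rng.standard_normal((n_freq, n_frames)) \
    + 1j * rng.standard_normal((n_freq, n_frames))
target_sig[:, :50] = 0
interferer = rng.standard_normal((n_freq, n_frames)) \
    + 1j * rng.standard_normal((n_freq, n_frames))
Y = np.stack([target_sig, interferer], axis=1)
Y[[1, 3]] = Y[[1, 3], ::-1]            # simulate the permutation ambiguity
silent = np.arange(n_frames) < 50      # stand-in for the visual VAD output

aligned, picked = align_permutations(Y, silent)
```

The sketch needs at least one frame flagged as silent; with none, the mean power is undefined. With real mixtures the separation is imperfect, so the power of the target output during silence is small rather than zero, and the decision in each bin is only as reliable as the VAD and the separation quality allow.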

Topics: blind source separation, visual voice activity detection, [SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing, [INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing
Publisher: HAL CCSD
Year: 2007
OAI identifier: oai:HAL:hal-00195014v1
Full text:
  • https://hal.archives-ouvertes.... (external link)