Audiovisual data fusion for successive speakers tracking

Aycard, Olivier; Labourey, Quentin; Pellerin, Denis; Rombaut, Michèle

Authors: Olivier Aycard, Quentin Labourey, Denis Pellerin, Michèle Rombaut
Publication date: 5 January 2014
Publisher: HAL CCSD

Abstract

This paper presents a method for tracking human speakers from audio and video data, applied to conversation tracking with a robot. Audiovisual data fusion is performed in a two-step process. First, detection is performed independently on each modality: face detection based on skin color on the video data, and sound source localization based on the time delay of arrival on the audio data. Second, the results of these detection processes are fused using an adaptation of the Bayesian filter to detect the speaker. The robot is able to detect the face of the person who is talking and to detect a new speaker in a conversation.
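The fusion step described in the abstract can be illustrated with a generic discrete Bayes filter over azimuth bins. This is only a minimal sketch under stated assumptions, not the authors' adaptation: the bin grid, the Gaussian likelihood shapes, the `switch_prob` parameter, and all function names are hypothetical choices made for illustration. Face detections and the audio direction estimate are assumed to be already available as bin indices.

```python
import math

def bump(n_bins, center, sigma, floor=0.05):
    """Unnormalized likelihood: a Gaussian bump around `center` on a constant floor."""
    return [floor + math.exp(-0.5 * ((i - center) / sigma) ** 2) for i in range(n_bins)]

def video_likelihood(n_bins, face_bins, sigma=1.0, floor=0.05):
    """Assumed video model: one bump per detected face, summed on a floor."""
    lik = [floor] * n_bins
    for c in face_bins:
        for i in range(n_bins):
            lik[i] += math.exp(-0.5 * ((i - c) / sigma) ** 2)
    return lik

def predict(belief, switch_prob):
    """Prediction: the speaker stays put, or (with switch_prob) a new speaker appears anywhere."""
    n = len(belief)
    return [(1.0 - switch_prob) * b + switch_prob / n for b in belief]

def update(belief, *likelihoods):
    """Correction: multiply the prior by each modality's likelihood, then normalize."""
    post = list(belief)
    for lik in likelihoods:
        post = [p * l for p, l in zip(post, lik)]
    total = sum(post)
    return [p / total for p in post]

def track(observations, n_bins=20, switch_prob=0.1):
    """Run the filter over (face_bins, audio_bin) pairs; return the most likely bin per frame."""
    belief = [1.0 / n_bins] * n_bins
    estimates = []
    for face_bins, audio_bin in observations:
        belief = predict(belief, switch_prob)
        video = video_likelihood(n_bins, face_bins)
        audio = bump(n_bins, audio_bin, sigma=2.0)  # audio is assumed coarser than video
        belief = update(belief, video, audio)
        estimates.append(max(range(n_bins), key=belief.__getitem__))
    return estimates

# Two faces at bins 5 and 14; the sound source moves from near bin 6 to near bin 13.
frames = [([5, 14], 6)] * 3 + [([5, 14], 13)] * 4
print(track(frames))
```

In this toy run the estimate should start on the first face and move to the second one a frame or so after the audio cue shifts, since the prior from earlier frames has to be overcome. The nonzero `switch_prob` is what lets the filter recover when a new speaker takes over.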

Available Versions

HAL Descartes

oai:HAL:hal-00935636v1

Last time updated on 14/04/2021

Hal - Université Grenoble Alpes

oai:HAL:hal-00935636v1

Last time updated on 11/11/2016