Identify, locate and separate: Audio-visual object extraction in large
  video collections using weak supervision

Duong, Ngoc; Essid, Slim; Ozerov, Alexey; Parekh, Sanjeel; Pérez, Patrick; Richard, Gaël

Publication date: 7 November 2018

Abstract

We tackle the problem of audiovisual scene analysis for weakly-labeled data. To this end, we build upon our previous audiovisual representation learning framework to perform object classification in noisy acoustic environments and integrate an audio source enhancement capability. This is made possible by a novel use of non-negative matrix factorization for the audio modality. Our approach is founded on the multiple instance learning paradigm. Its effectiveness is established through experiments over a challenging dataset of music instrument performance videos. We also show encouraging visual object localization results.
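
For readers unfamiliar with non-negative matrix factorization (NMF), the sketch below shows the generic idea on a toy matrix: a non-negative matrix V (for audio, typically a magnitude spectrogram) is approximated as a product W H of two non-negative factors. This is only an illustration under stated assumptions, not the authors' method. The abstract does not say which NMF variant, cost function, or rank the paper uses, so the Euclidean multiplicative updates, the function names (`nmf`, `frobenius_error`) and all parameter values here are hypothetical choices.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2
        for i in range(len(v))
        for j in range(len(v[0]))
    )


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factorize a non-negative matrix V as W H with multiplicative updates.

    Assumption: Euclidean cost with the classic multiplicative update rules;
    the paper may use a different cost or algorithm.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    # Random strictly positive initialization of both factors.
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + eps for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [
            [h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_cols)]
            for k in range(rank)
        ]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [
            [w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
            for i in range(n_rows)
        ]
    return w, h


# Toy "spectrogram": 4 frequency bins x 5 time frames, built from 2 patterns.
V = [
    [1.0, 0.0, 2.0, 0.0, 1.0],
    [2.0, 0.0, 4.0, 0.0, 2.0],
    [0.0, 3.0, 0.0, 1.0, 0.0],
    [0.0, 1.5, 0.0, 0.5, 0.0],
]
W, H = nmf(V, rank=2)
```

In an audio setting, the columns of W would act as spectral templates and the rows of H as their activations over time. How such components are tied to object labels and used for source enhancement is specific to the paper and is not reproduced here.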

Available Versions

HAL-Rennes 1

oai:HAL:hal-01914532v1

Last time updated on 31/01/2024