3 research outputs found

    EUMSSI team at the MediaEval Person Discovery Challenge 2016

    We present the results of the EUMSSI team's participation in the Multimodal Person Discovery task at the MediaEval 2016 challenge. The goal is to identify all people who simultaneously appear and speak in a video corpus. In the proposed system, besides improving each modality, we emphasize the ranking of the multiple results produced by the audio and visual streams.
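
    As a rough sketch of the kind of multimodal ranking this abstract alludes to, the Python fragment below combines per-identity confidence scores from a hypothetical face pipeline and speaker pipeline into a single ordering. The Hypothesis fields, the linear weighting, and the sample scores are illustrative assumptions, not the EUMSSI system's actual scoring.

        # Hypothetical late-fusion ranking: field names, the linear weighting,
        # and the sample scores are illustrative assumptions, not the EUMSSI
        # system's implementation.
        from dataclasses import dataclass

        @dataclass
        class Hypothesis:
            name: str           # candidate identity, e.g. from an OCR'd name overlay
            face_score: float   # confidence from the visual (face) pipeline
            voice_score: float  # confidence from the audio (speaker) pipeline

        def rank(hypotheses, alpha=0.5):
            """Order identities by a weighted sum of the two modality scores."""
            key = lambda h: alpha * h.face_score + (1 - alpha) * h.voice_score
            return sorted(hypotheses, key=key, reverse=True)

        shot = [Hypothesis("speaker_a", 0.9, 0.4),
                Hypothesis("speaker_b", 0.6, 0.8)]
        print([h.name for h in rank(shot)])  # ['speaker_b', 'speaker_a']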

    Learning Multimodal Temporal Representation for Dubbing Detection in Broadcast Media

    Person discovery in the absence of prior identity knowledge requires accurate association of visual and auditory cues. In broadcast data, multimodal analysis faces additional challenges from narrated voice-overs on muted scenes and dubbing into other languages. To address these challenges, we define and analyze the problem of dubbing detection in broadcast data, which has not been explored before. We propose a method to represent the temporal relationship between the auditory and visual streams, combining canonical correlation analysis (CCA) to learn a joint multimodal space with long short-term memory (LSTM) networks to model cross-modal temporal dependencies. We also introduce a newly acquired dataset of face-speech segments from TV data, which we have made publicly available. The proposed method achieves promising performance on this real-world dataset compared to several baselines.
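
    To make the two-stage method concrete, here is a minimal Python sketch, assuming MFCC-style audio features and face-region visual features: CCA learns a joint space, and an LSTM classifies windows of the projected, concatenated streams as synchronized or dubbed. The dimensions, window length, and the DubbingDetector class are illustrative assumptions, not the authors' implementation.

        # Minimal sketch of a CCA + LSTM dubbing detector. Feature choices,
        # dimensions, and all names are assumptions, not the authors' code.
        import numpy as np
        import torch
        import torch.nn as nn
        from sklearn.cross_decomposition import CCA

        # Step 1: learn a joint audio-visual space with CCA.
        # X_audio: (n_frames, 39) e.g. MFCCs; X_visual: (n_frames, 64) e.g. lip features
        rng = np.random.default_rng(0)
        X_audio = rng.standard_normal((1000, 39))   # placeholder features
        X_visual = rng.standard_normal((1000, 64))

        cca = CCA(n_components=16)
        cca.fit(X_audio, X_visual)
        A_c, V_c = cca.transform(X_audio, X_visual)  # projections into the joint space

        # Step 2: model cross-modal temporal dependencies with an LSTM.
        class DubbingDetector(nn.Module):
            """Binary classifier: synchronized speech vs. dubbed/voice-over."""
            def __init__(self, joint_dim=32, hidden=64):
                super().__init__()
                self.lstm = nn.LSTM(joint_dim, hidden, batch_first=True)
                self.head = nn.Linear(hidden, 1)

            def forward(self, seq):              # seq: (batch, time, joint_dim)
                out, _ = self.lstm(seq)
                return self.head(out[:, -1])     # one logit from the last time step

        # Concatenate the projected streams frame by frame, cut into 50-frame windows.
        joint = np.concatenate([A_c, V_c], axis=1)        # (1000, 32)
        windows = joint.reshape(-1, 50, joint.shape[1])   # (20, 50, 32)
        x = torch.tensor(windows, dtype=torch.float32)

        model = DubbingDetector()
        logits = model(x)                                 # one score per segment
        print(logits.shape)                               # torch.Size([20, 1])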

    EUMSSI team at the MediaEval Person Discovery Challenge

    We present the results of the EUMSSI team's participation in the Multimodal Person Discovery task at the MediaEval 2015 challenge. The goal is to identify all people who simultaneously appear and speak in a video corpus, which implicitly involves both the audio and visual streams. We emphasize improving each modality separately and benchmarking them to analyze their pros and cons.