
2 research outputs found

Learning Multimodal Temporal Representation for Dubbing Detection in Broadcast Media

Author: Bendris M.
Chetty G.
Farneback G.
Gay P.
Giraudel A.
Hershey J.
Iyengar G.
Le N.
Ngiam J.
Patterson E. K.
Pigou L.
Potamianos G.
Ren J. S.
Rúa E. A.
Srivastava N.
Sutskever I.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 19/08/2016

Person discovery in the absence of prior identity knowledge requires accurate association of visual and auditory cues. In broadcast data, multimodal analysis faces additional challenges from narrated voices over muted scenes and from dubbing in different languages. To address these challenges, we define and analyze the problem of dubbing detection in broadcast data, which has not been explored before. We propose a method to represent the temporal relationship between the auditory and visual streams. The method uses canonical correlation analysis (CCA) to learn a joint multimodal space and long short-term memory (LSTM) networks to model cross-modality temporal dependencies. Our contributions also include a newly acquired dataset of face-speech segments from TV data, which we have made publicly available. The proposed method achieves promising performance on this real-world dataset compared with several baselines.
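As a rough illustration of the CCA step named in this abstract, here is a minimal sketch of generic textbook linear CCA. It computes projections that map time-aligned audio and visual feature vectors into a shared space where they are maximally correlated. This is not the authors' implementation: the function names `cca` and `inv_sqrt`, the ridge term `reg`, and the synthetic features are assumptions for illustration only, and the LSTM stage is not shown.

```python
import numpy as np


def inv_sqrt(C):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def cca(X, Y, k, reg=1e-6):
    """Generic linear CCA between time-aligned feature matrices (hypothetical helper).

    X: (n, dx) audio features, Y: (n, dy) visual features; row t of each
    describes the same time step. Returns projection matrices Wx (dx, k),
    Wy (dy, k) and the top-k canonical correlations in descending order.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Covariance estimates; the small ridge term (an assumption) keeps Cxx, Cyy invertible.
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    # Whiten both views; the SVD of the whitened cross-covariance yields
    # the canonical directions (singular vectors) and correlations (singular values).
    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky, full_matrices=False)
    Wx = Kx @ U[:, :k]
    Wy = Ky @ Vt.T[:, :k]
    return Wx, Wy, s[:k]


# Synthetic example: one latent signal z is shared by both "modalities".
rng = np.random.default_rng(0)
n = 500
z = rng.standard_normal((n, 1))
X = np.hstack([z + 0.1 * rng.standard_normal((n, 1)), rng.standard_normal((n, 2))])
Y = np.hstack([rng.standard_normal((n, 3)), -z + 0.1 * rng.standard_normal((n, 1))])

Wx, Wy, rho = cca(X, Y, k=2)
# Project both views into the joint space (first canonical component).
u = (X - X.mean(axis=0)) @ Wx[:, 0]
v = (Y - Y.mean(axis=0)) @ Wy[:, 0]
print(rho)                      # first value close to 1, second much smaller
print(np.corrcoef(u, v)[0, 1])  # agrees with rho[0]
```

In a dubbing-detection setting one might expect matched face and speech streams to stay correlated in such a joint space and dubbed segments to drift apart. The abstract states that the authors additionally model cross-modality temporal dependencies with LSTM networks, which this sketch deliberately omits.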

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Structured Exploration of Who, What, When, and Where in Heterogeneous Multimedia News Sources

Author: Brendan Jou
Daniel Morozoff-Abegauz
Hongzhi Li
Joseph G. Ellis
Shih-Fu Chang
Publication date: 01/01/2013

We present a fully automatic system, from raw data gathering to navigation, over heterogeneous news sources, including over 18k hours of broadcast video news, 3.58M online articles, and 430M public Twitter messages. Our system addresses the challenge of extracting “who,” “what,” “when,” and “where” from a truly multimodal perspective, leveraging audiovisual information in broadcast news and that embedded in articles, as well as textual cues in both closed captions and the raw document content of articles and social media. By performing this extraction over time, we are able to study the trends of topics in the news and detect interesting peaks in news coverage over the life of a topic. We visualize these peaks in trending news topics using automatically extracted keywords and iconic images, and introduce a novel multimodal algorithm for naming speakers in the news. We also present several intuitive navigation interfaces for interacting with these complex topic structures across different news sources.

CiteSeerX

Crossref