7,732 research outputs found

News Story Segmentation in Multiple Modalities

Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

TRECVID 2003 - an overview

Author: Kraaij Wessel; Over Paul; Smeaton Alan F.
Publication venue: 'University of Aden - Faculty of Economics and Administration'
Publication date: 01/11/2003

Irish Universities

DCU Online Research Access Service

InfoLink: analysis of Dutch broadcast news and cross-media browsing

Author: Hessen Arjan van; Jong Franciska de; Morang Jeroen; Ordelman Roeland
Publication venue: IEEE
Publication date: 01/01/2005

This paper describes InfoLink, a cross-media browsing demonstrator. InfoLink automatically links the content of Dutch broadcast news videos to related information sources in parallel collections containing text and/or video. Automatic segmentation, speech recognition and available metadata are used to index and link items. The concept is visualised using SMIL scripts that present the streaming broadcast news video together with the information links.

University of Twente Research Information

TRECVID 2004 - an overview

Author: Kraaij Wessel; Over Paul; Smeaton Alan F.
Publication venue: 'University of Aden - Faculty of Economics and Administration'
Publication date: 01/11/2004

Irish Universities

DCU Online Research Access Service

Weakly-Supervised Alignment of Video With Text

Author: Bach Francis; Bojanowski Piotr; Grave Edouard; Lajugie Rémi; Laptev Ivan; Ponce Jean; Schmid Cordelia
Publication date: 07/12/2015

Suppose that we are given a set of videos, along with natural language descriptions in the form of multiple sentences (e.g., manual annotations, movie scripts, sport summaries, etc.), and that these sentences appear in the same temporal order as their visual counterparts. We propose in this paper a method for aligning the two modalities, i.e., automatically providing a time stamp for every sentence. Given vectorial features for both video and text, we propose to cast this task as a temporal assignment problem, with an implicit linear mapping between the two feature modalities. We formulate this problem as an integer quadratic program, and solve its continuous convex relaxation using an efficient conditional gradient algorithm. Several rounding procedures are proposed to construct the final integer solution. After demonstrating significant improvements over the state of the art on the related task of aligning video with symbolic labels [7], we evaluate our method on a challenging dataset of videos with associated textual descriptions [36], using both bag-of-words and continuous representations for text. Comment: ICCV 2015 - IEEE International Conference on Computer Vision, Dec 2015, Santiago, Chile

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Glasgow University at TRECVID 2006

Author: Chantamunee S.; Gotoh Y.; Hilaire X.; Hopfgartner F.; Jose J.M.; Urban J.; Villa R.
Publication date: 01/11/2006

In the first part of this paper we describe our experiments in the automatic and interactive search tasks of TRECVID 2006. We submitted five fully automatic runs, including a text baseline, two runs based on visual features, and two runs that combine textual and visual features in a graph model. For the interactive search, we have implemented a new video search interface with relevance feedback facilities, based on both textual and visual features. The second part is concerned with our approach to the high-level feature extraction task, based on textual information extracted from speech recogniser and machine translation outputs. These outputs were aligned with shots and associated with high-level feature references. A list of significant words was created for each feature, and this list was in turn used to identify the feature during the evaluation.

Enlighten

Towards a Multimodal Time-Based Empathy Prediction System

Author: Barbieri F.; del Prado Martin F. M.; Guizzo E.; Lucchesi F.; Maffei G.; Weyde T.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019

We describe our system for empathic emotion recognition. It is based on deep learning on multiple modalities in a late fusion architecture. We describe the modules of our system and discuss the evaluation results. Our code is also available to the research community.

City Research Online

Crossref