7,812 research outputs found

Symbiosis between the TRECVid benchmark and video libraries at the Netherlands Institute for Sound and Vision

Author: AF Smeaton
Alan F. Smeaton
B Huurnink
CGM Snoek
CV Thornley
D. Tjondronegoro
H.-T. Pu
Johan Oomen
L. Hollink
M Hertzum
Paul Over
S Shatford
Wessel Kraaij
Y Zhang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Audiovisual archives are investing in large-scale digitisation of their analogue holdings and, in parallel, ingesting an ever-increasing number of born-digital files into their digital storage facilities. Digitisation opens up new access paradigms and boosts re-use of audiovisual content. Query-log analyses show the shortcomings of manual annotation; archives are therefore complementing these annotations by developing novel search engines that automatically extract information from both the audio and the visual tracks. Over the past few years, the TRECVid benchmark has developed a novel relationship with the Netherlands Institute for Sound and Vision (NISV) which goes beyond the NISV just providing data and use cases to TRECVid. Prototype and demonstrator systems developed as part of TRECVid are set to become a key driver in improving the quality of search engines at the NISV and will ultimately help other audiovisual archives to offer more efficient and more fine-grained access to their collections. This paper reports the experiences of NISV in leveraging the activities of the TRECVid benchmark.

Crossref

Irish Universities

DCU Online Research Access Service

Radboud Repository

Sound and Vision Publications

K-Space at TRECVid 2008

Author: Adamek Tomasz
Byrne Daragh
Jones Gareth J.F.
Keenan Gordon
Lee Hyowon
McGuinness Kevin
O'Connor Noel E.
O'Hare Neil
Smeaton Alan F.
Wilkins Peter
Publication venue: 'University of Aden - Faculty of Economics and Administration'
Publication date: 01/11/2008

In this paper we describe K-Space’s participation in the interactive search task of TRECVid 2008. For 2008 the K-Space group performed one of the largest interactive video information retrieval experiments conducted in a laboratory setting. Three institutions participated in a multi-site, multi-system experiment. In total 36 users participated, 12 each from Dublin City University (DCU, Ireland), University of Glasgow (GU, Scotland) and Centrum Wiskunde & Informatica (CWI, the Netherlands). Three user interfaces were developed: two from DCU, which were also used in 2007, and one from GU. All interfaces leveraged the same search service. Using a Latin squares arrangement, each user conducted 12 topics, leading to 6 runs per site, 18 in total. We officially submitted 3 of these runs to NIST for evaluation, with an additional expert run using a 4th system. Our submitted runs performed around the median. In this paper we present an overview of the search system utilized, the experimental setup and a preliminary analysis of our results.

Irish Universities

DCU Online Research Access Service

K-Space at TRECVID 2008

Author: Adamek T.
Amin A.
Avrithis Y.
Bailer W.
Benmokhtar R.
Byrne D.
Chandramouli K.
Cobet A.
Dumont E.
Goldmann L.
Goyal A.
Haller M.
Halvey M.
Hannah D.
Hopfgartner F.
Huet B.
Izquierdo E.
Jones G.
Jose J.M.
Keenan G.
Kompatsiaris I.
Lee H.
McGuinness K.
Merialdo B.
Mezaris V.
Moerzinger R.
O'Connor N.
O'Hare N.
Papadopoulous G.
Praks P.
Punitha P.
Samour A.
Schallauer P.
Sikora T.
Smeaton A.F.
Spyrou E.
Tolias G.
Troncy R.
Villa R.
Wilkins P.
Publication date: 01/01/2008

In this paper we describe K-Space’s participation in the interactive search task of TRECVid 2008. For 2008 the K-Space group performed one of the largest interactive video information retrieval experiments conducted in a laboratory setting. Three institutions participated in a multi-site, multi-system experiment. In total 36 users participated, 12 each from Dublin City University (DCU, Ireland), University of Glasgow (GU, Scotland) and Centrum Wiskunde and Informatica (CWI, the Netherlands). Three user interfaces were developed: two from DCU, which were also used in 2007, and one from GU. All interfaces leveraged the same search service. Using a Latin squares arrangement, each user conducted 12 topics, leading to 6 runs per site, 18 in total. We officially submitted 3 of these runs to NIST for evaluation, with an additional expert run using a 4th system. Our submitted runs performed around the median. In this paper we present an overview of the search system utilized, the experimental setup and a preliminary analysis of our results.

DSpace at NTUA

Enlighten

A framework for automatic semantic video annotation

Author: A Amir
A Basharat
A Gupta
A Kapoor
A Ulges
AF Smeaton
Amjad Altadmri
Amr Ahmed
B Chandrasekaran
C Fellbaum
DB Lenat
H Liu
H Motulsky
J Sivic
ML Shyu
N Haering
R Fergus
T Brox
WL Zhao
XY Wei
YG Jiang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 28/03/2013

The rapidly increasing quantity of publicly available videos has driven research into developing automatic tools for indexing, rating, searching and retrieval. Textual semantic representations, such as tagging, labelling and annotation, are often important factors in the process of indexing any video, because they represent semantics in a user-friendly way that is appropriate for search and retrieval. Ideally, this annotation should be inspired by the human cognitive way of perceiving and describing videos. The difference between the low-level visual contents and the corresponding human perception is referred to as the ‘semantic gap’. Tackling this gap is even harder in the case of unconstrained videos, mainly due to the lack of any previous information about the analyzed video on the one hand, and the huge amount of generic knowledge required on the other. This paper introduces a framework for the Automatic Semantic Annotation of unconstrained videos. The proposed framework utilizes two non-domain-specific layers: low-level visual similarity matching, and an annotation analysis that employs commonsense knowledge bases. A commonsense ontology is created by incorporating multiple structured semantic relationships. Experiments and black-box tests are carried out on standard video databases for action recognition and video information retrieval. White-box tests examine the performance of the individual intermediate layers of the framework. The evaluation of the results and the statistical analysis show that integrating visual similarity matching with commonsense semantic relationships provides an effective approach to automated video annotation.

University of Lincoln Institutional Repository

Crossref

Edge Hill University Research Information Repository

Kent Academic Repository

Studying Interaction Methodologies in Video Retrieval

Author: Hopfgartner F.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2008

So far, several approaches have been studied to bridge the Semantic Gap, the bottleneck in image and video retrieval. However, no approach has been successful enough to increase retrieval performance significantly. One reason is the lack of understanding of the user's interest, a major condition for adapting results to a user. This is partly due to the lack of appropriate interfaces and the missing knowledge of how to interpret users' actions with these interfaces. In this paper, we propose to study the importance of various implicit indicators of relevance. Furthermore, we propose to investigate how this implicit feedback can be combined with static user profiles towards an adaptive video retrieval model.

CiteSeerX

Enlighten

Weakly-Supervised Alignment of Video With Text

Author: Bach Francis
Bojanowski Piotr
Grave Edouard
Lajugie Rémi
Laptev Ivan
Ponce Jean
Schmid Cordelia
Publication date: 07/12/2015

Suppose that we are given a set of videos, along with natural language descriptions in the form of multiple sentences (e.g., manual annotations, movie scripts, sport summaries etc.), and that these sentences appear in the same temporal order as their visual counterparts. We propose in this paper a method for aligning the two modalities, i.e., automatically providing a time stamp for every sentence. Given vectorial features for both video and text, we propose to cast this task as a temporal assignment problem, with an implicit linear mapping between the two feature modalities. We formulate this problem as an integer quadratic program, and solve its continuous convex relaxation using an efficient conditional gradient algorithm. Several rounding procedures are proposed to construct the final integer solution. After demonstrating significant improvements over the state of the art on the related task of aligning video with symbolic labels [7], we evaluate our method on a challenging dataset of videos with associated textual descriptions [36], using both bag-of-words and continuous representations for text.

Comment: ICCV 2015 - IEEE International Conference on Computer Vision, Dec 2015, Santiago, Chil

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server