Machine Learning Architectures for Video Annotation and Retrieval
PhD Thesis
In this thesis we design machine learning methodologies for the problem
of video annotation and retrieval using either pre-defined semantic concepts or ad-hoc
queries. Concept-based video annotation refers to the annotation of video fragments
with one or more semantic concepts (e.g. hand, sky, running), chosen from a predefined concept list. Ad-hoc queries refer to textual descriptions that may contain
objects, activities, locations, etc., and combinations of these. Our contributions
are: i) A thorough analysis of extending and using different local descriptors for
improved concept-based video annotation, together with a stacking architecture whose
first layer uses concept classifiers trained on local descriptors and whose last
layer improves their prediction accuracy by implicitly capturing concept relations. ii)
A cascade architecture that orders and combines many classifiers, trained on different
visual descriptors, for the same concept. iii) A deep learning architecture that exploits
concept relations at two different levels. At the first level, we build on ideas from
multi-task learning, and propose an approach to learn concept-specific representations
that are sparse, linear combinations of representations of latent concepts. At a second
level, we build on ideas from structured output learning, and propose the introduction,
at training time, of a new cost term that explicitly models the correlations between
the concepts. By doing so, we explicitly model the structure in the output space
(i.e., the concept labels). iv) A fully-automatic ad-hoc video search architecture that
combines concept-based video annotation and textual query analysis, and transforms
concept-based keyframe and query representations into a common semantic embedding
space. Our architectures have been extensively evaluated on the TRECVID SIN 2013,
the TRECVID AVS 2016, and other large-scale datasets, demonstrating their effectiveness
compared to similar approaches.
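The ad-hoc search architecture in contribution (iv) can be illustrated with a minimal sketch: once keyframes and analyzed queries are both represented as vectors in a common concept-based space, retrieval reduces to ranking by similarity in that space. The concept vocabulary, shot identifiers, and all scores below are hypothetical stand-ins, not the thesis' actual detectors or data.

```python
import numpy as np

# Hypothetical shared concept vocabulary (illustrative, not the actual list).
CONCEPTS = ["hand", "sky", "running", "car", "beach"]

def to_unit(v):
    """L2-normalize a concept-score vector so a dot product equals cosine similarity."""
    v = np.asarray(v, dtype=float)
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

# Concept-detector scores for three keyframes (made-up numbers).
keyframes = {
    "shot_001": to_unit([0.1, 0.9, 0.0, 0.0, 0.8]),  # sky + beach
    "shot_002": to_unit([0.8, 0.1, 0.7, 0.0, 0.0]),  # hand + running
    "shot_003": to_unit([0.0, 0.2, 0.1, 0.9, 0.0]),  # car
}

# Textual query analysis maps "running on the beach" to related concept weights.
query = to_unit([0.0, 0.3, 1.0, 0.0, 0.9])

# Rank keyframes by cosine similarity in the common space.
ranked = sorted(keyframes, key=lambda k: float(keyframes[k] @ query), reverse=True)
print(ranked)
```

In the actual architecture the embedding is learned rather than being the raw detector-score space, but the ranking step works the same way.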
Concept Language Models and Event-based Concept Number Selection for Zero-example Event Detection
<p>Zero-example event detection is a problem where, given an event query as input but no example videos for training a detector, the system retrieves the most closely related videos. In this paper we present a fully-automatic zero-example event detection method that is based on translating the event description to a predefined set of concepts for which previously trained visual concept detectors are available. We adopt the use of Concept Language Models (CLMs), a method of augmenting semantic concept definitions, and we propose a new concept-selection method for deciding on the appropriate number of concepts needed to describe an event query. The proposed system achieves state-of-the-art performance in automatic zero-example event detection.</p>
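The idea of selecting an event-dependent number of concepts can be sketched as follows. The similarity scores are made-up stand-ins for the text-based relatedness between an event query and each available concept detector, and the relative-threshold cutoff is an illustrative assumption, not the paper's exact selection rule.

```python
def select_concepts(similarities, rel_threshold=0.6):
    """Keep concepts scoring at least rel_threshold * best score, so the
    number of selected concepts adapts to each event query."""
    if not similarities:
        return []
    best = max(similarities.values())
    kept = [(c, s) for c, s in similarities.items() if s >= rel_threshold * best]
    # Return concept names, most related first.
    return [c for c, _ in sorted(kept, key=lambda cs: cs[1], reverse=True)]

# Hypothetical event query "dog show": relatedness of each concept to the query text.
scores = {"dog": 0.92, "animal": 0.74, "stage": 0.55, "car": 0.12}
print(select_concepts(scores))
```

A sharply focused query thus keeps only a few strong concepts, while a broader query with many moderately related concepts keeps more of them.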
VERGE in VBS 2019
This paper presents VERGE, an interactive video retrieval engine that enables browsing and searching video content. The system implements various retrieval modalities, such as visual or textual search, concept detection and clustering, as well as multimodal fusion and a reranking capability. All results are displayed in a graphical user interface in an efficient and user-friendly manner.
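A minimal sketch in the spirit of VERGE's multimodal fusion: per-modality scores for each shot are combined with a weighted sum before reranking. The modality names, shot identifiers, scores, and weights are illustrative assumptions, not the system's actual fusion scheme.

```python
def fuse(score_lists, weights):
    """Weighted-sum late fusion of per-modality score dicts; returns shot ids
    ranked by fused score, highest first."""
    fused = {}
    for scores, w in zip(score_lists, weights):
        for shot, s in scores.items():
            fused[shot] = fused.get(shot, 0.0) + w * s
    return sorted(fused, key=fused.get, reverse=True)

# Made-up scores from two retrieval modalities.
visual = {"shot_a": 0.9, "shot_b": 0.4, "shot_c": 0.2}
textual = {"shot_a": 0.1, "shot_b": 0.8, "shot_c": 0.3}
print(fuse([visual, textual], weights=[0.5, 0.5]))
```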
ITI-CERTH participation in TRECVID 2016
<p>This paper provides an overview of the runs submitted to TRECVID 2016 by ITI-CERTH. ITI-CERTH participated in the Ad-hoc Video Search (AVS), Multimedia Event Detection (MED), Instance Search (INS) and Surveillance Event Detection (SED) tasks. Our AVS task participation is based on a method that combines the linguistic analysis of the query and the concept-based annotation of video fragments. In the MED task, for the 000Ex subtask we exploit the textual description of an event class in order to retrieve related videos, without using positive samples. Furthermore, in the 010Ex and 1000Ex subtasks, a kernel subclass version of our discriminant analysis method (KSDA) combined with a fast linear SVM is employed. The INS task is performed by employing VERGE, an interactive retrieval application that integrates retrieval functionalities considering only visual information. For the SED task, we deployed a novel activity detection algorithm that is based on Motion Boundary Activity Areas (MBAA), dense trajectories, Fisher vectors and an overlapping sliding window.</p>
ITI-CERTH in TRECVID 2016 Ad-hoc Video Search (AVS)
<p>This presentation provides an overview of the runs submitted to TRECVID 2016 by ITI-CERTH in the Ad-hoc Video Search (AVS) task. Our AVS task participation is based on a method that combines the linguistic analysis of the query and the concept-based annotation of video fragments.</p>