5 research outputs found
Machine Learning Architectures for Video Annotation and Retrieval
PhD
In this thesis we design machine learning methodologies for solving the problem
of video annotation and retrieval using either pre-defined semantic concepts or ad-hoc
queries. Concept-based video annotation refers to the annotation of video fragments
with one or more semantic concepts (e.g. hand, sky, running), chosen from a pre-defined concept list. Ad-hoc queries refer to textual descriptions that may contain
objects, activities, locations, etc., and combinations thereof. Our contributions
are: i) A thorough analysis of extending and using different local descriptors for
improved concept-based video annotation, together with a stacking architecture whose
first layer uses concept classifiers trained on local descriptors and whose last layer
improves their prediction accuracy by implicitly capturing concept relations. ii)
A cascade architecture that orders and combines many classifiers, trained on different
visual descriptors, for the same concept. iii) A deep learning architecture that exploits
concept relations at two different levels. At the first level, we build on ideas from
multi-task learning, and propose an approach to learn concept-specific representations
that are sparse linear combinations of representations of latent concepts. At the second
level, we build on ideas from structured output learning, and propose the introduction,
at training time, of a new cost term that explicitly models the correlations between
the concepts. By doing so, we explicitly model the structure in the output space
(i.e., the concept labels). iv) A fully-automatic ad-hoc video search architecture that
combines concept-based video annotation and textual query analysis, and transforms
concept-based keyframe and query representations into a common semantic embedding
space. Our architectures have been extensively evaluated on the TRECVID SIN 2013,
the TRECVID AVS 2016, and other large-scale datasets, demonstrating their effectiveness
compared to similar approaches.
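The two levels of contribution iii) can be illustrated with a minimal sketch. All names, sizes, and thresholds below are hypothetical, not from the thesis: concept-specific representations are formed as sparse linear combinations of a small set of shared latent-concept representations, and a simple quadratic cost penalizes mismatch between predicted pairwise concept co-occurrence and an observed concept-correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (hypothetical, not from the thesis).
n_concepts, n_latent, dim = 5, 3, 8

# Shared latent-concept representations.
L = rng.normal(size=(n_latent, dim))

# Sparse mixing weights: each concept is a sparse linear
# combination of the latent representations.
W = rng.random(size=(n_concepts, n_latent))
W[W < 0.6] = 0.0  # crude sparsification for illustration

# Concept-specific representations, one row per concept.
R = W @ L  # shape: (n_concepts, dim)

# Structured-output-style cost term: compare predicted pairwise
# concept co-occurrence against an observed correlation matrix C.
scores = rng.random(size=(4, n_concepts))      # predictions for 4 samples
C = rng.random(size=(n_concepts, n_concepts))  # observed concept correlations
pred_corr = scores.T @ scores / scores.shape[0]
corr_cost = float(np.sum((pred_corr - C) ** 2))

print(R.shape, corr_cost >= 0.0)
```

In a real system the mixing weights and the correlation cost would be learned jointly with the network parameters; here they are fixed random values so the shapes and the form of the cost are easy to inspect.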
Informedia @ TRECVID 2016 MED and AVS
We report on our system used in the TRECVID 2016 Multimedia Event Detection (MED) and Ad-hoc Video Search (AVS) tasks. On the MED task, the CMU team submitted runs in the 000Ex, 010Ex and 100Ex settings for the Pre-specified Events. On the AVS task, the CMU team submitted runs for the fully-automatic system under the no-annotation condition.
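A fully-automatic ad-hoc search pipeline of the kind described above can be sketched as follows. This is a simplified illustration, not the actual Informedia system: the concept vocabulary, detector scores, and query-analysis rule are all invented, and keyframes are ranked by cosine similarity between the query's concept vector and each keyframe's concept-score vector in a shared concept space.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity with a small epsilon to avoid division by zero.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

# Hypothetical concept vocabulary.
concepts = ["hand", "sky", "running", "car", "indoor"]

# Concept-based keyframe representations: one detector score per concept.
keyframes = {
    "kf1": np.array([0.9, 0.1, 0.8, 0.0, 0.2]),
    "kf2": np.array([0.0, 0.9, 0.1, 0.7, 0.0]),
}

def query_to_concepts(query: str) -> np.ndarray:
    # Toy textual query analysis: mark each vocabulary concept
    # mentioned in the query (a real system would use NLP).
    return np.array([1.0 if c in query else 0.0 for c in concepts])

q = query_to_concepts("a person running with a hand visible")

# Rank keyframes by similarity in the common concept space.
ranked = sorted(keyframes, key=lambda k: cosine(q, keyframes[k]), reverse=True)
print(ranked)  # → ['kf1', 'kf2']
```

The real architecture maps both sides into a learned semantic embedding space rather than comparing raw concept scores directly, but the matching step has the same shape: one vector per query, one per keyframe, ranked by similarity.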