
    Machine Learning Architectures for Video Annotation and Retrieval

    PhD thesis. In this thesis we design machine learning methodologies for video annotation and retrieval using either pre-defined semantic concepts or ad-hoc queries. Concept-based video annotation refers to annotating video fragments with one or more semantic concepts (e.g. hand, sky, running) chosen from a predefined concept list. Ad-hoc queries are textual descriptions that may contain objects, activities, locations etc., and combinations thereof. Our contributions are: i) A thorough analysis of extending and using different local descriptors for improved concept-based video annotation, together with a stacking architecture whose first layer uses concept classifiers trained on local descriptors and whose last layer improves their prediction accuracy by implicitly capturing concept relations. ii) A cascade architecture that orders and combines many classifiers, trained on different visual descriptors, for the same concept. iii) A deep learning architecture that exploits concept relations at two levels. At the first level, building on ideas from multi-task learning, we propose an approach that learns concept-specific representations as sparse linear combinations of representations of latent concepts. At the second level, building on ideas from structured-output learning, we introduce at training time a new cost term that explicitly models the correlations between concepts; by doing so, we explicitly model the structure of the output space (i.e., the concept labels). iv) A fully automatic ad-hoc video search architecture that combines concept-based video annotation with textual query analysis and transforms concept-based keyframe and query representations into a common semantic embedding space.
Our architectures have been extensively evaluated on the TRECVID SIN 2013, the TRECVID AVS 2016, and other large-scale datasets, demonstrating their effectiveness compared to other similar approaches.
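The stacking idea in contribution i) can be illustrated with a minimal two-layer sketch: independent per-concept classifiers in the first layer, and a second layer that re-scores each concept from all first-layer outputs, implicitly picking up concept co-occurrence. This is only an illustrative toy (random data, scikit-learn logistic regressions), not the thesis's actual descriptors or models.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy data: 200 keyframes with 64-dim visual descriptors and 3 correlated
# concept labels (stand-ins for real local-descriptor features).
X = rng.normal(size=(200, 64))
Y = (X[:, :3] + 0.5 * X[:, [1, 2, 0]] > 0).astype(int)

# First layer: one independent classifier per concept.
first_layer = [LogisticRegression(max_iter=1000).fit(X, Y[:, c]) for c in range(3)]
scores = np.column_stack([clf.predict_proba(X)[:, 1] for clf in first_layer])

# Second layer: refines each concept's score from ALL concepts' first-layer
# scores, implicitly capturing relations between concepts.
second_layer = [LogisticRegression(max_iter=1000).fit(scores, Y[:, c]) for c in range(3)]
refined = np.column_stack([clf.predict_proba(scores)[:, 1] for clf in second_layer])
```

In practice the second layer would be trained on held-out first-layer predictions to avoid overfitting; the toy above skips that split for brevity.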

    Concept Language Models and Event-based Concept Number Selection for Zero-example Event Detection

    Zero-example event detection is the problem where, given an event query as input but no example videos for training a detector, the system retrieves the most closely related videos. In this paper we present a fully automatic zero-example event detection method based on translating the event description into a predefined set of concepts for which previously trained visual concept detectors are available. We adopt Concept Language Models (CLMs), a method of augmenting semantic concept definitions, and we propose a new concept-selection method for deciding the appropriate number of concepts needed to describe an event query. The proposed system achieves state-of-the-art performance in automatic zero-example event detection.
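The query-to-concept translation step can be sketched as ranking a concept vocabulary by embedding similarity to the query and keeping an adaptive number of top concepts. The embeddings, vocabulary, and threshold rule below are all illustrative assumptions, a simple stand-in for the paper's CLM-based, event-dependent concept-number selection.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dense embeddings for a tiny concept vocabulary and a query
# whose meaning is dominated by "dog" with a touch of "beach".
concepts = ["dog", "beach", "running", "car", "kitchen"]
concept_vecs = {c: rng.normal(size=50) for c in concepts}
query_vec = concept_vecs["dog"] + 0.3 * concept_vecs["beach"] + 0.1 * rng.normal(size=50)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank concepts by similarity to the query...
sims = sorted(((cosine(query_vec, v), c) for c, v in concept_vecs.items()), reverse=True)

# ...and keep those within a fraction of the best score, so the number of
# selected concepts adapts to the query instead of being fixed in advance.
best = sims[0][0]
selected = [c for s, c in sims if s >= 0.5 * best]
```

A sharply focused query ends up with few selected concepts, while a vague one keeps more, which is the intuition behind event-based concept-number selection.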

    VERGE in VBS 2019

    This paper presents VERGE, an interactive video retrieval engine that enables browsing and searching video content. The system implements various retrieval modalities, such as visual or textual search, concept detection and clustering, as well as multimodal fusion and reranking capabilities. All results are displayed efficiently in a user-friendly graphical interface.

    ITI-CERTH participation in TRECVID 2016

    This paper provides an overview of the runs submitted to TRECVID 2016 by ITI-CERTH, which participated in the Ad-hoc Video Search (AVS), Multimedia Event Detection (MED), Instance Search (INS) and Surveillance Event Detection (SED) tasks. Our AVS participation is based on a method that combines linguistic analysis of the query with concept-based annotation of video fragments. In the MED 000Ex task we exploit the textual description of an event class in order to retrieve related videos without using positive samples, while in the 010Ex and 1000Ex tasks a kernel subclass version of our discriminant analysis method (KSDA) combined with a fast linear SVM is employed. The INS task is performed with VERGE, an interactive retrieval application that integrates retrieval functionalities based only on visual information. For the SED task, we deployed a novel activity detection algorithm based on Motion Boundary Activity Areas (MBAA), dense trajectories, Fisher vectors and an overlapping sliding window.
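The overlapping sliding window used in the SED pipeline can be sketched in isolation: score each window of frames by aggregating per-frame activity scores and keep windows above a threshold as candidate event intervals. The per-frame scores, window size, and threshold below are toy assumptions; the actual system aggregates MBAA/dense-trajectory features with Fisher vectors rather than averaging raw scores.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy per-frame activity scores for a 300-frame surveillance clip, with a
# burst of activity around frames 120-160.
frame_scores = rng.normal(0.1, 0.05, size=300)
frame_scores[120:160] += 0.8

win, step, thresh = 50, 10, 0.4

# Slide an overlapping window over the clip; keep windows whose mean
# frame score exceeds the threshold as candidate event intervals.
detections = []
for start in range(0, len(frame_scores) - win + 1, step):
    score = frame_scores[start:start + win].mean()
    if score > thresh:
        detections.append((start, start + win, score))
```

Overlapping windows trade extra computation for better temporal localization, since an event rarely aligns with fixed non-overlapping segment boundaries.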

    ITI-CERTH in TRECVID 2016 Ad-hoc Video Search (AVS)

    This presentation provides an overview of the runs submitted to TRECVID 2016 by ITI-CERTH in the Ad-hoc Video Search (AVS) task. Our AVS participation is based on a method that combines linguistic analysis of the query with concept-based annotation of video fragments.