19 research outputs found

    Learning to detect video events from zero or very few video examples

    Get PDF
    In this work we deal with the problem of high-level event detection in video. Specifically, we study the challenging problems of i) learning to detect video events from solely a textual description of the event, without using any positive video examples, and ii) additionally exploiting very few positive training samples together with a small number of ``related'' videos. For learning only from an event's textual description, we first identify a general learning framework and then study the impact of different design choices for various stages of this framework. For additionally learning from example videos, when true positive training samples are scarce, we employ an extension of the Support Vector Machine that allows us to exploit ``related'' event videos by automatically introducing different weights for subsets of the videos in the overall training set. Experimental evaluations performed on the large-scale TRECVID MED 2014 video dataset provide insight on the effectiveness of the proposed methods.Comment: Image and Vision Computing Journal, Elsevier, 2015, accepted for publicatio

    TRECVID 2014 -- An Overview of the Goals, Tasks, Data, Evaluation Mechanisms and Metrics

    No full text
    International audienceThe TREC Video Retrieval Evaluation (TRECVID) 2014 was a TREC-style video analysis and retrieval evaluation, the goal of which remains to promote progress in content-based exploitation of digital video via open, metrics-based evaluation. Over the last dozen years this effort has yielded a better under- standing of how systems can effectively accomplish such processing and how one can reliably benchmark their performance. TRECVID is funded by the NIST with support from other US government agencies. Many organizations and individuals worldwide contribute significant time and effort

    Deliverable D7.7 Dissemination and Standardisation Report v3

    Get PDF
    This deliverable presents the LinkedTV dissemination and standardisation report for the project period of months 31 to 42 (April 2014 to March 2015)

    Deliverable D9.3 Final Project Report

    Get PDF
    This document comprises the final report of LinkedTV. It includes a publishable summary, a plan for use and dissemination of foreground and a report covering the wider societal implications of the project in the form of a questionnaire

    An overview on the evaluated video retrieval tasks at TRECVID 2022

    Full text link
    The TREC Video Retrieval Evaluation (TRECVID) is a TREC-style video analysis and retrieval evaluation with the goal of promoting progress in research and development of content-based exploitation and retrieval of information from digital video via open, tasks-based evaluation supported by metrology. Over the last twenty-one years this effort has yielded a better understanding of how systems can effectively accomplish such processing and how one can reliably benchmark their performance. TRECVID has been funded by NIST (National Institute of Standards and Technology) and other US government agencies. In addition, many organizations and individuals worldwide contribute significant time and effort. TRECVID 2022 planned for the following six tasks: Ad-hoc video search, Video to text captioning, Disaster scene description and indexing, Activity in extended videos, deep video understanding, and movie summarization. In total, 35 teams from various research organizations worldwide signed up to join the evaluation campaign this year. This paper introduces the tasks, datasets used, evaluation frameworks and metrics, as well as a high-level results overview.Comment: arXiv admin note: substantial text overlap with arXiv:2104.13473, arXiv:2009.0998

    Maximum Margin Learning Under Uncertainty

    Get PDF
    PhDIn this thesis we study the problem of learning under uncertainty using the statistical learning paradigm. We rst propose a linear maximum margin classi er that deals with uncertainty in data input. More speci cally, we reformulate the standard Support Vector Machine (SVM) framework such that each training example can be modeled by a multi-dimensional Gaussian distribution described by its mean vector and its covariance matrix { the latter modeling the uncertainty. We address the classi cation problem and de ne a cost function that is the expected value of the classical SVM cost when data samples are drawn from the multi-dimensional Gaussian distributions that form the set of the training examples. Our formulation approximates the classical SVM formulation when the training examples are isotropic Gaussians with variance tending to zero. We arrive at a convex optimization problem, which we solve e - ciently in the primal form using a stochastic gradient descent approach. The resulting classi er, which we name SVM with Gaussian Sample Uncertainty (SVM-GSU), is tested on synthetic data and ve publicly available and popular datasets; namely, the MNIST, WDBC, DEAP, TV News Channel Commercial Detection, and TRECVID MED datasets. Experimental results verify the e ectiveness of the proposed method. Next, we extended the aforementioned linear classi er so as to lead to non-linear decision boundaries, using the RBF kernel. This extension, where we use isotropic input uncertainty and we name Kernel SVM with Isotropic Gaussian Sample Uncertainty (KSVM-iGSU), is used in the problems of video event detection and video aesthetic quality assessment. The experimental results show that exploiting input uncertainty, especially in problems where only a limited number of positive training examples are provided, can lead to better classi cation, detection, or retrieval performance. Finally, we present a preliminary study on how the above ideas can be used under the deep convolutional neural networks learning paradigm so as to exploit inherent sources of uncertainty, such as spatial pooling operations, that are usually used in deep networks

    Deliverable D1.4 Visual, text and audio information analysis for hypervideo, final release

    Get PDF
    Having extensively evaluated the performance of the technologies included in the first release of WP1 multimedia analysis tools, using content from the LinkedTV scenarios and by participating in international benchmarking activities, concrete decisions regarding the appropriateness and the importance of each individual method or combination of methods were made, which, combined with an updated list of information needs for each scenario, led to a new set of analysis requirements that had to be addressed through the release of the final set of analysis techniques of WP1. To this end, coordinated efforts on three directions, including (a) the improvement of a number of methods in terms of accuracy and time efficiency, (b) the development of new technologies and (c) the definition of synergies between methods for obtaining new types of information via multimodal processing, resulted in the final bunch of multimedia analysis methods for video hyperlinking. Moreover, the different developed analysis modules have been integrated into a web-based infrastructure, allowing the fully automatic linking of the multitude of WP1 technologies and the overall LinkedTV platform

    Deliverable D7.5 LinkedTV Dissemination and Standardisation Report v2

    Get PDF
    This deliverable presents the LinkedTV dissemination and standardisation report for the project period of months 19 to 30 (April 2013 to March 2014)
    corecore