TRECVID 2008 - goals, tasks, data, evaluation mechanisms and metrics

Author: Awad George M.
Fiscus Jon
Kraaij Wessel
Over Paul
Rose Travis
Smeaton Alan F.
Publication venue: National Institute for Standards and Technology (NIST)
Publication date: 17/11/2008

The TREC Video Retrieval Evaluation (TRECVID) 2008 is a TREC-style video analysis and retrieval evaluation, the goal of which remains to promote progress in content-based exploitation of digital video via open, metrics-based evaluation. Over the last seven years this effort has yielded a better understanding of how systems can effectively accomplish such processing and how one can reliably benchmark their performance. In 2008, 77 teams (see Table 1) from various research organizations (24 from Asia, 39 from Europe, 13 from North America, and 1 from Australia) participated in one or more of five tasks: high-level feature extraction, search (fully automatic, manually assisted, or interactive), pre-production video (rushes) summarization, copy detection, and surveillance event detection. The copy detection and surveillance event detection tasks are being run for the first time in TRECVID. This paper presents an overview of TRECVID in 2008.

Irish Universities

DCU Online Research Access Service

Radboud Repository

TRECVID 2007 - Overview

Author: Awad George M.
Kraaij Wessel
Over Paul
Smeaton Alan F.
Publication venue: 'University of Aden - Faculty of Economics and Administration'
Publication date: 01/11/2007

DCU Online Research Access Service

A LiDAR Point Cloud Generator: from a Virtual World to Autonomous Driving

Author: Keutzer Kurt
Sangiovanni-Vincentelli Alberto L.
Seshia Sanjit A.
Wu Bichen
Yue Xiangyu
Publication date: 01/01/2018

3D LiDAR scanners are playing an increasingly important role in autonomous driving as they can generate depth information of the environment. However, creating large 3D LiDAR point cloud datasets with point-level labels requires a significant amount of manual annotation. This jeopardizes the efficient development of supervised deep learning algorithms, which are often data-hungry. We present a framework to rapidly create point clouds with accurate point-level labels from a computer game. The framework supports data collection from both auto-driving scenes and user-configured scenes. Point clouds from auto-driving scenes can be used as training data for deep learning algorithms, while point clouds from user-configured scenes can be used to systematically test the vulnerability of a neural network; the falsifying examples can then be used to make the neural network more robust through retraining. In addition, scene images can be captured simultaneously for sensor fusion tasks, and we propose a method for automatic calibration between the point clouds and the captured scene images. We show a significant improvement in accuracy (+9%) in point cloud segmentation by augmenting the training dataset with the generated synthesized data. Our experiments also show that, by testing and retraining the network using point clouds from user-configured scenes, the weaknesses and blind spots of the neural network can be fixed.

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

Semantic Instance Annotation of Street Scenes by 3D to 2D Label Transfer

Author: Geiger Andreas
Kiefel Martin
Sun Ming-Ting
Xie Jun
Publication date: 12/04/2016

Semantic annotations are vital for training models for object recognition, semantic segmentation or scene understanding. Unfortunately, pixelwise annotation of images at very large scale is labor-intensive, and little labeled data is available, particularly at instance level and for street scenes. In this paper, we propose to tackle this problem by lifting the semantic instance labeling task from 2D into 3D. Given reconstructions from stereo or laser data, we annotate static 3D scene elements with rough bounding primitives and develop a model which transfers this information into the image domain. We leverage our method to obtain 2D labels for a novel suburban video dataset which we have collected, resulting in 400k semantic and instance image annotations. A comparison of our method to state-of-the-art label transfer baselines reveals that 3D information enables more efficient annotation while at the same time resulting in improved accuracy and time-coherent labels. Comment: 10 pages in Conference on Computer Vision and Pattern Recognition (CVPR), 201

arXiv.org e-Print Archive

MPG.PuRe

SAVASA project @ TRECVID 2012: interactive surveillance event detection

Author: Clawson Kathy
Direkoglu Cem
Gimenez Roberto
Jargalsaikhan Iveel
Li Hao
Little Suzanne
Martinez Llorens Ana
Mereu Anna
Nieto Marcos
O'Connor Noel E.
Rodriguez Aitor
Sanchez Pedro
Santos de la Camara Raul
Smeaton Alan F.
Villarroel Peniza Karina
Publication date: 26/11/2012

In this paper we describe our participation in the interactive surveillance event detection task at TRECVid 2012. The system we developed comprised individual classifiers brought together behind a simple video search interface that enabled users to select relevant segments based on downsampled animated GIFs. Two types of user, 'experts' and 'end users', performed the evaluations. Due to time constraints we focussed on three events (ObjectPut, PersonRuns and Pointing) and two of the five available cameras (1 and 3). Results from the interactive runs, as well as a discussion of the performance of the underlying retrospective classifiers, are presented.

DCU Online Research Access Service

Crowdsourcing step-by-step information extraction to enhance existing how-to videos

Author: Gajos Krzysztof Z.
Guo Philip J.
Kim Ju Ho
Miller Robert C.
Nguyen Phu Tran
Weir Sarah
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014

Millions of learners today use how-to videos to master new skills in a variety of domains. But browsing such videos is often tedious and inefficient because video player interfaces are not optimized for the unique step-by-step structure of such videos. This research aims to improve the learning experience of existing how-to videos with step-by-step annotations. We first performed a formative study to verify that annotations are actually useful to learners. We created ToolScape, an interactive video player that displays step descriptions and intermediate result thumbnails in the video timeline. Learners in our study performed better and gained more self-efficacy using ToolScape than with a traditional video player. To add the needed step annotations to existing how-to videos at scale, we introduce a novel crowdsourcing workflow. It extracts step-by-step structure from an existing video, including step times, descriptions, and before and after images. We introduce the Find-Verify-Expand design pattern for temporal and visual annotation, which applies clustering, text processing, and visual analysis algorithms to merge crowd output. The workflow does not rely on domain-specific customization, works on top of existing videos, and recruits untrained crowd workers. We evaluated the workflow with Mechanical Turk, using 75 cooking, makeup, and Photoshop videos on YouTube. Results show that our workflow can extract steps with a quality comparable to that of trained annotators across all three domains, with 77% precision and 81% recall.

CiteSeerX

DSpace@MIT