Beyond standard benchmarks: Parameterizing performance evaluation in visual object tracking

Author: Kristan Matej
Leonardis Aleš
Lukežič Alan
Zajc Luka Čehovin
Publication date: 25/03/2017

Object-to-camera motion produces a variety of apparent motion patterns that significantly affect the performance of short-term visual trackers. Despite being crucial for designing robust trackers, their influence is poorly explored in standard benchmarks due to weakly defined, biased and overlapping attribute annotations. In this paper we propose to go beyond pre-recorded benchmarks with post-hoc annotations by presenting an approach that utilizes omnidirectional videos to generate realistic, consistently annotated, short-term tracking scenarios with exactly parameterized motion patterns. We have created an evaluation system and constructed a fully annotated dataset of omnidirectional videos, along with generators for typical motion patterns. We provide an in-depth analysis of major tracking paradigms that is complementary to the standard benchmarks and confirms the expressiveness of our evaluation approach.

arXiv.org e-Print Archive

Crossref

University of Birmingham Research Portal

Egocentric Hand Detection Via Dynamic Region Growing

Author: He Shengfeng
Huang Shao
Lau Rynson W. H.
Wang Weiqiang
Publication date: 09/11/2017

Egocentric videos, which mainly record the activities carried out by the users of wearable cameras, have drawn much research attention in recent years. Because of their lengthy content, a large number of ego-related applications have been developed to abstract the captured videos. Because users typically interact with target objects using their own hands, which usually appear within their visual field during the interaction, an egocentric hand detection step is involved in tasks such as gesture recognition, action recognition and social interaction understanding. In this work, we propose a dynamic region growing approach for hand region detection in egocentric videos that jointly considers hand-related motion and egocentric cues. We first determine seed regions that most likely belong to the hand by analyzing the motion patterns across successive frames. The hand regions can then be located by extending from the seed regions, according to the scores computed for the adjacent superpixels. These scores are derived from four egocentric cues: contrast, location, position consistency and appearance continuity. We discuss how to apply the proposed method in real-life scenarios, where multiple hands irregularly appear in and disappear from the videos. Experimental results on public datasets show that the proposed method achieves superior performance compared with state-of-the-art methods, especially in complicated scenarios.

arXiv.org e-Print Archive

Crossref

Institutional Knowledge at Singapore Management University

RGB-D datasets using Microsoft Kinect or similar sensors: a survey

Author: Galili
Guan
Hu
Kolner
Mulvad
Nakazawa
Palushani
Palushani
Publication venue: Springer
Publication date: 01/01/2015

RGB-D data has turned out to be a very useful representation of an indoor scene for solving fundamental computer vision problems. It combines the advantages of the color image, which provides appearance information about an object, and the depth image, which is immune to variations in color, illumination, rotation angle and scale. With the invention of the low-cost Microsoft Kinect sensor, which was initially used for gaming and later became a popular device for computer vision, high-quality RGB-D data can be acquired easily. In recent years, more and more RGB-D image/video datasets dedicated to various applications have become available, and they are of great importance for benchmarking the state of the art. In this paper, we systematically survey popular RGB-D datasets for different applications, including object recognition, scene classification, hand gesture recognition, 3D simultaneous localization and mapping, and pose estimation. We provide insights into the characteristics of each important dataset, and compare the popularity and the difficulty of those datasets. Overall, the main goal of this survey is to give a comprehensive description of the available RGB-D datasets and thus to guide researchers in the selection of suitable datasets for evaluating their algorithms.

Northumbria University Research Portal

Crossref

Springer - Publisher Connector

Online Research Database In Technology

Long-Term Visual Object Tracking Benchmark

Author: AW Smeulders
B Babenko
C Vondrick
D Held
H Grabner
H Li
J Zhang
Jack Valmadre
JF Henriques
JF Henriques
M Danelljan
M Kristan
M Kumar
M Mueller
P Liang
WL Lu
Y Hua
Y Li
Y Wu
Z Kalal
Publication date: 01/01/2019

We propose a new long video dataset (called Track Long and Prosper - TLP) and benchmark for single object tracking. The dataset consists of 50 HD videos from real-world scenarios, encompassing a duration of over 400 minutes (676K frames), making it more than 20 times larger in average duration per sequence and more than 8 times larger in total covered duration than existing generic datasets for visual tracking. The proposed dataset paves the way to suitably assess long-term tracking performance and to train better deep learning architectures (avoiding or reducing augmentation, which may not reflect real-world behaviour). We benchmark the dataset on 17 state-of-the-art trackers and rank them according to tracking accuracy and run-time speed. We further present a thorough qualitative and quantitative evaluation highlighting the importance of the long-term aspect of tracking. Our most interesting observations are that (a) existing short-sequence benchmarks fail to bring out the inherent differences between tracking algorithms, which widen when tracking on long sequences, and (b) the accuracy of trackers drops abruptly on challenging long sequences, suggesting the potential need for research efforts in the direction of long-term tracking. Comment: ACCV 2018 (Oral)

arXiv.org e-Print Archive

Crossref

FieldSAFE: Dataset for Obstacle Detection in Agriculture

Author: Christiansen Peter
Green Ole
Jørgensen Rasmus Nyholm
Karstoft Henrik
Kragh Mikkel Fly
Larsen Morten
Laursen Morten Stigaard
Steen Kim Arild
Publication venue: MDPI AG
Publication date: 11/09/2017

In this paper, we present a novel multi-modal dataset for obstacle detection in agriculture. The dataset comprises approximately 2 hours of raw sensor data from a tractor-mounted sensor system in a grass mowing scenario in Denmark, October 2016. Sensing modalities include stereo camera, thermal camera, web camera, 360-degree camera, lidar, and radar, while precise localization is available from fused IMU and GNSS. Both static and moving obstacles are present, including humans, mannequin dolls, rocks, barrels, buildings, vehicles, and vegetation. All obstacles have ground truth object labels and geographic coordinates. Comment: Submitted to special issue of MDPI Sensors: Sensors in Agriculture

arXiv.org e-Print Archive

Directory of Open Access Journals

Illumination invariant stationary object detection

Author: Bhargav Mitra
Chang J.‐Y.
Chris Chatwin
Nagachetan Bangalore
Philip Birch
Rupert Young
Tian Y.
Waqas Hassan
Publication venue: Institution of Engineering and Technology (IET)
Publication date: 01/02/2013

A real-time system is presented for the detection and tracking of moving objects that become stationary in a restricted zone. A new pixel classification method based on the segmentation history image is used to identify stationary objects in the scene. These objects are then tracked using a novel adaptive edge orientation-based tracking method. Experimental results have shown that the tracking technique gives more than a 95% detection success rate, even if objects are partially occluded. The tracking results, together with the historic edge maps, are analysed to remove objects that are no longer stationary or are falsely identified as foreground regions because of sudden changes in the illumination conditions. The technique has been tested on over 7 h of video recorded at different locations and times of day, both outdoors and indoors. The results obtained are compared with other available state-of-the-art methods.

Crossref

Directory of Open Access Journals

Sussex Research Online