Comparison of different integral histogram based tracking algorithms
Object tracking is an important subject in computer vision with a wide range of applications, including security and surveillance, motion-based recognition, driver assistance systems, and human-computer interaction. The proliferation of high-powered computers, the availability of high-quality and inexpensive video cameras, and the increasing need for automated video analysis have generated a great deal of interest in object tracking algorithms. Tracking is usually performed in the context of high-level applications that require the location and/or shape of the object in every frame. Research on object tracking algorithms has been conducted for decades, and a number of approaches have been proposed. These approaches differ from each other in object representation, feature selection, and modeling of the shape and appearance of the object. Histogram-based tracking has proven to be an efficient approach in many applications. The integral histogram is a novel method that allows histograms of multiple rectangular regions in an image to be extracted very efficiently. In recent years, a number of algorithms have adopted this technique and attempted to use the integral histogram more efficiently. In this paper, different algorithms that use this method as part of their tracking function are evaluated by comparing their tracking results, and an effort is made to modify some of the algorithms for better performance. The sequences used for the tracking experiments are grayscale and have significant shape and appearance variations for evaluating the performance of the algorithms. Extensive experimental results on these challenging sequences are presented, which demonstrate the tracking abilities of these algorithms.
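As an illustration of the integral histogram idea mentioned in this abstract, the following is a minimal pure-Python sketch, not the implementation evaluated in the paper. The function names `integral_histogram` and `region_histogram`, the uniform binning rule, and the toy 4x4 image are all assumptions made for this example. The table is built once in a single pass; afterwards the histogram of any axis-aligned rectangle is obtained from four corner lookups per bin.

```python
def integral_histogram(image, num_bins, max_value=256):
    """Build H where H[y][x][b] counts the pixels of bin b in rows < y and columns < x.

    `image` is a list of rows of integer gray levels in [0, max_value).
    Uniform binning is an assumption of this sketch.
    """
    height, width = len(image), len(image[0])
    # An extra zero row and zero column keep the region query free of edge cases.
    H = [[[0] * num_bins for _ in range(width + 1)] for _ in range(height + 1)]
    for y in range(height):
        for x in range(width):
            pixel_bin = image[y][x] * num_bins // max_value
            for b in range(num_bins):
                # Inclusion-exclusion over the upper, left, and upper-left neighbours.
                H[y + 1][x + 1][b] = H[y][x + 1][b] + H[y + 1][x][b] - H[y][x][b]
            H[y + 1][x + 1][pixel_bin] += 1
    return H


def region_histogram(H, top, left, bottom, right):
    """Histogram of rows top..bottom-1 and columns left..right-1 via four lookups per bin."""
    num_bins = len(H[0][0])
    return [
        H[bottom][right][b] - H[top][right][b] - H[bottom][left][b] + H[top][left][b]
        for b in range(num_bins)
    ]


# Hypothetical 4x4 grayscale image, quantized into 4 bins.
image = [
    [0, 64, 128, 192],
    [10, 70, 130, 200],
    [0, 0, 255, 255],
    [65, 65, 129, 129],
]
H = integral_histogram(image, num_bins=4)
whole = region_histogram(H, 0, 0, 4, 4)   # histogram of the full image
center = region_histogram(H, 1, 1, 3, 3)  # histogram of the central 2x2 block
```

In a tracker, the table would typically be built once per frame, after which many candidate windows can be scored against a target histogram at a constant cost per bin, independent of window size.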
ROAM: a Rich Object Appearance Model with Application to Rotoscoping
Rotoscoping, the detailed delineation of scene elements through a video shot,
is a painstaking task of tremendous importance in professional post-production
pipelines. While pixel-wise segmentation techniques can help for this task,
professional rotoscoping tools rely on parametric curves that offer artists
much better interactive control over the definition, editing and manipulation
of the segments of interest. Sticking to this prevalent rotoscoping paradigm,
we propose a novel framework to capture and track the visual aspect of an
arbitrary object in a scene, given a first closed outline of this object. This
model combines a collection of local foreground/background appearance models
spread along the outline, a global appearance model of the enclosed object and
a set of distinctive foreground landmarks. The structure of this rich
appearance model allows simple initialization, efficient iterative optimization
with exact minimization at each step, and on-line adaptation in videos. We
demonstrate qualitatively and quantitatively the merit of this framework
through comparisons with tools based on either dynamic segmentation with a
closed curve or pixel-wise binary labelling.
Learning from minimally labeled data with accelerated convolutional neural networks
The main objective of an Artificial Vision Algorithm is to design a mapping function that takes an image as an input and correctly classifies it into one of the user-determined categories. There are several important properties to be satisfied by the mapping function for visual understanding. First, the function should produce good representations of the visual world, which will be able to recognize images independently of pose, scale and illumination. Furthermore, the designed artificial vision system has to learn these representations by itself. Recent studies on Convolutional Neural Networks (ConvNets) produced promising advancements in visual understanding. These networks attain significant performance upgrades by relying on hierarchical structures inspired by biological vision systems. In my research, I work mainly in two areas: 1) how ConvNets can be programmed to learn the optimal mapping function using the minimum amount of labeled data, and 2) how these networks can be accelerated for practical purposes. In this work, algorithms that learn from unlabeled data are studied. A new framework that exploits unlabeled data is proposed. The proposed framework obtains state-of-the-art performance results in different tasks.
Furthermore, this study presents an optimized streaming method for a ConvNet hardware accelerator on an embedded platform. It is tested on object classification and detection applications using ConvNets. Experimental results indicate high computational efficiency and significant performance improvements over all other existing platforms.
Real-Time 6D Object Pose Estimation on CPU
We propose a fast and accurate 6D object pose estimation method from an RGB-D image.
Our proposed method is based on template matching and consists of three main
technical components: PCOF-MOD (multimodal PCOF), a balanced pose tree (BPT), and
optimum memory rearrangement for a coarse-to-fine search. Our model templates
on densely sampled viewpoints and PCOF-MOD, which explicitly handles a certain
range of 3D object poses, improve the robustness against background clutter. BPT,
which is an efficient tree-based data structure for a large number of
templates, and template matching on rearranged feature maps, where nearby
features are linearly aligned, accelerate the pose estimation. The experimental
evaluation on tabletop and bin-picking datasets showed that our method achieved
higher accuracy and faster speed in comparison with state-of-the-art techniques,
including recent CNN-based approaches. Moreover, our model templates can be
trained only from 3D CAD in a few minutes, and the pose estimation runs in near
real-time (23 fps) on CPU. These features make the method suitable for real
applications.
Comment: accepted to IROS 201