
    Comparison of different integral histogram based tracking algorithms

    Object tracking is an important subject in computer vision with a wide range of applications: security and surveillance, motion-based recognition, driver assistance systems, and human-computer interaction. The proliferation of high-powered computers, the availability of high-quality and inexpensive video cameras, and the increasing need for automated video analysis have generated a great deal of interest in object tracking algorithms. Tracking is usually performed in the context of high-level applications that require the location and/or shape of the object in every frame. Object tracking algorithms have been under development for decades, and a number of approaches have been proposed. These approaches differ from each other in object representation, feature selection, and the modeling of the object's shape and appearance. Histogram-based tracking has proved to be an efficient approach in many applications. The integral histogram is a method that allows the histograms of multiple rectangular regions of an image to be extracted very efficiently. In recent years, a number of tracking algorithms have built on this technique, each attempting to use the integral histogram more effectively. In this paper, algorithms that use the integral histogram as part of their tracking function are evaluated by comparing their tracking results, and some of the algorithms are modified for better performance. The sequences used in the tracking experiments are grayscale and exhibit significant shape and appearance variations, making them suitable for evaluating the performance of the algorithms. Extensive experimental results on these challenging sequences are presented, demonstrating the tracking abilities of these algorithms.
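
    Although the abstract includes no code, the integral histogram idea it builds on is easy to sketch. The following minimal NumPy sketch (the function names and the 16-bin quantization are illustrative assumptions, not taken from the paper) shows how cumulative per-bin counts let the histogram of any rectangular region be recovered with at most four lookups per bin:

        import numpy as np

        def integral_histogram(image, n_bins=16):
            # ih[y, x, b] = count of pixels falling in bin b inside the
            # rectangle image[0:y+1, 0:x+1]; built in one pass via cumsum.
            h, w = image.shape
            bins = (image.astype(np.int64) * n_bins) // 256
            onehot = np.zeros((h, w, n_bins), dtype=np.int64)
            onehot[np.arange(h)[:, None], np.arange(w)[None, :], bins] = 1
            return onehot.cumsum(axis=0).cumsum(axis=1)

        def region_histogram(ih, top, left, bottom, right):
            # Histogram of image[top:bottom, left:right] by inclusion-exclusion,
            # at most four lookups per bin regardless of region size.
            hist = ih[bottom - 1, right - 1].copy()
            if top > 0:
                hist -= ih[top - 1, right - 1]
            if left > 0:
                hist -= ih[bottom - 1, left - 1]
            if top > 0 and left > 0:
                hist += ih[top - 1, left - 1]
            return hist

    Because region_histogram costs the same for any rectangle, a histogram-based tracker can score many candidate windows per frame at constant cost each.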

    ROAM: a Rich Object Appearance Model with Application to Rotoscoping

    Rotoscoping, the detailed delineation of scene elements through a video shot, is a painstaking task of tremendous importance in professional post-production pipelines. While pixel-wise segmentation techniques can help with this task, professional rotoscoping tools rely on parametric curves that offer artists much better interactive control over the definition, editing, and manipulation of the segments of interest. Sticking to this prevalent rotoscoping paradigm, we propose a novel framework to capture and track the visual aspect of an arbitrary object in a scene, given an initial closed outline of this object. This model combines a collection of local foreground/background appearance models spread along the outline, a global appearance model of the enclosed object, and a set of distinctive foreground landmarks. The structure of this rich appearance model allows simple initialization, efficient iterative optimization with exact minimization at each step, and on-line adaptation in videos. We demonstrate qualitatively and quantitatively the merit of this framework through comparisons with tools based on either dynamic segmentation with a closed curve or pixel-wise binary labelling.
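
    As a rough illustration of the "local foreground/background appearance models spread along the outline" idea, here is a hypothetical sketch (all names, the patch sampling, and the Bhattacharyya scoring are assumptions; the paper's actual energy and its exact minimization are not reproduced here):

        import numpy as np

        def local_score(patch_in, patch_out, fg_model, bg_model, n_bins=16):
            # Bhattacharyya similarity between the observed histograms just
            # inside/outside the outline and the local fg/bg model histograms.
            def hist(p):
                h, _ = np.histogram(p, bins=n_bins, range=(0, 256))
                return h / max(h.sum(), 1)
            return (np.sum(np.sqrt(hist(patch_in) * fg_model)) +
                    np.sum(np.sqrt(hist(patch_out) * bg_model)))

        def outline_score(inside, outside, fg_models, bg_models):
            # Sum the local terms along the outline; per the abstract, the
            # full model also adds a global appearance term and landmarks.
            return sum(local_score(pi, po, f, b)
                       for pi, po, f, b in zip(inside, outside,
                                               fg_models, bg_models))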

    Learning from minimally labeled data with accelerated convolutional neural networks

    The main objective of an artificial vision algorithm is to design a mapping function that takes an image as input and correctly classifies it into one of the user-determined categories. There are several important properties the mapping function must satisfy for visual understanding. First, the function should produce good representations of the visual world, so that images can be recognized independently of pose, scale, and illumination. Furthermore, the designed artificial vision system has to learn these representations by itself. Recent studies on Convolutional Neural Networks (ConvNets) have produced promising advances in visual understanding. These networks attain significant performance gains by relying on hierarchical structures inspired by biological vision systems. In my research, I work mainly in two areas: 1) how ConvNets can be programmed to learn the optimal mapping function using the minimum amount of labeled data, and 2) how these networks can be accelerated for practical purposes. In this work, algorithms that learn from unlabeled data are studied, and a new framework that exploits unlabeled data is proposed. The proposed framework obtains state-of-the-art results on different tasks. Furthermore, this study presents an optimized streaming method for a ConvNet hardware accelerator on an embedded platform, tested on object classification and detection applications. Experimental results indicate high computational efficiency and significant performance gains over existing platforms.
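
    The abstract does not spell out how the framework exploits unlabeled data; one common pattern in this area is confidence-thresholded pseudo-labeling, sketched below in PyTorch purely as an illustration (the function name, the 0.95 threshold, and the single-model setup are assumptions, not the thesis's method):

        import torch
        import torch.nn.functional as F

        def semi_supervised_step(model, optimizer, x_l, y_l, x_u,
                                 threshold=0.95):
            # Supervised loss on the small labeled batch.
            loss = F.cross_entropy(model(x_l), y_l)
            # Pseudo-label the unlabeled batch, keeping only confident guesses.
            with torch.no_grad():
                conf, y_hat = F.softmax(model(x_u), dim=1).max(dim=1)
                mask = conf >= threshold
            if mask.any():
                loss = loss + F.cross_entropy(model(x_u[mask]), y_hat[mask])
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            return loss.item()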

    Real-Time 6D Object Pose Estimation on CPU

    We propose a fast and accurate method for 6D object pose estimation from an RGB-D image. Our method is based on template matching and consists of three main technical components: PCOF-MOD (multimodal PCOF), a balanced pose tree (BPT), and an optimum memory rearrangement for coarse-to-fine search. Model templates built on densely sampled viewpoints, together with PCOF-MOD, which explicitly handles a certain range of 3D object poses, improve robustness against background clutter. BPT, an efficient tree-based data structure for a large number of templates, and template matching on rearranged feature maps, in which nearby features are linearly aligned, accelerate the pose estimation. Experimental evaluation on tabletop and bin-picking datasets showed that our method achieves higher accuracy and faster speed than state-of-the-art techniques, including recent CNN-based approaches. Moreover, our model templates can be trained from 3D CAD data alone in a few minutes, and the pose estimation runs in near real time (23 fps) on a CPU. These properties make the method suitable for real applications.
    Comment: accepted to IROS 201
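
    The coarse-to-fine search over templates can be illustrated with a generic sketch (this is not the paper's PCOF-MOD matching, BPT, or memory rearrangement; the dot-product score and the factor-2 pyramid are stand-in assumptions):

        import numpy as np

        def score(feat, template, y, x):
            # Stand-in similarity: dot product between the template and the
            # feature-map window at (y, x).
            th, tw = template.shape
            return float(np.sum(feat[y:y + th, x:x + tw] * template))

        def coarse_to_fine(pyramid, templates, top_k=8):
            # pyramid/templates are lists from finest [0] to coarsest [-1],
            # with a factor-2 scale change between consecutive levels.
            feat, t = pyramid[-1], templates[-1]
            cands = sorted(((score(feat, t, y, x), y, x)
                            for y in range(feat.shape[0] - t.shape[0] + 1)
                            for x in range(feat.shape[1] - t.shape[1] + 1)),
                           reverse=True)[:top_k]
            for level in range(len(pyramid) - 2, -1, -1):
                feat, t = pyramid[level], templates[level]
                refined = []
                for _, y, x in cands:
                    # Re-score a small neighborhood around each surviving
                    # candidate at the next, finer pyramid level.
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            yy, xx = 2 * y + dy, 2 * x + dx
                            if (0 <= yy <= feat.shape[0] - t.shape[0] and
                                    0 <= xx <= feat.shape[1] - t.shape[1]):
                                refined.append((score(feat, t, yy, xx), yy, xx))
                cands = sorted(refined, reverse=True)[:top_k]
            return cands[0]  # (score, y, x) at the finest level

    Only the coarsest level is scanned exhaustively; each finer level touches a small neighborhood around a handful of survivors, which is what makes such searches fast enough for real-time use.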