292 research outputs found

    Efficient Diverse Ensemble for Discriminative Co-Tracking

    Ensemble discriminative tracking uses a committee of classifiers to label data samples, which are in turn used to retrain the tracker so that it localizes the target with the collective knowledge of the committee. Committee members may vary in their features, memory update schemes, or training data; nevertheless, it is inevitable that some members agree excessively because of large overlaps in their version spaces. To remove this redundancy and obtain effective ensemble learning, it is critical for the committee to include consistent hypotheses that differ from one another, covering the version space with minimal overlap. In this study, we propose an online ensemble tracker that directly generates a diverse committee by producing an efficient set of artificial training data. The artificial data is sampled from the empirical distribution of samples taken from both the target and the background, and the process is governed by query-by-committee to shrink the overlap between classifiers. Experimental results demonstrate that the proposed scheme outperforms conventional ensemble trackers on public benchmarks. Comment: CVPR 2018 Submission
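
    A minimal sketch of the query-by-committee step described above, assuming a committee of scikit-learn-style binary classifiers; the Gaussian perturbation used to synthesize artificial samples, the vote-entropy disagreement measure, and all function names are illustrative assumptions rather than the paper's exact procedure.

```python
# Sketch of query-by-committee selection of artificial training samples for a
# diverse tracking ensemble. Committee members are assumed to expose a
# scikit-learn-style predict() returning labels in {0, 1}; names are illustrative.
import numpy as np

def artificial_samples(observed, n_new, noise=0.05, seed=None):
    """Draw new samples near the empirical distribution of the observed features."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(observed), size=n_new)            # resample observed data
    return observed[idx] + noise * rng.standard_normal(observed[idx].shape)

def vote_entropy(committee, X):
    """Committee disagreement on each sample (higher = more informative)."""
    votes = np.stack([clf.predict(X) for clf in committee])     # shape (members, samples)
    p = np.clip(votes.mean(axis=0), 1e-9, 1 - 1e-9)             # fraction voting "target"
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def select_diverse_updates(committee, observed, n_new=200, k=20, seed=None):
    """Keep the artificial samples the committee disagrees on most; labelling them
    and retraining shrinks the overlap between the members' version spaces."""
    X_art = artificial_samples(observed, n_new, seed=seed)
    return X_art[np.argsort(vote_entropy(committee, X_art))[-k:]]
```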

    Exploring Human Vision Driven Features for Pedestrian Detection

    Motivated by the center-surround mechanism in the human visual attention system, we propose to use average contrast maps for the challenge of pedestrian detection in street scenes, based on the observation that pedestrians indeed exhibit discriminative contrast texture. Our main contributions are, first, the design of a local, statistical multi-channel descriptor that incorporates both color and gradient information. Second, we introduce a multi-direction and multi-scale contrast scheme based on grid cells in order to integrate expressive local variations. Addressing the issue of selecting the most discriminative features for assessment and classification, we perform extensive comparisons w.r.t. statistical descriptors, contrast measurements, and scale structures. In this way, we obtain reasonable results under various configurations. Empirical findings from applying our optimized detector on the INRIA and Caltech pedestrian datasets show that our features yield state-of-the-art performance in pedestrian detection. Comment: Accepted for publication in IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
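
    A rough sketch of a grid-cell, center-surround contrast map in the spirit of the description above, for a single grayscale channel; the cell size, the 4-connected neighbourhood, and the absolute-mean contrast measure are assumptions, not the paper's exact multi-channel, multi-direction, multi-scale design.

```python
# Sketch of a grid-cell contrast map for one grayscale channel; cell size and
# the 4-neighbour absolute-mean contrast are illustrative choices.
import numpy as np

def cell_means(img, cell=8):
    """Mean intensity of each non-overlapping cell x cell block."""
    h = img.shape[0] // cell * cell
    w = img.shape[1] // cell * cell
    blocks = img[:h, :w].reshape(h // cell, cell, w // cell, cell)
    return blocks.mean(axis=(1, 3))

def contrast_map(img, cell=8):
    """Average absolute contrast of each cell against its 4-connected neighbours."""
    m = cell_means(img.astype(np.float32), cell)
    pad = np.pad(m, 1, mode='edge')
    neighbours = np.stack([pad[:-2, 1:-1], pad[2:, 1:-1],       # up, down
                           pad[1:-1, :-2], pad[1:-1, 2:]])      # left, right
    return np.abs(m[None] - neighbours).mean(axis=0)            # one value per cell
```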

    Learning Pose Invariant and Covariant Classifiers from Image Sequences

    Object tracking and detection over a wide range of viewpoints is a long-standing problem in Computer Vision. Despite significant advances in wide-baseline sparse interest point matching and the development of robust dense feature models, it remains a largely open problem. Moreover, the abundance of low-cost mobile platforms and novel application areas, such as real-time Augmented Reality, constantly pushes the performance limits of existing methods, which must be modified and adapted to meet more stringent speed and capacity requirements. In this thesis, we aim to overcome the difficulties due to the multi-view nature of the object detection task. We significantly improve upon existing statistical keypoint matching algorithms to perform fast and robust recognition of image patches independently of object pose. We demonstrate this on various 2D and 3D datasets. Statistical keypoint matching approaches require massive amounts of training data covering a wide range of viewpoints. We have developed a weakly supervised algorithm that greatly simplifies their training for 3D objects. We also integrate this algorithm in a 3D tracking-by-detection system to perform real-time Augmented Reality. Finally, we extend the use of a large training set with smooth viewpoint variation to category-level object detection. We introduce a new dataset with continuous pose annotations, which we use to train pose estimators for objects of a single category. By using these estimators' output to select pose-specific classifiers, our framework can simultaneously localize objects in an image and recover their pose. These decoupled pose estimation and classification steps yield improved detection rates. Overall, we rely on image and video sequences to train classifiers that either operate independently of the object pose or recover the pose parameters explicitly. We show that in both cases our approaches mitigate the effects of viewpoint changes and improve recognition performance.
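
    A toy sketch of the decoupled "estimate the pose, then apply a pose-specific classifier" step mentioned at the end of the abstract; the pose estimator, the classifier bank, and the pose binning are hypothetical stand-ins for the thesis' actual components.

```python
# Toy sketch: pick a pose-specific classifier based on an estimated pose.
# 'pose_estimator' and 'classifiers' are hypothetical callables; 'bin_edges'
# splits the pose range (e.g. a viewpoint angle) into one bin per classifier.
import numpy as np

def detect_with_pose(window, pose_estimator, classifiers, bin_edges, thr=0.5):
    """Return (is_object, estimated_pose) for one image window."""
    pose = pose_estimator(window)                          # e.g. azimuth in degrees
    b = int(np.digitize(pose, bin_edges)) - 1              # index of the matching pose bin
    b = int(np.clip(b, 0, len(classifiers) - 1))
    score = classifiers[b](window)                         # pose-specific classifier score
    return score > thr, pose
```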

    Keypoint Recognition using Random Forests and Random Ferns

    In many 3-D object-detection and pose-estimation problems, run-time performance is of critical importance. However, there usually is time to train the system. We introduce an approach that takes advantage of this fact by formulating wide-baseline matching of keypoints, extracted from the input images, to those found in the model images as a classification problem. This shifts much of the computational burden to a training phase and eliminates the need for expensive patch preprocessing, without sacrificing recognition performance. This makes our approach highly suitable for real-time operation on low-powered devices. To this end, we developed two related methods. The first uses Random Forests that rely on simple binary tests on image intensities surrounding the keypoints. In the second, we flatten the trees to turn them into simple bit strings, which we refer to as Ferns, and combine their output in a Naive Bayesian manner. Surprisingly, the Ferns, while simpler, actually perform better than the trees. This is because the Naive Bayesian combination benefits more from the thousands of synthetic training examples we can generate than the output averaging usually performed by Random Forests. Furthermore, the more general partitioning that the trees allow does not appear to be of great use for our problem.
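
    A minimal sketch of classification with Ferns as described above: each Fern is a short sequence of pairwise intensity tests whose outcomes form a bit string, and Fern outputs are combined in a Naive Bayesian manner; the per-Fern log-probability tables are assumed to have been estimated beforehand from synthetic training views.

```python
# Minimal Fern classification: each Fern is a list of pixel-pair intensity tests;
# the test outcomes form a bit string (the Fern's index), and Fern outputs are
# combined by summing log-probabilities (Naive Bayes). The tables 'log_prob'
# are assumed to have been estimated from synthetic training views.
import numpy as np

def fern_index(patch, tests):
    """tests: list of ((y1, x1), (y2, x2)) pixel pairs; each comparison is one bit."""
    idx = 0
    for (y1, x1), (y2, x2) in tests:
        idx = (idx << 1) | int(patch[y1, x1] < patch[y2, x2])
    return idx

def classify_keypoint(patch, ferns, log_prob):
    """ferns[f] is a test list; log_prob[f][k, c] = log P(fern f outputs k | class c).
    Returns the most probable keypoint class under the Naive Bayesian combination."""
    scores = np.zeros(log_prob[0].shape[1])
    for f, tests in enumerate(ferns):
        scores += log_prob[f][fern_index(patch, tests)]    # sum of per-Fern log-likelihoods
    return int(np.argmax(scores))
```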

    Defining the Pose of any 3D Rigid Object and an Associated Distance

    The pose of a rigid object is usually regarded as a rigid transformation, described by a translation and a rotation. However, equating the pose space with the space of rigid transformations is, in general, an abuse of terminology, as it does not account for objects with proper symmetries, which are common among man-made objects. In this article, we define a pose as a distinguishable static state of an object and equate a pose with a set of rigid transformations. Based solely on geometric considerations, we propose a frame-invariant metric on the space of possible poses, valid for any physical rigid object and requiring no arbitrary tuning. This distance can be evaluated efficiently using a representation of poses within a Euclidean space of at most 12 dimensions, depending on the object's symmetries. This makes it possible to efficiently perform neighborhood queries such as radius searches or k-nearest-neighbor searches within a large set of poses using off-the-shelf methods. Pose averaging with respect to this metric can likewise be performed easily, using a projection function from the Euclidean space onto the pose space. The practical value of these theoretical developments is illustrated with an application to pose estimation of instances of a 3D rigid object given an input depth map, via a Mean Shift procedure.
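
    A sketch of a symmetry-aware distance between two poses, obtained by minimizing over the object's proper symmetry group as motivated above; this is the generic min-over-symmetries formulation with an explicit rotation/translation weight, not the paper's tuning-free metric or its 12-dimensional Euclidean embedding.

```python
# Symmetry-aware distance between two poses (R, t): take the minimum over the
# object's proper symmetry group. The weight 'w' balancing rotation against
# translation is an arbitrary choice here, unlike the paper's tuning-free metric.
import numpy as np

def pose_distance(R1, t1, R2, t2, symmetries, w=1.0):
    """R1, R2: 3x3 rotations; t1, t2: translations; symmetries: list of 3x3
    rotations leaving the object unchanged (include the identity)."""
    rot_err = min(np.linalg.norm(R1 - R2 @ S, 'fro') for S in symmetries)  # chordal distance
    return w * rot_err + np.linalg.norm(np.asarray(t1) - np.asarray(t2))

# Example symmetry group: 4-fold rotational symmetry about the z-axis.
symmetries_z4 = [np.array([[np.cos(a), -np.sin(a), 0.0],
                           [np.sin(a),  np.cos(a), 0.0],
                           [0.0,        0.0,       1.0]])
                 for a in np.linspace(0.0, 2.0 * np.pi, 4, endpoint=False)]
```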

    Ten Years of Pedestrian Detection, What Have We Learned?

    Paper-by-paper results make it easy to miss the forest for the trees. We analyse the remarkable progress of the last decade by discussing the main ideas explored in the 40+ detectors currently present in the Caltech pedestrian detection benchmark. We observe that there exist three families of approaches, all currently reaching similar detection quality. Based on our analysis, we study the complementarity of the most promising ideas by combining multiple published strategies. The resulting decision forest detector achieves the best known performance on the challenging Caltech-USA dataset. Comment: To appear in the ECCV 2014 CVRSUAD workshop proceedings

    Evidential combination of pedestrian detectors

    The importance of pedestrian detection in many applications has led to the development of many algorithms. In this paper, we address the problem of combining the outputs of several detectors. A pre-trained pedestrian detector is treated as a black box returning a set of bounding boxes with associated scores. A calibration step is first conducted to transform those scores into a probability measure. The bounding boxes are then grouped into clusters and their scores are combined. Different combination strategies using the theory of belief functions are proposed and compared to probabilistic ones. A combination rule based on triangular norms is used to deal with dependencies among detectors. More than 30 state-of-the-art detectors were combined and tested on the Caltech Pedestrian Detection Benchmark. The best combination strategy outperforms the currently best-performing detector by 9% in terms of log-average miss rate.
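
    A skeleton of the combination pipeline described above: bounding boxes from several detectors are clustered by overlap and their calibrated scores fused; the greedy IoU clustering and the simple noisy-OR fusion shown here are stand-ins for the paper's belief-function and triangular-norm rules.

```python
# Skeleton of the detector-combination pipeline: cluster bounding boxes from
# several detectors by overlap, then fuse their calibrated scores. Noisy-OR
# fusion is an illustrative stand-in for the paper's belief-function rules.
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def combine_detections(detections, iou_thr=0.5):
    """detections: list of (box, calibrated_probability) pooled from all detectors.
    Greedily clusters overlapping boxes and fuses cluster scores with noisy-OR."""
    detections = sorted(detections, key=lambda d: d[1], reverse=True)
    fused = []
    while detections:
        box, p = detections.pop(0)                                   # highest-scoring seed
        cluster = [(box, p)] + [d for d in detections if iou(box, d[0]) > iou_thr]
        detections = [d for d in detections if iou(box, d[0]) <= iou_thr]
        p_fused = 1.0 - np.prod([1.0 - q for _, q in cluster])       # noisy-OR fusion
        fused.append((box, p_fused))
    return fused
```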