Boosted Random ferns for object detection
In this paper we introduce Boosted Random Ferns (BRFs) to rapidly build discriminative classifiers for learning and detecting object categories. At the core of our approach we use standard random ferns, but we introduce four main innovations that let us bring ferns from the instance level to the category level while retaining efficiency. First, we define binary features in the histogram-of-oriented-gradients domain (as opposed to the intensity domain), allowing for a better representation of intra-class variability. Second, neither the positions where ferns are evaluated within the sliding window nor the locations of the binary features for each fern are chosen completely at random; instead, we use a boosting strategy to pick the most discriminative combination of them. This is further enhanced by our third contribution, which adapts the boosting strategy to enable sharing of binary features among different ferns, yielding high recognition rates at a low computational cost. Finally, we show that training can be performed online, for sequentially arriving images. Overall, the resulting classifier can be trained very efficiently, can be densely evaluated at all image locations in about 0.1 seconds, and provides detection rates similar to those of competing approaches that are significantly slower and more expensive to run. We demonstrate the effectiveness of our approach through thorough experimentation on publicly available datasets, in which we compare against the state of the art on both 2D detection and 3D multi-view estimation tasks.
Peer reviewed. Postprint (author's final draft).
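To make the fern idea in the abstract above concrete, here is a minimal Python sketch of a single plain random fern: a few binary comparisons between feature values are concatenated into an integer that indexes per-class counts, and those counts can be updated one sample at a time. The names `fern_index` and `Fern`, the Laplace smoothing, and the log-ratio score are illustrative assumptions; the boosted selection and feature sharing that define BRFs are not reproduced here.

```python
import math


def fern_index(features, pairs):
    """Concatenate binary comparisons of feature pairs into one integer."""
    index = 0
    for a, b in pairs:
        bit = 1 if features[a] > features[b] else 0
        index = (index << 1) | bit
    return index


class Fern:
    """One fern with per-output counts for a positive and a negative class."""

    def __init__(self, pairs):
        self.pairs = pairs
        size = 2 ** len(pairs)
        # Start every count at 1 (Laplace smoothing, an assumption of this sketch).
        self.pos = [1.0] * size
        self.neg = [1.0] * size

    def update(self, features, is_positive):
        """Online update: add one labelled sample to the counts."""
        idx = fern_index(features, self.pairs)
        if is_positive:
            self.pos[idx] += 1.0
        else:
            self.neg[idx] += 1.0

    def score(self, features):
        """Log-ratio of the class-conditional frequencies for this sample."""
        idx = fern_index(features, self.pairs)
        p = self.pos[idx] / sum(self.pos)
        n = self.neg[idx] / sum(self.neg)
        return math.log(p / n)


# Hypothetical 4-dimensional feature vectors (e.g. gradient-histogram bins).
fern = Fern(pairs=[(0, 1), (2, 3)])
for _ in range(3):
    fern.update([5, 1, 0, 2], is_positive=True)
    fern.update([1, 5, 2, 0], is_positive=False)
```

After these updates, `fern.score([5, 1, 0, 2])` is positive and `fern.score([1, 5, 2, 0])` is negative. A detector along the lines described in the abstract would combine many such ferns, with the comparison pairs chosen by boosting rather than fixed by hand as they are here.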
Determining the Number of Batik Motif Object based on Hierarchical Symmetry Detection Approach
Under certain conditions, symmetry can be used to describe objects in a batik motif efficiently. Symmetry can be defined based on three linear transformations of dimension n in Euclidean space in the form of translation and rotation. This concept is useful for detecting objects and recognising batik motifs. In this study, we examine how symmetry can be used to determine the number of batik motif objects in an image, using a symmetry algorithm with a hierarchical approach. The process focuses on determining the intersection line of the batik motif object. Using the intersection-line information for bilateral and rotational symmetry, the number of objects is then determined recursively. The result is the number of batik motif objects found through symmetry detection, which will be used as a reference for batik motif detection. In the experiments, some errors arise when the detected symmetry axis is inappropriate because of the characteristics of batik motifs. The problem is addressed by adding several rules for detecting the symmetry line and determining the number of objects. The additional rules increase the average accuracy of the object count from 66.21% to 86.19% (a reported increase of 19.99 percentage points)
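As a rough illustration of the two symmetry types named in the abstract above, the following Python sketch scores a small binary grid for bilateral symmetry (mirroring across the vertical centre line) and for rotational symmetry (a 180-degree turn). The function names and the cell-matching score are assumptions made for this example; the paper's hierarchical algorithm and its additional rules are not described in enough detail here to reproduce.

```python
def bilateral_score(grid):
    """Fraction of cells equal to their mirror across the vertical centre line."""
    total = 0
    matches = 0
    for row in grid:
        width = len(row)
        for x, value in enumerate(row):
            total += 1
            if value == row[width - 1 - x]:
                matches += 1
    return matches / total


def rotational_score(grid):
    """Fraction of cells equal to their image under a 180-degree rotation."""
    height = len(grid)
    width = len(grid[0])
    matches = 0
    for y in range(height):
        for x in range(width):
            if grid[y][x] == grid[height - 1 - y][width - 1 - x]:
                matches += 1
    return matches / (height * width)


# Two hypothetical binary motifs: the first is mirror-symmetric,
# the second is symmetric only under a half turn.
mirror_motif = [[1, 0, 1],
                [0, 1, 0]]
turn_motif = [[1, 0],
              [0, 1]]
```

Here `bilateral_score(mirror_motif)` is 1.0 while `rotational_score(mirror_motif)` is 0.0, and the reverse holds for `turn_motif`. A recursive counter in the spirit of the abstract could split an image along a high-scoring axis and repeat the test on each part, but that step is left out of this sketch.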
Characterizing Objects in Images using Human Context
Humans have an unmatched ability to interpret detailed information about objects just by looking at an image. In particular, they can effortlessly perform the following tasks: 1) localizing the various objects in the image and 2) assigning functionalities to the parts of the localized objects. This dissertation addresses the problem of helping vision systems accomplish these two goals. The first part of the dissertation concerns object detection in a Hough-based framework. Here, the independence assumption between features is addressed by grouping them in a local neighborhood. We study the complementary nature of individual and grouped features and combine them to achieve improved performance. Further, we consider the challenging case of detecting small and medium-sized household objects under human-object interactions. We first evaluate appearance-based star and tree models. While the tree model is slightly better, appearance-based methods continue to suffer from deficiencies caused by human interactions. We therefore incorporate automatically extracted human pose as a form of context for object detection. The second part of the dissertation addresses the tedious process of manually annotating objects to train fully supervised detectors. We observe that videos of human-object interactions with activity labels can serve as weakly annotated examples of household objects. Since such objects cannot be localized through appearance or motion alone, we propose a framework that includes human-centric functionality to retrieve the common object. Designed to maximize data utility by detecting multiple instances of an object per video, the framework achieves performance comparable to its fully supervised counterpart. The final part of the dissertation concerns localizing functional regions, or affordances, within objects by casting the problem as one of semantic image segmentation. For this purpose, we introduce a dataset involving human-object interactions with strong (pixel-level) and weak (click-point and image-level) affordance annotations. We propose a framework that utilizes both forms of weak labels and demonstrate that the effort required for weak annotation can be further optimized using human context
A Robust Object Detection Method Using a Voting System in Multiple Feature Spaces
Degree type: Doctorate by coursework. University of Tokyo (東京大学)
Context-driven Object Detection and Segmentation with Auxiliary Information
One fundamental problem in computer vision and robotics is to
localize objects of interest in an image. The task can either be
formulated as an object detection problem if the objects are
described by a set of pose parameters, or as an object segmentation
problem if the object boundary is recovered precisely. A key issue in
object detection and segmentation concerns exploiting the spatial
context, as local evidence is often insufficient to determine
object pose in the presence of heavy occlusions or large object
appearance variations. This thesis addresses the object detection
and segmentation problem in such adverse conditions with
auxiliary depth data provided by RGBD cameras. We focus on four
main issues in context-aware object detection and segmentation:
1) what are effective context representations? 2) how can we
work with limited and imperfect depth data? 3) how can we design
depth-aware features and integrate depth cues into conventional
visual inference tasks? 4) how can we make use of unlabeled data
to relax the labeling requirements for training data?
We discuss three object detection and segmentation scenarios
based on varying amounts of available auxiliary information. In
the first case, depth data are available for model training but
not available for testing. We propose a structured Hough voting
method for detecting objects with heavy occlusion in indoor
environments, in which we extend the Hough hypothesis space to
include both the object's location, and its visibility pattern.
We design a new score function that accumulates votes for object
detection and occlusion prediction. In addition, we explore the
correlation between objects and their environment, building a
depth-encoded object-context model based on RGBD data. In the
second case, we address the problem of localizing glass objects
with noisy and incomplete depth data. Our method integrates the
intensity and depth information from a single viewpoint, and
builds a Markov Random Field that jointly predicts the glass
boundary and region. In addition, we propose a nonparametric,
data-driven label transfer scheme for local glass boundary
estimation. A weighted voting scheme based on a joint feature
manifold is adopted to integrate depth and appearance cues, and
we learn a distance metric on the depth-encoded feature manifold.
In the third case, we make use of unlabeled data to relax the
annotation requirements for object detection and segmentation,
and propose a novel data-dependent margin distribution learning
criterion for boosting, which utilizes the intrinsic geometric
structure of datasets. One key aspect of this method is that it
can seamlessly incorporate unlabeled data by including a graph
Laplacian regularizer. We demonstrate the performance of our
models and compare with baseline methods on several real-world
object detection and segmentation tasks, including indoor object
detection, glass object segmentation and foreground segmentation
in video.
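The graph Laplacian regularizer mentioned in the abstract above can be illustrated with the standard smoothness penalty f^T L f = 1/2 * sum_ij w_ij (f_i - f_j)^2, which is small when connected samples receive similar scores regardless of whether they are labeled. The Python sketch below computes that quantity for a tiny graph; the function name and the toy weights are assumptions, and the thesis's actual margin distribution criterion for boosting is not reproduced.

```python
def laplacian_penalty(weights, scores):
    """Return f^T L f = 0.5 * sum_ij w_ij * (f_i - f_j)^2 for symmetric weights."""
    n = len(scores)
    total = 0.0
    for i in range(n):
        for j in range(n):
            diff = scores[i] - scores[j]
            total += weights[i][j] * diff * diff
    return 0.5 * total


# Hypothetical chain graph over three samples: 0 - 1 - 2.
chain = [[0.0, 1.0, 0.0],
         [1.0, 0.0, 1.0],
         [0.0, 1.0, 0.0]]

smooth = laplacian_penalty(chain, [1.0, 1.0, 1.0])   # neighbours agree
rough = laplacian_penalty(chain, [1.0, 1.0, -1.0])   # scores flip across one edge
```

In this example `smooth` is 0.0 and `rough` is 4.0. Adding such a term to a training objective penalizes classifiers whose outputs change sharply between neighbouring samples, which is how unlabeled data can influence learning in this family of methods.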
Voting by Grouping Dependent Parts
Hough voting methods efficiently handle the high complexity of multiscale, category-level object detection in cluttered scenes. The primary weakness of this approach, however, is that mutually dependent local observations vote independently for intrinsically global object properties such as object scale. All the votes are added up to obtain object hypotheses, so the underlying assumption is that object hypotheses are a sum of independent part votes. Popular representation schemes, however, are based on an overlapping sampling of semi-local image features with large spatial support (e.g. SIFT or geometric blur). Features are thus mutually dependent, and we incorporate these dependences into probabilistic Hough voting by presenting an objective function that combines three intimately related problems: i) grouping mutually dependent parts, ii) solving the correspondence problem jointly for dependent parts, and iii) finding concerted object hypotheses using extended groups rather than local observations alone. Experiments demonstrate that state-of-the-art Hough voting and even sliding-window detection are significantly improved by utilizing part dependences and jointly optimizing groups, correspondences, and votes.
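For reference, the baseline that the abstract above criticizes can be sketched in a few lines of Python: every local feature votes independently for an object centre, and the votes are simply summed in an accumulator. The function name, the codebook layout (visual word mapped to offset and weight), and the bin size are assumptions for illustration; the grouping and joint optimization proposed in the paper are not implemented.

```python
from collections import Counter


def hough_votes(features, offsets, bin_size=4):
    """Accumulate independent centre votes cast by local features.

    features: list of (x, y, word) tuples.
    offsets:  dict mapping a word to a list of (dx, dy, weight) votes.
    """
    accumulator = Counter()
    for x, y, word in features:
        for dx, dy, weight in offsets.get(word, []):
            cx, cy = x + dx, y + dy
            accumulator[(cx // bin_size, cy // bin_size)] += weight
    return accumulator


# Hypothetical codebook: a wheel may lie left or right of the centre,
# a roof lies above it.
codebook = {
    "wheel": [(10, -4, 0.5), (-10, -4, 0.5)],
    "roof": [(0, 4, 1.0)],
}
observed = [(10, 10, "wheel"), (30, 10, "wheel"), (20, 2, "roof")]

votes = hough_votes(observed, codebook)
best_bin, best_score = votes.most_common(1)[0]
```

With these inputs the strongest hypothesis is bin `(5, 1)` with a score of 2.0, where one vote from each wheel and the roof vote coincide. Each feature contributes without regard to the others, which is the independence assumption the paper replaces by voting with groups of dependent parts.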