68 research outputs found
Hierarchical Cascade of Classifiers for Efficient Poselet Evaluation
Poselets have been used in a variety of computer vision tasks, such as detection, segmentation, action classification, pose estimation and action recognition, often achieving state-of-the-art performance. Poselet evaluation, however, is computationally intensive as it involves running thousands of scanning window classifiers. We present an algorithm for training a hierarchical cascade of part-based detectors and apply it to speed up poselet evaluation. Our cascade hierarchy leverages common components shared across poselets. We generate a family of cascade hierarchies, including trees that grow logarithmically in the number of poselet classifiers. Our algorithm, under some reasonable assumptions, finds the optimal tree structure that maximizes speed for a given target detection rate. We test our system on the PASCAL dataset and show an order of magnitude speedup at less than 1% loss in AP.
Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation
This paper proposes a new hybrid architecture that consists of a deep
Convolutional Network and a Markov Random Field. We show how this architecture
is successfully applied to the challenging problem of articulated human pose
estimation in monocular images. The architecture can exploit structural domain
constraints such as geometric relationships between body joint locations. We
show that joint training of these two model paradigms improves performance and
allows us to significantly outperform existing state-of-the-art techniques.
Object Part Localization Using Exemplar-based Models
Object part localization is a fundamental problem in computer vision, which aims to let machines understand an object in an image as a configuration of parts. As the visual features at parts are usually weak and misleading, spatial models are needed to constrain the part configuration, ensuring that the estimated part locations respect both the image cues and the shape prior. Unlike most state-of-the-art techniques, which employ parametric spatial models, we turn to non-parametric exemplars of part configurations. The benefit is twofold: instead of assuming parametric yet imprecise distributions on the spatial relations of parts, exemplars literally encode the relations present in the training samples; and exemplars allow us to prune the search space of part configurations with high confidence.
This thesis consists of two parts: fine-grained classification and object part localization. We first verify the efficacy of parts in fine-grained classification, where we build working systems that automatically identify dog breeds, fish species, and bird species using localized parts on the object. Then we explore multiple ways to enhance exemplar-based models, such that they can be applied to deformable objects such as birds and the human body. Specifically, we propose to enforce pose and subcategory consistency in exemplar matching, thus obtaining more reliable hypotheses of the configuration. We also propose a part-pair representation that features novel shape composition from multiple promising hypotheses. Finally, we adapt exemplars to a hierarchical representation, and design a principled formulation to predict the part configuration based on multi-scale image cues and multi-level exemplars. These efforts consistently improve the accuracy of object part localization.
Discriminative latent variable models for visual recognition
Visual recognition is a central problem in computer vision, and it has numerous potential applications in many different fields, such as robotics, human-computer interaction, and entertainment. In this dissertation, we propose two discriminative latent variable models for handling challenging visual recognition problems. In particular, we use latent variables to capture and model various prior knowledge in the training data. In the first model, we address the problem of recognizing human actions from still images. We jointly consider both poses and actions in a unified framework, and treat human poses as latent variables. The learning of this model follows the framework of latent SVM. Secondly, we propose another latent variable model to address the problem of automated tag learning on YouTube videos. In particular, we address the semantic variations (sub-tags) of the videos which have the same tag. In the model, each video is assumed to be associated with a sub-tag label, and we treat this sub-tag label as latent information. This model is trained using a latent learning framework based on LogitBoost, which jointly considers both the latent sub-tag label and the tag label. Moreover, we propose a novel discriminative latent learning framework, kernel latent SVM, which combines the benefit of latent SVM and kernel methods. The framework of kernel latent SVM is general enough to be applied in many applications of visual recognition. It is also able to handle complex latent variables with interdependent structures using composite kernels.
- …