725 research outputs found
Semantic Part Segmentation using Compositional Model combining Shape and Appearance
In this paper, we study the problem of semantic part segmentation for
animals. This is more challenging than standard object detection, object
segmentation and pose estimation tasks because semantic parts of animals often
have similar appearance and highly varying shapes. To tackle these challenges,
we build a mixture of compositional models to represent the object boundary and
the boundaries of semantic parts. And we incorporate edge, appearance, and
semantic part cues into the compositional model. Given part-level segmentation
annotation, we develop a novel algorithm to learn a mixture of compositional
models under various poses and viewpoints for certain animal classes.
Furthermore, a linear complexity algorithm is offered for efficient inference
of the compositional model using dynamic programming. We evaluate our method
for horse and cow using a newly annotated dataset on Pascal VOC 2010 which has
pixelwise part labels. Experimental results demonstrate the effectiveness of
our method
Genetic CNN
The deep Convolutional Neural Network (CNN) is the state-of-the-art solution
for large-scale visual recognition. Following basic principles such as
increasing the depth and constructing highway connections, researchers have
manually designed a lot of fixed network structures and verified their
effectiveness.
In this paper, we discuss the possibility of learning deep network structures
automatically. Note that the number of possible network structures increases
exponentially with the number of layers in the network, which inspires us to
adopt the genetic algorithm to efficiently traverse this large search space. We
first propose an encoding method to represent each network structure in a
fixed-length binary string, and initialize the genetic algorithm by generating
a set of randomized individuals. In each generation, we define standard genetic
operations, e.g., selection, mutation and crossover, to eliminate weak
individuals and then generate more competitive ones. The competitiveness of
each individual is defined as its recognition accuracy, which is obtained via
training the network from scratch and evaluating it on a validation set. We run
the genetic process on two small datasets, i.e., MNIST and CIFAR10,
demonstrating its ability to evolve and find high-quality structures which are
little studied before. These structures are also transferrable to the
large-scale ILSVRC2012 dataset.Comment: Submitted to CVPR 2017 (10 pages, 5 figures
Parsing Occluded People by Flexible Compositions
This paper presents an approach to parsing humans when there is significant
occlusion. We model humans using a graphical model which has a tree structure
building on recent work [32, 6] and exploit the connectivity prior that, even
in presence of occlusion, the visible nodes form a connected subtree of the
graphical model. We call each connected subtree a flexible composition of
object parts. This involves a novel method for learning occlusion cues. During
inference we need to search over a mixture of different flexible models. By
exploiting part sharing, we show that this inference can be done extremely
efficiently requiring only twice as many computations as searching for the
entire object (i.e., not modeling occlusion). We evaluate our model on the
standard benchmarked "We Are Family" Stickmen dataset and obtain significant
performance improvements over the best alternative algorithms.Comment: CVPR 15 Camera Read
Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations
We present a method for estimating articulated human pose from a single
static image based on a graphical model with novel pairwise relations that make
adaptive use of local image measurements. More precisely, we specify a
graphical model for human pose which exploits the fact the local image
measurements can be used both to detect parts (or joints) and also to predict
the spatial relationships between them (Image Dependent Pairwise Relations).
These spatial relationships are represented by a mixture model. We use Deep
Convolutional Neural Networks (DCNNs) to learn conditional probabilities for
the presence of parts and their spatial relationships within image patches.
Hence our model combines the representational flexibility of graphical models
with the efficiency and statistical power of DCNNs. Our method significantly
outperforms the state of the art methods on the LSP and FLIC datasets and also
performs very well on the Buffy dataset without any training.Comment: NIPS 2014 Camera Read
1 A Hierarchical Compositional System for Rapid Object Detection
We describe a hierarchical compositional system for detecting deformable objects in images. Objects are represented by graphical models. The algorithm uses a hierarchical tree where the root of the tree corresponds to the full object and lower-level elements of the tree correspond to simpler features. The algorithm proceeds by passing simple messages up and down the tree. The method works rapidly, in under a second, on 320 × 240 images. We demonstrate the approach on detecting cats, horses, and hands. The method works in the presence of background clutter and occlusions. Our approach is contrasted with more traditional methods such as dynamic programming and belief propagation.
- …