Image Segmentation Using Weak Shape Priors
The problem of image segmentation is known to become particularly challenging
in the case of partial occlusion of the object(s) of interest, background
clutter, and the presence of strong noise. To overcome this problem, the
present paper introduces a novel approach to segmentation through the use of
"weak" shape priors. Specifically, in the proposed method, a segmenting active
contour is constrained to converge to a configuration at which the empirical
probability densities of its geometric parameters closely match the
corresponding model densities learned from training samples. It is
shown through numerical experiments that the proposed shape modeling can be
regarded as "weak" in the sense that it minimally influences the segmentation,
which is allowed to be dominated by data-related forces. On the other hand, the
priors provide sufficient constraints to regularize the convergence of
segmentation, while requiring substantially smaller training sets than
PCA-based regularization methods to yield less biased results. The
main advantages of the proposed technique over some existing alternatives are
demonstrated in a series of experiments. Comment: 27 pages, 8 figures
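The abstract does not say how the match between empirical and model densities is measured. As a hedged illustration only, the sketch below assumes that a geometric parameter is sampled along the contour (for example, local curvature), that both densities are normalized histograms on shared bins, and that the mismatch is the Kullback-Leibler divergence. `histogram`, `kl_divergence`, and `weak_prior_energy` are hypothetical names, not the paper's code.

```python
import math
from typing import List, Sequence


def histogram(samples: Sequence[float], lo: float, hi: float, bins: int) -> List[float]:
    """Normalized histogram over [lo, hi]; a small floor avoids log(0) later."""
    counts = [0.0] * bins
    width = (hi - lo) / bins
    for s in samples:
        # Clamp out-of-range samples into the first or last bin.
        idx = min(bins - 1, max(0, int((s - lo) / width)))
        counts[idx] += 1.0
    eps = 1e-6
    total = sum(counts) + eps * bins
    return [(c + eps) / total for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete densities defined on the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def weak_prior_energy(contour_samples: Sequence[float], model_hist: Sequence[float],
                      lo: float, hi: float, weight: float = 0.1) -> float:
    """Hypothetical prior term; a small weight lets data-related forces dominate."""
    empirical = histogram(contour_samples, lo, hi, len(model_hist))
    return weight * kl_divergence(empirical, model_hist)
```

Because this term depends only on the distribution of the parameter values, not on where along the contour each value occurs, it constrains the shape more loosely than a template or PCA model would; this is one plausible reading of "weak", not a claim about the paper's exact energy.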
Learning Human Pose Estimation Features with Convolutional Networks
This paper introduces a new architecture for human pose estimation using a
multi-layer convolutional network architecture and a modified learning
technique that learns low-level features and higher-level weak spatial models.
Unconstrained human pose estimation is one of the hardest problems in computer
vision, and our new architecture and learning schema show significant
improvement over the current state-of-the-art results. The main contribution of
this paper is showing, for the first time, that a specific variation of deep
learning is able to outperform all existing traditional architectures on this
task. The paper also discusses several lessons learned while researching
alternatives, most notably, that it is possible to learn strong low-level
feature detectors on features that might even just cover a few pixels in the
image. Higher-level spatial models improve the overall result somewhat, but to
a much lesser extent than expected. Many researchers previously argued that
kinematic structure and top-down information are crucial for this domain, but
with our purely bottom-up, weak spatial model we could improve on other, more
complicated architectures that currently produce the best results. This mirrors
what many other researchers, such as those in speech recognition, object
recognition, and other domains, have experienced.
Iterative graph cuts for image segmentation with a nonlinear statistical shape prior
Shape-based regularization has proven to be a useful method for delineating
objects within noisy images where one has prior knowledge of the shape of the
targeted object. When a collection of possible shapes is available, the
specification of a shape prior using kernel density estimation is a natural
technique. Unfortunately, energy functionals arising from kernel density
estimation are of a form that makes them impossible to directly minimize using
efficient optimization algorithms such as graph cuts. Our main contribution is
to show how one may recast the energy functional into a form that is
minimizable iteratively and efficiently using graph cuts. Comment: Revision submitted to JMIV (02/24/13)
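To make the kernel-density shape prior concrete, the sketch below assumes shapes are flattened binary masks, uses the squared L2 distance between masks (the number of disagreeing pixels), and drops the kernel's normalizing constant. It then freezes the kernel weights at the current estimate, a majorize-minimize (Jensen bound) linearization that turns the energy into per-pixel unary costs a graph cut can handle. This is one standard way to obtain graph-cut-compatible terms; the abstract does not say whether it is the paper's recasting, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Mask = Sequence[int]  # flattened binary mask: 1 = object, 0 = background


def sq_distance(a: Mask, b: Mask) -> int:
    """Squared L2 distance between binary masks (= number of disagreeing pixels)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kernel_weights(u: Mask, shapes: Sequence[Mask], sigma: float) -> List[float]:
    """Normalized Gaussian-kernel weight of each training shape at estimate u."""
    logs = [-sq_distance(u, s) / (2.0 * sigma ** 2) for s in shapes]
    m = max(logs)  # subtract the max before exp for numerical stability
    ws = [math.exp(l - m) for l in logs]
    z = sum(ws)
    return [w / z for w in ws]


def kde_shape_energy(u: Mask, shapes: Sequence[Mask], sigma: float) -> float:
    """-log of a Gaussian KDE over training shapes (kernel normalizer dropped)."""
    logs = [-sq_distance(u, s) / (2.0 * sigma ** 2) for s in shapes]
    m = max(logs)
    return -(m + math.log(sum(math.exp(l - m) for l in logs)) - math.log(len(shapes)))


def linearized_unary_costs(u: Mask, shapes: Sequence[Mask],
                           sigma: float) -> List[Tuple[float, float]]:
    """Per-pixel (cost of label 0, cost of label 1) from a Jensen upper bound.

    Freezing the kernel weights at the current estimate u turns the KDE energy
    into a weighted sum of pixel disagreements, i.e. unary terms; re-solving and
    recomputing the weights gives an iterative scheme.
    """
    alphas = kernel_weights(u, shapes, sigma)
    scale = 1.0 / (2.0 * sigma ** 2)
    costs = []
    for p in range(len(u)):
        on = sum(a * s[p] for a, s in zip(alphas, shapes))  # weighted vote for "object"
        costs.append((scale * on, scale * (1.0 - on)))
    return costs
```

The bound is tight at the current estimate, so each graph-cut step on these unary costs (plus any data and smoothness terms) cannot increase the shape energy.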
Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation
This paper proposes a new hybrid architecture that consists of a deep
Convolutional Network and a Markov Random Field. We show how this architecture
is successfully applied to the challenging problem of articulated human pose
estimation in monocular images. The architecture can exploit structural domain
constraints such as geometric relationships between body joint locations. We
show that joint training of these two model paradigms improves performance and
allows us to significantly outperform existing state-of-the-art techniques.
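The abstract does not give the form of the Markov Random Field. A minimal sketch, assuming just two joints, discretized detector heatmaps as unary terms, and a hypothetical table of allowed displacements as the pairwise term, shows how a spatial model of joint relationships can override a strong but implausible detector peak. Inference here is brute force and nothing is trained, so this is not the paper's jointly trained ConvNet and MRF.

```python
import math
from typing import Dict, List, Optional, Tuple

Grid = List[List[float]]    # heatmap[y][x]: detector confidence, must be > 0
Offset = Tuple[int, int]    # (dy, dx) from joint A to joint B


def best_joint_pair(unary_a: Grid, unary_b: Grid,
                    pairwise: Dict[Offset, float]) -> Tuple[Offset, Offset, float]:
    """Brute-force MAP for a two-node MRF.

    Maximizes log u_a(pa) + log u_b(pb) + log pairwise(pb - pa); offsets missing
    from `pairwise` are treated as impossible (probability zero).
    """
    h, w = len(unary_a), len(unary_a[0])
    best_a: Optional[Offset] = None
    best_b: Optional[Offset] = None
    best_score = -math.inf
    for ya in range(h):
        for xa in range(w):
            for (dy, dx), p in pairwise.items():
                yb, xb = ya + dy, xa + dx
                if not (0 <= yb < h and 0 <= xb < w):
                    continue  # joint B would fall outside the image
                score = (math.log(unary_a[ya][xa]) + math.log(unary_b[yb][xb])
                         + math.log(p))
                if score > best_score:
                    best_a, best_b, best_score = (ya, xa), (yb, xb), score
    assert best_a is not None and best_b is not None, "no valid configuration"
    return best_a, best_b, best_score
```

For example, if joint B's heatmap has its strongest peak far from any plausible position relative to joint A, the pairwise term favours a weaker detection that sits at a likely offset.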
Lucid Data Dreaming for Video Object Segmentation
Convolutional networks reach top quality in pixel-level video object
segmentation but require a large amount of training data (1k~100k) to deliver
such results. We propose a new training strategy which achieves
state-of-the-art results across three evaluation datasets while using 20x~1000x
less annotated data than competing methods. Our approach is suitable for both
single and multiple object segmentation. Instead of using large training sets
hoping to generalize across domains, we generate in-domain training data using
the provided annotation on the first frame of each video to synthesize ("lucid
dream") plausible future video frames. In-domain per-video training data allows
us to train high quality appearance- and motion-based models, as well as tune
the post-processing stage. This approach achieves competitive results
even when training from only a single annotated frame, without ImageNet
pre-training. Our results indicate that using a larger training set is not
automatically better, and that for the video object segmentation task a smaller
training set that is closer to the target domain is more effective. This
changes the mindset regarding how many training samples and general
"objectness" knowledge are required for the video object segmentation task. Comment: Accepted in International Journal of Computer Vision (IJCV)
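The abstract does not list the transformations used to "dream" new frames. A minimal sketch, assuming only a translation of the annotated object and a constant fill value in place of background inpainting (both simplifications assumed here, not the paper's pipeline), shows how a single annotated frame can yield new image/mask training pairs. `dream_frame` is a hypothetical name.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale, image[y][x]
Mask = List[List[int]]   # 1 = object pixel, 0 = background


def dream_frame(image: Image, mask: Mask, dy: int, dx: int,
                fill: int = 0) -> Tuple[Image, Mask]:
    """Synthesize a new (image, mask) pair by translating the annotated object.

    Vacated object pixels are set to `fill`, a crude stand-in for background
    inpainting; object pixels shifted outside the frame are dropped.
    The input image and mask are not modified.
    """
    h, w = len(image), len(image[0])
    # Start from the background with the object removed.
    new_img = [[fill if mask[y][x] else image[y][x] for x in range(w)] for y in range(h)]
    new_mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                ty, tx = y + dy, x + dx
                if 0 <= ty < h and 0 <= tx < w:
                    new_img[ty][tx] = image[y][x]
                    new_mask[ty][tx] = 1
    return new_img, new_mask
```

Calling this with several random shifts on the first annotated frame produces many in-domain (image, mask) pairs from one annotation; pairs of such frames could also serve as input for a motion-based model.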
Unsupervised learning of human motion
We present an unsupervised learning algorithm that automatically obtains, from unlabeled training data, a probabilistic model of an object composed of a collection of parts (a moving human body in our examples). The training data include both useful "foreground" features and features that arise from irrelevant background clutter; the correspondence between parts and detected features is unknown. The joint probability density function of the parts is represented by a mixture of decomposable triangulated graphs, which allow for fast detection. To learn the model structure as well as the model parameters, an EM-like algorithm is developed in which the labeling of the data (part assignments) is treated as a set of hidden variables. The unsupervised learning technique is not limited to decomposable triangulated graphs. The efficiency and effectiveness of our algorithm are demonstrated by applying it to generate models of human motion automatically from unlabeled image sequences, and by testing the learned models on a variety of sequences.
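Reproducing a mixture of decomposable triangulated graphs is beyond a short example. The sketch below keeps only the core idea the abstract describes, EM with part assignments treated as hidden variables, in a deliberately tiny setting assumed for illustration: 1D feature positions that come either from one body part (a Gaussian) or from background clutter (uniform on an interval). The function names and the clutter model are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def gauss_pdf(x: float, mu: float, var: float) -> float:
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_part_vs_clutter(xs: Sequence[float], lo: float, hi: float,
                       mu: float, var: float, pi: float = 0.5,
                       iters: int = 50) -> Tuple[float, float, float, List[float]]:
    """EM for a Gaussian 'part' plus uniform clutter on [lo, hi].

    The hidden variable for each feature is whether it belongs to the part;
    resp[j] is its posterior probability of doing so (from the last E-step).
    """
    clutter = 1.0 / (hi - lo)  # uniform background density
    resp: List[float] = []
    for _ in range(iters):
        # E-step: posterior that each feature was generated by the part.
        resp = []
        for x in xs:
            fg = pi * gauss_pdf(x, mu, var)
            bg = (1.0 - pi) * clutter
            resp.append(fg / (fg + bg))
        # M-step: re-estimate the mixing weight and the part parameters.
        total = max(sum(resp), 1e-12)
        pi = min(max(total / len(xs), 1e-6), 1.0 - 1e-6)  # keep bg > 0
        mu = sum(r * x for r, x in zip(resp, xs)) / total
        var = max(sum(r * (x - mu) ** 2 for r, x in zip(resp, xs)) / total, 1e-6)
    return mu, var, pi, resp
```

Starting from a rough guess, the responsibilities of the clustered features rise toward 1 while isolated clutter points drop toward 0, so the part's parameters are estimated without ever being told which feature is which.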
Multi-View Priors for Learning Detectors from Sparse Viewpoint Data
While the majority of today's object class models provide only 2D bounding
boxes, far richer output hypotheses are desirable including viewpoint,
fine-grained category, and 3D geometry estimates. However, models trained to
provide richer output require larger amounts of training data, preferably well
covering the relevant aspects such as viewpoint and fine-grained categories. In
this paper, we address this issue from the perspective of transfer learning,
and design an object class model that explicitly leverages correlations between
visual features. Specifically, our model represents prior distributions over
permissible multi-view detectors in a parametric way -- the priors are learned
once from training data of a source object class, and can later be used to
facilitate the learning of a detector for a target class. As we show in our
experiments, this transfer is not only beneficial for detectors based on
basic-level category representations, but also enables the robust learning of
detectors that represent classes at finer levels of granularity, where training
data is typically even scarcer and more unbalanced. As a result, we report
substantially improved performance in simultaneous 2D object localization and
viewpoint estimation on a recent dataset of challenging street scenes. Comment: 13 pages, 7 figures, 4 tables, International Conference on Learning
Representations 201
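As a much simpler analogue of transferring a learned prior from a source class, the sketch below trains a logistic-regression detector whose weights are pulled toward the source-class weights by an isotropic Gaussian prior. This is an assumption for illustration: the abstract says the learned prior captures correlations between visual features across multi-view detectors, which an isotropic prior does not. `train_with_prior`, the feature layout, and `lam` are hypothetical.

```python
import math
from typing import List, Sequence


def train_with_prior(xs: Sequence[Sequence[float]], ys: Sequence[int],
                     w_src: Sequence[float], lam: float,
                     lr: float = 0.1, steps: int = 500) -> List[float]:
    """Logistic regression with weights pulled toward source-class weights.

    Minimizes sum_i log(1 + exp(-y_i * w.x_i)) + (lam / 2) * ||w - w_src||^2
    by plain gradient descent, with labels y_i in {-1, +1}.
    """
    w = list(w_src)  # start from the source-class detector
    for _ in range(steps):
        # Gradient of the prior term.
        grad = [lam * (wj - sj) for wj, sj in zip(w, w_src)]
        for x, y in zip(xs, ys):
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # d/dw log(1 + exp(-m)) = -y * x * sigmoid(-m), sigmoid(-m) = 1 / (1 + e^m)
            coeff = -y / (1.0 + math.exp(margin))
            for j, xj in enumerate(x):
                grad[j] += coeff * xj
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w
```

With scarce target data, a feature that the target examples say nothing about keeps its source value, while features supported by target data move away from the prior; a larger `lam` keeps them closer to it.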