12,793 research outputs found
A Deep-structured Conditional Random Field Model for Object Silhouette Tracking
In this work, we introduce a deep-structured conditional random field
(DS-CRF) model for the purpose of state-based object silhouette tracking. The
proposed DS-CRF model consists of a series of state layers, where each state
layer spatially characterizes the object silhouette at a particular point in
time. The interactions between adjacent state layers are established by
inter-layer connectivity dynamically determined based on inter-frame optical
flow. By incorporate both spatial and temporal context in a dynamic fashion
within such a deep-structured probabilistic graphical model, the proposed
DS-CRF model allows us to develop a framework that can accurately and
efficiently track object silhouettes that can change greatly over time, as well
as under different situations such as occlusion and multiple targets within the
scene. Experiment results using video surveillance datasets containing
different scenarios such as occlusion and multiple targets showed that the
proposed DS-CRF approach provides strong object silhouette tracking performance
when compared to baseline methods such as mean-shift tracking, as well as
state-of-the-art methods such as context tracking and boosted particle
filtering.Comment: 17 page
Coarse-to-Fine Lifted MAP Inference in Computer Vision
There is a vast body of theoretical research on lifted inference in
probabilistic graphical models (PGMs). However, few demonstrations exist where
lifting is applied in conjunction with top of the line applied algorithms. We
pursue the applicability of lifted inference for computer vision (CV), with the
insight that a globally optimal (MAP) labeling will likely have the same label
for two symmetric pixels. The success of our approach lies in efficiently
handling a distinct unary potential on every node (pixel), typical of CV
applications. This allows us to lift the large class of algorithms that model a
CV problem via PGM inference. We propose a generic template for coarse-to-fine
(C2F) inference in CV, which progressively refines an initial coarsely lifted
PGM for varying quality-time trade-offs. We demonstrate the performance of C2F
inference by developing lifted versions of two near state-of-the-art CV
algorithms for stereo vision and interactive image segmentation. We find that,
against flat algorithms, the lifted versions have a much superior anytime
performance, without any loss in final solution quality.Comment: Published in IJCAI 201
SERKET: An Architecture for Connecting Stochastic Models to Realize a Large-Scale Cognitive Model
To realize human-like robot intelligence, a large-scale cognitive
architecture is required for robots to understand the environment through a
variety of sensors with which they are equipped. In this paper, we propose a
novel framework named Serket that enables the construction of a large-scale
generative model and its inference easily by connecting sub-modules to allow
the robots to acquire various capabilities through interaction with their
environments and others. We consider that large-scale cognitive models can be
constructed by connecting smaller fundamental models hierarchically while
maintaining their programmatic independence. Moreover, connected modules are
dependent on each other, and parameters are required to be optimized as a
whole. Conventionally, the equations for parameter estimation have to be
derived and implemented depending on the models. However, it becomes harder to
derive and implement those of a larger scale model. To solve these problems, in
this paper, we propose a method for parameter estimation by communicating the
minimal parameters between various modules while maintaining their programmatic
independence. Therefore, Serket makes it easy to construct large-scale models
and estimate their parameters via the connection of modules. Experimental
results demonstrated that the model can be constructed by connecting modules,
the parameters can be optimized as a whole, and they are comparable with the
original models that we have proposed
Pseudo Mask Augmented Object Detection
In this work, we present a novel and effective framework to facilitate object
detection with the instance-level segmentation information that is only
supervised by bounding box annotation. Starting from the joint object detection
and instance segmentation network, we propose to recursively estimate the
pseudo ground-truth object masks from the instance-level object segmentation
network training, and then enhance the detection network with top-down
segmentation feedbacks. The pseudo ground truth mask and network parameters are
optimized alternatively to mutually benefit each other. To obtain the promising
pseudo masks in each iteration, we embed a graphical inference that
incorporates the low-level image appearance consistency and the bounding box
annotations to refine the segmentation masks predicted by the segmentation
network. Our approach progressively improves the object detection performance
by incorporating the detailed pixel-wise information learned from the
weakly-supervised segmentation network. Extensive evaluation on the detection
task in PASCAL VOC 2007 and 2012 [12] verifies that the proposed approach is
effective
Unsupervised Learning of Complex Articulated Kinematic Structures combining Motion and Skeleton Information
In this paper we present a novel framework for unsupervised kinematic structure learning of complex articulated objects from a single-view image sequence. In contrast to prior motion information based methods, which estimate relatively simple articulations, our method can generate arbitrarily complex kinematic structures with skeletal topology by a successive iterative merge process. The iterative merge process is guided by a skeleton distance function which is generated from a novel object boundary generation method from sparse points. Our main contributions can be summarised as follows: (i) Unsupervised complex articulated kinematic structure learning by combining motion and skeleton information. (ii) Iterative fine-to-coarse merging strategy for adaptive motion segmentation and structure smoothing. (iii) Skeleton estimation from sparse feature points. (iv) A new highly articulated object dataset containing multi-stage complexity with ground truth. Our experiments show that the proposed method out-performs state-of-the-art methods both quantitatively and qualitatively
- …