Deep Structured Models for Large Scale Object Co-detection and Segmentation
Structured decisions are required for a wide variety of
image and scene understanding tasks in computer vision, such as
object detection, localization, and semantic segmentation.
Structured prediction learns this inherent structure by
incorporating contextual information from several images and
multiple tasks. However, it becomes very challenging on
large-scale image datasets, where performance is limited by high
computational costs and by the expressive power of the underlying
representation learning techniques. In this thesis,
we present efficient and effective deep structured models for
context-aware object detection, co-localization and
instance-level semantic segmentation.
First, we introduce a principled formulation for object
co-detection using a fully-connected conditional random field
(CRF). We build an explicit graph whose vertices represent object
candidates (instead of pixel values) and edges encode the object
similarity via simple, yet effective pairwise potentials. More
specifically, we design a weighted mixture of Gaussian kernels
for class-specific object similarity and formulate kernel
weight estimation as a least-squares regression problem, whose
solution can be obtained in closed form. Furthermore,
in contrast with traditional co-detection approaches, it has been
shown that inference in such fully-connected CRFs can be
performed efficiently using an approximate mean-field method with
high-dimensional Gaussian filtering. This lets us effectively
leverage information in multiple images.
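As a rough illustration of the closed-form kernel-weight estimation described above, the following sketch fits the weights of a Gaussian kernel mixture to target pairwise similarities by least squares. Variable names and the target matrix `Y` are hypothetical; this is plain NumPy under simplified assumptions, not the thesis implementation.

```python
import numpy as np

def gaussian_kernels(F, sigmas):
    # F: (n, d) candidate features; returns (M, n, n) Gaussian kernel
    # matrices k_m(f_i, f_j) = exp(-||f_i - f_j||^2 / (2 * sigma_m^2)).
    d2 = ((F[:, None, :] - F[None, :, :]) ** 2).sum(-1)
    return np.stack([np.exp(-d2 / (2.0 * s ** 2)) for s in sigmas])

def fit_kernel_weights(F, Y, sigmas):
    # Solve min_w || sum_m w_m K_m - Y ||^2 in closed form by treating
    # each vectorized kernel matrix as one column of a least-squares
    # design matrix (normal equations solved via lstsq).
    K = gaussian_kernels(F, sigmas)    # (M, n, n)
    A = K.reshape(len(sigmas), -1).T   # (n*n, M)
    w, *_ = np.linalg.lstsq(A, Y.ravel(), rcond=None)
    return w
```

Because the objective is quadratic in the weights, no iterative optimization is needed; the least-squares solve recovers the mixture exactly when the target similarity lies in the span of the kernels.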
Next, we extend our class-specific co-detection framework to
multiple object categories. We model object candidates with rich,
high-dimensional features learned using a deep convolutional
neural network. In particular, our max-margin and direct-loss
structural boosting algorithms enable us to learn the most
suitable features that best encode pairwise similarity
relationships within our CRF framework. Furthermore, these
algorithms guarantee time and space complexity of O(nt), where n
is the total number of candidate boxes in the pool and t is the
number of mean-field iterations.
Moreover, our experiments demonstrate the importance of learning
rich similarity measures to account for the contextual relations
across object classes and instances. However, all of these
methods rely on precomputed object candidates (or proposals), so
localization performance is limited by the quality of the
bounding boxes.
To address this, we present an efficient object proposal
co-generation technique that leverages the collective power of
multiple images. In particular, we design a deep neural network
layer that takes unary and pairwise features as input, builds a
fully-connected CRF and produces mean-field marginals as output.
It also lets us backpropagate the gradient through the entire network
by unrolling the iterations of CRF inference. Furthermore, this
layer simplifies the end-to-end learning, thus effectively
benefiting from multiple candidates to co-generate high-quality
object proposals.
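The unrolled mean-field inference inside such a layer can be sketched as follows. This is a minimal NumPy illustration assuming a precomputed dense kernel matrix `K` and a label-compatibility matrix, not the actual layer, which operates on learned unary and pairwise features inside a deep network; each iteration uses only differentiable operations, which is what allows a framework with automatic differentiation to backpropagate through the unrolled loop.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def meanfield_unroll(unary, K, compat, t=5):
    # unary: (n, L) unary potentials; K: (n, n) pairwise kernel with
    # zero diagonal; compat: (L, L) label-compatibility matrix.
    Q = softmax(-unary)                # initialize marginals from unaries
    for _ in range(t):                 # t fixed mean-field iterations
        msg = K @ Q @ compat           # pairwise message passing
        Q = softmax(-(unary + msg))    # compatibility transform + normalize
    return Q                           # approximate marginals
```

Each of the `t` iterations costs one dense matrix product over the n candidates, consistent with the O(nt) complexity discussed above.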
Finally, we develop a multi-task strategy to jointly learn object
detection, localization and instance-level semantic segmentation
in a single network. In particular, we introduce a novel
representation based on the distance transform of the object
masks. To this end, we design a new residual-deconvolution
architecture that infers such a representation and decodes it
into the final binary object mask. We show that the predicted
masks can go beyond the scope of the bounding boxes and that the
multiple tasks can benefit from each other.
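To illustrate the distance-transform representation of object masks, here is a toy round trip: encode a binary mask as each foreground pixel's Euclidean distance to the nearest background pixel, then decode back to a binary mask by thresholding. This brute-force sketch is only illustrative; in the thesis the representation is predicted by the residual-deconvolution architecture rather than computed from a known mask.

```python
import numpy as np

def mask_to_dt(mask):
    # Distance transform: each foreground pixel gets its Euclidean
    # distance to the nearest background pixel (brute force; fine
    # for tiny masks, illustration only).
    H, W = mask.shape
    bg = np.argwhere(mask == 0)
    dt = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            if mask[y, x]:
                dt[y, x] = np.sqrt(((bg - [y, x]) ** 2).sum(1)).min()
    return dt

def dt_to_mask(dt, thresh=0.5):
    # Decode: pixels sufficiently far from the background are foreground.
    return (dt >= thresh).astype(np.uint8)
```

Because the distance values carry shape information beyond a hard boundary, a network regressing this representation can recover object extent more robustly than one predicting the binary mask directly.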
In summary, in this thesis, we exploit the joint power of
multiple images as well as multiple tasks to improve
generalization performance of structured learning. Our novel deep
structured models, similarity learning techniques and
residual-deconvolution architecture can be used to make accurate
and reliable inference for key vision tasks. Furthermore, our
quantitative and qualitative experiments on large scale
challenging image datasets demonstrate the superiority of the
proposed approaches over state-of-the-art methods.
Nuclei instance segmentation with dual contour-enhanced adversarial network
The morphology of cancer cells is widely used by pathologists to grade cancer stages, so accurate cancer cell segmentation is essential for quantitative diagnosis. We propose a dual contour-enhanced adversarial network to address this challenge: dual contour-enhanced masks and an adversarial network are incorporated to improve individual cell segmentation. Evaluated quantitatively on the 2017 MICCAI Digital Pathology Challenge, our method achieves the best balance between precision and recall for individual cell segmentation compared to state-of-the-art cell segmentation methods.