Compositional Convolutional Neural Networks: A Deep Architecture with Innate Robustness to Partial Occlusion
Recent findings show that deep convolutional neural networks (DCNNs) do not
generalize well under partial occlusion. Inspired by the success of
compositional models at classifying partially occluded objects, we propose to
integrate compositional models and DCNNs into a unified deep model with innate
robustness to partial occlusion. We term this architecture Compositional
Convolutional Neural Network. In particular, we propose to replace the fully
connected classification head of a DCNN with a differentiable compositional
model. The generative nature of the compositional model enables it to localize
occluders and subsequently focus on the non-occluded parts of the object. We
conduct classification experiments on artificially occluded images as well as
real images of partially occluded objects from the MS-COCO dataset. The results
show that DCNNs do not classify occluded objects robustly, even when trained
with data that is strongly augmented with partial occlusions. Our proposed
model outperforms standard DCNNs by a large margin at classifying partially
occluded objects, even when it has not been exposed to occluded objects during
training. Additional experiments demonstrate that CompositionalNets can also
localize the occluders accurately, despite being trained with class labels
only. The code used in this work is publicly available.
Comment: CVPR 2020; Code is available at
https://github.com/AdamKortylewski/CompositionalNets; Supplementary material:
https://adamkortylewski.com/data/compnet_supp.pd
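The head swap described in the abstract can be illustrated with a short PyTorch sketch. This is not the released CompositionalNets code (see the repository linked above); CompositionalHead, num_parts, and occluder_logprob are assumed names, and the real model's position-dependent mixture is collapsed into a single weight per class and part for brevity.

```python
# Minimal sketch of replacing a DCNN's fully connected head with a
# generative, part-based scoring head. Not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class CompositionalHead(nn.Module):
    """Scores each spatial feature vector against part templates and an
    occluder model, then pools the per-position evidence per class."""
    def __init__(self, in_channels, num_parts, num_classes):
        super().__init__()
        # Part templates act like vMF kernel centers on normalized features.
        self.parts = nn.Parameter(torch.randn(num_parts, in_channels))
        # Per-(class, part) weights; position dependence is dropped here.
        self.class_weights = nn.Parameter(torch.randn(num_classes, num_parts))
        # Constant log-score for positions explained by an occluder.
        self.occluder_logprob = nn.Parameter(torch.tensor(-1.0))

    def forward(self, feats):                      # feats: (B, C, H, W)
        f = F.normalize(feats, dim=1)              # unit-norm features
        p = F.normalize(self.parts, dim=1)         # unit-norm templates
        sims = torch.einsum('bchw,pc->bphw', f, p) # cosine similarities
        logits = torch.einsum('bphw,kp->bkhw', sims, self.class_weights)
        # The occluder score acts as a floor: positions it explains better
        # contribute the same constant to every class.
        floored = torch.maximum(logits, self.occluder_logprob)
        occ = (logits <= self.occluder_logprob).float().mean(dim=1)
        return floored.mean(dim=(2, 3)), occ       # class scores, occlusion map

# Usage: keep the convolutional backbone, drop the FC classifier.
backbone = torchvision.models.vgg16(weights=None).features
head = CompositionalHead(in_channels=512, num_parts=64, num_classes=12)
scores, occlusion_map = head(backbone(torch.randn(2, 3, 224, 224)))
```

The design point this sketch captures is that once the occluder model wins at a position, that position stops contributing evidence for or against any class, which is what lets the model focus on the non-occluded parts of the object.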
Visual Concepts and Compositional Voting
It is very attractive to formulate vision in terms of pattern theory
\cite{Mumford2010pattern}, where patterns are defined hierarchically by
compositions of elementary building blocks. But applying pattern theory to
real-world images is currently less successful than discriminative methods
such as deep networks. Deep networks, however, are black boxes that are hard
to interpret and can easily be fooled by adding occluding objects. It is natural
to wonder whether by better understanding deep networks we can extract building
blocks which can be used to develop pattern theoretic models. This motivates us
to study the internal representations of a deep network using vehicle images
from the PASCAL3D+ dataset. We use clustering algorithms to study the
population activities of the features and extract a set of visual concepts
which we show are visually tight and correspond to semantic parts of vehicles.
To analyze this we annotate these vehicles by their semantic parts to create a
new dataset, VehicleSemanticParts, and evaluate visual concepts as unsupervised
part detectors. We show that visual concepts perform fairly well but are
outperformed by supervised discriminative methods such as Support Vector
Machines (SVM). We next give a more detailed analysis of visual concepts and
how they relate to semantic parts. Following this, we use the visual concepts
as building blocks for a simple pattern theoretical model, which we call
compositional voting. In this model several visual concepts combine to detect
semantic parts. We show that this approach is significantly better than
discriminative methods like SVM and deep networks trained specifically for
semantic part detection. Finally, we return to studying occlusion by creating
an annotated dataset with occlusion, called VehicleOcclusion, and show that
compositional voting outperforms even deep networks when the amount of
occlusion becomes large.
Comment: Accepted by Annals of Mathematical Sciences and Applications
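A rough sketch of the two stages described above follows; it assumes numpy and scikit-learn, the helper names extract_concepts and vote_map are hypothetical, and the spatial offsets and weights would in practice be estimated from annotated part positions.

```python
# Stage 1: cluster feature "population activities" into visual concepts.
# Stage 2: let concepts cast spatially offset votes for a semantic part.
import numpy as np
from sklearn.cluster import KMeans

def extract_concepts(features, num_concepts=64):
    """Cluster unit-normalized feature vectors pooled from many images;
    the cluster centers serve as visual-concept templates.
    features: (N, C) array."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    return KMeans(n_clusters=num_concepts, n_init=10).fit(f).cluster_centers_

def vote_map(feat_grid, concepts, offsets, weights):
    """Accumulate evidence for one semantic part.
    feat_grid: (H, W, C) features of one image
    offsets:   (K, 2) displacement of each concept from the part center
    weights:   (K,) reliability of each concept for this part"""
    f = feat_grid / np.linalg.norm(feat_grid, axis=2, keepdims=True)
    act = f @ concepts.T                           # (H, W, K) activations
    votes = np.zeros(act.shape[:2])
    for k, (dy, dx) in enumerate(offsets.astype(int)):
        # Each concept activation casts a weighted vote at the part
        # location it predicts.
        shifted = np.roll(np.roll(act[:, :, k], dy, axis=0), dx, axis=1)
        votes += weights[k] * shifted
    return votes                                   # peaks = detected parts
```

Because every concept votes independently, an occluder that suppresses some concepts leaves the remaining votes intact, which is consistent with the robustness to large occlusion reported in the abstract.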
Amodal Segmentation through Out-of-Task and Out-of-Distribution Generalization with a Bayesian Model
Amodal completion is a visual task that humans perform easily but which is
difficult for computer vision algorithms. The aim is to segment those object
boundaries which are occluded and hence invisible. This task is particularly
challenging for deep neural networks because data is difficult to obtain and
annotate. Therefore, we formulate amodal segmentation as an out-of-task and
out-of-distribution generalization problem. Specifically, we replace the fully
connected classifier in neural networks with a Bayesian generative model of the
neural network features. The model is trained from non-occluded images using
bounding box annotations and class labels only, but is applied to generalize
out-of-task to object segmentation and to generalize out-of-distribution to
segment occluded objects. We demonstrate how such Bayesian models can naturally
generalize beyond the training task labels when they learn a prior that models
the object's background context and shape. Moreover, by leveraging an outlier
process, Bayesian models can further generalize out-of-distribution to segment
partially occluded objects and to predict their amodal object boundaries. Our
algorithm outperforms alternative methods that use the same supervision by a
large margin, and even outperforms methods where annotated amodal segmentations
are used during training, when the amount of occlusion is large. Code is
publicly available at https://github.com/YihongSun/Bayesian-Amodal
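The outlier process mentioned above can be sketched as a two-component per-pixel mixture. This is an illustration of the idea only, not the linked implementation; occlusion_posterior and the log_outlier value are assumptions.

```python
# Per-pixel competition between a generative object model and a constant
# outlier (occluder) process, as a sketch of the abstract's mechanism.
import torch
import torch.nn.functional as F

def occlusion_posterior(feats, templates, log_outlier=-2.0):
    """Per-pixel posterior that a location is occluded.
    feats:     (C, H, W) backbone features for one image
    templates: (P, C) generative part templates (e.g. vMF means)
               learned from non-occluded training images"""
    f = F.normalize(feats, dim=0)
    t = F.normalize(templates, dim=1)
    # Object log-likelihood: best-matching template at each position
    # (a hard-max approximation of a vMF mixture).
    obj_ll = torch.einsum('chw,pc->phw', f, t).max(dim=0).values
    out_ll = torch.full_like(obj_ll, log_outlier)
    # Two-component mixture: outlier process vs. object model.
    return torch.softmax(torch.stack([out_ll, obj_ll]), dim=0)[0]  # (H, W)
```

An amodal boundary could then be read out by combining this map with a learned class shape prior, using the prior where the posterior flags occlusion and the visible evidence elsewhere, which is the out-of-distribution generalization step the abstract describes.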