Reconstructive Sparse Code Transfer for Contour Detection and Semantic Labeling
We frame the task of predicting a semantic labeling as a sparse
reconstruction procedure that applies a target-specific learned transfer
function to a generic deep sparse code representation of an image. This
strategy partitions training into two distinct stages. First, in an
unsupervised manner, we learn a set of generic dictionaries optimized for
sparse coding of image patches. We train a multilayer representation via
recursive sparse dictionary learning on pooled codes output by earlier layers.
Second, we encode all training images with the generic dictionaries and learn a
transfer function that optimizes reconstruction of patches extracted from
annotated ground-truth given the sparse codes of their corresponding image
patches. At test time, we encode a novel image using the generic dictionaries
and then reconstruct using the transfer function. The output reconstruction is
a semantic labeling of the test image.
Applying this strategy to the task of contour detection, we demonstrate
performance competitive with state-of-the-art systems. Unlike almost all prior
work, our approach obviates the need for any form of hand-designed features or
filters. To illustrate general applicability, we also show initial results on
semantic part labeling of human faces.
The effectiveness of our approach opens new avenues for research on deep
sparse representations. Our classifiers utilize this representation in a novel
manner. Rather than acting on nodes in the deepest layer, they attach to nodes
along a slice through multiple layers of the network in order to make
predictions about local patches. Our flexible combination of a generatively
learned sparse representation with discriminatively trained transfer
classifiers extends the notion of sparse reconstruction to encompass arbitrary
semantic labeling tasks.
Comment: to appear in Asian Conference on Computer Vision (ACCV), 2014
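The two-stage pipeline described above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: it uses ISTA as a stand-in sparse coding solver, a random single-layer dictionary in place of the learned multilayer representation, and a plain least-squares fit for the transfer function; all array sizes and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_encode(x, D, lam=0.1, n_iter=50):
    """Encode patch x against dictionary D via ISTA (a stand-in for the
    paper's sparse coding step; the actual solver is an assumption)."""
    z = np.zeros(D.shape[1])
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x)
        z = z - grad / L
        z = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return z

# Stage 1 (stand-in): a random "generic" dictionary of 128 atoms for
# 64-dimensional (8x8) patches, with unit-norm atoms.
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0)

# Stage 2: learn a linear transfer function W by least squares, mapping
# sparse codes Z of image patches to their annotated ground-truth patches Y.
Z = np.stack([sparse_encode(rng.standard_normal(64), D) for _ in range(200)])
Y = rng.standard_normal((200, 64))  # toy stand-in for ground-truth patches
W, *_ = np.linalg.lstsq(Z, Y, rcond=None)

# Test time: encode a novel patch with the generic dictionary, then
# reconstruct its semantic labeling through the transfer function.
x_new = rng.standard_normal(64)
label_patch = sparse_encode(x_new, D) @ W
print(label_patch.shape)
```

The key property the sketch preserves is that the dictionary is learned without labels, while only the lightweight transfer map sees annotations.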
Segmentation Given Partial Grouping Constraints
We consider data clustering problems where partial grouping is known a priori. We formulate such biased grouping problems as a constrained optimization problem, where structural properties of the data define the goodness of a grouping and partial grouping cues define the feasibility of a grouping. We enforce grouping smoothness and fairness on labeled data points so that sparse partial grouping information can be effectively propagated to the unlabeled data. Considering the normalized cuts criterion in particular, our formulation leads to a constrained eigenvalue problem. By generalizing the Rayleigh-Ritz theorem to projected matrices, we find the global optimum in the relaxed continuous domain by eigendecomposition, from which a near-global optimum to the discrete labeling problem can be obtained effectively. We apply our method to real image segmentation problems, where partial grouping priors can often be derived based on a crude spatial attentional map that binds places with common salient features or focuses on expected object locations. We demonstrate not only that it is possible to integrate both image structures and priors in a single grouping process, but also that objects can be segregated from the background without specific object knowledge.
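The constrained eigenvalue step above can be sketched as follows. This is a minimal toy, not the paper's algorithm: the affinity matrix is random rather than image-derived, a single equality constraint stands in for a full set of grouping cues, and the projected Rayleigh-Ritz step is realized by restricting the normalized-cuts operator to an orthonormal basis of the constraints' null space.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy symmetric affinity matrix A over n data points (illustrative only;
# in the paper these affinities come from image structure).
n = 6
A = rng.random((n, n))
A = (A + A.T) / 2.0
d = A.sum(axis=1)

# Normalized cuts operator: L = D^{-1/2} (D - A) D^{-1/2}.
Dinv_sqrt = np.diag(1.0 / np.sqrt(d))
L = Dinv_sqrt @ (np.diag(d) - A) @ Dinv_sqrt

# Partial grouping prior: points 0 and 1 are known to belong together,
# encoded as one linear equality constraint (e0 - e1)^T y = 0.
U = np.zeros((n, 1))
U[0, 0], U[1, 0] = 1.0, -1.0

# The feasible set is the null space of U^T; build an orthonormal basis
# for it from the trailing right singular vectors of U^T.
_, _, Vt = np.linalg.svd(U.T)
Q = Vt[1:].T  # n x (n-1), columns span {y : U^T y = 0}

# Rayleigh-Ritz on the projected matrix: the relaxed constrained optimum
# is the smallest eigenvector of Q^T L Q, mapped back to the full space.
vals, W = np.linalg.eigh(Q.T @ L @ Q)
y = Q @ W[:, 0]

print(abs(y[0] - y[1]) < 1e-8)  # the grouping constraint is honored
```

Because every candidate solution is built from the null-space basis, the partial grouping information is satisfied exactly rather than merely penalized.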
Tied Block Convolution: Leaner and Better CNNs with Shared Thinner Filters
Convolution is the main building block of convolutional neural networks
(CNN). We observe that an optimized CNN often has highly correlated filters as
the number of channels increases with depth, reducing the expressive power of
feature representations. We propose Tied Block Convolution (TBC) that shares
the same thinner filters over equal blocks of channels and produces multiple
responses with a single filter. The concept of TBC can also be extended to
group convolution and fully connected layers, and can be applied to various
backbone networks and attention modules. Our extensive experimentation on
classification, detection, instance segmentation, and attention demonstrates
TBC's significant across-the-board gain over standard convolution and group
convolution. The proposed TiedSE attention module can even use 64 times fewer
parameters than the SE module to achieve comparable performance. In particular,
standard CNNs often fail to accurately aggregate information in the presence of
occlusion and result in multiple redundant partial object proposals. By sharing
filters across channels, TBC reduces correlation and can effectively handle
highly overlapping instances. TBC increases the average precision for object
detection on MS-COCO by 6% when the occlusion ratio is 80%. Our code will be
released.
Comment: 13 pages
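The core idea of sharing one thin filter bank across equal channel blocks can be sketched as below. This is an illustrative 1x1-convolution toy under assumed shapes, not the released implementation: one filter bank of shape (C_out/B, C_in/B) is applied to all B channel blocks, so it uses 1/B^2 of the parameters of a standard convolution with the same input/output widths.

```python
import numpy as np

rng = np.random.default_rng(2)

def tied_block_conv(x, w, B):
    """Tied block convolution sketch (1x1 case for clarity).

    x: input feature map, shape (C_in, H, W)
    w: tied filter bank, shape (C_out // B, C_in // B) -- one thin bank
       shared by all B channel blocks
    B: number of equal channel blocks sharing the filters
    """
    C_in, H, Wd = x.shape
    xb = x.reshape(B, C_in // B, H, Wd)     # split channels into B blocks
    # Apply the SAME thin filters to every block.
    yb = np.einsum('oc,bchw->bohw', w, xb)  # (B, C_out // B, H, W)
    return yb.reshape(-1, H, Wd)            # (C_out, H, W)

B = 4
x = rng.standard_normal((8, 4, 4))   # C_in = 8
w = rng.standard_normal((4, 2))      # thin bank: C_out // B = 4, C_in // B = 2
y = tied_block_conv(x, w, B)
print(y.shape)  # C_out = 16 responses from only 4 * 2 = 8 parameters
```

A standard 1x1 convolution with the same widths would need 16 * 8 = 128 parameters; the tied bank uses 8, the B^2 = 16x reduction that underlies results like the TiedSE parameter savings.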