94 research outputs found
BranchConnect: Large-Scale Visual Recognition with Learned Branch Connections
We introduce an architecture for large-scale image categorization that
enables the end-to-end learning of separate visual features for the different
classes to distinguish. The proposed model consists of a deep CNN shaped like a
tree. The stem of the tree includes a sequence of convolutional layers common
to all classes. The stem then splits into multiple branches implementing
parallel feature extractors, which are ultimately connected to the final
classification layer via learned gated connections. These learned gates
determine for each individual class the subset of features to use. Such a
scheme naturally encourages the learning of a heterogeneous set of specialized
features through the separate branches and it allows each class to use the
subset of features that are optimal for its recognition. We show the generality
of our proposed method by reshaping several popular CNNs from the literature
into our proposed architecture. Our experiments on the CIFAR100, CIFAR10, and
Synth datasets show that in each case our resulting model yields a substantial
improvement in accuracy over the original CNN. Our empirical analysis also
suggests that our scheme acts as a form of beneficial regularization improving
generalization performance.Comment: WACV 201
Coupled Depth Learning
In this paper we propose a method for estimating depth from a single image
using a coarse to fine approach. We argue that modeling the fine depth details
is easier after a coarse depth map has been computed. We express a global
(coarse) depth map of an image as a linear combination of a depth basis learned
from training examples. The depth basis captures spatial and statistical
regularities and reduces the problem of global depth estimation to the task of
predicting the input-specific coefficients in the linear combination. This is
formulated as a regression problem from a holistic representation of the image.
Crucially, the depth basis and the regression function are {\bf coupled} and
jointly optimized by our learning scheme. We demonstrate that this results in a
significant improvement in accuracy compared to direct regression of depth
pixel values or approaches learning the depth basis disjointly from the
regression function. The global depth estimate is then used as a guidance by a
local refinement method that introduces depth details that were not captured at
the global level. Experiments on the NYUv2 and KITTI datasets show that our
method outperforms the existing state-of-the-art at a considerably lower
computational cost for both training and testing.Comment: 10 pages, 3 Figures, 4 Tables with quantitative evaluation
High-for-Low and Low-for-High: Efficient Boundary Detection from Deep Object Features and its Applications to High-Level Vision
Most of the current boundary detection systems rely exclusively on low-level
features, such as color and texture. However, perception studies suggest that
humans employ object-level reasoning when judging if a particular pixel is a
boundary. Inspired by this observation, in this work we show how to predict
boundaries by exploiting object-level features from a pretrained
object-classification network. Our method can be viewed as a "High-for-Low"
approach where high-level object features inform the low-level boundary
detection process. Our model achieves state-of-the-art performance on an
established boundary detection benchmark and it is efficient to run.
Additionally, we show that due to the semantic nature of our boundaries we
can use them to aid a number of high-level vision tasks. We demonstrate that
using our boundaries we improve the performance of state-of-the-art methods on
the problems of semantic boundary labeling, semantic segmentation and object
proposal generation. We can view this process as a "Low-for-High" scheme, where
low-level boundaries aid high-level vision tasks.
Thus, our contributions include a boundary detection system that is accurate,
efficient, generalizes well to multiple datasets, and is also shown to improve
existing state-of-the-art high-level vision methods on three distinct tasks
- …