Are Face and Object Recognition Independent? A Neurocomputational Modeling Exploration
Are face and object recognition abilities independent? Although it is
commonly believed that they are, Gauthier et al. (2014) recently showed that
these abilities become more correlated as experience with nonface categories
increases. They argued that there is a single underlying visual ability, v,
that is expressed in performance with both face and nonface categories as
experience grows. Using the Cambridge Face Memory Test and the Vanderbilt
Expertise Test, they showed that the shared variance between Cambridge Face
Memory Test and Vanderbilt Expertise Test performance increases monotonically
as experience increases. Here, we address why a shared resource across
different visual domains does not lead to competition and an inverse
correlation in abilities. We explain this conundrum using our
neurocomputational model of face and object processing (The Model, TM). Our
results show that, as in the behavioral data, the correlation between
subordinate level face and object recognition accuracy increases as experience
grows. We suggest that different domains do not compete for resources because
the relevant features are shared between faces and objects. The essential power
of experience is to generate a "spreading transform" for faces that generalizes
to objects that must be individuated. Interestingly, when the task of the
network is basic level categorization, no increase in the correlation between
domains is observed. Hence, our model predicts that it is the type of
experience that matters and that the source of the correlation is in the
fusiform face area, rather than in cortical areas that subserve basic level
categorization. This result is consistent with our previous modeling
elucidating why the FFA is recruited for novel domains of expertise (Tong et
al., 2008).
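The statistical claim above can be made concrete with a toy sketch. This is not the authors' neurocomputational model; it only illustrates how "shared variance" (the squared Pearson correlation of per-subject scores) rises when a single latent ability `v` drives performance in both domains more strongly as experience grows. All numbers and the `experience_weight` parameter are synthetic assumptions.

```python
# Illustrative sketch (not the authors' model): shared variance between two
# abilities is the squared Pearson correlation of per-subject scores.
# All data here are synthetic, for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n_subjects = 200

# Hypothetical latent visual ability v, expressed in both domains.
v = rng.normal(size=n_subjects)

def shared_variance(experience_weight):
    """Simulate CFMT-like and VET-like scores; more experience lets v
    drive nonface performance more strongly, raising the correlation."""
    face = v + rng.normal(scale=0.5, size=n_subjects)
    obj = experience_weight * v + rng.normal(scale=0.5, size=n_subjects)
    r = np.corrcoef(face, obj)[0, 1]
    return r ** 2

for w in (0.2, 0.6, 1.0):  # increasing experience with nonface categories
    print(f"experience weight {w}: shared variance = {shared_variance(w):.2f}")
```

Because both scores are noisy expressions of the same latent `v`, the shared variance grows monotonically with the experience weight, mirroring the qualitative pattern the abstract reports.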
BranchConnect: Large-Scale Visual Recognition with Learned Branch Connections
We introduce an architecture for large-scale image categorization that
enables the end-to-end learning of separate visual features for the different
classes to distinguish. The proposed model consists of a deep CNN shaped like a
tree. The stem of the tree includes a sequence of convolutional layers common
to all classes. The stem then splits into multiple branches implementing
parallel feature extractors, which are ultimately connected to the final
classification layer via learned gated connections. These learned gates
determine for each individual class the subset of features to use. Such a
scheme naturally encourages the learning of a heterogeneous set of specialized
features through the separate branches and it allows each class to use the
subset of features that are optimal for its recognition. We show the generality
of our proposed method by reshaping several popular CNNs from the literature
into our proposed architecture. Our experiments on the CIFAR100, CIFAR10, and
Synth datasets show that in each case our resulting model yields a substantial
improvement in accuracy over the original CNN. Our empirical analysis also
suggests that our scheme acts as a form of beneficial regularization improving
generalization performance. Comment: WACV 201
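The stem/branch/gate structure described above can be sketched as a forward pass. This is a minimal numpy illustration of the idea, not the paper's implementation: the layer shapes, the sigmoid soft gate, and random weights are all our assumptions.

```python
# Minimal sketch of the BranchConnect idea (forward pass only, random
# weights): a shared stem, parallel branch feature extractors, and a
# per-class gate selecting which branches feed each class's score.
# Shapes and the soft-gating choice are assumptions, not the paper's spec.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_stem, d_branch = 32, 64, 16
n_branches, n_classes = 4, 10

W_stem = rng.normal(scale=0.1, size=(d_in, d_stem))
W_branch = rng.normal(scale=0.1, size=(n_branches, d_stem, d_branch))
W_cls = rng.normal(scale=0.1, size=(n_branches, d_branch, n_classes))
gates = rng.normal(size=(n_classes, n_branches))  # learned during training

def forward(x):
    stem = np.maximum(x @ W_stem, 0.0)  # layers common to all classes
    # Parallel branch feature extractors on top of the shared stem.
    branches = np.maximum(np.einsum('d,bde->be', stem, W_branch), 0.0)
    # Per-branch class logits, gated per class before summing.
    logits_per_branch = np.einsum('be,bec->bc', branches, W_cls)  # (B, C)
    g = 1.0 / (1.0 + np.exp(-gates))  # soft gate in [0, 1]
    return (g.T * logits_per_branch).sum(axis=0)  # one score per class

scores = forward(rng.normal(size=d_in))
print(scores.shape)
```

A low gate value effectively disconnects a branch from a class, so each class ends up relying on its own subset of specialized branch features.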
The Devil is in the Tails: Fine-grained Classification in the Wild
The world is long-tailed. What does this mean for computer vision and visual
recognition? The two main implications are (1) the number of categories we need
to consider in applications can be very large, and (2) the number of training
examples for most categories can be very small. Current visual recognition
algorithms have achieved excellent classification accuracy. However, they
require many training examples to reach peak performance, which suggests that
long-tailed distributions will not be dealt with well. We analyze this question
in the context of eBird, a large fine-grained classification dataset, and a
state-of-the-art deep network classification algorithm. We find that (a) peak
classification performance on well-represented categories is excellent, (b)
given enough data, classification performance suffers only minimally from an
increase in the number of classes, (c) classification performance decays
precipitously as the number of training examples decreases, (d) surprisingly,
transfer learning is virtually absent in current methods. Our findings suggest
that our community should come to grips with the question of long tails.
Hard Mixtures of Experts for Large Scale Weakly Supervised Vision
Training convolutional networks (CNNs) that fit on a single GPU with
minibatch stochastic gradient descent has become effective in practice.
However, there is still no effective method for training large CNNs that do
not fit in the memory of a few GPU cards, or for parallelizing CNN training. In
this work we show that a simple hard mixture of experts model can be
efficiently trained to good effect on large scale hashtag (multilabel)
prediction tasks. Mixture of experts models are not new (Jacobs et al., 1991;
Collobert et al., 2003), but in the past, researchers have had to devise
sophisticated methods to deal with data fragmentation. We show empirically that
modern weakly supervised data sets are large enough to support naive
partitioning schemes where each data point is assigned to a single expert.
Because the experts are independent, training them in parallel is easy, and
evaluation is cheap for the size of the model. Furthermore, we show that we can
use a single decoding layer for all the experts, allowing a unified feature
embedding space. We demonstrate that it is feasible (and in fact relatively
painless) to train far larger models than could be practically trained with
standard CNN architectures, and that the extra capacity can be well used on
current datasets. Comment: Appearing in CVPR 201
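The hard-partitioning scheme described above can be sketched in a few lines. This is a simplified illustration of the idea, not the paper's system: every point is routed to exactly one expert by nearest centroid, the experts are independent (so they could be trained in parallel), and a single decoding layer is shared across all of them. The clustering, model sizes, and toy data are illustrative assumptions.

```python
# Sketch of a hard mixture of experts: hard-partition the data by nearest
# centroid, give each partition its own expert, and share one decoding
# (output) layer across all experts so embeddings live in a unified space.
import numpy as np

rng = np.random.default_rng(0)
n, d, k, h, c = 600, 8, 3, 16, 5  # samples, dims, experts, hidden, classes

X = rng.normal(size=(n, d))

# Naive partitioning: assign every point to its nearest (random) centroid.
centroids = rng.normal(size=(k, d))
assign = np.argmin(((X[:, None] - centroids[None]) ** 2).sum(-1), axis=1)

# One independent expert per partition (here just a random projection +
# ReLU); disjoint partitions mean experts could be trained in parallel.
experts = [rng.normal(scale=0.1, size=(d, h)) for _ in range(k)]
W_decode = rng.normal(scale=0.1, size=(h, c))  # single shared decoder

def predict(x):
    e = int(np.argmin(((x - centroids) ** 2).sum(-1)))  # route to 1 expert
    hidden = np.maximum(x @ experts[e], 0.0)
    return int(np.argmax(hidden @ W_decode))

preds = np.array([predict(x) for x in X])
print("predictions computed for", len(preds), "points")
```

At inference time only one expert runs per input, which is why evaluation stays cheap relative to the total model size.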