Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders
Many approaches in generalized zero-shot learning rely on cross-modal mapping
between the image feature space and the class embedding space. As labeled
images are expensive, one direction is to augment the dataset by generating
either images or image features. However, the former misses fine-grained
details and the latter requires learning a mapping associated with class
embeddings. In this work, we take feature generation one step further and
propose a model where a shared latent space of image features and class
embeddings is learned by modality-specific aligned variational autoencoders.
This leaves us with the required discriminative information about the image and
classes in the latent features, on which we train a softmax classifier. The key
to our approach is that we align the distributions learned from images and from
side-information to construct latent features that contain the essential
multi-modal information associated with unseen classes. We evaluate our learned
latent features on several benchmark datasets, i.e., CUB, SUN, AWA1 and AWA2,
and establish a new state of the art on generalized zero-shot as well as on
few-shot learning. Moreover, our results on ImageNet with various zero-shot
splits show that our latent features generalize well in large-scale settings.
Comment: Accepted at CVPR 2019
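The core of the approach is aligning the two modality-specific latent distributions. A common way to do this for diagonal Gaussian posteriors is to minimize their squared 2-Wasserstein distance; the sketch below illustrates that alignment term with toy values. The function name and the toy encoder outputs are illustrative assumptions, not the authors' code.

```python
import numpy as np

def gaussian_w2_sq(mu_a, logvar_a, mu_b, logvar_b):
    """Squared 2-Wasserstein distance between two diagonal Gaussians.

    Minimizing this pulls N(mu_a, diag(exp(logvar_a))) toward
    N(mu_b, diag(exp(logvar_b))), aligning the latent distributions
    produced by the two modality-specific encoders.
    """
    sigma_a = np.exp(0.5 * logvar_a)
    sigma_b = np.exp(0.5 * logvar_b)
    return np.sum((mu_a - mu_b) ** 2) + np.sum((sigma_a - sigma_b) ** 2)

# Toy posteriors from a hypothetical image encoder and class-embedding encoder
mu_img, logvar_img = np.array([0.5, -0.2]), np.array([0.0, 0.1])
mu_cls, logvar_cls = np.array([0.4, -0.1]), np.array([0.0, 0.0])
align_loss = gaussian_w2_sq(mu_img, logvar_img, mu_cls, logvar_cls)
```

In training, a term like `align_loss` would be added to the per-modality VAE objectives, and the softmax classifier would then be trained on samples drawn from the aligned latent space.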
Diversity inducing Information Bottleneck in Model Ensembles
Although deep learning models have achieved state-of-the-art performance on a
number of vision tasks, generalization over high dimensional multi-modal data,
and reliable predictive uncertainty estimation are still active areas of
research. Bayesian approaches including Bayesian Neural Nets (BNNs) do not
scale well to modern computer vision tasks, as they are difficult to train, and
have poor generalization under dataset-shift. This motivates the need for
effective ensembles which can generalize and give reliable uncertainty
estimates. In this paper, we target the problem of generating effective
ensembles of neural networks by encouraging diversity in prediction. We
explicitly optimize a diversity inducing adversarial loss for learning the
stochastic latent variables and thereby obtain diversity in the output
predictions necessary for modeling multi-modal data. We evaluate our method on
benchmark datasets: MNIST, CIFAR100, TinyImageNet and MIT Places 2, and,
compared to the most competitive baselines, show significant improvements in
classification accuracy under a shift in the data distribution and in
out-of-distribution detection. Code will be released at
https://github.com/rvl-lab-utoronto/dibs
Comment: AAAI 2021. Samarth Sinha* and Homanga Bharadhwaj* contributed equally
to this work.
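The objective rewards ensemble members whose predictions disagree with one another. One simple way to quantify that disagreement is the mean pairwise distance between members' softmax outputs, sketched below; the concrete measure and function name are illustrative assumptions, since the paper's actual loss is an adversarial term over stochastic latent variables rather than this exact statistic.

```python
import numpy as np

def prediction_diversity(probs):
    """Mean pairwise squared L2 distance between members' softmax outputs.

    probs: array of shape (n_members, n_classes). A diversity-inducing
    loss would reward a larger value of a term like this while each
    member still fits the training data.
    """
    n = probs.shape[0]
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += np.sum((probs[i] - probs[j]) ** 2)
            pairs += 1
    return total / pairs

# Two members that agree contribute no diversity ...
agree = np.array([[0.5, 0.5], [0.5, 0.5]])
# ... while members covering different modes contribute a lot.
disagree = np.array([[1.0, 0.0], [0.0, 1.0]])
```

Covering different output modes this way is what lets the ensemble model multi-modal data and flag out-of-distribution inputs through elevated predictive disagreement.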
-Networks for Efficient Model Patching
Models pre-trained on large-scale datasets are often finetuned to support
newer tasks and datasets that arrive over time. This process necessitates
storing copies of the model over time for each task that the pre-trained model
is finetuned to. Building on top of recent model patching work, we propose
-Patching for finetuning neural network models in an efficient manner,
without the need to store model copies. We propose a simple and lightweight
method called -Networks to achieve this objective. Our comprehensive
experiments across settings and architecture variants show that
-Networks outperform earlier model patching work while only requiring a
fraction of parameters to be trained. We also show that this approach can be
used for other problem settings such as transfer learning and zero-shot domain
adaptation, as well as other tasks such as detection and segmentation.
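The general idea of model patching is to keep one frozen copy of the pre-trained weights and store only a small per-task update, so finetuning no longer requires a full model copy per task. The sketch below illustrates this with a sparse additive delta; the class name and the sparse parameterization are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

class DeltaPatchedModel:
    """One frozen base weight matrix; only a sparse delta stored per task."""

    def __init__(self, base_weights):
        self.base = base_weights   # frozen, shared across all tasks
        self.deltas = {}           # task name -> (flat indices, values)

    def add_task(self, task, indices, values):
        # Only these few positions are trained for the new task.
        self.deltas[task] = (np.asarray(indices), np.asarray(values))

    def weights_for(self, task):
        # Reconstruct task weights on the fly; the base stays untouched.
        w = self.base.copy()
        idx, vals = self.deltas[task]
        w.flat[idx] += vals
        return w

base = np.zeros((2, 2))
model = DeltaPatchedModel(base)
model.add_task("segmentation", indices=[0], values=[1.0])
w_seg = model.weights_for("segmentation")
```

Storage then grows with the number of patched positions rather than the full parameter count, which is what makes supporting many tasks over time cheap.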