Evaluation of Output Embeddings for Fine-Grained Image Classification
Image classification has advanced significantly in recent years with the
availability of large-scale image sets. However, fine-grained classification
remains a major challenge due to the annotation cost of large numbers of
fine-grained categories. This project shows that compelling classification
performance can be achieved on such categories even without labeled training
data. Given image and class embeddings, we learn a compatibility function such
that matching embeddings are assigned a higher score than mismatching ones;
zero-shot classification of an image proceeds by finding the label yielding the
highest joint compatibility score. We use state-of-the-art image features and
focus on different supervised attributes and unsupervised output embeddings
either derived from hierarchies or learned from unlabeled text corpora. We
establish a substantially improved state-of-the-art on the Animals with
Attributes and Caltech-UCSD Birds datasets. Most encouragingly, we demonstrate
that purely unsupervised output embeddings (learned from Wikipedia and improved
with fine-grained text) achieve compelling results, even outperforming the
previous supervised state-of-the-art. By combining different output embeddings,
we further improve results.
Comment: @inproceedings{ARWLS15, title = {Evaluation of Output Embeddings for
Fine-Grained Image Classification}, booktitle = {IEEE Computer Vision and
Pattern Recognition}, year = {2015}, author = {Zeynep Akata and Scott Reed
and Daniel Walter and Honglak Lee and Bernt Schiele}}
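The zero-shot prediction rule described in this abstract (score every candidate label against the image and keep the best) can be illustrated with a minimal sketch. The abstract does not specify the form of the compatibility function, so the bilinear form F(x, y) = theta(x)^T W phi(y) used here is an assumption, as are the function names, the matrix W, and the toy embeddings. In practice W would be learned so that matching pairs score higher than mismatching ones.

```python
# Hypothetical sketch of zero-shot prediction with a compatibility function.
# Assumption: a bilinear form F(x, y) = theta(x)^T W phi(y); the abstract
# only says "compatibility function". All names and numbers are illustrative.


def compatibility(image_emb, W, class_emb):
    """Score one image embedding against one class embedding."""
    # projected[j] = sum_i image_emb[i] * W[i][j]
    projected = [
        sum(x_i * W[i][j] for i, x_i in enumerate(image_emb))
        for j in range(len(class_emb))
    ]
    return sum(p * c for p, c in zip(projected, class_emb))


def zero_shot_predict(image_emb, W, class_embs):
    """Return the label whose embedding yields the highest score."""
    return max(
        class_embs,
        key=lambda label: compatibility(image_emb, W, class_embs[label]),
    )


# Toy setup: 3-d image features, 2-d class embeddings (e.g. attributes).
W = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, -0.5],
]
class_embs = {
    "zebra": [1.0, 0.0],
    "whale": [0.0, 1.0],
}
image_emb = [0.9, 0.1, 0.2]
prediction = zero_shot_predict(image_emb, W, class_embs)
```

Because a class is represented only by its output embedding, a label with no training images can be added by inserting one more entry into `class_embs`.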
Beyond One-hot Encoding: lower dimensional target embedding
Target encoding plays a central role when learning Convolutional Neural
Networks. In this realm, One-hot encoding is the most prevalent strategy due to
its simplicity. However, this widespread encoding scheme assumes a flat
label space, thus ignoring rich relationships existing among labels that can be
exploited during training. In large-scale datasets, data does not span the full
label space, but instead lies in a low-dimensional output manifold. Following
this observation, we embed the targets into a low-dimensional space,
drastically improving convergence speed while preserving accuracy. Our
contribution is twofold: (i) we show that random projections of the label
space are a valid tool for finding such lower-dimensional embeddings,
dramatically boosting convergence rates at zero computational cost; and (ii) we propose
a normalized eigenrepresentation of the class manifold that encodes the targets
with minimal information loss, improving the accuracy of random projections
encoding while enjoying the same convergence rates. Experiments on CIFAR-100,
CUB200-2011, ImageNet, and MIT Places demonstrate that the proposed approach
drastically improves convergence speed while reaching very competitive accuracy
rates.
Comment: Published at Image and Vision Computing
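Contribution (i) in this abstract, random projections of the label space, can be sketched as follows. Multiplying a one-hot vector by a random matrix simply selects one row of that matrix, so each class receives a random low-dimensional target. The Gaussian entries, the unit normalization, the nearest-embedding decoding step, and all names below are assumptions made for illustration; the abstract does not state these details.

```python
import random

# Hypothetical sketch: low-dimensional targets from a random projection of
# one-hot labels. Gaussian entries, unit normalization, and decoding by
# highest dot product are assumptions, not details given in the abstract.


def random_target_embeddings(num_classes, dim, seed=0):
    """Return one unit-norm random vector per class.

    Row k equals the one-hot vector for class k times a random
    (num_classes x dim) matrix, followed by normalization.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(num_classes):
        row = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = sum(v * v for v in row) ** 0.5
        rows.append([v / norm for v in row])
    return rows


def encode(label, embeddings):
    """Low-dimensional regression target for an integer label."""
    return embeddings[label]


def decode(output, embeddings):
    """Map a network output back to the class with the closest embedding."""
    return max(
        range(len(embeddings)),
        key=lambda k: sum(o * e for o, e in zip(output, embeddings[k])),
    )


# 100 classes (as in CIFAR-100) embedded into 16 dimensions.
embeddings = random_target_embeddings(num_classes=100, dim=16)
target = encode(42, embeddings)
recovered = decode(target, embeddings)
```

A network trained against `target` then has 16 outputs instead of 100, and predictions are read off with `decode`.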