The Lovász-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks
The Jaccard index, also referred to as the intersection-over-union score, is
commonly employed in the evaluation of image segmentation results given its
perceptual qualities, its scale invariance, which lends appropriate relevance to
small objects, and its proper counting of false negatives, in comparison to
per-pixel losses. We present a method for direct optimization of the mean
intersection-over-union loss in neural networks, in the context of semantic
image segmentation, based on the convex Lovász extension of submodular
losses. The loss is shown to perform better with respect to the Jaccard index
measure than the traditionally used cross-entropy loss. We show quantitative
and qualitative differences between optimizing the Jaccard index per image
versus optimizing the Jaccard index taken over an entire dataset. We evaluate
the impact of our method in a semantic segmentation pipeline and show
substantially improved intersection-over-union segmentation scores on the
Pascal VOC and Cityscapes datasets using state-of-the-art deep learning
segmentation architectures.
Comment: Accepted as a conference paper at CVPR 201
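To make the Lovász extension of the Jaccard loss concrete, here is a minimal plain-Python sketch of the multi-class loss over a flattened set of pixels, assuming softmax probabilities have already been computed. For each class it forms the per-pixel errors, sorts them in decreasing order, and weights them by the discrete gradient of the Jaccard loss. Skipping classes absent from the ground truth is an assumption of this sketch, and `lovasz_grad` and `lovasz_softmax_flat` are hypothetical names chosen for illustration, not a reference implementation.

```python
from typing import List, Sequence


def lovasz_grad(gt_sorted: Sequence[int]) -> List[float]:
    """Discrete gradient of the Lovász extension of the Jaccard loss.

    gt_sorted holds 0/1 foreground indicators, permuted so that the matching
    errors are in decreasing order.
    """
    if not gt_sorted:
        return []
    total = sum(gt_sorted)
    jaccard = []
    cum_fg = cum_bg = 0
    for g in gt_sorted:
        cum_fg += g
        cum_bg += 1 - g
        # Jaccard loss when the pixels seen so far (largest errors) are mistakes.
        intersection = total - cum_fg
        union = total + cum_bg  # positive: total > 0, or the first pixel is background
        jaccard.append(1.0 - intersection / union)
    # Successive differences give the weight assigned to each sorted error.
    return [jaccard[0]] + [jaccard[i] - jaccard[i - 1] for i in range(1, len(jaccard))]


def lovasz_softmax_flat(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean over present classes of the Lovász extension of the Jaccard loss.

    probs[i][c] is the softmax probability of class c at pixel i;
    labels[i] is the ground-truth class of pixel i.
    """
    num_classes = len(probs[0])
    losses = []
    for c in range(num_classes):
        fg = [1 if y == c else 0 for y in labels]
        if sum(fg) == 0:
            continue  # assumption: ignore classes absent from the ground truth
        errors = [abs(f - p[c]) for f, p in zip(fg, probs)]
        order = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
        grad = lovasz_grad([fg[i] for i in order])
        losses.append(sum(errors[i] * g for i, g in zip(order, grad)))
    return sum(losses) / len(losses) if losses else 0.0
```

With hard one-hot predictions the value equals one minus the IoU, averaged over classes: for `probs = [[1, 0], [0, 1], [1, 0], [0, 1]]` and `labels = [0, 0, 1, 1]` each class has IoU 1/3, so the loss is 2/3. With soft probabilities the same computation gives a piecewise-linear, convex function of the errors, which is what makes it usable as a training loss.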
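The abstract contrasts the Jaccard index computed per image with the Jaccard index taken over an entire dataset. The toy sketch below, assuming binary masks stored as flat 0/1 lists, computes both for the same predictions; the helper names are hypothetical, and skipping images whose prediction and ground truth are both empty is one convention among several.

```python
from typing import List, Sequence, Tuple

Mask = Sequence[int]  # flat 0/1 foreground mask


def counts(pred: Mask, gt: Mask) -> Tuple[int, int]:
    """Intersection and union sizes of two binary masks."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter, union


def per_image_mean_iou(images: Sequence[Tuple[Mask, Mask]]) -> float:
    """Average of per-image IoU scores; images with an empty union are skipped."""
    scores: List[float] = []
    for pred, gt in images:
        inter, union = counts(pred, gt)
        if union > 0:
            scores.append(inter / union)
    if not scores:
        raise ValueError("every image has an empty union")
    return sum(scores) / len(scores)


def dataset_iou(images: Sequence[Tuple[Mask, Mask]]) -> float:
    """IoU from intersections and unions accumulated over the whole dataset."""
    total_inter = total_union = 0
    for pred, gt in images:
        inter, union = counts(pred, gt)
        total_inter += inter
        total_union += union
    return total_inter / total_union
```

For a pair of (prediction, ground truth) images where an 8-pixel object is segmented perfectly and a 1-pixel object is missed entirely, the per-image mean is 0.5 while the dataset-wide score is 8/9: the per-image average penalizes the missed small object far more, which is why the two optimization targets lead to different behaviour.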
MultiGrain: a unified image embedding for classes and instances
MultiGrain is a network architecture producing compact vector representations
that are suited both for image classification and particular object retrieval.
It builds on a standard classification trunk. The top of the network produces
an embedding containing coarse and fine-grained information, so that images can
be recognized based on the object class, the particular object, or whether they are
distorted copies. Our joint training is simple: we minimize a cross-entropy
loss for classification and a ranking loss that determines if two images are
identical up to data augmentation, with no need for additional labels. A key
component of MultiGrain is a pooling layer that takes advantage of
high-resolution images with a network trained at a lower resolution.
When fed to a linear classifier, the learned embeddings provide
state-of-the-art classification accuracy. For instance, we obtain 79.4% top-1
accuracy with a ResNet-50 learned on ImageNet, which is a +1.8% absolute
improvement over the AutoAugment method. When compared using cosine
similarity, the same embeddings perform on par with the state of the art for
image retrieval at moderate resolutions.
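The joint objective described above (a cross-entropy term for classification plus a ranking term on pairs of augmented images) can be sketched for a single sample as a weighted sum. This is an illustration under stated assumptions: the ranking term here is a standard contrastive margin loss swapped in as a stand-in, not necessarily the paper's exact ranking loss, and the weight `lam` and the `margin` value are hypothetical.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one sample, using the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def contrastive_loss(emb_a: Sequence[float], emb_b: Sequence[float],
                     same: bool, margin: float = 1.0) -> float:
    """Contrastive margin loss (a stand-in for the paper's ranking loss).

    Pulls embeddings of two augmentations of the same image together and
    pushes embeddings of different images at least `margin` apart.
    """
    d = math.dist(emb_a, emb_b)  # Euclidean distance, Python 3.8+
    return d ** 2 if same else max(0.0, margin - d) ** 2


def joint_loss(logits: Sequence[float], label: int,
               emb_a: Sequence[float], emb_b: Sequence[float],
               same: bool, lam: float = 0.5) -> float:
    """Weighted sum of classification and ranking terms; `lam` is hypothetical."""
    return lam * cross_entropy(logits, label) + (1.0 - lam) * contrastive_loss(emb_a, emb_b, same)
```

The `same` flag comes for free from data augmentation (two views of one image are a positive pair), which matches the abstract's point that no additional labels are needed.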
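The abstract does not name the pooling layer that lets a network trained at low resolution exploit high-resolution inputs. Assuming it is a generalized-mean (GeM) pooling, a common choice for retrieval embeddings, the operation on a feature map can be sketched as below; `gem_pool` and its exponent `p` are illustrative names for this sketch.

```python
from typing import List, Sequence


def gem_pool(feature_map: Sequence[Sequence[float]], p: float = 3.0) -> List[float]:
    """Generalized-mean pooling over spatial positions.

    feature_map[k] lists the non-negative activations (e.g. after a ReLU) of
    channel k at every spatial position. p = 1 gives average pooling; as p
    grows, the result approaches max pooling.
    """
    pooled = []
    for channel in feature_map:
        mean_pow = sum(x ** p for x in channel) / len(channel)
        pooled.append(mean_pow ** (1.0 / p))
    return pooled
```

Because the exponent is a single scalar, it can in principle be tuned for the input resolution without retraining the convolutional trunk; how the paper sets it is not stated in the abstract.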
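Retrieval with the learned embeddings amounts to nearest-neighbour search under cosine similarity. The minimal plain-Python version below assumes each image has already been mapped to a non-zero embedding vector; `cosine_similarity`, `retrieve`, and its `k` parameter are hypothetical names.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def retrieve(query: Sequence[float], database: Sequence[Sequence[float]], k: int = 3) -> List[int]:
    """Indices of the k database embeddings most similar to the query."""
    scored = [(cosine_similarity(query, emb), idx) for idx, emb in enumerate(database)]
    scored.sort(reverse=True)  # highest similarity first
    return [idx for _, idx in scored[:k]]
```

For example, `retrieve([1.0, 0.0], [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]])` ranks the database as `[1, 2, 0]`. At scale one would L2-normalize the embeddings once and use a matrix product or an approximate nearest-neighbour index, which yields the same ranking.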
- …