Why do These Match? Explaining the Behavior of Image Similarity Models
Explaining a deep learning model can help users understand its behavior and
allow researchers to discern its shortcomings. Recent work has primarily
focused on explaining models for tasks like image classification or visual
question answering. In this paper, we introduce Salient Attributes for Network
Explanation (SANE) to explain image similarity models, where a model's output
is a score measuring the similarity of two inputs rather than a classification
score. In this task, an explanation depends on both of the input images, so
standard methods do not apply. Each SANE explanation pairs a saliency map
identifying important image regions with an attribute that best explains the
match. We find that our explanations provide additional information not
typically captured by saliency maps alone, and can also improve performance on
the classic task of attribute recognition. Our approach's ability to generalize
is demonstrated on two datasets from diverse domains, Polyvore Outfits and
Animals with Attributes 2. Code available at:
https://github.com/VisionLearningGroup/SANE
Comment: Accepted at ECCV 202
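As a rough illustration of the problem the abstract describes, the sketch below computes a saliency map for a similarity model by occlusion: mask a patch of one input and record how much the pair's similarity score drops. This is a generic probe, not the authors' SANE method, and the names (`sim_fn`, `occlusion_saliency`, the dot-product similarity) are hypothetical stand-ins; pairing the map with an attribute would require an attribute classifier on top.

```python
import numpy as np

def occlusion_saliency(sim_fn, query, reference, patch=4):
    """Saliency for a similarity model via occlusion.

    Zero out each `patch` x `patch` region of `query` and record the
    drop in sim_fn(query, reference). A larger drop means the region
    mattered more to the match.
    """
    h, w = query.shape
    base = sim_fn(query, reference)          # unoccluded similarity
    sal = np.zeros((h, w))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = query.copy()
            occluded[i:i + patch, j:j + patch] = 0.0
            sal[i:i + patch, j:j + patch] = base - sim_fn(occluded, reference)
    return sal
```

For a toy dot-product similarity and an image whose only signal sits in the top-left corner, the map concentrates there, since occluding any other patch leaves the score unchanged.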
Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences
With the rapid proliferation of smart mobile devices, users now take millions
of photos every day. These include large numbers of clothing and accessory
images. We would like to answer questions like 'What outfit goes well with
this pair of shoes?' To answer these types of questions, one has to go beyond
learning visual similarity and learn a visual notion of compatibility across
categories. In this paper, we propose a novel learning framework to help answer
these types of questions. The main idea of this framework is to learn a feature
transformation from images of items into a latent space that expresses
compatibility. For the feature transformation, we use a Siamese Convolutional
Neural Network (CNN) architecture, where training examples are pairs of items
that are either compatible or incompatible. We model compatibility based on
co-occurrence in large-scale user behavior data; in particular co-purchase data
from Amazon.com. To learn cross-category fit, we introduce a strategic method
to sample training data, where pairs of items are heterogeneous dyads, i.e.,
the two elements of a pair belong to different high-level categories. While
this approach is applicable to a wide variety of settings, we focus on the
representative problem of learning compatible clothing style. Our results
indicate that the proposed framework is capable of learning semantic
information about visual style and is able to generate outfits of clothes, with
items from different categories, that go well together.
Comment: ICCV 201
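The training setup the abstract describes, pairs of items labeled compatible or incompatible, is commonly optimized with a contrastive loss over a Siamese embedding: compatible dyads are pulled together in the latent space, incompatible ones pushed at least a margin apart. The sketch below shows that loss in numpy under these assumptions; the function name and the margin value are illustrative, and the paper's actual architecture is a Siamese CNN rather than precomputed embeddings.

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, compatible, margin=1.0):
    """Contrastive loss over a batch of embedded item pairs.

    emb_a, emb_b: (N, D) arrays, the two sides of each dyad after the
                  shared (Siamese) feature transformation.
    compatible:   (N,) array of 1.0 (compatible) or 0.0 (incompatible).

    Compatible pairs are penalized by squared distance; incompatible
    pairs are penalized only while closer than `margin`.
    """
    d = np.linalg.norm(emb_a - emb_b, axis=1)          # pairwise distances
    pos = compatible * d ** 2                          # pull together
    neg = (1.0 - compatible) * np.maximum(margin - d, 0.0) ** 2  # push apart
    return 0.5 * float(np.mean(pos + neg))
```

The loss is zero exactly when every compatible pair coincides and every incompatible pair is at least `margin` apart, which is the geometry of "compatibility" the latent space is meant to express.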
- …