Learning Fashion Compatibility with Bidirectional LSTMs
The ubiquity of online fashion shopping demands effective recommendation
services for customers. In this paper, we study two types of fashion
recommendation: (i) suggesting an item that matches existing components in a
set to form a stylish outfit (a collection of fashion items), and (ii)
generating an outfit with multimodal (images/text) specifications from a user.
To this end, we propose to jointly learn a visual-semantic embedding and the
compatibility relationships among fashion items in an end-to-end fashion. More
specifically, we consider a fashion outfit to be a sequence (usually from top
to bottom and then accessories) and each item in the outfit as a time step.
Given the fashion items in an outfit, we train a bidirectional LSTM (Bi-LSTM)
model to sequentially predict the next item conditioned on previous ones to
learn their compatibility relationships. Further, we learn a visual-semantic
space by regressing image features to their semantic representations aiming to
inject attribute and category information as a regularization for training the
LSTM. The trained network can not only perform the aforementioned
recommendations effectively but also predict the compatibility of a given
outfit. We conduct extensive experiments on our newly collected Polyvore
dataset, and the results provide strong qualitative and quantitative evidence
that our framework outperforms alternative methods.
Comment: ACM MM 1
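To make the sequential formulation concrete, the following is a toy, forward-only sketch of scoring an outfit by predicting each next item from the previous ones. Everything here is a simplifying assumption: random (untrained) weights, a single forward LSTM cell rather than the paper's trained bidirectional model, dot-product scoring against a candidate pool, and no visual-semantic regularizer.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # toy embedding size (hypothetical; not the paper's dimensionality)

# An outfit as a sequence of item embeddings: top -> bottom -> shoes -> bag.
outfit = [rng.normal(size=D) for _ in range(4)]

# Random stand-in LSTM weights (a trained model would replace these).
W = {"x": rng.normal(size=(4 * D, D)) * 0.1,
     "h": rng.normal(size=(4 * D, D)) * 0.1,
     "b": np.zeros(4 * D)}

def lstm_step(x, h, c, W):
    """One standard LSTM cell step: input, forget, output, candidate gates."""
    z = W["x"] @ x + W["h"] @ h + W["b"]
    i, f, o, g = np.split(z, 4)
    sigm = lambda v: 1.0 / (1.0 + np.exp(-v))
    c = sigm(f) * c + sigm(i) * np.tanh(g)
    h = sigm(o) * np.tanh(c)
    return h, c

def compatibility(items, distractors):
    """Average log-probability of each true next item under a softmax
    over the hidden state's dot products with candidate embeddings."""
    h, c = np.zeros(D), np.zeros(D)
    logp = 0.0
    for t in range(len(items) - 1):
        h, c = lstm_step(items[t], h, c, W)
        scores = np.array([h @ x for x in distractors + [items[t + 1]]])
        logp += scores[-1] - np.log(np.exp(scores).sum())
    return logp / (len(items) - 1)

score = compatibility(outfit, [rng.normal(size=D) for _ in range(5)])
```

Since the true next item is always one of the softmax candidates, the score is a non-positive log-probability; higher (closer to zero) would indicate a more compatible sequence under a trained model.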
Toward Explainable Fashion Recommendation
Many studies have been conducted so far to build systems for recommending
fashion items and outfits. Although they achieve good performance in their
respective tasks, most of them cannot explain their judgments to the users,
which compromises their usefulness. Toward explainable fashion recommendation,
this study proposes a system that is able not only to provide a goodness score
for an outfit but also to explain the score by providing the reason behind it. For
this purpose, we propose a method for quantifying how influential each feature
of each item is to the score. Using this influence value, we can identify which
item and what feature make the outfit good or bad. We represent the image of
each item with a combination of human-interpretable features, and thereby the
identification of the most influential item-feature pair gives useful
explanation of the output score. To evaluate the performance of this approach,
we design an experiment that can be performed without human annotation; we
replace a single item-feature pair in an outfit so that the score will
decrease, and then we test if the proposed method can detect the replaced item
correctly using the above influence values. The experimental results show that
the proposed method can accurately detect the bad items that lower outfit
scores.
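The influence computation can be sketched as an ablation: perturb one item-feature entry at a time and record the score change. This is a generic stand-in, not the paper's method: the scorer below is a random linear-tanh function, the feature count and the zero baseline are assumptions, and the paper derives influence from its own learned network.

```python
import numpy as np

rng = np.random.default_rng(1)
N, F = 3, 6  # 3 items, each with F interpretable features (hypothetical)
outfit = rng.uniform(size=(N, F))

w = rng.normal(size=(N, F))  # random stand-in weights for the goodness scorer

def score(x):
    """Stand-in outfit goodness score (the paper uses a learned model)."""
    return float(np.tanh((w * x).sum()))

def influences(x, baseline=0.0):
    """Influence of each item-feature pair: score drop when that single
    entry is replaced by a baseline value (ablation-style attribution)."""
    base = score(x)
    infl = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x2 = x.copy()
            x2[i, j] = baseline
            infl[i, j] = base - score(x2)
    return infl

infl = influences(outfit)
# The most influential item-feature pair serves as the explanation.
item, feat = np.unravel_index(np.abs(infl).argmax(), infl.shape)
```

Because the features are human-interpretable, reporting the top `(item, feat)` pair (e.g. "the pattern of item 2") yields a usable explanation of the score.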
Compatibility Family Learning for Item Recommendation and Generation
Compatibility between items, such as clothes and shoes, is a major factor
in customers' purchasing decisions. However, learning "compatibility" is
challenging due to (1) broader notions of compatibility than those of
similarity, (2) the asymmetric nature of compatibility, and (3) only a small
set of compatible and incompatible items is observed. We propose an end-to-end
trainable system to embed each item into a latent vector and project a query
item into K compatible prototypes in the same space. These prototypes reflect
the broad notions of compatibility. We refer to both the embedding and
prototypes as "Compatibility Family". In our learned space, we introduce a
novel Projected Compatibility Distance (PCD) function which is differentiable
and ensures diversity by aiming for at least one prototype to be close to a
compatible item, whereas none of the prototypes are close to an incompatible
item. We evaluate our system on a toy dataset, two Amazon product datasets, and
Polyvore outfit dataset. Our method consistently achieves state-of-the-art
performance. Finally, we show that we can visualize the candidate compatible
prototypes using a Metric-regularized Conditional Generative Adversarial
Network (MrCGAN), where the input is a projected prototype and the output is a
generated image of a compatible item. We ask human evaluators to judge the
relative compatibility between our generated images and images generated by
CGANs conditioned directly on query items. Our generated images are
significantly preferred, with roughly twice the number of votes as the others.
Comment: 9 pages, accepted to AAAI 201
Creating Capsule Wardrobes from Fashion Images
We propose to automatically create capsule wardrobes. Given an inventory of
candidate garments and accessories, the algorithm must assemble a minimal set
of items that provides maximal mix-and-match outfits. We pose the task as a
subset selection problem. To permit efficient subset selection over the space
of all outfit combinations, we develop submodular objective functions capturing
the key ingredients of visual compatibility, versatility, and user-specific
preference. Since adding garments to a capsule only expands its possible
outfits, we devise an iterative approach to allow near-optimal submodular
function maximization. Finally, we present an unsupervised approach to learn
visual compatibility from "in the wild" full body outfit photos; the
compatibility metric translates well to cleaner catalog photos and improves
over existing methods. Our results on thousands of pieces from popular fashion
websites show that automatic capsule creation has potential to mimic skilled
fashionistas in assembling flexible wardrobes, while being significantly more
scalable.
Comment: Accepted to CVPR 201
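The subset-selection view can be sketched with the classic greedy algorithm for monotone submodular maximization, using outfit coverage as the objective. This is a deliberately small stand-in: the inventory, the compatibility set, and two-piece outfits are all hypothetical, and the paper's actual objective combines compatibility, versatility, and user preference with a specialized iterative optimizer rather than this plain greedy loop.

```python
# Hypothetical inventory and precomputed compatible (top, bottom) pairs.
tops = ["t1", "t2", "t3"]
bottoms = ["b1", "b2"]
compatible = {("t1", "b1"), ("t1", "b2"), ("t2", "b1"), ("t3", "b2")}

def n_outfits(capsule):
    """Objective: number of compatible outfits realizable from the capsule.
    Coverage-style, hence monotone submodular in the chosen item set."""
    s = set(capsule)
    return sum(1 for outfit in compatible if all(x in s for x in outfit))

def greedy_capsule(items, budget):
    """Greedy maximization: repeatedly add the item with the largest
    marginal gain (the standard (1 - 1/e) heuristic for this setting)."""
    capsule = []
    for _ in range(budget):
        best = max((x for x in items if x not in capsule),
                   key=lambda x: n_outfits(capsule + [x]))
        capsule.append(best)
    return capsule

cap = greedy_capsule(tops + bottoms, budget=3)
```

With three slots the greedy capsule already covers two mix-and-match outfits; adding items only ever grows the outfit set, which is the monotonicity the paper exploits in its iterative scheme.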