Why do These Match? Explaining the Behavior of Image Similarity Models
Explaining a deep learning model can help users understand its behavior and
allow researchers to discern its shortcomings. Recent work has primarily
focused on explaining models for tasks like image classification or visual
question answering. In this paper, we introduce Salient Attributes for Network
Explanation (SANE) to explain image similarity models, where a model's output
is a score measuring the similarity of two inputs rather than a classification
score. In this task, an explanation depends on both of the input images, so
standard methods do not apply. Our SANE explanations pair a saliency map
identifying important image regions with an attribute that best explains the
match. We find that our explanations provide additional information not
typically captured by saliency maps alone, and can also improve performance on
the classic task of attribute recognition. Our approach's ability to generalize
is demonstrated on two datasets from diverse domains, Polyvore Outfits and
Animals with Attributes 2. Code available at:
https://github.com/VisionLearningGroup/SANE
Comment: Accepted at ECCV 202
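The sketch below illustrates the two ingredients described above: a saliency map that depends on both images, and an attribute chosen to explain it. It assumes a black-box embedding function whose cosine similarity is the match score. This is not the SANE implementation. Occlusion (sliding-window) saliency is used here as a simple stand-in for a saliency method, and the attribute is picked by confidence-weighted agreement between the saliency map and per-attribute activation maps. `embed`, `attr_maps`, and `attr_conf` are all hypothetical.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two arrays, flattened."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def occlusion_saliency(embed, query, reference, patch=4, stride=4):
    """Saliency of `query` pixels for its similarity to `reference`.

    Each patch of the query is zeroed out, and the resulting drop in
    similarity is credited to the pixels under the patch (averaged over overlaps).
    """
    ref_emb = embed(reference)
    base = cosine(embed(query), ref_emb)
    h, w = query.shape[:2]
    sal = np.zeros((h, w))
    counts = np.zeros((h, w))
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            occluded = query.copy()
            occluded[y:y + patch, x:x + patch] = 0.0
            drop = base - cosine(embed(occluded), ref_emb)
            sal[y:y + patch, x:x + patch] += drop
            counts[y:y + patch, x:x + patch] += 1
    return sal / np.maximum(counts, 1)

def best_attribute(saliency, attr_maps, attr_conf):
    """Pick the attribute whose activation map best agrees with the saliency map,
    weighted by the (hypothetical) model's confidence in that attribute."""
    scores = [conf * max(cosine(saliency, amap), 0.0)
              for amap, conf in zip(attr_maps, attr_conf)]
    return int(np.argmax(scores)), scores

# Toy embedding (hypothetical): mean intensity of each quadrant of an 8x8 image.
embed = lambda img: img.reshape(2, 4, 2, 4).mean(axis=(1, 3)).ravel()

query = np.zeros((8, 8))
query[:4, :4] = 1.0   # strong top-left content
query[4:, 4:] = 0.2   # weak bottom-right content
reference = query.copy()

sal = occlusion_saliency(embed, query, reference)

# Two hypothetical attribute activation maps: one top-left, one bottom-right.
attr_a = np.zeros((8, 8)); attr_a[:4, :4] = 1.0
attr_b = np.zeros((8, 8)); attr_b[4:, 4:] = 1.0
best, scores = best_attribute(sal, [attr_a, attr_b], [0.5, 0.5])
print(best)  # 0: the top-left attribute explains the match
```

Because the saliency is measured as a drop in similarity to the reference, the same query can produce a different map for a different reference. This pair dependence is why single-image explanation methods do not carry over directly.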
Dressing as a Whole: Outfit Compatibility Learning Based on Node-wise Graph Neural Networks
With the rapid development of the fashion market, customers' demand for
fashion recommendation is rising. In this paper, we aim to
investigate a practical problem of fashion recommendation by answering the
question "which item should we select to match with the given fashion items and
form a compatible outfit". The key to this problem is to estimate the outfit
compatibility. Previous works, which focus on the compatibility of two items or
represent an outfit as a sequence, fail to make full use of the complex
relations among items in an outfit. To remedy this, we propose to represent an
outfit as a graph. In particular, we construct a Fashion Graph, where each node
represents a category and each edge represents the interaction between two
categories. Accordingly, each outfit can be represented as a subgraph by
putting items into their corresponding category nodes. To infer the outfit
compatibility from such a graph, we propose Node-wise Graph Neural Networks
(NGNN), which can better model node interactions and learn better node
representations. In NGNN, the node interaction differs from edge to edge and
is determined by parameters associated with the two connected nodes. An attention
mechanism is utilized to calculate the outfit compatibility score with learned
node representations. NGNN can model outfit compatibility not only from a single
visual or textual modality but also from multiple modalities. We conduct
experiments on two tasks: (1) Fill-in-the-blank: suggesting an item that
matches the existing components of an outfit; (2) Compatibility prediction:
predicting the compatibility scores of given outfits. Experimental results
demonstrate the superiority of the proposed method over competing methods.
Comment: 11 pages, accepted by the 2019 World Wide Web Conference (WWW-2019)
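The node-wise interaction and attention-based scoring described above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' NGNN code: the category list, feature size, random parameters, two propagation steps, the residual tanh update (a gated, GRU-style update is common in this family of models), and the exact readout form are all hypothetical. What it does show is that the transform applied on edge p to q is `W_in[q] @ W_out[p]`, so it differs per edge and depends on both endpoints. The fill-in-the-blank helper picks the candidate whose insertion yields the highest compatibility score.

```python
import numpy as np

rng = np.random.default_rng(0)

CATEGORIES = ["top", "bottom", "shoes", "bag"]  # hypothetical category nodes
D = 8                                           # hypothetical feature size

# Node-wise parameters: each category node owns an "output" and an "input"
# matrix, so the interaction on edge p -> q depends on both connected nodes.
W_out = {c: rng.normal(scale=0.3, size=(D, D)) for c in CATEGORIES}
W_in = {c: rng.normal(scale=0.3, size=(D, D)) for c in CATEGORIES}
w_att = rng.normal(size=D)    # hypothetical attention weights for the readout
w_score = rng.normal(size=D)  # hypothetical per-node score weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def propagate(h, steps=2):
    """h maps category -> item feature; only categories present form the subgraph."""
    nodes = list(h)
    for _ in range(steps):
        new_h = {}
        for q in nodes:
            # Message into q: sum over other nodes p of W_in[q] @ W_out[p] @ h[p].
            msg = sum(W_in[q] @ (W_out[p] @ h[p]) for p in nodes if p != q)
            # Simplified residual tanh update (stand-in for a gated update).
            new_h[q] = np.tanh(h[q] + msg)
        h = new_h
    return h

def compatibility(h):
    """Attention-weighted readout: sum_i sigmoid(w_att . h_i) * (w_score . h_i)."""
    h = propagate(h)
    return float(sum(sigmoid(w_att @ v) * (w_score @ v) for v in h.values()))

def fill_in_the_blank(outfit, category, candidates):
    """Return the index of the candidate that makes the outfit most compatible."""
    scores = [compatibility({**outfit, category: c}) for c in candidates]
    return int(np.argmax(scores)), scores

# Usage with random stand-in features (real inputs would be visual/textual features).
outfit = {"top": rng.normal(size=D), "bottom": rng.normal(size=D)}
candidates = [rng.normal(size=D) for _ in range(4)]
idx, scores = fill_in_the_blank(outfit, "shoes", candidates)
print(idx, [round(s, 3) for s in scores])
```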
Computational Technologies for Fashion Recommendation: A Survey
Fashion recommendation is a key research field in computational fashion
research and has attracted considerable interest in the computer vision,
multimedia, and information retrieval communities in recent years. Due to the
great demand for applications, various fashion recommendation tasks, such as
personalized fashion product recommendation, complementary (mix-and-match)
recommendation, and outfit recommendation, have been posed and explored in the
literature. The continuing research attention and advances motivate us to take
an in-depth look back at the field for a better understanding. In this paper, we
comprehensively review recent research efforts on fashion recommendation from a
technological perspective. We first introduce fashion recommendation at a macro
level and analyse its characteristics and its differences from general
recommendation tasks. We then clearly categorize different fashion
recommendation efforts into several sub-tasks and focus on each sub-task in
terms of its problem formulation, research focus, state-of-the-art methods, and
limitations. We also summarize the datasets proposed in the literature for use
in fashion recommendation studies to give readers a brief overview.
Finally, we discuss several promising directions for future research in this
field. Overall, this survey systematically reviews the development of fashion
recommendation research. It also discusses the current limitations and gaps
between academic research and the real needs of the fashion industry. In the
process, we offer deep insight into how the fashion industry could benefit
from fashion recommendation technologies.
Towards solving computer vision problems: datasets, labels, algorithms, and applications
The solution to a supervised computer vision problem consists of an application, an algorithm, input data, and a set of human-generated labels. Solving such tasks involves collecting large quantities of data, collecting appropriate labels, and developing machine vision algorithms tailored to the application. Progress on these problems has often benefited from large-scale datasets with high-fidelity labels. Successful algorithms display a synergy between application goals and the size and quality of the dataset. This thesis presents work highlighting the importance of each component of a supervised vision task.

First, the problem of automatically classifying groups of people into social categories, called Urban Tribe Classification, is introduced. To tackle this problem, both each individual and the group as a whole are modeled. Since this was a newly introduced computer vision problem, a dataset for the task was created. On this dataset, the combined representation of the group and its individuals outperforms using the individual representations alone. This model showed promising results for automatic subculture classification.

Second, the problem of creating perceptual embeddings from human similarity judgments is tackled. This work focuses on triplet similarity comparisons, which ask whether one object is more similar to a second object or to a third, and which have been useful in computer vision and machine learning applications. Unfortunately, triplet similarity comparisons, like many human labeling efforts, can be prohibitively expensive. This work proposes two techniques for dealing with this obstacle. First, an alternative display for collecting triplets is designed. This display shows a probe image and a grid of query images, allowing the user to provide multiple triplets simultaneously. The display is shown to reduce the cost and time of triplet collection. In addition, the improved triplet collection UI yields higher-quality embeddings.
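One way to see why a grid display is cheaper than one-at-a-time questions: if the worker marks the grid images most similar to the probe, every (selected, unselected) pair yields one triplet, so a single screen produces many comparisons. The sketch assumes exactly that answer format; the image identifiers and the 1-D toy embedding are hypothetical, and `triplet_accuracy` is simply a common way to check how well an embedding agrees with collected triplets.

```python
from itertools import product

import numpy as np

def grid_to_triplets(probe, grid, selected):
    """Expand one grid answer into (anchor, closer, farther) triplets.

    Assumes the worker marked `selected` as the grid images most similar to
    `probe`; each selected/unselected pair then yields one triplet.
    """
    chosen = set(selected)
    rest = [g for g in grid if g not in chosen]
    return [(probe, s, u) for s, u in product(selected, rest)]

def triplet_accuracy(embedding, triplets):
    """Fraction of triplets (a, b, c) with ||e_a - e_b|| < ||e_a - e_c||."""
    satisfied = 0
    for a, b, c in triplets:
        ea, eb, ec = embedding[a], embedding[b], embedding[c]
        satisfied += int(np.linalg.norm(ea - eb) < np.linalg.norm(ea - ec))
    return satisfied / len(triplets)

# One hypothetical screen: probe "a", a grid of 8 images, 2 of them selected.
grid = list("bcdefghi")
triplets = grid_to_triplets("a", grid, ["b", "c"])

# Hypothetical 1-D embedding in which "b" and "c" lie closest to "a".
embedding = {name: np.array([float(i)]) for i, name in enumerate("abcdefghi")}
acc = triplet_accuracy(embedding, triplets)
print(len(triplets), acc)
```

Here 2 selections out of 8 grid images give 2 × 6 = 12 triplets from a single screen, where a one-triplet-per-question interface would need 12 separate questions.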
A dataset of human taste similarity over 10,000 food items was created using the grid UI. Second, SNaCK, a low-dimensional perceptual embedding algorithm that combines human expertise with automatic machine kernels, is introduced. The two parts are complementary: human insight can capture relationships that are not apparent from the objects' visual similarity, and the machine can relieve the human of having to exhaustively specify many constraints.

Finally, the precise localization of key frames of an action is explored. This work focuses on detecting the exact starting frame of a behavior, an important task for neuroscience research. To address this problem, a loss is designed that penalizes extra and missed action start detections more heavily than small misalignments. Recurrent neural networks (RNNs) are trained to optimize this loss. The model is shown to reduce the number of false positives, an important criterion defined by the neuroscientists. The model is evaluated on a new dataset, the Mouse Reach Dataset, a large annotated video dataset of mice performing a sequence of actions, created for neuroscience research. On this dataset, the proposed model outperforms related approaches and baseline methods that use an unstructured loss.
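The structure of such a loss can be illustrated with a matching cost between predicted and annotated start frames. This is a hypothetical, non-differentiable sketch, not the loss used in the thesis: predictions and ground-truth starts are matched in temporal order by dynamic programming, a matched pair within `tol` frames costs its frame offset, and each unmatched prediction or missed start costs a fixed penalty. Training an RNN would need a differentiable surrogate of something like this; `tol`, `fp_cost`, and `fn_cost` are assumed values.

```python
import math

def start_detection_cost(preds, gts, tol=3, fp_cost=5.0, fn_cost=5.0):
    """Order-preserving matching cost between predicted and true start frames.

    Matched pairs (at most `tol` frames apart) pay their offset; every unmatched
    prediction pays `fp_cost` and every unmatched true start pays `fn_cost`.
    Since fp_cost and fn_cost exceed tol, an extra or missed detection always
    costs more than a small misalignment.
    """
    p, g = sorted(preds), sorted(gts)
    n, m = len(p), len(g)
    # dp[i][j]: minimum cost of accounting for the first i predictions and first j true starts.
    dp = [[math.inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:  # prediction i-1 left unmatched (false positive)
                dp[i][j] = min(dp[i][j], dp[i - 1][j] + fp_cost)
            if j > 0:  # true start j-1 left unmatched (missed detection)
                dp[i][j] = min(dp[i][j], dp[i][j - 1] + fn_cost)
            if i > 0 and j > 0 and abs(p[i - 1] - g[j - 1]) <= tol:
                dp[i][j] = min(dp[i][j], dp[i - 1][j - 1] + abs(p[i - 1] - g[j - 1]))
    return dp[n][m]

gts = [10, 50]
print(start_detection_cost([11, 50], gts))      # small misalignment: 1.0
print(start_detection_cost([10, 30, 50], gts))  # one extra detection: 5.0
print(start_detection_cost([10], gts))          # one missed start: 5.0
print(start_detection_cost([14, 50], gts))      # offset beyond tol: extra + miss = 10.0
```

Treating an offset beyond `tol` as one false positive plus one miss is a design choice of this sketch; it keeps large misalignments from being cheaper than the errors the thesis identifies as most important.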
Deep Learning for Fashion: Outfit Coherence Evaluation, Rating, and Recommendation
Tohoku University, Takayuki Okatani (岡谷貴之)
- …