Fine-grained Apparel Classification and Retrieval without rich annotations
The ability to correctly classify and retrieve apparel images has a variety
of applications important to e-commerce, online advertising and internet
search. In this work, we propose a robust framework for fine-grained apparel
classification and for in-shop and cross-domain retrieval that eliminates the
need for rich annotations such as bounding boxes, human joints, or clothing
landmarks, and for training a bounding-box or key-landmark detector.
Factors such as subtle appearance differences, variations in human poses,
different shooting angles, apparel deformations, and self-occlusion add to the
challenges in classification and retrieval of apparel items. Cross-domain
retrieval is even harder because of the large variation between online
shopping images, usually taken with ideal lighting, pose, favorable angles, and
clean backgrounds, and street photos captured by users under complicated
conditions with poor lighting and cluttered scenes. Our framework
uses a compact bilinear CNN with the tensor-sketch algorithm to generate embeddings
that capture local pairwise feature interactions in a translationally invariant
manner. For apparel classification, we pass the feature embeddings through a
softmax classifier, while the in-shop and cross-domain retrieval pipelines use
a triplet-loss based optimization approach, such that squared Euclidean
distance between embeddings measures the dissimilarity between the images.
Unlike previous works that relied on bounding-box, key clothing landmark, or
human joint detectors to assist the final deep classifier, the proposed
framework can be trained directly on the provided category labels or on
generated triplets for triplet-loss optimization. Lastly, experimental results
on the DeepFashion
fine-grained categorization, and in-shop and consumer-to-shop retrieval
datasets provide a comparative analysis with previous work performed in the
domain.
Comment: 14 pages, 6 figures, 3 tables. Submitted to the Springer Journal of
Applied Intelligence.
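The retrieval objective lends itself to a short illustration. Below is a minimal PyTorch sketch of a triplet loss with squared Euclidean distance between embeddings, as the abstract describes; the margin value is a placeholder, since the paper's hyperparameters are not given here.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss over embedding batches of shape (B, D). The margin
    value is an arbitrary placeholder, not taken from the paper."""
    d_pos = (anchor - positive).pow(2).sum(dim=1)  # squared Euclidean distance
    d_neg = (anchor - negative).pow(2).sum(dim=1)
    # Pull the positive closer to the anchor than the negative, by `margin`.
    return F.relu(d_pos - d_neg + margin).mean()
```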
A Deep-Learning-Based Fashion Attributes Detection Model
Analyzing fashion attributes is essential in the fashion design process.
Fashion forecasting firms such as WGSN utilize information from all around
the world (fashion shows, visual merchandising, blogs, etc.). They gather
information by experience, observation, media scans, interviews, and exposure
to new things. This information-analysis process is called abstracting:
recognizing similarities or differences across garments and collections.
Such an abstraction ability is useful in many
fashion careers with different purposes. Fashion forecasters abstract across
design collections and across time to identify fashion change and directions;
designers, product developers, and buyers abstract across a group of garments
and collections to develop cohesive and visually appealing lines; sales and
marketing executives abstract across product lines each season to recognize
selling points; fashion journalists and bloggers abstract across runway photos
to recognize symbolic core concepts that can be translated into editorial
features. Fashion attribute analysis for such fashion insiders requires much
more detailed and in-depth attribute annotation than that for consumers, and
requires inference across multiple domains. In this project, we propose a
data-driven approach for recognizing fashion attributes. Specifically, a
modified version of Faster R-CNN model is trained on images from a large-scale
localization dataset with 594 fine-grained attributes under different
scenarios, for example in online stores and street snapshots. This model will
then be used to detect garment items and classify clothing attributes for
runway photos and fashion illustrations.
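As a hedged illustration of the kind of modification described (the paper's exact architecture changes are not specified here), one common way to adapt Faster R-CNN to a new attribute vocabulary in torchvision is to swap its box-classification head:

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_ATTRIBUTES = 594  # fine-grained attribute classes, per the abstract

# Start from a COCO-pretrained detector and resize its classification head.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
# +1 accounts for torchvision's background class.
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_ATTRIBUTES + 1)
```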
Looking at Outfit to Parse Clothing
This paper extends fully-convolutional neural networks (FCN) for the clothing
parsing problem. Clothing parsing requires higher-level knowledge on clothing
semantics and contextual cues to disambiguate fine-grained categories. We
extend the FCN architecture with a side-branch network, which we refer to as
the outfit encoder, that predicts a consistent set of clothing labels to
encourage combinatorial preference, and with a conditional random field (CRF)
to explicitly enforce coherent label assignment for the given image. The
empirical results using the Fashionista and CFPD datasets show that our model
achieves
state-of-the-art performance in clothing parsing, without additional
supervision during training. We also study the qualitative influence of
annotation on the current clothing parsing benchmarks, using our Web-based
tool for multi-scale pixel-wise annotation and a manual refinement effort on
the Fashionista dataset. Finally, we show that the image representation of the
outfit encoder is useful for a dress-up image retrieval application.
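A minimal sketch of what such a side branch could look like, assuming a global-pooling design (the paper's actual outfit encoder architecture may differ):

```python
import torch.nn as nn

class OutfitEncoderBranch(nn.Module):
    """Hypothetical side branch: pools backbone feature maps into a global
    vector and predicts which clothing labels appear anywhere in the image,
    to be trained jointly with the per-pixel FCN logits."""
    def __init__(self, in_channels: int, num_labels: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_channels, num_labels)

    def forward(self, features):              # features: (B, C, H, W)
        g = self.pool(features).flatten(1)    # (B, C)
        return self.fc(g)                     # multi-label logits (B, num_labels)
```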
Learning the Latent "Look": Unsupervised Discovery of a Style-Coherent Embedding from Fashion Images
What defines a visual style? Fashion styles emerge organically from how
people assemble outfits of clothing, making them difficult to pin down with a
computational model. Low-level visual similarity can be too specific to detect
stylistically similar images, while manually crafted style categories can be
too abstract to capture subtle style differences. We propose an unsupervised
approach to learn a style-coherent representation. Our method leverages
probabilistic polylingual topic models based on visual attributes to discover a
set of latent style factors. Given a collection of unlabeled fashion images,
our approach mines for the latent styles, then summarizes outfits by how they
mix those styles. Our approach can organize galleries of outfits by style
without requiring any style labels. Experiments on over 100K images demonstrate
its promise for retrieving, mixing, and summarizing fashion images by their
style.
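To make the topic-model idea concrete: treating each outfit as a "document" of attribute tokens, even a plain LDA (a simplified stand-in for the polylingual topic model the paper uses) recovers a style mixture per outfit. The attribute tokens below are invented for illustration.

```python
from gensim import corpora, models

# Each outfit is a "document" of detected attribute tokens (invented examples).
outfits = [
    ["floral", "maxi-dress", "sandals", "pastel"],
    ["leather", "boots", "skinny-jeans", "studded"],
    ["floral", "sundress", "sandals", "straw-hat"],
]
dictionary = corpora.Dictionary(outfits)
corpus = [dictionary.doc2bow(doc) for doc in outfits]

# Plain LDA as a simplified stand-in for the paper's polylingual topic model.
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, random_state=0)

# Summarize an outfit by how it mixes the latent "style" topics.
print(lda.get_document_topics(corpus[0]))
```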
Complete the Look: Scene-based Complementary Product Recommendation
Modeling fashion compatibility is challenging due to its complexity and
subjectivity. Existing work focuses on predicting compatibility between product
images (e.g. an image containing a t-shirt and an image containing a pair of
jeans). However, these approaches ignore real-world 'scene' images (e.g.
selfies); such images are hard to deal with due to their complexity, clutter,
and variations in lighting and pose, but on the other hand could potentially
provide key context (e.g. the user's body type or the season) for making more
accurate recommendations. In this work, we propose a new task called 'Complete
the Look', which seeks to recommend visually compatible products based on scene
images. We design an approach to extract training data for this task, and
propose a novel way to learn the scene-product compatibility from fashion or
interior design images. Our approach measures compatibility both globally and
locally via CNNs and attention mechanisms. Extensive experiments show that our
method achieves significant performance gains over alternative systems. Human
evaluation and qualitative analysis are also conducted to further understand
model behavior. We hope this work could lead to useful applications which link
large corpora of real-world scenes with shoppable products.
Comment: Accepted to CVPR'19.
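The global-plus-local scoring idea can be sketched as follows; the region features, the softmax attention, and the mixing weight are all illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def compatibility(scene_regions, scene_global, product, w_local=0.5):
    """Hedged sketch of a global + attended-local compatibility score.
    scene_regions: (R, D) region embeddings; scene_global, product: (D,).
    Inputs are assumed L2-normalized; `w_local` is an illustrative weight."""
    global_score = scene_global @ product            # whole-scene match
    region_scores = scene_regions @ product          # per-region match, (R,)
    attn = F.softmax(region_scores, dim=0)           # focus on relevant regions
    local_score = (attn * region_scores).sum()       # attention-weighted match
    return (1 - w_local) * global_score + w_local * local_score
```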
Snap and Find: Deep Discrete Cross-domain Garment Image Retrieval
With the increasing number of online stores, there is a pressing need for
intelligent search systems to understand the item photos snapped by customers
and search against large-scale product databases to find their desired items.
However, it is challenging for conventional retrieval systems to match up the
item photos captured by customers and the ones officially released by stores,
especially for garment images. To bridge the customer- and store-provided
garment photos, existing studies have widely exploited clothing
attributes (e.g., black) and landmarks (e.g., collar) to
learn a common embedding space for garment representations. Unfortunately,
they omit the sequential correlation of attributes and consume a large amount
of human labor to label the landmarks. In this paper, we propose a deep
multi-task cross-domain hashing method, termed DMCH, in which cross-domain
embedding and sequential attribute learning are modeled simultaneously.
Sequential attribute learning not only provides the semantic guidance for
embedding, but also generates rich attention on discriminative local details
(e.g., black buttons) of clothing items without requiring extra
landmark labels. This leads to promising performance and a 306x boost in
efficiency compared with state-of-the-art models, as
demonstrated through rigorous experiments on two public fashion datasets.
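The efficiency gain from hashing comes from ranking by Hamming distance over binary codes instead of real-valued distances. A minimal sketch follows; sign binarization is a common choice in deep hashing, though DMCH's exact scheme may differ.

```python
import numpy as np

def to_hash_codes(embeddings):
    """Binarize real-valued embeddings by sign thresholding (a common deep
    hashing choice; DMCH's exact binarization may differ)."""
    return (embeddings > 0).astype(np.uint8)

def hamming_retrieve(query_code, db_codes, k=5):
    """Rank database items by Hamming distance to the query's hash code."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists)[:k]
```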
Studio2Shop: from studio photo shoots to fashion articles
Fashion is an increasingly important topic in computer vision, in particular
the so-called street-to-shop task of matching street images with shop images
containing similar fashion items. Solving this problem promises new means of
making fashion searchable and helping shoppers find the articles they are
looking for. This paper focuses on finding pieces of clothing worn by a person
in full-body or half-body images with neutral backgrounds. Such images are
ubiquitous on the web and in fashion blogs, and are typically studio photos;
we refer to this setting as studio-to-shop. Recent advances in computational
fashion include the development of domain-specific numerical representations.
Our model Studio2Shop builds on top of such representations and uses a deep
convolutional network trained to match a query image to the numerical feature
vectors of all the articles annotated in this image. Top-k retrieval
evaluation on test query images shows that the correct items are most often
found within a range that is sufficiently small for building realistic visual
search engines for the studio-to-shop setting.
Comment: 12 pages, 9 figures (Figure 1 has 5 subfigures, Figure 2 has 3
subfigures), 7 tables.
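At retrieval time, the core operation reduces to scoring a query embedding against the pre-computed article feature vectors. A minimal dot-product sketch follows; Studio2Shop's actual matcher is a learned deep network, so this is a simplified stand-in.

```python
import numpy as np

def rank_articles(query_embedding, article_features):
    """Return article indices sorted by descending similarity to the query.
    article_features: (N, D) pre-computed vectors; query_embedding: (D,).
    Dot-product scoring is an illustrative simplification."""
    scores = article_features @ query_embedding
    return np.argsort(-scores)
```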
Fashion-Gen: The Generative Fashion Dataset and Challenge
We introduce a new dataset of 293,008 high definition (1360 x 1360 pixels)
fashion images paired with item descriptions provided by professional stylists.
Each item is photographed from a variety of angles. We provide baseline results
on 1) high-resolution image generation, and 2) image generation conditioned on
the given text descriptions. We invite the community to improve upon these
baselines. In this paper, we also outline the details of a challenge that we
are launching based upon this dataset.
Query-free Clothing Retrieval via Implicit Relevance Feedback
Image-based clothing retrieval is receiving increasing interest with the
growth of online shopping. In practice, users may often have a desired piece of
clothing in mind (e.g., either having seen it before on the street or requiring
certain specific clothing attributes) but may be unable to supply an image as a
query. We model this problem as a new type of image retrieval task in which the
target image resides only in the user's mind (called "mental image retrieval"
hereafter). Because of the absence of an explicit query image, we propose to
solve this problem through relevance feedback. Specifically, a new Bayesian
formulation is proposed that simultaneously models the retrieval target and its
high-level representation in the mind of the user (called the "user metric"
hereafter) as posterior distributions of pre-fetched shop images and
heterogeneous features extracted from multiple clothing attributes,
respectively. Requiring only clicks as user feedback, the proposed algorithm is
able to account for the variability in human decision-making. Experiments with
real users demonstrate the effectiveness of the proposed algorithm.
Comment: 12 pages, under review at IEEE Transactions on Multimedia.
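A minimal sketch of one click-based Bayesian update, assuming a softmax choice model for which displayed image the user clicks; the paper's actual likelihood and user-metric model are richer than this.

```python
import numpy as np

def update_posterior(prior, similarities, clicked):
    """One round of click feedback. prior: (N,) belief over candidate targets;
    similarities[i, j]: similarity of candidate i to displayed image j;
    clicked: index of the image the user clicked. The softmax choice model
    below is an illustrative assumption, not the paper's exact formulation."""
    # P(click = j | target = i) under a softmax over displayed images.
    choice = np.exp(similarities)
    choice /= choice.sum(axis=1, keepdims=True)
    posterior = prior * choice[:, clicked]
    return posterior / posterior.sum()
```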
VersatileGait: A Large-Scale Synthetic Gait Dataset with Fine-Grained Attributes and Complicated Scenarios
Motivated by practical gait recognition applications, we propose to
automatically create a large-scale synthetic gait dataset (called
VersatileGait) with a game engine; it consists of around one million
silhouette sequences of 11,000 subjects with fine-grained attributes in various
complicated scenarios. Compared with existing real gait datasets with limited
samples and simple scenarios, the proposed VersatileGait dataset possesses
several desirable properties, including a huge dataset size, high sample
diversity, high-quality annotations, multiple pitch angles, and a small domain
gap with real data. Furthermore, we investigate the effectiveness of our dataset (e.g.,
domain transfer after pretraining). Then, we use the fine-grained attributes
from VersatileGait to improve gait recognition in both accuracy and speed, and
meanwhile evaluate gait recognition performance under multi-pitch-angle
settings. Additionally, we explore a variety of potential applications for
research. Extensive experiments demonstrate the value and effectiveness of the
proposed VersatileGait in gait recognition along with its associated
applications. We will release both VersatileGait and its corresponding data
generation toolkit for further studies.