Outfit Recommender System
The online apparel retail market in the United States is worth about seventy-two billion US dollars, and recommendation systems on retail websites generate a substantial share of this revenue, so improving them can directly increase sales. Traditional clothing recommendation relied on lexical methods, but visual recommendation has gained popularity in recent years. It involves processing large numbers of images with a variety of image-processing techniques, and deep neural networks have been used extensively to handle such volumes; with fast Graphics Processing Units, these networks produce highly accurate results in little time. There is still room to improve clothing recommendation, however. We propose an event-based clothing recommendation system that uses object detection. We train one model to identify nine events/scenarios a user might attend: White Wedding, Indian Wedding, Conference, Funeral, Red Carpet, Pool Party, Birthday, Graduation, and Workout. We train another model to detect the clothes worn at the event, across fifty-three clothing categories. Object detection achieves a mAP of 84.01. The nearest neighbors of the detected clothes are recommended to the user.
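The final step above, recommending nearest neighbors of the detected garments, can be sketched as follows. This is a minimal illustration assuming detected items and catalog entries are represented as feature vectors (e.g. CNN embeddings); the function `recommend_similar` and the toy data are hypothetical, not the paper's code.

```python
import numpy as np

def recommend_similar(query_vec, catalog_vecs, k=3):
    """Return indices of the k catalog items closest to the query.

    Uses cosine similarity between feature vectors, e.g. embeddings of
    clothing items cropped out by the object detector.
    """
    q = query_vec / np.linalg.norm(query_vec)
    c = catalog_vecs / np.linalg.norm(catalog_vecs, axis=1, keepdims=True)
    sims = c @ q                  # cosine similarity to each catalog item
    return np.argsort(-sims)[:k]  # indices of the k most similar items

# Toy example: 4 catalog items in a 3-D feature space (real embeddings
# would be hundreds of dimensions).
catalog = np.array([[1.0, 0.0, 0.0],
                    [0.9, 0.1, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
query = np.array([1.0, 0.05, 0.0])
print(recommend_similar(query, catalog, k=2))  # two closest items: [0 1]
```

In practice the linear scan would be replaced by an approximate nearest-neighbor index when the catalog is large, but the ranking logic is the same.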
Parsing is All You Need for Accurate Gait Recognition in the Wild
Binary silhouettes and keypoint-based skeletons have dominated human gait
recognition studies for decades since they are easy to extract from video
frames. Despite their success in gait recognition for in-the-lab environments,
they usually fail in real-world scenarios due to their low information entropy
for gait representations. To achieve accurate gait recognition in the wild,
this paper presents a novel gait representation, named Gait Parsing Sequence
(GPS). GPSs are sequences of fine-grained human segmentation, i.e., human
parsing, extracted from video frames, so they have much higher information
entropy to encode the shapes and dynamics of fine-grained human parts during
walking. Moreover, to effectively explore the capability of the GPS
representation, we propose a novel human parsing-based gait recognition
framework, named ParsingGait. ParsingGait contains a Convolutional Neural
Network (CNN)-based backbone and two lightweight heads. The first head
extracts global semantic features from GPSs, while the other one learns mutual
information of part-level features through Graph Convolutional Networks to
model the detailed dynamics of human walking. Furthermore, due to the lack of
suitable datasets, we build the first parsing-based dataset for gait
recognition in the wild, named Gait3D-Parsing, by extending the large-scale and
challenging Gait3D dataset. Based on Gait3D-Parsing, we comprehensively
evaluate our method and existing gait recognition methods. The experimental
results show a significant improvement in accuracy brought by the GPS
representation and the superiority of ParsingGait. The code and dataset are
available at https://gait3d.github.io/gait3d-parsing-hp.
Comment: 16 pages, 14 figures, ACM MM 2023 accepted.
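The second ParsingGait head learns part-level features through graph convolutions over body parts. Below is a minimal NumPy sketch of one standard graph-convolution step (symmetric-normalized adjacency propagation); the part graph, feature dimensions, and names are illustrative assumptions, not the ParsingGait implementation.

```python
import numpy as np

def gcn_layer(features, adjacency, weights):
    """One graph-convolution step over part-level features.

    features:  (P, F) matrix, one F-dim feature per body part.
    adjacency: (P, P) binary matrix linking physically adjacent parts.
    weights:   (F, F_out) learnable projection (random here).
    """
    a_hat = adjacency + np.eye(adjacency.shape[0])         # add self-loops
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt             # symmetric normalization
    return np.maximum(norm_adj @ features @ weights, 0.0)  # propagate + ReLU

# Toy graph: 4 parts in a chain (e.g. head-torso-legs-feet).
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
parts = rng.standard_normal((4, 8))   # 8-dim feature per part
w = rng.standard_normal((8, 16))
out = gcn_layer(parts, adj, w)
print(out.shape)  # (4, 16): each part's feature now mixes its neighbors'
```

Each step lets a part's feature absorb information from adjacent parts, which is how a GCN head can model the coupled dynamics of fine-grained body parts during walking.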
On Symbiosis of Attribute Prediction and Semantic Segmentation
In this paper, we propose to employ semantic segmentation to improve
person-related attribute prediction. The core idea lies in the fact that the
probability of an attribute to appear in an image is far from being uniform in
the spatial domain. We build our attribute prediction model jointly with a deep
semantic segmentation network. This harnesses the localization cues learned by
the semantic segmentation to guide the attention of the attribute prediction to
the regions where different attributes naturally show up. Therefore, in
addition to prediction, we are able to localize the attributes despite merely
having access to image-level labels (weak supervision) during training. We
first propose semantic segmentation-based pooling and gating, respectively
denoted as SSP and SSG. In the former, the estimated segmentation masks are
used to pool the final activations of the attribute prediction network, from
multiple semantically homogeneous regions. In SSG, the same idea is applied to
the intermediate layers of the network. SSP and SSG, while effective, impose
heavy memory utilization since each channel of the activations is pooled/gated
with all the semantic segmentation masks. To circumvent this, we propose
Symbiotic Augmentation (SA), where we learn only one mask per activation
channel. SA allows the model to either pick one, or combine (weighted
superposition) multiple semantic maps, in order to generate the proper mask for
each channel. SA simultaneously applies the same mechanism to the reverse
problem by leveraging output logits of attribute prediction to guide the
semantic segmentation task. We evaluate our proposed methods for facial
attributes on CelebA and LFWA datasets, while benchmarking WIDER Attribute and
Berkeley Attributes of People for whole body attributes. Our proposed methods
achieve superior results compared to previous work.
Comment: Accepted for publication in PAMI. arXiv admin note: substantial text overlap with arXiv:1704.0874
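The semantic segmentation-based pooling (SSP) idea above, pooling the activations separately under each segmentation mask, can be sketched as follows. This is a toy NumPy illustration under assumed shapes, not the paper's implementation; note how every one of the C channels is pooled with every one of the S masks, which is the S-times memory blow-up the abstract says motivates Symbiotic Augmentation.

```python
import numpy as np

def semantic_pool(activations, masks):
    """Pool a feature map separately over each semantic region (SSP-style).

    activations: (C, H, W) final activations of the attribute network.
    masks:       (S, H, W) soft segmentation masks, one per semantic region.
    Returns an (S, C) matrix: one pooled descriptor per region.
    """
    c, h, w = activations.shape
    s = masks.shape[0]
    flat_act = activations.reshape(c, h * w)    # (C, HW)
    flat_masks = masks.reshape(s, h * w)        # (S, HW)
    # Normalize each mask so pooling is a weighted average over its region.
    weights = flat_masks / (flat_masks.sum(axis=1, keepdims=True) + 1e-8)
    return weights @ flat_act.T                 # (S, C) mask-weighted means

# Toy example: 2 channels, a 4x4 map, 3 regions (e.g. hair, face, background).
act = np.ones((2, 4, 4))
msk = np.zeros((3, 4, 4))
msk[0, :2], msk[1, 2:], msk[2] = 1.0, 1.0, 0.25
pooled = semantic_pool(act, msk)
print(pooled.shape)  # (3, 2)
```

Symbiotic Augmentation avoids this S x C cost by learning a single (possibly superposed) mask per activation channel instead of crossing every channel with every mask.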
Deep Learning for Fashion: Evaluating Outfit Coordination, with Grading and Recommendation
Tohoku University, Takayuki Okatani (岡谷貴之)