MMFL-Net: Multi-scale and Multi-granularity Feature Learning for Cross-domain Fashion Retrieval
Instance-level image retrieval in fashion is a challenging problem of
increasing importance in real-world visual fashion search. Cross-domain
fashion retrieval aims to match unconstrained customer images, used as queries,
against photographs provided by retailers; however, the task is difficult because of
the wide range of consumer-to-shop (C2S) domain discrepancies and because
clothing images are vulnerable to various non-rigid deformations. To this
end, we propose a novel multi-scale and multi-granularity feature learning
network (MMFL-Net), which can jointly learn global-local aggregation feature
representations of clothing images in a unified framework, aiming to train a
cross-domain model for C2S fashion visual similarity. First, a new
semantic-spatial feature fusion part is designed to bridge the semantic-spatial
gap by applying top-down and bottom-up bidirectional multi-scale feature
fusion. Next, a multi-branch deep network architecture is introduced to capture
global salient, part-informed, and local detailed information, and to extract
robust and discriminative feature embeddings by integrating similarity
learning of coarse-to-fine embeddings at multiple granularities. Finally,
the improved trihard loss, center loss, and multi-task classification loss are
adopted for our MMFL-Net, which can jointly optimize intra-class and
inter-class distance and thus explicitly improve intra-class compactness and
inter-class discriminability between its visual representations for feature
learning. Furthermore, our proposed model also combines the multi-task
attribute recognition and classification module with multi-label semantic
attributes and product ID labels. Experimental results demonstrate that our
proposed MMFL-Net achieves significant improvement over the state-of-the-art
methods on the two datasets, DeepFashion-C2S and Street2Shop.
Comment: 27 pages, 12 figures. Published in Multimedia Tools and Applications.
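The abstract names a trihard (batch-hard triplet) loss and a center loss but gives no formulas, so the following is only a minimal sketch of the standard forms of those two losses, not the authors' improved variant. The function names, the margin value, and the toy embeddings are assumptions made for illustration.

```python
import math

def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
    """Standard batch-hard triplet loss (assumed form, not the paper's variant).

    For each anchor, take its farthest same-label sample and its nearest
    different-label sample. Assumes every label has at least one sample
    of another label in the batch.
    """
    losses = []
    for i, anchor in enumerate(embeddings):
        pos = [math.dist(anchor, e) for j, e in enumerate(embeddings)
               if labels[j] == labels[i]]  # includes the anchor itself (distance 0)
        neg = [math.dist(anchor, e) for j, e in enumerate(embeddings)
               if labels[j] != labels[i]]
        losses.append(max(max(pos) - min(neg) + margin, 0.0))
    return sum(losses) / len(losses)

def center_loss(embeddings, labels, centers):
    """Mean of half the squared distance from each embedding to its class center."""
    total = 0.0
    for e, y in zip(embeddings, labels):
        total += 0.5 * sum((a - b) ** 2 for a, b in zip(e, centers[y]))
    return total / len(embeddings)

# Toy batch: two well-separated classes in 2-D (illustrative values only).
emb = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (3.0, 1.0)]
lab = [0, 0, 1, 1]
ctr = {0: (0.0, 0.5), 1: (3.0, 0.5)}
triplet = batch_hard_triplet_loss(emb, lab)
center = center_loss(emb, lab, ctr)
```

The triplet term pushes different classes apart (inter-class distance), while the center term pulls samples toward their class center (intra-class compactness), which is the division of labor the abstract describes.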
Music Artist Classification with WaveNet Classifier for Raw Waveform Audio Data
Models for music artist classification have usually operated in the frequency
domain, in which the input audio samples are processed by a spectral
transformation. The WaveNet architecture was originally designed for speech and
music generation. In this paper, we propose an end-to-end architecture in the
time domain for this task. A WaveNet classifier is introduced that directly
models features from the raw audio waveform. The WaveNet takes the waveform
as input and is followed by several downsampling layers that discriminate
which artist the input belongs to. In addition, the proposed method is applied
to singer identification. The best-performing model obtains an
average F1 score of 0.854 on the Artist20 benchmark dataset, which is a
significant improvement over related work. To show the
effectiveness of the feature learning of the proposed method, the bottleneck layer
of the model is visualized.
Comment: 12 pages
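WaveNet-style models process raw waveforms with stacks of dilated causal convolutions. The abstract does not give the layer configuration, so the sketch below only illustrates that building block in general; the kernel size, dilation schedule, and function names are assumptions, and this is not the paper's classifier.

```python
def causal_dilated_conv(x, w, dilation):
    """1-D causal convolution with a dilation factor.

    Output sample t depends only on x[t], x[t - dilation], ... so no
    future samples are used. Samples before the start count as zero.
    """
    k = len(w)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j in range(k):
            idx = t - (k - 1 - j) * dilation  # tap j looks this far into the past
            if idx >= 0:
                acc += w[j] * x[idx]
        out.append(acc)
    return out

def receptive_field(kernel_size, dilations):
    """Number of input samples that influence one output of the stacked layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Toy waveform and a two-tap filter (illustrative values only).
wave = [1.0, 2.0, 3.0, 4.0]
y1 = causal_dilated_conv(wave, [1.0, 1.0], dilation=1)
y2 = causal_dilated_conv(wave, [1.0, 1.0], dilation=2)
rf = receptive_field(2, [1, 2, 4, 8])
```

Doubling the dilation at each layer grows the receptive field exponentially with depth, which is what makes modeling long raw-audio contexts practical before any downsampling or classification layers.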