1,180 research outputs found
Doodle to Search: Practical Zero-Shot Sketch-based Image Retrieval
In this paper, we investigate the problem of zero-shot sketch-based image
retrieval (ZS-SBIR), where human sketches are used as queries to conduct
retrieval of photos from unseen categories. We importantly advance prior arts
by proposing a novel ZS-SBIR scenario that represents a firm step forward in
its practical application. The new setting uniquely recognizes two important
yet often neglected challenges of practical ZS-SBIR, (i) the large domain gap
between amateur sketch and photo, and (ii) the necessity for moving towards
large-scale retrieval. We first contribute to the community a novel ZS-SBIR
dataset, QuickDraw-Extended, that consists of 330,000 sketches and 204,000
photos spanning across 110 categories. Highly abstract amateur human sketches
are purposefully sourced to maximize the domain gap, instead of ones included
in existing datasets that can often be semi-photorealistic. We then formulate a
ZS-SBIR framework to jointly model sketches and photos into a common embedding
space. A novel strategy to mine the mutual information among domains is
specifically engineered to alleviate the domain gap. External semantic
knowledge is further embedded to aid semantic transfer. We show that, rather
surprisingly, retrieval performance significantly outperforms that of
state-of-the-art on existing datasets that can already be achieved using a
reduced version of our model. We further demonstrate the superior performance
of our full model by comparing with a number of alternatives on the newly
proposed dataset. The new dataset, plus all training and testing code of our
model, will be publicly released to facilitate future researchComment: Oral paper in CVPR 201
Cycle-Consistent Deep Generative Hashing for Cross-Modal Retrieval
In this paper, we propose a novel deep generative approach to cross-modal
retrieval to learn hash functions in the absence of paired training samples
through the cycle consistency loss. Our proposed approach employs adversarial
training scheme to lean a couple of hash functions enabling translation between
modalities while assuming the underlying semantic relationship. To induce the
hash codes with semantics to the input-output pair, cycle consistency loss is
further proposed upon the adversarial training to strengthen the correlations
between inputs and corresponding outputs. Our approach is generative to learn
hash functions such that the learned hash codes can maximally correlate each
input-output correspondence, meanwhile can also regenerate the inputs so as to
minimize the information loss. The learning to hash embedding is thus performed
to jointly optimize the parameters of the hash functions across modalities as
well as the associated generative models. Extensive experiments on a variety of
large-scale cross-modal data sets demonstrate that our proposed method achieves
better retrieval results than the state-of-the-arts.Comment: To appeared on IEEE Trans. Image Processing. arXiv admin note: text
overlap with arXiv:1703.10593 by other author
Learning models for semantic classification of insufficient plantar pressure images
Establishing a reliable and stable model to predict a target by using insufficient labeled samples is feasible and
effective, particularly, for a sensor-generated data-set. This paper has been inspired with insufficient data-set
learning algorithms, such as metric-based, prototype networks and meta-learning, and therefore we propose
an insufficient data-set transfer model learning method. Firstly, two basic models for transfer learning are
introduced. A classification system and calculation criteria are then subsequently introduced. Secondly, a dataset
of plantar pressure for comfort shoe design is acquired and preprocessed through foot scan system; and by
using a pre-trained convolution neural network employing AlexNet and convolution neural network (CNN)-
based transfer modeling, the classification accuracy of the plantar pressure images is over 93.5%. Finally,
the proposed method has been compared to the current classifiers VGG, ResNet, AlexNet and pre-trained
CNN. Also, our work is compared with known-scaling and shifting (SS) and unknown-plain slot (PS) partition
methods on the public test databases: SUN, CUB, AWA1, AWA2, and aPY with indices of precision (tr, ts, H)
and time (training and evaluation). The proposed method for the plantar pressure classification task shows high
performance in most indices when comparing with other methods. The transfer learning-based method can be
applied to other insufficient data-sets of sensor imaging fields
- …