23,936 research outputs found
Exploiting Local Features from Deep Networks for Image Retrieval
Deep convolutional neural networks have been successfully applied to image
classification tasks. When these same networks have been applied to image
retrieval, the assumption has been made that the last layers would give the
best performance, as they do in classification. We show that for instance-level
image retrieval, lower layers often perform better than the last layers in
convolutional neural networks. We present an approach for extracting
convolutional features from different layers of the networks, and adopt VLAD
encoding to encode features into a single vector for each image. We investigate
the effect of different layers and scales of input images on the performance of
convolutional features using the recent deep networks OxfordNet and GoogLeNet.
Experiments demonstrate that intermediate layers or higher layers with finer
scales produce better results for image retrieval, compared to the last layer.
When using compressed 128-D VLAD descriptors, our method obtains
state-of-the-art results and outperforms other VLAD and CNN based approaches on
two out of three test datasets. Our work provides guidance for transferring
deep networks trained on image classification to image retrieval tasks.Comment: CVPR DeepVision Workshop 201
Cross-Domain Image Retrieval with Attention Modeling
With the proliferation of e-commerce websites and the ubiquitousness of smart
phones, cross-domain image retrieval using images taken by smart phones as
queries to search products on e-commerce websites is emerging as a popular
application. One challenge of this task is to locate the attention of both the
query and database images. In particular, database images, e.g. of fashion
products, on e-commerce websites are typically displayed with other
accessories, and the images taken by users contain noisy background and large
variations in orientation and lighting. Consequently, their attention is
difficult to locate. In this paper, we exploit the rich tag information
available on the e-commerce websites to locate the attention of database
images. For query images, we use each candidate image in the database as the
context to locate the query attention. Novel deep convolutional neural network
architectures, namely TagYNet and CtxYNet, are proposed to learn the attention
weights and then extract effective representations of the images. Experimental
results on public datasets confirm that our approaches have significant
improvement over the existing methods in terms of the retrieval accuracy and
efficiency.Comment: 8 pages with an extra reference pag
Automated Pruning for Deep Neural Network Compression
In this work we present a method to improve the pruning step of the current
state-of-the-art methodology to compress neural networks. The novelty of the
proposed pruning technique is in its differentiability, which allows pruning to
be performed during the backpropagation phase of the network training. This
enables an end-to-end learning and strongly reduces the training time. The
technique is based on a family of differentiable pruning functions and a new
regularizer specifically designed to enforce pruning. The experimental results
show that the joint optimization of both the thresholds and the network weights
permits to reach a higher compression rate, reducing the number of weights of
the pruned network by a further 14% to 33% compared to the current
state-of-the-art. Furthermore, we believe that this is the first study where
the generalization capabilities in transfer learning tasks of the features
extracted by a pruned network are analyzed. To achieve this goal, we show that
the representations learned using the proposed pruning methodology maintain the
same effectiveness and generality of those learned by the corresponding
non-compressed network on a set of different recognition tasks.Comment: 8 pages, 5 figures. Published as a conference paper at ICPR 201
Component-based Attention for Large-scale Trademark Retrieval
The demand for large-scale trademark retrieval (TR) systems has significantly
increased to combat the rise in international trademark infringement.
Unfortunately, the ranking accuracy of current approaches using either
hand-crafted or pre-trained deep convolution neural network (DCNN) features is
inadequate for large-scale deployments. We show in this paper that the ranking
accuracy of TR systems can be significantly improved by incorporating hard and
soft attention mechanisms, which direct attention to critical information such
as figurative elements and reduce attention given to distracting and
uninformative elements such as text and background. Our proposed approach
achieves state-of-the-art results on a challenging large-scale trademark
dataset.Comment: Fix typos related to authors' informatio
- …