Good Practice in CNN Feature Transfer
The objective of this paper is the effective transfer of Convolutional Neural Network (CNN) features in image search and classification. We systematically study three aspects of CNN transfer. 1) We demonstrate the advantage of feeding the CNN images of a suitably large size instead of the conventionally resized ones. 2) We benchmark the performance of different CNN layers when average/max pooling is applied to their feature maps. Our observations suggest that the Conv5 feature yields very competitive accuracy with such a pooling step. 3) We find that a simple combination of pooled features extracted across various CNN layers is effective in collecting evidence from both low-level and high-level descriptors. Following these good practices, we improve the state of the art on a number of benchmarks by a large margin.
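Points 2) and 3) above can be illustrated with a small sketch. It is not the authors' implementation: the feature maps are toy nested lists rather than real CNN activations, the function names (`global_pool`, `l2_normalize`, `combine_layers`) are hypothetical, and L2-normalizing each layer before concatenation is an assumption made here so that layers with different value ranges contribute comparably.

```python
import math

def global_pool(feature_maps, mode="avg"):
    """Collapse each H x W feature map into a single value, one per channel."""
    pooled = []
    for channel in feature_maps:
        values = [v for row in channel for v in row]
        pooled.append(max(values) if mode == "max" else sum(values) / len(values))
    return pooled

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to (approximately) unit length; eps avoids division by zero."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]

def combine_layers(layer_maps, mode="avg"):
    """Concatenate the normalized pooled descriptors of several layers."""
    descriptor = []
    for maps in layer_maps:
        descriptor.extend(l2_normalize(global_pool(maps, mode)))
    return descriptor

# Toy activations (assumed values): 2 channels of 2x2, and 3 channels of 1x1.
conv4 = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 2.0], [2.0, 0.0]]]
conv5 = [[[4.0]], [[3.0]], [[0.0]]]

desc = combine_layers([conv4, conv5], mode="avg")
print(len(desc))  # one entry per channel across both layers
```

The resulting descriptor has one entry per channel of every layer used, so its length is independent of the input image size. That property is what makes pooled features compatible with larger input images.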
A deep representation for depth images from synthetic data
Convolutional Neural Networks (CNNs) trained on large scale RGB databases
have become the secret sauce in the majority of recent approaches for object
categorization from RGB-D data. Thanks to colorization techniques, these
methods exploit the filters learned from 2D images to extract meaningful
representations in 2.5D. Still, the perceptual signature of these two kinds of
images is very different, with the first usually strongly characterized by
textures, and the second mostly by silhouettes of objects. Ideally, one would
like to have two CNNs, one for RGB and one for depth, each trained on a
suitable data collection, able to capture the perceptual properties of each
channel for the task at hand. This has not been possible so far, due to the
lack of a suitable depth database. This paper addresses this issue, proposing
to opt for synthetically generated images rather than collecting by hand a 2.5D
large scale database. While being clearly a proxy for real data, synthetic
images allow one to trade quality for quantity, making it possible to generate a
virtually infinite amount of data. We show that the very same architecture
typically used on visual data, when trained on such a data collection,
learns very different filters, resulting in depth features (a) able to
better characterize the different facets of depth images, and (b) complementary
with respect to those derived from CNNs pre-trained on 2D datasets. Experiments
on two publicly available databases show the power of our approach.
Component-based Attention for Large-scale Trademark Retrieval
The demand for large-scale trademark retrieval (TR) systems has significantly
increased to combat the rise in international trademark infringement.
Unfortunately, the ranking accuracy of current approaches using either
hand-crafted or pre-trained deep convolution neural network (DCNN) features is
inadequate for large-scale deployments. We show in this paper that the ranking
accuracy of TR systems can be significantly improved by incorporating hard and
soft attention mechanisms, which direct attention to critical information such
as figurative elements and reduce attention given to distracting and
uninformative elements such as text and background. Our proposed approach
achieves state-of-the-art results on a challenging large-scale trademark
dataset.
Evolving Convolutional Neural Networks for Glaucoma Diagnosis
Glaucoma is an eye disease that damages the optic nerve and progressively narrows the visual field of affected patients, which can lead to blindness in advanced stages. This work presents a study on the use of Convolutional Neural Networks (CNNs) for automatic diagnosis from fundus images. However, building a CNN capable of achieving satisfactory results for glaucoma diagnosis requires considerable effort, and in many situations that effort still does not yield such results. The objective of this work is to use a genetic algorithm (GA) to optimize CNN architectures through an evolutionary technique that can improve glaucoma diagnosis on fundus images from the RIM-ONE-r2 dataset. Our paper demonstrates satisfactory results after training the best individual chosen by the GA, reaching an accuracy of 91%.
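The general idea of evolving architectures with a genetic algorithm can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the paper: the search space, the operators (uniform crossover, per-gene mutation, keeping the top half as parents), and all names are hypothetical, and `toy_fitness` is a stand-in for what would in practice be the validation accuracy of a CNN trained with the encoded architecture.

```python
import random

# Hypothetical search space; the abstract does not list the evolved hyperparameters.
SEARCH_SPACE = {
    "num_conv_layers": [2, 3, 4, 5],
    "filters": [16, 32, 64, 128],
    "kernel_size": [3, 5, 7],
    "dense_units": [64, 128, 256],
}

def random_individual(rng):
    """Sample one architecture encoding uniformly from the search space."""
    return {gene: rng.choice(options) for gene, options in SEARCH_SPACE.items()}

def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one of the two parents."""
    return {gene: rng.choice([a[gene], b[gene]]) for gene in SEARCH_SPACE}

def mutate(individual, rng, rate=0.2):
    """Resample each gene with probability `rate`."""
    return {
        gene: (rng.choice(SEARCH_SPACE[gene]) if rng.random() < rate else value)
        for gene, value in individual.items()
    }

def evolve(fitness, generations=10, pop_size=8, seed=0):
    """Keep the fitter half each generation and refill with mutated offspring."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            children.append(mutate(crossover(mother, father, rng), rng))
        population = parents + children
    return max(population, key=fitness)

def toy_fitness(individual):
    # Placeholder score (assumption): a real run would train the CNN and
    # return its validation accuracy instead.
    return -abs(individual["num_conv_layers"] - 4) - abs(individual["filters"] - 64) / 64

best = evolve(toy_fitness)
print(best)
```

Because the parents survive unchanged into the next generation, the best score in the population never decreases. The expensive step in a real setting is the fitness evaluation, since each call would involve training a network.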