15,346 research outputs found

    Class-Weighted Convolutional Features for Visual Instance Search

    Get PDF
    Image retrieval in realistic scenarios targets large dynamic datasets of unlabeled images. In these cases, training or fine-tuning a model every time new images are added to the database is neither efficient nor scalable. Convolutional neural networks trained for image classification over large datasets have been proven effective feature extractors for image retrieval. The most successful approaches are based on encoding the activations of convolutional layers, as they convey the image spatial information. In this paper, we go beyond this spatial information and propose a local-aware encoding of convolutional features based on semantic information predicted in the target image. To this end, we obtain the most discriminative regions of an image using Class Activation Maps (CAMs). CAMs are based on the knowledge contained in the network and therefore, our approach, has the additional advantage of not requiring external information. In addition, we use CAMs to generate object proposals during an unsupervised re-ranking stage after a first fast search. Our experiments on two public available datasets for instance retrieval, Oxford5k and Paris6k, demonstrate the competitiveness of our approach outperforming the current state-of-the-art when using off-the-shelf models trained on ImageNet. The source code and model used in this paper are publicly available at http://imatge-upc.github.io/retrieval-2017-cam/.Comment: To appear in the British Machine Vision Conference (BMVC), September 201

    Cross-dimensional Weighting for Aggregated Deep Convolutional Features

    Full text link
    We propose a simple and straightforward way of creating powerful image representations via cross-dimensional weighting and aggregation of deep convolutional neural network layer outputs. We first present a generalized framework that encompasses a broad family of approaches and includes cross-dimensional pooling and weighting steps. We then propose specific non-parametric schemes for both spatial- and channel-wise weighting that boost the effect of highly active spatial responses and at the same time regulate burstiness effects. We experiment on different public datasets for image search and show that our approach outperforms the current state-of-the-art for approaches based on pre-trained networks. We also provide an easy-to-use, open source implementation that reproduces our results.Comment: Accepted for publications at the 4th Workshop on Web-scale Vision and Social Media (VSM), ECCV 201

    Deep Object-Centric Representations for Generalizable Robot Learning

    Full text link
    Robotic manipulation in complex open-world scenarios requires both reliable physical manipulation skills and effective and generalizable perception. In this paper, we propose a method where general purpose pretrained visual models serve as an object-centric prior for the perception system of a learned policy. We devise an object-level attentional mechanism that can be used to determine relevant objects from a few trajectories or demonstrations, and then immediately incorporate those objects into a learned policy. A task-independent meta-attention locates possible objects in the scene, and a task-specific attention identifies which objects are predictive of the trajectories. The scope of the task-specific attention is easily adjusted by showing demonstrations with distractor objects or with diverse relevant objects. Our results indicate that this approach exhibits good generalization across object instances using very few samples, and can be used to learn a variety of manipulation tasks using reinforcement learning
    • …
    corecore