
    Compact Bilinear Pooling

    Bilinear models have been shown to achieve impressive performance on a wide range of visual tasks, such as semantic segmentation, fine-grained recognition and face recognition. However, bilinear features are high dimensional, typically on the order of hundreds of thousands to a few million, which makes them impractical for subsequent analysis. We propose two compact bilinear representations with the same discriminative power as the full bilinear representation but with only a few thousand dimensions. Our compact representations allow back-propagation of classification errors, enabling end-to-end optimization of the visual recognition system. The compact bilinear representations are derived through a novel kernelized analysis of bilinear pooling, which provides insights into the discriminative power of bilinear pooling and offers a platform for further research in compact pooling methods. Experiments illustrate the utility of the proposed representations for image classification and few-shot learning across several datasets.
    Comment: Camera-ready version for CVPR
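    The paper's compact representations are random projections of the bilinear (outer-product) feature; below is a minimal NumPy sketch of one such projection, Tensor Sketch (Count Sketch followed by FFT). All array names and sizes are illustrative assumptions, not the authors' code.

        import numpy as np

        def tensor_sketch(X, d, seed=0):
            """Approximate bilinear pooling of local features X (n_locations, c)
            with a d-dimensional Tensor Sketch instead of the c*c outer product."""
            n, c = X.shape
            rng = np.random.default_rng(seed)
            # Two independent Count Sketch hash functions: bin index h, sign s.
            h = [rng.integers(0, d, size=c) for _ in range(2)]
            s = [rng.choice([-1.0, 1.0], size=c) for _ in range(2)]

            def count_sketch(X, h, s):
                P = np.zeros((X.shape[0], d))
                np.add.at(P.T, h, (X * s).T)  # scatter-add signed features into d bins
                return P

            p1 = np.fft.fft(count_sketch(X, h[0], s[0]), axis=1)
            p2 = np.fft.fft(count_sketch(X, h[1], s[1]), axis=1)
            # Elementwise product in the frequency domain is circular convolution
            # of the two sketches, which approximates the outer product.
            phi = np.fft.ifft(p1 * p2, axis=1).real
            return phi.sum(axis=0)  # sum-pool over spatial locations

        features = np.random.randn(49, 512)  # e.g. a 7x7x512 conv feature map
        compact = tensor_sketch(features, d=4096)
        print(compact.shape)  # (4096,) instead of 512*512 = 262144

    Because every step is a fixed linear map or differentiable, classification error can be back-propagated through the sketch to the feature extractor, which is what enables the end-to-end training the abstract mentions.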

    Aggregating Deep Features For Image Retrieval

    Measuring visual similarity between two images is useful in several multimedia applications such as visual search and image retrieval. However, measuring visual similarity is an ill-posed problem, which makes it a challenging task. This problem has been tackled extensively by the computer vision and machine learning communities. Nevertheless, with the recent advancements in deep learning, it is now possible to design novel image representations that allow systems to measure visual similarity more accurately than existing and widely adopted approaches, such as Fisher vectors. Unfortunately, deep-learning-based visual similarity approaches typically require post-processing stages that can be computationally expensive. To alleviate this issue, this thesis describes deep-learning-based image representations that allow a system to measure visual similarity without requiring post-processing stages. Specifically, this thesis describes max-pooling-based aggregation layers that, combined with a convolutional neural network, produce rich image representations for image retrieval without requiring expensive post-processing stages. Moreover, the proposed max-pooling-based aggregation layers are general and can be seamlessly integrated with any existing, pre-trained network. Experiments on large-scale image retrieval datasets confirm that the introduced image representations yield visual similarity measures that achieve comparable or better retrieval performance than state-of-the-art approaches, without requiring expensive post-processing operations.
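    A minimal sketch of the max-pooling-based aggregation idea: collapse a CNN feature map into one global descriptor by taking the maximum activation per channel, then L2-normalize so similarity becomes a dot product. Shapes and names are illustrative assumptions, not the thesis code.

        import numpy as np

        def max_pool_aggregate(feature_map):
            """feature_map: (channels, height, width) activations taken from any
            pretrained CNN; returns one compact image descriptor."""
            v = feature_map.reshape(feature_map.shape[0], -1).max(axis=1)
            return v / (np.linalg.norm(v) + 1e-12)  # L2-normalize for cosine similarity

        # Visual similarity is then a plain dot product between descriptors,
        # with no expensive post-processing stage.
        a = max_pool_aggregate(np.random.rand(512, 14, 14))
        b = max_pool_aggregate(np.random.rand(512, 14, 14))
        print(float(a @ b))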

    Deformable Part Models are Convolutional Neural Networks

    Deformable part models (DPMs) and convolutional neural networks (CNNs) are two widely used tools for visual recognition. They are typically viewed as distinct approaches: DPMs are graphical models (Markov random fields), while CNNs are "black-box" non-linear classifiers. In this paper, we show that a DPM can be formulated as a CNN, thus providing a novel synthesis of the two ideas. Our construction involves unrolling the DPM inference algorithm and mapping each step to an equivalent (and at times novel) CNN layer. From this perspective, it becomes natural to replace the standard image features used in DPMs with a learned feature extractor. We call the resulting model DeepPyramid DPM and experimentally validate it on PASCAL VOC. DeepPyramid DPM significantly outperforms DPMs based on histogram of oriented gradients (HOG) features and slightly outperforms a comparable version of the recently introduced R-CNN detection system, while running an order of magnitude faster.
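    The key unrolled step is the distance transform, which generalizes max pooling: each location takes the best part response over displacements minus a deformation cost. The toy sketch below illustrates this; the quadratic penalty, window radius, and wrap-around shift are simplifying assumptions, not the paper's implementation.

        import numpy as np

        def dt_pool(part_scores, radius=2, wx=0.1, wy=0.1):
            """part_scores: (H, W) response map of one part filter."""
            out = np.full(part_scores.shape, -np.inf)
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    # np.roll wraps at the border; a real implementation would pad.
                    shifted = np.roll(part_scores, (dy, dx), axis=(0, 1))
                    out = np.maximum(out, shifted - wx * dx**2 - wy * dy**2)
            return out  # behaves like a generalized max-pooling CNN layer

        print(dt_pool(np.random.randn(8, 8)).shape)  # (8, 8), deformation-pooled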

    Deep Bottleneck Feature for Image Classification

    Effective image representation plays an important role in image classification and retrieval. Bag-of-Features (BoF) is well known as an effective and robust visual representation. However, on large datasets, convolutional neural networks (CNNs) tend to perform much better, aided by the availability of large amounts of training data. In this paper, we propose a bag of Deep Bottleneck Features (DBF) for image classification, effectively combining the strengths of a CNN within a BoF framework. The DBF features, obtained from a previously well-trained CNN, form a compact and low-dimensional representation of the original inputs, effective even for small datasets. We demonstrate that the resulting BoDBF method has a very powerful and discriminative capability that is generalisable to other image classification tasks.
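    A minimal sketch of the bag-of-features step on top of bottleneck activations: quantize per-patch DBF vectors against a learned codebook and describe each image by a normalized histogram of codeword counts. The 128-dimensional features, 256-word codebook, and random stand-in data are assumptions for illustration.

        import numpy as np
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(0)
        train_dbf = rng.standard_normal((5000, 128))  # stand-in for CNN bottleneck features
        codebook = KMeans(n_clusters=256, n_init=4, random_state=0).fit(train_dbf)

        def bodbf_histogram(image_dbf):
            """image_dbf: (n_patches, 128) bottleneck features of one image."""
            words = codebook.predict(image_dbf)  # nearest codeword per patch
            hist = np.bincount(words, minlength=256).astype(float)
            return hist / hist.sum()  # normalized BoF descriptor

        print(bodbf_histogram(rng.standard_normal((300, 128))).shape)  # (256,)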

    DISC: Deep Image Saliency Computing via Progressive Representation Learning

    Salient object detection increasingly receives attention as an important component or step in several pattern recognition and image processing tasks. Although a variety of powerful saliency models have been proposed, they usually involve heavy feature (or model) engineering based on priors (or assumptions) about the properties of objects and backgrounds. Inspired by the effectiveness of recently developed feature learning, we propose a novel Deep Image Saliency Computing (DISC) framework for fine-grained image saliency computing. In particular, we model image saliency from both coarse- and fine-level observations, and utilize a deep convolutional neural network (CNN) to learn the saliency representation in a progressive manner. Specifically, our saliency model is built upon two stacked CNNs. The first CNN generates a coarse-level saliency map by taking the overall image as input, roughly identifying salient regions in the global context; we further integrate superpixel-based local context information into this first CNN to refine the coarse-level map. Guided by the coarse saliency map, the second CNN focuses on the local context to produce a fine-grained and accurate saliency map while preserving object details. For a test image, the two CNNs collaboratively conduct the saliency computation in one shot. Our DISC framework is capable of uniformly highlighting objects of interest against complex backgrounds while preserving object details well. Extensive experiments on several standard benchmarks suggest that DISC outperforms other state-of-the-art methods and also generalizes well across datasets without additional training. The executable version of DISC is available online: http://vision.sysu.edu.cn/projects/DISC.
    Comment: This manuscript is the accepted version for IEEE Transactions on Neural Networks and Learning Systems (T-NNLS), 201
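    A toy PyTorch sketch of the two-stage composition described above: one network produces a coarse saliency map from the whole image, and a second refines it by consuming the image together with the upsampled coarse map. Layer sizes are illustrative assumptions, and the superpixel-based refinement is omitted.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class DiscLikeModel(nn.Module):
            def __init__(self):
                super().__init__()
                self.coarse = nn.Sequential(  # stage 1: global context
                    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())
                self.fine = nn.Sequential(    # stage 2: image + coarse guidance
                    nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())

            def forward(self, image):
                coarse = self.coarse(F.interpolate(image, size=(64, 64)))
                guide = F.interpolate(coarse, size=image.shape[-2:])
                fine = self.fine(torch.cat([image, guide], dim=1))
                return coarse, fine  # both maps produced in one shot

        coarse, fine = DiscLikeModel()(torch.rand(1, 3, 128, 128))
        print(coarse.shape, fine.shape)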