34,578 research outputs found

    Cross-convolutional-layer Pooling for Image Recognition

    Recent studies have shown that a Deep Convolutional Neural Network (DCNN) pretrained on a large image dataset can be used as a universal image descriptor, and that doing so leads to impressive performance on a variety of image classification tasks. Most of these studies adopt activations from a single DCNN layer, usually the fully-connected layer, as the image representation. In this paper, we propose a novel way to extract image representations from two consecutive convolutional layers: one layer is used for local feature extraction and the other serves as guidance to pool the extracted features. By taking different viewpoints of convolutional layers, we further develop two schemes to realize this idea. The first directly uses convolutional layers from a DCNN. The second applies the pretrained CNN to densely sampled image regions and treats the fully-connected activations of each image region as convolutional feature activations; we then train another convolutional layer on top of these activations as the pooling-guidance convolutional layer. Applying our method to three popular visual classification tasks, we find that the first scheme tends to perform better on applications that need strong discrimination of subtle object patterns within small regions, while the second excels in cases that require discrimination of category-level patterns. Overall, the proposed method achieves superior performance over existing ways of extracting image representations from a DCNN. Comment: Fixed typos. Journal extension of arXiv:1411.7466. Accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence.
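The pooling idea in this abstract (local features from one layer, pooled under the guidance of the next) can be sketched as a weighted sum. This is a minimal illustration, not the authors' implementation: it assumes both layers share the same spatial grid and that each guidance feature map acts directly as a set of per-position pooling weights. The function name `cross_layer_pool` and the array shapes are hypothetical.

```python
import numpy as np

def cross_layer_pool(local_feats, guidance_maps):
    """Pool local features using another layer's feature maps as weights.

    local_feats:   (H, W, D1) activations of the feature-extraction layer
    guidance_maps: (H, W, D2) activations of the pooling-guidance layer
    Returns a vector of length D1 * D2.
    Assumption: both layers share the same H x W grid.
    """
    h, w, d1 = local_feats.shape
    if guidance_maps.shape[:2] != (h, w):
        raise ValueError("both layers must share the same spatial grid")
    x = local_feats.reshape(h * w, d1)      # one local feature per position
    a = guidance_maps.reshape(h * w, -1)    # one weight per position and map
    # Row k of `pooled` is sum_i a[i, k] * x[i]: features pooled under map k.
    pooled = a.T @ x                        # shape (D2, D1)
    return pooled.reshape(-1)               # concatenate the D2 pooled blocks

rng = np.random.default_rng(0)
feats = rng.random((6, 6, 8))   # stand-in for layer-t activations
guide = rng.random((6, 6, 4))   # stand-in for layer-(t+1) activations
rep = cross_layer_pool(feats, guide)
print(rep.shape)  # (32,)
```

The output dimension is the product of the two channel counts, so with real DCNN layers the representation is large and would normally be followed by normalization or dimensionality reduction.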

    Improving a 3-D Convolutional Neural Network Model Reinvented from VGG16 with Batch Normalization

    It is challenging to build and train a Convolutional Neural Network model that achieves a high accuracy rate on the first attempt. There are many variables to consider, such as initial parameters, learning rate, and batch size, and unsuccessful training runs are a common problem. In some cases, the model struggles to reach a lower loss function value, which results in poor performance. Batch Normalization is considered a remedy for this problem. In this paper, two models reinvented from VGG16 are created, one with and one without Batch Normalization, to compare their performance. The model using Batch Normalization gives a better result in terms of loss function value and model accuracy, and achieves a very high accuracy rate. It also reaches the saturation point of its highest accuracy faster than the model without Batch Normalization. This paper also finds that the accuracy of the 3D Convolutional Neural Network model reinvented from VGG16 with Batch Normalization is 91.2%, which beats several benchmark results on UCF101 such as IDT [5], Two-Stream [10], and Dynamic Image Networks IDT [4]. The technique introduced in this paper provides a fast, reliable, and accurate estimation of human activity type and could be used in smart environments.
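For reference, the standard training-mode Batch Normalization computation on 3D-convolution activations can be sketched as follows. The abstract does not say where the normalization layers sit in the model or how they are configured, so the layout `(N, C, T, H, W)`, the function name `batch_norm_3d`, and the `eps` value are assumptions; running statistics for inference are omitted.

```python
import numpy as np

def batch_norm_3d(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization for (N, C, T, H, W) activations.

    Each channel is normalized with the mean and variance computed over
    the batch, time, and spatial axes, then scaled by gamma and shifted
    by beta (both of shape (C,)).
    """
    axes = (0, 2, 3, 4)
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    scale = gamma.reshape(1, -1, 1, 1, 1)
    shift = beta.reshape(1, -1, 1, 1, 1)
    return scale * x_hat + shift

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(4, 3, 2, 5, 5))  # toy video batch
y = batch_norm_3d(x, gamma=np.ones(3), beta=np.zeros(3))
# With gamma = 1 and beta = 0, every channel of y has roughly zero mean
# and unit variance, whatever the scale of the input.
```

Keeping each layer's inputs in a fixed range like this is the usual explanation for the faster and more stable convergence that the abstract reports.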

    Cross-dimensional Weighting for Aggregated Deep Convolutional Features

    We propose a simple and straightforward way of creating powerful image representations via cross-dimensional weighting and aggregation of deep convolutional neural network layer outputs. We first present a generalized framework that encompasses a broad family of approaches and includes cross-dimensional pooling and weighting steps. We then propose specific non-parametric schemes for both spatial- and channel-wise weighting that boost the effect of highly active spatial responses and at the same time regulate burstiness effects. We experiment on different public datasets for image search and show that our approach outperforms the current state of the art among approaches based on pre-trained networks. We also provide an easy-to-use, open-source implementation that reproduces our results. Comment: Accepted for publication at the 4th Workshop on Web-scale Vision and Social Media (VSM), ECCV 201
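The weighting-then-aggregation pipeline described in this abstract can be illustrated with a small sketch. The specific formulas below (square-root-scaled, L2-normalized spatial weights and an IDF-like channel weight based on how often a channel is active) are illustrative assumptions and are not taken from the abstract; the function name `weighted_aggregate` is hypothetical, and the authors' open-source implementation should be treated as the reference.

```python
import numpy as np

def weighted_aggregate(feats, eps=1e-6):
    """Aggregate (C, H, W) non-negative activations into a C-dim descriptor.

    Spatial weights boost positions with high total activation; channel
    weights boost channels that fire rarely, which damps "bursty"
    channels that are active almost everywhere.
    """
    c = feats.shape[0]
    # Spatial weighting: sum over channels, L2-normalize, then square root.
    s = feats.sum(axis=0)                               # (H, W)
    s = np.sqrt(s / (np.linalg.norm(s) + eps))
    # Channel weighting: log of inverse activation frequency (IDF-like).
    active = (feats > 0).reshape(c, -1).mean(axis=1)    # fraction of positions
    ch = np.log((c * eps + active.sum()) / (eps + active))
    # Weighted sum-pooling over space, channel re-weighting, L2 normalization.
    pooled = (feats * s).sum(axis=(1, 2))               # (C,)
    desc = pooled * ch
    return desc / (np.linalg.norm(desc) + eps)

rng = np.random.default_rng(0)
acts = np.maximum(rng.standard_normal((16, 5, 5)), 0.0)  # ReLU-like toy input
d = weighted_aggregate(acts)
print(d.shape)  # (16,)
```

Both weightings are computed from the activations themselves, which is what makes schemes of this kind non-parametric: nothing is learned beyond the pretrained network.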