3 research outputs found

    Invariant Tensor Feature Coding

    We propose a novel feature coding method that exploits invariance. We consider the setting where the transformations that preserve image content form a finite group of orthogonal matrices. This is the case for many image transformations, such as image rotations and image flipping. We prove that the group-invariant feature vector contains sufficient discriminative information when learning a linear classifier using convex loss minimization. Based on this result, we propose novel feature models for principal component analysis and k-means clustering, which are used in most feature coding methods, and global feature functions that explicitly consider the group action. Although the global feature functions are complex nonlinear functions in general, we can calculate the group action on this space easily by constructing the functions as tensor product representations of basic representations, resulting in an explicit form of invariant feature functions. We demonstrate the effectiveness of our methods on several image datasets. Comment: 14 pages, 5 figures
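The idea of a group-invariant feature built from tensor products can be illustrated with a small sketch. It is not the paper's method: it assumes the cyclic group C4 of 90-degree rotations acting on R^2, and the function names are hypothetical. Averaging a vector over the group projects it onto the invariant subspace; for this action the first-order average vanishes, while the second-order tensor x ⊗ x keeps a nontrivial invariant part.

```python
import numpy as np

def rotation(theta):
    """2x2 orthogonal matrix rotating the plane by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Assumed example group: C4, the four rotations by multiples of 90 degrees.
GROUP = [rotation(k * np.pi / 2) for k in range(4)]

def invariant_first_order(x):
    """Average g @ x over the group (projection onto invariant vectors)."""
    return np.mean([g @ x for g in GROUP], axis=0)

def invariant_second_order(x):
    """Average the tensor product x (x) x over the group.

    The group acts on x x^T as g (x x^T) g^T, i.e. through g (x) g.
    """
    return np.mean([g @ np.outer(x, x) @ g.T for g in GROUP], axis=0)

x = np.array([1.0, 2.0])
first = invariant_first_order(x)    # numerically zero for this action
second = invariant_second_order(x)  # a multiple of the identity matrix
```

For x = (1, 2) the second-order average equals 2.5 times the identity, and it is unchanged when x is replaced by any rotated copy g @ x.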

    Invariant deep compressible covariance pooling for aerial scene categorization

    Learning discriminative and invariant feature representations is key to visual image categorization. In this article, we propose a novel invariant deep compressible covariance pooling (IDCCP) method to handle nuisance variations in aerial scene categorization. We consider transforming the input image according to a finite transformation group that consists of multiple confounding orthogonal matrices, such as the D4 group. We then adopt a Siamese-style network to transfer the group structure to the representation space, where we can derive a trivial representation that is invariant under the group action. A linear classifier trained with the trivial representation is then also invariant. To further improve the discriminative power of the representation, we extend it to the tensor space while imposing orthogonal constraints on the transformation matrix to effectively reduce feature dimensions. We conduct extensive experiments on publicly released aerial scene image datasets and demonstrate the superiority of this method over state-of-the-art methods. In particular, using the ResNet architecture, our IDCCP model can reduce the dimension of the tensor representation by about 98% without sacrificing accuracy (i.e., a drop of <0.5%).
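A rough sketch of deriving a D4-invariant covariance feature follows. It makes strong simplifying assumptions: hand-crafted intensity and gradient descriptors stand in for the Siamese-style CNN branches, and all function names are hypothetical. The eight D4 transforms of an image are each pooled into a covariance matrix, and averaging over the orbit gives a descriptor that does not change under any rotation or flip in the group.

```python
import numpy as np

def d4_orbit(image):
    """Return the 8 images obtained by applying the D4 group (4 rotations, 4 reflections)."""
    rotations = [np.rot90(image, k) for k in range(4)]
    reflections = [np.rot90(np.fliplr(image), k) for k in range(4)]
    return rotations + reflections

def covariance_descriptor(image):
    """Second-order pooling of per-pixel descriptors (intensity, row gradient, column gradient).

    Stand-in for CNN features: the result is NOT invariant to rotation by itself,
    because the two gradient channels are tied to the image axes.
    """
    grad_rows, grad_cols = np.gradient(image)
    descriptors = np.stack(
        [image.ravel(), grad_rows.ravel(), grad_cols.ravel()], axis=1
    )
    return np.cov(descriptors, rowvar=False)  # shape (3, 3)

def d4_invariant_descriptor(image):
    """Average the covariance descriptor over the D4 orbit of the image."""
    return np.mean([covariance_descriptor(t) for t in d4_orbit(image)], axis=0)

rng = np.random.default_rng(0)
img = rng.random((16, 16))
reference = d4_invariant_descriptor(img)
rotated = d4_invariant_descriptor(np.rot90(img))  # matches `reference` up to rounding
```

Because applying a D4 element to the input only permutes the orbit, the averaged matrix is the same for every transformed copy of the image.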

    Deep invariant feature learning for remote sensing scene classification

    Image classification, a core task in computer vision, has progressed at a breakneck pace. This is largely attributable to the recent growth of deep learning techniques, which have surpassed conventional statistical methods on a plethora of benchmarks and can even outperform humans in specific image classification tasks. Although deep learning outperforms alternative techniques, it has several apparent disadvantages that prevent it from being deployed for general-purpose use. Specifically, deep learning typically requires a considerable amount of well-annotated data to circumvent over-fitting and the lack of prior knowledge. However, manually labelled data is expensive to acquire and cannot capture as much variation as the real world contains. Consequently, deep learning models usually fail when they confront variations that are underrepresented in the training data. This is the main reason why deep learning models are barely satisfactory in challenging image recognition tasks that contain nuisance variations, such as Remote Sensing Scene Classification (RSSC). Remote sensing scene classification is the procedure of assigning semantic labels to satellite images that contain complicated variations, such as in texture and appearance. Algorithms for effectively understanding and recognising remote sensing scene images have the potential to be employed in a broad range of applications, such as urban planning, Land Use and Land Cover (LULC) determination, natural hazard detection, vegetation mapping, and environmental monitoring. This inspires us to design frameworks that can automatically predict precise labels for satellite images. In our research project, we identify and define the challenges facing the RSSC community compared with general scene image recognition tasks. Specifically, we summarise the problems from the following perspectives.
1) Visual-semantic ambiguity: the discrepancy between visual features and semantic concepts; 2) Variations: intra-class diversity and inter-class similarity; 3) Cluttered backgrounds; 4) The small size of the training set; 5) Unsatisfactory classification accuracy on large-scale datasets. To address these challenges, we explore a way to dynamically expand the capacity to incorporate prior knowledge by transforming the input data, so that we can learn globally invariant second-order features from the transformed data and improve performance on RSSC tasks. First, we devise a recurrent transformer network (RTN) to progressively discover the discriminative regions of input images and learn the corresponding second-order features. The model is optimised using a pairwise ranking loss so that localising discriminative parts and learning the corresponding features reinforce each other. Second, we observe that existing remote sensing image datasets do not provide ontological structures. Therefore, a multi-granularity canonical appearance pooling (MG-CAP) model is proposed to automatically seek the implied hierarchical structures of datasets and produce covariance features that contain multi-grained information. Third, we explore a way to improve the discriminative power of the second-order features. To this end, we present a covariance feature embedding (CFE) model that improves the distinctive power of covariance pooling by using suitable matrix normalisation methods and a low-norm cosine similarity loss to accurately measure the distances between high-dimensional features. Finally, we improve the performance of RSSC while using fewer model parameters. An invariant deep compressible covariance pooling (IDCCP) model is presented to boost the classification accuracy for RSSC tasks. We also prove the generalisability of our IDCCP model using group theory and manifold optimisation techniques.
All of the proposed frameworks can be optimised in an end-to-end manner and are well supported by GPU acceleration. We conduct extensive experiments on well-known remote sensing scene image datasets to demonstrate the substantial improvements of our proposed methods over state-of-the-art approaches.
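The second-order (covariance) features with matrix normalisation mentioned in this abstract can be sketched as below. The abstract does not say which normalisation is used, so matrix square-root normalisation via eigendecomposition is an assumption here, chosen because it is a common option for covariance pooling. The function names and the `power` and `eps` parameters are hypothetical.

```python
import numpy as np

def covariance_pool(features):
    """Pool N local descriptors of dimension d, shape (N, d), into a (d, d) covariance matrix."""
    centered = features - features.mean(axis=0, keepdims=True)
    return centered.T @ centered / features.shape[0]

def matrix_power_normalize(cov, power=0.5, eps=1e-12):
    """Raise a symmetric positive semi-definite matrix to `power` via its eigendecomposition.

    power=0.5 gives the matrix square root (assumed normalisation, not confirmed by the source).
    """
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals = np.clip(eigvals, eps, None)  # guard against tiny negative eigenvalues
    return (eigvecs * eigvals**power) @ eigvecs.T

rng = np.random.default_rng(0)
local_features = rng.normal(size=(50, 4))  # e.g. 50 spatial positions, 4 channels
cov = covariance_pool(local_features)
normalized = matrix_power_normalize(cov)   # symmetric (4, 4) matrix
```

In this sketch, multiplying the normalised matrix by itself recovers the original covariance, which is a quick check that the square root was computed correctly.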