    Aggregated Deep Local Features for Remote Sensing Image Retrieval

    Remote Sensing Image Retrieval remains a challenging topic due to the special nature of Remote Sensing Imagery. Such images contain a variety of semantic objects, which complicates the retrieval task. In this paper, we present an image retrieval pipeline that uses attentive, local convolutional features and aggregates them using the Vector of Locally Aggregated Descriptors (VLAD) to produce a global descriptor. We study various system parameters such as the multiplicative and additive attention mechanisms and descriptor dimensionality. We propose a query expansion method that requires no external inputs. Experiments demonstrate that even without training, the local convolutional features and global representation outperform other systems. After system tuning, we can achieve state-of-the-art or competitive results. Furthermore, we observe that our query expansion method increases overall system performance by about 3%, using only the top-three retrieved images. Finally, we show how dimensionality reduction produces compact descriptors with increased retrieval performance and fast retrieval computation times, e.g. 50% faster than the current systems.
    Comment: Published in Remote Sensing. The first two authors have equal contribution.
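
The aggregation and query expansion steps described in this abstract can be illustrated with a minimal sketch. It assumes a pre-computed codebook of centroids and uses plain average query expansion over dot-product rankings; the function names (`vlad`, `expand_query`) and the choice of averaging are illustrative assumptions, not the authors' implementation.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def vlad(descriptors, centroids):
    """Aggregate local descriptors into one global VLAD vector.

    Each descriptor is assigned to its nearest centroid, and the residual
    (descriptor minus centroid) is accumulated per centroid.
    """
    k, d = len(centroids), len(centroids[0])
    residuals = [[0.0] * d for _ in range(k)]
    for desc in descriptors:
        nearest = min(
            range(k),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(desc, centroids[i])),
        )
        for j in range(d):
            residuals[nearest][j] += desc[j] - centroids[nearest][j]
    # Concatenate the per-centroid residual sums and normalize.
    return l2_normalize([x for row in residuals for x in row])

def expand_query(query_vec, database, top_k=3):
    """Average query expansion (assumed variant): average the query with its
    top_k nearest database vectors, ranked by dot product, then renormalize."""
    ranked = sorted(
        database,
        key=lambda v: sum(a * b for a, b in zip(query_vec, v)),
        reverse=True,
    )
    group = [query_vec] + ranked[:top_k]
    mean = [sum(col) / len(group) for col in zip(*group)]
    return l2_normalize(mean)

if __name__ == "__main__":
    centroids = [[0.0, 0.0], [4.0, 4.0]]          # hypothetical 2-word codebook
    descriptors = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    print(vlad(descriptors, centroids))
```

The expanded query needs no external input: it is built only from the query descriptor and the descriptors of the images retrieved in the first pass.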

    Learning to Associate Words and Images Using a Large-scale Graph

    We develop an approach for unsupervised learning of associations between co-occurring perceptual events using a large graph. We applied this approach to successfully solve the image captcha of China's railroad system. The approach is based on the principle of suspicious coincidence. In this particular problem, a user is presented with a deformed picture of a Chinese phrase and eight low-resolution images. They must quickly select the relevant images in order to purchase their train tickets. This problem presents several challenges: (1) the teaching labels for both the Chinese phrases and the images were not available for supervised learning, (2) no pre-trained deep convolutional neural networks were available for recognizing these Chinese phrases or the presented images, and (3) each captcha must be solved within a few seconds. We collected 2.6 million captchas, with 2.6 million deformed Chinese phrases and over 21 million images. From these data, we constructed an association graph, composed of over 6 million vertices, and linked these vertices based on co-occurrence information and feature similarity between pairs of images. We then trained a deep convolutional neural network to learn a projection of the Chinese phrases onto a 230-dimensional latent space. Using label propagation, we computed the likelihood of each of the eight images conditioned on the latent space projection of the deformed phrase for each captcha. The resulting system solved captchas with 77% accuracy in 2 seconds on average. Our work, in answering this practical challenge, illustrates the power of this class of unsupervised association learning techniques, which may be related to the brain's general strategy for associating language stimuli with visual objects on the principle of suspicious coincidence.
    Comment: 8 pages, 7 figures, 14th Conference on Computer and Robot Vision 201
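
As a rough illustration of label propagation on an association graph, the sketch below spreads a seed score from one vertex to its neighbours over a weighted adjacency matrix. It is a generic textbook formulation with a restart term; the damping factor `alpha`, the iteration count, and the toy graph are assumptions and do not reproduce the paper's graph or scoring.

```python
def propagate_labels(adjacency, seeds, alpha=0.8, iterations=50):
    """Propagate seed scores over a weighted graph.

    adjacency: square list of lists of non-negative edge weights.
    seeds:     initial score per vertex (e.g. 1.0 for the phrase vertex).
    alpha:     share of the score taken from neighbours at each step
               (assumed value; the remainder restarts from the seeds).
    """
    n = len(adjacency)
    # Row-normalize so each vertex averages over its neighbours.
    transition = []
    for row in adjacency:
        total = sum(row)
        transition.append([w / total if total > 0 else 0.0 for w in row])

    scores = list(seeds)
    for _ in range(iterations):
        scores = [
            alpha * sum(transition[i][j] * scores[j] for j in range(n))
            + (1 - alpha) * seeds[i]
            for i in range(n)
        ]
    return scores

if __name__ == "__main__":
    # Toy graph: vertex 0 (a phrase) links to 1, 1 links to 2, 3 is isolated.
    adjacency = [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0],
    ]
    print(propagate_labels(adjacency, seeds=[1.0, 0.0, 0.0, 0.0]))
```

Vertices closer to the seed end up with higher scores, which is the sense in which candidate images could be ranked by their association with a phrase.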

    Recent Advances in Deep Learning Techniques for Face Recognition

    In recent years, researchers have proposed many deep learning (DL) methods for various tasks, and face recognition (FR) in particular has made an enormous leap using these techniques. Deep FR systems benefit from the hierarchical architecture of the DL methods to learn discriminative face representations. DL techniques therefore significantly improve state-of-the-art performance on FR systems and encourage diverse and efficient real-world applications. In this paper, we present a comprehensive analysis of various FR systems that leverage different types of DL techniques, and for the study, we summarize 168 recent contributions from this area. We discuss the papers related to different algorithms, architectures, loss functions, activation functions, datasets, challenges, improvement ideas, and current and future trends of DL-based FR systems. We provide a detailed discussion of various DL methods to understand the current state-of-the-art, and then we discuss various activation and loss functions for the methods. Additionally, we summarize different datasets used widely for FR tasks and discuss challenges related to illumination, expression, pose variations, and occlusion. Finally, we discuss improvement ideas and current and future trends of FR tasks.
    Comment: 32 pages and citation: M. T. H. Fuad et al., "Recent Advances in Deep Learning Techniques for Face Recognition," in IEEE Access, vol. 9, pp. 99112-99142, 2021, doi: 10.1109/ACCESS.2021.309613

    Dataset shift in land-use classification for optical remote sensing

    Multimodal dataset shifts consisting of both concept and covariate shifts are addressed in this study to improve texture-based land-use classification accuracy for optical panchromatic and multispectral remote sensing. Multitemporal and multisensor variances between train and test data arise from atmospheric, phenological, sensor, illumination and viewing geometry differences, and lead to supervised classification inaccuracies. The first dataset shift reduction strategy involves input modification through shadow removal before feature extraction with gray-level co-occurrence matrix and local binary pattern features. Components of a Rayleigh quotient-based manifold alignment framework are investigated to reduce multimodal dataset shift at the input level of the classifier through unsupervised classification, followed by manifold matching to transfer classification labels by finding across-domain cluster correspondences. The ability of weighted hierarchical agglomerative clustering to partition poorly separated feature spaces is explored and weight-generalized internal validation is used for unsupervised cardinality determination. Manifold matching applies the Hungarian algorithm to a cost matrix featuring geometric similarity measurements that assume the preservation of intrinsic structure across the dataset shift. Local neighborhood geometric co-occurrence frequency information is recovered and a novel integration thereof is shown to improve matching accuracy. A final strategy for addressing multimodal dataset shift is multiscale feature learning, which is used within a convolutional neural network to obtain optimal hierarchical feature representations instead of engineered texture features that may be sub-optimal. Feature learning is shown to produce features that are robust against multimodal acquisition differences in a benchmark land-use classification dataset.
A novel multiscale input strategy is proposed for an optimized convolutional neural network that improves classification accuracy to a competitive level for the UC Merced benchmark dataset and outperforms single-scale input methods. All the proposed strategies for addressing multimodal dataset shift in land-use image classification have resulted in significant accuracy improvements for various multitemporal and multimodal datasets.
Thesis (PhD)--University of Pretoria, 2016. National Research Foundation (NRF). University of Pretoria (UP). Electrical, Electronic and Computer Engineering. PhD. Unrestricted.
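
The cluster-correspondence step in this abstract is an instance of the assignment problem: pick one target cluster per source cluster so that the total cost is minimal. The sketch below solves it by exhaustive search over permutations, which returns the same optimum the Hungarian algorithm would for small matrices but does not scale; the cost values, label names, and function names are hypothetical.

```python
from itertools import permutations

def match_clusters(cost):
    """Return (assignment, total_cost) minimizing the summed matching cost.

    cost[i][j] is the dissimilarity between source cluster i and target
    cluster j. assignment[i] is the target cluster matched to source i.
    Exhaustive search: suitable only for a handful of clusters.
    """
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return list(best_perm), best_cost

def transfer_labels(source_labels, assignment):
    """Map each target cluster index to the label of its matched source cluster."""
    return {assignment[i]: source_labels[i] for i in range(len(assignment))}

if __name__ == "__main__":
    # Hypothetical geometric dissimilarities between 3 source and 3 target clusters.
    cost = [
        [4, 1, 3],
        [2, 0, 5],
        [3, 2, 2],
    ]
    assignment, total = match_clusters(cost)
    print(assignment, total)
    print(transfer_labels(["urban", "forest", "water"], assignment))
```

For larger cluster counts, a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` addresses the same problem.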