
    Binary Patterns Encoded Convolutional Neural Networks for Texture Recognition and Remote Sensing Scene Classification

    Designing discriminative, powerful texture features that are robust to realistic imaging conditions is a challenging computer vision problem with many applications, including material recognition and the analysis of satellite or aerial imagery. In the past, most texture description approaches were based on dense orderless statistical distributions of local features. However, most recent approaches to texture recognition and remote sensing scene classification are based on Convolutional Neural Networks (CNNs). The de facto practice when learning these CNN models is to use RGB patches as input, with training performed on large amounts of labeled data (ImageNet). In this paper, we show that Binary Patterns encoded CNN models, codenamed TEX-Nets, trained using mapped coded images with explicit texture information provide complementary information to standard RGB deep models. Additionally, two deep architectures, namely early and late fusion, are investigated to combine the texture and color information. To the best of our knowledge, we are the first to investigate Binary Patterns encoded CNNs and different deep network fusion architectures for texture recognition and remote sensing scene classification. We perform comprehensive experiments on four texture recognition datasets and four remote sensing scene classification benchmarks: UC-Merced with 21 scene categories, WHU-RS19 with 19 scene classes, RSSCN7 with 7 categories, and the recently introduced large-scale aerial image dataset (AID) with 30 aerial scene types. We demonstrate that TEX-Nets provide complementary information to a standard RGB deep model of the same network architecture. Our late fusion TEX-Net architecture always improves the overall performance compared to the standard RGB network on both recognition problems, and our final combination outperforms the state-of-the-art without employing fine-tuning or an ensemble of RGB network architectures. Comment: To appear in ISPRS Journal of Photogrammetry and Remote Sensing.
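
    The late fusion scheme lends itself to a short sketch. Below is a minimal, hypothetical PyTorch rendering of the idea: two parallel CNN streams, one fed RGB patches and one fed a mapped LBP-coded texture image, whose descriptors are concatenated before a single classifier. The ResNet-18 backbone and all dimensions are illustrative assumptions, not the authors' exact TEX-Net configuration.

```python
# A minimal sketch of the late fusion idea, assuming a mapped LBP-coded
# image is available as the second input. Backbone choice is illustrative.
import torch
import torch.nn as nn
from torchvision import models

class LateFusionNet(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        # Two backbones of the same architecture: one for RGB patches,
        # one for the texture-coded input.
        self.rgb_stream = models.resnet18(weights=None)
        self.tex_stream = models.resnet18(weights=None)
        feat_dim = self.rgb_stream.fc.in_features
        # Strip the original classification heads; keep the feature extractors.
        self.rgb_stream.fc = nn.Identity()
        self.tex_stream.fc = nn.Identity()
        # Single classifier over the concatenated descriptors (late fusion).
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, rgb, tex):
        f = torch.cat([self.rgb_stream(rgb), self.tex_stream(tex)], dim=1)
        return self.classifier(f)

model = LateFusionNet(num_classes=30)  # e.g., the 30 AID scene types
rgb = torch.randn(4, 3, 224, 224)      # RGB patches
tex = torch.randn(4, 3, 224, 224)      # mapped LBP-coded images
logits = model(rgb, tex)               # shape (4, 30)
```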

    Exploiting Deep Features for Remote Sensing Image Retrieval: A Systematic Investigation

    Remote sensing (RS) image retrieval is of great significance for geological information mining. Over the past two decades, a large amount of research on this task has been carried out, mainly focused on three core issues: feature extraction, similarity metrics, and relevance feedback. Due to the complexity and multiformity of ground objects in high-resolution remote sensing (HRRS) images, there is still room for improvement in current retrieval approaches. In this paper, we analyze the three core issues of RS image retrieval and provide a comprehensive review of existing methods. Furthermore, with the goal of advancing the state-of-the-art in HRRS image retrieval, we focus on the feature extraction issue and investigate how powerful deep representations can be used to address this task. We conduct a systematic investigation of the factors that may affect the performance of deep features. By optimizing each factor, we obtain remarkable retrieval results on publicly available HRRS datasets. Finally, we explain the experimental phenomena in detail and draw conclusions from our analysis. Our work can serve as a guide for research on content-based RS image retrieval.
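
    As a concrete illustration of the feature extraction issue, the following sketch retrieves images by ranking an archive with cosine similarity over L2-normalized deep features. The ResNet-50 backbone, layer choice, and preprocessing are assumptions for illustration; the paper's investigation covers many such factors.

```python
# Minimal deep-feature retrieval sketch: one descriptor per image from a
# pretrained CNN, L2-normalized, archive ranked by cosine similarity.
import numpy as np
import torch
from torchvision import models, transforms
from PIL import Image

backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()   # keep the pooled 2048-d feature
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def describe(path):
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        f = backbone(x).squeeze(0).numpy()
    return f / np.linalg.norm(f)        # L2-normalize

def rank(query_path, archive_paths):
    q = describe(query_path)
    feats = np.stack([describe(p) for p in archive_paths])
    sims = feats @ q                    # cosine similarity on unit vectors
    return [archive_paths[i] for i in np.argsort(-sims)]
```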

    Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?

    In this paper, we evaluate the generalization power of deep features (ConvNets) in two new scenarios: aerial and remote sensing image classification. We experimentally evaluate ConvNets trained to recognize everyday objects on the classification of aerial and remote sensing images. ConvNets obtained the best results for aerial images, while for remote sensing they performed well but were outperformed by low-level color descriptors such as BIC. We also present a correlation analysis, showing the potential for combining/fusing different ConvNets with other descriptors, or even for combining multiple ConvNets. A preliminary set of experiments fusing ConvNets obtains state-of-the-art results for the well-known UCMerced dataset.
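
    A minimal sketch of this transfer setup, assuming descriptors have already been extracted with an ImageNet-trained ConvNet (e.g., as in the retrieval sketch above): the off-the-shelf features are fed to a simple linear classifier. The LinearSVC choice is an illustrative assumption, not the exact classifier compared in the paper.

```python
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

# X_train / X_test: deep descriptors from an ImageNet-trained ConvNet;
# y_train / y_test: scene labels, e.g., the 21 UCMerced categories.
def evaluate_transfer(X_train, y_train, X_test, y_test):
    clf = LinearSVC(C=1.0)           # linear classifier on frozen features
    clf.fit(X_train, y_train)
    return accuracy_score(y_test, clf.predict(X_test))
```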

    Remote Sensing Image Scene Classification: Benchmark and State of the Art

    Remote sensing image scene classification plays an important role in a wide range of applications and has therefore received remarkable attention. In the past years, significant effort has been made to develop various datasets and present a variety of approaches for scene classification from remote sensing images. However, a systematic review of the literature concerning datasets and methods for scene classification is still lacking. In addition, almost all existing datasets have a number of limitations, including the small number of scene classes and images, the lack of image variation and diversity, and the saturation of accuracy. These limitations severely restrict the development of new approaches, especially deep learning-based methods. This paper first provides a comprehensive review of the recent progress. Then, we propose a large-scale dataset, termed "NWPU-RESISC45", which is a publicly available benchmark for REmote Sensing Image Scene Classification (RESISC), created by Northwestern Polytechnical University (NWPU). This dataset contains 31,500 images, covering 45 scene classes with 700 images in each class. The proposed NWPU-RESISC45 (i) is large-scale in the number of scene classes and total images, (ii) holds big variations in translation, spatial resolution, viewpoint, object pose, illumination, background, and occlusion, and (iii) has high within-class diversity and between-class similarity. The creation of this dataset will enable the community to develop and evaluate various data-driven algorithms. Finally, several representative methods are evaluated using the proposed dataset, and the results are reported as a useful baseline for future research. Comment: This manuscript is the accepted version for Proceedings of the IEEE.
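
    The following sketch illustrates the kind of baseline evaluation such a benchmark enables: repeated stratified splits with a fixed training ratio per class, reporting mean overall accuracy. The classifier, ratios, and repeat count are placeholder assumptions, not the protocol prescribed by the paper.

```python
# Baseline evaluation sketch for a dataset organized as 45 classes x 700
# images; X holds per-image feature vectors, y the class labels.
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def benchmark(X, y, train_ratio=0.2, repeats=5, seed=0):
    """Mean/std accuracy over repeated stratified splits of (X, y)."""
    splitter = StratifiedShuffleSplit(
        n_splits=repeats, train_size=train_ratio, random_state=seed)
    accs = []
    for tr, te in splitter.split(X, y):
        clf = LogisticRegression(max_iter=1000).fit(X[tr], y[tr])
        accs.append(accuracy_score(y[te], clf.predict(X[te])))
    return float(np.mean(accs)), float(np.std(accs))
```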

    Coastal fog detection using visual sensing

    The use of visual sensing techniques to detect low-visibility conditions may have a number of advantages when combined with other methods, such as satellite-based remote sensing, as data can be collected and processed in real or near-real time. Camera-enabled visual sensing can provide direct confirmation of modelling and forecasting results. Fog detection, modelling, and prediction are a priority for maritime communities and coastal cities due to the economic impacts of fog on aviation, marine, and land transportation. The Canadian and Irish coasts are particularly vulnerable to dense fog under certain environmental conditions. Offshore oil and gas production on the Grand Banks (off the Canadian east coast) can be adversely affected by weather and sea-state conditions. In particular, fog can disrupt the transfer of equipment and people to and from the production platforms by helicopter, and such disruptions create costly delays. According to offshore oil and gas industry representatives at a recent workshop on metocean monitoring and forecasting for the NL offshore, there is a real need for improved visibility (fog) forecasting out to 3 days. The ability to accurately forecast future fog conditions would improve the industry's ability to adjust its schedule of operations accordingly. In addition, workshop participants recognized that the physics of Grand Banks fog formation is not well understood, and that more and better data are needed.

    A framework for pattern classifier selection and fusion

    Advisors: Ricardo da Silva Torres, Anderson Rocha. PhD thesis, Universidade Estadual de Campinas, Instituto de Computação. Abstract: The constant growth of visual data, driven both by the countless monitoring video cameras available and by the popularization of mobile devices that let anyone create, edit, and share their own images and videos, has contributed enormously to the so-called "big-data revolution". This sheer amount of visual data gives rise to a Pandora's box of new visual classification problems never imagined before. Image and video classification tasks have been inserted into different and complex applications, and machine learning-based solutions have become the most popular approach in several of them. Notwithstanding, there is no silver bullet that solves all problems: it is not possible to characterize images of all domains with the same description method, nor to use the same learning method to achieve good results in every kind of application. In this thesis, we propose a framework for classifier selection and fusion. Our method combines image characterization and learning methods by means of a meta-learning approach responsible for assessing which methods contribute most towards the solution of a given problem. The framework uses three different classifier selection strategies to pinpoint the least correlated, yet effective, classifiers through a series of diversity measure analyses. The experiments show that the proposed approaches yield results comparable to well-known algorithms from the literature on many different applications, while using fewer learning and description methods and not incurring the curse of dimensionality and normalization problems common to some fusion techniques. Furthermore, our approach is able to achieve effective classification results using very reduced training sets. (Doctorate in Computer Science)
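
    A minimal sketch of the selection idea, assuming each candidate classifier's predictions on a common validation set are already available: pairwise disagreement serves as the diversity measure, and a greedy pass keeps the least correlated members. The thesis uses several diversity measures and three selection strategies; this shows just one simple variant.

```python
# Greedy diversity-based classifier selection over validation predictions.
import numpy as np

def disagreement(pred_a, pred_b):
    """Fraction of validation samples on which two classifiers differ."""
    return float(np.mean(np.asarray(pred_a) != np.asarray(pred_b)))

def select_diverse(predictions, k):
    """Greedily pick k classifiers maximizing average pairwise disagreement.

    predictions: dict mapping classifier name -> predicted labels on a
    common validation set (assumed already computed).
    """
    names = list(predictions)
    chosen = [names[0]]                      # seed with an arbitrary member
    while len(chosen) < k:
        best = max((n for n in names if n not in chosen),
                   key=lambda n: np.mean([disagreement(predictions[n],
                                                       predictions[c])
                                          for c in chosen]))
        chosen.append(best)
    return chosen
```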

    Dense semantic labeling of sub-decimeter resolution images with convolutional neural networks

    Semantic labeling (or pixel-level land-cover classification) in ultra-high-resolution imagery (< 10 cm) requires statistical models able to learn high-level concepts from spatial data with large appearance variations. Convolutional Neural Networks (CNNs) achieve this goal by discriminatively learning a hierarchy of representations of increasing abstraction. In this paper we present a CNN-based system relying on a downsample-then-upsample architecture. Specifically, it first learns a rough spatial map of high-level representations by means of convolutions and then learns to upsample them back to the original resolution by deconvolutions. By doing so, the CNN learns to densely label every pixel at the original resolution of the image. This yields many advantages, including (i) state-of-the-art numerical accuracy, (ii) improved geometric accuracy of predictions, and (iii) high efficiency at inference time. We test the proposed system on the Vaihingen and Potsdam sub-decimeter resolution datasets, involving semantic labeling of aerial images of 9 cm and 5 cm resolution, respectively. These datasets are composed of many large, fully annotated tiles, allowing an unbiased evaluation of models making use of spatial information. We compare two standard CNN architectures to the proposed one: standard patch classification and prediction of local label patches employing only convolutions, against full patch labeling employing deconvolutions. All systems compare favorably to or outperform a state-of-the-art baseline relying on superpixels and powerful appearance descriptors. The proposed full patch labeling CNN outperforms these models by a large margin, while also showing a very appealing inference time. Comment: Accepted in IEEE Transactions on Geoscience and Remote Sensing, 2017.
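
    A minimal sketch of the downsample-then-upsample idea: strided convolutions learn a coarse map of high-level features, and transposed convolutions ("deconvolutions") learn to upsample it back to full resolution so every pixel receives a label. Layer counts and sizes are illustrative assumptions, not the paper's architecture.

```python
# Downsample-then-upsample dense labeling sketch in PyTorch.
import torch
import torch.nn as nn

class DownUpNet(nn.Module):
    def __init__(self, in_ch=3, num_classes=6):
        super().__init__()
        self.down = nn.Sequential(   # convolutions: full -> 1/4 resolution
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.up = nn.Sequential(     # deconvolutions: 1/4 -> full resolution
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, num_classes, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.up(self.down(x))  # per-pixel class scores

x = torch.randn(1, 3, 256, 256)
scores = DownUpNet()(x)               # shape (1, 6, 256, 256)
```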