
    Fourier-based Rotation-invariant Feature Boosting: An Efficient Framework for Geospatial Object Detection

    Get PDF
    Geospatial object detection in remote sensing imagery has attracted increasing interest in recent years, owing to rapid developments in spaceborne imaging. Most previously proposed object detectors are very sensitive to object deformations, such as scaling and rotation. To address this, we propose a novel and efficient framework for geospatial object detection in this letter, called Fourier-based rotation-invariant feature boosting (FRIFB). A Fourier-based rotation-invariant feature is first generated in polar coordinates. The extracted features are then structurally refined using aggregate channel features. This leads to faster feature computation and a more robust feature representation, which is well suited to the subsequent boosting learning. Finally, in the test phase, we achieve fast pyramid feature extraction by estimating a scale factor instead of directly collecting all features from the image pyramid. Extensive experiments are conducted on two subsets of the NWPU VHR-10 dataset, demonstrating the superiority and effectiveness of FRIFB compared to previous state-of-the-art methods.
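
    The following sketch illustrates the rotation-invariance idea behind such Fourier-based features under simple assumptions; it is not the authors' FRIFB implementation. A patch is resampled on polar rings, and an in-plane rotation becomes a circular shift along the angular axis, which the FFT magnitude removes.

```python
# A minimal sketch of a Fourier-based rotation-invariant descriptor in polar
# coordinates (illustrative only, not the FRIFB pipeline from the letter).
import numpy as np
from scipy.ndimage import map_coordinates

def rotation_invariant_descriptor(patch, n_rings=8, n_angles=32):
    """Return an (n_rings, n_angles//2 + 1) rotation-invariant feature."""
    h, w = patch.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    radii = np.linspace(1, min(cy, cx), n_rings)                 # ring radii
    thetas = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
    # Sample the patch on a polar grid (rows: rings, cols: angles).
    ys = cy + radii[:, None] * np.sin(thetas)[None, :]
    xs = cx + radii[:, None] * np.cos(thetas)[None, :]
    polar = map_coordinates(patch.astype(float), [ys, xs], order=1)
    # A rotation circularly shifts the angle axis; |FFT| removes the shift.
    return np.abs(np.fft.rfft(polar, axis=1))

# Quick check: the descriptor is unchanged by a 90-degree rotation.
patch = np.random.rand(33, 33)
d0 = rotation_invariant_descriptor(patch)
d1 = rotation_invariant_descriptor(np.rot90(patch))
print(np.allclose(d0, d1))  # True: exact for 90-degree multiples
```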

    Image Classification with Bag of Visual Words

    Get PDF
    Chapter 10, pp. 181-200. Image classification is a process by which a computer decides which contents are present in an image, that is, which class it belongs to or which objects it contains. In recent years, the Bag of Visual Words (BoVW) model has become one of the most widely used solutions for this task. The term visual word (or simply "word") refers to a small part of an image. BoVW consists of several stages: sampling characteristic points (keypoints) of the image, describing them, building a dictionary of visual words through a clustering process, representing the images globally using this dictionary and, finally, classifying these representations to decide the class to which the image belongs. This chapter explains the BoVW image classification model, detailing these stages.
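
    A minimal sketch of the five stages just listed, assuming opencv-python (>= 4.4, which ships cv2.SIFT_create) and scikit-learn; train_images and train_labels are placeholders for a real dataset.

```python
# Bag of Visual Words pipeline sketch: keypoints -> descriptors -> vocabulary
# (clustering) -> global histograms -> classifier.
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

sift = cv2.SIFT_create()

def local_descriptors(image_gray):
    """Stages 1-2: sample keypoints and describe them with SIFT."""
    _, desc = sift.detectAndCompute(image_gray, None)
    return desc if desc is not None else np.empty((0, 128), np.float32)

def build_vocabulary(images, k=200):
    """Stage 3: cluster all descriptors into k visual words."""
    all_desc = np.vstack([local_descriptors(im) for im in images])
    return KMeans(n_clusters=k, n_init=4, random_state=0).fit(all_desc)

def bovw_histogram(image_gray, vocab):
    """Stage 4: global representation as a normalized word histogram."""
    desc = local_descriptors(image_gray)
    hist = np.zeros(vocab.n_clusters)
    if len(desc):
        words = vocab.predict(desc)
        hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# Stage 5: classify the histograms (train_images/train_labels assumed given).
# vocab = build_vocabulary(train_images)
# X = np.array([bovw_histogram(im, vocab) for im in train_images])
# clf = LinearSVC().fit(X, train_labels)
```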

    Remote Sensing Image Scene Classification: Benchmark and State of the Art

    Full text link
    Remote sensing image scene classification plays an important role in a wide range of applications and hence has been receiving remarkable attention. Over the past years, significant efforts have been made to develop various datasets and present a variety of approaches for scene classification from remote sensing images. However, a systematic review of the literature concerning datasets and methods for scene classification is still lacking. In addition, almost all existing datasets have a number of limitations, including the small number of scene classes and images, the lack of image variation and diversity, and the saturation of accuracy. These limitations severely hinder the development of new approaches, especially deep learning-based methods. This paper first provides a comprehensive review of the recent progress. Then, we propose a large-scale dataset, termed "NWPU-RESISC45", which is a publicly available benchmark for REmote Sensing Image Scene Classification (RESISC), created by Northwestern Polytechnical University (NWPU). This dataset contains 31,500 images, covering 45 scene classes with 700 images in each class. The proposed NWPU-RESISC45 (i) is large-scale in both the number of scene classes and the total number of images, (ii) holds big variations in translation, spatial resolution, viewpoint, object pose, illumination, background, and occlusion, and (iii) has high within-class diversity and between-class similarity. The creation of this dataset will enable the community to develop and evaluate various data-driven algorithms. Finally, several representative methods are evaluated using the proposed dataset and the results are reported as a useful baseline for future research. This manuscript is the accepted version for Proceedings of the IEEE.
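
    A minimal sketch of a transfer-learning baseline on such a dataset, assuming torchvision >= 0.13 and that NWPU-RESISC45 has been extracted as one folder per scene class (the layout ImageFolder expects); the path is a placeholder.

```python
# Fine-tune the classification head of an ImageNet-pretrained ResNet-18
# on a scene-classification dataset laid out as class-named folders.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

tfm = transforms.Compose([
    transforms.Resize((224, 224)),               # RESISC45 images are 256x256
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],  # ImageNet statistics
                         [0.229, 0.224, 0.225]),
])
data = datasets.ImageFolder("NWPU-RESISC45/", transform=tfm)
loader = DataLoader(data, batch_size=64, shuffle=True)

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for p in model.parameters():
    p.requires_grad = False                      # freeze the backbone
model.fc = nn.Linear(model.fc.in_features, len(data.classes))  # 45 classes

opt = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for images, labels in loader:                    # one pass, as illustration
    opt.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    opt.step()
```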

    Image Classification Using Bag-of-Visual-Words Model

    Get PDF
    Recently, with the explosive growth of digital technologies, image collections have grown rapidly in size. Supervised image classification has been widely applied in many domains to organize, search, and retrieve images. However, traditional feature extraction approaches yield poor classification accuracy. The Bag-of-Visual-Words (BoVW) model, inspired by the Bag-of-Words model in document classification, was therefore used to represent images with local descriptors for image classification, and it performs well in several fields. This research provides empirical evidence that the BoVW model outperforms traditional feature extraction approaches for both binary and multi-class image classification. Furthermore, the research reveals that the size of the visual vocabulary built during the BoVW process impacts the accuracy of image classification.
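
    A minimal sketch of the vocabulary-size experiment the abstract refers to: the visual-word count k is swept and cross-validated accuracy compared. descriptors_per_image (a list of per-image descriptor arrays) and labels are placeholders for features extracted from a real collection.

```python
# Sweep the BoVW vocabulary size and report cross-validated accuracy.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

def histograms(descriptors_per_image, k):
    """Build a k-word vocabulary and encode every image as a histogram."""
    vocab = KMeans(n_clusters=k, n_init=4, random_state=0)
    vocab.fit(np.vstack(descriptors_per_image))
    return np.array([
        np.bincount(vocab.predict(d), minlength=k) / len(d)  # assumes d non-empty
        for d in descriptors_per_image
    ])

def sweep_vocabulary_sizes(descriptors_per_image, labels,
                           sizes=(50, 100, 200, 400)):
    for k in sizes:
        X = histograms(descriptors_per_image, k)
        acc = cross_val_score(LinearSVC(), X, labels, cv=5).mean()
        print(f"k={k:4d}  cv accuracy={acc:.3f}")
```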

    Active Learning Based on Superpixel Contextual Features for Remote Sensing Image Classification

    Get PDF
    Advisors: Alexandre Xavier Falcão, Jefersson Alex dos Santos. Master's dissertation, Universidade Estadual de Campinas, Instituto de Computação. In recent years, machine learning techniques have been proposed to create classification maps from remote sensing images. These techniques can be divided into pixel- and region-based image classification methods. This work concentrates on the second approach, since we are interested in images with millions of pixels and the segmentation of the image into regions (superpixels) can considerably reduce the number of samples for classification. However, even using superpixels, the number of samples is still too large for manual annotation to train the classifier. Active learning techniques address this problem by starting from a small set of randomly selected samples, which are manually labeled and used to train a first instance of the classifier. At each learning iteration, the classifier assigns labels and selects the most informative samples for user correction/confirmation, increasing the size of the training set. An improved instance of the classifier is created by training after each iteration and used in the next iteration, until the user is satisfied with the classifier. We observed that most methods reclassify the entire pool of unlabeled samples at every learning iteration, making the process unfeasible for user interaction. Therefore, we address two important problems in region-based classification of remote sensing images: (a) effective superpixel description and (b) reduction of the time required for sample selection in active learning. First, we propose a contextual superpixel descriptor, based on bag of visual words, that outperforms widely used color and texture descriptors. Second, we propose a supervised method for dataset reduction that is based on a state-of-the-art active learning technique called Multi-Class Level Uncertainty (MCLU). Our method has been shown to be as effective as MCLU while being considerably more efficient. Additionally, we further improve its performance by applying a relaxation process on the classification map using Markov Random Fields.
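
    A minimal sketch of the MCLU-style selection criterion the dissertation builds on (not its superpixel descriptor or dataset-reduction method): with a one-vs-rest SVM, uncertainty is the margin between the two largest decision values, and the least-confident pool samples are sent to the user for labeling. X_labeled, y_labeled, and X_pool are placeholders.

```python
# One query step of an active-learning loop with multi-class level
# uncertainty: select the pool samples with the smallest top-2 margin.
import numpy as np
from sklearn.svm import SVC

def mclu_select(clf, X_pool, batch_size=10):
    """Return indices of the most informative (smallest-margin) samples."""
    scores = clf.decision_function(X_pool)       # (n_samples, n_classes)
    top2 = np.sort(scores, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]             # small margin = uncertain
    return np.argsort(margin)[:batch_size]

# One iteration of the loop:
# clf = SVC(kernel="rbf", decision_function_shape="ovr").fit(X_labeled, y_labeled)
# query = mclu_select(clf, X_pool)
# ...the user labels X_pool[query], which is moved into the training set...
```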

    Automatic Ship Classification from Optical Aerial Images with Convolutional Neural Networks

    Get PDF
    The automatic classification of ships from aerial images is a considerable challenge. Previous works have usually applied image processing and computer vision techniques to extract meaningful features from visible-spectrum images for use as input to traditional supervised classifiers. We present a method for determining whether a visible-spectrum aerial image contains a ship or not. The proposed architecture is based on Convolutional Neural Networks (CNN), and it combines neural codes extracted from a CNN with a k-Nearest Neighbor method to improve performance. The kNN results are compared to those obtained with the CNN Softmax output. Several CNN models were configured and evaluated in order to find the best hyperparameters, and the most suitable setting for this task was found by using transfer learning at different levels. A new dataset (named MASATI), composed of aerial imagery with more than 6000 samples, was also created to train and evaluate our architecture. The experiments show a success rate of over 99% for our approach, in contrast with the 79% obtained with traditional methods for ship image classification, also outperforming other CNN-based methods. A dataset of images (NWPU VHR-10) used in previous works was additionally used to evaluate the proposed approach. Our best setup achieves a success rate of 86% on these data, significantly outperforming previous state-of-the-art ship classification methods. This work was funded by both the Spanish Government's Ministry of Economy, Industry and Competitiveness and Babcock MCS Spain through the projects RTC-2014-1863-8 and INAER4-14Y (IDI-20141234).
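
    A minimal sketch of the neural-codes-plus-kNN idea described above, not the authors' exact MASATI setup: activations from a pretrained CNN's penultimate layer serve as features for a k-Nearest Neighbor classifier. train_images, train_labels, and test_images are placeholders.

```python
# Extract "neural codes" (penultimate-layer activations) from a pretrained
# CNN and classify them with kNN instead of the network's Softmax output.
import torch
from torchvision import models
from sklearn.neighbors import KNeighborsClassifier

backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()   # expose the 512-d penultimate features
backbone.eval()

@torch.no_grad()
def neural_codes(images):
    """images: float tensor of shape (N, 3, 224, 224), ImageNet-normalized."""
    return backbone(images).numpy()

# knn = KNeighborsClassifier(n_neighbors=5).fit(neural_codes(train_images),
#                                               train_labels)
# predictions = knn.predict(neural_codes(test_images))
```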

    Hyperspectral Image Classification -- Traditional to Deep Models: A Survey for Future Prospects

    Get PDF
    Hyperspectral Imaging (HSI) has been extensively utilized in many real-life applications because it benefits from the detailed spectral information contained in each pixel. Notably, the complex characteristics of HSI data, i.e., the nonlinear relation between the captured spectral information and the corresponding object, make accurate classification challenging for traditional methods. In the last few years, Deep Learning (DL) has been established as a powerful feature extractor that effectively addresses the nonlinear problems arising in a number of computer vision tasks. This has prompted the deployment of DL for HSI classification (HSIC), which has shown good performance. This survey presents a systematic overview of DL for HSIC and compares state-of-the-art strategies on the topic. We first summarize the main challenges traditional machine learning faces in HSIC and then establish the superiority of DL in addressing them. The survey breaks the state-of-the-art DL frameworks down into spectral-feature, spatial-feature, and joint spatial-spectral-feature approaches in order to systematically analyze their achievements (and future research directions) for HSIC. Moreover, we consider the fact that DL requires a large number of labeled training examples, whereas acquiring such numbers for HSIC is challenging in terms of time and cost. The survey therefore discusses strategies to improve the generalization performance of DL methods, which can provide some future guidelines.
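
    A minimal sketch of the spectral vs. spatial-spectral split the survey uses to organize DL frameworks: per-pixel band vectors feed a spectral model, while small neighborhoods around each pixel feed a spatial-spectral model. cube is a placeholder hyperspectral image of shape (height, width, bands).

```python
# Build the two standard input views for HSI classification from a data cube.
import numpy as np

def spectral_samples(cube):
    """Spectral view: one band-vector per pixel, shape (H*W, B)."""
    h, w, b = cube.shape
    return cube.reshape(h * w, b)

def spatial_spectral_patches(cube, patch=5):
    """Spatial-spectral view: a patch x patch x B neighborhood per pixel."""
    r = patch // 2
    padded = np.pad(cube, ((r, r), (r, r), (0, 0)), mode="reflect")
    h, w, b = cube.shape
    out = np.empty((h * w, patch, patch, b), cube.dtype)
    for i in range(h):
        for j in range(w):
            out[i * w + j] = padded[i:i + patch, j:j + patch]
    return out

cube = np.random.rand(64, 64, 103)  # e.g., 103 bands as in Pavia University
print(spectral_samples(cube).shape)           # (4096, 103)
print(spatial_spectral_patches(cube).shape)   # (4096, 5, 5, 103)
```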

    Bridging the Simulation-to-Reality Gap: Adapting Simulation Environment for Object Recognition

    Get PDF
    Rapid advancements in object recognition have created a huge demand for labeled datasets for training, testing, and validating different techniques. Due to the wide range of applications, object models in the datasets need to cover both variations in geometric features and the diverse conditions in which sensory inputs are obtained. Moreover, manually labeling the object models is cumbersome. As a result, it is difficult for researchers to gain access to adequate datasets for developing new methods or algorithms. In comparison, computer simulation has been considered a cost-effective way to generate data for the training, testing, and validation of object recognition techniques. However, its effectiveness has been a major concern due to a problem commonly known as the reality gap, that is, the differences that exist between real and simulated images. Aimed at bridging the reality gap, this study identifies the influential factors that cause the problem and then proposes to adjust the simulation settings so that not only the objects but also the environment match the real-world scenario. In addition, it includes a system structure to retrieve information about the real world and to incorporate this information in the setting of environmental properties in simulation. This study covers a total of 14 experiments using different influential factors to generate simulated data and assess the reality gap against real-world counterpart images. The proposed approach enables the rendering of realistic data with ground-truth labels, thus making simulated datasets a cost-effective and efficient alternative.
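
    A minimal sketch of one way to quantify a reality gap between a simulated image and its real-world counterpart; a simple color-histogram distance is used here purely as an illustrative proxy, not the assessment used in the study.

```python
# Compare the appearance statistics of a real photo and its rendered
# counterpart; a smaller distance suggests a narrower reality gap.
import numpy as np

def channel_histogram(image, bins=32):
    """Normalized per-channel histogram of an (H, W, 3) uint8 image."""
    hists = [np.histogram(image[..., c], bins=bins, range=(0, 256),
                          density=True)[0] for c in range(3)]
    return np.concatenate(hists)

def reality_gap_score(real_image, simulated_image):
    """Smaller is better: L1 distance between appearance histograms."""
    return np.abs(channel_histogram(real_image)
                  - channel_histogram(simulated_image)).sum()

# Placeholders standing in for a real photo and its rendered counterpart.
real = np.random.randint(0, 256, (480, 640, 3), np.uint8)
sim = np.random.randint(0, 256, (480, 640, 3), np.uint8)
print(reality_gap_score(real, sim))
```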

    Inferring landscape preferences from social media using data science techniques

    Get PDF
    People and societies attribute different values to landscapes, which are often derived from their preferences. Such preferences are shaped by aesthetics, recreational benefits, safety, and other services provided by landscapes. Researchers have found that more appealing landscapes can promote human health and well-being. Existing methods used to study landscape preferences, such as social surveys, create high-quality data but have high costs in time and effort and are poorly suited to capturing dynamic landscape-scale changes across large geographic scales. With the rapid rise of social media, a huge amount of user-generated data is now available for researchers to study emotions or sentiments (i.e., preferences) towards particular topics of interest. This dissertation investigates how social media data can be used to indirectly measure (Zanten et al., 2016) and learn features relevant to landscape preferences, focusing primarily on a specific landscape type called green infrastructure (GI). The first phase of the work introduces a first-ever benchmark GI location dataset within the US (GReen Infrastructure Dataset, or GRID) and develops a computer vision algorithm for identifying GI from aerial images using the Google/Bing Maps APIs. The data collected with this object detection method is then used to re-train a human preference model developed previously (Rai, 2013), significantly improving its prediction accuracy. I found that with the framework introduced here, we can collect landscape data of quality comparable to current methods with much less effort. The second phase uses GI images and textual comments from Flickr, Instagram, and Twitter to train a lexicon-based sentiment model for predicting people's sentiments towards GI. Since almost 70 percent of US adults use some social media platform to connect with their friends and families or to follow recent news and topics of interest (Pew Research, 2015), it is imperative to understand whether people share, post, or comment about the landscape settings they live in or prefer. The results show that social media information can be very useful in predicting people's sentiments about the landscapes they live in or visit. The third phase builds on the second to identify specific features that are correlated with higher and lower preferences. The findings demonstrate that we can learn features that impact people's preferences about the landscape. These features are descriptive enough for a layperson to understand, and designers, stormwater engineers, and city planners can incorporate them in landscape designs that improve human health and well-being. Finally, I conclude and describe follow-up research with potential for understanding landscapes: speeding up the object detection algorithms using more advanced computer vision methods and the power of GPUs, and extending the findings to other types of GI and landscape designs.
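
    A minimal sketch of lexicon-based sentiment scoring as described in the second phase; the tiny lexicon here is illustrative only, not the dissertation's, and real systems use large curated lexicons.

```python
# Score a social media comment by counting polar words from a sentiment
# lexicon; the result is in [-1, 1], from fully negative to fully positive.
POSITIVE = {"beautiful", "green", "peaceful", "love", "clean"}
NEGATIVE = {"ugly", "flooded", "dirty", "unsafe", "noisy"}

def sentiment_score(comment: str) -> float:
    """Return (pos - neg) / (pos + neg) over matched lexicon words."""
    words = [w.strip(".,!?") for w in comment.lower().split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

print(sentiment_score("Love this beautiful, peaceful park!"))  # 1.0
print(sentiment_score("The trail was flooded and dirty."))     # -1.0
```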