692 research outputs found

    Classification of high dimensional and imbalanced hyperspectral imagery data

    This paper addresses the classification of hyperspectral images with multiple imbalanced classes and very high dimensionality. Class imbalance is handled by resampling the data set, whereas PCA is applied to reduce the number of spectral bands. This preliminary study aims to investigate the benefits of using these two techniques together, and to evaluate which order of application leads to the best classification performance. Experimental results demonstrate the significance of combining these preprocessing tools to improve the performance of hyperspectral imagery classification. Although the most effective order appears to be a resampling algorithm first and then PCA, this question still needs a much more thorough investigation. Partially supported by the Spanish Ministry of Education and Science under grants CSD2007–00018, AYA2008–05965–0596–C04–04/ESP and TIN2009–14205–C04–04, and by Fundació Caixa Castelló–Bancaixa under grant P1–1B2009–0
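    The "resample first, then PCA" order described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not name the resampling algorithm, so plain random oversampling is assumed here, and the toy data, the helper names `random_oversample` and `pca_fit_transform`, and the choice of 5 components are all hypothetical.

```python
import numpy as np

def random_oversample(X, y, rng):
    """Duplicate minority-class rows at random until every class matches the largest one.

    Assumption: random oversampling stands in for the unspecified resampling method.
    """
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx_parts = []
    for c, n in zip(classes, counts):
        idx = np.flatnonzero(y == c)
        extra = rng.choice(idx, size=target - n, replace=True)
        idx_parts.append(np.concatenate([idx, extra]))
    keep = np.concatenate(idx_parts)
    return X[keep], y[keep]

def pca_fit_transform(X, n_components):
    """Project X onto its top principal components via an SVD of the centred data."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

rng = np.random.default_rng(0)
# Toy stand-in for pixels x spectral bands: 200 pixels, 50 bands, 3 imbalanced classes.
X = rng.normal(size=(200, 50))
y = np.array([0] * 150 + [1] * 40 + [2] * 10)

# Order under study: balance the classes first, then reduce the number of bands.
X_bal, y_bal = random_oversample(X, y, rng)
X_red = pca_fit_transform(X_bal, n_components=5)
```

    Swapping the last two calls gives the alternative order (PCA first, then resampling), which is the comparison the abstract says still needs further study.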

    Exploring issues of balanced versus imbalanced samples in mapping grass community in the Telperion Reserve using high resolution images and selected machine learning algorithms

    Accurate vegetation mapping is essential for a number of reasons, one of which is conservation. The main objective of this research was to map different grass communities in the game reserve using RapidEye and Sentinel-2 MSI images and two machine learning classifiers, support vector machine (SVM) and random forest (RF). The study tested the impact of balanced and imbalanced training data on the performance and accuracy of both classifiers in mapping the grass communities, and the sensitivity of pixel resolution to balanced and imbalanced training data in image classification. The imbalanced and balanced data sets were obtained through field data collection. The results show that RF and SVM produce a high overall accuracy on Sentinel-2 imagery for both the balanced and imbalanced data sets. The RF classifier yielded an overall accuracy of 79.45% and kappa of 74.38% using imbalanced training data, and an overall accuracy of 76.19% and kappa of 73.21% using balanced training data. The SVM classifier yielded an overall accuracy of 82.54% and kappa of 80.36% using imbalanced training data, and an overall accuracy of 82.21% and kappa of 78.33% using balanced training data. For the RapidEye imagery, the balanced data set reduced the overall accuracy of both algorithms: RF dropped by about 5 percentage points (from 63.24% to 57.94%), while SVM dropped by about 7 percentage points (from 57.31% to 50.79%). The results thereby show that the imbalanced data set is a better option than the balanced data set for image classification of vegetation species. The study recommends implementing ways of handling misclassification among the different grass species to improve classification in future research. Further research can be carried out on other types of high resolution multispectral imagery, using different advanced algorithms and different training sample sizes.
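    The two metrics reported in this abstract, overall accuracy and kappa, are both derived from a confusion matrix. The sketch below shows the standard Cohen's kappa calculation; the function name and the 3-class confusion matrix are made up for illustration and are not taken from the study.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] counts samples of reference class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    # Chance agreement: product of matching row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical confusion matrix for three grass communities (not data from the paper).
example = [
    [40, 5, 5],
    [4, 30, 6],
    [6, 4, 20],
]
oa, kappa = overall_accuracy_and_kappa(example)
print(f"overall accuracy = {oa:.2%}, kappa = {kappa:.2%}")
```

    Kappa discounts the agreement expected by chance, which is why it is reported alongside overall accuracy when class sizes differ.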

    Exploring synergetic effects of dimensionality reduction and resampling tools on hyperspectral imagery data classification

    This paper addresses the classification of hyperspectral images with multiple imbalanced classes and very high dimensionality. Class imbalance is handled by resampling the data set, whereas PCA and a supervised filter are applied to reduce the number of spectral bands. This preliminary study aims to investigate the benefits of combining several techniques to tackle the imbalance and high dimensionality problems, and to evaluate which order of application leads to the best classification performance. Experimental results demonstrate the significance of using these preprocessing tools together to improve the performance of hyperspectral imagery classification. Although the most effective order appears to be a resampling strategy first and then a feature selection (or extraction) algorithm, this question still needs a much more thorough investigation in the future. This work has partially been supported by the Spanish Ministry of Education and Science under grants CSD2007–00018, AYA2008–05965–0596 and TIN2009–14205, the Fundació Caixa Castelló–Bancaixa under grant P1–1B2009–04, and the Generalitat Valenciana under grant PROMETEO/2010/02

    Hyperspectral Image Analysis with Subspace Learning-based One-Class Classification

    Hyperspectral image (HSI) classification is an important task in many applications, such as environmental monitoring, medical imaging, and land use/land cover (LULC) classification. Due to the significant amount of spectral information from recent HSI sensors, analyzing the acquired images is challenging using traditional Machine Learning (ML) methods. As the number of frequency bands increases, the number of training samples required to achieve a reasonable classification accuracy increases exponentially, a phenomenon known as the curse of dimensionality. Therefore, separate band selection or dimensionality reduction techniques are often applied before performing any classification task over HSI data. In this study, we investigate recently proposed subspace learning methods for one-class classification (OCC). These methods map high-dimensional data to a lower-dimensional feature space that is optimized for one-class classification. In this way, no separate dimensionality reduction or feature selection procedure is needed in the proposed classification framework. Moreover, one-class classifiers can learn a data description from a single class only. Considering the imbalanced labels of the LULC classification problem and the rich spectral information (high number of dimensions), the proposed classification approach is well-suited for HSI data. Overall, this is a pioneering study focusing on subspace learning-based one-class classification for HSI data. We analyze the performance of the proposed subspace learning one-class classifiers in the proposed pipeline. Our experiments validate that the proposed approach helps tackle the curse of dimensionality along with the imbalanced nature of HSI data.

    Bayesian gravitation based classification for hyperspectral images.

    Integration of spectral and spatial information is extremely important for the classification of high-resolution hyperspectral images (HSIs). Gravitation describes the interaction among celestial bodies, and it can be applied to measure similarity between data for image classification. However, gravitation is hard to combine with spatial information and has rarely been applied in HSI classification. This paper proposes a Bayesian Gravitation based Classification (BGC) to integrate the spectral and spatial information of local neighbors and training samples. In the BGC method, each testing pixel is first assumed to be a massive object with unit volume and a particular density, where the density is taken as the data mass in BGC. Specifically, the data mass is formulated as an exponential function of the spectral distribution of its neighbors and the spatial prior distribution of its surrounding training samples, based on Bayes' theorem. Then, a joint data gravitation model is developed as the classification measure, in which the data mass is used to weigh the contribution of different neighbors in a local region. Four benchmark HSI datasets, i.e. Indian Pines, Pavia University, Salinas, and Grss_dfc_2014, are tested to verify the BGC method. The experimental results are compared with those of several well-known HSI classification methods, including support vector machines, sparse representation, and eight other state-of-the-art HSI classification methods. The BGC shows apparent superiority in the classification of high-resolution HSIs, as well as flexibility for HSIs with limited samples.
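    To illustrate the general idea of gravitation as a similarity measure, the sketch below implements a much simplified data-gravitation classifier: each training sample pulls on the test sample with force mass / distance², and the class with the largest total pull wins. This is not the BGC method. The Bayesian data mass and the spatial prior described in the abstract are omitted, unit masses are assumed by default, and the function name and toy data are hypothetical.

```python
import numpy as np

def gravitation_classify(x, X_train, y_train, masses=None, eps=1e-12):
    """Assign x to the class whose training samples exert the largest total pull.

    Simplifying assumption: unit masses unless `masses` is given; BGC instead
    derives the mass from spectral and spatial distributions.
    """
    if masses is None:
        masses = np.ones(len(X_train))
    # Squared Euclidean distance to every training sample (eps avoids division by zero).
    d2 = np.sum((X_train - x) ** 2, axis=1) + eps
    pull = masses / d2
    classes = np.unique(y_train)
    totals = np.array([pull[y_train == c].sum() for c in classes])
    return classes[np.argmax(totals)]

# Toy two-band "spectra" for two classes.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y_train = np.array([0, 0, 1, 1])

near_first = gravitation_classify(np.array([0.2, 0.3]), X_train, y_train)
near_second = gravitation_classify(np.array([5.1, 5.4]), X_train, y_train)
```

    Passing a non-uniform `masses` array is the hook where a learned or prior-based weighting, such as the one the abstract describes, would enter.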

    Introducing artificial data generation in active learning for land use/land cover classification

    Fonseca, J., Douzas, G., & Bacao, F. (2021). Increasing the effectiveness of active learning: Introducing artificial data generation in active learning for land use/land cover classification. Remote Sensing, 13(13), 1-20. [2619]. https://doi.org/10.3390/rs13132619
    In remote sensing, Active Learning (AL) has become an important technique to collect informative ground truth data “on-demand” for supervised classification tasks. Despite its effectiveness, it is still significantly reliant on user interaction, which makes it both expensive and time consuming to implement. Most of the current literature focuses on the optimization of AL by modifying the selection criteria and the classifiers used. Although improvements in these areas will result in more effective data collection, the use of artificial data sources to reduce human–computer interaction remains unexplored. In this paper, we introduce a new component to the typical AL framework, the data generator, a source of artificial data to reduce the amount of user-labeled data required in AL. The proposed AL framework is implemented using Geometric SMOTE as the data generator. We compare the new AL framework to the original one using similar acquisition functions and classifiers over three AL-specific performance metrics in seven benchmark datasets. We show that this modification of the AL framework significantly reduces cost and time requirements for a successful AL implementation in all of the datasets used in the experiment.
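    A rough sketch of an AL loop with a data generator step is given below. It rests on several assumptions that are not from the paper: plain SMOTE-style interpolation stands in for Geometric SMOTE, a nearest-centroid model stands in for the classifier, a smallest-margin rule stands in for the acquisition function, and an array of known labels (`oracle`) simulates the human annotator. All function names are hypothetical.

```python
import numpy as np

def smote_like(X, y, n_new, rng):
    """Interpolate between random same-class pairs (a stand-in for Geometric SMOTE)."""
    new_X, new_y = [], []
    for _ in range(n_new):
        c = rng.choice(np.unique(y))
        idx = np.flatnonzero(y == c)
        a, b = X[rng.choice(idx)], X[rng.choice(idx)]
        new_X.append(a + rng.random() * (b - a))
        new_y.append(c)
    return np.array(new_X), np.array(new_y)

def centroid_scores(X_train, y_train, X):
    """Score = negative squared distance to each class centroid (higher is closer)."""
    classes = np.unique(y_train)
    cents = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    d2 = ((X[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
    return classes, -d2

def active_learning(X_pool, oracle, init_idx, n_iter, batch, n_synth, rng):
    """Return the indices labelled after n_iter rounds of generate, train, query."""
    labeled = list(init_idx)
    for _ in range(n_iter):
        X_l, y_l = X_pool[labeled], oracle[labeled]
        # Data generator: augment the labelled set with artificial samples.
        X_s, y_s = smote_like(X_l, y_l, n_synth, rng)
        X_aug, y_aug = np.vstack([X_l, X_s]), np.concatenate([y_l, y_s])
        # Acquisition: query the pool points with the smallest top-two score margin.
        _, scores = centroid_scores(X_aug, y_aug, X_pool)
        top2 = np.sort(scores, axis=1)[:, -2:]
        margin = top2[:, 1] - top2[:, 0]
        margin[labeled] = np.inf  # never re-query already labelled points
        labeled.extend(np.argsort(margin)[:batch].tolist())
    return labeled

rng = np.random.default_rng(0)
# Toy pool: two Gaussian blobs with 4 features each.
X_pool = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(3, 1, (50, 4))])
oracle = np.array([0] * 50 + [1] * 50)
labeled = active_learning(X_pool, oracle, init_idx=[0, 1, 50, 51],
                          n_iter=5, batch=2, n_synth=10, rng=rng)
```

    The initial labelled set must contain at least two classes, since the margin rule compares the two best class scores.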