
    Ensemble classifiers for land cover mapping

    This study presents experimental investigations of supervised ensemble classification for land cover mapping. Despite the array of classifiers available in machine learning for building an ensemble, knowing which classifiers to use for a particular dataset remains a major challenge. Ensemble methods increase classification accuracy by consulting several expert classifiers and combining their outputs into a final decision. This study generated various land cover maps by image classification in order to establish how many base classifiers should be used when creating an ensemble, and it exploited feature selection techniques to create diversity among the ensemble members. Landsat imagery of Kampala (the capital of Uganda, East Africa), the AVIRIS Indian Pines hyperspectral dataset from Indiana, and support vector machines were used to carry out the investigation. The research reveals that the relative merit of the classification approaches employed depends on the dataset used; in addition, the pre-processing stage and the strategy used during the design phase of each classifier are essential. The experimental results showed that there is no significant benefit in using many base classifiers for decision making in ensemble classification, and they also indicate how to design better ensembles for land cover mapping using a feature selection approach. The study further reports an experimental comparison of generalized support vector machines, random forests, C4.5, neural networks and bagging for land cover classification of hyperspectral images; these classifiers are among the state-of-the-art supervised machine learning methods for solving complex pattern recognition problems. The pixel purity index was used to obtain the endmembers from the Indian Pines and Washington DC Mall hyperspectral image datasets.
The generalized reduced gradient optimization algorithm was then used to estimate the fractional abundance in each image dataset, yielding numeric values for land cover classification; the fractional abundance of each pixel was obtained from the spectral signature values of the endmembers and the pixel values of the class labels. In the experiments, the classifiers showed promising results: on the Indian Pines and Washington DC Mall hyperspectral datasets, a comparison of all the classifiers' performances revealed that random forests outperformed the other classifiers and was computationally efficient. The study makes a positive contribution to the problem of classifying land cover in hyperspectral images by exploring the use of the generalized reduced gradient method together with five supervised classifiers, and the accuracy comparison is valuable for decision makers weighing trade-offs between method accuracy and complexity. The research has led to nine publications, including six international and one local conference papers, one paper published in the Computing Research Repository (CoRR), one journal paper under submission and one Springer book chapter; Abe et al. (2012) received a merit award based on the reviewer reports and the scores of the conference committee members.
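The feature-selection route to ensemble diversity mentioned in the abstract can be illustrated with a random-subspace majority vote. The sketch below is only a schematic stand-in: the nearest-centroid base learner and the synthetic two-class "pixels" are assumptions for illustration, not the classifiers or data used in the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "pixels": two land-cover classes in a 6-band feature space
# (illustrative values, not Landsat or AVIRIS data).
X = np.vstack([rng.normal(0.2, 0.1, (100, 6)),
               rng.normal(0.8, 0.1, (100, 6))])
y = np.array([0] * 100 + [1] * 100)

def fit_nearest_centroid(X, y):
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict_nearest_centroid(model, X):
    classes, centroids = model
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]

# Random-subspace ensemble: each base learner is trained on a random subset
# of the bands, which is one way feature selection can create diversity.
subsets = [rng.choice(6, size=3, replace=False) for _ in range(5)]
models = [(s, fit_nearest_centroid(X[:, s], y)) for s in subsets]

def predict_ensemble(models, X):
    votes = np.stack([predict_nearest_centroid(m, X[:, s]) for s, m in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)  # majority vote

pred = predict_ensemble(models, X)
accuracy = (pred == y).mean()
```

Because each member sees a different band subset, the members make partially independent errors, which is exactly what the majority vote exploits.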

    A Quantitative Assessment of Forest Cover Change in the Moulouya River Watershed (Morocco) by the Integration of a Subpixel-Based and Object-Based Analysis of Landsat Data

    A quantitative assessment of forest cover change in the Moulouya River watershed (Morocco) was carried out by means of an innovative approach based on atmospherically corrected Landsat reflectance images from 1984 (Landsat 5 Thematic Mapper) and 2013 (Landsat 8 Operational Land Imager). An object-based image analysis (OBIA) was undertaken to classify segmented objects as forested or non-forested within the 2013 Landsat orthomosaic. A Random Forest classifier was applied to a set of training data based on a feature vector composed of different types of object features, such as vegetation indices, mean spectral values and pixel-based fractional cover derived from probabilistic spectral mixture analysis. Very high spatial resolution image data from Google Earth 2013 were employed to train and validate the Random Forest classifier, ranking the NDVI vegetation index and the corresponding pixel-based percentages of photosynthetic vegetation and bare soil as the most statistically significant object features for extracting forested and non-forested areas. An overall classification accuracy of 92.34% was achieved. The classification scheme was then applied to the 1984 Landsat data to extract the forest cover change between 1984 and 2013, showing a slight net increase of 5.3% (ca. 8800 ha) in forested areas for the whole region.
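The NDVI and fractional-cover object features named above are simple per-pixel computations that are then aggregated per segmented object. A minimal sketch, assuming illustrative reflectance values and a hypothetical segment-id array rather than real Landsat bands and segmentation output:

```python
import numpy as np

# Illustrative reflectance values for four pixels (not real Landsat data):
# NIR is high and red low over vegetation; the gap narrows over bare ground.
red = np.array([0.05, 0.06, 0.30, 0.28])
nir = np.array([0.45, 0.50, 0.32, 0.30])

# NDVI = (NIR - red) / (NIR + red), in [-1, 1]; higher values indicate
# denser photosynthetic vegetation.
ndvi = (nir - red) / (nir + red)

# In an OBIA workflow the pixel values are aggregated per segmented object,
# and the per-object statistic (here a mean) becomes one entry of the
# feature vector fed to the classifier.
segment_ids = np.array([0, 0, 1, 1])
object_mean_ndvi = np.array([ndvi[segment_ids == s].mean() for s in (0, 1)])
```

In the same way, mean spectral values and fractional-cover percentages would each contribute further entries to every object's feature vector.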

    Temporal optimisation of image acquisition for land cover classification with random forest and MODIS time-series

    The analysis and classification of land cover is one of the principal applications of terrestrial remote sensing. Due to the seasonal variability of different vegetation types and land surface characteristics, the ability to discriminate land cover types changes over time. Multi-temporal classification can improve classification accuracy, but constraints such as financial restrictions or atmospheric conditions may impede its application. Optimising the timing and frequency of image acquisition can increase the effectiveness of the classification process. For this purpose, the Feature Importance (FI) measure of the state-of-the-art machine learning method Random Forest was used to determine the optimal image acquisition periods for a general (Grassland, Forest, Water, Settlement, Peatland) and a Grassland-specific (Improved Grassland, Semi-Improved Grassland) land cover classification in central Ireland, based on a 9-year time-series of MODIS Terra 16-day composite data (MOD13Q1). Feature Importances for each acquisition period of the Enhanced Vegetation Index (EVI) and the Normalised Difference Vegetation Index (NDVI) were calculated for both classification scenarios. In the general land cover classification, December and January showed the highest, and July and August the lowest, separability for both VIs over the entire nine-year period. This temporal separability was reflected in the classification accuracies, where the optimal choice of image dates outperformed the worst image date by 13% using NDVI and 5% using EVI in a mono-temporal analysis. With the addition of the next-best image periods to the data input, the classification accuracies converged quickly to their limit at around 8–10 images. The binary classification schemes, using two classes only, showed a stronger seasonal dependency, with higher intra-annual but lower inter-annual variation.
Nonetheless, anomalous weather conditions, such as the cold winter of 2009/2010, can alter the temporal separability pattern significantly. Due to the extensive use of the NDVI for land cover discrimination, the findings of this study should be transferable to data from other optical sensors with a higher spatial resolution. However, the high impact of outliers from the general climatic pattern highlights the limits of spatial transferability to locations with different climatic and land cover conditions. The use of high-temporal, moderate-spatial-resolution data such as MODIS, in conjunction with machine-learning techniques, proved to be a good basis for predicting image acquisition timing for optimal land cover classification results.
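The idea of ranking acquisition periods by their discriminative power can be sketched on synthetic data. The sketch below uses a simple univariate class-separability score as a lightweight stand-in for Random Forest's Feature Importance measure; the two-class monthly vegetation-index profiles are fabricated to mimic the winter-peaked separability described above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic yearly profiles of a vegetation index (12 monthly composites)
# for two land cover classes; their separability is made to peak around
# December/January (month 0) and vanish in mid-summer, mimicking the
# seasonal pattern reported above.
months = np.arange(12)
separation = 0.2 * (1 + np.cos(2 * np.pi * months / 12))  # max at month 0
X0 = rng.normal(0.5, 0.05, (200, 12))
X1 = rng.normal(0.5, 0.05, (200, 12)) + separation

# Per-month separability score: |difference of class means| / pooled std,
# a lightweight stand-in for Random Forest's Feature Importance ranking.
diff = np.abs(X1.mean(axis=0) - X0.mean(axis=0))
pooled_std = np.sqrt((X0.var(axis=0) + X1.var(axis=0)) / 2)
score = diff / pooled_std

# Acquisition months ranked from most to least informative.
ranked_months = np.argsort(score)[::-1]
best_month = int(ranked_months[0])
```

Adding image dates in this ranked order is the greedy strategy that, in the study, made accuracies converge after roughly 8–10 images.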

    On the benefits of clustering approaches in digital soil mapping: an application example concerning soil texture regionalization

    High-resolution soil maps are urgently needed by land managers and researchers for a variety of applications. Digital soil mapping (DSM) allows us to regionalize soil properties by relating them to environmental covariates with the help of an empirical model. In this study, a legacy soil dataset was used to train a machine learning algorithm to predict the particle size distribution within the catchment of the Bode River in Saxony-Anhalt (Germany). The random forest ensemble learning method was used to predict soil texture from environmental covariates derived from a digital elevation model, land cover data and geologic maps. We studied the usefulness of clustering in addressing various aspects of the DSM procedure. To improve the areal representativity of the legacy soil data in terms of spatial variability, the environmental covariates were used to cluster the landscape of the study area into spatial units for stratified random sampling. Different sampling strategies were used to create balanced training data and were evaluated on their ability to improve model performance. Clustering was also involved in feature selection and stratified cross-validation. Under the best-performing sampling strategy, the resulting models achieved an R² of 0.29 to 0.50 in topsoils and 0.16 to 0.32 in deeper soil layers. Overall, clustering appears to be a versatile tool that can be employed at various steps of the DSM procedure. Beyond the successful applications shown here, further fields of application in DSM were identified, one of which is finding adequate means to include expert knowledge.
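The clustering-for-stratified-sampling step can be sketched as follows. The minimal k-means and the three synthetic covariates are illustrative assumptions, not the study's actual covariate stack or clustering configuration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic environmental covariates (say, elevation, slope and a land
# cover index) for 300 candidate locations; purely illustrative values.
covariates = rng.normal(size=(300, 3))

def kmeans_labels(X, k, iters=20, seed=0):
    """Minimal k-means, standing in for the clustering step of a DSM workflow."""
    local_rng = np.random.default_rng(seed)
    centers = X[local_rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Cluster the "landscape" into spatial strata, then draw the same number of
# training points from every stratum to balance the training data.
strata = kmeans_labels(covariates, k=4)
per_stratum = 10
sample_idx = np.concatenate([
    rng.choice(np.flatnonzero(strata == j), per_stratum, replace=False)
    for j in range(4)
])
```

The balanced `sample_idx` would then feed the regression model, and the same strata can serve as folds for stratified cross-validation.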

    Benchmark of machine learning methods for classification of a Sentinel-2 image

    Thanks mainly to ESA and USGS, a large volume of free images of the Earth is readily available nowadays. One of the main goals of remote sensing is to label images according to a set of semantic categories, i.e. image classification. This is a very challenging task, since the land cover of a specific class may present large spatial and spectral variability and objects may appear at different scales and orientations. In this study, we report the results of benchmarking nine machine learning algorithms, tested for accuracy and for speed in training and classification, on land-cover classes in a Sentinel-2 dataset. The following machine learning methods (MLM) were tested: linear discriminant analysis, k-nearest neighbour, random forests, support vector machines, multilayer perceptron, multilayer perceptron ensemble, ctree, boosting and logistic regression. Validation is carried out using a control dataset consisting of an independent classification into 11 land-cover classes of an area of about 60 km², obtained by manual visual interpretation of high-resolution images (20 cm ground sampling distance) by experts. Five of the eleven classes are used in this study, since the others have too few samples (pixels) for the testing and validation subsets. The classes used are: (i) urban, (ii) sowable areas, (iii) water, (iv) tree plantations and (v) grasslands. Validation is carried out using three different approaches: (i) using pixels from the training dataset (train), (ii) using pixels from the training dataset with k-fold cross-validation (kfold) and (iii) using all pixels from the control dataset (full). Five accuracy indices are calculated for the comparison between the values predicted by each model and the control values over three sets of data: the training dataset (train), the whole control dataset (full) and k-fold cross-validation (kfold) with ten folds.
Results from validation of predictions on the whole dataset (full) show the random forests method with the highest values, with a kappa index ranging from 0.55 with the largest number of training pixels to 0.42 with the smallest. The two neural networks (multilayer perceptron and its ensemble) and the support vector machines, with the default radial basis function kernel, follow closely with comparable performance.
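Two of the building blocks of such a benchmark, the kappa index and k-fold cross-validated accuracy, can be sketched directly. The nearest-centroid demo classifier and synthetic data below are assumptions for illustration; any of the nine MLMs could be plugged into the same loop.

```python
import numpy as np

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    classes = np.unique(np.concatenate([y_true, y_pred]))
    po = (y_true == y_pred).mean()
    pe = sum((y_true == c).mean() * (y_pred == c).mean() for c in classes)
    return (po - pe) / (1 - pe)

def kfold_accuracy(fit, predict, X, y, k=10, seed=0):
    """Mean accuracy over k folds -- the 'kfold' validation approach above."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])
        accs.append((predict(model, X[test]) == y[test]).mean())
    return float(np.mean(accs))

# Demo on synthetic two-class data with a nearest-centroid classifier.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
fit = lambda X, y: (X[y == 0].mean(axis=0), X[y == 1].mean(axis=0))
predict = lambda m, X: (((X - m[1]) ** 2).sum(axis=1)
                        < ((X - m[0]) ** 2).sum(axis=1)).astype(int)
acc = kfold_accuracy(fit, predict, X, y, k=10)
```

Unlike raw accuracy, kappa discounts the agreement expected by chance, which is why it is the headline index when class frequencies are imbalanced.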

    Identifying Mislabeled Training Data

    This paper presents a new approach to identifying and eliminating mislabeled training instances for supervised learning. The goal of this approach is to improve the classification accuracy produced by learning algorithms by improving the quality of the training data. Our approach uses a set of learning algorithms to create classifiers that serve as noise filters for the training data. We evaluate single-algorithm, majority-vote and consensus filters on five datasets that are prone to labeling errors. Our experiments illustrate that filtering significantly improves classification accuracy for noise levels of up to 30 percent. An analytical and empirical evaluation of the precision of our approach shows that consensus filters are conservative: they rarely throw away good data, at the expense of retaining bad data, whereas majority-vote filters are better at detecting bad data, at the expense of throwing away some good data. This suggests that consensus filters are preferable when data are scarce, whereas majority-vote filters are preferable when data are abundant.
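A majority-vote and consensus filter along these lines can be sketched with cross-validated predictions from several base classifiers. The three toy classifiers and the synthetic 10%-noise data below are illustrative assumptions, not the paper's actual filter ensemble or datasets.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two well-separated synthetic classes; 10% of the labels are then flipped
# to simulate mislabeled training instances.
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
y_clean = np.array([0] * 100 + [1] * 100)
noisy = rng.choice(200, 20, replace=False)
y = y_clean.copy()
y[noisy] ^= 1

# Three deliberately different toy classifiers acting as noise filters.
def nearest_centroid(Xtr, ytr, Xte):
    c0, c1 = Xtr[ytr == 0].mean(axis=0), Xtr[ytr == 1].mean(axis=0)
    return (((Xte - c1) ** 2).sum(axis=1) < ((Xte - c0) ** 2).sum(axis=1)).astype(int)

def one_nn(Xtr, ytr, Xte):
    d = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(axis=2)
    return ytr[d.argmin(axis=1)]

def axis_threshold(Xtr, ytr, Xte):
    t = (Xtr[ytr == 0, 0].mean() + Xtr[ytr == 1, 0].mean()) / 2
    return (Xte[:, 0] > t).astype(int)

filters = [nearest_centroid, one_nn, axis_threshold]

# Cross-validated filtering: each classifier is trained on k-1 folds and
# predicts the held-out fold; a vote against an instance is recorded when
# the prediction disagrees with the instance's (possibly noisy) label.
k = 5
folds = np.array_split(rng.permutation(200), k)
wrong_votes = np.zeros(200, dtype=int)
for i in range(k):
    test = folds[i]
    train = np.concatenate([folds[j] for j in range(k) if j != i])
    for f in filters:
        wrong_votes[test] += (f(X[train], y[train], X[test]) != y[test]).astype(int)

majority_flagged = wrong_votes >= 2   # more than half of the 3 filters object
consensus_flagged = wrong_votes == 3  # all filters object
```

Instances flagged by the chosen rule would be discarded before the final classifier is trained; the consensus rule flags a subset of what the majority rule flags, which is why it is the more conservative filter.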