    Evaluation of Sampling and Cross-Validation Tuning Strategies for Regional-Scale Machine Learning Classification

    Get PDF
    High spatial resolution (1–5 m) remotely sensed datasets are increasingly being used to map land cover over large geographic areas with supervised machine learning algorithms. Although many studies have compared machine learning classification methods, the sample selection methods used to acquire training and validation data and the cross-validation techniques used to tune classifier parameters are rarely investigated, particularly on large, high spatial resolution datasets. This work therefore examines four sample selection methods (simple random, proportional stratified random, disproportional stratified random, and deliberative sampling) as well as three cross-validation tuning approaches (k-fold, leave-one-out, and Monte Carlo). In addition, the effect on accuracy of localizing sample selection to a small geographic subset of the study area, an approach sometimes used to reduce the cost of training data collection, is investigated. These methods are investigated in the context of support vector machine (SVM) classification and geographic object-based image analysis (GEOBIA), using high spatial resolution National Agricultural Imagery Program (NAIP) orthoimagery and LIDAR-derived rasters covering a 2,609 km2 regional-scale area in northeastern West Virginia, USA. Statistically based stratified sampling methods were found to generate the highest classification accuracies. Using a small number of training samples collected from only a subset of the study area provided a level of overall accuracy similar to that of an equivalently sized sample collected in a dispersed manner across the entire regional-scale dataset. Differences in accuracy among the cross-validation tuning methods were minimal. The processing times for Monte Carlo and leave-one-out cross-validation were high, especially with large training sets; for this reason, k-fold cross-validation appears to be a good choice. Classifiers trained with samples collected deliberately (i.e., not randomly) were less accurate than those trained from statistically based samples, possibly because of the high positive spatial autocorrelation in the deliberative training set. Thus, if possible, training samples should be selected randomly, and deliberative samples should be avoided.
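
    A minimal sketch (not the paper's NAIP/LIDAR GEOBIA pipeline) of how the three cross-validation tuning strategies can be compared when tuning an SVM with scikit-learn; the synthetic features and the parameter grid are illustrative assumptions:

        # Compare k-fold, leave-one-out, and Monte Carlo (ShuffleSplit) tuning of an RBF SVM.
        import time
        from sklearn.datasets import make_classification
        from sklearn.model_selection import GridSearchCV, KFold, LeaveOneOut, ShuffleSplit
        from sklearn.svm import SVC

        # Synthetic training samples standing in for GEOBIA object features.
        X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                                   n_classes=4, n_clusters_per_class=1, random_state=0)

        param_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]}
        strategies = {
            "k-fold (k=5)": KFold(n_splits=5, shuffle=True, random_state=0),
            "leave-one-out": LeaveOneOut(),
            "Monte Carlo (50 splits)": ShuffleSplit(n_splits=50, test_size=0.2, random_state=0),
        }

        for name, cv in strategies.items():
            start = time.time()
            search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=cv, n_jobs=-1)
            search.fit(X, y)
            print(f"{name}: best={search.best_params_}, "
                  f"CV accuracy={search.best_score_:.3f}, time={time.time() - start:.1f}s")

    The main structural difference between the strategies is the number of model fits per candidate parameter set (5 for k-fold, 50 for Monte Carlo, one per training sample for leave-one-out), which is why the latter two become costly as the training set grows.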

    Spatiotemporally representative and cost-efficient sampling design for validation activities in wanglang experimental site

    Get PDF
    Other grants: EC Copernicus Global Land Service (CGLOPS-1, 199494-JRC). Spatiotemporally representative Elementary Sampling Units (ESUs) are required for capturing the temporal variations in surface spatial heterogeneity through field measurements. Since inaccessibility often coexists with heterogeneity, a cost-efficient sampling design is mandatory. We proposed a sampling strategy to generate spatiotemporally representative and cost-efficient ESUs based on the conditioned Latin hypercube sampling scheme. The proposed strategy was constrained by multi-temporal Normalized Difference Vegetation Index (NDVI) imagery, and the ESUs were limited to a feasible sampling region established from accessibility criteria. A novel criterion based on the Overlapping Area (OA) between the NDVI frequency distribution histogram of the sampled ESUs and that of the entire study area was used to assess the sampling efficiency. A case study in Wanglang National Nature Reserve in China showed that the proposed strategy improves the spatiotemporal representativeness of the sampling (mean annual OA = 74.7%) compared with the single-temporally constrained (OA = 68.7%) and random sampling (OA = 63.1%) strategies. The introduction of the feasible-region constraint substantially reduces the labour-intensive in-situ characterization required, at the expense of about a 9% loss in the spatiotemporal representativeness of the sampling. Our study will support validation activities at the Wanglang experimental site, providing a benchmark for locating the nodes of automatic observation systems (e.g., LAINet), which need a spatially distributed and temporally fixed sampling design.
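
    A minimal sketch of the Overlapping Area (OA) criterion described above, assuming illustrative NDVI arrays, bin settings, and a purely random candidate sample rather than the study's imagery and conditioned Latin hypercube design:

        import numpy as np

        def overlap_area(ndvi_all, ndvi_sampled, bins=20, value_range=(0.0, 1.0)):
            """OA in [0, 1]: 1 means the sample reproduces the area-wide NDVI histogram."""
            h_all, edges = np.histogram(ndvi_all, bins=bins, range=value_range)
            h_smp, _ = np.histogram(ndvi_sampled, bins=edges)
            # Normalise both histograms to relative frequencies, then sum the bin-wise overlap.
            f_all = h_all / h_all.sum()
            f_smp = h_smp / h_smp.sum()
            return float(np.minimum(f_all, f_smp).sum())

        # Toy multi-temporal use: mean annual OA over a stack of NDVI dates.
        rng = np.random.default_rng(0)
        ndvi_stack = rng.beta(2.0, 3.0, size=(12, 10000))      # 12 dates x all pixels
        esu_idx = rng.choice(10000, size=30, replace=False)    # 30 candidate ESU locations
        mean_oa = np.mean([overlap_area(date, date[esu_idx]) for date in ndvi_stack])
        print(f"mean annual OA = {mean_oa:.1%}")

    Under this criterion, a candidate ESU set with a higher mean annual OA is more spatiotemporally representative; in the study, the proposed design raised the mean annual OA from 63.1% (random sampling) to 74.7%.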

    Evaluation of Sampling Methods for Validation of Remotely Sensed Fractional Vegetation Cover

    No full text
    Validation over heterogeneous areas is critical to ensuring the quality of remote sensing products. This paper focuses on the sampling methods used to validate the coarse-resolution fractional vegetation cover (FVC) product in the Heihe River Basin, where the patterns of spatial variation within and between land cover types differ significantly across the growth stages of the vegetation. A sampling method called the mean of surface with non-homogeneity (MSN) method and three other sampling methods are examined with real-world data obtained in 2012. A series of 15-m-resolution fractional vegetation cover reference maps were generated using regressions of field-measured and satellite data. The sampling methods were tested using the 15-m-resolution normalized difference vegetation index (NDVI) and land cover maps over a complete period of vegetation growth. Two scenes were selected to represent situations in which sampling locations were sparsely and densely distributed. The results show that the FVCs estimated using the MSN method have errors of less than approximately 0.03 in the two selected scenes. The validation accuracy of the sampling methods varies with the stratified non-homogeneity across the different growth stages of the vegetation. The MSN method, which considers both heterogeneity and autocorrelation between strata, is recommended for determining sampling locations prior to the design of an experimental campaign. In addition, the slight scaling bias caused by the non-linear relationship between NDVI and FVC samples is discussed; the sign (positive or negative) of the biases predicted using a Taylor expansion is found to be consistent with that of the real biases.
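
    A minimal sketch of the Taylor-expansion view of the scaling bias mentioned at the end of the abstract, assuming an illustrative squared NDVI-to-FVC transfer function and synthetic fine-resolution NDVI rather than the study's regression-based reference maps:

        import numpy as np

        def fvc(ndvi, ndvi_soil=0.1, ndvi_veg=0.85):
            """Illustrative non-linear (squared) NDVI-to-FVC transfer function."""
            s = np.clip((ndvi - ndvi_soil) / (ndvi_veg - ndvi_soil), 0.0, 1.0)
            return s ** 2

        rng = np.random.default_rng(1)
        fine_ndvi = rng.normal(0.5, 0.08, size=900)     # fine pixels inside one coarse pixel
        true_fvc = fvc(fine_ndvi).mean()                # reference: mean of fine-resolution FVC
        coarse_fvc = fvc(fine_ndvi.mean())              # FVC computed from the averaged NDVI

        # Second-order Taylor term: E[f(NDVI)] - f(E[NDVI]) ~= 0.5 * f''(mu) * var(NDVI).
        second_deriv = 2.0 / (0.85 - 0.1) ** 2          # f'' of the squared transfer function
        taylor_bias = 0.5 * second_deriv * fine_ndvi.var()

        print(f"actual scaling bias    = {true_fvc - coarse_fvc:+.4f}")
        print(f"Taylor-predicted bias  = {taylor_bias:+.4f}")

    The sign of the predicted bias (and here, because the assumed transfer function is quadratic, also its magnitude) closely matches the actual bias, which is the consistency the abstract reports.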

    Object-Based Supervised Machine Learning Regional-Scale Land-Cover Classification Using High Resolution Remotely Sensed Data

    Get PDF
    High spatial resolution (HR) (1–5 m) remotely sensed data in conjunction with supervised machine learning classification are commonly used to construct land-cover classifications. Despite the increasing availability of HR data, most studies investigating HR remotely sensed data and the associated classification methods employ relatively small study areas. This work therefore drew on a 2,609 km2, regional-scale study in northeastern West Virginia, USA, to investigate a number of core aspects of HR land-cover supervised classification using machine learning. Issues explored include training sample selection, cross-validation parameter tuning, the choice of machine learning algorithm, training sample set size, and feature selection. A geographic object-based image analysis (GEOBIA) approach was used. The data comprised National Agricultural Imagery Program (NAIP) orthoimagery and LIDAR-derived rasters. Statistically based stratified training sampling methods were found to generate higher classification accuracies than deliberative sampling. Subset-based sampling, in which training data are collected from a small geographic subset of the study site, did not notably decrease the classification accuracy. For the five machine learning algorithms investigated, support vector machines (SVM), random forests (RF), k-nearest neighbors (k-NN), single-layer perceptron neural networks (NEU), and learning vector quantization (LVQ), increasing the size of the training set typically improved the overall accuracy of the classification. However, RF was consistently more accurate than the other four machine learning algorithms, even when trained from a relatively small training sample set. Recursive feature elimination (RFE), which can be used to reduce the dimensionality of a training set, was found to increase the overall accuracy of both SVM and NEU classification; however, the improvement in overall accuracy diminished as the sample size increased. RFE resulted in only a small improvement in the overall accuracy of RF classification, indicating that RF is generally insensitive to the Hughes phenomenon. Nevertheless, as feature selection is an optional step in the classification process and can be discarded if it has a negative effect on classification accuracy, it should be investigated as part of best practice for supervised machine learning land-cover classification using remotely sensed data.
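
    A minimal sketch, using synthetic stand-in data rather than the West Virginia GEOBIA feature set, of recursive feature elimination (RFE) ahead of RF and SVM classification with scikit-learn; the feature counts and hyperparameters are illustrative assumptions:

        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.feature_selection import RFE
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        # Synthetic stand-in: many candidate features, few informative (a Hughes-phenomenon setting).
        X, y = make_classification(n_samples=400, n_features=60, n_informative=12,
                                   n_redundant=10, n_classes=5, n_clusters_per_class=1,
                                   random_state=0)

        models = {
            "RF": RandomForestClassifier(n_estimators=300, random_state=0),
            "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10, gamma="scale")),
        }

        # RFE driven by RF importances; placed in a pipeline so it is refit inside each CV fold.
        rfe = RFE(RandomForestClassifier(n_estimators=100, random_state=0),
                  n_features_to_select=15, step=5)

        for name, model in models.items():
            full = cross_val_score(model, X, y, cv=5).mean()
            reduced = cross_val_score(make_pipeline(rfe, model), X, y, cv=5).mean()
            print(f"{name}: all 60 features = {full:.3f}, 15 RFE-selected features = {reduced:.3f}")

    The same comparison, run on a real GEOBIA feature set, is one way to probe the abstract's finding that RFE helps SVM (and NEU) more than RF.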