163 research outputs found

    PresenceAbsence: An R Package for Presence Absence Analysis

    The PresenceAbsence package for R provides a set of functions useful when evaluating the results of presence-absence analysis, for example, models of species distribution or the analysis of diagnostic tests. The package provides a toolkit for selecting the optimal threshold for translating a probability surface into presence-absence maps specifically tailored to their intended use. The package includes functions for calculating threshold dependent measures such as confusion matrices, percent correctly classified (PCC), sensitivity, specificity, and Kappa, and produces plots of each measure as the threshold is varied. It also includes functions to plot the Receiver Operator Characteristic (ROC) curve and calculates the associated area under the curve (AUC), a threshold independent measure of model quality. Finally, the package computes optimal thresholds by multiple criteria, and plots these optimized thresholds on the graphs.
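    The measures named in this abstract can be illustrated with a minimal Python sketch. It does not use or reproduce the R package's API: the function names, the toy data, and the convention that a probability at or above the threshold counts as a predicted presence are all assumptions made for illustration.

```python
def confusion_counts(observed, predicted_prob, threshold):
    """Return (tp, fp, fn, tn); prob >= threshold is treated as presence (assumed convention)."""
    tp = fp = fn = tn = 0
    for obs, prob in zip(observed, predicted_prob):
        pred = 1 if prob >= threshold else 0
        if pred == 1 and obs == 1:
            tp += 1
        elif pred == 1 and obs == 0:
            fp += 1
        elif pred == 0 and obs == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def threshold_measures(observed, predicted_prob, threshold):
    """Threshold-dependent measures: PCC, sensitivity, specificity, Cohen's Kappa."""
    tp, fp, fn, tn = confusion_counts(observed, predicted_prob, threshold)
    n = tp + fp + fn + tn
    pcc = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "pcc": pcc,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "kappa": (pcc - expected) / (1 - expected),
    }


def auc(observed, predicted_prob):
    """Threshold-independent AUC via pairwise comparison (ties count one half)."""
    presences = [p for o, p in zip(observed, predicted_prob) if o == 1]
    absences = [p for o, p in zip(observed, predicted_prob) if o == 0]
    wins = sum(
        1.0 if p > a else 0.5 if p == a else 0.0
        for p in presences
        for a in absences
    )
    return wins / (len(presences) * len(absences))


# Hypothetical toy data: 1 = observed presence, 0 = observed absence
observed = [1, 1, 1, 0, 0, 0, 1, 0]
probs = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
measures = threshold_measures(observed, probs, threshold=0.5)
model_auc = auc(observed, probs)
```

    Sweeping `threshold` over a grid and recording each measure would give the kind of threshold-versus-measure curves the abstract describes.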

    Comparing Nonlinear and Nonparametric Modeling Techniques for Mapping and Stratification in Forest Inventories of the Interior Western USA

    Recent emphasis has been placed on merging regional forest inventory data with satellite-based information both to improve the efficiency of estimates of population totals, and to produce regional maps of forest variables. There are numerous ways in which forest class and structure variables may be modeled as functions of remotely sensed variables, yet surprisingly little work has been directed at surveying modern statistical techniques to determine which tools are best suited to the tasks given multiple objectives and logistical constraints. Here, a series of analyses to compare nonlinear and nonparametric modeling techniques for mapping a variety of forest variables, and for stratification of field plots, was conducted using data in the Interior Western United States. The analyses compared four statistical modeling techniques for predicting two discrete and four continuous forest inventory variables. The modeling techniques include generalized additive models (GAMs), classification and regression trees (CARTs), multivariate adaptive regression splines (MARS), and artificial neural networks (ANNs). Alternative stratification schemes were also compared for estimating population totals. The analyses were conducted within six ecologically different regions using a variety of satellite-based predictor variables. The work resulted in the development of an objective modeling box that automatically models spatial response variables as functions of any assortment of predictor variables through the four nonlinear or nonparametric modeling techniques. In comparing the different modeling techniques, all proved themselves workable in an automated environment, though ANNs were more problematic. When their potential mapping ability was explored through a simple simulation, tremendous advantages were seen in use of MARS and ANN for prediction over GAMs, CART, and a simple linear model. However, much smaller differences were seen when using real data. In some instances, a simple linear approach worked virtually as well as the more complex models, while small gains were seen using more complex models in other instances. In real data runs, MARS performed (marginally) best most often for binary variables, while GAMs performed (marginally) best most often for continuous variables. After considering a subjective ease of use measure, computing time and other predictive performance measures, it was determined that MARS had many advantages over other modeling techniques. In addition, stratification tests illustrated cost-effective means to improve precision of estimates of forest population totals. Finally, the general effect of map accuracy on the relative precision of estimates of population totals obtained under simple random sampling compared to that obtained under stratified random sampling was established and graphically illustrated as a tool for management decisions.

    Evaluating the remote sensing and inventory-based estimation of biomass in the Western Carpathians

    Understanding the potential of forest ecosystems as global carbon sinks requires a thorough knowledge of forest carbon dynamics, including both sequestration and fluxes among multiple pools. The accurate quantification of biomass is important to better understand forest productivity and carbon cycling dynamics. Stand-based inventories (SBIs) are widely used for quantifying forest characteristics and for estimating biomass, but information may quickly become outdated in dynamic forest environments. Satellite remote sensing may provide a supplement or substitute. We tested the accuracy of aboveground biomass estimates modeled from a combination of Landsat Thematic Mapper (TM) imagery and topographic data, as well as SBI-derived variables in a Picea abies forest in the Western Carpathian Mountains. We employed Random Forests for non-parametric, regression tree-based modeling. Results indicated a difference in the importance of SBI-based and remote sensing-based predictors when estimating aboveground biomass. The most accurate models for biomass prediction ranged from a correlation coefficient of 0.52 for the TM- and topography-based model, to 0.98 for the inventory-based model. While Landsat-based biomass estimates were measurably less accurate than those derived from SBI, adding tree height or stand-volume as a field-based predictor to TM and topography-based models increased performance to 0.36 and 0.86, respectively. Our results illustrate the potential of spectral data to reveal spatial details in stand structure and ecological complexity. © 2011 by the authors

    Forest structure and aboveground biomass in the southwestern United States from MODIS and MISR

    Red band bidirectional reflectance factor data from the NASA MODerate resolution Imaging Spectroradiometer (MODIS) acquired over the southwestern United States were interpreted through a simple geometric–optical (GO) canopy reflectance model to provide maps of fractional crown cover (dimensionless), mean canopy height (m), and aboveground woody biomass (Mg ha−1) on a 250 m grid. Model adjustment was performed after dynamic injection of a background contribution predicted via the kernel weights of a bidirectional reflectance distribution function (BRDF) model. Accuracy was assessed with respect to similar maps obtained with data from the NASA Multiangle Imaging Spectroradiometer (MISR) and to contemporaneous US Forest Service (USFS) maps based partly on Forest Inventory and Analysis (FIA) data. MODIS and MISR retrievals of forest fractional cover and mean height both showed compatibility with the USFS maps, with MODIS mean absolute errors (MAE) of 0.09 and 8.4 m respectively, compared with MISR MAE of 0.10 and 2.2 m, respectively. The respective MAE for aboveground woody biomass was ~10 Mg ha−1, the same as that from MISR, although the MODIS retrievals showed a much weaker correlation, noting that these statistics do not represent evaluation with respect to ground survey data. Good height retrieval accuracies with respect to averages from high resolution discrete return lidar data and matches between mean crown aspect ratio and mean crown radius maps and known vegetation type distributions both support the contention that the GO model results are not spurious when adjusted against MISR bidirectional reflectance factor data. These results highlight an alternative to empirical methods for the exploitation of moderate resolution remote sensing data in the mapping of woody plant canopies and assessment of woody biomass loss and recovery from disturbance in the southwestern United States and in parts of the world where similar environmental conditions prevail

    United States Forest Disturbance Trends Observed Using Landsat Time Series

    Disturbance events strongly affect the composition, structure, and function of forest ecosystems; however, existing U.S. land management inventories were not designed to monitor disturbance. To begin addressing this gap, the North American Forest Dynamics (NAFD) project has examined a geographic sample of 50 Landsat satellite image time series to assess trends in forest disturbance across the conterminous United States for 1985-2005. The geographic sample design used a probability-based scheme to encompass major forest types and maximize geographic dispersion. For each sample location, disturbance was identified in the Landsat series using the Vegetation Change Tracker (VCT) algorithm. The NAFD analysis indicates that, on average, 2.77 Mha/yr of forests were disturbed, representing 1.09%/yr of US forestland. These satellite-based national disturbance rate estimates tend to be lower than those derived from land management inventories, reflecting both methodological and definitional differences. In particular, the VCT approach used with a biennial time step has limited sensitivity to low-intensity disturbances. Unlike prior satellite studies, our biennial forest disturbance rates vary by nearly a factor of two between high and low years. High western US disturbance rates were associated with active fire years and insect activity, while variability in the east is more strongly related to harvest rates in managed forests. We note that generating a geographic sample based on representing forest type and variability may be problematic since the spatial pattern of disturbance does not necessarily correlate with forest type. We also find that the prevalence of diffuse, non-stand clearing disturbance in US forests makes the application of a biennial geographic sample problematic. Future satellite-based studies of disturbance at regional and national scales should focus on wall-to-wall analyses with an annual time step for improved accuracy.
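    As a quick consistency check on the two rates quoted in this abstract, the forest area they jointly imply can be worked out directly. The helper name below is hypothetical, and the calculation assumes both figures refer to the same forest base.

```python
def implied_forest_area_mha(disturbed_mha_per_yr, fraction_per_yr):
    """Forest area (Mha) implied by an absolute and a fractional annual disturbance rate."""
    return disturbed_mha_per_yr / fraction_per_yr


# 2.77 Mha/yr disturbed, stated to be 1.09 %/yr of forestland
area = implied_forest_area_mha(2.77, 0.0109)
print(f"Implied forest area: {area:.0f} Mha")
```

    Under that assumption the two figures correspond to a forest base of roughly 254 Mha.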

    Assessing North American Forest Disturbance from the Landsat Archive

    Forest disturbances are thought to play a major role in controlling land-atmosphere fluxes of carbon. Under the auspices of the North American Carbon Program, the LEDAPS (Landsat Ecosystem Disturbance Adaptive Processing System) and NACP-FIA projects have been analyzing the Landsat satellite record to assess rates of forest disturbance across North America. In the LEDAPS project, wall-to-wall Landsat imagery for the period 1975-2000 has been converted to surface reflectance and analyzed for decadal losses (disturbance) or gains (regrowth) in biomass using a spectral "disturbance index". The NACP-FIA project relies on a geographic sample of dense Landsat image time series, allowing both disturbance rates and recovery trends to be characterized. Preliminary results for the 1990s indicate high rates of harvest within the southeastern US, Eastern Canada, and the Pacific Northwest, with spatially averaged (approx. 50 x 50 km) turnover periods as low as 25-40 years. Lower rates of disturbance are found in the Rockies and Northeastern US.

    Assessing small area estimates via artificial populations from KBAABB: a kNN-based approximation to ABB

    Comparing and evaluating small area estimation (SAE) models for a given application is inherently difficult. Typically, we do not have enough data in many areas to check unit-level modeling assumptions or to assess unit-level predictions empirically; and there is no ground truth available for checking area-level estimates. Design-based simulation from artificial populations can help with each of these issues, but only if the artificial populations (a) realistically represent the application at hand and (b) are not built using assumptions that could inherently favor one SAE model over another. In this paper, we borrow ideas from random hot deck, approximate Bayesian bootstrap (ABB), and k nearest neighbor (kNN) imputation methods, which are often used for multiple imputation of missing data. We propose a kNN-based approximation to ABB (KBAABB) for a different purpose: generating an artificial population when rich unit-level auxiliary data is available. We introduce diagnostic checks on the process of building the artificial population itself, and we demonstrate how to use such an artificial population for design-based simulation studies to compare and evaluate SAE models, using real data from the Forest Inventory and Analysis (FIA) program of the US Forest Service. We illustrate how such simulation studies may be disseminated and explored interactively through an online R Shiny application
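    The donor-based idea behind this abstract can be sketched in a few lines of Python. This is only an illustration of kNN donor imputation in general, not the authors' KBAABB procedure: the function name, the Euclidean distance, the uniform random choice among the k nearest sample units, and the toy data are all assumptions.

```python
import math
import random


def knn_donor_population(sample_aux, sample_resp, population_aux, k=3, seed=0):
    """Assign each population unit the response of a randomly chosen
    donor among its k nearest sample units in auxiliary-variable space."""
    rng = random.Random(seed)
    artificial = []
    for unit in population_aux:
        # Rank sample units by Euclidean distance to this population unit
        ranked = sorted(
            range(len(sample_aux)),
            key=lambda i: math.dist(unit, sample_aux[i]),
        )
        donor = rng.choice(ranked[:k])  # random donor among the k nearest
        artificial.append(sample_resp[donor])
    return artificial


# Hypothetical toy data: two auxiliary variables per unit
sample_aux = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
sample_resp = [10.0, 20.0, 50.0]
population_aux = [(0.1, 0.1), (4.9, 5.2), (0.9, 1.2)]

nearest_only = knn_donor_population(sample_aux, sample_resp, population_aux, k=1)
with_two_donors = knn_donor_population(sample_aux, sample_resp, population_aux, k=2)
```

    With `k=1` the draw is deterministic; larger `k` adds the randomness that lets repeated runs produce different artificial populations from the same sample.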

    A Regression Tree Approach using Mathematical Programming

    Regression analysis is a machine learning approach that aims to accurately predict the value of continuous output variables from certain independent input variables, via automatic estimation of their latent relationship from data. Tree-based regression models are popular in literature due to their flexibility to model higher order non-linearity and great interpretability. Conventionally, regression tree models are trained in a two-stage procedure, i.e. recursive binary partitioning is employed to produce a tree structure, followed by a pruning process of removing insignificant leaves, with the possibility of assigning multivariate functions to terminal leaves to improve generalisation. This work introduces a novel methodology of node partitioning which, in a single optimisation model, simultaneously performs the two tasks of identifying the break-point of a binary split and assignment of multivariate functions to either leaf, thus leading to an efficient regression tree model. Using six real world benchmark problems, we demonstrate that the proposed method consistently outperforms a number of state-of-the-art regression tree models and methods based on other techniques, with an average improvement of 7–60% on the mean absolute errors (MAE) of the predictions
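    To make the task in this abstract concrete, the sketch below picks a break-point for one binary split while fitting a separate linear function to each leaf. It deliberately swaps in a brute-force search with ordinary least-squares fits, scored by total absolute error, rather than the single optimisation model the abstract describes. All names, the one-dimensional input, and the toy data are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, mean_y  # constant leaf when x does not vary
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def abs_error(xs, ys, line):
    """Total absolute prediction error of a fitted line."""
    slope, intercept = line
    return sum(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys))


def best_split(xs, ys, min_leaf=2):
    """Try every break-point between sorted x values and keep the one
    whose two leaf-wise linear fits give the smallest absolute error."""
    pairs = sorted(zip(xs, ys))
    sx = [p[0] for p in pairs]
    sy = [p[1] for p in pairs]
    best = None
    for i in range(min_leaf, len(sx) - min_leaf + 1):
        if sx[i] == sx[i - 1]:
            continue  # no valid break-point between equal x values
        left = fit_line(sx[:i], sy[:i])
        right = fit_line(sx[i:], sy[i:])
        err = abs_error(sx[:i], sy[:i], left) + abs_error(sx[i:], sy[i:], right)
        if best is None or err < best[0]:
            best = (err, (sx[i - 1] + sx[i]) / 2, left, right)
    return best  # (error, break_point, left_line, right_line)


# Hypothetical piecewise-linear data: y = 2x for x < 4, y = 20 - x otherwise
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 2, 4, 6, 16, 15, 14, 13]
error, break_point, left_line, right_line = best_split(xs, ys)
```

    The exhaustive search only handles one input variable and one split at a time; it is meant to show what is being chosen, not how the proposed method chooses it.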