83 research outputs found

    Transforming Feature Space to Interpret Machine Learning Models

    Model-agnostic tools for interpreting machine-learning models struggle to summarize the joint effects of strongly dependent features in high-dimensional feature spaces, which play an important role in pattern recognition, for example in remote sensing of landcover. This contribution proposes a novel approach that interprets machine-learning models through the lens of feature space transformations. It can be used to enhance unconditional as well as conditional post-hoc diagnostic tools including partial dependence plots, accumulated local effects plots, or permutation feature importance assessments. While the approach can also be applied to nonlinear transformations, we focus on linear ones, including principal component analysis (PCA) and a partial orthogonalization technique. Structured PCA and diagnostics along paths offer opportunities for representing domain knowledge. The new approach is implemented in the R package `wiml`, which can be combined with existing explainable machine-learning packages. A case study on remote-sensing landcover classification with 46 features is used to demonstrate the potential of the proposed approach for model interpretation by domain experts. Comment: 13 pages, 7 figures, 1 table

    Geostatistics without Stationarity Assumptions within Geographical Information Systems

    The present work deals with two challenging problems of applied geostatistics: (i) Stationarity assumptions often do not hold under real-world conditions. (ii) Geostatistical methods have to be linked with spatial databases in order to be applicable in non-stationary situations. Solutions for both problems are proposed and implemented. (i) A central assumption in geostatistics is the stationarity of the process. However, the spatial variability of many natural phenomena depends heavily on the local geology, which is nonstationary in most cases. To deal with this, the concept of process stationarity is replaced by a stationarity of the governing influence relating the local semivariogram and the local geology as stored in a Geographical Information System (GIS). A construction method is used that can meaningfully incorporate additional spatial information from the GIS, e.g. smoothly varying geology in the investigated area, spatially varying anisotropy induced by mountainous morphology, or geological faults interrupting continuity. Least-squares parameter estimation is used for fitting nonstationary semivariogram models in typical example situations, leading to non-linear optimization problems. Furthermore, a method for semivariogram parameter estimation in the presence of a linear trend is proposed. (ii) Geostatistical tools that make use of the local geology need direct access to the data stored in the GIS. A link between the presented geostatistical tools and the GIS software ArcView was established. Thus, spatial data such as measured contaminant concentrations, soil properties and morphology can be incorporated in geostatistical analyses. R code that fits nonstationary semivariogram models and performs kriging was implemented and can be obtained from the author.
    It is applied to simulated datasets.

    Potential of Space-Borne Hyperspectral Data for Biomass Quantification in an Arid Environment : Advantages and Limitations

    In spite of considerable efforts to monitor global vegetation, biomass quantification in drylands is still a major challenge due to low spectral resolution and considerable background effects. Hence, this study examines the potential of the space-borne hyperspectral Hyperion sensor compared to the multispectral Landsat OLI sensor in predicting dwarf shrub biomass in an arid region characterized by challenging conditions for satellite-based analysis: the Eastern Pamirs of Tajikistan. We calculated vegetation indices for all available wavelengths of both sensors, correlated these indices with field-mapped biomass while considering the multiple comparison problem, and assessed the predictive performance of single-variable linear models constructed with data from each of the sensors. Results showed an increased performance of the hyperspectral sensor and the particular suitability of indices capturing the short-wave infrared spectral region in dwarf shrub biomass prediction. Performance was considerably poorer in the area with less vegetation cover. Furthermore, spatial transferability of vegetation indices was not feasible in this region, underlining the importance of repeated model building. This study indicates that upcoming space-borne hyperspectral sensors increase the performance of biomass prediction in the world’s arid environments.

    mlr3spatiotempcv: Spatiotemporal resampling methods for machine learning in R

    Spatial and spatiotemporal machine-learning models require a suitable framework for their model assessment, model selection, and hyperparameter tuning, in order to avoid error estimation bias and over-fitting. This contribution reviews the state-of-the-art in spatial and spatiotemporal cross-validation, and introduces the R package `mlr3spatiotempcv` as an extension package of the machine-learning framework `mlr3`. Currently, various R packages implementing different spatiotemporal partitioning strategies exist: `blockCV`, `CAST`, `skmeans` and `sperrorest`. The goal of `mlr3spatiotempcv` is to gather the available spatiotemporal resampling methods in R and make them available to users through a simple and common interface. This is made possible by integrating the package directly into the `mlr3` machine-learning framework, which already has support for generic non-spatiotemporal resampling methods such as random partitioning. One advantage is the use of a consistent nomenclature in an overarching machine-learning toolkit instead of a varying package-specific syntax, making it easier for users to choose from a variety of spatiotemporal resampling methods. This package avoids giving recommendations on which method to use in practice, as this decision depends on the predictive task at hand, the autocorrelation within the data, and the spatial structure of the sampling design or geographic objects being studied. Comment: 35 pages, 15 figures, 1 table
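
The idea of spatial partitioning mentioned in this abstract can be illustrated with a minimal sketch. It is written in plain Python rather than R and does not use or reproduce the `mlr3spatiotempcv` interface; the function names `spatial_block_folds` and `train_test_indices`, the square-block scheme, and the toy coordinates are all assumptions made for illustration. Whole blocks, not individual points, are assigned to folds, so nearby observations do not end up on both sides of a train/test split.

```python
import random


def spatial_block_folds(coords, block_size, n_folds, seed=0):
    """Assign points to folds so that all points in one square block share a fold.

    Hypothetical helper for illustration; not part of any package named above.
    """
    # Map each (x, y) point to the grid block that contains it.
    blocks = [(int(x // block_size), int(y // block_size)) for x, y in coords]
    unique_blocks = sorted(set(blocks))
    # Shuffle the blocks (not the points) and deal them out to folds in turn.
    rng = random.Random(seed)
    rng.shuffle(unique_blocks)
    fold_of_block = {b: i % n_folds for i, b in enumerate(unique_blocks)}
    return [fold_of_block[b] for b in blocks]


def train_test_indices(folds, test_fold):
    """Return (train, test) index lists for one held-out fold."""
    test = [i for i, f in enumerate(folds) if f == test_fold]
    train = [i for i, f in enumerate(folds) if f != test_fold]
    return train, test


# Made-up toy data: a 6 x 6 grid of points, partitioned into 2 x 2 blocks.
coords = [(x + 0.5, y + 0.5) for x in range(6) for y in range(6)]
folds = spatial_block_folds(coords, block_size=2.0, n_folds=3)
train, test = train_test_indices(folds, test_fold=0)
```

Random partitioning would instead shuffle the individual points, which is the non-spatial baseline the abstract contrasts with. Block size is a modelling choice that depends on the range of spatial autocorrelation in the data.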

    Performance evaluation and hyperparameter tuning of statistical and machine-learning models using spatial data

    Machine-learning algorithms have gained popularity in recent years in the field of ecological modeling due to their promising predictive performance on classification problems. While the application of such algorithms has been greatly simplified in recent years by their well-documented integration in commonly used statistical programming languages such as R, there are several practical challenges in the field of ecological modeling related to unbiased performance estimation, optimization of algorithms using hyperparameter tuning, and spatial autocorrelation. We address these issues in a comparison of several widely used machine-learning algorithms, namely Boosted Regression Trees (BRT), weighted k-Nearest Neighbor (WKNN), Random Forest (RF) and Support Vector Machine (SVM), with traditional parametric algorithms such as logistic regression (GLM) and semi-parametric ones like generalized additive models (GAM). Different nested cross-validation methods including hyperparameter tuning methods are used to evaluate model performances with the aim of obtaining bias-reduced performance estimates. As a case study, the spatial distribution of the forest disease Diplodia sapinea in the Basque Country in Spain is investigated using common environmental variables such as temperature, precipitation, soil or lithology as predictors. Results show that GAM and RF (mean AUROC estimates 0.708 and 0.699) outperform all other methods in predictive accuracy. The effect of hyperparameter tuning saturates at around 50 iterations for this data set. The AUROC differences between the bias-reduced (spatial cross-validation) and overoptimistic (non-spatial cross-validation) performance estimates of the GAM and RF are 0.167 (24%) and 0.213 (30%), respectively. It is recommended to also use spatial partitioning for cross-validation hyperparameter tuning of spatial data.
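
The AUROC figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below rests on two assumptions that the abstract does not state explicitly: that the reported means (0.708 and 0.699) are the spatial cross-validation estimates, and that the percentages are taken relative to those spatial estimates. Under these assumptions the quoted 24% and 30% are reproduced after rounding.

```python
# Mean AUROC values quoted in the abstract (assumed to be the spatial CV estimates).
spatial_auroc = {"GAM": 0.708, "RF": 0.699}
# Reported differences between the non-spatial and spatial estimates.
difference = {"GAM": 0.167, "RF": 0.213}

non_spatial_auroc = {}
relative_pct = {}
for model, auroc in spatial_auroc.items():
    # Assumption: non-spatial estimate = spatial estimate + reported difference.
    non_spatial_auroc[model] = auroc + difference[model]
    # Assumption: the percentage is expressed relative to the spatial estimate.
    relative_pct[model] = 100 * difference[model] / auroc

for model in spatial_auroc:
    print(f"{model}: non-spatial AUROC ~ {non_spatial_auroc[model]:.3f}, "
          f"relative difference ~ {relative_pct[model]:.0f}%")
```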

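    Historical debris flows and flash floods in the Río Blanco basin (32°55'-33°10' and 69°10'-69°25'), Mendoza [Flujos de detritos y aluviones históricos en la cuenca del río Blanco (32°55'-33°10' y 69°10'-69°25'), Mendoza]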

    The Río Blanco basin is subject to an arid climate with a mean annual precipitation of 400 mm. Intense, short-duration rainfall is concentrated in the three summer months and frequently triggers debris flows and flash floods in the Cordón del Plata, causing variations in the discharge of the Río Blanco through the extraordinary input from these events. Information on the occurrence of debris flows and flash floods reported between 1942 and 2001 in the Río Blanco basin was compiled and analyzed from newspaper reports, data provided by local residents, rainfall records from the area, and years of climatic anomalies, in an attempt to establish the minimum precipitation required to destabilize these slopes. During this period, 18 debris flows and flash floods were recorded, the most important of which occurred in 1960, 1967, 1970 and 1982, with two events occurring in each of the years 1954, 1967, 1982 and 1993. However, the intensity of debris flow and flash flood events appears to increase during years of climatic anomalies corresponding to the warm phase of the ENSO phenomenon ("El Niño").
    Affiliation: Paez, Maria Solange. Consejo Nacional de Investigaciones Científicas y Técnicas. Científico Tecnológico Mendoza. Instituto Argentino de Nivología, Glaciología y Ciencias Ambientales; Argentina
    Affiliation: Moreiras, Stella Maris. Consejo Nacional de Investigaciones Científicas y Técnicas. Científico Tecnológico Mendoza. Instituto Argentino de Nivología, Glaciología y Ciencias Ambientales; Argentina
    Affiliation: Brenning, Alexander. University of Waterloo. Department of Geography and Environmental Management; Canada
    Affiliation: Giambiagi, Laura Beatriz. Consejo Nacional de Investigaciones Científicas y Técnicas. Científico Tecnológico Mendoza. Instituto Argentino de Nivología, Glaciología y Ciencias Ambientales; Argentina

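    Mass balance of the debris-covered Pirámide glacier (central Chile, 33°S) between 1965 and 2000 using geodetic methods [Balance de masa del glaciar cubierto del Pirámide (Chile Central, 33°S) entre 1965 y 2000 aplicando métodos geodésicos]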

    Among the most worrying consequences of climate warming in central Chile are ice loss and the retreat of Andean glaciers, owing to their effects on the availability of water resources. In this context, the behavior of debris-covered glaciers has been little studied, even though they make up 14% of the glacier surface in the Andes of Santiago and a larger share of the ablation zones where ice loss is concentrated. Using geodetic methods, this study calculated the net mass balance of the Pirámide glacier (4.6 km²), the largest debris-covered glacier in the Río Yeso basin, which supplies the city of Santiago with drinking water. To obtain the elevation difference between 1965 and 2000, digital elevation models (DEMs) were prepared by interpolating contour lines from a topographic map (Instituto Geográfico Militar, IGM) and by photogrammetric restitution of stereoscopic aerial photographs (Servicio Aerofotogramétrico de la Fuerza Aérea, SAF). Accuracies of 10.6 m (IGM) and 7.5 m (SAF) were achieved for the DEMs and 12.5 m for the difference between the DEMs, resulting in a margin of error of 4.0 m at the 95% confidence level for the mean elevation difference at 40 fixed points. The total elevation decrease of -9.69 m, averaged over the glacier surface, is equivalent to an annual net mass balance of -0.249 m a⁻¹ water equivalent (w.e.), or a loss of 40 million m³ of water (±40% at the 95% confidence level). This is around 23% of the ice mass of the Pirámide glacier and corresponds to a potential average runoff on the order of 100 l s⁻¹ during summer, which underlines the importance of this non-renewable water resource for water availability in the basin. KEYWORDS: Debris-Covered Glacier, Mass Balance, Climate Change, Andes of Santiago
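
The geodetic mass-balance figures in this abstract follow from simple arithmetic, sketched below. The glacier area, the mean elevation change and the two survey years come from the abstract. The ice-to-water conversion factor of 0.9 is an assumption: the abstract does not state it, but it reproduces the quoted values. Variable names are illustrative.

```python
# Values taken from the abstract.
area_m2 = 4.6e6             # glacier area: 4.6 km^2
elevation_change_m = -9.69  # mean surface elevation change, 1965 to 2000
years = 2000 - 1965         # length of the observation period (35 years)

# ASSUMPTION (not stated in the abstract): ice density relative to water.
ice_to_water = 0.9

# Annual net mass balance in metres of water equivalent per year.
annual_balance_we = elevation_change_m / years * ice_to_water

# Total water-equivalent volume lost over the whole period, in cubic metres.
volume_loss_m3 = -elevation_change_m * area_m2 * ice_to_water

print(f"annual net mass balance: {annual_balance_we:.3f} m w.e. per year")
print(f"water-equivalent loss: {volume_loss_m3 / 1e6:.1f} million m^3")
```

With these inputs the annual balance rounds to -0.249 m w.e. per year and the loss to roughly 40 million m³, matching the abstract.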

    Do Red Edge and Texture Attributes from High-Resolution Satellite Data Improve Wood Volume Estimation in a Semi-Arid Mountainous Region?

    Remote sensing-based woody biomass quantification in sparsely vegetated areas is often limited when using only common broadband vegetation indices as input data for correlation with ground-based measured biomass information. Red edge indices and texture attributes are often suggested as a means to overcome this issue. However, clear recommendations on the suitability of specific proxies to provide accurate biomass information in semi-arid to arid environments are still lacking. This study contributes to the understanding of using multispectral high-resolution satellite data (RapidEye), specifically red edge and texture attributes, to estimate wood volume in semi-arid ecosystems characterized by scarce vegetation. LASSO (Least Absolute Shrinkage and Selection Operator) and random forest were used as predictive models relating in situ-measured aboveground standing wood volume to satellite data. Model performance was evaluated based on cross-validation bias, standard deviation and Root Mean Square Error (RMSE) at the logarithmic and non-logarithmic scales. Both models achieved rather limited performance in wood volume prediction. Nonetheless, model performance increased with red edge indices and texture attributes, which shows that they play an important role in semi-arid regions with sparse vegetation.
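
The three error measures named in this abstract (bias, standard deviation and RMSE, on the logarithmic and the original scale) can be sketched as follows. This is a generic illustration, not the study's code: the helper name `error_summary` is hypothetical, the numbers are made up, and the use of the population standard deviation is an assumption chosen so that RMSE² = bias² + SD² holds exactly.

```python
import math
from statistics import mean, pstdev


def error_summary(observed, predicted):
    """Bias, standard deviation and RMSE of prediction errors (hypothetical helper)."""
    errors = [p - o for o, p in zip(observed, predicted)]
    return {
        "bias": mean(errors),                            # mean error
        "sd": pstdev(errors),                            # spread of errors around the bias
        "rmse": math.sqrt(mean([e * e for e in errors])),
    }


# Made-up wood volumes for illustration only.
observed = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 44.0]

# Errors on the original (non-logarithmic) scale.
original_scale = error_summary(observed, predicted)

# Errors on the logarithmic scale: transform both series, then compare.
log_scale = error_summary([math.log(v) for v in observed],
                          [math.log(v) for v in predicted])
```

On the logarithmic scale the errors are relative rather than absolute, so large and small volumes contribute more evenly to the summary.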

    Assessing uncertainties in landslide susceptibility predictions in a changing environment (Styrian Basin, Austria)

    The assessment of uncertainties in landslide susceptibility modelling in a changing environment is an important, yet often neglected, task. In an Austrian case study, we investigated the uncertainty cascade in storylines of landslide susceptibility emerging from climate change and parametric landslide model uncertainty. In June 2009, extreme events of heavy thunderstorms occurred in the Styrian Basin, triggering thousands of landslides. Using a storyline approach, we discovered a generally lower landslide susceptibility for the pre-industrial climate, while for the future climate (2071–2100) a potential increase of 35 % in highly susceptible areas (storyline of much heavier rain) may be compensated for by much drier soils (−45 % areas highly susceptible to landsliding). However, the estimated uncertainties in predictions were generally high. While uncertainties related to within-event internal climate model variability were substantially lower than parametric uncertainties in the landslide susceptibility model (ratio of around 0.25), parametric uncertainties were of the same order as the climate scenario uncertainty for the higher warming levels (+3 and +4 K). We suggest that in future uncertainty assessments, an improved availability of event-based landslide inventories and high-resolution soil and precipitation data will help to reduce parametric uncertainties in landslide susceptibility models used to assess the impacts of climate change on landslide hazard and risk.

    Modelling of Hydrological Responses in the Upper Citarum Basin based on the Spatial Plan of West Java Province 2029 and Climate Change

    In 2010, a spatial plan for West Java Province up to 2029 was published (Perda 22/2010). The purpose of the plan is to guide settlement area development. This study aims to assess the hydrological implications of the Spatial Plan 2029 within the Upper Citarum Basin (UCB) and with regard to climate change. A hydrological simulation based on land use at the time of the plan (2010) and planned land use was performed using the JAMS/J2000 hydrological model. The settlement area from the spatial plan for 2029 was extracted and then superimposed onto the 2010 land use. Two different land-use scenarios (2010 and 2029) and a climate change scenario (1990-2030) were used for the hydrological simulation, with IPSL-CM4 and UKMO-HadCM3 being the products used for the latter. The simulation results were presented as river discharge and surface runoff. According to the simulation, the annual average river discharge is expected to increase by 1.8% up to 2029 compared to the 2010 level. More substantial changes were noticed in the surface runoff, which is projected to increase on average by 8.9% annually due to the expansion of urban areas and agricultural land use. The seasonal analysis showed that river discharge and surface runoff both increased more markedly in the wet season. The study shows the potential of the JAMS/J2000 model to assess the impacts of land-use and climate change on hydrological dynamics.