1,506 research outputs found

    Unsupervised classification of multivariate geostatistical data: Two algorithms

    With the increasing development of remote sensing platforms and the evolution of sampling facilities in the mining and oil industries, spatial datasets are becoming increasingly large, cover a growing number of variables, and span wider and wider areas. Therefore, it is often necessary to split the domain of study to account for radically different behaviors of the natural phenomenon over the domain and to simplify the subsequent modeling step. The definition of these areas can be seen as a problem of unsupervised classification, or clustering, where we try to divide the domain into homogeneous subdomains with respect to the values taken by the variables at hand. The application of classical clustering methods, designed for independent observations, does not ensure the spatial coherence of the resulting classes. Image segmentation methods, based on e.g. Markov random fields, are not adapted to irregularly sampled data. Other existing approaches, based on mixtures of Gaussian random functions estimated via the expectation-maximization algorithm, are limited to reasonable sample sizes and a small number of variables. In this work, we propose two algorithms based on adaptations of classical algorithms to multivariate geostatistical data. Both algorithms are model free and can handle large volumes of multivariate, irregularly spaced data. The first one proceeds by agglomerative hierarchical clustering. Spatial coherence is ensured by a proximity condition that two clusters must satisfy in order to merge. This proximity condition relies on a graph organizing the data in the coordinate space. The hierarchical algorithm can then be seen as a graph-partitioning algorithm. Following this interpretation, a spatial version of the spectral clustering algorithm is also proposed. The performance of both algorithms is assessed on toy examples and a mining dataset.
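The proximity condition described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the fixed-radius neighbour graph, the centroid-distance merge criterion, and every name below (`spatial_agglomerative`, `radius`, and so on) are assumptions chosen for brevity.

```python
import math
from itertools import combinations

def spatial_agglomerative(coords, values, n_clusters, radius):
    """Greedy agglomerative clustering where only graph-adjacent clusters may merge."""
    n = len(coords)
    # Assumed proximity graph: an edge joins two samples closer than `radius`.
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if math.dist(coords[i], coords[j]) <= radius]
    clusters = {i: [i] for i in range(n)}  # cluster id -> member indices
    label = list(range(n))                 # sample index -> cluster id

    def centroid(members):
        dim = len(values[0])
        return tuple(sum(values[m][k] for m in members) / len(members)
                     for k in range(dim))

    while len(clusters) > n_clusters:
        # Admissible merges: pairs of distinct clusters linked by at least one edge.
        candidates = {tuple(sorted((label[i], label[j])))
                      for i, j in edges if label[i] != label[j]}
        if not candidates:
            break  # the graph is disconnected, so no admissible merge remains
        # Merge the adjacent pair whose attribute centroids are closest.
        a, b = min(candidates,
                   key=lambda p: (math.dist(centroid(clusters[p[0]]),
                                            centroid(clusters[p[1]])), p))
        clusters[a].extend(clusters.pop(b))
        for m in clusters[a]:
            label[m] = a
    return label

# Six samples on a line; the last one resembles the first group in value
# but is only adjacent to the second group in space.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
values = [(0.0,), (0.0,), (0.0,), (10.0,), (10.0,), (1.0,)]
labels = spatial_agglomerative(coords, values, n_clusters=2, radius=1.0)
print(labels)
```

In this toy case the last sample joins its spatial neighbours rather than the group with the most similar value, which is the kind of spatial coherence a classical, non-spatial method would not guarantee.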

    Towards geostatistical learning for the geosciences: A case study in improving the spatial awareness of spectral clustering

    The particularities of geosystems and geoscience data must be understood before any development or implementation of statistical learning algorithms. Without such knowledge, the predictions and inferences may not be accurate and physically consistent. Accuracy, transparency and interpretability, credibility, and physical realism are minimum criteria for statistical learning algorithms when applied to the geosciences. This study briefly reviews several characteristics of geoscience data and challenges for novel statistical learning algorithms. A novel spatial spectral clustering approach is introduced to illustrate how statistical learners can be adapted for modelling geoscience data. The spatial awareness and physical realism of the spectral clustering are improved by utilising a dissimilarity matrix based on nonparametric higher-order spatial statistics. The proposed model-free technique can identify meaningful spatial clusters (i.e. meaningful geographical subregions) from multivariate spatial data at different scales without the need to define a model of co-dependence. Several mixed (e.g. continuous and categorical) variables can be used as inputs to the proposed clustering technique. The proposed technique is illustrated using synthetic and real mining datasets. The results of the case studies confirm the usefulness of the proposed method for modelling spatial data

    Geostatistical simulation of two-dimensional fields of raindrop size distributions at the meso-γ scale

    The large variability of the raindrop size distribution (DSD) in space and time must be taken into account to improve remote sensing of precipitation. The ability to simulate a large number of 2-D fields of DSDs sharing the same statistical properties provides a very useful simulation framework that nicely complements experimental approaches based on DSD ground measurements. These simulations can be used to investigate radar beam propagation through rain and to evaluate different radar retrieval techniques. The proposed approach uses geostatistical methods to provide structural analysis and stochastic simulation of DSD fields. First, the DSD is assumed to follow a Gamma distribution with three parameters. As a consequence, 2-D fields of DSDs can be described as a multivariate random function. The parameters are normalized using a Gaussian anamorphosis and simulated by taking advantage of fast Gaussian simulation algorithms. Variograms are used to characterize the spatial structure of the DSD fields. The generated fields have identical spatial structure and are consistent with the observations. Because intermittency cannot be simulated using this technique, the size of the simulation domain is limited to the meso-γ scale (2-20 km). To assess the proposed approach, the method is applied to data collected during intense Mediterranean rainfall. Taylor's hypothesis is invoked to convert time series into 1-D range profiles. The anisotropy of the fields is derived from radar measurements. Simulated and measured reflectivity fields are in good agreement with respect to the mean, the standard deviation, and the spatial structure, demonstrating the promising potential of the proposed stochastic model of DSD fields.
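The normalization step mentioned in this abstract can be sketched with a rank-based normal-score transform, one common empirical form of Gaussian anamorphosis. The abstract does not say which form the authors used, so the function name, the plotting position, and the sample values below are assumptions for illustration only.

```python
from statistics import NormalDist

def normal_scores(sample):
    """Empirical Gaussian anamorphosis: map each value to a standard normal score.

    Ties are not treated specially in this sketch.
    """
    n = len(sample)
    order = sorted(range(n), key=lambda i: sample[i])
    scores = [0.0] * n
    std_normal = NormalDist()
    for rank, i in enumerate(order):
        # The plotting position (rank + 0.5) / n stays strictly inside (0, 1).
        scores[i] = std_normal.inv_cdf((rank + 0.5) / n)
    return scores

# A skewed, hypothetical sample of one DSD parameter.
skewed = [0.2, 5.0, 0.9, 1.4, 0.5]
scores = normal_scores(skewed)
print(scores)
```

The transform preserves the ordering of the data while making the marginal distribution approximately Gaussian, which is what allows Gaussian simulation algorithms to be applied before back-transforming.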

    Remote sensing studies and morphotectonic investigations in an arid rift setting, Baja California, Mexico

    The Gulf of California and its surrounding land areas provide a classic example of recently rifted continental lithosphere. The recent tectonic history of eastern Baja California has been dominated by oblique rifting that began at ~12 Ma. Thus, extensional tectonics, bedrock lithology, long-term climatic changes, and evolving surface processes have controlled the tectono-geomorphological evolution of the eastern part of the peninsula since that time. In this study, digital elevation data from the Shuttle Radar Topography Mission (SRTM) for Baja California were corrected and enhanced by replacing artifacts with real values that were derived using a series of geostatistical techniques. The next step was to generate accurate, high-resolution (15-m) thematic geologic maps for the entire eastern coast of Baja California. The main approach that we used to clearly represent all the lithological units in the investigated area was object-oriented classification based on fuzzy logic theory. The area of study was divided into twenty-two blocks; each was classified independently on the basis of its own defined membership function. The overall accuracy was 89.6%, which suggests that this approach is preferable to most conventional classification techniques. The third step of this study was to assess the factors that affected the geomorphologic development along the eastern side of Baja California, where thirty-four drainage basins were extracted from a 15-m-resolution absolute digital elevation model (DEM). Thirty morphometric parameters were extracted; these parameters were then reduced using principal component analysis (PCA). Cluster analysis classification defined four major groups of basins. We extracted stream length-gradient indices, which highlight the differential rock uplift that has occurred along fault escarpments bounding the basins. Also, steepness and concavity indices were extracted for bedrock channels within the thirty-four drainage basins. The results were highly correlated with stream length-gradient indices for each basin. Nine basins, exhibiting steepness index values greater than 0.07, indicated a strong tectonic signature and possibly higher uplift rates in these basins. Further, our results indicated that drainage basins in the eastern rift province of Baja California could be classified according to the dominant geomorphologic controlling factors (i.e., fault-controlled, lithology-controlled, or hybrid basins).
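The stream length-gradient index mentioned in this abstract is commonly computed with Hack's definition, SL = (ΔH / ΔL) · L, where ΔH / ΔL is the local channel slope of a reach and L is the channel length from the divide to the reach midpoint. The abstract does not give the formula, so treating it as the one used in the study is an assumption, and the function name and numbers below are hypothetical.

```python
def sl_index(elev_upstream, elev_downstream, reach_length, dist_from_divide):
    """Stream length-gradient index, SL = (dH / dL) * L (Hack's definition).

    All arguments share one length unit, e.g. metres.
    """
    if reach_length <= 0:
        raise ValueError("reach_length must be positive")
    slope = (elev_upstream - elev_downstream) / reach_length
    return slope * dist_from_divide

# Hypothetical reach: a 20 m drop over 500 m, with its midpoint 8 km from the divide.
sl = sl_index(elev_upstream=420.0, elev_downstream=400.0,
              reach_length=500.0, dist_from_divide=8000.0)
print(sl)
```

Anomalously high values along a channel are typically read as a sign of steepening that is not explained by distance downstream alone, for example across a fault escarpment or a resistant lithology.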

    A robust hierarchical clustering for georeferenced data

    The detection of spatially contiguous clusters is a relevant task in geostatistics, since nearby observations tend to have more similar features than distant ones. Spatially compact groups can also make clustering results easier to interpret in terms of the detected subregions. In this paper, we propose a robust metric approach to neutralize the effect of possible outliers: an exponential transformation of a dissimilarity measure between each pair of locations, based on a non-parametric kernel estimator of the direct and cross variograms (Fouedjio, 2016) and on a different bandwidth identification, suitable for agglomerative hierarchical clustering techniques applied to data indexed by geographical coordinates. Simulation results are very promising, showing that the proposed metric performs very well relative to the baseline ones. Finally, the new clustering approach is applied to two real-world data sets, both giving locations and topsoil heavy metal concentrations.
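The abstract does not spell out the exponential transformation, so the sketch below uses an assumed form, d' = 1 - exp(-d / s), only to show why such a transform can dampen outliers: it is monotone, bounded below 1, and compresses very large dissimilarities. The function name, the scale `s`, and the sample values are hypothetical.

```python
import math

def exp_transform(d, scale):
    """Assumed bounded transform of a non-negative dissimilarity: 1 - exp(-d / scale)."""
    if d < 0 or scale <= 0:
        raise ValueError("d must be non-negative and scale positive")
    return 1.0 - math.exp(-d / scale)

# The last raw value mimics a dissimilarity inflated by an outlier.
raw = [0.1, 0.5, 1.0, 50.0]
robust = [exp_transform(d, scale=1.0) for d in raw]
print(robust)
```

Because the ordering of the pairs is preserved, a hierarchical clustering still ranks them the same way, while a single extreme dissimilarity can no longer dominate linkage criteria that average or sum over pairs.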