1,506 research outputs found
Unsupervised classification of multivariate geostatistical data: Two algorithms
With the increasing development of remote sensing platforms and the evolution of sampling facilities in the mining and oil industries, spatial datasets are becoming increasingly large, include a growing number of variables, and cover wider and wider areas. Therefore, it is often necessary to split the domain of study to account for radically different behaviors of the natural phenomenon over the domain and to simplify the subsequent modeling step. The definition of these areas can be seen as a problem of unsupervised classification, or clustering, where we try to divide the domain into homogeneous subdomains with respect to the values taken by the variables at hand. The application of classical clustering methods, designed for independent observations, does not ensure the spatial coherence of the resulting classes. Image segmentation methods, based on e.g. Markov random fields, are not adapted to irregularly sampled data. Other existing approaches, based on mixtures of Gaussian random functions estimated via the expectation-maximization algorithm, are limited to reasonable sample sizes and a small number of variables. In this work, we propose two algorithms based on adaptations of classical algorithms to multivariate geostatistical data. Both algorithms are model free and can handle large volumes of multivariate, irregularly spaced data. The first one proceeds by agglomerative hierarchical clustering. The spatial coherence is ensured by a proximity condition imposed for two clusters to merge. This proximity condition relies on a graph organizing the data in the coordinates space. The hierarchical algorithm can then be seen as a graph-partitioning algorithm. Following this interpretation, a spatial version of the spectral clustering algorithm is also proposed. The performance of both algorithms is assessed on toy examples and a mining dataset
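The proximity condition described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the radius-based graph, the centroid distance used as merge criterion, and the names `spatial_agglomerative`, `coords`, `values`, and `radius` are all assumptions chosen for illustration.

```python
import math
from itertools import combinations

def spatial_agglomerative(coords, values, n_clusters, radius):
    """Greedy agglomerative clustering where only graph-adjacent clusters may merge."""
    n = len(coords)
    # Assumed proximity graph: an edge joins two points no farther apart than `radius`.
    edges = {(i, j) for i, j in combinations(range(n), 2)
             if math.dist(coords[i], coords[j]) <= radius}
    members = {i: [i] for i in range(n)}  # cluster id -> indices of its points

    def centroid(c):
        pts = [values[i] for i in members[c]]
        return [sum(col) / len(pts) for col in zip(*pts)]

    while len(members) > n_clusters and edges:
        # Candidate merges are restricted to clusters linked in the graph.
        a, b = min(edges, key=lambda e: (math.dist(centroid(e[0]), centroid(e[1])), e))
        members[a].extend(members.pop(b))
        # Relabel b as a in the graph, then drop the resulting self-loops.
        edges = {tuple(sorted((a if u == b else u, a if v == b else v)))
                 for u, v in edges}
        edges = {(u, v) for u, v in edges if u != v}

    labels = [0] * n
    for k, pts in enumerate(members.values()):
        for i in pts:
            labels[i] = k
    return labels

# Six samples on a line: both ends have similar values but are not neighbours.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
values = [[0.0], [0.1], [5.0], [5.1], [0.05], [0.0]]
labels = spatial_agglomerative(coords, values, n_clusters=3, radius=1.0)
```

In this toy example the two low-valued ends of the line end up in different clusters because no graph edge connects them, which is the spatial coherence the abstract refers to.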
Towards geostatistical learning for the geosciences: A case study in improving the spatial awareness of spectral clustering
The particularities of geosystems and geoscience data must be understood before any development or implementation of statistical learning algorithms. Without such knowledge, the predictions and inferences may not be accurate and physically consistent. Accuracy, transparency and interpretability, credibility, and physical realism are minimum criteria for statistical learning algorithms when applied to the geosciences. This study briefly reviews several characteristics of geoscience data and challenges for novel statistical learning algorithms. A novel spatial spectral clustering approach is introduced to illustrate how statistical learners can be adapted for modelling geoscience data. The spatial awareness and physical realism of the spectral clustering are improved by utilising a dissimilarity matrix based on nonparametric higher-order spatial statistics. The proposed model-free technique can identify meaningful spatial clusters (i.e. meaningful geographical subregions) from multivariate spatial data at different scales without the need to define a model of co-dependence. Several mixed (e.g. continuous and categorical) variables can be used as inputs to the proposed clustering technique. The proposed technique is illustrated using synthetic and real mining datasets. The results of the case studies confirm the usefulness of the proposed method for modelling spatial data
Geostatistical simulation of two-dimensional fields of raindrop size distributions at the meso-γ scale
The large variability of the raindrop size distribution (DSD) in space and time must be taken into account to improve remote sensing of precipitation. The ability to simulate a large number of 2-D fields of DSDs sharing the same statistical properties provides a very useful simulation framework that nicely complements experimental approaches based on DSD ground measurements. These simulations can be used to investigate radar beam propagation through rain and to evaluate different radar retrieval techniques. The proposed approach uses geostatistical methods to provide structural analysis and stochastic simulation of DSD fields. First, the DSD is assumed to follow a Gamma distribution with three parameters. As a consequence, 2-D fields of DSDs can be described as a multivariate random function. The parameters are normalized using a Gaussian anamorphosis and simulated by taking advantage of fast Gaussian simulation algorithms. Variograms are used to characterize the spatial structure of the DSD fields. The generated fields have identical spatial structure and are consistent with the observations. Because intermittency cannot be simulated using this technique, the size of the simulation domain is limited to the meso-γ scale (2-20 km). To assess the proposed approach, the method is applied to data collected during intense Mediterranean rainfall. Taylor's hypothesis is invoked to convert time series into 1-D range profiles. The anisotropy of the fields is derived from radar measurements. Simulated and measured reflectivity fields are in good agreement with respect to the mean, the standard deviation, and the spatial structure, demonstrating the promising potential of the proposed stochastic model of DSD field
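The normalization step mentioned in this abstract can be approximated by a rank-based normal-score transform. This is a swapped-in empirical stand-in, not necessarily the anamorphosis the authors used. The function name `normal_scores` and the plotting position `(rank + 0.5) / n` are assumptions.

```python
from statistics import NormalDist

def normal_scores(sample):
    """Map each value to the standard-normal quantile of its empirical rank."""
    n = len(sample)
    order = sorted(range(n), key=lambda i: sample[i])
    scores = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n keeps the probability strictly inside (0, 1).
        scores[i] = NormalDist().inv_cdf((rank + 0.5) / n)
    return scores

# Hypothetical values of one DSD parameter at three locations.
scores = normal_scores([3.2, 0.5, 10.0])
```

Tied values receive distinct scores here (ties are broken by input order); a production transform would need an explicit tie policy and a back-transform to return simulated Gaussian values to the original scale.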
Remote sensing studies and morphotectonic investigations in an arid rift setting, Baja California, Mexico
The Gulf of California and its surrounding land areas provide a classic example
of recently rifted continental lithosphere. The recent tectonic history of eastern Baja
California has been dominated by oblique rifting that began at ~12 Ma. Thus,
extensional tectonics, bedrock lithology, long-term climatic changes, and evolving
surface processes have controlled the tectono-geomorphological evolution of the eastern
part of the peninsula since that time. In this study, digital elevation data from the Shuttle
Radar Topography Mission (SRTM) from Baja California were corrected and enhanced
by replacing artifacts with real values that were derived using a series of geostatistical
techniques. The next step was to generate accurate thematic geologic maps with high
resolution (15-m) for the entire eastern coast of Baja California. The main approach that
we used to clearly represent all the lithological units in the investigated area was object-oriented
classification based on fuzzy logic theory. The area of study was divided into
twenty-two blocks; each was classified independently on the basis of its own defined
membership function. The overall accuracy was 89.6%, suggesting that this approach is
preferable to most conventional classification techniques. The third step of this study was to assess the factors that affected the
geomorphologic development along the eastern side of Baja California, where thirty-four
drainage basins were extracted from a 15-m-resolution absolute digital elevation model
(DEM). Thirty morphometric parameters were extracted; these parameters were then
reduced using principal component analysis (PCA). Cluster analysis classification
defined four major groups of basins. We extracted stream length-gradient indices, which
highlight the differential rock uplift that has occurred along fault escarpments bounding
the basins. Also, steepness and concavity indices were extracted for bedrock channels
within the thirty-four drainage basins.
The results were highly correlated with stream length-gradient indices for each
basin. Nine basins, exhibiting steepness index values greater than 0.07, indicated a
strong tectonic signature and possible higher uplift rates in these basins. Further, our
results indicated that drainage basins in the eastern rift province of Baja California could
be classified according to the dominant geomorphologic controlling factors (i.e., fault-controlled,
lithology-controlled, or hybrid basins)
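For readers unfamiliar with the stream length-gradient index mentioned in this abstract, the usual definition for a channel reach is SL = (ΔH/ΔL) · L, where ΔH/ΔL is the reach gradient and L is the channel length from the drainage divide to the reach midpoint. The sketch below assumes that definition. The function and argument names are hypothetical and the numbers are illustrative, not taken from the study.

```python
def stream_length_gradient(elev_up, elev_down, reach_length, dist_from_divide):
    """SL index for one reach; all lengths and elevations in the same unit."""
    if reach_length <= 0:
        raise ValueError("reach_length must be positive")
    gradient = (elev_up - elev_down) / reach_length  # dH / dL
    return gradient * dist_from_divide               # (dH / dL) * L

# Illustrative reach: 20 m of drop over 1 km, midpoint 8 km from the divide.
sl = stream_length_gradient(520.0, 500.0, 1000.0, 8000.0)
```

Anomalously high SL values along a channel are commonly read as a sign of resistant lithology or active uplift, which is how the abstract uses the index.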
Development of an unsupervised remote sensing methodology to detect surface leakage from terrestrial CO2 storage sites
A robust hierarchical clustering for georeferenced data
The detection of spatially contiguous clusters is a relevant task in geostatistics, since nearby observations are more likely to share similar features than distant ones. Spatially compact groups can also make clustering results easier to interpret in terms of the detected subregions. In this paper, we propose a robust metric approach to neutralize the effect of possible outliers, i.e. an exponential transformation of a dissimilarity measure between each pair of locations, based on a non-parametric kernel estimator of the direct and cross variograms (Fouedjio, 2016) and on a different bandwidth identification, suitable for agglomerative hierarchical clustering techniques applied to data indexed by geographical coordinates. Simulation results are very promising, showing very good performance of the proposed metric relative to the baseline ones. Finally, the new clustering approach is applied to two real-world data sets, both giving locations and topsoil heavy metal concentrations
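The abstract does not give the form of the exponential transformation. One plausible choice, assumed here purely for illustration, is d' = 1 - exp(-d / scale), which leaves small dissimilarities nearly proportional to d and caps large, outlier-driven ones below 1. The name `exp_transform` and the `scale` parameter are hypothetical.

```python
import math

def exp_transform(dissimilarity, scale):
    """Apply d -> 1 - exp(-d / scale) to every entry of a dissimilarity matrix."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return [[1.0 - math.exp(-d / scale) for d in row] for row in dissimilarity]

# Toy 3x3 matrix; the entry 50.0 mimics a pair inflated by an outlier.
raw = [[0.0, 1.0, 50.0],
       [1.0, 0.0, 2.0],
       [50.0, 2.0, 0.0]]
robust = exp_transform(raw, scale=2.0)
```

Because the map is monotone, the ordering of pairwise dissimilarities is preserved; only the influence of extreme values on linkage distances is reduced.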