9 research outputs found

    Towards geostatistical learning for the geosciences: A case study in improving the spatial awareness of spectral clustering

    The particularities of geosystems and geoscience data must be understood before any development or implementation of statistical learning algorithms. Without such knowledge, the predictions and inferences may be neither accurate nor physically consistent. Accuracy, transparency and interpretability, credibility, and physical realism are minimum criteria for statistical learning algorithms when applied to the geosciences. This study briefly reviews several characteristics of geoscience data and challenges for novel statistical learning algorithms. A novel spatial spectral clustering approach is introduced to illustrate how statistical learners can be adapted for modelling geoscience data. The spatial awareness and physical realism of the spectral clustering are improved by utilising a dissimilarity matrix based on nonparametric higher-order spatial statistics. The proposed model-free technique can identify meaningful spatial clusters (i.e. meaningful geographical subregions) from multivariate spatial data at different scales without the need to define a model of co-dependence. Several mixed (e.g. continuous and categorical) variables can be used as inputs to the proposed clustering technique. The proposed technique is illustrated using synthetic and real mining datasets. The results of the case studies confirm the usefulness of the proposed method for modelling spatial data.

    On the role of statistics in the era of big data: A call for a debate

    While discussing the plenary talk of Dunson (2016) at the 48th Scientific Meeting of the Italian Statistical Society, I formulated a few general questions on the role of statistics in the era of big data, which stimulated an interesting debate. They are reported here with the aim of engaging a larger audience on an issue which promises to radically change our discipline and, more generally, science as we know it. But is it so?

    Statistical analysis of complex and spatially dependent data: A review of Object Oriented Spatial Statistics

    We review recent advances in Object Oriented Spatial Statistics, a system of ideas, algorithms and methods that allows the analysis of high dimensional and complex data when their spatial dependence is an important issue. At the intersection of different disciplines – including mathematics, statistics, computer science and engineering – Object Oriented Spatial Statistics provides the right perspective to address key problems in varied contexts, from Earth and life sciences to urban planning. We illustrate a few paradigmatic methods applied to problems of prediction, classification and smoothing, emphasizing the key ideas on which Object Oriented Spatial Statistics relies.

    Variograms for kriging and clustering of spatial functional data with phase variation

    Spatial, amplitude and phase variations in spatial functional data are confounded. Conclusions from the popular functional trace-variogram, which quantifies spatial variation, can be misleading when analyzing misaligned functional data with phase variation. To remedy this, we describe a framework that extends amplitude-phase separation methods in functional data to the spatial setting, with a view towards performing clustering and spatial prediction. We propose a decomposition of the trace-variogram into amplitude and phase components, and quantify how spatial correlations between functional observations manifest in their respective amplitude and phase. This enables us to generate separate amplitude and phase clustering methods for spatial functional data, and develop a novel spatial functional interpolant at unobserved locations based on combining separate amplitude and phase predictions. Through simulations and real data analyses, we demonstrate advantages of our approach when compared to standard ones that ignore phase variation, through more accurate predictions and more interpretable clustering results.
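
For orientation, the standard empirical trace-variogram mentioned in this abstract can be sketched as half the average integrated squared difference between curves whose sites fall in the same distance bin. The sketch below is only that baseline quantity; it does not implement the amplitude-phase decomposition the paper proposes. The function name `trace_variogram`, its arguments, and the simple distance binning are assumptions made for illustration.

```python
import numpy as np


def trace_variogram(coords, curves, t, bin_edges):
    """Empirical trace-variogram (illustrative sketch, hypothetical helper).

    coords    : (n, d) site coordinates
    curves    : (n, m) curves sampled on the common grid t
    t         : (m,) increasing evaluation grid
    bin_edges : increasing distance-bin edges; bin b covers [edge_b, edge_{b+1})
    Returns one value per bin; NaN where a bin contains no site pairs.
    """
    coords = np.asarray(coords, dtype=float)
    curves = np.asarray(curves, dtype=float)
    t = np.asarray(t, dtype=float)

    # All unordered pairs of sites and their spatial separation.
    i, j = np.triu_indices(len(coords), k=1)
    h = np.linalg.norm(coords[i] - coords[j], axis=1)

    # Integrated squared difference between the two curves (trapezoid rule).
    sq = (curves[i] - curves[j]) ** 2
    l2 = ((sq[:, 1:] + sq[:, :-1]) / 2.0 * np.diff(t)).sum(axis=1)

    # Average within each distance bin and halve, as for a scalar variogram.
    which = np.digitize(h, bin_edges) - 1
    n_bins = len(bin_edges) - 1
    gamma = np.full(n_bins, np.nan)
    for b in range(n_bins):
        mask = which == b
        if mask.any():
            gamma[b] = l2[mask].mean() / 2.0
    return gamma
```

Because the integrand compares curves pointwise in `t`, a pure time shift between two otherwise identical curves inflates this quantity, which is the confounding of phase and spatial variation that the abstract describes.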

    Bagging Voronoi classifiers for clustering spatial functional data

    We propose a bagging strategy based on random Voronoi tessellations for the exploration of georeferenced functional data, suitable for different purposes (e.g., classification, regression, dimensionality reduction). Motivated by an application to environmental data contained in the Surface Solar Energy database, we focus in particular on the problem of clustering functional data indexed by the sites of a spatial finite lattice. We thus illustrate our strategy by implementing a specific algorithm whose rationale is to (i) replace the original data set with a reduced one, composed of local representatives of neighborhoods covering the entire investigated area; (ii) analyze the local representatives; (iii) repeat the previous analysis many times for different reduced data sets associated with different randomly generated sets of neighborhoods, thus obtaining many different weak formulations of the analysis; (iv) finally, bag together the weak analyses to obtain a conclusive strong analysis. Through an extensive simulation study, we show that this new procedure – which does not require an explicit model for spatial dependence – is statistically and computationally efficient.
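
Steps (i) to (iv) of this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the local representative is taken to be the mean curve of each Voronoi cell, the "analysis" of step (ii) is assumed to be a plain k-means clustering, and labels are matched across replicates by a brute-force permutation search (reasonable only for small `k`). All function names (`kmeans`, `align`, `bagging_voronoi`) are hypothetical.

```python
import itertools
import numpy as np


def kmeans(X, k, rng, n_iter=25):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def align(labels, ref, k):
    """Relabel `labels` with the permutation that agrees best with `ref`."""
    best = max(itertools.permutations(range(k)),
               key=lambda p: np.mean(np.array(p)[labels] == ref))
    return np.array(best)[labels]


def bagging_voronoi(coords, curves, k, n_cells, n_boot, seed=0):
    """Cluster curves observed at sites by bagging over random tessellations."""
    rng = np.random.default_rng(seed)
    n = len(coords)
    votes = np.zeros((n, k), dtype=int)
    ref = None
    for _ in range(n_boot):
        # (i) Random Voronoi tessellation: nuclei are randomly drawn sites,
        #     and every site joins the cell of its nearest nucleus.
        nuclei = coords[rng.choice(n, size=n_cells, replace=False)]
        d2 = ((coords[:, None, :] - nuclei[None, :, :]) ** 2).sum(axis=2)
        _, cell = np.unique(d2.argmin(axis=1), return_inverse=True)
        # Local representative of each cell: the mean of its curves.
        reps = np.stack([curves[cell == c].mean(axis=0)
                         for c in range(cell.max() + 1)])
        # (ii) Analyze the representatives (assumed here: k-means).
        site_labels = kmeans(reps, k, rng)[cell]
        # (iii) Repeat over replicates, matching labels to the first one.
        if ref is None:
            ref = site_labels
        votes[np.arange(n), align(site_labels, ref, k)] += 1
    # (iv) Bag the weak analyses: majority vote per site.
    return votes.argmax(axis=1)
```

Spatial dependence enters only through the averaging within cells, which is why the sketch needs no explicit covariance or variogram model. The per-site vote frequencies in `votes` could also be kept as a rough measure of classification uncertainty.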

    Bagging Voronoi-classifiers for clustering spatial functional data

    We consider the problem of clustering functional data indexed by the sites of a spatial finite lattice, motivated by the analysis of the environmental data contained in the Surface Solar Energy database (NASA 2010). To this end, we exploit the bagging Voronoi-classifiers algorithm introduced in Secchi et al. (2012), based on repeatedly partitioning the investigated area into random neighborhoods, and on replacing the original data set with a reduced one, composed of local representatives of neighboring data. In this way we obtain many different weak formulations of the analysis, whose results are then bagged together to give a conclusive strong analysis.