26 research outputs found

    GeoCoDA: Recognizing and Validating Structural Processes in Geochemical Data. A Workflow on Compositional Data Analysis in Lithogeochemistry

    Geochemical data are compositional in nature and are subject to the problems typically associated with data that are restricted to the real non-negative number space with constant-sum constraint, that is, the simplex. Geochemistry can be considered a proxy for mineralogy, comprised of atomically ordered structures that define the placement and abundance of elements in the mineral lattice structure. Based on the innovative contributions of John Aitchison, who introduced the logratio transformation into compositional data analysis, this contribution provides a systematic workflow for assessing geochemical data in an efficient way, such that significant geochemical (mineralogical) processes can be recognized and validated. The results of a workflow, called GeoCoDA and presented here in the form of a tutorial, enable the recognition of processes from which models can be constructed based on the associations of elements that reflect mineralogy. Both the original compositional values and their transformation to logratios are considered. These models can reflect rock-forming processes, metamorphism, alteration and ore mineralization. Moreover, machine learning methods, both unsupervised and supervised, applied to an optimized set of subcompositions of the data, provide a systematic, accurate, efficient and defensible approach to geochemical data analysis. The workflow is illustrated on lithogeochemical data from exploration of the Star kimberlite, consisting of a series of eruptions with five recognized phases. Comment: 38 pages, 18 figures (including Supplementary Material

    Aitchison's Compositional Data Analysis 40 Years On: A Reappraisal

    The development of John Aitchison's approach to compositional data analysis is traced from his paper read to the Royal Statistical Society in 1982. Aitchison's logratio approach, which was proposed to solve the problematic aspects of working with data with a fixed sum constraint, is summarized and reappraised. It is maintained that the principles on which this approach was originally built, the main one being subcompositional coherence, are not required to be satisfied exactly; quasi-coherence is sufficient, that is, near enough to being coherent for all practical purposes. This opens up the field to using simpler data transformations, such as power transformations, that permit zero values in the data. The additional principle of exact isometry, which was subsequently introduced and not in Aitchison's original conception, imposed the use of isometric logratio transformations, but these are complicated and problematic to interpret, involving ratios of geometric means. If this principle is regarded as important in certain analytical contexts, for example unsupervised learning, it can be relaxed by showing that regular pairwise logratios, as well as the alternative quasi-coherent transformations, can also be quasi-isometric, meaning they are close enough to exact isometry for all practical purposes. It is concluded that the isometric and related logratio transformations such as pivot logratios are not a prerequisite for good practice, although many authors insist on their obligatory use. This conclusion is fully supported here by case studies in geochemistry and in genomics, which demonstrate the good performance of pairwise logratios, as originally proposed by Aitchison, or of Box-Cox power transforms of the original compositions, for which no zero replacements are necessary. Comment: 26 pages, 18 figures, plus Supplementary Material. This is a complete revision of the first version of this paper, placing the geochemical example upfront and adding a large section on CoDA of wide matrice

    Surficial and deep earth material prediction from geochemical compositions

    Prediction of true classes of surficial and deep earth materials using multivariate spatial data is a common challenge for geoscience modelers. Most geological processes leave a footprint that can be explored by geochemical data analysis. These footprints are normally complex statistical and spatial patterns buried deep in the high-dimensional compositional space. This paper proposes a spatial predictive model for classification of surficial and deep earth materials derived from the geochemical composition of surface regolith. The model is based on a combination of geostatistical simulation and machine learning approaches. A random forest predictive model is trained, and features are ranked based on their contribution to the predictive model. To generate potential and uncertainty maps, compositional data are simulated at unsampled locations via a chain of transformations (isometric log-ratio transformation followed by the flow anamorphosis) and geostatistical simulation. The simulated results are subsequently back-transformed to the original compositional space. The trained predictive model is used to estimate the probability of classes for simulated compositions. The proposed approach is illustrated through two case studies. In the first case study, the major crustal blocks of the Australian continent are predicted from the surface regolith geochemistry of the National Geochemical Survey of Australia project. The aim of the second case study is to discover the superficial deposits (peat) from the regional-scale soil geochemical data of the Tellus Project. The accuracy of the results in these two case studies confirms the usefulness of the proposed method for geological class prediction and geological process discovery.

    The first three authors acknowledge financial support through DAAD-UA grant CodaBlock CoEstimation. The National Geochemical Survey of Australia project was part of the Australian Government's Onshore Energy Security Program 2006–2011, from which funding support is gratefully acknowledged. The Tellus Project was carried out by GSNI and funded by The Department for Enterprise, Trade and Investment (DETINI) and The Rural Development Programme through the Northern Ireland Programme for Building Sustainable Prosperity.

    Environmental monitoring and peat assessment using a multivariate analysis of regional-scale geochemical data

    A compositional multivariate approach was used to analyse regional-scale soil geochemical data obtained as part of the Tellus Project generated by the Geological Survey of Northern Ireland. The multi-element total concentration data presented comprise X-ray fluorescence (XRF) analyses of 6862 rural soil samples collected at 20-cm depth on a non-aligned grid at one site per 2 km². Censored data were imputed using published detection limits. Each soil sample site was assigned to the regional geology map, resulting in spatial data for one categorical variable and 35 continuous variables comprised of individual and amalgamated elements. This paper examines the extent to which soil geochemistry reflects the underlying geology or superficial deposits. Since the soil geochemistry is compositional, log-ratios were computed to adequately evaluate the data using multivariate statistical methods. Principal component analysis (PCA) and minimum/maximum autocorrelation factors (MAF) were used to carry out linear discriminant analysis (LDA) as a means to discover and validate processes related to the geologic assemblages coded as age bracket. Peat cover was introduced as an additional category to measure the ability to predict and monitor fragile ecosystems. Overall prediction accuracies for the age bracket categories were 68.4 % using PCA and 74.7 % using MAF. With inclusion of peat, the accuracy for LDA classification decreased to 65.0 and 69.9 %, respectively. The increase in misclassification due to the presence of peat may reflect degradation of peat-covered areas since the creation of superficial deposit classification.

    Editorial


    Practical Aspects of Compositional Data Analysis using Regional Geochemical Survey Data

    Government geological surveys and mineral exploration companies collect large amounts of geochemical data, which are used in the search for mineral commodities or for determining environmental disturbances. These surveys consist of many thousands of samples (observations) with as many as 50 elements determined for each. Because the nature of the data is compositional, they must be treated according to the protocols established by John Aitchison and others. This contribution details an approach based on the application of the alr, clr and ilr transforms for process discovery and validation. Issues around the treatment of zeros and/or missing values are complicated due to the stoichiometric nature of the data. Case studies are presented where the use of logratio transforms and the estimation of replacement values for missing data are considered in the context of stoichiometric constraint

    The isometric logratio transformation in compositional data analysis: a practical evaluation

    The isometric logratio transformation has been promoted by several authors as the theoretically correct way to contrast groups of parts in a compositional data set. But this transformation has attractive properties only in theory; their practical benefits are questionable. A simple counter-example demonstrates the dangers of using the isometric logratio as a univariate response variable in practice. The study is then extended to a real geochemical data set, where the practical value of isometric logratios is further investigated. When groups of parts are required in practical applications, preferably based on substantive knowledge, it is demonstrated that logratios of amalgamations serve as a simpler, more intuitive and more interpretable alternative to isometric logratios. A reduced set of simple logratios of pairs of parts, possibly involving prescribed amalgamations, is adequate in accounting for the variance in a compositional data set, and highlights which parts are driving the data structure