GeoCoDA: Recognizing and Validating Structural Processes in Geochemical Data. A Workflow on Compositional Data Analysis in Lithogeochemistry
Geochemical data are compositional in nature and are subject to the problems
typically associated with data that are restricted to the real non-negative
number space with constant-sum constraint, that is, the simplex. Geochemistry
can be considered a proxy for mineralogy, comprised of atomically ordered
structures that define the placement and abundance of elements in the mineral
lattice structure. Based on the innovative contributions of John Aitchison, who
introduced the logratio transformation into compositional data analysis, this
contribution provides a systematic workflow for assessing geochemical data in
an efficient way, such that significant geochemical (mineralogical) processes
can be recognized and validated. The results of a workflow, called GeoCoDA and
presented here in the form of a tutorial, enable the recognition of processes
from which models can be constructed based on the associations of elements that
reflect mineralogy. Both the original compositional values and their
transformation to logratios are considered. These models can reflect
rock-forming processes, metamorphism, alteration and ore mineralization. Moreover,
machine learning methods, both unsupervised and supervised, applied to an
optimized set of subcompositions of the data, provide a systematic, accurate,
efficient and defensible approach to geochemical data analysis. The workflow is
illustrated on lithogeochemical data from exploration of the Star kimberlite,
consisting of a series of eruptions with five recognized phases.
Comment: 38 pages, 18 figures (including Supplementary Material)
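The constant-sum constraint and the role of logratios described in this abstract can be illustrated with a minimal sketch. It is not taken from the GeoCoDA workflow itself: the function names `closure` and `pairwise_logratio` and the four-part composition are assumptions chosen for illustration. The sketch shows that raw percentages change when a subcomposition is re-closed, while a pairwise logratio does not.

```python
import math

def closure(parts, total=1.0):
    """Rescale non-negative parts so that they sum to `total`."""
    s = sum(parts)
    return [total * p / s for p in parts]

def pairwise_logratio(comp, i, j):
    """Natural log of the ratio between part i and part j."""
    return math.log(comp[i] / comp[j])

# Hypothetical four-part composition in weight percent (illustrative values only).
full = closure([52.0, 15.0, 8.0, 25.0], total=100.0)

# Subcomposition: keep the first three parts and re-close them to 100.
sub = closure(full[:3], total=100.0)

# The percentage of part 0 changes after re-closing ...
print(full[0], sub[0])

# ... but the logratio of parts 0 and 1 is the same in both.
lr_full = pairwise_logratio(full, 0, 1)
lr_sub = pairwise_logratio(sub, 0, 1)
print(lr_full, lr_sub)
```

Because ratios between parts are unaffected by which other parts are included, conclusions drawn from logratios do not depend on the choice of subcomposition.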
Aitchison's Compositional Data Analysis 40 Years On: A Reappraisal
The development of John Aitchison's approach to compositional data analysis
is followed since his paper read to the Royal Statistical Society in 1982.
Aitchison's logratio approach, which was proposed to solve the problematic
aspects of working with data with a fixed sum constraint, is summarized and
reappraised. It is maintained that the principles on which this approach was
originally built, the main one being subcompositional coherence, are not
required to be satisfied exactly -- quasi-coherence is sufficient, that is near
enough to being coherent for all practical purposes. This opens up the field to
using simpler data transformations, such as power transformations, that permit
zero values in the data. The additional principle of exact isometry, which was
subsequently introduced and not in Aitchison's original conception, imposed the
use of isometric logratio transformations, but these are complicated and
problematic to interpret, involving ratios of geometric means. If this
principle is regarded as important in certain analytical contexts, for example
unsupervised learning, it can be relaxed by showing that regular pairwise
logratios, as well as the alternative quasi-coherent transformations, can also
be quasi-isometric, meaning they are close enough to exact isometry for all
practical purposes. It is concluded that the isometric and related logratio
transformations such as pivot logratios are not a prerequisite for good
practice, although many authors insist on their obligatory use. This conclusion
is fully supported here by case studies in geochemistry and in genomics, where
the good performance is demonstrated of pairwise logratios, as originally
proposed by Aitchison, or Box-Cox power transforms of the original compositions
where no zero replacements are necessary.
Comment: 26 pages, 18 figures, plus Supplementary Material. This is a complete
revision of the first version of this paper, placing the geochemical example
upfront and adding a large section on CoDA of wide matrices
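The claim that power transformations permit zeros while staying close to the logarithm can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the paper: the function `box_cox`, the power values and the composition are hypothetical.

```python
import math

def box_cox(x, lam):
    """Box-Cox power transform; equals log(x) in the limit lam -> 0 (x > 0)."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam

# Hypothetical closed composition containing a zero (illustrative values only).
comp = [0.60, 0.30, 0.10, 0.0]

# With lam > 0 the transform is defined at zero: (0**lam - 1) / lam = -1 / lam.
transformed = [box_cox(x, 0.25) for x in comp]
print(transformed)

# For a positive part, a smaller power brings the transform closer to the log.
gap_large = abs(box_cox(0.30, 0.5) - math.log(0.30))
gap_small = abs(box_cox(0.30, 0.05) - math.log(0.30))
print(gap_large, gap_small)
```

The zero part receives a finite value, so no zero replacement is needed, and the choice of power controls how closely the result approximates a log-based analysis.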
Surficial and deep earth material prediction from geochemical compositions
Prediction of true classes of surficial and deep earth materials using multivariate spatial data is a common challenge for geoscience modelers. Most geological processes leave a footprint that can be explored by geochemical data analysis. These footprints are normally complex statistical and spatial patterns buried deep in the high-dimensional compositional space. This paper proposes a spatial predictive model for classification of surficial and deep earth materials derived from the geochemical composition of surface regolith. The model is based on a combination of geostatistical simulation and machine learning approaches. A random forest predictive model is trained, and features are ranked based on their contribution to the predictive model. To generate potential and uncertainty maps, compositional data are simulated at unsampled locations via a chain of transformations (isometric log-ratio transformation followed by the flow anamorphosis) and geostatistical simulation. The simulated results are subsequently back-transformed to the original compositional space. The trained predictive model is used to estimate the probability of classes for simulated compositions. The proposed approach is illustrated through two case studies. In the first case study, the major crustal blocks of the Australian continent are predicted from the surface regolith geochemistry of the National Geochemical Survey of Australia project. The aim of the second case study is to discover the superficial deposits (peat) from the regional-scale soil geochemical data of the Tellus Project. The accuracy of the results in these two case studies confirms the usefulness of the proposed method for geological class prediction and geological process discovery
The first three authors acknowledge financial
support through DAAD-UA grant CodaBlock
CoEstimation. The National Geochemical Survey of
Australia project was part of the Australian
Government's Onshore Energy Security Program
2006–2011, from which funding support is gratefully
acknowledged. The Tellus Project was carried out by GSNI
and funded by The Department for Enterprise,
Trade and Investment (DETINI) and The Rural
Development Programme through the Northern
Ireland Programme for Building Sustainable Prosperity
Environmental monitoring and peat assessment using a multivariate analysis of regional-scale geochemical data
A compositional multivariate approach was used to analyse regional-scale soil geochemical data obtained as part of the Tellus Project generated by the Geological Survey of Northern Ireland. The multi-element total concentration data presented comprise X-ray fluorescence (XRF) analyses of 6862 rural soil samples collected at 20-cm depth on a non-aligned grid at one site per 2 km². Censored data were imputed using published detection limits. Each soil sample site was assigned to the regional geology map, resulting in spatial data for one categorical variable and 35 continuous variables comprised of individual and amalgamated elements. This paper examines the extent to which soil geochemistry reflects the underlying geology or superficial deposits. Since the soil geochemistry is compositional, log-ratios were computed to adequately evaluate the data using multivariate statistical methods. Principal component analysis (PCA) and minimum/maximum autocorrelation factors (MAF) were used to carry out linear discriminant analysis (LDA) as a means to discover and validate processes related to the geologic assemblages coded as age bracket. Peat cover was introduced as an additional category to measure the ability to predict and monitor fragile ecosystems. Overall prediction accuracies for the age bracket categories were 68.4 % using PCA and 74.7 % using MAF. With inclusion of peat, the accuracy for LDA classification decreased to 65.0 and 69.9 %, respectively. The increase in misclassification due to the presence of peat may reflect degradation of peat-covered areas since the creation of superficial deposit classification.
Practical Aspects of Compositional Data Analysis using Regional Geochemical Survey Data
Government geological surveys and mineral exploration companies collect large amounts of geochemical
data, which are used in the search for mineral commodities or for determining environmental
disturbances. These surveys consist of many thousands of samples (observations) with as many as
50 elements determined for each. Because the nature of the data is compositional, they must be
treated according to the protocols established by John Aitchison and others. This contribution details
an approach based on the application of the alr, clr and ilr transforms for process discovery and validation.
Issues around the treatment of zeros and/or missing values are complicated due to the
stoichiometric nature of the data. Case studies are presented where the use of logratio transforms
and the estimation of replacement values for missing data are considered in the context of stoichiometric
constraint
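The alr, clr and ilr transforms named in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names and the example composition are assumptions, and the ilr shown uses pivot coordinates, which is only one of many possible orthonormal bases. All parts are assumed to be strictly positive.

```python
import math

def alr(comp, ref=-1):
    """Additive logratio: log of each part over a chosen reference part."""
    ref = ref % len(comp)
    return [math.log(x / comp[ref]) for k, x in enumerate(comp) if k != ref]

def clr(comp):
    """Centred logratio: log of each part over the geometric mean of all parts."""
    mean_log = sum(math.log(x) for x in comp) / len(comp)
    return [math.log(x) - mean_log for x in comp]

def ilr_pivot(comp):
    """Isometric logratio in pivot coordinates: each part against the
    geometric mean of the parts that follow it, with a normalising factor."""
    logs = [math.log(x) for x in comp]
    coords = []
    for i in range(len(comp) - 1):
        rest = logs[i + 1:]
        r = len(rest)
        coords.append(math.sqrt(r / (r + 1)) * (logs[i] - sum(rest) / r))
    return coords

# Hypothetical four-part composition (illustrative values only).
comp = [0.50, 0.25, 0.15, 0.10]

print(alr(comp))        # 3 coordinates, reference = last part
print(clr(comp))        # 4 coordinates that sum to zero
print(ilr_pivot(comp))  # 3 coordinates in an orthonormal basis
```

The clr coordinates sum to zero, and the ilr coordinates have the same Euclidean norm as the clr coordinates, which is the isometry property that gives the ilr its name.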
The isometric logratio transformation in compositional data analysis: a practical evaluation
The isometric logratio transformation has been promoted by several authors as the theoretically correct way to contrast groups of parts in a compositional data set. But this transformation has only attractive theoretical properties, the practical benefits of which are questionable. A simple counter-example demonstrates the dangers of using the isometric logratio as a univariate response variable in practice. The study is then extended to a real geochemical data set, where the practical value of isometric logratios is further investigated. When groups of parts are required in practical applications, preferably based on substantive knowledge, it is demonstrated that logratios of amalgamations serve as a simpler, more intuitive and more interpretable alternative to isometric logratios. A reduced set of simple logratios of pairs of parts, possibly involving prescribed amalgamations, is adequate in accounting for the variance in a compositional data set, and highlights which parts are driving the data structure
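The contrast between an isometric logratio of two groups and a logratio of amalgamations can be sketched as below. This is not the counter-example from the paper: the function names, the grouping and the two samples are assumptions chosen for illustration. The sketch shows that two samples with identical group totals can have very different isometric logratios when one part in a group is small.

```python
import math

def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def ilr_balance(comp, num, den):
    """Isometric logratio contrasting two groups via their geometric means."""
    r, s = len(num), len(den)
    scale = math.sqrt(r * s / (r + s))
    gm_num = geometric_mean([comp[k] for k in num])
    gm_den = geometric_mean([comp[k] for k in den])
    return scale * math.log(gm_num / gm_den)

def amalgamation_logratio(comp, num, den):
    """Logratio of amalgamations: each group is summed before taking the log."""
    return math.log(sum(comp[k] for k in num) / sum(comp[k] for k in den))

# Hypothetical samples: group A = parts 0 and 1, group B = parts 2 and 3.
# Both samples have the same group totals (A = 0.5, B = 0.5).
group_a, group_b = [0, 1], [2, 3]
sample_1 = [0.25, 0.25, 0.25, 0.25]
sample_2 = [0.499, 0.001, 0.25, 0.25]

for sample in (sample_1, sample_2):
    print(amalgamation_logratio(sample, group_a, group_b),
          ilr_balance(sample, group_a, group_b))
```

The amalgamation logratio is essentially zero for both samples because the group totals match, whereas the isometric logratio of the second sample is strongly negative, driven by the single trace part inside group A.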