
    An EM-algorithm based method to deal with rounded zeros in compositional data under Dirichlet models

    Zeros in compositional data are classified into “rounded” zeros and “essential” zeros. A rounded zero corresponds to a small proportion or a value below the detection limit, while an essential zero indicates the complete absence of the component in the composition. Several parametric and non-parametric imputation techniques have been proposed to replace rounded zeros and to model essential zeros under the logratio model. In this paper, a new method based on the EM algorithm is proposed for replacing rounded zeros. The proposed method is illustrated using simulated data.
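    The abstract does not spell out the algorithm, so the following is only a minimal, hedged sketch of an EM-style replacement of values below a detection limit in a single part. It assumes a lognormal model for that part rather than the Dirichlet model used in the paper, and all names, numbers and the detection limit are illustrative.

    import numpy as np
    from scipy.stats import norm

    def em_replace_rounded_zeros(x, dl, tol=1e-8, max_iter=100):
        """EM-style replacement for one positive part with rounded zeros:
        assume log(x) is normal, impute censored entries by their conditional
        expectation below log(dl), re-estimate the parameters, and repeat."""
        x = np.asarray(x, dtype=float)
        cens = x < dl                              # rounded zeros / below detection limit
        z = np.log(np.where(cens, dl / 2.0, x))    # crude start for the censored logs
        c = np.log(dl)
        for _ in range(max_iter):
            mu, sd = z.mean(), z.std(ddof=1)       # M-step: refit mean and sd
            alpha = (c - mu) / sd
            # E-step: conditional mean of a normal truncated above at c = log(dl)
            cond_mean = mu - sd * norm.pdf(alpha) / norm.cdf(alpha)
            z_new = np.where(cens, cond_mean, z)
            if np.max(np.abs(z_new - z)) < tol:
                z = z_new
                break
            z = z_new
        return np.exp(z)

    # toy usage: one part of a composition with a detection limit of 0.01
    part = np.array([0.12, 0.034, 0.0, 0.256, 0.0, 0.08])
    print(em_replace_rounded_zeros(part, dl=0.01))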

    Analysis of compositional data using robust methods. The R-package robCompositions

    The free and open-source programming language and software environment R (R Development Core Team, 2010) is currently both the most widely used and the most popular software for statistics and data analysis. In addition, R has become quite popular as a (programming) language, currently (February 2011) ranked 25th in the TIOBE Programming Community Index (e.g., Matlab: 29, SAS: 30, see http://www.tiobe.com). The basic R environment can be downloaded from the Comprehensive R Archive Network (http://cran.rproject.org). R is extensible via packages, which consist of code and structured standard documentation including code application examples and possibly further documents (so-called vignettes) showing further applications of the package. Two contributed packages for compositional data analysis come with R, version 2.12.1: the package compositions (van den Boogaart et al., 2010) and the package robCompositions (Templ et al., 2011). Package compositions provides functions for the consistent analysis of compositional data and positive numbers in the way originally proposed by John Aitchison (see van den Boogaart et al., 2010). In addition to the basic functionality and estimation procedures in package compositions, package robCompositions provides tools for classical and robust multivariate statistical analysis of compositional data, together with corresponding graphical tools. In addition, several data sets and useful utility functions are provided.

    Scalar-on-composition regression to evaluate the impact of class composition on educational achievement

    In recent years the influence of group behaviour, the so-called peer effect, on individuals has been of interest. Here, a new form of peer effect of compositional character is considered. Using a composition as the peer effect makes it possible to study how the distribution of peers' educational achievements, based on their test scores, influences an individual of the cohort throughout their school career. To analyse the effect of compositional peer effects, the methods of compositional data analysis are used. These methods are applied to the data set of Project STAR, which contains information about students throughout their whole school career. The compositional term is based on the distribution of test scores within each student's class at the beginning of the project, when the students were in kindergarten. To analyse the influence of such terms, zero imputation methods and the ilr transformation are used so that classical statistical models can be applied. In a first step, the impact of the chosen zero imputation methods and of the interval selection for the continuous variable is studied. Then the influence of the compositional peer effect, containing the information on the distribution of the students within the same class, is analysed. These analyses show that there are indeed significant impacts on an individual's subsequent test scores based on the distribution of test scores in their class. The higher the proportion of students with better scores in kindergarten, the more an individual's test score decreased in the following years, and vice versa. However, using the distribution of the students in the first, second or third grade as the compositional covariate changes this effect: in that case, the higher the proportion of high-scoring students, the more the individual score increased in the subsequent years.
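    To make the workflow sketched above concrete (zero-free class composition → ilr/pivot coordinates → classical regression), here is a minimal Python example. It uses simulated class compositions rather than the Project STAR data; the pivot-coordinate formula is one standard ilr choice, and all numbers are illustrative.

    import numpy as np

    def pivot_ilr(P):
        """Map compositions (rows of P, strictly positive parts) to D-1 pivot
        (ilr) coordinates: each coordinate contrasts one part against the
        geometric mean of all parts that come after it."""
        P = np.asarray(P, dtype=float)
        n, D = P.shape
        Z = np.empty((n, D - 1))
        for j in range(D - 1):
            k = D - j - 1                                    # parts remaining after part j
            gmean = np.exp(np.log(P[:, j + 1:]).mean(axis=1))
            Z[:, j] = np.sqrt(k / (k + 1.0)) * np.log(P[:, j] / gmean)
        return Z

    rng = np.random.default_rng(0)
    # hypothetical class compositions: shares of low / middle / high scorers
    comp = rng.dirichlet([4.0, 3.0, 2.0], size=200)
    # toy individual outcome driven by the high-vs-low balance, for illustration only
    y = 1.5 - 2.0 * np.log(comp[:, 2] / comp[:, 0]) + rng.normal(0.0, 0.5, 200)

    Z = pivot_ilr(comp)
    X = np.column_stack([np.ones(len(Z)), Z])                # intercept + ilr coordinates
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)             # scalar-on-composition OLS
    print(beta)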

    Transformations for compositional data with zeros with an application to forensic evidence evaluation

    In forensic science, likelihood ratios provide a natural way of computing the value of evidence under competing propositions such as "the compared samples have originated from the same object" (prosecution) and "the compared samples have originated from different objects" (defence). We use a two-level multivariate likelihood ratio model for the comparison of forensic glass evidence in the form of elemental composition data under three data transformations: the logratio transformation, a complementary log-log type transformation and a hyperspherical transformation. The performances of the three transformations in the evaluation of evidence are assessed in simulation experiments through the proportions of false negatives and false positives.
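    The evaluation criterion mentioned above (proportions of false negatives and false positives) is easy to compute once likelihood ratios are available; the sketch below uses purely hypothetical simulated log10 LR values in place of the output of the two-level model.

    import numpy as np

    rng = np.random.default_rng(3)
    # hypothetical log10 likelihood ratios: same-source comparisons should tend
    # to support the prosecution proposition (LR > 1), different-source the defence
    log_lr_same = rng.normal(loc=2.0, scale=1.5, size=1000)
    log_lr_diff = rng.normal(loc=-3.0, scale=2.0, size=1000)

    # false negative: same-source comparison with LR < 1 (log10 LR < 0)
    # false positive: different-source comparison with LR > 1
    fn_rate = np.mean(log_lr_same < 0.0)
    fp_rate = np.mean(log_lr_diff > 0.0)
    print(f"false negatives: {fn_rate:.3f}, false positives: {fp_rate:.3f}")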

    The compositional meaning of a detection limit

    Conclusions: The chemical interpretation of the detection limit and its stochastic-model counterpart thus has different consequences for the statistical analysis than we would expect from a word-by-word interpretation of “below detection limit” as a concentration below some limit. The state-of-the-art model for compositional analysis below the detection limit (BDL) is biased. Some ideas are not directly applicable to the true effects of measurement errors near the detection limit. Even basic principles such as subcompositional coherence and the requirement that the analysis be independent of the total are not fully valid near the detection limit.

    Cox regression survival analysis with compositional covariates: application to modelling mortality risk from 24-h physical activity patterns

    Survival analysis is commonly conducted in medical and public health research to assess the association of an exposure or intervention with a hard outcome such as mortality. The Cox (proportional hazards) regression model is probably the most popular statistical tool used in this context. However, when the exposure includes compositional covariates (that is, variables representing a relative makeup, such as a nutritional or physical activity behaviour composition), some basic assumptions of the Cox regression model and associated significance tests are violated. Compositional variables involve an intrinsic interplay with one another, which precludes results and conclusions based on considering them in isolation, as is ordinarily done. In this work, we introduce a formulation of the Cox regression model in terms of log-ratio coordinates which suitably deals with the constraints of compositional covariates, facilitates the use of common statistical inference methods, and allows for scientifically meaningful interpretations. We illustrate its practical application to a public health problem: the estimation of the mortality hazard associated with the composition of daily activity behaviour (physical activity, sitting time and sleep) using data from the U.S. National Health and Nutrition Examination Survey (NHANES).
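    A minimal sketch of this workflow on simulated data: express a three-part daily activity composition (physical activity, sitting, sleep) in ilr balance coordinates and fit a Cox model on those coordinates. The lifelines package is used here only as one convenient Cox implementation; the data, effect size and censoring scheme are purely illustrative and are not the NHANES data or the paper's model specification.

    import numpy as np
    import pandas as pd
    from lifelines import CoxPHFitter

    rng = np.random.default_rng(1)
    n = 300
    # hypothetical daily time-use compositions (hours): activity, sitting, sleep
    hours = rng.dirichlet([2.0, 5.0, 4.0], size=n) * 24.0
    pa, sit, sleep = hours.T

    # ilr (balance) coordinates for the 3-part composition
    z1 = np.sqrt(2.0 / 3.0) * np.log(pa / np.sqrt(sit * sleep))   # activity vs. the rest
    z2 = np.sqrt(1.0 / 2.0) * np.log(sit / sleep)                 # sitting vs. sleep

    # toy survival times whose hazard depends on the first balance
    time = rng.exponential(scale=np.exp(0.5 * z1))
    event = (time < 10.0).astype(int)                  # administrative censoring at t = 10
    time = np.minimum(time, 10.0)

    df = pd.DataFrame({"time": time, "event": event, "z1": z1, "z2": z2})
    cph = CoxPHFitter()
    cph.fit(df, duration_col="time", event_col="event")  # Cox model on log-ratio coordinates
    print(cph.summary[["coef", "p"]])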

    Artificial neural networks to impute rounded zeros in compositional data

    Methods of deep learning have become increasingly popular in recent years, but they have not yet arrived in compositional data analysis. Imputation methods for compositional data are typically applied to additive, centered or isometric log-ratio representations of the data. Generally, methods for compositional data analysis can only be applied to observed positive entries in a data matrix. Therefore, one tries to impute missing values or measurements that were below a detection limit. In this paper, a new method for imputing rounded zeros based on artificial neural networks (ANNs) is presented and compared with conventional methods. We are also interested in whether a log-ratio representation of the data is relevant for imputation with ANNs. We show that ANNs are competitive with, or even outperform, conventional methods when imputing rounded zeros in data sets of moderate size, and that they deliver better results when data sets are large. We also see that log-ratio transformations within the artificial neural network imputation procedure nevertheless help to improve the results. This shows that the theory of compositional data analysis, and the fulfillment of its properties, remains very important in the age of deep learning.
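    A hedged sketch of the general idea, not of the network used in the paper: train a small neural network on the fully observed rows, with predictors and target expressed as log-ratios, and predict the log-ratio of the censored part for rows containing rounded zeros. It uses scikit-learn's MLPRegressor on simulated data; the cap at 0.65 times the detection limit is a common heuristic and is assumed here.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(2)
    comp = rng.dirichlet([5.0, 4.0, 3.0, 1.0], size=400)       # toy 4-part compositions
    dl = 0.02
    obs = comp.copy()
    obs[obs[:, 3] < dl, 3] = 0.0                               # rounded zeros in part 4
    missing = obs[:, 3] == 0.0

    # predictors and target as log-ratios to the (always observed) first part
    X = np.log(obs[:, 1:3] / obs[:, [0]])
    y_obs = np.log(obs[~missing, 3] / obs[~missing, 0])

    # fit a small network on complete rows, predict log-ratios for censored rows
    net = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0)
    net.fit(X[~missing], y_obs)
    pred = net.predict(X[missing])

    filled = obs.copy()
    filled[missing, 3] = np.exp(pred) * obs[missing, 0]        # back-transform to proportions
    filled[missing, 3] = np.minimum(filled[missing, 3], 0.65 * dl)  # keep imputations below the limit (heuristic)
    filled = filled / filled.sum(axis=1, keepdims=True)        # re-close each composition
    print(filled[missing][:5])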

    Comparison of zero replacement strategies for compositional data with large numbers of zeros

    Modern applications in chemometrics and bioinformatics result in compositional data sets with a high proportion of zeros. An example is microbiome data, where zeros refer to measurements below the detection limit of one count. When building statistical models, it is important that zeros are replaced by sensible values. Different replacement techniques from compositional data analysis are considered and compared in a simulation study and examples. The comparison also includes a recently proposed method (Templ, 2020) [1] based on deep learning. Detailed insights into the appropriateness of the methods for a problem at hand are provided, and differences in the outcomes of statistical results are discussed.
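    For context, a minimal sketch of one of the classical strategies that such comparisons typically include: simple multiplicative replacement for count compositions with a detection limit of one count. The fraction delta = 0.65 of the detection limit is a commonly used heuristic and is assumed here; this is not the deep-learning method of Templ (2020).

    import numpy as np

    def multiplicative_replacement(counts, delta=0.65):
        """Replace zeros in closed count compositions: each zero becomes
        delta times the detection limit (one count, expressed as a proportion),
        and the non-zero parts are shrunk so that every row still sums to one."""
        counts = np.asarray(counts, dtype=float)
        total = counts.sum(axis=1, keepdims=True)
        comp = counts / total                        # close the counts to proportions
        repl = delta * (1.0 / total)                 # imputed value per row
        zero = comp == 0.0
        shrink = 1.0 - repl * zero.sum(axis=1, keepdims=True)
        return np.where(zero, repl, comp * shrink)

    counts = np.array([[10, 0, 5, 0],
                       [3, 7, 0, 2]])
    print(multiplicative_replacement(counts))        # rows sum to one again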