39 research outputs found

    Aitchison's Compositional Data Analysis 40 Years On: A Reappraisal

    The development of John Aitchison's approach to compositional data analysis is traced from the paper he read to the Royal Statistical Society in 1982. Aitchison's logratio approach, which was proposed to solve the problematic aspects of working with data with a fixed-sum constraint, is summarized and reappraised. It is maintained that the principles on which this approach was originally built, the main one being subcompositional coherence, are not required to be satisfied exactly: quasi-coherence is sufficient, that is, near enough to being coherent for all practical purposes. This opens up the field to using simpler data transformations, such as power transformations, that permit zero values in the data. The additional principle of exact isometry, which was subsequently introduced and not in Aitchison's original conception, imposed the use of isometric logratio transformations, but these are complicated and problematic to interpret, involving ratios of geometric means. If this principle is regarded as important in certain analytical contexts, for example unsupervised learning, it can be relaxed by showing that regular pairwise logratios, as well as the alternative quasi-coherent transformations, can also be quasi-isometric, meaning they are close enough to exact isometry for all practical purposes. It is concluded that the isometric and related logratio transformations such as pivot logratios are not a prerequisite for good practice, although many authors insist on their obligatory use. This conclusion is fully supported here by case studies in geochemistry and in genomics, which demonstrate the good performance of pairwise logratios, as originally proposed by Aitchison, or of Box-Cox power transforms of the original compositions, for which no zero replacements are necessary.
    Comment: 26 pages, 18 figures, plus Supplementary Material. This is a complete revision of the first version of this paper, placing the geochemical example upfront and adding a large section on CoDA of wide matrices

    Business and the Risk of Crime in China

    The book analyses the results of a large-scale victimisation survey that was conducted in 2005-06 with businesses in Hong Kong, Shanghai, Shenzhen and Xi’an. It also provides comprehensive background materials on crime and the criminal justice system in China. The survey, which measured common and non-conventional crime such as fraud, IP theft and corruption, is important because few crime victim surveys have been conducted with Chinese populations and it provides an understanding of some dimensions of crime in non-western societies. In addition, China is one of the fastest-growing economies in the world and it attracts a great amount of foreign investment; however, corruption and economic crimes are perceived by some investors as significant obstacles to good business practices. Key policy implications of the survey are discussed.

    Associations between childhood maltreatment and psychiatric disorders: analysis from electronic health records in Hong Kong

    There has been a lack of high-quality evidence concerning the association between childhood maltreatment and psychiatric diagnoses, particularly for Axis II disorders. This study aimed to examine the association between childhood maltreatment exposure and Axis I and Axis II psychiatric disorders using electronic health records. In this study, the exposed group (n = 7473) comprised patients aged 0 to 19 years with a first-time record of a maltreatment episode between January 1, 2001 and December 31, 2010, whereas the unexposed group (n = 26,834) comprised individuals of the same gender and age who were admitted into the same hospital in the same calendar year and month but had no records of maltreatment in the Hong Kong Clinical Data Analysis and Reporting System (CDARS). Data on their psychiatric diagnoses recorded from the date of admission to January 31, 2019 were extracted. A Cox proportional hazard regression model was fitted to estimate the hazard ratio (HR, plus 95% CIs) between childhood maltreatment exposure and psychiatric diagnoses, adjusting for age at index visit, sex, and government welfare recipient status. Results showed that childhood maltreatment exposure was significantly associated with subsequent diagnosis of conduct disorder/oppositional defiant disorder (adjusted HR, 10.99 [95% CI 6.36, 19.01]), attention deficit hyperactivity disorder (ADHD) (7.28 [5.49, 9.65]), and personality disorders (5.36 [3.78, 7.59]). The risk of psychiatric disorders following childhood maltreatment did not vary by history of childhood sexual abuse, age at maltreatment exposure, or gender. Individuals with a history of childhood maltreatment are vulnerable to psychiatric disorders. Findings support the provision of integrated care within the primary health care setting to address the long-term medical and psychosocial needs of individuals with a history of childhood maltreatment.

    Association of early nutritional status with child development in the Asia Pacific region

    Importance: Stunting was used as a proxy for underdevelopment in early childhood in previous studies, but the associations between child development and other growth and body composition parameters were rarely studied. Objective: To estimate the association between malnutrition and early child development (ECD) at an individual level. Design, Setting, and Participants: This population-based, cross-sectional study used data from the East Asia Pacific Early Child Development Scales, a population-representative survey of children aged 3 to 5 years old, conducted in 2012 to 2014 in communities in Cambodia, China, Mongolia, Papua New Guinea, and Vanuatu. Data analysis was performed from November 2019 to April 2021. Exposures: Stunting (height-for-age [HFA] z score less than −2), wasting (weight-for-height z score less than −2), overweight (weight-for-height z score greater than 2), body mass index (BMI)–for-age z score, and body fat proportion based on existing growth standard and formula. Main Outcomes and Measures: ECD directly assessed using the validated East Asia–Pacific ECD Scales. Results: A total of 7108 children (3547 girls; mean [SD], age 4.48 [0.84] years) were included in this study. The prevalence of stunting was 27.1% (range across countries, 1.2%-55.0%), that of wasting was 13.7% (range, 5.4%-35.9%), and that of overweight was 15.9% (range, 2.2%-53.7%). Adjusted for country variations, age, sex, urbanicity, family socioeconomic status, and body fat proportion, ECD was linearly associated with HFA (β, 1.57; 95% CI, 1.35-1.80) and BMI-for-age (β, 0.64; 95% CI, 0.45-0.82). After adjustment for BMI and height, better ECD was associated with low body fat proportion (β, 0.93; 95% CI, 0.45-1.42). The association of HFA was more pronounced in Southeast Asia and the Pacific region than in East Asia, and the association of fat proportion was specific to children living in urban environments. 
    Conclusions and Relevance: HFA, BMI-for-age, and body fat proportion were independently associated with ECD, and these findings suggest that future studies should consider using these parameters to estimate the prevalence of child underdevelopment; nutritional trials should examine to what extent the associations are causal.

    Modelling structural zeros in compositional data

    This analysis was stimulated by a real data analysis problem involving household expenditure data. The full dataset contains expenditure data for a sample of 1224 households. The expenditure is broken down at 2 hierarchical levels: 9 major levels (e.g. housing, food, utilities etc.) and 92 minor levels. There are also 5 factors and 5 covariates at the household level. Not surprisingly, there are a small number of zeros at the major level, but many zeros at the minor level. The question is how best to model the zeros. Clearly, models that try to add a small amount to the zero terms are not appropriate in general, as at least some of the zeros are clearly structural, e.g. alcohol/tobacco for households that are teetotal. The key question then is how to build suitable conditional models. For example, is the sub-composition of spending excluding alcohol/tobacco similar for teetotal and non-teetotal households? In other words, we are looking for sub-compositional independence. Also, what determines whether a household is teetotal? Can we assume that it is independent of the composition? In general, whether a household is teetotal will clearly depend on the household-level variables, so we need to be able to model this dependence. The other tricky question is that with zeros on more than one component, we need to be able to model dependence and independence of zeros on the different components. Lastly, while some zeros are structural, others may not be; for example, for expenditure on durables, it may be chance as to whether a particular household spends money on durables within the sample period. This would clearly be distinguishable if we had longitudinal data, but may still be distinguishable by looking at the distribution, on the assumption that random zeros will usually be for situations where any non-zero expenditure is not small. While this analysis is based on economic data, the ideas carry over to many other situations, including geological data, where minerals may be missing for structural reasons (similar to alcohol), or missing because they occur only in random regions which may be missed in a sample (similar to the durables).
    Geologische Vereinigung; Universitat de Barcelona, Equip de Recerca Arqueomètrica; Institut d’Estadística de Catalunya; International Association for Mathematical Geology; Patronat de l’Escola Politècnica Superior de la Universitat de Girona; Fundació privada: Girona, Universitat i Futur

    Convex linear combination processes for compositions

    Aitchison and Bacon-Shone (1999) considered convex linear combinations of compositions. In other words, they investigated compositions of compositions, where the mixing composition follows a logistic Normal distribution (or a perturbation process) and the compositions being mixed follow a logistic Normal distribution. In this paper, I investigate the extension to situations where the mixing composition varies with a number of dimensions. Examples would be where the mixing proportions vary with time or distance or a combination of the two. Practical situations include a river where the mixing proportions vary along the river, or across a lake and possibly with a time trend. This is illustrated with a dataset similar to that used in the Aitchison and Bacon-Shone paper, which looked at how pollution in a loch depended on the pollution in the three rivers that feed the loch. Here, I explicitly model the variation in the linear combination across the loch, assuming that the mean of the logistic Normal distribution depends on the river flows and relative distance from the source origins.
    Geologische Vereinigung; Institut d’Estadística de Catalunya; International Association for Mathematical Geology; Patronat de l’Escola Politècnica Superior de la Universitat de Girona; Fundació privada: Girona, Universitat i Futur; Càtedra Lluís Santaló d’Aplicacions de la Matemàtica; Consell Social de la Universitat de Girona; Ministerio de Ciencia i Tecnología

    Discrete and continuous compositions

    This paper examines a dataset which is modeled well by the Poisson-Log Normal process and by this process mixed with Log Normal data, which are both turned into compositions. This generates compositional data that has zeros without any need for conditional models or assuming that there is missing or censored data that needs adjustment. It also enables us to model dependence on covariates and within the composition.
    Geologische Vereinigung; Institut d’Estadística de Catalunya; International Association for Mathematical Geology; Càtedra Lluís Santaló d’Aplicacions de la Matemàtica; Generalitat de Catalunya, Departament d’Innovació, Universitats i Recerca; Ministerio de Educación y Ciencia; Ingenio 2010

    Introduction to Quantitative Research Methods

    This coursebook is designed to include sufficient statistical concepts to allow students to make good sense of the statistical figures and numbers that they are exposed to in daily life. After reading the book, students should understand the basics of quantitative research and be able to critically review simple statistical analysis. The book is intended to be self-contained but does not include mathematical proofs.

    Compositional Data, Bayesian Inference and the Modeling Process

    Statistical modeling in practice encompasses both the exploratory process, which is an inductive scientific approach, and the confirmatory modeling process, which uses the deductive scientific approach. This paper will focus primarily on the confirmatory modeling process. As the great applied statistician George Box famously said, “all models are wrong, but some are useful”. My version would be “all models are wrong, but some are essential for progress”! While John Aitchison has changed the world of compositional data analysis, the world of Bayesian statistics has also changed dramatically thanks to the Gibbs sampler, which allows Bayesian analysis of complex non-linear models and particularly random effects models. The beauty of Bayesian analysis is that it allows us to build models hierarchically to incorporate all our knowledge about the structure of the data generation process, not just about the parameters. In practice, we often know quite a lot about how data might have been generated and that knowledge can make a dramatic difference in how precise our inference can be. The paper examines the use of Bayesian inference in statistical models that include a compositional process. It discusses the insights that may be obtained from this approach, including as examples: distinguishing between structural and censored zeros, examining the choice between compositional or multivariate covariates, identifying the number of end-members in a composition and identifying changepoints in compositional processes.

    Charting multilingualism: language censuses and language surveys in Hong Kong

    The chapter reviews census and language survey data to present a comprehensive, longitudinal survey of the complex pattern of multilingualism and language diversity in Hong Kong over the twentieth century.