39 research outputs found
Aitchison's Compositional Data Analysis 40 Years On: A Reappraisal
The development of John Aitchison's approach to compositional data analysis
is followed since his paper read to the Royal Statistical Society in 1982.
Aitchison's logratio approach, which was proposed to solve the problematic
aspects of working with data with a fixed sum constraint, is summarized and
reappraised. It is maintained that the principles on which this approach was
originally built, the main one being subcompositional coherence, are not
required to be satisfied exactly -- quasi-coherence is sufficient, that is near
enough to being coherent for all practical purposes. This opens up the field to
using simpler data transformations, such as power transformations, that permit
zero values in the data. The additional principle of exact isometry, which was
subsequently introduced and not in Aitchison's original conception, imposed the
use of isometric logratio transformations, but these are complicated and
problematic to interpret, involving ratios of geometric means. If this
principle is regarded as important in certain analytical contexts, for example
unsupervised learning, it can be relaxed by showing that regular pairwise
logratios, as well as the alternative quasi-coherent transformations, can also
be quasi-isometric, meaning they are close enough to exact isometry for all
practical purposes. It is concluded that the isometric and related logratio
transformations such as pivot logratios are not a prerequisite for good
practice, although many authors insist on their obligatory use. This conclusion
is fully supported here by case studies in geochemistry and in genomics, where
the good performance is demonstrated of pairwise logratios, as originally
proposed by Aitchison, or Box-Cox power transforms of the original compositions
where no zero replacements are necessary.
Comment: 26 pages, 18 figures, plus Supplementary Material. This is a complete
revision of the first version of this paper, placing the geochemical example
upfront and adding a large section on CoDA of wide matrices.
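The contrast drawn in this abstract between pairwise logratios and Box-Cox power transforms can be illustrated with a minimal sketch. It is not the paper's own procedure: the three-part composition, the helper names (`close`, `pairwise_logratios`, `box_cox`) and the choice of power 0.5 are assumptions made for illustration only. The sketch shows that a logratio is unchanged when a subcomposition is taken, and that the power transform stays defined when a part is zero.

```python
import math

def close(parts):
    """Rescale non-negative parts so they sum to 1 (a composition)."""
    total = sum(parts)
    return [p / total for p in parts]

def pairwise_logratios(comp):
    """All log(x_i / x_j) for i < j; requires strictly positive parts."""
    n = len(comp)
    return {(i, j): math.log(comp[i] / comp[j])
            for i in range(n) for j in range(i + 1, n)}

def box_cox(comp, lam):
    """Box-Cox power transform (x**lam - 1) / lam; defined at x = 0 when lam > 0."""
    return [(x ** lam - 1.0) / lam for x in comp]

# Hypothetical three-part composition and the subcomposition of its first two parts.
comp = close([12.0, 30.0, 58.0])
sub = close(comp[:2])

# Subcompositional coherence: the logratio of parts 0 and 1 is the same in both.
lr_full = pairwise_logratios(comp)[(0, 1)]
lr_sub = pairwise_logratios(sub)[(0, 1)]

# With a zero part the logratio is undefined, but the power transform is not.
with_zero = close([0.0, 30.0, 70.0])
bc = box_cox(with_zero, 0.5)

# As lam shrinks towards 0 the Box-Cox transform approaches the logarithm.
near_log = box_cox([0.3], 1e-6)[0]
```

The last line hints at why a power transform with a small exponent can behave almost like a logratio analysis, which is the sense of "quasi-coherence" described above.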
Business and the Risk of Crime in China
The book analyses the results of a large-scale victimisation survey that was conducted in 2005-06 with businesses in Hong Kong, Shanghai, Shenzhen and Xi'an. It also provides comprehensive background materials on crime and the criminal justice system in China. The survey, which measured common and non-conventional crime such as fraud, IP theft and corruption, is important because few crime victim surveys have been conducted with Chinese populations and it provides an understanding of some dimensions of crime in non-western societies. In addition, China is one of the fastest-growing economies in the world and it attracts a great amount of foreign investment; however, corruption and economic crimes are perceived by some investors as significant obstacles to good business practices. Key policy implications of the survey are discussed.
Associations between childhood maltreatment and psychiatric disorders: analysis from electronic health records in Hong Kong
There has been a lack of high-quality evidence concerning the association between childhood maltreatment and psychiatric diagnoses, particularly for Axis II disorders. This study aimed to examine the association between childhood maltreatment exposure and Axis I and Axis II psychiatric disorders using electronic health records. In this study, the exposed group (n = 7473) comprised patients aged 0 to 19 years with a first-time record of maltreatment episode between January 1, 2001 and December 31, 2010, whereas the unexposed group (n = 26,834) comprised individuals of the same gender and age who were admitted into the same hospital in the same calendar year and month but had no records of maltreatment in the Hong Kong Clinical Data Analysis and Reporting System (CDARS). Data on their psychiatric diagnoses recorded from the date of admission to January 31, 2019 were extracted. A Cox proportional hazard regression model was fitted to estimate the hazard ratio (HR, plus 95% CIs) between childhood maltreatment exposure and psychiatric diagnoses, adjusting for age at index visit, sex, and government welfare recipient status. Results showed that childhood maltreatment exposure was significantly associated with subsequent diagnosis of conduct disorder/oppositional defiant disorder (adjusted HR, 10.99 [95% CI 6.36, 19.01]), attention deficit hyperactivity disorder (ADHD) (7.28 [5.49, 9.65]), and personality disorders (5.36 [3.78, 7.59]). The risk of psychiatric disorders following childhood maltreatment did not vary by history of childhood sexual abuse, age at maltreatment exposure, or gender. Individuals with a history of childhood maltreatment are vulnerable to psychiatric disorders. Findings support the provision of integrated care within the primary health care setting to address the long-term medical and psychosocial needs of individuals with a history of childhood maltreatment.
Association of early nutritional status With child development in the Asia Pacific region
Importance: Stunting was used as a proxy for underdevelopment in early childhood in previous studies, but the associations between child development and other growth and body composition parameters were rarely studied.
Objective: To estimate the association between malnutrition and early child development (ECD) at an individual level.
Design, Setting, and Participants: This population-based, cross-sectional study used data from the East Asia Pacific Early Child Development Scales, a population-representative survey of children aged 3 to 5 years old, conducted in 2012 to 2014 in communities in Cambodia, China, Mongolia, Papua New Guinea, and Vanuatu. Data analysis was performed from November 2019 to April 2021.
Exposures: Stunting (height-for-age [HFA] z score less than −2), wasting (weight-for-height z score less than −2), overweight (weight-for-height z score greater than 2), body mass index (BMI)-for-age z score, and body fat proportion based on existing growth standard and formula.
Main Outcomes and Measures: ECD directly assessed using the validated East Asia-Pacific ECD Scales.
Results: A total of 7108 children (3547 girls; mean [SD], age 4.48 [0.84] years) were included in this study. The prevalence of stunting was 27.1% (range across countries, 1.2%-55.0%), that of wasting was 13.7% (range, 5.4%-35.9%), and that of overweight was 15.9% (range, 2.2%-53.7%). Adjusted for country variations, age, sex, urbanicity, family socioeconomic status, and body fat proportion, ECD was linearly associated with HFA (β, 1.57; 95% CI, 1.35-1.80) and BMI-for-age (β, 0.64; 95% CI, 0.45-0.82). After adjustment for BMI and height, better ECD was associated with low body fat proportion (β, 0.93; 95% CI, 0.45-1.42). The association of HFA was more pronounced in Southeast Asia and the Pacific region than in East Asia, and the association of fat proportion was specific to children living in urban environments.
Conclusions and Relevance: HFA, BMI-for-age, and body fat proportion were independently associated with ECD, and these findings suggest that future studies should consider using these parameters to estimate the prevalence of child underdevelopment; nutritional trials should examine to what extent the associations are causal
Modelling structural zeros in compositional data
This analysis was stimulated by the real data analysis problem of household expenditure data. The full dataset contains expenditure data for a sample of 1224 households. The expenditure is broken down at 2 hierarchical levels: 9 major levels (e.g. housing, food, utilities etc.) and 92 minor levels. There are also 5 factors and 5 covariates at the household level. Not surprisingly, there are a small number of zeros at the major level, but many zeros at the minor level. The question is how best to model the zeros. Clearly, models that try to add a small amount to the zero terms are not appropriate in general as at least some of the zeros are clearly structural, e.g. alcohol/tobacco for households that are teetotal. The key question then is how to build suitable conditional models. For example, is the sub-composition of spending excluding alcohol/tobacco similar for teetotal and non-teetotal households? In other words, we are looking for sub-compositional independence. Also, what determines whether a household is teetotal? Can we assume that it is independent of the composition? In general, whether a household is teetotal will clearly depend on the household level variables, so we need to be able to model this dependence. The other tricky question is that with zeros on more than one component, we need to be able to model dependence and independence of zeros on the different components. Lastly, while some zeros are structural, others may not be; for example, for expenditure on durables, it may be chance as to whether a particular household spends money on durables within the sample period. This would clearly be distinguishable if we had longitudinal data, but may still be distinguishable by looking at the distribution, on the assumption that random zeros will usually be for situations where any non-zero expenditure is not small.
While this analysis is based around economic data, the ideas carry over to many other situations, including geological data, where minerals may be missing for structural reasons (similar to alcohol), or missing because they occur only in random regions which may be missed in a sample (similar to the durables).
Geologische Vereinigung; Universitat de Barcelona, Equip de Recerca Arqueomètrica; Institut d'Estadística de Catalunya; International Association for Mathematical Geology; Patronat de l'Escola Politècnica Superior de la Universitat de Girona; Fundació privada: Girona, Universitat i Futur
Convex linear combination processes for compositions
Aitchison and Bacon-Shone (1999) considered convex linear combinations of compositions. In other words, they investigated compositions of compositions, where the mixing composition follows a logistic Normal distribution (or a perturbation process) and the compositions being mixed follow a logistic Normal distribution. In this paper, I investigate the extension to situations where the mixing composition varies with a number of dimensions. Examples would be where the mixing proportions vary with time or distance or a combination of the two. Practical situations include a river where the mixing proportions vary along the river, or across a lake and possibly with a time trend. This is illustrated with a dataset similar to that used in the Aitchison and Bacon-Shone paper, which looked at how pollution in a loch depended on the pollution in the three rivers that feed the loch. Here, I explicitly model the variation in the linear combination across the loch, assuming that the mean of the logistic Normal distribution depends on the river flows and relative distance from the source origins.
Geologische Vereinigung; Institut d'Estadística de Catalunya; International Association for Mathematical Geology; Patronat de l'Escola Politècnica Superior de la Universitat de Girona; Fundació privada: Girona, Universitat i Futur; Càtedra Lluís Santaló d'Aplicacions de la Matemàtica; Consell Social de la Universitat de Girona; Ministerio de Ciencia i Tecnología
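The idea of a convex linear combination whose mixing composition varies with distance can be sketched as follows. This is only an illustration under stated assumptions, not the model fitted in the paper: the three river compositions, the linear trend in `mixing_weights` and its coefficients are all hypothetical, the mixing weights are treated as deterministic (the random logistic Normal variation is omitted), and the additive logratio inverse is used simply as one convenient way to turn real-valued trends into weights that sum to 1.

```python
import math

def alr_inverse(z):
    """Map D-1 real logratios to a D-part composition (last part is the reference)."""
    expz = [math.exp(v) for v in z] + [1.0]
    total = sum(expz)
    return [e / total for e in expz]

def convex_combination(weights, sources):
    """Mix source compositions part by part: sum over k of weights[k] * sources[k]."""
    n_parts = len(sources[0])
    return [sum(w * s[j] for w, s in zip(weights, sources))
            for j in range(n_parts)]

# Hypothetical pollutant compositions for three rivers (each sums to 1).
rivers = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
]

def mixing_weights(distance):
    """Hypothetical trend: logratios of the mixing weights are linear in distance."""
    return alr_inverse([1.0 - 0.5 * distance, 0.2 * distance])

# Mixed composition at a few illustrative distances across the loch.
mixed_by_distance = {d: convex_combination(mixing_weights(d), rivers)
                     for d in (0.0, 1.0, 2.0)}

for d, mixed in mixed_by_distance.items():
    print(d, [round(v, 3) for v in mixed])
```

Because the weights form a composition and each source is a composition, every mixed vector also sums to 1, so the result can be analysed as compositional data in its own right.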
Discrete and continuous compositions
This paper examines a dataset which is modeled well by the Poisson-Log Normal process and by this process mixed with Log Normal data, which are both turned into compositions. This generates compositional data that has zeros without any need for conditional models or assuming that there is missing or censored data that needs adjustment. It also enables us to model dependence on covariates and within the composition.
Geologische Vereinigung; Institut d'Estadística de Catalunya; International Association for Mathematical Geology; Càtedra Lluís Santaló d'Aplicacions de la Matemàtica; Generalitat de Catalunya, Departament d'Innovació, Universitats i Recerca; Ministerio de Educación y Ciencia; Ingenio 2010
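How a Poisson-Log Normal process yields compositions with zeros, with no zero replacement, can be seen in a small simulation. It is a sketch under assumptions rather than the paper's analysis: the log-scale means, the common standard deviation, the independence of the parts and the sample size are all hypothetical, and only the pure Poisson-Log Normal case is shown (not the mixture with Log Normal data).

```python
import math
import random

def poisson(rate, rng):
    """Draw a Poisson count by Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def poisson_lognormal_counts(mus, sigma, rng):
    """One count per part: Poisson with a log-normally distributed rate."""
    return [poisson(math.exp(rng.gauss(mu, sigma)), rng) for mu in mus]

def to_composition(counts):
    """Close counts to proportions; return None if every count is zero."""
    total = sum(counts)
    return None if total == 0 else [c / total for c in counts]

rng = random.Random(0)            # fixed seed so the run is repeatable
mus = [2.0, 0.5, -1.0]            # hypothetical log-scale means for three parts
samples = [poisson_lognormal_counts(mus, 0.8, rng) for _ in range(200)]
comps = [c for c in map(to_composition, samples) if c is not None]

# Share of simulated compositions that contain at least one exact zero.
zero_share = sum(1 for c in comps if 0.0 in c) / len(comps)
print(round(zero_share, 2))
```

Parts with a low mean rate produce exact zeros simply because a Poisson count can be zero, which is the sense in which no conditional model or censoring adjustment is needed.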
Introduction to Quantitative Research Methods
This coursebook is designed to include sufficient statistical concepts to allow students to make good sense of the statistical figures and numbers that they are exposed to in daily life. After
reading the book, students should understand the basics of quantitative research and be able to critically review simple statistical analysis. The book is intended to be self-contained but
does not include mathematical proofs
Compositional Data, Bayesian Inference and the Modeling Process
Statistical modeling in practice encompasses both the exploratory process,
which is an inductive scientific approach and the confirmatory modeling process,
which uses the deductive scientific approach. This paper will focus primarily on the
confirmatory modeling process.
As the great applied statistician George Box famously said, "all models
are wrong, but some are useful". My version would be "all models are wrong, but
some are essential for progress"!
While John Aitchison has changed the world of compositional data analysis,
the world of Bayesian statistics has also changed dramatically thanks to the Gibbs
sampler, which allows Bayesian analysis of complex non-linear models and
particularly random effects models.
The beauty of Bayesian analysis is that it allows us to build models
hierarchically to incorporate all our knowledge about the structure of the data
generation process, not just about the parameters.
In practice, we often know quite a lot about how data might have been
generated and that knowledge can make a dramatic difference in how precise our
inference can be.
The paper examines the use of Bayesian inference in statistical models that
include a compositional process. It discusses the insights that may be obtained from
this approach, including as examples: distinguishing between structural and censored
zeros, examining the choice between compositional or multivariate covariates,
identifying the number of end-members in a composition and identifying changepoints
in compositional processes.
Charting multilingualism : language censuses and language surveys in Hong Kong
The chapter reviews census and language survey data to present a comprehensive, longitudinal survey of the complex pattern of multilingualism and language diversity in Hong Kong over the twentieth century