87 research outputs found
The risk ratio versus odds ratio argument revisited from a compositional data analysis perspective
Compositional data, bayesian inference and the modeling process
Statistical modeling in practice encompasses both the exploratory process,
which is an inductive scientific approach, and the confirmatory modeling process,
which uses a deductive scientific approach. This paper focuses primarily on the
confirmatory modeling process.
As the great applied statistician George Box famously said, "all models
are wrong, but some are useful". My version would be "all models are wrong, but
some are essential for progress"!
While John Aitchison has changed the world of compositional data analysis,
the world of Bayesian statistics has also changed dramatically thanks to the Gibbs
sampler, which allows Bayesian analysis of complex non-linear models and
particularly random effects models.
The beauty of Bayesian analysis is that it allows us to build models
hierarchically to incorporate all our knowledge about the structure of the data
generation process, not just about the parameters.
In practice, we often know quite a lot about how data might have been
generated and that knowledge can make a dramatic difference in how precise our
inference can be.
The paper examines the use of Bayesian inference in statistical models that
include a compositional process. It discusses the insights that may be obtained from
this approach, including as examples: distinguishing between structural and censored
zeros, examining the choice between compositional or multivariate covariates,
identifying the number of end-members in a composition and identifying changepoints
in compositional processes.
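The hierarchical Bayesian approach and the Gibbs sampler mentioned above can be illustrated with a minimal sketch: a one-way random-effects model fitted by Gibbs sampling. The model, priors (flat on the mean, inverse-gamma(1, 1) on both variances) and the simulated data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate data from a one-way random-effects model:
#   y_ij = mu + a_i + e_ij,  a_i ~ N(0, tau2),  e_ij ~ N(0, sigma2)
groups, per_group = 8, 20
mu_true, tau2_true, sigma2_true = 5.0, 2.0, 1.0
a_true = rng.normal(0.0, np.sqrt(tau2_true), groups)
y = mu_true + a_true[:, None] + rng.normal(0.0, np.sqrt(sigma2_true),
                                           (groups, per_group))

# Gibbs sampler: each full conditional is available in closed form
n_iter, burn = 2000, 500
mu, tau2, sigma2 = 0.0, 1.0, 1.0
draws = []
for it in range(n_iter):
    # a_i | rest: combine the N(0, tau2) prior with the group residual mean
    prec = per_group / sigma2 + 1.0 / tau2
    mean = (y - mu).sum(axis=1) / sigma2 / prec
    a = rng.normal(mean, np.sqrt(1.0 / prec))
    # mu | rest: flat prior, so a normal centred on the residual mean
    mu = rng.normal((y - a[:, None]).mean(), np.sqrt(sigma2 / y.size))
    # sigma2 | rest: inverse-gamma(1 + N/2, 1 + SSE/2)
    sse = ((y - mu - a[:, None]) ** 2).sum()
    sigma2 = 1.0 / rng.gamma(1.0 + y.size / 2, 1.0 / (1.0 + sse / 2))
    # tau2 | rest: inverse-gamma(1 + groups/2, 1 + sum(a_i^2)/2)
    tau2 = 1.0 / rng.gamma(1.0 + groups / 2, 1.0 / (1.0 + (a ** 2).sum() / 2))
    if it >= burn:
        draws.append((mu, tau2, sigma2))

post = np.array(draws)
print(post.mean(axis=0))  # posterior means of (mu, tau2, sigma2)
```

The hierarchical structure enters through the extra layer `a_i ~ N(0, tau2)`: knowledge about how the groups were generated is encoded as a distribution rather than as fixed parameters, which is exactly what tightens the resulting inference.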
Multivariate discrete density estimation using kernel densities
Despite intensive recent research into density estimation, little attention seems to have been paid to its possible use in describing patterns of variability of univariate or multivariate counts. This paper discusses the relative merits of a number of possible kernels, and illustrates their application to univariate and bivariate count data.
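As a minimal sketch of kernel density estimation for counts, the following smooths empirical frequencies with a geometric kernel on the integers. The geometric kernel and the simulated Poisson data are illustrative assumptions; the paper compares several kernels, not this one specifically.

```python
import numpy as np

def discrete_kde(data, support, h=0.3):
    """Smooth empirical count frequencies with a geometric kernel.

    Each observed count y spreads mass over the integer support in
    proportion to h**|x - y|, with h in [0, 1); h = 0 recovers the raw
    empirical frequencies.  Each kernel is normalised over the support,
    so the estimate is itself a probability mass function.
    """
    support = np.asarray(support)
    est = np.zeros(len(support), dtype=float)
    for y in data:
        w = h ** np.abs(support - y)
        est += w / w.sum()          # normalise this observation's kernel
    return est / len(data)          # average over observations

# Example: smooth the frequencies of a sample of small counts
rng = np.random.default_rng(1)
counts = rng.poisson(3.0, size=200)
support = np.arange(0, 15)
dens = discrete_kde(counts, support, h=0.3)
print(dens.round(3))
```

The bandwidth role is played by `h`: larger values borrow more strength from neighbouring counts, smoothing ragged frequencies at the cost of some bias.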
Compound compositional data processes
Compositional data is non-negative data subject to the unit sum constraint. The logistic normal distribution provides a framework for compositional data when it satisfies sub-compositional coherence, in that the inference from a sub-composition should be the same whether based on the full composition or the sub-composition alone. However, in many cases sub-compositions are not coherent because of additional structure on the compositions, which can be modelled as process(es) inducing change. Sometimes data are collected with a model already well validated and hence with the focus on estimation of the model parameters. Alternatively, sometimes the appropriate model is unknown in advance and it is necessary to use the data to identify a suitable model. In both cases, a hierarchy of possible structure(s) is very helpful. This is evident in the evaluation of, for example, geochemical and household expenditure data. In the case of geochemical data, the structural process might be the stoichiometric constraints induced by the crystal lattice sites, which ensures that amalgamations of some elements are constant in molar terms. The choice of units (weight percent oxide or moles) has an impact on how the data can be modelled and interpreted. For simple igneous systems (e.g. Hawaiian basalt) mineral modes can be calculated from which a valid geochemical interpretation can be obtained. For household expenditure data, the structural process might be how teetotal households have distinct spending patterns on discretionary items from non-teetotal households. Measurement error is an example of another underlying process that reflects how an underlying discrete distribution (e.g. for the number of molecules in a sample) is converted using a linear calibration into a non-negative measurement, where measurements below the stated detection limit are reported as zero.
Compositional perturbation involves additive errors in log-ratio space and is the process that does show sub-compositional coherence. The mixing process involves the combination of compositions into a new composition, such as minerals combining to form a rock, where there may be considerable knowledge about the set of possible mixing processes. Finally, recording error may affect the composition, such as recording the components to a specified number of decimal digits, implying interval censoring, which implies error close to uniform on the simplex.
An approximation to ordering probabilities of multi-entry competitions
To predict the ordering probabilities of multi-entry competitions (e.g. horse-races), Harville (1973) proposed a simple way of computing the ordering probabilities based on the simple winning probabilities. This model essentially assumes the underlying variable (e.g. running time in horse-racing) is independent exponential. Henery (1981) and Stern (1990) proposed instead to use normal and gamma distributions, respectively, for the running time. However, both the Henery and Stern models are too complicated to use in practice. Bacon-Shone, Lo & Busche (1992a,b) have shown that the Henery model fits better in horse-racing using particular data sets. In this paper, we propose a simple way of computing ordering probabilities which approximates both the Henery and Stern models quite well. Using Hong Kong, U.S. and Japanese data, a large-scale empirical investigation is undertaken.
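The Harville (1973) construction referred to above can be sketched in a few lines: conditional on the earlier finishers being removed, each remaining entrant wins with its probability renormalised over those left. The three-horse probabilities are an illustrative example.

```python
import itertools

def harville_order_prob(win, order):
    """Probability that entrants finish in the given order (Harville, 1973).

    win   : dict mapping entrant -> winning probability (summing to 1)
    order : sequence of entrants, first place first
    """
    prob, remaining = 1.0, sum(win.values())
    for horse in order:
        prob *= win[horse] / remaining
        remaining -= win[horse]
    return prob

win = {"A": 0.5, "B": 0.3, "C": 0.2}
p_abc = harville_order_prob(win, ["A", "B", "C"])
print(round(p_abc, 4))  # 0.5 * (0.3 / 0.5) * (0.2 / 0.2) = 0.3

# The probabilities over all finishing orders form a valid distribution
total = sum(harville_order_prob(win, perm)
            for perm in itertools.permutations(win))
print(round(total, 10))  # 1.0
```

The Henery (normal) and Stern (gamma) alternatives replace this renormalisation with integrals over order statistics of the running-time distribution, which is what makes them expensive and motivates the approximation proposed in the paper.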
The stoichiometry of mineral compositions
Previous work by John Aitchison (1999) showed how log-ratio compositional data analysis can illuminate the relationships between components of a composition based on mineral constituents. However, his analysis was framed in terms of weight-based compositions, so it did not directly illustrate the stoichiometric relationships of the olivine minerals he investigated. We show how applying log-ratio compositional data analysis to the mole-based composition illustrates the stoichiometric relationships directly, by investigating olivines, alkali feldspars and plagioclases. This approach has the potential to provide much greater meaning to geochemists than one based on weight-based composition.
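The weight-to-mole conversion underlying this argument can be sketched as follows. The molar masses are standard values; the olivine analysis is an illustrative, roughly forsterite-90 composition, not data from the paper.

```python
# Convert a weight-percent oxide analysis to closed molar proportions,
# the basis on which stoichiometric (lattice-site) relationships
# become directly visible.
MOLAR_MASS = {"SiO2": 60.08, "MgO": 40.30, "FeO": 71.84}  # g/mol

def to_moles(wt_pct):
    """Divide each oxide weight by its molar mass, then close to unit sum."""
    moles = {ox: w / MOLAR_MASS[ox] for ox, w in wt_pct.items()}
    total = sum(moles.values())
    return {ox: m / total for ox, m in moles.items()}

# An illustrative olivine, (Mg,Fe)2SiO4, reported as weight % oxides
wt = {"SiO2": 40.8, "MgO": 49.0, "FeO": 10.2}
mol = to_moles(wt)
print({k: round(v, 3) for k, v in mol.items()})

# Stoichiometric check: olivine has two octahedral cations per SiO4,
# so (MgO + FeO) / SiO2 should be close to 2 in molar terms
ratio = (mol["MgO"] + mol["FeO"]) / mol["SiO2"]
print(round(ratio, 2))  # approximately 2.0
```

In weight percent the same analysis shows no such constant ratio, which is why the molar basis makes the lattice-site constraints visible to a log-ratio analysis while the weight basis obscures them.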
Probability and statistical models for racing
Racing data provides a rich source of analysis for quantitative researchers to study multi-entry competitions. This paper first explores statistical modeling to investigate the favorite-longshot betting bias using world-wide horse race data. The result shows that the bias phenomenon is not universal. An economic interpretation using utility theory is also provided. Additionally, the previous literature has proposed various probability distributions to model race running times in order to estimate higher-order probabilities, such as the probabilities of finishing second and third. We extend the normal distribution assumption to include certain correlation and variance structures and apply the extended model to actual data. While horse race data is used in this paper, the methodologies can be applied to other types of racing data, such as cars and dogs.
Proceedings of the opening of the Drug Addiction Research Unit of the University of Hong Kong