A method for Bayesian regression modelling of composition data
Many scientific and industrial processes produce data that is best analysed
as vectors of relative values, often called compositions or proportions. The
Dirichlet distribution is a natural distribution to use for composition or
proportion data. It has the advantage of a low number of parameters, making it
the parsimonious choice in many cases. In this paper we consider the case where
the outcome of a process is Dirichlet, dependent on one or more explanatory
variables in a regression setting. We explore some existing approaches to this
problem, and then introduce a new simulation approach to fitting such models,
based on the Bayesian framework. We illustrate the advantages of the new
approach through simulated examples and an application in sport science. These
advantages include: increased accuracy of fit, increased power for inference,
and the ability to introduce random effects without additional complexity in
the analysis.
Comment: 10 pages, 1 figure, 2 tables
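As a rough illustration of the setting described in this abstract (a Dirichlet outcome whose parameters depend on an explanatory variable), the following is a minimal sketch. It assumes a log link, alpha_j(x) = exp(intercept_j + slope_j * x), which is a common parameterisation but is not stated in the abstract, and it only simulates data and evaluates a log-likelihood. It is not the Bayesian simulation approach the paper introduces. All function names and parameter values are hypothetical.

```python
import math
import random

def dirichlet_alphas(x, intercepts, slopes):
    """Assumed log link: alpha_j(x) = exp(intercept_j + slope_j * x)."""
    return [math.exp(a + b * x) for a, b in zip(intercepts, slopes)]

def sample_dirichlet(alphas, rng):
    """Draw one composition by normalising independent Gamma(alpha_j, 1) variates."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]

def dirichlet_logpdf(p, alphas):
    """Log density of the Dirichlet distribution at composition p."""
    log_norm = math.lgamma(sum(alphas)) - sum(math.lgamma(a) for a in alphas)
    return log_norm + sum((a - 1.0) * math.log(v) for a, v in zip(alphas, p))

def log_likelihood(xs, ps, intercepts, slopes):
    """Sum of Dirichlet log densities over all (covariate, composition) pairs."""
    return sum(
        dirichlet_logpdf(p, dirichlet_alphas(x, intercepts, slopes))
        for x, p in zip(xs, ps)
    )

# Simulate three-part compositions whose parameters depend on one covariate.
rng = random.Random(0)
true_intercepts = [1.5, 1.0, 1.0]   # hypothetical values
true_slopes = [0.8, 0.0, -0.8]      # hypothetical values
xs = [rng.uniform(-1.0, 1.0) for _ in range(300)]
ps = [sample_dirichlet(dirichlet_alphas(x, true_intercepts, true_slopes), rng) for x in xs]

# Compare the generating model with a model that ignores the covariate.
ll_true = log_likelihood(xs, ps, true_intercepts, true_slopes)
ll_null = log_likelihood(xs, ps, true_intercepts, [0.0, 0.0, 0.0])
print(ll_true, ll_null)
```

In a Bayesian fit, a log-likelihood of this form would be combined with priors on the intercepts and slopes and explored by simulation; that step is left out here.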
Unifying Amplitude and Phase Analysis: A Compositional Data Approach to Functional Multivariate Mixed-Effects Modeling of Mandarin Chinese
Mandarin Chinese is characterized by being a tonal language; the pitch (or
$F_0$) of its utterances carries considerable linguistic information. However,
speech samples from different individuals are subject to changes in amplitude
and phase which must be accounted for in any analysis which attempts to provide
a linguistically meaningful description of the language. A joint model for
amplitude, phase and duration is presented which combines elements from
Functional Data Analysis, Compositional Data Analysis and Linear Mixed Effects
Models. By decomposing functions via a functional principal component analysis,
and connecting registration functions to compositional data analysis, a joint
multivariate mixed effect model can be formulated which gives insights into the
relationship between the different modes of variation as well as their
dependence on linguistic and non-linguistic covariates. The model is applied to
the COSPRO-1 data set, a comprehensive database of spoken Taiwanese Mandarin,
containing approximately 50 thousand phonetically diverse sample contours
(syllables), and reveals that phonetic information is jointly carried by both
amplitude and phase variation.
Comment: 49 pages, 13 figures, small changes to discussion
Principal Component Analysis for Functional Data on Riemannian Manifolds and Spheres
Functional data analysis on nonlinear manifolds has drawn recent interest.
Sphere-valued functional data, which are encountered for example as movement
trajectories on the surface of the earth, are an important special case. We
consider an intrinsic principal component analysis for smooth Riemannian
manifold-valued functional data and study its asymptotic properties. Riemannian
functional principal component analysis (RFPCA) is carried out by first mapping
the manifold-valued data through Riemannian logarithm maps to tangent spaces
around the time-varying Fréchet mean function, and then performing a
classical multivariate functional principal component analysis on the linear
tangent spaces. Representations of the Riemannian manifold-valued functions and
the eigenfunctions on the original manifold are then obtained with exponential
maps. The tangent-space approximation through functional principal component
analysis is shown to be well-behaved in terms of controlling the residual
variation if the Riemannian manifold has nonnegative curvature. Specifically,
we derive a central limit theorem for the mean function, as well as root-$n$
uniform convergence rates for other model components, including the covariance
function, eigenfunctions, and functional principal component scores. Our
applications include a novel framework for the analysis of longitudinal
compositional data, achieved by mapping longitudinal compositional data to
trajectories on the sphere, illustrated with longitudinal fruit fly behavior
patterns. RFPCA is shown to be superior in terms of trajectory recovery in
comparison to an unrestricted functional principal component analysis in
applications and simulations and is also found to produce principal component
scores that are better predictors for classification compared to traditional
functional principal component scores.
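The mapping steps named in this abstract can be sketched for a single observation. The sketch assumes the composition is sent to the unit sphere by a componentwise square root (a standard choice, though the abstract does not spell out the transform) and uses the closed-form logarithm and exponential maps of the unit sphere. The base point stands in for a Fréchet mean and is simply chosen by hand; the function names are hypothetical, and no functional PCA is performed here.

```python
import math

def to_sphere(composition):
    """Square-root transform: a composition summing to 1 becomes a unit vector."""
    return [math.sqrt(c) for c in composition]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sphere_log(base, point):
    """Log map on the unit sphere: tangent vector at `base` pointing towards `point`.

    Not defined for antipodal points; square-root compositions lie in the
    non-negative orthant, so the angle never exceeds pi / 2.
    """
    c = max(-1.0, min(1.0, dot(base, point)))
    theta = math.acos(c)
    if theta < 1e-12:
        return [0.0] * len(base)
    scale = theta / math.sin(theta)
    return [scale * (p - c * b) for p, b in zip(point, base)]

def sphere_exp(base, tangent):
    """Exp map on the unit sphere: move from `base` along `tangent`."""
    norm = math.sqrt(dot(tangent, tangent))
    if norm < 1e-12:
        return list(base)
    return [math.cos(norm) * b + math.sin(norm) * t / norm
            for b, t in zip(base, tangent)]

base = to_sphere([1 / 3, 1 / 3, 1 / 3])      # hand-picked stand-in for a mean
point = to_sphere([0.6, 0.3, 0.1])           # one observed composition
tangent = sphere_log(base, point)            # linear coordinates for PCA
recovered = sphere_exp(base, tangent)        # back on the sphere
composition_back = [r * r for r in recovered]
print(composition_back)
```

The tangent vectors are ordinary Euclidean vectors orthogonal to the base point, which is what allows a classical principal component analysis to be applied to them before mapping back.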
New statistical method identifies cytokines that distinguish stool microbiomes
Regressing an outcome or dependent variable onto a set of input or independent variables allows the analyst to measure associations between the two, so that changes in the outcome can be described and predicted by changes in the inputs. While there are many ways of doing this in classical statistics when the dependent variable has certain properties (e.g., a scalar, survival time, count), little progress has been made on regression where the dependent variables are microbiome taxa counts without imposing extremely strict conditions on the data. In this paper, we propose and apply a new regression model that combines the Dirichlet-multinomial distribution with recursive partitioning, providing a fully non-parametric regression model. This model, called DM-RPart, is applied to cytokine data and microbiome taxa count data; it is applicable to any microbiome taxa count/metadata, is automatically fit, and is intuitively interpretable. The model can be applied to any microbiome or other compositional data, and software (R package HMP) is available through the R CRAN website.
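For readers unfamiliar with the Dirichlet-multinomial distribution named in this abstract, the following is a minimal sketch of its log probability mass function for a vector of taxa counts. It shows only the distribution, not the DM-RPart model or the HMP package interface; the function name and the parameter values are hypothetical.

```python
import math
from itertools import product

def dm_logpmf(counts, alphas):
    """Log pmf of the Dirichlet-multinomial for integer counts and positive alphas."""
    n = sum(counts)
    a0 = sum(alphas)
    out = math.lgamma(n + 1) + math.lgamma(a0) - math.lgamma(n + a0)
    for x, a in zip(counts, alphas):
        out += math.lgamma(x + a) - math.lgamma(a) - math.lgamma(x + 1)
    return out

# Sanity check: the pmf sums to one over all count vectors with a fixed total.
alphas = [2.0, 1.0, 0.5]   # hypothetical parameters for three taxa
n_reads = 4                # hypothetical number of reads per sample
total = sum(
    math.exp(dm_logpmf(c, alphas))
    for c in product(range(n_reads + 1), repeat=len(alphas))
    if sum(c) == n_reads
)
print(total)
```

A partitioning method could, under this kind of likelihood, compare how well separate parameter vectors describe the samples on either side of a candidate split; how DM-RPart actually scores splits is not described in the abstract.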
Recent advances in directional statistics
Mainstream statistical methodology is generally applicable to data observed
in Euclidean space. There are, however, numerous contexts of considerable
scientific interest in which the natural supports for the data under
consideration are Riemannian manifolds like the unit circle, torus, sphere and
their extensions. Typically, such data can be represented using one or more
directions, and directional statistics is the branch of statistics that deals
with their analysis. In this paper we provide a review of the many recent
developments in the field since the publication of Mardia and Jupp (1999),
still the most comprehensive text on directional statistics. Many of those
developments have been stimulated by interesting applications in fields as
diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics,
image analysis, text mining, environmetrics, and machine learning. We begin by
considering developments for the exploratory analysis of directional data
before progressing to distributional models, general approaches to inference,
hypothesis testing, regression, nonparametric curve estimation, methods for
dimension reduction, classification and clustering, and the modelling of time
series, spatial and spatio-temporal data. An overview of currently available
software for analysing directional data is also provided, and potential future
developments discussed.
Comment: 61 pages