6,580 research outputs found

    A method for Bayesian regression modelling of composition data

    Get PDF
    Many scientific and industrial processes produce data that is best analysed as vectors of relative values, often called compositions or proportions. The Dirichlet distribution is a natural distribution to use for composition or proportion data. It has the advantage of a low number of parameters, making it the parsimonious choice in many cases. In this paper we consider the case where the outcome of a process is Dirichlet, dependent on one or more explanatory variables in a regression setting. We explore some existing approaches to this problem, and then introduce a new simulation approach to fitting such models, based on the Bayesian framework. We illustrate the advantages of the new approach through simulated examples and an application in sport science. These advantages include: increased accuracy of fit, increased power for inference, and the ability to introduce random effects without additional complexity in the analysis.Comment: 10 pages, 1 figure, 2 table

    Unifying Amplitude and Phase Analysis: A Compositional Data Approach to Functional Multivariate Mixed-Effects Modeling of Mandarin Chinese

    Full text link
    Mandarin Chinese is characterized by being a tonal language; the pitch (or F0F_0) of its utterances carries considerable linguistic information. However, speech samples from different individuals are subject to changes in amplitude and phase which must be accounted for in any analysis which attempts to provide a linguistically meaningful description of the language. A joint model for amplitude, phase and duration is presented which combines elements from Functional Data Analysis, Compositional Data Analysis and Linear Mixed Effects Models. By decomposing functions via a functional principal component analysis, and connecting registration functions to compositional data analysis, a joint multivariate mixed effect model can be formulated which gives insights into the relationship between the different modes of variation as well as their dependence on linguistic and non-linguistic covariates. The model is applied to the COSPRO-1 data set, a comprehensive database of spoken Taiwanese Mandarin, containing approximately 50 thousand phonetically diverse sample F0F_0 contours (syllables), and reveals that phonetic information is jointly carried by both amplitude and phase variation.Comment: 49 pages, 13 figures, small changes to discussio

    Principal Component Analysis for Functional Data on Riemannian Manifolds and Spheres

    Full text link
    Functional data analysis on nonlinear manifolds has drawn recent interest. Sphere-valued functional data, which are encountered for example as movement trajectories on the surface of the earth, are an important special case. We consider an intrinsic principal component analysis for smooth Riemannian manifold-valued functional data and study its asymptotic properties. Riemannian functional principal component analysis (RFPCA) is carried out by first mapping the manifold-valued data through Riemannian logarithm maps to tangent spaces around the time-varying Fr\'echet mean function, and then performing a classical multivariate functional principal component analysis on the linear tangent spaces. Representations of the Riemannian manifold-valued functions and the eigenfunctions on the original manifold are then obtained with exponential maps. The tangent-space approximation through functional principal component analysis is shown to be well-behaved in terms of controlling the residual variation if the Riemannian manifold has nonnegative curvature. Specifically, we derive a central limit theorem for the mean function, as well as root-nn uniform convergence rates for other model components, including the covariance function, eigenfunctions, and functional principal component scores. Our applications include a novel framework for the analysis of longitudinal compositional data, achieved by mapping longitudinal compositional data to trajectories on the sphere, illustrated with longitudinal fruit fly behavior patterns. RFPCA is shown to be superior in terms of trajectory recovery in comparison to an unrestricted functional principal component analysis in applications and simulations and is also found to produce principal component scores that are better predictors for classification compared to traditional functional functional principal component scores

    New statistical method identifes cytokines that distinguish stool microbiomes

    Get PDF
    Regressing an outcome or dependent variable onto a set of input or independent variables allows the analyst to measure associations between the two so that changes in the outcome can be described by and predicted by changes in the inputs. While there are many ways of doing this in classical statistics, where the dependent variable has certain properties (e.g., a scalar, survival time, count), little progress on regression where the dependent variable are microbiome taxa counts has been made that do not impose extremely strict conditions on the data. In this paper, we propose and apply a new regression model combining the Dirichlet-multinomial distribution with recursive partitioning providing a fully non-parametric regression model. This model, called DM-RPart, is applied to cytokine data and microbiome taxa count data and is applicable to any microbiome taxa count/metadata, is automatically fit, and intuitively interpretable. This is a model which can be applied to any microbiome or other compositional data and software (R package HMP) available through the R CRAN website

    Recent advances in directional statistics

    Get PDF
    Mainstream statistical methodology is generally applicable to data observed in Euclidean space. There are, however, numerous contexts of considerable scientific interest in which the natural supports for the data under consideration are Riemannian manifolds like the unit circle, torus, sphere and their extensions. Typically, such data can be represented using one or more directions, and directional statistics is the branch of statistics that deals with their analysis. In this paper we provide a review of the many recent developments in the field since the publication of Mardia and Jupp (1999), still the most comprehensive text on directional statistics. Many of those developments have been stimulated by interesting applications in fields as diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics, image analysis, text mining, environmetrics, and machine learning. We begin by considering developments for the exploratory analysis of directional data before progressing to distributional models, general approaches to inference, hypothesis testing, regression, nonparametric curve estimation, methods for dimension reduction, classification and clustering, and the modelling of time series, spatial and spatio-temporal data. An overview of currently available software for analysing directional data is also provided, and potential future developments discussed.Comment: 61 page
    • …
    corecore