18 research outputs found

    Properties of a square root transformation regression model

    We consider the problem of modelling the conditional distribution of a response given a vector of covariates x when the response is a compositional data vector u. That is, u is defined on the unit simplex [...] This definition of the unit simplex differs subtly from that of Aitchison (1982), as we relax the condition that the components of u must be strictly positive. Under this scenario, use of the ratio (or logratio) to compare different compositions is not ideal since it is undefined in some instances, and subcompositional analysis is also not appropriate due to the possibility of division by zero. It has long been recognised that the square root transformation [...] transforms compositional data (including zeros) onto the surface of the (p-1)-dimensional hypersphere
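
    The abstract above describes the square root transformation only in words. As a minimal sketch, assuming the transformation is the componentwise square root of a composition u that is non-negative and sums to one, the following shows that the result has unit length even when u contains zeros, while a logratio with a zero denominator is not finite. The function name sqrt_transform and the example vector are hypothetical and are not taken from the paper.

```python
import numpy as np


def sqrt_transform(u):
    """Componentwise square root of a composition (illustrative helper, not from the paper).

    Assumes u is non-negative and sums to 1; zeros are allowed.
    """
    u = np.asarray(u, dtype=float)
    if np.any(u < 0) or not np.isclose(u.sum(), 1.0):
        raise ValueError("u must be non-negative and sum to 1")
    return np.sqrt(u)


# Hypothetical composition with one zero component.
u = np.array([0.5, 0.5, 0.0])

# The squared norm of sqrt(u) equals sum(u) = 1, so y lies on the unit hypersphere.
y = sqrt_transform(u)
squared_norm = float(np.dot(y, y))

# A logratio against the zero component divides by zero and is not finite.
with np.errstate(divide="ignore"):
    logratio = np.log(u[:-1] / u[-1])
logratio_is_finite = bool(np.all(np.isfinite(logratio)))
```

    The check on squared_norm follows directly from the sum-to-one constraint, which is why zeros cause no difficulty for this transformation under the stated assumption.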

    Scaled von Mises-Fisher distributions and regression models for paleomagnetic directional data

    We propose a new distribution for analysing paleomagnetic directional data that is a novel transformation of the von Mises-Fisher distribution. The new distribution has ellipse-like symmetry, as does the Kent distribution; however, unlike the Kent distribution the normalising constant in the new density is easy to compute and estimation of the shape parameters is straightforward. To accommodate outliers, the model also incorporates an additional shape parameter which controls the tail-weight of the distribution. We also develop a general regression model framework that allows both the mean direction and the shape parameters of the error distribution to depend on covariates. The proposed regression procedure is shown to be equivariant with respect to the choice of coordinate system for the directional response. To illustrate, we analyse paleomagnetic directional data from the GEOMAGIA50.v3 database (Brown et al. 2015). We predict the mean direction at various geological time points and show that there is significant heteroscedasticity present. It is envisaged that the regression structures and error distribution proposed here will also prove useful when covariate information is available with (i) other types of directional response data; and (ii) square-root transformed compositional data of general dimension

    A Directional Mixed Effects Model for Compositional Expenditure Data

    Compositional data are vectors of proportions defined on the unit simplex, and this type of constrained data occurs frequently in government surveys. It is also possible for the compositional data to be correlated due to the clustering or grouping of the observations within small domains or areas. We propose a new class of mixed models for compositional data based on the Kent distribution for directional data, where the random effects also have Kent distributions. One useful property of the new directional mixed model is that the marginal mean direction has a closed form and is interpretable. The random effects enter the model in a multiplicative way via the product of a set of rotation matrices, and the conditional mean direction is a random rotation of the marginal mean direction. In small area estimation settings, the mean proportions are usually of primary interest and these are shown to be simple functions of the marginal mean direction. For estimation, we apply a quasi-likelihood method which results in solving a new set of generalized estimating equations, and these are shown to have low bias in typical situations. For inference, we use a nonparametric bootstrap method for clustered data which does not rely on estimates of the shape parameters (shape parameters are difficult to estimate in Kent models). We analyze data from the 2009–2010 Australian Household Expenditure Survey CURF (confidentialized unit record file). We predict the proportions of total weekly expenditure on food and housing costs for households in a chosen set of domains. The new approach is shown to be more tractable than the traditional approach based on the logratio transformation

    Defining Predictive Probability Functions for Species Sampling Models

    Linear mixed effects models are highly flexible in handling a broad range of data types and are therefore widely used in applications. A key part in the analysis of data is model selection, which often aims to choose a parsimonious model with other desirable properties from a possibly very large set of candidate statistical models. Over the last 5–10 years the literature on model selection in linear mixed models has grown extremely rapidly. The problem is much more complicated than in linear regression because selection on the covariance structure is not straightforward due to computational issues and boundary problems arising from positive semidefinite constraints on covariance matrices. To obtain a better understanding of the available methods, their properties and the relationships between them, we review a large body of literature on linear mixed model selection. We arrange, implement, discuss and compare model selection methods based on four major approaches: information criteria such as AIC or BIC, shrinkage methods based on penalized loss functions such as LASSO, the Fence procedure and Bayesian techniques. This research was supported by an Australian Research Council discovery project grant

    Robust principal component analysis for power transformed compositional data

    Geochemical surveys collect sediment or rock samples, measure the concentration of chemical elements, and report these typically either in weight percent or in parts per million (ppm). There are usually a large number of elements measured and the distributions are often skewed, containing many potential outliers. We present a new robust principal component analysis (PCA) method for geochemical survey data that involves first transforming the compositional data onto a manifold using a relative power transformation. A flexible set of moment assumptions is made which takes the special geometry of the manifold into account. The Kent distribution moment structure arises as a special case when the chosen manifold is the hypersphere. We derive simple moment and robust estimators (RO) of the parameters which are also applicable in high-dimensional settings. The resulting PCA based on these estimators is done in the tangent space and is related to the power transformation method used in correspondence analysis. To illustrate, we analyze major oxide data from the National Geochemical Survey of Australia. When compared with the traditional approach in the literature based on the centered log-ratio transformation, the new PCA method is shown to be more successful at dimension reduction and gives interpretable results