
    A novel, divergence based, regression for compositional data

    In compositional data, an observation is a vector with non-negative components which sum to a constant, typically 1. Data of this type arise in many areas, such as geology, archaeology, biology, economics and political science, among others. The goal of this paper is to propose a new, divergence-based regression modelling technique for compositional data. To do so, a recently proved metric, which is a special case of the Jensen-Shannon divergence, is employed. A strong advantage of this new regression technique is that zeros are handled naturally. An example with real data and simulation studies are presented, and both are compared with the log-ratio-based regression suggested by Aitchison in 1986.
    Comment: This is a preprint of the paper accepted for publication in the Proceedings of the 28th Panhellenic Statistics Conference, 15-18/4/2015, Athens, Greece.
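To illustrate why a Jensen-Shannon-type divergence handles zeros naturally, here is a minimal sketch (the paper's metric is a special case of the Jensen-Shannon divergence; plain JSD is used here for illustration). Zero components simply contribute nothing, since 0·log(0/·) is taken as 0, whereas log-ratio methods break on zeros.

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence between two compositions.

    Zeros are handled naturally: components where the first argument
    is zero contribute nothing (0 * log(0/.) is taken as 0)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)                     # mixture; positive wherever p or q is

    def kl(a, b):
        mask = a > 0                      # skip zero components
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# A composition with a zero component poses no problem:
p = np.array([0.5, 0.5, 0.0])
q = np.array([0.2, 0.3, 0.5])
d = js_divergence(p, q)
```

Note the divergence is symmetric and vanishes only when the two compositions coincide, which is what makes a metric based on it usable as a regression loss.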

    Regression analysis with compositional data containing zero values

    Comment: The paper has been accepted for publication in the Chilean Journal of Statistics. It consists of 12 pages with 4 figures.

    The k-NN algorithm for compositional data: a revised approach with and without zero values present

    In compositional data, an observation is a vector with non-negative components which sum to a constant, typically 1. Data of this type arise in many areas, such as geology, archaeology, biology, economics and political science, among others. The goal of this paper is to extend the taxicab metric and a newly suggested metric for compositional data by employing a power transformation. Both metrics can be used in the k-nearest neighbours algorithm regardless of the presence of zeros. Examples with real data are exhibited.
    Comment: This manuscript will appear at http://www.jds-online.com/volume-12-number-3-july-201
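The combination described above can be sketched in a few lines. This is an illustration only: the exact form of the paper's power transformation is assumed here to be component-wise powering followed by re-closure (so zeros stay zero for positive powers), paired with the taxicab (L1) metric inside a plain k-NN majority vote.

```python
import numpy as np

def power_transform(x, alpha):
    """Assumed power transformation of a composition: raise each
    component to `alpha`, then re-close so the result sums to 1.
    Zeros remain zero for alpha > 0, so no zero replacement is needed."""
    y = np.power(x, alpha)
    return y / y.sum(axis=-1, keepdims=True)

def knn_predict(X_train, y_train, x_new, k=3, alpha=0.5):
    """k-NN classification using the taxicab (L1) metric on
    power-transformed compositions."""
    Z = power_transform(X_train, alpha)
    z = power_transform(x_new, alpha)
    d = np.abs(Z - z).sum(axis=1)           # taxicab distances
    nearest = np.argsort(d)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]        # majority vote

# Toy example: two groups of compositions, one containing zeros.
X_train = np.array([[0.9, 0.1, 0.0],
                    [0.8, 0.2, 0.0],
                    [0.1, 0.2, 0.7],
                    [0.2, 0.1, 0.7]])
y_train = np.array([0, 0, 1, 1])
label = knn_predict(X_train, y_train, np.array([0.85, 0.15, 0.0]), k=3)
```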

    The FEDHC Bayesian network learning algorithm

    The paper proposes a new hybrid Bayesian network learning algorithm, termed Forward Early Dropping Hill Climbing (FEDHC), devised to work with either continuous or categorical variables. Specifically, for the case of continuous data, a robust-to-outliers version of FEDHC, which can be adopted by other BN learning algorithms, is proposed. Further, the paper demonstrates that the only implementation of MMHC in the statistical software R is prohibitively expensive, and a new implementation is offered. FEDHC is tested via Monte Carlo simulations that distinctly show it is computationally efficient and produces Bayesian networks of similar, or higher, accuracy than MMHC and PCHC. Finally, an application of the FEDHC, PCHC and MMHC algorithms to real data from the field of economics is demonstrated using the statistical software R.
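The "forward early dropping" idea at the heart of FEDHC can be sketched for a single target variable. This is a heavily simplified illustration, not the FEDHC algorithm itself (which builds full Bayesian networks using conditional-independence tests and a hill-climbing phase): at each forward step the strongest remaining predictor of the current residual is added, and any predictor whose association statistic falls below a threshold is permanently dropped and never re-examined, which is what makes the forward phase cheap.

```python
import numpy as np

def forward_early_dropping(X, y, threshold=2.0):
    """Simplified sketch of forward selection with early dropping.

    At each step, variables whose t-statistic of correlation with the
    current residual falls below `threshold` are dropped for good; the
    best surviving variable is selected and the residual updated."""
    n, p = X.shape
    remaining = set(range(p))
    selected = []
    resid = y - y.mean()
    while remaining:
        stats = {}
        for j in remaining:
            r = np.corrcoef(X[:, j], resid)[0, 1]
            # t-statistic of the correlation with the residual
            stats[j] = abs(r) * np.sqrt((n - 2) / max(1e-12, 1 - r**2))
        # early dropping: weak predictors are discarded permanently
        remaining = {j for j in remaining if stats[j] >= threshold}
        if not remaining:
            break
        best = max(remaining, key=stats.get)
        selected.append(best)
        remaining.discard(best)
        # regress the residual on the chosen column; keep the new residual
        A = np.column_stack([np.ones(n), X[:, best]])
        coef, *_ = np.linalg.lstsq(A, resid, rcond=None)
        resid = resid - A @ coef
    return selected

# Demo: y depends only on column 0; the other four columns are noise.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = 2.0 * X[:, 0] + 0.1 * rng.standard_normal(200)
selected = forward_early_dropping(X, y)
```

Because dropped variables are never revisited, each pass over the candidates shrinks, which is the source of the computational efficiency the abstract reports.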

    Circular and Spherical Projected Cauchy Distributions: A Novel Framework for Circular and Directional Data Modeling

    We introduce a novel family of projected distributions on the circle and the sphere, namely the circular and spherical projected Cauchy distributions, as promising alternatives for modeling circular and directional data. The circular distribution encompasses the wrapped Cauchy distribution as a special case, featuring a more convenient parameterisation. Next, we propose a generalised wrapped Cauchy distribution that includes an extra parameter, enhancing the fit of the distribution. In the spherical context, we impose two conditions on the scatter matrix, resulting in an elliptically symmetric distribution. Our projected distributions exhibit attractive properties, such as closed-form normalising constants and straightforward random value generation. The distribution parameters can be estimated using maximum likelihood, and we assess their bias through numerical studies. We compare our proposed distributions to existing models with real data sets, demonstrating superior fit both with and without covariates.
    Comment: Preprint
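The "straightforward random value generation" mentioned above follows the standard recipe for projected distributions: draw from the underlying bivariate law and keep only the direction. A minimal sketch for the circular case (the paper's exact parameterisation may differ; here the bivariate Cauchy is simulated as a normal vector scaled by the square root of a chi-square(1) variable):

```python
import numpy as np

def rcircular_projected_cauchy(n, mu, rng=None):
    """Simulate angles from a circular projected Cauchy distribution
    (illustrative parameterisation: location vector `mu` in the plane).

    A bivariate Cauchy vector around `mu` is drawn, then only its
    direction (angle on the unit circle) is kept."""
    rng = np.random.default_rng(rng)
    z = rng.standard_normal((n, 2))
    w = rng.chisquare(df=1, size=(n, 1))
    x = mu + z / np.sqrt(w)                 # bivariate Cauchy around mu
    return np.arctan2(x[:, 1], x[:, 0])     # project: keep the angle only

# Location on the positive x-axis => mean direction near angle 0.
theta = rcircular_projected_cauchy(5000, np.array([5.0, 0.0]), rng=42)
```

The further `mu` lies from the origin, the more concentrated the resulting angular distribution, which plays the role of a concentration parameter.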

    Improved classification for compositional data using the α-transformation

    In compositional data analysis an observation is a vector containing non-negative values, only the relative sizes of which are considered to be of interest. Without loss of generality, a compositional vector can be taken to be a vector of proportions that sum to one. Data of this type arise in many areas, including geology, archaeology, biology, economics and political science. In this paper we investigate methods for classification of compositional data. Our approach centres on the idea of using the α-transformation to transform the data and then to classify the transformed data via regularised discriminant analysis and the k-nearest neighbours algorithm. Using the α-transformation generalises two rival approaches in compositional data analysis: one (when α = 1) that treats the data as though they were Euclidean, ignoring the compositional constraint, and another (when α = 0) that employs Aitchison's centred log-ratio transformation. A numerical study with several real datasets shows that whether α = 1 or α = 0 gives better classification performance depends on the dataset, and moreover that an intermediate value of α can sometimes give better performance than either 1 or 0.
    Comment: This is a 17-page preprint and has been accepted for publication at the Journal of Classification.
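The two limiting cases the abstract describes can be made concrete with a short sketch of one common form of the α-transformation (this version omits the final Helmert rotation some treatments apply, so it is illustrative rather than the paper's exact map): as α → 0 it converges to Aitchison's centred log-ratio, while α = 1 is an affine map of the raw proportions.

```python
import numpy as np

def alpha_transform(x, alpha):
    """One common form of the alpha-transformation of a composition x
    (Helmert rotation omitted; illustrative only).

    alpha = 1: affine map of the raw proportions (Euclidean treatment).
    alpha -> 0: converges to the centred log-ratio (clr)."""
    x = np.asarray(x, dtype=float)
    D = x.size
    if alpha == 0:
        g = np.exp(np.log(x).mean())      # geometric mean of the parts
        return np.log(x / g)              # clr transformation
    u = x**alpha / np.sum(x**alpha)       # closed power transform
    return (D * u - 1.0) / alpha

x = np.array([0.2, 0.3, 0.5])
```

A first-order expansion of x^α = exp(α log x) shows D·u − 1 ≈ α·(log x − mean log x), so dividing by α recovers the clr in the limit; for zeros, any α > 0 still works, which is what allows intermediate α values to be tuned per dataset.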