50 research outputs found
A novel, divergence based, regression for compositional data
In compositional data, an observation is a vector with non-negative
components which sum to a constant, typically 1. Data of this type arise in
many areas, such as geology, archaeology, biology, economics and political
science amongst others. The goal of this paper is to propose a new,
divergence-based regression modelling technique for compositional data. To do
so, a recently proved metric, which is a special case of the Jensen-Shannon
divergence, is employed. A strong advantage of this new regression technique is that zeros
are naturally handled. An example with real data and simulation studies are
presented and are both compared with the log-ratio based regression suggested
by Aitchison in 1986.
Comment: This is a preprint of the paper accepted for publication in the
Proceedings of the 28th Panhellenic Statistics Conference, 15-18/4/2015,
Athens, Greece.
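The metric underlying the proposed regression can be sketched concretely. The function below is a generic illustration, not the paper's estimator: it computes the square root of the Jensen-Shannon divergence between two compositional vectors, a quantity known to be a metric. Because the convention 0 * log 0 = 0 applies, zero components need no replacement, which is the zero-handling property the abstract emphasises.

```python
import numpy as np

def js_metric(p, q):
    """Square root of the Jensen-Shannon divergence between two
    compositional vectors p and q (non-negative components summing to 1).
    Zero components contribute nothing (0 * log 0 is taken as 0), so zeros
    are handled without any replacement strategy."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0                      # 0 * log 0 treated as 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return np.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))
```

The divergence is symmetric in its arguments, and its maximum value log 2 is attained for compositions with disjoint supports, e.g. `js_metric([1, 0], [0, 1])`.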
Regression analysis with compositional data containing zero values
Comment: The paper has been accepted for publication in the Chilean Journal of
Statistics. It consists of 12 pages with 4 figures.
The k-NN algorithm for compositional data: a revised approach with and without zero values present
In compositional data, an observation is a vector with non-negative
components which sum to a constant, typically 1. Data of this type arise in
many areas, such as geology, archaeology, biology, economics and political
science among others. The goal of this paper is to extend the taxicab metric
and a newly suggested metric for compositional data by employing a power
transformation. Both metrics are to be used in the k-nearest neighbours
algorithm regardless of the presence of zeros. Examples with real data are
exhibited.
Comment: This manuscript will appear at
http://www.jds-online.com/volume-12-number-3-july-201
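The approach described above can be sketched minimally, assuming a simple closure-after-powering form of the power transformation (the paper's exact definition may differ) and the taxicab (L1) metric; both function names are hypothetical.

```python
import numpy as np

def power_transform(x, a):
    """Hypothetical power transformation of a compositional vector:
    raise each component to the power a and re-close to sum to 1.
    Zero components stay zero for a > 0, so no zero replacement is needed."""
    xa = np.asarray(x, float) ** a
    return xa / xa.sum()

def knn_taxicab(train, labels, query, k=3, a=1.0):
    """Classify `query` by majority vote among its k nearest training
    compositions under the taxicab (L1) metric applied to
    power-transformed data."""
    t = np.array([power_transform(row, a) for row in train])
    q = power_transform(query, a)
    d = np.abs(t - q).sum(axis=1)        # taxicab distances
    nearest = np.argsort(d)[:k]
    votes = [labels[i] for i in nearest]
    return max(set(votes), key=votes.count)
```

Varying the power `a` tunes how strongly small components influence the distances, while the L1 metric itself is well defined whether or not zeros are present.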
The FEDHC Bayesian network learning algorithm
The paper proposes a new hybrid Bayesian network learning algorithm, termed
Forward Early Dropping Hill Climbing (FEDHC), devised to work with either
continuous or categorical variables. Specifically, for the case of continuous
data, a version of FEDHC that is robust to outliers, and that can be adopted
by other BN learning algorithms, is proposed. Further, the paper shows that
the only implementation of MMHC in the statistical software \textit{R} is
prohibitively expensive, and a new implementation is offered. FEDHC is tested
via Monte Carlo simulations that clearly show it is computationally efficient
and produces Bayesian networks of accuracy similar to, or higher than, that of
MMHC and PCHC. Finally, an application of the FEDHC, PCHC and MMHC algorithms
to real data from the field of economics is demonstrated using the statistical
software \textit{R}.
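The "forward early dropping" heuristic in the algorithm's name can be illustrated in isolation: candidates that score poorly at any iteration of a forward search are removed from consideration permanently, which caps the cost of later iterations. The sketch below is a loose analogy only, using absolute residual correlations in place of the conditional independence tests FEDHC actually uses inside its hill-climbing BN search; the function name and threshold are illustrative.

```python
import numpy as np

def forward_early_dropping(X, y, threshold=0.1):
    """Toy forward selection with early dropping. At each iteration the
    best-scoring remaining candidate is added, and every candidate whose
    score (here, the absolute correlation with the current residuals)
    falls below `threshold` is dropped permanently, never reconsidered.
    FEDHC itself uses proper conditional independence tests within a
    hill-climbing Bayesian network learner."""
    n, p = X.shape
    candidates = set(range(p))
    selected = []
    resid = y - y.mean()
    while candidates:
        scores = {j: abs(np.corrcoef(X[:, j], resid)[0, 1])
                  for j in candidates}
        # early dropping: discard weak candidates for good
        candidates = {j for j, s in scores.items() if s >= threshold}
        if not candidates:
            break
        best = max(candidates, key=scores.get)
        selected.append(best)
        candidates.discard(best)
        # refit on the selected columns and update the residuals
        A = X[:, selected]
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
    return selected
```

The permanent removal is what distinguishes early dropping from plain forward selection: a variable deemed irrelevant once is never re-tested, trading a small risk of false exclusion for a large reduction in the number of tests performed.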
Circular and Spherical Projected Cauchy Distributions: A Novel Framework for Circular and Directional Data Modeling
We introduce a novel family of projected distributions on the circle and the
sphere, namely the circular and spherical projected Cauchy distributions as
promising alternatives for modeling circular and directional data. The circular
distribution encompasses the wrapped Cauchy distribution as a special case,
featuring a more convenient parameterisation. Next, we propose a generalised
wrapped Cauchy distribution that includes an extra parameter, enhancing the fit
of the distribution. In the spherical context, we impose two conditions on the
scatter matrix, resulting in an elliptically symmetric distribution. Our
projected distributions exhibit attractive properties, such as closed-form
normalising constants and straightforward random value generation. The
distribution parameters can be estimated using maximum likelihood and we assess
their bias through numerical studies. We compare our proposed distributions to
existing models with real data sets, demonstrating superior fit both with and
without covariates.
Comment: Preprint.
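The "straightforward random value generation" claimed for these projected distributions can be sketched: simulate multivariate Cauchy vectors (multivariate t with one degree of freedom) centred at a location vector and project each onto the unit circle or sphere. The sketch assumes an identity scatter matrix, whereas the paper's spherical case works with a constrained general scatter matrix; the function name is hypothetical.

```python
import numpy as np

def rand_projected_cauchy(n, mu, rng=None):
    """Draw n unit vectors from a projected Cauchy-type distribution:
    simulate multivariate Cauchy vectors centred at mu (a location vector
    in R^d, identity scatter assumed) and project onto the unit sphere."""
    rng = np.random.default_rng(rng)
    mu = np.asarray(mu, float)
    d = mu.size
    # multivariate t with 1 df: Gaussian scaled by 1/sqrt(chi^2_1)
    z = rng.normal(size=(n, d))
    g = rng.chisquare(1, size=(n, 1))
    x = mu + z / np.sqrt(g)
    return x / np.linalg.norm(x, axis=1, keepdims=True)
```

A larger norm of `mu` concentrates the projected directions around `mu / ||mu||`, while `mu = 0` gives the uniform distribution on the circle or sphere.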
Improved classification for compositional data using the α-transformation
In compositional data analysis an observation is a vector containing
non-negative values, only the relative sizes of which are considered to be of
interest. Without loss of generality, a compositional vector can be taken to be
a vector of proportions that sum to one. Data of this type arise in many areas
including geology, archaeology, biology, economics and political science. In
this paper we investigate methods for classification of compositional data.
Our approach centres on the idea of using the α-transformation to transform
the data and then to classify the transformed data via regularised
discriminant analysis and the k-nearest neighbours algorithm. Using the
α-transformation generalises two rival approaches in compositional data
analysis, one (when α = 1) that treats the data as though they were Euclidean,
ignoring the compositional constraint, and another (when α = 0) that employs
Aitchison's centred log-ratio transformation. A numerical study with several
real datasets shows that whether α = 1 or α = 0 gives better classification
performance depends on the dataset, and moreover that using an intermediate
value of α can sometimes give better performance than using either α = 1 or
α = 0.
Comment: This is a 17-page preprint and has been accepted for publication at
the Journal of Classification.
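A minimal sketch of the α-transformation as commonly defined for a composition x with D parts: z = (D * x^α / sum(x^α) - 1) / α for α ≠ 0, with the centred log-ratio transform recovered in the limit α → 0. This is an illustration under that assumed form; the paper additionally maps the result to R^{D-1} via a Helmert sub-matrix, omitted here.

```python
import numpy as np

def alpha_transform(x, a):
    """Alpha-transformation of a compositional vector x (positive
    components summing to 1). For a != 0:
        z = (D * x**a / sum(x**a) - 1) / a,
    and in the limit a -> 0 it reduces to the centred log-ratio
    transform log(x) - mean(log(x)). The subsequent Helmert sub-matrix
    mapping to R^{D-1} is omitted in this sketch."""
    x = np.asarray(x, float)
    D = x.size
    if a == 0:
        lx = np.log(x)
        return lx - lx.mean()            # centred log-ratio limit
    u = x ** a
    return (D * u / u.sum() - 1.0) / a
```

Setting `a = 1` leaves the data essentially Euclidean (an affine rescaling of the raw proportions), while `a` near 0 approaches Aitchison's geometry, which is exactly the continuum between the two rival approaches described above.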