1,581 research outputs found
Recent advances in directional statistics
Mainstream statistical methodology is generally applicable to data observed
in Euclidean space. There are, however, numerous contexts of considerable
scientific interest in which the natural supports for the data under
consideration are Riemannian manifolds like the unit circle, torus, sphere and
their extensions. Typically, such data can be represented using one or more
directions, and directional statistics is the branch of statistics that deals
with their analysis. In this paper we provide a review of the many recent
developments in the field since the publication of Mardia and Jupp (1999),
still the most comprehensive text on directional statistics. Many of those
developments have been stimulated by interesting applications in fields as
diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics,
image analysis, text mining, environmetrics, and machine learning. We begin by
considering developments for the exploratory analysis of directional data
before progressing to distributional models, general approaches to inference,
hypothesis testing, regression, nonparametric curve estimation, methods for
dimension reduction, classification and clustering, and the modelling of time
series, spatial and spatio-temporal data. An overview of currently available
software for analysing directional data is also provided, and potential future
developments discussed. Comment: 61 pages
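To make concrete why Euclidean summaries fail for directions, here is a minimal sketch (not taken from the review itself) of the standard circular mean and mean resultant length. The function names are illustrative assumptions; angles are taken to be in radians.

```python
import math

def circular_mean(angles):
    """Mean direction of angles (radians), wrapped onto [0, 2*pi) up to rounding."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c) % (2.0 * math.pi)

def mean_resultant_length(angles):
    """Length of the average unit vector: near 1 = concentrated, near 0 = dispersed."""
    n = len(angles)
    s = sum(math.sin(a) for a in angles) / n
    c = sum(math.cos(a) for a in angles) / n
    return math.hypot(s, c)

# Two directions just either side of 0 degrees.
angles = [math.radians(350.0), math.radians(10.0)]
naive = sum(angles) / len(angles)        # arithmetic mean: 180 degrees, the opposite side
direction = circular_mean(angles)        # circular mean: 0 degrees (equivalently 360)
concentration = mean_resultant_length(angles)
```

The arithmetic mean points in the opposite direction to both observations, whereas the circular mean averages the corresponding unit vectors and recovers the direction they share.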
Kernel density estimation on the torus
Kernel density estimation for multivariate circular data has been formulated only when the sample space is the sphere, but theory for the torus would also be useful. For data lying on a d-dimensional torus (d >= 1), we discuss kernel estimation of a density, its mixed partial derivatives, and their squared functionals. We introduce a specific class of product kernels whose order is suitably defined so as to obtain L2-risk formulas whose structure can be compared to their Euclidean counterparts. Our kernels are based on circular densities; however, we also discuss smaller bias estimation involving negative kernels which are functions of circular densities. Practical rules for selecting the smoothing degree, based on cross-validation, bootstrap and plug-in ideas, are derived. Moreover, we provide specific results on the use of kernels based on the von Mises density. Finally, real-data examples and simulation studies illustrate the findings.
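A product-kernel estimator of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's estimator: it uses a plain von Mises kernel in every coordinate with one shared concentration `kappa` acting as the smoothing parameter, and the names `bessel_i0`, `von_mises_product` and `torus_kde` are hypothetical.

```python
import math

def bessel_i0(x, terms=50):
    """Modified Bessel function I0 via its power series (fine for moderate x)."""
    total, term = 1.0, 1.0
    q = (x / 2.0) ** 2
    for k in range(1, terms):
        term *= q / (k * k)
        total += term
    return total

def von_mises_product(point, centre, kappa):
    """Unnormalised product of von Mises kernels, one factor per torus coordinate."""
    return math.exp(kappa * sum(math.cos(t - m) for t, m in zip(point, centre)))

def torus_kde(point, sample, kappa):
    """Average of product von Mises kernels centred at each observation."""
    d = len(point)
    norm = (2.0 * math.pi * bessel_i0(kappa)) ** d   # normalising constant per kernel
    total = sum(von_mises_product(point, obs, kappa) for obs in sample)
    return total / (len(sample) * norm)

# Hypothetical sample of three observations on the 2-torus (angles in radians).
sample = [(0.3, 1.0), (0.5, 1.2), (6.0, 0.9)]
density = torus_kde((0.4, 1.1), sample, kappa=5.0)
```

Larger `kappa` means less smoothing, so it plays the role of an inverse bandwidth. Because each factor depends on the angle only through a cosine, the estimate is automatically periodic in every coordinate.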
Fast robust correlation for high-dimensional data
The product moment covariance is a cornerstone of multivariate data analysis,
from which one can derive correlations, principal components, Mahalanobis
distances and many other results. Unfortunately the product moment covariance
and the corresponding Pearson correlation are very susceptible to outliers
(anomalies) in the data. Several robust measures of covariance have been
developed, but few are suitable for the ultrahigh dimensional data that are
becoming more prevalent nowadays. For that one needs methods whose computation
scales well with the dimension, that are guaranteed to yield a positive semidefinite
covariance matrix, and are sufficiently robust to outliers as well as
sufficiently accurate in the statistical sense of low variability. We construct
such methods using data transformations. The resulting approach is simple, fast
and widely applicable. We study its robustness by deriving influence functions
and breakdown values, and computing the mean squared error on contaminated
data. Using these results we select a method that performs well overall. This
also allows us to construct a faster version of the DetectDeviatingCells method
(Rousseeuw and Van den Bossche, 2018) to detect cellwise outliers, that can
deal with much higher dimensions. The approach is illustrated on genomic data
with 12,000 variables and color video data with 920,000 dimensions.
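The general idea of transforming the data first and then applying the ordinary product moment formula can be sketched as below. The abstract does not say which transformation the authors select, so this example substitutes a simple assumption: each variable is standardized by its median and MAD and then clipped at a hypothetical cutoff `clip`. All function names are illustrative.

```python
import math
import statistics

def robust_scores(column, clip=1.5):
    """Standardize by median/MAD, then clip to [-clip, clip] to bound outliers."""
    med = statistics.median(column)
    mad = 1.4826 * statistics.median([abs(x - med) for x in column])
    if mad == 0.0:
        raise ValueError("MAD is zero; the column cannot be rescaled")
    return [max(-clip, min(clip, (x - med) / mad)) for x in column]

def pearson(a, b):
    """Classical product moment correlation."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def transformed_correlation(a, b, clip=1.5):
    """Pearson correlation computed on the transformed columns."""
    return pearson(robust_scores(a, clip), robust_scores(b, clip))

# Perfectly correlated data with a single gross outlier in y.
x = [float(i) for i in range(1, 21)]
y = list(x)
y[-1] = -1000.0
classical = pearson(x, y)                 # dragged below zero by one cell
robust = transformed_correlation(x, y)    # stays clearly positive
```

Since the result is an ordinary correlation of transformed columns, a matrix built this way across many variables is positive semidefinite by construction, and the cost per pair is the same as for the classical correlation.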
A Unified Framework of Constrained Regression
Generalized additive models (GAMs) play an important role in modeling and
understanding complex relationships in modern applied statistics. They allow
for flexible, data-driven estimation of covariate effects. Yet researchers
often have a priori knowledge of certain effects, which might be monotonic or
periodic (cyclic) or should fulfill boundary conditions. We propose a unified
framework to incorporate these constraints for both univariate and bivariate
effect estimates and for varying coefficients. As the framework is based on
component-wise boosting methods, variables can be selected intrinsically, and
effects can be estimated for a wide range of different distributional
assumptions. Bootstrap confidence intervals for the effect estimates are
derived to assess the models. We present three case studies from environmental
sciences to illustrate the proposed seamless modeling framework. All discussed
constrained effect estimates are implemented in the comprehensive R package
mboost for model-based boosting. Comment: This is a preliminary version of the manuscript. The final
publication is available at
http://link.springer.com/article/10.1007/s11222-014-9520-
Directional statistics and filtering using libDirectional
In this paper, we present libDirectional, a MATLAB library for directional statistics and directional estimation. It supports a variety of commonly used distributions on the unit circle, such as the von Mises, wrapped normal, and wrapped Cauchy distributions. Furthermore, various distributions on higher-dimensional manifolds such as the unit hypersphere and the hypertorus are available. Based on these distributions, several recursive filtering algorithms in libDirectional allow estimation on these manifolds. The functionality is implemented in a clear, well-documented, and object-oriented structure that is both easy to use and easy to extend.
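As a language-neutral illustration of one of the distributions named above, here is a minimal Python sketch of the wrapped normal density and of a prediction step for a circular state. It does not reproduce libDirectional's MATLAB API; the names `wrapped_normal_pdf` and `predict`, and the truncation level `n_wraps`, are assumptions made for this example.

```python
import math

def wrapped_normal_pdf(theta, mu, sigma, n_wraps=10):
    """Wrapped normal density: a normal density summed over shifts of 2*pi."""
    total = 0.0
    for k in range(-n_wraps, n_wraps + 1):
        z = (theta - mu + 2.0 * math.pi * k) / sigma
        total += math.exp(-0.5 * z * z)
    return total / (sigma * math.sqrt(2.0 * math.pi))

def predict(mu, sigma, shift, noise_sigma):
    """Propagate a wrapped normal state through 'add shift plus wrapped normal noise'.

    The sum of independent wrapped normals is again wrapped normal, so the
    location adds modulo 2*pi and the variance parameters add.
    """
    new_mu = (mu + shift) % (2.0 * math.pi)
    new_sigma = math.sqrt(sigma ** 2 + noise_sigma ** 2)
    return new_mu, new_sigma

# Hypothetical state near 350 degrees, rotated by 20 degrees with some noise.
mu, sigma = predict(math.radians(350.0), 0.3, math.radians(20.0), 0.4)
value = wrapped_normal_pdf(0.0, mu, sigma)
```

The prediction step stays in closed form because the wrapped normal family is closed under addition of independent angles; a measurement update generally needs an approximation, which is where dedicated filtering libraries help.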
BAMBI: An R Package for Fitting Bivariate Angular Mixture Models
Statistical analyses of directional or angular data have applications in a variety of fields, such as geology, meteorology and bioinformatics. There is substantial literature on descriptive and inferential techniques for univariate angular data, with the bivariate (or more generally, multivariate) cases receiving more attention in recent years. More specifically, the bivariate wrapped normal, von Mises sine and von Mises cosine distributions, and mixtures thereof, have been proposed for practical use. However, there is a lack of software implementing these distributions and the associated inferential techniques. In this article, we introduce BAMBI, an R package for analyzing bivariate (and univariate) angular data. We implement random data generation, density evaluation, and computation of theoretical summary measures (variances and correlation coefficients) for the three aforementioned bivariate angular distributions, as well as two univariate angular distributions: the univariate wrapped normal and the univariate von Mises distribution. The major contribution of BAMBI to statistical computing is in providing Bayesian methods for modeling angular data using finite mixtures of these distributions. We also provide functions for visual and numerical diagnostics and Bayesian inference for the fitted models. In this article, we first provide a brief review of the distributions and techniques used in BAMBI, then describe the capabilities of the package, and finally conclude with demonstrations of mixture model fitting using BAMBI on the two real data sets included in the package, one univariate and one bivariate.
CopulaDTA: An R Package for Copula Based Bivariate Beta-Binomial Models for Diagnostic Test Accuracy Studies in a Bayesian Framework
The current statistical procedures implemented in statistical software
packages for pooling of diagnostic test accuracy data include hSROC regression
and the bivariate random-effects meta-analysis model (BRMA). However, these
models do not report the overall mean but rather the mean for a central study
with random-effect equal to zero and have difficulties estimating the
correlation between sensitivity and specificity when the number of studies in
the meta-analysis is small and/or when the between-study variance is relatively
large. This tutorial on advanced statistical methods for meta-analysis of
diagnostic accuracy studies discusses and demonstrates Bayesian modeling using
the CopulaDTA package in R to fit different models to obtain the meta-analytic
parameter estimates. The focus is on the joint modelling of sensitivity and
specificity using a copula-based bivariate beta distribution. Essentially, we
extend the work of Nikoloulopoulos by i) presenting the Bayesian approach,
which offers flexibility and the ability to perform complex statistical modelling
even with small data sets, ii) including covariate information, and iii)
providing easy-to-use code. The statistical methods are illustrated by
re-analysing data of two published meta-analyses. Modelling sensitivity and
specificity using the bivariate beta distribution provides marginal as well as
study-specific parameter estimates, as opposed to using the bivariate normal
distribution (e.g., in BRMA), which only yields study-specific parameter
estimates. Moreover, copula-based models offer greater flexibility in modelling
different correlation structures, in contrast to the normal distribution, which
allows for only one correlation structure. Comment: 26 pages, 5 figures
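The distinction drawn above between the mean for a central study and the overall mean can be illustrated numerically. The sketch below assumes a logit link with a normally distributed random effect, and the values of `mu` and `sigma` are hypothetical; it does not use CopulaDTA or reproduce any model from the paper.

```python
import math
from statistics import NormalDist

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def marginal_mean(mu, sigma, n_points=2000):
    """Approximate E[expit(mu + sigma * Z)], Z standard normal.

    Uses an equal-weight rule over midpoints of the normal quantile scale.
    """
    std_normal = NormalDist()
    total = 0.0
    for i in range(n_points):
        p = (i + 0.5) / n_points
        total += expit(mu + sigma * std_normal.inv_cdf(p))
    return total / n_points

# Hypothetical logit-scale mean sensitivity and between-study standard deviation.
mu, sigma = 2.0, 1.5
central = expit(mu)                  # study with random effect equal to zero
overall = marginal_mean(mu, sigma)   # average over all studies, pulled toward 0.5
```

The gap between `central` and `overall` grows with the between-study variance, which is why back-transforming the logit-scale mean understates or overstates the overall accuracy; a model that places a beta distribution directly on the probability scale yields the marginal mean as a parameter.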