34 research outputs found
Skew-rotationally-symmetric distributions and related efficient inferential procedures
Peer reviewed. Most commonly used distributions on the unit hypersphere $S^{k-1}=\{v\in\mathbb{R}^k : v^\top v=1\}$, $k\ge 2$, assume that the data are rotationally symmetric about some direction $\theta\in S^{k-1}$. However, there is empirical evidence that this assumption often fails to describe reality. We study in this paper a new class of skew-rotationally-symmetric distributions on $S^{k-1}$ that enjoy numerous good properties. We discuss the Fisher information structure of the model and derive efficient inferential procedures. In particular, we obtain the first semi-parametric test for rotational symmetry about a known direction. We also propose a second test for rotational symmetry, obtained through the definition of a new measure of skewness on the hypersphere. We investigate the finite-sample behavior of the new tests through a Monte Carlo simulation study. We conclude the paper with a discussion about some intriguing open questions related to our new models.
Recent advances in directional statistics
Mainstream statistical methodology is generally applicable to data observed
in Euclidean space. There are, however, numerous contexts of considerable
scientific interest in which the natural supports for the data under
consideration are Riemannian manifolds like the unit circle, torus, sphere and
their extensions. Typically, such data can be represented using one or more
directions, and directional statistics is the branch of statistics that deals
with their analysis. In this paper we provide a review of the many recent
developments in the field since the publication of Mardia and Jupp (1999),
still the most comprehensive text on directional statistics. Many of those
developments have been stimulated by interesting applications in fields as
diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics,
image analysis, text mining, environmetrics, and machine learning. We begin by
considering developments for the exploratory analysis of directional data
before progressing to distributional models, general approaches to inference,
hypothesis testing, regression, nonparametric curve estimation, methods for
dimension reduction, classification and clustering, and the modelling of time
series, spatial and spatio-temporal data. An overview of currently available
software for analysing directional data is also provided, and potential future
developments discussed. Comment: 61 pages
Coming together of Bayesian inference and skew spherical data
This paper presents Bayesian directional data modeling via the
skew-rotationally-symmetric Fisher-von Mises-Langevin (FvML) distribution. The
prior distributions for the parameters are a pivotal building block in Bayesian analysis;
therefore, the impact of the proposed priors will be quantified using the Wasserstein
Impact Measure (WIM) to guide the practitioner in the implementation process. For the
computation of the posterior, modifications of Gibbs and slice sampling are applied to
generate samples. We demonstrate the applicability of our contribution via synthetic
and real data analyses. Our investigation paves the way for Bayesian analysis of skew
circular and spherical data. The Visiting Professor programme, University of Pretoria and the National Research Foundation (NRF) of South Africa, SARChI Research Chair and DSI-NRF Centre of Excellence in Mathematical and Statistical Sciences (CoE-MaSS), South Africa. https://www.frontiersin.org/journals/big-data
Curved factor analysis with the Ellipsoid-Gaussian distribution
There is a need for new models for characterizing dependence in multivariate
data. The multivariate Gaussian distribution is routinely used, but cannot
characterize nonlinear relationships in the data. Most non-linear extensions
tend to be highly complex; for example, involving estimation of a non-linear
regression model in latent variables. In this article, we propose a relatively
simple class of Ellipsoid-Gaussian multivariate distributions, which are
derived by using a Gaussian linear factor model involving latent variables
having a von Mises-Fisher distribution on a unit hyper-sphere. We show that the
Ellipsoid-Gaussian distribution can flexibly model curved relationships among
variables with lower-dimensional structures. Taking a Bayesian approach, we
propose a hybrid of gradient-based geodesic Monte Carlo and adaptive Metropolis
for posterior sampling. We derive basic properties and illustrate the utility
of the Ellipsoid-Gaussian distribution on a variety of simulated and real data
applications. An accompanying R package is also available.
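As a rough illustration of the construction this abstract describes, the sketch below simulates a two-dimensional linear factor model x = c + Λη + ε whose latent factor η lies on the unit circle, where the von Mises-Fisher distribution reduces to the von Mises. The function name, the diagonal loading matrix and all parameter values are assumptions chosen for illustration; this is not the authors' R package or their sampler.

```python
import math
import random

def sample_ellipsoid_gaussian_2d(n, center, loadings, mu, kappa, sigma, seed=0):
    """Draw n points x = center + diag(loadings) * eta + noise (illustrative only)."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        # On the unit circle the von Mises-Fisher law is the von Mises law.
        phi = rng.vonmisesvariate(mu, kappa)
        eta = (math.cos(phi), math.sin(phi))
        # Linear factor model with independent Gaussian noise on each coordinate.
        x = tuple(
            center[j] + loadings[j] * eta[j] + rng.gauss(0.0, sigma)
            for j in range(2)
        )
        points.append(x)
    return points

# Hypothetical parameter values: the points concentrate near an ellipse
# centred at (1, -2) with semi-axes 3 and 1.
points = sample_ellipsoid_gaussian_2d(
    n=500, center=(1.0, -2.0), loadings=(3.0, 1.0), mu=0.0, kappa=2.0, sigma=0.05
)
```

With small noise the simulated points trace a curved, lower-dimensional structure (part of an ellipse), which is the kind of relationship a plain multivariate Gaussian cannot capture.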
Sine-skewed toroidal distributions and their application in protein bioinformatics
In the bioinformatics field, there has been a growing interest in modelling
dihedral angles of amino acids by viewing them as data on the torus. This has
motivated, over the past years, new proposals of distributions on the bivariate
torus. The main drawback of most of these models is that the related densities
are (pointwise) symmetric, despite the fact that the data usually present
asymmetric patterns. This motivates the need to find a new way of constructing
asymmetric toroidal distributions starting from a symmetric distribution. We
tackle this problem in this paper by introducing the sine-skewed toroidal
distributions. The general properties of the new models are derived. Based on
the initial symmetric model, explicit expressions for the shape parameters are
obtained, a simple algorithm for generating random numbers is provided, and
asymptotic results for the maximum likelihood estimators are established. An
important feature of our construction is that no normalizing constant needs to
be calculated, leading to more flexible distributions without increasing the
complexity of the models. The benefit of employing these new sine-skewed
distributions is shown on the basis of protein data, where, in general, the new
models outperform their symmetric antecedents.
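The random-number generation mentioned in this abstract can be sketched as follows, assuming the sine-skewed density takes the form f(θ)(1 + Σ λ_s sin(θ_s − μ_s)) with |λ_1| + |λ_2| ≤ 1, and taking a product of two independent von Mises densities as the symmetric base f. Both the density form and the choice of base are assumptions made for this example, and the function name is hypothetical. Because the skewing factor integrates to zero against a symmetric base, no new normalizing constant appears.

```python
import math
import random

def sample_sine_skewed_torus(n, mu, kappa, lam, seed=0):
    """Sample n angle pairs from a sine-skewed product of von Mises laws.

    mu, kappa and lam are length-2 sequences; |lam[0]| + |lam[1]| must be <= 1.
    """
    if abs(lam[0]) + abs(lam[1]) > 1.0:
        raise ValueError("skewness parameters must satisfy |lam1| + |lam2| <= 1")
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        # Step 1: draw from the symmetric base density (independent von Mises).
        y = [rng.vonmisesvariate(mu[s], kappa[s]) for s in range(2)]
        # Step 2: keep y with probability (1 + sum_s lam_s sin(y_s - mu_s)) / 2,
        # otherwise reflect it through the centre mu.
        keep_prob = 0.5 * (1.0 + sum(lam[s] * math.sin(y[s] - mu[s]) for s in range(2)))
        if rng.random() > keep_prob:
            y = [2.0 * mu[s] - y[s] for s in range(2)]
        # Wrap both angles back onto the torus.
        draws.append(tuple(a % (2.0 * math.pi) for a in y))
    return draws

draws = sample_sine_skewed_torus(20000, mu=(1.0, 2.0), kappa=(1.0, 1.0), lam=(0.5, -0.3))
```

The reflection step only needs draws from the symmetric base and one uniform variate per observation, so skewness is added at essentially no extra computational cost.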
Enhancing wind direction prediction of South Africa wind energy hotspots with Bayesian mixture modeling
Wind energy production depends not only on wind speed but also on wind direction. Thus, predicting
and estimating the wind direction for sites accurately will enhance measuring the wind energy
potential. The uncertain nature of wind direction can be presented through probability distributions
and Bayesian analysis can improve the modeling of the wind direction using the contribution of the
prior knowledge to update the empirical evidence. This must align with the nature of the
empirical evidence, that is, whether the data are skewed, multimodal, or neither. So far, mixtures of von
Mises distributions have been used within the directional statistics domain to model wind direction and capture the
multimodality present in the data. In this paper, because of the skewed and multimodal patterns
of wind direction at different sites of the locations under study, a mixture of multimodal skewed
von Mises distributions is proposed for wind direction. Furthermore, a Bayesian analysis is presented to take
into account the uncertainty inherent in the proposed wind direction model. A simulation study is
conducted to evaluate the performance of the proposed Bayesian model. This proposed model is
fitted to wind direction datasets from Marion Island and two wind farms in South Africa, and the results show
the superiority of the approach. The posterior predictive distribution is applied to forecast the wind
direction on a wind farm. It is concluded that the proposed model offers an accurate prediction by
means of credible intervals. The mean wind direction on Marion Island in 2017, obtained from 1079
observations, was 5.0242 radians, while under our proposed method the predicted mean wind
direction and its corresponding 95% credible interval, based on 100 generated samples from the
posterior predictive distribution, are 5.0171 and (4.7442, 5.2900), respectively. Therefore, our results
open a new approach for accurate prediction of wind direction implementing a Bayesian approach via
a mixture of skew circular distributions. https://www.nature.com/srep
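The abstract reports mean wind directions in radians but does not say how they were computed. Assuming the usual circular mean direction is meant, a minimal sketch is given below; the function name and the example angles are illustrative and are not taken from the paper's data.

```python
import math

def circular_mean(angles):
    """Mean direction in [0, 2*pi) of angles given in radians."""
    # Average the unit vectors (cos a, sin a) and take the angle of the resultant.
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c) % (2.0 * math.pi)

# Two directions on either side of north: 340 and 40 degrees.
example = circular_mean([math.radians(340.0), math.radians(40.0)])
```

For these two angles the arithmetic mean would be 190 degrees, pointing almost opposite to both observations, whereas the circular mean is 10 degrees. This is why directional summaries are computed from the resultant vector rather than from the raw angles.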
Statistical methods for random rotations
The analysis of orientation data is a growing field in statistics. Though the rotationally symmetric location model for orientation data is simple, statistical methods for estimation and inference for the location parameter S are limited. In this dissertation we develop point estimation and confidence region methods for the central orientation.
Both extrinsic and intrinsic approaches to estimating the central orientation S have been proposed in the literature, but no rigorous comparison of the approaches is available. In Chapter 2 we consider both intrinsic and extrinsic estimators of the central orientation and compare their statistical properties in a simulation study. In particular we consider the projected mean, geometric mean and geometric median. In addition we introduce the projected median as a novel robust estimator of the location parameter. The results of a simulation study suggest the projected median is the preferred estimator because of its low bias and mean square error.
Non-parametric confidence regions for the central orientation have been proposed in the literature, but they have undesirable coverage rates for small samples. In Chapter 3 we propose a nonparametric pivotal bootstrap to calibrate confidence regions for the central orientation. We demonstrate the benefits of using calibrated confidence regions in a simulation study and prove the proposed bootstrap method is consistent.
Robust statistical methods for estimating the central orientation have received very little attention. In Chapter 4 we explore the finite sample and asymptotic properties of the projected median. In particular we derive the asymptotic distribution of the projected median and show it is SB-robust for the Cayley and matrix Fisher distributions. Confidence regions for the central orientation S are proposed, which can be shown to have preferable finite sample coverage rates compared to those based on the projected mean.
Finally, the rotations package, which contains functions for the statistical analysis of rotation data in SO(3), is developed in Chapter 5.
Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain
The present paper explores the technical efficiency of four hotels of the Teixeira Duarte Group, a renowned Portuguese hotel chain. An efficiency ranking of these four hotel units located in Portugal is established using Stochastic Frontier Analysis. This methodology makes it possible to distinguish measurement error from systematic inefficiency in the estimation process and thus to investigate the main causes of inefficiency. Several suggestions for improving efficiency are made for each hotel studied.
A survey of statistical network models
Networks are ubiquitous in science and have become a focal point for
discussion in everyday life. Formal statistical models for the analysis of
network data have emerged as a major topic of interest in diverse areas of
study, and most of these involve a form of graphical representation.
Probability models on graphs date back to 1959. Along with empirical studies in
social psychology and sociology from the 1960s, these early works generated an
active network community and a substantial literature in the 1970s. This effort
moved into the statistical literature in the late 1970s and 1980s, and the past
decade has seen a burgeoning network literature in statistical physics and
computer science. The growth of the World Wide Web and the emergence of online
networking communities such as Facebook, MySpace, and LinkedIn, and a host of
more specialized professional network communities has intensified interest in
the study of networks and network data. Our goal in this review is to provide
the reader with an entry point to this burgeoning literature. We begin with an
overview of the historical development of statistical network modeling and then
we introduce a number of examples that have been studied in the network
literature. Our subsequent discussion focuses on a number of prominent static
and dynamic network models and their interconnections. We emphasize formal
model descriptions, and pay special attention to the interpretation of
parameters and their estimation. We end with a description of some open
problems and challenges for machine learning and statistics. Comment: 96 pages, 14 figures, 333 references
Nonparametric inference with directional and linear data
The term directional data refers to data whose support is a circle, a sphere or, generally,
a hypersphere of arbitrary dimension. This kind of data appears naturally in several
applied disciplines: proteomics, environmental sciences, biology, astronomy, image analysis or
text mining. The aim of this thesis is to provide new methodological tools for nonparametric
inference with directional and linear data (i.e., usual Euclidean data). Nonparametric methods
are obtained for both estimation and testing, for the density and the regression curves, in situations
where directional random variables are present, that is, directional, directional-linear and
directional-directional random variables. The main contributions of the thesis are collected in
six papers briefly described in what follows.
In García-Portugués et al. (2013a) different ways of estimating circular-linear and circular-circular
densities via copulas are explored for an environmental application. A new directional-linear
kernel density estimator is introduced in García-Portugués et al. (2013b) together with
its basic properties. Three new bandwidth selectors for the kernel density estimator with directional
data are given in García-Portugués (2013) and compared with the available ones. The
directional-linear estimator is used in García-Portugués et al. (2014a) for constructing an independence
test for directional and linear variables that is applied to study the dependence
between wildfire orientation and size. In García-Portugués et al. (2014b) a central limit theorem
for the integrated squared error of the directional-linear estimator is presented. This result is
used to derive the asymptotic distribution of the independence test and of a goodness-of-fit test
for parametric directional-linear and directional-directional densities. Finally, a local linear estimator
with directional predictor and linear response is given in García-Portugués et al. (2014)
jointly with a goodness-of-fit test for parametric regression functions.
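To make the notion of a kernel density estimator with directional data concrete, the sketch below treats the simplest case of data on the circle with a von Mises kernel, where a bandwidth h corresponds to a concentration of 1/h². This is a minimal illustration under those assumptions, not the thesis's directional-linear estimator or its bandwidth selectors; the function names and sample values are hypothetical.

```python
import math

def bessel_i0(x):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    total, term, k = 1.0, 1.0, 0
    while term > 1e-16 * total:
        k += 1
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total

def circular_kde(theta, sample, h):
    """Kernel density estimate at angle theta (radians) with a von Mises kernel.

    Assumes a moderate bandwidth: exp(1 / h**2) must not overflow.
    """
    conc = 1.0 / h ** 2
    norm = 2.0 * math.pi * bessel_i0(conc) * len(sample)
    return sum(math.exp(conc * math.cos(theta - t)) for t in sample) / norm

# Hypothetical sample of four angles and a bandwidth of 0.5.
sample = [0.1, 0.5, 3.0, 6.0]
grid = [2.0 * math.pi * i / 2000 for i in range(2000)]
total_mass = sum(circular_kde(g, sample, 0.5) for g in grid) * (2.0 * math.pi / 2000)
```

The estimate is periodic by construction, so observations near 0 and near 2π reinforce each other, which a Euclidean kernel estimator applied to the raw angles would miss.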