Recent advances in directional statistics
Mainstream statistical methodology is generally applicable to data observed
in Euclidean space. There are, however, numerous contexts of considerable
scientific interest in which the natural supports for the data under
consideration are Riemannian manifolds like the unit circle, torus, sphere and
their extensions. Typically, such data can be represented using one or more
directions, and directional statistics is the branch of statistics that deals
with their analysis. In this paper we provide a review of the many recent
developments in the field since the publication of Mardia and Jupp (1999),
still the most comprehensive text on directional statistics. Many of those
developments have been stimulated by interesting applications in fields as
diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics,
image analysis, text mining, environmetrics, and machine learning. We begin by
considering developments for the exploratory analysis of directional data
before progressing to distributional models, general approaches to inference,
hypothesis testing, regression, nonparametric curve estimation, methods for
dimension reduction, classification and clustering, and the modelling of time
series, spatial and spatio-temporal data. An overview of currently available
software for analysing directional data is also provided, and potential future
developments discussed.
Comment: 61 pages
Nonparametric inference with directional and linear data
The term directional data refers to data whose support is a circumference, a sphere or, generally,
a hypersphere of arbitrary dimension. This kind of data appears naturally in several
applied disciplines: proteomics, environmental sciences, biology, astronomy, image analysis or
text mining. The aim of this thesis is to provide new methodological tools for nonparametric
inference with directional and linear data (i.e., usual Euclidean data). Nonparametric methods
are obtained for both estimation and testing, for the density and the regression curves, in situations
where directional random variables are present, that is, directional, directional-linear and
directional-directional random variables. The main contributions of the thesis are collected in
six papers briefly described in what follows.
In García-Portugués et al. (2013a) different ways of estimating circular-linear and circular-circular
densities via copulas are explored for an environmental application. A new directional-linear
kernel density estimator is introduced in García-Portugués et al. (2013b) together with
its basic properties. Three new bandwidth selectors for the kernel density estimator with directional
data are given in García-Portugués (2013) and compared with the available ones. The
directional-linear estimator is used in García-Portugués et al. (2014a) for constructing an independence
test for directional and linear variables that is applied to study the dependence
between wildfire orientation and size. In García-Portugués et al. (2014b) a central limit theorem
for the integrated squared error of the directional-linear estimator is presented. This result is
used to derive the asymptotic distribution of the independence test and of a goodness-of-fit test
for parametric directional-linear and directional-directional densities. Finally, a local linear estimator
with directional predictor and linear response is given in García-Portugués et al. (2014)
jointly with a goodness-of-fit test for parametric regression functions.
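As a rough illustration of kernel density estimation with directional data, the sketch below handles only the simplest circular case, using a von Mises kernel whose concentration `kappa` plays the role of an inverse bandwidth. It is not the directional-linear estimator or any of the bandwidth selectors from the thesis; the function name `vm_kde`, the fixed value of `kappa`, and the simulated sample are all assumptions made for illustration.

```python
import numpy as np

def vm_kde(theta, sample, kappa):
    """Kernel density estimate on the circle with a von Mises kernel.

    theta  : angles (radians) at which to evaluate the estimate
    sample : observed angles (radians)
    kappa  : kernel concentration (larger means less smoothing)
    """
    theta = np.atleast_1d(theta)[:, None]      # shape (m, 1)
    sample = np.asarray(sample)[None, :]       # shape (1, n)
    # von Mises kernel: exp(kappa * cos(theta - x)) / (2 * pi * I0(kappa))
    kernel = np.exp(kappa * np.cos(theta - sample)) / (2 * np.pi * np.i0(kappa))
    return kernel.mean(axis=1)                 # average over the sample

# Illustrative data only: simulated angles, not data from the thesis.
rng = np.random.default_rng(0)
sample = rng.vonmises(mu=0.5, kappa=4.0, size=200)
grid = np.linspace(-np.pi, np.pi, 400, endpoint=False)
density = vm_kde(grid, sample, kappa=10.0)
integral = density.sum() * (2 * np.pi / grid.size)  # Riemann sum over the circle
```

Because each kernel is itself a density on the circle, the estimate should integrate to approximately one. For very large `kappa` the exponential can overflow, so a production version would work on the log scale.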
A Framework for Statistical Modeling of Wind Speed and Wind Direction
Atmospheric near-surface wind speed and wind direction play an important role in many applications, ranging from air quality modeling, building design, and wind turbine placement to climate change research. It is therefore crucial to accurately estimate the joint probability distribution of wind speed and direction. This dissertation aims to provide a modeling framework for studying the variation of wind speed and wind direction. To this end, three projects are conducted to address some of the key issues for modeling wind vectors.
First, a conditional decomposition approach is developed to model the joint distribution of wind speed and direction. Specifically, the joint distribution is decomposed into the product of the marginal distribution of wind direction and the conditional distribution of wind speed given wind direction. A von Mises mixture model is used to accommodate the circular nature of wind direction. The conditional wind speed distribution is modeled as a direction-dependent Weibull distribution via a two-stage estimation procedure, consisting of a directionally binned Weibull parameter estimation, followed by a harmonic regression to estimate the functional dependence of the Weibull parameters on wind direction. The conditional decomposition approach allows the modeling of complex distributions with relatively simple and flexible univariate models. Moreover, by studying the variations of wind speed with respect to wind direction, we gain valuable insights that would be overlooked if we focused on wind speed alone. These insights have significant implications for a wide range of applications involving wind data. This conditional modeling framework is further extended to investigate the potential enhancement of estimating extreme wind speeds. Specifically, parametric extreme value modeling approaches, including block maxima, peaks-over-thresholds, and point process methods, are utilized to model the upper tail of the conditional wind speed distribution. The purpose of this extension is to avoid misspecification issues associated with the Weibull model and to improve estimation efficiency. Simulation studies, analysis of output from climate model simulation, and model comparisons are discussed.
A key feature of the wind field data is its complicated temporal and spatial structure. Therefore, the final goal of this dissertation involves the spatio-temporal modeling of wind speed. The proposed model captures the seasonal variation and temporal and spatial variability by decomposing the wind speed process into the "global structure" of the spatio-temporal mean component, the "local structure" that consists of a combination of time-varying empirical orthogonal functions (EOFs), and a first-order dynamical spatial Gaussian process (GP). A crucial element of the proposed decomposition is leveraging the inherent circularity of the annual seasonal cycle to create effective replications in time. This enables us to employ more flexible nonstationary space-time modeling through EOF analysis and enhance computation efficiency using dynamical GPs.
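The two-stage procedure described in this abstract (binned Weibull estimation followed by harmonic regression) can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the dissertation's implementation: the helper names `weibull_mle`, `harmonic_design`, and `two_stage_fit`, the number of bins, the harmonic order, the bisection bounds, and the simulated data are all hypothetical choices. It also assumes every direction bin contains observations.

```python
import numpy as np

def weibull_mle(x, lo=0.05, hi=50.0, iters=80):
    """Weibull (shape, scale) MLE; the shape solves the profile score by bisection."""
    x = np.asarray(x, dtype=float)
    logx = np.log(x)

    def score(k):
        xk = x ** k
        return (xk * logx).sum() / xk.sum() - 1.0 / k - logx.mean()

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:      # score is increasing in k
            hi = mid
        else:
            lo = mid
    k = 0.5 * (lo + hi)
    return k, np.mean(x ** k) ** (1.0 / k)

def harmonic_design(theta, order=2):
    """Design matrix with columns 1, cos(j*theta), sin(j*theta), j = 1..order."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    cols = [np.ones_like(theta)]
    for j in range(1, order + 1):
        cols += [np.cos(j * theta), np.sin(j * theta)]
    return np.column_stack(cols)

def two_stage_fit(direction, speed, n_bins=12, order=2):
    """Stage 1: Weibull fit per direction bin. Stage 2: harmonic regression."""
    edges = np.linspace(0.0, 2 * np.pi, n_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    idx = np.minimum(np.digitize(direction % (2 * np.pi), edges) - 1, n_bins - 1)
    params = np.array([weibull_mle(speed[idx == b]) for b in range(n_bins)])
    coef, *_ = np.linalg.lstsq(harmonic_design(centers, order), params, rcond=None)
    return coef                 # column 0: shape, column 1: scale

# Simulated data for illustration only: shape 2, scale 8 + 2*cos(direction).
rng = np.random.default_rng(1)
direction = rng.uniform(0.0, 2 * np.pi, size=6000)
speed = (8.0 + 2.0 * np.cos(direction)) * rng.weibull(2.0, size=6000)
coef = two_stage_fit(direction, speed)
scale_at_zero = (harmonic_design(0.0) @ coef[:, 1])[0]
```

Regressing the binned estimates on a few harmonics keeps the fitted Weibull parameters periodic in direction, which is the property the abstract's harmonic regression step relies on.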
Directional naive Bayes classifiers
Directional data are ubiquitous in science.
These data have some special properties that rule out the
use of classical statistics. Therefore, different distributions
and statistics, such as the univariate von Mises and the
multivariate von Mises–Fisher distributions, should be
used to deal with this kind of information. We extend the
naive Bayes classifier to the case where the conditional
probability distributions of the predictive variables follow
either of these distributions. We consider the simple scenario,
where only directional predictive variables are used,
and the hybrid case, where discrete, Gaussian and directional
distributions are mixed. The classifier decision
functions and their decision surfaces are studied at length.
Artificial examples are used to illustrate the behavior of the
classifiers. The proposed classifiers are then evaluated over
eight datasets, showing competitive performances against
other naive Bayes classifiers that use Gaussian distributions
or discretization to manage directional data.
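To make the idea concrete, here is a minimal sketch of a naive Bayes classifier whose predictors are angles with class-conditional univariate von Mises distributions. It is an illustration under assumptions, not the paper's classifier: the class name `VonMisesNB` is hypothetical, the concentration is estimated with the common approximation R(2 - R^2)/(1 - R^2) rather than exact maximum likelihood, and the hybrid and von Mises–Fisher cases are not covered.

```python
import numpy as np

class VonMisesNB:
    """Naive Bayes with independent von Mises predictors (angles in radians)."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        mu, kappa, log_prior = [], [], []
        for c in self.classes_:
            Xc = X[y == c]
            C, S = np.cos(Xc).mean(axis=0), np.sin(Xc).mean(axis=0)
            R = np.hypot(C, S)                        # mean resultant length
            mu.append(np.arctan2(S, C))               # mean direction
            # Approximate concentration estimate (assumption, not exact MLE);
            # undefined when R == 1, e.g. a class with identical angles.
            kappa.append(R * (2 - R**2) / (1 - R**2))
            log_prior.append(np.log(len(Xc) / len(X)))
        self.mu_, self.kappa_ = np.array(mu), np.array(kappa)
        self.log_prior_ = np.array(log_prior)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)[:, None, :]    # shape (n, 1, d)
        # log von Mises density per sample, class, and feature
        loglik = (self.kappa_ * np.cos(X - self.mu_)
                  - np.log(2 * np.pi * np.i0(self.kappa_)))
        scores = self.log_prior_ + loglik.sum(axis=2)  # shape (n, n_classes)
        return self.classes_[scores.argmax(axis=1)]

# Simulated two-class example for illustration only.
rng = np.random.default_rng(2)
X = np.vstack([rng.vonmises(0.0, 5.0, size=(300, 2)),
               rng.vonmises(np.pi, 5.0, size=(300, 2))])
y = np.repeat([0, 1], 300)
accuracy = (VonMisesNB().fit(X, y).predict(X) == y).mean()
```

Unlike a Gaussian naive Bayes applied to raw angles, this treats angles near -π and π as close together, which is the motivation the abstract gives for using directional distributions.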
Untangling hotel industry's inefficiency: An SFA approach applied to a renowned Portuguese hotel chain
The present paper explores the technical efficiency of four hotels from Teixeira Duarte Group, a renowned Portuguese hotel chain. An efficiency ranking is established for these four hotel units located in Portugal using Stochastic Frontier Analysis. This methodology makes it possible to discriminate between measurement error and systematic inefficiency in the estimation process, which in turn allows the main causes of inefficiency to be investigated. Several suggestions for improving efficiency are made for each hotel studied.
Seventh International Workshop on Simulation, 21-25 May, 2013, Department of Statistical Sciences, Unit of Rimini, University of Bologna, Italy. Book of Abstracts
A comparison of the CAR and DAGAR spatial random effects models with an application to diabetics rate estimation in Belgium
When hierarchically modelling an epidemiological phenomenon on a finite collection of sites in space, one must always take a latent spatial effect into account in order to capture the correlation structure that links the phenomenon to the territory. In this work, we compare two autoregressive spatial models that can be used for this purpose: the classical CAR model and the more recent DAGAR model. Unlike the former, the latter has a desirable property: its ρ parameter can be naturally interpreted as the average neighbor pair correlation and, in addition, this parameter can be directly estimated when the effect is modelled using a DAGAR rather than a CAR structure. As an application, we model the diabetics rate in Belgium in 2014 and show the adequacy of these models in predicting the response variable when no covariates are available.
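The remark about interpreting ρ can be explored numerically for the CAR side of the comparison. The sketch below assumes the common proper CAR parameterisation with precision matrix τ(D − ρW), where W is a binary adjacency matrix and D holds the neighbor counts; the abstract does not state which parameterisation the paper uses. The function names, the five-site path graph, and the value of ρ are hypothetical, and the DAGAR model is not implemented here.

```python
import numpy as np

def car_precision(W, rho, tau=1.0):
    """Precision matrix tau * (D - rho * W) of a proper CAR model (assumed form)."""
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))          # number of neighbors on the diagonal
    return tau * (D - rho * W)

def mean_neighbor_correlation(Q, W):
    """Average implied correlation over all neighboring pairs."""
    cov = np.linalg.inv(Q)
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)
    return corr[np.asarray(W) > 0].mean()

# Illustrative graph only: five sites on a line, each linked to the next.
n = 5
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

rho = 0.9
Q = car_precision(W, rho)
avg_corr = mean_neighbor_correlation(Q, W)
```

Comparing `avg_corr` with `rho` for different graphs shows how far the CAR parameter is from the average neighbor pair correlation in a given setting.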