4 research outputs found

    Recent advances in directional statistics

    Mainstream statistical methodology is generally applicable to data observed in Euclidean space. There are, however, numerous contexts of considerable scientific interest in which the natural supports for the data under consideration are Riemannian manifolds like the unit circle, torus, sphere and their extensions. Typically, such data can be represented using one or more directions, and directional statistics is the branch of statistics that deals with their analysis. In this paper we provide a review of the many recent developments in the field since the publication of Mardia and Jupp (1999), still the most comprehensive text on directional statistics. Many of those developments have been stimulated by interesting applications in fields as diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics, image analysis, text mining, environmetrics, and machine learning. We begin by considering developments for the exploratory analysis of directional data before progressing to distributional models, general approaches to inference, hypothesis testing, regression, nonparametric curve estimation, methods for dimension reduction, classification and clustering, and the modelling of time series, spatial and spatio-temporal data. An overview of currently available software for analysing directional data is also provided, and potential future developments are discussed. Comment: 61 pages
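    The exploratory analysis the review opens with already shows why Euclidean methods break down on the circle: the arithmetic mean of 350° and 10° is 180°, exactly the wrong direction. A minimal illustrative sketch (not taken from the paper) of the standard mean-direction construction via the resultant vector:

    ```python
    import numpy as np

    def circular_mean(angles_rad):
        """Mean direction of angles (radians) via the resultant vector.

        Summing the unit vectors (cos t, sin t) and taking the angle of
        the resultant avoids the wrap-around failure of the ordinary mean.
        """
        s, c = np.sin(angles_rad).sum(), np.cos(angles_rad).sum()
        return np.arctan2(s, c)

    angles = np.deg2rad([350.0, 10.0])
    print(np.rad2deg(circular_mean(angles)))  # approximately 0, not 180
    ```

    The same resultant vector also yields the mean resultant length, the basic circular measure of concentration used throughout exploratory directional analysis.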

    Directional naive Bayes classifiers

    Directional data are ubiquitous in science. These data have special properties that rule out the use of classical statistics. Therefore, different distributions and statistics, such as the univariate von Mises and the multivariate von Mises–Fisher distributions, should be used to deal with this kind of information. We extend the naive Bayes classifier to the case where the conditional probability distributions of the predictive variables follow either of these distributions. We consider the simple scenario, where only directional predictive variables are used, and the hybrid case, where discrete, Gaussian and directional distributions are mixed. The classifier decision functions and their decision surfaces are studied at length. Artificial examples are used to illustrate the behavior of the classifiers. The proposed classifiers are then evaluated over eight datasets, showing competitive performance against other naive Bayes classifiers that use Gaussian distributions or discretization to manage directional data.
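    With a single circular predictor, the decision rule described above reduces to comparing prior-weighted von Mises densities across classes. A hedged sketch, with made-up class parameters (means, concentrations and priors are illustrative, not from the paper):

    ```python
    import numpy as np
    from scipy.stats import vonmises

    # Hypothetical two-class problem with one angular feature (radians).
    # Each class-conditional density is von Mises with mean mu and
    # concentration kappa; priors are equal.
    classes = {
        "north": {"mu": 0.0, "kappa": 4.0, "prior": 0.5},
        "south": {"mu": np.pi, "kappa": 4.0, "prior": 0.5},
    }

    def predict(theta):
        """Naive Bayes decision: argmax over p(c) * f_vM(theta | mu_c, kappa_c)."""
        scores = {c: p["prior"] * vonmises.pdf(theta, p["kappa"], loc=p["mu"])
                  for c, p in classes.items()}
        return max(scores, key=scores.get)

    print(predict(0.3))  # angle near 0 rad
    print(predict(3.0))  # angle near pi rad
    ```

    The hybrid case in the abstract extends this in the usual naive Bayes way: under conditional independence, each class score is the prior times a product of per-feature densities, circular or not.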

    Topics in computational statistics

    Research in computational statistics develops numerically efficient methods to estimate statistical models, with Monte Carlo algorithms forming a subset of such methods. This thesis develops novel Monte Carlo methods to solve three important problems in Bayesian statistics. For many complex models, it is prohibitively expensive to run simulation methods such as Markov chain Monte Carlo (MCMC) on the model directly when the likelihood function includes an intractable term or is computationally challenging in some other way. The first two topics investigate models having such likelihoods. The third topic proposes a novel model to address a popular question in causal inference, which requires solving a computationally challenging problem. The first application is to symbolic data analysis, where classical data are summarised and represented as symbolic objects. The likelihood function of such aggregated data is often intractable, as it usually includes a high-dimensional integral with large exponents. Bayesian inference on symbolic data is carried out in the thesis by using a pseudo-marginal method, which replaces the likelihood function with an unbiased estimate of it. The second application is to doubly intractable models, where the likelihood includes an intractable normalising constant. The pseudo-marginal method is combined with the introduction of an auxiliary variable to obtain simulation-consistent inference. The proposed algorithm offers a generic solution to a wider range of problems where existing methods are often impractical because the assumptions required for their application do not hold. The last application is to causal inference using Bayesian additive regression trees (BART), a non-parametric Bayesian regression technique. The likelihood function is complex, as it is based on a sum of trees whose structures change dynamically across MCMC iterations. An extension to BART is developed to estimate heterogeneous treatment effects, aiming to overcome the regularisation-induced confounding that is often observed when BART is applied directly in causal inference.
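    The pseudo-marginal idea used in the first two applications can be sketched on a toy problem: replace the exact likelihood in the Metropolis–Hastings ratio with a noisy but unbiased estimate, re-estimate only at the proposal, and carry the current estimate along with the state. The Gaussian model and the artificial unit-mean noise below are hypothetical stand-ins for the intractable likelihood estimators the thesis actually studies:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(1.0, 1.0, size=50)  # toy dataset (hypothetical)

    def log_lik_hat(mu):
        """Noisy but unbiased likelihood estimate (up to a constant).

        The exact Gaussian log-likelihood is perturbed by log W with
        W ~ Exp(1), so E[W] = 1 and the estimate is unbiased on the
        likelihood scale -- the property pseudo-marginal MCMC needs.
        """
        exact = -0.5 * np.sum((data - mu) ** 2)
        return exact + np.log(rng.exponential(1.0))

    def pseudo_marginal_mh(n_iter=5000, step=0.3):
        mu, ll = 0.0, log_lik_hat(0.0)
        chain = []
        for _ in range(n_iter):
            prop = mu + step * rng.normal()
            ll_prop = log_lik_hat(prop)  # re-estimate ONLY at the proposal
            # Flat prior; usual MH accept/reject on the *estimated* ratio.
            if np.log(rng.uniform()) < ll_prop - ll:
                mu, ll = prop, ll_prop   # keep the estimate with the state
            chain.append(mu)
        return np.array(chain)

    chain = pseudo_marginal_mh()
    print(chain[1000:].mean())  # should sit near the sample mean of `data`
    ```

    Because the estimator is unbiased and the current estimate is recycled rather than refreshed, the chain still targets the exact posterior; the cost of a noisier estimator is stickier mixing, not bias.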