    Recent advances in directional statistics

    Mainstream statistical methodology is generally applicable to data observed in Euclidean space. There are, however, numerous contexts of considerable scientific interest in which the natural supports for the data under consideration are Riemannian manifolds like the unit circle, torus, sphere and their extensions. Typically, such data can be represented using one or more directions, and directional statistics is the branch of statistics that deals with their analysis. In this paper we provide a review of the many recent developments in the field since the publication of Mardia and Jupp (1999), still the most comprehensive text on directional statistics. Many of those developments have been stimulated by interesting applications in fields as diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics, image analysis, text mining, environmetrics, and machine learning. We begin by considering developments for the exploratory analysis of directional data before progressing to distributional models, general approaches to inference, hypothesis testing, regression, nonparametric curve estimation, methods for dimension reduction, classification and clustering, and the modelling of time series, spatial and spatio-temporal data. An overview of currently available software for analysing directional data is also provided, and potential future developments discussed. Comment: 61 pages

    Scalable Inference of Customer Similarities from Interactions Data using Dirichlet Processes

    Under the sociological theory of homophily, people who are similar to one another are more likely to interact with one another. Marketers often have access to data on interactions among customers from which, with homophily as a guiding principle, inferences could be made about the underlying similarities. However, larger networks face a quadratic explosion in the number of potential interactions that need to be modeled. This scalability problem renders probability models of social interactions computationally infeasible for all but the smallest networks. In this paper we develop a probabilistic framework for modeling customer interactions that is both grounded in the theory of homophily and flexible enough to account for random variation in who interacts with whom. In particular, we present a novel Bayesian nonparametric approach, using Dirichlet processes, to moderate the scalability problems that marketing researchers encounter when working with networked data. We find that this framework is a powerful way to draw insights into latent similarities of customers, and we discuss how marketers can apply these insights to segmentation and targeting activities.

    Harold Jeffreys's Theory of Probability Revisited

    Published exactly seventy years ago, Jeffreys's Theory of Probability (1939) has had a unique impact on the Bayesian community and is now considered to be one of the main classics in Bayesian Statistics as well as the initiator of the objective Bayes school. In particular, its advances on the derivation of noninformative priors as well as on the scaling of Bayes factors have had a lasting impact on the field. However, the book reflects the characteristics of the time, especially in terms of mathematical rigor. In this paper we point out the fundamental aspects of this reference work, especially the thorough coverage of testing problems and the construction of both estimation and testing noninformative priors based on functional divergences. Our major aim here is to help modern readers navigate this difficult text and concentrate on passages that are still relevant today. Comment: This paper is commented on in [arXiv:1001.2967], [arXiv:1001.2968], [arXiv:1001.2970], [arXiv:1001.2975], [arXiv:1001.2985], [arXiv:1001.3073]. Rejoinder in [arXiv:0909.1008]. Published at http://dx.doi.org/10.1214/09-STS284 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    CLADAG 2021 BOOK OF ABSTRACTS AND SHORT PAPERS

    The book collects the short papers presented at the 13th Scientific Meeting of the Classification and Data Analysis Group (CLADAG) of the Italian Statistical Society (SIS). The meeting was organized by the Department of Statistics, Computer Science and Applications of the University of Florence, under the auspices of the Italian Statistical Society and the International Federation of Classification Societies (IFCS). CLADAG is a member of the IFCS, a federation of national, regional, and linguistically-based classification societies. It is a non-profit, non-political scientific organization, whose aims are to further classification research.

    Finite Mixture Models based on Scale Mixtures of Skew-Normal distributions applied to serological data

    Serological data can be described as a mixture of distributions, with each mixture component representing a serological population (e.g. the seronegative and seropositive populations). In seroepidemiological studies of infectious diseases, mixture models with Normal components are the most commonly used, which implies that the components that make up the mixture are approximately symmetric. However, skewness to the left has been observed, especially in seropositive populations, which violates the normality assumption underlying the data. To capture possible skewness in serological data, the family of Scale Mixtures of Skew-Normal (SMSN) distributions is used, of which the Skew-Normal and Skew-t distributions are particular cases. The Skew-t distribution, being heavy-tailed, also allows possible outliers to be captured. In addition to the models used to describe the behavior of the serological data, the issue of estimating the cutoff point for classifying an individual as seropositive is explored. Two perspectives on the problem are presented: one in which the true state of the disease is unknown, and another in which this state is known a priori. The widespread use of a cutoff point without statistical methodology to support its estimation may affect the estimated seroprevalence of a population, that is, the proportion of seropositive individuals. Thus, three methods based on mixture models are proposed in this work for estimating the cutoff point when the true infection status is unknown.

    An Introduction to Inductive Statistical Inference: from Parameter Estimation to Decision-Making

    These lecture notes aim at a post-Bachelor audience with a background at an introductory level in Applied Mathematics and Applied Statistics. They discuss the logic and methodology of the Bayes-Laplace approach to inductive statistical inference that places common sense and the guiding lines of the scientific method at the heart of systematic analyses of quantitative-empirical data. Following an exposition of exactly solvable cases of single- and two-parameter estimation problems, the main focus is laid on Markov Chain Monte Carlo (MCMC) simulations on the basis of Hamiltonian Monte Carlo sampling of posterior joint probability distributions for regression parameters occurring in generalised linear models for a univariate outcome variable. The modelling of fixed effects as well as of correlated varying effects via multi-level models in non-centred parametrisation is considered. The simulation of posterior predictive distributions is outlined. The assessment of a model's relative out-of-sample posterior predictive accuracy with information entropy-based criteria WAIC and LOOIC and model comparison with Bayes factors are addressed. Concluding, a conceptual link to the behavioural subjective expected utility representation of a single decision-maker's choice behaviour in static one-shot decision problems is established. Vectorised codes for MCMC simulations of multi-dimensional posterior joint probability distributions with the Stan probabilistic programming language implemented in the statistical software R are provided. The lecture notes are fully hyperlinked. They direct the reader to original scientific research papers, online resources on inductive statistical inference, and to pertinent biographical information. Comment: 161 pages, 22 *.eps figures, LaTeX2e, hyperlinked references. First thorough revision, extended list of references

    Probabilistic models for structured sparsity
