Recent advances in directional statistics
Mainstream statistical methodology is generally applicable to data observed
in Euclidean space. There are, however, numerous contexts of considerable
scientific interest in which the natural supports for the data under
consideration are Riemannian manifolds like the unit circle, torus, sphere and
their extensions. Typically, such data can be represented using one or more
directions, and directional statistics is the branch of statistics that deals
with their analysis. In this paper we provide a review of the many recent
developments in the field since the publication of Mardia and Jupp (1999),
still the most comprehensive text on directional statistics. Many of those
developments have been stimulated by interesting applications in fields as
diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics,
image analysis, text mining, environmetrics, and machine learning. We begin by
considering developments for the exploratory analysis of directional data
before progressing to distributional models, general approaches to inference,
hypothesis testing, regression, nonparametric curve estimation, methods for
dimension reduction, classification and clustering, and the modelling of time
series, spatial and spatio-temporal data. An overview of currently available
software for analysing directional data is also provided, and potential future
developments discussed.
Comment: 61 pages
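To illustrate why Euclidean summaries can mislead on the circle, here is a minimal sketch of the standard circular mean, computed from the resultant of unit vectors. The function names and the example angles are illustrative assumptions and are not taken from the paper.

```python
import math


def circular_mean_deg(angles_deg):
    """Mean direction in degrees, in [0, 360), of angles given in degrees."""
    # Represent each angle as a unit vector and sum the components.
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    # The direction of the resultant vector is the circular mean.
    return math.degrees(math.atan2(s, c)) % 360.0


def mean_resultant_length(angles_deg):
    """Length of the mean resultant vector, in [0, 1]; near 1 means concentrated."""
    n = len(angles_deg)
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    return math.hypot(s, c)


# Hypothetical example: two directions on either side of north (0 degrees).
angles = [350.0, 20.0]
arithmetic = sum(angles) / len(angles)   # points almost due south
circular = circular_mean_deg(angles)     # points close to north
concentration = mean_resultant_length(angles)
```

The arithmetic mean treats 350 and 20 as far apart on a line, whereas the circular mean respects that they are only 30 degrees apart on the circle.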
Scalable Inference of Customer Similarities from Interactions Data using Dirichlet Processes
Under the sociological theory of homophily, people who are similar to one
another are more likely to interact with one another. Marketers often have
access to data on interactions among customers from which, with homophily as a
guiding principle, inferences could be made about the underlying similarities.
However, larger networks face a quadratic explosion in the number of potential
interactions that need to be modeled. This scalability problem renders
probability models of social interactions computationally infeasible for all
but the smallest networks. In this paper we develop a probabilistic framework
for modeling customer interactions that is both grounded in the theory of
homophily, and is flexible enough to account for random variation in who
interacts with whom. In particular, we present a novel Bayesian nonparametric
approach, using Dirichlet processes, to moderate the scalability problems that
marketing researchers encounter when working with networked data. We find that
this framework is a powerful way to draw insights into latent similarities of
customers, and we discuss how marketers can apply these insights to
segmentation and targeting activities.
Harold Jeffreys's Theory of Probability Revisited
Published exactly seventy years ago, Jeffreys's Theory of Probability (1939)
has had a unique impact on the Bayesian community and is now considered to be
one of the main classics in Bayesian Statistics as well as the initiator of the
objective Bayes school. In particular, its advances on the derivation of
noninformative priors as well as on the scaling of Bayes factors have had a
lasting impact on the field. However, the book reflects the characteristics of
the time, especially in terms of mathematical rigor. In this paper we point out
the fundamental aspects of this reference work, especially the thorough
coverage of testing problems and the construction of both estimation and
testing noninformative priors based on functional divergences. Our major aim
here is to help modern readers navigate this difficult text and
concentrate on passages that are still relevant today.
Comment: This paper is commented on in: [arXiv:1001.2967], [arXiv:1001.2968],
[arXiv:1001.2970], [arXiv:1001.2975], [arXiv:1001.2985], [arXiv:1001.3073].
Rejoinder in [arXiv:0909.1008]. Published at
http://dx.doi.org/10.1214/09-STS284 in Statistical Science
(http://www.imstat.org/sts/) by the Institute of Mathematical Statistics
(http://www.imstat.org)
CLADAG 2021 BOOK OF ABSTRACTS AND SHORT PAPERS
The book collects the short papers presented at the 13th Scientific Meeting of the Classification and Data Analysis Group (CLADAG) of the Italian Statistical Society (SIS). The meeting was organized by the Department of Statistics, Computer Science and Applications of the University of Florence, under the auspices of the Italian Statistical Society and the International Federation of Classification Societies (IFCS). CLADAG is a member of the IFCS, a federation of national, regional, and linguistically based classification societies. It is a non-profit, non-political scientific organization, whose aims are to further classification research.
Finite Mixture Models based on Scale Mixtures of Skew-Normal distributions applied to serological data
Serological data can be described as a mixture of distributions, with each mixture component representing a serological population (e.g. the seronegative and seropositive populations). In seroepidemiological studies of infectious diseases, mixture models with Normal components are most commonly used, which implies that the components that make up the mixture are approximately symmetric. However, skewness to the left has been observed, especially in seropositive populations, which violates the normality assumption. To capture possible skewness in serological data, the family of Scale Mixtures of Skew-Normal (SMSN) distributions is used, of which the Skew-Normal and Skew-t distributions are particular cases. Being heavy-tailed, the Skew-t distribution can also accommodate possible outliers.
In addition to the models used to describe the behavior of the serological data, the issue of estimating the cutoff point for classifying an individual as seropositive is explored. In this sense, two perspectives on the problem are presented: one in which the true state of the disease is unknown; another in which this state is known a priori.
Widespread use of a cutoff point whose estimation is not supported by statistical methodology may affect the estimated seroprevalence of a population, that is, the proportion of seropositive individuals. Thus, three methods based on mixture models are proposed in this work for estimating the cutoff point when the true infection status is unknown.
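The abstract does not say which three methods are proposed, so the following is only a generic sketch of one common way to derive a cutoff from a fitted two-component mixture: take the value at which the posterior probability of seropositivity equals 0.5. It assumes Normal components rather than SMSN components, and all function names and parameter values are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a Normal(mean, sd^2) distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_seropositive(x, weight_pos, mean_neg, sd_neg, mean_pos, sd_pos):
    """Posterior probability that a titre x belongs to the seropositive component."""
    neg = (1.0 - weight_pos) * normal_pdf(x, mean_neg, sd_neg)
    pos = weight_pos * normal_pdf(x, mean_pos, sd_pos)
    return pos / (neg + pos)


def cutoff_by_bisection(weight_pos, mean_neg, sd_neg, mean_pos, sd_pos, tol=1e-10):
    """Find x between the component means where the posterior equals 0.5."""
    def g(x):
        return posterior_seropositive(
            x, weight_pos, mean_neg, sd_neg, mean_pos, sd_pos
        ) - 0.5

    lo, hi = mean_neg, mean_pos
    # Bisection needs a sign change; poorly separated components may not have one.
    if g(lo) >= 0.0 or g(hi) <= 0.0:
        raise ValueError("no posterior crossing of 0.5 between the component means")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical fitted parameters (equal weights and equal spreads).
cutoff = cutoff_by_bisection(0.5, 0.0, 1.0, 4.0, 1.0)
```

With equal weights and equal standard deviations the cutoff falls midway between the two means; unequal weights or spreads shift it toward one component.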
An Introduction to Inductive Statistical Inference: from Parameter Estimation to Decision-Making
These lecture notes aim at a post-Bachelor audience with a background at an
introductory level in Applied Mathematics and Applied Statistics. They discuss
the logic and methodology of the Bayes-Laplace approach to inductive
statistical inference that places common sense and the guiding lines of the
scientific method at the heart of systematic analyses of quantitative-empirical
data. Following an exposition of exactly solvable cases of single- and
two-parameter estimation problems, the main focus is laid on Markov Chain Monte
Carlo (MCMC) simulations on the basis of Hamiltonian Monte Carlo sampling of
posterior joint probability distributions for regression parameters occurring
in generalised linear models for a univariate outcome variable. The modelling
of fixed effects as well as of correlated varying effects via multi-level
models in non-centred parametrisation is considered. The simulation of
posterior predictive distributions is outlined. The assessment of a model's
relative out-of-sample posterior predictive accuracy with information
entropy-based criteria WAIC and LOOIC and model comparison with Bayes factors
are addressed. Concluding, a conceptual link to the behavioural subjective
expected utility representation of a single decision-maker's choice behaviour
in static one-shot decision problems is established. Vectorised codes for MCMC
simulations of multi-dimensional posterior joint probability distributions with
the Stan probabilistic programming language implemented in the statistical
software R are provided. The lecture notes are fully hyperlinked. They direct
the reader to original scientific research papers, online resources on
inductive statistical inference, and to pertinent biographical information.
Comment: 161 pages, 22 *.eps figures, LaTeX2e, hyperlinked references. First
thorough revision, extended list of references