Recent advances in directional statistics
Mainstream statistical methodology is generally applicable to data observed
in Euclidean space. There are, however, numerous contexts of considerable
scientific interest in which the natural supports for the data under
consideration are Riemannian manifolds like the unit circle, torus, sphere and
their extensions. Typically, such data can be represented using one or more
directions, and directional statistics is the branch of statistics that deals
with their analysis. In this paper we provide a review of the many recent
developments in the field since the publication of Mardia and Jupp (1999),
still the most comprehensive text on directional statistics. Many of those
developments have been stimulated by interesting applications in fields as
diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics,
image analysis, text mining, environmetrics, and machine learning. We begin by
considering developments for the exploratory analysis of directional data
before progressing to distributional models, general approaches to inference,
hypothesis testing, regression, nonparametric curve estimation, methods for
dimension reduction, classification and clustering, and the modelling of time
series, spatial and spatio-temporal data. An overview of currently available
software for analysing directional data is also provided, and potential future
developments discussed.
Comment: 61 pages
Topic Modelling Meets Deep Neural Networks: A Survey
Topic modelling has been a successful technique for text analysis for almost
twenty years. When topic modelling met deep neural networks, there emerged a
new and increasingly popular research area, neural topic models, with over a
hundred models developed and a wide range of applications in neural language
understanding such as text generation, summarisation and language models. There
is a need to summarise research developments and discuss open problems and
future directions. In this paper, we provide a focused yet comprehensive
overview of neural topic models for interested researchers in the AI community,
to help them navigate and innovate in this fast-growing research
area. To the best of our knowledge, ours is the first review focusing on this
specific topic.
Comment: A review on Neural Topic Models
Spatial and temporal predictions for positive vectors
Predicting a given pixel from its neighbouring pixels is of great interest for several image processing tasks. To model images, many researchers use Gaussian distributions. However, some data, such as image clutter and texture, are clearly non-Gaussian. In such cases, predictors are hard to derive. In this thesis, we analytically derive a new non-linear predictor based on an inverted Dirichlet mixture. The non-linear combination of the neighbouring pixels and the combination of the mixture parameters prove efficient in predicting pixels. To demonstrate the efficacy of our predictor, we apply it to two challenging tasks: object detection and image restoration.
We also develop a pixel prediction framework based on a finite generalized inverted Dirichlet (GID) mixture model, which has proven its efficiency in several machine learning applications. We propose a GID optimal predictor and learn its parameters using a likelihood-based approach combined with the Newton-Raphson method. We demonstrate the efficiency of our proposed approach through a challenging application, namely image inpainting, and we compare the experimental results with those of related methods.
Finally, we build a new time series state space model based on the inverted Dirichlet distribution. We use the power steady modeling approach and derive an analytical expression of the model latent variable using the maximum a posteriori technique. We also approximate the predictive density using local variational inference, and we validate our model on the electricity consumption time series dataset of Germany. A comparison with the Generalized Dirichlet state space model is conducted, and the results demonstrate the merits of our approach in modeling continuous positive vectors.
A Tutorial on Bayesian Nonparametric Models
A key problem in statistical modeling is model selection: how to choose a
model at an appropriate level of complexity. This problem appears in many
settings, most prominently in choosing the number of clusters in mixture models
or the number of factors in factor analysis. In this tutorial we describe
Bayesian nonparametric methods, a class of methods that side-steps this issue
by allowing the data to determine the complexity of the model. This tutorial is
a high-level introduction to Bayesian nonparametric methods and contains
several examples of their application.
Comment: 28 pages, 8 figures
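To illustrate how the data can determine model complexity, here is a minimal sketch of the Chinese restaurant process, a standard Bayesian nonparametric construction that the abstract itself does not name. The function name `chinese_restaurant_process` and the parameter `concentration` are illustrative assumptions, not taken from the tutorial.

```python
import random

def chinese_restaurant_process(n, concentration, seed=0):
    """Seat n customers one at a time and return the resulting cluster sizes.

    Assumption: this is the standard Chinese restaurant process, used here
    only as an illustration; the abstract does not specify a construction.
    """
    rng = random.Random(seed)
    table_sizes = []  # table_sizes[t] = number of customers at table t
    for i in range(n):
        # i customers are already seated. The new customer joins table t
        # with probability size_t / (i + concentration), or opens a new
        # table with probability concentration / (i + concentration).
        r = rng.uniform(0, i + concentration)
        cumulative = 0.0
        for t, size in enumerate(table_sizes):
            cumulative += size
            if r < cumulative:
                table_sizes[t] += 1
                break
        else:
            # r fell in the last `concentration` units: open a new table
            table_sizes.append(1)
    return table_sizes

sizes = chinese_restaurant_process(100, concentration=1.0)
print(len(sizes), "clusters with sizes", sizes)
```

The number of clusters is not fixed in advance: it is an outcome of the seating process and tends to grow slowly as more observations arrive.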
Semiparametric Bayesian Density Estimation with Disparate Data Sources: A Meta-Analysis of Global Childhood Undernutrition
Undernutrition, resulting in restricted growth, and quantified here using
height-for-age z-scores, is an important contributor to childhood morbidity and
mortality. Since all levels of mild, moderate and severe undernutrition are of
clinical and public health importance, it is of interest to estimate the shape
of the z-scores' distributions.
We present a finite normal mixture model that uses data on 4.3 million
children to make annual country-specific estimates of these distributions for
under-5-year-old children in the world's 141 low- and middle-income countries
between 1985 and 2011. We incorporate both individual-level data when
available, as well as aggregated summary statistics from studies whose
individual-level data could not be obtained. We place a hierarchical Bayesian
probit stick-breaking model on the mixture weights. The model allows for
nonlinear changes in time, and it borrows strength in time, in covariates, and
within and across regional country clusters to make estimates where data are
uncertain, sparse, or missing.
This work addresses three important problems that often arise in the fields
of public health surveillance and global health monitoring. First, data are
always incomplete. Second, different data sources commonly use different
reporting metrics. Last, distributions, and especially their tails, are often
of substantive interest.
Comment: 41 total pages, 6 figures, 1 table
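The probit stick-breaking prior on the mixture weights mentioned above can be sketched as follows. This is a simplified illustration of the general finite stick-breaking construction with a probit link, not the authors' hierarchical model. In the paper the latent values would depend on time, covariates and region, whereas here `alphas` is just a hypothetical list of fixed real numbers.

```python
import math

def normal_cdf(x):
    """Standard normal CDF, used as the probit link."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def probit_stick_breaking(alphas):
    """Map K-1 real-valued latents to K mixture weights that sum to 1.

    Assumption: a finite construction in which the last component
    takes whatever is left of the stick.
    """
    weights = []
    remaining = 1.0  # length of stick not yet assigned
    for a in alphas:
        v = normal_cdf(a)        # fraction of the remaining stick to break off
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)    # last component gets the remainder
    return weights

weights = probit_stick_breaking([0.0, 0.0])
print(weights)
```

Because each latent value is an unconstrained real number, it can be given a regression or hierarchical structure while the resulting weights remain valid probabilities.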