Model-based clustering of categorical data based on the Hamming distance
A model-based approach is developed for clustering categorical data with no
natural ordering. The proposed method exploits the Hamming distance to define a
family of probability mass functions to model the data. The elements of this
family are then considered as kernels of a finite mixture model with unknown
number of components. Conjugate Bayesian inference has been derived for the
parameters of the Hamming distribution model. The mixture is framed in a
Bayesian nonparametric setting and a transdimensional blocked Gibbs sampler is
developed to provide full Bayesian inference on the number of clusters, their
structure and the group-specific parameters, facilitating the computation with
respect to customary reversible jump algorithms. The proposed model encompasses
a parsimonious latent class model as a special case, when the number of
components is fixed. Model performance is assessed via a simulation study and
reference datasets, showing improvements in clustering recovery over existing
approaches.
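As a rough illustration of how a Hamming distance can define a probability mass function over unordered categorical vectors, the sketch below uses a kernel that decays exponentially with the distance from a central configuration. The functional form, the single scale parameter `sigma`, and all names here are assumptions made for illustration; the abstract does not give the authors' exact parameterization.

```python
import itertools
import math


def hamming_distance(x, center):
    """Number of attributes on which x and center disagree."""
    return sum(xi != ci for xi, ci in zip(x, center))


def hamming_pmf(x, center, n_levels, sigma):
    """Assumed pmf: proportional to exp(-d_H(x, center) / sigma).

    n_levels[j] is the number of categories of attribute j.
    The normalizing constant factorizes over attributes because the
    Hamming distance is a sum of per-attribute mismatch indicators.
    """
    weight = math.exp(-hamming_distance(x, center) / sigma)
    norm = math.prod(1.0 + (m - 1) * math.exp(-1.0 / sigma) for m in n_levels)
    return weight / norm


# Hypothetical example: three attributes with 3, 2 and 4 categories.
n_levels = [3, 2, 4]
center = (0, 1, 2)
sigma = 0.5

support = list(itertools.product(*(range(m) for m in n_levels)))
total = sum(hamming_pmf(x, center, n_levels, sigma) for x in support)
mode = max(support, key=lambda x: hamming_pmf(x, center, n_levels, sigma))
```

Under this assumed form, the probabilities over the full support sum to one and the mode sits at the central configuration; smaller values of `sigma` concentrate more mass there.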
A nonparametric HMM for genetic imputation and coalescent inference
Genetic sequence data are well described by hidden Markov models (HMMs) in
which latent states correspond to clusters of similar mutation patterns. Theory
from statistical genetics suggests that these HMMs are nonhomogeneous (their
transition probabilities vary along the chromosome) and have large support for
self transitions. We develop a new nonparametric model of genetic sequence
data, based on the hierarchical Dirichlet process, which supports these self
transitions and nonhomogeneity. Our model provides a parameterization of the
genetic process that is more parsimonious than other more general nonparametric
models which have previously been applied to population genetics. We provide
truncation-free MCMC inference for our model using a new auxiliary sampling
scheme for Bayesian nonparametric HMMs. In a series of experiments on male X
chromosome data from the Thousand Genomes Project and also on data simulated
from a population bottleneck we show the benefits of our model over the popular
finite model fastPHASE, which can itself be seen as a parametric truncation of
our model. We find that the number of HMM states found by our model is
correlated with the time to the most recent common ancestor in population
bottlenecks. This work demonstrates the flexibility of Bayesian nonparametrics
applied to large and complex genetic data.
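To make the two properties named in the abstract concrete (transition probabilities that vary along the chromosome, and extra mass on self transitions), here is a minimal sketch of a finite, site-dependent transition matrix. The specific form, in which the chain stays put with probability exp(-rate) and otherwise redraws its state from fixed weights, is an assumption chosen for illustration and is not taken from the abstract; the names `jump_rate` and `weights` are hypothetical.

```python
import math


def transition_matrix(jump_rate, weights):
    """Assumed site-specific transition matrix.

    With probability exp(-jump_rate) the chain keeps its state;
    otherwise it draws a new state from `weights` (which may return
    the same state). Each row therefore sums to one.
    """
    stay = math.exp(-jump_rate)
    k = len(weights)
    return [
        [(stay if i == j else 0.0) + (1.0 - stay) * weights[j] for j in range(k)]
        for i in range(k)
    ]


# Hypothetical values: three latent states, three sites with different rates.
weights = [0.5, 0.3, 0.2]
jump_rates = [0.01, 0.5, 2.0]

# One matrix per site: the chain is nonhomogeneous because the rate changes.
matrices = [transition_matrix(r, weights) for r in jump_rates]
```

A small `jump_rate` gives a matrix close to the identity (strong self transitions), while a large one approaches independent draws from `weights`.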
Probabilistic Models for Droughts: Applications in Trigger Identification, Predictor Selection and Index Development
The current practice of drought declaration (US Drought Monitor) provides a hard classification of droughts using various hydrologic variables. However, this method does not yield model uncertainty, and is very limited for forecasting upcoming droughts. The primary goal of this thesis is to develop and implement methods that incorporate uncertainty estimation into drought characterization, thereby enabling more informed and better decision making by water users and managers. Probabilistic models using hydrologic variables are developed, yielding new insights into drought characterization and enabling fundamental applications in droughts.
Nonparametric Identification and Estimation of Earnings Dynamics using a Hidden Markov Model: Evidence from the PSID
This paper presents a hidden Markov model designed to investigate the complex
nature of earnings persistence. The proposed model assumes that the residuals
of log-earnings consist of a persistent component and a transitory component,
both following general Markov processes. Nonparametric identification is
achieved through spectral decomposition of linear operators, and a modified
stochastic EM algorithm is introduced for model estimation. Applying the
framework to the Panel Study of Income Dynamics (PSID) dataset, we find that
the earnings process displays nonlinear persistence, conditional skewness, and
conditional kurtosis. Additionally, the transitory component is found to
possess non-Gaussian properties, resulting in a significantly asymmetric
distributional impact when high-earning households face negative shocks or
low-earning households encounter positive shocks. Our empirical findings also
reveal the presence of ARCH effects in earnings at horizons ranging from 2 to 8
years, further highlighting the complex dynamics of earnings persistence.
Scanpath modeling and classification with Hidden Markov Models
How people look at visual information reveals fundamental information about them: their interests and their states of mind. Previous studies showed that the scanpath, i.e., the sequence of eye movements made by an observer exploring a visual stimulus, can be used to infer observer-related (e.g., task at hand) and stimuli-related (e.g., image semantic category) information. However, eye movements are complex signals and many of these studies rely on limited gaze descriptors and bespoke datasets. Here, we provide a turnkey method for scanpath modeling and classification. This method relies on variational hidden Markov models (HMMs) and discriminant analysis (DA). HMMs encapsulate the dynamic and individualistic dimensions of gaze behavior, allowing DA to capture systematic patterns diagnostic of a given class of observers and/or stimuli. We test our approach on two very different datasets. Firstly, we use fixations recorded while viewing 800 static natural scene images, and infer an observer-related characteristic: the task at hand. We achieve an average 55.9% correct classification rate (chance = 33%). We show that correct classification rates positively correlate with the number of salient regions present in the stimuli. Secondly, we use eye positions recorded while viewing 15 conversational videos, and infer a stimulus-related characteristic: the presence or absence of the original soundtrack. We achieve an average 81.2% correct classification rate (chance = 50%). HMMs make it possible to integrate bottom-up, top-down, and oculomotor influences into a single model of gaze behavior. This synergistic approach between behavior and machine learning will open new avenues for simple quantification of gazing behavior. We release SMAC with HMM, a Matlab toolbox freely available to the community under an open-source license agreement.