On Computing the Total Variation Distance of Hidden Markov Models
We prove results on the decidability and complexity of computing the total variation distance (equivalently, the L_1-distance) of hidden Markov models (equivalently, labelled Markov chains). This distance measures the difference between the distributions on words that two hidden Markov models induce. The main results are: (1) it is undecidable whether the distance is greater than a given threshold; (2) approximation is #P-hard and in PSPACE.
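For intuition, here is a minimal sketch (not the paper's method) of how this distance can be lower-bounded by brute force: restricted to words of a fixed length n, the total variation distance is half the L_1 difference between the two word distributions, which the matrix form of a labelled Markov chain makes easy to compute. All model parameters below are illustrative.

```python
import itertools
import numpy as np

def word_probability(pi, T, word):
    """P(model emits `word`), for a labelled Markov chain given by an
    initial distribution pi and, for each label a, a sub-stochastic
    matrix T[a] with T[a][i, j] = P(step to j emitting a | state i)."""
    v = pi.copy()
    for a in word:
        v = v @ T[a]
    return v.sum()

def tv_distance_length_n(pi1, T1, pi2, T2, labels, n):
    """Half the L1 distance between the two length-n word distributions.

    This lower-bounds the total variation distance of the models and is
    nondecreasing in n, but enumeration is exponential in n."""
    return 0.5 * sum(
        abs(word_probability(pi1, T1, w) - word_probability(pi2, T2, w))
        for w in itertools.product(labels, repeat=n))

# Two illustrative 2-state chains over the labels {"a", "b"}.
pi = np.array([1.0, 0.0])
T1 = {"a": np.array([[0.5, 0.2], [0.1, 0.3]]),
      "b": np.array([[0.1, 0.2], [0.4, 0.2]])}
T2 = {"a": np.array([[0.4, 0.3], [0.2, 0.2]]),
      "b": np.array([[0.2, 0.1], [0.3, 0.3]])}
print(tv_distance_length_n(pi, T1, pi, T2, ["a", "b"], n=6))
```

The undecidability and #P-hardness results above are exactly what rules out turning such a brute-force lower bound into an efficient exact or approximate computation in general.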
On the Inability of Markov Models to Capture Criticality in Human Mobility
We examine the non-Markovian nature of human mobility by exposing the
inability of Markov models to capture criticality in human mobility. In
particular, the assumed Markovian nature of mobility was used to establish a
theoretical upper bound on the predictability of human mobility (expressed as a
minimum error probability limit), based on temporally correlated entropy. Since
its inception, this bound has been widely used and empirically validated using
Markov chains. We show that recurrent-neural architectures can achieve
significantly higher predictability, surpassing this widely used upper bound.
In order to explain this anomaly, we shed light on several underlying
assumptions in previous research works that have resulted in this bias. By
evaluating the mobility predictability on real-world datasets, we show that
human mobility exhibits scale-invariant long-range correlations, bearing
similarity to a power-law decay. This is in contrast to the initial assumption
that human mobility follows an exponential decay. This assumption of
exponential decay coupled with Lempel-Ziv compression in computing Fano's
inequality has led to an inaccurate estimation of the predictability upper
bound. We show that this approach inflates the entropy, consequently lowering
the upper bound on human mobility predictability. We finally highlight that
this approach tends to overlook long-range correlations in human mobility. This
explains why recurrent-neural architectures that are designed to handle
long-range structural correlations surpass the previously computed upper bound
on mobility predictability.
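To make the critique concrete, the sketch below shows the standard pipeline the abstract refers to: estimate the entropy rate of a location sequence with the Lempel-Ziv estimator, then solve Fano's inequality for the predictability bound Pi_max. Function and variable names are illustrative, and the quadratic-time substring search is written for clarity, not efficiency.

```python
import math

def _contains(prefix, sub):
    # Naive substring search over a list of symbols (clarity over speed).
    m = len(sub)
    return any(prefix[j:j + m] == sub for j in range(len(prefix) - m + 1))

def lz_entropy_bits(seq):
    """Lempel-Ziv entropy-rate estimate in bits per symbol.

    Lambda_i is the length of the shortest substring starting at position i
    that does not appear anywhere in seq[:i]; the estimate is
    (n / sum(Lambda_i)) * log2(n).
    """
    n = len(seq)
    lambdas = []
    for i in range(n):
        k = 1
        while i + k <= n and _contains(seq[:i], seq[i:i + k]):
            k += 1
        lambdas.append(k)
    return (n / sum(lambdas)) * math.log2(n)

def fano_predictability(S, N, tol=1e-9):
    """Solve S = H(pi) + (1 - pi) * log2(N - 1) for the upper bound pi_max.

    A solution in (1/N, 1) exists when 0 < S < log2(N); the right-hand
    side is decreasing in pi on that interval, so binary search works.
    """
    def rhs(pi):
        h = -sum(q * math.log2(q) for q in (pi, 1 - pi) if 0 < q < 1)
        return h + (1 - pi) * math.log2(N - 1)

    lo, hi = 1.0 / N, 1.0 - 1e-12
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if rhs(mid) > S else (lo, mid)
    return 0.5 * (lo + hi)

# Toy usage: a short location history over N = 3 distinct places.
seq = [0, 1, 0, 1, 2, 0, 1, 0, 1, 2, 0, 1]
S = lz_entropy_bits(seq)
print(f"entropy ~ {S:.2f} bits, bound ~ {fano_predictability(S, 3):.2f}")
```

The abstract's point is that step one of this pipeline overestimates S when the sequence has long-range (power-law-like) correlations, which mechanically deflates the Pi_max that step two returns.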
Learning loopy graphical models with latent variables: Efficient methods and guarantees
The problem of structure estimation in graphical models with latent variables
is considered. We characterize conditions for tractable graph estimation and
develop efficient methods with provable guarantees. We consider models where
the underlying Markov graph is locally tree-like, and the model is in the
regime of correlation decay. For the special case of the Ising model, the
number of samples required for structural consistency of our method scales
as $n = \Omega(\theta_{\min}^{-\delta\eta(\eta+1)-2}\log p)$, where $p$ is the
number of variables, $\theta_{\min}$ is the minimum edge potential, $\delta$ is
the depth (i.e., distance from a hidden node to the nearest observed nodes),
and $\eta$ is a parameter which depends on the bounds on node and edge
potentials in the Ising model. Necessary conditions for structural consistency
under any algorithm are derived and our method nearly matches the lower bound
on sample requirements. Further, the proposed method is practical to implement
and provides flexibility to control the number of latent variables and the
cycle lengths in the output graph.
Comment: Published at http://dx.doi.org/10.1214/12-AOS1070 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
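As a rough illustration of how this bound behaves, the snippet below plugs hypothetical parameter values into the scaling above; the constants hidden by the Omega are ignored and all numbers are made up.

```python
import math

# Hypothetical values: minimum edge potential, depth, potential-bound
# parameter eta, and number of variables. Omega constants are ignored.
theta_min, delta, eta, p = 0.2, 2, 1.0, 1000
n_scale = theta_min ** (-delta * eta * (eta + 1) - 2) * math.log(p)
print(f"sample-size scaling ~ {n_scale:,.0f}")
```

Note how the dependence on the depth sits in the exponent of the minimum edge potential: deeper hidden nodes make weak edges dramatically more expensive to detect.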
Recovering Structured Probability Matrices
We consider the problem of accurately recovering a matrix B of size M by M,
which represents a probability distribution over M^2 outcomes, given access to
an observed matrix of "counts" generated by taking independent samples from the
distribution B. How can structural properties of the underlying matrix B be
leveraged to yield computationally efficient and information theoretically
optimal reconstruction algorithms? When can accurate reconstruction be
accomplished in the sparse data regime? This basic problem lies at the core of
a number of questions that are currently being considered by different
communities, including building recommendation systems and collaborative
filtering in the sparse data regime, community detection in sparse random
graphs, learning structured models such as topic models or hidden Markov
models, and the efforts from the natural language processing community to
compute "word embeddings".
Our results apply to the setting where B has a low rank structure. For this
setting, we propose an efficient algorithm that accurately recovers the
underlying M by M matrix using Theta(M) samples. This result easily translates
to Theta(M) sample algorithms for learning topic models and learning hidden
Markov Models. These linear sample complexities are optimal, up to constant
factors, in an extremely strong sense: even testing basic properties of the
underlying matrix (such as whether it has rank 1 or 2) requires Omega(M)
samples. We provide an even stronger lower bound where distinguishing whether a
sequence of observations was drawn from the uniform distribution over M
observations versus being generated by an HMM with two hidden states requires
Omega(M) observations. This precludes sublinear-sample hypothesis tests for
basic properties, such as identity or uniformity, as well as sublinear sample
estimators for quantities such as the entropy rate of HMMs.
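The following sketch illustrates the low-rank setting, though not the paper's algorithm (which is designed for the much harder Theta(M)-sample regime): draw i.i.d. samples from a rank-2 probability matrix, then recover it from the empirical counts by truncated SVD. All sizes and names are illustrative, and the sample size here is deliberately far above the linear regime.

```python
import numpy as np

rng = np.random.default_rng(0)
M, r, n_samples = 50, 2, 200_000

# Ground-truth rank-r B: a mixture of r outer products of distributions.
w = rng.dirichlet(np.ones(r))
rows = rng.dirichlet(np.ones(M), size=r)
cols = rng.dirichlet(np.ones(M), size=r)
B = sum(w[k] * np.outer(rows[k], cols[k]) for k in range(r))

# Draw i.i.d. (i, j) samples from B and form the matrix of counts.
flat = rng.choice(M * M, size=n_samples, p=B.ravel())
counts = np.bincount(flat, minlength=M * M).reshape(M, M)
F = counts / n_samples

# Truncated SVD of the empirical frequencies, projected back to a
# probability matrix (clip negatives, renormalize).
U, s, Vt = np.linalg.svd(F, full_matrices=False)
B_hat = (U[:, :r] * s[:r]) @ Vt[:r]
B_hat = np.clip(B_hat, 0, None)
B_hat /= B_hat.sum()

print("L1 recovery error:", np.abs(B - B_hat).sum())
```

In the sparse regime the abstract targets, most entries of the count matrix are zero and plain SVD of F breaks down, which is what makes the Theta(M)-sample result nontrivial.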
Distinguishing Hidden Markov Chains
Hidden Markov Chains (HMCs) are commonly used mathematical models of
probabilistic systems. They are employed in various fields such as speech
recognition, signal processing, and biological sequence analysis. We consider
the problem of distinguishing two given HMCs based on an observation sequence
that one of the HMCs generates. More precisely, given two HMCs and an
observation sequence, a distinguishing algorithm is expected to identify the
HMC that generates the observation sequence. Two HMCs are called
distinguishable if for every $\epsilon > 0$ there is a distinguishing
algorithm whose error probability is less than $\epsilon$. We show that one
can decide in polynomial time whether two HMCs are distinguishable. Further, we
present and analyze two distinguishing algorithms for distinguishable HMCs. The
first algorithm makes a decision after processing a fixed number of
observations, and it exhibits two-sided error. The second algorithm processes
an unbounded number of observations, but the algorithm has only one-sided
error. The error probability, for both algorithms, decays exponentially with
the number of processed observations. We also provide an algorithm for
distinguishing multiple HMCs. Finally, we discuss an application in stochastic
runtime verification.
Comment: This is the full version of a LICS'16 paper.
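One natural instantiation of a fixed-length, two-sided-error distinguisher (not necessarily the paper's exact procedure) is to score the observation sequence under each HMC with the forward algorithm and return the more likely model; for distinguishable HMCs the error probability of this test decays exponentially in the number of observations. The model parameterization below (initial distribution, transition matrix, emission matrix) is an illustrative assumption.

```python
import numpy as np

def log_likelihood(pi, A, E, obs):
    """Forward algorithm with per-step rescaling; returns log P(obs | model).

    pi : (k,) initial distribution; A : (k, k) transition matrix;
    E  : (k, m) emission probabilities; obs : sequence of symbol indices.
    (If some observation is impossible under the model, this returns -inf.)
    """
    alpha = pi * E[:, obs[0]]
    log_p = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * E[:, o]
        log_p += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return log_p

def distinguish(hmc1, hmc2, obs):
    """Return 1 or 2: the HMC more likely to have generated obs."""
    return 1 if log_likelihood(*hmc1, obs) >= log_likelihood(*hmc2, obs) else 2

# Two illustrative 2-state HMCs over a 2-symbol alphabet.
pi = np.array([0.5, 0.5])
A1 = np.array([[0.9, 0.1], [0.1, 0.9]]); E1 = np.array([[0.8, 0.2], [0.2, 0.8]])
A2 = np.array([[0.5, 0.5], [0.5, 0.5]]); E2 = np.array([[0.5, 0.5], [0.5, 0.5]])
obs = [0, 0, 1, 0, 0, 1, 1, 0]
print(distinguish((pi, A1, E1), (pi, A2, E2), obs))
```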
Kinetic distance and kinetic maps from molecular dynamics simulation
Characterizing macromolecular kinetics from molecular dynamics (MD)
simulations requires a distance metric that can distinguish
slowly-interconverting states. Here we build upon diffusion map theory and
define a kinetic distance for irreducible Markov processes that quantifies how
slowly molecular conformations interconvert. The kinetic distance can be
computed given a model that approximates the eigenvalues and eigenvectors
(reaction coordinates) of the MD Markov operator. Here we employ the
time-lagged independent component analysis (TICA). The TICA components can be
scaled to provide a kinetic map in which the Euclidean distance corresponds to
the kinetic distance. As a result, the question of how many TICA dimensions
should be kept in a dimensionality reduction approach becomes obsolete, and one
fewer parameter needs to be specified in the kinetic model construction. We
demonstrate the approach using TICA and Markov state model (MSM) analyses for
illustrative models, protein conformation dynamics in bovine pancreatic trypsin
inhibitor, and protein-inhibitor association in trypsin and benzamidine.
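A compact sketch of the eigenvalue-scaled TICA map described above follows; it is a simplified version that assumes reversible dynamics and ignores the statistical subtleties of covariance estimation. Input shapes and the lag-time convention are illustrative assumptions.

```python
import numpy as np

def tica_kinetic_map(X, lag):
    """X: (n_frames, n_features) trajectory; lag: lag time in frames.

    Returns TICA components scaled by their eigenvalues, so that
    Euclidean distance in the map approximates the kinetic distance.
    """
    X = X - X.mean(axis=0)
    C0 = (X[:-lag].T @ X[:-lag]) / (len(X) - lag)  # instantaneous covariance
    Ct = (X[:-lag].T @ X[lag:]) / (len(X) - lag)   # time-lagged covariance
    Ct = 0.5 * (Ct + Ct.T)                         # symmetrize (reversibility)

    # Generalized eigenproblem Ct v = lambda C0 v, solved by whitening.
    s, U = np.linalg.eigh(C0)
    keep = s > 1e-10                               # drop near-null directions
    W = U[:, keep] / np.sqrt(s[keep])              # whitening transform
    lam, V = np.linalg.eigh(W.T @ Ct @ W)
    order = np.argsort(lam)[::-1]                  # slowest processes first
    lam, V = lam[order], V[:, order]

    ics = X @ (W @ V)                              # TICA components
    return ics * lam                               # eigenvalue-scaled map
```

Because fast processes have eigenvalues near zero, their dimensions are automatically shrunk in the scaled map, which is why a hard truncation threshold is no longer needed.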
Identifiability and consistent estimation of nonparametric translation hidden Markov models with general state space
This paper considers hidden Markov models where the observations are given as
the sum of a latent state which lies in a general state space and some
independent noise with unknown distribution. It is shown that these fully
nonparametric translation models are identifiable with respect to both the
distribution of the latent variables and the distribution of the noise, assuming
little more than light tails for the latent variables. Two nonparametric
estimation methods are proposed and we prove that the corresponding estimators
are consistent for the weak convergence topology. These results are illustrated
with numerical experiments.
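To make the model class concrete, here is a minimal generative sketch of such a translation hidden Markov model: the observation is the latent state plus independent noise whose law the estimator does not know. The specific latent chain (a reflected random walk, which keeps the state bounded and hence light-tailed) and the noise distribution are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 1000

# Latent Markov chain on the continuous state space [0, 1]:
# a Gaussian random walk reflected at the boundaries.
X = np.empty(T)
X[0] = rng.uniform()
for t in range(1, T):
    step = (X[t - 1] + rng.normal(scale=0.1)) % 2.0
    X[t] = 2.0 - step if step > 1.0 else step

# Independent additive noise; the Laplace law is just a placeholder
# standing in for the unknown noise distribution of the model.
Y = X + rng.laplace(scale=0.05, size=T)
```

The identifiability result above says that, from the distribution of sequences like Y alone, both the law of the chain (X) and the law of the noise can in principle be recovered.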
A cross-center smoothness prior for variational Bayesian brain tissue segmentation
Suppose one is faced with the challenge of tissue segmentation in MR images,
without annotators at their center to provide labeled training data. One option
is to go to another medical center for a trained classifier. Sadly, tissue
classifiers do not generalize well across centers due to voxel intensity shifts
caused by center-specific acquisition protocols. However, certain aspects of
segmentations, such as spatial smoothness, remain relatively consistent and can
be learned separately. Here we present a smoothness prior that is fit to
segmentations produced at another medical center. This informative prior is
presented to an unsupervised Bayesian model. The model clusters the voxel
intensities, such that it produces segmentations that are similarly smooth to
those of the other medical center. In addition, the unsupervised Bayesian model
is extended to a semi-supervised variant, which needs no visual interpretation
of clusters into tissues.
Comment: 12 pages, 2 figures, 1 table. Accepted to the International
Conference on Information Processing in Medical Imaging (2019).
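One concrete (and much simplified) reading of "learning spatial smoothness separately" is sketched below: estimate, from the other center's segmentations, how often neighboring voxels share a label, and use that statistic to set the coupling strength of a Potts-style prior. Everything here is an illustrative stand-in, not the paper's variational Bayesian model.

```python
import numpy as np

def neighbor_agreement(seg):
    """Fraction of 4-connected neighbor pairs with equal labels (2D slice);
    a simple average of the horizontal and vertical directions."""
    horiz = (seg[:, :-1] == seg[:, 1:]).mean()
    vert = (seg[:-1, :] == seg[1:, :]).mean()
    return 0.5 * (horiz + vert)

# Pool the statistic over reference segmentations from the other center.
# Random label maps below are placeholders standing in for real data.
reference_segs = [np.random.default_rng(i).integers(0, 4, size=(64, 64))
                  for i in range(3)]
smoothness = np.mean([neighbor_agreement(s) for s in reference_segs])
print(f"cross-center smoothness statistic: {smoothness:.3f}")
```

Because such a statistic reflects anatomy rather than scanner physics, it transfers across centers even when voxel intensities do not.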