Spectral dimensionality reduction for HMMs
Hidden Markov Models (HMMs) can be accurately approximated from
co-occurrence frequencies of pairs and triples of observations using a fast
spectral method, in contrast to the usual slow methods such as EM or Gibbs
sampling. We provide a new spectral method that significantly reduces the
number of model parameters that need to be estimated, and achieves a sample
complexity that does not depend on the size of the observation vocabulary. We
present an elementary proof giving bounds on the relative accuracy of
probability estimates from our model. (Corollaries show that our bounds can be
weakened to provide either L1 bounds or KL bounds, which allow easier direct
comparison to previous work.) Our theorem uses conditions that are checkable
from the data, instead of placing conditions on the unobservable Markov
transition matrix.
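The moment-based pipeline the abstract contrasts with EM can be sketched concretely. The sketch below follows the basic observable-operator construction (bigram and trigram co-occurrence moments, an SVD, then operator products), not this paper's reduced-parameter variant; the toy HMM parameters are hypothetical. Using population moments instead of empirical counts makes the recovery exact, so the spectral likelihood agrees with the classical forward algorithm.

```python
import numpy as np

# Toy HMM (hypothetical parameters, for illustration only).
# T[i, j] = P(h_{t+1}=i | h_t=j); O[x, h] = P(obs=x | h); pi = initial dist.
pi = np.array([0.6, 0.4])
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])
O = np.array([[0.5, 0.1],
              [0.3, 0.3],
              [0.2, 0.6]])
n_obs, n_hid = O.shape

# Population moments: unigram, bigram, and trigram "slices".
P1 = O @ pi                                   # P1[x]     = P(x_1 = x)
P21 = O @ T @ np.diag(pi) @ O.T               # P21[b, a] = P(x_2=b, x_1=a)
P3x1 = [O @ T @ np.diag(O[x]) @ T @ np.diag(pi) @ O.T
        for x in range(n_obs)]                # P(x_3=b, x_2=x, x_1=a)

# Spectral step: top-k left singular vectors of the bigram matrix.
U = np.linalg.svd(P21)[0][:, :n_hid]

# Observable-operator parameterization built from the moments.
b1 = U.T @ P1
binf = np.linalg.pinv(P21.T @ U) @ P1
B = [U.T @ P3x1[x] @ np.linalg.pinv(U.T @ P21) for x in range(n_obs)]

def spectral_prob(seq):
    """P(x_1, ..., x_t) via the observable-operator recursion."""
    b = b1
    for x in seq:
        b = B[x] @ b
    return float(binf @ b)

def forward_prob(seq):
    """The same probability from the classical forward algorithm."""
    alpha = O[seq[0]] * pi
    for x in seq[1:]:
        alpha = O[x] * (T @ alpha)
    return float(alpha.sum())

print(spectral_prob([0, 2, 1, 1]), forward_prob([0, 2, 1, 1]))
```

With empirical rather than population moments, each matrix above is replaced by normalized counts, and the accuracy of the resulting probabilities is exactly what the paper's bounds control.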
When black box algorithms are (not) appropriate: a principled prediction-problem ontology
In the 1980s a new, extraordinarily productive way of reasoning about
algorithms emerged. Though this type of reasoning has come to dominate areas of
data science, it has been under-discussed and its impact under-appreciated. For
example, it is the primary way we reason about "black box" algorithms. In this
paper we analyze its current use (i.e., as "the common task framework") and its
limitations; we find that a large class of prediction problems is inappropriate for
this type of reasoning. Further, we find the common task framework does not
provide a foundation for the deployment of an algorithm in a real world
situation. Building on its core features, we identify a class of problems
where this new form of reasoning can be used in deployment. We purposefully
develop a novel framework so that both technical and non-technical people can
discuss and identify key features of their prediction problem and whether or
not it is suitable for this new kind of reasoning.
Bridging the Usability Gap: Theoretical and Methodological Advances for Spectral Learning of Hidden Markov Models
The Baum-Welch (B-W) algorithm is the most widely accepted method for
inferring hidden Markov models (HMMs). However, it is prone to getting stuck in
local optima, and can be too slow for many real-time applications. Spectral
learning of HMMs (SHMMs), based on the method of moments (MOM) has been
proposed in the literature to overcome these obstacles. Despite its promises,
asymptotic theory for SHMM has been elusive, and the long-run performance of
SHMM can degrade due to unchecked propagation of error. In this paper, we (1)
provide an asymptotic distribution for the approximation error of the likelihood
estimated by SHMM, (2) propose a novel algorithm called projected SHMM
(PSHMM) that mitigates the problem of error propagation, and (3) develop online
learning variations of both SHMM and PSHMM that accommodate potential
nonstationarity. We compare the performance of SHMM with PSHMM and estimation
through the B-W algorithm on both simulated data and data from real world
applications, and find that PSHMM not only retains the computational advantages
of SHMM, but also provides more robust estimation and forecasting.
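The error-propagation problem, and why a projection step helps, can be seen in a stylized toy example. The exact projection used by PSHMM is specified in the paper; the ball projection, the "slightly expansive" operator, and all constants below are hypothetical stand-ins chosen only to show that unchecked multiplicative updates blow up while projected updates stay bounded.

```python
import numpy as np

dim = 5
# Stylized "slightly expansive" update: a stand-in for estimated observable
# operators whose small errors compound multiplicatively over a long series.
A = 1.03 * np.eye(dim)

def project_ball(b, center, radius):
    """Project b back onto an L2 ball around a reference state (a
    hypothetical stand-in for PSHMM's projection step)."""
    d = b - center
    n = np.linalg.norm(d)
    return b if n <= radius else center + radius * d / n

center = np.zeros(dim)
center[0] = 1.0

b_raw = center.copy()
b_proj = center.copy()
for _ in range(200):
    b_raw = A @ b_raw                               # unchecked propagation
    b_proj = project_ball(A @ b_proj, center, 2.0)  # projected propagation

print(np.linalg.norm(b_raw))   # grows like 1.03**200
print(np.linalg.norm(b_proj))  # stays within the ball around the reference
```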
Nonlinear Permuted Granger Causality
Granger causal inference is a contentious but widespread method used in
fields ranging from economics to neuroscience. The original definition
addresses the notion of causality in time series by establishing functional
dependence conditional on a specified model. Adaptation of Granger causality to
nonlinear data remains challenging, and many methods apply in-sample tests that
do not incorporate out-of-sample predictability, leading to concerns of model
overfitting. To allow for out-of-sample comparison, a measure of functional
connectivity is explicitly defined using permutations of the covariate set.
Artificial neural networks serve as featurizers of the data to approximate any
arbitrary, nonlinear relationship, and consistent estimation of the variance
for each permutation is shown under certain conditions on the featurization
process and the model residuals. Performance of the permutation method is
compared to penalized variable selection, naive replacement, and omission
techniques via simulation, and it is applied to neuronal responses of acoustic
stimuli in the auditory cortex of anesthetized rats. Targeted use of the
Granger causal framework, when prior knowledge of the causal mechanisms in a
dataset is limited, can help to reveal potential predictive relationships
between sets of variables that warrant further study.
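The permutation idea can be sketched in a few lines: fit a flexible model once, then compare out-of-sample error with each covariate permuted in turn; permuting a genuinely predictive covariate degrades forecasts, while permuting an irrelevant one does not. Everything below is a hypothetical illustration, with random ReLU features standing in for the neural-network featurizer, not the paper's exact estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_feat, lam = 600, 300, 1.0

# Synthetic series: y depends nonlinearly on lagged x1, not on x2.
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
y = np.sin(2 * x1[:-1]) + 0.1 * rng.standard_normal(n - 1)
X = np.column_stack([x1[:-1], x2[:-1]])          # lagged covariate matrix

# Random ReLU features play the role of the neural-network featurizer.
W = rng.standard_normal((2, n_feat))
b = rng.standard_normal(n_feat)

def feats(Z):
    return np.maximum(Z @ W + b, 0.0)

tr, te = slice(0, 400), slice(400, None)         # train / held-out split
F = feats(X[tr])
beta = np.linalg.solve(F.T @ F + lam * np.eye(n_feat), F.T @ y[tr])

def oos_mse(Xte, yte, permute_col=None):
    """Out-of-sample MSE, optionally with one covariate permuted."""
    Xp = Xte.copy()
    if permute_col is not None:
        Xp[:, permute_col] = rng.permutation(Xp[:, permute_col])
    return float(np.mean((feats(Xp) @ beta - yte) ** 2))

mse_full = oos_mse(X[te], y[te])
mse_perm_x1 = oos_mse(X[te], y[te], permute_col=0)  # breaks the real link
mse_perm_x2 = oos_mse(X[te], y[te], permute_col=1)  # breaks nothing
print(mse_full, mse_perm_x1, mse_perm_x2)
```

The gap between the permuted and unpermuted out-of-sample errors is the kind of functional-connectivity signal the paper defines and studies.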
Change Point Detection with Conceptors
Offline change point detection retrospectively locates change points in a
time series. Many nonparametric methods that target i.i.d. mean and variance
changes fail in the presence of nonlinear temporal dependence, and model-based
methods require a known, rigid structure. For the at-most-one-change-point
problem, we propose the use of a conceptor matrix to learn the characteristic
dynamics of a baseline training window with arbitrary dependence structure. The
associated echo state network acts as a featurizer of the data, and change
points are identified from the nature of the interactions between the features
and their relationship to the baseline state. This model-agnostic method can
suggest potential locations of interest that warrant further study. We prove
that, under mild assumptions, the method provides a consistent estimate of the
true change point, and quantile estimates are produced via a moving block
bootstrap of the original data. The method is evaluated with clustering metrics
and Type I error control on simulated data, and applied to publicly available
neural data from rats experiencing bouts of non-REM sleep prior to exploration
of a radial maze. With sufficient spacing, the framework provides a simple
extension to the sparse, multiple change point problem.
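The core construction can be sketched numerically: drive a random echo state network with the series, build the conceptor C = R(R + α⁻²I)⁻¹ of the baseline window's state correlation matrix R (the standard conceptor formula), and score each time point by how far its reservoir state leaves the baseline dynamics. The reservoir size, aperture α, and switching sine signal below are all hypothetical choices for illustration, not the paper's exact detection statistic.

```python
import numpy as np

rng = np.random.default_rng(42)
n_res, aperture = 50, 10.0

# Echo state network reservoir (random, fixed), spectral radius < 1.
W = rng.standard_normal((n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
w_in = rng.standard_normal(n_res)

# A signal whose dynamics switch at t = 150 (slow sine -> fast sine).
t = np.arange(300)
u = np.where(t < 150, np.sin(0.2 * t), np.sin(1.1 * t))

# Drive the reservoir and collect its states.
s = np.zeros(n_res)
states = []
for ut in u:
    s = np.tanh(W @ s + w_in * ut)
    states.append(s)
states = np.array(states)

# Conceptor of the baseline window: C = R (R + aperture^-2 I)^-1.
base = states[20:120]                      # skip a short washout
R = base.T @ base / len(base)
C = R @ np.linalg.inv(R + aperture ** -2 * np.eye(n_res))

# Score each time point by the part of its state the conceptor rejects.
score = np.linalg.norm(states - states @ C.T, axis=1)
print(score[20:140].mean(), score[160:].mean())
```

States generated by the baseline dynamics lie mostly inside the subspace the conceptor preserves, so their scores are small; after the switch the scores jump, which is the interaction the method uses to locate the change point.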
Spectral estimation of hidden Markov models
This thesis extends and improves methods for estimating key quantities of hidden Markov models through spectral method-of-moments estimation. Unlike traditional estimation methods such as EM and Gibbs sampling, this family of estimators, which we call spectral HMMs (sHMMs), is extremely fast, does not require multiple restarts, and comes with provable guarantees. Our first result improves upon the original spectral HMM estimation algorithm by estimating the parameters from fully reduced data. We also show that the parameters of the fully reduced version can be estimated using various forms of regression, which can yield major speed gains and allows flexibility in the estimation scheme. We then extend the algorithm beyond basic hidden Markov models to latent-variable tree structures with linguistic applications, especially dependency parsing, and finally to hidden Markov models whose output is a high-dimensional, continuously distributed variable. We show that spectral estimation of hidden Markov models can be factored into two major components: estimation of the hidden-state dynamics, and estimation of the observation probability distributions. This leads to extremely flexible estimation procedures that can be tailored precisely to the task of interest. These tools are all simple to implement, fast, and naturally incorporate dimension reduction, which allows them to scale gracefully as the dimension of the data increases.
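The regression view of spectral estimation can be sketched briefly: reduce one-hot observations with singular vectors of the empirical bigram matrix, then fit the reduced state dynamics by ordinary (here, ridge) regression of the next reduced observation on the current one. The toy HMM, chain length, and ridge penalty below are hypothetical illustrations, not the thesis's exact scheme.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy HMM (hypothetical): 2 hidden states, 3 observation symbols.
pi = np.array([0.6, 0.4])
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])   # T[i, j] = P(h'=i | h=j)
O = np.array([[0.5, 0.1],
              [0.3, 0.3],
              [0.2, 0.6]])   # O[x, h] = P(obs=x | h)

# Simulate a single long observation sequence.
n = 5000
h = rng.choice(2, p=pi)
xs = np.empty(n, dtype=int)
for i in range(n):
    xs[i] = rng.choice(3, p=O[:, h])
    h = rng.choice(2, p=T[:, h])

# Dimension reduction via the empirical bigram matrix.
P21_hat = np.zeros((3, 3))
for a, b in zip(xs[:-1], xs[1:]):
    P21_hat[b, a] += 1.0 / (n - 1)
U = np.linalg.svd(P21_hat)[0][:, :2]
Y = U[xs]                       # Y[t] = U^T e_{x_t}, the reduced observation

# Fit the reduced dynamics by ridge regression of Y[t+1] on Y[t].
Y0, Y1 = Y[:-1], Y[1:]
lam = 1e-6
C = (Y1.T @ Y0) @ np.linalg.inv(Y0.T @ Y0 + lam * np.eye(2))

mse_fit = np.mean((Y1 - Y0 @ C.T) ** 2)
mse_zero = np.mean(Y1 ** 2)
print(C.shape, mse_fit, mse_zero)
```

Because the regression acts only on 2-dimensional reduced vectors, swapping in a different regression method (or doing the fit online) changes only this one step, which is the flexibility the abstract describes.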
The q–q Boxplot
Boxplots have become an extremely popular display of distribution summaries for collections of data, especially when we need to visualize summaries for several collections simultaneously. The whiskers in the boxplot show only the extent of the tails for most of the data (with outside values denoted separately); more detailed information about the shape of the tails, such as skewness and "weight" relative to a standard reference distribution, is much better displayed via quantile-quantile (q-q) plots. We incorporate the q-q plot's tail information into the traditional boxplot by replacing the boxplot's whiskers with the tails from a q-q plot, and display these tails with confidence bands for the tails that would be expected from the reference distribution. We describe the construction of the "q-q boxplot" and demonstrate its advantages over earlier proposed boxplot modifications on data from economics and neuroscience, which illustrate q-q boxplots' effectiveness in showing important tail behavior, especially for large datasets. The package qqboxplot (an extension to the ggplot2 package (Wickham, 2016)) is available for the R (R Core Team, 2020) programming language. This is a manuscript of an article published as Rodu, J., & Kafadar, K. (2022). The q–q Boxplot. Journal of Computational and Graphical Statistics, 31(1), 26-39. doi:10.1080/10618600.2021.1938586. Posted with permission of CSAFE.
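The quantity a q-q whisker displays can be sketched numerically (this is a minimal illustration of the idea, not the qqboxplot package's implementation, and the plotting positions and robust standardization below are assumptions): for the tail beyond a quartile, plot standardized sample quantiles against matching quantiles of a reference distribution.

```python
import numpy as np
from statistics import NormalDist

def qq_tail(data, lower=False, reference=NormalDist()):
    """Coordinates for one q-q 'whisker': sample quantiles in the tail
    beyond the quartile, against matching reference quantiles, after
    standardizing by a robust location (median) and scale (IQR)."""
    x = np.sort(np.asarray(data, dtype=float))
    n = len(x)
    probs = (np.arange(1, n + 1) - 0.5) / n      # plotting positions
    med = np.median(x)
    iqr = np.quantile(x, 0.75) - np.quantile(x, 0.25)
    scale = iqr / (reference.inv_cdf(0.75) - reference.inv_cdf(0.25))
    mask = probs < 0.25 if lower else probs > 0.75
    theo = np.array([reference.inv_cdf(p) for p in probs[mask]])
    samp = (x[mask] - med) / scale               # standardized sample tail
    return theo, samp

rng = np.random.default_rng(1)
theo, samp = qq_tail(rng.standard_normal(2000))
print(theo[:3], samp[:3])
```

For Gaussian data the two coordinate sets track each other closely, so the whisker hugs the reference line; skewed or heavy-tailed data bends away from it, which is exactly the tail behavior the q-q boxplot makes visible.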