    Spectral dimensionality reduction for HMMs

    Hidden Markov Models (HMMs) can be accurately approximated from co-occurrence frequencies of pairs and triples of observations using a fast spectral method, in contrast to the usual slow methods like EM or Gibbs sampling. We provide a new spectral method which significantly reduces the number of model parameters that need to be estimated, and yields a sample complexity that does not depend on the size of the observation vocabulary. We present an elementary proof giving bounds on the relative accuracy of probability estimates from our model. (Corollaries show our bounds can be weakened to provide either L1 bounds or KL bounds, which provide easier direct comparisons to previous work.) Our theorem uses conditions that are checkable from the data, instead of putting conditions on the unobservable Markov transition matrix.
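
    As a rough illustration of how pair and triple co-occurrence frequencies can drive a spectral estimate, the sketch below follows the widely used observable-operator construction for spectral HMM learning. It is not the reduced-parameter method this abstract proposes. The function names, the array layout of the triple moments, and the choice of NumPy are all assumptions made for the example.

```python
import numpy as np

def empirical_moments(triples, n):
    """Co-occurrence frequencies from observed triples (x1, x2, x3), symbols in 0..n-1.

    Layout (an assumption of this sketch):
      P1[j]         ~ Pr[x1 = j]
      P21[i, j]     ~ Pr[x2 = i, x1 = j]
      P3x1[x, i, j] ~ Pr[x3 = i, x2 = x, x1 = j]
    """
    triples = np.asarray(triples)
    count = len(triples)
    x1, x2, x3 = triples[:, 0], triples[:, 1], triples[:, 2]
    P1 = np.bincount(x1, minlength=n) / count
    P21 = np.zeros((n, n))
    np.add.at(P21, (x2, x1), 1.0 / count)
    P3x1 = np.zeros((n, n, n))
    np.add.at(P3x1, (x2, x3, x1), 1.0 / count)
    return P1, P21, P3x1

def spectral_hmm(P1, P21, P3x1, m):
    """Observable-operator parameters from the moments, for m hidden states."""
    # Top-m left singular vectors of the pair matrix span the observation subspace.
    U = np.linalg.svd(P21)[0][:, :m]
    b1 = U.T @ P1
    binf = np.linalg.pinv(P21.T @ U) @ P1
    proj_pinv = np.linalg.pinv(U.T @ P21)
    B = np.stack([U.T @ P3x1[x] @ proj_pinv for x in range(len(P1))])
    return b1, binf, B

def sequence_probability(seq, b1, binf, B):
    """Estimate Pr[x1, ..., xt] as binf^T B[xt] ... B[x1] b1."""
    state = b1
    for x in seq:
        state = B[x] @ state
    return float(binf @ state)
```

    With exact moments and a full-rank HMM, this construction reproduces the true sequence probabilities. With empirical moments the estimates are approximate and can even be slightly negative.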

    When black box algorithms are (not) appropriate: a principled prediction-problem ontology

    In the 1980s a new, extraordinarily productive way of reasoning about algorithms emerged. Though this type of reasoning has come to dominate areas of data science, it has been under-discussed and its impact under-appreciated. For example, it is the primary way we reason about "black box" algorithms. In this paper we analyze its current use (i.e., as "the common task framework") and its limitations; we find that a large class of prediction problems is inappropriate for this type of reasoning. Further, we find the common task framework does not provide a foundation for the deployment of an algorithm in a real-world situation. Building on its core features, we identify a class of problems where this new form of reasoning can be used in deployment. We purposefully develop a novel framework so both technical and non-technical people can discuss and identify key features of their prediction problem and whether or not it is suitable for this new kind of reasoning.

    Bridging the Usability Gap: Theoretical and Methodological Advances for Spectral Learning of Hidden Markov Models

    The Baum-Welch (B-W) algorithm is the most widely accepted method for inferring hidden Markov models (HMMs). However, it is prone to getting stuck in local optima, and can be too slow for many real-time applications. Spectral learning of HMMs (SHMM), based on the method of moments (MOM), has been proposed in the literature to overcome these obstacles. Despite its promise, asymptotic theory for SHMM has been elusive, and the long-run performance of SHMM can degrade due to unchecked propagation of error. In this paper, we (1) provide an asymptotic distribution for the approximation error of the likelihood estimated by SHMM, (2) propose a novel algorithm called projected SHMM (PSHMM) that mitigates the problem of error propagation, and (3) develop online learning variations of both SHMM and PSHMM that accommodate potential nonstationarity. We compare the performance of SHMM with PSHMM and estimation through the B-W algorithm on both simulated data and data from real-world applications, and find that PSHMM not only retains the computational advantages of SHMM, but also provides more robust estimation and forecasting.

    Nonlinear Permuted Granger Causality

    Granger causal inference is a contentious but widespread method used in fields ranging from economics to neuroscience. The original definition addresses the notion of causality in time series by establishing functional dependence conditional on a specified model. Adaptation of Granger causality to nonlinear data remains challenging, and many methods apply in-sample tests that do not incorporate out-of-sample predictability, leading to concerns of model overfitting. To allow for out-of-sample comparison, a measure of functional connectivity is explicitly defined using permutations of the covariate set. Artificial neural networks serve as featurizers of the data to approximate any arbitrary, nonlinear relationship, and consistent estimation of the variance for each permutation is shown under certain conditions on the featurization process and the model residuals. Performance of the permutation method is compared to penalized variable selection, naive replacement, and omission techniques via simulation, and it is applied to neuronal responses to acoustic stimuli in the auditory cortex of anesthetized rats. Targeted use of the Granger causal framework, when prior knowledge of the causal mechanisms in a dataset is limited, can help to reveal potential predictive relationships between sets of variables that warrant further study.
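
    The general idea of an out-of-sample permutation comparison can be sketched as follows. This is a deliberately simplified stand-in and not the estimator defined in the paper: it assumes a linear one-lag model fitted by least squares in place of neural-network featurizers, and the function name `permutation_comparison` and its parameters are hypothetical.

```python
import numpy as np

def permutation_comparison(x, y, n_perm=200, train_frac=0.7, seed=0):
    """Compare held-out error before and after permuting a candidate covariate.

    Assumed model for this sketch: y[t] ~ intercept + y[t-1] + x[t-1].
    Returns the held-out MSE with x intact and an array of held-out MSEs
    obtained after shuffling the lagged x column of the test set.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = y[1:]
    features = np.column_stack([np.ones(len(target)), y[:-1], x[:-1]])

    # Fit on the first part of the series only, so errors below are out of sample.
    split = int(train_frac * len(target))
    beta, *_ = np.linalg.lstsq(features[:split], target[:split], rcond=None)

    test_X, test_y = features[split:], target[split:]
    base_mse = np.mean((test_y - test_X @ beta) ** 2)

    perm_mse = np.empty(n_perm)
    for k in range(n_perm):
        shuffled = test_X.copy()
        # Permuting breaks the temporal link between x and y but keeps x's marginal.
        shuffled[:, 2] = rng.permutation(shuffled[:, 2])
        perm_mse[k] = np.mean((test_y - shuffled @ beta) ** 2)
    return base_mse, perm_mse
```

    If permuting the covariate consistently raises the held-out error, the covariate carries predictive information about the target beyond the target's own past, which is the kind of relationship the Granger framework is meant to flag.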

    Change Point Detection with Conceptors

    Offline change point detection retrospectively locates change points in a time series. Many nonparametric methods that target i.i.d. mean and variance changes fail in the presence of nonlinear temporal dependence, and model-based methods require a known, rigid structure. For the at-most-one-change-point problem, we propose the use of a conceptor matrix to learn the characteristic dynamics of a baseline training window with arbitrary dependence structure. The associated echo state network acts as a featurizer of the data, and change points are identified from the nature of the interactions between the features and their relationship to the baseline state. This model-agnostic method can suggest potential locations of interest that warrant further study. We prove that, under mild assumptions, the method provides a consistent estimate of the true change point, and quantile estimates are produced via a moving block bootstrap of the original data. The method is evaluated with clustering metrics and Type 1 error control on simulated data, and applied to publicly available neural data from rats experiencing bouts of non-REM sleep prior to exploration of a radial maze. With sufficient spacing, the framework provides a simple extension to the sparse, multiple change point problem.
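
    For readers unfamiliar with conceptors, the sketch below shows one common way to compute a conceptor matrix from echo state network states, using the standard definition C = R (R + aperture^-2 I)^-1, where R is the state correlation matrix. The reservoir size, weight scaling, washout length, and aperture are assumptions chosen for illustration, and the change point statistic from the paper is not reproduced here.

```python
import numpy as np

def reservoir_states(signal, n_units=50, spectral_radius=0.9, washout=50, seed=0):
    """Drive a random echo state network with a scalar signal and collect its states."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_units, n_units))
    # Rescale so the largest eigenvalue magnitude equals the requested spectral radius.
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    w_in = rng.normal(size=n_units)
    bias = 0.2 * rng.normal(size=n_units)

    x = np.zeros(n_units)
    states = []
    for u in signal:
        x = np.tanh(W @ x + w_in * u + bias)
        states.append(x)
    # Discard the initial transient before the state reflects the input dynamics.
    return np.array(states[washout:])

def conceptor(states, aperture=10.0):
    """Conceptor C = R (R + aperture^-2 I)^-1 for the state correlation matrix R."""
    R = states.T @ states / len(states)
    regularized = R + aperture ** -2 * np.eye(R.shape[0])
    # R and the inverse of (R + cI) commute, so solving this system gives the same C.
    return np.linalg.solve(regularized, R)

# Usage: learn a conceptor from a baseline window of a sinusoidal signal.
baseline = np.sin(0.2 * np.arange(400))
C = conceptor(reservoir_states(baseline))
```

    The resulting matrix is symmetric with eigenvalues in [0, 1), so it acts as a soft projection onto the directions of state space that the baseline dynamics occupy.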

    Spectral estimation of hidden Markov models

    This thesis extends and improves methods for estimating key quantities of hidden Markov models through spectral method-of-moments estimation. Unlike traditional estimation methods like EM and Gibbs sampling, these estimation methods, which we call spectral HMMs (sHMMs), are incredibly fast, do not require multiple restarts, and come with provable guarantees. Our first result improves upon the original algorithm for spectral estimation of hidden Markov models by estimating the parameters from fully reduced data. We also show that the parameters developed in the fully reduced dimensional version can be estimated using various forms of regression, which can lead to major speed gains and allows flexibility in the estimation scheme. We then extend the algorithm beyond basic hidden Markov models to latent variable tree structures that have linguistic applications, especially dependency parsing, and finally to hidden Markov models in which the output is a high-dimensional, continuously distributed variable. We show that spectral estimation of hidden Markov models can be factored into two major components: estimation of the hidden state space dynamics, and estimation of the observation probability distributions. This leads to extremely flexible estimation procedures that can be tailored precisely for the task of interest. These tools are all simple to implement, fast, and naturally incorporate dimension reduction, which allows them to scale gracefully as the dimension of the data increases.

    The q–q Boxplot

    Boxplots have become an extremely popular display of distribution summaries for collections of data, especially when we need to visualize summaries for several collections simultaneously. The whiskers in the boxplot show only the extent of the tails for most of the data (with outside values denoted separately); more detailed information about the shape of the tails, such as skewness and "weight" relative to a standard reference distribution, is much better displayed via quantile-quantile (q-q) plots. We incorporate the q-q plot's tail information into the traditional boxplot by replacing the boxplot's whiskers with the tails from a q-q plot, and display these tails with confidence bands for the tails that would be expected from the tails of the reference distribution. We describe the construction of the "q-q boxplot" and demonstrate its advantages over earlier proposed boxplot modifications on data from economics and neuroscience, which illustrate q-q boxplots' effectiveness in showing important tail behavior, especially for large datasets. The package qqboxplot (an extension to the ggplot2 package (Wickham, 2016)) is available for the R (R Core Team, 2020) programming language. This is a manuscript of an article published as Rodu, J., & Kafadar, K. (2022). The q–q Boxplot. Journal of Computational and Graphical Statistics, 31(1), 26-39. doi:10.1080/10618600.2021.1938586. Posted with permission of CSAFE.