Diffusion of Context and Credit Information in Markovian Models

Authors: Bengio Y., Frasconi P.
Publication date: 01/01/1995

This paper studies the problem of ergodicity of transition probability matrices in Markovian models, such as hidden Markov models (HMMs), and how it makes learning to represent long-term context for sequential data very difficult. This phenomenon hurts the forward propagation of long-term context information, as well as the learning of a hidden state representation for long-term context, which depends on propagating credit information backwards in time. Using results from Markov chain theory, we show that this problem of diffusion of context and credit is reduced when the transition probabilities approach 0 or 1, i.e., when the transition probability matrices are sparse and the model is essentially deterministic. The results found in this paper apply to learning approaches based on continuous optimization, such as gradient descent and the Baum-Welch algorithm.
Comment: See http://www.jair.org/ for any accompanying file
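As a minimal illustration of the diffusion effect this abstract describes (a sketch under simplifying assumptions, not the paper's analysis), the snippet below raises a symmetric two-state transition matrix to a power and measures how far apart its rows remain. When the rows of the t-step matrix coincide, the state t steps ahead no longer depends on the starting state, so the context has been lost. The helper names `two_state`, `mat_pow`, `row_spread`, and `context_after` are hypothetical and chosen only for this example.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(a, t):
    """Return the t-step transition matrix a**t by repeated multiplication."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        result = mat_mul(result, a)
    return result


def row_spread(a):
    """Largest L1 distance between any two rows of a.

    A spread near 0 means every starting state leads to almost the same
    distribution, i.e. the initial context has diffused away.
    """
    return max(sum(abs(x - y) for x, y in zip(r1, r2)) for r1 in a for r2 in a)


def two_state(p_stay):
    """Symmetric two-state chain (an assumption made for this sketch)."""
    return [[p_stay, 1.0 - p_stay], [1.0 - p_stay, p_stay]]


def context_after(a, t):
    """Row spread of the t-step transition matrix."""
    return row_spread(mat_pow(a, t))


mixing = two_state(0.6)      # probabilities far from 0 and 1
near_det = two_state(0.99)   # probabilities close to 0 and 1

print("mixing chain, 10 steps:       ", context_after(mixing, 10))
print("near-deterministic, 10 steps: ", context_after(near_det, 10))
```

For this particular chain the spread after t steps is 2(2p - 1)^t, where p is the probability of staying in the same state, so the mixing chain retains roughly 2e-7 of spread after ten steps while the near-deterministic chain retains roughly 1.6. This is consistent with the abstract's statement that diffusion is reduced as transition probabilities approach 0 or 1.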

arXiv.org e-Print Archive

CiteSeerX

Recovering Structured Probability Matrices

Authors: Huang Qingqing, Kakade Sham M., Kong Weihao, Valiant Gregory
Publication date: 01/01/2018

We consider the problem of accurately recovering a matrix B of size M by M, which represents a probability distribution over M^2 outcomes, given access to an observed matrix of "counts" generated by taking independent samples from the distribution B. How can structural properties of the underlying matrix B be leveraged to yield computationally efficient and information theoretically optimal reconstruction algorithms? When can accurate reconstruction be accomplished in the sparse data regime? This basic problem lies at the core of a number of questions that are currently being considered by different communities, including building recommendation systems and collaborative filtering in the sparse data regime, community detection in sparse random graphs, learning structured models such as topic models or hidden Markov models, and the efforts from the natural language processing community to compute "word embeddings". Our results apply to the setting where B has a low rank structure. For this setting, we propose an efficient algorithm that accurately recovers the underlying M by M matrix using Theta(M) samples. This result easily translates to Theta(M) sample algorithms for learning topic models and learning hidden Markov models. These linear sample complexities are optimal, up to constant factors, in an extremely strong sense: even testing basic properties of the underlying matrix (such as whether it has rank 1 or 2) requires Omega(M) samples. We provide an even stronger lower bound: distinguishing whether a sequence of observations was drawn from the uniform distribution over M observations or generated by an HMM with two hidden states requires Omega(M) observations. This precludes sublinear-sample hypothesis tests for basic properties, such as identity or uniformity, as well as sublinear-sample estimators for quantities such as the entropy rate of HMMs.

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Hidden Markov Models and their Application for Predicting Failure Events

Authors: A Gelman, AR Cassandra, DM Blei, G Shani, GA Satten, GD Forney, KP Murphy, LJ Wei, LP Kaelbling, MD Hoffman, P Koprinkova-Hristova, WB Powell, Y Wu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 20/05/2020

We show how Markov mixed membership models (MMMM) can be used to predict the degradation of assets. We model the degradation path of individual assets to predict overall failure rates. Instead of a separate distribution for each hidden state, we use hierarchical mixtures of distributions in the exponential family. In our approach the observation distribution of the states is a finite mixture distribution of a small set of (simpler) distributions shared across all states. Using tied-mixture observation distributions offers several advantages. The mixtures act as a regularization for typically very sparse problems, and they reduce the computational effort for the learning algorithm since there are fewer distributions to be found. Using shared mixtures enables sharing of statistical strength between the Markov states and thus transfer learning. We determine for individual assets the trade-off between the risk of failure and extended operating hours by combining an MMMM with a partially observable Markov decision process (POMDP) to dynamically optimize the policy for when and how to maintain the asset.
Comment: Will be published in the proceedings of ICCS 2020; @Booklet{EasyChair:3183, author = {Paul Hofmann and Zaid Tashman}, title = {Hidden Markov Models and their Application for Predicting Failure Events}, howpublished = {EasyChair Preprint no. 3183}, year = {EasyChair, 2020}
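The tied-mixture idea in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' model: the shared components are taken to be Gaussians (one member of the exponential family), and the state names, component parameters, and mixture weights are invented for the example. Each hidden state reuses the same small pool of component distributions and differs only in its mixture weights.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# Pool of (mean, std) components shared by every hidden state.
# The values are hypothetical, chosen only for illustration.
SHARED_COMPONENTS = [(0.0, 1.0), (5.0, 1.0), (10.0, 2.0)]

# Per-state mixture weights over the shared pool (hypothetical values).
# Only these weights are state-specific; the components are tied.
STATE_WEIGHTS = {
    "healthy": [0.8, 0.2, 0.0],
    "worn":    [0.1, 0.7, 0.2],
    "failing": [0.0, 0.2, 0.8],
}


def emission_density(state, x):
    """Observation density of x in the given state under the tied mixture."""
    weights = STATE_WEIGHTS[state]
    return sum(w * gaussian_pdf(x, mean, std)
               for w, (mean, std) in zip(weights, SHARED_COMPONENTS))


for state in STATE_WEIGHTS:
    print(state, emission_density(state, 0.0))
```

With this layout, adding a hidden state adds one weight vector rather than a new set of distribution parameters, which is one way to read the abstract's remark that there are fewer distributions to be found.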

arXiv.org e-Print Archive

Crossref

Sparse Nested Markov models with Log-linear Parameters

Authors: Evans Robin J., Richardson Thomas S., Robins James M., Shpitser Ilya
Publication date: 01/01/2013

Hidden variables are ubiquitous in practical data analysis, and therefore modeling marginal densities and doing inference with the resulting models is an important problem in statistics, machine learning, and causal inference. Recently, a new type of graphical model, called the nested Markov model, was developed which captures equality constraints found in marginals of directed acyclic graph (DAG) models. Some of these constraints, such as the so-called 'Verma constraint', strictly generalize conditional independence. To make modeling and inference with nested Markov models practical, it is necessary to limit the number of parameters in the model, while still correctly capturing the constraints in the marginal of a DAG model. Placing such limits is similar in spirit to sparsity methods for undirected graphical models and regression models. In this paper, we give a log-linear parameterization which allows sparse modeling with nested Markov models. We illustrate the advantages of this parameterization with a simulation study.
Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013)

arXiv.org e-Print Archive

Southampton (e-Prints Soton)

Oxford University Research Archive

Interpretable Input-Output Hidden Markov Model-Based Deep Reinforcement Learning for the Predictive Maintenance of Turbofan Engines

Authors: Abbas Ammar N., Chasparis Georgios C., Kelleher John
Publication venue: Technological University Dublin
Publication date: 01/01/2023

An open research question in deep reinforcement learning is how to focus policy learning on key decisions within a sparse domain. This paper focuses on combining the advantages of input-output hidden Markov models and reinforcement learning. We propose a novel hierarchical modeling methodology that, at a high level, detects and interprets the root cause of a failure as well as the health degradation of the turbofan engine, while at a low level, provides the optimal replacement policy. This approach outperforms baseline deep reinforcement learning (DRL) models and has performance comparable to that of a state-of-the-art reinforcement learning system while being more interpretable.

Arrow@TUDublin

Learning the Structure of Deep Sparse Graphical Models

Authors: Adams Ryan Prescott, Ghahramani Zoubin, Wallach Hanna M.
Publication date: 01/01/2010

Deep belief networks are a powerful way to model complex probability distributions. However, learning the structure of a belief network, particularly one with hidden units, is difficult. The Indian buffet process has been used as a nonparametric Bayesian prior on the directed structure of a belief network with a single infinitely wide hidden layer. In this paper, we introduce the cascading Indian buffet process (CIBP), which provides a nonparametric prior on the structure of a layered, directed belief network that is unbounded in both depth and width, yet allows tractable inference. We use the CIBP prior with the nonlinear Gaussian belief network so each unit can additionally vary its behavior between discrete and continuous representations. We provide Markov chain Monte Carlo algorithms for inference in these belief networks and explore the structures learned on several image data sets.
Comment: 20 pages, 6 figures, AISTATS 2010, Revise

arXiv.org e-Print Archive

ScholarWorks@UMass Amherst