A Bayesian Approach to Learning Hidden Markov Model Topology with Applications to Biological Sequence Analysis

Schliep, Alexander

thesis

Authors: Alexander Schliep
Publication date: 1 January 2001

Abstract

Hidden Markov models (HMMs) are a widely and successfully used tool in statistical modeling and statistical pattern recognition. One fundamental problem in applying HMMs is finding the underlying architecture, or topology, particularly when the application domain offers no strong evidence for it, e.g., in black-box modeling. Topology matters both for obtaining good parameter estimates and for performance: a model with “too many” states, and hence too many parameters, requires too much training data, while a model with “not enough” states prevents the HMM from capturing subtle statistical patterns. We have developed a novel algorithm that, given sequence data originating from an ergodic process, infers an HMM together with its topology and its parameters. We introduce a Bayesian approach