
    A linear memory algorithm for Baum-Welch training

    Background: Baum-Welch training is an expectation-maximisation algorithm for training the emission and transition probabilities of hidden Markov models in a fully automated way. Methods and results: We introduce a linear-space algorithm for Baum-Welch training. For a hidden Markov model with M states, T free transition and E free emission parameters, and an input sequence of length L, our new algorithm requires O(M) memory and O(L M T_max (T + E)) time for one Baum-Welch iteration, where T_max is the maximum number of states that any state is connected to. The most memory-efficient algorithm until now was the checkpointing algorithm with O(log(L) M) memory and O(log(L) L M T_max) time requirement. Our novel algorithm thus renders the memory requirement completely independent of the length of the training sequences. More generally, for an n-hidden Markov model and n input sequences of length L, the memory requirement of O(log(L) L^(n-1) M) is reduced to O(L^(n-1) M) memory while the running time is changed from O(log(L) L^n M T_max + L^n (T + E)) to O(L^n M T_max (T + E)). Conclusions: For the large class of hidden Markov models used, for example, in gene prediction, whose number of states does not scale with the length of the input sequence, our novel algorithm can thus be both faster and more memory-efficient than any of the existing algorithms. Comment: 14 pages, 1 figure. Version 2: fixed some errors; final version of paper.
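
    For orientation, the sketch below shows a conventional Baum-Welch iteration for a discrete-emission HMM that keeps the full forward and backward tables in memory, i.e. the O(L M) baseline whose dependence on L the algorithm above removes. It is an illustrative baseline with names of our choosing, not the paper's linear-memory algorithm.

```python
import numpy as np

def baum_welch_iteration(obs, A, B, pi):
    """One conventional Baum-Welch iteration for a discrete-emission HMM.

    obs : length-L sequence of observation symbols (integers)
    A   : (M, M) transition matrix, B : (M, K) emission matrix, pi : (M,) start probs
    Stores the full forward/backward tables -> O(L*M) memory; the linear-memory
    algorithm described above avoids exactly this storage.
    """
    obs = np.asarray(obs)
    L, M = len(obs), A.shape[0]

    # Forward pass with per-step scaling to avoid underflow.
    alpha = np.zeros((L, M)); scale = np.zeros(L)
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum(); alpha[0] /= scale[0]
    for t in range(1, L):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum(); alpha[t] /= scale[t]

    # Backward pass, reusing the same scale factors.
    beta = np.zeros((L, M))
    beta[-1] = 1.0
    for t in range(L - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]

    # Posterior state occupancies and expected transition counts.
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = np.zeros((M, M))
    for t in range(L - 1):
        x = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
        xi += x / x.sum()

    # Re-estimate the parameters from the expected counts.
    new_pi = gamma[0]
    new_A = xi / xi.sum(axis=1, keepdims=True)
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[obs == k].sum(axis=0)
    new_B /= new_B.sum(axis=1, keepdims=True)
    return new_A, new_B, new_pi
```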

    HMMConverter 1.0: a toolbox for hidden Markov models

    Hidden Markov models (HMMs) and their variants are widely used in Bioinformatics applications that analyze and compare biological sequences. Designing a novel application requires the insight of a human expert to define the model's architecture. The implementation of prediction algorithms and of algorithms to train the model's parameters, however, can be a time-consuming and error-prone task. We here present HMMConverter, a software package for setting up probabilistic HMMs and pair-HMMs as well as generalized HMMs and pair-HMMs. The user defines the model itself and the algorithms to be used via an XML file, which is then directly translated into efficient C++ code. The software package provides linear-memory prediction algorithms such as the Hirschberg algorithm, supports banding and the integration of prior probabilities, and is the first to provide computationally efficient linear-memory algorithms for automatic parameter training. Users of HMMConverter can thus set up complex applications with a minimum of effort and also perform parameter training and data analyses for large data sets.
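
    As a concrete picture of the kind of prediction algorithm such a toolbox generates, here is a generic log-space Viterbi decoder for a discrete-emission HMM. It is a textbook sketch for illustration only, not code produced by HMMConverter, and it omits the Hirschberg and banding refinements mentioned above.

```python
import numpy as np

def viterbi(obs, A, B, pi):
    """Most probable state path for a discrete-emission HMM, in log space."""
    obs = np.asarray(obs)
    L, M = len(obs), A.shape[0]
    logA, logB = np.log(A), np.log(B)

    delta = np.log(pi) + logB[:, obs[0]]   # best log-prob of a path ending in each state
    back = np.zeros((L, M), dtype=int)     # backpointers for path recovery
    for t in range(1, L):
        scores = delta[:, None] + logA     # scores[i, j]: best path entering j from i
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[:, obs[t]]

    path = np.zeros(L, dtype=int)
    path[-1] = delta.argmax()
    for t in range(L - 1, 0, -1):          # trace the pointers back to the start
        path[t - 1] = back[t, path[t]]
    return path
```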

    Duration learning for analysis of nanopore ionic current blockades

    Background: Ionic current blockade signal processing, for use in nanopore detection, offers a promising new way to analyze single-molecule properties, with potential implications for DNA sequencing. The alpha-Hemolysin transmembrane channel interacts with a translocating molecule in a nontrivial way, frequently evidenced by a complex ionic flow blockade pattern. Typically, recorded current blockade signals have several levels of blockade, with various durations, all obeying a fixed statistical profile for a given molecule. Hidden Markov Model (HMM) based duration-learning experiments on artificial two-level Gaussian blockade signals helped us identify a proper modeling framework, which we then applied to real multi-level DNA hairpin blockade signals. Results: The identified upper-level blockade state is observed with durations that are geometrically distributed (consistent with a physical decay process for remaining in any given state). We show that a mixture of convolution chains of geometrically distributed states is better suited to representing multimodal, long-tailed duration phenomena. Based on the learned HMM profiles we are able to classify 9 base-pair DNA hairpins with accuracy of up to 99.5% on signals from same-day experiments. Conclusion: We have demonstrated several implementations for de novo estimation of the duration distribution probability density function within an HMM framework and applied our model topology to real data. The proposed design could be useful in molecular analysis based on nanopore current blockade signals.
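
    To make the duration statements concrete: a single HMM state with self-loop probability p has a geometrically distributed dwell time, a serial chain of such states has the convolved (negative-binomial-like) duration distribution, and a mixture of such chains can produce multimodal, long-tailed profiles. The sketch below builds such a mixture numerically; the parameter values are illustrative and not taken from the paper.

```python
import numpy as np

def geometric_pmf(p_stay, dmax):
    """Duration pmf of one HMM state with self-loop probability p_stay:
    P(D = d) = (1 - p_stay) * p_stay**(d - 1) for d >= 1 (a memoryless decay).
    Index d of the returned array is the duration; entry 0 is unused (= 0)."""
    pmf = np.zeros(dmax + 1)
    d = np.arange(1, dmax + 1)
    pmf[1:] = (1 - p_stay) * p_stay ** (d - 1)
    return pmf

def chain_pmf(p_stay, n_states, dmax):
    """Duration pmf of a serial chain of n identical geometric states:
    the n-fold convolution of the geometric pmf (negative-binomial shaped),
    which is peaked rather than strictly decreasing.  Truncated at dmax."""
    g = geometric_pmf(p_stay, dmax)
    out = g.copy()
    for _ in range(n_states - 1):
        out = np.convolve(out, g)[:dmax + 1]
    return out

# A mixture of a convolution chain (sharp short mode) and a slowly decaying
# geometric component (long tail) gives the kind of multimodal, long-tailed
# duration profile described above.  Weights and p_stay values are made up.
dmax = 300
duration_pmf = 0.6 * chain_pmf(0.80, 4, dmax) + 0.4 * geometric_pmf(0.99, dmax)
```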

    A review of modern developments in time series forecasting using hidden Markov models

    A review of modern developments in the application of Markov processes to the problem of time series forecasting is presented. The main problems that arise when using hidden Markov models (HMMs) for forecasting, together with algorithms for solving them, are considered. The basic algorithm for time series forecasting with HMMs is presented, an overview of modifications of this basic algorithm is given, and the range of open questions in this area of research is discussed.
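
    The basic forecasting scheme such reviews describe can be summarised as: fit an HMM to the observed history, filter to obtain the current state distribution, propagate it one step through the transition matrix, and read off the expected emission. Below is a minimal one-step-ahead sketch assuming Gaussian emissions; it illustrates the general idea, not the specific algorithm surveyed.

```python
import numpy as np

def hmm_one_step_forecast(series, A, means, stds, pi):
    """Basic HMM forecast: filter the observed history, propagate the state
    distribution one step with the transition matrix A, and return the
    expected next value under the state-conditional Gaussian emission means."""
    def gauss(x):
        # Per-state Gaussian emission densities for a scalar observation x.
        return np.exp(-0.5 * ((x - means) / stds) ** 2) / (stds * np.sqrt(2 * np.pi))

    # Forward filtering with per-step normalisation (avoids underflow).
    state = pi * gauss(series[0])
    state /= state.sum()
    for x in series[1:]:
        state = (state @ A) * gauss(x)
        state /= state.sum()

    next_state = state @ A        # predicted state distribution one step ahead
    return next_state @ means     # expected observation at the next time step
```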

    Implementing EM and Viterbi algorithms for Hidden Markov Model in linear memory

    Background: The Baum-Welch learning procedure for Hidden Markov Models (HMMs) provides a powerful tool for tailoring HMM topologies to data for use in knowledge discovery and clustering. A linear-memory procedure recently proposed by Miklós, I. and Meyer, I.M. describes a memory-sparse version of the Baum-Welch algorithm with modifications to the original probabilistic table topologies that make memory use independent of sequence length (and linearly dependent on state number). The original description of the technique contains some errors that we amend. We then compare the corrected implementation on a variety of data sets with conventional and checkpointing implementations. Results: We provide a correct recurrence relation for the emission parameter estimate and extend it to parameter estimates of the Normal distribution. To accelerate estimation of the prior state probabilities, and to decrease memory use, we reverse the originally proposed forward sweep. We describe the different scaling strategies necessary in all real implementations of the algorithm to prevent underflow. We also describe our approach to a linear-memory implementation of the Viterbi decoding algorithm (with memory linear in the sequence length, while approximately independent of state number). We demonstrate the use of the linear-memory implementation on an extended Duration Hidden Markov Model (DHMM) and on an HMM with a spike-detection topology. Comparing the various implementations of the Baum-Welch procedure, we find that the checkpointing algorithm produces the best overall tradeoff between memory use and speed. In cases where the sequence length is very large (for Baum-Welch) or the state number is very large (for Viterbi), the linear-memory methods outlined here may offer some utility. Conclusion: Our performance-optimized Java implementations of the Baum-Welch algorithm are available at http://logos.cs.uno.edu/~achurban. The described methods and implementations will aid sequence alignment, gene structure prediction, HMM profile training, nanopore ionic flow blockade analysis and many other domains that require efficient HMM training with EM.
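
    For reference, the simplest (square-root) variant of the checkpointing idea that the comparison above favours can be sketched as follows: keep only every c-th forward column (c roughly sqrt(L)) and recompute the columns of each block during the backward sweep, trading roughly one extra forward pass for O(sqrt(L) M) instead of O(L M) memory. The sketch accumulates only the expected emission counts of the E-step; it is illustrative and not the authors' Java implementation.

```python
import numpy as np

def checkpointed_emission_counts(obs, A, B, pi):
    """E-step expected emission counts using square-root checkpointing.

    Only every c-th forward column is stored; the columns inside each block
    are recomputed during the backward sweep, so memory is O(sqrt(L)*M)
    rather than O(L*M).
    """
    obs = np.asarray(obs)
    L, M = len(obs), A.shape[0]
    c = max(1, int(np.sqrt(L)))

    def fwd(prev, t):                        # one scaled forward step
        a = (prev @ A) * B[:, obs[t]]
        return a / a.sum()

    # Forward sweep: keep only the checkpoint columns 0, c, 2c, ...
    checkpoints = {}
    alpha = pi * B[:, obs[0]]
    alpha /= alpha.sum()
    checkpoints[0] = alpha.copy()
    for t in range(1, L):
        alpha = fwd(alpha, t)
        if t % c == 0:
            checkpoints[t] = alpha.copy()

    # Backward sweep: recompute each block's forward columns from its checkpoint.
    counts = np.zeros_like(B)                # expected emission counts E[i, k]
    beta = np.ones(M)                        # (unnormalised) backward variables
    for start in range(((L - 1) // c) * c, -1, -c):
        end = min(start + c, L)              # block covers positions [start, end)
        block = np.zeros((end - start, M))
        block[0] = checkpoints[start]
        for t in range(start + 1, end):
            block[t - start] = fwd(block[t - start - 1], t)
        for t in range(end - 1, start - 1, -1):
            gamma = block[t - start] * beta  # posterior up to a per-t constant
            counts[:, obs[t]] += gamma / gamma.sum()
            if t > 0:                        # step beta back to position t-1
                beta = A @ (B[:, obs[t]] * beta)
                beta /= beta.sum()
    return counts                            # row-normalise to re-estimate B
```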

    ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-Efficient Genome Analysis

    Profile hidden Markov models (pHMMs) are widely employed in various bioinformatics applications to identify similarities between biological sequences, such as DNA or protein sequences. In pHMMs, sequences are represented as graph structures whose states and transitions are assigned probabilities; these probabilities are subsequently used to compute the similarity score between a sequence and a pHMM graph. The Baum-Welch algorithm, a prevalent and highly accurate method, utilizes these probabilities to optimize and compute similarity scores. However, the Baum-Welch algorithm is computationally intensive, and existing solutions offer either software-only or hardware-only approaches with fixed pHMM designs. We identify an urgent need for a flexible, high-performance, and energy-efficient HW/SW co-design to address the major inefficiencies in the Baum-Welch algorithm for pHMMs. We introduce ApHMM, the first flexible acceleration framework designed to significantly reduce both the computational and energy overheads associated with the Baum-Welch algorithm for pHMMs. ApHMM tackles the major inefficiencies in the Baum-Welch algorithm by 1) designing flexible hardware to accommodate various pHMM designs, 2) exploiting predictable data dependency patterns through on-chip memory with memoization techniques, 3) rapidly filtering out negligible computations using a hardware-based filter, and 4) minimizing redundant computations. ApHMM achieves substantial speedups of 15.55x - 260.03x, 1.83x - 5.34x, and 27.97x when compared to CPU, GPU, and FPGA implementations of the Baum-Welch algorithm, respectively. ApHMM outperforms state-of-the-art CPU implementations in three key bioinformatics applications: 1) error correction, 2) protein family search, and 3) multiple sequence alignment, by 1.29x - 59.94x, 1.03x - 1.75x, and 1.03x - 1.95x, respectively, while improving their energy efficiency by 64.24x - 115.46x, 1.75x, and 1.96x. Comment: Accepted to ACM TACO.
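
    The filtering idea in point 3 can be pictured in software terms: source states whose probability mass is negligible are excluded from a forward-step update, so only significant states participate. The snippet below is a loose software analogy of that filter under an assumed relative threshold, not ApHMM's hardware datapath.

```python
import numpy as np

def filtered_forward_step(alpha_prev, A, emit_col, eps=1e-6):
    """One forward-step update that skips negligible contributions.

    Source states whose probability mass is below eps relative to the column
    maximum are ignored, so only the 'significant' states take part in the
    update.  This trades a tiny amount of accuracy for far fewer
    multiplications on sparse or sharply peaked pHMM columns.
    """
    keep = alpha_prev > eps * alpha_prev.max()    # significant source states
    alpha = (alpha_prev[keep] @ A[keep, :]) * emit_col
    return alpha / alpha.sum()                    # renormalise the column
```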