
    Decoding Hidden Markov Models Faster Than Viterbi Via Online Matrix-Vector (max, +)-Multiplication

    In this paper, we present a novel algorithm for the maximum a posteriori decoding (MAPD) of time-homogeneous Hidden Markov Models (HMMs), improving the worst-case running time of the classical Viterbi algorithm by a logarithmic factor. In our approach, we interpret the Viterbi algorithm as a repeated computation of matrix-vector $(\max,+)$-multiplications. On time-homogeneous HMMs, this computation is online: a matrix, known in advance, has to be multiplied with several vectors revealed one at a time. Our main contribution is an algorithm solving this version of matrix-vector $(\max,+)$-multiplication in subquadratic time, by performing a polynomial preprocessing of the matrix. Employing this fast multiplication algorithm, we solve the MAPD problem in $O(mn^2/\log n)$ time for any time-homogeneous HMM of size $n$ and observation sequence of length $m$, with an extra polynomial preprocessing cost negligible for $m > n$. To the best of our knowledge, this is the first algorithm for the MAPD problem requiring subquadratic time per observation, under the only assumption (usually verified in practice) that the transition probability matrix does not change with time. Comment: AAAI 2016, to appear
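The reading of Viterbi as repeated $(\max,+)$ matrix-vector products can be sketched as follows. This is only the classical $O(mn^2)$ baseline in log space, not the subquadratic algorithm the abstract announces; the function names, the list-of-lists matrix layout, and the toy model in the usage note are assumptions made for illustration.

```python
NEG_INF = float("-inf")  # log-probability of an impossible event


def maxplus_matvec(A, v):
    """(max,+) product: w[j] = max_i (v[i] + A[i][j]); also return each argmax i."""
    n = len(v)
    w, arg = [], []
    for j in range(n):
        best_i = max(range(n), key=lambda i: v[i] + A[i][j])
        w.append(v[best_i] + A[best_i][j])
        arg.append(best_i)
    return w, arg


def viterbi(log_init, log_trans, log_emit, observations):
    """Most likely state path for a time-homogeneous HMM, all inputs in log space.

    log_trans[i][j] is the log-probability of moving from state i to state j;
    log_emit[s][o] is the log-probability of emitting symbol o in state s.
    The same matrix log_trans is reused at every step (time-homogeneity).
    """
    n = len(log_init)
    score = [log_init[s] + log_emit[s][observations[0]] for s in range(n)]
    back = []  # back[t][j] = best predecessor of state j at step t + 1
    for obs in observations[1:]:
        score, arg = maxplus_matvec(log_trans, score)  # one product per observation
        score = [score[s] + log_emit[s][obs] for s in range(n)]
        back.append(arg)
    state = max(range(n), key=lambda s: score[s])
    best = score[state]
    path = [state]
    for arg in reversed(back):  # follow the stored argmaxes backwards
        state = arg[state]
        path.append(state)
    path.reverse()
    return path, best
```

On a hypothetical two-state model with initial probabilities (0.6, 0.4), transition rows (0.7, 0.3) and (0.4, 0.6), and emission rows (0.5, 0.4, 0.1) and (0.1, 0.3, 0.6), the observation sequence 0, 1, 2 decodes to the state path 0, 0, 1. Each observation costs one matrix-vector product against a fixed matrix, which is the step the abstract proposes to speed up by preprocessing.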

    Dynamic Controllability Made Simple

    Simple Temporal Networks with Uncertainty (STNUs) are a well-studied model for representing temporal constraints, where some intervals (contingent links) have an unknown but bounded duration, discovered only during execution. An STNU is dynamically controllable (DC) if there exists a strategy to execute its time-points satisfying all the constraints, regardless of the actual duration of contingent links revealed during execution. In this work we present a new system of constraint propagation rules for STNUs, which is sound-and-complete for DC checking. Our system comprises just three rules which, unlike those proposed in all previous works, only generate unconditioned constraints. In particular, after applying our sound rules, the network remains an STNU in all respects. Moreover, our completeness proof is short and non-algorithmic, based on the explicit construction of a valid execution strategy. This is a substantial simplification of the theory which underlies all the polynomial-time algorithms for DC-checking. Our analysis also shows: (1) the existence of late execution strategies for STNUs, (2) the equivalence of several variants of the notion of DC, (3) the existence of a fast algorithm for real-time execution of STNUs, which runs in O(KN) total time in a network with K contingent links and N time points, considerably improving the previous O(N^3)-time bound.

    A Streamlined Model of Conditional Simple Temporal Networks - Semantics and Equivalence Results

    A Conditional Simple Temporal Network (CSTN) augments a Simple Temporal Network to include a new kind of time-points, called observation time-points. The execution of an observation time-point generates information in real time, specifically, the truth value of a propositional letter. In addition, time-points and temporal constraints may be labeled by conjunctions of (positive or negative) propositional letters. A CSTN is called dynamically consistent (DC) if there exists a dynamic strategy for executing its time-points such that no matter how the observations turn out during execution, the time-points whose labels are consistent with those observations have all been executed, and the constraints whose labels are consistent with those observations have all been satisfied. The strategy is dynamic in that its execution decisions may react to observations. The original formulation of CSTNs included propositional labels only on time-points, but the DC-checking algorithm was impractical because it was based on a conversion of the semantic constraints into an exponentially-sized Disjunctive Temporal Network. Later work added propositional labels to temporal constraints, and yielded a sound-and-complete propagation-based DC-checking algorithm, empirically demonstrated to be practical across a variety of CSTNs. This paper introduces a streamlined version of a CSTN in which propositional labels may appear on constraints, but not on time-points. This change simplifies the definition of the DC property, as well as the propagation rules for the DC-checking algorithm. It also simplifies the proofs of the soundness and completeness of those rules. This paper provides two translations from traditional CSTNs to streamlined CSTNs. Each translation preserves the DC property and, for any DC network, ensures that any dynamic execution strategy for that network can be extended to a strategy for its streamlined counterpart. 
    Finally, this paper presents an empirical comparison of two versions of the DC-checking algorithm: the original version and a simplified version for streamlined CSTNs. The comparison is based on CSTN benchmarks from earlier work. For small-sized CSTNs, the original version shows the best performance, but the performance difference between the two versions decreases as the number of time-points in the CSTN increases. We conclude that the simplified algorithm is a practical alternative for checking the dynamic consistency of CSTNs.

    Hardness of Covering Alignment: Phase Transition in Post-Sequence Genomics

    Covering alignment problems arise from recent developments in genomics: so-called pan-genome graphs are replacing reference genomes, and advances in haplotyping enable the full content of diploid genomes to be used as the basis of sequence analysis. In this paper, we show that the computational complexity will change for natural extensions of alignments to pan-genome representations and to diploid genomes. More broadly, our approach can also be seen as a minimal extension of sequence alignment to labeled directed acyclic graphs (labeled DAGs). Namely, we show that finding a covering alignment of two labeled DAGs is NP-hard even on binary alphabets. A covering alignment asks for two paths $R_1$ (red) and $G_1$ (green) in DAG $D_1$ and two paths $R_2$ (red) and $G_2$ (green) in DAG $D_2$ that cover the nodes of the graphs and maximize the sum of the global alignment scores $\mathrm{as}(\mathrm{sp}(R_1), \mathrm{sp}(R_2)) + \mathrm{as}(\mathrm{sp}(G_1), \mathrm{sp}(G_2))$, where $\mathrm{sp}(P)$ is the concatenation of labels on the path $P$. Pairwise alignment of the haplotype sequences forming a diploid chromosome can be converted to a two-path coverable labeled DAG, and then the covering alignment models the similarity of two diploids over arbitrary recombinations. We also give a reduction in the other direction, to show that such a recombination-oblivious diploid alignment is NP-hard on alphabets of size 3. Peer reviewed
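To make the objective concrete, the sketch below only evaluates the covering-alignment value for four paths that are already given; it does not search for the optimal paths, which is the NP-hard part. The scoring scheme (match +1, mismatch -1, gap -1), the dictionary representation of node labels, and all function names are assumptions, since the abstract does not fix them.

```python
def sp(labels, path):
    """Concatenate the node labels along a path (a list of node ids)."""
    return "".join(labels[v] for v in path)


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score, using two rolling DP rows."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def covers(labels, red, green):
    """True if the two paths together visit every node of the DAG."""
    return set(red) | set(green) == set(labels)


def covering_alignment_value(labels1, red1, green1, labels2, red2, green2):
    """as(sp(R_1), sp(R_2)) + as(sp(G_1), sp(G_2)) for the given paths."""
    if not (covers(labels1, red1, green1) and covers(labels2, red2, green2)):
        raise ValueError("the red and green paths must cover all nodes")
    red_score = global_alignment_score(sp(labels1, red1), sp(labels2, red2))
    green_score = global_alignment_score(sp(labels1, green1), sp(labels2, green2))
    return red_score + green_score
```

The sketch trusts the caller that each list is a real path in its DAG; only the covering condition is checked.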

    Safety in s-t Paths, Trails and Walks

    Given a directed graph G and a pair of nodes s and t, an s-t bridge of G is an edge whose removal breaks all s-t paths of G (and thus appears in all s-t paths). Computing all s-t bridges of G is a basic graph problem, solvable in linear time. In this paper, we consider a natural generalisation of this problem, with the notion of “safety” from bioinformatics. We say that a walk W is safe with respect to a set W' of s-t walks, if W is a subwalk of all walks in W'. We start by considering the maximal safe walks when W' consists of: all s-t paths, all s-t trails, or all s-t walks of G. We show that the solutions for the first two problems immediately follow from finding all s-t bridges after incorporating simple characterisations. However, solving the third problem requires non-trivial techniques for incorporating its characterisation. In particular, we show that there exists a compact representation computable in linear time, that allows outputting all maximal safe walks in time linear in their length. Our solutions also directly extend to multigraphs, except for the second problem, which requires a more involved approach. We further generalise these problems, by assuming that safety is defined only with respect to a subset of visible edges. Here we prove a dichotomy between the s-t paths and s-t trails cases, and the s-t walks case: the former two are NP-hard, while the latter is solvable with the same complexity as when all edges are visible. We also show that the same complexity results hold for the analogous generalisations of s-t articulation points (nodes appearing in all s-t paths). We thus obtain the best possible results for natural “safety”-generalisations of these two fundamental graph problems. Moreover, our algorithms are simple and do not employ any complex data structures, making them ideal for use in practice. Peer reviewed
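The definition of an s-t bridge can be checked directly, one edge at a time. The sketch below is a deliberately naive $O(m(n+m))$ illustration of the definition, not the linear-time method the abstract refers to; the edge-list representation, the node numbering 0..n-1, and the function names are assumptions.

```python
from collections import deque


def reachable(n, edges, s, t, skip=None):
    """BFS from s; return True if t is reachable, ignoring the edge at index `skip`."""
    adj = [[] for _ in range(n)]
    for idx, (u, v) in enumerate(edges):
        if idx != skip:
            adj[u].append(v)
    seen = [False] * n
    seen[s] = True
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                queue.append(v)
    return False


def st_bridges(n, edges, s, t):
    """Edges whose removal disconnects t from s, in input order.

    Edges are removed by index, so parallel edges of a multigraph are
    treated as distinct. Returns [] when t is not reachable from s at all.
    """
    if not reachable(n, edges, s, t):
        return []
    return [edges[i] for i in range(len(edges))
            if not reachable(n, edges, s, t, skip=i)]
```

For the hypothetical graph with edges 0→1, 1→2, 1→3, 2→3, 3→4 and s = 0, t = 4, the call `st_bridges(5, [(0, 1), (1, 2), (1, 3), (2, 3), (3, 4)], 0, 4)` returns `[(0, 1), (3, 4)]`: the two routes from node 1 to node 3 make the middle edges avoidable.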

    Optimal Omnitig Listing for Safe and Complete Contig Assembly

    Genome assembly is the problem of reconstructing a genome sequence from a set of reads from a sequencing experiment. Typical formulations of the assembly problem admit in practice many genomic reconstructions, and actual genome assemblers usually output contigs, namely substrings that are promised to occur in the genome. To bridge theory and practice, Tomescu and Medvedev [RECOMB 2016] reformulated contig assembly as finding all substrings common to all genomic reconstructions. They also gave a characterization of those walks (omnitigs) that are common to all closed edge-covering walks of a (directed) graph, a typical notion of genomic reconstruction. An algorithm for listing all maximal omnitigs was also proposed, by launching an exhaustive visit from every edge. In this paper, we prove new insights about the structure of omnitigs and solve several open questions about them. We combine these to achieve an O(nm)-time algorithm for outputting all the maximal omnitigs of a graph (with n nodes and m edges). This is also optimal, as we show families of graphs whose total omnitig length is Omega(nm). We implement this algorithm and show that it is 9-12 times faster in practice than the one of Tomescu and Medvedev [RECOMB 2016].