156 research outputs found

    Fast Arc-Annotated Subsequence Matching in Linear Space

    Full text link
    An arc-annotated string is a string of characters, called bases, augmented with a set of pairs, called arcs, each connecting two bases. Given arc-annotated strings PP and QQ the arc-preserving subsequence problem is to determine if PP can be obtained from QQ by deleting bases from QQ. Whenever a base is deleted any arc with an endpoint in that base is also deleted. Arc-annotated strings where the arcs are ``nested'' are a natural model of RNA molecules that captures both the primary and secondary structure of these. The arc-preserving subsequence problem for nested arc-annotated strings is basic primitive for investigating the function of RNA molecules. Gramm et al. [ACM Trans. Algorithms 2006] gave an algorithm for this problem using O(nm)O(nm) time and space, where mm and nn are the lengths of PP and QQ, respectively. In this paper we present a new algorithm using O(nm)O(nm) time and O(n+m)O(n + m) space, thereby matching the previous time bound while significantly reducing the space from a quadratic term to linear. This is essential to process large RNA molecules where the space is likely to be a bottleneck. To obtain our result we introduce several novel ideas which may be of independent interest for related problems on arc-annotated strings.Comment: To appear in Algoritmic

    Comparing RNA structures using a full set of biologically relevant edit operations is intractable

    Get PDF
    7 pagesArc-annotated sequences are useful for representing structural information of RNAs and have been extensively used for comparing RNA structures in both terms of sequence and structural similarities. Among the many paradigms referring to arc-annotated sequences and RNA structures comparison (see \cite{IGMA_BliDenDul08} for more details), the most important one is the general edit distance. The problem of computing an edit distance between two non-crossing arc-annotated sequences was introduced in \cite{Evans99}. The introduced model uses edit operations that involve either single letters or pairs of letters (never considered separately) and is solvable in polynomial-time \cite{ZhangShasha:1989}. To account for other possible RNA structural evolutionary events, new edit operations, allowing to consider either silmutaneously or separately letters of a pair were introduced in \cite{jiangli}; unfortunately at the cost of computational tractability. It has been proved that comparing two RNA secondary structures using a full set of biologically relevant edit operations is {\sf\bf NP}-complete. Nevertheless, in \cite{DBLP:conf/spire/GuignonCH05}, the authors have used a strong combinatorial restriction in order to compare two RNA stem-loops with a full set of biologically relevant edit operations; which have allowed them to design a polynomial-time and space algorithm for comparing general secondary RNA structures. In this paper we will prove theoretically that comparing two RNA structures using a full set of biologically relevant edit operations cannot be done without strong combinatorial restrictions

    Hybrid techniques based on solving reduced problem instances for a longest common subsequence problem

    Get PDF
    Finding the longest common subsequence of a given set of input strings is a relevant problem arising in various practical settings. One of these problems is the so-called longest arc-preserving common subsequence problem. This NP-hard combinatorial optimization problem was introduced for the comparison of arc-annotated ribonucleic acid (RNA) sequences. In this work we present an integer linear programming (ILP) formulation of the problem. As even in the context of rather small problem instances the application of a general purpose ILP solver is not viable due to the size of the model, we study alternative ways based on model reduction in order to take profit from this ILP model. First, we present a heuristic way for reducing the model, with the subsequent application of an ILP solver. Second, we propose the application of an iterative hybrid algorithm that makes use of an ILP solver for generating high quality solutions at each iteration. Experimental results concerning artificial and real problem instances show that the proposed techniques outperform an available technique from the literature.Peer ReviewedPostprint (author's final draft

    How to compare arc-annotated sequences: The alignment hierarchy

    No full text
    International audienceWe describe a new unifying framework to express comparison of arc-annotated sequences, which we call alignment of arc-annotated sequences. We first prove that this framework encompasses main existing models, which allows us to deduce complexity results for several cases from the literature. We also show that this framework gives rise to new relevant problems that have not been studied yet. We provide a thorough analysis of these novel cases by proposing two polynomial time algorithms and an NP-completeness proof. This leads to an almost exhaustive study of alignment of arc-annotated sequences

    How to compare arc-annotated sequences: The alignment hierarchy

    Get PDF
    International audienceWe describe a new unifying framework to express comparison of arc-annotated sequences, which we call alignment of arc-annotated sequences. We first prove that this framework encompasses main existing models, which allows us to deduce complexity results for several cases from the literature. We also show that this framework gives rise to new relevant problems that have not been studied yet. We provide a thorough analysis of these novel cases by proposing two polynomial time algorithms and an NP-completeness proof. This leads to an almost exhaustive study of alignment of arc-annotated sequences

    A hybrid evolutionary algorithm based on solution merging for the longest arc-preserving common subsequence problem

    Get PDF
    The longest arc-preserving common subsequence problem is an NP-hard combinatorial optimization problem from the field of computational biology. This problem finds applications, in particular, in the comparison of art-annotated ribonucleic acid (RNA) sequences. In this work we propose a simple, hybrid evolutionary algorithm to tackle this problem. The most important feature of this algorithm concerns a crossover operator based on solution merging. In solution merging, two or more solutions to the problem are merged, and an exact technique is used to find the best solution within this union. It is experimentally shown that the proposed algorithm outperforms a heuristic from the literature.Peer ReviewedPostprint (author's final draft

    What Makes the Arc-Preserving Subsequence Problem Hard?

    Get PDF
    International audienceGiven two arc-annotated sequences (S, P ) and (T, Q) representing RNA structures, the Arc-Preserving Subsequence (APS) problem asks whether (T, Q) can be obtained from (S, P ) by deleting some of its bases (together with their incident arcs, if any). In previous studies [3, 6], this problem has been naturally divided into subproblems reflecting intrinsic complexity of arc structures. We show that APS(Crossing, Plain) is NP-complete, thereby answering an open problem [6]. Furthermore, to get more insight into where actual border of APS hardness is, we refine APS classical subproblems in much the same way as in [11] and give a complete categorization among various restrictions of APS problem complexity

    Lightweight comparison of RNAs based on exact sequence–structure matches

    Get PDF
    Motivation: Specific functions of ribonucleic acid (RNA) molecules are often associated with different motifs in the RNA structure. The key feature that forms such an RNA motif is the combination of sequence and structure properties. In this article, we introduce a new RNA sequence–structure comparison method which maintains exact matching substructures. Existing common substructures are treated as whole unit while variability is allowed between such structural motifs

    A list of parameterized problems in bioinformatics

    Get PDF
    In this report we present a list of problems that originated in bionformatics. Our aim is to collect information on such problems that have been analyzed from the point of view of Parameterized Complexity. For every problem we give its definition and biological motivation together with known complexity results.Postprint (published version
    corecore