34 research outputs found

    Comparing Genomes with Duplications: A Computational Complexity Point of View

    How to compare arc-annotated sequences: The alignment hierarchy

    We describe a new unifying framework to express comparison of arc-annotated sequences, which we call alignment of arc-annotated sequences. We first prove that this framework encompasses the main existing models, which allows us to deduce complexity results for several cases from the literature. We also show that this framework gives rise to new relevant problems that have not been studied yet. We provide a thorough analysis of these novel cases by proposing two polynomial-time algorithms and an NP-completeness proof. This leads to an almost exhaustive study of alignment of arc-annotated sequences.

    A Faster Algorithm for Finding Minimum Tucker Submatrices.

    A binary matrix has the Consecutive Ones Property (C1P) if its columns can be ordered in such a way that all 1s on each row are consecutive. Algorithmic issues of the C1P are central in computational molecular biology, in particular for physical mapping and ancestral genome reconstruction. In 1972, Tucker gave a characterization of matrices that have the C1P by a set of forbidden submatrices, and a substantial amount of research has been devoted to the problem of efficiently finding such a minimum size forbidden submatrix. This paper presents a new O(Δ^3 m^2 (mΔ + n^3)) time algorithm for this particular task for an m × n binary matrix with at most Δ 1-entries per row, thereby improving the O(Δ^3 m^2 (mn + n^3)) time algorithm of Dom et al. [17].
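
    The C1P definition in the abstract above can be illustrated with a minimal brute-force sketch. This is not the algorithm of the paper, nor of Dom et al.: it simply tries every column order, so it is exponential and only suitable for tiny matrices. The function names `row_is_consecutive` and `has_c1p` are hypothetical and chosen here for illustration.

```python
from itertools import permutations


def row_is_consecutive(row):
    """Return True if all 1s in the row form a single contiguous block."""
    ones = [i for i, v in enumerate(row) if v == 1]
    # An empty row or a row whose 1s span exactly len(ones) positions qualifies.
    return not ones or ones[-1] - ones[0] + 1 == len(ones)


def has_c1p(matrix):
    """Brute-force C1P test (illustrative only): try every column order."""
    if not matrix:
        return True
    n = len(matrix[0])
    for order in permutations(range(n)):
        # Reorder the columns of each row and check that its 1s are consecutive.
        if all(row_is_consecutive([row[j] for j in order]) for row in matrix):
            return True
    return False
```

    For instance, `has_c1p([[1, 0, 1], [0, 1, 1]])` holds because swapping the last two columns makes both rows consecutive, whereas the 3 × 3 matrix with rows `[1, 1, 0]`, `[0, 1, 1]`, `[1, 0, 1]` fails: each pair of columns would have to be adjacent, which no linear order of three columns allows.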

    Common Structured Patterns in Linear Graphs: Approximation and Combinatorics

    A linear graph is a graph whose vertices are linearly ordered. This linear ordering allows pairs of disjoint edges to be either preceding (<), nesting (N) or crossing (C). Given a family of linear graphs, and a non-empty subset R ⊆ {<, N, C}, we are interested in the Maximum Common Structured Pattern (MCSP) problem: find a maximum size edge-disjoint graph, with edge-pairs all comparable by one of the relations in R, that occurs as a subgraph in each of the linear graphs of the family. The MCSP problem generalizes many structure-comparison and structure-prediction problems that arise in computational molecular biology. We give tight hardness results for the MCSP problem for {<, C}-structured patterns and {N, C}-structured patterns. Furthermore, we prove that the problem is approximable within ratios: (i) 2H(k) for {<, C}-structured patterns, (ii) √k for {N, C}-structured patterns, and (iii) O(√(k log k)) for {<, N, C}-structured patterns, where k is the size of the optimal solution and H(k) is the k-th harmonic number. Also, we provide combinatorial results concerning the different types of structured patterns that are of independent interest in their own right.
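
    The three relations between disjoint edges of a linear graph can be made concrete with a small sketch. It assumes vertices are integers giving their position in the linear order and that an edge is a pair of positions; the function name `relation` and the string labels are hypothetical, not taken from the paper.

```python
def relation(e, f):
    """Classify two disjoint edges of a linear graph as '<', 'N' or 'C'.

    Each edge is a pair of integer vertex positions (an assumption made
    for this illustration). Returns '<' for precedence, 'N' for nesting
    and 'C' for crossing.
    """
    # Normalise each edge to (left, right) and put the edge that starts first first.
    (a, b), (c, d) = sorted([tuple(sorted(e)), tuple(sorted(f))])
    if len({a, b, c, d}) < 4:
        raise ValueError("edges must be disjoint (no shared endpoint)")
    if b < c:
        return "<"  # a < b < c < d: the first edge ends before the second starts
    if d < b:
        return "N"  # a < c < d < b: the second edge lies inside the first
    return "C"      # a < c < b < d: the edges interleave
```

    With this convention, `relation((1, 2), (3, 4))` is precedence, `relation((1, 4), (2, 3))` is nesting, and `relation((1, 3), (2, 4))` is crossing; the argument order does not matter.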

    Approximating the 2-Interval Pattern problem

    We address the problem of approximating the 2-Interval Pattern problem over its various models and restrictions. This problem, which is motivated by RNA secondary structure prediction, asks to find a maximum cardinality subset of a 2-interval set with respect to some prespecified model. For each such model, we give varying approximation quality depending on the different possible restrictions imposed on the input 2-interval set.

    Pattern Matching for Arc-Annotated Sequences

    A study of pattern matching for arc-annotated sequences is started. An O(nm) time algorithm is given to determine whether a length m sequence with nested arc annotations is an arc-preserving subsequence of a length n sequence with nested arc annotations, called APS(nested,nested). Arc-annotated sequences and, in particular, those with nested arc structure are motivated by applications in RNA structure comparison. Our algorithm can be used to accelerate a recent fixed-parameter algorithm for LAPCS(nested,nested) and generalizes results for ordered tree inclusion problems. In particular, the presented dynamic programming methodology implies a quadratic time algorithm for an open problem posed by Vialette.