    Multiple sequence alignment based on set covers

    We introduce a new heuristic for the multiple alignment of a set of sequences. The heuristic is based on a set cover of the residue alphabet of the sequences, and also on the determination of a significant set of blocks comprising subsequences of the sequences to be aligned. These blocks are obtained with the aid of a new data structure, called a suffix-set tree, which is constructed from the input sequences with the guidance of the residue-alphabet set cover and generalizes the well-known suffix tree of the sequence set. We provide performance results on selected BAliBASE amino-acid sequences and compare them with those yielded by some prominent approaches.

    Inapproximability of maximal strip recovery

    In comparative genomics, the first step of sequence analysis is usually to decompose two or more genomes into syntenic blocks that are segments of homologous chromosomes. For the reliable recovery of syntenic blocks, noise and ambiguities in the genomic maps need to be removed first. Maximal Strip Recovery (MSR) is an optimization problem proposed by Zheng, Zhu, and Sankoff for reliably recovering syntenic blocks from genomic maps in the midst of noise and ambiguities. Given $d$ genomic maps as sequences of gene markers, the objective of MSR-$d$ is to find $d$ subsequences, one subsequence of each genomic map, such that the total length of syntenic blocks in these subsequences is maximized. For any constant $d \ge 2$, a polynomial-time $2d$-approximation for MSR-$d$ was previously known. In this paper, we show that for any $d \ge 2$, MSR-$d$ is APX-hard, even for the most basic version of the problem in which all gene markers are distinct and appear in positive orientation in each genomic map. Moreover, we provide the first explicit lower bounds on approximating MSR-$d$ for all $d \ge 2$. In particular, we show that MSR-$d$ is NP-hard to approximate within $\Omega(d/\log d)$. From the other direction, we show that the previous $2d$-approximation for MSR-$d$ can be optimized into a polynomial-time algorithm even if $d$ is not a constant but is part of the input. We then extend our inapproximability results to several related problems including CMSR-$d$, $\delta$-gap-MSR-$d$, and $\delta$-gap-CMSR-$d$.
    Comment: A preliminary version of this paper appeared in two parts in the Proceedings of the 20th International Symposium on Algorithms and Computation (ISAAC 2009) and the Proceedings of the 4th International Frontiers of Algorithmics Workshop (FAW 2010).
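
    The objective of Maximal Strip Recovery with two maps can be illustrated on tiny inputs by exhaustive search. The sketch below is not the approximation algorithm discussed in the abstract. It covers only the basic version (distinct markers, positive orientation, both maps over the same marker set), and it assumes the usual convention that a syntenic block (strip) is a maximal run of at least two markers that are consecutive in both retained subsequences; that length-two requirement is an assumption not stated in the abstract. The function names are hypothetical.

```python
from itertools import combinations


def strips(sub1, sub2):
    """Split sub1 into maximal runs that are also consecutive in sub2."""
    pos2 = {marker: i for i, marker in enumerate(sub2)}
    runs, current = [], []
    for marker in sub1:
        if current and pos2[marker] == pos2[current[-1]] + 1:
            current.append(marker)      # extends the current common run
        else:
            if current:
                runs.append(current)
            current = [marker]          # starts a new run
    if current:
        runs.append(current)
    return runs


def msr2_brute_force(g1, g2):
    """Exhaustive search, exponential time: for illustration only.

    Assumes g1 and g2 are permutations of the same distinct markers.
    Returns (total strip length, retained subsequence of g1).
    """
    markers = list(g1)
    # Try larger marker sets first; the first feasible one is optimal.
    for size in range(len(markers), 1, -1):
        for kept in combinations(markers, size):
            keep = set(kept)
            sub1 = [m for m in g1 if m in keep]
            sub2 = [m for m in g2 if m in keep]
            # Assumed feasibility rule: every retained marker lies in a
            # strip of length at least two.
            if all(len(run) >= 2 for run in strips(sub1, sub2)):
                return size, sub1
    return 0, []


# Marker 5 breaks the blocks in the second map; dropping it leaves one strip.
print(msr2_brute_force([1, 2, 3, 4, 5], [1, 2, 5, 3, 4]))  # (4, [1, 2, 3, 4])
```

    Since the problem is APX-hard, this enumeration is only meant to make the objective concrete on inputs of a handful of markers.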

    Structure of conflict graphs in constraint alignment problems and algorithms

    We consider the constrained graph alignment problem, which has applications in biological network analysis. Given two input graphs $G_1=(V_1,E_1)$ and $G_2=(V_2,E_2)$, a pair of vertex mappings induces an edge conservation if the vertex pairs are adjacent in their respective graphs. The goal is to provide a one-to-one mapping between the vertices of the input graphs in order to maximize edge conservation. However, the allowed mappings are restricted, since each vertex from $V_1$ (resp. $V_2$) is allowed to be mapped to at most $m_1$ (resp. $m_2$) specified vertices in $V_2$ (resp. $V_1$). Most of the results in this paper deal with the case $m_2=1$, which has attracted the most attention in the related literature. We formulate the problem as a maximum independent set problem in a related conflict graph and investigate structural properties of this graph in terms of forbidden subgraphs. We are interested, in particular, in excluding certain wheels, fans, cliques or claws (all terms are defined in the paper), which corresponds to excluding certain cycles, paths, cliques or independent sets in the neighborhood of each vertex. Then, we investigate algorithmic consequences of some of these properties, which illustrates the potential of this approach and opens new directions for further work. In particular, this approach allows us to reinterpret a known polynomial case in terms of the conflict graph and to improve known approximation and fixed-parameter tractability results through efficiently solving the maximum independent set problem in conflict graphs. Some of our new approximation results involve approximation ratios that are functions of the optimal value, in particular its square root; this kind of result cannot be achieved for maximum independent set in general graphs.
    Comment: 22 pages, 6 figures
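
    The objective described above, the number of conserved edges under a restricted one-to-one mapping, can be sketched as follows. This only evaluates a given mapping. It does not build the conflict graph or solve the maximum independent set problem, whose construction is defined in the paper and not reproduced here. The names `conserved_edges` and `allowed` are hypothetical.

```python
def conserved_edges(edges1, edges2, mapping, allowed=None):
    """Count edges of G1 whose endpoints map to adjacent vertices of G2.

    mapping: dict from vertices of G1 to vertices of G2 (may be partial).
    allowed: optional dict giving, for each vertex of G1, the set of
             vertices of G2 it may be mapped to (the restriction in the
             problem statement). Hypothetical input format.
    """
    # The mapping must be one-to-one.
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("mapping is not one-to-one")
    # The mapping must respect the allowed candidates, if given.
    if allowed is not None:
        for u, v in mapping.items():
            if v not in allowed.get(u, set()):
                raise ValueError(f"{u!r} may not be mapped to {v!r}")

    undirected2 = {frozenset(e) for e in edges2}
    count = 0
    for u, v in edges1:
        if u in mapping and v in mapping:
            if frozenset((mapping[u], mapping[v])) in undirected2:
                count += 1
    return count


edges1 = [("a", "b"), ("b", "c")]
edges2 = [(1, 2), (2, 3)]
print(conserved_edges(edges1, edges2, {"a": 1, "b": 2, "c": 3}))  # 2
print(conserved_edges(edges1, edges2, {"a": 1, "b": 3, "c": 2}))  # 1
```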

    Evaluation of ILP-based approaches for partitioning into colorful components

    The NP-hard Colorful Components problem is a graph partitioning problem on vertex-colored graphs. We identify a new application of Colorful Components in the correction of Wikipedia interlanguage links, and describe and compare three exact and two heuristic approaches. In particular, we devise two ILP formulations, one based on Hitting Set and one based on Clique Partition. Furthermore, we use the recently proposed implicit hitting set framework [Karp, JCSS 2011; Chandrasekaran et al., SODA 2011] to solve Colorful Components. Finally, we study a move-based and a merge-based heuristic for Colorful Components. We can optimally solve Colorful Components for Wikipedia link correction data; while the Clique Partition-based ILP outperforms the other two exact approaches, the implicit hitting set is a simple and competitive alternative. The merge-based heuristic is very accurate and outperforms the move-based one. The above results for Wikipedia data are confirmed by experiments with synthetic instances.
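
    As a minimal sketch of what a feasible solution looks like, the code below checks the defining property, assuming the standard formulation in which edges are deleted so that no connected component contains two vertices of the same color. It is only a verifier: none of the ILP formulations or heuristics from the abstract are implemented, and the function names and the toy interlanguage-link data are hypothetical.

```python
from collections import defaultdict


def components(vertices, edges):
    """Connected components of an undirected graph (iterative DFS)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in vertices:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            x = stack.pop()
            comp.append(x)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        comps.append(comp)
    return comps


def is_colorful(vertices, edges, color):
    """True if no component contains two vertices with the same color."""
    for comp in components(vertices, edges):
        colors = [color[v] for v in comp]
        if len(colors) != len(set(colors)):
            return False
    return True


# Toy data: articles as vertices, language as color.
vertices = ["en:A", "de:A", "fr:A", "en:B"]
color = {v: v.split(":")[0] for v in vertices}
links = [("en:A", "de:A"), ("de:A", "fr:A"), ("fr:A", "en:B")]

print(is_colorful(vertices, links, color))      # False: two "en" articles
print(is_colorful(vertices, links[:2], color))  # True after deleting one link
```

    In this toy instance, deleting the single link between `fr:A` and `en:B` yields a colorful graph, so the cost of that solution is one deleted edge.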

    An optimal adaptive Fictitious Domain Method

    We consider a Fictitious Domain formulation of an elliptic partial differential equation and approximate the resulting saddle-point system using an inexact preconditioned Uzawa iterative algorithm. Each iteration entails the approximation of an elliptic problem, performed using adaptive finite element methods. We prove that the overall method converges with the best possible rate and illustrate our theoretical findings numerically.
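
    The structure of an inexact Uzawa iteration can be sketched on a small algebraic saddle-point system $A u + B^T \lambda = f$, $B u = g$. This is a toy illustration under stated assumptions, not the method of the paper: the adaptive finite element solve is replaced by a few warm-started Jacobi sweeps, the preconditioner is reduced to a scalar step size `omega`, and the matrices are made-up examples.

```python
def jacobi_sweeps(A, rhs, x, sweeps):
    """A few Jacobi sweeps for A x = rhs, starting from x (inexact solve)."""
    n = len(A)
    for _ in range(sweeps):
        x = [
            (rhs[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
    return x


def inexact_uzawa(A, B, f, g, omega=1.0, outer=60, sweeps=3):
    """Inexact Uzawa iteration for  A u + B^T lam = f,  B u = g.

    omega stands in for the preconditioner (assumption: scalar scaling);
    the inner solve is deliberately approximate.
    """
    n, m = len(A), len(B)
    u = [0.0] * n
    lam = [0.0] * m
    for _ in range(outer):
        # Inner step: approximately solve A u = f - B^T lam.
        rhs = [f[i] - sum(B[k][i] * lam[k] for k in range(m)) for i in range(n)]
        u = jacobi_sweeps(A, rhs, u, sweeps)
        # Outer step: update the multiplier with the constraint residual.
        lam = [
            lam[k] + omega * (sum(B[k][i] * u[i] for i in range(n)) - g[k])
            for k in range(m)
        ]
    return u, lam


A = [[2.0, 1.0], [1.0, 3.0]]   # symmetric positive definite block
B = [[1.0, 1.0]]               # single constraint u0 + u1 = g
f = [3.0, 4.0]
g = [1.0]

u, lam = inexact_uzawa(A, B, f, g)
print(u, lam)  # close to u = [1/3, 2/3], lam = [5/3]
```

    For this particular system the combined iteration is a contraction, so the iterates approach the exact solution $u = (1/3, 2/3)$, $\lambda = 5/3$; for other matrices the step size and the number of inner sweeps would need to be chosen with care.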