Multiple sequence alignment based on set covers
We introduce a new heuristic for the multiple alignment of a set of
sequences. The heuristic is based on a set cover of the residue alphabet of the
sequences, and also on the determination of a significant set of blocks
comprising subsequences of the sequences to be aligned. These blocks are
obtained with the aid of a new data structure, called a suffix-set tree, which
is constructed from the input sequences with the guidance of the
residue-alphabet set cover and generalizes the well-known suffix tree of the
sequence set. We provide performance results on selected BAliBASE amino-acid
sequences and compare them with those yielded by some prominent approaches.
Inapproximability of maximal strip recovery
In comparative genomics, the first step of sequence analysis is usually to
decompose two or more genomes into syntenic blocks that are segments of
homologous chromosomes. For the reliable recovery of syntenic blocks, noise and
ambiguities in the genomic maps need to be removed first. Maximal Strip
Recovery (MSR) is an optimization problem proposed by Zheng, Zhu, and Sankoff
for reliably recovering syntenic blocks from genomic maps in the midst of noise
and ambiguities. Given d genomic maps as sequences of gene markers, the
objective of MSR-d is to find d subsequences, one subsequence of each
genomic map, such that the total length of syntenic blocks in these
subsequences is maximized. For any constant d ≥ 2, a polynomial-time
2d-approximation for MSR-d was previously known. In this paper, we show that
for any d ≥ 2, MSR-d is APX-hard, even for the most basic version of the
problem in which all gene markers are distinct and appear in positive
orientation in each genomic map. Moreover, we provide the first explicit lower
bounds on approximating MSR-d for all d ≥ 2. In particular, we show that
MSR-d is NP-hard to approximate within Ω(d/log d). From the other
direction, we show that the previous 2d-approximation for MSR-d can be
optimized into a polynomial-time algorithm even if d is not a constant but is
part of the input. We then extend our inapproximability results to several
related problems including CMSR-d, δ-gap-MSR-d, and
δ-gap-CMSR-d.
Comment: A preliminary version of this paper appeared in two parts in the
Proceedings of the 20th International Symposium on Algorithms and Computation
(ISAAC 2009) and the Proceedings of the 4th International Frontiers of
Algorithmics Workshop (FAW 2010).
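The objective described in this abstract can be illustrated with a small brute-force sketch for the most basic version of the problem (all markers distinct and in positive orientation). It assumes that a syntenic block is a run of at least two markers that appear consecutively and in the same order in every retained subsequence; the function names are hypothetical, and the exhaustive search is only meant for toy inputs, not as the approximation algorithm discussed in the paper.

```python
from itertools import combinations

def strips_cover_all(maps, kept):
    """True if every kept marker lies in a common block of length >= 2."""
    # Restrict each genomic map to the kept markers (a subsequence).
    subs = [[m for m in g if m in kept] for g in maps]
    # Immediate successor of each marker within each restricted map.
    succ = [dict(zip(s, s[1:])) for s in subs]
    # Adjacencies a -> b that are shared by all restricted maps.
    common_next = {a: b for a, b in succ[0].items()
                   if all(sc.get(a) == b for sc in succ[1:])}
    has_common_prev = set(common_next.values())
    return all(m in common_next or m in has_common_prev for m in kept)

def msr_brute_force(maps):
    """Largest marker set whose subsequences decompose into blocks (toy sizes only)."""
    markers = list(maps[0])
    for size in range(len(markers), 1, -1):
        for kept in combinations(markers, size):
            if strips_cover_all(maps, set(kept)):
                return set(kept)
    return set()

# Marker 5 is "noise": it sits in a different place in the second map.
example_maps = [[1, 2, 3, 4, 5], [1, 2, 5, 3, 4]]
best = msr_brute_force(example_maps)
print(sorted(best))
```

In this toy instance, dropping marker 5 leaves the same subsequence in both maps, so all four remaining markers belong to a block.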
Structure of conflict graphs in constraint alignment problems and algorithms
We consider the constrained graph alignment problem which has applications in
biological network analysis. Given two input graphs G1 = (V1, E1) and G2 = (V2, E2), a pair of vertex mappings induces an edge conservation if
the vertex pairs are adjacent in their respective graphs. The
goal is to provide a one-to-one mapping between the vertices of the input
graphs in order to maximize edge conservation. However, the allowed mappings are
restricted since each vertex from V1 (resp. V2) is allowed to be mapped
to at most m1 (resp. m2) specified vertices in V2 (resp. V1). Most
of the results in this paper deal with the case m2 = 1, which attracted most
attention in the related literature. We formulate the problem as a maximum
independent set problem in a related conflict graph and investigate
structural properties of this graph in terms of forbidden subgraphs. We are
interested, in particular, in excluding certain wheels, fans, cliques or claws
(all terms are defined in the paper), which corresponds to excluding certain
cycles, paths, cliques or independent sets in the neighborhood of each vertex.
Then, we investigate algorithmic consequences of some of these properties,
which illustrates the potential of this approach and opens new directions for
further works. In particular this approach allows us to reinterpret a known
polynomial case in terms of conflict graph and to improve known approximation
and fixed-parameter tractability results through efficiently solving the
maximum independent set problem in conflict graphs. Some of our new
approximation results involve approximation ratios that are functions of the
optimal value, in particular its square root; this kind of result cannot be
achieved for maximum independent set in general graphs.
Comment: 22 pages, 6 figures
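The reduction to maximum independent set mentioned in this abstract can be sketched on a toy instance. The construction below is one plausible reading, not necessarily the paper's exact definition: each conflict-graph vertex is a candidate edge conservation (two vertex mappings that carry an edge of the first graph onto an edge of the second), and two candidates conflict when their mappings cannot coexist in a one-to-one mapping. The `allowed` dictionary and all function names are assumptions made for illustration, and the independent set is found by exhaustive search.

```python
from itertools import combinations

def conservation_candidates(E1, E2, allowed):
    """Candidates {(u1, v1), (u2, v2)} mapping edge u1u2 of G1 onto edge v1v2 of G2."""
    e2 = {frozenset(e) for e in E2}
    cands = set()
    for u1, u2 in E1:
        for v1 in allowed.get(u1, ()):
            for v2 in allowed.get(u2, ()):
                if v1 != v2 and frozenset((v1, v2)) in e2:
                    cands.add(frozenset(((u1, v1), (u2, v2))))
    return sorted(cands, key=sorted)

def compatible(c1, c2):
    """Two candidates are compatible if their combined mappings stay one-to-one."""
    pairs = c1 | c2
    sources = {u for u, _ in pairs}
    targets = {v for _, v in pairs}
    return len(sources) == len(pairs) == len(targets)

def max_edge_conservation(E1, E2, allowed):
    """Brute-force maximum independent set in the conflict graph (toy sizes only)."""
    cands = conservation_candidates(E1, E2, allowed)
    for k in range(len(cands), 0, -1):
        for subset in combinations(cands, k):
            if all(compatible(a, b) for a, b in combinations(subset, 2)):
                return list(subset)
    return []

E1 = [("a", "b"), ("b", "c")]
E2 = [("x", "y"), ("y", "z")]
allowed = {"a": ["x"], "b": ["y"], "c": ["z", "x"]}
print(len(max_edge_conservation(E1, E2, allowed)))
```

Because being one-to-one is a pairwise property of the individual mappings, a pairwise-compatible set of candidates always yields a consistent partial mapping, which is why an independent set in this graph corresponds to a set of simultaneously conserved edges.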
Evaluation of ILP-based approaches for partitioning into colorful components
The NP-hard Colorful Components problem is a graph partitioning problem on vertex-colored graphs. We identify a new application of Colorful Components in the correction of Wikipedia interlanguage links, and describe and compare three exact and two heuristic approaches. In particular, we devise two ILP formulations, one based on Hitting Set and one based on Clique Partition. Furthermore, we use the recently proposed implicit hitting set framework [Karp, JCSS 2011; Chandrasekaran et al., SODA 2011] to solve Colorful Components. Finally, we study a move-based and a merge-based heuristic for Colorful Components. We can optimally solve Colorful Components for Wikipedia link correction data; while the Clique Partition-based ILP outperforms the other two exact approaches, the implicit hitting set is a simple and competitive alternative. The merge-based heuristic is very accurate and outperforms the move-based one. The above results for Wikipedia data are confirmed by experiments with synthetic instances.
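As a small illustration of the problem itself, the sketch below uses the usual statement of Colorful Components: delete as few edges as possible so that no connected component contains two vertices of the same color. It is an exhaustive search for toy inputs, not one of the ILP formulations or heuristics compared in the abstract; the function names and the language-code colors are hypothetical choices for the example.

```python
from itertools import combinations

def components(vertices, edges):
    """Connected components via a small union-find."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, w in edges:
        parent[find(u)] = find(w)
    groups = {}
    for v in vertices:
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())

def is_colorful(color, edges):
    """True if no component contains two vertices with the same color."""
    return all(len({color[v] for v in comp}) == len(comp)
               for comp in components(color, edges))

def min_edge_deletions(color, edges):
    """Smallest edge set whose removal makes every component colorful."""
    for k in range(len(edges) + 1):
        for removed in combinations(edges, k):
            kept = [e for e in edges if e not in removed]
            if is_colorful(color, kept):
                return list(removed)

# Vertices stand for articles, colors for their languages (illustrative data).
color = {1: "en", 2: "de", 3: "en", 4: "fr"}
edges = [(1, 2), (2, 3), (3, 4)]
print(min_edge_deletions(color, edges))
```

The search always terminates with an answer, since removing every edge leaves only single-vertex components, which are trivially colorful.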
An optimal adaptive Fictitious Domain Method
We consider a Fictitious Domain formulation of an elliptic partial
differential equation and approximate the resulting saddle-point system using
an inexact preconditioned Uzawa iterative algorithm. Each iteration entails the
approximation of an elliptic problem, performed using adaptive finite element
methods. We prove that the overall method converges with the best possible rate
and illustrate our theoretical findings numerically.
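For readers unfamiliar with the Uzawa iteration, the following minimal sketch applies the classical (exact, unpreconditioned) variant to a tiny saddle-point system A u + Bᵀp = f, B u = g. It assumes a diagonal matrix A and a single constraint so that the inner solve is trivial; the paper's method instead solves the inner elliptic problem inexactly with adaptive finite elements, which this toy example does not attempt to reproduce. The function name, step size, and data are illustrative assumptions.

```python
def uzawa(a_diag, b, f, g, omega=1.0, iterations=50):
    """Classical Uzawa iteration for A u + B^T p = f, B u = g.

    a_diag: diagonal entries of A (assumed positive)
    b:      the single row of B
    """
    p = 0.0
    u = [0.0] * len(a_diag)
    for _ in range(iterations):
        # Inner solve: u = A^{-1} (f - B^T p), trivial because A is diagonal.
        u = [(fi - bi * p) / ai for ai, bi, fi in zip(a_diag, b, f)]
        # Multiplier update along the constraint residual B u - g.
        p += omega * (sum(bi * ui for bi, ui in zip(b, u)) - g)
    return u, p

# A = diag(2, 4), B = [1, 1], f = (2, 4), g = 1.
u, p = uzawa([2.0, 4.0], [1.0, 1.0], [2.0, 4.0], 1.0)
print(u, p)
```

For this data the Schur complement B A⁻¹ Bᵀ equals 0.75, so with step size 1 the multiplier error shrinks by a factor of 0.25 per iteration and the iterates approach u = (1/3, 2/3), p = 4/3.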