240 research outputs found

Multiple sequence alignment based on set covers

Author: A. Bahr
B. Manthey
B. Morgenstern
C. Notredame
D. Gusfield
G. Vogt
J.D. Thompson
K. Katoh
O. Gotoh
P. Zhao
R.E. Green
R.F. Smith
S. Henikoff
T. Müller
T.P. Li
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2004

We introduce a new heuristic for the multiple alignment of a set of sequences. The heuristic is based on a set cover of the residue alphabet of the sequences, and also on the determination of a significant set of blocks comprising subsequences of the sequences to be aligned. These blocks are obtained with the aid of a new data structure, called a suffix-set tree, which is constructed from the input sequences with the guidance of the residue-alphabet set cover and generalizes the well-known suffix tree of the sequence set. We provide performance results on selected BAliBASE amino-acid sequences and compare them with those yielded by some prominent approaches.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Inapproximability of maximal strip recovery

Author: C. Zheng
C.H. Papadimitriou
E. Hazan
I. Dinur
J. Akiyama
L. Bulteau
L. Wang
M. Chlebík
M. Jiang
P. Alimonti
R. Bar-Yehuda
R.B. Lyngsø
Z. Chen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

In comparative genomics, the first step of sequence analysis is usually to decompose two or more genomes into syntenic blocks that are segments of homologous chromosomes. For the reliable recovery of syntenic blocks, noise and ambiguities in the genomic maps need to be removed first. Maximal Strip Recovery (MSR) is an optimization problem proposed by Zheng, Zhu, and Sankoff for reliably recovering syntenic blocks from genomic maps in the midst of noise and ambiguities. Given $d$ genomic maps as sequences of gene markers, the objective of MSR-$d$ is to find $d$ subsequences, one subsequence of each genomic map, such that the total length of syntenic blocks in these subsequences is maximized. For any constant $d \ge 2$, a polynomial-time $2d$-approximation for MSR-$d$ was previously known. In this paper, we show that for any $d \ge 2$, MSR-$d$ is APX-hard, even for the most basic version of the problem in which all gene markers are distinct and appear in positive orientation in each genomic map. Moreover, we provide the first explicit lower bounds on approximating MSR-$d$ for all $d \ge 2$. In particular, we show that MSR-$d$ is NP-hard to approximate within $\Omega(d/\log d)$. From the other direction, we show that the previous $2d$-approximation for MSR-$d$ can be optimized into a polynomial-time algorithm even if $d$ is not a constant but is part of the input. We then extend our inapproximability results to several related problems including CMSR-$d$, $\delta$-gap-MSR-$d$, and $\delta$-gap-CMSR-$d$.

Comment: A preliminary version of this paper appeared in two parts in the Proceedings of the 20th International Symposium on Algorithms and Computation (ISAAC 2009) and the Proceedings of the 4th International Frontiers of Algorithmics Workshop (FAW 2010).

arXiv.org e-Print Archive

Elsevier - Publisher Connector

Crossref

Structure of conflict graphs in constraint alignment problems and algorithms

Author: Alkan Ferhat
Bıyıkoğlu Türker
Demange Marc
Erten Cesim
Publication date: 01/01/2019

We consider the constrained graph alignment problem, which has applications in biological network analysis. Given two input graphs $G_1=(V_1,E_1)$ and $G_2=(V_2,E_2)$, a pair of vertex mappings induces an edge conservation if the vertex pairs are adjacent in their respective graphs. The goal is to provide a one-to-one mapping between the vertices of the input graphs that maximizes edge conservation. However, the allowed mappings are restricted, since each vertex from $V_1$ (resp. $V_2$) is allowed to be mapped to at most $m_1$ (resp. $m_2$) specified vertices in $V_2$ (resp. $V_1$). Most of the results in this paper deal with the case $m_2=1$, which has attracted the most attention in the related literature. We formulate the problem as a maximum independent set problem in a related conflict graph and investigate structural properties of this graph in terms of forbidden subgraphs. We are interested, in particular, in excluding certain wheels, fans, cliques or claws (all terms are defined in the paper), which corresponds to excluding certain cycles, paths, cliques or independent sets in the neighborhood of each vertex. Then, we investigate algorithmic consequences of some of these properties, which illustrates the potential of this approach and opens new directions for further work. In particular, this approach allows us to reinterpret a known polynomial case in terms of the conflict graph and to improve known approximation and fixed-parameter tractability results by efficiently solving the maximum independent set problem in conflict graphs. Some of our new approximation results involve approximation ratios that are functions of the optimal value, in particular its square root; results of this kind cannot be achieved for maximum independent set in general graphs.

Comment: 22 pages, 6 figures
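The objective described in this abstract can be illustrated with a small sketch. It is not taken from the paper: the function name `conserved_edges`, the optional `allowed` argument, and the toy graphs are assumptions made for illustration. The sketch only evaluates a given one-to-one mapping; it does not build the conflict graph or search for an optimal mapping.

```python
def conserved_edges(edges1, edges2, mapping, allowed=None):
    """Count edges (u, v) of G1 whose images under `mapping` are adjacent in G2.

    `allowed` (hypothetical) maps each vertex of V1 to the set of vertices of V2
    it may be mapped to, modelling the restriction on permitted mappings.
    """
    # The mapping must be one-to-one: no two vertices of V1 share an image.
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("mapping must be one-to-one")
    if allowed is not None:
        for u, image in mapping.items():
            if image not in allowed.get(u, set()):
                raise ValueError(f"{u!r} may not be mapped to {image!r}")
    # Store undirected edges of G2 as frozensets so orientation does not matter.
    adjacent2 = {frozenset(e) for e in edges2}
    count = 0
    for u, v in edges1:
        if u in mapping and v in mapping:
            if frozenset((mapping[u], mapping[v])) in adjacent2:
                count += 1
    return count


# Toy instance (assumed): two paths on three vertices each.
edges1 = [("a", "b"), ("b", "c")]
edges2 = [("x", "y"), ("y", "z")]
print(conserved_edges(edges1, edges2, {"a": "x", "b": "y", "c": "z"}))
print(conserved_edges(edges1, edges2, {"a": "y", "b": "x", "c": "z"}))
```

The first mapping conserves both edges of the path, while the second conserves only one, which is the quantity the alignment problem seeks to maximize.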

arXiv.org e-Print Archive

Episciences.org

Antalya Bilim University Institutional Repository

Evaluation of ILP-based approaches for partitioning into colorful components

Author: Bruckner S.
Hüffner F.
Komusiewicz Ch.
Niedermeier R.
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2013

The NP-hard Colorful Components problem is a graph partitioning problem on vertex-colored graphs. We identify a new application of Colorful Components in the correction of Wikipedia interlanguage links, and describe and compare three exact and two heuristic approaches. In particular, we devise two ILP formulations, one based on Hitting Set and one based on Clique Partition. Furthermore, we use the recently proposed implicit hitting set framework [Karp, JCSS 2011; Chandrasekaran et al., SODA 2011] to solve Colorful Components. Finally, we study a move-based and a merge-based heuristic for Colorful Components. We can optimally solve Colorful Components for Wikipedia link correction data; while the Clique Partition-based ILP outperforms the other two exact approaches, the implicit hitting set is a simple and competitive alternative. The merge-based heuristic is very accurate and outperforms the move-based one. The above results for Wikipedia data are confirmed by experiments with synthetic instances.
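As a rough illustration of the feasibility condition behind Colorful Components, the sketch below checks whether a vertex-colored graph is already "colorful". It assumes the usual definition, which the abstract does not spell out: after deleting edges, no connected component may contain two vertices of the same color. The function name `is_colorful` and the toy data are assumptions; none of the ILP formulations or heuristics from the paper are reproduced here.

```python
def is_colorful(vertices, edges, color):
    """Return True if no connected component contains a repeated color."""
    neighbors = {v: [] for v in vertices}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    seen = set()
    for start in vertices:
        if start in seen:
            continue
        # Depth-first search over one component, tracking the colors met so far.
        seen.add(start)
        stack = [start]
        colors_here = set()
        while stack:
            v = stack.pop()
            if color[v] in colors_here:
                return False
            colors_here.add(color[v])
            for w in neighbors[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return True


# Toy instance (assumed): articles as vertices, languages as colors.
vertices = [1, 2, 3, 4]
color = {1: "en", 2: "de", 3: "fr", 4: "en"}
linked = [(1, 2), (2, 3), (3, 4)]   # joins two "en" articles in one component
repaired = [(1, 2), (2, 3)]         # the edge (3, 4) has been deleted
print(is_colorful(vertices, linked, color))
print(is_colorful(vertices, repaired, color))
```

In the interlanguage-link setting, a component with two articles in the same language signals an erroneous link, and the optimization problem asks for the fewest edge deletions that make such a check succeed.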

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

An optimal adaptive Fictitious Domain Method

Author: Berrone Stefano
Bonito Andrea
Stevenson Rob
Verani Marco
Publication date: 20/09/2018

We consider a Fictitious Domain formulation of an elliptic partial differential equation and approximate the resulting saddle-point system using an inexact preconditioned Uzawa iterative algorithm. Each iteration entails the approximation of an elliptic problem, performed using adaptive finite element methods. We prove that the overall method converges with the best possible rate and illustrate our theoretical findings numerically.
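For readers unfamiliar with Uzawa-type methods, the following minimal sketch shows the classical Uzawa iteration on a tiny algebraic saddle-point system of the form [A, B^T; B, 0][u; lam] = [f; g]. It is a simplification under stated assumptions, not the method of the paper: A is diagonal so the inner solve is exact, there is no preconditioner and no adaptive finite element approximation, and the function name `uzawa`, the step size `omega`, and the data are chosen only for illustration.

```python
def uzawa(a_diag, B, f, g, omega, iterations):
    """Classical Uzawa iteration for a saddle-point system with diagonal A."""
    n, m = len(a_diag), len(B)
    u = [0.0] * n
    lam = [0.0] * m
    for _ in range(iterations):
        # Inner solve A u = f - B^T lam (exact here because A is diagonal).
        u = [
            (f[j] - sum(B[i][j] * lam[i] for i in range(m))) / a_diag[j]
            for j in range(n)
        ]
        # Update the multiplier along the constraint residual B u - g.
        lam = [
            lam[i] + omega * (sum(B[i][j] * u[j] for j in range(n)) - g[i])
            for i in range(m)
        ]
    return u, lam


# Toy data (assumed): A = 2 I, one constraint u_0 + u_1 = 1.
u, lam = uzawa(a_diag=[2.0, 2.0], B=[[1.0, 1.0]], f=[2.0, 4.0], g=[1.0],
               omega=0.5, iterations=60)
print(u, lam)
```

For this toy system the Schur complement B A^{-1} B^T equals 1, so with `omega = 0.5` the multiplier error halves at every step and the iterates approach u = (0, 1), lam = 2.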

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

UvA-DARE

International Migration, Integration and Social Cohesion online publications