42 research outputs found

    A Lagrangian relaxation approach for the multiple sequence alignment problem

    We present a branch-and-bound (bb) algorithm for the multiple sequence alignment problem (MSA), one of the most important problems in computational biology. The upper bound at each bb node is based on a Lagrangian relaxation of an integer linear programming formulation for MSA. When certain inequalities are dualized, the Lagrangian subproblem becomes a pairwise alignment problem, which can be solved efficiently by dynamic programming. By reformulating the problem with respect to additionally introduced variables prior to relaxation, we improve the convergence rate dramatically while still being able to solve the Lagrangian problem efficiently. Our experiments show that our implementation, although preliminary, outperforms all exact algorithms for the multiple sequence alignment problem.
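The abstract notes that the Lagrangian subproblem reduces to pairwise alignment solved by dynamic programming. As a rough illustration of that building block only, the sketch below computes a standard global pairwise alignment score (Needleman-Wunsch style). The function name and the scoring values (`match`, `mismatch`, `gap`) are assumptions chosen for the example. The paper's actual subproblem uses Lagrangian-adjusted weights and extra variables that are not reproduced here.

```python
def pairwise_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b.

    Scoring parameters are illustrative assumptions, not taken from the paper.
    """
    n, m = len(a), len(b)
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        # Aligning a[:i] with the empty prefix costs i gaps.
        curr = [i * gap] + [0] * m
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # a[i-1] aligned to a gap
                curr[j - 1] + gap,  # b[j-1] aligned to a gap
            )
        prev = curr
    return prev[m]
```

The table is filled row by row and only two rows are kept, so memory is linear in the length of `b` while time is proportional to the product of the two lengths.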

    On Tree-Constrained Matchings and Generalizations

    We consider the following Tree-Constrained Bipartite Matching problem: Given two rooted trees $T_1=(V_1,E_1)$, $T_2=(V_2,E_2)$ and a weight function $w: V_1\times V_2 \mapsto \mathbb{R}_+$, find a maximum weight matching $\mathcal{M}$ between nodes of the two trees, such that none of the matched nodes is an ancestor of another matched node in either of the trees. This generalization of the classical bipartite matching problem appears, for example, in the computational analysis of live cell video data. We show that the problem is $\mathcal{APX}$-hard and thus, unless $\mathcal{P} = \mathcal{NP}$, disprove a previous claim that it is solvable in polynomial time. Furthermore, we give a $2$-approximation algorithm based on a combination of the local ratio technique and a careful use of the structure of basic feasible solutions of a natural LP-relaxation, which we also show to have an integrality gap of $2-o(1)$. In the second part of the paper, we consider a natural generalization of the problem, where trees are replaced by partially ordered sets (posets). We show that the local ratio technique gives a $2k\rho$-approximation for the $k$-dimensional matching generalization of the problem, in which the maximum number of incomparable elements below (or above) any given element in each poset is bounded by $\rho$. We finally give an almost matching integrality gap example, and an inapproximability result showing that the dependence on $\rho$ is most likely unavoidable.
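To make the feasibility condition in the problem statement concrete, the following sketch checks whether a given set of node pairs is a valid tree-constrained matching. It only verifies the constraint and is not the approximation algorithm from the paper. The representation (each tree as a dictionary mapping a node to its parent, with the root mapped to `None`) and the function names are assumptions made for this example.

```python
def ancestors(parent, v):
    """Yield the proper ancestors of v, walking up to the root."""
    v = parent.get(v)
    while v is not None:
        yield v
        v = parent.get(v)


def is_tree_constrained_matching(parent1, parent2, pairs):
    """Check that pairs is a matching with no matched ancestor/descendant pair.

    parent1, parent2: hypothetical parent maps for the two rooted trees.
    pairs: iterable of (node in tree 1, node in tree 2).
    """
    pairs = list(pairs)
    left = [u for u, _ in pairs]
    right = [v for _, v in pairs]
    # A matching uses every node at most once.
    if len(set(left)) != len(left) or len(set(right)) != len(right):
        return False
    # No matched node may have a matched proper ancestor in its own tree.
    for parent, matched in ((parent1, set(left)), (parent2, set(right))):
        for v in matched:
            if any(a in matched for a in ancestors(parent, v)):
                return False
    return True
```

For example, with a tree whose root `r` has children `a` and `b`, and `a` has a child `c`, matching both `a` and `c` is rejected because `a` is an ancestor of `c`, while matching `a` and `b` is accepted.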

    Computing H/D-Exchange rates of single residues from data of proteolytic fragments

    Background: Protein conformation and protein/protein interaction can be elucidated by solution-phase Hydrogen/Deuterium exchange (sHDX) coupled to high-resolution mass analysis of the digested protein or protein complex. In sHDX experiments mutant proteins are compared to wild-type proteins or a ligand is added to the protein and compared to the wild-type protein (or mutant). The number of deuteriums incorporated into the polypeptides generated from the protease digest of the protein is related to the solvent accessibility of amide protons within the original protein construct. Results: In this work, sHDX data was collected on a 14.5 T FT-ICR MS. An algorithm was developed based on combinatorial optimization that predicts deuterium exchange with high spatial resolution based on the sHDX data of overlapping proteolytic fragments. Often the algorithm assigns deuterium exchange with single residue resolution. Conclusions: With our new method it is possible to automatically determine deuterium exchange with higher spatial resolution than the level of digested fragments.

    Shape Distributions and Protein Similarity


    {LASA}: A Tool for Non-heuristic Alignment of Multiple Sequences


    The duplication-loss small phylogeny problem: from cherries to trees.

    The reconstruction of the history of evolutionary genome-wide events among a set of related organisms is of great biological interest since it can help to reveal the genomic basis of phenotypes. The sequencing of whole genomes facilitates the study of gene families that vary in size through duplication and loss events, like transfer RNA. However, a high sequence similarity often does not allow one to distinguish between orthologs and paralogs. Previous methods have addressed this difficulty by taking into account flanking regions of members of a family independently. We go one step further by inferring the order of genes of (a set of) families for ancestral genomes by considering the order of these genes on sequenced genomes. We present a novel branch-and-cut algorithm to solve the two-species small phylogeny problem in the evolutionary model of duplications and losses. On average, our implementation, DupLoCut, improves the running time of a recently proposed method in the experiments on six Vibrionaceae lineages by a factor of ∼200. Besides the mere improvement in running time, the efficiency of our approach allows us to extend our model from cherries of a species tree, that is, subtrees with two leaves, to the median of three species setting. Being able to determine the median of three species is of key importance to one of the most common approaches to ancestral reconstruction, and our experiments show that its repeated computation considerably reduces the number of duplications and losses along the tree both on simulated instances comprising 128 leaves and a set of Bacillus genomes. Furthermore, in our simulations we show that a reduction in cost goes hand in hand with an improvement of the predicted ancestral genomes. Finally, we prove that the small phylogeny problem in the duplication-loss model is NP-complete already for two species.