27,305 research outputs found
Algorithm engineering for optimal alignment of protein structure distance matrices
Protein structural alignment is an important problem in computational
biology. In this paper, we present first successes on provably optimal pairwise
alignment of protein inter-residue distance matrices, using the popular Dali
scoring function. We introduce the structural alignment problem formally, which
enables us to express a variety of scoring functions used in previous work as
special cases in a unified framework. Further, we propose the first
mathematical model for computing optimal structural alignments based on dense
inter-residue distance matrices. We therefore reformulate the problem as a
special graph problem and give a tight integer linear programming model. We
then present algorithm engineering techniques to handle the huge integer linear
programs of real-life distance matrix alignment problems. Applying these
techniques, we can compute provably optimal Dali alignments for the very first
time
Integer Linear Programming for Sequence Problems: A general approach to reduce the problem size
Sequence problems belong to the most challenging interdisciplinary topics
of the actuality. They are ubiquitous in science and daily life and occur, for
example, in form of DNA sequences encoding all information of an
organism, as a text (natural or formal) or in form of a computer program.
Therefore, sequence problems occur in many variations in computational
biology (drug development), coding theory, data compression, quantitative
and computational linguistics (e.g. machine translation).
In recent years appeared some proposals to formulate sequence
problems like the closest string problem (CSP) and the farthest string
problem (FSP) as an Integer Linear Programming Problem (ILPP). In the
present talk we present a general novel approach to reduce the size of the
ILPP by grouping isomorphous columns of the string matrix together. The
approach is of practical use, since the solution of sequence problems is very
time consuming, in particular when the sequences are long.Universidad de Málaga. Campus de Excelencia Internacional AndalucĂa Tech
Large neighborhood search for the most strings with few bad columns problem
In this work, we consider the following NP-hard combinatorial optimization problem from computational biology. Given a set of input strings of equal length, the goal is to identify a maximum cardinality subset of strings that differ maximally in a pre-defined number of positions. First of all, we introduce an integer linear programming model for this problem. Second, two variants of a rather simple greedy strategy are proposed. Finally, a large neighborhood search algorithm is presented. A comprehensive experimental comparison among the proposed techniques shows, first, that larger neighborhood search generally outperforms both greedy strategies. Second, while large neighborhood search shows to be competitive with the stand-alone application of CPLEX for small- and medium-sized problem instances, it outperforms CPLEX in the context of larger instances.Peer ReviewedPostprint (author's final draft
A Lagrangian relaxation approach for the multiple sequence alignment problem
We present a branch-and-bound (bb) algorithm for the multiple sequence alignment problem (MSA), one of the most important problems in computational biology. The upper bound at each bb node is based on a Lagrangian relaxation of an integer linear programming formulation for MSA. Dualizing certain inequalities, the Lagrangian subproblem becomes a pairwise alignment problem, which can be solved efficiently by a dynamic programming approach. Due to a reformulation w.r.t. additionally introduced variables prior to relaxation we improve the convergence rate dramatically while at the same time being able to solve the Lagrangian problem efficiently. Our experiments show that our implementation, although preliminary, outperforms all exact algorithms for the multiple sequence alignment problem
An exact mathematical programming approach to multiple RNA sequence-structure alignment
One of the main tasks in computational biology is the computation of
alignments of genomic sequences to reveal their commonalities. In case of DNA
or protein sequences, sequence information alone is usually sufficient to
compute reliable alignments. RNA molecules, however, build spatial
conformations—the secondary structure—that are more conserved than the actual
sequence. Hence, computing reliable alignments of RNA molecules has to take
into account the secondary structure. We present a novel framework for the
computation of exact multiple sequence-structure alignments: We give a graph-
theoretic representation of the sequence-structure alignment problem and
phrase it as an integer linear program. We identify a class of constraints
that make the problem easier to solve and relax the original integer linear
program in a Lagrangian manner. Experiments on a recently published benchmark
show that our algorithms has a comparable performance than more costly dynamic
programming algorithms, and outperforms all other approaches in terms of
solution quality with an increasing number of input sequences
- …