A new graph-based method for pairwise global network alignment
Background: In addition to component-based comparative approaches, network alignments provide the means to study conserved network topology such as common pathways and more complex network motifs. Yet, unlike in classical sequence alignment, the comparison of networks becomes computationally more challenging, as most meaningful assumptions instantly lead to NP-hard problems. Most previous algorithmic work on network alignments is heuristic in nature. Results: We introduce the graph-based maximum structural matching formulation for pairwise global network alignment. We relate the formulation to previous work and prove NP-hardness of the problem. Based on the new formulation, we build upon recent results in computational structural biology and present a novel Lagrangian relaxation approach that, in combination with a branch-and-bound method, computes provably optimal network alignments. The Lagrangian algorithm alone is a powerful heuristic method, which produces solutions that are often near-optimal and, unlike those computed by pure heuristics, come with a quality guarantee. Conclusion: Computational experiments on the alignment of protein-protein interaction networks and on the classification of metabolic subnetworks demonstrate that the new method is reasonably fast and has advantages over pure heuristics. Our software tool is freely available as part of the LISA library.
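To illustrate the kind of objective a global network alignment optimizes, the following Python sketch scores a one-to-one node mapping by summing node-similarity weights and rewarding every edge the mapping conserves, then finds the best mapping by exhaustive search. The additive score, the `edge_bonus` parameter, and the function names are assumptions made for this example. It is not the maximum structural matching formulation or the Lagrangian/branch-and-bound solver described above, and brute force is usable only on toy graphs, which is exactly why the NP-hardness result matters.

```python
from itertools import permutations


def alignment_score(mapping, edges1, edges2, sim, edge_bonus=1.0):
    """Score a one-to-one mapping (dict: node of G1 -> node of G2)."""
    # Node term: similarity weight of every aligned node pair (0 if unspecified).
    score = sum(sim.get((u, v), 0.0) for u, v in mapping.items())
    # Edge term: each G1 edge whose endpoints map onto a G2 edge is conserved.
    e2 = {frozenset(e) for e in edges2}
    for a, b in edges1:
        if a in mapping and b in mapping and frozenset((mapping[a], mapping[b])) in e2:
            score += edge_bonus
    return score


def best_alignment(nodes1, nodes2, edges1, edges2, sim):
    """Exhaustive search over injective mappings; needs len(nodes1) <= len(nodes2)."""
    best, best_map = float("-inf"), None
    for image in permutations(nodes2, len(nodes1)):
        mapping = dict(zip(nodes1, image))
        s = alignment_score(mapping, edges1, edges2, sim)
        if s > best:
            best, best_map = s, mapping
    return best, best_map


# Two 3-node paths with matching similarities: the identity-like mapping wins.
score, mapping = best_alignment(
    ["a", "b", "c"], ["x", "y", "z"],
    [("a", "b"), ("b", "c")], [("x", "y"), ("y", "z")],
    {("a", "x"): 1.0, ("b", "y"): 1.0, ("c", "z"): 1.0},
)
print(score, mapping)  # 5.0 {'a': 'x', 'b': 'y', 'c': 'z'}
```

The number of candidate mappings grows factorially with graph size, so exact methods need bounds (such as those from Lagrangian relaxation) to prune the search.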
Next Generation Cluster Editing
This work aims at improving the quality of structural variant prediction from
the mapped reads of a sequenced genome. We suggest a new model based on cluster
editing in weighted graphs and introduce a new heuristic algorithm that solves
this problem quickly and with good approximation quality on the huge graphs
that arise from biological datasets.
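The cost being minimized in weighted cluster editing can be sketched as follows, assuming the common convention that a positive weight marks an existing edge (deleting it costs that weight) and a negative weight marks a missing one (inserting it costs the absolute value). The greedy vertex-moving loop is a generic local-search heuristic chosen only for illustration; it is not the heuristic introduced in this work, and all names are hypothetical.

```python
from itertools import combinations


def editing_cost(nodes, weight, label):
    """Cost of turning the weighted graph into the cluster graph given by `label`.

    weight[(u, v)] > 0: edge present, deleting it costs the weight;
    weight[(u, v)] < 0: edge absent, inserting it costs -weight; missing pairs count 0.
    """
    cost = 0.0
    for u, v in combinations(nodes, 2):
        w = weight.get((u, v), weight.get((v, u), 0.0))
        if label[u] == label[v] and w < 0:
            cost -= w  # same cluster but no edge: insertion
        elif label[u] != label[v] and w > 0:
            cost += w  # different clusters but an edge: deletion
    return cost


def greedy_cluster_editing(nodes, weight):
    """Local search: move single vertices between clusters while the cost drops."""
    label = {v: i for i, v in enumerate(nodes)}  # start from singleton clusters
    best = editing_cost(nodes, weight, label)
    improved = True
    while improved:
        improved = False
        for v in nodes:
            current = label[v]
            # Candidate targets: every existing cluster plus a fresh singleton.
            for cand in set(label.values()) | {max(label.values()) + 1}:
                if cand == current:
                    continue
                label[v] = cand
                cost = editing_cost(nodes, weight, label)
                if cost < best:
                    best, current, improved = cost, cand, True
            label[v] = current  # keep the best cluster found for v
    return best, label
```

Recomputing the full cost for every trial move is quadratic per evaluation; a practical heuristic on large read-mapping graphs would update costs incrementally.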
On optimal comparability editing with applications to molecular diagnostics
Background: The Comparability Editing problem appears in the context of hierarchical disease classification based on noisy data. We are given a directed graph G representing hierarchical relationships between patient subgroups. The task is to identify the minimum number of edge insertions or deletions needed to transform G into a transitive graph; that is, if edges (u, v) and (v, w) are present, then edge (u, w) must be present, too. Results: We present two new approaches for the problem based on fixed-parameter algorithmics and integer linear programming. In contrast to previously used heuristics, our approaches compute provably optimal solutions. Conclusion: Our computational results demonstrate that our exact algorithms are far more efficient in practice than a previously used heuristic approach. In addition to their superior running time, our algorithms are capable of enumerating all optimal solutions and naturally solve the weighted version of the problem.
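A minimal Python sketch of the transitivity condition stated above: `conflict_triples` lists the triples that violate it, and `min_transitivity_edits` finds a minimum number of arc insertions and deletions by trying every possible arc set. This exhaustive search is exponential and stands in for neither the fixed-parameter algorithm nor the integer linear program; it assumes unweighted arcs, no self-loops, and (as an assumption about the definition) only triples of pairwise distinct vertices.

```python
from itertools import permutations, product


def conflict_triples(nodes, arcs):
    """Triples (u, v, w) of distinct vertices with u->v and v->w but no u->w."""
    return [(u, v, w) for u, v, w in permutations(nodes, 3)
            if (u, v) in arcs and (v, w) in arcs and (u, w) not in arcs]


def is_transitive(nodes, arcs):
    return not conflict_triples(nodes, arcs)


def min_transitivity_edits(nodes, arcs):
    """Exhaustive search over all arc sets; feasible only for very small graphs."""
    universe = list(permutations(nodes, 2))  # every possible arc, no loops
    original = set(arcs)
    best_cost, best_arcs = None, None
    for bits in product((0, 1), repeat=len(universe)):
        cand = {arc for arc, keep in zip(universe, bits) if keep}
        if not is_transitive(nodes, cand):
            continue
        cost = len(cand ^ original)  # insertions + deletions
        if best_cost is None or cost < best_cost:
            best_cost, best_arcs = cost, cand
    return best_cost, best_arcs
```

Every conflict triple must be destroyed by at least one of three edits (delete u->v, delete v->w, or insert u->w), which is the branching structure that fixed-parameter algorithms for this kind of problem typically exploit.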
An exact mathematical programming approach to multiple RNA sequence-structure alignment
One of the main tasks in computational biology is the computation of
alignments of genomic sequences to reveal their commonalities. In the case of DNA
or protein sequences, sequence information alone is usually sufficient to
compute reliable alignments. RNA molecules, however, fold into spatial
conformations (the secondary structure) that are more conserved than the actual
sequence. Hence, computing reliable alignments of RNA molecules has to take
into account the secondary structure. We present a novel framework for the
computation of exact multiple sequence-structure alignments: We give a graph-
theoretic representation of the sequence-structure alignment problem and
phrase it as an integer linear program. We identify a class of constraints
that make the problem easier to solve and relax the original integer linear
program in a Lagrangian manner. Experiments on a recently published benchmark
show that our algorithms perform comparably to more costly dynamic
programming algorithms and outperform all other approaches in terms of
solution quality as the number of input sequences increases.
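To make "relaxing an integer linear program in a Lagrangian manner" concrete, the sketch below applies the idea to a toy 0/1 knapsack problem rather than to the sequence-structure alignment program. The capacity constraint is moved into the objective with a multiplier `lam`. Each relaxed subproblem is solved by inspection and yields an upper bound, a simple repair step yields a feasible lower bound, and a subgradient step updates `lam`. The problem, the diminishing step-size rule, and the repair heuristic are all assumptions chosen for illustration.

```python
def lagrangian_knapsack(c, a, b, iters=200, step0=1.0):
    """Lagrangian relaxation of max sum c[i]x[i] s.t. sum a[i]x[i] <= b, x binary.

    For any lam >= 0, L(lam) = lam*b + sum(max(0, c[i] - lam*a[i])) is an upper
    bound. Assumes positive weights a[i]. Returns (lower, upper, best feasible x).
    """
    n = len(c)
    order = sorted(range(n), key=lambda i: c[i] / a[i])  # ascending profit/weight
    lam, best_upper = 0.0, float("inf")
    best_lower, best_x = 0, [0] * n
    for k in range(1, iters + 1):
        # Relaxed subproblem: take item i iff its reduced profit is positive.
        x = [1 if c[i] - lam * a[i] > 0 else 0 for i in range(n)]
        upper = lam * b + sum(c[i] - lam * a[i] for i in range(n) if x[i])
        best_upper = min(best_upper, upper)
        # Primal heuristic: drop worst-ratio items until feasible, then refill.
        feas = x[:]
        load = sum(a[i] for i in range(n) if feas[i])
        for i in order:
            if load <= b:
                break
            if feas[i]:
                feas[i], load = 0, load - a[i]
        for i in reversed(order):
            if not feas[i] and load + a[i] <= b:
                feas[i], load = 1, load + a[i]
        value = sum(c[i] for i in range(n) if feas[i])
        if value > best_lower:
            best_lower, best_x = value, feas
        if best_upper - best_lower < 1e-9:
            break  # bounds meet: best_x is provably optimal
        # Subgradient of L at lam is b - sum a[i]x[i]; step to decrease L.
        g = b - sum(a[i] for i in range(n) if x[i])
        lam = max(0.0, lam - (step0 / k) * g)
    return best_lower, best_upper, best_x


print(lagrangian_knapsack([10, 7, 5], [5, 4, 3], 8))
```

Even when the bounds do not meet, the difference between the best upper bound and the best feasible value limits how far that solution can be from the optimum; an exact method would close the remaining gap by branch-and-bound.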
Antilope - A Lagrangian Relaxation Approach to the de novo Peptide Sequencing Problem
Peptide sequencing from mass spectrometry data is a key step in proteome
research. De novo sequencing in particular, the identification of a peptide from
its spectrum alone, remains a challenge even for state-of-the-art algorithmic
approaches. In this paper we present Antilope, a new fast and flexible approach
based on mathematical programming. It builds on the spectrum graph model and
works with a variety of scoring schemes. Antilope combines Lagrangian
relaxation for solving an integer linear programming formulation with an
adaptation of Yen's k shortest paths algorithm. It shows a significant
improvement in running time compared to mixed integer optimization and performs
at the same speed as other state-of-the-art tools. We also implemented a
generic probabilistic scoring scheme that can be trained automatically for a
dataset of annotated spectra and is independent of the mass spectrometer type.
Evaluations on benchmark data show that Antilope is competitive with the popular
state-of-the-art programs PepNovo and NovoHMM both in terms of run time and
accuracy. Furthermore, it offers increased flexibility in the number of
considered ion types. Antilope will be freely available as part of the
open-source proteomics library OpenMS.
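A rough Python sketch of the spectrum graph model: nodes are prefix masses, and two nodes are joined when their mass difference matches an amino acid residue mass within a tolerance. Instead of Antilope's Lagrangian relaxation and its adaptation of Yen's k shortest paths algorithm, it uses a plain dynamic program to find the single highest-scoring path through the resulting DAG. It assumes idealized prefix masses with no ion-type offsets, uses only five monoisotopic residue masses, and takes peak scores as given; the function names are hypothetical.

```python
# Monoisotopic residue masses (Da) for a small subset of amino acids.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203,
                "P": 97.05276, "V": 99.06841}


def spectrum_graph(masses, tol=0.01):
    """Edges (i, j, residue) between sorted prefix masses differing by one residue."""
    edges = []
    for i in range(len(masses)):
        for j in range(i + 1, len(masses)):
            diff = masses[j] - masses[i]
            for aa, m in RESIDUE_MASS.items():
                if abs(diff - m) <= tol:
                    edges.append((i, j, aa))
    return edges


def best_sequence(peaks, tol=0.01):
    """Highest-scoring path from the lightest to the heaviest node (DP on a DAG).

    peaks: list of (prefix_mass, score); returns the residue string or None.
    """
    peaks = sorted(peaks)
    edges = spectrum_graph([m for m, _ in peaks], tol)
    n = len(peaks)
    best = [float("-inf")] * n  # best path score from node 0 to node j
    back = [None] * n           # (predecessor, residue) for reconstruction
    best[0] = peaks[0][1]
    # Edges are generated in increasing i, so best[i] is final before it is used.
    for i, j, aa in edges:
        if best[i] + peaks[j][1] > best[j]:
            best[j] = best[i] + peaks[j][1]
            back[j] = (i, aa)
    if best[-1] == float("-inf"):
        return None  # no residue path explains the full mass
    seq, j = [], n - 1
    while back[j] is not None:
        i, aa = back[j]
        seq.append(aa)
        j = i
    return "".join(reversed(seq))


# Prefix masses of "GASP" plus a high-scoring noise node (after "GV") that leads nowhere.
peaks = [(0.0, 1.0), (57.02146, 1.0), (128.05857, 1.0),
         (156.08987, 5.0), (215.0906, 1.0), (312.14336, 1.0)]
print(best_sequence(peaks))  # GASP
```

Returning k ranked candidate paths instead of one, as Yen's algorithm does, lets a downstream scoring scheme rerank sequences that a single-path dynamic program would discard.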
Accurate multiple sequence-structure alignment of RNA sequences using combinatorial optimization
Background: The discovery of functional non-coding RNA sequences has led to an increasing interest in algorithms related to RNA analysis. Traditional sequence alignment algorithms, however, fail to compute reliable alignments of low-homology RNA sequences. The spatial conformation of RNA sequences largely determines their function, and therefore RNA alignment algorithms have to take structural information into account. Results: We present a graph-based representation for sequence-structure alignments, which we model as an integer linear program (ILP). We sketch how we compute an optimal or near-optimal solution to the ILP using methods from combinatorial optimization, and present results on a recently published benchmark set for RNA alignments. Conclusions: The implementation of our algorithm yields better alignments in terms of two published scores than the other programs that we tested. This is especially the case with an increasing number of input sequences.
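The graph-based view of a sequence-structure alignment can be illustrated by scoring one fixed alignment with two kinds of edges: alignment edges between aligned positions and interaction edges for base pairs. The sketch below assumes valid dot-bracket secondary structures, a score of 1 per identical aligned nucleotide, and a hypothetical `pair_bonus` for each base pair aligned onto a base pair. It performs no optimization; finding the best such alignment is the job of the ILP.

```python
def base_pairs(dot_bracket):
    """Parse a (balanced) dot-bracket string into a set of (i, j) pairs, i < j."""
    stack, pairs = [], set()
    for j, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pairs.add((stack.pop(), j))
    return pairs


def sequence_structure_score(seq1, seq2, s1, s2, alignment, pair_bonus=2.0):
    """alignment: list of (i, k) aligned positions, assumed increasing in i and k."""
    # Alignment edges: +1 for every pair of identical aligned nucleotides.
    score = sum(1.0 for i, k in alignment if seq1[i] == seq2[k])
    # Interaction edges: bonus when a base pair of s1 is aligned onto one of s2.
    mapped = dict(alignment)
    p2 = base_pairs(s2)
    for i, j in base_pairs(s1):
        if i in mapped and j in mapped and (mapped[i], mapped[j]) in p2:
            score += pair_bonus
    return score
```

Here the two interaction terms reward conserved base pairs even where compensatory mutations (G-C in one sequence, C-G in the other) reduce the purely sequence-based score.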
A Realistic Model under which the Genetic Code is Optimal
The genetic code has a high level of error robustness. Using values of
hydrophobicity scales as a proxy for amino acid character, and the Mean Square
measure as a function quantifying error robustness, a value can be obtained for
a genetic code which reflects the error robustness of that code. By comparing
this value with a distribution of values belonging to codes generated by random
permutations of amino acid assignments, the level of error robustness of a
genetic code can be quantified. We present a calculation in which the standard
genetic code is shown to be optimal. We obtain this result by (1) using
recently updated values of polar requirement as input; (2) fixing seven
assignments (Ile, Trp, His, Phe, Tyr, Arg, and Leu) based on aptamer
considerations; and (3) using known biosynthetic relations of the 20 amino
acids. This last point is reflected in an approach of subdivision (restricting
the random reallocation of assignments to amino acid subgroups, the set of 20
being divided into four such subgroups). The three approaches to explaining the
robustness of the code (specific selection for robustness, amino acid-RNA
interactions leading to assignments, or a slow growth process of assignment
patterns) are reexamined in light of our findings. We offer a comprehensive
hypothesis, stressing the importance of biosynthetic relations, with the code
evolving from an early stage with just glycine and alanine, via intermediate
stages, towards 64 codons carrying today's meaning.
Comment: 22 pages, 3 figures, 4 tables. Journal of Molecular Evolution, July 201
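A minimal Python sketch of the permutation test described above, assuming the common definition of the Mean Square measure (the average squared property change over all single-base substitutions between sense codons). It substitutes Kyte-Doolittle hydropathy for the updated polar requirement values and performs an unrestricted shuffle of amino acids among the synonymous codon blocks. It does not fix the seven aptamer-based assignments or apply the four-subgroup subdivision, so its numbers should not be read as reproducing the study's result.

```python
import random

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in TCAG order; '*' = stop.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {x + y + z: AA[16 * i + 4 * j + k]
        for i, x in enumerate(BASES)
        for j, y in enumerate(BASES)
        for k, z in enumerate(BASES)}

# Kyte-Doolittle hydropathy: a stand-in scale, not the polar requirement values.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def mean_square(code, prop):
    """Mean squared property change over single-base changes between sense codons."""
    total, count = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = code[codon[:pos] + base + codon[pos + 1:]]
                if mutant != "*":
                    total += (prop[aa] - prop[mutant]) ** 2
                    count += 1
    return total / count


def random_code(code, rng):
    """Permute amino acids among the synonymous codon blocks; stops stay fixed."""
    aas = sorted({aa for aa in code.values() if aa != "*"})
    perm = dict(zip(aas, rng.sample(aas, len(aas))))
    return {codon: aa if aa == "*" else perm[aa] for codon, aa in code.items()}


def fraction_better(code, prop, trials=1000, seed=0):
    """MS of `code` and the fraction of random codes with a strictly lower MS."""
    rng = random.Random(seed)
    ms = mean_square(code, prop)
    better = sum(mean_square(random_code(code, rng), prop) < ms for _ in range(trials))
    return ms, better / trials
```

Restricting the shuffle (keeping some assignments fixed, or permuting only within amino acid subgroups) shrinks the space of alternative codes, which is how the approach described above changes the reference distribution.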
eXamine: a Cytoscape app for exploring annotated modules in networks
Background. Biological networks have growing importance for the
interpretation of high-throughput "omics" data. Statistical and combinatorial
methods yield mechanistic insights through the extraction of smaller
subnetwork modules. Further enrichment analyses provide set-based annotations
of these modules.
Results. We present eXamine, a set-oriented visual analysis approach for
annotated modules that displays set membership as contours on top of a
node-link layout. Our approach extends Self-Organizing Maps to
simultaneously lay out nodes, links, and set contours.
Conclusions. We implemented eXamine as a freely available Cytoscape app.
Using eXamine we study a module that is activated by the virally-encoded
G-protein coupled receptor US28 and formulate a novel hypothesis about its
function.
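As background for the layout technique, here is a generic self-organizing-map style graph layout in Python, in the spirit of the inverted SOM approach to graph drawing. Random stimulus points are drawn in the unit square, the closest node "wins", and the winner and its graph neighbours within a shrinking hop radius move toward the stimulus. The parameters and names are assumptions. This sketch does not reproduce eXamine's extension for simultaneously laying out set contours.

```python
import random
from collections import deque


def graph_distances(adj, source, max_depth):
    """Breadth-first hop distances from source, truncated at max_depth."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_depth:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def isom_layout(adj, epochs=500, radius=2, alpha=0.8, seed=0):
    """Self-organizing-map style layout of an undirected graph in the unit square."""
    rng = random.Random(seed)
    pos = {v: [rng.random(), rng.random()] for v in adj}
    for t in range(epochs):
        cool = 1.0 - t / epochs              # linearly decaying schedule
        px, py = rng.random(), rng.random()  # random stimulus point
        winner = min(pos, key=lambda v: (pos[v][0] - px) ** 2 + (pos[v][1] - py) ** 2)
        r = round(radius * cool)             # shrinking graph neighbourhood
        for v, d in graph_distances(adj, winner, r).items():
            f = alpha * cool * 2.0 ** (-d)   # nodes fewer hops away move more
            pos[v][0] += f * (px - pos[v][0])
            pos[v][1] += f * (py - pos[v][1])
    return {v: tuple(p) for v, p in pos.items()}
```

Because each update moves a node only part of the way toward a point inside the unit square, positions stay inside the square, and graph neighbours are pulled toward the same regions, so connected nodes tend to end up close together.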