Next Generation Cluster Editing

Author: Bellitto Thomas
Klau Gunnar W.
Marschall Tobias
Schönhuth Alexander
Publication date: 01/01/2013

This work aims to improve the quality of structural variant prediction from the mapped reads of a sequenced genome. We propose a new model based on cluster editing in weighted graphs and introduce a new heuristic algorithm that solves this problem quickly and with good approximation quality on the huge graphs that arise from biological datasets.

University of New Brunswick: Centre for Digital Scholarship Journals

An exact mathematical programming approach to multiple RNA sequence-structure alignment

Author: Bauer Markus
Klau Gunnar W.
Reinert Knut
Publication date: 01/01/2007

One of the main tasks in computational biology is the computation of alignments of genomic sequences to reveal their commonalities. In the case of DNA or protein sequences, sequence information alone is usually sufficient to compute reliable alignments. RNA molecules, however, form spatial conformations (the secondary structure) that are more conserved than the actual sequence. Hence, computing reliable alignments of RNA molecules has to take the secondary structure into account. We present a novel framework for the computation of exact multiple sequence-structure alignments: we give a graph-theoretic representation of the sequence-structure alignment problem and phrase it as an integer linear program. We identify a class of constraints that make the problem easier to solve and relax the original integer linear program in a Lagrangian manner. Experiments on a recently published benchmark show that our algorithm performs comparably to more costly dynamic programming algorithms and, as the number of input sequences increases, outperforms all other approaches in terms of solution quality.

Institutional Repository of the Freie Universität Berlin

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

On optimal comparability editing with applications to molecular diagnostics

Author: Briesemeister Sebastian
Böcker Sebastian
Klau Gunnar W.
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: The COMPARABILITY EDITING problem appears in the context of hierarchical disease classification based on noisy data. We are given a directed graph G representing hierarchical relationships between patient subgroups. The task is to identify the minimum number of edge insertions or deletions to transform G into a transitive graph, that is, if edges (u, v) and (v, w) are present then edge (u, w) must be present, too. Results: We present two new approaches for the problem based on fixed-parameter algorithmics and integer linear programming. In contrast to previously used heuristics, our approaches compute provably optimal solutions. Conclusion: Our computational results demonstrate that our exact algorithms are by far more efficient in practice than a previously used heuristic approach. In addition to the superior running time performance, our algorithms are capable of enumerating all optimal solutions, and naturally solve the weighted version of the problem.
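
The transitivity condition stated in this abstract can be illustrated with a minimal Python sketch. It only detects violated triples; it is not the fixed-parameter or integer linear programming method of the paper. The function names and the toy graph are hypothetical, and the sketch assumes a loop-free graph in which the condition is checked for three distinct vertices.

```python
from itertools import product


def transitivity_violations(nodes, edges):
    """Return all triples (u, v, w) of distinct vertices such that
    edges (u, v) and (v, w) are present but edge (u, w) is missing.

    Assumption (not from the abstract): self-loops are ignored, so the
    condition is only checked for three distinct vertices.
    """
    edge_set = set(edges)
    violations = []
    for u, v, w in product(nodes, repeat=3):
        if len({u, v, w}) != 3:
            continue  # skip triples with repeated vertices
        if (u, v) in edge_set and (v, w) in edge_set and (u, w) not in edge_set:
            violations.append((u, v, w))
    return violations


def is_transitive(nodes, edges):
    """A graph is transitive exactly when no violating triple exists."""
    return not transitivity_violations(nodes, edges)


# Hypothetical toy instance: a -> b -> c without the edge a -> c.
nodes = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c")]

print(transitivity_violations(nodes, edges))        # [('a', 'b', 'c')]
print(is_transitive(nodes, edges + [("a", "c")]))   # True
```

In the toy instance a single edit suffices: either insert (a, c) or delete one of the two existing edges. Finding a minimum set of such edits on larger graphs is the optimization problem the abstract refers to.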

Springer - Publisher Connector

Crossref

Antilope - A Lagrangian Relaxation Approach to the de novo Peptide Sequencing Problem

Author: Andreotti Sandro
Klau Gunnar W.
Reinert Knut
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2011

Peptide sequencing from mass spectrometry data is a key step in proteome research. De novo sequencing in particular, the identification of a peptide from its spectrum alone, is still a challenge even for state-of-the-art algorithmic approaches. In this paper we present Antilope, a new fast and flexible approach based on mathematical programming. It builds on the spectrum graph model and works with a variety of scoring schemes. Antilope combines Lagrangian relaxation for solving an integer linear programming formulation with an adaptation of Yen's k shortest paths algorithm. It shows a significant improvement in running time compared to mixed integer optimization and performs at the same speed as other state-of-the-art tools. We also implemented a generic probabilistic scoring scheme that can be trained automatically for a dataset of annotated spectra and is independent of the mass spectrometer type. Evaluations on benchmark data show that Antilope is competitive with the popular state-of-the-art programs PepNovo and NovoHMM both in terms of run time and accuracy. Furthermore, it offers increased flexibility in the number of considered ion types. Antilope will be freely available as part of the open source proteomics library OpenMS.

VU Research Portal

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

Accurate multiple sequence-structure alignment of RNA sequences using combinatorial optimization

Author: Bauer Markus
Klau Gunnar W.
Reinert Knut
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: The discovery of functional non-coding RNA sequences has led to an increasing interest in algorithms related to RNA analysis. Traditional sequence alignment algorithms, however, fail at computing reliable alignments of low-homology RNA sequences. The spatial conformation of RNA sequences largely determines their function, and therefore RNA alignment algorithms have to take structural information into account. Results: We present a graph-based representation for sequence-structure alignments, which we model as an integer linear program (ILP). We sketch how we compute an optimal or near-optimal solution to the ILP using methods from combinatorial optimization, and present results on a recently published benchmark set for RNA alignments. Conclusions: The implementation of our algorithm yields better alignments in terms of two published scores than the other programs that we tested. This is especially the case with an increasing number of input sequences.

Institutional Repository of the Freie Universität Berlin

Springer - Publisher Connector

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

Bioinformatics Methods and Biological Interpretation for Next-Generation Sequencing Data

Author: Feng Weixing
Klau Gunnar W.
Liu Yunlong
Wang Guohua
Zhu Dongxiao
Publication venue: Hindawi Limited
Publication date: 01/01/2015

VU Research Portal

IUPUIScholarWorks

A Realistic Model under which the Genetic Code is Optimal

Author: Buhrman Harry
Klau Gunnar W.
Schaffner Christian
Speijer Dave
Stougie Leen
van der Gulik Peter T. S.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

The genetic code has a high level of error robustness. Using values of hydrophobicity scales as a proxy for amino acid character, and the Mean Square measure as a function quantifying error robustness, a value can be obtained for a genetic code which reflects the error robustness of that code. By comparing this value with a distribution of values belonging to codes generated by random permutations of amino acid assignments, the level of error robustness of a genetic code can be quantified. We present a calculation in which the standard genetic code is shown to be optimal. We obtain this result by (1) using recently updated values of polar requirement as input; (2) fixing seven assignments (Ile, Trp, His, Phe, Tyr, Arg, and Leu) based on aptamer considerations; and (3) using known biosynthetic relations of the 20 amino acids. This last point is reflected in an approach of subdivision (restricting the random reallocation of assignments to amino acid subgroups, the set of 20 being divided into four such subgroups). The three approaches to explaining the robustness of the code (specific selection for robustness, amino acid-RNA interactions leading to assignments, or a slow growth process of assignment patterns) are reexamined in light of our findings. We offer a comprehensive hypothesis, stressing the importance of biosynthetic relations, with the code evolving from an early stage with just glycine and alanine, via intermediate stages, towards 64 codons carrying today's meaning. Comment: 22 pages, 3 figures, 4 tables; Journal of Molecular Evolution, July 201

VU Research Portal

Crossref

International Migration, Integration and Social Cohesion online publications

eXamine: a Cytoscape app for exploring annotated modules in networks

Author: Bucur Cristina-Iulia
Dinkla Kasper
El-Kebir Mohammed
Klau Gunnar W.
Siderius Marco
Smit Martine J.
Westenberg Michel A.
Publication date: 01/01/2013

Background. Biological networks have growing importance for the interpretation of high-throughput "omics" data. Statistical and combinatorial methods yield mechanistic insights through the extraction of smaller subnetwork modules. Further enrichment analyses provide set-based annotations of these modules. Results. We present eXamine, a set-oriented visual analysis approach for annotated modules that displays set membership as contours on top of a node-link layout. Our approach extends Self-Organizing Maps to simultaneously lay out nodes, links, and set contours. Conclusions. We implemented eXamine as a freely available Cytoscape app. Using eXamine, we study a module that is activated by the virally encoded G-protein-coupled receptor US28 and formulate a novel hypothesis about its functioning.

Repository TU/e