178 research outputs found
Hybrid techniques based on solving reduced problem instances for a longest common subsequence problem
Finding the longest common subsequence of a given set of input strings is a relevant problem arising in various practical settings. One of these problems is the so-called longest arc-preserving common subsequence problem. This NP-hard combinatorial optimization problem was introduced for the comparison of arc-annotated ribonucleic acid (RNA) sequences. In this work we present an integer linear programming (ILP) formulation of the problem. As even in the context of rather small problem instances the application of a general purpose ILP solver is not viable due to the size of the model, we study alternative ways based on model reduction in order to take profit from this ILP model. First, we present a heuristic way for reducing the model, with the subsequent application of an ILP solver. Second, we propose the application of an iterative hybrid algorithm that makes use of an ILP solver for generating high quality solutions at each iteration. Experimental results concerning artificial and real problem instances show that the proposed techniques outperform an available technique from the literature.Peer ReviewedPostprint (author's final draft
A new approach to feature extraction for RNA structure comparision
In recent years, RNA structural comparison becomes a crucial problem in bioinformatics research. Generally, it is a popular approach for representing the RNA secondary structures with arc-annotation sets. Several methods can be used to compare two RNA structures, such as tree edit distance, longest arc-preserving common subsequence (LAPCS) and stem based alignment. However, these methods may be helpful only for small RNA structures because of their high time complexity. In this thesis, we propose a simplified method to compare two RNA structures in O(mn) time, where m and n are the lengths of the two RNA sequences, respectively. The method transforms the RNA structures into specific sequences called object sequences, then compare these object sequences to find their common substructures. The comparison method is tested with 118 RNA structures obtained from RNase P Database. For any two structures, it is important to identify whether they are in the same family by both structure comparison and sequence comparison. In the experiment, it is found that the method for comparing RNA structures can yield better hit rates and is faster than the traditional method to compare the RNA sequences. Therefore, the approach to extract and compare the RNA secondary structures is more sensitive in biology and more efficient in time complexity
What Makes the Arc-Preserving Subsequence Problem Hard?
International audienceGiven two arc-annotated sequences (S, P ) and (T, Q) representing RNA structures, the Arc-Preserving Subsequence (APS) problem asks whether (T, Q) can be obtained from (S, P ) by deleting some of its bases (together with their incident arcs, if any). In previous studies [3, 6], this problem has been naturally divided into subproblems reflecting intrinsic complexity of arc structures. We show that APS(Crossing, Plain) is NP-complete, thereby answering an open problem [6]. Furthermore, to get more insight into where actual border of APS hardness is, we refine APS classical subproblems in much the same way as in [11] and give a complete categorization among various restrictions of APS problem complexity
An exact mathematical programming approach to multiple RNA sequence-structure alignment
One of the main tasks in computational biology is the computation of
alignments of genomic sequences to reveal their commonalities. In case of DNA
or protein sequences, sequence information alone is usually sufficient to
compute reliable alignments. RNA molecules, however, build spatial
conformations—the secondary structure—that are more conserved than the actual
sequence. Hence, computing reliable alignments of RNA molecules has to take
into account the secondary structure. We present a novel framework for the
computation of exact multiple sequence-structure alignments: We give a graph-
theoretic representation of the sequence-structure alignment problem and
phrase it as an integer linear program. We identify a class of constraints
that make the problem easier to solve and relax the original integer linear
program in a Lagrangian manner. Experiments on a recently published benchmark
show that our algorithms has a comparable performance than more costly dynamic
programming algorithms, and outperforms all other approaches in terms of
solution quality with an increasing number of input sequences
A list of parameterized problems in bioinformatics
In this report we present a list of problems that originated in bionformatics. Our aim is to collect information on such problems that have been analyzed from the point of view of Parameterized Complexity. For every problem we give its definition and biological motivation together with known complexity results.Postprint (published version
Festparameter-Algorithmen fuer die Konsens-Analyse Genomischer Daten
Fixed-parameter algorithms offer a constructive and powerful approach
to efficiently obtain solutions for NP-hard problems combining two
important goals: Fixed-parameter algorithms compute optimal solutions
within provable time bounds despite the (almost inevitable)
computational intractability of NP-hard problems. The essential idea
is to identify one or more aspects of the input to a problem as the
parameters, and to confine the combinatorial explosion of
computational difficulty to a function of the parameters such that the
costs are polynomial in the non-parameterized part of the input. This
makes especially sense for parameters which have small values in
applications. Fixed-parameter algorithms have become an established
algorithmic tool in a variety of application areas, among them
computational biology where small values for problem parameters are
often observed. A number of design techniques for fixed-parameter
algorithms have been proposed and bounded search trees are one of
them. In computational biology, however, examples of bounded search
tree algorithms have been, so far, rare.
This thesis investigates the use of bounded search tree algorithms for
consensus problems in the analysis of DNA and RNA data. More
precisely, we investigate consensus problems in the contexts of
sequence analysis, of quartet methods for phylogenetic reconstruction,
of gene order analysis, and of RNA secondary structure comparison. In
all cases, we present new efficient algorithms that incorporate the
bounded search tree paradigm in novel ways. On our way, we also obtain
results of parameterized hardness, showing that the respective
problems are unlikely to allow for a fixed-parameter algorithm, and we
introduce integer linear programs (ILP's) as a tool for classifying
problems as fixed-parameter tractable, i.e., as having fixed-parameter
algorithms. Most of our algorithms were implemented and tested on
practical data.Festparameter-Algorithmen bieten einen konstruktiven Ansatz zur
Loesung von kombinatorisch schwierigen, in der Regel NP-harten
Problemen, der zwei Ziele beruecksichtigt: innerhalb von beweisbaren
Laufzeitschranken werden optimale Ergebnisse berechnet. Die
entscheidende Idee ist dabei, einen oder mehrere Aspekte der
Problemeingabe als Parameter der Problems aufzufassen und die
kombinatorische Explosion der algorithmischen Schwierigkeit auf diese
Parameter zu beschraenken, so dass die Laufzeitkosten polynomiell in
Bezug auf den nicht-parametrisierten Teil der Eingabe sind. Gibt es
einen Festparameter-Algorithmus fuer ein kombinatorisches Problem,
nennt man das Problem festparameter-handhabbar. Die Entwicklung von
Festparameter-Algorithmen macht vor allem dann Sinn, wenn die
betrachteten Parameter im Anwendungsfall nur kleine Werte
annehmen. Festparameter-Algorithmen sind zu einem algorithmischen
Standardwerkzeug in vielen Anwendungsbereichen geworden, unter anderem
in der algorithmischen Biologie, wo in vielen Anwendungen kleine
Parameterwerte beobachtet werden koennen. Zu den bekannten Techniken
fuer den Entwurf von Festparameter-Algorithmen gehoeren unter anderem
groessenbeschraenkte Suchbaeume. In der algorithmischen Biologie gibt
es bislang nur wenige Beispiele fuer die Anwendung von
groessenbeschraenkten Suchbaeumen.
Diese Arbeit untersucht den Einsatz groessenbeschraenkter Suchbaeume
fuer NP-harte Konsens-Probleme in der Analyse von DNS- und
RNS-Daten. Wir betrachten Konsens-Probleme in der Analyse von
DNS-Sequenzdaten, in der Analyse von sogenannten Quartettdaten zur
Erstellung von phylogenetischen Hypothesen, in der Analyse von Daten
ueber die Anordnung von Genen und beim Vergleich von
RNS-Strukturdaten. In allen Faellen stellen wir neue effiziente
Algorithmen vor, in denen das Paradigma der groessenbeschraenkten
Suchbaeume auf neuartige Weise realisiert wird. Auf diesem Weg zeigen
wir auch Ergebnisse parametrisierter Haerte, die zeigen, dass fuer
die dabei betrachteten Probleme ein Festparameter-Algorithmus
unwahrscheinlich ist. Ausserdem fuehren wir ganzzahliges lineares
Programmieren als eine neue Technik ein, um die
Festparameter-Handhabbarkeit eines Problems zu zeigen. Die Mehrzahl
der hier vorgestellten Algorithmen wurde implementiert und auf
Anwendungsdaten getestet
- …