225 research outputs found

    Parallel genetic algorithm and parallel simulated annealing algorithm for the closest string problem

    Get PDF
    In this paper, we design genetic algorithm and simulated annealing algorithm and their parallel versions to solve the Closest String problem. Our implementation and experiments show usefulness of the parallel GA and SA algorithms

    Genetic Design of Drugs Without Side-Effects

    Full text link

    Combinatorial and Probabilistic Approaches to Motif Recognition

    Get PDF
    Short substrings of genomic data that are responsible for biological processes, such as gene expression, are referred to as motifs. Motifs with the same function may not entirely match, due to mutation events at a few of the motif positions. Allowing for non-exact occurrences significantly complicates their discovery. Given a number of DNA strings, the motif recognition problem is the task of detecting motif instances in every given sequence without knowledge of the position of the instances or the pattern shared by these substrings. We describe a novel approach to motif recognition, and provide theoretical and experimental results that demonstrate its efficiency and accuracy. Our algorithm, MCL-WMR, builds an edge-weighted graph model of the given motif recognition problem and uses a graph clustering algorithm to quickly determine important subgraphs that need to be searched further for valid motifs. By considering a weighted graph model, we narrow the search dramatically to smaller problems that can be solved with significantly less computation. The Closest String problem is a subproblem of motif recognition, and it is NP-hard. We give a linear-time algorithm for a restricted version of the Closest String problem, and an efficient polynomial-time heuristic that solves the general problem with high probability. We initiate the study of the smoothed complexity of the Closest String problem, which in turn explains our empirical results that demonstrate the great capability of our probabilistic heuristic. Important to this analysis is the introduction of a perturbation model of the Closest String instances within which we provide a probabilistic analysis of our algorithm. The smoothed analysis suggests reasons why a well-known fixed parameter tractable algorithm solves Closest String instances extremely efficiently in practice. Although the Closest String model is robust to the oversampling of strings in the input, it is severely affected by the existence of outliers. We propose a refined model, the Closest String with Outliers problem, to overcome this limitation. A systematic parameterized complexity analysis accompanies the introduction of this problem, providing a surprising insight into the sensitivity of this problem to slightly different parameterizations. Through the application of probabilistic and combinatorial insights into the Closest String problem, we develop sMCL-WMR, a program that is much faster than its predecessor MCL-WMR. We apply and adapt sMCL-WMR and MCL-WMR to analyze the promoter regions of the canola seed-coat. Our results identify important regions of the canola genome that are responsible for specific biological activities. This knowledge may be used in the long-term aim of developing crop varieties with specific biological characteristics, such as being disease-resistant

    Festparameter-Algorithmen fuer die Konsens-Analyse Genomischer Daten

    Get PDF
    Fixed-parameter algorithms offer a constructive and powerful approach to efficiently obtain solutions for NP-hard problems combining two important goals: Fixed-parameter algorithms compute optimal solutions within provable time bounds despite the (almost inevitable) computational intractability of NP-hard problems. The essential idea is to identify one or more aspects of the input to a problem as the parameters, and to confine the combinatorial explosion of computational difficulty to a function of the parameters such that the costs are polynomial in the non-parameterized part of the input. This makes especially sense for parameters which have small values in applications. Fixed-parameter algorithms have become an established algorithmic tool in a variety of application areas, among them computational biology where small values for problem parameters are often observed. A number of design techniques for fixed-parameter algorithms have been proposed and bounded search trees are one of them. In computational biology, however, examples of bounded search tree algorithms have been, so far, rare. This thesis investigates the use of bounded search tree algorithms for consensus problems in the analysis of DNA and RNA data. More precisely, we investigate consensus problems in the contexts of sequence analysis, of quartet methods for phylogenetic reconstruction, of gene order analysis, and of RNA secondary structure comparison. In all cases, we present new efficient algorithms that incorporate the bounded search tree paradigm in novel ways. On our way, we also obtain results of parameterized hardness, showing that the respective problems are unlikely to allow for a fixed-parameter algorithm, and we introduce integer linear programs (ILP's) as a tool for classifying problems as fixed-parameter tractable, i.e., as having fixed-parameter algorithms. Most of our algorithms were implemented and tested on practical data.Festparameter-Algorithmen bieten einen konstruktiven Ansatz zur Loesung von kombinatorisch schwierigen, in der Regel NP-harten Problemen, der zwei Ziele beruecksichtigt: innerhalb von beweisbaren Laufzeitschranken werden optimale Ergebnisse berechnet. Die entscheidende Idee ist dabei, einen oder mehrere Aspekte der Problemeingabe als Parameter der Problems aufzufassen und die kombinatorische Explosion der algorithmischen Schwierigkeit auf diese Parameter zu beschraenken, so dass die Laufzeitkosten polynomiell in Bezug auf den nicht-parametrisierten Teil der Eingabe sind. Gibt es einen Festparameter-Algorithmus fuer ein kombinatorisches Problem, nennt man das Problem festparameter-handhabbar. Die Entwicklung von Festparameter-Algorithmen macht vor allem dann Sinn, wenn die betrachteten Parameter im Anwendungsfall nur kleine Werte annehmen. Festparameter-Algorithmen sind zu einem algorithmischen Standardwerkzeug in vielen Anwendungsbereichen geworden, unter anderem in der algorithmischen Biologie, wo in vielen Anwendungen kleine Parameterwerte beobachtet werden koennen. Zu den bekannten Techniken fuer den Entwurf von Festparameter-Algorithmen gehoeren unter anderem groessenbeschraenkte Suchbaeume. In der algorithmischen Biologie gibt es bislang nur wenige Beispiele fuer die Anwendung von groessenbeschraenkten Suchbaeumen. Diese Arbeit untersucht den Einsatz groessenbeschraenkter Suchbaeume fuer NP-harte Konsens-Probleme in der Analyse von DNS- und RNS-Daten. Wir betrachten Konsens-Probleme in der Analyse von DNS-Sequenzdaten, in der Analyse von sogenannten Quartettdaten zur Erstellung von phylogenetischen Hypothesen, in der Analyse von Daten ueber die Anordnung von Genen und beim Vergleich von RNS-Strukturdaten. In allen Faellen stellen wir neue effiziente Algorithmen vor, in denen das Paradigma der groessenbeschraenkten Suchbaeume auf neuartige Weise realisiert wird. Auf diesem Weg zeigen wir auch Ergebnisse parametrisierter Haerte, die zeigen, dass fuer die dabei betrachteten Probleme ein Festparameter-Algorithmus unwahrscheinlich ist. Ausserdem fuehren wir ganzzahliges lineares Programmieren als eine neue Technik ein, um die Festparameter-Handhabbarkeit eines Problems zu zeigen. Die Mehrzahl der hier vorgestellten Algorithmen wurde implementiert und auf Anwendungsdaten getestet

    Separating sets of strings by finding matching patterns is almost always hard

    Get PDF
    © 2017 Elsevier B.V. We study the complexity of the problem of searching for a set of patterns that separate two given sets of strings. This problem has applications in a wide variety of areas, most notably in data mining, computational biology, and in understanding the complexity of genetic algorithms. We show that the basic problem of finding a small set of patterns that match one set of strings but do not match any string in a second set is difficult (NP-complete, W[2]-hard when parameterized by the size of the pattern set, and APX-hard). We then perform a detailed parameterized analysis of the problem, separating tractable and intractable variants. In particular we show that parameterizing by the size of pattern set and the number of strings, and the size of the alphabet and the number of strings give FPT results, amongst others

    Parameterized complexity and polynomial-time approximation schemes

    Get PDF
    According to the theory of NPcompleteness, many problems that have important realworld applications are NPhard. This excludes the possibility of solving them in polynomial time unless P=NP. A number of approaches have been proposed in dealing with NPhard problems, among them are approximation algorithms and parameterized algorithms. The study of approximation algorithms tries to find good enough solutions instead of optimal solutions in polynomial time, while parameterized algorithms try to give exact solutions when a natural parameter is small. In this thesis, we study the structural properties of parameterized computation and approximation algorithms for NP optimization problems. In particular, we investigate the relationship between parameterized complexity and polynomialtime approximation scheme (PTAS) for NP optimization problems. We give nice characterizations for two important subclasses in PTAS: Fully Polynomial Time Approximation Scheme (FPTAS) and Effcient Polynomial Time Approximation Scheme (EPTAS), using the theory of parameterized complexity. Our characterization of the class FPTAS has its advantages over the former characterizations, and our characterization of EPTAS is the first systematic investigation of this new but important approximation class. We develop new techniques to derive strong computational lower bounds for certain parameterized problems based on the theory of parameterized complexity. For example, we prove that unless an unlikely collapse occurs in parameterized complexity theory, the clique problem could not be solved in time O(f (k)no(k)) for any function f . This lower bound matches the upper bound of the trivial algorithm that simply enumerates and checks all subsets of k vertices in the given graph of n vertices. We then extend our techniques to derive computational lower bounds for PTAS and EPTAS algorithms of NP optimization problems. We prove that certain NP optimization problems with known PTAS algorithms have no PTAS algorithms of running time O(f (1/Epsilon)no(1/Epsilon)) for any function f . Therefore, for these NP optimization problems, although theoretically they can be approximated in polynomial time to an arbitrarily small error bound Epsilon, they have no practically effective approximation algorithms for small error bound Epsilon. To our knowledge, this is the first time such lower bound results have been derived for PTAS algorithms. This seems to open a new direction for the study of computational lower bounds on the approximability of NP optimization problems

    Parallel Genetic Algorithm and Parallel Simulated Annealing Algorithm for the Closest String Problem

    Get PDF
    In this paper, we design genetic algorithm and simulated annealing algorithm and their parallel versions to solve the Closest String problem. Our implementation and experiments show usefulness of the parallel GA and SA algorithms

    The k-center Problem for Classes of Cyclic Words

    Get PDF
    The problem of finding k uniformly spaced points (centres) within a metric space is well known as the k-centre selection problem. In this paper, we introduce the challenge of k-centre selection on a class of objects of exponential size and study it for the class of combinatorial necklaces, known as cyclic words. The interest in words under translational symmetry is motivated by various applications in algebra, coding theory, crystal structures and other physical models with periodic boundary conditions. We provide solutions for the centre selection problem for both one-dimensional necklaces and largely unexplored objects in combinatorics on words - multidimensional combinatorial necklaces. The problem is highly non-trivial as even verifying a solution to the k-centre problem for necklaces can not be done in polynomial time relative to the length of the cyclic words and the alphabet size unless P= NP. Despite this challenge, we develop a technique of centre selection for a class of necklaces based on de-Bruijn Sequences and provide the first polynomial O(k· n) time approximation algorithm for selecting k centres in the set of 1D necklaces of length n over an alphabet of size q with an approximation factor of O(1+logq(k·n)n-logq(k·n)). For the set of multidimensional necklaces of size n1× n2× … × nd we develop an O(k· N2) time algorithm with an approximation factor of O(1+logq(k·N)N-logq(k·N)) in O(k· N2) time, where N= n1· n2· … · nd by approximating de Bruijn hypertori technique
    corecore