473 research outputs found

    The Longest Filled Common Subsequence Problem

    Inspired by a recent approach for genome reconstruction from incomplete data, we consider a variant of the longest common subsequence problem for the comparison of two sequences, one of which is incomplete, i.e. it has some missing elements. The new combinatorial problem, called Longest Filled Common Subsequence, given two sequences A and B, and a multiset M of symbols missing in B, asks for a sequence B* obtained by inserting the symbols of M into B so that B* induces a common subsequence with A of maximum length. First, we investigate the computational and approximation complexity of the problem and we show that it is NP-hard and APX-hard when A contains at most two occurrences of each symbol. Then, we give a 3/5 approximation algorithm for the problem. Finally, we present a fixed-parameter algorithm, when the problem is parameterized by the number of symbols inserted in B that "match" symbols of A
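As a rough illustration of the problem statement above (not the approximation or fixed-parameter algorithms from the paper), the following brute-force sketch tries every way of inserting the symbols of M into B and keeps the best LCS length with A. The function names `lcs_length` and `lfcs_brute_force` are hypothetical, symbols are assumed to be single characters, and the running time is exponential, so this is only usable on tiny inputs.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lfcs_brute_force(a, b, missing):
    """Best LCS length between a and any B* built by inserting all of `missing` into b.

    Assumption: `missing` is an iterable of single-character symbols (the multiset M).
    """
    best = 0
    seen = set()  # avoid re-exploring identical (string, remaining symbols) states

    def rec(current, remaining):
        nonlocal best
        if not remaining:
            best = max(best, lcs_length(a, current))
            return
        sym, rest = remaining[0], remaining[1:]
        for pos in range(len(current) + 1):
            cand = current[:pos] + sym + current[pos:]
            if (cand, rest) in seen:
                continue
            seen.add((cand, rest))
            rec(cand, rest)

    rec(b, "".join(sorted(missing)))
    return best
```

For example, with A = "abcd", B = "ad" and M = {b, c}, the filled sequence "abcd" matches A completely.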

    ALIGNMENT-FREE METHODS AND ITS APPLICATIONS

    Comparing biological sequences remains one of the most vital activities in bioinformatics. It reveals the relatedness between species and identifies similar structures that might lead to similar functions. Sequence alignment is the default method and has been used in the domain for over four decades. It has gained a lot of trust, but limitations and even failures have been reported, especially with newly generated genomes. These genomes are larger and, to some extent, contain errors, which come mainly from the sequencing machine. Such sequencing errors should be considered when submitting sequences to GenBank; in sequence comparison, it is often hard to address or even trace this problem. Alignment-based methods can fail on such errors, and although biologists still trust them, reports have documented these failures. The poor results of alignment-based methods on error-prone sequences motivated researchers in the domain to look for alternatives. These alternative methods are alignment-free and aim to overcome the shortcomings of alignment-based methods. This thesis is based on alignment-free methods; it conducts an in-depth study to evaluate them and to find the right application domain for them. One way to identify that domain is to apply the methods to data subjected to manufactured errors and to test whether they provide better comparison results on data with naturally severe errors. The two techniques used in this work are compression-based and motif-based (also called k-mer-based or signal-based). We also address the selection of the motifs used in the second technique, and how selecting specific motifs can improve the quality of the results. In addition, we apply an alignment-free method to a different domain, gene prediction. We use alignment-free methods in gene prediction to speed up the process while providing high-quality results, and to accurately predict stretches of the DNA sequence that would be considered parts of genes
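To make the k-mer-based idea concrete, here is a minimal sketch of one common alignment-free comparison: count all overlapping k-mers in each sequence and compare the two count profiles with a cosine distance. This is an illustrative choice, not necessarily the measure used in the thesis; the function names and the default k = 3 are assumptions.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_distance(seq1, seq2, k=3):
    """Cosine distance between k-mer count profiles (0 = same profile, 1 = no shared k-mers)."""
    c1, c2 = kmer_counts(seq1, k), kmer_counts(seq2, k)
    dot = sum(c1[m] * c2[m] for m in c1.keys() & c2.keys())
    norm1 = sqrt(sum(v * v for v in c1.values()))
    norm2 = sqrt(sum(v * v for v in c2.values()))
    if norm1 == 0 or norm2 == 0:
        # At least one sequence is shorter than k, so it has no k-mers to compare.
        return 1.0
    return 1.0 - dot / (norm1 * norm2)
```

Because the profile ignores where each k-mer occurs, a few isolated sequencing errors only perturb the handful of k-mers that overlap them, which is one reason such measures are considered for error-prone data.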

    The Longest Common Subsequence via Generalized Suffix Trees

    Given two strings S1 and S2, finding the longest common subsequence (LCS) is a classical problem in computer science. Many algorithms have been proposed to find the longest common subsequence between two strings. The most common and widely used method is the dynamic programming approach, which runs in quadratic time and takes quadratic space. Other algorithms were introduced later to solve the LCS problem in less time and space. In this work, we present a new algorithm to find the longest common subsequence using the generalized suffix tree and a directed acyclic graph. The generalized suffix tree (GST) is the combined suffix tree for a set of strings {S1, S2, ..., Sn}. Both the suffix tree and the generalized suffix tree can be constructed in linear time and linear space. One application of the generalized suffix tree is finding the longest common substring between two strings, but finding the longest common subsequence with it is not straightforward. Here we describe how to use the GST to find the common substrings between two strings and introduce a new approach to calculate the longest common subsequence (LCS) from those common substrings. This method takes a different view of the LCS problem, shedding more light on novel applications of the LCS. We also show how this method can motivate the development of new compression techniques for genome resequencing data
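For reference, the quadratic-time, quadratic-space dynamic programming baseline mentioned in the abstract can be sketched as follows. This is the textbook approach, not the suffix-tree method the work proposes; the function name `lcs` is an assumption.

```python
def lcs(s1, s2):
    """Return one longest common subsequence of s1 and s2 via the classic DP table."""
    n, m = len(s1), len(s2)
    # dp[i][j] = LCS length of the prefixes s1[:i] and s2[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if s1[i - 1] == s2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Walk back from the bottom-right corner to recover one optimal subsequence.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if s1[i - 1] == s2[j - 1]:
            out.append(s1[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

The table has (n + 1)(m + 1) entries, which is the quadratic space cost that later algorithms try to reduce.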

    Algorithms for the uniqueness of the longest common subsequence

    Given several number sequences, determining the longest common subsequence is a classical problem in computer science. This problem has applications in bioinformatics, especially determining transposable genes. Nevertheless, related works only consider how to find one longest common subsequence. In this paper, we consider how to determine the uniqueness of the longest common subsequence. If there are multiple longest common subsequences, we also determine which number appears in all/some/none of the longest common subsequences. We focus on four scenarios: (1) linear sequences without duplicated numbers; (2) circular sequences without duplicated numbers; (3) linear sequences with duplicated numbers; (4) circular sequences with duplicated numbers. We develop corresponding algorithms and apply them to gene sequencing data

    Demystifying μ

    We develop the theory of illfounded and cyclic proof systems in the context of the modal μ-calculus. A fine analysis of provability and admissibility bridges the finitary, cyclic and illfounded notions of proof for this logic and re-enforces the subtlety of two important normal form theorems: guardedness and disjunctiveness

    Preventing premature convergence and proving the optimality in evolutionary algorithms

    http://ea2013.inria.fr//proceedings.pdf
    Evolutionary Algorithms (EA) usually carry out an efficient exploration of the search space, but often get trapped in local minima and do not prove the optimality of the solution. Interval-based techniques, on the other hand, yield a numerical proof of optimality of the solution. However, they may fail to converge within a reasonable time due to their inability to quickly compute a good approximation of the global minimum and their exponential complexity. The contribution of this paper is a hybrid algorithm called Charibde in which a particular EA, Differential Evolution, cooperates with a Branch and Bound algorithm endowed with interval propagation techniques. It prevents premature convergence toward local optima and outperforms both deterministic and stochastic existing approaches. We demonstrate its efficiency on a benchmark of highly multimodal problems, for which we provide previously unknown global minima and certification of optimality

    28th Annual Symposium on Combinatorial Pattern Matching : CPM 2017, July 4-6, 2017, Warsaw, Poland

    Peer reviewed