8 research outputs found

    Safe solutions for walks on graphs

    In this thesis we study the concept of “safe solutions” in problems whose solutions are walks on graphs. A safe solution to a problem X can be understood as a partial solution common to all solutions to X. When the solutions are walks on graphs, the safe solutions are the walks contained in every walk that solves the problem. We formulate four main graph traversal problems and characterize the walks contained in all their solutions. For each problem we give a formulation, prove some of its combinatorial and structural properties, and give a safe and complete algorithm for finding its safe solutions based on the characterization. We use the genome assembly problem and its applications as our main motivating example. We begin by motivating and exemplifying the notion of safe solutions through a problem on s-t paths in undirected graphs with at least two non-trivial biconnected components S and T, with s ∈ S and t ∈ T. We continue by reviewing similar and related notions in other fields, especially in combinatorial optimization, and previous work on the bioinformatics problem of genome assembly. We then characterize the safe solutions to the Eulerian cycle problem, where one must find a circular walk in a graph G that traverses each edge exactly once. We propose a characterization that improves on (Nagarajan, Pop, JCB 2009), together with a polynomial-time algorithm for finding them. We then study edge-covering circular walks in a graph G. We review the characterization of their safe solutions from (Tomescu, Medvedev, JCB 2017) and the polynomial-time algorithm suggested there, and we show an optimal O(mn)-time algorithm that we proposed in (Cairo et al. CPM 2017). Finally, we generalize this to edge-covering collections of circular walks. We characterize safe solutions in this edge-covering setting and provide a polynomial-time algorithm for computing them. We originally suggested these in (Obscura et al. ALMOB 2018).

    Optimal Omnitig Listing for Safe and Complete Contig Assembly

    Genome assembly is the problem of reconstructing a genome sequence from a set of reads from a sequencing experiment. Typical formulations of the assembly problem admit many genomic reconstructions in practice, and actual genome assemblers usually output contigs, namely substrings that are promised to occur in the genome. To bridge theory and practice, Tomescu and Medvedev [RECOMB 2016] reformulated contig assembly as finding all substrings common to all genomic reconstructions. They also gave a characterization of those walks (omnitigs) that are common to all closed edge-covering walks of a (directed) graph, a typical notion of genomic reconstruction. They also proposed an algorithm for listing all maximal omnitigs, which launches an exhaustive visit from every edge. In this paper, we prove new insights about the structure of omnitigs and solve several open questions about them. We combine these to achieve an O(nm)-time algorithm for outputting all the maximal omnitigs of a graph (with n nodes and m edges). This is also optimal, as we show families of graphs whose total omnitig length is Omega(nm). We implement this algorithm and show that it is 9-12 times faster in practice than the one of Tomescu and Medvedev [RECOMB 2016].

    Improved Pattern-Avoidance Bounds for Greedy BSTs via Matrix Decomposition

    Greedy BST (or simply Greedy) is an online self-adjusting binary search tree defined in the geometric view (Lucas, 1988; Munro, 2000; Demaine, Harmon, Iacono, Kane, Patrascu, SODA 2009). Along with Splay trees (Sleator, Tarjan 1985), Greedy is considered the most promising candidate for being dynamically optimal, i.e., starting with any initial tree, its access cost on any sequence is conjectured to be within an $O(1)$ factor of the offline optimal. However, in the past four decades, the question has remained elusive even for highly restricted input. In this paper, we prove new bounds on the cost of Greedy in the "pattern avoidance" regime. Our new results include: The (preorder) traversal conjecture for Greedy holds up to a factor of $O(2^{\alpha(n)})$, improving upon the bound of $2^{\alpha(n)^{O(1)}}$ in (Chalermsook et al., FOCS 2015). This is the best known bound obtained by any online BST. We settle the postorder traversal conjecture for Greedy. The deque conjecture for Greedy holds up to a factor of $O(\alpha(n))$, improving upon the bound of $2^{O(\alpha(n))}$ in (Chalermsook et al., WADS 2015). The split conjecture holds for Greedy up to a factor of $O(2^{\alpha(n)})$. Key to all these results is to partition (based on the input structures) the execution log of Greedy into several simpler-to-analyze subsets for which classical forbidden submatrix bounds can be leveraged. Finally, we show the applicability of this technique to handle a class of increasingly complex pattern-avoiding input sequences, called $k$-increasing sequences. As a bonus, we discover a new class of permutation matrices whose extremal bounds are polynomially bounded. This gives partial progress on an open question by Jacob Fox (2013). Comment: Accepted to SODA 202

    An Optimal O(nm) Algorithm for Enumerating All Walks Common to All Closed Edge-covering Walks of a Graph

    In this article, we consider the following problem. Given a directed graph G, output all walks of G that are sub-walks of all closed edge-covering walks of G. This problem was first considered by Tomescu and Medvedev (RECOMB 2016), who characterized these walks through the notion of omnitig. Omnitigs were shown to be relevant for the genome assembly problem from bioinformatics, where a genome sequence must be assembled from a set of reads from a sequencing experiment. Tomescu and Medvedev (RECOMB 2016) also proposed an algorithm for listing all maximal omnitigs, which launches an exhaustive visit from every edge. In this article, we prove new insights about the structure of omnitigs and solve several open questions about them. We combine these to achieve an O(nm)-time algorithm for outputting all the maximal omnitigs of a graph (with n nodes and m edges). This is also optimal, as we show families of graphs whose total omnitig length is Omega(nm). We implement this algorithm and show that it is 9-12 times faster in practice than the one of Tomescu and Medvedev (RECOMB 2016). Peer reviewe

    A safe and complete algorithm for metagenomic assembly

    Background: Reconstructing the genome of a species from short fragments is one of the oldest bioinformatics problems. Metagenomic assembly is a variant of the problem asking to reconstruct the circular genomes of all bacterial species present in a sequencing sample. This problem can be naturally formulated as finding a collection of circular walks of a directed graph G that together cover all nodes, or edges, of G. Approach: We address this problem with the "safe and complete" framework of Tomescu and Medvedev (Research in Computational Molecular Biology, 20th Annual Conference, RECOMB 9649: 152-163, 2016). An algorithm is called safe if it returns only those walks (also called safe) that appear as a subwalk in all metagenomic assembly solutions for G. A safe algorithm is called complete if it returns all safe walks of G. Results: We give graph-theoretic characterizations of the safe walks of G, and a safe and complete algorithm finding all safe walks of G. In the node-covering case, our algorithm runs in time O(m^2 + n^3), and in the edge-covering case it runs in time O(m^2 n); n and m denote the number of nodes and edges, respectively, of G. This algorithm constitutes the first theoretical tight upper bound on what can be safely assembled from metagenomic reads using this problem formulation. Peer reviewe
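
The definition of a safe walk in the abstract above can be made concrete with a small brute-force check. The following is only an illustrative sketch, not the characterization or algorithm from the paper: it assumes the candidate solutions are already listed explicitly, each as a collection of circular walks given as node lists with an implicit closing step from the last node back to the first. The function names `is_subwalk_of_circular` and `is_safe` are hypothetical.

```python
def is_subwalk_of_circular(walk, cycle):
    """Return True if `walk` (a list of nodes) can be read along the
    circular walk `cycle`, possibly wrapping around its end."""
    k = len(cycle)
    if k == 0:
        return len(walk) == 0
    # Try every starting position on the circular walk.
    return any(
        all(walk[j] == cycle[(i + j) % k] for j in range(len(walk)))
        for i in range(k)
    )


def is_safe(walk, solutions):
    """Return True if, in every listed solution, `walk` is a subwalk of
    at least one circular walk of that solution.

    Assumption: `solutions` is an explicit list of solutions, each a
    list of circular walks. Real instances have far too many solutions
    to enumerate this way."""
    return all(
        any(is_subwalk_of_circular(walk, cycle) for cycle in solution)
        for solution in solutions
    )


# Two hypothetical solutions over the nodes a, b, c.
solutions = [
    [["a", "b", "c"]],        # one circular walk a -> b -> c -> a
    [["a", "b"], ["c"]],      # two circular walks
]
print(is_safe(["a", "b"], solutions))  # True: appears in both solutions
print(is_safe(["b", "c"], solutions))  # False: missing from the second
```

Enumerating solutions is exponential in general, which is why a graph-theoretic characterization of safe walks is what makes a polynomial-time algorithm possible.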

    Simplicity in Eulerian circuits: Uniqueness and safety

    An Eulerian circuit in a directed graph is one of the most fundamental notions in graph theory. Detecting whether a graph G has a unique Eulerian circuit can be done in polynomial time via the BEST theorem by de Bruijn, van Aardenne-Ehrenfest, Smith and Tutte (1941-1951) [15], [16] (involving counting arborescences), or via a tailored characterization by Pevzner, 1989 (involving computing the intersection graph of simple cycles of G), both of which rely on overly complex notions for the simpler uniqueness problem. In this paper we give a new linear-time checkable characterization of directed graphs with a unique Eulerian circuit. This is based on a simple condition for when two edges must appear consecutively in all Eulerian circuits, in terms of cut nodes of the underlying undirected graph of G. As a by-product, we can also compute in linear time all maximal safe walks appearing in all Eulerian circuits, for which Nagarajan and Pop proposed in 2009 [12] a polynomial-time algorithm based on Pevzner's characterization. Peer reviewe
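
For readers unfamiliar with the underlying object, the sketch below finds one Eulerian circuit of a directed graph using the classical Hierholzer procedure. It is not the uniqueness test or the safe-walk computation described in the abstract above; it only illustrates what an Eulerian circuit is. The function name `eulerian_circuit` and the edge-list input format are assumptions made for this example.

```python
from collections import defaultdict


def eulerian_circuit(edges):
    """Return a list of nodes describing one Eulerian circuit of the
    directed multigraph given by `edges` (a list of (u, v) pairs),
    or None if no such circuit exists."""
    if not edges:
        return []
    out = defaultdict(list)   # unused outgoing edges per node
    indeg = defaultdict(int)
    for u, v in edges:
        out[u].append(v)
        indeg[v] += 1
    nodes = set(out) | set(indeg)
    # Necessary condition: in-degree equals out-degree at every node.
    if any(len(out[u]) != indeg[u] for u in nodes):
        return None
    # Hierholzer: walk along unused edges, backtrack when stuck.
    stack = [edges[0][0]]
    circuit = []
    while stack:
        u = stack[-1]
        if out[u]:
            stack.append(out[u].pop())
        else:
            circuit.append(stack.pop())
    circuit.reverse()
    # If some edge was never reached, the edges are not all connected.
    if len(circuit) != len(edges) + 1:
        return None
    return circuit


edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "d"), ("d", "a")]
circuit = eulerian_circuit(edges)
print(circuit)
```

This example graph admits more than one Eulerian circuit (the two cycles through a can be taken in either order), so it is not an instance with a unique circuit.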