41 research outputs found

    Fast Dynamic Arrays

    We present a highly optimized implementation of tiered vectors, a data structure for maintaining a sequence of n elements that supports access in O(1) time and insertion and deletion in O(n^ε) time for ε > 0 while using o(n) extra space. We consider several implementation optimizations in C++ and compare their performance to that of vector and set from the standard library on sequences with up to 10^8 elements. Our fastest implementation uses much less space than set while providing speedups of 40x for access operations compared to set and speedups of 10,000x compared to vector for insertion and deletion operations, while being competitive with both data structures for all other operations.
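
    The abstract does not show the layout of a tiered vector, so the following is only a minimal two-level sketch of the general idea, not the authors' optimized implementation. It assumes a fixed block capacity chosen by the caller and a default-constructible element type; the names `TieredVector` and `Block` are hypothetical. Every block is a circular buffer and all blocks but the last are kept full, so element i always sits in block i / capacity. With a capacity near sqrt(n), an insertion or deletion touches O(sqrt(n)) elements, which corresponds to the ε = 1/2 case.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Two-level tiered vector sketch (illustrative only).
template <typename T>
class TieredVector {
    // Fixed-capacity circular buffer; push/pop at either end cost O(1).
    struct Block {
        std::vector<T> buf;
        std::size_t head = 0, count = 0;
        explicit Block(std::size_t cap) : buf(cap) {}
        T& at(std::size_t k) { return buf[(head + k) % buf.size()]; }
        void push_front(const T& v) {
            head = (head + buf.size() - 1) % buf.size();
            buf[head] = v;
            ++count;
        }
        void push_back(const T& v) { at(count) = v; ++count; }
        T pop_back() { --count; return at(count); }
        T pop_front() {
            T v = buf[head];
            head = (head + 1) % buf.size();
            --count;
            return v;
        }
        // Shifts within one block only: O(capacity).
        void insert(std::size_t k, const T& v) {
            for (std::size_t j = count; j > k; --j) at(j) = at(j - 1);
            at(k) = v;
            ++count;
        }
        void erase(std::size_t k) {
            for (std::size_t j = k; j + 1 < count; ++j) at(j) = at(j + 1);
            --count;
        }
    };

    std::size_t cap_;
    std::size_t size_ = 0;
    std::vector<Block> blocks_;

public:
    explicit TieredVector(std::size_t block_capacity = 64) : cap_(block_capacity) {
        assert(cap_ > 0);
    }

    std::size_t size() const { return size_; }

    // O(1) access: choose the block, then the offset inside its ring.
    T& operator[](std::size_t i) {
        assert(i < size_);
        return blocks_[i / cap_].at(i % cap_);
    }

    // Insert before position i, where 0 <= i <= size().
    void insert(std::size_t i, const T& value) {
        assert(i <= size_);
        if (size_ == blocks_.size() * cap_) blocks_.emplace_back(cap_);
        std::size_t b = i / cap_;
        std::size_t last = size_ / cap_;  // first block with free space
        // Move one element across each block boundary, back to front.
        for (std::size_t j = last; j > b; --j)
            blocks_[j].push_front(blocks_[j - 1].pop_back());
        blocks_[b].insert(i % cap_, value);
        ++size_;
    }

    // Remove the element at position i, where 0 <= i < size().
    void erase(std::size_t i) {
        assert(i < size_);
        std::size_t b = i / cap_;
        std::size_t last = (size_ - 1) / cap_;  // last non-empty block
        blocks_[b].erase(i % cap_);
        // Refill each block from the front of its successor.
        for (std::size_t j = b; j < last; ++j)
            blocks_[j].push_back(blocks_[j + 1].pop_front());
        --size_;
    }
};
```

    The block capacity is fixed in this sketch, so the O(sqrt(n)) bound holds only while the capacity stays close to sqrt(n); a fuller implementation would rebuild with a new capacity as the sequence grows or shrinks, and would nest further levels to reach smaller ε.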

    Compression Algorithm for Colored de Bruijn Graphs

    A colored de Bruijn graph (also called a set of k-mer sets) is a set of k-mers with every k-mer assigned a set of colors. Colored de Bruijn graphs are used in a variety of applications, including variant calling, genome assembly, and database search. However, their size has posed a scalability challenge to algorithm developers and users. Numerous indexing data structures have been proposed that store the graph compactly while supporting fast query operations. However, disk compression algorithms, which do not need to support queries on the compressed data and can thus be more space-efficient, have received little attention. The dearth of specialized compression tools has been a detriment to tool developers, tool users, and reproducibility efforts. In this paper, we develop a new tool that compresses colored de Bruijn graphs to disk, building on previous ideas for compression of k-mer sets and indexing colored de Bruijn graphs. We test our tool, called ESS-color, on various datasets, including both sequencing data and whole genomes. ESS-color achieves better compression than all evaluated tools on all datasets, with no other tool able to consistently achieve less than 44% space overhead.

    Studies in Efficient Discrete Algorithms

    This thesis consists of five papers within the design and analysis of efficient algorithms. In the first paper, we consider the problem of computing all-pairs shortest paths in a directed graph with real weights assigned to vertices. We develop a combinatorial randomized algorithm that runs in subcubic time for a special class of graphs. In the second paper, we present a polynomial-time dynamic programming algorithm for optimal partitions of a complete edge-weighted graph, where the edges are weighted by the length of the unique shortest path connecting those vertices in an a priori given tree (shortest path metric induced by a tree). Our result resolves, in particular, the complexity status of the optimal partition problems in the one-dimensional geometric (Euclidean) setting. In the third paper, we study the NP-hard problem of partitioning an orthogonal polyhedron P into a minimum number of 3D rectangles. We present an approximation algorithm with approximation ratio 4 for the special case of the problem in which P is a so-called 3D histogram. We then apply it to compute the exact arithmetic matrix product of two matrices with non-negative integer entries. The computation is time-efficient if the 3D histograms induced by the input matrices can be partitioned into relatively few 3D rectangles. In the fourth paper, we present the first quasi-polynomial approximation schemes for the base of the number of triangulations of a planar point set and the base of the number of crossing-free spanning trees on a planar point set, respectively. In the fifth paper, we study the complexity of detecting monomials with special properties in the sum-product expansion of a polynomial represented by an arithmetic circuit of size polynomial in the number of input variables and using only multiplication and addition. We present a fixed-parameter tractable algorithm for the detection of a monomial having at least k distinct variables, parametrized with respect to k. Furthermore, we derive several hardness results on the detection of monomials with such properties within exact, parametrized and approximation complexity.

    Algorithmic and Combinatorial Results in Selection and Computational Geometry

    This dissertation investigates two sets of algorithmic and combinatorial problems. The first part focuses on the selection problem under the pairwise comparison model. For the classic “median of medians” scheme, contrary to the popular belief that smaller group sizes cause superlinear behavior, several new linear time algorithms that utilize small groups are introduced. Then the exact number of comparisons needed for an optimal selection algorithm is studied. In particular, the implications of a long standing conjecture known as Yao’s hypothesis are explored. For the multiparty model, we designed low communication complexity protocols for selecting an exact or an approximate median of data that is distributed among multiple players. In the second part, three computational geometry problems are studied. For the longest spanning tree with neighborhoods, approximation algorithms are provided. For the stretch factor of polygonal chains, upper bounds are proved and almost matching lower bound constructions in ℝ^2 and higher dimensions are developed. For the piercing number τ and independence number ν of a family of axis-parallel rectangles in the plane, a lower bound construction for ν = 4 that matches Wegner’s conjecture is analyzed. The previous matching construction for ν = 3, due to Wegner himself, dates back to 1968.
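
    For readers unfamiliar with the scheme named above, here is a minimal sketch of the classic “median of medians” selection with groups of 5. It is the textbook variant only, not one of the small-group algorithms the dissertation introduces; the function name `select_kth` is hypothetical, and the sketch assumes 0 <= k < v.size().

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Returns the k-th smallest element (0-based) of v.
// Classic scheme: the pivot is the median of the medians of groups of 5.
int select_kth(std::vector<int> v, std::size_t k) {
    while (true) {
        if (v.size() <= 5) {
            std::sort(v.begin(), v.end());
            return v[k];
        }
        // Sort each group of up to 5 elements and collect its median.
        std::vector<int> medians;
        for (std::size_t i = 0; i < v.size(); i += 5) {
            std::size_t end = std::min(i + 5, v.size());
            std::sort(v.begin() + i, v.begin() + end);
            medians.push_back(v[i + (end - i - 1) / 2]);
        }
        // Recurse on the medians to obtain a provably good pivot.
        int pivot = select_kth(medians, medians.size() / 2);

        // Three-way partition around the pivot.
        std::vector<int> less, greater;
        std::size_t equal = 0;
        for (int x : v) {
            if (x < pivot) less.push_back(x);
            else if (x > pivot) greater.push_back(x);
            else ++equal;
        }
        if (k < less.size()) {
            v = std::move(less);
        } else if (k < less.size() + equal) {
            return pivot;
        } else {
            k -= less.size() + equal;
            v = std::move(greater);
        }
    }
}
```

    The sketch copies elements into separate vectors for clarity; an in-place partition would avoid the extra allocations without changing the linear-time argument.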

    Sixth Biennial Report : August 2001 - May 2003
