
    Coverage statistics for sequence census methods

    Background: We study the statistical properties of fragment coverage in genome sequencing experiments. In an extension of the classic Lander-Waterman model, we consider the effect of the length distribution of fragments. We also introduce the notion of the shape of a coverage function, which can be used to detect aberrations in coverage. The probability theory underlying these problems is essential for constructing models of current high-throughput sequencing experiments, where both sample preparation protocols and sequencing technology particulars can affect fragment length distributions. Results: We show that, regardless of the fragment length distribution and under the mild assumption that fragment start sites are Poisson distributed, the fragments produced in a sequencing experiment can be viewed as resulting from a two-dimensional spatial Poisson process. We then study the jump skeleton of the coverage function, and show that the induced trees are Galton-Watson trees whose parameters can be computed. Conclusions: Our results extend standard analyses of shotgun sequencing that focus on coverage statistics at individual sites, and provide a null model for detecting deviations from random coverage in high-throughput sequence-census-based experiments. By focusing on fragments, we are also led to a new approach for visualizing sequencing data that should be of independent interest. Comment: 10 pages, 4 figures
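
    A minimal Monte Carlo sketch of the coverage claim, under our own assumptions rather than the paper's code: start sites form a homogeneous Poisson process with `rate` starts per base, and fragment lengths are i.i.d. Uniform(100, 300), deliberately not exponential. If start sites are Poisson, the number of fragments covering an interior site should be Poisson with mean c = rate * E[L] whatever the length distribution, so the fraction of uncovered sites should approach the Lander-Waterman value exp(-c). The function name `simulate_coverage` and all parameter values are hypothetical.

```python
import math
import random


def simulate_coverage(genome_len, rate, min_len, max_len, seed=0):
    """Per-site coverage for fragments whose start sites form a Poisson process.

    Illustrative assumptions: starts follow a homogeneous Poisson process with
    `rate` starts per base, and lengths are i.i.d. Uniform(min_len, max_len).
    Integer site k is covered by fragment (s, l) when s <= k < s + l.
    """
    rng = random.Random(seed)
    diff = [0] * (genome_len + 1)  # difference array over integer sites
    s = 0.0
    while True:
        s += rng.expovariate(rate)  # exponential gaps give Poisson start sites
        if s >= genome_len:
            break
        length = rng.uniform(min_len, max_len)
        first = math.ceil(s)  # first covered integer site
        stop = min(math.ceil(s + length), genome_len)  # one past the last covered site
        if first < stop:
            diff[first] += 1
            diff[stop] -= 1
    coverage, running = [], 0
    for k in range(genome_len):
        running += diff[k]  # prefix sum turns the difference array into depths
        coverage.append(running)
    return coverage


if __name__ == "__main__":
    c = 0.01 * 200  # expected depth: rate * E[L]
    cov = simulate_coverage(1_000_000, 0.01, 100, 300, seed=1)
    interior = cov[1000:]  # skip the left edge, where no fragment starts before 0
    frac_zero = sum(1 for x in interior if x == 0) / len(interior)
    print(frac_zero, math.exp(-c))  # the two values should be close
    print(sum(interior) / len(interior))  # should be close to c
```

    With rate 0.01 and mean length 200, c = 2, so roughly 13.5% of interior sites should come out uncovered; swapping in another length distribution with the same mean should leave that fraction essentially unchanged.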

    Task-based Augmented Contour Trees with Fibonacci Heaps

    This paper presents a new algorithm for the fast, shared-memory, multi-core computation of augmented contour trees on triangulations. In contrast to most existing parallel algorithms, our technique computes augmented trees, enabling the full extent of contour tree based applications, including data segmentation. Our approach completely revisits the traditional, sequential contour tree algorithm to re-formulate all the steps of the computation as a set of independent local tasks. This includes a new computation procedure based on Fibonacci heaps for the join and split trees, two intermediate data structures used to compute the contour tree, whose constructions are efficiently carried out concurrently thanks to the dynamic scheduling of task parallelism. We also introduce a new parallel algorithm for the combination of these two trees into the output global contour tree. Overall, this results in superior time performance in practice, both sequentially and in parallel, thanks to the OpenMP task runtime. We report performance numbers that compare our approach to reference sequential and multi-threaded implementations for the computation of augmented merge and contour trees. These experiments demonstrate the run-time efficiency of our approach and its scalability on common workstations. We demonstrate the utility of our approach in data segmentation applications.
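
    To make the join/split tree step concrete, here is a minimal sequential sketch of an augmented merge tree on a vertex-valued graph, using the classic union-find sweep rather than the paper's Fibonacci-heap, task-parallel procedure. Conventions differ on which sweep is called the join tree; this one visits vertices from highest to lowest value (ties broken by vertex index), so maxima become leaves, and reversing the order gives the other tree. Every vertex appears in the output, which is what "augmented" means here. The function name `augmented_join_tree` is hypothetical.

```python
def augmented_join_tree(values, edges):
    """Arcs (upper, lower) of an augmented merge tree of a vertex-valued graph.

    Classic sequential union-find sweep, not the paper's algorithm. Vertices
    are visited from highest to lowest value; ties are broken by index.
    """
    n = len(values)
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)

    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    lowest = list(range(n))  # most recently swept vertex of each component
    visited = [False] * n
    arcs = []
    order = sorted(range(n), key=lambda v: (values[v], v), reverse=True)
    for v in order:
        visited[v] = True
        for u in adj[v]:
            if not visited[u]:
                continue
            cu, cv = find(u), find(v)
            if cu != cv:
                # The component containing u continues down to v.
                arcs.append((lowest[cu], v))
                parent[cu] = cv
                lowest[cv] = v
    return arcs
```

    On the path 0-1-2-3-4 with values [3, 1, 4, 0, 5], the sketch returns the arcs (0, 1), (2, 1), (1, 3), (4, 3): maxima 0 and 2 merge at vertex 1, which then merges with maximum 4 at the global minimum 3.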

    Computing the Fréchet Distance with a Retractable Leash

    All known algorithms for the Fréchet distance between curves proceed in two steps: first, they construct an efficient oracle for the decision version; second, they use this oracle to find the optimum from a finite set of critical values. We present a novel approach that avoids the detour through the decision version. This gives the first quadratic time algorithm for the Fréchet distance between polygonal curves in (Formula presented.) under polyhedral distance functions (e.g., (Formula presented.) and (Formula presented.)). We also get a (Formula presented.)-approximation of the Fréchet distance under the Euclidean metric, in quadratic time for any fixed (Formula presented.). For the exact Euclidean case, our framework currently yields an algorithm with running time (Formula presented.). However, we conjecture that it may eventually lead to a faster exact algorithm.
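
    As a point of reference only (not the retractable-leash method), the textbook quadratic-time algorithm for the simpler discrete Fréchet distance, where the leash may only jump between vertices, is the Eiter and Mannila dynamic program sketched below. The discrete value is an upper bound on the continuous distance between the polygonal curves. The `dist` argument lets one swap in a polyhedral metric such as L∞; the names `discrete_frechet` and `linf` are our own.

```python
import math


def linf(a, b):
    """L-infinity distance, an example of a polyhedral distance function."""
    return max(abs(x - y) for x, y in zip(a, b))


def discrete_frechet(p, q, dist=math.dist):
    """Discrete Fréchet distance between point sequences p and q in O(|p| * |q|).

    ca[i][j] is the smallest achievable maximum leash length over monotone
    couplings of p[0..i] with q[0..j].
    """
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]
```

    For example, `discrete_frechet([(0, 0), (2, 0)], [(0, 0), (1, 1), (2, 0)])` evaluates to √2 ≈ 1.414 under the Euclidean metric and to 1 with `dist=linf`.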

    Trekking in the Alps without freezing or getting tired

    Let F be a polyhedral terrain with n vertices. We show how to preprocess F such that for any two query points on F it can be decided whether there exists a path on F between the two points whose height decreases monotonically. More generally, the minimum total ascent or descent along any path between the two points can be computed. It is also possible to decide, given two query points and a height, whether there is a path that stays below this height. All these queries can be answered with one data structure which stores the so-called height-level map of the terrain. Although the height-level map has quadratic worst-case complexity, it is stored implicitly using only linear storage. The query time for all the above queries is and the structure can be built in time. A path with the desired property can be reported in additional time that is linear in the description size of the path.
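
    The "stays below this height" query can be illustrated with a naive per-query baseline, restricted by assumption to query points that are terrain vertices. Since heights interpolate linearly on each triangle, we expect two vertices to be joined by a path staying at height at most h exactly when they are connected through edges whose endpoints both have height at most h, so a breadth-first search on that induced subgraph answers the query in linear time with no preprocessing. This is not the height-level-map structure described above; the name `connected_below` and the input format (a height list plus the edge list of the triangulation) are hypothetical.

```python
from collections import deque


def connected_below(heights, edges, s, t, h):
    """True if vertices s and t are joined by a path whose height never exceeds h.

    Naive baseline: BFS over the subgraph induced by vertices of height <= h.
    """
    # An endpoint above h rules out any such path.
    if heights[s] > h or heights[t] > h:
        return False
    adj = [[] for _ in heights]
    for a, b in edges:
        # Keep an edge only if its whole segment lies at height <= h.
        if heights[a] <= h and heights[b] <= h:
            adj[a].append(b)
            adj[b].append(a)
    seen = {s}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        if v == t:
            return True
        for u in adj[v]:
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return False
```

    On a four-vertex terrain with heights [1, 5, 2, 3] and triangles (0, 1, 3) and (1, 2, 3), vertices 0 and 2 are connected below height 3 (through vertex 3) but not below 2.5, because any such path must cross the diagonal 1-3, whose lowest point has height 3.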

    Trekking in the Alps Without Freezing or Getting Tired
