13 research outputs found
Coverage statistics for sequence census methods
Background: We study the statistical properties of fragment coverage in
genome sequencing experiments. In an extension of the classic Lander-Waterman
model, we consider the effect of the length distribution of fragments. We also
introduce the notion of the shape of a coverage function, which can be used to
detect abberations in coverage. The probability theory underlying these
problems is essential for constructing models of current high-throughput
sequencing experiments, where both sample preparation protocols and sequencing
technology particulars can affect fragment length distributions.
Results: We show that regardless of fragment length distribution and under
the mild assumption that fragment start sites are Poisson distributed, the
fragments produced in a sequencing experiment can be viewed as resulting from a
two-dimensional spatial Poisson process. We then study the jump skeleton of the
the coverage function, and show that the induced trees are Galton-Watson trees
whose parameters can be computed.
Conclusions: Our results extend standard analyses of shotgun sequencing that
focus on coverage statistics at individual sites, and provide a null model for
detecting deviations from random coverage in high-throughput sequence census
based experiments. By focusing on fragments, we are also led to a new approach
for visualizing sequencing data that should be of independent interest.Comment: 10 pages, 4 figure
Task-based Augmented Contour Trees with Fibonacci Heaps
This paper presents a new algorithm for the fast, shared memory, multi-core
computation of augmented contour trees on triangulations. In contrast to most
existing parallel algorithms our technique computes augmented trees, enabling
the full extent of contour tree based applications including data segmentation.
Our approach completely revisits the traditional, sequential contour tree
algorithm to re-formulate all the steps of the computation as a set of
independent local tasks. This includes a new computation procedure based on
Fibonacci heaps for the join and split trees, two intermediate data structures
used to compute the contour tree, whose constructions are efficiently carried
out concurrently thanks to the dynamic scheduling of task parallelism. We also
introduce a new parallel algorithm for the combination of these two trees into
the output global contour tree. Overall, this results in superior time
performance in practice, both in sequential and in parallel thanks to the
OpenMP task runtime. We report performance numbers that compare our approach to
reference sequential and multi-threaded implementations for the computation of
augmented merge and contour trees. These experiments demonstrate the run-time
efficiency of our approach and its scalability on common workstations. We
demonstrate the utility of our approach in data segmentation applications
Computing the Fréchet Distance with a Retractable Leash
All known algorithms for the Fréchet distance between curves proceed in two steps: first, they construct an efficient oracle for the decision version; second, they use this oracle to find the optimum from a finite set of critical values. We present a novel approach that avoids the detour through the decision version. This gives the first quadratic time algorithm for the Fréchet distance between polygonal curves in (Formula presented.) under polyhedral distance functions (e.g., (Formula presented.) and (Formula presented.)). We also get a (Formula presented.)-approximation of the Fréchet distance under the Euclidean metric, in quadratic time for any fixed (Formula presented.). For the exact Euclidean case, our framework currently yields an algorithm with running time (Formula presented.). However, we conjecture that it may eventually lead to a faster exact algorithm
Trekking in the Alps without freezing or getting tired
Let F be a polyhedral terrain with n vertices. We show how to preprocess F such that for any two query points on F it can be decided whether there exists a path on F between the two points whose height decreases monotonically. More generally, the minimum total ascent or descent along any path between the two points can be computed. It is also possible to decide, given two query points and a height, whether there is a path that stays below this height. All these queries can be answered with one data structure which stores the so-called height-level map of the terrain. Although the height-level map has quadratic worst-case complexity, it is stored implicitly using only linear storage. The query time for all the above queries is and the structure can be built in time. A path with the desired property can be reported in additional time that is linear in the description size of the path