8 research outputs found

    Graph-theoretic concepts in computer science

    Optimization Problems in Dotted Interval Graphs

    The class of D-dotted interval (D-DI) graphs is the class of intersection graphs of arithmetic progressions with jump (common difference) at most D. We consider various classical graph-theoretic optimization problems when these are restricted to D-DI graphs of arbitrarily fixed D. We show that Maximum Independent Set, Minimum Vertex Cover, and Minimum Dominating Set can be solved in polynomial time in this graph class, answering an open question posed by Jiang [17]. We also show that Minimum Vertex Cover can be approximated within a factor of (1 + ε) for any ε > 0 in linear time. This algorithm generalizes to a wide class of deletion problems including the classical Minimum Feedback Vertex Set and Minimum Planar Deletion problems. Our algorithms are based on classical results in algorithmic graph theory and new structural properties of D-DI graphs that may be of independent interest.

    Dotted interval graphs and high throughput genotyping

    We introduce a generalization of interval graphs, which we call dotted interval graphs (DIG). A dotted interval graph is an intersection graph of arithmetic progressions (= dotted intervals). Coloring of dotted interval graphs naturally arises in the context of high throughput genotyping. We study the properties of dotted interval graphs, with a focus on coloring. We show that any graph is a DIG but that DIG_d graphs, i.e. DIGs in which the arithmetic progressions have a jump of at most d, form a strict hierarchy. We show that coloring DIG_d graphs is NP-complete even for d = 2. For any fixed d, we provide a (7/8)d approximation for the coloring of DIG_d graphs.
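
    As a small illustration of the definition above, the following sketch builds the intersection graph of a few dotted intervals and colors it. The function names (`dotted_interval`, `intersection_graph`, `greedy_coloring`) and the example progressions are assumptions made for this illustration, and the greedy coloring is a generic heuristic, not the approximation algorithm from the paper.

```python
from itertools import combinations


def dotted_interval(start, jump, count):
    """Return the arithmetic progression start, start+jump, ... (count terms) as a set."""
    return {start + i * jump for i in range(count)}


def intersection_graph(intervals):
    """Build adjacency sets: i and j are adjacent iff their point sets share a point."""
    adj = {i: set() for i in range(len(intervals))}
    for i, j in combinations(range(len(intervals)), 2):
        if intervals[i] & intervals[j]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def greedy_coloring(adj):
    """Generic greedy coloring in vertex order (a heuristic, not the paper's method)."""
    colors = {}
    for v in sorted(adj):
        used = {colors[u] for u in adj[v] if u in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors


# Hypothetical example: two interleaved progressions with jump 2 and one with jump 1.
example = [
    dotted_interval(0, 2, 4),  # {0, 2, 4, 6}
    dotted_interval(1, 2, 4),  # {1, 3, 5, 7}
    dotted_interval(3, 1, 2),  # {3, 4}
]
example_adj = intersection_graph(example)
example_colors = greedy_coloring(example_adj)
```

    The first two progressions interleave over the same range without sharing a point, so they are not adjacent and can share a color. Ordinary intervals spanning the same range would necessarily intersect.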

    Various Algorithms for High Throughput Sequencing

    The growing volume of generated DNA sequencing data makes the problem of its long-term storage increasingly important. In the first chapter we present ReCoil, an I/O-efficient external memory algorithm designed for compression of very large datasets of short-read DNA data. Typically each position of a DNA sequence is covered by multiple reads of a short-read dataset, and our algorithm makes use of the resulting redundancy to achieve a high compression rate. While compression based on encoding mismatches between the dataset and a similar reference can yield a high compression rate, a good-quality reference sequence may be unavailable. Instead, ReCoil's compression is based on encoding the differences between similar or overlapping reads. As such reads may appear at large distances from each other in the dataset, and since random access memory is a limited resource, ReCoil is designed to work efficiently in external memory, leveraging the high bandwidth of modern hard disk drives.
    The next problem we address is the problem of mapping sequencing data to the reference genome. We revisit the basic problem of mapping a dataset of reads to the reference, developing a fast, memory-efficient algorithm for mapping the new generation of long, high-error-rate reads. We introduce a novel approach to hash-based indexing which is both succinct and can be constructed very efficiently, avoiding the need to build the index in advance. Using our new index, we design a cache-oblivious algorithm that makes efficient use of modern hardware to quickly cluster the reads in the dataset according to their position on the reference.
    Next, we show an algorithm for aligning two-pass Single Molecule Sequencing reads. Single Molecule Sequencing technologies such as the Heliscope simplify the preparation of DNA for sequencing, while sampling millions of reads in a day. The technology suffers from a high error rate, ameliorated by the ability to sample multiple reads from the same location. We develop novel rapid alignment algorithms for two-pass Single Molecule Sequencing methods. We combine the Weighted Sequence Graph representation of all optimal and near-optimal alignments between the two reads sampled from a piece of DNA with k-mer filtering methods and spaced seeds to quickly generate candidate locations for the reads on the reference genome. Our method combines these approaches with a fast implementation of the Smith-Waterman algorithm in order to build an algorithm that is both fast and accurate, since it is able to take complete advantage of both of the reads sampled during two-pass sequencing.
    While genetic sequencing is growing in popularity as the tool to answer various kinds of questions about the genome, technical as well as practical considerations, such as high cost, impose limitations on its use. Microarrays are a commonly used alternative to sequencing for the task of microsatellite genotyping. We discuss the problem of optimal coloring of dotted interval graphs that arises naturally in the context of microsatellite genotyping using microarrays.
    Ph.D.
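
    For readers unfamiliar with the Smith-Waterman algorithm mentioned in the abstract, here is a minimal textbook sketch of its local alignment score with a linear gap penalty. It is not the thesis's fast implementation; the function name `smith_waterman_score` and the scoring values (match 2, mismatch -1, gap -1) are assumptions chosen for this illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between strings a and b.

    Textbook dynamic program with a linear gap penalty; the scoring
    parameters are illustrative assumptions. Only two rows of the
    table are kept, so memory use is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0: a local alignment may start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

    With these assumed scores, two reads that share only the substring `ACG` score 6 (three matches), whatever surrounds that substring.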