824 research outputs found

    Optimal Assembly for High Throughput Shotgun Sequencing

    We present a framework for the design of optimal assembly algorithms for shotgun sequencing under the criterion of complete reconstruction. We derive a lower bound on the read length and the coverage depth required for reconstruction in terms of the repeat statistics of the genome. Building on earlier works, we design a de Bruijn graph based assembly algorithm which can achieve very close to the lower bound for repeat statistics of a wide range of sequenced genomes, including the GAGE datasets. The results are based on a set of necessary and sufficient conditions on the DNA sequence and the reads for reconstruction. The conditions can be viewed as the shotgun sequencing analogue of Ukkonen-Pevzner's necessary and sufficient conditions for Sequencing by Hybridization.
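
    The kind of repeat statistic such lower bounds depend on can be illustrated with a toy computation. This is a minimal sketch, not the paper's method: the actual conditions involve interleaved and triple repeats, whereas `longest_repeat_length` (a hypothetical helper) only finds the longest exact repeat, which already gives a simple floor on the read length needed.

```python
def longest_repeat_length(s):
    """Length of the longest substring occurring at least twice in s.

    Sort all suffixes, then take the maximum longest-common-prefix of
    adjacent suffixes; O(n^2 log n) worst case, fine for toy genomes.
    """
    suffixes = sorted(s[i:] for i in range(len(s)))
    best = 0
    for a, b in zip(suffixes, suffixes[1:]):
        # longest common prefix of two adjacent sorted suffixes
        k = 0
        while k < min(len(a), len(b)) and a[k] == b[k]:
            k += 1
        best = max(best, k)
    return best

# "ACGT" occurs at positions 0 and 4, so the longest repeat is 4 long;
# reads shorter than ~5 bases could not bridge this repeat unambiguously.
print(longest_repeat_length("ACGTACGTTT"))
```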

    Sequencing by Hybridization of Long Targets

    Sequencing by Hybridization (SBH) reconstructs an n-long target DNA sequence from its biochemically determined l-long subsequences. In the standard approach, the length of a uniformly random sequence that can be unambiguously reconstructed is limited, due to repetitive subsequences causing reconstruction degeneracies. We present a modified sequencing method that overcomes this limitation without the need for different types of biochemical assays and is robust to errors

    Functional characterization and annotation of trait-associated genomic regions by transcriptome analysis

    In this work, two novel implementations are presented that could assist in the design and data analysis of high-throughput genomic experiments. First, an efficient and flexible tiling probe selection pipeline utilizing a penalized uniqueness score has been implemented, which can be employed in the design of genome tiling tasks of various types and scales. Second, a novel hidden semi-Markov model (HSMM) implementation has been made available within the Bioconductor project, which provides a unified interface for segmenting genomic data in a wide range of research subjects.

    Estimating DNA coverage and abundance in metagenomes using a gamma approximation

    Motivation: Shotgun sequencing generates large numbers of short DNA reads from either an isolated organism or, in the case of metagenomics projects, from the aggregate genome of a microbial community. These reads are then assembled based on overlapping sequences into larger, contiguous sequences (contigs). The feasibility of assembly and the coverage achieved (reads per nucleotide or distinct sequence of nucleotides) depend on several factors: the number of reads sequenced, the read length and the relative abundances of their source genomes in the microbial community. A low coverage suggests that most of the genomic DNA in the sample has not been sequenced, but it is often difficult to estimate either the extent of the uncaptured diversity or the amount of additional sequencing that would be most efficacious. In this work, we regard a metagenome as a population of DNA fragments (bins), each of which may be covered by one or more reads. We employ a gamma distribution to model this bin population due to its flexibility and ease of use. When a gamma approximation can be found that adequately fits the data, we may estimate the number of bins that were not sequenced and that could potentially be revealed by additional sequencing. We evaluate the performance of this model using simulated metagenomes and demonstrate its applicability on three recent metagenomic datasets
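
    Under a gamma model, the per-bin read count is a gamma-mixed Poisson, i.e. negative binomial, so the zero class gives the probability that a bin was never sequenced. A minimal sketch of the resulting unseen-bin estimate, assuming shape and rate parameters have already been fitted to the data (function names are illustrative, not from the paper's software):

```python
def p_zero_reads(shape, rate):
    """P(a bin receives zero reads) when the bin's expected read count is
    Gamma(shape, rate)-distributed and reads arrive as a Poisson process:
    this is the zero class of the resulting negative binomial,
    (rate / (rate + 1))**shape."""
    return (rate / (rate + 1.0)) ** shape

def estimate_total_bins(observed_bins, shape, rate):
    """Scale the observed bin count up by the estimated probability
    that a bin was sequenced at least once."""
    return observed_bins / (1.0 - p_zero_reads(shape, rate))
```

    For example, with shape = rate = 1 half of all bins are expected to receive zero reads, so observing 500 distinct bins suggests roughly 1000 bins in total.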

    Sensitivity of Noninvasive Prenatal Detection of Fetal Aneuploidy from Maternal Plasma Using Shotgun Sequencing Is Limited Only by Counting Statistics

    We recently demonstrated noninvasive detection of fetal aneuploidy by shotgun sequencing cell-free DNA in maternal plasma using a next-generation high-throughput sequencer. However, GC bias introduced by the sequencer placed a practical limit on the sensitivity of aneuploidy detection. In this study, we describe a method to computationally remove GC bias in short read sequencing data by applying a weight to each sequenced read based on local genomic GC content. We show that sensitivity is limited only by counting statistics and that sensitivity can be increased to arbitrary precision in samples containing an arbitrarily small fraction of fetal DNA simply by sequencing more DNA molecules. High throughput shotgun sequencing of maternal plasma DNA should therefore enable noninvasive diagnosis of any type of fetal aneuploidy
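
    The weighting idea can be sketched as follows. This is a simplified illustration, not the authors' code: the bin granularity and the exact weight definition (mean reads per GC bin divided by the read's own bin count) are assumptions.

```python
from collections import Counter

def gc_weights(read_gc_fractions, n_bins=10):
    """Per-read weights that flatten GC bias.

    Each read is assigned to a GC-content bin; its weight is the mean
    reads-per-occupied-bin divided by its own bin's count, so reads from
    over-represented GC bins are down-weighted and under-represented
    bins are up-weighted. Total weight equals the number of reads.
    """
    bins = [min(int(gc * n_bins), n_bins - 1) for gc in read_gc_fractions]
    counts = Counter(bins)
    mean_count = len(bins) / len(counts)
    return [mean_count / counts[b] for b in bins]
```

    A useful sanity check of the design: the weights sum to the read count, so weighting reshapes the GC profile without changing the total amount of evidence used for counting.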

    Detection of copy number variants in sequencing data.

    In this work a program for the detection of CNVs in sequencing data based on depth of coverage (DOC) was implemented in C++ (copyDOC). The single steps in the pipeline, the acquisition of DOC signals in windows, the event calling and the merging, are implemented using generic programming techniques that enable the future integration of other algorithms into the pipeline. Furthermore, a testing environment was implemented, the copySim platform, which is very useful for testing and evaluating different algorithms. CopyDOC was successfully applied to synthetic and real data using constant-sized windows. Dynamic windows, which adapt to the local mappability of the sequence, are implemented in the pipeline but could not be tested in this work. They might be advantageous in datasets that contain uniquely mapped reads. However, CNVs have been shown to be overrepresented in segmental duplications (Nguyen et al. 2006; Cooper et al. 2007), and with a general exclusion of multireads those CNVs might be difficult to ascertain. In the application of copyDOC to a 1000 Genomes dataset, the overlap of predicted variants was considerably higher using multireads compared to uniquely mapped reads. Thus there is a requirement for tools that can handle multireads. Further improvements of copyDOC might address the CNV calling algorithm and the merging step. For example, the program workflow could be tested with a direct comparison of the DOC signals in two datasets via log ratios instead of applying a t-test on the DOC signals in the two datasets. CopyDOC and copySim could be used as a platform for the implementation and evaluation of further CNV detection algorithms
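
    The suggested log-ratio alternative to the t-test could look like the sketch below, assuming per-window DOC values are already computed; the pseudo-count and threshold values are illustrative, not copyDOC's actual parameters.

```python
import math

def cnv_log_ratios(sample_doc, control_doc, pseudo=0.5):
    """Per-window log2 ratios of depth of coverage (sample vs control).

    A pseudo-count guards against zero-coverage windows; windows with
    |log2 ratio| above a threshold are candidate CNV windows.
    """
    return [math.log2((s + pseudo) / (c + pseudo))
            for s, c in zip(sample_doc, control_doc)]

def call_events(log_ratios, threshold=0.8):
    """Merge runs of adjacent above-threshold windows into events,
    returned as (start_window, end_window) index pairs."""
    events, start = [], None
    for i, lr in enumerate(log_ratios):
        if abs(lr) >= threshold:
            if start is None:
                start = i
        elif start is not None:
            events.append((start, i - 1))
            start = None
    if start is not None:
        events.append((start, len(log_ratios) - 1))
    return events
```

    On a toy signal where windows 2 and 3 have four-fold coverage against a flat control, the two windows merge into a single candidate duplication event.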

    Comparing De Novo Genome Assembly: The Long and Short of It

    Recent advances in DNA sequencing technology and their focal role in Genome Wide Association Studies (GWAS) have rekindled a growing interest in the whole-genome sequence assembly (WGSA) problem, thereby inundating the field with a plethora of new formalizations, algorithms, heuristics and implementations. And yet, scant attention has been paid to comparative assessments of these assemblers' quality and accuracy. No commonly accepted and standardized method for comparison exists yet. Even worse, widely used metrics to compare the assembled sequences emphasize only size, poorly capturing contig quality and accuracy. This paper addresses these concerns: it highlights common anomalies in assembly accuracy through a rigorous study of several assemblers, compared under both standard metrics (N50, coverage, contig sizes, etc.) and a more comprehensive metric (Feature-Response Curves, FRC) that is introduced here; FRC transparently captures the trade-off between contig quality and contig size. For this purpose, most of the publicly available major sequence assemblers – both for low-coverage long (Sanger) and high-coverage short (Illumina) read technologies – are compared. These assemblers are applied to microbial (Escherichia coli, Brucella, Wolbachia, Staphylococcus, Helicobacter) and partial human genome sequences (Chr. Y), using sequence reads of various read-lengths, coverages, accuracies, and with and without mate-pairs. It is hoped that, based on these evaluations, computational biologists will identify innovative sequence assembly paradigms, bioinformaticists will determine promising approaches for developing “next-generation” assemblers, and biotechnologists will formulate more meaningful design desiderata for sequencing technology platforms. A new software tool for computing the FRC metric has been developed and is available through the AMOS open-source consortium
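
    The core of the Feature-Response Curve can be sketched as: sort contigs by size, then trace cumulative feature (error) count against cumulative genome coverage. This is a simplified reading of the metric for illustration; the tool distributed through AMOS is the reference implementation.

```python
def feature_response_curve(contigs, genome_size):
    """FRC sketch.

    contigs: list of (length, n_features) pairs, where features are
    suspicious regions (misassembly signals) found in each contig.
    Walk the contigs from largest to smallest, accumulating features
    and coverage; each point (cumulative_features, coverage_fraction)
    shows how much genome the 'cleanest large' contigs can cover
    within a given feature budget.
    """
    ordered = sorted(contigs, key=lambda c: c[0], reverse=True)
    curve, feats, covered = [], 0, 0
    for length, n_feat in ordered:
        feats += n_feat
        covered += length
        curve.append((feats, covered / genome_size))
    return curve
```

    An assembler whose curve rises steeply (high coverage at a small feature budget) trades off size against quality better than one that only maximizes N50.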

    Validation of S. pombe sequence assembly by microarray hybridization

    We describe a method to make physical maps of genomes using correlative hybridization patterns of probes to random pools of BACs. We derive thereby an estimated distance between probes, and then use this estimated distance to order probes. To test the method, we used BAC libraries from Schizosaccharomyces pombe. We compared our data to the known sequence assembly, in order to assess accuracy. We demonstrate a small number of significant discrepancies between our method and the map derived by sequence assembly. Some of these discrepancies may arise because genome order within a population is not stable; imposing a linear order on a population may not be biologically meaningful
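
    The intuition behind a correlative-hybridization distance can be illustrated with a simple overlap score, assuming each probe's hit pattern across the random BAC pools is available as a boolean vector. This Jaccard-style form is a sketch only; the paper's actual estimator is statistical, not this formula.

```python
def probe_distance(hits_a, hits_b):
    """Proximity proxy from hybridization patterns over random BAC pools.

    Probes that lie close together on the genome tend to land in the
    same BACs, so a larger overlap of their hit patterns maps to a
    smaller distance (1 - Jaccard similarity). Disjoint patterns give
    the maximum distance of 1.0.
    """
    both = sum(1 for a, b in zip(hits_a, hits_b) if a and b)
    either = sum(1 for a, b in zip(hits_a, hits_b) if a or b)
    return 1.0 - both / either if either else 1.0
```

    Ordering probes then reduces to arranging them so that small pairwise distances correspond to adjacency, which is where the discrepancies against the sequence-assembly map become visible.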