23 research outputs found

Gene-Boosted Assembly of a Novel Bacterial Genome from Very Short Reads

Author: A Camilli
AL Delcher
CK Stover
D Hernandez
D Zerbino
Daniel D. Sommer
Daniela Puiu
DB Jaffe
DD Sommer
DG Lee
DL Kasper
E Drenkard
ER Mardis
EW Myers
G Robertson
H Kulasakara
J Butler
LR Hoffman
LW Hillier
M Margulies
M Merighi
M Pop
MG Smith
MJ Chaisson
ML Metzker
N Dasgupta
N Whiteford
P Rice
RL Warren
S Batzoglou
S Kurtz
SF Altschul
SM Goldberg
Steven L. Salzberg
U Romling
Vincent T. Lee
VT Lee
William Stafford Noble
Publication venue: Public Library of Science
Publication date: 01/01/2008

Recent improvements in technology have made DNA sequencing dramatically faster and more efficient than ever before. The new technologies produce highly accurate sequences, but one drawback is that the most efficient technology produces the shortest read lengths. Short-read sequencing has been applied successfully to resequence the human genome and those of other species but not to whole-genome sequencing of novel organisms. Here we describe the sequencing and assembly of a novel clinical isolate of Pseudomonas aeruginosa, strain PAb1, using very short read technology. From 8,627,900 reads, each 33 nucleotides in length, we assembled the genome into one scaffold of 76 ordered contiguous sequences containing 6,290,005 nucleotides, including one contig spanning 512,638 nucleotides, plus an additional 436 unordered contigs containing 416,897 nucleotides. Our method includes a novel gene-boosting algorithm that uses amino acid sequences from predicted proteins to build a better assembly. This study demonstrates the feasibility of very short read sequencing for the sequencing of bacterial genomes, particularly those for which a related species has been sequenced previously, and expands the potential application of this new technology to most known prokaryotic species.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central
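
As a rough check on the figures quoted in the abstract above, the nominal sequencing depth can be computed from the read count, read length, and assembled size. This is only a sketch: it assumes every read contributes to the assembly and that the assembled length approximates the true genome size, neither of which the abstract states. The function name `nominal_coverage` is hypothetical.

```python
# Figures quoted in the abstract (Pseudomonas aeruginosa strain PAb1).
NUM_READS = 8_627_900
READ_LENGTH = 33             # nucleotides per read
SCAFFOLD_BASES = 6_290_005   # one scaffold of 76 ordered contigs
UNORDERED_BASES = 416_897    # 436 unordered contigs


def nominal_coverage(num_reads, read_length, assembled_bases):
    """Average number of raw read bases per assembled base.

    Assumption: all reads are used and the assembled length is a fair
    stand-in for the genome size.
    """
    return num_reads * read_length / assembled_bases


total_bases = SCAFFOLD_BASES + UNORDERED_BASES
coverage = nominal_coverage(NUM_READS, READ_LENGTH, total_bases)

print(f"total assembled bases: {total_bases}")
print(f"nominal coverage: {coverage:.1f}x")
```

The estimate is an upper bound on usable depth, since reads discarded for quality or left unassembled still count toward the numerator.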

The genome of the medieval Black Death agent (extended abstract)

Author: Chauve Cedric
Rajaraman Ashok
Tannier Eric
Publication date: 29/07/2013

The genome of a 650-year-old Yersinia pestis bacterium, responsible for the medieval Black Death, was recently sequenced and assembled into 2,105 contigs from the main chromosome. According to the point mutation record, the medieval bacterium could be an ancestor of most extant Yersinia pestis species, which opens the way to reconstructing the organization of these contigs using a comparative approach. We show that recent computational paleogenomics methods, which aim to reconstruct the organization of ancestral genomes from the comparison of extant genomes, can be used to correct, order and complete the contig set of the Black Death agent genome, providing a full chromosome sequence, at the nucleotide scale, of this ancient bacterium. This sequence suggests that a burst of mobile element insertions predated the Black Death, leading to exceptional genome plasticity and an increase in rearrangement rate. Comment: Extended abstract of a talk presented at the conference JOBIM 2013, https://colloque.inra.fr/jobim2013_eng/. Full paper submitte

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL

Hal-Diderot

Optimal reference sequence selection for genome assembly using minimum description length principle

Author: Bilal Wajid
Erchin Serpedin
Hazem Nounou
Mohamed Nounou
Publication venue: Springer Nature
Publication date: 01/01/2012

Reference-assisted assembly uses a reference sequence as a model to assist in the assembly of a novel genome. The standard method for identifying the best reference sequence counts the number of reads that align to each candidate reference and chooses the candidate with the highest count. This article explores the use of the minimum description length (MDL) principle and its two variants, the two-part MDL and Sophisticated MDL, in identifying the optimal reference sequence for genome assembly. The article compares the proposed MDL-based scheme with the standard method and concludes that “counting the number of reads of the novel genome present in the reference sequence” is not a sufficient condition. The proposed MDL scheme therefore includes the standard method of “counting the number of reads that align to the reference sequence” and goes further by also considering the model itself, the reference sequence, when identifying the optimal reference. The proposed MDL-based scheme not only provides a sufficient criterion for identifying the optimal reference sequence for genome assembly but also improves the reference sequence so that it becomes more suitable for the assembly of the novel genome.

Springer - Publisher Connector

PubMed Central
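
The standard read-counting method described in the abstract above can be sketched as follows. This is a toy illustration, not the authors' implementation: exact substring matching on either strand stands in for a real read aligner, and the names `count_aligned_reads` and `pick_reference`, along with the example sequences, are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def count_aligned_reads(reads, reference):
    """Count reads that occur exactly in the reference on either strand.

    Assumption: exact matching replaces a real aligner, so mismatches
    and indels are not tolerated.
    """
    return sum(
        1
        for read in reads
        if read in reference or reverse_complement(read) in reference
    )


def pick_reference(reads, references):
    """Return the candidate name with the most aligning reads, plus all counts."""
    counts = {
        name: count_aligned_reads(reads, seq) for name, seq in references.items()
    }
    best = max(counts, key=counts.get)
    return best, counts


# Made-up example data.
reads = ["ACGT", "CGTA", "TTGA", "GGCC"]
references = {
    "ref_a": "ACGTAGGCC",
    "ref_b": "TTGACCCAAA",
}

best, counts = pick_reference(reads, references)
print(best, counts)
```

The abstract argues that this count alone is not a sufficient criterion, which is why the article also scores the reference sequence itself.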

Increasing the coverage of a metapopulation consensus genome by iterative read mapping and assembly

Author: Dutilh Bas E.
Huynen Martijn A.
Strous Marc
Publication venue: Oxford University Press
Publication date: 01/01/2009

Motivation: Most microbial species cannot be cultured in the laboratory. Metagenomic sequencing may still yield a complete genome if the sequenced community is enriched and the sequencing coverage is high. However, the complexity in a natural population may cause the enrichment culture to contain multiple related strains. This diversity can confound existing strict assembly programs and lead to a fragmented assembly, which is unnecessary if we have a related reference genome available that can function as a scaffold.

PubMed Central

Publications at Bielefeld University

Radboud Repository

Novel software package for cross-platform transcriptome analysis (CPTRA)

Author: A Fischer
A Mortazavi
C Neal Stewart
C Preston
CH Koger
D Hernandez
D MacLean
DR Zerbino
DS Johnson
E Pettersson
J Butler
J Shendure
JC Dohm
JM Rothberg
Joshua S Yuan
JS Yuan
M de Hoon
M Meyer
MB Eisen
MD Gerald
MJ Chaisson
MJ Fullwood
O Morozova
P Fortina
P Ng
PAC t Hoen
Patrick J Tranel
PCC Feng
R Douglas Sammons
RL Warren
RQ Li
SL Salzberg
TC Mueller
VE Velculescu
W Brockman
Xin Zhou
Y Yashiro
Yanhui Peng
Z Zhang
Zhen Su
Publication venue: BioMed Central
Publication date: 01/01/2009

Crossref

Springer - Publisher Connector

PubMed Central

Texas A&M Repository

De Novo Assembly of the Complete Genome of an Enhanced Electricity-Producing Variant of Geobacter sulfurreducens Using Only Short Reads

Author: Anna Klimes
Barbara A. Methé
Bernhard Ø. Palsson
Christian L. Barrett
Derek Lovley
Derek R. Lovley
Harish Nagarajan
Jessica E Butler
Jessica E. Butler
Joy Ward
Karsten Zengler
Nelson D
Yu Qiu
Publication venue: Public Library of Science
Publication date: 01/01/2010

State-of-the-art DNA sequencing technologies are transforming the life sciences due to their ability to generate nucleotide sequence information at a speed and in a quantity that are unattainable with traditional Sanger sequencing. Genome sequencing is a principal application of this technology, where the ultimate goal is the full and complete sequence of the organism of interest. Due to the nature of the raw data produced by these technologies, a full genomic sequence attained without the aid of Sanger sequencing has yet to be demonstrated.

CiteSeerX

Aberdeen University Research

Public Library of Science (PLOS)

ScholarWorks@UMass Amherst

Directory of Open Access Journals

PubMed Central

Finishing genomes with limited resources: lessons from an ensemble of microbial genomes

Author: Bishop-Lilly Kimberly A
Cook Christopher
DeSalle Robert
Di Bonaventura MariaPia
Ge Hong
Nagarajan Niranjan
Pop Mihai
Read Timothy D
Richards Allen
Publication venue: BioMed Central
Publication date: 01/01/2010

While new sequencing technologies have ushered in an era where microbial genomes can be easily sequenced, the goal of routinely producing high-quality draft and finished genomes in a cost-effective fashion has remained elusive. Due to shorter read lengths and limitations in library construction protocols, shotgun sequencing and assembly based on these technologies often result in fragmented assemblies. Correspondingly, while draft assemblies can be obtained in days, finishing can take many months, and hence the time and effort can only be justified for high-priority genomes and in large sequencing centers. In this work, we revisit this issue in light of our own experience in producing finished and nearly-finished genomes for a range of microbial species in a small-lab setting. These genomes were finished with surprisingly little investment in terms of time, computational effort and lab work, suggesting that the increased access to sequencing might also eventually lead to a greater proportion of finished genomes from small labs and genomics cores.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Digital Repository at the University of Maryland

ScholarBank@NUS

Read Length and Repeat Resolution: Exploring Prokaryote Genomes Using Next-Generation Sequencing Technologies

Author: AL Delcher
B Haubold
C Fraser
C Kingsford
Claudio U. Köser
D Hernandez
D MacLean
DR Zerbino
DW Bryant
E Mardis
E Stackebrandt
ES Lander
G Achaz
I Maccallum
J Eid
J Shendure
JC Dohm
John A. C. Archer
M Chaisson
M Margulies
M Pop
Matt J. Cahill
MC Wendl
N Hall
N Whiteford
Nicholas E. Ross
O Morozova
RA Farrer
S Kurtz
SF Altschul
SL Salzberg
TJ Treangen
Wenjun Li
Publication venue: Public Library of Science
Publication date: 01/01/2010

Background: There is a growing number of next-generation sequencing technologies. At present, the most cost-effective options also produce the shortest reads. However, even for prokaryotes, there is uncertainty concerning the utility of these technologies for the de novo assembly of complete genomes. This reflects an expectation that short reads will be unable to resolve small, but presumably abundant, repeats. Methodology/Principal Findings: Using a simple model of repeat assembly, we develop and test a technique that, for any read length, can estimate the occurrence of unresolvable repeats in a genome, and thus predict the number of gaps that would need to be closed to produce a complete sequence. We apply this technique to 818 prokaryote genome sequences. This provides a quantitative assessment of the relative performance of various read lengths. Notably, unpaired reads of only 150 nt can reconstruct approximately 50% of the analysed genomes with fewer than 96 repeat-induced gaps. Nonetheless, there is considerable variation amongst prokaryotes. Some genomes can be assembled to near contiguity using very short reads while others require much longer reads. Conclusions: Given the diversity of prokaryote genomes, a sequencing strategy should be tailored to the organism unde

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central
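
The idea in the abstract above, that a repeat becomes unresolvable once it is at least as long as the read, can be illustrated with a toy count of repeated read-length substrings. This is a hedged sketch and not the paper's model: it considers one strand only, treats any substring of read length that occurs more than once as unresolvable, and uses a made-up sequence. The function names are hypothetical.

```python
from collections import Counter


def repeated_kmer_positions(genome, k):
    """Return start positions whose k-mer occurs more than once in the genome."""
    last_start = len(genome) - k
    counts = Counter(genome[i:i + k] for i in range(last_start + 1))
    return [i for i in range(last_start + 1) if counts[genome[i:i + k]] > 1]


def count_repeat_regions(genome, read_length):
    """Count runs of consecutive positions covered by repeated read-length k-mers.

    Assumption: each run is a crude proxy for one region that unpaired
    reads of this length cannot place uniquely.
    """
    regions = 0
    previous = None
    for position in repeated_kmer_positions(genome, read_length):
        if previous is None or position != previous + 1:
            regions += 1
        previous = position
    return regions


# Made-up sequence containing two copies of the 7-nt repeat GATTACA.
genome = "CCGATTACATGCTGGATTACAGT"

for read_length in (4, 7, 8):
    print(read_length, count_repeat_regions(genome, read_length))
```

In this toy case, reads of 4 or 7 nt cannot distinguish the two copies, while reads of 8 nt extend into unique flanking sequence and the ambiguity disappears.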

LOCAS – A Low Coverage Assembly Tool for Resequencing Projects

Author: A Doring
AR Quinlan
B Langmead
C Nusbaum
D Hernandez
D Weigel
Daniel H. Huson
DC Richter
Detlef Weigel
DR Zerbino
EW Myers
H Li
H Li
I Birol
JD Kececioglu
JO Korbel
JT Simpson
Juliane D. Klein
K Schneeberger
Korbinian Schneeberger
LE Palmer
M Pop
MC Wendl
MJ Chaisson
PA Pevzner
R Li
R Li
RM Durbin
S Ossowski
SL Salzberg
SM Rumble
SQ Le
Stephan Ossowski
T Rausch
Ying Xu
Publication venue: Public Library of Science
Publication date: 01/01/2011

Motivation: Next Generation Sequencing (NGS) is a frequently applied approach to detect sequence variations between highly related genomes. Recent large-scale re-sequencing studies such as the Human 1000 Genomes Project utilize NGS data of low coverage to afford sequencing of hundreds of individuals. Here, SNPs and micro-indels can be detected by applying an alignment-consensus approach. However, computational methods capable of discovering other variations such as novel insertions or highly diverged sequence from low coverage NGS data are still lacking. Results: We present LOCAS, a new NGS assembler particularly designed for low coverage assembly of eukaryotic genomes using a mismatch-sensitive overlap-layout-consensus approach. LOCAS assembles homologous regions in a homology-guided manner while it performs de novo assemblies of insertions and highly polymorphic target regions subsequent to an alignment-consensus approach. LOCAS has been evaluated in homology-guided assembly scenarios with low sequence coverage of Arabidopsis thaliana strains sequenced as part of the Arabidopsis 1001 Genomes Project. While assembling the same amount of long insertions as state-of-the-art NGS assemblers, LOCAS showed the best results regarding contig size, error rate and runtime. Conclusion: LOCAS produces excellent results for homology-guided assembly of eukaryotic genomes with short reads and low sequencing depth, and therefore appears to be the assembly tool of choice for the detection of novel sequenc

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

MPG.PuRe

ScholarBank@NUS