
    Paired-end read length lower bounds for genome re-sequencing

    Next-generation sequencing technology is enabling massive production of high-quality paired-end reads. Many platforms (Illumina Genome Analyzer, Applied Biosystems SOLiD, Helicos HeliScope) are currently able to produce "ultra-short" paired reads of lengths starting at 25 nt. An analysis by Whiteford et al. [1] on sequencing with unpaired reads shows that ultra-short reads theoretically allow whole-genome re-sequencing and de novo assembly of only small eukaryotic genomes. By extending the analysis of Whiteford et al., we investigate to what extent genome re-sequencing is feasible with ultra-short paired reads. We obtain theoretical read-length lower bounds for re-sequencing that are also applicable to paired-end de novo assembly.
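
    As an illustration of the kind of uniqueness analysis that underlies such lower bounds (a minimal Python sketch, not the authors' method; the toy sequence and function name are invented for the example), one can look for the smallest read length k at which every length-k window of a sequence occurs at a unique position:

    # Minimal sketch: find the smallest k for which every k-mer of the sequence
    # is unique, the uniqueness criterion behind read-length lower bounds.
    from collections import Counter

    def min_unique_read_length(genome, k_max=50):
        for k in range(1, k_max + 1):
            counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
            if all(c == 1 for c in counts.values()):
                return k
        return None  # no k <= k_max makes every window unique

    if __name__ == "__main__":
        toy_genome = "ACGTACGGACGTTTACG"           # hypothetical toy sequence
        print(min_unique_read_length(toy_genome))  # -> 5 for this toy sequence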

    Simple tools for assembling and searching high-density picolitre pyrophosphate sequence data

    Background: The advent of pyrophosphate sequencing makes large volumes of sequencing data available at a lower cost than previously possible. However, the short read lengths are difficult to assemble and the large dataset is difficult to handle. During the sequencing of a virus from the tsetse fly, Glossina pallidipes, we found the need for tools to quickly search a set of reads for near-exact text matches. Methods: A set of tools is provided to search a large dataset of pyrophosphate sequence reads under a "live" CD version of Linux on a standard PC, usable by anyone without prior knowledge of Linux and without having to install a Linux setup on the computer. The tools permit short lengths of de novo assembly, checking of existing assembled sequences, selection and display of reads from the dataset, and gathering counts of sequences in the reads. Results: Demonstrations are given of the use of the tools to help with checking an assembly against the fragment dataset; investigating homopolymer lengths, repeat regions and polymorphisms; and resolving inserted bases caused by incomplete chain extension. Conclusion: The additional information contained in a pyrophosphate sequencing dataset beyond a basic assembly is difficult to access due to a lack of tools. The set of simple tools presented here would allow anyone with basic computer skills and a standard PC to access this information.
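
    As a sketch of the kind of near-exact search described above (the published tools are not reproduced here; the function, reads and query below are hypothetical), a simple Python version scans every window of every read and tolerates a fixed number of mismatches:

    # Hypothetical sketch of a near-exact text search over a set of reads.
    def near_exact_hits(reads, query, max_mismatches=1):
        # Yield (read_index, offset) for every window that matches the query
        # with at most max_mismatches substitutions.
        q = len(query)
        for idx, read in enumerate(reads):
            for off in range(len(read) - q + 1):
                window = read[off:off + q]
                if sum(a != b for a, b in zip(window, query)) <= max_mismatches:
                    yield idx, off

    if __name__ == "__main__":
        reads = ["ACGTTTGCA", "TTGCAACGT", "GGGGCCCC"]  # toy reads
        print(list(near_exact_hits(reads, "TTGCA")))    # -> [(0, 4), (1, 0)]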

    QSRA – a quality-value guided de novo short read assembler

    Background: New rapid high-throughput sequencing technologies have sparked the creation of a new class of assembler. Since all high-throughput sequencing platforms incorporate errors in their output, short-read assemblers must be designed to account for this error while utilizing all available data. Results: We have designed and implemented an assembler, the Quality-value guided Short Read Assembler (QSRA), created to take advantage of quality-value scores as a further method of dealing with error. Compared to previously published algorithms, our assembler shows significant improvements not only in speed but also in output quality. Conclusion: QSRA generally produced the highest genomic coverage while being faster than VCAKE. QSRA is extremely competitive in its longest contig and N50/N80 contig lengths, producing results of similar quality to those of EDENA and VELVET. QSRA provides a step closer to the goal of de novo assembly of complex genomes, improving upon the original VCAKE algorithm by drastically reducing runtimes and increasing the viability of the assembly algorithm through further error-handling capabilities.
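
    One way to picture quality-value guidance (an illustrative sketch only, not QSRA's actual implementation; the function name, quality values and threshold below are invented) is to weight each candidate extension base by the Phred qualities of the reads that support it during greedy contig extension:

    # Sketch: choose the next base during greedy extension by summing the Phred
    # qualities supporting each candidate, refusing to extend on weak support.
    from collections import defaultdict

    def choose_next_base(candidates, min_weight=40):
        # candidates: list of (base, phred_quality) pairs from overlapping reads.
        weights = defaultdict(int)
        for base, qual in candidates:
            weights[base] += qual
        base, weight = max(weights.items(), key=lambda kv: kv[1])
        return base if weight >= min_weight else None  # None means stop extending

    if __name__ == "__main__":
        support = [("A", 35), ("A", 30), ("C", 12)]  # hypothetical read support
        print(choose_next_base(support))             # -> 'A'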

    Meraculous: De Novo Genome Assembly with Short Paired-End Reads

    We describe a new algorithm, meraculous, for whole genome assembly of deep paired-end short reads, and apply it to the assembly of a dataset of paired 75-bp Illumina reads derived from the 15.4 megabase genome of the haploid yeast Pichia stipitis. More than 95% of the genome is recovered, with no errors; half the assembled sequence is in contigs longer than 101 kilobases and in scaffolds longer than 269 kilobases. Incorporating fosmid ends recovers entire chromosomes. Meraculous relies on an efficient and conservative traversal of the subgraph of the k-mer (de Bruijn) graph of oligonucleotides with unique high-quality extensions in the dataset, avoiding the explicit error correction step used in other short-read assemblers. A novel memory-efficient hashing scheme is introduced. The resulting contigs are ordered and oriented using paired reads separated by ∼280 bp or ∼3.2 kbp, and many gaps between contigs can be closed using paired-end placements. Practical issues with the dataset are described, and prospects for assembling larger genomes are discussed.
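
    The traversal idea can be sketched as follows (a rough Python illustration, not the meraculous code; quality filtering is omitted, and the seed, k and toy genome are invented): a contig is extended only through k-mers with exactly one candidate extension, and the walk stops at any branch or dead end.

    # Sketch: conservative walk through a k-mer graph, extending only where the
    # extension is unique, and stopping at branches, dead ends, or cycles.
    def extensions(kmer, kmer_set):
        return [b for b in "ACGT" if kmer[1:] + b in kmer_set]

    def walk(seed, kmer_set):
        contig, kmer, seen = seed, seed, {seed}
        while True:
            ext = extensions(kmer, kmer_set)
            if len(ext) != 1:          # branch or dead end: stop conservatively
                return contig
            kmer = kmer[1:] + ext[0]
            if kmer in seen:           # avoid looping around a cycle
                return contig
            seen.add(kmer)
            contig += ext[0]

    if __name__ == "__main__":
        genome, k = "ACGTTGCA", 4      # toy example
        kmers = {genome[i:i + k] for i in range(len(genome) - k + 1)}
        print(walk("ACGT", kmers))     # -> 'ACGTTGCA'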

    Assembly complexity of prokaryotic genomes using short reads

    Background: De Bruijn graphs are a theoretical framework underlying several modern genome assembly programs, especially those that deal with very short reads. We describe an application of de Bruijn graphs to analyze the global repeat structure of prokaryotic genomes. Results: We provide the first survey of the repeat structure of a large number of genomes. The analysis gives an upper bound on the performance of genome assemblers for de novo reconstruction of genomes across a wide range of read lengths. Further, we demonstrate that the majority of genes in prokaryotic genomes can be reconstructed uniquely using very short reads even if the genomes themselves cannot. The non-reconstructible genes are overwhelmingly related to mobile elements (transposons, IS elements, and prophages). Conclusions: Our results improve upon previous studies on the feasibility of assembly with short reads and provide a comprehensive benchmark against which to compare the performance of the short-read assemblers currently being developed.
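
    A small sketch of the underlying measurement (not the paper's pipeline; the sequence and function are hypothetical): repeated k-mers are the branch points of the de Bruijn graph that limit unique reconstruction, so their fraction as a function of k gives a feel for how read length constrains assembly.

    # Sketch: fraction of k-mers that occur more than once, for several k.
    from collections import Counter

    def repeat_fraction(genome, k):
        counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
        total = sum(counts.values())
        repeated = sum(c for c in counts.values() if c > 1)
        return repeated / total if total else 0.0

    if __name__ == "__main__":
        toy = "ACGTACGTGGACGT"          # hypothetical sequence with a repeat
        for k in (3, 5, 7):
            print(k, round(repeat_fraction(toy, k), 3))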

    Mutation Detection with Next-Generation Resequencing through a Mediator Genome

    The affordability of next-generation sequencing (NGS) is transforming the field of mutation analysis in bacteria. The genetic basis for phenotype alteration can be identified directly by sequencing the entire genome of the mutant and comparing it to the wild-type (WT) genome, thus identifying acquired mutations. A major limitation of this approach is the need for an a priori sequenced reference genome for the WT organism, as the short reads of most current NGS approaches usually prohibit de novo genome assembly. To overcome this limitation we propose a general framework that utilizes the genomes of related organisms as mediators for comparing WT and mutant bacteria. Under this framework, both mutant and WT genomes are sequenced with NGS, and the short sequencing reads are mapped to the mediator genome. Variations between the mutant and the mediator that recur in the WT are ignored, thus pinpointing the differences between the mutant and the WT. To validate this approach we sequenced the genome of Bdellovibrio bacteriovorus 109J, an obligatory bacterial predator, and its prey-independent mutant, and compared both to the mediator species Bdellovibrio bacteriovorus HD100. Although the mutant and the mediator sequences differed in more than 28,000 nucleotide positions, our approach enabled pinpointing the single causative mutation. Experimental validation in 53 additional mutants further established the implicated gene. Our approach extends the applicability of NGS-based mutant analyses beyond the domain of available reference genomes.
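
    The core filtering step of the framework can be sketched in a few lines of Python (positions and calls below are toy data; the real pipeline works from short reads mapped to the mediator rather than from pre-made call dictionaries): variants of the mutant relative to the mediator are kept only if the wild type does not show the same call at the same position.

    # Sketch: keep mutant-vs-mediator variants that are absent from WT-vs-mediator.
    def candidate_mutations(mutant_vs_mediator, wt_vs_mediator):
        # Each argument: {position: alternate_base} called against the mediator.
        return {
            pos: alt
            for pos, alt in mutant_vs_mediator.items()
            if wt_vs_mediator.get(pos) != alt
        }

    if __name__ == "__main__":
        mutant = {101: "A", 2045: "T", 9310: "G"}     # mutant vs. mediator (toy)
        wildtype = {101: "A", 9310: "G"}              # WT vs. mediator (toy)
        print(candidate_mutations(mutant, wildtype))  # -> {2045: 'T'}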

    The stepped wedge trial design: a systematic review

    BACKGROUND: Stepped wedge randomised trial designs involve sequential roll-out of an intervention to participants (individuals or clusters) over a number of time periods. By the end of the study, all participants will have received the intervention, although the order in which participants receive the intervention is determined at random. The design is particularly relevant where it is predicted that the intervention will do more good than harm (making a parallel design, in which certain participants do not receive the intervention, unethical) and/or where, for logistical, practical or financial reasons, it is impossible to deliver the intervention simultaneously to all participants. Stepped wedge designs offer a number of opportunities for data analysis, particularly for modelling the effect of time on the effectiveness of an intervention. This paper presents a review of 12 studies (or protocols) that use (or plan to use) a stepped wedge design. One aim of the review is to highlight the potential of the stepped wedge design, given its infrequent use to date. METHODS: Comprehensive literature review of studies or protocols using a stepped wedge design. Data were extracted from the studies in three categories for subsequent consideration: study information (epidemiology, intervention, number of participants), reasons for using a stepped wedge design, and methods of data analysis. RESULTS: The 12 studies included in this review describe evaluations of a wide range of interventions, across different diseases in different settings. However, the stepped wedge design appears to have found a niche for evaluating interventions in developing countries, specifically those concerned with HIV. There were few consistent motivations for employing a stepped wedge design or methods of data analysis across studies. The methodological descriptions of stepped wedge studies, including methods of randomisation, sample size calculations and methods of analysis, are not always complete. CONCLUSION: While the stepped wedge design offers a number of opportunities for use in future evaluations, a more consistent approach to reporting and data analysis is required.
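
    The design itself can be pictured as an allocation matrix (an illustrative Python sketch, not tied to any of the reviewed studies; the cluster count, step count and even spacing rule are assumptions): each row is a cluster, each column a time period, and clusters cross from control (0) to intervention (1) at randomly ordered steps until all are exposed.

    # Sketch: build a stepped wedge allocation schedule with a random cluster order.
    import random

    def stepped_wedge_schedule(n_clusters, n_steps, seed=0):
        rng = random.Random(seed)
        order = list(range(n_clusters))
        rng.shuffle(order)                            # random order of crossover
        schedule = {}
        for rank, cluster in enumerate(order):
            step = rank * n_steps // n_clusters + 1   # crossover period (1-based)
            schedule[cluster] = [0] * step + [1] * (n_steps + 1 - step)
        return [schedule[c] for c in range(n_clusters)]

    if __name__ == "__main__":
        for row in stepped_wedge_schedule(n_clusters=4, n_steps=4):
            print(row)   # e.g. [0, 1, 1, 1, 1] through [0, 0, 0, 0, 1]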