
398 research outputs found

Characteristics of 454 pyrosequencing data—enabling realistic simulation with flowsim

Author: A. Lanzen
A. Sharma
Altschul
Blattner
Engle
Gomez-Alvarez
Huse
I. Jonassen
K. Malde
Kuhl
Margulies
Quince
Quinlan
Richter
S. Balzer
Publication venue: Oxford University Press
Publication date: 01/01/2010

Motivation: The commercial launch of 454 pyrosequencing in 2005 was a milestone in genome sequencing in terms of performance and cost. Throughout the three available releases, average read lengths have increased to ∼500 base pairs and are thus approaching read lengths obtained from traditional Sanger sequencing. Study design of sequencing projects would benefit from being able to simulate experiments

NORA - Norwegian Open Research Archives
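The flowsim abstract concerns simulating 454 reads. The raw 454 output is a flowgram: nucleotides are flowed over the plate in a fixed cycle, and the signal in each flow is roughly proportional to the length of the homopolymer incorporated.

The Python sketch below is illustrative only and is not flowsim's code. It does three things:
- converts a sequence into an ideal flowgram using the TACG flow order;
- adds noise;
- calls bases back by rounding.

The Gaussian noise model and its parameters (sd_per_base, zero_sd) are hypothetical stand-ins. A realistic simulator would use empirically fitted signal distributions instead.

```python
import random

FLOW_ORDER = "TACG"  # nucleotide flow cycle of 454 instruments


def ideal_flowgram(seq: str, flow_order: str = FLOW_ORDER) -> list[int]:
    """Noise-free flowgram: homopolymer length incorporated in each flow."""
    if set(seq) - set(flow_order):
        raise ValueError("sequence contains bases outside the flow order")
    flows: list[int] = []
    i = 0
    while i < len(seq):
        for base in flow_order:
            run = 0
            while i < len(seq) and seq[i] == base:
                run += 1
                i += 1
            flows.append(run)
            if i == len(seq):  # stop after the flow that consumed the last base
                break
    return flows


def noisy_flowgram(flows, sd_per_base=0.1, zero_sd=0.05, rng=None):
    """Hypothetical Gaussian noise whose SD grows with homopolymer length.

    A stand-in for empirically fitted flow-value distributions.
    """
    rng = rng or random.Random()
    values = []
    for n in flows:
        sd = sd_per_base * n if n > 0 else zero_sd
        values.append(max(0.0, rng.gauss(n, sd)))  # signals cannot be negative
    return values


def call_bases(values, flow_order=FLOW_ORDER) -> str:
    """Naive base calling: round each flow value to the nearest homopolymer length."""
    return "".join(
        flow_order[j % len(flow_order)] * int(round(v)) for j, v in enumerate(values)
    )


if __name__ == "__main__":
    read = "TTAGGGCA"
    flows = ideal_flowgram(read)
    print(flows)  # [2, 1, 0, 3, 0, 0, 1, 0, 0, 1]
    noisy = noisy_flowgram(flows, rng=random.Random(1))
    print(call_bases(noisy))
```

In this sketch the noise term grows with homopolymer length. Rounding errors therefore concentrate in long runs and show up as over- or undercalled homopolymers, the error type most associated with 454 data.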

Finishing genomes with limited resources: lessons from an ensemble of microbial genomes

Author: Bishop-Lilly Kimberly A
Cook Christopher
DeSalle Robert
Di Bonaventura MariaPia
Ge Hong
Nagarajan Niranjan
Pop Mihai
Read Timothy D
Richards Allen
Publication venue: BioMed Central
Publication date: 01/01/2010

While new sequencing technologies have ushered in an era where microbial genomes can be easily sequenced, the goal of routinely producing high-quality draft and finished genomes in a cost-effective fashion has still remained elusive. Due to shorter read lengths and limitations in library construction protocols, shotgun sequencing and assembly based on these technologies often result in fragmented assemblies. Correspondingly, while draft assemblies can be obtained in days, finishing can take many months, and hence the time and effort can only be justified for high-priority genomes and in large sequencing centers. In this work, we revisit this issue in light of our own experience in producing finished and nearly finished genomes for a range of microbial species in a small-lab setting. These genomes were finished with surprisingly little investment of time, computational effort and lab work, suggesting that increased access to sequencing might also eventually lead to a greater proportion of finished genomes from small labs and genomics cores.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Digital Repository at the University of Maryland

ScholarBank@NUS

Assessment of Metagenomic Assembly Using Simulated Next Generation Sequencing Data

Author: AH Singh
Aino I. Järvelin
Alison S. Waller
B Ewing
CB Abulencia
D Chivian
D Wu
Daniel R. Mende
DC Richter
DR Zerbino
ED Harrington
ES Lander
EW Myers
F Meyer
FE Angly
GW Tyson
H García Martín
H-H Chou
J Goecks
J Goll
J Handelsman
J Muller
J Peterson
J Qin
J Raes
JC Venter
Jeroen Raes
John Parkinson
JR Miller
K Kurokawa
K Mavromatis
M Arumugam
M Pignatelli
M Pop
Manimozhiyan Arumugam
Michelle M. Chan
MP Cox
Peer Bork
PJ Turnbaugh
PJA Cock
R Li
R Schmieder
RA Edwards
RL Warren
S Aparicio
SG Tringe
Shinichi Sunagawa
SR Gill
T Schoenfeld
TA Gianoulis
TC Glenn
VM Markowitz
W Zhu
Publication venue: Public Library of Science
Publication date: 01/01/2012

Due to the complexity of the protocols and limited knowledge of the nature of microbial communities, simulating metagenomic sequences plays an important role in testing the performance of existing tools and data analysis methods with metagenomic data. We developed metagenomic read simulators with platform-specific (Sanger, pyrosequencing, Illumina) base-error models, and simulated metagenomes of differing community complexities. We first evaluated the effect of rigorous quality control on Illumina data. Although quality filtering removed a large proportion of the data, it greatly improved the accuracy and contig lengths of the resulting assemblies. We then compared the quality-trimmed Illumina assemblies to those from Sanger and pyrosequencing. For the simple community (10 genomes), all sequencing technologies assembled a similar amount and accurately represented the expected functional composition. For the more complex community (100 genomes), Illumina produced the best assemblies and most closely resembled the expected functional composition. For the most complex community (400 genomes), there was very little assembly of reads from any sequencing technology. However, due to their longer read length, the Sanger reads still represented the overall functional composition reasonably well. We further examined the effect of scaffolding contigs using paired-end Illumina reads. Scaffolding dramatically increased contig lengths for the simple community and yielded minor improvements for the more complex communities. Although the increase in contig length was accompanied by increased chimericity, it resulted in more complete genes and a better characterization of the functional repertoire. The metagenomic simulators developed for this research are freely available.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Copenhagen University Research Information System

MDC Repository

FigShare
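The metagenomic study describes read simulators with platform-specific base-error models, followed by quality filtering. The Python sketch below shows that pipeline in miniature, under stated assumptions:
- reads are drawn from each genome in proportion to a chosen abundance;
- each base is substituted with a position-dependent probability;
- that probability is converted to a Phred score (Q = -10 log10 p);
- reads are trimmed at the first base below a quality cutoff.

The linear error ramp, its endpoints and the Q20 cutoff are hypothetical choices for illustration. They are not the error models or filtering settings used in the study. The model also covers substitutions only, with no insertions or deletions.

```python
import math
import random


def substitution_prob(pos: int, read_len: int, start: float = 0.001, end: float = 0.02) -> float:
    """Hypothetical error profile: substitution probability rises linearly along the read."""
    return start + (end - start) * pos / max(read_len - 1, 1)


def simulate_reads(genomes, abundances, n_reads, read_len, rng):
    """Draw reads from genomes in proportion to abundance, adding substitutions.

    Every genome must be at least read_len bases long.
    Returns (source_genome, sequence, phred_qualities) tuples.
    """
    names = list(genomes)
    weights = [abundances[name] for name in names]
    reads = []
    for _ in range(n_reads):
        name = rng.choices(names, weights=weights)[0]
        genome = genomes[name]
        start = rng.randrange(len(genome) - read_len + 1)
        bases, quals = [], []
        for pos, base in enumerate(genome[start:start + read_len]):
            p = substitution_prob(pos, read_len)
            if rng.random() < p:
                base = rng.choice([b for b in "ACGT" if b != base])
            bases.append(base)
            quals.append(round(-10 * math.log10(p)))  # Phred: Q = -10 log10(p)
        reads.append((name, "".join(bases), quals))
    return reads


def quality_trim(seq, quals, min_q=20):
    """Keep the prefix of the read that precedes the first base below min_q."""
    for pos, q in enumerate(quals):
        if q < min_q:
            return seq[:pos]
    return seq


if __name__ == "__main__":
    rng = random.Random(7)
    # Two hypothetical random genomes standing in for community members.
    community = {
        "genome_a": "".join(rng.choice("ACGT") for _ in range(5000)),
        "genome_b": "".join(rng.choice("ACGT") for _ in range(5000)),
    }
    reads = simulate_reads(community, {"genome_a": 0.8, "genome_b": 0.2}, 1000, 100, rng)
    trimmed = [quality_trim(seq, quals) for _, seq, quals in reads]
    print(sum(map(len, trimmed)) / len(trimmed))
```

Here quality depends only on position, so every read is trimmed at the same point. Real platform models vary error rates per read and per sequence context.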

A sweetpotato gene index established by de novo assembly of pyrosequencing and Sanger sequences and mining for gene-based microsatellite markers

Author: A Conesa
A Kriegner
A Papanicolaou
B Chevreux
C Soderlund
Carlos Rivera
Cynthia Quispe
DA Shagin
Diogenes Cerna
DP Zhang
DR Bentley
G Aparicio
Genoveva Rossel
J Hu
J Low
J Quackenbush
Jack Hou
Jaime A Pacheco
JC Cervantes-Flores
JC Vera
Ji Young Kim
JJ Doyle
JR Miller
Julio Solis
JW Low
KL Childs
LC Da Maia
Luis Rojas
Luz R Tincopa
M Ghislain
MI Buteler
NR Thomson
O Harismendy
Omar Palomino
PC Bundock
Reinhard Simon
RL Jarret
Rocio Alagon
Roland Schafleitner
Ronald F Robles
WB Barbazuk
YT Tseng
YY Zhu
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Sweetpotato (Ipomoea batatas (L.) Lam.), a hexaploid outcrossing crop, is an important staple and food security crop in developing countries in Africa and Asia. The availability of genomic resources for sweetpotato is in striking contrast to its importance for human nutrition. Previously existing sequence data were restricted to around 22,000 expressed sequence tag (EST) sequences and ~1,500 GenBank sequences. We have used 454 pyrosequencing to augment the available gene sequence information to enhance functional genomics and marker design for this plant species. Results: Two quarter-plate 454 pyrosequencing runs on two normalized cDNA collections from stems and leaves of the drought-stressed sweetpotato clone Tanzania yielded 524,209 reads. These were assembled together with 22,094 publicly available expressed sequence tags into 31,685 sets of overlapping DNA segments and 34,733 unassembled sequences. Blastx comparisons with the UniRef100 database allowed annotation of 23,957 contigs and 15,342 singletons, resulting in 24,657 putatively unique genes. A further 27,119 sequences had no match to protein sequences in the UniRef100 database. On the basis of this gene index, we identified 1,661 gene-based microsatellite sequences, of which 223 were selected for testing and 195 were successfully amplified in a test panel of 6 hexaploid (I. batatas) and 2 diploid (I. trifida) accessions. Conclusions: The sweetpotato gene index is a useful source of functionally annotated sweetpotato gene sequences and contains three times more gene sequence information for sweetpotato than previous EST assemblies. A searchable version of the gene index, including a blastn function, is available at http://www.cipotato.org/sweetpotato_gene_index.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central
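The sweetpotato gene index was mined for gene-based microsatellites (simple sequence repeats, SSRs). One common approach is to scan each sequence for tandem repeats of a short motif. The Python sketch below does that with regular expressions.

The motif lengths (2 to 6 bp) and the minimum repeat counts in MIN_REPEATS are hypothetical thresholds, not the criteria used in the study. Only perfect, uninterrupted repeats are detected.

```python
import re

# Hypothetical thresholds: minimum number of tandem copies per motif length (bp).
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """True if motif is not itself a repeat of a shorter unit (e.g. 'ATAT', 'AA').

    A string occurs inside itself-doubled only at offsets 0 and len(motif)
    exactly when it has no shorter period.
    """
    return (motif + motif).find(motif, 1) == len(motif)


def find_ssrs(seq: str):
    """Return sorted (start, end, motif, copies) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. (ATAT)n, already reported as (AT)n
                hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // k))
    return sorted(hits)


if __name__ == "__main__":
    contig = "GG" + "AT" * 6 + "CC"  # hypothetical contig with one dinucleotide SSR
    print(find_ssrs(contig))  # [(2, 14, 'AT', 6)]
```

Filtering non-primitive motifs keeps an (AT)n repeat from also being reported as (ATAT)n or (ATATAT)n.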

Next generation sequencing analysis reveals a relationship between rDNA unit diversity and locus number in Nicotiana diploids

Author: Fulnecek J
Grandbastien M-A
Kovarik A
Leitch A
Macas J
Matyasek R
Nichols R
Renny-Byfield S
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

© 2012 Matyášek et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Queen Mary Research Online

Directed sequencing and annotation of three Dicentrarchus labrax L. chromosomes by applying Sanger- and pyrosequencing technologies on pooled DNA of comparatively mapped BAC clones

Author: Beck Alfred
Kodira Chinnappa
Kuhl Heiner
Reinhardt Richard
Timmermann Bernd
Tine Mbaye
Publication venue: Elsevier Inc.
Publication date: 01/09/2011

Dicentrarchus labrax is one of the major marine aquaculture species in the European Union. In this study, we developed a directed-sequencing strategy to sequence three sea bass chromosomes and compared the results with other teleosts. Three BAC DNA pools were created from sea bass BAC clones that mapped to stickleback chromosomes/groups V, XVII and XXI. The pools were sequenced to 17–39x coverage by pyrosequencing. Data assembly was supported by Sanger reads and mate-pair data and resulted in superscaffolds of 13.2 Mb, 17.5 Mb and 13.7 Mb, respectively. The annotated superscaffolds contain 1,477 genes. We analyzed changes in exon, intron and intergenic sequence size between teleost species and deduced a simple model for the evolution of genome composition in the teleost lineage. Combining second-generation sequencing technologies, Sanger sequencing and genome-partitioning strategies allows “high-quality draft assemblies” of chromosome-sized superscaffolds, which are crucial for the prediction and annotation of complete genes.

Elsevier - Publisher Connector

MPG.PuRe
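The sea bass BAC pools were sequenced to 17–39x coverage. Mean depth is usually computed with the Lander-Waterman relation C = NL/G, where N reads of mean length L cover a target of length G. Under uniform random read placement, the same model predicts that a fraction of about e^(-C) of bases stays uncovered.

The Python sketch below applies those formulas. The 10 Mb pool size and 400 bp read length in the example are hypothetical values, not figures from the study.

```python
import math


def coverage(n_reads: int, mean_read_len: float, target_len: float) -> float:
    """Mean depth C = N * L / G (Lander-Waterman)."""
    return n_reads * mean_read_len / target_len


def reads_needed(target_cov: float, mean_read_len: float, target_len: float) -> int:
    """Smallest number of reads whose combined length reaches target_cov * G."""
    return math.ceil(target_cov * target_len / mean_read_len)


def expected_fraction_covered(cov: float) -> float:
    """Under uniform random read placement, a base is missed with probability e^-C."""
    return 1.0 - math.exp(-cov)


if __name__ == "__main__":
    pool_len = 10_000_000  # hypothetical 10 Mb BAC pool
    read_len = 400         # hypothetical mean read length (bp)
    print(reads_needed(17, read_len, pool_len))  # 425000
    print(reads_needed(39, read_len, pool_len))  # 975000
    print(expected_fraction_covered(17))
```

Real read placement is not uniform, so these numbers are idealized planning estimates rather than predictions of assembly completeness.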

Transcriptome analysis reveals the time of the fourth round of genome duplication in common carp (Cyprinus carpio)

Author: Li Jiong-Tang
Sun Xiao-Wen
Wang Jin-Tu
Zhang Xiao-Feng
Publication venue: BioMed Central
Publication date: 01/01/2012

Background: Common carp (Cyprinus carpio) is thought to have undergone one extra round of genome duplication compared to zebrafish. Transcriptome analysis has been used to study the existence and timing of genome duplication in species whose genome sequences are incomplete. Large-scale transcriptome data for the common carp genome should help reveal the timing of the additional duplication event. Results: We sequenced the transcriptome of common carp using 454 pyrosequencing. After assembling the 454 contigs together with the published common carp sequences, we obtained 49,669 contigs and identified genes using homology searches and an ab initio method. We identified 4,651 orthologous pairs between common carp and zebrafish and found 129,984 paralogous pairs within common carp. An estimate of the synonymous substitution rate in the orthologous pairs indicated that common carp and zebrafish diverged 120 million years ago (MYA). We identified one round of genome duplication in common carp and estimated that it occurred 5.6 to 11.3 MYA. In zebrafish, no genome duplication event after speciation was observed, suggesting that, compared to zebrafish, common carp underwent an additional genome duplication event. We annotated the common carp contigs with Gene Ontology terms and KEGG pathways. Compared with zebrafish gene annotations, a set of biological processes and pathways was enriched in common carp. Conclusions: The assembled contigs helped us estimate the time of the fourth round of genome duplication in common carp. The resource built as part of this study will help advance future functional genomics and genome annotation studies.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central
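The carp study dates events from synonymous substitutions (Ks) between gene pairs. The standard conversion is T = Ks / (2r), where r is the synonymous substitution rate per site per year. The factor 2 accounts for both lineages accumulating changes since their common ancestor. The same formula applied to paralog pairs dates a duplication.

The Python sketch below implements that conversion. Two choices in it are hypothetical and for illustration only:
- summarizing many pairs by their median Ks;
- the rate used in the example (1.5e-9).

The study's calibration rate is not given in the abstract.

```python
from statistics import median


def divergence_time_mya(ks: float, rate: float) -> float:
    """Time since two sequences split, in millions of years: T = Ks / (2 * rate).

    ks: synonymous substitutions per synonymous site.
    rate: synonymous substitutions per site per year (must be positive).
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return ks / (2.0 * rate) / 1e6


def date_from_pairs(ks_values, rate: float) -> float:
    """Hypothetical summary: convert the median Ks of many gene pairs to a time."""
    return divergence_time_mya(median(ks_values), rate)


if __name__ == "__main__":
    # Hypothetical Ks values and rate, not data from the study.
    ortholog_ks = [0.25, 0.30, 0.35]
    print(date_from_pairs(ortholog_ks, 1.5e-9))
```

Published analyses often fit peaks in the Ks distribution rather than taking a single median. The median here only keeps the example short.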

A vertebrate case study of the quality of assemblies derived from next-generation sequences

Author: Chen Lei
Dooling David J
Haub Kevin V
Hillier LaDeana W
Locke Devin P
Mardis Elaine R
Martin John C
Miller Jason R
Minx Patrick
Mitreva Makedonka
Thane Nay
Warren Wesley C
Weinstock George M
Wilson Richard K
Ye Liang
Publication venue: BioMed Central
Publication date: 01/01/2011

The unparalleled efficiency of next-generation sequencing (NGS) has prompted widespread adoption, but significant problems remain in the use of NGS data for whole genome assembly. We explore the advantages and disadvantages of chicken genome assemblies generated using a variety of sequencing and assembly methodologies. NGS assemblies are equivalent in some ways to a Sanger-based assembly yet deficient in others. Nonetheless, these assemblies are sufficient for the identification of the majority of genes and can reveal novel sequences when compared to existing assembly references

Springer - Publisher Connector

PubMed Central

Digital Commons@Becker

UGD Academic Repository