Search CORE

171 research outputs found

QSRA – a quality-value guided de novo short read assembler

Author: D Hernandez
Douglas W Bryant
DR Zerbino
J Butler
J Dohm
J Kent
MJ Chaisson
NG de Bruijn
R Cronn
R Warren
Todd C Mockler
W Jeck
Weng-Keen Wong
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: New rapid high-throughput sequencing technologies have sparked the creation of a new class of assembler. Since all high-throughput sequencing platforms introduce errors into their output, short-read assemblers must be designed to account for these errors while still using all available data. Results: We have designed and implemented an assembler, the Quality-value guided Short Read Assembler (QSRA), which takes advantage of quality-value scores as an additional means of dealing with error. Compared to previously published algorithms, our assembler shows significant improvements in both speed and output quality. Conclusion: QSRA generally produced the highest genomic coverage while being faster than VCAKE. QSRA is highly competitive in its longest contig and N50/N80 contig lengths, producing results of similar quality to those of EDENA and VELVET. QSRA brings the goal of de novo assembly of complex genomes a step closer, improving on the original VCAKE algorithm not only by drastically reducing runtimes but also by increasing the algorithm's viability through additional error-handling capabilities.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central
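The N50 and N80 figures quoted for QSRA are standard contig-length summaries. As a minimal sketch (a generic illustration, not code taken from QSRA), Nx can be computed by sorting contigs longest first and returning the length at which the running total first reaches x% of all assembled bases. The function name `nx` and the contig lengths below are hypothetical.

```python
def nx(contig_lengths, percent):
    """Return the Nx contig length (e.g. percent=50 gives N50).

    Contigs are sorted longest first; Nx is the length of the contig at which
    the running total first reaches `percent`% of all assembled bases.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Integer comparison avoids floating-point rounding at exact boundaries.
        if running * 100 >= percent * total:
            return length
    raise ValueError("no contigs supplied")


# Hypothetical contig lengths (bp), for illustration only; not QSRA output.
contigs = [100, 80, 60, 40, 20]
print("N50:", nx(contigs, 50))  # N50: 80
print("N80:", nx(contigs, 80))  # N80: 60
```

Integer arithmetic is used for the threshold test so that a case landing exactly on the boundary (here 240 of 300 bases for N80) is not affected by floating-point rounding.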


A SNP resource for Douglas-fir: de novo transcriptome assembly and SNP detection and validation

Author: Cronn Richard
Dean Jeffrey F. D.
Dolan Peter
Howe Glenn T.
Knaus Brian
Kolpak Scott
Lorenz W. Walter
Yu Jianbin
Publication venue: Springer Science and Business Media LLC
Publication date: 28/02/2013

Source: BMC Genomics, Volume 14, Article 137. DOI: 10.1186/1471-2164-14-137. Published: 28 February 2013. Authors, in publication order: Howe, Glenn T.; Yu, Jianbin; Knaus, Brian; Cronn, Richard; Kolpak, Scott; Dolan, Peter; Lorenz, W. Walter; Dean, Jeffrey F. D.
Background: Douglas-fir (Pseudotsuga menziesii), one of the most economically and ecologically important tree species in the world, also has one of the largest tree breeding programs. Although the coastal and interior varieties of Douglas-fir (vars. menziesii and glauca) are native to North America, the coastal variety is also widely planted for timber production in Europe, New Zealand, Australia, and Chile. Our main goal was to develop a SNP resource large enough to facilitate genomic selection in Douglas-fir breeding programs. To accomplish this, we developed a 454-based reference transcriptome for coastal Douglas-fir, annotated the reference and evaluated its quality, identified putative SNPs, and then validated a sample of those SNPs using the Illumina Infinium genotyping platform. Results: We assembled a reference transcriptome consisting of 25,002 isogroups (unique gene models) and 102,623 singletons from 2.76 million 454 and Sanger cDNA sequences from coastal Douglas-fir. We identified 278,979 unique SNPs by mapping the 454 and Sanger sequences to the reference, and by mapping four datasets of Illumina cDNA sequences from multiple seed sources, genotypes, and tissues. The Illumina datasets represented coastal Douglas-fir (64.00 and 13.41 million reads), interior Douglas-fir (80.45 million reads), and a Yakima population similar to interior Douglas-fir (8.99 million reads). We assayed 8067 SNPs on 260 trees using an Illumina Infinium SNP genotyping array. Of these SNPs, 5847 (72.5%) were called successfully and were polymorphic. Conclusions: Based on our validation efficiency, our SNP database may contain as many as ~200,000 true SNPs, and as many as ~69,000 SNPs that could be genotyped at ~20,000 gene loci using an Infinium II array, more SNPs than are needed to use genomic selection in tree breeding programs. Ultimately, these genomic resources will enhance Douglas-fir breeding and allow us to better understand landscape-scale patterns of genetic variation and potential responses to climate change.
Keywords: Genome wide association, Pseudotsuga menziesii, Seed orchard, Complex traits, Pinus taeda L., Population, Generation, Database, Selection, White spruce

ScholarsArchive@OSU
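The ~200,000 figure in the Douglas-fir conclusions is consistent with a simple proportional extrapolation: apply the array validation rate (5847 of 8067 assayed SNPs) to all 278,979 putative SNPs. The sketch below assumes that uniform extrapolation; the abstract does not spell out its exact method, and the ~69,000 genotypable-SNP estimate depends on Infinium II design criteria that are not reproduced here.

```python
# Counts quoted in the Douglas-fir abstract.
snps_discovered = 278_979  # unique putative SNPs in the database
snps_assayed = 8_067       # SNPs placed on the Infinium array
snps_validated = 5_847     # called successfully and polymorphic

validation_rate = snps_validated / snps_assayed

# Assumption: the validated fraction of the assayed sample applies
# uniformly to every putative SNP in the database.
estimated_true_snps = round(snps_discovered * validation_rate)

print(f"Validation rate: {validation_rate:.1%}")         # Validation rate: 72.5%
print(f"Estimated true SNPs: ~{estimated_true_snps:,}")  # Estimated true SNPs: ~202,205
```

The result rounds to the ~200,000 quoted above. A uniform rate is the simplest possible assumption; if the assayed SNPs were pre-filtered for design quality, the true database-wide rate could be lower.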

Evaluation of Methods for De Novo Genome Assembly from High-Throughput Sequencing Reads Reveals Dependencies That Affect the Quality of the Results

Author: Andrey Rzhetsky
CS Keith
D Hernandez
David N. Kuhn
DM Church
DR Zerbino
F Sanger
G Narzisi
Isidore Rigoutsos
J Shendure
JA Reinhardt
JC Dohm
JR Miller
JT Simpson
K Mavromatis
Laxmi Parida
MJ Chaisson
ML Metzker
Niina Haiminen
R Blakesley
R Cronn
R Li
S Altschul
S DiGuistini
S Gnerre
S Ossowski
S Rounsley
SL Salzberg
W Zhang
WR Jeck
Publication venue: Public Library of Science
Publication date: 01/01/2011

Recent developments in high-throughput sequencing technology have made low-cost sequencing an attractive approach for many genome analysis tasks. Increasing read lengths, improving quality, and ever larger numbers of usable sequences per instrument run continue to make whole-genome assembly an appealing target application. In this paper we evaluate the feasibility of de novo genome assembly from short reads (≤100 nucleotides) through a detailed study involving genomic sequences of various lengths and origins, in conjunction with several currently popular assembly programs. Our extensive analysis demonstrates that, in addition to sequencing coverage, attributes such as the architecture of the target genome, the choice of assembly program, the average read length, and the observed sequencing error rate are powerful variables that affect the best achievable assembly of the target sequence in terms of size and correctness.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central
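Sequencing coverage, the first variable named in the abstract above, is usually summarized as mean depth C = N·L/G for N reads of length L on a genome of size G. The sketch below, using a hypothetical read count and genome size, also evaluates the classic Lander-Waterman (Poisson) expectation 1 − e^(−C) for the fraction of bases hit by at least one read. That model assumes uniformly random, error-free reads and ignores repeats, which is one way to see why coverage alone does not determine the best achievable assembly.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average sequencing depth C = N * L / G."""
    return num_reads * read_length / genome_size


def expected_fraction_covered(c: float) -> float:
    """Lander-Waterman (Poisson) expectation: fraction of bases covered by at
    least one read, assuming read start positions are uniformly random."""
    return 1.0 - math.exp(-c)


# Hypothetical run: 2 million 100-nt reads on a 5 Mb genome.
c = mean_coverage(2_000_000, 100, 5_000_000)
print(f"Mean coverage: {c:.0f}x")  # Mean coverage: 40x

for depth in (1, 3, 10):
    frac = expected_fraction_covered(depth)
    print(f"{depth:>2}x -> {frac:.4f} of the genome covered")
```

Even at 10x the Poisson model predicts near-complete base coverage, yet real short-read assemblies at such depths can remain fragmented because repeats longer than the reads and sequencing errors break contigs.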

Transcriptome characterization and polymorphism detection between subspecies of big sagebrush (Artemisia tridentata)

Author: B Fritig
B Langmead
BC Husband
Bryce A Richardson
C Iseli
CY Wan
D Tian
DA Pyke
DC Freeman
DJ Weber
E Meyer
E Novaes
ED McArthur
F Bourgaud
F McCarthy
G Pareto
H Wang
Jared C Price
JC Vera
JH Graham
Joshua A Udall
K Kai
KJ Miglia
LC da Maia
LH Rieseberg
M Bamshad
MF Mahalovich
MJ Ford
ML Arnold
ML Shumar
MM Rowland
NE West
P Maughan
PJ Maughan
Prabin Bajgain
R Cronn
R Durrett
RG Kelsey
Richard C Cronn
RO Bray
S Götz
S Rozen
S Zeng
SP Otto
SR Eddy
T Parchman
T Thiel
TL Personius
U Arunyawat
W Wang
Y Yu
Y Zhang
Z Han
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Big sagebrush (Artemisia tridentata) is one of the most widely distributed and ecologically important shrub species in western North America. This species serves as a critical habitat and food resource for many animals and invertebrates. Habitat loss due to a combination of disturbances followed by the establishment of invasive plant species is a serious threat to the sustainability of big sagebrush ecosystems. Lack of genomic data has limited our understanding of the evolutionary history and ecological adaptation of this species. Here, we report on the sequencing of expressed sequence tags (ESTs) and the detection of single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) markers in subspecies of big sagebrush. Results: cDNA of A. tridentata sspp. tridentata and vaseyana was normalized and sequenced using 454 GS FLX Titanium pyrosequencing technology. Assembly of the reads resulted in 20,357 contig consensus sequences in ssp. tridentata and 20,250 contigs in ssp. vaseyana. A BLASTx search against the non-redundant (NR) protein database using 29,541 consensus sequences obtained from a combined assembly resulted in 21,436 sequences with significant BLAST alignments (E-value ≤ 1e-15). A total of 20,952 SNPs and 119 polymorphic SSRs were detected between the two subspecies. SNPs were validated through various methods, including sequence capture. Validation of SNPs in different individuals uncovered a high level of nucleotide variation in EST sequences. EST sequences of a third, tetraploid subspecies (ssp. wyomingensis) obtained by Illumina sequencing were mapped to the consensus sequences of the combined 454 EST assembly. Approximately one-third of the SNPs between sspp. tridentata and vaseyana identified in the combined assembly were also polymorphic within the two geographically distant ssp. wyomingensis samples.
Conclusion: We have produced a large EST dataset for Artemisia tridentata that contains a large sample of the big sagebrush leaf transcriptome. SNP mapping among the three subspecies suggests that ssp. wyomingensis originated via mixed ancestry. The large number of SNP and SSR markers provides a foundation for future research addressing questions in big sagebrush evolution, ecological genetics, and conservation using genomic approaches.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Why barcode? High-throughput multiplex sequencing of mitochondrial genomes for molecular systematics

Author: A. P. Vogler
Agnarsson
Amoldi
Andersson
Bae
Baker
Bininda-Emonds
Binladen
Bocakova
C. L. Culverwell
Cameron
Chevreux
Cronn
Crowson
D. Ahrens
D. T. J. Littlewood
Drummond
Erlich
Friedrich
Goloboff
Gray
Hassanin
Hebert
Hejnol
Hong
Huang
Huelsenbeck
Hughes
Hunt
J. Pons
Jex
Katoh
Kim
L. Bocak
Lartillot
Lawrence
Li
Longhorn
M. J. T. N. Timmermans
Mardis
Margulies
McComish
Milne
Nardi
Nylander
Parameswaran
Parks
Patterson
Phillips
Pons
Posada
Richter
Roehrdanz
S. Dodsworth
Sheffield
Sogin
Song
Stamatakis
Stewart
Wyman
Publication venue: Oxford University Press
Publication date: 01/01/2010

Mitochondrial genome sequences are important markers for phylogenetics, but taxon sampling remains sporadic because of the great effort and cost required to acquire full-length sequences. Here, we demonstrate a simple, cost-effective way to sequence the full complement of protein-coding mitochondrial genes from pooled samples using the 454/Roche platform. Multiplexing was achieved without the need for expensive indexing tags ('barcodes'). The method was trialled with a set of long-range polymerase chain reaction (PCR) fragments from 30 species of Coleoptera (beetles) sequenced in a 1/16th sector of a sequencing plate. Long contigs were produced from the pooled sequences, with sequencing depths ranging from ∼10 to 100× per contig. The species identity of individual contigs was established via three 'bait' sequences matching disparate parts of the mitochondrial genome, obtained by conventional PCR and Sanger sequencing. This confirmed that the contigs had been assembled correctly from the sequencing pool. Our study produced sequences for 21 nearly complete and seven partial sets of protein-coding mitochondrial genes. Combined with existing sequences for 25 taxa, these yielded an improved estimate of basal relationships in Coleoptera. The procedure could be employed routinely for mitochondrial genome sequencing at the species level, to provide improved species 'barcodes', which currently use only the cox1 gene.

Natural History Museum Repository

CiteSeerX

Crossref

PubMed Central

Birkbeck Institutional Research Online

Digital.CSIC

University of Bedfordshire Repository

Adventures in the Enormous: A 1.8 Million Clone BAC Library for the 21.7 Gb Genome of Loblolly Pine

Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies of flowering plants, due in part to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually archived clones, making it the largest single BAC library constructed to date; it has a mean insert size of 96 kb and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library for gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is present in roughly 210,557 copies and constitutes about 5.8%, or 1.26 Gb, of LP nuclear DNA; this quantity of DNA is eight times the size of the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated here, the BAC library should hasten whole-genome sequencing of LP via next-generation sequencing strategies and technologies, and facilitate the improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Clemson University: TigerPrints

Scholars Junction - Mississippi State University Institutional Repository
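The IFG-7 figures in the loblolly pine abstract can be cross-checked with simple arithmetic: 1.26 Gb out of a 21.7 Gb genome should reproduce the ~5.8% share, and dividing 1.26 Gb by the ~210,557 estimated copies gives the average amount of DNA per copy. That per-copy value is derived here for illustration only; the abstract does not state it.

```python
# Values quoted in the loblolly pine abstract.
genome_bp = 21.7e9       # LP nuclear genome size
ifg7_bp = 1.26e9         # DNA attributed to the IFG-7 retrotransposon
ifg7_copies = 210_557    # estimated IFG-7 copy number

share = ifg7_bp / genome_bp
# Derived (not stated in the abstract): average DNA per estimated copy.
mean_copy_bp = ifg7_bp / ifg7_copies

print(f"IFG-7 share of genome: {share:.1%}")              # IFG-7 share of genome: 5.8%
print(f"Implied mean DNA per copy: {mean_copy_bp:,.0f} bp")  # Implied mean DNA per copy: 5,984 bp
```

The share matches the stated 5.8%, and the roughly 6 kb per copy is the average implied by the two stated totals, which may fold in truncated and full-length copies alike.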

Variability among inbred lines and RFLP mapping of sunflower isozymes

Author: A.J. León
Alicia D. Carrera
Allen RD
Berry ST
Carrera A
Cronn R
Dry PJ
Felsenstein J
G. Pizarro
Gentzbittel L
Hongtrakul V
Jackson RC
Jan CC
Kahler A
Kinman ML
Lander ES
Lay C
León AJ
Lu YH
Lynch M
M. Poverene
Mestries E
Miller JF
Nei M
Quillet MC
Rieseberg L
Rieseberg LH
S. Feingold
S.T. Berry
Soltis D
Swofford DL
Tersac M
Torres A
Torres AM
Vear F
Zhang YX
Publication venue: FapUNIFESP (SciELO)
Publication date: 01/01/2002

Crossref

Progenitor-Derivative Relationships of Hordeum Polyploids (Poaceae, Triticeae) Inferred from Sequences of TOPO6, a Nuclear Low-Copy Gene Region

Author: A Bergerat
A Rambaut
AJ Drummond
BR Baum
D Lahr
D Posada
D Roelofs
DE Soltis
DJ Rodriguez
DL Swofford
DR Ayres
Elvira Hörandl
F Hartung
FR Blattner
Frank R. Blattner
G Petersen
GL Stebbins
H Akaike
H Tang
H Wang
I Linde-Laursen
I Álvarez
J Escobar
J Lihovà
JAA Nylander
Jonathan Brassac
JP Huelsenbeck
K Kakeda
K Tanno
L Heath
M Hasegawa
O Jaillon
R Cronn
R Velasco
R von Bothmer
RL Small
S Taketa
Sabine S. Jakob
SB Hoot
SP Otto
SS Jakob
T Komatsuda
T Marcussen
T Pleines
T Sang
V Kotseruba
V Mahelka
Y Kraytsberg
Z Yang
Publication venue: Public Library of Science
Publication date: 30/03/2012

Polyploidization is a major mechanism of speciation in plants. Within the barley genus Hordeum, approximately half of the taxa are polyploids. While a good hypothesis of phylogenetic relationships exists for the diploid species, little information is available for the polyploids (4×, 6×) of Hordeum. Relationships among all 33 diploid and polyploid Hordeum species were analyzed with the low-copy nuclear marker region TOPO6 for 341 Hordeum individuals and eight outgroup species. PCR products were either sequenced directly or cloned, and on average 12 clones per individual were included in the phylogenetic analyses. In most diploid Hordeum species, TOPO6 is probably a single-copy locus. Most sequences found in polyploid individuals cluster phylogenetically with sequences derived from diploid species and thus allow the identification of the parental taxa of polyploids. Four groups of sequences occurring only in polyploid taxa are interpreted as footprints of extinct diploid taxa that contributed to allopolyploid evolution. Our analysis identifies three key species involved in the evolution of the American polyploids of the genus. (i) All but one of the American tetraploids have a TOPO6 copy originating from the Central Asian diploid H. roshevitzii, with the second copy clustering with different American diploid species. (ii) All hexaploid species from the New World have a copy from an extinct close relative of H. californicum and (iii) possess the TOPO6 sequence pattern of tetraploid H. jubatum, each with an additional copy derived from a different American diploid. Tetraploid H. bulbosum is an autopolyploid, while the assumed autopolyploid H. brevisubulatum (4×, 6×) was identified as an allopolyploid throughout most of its distribution area. Using a proofreading DNA polymerase in PCR reduced the proportion of chimeric sequences in polyploids compared with Taq polymerase.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

The Francis Crick Institute

Geogenic and atmospheric sources for volatile organic compounds in fumarolic emissions from Mt. Etna and Vulcano Island (Sicily, Italy)

Author: Abrajano
Allard
Anderson
Andreae
Astaf'ev
Barberi
Basiuk
Beccaluva
Berndt
Bolognesi
Bonaccorso
Brasseur
Brune
Burnett
Butler
Cadle
Campaigne
Capaccioni
Capasso
Carapezza
Chester
Chiodini
Chung
Cicerone
Cioni
Claridge
Cronn
D. Rouwet
Darling
Derwent
Des Marais
Etiope
F. Capecchiacci
F. Tassi
Farman
Fiebig
Fink
Fischer
Foustoukos
Frische
Frérot
Fu
G. Chiodini
G. Pecoraino
Gaffney
Galimov
Galuszka
Gamlen
Gerlach
Giggenbach
Gize
Granieri
Hall
Harnisch
Huber
Huizinga
Inn
Intergovernmental Panel on Climate Change
International Atomic Energy Agency
Isidorov
J. Cabassi
Jenden
Jordan
Katritzky
Keene
Keller
Kelley
Kenney
Keppler
Khalil
Kissin
Kiyosu
Laturnus
Leifer
Leythaeuser
Li
Liotta
Lobert
Lovelock
Mangani
Mango
Martini
McCollom
McCulloch
Mercalli
Molina
Montegrossi
Muenow
Mériaudeau
Needs
Neri
O. Vaselli
Ogniben
Oremland
Panichi
Paonita
Pereira
Petherbridge
Porshnev
Potter
Proskurowski
Putschew
Rasmussen
Rice
Rowland
Rucker
Rudolph
S. Calabrese
Salvi
Satterfield
Savage
Schiano
Schulz
Schwandner
Seewald
Seward
Sherwood Lollar
Shock
Sicardi
Simoneit
Smith
Southward
Stoiber
Sturrock
Sugisaki
Symonds
Szatmari
Tamers
Tanguy
Taran
Tassi
Tedesco
Thompson
Tomov
Tonarini
Vaselli
Wahrenberger
Welhan
Whiticar
Yu
Zolotov
Publication venue: American Geophysical Union (AGU)
Publication date: 01/01/2012

In this paper, the fluid sources and processes controlling the chemical composition of volatile organic compounds (VOCs) in gas discharges from Mt. Etna and Vulcano Island (Sicily, Italy) were investigated. The bulk composition of the Etnean and Vulcano gas emissions is produced by mixing, to various degrees, of magmatic and hydrothermal components. VOCs are dominated by alkanes, alkenes and aromatics, with minor, though significant, concentrations of O-, S- and Cl(F)-substituted compounds. The main mechanism for the production of alkanes is likely the pyrolysis of organic-matter-bearing sediments that interact with the ascending magmatic fluids. Alkanes are then converted to alkenes and aromatic compounds via catalytic reactions (dehydrogenation and dehydroaromatization, respectively). Nevertheless, an abiogenic origin for the light hydrocarbons cannot be ruled out. Oxidation of hydrocarbons under the relatively high temperatures and oxidizing conditions typical of these volcanic-hydrothermal fluids may explain the production of alcohols, esters and aldehydes, as well as O- and S-bearing heterocycles. Comparing the concentrations of hydrochlorofluorocarbons (HCFCs) in the fumarolic discharges with those of background air indicates that the HCFCs have a geogenic origin, likely due to halogenation of both methane and alkenes. Finally, chlorofluorocarbon (CFC) abundances appear to be consistent with background air, although the strong air contamination that affects the Mt. Etna fumaroles may mask a possible geogenic contribution for these compounds. On the other hand, no CFCs were detected in the Vulcano gases, which have a low air contribution. Nevertheless, a geogenic source for these compounds cannot be excluded on the basis of the present data.

Crossref

Archivio istituzionale della ricerca - Università di Palermo

Deep Sequencing of the Nicastrin Gene in Pooled DNA, the Identification of Genetic Variants That Affect Risk of Alzheimer's Disease

Author: A Confaloni
A Orlacchio
AA Out
AW Butler
B Dermaut
B Wang
Belinda M. Martin
Bruno Vellas
D Harold
DC Koboldt
Denise Harold
DR Dries
DW Craig
E Cousin
E Levy-Lahad
E Sidransky
EI Rogaev
G McKhann
G Yu
Gillian Hamilton
H Li
Hilkka Soininen
IJ Deary
Iwona Kloszewska
J Mitsui
JC Lambert
John F. Powell
Kathryn Lord
L Zhong
Magda Tsolaki
Makrina Danillidou
MD Abramoff
Megan Pritchard
MF Folstein
Michelle K. Lupton
P Proitsi
Patrizia Mecocci
Paul Hollingworth
Petroula Proitsi
R Cronn
Richard Wroe
Roland Roberts
S Helisalmi
S Lovestone
S Nejentsev
S Prabhu
S Shah
S Sunyaev
Simon Lovestone
T Wang
TE Druley
V Bansal
Y Erlich
Z Ma
Publication venue: Public Library of Science
Publication date: 01/01/2011

Nicastrin is an obligatory component of γ-secretase, the enzyme complex whose activity produces the Aβ fragments central to the pathogenesis of Alzheimer's disease (AD). Analyses of the effects of common variation in this gene on the risk of late-onset AD have been inconclusive. We investigated the effect of rare variation in the coding regions of the Nicastrin gene in a cohort of AD patients and matched controls using an innovative pooling approach and next-generation sequencing. Five SNPs were identified and validated by individual genotyping of 311 cases and 360 controls. Association analysis identified a rare non-synonymous SNP (N417Y) with a statistically significantly higher frequency in cases than in controls in the Greek population (OR 3.994, CI 1.105–14.439, p = 0.035). This finding warrants further investigation in a larger cohort and adds weight to the hypothesis that rare variation explains some of the genetic heritability of Alzheimer's disease that remains to be identified.

Public Library of Science (PLOS)

Crossref

Online Research @ Cardiff

Directory of Open Access Journals

PubMed Central

UCL Discovery

Edinburgh Research Explorer

King's Research Portal

ResearchOnline@GCU
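The odds ratio, confidence interval and p-value reported for N417Y in the Nicastrin abstract can be cross-checked against each other. Assuming (the abstract does not say so) that the interval is a 95% Wald interval, it should be symmetric around log(OR) on the log scale, and its width gives the standard error; a Wald z-test built from that standard error then lands close to the reported p = 0.035. The authors may have used a different test, for example Fisher's exact test, so treat this only as a consistency check.

```python
import math

# Values reported in the Nicastrin abstract (N417Y, Greek population).
odds_ratio = 3.994
ci_low, ci_high = 1.105, 14.439
Z_95 = 1.959964  # assumption: two-sided 95% interval, normal quantile

log_or = math.log(odds_ratio)

# A Wald CI is symmetric on the log scale: log(OR) +/- Z_95 * SE.
midpoint = (math.log(ci_low) + math.log(ci_high)) / 2
se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)

# Wald z-test on log(OR); erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)).
z = log_or / se
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(f"log(OR) = {log_or:.3f}, CI midpoint on log scale = {midpoint:.3f}")
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, two-sided p = {p_two_sided:.3f}")
```

The log-scale midpoint of the interval matches log(3.994) to about three decimal places (both ≈ 1.385), and the implied SE of about 0.656 gives z ≈ 2.11 and p ≈ 0.035, so the three reported numbers are mutually consistent under the 95% Wald assumption.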