138 research outputs found

QSRA – a quality-value guided de novo short read assembler

Author: D Hernandez
Douglas W Bryant
DR Zerbino
J Butler
J Dohm
J Kent
MJ Chaisson
NG de Bruijn
R Cronn
R Warren
Todd C Mockler
W Jeck
Weng-Keen Wong
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract. Background: New rapid high-throughput sequencing technologies have sparked the creation of a new class of assembler. Since all high-throughput sequencing platforms incorporate errors in their output, short-read assemblers must be designed to account for this error while utilizing all available data. Results: We have designed and implemented an assembler, Quality-value guided Short Read Assembler (QSRA), created to take advantage of quality-value scores as a further method of dealing with error. Compared to previously published algorithms, our assembler shows significant improvements not only in speed but also in output quality. Conclusion: QSRA generally produced the highest genomic coverage, while being faster than VCAKE. QSRA is extremely competitive in its longest contig and N50/N80 contig lengths, producing results of similar quality to those of EDENA and VELVET. QSRA provides a step closer to the goal of de novo assembly of complex genomes, improving upon the original VCAKE algorithm by not only drastically reducing runtimes but also increasing the viability of the assembly algorithm through further error handling capabilities.
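The N50/N80 contig lengths mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the common convention in which Nxx is the length of the shortest contig in the smallest set of longest contigs that together cover at least xx% of the total assembled length; the abstract does not state which convention the authors used, and the function name `nxx` and the example lengths are hypothetical.

```python
def nxx(contig_lengths, fraction=0.5):
    """Return the Nxx contig length for a list of contig lengths.

    Assumption: Nxx is measured against the total assembly length
    (not the genome size), which is one common convention.
    """
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until the running sum
    # reaches the requested fraction of the total assembly length.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length


# Hypothetical contig lengths, for illustration only.
example = [2, 3, 4, 5, 6, 7, 8, 9, 10]
n50 = nxx(example, 0.5)
n80 = nxx(example, 0.8)
```

With these made-up lengths the total is 54, so N50 is reached after the three longest contigs (10 + 9 + 8 = 27) and N80 after six. A higher N50 or N80 indicates that more of the assembly sits in long contigs.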

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Evaluation of Methods for De Novo Genome Assembly from High-Throughput Sequencing Reads Reveals Dependencies That Affect the Quality of the Results

Author: Andrey Rzhetsky
CS Keith
D Hernandez
David N. Kuhn
DM Church
DR Zerbino
F Sanger
G Narzisi
Isidore Rigoutsos
J Shendure
JA Reinhardt
JC Dohm
JR Miller
JT Simpson
K Mavromatis
Laxmi Parida
MJ Chaisson
ML Metzker
Niina Haiminen
R Blakesley
R Cronn
R Li
S Altschul
S DiGuistini
S Gnerre
S Ossowski
S Rounsley
SL Salzberg
W Zhang
WR Jeck
Publication venue: Public Library of Science
Publication date: 01/01/2011

Recent developments in high-throughput sequencing technology have made low-cost sequencing an attractive approach for many genome analysis tasks. Increasing read lengths, improving quality and the production of increasingly large numbers of usable sequences per instrument run continue to make whole-genome assembly an appealing target application. In this paper we evaluate the feasibility of de novo genome assembly from short reads (≤100 nucleotides) through a detailed study involving genomic sequences of various lengths and origin, in conjunction with several of the currently popular assembly programs. Our extensive analysis demonstrates that, in addition to sequencing coverage, attributes such as the architecture of the target genome, the choice of assembly program, the average read length and the observed sequencing error rates are powerful variables that affect the best achievable assembly of the target sequence in terms of size and correctness.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Transcriptome characterization and polymorphism detection between subspecies of big sagebrush (Artemisia tridentata)

Author: B Fritig
B Langmead
BC Husband
Bryce A Richardson
C Iseli
CY Wan
D Tian
DA Pyke
DC Freeman
DJ Weber
E Meyer
E Novaes
ED McArthur
F Bourgaud
F McCarthy
G Pareto
H Wang
Jared C Price
JC Vera
JH Graham
Joshua A Udall
K Kai
KJ Miglia
LC da Maia
LH Rieseberg
M Bamshad
MF Mahalovich
MJ Ford
ML Arnold
ML Shumar
MM Rowland
NE West
P Maughan
PJ Maughan
Prabin Bajgain
R Cronn
R Durrett
RG Kelsey
Richard C Cronn
RO Bray
S Götz
S Rozen
S Zeng
SP Otto
SR Eddy
T Parchman
T Thiel
TL Personius
U Arunyawat
W Wang
Y Yu
Y Zhang
Z Han
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: Big sagebrush (Artemisia tridentata) is one of the most widely distributed and ecologically important shrub species in western North America. This species serves as a critical habitat and food resource for many animals and invertebrates. Habitat loss due to a combination of disturbances followed by establishment of invasive plant species is a serious threat to big sagebrush ecosystem sustainability. Lack of genomic data has limited our understanding of the evolutionary history and ecological adaptation in this species. Here, we report on the sequencing of expressed sequence tags (ESTs) and detection of single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) markers in subspecies of big sagebrush. Results: cDNA of A. tridentata sspp. tridentata and vaseyana were normalized and sequenced using the 454 GS FLX Titanium pyrosequencing technology. Assembly of the reads resulted in 20,357 contig consensus sequences in ssp. tridentata and 20,250 contigs in ssp. vaseyana. A BLASTx search against the non-redundant (NR) protein database using 29,541 consensus sequences obtained from a combined assembly resulted in 21,436 sequences with significant BLAST alignments (≤ 1e-15). A total of 20,952 SNPs and 119 polymorphic SSRs were detected between the two subspecies. SNPs were validated through various methods including sequence capture. Validation of SNPs in different individuals uncovered a high level of nucleotide variation in EST sequences. EST sequences of a third, tetraploid subspecies (ssp. wyomingensis) obtained by Illumina sequencing were mapped to the consensus sequences of the combined 454 EST assembly. Approximately one-third of the SNPs between sspp. tridentata and vaseyana identified in the combined assembly were also polymorphic within the two geographically distant ssp. wyomingensis samples. Conclusion: We have produced a large EST dataset for Artemisia tridentata, which contains a large sample of the big sagebrush leaf transcriptome. SNP mapping among the three subspecies suggests the origin of ssp. wyomingensis via mixed ancestry. A large number of SNP and SSR markers provide the foundation for future research to address questions in big sagebrush evolution, ecological genetics, and conservation using genomic approaches.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Progenitor-Derivative Relationships of Hordeum Polyploids (Poaceae, Triticeae) Inferred from Sequences of TOPO6, a Nuclear Low-Copy Gene Region

Author: A Bergerat
A Rambaut
AJ Drummond
BR Baum
D Lahr
D Posada
D Roelofs
DE Soltis
DJ Rodriguez
DL Swofford
DR Ayres
Elvira Hörandl
F Hartung
FR Blattner
Frank R. Blattner
G Petersen
GL Stebbins
H Akaike
H Tang
H Wang
I Linde-Laursen
I Álvarez
J Escobar
J Lihovà
JAA Nylander
Jonathan Brassac
JP Huelsenbeck
K Kakeda
K Tanno
L Heath
M Hasegawa
O Jaillon
R Cronn
R Velasco
R von Bothmer
RL Small
S Taketa
Sabine S. Jakob
SB Hoot
SP Otto
SS Jakob
T Komatsuda
T Marcussen
T Pleines
T Sang
V Kotseruba
V Mahelka
Y Kraytsberg
Z Yang
Publication venue: Public Library of Science
Publication date: 30/03/2012

Polyploidization is a major mechanism of speciation in plants. Within the barley genus Hordeum, approximately half of the taxa are polyploids. While for diploid species a good hypothesis of phylogenetic relationships exists, there is little information available for the polyploids (4×, 6×) of Hordeum. Relationships among all 33 diploid and polyploid Hordeum species were analyzed with the low-copy nuclear marker region TOPO6 for 341 Hordeum individuals and eight outgroup species. PCR products were either directly sequenced or cloned and on average 12 clones per individual were included in phylogenetic analyses. In most diploid Hordeum species TOPO6 is probably a single-copy locus. Most sequences found in polyploid individuals phylogenetically cluster together with sequences derived from diploid species and thus allow the identification of parental taxa of polyploids. Four groups of sequences occurring only in polyploid taxa are interpreted as footprints of extinct diploid taxa, which contributed to allopolyploid evolution. Our analysis identifies three key species involved in the evolution of the American polyploids of the genus. (i) All but one of the American tetraploids have a TOPO6 copy originating from the Central Asian diploid H. roshevitzii, the second copy clustering with different American diploid species. (ii) All hexaploid species from the New World have a copy of an extinct close relative of H. californicum and (iii) possess the TOPO6 sequence pattern of tetraploid H. jubatum, each with an additional copy derived from different American diploids. Tetraploid H. bulbosum is an autopolyploid, while the assumed autopolyploid H. brevisubulatum (4×, 6×) was identified as allopolyploid throughout most of its distribution area. The use of a proof-reading DNA polymerase in PCR reduced the proportion of chimerical sequences in polyploids in comparison to Taq polymerase

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Adventures in the Enormous: A 1.8 Million Clone BAC Library for the 21.7 Gb Genome of Loblolly Pine

Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu)
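The figures quoted in the abstract above can be cross-checked with a little arithmetic. The sketch below uses only numbers from the abstract; the function names are hypothetical. Note that the naive product of clone count and mean insert size gives a somewhat higher coverage than the reported 7.6X, presumably because the authors applied corrections (for example, for clones without usable inserts) that the abstract does not describe. That explanation is an assumption, not something the abstract states.

```python
def library_coverage(n_clones, mean_insert_bp, genome_bp):
    """Naive genome coverage: total cloned bases divided by genome size."""
    return n_clones * mean_insert_bp / genome_bp


def repeat_content_bp(genome_fraction, genome_bp):
    """Bases occupied by a repeat that makes up `genome_fraction` of the genome."""
    return genome_fraction * genome_bp


GENOME_BP = 21.7e9          # 21.7 Gb loblolly pine genome (from the abstract)
N_CLONES = 1_824_768        # archived BAC clones (from the abstract)
MEAN_INSERT_BP = 96_000     # 96 kb mean insert size (from the abstract)

# Upper-bound coverage before any correction; the abstract reports 7.6X.
raw_coverage = library_coverage(N_CLONES, MEAN_INSERT_BP, GENOME_BP)

# IFG-7 at about 5.8% of the genome; the abstract reports about 1.26 Gb.
ifg7_bp = repeat_content_bp(0.058, GENOME_BP)

# Implied mean length per IFG-7 copy, given roughly 210,557 copies.
ifg7_mean_copy_bp = ifg7_bp / 210_557

print(f"raw coverage: {raw_coverage:.2f}X")
print(f"IFG-7 content: {ifg7_bp / 1e9:.2f} Gb")
print(f"implied mean IFG-7 copy length: {ifg7_mean_copy_bp:.0f} bp")
```

The repeat-content figure reproduces the abstract's 1.26 Gb directly, while the raw coverage should be read only as an upper bound on the reported value.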

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Clemson University: TigerPrints

Scholars Junction - Mississippi State University Institutional Repository

Variability among inbred lines and RFLP mapping of sunflower isozymes

Author: A.J. León
Alicia D. Carrera
Allen RD
Berry ST
Carrera A
Cronn R
Dry PJ
Felsenstein J
G. Pizarro
Gentzbittel L
Hongtrakul V
Jackson RC
Jan CC
Kahler A
Kinman ML
Lander ES
Lay C
León AJ
Lu YH
Lynch M
M. Poverene
Mestries E
Miller JF
Nei M
Quillet MC
Rieseberg L
Rieseberg LH
S. Feingold
S.T. Berry
Soltis D
Swofford DL
Tersac M
Torres A
Torres AM
Vear F
Zhang YX
Publication venue: 'FapUNIFESP (SciELO)'
Publication date: 01/01/2002

Crossref

Deep Sequencing of the Nicastrin Gene in Pooled DNA, the Identification of Genetic Variants That Affect Risk of Alzheimer's Disease

Author: A Confaloni
A Orlacchio
AA Out
AW Butler
B Dermaut
B Wang
Belinda M. Martin
Bruno Vellas
D Harold
DC Koboldt
Denise Harold
DR Dries
DW Craig
E Cousin
E Levy-Lahad
E Sidransky
EI Rogaev
G McKhann
G Yu
Gillian Hamilton
H Li
Hilkka Soininen
IJ Deary
Iwona Kloszewska
J Mitsui
JC Lambert
John F. Powell
Kathryn Lord
L Zhong
Magda Tsolaki
Makrina Danillidou
MD Abramoff
Megan Pritchard
MF Folstein
Michelle K. Lupton
P Proitsi
Patrizia Mecocci
Paul Hollingworth
Petroula Proitsi
R Cronn
Richard Wroe
Roland Roberts
S Helisalmi
S Lovestone
S Nejentsev
S Prabhu
S Shah
S Sunyaev
Simon Lovestone
T Wang
TE Druley
V Bansal
Y Erlich
Z Ma
Publication venue: Public Library of Science
Publication date: 01/01/2011

Nicastrin is an obligatory component of γ-secretase, the enzyme complex that leads to the production of Aβ fragments central to the pathogenesis of Alzheimer's disease (AD). Analyses of the effects of common variation in this gene on risk for late onset AD have been inconclusive. We investigated the effect of rare variation in the coding regions of the Nicastrin gene in a cohort of AD patients and matched controls using an innovative pooling approach and next generation sequencing. Five SNPs were identified and validated by individual genotyping from 311 cases and 360 controls. Association analysis identified a non-synonymous rare SNP (N417Y) with a significantly higher frequency in cases compared to controls in the Greek population (OR 3.994, CI 1.105–14.439, p = 0.035). This finding warrants further investigation in a larger cohort and adds weight to the hypothesis that rare variation explains some of the genetic heritability still to be identified in Alzheimer's disease.

Public Library of Science (PLOS)

Crossref

Online Research @ Cardiff

Directory of Open Access Journals

PubMed Central

UCL Discovery

Edinburgh Research Explorer

King's Research Portal

ResearchOnline@GCU

High-Throughput Sequencing of Six Bamboo Chloroplast Genomes: Phylogenetic Implications for Temperate Woody Bamboos (Poaceae: Bambusoideae)

BACKGROUND: Bambusoideae is the only subfamily that contains woody members in the grass family, Poaceae. In phylogenetic analyses, Bambusoideae, Pooideae and Ehrhartoideae formed the BEP clade, yet the internal relationships of this clade are controversial. The distinctive life history (infrequent flowering and predominance of asexual reproduction) of woody bamboos makes them an interesting but taxonomically difficult group. Phylogenetic analyses based on large DNA fragments could only provide a moderate resolution of woody bamboo relationships, although a robust phylogenetic tree is needed to elucidate their evolutionary history. Phylogenomics is an alternative choice for resolving difficult phylogenies. METHODOLOGY/PRINCIPAL FINDINGS: Here we present the complete nucleotide sequences of six woody bamboo chloroplast (cp) genomes using Illumina sequencing. These genomes are similar to those of other grasses and rather conservative in evolution. We constructed a phylogeny of Poaceae from 24 complete cp genomes including 21 grass species. Within the BEP clade, we found strong support for a sister relationship between Bambusoideae and Pooideae. In a substantial improvement over prior studies, all six nodes within Bambusoideae were supported with ≥0.95 posterior probability from Bayesian inference and 5/6 nodes resolved with 100% bootstrap support in maximum parsimony and maximum likelihood analyses. We found that repeats in the cp genome could provide phylogenetic information, while caution is needed when using indels in phylogenetic analyses based on few selected genes. We also identified relatively rapidly evolving cp genome regions that have the potential to be used for further phylogenetic study in Bambusoideae. CONCLUSIONS/SIGNIFICANCE: The cp genome of Bambusoideae evolved slowly, and phylogenomics based on whole cp genome could be used to resolve major relationships within the subfamily. 
The difficulty in resolving the diversification among three clades of temperate woody bamboos, even with complete cp genome sequences, suggests that these lineages may have diverged very rapidly.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Building a model: developing genomic resources for common milkweed (Asclepias syriaca) with low coverage genome sequencing

Author: A Abzhanov
A Cavallini
A Drescher
A Kumar
A Ratan
A Rokas
A Saini
AA Agrawal
Aaron Liston
AJ Alverson
AK Shukla
AL Delcher
ARD Ganley
BL Cantarel
C Vitte
CJ Nock
D Zerbino
DA Rasmussen
DC Sastri
DE Stage
DM Hillis
E Datema
E Hribova
E Willerslev
F Maggini
F Wu
FM You
G Bena
GC Conant
H Dempewolf
H Kuroda
H Li
HL Lee
I Baxter
I Milne
I Ovcharenko
J Abdelkrim
J Dolezel
J He
J Jurka
J Macas
J Murata
JL Bennetzen
K Katoh
K Neubig
K Swaminathan
Kevin Weitemier
KM Webb
M Fishbein
M Fujiwara
M Nei
M Parks
Mark Fishbein
Matthew Parks
ME Hudson
MJ Moore
MM Guisinger
MV Sanchez-Puerta
N Cusimano
N Samson
N Whiteford
NV Borisjuk
P Cavagnaro
P Gornicki
P Nguyen
R Volkov
R Wyatt
RA Volkov
RC Cronn
RC Haberle
RC Thomson
Richard C Cronn
RK Jansen
RM Lee
S Greiner
S Kurtz
S Meyers
S Rasmann
SB Broyles
SB Malcolm
SF Altschul
Shannon CK Straub
SK Wyman
SO Rogers
Solexa Inc
T Konishi
T Kubo
T Livshultz
TA Castoe
TA Hall
Tatyana Livshultz
TJ Givnish
U Meve
V Knoop
V Kode
V Ravi
VO Kolosha
W Arthofer
WJ Kent
X Bai
Y Jo
Y Sugiyama
Zachary Foster
ZQ Cai
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: Milkweeds (Asclepias L.) have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and complement these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L.) could be useful in characterizing the genome of a plant without prior genomic information and for development of genomic resources as a step toward further developing A. syriaca as a model in ecology and evolution. Results: A 0.5× genome of A. syriaca was produced using Illumina sequencing. A virtually complete chloroplast genome of 158,598 bp was assembled, revealing few repeats and loss of three genes: accD, clpP, and ycf1. A nearly complete rDNA cistron (18S-5.8S-26S; 7,541 bp) and 5S rDNA (120 bp) sequence were obtained. Assessment of polymorphism revealed that the rDNA cistron and 5S rDNA had 0.3% and 26.7% polymorphic sites, respectively. A partial mitochondrial genome sequence (130,764 bp), with identical gene content to tobacco, was also assembled. An initial characterization of repeat content indicated that Ty1/copia-like retroelements are the most common repeat type in the milkweed genome. At least one A. syriaca microread hit 88% of Catharanthus roseus (Apocynaceae) unigenes (median coverage of 0.29×) and 66% of single copy orthologs (COSII) in asterids (median coverage of 0.14×). From this partial characterization of the A. syriaca genome, markers for population genetics (microsatellites) and phylogenetics (low-copy nuclear genes) studies were developed. Conclusions: The results highlight the promise of next generation sequencing for development of genomic resources for any organism. Low coverage genome sequencing allows characterization of the high copy fraction of the genome and exploration of the low copy fraction of the genome, which facilitate the development of molecular tools for further study of a target species and its relatives. This study represents a first step in the development of a community resource for further study of plant-insect co-evolution, anti-herbivore defense, floral developmental genetics, reproductive biology, chemical evolution, population genetics, and comparative genomics using milkweeds, and A. syriaca in particular, as ecological and evolutionary models.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

SHAREOK repository