7 research outputs found

Assembly of the Complete Sitka Spruce Chloroplast Genome Using 10X Genomics’ GemCode Sequencing Data

Author: Benjamin P. Vandervalk (3121776)
Chen Yang (207381)
Inanc Birol (277074)
Joerg Bohlmann (280238)
Lauren Coombe (3121782)
René L. Warren (276366)
Richard A. Moore (303942)
Robert A. Holt (222944)
Robin J. Coope (3121779)
Shaun D. Jackman (746366)
Stephen Pleasance (511217)
Steven J. M. Jones (63660)
Publication date: 15/09/2016

The linked-read sequencing library preparation platform by 10X Genomics produces barcoded sequencing libraries, which are subsequently sequenced using Illumina short-read sequencing technology. In this new approach, long fragments of DNA are partitioned into separate micro-reactions, where the same index sequence is incorporated into each of the sequencing fragment inserts derived from a given long fragment. In this study, we exploited this property by using reads from index sequences associated with a large number of reads to assemble the chloroplast genome of the Sitka spruce tree (Picea sitchensis). Here we report on the first Sitka spruce chloroplast genome assembled exclusively from P. sitchensis genomic libraries prepared using the 10X Genomics protocol. We show that the resulting 124,049 base pair genome shares high sequence similarity with the related white spruce and Norway spruce chloroplast genomes, but diverges substantially from a previously published P. sitchensis-P. thunbergii chimeric genome. The use of reads from high-frequency indices enabled separation of the nuclear genome reads from those of the chloroplast, which simplified the de Bruijn graphs used at the various stages of assembly.
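The read selection described in this abstract (keeping reads whose index sequence is associated with a large number of reads) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `(barcode, sequence)` tuple layout, the function name `reads_from_frequent_barcodes`, and the threshold `min_reads_per_barcode` are all hypothetical choices made for illustration.

```python
from collections import Counter


def reads_from_frequent_barcodes(reads, min_reads_per_barcode):
    """Keep reads whose barcode occurs at least `min_reads_per_barcode` times.

    `reads` is an iterable of (barcode, sequence) tuples. Both this layout
    and the threshold are assumptions for illustration, not values taken
    from the study.
    """
    reads = list(reads)  # materialize so we can pass over the data twice
    counts = Counter(barcode for barcode, _ in reads)
    return [(b, s) for b, s in reads if counts[b] >= min_reads_per_barcode]


# Toy input: one barcode seen three times, one seen once.
example_reads = [
    ("AAAC", "ACGT"),
    ("AAAC", "CGTA"),
    ("AAAC", "GTAC"),
    ("TTTG", "TTAA"),
]
selected = reads_from_frequent_barcodes(example_reads, min_reads_per_barcode=3)
```

In this toy input only the three reads carrying barcode `AAAC` pass the filter. In real data the threshold would have to be chosen from the observed barcode frequency distribution.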

Directory of Open Access Journals

PubMed Central

FigShare

Molecular Phylogenetic analysis of five conifer chloroplast genomes by Maximum Likelihood method.

Author: Benjamin P. Vandervalk (3121776)
Chen Yang (207381)
Inanc Birol (277074)
Joerg Bohlmann (280238)
Lauren Coombe (3121782)
René L. Warren (276366)
Richard A. Moore (303942)
Robert A. Holt (222944)
Robin J. Coope (3121779)
Shaun D. Jackman (746366)
Stephen Pleasance (511217)
Steven J. M. Jones (63660)

The evolutionary history was inferred by using the Maximum Likelihood method based on the Tamura-Nei model [20]. The tree with the highest log likelihood is shown. Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach, and then selecting the topology with superior log likelihood value. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. The analysis involved 5 chloroplast genome nucleotide sequences: white spruce (Picea glauca genotype PG29, KT634228.1 [4]), Norway spruce (P. abies NC_021456.1 [5]), Sitka spruce (P. sitchensis [2] from *our study KU215903.1 and from **previous public genome sequence EU998739.3 [19]), and Japanese black pine (Pinus thunbergii NC_001631.1 [21]). Codon positions included were 1st+2nd+3rd+Noncoding. All positions containing gaps and missing data were eliminated. There were a total of 106,346 positions in the final dataset. Evolutionary analyses were conducted in MEGA7 [22].
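The step "all positions containing gaps and missing data were eliminated" amounts to dropping every alignment column in which any sequence has a gap or missing-data symbol. A minimal Python sketch follows. It is not MEGA7's implementation. The function name `drop_gapped_columns` and the choice of `-` and `?` as the gap and missing-data symbols are assumptions for illustration.

```python
def drop_gapped_columns(aligned, missing="-?"):
    """Return the alignment with every column containing a gap/missing symbol removed.

    `aligned` is a list of equal-length strings (one per sequence).
    The symbols in `missing` are an assumption for this sketch.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    # Keep a column only if no sequence has a gap or missing symbol there.
    keep = [i for i in range(length)
            if all(seq[i] not in missing for seq in aligned)]
    return ["".join(seq[i] for i in keep) for seq in aligned]


# Toy alignment: columns 1, 2 and 4 each contain a gap or '?'.
toy = ["AC-GT", "ACTG?", "A-TGT"]
cleaned = drop_gapped_columns(toy)
```

For the toy alignment only columns 0 and 3 survive, so every sequence is reduced to `AG`. The retained column count plays the same role as the 106,346 positions reported in the caption.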

FigShare

Alignments of the Sitka spruce chloroplast genome to the white spruce and Norway spruce chloroplast genomes.

Author: Benjamin P. Vandervalk (3121776)
Chen Yang (207381)
Inanc Birol (277074)
Joerg Bohlmann (280238)
Lauren Coombe (3121782)
René L. Warren (276366)
Richard A. Moore (303942)
Robert A. Holt (222944)
Robin J. Coope (3121779)
Shaun D. Jackman (746366)
Stephen Pleasance (511217)
Steven J. M. Jones (63660)

The cross_match alignments were visualized using XMatchView. Histograms at the top and bottom show the sequence identity (S.I.) over the length of the alignments, including those from repeated sequences. The dark blue represents sequences repeated only once, while the light blue represents sequences repeated twice. The middle section represents co-linear and inverted sequence alignment blocks in blue and pink, respectively.

FigShare

The complete plastid genome of Sitka spruce.

Author: Benjamin P. Vandervalk (3121776)
Chen Yang (207381)
Inanc Birol (277074)
Joerg Bohlmann (280238)
Lauren Coombe (3121782)
René L. Warren (276366)
Richard A. Moore (303942)
Robert A. Holt (222944)
Robin J. Coope (3121779)
Shaun D. Jackman (746366)
Stephen Pleasance (511217)
Steven J. M. Jones (63660)

The Sitka spruce chloroplast genome was annotated using MAKER and plotted using OrganellarGenomeDRAW [25]. The inner grey track depicts the G+C content of the genome. The genome comprises 74 coding genes, 4 ribosomal RNA (rRNA) genes, 36 transfer RNA (tRNA) genes and 14 ORFs.
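A G+C content track of the kind mentioned in this caption is typically computed over windows along the sequence. The Python sketch below shows one simple way to do that. The caption does not describe how OrganellarGenomeDRAW computes its track, so the function name `gc_content_windows` and the window and step sizes are hypothetical.

```python
def gc_content_windows(sequence, window, step):
    """Return the G+C fraction of each full window along `sequence`.

    Windows start every `step` bases; a trailing partial window is skipped.
    Window and step sizes are illustrative assumptions.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = sequence.upper()
    values = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        values.append((chunk.count("G") + chunk.count("C")) / window)
    return values


# Toy sequence: first window is all G/C, second is all A/T.
gc_track = gc_content_windows("GGCCAATT", window=4, step=4)
```

For the toy sequence the two non-overlapping windows give fractions of 1.0 and 0.0. A circular genome plot would also need a window that wraps around the origin, which this sketch does not handle.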

FigShare

Automated high throughput nucleic acid purification from formalin-fixed paraffin-embedded tissue samples for next generation sequence analysis

Curation and storage of formalin-fixed, paraffin-embedded (FFPE) samples are standard procedures in hospital pathology laboratories around the world. Many thousands of such samples exist and could be used for next generation sequencing analysis. Retrospective analyses of such samples are important for identifying molecular correlates of carcinogenesis, treatment history and disease outcomes. Two major hurdles in using FFPE material for sequencing are the damaged nature of the nucleic acids and the labor-intensive nature of nucleic acid purification. These limitations and a number of other issues that span multiple steps from nucleic acid purification to library construction are addressed here. We optimized and automated a 96-well magnetic bead-based extraction protocol that can be scaled to large cohorts and is compatible with automation. Using sets of 32 and 91 individual FFPE samples respectively, we generated libraries from 100 ng of total RNA and DNA starting amounts with a 95–100% success rate. The use of the resulting RNA in micro-RNA sequencing was also demonstrated. In addition to offering the potential of scalability and rapid throughput, the yield obtained with lower input requirements makes these methods applicable to clinical samples where tissue abundance is limiting.

Directory of Open Access Journals

FigShare

Automated high throughput FormaPure-based extraction protocol.

(A) Workflow illustration of sample acquisition, upstream sample processing and extraction. Note that a separate high temperature incubation step is added to facilitate the reversal of remaining crosslinks. The upstream processes are manual in the original protocol, whereas those steps are modified to be suitable for automation in the modified protocol. The in-house on-deck heating blocks were instrumental in rendering the lysis/deparaffinization steps automatable. Acquisition of samples in SBS format matrix tubes with their automated capping and decapping were also further measures that allowed the entire process to be amenable for automated liquid handling. (B) gDNA yield. Historical gDNA yield data from the Qiagen/High Pure protocol (Q; n = 142) using equivalent sizes of numerous FFPE samples of lymphoma origin was compared with that of the FormaPure protocol (F; n = 91). (C) RNA yield. Comparison of the Qiagen-High Pure (Q-H) and FormaPure (F) protocols is shown. N = 142 for Q-H and N = 44 for F.

FigShare

Suitability of the FormaPure extracted RNA for FFPE strand-specific RNA-seq.

(A) Strand-specific libraries were generated from four different FormaPure extracted human FFPE samples (FFPE A-D) and UHR fresh RNA. Two different total (DNase-treated) RNA input amounts were used (100 and 200 ng, respectively). Final library yield (nM) (left panel) and % duplicates (middle panel) as well as the distribution of aligned reads to various regions of the transcriptome (right panel) are shown graphically. These libraries were sequenced as a pool with 75 bp paired-end reads (PE75). (B) Comparison of Qiagen and FormaPure extraction protocols using mouse FFPE scrolls. Final library yield (nM) (left panel) and % duplicates, % aligned, and the distribution of aligned reads to various regions of the transcriptome (middle panel) as well as number of genes with 1x coverage (right panel) are shown graphically.

FigShare