993 research outputs found

Comparing de novo assemblers for 454 transcriptome data

Author: A Barakat
A Guffanti
A Papanicolaou
AJ Enright
AL Eveland
AP Weber
B Chevreux
C Cantacessi
C Soderlund
C Sun
D Bellin
D Schwarz
DA Hahn
DR Zerbino
E Ghedin
E Kristiansson
E Meyer
E Novaes
F Cheung
F Cheung
F Roeding
F Zhang
FD Guerrero
G Pertea
H Wang
I Birol
I Milne
J Schmid
JC Vega-Arreguín
JC Vera
JE Allen
JR Monaghan
L Ferguson
M Margulies
M Zagrobelny
Mark L Blaxter
MS Barker
MS Clark
N Palmieri
PK Wall
RE Timme
RL Tatusov
RT Miller
S Altschul
S Jackman
S Zeng
SJ Emrich
SR Swindell
Sujai Kumar
TL Parchman
W Wang
WJ Kent
X Huang
Y Pauchet
Y Pauchet
Z Ning
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Roche 454 pyrosequencing has become a method of choice for generating transcriptome data from non-model organisms. Once the tens to hundreds of thousands of short (250-450 base) reads have been produced, it is important to correctly assemble these to estimate the sequence of all the transcripts. Most transcriptome assembly projects use only one program for assembling 454 pyrosequencing reads, but there is no evidence that the programs used to date are optimal. We have carried out a systematic comparison of five assemblers (CAP3, MIRA, Newbler, SeqMan and CLC) to establish best practices for transcriptome assemblies, using a new dataset from the parasitic nematode Litomosoides sigmodontis.
Results: Although no single assembler performed best on all our criteria, Newbler 2.5 gave longer contigs, better alignments to some reference sequences, and was fast and easy to use. SeqMan assemblies performed best on the criterion of recapitulating known transcripts, and had more novel sequence than the other assemblers, but generated an excess of small, redundant contigs. The remaining assemblers all performed almost as well, with the exception of Newbler 2.3 (the version currently used by most assembly projects), which generated assemblies that had significantly lower total length. As different assemblers use different underlying algorithms to generate contigs, we also explored merging of assemblies and found that the merged datasets not only aligned better to reference sequences than individual assemblies, but were also more consistent in the number and size of contigs.
Conclusions: Transcriptome assemblies are smaller than genome assemblies and thus should be more computationally tractable, but are often harder because individual contigs can have highly variable read coverage. Comparing single assemblers, Newbler 2.5 performed best on our trial data set, but other assemblers were closely comparable. Combining differently optimal assemblies from different programs, however, gave a more credible final product, and this strategy is recommended.
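The abstract compares assemblies by contig length, total assembly length, and the number and size of contigs. A minimal sketch of how such summary figures could be computed from a list of contig lengths is shown below. The function name `assembly_stats`, the toy contig lengths, and the use of N50 as the length summary are assumptions made for illustration; the abstract does not state which statistics the authors actually reported.

```python
def assembly_stats(contig_lengths):
    """Summarise one assembly: contig count, total length, longest contig, N50.

    N50 is the length L such that contigs of length >= L contain at least
    half of the total assembled bases. It is used here as an illustrative
    metric, not as the paper's stated criterion.
    """
    if not contig_lengths:
        raise ValueError("assembly has no contigs")
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        if 2 * running >= total:  # reached half of the assembled bases
            n50 = length
            break
    return {
        "contigs": len(lengths),
        "total_length": total,
        "longest": lengths[0],
        "n50": n50,
    }


# Hypothetical contig lengths for two assemblies of the same read set.
assemblies = {
    "assembler_a": [900, 700, 400, 200, 100],
    "assembler_b": [500, 450, 400, 350, 300, 150, 150],
}

for name, lengths in assemblies.items():
    print(name, assembly_stats(lengths))
```

Both toy assemblies have the same total length, yet the first has fewer and longer contigs, which is the kind of difference the comparison in the abstract is concerned with.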

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Edinburgh Research Explorer

Hardware acceleration of genomics data analysis: challenges and opportunities

Author: Abdallah
Al Kawam
Al-Absi
Alser
Alser
Altschul
Angerer
Antipov
Arram
Arram
Audano
Ayling
Bahrebar
Banerjee
Bao
Bao
Barron
Behjati
Bohannan
Brittain
Broad Institute
Broad Institute
Cardon
Carrillo
Carrillo
Challis
Chen
Chen
Ciccolella
Cingolani
Clark
Croville
Das
Denti
Doan
Dobin
Du
Fei
Fleckhaus
Fonseca
Genome Research Ltd
Ghurye
Golosova
Goodwin
Goyal
Gök
Hackl
Hasnain
Houtgast
Hu
Illumina Inc
Jackson
Javed
Joardar
Joshi
Jourdren
Kaplan
Kent
Kim
Kim
Kosuri
Langmead
Langmead
Langmead
Lesk
Li
Li
Li
Li
Li
Li
Li
Lightbody
Lightbody
Liu
Liu
Liu
Lv
Margulies
Maruyama
Mcvicar
Milward
Muir
NCBI
Niedringhaus
Nsame
Orth
Oxford Nanopore Technologies
Park
Patel
Payne
Peddie
Rizzo
Robinson
Sarkar
Sboner
Schatz
Shang
Shang
Sharifi
Subbulakshmi
Sundfeld
Tian
Tsai
Turakhia
Turakhia
Wang
Wang
Ward
xilinx
Yano
Zaharia
Zokaee
Publication venue: Oxford University Press (OUP)
Publication date: 25/05/2021

Crossref

Ulster University's Research Portal

A method for studying protistan diversity using massively parallel sequencing of V9 hypervariable regions of small-subunit ribosomal RNA genes

Author: A Chao
A Chao
C Quince
CD Sinigalliano
DT Kysela
E Pruesse
Elizabeth A. McCliment
F Zhu
GJ Olsen
Gordon Langsley
Hugh W. Ducklow
HW Ducklow
JA Huber
JD Neufeld
L Amaral-Zettler
L Medlin
Linda A. Amaral-Zettler
ML Sogin
NR Pace
PD Schloss
S Huse
S-M Lee
SM Huse
Susan M. Huse
W Ludwig
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2009

© 2009 The Authors. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in PLoS ONE 4 (2009): e6372, doi:10.1371/journal.pone.0006372. Massively parallel pyrosequencing of amplicons from the V6 hypervariable regions of small-subunit (SSU) ribosomal RNA (rRNA) genes is commonly used to assess diversity and richness in bacterial and archaeal populations. Recent advances in pyrosequencing technology provide read lengths of up to 240 nucleotides. Amplicon pyrosequencing can now be applied to longer variable regions of the SSU rRNA gene including the V9 region in eukaryotes. We present a protocol for the amplicon pyrosequencing of V9 regions for eukaryotic environmental samples for biodiversity inventories and species richness estimation. The International Census of Marine Microbes (ICoMM) and the Microbial Inventory Research Across Diverse Aquatic Long Term Ecological Research Sites (MIRADA-LTERs) projects are already employing this protocol for tag sequencing of eukaryotic samples in a wide diversity of both marine and freshwater environments. Massively parallel pyrosequencing of eukaryotic V9 hypervariable regions of SSU rRNA genes provides a means of estimating species richness from deeply-sampled populations and for discovering novel species from the environment. This work was supported by grants from the W.M. Keck Foundation and the Woods Hole Center for Oceans and Human Health from the National Institutes of Health and National Science Foundation (NIH/NIEHS 1 P50 ES012742-01 and NSF/OCE 0430724-J) (LAZ and SH).
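The abstract mentions species richness estimation from tag counts but does not name an estimator. As a hedged illustration only, the sketch below applies the bias-corrected Chao1 estimator, a commonly used nonparametric richness estimator, to a list of per-taxon tag counts. The function name `chao1_bias_corrected` and the example counts are hypothetical, and nothing here should be read as the protocol's own analysis step.

```python
def chao1_bias_corrected(abundances):
    """Bias-corrected Chao1 richness estimate.

    S_chao1 = S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))

    where S_obs is the number of observed taxa, F1 the number of taxa seen
    exactly once (singletons) and F2 the number seen exactly twice
    (doubletons). Chao1 is an assumed choice; the abstract does not specify
    an estimator.
    """
    observed = [n for n in abundances if n > 0]
    s_obs = len(observed)
    f1 = sum(1 for n in observed if n == 1)
    f2 = sum(1 for n in observed if n == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical tag counts per taxon in one environmental sample.
tag_counts = [10, 5, 2, 2, 1, 1, 1, 1]
print(chao1_bias_corrected(tag_counts))
```

With 8 observed taxa, 4 singletons and 2 doubletons, the estimate adds 4 * 3 / (2 * 3) = 2 unseen taxa to the observed count.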

Public Library of Science (PLOS)

CiteSeerX

Crossref

Woods Hole Open Access Server

Directory of Open Access Journals

PubMed Central

Comparison of sequencing-based methods to profile DNA methylation and identification of monoallelic epigenetic modifications.

Author: A Bird
A Hellman
A Meissner
AD Smith
Adam Olshen
AK Maunakea
Aleksandar Milosavljevic
Alexander Meissner
Allen Delaney
Andreas Gnirke
AP Feinberg
B Langmead
Bing Ren
Bradley E Bernstein
Brett E Johnson
C Coarfa
C Grunau
Charles B Epstein
Chibo Hong
Christoph Bock
Cristian Coarfa
D Serre
David Haussler
DM Duhl
FV Jacinto
G Bourque
G Kunarso
G Robertson
H Gu
H Lin
H O'Geen
Henriette O'Geen
Hongcang Gu
J Deng
Joseph F Costello
Joseph R Ecker
Junchen Gu
KD Robertson
Kevin J Forsberg
KR Blahnik
KS Pollard
LC Schalkwyk
Lorigail Echipare
M Pick
M Tahiliani
MA Gama-Sosa
Marco A Marra
Martin Hirst
Mattia Pelizzola
Michael Q Zhang
MP Ball
P Arnaud
Peggy J Farnham
PVK Pant
R Alan Harris
R David Hawkins
R Lister
R Lister
RA Waterland
RA Waterland
Raman P Nagarajan
Robert A Waterland
Ryan Lister
S Ito
S Kriaucionis
Sara L Downey
Shaun D Fouse
SJ Cokus
T Wang
TA Down
TE Ludwig
Ting Wang
Tracy Ballinger
Wei Li
Wen-Yu Chung
Xin Zhou
Y Xi
Yongjun Zhao
Yuanxin Xi
ZD Smith
Publication venue: eScholarship, University of California
Publication date: 01/01/2010

Analysis of DNA methylation patterns relies increasingly on sequencing-based profiling methods. The four most frequently used sequencing-based technologies are the bisulfite-based methods MethylC-seq and reduced representation bisulfite sequencing (RRBS), and the enrichment-based techniques methylated DNA immunoprecipitation sequencing (MeDIP-seq) and methylated DNA binding domain sequencing (MBD-seq). We applied all four methods to biological replicates of human embryonic stem cells to assess their genome-wide CpG coverage, resolution, cost, concordance and the influence of CpG density and genomic context. The methylation levels assessed by the two bisulfite methods were concordant (their difference did not exceed a given threshold) for 82% of the CpGs and 99% of the non-CpG cytosines. Using binary methylation calls, the two enrichment methods were 99% concordant and regions assessed by all four methods were 97% concordant. We combined MeDIP-seq with methylation-sensitive restriction enzyme sequencing (MRE-seq) for comprehensive methylome coverage at lower cost. This, along with RNA-seq and ChIP-seq of the ES cells, enabled us to detect regions with allele-specific epigenetic states, identifying most known imprinted regions and new loci with monoallelic epigenetic marks and monoallelic expression.
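The abstract defines two methods as concordant at a cytosine when their methylation levels differ by no more than a given threshold. A minimal sketch of that calculation follows. The function name `concordance`, the dictionary layout (genomic position mapped to a methylation level between 0 and 1), the threshold value of 0.25, and the toy data are all assumptions for illustration; the abstract does not give the threshold or the data format.

```python
def concordance(levels_a, levels_b, threshold=0.25):
    """Fraction of shared sites whose methylation levels agree within `threshold`.

    `levels_a` and `levels_b` map a site identifier (for example a genomic
    position) to a methylation level in [0, 1]. Only sites covered by both
    methods are compared. The default threshold is a placeholder, not a
    value taken from the study.
    """
    shared = levels_a.keys() & levels_b.keys()
    if not shared:
        raise ValueError("the two methods share no covered sites")
    agreeing = sum(
        1 for site in shared
        if abs(levels_a[site] - levels_b[site]) <= threshold
    )
    return agreeing / len(shared)


# Hypothetical per-CpG methylation levels from two bisulfite-based methods.
method_a = {100: 0.90, 200: 0.10, 300: 0.50, 400: 0.80}
method_b = {100: 0.80, 200: 0.50, 300: 0.55, 500: 0.20}

print(f"{concordance(method_a, method_b):.2f}")
```

Three sites are covered by both toy methods and two of them agree within the threshold, so the sketch reports a concordance of about 0.67. Restricting the comparison to shared sites matters because the methods differ in genome-wide CpG coverage.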

Crossref

PubMed Central

eScholarship - University of California

Viral Quasispecies Reconstruction Using Next Generation Sequencing Reads

Author: Tork Bassam A
Publication venue: ScholarWorks @ Georgia State University
Publication date: 01/01/2013

The genomic diversity of viral quasispecies is a subject of great interest, especially for chronic infections. Characterization of viral diversity can be addressed by high-throughput sequencing technology (454 Life Sciences, Illumina, SOLiD, Ion Torrent, etc.). Standard assembly software was originally designed for single genome assembly and cannot be used to assemble and estimate the frequency of closely related quasispecies sequences. This work focuses on parsimonious and maximum likelihood models for assembling viral quasispecies and estimating their frequencies from 454 sequencing data. Our methods have been applied to several RNA viruses (HCV, IBV) as well as DNA viruses (HBV), genotyped using 454 Life Sciences amplicon and shotgun methods.

CiteSeerX

ScholarWorks @ Georgia State University

A genome-wide search for epigenetically regulated genes in zebra finch using MethylCap-seq and RNA-seq

Author: Bakker Antje
De Keulenaer Sarah
De Meester Ellen
De Meyer Tim
Diddens Jolien
Frankl-Vilches Carolina
Galle Jeroen
Sohnius-Wilhelmi Nina
Steyaert Sandra
Van Criekinge Wim
Van der Linden Annemie
Vanden Berghe Wim
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Learning and memory formation are known to require dynamic CpG (de)methylation and gene expression changes. Here, we aimed to establish a genome-wide DNA methylation map of the zebra finch genome, a model organism in neuroscience, and to identify putatively epigenetically regulated genes. RNA-seq and MethylCap-seq experiments were performed on two zebra finch cell lines in the presence or absence of 5-aza-2'-deoxycytidine induced demethylation. First, the MethylCap-seq methodology was validated in zebra finch by comparison with RRBS-generated data. To assess the influence of (variable) methylation on gene expression, RNA-seq experiments were performed as well. Comparison of RNA-seq and MethylCap-seq results showed that at least 357 of the 3,457 AZA-upregulated genes are putatively regulated by methylation in the promoter region, for which a pathway analysis showed remarkable enrichment for neurological networks. A subset of genes was validated using Exon Arrays, quantitative RT-PCR and CpG pyrosequencing on bisulfite-treated samples. To our knowledge, this study provides the first genome-wide DNA methylation map of the zebra finch genome as well as a comprehensive set of genes whose transcription is under putative methylation control.

Ghent University Academic Bibliography

PubMed Central

Institutional Repository Universiteit Antwerpen

WOODSTOCC: Extracting Latent Parallelism from a DNA Sequence Aligner on a GPU

Author: Buhler Jeremy D
Cole Stephen V
Gardner Jacob R
Publication venue: Washington University Open Scholarship
Publication date: 01/09/2015

An exponential increase in the speed of DNA sequencing over the past decade has driven demand for fast, space-efficient algorithms to process the resultant data. The first step in processing is alignment of many short DNA sequences, or reads, against a large reference sequence. This work presents WOODSTOCC, an implementation of short-read alignment designed for Graphics Processing Unit (GPU) architectures. WOODSTOCC translates a novel CPU implementation of gapped short-read alignment, which has guaranteed optimal and complete results, to the GPU. Our implementation combines an irregular trie search with dynamic programming to expose regularly structured parallelism. We first describe this implementation, then discuss its port to the GPU. WOODSTOCC’s GPU port exploits three generally useful techniques for extracting regular parallelism from irregular computations: dynamic thread mapping with a worklist, kernel stage decoupling, and kernel slicing. We discuss the performance impact of these techniques and suggest further opportunities for improvement.
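The abstract refers to gapped alignment of a short read against a reference using dynamic programming. The sketch below is not WOODSTOCC's algorithm: it omits the trie search and all GPU techniques, and simply shows a plain CPU dynamic-programming recurrence that finds the lowest edit distance between a read and any substring of a reference. The function name `best_gapped_match` and the unit costs for mismatches and gaps are assumptions made for illustration.

```python
def best_gapped_match(read, reference):
    """Return (edit_distance, end_position) of the best match of `read`
    against any substring of `reference`.

    Row 0 is all zeros so an alignment may start anywhere in the reference;
    the minimum of the final row lets it end anywhere. Mismatches,
    insertions and deletions each cost 1 (an assumed scoring scheme).
    """
    prev = [0] * (len(reference) + 1)
    for i in range(1, len(read) + 1):
        curr = [i] + [0] * len(reference)  # column 0: read prefix vs nothing
        for j in range(1, len(reference) + 1):
            mismatch = 0 if read[i - 1] == reference[j - 1] else 1
            curr[j] = min(
                prev[j - 1] + mismatch,  # match or substitution
                prev[j] + 1,             # read base unmatched (gap in reference)
                curr[j - 1] + 1,         # reference base unmatched (gap in read)
            )
        prev = curr
    best_end = min(range(len(reference) + 1), key=lambda j: prev[j])
    return prev[best_end], best_end


print(best_gapped_match("ACGT", "TTACGTTT"))  # exact occurrence in the reference
print(best_gapped_match("ACGT", "TTACTTT"))   # best match needs one gap
```

Each cell depends only on its left, upper, and upper-left neighbours, which is the kind of regular structure that makes dynamic programming attractive for parallel hardware.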

Washington University St. Louis: Open Scholarship

DUDE-Seq: Fast, Flexible, and Robust Denoising for Targeted Amplicon Sequencing

Author: Lee Byunghan
Moon Taesup
Weissman Tsachy
Yoon Sungroh
Publication venue
Publication date: 01/01/2017

We consider the correction of errors from nucleotide sequences produced by next-generation targeted amplicon sequencing. The next-generation sequencing (NGS) platforms can provide a great deal of sequencing data thanks to their high throughput, but the associated error rates often tend to be high. Denoising in high-throughput sequencing has thus become a crucial process for boosting the reliability of downstream analyses. Our methodology, named DUDE-Seq, is derived from a general setting of reconstructing finite-valued source data corrupted by a discrete memoryless channel and effectively corrects substitution and homopolymer indel errors, the two major types of sequencing errors in most high-throughput targeted amplicon sequencing platforms. Our experimental studies with real and simulated datasets suggest that the proposed DUDE-Seq not only outperforms existing alternatives in terms of error-correction capability and time efficiency, but also boosts the reliability of downstream analyses. Further, the flexibility of DUDE-Seq enables its robust application to different sequencing platforms and analysis pipelines by simple updates of the noise model. DUDE-Seq is available at http://data.snu.ac.kr/pub/dude-seq

arXiv.org e-Print Archive

Directory of Open Access Journals

FigShare