21 research outputs found

A Platform-Independent Method for Detecting Errors in Metagenomic Sequencing Data: DRISEE

Author: D'Souza Mark
Harrison Travis
Keegan Kevin P.
Meyer Folker
Trimble William L.
Wilke Andreas
Wilkening Jared
Publication date: 21/12/2023

We provide a novel method, DRISEE (duplicate read inferred sequencing error estimation), to assess sequencing quality (alternatively referred to as “noise” or “error”) within and/or between sequencing samples. DRISEE provides positional error estimates that can be used to inform read trimming within a sample. It also provides global (whole sample) error estimates that can be used to identify samples with high or varying levels of sequencing error that may confound downstream analyses, particularly in the case of studies that utilize data from multiple sequencing samples. For shotgun metagenomic data, we believe that DRISEE provides estimates of sequencing error that are more accurate and less constrained by technical limitations than existing methods that rely on reference genomes or the use of scores (e.g. Phred). Here, DRISEE is applied to (non-amplicon) data sets from both the 454 and Illumina platforms. The DRISEE error estimate is obtained by analyzing sets of artifactual duplicate reads (ADRs), a known by-product of both sequencing platforms. We present DRISEE as an open-source, platform-independent method to assess sequencing error in shotgun metagenomic data, and utilize it to discover previously uncharacterized error in de novo sequence data from the 454 and Illumina sequencing platforms.
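
As a rough illustration of the idea described in this abstract, the sketch below groups reads that share an identical prefix (a stand-in for artifactual duplicate reads), builds a per-position consensus for each group, and reports the percentage of bases that disagree with that consensus at each position. The function name `positional_error`, the prefix length, the minimum bin size, and the substitution-only comparison are all assumptions made for illustration. This is not the published DRISEE implementation, which also considers insertions and deletions.

```python
from collections import Counter, defaultdict


def positional_error(reads, prefix_len=4, min_bin_size=3):
    """Toy per-position substitution error (%) from duplicate-like read bins.

    Hypothetical sketch: reads sharing the same first `prefix_len` bases are
    treated as one bin of duplicates; positions after the prefix are compared
    to the bin consensus. Insertions and deletions are ignored here.
    """
    bins = defaultdict(list)
    for read in reads:
        if len(read) > prefix_len:
            bins[read[:prefix_len]].append(read)

    mismatches = Counter()  # position -> bases that differ from the consensus
    totals = Counter()      # position -> bases compared

    for members in bins.values():
        if len(members) < min_bin_size:
            continue  # too few duplicates to form a meaningful consensus
        length = min(len(r) for r in members)
        for pos in range(prefix_len, length):
            column = [r[pos] for r in members]
            consensus, _ = Counter(column).most_common(1)[0]
            mismatches[pos] += sum(1 for base in column if base != consensus)
            totals[pos] += len(column)

    return {pos: 100.0 * mismatches[pos] / totals[pos] for pos in sorted(totals)}


# Made-up reads: four share the prefix "ACGT", one does not and is skipped.
reads = ["ACGTAAGG", "ACGTAAGG", "ACGTAAGC", "ACGTTAGG", "TTTTCCCC"]
profile = positional_error(reads)
```

A positional profile of this kind is what the abstract suggests could inform read trimming; averaging it over all positions would give a single whole-sample figure.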

Knowledge UChicago

Report of the 13th Genomic Standards Consortium Meeting, Shenzhen, China, March 4–7, 2012

Author: Bao Yiming
Davies Neil
Edmunds Scott C.
Field Dawn
Garrity George M.
Gilbert Jack A.
Meyer Folker
Mizrachi Ilene
Moreau Corrie
Morrison Norman
Robbins Robert
Sansone Susanna-Assunta
Schriml Lynn M.
Smith Daniel P.
Sterk Peter
Wang Hui
Wilkening Jared
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

This report details the outcome of the 13th Meeting of the Genomic Standards Consortium. The three-day conference was held at the Kingkey Palace Hotel, Shenzhen, China, on March 5–7, 2012, and was hosted by the Beijing Genomics Institute. The meeting, titled From Genomes to Interactions to Communities to Models, highlighted the role of data standards associated with genomic, metagenomic, and amplicon sequence data and the contextual information associated with the sample. To this end the meeting focused on genomic projects for animals, plants, fungi, and viruses; metagenomic studies in host-microbe interactions; and the dynamics of microbial communities. In addition, the meeting hosted a Genomic Observatories Network session, a Genomic Standards Consortium biodiversity working group session, and a Microbiology of the Built Environment session sponsored by the Alfred P. Sloan Foundation.

NERC Open Research Archive

A Platform-Independent Method for Detecting Errors in Metagenomic Sequencing Data: DRISEE

Author: Andreas Wilke
AR Quinlan
B Ewing
B Niu
C Quince
C von Mering
DH Huson
EA Dinsdale
F Meyer
Folker Meyer
HC Bravo
J Reeder
Jared Wilkening
JC Dohm
JG Caporaso
Kevin P. Keegan
KJ Hoff
KJ McKernan
M Margulies
Mark D'Souza
MJ Pallen
MP Cox
PJ Cock
R Seshadri
RA Freitas
RC Edgar
Scott Markel
SG Tringe
SM Huse
TD Harris
Travis Harrison
V Gomez-Alvarez
V Kunin
VM Markowitz
WC Kao
William L. Trimble
Y Sun
Publication venue: Public Library of Science
Publication date: 07/06/2012

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

The Francis Crick Institute

The M5nr: a novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools

Author: Field Dawn
Glass Elizabeth M.
Harrison Travis
Kyrpides Nikos
Mavrommatis Konstantinos
Meyer Folker
Wilke Andreas
Wilkening Jared
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Background: Computing of sequence similarity results is becoming a limiting factor in metagenome analysis. Sequence similarity search results encoded in an open, exchangeable format have the potential to limit the needs for computational reanalysis of these data sets. A prerequisite for sharing of similarity results is a common reference. Description: We introduce a mechanism for automatically maintaining a comprehensive, non-redundant protein database and for creating a quarterly release of this resource. In addition, we present tools for translating similarity searches into many annotation namespaces, e.g. KEGG or NCBI's GenBank. Conclusions: The data and tools we present allow the creation of multiple result sets using a single computation, permitting computational results to be shared between groups for large sequence data sets.
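
One way to picture a non-redundant protein collection with annotations from several sources is sketched below: identical sequences collapse to one entry keyed by a digest of the sequence, and each entry keeps every (source, identifier) pair that pointed at it. The function name `build_nonredundant`, the use of an MD5 digest as the key, the record layout, and the source names are assumptions for illustration. The abstract does not describe the actual data structure.

```python
import hashlib
from collections import defaultdict


def build_nonredundant(records):
    """Collapse identical sequences; keep every (source, identifier) per sequence.

    `records` is an iterable of (source, identifier, sequence) tuples.
    Hypothetical layout: the digest-keyed dictionary is an illustration only.
    """
    index = defaultdict(lambda: {"sequence": None, "annotations": []})
    for source, identifier, sequence in records:
        normalized = sequence.strip().upper()
        key = hashlib.md5(normalized.encode("ascii")).hexdigest()
        index[key]["sequence"] = normalized
        index[key]["annotations"].append((source, identifier))
    return dict(index)


# Made-up records: the first two carry the same protein under different sources.
records = [
    ("SourceA", "a1", "MKTAYIAK"),
    ("SourceB", "b7", "mktayiak"),
    ("SourceA", "a2", "MSSHEGGK"),
]
nr = build_nonredundant(records)
```

With a shared key of this kind, a single similarity search against the collapsed set can be reported in any source's namespace by looking up the annotations stored under the matching entry.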

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

NERC Open Research Archive

DRISEE error profiles for metagenomic sequencing data sets.

Author: Andreas Wilke (42726)
Folker Meyer (11364)
Jared Wilkening (159286)
Kevin P. Keegan (159283)
Mark D'Souza (14094)
Travis Harrison (48999)
William L. Trimble (159285)

Total (% substitutions + % insertions + % deletions) DRISEE error (Y-axis) as a function of read position (X-axis) for all considered reads. (a) and (b): Phred vs. DRISEE: Total DRISEE (red) and average Phred (blue) derived errors (Q values converted to percent error) for (a) 20 metagenomic 454 samples and (b) 12 metagenomic Illumina samples. (c): DRISEE total error of several Illumina-based sample sets: DRISEE total error profiles are displayed for 5 different Illumina experiments/sample sets. Parentheses indicate the number of samples in each experiment/sample set. (d): DRISEE total error of single samples: DRISEE total error profiles are displayed for two individual samples. The samples represent the lowest and highest averaged DRISEE total errors (averaged across all read positions), observed in Sample Set 3 (see Figure 4c above; http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002541#pcbi-1002541-g004). Pie charts indicate a summary of MG-RAST-based annotation of the two samples. The upper pie chart was produced from the data set that corresponds to the purple DRISEE profile (average DRISEE error = 45%). The lower pie chart corresponds to annotation of the data set that produced the green DRISEE profile (average DRISEE error = 1%).
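
The caption mentions Q values converted to percent error and a total error formed by summing three percentages. A minimal sketch of both calculations follows, assuming the standard Phred definition (error probability = 10^(-Q/10)). The function names are illustrative and do not come from the source.

```python
def phred_to_percent_error(q):
    """Convert a Phred quality score Q to an expected error in percent."""
    return 100.0 * 10 ** (-q / 10.0)


def total_error(substitutions, insertions, deletions):
    """Total error (%) as the sum of the three per-type percentages."""
    return substitutions + insertions + deletions


# Q10, Q20 and Q30 correspond to roughly 10%, 1% and 0.1% expected error.
phred_errors = [phred_to_percent_error(q) for q in (10, 20, 30)]
```

Putting both quantities on the same percent scale is what allows a Phred-derived curve and a DRISEE curve to share one Y-axis.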

The Francis Crick Institute

(a) Error detection capabilities of Score, Reference-genome, and DRISEE methods.

Author: Andreas Wilke (42726)
Folker Meyer (11364)
Jared Wilkening (159286)
Kevin P. Keegan (159283)
Mark D'Souza (14094)
Travis Harrison (48999)
William L. Trimble (159285)

(1) Simplified procedural diagram of a typical sequencing protocol. Sample collection: first, the biological sample is collected. Extraction/initial purification: then the RNA/DNA undergoes extraction and initial purification procedures. Pre-sequencing amplification(s): next, the extracted genetic material may undergo amplification (e.g. whole genome amplification – see main text) followed by additional purifications and/or other processing procedures. “Sequencing”: genetic material is placed in the sequencer itself and is sequenced. Note that sequencing itself frequently involves additional rounds of amplification. Analyses of sequencing output: sequencer outputs are analyzed. (2) Given a procedure such as A, the portion of the procedure over which score/Phred-based methods can detect error is indicated in red. (3) Given a procedure such as A, the portion of the procedure over which reference-genome-based methods can detect error is indicated in green. Note that reference-genome-based methods are only applicable to single genome data; they cannot consider metagenomic data. (4) Given a procedure such as A, the portion of the procedure over which DRISEE-based methods can detect error is indicated in blue. Note that DRISEE methods can be applied to metagenomic or genomic data, provided that certain requirements are met. See methods. 1: BMC Bioinformatics. 2008 Sep 19;9:386. 2: Nat Methods. 2010 May;7(5):335–6. Epub 2010 Apr 11. (b) DRISEE workflow. The steps in a typical DRISEE workflow are depicted and briefly described (in figure captions). Please see Text S1 (http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002541#pcbi.1002541.s002; Supplemental Methods, Typical DRISEE workflow) for a much more detailed description of each depicted step.

The Francis Crick Institute

A RESTful API for Accessing Microbial Community Data for MG-RAST

Author: Andreas Wilke (42726)
Elizabeth M. Glass (11353)
Folker Meyer (11364)
Hunter Matthews (682430)
Jared Bischof (255828)
Jared Wilkening (159286)
Mark D'Souza (14094)
Narayan Desai (682431)
Tobias Paczian (42722)
Tom Brettin (1557)
Travis Harrison (48999)
Wolfgang Gerlach (539010)
Publication date: 01/01/2015

Metagenomic sequencing has produced significant amounts of data in recent years. For example, as of summer 2013, MG-RAST has been used to annotate over 110,000 data sets totaling over 43 Terabases. With metagenomic sequencing finding even wider adoption in the scientific community, the existing web-based analysis tools and infrastructure in MG-RAST provide limited capability for data retrieval and analysis, such as comparative analysis between multiple data sets. Moreover, although the system provides many analysis tools, it is not comprehensive. By opening MG-RAST up via a web services API (application programmers interface) we have greatly expanded access to MG-RAST data, as well as provided a mechanism for the use of third-party analysis tools with MG-RAST data. This RESTful API makes all data and data objects created by the MG-RAST pipeline accessible as JSON objects. As part of the DOE Systems Biology Knowledgebase project (KBase, http://kbase.us) we have implemented a web services API for MG-RAST. This API complements the existing MG-RAST web interface and constitutes the basis of KBase's microbial community capabilities. In addition, the API exposes a comprehensive collection of data to programmers. This API, which uses a RESTful (Representational State Transfer) implementation, is compatible with most programming environments and should be easy to use for end users and third parties. It provides comprehensive access to sequence data, quality control results, annotations, and many other data types. Where feasible, we have used standards to expose data and metadata. Code examples are provided in a number of languages both to show the versatility of the API and to provide a starting point for users. We present an API that exposes the data in MG-RAST for consumption by our users, greatly enhancing the utility of the MG-RAST service.
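
To make the "data objects as JSON" idea concrete, the sketch below composes a REST-style request URL and decodes a JSON response body using only the Python standard library. The base URL `https://api.example.org/`, the resource name `metagenome`, the `verbosity` parameter, and the sample response are placeholders assumed for illustration. They are not taken from the abstract, and the real service's documentation should be consulted for actual endpoints. No network request is made.

```python
import json
from urllib.parse import urlencode, urljoin

# Placeholder base URL: an assumption for illustration, not the real service address.
BASE_URL = "https://api.example.org/"


def build_request_url(resource, identifier, **params):
    """Compose a REST-style URL of the form <base>/<resource>/<id>?<params>."""
    url = urljoin(BASE_URL, f"{resource}/{identifier}")
    return f"{url}?{urlencode(params)}" if params else url


def parse_response(body):
    """Decode a JSON response body into a Python dictionary."""
    return json.loads(body)


url = build_request_url("metagenome", "example-id", verbosity="minimal")
# Made-up response body standing in for what a server might return.
sample = parse_response('{"id": "example-id", "name": "demo sample"}')
```

Because the response is plain JSON over HTTP, the same two steps (build a URL, decode the body) carry over to most programming environments, which is the compatibility point the abstract makes.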

Directory of Open Access Journals

PubMed Central

The Francis Crick Institute

Total DRISEE errors of genomic and metagenomic data produced by 454 and Illumina technologies.

Author: Andreas Wilke (42726)
Folker Meyer (11364)
Jared Wilkening (159286)
Kevin P. Keegan (159283)
Mark D'Souza (14094)
Travis Harrison (48999)
William L. Trimble (159285)

A boxplot (conventional five-number summary) presents the distribution of averaged total DRISEE errors observed among 476 sequencing samples. The average total DRISEE error is plotted on the Y-axis. X-axis labels indicate the technology (454 or Illumina), the type of sample (shotgun genomic or shotgun metagenomic), and, in parentheses, the number of samples represented by each individual boxplot. Gray highlight indicates the range of values that have been previously reported for error on 454 and Illumina sequencing platforms (0.25–4%).
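
For readers unfamiliar with the five-number summary behind a boxplot, here is a minimal sketch using Python's `statistics` module. The input values are made up, and the choice of the "inclusive" quartile method is an assumption, since the caption does not say which quartile convention was used.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


# Made-up per-sample average errors (%), for illustration only.
summary = five_number_summary([3.0, 1.0, 5.0, 2.0, 4.0])
```

Each box in such a plot spans the first to third quartile, with the median marked inside and the whiskers extending toward the minimum and maximum.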

The Francis Crick Institute

DRISEE calculated Errors, separated by error type, for 454 and Illumina metagenomic samples.

Author: Andreas Wilke (42726)
Folker Meyer (11364)
Jared Wilkening (159286)
Kevin P. Keegan (159283)
Mark D'Souza (14094)
Travis Harrison (48999)
William L. Trimble (159285)

DRISEE error profiles are displayed for metagenomic data produced by the 454 (65 samples, (a)) and Illumina (127 samples, (b)) platforms. DRISEE-determined errors (Y-axis) are plotted with respect to read position (X-axis). DRISEE errors are displayed as total (black) and separated by type (A_sub = A substitutions, T_sub = T substitutions, C_sub = C substitutions, G_sub = G substitutions, and InDel = insertions and deletions).

The Francis Crick Institute