101 research outputs found

Advancing Benchmarks for Genome Sequencing

Author: Salit Marc
Zook Justin M.
Publication venue: Elsevier Inc.
Publication date: 23/09/2015

Several recent benchmarking efforts provide reference datasets and samples to improve genome sequencing and calling of germline and somatic mutations

Elsevier - Publisher Connector

Learning from microarray interlaboratory studies: measures of precision for gene expression

Author: Duewer David L
Jones Wendell D
Reid Laura H
Salit Marc
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: The ability to demonstrate the reproducibility of gene expression microarray results is a critical consideration for the use of microarray technology in clinical applications. While studies have asserted that microarray data can be "highly reproducible" under given conditions, there is little ability to quantitatively compare amongst the various metrics and terminology used to characterize and express measurement performance. Use of standardized conceptual tools can greatly facilitate communication among the user, developer, and regulator stakeholders of the microarray community. While shaped by less highly multiplexed systems, measurement science (metrology) is devoted to establishing a coherent and internationally recognized vocabulary and quantitative practice for the characterization of measurement processes. Results: The two independent aspects of the metrological concept of "accuracy" are "trueness" (closeness of a measurement to an accepted reference value) and "precision" (the closeness of measurement results to each other). A carefully designed collaborative study enables estimation of a variety of gene expression measurement precision metrics: repeatability, several flavors of intermediate precision, and reproducibility. The three 2004 Expression Analysis Pilot Proficiency Test collaborative studies, each with 13 to 16 participants, provide triplicate microarray measurements on each of two reference RNA pools. Using and modestly extending the consensus ISO 5725 documentary standard, we evaluate the metrological precision figures of merit for individual microarray signal measurement, building from calculations appropriate to single measurement processes, such as technical replicate expression values for individual probes on a microarray, to the estimation and display of precision functions representing all of the probes in a given platform.
Conclusion: With only modest extensions, the established metrological framework can be fruitfully used to characterize the measurement performance of microarray and other highly multiplexed systems. Precision functions, summarizing routine precision metrics estimated from appropriately repeated measurements of one or more reference materials as functions of signal level, are demonstrated and merit further development for characterizing measurement platforms, monitoring changes in measurement system performance, and comparing performance among laboratories or analysts.
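
The abstract above names repeatability and reproducibility as precision metrics but gives no formulas. As a rough illustration only, the sketch below uses the usual balanced one-way layout associated with ISO 5725-2 (equal replicate counts per laboratory). The function name `precision_estimates` and the example values are hypothetical and are not taken from the paper.

```python
from statistics import mean, variance


def precision_estimates(labs):
    """Estimate repeatability and reproducibility standard deviations.

    labs: list of lists, one inner list of replicate measurements per
    laboratory. Assumes a balanced design (same replicate count everywhere).
    Returns (s_r, s_R).
    """
    n = len(labs[0])
    if len(labs) < 2 or n < 2 or any(len(lab) != n for lab in labs):
        raise ValueError("need >= 2 labs, each with the same >= 2 replicates")

    # Repeatability variance: pooled within-laboratory variance.
    s_r2 = mean([variance(lab) for lab in labs])

    # Between-laboratory variance: variance of lab means, corrected for the
    # within-lab contribution; clipped at zero if the estimate is negative.
    lab_means = [mean(lab) for lab in labs]
    s_l2 = max(0.0, variance(lab_means) - s_r2 / n)

    # Reproducibility variance combines both components.
    s_R2 = s_l2 + s_r2
    return s_r2 ** 0.5, s_R2 ** 0.5


# Hypothetical triplicate signal values from three laboratories.
example = [[10.0, 10.2, 9.8], [10.5, 10.7, 10.3], [9.5, 9.7, 9.3]]
s_r, s_R = precision_estimates(example)
print(f"repeatability SD = {s_r:.3f}, reproducibility SD = {s_R:.3f}")
```

Evaluating such estimates probe by probe and plotting them against signal level would be one way to approach the "precision functions" the abstract describes, though the paper's own construction may differ.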

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Transcript-based redefinition of grouped oligonucleotide probe sets using AceView: High-resolution annotation for microarrays

Author: Cam Margaret C
Lee Joseph C
Lu Jun
Salit Marc L
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: Extracting biological information from high-density Affymetrix arrays is a multi-step process that begins with the accurate annotation of microarray probes. Shortfalls in the original Affymetrix probe annotation have been described; however, few studies have provided rigorous solutions for routine data analysis. RESULTS: Using AceView, a comprehensive human transcript database, we have reannotated the probes by matching them to RNA transcripts instead of genes. Based on this transcript-level annotation, a new probe set definition was created in which every probe in a probe set maps to a common set of AceView gene transcripts. In addition, using artificial data sets we identified that a minimal probe set size of 4 is necessary for reliable statistical summarization. We further demonstrate that applying the new probe set definition can detect specific transcript variants contributing to differential expression and it also improves cross-platform concordance. CONCLUSION: We conclude that our transcript-level reannotation and redefinition of probe sets complement the original Affymetrix design. Redefinitions introduce probe sets whose sizes may not support reliable statistical summarization; therefore, we advocate using our transcript-level mapping redefinition in a secondary analysis step rather than as a replacement. Knowing which specific transcripts are differentially expressed is important to properly design probe/primer pairs for validation purposes. For convenience, we have created custom chip-description-files (CDFs) and annotation files for our new probe set definitions that are compatible with Bioconductor, Affymetrix Expression Console or third party software
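
The probe set redefinition described above groups probes so that every probe in a set maps to a common set of transcripts, and the abstract reports that a minimum set size of 4 is needed for reliable summarization. A minimal sketch of that grouping idea follows. The probe and transcript identifiers, the function names, and the input mapping are all hypothetical; the authors' actual pipeline and file formats are not shown in the source.

```python
from collections import defaultdict

MIN_PROBES = 4  # minimal probe set size reported in the abstract


def redefine_probe_sets(probe_to_transcripts):
    """Group probes that map to exactly the same set of transcripts.

    probe_to_transcripts: dict mapping a probe ID to an iterable of
    transcript IDs. Probes with no matching transcript are skipped.
    Returns a dict keyed by frozenset of transcript IDs.
    """
    groups = defaultdict(list)
    for probe, transcripts in probe_to_transcripts.items():
        key = frozenset(transcripts)
        if key:
            groups[key].append(probe)
    return dict(groups)


def split_by_size(groups, min_probes=MIN_PROBES):
    """Separate probe sets large enough for summarization from small ones."""
    reliable = {k: v for k, v in groups.items() if len(v) >= min_probes}
    small = {k: v for k, v in groups.items() if len(v) < min_probes}
    return reliable, small


# Hypothetical mapping: four probes hit both variants, one hits variant a only.
mapping = {
    "p1": ["geneX.a", "geneX.b"],
    "p2": ["geneX.a", "geneX.b"],
    "p3": ["geneX.b", "geneX.a"],
    "p4": ["geneX.a", "geneX.b"],
    "p5": ["geneX.a"],
    "p6": [],
}
reliable, small = split_by_size(redefine_probe_sets(mapping))
print(len(reliable), "reliable set(s);", len(small), "small set(s)")
```

Keeping the small sets separate rather than discarding them mirrors the abstract's recommendation to use the redefinition in a secondary analysis step.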

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls

Author: Chapman Brad Alan
Hide Winston
Hofmann Oliver Marc
Mittelman David
Salit Marc
Wang Jason
Zook Justin M
Publication venue: Springer Science and Business Media LLC
Publication date: 14/12/2013

Clinical adoption of human genome sequencing requires methods that output genotypes with known accuracy at millions or billions of positions across a genome. Because of substantial discordance among calls made by existing sequencing methods and algorithms, there is a need for a highly accurate set of genotypes across a genome that can be used as a benchmark. Here we present methods to make high-confidence, single-nucleotide polymorphism (SNP), indel and homozygous reference genotype calls for NA12878, the pilot genome for the Genome in a Bottle Consortium. We minimize bias toward any method by integrating and arbitrating between 14 data sets from five sequencing technologies, seven read mappers and three variant callers. We identify regions for which no confident genotype call could be made, and classify them into different categories based on reasons for uncertainty. Our genotype calls are publicly available on the Genome Comparison and Analytic Testing website to enable real-time benchmarking of any method

arXiv.org e-Print Archive

Harvard University - DASH

Using mixtures of biological samples as process controls for RNA-sequencing experiments

Author: Jennifer McDaniel
Jerod Parsons
Marc Salit
Michele Mehaffey
P. Scott Pine
Sarah Munro
Publication venue: Springer Nature
Publication date: 01/01/2015

Bland-Altman log-ratio (M) - log average (A) plots comparing gene expression in BLM-1 to BLM-2, which were mixed with a designed ratio of 1:1 brain RNA, 2:1 muscle RNA and 1:2 liver RNA. Points representing gene expression values for genes expressed at 5-fold greater levels in a specific tissue are colored based on the tissue in which they are selectively expressed. Non-tissue selective RNA are omitted for clarity. Library size normalization scales all libraries to a common total number of counts, while upper quartile normalization scales to the 75th percentile of the counts for each library. None of these normalizations accurately reflects the designed ratio of transcripts between samples. (PNG 473 kb)
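
The two normalizations named in the caption above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the function names and target values are hypothetical, and excluding zero counts before taking the 75th percentile is a common convention assumed here, not something the caption specifies.

```python
from statistics import quantiles


def library_size_normalize(counts, target=1_000_000):
    """Scale one library so that its counts sum to a common total."""
    total = sum(counts)
    if total == 0:
        raise ValueError("library has no counts")
    return [c * target / total for c in counts]


def upper_quartile_normalize(counts, target=1000.0):
    """Scale one library so that its 75th percentile equals a common value.

    Assumption: zero counts are excluded when computing the percentile.
    """
    nonzero = [c for c in counts if c > 0]
    if len(nonzero) < 2:
        raise ValueError("need at least two nonzero counts")
    q75 = quantiles(nonzero, n=4, method="inclusive")[2]
    return [c * target / q75 for c in counts]


# Hypothetical per-gene counts for a single library.
library = [0, 10, 20, 30, 40]
print(library_size_normalize(library))
print(upper_quartile_normalize(library))
```

Both functions rescale a library by a single global factor, so neither can recover a designed mixture ratio that differs by tissue, which is consistent with the caption's observation.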

Springer - Publisher Connector

FigShare

PEPR: pipelines for evaluating prokaryotic references

Author: Daniel V. Samarov
Justin M. Zook
Marc L. Salit
Nathan D. Olson
Scott A. Jackson
Publication venue: Springer Science and Business Media LLC
Publication date

Crossref

A framework for assessing 16S rRNA marker-gene survey data analysis methods using mixtures.

Author: Braccia Domenick J.
Corrada Bravo Hector
Hao Stephanie
Kumar M. Senthil
Li Shan
Olson Nathan D.
Salit Marc L.
Stine O. Colin
Timp Winston
Publication venue: Springer Nature
Publication date: 13/03/2020

There are a variety of bioinformatic pipelines and downstream analysis methods for analyzing 16S rRNA marker-gene surveys. However, appropriate assessment datasets and metrics are needed as there is limited guidance to decide between available analysis methods. Mixtures of environmental samples are useful for assessing analysis methods as one can evaluate methods based on calculated expected values using unmixed sample measurements and the mixture design. Previous studies have used mixtures of environmental samples to assess other sequencing methods such as RNAseq, but no studies have used mixtures of environmental samples to assess 16S rRNA sequencing. We developed a framework for assessing 16S rRNA sequencing analysis methods which utilizes a novel two-sample titration mixture dataset and metrics to evaluate qualitative and quantitative characteristics of count tables. Our qualitative assessment evaluates feature presence/absence exploiting features only present in unmixed samples or titrations by testing if random sampling can account for their observed relative abundance. Our quantitative assessment evaluates feature relative and differential abundance by comparing observed and expected values. We demonstrated the framework by evaluating count tables generated with three commonly used bioinformatic pipelines: (i) DADA2, a sequence inference method, (ii) Mothur, a de novo clustering method, and (iii) QIIME, an open-reference clustering method. The qualitative assessment results indicated that the majority of Mothur and QIIME features only present in unmixed samples or titrations were accounted for by random sampling alone, but this was not the case for DADA2 features. Combined with count table sparsity (proportion of zero-valued cells in a count table), these results indicate DADA2 has a higher false-negative rate whereas Mothur and QIIME have higher false-positive rates.
The quantitative assessment results indicated the observed relative abundance and differential abundance values were consistent with expected values for all three pipelines. We developed a novel framework for assessing 16S rRNA marker-gene survey methods and demonstrated the framework by evaluating count tables generated with three bioinformatic pipelines. This framework is a valuable community resource for assessing 16S rRNA marker-gene survey bioinformatic methods and will help scientists identify appropriate analysis methods for their marker-gene surveys. https://doi.org/10.1186/s40168-020-00812-
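
The abstract above relies on expected values calculated from unmixed sample measurements and the mixture design. A minimal sketch of that calculation for a two-sample mixture is given below, assuming the expected relative abundance of a feature is the proportion-weighted average of its relative abundance in the two unmixed samples. The function name, the feature labels, and the mixing proportion `theta` are hypothetical; the abstract does not state the titration proportions used in the study.

```python
def expected_relative_abundance(unmixed_a, unmixed_b, theta):
    """Expected per-feature relative abundance in a two-sample mixture.

    unmixed_a, unmixed_b: dicts mapping feature ID to relative abundance in
    each unmixed sample (features missing from a dict count as 0).
    theta: proportion of sample A in the mixture, between 0 and 1.
    """
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must be between 0 and 1")
    features = set(unmixed_a) | set(unmixed_b)
    return {
        f: theta * unmixed_a.get(f, 0.0) + (1.0 - theta) * unmixed_b.get(f, 0.0)
        for f in features
    }


# Hypothetical unmixed profiles; f1 is specific to A and f3 to B.
sample_a = {"f1": 0.8, "f2": 0.2}
sample_b = {"f2": 0.5, "f3": 0.5}
expected = expected_relative_abundance(sample_a, sample_b, theta=0.25)
for feature in sorted(expected):
    print(feature, round(expected[feature], 3))
```

Comparing observed relative abundances from a pipeline's count table against values like these is the kind of quantitative check the abstract describes; features specific to one unmixed sample (such as `f1` and `f3` here) are the ones a presence/absence assessment would focus on.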

Digital Repository at the University of Maryland

Exploring the use of internal and external controls for assessing microarray technical performance

Author: A Kauffmann
AA Ellington
Affymetrix Inc
AJ Holloway
BM Bolstad
C Tomlinson
CA Ball
CL Yauk
David L Duewer
DL Duewer
EF Petricoin
FCP Holstege
GA Held
H van Bakel
Helen C Causton
IV Yang
J Brettschneider
Katrice A Lippa
L Gautier
Laurence Game
LH Reid
LJ Kricka
LM Shi
M Navarange
Marc L Salit
ML Salit
MN McCall
R Mansourian
RA Irizarry
SC Baker
SE Choe
TR Hughes
W Liggett
W Zhang
WD Tong
XH Fan
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: The maturing of gene expression microarray technology and interest in the use of microarray-based applications for clinical and diagnostic applications calls for quantitative measures of quality. This manuscript presents a retrospective study characterizing several approaches to assess technical performance of microarray data measured on the Affymetrix GeneChip platform, including whole-array metrics and information from a standard mixture of external spike-in and endogenous internal controls. Spike-in controls were found to carry the same information about technical performance as whole-array metrics and endogenous "housekeeping" genes. These results support the use of spike-in controls as general tools for performance assessment across time, experimenters and array batches, suggesting that they have potential for comparison of microarray data generated across species using different technologies. Results: A layered PCA modeling methodology that uses data from a number of classes of controls (spike-in hybridization, spike-in polyA+, internal RNA degradation, endogenous or "housekeeping genes") was used for the assessment of microarray data quality. The controls provide information on multiple stages of the experimental protocol (e.g., hybridization, RNA amplification). External spike-in, hybridization and RNA labeling controls provide information related to both assay and hybridization performance whereas internal endogenous controls provide quality information on the biological sample. We find that the variance of the data generated from the external and internal controls carries critical information about technical performance; the PCA dissection of this variance is consistent with whole-array quality assessment based on a number of quality assurance/quality control (QA/QC) metrics. Conclusions: These results provide support for the use of both external and internal RNA control data to assess the technical quality of microarray experiments.
The observed consistency amongst the information carried by internal and external controls and whole-array quality measures offers promise for rationally-designed control standards for routine performance monitoring of multiplexed measurement platforms.

Crossref

Springer - Publisher Connector

Columbia University Academic Commons

Directory of Open Access Journals

PubMed Central

Achieving high-sensitivity for clinical applications using augmented exome sequencing

Author: Anil Patwardhan
Atul J. Butte
Carlos Bustamante
Christian Haudenschild
Deanna M. Church
Euan Ashley
Gabor Bartha
Gemma Chandratillake
Jason Harris
Jeanie Tirch
John West
Justin Zook
Marc Salit
Mark Pratt
Massimo Morra
Michael Clark
Michael Snyder
Ming Li
Nan Leng
Richard Chen
Russ Altman
Sarah Garcia
Scott Kirk
Shujun Luo
Stephen Chervitz
Publication venue: Springer Nature
Publication date: 01/01/2015

daSNVs in ACMG genes where inadequate coverage was observed among at least one platform, using WES/ACE data normalized to both 12 Gb and 100× mean coverage. (XLSX 312 kb)

Springer - Publisher Connector

FigShare