Removing batch effects for prediction problems with frozen surrogate variable analysis
Batch effects are responsible for the failure of promising genomic prognostic signatures, major ambiguities in published genomic results, and retractions of widely publicized findings. Batch effect corrections have been developed to remove these artifacts, but they are designed for population studies. However, genomic technologies are beginning to be used in clinical applications, where samples are analyzed one at a time for diagnostic, prognostic, and predictive purposes. No batch correction methods have so far been developed specifically for prediction. In this paper, we propose a new method, frozen surrogate variable analysis (fSVA), that borrows strength from a training set to batch-correct individual samples. We show that fSVA improves prediction accuracy in simulations and in public genomic studies. fSVA is available as part of the sva Bioconductor package.
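The single-sample idea behind fSVA can be illustrated with a simplified numpy sketch: estimate a batch-associated direction on a training set, freeze it, and project it out of each new sample. This is a toy rank-one illustration of the "borrow strength from a training set" idea under simulated data, not the fSVA algorithm itself; all names and parameters below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training data: genes x samples, with an additive hidden batch effect.
n_genes, n_train = 200, 40
batch = np.repeat([0, 1], n_train // 2)                 # hidden batch label
batch_effect = rng.normal(0, 1, n_genes)[:, None] * batch[None, :]
train = rng.normal(0, 1, (n_genes, n_train)) + batch_effect

# "Training" step: estimate one dominant surrogate direction from the
# centered training data and freeze it.
gene_means = train.mean(axis=1, keepdims=True)
U, s, Vt = np.linalg.svd(train - gene_means, full_matrices=False)
frozen_loadings = U[:, :1]                              # frozen gene-space direction

def correct_sample(x):
    """Project a single new sample onto the frozen surrogate direction
    and remove that component (the 'frozen' part: no refitting)."""
    centered = x - gene_means.ravel()
    coeff = frozen_loadings.ravel() @ centered          # scalar projection
    return x - coeff * frozen_loadings.ravel()

# A new sample from batch 1: after correction, its correlation with the
# batch-effect direction should shrink.
new = rng.normal(0, 1, n_genes) + batch_effect[:, -1]
before = abs(np.corrcoef(new - gene_means.ravel(), batch_effect[:, -1])[0, 1])
after = abs(np.corrcoef(correct_sample(new) - gene_means.ravel(),
                        batch_effect[:, -1])[0, 1])
print(f"|corr with batch direction| before={before:.2f} after={after:.2f}")
```

The design point mirrored here is that the new sample never contributes to estimating the surrogate direction, which is what makes one-at-a-time clinical use possible.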
Transcriptomics in Toxicogenomics, Part II: Preprocessing and Differential Expression Analysis for High Quality Data
Preprocessing of transcriptomics data plays a pivotal role in the development of toxicogenomics-driven tools for chemical toxicity assessment. The generation and exploitation of large volumes of molecular profiles, following an appropriate experimental design, allows the employment of toxicogenomics (TGx) approaches for a thorough characterisation of the mechanism of action (MOA) of different compounds. To date, a plethora of data preprocessing methodologies have been suggested. However, in most cases, building the optimal analytical workflow is not straightforward, and the right tools must be selected carefully, since the choice affects the downstream analyses and modelling approaches. Transcriptomics data preprocessing spans multiple steps, such as quality check, filtering, normalization, and batch effect detection and correction. Currently, there is a lack of standard guidelines for data preprocessing in the TGx field. Defining the optimal tools and procedures for transcriptomics data preprocessing will lead to the generation of homogeneous and unbiased data, allowing the development of more reliable, robust, and accurate predictive models. In this review, we outline preprocessing methods for three main transcriptomic technologies: microarray, bulk RNA-Sequencing (RNA-Seq), and single-cell RNA-Sequencing (scRNA-Seq). Moreover, we discuss the most common methods for identifying differentially expressed genes and for performing functional enrichment analysis. This review is the second part of a three-article series on Transcriptomics in Toxicogenomics.
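Two of the preprocessing steps this abstract lists, filtering and normalization, can be sketched for bulk RNA-Seq counts. The thresholds and the simulated count matrix below are illustrative assumptions, not recommendations from the review.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy count matrix: genes x samples (simulated, overdispersed counts).
counts = rng.poisson(lam=rng.gamma(2.0, 10.0, size=(500, 6)))

# Step 1 (filtering): keep genes with at least 10 counts in half the samples
# (the cutoff of 10 is an arbitrary illustrative choice).
keep = (counts >= 10).sum(axis=1) >= counts.shape[1] / 2
filtered = counts[keep]

# Step 2 (normalization): counts-per-million with a log2 transform;
# the +1 pseudocount avoids taking log(0).
library_sizes = filtered.sum(axis=0, keepdims=True)
log_cpm = np.log2(filtered / library_sizes * 1e6 + 1)

print(filtered.shape, round(float(log_cpm.mean()), 2))
```

Filtering before normalization matters here: dropping near-zero genes first keeps the library sizes and the log-CPM scale from being dominated by uninformative rows.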
An evaluation of processing methods for HumanMethylation450 BeadChip data
Background: Illumina's HumanMethylation450 arrays provide the most cost-effective means of high-throughput DNA methylation analysis. As with other microarray platforms, technical artifacts are a concern, including background fluorescence, dye bias from the use of two color channels, bias caused by the type I/II probe designs, and batch effects. Several approaches and pipelines have been developed, either targeting a single issue or addressing multiple biases through a combination of methods. We evaluate the effect of combining separate approaches to improve signal processing.
Results: In this study, nine processing methods, including both within- and between-array methods, are applied and compared in four datasets. For technical replicates, we found that within- and between-array methods did a comparable job of reducing variance across replicates. For evaluating biological differences, within-array processing always improved differential DNA methylation signal detection over no processing, and always benefitted from performing background correction first. Combinations of within-array procedures were always among the best-performing methods, with a slight advantage for the between-array method Funnorm when batch effects explained more variation in the data than the methylation alterations between cases and controls. When this occurred, however, RUVm, a new batch-correction method, noticeably improved the reproducibility of differential methylation results over any of the signal-processing methods alone.
Conclusions: The comparisons in our study provide valuable insights into preprocessing HumanMethylation450 BeadChip data. We found that the within-array combination of Noob + BMIQ always improved signal sensitivity, and when combined with the RUVm batch-correction method, it outperformed all other approaches in differential DNA methylation analysis. The effect of the data processing method, in any given dataset, was a function of both the signal and the noise.
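For context on what these pipelines operate on, the beta- and M-values commonly derived from a 450K probe's methylated and unmethylated channel intensities can be sketched as follows. The intensities are simulated, and the offset of 100 in the beta-value is a commonly used stabilising constant, not a value taken from this study.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical methylated (M) and unmethylated (U) intensities for 5 probes.
meth = rng.uniform(100, 5000, size=5)
unmeth = rng.uniform(100, 5000, size=5)

# Beta-value: fraction methylated, with an offset of 100 to stabilise
# probes whose total intensity is low.
beta = meth / (meth + unmeth + 100)

# M-value: log ratio of the two channels (with pseudocounts), often
# preferred for differential testing because its variance is more
# homogeneous across the methylation range.
m_value = np.log2((meth + 1) / (unmeth + 1))

print(np.round(beta, 3), np.round(m_value, 3))
```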
Microarray Data Preprocessing: From Experimental Design to Differential Analysis
DNA microarray data preprocessing is of utmost importance in the analytical path starting from the experimental design and leading to a reliable biological interpretation. In fact, once all relevant aspects of the experimental plan have been considered, the subsequent steps, from data quality check to differential analysis, will lead to robust, trustworthy results. In this chapter, all the relevant aspects of and considerations about microarray preprocessing are discussed. Preprocessing steps are organized in an orderly manner, from experimental design to quality check and batch effect removal, including the most common visualization methods. Furthermore, we discuss data representation and differential testing methods, with a focus on the most common microarray technologies, such as gene expression and DNA methylation.
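As one concrete example of a between-array normalization step that often appears in such microarray workflows, quantile normalization can be written in a few lines of numpy. This is a generic illustration of the technique, not a method prescribed by the chapter; the toy matrix is made up.

```python
import numpy as np

def quantile_normalize(x):
    """Force every array (column) to share the same empirical
    distribution: the mean of the per-column sorted values."""
    ranks = x.argsort(axis=0).argsort(axis=0)       # rank of each value per column
    reference = np.sort(x, axis=0).mean(axis=1)     # mean distribution across arrays
    return reference[ranks]                         # map each rank to the reference

# Toy intensities: 4 probes measured on 3 arrays.
arrays = np.array([[5.0, 4.0, 3.0],
                   [2.0, 1.0, 4.0],
                   [3.0, 4.0, 6.0],
                   [4.0, 2.0, 8.0]])
normalized = quantile_normalize(arrays)
print(normalized)
```

After normalization every column is a permutation of the same reference values, so per-array distributional differences (e.g. overall intensity shifts) are removed while the within-array ranking is preserved.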
V-SVA: an R Shiny application for detecting and annotating hidden sources of variation in single-cell RNA-seq data.
SUMMARY: Single-cell RNA-sequencing (scRNA-seq) technology enables studying gene expression programs from individual cells. However, these data are subject to diverse sources of variation, including 'unwanted' variation that needs to be removed in downstream analyses (e.g. batch effects) and 'wanted' or biological sources of variation (e.g. variation associated with a cell type) that need to be precisely described. Surrogate variable analysis (SVA)-based algorithms are commonly used for batch correction and, more recently, for studying 'wanted' variation in scRNA-seq data. However, interpreting whether these variables are biologically meaningful or stem from technical artifacts remains a challenge. To facilitate the interpretation of surrogate variables detected by algorithms such as IA-SVA, SVA, or ZINB-WaVE, we developed an R Shiny application, Visual Surrogate Variable Analysis (V-SVA), that provides a web-browser interface for the identification and annotation of hidden sources of variation in scRNA-seq data. This interactive framework includes tools for the discovery of genes associated with detected sources of variation, gene annotation using publicly available databases and gene sets, and data visualization using dimension-reduction methods.
AVAILABILITY AND IMPLEMENTATION: The V-SVA Shiny application is publicly hosted at https://vsva.jax.org/ and the source code is freely available at https://github.com/nlawlor/V-SVA.
CONTACT: [email protected] or [email protected].
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
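The general workflow V-SVA supports, detecting a hidden source of variation and ranking the genes associated with it, can be illustrated with a plain SVD as a stand-in for surrogate variable estimation. This sketch is not the IA-SVA/SVA/ZINB-WaVE machinery, and the data are simulated.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy expression matrix (cells x genes) with a hidden factor that drives
# only the first 20 genes (hypothetical data).
n_cells, n_genes = 100, 300
hidden = rng.normal(size=n_cells)
expr = rng.normal(size=(n_cells, n_genes))
expr[:, :20] += hidden[:, None] * 2.0

# "Detect" the dominant hidden source of variation with a rank-1 SVD
# of the column-centered matrix.
centered = expr - expr.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
surrogate = U[:, 0] * s[0]                 # per-cell surrogate variable

# "Annotate": rank genes by |correlation| with the surrogate variable;
# the driven genes should dominate the top of the list.
corr = np.array([np.corrcoef(surrogate, centered[:, g])[0, 1]
                 for g in range(n_genes)])
top_genes = np.argsort(-np.abs(corr))[:20]
print(sorted(int(g) for g in top_genes))
```

The gene list produced this way is the kind of input one would then pass to enrichment tools to decide whether the detected variable is biological ('wanted') or technical ('unwanted').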