
A comparative study of different strategies of batch effect removal in microarray data: a case study of three datasets

Abstract

Batch effects are systematic non-biological variability introduced by experimental design and sample processing in microarray experiments. They are common in microarray data and, if ignored, can bias the analysis. Many batch effect removal methods have been developed. Previous comparative work has focused on how effectively these methods remove batch effects and on their impact on downstream classification analysis. The most common type of analysis for microarray data is differential expression (DE) analysis, which identifies markers that are significantly associated with the outcome of interest, yet no study has examined the impact of these methods on downstream DE analysis. In this project, we investigated the performance of five popular batch effect removal methods (mean-centering, ComBat_p, ComBat_n, SVA, and ratio-based methods) in reducing batch effects, and their impact on DE analysis, using three experimental datasets with different sources of batch effects. We found that the performance of these methods is data-dependent: the simple mean-centering method performed reasonably well in all three datasets, whereas the performance of more complicated algorithms such as ComBat could be unstable for certain datasets, so these should be applied with caution. Given a new dataset, we recommend either using the mean-centering method or, if possible, carefully investigating a few different batch effect removal methods and choosing the one that works best for the data. This study has important public health significance because better handling of batch effects in microarray data can reduce biased results and lead to improved biomarker identification.
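
As an illustration of the simplest of the methods named above, mean-centering can be sketched as subtracting, for each gene, the mean expression within each batch. The following is a minimal sketch using only the Python standard library, not the implementation used in the study. The function name `mean_center_by_batch`, the list-of-rows data layout, and the toy values are all assumptions made for illustration.

```python
from collections import defaultdict


def mean_center_by_batch(expr, batches):
    """Subtract each gene's per-batch mean from its expression values.

    expr: list of rows, one per gene; each row holds one value per sample.
    batches: list of batch labels, one per sample (same order as columns).
    Returns a new list of rows with the same shape; the input is not modified.
    (Hypothetical helper for illustration only.)
    """
    if any(len(row) != len(batches) for row in expr):
        raise ValueError("each gene row needs one value per sample")

    # Group sample (column) indices by batch label.
    columns = defaultdict(list)
    for j, label in enumerate(batches):
        columns[label].append(j)

    centered = []
    for row in expr:
        new_row = list(row)
        for idx in columns.values():
            # Mean of this gene over the samples in one batch.
            batch_mean = sum(row[j] for j in idx) / len(idx)
            for j in idx:
                new_row[j] = row[j] - batch_mean
        centered.append(new_row)
    return centered


# Toy data (made up): 2 genes x 4 samples; batch "B" is shifted upward by 2.
expr = [
    [1.0, 3.0, 3.0, 5.0],
    [2.0, 2.0, 4.0, 4.0],
]
batches = ["A", "A", "B", "B"]
centered = mean_center_by_batch(expr, batches)
print(centered)
```

After centering, every gene has mean zero within each batch, so the constant shift between batches "A" and "B" in the toy data disappears. Because the method removes any difference in batch means, it would also remove genuine biological differences if the outcome of interest were confounded with batch.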
