Abstract
Cytometry by time-of-flight (CyTOF) has emerged as a high-throughput single-cell
technology able to provide large samples of protein readouts. Already, there exists a
large pool of advanced high-dimensional analysis algorithms that explore the observed
heterogeneous distributions to make intriguing biological inferences. A fact largely
overlooked by these methods, however, is the effect of the established data
preprocessing pipeline on the distributions of the measured quantities. In this article,
we focus on randomization, a transformation used for improving data visualization,
which can negatively affect multivariate data analysis methods such as dimensionality
reduction, clustering, and network reconstruction algorithms. Our results indicate that
randomization should be used only for visualization purposes, but not in conjunction
with high-dimensional analytical tools.