Merging Mixture Components for Cell Population Identification in Flow Cytometry
We present a framework for the identification of cell subpopulations in
flow cytometry data based on merging mixture components using the
flowClust methodology. We show that the cluster merging algorithm
under our framework improves model fit and provides a better
estimate of the number of distinct cell subpopulations than
either Gaussian mixture models or flowClust, especially for
complicated flow cytometry data distributions. Our framework
allows the automated selection of the number of distinct cell
subpopulations and we are able to identify cases where the
algorithm fails, thus making it suitable for application in a high
throughput FCM analysis pipeline. Furthermore, we demonstrate a
method for summarizing complex merged cell subpopulations in a
simple manner that integrates with the existing flowClust
framework and enables downstream data analysis. We demonstrate the
performance of our framework on simulated and real FCM data. The
software is available in the flowMerge package through the
Bioconductor project.
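The idea of merging mixture components can be sketched with a small example. The abstract does not state which merging criterion is used, so the sketch below assumes an entropy-based rule: at each step, merge the pair of components whose combination leaves the soft clustering with the lowest entropy. The function names and the posterior probabilities are hypothetical and are not taken from the flowMerge package.

```python
import math

def entropy(posteriors):
    """Clustering entropy: -sum over events and components of p * log(p)."""
    return -sum(p * math.log(p) for row in posteriors for p in row if p > 0)

def merge_components(posteriors, i, j):
    """Merge components i and j by summing their posterior probabilities."""
    merged = []
    for row in posteriors:
        kept = [p for k, p in enumerate(row) if k not in (i, j)]
        merged.append(kept + [row[i] + row[j]])
    return merged

def best_merge(posteriors):
    """Return the pair whose merge gives the lowest entropy, and that entropy."""
    k = len(posteriors[0])
    candidates = [
        (entropy(merge_components(posteriors, i, j)), (i, j))
        for i in range(k)
        for j in range(i + 1, k)
    ]
    best_entropy, pair = min(candidates)
    return pair, best_entropy

# Hypothetical posteriors for four events under a 3-component fit.
# Components 0 and 1 overlap heavily; component 2 is well separated.
posteriors = [
    [0.50, 0.45, 0.05],
    [0.48, 0.50, 0.02],
    [0.03, 0.02, 0.95],
    [0.05, 0.05, 0.90],
]
pair, merged_entropy = best_merge(posteriors)
print(pair, merged_entropy < entropy(posteriors))
```

In this toy input the two overlapping components are selected for merging, which is the behaviour one would want when a single non-Gaussian population has been split across several components.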
A computational framework to emulate the human perspective in flow cytometric data analysis
Background: In recent years, intense research efforts have focused on developing methods for automated flow cytometric data analysis. However, while designing such applications, little or no attention has been paid to the human perspective that is absolutely central to the manual gating process of identifying and characterizing cell populations. In particular, the assumption of many common techniques that cell populations could be modeled reliably with pre-specified distributions may not hold true in real-life samples, which can have populations of arbitrary shapes and considerable inter-sample variation.
Results: To address this, we developed a new framework, flowScape, for emulating certain key aspects of the human perspective in analyzing flow data, which we implemented in multiple steps. First, flowScape creates a mathematically rigorous map of the high-dimensional flow data landscape based on dense and sparse regions defined by relative concentrations of events around modes. In the second step, these modal clusters are connected with a global hierarchical structure. This representation allows flowScape to perform ridgeline analysis for both traversing the landscape and isolating cell populations at different levels of resolution. Finally, we extended manual gating with a new capacity for constructing templates that can identify target populations in terms of their relative parameters, as opposed to the more commonly used absolute or physical parameters. This allows flowScape to apply such templates in batch mode for detecting the corresponding populations in a flexible, sample-specific manner. We also demonstrated different applications of our framework to flow data analysis and showed its superiority over other analytical methods.
Conclusions: The human perspective, built on top of intuition and experience, is a very important component of flow cytometric data analysis. By emulating some of its approaches and extending these with automation and rigor, flowScape provides a flexible and robust framework for computational cytomics.
Identification and visualization of multidimensional antigen-specific T-cell populations in polychromatic cytometry data.
An important aspect of immune monitoring for vaccine development, clinical trials, and research is the detection, measurement, and comparison of antigen-specific T-cells from subject samples under different conditions. Antigen-specific T-cells compose a very small fraction of total T-cells. Developments in cytometry technology over the past five years have enabled the measurement of single cells in a multivariate and high-throughput manner. This growth in both dimensionality and quantity of data continues to pose a challenge for effective identification and visualization of rare cell subsets, such as antigen-specific T-cells. Dimension reduction and feature extraction play a pivotal role in both identifying and visualizing cell populations of interest in large, multi-dimensional cytometry datasets. However, the automated identification and visualization of rare, high-dimensional cell subsets remains challenging. Here we demonstrate how a systematic and integrated approach combining targeted feature extraction with dimension reduction can be used to identify and visualize biological differences in rare, antigen-specific cell populations. By using OpenCyto to perform semi-automated gating and feature extraction of flow cytometry data, followed by dimensionality reduction with t-SNE, we are able to identify polyfunctional subpopulations of antigen-specific T-cells and visualize treatment-specific differences between them.
gEM/GANN: a multivariate computational strategy for auto-characterizing relationships between cellular and clinical phenotypes and predicting disease progression time using high-dimensional flow cytometry data
The dramatic increase in the complexity of flow cytometric datasets requires the development of new computation-based approaches that can maximize the amount of information derived and overcome the limitations of traditional gating strategies. Herein, we present a multivariate computational analysis of the HIV-infected flow cytometry datasets that were provided as part of the FlowCAP-IV Challenge using unsupervised and supervised learning techniques. Out of 383 samples (stimulated and unstimulated), 191 samples were used as a training set (34 individuals whose disease did not progress, and 157 individuals whose disease did progress). Using the results from the training set, the participants in the Challenge were then asked to predict the condition and progression time of the remaining individuals (45 ‘non-progressors’ and 147 ‘progressors’). To achieve this, we first scaled down data resolution. We then excluded doublet cells from the analysis using Expectation Maximization approaches. We then standardized all samples into histograms and used a Genetic Algorithm-Neural Network to extract feature sets from the datasets, the reliability of which was examined using WEKA-implemented classifiers. The selected feature set resulted in a high sensitivity and specificity for the discrimination of progressors and non-progressors in the training set (average True Positive Rate = 1.00 and average False Positive Rate = 0.033). The capacity of the feature set to predict real-time survival time was better when using data from the ‘unstimulated’ training set (r = 0.825). The p-values and 95% confidence interval logrank ratios between actual and predicted survival time in the test set were 0.682 and 0.9542±0.24 for the unstimulated dataset, and 0.4451 and 0.9173±0.23 for the stimulated dataset.
Our analytic strategy has demonstrated a promising capacity to extract useful information from complex flow cytometry datasets, despite a significant imbalance and variation between the training and test sets.
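For readers unfamiliar with the metrics quoted above, the following minimal sketch shows how a true positive rate and a false positive rate are computed from class labels. The labels are hypothetical and do not reproduce the reported values; the function name `rates` is an assumption made for illustration only.

```python
def rates(actual, predicted, positive="progressor"):
    """Return (true positive rate, false positive rate) for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    return tp / (tp + fn), fp / (fp + tn)

# Hypothetical labels: four progressors and three non-progressors.
actual = ["progressor"] * 4 + ["non-progressor"] * 3
# Every progressor is found; one non-progressor is misclassified.
predicted = ["progressor"] * 4 + ["progressor", "non-progressor", "non-progressor"]

tpr, fpr = rates(actual, predicted)
print(f"TPR = {tpr:.3f}, FPR = {fpr:.3f}")
```

With a class imbalance like the one described in the abstract, the false positive rate rests on a small number of negative samples, so a single misclassification moves it substantially.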
Clinico-pathological and transcriptomic determinants of SLFN11 expression in invasive breast carcinoma
Agile workflow for interactive analysis of mass cytometry data
Motivation: Single-cell proteomics technologies, such as mass cytometry, have enabled characterization of cell-to-cell variation and cell populations at a single-cell resolution. These large amounts of data require dedicated, interactive tools for translating the data into knowledge. Results: We present a comprehensive, interactive method called Cyto to streamline analysis of large-scale cytometry data. Cyto is a workflow-based open-source solution that automates the use of state-of-the-art single-cell analysis methods with interactive visualization. We show the utility of Cyto by applying it to mass cytometry data from peripheral blood and high-grade serous ovarian cancer (HGSOC) samples. Our results show that Cyto is able to reliably capture the immune cell sub-populations from peripheral blood and cellular compositions of unique immune- and cancer cell subpopulations in HGSOC tumor and ascites samples.
MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data
Single-cell transcriptomics reveals gene expression heterogeneity but suffers from stochastic dropout and characteristic bimodal expression distributions in which expression is either strongly non-zero or non-detectable. We propose a two-part, generalized linear model for such bimodal data that parameterizes both of these features. We argue that the cellular detection rate, the fraction of genes expressed in a cell, should be adjusted for as a source of nuisance variation. Our model provides gene set enrichment analysis tailored to single-cell data. It provides insights into how networks of co-expressed genes evolve across an experimental treatment. MAST is available at https://github.com/RGLab/MAST
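The two quantities named in this abstract can be illustrated with a short sketch. It computes the cellular detection rate as defined above (the fraction of genes expressed in a cell) and summarizes one gene in two parts: how often it is detected, and its mean expression where detected. This is not the MAST API; the function names and the expression matrix are hypothetical, and "expressed" is assumed here to mean a value greater than zero.

```python
def cellular_detection_rate(cell):
    """Fraction of genes with non-zero expression in one cell."""
    return sum(v > 0 for v in cell) / len(cell)

def two_part_summary(values):
    """Summarize one gene: detection frequency and mean of detected values."""
    detected = [v for v in values if v > 0]
    rate = len(detected) / len(values)
    mean_positive = sum(detected) / len(detected) if detected else None
    return rate, mean_positive

# Hypothetical expression matrix: rows are cells, columns are genes.
expression = [
    [0.0, 2.0, 3.0, 0.0],
    [1.0, 0.0, 4.0, 0.0],
    [0.0, 0.0, 5.0, 0.0],
]

cdr = [cellular_detection_rate(cell) for cell in expression]
gene_2 = [cell[2] for cell in expression]
gene_3 = [cell[3] for cell in expression]
print(cdr, two_part_summary(gene_2), two_part_summary(gene_3))
```

Keeping the detection frequency separate from the mean of detected values mirrors the bimodal pattern described in the abstract, where a gene is either clearly expressed or not detected at all.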