4 research outputs found

    Probability state modeling of memory CD8+ T-cell differentiation

    Flow cytometric analysis enables the simultaneous single-cell interrogation of multiple biomarkers for phenotypic and functional identification of heterogeneous populations. Analysis of polychromatic data has become increasingly complex with more measured parameters. Furthermore, manual gating of multiple populations using standard analysis techniques can lead to errors in data interpretation and difficulties in the standardization of analyses. To characterize high-dimensional cytometric data, we demonstrate the use of probability state modeling (PSM) to visualize the differentiation of effector/memory CD8+ T cells. With this model, four major CD8+ T-cell subsets can be easily identified using the combination of three markers, CD45RA, CCR7 (CD197), and CD28, with the selection markers CD3, CD4, CD8, and side scatter (SSC). PSM enables the translation of complex multicolor flow cytometric data to pathway-specific cell subtypes, the capability of developing averaged models of healthy donor populations, and the analysis of phenotypic heterogeneity. In this report, we also illustrate the heterogeneity in memory T-cell subpopulations as branched differentiation markers that include CD127, CD62L, CD27, and CD57.

    From cellular characteristics to disease diagnosis: uncovering phenotypes with supercells

    Cell heterogeneity and the inherent complexity due to the interplay of multiple molecular processes within the cell pose difficult challenges for current single-cell biology. We introduce an approach that identifies a disease phenotype from multiparameter single-cell measurements, which is based on the concept of "supercell statistics", a single-cell-based averaging procedure followed by a machine learning classification scheme. We are able to assess the optimal tradeoff between the number of single cells averaged and the number of measurements needed to capture phenotypic differences between healthy and diseased patients, as well as between different diseases that are difficult to diagnose otherwise. We apply our approach to two kinds of single-cell datasets, addressing the diagnosis of a premature aging disorder using images of cell nuclei, as well as the phenotypes of two non-infectious uveitides (the ocular manifestations of Behçet's disease and sarcoidosis) based on multicolor flow cytometry. In the former case, one nuclear shape measurement taken over a group of 30 cells is sufficient to classify samples as healthy or diseased, in agreement with usual laboratory practice. In the latter, our method is able to identify a minimal set of 5 markers that accurately predict Behçet's disease and sarcoidosis. This is the first time that a quantitative phenotypic distinction between these two diseases has been achieved. To obtain this clear phenotypic signature, about one hundred CD8+ T cells need to be measured. Although the molecular markers identified have been reported to be important players in autoimmune disorders, this is the first report pointing out that CD8+ T cells can be used to distinguish two systemic inflammatory diseases. 
Beyond these specific cases, the approach proposed here is applicable to datasets generated by other kinds of state-of-the-art and forthcoming single-cell technologies, such as multidimensional mass cytometry, single-cell gene expression, and single-cell full genome sequencing techniques.
    Instituto de Física de Líquidos y Sistemas Biológico
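
The "supercell statistics" idea in the abstract above (average groups of single cells, then classify the averages) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the data are synthetic Gaussian features, the function names are hypothetical, and a nearest-centroid rule stands in for the unspecified machine learning classifier. It is not the authors' implementation.

```python
import random
from statistics import fmean

def make_supercells(cells, size, rng):
    """Shuffle single-cell feature vectors and average them in groups of `size`."""
    shuffled = cells[:]
    rng.shuffle(shuffled)
    supercells = []
    # Leftover cells that do not fill a whole group are dropped.
    for start in range(0, len(shuffled) - size + 1, size):
        group = shuffled[start:start + size]
        supercells.append([fmean(column) for column in zip(*group)])
    return supercells

def centroid(vectors):
    """Feature-wise mean of a list of vectors."""
    return [fmean(column) for column in zip(*vectors)]

def classify(vector, centroids):
    """Return the label whose centroid is nearest (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist2(vector, centroids[label]))

rng = random.Random(0)

def simulate(mean, n_cells):
    """Synthetic two-feature cells (assumed data, not from the paper)."""
    return [[rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)] for _ in range(n_cells)]

train = {"healthy": simulate(0.0, 3000), "diseased": simulate(0.5, 3000)}
test = {"healthy": simulate(0.0, 3000), "diseased": simulate(0.5, 3000)}

def accuracy(size):
    """Train and evaluate the stand-in classifier on supercells of a given size."""
    centroids = {
        label: centroid(make_supercells(cells, size, rng))
        for label, cells in train.items()
    }
    correct = total = 0
    for label, cells in test.items():
        for supercell in make_supercells(cells, size, rng):
            correct += classify(supercell, centroids) == label
            total += 1
    return correct / total

acc_single = accuracy(1)   # classify individual cells
acc_super = accuracy(30)   # classify averages over 30 cells
print(f"single cells: {acc_single:.2f}, supercells of 30: {acc_super:.2f}")
```

Averaging reduces cell-to-cell noise, so heavily overlapping single-cell distributions become separable at the supercell level. The cost is fewer classified units per sample, which is the tradeoff the abstract describes.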

    MULTI-DIMENSIONAL ANALYSIS APPROACHES FOR HETEROGENEOUS SINGLE-CELL DATA

    Improvements in experimental techniques have led to an explosion of information in biology research. The increasing number of measurements comes with challenges in analyzing resulting data, as well as opportunities to obtain deeper insights into biological systems. Conventional average-based methods are unfit to analyze high-dimensional datasets since they fail to take full advantage of such rich information. More importantly, they are not able to capture the heterogeneity that is prevalent in biological systems. Sophisticated algorithms that are able to utilize all available measurements simultaneously are hence emerging rapidly. These algorithms excel at making full use of information within datasets and revealing detailed heterogeneity. However, there are several important disadvantages of existing algorithms. First, specific knowledge in statistics or machine learning is required to appropriately interpret and tune parameters in these algorithms for future use. This may result in misuse and misinterpretation. Second, using all measurements with equal weighting runs the risk of noise contamination. In addition, information overload has become more common in biology research, with a large volume of irrelevant measurements. Third, regardless of the quality of measurements, analysis methods that simultaneously use a large number of measurements need to avoid the “curse of dimensionality”, which warns that distance estimation and nearest neighbor estimation are not meaningful in high-dimensional space. However, most current sophisticated algorithms involve distance estimation and/or nearest neighbor estimation. In this dissertation, my goal is to build analysis methods that are complex enough to capture heterogeneity and at the same time output results in a format that is easy to interpret and familiar to biologists and medical researchers.
I tackle the dimension reduction problem not by searching for a single best subspace but by dividing the measurements into multiple subspaces and examining them one by one. I demonstrate my methods with three types of datasets: image-based high-throughput screening data, flow cytometry data, and mass cytometry data. From each dataset, I was able to discover new biological insights as well as re-validate well-established findings with my methods.
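
The "curse of dimensionality" point in the abstract above can be made concrete with a minimal sketch. It assumes uniformly random synthetic points and a hypothetical helper, `relative_contrast`, that compares the farthest and nearest distances from a query point. It illustrates the general phenomenon only and is not code from the dissertation.

```python
import math
import random

def relative_contrast(dim, n_points=500, seed=0):
    """(farthest - nearest) / nearest distance from one random query point
    to `n_points` random points in the unit hypercube of dimension `dim`."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    nearest, farthest = min(distances), max(distances)
    return (farthest - nearest) / nearest

contrast_low = relative_contrast(2)      # two measurements per cell
contrast_high = relative_contrast(1000)  # a thousand measurements per cell
print(f"dim=2: {contrast_low:.2f}, dim=1000: {contrast_high:.2f}")
```

In low dimensions the nearest neighbor is far closer than the farthest point, so the contrast is large. In high dimensions all distances concentrate around a common value and the contrast shrinks, which is why nearest-neighbor estimates lose discriminating power there.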