2,327 research outputs found

    Automated and reproducible cell identification in mass cytometry using neural networks

    The principal use of mass cytometry is to identify distinct cell types and changes in their composition, phenotype and function in different samples and conditions. Combining data from different studies has the potential to increase the power of these discoveries in diverse fields such as immunology, oncology and infection. However, current tools lack scalable, reproducible and automated methods to integrate and study mass cytometry data sets, which often use heterogeneous approaches to study similar samples. To address these limitations, we present two novel developments: (1) a pre-trained cell identification model named Immunopred that allows automated identification of immune cells without user-defined prior knowledge of expected cell types and (2) a fully automated cytometry meta-analysis pipeline built around Immunopred. We evaluated this pipeline on six COVID-19 study data sets comprising 270 unique samples and uncovered novel significant phenotypic changes in the wider immune landscape of COVID-19 that were not identified when each study was analyzed individually. Applied widely, our approach will support the discovery of novel findings in research areas where cytometry data sets are available for integration.

    A biology-driven deep generative model for cell-type annotation in cytometry

    Cytometry enables precise single-cell phenotyping within heterogeneous populations. These cell types are traditionally annotated via manual gating, but this method is poorly reproducible and sensitive to batch effects. Also, the most recent cytometers (spectral flow or mass cytometers) create rich, high-dimensional data whose analysis via manual gating becomes challenging and time-consuming. To tackle these limitations, we introduce Scyan (https://github.com/MICS-Lab/scyan), a Single-cell Cytometry Annotation Network that automatically annotates cell types using only prior expert knowledge about the cytometry panel. We demonstrate that Scyan significantly outperforms the related state-of-the-art models on multiple public datasets while being faster and interpretable. In addition, Scyan handles several complementary tasks such as batch-effect removal, debarcoding, and population discovery. Overall, this model accelerates and eases cell population characterisation, quantification, and discovery in cytometry.

    GateNet: A novel Neural Network Architecture for Automated Flow Cytometry Gating

    Flow cytometry is widely used to identify cell populations in patient-derived fluids such as peripheral blood (PB) or cerebrospinal fluid (CSF). While ubiquitous in research and clinical practice, flow cytometry requires gating, i.e. cell type identification, which involves labor-intensive and error-prone manual adjustments. To facilitate this process, we designed GateNet, the first neural network architecture enabling full end-to-end automated gating without the need to correct for batch effects. We train GateNet with over 8,000,000 events based on N=127 PB and CSF samples which were manually labeled independently by four experts. We show that for novel, unseen samples, GateNet achieves human-level performance (F1 score ranging from 0.910 to 0.997). In addition, we apply GateNet to a publicly available dataset, confirming generalization with an F1 score of 0.936. As our implementation utilizes graphics processing units (GPU), gating only needs 15 microseconds per event. Importantly, we also show that GateNet only requires ~10 samples to reach human-level performance, rendering it widely applicable in all domains of flow cytometry.
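
As a reading aid for the figures quoted in this abstract, the following minimal sketch shows how a per-population F1 score can be computed from per-event labels, and what 15 microseconds per event implies for a whole sample. It is not GateNet's code: the function names, the toy labels and the one-million-event sample size are assumptions made purely for illustration.

```python
def f1_score(manual, predicted, population):
    """F1 score for one population, comparing per-event labels.

    `manual` and `predicted` are equal-length sequences of population
    names, one entry per event (hypothetical data layout).
    """
    pairs = list(zip(manual, predicted))
    tp = sum(1 for m, p in pairs if m == population and p == population)
    fp = sum(1 for m, p in pairs if m != population and p == population)
    fn = sum(1 for m, p in pairs if m == population and p != population)
    if tp == 0:
        return 0.0  # no correctly gated events: F1 is zero by convention
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def gating_seconds(n_events, microseconds_per_event=15):
    """Total gating time in seconds at a fixed per-event cost."""
    return n_events * microseconds_per_event / 1_000_000


# Toy labels (made up): one B cell is mislabeled as a T cell.
manual = ["T", "T", "B", "B", "T", "other"]
predicted = ["T", "T", "B", "T", "T", "other"]

print(round(f1_score(manual, predicted, "T"), 3))
print(gating_seconds(1_000_000))
```

At the quoted rate, a hypothetical sample of one million events would take about 15 seconds to gate.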

    Functional Analysis of Immunocompromised Patients’ Leucocytes by Single-cell Mass Cytometry

    Immunodeficiencies make up a large group of diseases characterized by heterogeneous clinical manifestations, including life-threatening infections, autoimmunity, chronic inflammation, allergy and malignant diseases. They are classically divided into primary (PID) and secondary (SID) immunodeficiencies and they can be caused by monogenic defects or be secondary to exogenous factors, malignant or non-malignant diseases. In the last 20 years, accelerating progress has been made in identifying new forms of PIDs thanks to advances in molecular and genetic characterization. These disorders are diagnosed either early in life or later, in adulthood. It is estimated that 1-2% of the population might be affected by some form of PID. Immune cell characterization, particularly by flow cytometry techniques, has extensively shown its importance in the clinical management of patients presenting immune deficiencies with quantitative cell defects, as well as in the understanding of the immune system. It has already improved the classification of immunological diseases and contributed to improving treatment efficacy and follow-up. Recently, mass cytometry techniques have been used for diagnostic purposes, significantly increasing the breadth and depth of the functional and phenotypic characterization of a patient’s immune cells in comparison to traditional flow cytometry techniques. These advancements are driven by the great increase in measurable parameters provided by mass cytometry, which allows all major known immune cell populations and subpopulations to be characterized in a single analysis. The major contribution of this research resides in directly testing the functional activity and response of a patient's immune cells to different stimuli.
    The highly multiparametric nature of mass cytometry allows for both a broad and in-depth characterization of the functional immune response using only a minimal volume of a patient's blood (1 mL), with results available within one day, thus drastically improving time to diagnosis. In addition to providing a proportional and phenotypic characterization of a patient's immune cells, identifying the functionally abnormal cell population(s) will give clinicians an even better understanding of their patient's immunological defect. Interpretation of the mass cytometry results along with the patient's clinical data will allow for the identification of signatures associated with specific immunological defects, new classes of immunodeficiencies and therapies that are best adapted to a specific class of immunological disorder, hence improving the diagnosis and the benefits for immunocompromised patients.

    flowLearn: Fast and precise identification and quality checking of cell populations in flow cytometry

    Lux M, Brinkman RR, Chauve C, et al. flowLearn: Fast and precise identification and quality checking of cell populations in flow cytometry. Bioinformatics. 2018;34(13):2245-2253. Motivation: Identification of cell populations in flow cytometry is a critical part of the analysis and lays the groundwork for many applications and research discovery. The current paradigm of manual analysis is time-consuming and subjective. A common goal of users is to replace manual analysis with automated methods that replicate their results. Supervised tools provide the best performance in such a use case; however, they require fine parameterization to obtain the best results. Hence, there is a strong need for methods that are fast to set up, accurate and interpretable. Results: flowLearn is a semi-supervised approach for the quality-checked identification of cell populations. Using a very small number of manually gated samples, it is able to predict gates on other samples through density alignments with high accuracy and speed. On two state-of-the-art data sets, our tool achieves median F1-measures exceeding 0.99 for 31% and 0.90 for 80% of all analyzed populations. Furthermore, users can directly interpret and adjust automated gates on new sample files to iteratively improve the initial training.
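
The summary statistic reported in this abstract (the share of populations whose median F1-measure exceeds a threshold) can be illustrated with a short sketch. This is not taken from flowLearn: the helper name, the population names and all scores below are invented for illustration and do not reproduce the paper's data.

```python
from statistics import median


def fraction_exceeding(f1_by_population, threshold):
    """Share of populations whose median F1 across samples exceeds `threshold`."""
    medians = [median(scores) for scores in f1_by_population.values()]
    return sum(m > threshold for m in medians) / len(medians)


# Hypothetical per-sample F1 scores for four populations (made-up values).
f1_by_population = {
    "lymphocytes": [0.995, 0.992, 0.998],
    "monocytes": [0.93, 0.95, 0.91],
    "granulocytes": [0.85, 0.88, 0.80],
    "NK cells": [0.991, 0.97, 0.996],
}

print(fraction_exceeding(f1_by_population, 0.99))
print(fraction_exceeding(f1_by_population, 0.90))
```

Taking the median per population first keeps a single poorly gated sample from dominating that population's summary.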

    Mixture-of-Experts Variational Autoencoder for Clustering and Generating from Similarity-Based Representations on Single Cell Data

    Clustering high-dimensional data, such as images or biological measurements, is a long-standing problem and has been studied extensively. Recently, Deep Clustering has gained popularity due to its flexibility in fitting the specific peculiarities of complex data. Here we introduce the Mixture-of-Experts Similarity Variational Autoencoder (MoE-Sim-VAE), a novel generative clustering model. The model can learn multi-modal distributions of high-dimensional data and use these to generate realistic data with high efficacy and efficiency. MoE-Sim-VAE is based on a Variational Autoencoder (VAE), where the decoder consists of a Mixture-of-Experts (MoE) architecture. This specific architecture allows for various modes of the data to be automatically learned by means of the experts. Additionally, we encourage the lower dimensional latent representation of our model to follow a Gaussian mixture distribution and to accurately represent the similarities between the data points. We assess the performance of our model on the MNIST benchmark data set and challenging real-world tasks of clustering mouse organs from single-cell RNA-sequencing measurements and defining cell subpopulations from mass cytometry (CyTOF) measurements on hundreds of different datasets. MoE-Sim-VAE exhibits superior clustering performance on all these tasks in comparison to the baselines as well as competitor methods. Comment: Submitted to PLOS Computational Biolog

    An open-source solution for advanced imaging flow cytometry data analysis using machine learning

    Imaging flow cytometry (IFC) enables the high-throughput collection of morphological and spatial information from hundreds of thousands of single cells. This high-content, information-rich image data can in theory resolve important biological differences among complex, often heterogeneous biological samples. However, data analysis is often performed in a highly manual and subjective manner using very limited image analysis techniques in combination with conventional flow cytometry gating strategies. This approach is not scalable to the hundreds of available image-based features per cell and thus makes use of only a fraction of the spatial and morphometric information. As a result, the quality, reproducibility and rigour of results are limited by the skill, experience and ingenuity of the data analyst. Here, we describe a pipeline using open-source software that leverages the rich information in digital imagery using machine learning algorithms. Compensated and corrected raw image (.rif) data files from an imaging flow cytometer (the proprietary .cif file format) are imported into the open-source software CellProfiler, where an image processing pipeline identifies cells and subcellular compartments, allowing hundreds of morphological features to be measured. This high-dimensional data can then be analysed using cutting-edge machine learning and clustering approaches on “user-friendly” platforms such as CellProfiler Analyst. Researchers can train an automated cell classifier to recognize different cell types, cell cycle phases, drug treatment/control conditions, etc., using supervised machine learning. This workflow should enable the scientific community to leverage the full analytical power of IFC-derived data sets. It will help to reveal otherwise unappreciated populations of cells based on features that may be hidden to the human eye, including subtle measured differences in label-free detection channels such as bright-field and dark-field imagery.