3 research outputs found

Immunophenotypes of Acute Myeloid Leukemia From Flow Cytometry Data Using Templates

Authors: Azad, Ariful; Pothen, Alex; Rajwa, Bartek
Publication date: 21/03/2014

Motivation: We investigate whether a template-based classification pipeline can be used to identify immunophenotypes in (and thereby classify) a heterogeneous disease with many subtypes. The disease we consider is Acute Myeloid Leukemia (AML), which is heterogeneous at the morphologic, cytogenetic and molecular levels, with several known subtypes. The prognosis and treatment for AML depend on the subtype.

Results: We apply flowMatch, an algorithmic pipeline for flow cytometry data created in earlier work, to compute templates that succinctly summarize classes of AML and healthy samples. We develop a scoring function that accounts for features of the AML data, such as heterogeneity, to identify immunophenotypes corresponding to various AML subtypes, including APL. All of the AML samples in the test set are classified correctly with high confidence.

Availability: flowMatch is available at www.bioconductor.org/packages/devel/bioc/html/flowMatch.html; programs specific to immunophenotyping AML are at www.cs.purdue.edu/homes/aazad/software.html.

Comment: 9 pages, 5 figures

arXiv.org e-Print Archive

Supervised Classification of Flow Cytometric Samples via the Joint Clustering and Matching (JCM) Procedure

Authors: Lee, Sharon X.; McLachlan, Geoffrey J.; Pyne, Saumyadipta
Publication date: 11/11/2014

We consider the use of the Joint Clustering and Matching (JCM) procedure for the supervised classification of a flow cytometric sample with respect to a number of predefined classes of such samples. The JCM procedure has been proposed as a method for the unsupervised classification of cells within a sample into a number of clusters and, in the case of multiple samples, the matching of these clusters across the samples. The two tasks of clustering and matching of the clusters are performed simultaneously within the JCM framework. In this paper, we consider the case where there are a number of distinct classes of samples whose class of origin is known, and the problem is to classify a new sample of unknown class of origin to one of these predefined classes. For example, the different classes might correspond to the types of a particular disease or to the various health outcomes of a patient subsequent to a course of treatment. We demonstrate on some real datasets how the JCM procedure can be used to carry out this supervised classification task. A mixture distribution is used to model the distribution of the expressions of a fixed set of markers for each cell in a sample, with the components of the mixture model corresponding to the various populations of cells that make up the sample. For each class of samples, a class template is formed by the adoption of random-effects terms to model the inter-sample variation within a class. A new unclassified sample is assigned to the class that minimizes the Kullback-Leibler distance between the sample's fitted mixture density and the class density provided by that class's template.
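The classification rule described in this abstract (assign a sample to the class whose template density is closest in Kullback-Leibler distance) can be illustrated with a minimal sketch. This is not the JCM implementation: it assumes plain Gaussian mixtures with diagonal covariances, omits the random-effects terms, estimates the divergence by Monte Carlo sampling, and takes the divergence in the direction KL(sample || template). All function names, parameter values, and class labels below are hypothetical.

```python
import numpy as np

def gmm_logpdf(x, weights, means, variances):
    """Log-density of a diagonal-covariance Gaussian mixture at points x (n, d)."""
    x = np.atleast_2d(x)
    diff = x[:, None, :] - means[None, :, :]                    # (n, k, d)
    log_comp = -0.5 * (
        np.sum(diff ** 2 / variances[None, :, :], axis=2)
        + np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
    )                                                           # (n, k)
    a = log_comp + np.log(weights)[None, :]
    m = a.max(axis=1, keepdims=True)                            # log-sum-exp for stability
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True)))[:, 0]

def gmm_sample(rng, n, weights, means, variances):
    """Draw n points from the mixture."""
    comp = rng.choice(len(weights), size=n, p=weights)
    noise = rng.standard_normal((n, means.shape[1]))
    return means[comp] + noise * np.sqrt(variances[comp])

def kl_monte_carlo(p, q, n=20000, seed=0):
    """Monte Carlo estimate of KL(p || q); p and q are (weights, means, variances)."""
    rng = np.random.default_rng(seed)
    x = gmm_sample(rng, n, *p)
    return float(np.mean(gmm_logpdf(x, *p) - gmm_logpdf(x, *q)))

def classify(sample, templates, **kw):
    """Return the label of the template with the smallest estimated KL distance."""
    distances = {label: kl_monte_carlo(sample, t, **kw) for label, t in templates.items()}
    return min(distances, key=distances.get), distances

# Hypothetical two-marker example: two cell populations, classes differ in proportions.
means = np.array([[0.0, 0.0], [4.0, 4.0]])
unit_var = np.ones((2, 2))
templates = {
    "healthy": (np.array([0.7, 0.3]), means, unit_var),
    "disease": (np.array([0.2, 0.8]), means, unit_var),
}
new_sample = (np.array([0.25, 0.75]), np.array([[0.1, 0.0], [3.9, 4.1]]), unit_var)
label, distances = classify(new_sample, templates)
```

In this toy setup the new sample's population proportions are close to those of the hypothetical "disease" template, so that template yields the smaller estimated distance. The Monte Carlo estimate is used because the Kullback-Leibler divergence between two mixtures has no closed form in general.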

arXiv.org e-Print Archive

An Algorithmic Pipeline for Analyzing Multi-parametric Flow Cytometry Data

Author: Azad, Ariful
Publication date: 14/01/2015

Flow cytometry (FC) is a single-cell profiling platform for measuring the phenotypes of individual cells from millions of cells in biological samples. FC employs high-throughput technologies and generates high-dimensional data, and hence algorithms for analyzing the data represent a bottleneck. This dissertation addresses several computational challenges arising in modern cytometry while mining information from high-dimensional and high-content biological data. A collection of combinatorial and statistical algorithms for locating, matching, prototyping, and classifying cellular populations from multi-parametric FC data is developed. The algorithmic pipeline, flowMatch, developed in this dissertation consists of five well-defined algorithmic modules to (1) transform data to stabilize within-population variance, (2) identify cell populations by robust clustering algorithms, (3) register cell populations across samples, (4) encapsulate a class of samples with templates, and (5) classify samples based on their similarity with the templates. Components of flowMatch can work independently or collaborate with each other to perform the complete data analysis. flowMatch is made available as an open-source R package in Bioconductor. We have employed flowMatch for classifying leukemia samples, evaluating the phosphorylation effects on T cells, classifying healthy immune profiles, and classifying the vaccination status of HIV patients. In these analyses, the pipeline is able to reach biologically meaningful conclusions quickly and efficiently with the automated algorithms. The algorithms included in flowMatch can also be applied to problems outside of flow cytometry, such as microarray data analysis and image recognition. Therefore, this dissertation contributes to the solution of fundamental problems in computational cytometry and related domains.

Comment: PhD dissertation, May 2014, Purdue University
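Module (1) of the pipeline described above, variance stabilization, can be illustrated with a small sketch. The abstract does not say which transformation flowMatch applies, so the inverse hyperbolic sine used here, the cofactor value, and the simulated populations are all assumptions chosen for illustration; the function name is hypothetical and is not part of the flowMatch package.

```python
import numpy as np

def asinh_transform(x, cofactor=5.0):
    """Inverse hyperbolic sine transform; cofactor is an assumed tuning parameter."""
    return np.arcsinh(np.asarray(x, dtype=float) / cofactor)

# Simulated fluorescence intensities for two populations with multiplicative
# noise, so the raw spread grows with the mean intensity (hypothetical values).
rng = np.random.default_rng(0)
dim = 1000.0 * rng.lognormal(mean=0.0, sigma=0.2, size=5000)
bright = 100000.0 * rng.lognormal(mean=0.0, sigma=0.2, size=5000)

# Ratio of standard deviations before and after the transformation.
raw_ratio = bright.std() / dim.std()
stabilized_ratio = asinh_transform(bright).std() / asinh_transform(dim).std()
```

On the raw scale the bright population is far more dispersed than the dim one, while after the transformation the two spreads are nearly equal. Clustering and population matching in the later modules generally behave better when populations have comparable within-population variance.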

arXiv.org e-Print Archive