Immunophenotypes of Acute Myeloid Leukemia From Flow Cytometry Data Using Templates
Motivation: We investigate whether a template-based classification pipeline
could be used to identify immunophenotypes in (and thereby classify) a
heterogeneous disease with many subtypes. The disease we consider here is Acute
Myeloid Leukemia, which is heterogeneous at the morphologic, cytogenetic and
molecular levels, with several known subtypes. The prognosis and treatment for
AML depend on the subtype.
Results: We apply flowMatch, an algorithmic pipeline for flow cytometry data
created in earlier work, to compute templates succinctly summarizing classes of
AML and healthy samples. We develop a scoring function that accounts for
features of the AML data such as heterogeneity to identify immunophenotypes
corresponding to various AML subtypes, including APL. All of the AML samples in
the test set are classified correctly with high confidence.
Availability: flowMatch is available at
www.bioconductor.org/packages/devel/bioc/html/flowMatch.html; programs specific
to immunophenotyping AML are at www.cs.purdue.edu/homes/aazad/software.html.
Comment: 9 pages, 5 figures
Supervised Classification of Flow Cytometric Samples via the Joint Clustering and Matching (JCM) Procedure
We consider the use of the Joint Clustering and Matching (JCM) procedure for
the supervised classification of a flow cytometric sample with respect to a
number of predefined classes of such samples. The JCM procedure has been
proposed as a method for the unsupervised classification of cells within a
sample into a number of clusters and, in the case of multiple samples, the
matching of these clusters across the samples. The two tasks of clustering and
matching of the clusters are performed simultaneously within the JCM framework.
In this paper, we consider the case where there are a number of distinct classes
of samples whose class of origin is known, and the problem is to classify a new
sample of unknown class of origin to one of these predefined classes. For
example, the different classes might correspond to the types of a particular
disease or to the various health outcomes of a patient subsequent to a course
of treatment. We show and demonstrate on some real datasets how the JCM
procedure can be used to carry out this supervised classification task. A
mixture distribution is used to model the distribution of the expressions of a
fixed set of markers for each cell in a sample with the components in the
mixture model corresponding to the various populations of cells in the
composition of the sample. For each class of samples, a class template is
formed by the adoption of random-effects terms to model the inter-sample
variation within a class. A new unclassified sample is assigned to the class
whose template density has the smallest Kullback-Leibler distance from the
sample's fitted mixture density.
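The minimum Kullback-Leibler rule described above can be illustrated with a minimal sketch. It is not the JCM implementation: it assumes one-dimensional Gaussian mixtures rather than the multivariate, random-effects models of the abstract, it assumes the direction KL(sample || template), which the abstract does not specify, and it estimates the distance by Monte Carlo because no closed form exists for mixtures. The names `Mixture`, `kl_estimate`, `classify`, and the example parameters are all hypothetical.

```python
import math
import random

# Assumption: a 1-D Gaussian mixture stored as (weight, mean, std) triples.
Mixture = list[tuple[float, float, float]]


def mixture_pdf(x: float, mix: Mixture) -> float:
    """Density of the mixture at x, floored to avoid log(0) downstream."""
    density = sum(
        w * math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
        for w, mu, sd in mix
    )
    return max(density, 1e-300)


def sample_mixture(mix: Mixture, n: int, rng: random.Random) -> list[float]:
    """Draw n points: pick a component by weight, then sample from it."""
    weights = [w for w, _, _ in mix]
    components = rng.choices(mix, weights=weights, k=n)
    return [rng.gauss(mu, sd) for _, mu, sd in components]


def kl_estimate(p: Mixture, q: Mixture, n: int = 5000, seed: int = 0) -> float:
    """Monte Carlo estimate of KL(p || q) as the mean of log(p(x)/q(x)), x ~ p."""
    rng = random.Random(seed)
    xs = sample_mixture(p, n, rng)
    return sum(math.log(mixture_pdf(x, p) / mixture_pdf(x, q)) for x in xs) / n


def classify(sample: Mixture, templates: dict[str, Mixture]) -> str:
    """Assign the sample to the class whose template is closest in KL."""
    return min(templates, key=lambda name: kl_estimate(sample, templates[name]))


# Hypothetical class templates and a hypothetical fitted sample.
templates = {
    "healthy": [(0.6, 0.0, 1.0), (0.4, 4.0, 1.0)],
    "disease": [(0.3, 0.0, 1.0), (0.7, 6.0, 1.5)],
}
new_sample = [(0.55, 0.2, 1.0), (0.45, 3.8, 1.1)]
predicted = classify(new_sample, templates)
```

Here `predicted` is the label of the template with the smaller estimated distance. A fixed seed keeps the estimate reproducible; a real analysis would also need to check that the estimate is stable as `n` grows.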
An Algorithmic Pipeline for Analyzing Multi-parametric Flow Cytometry Data
Flow cytometry (FC) is a single-cell profiling platform for measuring the
phenotypes of individual cells from millions of cells in biological samples. FC
employs high-throughput technologies and generates high-dimensional data, and
hence algorithms for analyzing the data represent a bottleneck. This
dissertation addresses several computational challenges arising in modern
cytometry while mining information from high-dimensional and high-content
biological data. A collection of combinatorial and statistical algorithms for
locating, matching, prototyping, and classifying cellular populations from
multi-parametric FC data is developed.
The algorithmic pipeline, flowMatch, developed in this dissertation consists
of five well-defined algorithmic modules to (1) transform data to stabilize
within-population variance, (2) identify cell populations by robust clustering
algorithms, (3) register cell populations across samples, (4) encapsulate a
class of samples with templates, and (5) classify samples based on their
similarity with the templates. Components of flowMatch can work independently
or collaborate with each other to perform the complete data analysis. flowMatch
is made available as an open-source R package in Bioconductor.
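As a rough illustration of module (3), registering cell populations across samples, the sketch below pairs cluster centroids from two samples so that the total Euclidean distance is smallest. This is a deliberately simplified stand-in and not flowMatch's matching algorithm: it assumes each population is summarized by its centroid, it forces a one-to-one pairing, and it searches all permutations by brute force, which is only practical for a handful of clusters. The function name `match_populations` and the example coordinates are hypothetical.

```python
import itertools
import math

Point = tuple[float, ...]


def match_populations(
    a: list[Point], b: list[Point]
) -> tuple[list[tuple[int, int]], float]:
    """Pair every centroid in `a` with a distinct centroid in `b`,
    minimizing the summed Euclidean distance (exhaustive search)."""
    if len(a) > len(b):
        raise ValueError("expected len(a) <= len(b)")
    best_cost = math.inf
    best_pairs: list[tuple[int, int]] = []
    # Each permutation assigns a[i] to b[perm[i]].
    for perm in itertools.permutations(range(len(b)), len(a)):
        cost = sum(math.dist(a[i], b[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost = cost
            best_pairs = list(enumerate(perm))
    return best_pairs, best_cost


# Hypothetical centroids of three populations in two samples (two markers).
sample_1 = [(1.0, 1.0), (5.0, 5.0), (9.0, 1.0)]
sample_2 = [(8.8, 1.2), (1.1, 0.9), (5.2, 4.9)]
pairs, total_cost = match_populations(sample_1, sample_2)
```

Each entry of `pairs` is `(index in sample_1, index in sample_2)`. A one-to-one pairing cannot represent a population that is split, merged, or missing in one sample, so a more flexible matching would be needed for such cases.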
We have employed flowMatch for classifying leukemia samples, evaluating the
phosphorylation effects on T cells, classifying healthy immune profiles, and
classifying the vaccination status of HIV patients. In these analyses, the
pipeline is able to reach biologically meaningful conclusions quickly and
efficiently with the automated algorithms. The algorithms included in flowMatch
can also be applied to problems outside of flow cytometry such as in microarray
data analysis and image recognition. Therefore, this dissertation contributes
to the solution of fundamental problems in computational cytometry and related
domains.
Comment: PhD dissertation, May 2014, Purdue University