Conceptualizing Cancer Drugs as Classifiers
<div><p>Cancer and healthy cells have distinct distributions of molecular properties and thus respond differently to drugs. Cancer drugs ideally kill cancer cells while limiting harm to healthy cells. However, the inherent variance among cells in both cancer and healthy populations makes selective drug action more difficult. Here we formalize a classification framework based on the idea that an ideal cancer drug should maximally discriminate between cancer and healthy cells. More specifically, this discrimination should be performed on the basis of measurable cell markers. We divide the problem into three parts, each of which we explore with examples. First, molecular markers should discriminate cancer cells from healthy cells at the single-cell level. Second, the effects of drugs should be statistically predictable from these molecular markers. Third, drugs should be optimized for classification performance. We find that the expression levels of a handful of genes suffice to discriminate well between individual cells in cancer and healthy tissue. We also find that gene expression predicts the efficacy of some cancer drugs, suggesting that these drugs act as suboptimal classifiers operating on gene expression profiles. Finally, we formulate a framework that defines an optimal drug and predicts drug cocktails that may target cancer more accurately than the individual drugs alone. Conceptualizing cancer drugs as solving a discrimination problem in the high-dimensional space of molecular markers promises to inform the design of new cancer drugs and drug cocktails.</p></div>
The idea of a classifier.
<p>This illustrates how one can combine information from two cellular markers to construct a classifier that separates the two populations (cancerous and healthy cells) better than either marker alone.</p>
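To make the two-marker idea concrete, here is a minimal Python sketch (standard library only). It assumes a Fisher linear discriminant as the way of combining the markers; the caption does not name a classifier, and the simulated cells, the shared-variation model and the helper names (`simulate`, `fisher_weights`, `auc`) are hypothetical. Both markers share a common source of cell-to-cell variation, so marker 2 is uninformative on its own, yet weighting it against marker 1 cancels the shared noise.

```python
import random
from statistics import mean


def auc(pos, neg):
    """AUC as P(random cancer score > random healthy score); ties count one half."""
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def cov(xs, ys):
    """Sample covariance of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def fisher_weights(cancer, healthy):
    """Fisher discriminant for two markers: w = S_w^-1 (mu_cancer - mu_healthy)."""
    s = [[0.0, 0.0], [0.0, 0.0]]  # pooled within-class covariance
    for cells in (cancer, healthy):
        cols = list(zip(*cells))  # cols[0] = marker 1, cols[1] = marker 2
        for i in range(2):
            for j in range(2):
                s[i][j] += cov(cols[i], cols[j])
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    d = [mean(c[i] for c in cancer) - mean(c[i] for c in healthy) for i in range(2)]
    # Closed-form inverse of the 2x2 matrix, applied to the mean difference.
    return [(s[1][1] * d[0] - s[0][1] * d[1]) / det,
            (s[0][0] * d[1] - s[1][0] * d[0]) / det]


def simulate(n, shift, rng):
    """Hypothetical cells: a shared factor z drives both markers; cancer shifts marker 1 only."""
    cells = []
    for _ in range(n):
        z = rng.gauss(0, 1)
        cells.append((z + 0.3 * rng.gauss(0, 1) + shift, z + 0.3 * rng.gauss(0, 1)))
    return cells


rng = random.Random(0)
healthy = simulate(200, 0.0, rng)
cancer = simulate(200, 1.0, rng)
w = fisher_weights(cancer, healthy)


def score(cell):
    """Projection of a cell onto the discriminant direction."""
    return w[0] * cell[0] + w[1] * cell[1]


auc_m1 = auc([c[0] for c in cancer], [c[0] for c in healthy])
auc_m2 = auc([c[1] for c in cancer], [c[1] for c in healthy])
auc_both = auc([score(c) for c in cancer], [score(c) for c in healthy])
print(f"marker 1: {auc_m1:.2f}  marker 2: {auc_m2:.2f}  combined: {auc_both:.2f}")
```

Because the weights are fit and evaluated on the same cells, these AUCs are optimistic; a cross-validated estimate would be the fairer comparison.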
Cancer treatment optimization.
<p>Better discrimination between cell populations is achieved by including an additional drug. The classification threshold line shown in fact represents a gradient in the “probability of cell death”, indicated by shading. See text for full description.</p>
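One way to read the shaded gradient is as a kill-probability surface over marker space. The sketch below assumes a logistic kill probability for each drug and Bliss independence for the cocktail (a cell survives only if it survives each drug); both are modeling assumptions rather than the authors' stated model, and the drug weights, thresholds and cell coordinates are hypothetical.

```python
import math


def kill_probability(markers, weights, bias, steepness=4.0):
    """Hypothetical logistic model of P(cell death) for one drug.

    P = 0.5 on the drug's threshold line (weights . markers + bias = 0);
    `steepness` sets how quickly P changes across that line (the shading)."""
    s = sum(w * x for w, x in zip(weights, markers)) + bias
    return 1.0 / (1.0 + math.exp(-steepness * s))


def cocktail_kill_probability(probs):
    """Bliss-independence assumption: a cell survives only if it survives every drug."""
    survival = 1.0
    for p in probs:
        survival *= 1.0 - p
    return 1.0 - survival


# Two hypothetical drugs, each thresholding a different marker at about 1.0.
drug_a = ([1.0, 0.0], -1.0)
drug_b = ([0.0, 1.0], -1.0)

# Hypothetical cells (marker 1, marker 2): two cancer subpopulations and a healthy cell.
cells = {
    "cancer, high marker 1": [2.0, 0.0],
    "cancer, high marker 2": [0.0, 2.0],
    "healthy": [0.0, 0.0],
}

results = {}
for name, x in cells.items():
    pa = kill_probability(x, *drug_a)
    pb = kill_probability(x, *drug_b)
    results[name] = (pa, pb, cocktail_kill_probability([pa, pb]))
    print(f"{name}: A={pa:.2f}  B={pb:.2f}  A+B={results[name][2]:.2f}")
```

In this toy setting each drug alone misses one cancer subpopulation, while the cocktail kills both with high probability; the price is that healthy-cell death roughly doubles from a low baseline.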
Drug sensitivity heatmap as a function of the two most important genes.
<p>Drug activity is to some extent predictable using molecular markers.</p>
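As an illustration of what "predictable using molecular markers" can mean, the sketch below fits an ordinary least-squares model of drug sensitivity on two genes and evaluates it on a grid of expression levels, which is the kind of surface a heatmap like this one displays. The linear model, the simulated cell lines and the coefficients are assumptions for illustration only.

```python
import random
from statistics import mean


def fit_two_gene_model(g1, g2, y):
    """OLS for y ~ b0 + b1*g1 + b2*g2 via the centered 2x2 normal equations."""
    m1, m2, my = mean(g1), mean(g2), mean(y)
    s11 = sum((a - m1) ** 2 for a in g1)
    s22 = sum((b - m2) ** 2 for b in g2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    s1y = sum((a - m1) * (t - my) for a, t in zip(g1, y))
    s2y = sum((b - m2) * (t - my) for b, t in zip(g2, y))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


def r_squared(y, y_hat):
    """Fraction of the variance in y explained by the predictions y_hat."""
    my = mean(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, y_hat))
    ss_tot = sum((t - my) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


# Simulated data: 300 hypothetical cell lines; sensitivity depends on two genes plus noise.
rng = random.Random(1)
g1 = [rng.gauss(0, 1) for _ in range(300)]
g2 = [rng.gauss(0, 1) for _ in range(300)]
y = [0.8 * a - 0.5 * b + rng.gauss(0, 0.5) for a, b in zip(g1, g2)]

b0, b1, b2 = fit_two_gene_model(g1, g2, y)
r2 = r_squared(y, [b0 + b1 * a + b2 * b for a, b in zip(g1, g2)])

# Predicted sensitivity on a grid of expression levels: the values a heatmap would show.
levels = [-2.0, -1.0, 0.0, 1.0, 2.0]
heatmap = [[b0 + b1 * a + b2 * b for a in levels] for b in levels]  # rows: gene 2, cols: gene 1
print(f"b1={b1:.2f}  b2={b2:.2f}  R^2={r2:.2f}")
```

The in-sample R^2 overstates how well sensitivity is predicted for new cells; held-out data would give the honest figure.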
Classification accuracy showing the improvements achieved by using an additional drug.
<p>The accuracy (AUC) achieved by both drugs together is higher than that of either drug alone.</p>
Discriminability of healthy versus cancer cells as a function of the number of genes considered.
<p>When measuring the accuracy of classifying cells as cancerous or healthy, one should consider both types of error: false positives and false negatives (the latter are conventionally reported as true positives, since the true positive rate equals one minus the false negative rate). This trade-off is illustrated by the Receiver Operating Characteristic (ROC) curve. Lines indicate mean values, and error bars indicate bootstrapped 95% confidence intervals. Accuracy was measured using cross-validation, and the chance level was determined using a shuffle control.</p>
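The ROC curve is traced by sweeping a decision threshold over a classifier's scores and recording, at each threshold, the false positive rate against the true positive rate. A minimal sketch with hypothetical scores, assuming label 1 marks cancer cells and label 0 healthy cells:

```python
def roc_curve(scores, labels):
    """(FPR, TPR) points as the threshold sweeps from the highest score down.

    labels: 1 = cancer (positive), 0 = healthy (negative). TPR = 1 - FNR."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda sl: sl[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Cells with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]  # hypothetical classifier outputs
labels = [1, 1, 0, 1, 0, 1, 0, 0]
points = roc_curve(scores, labels)
print(points)
print(trapezoid_auc(points))  # 0.8125
```

The result equals the fraction of (cancer, healthy) pairs in which the cancer cell scores higher (13 of 16 here), which is why an AUC of 0.5 corresponds to chance.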
Discriminability of healthy versus cancer cells as a function of the number of cells and genes measured.
<p>Classification performance was measured as the area under the ROC curve (AUC). A perfect classifier would achieve an AUC of 1, whereas a random classifier would achieve an AUC of 0.5. Each colored line represents a different number of cells used to train the classifier, showing that performance improves as more cells are used. Lines indicate mean values, and shaded areas indicate bootstrapped 95% confidence intervals. Accuracy was measured using cross-validation, and the chance level was determined using a shuffle control.</p>
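The bootstrapped confidence intervals and the shuffle control mentioned in the caption can be sketched as follows, assuming the classifier's scores for cancer and healthy cells are already computed. This is a simplification: here labels are shuffled against fixed scores, whereas a full shuffle control would retrain the classifier on shuffled labels inside the cross-validation loop. The function names, defaults and simulated scores are hypothetical.

```python
import random


def auc(pos, neg):
    """P(random positive score > random negative score); ties count one half."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(pos, neg, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC, resampling cells with replacement within each class."""
    rng = random.Random(seed)
    stats = sorted(
        auc(rng.choices(pos, k=len(pos)), rng.choices(neg, k=len(neg)))
        for _ in range(n_boot)
    )
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[min(int((1 - alpha / 2) * n_boot), n_boot - 1)]
    return lo, hi


def shuffle_control(pos, neg, n_shuffle=200, seed=0):
    """Chance level: randomly reassign the cancer/healthy labels and recompute the AUC."""
    rng = random.Random(seed)
    pooled = list(pos) + list(neg)
    values = []
    for _ in range(n_shuffle):
        rng.shuffle(pooled)
        values.append(auc(pooled[:len(pos)], pooled[len(pos):]))
    return sum(values) / len(values)


# Hypothetical classifier scores for 60 cancer and 60 healthy cells.
rng = random.Random(42)
cancer_scores = [rng.gauss(1.0, 1.0) for _ in range(60)]
healthy_scores = [rng.gauss(0.0, 1.0) for _ in range(60)]

observed = auc(cancer_scores, healthy_scores)
lo, hi = bootstrap_ci(cancer_scores, healthy_scores)
chance = shuffle_control(cancer_scores, healthy_scores)
print(f"AUC={observed:.2f}  95% CI=({lo:.2f}, {hi:.2f})  chance={chance:.2f}")
```

Rerunning with more cells per class should narrow the interval, while the shuffled AUC stays near 0.5.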
Progenitor crypt cells fall in a tetrahedron.
<p>(a) Enterocytes, goblet cells and nodal cells analyzed separately do not form significant polytopes. Cells are color coded by type in the tetrahedron of <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004224#pcbi.1004224.g003" target="_blank">Fig 3</a>, and each cell class is plotted in its own first three PCs. (b) Progenitor cells analyzed separately fall uniformly in a tetrahedron. The best-fit tetrahedron is shown (PCHA delta = 0.5). The arrow represents the direction of development according to Axin2 levels; see <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004224#pcbi.1004224.s007" target="_blank">S7b Fig</a>. Also shown are projections on the principal planes, which resemble triangles or quadrangles. Archetypes and their variation upon data resampling (bootstrapping) are shown as gray ellipses. (c) The explained variance as a function of polytope order k and as a function of the number of PCs D both suggest a tetrahedron (k = 4, D = 3).</p>
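Falling inside a tetrahedron means each cell can be written as a weighted average (convex combination) of the four archetypes, with nonnegative weights summing to one. The sketch below computes these barycentric weights for a point in the first three PCs, given four archetype coordinates. It does not fit the archetypes (the caption uses PCHA for that), and the archetype coordinates and the `tol` tolerance are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def barycentric(point, archetypes):
    """Weights w0..w3 (summing to 1) with point = sum_i w_i * archetypes[i], in 3 PCs."""
    a0 = archetypes[0]
    # E has columns a_i - a0 (i = 1..3); solve E @ [w1, w2, w3] = point - a0.
    e = [[archetypes[j][k] - a0[k] for j in range(1, 4)] for k in range(3)]
    rhs = [point[k] - a0[k] for k in range(3)]
    d = det3(e)
    ws = []
    for col in range(3):
        # Cramer's rule: replace column `col` of E with the right-hand side.
        m = [[rhs[k] if c == col else e[k][c] for c in range(3)] for k in range(3)]
        ws.append(det3(m) / d)
    return [1.0 - sum(ws)] + ws


def inside_tetrahedron(point, archetypes, tol=1e-9):
    """A point lies in the tetrahedron iff all its barycentric weights are nonnegative."""
    return all(w >= -tol for w in barycentric(point, archetypes))


# Hypothetical archetype coordinates in the first three PCs.
archetypes = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
print(barycentric((0.25, 0.25, 0.25), archetypes))         # [0.25, 0.25, 0.25, 0.25]
print(inside_tetrahedron((0.25, 0.25, 0.25), archetypes))  # True
print(inside_tetrahedron((1.0, 1.0, 1.0), archetypes))     # False
```

A cell's largest weight indicates which archetype, and hence which task, it is closest to specializing in.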
Expression profiles of the four colon crypt archetypes are each enriched for markers of specific cell types.
<p>(a) The expression profiles of the four archetypes, with enriched genes colored. Enriched genes were determined by leave-1-out enrichment analysis: cells were binned according to their distance from each archetype, and a gene was considered enriched when its average expression was maximal in the bin closest to the archetype, as described in Methods: 1D Gene enrichment at archetypes (see the full list of enriched genes in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004224#pcbi.1004224.s026" target="_blank">S2 Table</a>). Light blue: enterocyte archetype; yellow: Nodal archetype; green: stem cell archetype; red: goblet cell archetype. Genes that are not enriched, or are enriched in more than one archetype, are shown in dark blue. Zero level represents the average expression of each gene in the dataset. (b) Leave-1-out enrichment plot: expression of a gene (SLC26A3, an enterocyte marker) as a function of distance from each archetype in equal-mass bins of cells (Methods: 1D Gene enrichment at archetypes); line color indicates the archetype. This gene is maximally enriched only at the enterocyte archetype (blue line). For enrichment plots of additional genes, see <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004224#pcbi.1004224.s005" target="_blank">S5 Fig</a>. (c) A two-dimensional enrichment plot of SLC26A3, in which its expression is plotted on the plane of the first two PCs of the data, showing that expression is maximal in the cells closest to the enterocyte archetype. Contours show expression density estimated using a Gaussian kernel (Methods: 2D Gene enrichment at archetypes). Archetype positions and PCs were calculated without the tested gene.</p>
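The 1D enrichment test described in (a) and (b) can be sketched as follows: order cells by distance to an archetype, split them into equal-mass bins, and ask whether mean expression peaks in the closest bin. This is a simplified, hypothetical version. The tested gene is left out only of the distance calculation (the caption recomputes archetype positions and PCs without it), the statistical testing described in Methods is omitted, and the toy data are synthetic.

```python
from statistics import mean


def loo_distances(cells, archetype, gene):
    """Euclidean distance from each cell to the archetype, ignoring the tested gene."""
    return [
        sum((c[k] - archetype[k]) ** 2 for k in range(len(c)) if k != gene) ** 0.5
        for c in cells
    ]


def binned_profile(distances, expression, n_bins=10):
    """Mean expression in equal-mass bins of cells, from closest to farthest."""
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    size = len(order) // n_bins
    profile = []
    for b in range(n_bins):
        # The last bin absorbs any remainder so that every cell is used.
        end = len(order) if b == n_bins - 1 else (b + 1) * size
        profile.append(mean(expression[i] for i in order[b * size:end]))
    return profile


def is_enriched_at_archetype(distances, expression, n_bins=10):
    """Simplified criterion: mean expression is maximal in the bin closest to the archetype."""
    profile = binned_profile(distances, expression, n_bins)
    return profile[0] == max(profile)


# Synthetic cells moving away from a hypothetical archetype at t = 0;
# gene 0 falls with distance, genes 1 and 2 rise.
cells = [[10 - 0.1 * t, 0.1 * t, 0.05 * t] for t in range(100)]
archetype = [10.0, 0.0, 0.0]
enriched = {}
for gene in range(3):
    d = loo_distances(cells, archetype, gene)
    enriched[gene] = is_enriched_at_archetype(d, [c[gene] for c in cells])
print(enriched)  # {0: True, 1: False, 2: False}
```

Excluding the tested gene from the distance keeps a highly variable gene from pulling the "closest" cells toward its own high values, which would otherwise make enrichment partly circular.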
Different tissues analyzed by different single-cell technologies show polytopes and tasks.
<p>(a) Human bone marrow cells analyzed by single-cell mass cytometry [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004224#pcbi.1004224.ref013" target="_blank">13</a>,<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004224#pcbi.1004224.ref025" target="_blank">25</a>], in which proteins are detected using mass-tagged antibodies, are well described by a 4D simplex (a polytope with 5 vertices). The simplex is shown projected on the first three PCs; for other projections, see <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004224#pcbi.1004224.s012" target="_blank">S12c Fig</a>. The archetypes correspond to cell types as indicated. Cell density peaks near each archetype. (b) LPS-stimulated mouse spleen dendritic cells analyzed by single-cell RNA-Seq [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004224#pcbi.1004224.ref003" target="_blank">3</a>] are well described by a tetrahedron. Archetypes are labeled with functions inferred from the genes maximally enriched in cells near each archetype. For more details, see <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004224#pcbi.1004224.s032" target="_blank">S1G</a> and <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004224#pcbi.1004224.s032" target="_blank">S1H</a> Text.</p>