22 research outputs found

    A Machine Learned Classifier That Uses Gene Expression Data to Accurately Predict Estrogen Receptor Status

    <div><p>Background</p><p>Selecting the appropriate treatment for breast cancer requires accurately determining the estrogen receptor (ER) status of the tumor. However, the standard for determining this status, immunohistochemical analysis of formalin-fixed, paraffin-embedded samples, suffers from numerous technical and reproducibility issues. Assessment of ER-status based on RNA expression can provide more objective, quantitative and reproducible test results.</p><p>Methods</p><p>To learn a parsimonious RNA-based classifier of hormone receptor status, we applied a machine learning tool to a training dataset of gene expression microarray data obtained from 176 frozen breast tumors, whose ER-status was determined by applying ASCO-CAP guidelines to standardized immunohistochemical testing of formalin-fixed tumor.</p><p>Results</p><p>This produced a three-gene classifier that can predict the ER-status of a novel tumor, with a cross-validation accuracy of 93.17±2.44%. When applied to an independent validation set and to four other public databases, some on different platforms, this classifier obtained over 90% accuracy in each. In addition, we found that this prediction rule separated the patients' recurrence-free survival curves with a hazard ratio lower than the one based on the IHC analysis of ER-status.</p><p>Conclusions</p><p>Our efficient and parsimonious classifier lends itself to highly accurate, low-cost RNA-based assessment of ER-status, suitable for routine high-throughput clinical use. This analytic method provides a proof-of-principle that may be applicable to developing effective RNA-based tests for other biomarkers and conditions.</p></div>

    Basic machine learning framework.

    <p>The bottom portion of this figure shows that a “Classifier” takes as input a description of a novel instance (here, the 27688 gene expression values from a microarray taken from a patient's biopsy), and returns a prediction for this instance (here, its prediction of whether this tumor is ER+ or ER−). The figure suggests this response is “No”. The Machine Learning challenge is to produce this classifier from a dataset of historical data (called labeled “Training Data”); this is the vertical portion, showing that a Learner uses that Training Data to produce the classifier. When evaluating the quality of a learned classifier, we require that the “Novel Instance” is not in the Training Data.</p>

    Average accuracy of SVM, as a function of number of features.

    <p>For each r = 1,2,…,18, line 3 of FS_SVM (<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0082144#pone-0082144-g002" target="_blank">Figure 2</a>) computes the mean <i>a<sub>r</sub></i> and standard deviation <i>σ<sub>r</sub></i> of the empirical accuracies obtained, over all 10 folds; this figure plots these bars, for each r. Notice the average accuracy on the hold-out sets increases as the number of features is increased, then levels out, with only minor fluctuations. Here, the largest accuracy occurs at r = 4; notice, however, that this accuracy is “essentially” the same as at r = 3. We therefore set r<sup>*</sup> = 3 as it is the smallest number of features whose accuracy's “mean + standard deviation” is at least the high-water-mark mean accuracy.</p>
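The selection rule in this caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function name `select_r_star` is hypothetical, the per-fold accuracies are made up for the example, and using the sample standard deviation is an assumption.

```python
from statistics import mean, stdev

def select_r_star(fold_accuracies):
    """Return the smallest r whose mean + std reaches the best mean accuracy.

    fold_accuracies maps r (number of features) to a list of per-fold
    accuracies; each list needs at least two values for stdev().
    """
    stats = {r: (mean(a), stdev(a)) for r, a in fold_accuracies.items()}
    high_water_mark = max(m for m, _ in stats.values())
    for r in sorted(stats):
        m, s = stats[r]
        if m + s >= high_water_mark:
            return r  # always reached: the best r itself satisfies the test

# Hypothetical accuracies (NOT from the paper), three folds per r for brevity.
example = {
    1: [0.80, 0.82, 0.81],
    2: [0.88, 0.89, 0.87],
    3: [0.91, 0.95, 0.93],
    4: [0.93, 0.95, 0.94],
}
print(select_r_star(example))
```

With these made-up numbers the best mean occurs at r = 4, but r = 3 already comes within one standard deviation of it, so the rule prefers the smaller feature set.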

    Top 10 genes, sorted by mutual information related to ER-status, based on the E176-cohort.

    <p>This table also provides the SVM coefficient, the index over the E23-cohort (see text), and a short description of the gene.</p>
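Ranking genes by their mutual information with ER-status might look roughly like the sketch below. The listing does not say how expression values were discretized, so binarizing each gene at its median is an assumption made here; the function names and the toy data (`gene_a`, `gene_b`) are hypothetical.

```python
from collections import Counter
from math import log2
from statistics import median

def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

def rank_genes(expression, labels):
    """Rank gene names by MI between median-binarized expression and labels."""
    scores = {}
    for gene, values in expression.items():
        cut = median(values)  # assumed discretization: above/below the median
        scores[gene] = mutual_information([v > cut for v in values], labels)
    return sorted(scores, key=scores.get, reverse=True)

# Toy data (NOT from the paper): 1 = ER+, 0 = ER-.
labels = [1, 1, 1, 0, 0, 0]
expression = {
    "gene_a": [9, 8, 7, 1, 2, 3],  # separates the two classes cleanly
    "gene_b": [5, 1, 4, 2, 6, 3],  # only weakly related to the labels
}
print(rank_genes(expression, labels))
```

A gene whose binarized expression perfectly tracks a balanced label carries one full bit of information, which is the maximum for a two-class problem.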

    The Eq3 Classifier Predicts ER-Status with High Accuracy.

    <p>The individual patient <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0082144#pone.0082144.e013" target="_blank">Eq3</a> values from the combined E176 and E23 datasets are sorted in descending order. The black triangular peaks mark patients classified as ER+ or ER- from IHC but the opposite from the <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0082144#pone.0082144.e013" target="_blank">Eq3</a> classifier, and the number of patients within each peak is labeled above. a) Histogram of the above sorted <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0082144#pone.0082144.e013" target="_blank">Eq3</a> values, showing the percentage of IHC-determined ER+ patients, in each 10-patient bin.</p>

    Kaplan-Meier Survival and Recurrence-Free Survival Curves For Patients Sorted by IHC-Determined ER-Status and Eq3 Predicted ER-Status.

    <p>Both the survival and recurrence-free survival curves had greater separation and lower hazard ratios (HR) when the patients were sorted by <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0082144#pone.0082144.e013" target="_blank">Eq3</a> ER-status compared with traditional IHC. a) Survival curves for patients split based on IHC ER-status (ER+ n = 126, median survival = 3807 days; ER- n = 72, median survival = 2704 days; HR = 0.5090; 95% CI = 0.2968–0.8731). b) Survival curves for patients split based on <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0082144#pone.0082144.e013" target="_blank">Eq3</a> ER-status (ER+ n = 123, median survival = 3807 days; ER- n = 75, median survival = 1623 days; HR = 0.3901; 95% CI = 0.2420–0.6935). c) Recurrence-free survival curves for patients split based on IHC ER-status (ER+ n = 126, median recurrence-free survival = 1694 days; ER- n = 72, median recurrence-free survival = 1246 days; HR = 0.7160; 95% CI = 0.4623–1.109). d) Recurrence-free survival curves for patients split based on <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0082144#pone.0082144.e013" target="_blank">Eq3</a> ER-status (ER+ n = 123, median recurrence-free survival = 1820 days; ER- n = 75, median recurrence-free survival = 875 days; HR = 0.5731; 95% CI = 0.3718–0.8833).</p>
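As a reading aid, the hazard ratios and confidence intervals quoted in this caption can be compared programmatically. The sketch below only restates the published numbers; the helper name `ci_excludes_one` and the panel labels are hypothetical. A 95% CI that includes 1 means the hazard ratio cannot be distinguished from "no difference" at the 5% level.

```python
# (hazard ratio, CI lower bound, CI upper bound), copied from the caption.
panels = {
    "a) survival, IHC": (0.5090, 0.2968, 0.8731),
    "b) survival, Eq3": (0.3901, 0.2420, 0.6935),
    "c) recurrence-free, IHC": (0.7160, 0.4623, 1.109),
    "d) recurrence-free, Eq3": (0.5731, 0.3718, 0.8833),
}

def ci_excludes_one(lower, upper):
    """True when the 95% CI lies entirely on one side of HR = 1."""
    return upper < 1.0 or lower > 1.0

for name, (hr, lower, upper) in panels.items():
    verdict = "excludes 1" if ci_excludes_one(lower, upper) else "includes 1"
    print(f"{name}: HR = {hr}, 95% CI {lower} to {upper} ({verdict})")
```

Under this check, only panel c) (recurrence-free survival split by IHC ER-status) has an interval that includes 1.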

    FS_SVM; a feature selection version of the Support Vector Machine (SVM) learner.

    <p>Line 6 runs SVM on the dataset S, but uses only the <i>r<sup>*</sup></i> “best” features, where features are ranked by their mRMR score<sup>15</sup>, which is computed in Line 5. Note this mRMR score combines mutual information (<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0082144#pone.0082144.e006" target="_blank">Eq 1</a>) with minimum redundancy. The goal of the first 4 lines is to compute this <i>r<sup>*</sup></i> value: Here, we first partition the dataset into 10 disjoint same-sized subsets {S<sub>i</sub>, <sub>i = 1…10</sub>}, which are balanced (i.e., each is of the same size, and has about the same number of ER+ instances). FS_SVM then considers each of these S<sub>i</sub> subsets, one by one. It first considers the remaining instances, S<sub>−i</sub> = S − S<sub>i</sub>, and computes the mRMR score for each feature with respect to this subset of instances. It then evaluates how well SVM does when using only the first r = 1, 2,… of these features, in order. Here, it runs SVM, using that size-r subset of features, on the training set S<sub>−i</sub>, then evaluates the resulting classifier on the remaining “testing subset” S<sub>i</sub>. Line 4 sets r<sup>*</sup> to be the smallest value that is within 1 standard deviation of the high-water mark. See <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0082144#pone.0082144.s001" target="_blank">Material S1</a> for more details.</p>
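The balanced partition and the per-fold loop described in this caption could be organized along the following lines. This is a sketch under assumptions, not the published FS_SVM: mRMR scoring and SVM training are not implemented here and are passed in as the hypothetical callables `rank_features` and `train_and_score`, and the class counts in the usage lines are invented for illustration.

```python
import random
from collections import defaultdict

def balanced_folds(labels, k=10, seed=0):
    """Split instance indices into k folds with similar size and class mix."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin, continuing where the previous class
        # stopped, so overall fold sizes differ by at most one.
        for index in indices:
            folds[position % k].append(index)
            position += 1
    return folds

def cross_validate(labels, k, rank_features, train_and_score, max_r):
    """Return {r: [accuracy per fold]} using only the top-r ranked features."""
    accuracies = {r: [] for r in range(1, max_r + 1)}
    for test_idx in balanced_folds(labels, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        ranked = rank_features(train_idx)  # ranking sees training data only
        for r in accuracies:
            score = train_and_score(ranked[:r], train_idx, test_idx)
            accuracies[r].append(score)
    return accuracies

# Hypothetical class counts for a 176-instance dataset (NOT from the paper).
labels = ["ER+"] * 120 + ["ER-"] * 56
folds = balanced_folds(labels, k=10)
print([len(fold) for fold in folds])
```

Ranking features inside the loop, on the training portion only, keeps each held-out subset from influencing which features are chosen for it.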

    Accuracy for our 3-feature classifier, over various datasets.

    <p>*#ER+/ER−: Number of patients that were estrogen receptor positive/negative from gold standard IHC analysis.</p>

    Additional file 2: Table S1. of Next generation sequencing profiling identifies miR-574-3p and miR-660-5p as potential novel prognostic markers for breast cancer

    Differentially expressed miRNAs. Description: miRNAs were profiled from apparently healthy normal (n = 11) reduction mammoplasty breast tissues and breast tumor tissues (n = 104). RNAs were filtered for low read counts (minimum 10 read counts in at least 90% of the samples). Following batch effects correction, sample outlier removal and RPKM normalization, differentially expressed RNAs with fold change > 2.0 and a false discovery rate (FDR) < 0.05 were identified. (PDF 63 kb)
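The filtering thresholds in this description translate directly into two small predicates. The sketch below is illustrative only: the function names are hypothetical, and because the description does not say how down-regulation is encoded, the fold change is compared exactly as the caller supplies it.

```python
def passes_low_count_filter(counts, min_count=10, min_fraction=0.9):
    """True if at least min_fraction of samples have >= min_count reads."""
    kept = sum(1 for c in counts if c >= min_count)
    return kept >= min_fraction * len(counts)

def is_differentially_expressed(fold_change, fdr, fc_cutoff=2.0, fdr_cutoff=0.05):
    """Apply the fold-change and FDR thresholds from the description."""
    return fold_change > fc_cutoff and fdr < fdr_cutoff

# Made-up read counts for one RNA across 20 samples (NOT from the study).
counts = [25] * 19 + [3]
print(passes_low_count_filter(counts))
print(is_differentially_expressed(fold_change=2.6, fdr=0.01))
```

Both cutoffs are strict inequalities, matching the wording "fold change > 2.0" and "FDR < 0.05".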

    Nef alleles of SIV<sub>smm/mac</sub>, HIV-2 and HIV-1 counteract rhesus macaque and sooty mangabey tetherin, but not human tetherin.

    <p>Nef alleles of SIV<sub>smm/mac</sub>, HIV-2 and HIV-1 were tested for the ability to rescue particle release for SIV <i>Δnef</i> in the presence of human tetherin (hBST2), rhesus macaque tetherin (rBST2) and sooty mangabey tetherin (sBST2). (A) The amino acid sequences corresponding to the cytoplasmic domains of hBST2, rBST2 and sBST2 are shown. Dashes represent sequence gaps and residues that differ from rBST2 are indicated in red. The mean and standard deviation (error bars) for total p27 release (B) and for percent maximal release (C) are shown for the indicated Nef alleles of SIV<sub>smm/mac</sub>, HIV-2 and HIV-1 in the presence of hBST2, rBST2 and sBST2. (D) Protein expression was confirmed for SIV p55 Gag, BST2, HIV-1 Vpu and for each of the Nef alleles by western blot analysis of 293T cell lysates. The Nef proteins of SIV<sub>mac</sub>239 and SIV<sub>smm</sub> (FYr1 and FWr1) were detected using plasma pooled from SIV-infected rhesus macaques and SIV-infected sooty mangabeys respectively. The Nef proteins of HIV-2 ROD10, ROD14, CBL-23 and 60415K were detected using plasma pooled from HIV-2-infected individuals. The Nef proteins of HIV-1 NL4-3 and NA7 were detected using polyclonal rabbit antisera. SIV p55 Gag, BST2 and β-actin were detected with the monoclonal antibodies 183-H12-5C, HM1.24 and C4. Following incubation with an appropriate HRP-conjugated secondary antibody, the blots were developed in chemiluminescent substrate and visualized using a Fujifilm Image Reader LAS 3000.</p>