    Datasets tested for features significantly separating successful targets from failed targets.

    Examples of successful tissue-specific targets.

    Examples of failed tissue-specific targets with plausible exceptions.

    Examples of successful ubiquitously expressed targets with plausible exceptions.

    Classifier performance statistics.

    Modeling pipeline.

    We trained a classifier to predict phase III clinical trial outcomes, using 5-fold cross-validation repeated 200 times to assess the stability of the classifier and estimate its generalization performance. For each fold of cross-validation, modeling began with the non-redundant features for each dataset.
    Step 1: We split the targets with phase III outcomes into training and testing sets.
    Step 2: We performed univariate feature selection, using permutation tests to quantify the significance of the difference between the means of the successful and failed targets in the training examples. We controlled for target class as a confounding factor by shuffling outcomes only within target classes. We accepted features with adjusted p-values less than 0.05 after correcting for multiple hypothesis testing with the Benjamini-Yekutieli method.
    Step 3: We aggregated the significant features from all datasets into a single feature matrix.
    Step 4: We performed incremental feature elimination with an inner 5-fold cross-validation loop repeated 20 times to select the classifier type (Random Forest or logistic regression) and the smallest subset of features whose cross-validation area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPR) were within 95% of the maximum.
    Step 5: We refit the selected model on all the training examples and evaluated its performance on the test examples.
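
    As an illustration of Step 2, here is a minimal Python sketch of the confound-controlled permutation test followed by Benjamini-Yekutieli correction. The helper name permutation_pvalue and the toy arrays are hypothetical stand-ins, not the original pipeline; the correction uses the fdr_by method from statsmodels.

    import numpy as np
    from statsmodels.stats.multitest import multipletests

    def permutation_pvalue(feature, outcome, target_class, n_perm=10_000, seed=0):
        # Two-sided permutation p-value for the difference between the mean
        # feature values of successful (outcome == 1) and failed (outcome == 0)
        # targets. Outcomes are shuffled only within each target class, so the
        # null distribution controls for target class as a confounder.
        rng = np.random.default_rng(seed)
        observed = feature[outcome == 1].mean() - feature[outcome == 0].mean()
        hits = 0
        for _ in range(n_perm):
            permuted = outcome.copy()
            for cls in np.unique(target_class):
                idx = np.flatnonzero(target_class == cls)
                permuted[idx] = rng.permutation(permuted[idx])
            delta = feature[permuted == 1].mean() - feature[permuted == 0].mean()
            hits += abs(delta) >= abs(observed)
        return (hits + 1) / (n_perm + 1)  # add-one smoothing avoids p = 0

    # Toy stand-in for one dataset's training examples.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(60, 8))            # 60 targets x 8 features
    y = rng.integers(0, 2, size=60)         # 1 = success, 0 = failure
    classes = rng.integers(0, 3, size=60)   # hypothetical target-class labels

    pvals = [permutation_pvalue(X[:, j], y, classes, n_perm=1_000)
             for j in range(X.shape[1])]
    reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_by")
    significant = np.flatnonzero(reject)    # columns passing adjusted p < 0.05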

    Features significantly correlated with phase III outcome.

    Classifier performance.

    (A) Receiver operating characteristic (ROC) curve. The solid black line indicates the median performance across 200 repetitions of 5-fold cross-validation, and the gray area indicates the range between the 2.5th and 97.5th percentiles. The dotted black line indicates the performance of random rankings.
    (B) Distributions of the probability of success predicted by the classifier for the successful, failed, and unlabeled targets.
    (C) Precision-recall curve for success predictions.
    (D) Precision-recall curve for failure predictions.
    (E) Pairwise target comparisons. For each pair of targets, we computed the fraction of repetitions of cross-validation in which Target B had a predicted probability of success greater than Target A's. The heatmap shows this fraction, thresholded at 0.95 or 0.99, plotted as a function of the median predicted probabilities of success of the two targets. The upper left region is where the classifier is 95% (above the solid black line) or 99% (above the dotted blue line) consistent in predicting a greater probability of success for Target B than for Target A.
    (F) Relationship between features and phase III outcomes. Heatmap showing the projection of the predicted success probabilities onto the two dominant features selected for the classifier: mean expression across tissues and standard deviation of expression across tissues. Red, white, and blue background colors correspond to success probabilities of 1, 0.5, and 0. Red pluses and blue crosses mark the locations of the success and failure examples. The model appears to have learned that failures tend to have high mean expression and low standard deviation of expression across tissues, while successes tend to have low mean expression and high standard deviation of expression. The success and failure examples are not well separated, indicating that we did not discover enough features to fully explain why targets succeed or fail in phase III clinical trials.
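
    The pairwise consistency in panel (E) is simple to compute from per-repetition predictions. The sketch below assumes a hypothetical array probs of out-of-fold success probabilities, one row per repetition of cross-validation; the beta-distributed toy values are placeholders, not results.

    import numpy as np

    # Hypothetical predictions: 200 repetitions of cross-validation x 30
    # targets, where probs[r, t] is the success probability predicted for
    # target t in repetition r.
    rng = np.random.default_rng(0)
    probs = rng.beta(2.0, 2.0, size=(200, 30))

    # frac[a, b] = fraction of repetitions in which Target B (column b) had
    # a predicted probability of success greater than Target A (column a).
    frac = (probs[:, None, :] > probs[:, :, None]).mean(axis=0)

    consistent_95 = frac >= 0.95            # above the solid black line
    consistent_99 = frac >= 0.99            # above the dotted blue line
    median_prob = np.median(probs, axis=0)  # the axes of the panel (E) heatmap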

    Examples of failed ubiquitously expressed targets.

    Feature selection pipeline.

    Each dataset took the form of a matrix with genes labeling the rows and features labeling the columns. We appended the mean and standard deviation computed across all features as two additional features.
    Step 1: We filtered the columns to eliminate redundant features, replacing each group of correlated features with the group-average feature, where a group was defined as features with squared pairwise correlation coefficient r² ≥ 0.5. If the dataset mean feature was included in a group of correlated features, we replaced the group with the dataset mean.
    Step 2: We filtered the rows for targets with clinical trial outcomes of interest: targets of selective drugs approved for non-cancer indications (successes) and targets of selective drug candidates that failed in phase III clinical trials for non-cancer indications (failures).
    Step 3: We tested the significance of each feature as an indicator of success or failure, using permutation tests to quantify the significance of the difference between the means of the successful and failed targets. We corrected for multiple hypothesis testing using the Benjamini-Yekutieli method to control the false discovery rate at 0.05 within each dataset.
    Step 4: We “stressed” the significant features with additional tests to assess their robustness and generalizability. For example, we used bootstrapping to estimate the probability that each significance finding would replicate on similar sets of targets.
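
    To make Step 1 concrete, here is a minimal pandas sketch of the redundancy filter. The caption does not specify how correlated groups are formed, so the greedy seeding below, along with names such as collapse_redundant and dataset_mean, is an assumption for illustration.

    import numpy as np
    import pandas as pd

    def collapse_redundant(df, mean_col="dataset_mean", r2_threshold=0.5):
        # Greedy grouping (an assumption): seed a group with the first
        # unassigned column, pull in every unassigned column whose squared
        # correlation with the seed is >= r2_threshold, then replace the
        # group with its average, or with the dataset mean feature if the
        # group contains it.
        corr2 = df.corr() ** 2
        assigned, collapsed = set(), {}
        for seed in df.columns:
            if seed in assigned:
                continue
            group = [c for c in df.columns
                     if c not in assigned and corr2.loc[seed, c] >= r2_threshold]
            assigned.update(group)
            if mean_col in group:
                collapsed[mean_col] = df[mean_col]
            else:
                collapsed["|".join(group)] = df[group].mean(axis=1)
        return pd.DataFrame(collapsed)

    # Toy genes-x-features matrix with one deliberately redundant column,
    # plus the appended mean and standard deviation across features.
    rng = np.random.default_rng(2)
    feats = pd.DataFrame(rng.normal(size=(100, 4)), columns=list("abcd"))
    feats["e"] = 0.9 * feats["a"] + rng.normal(scale=0.1, size=100)
    data = feats.assign(dataset_mean=feats.mean(axis=1),
                        dataset_std=feats.std(axis=1))

    nonredundant = collapse_redundant(data)  # collapses the group with "a", "e"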