In pharmaceutical research, assessing drug candidates’ odds of success as they move through clinical
research often relies on crude methods based on historical data. However, the rapid progress of
machine learning offers a new tool to identify the more promising projects. To evaluate its usefulness,
we trained and validated several machine learning algorithms on a large database of projects. Using
various project descriptors as input data, we were able to predict the clinical success and failure rates
of projects with an average balanced accuracy of 83% to 89%, which compares favorably with the 56%
to 70% balanced accuracy of the method based on historical data. We also identified the variables that
contributed most to trial success and used the algorithm to predict the success (or failure) of assets
currently in the industry pipeline. We conclude by discussing how pharmaceutical companies can use
such a model to improve the quantity and quality of their new drugs, and how the broad adoption of
this technology could reduce the industry’s risk profile with important consequences for industry
structure, R&D investment, and the cost of innovation.
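
For readers unfamiliar with the metric quoted above, balanced accuracy is the mean of the recall on each class (here, successes and failures). This keeps a classifier from scoring well simply by always predicting the more common outcome. The following is a minimal Python sketch; the function name and the labels are hypothetical illustrations, not the study's code or data.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of recall on the positive class (1 = success) and the negative class (0 = failure)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # successes predicted as successes
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # failures predicted as failures
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("balanced accuracy needs at least one example of each class")
    return 0.5 * (tp / pos + tn / neg)


# Hypothetical labels (assumption for illustration): 2 successes, 8 failures.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A trivial model that always predicts failure is right on 8 of 10 projects,
# yet it never identifies a success, so its balanced accuracy is only 0.5.
always_fail = [0] * len(y_true)
plain_accuracy = sum(t == p for t, p in zip(y_true, always_fail)) / len(y_true)
print(plain_accuracy, balanced_accuracy(y_true, always_fail))
```

Because failures typically outnumber successes in clinical development, a metric of this kind is a more informative basis for comparing models than plain accuracy.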