Each year, thousands of software vulnerabilities are discovered and reported
to the public. Unpatched known vulnerabilities are a significant security risk.
It is imperative that software vendors provide patches quickly once
vulnerabilities become known, and that users install those patches as soon as
they are available. However, most vulnerabilities are never actually exploited.
Since writing, testing, and installing software patches can involve
considerable resources, it would be desirable to prioritize the remediation of
vulnerabilities that are likely to be exploited. Several published research
studies have reported moderate success in applying machine learning techniques
to the task of predicting whether a vulnerability will be exploited. These
approaches typically use features derived from vulnerability databases (such as
the summary text describing the vulnerability) or social media posts that
mention the vulnerability by name. However, these prior studies share several
methodological shortcomings that inflate the estimated predictive power of their
approaches.
We replicate key portions of the prior work, compare their approaches, and show
how the selection of training and test data critically affects the estimated
performance of predictive models. The results of this study point to important
methodological considerations that should be taken into account so that results
reflect real-world utility.
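
To make the data-selection issue concrete, the sketch below is our own
illustration (not code from the replicated studies): it contrasts a random
train/test split, which can let a model train on records that postdate the
test set, with a chronological split that mimics deployment. The field names
`disclosed` and `exploited`, the synthetic data, and the roughly 10%
exploitation rate are assumptions made only for this example.

```python
# Illustrative sketch: random vs. chronological train/test splits for
# exploit prediction. Synthetic data; field names are hypothetical.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
vulns = pd.DataFrame({
    # Disclosure dates spread over four years.
    "disclosed": pd.to_datetime("2015-01-01")
                 + pd.to_timedelta(rng.integers(0, 4 * 365, n), unit="D"),
    # Assume ~10% of vulnerabilities are ever exploited (illustrative rate).
    "exploited": rng.random(n) < 0.1,
})

# Random split: training records may postdate test records, so the model
# can learn from information that would not exist at prediction time
# (temporal leakage), inflating measured performance.
train_rand, test_rand = train_test_split(vulns, test_size=0.3, random_state=0)

# Chronological split: train strictly on earlier disclosures and test on
# later ones, matching how a deployed prioritization model would be used.
vulns = vulns.sort_values("disclosed")
cutoff = int(0.7 * len(vulns))
train_time, test_time = vulns.iloc[:cutoff], vulns.iloc[cutoff:]
```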