Improving Vulnerability Inspection Efficiency Using Active Learning
Software engineers can find vulnerabilities with less effort if they are
directed towards code that might contain more vulnerabilities. HARMLESS is an
incremental support vector machine tool that builds a vulnerability prediction
model from the source code inspected to date, then suggests which source code
files should be inspected next. In this way, HARMLESS can reduce the time and
effort required to achieve some desired level of recall for finding
vulnerabilities. The tool also provides feedback on when to stop (at that
desired level of recall) while at the same time, correcting human errors by
double-checking suspicious files.
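
The inspect, retrain, and suggest cycle described above can be illustrated with a minimal sketch. This is not the HARMLESS implementation: the names `train_linear_svm`, `score`, and `inspect_until` are hypothetical, the SVM is a small hinge-loss learner retrained from scratch each round rather than a truly incremental one, and the loop stops on a known count of found vulnerabilities instead of the tool's recall estimate, which is not reproduced here.

```python
import random

def train_linear_svm(X, y, epochs=50, lr=0.1, lam=0.01, seed=0):
    """Fit a linear SVM by hinge-loss SGD; labels y must be +1 or -1."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1 - lr * lam) for wj in w]  # L2 weight decay
            if margin < 1:  # example violates the margin: take a hinge step
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def score(model, x):
    """Signed distance-like score; higher means more likely vulnerable."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def inspect_until(features, oracle, target_found, batch=5, seed=0):
    """Inspect files in batches, retraining on everything inspected so far.

    `oracle(i)` stands in for the human reviewer and returns +1 (vulnerable)
    or -1. Assumption: the stop condition is a known count of vulnerabilities
    found, a simplification of a recall-based stopping rule.
    """
    rng = random.Random(seed)
    unlabeled = list(range(len(features)))
    rng.shuffle(unlabeled)  # random order until both classes have been seen
    inspected, labels, found = [], [], 0
    while unlabeled and found < target_found:
        if 1 in labels and -1 in labels:
            model = train_linear_svm([features[i] for i in inspected], labels)
            unlabeled.sort(key=lambda i: score(model, features[i]), reverse=True)
        next_files, unlabeled = unlabeled[:batch], unlabeled[batch:]
        for i in next_files:
            label = oracle(i)
            inspected.append(i)
            labels.append(label)
            if label == 1:
                found += 1
    return inspected, found
```

Calling `inspect_until(features, oracle, target_found=k)` returns the inspection order and the number of vulnerable files found, so the length of the returned order is the inspection effort spent to reach the target.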
This paper evaluates HARMLESS on Mozilla Firefox vulnerability data. HARMLESS
found 80, 90, 95, and 99% of the vulnerabilities by inspecting 10, 16, 20, and 34% of
the source code files, respectively. When targeting 90, 95, and 99% recall, HARMLESS could stop
after inspecting 23, 30, and 47% of the source code files, respectively. Even when human
reviewers fail to identify half of the vulnerabilities (50% false negative
rate), HARMLESS could detect 96% of the missing vulnerabilities by
double-checking half of the inspected files.
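
To make these percentages concrete, the short sketch below converts them into file counts, using the 28,750 source code files that the abstract cites for the case study. The helper name `files_to_inspect` is hypothetical and serves only this illustration.

```python
TOTAL_FILES = 28_750  # source code files in the case study, per the abstract

# Recall reached (%) -> share of files inspected (%), as reported above.
EFFORT_AT_RECALL = {80: 10, 90: 16, 95: 20, 99: 34}

def files_to_inspect(total_files, percent_inspected):
    """Convert a percentage of the code base into a whole number of files."""
    return total_files * percent_inspected // 100

for recall, percent in EFFORT_AT_RECALL.items():
    count = files_to_inspect(TOTAL_FILES, percent)
    print(f"{recall}% recall: {count:,} files inspected")
```

At 95% recall this gives 5,750 files, which matches the figure the abstract uses for the cost of inspection.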
Our results serve to highlight the very steep cost of protecting software
from vulnerabilities (in our case study that cost is, for example, the human
effort of inspecting 28,750 × 20% = 5,750 source code files to identify
95% of the vulnerabilities). While this result could benefit
mission-critical projects where human resources are available for inspecting
thousands of source code files, the research challenge for future work is how
to further reduce that cost. The conclusion of this paper discusses various
ways that goal might be achieved.
Comment: 17+1 pages, 4 figures, 7 tables. Accepted by IEEE Transactions on
Software Engineering