This short empirical paper investigates how well topic modeling and database
meta-data characteristics can classify web and other proof-of-concept (PoC)
exploits for publicly disclosed software vulnerabilities. By using a dataset
comprised of over 36 thousand PoC exploits, near a 0.9 accuracy rate is
obtained in the empirical experiment. Text mining and topic modeling are a
significant boost factor behind this classification performance. In addition to
these empirical results, the paper contributes to the research tradition of
enhancing software vulnerability information with text mining, providing also a
few scholarly observations about the potential for semi-automatic
classification of exploits in the existing tracking infrastructures.Comment: Proceedings of the 2017 28th International Workshop on Database and
Expert Systems Applications (DEXA).
http://ieeexplore.ieee.org/abstract/document/8049693