National security vulnerability database classification based on an LDA topic model

Abstract

A latent Dirichlet allocation (LDA) topic model is combined with a support vector machine (SVM) to construct an automatic vulnerability classifier in the topic vector space. Vulnerability records from the China National Vulnerability Database of Information Security (CNNVD) serve as the experimental data. Tests show that the classifier built on topic vectors achieves about 8% higher classification accuracy than a classifier built directly on word (text) vectors.

Supported by the National Key Science and Technology Special Project "He Gao Ji" (Core Electronic Devices, High-end Generic Chips and Basic Software) (2010ZX01036-001-002).
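The approach summarized in the abstract (represent each vulnerability description as an LDA topic vector, then train an SVM on those vectors) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy descriptions and labels are invented, the two-class setup stands in for the CNNVD categories, LDA is fitted with a small collapsed Gibbs sampler, and the linear SVM is trained with a Pegasos-style subgradient method. The hyperparameters (`alpha`, `beta`, `lam`, iteration counts) are arbitrary.

```python
import random

def lda_gibbs(docs, n_topics, vocab_size, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one topic-proportion vector per document."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # topic counts per document
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics                              # total tokens per topic
    assign = []
    for d, doc in enumerate(docs):                            # random initial assignment
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]                              # remove the current token
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha) * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z                              # add it back under the new topic
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return [
        [(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style linear SVM (hinge loss + L2 penalty); labels must be -1 or +1.
    No bias term: topic vectors sum to 1, so a constant offset is absorbed into w."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1 - eta * lam) * wj for wj in w]            # shrink (regularization step)
            if margin < 1:                                    # hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

# Hypothetical toy records: +1 = overflow-type, -1 = injection-type (not CNNVD data).
descriptions = [
    ("buffer overflow in parser allows remote code execution via long string", 1),
    ("stack buffer overflow allows code execution via crafted long header", 1),
    ("heap overflow in image parser allows remote code execution", 1),
    ("sql injection in login page allows remote attackers to execute sql commands", -1),
    ("sql injection via id parameter allows attackers to read database", -1),
    ("login page sql injection allows database commands via parameter", -1),
]
words = sorted({w for text, _ in descriptions for w in text.split()})
vocab = {w: i for i, w in enumerate(words)}
docs = [[vocab[w] for w in text.split()] for text, _ in descriptions]
labels = [label for _, label in descriptions]

topic_vectors = lda_gibbs(docs, n_topics=2, vocab_size=len(vocab))
weights = train_linear_svm(topic_vectors, labels)
predictions = [
    1 if sum(wj * xj for wj, xj in zip(weights, x)) >= 0 else -1
    for x in topic_vectors
]
print(topic_vectors)
print(predictions)
```

The point of the sketch is the change of feature space: the SVM sees a short topic-proportion vector per record instead of a sparse vector over the whole vocabulary. A realistic experiment would need held-out test data, many more topics, and a multi-class scheme such as one-vs-rest. In practice, library implementations such as scikit-learn's `LatentDirichletAllocation` and `SVC` would likely replace the hand-written routines.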
