An Improved Weighted Naive Bayes Classification Algorithm Using Feature Correlation

Abstract

The naive Bayes classification algorithm assumes strong independence among features, an assumption that is rarely satisfied in practice. To relax this assumption to some extent, this paper proposes an improved weighted naive Bayes classification algorithm based on feature correlation. The algorithm adopts a new weighting method called TF-IDF-FC. Building on the traditional term frequency-inverse document frequency (TF-IDF) weight, it takes into account how each feature is distributed within and between classes, and it also uses the correlation between features to adjust the computed weight, so that the features that best represent their class receive larger weights. Compared with weighted naive Bayes classification based on the traditional TF-IDF weight, and with other commonly used weighted naive Bayes classification algorithms such as attribute-weighted naive Bayes classification, the proposed algorithm improves classification performance to a certain extent.
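To make the baseline concrete, the following is a minimal sketch of a weighted naive Bayes text classifier that uses the plain TF-IDF weight, i.e. the method the abstract compares against. It is not the paper's TF-IDF-FC scheme: the abstract does not give the class-distribution or feature-correlation adjustments, so they are left out. The function names `train` and `predict`, the smoothed IDF form, the Laplace smoothing, and the toy data are all assumptions made for illustration. The decision rule used is one common formulation, score(c) = log P(c) + sum over terms of w_t * log P(t | c).

```python
import math
from collections import Counter, defaultdict


def train(docs, labels):
    """Fit multinomial naive Bayes statistics plus per-term IDF values.

    docs   : list of token lists
    labels : list of class labels, one per document
    """
    n_docs = len(docs)
    vocab = sorted({t for d in docs for t in d})

    # Document frequency of each term, used for the IDF part of the weight.
    df = Counter(t for d in docs for t in set(d))
    # Assumption: the "+ 1.0" keeps the weight positive even for terms
    # that occur in every document.
    idf = {t: math.log(n_docs / df[t]) + 1.0 for t in vocab}

    class_sizes = Counter(labels)
    term_counts = defaultdict(Counter)
    for d, y in zip(docs, labels):
        term_counts[y].update(d)

    log_prior = {c: math.log(n / n_docs) for c, n in class_sizes.items()}
    log_like = {}
    for c in class_sizes:
        total = sum(term_counts[c].values())
        # Laplace (add-one) smoothing so unseen terms get nonzero probability.
        log_like[c] = {
            t: math.log((term_counts[c][t] + 1) / (total + len(vocab)))
            for t in vocab
        }
    return log_prior, log_like, idf


def predict(model, doc):
    """Return the class with the highest weighted log-posterior score."""
    log_prior, log_like, idf = model
    # Terms outside the training vocabulary are ignored.
    tf = Counter(t for t in doc if t in idf)
    scores = {
        c: log_prior[c] + sum(tf[t] * idf[t] * log_like[c][t] for t in tf)
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Hypothetical toy corpus, for illustration only.
DOCS = [
    ["ball", "goal", "team"],
    ["goal", "match", "team"],
    ["vote", "party", "law"],
    ["law", "court", "vote"],
]
LABELS = ["sport", "sport", "politics", "politics"]

model = train(DOCS, LABELS)
print(predict(model, ["goal", "team"]))
print(predict(model, ["law", "vote"]))
```

A scheme along the lines the abstract describes would replace the `tf[t] * idf[t]` factor with a weight that also depends on the class distribution of the term and on its correlation with other terms; the rest of the decision rule could stay as it is.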
