Abstract: Density-based clustering is a family of clustering-analysis methods whose main advantages are that it can discover clusters of arbitrary shape and is insensitive to noisy data. This paper proposes CGDSPT (Clustering based on Grid-Density and Spatial Partition Tree), a new clustering algorithm based on grid density and a spatial partition tree. Its key idea is to divide the data space into a number of cells of equal volume, define concepts such as density and cluster on these cells, and build a spatial index structure over the cells based on spatial partitioning (the spatial partition tree), which is then used to cluster the data. CGDSPT retains the advantages of density-based clustering noted above and, in addition, has linear time complexity, which makes it suitable for mining large-scale data. Theoretical analysis and experimental results confirm these advantages of CGDSPT.

Funding: Phase II of Xiamen University's "985 Project", project "Intelligent Innovation Platform for National Defense Informatization Security".
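The grid-density idea summarized above can be illustrated with a minimal Python sketch. It is not the authors' CGDSPT implementation: the abstract does not specify the spatial partition tree, so a plain dictionary keyed by cell coordinates stands in for it. The function name `grid_density_clusters`, the cell side length `cell_size`, the density threshold `min_density`, and the rule that dense cells touching at a face, edge, or corner belong to the same cluster are all assumptions made for illustration.

```python
from collections import defaultdict, deque
from itertools import product
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]


def grid_density_clusters(
    points: Sequence[Sequence[float]],
    cell_size: float,
    min_density: int,
) -> List[int]:
    """Label each point with a cluster id, or -1 if it lies in a sparse cell.

    Hypothetical sketch: a dict replaces the paper's spatial partition tree.
    """
    if not points:
        return []
    dims = len(points[0])

    # Step 1: map every point to an equal-volume cell (hypercube of side cell_size).
    cell_of: List[Cell] = [tuple(int(x // cell_size) for x in p) for p in points]

    # Step 2: cell density = number of points falling in the cell.
    density: Dict[Cell, int] = defaultdict(int)
    for cell in cell_of:
        density[cell] += 1
    dense = {cell for cell, count in density.items() if count >= min_density}

    # Step 3: merge touching dense cells into clusters with a breadth-first search.
    offsets = [o for o in product((-1, 0, 1), repeat=dims) if any(o)]
    cluster_of_cell: Dict[Cell, int] = {}
    next_id = 0
    for start in sorted(dense):  # sorted so cluster ids are deterministic
        if start in cluster_of_cell:
            continue
        cluster_of_cell[start] = next_id
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for off in offsets:
                nb = tuple(c + d for c, d in zip(cell, off))
                if nb in dense and nb not in cluster_of_cell:
                    cluster_of_cell[nb] = next_id
                    queue.append(nb)
        next_id += 1

    # Step 4: points inherit their cell's cluster id; sparse cells are noise (-1).
    return [cluster_of_cell.get(cell, -1) for cell in cell_of]


if __name__ == "__main__":
    pts = [(0.1, 0.2), (0.3, 0.4), (0.5, 0.1),  # cell (0, 0)
           (1.2, 0.3), (1.4, 0.6), (1.1, 0.9),  # cell (1, 0), adjacent to (0, 0)
           (5.0, 5.0), (5.2, 5.1), (5.4, 5.3),  # cell (5, 5)
           (9.0, 0.0)]                          # isolated point
    print(grid_density_clusters(pts, cell_size=1.0, min_density=2))
    # prints [0, 0, 0, 0, 0, 0, 1, 1, 1, -1]
```

In this sketch each point is touched a constant number of times, so the point-processing part is linear in the number of points; the neighbour search adds 3^d lookups per dense cell in d dimensions, which grows quickly with d. Whether the paper's spatial partition tree yields the same bounds is not shown here.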