A clustering algorithm based on the grid-density and the spatial partition tree

Abstract

Density-based clustering is one family of cluster-analysis methods; its main advantages are that it can discover clusters of arbitrary shape and is insensitive to noisy data. This paper proposes CGDSPT (Clustering based on Grid-Density and Spatial Partition Tree), a new clustering algorithm based on grid density and a spatial partition tree. Its novelty is that the data space is divided into multiple unit cells of equal volume, concepts such as density and cluster are defined on those cells, and a spatial index structure based on spatial partitioning (the spatial partition tree) is built over the cells and used to cluster the data. CGDSPT retains the advantages of density-based clustering noted above and, in addition, has linear time complexity, which makes it suitable for mining large-scale data. Theoretical analysis and experimental results also confirm these advantages.

Funding: Xiamen University "985" Project, Phase II, "Intelligent Innovation Platform for National Defense Information Security" …
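
The grid-density idea summarized in the abstract can be illustrated with a minimal sketch. It is not the CGDSPT implementation: the abstract does not specify the spatial partition tree, so a plain dictionary keyed by cell coordinates stands in for it here, and the names `grid_density_clusters`, `cell_size`, and `min_pts` are hypothetical. The sketch assigns points to equal-volume cells, treats a cell as dense when it holds at least `min_pts` points, and merges adjacent dense cells into clusters.

```python
from collections import defaultdict, deque
from itertools import product


def grid_density_clusters(points, cell_size, min_pts):
    """Label points by grouping adjacent dense grid cells.

    Illustrative only: cell_size and min_pts are assumed parameters, and a
    dict keyed by cell coordinates replaces the paper's spatial partition tree.
    """
    # 1. Assign each point to an equal-volume cell in one pass over the data.
    cells = defaultdict(list)
    for idx, point in enumerate(points):
        key = tuple(int(x // cell_size) for x in point)
        cells[key].append(idx)

    # 2. A cell counts as dense if it holds at least min_pts points.
    dense = {key for key, members in cells.items() if len(members) >= min_pts}

    # 3. Breadth-first search over neighbouring dense cells;
    #    each connected group of dense cells becomes one cluster.
    labels = [-1] * len(points)  # -1 marks points left as noise
    seen = set()
    cluster_id = 0
    for start in cells:  # dict order = order in which cells were first filled
        if start not in dense or start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for idx in cells[cell]:
                labels[idx] = cluster_id
            # Visit every cell whose coordinates differ by at most 1 per axis.
            for offset in product((-1, 0, 1), repeat=len(cell)):
                neighbour = tuple(c + o for c, o in zip(cell, offset))
                if neighbour in dense and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        cluster_id += 1
    return labels


points = [
    (0.1, 0.1), (0.2, 0.3), (0.4, 0.2),   # dense cell (0, 0)
    (1.1, 0.2), (1.3, 0.4), (1.2, 0.1),   # dense cell (1, 0), adjacent to (0, 0)
    (5.1, 5.2), (5.3, 5.4), (5.2, 5.1),   # dense cell (5, 5), isolated
    (9.5, 0.5),                           # sparse cell, treated as noise
]
labels = grid_density_clusters(points, cell_size=1.0, min_pts=3)
print(labels)  # [0, 0, 0, 0, 0, 0, 1, 1, 1, -1]
```

The work is one pass over the points plus a traversal of the occupied cells, which is in the spirit of the linear-time claim. Note that the neighbour scan in this sketch visits 3^d cells per dense cell, so it is only practical in low dimensions.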
