
    Research on Web Data Integration Technology Based on XML

    This paper first discusses the background and goals of heterogeneous data integration technology in the Web environment, together with the related concepts, technologies, and methods. It then presents the key factors in XML-based information integration, building on the characteristics that make XML a suitable data interchange format. On this basis, it proposes an XML-based interoperation model for Web data integration and discusses how data are exchanged and shared under this model. Finally, it gives the structure and components of the integration framework.
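    The abstract does not give the model's concrete schema, so the following is only a minimal sketch of the general idea: records from two hypothetical sources with different field names are mapped onto one shared XML vocabulary using Python's standard `xml.etree.ElementTree`. The source names, field names, and the `MAPPINGS` table are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical records from two heterogeneous sources that describe the same
# kind of entity with different field names.
hr_db_rows = [{"emp_id": 1, "emp_name": "Li"}]
web_api_records = [{"id": "2", "fullName": "Wang"}]

# Hypothetical mappings from each source's local field names to the shared
# XML element names.
MAPPINGS = {
    "hr_db": {"emp_id": "id", "emp_name": "name"},
    "web_api": {"id": "id", "fullName": "name"},
}


def append_records(root, source, records):
    """Convert one source's records into <person> elements under root."""
    mapping = MAPPINGS[source]
    for record in records:
        person = ET.SubElement(root, "person", source=source)
        for local_field, value in record.items():
            ET.SubElement(person, mapping[local_field]).text = str(value)


def integrate():
    """Build one XML document that combines both sources."""
    root = ET.Element("people")
    append_records(root, "hr_db", hr_db_rows)
    append_records(root, "web_api", web_api_records)
    return root


if __name__ == "__main__":
    print(ET.tostring(integrate(), encoding="unicode"))
```

    Keeping the per-source mapping separate from the conversion code means a new source only needs a new mapping entry, which is the usual motivation for using XML as a common exchange format.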

    A Method to Improve Text Clustering Algorithm Quality

    The main problem with text clustering algorithms based on the vector space model (VSM) is that they ignore the semantic information between words and the relationships among dimensions, which makes text similarity calculations inaccurate. To improve clustering quality, this paper proposes computing document similarity from semantic distance and clustering in two phases. First, documents are analyzed semantically and clustered for the first time with a nearest-neighbor algorithm. Next, cluster feature words are kept or eliminated according to their similarity weights, and clusters are merged. Finally, a second clustering pass resolves the nearest-neighbor algorithm's sensitivity to the input order of documents. Experimental results show that the proposed method significantly improves clustering precision and recall and largely resolves the problems of VSM-based text clustering. Supported by the National Natural Science Foundation of China (50474033).
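    A minimal sketch of the two-phase idea follows. It is not the paper's method: plain cosine similarity on word counts stands in for the semantic distance, the feature-word selection and cluster-merging steps are omitted, and the `threshold` value is an assumption. Phase one is a single-pass nearest-neighbor clustering whose result depends on document order; phase two reassigns every document to its most similar phase-one centroid, which is one simple way to reduce that order sensitivity.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Average of several sparse vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {t: w / len(vectors) for t, w in total.items()}


def two_pass_cluster(docs, threshold=0.3):
    """Return one cluster label per document."""
    vecs = [Counter(doc.lower().split()) for doc in docs]

    # Phase 1: single-pass nearest-neighbour clustering. A document joins the
    # cluster holding its most similar earlier document if that similarity
    # reaches the threshold; otherwise it starts a new cluster.
    clusters = []
    for i, vec in enumerate(vecs):
        best, best_sim = None, threshold
        for members in clusters:
            sim = max(cosine(vec, vecs[j]) for j in members)
            if sim >= best_sim:
                best, best_sim = members, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)

    # Phase 2: reassign every document to its most similar phase-1 centroid.
    centroids = [centroid([vecs[j] for j in members]) for members in clusters]
    return [max(range(len(centroids)), key=lambda k: cosine(vec, centroids[k]))
            for vec in vecs]
```

    For example, `two_pass_cluster(["apple banana fruit", "banana fruit salad", "linux kernel code", "kernel code review"])` places the first two and the last two documents in separate clusters.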

    Materialized view selection under maintenance cost constraint

    To minimize the total query processing cost for a given set of queries under a maintenance cost constraint, this paper proposes a min/max candidate set transformation algorithm. The algorithm constructs a maximum candidate view set (MACVS) and a minimum candidate view set (MICVS), and obtains the best set of materialized views through a cost model built on a minimal-effective maximum-cardinality matching technique and on query benefit per unit of maintenance cost. Theoretical analysis and experimental results show that the algorithm is efficient, dynamic, and approximately optimal. Compared with previous algorithms, it runs more efficiently when the data have many dimensions and complex dimension hierarchies. Supported by the National Natural Science Foundation of China (50474033).
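    The cost model ranks views by query benefit per unit of maintenance cost. The sketch below shows only that ranking criterion as a plain greedy selection under a maintenance budget; it is not the min/max candidate set transformation algorithm, and it assumes each view's benefit and maintenance cost are known and independent of the other chosen views, a simplification real view-selection models avoid. The `View` fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class View:
    name: str
    query_benefit: float     # assumed: estimated query-cost saving if materialized
    maintenance_cost: float  # assumed: estimated update cost, must be > 0


def select_views(candidates, maintenance_budget):
    """Greedy selection by query benefit per unit maintenance cost.

    Benefits are treated as independent of each other (a simplification);
    views that would push the total past the budget are skipped.
    """
    ranked = sorted(candidates,
                    key=lambda v: v.query_benefit / v.maintenance_cost,
                    reverse=True)
    chosen, used = [], 0.0
    for view in ranked:
        if used + view.maintenance_cost <= maintenance_budget:
            chosen.append(view)
            used += view.maintenance_cost
    return chosen
```

    With views A (benefit 100, cost 10), B (60, 3), C (90, 20), and D (30, 5) and a budget of 20, the ratios order them B, A, D, C; the greedy pass takes B, A, and D and skips C, whose cost would exceed the budget.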

    An Efficient Teaching-Learning-Based Community Detection Algorithm

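    Community structure is an important characteristic of complex networks, and community detection aims to uncover it. To improve the accuracy and reduce the time complexity of the multi-objective teaching-learning-based community detection algorithm MODTLBO/D, this paper proposes E-MODTLBO/D, a variant of MODTLBO/D that uses a multi-population evolution strategy. In E-MODTLBO/D, an adaptive learning factor strengthens exploration and search in the teaching phase; in the learning phase, each individual adopts either a random learning strategy or an improved quantum-behaved learning strategy within its own subpopulation. After each iteration, the subpopulations exchange information to maintain diversity and avoid premature convergence. Experiments show that E-MODTLBO/D outperforms MODTLBO/D and several classical community detection algorithms in time complexity and in finding high-quality community structures. Supported by a Major Project of the National Social Science Fund of China (13&ZD148).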
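    E-MODTLBO/D builds on teaching-learning-based optimization (TLBO). The sketch below shows only the standard single-objective TLBO loop on a continuous test function (the sphere function, chosen for illustration): in the teacher phase each learner moves by a random fraction of the gap between the best learner and the class mean scaled by a teaching factor of 1 or 2, and in the learner phase it moves toward a better random classmate or away from a worse one. The adaptive learning factor, subpopulations, quantum-behaved learning, community encoding, and multi-objective decomposition are not reproduced; `pop_size`, `iters`, and `bounds` are assumptions.

```python
import random


def sphere(x):
    """Test objective with its minimum 0 at the origin."""
    return sum(v * v for v in x)


def tlbo(f, dim, bounds, pop_size=20, iters=100, seed=0):
    """Minimize f with basic teaching-learning-based optimization."""
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(x):
        return [min(max(v, lo), hi) for v in x]

    def try_replace(i, cand):
        # Greedy acceptance: keep the new solution only if it is better.
        cand = clip(cand)
        fc = f(cand)
        if fc < fit[i]:
            pop[i], fit[i] = cand, fc

    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(iters):
        for i in range(pop_size):
            # Teacher phase: shift by a random fraction of the gap between the
            # best learner (teacher) and the class mean scaled by TF in {1, 2}.
            teacher = pop[min(range(pop_size), key=fit.__getitem__)]
            mean = [sum(x[d] for x in pop) / pop_size for d in range(dim)]
            tf = rng.randint(1, 2)
            try_replace(i, [pop[i][d] + rng.random() * (teacher[d] - tf * mean[d])
                            for d in range(dim)])

            # Learner phase: move away from a worse random classmate,
            # otherwise toward it.
            j = rng.choice([k for k in range(pop_size) if k != i])
            sign = -1 if fit[i] < fit[j] else 1
            try_replace(i, [pop[i][d] + rng.random() * sign * (pop[j][d] - pop[i][d])
                            for d in range(dim)])

    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]
```

    Calling `tlbo(sphere, 2, (-5.0, 5.0))` returns a point near the origin; for community detection, the candidate vectors would instead have to encode a node-to-community assignment.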

    An Improved DBSCAN Clustering Algorithm

    To address the shortcomings of the density-based spatial clustering of applications with noise (DBSCAN) algorithm, this paper improves it with a divide-and-conquer strategy and efficient parallel methods. Partitioning the data and applying divide and conquer reduces the influence of the global parameter Eps; parallel processing and dimensionality reduction improve clustering efficiency and lower DBSCAN's high memory requirements; and incremental processing handles the effect of inserting and deleting data objects on the clustering. The results show that the new method effectively resolves these problems of DBSCAN, and that both its clustering efficiency and its clustering quality are clearly better than those of the original DBSCAN algorithm. Supported by the Natural Science Foundation of Fujian Province (A0310008) and the Key Project of the Fujian Provincial High-Tech Research Open Program (2003H043).
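    The divide-and-conquer idea can be sketched as follows, under simplifying assumptions that are not from the paper: the data are split into vertical strips at hypothetical ascending `cuts`, each strip is clustered with its own Eps, and the strip results are relabeled into one global labeling. Clusters that span a strip boundary are not merged, and the dimensionality-reduction and incremental steps are omitted.

```python
import math
from concurrent.futures import ThreadPoolExecutor

NOISE = -1


def dbscan(points, eps, min_pts):
    """Plain DBSCAN with Euclidean distance; returns one label per point."""
    labels = [None] * len(points)

    def region(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise reached from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neighbours = region(j)
            if len(neighbours) >= min_pts:  # only core points expand the cluster
                queue.extend(neighbours)
        cluster += 1
    return labels


def partitioned_dbscan(points, cuts, eps_per_part, min_pts):
    """Cluster each x-strip (cuts ascending) with its own Eps, then relabel.

    Hypothetical simplification: clusters crossing a strip boundary are not merged.
    """
    parts = [[] for _ in range(len(cuts) + 1)]
    for idx, p in enumerate(points):
        parts[sum(p[0] >= c for c in cuts)].append(idx)

    def run(k):
        return dbscan([points[i] for i in parts[k]], eps_per_part[k], min_pts)

    with ThreadPoolExecutor() as pool:
        results = list(pool.map(run, range(len(parts))))

    labels, offset = [NOISE] * len(points), 0
    for idxs, local in zip(parts, results):
        for i, lab in zip(idxs, local):
            labels[i] = NOISE if lab == NOISE else lab + offset
        offset += max(local, default=NOISE) + 1
    return labels
```

    With a dense group near x = 0 and a sparse group near x = 100, a single global Eps of 0.15 marks the sparse group as noise, while per-strip values of 0.15 and 1.5 recover both groups. `ThreadPoolExecutor` only illustrates the structure; because of CPython's global interpreter lock, real speedups for this CPU-bound work would need processes or separate machines.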

    Distributed Clustering Algorithm Based on Centers and Density

    To address the shortcomings of the distributed clustering algorithm DBDC, this paper proposes DCUCD, a distributed clustering algorithm based on centers and density. Virtual points computed from the data distribution serve as core objects, and their representativeness improves as the algorithm is run more times; clustering then amounts to classifying all core objects. Theoretical analysis and experimental results show that DCUCD effectively handles noise and irregularly distributed data points, discovers clusters of arbitrary shape, and achieves good time efficiency and clustering quality. Supported by the National Natural Science Foundation of China (50604012).
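    The following is a hypothetical sketch of the site/server split, not DCUCD itself: each site summarizes its data as grid-cell centroids weighted by point counts (standing in for the virtual points), and the server treats virtual points with enough weight as core objects and links those within `eps` of each other using union-find, so that clustering reduces to labeling the core objects. The grid size `cell`, `eps`, and `min_weight` are assumptions.

```python
import math
from collections import defaultdict


def local_virtual_points(points, cell):
    """Site-side step: summarise local 2-D data as (centroid, count) per grid cell."""
    cells = defaultdict(list)
    for x, y in points:
        cells[(math.floor(x / cell), math.floor(y / cell))].append((x, y))
    return [((sum(p[0] for p in ps) / len(ps), sum(p[1] for p in ps) / len(ps)), len(ps))
            for ps in cells.values()]


def global_cluster(virtual_points, eps, min_weight):
    """Server-side step: heavy virtual points become core objects; core objects
    closer than eps are joined. Returns (core_object, cluster_label) pairs."""
    core = [c for c, w in virtual_points if w >= min_weight]
    parent = list(range(len(core)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(core)):
        for j in range(i + 1, len(core)):
            if math.dist(core[i], core[j]) <= eps:
                parent[find(i)] = find(j)

    roots = {}  # union-find root -> consecutive cluster label
    return [(c, roots.setdefault(find(i), len(roots))) for i, c in enumerate(core)]
```

    Only the weighted virtual points travel from the sites to the server, which is the usual reason for this kind of split in distributed clustering; low-weight virtual points are dropped here as noise.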

    Research and application of DBSCAN clustering algorithm based on density

    This paper first studies the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm in depth, analyzing its characteristics, its existing problems, and ideas for improving it. It then proposes a DBSCAN-based method for identifying locations and road sections with frequent traffic accidents (accident black spots), together with ideas for improving the method, and presents an example that illustrates the processing steps and demonstrates feasibility. Experimental results show that the proposed method can greatly improve the efficiency of black-spot identification. Supported by the Natural Science Foundation of Fujian Province of China (Grant No. A0310008) and the Key Project of the Fujian Provincial High-Tech Research Open Program (No. 2003H043).
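    A minimal DBSCAN sketch for accident records given as (latitude, longitude) pairs follows. It measures distance with the haversine great-circle formula in meters and ranks the resulting clusters by accident count as candidate black spots; the mean Earth radius constant, the ranking rule, and any `eps_m` and `min_pts` values are assumptions for illustration, not the paper's procedure.

```python
import math
from collections import Counter

NOISE = -1
EARTH_RADIUS_M = 6_371_000  # mean Earth radius (approximation)


def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def dbscan(points, eps_m, min_pts):
    """DBSCAN over (lat, lon) points; returns one label per point, -1 for noise."""
    labels = [None] * len(points)

    def region(i):
        return [j for j, q in enumerate(points) if haversine_m(points[i], q) <= eps_m]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neighbours = region(j)
            if len(neighbours) >= min_pts:  # core point: keep expanding
                queue.extend(neighbours)
        cluster += 1
    return labels


def black_spots(labels):
    """Clusters ranked by number of accidents, as (cluster_label, count) pairs."""
    return Counter(lab for lab in labels if lab != NOISE).most_common()
```

    For example, three accidents about 11 m apart along a road form one cluster with `eps_m=15, min_pts=3`, while an isolated accident is labeled noise.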

    Effective increment algorithm for attribute reduction

    To address problems in computing the attribute core and attribute reductions in rough sets, this paper first gives an improved definition of the discernibility matrix. Based on it, an incremental core-updating algorithm is proposed to update the core when objects are added dynamically. In addition, to reduce the time and space complexity of existing incremental attribute reduction algorithms, it proposes an efficient attribute reduction algorithm that does not store the discernibility matrix, for updating attribute reductions when objects are added dynamically. Theoretical analysis and experimental results both demonstrate the effectiveness and feasibility of the proposed algorithms. Supported by the National Natural Science Foundation of China (50604012).
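    The sketch below illustrates incremental core updating with the classic discernibility matrix, whose singleton entries give the core for consistent decision tables; the paper's improved definition is not reproduced, and the table layout (one dictionary per object) is an assumption. When an object is added, only its comparisons with the existing objects are new, so the updated core is the old core plus any new singleton entries. Like the paper's reduction algorithm, the sketch computes entries on the fly instead of storing the matrix, but it updates only the core, not a full reduct.

```python
from itertools import combinations


def entry(x, y, cond_attrs, decision):
    """Classic discernibility-matrix entry: condition attributes on which x and y
    differ, or None when their decisions agree."""
    if x[decision] == y[decision]:
        return None
    return frozenset(a for a in cond_attrs if x[a] != y[a])


def core_from_scratch(table, cond_attrs, decision):
    """Core = union of singleton entries over all object pairs."""
    core = set()
    for x, y in combinations(table, 2):
        e = entry(x, y, cond_attrs, decision)
        if e is not None and len(e) == 1:
            core |= e
    return core


def add_object(table, core, new_obj, cond_attrs, decision):
    """Incremental update: compare only the new object with the existing ones."""
    core = set(core)
    for x in table:
        e = entry(x, new_obj, cond_attrs, decision)
        if e is not None and len(e) == 1:
            core |= e
    table.append(new_obj)
    return core
```

    For a three-object table whose core is {a}, adding an object that differs from one existing object only in attribute c, with a different decision, extends the core to {a, c}; recomputing from scratch gives the same result.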

    Dynamic Materialized View Algorithm Based on Rough Set Clustering

    Motivated by the diversity of user queries, this paper proposes a rough set clustering-based dynamic materialized view adjustment algorithm (RSCDMV). The algorithm dynamically adjusts materialized views after clustering them with rough sets, which both meets the demand for diverse queries and takes the hierarchical relationships among dimensions into account. Experimental results show that as the user query set grows, its dynamism and diversity become more pronounced, so the advantage of RSCDMV increases. Supported by the Natural Science Foundation of Fujian Province (A0310008) and the Fund of the Fujian Provincial High-Tech Research Open Program (2003H043).
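    Rough set clustering starts from the indiscernibility relation: objects that agree on the chosen attributes fall into the same equivalence class, and a target set is then described by its lower and upper approximations. The sketch below applies this to a hypothetical query log in which each query is described by the dimension levels it uses; the attribute names and the idea of serving one equivalence class with one materialized view are illustrative assumptions, not RSCDMV's published procedure.

```python
from collections import defaultdict


def indiscernibility_classes(objects, attrs):
    """Group object ids that agree on every attribute in attrs."""
    classes = defaultdict(list)
    for oid, values in objects.items():
        classes[tuple(values[a] for a in attrs)].append(oid)
    return dict(classes)


def approximations(classes, target):
    """Rough-set lower and upper approximations of a target set of ids."""
    target = set(target)
    lower = {o for members in classes.values() if set(members) <= target for o in members}
    upper = {o for members in classes.values() if set(members) & target for o in members}
    return lower, upper
```

    Grouping a log by `("time", "region")` puts two month/province queries in one class; with a coarser grouping by `("time",)` alone, a target set that contains only one of two day-level queries has that pair in its upper approximation but not its lower one.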