56 research outputs found

Heuristic Chinese sentence compression algorithm based on hot word

Authors: 张东站, 韩静
Publication date: 15/02/2014

Most traditional sentence compression methods rely on parallel "original sentence / compressed sentence" corpora, which are hard to obtain. This paper therefore proposes a Chinese sentence compression algorithm that does not depend on an aligned corpus. By studying human-produced compressions and drawing on linguistic knowledge, the authors derive two sets of compression rules, one at the word level and one at the clause level. The algorithm applies both rule sets to the parse tree of the original sentence and to the dependency relations between its words. To make the algorithm more adaptable and accurate, word heat (how "hot" a word currently is) is introduced to strengthen the compression. A final step cleans up the sentence and repairs its grammar to produce the compressed result. Three methods are compared: human compression, rule-only compression, and hot-word-enhanced compression, evaluated on compression rate, grammaticality, informativeness, and heat. The experimental results show that the hot-word-based heuristic algorithm raises the heat of the compressed sentences with little loss in compression rate, grammaticality, or informativeness.

Funding: National Natural Science Foundation of China (No. 50604012)

Xiamen University Institutional Repository

Effective increment algorithm for attribute reduction

Authors: 冯少荣, 张东站
Publication date: 01/01/2011

To address shortcomings in computing the attribute core and attribute reduction in rough sets, this paper first gives an improved definition of the discernibility matrix. On that basis it proposes an incremental core-updating algorithm, which handles updating the core when objects are added dynamically. To lower the time and space complexity of existing incremental attribute reduction algorithms, it also proposes an efficient attribute reduction algorithm that does not store the discernibility matrix, which handles updating the reduction when objects are added dynamically. Theoretical analysis and experimental results both show that the proposed algorithms are effective and feasible.

Funding: National Natural Science Foundation of China (No. 50604012)

Xiamen University Institutional Repository

Distributed Clustering Algorithm Based on Centers and Density

Authors: 冯少荣, 张东站
Publication date: 01/01/2010

To overcome the shortcomings of the distributed clustering algorithm DBDC, this paper proposes DCUCD, a distributed clustering algorithm based on centers and density. Virtual points computed from the data distribution serve as core objects, and these core objects become more representative the more times the algorithm is run. Clustering then amounts to classifying all of the core objects. Theoretical analysis and experimental results show that DCUCD handles noise and irregularly distributed data points effectively, discovers clusters of arbitrary shape, and achieves good time efficiency and clustering quality.

Funding: National Natural Science Foundation of China (No. 50604012)

Xiamen University Institutional Repository

Sentence similarity computing based on relation vector model

Authors: 张东站, 殷耀明
Publication date: 22/08/2013

Sentence similarity computation plays an important role across natural language processing. Some traditional methods consider only surface information such as word form, sentence length, and word order, and ignore deeper semantic information. Other methods that do consider semantics perform poorly in practice. Building on the vector space model, this paper proposes a relation vector model that takes both sentence structure and semantic information into account. The model considers the collocation relations between the keywords of a sentence and the synonym information of those keywords. This information reflects the local structural components of the sentence and the relations among them, and so captures the structure and semantics of the sentence more fully. With the relation vector model at its core, a sentence similarity algorithm is proposed. The algorithm is also applied to automatic summarization of hot online news, where it removes sentences with similar meaning to avoid redundancy in the summary. The experimental results show that, for sentence similarity in online news, the relation vector model algorithm is both more accurate and lower in time complexity than an algorithm that considers word order and semantics.

Xiamen University Institutional Repository

Two New Efficient Algorithms to User Access Prediction

Authors: 冯少荣, 张东站
Publication date: 01/01/2010

To address the shortcomings of classic algorithms for user access prediction based on web log mining, this paper proposes a Markov chain and association rule prediction algorithm (MAPA). MAPA uses a second-order Markov chain to find the pages a user may visit in the next step or in the future, producing a candidate prediction set. It then uses two-item association rules to correct the Markov prediction from both the forward and the reverse direction, producing the final predicted page. In this way the algorithm combines the advantages of the Markov chain and of association rules. By introducing a user feedback mechanism, the paper also proposes a Markov prediction algorithm with feedback (MPAF). MPAF builds a history prediction tree (HPT) step by step during prediction, saves historical prediction information in the HPT, and judges whether each prediction was correct according to the user's feedback. During prediction, the second-order Markov algorithm first generates the candidate prediction set, and the historical prediction information is then used to adjust the prediction dynamically and produce the predicted page. Theoretical analysis shows that both algorithms predict in linear time. Experimental results show that MAPA and MPAF improve average prediction accuracy by 5% and 10%, respectively.

Funding: National Natural Science Foundation of China (No. 50604012)
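The candidate-generation step described in this abstract can be illustrated with a minimal sketch. It covers only the generic second-order Markov idea (predict the next page from the last two pages visited). It is not the authors' MAPA or MPAF implementation, and the function names, the session data, and the parameter `k` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_second_order(sessions):
    """Count how often each page follows each pair of consecutive pages."""
    transitions = defaultdict(Counter)
    for pages in sessions:
        for a, b, c in zip(pages, pages[1:], pages[2:]):
            transitions[(a, b)][c] += 1
    return transitions


def predict_candidates(transitions, last_two, k=2):
    """Return up to k of the most frequent next pages after the last two pages."""
    counts = transitions.get(tuple(last_two))
    if not counts:
        return []  # this pair of pages was never seen in the logs
    return [page for page, _ in counts.most_common(k)]


# Hypothetical click sessions standing in for parsed web logs.
sessions = [
    ["home", "list", "item", "cart"],
    ["home", "list", "item", "cart"],
    ["home", "list", "item", "reviews"],
    ["home", "list", "about"],
]
model = train_second_order(sessions)
candidates = predict_candidates(model, ["list", "item"])
```

A lookup is a single dictionary access on the last two pages, so prediction cost does not grow with the length of the session. The association-rule correction and the feedback tree described in the abstract are not modeled here.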

Xiamen University Institutional Repository

Research on XETL Process Based on Pattern Tree

Authors: 张东站, 郭有限
Publication date: 01/01/2009

Because XML data differs from traditional relational data, the ETL methods of traditional data warehouses no longer suit XML data, and no dedicated, effective ETL method for XML data currently exists. To address this, the paper proposes XETL, a pattern-tree-based process for transforming XML data. The XETL model is studied in terms of its data model and predicate patterns. Based on the model, the paper defines the four main transformation operations of the ETL process: attribute selection, null-value handling, aggregation, and attribute renaming.

Funding: National Natural Science Foundation of China (No. 50604012)

Xiamen University Institutional Repository

Increment algorithm for attribute reduction based on improvement of discernibility matrix

Authors: 冯少荣, 张东站
Publication date: 01/01/2012

To address the low efficiency of current methods for computing the attribute core and attribute reduction in rough sets, this paper proposes an incremental core-updating algorithm based on an improved discernibility matrix, which handles updating the core when objects are added dynamically. To lower the time and space complexity of existing incremental attribute reduction algorithms, it also proposes an efficient attribute reduction algorithm that does not store the discernibility matrix, which handles updating the reduction when objects are added dynamically. Theoretical analysis and experimental results show that the algorithm markedly reduces time and space complexity.

Funding: National Natural Science Foundation of China (No. 50604012)

Xiamen University Institutional Repository

Research of logistics intelligent stowage and loading algorithm design

Authors: 张东站, 蓝启明
Publication date: 01/01/2012

Based on the principles and characteristics of logistics stowage in road transport, and combining expert knowledge and strategies, this paper proposes a design approach and an implementation method for intelligent logistics stowage. It establishes an intelligent stowage model under multidimensional constraints such as waybill type, goods attributes, vehicle departure location, and loading requirements, and gives detailed definitions of stowage terms and stowage rules. A hybrid algorithm combining heuristic and greedy ideas is proposed to solve the goods packing problem. An intelligent stowage system for logistics transport is implemented in the ASP.NET development environment and tested on simulated freight data from several logistics companies. The test results show that the intelligent stowage method is effective and efficient.

Funding: National Natural Science Foundation of China (No. 50604012)
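As a rough illustration of what a greedy packing heuristic looks like, the sketch below uses first-fit decreasing, a standard textbook heuristic for one-dimensional bin packing. It is not the hybrid algorithm from the paper and ignores the waybill, goods-attribute, and location constraints the abstract mentions. The function name, the weights, and the single `capacity` value are assumptions.

```python
def first_fit_decreasing(weights, capacity):
    """Greedily assign item weights to trucks of equal capacity.

    Items are taken heaviest first and placed in the first truck with room;
    a new truck is opened only when no existing truck can take the item.
    """
    trucks = []  # each truck is a list of the item weights loaded on it
    for w in sorted(weights, reverse=True):
        if w > capacity:
            raise ValueError(f"item of weight {w} exceeds truck capacity {capacity}")
        for load in trucks:
            if sum(load) + w <= capacity:
                load.append(w)
                break
        else:
            trucks.append([w])  # no existing truck had room
    return trucks


# Hypothetical shipment: six items, trucks that carry 10 units each.
plan = first_fit_decreasing([4, 8, 1, 4, 2, 1], capacity=10)
```

First-fit decreasing is fast and usually close to optimal, but it gives no guarantee of the minimum number of trucks. A real stowage system would add further checks, such as volume or destination, to the placement condition.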

Xiamen University Institutional Repository

CFE: A Continued Fraction Based on Encoding for Dynamic XML Data

Authors: 张东站, 曾志民, 江弋
Publication date: 01/01/2009

This paper presents CFE, a continued-fraction-based encoding for dynamic XML data. It first introduces the concept of CFE encoding and then applies CFE to region encoding and prefix encoding. It next describes the update algorithm for CFE encoding. Finally, comparative experiments show that CFE encoding is feasible.

Funding: National Natural Science Foundation of China (No. 50604012)

Xiamen University Institutional Repository

Incremental updating algorithm for computing core based on improved discernibility matrix

Authors: 冯少荣, 张东站, 赖桃桃
Publication date: 01/01/2009

Analysis shows that the improved discernibility matrix given by Professor Yang Ming involves unnecessary computation. The paper therefore proposes an improved discernibility matrix definition and a method for computing the core. On this basis it proposes an incremental core-updating algorithm based on the improved discernibility matrix, which mainly addresses updating the core when objects are added dynamically. Theoretical analysis shows that the incremental core-updating algorithm has nearly linear time and space complexity, and the experimental results show that the algorithm is effective and feasible.

Funding: National Natural Science Foundation of China (No. 50604012)
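For readers unfamiliar with the terms, the sketch below shows the classic discernibility-matrix definition of the core: a condition attribute belongs to the core if it is the only attribute that distinguishes some pair of objects with different decisions. This is the textbook definition for a consistent decision table, not the improved matrix from the paper. The function names and the sample table are assumptions.

```python
from itertools import combinations


def differing_attributes(x, y, n_cond):
    """Indices of the condition attributes on which two objects differ."""
    return [i for i in range(n_cond) if x[i] != y[i]]


def core_attributes(rows, n_cond):
    """Core = attributes that appear as singleton discernibility entries.

    Each row holds n_cond condition values followed by one decision value.
    """
    core = set()
    for x, y in combinations(rows, 2):
        if x[-1] == y[-1]:
            continue  # same decision: the pair need not be discerned
        diff = differing_attributes(x, y, n_cond)
        if len(diff) == 1:
            core.add(diff[0])
    return core


def update_core(core, rows, new_row, n_cond):
    """Update the core after adding one object, without storing the matrix.

    Only the pairs that involve the new object are examined.
    """
    updated = set(core)
    for x in rows:
        if x[-1] != new_row[-1]:
            diff = differing_attributes(x, new_row, n_cond)
            if len(diff) == 1:
                updated.add(diff[0])
    return updated


# Hypothetical decision table: three condition attributes, one decision.
table = [
    (0, 0, 1, "no"),
    (0, 1, 1, "yes"),
    (1, 1, 0, "yes"),
    (1, 0, 0, "no"),
]
core = core_attributes(table, 3)
new_object = (1, 1, 1, "no")
core_after = update_core(core, table, new_object, 3)
```

Under this definition, adding an object can only add singleton entries, so the update compares the new object against the existing rows once instead of rebuilding all pairs. The sketch assumes the table stays consistent after the insertion; handling inconsistent tables needs a refined matrix definition.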

Xiamen University Institutional Repository