55 research outputs found

    New similarity measure for mining time series

    Proposes a new similarity measure, global characteristics, for whole-series clustering of time series. The raw data are replaced with 11 global characteristics drawn from three aspects (statistical distribution, non-linearity, and Fourier spectral transformation) to form a feature vector. Describing the original time series by this vector retains most of the original information and speeds up the clustering computation. The experiments compare four similarity measures on three databases under group-ward hierarchical clustering (the Chinese-language abstract reports two example datasets) and evaluate the results both objectively and subjectively. The measure is shown to yield useful and reasonable clusterings, especially for economic time series. Supported by the Xiamen University 985 Project Phase II Information Innovation Platform (No. 0000-X07204).
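
A minimal sketch of the general idea in the abstract above: summarise each series by a short vector of global features, then compare vectors instead of raw points. The five features used here (mean, standard deviation, skewness, kurtosis, dominant frequency) are illustrative assumptions; the abstract does not list the paper's 11 characteristics, and the names `global_features` and `feature_distance` are hypothetical.

```python
import cmath
import math


def global_features(series):
    """Return a small, illustrative feature vector for one time series."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    std = math.sqrt(var)
    # Standardised third and fourth moments (0.0 for a constant series).
    skew = sum((x - mean) ** 3 for x in series) / (n * std ** 3) if std else 0.0
    kurt = sum((x - mean) ** 4 for x in series) / (n * std ** 4) if std else 0.0
    # Naive O(n^2) DFT of the mean-removed series; keep the strongest frequency.
    best_k, best_power = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum((x - mean) * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(series))
        if abs(coeff) > best_power:
            best_k, best_power = k, abs(coeff)
    # Dominant frequency is expressed in cycles per sample.
    return [mean, std, skew, kurt, best_k / n]


def feature_distance(a, b):
    """Euclidean distance between the feature vectors of two series."""
    fa, fb = global_features(a), global_features(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))
```

In practice the features sit on different scales, so they would normally be normalised across the dataset before distances are fed to a clustering routine; that step is omitted here.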

    A k-means-based Algorithm for Soft Subspace Clustering

    Soft subspace clustering is an important branch and research hotspot of clustering research. Clustering in high-dimensional space is especially difficult because of the sparse distribution of the data and the curse of dimensionality. After analyzing the limitations of existing soft subspace clustering algorithms, the concept of subspace difference is introduced. On this basis, a new objective function is designed that takes into account both the within-cluster compactness and the subspace difference of the clusters, and a new k-means-type soft subspace clustering algorithm is presented. The algorithm requires no additional parameters to be set during clustering. Theoretical analysis and experimental results show that the proposed algorithm achieves better clustering accuracy than other soft subspace algorithms. Supported by the National Natural Science Foundation of China (No. 10771176).

    Malware Identification Technique and its Applications

    With the development of Internet technology and changes in the security situation, the number of malicious programs has grown exponentially and new variants appear endlessly. Traditional detection methods can no longer process this volume of data effectively or in time, so the traditional model of scanning and defending on the PC client no longer meets current security requirements, and major security vendors have begun to launch their own "cloud security" programs. Against this background, research on key techniques for malware detection is necessary. To address the large volume, rapid change, high dimensionality, and heavy noise of malware data, we investigate software behavior identification techniques in a cloud computing environment, including new methods for mining massive software samples and new models and algorithms for mining cluster patterns in event sequences, together with their application to malware identification. We also build a prototype of an intelligent malware identification system for cloud security and an architecture for a Chinese phishing website detection system. Supported by the National Natural Science Foundation of China (Research on event sequence mining methods for software behavior identification; No. 61175123) and the Shenzhen Special Fund for the Development of the Biology, Internet, and New Energy Industries (No. CXB201005250021A).

    Attribute reduction algorithm of rough set combined fuzzy set theory

    This paper studies attribute reduction in rough set theory combined with fuzzy relation theory, proposes a new attribute reduction algorithm, and gives an illustrative application example. Supported by the National Natural Science Foundation of China (60275023) and the Xiamen University Scientific Research Foundation (Y07002).

    Pattern Matching Method Based on Point Distribution for Multivariate Time Series

    Common methods for matching multivariate time series, such as the Euclid method and the PCA method, have difficulty capturing the global shape of a series: the Euclid method is not robust enough, while the PCA method is not suited to small-scale multivariate time series. This paper proposes a pattern matching method based on point distribution that characterizes the shape of multivariate time series effectively. First, the local important points of a multivariate time series sample are extracted as the pattern description. Then a feature pattern vector is constructed from the statistical distribution of these important points, and the Euclid norm is used to measure the similarity between two feature pattern vectors, which drives the pattern matching. The method thus makes full use of the global shape of the series. Experimental results show that matching based on point distribution characterizes the shape of multivariate time series effectively and can handle series data of various scales. Supported by the National Natural Science Foundation of China (No. 10771176) and the National 985 Project Phase II fund (No. 0000-X07204).

    3D Face Modeling Method for a Certain Person

    This paper describes a method for reconstructing a 3D face model of a specific person from two photographs, one frontal and one in profile. Key feature points are selected and a basic face model is deformed accordingly to produce a 3D mesh model of that person's face. Texture mapping is then applied to obtain the person's 3D face model. The method has been implemented on a PC, and the resulting 3D face model closely resembles the real face. Supported by the Natural Science Foundation of Fujian Province (A0410002).

    Regional Prediction of the Yellow River Delta Based on a Cellular Evolutionary Neural Network

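    A cellular genetic algorithm (cGA) is combined with a back-propagation (BP) neural network to build a cellular evolutionary neural network time series model. With multi-temporal remote sensing images from 1984 to 2000 as the main data source, the model is applied to predict the land area of the Yellow River estuary region and to analyze its future evolution. The results indicate that from 2001 to 2010 the land area of the study region will alternate between accretion and erosion, with the accretion peaks gradually declining. Supported by the Fujian Provincial Youth Science and Technology Talent Fund (2002 J005).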

    Frequency-supervised-breakpoint based discretization algorithm

    This paper proposes a discretization algorithm based on frequency-supervised breakpoints, applied to continuous condition attributes. The algorithm uses the proposed idea of frequency-supervised breakpoints to generate an initial set of breakpoints and then reduces that set. Experimental results show that the breakpoints produced by the algorithm not only agree with the actual data distribution but are also more reasonable and concise. Supported by the National Natural Science Foundation of China (No. 10771176).

    A Hierarchical Method for Determining the Number of Clusters

    A fundamental and difficult problem in cluster analysis is determining the "true" number of clusters in a dataset. The common trial-and-error method generally depends on a particular clustering algorithm and is inefficient on large datasets. This paper proposes a hierarchical method that avoids clustering the dataset repeatedly. The method first scans the dataset to obtain the CF (clustering feature) statistics, then agglomeratively generates partitions of the dataset at different hierarchical levels, bottom-up, and incrementally constructs a curve of clustering quality with respect to the varying partitions. The partition corresponding to the extremum of the curve is used to estimate the number of clusters. A new validity index is also presented to quantify clustering quality; it is independent of the clustering algorithm, emphasizes the geometric structure of clusters, and can handle noisy data and arbitrarily shaped clusters. Experimental results on both real-world and synthetic datasets show that the new method outperforms recently published approaches while significantly improving efficiency. Supported by the National Natural Science Foundation of China under Grant No. 10771176; the National 985 Project of China (Phase II platform fund) under Grant No. 0000-X07204; and the Scientific Research Foundation of Xiamen University of China under Grant No. 0630-X01117.
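
A minimal sketch of why CF statistics allow partitions to be merged without rescanning the data, assuming the BIRCH-style definition of a clustering feature as the triple (count, linear sum, sum of squares). The abstract does not spell out its CF definition, so this triple, the class name `CF`, and its methods are assumptions for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class CF:
    """BIRCH-style clustering feature: count, linear sum, sum of squared norms."""
    n: int
    ls: list   # per-dimension sum of the points
    ss: float  # sum of squared Euclidean norms of the points

    @classmethod
    def from_point(cls, point):
        return cls(1, list(point), sum(x * x for x in point))

    def merge(self, other):
        # CFs are additive, so merging two sub-clusters needs no raw data.
        return CF(self.n + other.n,
                  [a + b for a, b in zip(self.ls, other.ls)],
                  self.ss + other.ss)

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        # Root mean squared distance to the centroid: sqrt(SS/N - ||c||^2).
        c = self.centroid()
        return math.sqrt(max(self.ss / self.n - sum(x * x for x in c), 0.0))
```

For example, merging `CF.from_point([0, 0])` with `CF.from_point([2, 0])` gives a cluster of two points whose centroid and radius follow directly from the summed statistics, which is what makes a bottom-up pass over the partitions cheap.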

    Density Clustering Algorithm Based on Subspace Dimensional Weighting

    In clustering high-dimensional data, most existing algorithms perform poorly because of the curse of dimensionality. This paper presents StaDeCon, a density clustering algorithm for high-dimensional data. Building on the classic PreDeCon algorithm, it introduces a method for computing subspace dimensional weights, which avoids the problems caused by PreDeCon's use of a full-dimensional distance measure and improves clustering quality. Experimental results on both synthetic and real application datasets show that the algorithm achieves good clustering accuracy on high-dimensional data and is effective and feasible.