4 research outputs found

    Outlier detection based on random forest

    Abstract: An outlier detection method based on random forest is proposed. Simulation experiments show that, compared with two other distance-based outlier detection techniques, the method yields better model accuracy, is more robust, and significantly reduces computation time on large-scale datasets.

    Research on Customer Churn Prediction Model for Telecommunication Enterprises

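    With the booming mobile communications market, the three-way rivalry among China's established telecom operators, China Mobile, China Unicom, and China Telecom, has grown increasingly fierce. In recent years, mobile operators have suffered from price wars while also facing ever-rising marketing costs. Operators have therefore begun to consider how to shift from a development model that mainly pursues scale to one that balances scale and profitability. The key to this shift is controlling the customer churn rate, so customer churn prediction has become an important issue for the telecom industry. Building on traditional statistical methods and artificial intelligence methods, research on churn prediction in the telecom industry has produced many results, but difficulties remain: data come from many sources, relationships among data attributes are complex, and class sizes are imbalanced. Existing churn prediction research also lacks a scientific, systematic theoretical framework and methodology, and existing approaches based on a single...
    Degree: Doctor of Engineering. Department and major: School of Information Science and Technology, Department of Automation, Systems Engineering. Student ID: 2322008015056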

    Feature Extraction Method Based on Random Forest and Transduction

    A feature extraction method based on random forest and transductive inference is proposed. The steps are as follows: 1) build a random forest model from the labeled training samples; 2) feed the unlabeled test data into the random forest model to generate a proximity matrix over all data (training samples and test data); 3) apply multidimensional scaling to the proximity matrix to obtain a low-dimensional representation of all data, i.e. the low-dimensional features, in which the original high-dimensional data are more separable. Experimental results on UCI datasets show that, compared with principal component analysis (PCA), the method transfers the distribution information of the unlabeled test set into the proximity matrix, better characterizes the data distribution over the whole sample space, and thereby improves classifier performance. The effect of the number of extracted dimensions on model accuracy is also discussed, as a reference for practical applications.
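
    The three steps can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn's RandomForestClassifier, the Iris data as a stand-in for a UCI dataset, proximity defined as the fraction of trees in which two samples share a leaf, dissimilarity taken as 1 - proximity, and classical (eigendecomposition-based) multidimensional scaling. The helper classical_mds is a hypothetical name introduced here.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): Iris, split into labeled train and unlabeled test.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, _ = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Step 1: build the forest from the labeled training samples only.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Step 2: drop ALL samples (train + test) down the trees and compute the
# proximity matrix: the fraction of trees in which two samples share a leaf.
X_all = np.vstack([X_train, X_test])
leaves = forest.apply(X_all)  # shape: (n_samples, n_trees)
proximity = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)


def classical_mds(dissimilarity, n_components):
    """Classical MDS via double centering and eigendecomposition."""
    n = dissimilarity.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dissimilarity ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:n_components]
    # Clip small negative eigenvalues that arise from non-Euclidean input.
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


# Step 3: embed all samples in a low-dimensional space.
embedding = classical_mds(1.0 - proximity, n_components=2)
features_train = embedding[: len(X_train)]
features_test = embedding[len(X_train):]
```

    Because the test samples take part in the proximity matrix, their low-dimensional features reflect the distribution of the whole sample set, which is the transductive element the abstract describes. The number of components (2 here) is an arbitrary choice for illustration.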

    Customer-churn Prediction for Telecom Enterprises Based on Random Forest and One-class SVM

    To address customer churn in the telecom industry, a preliminary prediction model is built with the random forest method; its accuracy is clearly better than that of the prediction models previously used in the telecom industry. Because the model has many feature dimensions, a feature extraction method based on random forest and transductive inference is further proposed to reduce the dimensionality of the dataset, and a one-class support vector machine (OC-SVM) algorithm is introduced to obtain the final prediction model. Experiments show that the resulting churn prediction model achieves higher prediction accuracy and offers partial interpretability of its predictions.
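
    As a rough illustration of the final stage, the sketch below fits a one-class SVM on low-dimensional features. The abstract does not say how the OC-SVM is trained, so everything here is an assumption: the synthetic two-dimensional data, the choice to fit on retained customers only and treat outliers as predicted churners, and the kernel and nu settings.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Hypothetical low-dimensional features (assumption): retained customers
# cluster around the origin, churned customers are shifted away from it.
retained = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
churned = rng.normal(loc=6.0, scale=1.0, size=(20, 2))

# Fit the one-class model on the majority class only (assumption).
# nu bounds the fraction of training points treated as outliers.
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
model.fit(retained)

# predict() returns +1 for inliers and -1 for outliers; here an outlier
# is read as a predicted churner.
pred_retained = model.predict(retained)
pred_churned = model.predict(churned)

churn_recall = float((pred_churned == -1).mean())
false_alarm_rate = float((pred_retained == -1).mean())
print(f"churn recall: {churn_recall:.2f}, false alarms: {false_alarm_rate:.2f}")
```

    Training on a single class sidesteps the class imbalance typical of churn data, which is one plausible reason for choosing a one-class model; the source does not state the authors' motivation.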