Feature Extraction Method Based on Random Forest and Transduction

Abstract

This paper proposes a feature extraction method based on random forest and transductive inference. The method has three steps: 1) build a random forest model from the labelled training samples; 2) pass the unlabelled test data through the random forest model to generate a proximity (similarity) matrix over all the data (training samples and test data); 3) apply multidimensional scaling to this proximity matrix to obtain a low-dimensional representation of all the data, i.e. the low-dimensional features, so that the original high-dimensional data become more separable in the low-dimensional space. Experimental results on UCI data sets show that, compared with principal component analysis (PCA), the proposed method transfers the distribution information of the unlabelled test set into the proximity matrix, better characterizes the data distribution over the whole sample space, and thereby improves classifier performance, making it an effective feature extraction method. Finally, the paper discusses the effect of the number of extracted feature dimensions on model accuracy, providing a reference for practical applications.
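The three steps in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes scikit-learn and NumPy, defines proximity as the fraction of trees in which two samples fall into the same leaf (the abstract does not give the exact definition), treats 1 - proximity as a squared distance, and uses classical (Torgerson) multidimensional scaling. The function names `forest_proximity` and `classical_mds`, the Iris data set, and all parameter values are hypothetical choices made for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def forest_proximity(forest, X_all):
    """Fraction of trees in which each pair of samples shares a leaf."""
    leaves = forest.apply(X_all)  # leaf index per tree, shape (n_samples, n_trees)
    n_samples, n_trees = leaves.shape
    prox = np.zeros((n_samples, n_samples))
    for t in range(n_trees):
        column = leaves[:, t]
        prox += column[:, None] == column[None, :]
    return prox / n_trees


def classical_mds(sq_dist, n_components):
    """Classical MDS: embed points from a matrix of squared distances."""
    n = sq_dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ sq_dist @ centering  # double centering
    eigvals, eigvecs = np.linalg.eigh(gram)        # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))  # guard against negatives
    return eigvecs[:, top] * scale


# Illustrative data only; the paper reports results on UCI data sets.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Step 1: fit the forest on the labelled training samples only.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Step 2: proximity matrix over training and (unlabelled) test samples together.
X_all = np.vstack([X_train, X_test])
prox = forest_proximity(forest, X_all)

# Step 3: MDS on the dissimilarity 1 - proximity (assumed to act as a squared distance).
Z = classical_mds(1.0 - prox, n_components=2)
Z_train, Z_test = Z[: len(X_train)], Z[len(X_train):]
```

A classifier can then be trained on `Z_train` with `y_train` and evaluated on `Z_test`. The test labels are never used to build the features; only the test inputs enter the proximity matrix, which is what makes the procedure transductive.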
