Search CORE

2 research outputs found

Multi-test Decision Tree and its Application to Microarray Data Classification

Author: Armstrong
Berzal
Breiman
Breiman
Breiman
Brodley
Brown
Brown
Che
Chen
Cohen
Cordell
Cowell
Czajkowski
Demsar
Dettling
Diaz-Uriarte
Dramiński
Fayyad
Freund
Freund
Ge
Golub
Grześ
Hall
Hastie
Hu
Kuo
Li
Marcin Czajkowski
Marek Grześ
Marek Kretowski
Murthy
Murthy
Pagallo
Qu
Quinlan
Robnik-Siikonja
Rokach
Rokach
Sebastiani
Shalev-Shwartz
Shi
Tan
Tan
Wold
Yeoh
Publication venue: 'Elsevier BV'
Publication date: 01/05/2014
Field of study

Objective: The desirable property of tools used to investigate biological data is easy to understand models and predictive decisions. Decision trees are particularly promising in this regard due to their comprehensible nature that resembles the hierarchical process of human decision making. However, existing algorithms for learning decision trees have tendency to underfit gene expression data. The main aim of this work is to improve the performance and stability of decision trees with only a small increase in their complexity. Methods: We propose a multi-test decision tree (MTDT); our main contribution is the application of several univariate tests in each non-terminal node of the decision tree. We also search for alternative, lower-ranked features in order to obtain more stable and reliable predictions. Results: Experimental validation was performed on several real-life gene expression datasets. Comparison results with eight classifiers show that MTDT has a statistically significantly higher accuracy than popular decision tree classifiers, and it was highly competitive with ensemble learning algorithms. The proposed solution managed to outperform its baseline algorithm on

14

datasets by an average

6

percent. A study performed on one of the datasets showed that the discovered genes used in the MTDT classification model are supported by biological evidence in the literature. Conclusion: This paper introduces a new type of decision tree which is more suitable for solving biological problems. MTDTs are relatively easy to analyze and much more powerful in modeling high dimensional microarray data than their popular counterparts

Crossref

Kent Academic Repository

A General Framework of Feature Selection for Text Categorization

Author: G. Legrand
H. Peng
L. Polkowski
M. Mak
M. Robnik-Siikonja
P. Perner
Q. Zhou
S.J. Hong
T. Joactfims
Y. Liu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009
Field of study

Many feature selection methods have been proposed for text categorization. However, their performances are usually verified by experiments, so the results rely on the corpora used and may not be accurate. This paper proposes a novel feature selection framework called Distribution-Based Feature Selection (DBFS) based on distribution difference of features. This framework generalizes most of the state-of-the-art feature selection methods including OCFS, MI, ECE, IG, CHI and OR. The performances of many feature selection methods can be estimated by theoretical analysis using components of this framework. Besides, DBFS sheds light on the merits and drawbacks of many existing feature selection methods. In addition, this framework helps to select suitable feature selection methods for specific domains. Moreover, a weighted model based on DBFS is given so that suitable feature selection methods for unbalanced datasets can be derived. The experimental results show that they are more effective than CHI, IG and OCFS on both balanced and unbalanced datasets. ? 2009 Springer Berlin Heidelberg.EI

Crossref