
    CSNL: A cost-sensitive non-linear decision tree algorithm

    This article presents a new decision tree learning algorithm called CSNL that induces Cost-Sensitive Non-Linear decision trees. The algorithm is based on the hypothesis that non-linear decision nodes provide a better basis than axis-parallel decision nodes, and it utilizes discriminant analysis to construct non-linear decision trees that take account of the costs of misclassification. The performance of the algorithm is evaluated by applying it to seventeen data sets, and the results are compared with those obtained by two well-known cost-sensitive algorithms, ICET and MetaCost, which generate multiple trees to obtain some of the best results to date. The results show that CSNL performs at least as well as, if not better than, these algorithms on more than twelve of the data sets and is considerably faster. The use of bagging with CSNL further enhances its performance, showing the significant benefits of using non-linear decision nodes.
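The core idea of such a node, an oblique split derived from discriminant analysis whose threshold is chosen to minimise expected misclassification cost, can be sketched as follows. This is a minimal two-class illustration, not CSNL itself: the function names, the fixed `cost_fn`/`cost_fp` values, and the exhaustive threshold search are assumptions for the sketch.

```python
import numpy as np

def lda_direction(X, y):
    """Fisher discriminant direction for a two-class problem."""
    X0, X1 = X[y == 0], X[y == 1]
    # within-class scatter (sum of the two class covariance matrices)
    Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    w = np.linalg.solve(Sw, X1.mean(axis=0) - X0.mean(axis=0))
    return w / np.linalg.norm(w)

def cost_sensitive_split(X, y, cost_fn=5.0, cost_fp=1.0):
    """Project onto the LDA direction, then pick the threshold that
    minimises total misclassification cost on the training data
    (false negatives cost cost_fn, false positives cost cost_fp)."""
    w = lda_direction(X, y)
    z = X @ w
    best_t, best_cost = None, np.inf
    for t in np.unique(z):
        pred = (z > t).astype(int)
        cost = (cost_fn * np.sum((y == 1) & (pred == 0))
                + cost_fp * np.sum((y == 0) & (pred == 1)))
        if cost < best_cost:
            best_t, best_cost = t, cost
    return w, best_t, best_cost
```

Because `cost_fn` exceeds `cost_fp`, the chosen threshold shifts toward fewer false negatives, which is exactly the asymmetry a cost-sensitive node is meant to capture.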

    Comparison of Supervised Classification Methods for Protein Profiling in Cancer Diagnosis

    A key challenge in clinical proteomics of cancer is the identification of biomarkers that could allow detection, diagnosis and prognosis of the diseases. Recent advances in mass spectrometry and proteomic instrumentation offer a unique chance to rapidly identify these markers. These advances pose considerable challenges, similar to those created by microarray-based investigation, for the discovery of patterns of markers from high-dimensional data, specific to each pathologic state (e.g. normal vs cancer). We propose a three-step strategy to select important markers from high-dimensional mass spectrometry data using surface enhanced laser desorption/ionization (SELDI) technology. The first two steps are the selection of the most discriminating biomarkers and the construction of different classifiers. Finally, we compare and validate their performance and robustness using different supervised classification methods such as Support Vector Machine, Linear Discriminant Analysis, Quadratic Discriminant Analysis, Neural Networks, Classification Trees and Boosting Trees. We show that the proposed method is suitable for analysing high-throughput proteomics data and that the combination of logistic regression and Linear Discriminant Analysis outperforms the other methods tested.
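The first step, screening a high-dimensional spectrum for the most discriminating markers, is often done with a univariate two-sample statistic. The sketch below ranks features by an absolute Welch-style t-statistic and keeps the top k; the function name and the ranking criterion are assumptions for illustration, not the paper's exact selection procedure.

```python
import numpy as np

def top_markers(X, y, k=10):
    """Rank each feature (m/z peak) by an absolute two-sample t-like
    statistic between classes 0 and 1, and return the indices of the
    k most discriminating features."""
    X0, X1 = X[y == 0], X[y == 1]
    num = X1.mean(axis=0) - X0.mean(axis=0)
    den = np.sqrt(X0.var(axis=0, ddof=1) / len(X0)
                  + X1.var(axis=0, ddof=1) / len(X1))
    t = np.abs(num / den)
    return np.argsort(t)[::-1][:k]
```

The selected columns would then feed the classifier-construction and validation steps described in the abstract.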

    Some Remarks about the Usage of Asymmetric Correlation Measurements for the Induction of Decision Trees

    Decision trees are used very successfully for the identification or classification of objects in many domains, such as marketing (e.g. Decker, Temme (2001)) or medicine. Other procedures to classify objects are, for instance, logistic regression, logit or probit analysis, linear or quadratic discriminant analysis, the nearest-neighbour procedure, or kernel density estimators. The common aim of all these classification procedures is to generate classification rules that describe the relationship between independent exogenous variables (attributes) and at least one endogenous variable, the so-called class-membership variable.

    Classification of Hungarian medieval silver coins using x-ray fluorescent spectroscopy and multivariate data analysis

    A set of silver coins from the collection of Déri Museum Debrecen (Hungary) was examined by X-ray fluorescent elemental analysis with the aim to assign the coins to different groups with the best possible precision based on the acquired chemical information, and to build models which arrange the coins according to their historical periods. Results: Principal component analysis, linear discriminant analysis, partial least squares discriminant analysis, classification and regression trees, and multivariate curve resolution with alternating least squares were applied to reveal dominant patterns in the data and classify the coins into several groups. We also identified those chemical components that are present in small percentages but are useful for the classification of the coins. With the coins divided into two groups according to adequate historical periods, we obtained a correct classification rate of 76-78% based on the chemical compositions. Conclusions: X-ray fluorescent elemental analysis together with multivariate data analysis methods is suitable for grouping medieval coins according to historical periods.
    Keywords: X-ray fluorescence spectroscopy, Multivariate techniques, Coin, Silver, Middle Ages
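The first technique listed, principal component analysis, reveals the dominant patterns by projecting the elemental compositions onto the directions of largest variance. A minimal sketch, with the function name assumed for illustration:

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Centre the data and project it onto the leading principal
    components (eigenvectors of the covariance matrix with the
    largest eigenvalues)."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)        # ascending eigenvalues
    order = np.argsort(vals)[::-1][:n_components]
    return Xc @ vecs[:, order]
```

A scatter plot of the first two score columns is the usual way to see whether coins from different historical periods separate into clusters.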

    Using Machine Learning, Neural Networks, and Statistics to Predict Corporate Bankruptcy

    Recent literature strongly suggests that machine learning approaches to classification outperform "classical" statistical methods. We make a comparison between the performance of linear discriminant analysis, classification trees, and neural networks in predicting corporate bankruptcy. Linear discriminant analysis represents the "classical" statistical approach to classification, whereas classification trees and neural networks represent artificial intelligence approaches. A proper statistical design is used to test whether observed differences in predictive performance are statistically significant. The data set consists of a collection of 576 annual reports from Belgian construction companies. We use stratified 10-fold cross-validation on the training set to choose "good" parameter values for the different learning methods. The test set is used to obtain an unbiased estimate of the true prediction error. Using rigorous statistical testing, we cannot conclude that in the case of the data set studied, one learning method clearly outperforms the other methods.
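Stratified k-fold cross-validation, the scheme used here to pick hyper-parameters before touching the held-out test set, partitions the data so that each fold preserves the class proportions. A minimal sketch (function name and round-robin assignment are illustrative choices):

```python
import numpy as np

def stratified_kfold(y, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs such that each test fold
    contains roughly the same proportion of each class as y."""
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        for i, j in enumerate(idx):         # deal class members round-robin
            folds[i % k].append(j)
    for i in range(k):
        test = np.array(sorted(folds[i]))
        train = np.array(sorted(j for f in folds if f is not folds[i] for j in f))
        yield train, test
```

Averaging a model's error over the k test folds gives the cross-validated estimate used to compare parameter settings.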

    Inducing safer oblique trees without costs

    Decision tree induction has been widely studied and applied. In safety applications, such as determining whether a chemical process is safe or whether a person has a medical condition, the cost of misclassification in one of the classes is significantly higher than in the other class. Several authors have tackled this problem by developing cost-sensitive decision tree learning algorithms, or have suggested ways of changing the distribution of training examples to bias the decision tree learning process so as to take account of costs. A prerequisite for applying such algorithms is the availability of costs of misclassification. Although this may be possible for some applications, obtaining reasonable estimates of costs of misclassification is not easy in the area of safety. This paper presents a new algorithm for applications where the cost of misclassification cannot be quantified, although the cost of misclassification in one class is known to be significantly higher than in the other class. The algorithm utilizes linear discriminant analysis to identify oblique relationships between continuous attributes and then carries out an appropriate modification to ensure that the resulting tree errs on the side of safety. The algorithm is evaluated with respect to one of the best-known cost-sensitive algorithms (ICET), a well-known oblique decision tree algorithm (OC1), and an algorithm that utilizes robust linear programming.
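One simple way to make a split "err on the side of safety" without numeric costs is to place the threshold on the projected (oblique) attribute so that every training example of the critical class lands on the side that triggers further scrutiny. This sketch shows only that idea; the function name is assumed, and the paper's actual modification is more involved.

```python
import numpy as np

def safe_threshold(z, y, unsafe_class=1, margin=1e-9):
    """Given projected values z and labels y, return a threshold t such
    that every training example of the unsafe class satisfies z > t,
    i.e. none of them can be routed to the 'safe' branch."""
    return z[y == unsafe_class].min() - margin
```

Compared with a cost-minimising threshold, this deliberately trades extra false alarms for zero missed unsafe cases on the training data.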

    Localized Regression

    The main problem with localized discriminant techniques is the curse of dimensionality, which seems to restrict their use to the case of few variables. This restriction does not hold if localization is combined with a reduction of dimension. In particular, it is shown that localization yields powerful classifiers even in higher dimensions if it is combined with locally adaptive selection of predictors. A robust localized logistic regression (LLR) method is developed for which all tuning parameters are chosen data-adaptively. In an extended simulation study we evaluate the potential of the proposed procedure for various types of data and compare it to other classification procedures. In addition, we demonstrate that automatic choice of localization, predictor selection and penalty parameters based on cross-validation works well. Finally, the method is applied to real data sets and its real-world performance is compared to alternative procedures.
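The basic localization idea can be sketched as follows: to classify a query point, fit a logistic regression in which each training point is weighted by a Gaussian kernel centred at the query. This is a minimal sketch only; the function name, the fixed bandwidth, and the plain gradient-ascent fit are assumptions, whereas the paper chooses all tuning parameters data-adaptively and adds predictor selection.

```python
import numpy as np

def localized_logistic_predict(X, y, x0, bandwidth=1.0, steps=500, lr=0.1):
    """Fit a kernel-weighted logistic regression around query point x0
    and return the predicted probability of class 1 at x0."""
    # Gaussian kernel weights: points near x0 dominate the fit
    w = np.exp(-np.sum((X - x0) ** 2, axis=1) / (2 * bandwidth ** 2))
    Xb = np.hstack([np.ones((len(X), 1)), X])   # add intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(steps):                      # weighted gradient ascent
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        beta += lr * (Xb.T @ (w * (y - p))) / len(X)
    x0b = np.concatenate([[1.0], x0])
    return 1.0 / (1.0 + np.exp(-x0b @ beta))
```

Because the fit is redone per query point, the decision boundary adapts locally, which is what lets a linear model capture non-linear structure.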