
    Alternating model trees

    Model tree induction is a popular method for tackling regression problems that require interpretable models. Model trees are decision trees with multiple linear regression models at the leaf nodes. In this paper, we propose a method for growing alternating model trees, a form of option tree for regression problems. The motivation is that alternating decision trees achieve high accuracy in classification problems because they represent an ensemble classifier as a single tree structure. As in alternating decision trees for classification, our alternating model trees for regression contain splitter and prediction nodes, but we use simple linear regression functions rather than constant predictors at the prediction nodes. Moreover, additive regression using forward stagewise modeling is applied to grow the tree rather than a boosting algorithm. The size of the tree is determined using cross-validation. Our empirical results show that alternating model trees achieve significantly lower squared error than standard model trees on several regression datasets.
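    The abstract's two building blocks — simple (one-variable) linear regression predictors and additive regression via forward stagewise modeling — can be sketched in isolation. This is an illustrative sketch, not the paper's tree-growing algorithm: all function names, the number of stages, and the shrinkage value are assumptions for illustration.

    ```python
    # Sketch: forward stagewise additive regression with simple linear
    # regression base learners (illustrative names and hyperparameters).

    def fit_simple_linreg(xs, ys):
        """Least-squares fit of y = a + b*x on one feature; returns (a, b)."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
        return my - b * mx, b

    def forward_stagewise(X, y, n_stages=10, shrinkage=0.5):
        """Each stage fits a simple linear regression (on the best single
        feature) to the current residuals and adds a shrunken copy of it
        to the ensemble."""
        residuals = list(y)
        ensemble = []  # list of (feature_index, intercept, slope)
        for _ in range(n_stages):
            best = None  # (sum of squared errors, feature, intercept, slope)
            for j in range(len(X[0])):
                col = [row[j] for row in X]
                a, b = fit_simple_linreg(col, residuals)
                sse = sum((r - (a + b * x)) ** 2 for x, r in zip(col, residuals))
                if best is None or sse < best[0]:
                    best = (sse, j, a, b)
            _, j, a, b = best
            ensemble.append((j, shrinkage * a, shrinkage * b))
            residuals = [r - (shrinkage * a + shrinkage * b * row[j])
                         for row, r in zip(X, residuals)]
        return ensemble

    def predict(ensemble, row):
        """Additive prediction: sum the shrunken base-learner outputs."""
        return sum(a + b * row[j] for j, a, b in ensemble)
    ```

    The paper organizes these pieces inside a single alternating tree of splitter and prediction nodes; the sketch only shows the stagewise residual-fitting loop.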

    Decision Stream: Cultivating Deep Decision Trees

    Various modifications of decision trees have been used extensively in recent years due to their high efficiency and interpretability. Tree node splitting based on relevant feature selection is a key step of decision tree learning, and at the same time their major shortcoming: recursive node partitioning leads to a geometric reduction of the amount of data in the leaf nodes, which causes excessive model complexity and overfitting. In this paper, we present a novel architecture, the Decision Stream, aimed at overcoming this problem. Instead of building a tree structure during learning, we propose merging nodes from different branches based on their similarity, estimated with two-sample test statistics, which leads to a deep directed acyclic graph of decision rules that can consist of hundreds of levels. To evaluate the proposed solution, we test it on several common machine learning problems: credit scoring, Twitter sentiment analysis, aircraft flight control, MNIST and CIFAR image classification, and synthetic data classification and regression. Our experimental results reveal that the proposed approach significantly outperforms standard decision tree learning methods on both regression and classification tasks, yielding a prediction error decrease of up to 35%.
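    The merging criterion described above — combine nodes whose samples look alike under a two-sample test — can be sketched with one concrete statistic. Welch's t statistic is used here purely as an example; the paper's actual choice of tests, the threshold value, and the greedy grouping strategy are all assumptions for illustration.

    ```python
    # Sketch: merging leaf-node label samples when a two-sample statistic
    # says they are similar (threshold and greedy strategy are illustrative).
    import math

    def welch_t(a, b):
        """Welch's two-sample t statistic (unequal variances)."""
        na, nb = len(a), len(b)
        ma, mb = sum(a) / na, sum(b) / nb
        va = sum((x - ma) ** 2 for x in a) / (na - 1)
        vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
        return abs(ma - mb) / math.sqrt(va / na + vb / nb)

    def merge_similar_leaves(leaves, threshold=2.0):
        """Greedily merge leaf sample lists whose t statistic falls below
        the threshold, yielding the wider, data-richer nodes that let a
        decision stream grow hundreds of levels deep."""
        merged = []
        for samples in leaves:
            for group in merged:
                if welch_t(group, samples) < threshold:
                    group.extend(samples)
                    break
            else:
                merged.append(list(samples))
        return merged
    ```

    Merging keeps the per-node sample count from shrinking geometrically with depth, which is exactly the shortcoming the abstract attributes to recursive partitioning.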

    Tree Boosting Data Competitions with XGBoost

    The objective of this Master's thesis is to provide an understanding of how to approach a supervised-learning predictive problem and to illustrate it using a statistical/machine learning algorithm, tree boosting. A review of tree methodology is presented to trace its evolution, from Classification and Regression Trees through Bagging and Random Forests to today's tree boosting. The methodology is explained following the XGBoost implementation, which has achieved state-of-the-art results in several data competitions. A framework for applied predictive modelling is laid out with its core concepts: objective function, regularization term, overfitting, hyperparameter tuning, k-fold cross-validation, and feature engineering. All these concepts are illustrated with a real dataset of video game churn, used in a datathon competition.
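    The objective function and regularization term mentioned above come together in XGBoost's split-gain rule, which is documented in the library's own materials: with sums of first-order gradients G and second-order gradients H over the candidate left and right children, the regularized gain of a split is gain = ½[G_L²/(H_L+λ) + G_R²/(H_R+λ) − (G_L+G_R)²/(H_L+H_R+λ)] − γ. The sketch below is a plain-Python transcription of that formula, not the library's implementation; the default λ and γ values are illustrative.

    ```python
    # Sketch: XGBoost-style regularized split gain from per-child gradient
    # statistics. lam is the L2 leaf-weight penalty (lambda), gamma the
    # per-split complexity penalty; values here are illustrative.

    def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
        def score(g, h):
            # Structure score of a leaf holding these gradient sums.
            return g * g / (h + lam)
        return 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                      - score(g_left + g_right, h_left + h_right)) - gamma
    ```

    A positive gain means the split improves the regularized objective; raising gamma prunes splits whose raw improvement is too small, which is one of the overfitting controls the thesis discusses.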

    Credit Scoring Menggunakan Algoritma Classification and Regression Tree (CART)

    Credit scoring is the process of assessing a credit application, carried out by the creditor (the institution extending the credit) on the prospective debtor (the recipient of the credit). The output of credit scoring is a decision on whether or not the applicant is eligible to receive credit. This is the most important stage in the credit process: a mistake at this stage has a large impact on all subsequent stages of credit granting and, ultimately, affects the institution itself. Within information technology, the field that can support credit scoring is data mining, and one algorithm that can be used in data mining is Classification and Regression Trees (CART). Using this algorithm for credit scoring saves time, effort, and cost, and analyzes the eligibility of prospective debtors quickly, accurately, and effectively. The model built with the CART algorithm in the credit-scoring domain achieves an average accuracy of 75.20% and is categorized as fair classification.
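    The CART step underlying such a credit-scoring model can be sketched for a binary eligible/not-eligible label: pick the feature threshold that minimizes the weighted Gini impurity of the two resulting partitions. This is a generic illustration of CART splitting, not the paper's model; the candidate-threshold scheme (midpoints between distinct values) is an assumption.

    ```python
    # Sketch: one CART split for a binary credit-scoring label
    # (1 = eligible, 0 = not eligible), by minimum weighted Gini impurity.

    def gini(labels):
        """Gini impurity of a binary label sample: 2*p*(1-p)."""
        if not labels:
            return 0.0
        p = sum(labels) / len(labels)
        return 2 * p * (1 - p)

    def best_split(X, y):
        """Return (feature_index, threshold) minimizing the weighted Gini
        impurity over midpoint thresholds of each numeric feature."""
        best = None  # (weighted impurity, feature index, threshold)
        n = len(y)
        for j in range(len(X[0])):
            values = sorted(set(row[j] for row in X))
            for lo, hi in zip(values, values[1:]):
                t = (lo + hi) / 2
                left = [yi for row, yi in zip(X, y) if row[j] <= t]
                right = [yi for row, yi in zip(X, y) if row[j] > t]
                w = (len(left) * gini(left) + len(right) * gini(right)) / n
                if best is None or w < best[0]:
                    best = (w, j, t)
        return best[1], best[2]
    ```

    Recursing on each partition until the leaves are sufficiently pure yields the tree; the paper reports that such a model reaches 75.20% average accuracy on its credit-scoring data.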