Alternating model trees
Model tree induction is a popular method for tackling regression problems that require interpretable models. Model trees are decision trees with multiple linear regression models at the leaf nodes. In this paper, we propose a method for growing alternating model trees, a form of option tree for regression problems. The motivation is that alternating decision trees achieve high accuracy in classification problems because they represent an ensemble classifier as a single tree structure. As in alternating decision trees for classification, our alternating model trees for regression contain splitter and prediction nodes, but we use simple linear regression functions rather than constant predictors at the prediction nodes. Moreover, the tree is grown using additive regression with forward stagewise modeling rather than a boosting algorithm. The size of the tree is determined using cross-validation. Our empirical results show that alternating model trees achieve significantly lower squared error than standard model trees on several regression datasets.
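The additive core described in this abstract can be illustrated with a minimal sketch of forward stagewise additive regression using simple (single-feature) linear regression base learners. This is an assumption-laden simplification, not the authors' tree-growing procedure: there are no splitter nodes, and the number of stages is fixed rather than chosen by cross-validation.

```python
# Sketch: forward stagewise additive regression with simple linear
# regression base learners (simplified; not the paper's full algorithm).
import numpy as np

def fit_simple_lr(x, r):
    """Least-squares slope/intercept of residuals r on one feature x."""
    slope, intercept = np.polyfit(x, r, deg=1)
    return slope, intercept

def forward_stagewise(X, y, n_stages=50, shrinkage=0.5):
    n, d = X.shape
    pred = np.full(n, y.mean())                   # stage 0: constant model
    stages = [("const", y.mean())]
    for _ in range(n_stages):
        r = y - pred                              # current residuals
        best = None
        for j in range(d):                        # choose the feature whose
            slope, icpt = fit_simple_lr(X[:, j], r)   # simple LR best fits r
            sse = np.sum((r - (slope * X[:, j] + icpt)) ** 2)
            if best is None or sse < best[0]:
                best = (sse, j, slope, icpt)
        _, j, slope, icpt = best
        pred += shrinkage * (slope * X[:, j] + icpt)  # add shrunken stage
        stages.append((j, slope, icpt))
    return stages, pred

# Usage on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=200)
stages, pred = forward_stagewise(X, y)
print("training MSE:", np.mean((y - pred) ** 2))
```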
Decision Stream: Cultivating Deep Decision Trees
Various modifications of decision trees have been used extensively in recent years because of their high efficiency and interpretability. Tree node splitting based on relevant feature selection is a key step of decision tree learning, yet it is also their major shortcoming: recursive node partitioning leads to a geometric reduction of the amount of data reaching the leaf nodes, which causes excessive model complexity and overfitting. In this paper, we present a novel architecture, the Decision Stream, aimed at overcoming this problem. Instead of building a tree structure during learning, we propose merging nodes from different branches based on their similarity, estimated with two-sample test statistics, which produces a deep directed acyclic graph of decision rules that can consist of hundreds of levels. To evaluate the proposed solution, we test it on several common machine learning problems: credit scoring, Twitter sentiment analysis, aircraft flight control, MNIST and CIFAR image classification, and synthetic data classification and regression. Our experimental results show that the proposed approach significantly outperforms standard decision tree learning methods on both regression and classification tasks, yielding a decrease in prediction error of up to 35%.
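The merge criterion can be sketched as follows: two nodes are merged when a two-sample test cannot distinguish the target distributions of the samples routed to them. This sketch assumes a Kolmogorov-Smirnov test and a fixed significance threshold; it shows only the merging step, not the construction of the full directed acyclic graph of decision rules.

```python
# Sketch: merging nodes whose sample distributions are statistically
# indistinguishable under a two-sample test (assumed: KS test, alpha=0.05).
import numpy as np
from scipy.stats import ks_2samp

def should_merge(y_left, y_right, alpha=0.05):
    """Merge if the two-sample KS test finds no significant difference
    between the target values routed to the two nodes."""
    stat, p_value = ks_2samp(y_left, y_right)
    return p_value > alpha

def merge_similar_nodes(node_targets, alpha=0.05):
    """Greedily merge nodes (lists of target values) whose distributions
    are statistically indistinguishable."""
    merged = [list(t) for t in node_targets]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if should_merge(np.array(merged[i]), np.array(merged[j]), alpha):
                    merged[i] = merged[i] + merged[j]
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged

# Usage: three leaves, the first two drawn from the same distribution
rng = np.random.default_rng(1)
leaves = [rng.normal(0, 1, 80), rng.normal(0, 1, 80), rng.normal(3, 1, 80)]
print(len(merge_similar_nodes(leaves)))   # expected: 2 groups remain
```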
Tree Boosting Data Competitions with XGBoost
The objective of this Master's Degree thesis is to show how to approach a supervised predictive learning problem and to illustrate the process with a statistical/machine learning algorithm, Tree Boosting. A review of tree methodology traces its evolution from Classification and Regression Trees through Bagging and Random Forests to present-day Tree Boosting. The methodology is explained following the XGBoost implementation, which has achieved state-of-the-art results in several data competitions. A framework for applied predictive modelling is presented with its key concepts: objective function, regularization term, overfitting, hyperparameter tuning, k-fold cross-validation and feature engineering. All of these concepts are illustrated with a real videogame-churn dataset used in a datathon competition.
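A minimal sketch of the workflow described, an XGBoost classifier with a regularized objective tuned by k-fold cross-validation, is shown below. The feature matrix is synthetic; the thesis's videogame-churn dataset and engineered features are not reproduced here.

```python
# Sketch: XGBoost with hyperparameter tuning via 5-fold cross-validation
# (synthetic stand-in data, not the thesis's churn dataset).
import numpy as np
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 10))                  # stand-in engineered features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

model = XGBClassifier(
    n_estimators=200,
    learning_rate=0.1,
    reg_lambda=1.0,          # L2 regularization term in the objective
    eval_metric="logloss",
)

# Hyperparameter tuning with 5-fold cross-validation
param_grid = {"max_depth": [3, 5, 7], "subsample": [0.7, 1.0]}
search = GridSearchCV(model, param_grid, cv=5, scoring="roc_auc")
search.fit(X, y)
print("best params:", search.best_params_)
print("cross-validated AUC:", round(search.best_score_, 3))
```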
Credit Scoring Using the Classification and Regression Tree (CART) Algorithm
Credit scoring is the process by which a creditor, the party extending credit, assesses a credit application from a prospective debtor, the party receiving the credit. The output of credit scoring is a decision on whether the applicant is eligible to receive credit. This is the most important stage of the credit process: errors at this stage have a large impact on the entire lending workflow and, more broadly, on the institution itself. Within information technology, the field that can support credit scoring is data mining, and one algorithm that can be used in data mining is the Classification and Regression Tree (CART). Applying this algorithm to credit scoring saves time, effort and cost, and analyzes the eligibility of prospective debtors quickly, accurately and effectively. The model built with the CART algorithm in the credit scoring domain achieves an average accuracy of 75.20% and is categorized as fair classification.
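A minimal sketch of CART-based credit scoring using scikit-learn's DecisionTreeClassifier (a CART implementation) follows. The applicant features are synthetic placeholders; the 75.20% accuracy reported in the abstract comes from its own data, not from this example.

```python
# Sketch: CART credit scoring with scikit-learn (synthetic applicant data).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(7)
n = 400
income = rng.normal(5.0, 2.0, n)                # monthly income (arbitrary units)
debt_ratio = rng.uniform(0.0, 1.0, n)           # existing debt / income
late_payments = rng.poisson(1.0, n)             # count of past late payments
X = np.column_stack([income, debt_ratio, late_payments])
# Label: 1 = creditworthy, 0 = not; a noisy rule used for illustration only
y = ((income > 4.0) & (debt_ratio < 0.6) & (late_payments < 3)).astype(int)

cart = DecisionTreeClassifier(criterion="gini", max_depth=4, random_state=0)
scores = cross_val_score(cart, X, y, cv=5, scoring="accuracy")
print("mean cross-validated accuracy:", round(scores.mean(), 3))
```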
