Tree Boosting Data Competitions with XGBoost
The objective of this Master's thesis is to provide an understanding of how to approach a supervised learning predictive problem and to illustrate it using a statistical/machine learning algorithm, Tree Boosting. A review of tree methodology is introduced in order to understand its evolution, from Classification and Regression Trees, through Bagging and Random Forest, to today's Tree Boosting. The methodology is explained following the XGBoost implementation, which has achieved state-of-the-art results in several data competitions. A framework for applied predictive modelling is explained along with its key concepts: objective function, regularization term, overfitting, hyperparameter tuning, k-fold cross validation and feature engineering. All these concepts are illustrated with a real dataset of videogame churn used in a datathon competition.
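As a rough illustration of the k-fold cross validation mentioned in this abstract, the sketch below splits sample indices into k contiguous folds, each serving once as the validation set. The function name `k_fold_indices` and the contiguous, unshuffled split are assumptions made for illustration; they are not taken from the thesis or from XGBoost.

```python
def k_fold_indices(n_samples, k):
    """Return k (training, validation) index pairs (hypothetical helper).

    Assumption: folds are contiguous and unshuffled; real pipelines
    usually shuffle or stratify the samples first.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")

    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]

    folds = []
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        training = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((training, validation))
        start += size
    return folds


if __name__ == "__main__":
    for training, validation in k_fold_indices(5, 2):
        print("train:", training, "validate:", validation)
```

A model would be fitted on each training part and scored on the matching validation part; averaging the k scores gives the estimate typically used for hyperparameter tuning.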
Unbiased split selection for classification trees based on the Gini Index
The Gini gain is one of the most common variable selection criteria in machine learning. We derive the exact distribution of the maximally selected Gini gain in the context of binary classification using continuous predictors by means of a combinatorial approach. This distribution provides formal support for the variable selection bias in favor of variables with a high number of missing values when the Gini gain is used as the split selection criterion, and we suggest using the resulting p-value as an unbiased split selection criterion in recursive partitioning algorithms. We demonstrate the efficiency of our novel method in simulation and real-data studies from veterinary gynecology in the context of binary classification and continuous predictor variables with different numbers of missing values. Our method is extendible to categorical and ordinal predictor variables and to other split selection criteria such as the cross-entropy criterion.
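For orientation, the maximally selected Gini gain discussed in this abstract can be sketched as follows for a single continuous predictor and class labels. This is a minimal sketch of the standard Gini gain only; it assumes no missing values and does not implement the exact distribution or the p-value criterion the authors propose. All function names are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_c^2) over the class proportions p_c."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_gain(values, labels, threshold):
    """Impurity reduction from splitting at `values <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted


def max_gini_gain(values, labels):
    """Return (threshold, gain) for the cut point with the largest gain.

    Assumption: candidate cut points are the observed values except the
    largest one, so both child nodes are always non-empty.
    """
    cut_points = sorted(set(values))[:-1]
    if not cut_points:
        return None, 0.0
    return max(((t, gini_gain(values, labels, t)) for t in cut_points),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    print(max_gini_gain([1, 2, 3, 4], [0, 0, 1, 1]))
```

Because the maximum is taken over all candidate cut points, the number of candidates per variable influences how large the selected gain can be by chance, which is the kind of selection bias the abstract addresses.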
Path computation in multi-layer networks: Complexity and algorithms
Carrier-grade networks comprise several layers where different protocols
coexist. Nowadays, most of these networks have different control planes to
manage routing on different layers, leading to a suboptimal use of the network
resources and additional operational costs. However, some routers are able to
encapsulate, decapsulate and convert protocols and act as a liaison between
these layers. A unified control plane would be useful to optimize the use of
the network resources and automate the routing configurations. Software-Defined
Networking (SDN) based architectures, such as OpenFlow, offer a chance to
design such a control plane. One of the most important problems to deal with in
this design is the path computation process. Classical path computation
algorithms cannot resolve the problem as they do not take into account
encapsulations and conversions of protocols. In this paper, we propose
algorithms to solve this problem and study several cases: Path computation
without bandwidth constraint, under bandwidth constraint and under other
Quality of Service constraints. We study the complexity and the scalability of
our algorithms and evaluate their performances on real topologies. The results
show that they outperform the previous ones proposed in the literature.
Comment: IEEE INFOCOM 2016, Apr 2016, San Francisco, United States. To be
published in IEEE INFOCOM 2016, <http://infocom2016.ieee-infocom.org/>
- …