19,978 research outputs found

    Tree Boosting Data Competitions with XGBoost

    Get PDF
    This Master's Degree Thesis objective is to provide understanding on how to approach a supervised learning predictive problem and illustrate it using a statistical/machine learning algorithm, Tree Boosting. A review of tree methodology is introduced in order to understand its evolution, since Classification and Regression Trees, followed by Bagging, Random Forest and, nowadays, Tree Boosting. The methodology is explained following the XGBoost implementation, which achieved state-of-the-art results in several data competitions. A framework for applied predictive modelling is explained with its proper concepts: objective function, regularization term, overfitting, hyperparameter tuning, k-fold cross validation and feature engineering. All these concepts are illustrated with a real dataset of videogame churn; used in a datathon competition

    Unbiased split selection for classification trees based on the Gini Index

    Get PDF
    The Gini gain is one of the most common variable selection criteria in machine learning. We derive the exact distribution of the maximally selected Gini gain in the context of binary classification using continuous predictors by means of a combinatorial approach. This distribution provides a formal support for variable selection bias in favor of variables with a high amount of missing values when the Gini gain is used as split selection criterion, and we suggest to use the resulting p-value as an unbiased split selection criterion in recursive partitioning algorithms. We demonstrate the efficiency of our novel method in simulation- and real data- studies from veterinary gynecology in the context of binary classification and continuous predictor variables with different numbers of missing values. Our method is extendible to categorical and ordinal predictor variables and to other split selection criteria such as the cross-entropy criterion

    Path computation in multi-layer networks: Complexity and algorithms

    Full text link
    Carrier-grade networks comprise several layers where different protocols coexist. Nowadays, most of these networks have different control planes to manage routing on different layers, leading to a suboptimal use of the network resources and additional operational costs. However, some routers are able to encapsulate, decapsulate and convert protocols and act as a liaison between these layers. A unified control plane would be useful to optimize the use of the network resources and automate the routing configurations. Software-Defined Networking (SDN) based architectures, such as OpenFlow, offer a chance to design such a control plane. One of the most important problems to deal with in this design is the path computation process. Classical path computation algorithms cannot resolve the problem as they do not take into account encapsulations and conversions of protocols. In this paper, we propose algorithms to solve this problem and study several cases: Path computation without bandwidth constraint, under bandwidth constraint and under other Quality of Service constraints. We study the complexity and the scalability of our algorithms and evaluate their performances on real topologies. The results show that they outperform the previous ones proposed in the literature.Comment: IEEE INFOCOM 2016, Apr 2016, San Francisco, United States. To be published in IEEE INFOCOM 2016, \<http://infocom2016.ieee-infocom.org/\&g
    corecore