    Individualized and Global Feature Attributions for Gradient Boosted Trees in the Presence of $\ell_2$ Regularization

    While $\ell_2$ regularization is widely used in training gradient boosted trees, popular individualized feature attribution methods for trees such as Saabas and TreeSHAP overlook the training procedure. We propose Prediction Decomposition Attribution (PreDecomp), a novel individualized feature attribution for gradient boosted trees when they are trained with $\ell_2$ regularization. Theoretical analysis shows that the inner product between PreDecomp and labels on in-sample data is essentially the total gain of a tree, and that it can faithfully recover additive models in the population case when features are independent. Inspired by the connection between PreDecomp and total gain, we also propose TreeInner, a family of debiased global feature attributions defined in terms of the inner product between any individualized feature attribution and labels on out-sample data for each tree. Numerical experiments on a simulated dataset and a genomic ChIP dataset show that TreeInner has state-of-the-art feature selection performance. Code reproducing experiments is available at https://github.com/nalzok/TreeInner. Comment: 43 pages, 29 figures
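The inner-product construction described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation (that lives in the linked repository): the function name `inner_product_importance`, the `(n_trees, n_samples, n_features)` array layout, and the choice to sum the per-tree scores into one global score are all hypothetical.

```python
import numpy as np


def inner_product_importance(per_tree_attr, y):
    """Hypothetical helper: global feature scores from per-tree attributions.

    per_tree_attr: array of shape (n_trees, n_samples, n_features) holding an
        individualized attribution for every held-out sample, per tree.
    y: array of shape (n_samples,) with the labels of the same samples.
    """
    per_tree_attr = np.asarray(per_tree_attr, dtype=float)
    y = np.asarray(y, dtype=float)
    # Inner product over the sample axis, separately for each tree and feature.
    per_tree_scores = np.einsum("tnf,n->tf", per_tree_attr, y)
    # Assumption: aggregate across trees by summation.
    return per_tree_scores.sum(axis=0)


# Toy data: 2 trees, 2 held-out samples, 2 features.
toy_attr = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.0], [0.0, 0.0]],
]
toy_y = [1.0, 2.0]
toy_scores = inner_product_importance(toy_attr, toy_y)
```

Any individualized attribution (Saabas, TreeSHAP, or PreDecomp) could fill `per_tree_attr`; the sketch only shows the aggregation step.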

    Developing a Data-Driven Statistical Model for Accurately Predicting the Superconducting Critical Temperature of Materials using Multiple Regression and Gradient-Boosted Methods

    This study develops a statistical model for estimating the superconducting critical temperature (Tc) of materials using a data-driven strategy. The study analyzed 21,263 superconductors and used a combination of multiple regression and gradient-boosted models to make predictions. The analysis included a descriptive analysis of the distribution of Tc, feature selection using the backward selection method, and model diagnostics. The results showed that the gradient-boosted method outperformed the multiple linear regression method, with an RMSE of 12.01 and an $R^2$ value of 88.23 after fine-tuning its hyperparameters. The study concludes that the gradient-boosted method is an effective approach for accurately predicting Tc in superconducting materials.
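For readers comparing the two reported metrics, the standard definitions of RMSE and the coefficient of determination can be written out as below. This is a generic sketch, not the study's code: the helper names `rmse` and `r_squared` are made up for illustration, and the abstract does not say how its $R^2$ figure was scaled (a value of 88.23 suggests a percentage, which is an assumption).

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (as a fraction)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Toy values, not data from the study.
toy_true = [1.0, 2.0, 3.0]
toy_pred = [1.0, 2.0, 4.0]
toy_rmse = rmse(toy_true, toy_pred)
toy_r2 = r_squared(toy_true, toy_pred)
```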

    End-to-end Feature Selection Approach for Learning Skinny Trees

    Joint feature selection and tree ensemble learning is a challenging task. Popular tree ensemble toolkits, e.g., Gradient Boosted Trees and Random Forests, support feature selection post-training based on feature importances, which are known to be misleading and can significantly hurt performance. We propose Skinny Trees: a toolkit for feature selection in tree ensembles, such that feature selection and tree ensemble learning occur simultaneously. It is based on an end-to-end optimization approach that considers feature selection in differentiable trees with Group $\ell_0 - \ell_2$ regularization. We optimize with a first-order proximal method and present convergence guarantees for a non-convex and non-smooth objective. Interestingly, dense-to-sparse regularization scheduling can lead to more expressive and sparser tree ensembles than the vanilla proximal method. On 15 synthetic and real-world datasets, Skinny Trees can achieve $1.5\times$ to $620\times$ feature compression rates, leading to up to $10\times$ faster inference over dense trees, without any loss in performance. Skinny Trees lead to superior feature selection than many existing toolkits; e.g., in terms of AUC performance for a 25% feature budget, Skinny Trees outperforms LightGBM by 10.2% (up to 37.7%) and Random Forests by 3% (up to 12.5%). Comment: Preprint

    Comparative Analysis of Machine Learning Algorithms for Solar Irradiance Forecasting in Smart Grids

    The increasing global demand for clean and environmentally friendly energy resources has increased interest in harnessing solar power through photovoltaic (PV) systems for smart grids and homes. However, the inherent unpredictability of PV generation poses problems for smart grid planning and management, energy trading and market participation, demand response, reliability, etc. Therefore, solar irradiance forecasting is essential for optimizing PV system utilization. This study proposes next-generation machine learning algorithms such as random forests, Extreme Gradient Boosting (XGBoost), Light Gradient Boosted Machine (LightGBM) ensemble, CatBoost, and Multilayer Perceptron Artificial Neural Networks (MLP-ANNs) to forecast solar irradiance. In addition, Bayesian optimization is applied to hyperparameter tuning. Unlike tree-based ensemble algorithms, which select features intrinsically, MLP-ANN needs feature selection as a separate step. The simulation results indicate that the performance of the MLP-ANNs improves when feature selection is applied, and that the random forest outperforms the other learning algorithms. Comment: 6 pages, 4 figures, 3 tables, to appear in the 13th Smart Grid Conference

    Boosted Multiple Kernel Learning for First-Person Activity Recognition

    Activity recognition from first-person (ego-centric) videos has recently gained attention due to the increasing ubiquity of wearable cameras. There has been a surge of efforts adapting existing feature descriptors and designing new descriptors for first-person videos. An effective activity recognition system requires the selection and use of complementary features and appropriate kernels for each feature. In this study, we propose a data-driven framework for first-person activity recognition which effectively selects and combines features and their respective kernels during training. Our experimental results show that the use of Multiple Kernel Learning (MKL) and Boosted MKL in the first-person activity recognition problem yields improved results in comparison to the state-of-the-art. In addition, these techniques enable the expansion of the framework with new features in an efficient and convenient way. Comment: First published in the Proceedings of the 25th European Signal Processing Conference (EUSIPCO-2017) in 2017, published by EURASIP