14 research outputs found

Individualized and Global Feature Attributions for Gradient Boosted Trees in the Presence of $\ell_2$ Regularization

Author: Sun Qingyao
Publication date: 08/11/2022

While $\ell_2$ regularization is widely used in training gradient boosted trees, popular individualized feature attribution methods for trees such as Saabas and TreeSHAP overlook the training procedure. We propose Prediction Decomposition Attribution (PreDecomp), a novel individualized feature attribution for gradient boosted trees when they are trained with $\ell_2$ regularization. Theoretical analysis shows that the inner product between PreDecomp and labels on in-sample data is essentially the total gain of a tree, and that it can faithfully recover additive models in the population case when features are independent. Inspired by the connection between PreDecomp and total gain, we also propose TreeInner, a family of debiased global feature attributions defined in terms of the inner product between any individualized feature attribution and labels on out-sample data for each tree. Numerical experiments on a simulated dataset and a genomic ChIP dataset show that TreeInner has state-of-the-art feature selection performance. Code reproducing experiments is available at https://github.com/nalzok/TreeInner.

Comment: 43 pages, 29 figures

arXiv.org e-Print Archive
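
The inner-product idea in the abstract above can be illustrated with a minimal sketch. It only shows the general shape of the computation: for each tree, take the inner product between a feature's per-sample attributions and the labels on held-out data, then sum over trees. The function name, the nested-list input layout, and the toy numbers are assumptions made for illustration; any centering, scaling, or other detail of the actual TreeInner definition is not reproduced here.

```python
def inner_product_importance(attributions_per_tree, labels):
    """Sum, over trees, of the inner product between each feature's
    per-sample attribution and the labels.

    attributions_per_tree: list (trees) of lists (samples) of lists (features)
    labels: one label per held-out sample
    """
    n_features = len(attributions_per_tree[0][0])
    scores = [0.0] * n_features
    for tree_attributions in attributions_per_tree:
        for row, y in zip(tree_attributions, labels):
            for j, a in enumerate(row):
                scores[j] += a * y
    return scores


# Hypothetical attributions for 2 trees, 3 held-out samples, 2 features.
tree_1 = [[1.0, 0.0], [-1.0, 0.0], [0.5, 0.5]]
tree_2 = [[0.5, 0.0], [0.0, 0.0], [0.0, 0.0]]
labels = [2.0, -2.0, 1.0]

scores = inner_product_importance([tree_1, tree_2], labels)
print(scores)  # [5.5, 0.5]
```

In this toy input, feature 0's attributions move with the labels, so it receives the larger score. Any individualized attribution method could supply the per-sample values.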

From global to local MDI variable importances for random forests and when they are Shapley values

Authors: Geurts Pierre; Huynh-Thu Vân Anh; Louppe Gilles; Sutera Antonio; Wehenkel Louis
Publication date: 03/11/2021

Peer reviewed. Random forests have been widely used for their ability to provide so-called importance measures, which give insight at a global (per dataset) level on the relevance of input variables to predict a certain output. On the other hand, methods based on Shapley values have been introduced to refine the analysis of feature relevance in tree-based models to a local (per instance) level. In this context, we first show that the global Mean Decrease of Impurity (MDI) variable importance scores correspond to Shapley values under some conditions. Then, we derive a local MDI importance measure of variable relevance, which has a very natural connection with the global MDI measure and can be related to a new notion of local feature relevance. We further link local MDI importances with Shapley values and discuss them in the light of related measures from the literature. The measures are illustrated through experiments on several classification and regression problems.

arXiv.org e-Print Archive

Open Repository and Bibliography - Liège
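
As a rough illustration of the global MDI measure mentioned in the abstract above, the sketch below computes it for a single hand-written tree: each split contributes its impurity decrease, weighted by the fraction of samples reaching the node, to the feature it splits on. Gini impurity is used as one common choice of impurity, and the tree, its labels, and all function names are hypothetical. For a forest, these per-tree scores would be averaged over trees. The local MDI measure derived in the paper is not reproduced here.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right, n_total):
    """Impurity decrease of one split, weighted by the share of samples at the node."""
    n = len(parent)
    children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return (n / n_total) * (gini(parent) - children)


def mdi_single_tree(splits, n_total, n_features):
    """Accumulate weighted impurity decreases per split feature for one tree."""
    scores = [0.0] * n_features
    for feature, parent, left, right in splits:
        scores[feature] += impurity_decrease(parent, left, right, n_total)
    return scores


# Hypothetical tree on 8 samples: (split feature, labels at node, left child, right child).
splits = [
    (1, [0, 0, 0, 1, 0, 1, 1, 1], [0, 0, 0, 1], [0, 1, 1, 1]),  # root
    (0, [0, 0, 0, 1], [0, 0, 0], [1]),                          # left child of root
    (0, [0, 1, 1, 1], [0], [1, 1, 1]),                          # right child of root
]

importances = mdi_single_tree(splits, n_total=8, n_features=2)
print(importances)  # [0.375, 0.125]
```

The two scores sum to 0.5, the Gini impurity at the root, because every leaf of this toy tree is pure.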

Enhancing cryptocurrency price forecasting accuracy: a feature selection and weighting approach with bi-directional LSTM and trend-preserving model bias correction

Authors: Ali Mirza Qublai Khan; Aliasghar Maria; Aziz Arisha; Hameed Sufian; Rafi Muhammed; Sohail Muhammad Izaan
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 20/06/2023

A cryptocurrency is a digitized, encrypted, and decentralized virtual currency that is impossible to counterfeit or double-spend. It is a very popular investment instrument, traded on blockchain-based crypto exchanges in ever-growing volume. It is quite volatile due to imbalances of supply and demand, government regulations, investor sentiment and, above all, media hype. Cryptocurrency price forecasting is an active area of research, and several approaches have been proposed recently. This study proposes a price forecasting model based on three vital characteristics: (i) a feature selection and weighting approach based on Mean Decrease Impurity (MDI) features, (ii) a bi-directional LSTM, and (iii) a trend-preserving model bias correction (CUSUM control charts for monitoring the model performance over time), to forecast Bitcoin and Ethereum values over long- and short-term spans. The data for both currencies were analyzed in three different intervals: (i) April 01, 2013 to April 01, 2016, (ii) April 01, 2013 to April 01, 2017, and (iii) April 01, 2013 to December 31, 2019. An extensive series of experiments was performed and evaluated on Root Mean Square Error (RMSE). For Bitcoin forecasting, the model achieved RMSE values of 3.499 for interval 1, 5.070 for interval 2 and 6.642 for interval 3. Similarly, for Ethereum, RMSE values of 0.094, 0.332 and 3.027 were obtained for the three intervals, respectively. On a new test set collected from January 01, 2020 to January 01, 2022 for the two cryptocurrencies, we obtained an average RMSE of 9.17 with model bias correction. Compared with the prevalent forecasting models, we report a new state of the art in cryptocurrency forecasting.

University of Gloucestershire Research Repository
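
The abstract above reports its results as Root Mean Square Error. For readers unfamiliar with the metric, a minimal sketch of the standard RMSE formula follows. The function name and the price values are made up for illustration and are not taken from the study.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical observed and forecast values (every error is +2 or -2).
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [8.0, 14.0, 12.0, 18.0]

score = rmse(actual, predicted)
print(score)  # 2.0
```

RMSE is expressed in the units of the forecast quantity, so values for assets with very different price levels are not directly comparable.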

Evaluation of machine learning methods and multi-source remote sensing data combinations to construct forest above-ground biomass models

Authors: Li Jing; Ma Tianyue; Shao Jiahao; Smith Andy; Su Yiting; Yan Xingguang; Yang Di
Publication date: 01/11/2023

Bangor University Research Portal

ESG ratings: the road ahead

Authors: Bendix Joseph; Contreras Oscar; Lopez Claude
Publication date: 20/09/2020

In this study, we show that using a common set of variables would partially resolve inconsistencies and the lack of comparability across rating providers that often confuse investors. Furthermore, we dissociate the impact of the rating agencies’ different focus on “E”, “S” or “G” from that of using different data. While the former, if properly disclosed, can be useful as it allows investors to choose what rating will be more in line with their preferences, the latter necessarily requires harmonization of the data collected

Munich RePEc Personal Archive