Generalizing Gain Penalization for Feature Selection in Tree-Based Models
We develop a new approach for feature selection via gain penalization in tree-based models. First, we show that previous methods do not perform sufficient regularization and often exhibit sub-optimal out-of-sample performance, especially when correlated features are present. Instead, we develop a new gain penalization idea that exhibits a general local-global regularization for tree-based models. The new method allows for full flexibility in the choice of feature-specific importance weights, while also applying a global penalization. We validate our method on both simulated and real data, exploring how the hyperparameters interact, and we provide the implementation as an extension of the popular R package ranger.
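To make the local-global idea concrete, here is a minimal sketch of how a penalized split gain might be computed. The abstract does not state the formula, so the convex combination below, and the names `penalized_gain`, `lambda_0`, `gamma`, and `importance`, are illustrative assumptions rather than the ranger extension's actual interface.

```python
def penalized_gain(gain, feature, selected, lambda_0, gamma, importance):
    """Return the split gain after a hypothetical local-global penalty.

    gain       -- raw impurity reduction of the candidate split
    feature    -- name of the feature used by the candidate split
    selected   -- set of features already used elsewhere in the model
    lambda_0   -- global penalization coefficient in [0, 1] (assumed)
    gamma      -- mixing weight in [0, 1] between global and local terms (assumed)
    importance -- dict mapping feature name to a local weight in [0, 1] (assumed)
    """
    # Features that were already picked are not penalized again.
    if feature in selected:
        return gain
    # Assumed form: mix a global penalty with a feature-specific weight.
    weight = (1.0 - gamma) * lambda_0 + gamma * importance[feature]
    return weight * gain
```

Under these assumptions, `gamma = 0` reduces to a purely global penalty, while `gamma = 1` relies only on the feature-specific weights.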
Is Brazilian music getting more predictable? A statistical physics approach for different music genres
Music is an important part of most people's lives and also of the culture of a country. Moreover, the different
characteristics of songs, such as genre and the chord sequences, could have different impacts on individual
behaviours. Even considering just seven chords and the respective variations, originality can be a crucial element
of a song's success. Considering this, and in the context of Brazilian music, we employed the Detrended
Fluctuation Analysis to analyse the possible predictability of eight different music genres. Among these genres, we
found that Reggae and Pop seem to be the least random considering the sequenced use of chords. With a sliding-window
approach, we found that the predictability of Pop chord sequences decreased over time. Applying
the same methodology after shuffling the original series of music, the results indicate that the
shuffled series are random, demonstrating the robustness of our approach.
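For readers unfamiliar with Detrended Fluctuation Analysis, the following is a minimal first-order DFA sketch in Python. It assumes the chord sequence has already been encoded as a numeric series (the encoding is not described in the abstract), and the function name `dfa_exponent` and the window sizes are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def dfa_exponent(series, window_sizes):
    """Estimate the DFA scaling exponent of a numeric series (linear detrending)."""
    x = np.asarray(series, dtype=float)
    # Integrate the mean-centred series to obtain the profile.
    profile = np.cumsum(x - x.mean())
    fluctuations = []
    for n in window_sizes:
        # Split the profile into non-overlapping windows of length n.
        n_windows = len(profile) // n
        segments = profile[: n_windows * n].reshape(n_windows, n).T
        t = np.arange(n)
        # Fit a straight line to every window at once and subtract it.
        coeffs = np.polyfit(t, segments, 1)
        trend = np.outer(t, coeffs[0]) + coeffs[1]
        residuals = segments - trend
        # Root-mean-square fluctuation at this window size.
        fluctuations.append(np.sqrt(np.mean(residuals ** 2)))
    # The exponent is the slope of log F(n) against log n.
    slope, _ = np.polyfit(np.log(window_sizes), np.log(fluctuations), 1)
    return slope
```

An exponent near 0.5 indicates an uncorrelated (random) series, which is the value one would expect for the shuffled series, while values further from 0.5 suggest long-range correlation and hence more predictable structure.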
Hierarchical Embedded Bayesian Additive Regression Trees
We propose a simple yet powerful extension of Bayesian Additive Regression
Trees which we name Hierarchical Embedded BART (HE-BART). The model allows for
random effects to be included at the terminal node level of a set of regression
trees, making HE-BART a non-parametric alternative to mixed effects models
which avoids the need for the user to specify the structure of the random
effects in the model, whilst maintaining the prediction and uncertainty
calibration properties of standard BART. Using simulated and real-world
examples, we demonstrate that this new extension yields superior predictions
for many of the standard mixed effects models' example data sets, and yet still
provides consistent estimates of the random effect variances. In a future
version of this paper, we will outline its use in larger, more advanced data sets
and structures.