    Generalizing Gain Penalization for Feature Selection in Tree-Based Models

    We develop a new approach for feature selection via gain penalization in tree-based models. First, we show that previous methods do not perform sufficient regularization and often exhibit sub-optimal out-of-sample performance, especially when correlated features are present. We then develop a new gain penalization idea that provides a general local-global regularization for tree-based models. The new method allows full flexibility in the choice of feature-specific importance weights while also applying a global penalization. We validate our method on both simulated and real data, explore how the hyperparameters interact, and provide the implementation as an extension of the popular R package ranger.
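
    The core mechanism of gain penalization can be sketched in a few lines: when a tree picks a split, the gain of any feature not yet used by the model is multiplied by a penalty below 1, so a new feature is only admitted if its gain beats the already-used features by a margin. The sketch below assumes a hypothetical penalty that mixes a global factor `lambda0` with a feature-specific weight `w_i` through a mixing parameter `gamma`, as `lambda_i = (1 - gamma) * lambda0 + gamma * w_i`. This form, and all names in the code, are illustrative assumptions, not necessarily the paper's exact formula or the ranger extension's interface.

```python
from typing import Dict, Set, Tuple


def penalized_gains(
    gains: Dict[str, float],
    used: Set[str],
    weights: Dict[str, float],
    lambda0: float = 0.5,
    gamma: float = 0.5,
) -> Dict[str, float]:
    """Shrink the split gain of features that the model has not used yet.

    Hypothetical local-global penalty (an assumption for illustration):
        lambda_i = (1 - gamma) * lambda0 + gamma * w_i
    where lambda0 is a global factor and w_i in [0, 1] is a feature-specific weight.
    """
    out: Dict[str, float] = {}
    for feat, gain in gains.items():
        if feat in used:
            out[feat] = gain  # features already in the model are not penalized
        else:
            lam = (1.0 - gamma) * lambda0 + gamma * weights[feat]
            out[feat] = lam * gain
    return out


def choose_split(
    gains: Dict[str, float],
    used: Set[str],
    weights: Dict[str, float],
    lambda0: float = 0.5,
    gamma: float = 0.5,
) -> Tuple[str, float]:
    """Return the feature with the largest penalized gain and that gain."""
    pen = penalized_gains(gains, used, weights, lambda0, gamma)
    best = max(pen, key=pen.get)
    return best, pen[best]


# Hypothetical example: x2 has the larger raw gain but is not yet in the model.
gains = {"x1": 1.0, "x2": 1.2}
used = {"x1"}
weights = {"x1": 0.9, "x2": 0.2}
best_feature, best_gain = choose_split(gains, used, weights)
```

    With these numbers the penalty for `x2` is 0.25 + 0.1 = 0.35, so its penalized gain is 0.42 and the split reuses `x1`. Setting `gamma = 0` reduces the sketch to a purely global penalty, while `gamma = 1` lets the feature weights alone decide how strongly each new feature is penalized.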

    Is Brazilian music getting more predictable? A statistical physics approach for different music genres

    Music is an important part of most people's lives and of a country's culture. Moreover, different characteristics of songs, such as genre and chord sequences, could have different impacts on individual behaviours. Even with just seven chords and their respective variations, originality can be a crucial element of a song's success. With this in mind, and in the context of Brazilian music, we employed Detrended Fluctuation Analysis (DFA) to analyse the possible predictability of eight different music genres. Among these genres, Reggae and Pop appear to be the least random in their sequential use of chords. Using a sliding-window approach, we found that the predictability of Pop chord sequences decreased over time. When the same methodology is applied to shuffled versions of the original series, the results indicate randomness, demonstrating the robustness of our approach.
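
    The steps of first-order DFA can be sketched as follows: integrate the mean-removed series into a "profile", split it into non-overlapping windows of length n, remove a least-squares line from each window, compute the root-mean-square fluctuation F(n), and take the slope of log F(n) against log n as the exponent alpha. An alpha near 0.5 indicates an uncorrelated series, while larger values indicate persistent correlations, i.e. a less random sequence. This is a generic sketch, not the authors' code; how chords are mapped to numbers, and the window sizes in `scales`, are assumptions not given in the abstract.

```python
import itertools
import math
import random
from typing import List, Sequence


def _detrended_sq_residuals(window: Sequence[float]) -> float:
    """Sum of squared residuals after a least-squares line fit over positions 0..n-1."""
    n = len(window)
    x_mean = (n - 1) / 2.0
    y_mean = sum(window) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(window))
    slope = sxy / sxx
    return sum((v - (y_mean + slope * (i - x_mean))) ** 2 for i, v in enumerate(window))


def _slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of ys on xs."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


def dfa_alpha(series: Sequence[float], scales: Sequence[int]) -> float:
    """First-order DFA scaling exponent alpha of a numeric series."""
    mean = sum(series) / len(series)
    # Step 1: the profile is the cumulative sum of deviations from the mean.
    profile: List[float] = list(itertools.accumulate(v - mean for v in series))
    log_n, log_f = [], []
    for n in scales:
        n_windows = len(profile) // n
        # Steps 2-3: detrend each non-overlapping window and pool squared residuals.
        total = sum(
            _detrended_sq_residuals(profile[k * n:(k + 1) * n]) for k in range(n_windows)
        )
        fluctuation = math.sqrt(total / (n_windows * n))
        log_n.append(math.log(n))
        log_f.append(math.log(fluctuation))
    # Step 4: alpha is the slope of log F(n) against log n.
    return _slope(log_n, log_f)


rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
walk = list(itertools.accumulate(noise))
scales = [16, 32, 64, 128, 256]
alpha_noise = dfa_alpha(noise, scales)  # theory: close to 0.5 (uncorrelated)
alpha_walk = dfa_alpha(walk, scales)    # theory: close to 1.5 (random walk)
```

    Shuffling a series destroys its temporal ordering, so its alpha should fall back towards 0.5; this is the logic behind the shuffled-series check described in the abstract.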

    Hierarchical Embedded Bayesian Additive Regression Trees

    We propose a simple yet powerful extension of Bayesian Additive Regression Trees which we name Hierarchical Embedded BART (HE-BART). The model allows random effects to be included at the terminal node level of a set of regression trees, making HE-BART a non-parametric alternative to mixed effects models that avoids the need for the user to specify the structure of the random effects, whilst maintaining the prediction and uncertainty calibration properties of standard BART. Using simulated and real-world examples, we demonstrate that this new extension yields superior predictions on many of the standard example data sets for mixed effects models, and yet still provides consistent estimates of the random effect variances. In a future version of this paper, we will outline its use on larger, more advanced data sets and structures.
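
    To illustrate what "random effects at the terminal node level" means, the sketch below computes group-specific values inside a single leaf by partial pooling: each group's mean is shrunk toward the leaf mean, and small groups are shrunk more. It assumes a simple normal-normal model, y_ij ~ N(mu_j, sigma2) with mu_j ~ N(leaf_mean, tau2), and treats `leaf_mean`, `sigma2` and `tau2` as known. This is a generic hierarchical-model calculation chosen for illustration, not HE-BART's actual priors or sampler, and the function name is hypothetical.

```python
from typing import Dict, List


def partial_pool_leaf(
    y_by_group: Dict[str, List[float]],
    leaf_mean: float,
    sigma2: float,
    tau2: float,
) -> Dict[str, float]:
    """Posterior mean of each group's value within one terminal node.

    Assumed model (illustrative only):
        y_ij ~ N(mu_j, sigma2)      observations of group j in this leaf
        mu_j ~ N(leaf_mean, tau2)   group effect centred on the leaf mean
    """
    out: Dict[str, float] = {}
    for group, ys in y_by_group.items():
        n = len(ys)
        ybar = sum(ys) / n
        precision = n / sigma2 + 1.0 / tau2  # data precision plus prior precision
        # Precision-weighted average of the group mean and the leaf mean.
        out[group] = (n / sigma2 * ybar + leaf_mean / tau2) / precision
    return out


# Hypothetical leaf: group "a" has four observations, group "b" only one.
leaf_values = partial_pool_leaf(
    {"a": [2.0, 2.0, 2.0, 2.0], "b": [2.0]}, leaf_mean=0.0, sigma2=1.0, tau2=1.0
)
```

    Both groups have the same sample mean of 2, but group "a" (four observations) is estimated at 1.6 while group "b" (one observation) is pulled further toward the leaf mean, to 1.0.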