Multilabel Classification with R Package mlr
We implemented several multilabel classification algorithms in the machine
learning package mlr. The implemented methods are binary relevance, classifier
chains, nested stacking, dependent binary relevance and stacking, which can be
used with any base learner that is accessible in mlr. Moreover, there is access
to the multilabel classification versions of randomForestSRC and rFerns. All
these methods can be easily compared by different implemented multilabel
performance measures and resampling methods in the standardized mlr framework.
In a benchmark experiment with several multilabel datasets, the performance of
the different methods is evaluated.
Comment: 18 pages, 2 figures, to be published in R Journal; reference corrected
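mlr is an R package, so its actual API is not shown here; as an illustration of the binary relevance and classifier chain ideas named in the abstract (one independent binary learner per label vs. chained learners that feed earlier predictions forward), here is a hedged scikit-learn sketch with illustrative synthetic data:

```python
# Illustrative Python analogue of binary relevance and classifier chains;
# NOT the mlr (R) API. Data and model choices are assumptions for the demo.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier, ClassifierChain

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                                   # 200 observations, 5 features
Y = (X[:, :3] + rng.normal(scale=0.5, size=(200, 3)) > 0).astype(int)  # 3 binary labels

# Binary relevance: fit one base learner per label, independently.
br = MultiOutputClassifier(LogisticRegression()).fit(X, Y)

# Classifier chain: each label's learner also sees the previous labels' predictions.
cc = ClassifierChain(LogisticRegression(), order=[0, 1, 2]).fit(X, Y)

pred = br.predict(X)
print(pred.shape)  # (200, 3): one binary prediction per label
```

Both wrappers accept any base learner, mirroring the abstract's point that the mlr implementations work with any base learner accessible in mlr.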
On benchmark experiments and visualization methods for the evaluation and interpretation of machine learning models
Subject-specific Bradley-Terry-Luce Models with Implicit Variable Selection
The Bradley-Terry-Luce (BTL) model for paired comparison data is able to obtain a ranking of the objects that are compared pairwise by subjects. The task of each subject is to make preference decisions in favor of one of the objects. This decision is binary when subjects prefer either the first object or the second object, but can also be ordinal when subjects make their decisions on a Likert scale.
Since subject-specific covariates, which reflect characteristics of the subject, may affect the preference decision, it is essential to incorporate subject-specific covariates into the model.
However, the inclusion of subject-specific covariates yields a model that contains many parameters, and estimation thus becomes challenging. To overcome this problem, we propose a procedure that selects and estimates only the relevant variables.
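For reference, the basic Bradley-Terry-Luce preference probability (without the subject-specific covariates the abstract adds) can be sketched as follows; the worth parameters and their values are illustrative:

```python
# Hedged sketch of the basic BTL probability: the chance that object i is
# preferred over object j given worth parameters l_i and l_j. The paper's
# extension lets these parameters depend on subject-specific covariates.
import math

def btl_prob(l_i, l_j):
    """P(object i preferred over object j) = exp(l_i) / (exp(l_i) + exp(l_j))."""
    return math.exp(l_i) / (math.exp(l_i) + math.exp(l_j))

p = btl_prob(1.0, 0.0)
print(round(p, 3))  # 0.731, i.e. sigmoid(l_i - l_j)
```

Equal worths give probability 0.5, and the form is equivalent to a logistic model in the worth difference, which is how covariates enter such models.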
Decomposing Global Feature Effects Based on Feature Interactions
Global feature effect methods, such as partial dependence plots, provide an
intelligible visualization of the expected marginal feature effect. However,
such global feature effect methods can be misleading, as they do not represent
local feature effects of single observations well when feature interactions are
present. We formally introduce generalized additive decomposition of global
effects (GADGET), which is a new framework based on recursive partitioning to
find interpretable regions in the feature space such that the
interaction-related heterogeneity of local feature effects is minimized. We
provide a mathematical foundation of the framework and show that it is
applicable to the most popular methods to visualize marginal feature effects,
namely partial dependence, accumulated local effects, and Shapley additive
explanations (SHAP) dependence. Furthermore, we introduce a new
permutation-based interaction test to detect significant feature interactions
that is applicable to any feature effect method that fits into our proposed
framework. We empirically evaluate the theoretical characteristics of the
proposed methods based on various feature effect methods in different
experimental settings. Moreover, we apply the introduced methodology to two
real-world examples to showcase its usefulness.
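The misleading-PD phenomenon motivating GADGET can be shown in a few lines; this is a hedged illustration of the problem, not the GADGET algorithm itself. With a pure interaction f(x1, x2) = x1 * x2, the partial dependence of x1 is nearly flat, while the individual conditional expectation (ICE) curves disagree strongly, and it is exactly this interaction-related heterogeneity that GADGET partitions the feature space to minimize:

```python
# Hedged demo (model and data are assumptions): PD hides a pure interaction
# that ICE curves reveal as heterogeneity around the PD curve.
import numpy as np

rng = np.random.default_rng(1)
x2 = rng.uniform(-1, 1, size=100)           # observed values of feature 2
grid = np.linspace(-1, 1, 21)               # evaluation grid for feature 1

f = lambda x1, x2: x1 * x2                  # model with a pure interaction
ice = f(grid[None, :], x2[:, None])         # one ICE curve per observation
pd_curve = ice.mean(axis=0)                 # partial dependence = mean of ICE curves

heterogeneity = ((ice - pd_curve) ** 2).mean()  # spread of ICE around PD
print(np.allclose(pd_curve, 0, atol=0.1))   # True: PD is near zero everywhere
print(heterogeneity > 0.05)                 # True: ICE curves disagree strongly
```

Splitting the observations by the sign of x2 would make the ICE curves within each region nearly parallel, which is the kind of interpretable region the recursive partitioning in GADGET searches for.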
Algorithm-Agnostic Interpretations for Clustering
A clustering outcome for high-dimensional data is typically interpreted via
post-processing, involving dimension reduction and subsequent visualization.
This destroys the meaning of the data and obfuscates interpretations. We
propose algorithm-agnostic interpretation methods to explain clustering
outcomes in reduced dimensions while preserving the integrity of the data. The
permutation feature importance for clustering represents a general framework
based on shuffling feature values and measuring changes in cluster assignments
through custom score functions. The individual conditional expectation for
clustering indicates observation-wise changes in the cluster assignment due to
changes in the data. The partial dependence for clustering evaluates average
changes in cluster assignments for the entire feature space. All methods can be
used with any clustering algorithm able to reassign instances through soft or
hard labels. In contrast to common post-processing methods such as principal
component analysis, the introduced methods maintain the original structure of
the features.
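The shuffle-and-rescore idea behind permutation feature importance for clustering can be sketched as follows; this follows the abstract's description (shuffle feature values, measure changes in cluster assignments via a score function), not the authors' exact implementation, and the data and score are illustrative:

```python
# Hedged sketch of permutation feature importance for clustering: shuffle
# one feature, reassign observations with the fitted clusterer, and score
# the fraction of changed cluster assignments.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Two clusters separated only along feature 0; feature 1 is pure noise.
X = np.vstack([rng.normal([0, 0], 0.3, size=(100, 2)),
               rng.normal([5, 0], 0.3, size=(100, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
base = km.predict(X)                               # original hard assignments

def perm_importance(j, n_rep=5):
    """Average fraction of changed cluster assignments after shuffling feature j."""
    changed = []
    for _ in range(n_rep):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])       # shuffle feature j
        changed.append(np.mean(km.predict(Xp) != base))
    return float(np.mean(changed))

imp0, imp1 = perm_importance(0), perm_importance(1)
print(imp0 > imp1)  # True: feature 0 drives the clustering; feature 1 is noise
```

Any clusterer that can reassign instances (here via `predict`) fits this scheme, and the 0/1 change score could be replaced by a custom score function as the abstract describes.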