Truly Unordered Probabilistic Rule Sets for Multi-class Classification
Rule set learning has long been studied and has recently been frequently
revisited due to the need for interpretable models. Still, existing methods
have several shortcomings: 1) most recent methods require a binary feature
matrix as input, while learning rules directly from numeric variables is
understudied; 2) existing methods impose orders among rules, either explicitly
or implicitly, which harms interpretability; and 3) currently no method exists
for learning probabilistic rule sets for multi-class target variables (there is
only one for probabilistic rule lists).
We propose TURS, for Truly Unordered Rule Sets, which addresses these
shortcomings. We first formalize the problem of learning truly unordered rule
sets. To resolve conflicts caused by overlapping rules, i.e., instances covered
by multiple rules, we propose a novel approach that exploits the probabilistic
properties of our rule sets. We next develop a two-phase heuristic algorithm
that learns rule sets by carefully growing rules. An important innovation is
that we use a surrogate score to take the global potential of the rule set into
account when learning a local rule.
Finally, we empirically demonstrate that, compared to non-probabilistic and
(explicitly or implicitly) ordered state-of-the-art methods, our method learns
rule sets with both better interpretability and better predictive performance.

Comment: Camera-ready version for ECMLPKDD 2022, with Supplementary Material.
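The conflict-resolution idea can be made concrete with a minimal sketch: when an instance is covered by several probabilistic rules, their class-probability estimates must be reconciled. The sketch below simply averages the estimates of all covering rules; this naive scheme, and all names in it (`Rule`, `predict_proba`, the toy features), are illustrative assumptions, not the refined method proposed in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Rule:
    condition: Callable[[dict], bool]  # conjunctive test over feature values
    class_probs: Dict[str, float]      # class label -> estimated probability

def predict_proba(rules, default_probs, x):
    """Predict class probabilities from an unordered probabilistic rule set.

    Overlap handling here is deliberately naive: average the estimates of
    all covering rules; uncovered instances fall back to a default rule.
    """
    covering = [r for r in rules if r.condition(x)]
    if not covering:
        return default_probs
    return {c: sum(r.class_probs[c] for r in covering) / len(covering)
            for c in default_probs}

# Two hypothetical rules over numeric features, with no order between them.
rules = [
    Rule(lambda x: x["age"] > 40, {"pos": 0.8, "neg": 0.2}),
    Rule(lambda x: x["bmi"] > 30, {"pos": 0.6, "neg": 0.4}),
]
default = {"pos": 0.3, "neg": 0.7}
probs = predict_proba(rules, default, {"age": 50, "bmi": 35})  # both rules cover
```

Because no rule takes precedence over another, each rule's probability estimate can be read on its own, which is the interpretability benefit the abstract emphasises.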
Robust subgroup discovery
We introduce the problem of robust subgroup discovery, i.e., finding a set of
interpretable descriptions of subsets that 1) stand out with respect to one or
more target attributes, 2) are statistically robust, and 3) are non-redundant. Many
attempts have been made to mine either locally robust subgroups or to tackle
the pattern explosion, but we are the first to address both challenges at the
same time from a global modelling perspective. First, we formulate the broad
model class of subgroup lists, i.e., ordered sets of subgroups, for univariate
and multivariate targets that can consist of nominal or numeric variables, and
that includes traditional top-1 subgroup discovery in its definition. This
novel model class allows us to formalise the problem of optimal robust subgroup
discovery using the Minimum Description Length (MDL) principle, where we resort
to optimal Normalised Maximum Likelihood and Bayesian encodings for nominal and
numeric targets, respectively. Second, as finding optimal subgroup lists is
NP-hard, we propose SSD++, a greedy heuristic that finds good subgroup lists
and guarantees that the most significant subgroup found according to the MDL
criterion is added in each iteration, which is shown to be equivalent to a
Bayesian one-sample proportion test, multinomial test, or t-test between the subgroup
and dataset marginal target distributions plus a multiple hypothesis testing
penalty. We empirically show on 54 datasets that SSD++ outperforms previous
subgroup set discovery methods in terms of quality and subgroup list size.

Comment: For associated code, see https://github.com/HMProenca/RuleList ;
submitted to the Data Mining and Knowledge Discovery Journal.
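To make the comparison between a subgroup's target distribution and the dataset marginal concrete, here is a minimal sketch of a coverage-weighted Kullback-Leibler quality for a nominal target. This is a common MDL-style gain used as an illustrative stand-in for the paper's exact MDL criterion; the function name and the toy counts are assumptions.

```python
import math

def wkl_quality(subgroup_counts, dataset_counts):
    """Coverage-weighted KL divergence between a subgroup's target
    distribution and the dataset marginal (nominal target)."""
    n = sum(subgroup_counts.values())  # subgroup coverage
    N = sum(dataset_counts.values())   # dataset size
    kl = 0.0
    for c, k in subgroup_counts.items():
        if k > 0:
            p = k / n                   # class share inside the subgroup
            q = dataset_counts[c] / N   # class share in the whole dataset
            kl += p * math.log(p / q)
    return n * kl

# A subgroup whose target distribution matches the marginal scores 0;
# a strongly deviating subgroup scores high.
baseline = wkl_quality({"A": 10, "B": 10}, {"A": 50, "B": 50})
deviating = wkl_quality({"A": 18, "B": 2}, {"A": 50, "B": 50})
```

Weighting by coverage `n` is what penalises tiny, statistically fragile subgroups, which mirrors the robustness goal stated in the abstract.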
CHIRPS: Explaining random forest classification
Modern machine learning methods typically produce “black box” models that are opaque to interpretation. Yet demand for them has been increasing in human-in-the-loop processes, that is, those processes that require a human agent to verify, approve, or reason about automated decisions before they can be applied. To facilitate this interpretation, we propose Collection of High Importance Random Path Snippets (CHIRPS), a novel algorithm for explaining random forest classification per data instance. CHIRPS extracts a decision path from each tree in the forest that contributes to the majority classification, and then uses frequent pattern mining to identify the most commonly occurring split conditions. A simple rule in conjunctive form is then constructed, whose antecedent terms are derived from the attributes that had the most influence on the classification. This rule is returned alongside estimates of its precision and coverage on the training data, along with counterfactual details. An experimental study involving nine data sets shows that classification rules returned by CHIRPS have a precision at least as high as the state of the art when evaluated on unseen data (0.91–0.99) and offer much greater coverage (0.04–0.54). Furthermore, CHIRPS uniquely controls against under- and over-fitting solutions by maximising novel objective functions that are better suited to the local (per-instance) explanation setting.
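A much-simplified sketch of the frequent-pattern step: given the root-to-leaf split conditions collected from each tree voting for the majority class, keep the conditions that occur in enough paths and join them into a conjunctive antecedent. This counts individual conditions rather than mining full frequent itemsets as the published algorithm does, and all names and conditions below are hypothetical.

```python
from collections import Counter

def chirps_style_rule(paths, min_support=0.5):
    """Build a conjunctive rule from split conditions that occur frequently
    across the decision paths of trees voting for the majority class.

    Simplification: counts single conditions instead of frequent itemsets.
    """
    counts = Counter(cond for path in paths for cond in set(path))
    threshold = min_support * len(paths)  # minimum number of paths
    antecedent = [cond for cond, k in counts.most_common() if k >= threshold]
    return " AND ".join(antecedent)

# Hypothetical root-to-leaf paths for one instance, one per voting tree.
paths = [
    {"age > 40", "bmi > 30"},
    {"age > 40", "chol > 200"},
    {"age > 40", "bmi > 30", "chol > 200"},
]
rule = chirps_style_rule(paths)
```

Conditions shared by most voting trees dominate the rule, which is what lets a single short conjunction summarise the forest's behaviour on that instance.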
Vouw: geometric pattern mining using the MDL principle
Algorithms and the Foundations of Software Technology
Predicting Treatment Outcome Using Interpretable Models for Patients with Head and Neck Cancer
Head and neck cancer accounts for around 3% of cancers worldwide, resulting in many deaths each year. The increasing number of patients receiving a cancer diagnosis increases the demand for accurate diagnosis and effective treatment. Intra-tumor heterogeneity is considered one of the key obstacles in cancer therapy, an issue that needs to be addressed. Radiomics paves the way for extracting features based on the shape, size, and texture of the entire tumor.
Radiomics extracts features from tumors based on the gray levels in a medical image. The process of radiomics is intended to capture texture and heterogeneity in the tumor that would be impossible to deduce from a simple tumor biopsy. Feature extraction by radiomics has been proven to enrich clinical datasets with valuable features that positively impact the performance of predictive models.
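As a toy illustration of what first-order statistical features look like, the sketch below computes simple intensity statistics from the gray levels inside a tumor region. This is not the radiomics pipeline used in the thesis; the function name and the sample values are illustrative assumptions.

```python
import math
from collections import Counter

def first_order_features(gray_levels):
    """Toy first-order statistics over the gray levels in a tumor region:
    mean and variance of intensity, plus histogram entropy (a simple
    proxy for intra-tumor heterogeneity)."""
    n = len(gray_levels)
    mean = sum(gray_levels) / n
    variance = sum((g - mean) ** 2 for g in gray_levels) / n
    hist = Counter(gray_levels)
    entropy = -sum((k / n) * math.log2(k / n) for k in hist.values())
    return {"mean": mean, "variance": variance, "entropy": entropy}

features = first_order_features([1, 2, 3, 4])
```

A homogeneous tumor concentrates its gray levels in few histogram bins and has low entropy; a heterogeneous one spreads across many bins and has high entropy, which is the texture signal radiomics aims to capture.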
This thesis investigates the use of clinical and radiomics features for predicting treatment outcomes of head and neck cancer patients using interpretable models. The radiomics algorithm extracts first-order statistical, shape, and texture features from PET and CT images of each patient. The 139 patients in the training dataset were from Oslo University Hospital (OUS), whereas the 99 patients in the test set were from the MAASTRO clinic in the Netherlands. All the clinical features, together with the radiomics features, counted 388 features in total. Feature selection through the repeated elastic net technique (RENT) was performed to exclude irrelevant features from the dataset. Seven different tree-based machine learning algorithms were fitted to the data, and the performance was validated by the accuracy, ROC AUC, Matthews correlation coefficient, F1 score for class 1, and F1 score for class 0. The models were tested on the external MAASTRO dataset, and the overall best-performing models were interpreted.
On the external dataset from the MAASTRO clinic, the highest-performing models obtained an MCC of 0.37 for overall survival (OS) prediction and 0.44 for disease-free survival (DFS) prediction. For both OS and DFS, the best predictions were obtained using only the clinical data. Transparency in machine learning models greatly benefits decision-makers in clinical settings, as every prediction can be reasoned about. Predicting treatment outcomes for head and neck cancer patients with interpretable models is thus feasible. To determine whether the methods used in this thesis are suited for predicting treatment outcomes for head and neck cancer patients, they need to be tested on more datasets.
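Since the external-set results are reported as Matthews correlation coefficients, here is the standard MCC formula for a binary confusion matrix as a quick sketch; the confusion-matrix entries in the example are generic and not taken from the thesis's data.

```python
import math

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from a binary confusion matrix;
    ranges from -1 (total disagreement) through 0 (chance level)
    to +1 (perfect prediction)."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

perfect = mcc(50, 0, 0, 50)  # every instance classified correctly
```

Unlike accuracy, MCC accounts for all four confusion-matrix cells, which makes it a more honest summary on the imbalanced outcome labels typical of survival prediction.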