1 research outputs found
Interpreting Classifiers through Attribute Interactions in Datasets
In this work we present the novel ASTRID method for investigating which
attribute interactions classifiers exploit when making predictions. Attribute
interactions in classification tasks mean that two or more attributes together
provide stronger evidence for a particular class label. Knowledge of such
interactions makes models more interpretable by revealing associations between
attributes. This has applications, e.g., in pharmacovigilance to identify
interactions between drugs or in bioinformatics to investigate associations
between single nucleotide polymorphisms. We also show how the found attribute
partitioning is related to a factorisation of the data generating distribution
and empirically demonstrate the utility of the proposed method.Comment: presented at 2017 ICML Workshop on Human Interpretability in Machine
Learning (WHI 2017), Sydney, NSW, Australi