Search CORE

8 research outputs found

Maximal exceptions with minimal descriptions

Author: AJ Mitchell-Jones
H Heikinheimo
HR Warner
IH Witten
J Rissanen
Matthijs van Leeuwen
S Kullback
W Klösgen
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Regression models on bivariate count data in an exceptional model mining context

Author: Raaijmakers Boy A.J.
Publication date: 31/08/2017

Pure OAI Repository

From LSTM to ANNs: Exceptional Model Mining of Neural Networks

Author: van der Ster Jelle
Publication date: 03/11/2020

Pure OAI Repository

Robust subgroup discovery

Author: Bäck Thomas
Grünwald Peter
Proença Hugo Manuel
van Leeuwen Matthijs
Publication date: 28/11/2021

We introduce the problem of robust subgroup discovery, i.e., finding a set of interpretable descriptions of subsets that 1) stand out with respect to one or more target attributes, 2) are statistically robust, and 3) are non-redundant. Many attempts have been made to mine either locally robust subgroups or to tackle the pattern explosion, but we are the first to address both challenges at the same time from a global modelling perspective. First, we formulate the broad model class of subgroup lists, i.e., ordered sets of subgroups, for univariate and multivariate targets that can consist of nominal or numeric variables, and that includes traditional top-1 subgroup discovery in its definition. This novel model class allows us to formalise the problem of optimal robust subgroup discovery using the Minimum Description Length (MDL) principle, where we resort to optimal Normalised Maximum Likelihood and Bayesian encodings for nominal and numeric targets, respectively. Second, as finding optimal subgroup lists is NP-hard, we propose SSD++, a greedy heuristic that finds good subgroup lists and guarantees that the most significant subgroup found according to the MDL criterion is added in each iteration, which is shown to be equivalent to a Bayesian one-sample proportions, multinomial, or t-test between the subgroup and dataset marginal target distributions plus a multiple hypothesis testing penalty. We empirically show on 54 datasets that SSD++ outperforms previous subgroup set discovery methods in terms of quality and subgroup list size.

Comment: For associated code, see https://github.com/HMProenca/RuleList ; submitted to Data Mining and Knowledge Discovery Journal

arXiv.org e-Print Archive

CWI's Institutional Repository

Leiden University Scholarly Publications

Diverse subgroup set discovery

Author: A Knobbe
A Mitchell-Jones
Arno Knobbe
G Garriga
G Webb
H Grosskreutz
H Heikinheimo
H Peng
J Friedman
J Han
J Vreeken
M Leeuwen van
Matthijs van Leeuwen
N Lavrač
P Clark
P Grünwald
P Kralj Novak
S Bay
S Kullback
T Cover
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Analyzing Granger causality in climate data with time series classification methods

Author: Decubber Stijn
Demuzere Matthias
Miralles Diego
Papagiannopoulou Christina
Verhoest Niko
Waegeman Willem
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Attribution studies in climate science aim to scientifically ascertain the influence of climatic variations on natural or anthropogenic factors. Many of those studies adopt the concept of Granger causality to infer statistical cause-effect relationships, while utilizing traditional autoregressive models. In this article, we investigate the potential of state-of-the-art time series classification techniques to enhance causal inference in climate science. We conduct a comparative experimental study of different types of algorithms on a large test suite that comprises a unique collection of datasets from the area of climate-vegetation dynamics. The results indicate that specialized time series classification methods are able to improve existing inference procedures. Substantial differences are observed among the methods that were tested.
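
For orientation, the traditional autoregressive form of a Granger causality test mentioned in this abstract can be sketched as follows. This is a minimal illustration only, not the time series classification approach the article studies. The helper names (solve, rss, granger_f), the lag of 1, and the synthetic series are all assumptions made for the example.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def rss(features, targets):
    """Residual sum of squares of an ordinary least-squares fit (normal equations)."""
    k = len(features[0])
    xtx = [[sum(f[i] * f[j] for f in features) for j in range(k)] for i in range(k)]
    xty = [sum(f[i] * t for f, t in zip(features, targets)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(c * v for c, v in zip(beta, f))) ** 2
               for f, t in zip(features, targets))


def granger_f(x, y, lag=1):
    """F-statistic for the hypothesis 'past x helps predict y beyond past y'."""
    steps = range(lag, len(y))
    # Restricted model: intercept plus lagged values of y only.
    own = [[1.0] + [y[t - l] for l in range(1, lag + 1)] for t in steps]
    # Full model: restricted model plus lagged values of x.
    full = [o + [x[t - l] for l in range(1, lag + 1)] for o, t in zip(own, steps)]
    target = [y[t] for t in steps]
    rss_restricted, rss_full = rss(own, target), rss(full, target)
    dof = len(target) - (2 * lag + 1)
    return ((rss_restricted - rss_full) / lag) / (rss_full / dof)


# Synthetic example (assumed data): y depends on the previous value of x.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [0.0]
for t in range(1, 200):
    y.append(0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.gauss(0, 0.1))

f_xy = granger_f(x, y)  # does x help predict y?
f_yx = granger_f(y, x)  # does y help predict x?
print(f_xy > f_yx)
```

A large F-statistic in one direction and a small one in the other is the usual evidence that one series Granger-causes the other. In practice the statistic is compared against an F distribution to obtain a p-value.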

Ghent University Academic Bibliography

Exceptional Model Mining

Author: Duivesteijn W.
Publication date: 01/01/2013

Finding subsets of a dataset that somehow deviate from the norm, i.e. where something interesting is going on, is a classical Data Mining task. In traditional local pattern mining methods, such deviations are measured in terms of a relatively high occurrence (frequent itemset mining), or an unusual distribution for one designated target attribute (subgroup discovery). These, however, do not encompass all forms of "interesting". To capture a more general notion of interestingness in subsets of a dataset, we develop Exceptional Model Mining (EMM). This is a supervised local pattern mining framework, where several target attributes are selected, and a model over these attributes is chosen to be the target concept. Then, subsets are sought on which this model is substantially different from the model on the whole dataset. For instance, we can find parts of the data where two target attributes have an unusual correlation, a classifier has a deviating predictive performance, or a Bayesian network fitted on several target attributes has an exceptional structure. We will discuss some real-world applications of EMM instances, including using the Bayesian network model to identify meteorological conditions under which food chains are displaced, and using a regression model to find the subset of households in the Chinese province of Hunan that do not follow the general economic law of demand.

This research is supported by the Netherlands Organisation for Scientific Research (NWO) under project number 612.065.822 (Exceptional Model Mining).

Algorithms and the Foundations of Software technology
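
The correlation instance described in this abstract (subsets where two target attributes have an unusual correlation) can be sketched as follows. This is a minimal illustration under assumptions, not the author's implementation. The quality measure (absolute difference between subgroup and whole-dataset Pearson correlation), the restriction to single-condition descriptions, and the names pearson, exceptional_subgroups, and the toy rows are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def exceptional_subgroups(rows, descriptors, tx, ty, min_size=3):
    """Rank subgroups 'attr == value' by |r_subgroup - r_dataset| for targets tx, ty."""
    r_all = pearson([r[tx] for r in rows], [r[ty] for r in rows])
    ranked = []
    for attr in descriptors:
        for value in sorted({r[attr] for r in rows}):
            cover = [r for r in rows if r[attr] == value]
            if len(cover) < min_size:
                continue  # skip subgroups too small to fit a model on
            r_sub = pearson([r[tx] for r in cover], [r[ty] for r in cover])
            ranked.append((abs(r_sub - r_all), attr, value))
    return sorted(ranked, reverse=True)


# Toy data (assumed): demand falls with price in the north, rises in the south.
rows = [
    {"region": "north", "price": 1, "demand": 9},
    {"region": "north", "price": 2, "demand": 7},
    {"region": "north", "price": 3, "demand": 5},
    {"region": "north", "price": 4, "demand": 3},
    {"region": "south", "price": 1, "demand": 2},
    {"region": "south", "price": 2, "demand": 4},
    {"region": "south", "price": 3, "demand": 6},
]

ranked = exceptional_subgroups(rows, ["region"], "price", "demand")
for quality, attr, value in ranked:
    print(f"{attr} == {value}: quality {quality:.3f}")
```

On the whole toy dataset the price-demand correlation is weakly negative, so the subgroup whose correlation is positive ranks first. A full EMM system would search over conjunctions of conditions and could swap in other model classes, such as a classifier or a regression model.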

Ghent University Academic Bibliography

Leiden University Scholarly Publications