Exceptional Model Mining

Author: Duivesteijn W.
Publication date: 01/01/2013

Finding subsets of a dataset that somehow deviate from the norm, i.e. where something interesting is going on, is a classical Data Mining task. In traditional local pattern mining methods, such deviations are measured in terms of a relatively high occurrence (frequent itemset mining), or an unusual distribution for one designated target attribute (subgroup discovery). These, however, do not encompass all forms of "interesting". To capture a more general notion of interestingness in subsets of a dataset, we develop Exceptional Model Mining (EMM). This is a supervised local pattern mining framework, where several target attributes are selected, and a model over these attributes is chosen to be the target concept. Then, subsets are sought on which this model is substantially different from the model on the whole dataset. For instance, we can find parts of the data where two target attributes have an unusual correlation, a classifier has a deviating predictive performance, or a Bayesian network fitted on several target attributes has an exceptional structure. We will discuss some real-world applications of EMM instances, including using the Bayesian network model to identify meteorological conditions under which food chains are displaced, and using a regression model to find the subset of households in the Chinese province of Hunan that do not follow the general economic law of demand. This research is supported by the Netherlands Organisation for Scientific Research (NWO) under project number 612.065.822 (Exceptional Model Mining). Algorithms and the Foundations of Software technology
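
The correlation example in this abstract can be illustrated with a minimal sketch. This is not the author's implementation: it assumes a toy dataset, subgroups described by a single attribute-value condition, and a quality measure equal to the absolute difference between the Pearson correlation inside the subgroup and on the whole dataset. The names `pearson`, `correlation_quality`, and `best_single_condition`, as well as the toy rows, are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def correlation_quality(rows, in_subgroup, x, y, min_size=3):
    """Assumed quality measure: |correlation in subgroup - correlation on all rows|."""
    sub = [r for r in rows if in_subgroup(r)]
    if len(sub) < min_size:
        return 0.0  # too few rows to say anything about the subgroup
    whole = pearson([r[x] for r in rows], [r[y] for r in rows])
    local = pearson([r[x] for r in sub], [r[y] for r in sub])
    return abs(local - whole)


def best_single_condition(rows, descriptors, x, y):
    """Try every 'attribute == value' condition and keep the most exceptional one."""
    best = (0.0, None)
    for attr in descriptors:
        for value in sorted({r[attr] for r in rows}):
            q = correlation_quality(rows, lambda r, a=attr, v=value: r[a] == v, x, y)
            if q > best[0]:
                best = (q, (attr, value))
    return best


# Hypothetical toy data: demand falls with price in region A but rises in region B.
rows = [{"region": "A", "price": p, "demand": 14 - 2 * p} for p in range(1, 7)]
rows += [{"region": "B", "price": p, "demand": p} for p in (1, 2, 3)]

quality, condition = best_single_condition(rows, ["region"], "price", "demand")
print(condition, round(quality, 3))
```

On this toy data the search singles out region B, the subgroup whose price-demand correlation departs most from the overall one. A real EMM run would search richer subgroup descriptions and typically weight the quality by subgroup size.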

Ghent University Academic Bibliography

Leiden University Scholary Publications

Discovering a taste for the unusual: exceptional models for preference mining

Author: Alípio Mário Jorge
Arno Knobbe
Carlos Soares
Cláudio Rebelo de Sá
CR Sá de
E Hüllermeier
F Chiclana
F M Harper
J Chomicki
L Umek
M Leeuwen van
N Jin
N Lavrac
P Brazdil
Paulo Azevedo
PJ Azevedo
V Svendová
W Duivesteijn
WD Cook
Wouter Duivesteijn
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

Exceptional preferences mining (EPM) is a crossover between two subfields of data mining: local pattern mining and preference learning. EPM can be seen as a local pattern mining task that finds subsets of observations where some preference relations between labels significantly deviate from the norm. It is a variant of subgroup discovery, with rankings of labels as the target concept. We employ several quality measures that highlight subgroups featuring exceptional preferences, where the focus of what constitutes 'exceptional' varies with the quality measure: two measures look for exceptional overall ranking behavior, one measure indicates whether a particular label stands out from the rest, and a fourth measure highlights subgroups with unusual pairwise label ranking behavior. We explore a few datasets and compare with existing techniques. The results confirm that the new task EPM can deliver interesting knowledge. This research has received funding from the ECSEL Joint Undertaking, the framework programme for research and innovation Horizon 2020 (2014-2020) under Grant Agreement Number 662189-MANTIS-2014-1

Universidade do Minho: RepositoriUM

Repository TU/e

Crossref

Pure OAI Repository

Leiden University Scholary Publications

DEvIANT: Discovering Significant Exceptional (Dis-)Agreement Within Groups

Author: A Belfodil
AF Hayes
B Efron
B Ganter
D Eppstein
F Duris
F Lemmerich
GI Webb
H Grosskreutz
M Das
M van Leeuwen
S Geisser
S Minato
SO Kuznetsov
T Cover
W Duivesteijn
W Hämäläinen
Publication venue: HAL CCSD
Publication date: 20/06/2019

We strive to find contexts (i.e., subgroups of entities) under which exceptional (dis-)agreement occurs among a group of individuals, in any type of data featuring individuals (e.g., parliamentarians, customers) performing observable actions (e.g., votes, ratings) on entities (e.g., legislative procedures, movies). To this end, we introduce the problem of discovering statistically significant exceptional contextual intra-group agreement patterns. To handle the sparsity inherent to voting and rating data, we use Krippendorff's Alpha measure for assessing the agreement among individuals. We devise a branch-and-bound algorithm, named DEvIANT, to discover such patterns. DEvIANT exploits both closure operators and tight optimistic estimates. We derive analytic approximations for the confidence intervals (CIs) associated with patterns for a computationally efficient significance assessment. We prove that these approximate CIs are nested along specialization of patterns. This allows us to incorporate pruning properties in DEvIANT to quickly discard non-significant patterns. An empirical study on several datasets demonstrates the efficiency and the usefulness of DEvIANT. Technical Report Associated with the ECML/PKDD 2019 Paper entitled: "DEvIANT: Discovering Significant Exceptional (Dis-)Agreement Within Groups"
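
To make the agreement measure named in this abstract concrete, here is a minimal sketch of Krippendorff's Alpha for nominal ratings with missing values. It is an illustration under assumptions, not the DEvIANT code: the function name `krippendorff_alpha_nominal` and the input layout (one list of observed ratings per entity, absent votes simply left out) are hypothetical, and the paper may use a different distance metric than the nominal one used here.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's Alpha with the nominal metric.

    units: iterable of lists; each list holds the ratings observed for one
    entity. Missing ratings are omitted, and entities with fewer than two
    ratings contribute nothing.
    """
    # Coincidence matrix: every ordered pair of ratings from the same unit
    # counts with weight 1 / (m - 1), where m is the number of ratings there.
    coincidences = Counter()
    for ratings in units:
        m = len(ratings)
        if m < 2:
            continue
        for a, b in permutations(ratings, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    # Marginal totals per category and overall number of pairable ratings.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if expected == 0:
        return float("nan")  # only one category was used: alpha is undefined
    return 1 - (n - 1) * observed / expected


# Hypothetical votes on three entities; the third voter skipped the first two.
perfect = krippendorff_alpha_nominal([["y", "y"], ["n", "n"], ["y", "y", "y"]])
mixed = krippendorff_alpha_nominal([["a", "a"], ["b", "b"], ["a", "b"]])
print(perfect, round(mixed, 4))
```

Because each unit is weighted by the number of ratings it actually received, sparse voting or rating data can be handled without imputing the missing entries.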

Learning Interpretable Rules for Multi-label Classification

Author: A Gabriel
AA Freitas
AJ Knobbe
B Liu
B Minnaert
D Malerba
E Gibaja
E Loza Mencía
E Montañés
F Charte
F Herrera
F Janssen
F Thabtah
G Bosc
G Tsoumakas
Grigorios Tsoumakas
H Allahyari
J Arunadevi
J Demšar
J Fürnkranz
J Han
J Hipp
J Read
JN Sulzmann
K Dembczyński
L Chekina
L Raedt De
LE Sucar
M Atzmüller
M Beckerle
M Friedman
M Zhang
Miltiadis Allamanis
MR Boutell
P Kralj Novak
PJ Hayes
R Senge
RM Cameron-Jones
Shantanu Godbole
W Duivesteijn
W Waegeman
WW Cohen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/11/2018

Multi-label classification (MLC) is a supervised learning problem in which, contrary to standard multiclass classification, an instance can be associated with several class labels simultaneously. In this chapter, we advocate a rule-based approach to multi-label classification. Rule learning algorithms are often employed when one is not only interested in accurate predictions, but also requires an interpretable theory that can be understood, analyzed, and qualitatively evaluated by domain experts. Ideally, by revealing patterns and regularities contained in the data, a rule-based theory yields new insights in the application domain. Recently, several authors have started to investigate how rule-based models can be used for modeling multi-label data. Discussing this task in detail, we highlight some of the problems that make rule learning considerably more challenging for MLC than for conventional classification. While mainly focusing on our own previous work, we also provide a short overview of related work in this area. Comment: Preprint version. To appear in: Explainable and Interpretable Models in Computer Vision and Machine Learning. The Springer Series on Challenges in Machine Learning. Springer (2018). See http://www.ke.tu-darmstadt.de/bibtex/publications/show/3077 for further information

arXiv.org e-Print Archive

TUbiblio

Crossref

KnowBots: Discovering Relevant Patterns in Chatbot Dialogues

Author: A Kerly
BA Shawar
C Chakrabarti
C Mooney
F Herrera
H Shah
J Jia
J Pereira
JY Chai
KB Shah
M Souza
P Fournier-Viger
W Duivesteijn
Publication venue: Springer
Publication date: 16/10/2019

Crossref

University of Twente Research Information

Monotonicity Detection and Enforcement in Longitudinal Classification

Author: A Ben-David
A Kaiser
CC Chen
D Martens
GB Huang
H Zhu
J Brookhouse
JR Cano
W Duivesteijn
W Verbeke
Y Zhang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 19/11/2019

Longitudinal datasets contain repeated measurements of the same variables at different points in time, which can be used by researchers to discover useful knowledge based on the changes of the data over time. Monotonic relations often occur in real-world data and need to be preserved in data mining models in order for the models to be acceptable by users. We propose a new methodology for detecting monotonic relations in longitudinal datasets and applying them in longitudinal classification model construction. Two different approaches were used to detect monotonic relations and include them into the classification task. The proposed approaches are evaluated using data from the English Longitudinal Study of Ageing (ELSA) with 10 different age-related diseases used as class variables to be predicted. A gradient boosting algorithm (XGBoost) is used for constructing classification models in two scenarios: enforcing and not enforcing the constraints. The results show that enforcement of monotonicity constraints can consistently improve the predictive accuracy of the constructed models. The produced models are fully monotonic according to the monotonicity constraints, which can have a positive impact on model acceptance in real world applications.

Crossref

Kent Academic Repository

Big-Data-Driven Materials Science and its FAIR Data Infrastructure

This chapter addresses the fourth paradigm of materials research -- big-data driven materials science. Its concepts and state-of-the-art are described, and its challenges and chances are discussed. For furthering the field, Open Data and an all-embracing sharing, an efficient data infrastructure, and the rich ecosystem of computer codes used in the community are of critical importance. For shaping this fourth paradigm and contributing to the development or discovery of improved and novel materials, data must be what is now called FAIR -- Findable, Accessible, Interoperable and Re-purposable/Re-usable. This sets the stage for advances of methods from artificial intelligence that operate on large data sets to find trends and patterns that cannot be obtained from individual calculations and not even directly from high-throughput studies. Recent progress is reviewed and demonstrated, and the chapter is concluded by a forward-looking perspective, addressing important not yet solved challenges. Comment: submitted to the Handbook of Materials Modeling (eds. S. Yip and W. Andreoni), Springer 2018/201

arXiv.org e-Print Archive

Crossref

MPG.PuRe

Exceptional Preferences Mining

Author: AM Jorge
CR Sá de
E Hüllermeier
H Mannila
J Fürnkranz
L Umek
N Jin
R Agrawal
S Henzgen
S Vembu
T Abudawood
T Van
V Dzyuba
W Duivesteijn
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Exceptional Preferences Mining (EPM) is a crossover between two subfields of data mining: local pattern mining and preference learning. EPM can be seen as a local pattern mining task that finds subsets of observations where the preference relations between subsets of the labels significantly deviate from the norm; a variant of Subgroup Discovery, with rankings as the (complex) target concept. We employ three quality measures that highlight subgroups featuring exceptional preferences, where the focus of what constitutes 'exceptional' varies with the quality measure: the first gauges exceptional overall ranking behavior, the second indicates whether a particular label stands out from the rest, and the third highlights subgroups featuring unusual pairwise label ranking behavior. As proof of concept, we explore five datasets. The results confirm that the new task EPM can deliver interesting knowledge. The results also illustrate how the visualization of the preferences in a Preference Matrix can aid in interpreting exceptional preference subgroups.
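
The Preference Matrix mentioned in this abstract can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes each observation is a full ranking given as a tuple of rank positions (1 = most preferred), that a matrix entry averages +1 when label i is ranked above label j and -1 when it is ranked below, and that subgroup and dataset matrices are compared with a plain Frobenius distance. The names `preference_matrix` and `frobenius_distance` and the toy rankings are hypothetical, and the paper's actual quality measures may differ, for instance by weighting with subgroup size.

```python
from math import sqrt


def preference_matrix(rankings):
    """Average pairwise preferences over a list of rankings.

    rankings[r][i] is the rank position of label i in observation r.
    Entry [i][j] is the mean of +1 (i preferred to j), -1 (j preferred
    to i), or 0 (tie or i == j).
    """
    k = len(rankings[0])
    sums = [[0.0] * k for _ in range(k)]
    for ranks in rankings:
        for i in range(k):
            for j in range(k):
                if ranks[i] < ranks[j]:
                    sums[i][j] += 1
                elif ranks[i] > ranks[j]:
                    sums[i][j] -= 1
    n = len(rankings)
    return [[value / n for value in row] for row in sums]


def frobenius_distance(a, b):
    """Square root of the summed squared entry-wise differences."""
    return sqrt(sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))


# Hypothetical rankings over three labels: most observations prefer
# label 0 > 1 > 2, while the subgroup holds the reversed order.
dataset = [(1, 2, 3), (1, 2, 3), (1, 2, 3), (3, 2, 1)]
subgroup = [(3, 2, 1)]

pm_all = preference_matrix(dataset)
pm_sub = preference_matrix(subgroup)
print(pm_all[0][1], pm_sub[0][1], round(frobenius_distance(pm_sub, pm_all), 3))
```

A large distance flags a subgroup whose pairwise preferences differ from the dataset as a whole, and inspecting individual entries shows which label pairs drive the difference.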

Crossref

Ghent University Academic Bibliography

University of Twente Research Information

Elements About Exploratory, Knowledge-Based, Hybrid, and Explainable Knowledge Discovery

Author: A Belfodil
A Buzmakov
A Hristoskova
AA Bendimerad
B Ganter
C Carpineto
D Grissa
G Sourek
H Blockeel
Jilles Vreeken
JW Tukey
K Bertet
K Janowicz
M Alam
M Hilario
M Kaytoue
M Leeuwen
M Rouane-Hacene
MJ Zaki
N Lavrac
Omer Sagi
P Brazdil
P Eklund
P Nguyen
P-N Tan
Riccardo Guidotti
SN Tran
SO Kuznetsov
T De Bie
T Fawcett
TG Dietterich
V Codocedo
V Duquenne
W Duivesteijn
W Ugarte
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 25/06/2019

Knowledge Discovery in Databases (KDD) and especially pattern mining can be interpreted along several dimensions, namely data, knowledge, problem-solving and interactivity. These dimensions are not disconnected and have a direct impact on the quality, applicability, and efficiency of KDD. Accordingly, we discuss some objectives of KDD based on these dimensions, namely exploration, knowledge orientation, hybridization, and explanation. The data space and the pattern space can be explored in several ways, depending on specific evaluation functions and heuristics, possibly related to domain knowledge. Furthermore, numerical data are complex and supervised numerical machine learning methods are usually the best candidates for efficiently mining such data. However, the work and output of numerical methods are most of the time hard to understand, while symbolic methods are usually more intelligible. This calls for hybridization, combining numerical and symbolic mining methods to improve the applicability and interpretability of KDD. Moreover, suitable explanations about the operating models and possible subsequent decisions should complete KDD, and this is far from being the case at the moment. For illustrating these dimensions and objectives, we analyze a concrete case about the mining of biological data, where we characterize these dimensions and their connections. We also discuss dimensions and objectives in the framework of Formal Concept Analysis and we draw some perspectives for future research.

Crossref

INRIA a CCSD electronic archive server

Identifying Proteins in Zebrafish Embryos Using Spectral Libraries Generated from Dissected Adult Organs and Tissues

Author: Abramsson A.
Alex A. Henneman
André M. Deelder
Annemarie Meijer
Anouk Botermans
Antberg L.
Arukwe A.
Brittijn S. A.
Craig R.
Cui C.
Deutsch E. W.
Gerhard G. S.
Grandel H.
Gupta T.
Hans Dalebout
Herman P. Spaink
Jessen J. R.
Jordy L. Hoogendijk
Keller A.
Lam H.
Lange V.
Lin Y.
Link V.
Lucitt M. B.
Lößner C.
Ma K.
Maddison D. R.
Magnus Palmblad
Mione M.
Mostovenko E.
Nesvizhskii A. I.
Palmblad M.
Pedrioli P. G.
Phelps H. A.
Qi H. H.
Shteynberg D.
Singh S. K.
Sokal R.
Suzanne J. van der Plas-Duivesteijn
Tamura K.
Traver D.
Trede N. S.
Vizcaíno J. A.
Westerfield M.
Wolstencroft K.
Yassene Mohammed
Publication venue: 'American Chemical Society (ACS)'
Publication date: 01/01/2014

Proteomic

Crossref

Leiden University Scholary Publications