654 research outputs found

Interactive and Iterative Discovery of Entity Network Subgraphs

Authors: North C.; Ramakrishnan N.; Sun M.; Tatti N.; Vreeken J.; Wu H.
Publication date: 01/01/2016

Graph mining to extract interesting components has been studied in various guises, e.g., communities, dense subgraphs, cliques. However, most existing works are based on notions of frequency and connectivity and do not capture subjective interestingness from a user's viewpoint. Furthermore, existing approaches to mining graphs are not interactive and cannot incorporate user feedback in any natural manner. In this paper, we address these gaps by proposing a graph maximum entropy model to discover surprising connected subgraph patterns from entity graphs. This model is embedded in an interactive visualization framework to enable human-in-the-loop, model-guided data exploration. Using case studies on real datasets, we demonstrate how interactions between users and the maximum entropy model lead to faster and explainable conclusions.

MPG.PuRe

Learning subjectively interesting data representations

Author: Kang Bo
Publication venue: Ghent University. Faculty of Engineering and Architecture
Publication date: 01/01/2019

Ghent University Academic Bibliography

Mining and modeling graphs using patterns and priors

Author: Adriaens Florian
Publication venue: Universiteit Gent. Faculteit Ingenieurswetenschappen en Architectuur
Publication date: 01/01/2020

Ghent University Academic Bibliography

Subjectively interesting alternative clusterings

Authors: De Bie Tijl; Kontonasios Kleanthis-Nikolaos
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

Ghent University Academic Bibliography

Explore Bristol Research

Mining subjectively interesting patterns in rich data

Author: Deng Junning
Publication venue: Universiteit Gent. Faculteit Ingenieurswetenschappen en Architectuur
Publication date: 01/01/2021

Ghent University Academic Bibliography

From Sets of Good Redescriptions to Good Sets of Redescriptions

Authors: Galbrun Esther; Kalofolias Janis; Miettinen Pauli
Publication venue: HAL CCSD
Publication date: 12/12/2016

Redescription mining aims at finding pairs of queries over data variables that describe roughly the same set of observations. These redescriptions can be used to obtain different views on the same set of entities. So far, redescription mining methods have aimed at listing all redescriptions supported by the data. Such an approach can result in many redundant redescriptions and hinder the user's ability to understand the overall characteristics of the data. In this work, we present an approach to find a good set of redescriptions, instead of finding a set of good redescriptions. That is, we present a way to remove the redundant redescriptions from a given set of redescriptions. We measure the redundancy using a framework inspired by the subjective interestingness based on maximum-entropy distributions as proposed by De Bie in 2011. Redescriptions, however, raise their own unique requirements on the framework, and our solution differs significantly from the existing ones. Notably, our approach can handle disjunctions and conjunctions in the queries, whereas the existing approaches are limited only to conjunctive queries. The framework also reduces the redundancy in the redescription mining results, as we show in our empirical evaluation.

INRIA a CCSD electronic archive server

Robust subgroup discovery

Authors: Bäck Thomas; Grünwald Peter; Proença Hugo Manuel; van Leeuwen Matthijs
Publication date: 28/11/2021

We introduce the problem of robust subgroup discovery, i.e., finding a set of interpretable descriptions of subsets that 1) stand out with respect to one or more target attributes, 2) are statistically robust, and 3) are non-redundant. Many attempts have been made to mine either locally robust subgroups or to tackle the pattern explosion, but we are the first to address both challenges at the same time from a global modelling perspective. First, we formulate the broad model class of subgroup lists, i.e., ordered sets of subgroups, for univariate and multivariate targets that can consist of nominal or numeric variables, and that includes traditional top-1 subgroup discovery in its definition. This novel model class allows us to formalise the problem of optimal robust subgroup discovery using the Minimum Description Length (MDL) principle, where we resort to optimal Normalised Maximum Likelihood and Bayesian encodings for nominal and numeric targets, respectively. Second, as finding optimal subgroup lists is NP-hard, we propose SSD++, a greedy heuristic that finds good subgroup lists and guarantees that the most significant subgroup found according to the MDL criterion is added in each iteration, which is shown to be equivalent to a Bayesian one-sample proportions, multinomial, or t-test between the subgroup and dataset marginal target distributions plus a multiple hypothesis testing penalty. We empirically show on 54 datasets that SSD++ outperforms previous subgroup set discovery methods in terms of quality and subgroup list size. Comment: For associated code, see https://github.com/HMProenca/RuleList; submitted to Data Mining and Knowledge Discovery Journal

arXiv.org e-Print Archive

CWI's Institutional Repository

Leiden University Scholary Publications

The Minimum Description Length Principle for Pattern Mining: A Survey

Author: Galbrun Esther
Publication date: 28/07/2021

This is about the Minimum Description Length (MDL) principle applied to pattern mining. The length of this description is kept to the minimum. Mining patterns is a core task in data analysis and, beyond issues of efficient enumeration, the selection of patterns constitutes a major challenge. The MDL principle, a model selection method grounded in information theory, has been applied to pattern mining with the aim of obtaining compact, high-quality sets of patterns. After giving an outline of relevant concepts from information theory and coding, as well as of work on the theory behind the MDL and similar principles, we review MDL-based methods for mining various types of data and patterns. Finally, we open a discussion on some issues regarding these methods, and highlight currently active related data analysis problems.

arXiv.org e-Print Archive