Subjectively Interesting Subgroup Discovery on Real-valued Targets
Deriving insights from high-dimensional data is one of the core problems in
data mining. The difficulty stems mainly from the fact that there are
exponentially many variable combinations to consider, and infinitely many once
weighted combinations are allowed, even just linear ones. Hence, an obvious question is whether we can automate the search
for interesting patterns and visualizations. In this paper, we consider the
setting where a user wants to learn as efficiently as possible about
real-valued attributes. For example, to understand the distribution of crime
rates in different geographic areas in terms of other (numerical, ordinal
and/or categorical) variables that describe the areas. We introduce a method to
find subgroups in the data that are maximally informative (in the formal
Information Theoretic sense) with respect to a single or set of real-valued
target attributes. The subgroup descriptions are expressed in terms of a
succinct set of the other attributes, which may be of arbitrary type. The
approach builds on the Subjective Interestingness framework FORSIED to enable
the use of prior knowledge when finding the most informative, non-redundant
patterns; hence the method also supports iterative data mining.
Comment: 12 pages, 10 figures, 2 tables, conference submission
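As a toy illustration of the setting described above, the following sketch searches for a subgroup whose real-valued target deviates from the population. It uses a simple size-weighted mean-shift quality measure as a stand-in for the paper's information-theoretic score, and all data and attribute names (`urban`, `coastal`, `crime_rate`) are hypothetical:

```python
import random

# Toy dataset: each row is an "area" described by categorical attributes,
# with a real-valued target (a hypothetical crime rate).
random.seed(0)
data = [
    {"urban": random.choice([True, False]),
     "coastal": random.choice([True, False]),
     "crime_rate": random.gauss(5.0, 1.0)}
    for _ in range(200)
]
# Shift the target for urban areas so an informative subgroup exists.
for row in data:
    if row["urban"]:
        row["crime_rate"] += 2.0

def quality(subgroup, population):
    """Size-weighted mean shift: |S|/|D| * |mean_S - mean_D|.
    A simplified stand-in for a subjective-interestingness score."""
    if not subgroup:
        return 0.0
    mean_s = sum(r["crime_rate"] for r in subgroup) / len(subgroup)
    mean_d = sum(r["crime_rate"] for r in population) / len(population)
    return len(subgroup) / len(population) * abs(mean_s - mean_d)

# Exhaustive search over single-attribute subgroup descriptions.
best = None
for attr in ("urban", "coastal"):
    for value in (True, False):
        sub = [r for r in data if r[attr] == value]
        q = quality(sub, data)
        if best is None or q > best[0]:
            best = (q, attr, value)

print(f"best subgroup: {best[1]} == {best[2]} (quality {best[0]:.3f})")
```

A real implementation would search over conjunctions of conditions (including numeric intervals) and would score candidates against a background model encoding prior knowledge, rather than against the raw population mean.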
Maximum entropy modelling for assessing results on real-valued data
Statistical assessment of the results of data mining is increasingly recognised as a core task in the knowledge discovery process. It is of key importance in practice, as results that might seem interesting at first glance can often be explained by well-known basic properties of the data. In pattern mining, for instance, such trivial results can be so overwhelming in number that filtering them out is a necessity in order to identify the truly interesting patterns. In this paper, we propose an approach for assessing results on real-valued rectangular databases. More specifically, using our analytical model we are able to statistically assess whether or not a discovered structure may be the trivial result of the row and column marginal distributions in the database. Our main approach is to use the Maximum Entropy principle to fit a background model to the data while respecting its marginal distributions. To find these distributions, we employ an MDL-based histogram estimator, and we fit them in our model using efficient convex optimisation techniques. Subsequently, our model can be used to calculate probabilities directly, as well as to efficiently sample data with the purpose of assessing results by means of empirical hypothesis testing. Notably, our approach is efficient, parameter-free, and naturally deals with missing values. As such, it represents a well-founded alternative to swap randomisation.
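The empirical-testing step can be illustrated with a minimal sketch. Here a column-wise permutation surrogate stands in for the paper's maximum-entropy background model (it preserves column marginals exactly but, unlike the paper's model, ignores row marginals), and the data, statistic, and sample count are all hypothetical:

```python
import random
import statistics

random.seed(1)

# Toy real-valued "database": 50 rows x 4 columns.
n_rows, n_cols = 50, 4
data = [[random.gauss(0.0, 1.0) for _ in range(n_cols)]
        for _ in range(n_rows)]

# Statistic to assess: the mean of column 0 over the first 10 rows
# (e.g. the value of a discovered submatrix pattern).
def statistic(matrix):
    return statistics.mean(matrix[i][0] for i in range(10))

observed = statistic(data)

def sample_surrogate(matrix):
    """Sample from a simplified background model by permuting each
    column independently, preserving column marginal distributions."""
    cols = list(zip(*matrix))
    shuffled = [random.sample(col, len(col)) for col in cols]
    return [list(row) for row in zip(*shuffled)]

# Empirical hypothesis test: how often does a surrogate sample match
# or exceed the observed statistic?
n_samples = 1000
exceed = sum(1 for _ in range(n_samples)
             if statistic(sample_surrogate(data)) >= observed)
p_value = (exceed + 1) / (n_samples + 1)  # add-one correction
print(f"observed = {observed:.3f}, empirical p = {p_value:.3f}")
```

A small empirical p-value would indicate that the observed structure is unlikely under the background model, i.e. not explained by the marginals alone. The paper's model additionally supports direct probability calculations, which permutation surrogates do not.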