602 research outputs found

Formalising the subjective interestingness of a linear projection of a data set : two examples

Author: De Bie Tijl
Publication date: 01/01/2014

Explainable subgraphs with surprising densities : a subgroup discovery approach

Authors: De Bie Tijl, Deng Junning, Kang Bo, Lijffijt Jefrey
Publication date: 01/01/2019

The connectivity structure of graphs is typically related to the attributes of the nodes. In social networks, for example, the probability of a friendship between any pair of people depends on a range of attributes, such as their age, residence location, workplace, and hobbies. The high-level structure of a graph can thus possibly be described well by means of patterns of the form "the subgroup of all individuals with certain properties X are often (or rarely) friends with individuals in another subgroup defined by properties Y", in comparison to what is expected. Such rules present potentially actionable and generalizable insight into the graph. We present a method that finds node subgroup pairs between which the edge density is interestingly high or low, using an information-theoretic definition of interestingness. Additionally, the interestingness is quantified subjectively, to contrast with prior information an analyst may have about the connectivity. This view immediately enables iterative mining of such patterns. This is the first method aimed at graph connectivity relations between different subgroups. Our method generalizes prior work on dense subgraphs induced by a subgroup description. Although this setting has been studied already, we demonstrate for this special case considerable practical advantages of our subjective interestingness measure with respect to a wide range of (objective) interestingness measures.
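To make the notion of edge density between two attribute-defined subgroups concrete, the following is a minimal sketch. The toy network, the `hobby` attribute, and the function names are hypothetical, and the whole-graph density is used only as a naive stand-in for the analyst's expectation; it is not the information-theoretic interestingness measure the abstract describes.

```python
from itertools import product


def edge_density(edges, group_x, group_y):
    """Fraction of distinct X-Y node pairs joined by an edge (undirected graph)."""
    edge_set = {frozenset(edge) for edge in edges}
    # Unordered pairs with one endpoint in each group; self-pairs are skipped.
    pairs = {frozenset((u, v)) for u, v in product(group_x, group_y) if u != v}
    if not pairs:
        return 0.0
    return sum(1 for pair in pairs if pair in edge_set) / len(pairs)


def overall_density(nodes, edges):
    """Edge density of the whole graph, used here as a naive expectation."""
    n = len(nodes)
    possible = n * (n - 1) // 2
    return len({frozenset(edge) for edge in edges}) / possible if possible else 0.0


# Hypothetical toy network: each node carries a single "hobby" attribute.
hobby = {"a": "chess", "b": "chess", "c": "chess",
         "d": "golf", "e": "golf", "f": "golf"}
edges = [("a", "d"), ("a", "e"), ("b", "d"), ("b", "e"), ("c", "f"), ("a", "b")]

# Subgroups are defined by a description over node attributes.
chess = {node for node, h in hobby.items() if h == "chess"}
golf = {node for node, h in hobby.items() if h == "golf"}

between = edge_density(edges, chess, golf)
baseline = overall_density(hobby, edges)
print(f"chess-golf density: {between:.3f}, overall density: {baseline:.3f}")
```

In this toy graph, 5 of the 9 chess-golf pairs are connected, against 6 of 15 pairs overall, so the subgroup pair is denser than the naive baseline suggests. A subjective measure would replace that baseline with a model of what the analyst already believes.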

arXiv.org e-Print Archive

Crossref

Ghent University Academic Bibliography

Subjectively interesting connecting trees

Authors: Adriaens Florian, De Bie Tijl, Lijffijt Jefrey
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

Crossref

Ghent University Academic Bibliography

SIDE : a web app for interactive visual data exploration with subjective feedback

Authors: De Bie Tijl, Kang Bo, Lijffijt Jefrey, Puolamäki Kai
Publication date: 01/01/2016

Ghent University Academic Bibliography

Learning what matters - Sampling interesting patterns

Authors: M Bhuiyan, M Boley, M Leeuwen van, S Chakraborty, S Shalev-Shwartz, T Calders, V Dzyuba
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

In the field of exploratory data mining, local structure in data can be described by patterns and discovered by mining algorithms. Although many solutions have been proposed to address the redundancy problems in pattern mining, most of them either provide succinct pattern sets or take the interests of the user into account, but not both. Consequently, the analyst has to invest substantial effort in identifying those patterns that are relevant to her specific interests and goals. To address this problem, we propose a novel approach that combines pattern sampling with interactive data mining. In particular, we introduce the LetSIP algorithm, which builds upon recent advances in 1) weighted sampling in SAT and 2) learning to rank in interactive pattern mining. Specifically, it exploits user feedback to directly learn the parameters of the sampling distribution that represents the user's interests. We compare the performance of the proposed algorithm to the state of the art in interactive pattern mining by emulating the interests of a user. The resulting system allows efficient and interleaved learning and sampling, and thus user-specific, anytime data exploration. Finally, LetSIP demonstrates favourable trade-offs concerning both quality-diversity and exploitation-exploration when compared to existing methods.

Comment: PAKDD 2017, extended version

arXiv.org e-Print Archive

Crossref

Leiden University Scholary Publications

Informative Data Projections: A Framework and Two Examples

Authors: De Bie Tijl, Kang Bo, Lijffijt Jefrey, Santos-Rodriguez Raul
Publication date: 01/01/2015

Methods for Projection Pursuit aim to facilitate the visual exploration of high-dimensional data by identifying interesting low-dimensional projections. A major challenge is the design of a suitable quality metric of projections, commonly referred to as the projection index, to be maximized by the Projection Pursuit algorithm. In this paper, we introduce a new information-theoretic strategy for tackling this problem, based on quantifying the amount of information the projection conveys to a user given their prior beliefs about the data. The resulting projection index is a subjective quantity, explicitly dependent on the intended user. As a useful illustration, we developed this idea for two particular kinds of prior beliefs. The first kind leads to PCA (Principal Component Analysis), shedding new light on when PCA is (not) appropriate. The second kind leads to a novel projection index, the maximization of which can be regarded as a robust variant of PCA. We show how this projection index, though non-convex, can be effectively maximized using a modified power method as well as using a semidefinite programming relaxation. The usefulness of this new projection index is demonstrated in comparative empirical experiments against PCA and a popular Projection Pursuit method.
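For readers unfamiliar with the power method mentioned in the abstract, here is a minimal sketch of the standard, unmodified power iteration applied to the PCA case, where the projection index is the variance of the projected data. It is not the modified power method of the paper, whose details the abstract does not give. The toy data and the function names are assumptions made for illustration.

```python
import math


def covariance(data):
    """Population covariance matrix of a list of equal-length numeric rows."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    return [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
            for i in range(d)]


def power_iteration(matrix, iterations=200):
    """Approximate the leading eigenvector of a symmetric PSD matrix."""
    d = len(matrix)
    w = [1.0 / math.sqrt(d)] * d  # unit-norm starting direction
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            break
        w = [x / norm for x in v]  # renormalize to keep w on the unit sphere
    return w


def variance_along(matrix, w):
    """Variance of the data projected onto unit direction w (w^T C w)."""
    d = len(matrix)
    return sum(w[i] * matrix[i][j] * w[j] for i in range(d) for j in range(d))


# Hypothetical 2-D data lying close to the diagonal.
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
cov = covariance(data)
w = power_iteration(cov)
var_along_w = variance_along(cov, w)
print("direction:", [round(x, 3) for x in w], "variance:", round(var_along_w, 3))
```

The direction found maximizes projected variance, which is the PCA index. A different projection index, such as the robust one the abstract proposes, would change the quantity being maximized and generally requires a different update rule.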

arXiv.org e-Print Archive

Ghent University Academic Bibliography

Explore Bristol Research

Subjectively Interesting Subgroup Discovery on Real-valued Targets

Authors: De Bie Tijl, Duivesteijn Wouter, Kang Bo, Lijffijt Jefrey, Oikarinen Emilia, Puolamäki Kai
Publication date: 01/01/2017

Deriving insights from high-dimensional data is one of the core problems in data mining. The difficulty mainly stems from the fact that there are exponentially many variable combinations to potentially consider, and infinitely many if we consider weighted combinations, even when restricted to linear ones. Hence, an obvious question is whether we can automate the search for interesting patterns and visualizations. In this paper, we consider the setting where a user wants to learn as efficiently as possible about real-valued attributes. For example, the user may want to understand the distribution of crime rates in different geographic areas in terms of other (numerical, ordinal and/or categorical) variables that describe the areas. We introduce a method to find subgroups in the data that are maximally informative (in the formal information-theoretic sense) with respect to a single real-valued target attribute or a set of them. The subgroup descriptions are in terms of a succinct set of arbitrarily-typed other attributes. The approach is based on the Subjective Interestingness framework FORSIED to enable the use of prior knowledge when finding the most informative non-redundant patterns, and hence the method also supports iterative data mining.

Comment: 12 pages, 10 figures, 2 tables, conference submission
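As a rough illustration of subgroup discovery on a real-valued target, the sketch below enumerates single-condition subgroups and ranks them by how far their mean target value lies from the overall mean, weighted by the square root of the subgroup size. This simple mean-shift score is an assumed stand-in, not the FORSIED-based informativeness measure of the abstract. The records, attribute names, and `rank_subgroups` are hypothetical.

```python
import math
from statistics import mean


def rank_subgroups(records, target, attributes):
    """Rank 'attribute == value' subgroups by a simple mean-shift score."""
    overall = mean(r[target] for r in records)
    scored = []
    for attribute in attributes:
        for value in sorted({r[attribute] for r in records}, key=str):
            members = [r[target] for r in records if r[attribute] == value]
            shift = mean(members) - overall
            # Larger subgroups with larger shifts score higher.
            scored.append((attribute, value, abs(shift) * math.sqrt(len(members))))
    return sorted(scored, key=lambda item: item[2], reverse=True)


# Hypothetical areas described by two attributes and a real-valued target.
records = [
    {"region": "north", "urban": True, "crime_rate": 8.0},
    {"region": "north", "urban": True, "crime_rate": 9.0},
    {"region": "north", "urban": False, "crime_rate": 3.0},
    {"region": "south", "urban": True, "crime_rate": 7.5},
    {"region": "south", "urban": False, "crime_rate": 2.5},
    {"region": "south", "urban": False, "crime_rate": 3.0},
    {"region": "south", "urban": False, "crime_rate": 4.0},
]

ranked = rank_subgroups(records, "crime_rate", ["region", "urban"])
for attribute, value, score in ranked:
    print(f"{attribute} == {value}: score {score:.2f}")
```

On this toy data the `urban == True` subgroup ranks first. An iterative, subjective approach would then update the analyst's background model with that pattern, so that later rankings favour subgroups not already explained by it.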

arXiv.org e-Print Archive

Repository TU/e

Crossref

Pure OAI Repository

Ghent University Academic Bibliography

Aaltodoc Publication Archive

A tool for subjective and interactive visual data exploration

Authors: C Ware, D Paurat, DJ Hand, J Lijffijt, T Bie De, T Ruotsalo, V Dzyuba
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

We present SIDE, a tool for Subjective and Interactive Visual Data Exploration, which lets users explore high-dimensional data via subjectively informative 2D data visualizations. Many existing visual analytics tools are either restricted to specific problems and domains, or they aim to find visualizations that align with the user’s beliefs about the data. In contrast, our generic tool computes data visualizations that are surprising given a user’s current understanding of the data. The user’s belief state is represented as a set of projection tiles. Hence, this user-awareness offers users an efficient way to interactively explore yet-unknown features of complex high-dimensional datasets.

Crossref

Ghent University Academic Bibliography