Search CORE

15,672 research outputs found

Learning what matters - Sampling interesting patterns

Author: M Bhuiyan
M Boley
M Leeuwen van
M Leeuwen van
S Chakraborty
S Shalev-Shwartz
T Calders
V Dzyuba
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017
Field of study

In the field of exploratory data mining, local structure in data can be described by patterns and discovered by mining algorithms. Although many solutions have been proposed to address the redundancy problems in pattern mining, most of them either provide succinct pattern sets or take the interests of the user into account-but not both. Consequently, the analyst has to invest substantial effort in identifying those patterns that are relevant to her specific interests and goals. To address this problem, we propose a novel approach that combines pattern sampling with interactive data mining. In particular, we introduce the LetSIP algorithm, which builds upon recent advances in 1) weighted sampling in SAT and 2) learning to rank in interactive pattern mining. Specifically, it exploits user feedback to directly learn the parameters of the sampling distribution that represents the user's interests. We compare the performance of the proposed algorithm to the state-of-the-art in interactive pattern mining by emulating the interests of a user. The resulting system allows efficient and interleaved learning and sampling, thus user-specific anytime data exploration. Finally, LetSIP demonstrates favourable trade-offs concerning both quality-diversity and exploitation-exploration when compared to existing methods.Comment: PAKDD 2017, extended versio

arXiv.org e-Print Archive

Crossref

Leiden University Scholary Publications

Interactive Constrained Association Rule Mining

Author: Bussche Jan Van den
Goethals Bart
Publication venue
Publication date: 01/01/2003
Field of study

We investigate ways to support interactive mining sessions, in the setting of association rule mining. In such sessions, users specify conditions (queries) on the associations to be generated. Our approach is a combination of the integration of querying conditions inside the mining phase, and the incremental querying of already generated associations. We present several concrete algorithms and compare their performance.Comment: A preliminary report on this work was presented at the Second International Conference on Knowledge Discovery and Data Mining (DaWaK 2000

arXiv.org e-Print Archive

CiteSeerX

Flexible constrained sampling with guarantees for pattern mining

Author: A Giacometti
A Zimmermann
C Bucilă
CP Gomes
F Bonchi
Luc De Raedt
M Berlingerio
M Boley
MA Hasan
Matthijs van Leeuwen
S Ermon
S Nijssen
T Calders
T Guns
T Guns
Vladimir Dzyuba
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017
Field of study

Pattern sampling has been proposed as a potential solution to the infamous pattern explosion. Instead of enumerating all patterns that satisfy the constraints, individual patterns are sampled proportional to a given quality measure. Several sampling algorithms have been proposed, but each of them has its limitations when it comes to 1) flexibility in terms of quality measures and constraints that can be used, and/or 2) guarantees with respect to sampling accuracy. We therefore present Flexics, the first flexible pattern sampler that supports a broad class of quality measures and constraints, while providing strong guarantees regarding sampling accuracy. To achieve this, we leverage the perspective on pattern mining as a constraint satisfaction problem and build upon the latest advances in sampling solutions in SAT as well as existing pattern mining algorithms. Furthermore, the proposed algorithm is applicable to a variety of pattern languages, which allows us to introduce and tackle the novel task of sampling sets of patterns. We introduce and empirically evaluate two variants of Flexics: 1) a generic variant that addresses the well-known itemset sampling task and the novel pattern set sampling task as well as a wide range of expressive constraints within these tasks, and 2) a specialized variant that exploits existing frequent itemset techniques to achieve substantial speed-ups. Experiments show that Flexics is both accurate and efficient, making it a useful tool for pattern-based data exploration.Comment: Accepted for publication in Data Mining & Knowledge Discovery journal (ECML/PKDD 2017 journal track

arXiv.org e-Print Archive

Crossref

Leiden University Scholary Publications

Constraining the Search Space in Temporal Pattern Mining

Author: Herzog Otthein
Lattner Andreas D.
Publication venue
Publication date: 28/04/2011
Field of study

Agents in dynamic environments have to deal with complex situations including various temporal interrelations of actions and events. Discovering frequent patterns in such scenes can be useful in order to create prediction rules which can be used to predict future activities or situations. We present the algorithm MiTemP which learns frequent patterns based on a time intervalbased relational representation. Additionally the problem has also been transfered to a pure relational association rule mining task which can be handled by WARMR. The two approaches are compared in a number of experiments. The experiments show the advantage of avoiding the creation of impossible or redundant patterns with MiTemP. While less patterns have to be explored on average with MiTemP more frequent patterns are found at an earlier refinement level

University of Hildesheim