Search CORE

1,285 research outputs found

A New Data Layout For Set Intersection on GPUs

Author: Amossen Rasmus Resen
Pagh Rasmus
Publication venue
Publication date: 01/01/2011
Field of study

Set intersection is the core in a variety of problems, e.g. frequent itemset mining and sparse boolean matrix multiplication. It is well-known that large speed gains can, for some computational problems, be obtained by using a graphics processing unit (GPU) as a massively parallel computing device. However, GPUs require highly regular control flow and memory access patterns, and for this reason previous GPU methods for intersecting sets have used a simple bitmap representation. This representation requires excessive space on sparse data sets. In this paper we present a novel data layout, "BatMap", that is particularly well suited for parallel processing, and is compact even for sparse data. Frequent itemset mining is one of the most important applications of set intersection. As a case-study on the potential of BatMaps we focus on frequent pair mining, which is a core special case of frequent itemset mining. The main finding is that our method is able to achieve speedups over both Apriori and FP-growth when the number of distinct items is large, and the density of the problem instance is above 1%. Previous implementations of frequent itemset mining on GPU have not been able to show speedups over the best single-threaded implementations.Comment: A version of this paper appears in Proceedings of IPDPS 201

arXiv.org e-Print Archive

CiteSeerX

Crossref

The IT University of Copenhagen's Repository

Evolving temporal association rules with genetic algorithms

Author: A. Ghandar
C.-Y. Chang
F. Herrera
J. Alcala-Fdez
J.H. Holland
K.A. Jong De
P.-N Tan
R. Agrawal
R. Agrawal
S. Laxman
X. Yan
Y. Li
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010
Field of study

A novel framework for mining temporal association rules by discovering itemsets with a genetic algorithm is introduced. Metaheuristics have been applied to association rule mining, we show the efficacy of extending this to another variant - temporal association rule mining. Our framework is an enhancement to existing temporal association rule mining methods as it employs a genetic algorithm to simultaneously search the rule space and temporal space. A methodology for validating the ability of the proposed framework isolates target temporal itemsets in synthetic datasets. The Iterative Rule Learning method successfully discovers these targets in datasets with varying levels of difficulty

CiteSeerX

Crossref

Sheffield Hallam University Research Archive

Open Repository and Bibliography - Liège

Explore Bristol Research

A log mining approach for process monitoring in SCADA

Author: Bolzoni Damiano
Hadziosmanovic Dina
Hartel Pieter
Publication venue: Springer
Publication date: 01/01/2012
Field of study

SCADA (Supervisory Control and Data Acquisition) systems are used for controlling and monitoring industrial processes. We propose a methodology to systematically identify potential process-related threats in SCADA. Process-related threats take place when an attacker gains user access rights and performs actions, which look legitimate, but which are intended to disrupt the SCADA process. To detect such threats, we propose a semi-automated approach of log processing. We conduct experiments on a real-life water treatment facility. A preliminary case study suggests that our approach is effective in detecting anomalous events that might alter the regular process workflow

Springer - Publisher Connector

University of Twente Research Information

Using Answer Set Programming for pattern mining

Author: Guyet Thomas
Moinard Yves
Quiniou René
Publication venue
Publication date: 11/06/2014
Field of study

Serial pattern mining consists in extracting the frequent sequential patterns from a unique sequence of itemsets. This paper explores the ability of a declarative language, such as Answer Set Programming (ASP), to solve this issue efficiently. We propose several ASP implementations of the frequent sequential pattern mining task: a non-incremental and an incremental resolution. The results show that the incremental resolution is more efficient than the non-incremental one, but both ASP programs are less efficient than dedicated algorithms. Nonetheless, this approach can be seen as a first step toward a generic framework for sequential pattern mining with constraints.Comment: Intelligence Artificielle Fondamentale (2014

arXiv.org e-Print Archive

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

Reductions for Frequency-Based Data Mining Problems

Author: Miettinen Pauli
Neumann Stefan
Publication venue
Publication date: 01/01/2017
Field of study

Studying the computational complexity of problems is one of the - if not the - fundamental questions in computer science. Yet, surprisingly little is known about the computational complexity of many central problems in data mining. In this paper we study frequency-based problems and propose a new type of reduction that allows us to compare the complexities of the maximal frequent pattern mining problems in different domains (e.g. graphs or sequences). Our results extend those of Kimelfeld and Kolaitis [ACM TODS, 2014] to a broader range of data mining problems. Our results show that, by allowing constraints in the pattern space, the complexities of many maximal frequent pattern mining problems collapse. These problems include maximal frequent subgraphs in labelled graphs, maximal frequent itemsets, and maximal frequent subsequences with no repetitions. In addition to theoretical interest, our results might yield more efficient algorithms for the studied problems.Comment: This is an extended version of a paper of the same title to appear in the Proceedings of the 17th IEEE International Conference on Data Mining (ICDM'17

arXiv.org e-Print Archive

Crossref

MPG.PuRe