1,285 research outputs found
A New Data Layout For Set Intersection on GPUs
Set intersection is the core in a variety of problems, e.g. frequent itemset
mining and sparse boolean matrix multiplication. It is well-known that large
speed gains can, for some computational problems, be obtained by using a
graphics processing unit (GPU) as a massively parallel computing device.
However, GPUs require highly regular control flow and memory access patterns,
and for this reason previous GPU methods for intersecting sets have used a
simple bitmap representation. This representation requires excessive space on
sparse data sets. In this paper we present a novel data layout, "BatMap", that
is particularly well suited for parallel processing, and is compact even for
sparse data.
Frequent itemset mining is one of the most important applications of set
intersection. As a case-study on the potential of BatMaps we focus on frequent
pair mining, which is a core special case of frequent itemset mining. The main
finding is that our method is able to achieve speedups over both Apriori and
FP-growth when the number of distinct items is large, and the density of the
problem instance is above 1%. Previous implementations of frequent itemset
mining on GPU have not been able to show speedups over the best single-threaded
implementations.Comment: A version of this paper appears in Proceedings of IPDPS 201
Evolving temporal association rules with genetic algorithms
A novel framework for mining temporal association rules by discovering itemsets with a genetic algorithm is introduced. Metaheuristics have been applied to association rule mining, we show the efficacy of extending this to another variant - temporal association rule mining. Our framework is an enhancement to existing temporal association rule mining methods as it employs a genetic algorithm to simultaneously search the rule space and temporal space. A methodology for validating the ability of the proposed framework isolates target temporal itemsets in synthetic datasets. The Iterative Rule Learning method successfully discovers these targets in datasets with varying levels of difficulty
A log mining approach for process monitoring in SCADA
SCADA (Supervisory Control and Data Acquisition) systems are used for controlling and monitoring industrial processes. We propose a methodology to systematically identify potential process-related threats in SCADA. Process-related threats take place when an attacker gains user access rights and performs actions, which look legitimate, but which are intended to disrupt the SCADA process. To detect such threats, we propose a semi-automated approach of log processing. We conduct experiments on a real-life water treatment facility. A preliminary case study suggests that our approach is effective in detecting anomalous events that might alter the regular process workflow
Using Answer Set Programming for pattern mining
Serial pattern mining consists in extracting the frequent sequential patterns
from a unique sequence of itemsets. This paper explores the ability of a
declarative language, such as Answer Set Programming (ASP), to solve this issue
efficiently. We propose several ASP implementations of the frequent sequential
pattern mining task: a non-incremental and an incremental resolution. The
results show that the incremental resolution is more efficient than the
non-incremental one, but both ASP programs are less efficient than dedicated
algorithms. Nonetheless, this approach can be seen as a first step toward a
generic framework for sequential pattern mining with constraints.Comment: Intelligence Artificielle Fondamentale (2014
Reductions for Frequency-Based Data Mining Problems
Studying the computational complexity of problems is one of the - if not the
- fundamental questions in computer science. Yet, surprisingly little is known
about the computational complexity of many central problems in data mining. In
this paper we study frequency-based problems and propose a new type of
reduction that allows us to compare the complexities of the maximal frequent
pattern mining problems in different domains (e.g. graphs or sequences). Our
results extend those of Kimelfeld and Kolaitis [ACM TODS, 2014] to a broader
range of data mining problems. Our results show that, by allowing constraints
in the pattern space, the complexities of many maximal frequent pattern mining
problems collapse. These problems include maximal frequent subgraphs in
labelled graphs, maximal frequent itemsets, and maximal frequent subsequences
with no repetitions. In addition to theoretical interest, our results might
yield more efficient algorithms for the studied problems.Comment: This is an extended version of a paper of the same title to appear in
the Proceedings of the 17th IEEE International Conference on Data Mining
(ICDM'17
- …