Direct mining of subjectively interesting relational patterns
Data is typically complex and relational, so the development of relational data mining methods is an increasingly active research topic. Recent work has produced new formalisations of patterns in relational data and a way to quantify their interestingness subjectively, taking into account the data analyst's prior beliefs about the data. Yet a scalable algorithm for finding the most interesting such patterns is lacking. We introduce a new algorithm based on two notions: (1) the use of Constraint Programming, which results in a notably shorter development time, faster runtimes, and more flexibility for extensions such as branch-and-bound search; and (2) the direct search for only the most interesting patterns, instead of exhaustively enumerating patterns before ranking them. Through empirical evaluation, we find that our novel bounds yield speedups of up to several orders of magnitude, especially on dense data with a simple schema. This makes it possible to mine the most subjectively interesting relational patterns in databases where this was previously impractical or impossible.
Discover, recycle and reuse frequent patterns in association rule mining
Ph.D., Doctor of Philosophy
RDD-Eclat: Approaches to Parallelize Eclat Algorithm on Spark RDD Framework
A number of frequent itemset mining (FIM) algorithms were initially designed on
Hadoop MapReduce, a distributed big data processing framework. However, due to
heavy disk I/O, MapReduce is inefficient for such highly iterative algorithms.
Spark, a more efficient distributed data processing framework, was therefore
developed with in-memory computation and resilient distributed dataset (RDD)
features to support iterative algorithms. Apriori- and FP-Growth-based FIM
algorithms have been designed on the Spark RDD framework, but an Eclat-based
algorithm has not yet been explored. In this paper, RDD-Eclat, a parallel Eclat
algorithm on the Spark RDD framework, is proposed along with its five variants.
The proposed algorithms are evaluated on various benchmark datasets; the
results show that RDD-Eclat outperforms the Spark-based Apriori by many times.
The experimental results also show that the proposed algorithms scale as the
number of cores and the size of the dataset increase.
Comment: 16 pages, 6 figures, ICCNCT 201
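For readers unfamiliar with Eclat, the core idea it is generally known for can be sketched on a single machine: store the data vertically (item to set of transaction ids) and grow itemsets depth-first by intersecting those sets. The sketch below is only an illustration under that assumption. It is not RDD-Eclat or any of its five variants, it does not use Spark, and the function name `eclat` and its parameters are hypothetical.

```python
from collections import defaultdict


def eclat(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset.

    Illustrative single-machine Eclat; `min_support` is an absolute count.
    """
    # Vertical layout: map each item to the ids of the transactions containing it.
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # `candidates` holds (item, tidset) pairs; each tidset is already
        # restricted to transactions that contain every item in `prefix`.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Intersect with the tidsets of later items to build longer itemsets.
            suffix = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_support:
                    suffix.append((other, shared))
            if suffix:
                extend(itemset, suffix)

    # Start from the frequent single items, in a fixed order.
    initial = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), initial)
    return frequent
```

Calling `eclat(list_of_item_sets, 3)` returns each itemset that occurs in at least three transactions together with its support count. A distributed version would have to decide how to partition the item prefixes across workers, which this sketch does not attempt.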