
    Reductions for Frequency-Based Data Mining Problems

    Studying the computational complexity of problems is one of the fundamental questions in computer science, if not the fundamental one. Yet surprisingly little is known about the computational complexity of many central problems in data mining. In this paper we study frequency-based problems and propose a new type of reduction that allows us to compare the complexities of maximal frequent pattern mining problems in different domains (e.g. graphs or sequences). Our results extend those of Kimelfeld and Kolaitis [ACM TODS, 2014] to a broader range of data mining problems. They show that, by allowing constraints in the pattern space, the complexities of many maximal frequent pattern mining problems collapse. These problems include maximal frequent subgraphs in labelled graphs, maximal frequent itemsets, and maximal frequent subsequences with no repetitions. In addition to being of theoretical interest, our results might yield more efficient algorithms for the studied problems. Comment: This is an extended version of a paper of the same title to appear in the Proceedings of the 17th IEEE International Conference on Data Mining (ICDM'17).

    Analysis and Implementation of an SQL-Based Frequent Pattern Mining Algorithm with Frequent Pattern Growth (FP-Growth)

    Abstract: Scalable data mining in large databases is one of today's real challenges for database research. The integration of data mining with database systems is an essential component of any successful large-scale data mining application. A fundamental component of data mining tasks is finding frequent patterns in a given dataset. Most previous studies adopt an Apriori-like approach based on candidate set generation and testing. However, candidate set generation is still costly, especially when the database is large. This final project implements SQL-based frequent pattern mining with a novel frequent pattern growth (FP-growth) method, which is efficient and scalable for mining frequent patterns without candidate generation, and presents experimental results. Keywords: data mining, association rule, SQL-based frequent pattern mining

    Algorithms for Extracting Frequent Episodes in the Process of Temporal Data Mining

    An important aspect of the data mining process is the discovery of patterns that have a great influence on the studied problem. The purpose of this paper is to study frequent episode mining through the use of parallel pattern discovery algorithms. Parallel pattern discovery algorithms offer better performance and scalability, so they are of great interest to the data mining research community. The paper highlights several parallel and distributed frequent pattern mining algorithms on various platforms and presents a comparative study of their main features. The study takes into account the new possibilities that arise with the Compute Unified Device Architecture (CUDA) of the latest generation of graphics processing units. Given their high performance, low cost and growing number of features, GPUs are viable solutions for an optimal implementation of frequent pattern mining algorithms. Keywords: Frequent Pattern Mining, Parallel Computing, Dynamic Load Balancing, Temporal Data Mining, CUDA, GPU, Fermi, Thread

    Reframing in Frequent Pattern Mining


    Finding Temporal Patterns in Noisy Longitudinal Data: A Study in Diabetic Retinopathy

    This paper describes an approach to temporal pattern mining that uses user-defined temporal prototypes to define the nature of the trends of interest. The temporal patterns are defined in terms of sequences of support values associated with identified frequent patterns. The prototypes are defined mathematically so that they can be mapped onto the temporal patterns. The focus of the advocated temporal pattern mining process is a large longitudinal patient database collected as part of a diabetic retinopathy screening programme. The data set is also of interest in itself, as it is very noisy (in common with other similar medical datasets) and does not feature a clear association between specific time stamps and subsets of the data. The diabetic retinopathy application, the data warehousing and cleaning process, and the frequent pattern mining procedure (together with the application of the prototype concept) are all described in the paper. An evaluation of the frequent pattern mining process is also presented.