132,410 research outputs found

    Frequent Pattern mining with closeness Considerations: Current State of the art

    Get PDF
    Due to rising importance in frequent pattern mining in the field of data mining research, tremendous progress has been observed in fields ranging from frequent itemset mining in transaction databases to numerous research frontiers. An elaborative note on current condition in frequent pattern mining and potential research directions is discussed in this article. It2019;s a strong belief that with considerably increasing research in frequent pattern mining in data analysis, it will provide a strong foundation for data mining methodologies and its applications which might prove a milestone in data mining applications in mere future

    An Efficient Algorithm for Frequent Pattern Mining for Real-Time Business Intelligence Analytics in Dense Datasets

    Get PDF
    Finding frequent patterns from databases has been the most time consuming process in data mining tasks, like association rule mining. Frequent pattern mining in real-time is of increasing thrust in many business applications such as e-commerce, recommender systems, and supply-chain management and group decision support systems, to name a few. A plethora of efficient algorithms have been proposed till date, among which, vertical mining algorithms have been found to be very effective, usually outperforming the horizontal ones. However, with dense datasets, the performances of these algorithms significantly degrade. Moreover, these algorithms are not suited to respond to the real-time need. In this paper, we describe BDFS(b)-diff-sets, an algorithm to perform real-time frequent pattern mining using diff-sets and limited computing resources. Empirical evaluations show that our algorithm can make a fair estimation of the probable frequent patterns and reaches some of the longest frequent patterns much faster than the existing algorithms.

    Mining Frequent Graph Patterns with Differential Privacy

    Full text link
    Discovering frequent graph patterns in a graph database offers valuable information in a variety of applications. However, if the graph dataset contains sensitive data of individuals such as mobile phone-call graphs and web-click graphs, releasing discovered frequent patterns may present a threat to the privacy of individuals. {\em Differential privacy} has recently emerged as the {\em de facto} standard for private data analysis due to its provable privacy guarantee. In this paper we propose the first differentially private algorithm for mining frequent graph patterns. We first show that previous techniques on differentially private discovery of frequent {\em itemsets} cannot apply in mining frequent graph patterns due to the inherent complexity of handling structural information in graphs. We then address this challenge by proposing a Markov Chain Monte Carlo (MCMC) sampling based algorithm. Unlike previous work on frequent itemset mining, our techniques do not rely on the output of a non-private mining algorithm. Instead, we observe that both frequent graph pattern mining and the guarantee of differential privacy can be unified into an MCMC sampling framework. In addition, we establish the privacy and utility guarantee of our algorithm and propose an efficient neighboring pattern counting technique as well. Experimental results show that the proposed algorithm is able to output frequent patterns with good precision

    PADS: A simple yet effective pattern-aware dynamic search method for fast maximal frequent pattern mining

    Full text link
    While frequent pattern mining is fundamental for many data mining tasks, mining maximal frequent patterns efficiently is important in both theory and applications of frequent pattern mining. The fundamental challenge is how to search a large space of item combinations. Most of the existing methods search an enumeration tree of item combinations in a depth-first manner. In this paper, we develop a new technique for more efficient max-pattern mining. Our method is pattern-aware: it uses the patterns already found to schedule its future search so that many search subspaces can be pruned. We present efficient techniques to implement the new approach. As indicated by a systematic empirical study using the benchmark data sets, our new approach outperforms the currently fastest max-pattern mining algorithms FPMax* and LCM2 clearly. The source code and the executable code (on both Windows and Linux platforms) are publicly available at http://www.cs.sfu.ca/~jpei/Software/PADS.zip. © Springer-Verlag London Limited 2008

    Discovering Exclusive Patterns in Frequent Sequences

    Get PDF
    This paper presents a new concept for pattern discovery in frequent sequences with potentially interesting applications. Based on data mining, the approach aims to discover exclusive sequential patterns (ESP) by checking the relative exclusion of patterns across data sequences. ESP mining pursues the post-processing of sequential patterns and augments existing work on structural relations patterns mining. A three phase ESP mining method is proposed together with component algorithms, where a running worked example explains the process. Experiments are performed on real-world and synthetic datasets which showcase the results of ESP mining and demonstrate its effectiveness, illuminating the theories developed. An outline case study in workflow modelling gives some insight into future applicability

    An algorithm for fast mining top-rank-k frequent patterns based on node-list data structure

    Get PDF
    Frequent pattern mining usually requires much run time and memory usage. In some applications, only the patterns with top frequency rank are needed. Because of the limited pattern numbers, quality of the results is even more important than time and memory consumption. A Frequent Pattern algorithm for mining Top-rank-K patterns, FP_TopK, is proposed. It is based on a Node-list data structure extracted from FTPP-tree. Each node is with one or more triple sets, which contain supports, preorder and post-order transversal orders for candidate pattern generation and top-rank-k frequent pattern mining. FP_TopK uses the minimal support threshold for pruning strategy to guarantee that each pattern in the top-rank-k table is really frequent and this further improves the efficiency. Experiments are conducted to compare FP_TopK with iNTK and BTK on four datasets. The results show that FP_TopK achieves better performance

    CS 7720: Data Mining

    Get PDF
    This course studies the fundamental concepts, issues, and techniques of data mining. Topics include basics of data, data preprocessing, feature selection/extraction, frequent pattern and association/correlation mining, classification, clustering, outlier analysis, OLAP/OLAM, contrast mining, applications, etc
    • 

    corecore