116,240 research outputs found

    Prime-based method for interactive mining of frequent patterns

    Get PDF
    Over the past decade, an increasing number of efficient mining algorithms have been proposed to mine the frequent patterns by satisfying a user specified threshold called minimum support (minsup). However, determining an appropriate value for minsup to find proper frequent patterns in different applications is extremely difficult. Since rerunning the mining algorithms from scratch can be very time consuming, researchers have introduced interactive mining to find proper patterns by using the current mining model with various minsup. Thus far, a few efficient interactive mining algorithms have been proposed. However, their runtime do not fulfill the need of short runtime in real time applications especially where data is sparse and proper frequent patterns are mined with very low values of minsup. As response to the above-mentioned challenges, this study is devoted towards developing an interactive mining method based on prime number and its special characteristic “uniqueness” by which the content of the relevant data is transformed into a compact layout. At first, a general architecture for interactive mining is proposed consisting of two isolated components: mining model and mining process. Then, the proposed method is developed based on the architecture such that the mining model is constructed once, and it can be frequently mined by various minsup. In the mining model construction, the content of relevant data is captured by a novel tree structure called PC-tree with one database scan and mining materials are consequently formed. The PC-tree is a well-organized tree structure, which is systematically built based on descendant making introduced in this study. Moreover, this study introduces a mining algorithm called PC-miner to mine the mining model frequently with various values of minsup. It grows an effective candidate head set introduced in this study starting from the longest candidate patterns by using the Apriori principle. Meanwhile, during the growing of the candidate head set in each round, the longest candidate patterns are used to find maximal frequent patterns from which the frequent patterns can be derived. Moreover, the PC-miner reduces the number of candidate patterns and comparisons by using several pruning techniques. A comprehensive experimental analysis is conducted by several experiments and scenarios to evaluate the correctness and effectiveness of the proposed method especially for interactive mining. The experimental results verify that the proposed method constructs the mining model independent of minsup once and this enable the model to be frequently mined. The results also show that the proposed method mines frequent patterns correctly and efficiently. Moreover, the results verify that the proposed method speeds up interactive mining of frequent patterns over both sparse and dense datasets with more scalable total runtime for very low values of minsup over sparse datasets as compared to results from the previous work

    Towards Interpretable Graph Modeling with Vertex Replacement Grammars

    Full text link
    An enormous amount of real-world data exists in the form of graphs. Oftentimes, interesting patterns that describe the complex dynamics of these graphs are captured in the form of frequently reoccurring substructures. Recent work at the intersection of formal language theory and graph theory has explored the use of graph grammars for graph modeling and pattern mining. However, existing formulations do not extract meaningful and easily interpretable patterns from the data. The present work addresses this limitation by extracting a special type of vertex replacement grammar, which we call a KT grammar, according to the Minimum Description Length (MDL) heuristic. In experiments on synthetic and real-world datasets, we show that KT-grammars can be efficiently extracted from a graph and that these grammars encode meaningful patterns that represent the dynamics of the real-world system.Comment: 10 pages, 9 figures, accepted at IEEE BigData 201

    Reductions for Frequency-Based Data Mining Problems

    Full text link
    Studying the computational complexity of problems is one of the - if not the - fundamental questions in computer science. Yet, surprisingly little is known about the computational complexity of many central problems in data mining. In this paper we study frequency-based problems and propose a new type of reduction that allows us to compare the complexities of the maximal frequent pattern mining problems in different domains (e.g. graphs or sequences). Our results extend those of Kimelfeld and Kolaitis [ACM TODS, 2014] to a broader range of data mining problems. Our results show that, by allowing constraints in the pattern space, the complexities of many maximal frequent pattern mining problems collapse. These problems include maximal frequent subgraphs in labelled graphs, maximal frequent itemsets, and maximal frequent subsequences with no repetitions. In addition to theoretical interest, our results might yield more efficient algorithms for the studied problems.Comment: This is an extended version of a paper of the same title to appear in the Proceedings of the 17th IEEE International Conference on Data Mining (ICDM'17

    Frequent Subgraph Mining in Outerplanar Graphs

    Get PDF
    In recent years there has been an increased interest in frequent pattern discovery in large databases of graph structured objects. While the frequent connected subgraph mining problem for tree datasets can be solved in incremental polynomial time, it becomes intractable for arbitrary graph databases. Existing approaches have therefore resorted to various heuristic strategies and restrictions of the search space, but have not identified a practically relevant tractable graph class beyond trees. In this paper, we define the class of so called tenuous outerplanar graphs, a strict generalization of trees, develop a frequent subgraph mining algorithm for tenuous outerplanar graphs that works in incremental polynomial time, and evaluate the algorithm empirically on the NCI molecular graph dataset

    Frequent Subgraph Mining in Outerplanar Graphs

    Get PDF
    In recent years there has been an increased interest in frequent pattern discovery in large databases of graph structured objects. While the frequent connected subgraph mining problem for tree datasets can be solved in incremental polynomial time, it becomes intractable for arbitrary graph databases. Existing approaches have therefore resorted to various heuristic strategies and restrictions of the search space, but have not identified a practically relevant tractable graph class beyond trees. In this paper, we define the class of so called tenuous outerplanar graphs, a strict generalization of trees, develop a frequent subgraph mining algorithm for tenuous outerplanar graphs that works in incremental polynomial time, and evaluate the algorithm empirically on the NCI molecular graph dataset

    Image mining: issues, frameworks and techniques

    Get PDF
    [Abstract]: Advances in image acquisition and storage technology have led to tremendous growth in significantly large and detailed image databases. These images, if analyzed, can reveal useful information to the human users. Image mining deals with the extraction of implicit knowledge, image data relationship, or other patterns not explicitly stored in the images. Image mining is more than just an extension of data mining to image domain. It is an interdisciplinary endeavor that draws upon expertise in computer vision, image processing, image retrieval, data mining, machine learning, database, and artificial intelligence. Despite the development of many applications and algorithms in the individual research fields cited above, research in image mining is still in its infancy. In this paper, we will examine the research issues in image mining, current developments in image mining, particularly, image mining frameworks, state-of-the-art techniques and systems. We will also identify some future research directions for image mining at the end of this paper
    corecore