18,866 research outputs found

    HybridMiner: Mining Maximal Frequent Itemsets Using Hybrid Database Representation Approach

    Full text link
    In this paper we present a novel hybrid (arraybased layout and vertical bitmap layout) database representation approach for mining complete Maximal Frequent Itemset (MFI) on sparse and large datasets. Our work is novel in terms of scalability, item search order and two horizontal and vertical projection techniques. We also present a maximal algorithm using this hybrid database representation approach. Different experimental results on real and sparse benchmark datasets show that our approach is better than previous state of art maximal algorithms.Comment: 8 Pages In the proceedings of 9th IEEE-INMIC 2005, Karachi, Pakistan, 200

    Mining Representative Unsubstituted Graph Patterns Using Prior Similarity Matrix

    Full text link
    One of the most powerful techniques to study protein structures is to look for recurrent fragments (also called substructures or spatial motifs), then use them as patterns to characterize the proteins under study. An emergent trend consists in parsing proteins three-dimensional (3D) structures into graphs of amino acids. Hence, the search of recurrent spatial motifs is formulated as a process of frequent subgraph discovery where each subgraph represents a spatial motif. In this scope, several efficient approaches for frequent subgraph discovery have been proposed in the literature. However, the set of discovered frequent subgraphs is too large to be efficiently analyzed and explored in any further process. In this paper, we propose a novel pattern selection approach that shrinks the large number of discovered frequent subgraphs by selecting the representative ones. Existing pattern selection approaches do not exploit the domain knowledge. Yet, in our approach we incorporate the evolutionary information of amino acids defined in the substitution matrices in order to select the representative subgraphs. We show the effectiveness of our approach on a number of real datasets. The results issued from our experiments show that our approach is able to considerably decrease the number of motifs while enhancing their interestingness

    Applications of concurrent access patterns in web usage mining

    Get PDF
    This paper builds on the original data mining and modelling research which has proposed the discovery of novel structural relation patterns, applying the approach in web usage mining. The focus of attention here is on concurrent access patterns (CAP), where an overarching framework illuminates the methodology for web access patterns post-processing. Data pre-processing, pattern discovery and patterns analysis all proceed in association with access patterns mining, CAP mining and CAP modelling. Pruning and selection of access pat-terns takes place as necessary, allowing further CAP mining and modelling to be pursued in the search for the most interesting concurrent access patterns. It is shown that higher level CAPs can be modelled in a way which brings greater structure to bear on the process of knowledge discovery. Experiments with real-world datasets highlight the applicability of the approach in web navigation

    On mining complex sequential data by means of FCA and pattern structures

    Get PDF
    Nowadays data sets are available in very complex and heterogeneous ways. Mining of such data collections is essential to support many real-world applications ranging from healthcare to marketing. In this work, we focus on the analysis of "complex" sequential data by means of interesting sequential patterns. We approach the problem using the elegant mathematical framework of Formal Concept Analysis (FCA) and its extension based on "pattern structures". Pattern structures are used for mining complex data (such as sequences or graphs) and are based on a subsumption operation, which in our case is defined with respect to the partial order on sequences. We show how pattern structures along with projections (i.e., a data reduction of sequential structures), are able to enumerate more meaningful patterns and increase the computing efficiency of the approach. Finally, we show the applicability of the presented method for discovering and analyzing interesting patient patterns from a French healthcare data set on cancer. The quantitative and qualitative results (with annotations and analysis from a physician) are reported in this use case which is the main motivation for this work. Keywords: data mining; formal concept analysis; pattern structures; projections; sequences; sequential data.Comment: An accepted publication in International Journal of General Systems. The paper is created in the wake of the conference on Concept Lattice and their Applications (CLA'2013). 27 pages, 9 figures, 3 table
    • …
    corecore