HybridMiner: Mining Maximal Frequent Itemsets Using Hybrid Database Representation Approach
In this paper we present a novel hybrid (array-based layout and vertical
bitmap layout) database representation approach for mining the complete set of
Maximal Frequent Itemsets (MFIs) on sparse and large datasets. Our work is novel
in terms of scalability, item search order, and two projection techniques, one
horizontal and one vertical. We also present a maximal algorithm that uses this
hybrid database representation approach. Experimental results on real and sparse
benchmark datasets show that our approach outperforms previous state-of-the-art
maximal algorithms.
Comment: 8 pages. In the proceedings of the 9th IEEE-INMIC 2005, Karachi, Pakistan,
200
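The abstract names a vertical bitmap layout without spelling out how it is used. As a rough, generic illustration (not HybridMiner itself), the Python sketch below stores each item as an integer bitmap over transaction ids, so the support of an itemset is the popcount of the AND of its items' bitmaps; a depth-first search enumerates the frequent itemsets and the maximal ones are kept at the end. The array-based half of the hybrid layout, the paper's item search order, and its projection techniques are all omitted, and the function names are hypothetical.

```python
def vertical_bitmaps(transactions):
    """Hypothetical helper: map each item to an int whose bit t is set iff transaction t contains it."""
    bitmaps = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support(bitmap):
    """Support = number of transactions covered = number of set bits."""
    return bin(bitmap).count("1")


def maximal_frequent_itemsets(transactions, min_support):
    """Generic DFS over a vertical bitmap layout; NOT the HybridMiner algorithm."""
    bitmaps = vertical_bitmaps(transactions)
    # Frequent single items, visited in a simple sorted order (an assumption).
    items = sorted(i for i, b in bitmaps.items() if support(b) >= min_support)
    frequent = []

    def extend(prefix, prefix_bitmap, candidates):
        for idx, item in enumerate(candidates):
            # ANDing bitmaps yields the transaction set of the extended itemset.
            new_bitmap = prefix_bitmap & bitmaps[item]
            if support(new_bitmap) >= min_support:
                new_prefix = prefix + (item,)
                frequent.append(frozenset(new_prefix))
                extend(new_prefix, new_bitmap, candidates[idx + 1:])

    all_tids = (1 << len(transactions)) - 1  # bitmap of every transaction
    extend((), all_tids, items)
    # Maximal: no frequent proper superset exists.
    return [s for s in frequent if not any(s < other for other in frequent)]
```

For example, with the five transactions {a, b, c}, {a, b}, {a, c}, {b, c} and {a, b, c, d} and a minimum support of 3, the maximal frequent itemsets are {a, b}, {a, c} and {b, c}. The final quadratic maximality check keeps the sketch short; practical MFI miners prune supersets during the search instead.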
Mining Representative Unsubstituted Graph Patterns Using Prior Similarity Matrix
One of the most powerful techniques to study protein structures is to look
for recurrent fragments (also called substructures or spatial motifs), then use
them as patterns to characterize the proteins under study. An emerging trend
consists of parsing proteins' three-dimensional (3D) structures into graphs of
amino acids. Hence, the search for recurrent spatial motifs is formulated as a
process of frequent subgraph discovery where each subgraph represents a spatial
motif. In this context, several efficient approaches for frequent subgraph
discovery have been proposed in the literature. However, the set of discovered
frequent subgraphs is too large to be efficiently analyzed and explored in any
further process. In this paper, we propose a novel pattern selection approach
that shrinks the large number of discovered frequent subgraphs by selecting the
representative ones. Existing pattern selection approaches do not exploit
domain knowledge. In contrast, our approach incorporates the evolutionary
information of amino acids defined in substitution matrices in order to
select the representative subgraphs. We show the effectiveness of our approach
on a number of real datasets. Our experimental results show that
our approach considerably decreases the number of motifs while
enhancing their interestingness.
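To make substitution-aware selection concrete, the sketch below treats each discovered pattern, as a simplifying assumption, as a tuple of amino-acid node labels over one shared graph topology, so two patterns can be compared position by position. A pattern is dropped when an already kept, more frequent pattern can substitute for it, meaning every aligned label pair scores at or above a threshold. The scores are hypothetical placeholders, not entries of a real substitution matrix such as BLOSUM62, and this greedy rule is an illustration rather than the paper's selection procedure.

```python
# Hypothetical pairwise scores for a few amino-acid pairs; NOT taken from a real
# substitution matrix. Higher means "more readily substituted".
PAIR_SCORES = {
    ("I", "L"): 2, ("I", "V"): 3, ("L", "V"): 1,
    ("D", "E"): 2, ("K", "R"): 2,
}
IDENTITY_SCORE = 4   # hypothetical score for identical residues
DEFAULT_SCORE = -1   # hypothetical score for pairs not listed above


def score(a, b):
    """Symmetric lookup in the toy score table."""
    if a == b:
        return IDENTITY_SCORE
    return PAIR_SCORES.get((a, b), PAIR_SCORES.get((b, a), DEFAULT_SCORE))


def substitutable(p, q, threshold):
    """True if q can stand in for p: same length and every aligned pair scores >= threshold."""
    return len(p) == len(q) and all(score(a, b) >= threshold for a, b in zip(p, q))


def select_representatives(pattern_freq, threshold=1):
    """Greedy sketch: visit patterns by decreasing frequency and keep one only if
    no already-kept representative can substitute for it."""
    representatives = []
    for pattern in sorted(pattern_freq, key=lambda p: (-pattern_freq[p], p)):
        if not any(substitutable(pattern, r, threshold) for r in representatives):
            representatives.append(pattern)
    return representatives
```

With frequencies (I, D, K): 10, (L, D, K): 7, (V, E, R): 5 and (A, D, K): 3 and a threshold of 1, only (I, D, K) and (A, D, K) are kept: the other two differ from (I, D, K) only by residues the toy table scores as similar.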
Applications of concurrent access patterns in web usage mining
This paper builds on earlier data mining and modelling research that proposed the discovery of novel structural relation patterns, applying that approach to web usage mining. The focus here is on concurrent access patterns (CAPs), with an overarching framework that sets out the methodology for post-processing web access patterns. Data pre-processing, pattern discovery and pattern analysis all proceed in association with access pattern mining, CAP mining and CAP modelling. Access patterns are pruned and selected as necessary, allowing further CAP mining and modelling to be pursued in the search for the most interesting concurrent access patterns. It is shown that higher-level CAPs can be modelled in a way that brings greater structure to the process of knowledge discovery. Experiments with real-world datasets highlight the applicability of the approach to web navigation.
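The idea of access patterns that occur concurrently can be illustrated with a small sketch. Assuming, for illustration only, that sessions are lists of page identifiers and that sequential access patterns have already been mined, the code counts the sessions in which two patterns both occur as ordered, gapped subsequences and keeps the pairs that reach a minimum support. It is a hypothetical simplification, not the CAP mining or modelling method of the paper, and all names are made up.

```python
from itertools import combinations


def contains_subsequence(session, pattern):
    """True if the pattern's pages appear in the session in order, gaps allowed."""
    it = iter(session)
    # `page in it` advances the iterator, so each match must follow the previous one.
    return all(page in it for page in pattern)


def concurrent_pairs(sessions, patterns, min_support):
    """Hypothetical sketch: map each pair of patterns to the number of sessions
    containing both, keeping only pairs whose count reaches min_support."""
    result = {}
    for p, q in combinations(patterns, 2):
        count = sum(
            1 for s in sessions
            if contains_subsequence(s, p) and contains_subsequence(s, q)
        )
        if count >= min_support:
            result[(p, q)] = count
    return result
```

For instance, with the sessions [home, news, sport, home, shop], [home, shop, cart], [news, sport, shop, cart] and [home, news, shop, cart], the patterns (home, shop) and (shop, cart) co-occur in two sessions, while every other pair of the three patterns co-occurs in only one.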
On mining complex sequential data by means of FCA and pattern structures
Nowadays, data sets come in very complex and heterogeneous forms.
Mining of such data collections is essential to support many real-world
applications ranging from healthcare to marketing. In this work, we focus on
the analysis of "complex" sequential data by means of interesting sequential
patterns. We approach the problem using the elegant mathematical framework of
Formal Concept Analysis (FCA) and its extension based on "pattern structures".
Pattern structures are used for mining complex data (such as sequences or
graphs) and are based on a subsumption operation, which in our case is defined
with respect to the partial order on sequences. We show how pattern structures
along with projections (i.e., a data reduction of sequential structures), are
able to enumerate more meaningful patterns and improve the computational
efficiency of the approach. Finally, we show the applicability of the presented
method for discovering and analyzing interesting patient patterns from a French
healthcare data set on cancer. The quantitative and qualitative results (with
annotations and analysis from a physician) are reported in this use case, which
is the main motivation for this work.
Keywords: data mining; formal concept analysis; pattern structures;
projections; sequences; sequential data.
Comment: Accepted for publication in the International Journal of General Systems.
The paper was written following the conference on Concept Lattices and
Their Applications (CLA'2013). 27 pages, 9 figures, 3 tables
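In pattern structures over sequences, the similarity (meet) of two object descriptions is commonly taken as the set of their maximal common subsequences with respect to the subsequence order. The brute-force sketch below computes this meet for short sequences of single items and adds a hypothetical projection that discards common subsequences shorter than a given length, one simple way a projection can shrink descriptions. It assumes items rather than itemsets and enumerates every subsequence, so it only suits toy inputs; it is not the authors' implementation, and the function names are hypothetical.

```python
from itertools import combinations


def subsequences(seq):
    """All non-empty order-preserving subsequences (gaps allowed) of a short sequence."""
    return {
        tuple(seq[i] for i in idx)
        for r in range(1, len(seq) + 1)
        for idx in combinations(range(len(seq)), r)
    }


def is_subsequence(small, big):
    """Partial order on sequences: small occurs in big in order, gaps allowed."""
    it = iter(big)
    return all(x in it for x in small)


def maximal(patterns):
    """Keep only patterns not strictly contained in another pattern of the set."""
    return {p for p in patterns
            if not any(p != q and is_subsequence(p, q) for q in patterns)}


def similarity(a, b):
    """Meet of two sequences: the set of their maximal common subsequences."""
    return maximal(subsequences(a) & subsequences(b))


def project(description, min_len):
    """Hypothetical projection: drop common subsequences shorter than min_len."""
    return {p for p in description if len(p) >= min_len}
```

For a = (A, B, C, A) and b = (A, C, A, B), the meet is {(A, C, A), (A, B)}; projecting it with min_len=3 keeps only (A, C, A), which illustrates how a projection trades detail for smaller descriptions.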