3,565 research outputs found
FS^3: A Sampling based method for top-k Frequent Subgraph Mining
Mining labeled subgraph is a popular research task in data mining because of
its potential application in many different scientific domains. All the
existing methods for this task explicitly or implicitly solve the subgraph
isomorphism task which is computationally expensive, so they suffer from the
lack of scalability problem when the graphs in the input database are large. In
this work, we propose FS^3, which is a sampling based method. It mines a small
collection of subgraphs that are most frequent in the probabilistic sense. FS^3
performs a Markov Chain Monte Carlo (MCMC) sampling over the space of a
fixed-size subgraphs such that the potentially frequent subgraphs are sampled
more often. Besides, FS^3 is equipped with an innovative queue manager. It
stores the sampled subgraph in a finite queue over the course of mining in such
a manner that the top-k positions in the queue contain the most frequent
subgraphs. Our experiments on database of large graphs show that FS^3 is
efficient, and it obtains subgraphs that are the most frequent amongst the
subgraphs of a given size
Mining Representative Unsubstituted Graph Patterns Using Prior Similarity Matrix
One of the most powerful techniques to study protein structures is to look
for recurrent fragments (also called substructures or spatial motifs), then use
them as patterns to characterize the proteins under study. An emergent trend
consists in parsing proteins three-dimensional (3D) structures into graphs of
amino acids. Hence, the search of recurrent spatial motifs is formulated as a
process of frequent subgraph discovery where each subgraph represents a spatial
motif. In this scope, several efficient approaches for frequent subgraph
discovery have been proposed in the literature. However, the set of discovered
frequent subgraphs is too large to be efficiently analyzed and explored in any
further process. In this paper, we propose a novel pattern selection approach
that shrinks the large number of discovered frequent subgraphs by selecting the
representative ones. Existing pattern selection approaches do not exploit the
domain knowledge. Yet, in our approach we incorporate the evolutionary
information of amino acids defined in the substitution matrices in order to
select the representative subgraphs. We show the effectiveness of our approach
on a number of real datasets. The results issued from our experiments show that
our approach is able to considerably decrease the number of motifs while
enhancing their interestingness
On mining complex sequential data by means of FCA and pattern structures
Nowadays data sets are available in very complex and heterogeneous ways.
Mining of such data collections is essential to support many real-world
applications ranging from healthcare to marketing. In this work, we focus on
the analysis of "complex" sequential data by means of interesting sequential
patterns. We approach the problem using the elegant mathematical framework of
Formal Concept Analysis (FCA) and its extension based on "pattern structures".
Pattern structures are used for mining complex data (such as sequences or
graphs) and are based on a subsumption operation, which in our case is defined
with respect to the partial order on sequences. We show how pattern structures
along with projections (i.e., a data reduction of sequential structures), are
able to enumerate more meaningful patterns and increase the computing
efficiency of the approach. Finally, we show the applicability of the presented
method for discovering and analyzing interesting patient patterns from a French
healthcare data set on cancer. The quantitative and qualitative results (with
annotations and analysis from a physician) are reported in this use case which
is the main motivation for this work.
Keywords: data mining; formal concept analysis; pattern structures;
projections; sequences; sequential data.Comment: An accepted publication in International Journal of General Systems.
The paper is created in the wake of the conference on Concept Lattice and
their Applications (CLA'2013). 27 pages, 9 figures, 3 table
- …