Mining Frequent Neighborhood Patterns in Large Labeled Graphs
Over the years, frequent subgraphs have been an important class of target
patterns in the pattern mining literature, where most work deals with
databases holding a number of graph transactions, e.g., chemical structures of
compounds. These methods rely heavily on the downward-closure property (DCP) of
the support measure to ensure an efficient pruning of the candidate patterns.
When switching to the emerging scenario of single-graph databases such as
Google Knowledge Graph and Facebook social graph, the traditional support
measure turns out to be trivial (either 0 or 1). However, to the best of our
knowledge, all attempts to redefine a single-graph support resulted in measures
that either lose DCP, or are no longer semantically intuitive.
This paper targets mining patterns in the single-graph setting. We resolve
the "DCP-intuitiveness" dilemma by shifting the mining target from frequent
subgraphs to frequent neighborhoods. A neighborhood is a specific topological
pattern where a vertex is embedded, and the pattern is frequent if it is shared
by a large portion (above a given threshold) of vertices. We show that the new
patterns not only maintain DCP, but also have semantics as significant as those of
subgraph patterns. Experiments on real-life datasets demonstrate the feasibility of
our algorithms on relatively large graphs, as well as the capability of mining
interesting knowledge that is not discovered in prior work.
Comment: 9 pages
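To make the neighborhood-support idea concrete, here is a minimal sketch, not the paper's algorithm. It assumes a deliberately simplified pattern: a center label plus a multiset of labels required among the vertex's direct neighbors (a depth-1 star). The function name `neighborhood_support` and the toy graph are illustrative assumptions; the abstract's neighborhood patterns are more general. Support is the fraction of all vertices that match the pattern, so enlarging the pattern can never raise it, which is the downward-closure property.

```python
from collections import Counter

def neighborhood_support(labels, adj, center_label, neighbor_labels):
    """Fraction of vertices that match a depth-1 star pattern (simplified assumption)."""
    required = Counter(neighbor_labels)
    matches = 0
    for v, label in labels.items():
        if label != center_label:
            continue
        # Count the labels available among v's direct neighbors.
        available = Counter(labels[u] for u in adj.get(v, ()))
        if all(available[lab] >= n for lab, n in required.items()):
            matches += 1
    return matches / len(labels) if labels else 0.0

# Hypothetical toy graph with undirected edges 1-3, 1-4, 2-3, 4-5.
labels = {1: "person", 2: "person", 3: "city", 4: "company", 5: "person"}
adj = {1: {3, 4}, 2: {3}, 3: {1, 2}, 4: {1, 5}, 5: {4}}

s_empty = neighborhood_support(labels, adj, "person", [])
s_city = neighborhood_support(labels, adj, "person", ["city"])
s_both = neighborhood_support(labels, adj, "person", ["city", "company"])
```

In this toy graph, three of the five vertices are persons, two of them have a city neighbor, and one has both a city and a company neighbor, so the support shrinks as the pattern grows.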
FS^3: A Sampling based method for top-k Frequent Subgraph Mining
Mining labeled subgraphs is a popular research task in data mining because of
its potential applications in many different scientific domains. All the
existing methods for this task explicitly or implicitly solve the subgraph
isomorphism problem, which is computationally expensive, so they scale
poorly when the graphs in the input database are large. In
this work, we propose FS^3, which is a sampling-based method. It mines a small
collection of subgraphs that are most frequent in the probabilistic sense. FS^3
performs Markov Chain Monte Carlo (MCMC) sampling over the space of
fixed-size subgraphs such that the potentially frequent subgraphs are sampled
more often. In addition, FS^3 is equipped with an innovative queue manager. It
stores the sampled subgraphs in a finite queue over the course of mining in such
a manner that the top-k positions in the queue contain the most frequent
subgraphs. Our experiments on databases of large graphs show that FS^3 is
efficient, and that it obtains subgraphs that are the most frequent amongst the
subgraphs of a given size.
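The general shape of MCMC sampling over fixed-size subgraphs can be sketched as below. This is an illustrative toy, not FS^3 itself: it walks over connected vertex subsets of one small graph, uses a symmetric swap proposal with a Metropolis acceptance step, and keeps a visit counter from which the top-k states are read. The `score` function (1 plus the number of internal edges) is a placeholder assumption standing in for whatever target distribution favors frequent subgraphs, and the visit counter is a stand-in for the queue manager described in the abstract. The sketch assumes the requested size is smaller than the number of vertices.

```python
import random
from collections import Counter
from itertools import combinations

def is_connected(nodes, adj):
    """Depth-first check that the induced subgraph on `nodes` is connected."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for u in adj[v]:
            if u in nodes and u not in seen:
                seen.add(u)
                stack.append(u)
    return seen == nodes

def score(nodes, adj):
    """Placeholder target weight (assumption): 1 + number of internal edges."""
    return 1 + sum(1 for a, b in combinations(nodes, 2) if b in adj[a])

def sample_subgraphs(adj, size, steps, top_k, seed=0):
    rng = random.Random(seed)
    vertices = sorted(adj)
    # Brute-force start state; acceptable only for a toy graph.
    state = next(frozenset(c) for c in combinations(vertices, size)
                 if is_connected(c, adj))
    visits = Counter()
    for _ in range(steps):
        # Symmetric proposal: swap one inside vertex for one outside vertex.
        out_v = rng.choice(sorted(state))
        in_v = rng.choice([v for v in vertices if v not in state])
        candidate = (state - {out_v}) | {in_v}
        # Disconnected candidates are rejected; otherwise apply the Metropolis rule.
        if is_connected(candidate, adj):
            ratio = score(candidate, adj) / score(state, adj)
            if rng.random() < min(1.0, ratio):
                state = candidate
        visits[state] += 1
    return visits.most_common(top_k)

# Hypothetical toy graph: a triangle 0-1-2 with a path 2-3-4 attached.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
top = sample_subgraphs(adj, size=3, steps=2000, top_k=2)
```

Because the proposal is symmetric, the chain visits each connected subset in proportion to its score in the long run, so higher-scoring subsets tend to rise to the top of the returned list.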
Mining Frequent Graph Patterns with Differential Privacy
Discovering frequent graph patterns in a graph database offers valuable
information in a variety of applications. However, if the graph dataset
contains sensitive data of individuals such as mobile phone-call graphs and
web-click graphs, releasing discovered frequent patterns may present a threat
to the privacy of individuals. Differential privacy has recently emerged
as the de facto standard for private data analysis due to its provable
privacy guarantee. In this paper we propose the first differentially private
algorithm for mining frequent graph patterns.
We first show that previous techniques on differentially private discovery of
frequent itemsets cannot be applied to mining frequent graph patterns due to
the inherent complexity of handling structural information in graphs. We then
address this challenge by proposing a Markov Chain Monte Carlo (MCMC) sampling
based algorithm. Unlike previous work on frequent itemset mining, our
techniques do not rely on the output of a non-private mining algorithm.
Instead, we observe that both frequent graph pattern mining and the guarantee
of differential privacy can be unified into an MCMC sampling framework. In
addition, we establish the privacy and utility guarantee of our algorithm and
propose an efficient neighboring pattern counting technique as well.
Experimental results show that the proposed algorithm is able to output
frequent patterns with good precision.
Reductions for Frequency-Based Data Mining Problems
Studying the computational complexity of problems is one of the fundamental
questions in computer science, if not the fundamental one. Yet, surprisingly little is known
about the computational complexity of many central problems in data mining. In
this paper we study frequency-based problems and propose a new type of
reduction that allows us to compare the complexities of the maximal frequent
pattern mining problems in different domains (e.g. graphs or sequences). Our
results extend those of Kimelfeld and Kolaitis [ACM TODS, 2014] to a broader
range of data mining problems. Our results show that, by allowing constraints
in the pattern space, the complexities of many maximal frequent pattern mining
problems collapse. These problems include maximal frequent subgraphs in
labelled graphs, maximal frequent itemsets, and maximal frequent subsequences
with no repetitions. In addition to theoretical interest, our results might
yield more efficient algorithms for the studied problems.
Comment: This is an extended version of a paper of the same title to appear in
the Proceedings of the 17th IEEE International Conference on Data Mining
(ICDM'17).
Frequent Subgraph Mining in Outerplanar Graphs
In recent years there has been an increased interest in frequent pattern discovery in large databases of graph-structured objects. While the frequent connected subgraph mining problem for tree datasets can be solved in incremental polynomial time, it becomes intractable for arbitrary graph databases. Existing approaches have therefore resorted to various heuristic strategies and restrictions of the search space, but have not identified a practically relevant tractable graph class beyond trees. In this paper, we define the class of so-called tenuous outerplanar graphs, a strict generalization of trees, develop a frequent subgraph mining algorithm for tenuous outerplanar graphs that works in incremental polynomial time, and evaluate the algorithm empirically on the NCI molecular graph dataset.