91,310 research outputs found
PRIIME: A generic framework for interactive personalized interesting pattern discovery
The traditional frequent pattern mining algorithms generate an exponentially large number of patterns of which a substantial proportion are not much significant for many data analysis endeavors. Discovery of a small number of personalized interesting patterns from the large output set according to a particular user's interest is an important as well as challenging task. Existing works on pattern summarization do not solve this problem from the personalization viewpoint. In this work, we propose an interactive pattern discovery framework named PRIIME which identifies a set of interesting patterns for a specific user without requiring any prior input on the interestingness measure of patterns from the user. The proposed framework is generic to support discovery of the interesting set, sequence and graph type patterns. We develop a softmax classification based iterative learning algorithm that uses a limited number of interactive feedback from the user to learn her interestingness profile, and use this profile for pattern recommendation. To handle sequence and graph type patterns PRIIME adopts a neural net (NN) based unsupervised feature construction approach. We also develop a strategy that combines exploration and exploitation to select patterns for feedback. We show experimental results on several real-life datasets to validate the performance of the proposed method. We also compare with the existing methods of interactive pattern discovery to show that our method is substantially superior in performance. To portray the applicability of the framework, we present a case study from the real-estate domain
Mining Frequent Graph Patterns with Differential Privacy
Discovering frequent graph patterns in a graph database offers valuable
information in a variety of applications. However, if the graph dataset
contains sensitive data of individuals such as mobile phone-call graphs and
web-click graphs, releasing discovered frequent patterns may present a threat
to the privacy of individuals. {\em Differential privacy} has recently emerged
as the {\em de facto} standard for private data analysis due to its provable
privacy guarantee. In this paper we propose the first differentially private
algorithm for mining frequent graph patterns.
We first show that previous techniques on differentially private discovery of
frequent {\em itemsets} cannot apply in mining frequent graph patterns due to
the inherent complexity of handling structural information in graphs. We then
address this challenge by proposing a Markov Chain Monte Carlo (MCMC) sampling
based algorithm. Unlike previous work on frequent itemset mining, our
techniques do not rely on the output of a non-private mining algorithm.
Instead, we observe that both frequent graph pattern mining and the guarantee
of differential privacy can be unified into an MCMC sampling framework. In
addition, we establish the privacy and utility guarantee of our algorithm and
propose an efficient neighboring pattern counting technique as well.
Experimental results show that the proposed algorithm is able to output
frequent patterns with good precision
Efficient Subgraph Matching on Billion Node Graphs
The ability to handle large scale graph data is crucial to an increasing
number of applications. Much work has been dedicated to supporting basic graph
operations such as subgraph matching, reachability, regular expression
matching, etc. In many cases, graph indices are employed to speed up query
processing. Typically, most indices require either super-linear indexing time
or super-linear indexing space. Unfortunately, for very large graphs,
super-linear approaches are almost always infeasible. In this paper, we study
the problem of subgraph matching on billion-node graphs. We present a novel
algorithm that supports efficient subgraph matching for graphs deployed on a
distributed memory store. Instead of relying on super-linear indices, we use
efficient graph exploration and massive parallel computing for query
processing. Our experimental results demonstrate the feasibility of performing
subgraph matching on web-scale graph data.Comment: VLDB201
Context-aware visual exploration of molecular databases
Facilitating the visual exploration of scientific data has
received increasing attention in the past decade or so. Especially
in life science related application areas the amount
of available data has grown at a breath taking pace. In this
paper we describe an approach that allows for visual inspection
of large collections of molecular compounds. In
contrast to classical visualizations of such spaces we incorporate
a specific focus of analysis, for example the outcome
of a biological experiment such as high throughout
screening results. The presented method uses this experimental
data to select molecular fragments of the underlying
molecules that have interesting properties and uses the
resulting space to generate a two dimensional map based
on a singular value decomposition algorithm and a self organizing
map. Experiments on real datasets show that
the resulting visual landscape groups molecules of similar
chemical properties in densely connected regions
- …