228,675 research outputs found

    Mining Frequent Graph Patterns with Differential Privacy

    Full text link
    Discovering frequent graph patterns in a graph database offers valuable information in a variety of applications. However, if the graph dataset contains sensitive data of individuals such as mobile phone-call graphs and web-click graphs, releasing discovered frequent patterns may present a threat to the privacy of individuals. {\em Differential privacy} has recently emerged as the {\em de facto} standard for private data analysis due to its provable privacy guarantee. In this paper we propose the first differentially private algorithm for mining frequent graph patterns. We first show that previous techniques on differentially private discovery of frequent {\em itemsets} cannot apply in mining frequent graph patterns due to the inherent complexity of handling structural information in graphs. We then address this challenge by proposing a Markov Chain Monte Carlo (MCMC) sampling based algorithm. Unlike previous work on frequent itemset mining, our techniques do not rely on the output of a non-private mining algorithm. Instead, we observe that both frequent graph pattern mining and the guarantee of differential privacy can be unified into an MCMC sampling framework. In addition, we establish the privacy and utility guarantee of our algorithm and propose an efficient neighboring pattern counting technique as well. Experimental results show that the proposed algorithm is able to output frequent patterns with good precision

    GTRACE-RS: Efficient Graph Sequence Mining using Reverse Search

    Full text link
    The mining of frequent subgraphs from labeled graph data has been studied extensively. Furthermore, much attention has recently been paid to frequent pattern mining from graph sequences. A method, called GTRACE, has been proposed to mine frequent patterns from graph sequences under the assumption that changes in graphs are gradual. Although GTRACE mines the frequent patterns efficiently, it still needs substantial computation time to mine the patterns from graph sequences containing large graphs and long sequences. In this paper, we propose a new version of GTRACE that enables efficient mining of frequent patterns based on the principle of a reverse search. The underlying concept of the reverse search is a general scheme for designing efficient algorithms for hard enumeration problems. Our performance study shows that the proposed method is efficient and scalable for mining both long and large graph sequence patterns and is several orders of magnitude faster than the original GTRACE

    Mining Frequent Neighborhood Patterns in Large Labeled Graphs

    Full text link
    Over the years, frequent subgraphs have been an important sort of targeted patterns in the pattern mining literatures, where most works deal with databases holding a number of graph transactions, e.g., chemical structures of compounds. These methods rely heavily on the downward-closure property (DCP) of the support measure to ensure an efficient pruning of the candidate patterns. When switching to the emerging scenario of single-graph databases such as Google Knowledge Graph and Facebook social graph, the traditional support measure turns out to be trivial (either 0 or 1). However, to the best of our knowledge, all attempts to redefine a single-graph support resulted in measures that either lose DCP, or are no longer semantically intuitive. This paper targets mining patterns in the single-graph setting. We resolve the "DCP-intuitiveness" dilemma by shifting the mining target from frequent subgraphs to frequent neighborhoods. A neighborhood is a specific topological pattern where a vertex is embedded, and the pattern is frequent if it is shared by a large portion (above a given threshold) of vertices. We show that the new patterns not only maintain DCP, but also have equally significant semantics as subgraph patterns. Experiments on real-life datasets display the feasibility of our algorithms on relatively large graphs, as well as the capability of mining interesting knowledge that is not discovered in prior works.Comment: 9 page

    GCG: Mining Maximal Complete Graph Patterns from Large Spatial Data

    Full text link
    Recent research on pattern discovery has progressed from mining frequent patterns and sequences to mining structured patterns, such as trees and graphs. Graphs as general data structure can model complex relations among data with wide applications in web exploration and social networks. However, the process of mining large graph patterns is a challenge due to the existence of large number of subgraphs. In this paper, we aim to mine only frequent complete graph patterns. A graph g in a database is complete if every pair of distinct vertices is connected by a unique edge. Grid Complete Graph (GCG) is a mining algorithm developed to explore interesting pruning techniques to extract maximal complete graphs from large spatial dataset existing in Sloan Digital Sky Survey (SDSS) data. Using a divide and conquer strategy, GCG shows high efficiency especially in the presence of large number of patterns. In this paper, we describe GCG that can mine not only simple co-location spatial patterns but also complex ones. To the best of our knowledge, this is the first algorithm used to exploit the extraction of maximal complete graphs in the process of mining complex co-location patterns in large spatial dataset.Comment: 1

    Compacting Frequent Star Patterns in RDF Graphs

    Get PDF
    Knowledge graphs have become a popular formalism for representing entities and their properties using a graph data model, e.g., the Resource Description Framework (RDF). An RDF graph comprises entities of the same type connected to objects or other entities using labeled edges annotated with properties. RDF graphs usually contain entities that share the same objects in a certain group of properties, i.e., they match star patterns composed of these properties and objects. In case the number of these entities or properties in these star patterns is large, the size of the RDF graph and query processing are negatively impacted; we refer these star patterns as frequent star patterns. We address the problem of identifying frequent star patterns in RDF graphs and devise the concept of factorized RDF graphs, which denote compact representations of RDF graphs where the number of frequent star patterns is minimized. We also develop computational methods to identify frequent star patterns and generate a factorized RDF graph, where compact RDF molecules replace frequent star patterns. A compact RDF molecule of a frequent star pattern denotes an RDF subgraph that instantiates the corresponding star pattern. Instead of having all the entities matching the original frequent star pattern, a surrogate entity is added and related to the properties of the frequent star pattern; it is linked to the entities that originally match the frequent star pattern. We evaluate the performance of our factorization techniques on several RDF graph benchmarks and compare with a baseline built on top of gSpan, a state-of-the-art algorithm to detect frequent patterns. The outcomes evidence the efficiency of proposed approach and show that our techniques are able to reduce execution time of the baseline approach in at least three orders of magnitude reducing the RDF graph size by up to 66.56%

    Compacting frequent star patterns in RDF graphs

    Get PDF
    Knowledge graphs have become a popular formalism for representing entities and their properties using a graph data model, e.g., the Resource Description Framework (RDF). An RDF graph comprises entities of the same type connected to objects or other entities using labeled edges annotated with properties. RDF graphs usually contain entities that share the same objects in a certain group of properties, i.e., they match star patterns composed of these properties and objects. In case the number of these entities or properties in these star patterns is large, the size of the RDF graph and query processing are negatively impacted; we refer these star patterns as frequent star patterns. We address the problem of identifying frequent star patterns in RDF graphs and devise the concept of factorized RDF graphs, which denote compact representations of RDF graphs where the number of frequent star patterns is minimized. We also develop computational methods to identify frequent star patterns and generate a factorized RDF graph, where compact RDF molecules replace frequent star patterns. A compact RDF molecule of a frequent star pattern denotes an RDF subgraph that instantiates the corresponding star pattern. Instead of having all the entities matching the original frequent star pattern, a surrogate entity is added and related to the properties of the frequent star pattern; it is linked to the entities that originally match the frequent star pattern. Since the edges between the entities and the objects in the frequent star pattern are replaced by edges between these entities and the surrogate entity of the compact RDF molecule, the size of the RDF graph is reduced. We evaluate the performance of our factorization techniques on several RDF graph benchmarks and compare with a baseline built on top gSpan, a state-of-the-art algorithm to detect frequent patterns. The outcomes evidence the efficiency of proposed approach and show that our techniques are able to reduce execution time of the baseline approach in at least three orders of magnitude. Additionally, RDF graph size can be reduced by up to 66.56% while data represented in the original RDF graph is preserved

    Ribofsm: Frequent Subgraph Mining For the Discovery of RNA Structures and Interactions

    Get PDF
    Frequent subgraph mining is a useful method for extracting meaningful patterns from a set of graphs or a single large graph. Here, the graph represents all possible RNA structures and interactions. Patterns that are significantly more frequent in this graph over a random graph are extracted. We hypothesize that these patterns are most likely to represent biological mechanisms. The graph representation used is a directed dual graph, extended to handle intermolecular interactions. The graph is sampled for subgraphs, which are labeled using a canonical labeling method and counted. The resulting patterns are compared to those created from a randomized dataset and scored. The algorithm was applied to the mitochondrial genome of the kinetoplastid species Trypanosoma brucei, which has a unique RNA editing mechanism. The most significant patterns contain two stem-loops, indicative of gRNA, and represent interactions of these structures with target mRNA

    Graph-based discovery of ontology change patterns

    Get PDF
    Ontologies can support a variety of purposes, ranging from capturing conceptual knowledge to the organisation of digital content and information. However, information systems are always subject to change and ontology change management can pose challenges. We investigate ontology change representation and discovery of change patterns. Ontology changes are formalised as graph-based change logs. We use attributed graphs, which are typed over a generic graph with node and edge attribution.We analyse ontology change logs, represented as graphs, and identify frequent change sequences. Such sequences are applied as a reference in order to discover reusable, often domain-specific and usagedriven change patterns. We describe the pattern discovery algorithms and measure their performance using experimental result
    • 

    corecore