Given a semantic graph dataset, perhaps one lacking an explicit ontology, we wish first to identify its significant semantic structures and then to measure the extent of their significance. Casting a semantic graph dataset as an edge-labeled, directed graph, this task can be built on the ability to mine frequent labeled subgraphs in edge-labeled, directed graphs. We begin by considering the enumerative combinatorics of subgraph motif structures in edge-labeled, directed graphs. We identify frequent labeled, directed subgraph motif patterns, and measure the significance of each resulting motif by its information gain relative to the frequency expected from the empirical frequency distribution of its constituent link types, assuming independence. We illustrate the approach on a small test graph, and discuss results obtained for small linear motifs (link type bigrams and trigrams) in the Billion Triple Challenge triplestore.
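The significance measure for the simplest linear motif, the link type bigram, can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the function name `bigram_information_gain`, the toy triples, and the specific score (the base-2 logarithm of the observed bigram frequency over the product of the empirical frequencies of its two link types) are all assumptions made for the example; the exact normalization used in the paper may differ.

```python
from collections import Counter, defaultdict
from math import log2


def bigram_information_gain(triples):
    """Score link type bigrams in an edge-labeled, directed graph.

    `triples` is an iterable of (source, link_type, target) tuples.
    A bigram (p1, p2) is a directed path x -p1-> y -p2-> z.
    The score is log2(observed / expected), where `expected` assumes
    the two link types occur independently (an assumed formulation).
    """
    triples = list(triples)

    # Empirical frequency distribution of individual link types.
    label_counts = Counter(p for _, p, _ in triples)
    total_edges = sum(label_counts.values())

    # Index outgoing link types by source node to find 2-step paths.
    out_labels = defaultdict(list)
    for s, p, _ in triples:
        out_labels[s].append(p)

    # Count every occurrence of each link type bigram.
    bigram_counts = Counter()
    for _, p1, o in triples:
        for p2 in out_labels.get(o, ()):
            bigram_counts[(p1, p2)] += 1
    total_bigrams = sum(bigram_counts.values())

    scores = {}
    for (p1, p2), n in bigram_counts.items():
        observed = n / total_bigrams
        expected = (label_counts[p1] / total_edges) * (label_counts[p2] / total_edges)
        scores[(p1, p2)] = log2(observed / expected)
    return scores


# Hypothetical toy graph, not the test graph from the paper.
toy = [
    ("a", "knows", "b"),
    ("b", "knows", "c"),
    ("c", "worksAt", "d"),
    ("a", "worksAt", "d"),
]
print(bigram_information_gain(toy))
```

A positive score means the bigram occurs more often than the independence assumption predicts, and a negative score means it occurs less often. Trigrams would follow the same pattern with one further path extension and a three-way product for the expected frequency.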