1 research outputs found
The Structurally Smoothed Graphlet Kernel
A commonly used paradigm for representing graphs is to use a vector that
contains normalized frequencies of occurrence of certain motifs or sub-graphs.
This vector representation can be used in a variety of applications, such as,
for computing similarity between graphs. The graphlet kernel of Shervashidze et
al. [32] uses induced sub-graphs of k nodes (christened as graphlets by Przulj
[28]) as motifs in the vector representation, and computes the kernel via a dot
product between these vectors. One can easily show that this is a valid kernel
between graphs. However, such a vector representation suffers from a few
drawbacks. As k becomes larger we encounter the sparsity problem; most higher
order graphlets will not occur in a given graph. This leads to diagonal
dominance, that is, a given graph is similar to itself but not to any other
graph in the dataset. On the other hand, since lower order graphlets tend to be
more numerous, using lower values of k does not provide enough discrimination
ability. We propose a smoothing technique to tackle the above problems. Our
method is based on a novel extension of Kneser-Ney and Pitman-Yor smoothing
techniques from natural language processing to graphs. We use the relationships
between lower order and higher order graphlets in order to derive our method.
Consequently, our smoothing algorithm not only respects the dependency between
sub-graphs but also tackles the diagonal dominance problem by distributing the
probability mass across graphlets. In our experiments, the smoothed graphlet
kernel outperforms graph kernels based on raw frequency counts