Graph Kernels
We present a unified framework to study graph kernels, special cases of which include the random
walk (Gärtner et al., 2003; Borgwardt et al., 2005) and marginalized (Kashima et al., 2003, 2004;
Mahé et al., 2004) graph kernels. Through reduction to a Sylvester equation we improve the time
complexity of kernel computation between unlabeled graphs with n vertices from O(n^6) to O(n^3).
We find a spectral decomposition approach even more efficient when computing entire kernel matrices.
For labeled graphs we develop conjugate gradient and fixed-point methods that take O(dn^3)
time per iteration, where d is the size of the label set. By extending the necessary linear algebra to
Reproducing Kernel Hilbert Spaces (RKHS) we obtain the same result for d-dimensional edge kernels,
and O(n^4) in the infinite-dimensional case; on sparse graphs these algorithms only take O(n^2)
time per iteration in all cases. Experiments on graphs from bioinformatics and other application
domains show that these techniques can speed up computation of the kernel by an order of magnitude
or more. We also show that certain rational kernels (Cortes et al., 2002, 2003, 2004),
when specialized to graphs, reduce to our random walk graph kernel. Finally, we relate our framework to
R-convolution kernels (Haussler, 1999) and provide a kernel that is close to the optimal assignment
kernel of Fröhlich et al. (2006) yet provably positive semi-definite.
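The random walk kernel that the framework unifies counts simultaneous walks on both graphs, which are exactly walks on their direct product graph. A minimal sketch of the naive computation, assuming unlabeled graphs given as NumPy adjacency matrices and uniform start/stop distributions; this dense solve is the O(n^6) baseline that the Sylvester-equation and spectral approaches accelerate to O(n^3):

```python
import numpy as np

def random_walk_kernel(A1, A2, lam=0.1):
    """Random walk kernel between two unlabeled graphs.

    Computes 1^T (I - lam * Wx)^{-1} p, where Wx = A1 (x) A2 is the
    adjacency matrix of the direct product graph and p is a uniform
    start distribution.  lam must satisfy lam < 1 / rho(Wx) so the
    geometric series over walk lengths converges.
    """
    n = A1.shape[0] * A2.shape[0]
    Wx = np.kron(A1, A2)            # walks on Wx = simultaneous walks on A1 and A2
    p = np.full(n, 1.0 / n)         # uniform start distribution
    return np.linalg.solve(np.eye(n) - lam * Wx, p).sum()
```

The Kronecker product makes the O(n^6) cost visible: Wx has n1*n2 rows, so the solve alone is cubic in that product.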
Prediction of Atomization Energy Using Graph Kernel and Active Learning
Data-driven prediction of molecular properties presents unique challenges to
the design of machine learning methods concerning data
structure/dimensionality, symmetry adaptation, and confidence management. In this
paper, we present a kernel-based pipeline that can learn and predict the
atomization energy of molecules with high accuracy. The framework employs
Gaussian process regression to perform predictions based on the similarity
between molecules, which is computed using the marginalized graph kernel. To
apply the marginalized graph kernel, a spatial adjacency rule is first employed
to convert molecules into graphs whose vertices and edges are labeled by
elements and interatomic distances, respectively. We then derive formulas for
the efficient evaluation of the kernel. Specific functional components for the
marginalized graph kernel are proposed, and the effect of the associated
hyperparameters on accuracy and predictive confidence is examined. We show
that the graph kernel is particularly suitable for predicting extensive
properties because its convolutional structure coincides with that of the
covariance formula between sums of random variables. Using an active learning
procedure, we demonstrate that the proposed method can achieve a mean absolute
error of 0.62 ± 0.01 kcal/mol using as few as 2000 training samples on the QM7
data set.
Learning Structural Kernels for Natural Language Processing
Structural kernels are a flexible learning
paradigm that has been widely used in Natural
Language Processing. However, the problem
of model selection in kernel-based methods
is usually overlooked. Previous approaches
mostly rely on setting default values for kernel
hyperparameters or using grid search,
which is slow and coarse-grained. In contrast,
Bayesian methods allow efficient model
selection by maximizing the evidence on the
training data through gradient-based methods.
In this paper we show how to perform this
in the context of structural kernels by using
Gaussian Processes. Experimental results on
tree kernels show that this procedure results
in better prediction performance compared to
hyperparameter optimization via grid search.
The framework proposed in this paper can be
adapted to other structures besides trees, e.g.,
strings and graphs, thereby extending the utility
of kernel-based methods.
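The evidence maximization the paper relies on optimizes the GP log marginal likelihood with gradient methods. A sketch of the two standard quantities involved, assuming a generic differentiable kernel rather than a specific tree kernel:

```python
import numpy as np

def log_evidence(K, y):
    """Log marginal likelihood of a zero-mean GP with Gram matrix K."""
    n = len(y)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # K^{-1} y
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()                    # -0.5 * log|K|
            - 0.5 * n * np.log(2 * np.pi))

def evidence_grad(K, dK, y):
    """Gradient of the log evidence w.r.t. a hyperparameter theta,
    given dK = dK/dtheta: 0.5 * tr((alpha alpha^T - K^{-1}) dK)."""
    Ki = np.linalg.inv(K)
    alpha = Ki @ y
    return 0.5 * np.trace((np.outer(alpha, alpha) - Ki) @ dK)
```

For structural kernels, dK/dtheta is obtained by differentiating the kernel recursion itself (e.g. a tree kernel's decay parameter); the gradient formula above is the same regardless of structure.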
State-dependent kernel selection for conditional sampling of graphs
This article introduces new efficient algorithms for two problems: sampling conditional on vertex degrees in unweighted graphs, and conditional on vertex strengths in weighted graphs. The resulting conditional distributions provide the basis for exact tests on social networks and two-way contingency tables. The algorithms are able to sample conditional on the presence or absence of an arbitrary set of edges. Existing samplers based on MCMC or sequential importance sampling are generally not scalable; their efficiency can degrade in large graphs with complex patterns of known edges. MCMC methods usually require explicit computation of a Markov basis to navigate the state space; this is computationally intensive even for small graphs. Our samplers do not require a Markov basis, and are efficient both in sparse and dense settings. The key idea is to carefully select a Markov kernel on the basis of the current state of the chain. We demonstrate the utility of our methods on a real network and contingency table. Supplementary materials for this article are available online.
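For unweighted graphs, the simplest degree-conditional sampler is the classical double edge swap chain, which is the kind of baseline the article's state-dependent kernel selection improves upon. A minimal sketch of that rejection-based baseline, an illustration rather than the article's algorithm:

```python
import random

def double_edge_swap(edges, n_steps=1000, seed=0):
    """Degree-preserving MCMC on simple graphs: replace edges (a, b), (c, d)
    with (a, d), (c, b) whenever the result stays simple (no self-loops,
    no multi-edges), otherwise stay put.  Every accepted move leaves all
    vertex degrees unchanged.  `edges` is an iterable of 2-tuples of
    comparable vertex ids; returns the final edge set as frozensets.
    """
    rng = random.Random(seed)
    E = {frozenset(e) for e in edges}
    for _ in range(n_steps):
        e1, e2 = rng.sample(sorted(E, key=sorted), 2)
        a, b = sorted(e1)
        c, d = sorted(e2)
        if rng.random() < 0.5:          # two possible rewirings; pick one
            c, d = d, c
        f1, f2 = frozenset((a, d)), frozenset((c, b))
        if len(f1) == 2 and len(f2) == 2 and f1 not in E and f2 not in E:
            E -= {e1, e2}
            E |= {f1, f2}
    return E
```

Conditioning on a fixed set of present/absent edges, as the article supports, amounts to additionally rejecting any move that touches a constrained pair; the state-dependent kernels avoid the wasted rejections this naive version incurs.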
Application of kernel functions for accurate similarity search in large chemical databases
Background
Similarity search in chemical structure databases is an important problem, with many applications in chemical genomics, drug design, and efficient chemical probe screening, among others. It is widely believed that structure-based methods provide an efficient way to answer such queries. Recently, various graph kernel functions have been designed to capture the intrinsic similarity of graphs. Though successful in constructing accurate predictive and classification models, graph kernel functions cannot be applied to large chemical compound databases, due to their high computational complexity and the difficulty of indexing similarity search over large databases.
Results
To bridge graph kernel functions and similarity search in chemical databases, we applied a novel kernel-based similarity measure, developed in our team, to measure the similarity of graph-represented chemicals. In our method, we utilize a hash table to support the new graph kernel definition, efficient storage, and fast search. We have applied our method, named G-hash, to large chemical databases. Our results show that G-hash achieves state-of-the-art performance for k-nearest-neighbor (k-NN) classification. Moreover, the similarity measure and the index structure are scalable to large chemical databases, with a smaller index size and faster query processing time than state-of-the-art indexing methods such as Daylight fingerprints, C-tree, and GraphGrep.
Conclusions
Efficient similarity query processing for large chemical databases is challenging, since running-time efficiency must be balanced against similarity search accuracy. Our previously developed similarity search method, G-hash, provides a new way to perform similarity search in chemical databases, and our experimental study validates its utility.
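The core indexing idea, turning a graph kernel into hash-table lookups over per-vertex features, can be sketched as follows. The concrete feature map and hash design of G-hash differ, so the names and the one-iteration neighborhood summary below are illustrative:

```python
from collections import Counter

def vertex_features(adj, labels):
    """Hashable per-vertex features: the vertex's own label plus the sorted
    multiset of its neighbours' labels (a one-iteration, WL-style summary).
    adj maps vertex -> list of neighbours; labels maps vertex -> label."""
    return Counter(
        (labels[v], tuple(sorted(labels[u] for u in adj[v])))
        for v in adj
    )

def hashed_kernel(f1, f2):
    """Kernel value as the intersection size of hashed feature counts.
    With an inverted index keyed on these features, a k-NN query scans
    only buckets the query graph hashes into, avoiding pairwise
    graph-vs-graph kernel evaluations."""
    return sum(min(c, f2[k]) for k, c in f1.items())
```

Because the features are plain hashable tuples, they drop directly into a hash table: insertion is one pass over a compound's vertices, and the index size grows with the number of distinct features rather than the number of graph pairs.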
Efficient Estimation of Heat Kernel PageRank for Local Clustering
Given an undirected graph G and a seed node s, the local clustering problem
aims to identify a high-quality cluster containing s in time roughly
proportional to the size of the cluster, regardless of the size of G. This
problem finds numerous applications on large-scale graphs. Recently, heat
kernel PageRank (HKPR), a measure of the proximity of nodes in a graph, has been
applied to this problem and found to be more efficient than prior methods.
However, existing solutions for computing HKPR are either prohibitively
expensive or provide unsatisfactory approximation error on HKPR values,
rendering them impractical, especially on billion-edge graphs.
In this paper, we present TEA and TEA+, two novel local graph clustering
algorithms based on HKPR, to address the aforementioned limitations.
Specifically, these algorithms provide non-trivial theoretical guarantees in
relative error of HKPR values and the time complexity. The basic idea is to
utilize deterministic graph traversal to produce a rough estimate of the exact
HKPR vector, and then exploit Monte Carlo random walks to refine the results in
an optimized and non-trivial way. In particular, TEA+ offers practical
efficiency and effectiveness due to non-trivial optimizations. Extensive
experiments on real-world datasets demonstrate that TEA+ outperforms the
state-of-the-art algorithm by more than four times on most benchmark datasets
in terms of computational time when achieving the same clustering quality, and
in particular, is an order of magnitude faster on large graphs including the
widely studied Twitter and Friendster datasets.
Comment: the technical report for the full research paper accepted in the SIGMOD 201
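The Monte Carlo refinement step shared by such estimators follows directly from the HKPR definition rho = sum_k e^{-t} t^k/k! * (P^k)_{s,:}: sample walk lengths from a Poisson(t) distribution, walk from the seed, and record endpoints. The sketch below shows only this baseline sampling idea, not TEA+'s optimized combination with a deterministic push phase:

```python
import math
import random

def hkpr_monte_carlo(adj, s, t=5.0, n_walks=20000, seed=0):
    """Monte Carlo estimate of the heat kernel PageRank vector from seed s.

    adj maps each node to its neighbour list (undirected graph).  Each walk
    draws a Poisson(t) length by CDF inversion, takes that many uniform
    random-walk steps from s, and contributes 1/n_walks to its endpoint.
    """
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_walks):
        # sample k ~ Poisson(t) by inverting the CDF
        k, p = 0, math.exp(-t)
        acc, u = p, rng.random()
        while u > acc:
            k += 1
            p *= t / k
            acc += p
        v = s
        for _ in range(k):
            v = rng.choice(adj[v])
        counts[v] = counts.get(v, 0) + 1
    return {v: c / n_walks for v, c in counts.items()}
```

The estimates concentrate at rate O(1/sqrt(n_walks)) per node, which is why a rough deterministic estimate first, as in TEA/TEA+, sharply reduces the number of walks needed for a given relative-error guarantee.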
Global geometric graph kernels and applications
This thesis explores the topics of graph kernels and classification of graphs. Graph kernels have received considerable attention in the last decade, in part because of their value in many practical applications, such as chemoinformatics and molecular biology, in which classification using graph kernels has become the standard model for several problems. Perhaps even more important is the inclusion of graph kernels in the rich field of kernel methods, making a large family of machine learning algorithms, including support vector machines, applicable to data naturally represented as graphs. Graph kernels are similarity functions defined on pairs of graphs. Traditionally, graph kernels compare graphs in terms of features of subgraphs such as walks, paths or tree patterns. For the kernels to remain computationally efficient, these subgraphs are often chosen to be small. As a result, most graph kernels adopt an inherently local perspective on the graph and may fail to discern global properties, such as the girth or the chromatic number, that are not captured in local structure. Furthermore, existing work on graph kernels lacks results justifying a particular choice of kernel for a given application.
In this thesis we propose two new graph kernels, designed to capture global properties of graphs, as described above. At the core of these kernels is the Lovász number, an important concept in graph theory with strong connections to graph properties like the chromatic number and the size of the largest clique. We give efficient sampling approximations to both kernels, allowing them to scale to large graphs. We also show that we can characterize the separation margin induced by these kernels in certain classification tasks. This serves as initial progress towards making theory aid kernel choice. We make an extensive empirical evaluation of both kernels on synthetic data with known global properties, and on real graphs frequently used to benchmark graph kernels.
Finally, we present a new application of graph kernels in the field of data mining, by redefining an important subproblem of entity disambiguation as a graph classification problem. We show empirically that our proposed method improves on the state of the art.