16 research outputs found
Generalized Optimization Framework for Graph-based Semi-supervised Learning
We develop a generalized optimization framework for graph-based
semi-supervised learning. The framework gives as particular cases the Standard
Laplacian, Normalized Laplacian and PageRank based methods. We also provide a
new probabilistic interpretation based on random walks and characterize the
limiting behaviour of the methods. The random walk based interpretation allows
us to explain differences between the performances of methods with different
smoothing kernels. It appears that the PageRank based method is robust with
respect to the choice of the regularization parameter and the labelled data.
We illustrate our theoretical results on two realistic datasets presenting
different challenges: the Les Misérables character social network and the
Wikipedia hyperlink graph. Graph-based semi-supervised learning classifies the
Wikipedia articles with very good precision and perfect recall using only the
hyperlink information.
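The PageRank based method in the framework amounts to one linear solve per class. A minimal sketch, not the authors' exact formulation: the function name, the damping parameter `alpha`, and the column-stochastic orientation are assumptions.

```python
import numpy as np

def pagerank_ssl(W, labels, alpha=0.85):
    """Personalized-PageRank label spreading (illustrative sketch).

    W      -- symmetric adjacency matrix; no isolated nodes assumed
    labels -- dict mapping labelled node index -> class
    alpha  -- damping factor, playing the role of the regularization parameter
    """
    n = W.shape[0]
    classes = sorted(set(labels.values()))
    P = np.diag(1.0 / W.sum(axis=1)) @ W        # random-walk transition matrix
    F = np.zeros((n, len(classes)))
    for j, c in enumerate(classes):
        # indicator vector of the labelled points of class c
        y = np.array([1.0 if labels.get(i) == c else 0.0 for i in range(n)])
        # personalized-PageRank score vector for class c
        F[:, j] = np.linalg.solve(np.eye(n) - alpha * P.T, y)
    return [classes[j] for j in F.argmax(axis=1)]
```

On a toy graph made of two triangles joined by a bridge edge, one labelled node per triangle is enough to recover both clusters.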
Semi-supervised Learning with Regularized Laplacian
We study a semi-supervised learning method based on the similarity graph and the Regularized Laplacian. We give a convenient optimization formulation of the Regularized Laplacian method and establish its various properties. In particular, we show that the kernel of the method can be interpreted in terms of discrete- and continuous-time random walks and possesses several important properties of proximity measures. Both optimization and linear algebra methods can be used for efficient computation of the classification functions. We demonstrate on numerical examples that the Regularized Laplacian method is robust with respect to the choice of the regularization parameter and outperforms Laplacian-based heat kernel methods.
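A minimal sketch of the Regularized Laplacian kernel, K = (I + sigma*L)^(-1) with L the combinatorial Laplacian; multiplying K by class indicator vectors, as in the usage below, is one common way such kernels are used for classification, and `sigma` is the regularization parameter.

```python
import numpy as np

def regularized_laplacian_kernel(W, sigma=1.0):
    """Regularized Laplacian kernel K = (I + sigma * L)^{-1} (sketch)."""
    L = np.diag(W.sum(axis=1)) - W          # combinatorial graph Laplacian
    return np.linalg.inv(np.eye(W.shape[0]) + sigma * L)
```

Given an indicator matrix Y with one column per class, the classification functions are F = K @ Y and each node is assigned the class with the largest score.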
GenPR: Generative PageRank Framework for Semi-supervised Learning on Citation Graphs
Nowadays, Semi-Supervised Learning (SSL) on citation graph data sets is a rapidly growing area of research. However, recently proposed graph-based SSL algorithms use a default adjacency matrix with binary weights on edges (citations), which causes a loss of the nodes' (papers') similarity information. In this work, we therefore propose a framework that embeds PageRank SSL in a generative model. This framework allows joint training of the nodes' latent-space representation and of label spreading through an adjacency matrix reweighted by node similarities in the latent space. We explain how a generative model can improve accuracy and reduce the number of iteration steps for PageRank SSL. Moreover, we show that our framework outperforms the best graph-based SSL algorithms on four public citation graph data sets and improves the interpretability of classification results.
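The reweighting step at the heart of the framework can be illustrated as follows. In the paper the embeddings are learned jointly with the generative model; here they are taken as given, and the cosine-similarity choice and the function name are assumptions for illustration.

```python
import numpy as np

def reweight_adjacency(A, Z):
    """Replace binary citation edges by latent-space similarities (sketch).

    A -- binary (0/1) adjacency matrix of the citation graph
    Z -- node embeddings, one row per node (assumed given, not learned here)
    """
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    S = Zn @ Zn.T                        # pairwise cosine similarities
    return A * np.clip(S, 0.0, None)     # keep the graph support, add weights
```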
Almost exact recovery in noisy semi-supervised learning
This paper investigates noisy graph-based semi-supervised learning, or
community detection. We consider the Stochastic Block Model (SBM), where, in
addition to the graph observation, an oracle gives imperfect information
about some nodes' cluster assignments. We derive the Maximum A Posteriori
(MAP) estimator, and show that a continuous relaxation of the MAP performs
almost exact recovery under non-restrictive conditions on the average degree
and the amount of oracle noise. In particular, this method avoids some
pitfalls of several graph-based semi-supervised learning methods, such as the
flatness of the classification functions that appears in problems with a very
large amount of unlabeled data.
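The observation model, a graph drawn from the SBM plus a noisy oracle on cluster assignments, can be sampled as follows; the two-community setup and the parameter names are illustrative assumptions.

```python
import numpy as np

def sample_sbm_with_oracle(n, p_in, p_out, noise, rng):
    """Sample a two-community SBM graph and a noisy oracle (sketch).

    p_in / p_out -- within- / between-community edge probabilities
    noise        -- probability that the oracle flips a node's label
    """
    z = np.array([0] * (n // 2) + [1] * (n - n // 2))    # true communities
    prob = np.where(z[:, None] == z[None, :], p_in, p_out)
    A = (rng.random((n, n)) < prob).astype(int)
    A = np.triu(A, 1)
    A = A + A.T                                          # undirected, no self-loops
    oracle = np.where(rng.random(n) < noise, 1 - z, z)   # flipped with prob. noise
    return A, z, oracle
```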
LFGCN: Levitating over Graphs with Levy Flights
Due to their high utility in many applications, from social networks to
blockchains to power grids, deep learning on non-Euclidean objects such as
graphs and manifolds, coined Geometric Deep Learning (GDL), continues to gain
ever-increasing interest. We propose a new Lévy Flights Graph Convolutional
Networks (LFGCN) method for semi-supervised learning, which casts Lévy flights
into random walks on graphs and, as a result, allows us both to accurately
account for the intrinsic graph topology and to substantially improve
classification performance, especially for heterogeneous graphs. Furthermore,
we propose a new preferential P-DropEdge method based on the Girvan-Newman
argument. That is, in contrast to the uniform removal of edges in DropEdge,
we follow the Girvan-Newman algorithm: we detect network periphery structures
using information on edge betweenness and then remove edges according to their
betweenness centrality. Our experimental results on semi-supervised node
classification tasks demonstrate that LFGCN coupled with P-DropEdge
accelerates training, increases stability and further improves predictive
accuracy. Finally, in our case studies we bring the machinery of LFGCN and
other deep network tools to the analysis of power grid networks, an area where
the utility of GDL remains untapped. To appear in the 2020 IEEE International
Conference on Data Mining (ICDM).
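The betweenness-driven edge removal can be sketched with Brandes' algorithm on an unweighted graph. The `frac` parameter and the top-k removal rule below are assumptions; the paper's exact P-DropEdge scheme may differ.

```python
from collections import deque

def edge_betweenness(adj):
    """Brandes' algorithm for edge betweenness on an unweighted,
    undirected graph given as adjacency lists {u: [v, ...]} with
    integer node ids."""
    bc = {(u, v): 0.0 for u in adj for v in adj[u] if u < v}
    for s in adj:
        # BFS from s, counting numbers of shortest paths (sigma)
        dist = {s: 0}
        sigma = {v: 0.0 for v in adj}
        sigma[s] = 1.0
        preds = {v: [] for v in adj}
        order = []
        q = deque([s])
        while q:
            v = q.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # back-propagate path dependencies onto edges
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                bc[(min(v, w), max(v, w))] += c
                delta[v] += c
    # each unordered pair of endpoints is counted from both sides
    return {e: b / 2.0 for e, b in bc.items()}

def p_dropedge(adj, frac=0.25):
    """Drop the highest-betweenness fraction of edges (sketch of the
    preferential P-DropEdge idea)."""
    bc = edge_betweenness(adj)
    k = int(len(bc) * frac)
    drop = set(sorted(bc, key=bc.get, reverse=True)[:k])
    return {u: [v for v in adj[u] if (min(u, v), max(u, v)) not in drop]
            for u in adj}
```

On the two-triangles-plus-bridge toy graph, the bridge edge carries all nine cross-cluster shortest paths, so it is the first edge removed.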
Graph Tikhonov Regularization and Interpolation via Random Spanning Forests
Novel Monte Carlo estimators are proposed to solve both the Tikhonov
regularization (TR) and the interpolation problems on graphs. These estimators
are based on random spanning forests (RSF), whose theoretical properties
enable analysis of the estimators' mean and variance. We also show how to
perform hyperparameter tuning for these RSF-based estimators. TR is a
component in many well-known algorithms, and we show how the proposed
estimators can be easily adapted to avoid expensive intermediate steps in
generalized semi-supervised learning, label propagation, Newton's method and
iteratively reweighted least squares. In the experiments, we illustrate the
proposed methods on several problems and provide observations on their
running time.
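The TR problem that the RSF estimators approximate can be written as min_f q*||f - y||^2 + f^T L f, whose exact solution is f = q (qI + L)^(-1) y. Below is the exact dense solve for reference; the scalar regularization weight `q` and the function name are assumptions.

```python
import numpy as np

def graph_tikhonov(W, y, q=1.0):
    """Exact solve of graph Tikhonov regularization (illustrative sketch).

    Minimizes  q * ||f - y||^2 + f^T L f,  solution  f = q (q I + L)^{-1} y.
    The RSF estimators in the paper approximate this solve by Monte Carlo.
    """
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W      # combinatorial graph Laplacian
    return q * np.linalg.solve(q * np.eye(n) + L, y)
```

Two useful sanity checks: constant signals are fixed points (L annihilates them), and the solve preserves the total mass of y, since the all-ones vector is in the kernel of L.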
Kernels on Graphs as Proximity Measures
Kernels and, broadly speaking, similarity measures on graphs are extensively used in graph-based unsupervised and semi-supervised learning algorithms as well as in the link prediction problem. We analytically study proximity and distance properties of various kernels and similarity measures on graphs. This can potentially be useful for recommending the adoption of one or another similarity measure in a machine learning method. We also numerically compare various similarity measures in the context of spectral clustering and observe that normalized heat-type similarity measures with log modification generally perform best.
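As one example of the measures compared, the (normalized) heat kernel K = exp(-tL) can be computed by eigendecomposition; the log-modified measure is then the elementwise logarithm of K, which is well defined on a connected graph since all entries of K are positive. This is a sketch with assumed parameter names.

```python
import numpy as np

def heat_kernel(W, t=1.0, normalized=True):
    """Heat kernel K = exp(-t * L) via eigendecomposition (sketch).

    With normalized=True, L is the symmetric normalized Laplacian
    D^{-1/2} (D - W) D^{-1/2}; otherwise the combinatorial Laplacian.
    """
    d = W.sum(axis=1)
    L = np.diag(d) - W
    if normalized:
        Ds = np.diag(1.0 / np.sqrt(d))
        L = Ds @ L @ Ds
    vals, vecs = np.linalg.eigh(L)
    return (vecs * np.exp(-t * vals)) @ vecs.T
```

At t = 0 the kernel is the identity, and as a proximity measure it assigns larger values to closer node pairs.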