16 research outputs found
Generalized Optimization Framework for Graph-based Semi-supervised Learning
We develop a generalized optimization framework for graph-based
semi-supervised learning. The framework gives as particular cases the Standard
Laplacian, Normalized Laplacian and PageRank based methods. We also provide a
new probabilistic interpretation based on random walks and characterize the
limiting behaviour of the methods. The random walk based interpretation allows
us to explain differences between the performances of methods with different
smoothing kernels. It appears that the PageRank based method is robust with
respect to the choice of the regularization parameter and the labelled data.
We illustrate our theoretical results on two realistic datasets presenting
different challenges: the Les Misérables character social network and the
Wikipedia hyperlink graph. Graph-based semi-supervised learning classifies the
Wikipedia articles with very good precision and perfect recall using only the
hyperlink information.
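The PageRank based method in the framework amounts to one linear solve per class. A minimal sketch, not the authors' exact formulation: the function name, the damping parameter `alpha`, and the column-stochastic orientation are assumptions.

```python
import numpy as np

def pagerank_ssl(W, labels, alpha=0.85):
    """Personalized-PageRank label spreading (illustrative sketch).

    W      -- symmetric adjacency matrix; no isolated nodes assumed
    labels -- dict mapping labelled node index -> class
    alpha  -- damping factor, playing the role of the regularization parameter
    """
    n = W.shape[0]
    classes = sorted(set(labels.values()))
    P = np.diag(1.0 / W.sum(axis=1)) @ W        # random-walk transition matrix
    F = np.zeros((n, len(classes)))
    for j, c in enumerate(classes):
        # indicator vector of the labelled points of class c
        y = np.array([1.0 if labels.get(i) == c else 0.0 for i in range(n)])
        # personalized-PageRank score vector for class c
        F[:, j] = np.linalg.solve(np.eye(n) - alpha * P.T, y)
    return [classes[j] for j in F.argmax(axis=1)]
```

On a toy graph made of two triangles joined by a bridge edge, one labelled node per triangle is enough to recover both clusters.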
Semi-supervised Learning with Regularized Laplacian
We study a semi-supervised learning method based on the similarity graph and the Regularized Laplacian. We give a convenient optimization formulation of the Regularized Laplacian method and establish its various properties. In particular, we show that the kernel of the method can be interpreted in terms of discrete- and continuous-time random walks and possesses several important properties of proximity measures. Both optimization and linear algebra methods can be used for efficient computation of the classification functions. We demonstrate on numerical examples that the Regularized Laplacian method is robust with respect to the choice of the regularization parameter and outperforms Laplacian-based heat kernel methods.
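A minimal sketch of the Regularized Laplacian kernel, K = (I + sigma*L)^(-1) with L the combinatorial Laplacian; multiplying K by class indicator vectors, as in the usage below, is one common way such kernels are used for classification, and `sigma` is the regularization parameter.

```python
import numpy as np

def regularized_laplacian_kernel(W, sigma=1.0):
    """Regularized Laplacian kernel K = (I + sigma * L)^{-1} (sketch)."""
    L = np.diag(W.sum(axis=1)) - W          # combinatorial graph Laplacian
    return np.linalg.inv(np.eye(W.shape[0]) + sigma * L)
```

Given an indicator matrix Y with one column per class, the classification functions are F = K @ Y and each node is assigned the class with the largest score.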
GenPR: Generative PageRank Framework for Semi-supervised Learning on Citation Graphs
Nowadays, Semi-Supervised Learning (SSL) on citation graph data sets is a rapidly growing area of research. However, recently proposed graph-based SSL algorithms use a default adjacency matrix with binary weights on edges (citations), which causes a loss of the nodes' (papers') similarity information. In this work, we therefore propose a framework that embeds PageRank SSL in a generative model. This framework allows joint training of the nodes' latent-space representation and of label spreading through an adjacency matrix reweighted by node similarities in the latent space. We explain how a generative model can improve accuracy and reduce the number of iteration steps for PageRank SSL. Moreover, we show that our framework outperforms the best graph-based SSL algorithms on four public citation graph data sets and improves the interpretability of classification results.
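The reweighting step at the heart of the framework can be illustrated as follows. In the paper the embeddings are learned jointly with the generative model; here they are taken as given, and the cosine-similarity choice and the function name are assumptions for illustration.

```python
import numpy as np

def reweight_adjacency(A, Z):
    """Replace binary citation edges by latent-space similarities (sketch).

    A -- binary (0/1) adjacency matrix of the citation graph
    Z -- node embeddings, one row per node (assumed given, not learned here)
    """
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    S = Zn @ Zn.T                        # pairwise cosine similarities
    return A * np.clip(S, 0.0, None)     # keep the graph support, add weights
```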
Almost exact recovery in noisy semi-supervised learning
This paper investigates noisy graph-based semi-supervised learning, or
community detection. We consider the Stochastic Block Model (SBM), where, in
addition to the graph observation, an oracle gives imperfect information
about some nodes' cluster assignments. We derive the Maximum A Posteriori
(MAP) estimator, and show that a continuous relaxation of the MAP performs
almost exact recovery under non-restrictive conditions on the average degree
and the amount of oracle noise. In particular, this method avoids some
pitfalls of several graph-based semi-supervised learning methods, such as the
flatness of the classification functions that appears in problems with a very
large amount of unlabeled data.
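The observation model, a graph drawn from the SBM plus a noisy oracle on cluster assignments, can be sampled as follows; the two-community setup and the parameter names are illustrative assumptions.

```python
import numpy as np

def sample_sbm_with_oracle(n, p_in, p_out, noise, rng):
    """Sample a two-community SBM graph and a noisy oracle (sketch).

    p_in / p_out -- within- / between-community edge probabilities
    noise        -- probability that the oracle flips a node's label
    """
    z = np.array([0] * (n // 2) + [1] * (n - n // 2))    # true communities
    prob = np.where(z[:, None] == z[None, :], p_in, p_out)
    A = (rng.random((n, n)) < prob).astype(int)
    A = np.triu(A, 1)
    A = A + A.T                                          # undirected, no self-loops
    oracle = np.where(rng.random(n) < noise, 1 - z, z)   # flipped with prob. noise
    return A, z, oracle
```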
LFGCN: Levitating over Graphs with Levy Flights
Due to their high utility in many applications, from social networks to
blockchains to power grids, deep learning on non-Euclidean objects such as
graphs and manifolds, coined Geometric Deep Learning (GDL), continues to gain
ever-increasing interest. We propose a new Lévy Flights Graph Convolutional
Networks (LFGCN) method for semi-supervised learning, which casts Lévy flights
into random walks on graphs and, as a result, allows us both to accurately
account for the intrinsic graph topology and to substantially improve
classification performance, especially for heterogeneous graphs. Furthermore,
we propose a new preferential P-DropEdge method based on the Girvan-Newman
argument. That is, in contrast to the uniform removal of edges in DropEdge,
we follow the Girvan-Newman algorithm: we detect network periphery structures
using information on edge betweenness and then remove edges according to their
betweenness centrality. Our experimental results on semi-supervised node
classification tasks demonstrate that LFGCN coupled with P-DropEdge
accelerates training, increases stability and further improves predictive
accuracy. Finally, in our case studies we bring the machinery of LFGCN and
other deep network tools to the analysis of power grid networks, an area where
the utility of GDL remains untapped. To appear in the 2020 IEEE International
Conference on Data Mining (ICDM).
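The betweenness-driven edge removal can be sketched with Brandes' algorithm on an unweighted graph. The `frac` parameter and the top-k removal rule below are assumptions; the paper's exact P-DropEdge scheme may differ.

```python
from collections import deque

def edge_betweenness(adj):
    """Brandes' algorithm for edge betweenness on an unweighted,
    undirected graph given as adjacency lists {u: [v, ...]} with
    integer node ids."""
    bc = {(u, v): 0.0 for u in adj for v in adj[u] if u < v}
    for s in adj:
        # BFS from s, counting numbers of shortest paths (sigma)
        dist = {s: 0}
        sigma = {v: 0.0 for v in adj}
        sigma[s] = 1.0
        preds = {v: [] for v in adj}
        order = []
        q = deque([s])
        while q:
            v = q.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # back-propagate path dependencies onto edges
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                bc[(min(v, w), max(v, w))] += c
                delta[v] += c
    # each unordered pair of endpoints is counted from both sides
    return {e: b / 2.0 for e, b in bc.items()}

def p_dropedge(adj, frac=0.25):
    """Drop the highest-betweenness fraction of edges (sketch of the
    preferential P-DropEdge idea)."""
    bc = edge_betweenness(adj)
    k = int(len(bc) * frac)
    drop = set(sorted(bc, key=bc.get, reverse=True)[:k])
    return {u: [v for v in adj[u] if (min(u, v), max(u, v)) not in drop]
            for u in adj}
```

On the two-triangles-plus-bridge toy graph, the bridge edge carries all nine cross-cluster shortest paths, so it is the first edge removed.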
Graph Tikhonov Regularization and Interpolation via Random Spanning Forests
Novel Monte Carlo estimators are proposed to solve both the Tikhonov
regularization (TR) and the interpolation problems on graphs. These estimators
are based on random spanning forests (RSF), whose theoretical properties
enable analysis of the estimators' mean and variance. We also show how to
perform hyperparameter tuning for these RSF-based estimators. TR is a
component in many well-known algorithms, and we show how the proposed
estimators can be easily adapted to avoid expensive intermediate steps in
generalized semi-supervised learning, label propagation, Newton's method and
iteratively reweighted least squares. In the experiments, we illustrate the
proposed methods on several problems and provide observations on their
running time.
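The TR problem that the RSF estimators approximate can be written as min_f q*||f - y||^2 + f^T L f, whose exact solution is f = q (qI + L)^(-1) y. Below is the exact dense solve for reference; the scalar regularization weight `q` and the function name are assumptions.

```python
import numpy as np

def graph_tikhonov(W, y, q=1.0):
    """Exact solve of graph Tikhonov regularization (illustrative sketch).

    Minimizes  q * ||f - y||^2 + f^T L f,  solution  f = q (q I + L)^{-1} y.
    The RSF estimators in the paper approximate this solve by Monte Carlo.
    """
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W      # combinatorial graph Laplacian
    return q * np.linalg.solve(q * np.eye(n) + L, y)
```

Two useful sanity checks: constant signals are fixed points (L annihilates them), and the solve preserves the total mass of y, since the all-ones vector is in the kernel of L.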
Kernels on Graphs as Proximity Measures
Kernels and, broadly speaking, similarity measures on graphs are extensively used in graph-based unsupervised and semi-supervised learning algorithms as well as in the link prediction problem. We analytically study proximity and distance properties of various kernels and similarity measures on graphs. This can potentially be useful for recommending the adoption of one or another similarity measure in a machine learning method. We also numerically compare various similarity measures in the context of spectral clustering and observe that normalized heat-type similarity measures with log modification generally perform best.
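As one example of the measures compared, the (normalized) heat kernel K = exp(-tL) can be computed by eigendecomposition; the log-modified measure is then the elementwise logarithm of K, which is well defined on a connected graph since all entries of K are positive. This is a sketch with assumed parameter names.

```python
import numpy as np

def heat_kernel(W, t=1.0, normalized=True):
    """Heat kernel K = exp(-t * L) via eigendecomposition (sketch).

    With normalized=True, L is the symmetric normalized Laplacian
    D^{-1/2} (D - W) D^{-1/2}; otherwise the combinatorial Laplacian.
    """
    d = W.sum(axis=1)
    L = np.diag(d) - W
    if normalized:
        Ds = np.diag(1.0 / np.sqrt(d))
        L = Ds @ L @ Ds
    vals, vecs = np.linalg.eigh(L)
    return (vecs * np.exp(-t * vals)) @ vecs.T
```

At t = 0 the kernel is the identity, and as a proximity measure it assigns larger values to closer node pairs.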