Learning on Graphs with Partially Absorbing Random Walks: Theory and Practice
Learning on graphs has been studied for decades with abundant models proposed, yet many of their behaviors and relations remain unclear. This thesis fills this gap by introducing a novel second-order Markov chain, called partially absorbing random walks (ParWalk). Different from an ordinary random walk, ParWalk is absorbed at the current state with a vertex-dependent probability, and otherwise follows a random edge out of it. The partial absorption yields an absorption probability between any two vertices, which turns out to encompass various popular models including PageRank, hitting times, label propagation, and regularized Laplacian kernels. The unified treatment reveals the distinguishing characteristics of these models arising from different contexts, and allows comparing them and transferring findings from one paradigm to another.
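To make the walk concrete, the following is a minimal Monte Carlo sketch (not taken from the thesis) of a partially absorbing random walk on a toy graph; the adjacency list, the uniform absorption probability of 0.05, and the name simulate_parwalk are illustrative assumptions.

import random
from collections import defaultdict

def simulate_parwalk(adj, p_absorb, start, n_walks=20000, max_steps=10000):
    """Estimate the probability that a walk started at `start` is absorbed
    at each vertex: at every step the walk stops at the current vertex v
    with probability p_absorb[v], and otherwise moves to a random neighbour."""
    counts = defaultdict(int)
    for _ in range(n_walks):
        v = start
        for _ in range(max_steps):
            if random.random() < p_absorb[v]:
                counts[v] += 1
                break
            v = random.choice(adj[v])
    return {u: c / n_walks for u, c in counts.items()}

# Two small clusters (vertices 0-2 and 3-5) joined by a single bridge edge (1, 3).
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4, 5], 4: [3, 5], 5: [3, 4]}
print(simulate_parwalk(adj, p_absorb={v: 0.05 for v in adj}, start=0))

With small absorption probabilities, most of the absorption mass stays inside the start vertex's cluster, which is the behavior the theoretical part of the abstract builds on.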
The key to learning on graphs is capitalizing on the cluster structure of the underlying graph. The absorption probabilities of ParWalk turn out to be highly effective in capturing this cluster structure. Given a query vertex in a cluster, we show that when the absorbing capacity of each vertex on the graph is small, the probabilities of ParWalk being absorbed at the query have small variations in regions of high conductance (within clusters), but have large gaps in regions of low conductance (between clusters). The less absorbent the vertices of the cluster are, the better the absorption probabilities can represent the local cluster. Our theory yields principles for designing reliable similarity measures and provides justification for a number of popular ones such as hitting times and the pseudo-inverse of the graph Laplacian. Furthermore, it reveals new and important properties of these measures. For example, we are the first to show that hitting times are better at retrieving sparse clusters, while the pseudo-inverse of the graph Laplacian is better for dense ones.
The theoretical insights gained from ParWalk guide us in developing robust algorithms for various applications including local clustering, semi-supervised learning, and ranking. For local clustering, we propose a new method for salient object segmentation. By taking a noisy saliency map as the probability distribution of query vertices, we compute the absorption probabilities of ParWalk to the queries, producing a high-quality refined saliency map from which the objects can be easily segmented. For semi-supervised learning, we propose a new algorithm for label propagation. The algorithm is justified by our theoretical analysis and guaranteed to be superior to many existing ones. For ranking, we design a new similarity measure using ParWalk, which combines the strengths of both hitting times and the pseudo-inverse of the graph Laplacian. The hybrid similarity measure adapts well to complex data of diverse density, and thus performs strongly overall. For all these learning tasks, our methods achieve substantial improvements over the state of the art on extensive benchmark datasets.
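As an illustration of the saliency-refinement step, here is a sketch that assumes the closed form A = (Lambda + L)^-1 Lambda for the absorption-probability matrix, a form that appears in the partially absorbing random walk literature but is not quoted in this abstract; the adjacency matrix, capacities, and query scores are toy values.

import numpy as np

def refine_with_parwalk(W, capacities, q):
    """W: symmetric adjacency matrix; capacities: per-vertex absorbing
    capacities; q: noisy query/saliency scores per vertex."""
    D = np.diag(W.sum(axis=1))
    L = D - W                          # unnormalised graph Laplacian
    Lam = np.diag(capacities)
    A = np.linalg.solve(Lam + L, Lam)  # assumed absorption-probability matrix
    return A @ q                       # absorption mass reaching the queries

# Toy example: two clusters of three vertices; noisy queries favour the first.
W = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 1, 0, 0],
              [1, 1, 0, 0, 0, 0],
              [0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
q = np.array([0.9, 0.2, 0.7, 0.1, 0.0, 0.1])
print(refine_with_parwalk(W, capacities=np.full(6, 1.0), q=q))  # smoothed scores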
Propagation Kernels
We introduce propagation kernels, a general graph-kernel framework for
efficiently measuring the similarity of structured data. Propagation kernels
are based on monitoring how information spreads through a set of given graphs.
They leverage early-stage distributions from propagation schemes such as random
walks to capture structural information encoded in node labels, attributes, and
edge information. This has two benefits. First, off-the-shelf propagation
schemes can be used to naturally construct kernels for many graph types,
including labeled, partially labeled, unlabeled, directed, and attributed
graphs. Second, by leveraging existing efficient and informative propagation
schemes, propagation kernels can be considerably faster than state-of-the-art
approaches without sacrificing predictive performance. We will also show that
if the graphs at hand have a regular structure, for instance when modeling
image or video data, one can exploit this regularity to scale the kernel
computation to large databases of graphs with thousands of nodes. We support
our contributions by exhaustive experiments on a number of real-world graphs
from a variety of application domains.
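As a rough illustration of the idea (a simplified sketch, not the authors' implementation): propagate per-node label distributions for a few steps with the random-walk transition matrix, bin the intermediate distributions, and count how often bins coincide across two graphs. Binning by rounding stands in here for the hashing step of the actual framework.

import numpy as np

def transition(W):
    """Row-normalised random-walk transition matrix of adjacency W."""
    return W / W.sum(axis=1, keepdims=True)

def bin_counts(P, decimals=1):
    """Map each node's label distribution to a discrete bin and count bins."""
    counts = {}
    for row in np.round(P, decimals):
        key = tuple(row)
        counts[key] = counts.get(key, 0) + 1
    return counts

def propagation_kernel(W1, P1, W2, P2, t_max=3):
    """Accumulate, over t_max propagation steps, the number of node pairs
    (one node from each graph) whose label distributions share a bin."""
    T1, T2 = transition(W1), transition(W2)
    k = 0.0
    for _ in range(t_max):
        c1, c2 = bin_counts(P1), bin_counts(P2)
        k += sum(c1[b] * c2.get(b, 0) for b in c1)
        P1, P2 = T1 @ P1, T2 @ P2   # spread label mass to neighbours
    return k

# Toy usage: two triangle graphs with 2-class label distributions per node.
W = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
P_a = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
P_b = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(propagation_kernel(W, P_a, W, P_b))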
Scaling Graph-based Semi Supervised Learning to Large Number of Labels Using Count-Min Sketch
Graph-based Semi-supervised learning (SSL) algorithms have been successfully
used in a large number of applications. These methods classify initially
unlabeled nodes by propagating label information over the structure of the graph
starting from seed nodes. Graph-based SSL algorithms usually scale linearly
with the number of distinct labels (m), and require O(m) space on each node.
Unfortunately, there exist many applications of practical significance with
very large m over large graphs, demanding better space and time complexity. In
this paper, we propose MAD-SKETCH, a novel graph-based SSL algorithm which
compactly stores label distribution on each node using Count-min Sketch, a
randomized data structure. We present theoretical analysis showing that under
mild conditions, MAD-SKETCH can reduce space complexity at each node from O(m)
to O(log m), and achieve similar savings in time complexity as well. We support
our analysis through experiments on multiple real world datasets. We observe
that MAD-SKETCH achieves similar performance as existing state-of-the-art
graph-based SSL algorithms, while requiring a smaller memory footprint and at
the same time achieving up to 10x speedup. We find that MAD-SKETCH is able to
scale to datasets with one million labels, which is beyond the scope of
existing graph-based SSL algorithms.
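The count-min idea itself is easy to sketch. Below is a minimal, generic count-min sketch for per-node label mass; this is illustrative code, not MAD-SKETCH itself, and the width, depth, and hashing choices are arbitrary.

import numpy as np

class CountMinSketch:
    def __init__(self, width=64, depth=4, seed=0):
        self.width, self.depth = width, depth
        rng = np.random.default_rng(seed)
        self.seeds = rng.integers(1, 2**31 - 1, size=depth)
        self.table = np.zeros((depth, width))  # O(width * depth), not O(m)

    def _cols(self, label):
        return [hash((int(s), label)) % self.width for s in self.seeds]

    def add(self, label, weight=1.0):
        for row, col in enumerate(self._cols(label)):
            self.table[row, col] += weight

    def estimate(self, label):
        # Count-min returns the minimum over rows: an overestimate with
        # bounded error, never an underestimate.
        return min(self.table[row, col] for row, col in enumerate(self._cols(label)))

# One node's label distribution over a huge label space, stored compactly.
node = CountMinSketch()
node.add("label_12345", 0.7)
node.add("label_99999", 0.3)
print(node.estimate("label_12345"), node.estimate("label_99999"))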
Partitioned Sampling of Public Opinions Based on Their Social Dynamics
Public opinion polling is usually done by random sampling from the entire
population, treating individual opinions as independent. In the real world,
individuals' opinions are often correlated, e.g., among friends in a social
network. In this paper, we explore the idea of partitioned sampling, which
partitions individuals with high opinion similarities into groups and then
samples every group separately to obtain an accurate estimate of the population
opinion. We rigorously formulate the above idea as an optimization problem. We
then show that the simple partitions which contain only one sample in each
group are always better, and reduce finding the optimal simple partition to a
well-studied Min-r-Partition problem. We adapt an approximation algorithm and a
heuristic algorithm to solve the optimization problem. Moreover, to obtain
opinion similarity efficiently, we adapt a well-known opinion evolution model
to characterize social interactions, and provide an exact computation of
opinion similarities based on the model. We use both synthetic and real-world
datasets to demonstrate that the partitioned sampling method results in
significant improvement in sampling quality and is robust when some opinion
similarities are inaccurate or even missing.
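A minimal sketch of the simple-partition estimator described above (one sampled individual per group, weighted by group size); the toy population and the hand-made grouping are assumptions standing in for the paper's Min-r-Partition-based grouping.

import random

def partitioned_estimate(groups, opinions):
    """Sample one individual per group and weight its opinion by group size."""
    n = sum(len(g) for g in groups)
    total = 0.0
    for g in groups:
        sampled = random.choice(g)
        total += opinions[sampled] * len(g)
    return total / n

# Toy population: two like-minded communities plus a few independents.
opinions = {i: 0.9 for i in range(0, 50)}
opinions.update({i: 0.1 for i in range(50, 100)})
opinions.update({100: 0.5, 101: 0.4, 102: 0.6})
groups = [list(range(0, 50)), list(range(50, 100)), [100], [101], [102]]
print(partitioned_estimate(groups, opinions), sum(opinions.values()) / len(opinions))

Because opinions within a group are similar, a single sample per group already gives a low-variance estimate of the population mean, which is the intuition behind preferring simple partitions.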