8,325 research outputs found

    Propagation Kernels

    Full text link
    We introduce propagation kernels, a general graph-kernel framework for efficiently measuring the similarity of structured data. Propagation kernels are based on monitoring how information spreads through a set of given graphs. They leverage early-stage distributions from propagation schemes such as random walks to capture structural information encoded in node labels, attributes, and edge information. This has two benefits. First, off-the-shelf propagation schemes can be used to naturally construct kernels for many graph types, including labeled, partially labeled, unlabeled, directed, and attributed graphs. Second, by leveraging existing efficient and informative propagation schemes, propagation kernels can be considerably faster than state-of-the-art approaches without sacrificing predictive performance. We will also show that if the graphs at hand have a regular structure, for instance when modeling image or video data, one can exploit this regularity to scale the kernel computation to large databases of graphs with thousands of nodes. We support our contributions by exhaustive experiments on a number of real-world graphs from a variety of application domains

    Scaling Graph-based Semi Supervised Learning to Large Number of Labels Using Count-Min Sketch

    Full text link
    Graph-based Semi-supervised learning (SSL) algorithms have been successfully used in a large number of applications. These methods classify initially unlabeled nodes by propagating label information over the structure of graph starting from seed nodes. Graph-based SSL algorithms usually scale linearly with the number of distinct labels (m), and require O(m) space on each node. Unfortunately, there exist many applications of practical significance with very large m over large graphs, demanding better space and time complexity. In this paper, we propose MAD-SKETCH, a novel graph-based SSL algorithm which compactly stores label distribution on each node using Count-min Sketch, a randomized data structure. We present theoretical analysis showing that under mild conditions, MAD-SKETCH can reduce space complexity at each node from O(m) to O(log m), and achieve similar savings in time complexity as well. We support our analysis through experiments on multiple real world datasets. We observe that MAD-SKETCH achieves similar performance as existing state-of-the-art graph- based SSL algorithms, while requiring smaller memory footprint and at the same time achieving up to 10x speedup. We find that MAD-SKETCH is able to scale to datasets with one million labels, which is beyond the scope of existing graph- based SSL algorithms.Comment: 9 page

    Partitioned Sampling of Public Opinions Based on Their Social Dynamics

    Full text link
    Public opinion polling is usually done by random sampling from the entire population, treating individual opinions as independent. In the real world, individuals' opinions are often correlated, e.g., among friends in a social network. In this paper, we explore the idea of partitioned sampling, which partitions individuals with high opinion similarities into groups and then samples every group separately to obtain an accurate estimate of the population opinion. We rigorously formulate the above idea as an optimization problem. We then show that the simple partitions which contain only one sample in each group are always better, and reduce finding the optimal simple partition to a well-studied Min-r-Partition problem. We adapt an approximation algorithm and a heuristic algorithm to solve the optimization problem. Moreover, to obtain opinion similarity efficiently, we adapt a well-known opinion evolution model to characterize social interactions, and provide an exact computation of opinion similarities based on the model. We use both synthetic and real-world datasets to demonstrate that the partitioned sampling method results in significant improvement in sampling quality and it is robust when some opinion similarities are inaccurate or even missing
    • …
    corecore