TPA: Fast, Scalable, and Accurate Method for Approximate Random Walk with Restart on Billion Scale Graphs

Authors: Jung Jinhong, Kang U, Yoon Minji
Publication date: 03/12/2017

Given a large graph, how can we determine similarity between nodes in a fast and accurate way? Random walk with restart (RWR) is a popular measure for this purpose and has been exploited in numerous data mining applications including ranking, anomaly detection, link prediction, and community detection. However, previous methods for computing exact RWR require prohibitive storage sizes and computational costs, and alternative methods which avoid such costs by computing approximate RWR have limited accuracy. In this paper, we propose TPA, a fast, scalable, and highly accurate method for computing approximate RWR on large graphs. TPA exploits two important properties of RWR: 1) nodes close to a seed node are likely to be revisited in following steps due to the block-wise structure of many real-world graphs, and 2) RWR scores of nodes which reside far from the seed node are proportional to their PageRank scores. Based on these two properties, TPA divides the approximate RWR problem into two subproblems called neighbor approximation and stranger approximation. In the neighbor approximation, TPA estimates RWR scores of nodes close to the seed based on the scores of the first few steps from the seed. In the stranger approximation, TPA estimates RWR scores for nodes far from the seed using their PageRank. The stranger and neighbor approximations are conducted in the preprocessing phase and the online phase, respectively. Through extensive experiments, we show that TPA requires up to 3.5x less time with up to 40x less memory space than other state-of-the-art methods for the preprocessing phase. In the online phase, TPA computes approximate RWR up to 30x faster than existing methods while maintaining high accuracy. Comment: 12 pages, 10 figures

arXiv.org e-Print Archive

Crossref

SNU Open Repository and Archive
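
The RWR measure named in the abstract above can be illustrated with a plain power iteration. This is a minimal sketch of exact RWR on a tiny graph, not the TPA method itself; the adjacency-dict input, the function name `rwr`, the restart probability of 0.15, and the choice to return the mass of dangling nodes to the seed are all assumptions made for illustration.

```python
def rwr(adj, seed, restart=0.15, tol=1e-12, max_iter=10_000):
    """Random walk with restart scores by power iteration (illustrative sketch).

    adj     : dict mapping each node to a list of its out-neighbors (assumed format)
    seed    : node the walker restarts from
    restart : probability of jumping back to the seed at each step (assumed value)
    """
    nodes = list(adj)
    scores = {v: 0.0 for v in nodes}
    scores[seed] = 1.0  # the walker starts at the seed
    for _ in range(max_iter):
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            walk_mass = (1.0 - restart) * scores[u]
            neighbors = adj[u]
            if not neighbors:
                # Assumption: a node without out-edges sends its mass to the seed.
                nxt[seed] += walk_mass
                continue
            share = walk_mass / len(neighbors)
            for v in neighbors:
                nxt[v] += share
        nxt[seed] += restart  # restart step: total mass is 1, so add `restart`
        diff = sum(abs(nxt[v] - scores[v]) for v in nodes)
        scores = nxt
        if diff < tol:
            break
    return scores


# Undirected path a - b - c, stored as a symmetric adjacency dict.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
path_scores = rwr(path, seed="a")
```

On this path graph the scores sum to 1, and the node farthest from the seed (`c`) receives the smallest score.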

Learning to Discover Sparse Graphical Models

Authors: Belilovsky Eugene, Blaschko Matthew, Kastner Kyle, Varoquaux Gaël
Publication date: 03/08/2017

We consider structure discovery of undirected graphical models from observational data. Inferring likely structures from few examples is a complex task often requiring the formulation of priors and sophisticated inference procedures. Popular methods rely on estimating a penalized maximum likelihood of the precision matrix. However, in these approaches structure recovery is an indirect consequence of the data-fit term, the penalty can be difficult to adapt for domain-specific knowledge, and the inference is computationally demanding. By contrast, it may be easier to generate training samples of data that arise from graphs with the desired structure properties. We propose here to leverage this latter source of information as training data to learn a function, parametrized by a neural network, that maps empirical covariance matrices to estimated graph structures. Learning this function brings two benefits: it implicitly models the desired structure or sparsity properties to form suitable priors, and it can be tailored to the specific problem of edge structure discovery, rather than maximizing data likelihood. Applying this framework, we find that our learnable graph-discovery method trained on synthetic data generalizes well, identifying relevant edges in both synthetic and real data that are completely unknown at training time. We find that on genetics, brain imaging, and simulation data we obtain performance generally superior to analytical methods.

arXiv.org e-Print Archive

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-CEA

HAL-Rennes 1
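
The "penalized maximum likelihood of the precision matrix" mentioned in the abstract above is not spelled out there. One widely used instance, given here only as an assumed illustration, is the graphical lasso objective, where $\hat{\Sigma}$ is the empirical covariance, $\Theta$ the precision matrix, and $\lambda \ge 0$ a sparsity penalty weight (symbols chosen for this sketch, not taken from the abstract):

```latex
\hat{\Theta}
  = \arg\max_{\Theta \succ 0}
    \Big( \log\det\Theta
          - \operatorname{tr}\big(\hat{\Sigma}\,\Theta\big)
          - \lambda \,\lVert \Theta \rVert_{1} \Big)
```

Nonzero off-diagonal entries of $\hat{\Theta}$ are read as edges, which is why the abstract calls structure recovery an indirect consequence of the data-fit term.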

Seeding with Costly Network Information

Authors: Eckles Dean, Esfandiari Hossein, Mossel Elchanan, Rahimian M. Amin
Publication date: 04/06/2021

We study the task of selecting $k$ nodes in a social network of size $n$ to seed a diffusion with maximum expected spread size, under the independent cascade model with cascade probability $p$. Most previous work on this problem (known as influence maximization) focuses on efficient algorithms that approximate the optimal seed set with provable guarantees, given knowledge of the entire network. However, in practice, obtaining full knowledge of the network is very costly. To address this gap, we first study the achievable guarantees using $o(n)$ influence samples. We provide an approximation algorithm with a tight $(1-1/e)\mathrm{OPT}-\epsilon n$ guarantee, using $O_{\epsilon}(k^2\log n)$ influence samples, and show that this dependence on $k$ is asymptotically optimal. We then propose a probing algorithm that queries $O_{\epsilon}(p n^2\log^4 n + \sqrt{k p}\, n^{1.5}\log^{5.5} n + k n\log^{3.5} n)$ edges from the graph and uses them to find a seed set with the same almost tight approximation guarantee. We also provide a matching (up to logarithmic factors) lower bound on the required number of edges. To address the dependence of our probing algorithm on the independent cascade probability $p$, we show that it is impossible to maintain the same approximation guarantees by controlling the discrepancy between the probing and seeding cascade probabilities. Instead, we propose to down-sample the probed edges to match the seeding cascade probability, provided that it does not exceed that of probing. Finally, we test our algorithms on real-world data to quantify the trade-off between the cost of obtaining more refined network information and the benefit of the added information for guiding improved seeding strategies.

arXiv.org e-Print Archive

DSpace@MIT

D-Scholarship@Pitt
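
The independent cascade model referred to in the abstract above can be sketched as a Monte Carlo simulation that estimates the expected spread of a given seed set. This is only an illustration of the diffusion model, not the paper's sampling or probing algorithms; the adjacency-dict format, the function names, the trial count, and the fixed random seed are assumptions.

```python
import random
from collections import deque


def simulate_cascade(adj, seeds, p, rng):
    """Run one independent cascade and return the number of activated nodes.

    Each newly activated node gets a single chance to activate each
    still-inactive neighbor, succeeding independently with probability p.
    """
    active = set(seeds)
    frontier = deque(seeds)
    while frontier:
        u = frontier.popleft()
        for v in adj.get(u, ()):
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return len(active)


def expected_spread(adj, seeds, p, trials=1000, seed=0):
    """Average cascade size over `trials` independent simulations."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    total = sum(simulate_cascade(adj, seeds, p, rng) for _ in range(trials))
    return total / trials


# Small undirected example graph (hypothetical data).
graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a"],
    "d": ["b"],
    "e": [],  # isolated node, never reached from "a"
}
spread_certain = expected_spread(graph, ["a"], p=1.0)
spread_none = expected_spread(graph, ["a"], p=0.0)
spread_half = expected_spread(graph, ["a"], p=0.5)
```

With `p=1.0` the cascade reaches every node connected to the seed, and with `p=0.0` only the seed itself is active; intermediate values of `p` fall between those two extremes.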

Robust Densest Subgraph Discovery

Authors: Miyauchi Atsushi, Takeda Akiko
Publication date: 13/09/2018

Dense subgraph discovery is an important primitive in graph mining, which has a wide variety of applications in diverse domains. In the densest subgraph problem, given an undirected graph $G=(V,E)$ with an edge-weight vector $w=(w_e)_{e\in E}$, we aim to find $S\subseteq V$ that maximizes the density, i.e., $w(S)/|S|$, where $w(S)$ is the sum of the weights of the edges in the subgraph induced by $S$. Although the densest subgraph problem is one of the most well-studied optimization problems for dense subgraph discovery, there is an implicit strong assumption; it is assumed that the weights of all the edges are known exactly as input. In real-world applications, there are often cases where we have only uncertain information of the edge weights. In this study, we provide a framework for dense subgraph discovery under the uncertainty of edge weights. Specifically, we address such an uncertainty issue using the theory of robust optimization. First, we formulate our fundamental problem, the robust densest subgraph problem, and present a simple algorithm. We then formulate the robust densest subgraph problem with sampling oracle that models dense subgraph discovery using an edge-weight sampling oracle, and present an algorithm with a strong theoretical performance guarantee. Computational experiments using both synthetic graphs and popular real-world graphs demonstrate the effectiveness of our proposed algorithms. Comment: 10 pages; Accepted to ICDM 201

arXiv.org e-Print Archive

Crossref
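
The density objective $w(S)/|S|$ defined in the abstract above can be made concrete with a brute-force search over all vertex subsets of a tiny graph. This sketch covers only the classical problem with exactly known weights, not the robust variants the paper proposes, and exhaustive search is practical only for a handful of vertices; the edge-dict format and function names are assumptions.

```python
from itertools import combinations


def density(edges, subset):
    """Return w(S)/|S|: total weight of edges inside `subset`, divided by its size."""
    inside = sum(w for (u, v), w in edges.items() if u in subset and v in subset)
    return inside / len(subset)


def densest_subgraph_bruteforce(edges):
    """Try every non-empty vertex subset and return (best_subset, best_density)."""
    vertices = sorted({x for edge in edges for x in edge})
    best_subset, best_density = None, float("-inf")
    for size in range(1, len(vertices) + 1):
        for combo in combinations(vertices, size):
            candidate = frozenset(combo)
            d = density(edges, candidate)
            if d > best_density:
                best_subset, best_density = candidate, d
    return best_subset, best_density


# Triangle a-b-c with unit weights plus a lighter pendant edge c-d (hypothetical data).
weights = {
    ("a", "b"): 1.0,
    ("b", "c"): 1.0,
    ("a", "c"): 1.0,
    ("c", "d"): 0.5,
}
best_set, best_value = densest_subgraph_bruteforce(weights)
```

Here the triangle has density 3/3 while the whole graph has 3.5/4, so the search returns the triangle.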