
13,202 research outputs found

TPA: Fast, Scalable, and Accurate Method for Approximate Random Walk with Restart on Billion Scale Graphs

Author: Jung Jinhong
Kang U
Yoon Minji
Publication venue
Publication date: 03/12/2017
Field of study

Given a large graph, how can we determine similarity between nodes in a fast and accurate way? Random walk with restart (RWR) is a popular measure for this purpose and has been exploited in numerous data mining applications including ranking, anomaly detection, link prediction, and community detection. However, previous methods for computing exact RWR require prohibitive storage sizes and computational costs, and alternative methods which avoid such costs by computing approximate RWR have limited accuracy. In this paper, we propose TPA, a fast, scalable, and highly accurate method for computing approximate RWR on large graphs. TPA exploits two important properties in RWR: 1) nodes close to a seed node are likely to be revisited in following steps due to the block-wise structure of many real-world graphs, and 2) RWR scores of nodes which reside far from the seed node are proportional to their PageRank scores. Based on these two properties, TPA divides the approximate RWR problem into two subproblems called neighbor approximation and stranger approximation. In the neighbor approximation, TPA estimates RWR scores of nodes close to the seed based on scores of a few early steps from the seed. In the stranger approximation, TPA estimates RWR scores for nodes far from the seed using their PageRank. The stranger and neighbor approximations are conducted in the preprocessing phase and the online phase, respectively. Through extensive experiments, we show that TPA requires up to 3.5x less time with up to 40x less memory space than other state-of-the-art methods for the preprocessing phase. In the online phase, TPA computes approximate RWR up to 30x faster than existing methods while maintaining high accuracy. Comment: 12 pages, 10 figures
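For readers unfamiliar with the RWR measure that this abstract builds on, the following is a minimal sketch of plain RWR computed by power iteration. It is not TPA and does not reproduce the neighbor or stranger approximations; the function name `rwr`, the adjacency-dict input, the restart probability of 0.15, and the handling of dangling nodes are all assumptions made for illustration.

```python
def rwr(adj, seed, restart=0.15, tol=1e-10, max_iters=1000):
    """Random walk with restart by power iteration (illustrative sketch).

    adj  : dict mapping each node to a list of its out-neighbours;
           every neighbour must itself be a key of adj (assumption).
    seed : the node the walker restarts from.
    Returns a dict node -> RWR score; the scores sum to 1.
    """
    scores = {v: 0.0 for v in adj}
    scores[seed] = 1.0
    for _ in range(max_iters):
        nxt = {v: 0.0 for v in adj}
        # With probability `restart` the walker jumps back to the seed.
        nxt[seed] = restart
        for u, out in adj.items():
            walk_mass = (1.0 - restart) * scores[u]
            if not out:
                # Assumption: a walker stuck on a dangling node returns to the seed.
                nxt[seed] += walk_mass
                continue
            for v in out:
                nxt[v] += walk_mass / len(out)
        delta = sum(abs(nxt[v] - scores[v]) for v in adj)
        scores = nxt
        if delta < tol:
            break
    return scores


if __name__ == "__main__":
    # A four-node undirected path a - b - c - d, seeded at "a".
    path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    for node, score in sorted(rwr(path, "a").items()):
        print(node, round(score, 4))
```

On the small path graph in the usage example, scores fall off with distance from the seed beyond its immediate neighbour, which is the locality that the abstract's neighbor approximation relies on.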

arXiv.org e-Print Archive

Crossref

SNU Open Repository and Archive

Pseudorandom number generators revisited

Author: Annink B.
Kleijnen J.P.C.
Publication venue
Publication date
Field of study

Statistical Methods;mathematische statistiek

Research Papers in Economics

Efficient Seeds Computation Revisited

Author: A. Apostolico
C.S. Iliopoulos
D. Breslauer
G.S. Brodal
J. Fischer
K. Sadakane
M. Crochemore
O. Berkman
Y. Li
Publication venue
Publication date: 01/01/2011
Field of study

The notion of the cover is a generalization of a period of a string, and there are linear time algorithms for finding the shortest cover. The seed is a more complicated generalization of periodicity: it is a cover of a superstring of a given string, and the shortest seed problem is of much higher algorithmic difficulty. The problem is not well understood, and no linear time algorithm is known. In the paper we give linear time algorithms for some of its versions --- computing shortest left-seed array, longest left-seed array and checking for seeds of a given length. The algorithm for the last problem is used to compute the seed array of a string (i.e., the shortest seeds for all the prefixes of the string) in $O(n^2)$ time. We also describe a simpler alternative algorithm computing efficiently the shortest seeds. As a by-product we obtain an $O(n\log{(n/m)})$ time algorithm checking if the shortest seed has length at least $m$ and finding the corresponding seed. We also correct some important details missing in the previously known shortest-seed algorithm (Iliopoulos et al., 1996). Comment: 14 pages, accepted to CPM 201
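To make the definition of a cover concrete, here is a small brute-force sketch. It only illustrates the definition (a string w covers s when every position of s lies inside some occurrence of w in s); it is not the linear-time algorithm mentioned in the abstract and it does not handle seeds. The names `is_cover` and `shortest_cover` are hypothetical.

```python
def is_cover(w, s):
    """Return True if every position of s lies inside an occurrence of w."""
    if not w or len(w) > len(s):
        return False
    covered_up_to = 0  # s[:covered_up_to] is covered so far
    start = s.find(w)
    while start != -1:
        if start > covered_up_to:
            # A gap: position covered_up_to is inside no occurrence of w.
            return False
        covered_up_to = start + len(w)
        start = s.find(w, start + 1)  # allow overlapping occurrences
    return covered_up_to == len(s)


def shortest_cover(s):
    """Brute-force shortest cover; a cover of s is always a prefix of s."""
    for m in range(1, len(s) + 1):
        if is_cover(s[:m], s):
            return s[:m]
    return s  # only reached for the empty string


if __name__ == "__main__":
    print(shortest_cover("abaababaaba"))  # prints "aba"
    print(shortest_cover("abc"))          # prints "abc"
```

The brute-force search tries each prefix in turn, so it runs in quadratic time or worse; it is meant as a reference for checking small cases by hand.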

arXiv.org e-Print Archive

CiteSeerX

Crossref

King's Research Portal

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Low-shot learning with large-scale diffusion

Author: Douze Matthijs
Hariharan Bharath
Jégou Hervé
Szlam Arthur
Publication venue
Publication date: 15/06/2018
Field of study

This paper considers the problem of inferring image labels from images when only a few annotated examples are available at training time. This setup is often referred to as low-shot learning, where a standard approach is to re-train the last few layers of a convolutional neural network learned on separate classes for which training examples are abundant. We consider a semi-supervised setting based on a large collection of images to support label propagation. This is possible by leveraging the recent advances on large-scale similarity graph construction. We show that despite its conceptual simplicity, scaling label propagation up to hundreds of millions of images leads to state-of-the-art accuracy in the low-shot learning regime.
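As a rough illustration of label propagation on a similarity graph, the sketch below spreads class scores from a few labelled nodes to their unlabelled neighbours by repeated averaging. It is a generic textbook variant on a toy graph, not the diffusion procedure or graph construction used in the paper; the function name `propagate_labels`, the unweighted adjacency dict, and the clamping of labelled nodes are assumptions.

```python
def propagate_labels(adj, seeds, num_classes, iters=100):
    """Propagate labels over an unweighted graph (illustrative sketch).

    adj   : dict node -> list of neighbouring nodes (all present as keys).
    seeds : dict node -> known class index for the few labelled nodes.
    Returns dict node -> predicted class index.
    """
    scores = {v: [0.0] * num_classes for v in adj}
    for v, c in seeds.items():
        scores[v][c] = 1.0
    for _ in range(iters):
        nxt = {}
        for v, nbrs in adj.items():
            if v in seeds:
                # Labelled nodes are clamped to their known class.
                nxt[v] = scores[v]
                continue
            acc = [0.0] * num_classes
            for u in nbrs:
                for k in range(num_classes):
                    acc[k] += scores[u][k]
            # Each unlabelled node takes the average of its neighbours' scores.
            nxt[v] = [x / len(nbrs) for x in acc] if nbrs else acc
        scores = nxt
    return {v: max(range(num_classes), key=lambda k: s[k])
            for v, s in scores.items()}


if __name__ == "__main__":
    # Six nodes on a path, with one labelled example at each end.
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
    print(propagate_labels(path, {0: 0, 5: 1}, num_classes=2))
```

In the usage example each unlabelled node ends up with the class of the nearer labelled endpoint, which is the behaviour one would expect from propagation along a similarity graph.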

arXiv.org e-Print Archive

Crossref

Interactive Channel Capacity Revisited

Author: Haeupler Bernhard
Publication venue
Publication date: 01/01/2014
Field of study

We provide the first capacity approaching coding schemes that robustly simulate any interactive protocol over an adversarial channel that corrupts any $\epsilon$ fraction of the transmitted symbols. Our coding schemes achieve a communication rate of $1 - O(\sqrt{\epsilon \log \log 1/\epsilon})$ over any adversarial channel. This can be improved to $1 - O(\sqrt{\epsilon})$ for random, oblivious, and computationally bounded channels, or if parties have shared randomness unknown to the channel. Surprisingly, these rates exceed the $1 - \Omega(\sqrt{H(\epsilon)}) = 1 - \Omega(\sqrt{\epsilon \log 1/\epsilon})$ interactive channel capacity bound which [Kol and Raz; STOC'13] recently proved for random errors. We conjecture $1 - \Theta(\sqrt{\epsilon \log \log 1/\epsilon})$ and $1 - \Theta(\sqrt{\epsilon})$ to be the optimal rates for their respective settings and therefore to capture the interactive channel capacity for random and adversarial errors. In addition to being very communication efficient, our randomized coding schemes have multiple other advantages. They are computationally efficient, extremely natural, and significantly simpler than prior (non-capacity approaching) schemes. In particular, our protocols do not employ any coding but allow the original protocol to be performed as-is, interspersed only by short exchanges of hash values. When hash values do not match, the parties backtrack. Our approach is, as we feel, by far the simplest and most natural explanation for why and how robust interactive communication in a noisy environment is possible.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Recursive Online Enumeration of All Minimal Unsatisfiable Subsets

Author: Anton Belov
Fahiem Bacchus
J Bailey
J Bendík
MH Liffiton
Publication venue
Publication date: 08/05/2018
Field of study

In various areas of computer science, we deal with a set of constraints to be satisfied. If the constraints cannot be satisfied simultaneously, it is desirable to identify the core problems among them. Such cores are called minimal unsatisfiable subsets (MUSes). The more MUSes are identified, the more information about the conflicts among the constraints is obtained. However, a full enumeration of all MUSes is in general intractable due to the large (even exponential) number of possible conflicts. Moreover, to identify MUSes, algorithms must test sets of constraints for their simultaneous satisfiability. The type of the test depends on the application domain. The complexity of the tests can be extremely high, especially for domains like temporal logics, model checking, or SMT. In this paper, we propose a recursive algorithm that identifies MUSes in an online manner (i.e., one by one) and can be terminated at any time. The key feature of our algorithm is that it minimizes the number of satisfiability tests and thus speeds up the computation. The algorithm is applicable to an arbitrary constraint domain, and its effectiveness shows especially in domains with expensive satisfiability checks. We benchmark our algorithm against a state-of-the-art algorithm on Boolean and SMT constraint domains and demonstrate that our algorithm requires fewer satisfiability tests and consequently finds more MUSes within given time limits.
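To illustrate what a minimal unsatisfiable subset is and why the number of satisfiability tests matters, the sketch below extracts a single MUS with the classic deletion-based shrinking procedure. This is not the recursive online enumeration algorithm proposed in the paper. The toy constraint domain (closed integer intervals on one variable) and the names `intervals_sat` and `shrink_to_mus` are assumptions chosen so that the example runs without a solver.

```python
def intervals_sat(constraints):
    """Toy satisfiability oracle: each constraint (lo, hi) demands lo <= x <= hi.

    A set of intervals is satisfiable when they share a common point.
    """
    if not constraints:
        return True
    return max(lo for lo, _ in constraints) <= min(hi for _, hi in constraints)


def shrink_to_mus(constraints, is_sat):
    """Deletion-based shrinking of an unsatisfiable set to one MUS.

    Uses one oracle call per candidate removal, so the number of
    satisfiability tests is linear in the size of the input set.
    """
    if is_sat(constraints):
        raise ValueError("input set is satisfiable; it contains no MUS")
    core = list(constraints)
    i = 0
    while i < len(core):
        candidate = core[:i] + core[i + 1:]
        if is_sat(candidate):
            i += 1            # constraint i is necessary for the conflict
        else:
            core = candidate  # still unsatisfiable without it, so drop it
    return core


if __name__ == "__main__":
    constraints = [(0, 5), (3, 8), (6, 9), (1, 4)]
    print(shrink_to_mus(constraints, intervals_sat))  # prints [(6, 9), (1, 4)]
```

The result is unsatisfiable, and removing either remaining interval makes it satisfiable, which is exactly the minimality condition. Enumerating all MUSes, as the abstract discusses, requires exploring many such subsets, and each exploration step costs at least one oracle call.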

arXiv.org e-Print Archive

Crossref