Search CORE


Importance Sketching of Influence Dynamics in Billion-scale Networks

Authors: Dinh Thang N.; Nguyen Hung T.; Nguyen Tri P.; Phan NhatHai
Publication date: 11/09/2017

The blooming availability of traces for social, biological, and communication networks opens up unprecedented opportunities for analyzing diffusion processes in networks. However, the sheer size of today's networks raises serious challenges for computational efficiency and scalability. In this paper, we propose a new hypergraph sketching framework for influence dynamics in networks. The core of our sketching framework, called SKIS, is an efficient importance sampling algorithm that returns only non-singular reverse cascades in the network (reverse cascades that contain more than just their root node). Compared to previously developed sketches like RIS and SKIM, our sketch significantly enhances estimation quality while substantially reducing processing time and memory footprint. Further, we present general strategies for using SKIS to enhance existing algorithms for influence estimation and influence maximization, which are motivated by practical applications like viral marketing. Using SKIS, we design a high-quality influence oracle for seed sets with average estimation error up to 10x smaller than oracles using RIS and 6x smaller than those using SKIM. In addition, our influence maximization approach using SKIS substantially improves the quality of solutions for greedy algorithms. It achieves up to 10x speed-up and 4x memory reduction for the fastest RIS-based DSSA algorithm, while maintaining the same theoretical guarantees.
Comment: 12 pages, to appear in ICDM 2017 as a regular paper
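The importance-sampling idea behind SKIS can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: it assumes the independent cascade (IC) model with edge probabilities $p(u,v)$ and no self-loops, and all function names are hypothetical. Under IC, the reverse cascade $R_v$ from root $v$ is singular with probability $1 - \sigma_v$, where $\sigma_v = 1 - \prod_{u \to v}(1 - p(u,v))$, and a singular sample $\{v\}$ hits a seed set $S$ only when $v \in S$. Splitting the reverse-sampling identity $I(S) = \sum_v \Pr[S \cap R_v \neq \emptyset]$ on that event gives $I(S) = \sum_{v \in S}(1 - \sigma_v) + \left(\sum_v \sigma_v\right) \Pr[S \cap R \neq \emptyset]$, where the root of $R$ is drawn with probability proportional to $\sigma_v$ and $R$ is conditioned to be non-singular, so only non-singular cascades are ever generated.

```python
import random
from collections import defaultdict, deque
from itertools import accumulate


def build_in_edges(edges):
    """Map each node v to its list of (u, p) pairs for edges u -> v with probability p."""
    in_edges = defaultdict(list)
    for u, v, p in edges:
        in_edges[v].append((u, p))
    return in_edges


def nonsingular_prob(in_list):
    """sigma_v = 1 - prod(1 - p): probability that at least one in-edge of v is live."""
    dead = 1.0
    for _, p in in_list:
        dead *= 1.0 - p
    return 1.0 - dead


def sample_nonsingular_rr(v, in_edges, rng):
    """Reverse-reachable set of v under IC, conditioned on containing more than v.

    Assumes no self-loops and at least one in-edge of v with p > 0 (sigma_v > 0).
    """
    in_list = in_edges[v]
    sigma = nonsingular_prob(in_list)
    # Pick the FIRST live in-edge i with probability p_i * prod_{j<i}(1 - p_j) / sigma.
    first = max(i for i, (_, p) in enumerate(in_list) if p > 0)  # float-rounding fallback
    r = rng.random() * sigma
    dead_prefix = 1.0
    for i, (_, p) in enumerate(in_list):
        mass = dead_prefix * p
        if r < mass:
            first = i
            break
        r -= mass
        dead_prefix *= 1.0 - p
    # In-edges after the first live one remain independent coin flips.
    live = [in_list[first][0]] + [u for u, p in in_list[first + 1:] if rng.random() < p]
    reached = {v}
    queue = deque()
    for u in live:
        if u not in reached:
            reached.add(u)
            queue.append(u)
    # Ordinary reverse BFS from the live in-neighbours of v; each edge is flipped at most once.
    while queue:
        x = queue.popleft()
        for u, p in in_edges[x]:
            if u not in reached and rng.random() < p:
                reached.add(u)
                queue.append(u)
    return reached


def skis_style_influence(seeds, nodes, edges, num_samples, seed=0):
    """Estimate the expected IC spread of `seeds` using only non-singular reverse samples."""
    rng = random.Random(seed)
    in_edges = build_in_edges(edges)
    sigma = {v: nonsingular_prob(in_edges[v]) for v in nodes}
    seeds = set(seeds)
    # Singular samples R = {v} hit the seed set exactly when v itself is a seed.
    singular_part = sum(1.0 - sigma[v] for v in seeds)
    total = sum(sigma.values())
    if total == 0.0:
        return singular_part  # no node can ever have a non-singular reverse cascade
    roots = list(nodes)
    cum = list(accumulate(sigma[v] for v in roots))
    hits = 0
    # Roots are drawn with probability sigma_v / total (zero-weight roots are never chosen).
    for v in rng.choices(roots, cum_weights=cum, k=num_samples):
        if seeds & sample_nonsingular_rr(v, in_edges, rng):
            hits += 1
    return singular_part + total * hits / num_samples
```

For the path 0 → 1 → 2 with both probabilities 0.5, the exact spread of seed set {0} is 1 + 0.5 + 0.25 = 1.75, and `skis_style_influence({0}, [0, 1, 2], [(0, 1, 0.5), (1, 2, 0.5)], 20000)` should land close to it. Note that node 0 has no in-edges, so it is never drawn as a root; its contribution comes entirely from the closed-form singular term.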

arXiv.org e-Print Archive

Crossref

Fast Similarity Sketching

Authors: Dahlgaard Søren; Knudsen Mathias Bæk Tejs; Thorup Mikkel
Publication date: 01/01/2017

We consider the Similarity Sketching problem: Given a universe $[u] = \{0,\ldots,u-1\}$, we want a random function $S$ mapping subsets $A \subseteq [u]$ into vectors $S(A)$ of size $t$, such that similarity is preserved. More precisely: Given sets $A, B \subseteq [u]$, define $X_i = [S(A)[i] = S(B)[i]]$ and $X = \sum_{i \in [t]} X_i$. We want to have $E[X] = t \cdot J(A,B)$, where $J(A,B) = |A \cap B| / |A \cup B|$, and furthermore to have strong concentration guarantees (i.e. Chernoff-style bounds) for $X$. This is a fundamental problem which has found numerous applications in data mining, large-scale classification, computer vision, similarity search, etc. via the classic MinHash algorithm. The vectors $S(A)$ are also called sketches. The seminal $t\times$MinHash algorithm uses $t$ random hash functions $h_1,\ldots,h_t$, and stores $\left(\min_{a \in A} h_1(a),\ldots,\min_{a \in A} h_t(a)\right)$ as the sketch of $A$. The main drawback of MinHash is, however, its $O(t \cdot |A|)$ running time, and finding a sketch with similar properties and faster running time has been the subject of several papers. Addressing this, Li et al. [NIPS'12] introduced one permutation hashing (OPH), which creates a sketch of size $t$ in $O(t + |A|)$ time, but with the drawback that possibly some of the $t$ entries are "empty" when $|A| = O(t)$. One could argue that sketching is not necessary in this case; however, the desire in most applications is to have one sketching procedure that works for sets of all sizes. Therefore, filling out these empty entries is the subject of several follow-up papers initiated by Shrivastava and Li [ICML'14]. However, these "densification" schemes fail to provide good concentration bounds exactly in the case $|A| = O(t)$, where they are needed. (continued...)
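The $t\times$MinHash construction and the OPH trade-off described above can be sketched as follows. This is a minimal illustration, not the paper's fast similarity sketch: keyed BLAKE2b (keyed by the hash-function index) stands in for the idealized random hash functions $h_1,\ldots,h_t$, elements are assumed to be non-negative integers below $2^{64}$, and the function names are hypothetical. MinHash is unbiased because, for an ideal random $h_i$, the element of $A \cup B$ with the smallest hash is equally likely to be any element of $A \cup B$, and the two minima coincide exactly when that element lies in $A \cap B$; hence $E[X_i] = J(A,B)$ and $E[X] = t \cdot J(A,B)$.

```python
import hashlib


def _hash(x, key):
    """64-bit keyed hash of a non-negative integer (BLAKE2b as a stand-in for a random hash)."""
    digest = hashlib.blake2b(x.to_bytes(8, "little"), digest_size=8,
                             key=key.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "little")


def minhash_sketch(A, t):
    """t x MinHash: entry i is min over a in A of h_i(a). Costs O(t * |A|) hash evaluations."""
    return [min(_hash(a, i) for a in A) for i in range(t)]


def oph_sketch(A, t, key=0):
    """One permutation hashing: one hash, its range split into t bins. O(t + |A|) time.

    Entry b holds the smallest hash value landing in bin b, or None if the bin is empty.
    """
    sketch = [None] * t
    for a in A:
        v = _hash(a, key)
        b = (v * t) >> 64  # bin index in [0, t) since v < 2**64
        if sketch[b] is None or v < sketch[b]:
            sketch[b] = v
    return sketch


def jaccard_estimate(sa, sb):
    """X / t: the fraction of matching entries, an unbiased estimate of J(A, B) for MinHash."""
    return sum(x == y for x, y in zip(sa, sb)) / len(sa)


def oph_jaccard_estimate(sa, sb):
    """OPH estimate that skips bins empty in both sketches, since they carry no information."""
    used = [(x, y) for x, y in zip(sa, sb) if x is not None or y is not None]
    return sum(x == y for x, y in used) / len(used)
```

For a set with fewer elements than bins, such as `oph_sketch({1, 2, 3}, 16)`, at least 13 of the 16 entries are `None`; this is the empty-entry problem for $|A| = O(t)$ that the densification schemes mentioned above try to address.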

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System

Coresets-Methods and History: A Theoreticians Design Pattern for Approximation and Streaming Algorithms

Authors: Munteanu Alexander; Schwiegelshohn Chris
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2018

We present a technical survey of state-of-the-art approaches in data reduction and the coreset framework. These include geometric decompositions, gradient methods, random sampling, sketching, and random projections. We further outline their importance for the design of streaming algorithms and give a brief overview of lower-bounding techniques.

Archivio della ricerca- Università di Roma La Sapienza