14,263 research outputs found
Importance Sketching of Influence Dynamics in Billion-scale Networks
The blooming availability of traces for social, biological, and communication
networks opens up unprecedented opportunities in analyzing diffusion processes
in networks. However, the sheer sizes of the nowadays networks raise serious
challenges in computational efficiency and scalability.
In this paper, we propose a new hyper-graph sketching framework for inflence
dynamics in networks. The central of our sketching framework, called SKIS, is
an efficient importance sampling algorithm that returns only non-singular
reverse cascades in the network. Comparing to previously developed sketches
like RIS and SKIM, our sketch significantly enhances estimation quality while
substantially reducing processing time and memory-footprint. Further, we
present general strategies of using SKIS to enhance existing algorithms for
influence estimation and influence maximization which are motivated by
practical applications like viral marketing. Using SKIS, we design high-quality
influence oracle for seed sets with average estimation error up to 10x times
smaller than those using RIS and 6x times smaller than SKIM. In addition, our
influence maximization using SKIS substantially improves the quality of
solutions for greedy algorithms. It achieves up to 10x times speed-up and 4x
memory reduction for the fastest RIS-based DSSA algorithm, while maintaining
the same theoretical guarantees.Comment: 12 pages, to appear in ICDM 2017 as a regular pape
Fast Similarity Sketching
We consider the Similarity Sketching problem: Given a universe we want a random function mapping subsets into vectors of size , such that similarity is preserved. More
precisely: Given sets , define and
. We want to have , where
and furthermore to have strong concentration
guarantees (i.e. Chernoff-style bounds) for . This is a fundamental problem
which has found numerous applications in data mining, large-scale
classification, computer vision, similarity search, etc. via the classic
MinHash algorithm. The vectors are also called sketches.
The seminal MinHash algorithm uses random hash functions
, and stores as the sketch of . The main drawback of MinHash is,
however, its running time, and finding a sketch with similar
properties and faster running time has been the subject of several papers.
Addressing this, Li et al. [NIPS'12] introduced one permutation hashing (OPH),
which creates a sketch of size in time, but with the drawback
that possibly some of the entries are "empty" when . One could
argue that sketching is not necessary in this case, however the desire in most
applications is to have one sketching procedure that works for sets of all
sizes. Therefore, filling out these empty entries is the subject of several
follow-up papers initiated by Shrivastava and Li [ICML'14]. However, these
"densification" schemes fail to provide good concentration bounds exactly in
the case , where they are needed. (continued...
Coresets-Methods and History: A Theoreticians Design Pattern for Approximation and Streaming Algorithms
We present a technical survey on the state of the art approaches in data reduction and the coreset framework. These include geometric decompositions, gradient methods, random sampling, sketching and random projections. We further outline their importance for the design of streaming algorithms and give a brief overview on lower bounding techniques
- …