Compressive PCA for Low-Rank Matrices on Graphs
We introduce a novel framework for approximate recovery of data matrices
which are low-rank on graphs, from sampled measurements. The rows and columns
of such matrices belong to the span of the first few eigenvectors of the graphs
constructed between their rows and columns. We leverage this property to
recover the non-linear low-rank structures efficiently from sampled data
measurements, with a low cost (linear in n). First, a Restricted Isometry
Property (RIP) condition is introduced for efficient uniform sampling of the
rows and columns of such matrices based on the cumulative coherence of graph
eigenvectors. Secondly, a state-of-the-art fast low-rank recovery method is
suggested for the sampled data. Finally, several efficient, parallel and
parameter-free decoders are presented along with their theoretical analysis for
decoding the low-rank and cluster indicators for the full data matrix. Thus, we
overcome the computational limitations of the standard linear low-rank recovery
methods for big datasets. Our method can also be seen as a major step towards
efficient recovery of non-linear low-rank structures. For a matrix of size n X
p, on a single core machine, our method gains a speed up of over Robust
Principal Component Analysis (RPCA), where k << p is the subspace dimension.
Numerically, we can recover a low-rank matrix of size 10304 X 1000, 100 times
faster than Robust PCA.
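
As a rough illustration of the sampling step described in this abstract, the sketch below draws rows and columns of a matrix uniformly at random, without replacement, and returns the resulting submatrix. It is only a minimal sketch: the function name `sample_rows_and_columns` and its parameters are hypothetical, and it does not build any graph, check a RIP condition, or perform the low-rank recovery and decoding stages.

```python
import random


def sample_rows_and_columns(matrix, n_rows, n_cols, seed=None):
    """Uniformly sample rows and columns of a list-of-lists matrix.

    Hypothetical helper for illustration only; it covers the uniform
    sampling idea, not the recovery or decoding stages.
    """
    rng = random.Random(seed)
    # Draw distinct indices uniformly at random, then sort them so the
    # submatrix preserves the original row and column order.
    row_idx = sorted(rng.sample(range(len(matrix)), n_rows))
    col_idx = sorted(rng.sample(range(len(matrix[0])), n_cols))
    sub = [[matrix[i][j] for j in col_idx] for i in row_idx]
    return sub, row_idx, col_idx


if __name__ == "__main__":
    # A small 4 x 5 matrix whose entry at (i, j) is 10 * i + j.
    data = [[10 * i + j for j in range(5)] for i in range(4)]
    sub, rows, cols = sample_rows_and_columns(data, 2, 3, seed=0)
    print(rows, cols, sub)
```

The returned index lists are what a later decoding stage would need in order to map results on the sampled submatrix back to the full matrix.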
Graph Sample and Hold: A Framework for Big-Graph Analytics
Sampling is a standard approach in big-graph analytics; the goal is to
efficiently estimate the graph properties by consulting a sample of the whole
population. A perfect sample is assumed to mirror every property of the whole
population. Unfortunately, such a perfect sample is hard to collect in complex
populations such as graphs (e.g., web graphs, social networks, etc.), where an
underlying network connects the units of the population. Therefore, a good
sample will be representative in the sense that graph properties of interest
can be estimated with a known degree of accuracy. While previous work focused
particularly on sampling schemes used to estimate certain graph properties
(e.g. triangle count), much less is known for the case when we need to estimate
various graph properties with the same sampling scheme. In this paper, we
propose a generic stream sampling framework for big-graph analytics, called
Graph Sample and Hold (gSH). To begin, the proposed framework samples from
massive graphs sequentially in a single pass, one edge at a time, while
maintaining a small state. We then show how to produce unbiased estimators for
various graph properties from the sample. Given that the graph analysis
algorithms will run on a sample instead of the whole population, the runtime
complexity of these algorithms is kept under control. Moreover, given that the
estimators of graph properties are unbiased, the approximation error is kept
under control. Finally, we show the performance of the proposed framework (gSH)
on various types of graphs, such as social graphs, among others.
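
The single-pass, small-state idea in this abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: it assumes a two-probability scheme in which an arriving edge is kept with probability `q` if it touches an already-sampled node and with probability `p` otherwise, and it estimates only the total edge count by inverse-probability weighting. The function name `gsh_edge_count` and its parameters are hypothetical.

```python
import random


def gsh_edge_count(edge_stream, p, q, seed=None):
    """One-pass edge sampling with an inverse-probability edge-count estimate.

    Assumed scheme (for illustration): keep an edge with probability q if
    it is adjacent to a previously sampled edge, and with probability p
    otherwise.
    """
    rng = random.Random(seed)
    sampled_nodes = set()   # the state: endpoints of edges kept so far
    sampled = []            # (u, v, probability with which the edge was kept)
    estimate = 0.0
    for u, v in edge_stream:
        adjacent = u in sampled_nodes or v in sampled_nodes
        prob = q if adjacent else p
        if rng.random() < prob:
            sampled.append((u, v, prob))
            sampled_nodes.update((u, v))
            # Each kept edge contributes 1 / prob. Conditioned on the
            # stream seen so far, this term has expectation 1, so the
            # running sum is an unbiased estimate of the edge count.
            estimate += 1.0 / prob
    return estimate, sampled


if __name__ == "__main__":
    edges = [(1, 2), (2, 3), (3, 1), (4, 5)]
    est, kept = gsh_edge_count(edges, p=0.5, q=0.9, seed=1)
    print(est, kept)
```

Setting `p = q = 1` keeps every edge and returns the exact edge count, which is a convenient sanity check. Estimators for other properties, such as triangle counts, would need the joint selection probabilities of several edges and are not shown here.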