A note on quickly sampling a sparse matrix with low rank expectation

Authors: Binkiewicz, Norbert; Han, Xintian; Rohe, Karl; Tao, Jun
Publication date: 08/03/2017

Given matrices $X, Y \in \mathbb{R}^{n \times K}$ and $S \in \mathbb{R}^{K \times K}$ with positive elements, this paper proposes an algorithm, fastRG, to sample a sparse matrix $A$ with low-rank expectation $E(A) = XSY^T$ and independent Poisson elements. This allows for quickly sampling from a broad class of stochastic blockmodel graphs (degree-corrected, mixed membership, overlapping), all of which are specific parameterizations of the generalized random product graph model defined in Section 2.2. The basic idea of fastRG is to first sample the number of edges $m$ and then sample each edge. The key insight is that, because of the low-rank expectation, it is easy to sample individual edges. The naive "element-wise" algorithm requires $O(n^2)$ operations to generate the $n \times n$ adjacency matrix $A$. In sparse graphs, where $m = O(n)$, fastRG runs in time $O(n)$, ignoring log terms. An implementation of fastRG is available on GitHub. A computational experiment in Section 2.4 simulates graphs up to $n = 10{,}000{,}000$ nodes with $m = 100{,}000{,}000$ edges. For example, on a graph with $n = 500{,}000$ and $m = 5{,}000{,}000$, fastRG runs in less than one second on a 3.5 GHz Intel i5.
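The two-step idea in the abstract (draw the edge count $m$, then draw each edge) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `sample_poisson` and `fast_rg_sketch` are hypothetical, and the specific scheme is an assumption of this sketch, not taken from the abstract: a block pair $(k, l)$ is chosen with probability proportional to $S_{kl}$ times the column sums of $X$ and $Y$, then the endpoints $i$ and $j$ are drawn from the normalized columns $k$ of $X$ and $l$ of $Y$.

```python
import math
import random
from collections import Counter
from itertools import accumulate


def sample_poisson(lam, rng):
    """Knuth's multiplication method; only suitable for small means."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def fast_rg_sketch(X, S, Y, rng):
    """Return a Counter mapping (i, j) to edge multiplicity, with E[A] = X S Y^T.

    X and Y are lists of n rows with K positive entries each; S is K x K.
    """
    K = len(S)
    # Cumulative column weights allow O(log n) endpoint draws via bisection.
    cum_x = [list(accumulate(row[k] for row in X)) for k in range(K)]
    cum_y = [list(accumulate(row[l] for row in Y)) for l in range(K)]
    # Expected number of edges routed through each block pair (k, l).
    pairs = [(k, l) for k in range(K) for l in range(K)]
    weights = [cum_x[k][-1] * S[k][l] * cum_y[l][-1] for k, l in pairs]

    # Step 1: sample the total number of edges.
    m = sample_poisson(sum(weights), rng)

    # Step 2: sample each edge independently.
    edges = Counter()
    for k, l in rng.choices(pairs, weights=weights, k=m):
        i = rng.choices(range(len(X)), cum_weights=cum_x[k])[0]
        j = rng.choices(range(len(Y)), cum_weights=cum_y[l])[0]
        edges[(i, j)] += 1
    return edges


if __name__ == "__main__":
    rng = random.Random(0)
    X = [[1.0, 0.5], [0.5, 1.0], [0.2, 0.2]]
    S = [[0.5, 0.1], [0.1, 0.5]]
    print(fast_rg_sketch(X, S, X, rng))
```

The work after the one-time setup is proportional to $m$, up to the logarithmic cost of each draw, and no $n \times n$ array is ever formed. The Poisson sampler here is kept simple for readability; a large expected edge count would need a sampler designed for large means.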

arXiv.org e-Print Archive

Covariate-assisted spectral clustering

Authors: Binkiewicz, Norbert; Rohe, Karl; Vogelstein, Joshua T.
Publication venue: Oxford University Press (OUP)
Publication date: 30/10/2016

Biological and social systems consist of myriad interacting units. The interactions can be represented in the form of a graph or network. Measurements of these graphs can reveal the underlying structure of these interactions, which provides insight into the systems that generated the graphs. Moreover, in applications such as connectomics, social networks, and genomics, graph data are accompanied by contextualizing measures on each node. We utilize these node covariates to help uncover latent communities in a graph, using a modification of spectral clustering. Statistical guarantees are provided under a joint mixture model that we call the node-contextualized stochastic blockmodel, including a bound on the mis-clustering rate. The bound is used to derive conditions for achieving perfect clustering. For most simulated cases, covariate-assisted spectral clustering yields results superior to regularized spectral clustering without node covariates and to an adaptation of canonical correlation analysis. We apply our clustering method to large brain graphs derived from diffusion MRI data, using the node locations or neurological region membership as covariates. In both cases, covariate-assisted spectral clustering yields clusters that are easier to interpret neurologically.

Comment: 28 pages, 4 figures, includes substantial changes to theoretical result

arXiv.org e-Print Archive

CiteSeerX