A note on quickly sampling a sparse matrix with low rank expectation
Given matrices $X, Y \in \mathbb{R}^{n \times K}$ and $S \in \mathbb{R}^{K \times K}$ with
positive elements, this paper proposes an algorithm fastRG to sample a sparse
matrix $A$ with low rank expectation $E(A) = XSY^T$ and independent Poisson
elements. This allows for quickly sampling from a broad class of stochastic
blockmodel graphs (degree-corrected, mixed membership, overlapping) all of
which are specific parameterizations of the generalized random product graph
model defined in Section 2.2. The basic idea of fastRG is to first sample the
number of edges and then sample each edge. The key insight is that because
of the low rank expectation, it is easy to sample individual edges. The
naive "element-wise" algorithm requires $O(n^2)$ operations to generate the
$n \times n$ adjacency matrix $A$. In sparse graphs, where $m = O(n)$, ignoring
log terms, fastRG runs in time $O(m)$. An implementation of fastRG is available
on github. A computational experiment in Section 2.4 simulates graphs up to
nodes with edges. For example, on a graph with
and , fastRG runs in less than one second on a 3.5
GHz Intel i5.
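
The two-step idea described above (sample the number of edges, then sample each edge) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the expectation factorizes as a product of three matrices with positive entries, here called X, S and Y, and the function name `fastrg_sketch` and the intermediate names `X_tilde`, `Y_tilde` and `S_tilde` are hypothetical. Normalizing the columns of X and Y turns each column into a distribution over nodes, so an edge can be drawn by first choosing a pair of columns and then one node from each.

```python
import numpy as np


def fastrg_sketch(X, S, Y, rng=None):
    """Sample edge endpoints (rows, cols) whose counts form a matrix A with
    independent Poisson entries and E(A) = X @ S @ Y.T.

    Hypothetical helper for illustration; X, S, Y must have positive entries.
    """
    rng = np.random.default_rng() if rng is None else rng
    X, S, Y = (np.asarray(M, dtype=float) for M in (X, S, Y))

    # Rescale so each column of X and Y is a probability distribution over
    # nodes, and move the column totals into the middle matrix.
    cx, cy = X.sum(axis=0), Y.sum(axis=0)
    X_tilde, Y_tilde = X / cx, Y / cy
    S_tilde = cx[:, None] * S * cy[None, :]

    # Step 1: the total number of edges is Poisson with mean sum(E(A)).
    total = S_tilde.sum()
    m = rng.poisson(total)

    # Step 2a: each edge picks a column pair (u, v) with probability
    # proportional to S_tilde[u, v].
    flat = rng.choice(S_tilde.size, size=m, p=(S_tilde / total).ravel())
    u, v = np.unravel_index(flat, S_tilde.shape)

    # Step 2b: each edge picks its row node from column u of X_tilde and its
    # column node from column v of Y_tilde.
    rows = np.empty(m, dtype=np.int64)
    cols = np.empty(m, dtype=np.int64)
    for k in range(S_tilde.shape[0]):
        mask = u == k
        rows[mask] = rng.choice(X.shape[0], size=int(mask.sum()), p=X_tilde[:, k])
    for k in range(S_tilde.shape[1]):
        mask = v == k
        cols[mask] = rng.choice(Y.shape[0], size=int(mask.sum()), p=Y_tilde[:, k])
    return rows, cols
```

The returned index pairs can be accumulated into a sparse matrix; a pair that appears more than once corresponds to an entry with a Poisson count greater than one. The sketch never loops over all pairs of nodes, which is the point of contrast with the element-wise approach.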
Covariate-assisted spectral clustering
Biological and social systems consist of myriad interacting units. The
interactions can be represented in the form of a graph or network. Measurements
of these graphs can reveal the underlying structure of these interactions,
which provides insight into the systems that generated the graphs. Moreover, in
applications such as connectomics, social networks, and genomics, graph data
are accompanied by contextualizing measures on each node. We utilize these node
covariates to help uncover latent communities in a graph, using a modification
of spectral clustering. Statistical guarantees are provided under a joint
mixture model that we call the node-contextualized stochastic blockmodel,
including a bound on the mis-clustering rate. The bound is used to derive
conditions for achieving perfect clustering. For most simulated cases,
covariate-assisted spectral clustering yields results superior to regularized
spectral clustering without node covariates and to an adaptation of canonical
correlation analysis. We apply our clustering method to large brain graphs
derived from diffusion MRI data, using the node locations or neurological
region membership as covariates. In both cases, covariate-assisted spectral
clustering yields clusters that are easier to interpret neurologically.
Comment: 28 pages, 4 figures, includes substantial changes to theoretical
results