Counting Spanning Trees of Threshold Graphs
Cayley's formula states that there are $n^{n-2}$ spanning trees in the
complete graph on $n$ vertices; it has been proved in more than a dozen
different ways over its 150-year history. The complete graphs are a special
case of threshold graphs, and using Merris' Theorem and the Matrix Tree
Theorem, there is a strikingly simple formula for counting the number of
spanning trees in a threshold graph on $n$ vertices; it is simply the product,
over $i=2,3,\ldots,n-1$, of the number of vertices of degree at least $i$. In
this manuscript, we provide a direct combinatorial proof for this formula which
does not use the Matrix Tree Theorem; the proof is an extension of Joyal's
proof for Cayley's formula. Then we apply this methodology to give a formula
for the number of spanning trees in any difference graph. Comment: 14 pages, 5 figures
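
As a rough illustration of the degree-product formula described in this abstract, the sketch below builds a small threshold graph and compares the product with a Matrix Tree Theorem count. It is not the paper's combinatorial proof. The creation-sequence encoding (`'i'` adds an isolated vertex, `'d'` adds a vertex adjacent to all earlier ones) and all function names are assumptions made for this example, and the product is taken over degree thresholds 2 through n-1 on a connected threshold graph.

```python
from fractions import Fraction


def threshold_graph(ops):
    """Adjacency matrix of a threshold graph from a creation sequence.

    ops is a string of 'i' (add an isolated vertex) and 'd' (add a vertex
    adjacent to every earlier vertex). This encoding is an assumption of
    the example, not notation taken from the abstract.
    """
    n = len(ops)
    adj = [[0] * n for _ in range(n)]
    for v, op in enumerate(ops):
        if op == "d":
            for u in range(v):
                adj[u][v] = adj[v][u] = 1
    return adj


def count_by_degree_product(adj):
    """Product over i = 2, ..., n-1 of the number of vertices of degree >= i."""
    n = len(adj)
    degrees = [sum(row) for row in adj]
    product = 1
    for i in range(2, n):
        product *= sum(1 for d in degrees if d >= i)
    return product


def count_by_matrix_tree(adj):
    """Spanning tree count as a cofactor of the graph Laplacian."""
    m = len(adj) - 1  # delete the last row and column of the Laplacian
    lap = [
        [Fraction((sum(adj[r]) if r == c else 0) - adj[r][c]) for c in range(m)]
        for r in range(m)
    ]
    det = Fraction(1)
    # Exact Gaussian elimination over the rationals.
    for col in range(m):
        pivot = next((r for r in range(col, m) if lap[r][col] != 0), None)
        if pivot is None:
            return 0
        if pivot != col:
            lap[col], lap[pivot] = lap[pivot], lap[col]
            det = -det
        det *= lap[col][col]
        for r in range(col + 1, m):
            factor = lap[r][col] / lap[col][col]
            for c in range(col, m):
                lap[r][c] -= factor * lap[col][c]
    return int(det)


if __name__ == "__main__":
    for ops in ["ddddd", "iddid", "iiid", "didid"]:
        g = threshold_graph(ops)
        print(ops, count_by_degree_product(g), count_by_matrix_tree(g))
```

The sequence `"ddddd"` yields the complete graph on five vertices, so both counts should agree with Cayley's formula there.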
On the Consistency of the Likelihood Maximization Vertex Nomination Scheme: Bridging the Gap Between Maximum Likelihood Estimation and Graph Matching
Given a graph in which a few vertices are deemed interesting a priori, the
vertex nomination task is to order the remaining vertices into a nomination
list such that there is a concentration of interesting vertices at the top of
the list. Previous work has yielded several approaches to this problem, with
theoretical results in the setting where the graph is drawn from a stochastic
block model (SBM), including a vertex nomination analogue of the Bayes optimal
classifier. In this paper, we prove that maximum likelihood (ML)-based vertex
nomination is consistent, in the sense that the performance of the ML-based
scheme asymptotically matches that of the Bayes optimal scheme. We prove
theorems of this form both when model parameters are known and unknown.
Additionally, we introduce and prove consistency of a related, more scalable
restricted-focus ML vertex nomination scheme. Finally, we incorporate vertex
and edge features into ML-based vertex nomination and briefly explore the
empirical effectiveness of this approach.
On the Incommensurability Phenomenon
Suppose that two large, multi-dimensional data sets are each noisy
measurements of the same underlying random process, and principal components
analysis is performed separately on the data sets to reduce their
dimensionality. In some circumstances it may happen that the two
lower-dimensional data sets have an inordinately large Procrustean
fitting-error between them. The purpose of this manuscript is to quantify this
"incommensurability phenomenon." In particular, under specified conditions, the
square Procrustean fitting-error of the two normalized lower-dimensional data
sets is (asymptotically) a convex combination (via a correlation parameter) of
the Hausdorff distance between the projection subspaces and the maximum
possible value of the square Procrustean fitting-error for normalized data. We
show how this gives rise to the incommensurability phenomenon, and we employ
illustrative simulations as well as a real data experiment to explore how the
incommensurability phenomenon may have an appreciable impact.
A consistent adjacency spectral embedding for stochastic blockmodel graphs
We present a method to estimate block membership of nodes in a random graph
generated by a stochastic blockmodel. We use an embedding procedure motivated
by the random dot product graph model, a particular example of the latent
position model. The embedding associates each node with a vector; these vectors
are clustered via minimization of a square error criterion. We prove that this
method is consistent for assigning nodes to blocks, as only a negligible number
of nodes will be mis-assigned. We prove consistency of the method for directed
and undirected graphs. The consistent block assignment makes possible
consistent parameter estimation for a stochastic blockmodel. We extend the
result in the setting where the number of blocks grows slowly with the number
of nodes. Our method is also computationally feasible even for very large
graphs. We compare our method to Laplacian spectral clustering through analysis
of simulated data and a graph derived from Wikipedia documents. Comment: 21 pages
Vertex nomination schemes for membership prediction
Suppose that a graph is realized from a stochastic block model where one of
the blocks is of interest, but many or all of the vertices' block labels are
unobserved. The task is to order the vertices with unobserved block labels into
a "nomination list" such that, with high probability, vertices from the
interesting block are concentrated near the list's beginning. We propose
several vertex nomination schemes. Our basic but principled setting and
development yield a best nomination scheme (which is a Bayes-Optimal
analogue), and also a likelihood maximization nomination scheme that is
practical to implement when there are a thousand vertices, and which is
empirically near-optimal when the number of vertices is small enough to allow
comparison to the best nomination scheme. We then illustrate the robustness of
the likelihood maximization nomination scheme to the modeling challenges
inherent in real data, using examples which include a social network involving
human trafficking, the Enron Graph, a worm brain connectome and a political
blog network. Comment: Published at http://dx.doi.org/10.1214/15-AOAS834 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
Consistent adjacency-spectral partitioning for the stochastic block model when the model parameters are unknown
For random graphs distributed according to a stochastic block model, we
consider the inferential task of partitioning vertices into blocks using spectral
techniques. Spectral partitioning using the normalized Laplacian and spectral
partitioning using the adjacency matrix have both been shown to be consistent as
the number of vertices tends to infinity. Importantly, both procedures require that the number
of blocks and the rank of the communication probability matrix are known, even
as the rest of the parameters may be unknown. In this article, we prove that
the (suitably modified) adjacency-spectral partitioning procedure, requiring
only an upper bound on the rank of the communication probability matrix, is
consistent. Indeed, this result demonstrates a robustness to model
mis-specification; an overestimate of the rank may impose a moderate
performance penalty, but the procedure is still consistent. Furthermore, we
extend this procedure to the setting where adjacencies may have multiple
modalities and we allow for either directed or undirected graphs. Comment: 26 pages, 2 figures
Graph Matching: Relax at Your Own Risk
Graph matching (aligning a pair of graphs to minimize their edge
disagreements) has received widespread attention from both theoretical and
applied communities over the past several decades, including combinatorics,
computer vision, and connectomics. This attention can be partially attributed to
its computational difficulty. Although many heuristics have previously been
proposed in the literature to approximately solve graph matching, very few have
any theoretical support for their performance. A common technique is to relax
the discrete problem to a continuous problem, therefore enabling practitioners
to bring gradient-descent-type algorithms to bear. We prove that an indefinite
relaxation (when solved exactly) almost always discovers the optimal
permutation, while a common convex relaxation almost always fails to discover
the optimal permutation. These theoretical results suggest that initializing
the indefinite algorithm with the convex optimum might yield improved practical
performance. Indeed, experimental results illuminate and corroborate these
theoretical findings, demonstrating that excellent results are achieved in both
benchmark and real data problems by amalgamating the two approaches. Comment: 14 pages, 11 figures, 3 tables
Seeded Graph Matching
Given two graphs, the graph matching problem is to align the two vertex sets
so as to minimize the number of adjacency disagreements between the two graphs.
The seeded graph matching problem is the graph matching problem when we are
first given a partial alignment that we are tasked with completing. In this
paper, we modify the state-of-the-art approximate graph matching algorithm
"FAQ" of Vogelstein et al. (2015) to make it a fast approximate seeded graph
matching algorithm, adapt its applicability to include graphs with differently
sized vertex sets, and extend the algorithm so as to provide, for each
individual vertex, a nomination list of likely matches. We demonstrate the
effectiveness of our algorithm via simulation and real data experiments;
indeed, knowledge of even a few seeds can be extremely effective when our
seeded graph matching algorithm is used to recover a naturally existing
alignment that is only partially observed. Comment: 24 pages, 10 figures
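
To make the seeded graph matching objective concrete, here is a minimal brute-force sketch. It is not the FAQ-based algorithm the abstract describes. It fixes a given partial alignment (the seeds) and exhaustively searches the remaining vertices for the completion with the fewest adjacency disagreements. All function names are hypothetical, both graphs are assumed to have the same number of vertices, and the exhaustive search is only feasible for very small graphs.

```python
from itertools import permutations


def disagreements(a, b, perm):
    """Count vertex pairs adjacent in exactly one graph under u -> perm[u]."""
    n = len(a)
    return sum(
        1
        for u in range(n)
        for v in range(u + 1, n)
        if a[u][v] != b[perm[u]][perm[v]]
    )


def relabel(a, sigma):
    """Return a copy of graph a whose vertex u is renamed sigma[u]."""
    n = len(a)
    b = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            b[sigma[u]][sigma[v]] = a[u][v]
    return b


def seeded_match(a, b, seeds):
    """Complete a partial alignment by exhaustive search.

    seeds maps some vertices of a to their known partners in b. Returns
    (alignment as a list, number of disagreements) for the best completion.
    """
    n = len(a)
    used = set(seeds.values())
    free_a = [u for u in range(n) if u not in seeds]
    free_b = [w for w in range(n) if w not in used]
    best_perm, best_cost = None, None
    for image in permutations(free_b):
        perm = [None] * n
        for u, w in seeds.items():
            perm[u] = w
        for u, w in zip(free_a, image):
            perm[u] = w
        cost = disagreements(a, b, perm)
        if best_cost is None or cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost


if __name__ == "__main__":
    # Path graph 0-1-2-3 and a relabeled copy of it.
    path = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
    hidden = [2, 0, 3, 1]
    other = relabel(path, hidden)
    print(seeded_match(path, other, {0: 2}))
```

In this toy case the single seed removes the path's reflection symmetry, so the hidden alignment is the only completion with zero disagreements.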
Vertex nomination: The canonical sampling and the extended spectral nomination schemes
Suppose that one particular block in a stochastic block model is of interest,
but block labels are only observed for a few of the vertices in the network.
Utilizing a graph realized from the model and the observed block labels, the
vertex nomination task is to order the vertices with unobserved block labels
into a ranked nomination list with the goal of having an abundance of
interesting vertices near the top of the list. There are vertex nomination
schemes in the literature, including the optimally precise canonical nomination
scheme and the consistent spectral partitioning nomination scheme. While the
canonical nomination scheme is provably optimally precise, it is computationally
intractable, being impractical to implement even on modestly sized graphs. With
this in mind, an approximation of the canonical scheme, denoted the canonical
sampling nomination scheme, is introduced; it relies on a scalable, Markov chain
Monte Carlo-based approximation of the canonical scheme, and converges to the
canonical scheme as the amount of sampling goes to infinity. The spectral
partitioning nomination scheme is also extended to the extended spectral
partitioning nomination scheme, which introduces a novel semisupervised
clustering framework to improve upon the precision of the spectral partitioning
nomination scheme. Real-data and
simulation experiments are employed to illustrate the precision of these vertex
nomination schemes, as well as their empirical computational complexity.
Keywords: vertex nomination, Markov chain Monte Carlo, spectral partitioning, Mclust
MSC[2010]: 60J22, 65C40, 62H30, 62H2
Spectral Clustering for Divide-and-Conquer Graph Matching
We present a parallelized bijective graph matching algorithm that leverages
seeds and is designed to match very large graphs. Our algorithm combines
spectral graph embedding with existing state-of-the-art seeded graph matching
procedures. We justify our approach by proving that modestly correlated, large
stochastic block model random graphs are correctly matched utilizing very few
seeds through our divide-and-conquer procedure. We also demonstrate the
effectiveness of our approach in matching very large graphs in simulated and
real data examples, showing up to a factor of 8 improvement in runtime with
minimal sacrifice in accuracy. Comment: 32 pages, 8 figures