62,278 research outputs found
The Z-invariant massive Laplacian on isoradial graphs
We introduce a one-parameter family of massive Laplacian operators
defined on isoradial graphs, involving elliptic
functions. We prove an explicit formula for the inverse of the massive
Laplacian, the massive Green function, which has the remarkable property of
only depending on the local geometry of the graph, and we compute its
asymptotics. We study the corresponding statistical mechanics model of random
rooted spanning forests. We prove an explicit local formula for an infinite
volume Boltzmann measure and for the free energy of the model. We show that
the model undergoes a second order phase transition at the point where the
mass vanishes, thus proving that the spanning trees corresponding to the
Laplacian introduced by Kenyon are critical. We prove that the massive
Laplacian operators provide a one-parameter family of Z-invariant rooted
spanning forest models. When the isoradial graph is moreover Z^2-periodic, we
consider the spectral curve of the characteristic polynomial of the massive
Laplacian. We provide an explicit parametrization of the curve and prove that
it is Harnack and has genus 1. We further show that every genus 1 Harnack
curve with the required symmetry arises from such a massive Laplacian.
Comment: 71 pages, 13 figures, to appear in Inventiones mathematicae.
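For readers unfamiliar with the object, a massive Laplacian on a weighted graph has the generic form below; the conductances rho(xy) and squared masses m^2(x) here stand in for the paper's specific elliptic-function weights, whose exact expressions are in the article:

```latex
(\Delta^{m} f)(x) \;=\; \sum_{y \sim x} \rho(xy)\,\bigl(f(x) - f(y)\bigr) \;+\; m^{2}(x)\, f(x),
\qquad
\Delta^{m}\, G^{m}(x,\cdot) \;=\; \delta_{x},
```

where G^m is the massive Green function (the inverse operator) whose explicit local formula the abstract refers to.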
The Statistical Performance of Collaborative Inference
The statistical analysis of massive and complex data sets will require the
development of algorithms that depend on distributed computing and
collaborative inference. Inspired by this, we propose a collaborative framework
that aims to estimate the unknown mean of a random variable. In
the model we present, a number of computing units, distributed across
a communication network represented by a graph, participate in the estimation
of this mean by sequentially receiving independent data from the random
variable while exchanging messages via a stochastic matrix defined over the
graph. We give precise conditions on this matrix under which the statistical
precision of the individual units is comparable to that of a (gold standard)
virtual centralized estimate, even though no single unit has access to all of
the data. We show in particular the fundamental role played by both the
non-trivial eigenvalues of the matrix and the Ramanujan class of expander
graphs, which provide remarkable performance at moderate algorithmic cost.
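The message-exchange mechanism can be illustrated with a toy gossip-averaging loop. This is a hypothetical sketch, not the paper's estimator: four units on a ring repeatedly average their local values through a hand-chosen doubly stochastic matrix W, and connectivity (non-trivial eigenvalues strictly inside the unit disk) drives every unit to the global mean.

```python
import numpy as np

def gossip_average(values, W, iterations=200):
    """Repeatedly mix local estimates through a stochastic matrix W.

    With W doubly stochastic and the communication graph connected,
    every unit's estimate converges to the global average even though
    no unit ever sees all of the data."""
    x = np.asarray(values, dtype=float)
    for _ in range(iterations):
        x = W @ x  # one round of message exchange over the graph
    return x

# Hypothetical 4-unit ring: each unit averages with its two neighbours.
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])
local = [1.0, 3.0, 5.0, 7.0]
est = gossip_average(local, W)  # every entry approaches the mean, 4.0
```

The second-largest eigenvalue of W (here 0.5) controls how fast the individual estimates contract to the centralized mean, which is exactly the role the abstract attributes to the non-trivial eigenvalues.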
On Stability and Similarity of Network Embeddings
Machine learning on graphs has become an active research area due to the prevalence of graph-structured data in the real world. Many real-world applications can be modeled with graphs; modern application domains include web-scale social networks [26], recommender systems, knowledge graphs, and biological or protein networks. These applications raise several challenges. First, the graphs they generate are often large. Moreover, in some scenarios the complete graph is not available, e.g., for privacy reasons. It then becomes impractical to perform network analysis or compute various graph measures, and graph sampling becomes an important task.
Sampling is often the first step in handling any type of massive data. The same applies to graphs, which has led to many graph sampling techniques. These include node-based (e.g., Random Node Sampling), edge-based (e.g., Random Edge Sampling), and traversal-based (e.g., Random Walk Sampling) methods. Graphs are often analyzed by first embedding (i.e., representing) them in some matrix/vector form with some number of dimensions. Various graph embedding methods have been developed to convert raw graph data into high-dimensional vectors while preserving intrinsic graph properties [3]. Embedding methods operate at the node level, the edge level [28], a hybrid of the two, or the graph level. This thesis focuses on graph-level embeddings, which allow calculating the similarity between two graphs.
With the knowledge of embedding and sampling methods, the natural questions to ask are: 1) What is a good sample size to ensure that the embedding of the sample is similar enough to that of the original graph? 2) Do the results depend on the sampling method? 3) Do they depend on the embedding method? 4) Given the embeddings, can we quantify the similarity between the original graph and a sample? 5) How do we decide whether a sample is good or not, and whether an embedding is good or not?
Essentially, given an embedding method and a sampling strategy, can we find the smallest sample size that yields a Δ-similar embedding to that of the original graph? We try to answer the above questions in the thesis and give a new perspective on graph sampling. The experiments are conducted on graphs with thousands of edges and nodes. The datasets include graphs from social networks, autonomous systems, peer-to-peer networks, and collaboration networks. Two sampling methods are targeted, namely Random Node Sampling and Random Edge Sampling. Euclidean distance is used as a similarity metric. Experiments are carried out on the Graph2vec and Spectral Features (SF) graph embedding methods. Univariate analysis is performed to decide a minimum sample size that achieves a given similarity, e.g., a 40% minimum sample for 80% similarity. We also design a regression model that predicts similarity for a given sample size and graph properties. Finally, we analyze the stability of the embedding methods, finding, e.g., that Graph2vec is a stable embedding method.
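The two sampling strategies the thesis targets can be made concrete with a minimal sketch. The helper names and the toy edge list are illustrative, not the thesis code:

```python
import random

def random_edge_sample(edges, fraction, seed=0):
    """Random Edge Sampling: keep a uniform random fraction of the
    edge list."""
    rng = random.Random(seed)
    k = max(1, int(len(edges) * fraction))
    return rng.sample(edges, k)

def random_node_sample(edges, nodes, fraction, seed=0):
    """Random Node Sampling: draw a uniform node sample and keep the
    induced edges (both endpoints sampled)."""
    rng = random.Random(seed)
    k = max(1, int(len(nodes) * fraction))
    kept = set(rng.sample(sorted(nodes), k))
    return [(u, v) for (u, v) in edges if u in kept and v in kept]

# Toy graph: a 4-cycle with one chord.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
nodes = {0, 1, 2, 3}
edge_sub = random_edge_sample(edges, 0.4)   # keeps 2 of 5 edges
node_sub = random_node_sample(edges, nodes, 0.5)
```

The sampled edge lists would then be fed to an embedding method (e.g., Graph2vec or SF) and compared to the full graph's embedding under Euclidean distance, which is the experiment the thesis runs at scale.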
Estimating Graphlet Statistics via Lifting
Exploratory analysis over network data is often limited by the ability to
efficiently calculate graph statistics, which can provide a model-free
understanding of the macroscopic properties of a network. We introduce a
framework for estimating the graphlet count---the number of occurrences of a
small subgraph motif (e.g. a wedge or a triangle) in the network. For massive
graphs, where accessing the whole graph is not possible, the only viable
algorithms are those that make a limited number of vertex neighborhood queries.
We introduce a Monte Carlo sampling technique for graphlet counts, called {\em
Lifting}, which can simultaneously sample all graphlets up to a given size.
This is the first graphlet sampling method that can provably sample every
graphlet with positive probability and can sample graphlets of arbitrary size.
We outline variants of lifted graphlet counts, including the ordered,
unordered, and shotgun estimators, random walk starts, and parallel vertex
starts. We prove that our graphlet count updates are unbiased for the true
graphlet count and have a controlled variance for all graphlets. We compare
the experimental performance of lifted graphlet counts to the
state-of-the-art graphlet sampling procedures: Waddling and the pairwise
subgraph random walk.
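As a point of reference for what a graphlet estimator does, the sketch below estimates the triangle count by sampling wedges (paths of length two) and testing closure. This is a classical baseline, not the paper's Lifting procedure; the function name and toy graph are illustrative.

```python
import random
from collections import defaultdict

def triangle_estimate(edges, samples=2000, seed=1):
    """Monte Carlo triangle count via wedge sampling: pick a wedge
    centre proportionally to its wedge count, pick two neighbours,
    and check whether the wedge closes into a triangle."""
    rng = random.Random(seed)
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Number of wedges centred at each vertex: C(deg, 2).
    wedges = {v: len(nb) * (len(nb) - 1) // 2 for v, nb in adj.items()}
    total_wedges = sum(wedges.values())
    centres = list(wedges)
    weights = [wedges[v] for v in centres]
    closed = 0
    for _ in range(samples):
        c = rng.choices(centres, weights=weights)[0]
        a, b = rng.sample(sorted(adj[c]), 2)
        if b in adj[a]:
            closed += 1
    # Each triangle contains exactly three wedges.
    return closed / samples * total_wedges / 3

# K4 (complete graph on 4 vertices) has exactly 4 triangles.
k4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]
est = triangle_estimate(k4)
```

Like the estimators in the paper, this one is unbiased, but it only handles one 3-vertex motif; the point of Lifting is to sample all graphlets of a given size at once with positive probability for each.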
Generating Procedural Materials from Text or Image Prompts
Node graph systems are used ubiquitously for material design in computer
graphics. They allow the use of visual programming to achieve desired effects
without writing code. As high-level design tools they provide convenience and
flexibility, but mastering the creation of node graphs usually requires
professional training. We propose an algorithm capable of generating multiple
node graphs from different types of prompts, significantly lowering the bar for
users to explore a specific design space. Previous work was limited to
unconditional generation of random node graphs, making the generation of an
envisioned material challenging. We propose a multi-modal node graph generation
neural architecture for high-quality procedural material synthesis which can be
conditioned on different inputs (text or image prompts), using a CLIP-based
encoder. We also create a substantially augmented material graph dataset, key
to improving the generation quality. Finally, we generate high-quality graph
samples using a regularized sampling process and improve the matching quality
by differentiable optimization for top-ranked samples. We compare our methods
to CLIP-based database search baselines (which are themselves novel) and
achieve superior or similar performance without requiring massive data storage.
We further show that our model can produce a set of material graphs
unconditionally, or conditioned on images, text prompts, or partial graphs,
serving as a tool for automatic visual-programming completion.
Brief Announcement: Streaming and Massively Parallel Algorithms for Edge Coloring
A valid edge-coloring of a graph is an assignment of "colors" to its edges such that no two incident edges receive the same color. The goal is to find a proper coloring that uses few colors. In this paper, we revisit this problem in two models of computation specific to massive graphs, the Massively Parallel Computations (MPC) model and the Graph Streaming model:
Massively Parallel Computation. We give a randomized MPC algorithm that w.h.p., returns a (1+o(1))Delta edge coloring in O(1) rounds using O~(n) space per machine and O(m) total space. The space per machine can also be further improved to n^{1-Omega(1)} if Delta = n^{Omega(1)}. This is, to our knowledge, the first constant round algorithm for a natural graph problem in the strongly sublinear regime of MPC. Our algorithm improves a previous result of Harvey et al. [SPAA 2018] which required n^{1+Omega(1)} space to achieve the same result.
Graph Streaming. Since the output of edge-coloring is as large as its input, we consider a standard variant of the streaming model where the output is also reported in a streaming fashion. The main challenge is that the algorithm cannot "remember" all the reported edge colors, yet has to output a proper edge coloring using few colors.
We give a one-pass O~(n)-space streaming algorithm that always returns a valid coloring and uses 5.44 Delta colors w.h.p. if the edges arrive in a random order. For adversarial-order streams, we give another one-pass O~(n)-space algorithm that requires O(Delta^2) colors.
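To make the validity constraint concrete, here is the folklore offline greedy edge coloring, which needs at most 2*Delta - 1 colors. This is only a baseline sketch with illustrative names, far from the paper's (1+o(1))Delta streaming and MPC guarantees:

```python
def greedy_edge_coloring(edges):
    """Assign each edge the smallest colour not already used by an
    incident edge. Since at most 2*(Delta - 1) colours can be busy
    at the two endpoints, at most 2*Delta - 1 colours are needed."""
    used = {}        # vertex -> set of colours on its incident edges
    colouring = {}
    for u, v in edges:
        busy = used.setdefault(u, set()) | used.setdefault(v, set())
        c = 0
        while c in busy:     # smallest colour free at both endpoints
            c += 1
        colouring[(u, v)] = c
        used[u].add(c)
        used[v].add(c)
    return colouring

# Triangle plus a pendant edge (Delta = 3).
col = greedy_edge_coloring([(0, 1), (1, 2), (2, 0), (0, 3)])
```

The streaming difficulty described above is exactly that this greedy must "remember" the colours at every vertex, while a streaming algorithm with O~(n) space must report colours without storing the full colouring of past edges.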
Coresets Meet EDCS: Algorithms for Matching and Vertex Cover on Massive Graphs
As massive graphs become more prevalent, there is a rapidly growing need for
scalable algorithms that solve classical graph problems, such as maximum
matching and minimum vertex cover, on large datasets. For massive inputs,
several different computational models have been introduced, including the
streaming model, the distributed communication model, and the massively
parallel computation (MPC) model that is a common abstraction of
MapReduce-style computation. In each model, algorithms are analyzed in terms of
resources such as space used or rounds of communication needed, in addition to
the more traditional approximation ratio.
In this paper, we give a single unified approach that yields better
approximation algorithms for matching and vertex cover in all these models. The
highlights include:
* The first one-pass, significantly-better-than-2 approximation algorithm for
matching in random arrival streams that uses subquadratic space.
* The first 2-round, better-than-2 approximation algorithm for matching in the
MPC model that uses subquadratic space per machine.
By building on our unified approach, we further develop parallel algorithms
in the MPC model that approximate matching and vertex cover in only a small
number of MPC rounds and with limited memory per machine. These results settle
multiple open questions posed in the recent paper of Czumaj et al. [STOC 2018].
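The classical baseline these results improve on is the one-pass greedy maximal matching: its edges form a 2-approximate maximum matching, and its matched endpoints form a 2-approximate vertex cover. A minimal sketch (illustrative, not the paper's algorithm):

```python
def greedy_matching(edges):
    """One-pass greedy maximal matching: take an edge iff both of its
    endpoints are still free. Any maximal matching has at least half
    the edges of a maximum matching, and its endpoints cover every
    edge using at most twice the minimum vertex cover."""
    matched = set()
    matching = []
    for u, v in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching, matched   # `matched` doubles as a vertex cover

# Path on 5 vertices: greedy picks (0,1) and (2,3).
matching, cover = greedy_matching([(0, 1), (1, 2), (2, 3), (3, 4)])
```

Beating factor 2 in one pass, or in few MPC rounds with small per-machine memory, is precisely what makes the results above nontrivial: the greedy baseline fits those resource budgets but is stuck at a 2-approximation.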