
    The Z-invariant massive Laplacian on isoradial graphs

    We introduce a one-parameter family of massive Laplacian operators $(\Delta^{m(k)})_{k\in[0,1)}$ defined on isoradial graphs, involving elliptic functions. We prove an explicit formula for the inverse of $\Delta^{m(k)}$, the massive Green function, which has the remarkable property of only depending on the local geometry of the graph, and compute its asymptotics. We study the corresponding statistical mechanics model of random rooted spanning forests. We prove an explicit local formula for an infinite volume Boltzmann measure, and for the free energy of the model. We show that the model undergoes a second order phase transition at $k=0$, thus proving that spanning trees corresponding to the Laplacian introduced by Kenyon are critical. We prove that the massive Laplacian operators $(\Delta^{m(k)})_{k\in(0,1)}$ provide a one-parameter family of $Z$-invariant rooted spanning forest models. When the isoradial graph is moreover $\mathbb{Z}^2$-periodic, we consider the spectral curve of the characteristic polynomial of the massive Laplacian. We provide an explicit parametrization of the curve and prove that it is Harnack and has genus 1. We further show that every Harnack curve of genus 1 with $(z,w)\leftrightarrow(z^{-1},w^{-1})$ symmetry arises from such a massive Laplacian. Comment: 71 pages, 13 figures, to appear in Inventiones mathematicae
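
    As a toy illustration of the objects in this abstract (not the paper's elliptic, isoradial construction), the sketch below builds the massive Laplacian $\Delta + m^2 I$ with unit conductances on a small square torus and recovers the massive Green function as its matrix inverse:

```python
import numpy as np

def massive_laplacian_grid(n, mass):
    """Massive Laplacian Delta + mass^2 * I on the n x n torus with unit
    conductances; a constant-mass stand-in for the paper's elliptic weights."""
    idx = lambda i, j: (i % n) * n + (j % n)
    L = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            u = idx(i, j)
            L[u, u] = 4.0 + mass ** 2              # degree plus mass term
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                L[u, idx(i + di, j + dj)] -= 1.0   # minus one per neighbor
    return L

L = massive_laplacian_grid(8, mass=0.5)
G = np.linalg.inv(L)   # the massive Green function is the inverse operator
```

    With a positive mass the operator is strictly diagonally dominant, hence invertible; at mass zero (the critical spanning-tree point $k=0$) the inverse no longer exists on the torus.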

    The Statistical Performance of Collaborative Inference

    The statistical analysis of massive and complex data sets will require the development of algorithms that depend on distributed computing and collaborative inference. Inspired by this, we propose a collaborative framework that aims to estimate the unknown mean $\theta$ of a random variable $X$. In the model we present, a certain number of calculation units, distributed across a communication network represented by a graph, participate in the estimation of $\theta$ by sequentially receiving independent data from $X$ while exchanging messages via a stochastic matrix $A$ defined over the graph. We give precise conditions on the matrix $A$ under which the statistical precision of the individual units is comparable to that of a (gold standard) virtual centralized estimate, even though each unit does not have access to all of the data. We show in particular the fundamental role played by both the non-trivial eigenvalues of $A$ and the Ramanujan class of expander graphs, which provide remarkable performance for moderate algorithmic cost.
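
    A minimal simulation of this setting, assuming a hypothetical ring network with a doubly stochastic gossip matrix $A$ (these choices are ours, not the paper's): each unit folds fresh samples into a running mean while repeatedly averaging its estimate with its neighbors through $A$.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_rounds, theta = 5, 200, 3.0

# Hypothetical doubly stochastic gossip matrix A on a ring of 5 units;
# the second-largest eigenvalue of A governs how fast estimates merge.
A = np.zeros((n_units, n_units))
for i in range(n_units):
    A[i, i] = 0.5
    A[i, (i + 1) % n_units] = 0.25
    A[i, (i - 1) % n_units] = 0.25

est = rng.normal(theta, 1.0, size=n_units)    # first sample at each unit
for t in range(2, n_rounds + 2):
    est = A @ est                             # exchange messages via A
    x_t = rng.normal(theta, 1.0, size=n_units)
    est += (x_t - est) / t                    # fold in fresh data (running mean)
```

    After a few hundred rounds every unit's estimate sits close to the centralized mean of all the data, even though no unit ever saw more than its own samples.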

    On Stability and Similarity of Network Embeddings

    Machine Learning on graphs has become an active research area due to the prevalence of graph-structured data in the real world. Many real-world applications can be modeled with graphs. Modern application domains include web-scale social networks [26], recommender systems, knowledge graphs, and biological or protein networks. However, there are various challenges. First, the graphs generated from such applications are often large. Moreover, in some scenarios the complete graph is not available, e.g., for privacy reasons. Thus, it becomes impractical to perform network analysis or compute various graph measures, and graph sampling becomes an important task. Sampling is often the first step in handling any type of massive data, and graphs are no exception, which has led to many graph sampling techniques: node-based (e.g., Random Node Sampling), edge-based (e.g., Random Edge Sampling), and traversal-based (e.g., Random Walk Sampling). Graphs are often analyzed by first embedding (i.e., representing) them in some matrix/vector form with some number of dimensions. Various graph embedding methods have been developed to convert raw graph data into high-dimensional vectors while preserving intrinsic graph properties [3]. Embedding methods operate at the node level, the edge level [28], a hybrid of the two, or the graph level. This thesis focuses on graph-level embeddings, which allow calculating the similarity between two graphs. With the knowledge of embedding and sampling methods, the natural questions to ask are: 1) What is a good sampling size to ensure embeddings are similar enough to that of the original graph? 2) Do results depend on the sampling method? 3) Do they depend on the embedding method? 4) Given the embeddings, can we quantify the similarity between the original graph and the sample? 5) How do we decide whether the sample is good or not, and whether the embedding is good or not?
Essentially, if we have an embedding method and a sampling strategy, can we find the smallest sampling size that will give an Δ-similar embedding to that of the original graph? We try to answer the above questions in this thesis and give a new perspective on graph sampling. The experiments are conducted on graphs with thousands of edges and nodes. The datasets include graphs from social networks, autonomous systems, peer-to-peer networks, and collaboration networks. Two sampling methods are targeted, namely Random Node Sampling and Random Edge Sampling. Euclidean distance is used as a similarity metric. Experiments are carried out on the Graph2vec and Spectral Features (SF) graph embedding methods. Univariate analysis is performed to determine a minimum sample size, e.g., a 40% minimum sample for 80% similarity. We also design a regression model that predicts similarity for a given sampling size and graph properties. Finally, we analyze the stability of the embedding methods, where we find that, e.g., Graph2vec is a stable embedding method.
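
    The sample-embed-compare pipeline described above can be sketched as follows. The degree-histogram embedding here is a deliberately crude stand-in for Graph2vec or Spectral Features, used only to keep the sketch self-contained:

```python
import random

def random_node_sample(adj, frac, seed=0):
    """Random Node Sampling: keep a random fraction of the nodes and the
    subgraph they induce."""
    rng = random.Random(seed)
    keep = set(rng.sample(sorted(adj), k=max(1, int(frac * len(adj)))))
    return {u: [v for v in nbrs if v in keep]
            for u, nbrs in adj.items() if u in keep}

def degree_histogram_embedding(adj, dims=8):
    """Toy graph-level embedding: a normalized degree histogram (a crude
    stand-in for Graph2vec / Spectral Features, not those methods)."""
    hist = [0.0] * dims
    for nbrs in adj.values():
        hist[min(len(nbrs), dims - 1)] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def euclidean(a, b):
    """Similarity metric used in the thesis: Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}   # toy 6-cycle
sample = random_node_sample(cycle, frac=0.8)
distance = euclidean(degree_histogram_embedding(cycle),
                     degree_histogram_embedding(sample))
```

    Sweeping `frac` and recording `distance` for each value is exactly the univariate analysis the thesis performs to find the smallest sample that stays within a similarity threshold.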

    Estimating Graphlet Statistics via Lifting

    Exploratory analysis over network data is often limited by the ability to efficiently calculate graph statistics, which can provide a model-free understanding of the macroscopic properties of a network. We introduce a framework for estimating the graphlet count, the number of occurrences of a small subgraph motif (e.g. a wedge or a triangle) in the network. For massive graphs, where accessing the whole graph is not possible, the only viable algorithms are those that make a limited number of vertex neighborhood queries. We introduce a Monte Carlo sampling technique for graphlet counts, called {\em Lifting}, which can simultaneously sample all graphlets of size up to $k$ vertices for arbitrary $k$. This is the first graphlet sampling method that can provably sample every graphlet with positive probability and can sample graphlets of arbitrary size $k$. We outline variants of lifted graphlet counts, including the ordered, unordered, and shotgun estimators, random walk starts, and parallel vertex starts. We prove that our graphlet count updates are unbiased for the true graphlet count and have a controlled variance for all graphlets. We compare the experimental performance of lifted graphlet counts to the state-of-the-art graphlet sampling procedures: Waddling and the pairwise subgraph random walk.
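
    For intuition about graphlet estimation from neighborhood queries, here is a far simpler cousin of the Lifting estimator (not the paper's method): sample wedges uniformly, with centers weighted by $\binom{\deg}{2}$, and measure the fraction that close into triangles.

```python
import random
from math import comb

def estimate_triangles(adj, n_samples=2000, seed=1):
    """Monte Carlo triangle estimate from vertex-neighborhood queries:
    sample a wedge uniformly at random and test whether it closes.
    A simple illustration only; the paper's Lifting estimator handles
    arbitrary graphlet sizes with controlled variance."""
    rng = random.Random(seed)
    nodes = [u for u in adj if len(adj[u]) >= 2]
    weights = [comb(len(adj[u]), 2) for u in nodes]   # wedges per center
    total_wedges = sum(weights)
    closed = 0
    for _ in range(n_samples):
        center = rng.choices(nodes, weights)[0]
        v, w = rng.sample(adj[center], 2)
        if w in adj[v]:             # neighborhood query: does the wedge close?
            closed += 1
    return total_wedges * closed / n_samples / 3   # each triangle closes 3 wedges

k4 = {u: [v for v in range(4) if v != u] for u in range(4)}   # K4: 4 triangles
```

    The estimator is unbiased because a uniform wedge closes with probability (3 × triangles) / wedges, and the total wedge count is computed exactly from the degrees.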

    Generating Procedural Materials from Text or Image Prompts

    Node graph systems are used ubiquitously for material design in computer graphics. They allow the use of visual programming to achieve desired effects without writing code. As high-level design tools they provide convenience and flexibility, but mastering the creation of node graphs usually requires professional training. We propose an algorithm capable of generating multiple node graphs from different types of prompts, significantly lowering the bar for users to explore a specific design space. Previous work was limited to unconditional generation of random node graphs, making the generation of an envisioned material challenging. We propose a multi-modal node graph generation neural architecture for high-quality procedural material synthesis which can be conditioned on different inputs (text or image prompts), using a CLIP-based encoder. We also create a substantially augmented material graph dataset, key to improving the generation quality. Finally, we generate high-quality graph samples using a regularized sampling process and improve the matching quality by differentiable optimization for top-ranked samples. We compare our methods to CLIP-based database search baselines (which are themselves novel) and achieve superior or similar performance without requiring massive data storage. We further show that our model can produce a set of material graphs unconditionally or conditioned on images, text prompts, or partial graphs, serving as a tool for automatic visual programming completion.

    Brief Announcement: Streaming and Massively Parallel Algorithms for Edge Coloring

    A valid edge-coloring of a graph is an assignment of "colors" to its edges such that no two incident edges receive the same color. The goal is to find a proper coloring that uses few colors. In this paper, we revisit this problem in two models of computation specific to massive graphs, the Massively Parallel Computations (MPC) model and the Graph Streaming model. Massively Parallel Computation. We give a randomized MPC algorithm that w.h.p. returns a $(1+o(1))\Delta$ edge coloring in $O(1)$ rounds using $\tilde{O}(n)$ space per machine and $O(m)$ total space. The space per machine can also be further improved to $n^{1-\Omega(1)}$ if $\Delta = n^{\Omega(1)}$. This is, to our knowledge, the first constant round algorithm for a natural graph problem in the strongly sublinear regime of MPC. Our algorithm improves a previous result of Harvey et al. [SPAA 2018] which required $n^{1+\Omega(1)}$ space to achieve the same result. Graph Streaming. Since the output of edge-coloring is as large as its input, we consider a standard variant of the streaming model where the output is also reported in a streaming fashion. The main challenge is that the algorithm cannot "remember" all the reported edge colors, yet has to output a proper edge coloring using few colors. We give a one-pass $\tilde{O}(n)$-space streaming algorithm that always returns a valid coloring and uses $5.44\Delta$ colors w.h.p. if the edges arrive in a random order. For adversarial order streams, we give another one-pass $\tilde{O}(n)$-space algorithm that requires $O(\Delta^2)$ colors.
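
    For contrast with the streaming algorithms above, the classical offline greedy edge coloring looks as follows; it remembers every assigned color, which the paper's streaming model cannot afford, and it uses at most $2\Delta - 1$ colors:

```python
def greedy_edge_coloring(edges):
    """Classical offline greedy: each edge takes the smallest color not yet
    used on either endpoint, so at most 2*Delta - 1 colors appear. Shown
    only as the baseline the streaming model rules out."""
    used = {}        # vertex -> set of colors already on its incident edges
    coloring = {}
    for u, v in edges:
        taken = used.setdefault(u, set()) | used.setdefault(v, set())
        c = 0
        while c in taken:
            c += 1
        coloring[(u, v)] = c
        used[u].add(c)
        used[v].add(c)
    return coloring

triangle = greedy_edge_coloring([(0, 1), (1, 2), (0, 2)])
```

    On the triangle ($\Delta = 2$) greedy needs three colors, which matches the $2\Delta - 1$ bound; the streaming results trade this full memory of past colors for a bounded palette of $5.44\Delta$ or $O(\Delta^2)$ colors.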

    Coresets Meet EDCS: Algorithms for Matching and Vertex Cover on Massive Graphs

    As massive graphs become more prevalent, there is a rapidly growing need for scalable algorithms that solve classical graph problems, such as maximum matching and minimum vertex cover, on large datasets. For massive inputs, several different computational models have been introduced, including the streaming model, the distributed communication model, and the massively parallel computation (MPC) model that is a common abstraction of MapReduce-style computation. In each model, algorithms are analyzed in terms of resources such as space used or rounds of communication needed, in addition to the more traditional approximation ratio. In this paper, we give a single unified approach that yields better approximation algorithms for matching and vertex cover in all these models. The highlights include: * The first one-pass, significantly-better-than-2-approximation for matching in random arrival streams that uses subquadratic space, namely a $(1.5+\epsilon)$-approximation streaming algorithm that uses $O(n^{1.5})$ space for constant $\epsilon > 0$. * The first 2-round, better-than-2-approximation for matching in the MPC model that uses subquadratic space per machine, namely a $(1.5+\epsilon)$-approximation algorithm with $O(\sqrt{mn} + n)$ memory per machine for constant $\epsilon > 0$. By building on our unified approach, we further develop parallel algorithms in the MPC model that give a $(1+\epsilon)$-approximation to matching and an $O(1)$-approximation to vertex cover in only $O(\log\log n)$ MPC rounds and $O(n/\mathrm{polylog}(n))$ memory per machine. These results settle multiple open questions posed in the recent paper of Czumaj et al. [STOC 2018].
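
    The classical baseline these results improve on is the one-pass greedy maximal matching, a 2-approximation using $O(n)$ space; a minimal sketch:

```python
def greedy_streaming_matching(edge_stream):
    """One-pass greedy maximal matching: keep an edge iff both endpoints
    are still unmatched. Uses O(n) space and is a 2-approximation to
    maximum matching; the paper's random-arrival algorithm beats this
    ratio with subquadratic space."""
    matched = set()
    matching = []
    for u, v in edge_stream:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

path = greedy_streaming_matching([(0, 1), (1, 2), (2, 3)])
```

    Any maximal matching is at least half the size of a maximum one, since every maximum-matching edge has an endpoint covered by the greedy matching; beating the factor 2 in one pass is exactly what makes the $(1.5+\epsilon)$ result notable.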
