
    A Quasi-Random Approach to Matrix Spectral Analysis

    Inspired by quantum computing algorithms for linear algebra problems [HHL,TaShma], we study how a classical-computer simulation of this type of "Phase Estimation algorithm" performs when applied to the Eigen-Problem of Hermitian matrices. The result is a completely new, efficient, stable, parallel algorithm for computing an approximate spectral decomposition of any Hermitian matrix. The algorithm can be implemented by Boolean circuits in $O(\log^2 n)$ parallel time with a total cost of $O(n^{\omega+1})$ Boolean operations. This Boolean complexity matches the best known rigorous $O(\log^2 n)$ parallel-time algorithms, but unlike those algorithms ours is (logarithmically) stable, so further improvements may lead to practical implementations. All previous efficient and rigorous approaches to the Eigen-Problem use randomization to avoid bad conditioning, and so do we. Our algorithm makes further use of randomization in a completely new way, taking random powers of a unitary matrix to randomize the phases of its eigenvalues. The main technical contribution of this work is proving that a tiny Gaussian perturbation and a random polynomial power suffice to ensure almost pairwise independence of the phases $\pmod{2\pi}$. Given a Hermitian matrix with well-separated eigenvalues, this randomization lets us sample a random eigenvalue and produce an approximate eigenvector in $O(\log^2 n)$ parallel time with $O(n^\omega)$ Boolean complexity. We conjecture that further improvements of our method can provide a stable solution to the full approximate spectral decomposition problem with complexity similar (up to a logarithmic factor) to that of sampling a single eigenvector.
    Comment: Replacing previous version: the parallel algorithm runs in total complexity $n^{\omega+1}$, not $n^{\omega}$. However, the depth of the implementing circuit is $\log^2(n)$, hence comparable to the fastest known eigen-decomposition algorithms.
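The phase-randomization step can be illustrated with a tiny pure-Python sketch. If $H$ has eigenpairs $(\lambda_j, v_j)$, then $U = e^{iH}$ maps $v_j$ to $e^{i\lambda_j} v_j$, so $U^k$ rotates $v_j$ by the phase $k\lambda_j \pmod{2\pi}$; drawing $k$ at random spreads even close eigenvalues apart on the unit circle. The sketch assumes a real symmetric $2 \times 2$ matrix built from a known rotation angle and spectrum, and the helper names (`unitary_from_spectrum`, `phase_of_power`) are hypothetical. It omits the Gaussian perturbation and the phase-estimation machinery, so it shows the mechanism only, not the authors' algorithm.

```python
import cmath
import math
import random


def matmul(a, b):
    """Multiply two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)] for i in range(n)]


def matpow(a, k):
    """Compute a**k for k >= 0 by binary exponentiation."""
    n = len(a)
    result = [[1.0 + 0j if i == j else 0j for j in range(n)] for i in range(n)]
    base = a
    while k > 0:
        if k & 1:
            result = matmul(result, base)
        base = matmul(base, base)
        k >>= 1
    return result


def unitary_from_spectrum(theta, lams):
    """U = R diag(exp(i*lam)) R^T, i.e. exp(iH) for the 2x2 H = R diag(lams) R^T."""
    c, s = math.cos(theta), math.sin(theta)
    r = [[c, -s], [s, c]]
    d = [[cmath.exp(1j * lams[0]), 0j], [0j, cmath.exp(1j * lams[1])]]
    rt = [[c, s], [-s, c]]
    return matmul(matmul(r, d), rt)


def phase_of_power(u, k, v):
    """Phase in [0, 2*pi) by which U**k rotates the real unit eigenvector v."""
    w = matpow(u, k)
    wv = [sum(w[i][j] * v[j] for j in range(2)) for i in range(2)]
    # Since U^k v = exp(i*k*lam) v and |v| = 1, <v, U^k v> = exp(i*k*lam).
    inner = sum(v[i] * wv[i] for i in range(2))
    return cmath.phase(inner) % (2 * math.pi)


if __name__ == "__main__":
    theta, lams = 0.3, (0.50, 0.51)           # two close eigenvalues
    u = unitary_from_spectrum(theta, lams)
    v1 = (math.cos(theta), math.sin(theta))   # eigenvector for lams[0]
    v2 = (-math.sin(theta), math.cos(theta))  # eigenvector for lams[1]
    k = random.Random(0).randint(1, 10_000)   # random power
    print(k, phase_of_power(u, k, v1), phase_of_power(u, k, v2))
```

With $k = 1$ the two phases differ by only $0.01$; with $k$ in the thousands their difference $k(\lambda_2 - \lambda_1)$ wraps around the circle many times, which is the kind of spreading the abstract's independence result formalizes.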

    Scalable Parallel Factorizations of SDD Matrices and Efficient Sampling for Gaussian Graphical Models

    Motivated by a sampling problem basic to computational statistical inference, we develop a nearly optimal algorithm for a fundamental problem in spectral graph theory and numerical analysis. Given an $n \times n$ SDDM matrix $\mathbf{M}$ and a constant $-1 \leq p \leq 1$, our algorithm gives efficient access to a sparse $n \times n$ linear operator $\tilde{\mathbf{C}}$ such that $\mathbf{M}^{p} \approx \tilde{\mathbf{C}} \tilde{\mathbf{C}}^\top$. The solution is based on factoring $\mathbf{M}$ into a product of simple and sparse matrices using squaring and spectral sparsification. For $\mathbf{M}$ with $m$ non-zero entries, our algorithm takes work nearly-linear in $m$, and polylogarithmic depth on a parallel machine with $m$ processors. This gives the first sampling algorithm that only requires nearly linear work and $n$ i.i.d. random univariate Gaussian samples to generate i.i.d. random samples for $n$-dimensional Gaussian random fields with SDDM precision matrices. For sampling this natural subclass of Gaussian random fields, it is optimal in the randomness and nearly optimal in the work and parallel complexity. In addition, our sampling algorithm can be directly extended to Gaussian random fields with SDD precision matrices.
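For a Gaussian random field with precision matrix $\mathbf{M}$, the covariance is $\mathbf{M}^{-1}$, so sampling needs the $p = -1$ case: any $\mathbf{C}$ with $\mathbf{C}\mathbf{C}^\top = \mathbf{M}^{-1}$ turns $n$ i.i.d. standard Gaussians $z$ into a sample $x = \mathbf{C} z$. The sketch below is a dense Cholesky baseline, not the paper's sparse squaring-and-sparsification factorization: with $\mathbf{M} = LL^\top$, taking $\mathbf{C} = L^{-\top}$ gives $\mathbf{C}\mathbf{C}^\top = (LL^\top)^{-1} = \mathbf{M}^{-1}$, at $O(n^3)$ cost. The function names and the small example matrix (a path-graph Laplacian plus $0.5$ on the diagonal, which is SDDM) are hypothetical.

```python
import math
import random


def cholesky(m):
    """Dense Cholesky factor: lower-triangular L with m = L L^T (m symmetric positive definite)."""
    n = len(m)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(l[i][t] * l[j][t] for t in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l


def solve_lt(l, z):
    """Solve L^T x = z by back substitution, where l holds the lower-triangular L."""
    n = len(l)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = z[i] - sum(l[t][i] * x[t] for t in range(i + 1, n))
        x[i] = s / l[i][i]
    return x


def inverse_factor(m):
    """Dense C = L^{-T}, so that C C^T = (L L^T)^{-1} = m^{-1} (the p = -1 case)."""
    l = cholesky(m)
    n = len(m)
    cols = [solve_lt(l, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def sample_gmrf(m, rng):
    """One draw x ~ N(0, m^{-1}) using exactly n univariate standard Gaussians."""
    l = cholesky(m)
    z = [rng.gauss(0.0, 1.0) for _ in range(len(m))]
    return solve_lt(l, z)


def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


if __name__ == "__main__":
    # Path-graph Laplacian on 4 nodes plus 0.5*I: symmetric, diagonally dominant, SDDM.
    m = [[1.5, -1.0, 0.0, 0.0],
         [-1.0, 2.5, -1.0, 0.0],
         [0.0, -1.0, 2.5, -1.0],
         [0.0, 0.0, -1.0, 1.5]]
    print(sample_gmrf(m, random.Random(1)))
```

The baseline already uses the optimal amount of randomness ($n$ Gaussians per sample); what the abstract improves is the work and depth of obtaining and applying the factor for sparse $\mathbf{M}$.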

    Optimal Parallel Randomized Algorithms for the Voronoi Diagram of Line Segments in the Plane and Related Problems

    In this paper, we present an optimal parallel randomized algorithm for the Voronoi diagram of a set of $n$ non-intersecting (except possibly at endpoints) line segments in the plane. Our algorithm runs in $O(\log n)$ time with very high probability and uses $O(n)$ processors on a CRCW PRAM. This algorithm is optimal in terms of the processor-time product, $P \cdot T = O(n \log n)$, since the sequential time bound for this problem is $\Omega(n \log n)$. Our algorithm improves by an $O(\log n)$ factor on the best previously known deterministic parallel algorithm, which runs in $O(\log^2 n)$ time using $O(n)$ processors [13]. We obtain this result by using random sampling at two stages of our algorithm and using efficient randomized search techniques. This technique also gives a direct optimal algorithm for the Voronoi diagram of points (all other optimal parallel algorithms for this problem use a reduction from 3-d convex hull construction).
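The object being computed can be pinned down with a brute-force reference: a point lies in the Voronoi region of the segment nearest to it, where distance to a segment is distance to the closest point on it. The sketch below is only that definition, with $O(n)$ work per query point; it is not the paper's randomized sampling algorithm, but it can serve as a correctness oracle for testing one. The function names and the grid rasterization are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the closed segment with endpoints a and b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        # Degenerate segment: a single point.
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, then clamp the parameter to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def nearest_segment(p, segments):
    """Index of the segment whose Voronoi region contains p (ties go to the lowest index)."""
    return min(range(len(segments)), key=lambda i: point_segment_distance(p, *segments[i]))


def label_grid(segments, xs, ys):
    """Rasterize the Voronoi diagram: one nearest-segment label per grid point (row per y)."""
    return [[nearest_segment((x, y), segments) for x in xs] for y in ys]


if __name__ == "__main__":
    segs = [((0.0, 0.0), (2.0, 0.0)), ((0.0, 3.0), (2.0, 3.0))]
    for row in label_grid(segs, [0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0]):
        print(row)
```

Unlike the point case, bisectors between segments can contain parabolic arcs (between an endpoint and the interior of another segment), which is one reason segment Voronoi diagrams are harder to build than point ones.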

    Parallel Weighted Random Sampling

    Data structures for efficient sampling from a set of weighted items are an important building block of many applications. However, few parallel solutions are known. We close many of these gaps for both shared-memory and distributed-memory machines. We give efficient, fast, and practicable algorithms for sampling single items, $k$ items with/without replacement, permutations, subsets, and reservoirs. We also give improved sequential algorithms for alias table construction and for sampling with replacement. Experiments on shared-memory parallel machines with up to 158 threads show near-linear speedups for both construction and queries.
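As background for the alias-table results, here is a minimal sequential alias table using the textbook Vose construction: each of the $n$ buckets holds at most two items, so a query costs one uniform bucket index plus one biased coin flip. This is the standard method, not the paper's improved sequential or parallel construction; the function names are hypothetical.

```python
import random


def build_alias_table(weights):
    """Vose's alias method: O(n) construction; returns (prob, alias) arrays."""
    n = len(weights)
    total = float(sum(weights))
    scaled = [w * n / total for w in weights]  # average bucket mass is 1
    prob = [0.0] * n
    alias = list(range(n))
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]
        alias[s] = l
        # Item l donates (1 - scaled[s]) of its mass to fill bucket s.
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    # Leftover buckets are full up to floating-point rounding.
    for i in small + large:
        prob[i] = 1.0
    return prob, alias


def sample(prob, alias, rng):
    """O(1) query: pick a bucket uniformly, then its own item or its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


def implied_distribution(prob, alias):
    """Exact item probabilities encoded by the table (useful for checking it)."""
    n = len(prob)
    mass = [0.0] * n
    for i in range(n):
        mass[i] += prob[i] / n
        mass[alias[i]] += (1.0 - prob[i]) / n
    return mass


if __name__ == "__main__":
    prob, alias = build_alias_table([1, 2, 3, 4])
    rng = random.Random(0)
    print([sample(prob, alias, rng) for _ in range(10)])
```

The sequential pairing of a "small" and a "large" item is what makes construction hard to parallelize directly, which is the gap the abstract addresses.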

    GraphVite: A High-Performance CPU-GPU Hybrid System for Node Embedding

    Learning continuous representations of nodes has been attracting growing interest in both academia and industry, due to their simplicity and effectiveness in a variety of applications. Most existing node embedding algorithms and systems are capable of processing networks with hundreds of thousands or a few million nodes. However, scaling them to networks with tens of millions or even hundreds of millions of nodes remains a challenging problem. In this paper, we propose GraphVite, a high-performance CPU-GPU hybrid system for training node embeddings, by co-optimizing the algorithm and the system. On the CPU end, augmented edge samples are generated in parallel by online random walks on the network, and serve as the training data. On the GPU end, a novel parallel negative sampling is proposed to leverage multiple GPUs to train node embeddings simultaneously, without much data transfer and synchronization. Moreover, an efficient collaboration strategy is proposed to further reduce the synchronization cost between CPUs and GPUs. Experiments on multiple real-world networks show that GraphVite is highly efficient. It takes only about one minute for a network with 1 million nodes and 5 million edges on a single machine with 4 GPUs, and around 20 hours for a network with 66 million nodes and 1.8 billion edges. Compared to the current fastest system, GraphVite is about 50 times faster without any sacrifice in performance.
    Comment: accepted at WWW 201
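The CPU-side idea of turning random walks into "augmented edge samples" can be sketched sequentially: each walk yields positive pairs between nodes that appear within a small window of each other, so the training set contains edges the graph does not list explicitly. The walk length, window size, and uniform neighbor choice below are assumptions for illustration, and all names are hypothetical; the actual system runs many samplers in parallel and streams the samples to GPUs, which is not shown.

```python
import random


def random_walk(adj, start, length, rng):
    """Uniform random walk of up to `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbors = adj[walk[-1]]
        if not neighbors:
            break  # dead end: stop the walk early
        walk.append(rng.choice(neighbors))
    return walk


def augmented_edge_samples(walk, window):
    """Positive pairs (walk[i], walk[j]) with 0 < j - i <= window."""
    samples = []
    for i in range(len(walk)):
        for j in range(i + 1, min(i + window, len(walk) - 1) + 1):
            samples.append((walk[i], walk[j]))
    return samples


def sample_training_edges(adj, walk_length, window, rng):
    """One pass: a walk from every node, flattened into edge samples."""
    samples = []
    for start in adj:
        walk = random_walk(adj, start, walk_length, rng)
        samples.extend(augmented_edge_samples(walk, window))
    return samples


if __name__ == "__main__":
    # Small undirected graph as adjacency lists.
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
    print(sample_training_edges(adj, walk_length=5, window=2, rng=random.Random(0)))
```

A walk of length $L$ with window $w$ yields roughly $Lw$ samples, so one walk amortizes its sampling cost over many training edges.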