173,147 research outputs found

A Quasi-Random Approach to Matrix Spectral Analysis

Authors: Ben-Or Michael; Eldar Lior
Publication date: 06/04/2017

Inspired by the quantum computing algorithms for Linear Algebra problems [HHL,TaShma] we study how the simulation on a classical computer of this type of "Phase Estimation algorithms" performs when we apply it to solve the Eigen-Problem of Hermitian matrices. The result is a completely new, efficient and stable, parallel algorithm to compute an approximate spectral decomposition of any Hermitian matrix. The algorithm can be implemented by Boolean circuits in $O(\log^2 n)$ parallel time with a total cost of $O(n^{\omega+1})$ Boolean operations. This Boolean complexity matches the best known rigorous $O(\log^2 n)$ parallel time algorithms, but unlike those algorithms our algorithm is (logarithmically) stable, so further improvements may lead to practical implementations. All previous efficient and rigorous approaches to solve the Eigen-Problem use randomization to avoid bad conditioning, as we do too. Our algorithm makes further use of randomization in a completely new way, taking random powers of a unitary matrix to randomize the phases of its eigenvalues. Proving that a tiny Gaussian perturbation and a random polynomial power are sufficient to ensure almost pairwise independence of the phases $\pmod{2\pi}$ is the main technical contribution of this work. This randomization enables us, given a Hermitian matrix with well separated eigenvalues, to sample a random eigenvalue and produce an approximate eigenvector in $O(\log^2 n)$ parallel time and $O(n^\omega)$ Boolean complexity. We conjecture that further improvements of our method can provide a stable solution to the full approximate spectral decomposition problem with complexity similar to the complexity (up to a logarithmic factor) of sampling a single eigenvector.

Comment: Replacing previous version: parallel algorithm runs in total complexity $n^{\omega+1}$ and not $n^{\omega}$. However, the depth of the implementing circuit is $\log^2(n)$: hence comparable to fastest eigen-decomposition algorithms known.
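The effect of taking a power of a unitary matrix on its eigenvalue phases can be written out in a few lines. This is only an illustrative sketch: the choice $U = e^{iA}$ and the symbols $\lambda_j$, $v_j$, $m$ are assumptions made here for the example and are not taken from the abstract.

```latex
% Assumed setup: A is Hermitian with spectral decomposition
%   A = \sum_j \lambda_j v_j v_j^*, and U = e^{iA} (an assumption for illustration).
\begin{align*}
U   &= e^{iA} = \sum_{j} e^{i\lambda_j}\, v_j v_j^{*}, \\
U^{m} &= \sum_{j} e^{i m \lambda_j}\, v_j v_j^{*}
\qquad\Longrightarrow\qquad
\theta_j^{(m)} = m\,\lambda_j \pmod{2\pi}.
\end{align*}
% The eigenvectors v_j are unchanged; only the phases are multiplied by m
% and wrapped around the unit circle.
```

Under these assumptions, a random integer power $m$ leaves the eigenvectors intact while scattering the phases $m\lambda_j \pmod{2\pi}$ around the circle, which is the kind of phase randomization the abstract refers to.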

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Scalable Parallel Factorizations of SDD Matrices and Efficient Sampling for Gaussian Graphical Models

Authors: Cheng Dehua; Cheng Yu; Liu Yan; Peng Richard; Teng Shang-Hua
Publication date: 20/10/2014

Motivated by a sampling problem basic to computational statistical inference, we develop a nearly optimal algorithm for a fundamental problem in spectral graph theory and numerical analysis. Given an $n\times n$ SDDM matrix $\mathbf{M}$, and a constant $-1 \leq p \leq 1$, our algorithm gives efficient access to a sparse $n\times n$ linear operator $\tilde{\mathbf{C}}$ such that $\mathbf{M}^{p} \approx \tilde{\mathbf{C}} \tilde{\mathbf{C}}^\top$. The solution is based on factoring $\mathbf{M}$ into a product of simple and sparse matrices using squaring and spectral sparsification. For $\mathbf{M}$ with $m$ non-zero entries, our algorithm takes work nearly-linear in $m$, and polylogarithmic depth on a parallel machine with $m$ processors. This gives the first sampling algorithm that only requires nearly linear work and $n$ i.i.d. random univariate Gaussian samples to generate i.i.d. random samples for $n$-dimensional Gaussian random fields with SDDM precision matrices. For sampling this natural subclass of Gaussian random fields, it is optimal in the randomness and nearly optimal in the work and parallel complexity. In addition, our sampling algorithm can be directly extended to Gaussian random fields with SDD precision matrices.
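The link between the factorization and Gaussian sampling can be made explicit with a short derivation. It is a sketch under the assumption that the operator is built for $p = -1$ and that the factorization holds exactly; the symbols $z$ and $x$ are introduced here for illustration.

```latex
% Assumptions: p = -1, so \tilde{C}\tilde{C}^\top = M^{-1} (exact equality
% is assumed for the derivation; the abstract only claims approximation).
\begin{align*}
z &\sim \mathcal{N}(0, I_n), \qquad x = \tilde{\mathbf{C}}\, z, \\
\operatorname{Cov}(x)
  &= \tilde{\mathbf{C}}\; \mathbb{E}\!\left[z z^\top\right] \tilde{\mathbf{C}}^\top
   = \tilde{\mathbf{C}} \tilde{\mathbf{C}}^\top
   = \mathbf{M}^{-1}.
\end{align*}
% Hence x is a sample from the Gaussian field with precision matrix M,
% and it consumes exactly n univariate Gaussian draws (the entries of z).
```

Because $\tilde{\mathbf{C}}$ is $n\times n$, one sample of $x$ uses exactly $n$ univariate Gaussian draws, which is consistent with the randomness claim in the abstract.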

arXiv.org e-Print Archive

CiteSeerX

Optimal Parallel Randomized Algorithms for the Voronoi Diagram of Line Segments in the Plane and Related Problems

Authors: Rajasekaran Sanguthevar; Ramaswami Suneeta
Publication venue: ScholarlyCommons
Publication date: 01/01/1993

In this paper, we present an optimal parallel randomized algorithm for the Voronoi diagram of a set of n non-intersecting (except possibly at endpoints) line segments in the plane. Our algorithm runs in O(log n) time with very high probability and uses O(n) processors on a CRCW PRAM. This algorithm is optimal in terms of P·T (processor-time product) bounds, since the sequential time bound for this problem is Ω(n log n). Our algorithm improves by an O(log n) factor on the previously best known deterministic parallel algorithm, which runs in O(log² n) time using O(n) processors [13]. We obtain this result by using random sampling at two stages of our algorithm and using efficient randomized search techniques. This technique gives a direct optimal algorithm for the Voronoi diagram of points as well (all other optimal parallel algorithms for this problem use reduction from the 3-d convex hull construction).

CiteSeerX

ScholarlyCommons@Penn

Parallel Weighted Random Sampling

Author: Sanders Peter
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 27th Annual European Symposium on Algorithms (ESA 2019)
Publication date: 01/01/2019

Data structures for efficient sampling from a set of weighted items are an important building block of many applications. However, few parallel solutions are known. We close many of these gaps both for shared-memory and distributed-memory machines. We give efficient, fast, and practicable algorithms for sampling single items, k items with/without replacement, permutations, subsets, and reservoirs. We also give improved sequential algorithms for alias table construction and for sampling with replacement. Experiments on shared-memory parallel machines with up to 158 threads show near linear speedups both for construction and queries.
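For readers unfamiliar with alias tables, the following is a minimal sequential sketch of the classic construction (often attributed to Vose), not the improved or parallel algorithms of the paper. The function names `build_alias_table` and `sample` are hypothetical and chosen for illustration.

```python
import random


def build_alias_table(weights):
    """Build (prob, alias) arrays with the classic alias method in O(n) time.

    Illustrative textbook construction; not the paper's algorithm.
    """
    n = len(weights)
    total = sum(weights)
    if n == 0 or total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    # Scale so that the average bucket holds exactly 1.0.
    scaled = [w * n / total for w in weights]
    prob = [1.0] * n          # probability of keeping bucket i's own item
    alias = list(range(n))    # fallback item for bucket i
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]
        alias[s] = l
        # The large item donates the mass needed to fill bucket s.
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    # Any leftover buckets keep prob 1.0 (they differ from 1 only by rounding).
    return prob, alias


def sample(prob, alias, rng=random):
    """Draw one item index in O(1): pick a bucket, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

After the linear-time construction, each query costs one uniform bucket choice and one biased coin flip, which is what makes alias tables attractive as a sampling building block.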

Dagstuhl Research Online Publication Server

GraphVite: A High-Performance CPU-GPU Hybrid System for Node Embedding

Authors: Qu Meng; Tang Jian; Xu Shizhen; Zhu Zhaocheng
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2019

Learning continuous representations of nodes has recently attracted growing interest in both academia and industry, due to their simplicity and effectiveness in a variety of applications. Most existing node embedding algorithms and systems are capable of processing networks with hundreds of thousands or a few million nodes. However, how to scale them to networks that have tens of millions or even hundreds of millions of nodes remains a challenging problem. In this paper, we propose GraphVite, a high-performance CPU-GPU hybrid system for training node embeddings, by co-optimizing the algorithm and the system. On the CPU end, augmented edge samples are generated in parallel by online random walks on the network, and serve as the training data. On the GPU end, a novel parallel negative sampling is proposed to leverage multiple GPUs to train node embeddings simultaneously, without much data transfer and synchronization. Moreover, an efficient collaboration strategy is proposed to further reduce the synchronization cost between CPUs and GPUs. Experiments on multiple real-world networks show that GraphVite is highly efficient. It takes only about one minute for a network with 1 million nodes and 5 million edges on a single machine with 4 GPUs, and around 20 hours for a network with 66 million nodes and 1.8 billion edges. Compared to the current fastest system, GraphVite is about 50 times faster without any sacrifice in performance.

Comment: accepted at WWW 201

arXiv.org e-Print Archive

Crossref