Simple parallel and distributed algorithms for spectral graph sparsification
We describe a simple algorithm for spectral graph sparsification, based on
iterative computations of weighted spanners and uniform sampling. Leveraging
the algorithms of Baswana and Sen for computing spanners, we obtain the first
distributed spectral sparsification algorithm. We also obtain a parallel
algorithm with improved work and time guarantees. Combining this algorithm with
the parallel framework of Peng and Spielman for solving symmetric diagonally
dominant linear systems, we get a parallel solver which is much closer to being
practical and significantly more efficient in terms of the total work.
Optimal approximate matrix product in terms of stable rank
We prove, using the subspace embedding guarantee in a black box way, that one
can achieve the spectral norm guarantee for approximate matrix multiplication
with a dimensionality-reducing map having $m = O(\tilde{r}/\varepsilon^2)$
rows. Here $\tilde{r}$ is the maximum stable rank, i.e. the squared ratio of
Frobenius and operator norms, of the two matrices being multiplied. This is a
quantitative improvement over previous work of [MZ11, KVZ14], and is also
optimal for any oblivious dimensionality-reducing map. Furthermore, due to the
black box reliance on the subspace embedding property in our proofs, our
theorem can be applied to a much more general class of sketching matrices than
what was known before, in addition to achieving better bounds. For example, one
can apply our theorem to efficient subspace embeddings such as the Subsampled
Randomized Hadamard Transform or sparse subspace embeddings, or even with
subspace embedding constructions that may be developed in the future.
Our main theorem, via connections with spectral error matrix multiplication
shown in prior work, implies quantitative improvements for approximate least
squares regression and low rank approximation. Our main result has also already
been applied to improve dimensionality reduction guarantees for $k$-means
clustering [CEMMP14], and implies new results for nonparametric regression
[YPW15].
We also separately point out that the proof of the "BSS" deterministic
row-sampling result of [BSS12] can be modified to show that for any matrices
$A, B$ of stable rank at most $\tilde{r}$, one can achieve the spectral norm
guarantee for approximate matrix multiplication of $A^T B$ by deterministically
sampling $O(\tilde{r}/\varepsilon^2)$ rows that can be found in polynomial
time. The original result of [BSS12] was for rank instead of stable rank. Our
observation leads to a stronger version of a main theorem of [KMST10].
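A small numerical illustration of the sketched-product guarantee follows (a sketch under assumptions, not the paper's construction or constants; the CountSketch-style embedding, the sketch size `m`, and the test matrices are illustrative choices):

```python
# Sketch both matrices with a sparse "CountSketch"-style subspace embedding S
# and compare (SA)^T (SB) against A^T B in spectral norm, relative to the
# scale ||A||_2 ||B||_2 suggested by the stable-rank bound.
import numpy as np

def countsketch(m, n, rng):
    """Sparse embedding: one random +/-1 entry per column."""
    S = np.zeros((m, n))
    rows = rng.integers(0, m, size=n)
    signs = rng.choice([-1.0, 1.0], size=n)
    S[rows, np.arange(n)] = signs
    return S

rng = np.random.default_rng(0)
n, d = 2000, 50
A = rng.standard_normal((n, d)) @ np.diag(np.linspace(1.0, 0.01, d))
B = rng.standard_normal((n, d))

# Maximum stable rank of the two factors: squared Frobenius/operator norm ratio.
stable_rank = max(np.linalg.norm(A)**2 / np.linalg.norm(A, 2)**2,
                  np.linalg.norm(B)**2 / np.linalg.norm(B, 2)**2)
eps = 0.25
m = int(np.ceil(stable_rank / eps**2))     # illustrative sketch size

S = countsketch(m, n, rng)
exact = A.T @ B
approx = (S @ A).T @ (S @ B)
err = np.linalg.norm(exact - approx, 2)
scale = np.linalg.norm(A, 2) * np.linalg.norm(B, 2)
print(f"m = {m}, spectral error / (||A|| ||B||) = {err / scale:.3f}")
```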
Ultrasparse Ultrasparsifiers and Faster Laplacian System Solvers
In this paper we provide an $O(m (\log\log n)^{O(1)} \log(1/\epsilon))$-expected time algorithm for solving Laplacian systems on
$n$-node, $m$-edge graphs, improving upon the previous best expected
runtime of $O(m \sqrt{\log n}\,(\log\log n)^{O(1)} \log(1/\epsilon))$ achieved
by (Cohen, Kyng, Miller, Pachocki, Peng, Rao, Xu 2014). To obtain this result
we provide efficient constructions of $\ell_p$-stretch graph approximations
with improved stretch and sparsity bounds. Additionally, as motivation for this
work, we show that for every set of vectors in $\mathbb{R}^d$ (not just those
induced by graphs) and all $k > 1$ there exist ultra-sparsifiers with $d - 1 + O(d/\sqrt{k})$ re-weighted vectors of relative condition number at most $k$. For
small $k$, this improves upon the previous best known multiplicative factor, which is only known for the graph case.
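To make the notion of relative condition number concrete, here is a toy dense computation (illustrative only, and not the paper's ultrasparsifier construction; the spanning-tree-plus-extra-edges subgraph and the graph sizes are arbitrary choices):

```python
# Precondition a graph Laplacian L_G by the Laplacian L_H of a sparser
# subgraph (a spanning tree plus a few extra edges) and report the
# relative condition number kappa(L_H^+ L_G) over the nonzero spectrum.
import numpy as np
import networkx as nx

def laplacian(G, nodes):
    return nx.laplacian_matrix(G, nodelist=nodes, weight="weight").toarray().astype(float)

rng = np.random.default_rng(0)
G = nx.gnm_random_graph(60, 400, seed=1)
nx.set_edge_attributes(G, 1.0, "weight")
nodes = list(G.nodes())

# Subgraph H: a spanning tree plus a handful of extra edges (illustrative choice).
T = nx.minimum_spanning_tree(G, weight="weight")
H = T.copy()
extra = [e for e in G.edges() if not H.has_edge(*e)]
for i in rng.choice(len(extra), size=10, replace=False):
    H.add_edge(*extra[i], weight=1.0)

LG, LH = laplacian(G, nodes), laplacian(H, nodes)
M = np.linalg.pinv(LH) @ LG
eigs = np.sort(np.real(np.linalg.eigvals(M)))
nonzero = eigs[eigs > 1e-8]          # drop the shared all-ones nullspace
print("relative condition number kappa(L_H^+ L_G) ~", nonzero[-1] / nonzero[0])
```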
SCALABLE INTEGRATED CIRCUIT SIMULATION ALGORITHMS FOR ENERGY-EFFICIENT TERAFLOP HETEROGENEOUS PARALLEL COMPUTING PLATFORMS
Integrated circuit technology has gone through several decades of aggressive scaling. It is increasingly challenging to analyze the growing design complexity. Post-layout SPICE simulation can be computationally prohibitive due to the huge number of parasitic elements, which can easily inflate computation and memory costs. As device sizes decrease, circuits become more vulnerable to process variations. Designers need to statistically estimate the probability that a circuit does not meet its performance metric, which requires millions of simulations to capture rare failure events.
Recently, multiprocessors with heterogeneous architectures have emerged as mainstream computing platforms. Heterogeneous computing platforms can achieve high-throughput, energy-efficient computing. However, applying such platforms is not trivial and requires reinventing existing algorithms to fully utilize the computing resources. This dissertation presents several new algorithms to address these two significant and challenging issues on heterogeneous platforms.
Harmonic Balance (HB) analysis is essential for efficient verification of large post-layout RF and microwave integrated circuits (ICs). However, existing methods either suffer from excessively long simulation times and prohibitively large memory consumption or exhibit poor stability. This dissertation introduces a novel transient-simulation-guided graph sparsification technique, as well as an efficient runtime performance modeling approach tailored for heterogeneous manycore CPU-GPU computing systems, to build nearly optimal subgraph preconditioners that lead to minimum HB simulation runtime. Additionally, we propose a novel heterogeneous parallel sparse block matrix algorithm that takes advantage of the structure of HB Jacobian matrices as well as the GPU's streaming multiprocessors to achieve optimal workload balancing during the preconditioning phase of HB analysis. We also show how the proposed preconditioned iterative algorithm can efficiently adapt to heterogeneous computing systems with different CPU and GPU computing capabilities. Extensive experimental results show that our HB solver can achieve up to 20X speedups and 5X memory reduction when compared with a state-of-the-art direct solver highly optimized for twelve-core CPUs.
In today's variation-aware IC designs, cell characterization and SRAM memory yield analysis require many thousands or even millions of repeated SPICE simulations for relatively small nonlinear circuits. In this dissertation, for the first time, we present a massively parallel GPU-based SPICE simulator, TinySPICE, for efficiently analyzing small nonlinear circuits. TinySPICE integrates a highly optimized shared-memory-based matrix solver and a fast parametric three-dimensional (3D) LUT-based device evaluation method. A novel circuit clustering method is also proposed to improve the stability and efficiency of the matrix solver. Compared with a CPU-based SPICE simulator, TinySPICE achieves up to 264X speedups for parametric SRAM yield analysis without loss of accuracy.
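The preconditioned iterative pattern at the core of the HB solver can be sketched generically as follows (this is not the dissertation's GPU implementation or its subgraph preconditioner; the random test matrix and the incomplete-LU preconditioner are stand-ins for illustration):

```python
# Solve a large sparse "Jacobian-like" system with GMRES, preconditioned by an
# incomplete-LU factorization of an approximate matrix (generic illustration).
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

rng = np.random.default_rng(0)
n = 2000
# Illustrative sparse, diagonally dominant test matrix.
A = sp.random(n, n, density=2e-3, random_state=0, format="csr")
A = A + A.T + sp.identity(n) * 5.0
b = rng.standard_normal(n)

# Preconditioner: incomplete LU of the (here, lightly thresholded) matrix.
ilu = spla.spilu(A.tocsc(), drop_tol=1e-3, fill_factor=10)
M = spla.LinearOperator((n, n), matvec=ilu.solve)

x, info = spla.gmres(A, b, M=M, maxiter=300)
print("converged" if info == 0 else f"gmres info = {info}",
      "residual:", np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```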
The Complexity of Network Design for s-t Effective Resistance
We consider a new problem of designing a network with small $s$-$t$ effective resistance.
In this problem, we are given an undirected graph where each edge $e$ has a cost $c_e$ and a resistance $r_e$, two designated vertices $s$ and $t$, and a cost budget $B$.
Our goal is to choose a subgraph to minimize the $s$-$t$ effective resistance, subject to the constraint that the total cost in the subgraph is at most $B$.
This problem has applications in electrical network design and is an interpolation between the shortest path problem and the minimum cost flow problem.
We present algorithmic and hardness results for this problem.
On the hardness side, we show that the problem is NP-hard by reducing the 3-dimensional matching problem to our problem.
On the algorithmic side, we use dynamic programming to obtain a fully polynomial time approximation scheme when the input graph is a series-parallel graph. Finally, we propose a greedy algorithm for general graphs, in which we add a path at each iteration, and we conjecture that the algorithm is a -approximation algorithm for the problem.
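For concreteness, here is a brute-force toy version of the problem statement (exponential-time, only for tiny instances, and not the FPTAS or greedy algorithm above; the edge costs, resistances, and budget are made-up example data):

```python
# Among all edge subsets with total cost at most B, minimize the s-t effective
# resistance, computed from the Laplacian pseudoinverse of the chosen subgraph.
import itertools
import numpy as np
import networkx as nx

def st_effective_resistance(nodes, edge_list, s, t):
    H = nx.Graph(); H.add_nodes_from(nodes)
    for u, v, r in edge_list:
        H.add_edge(u, v, weight=1.0 / r)       # edge weight = conductance
    if not nx.has_path(H, s, t):
        return np.inf
    L = nx.laplacian_matrix(H, nodelist=nodes).toarray().astype(float)
    e = np.zeros(len(nodes))
    e[nodes.index(s)], e[nodes.index(t)] = 1.0, -1.0
    return float(e @ np.linalg.pinv(L) @ e)

# Edges as (u, v, cost c_e, resistance r_e); budget B (example data).
edges = [("s", "a", 1, 1.0), ("a", "t", 1, 1.0),
         ("s", "b", 2, 0.5), ("b", "t", 2, 0.5),
         ("s", "t", 5, 3.0)]
nodes = ["s", "a", "b", "t"]
B = 6

best = (np.inf, None)
for k in range(len(edges) + 1):
    for subset in itertools.combinations(edges, k):
        if sum(c for _, _, c, _ in subset) <= B:
            R = st_effective_resistance(nodes, [(u, v, r) for u, v, _, r in subset], "s", "t")
            best = min(best, (R, subset), key=lambda x: x[0])

print(f"best s-t effective resistance within budget {B}: {best[0]:.3f}")
```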
Topics in Matrix Sampling Algorithms
We study three fundamental problems of Linear Algebra, lying at the heart of
various Machine Learning applications, namely: 1)"Low-rank Column-based Matrix
Approximation". We are given a matrix A and a target rank k. The goal is to
select a subset of columns of A and, by using only these columns, compute a
rank k approximation to A that is as good as the rank k approximation that
would have been obtained by using all the columns; 2) "Coreset Construction in
Least-Squares Regression". We are given a matrix A and a vector b. Consider the
(over-constrained) least-squares problem of minimizing ||Ax-b||, over all
vectors x in D. The domain D represents the constraints on the solution and can
be arbitrary. The goal is to select a subset of the rows of A and b and, by
using only these rows, find a solution vector that is as good as the solution
vector that would have been obtained by using all the rows; 3) "Feature
Selection in K-means Clustering". We are given a set of points described with
respect to a large number of features. The goal is to select a subset of the
features and, by using only this subset, obtain a k-partition of the points
that is as good as the partition that would have been obtained by using all the
features. We present novel algorithms for all three problems mentioned above.
Our results can be viewed as follow-up research to a line of work known as
"Matrix Sampling Algorithms". [Frieze, Kanna, Vempala, 1998] presented the
first such algorithm for the Low-rank Matrix Approximation problem. Since then,
such algorithms have been developed for several other problems, e.g. Graph
Sparsification and Linear Equation Solving. Our contributions to this line of
research are: (i) improved algorithms for Low-rank Matrix Approximation and Regression, and (ii) algorithms for a new problem domain (K-means Clustering).
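A minimal sketch of the row-sampling idea behind least-squares coresets (leverage-score sampling with rescaling; this illustrates the general approach rather than the thesis's specific algorithms or guarantees, and the problem sizes and sample count are arbitrary):

```python
# Sample rows of (A, b) with probability proportional to the leverage scores
# of A, reweight them, and solve the smaller least-squares problem.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 5000, 20, 400            # m = number of sampled rows (illustrative)
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.1 * rng.standard_normal(n)

# Leverage scores: squared row norms of an orthonormal basis for range(A).
Q, _ = np.linalg.qr(A)
lev = np.sum(Q**2, axis=1)
p = lev / lev.sum()

idx = rng.choice(n, size=m, replace=True, p=p)
w = 1.0 / np.sqrt(m * p[idx])      # rescaling keeps the sampled problem unbiased
As, bs = w[:, None] * A[idx], w * b[idx]

x_full = np.linalg.lstsq(A, b, rcond=None)[0]
x_core = np.linalg.lstsq(As, bs, rcond=None)[0]
print("relative error of coreset solution:",
      np.linalg.norm(A @ x_core - b) / np.linalg.norm(A @ x_full - b))
```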