94 research outputs found
Tools for efficient Deep Learning
In the era of Deep Learning (DL), there is a fast-growing demand for building and deploying Deep Neural Networks (DNNs) on various platforms. This thesis proposes five tools to address the challenges of designing DNNs that are efficient in time, resources, and power consumption.
We first present Aegis and SPGC, which address the challenge of improving the memory efficiency of DL training and inference. Aegis makes mixed precision training (MPT) more stable through layer-wise gradient scaling; empirical experiments show that Aegis can improve MPT accuracy by up to 4%. SPGC focuses on structured pruning: replacing standard convolution with group convolution (GConv) to avoid irregular sparsity. SPGC formulates GConv pruning as a channel permutation problem and proposes a novel heuristic polynomial-time algorithm. Common DNNs pruned by SPGC achieve up to 1% higher accuracy than those pruned by prior work.
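The abstract does not spell out Aegis's scaling rule; the following is a minimal sketch of the general idea of layer-wise gradient scaling, assuming numpy, with all helper names hypothetical. It is an illustration of the technique, not the thesis's actual algorithm.

```python
import numpy as np

# Minimal sketch of layer-wise gradient scaling for mixed-precision
# training (illustrative only -- not Aegis's actual algorithm; helper
# names are hypothetical). A single global loss scale can let small
# per-layer gradients underflow in fp16; picking a scale per layer
# keeps each layer's gradients representable.

def layerwise_scale(grad_fp32):
    """Power-of-two scale mapping the layer's largest |gradient| to ~1."""
    gmax = float(np.max(np.abs(grad_fp32)))
    if gmax == 0.0:
        return 1.0
    return 2.0 ** np.floor(np.log2(1.0 / gmax))

def cast_roundtrip(grad_fp32, scale=1.0):
    """Scale, cast to fp16 (as gradients are in MPT), then unscale."""
    return (grad_fp32 * scale).astype(np.float16).astype(np.float32) / scale

grads = np.full(8, 1e-8, dtype=np.float32)  # below fp16's subnormal range
print(cast_roundtrip(grads))                          # flushed to zero
print(cast_roundtrip(grads, layerwise_scale(grads)))  # values survive
```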
This thesis also addresses the challenges arising from the gap between DNN descriptions and executables, with Polygeist for software and POLSCA for hardware. Several novel techniques, e.g. statement splitting and memory partitioning, are explored and used to extend polyhedral optimisation. Polygeist speeds up sequential and parallel software execution by 2.53x and 9.47x respectively on Polybench/C. POLSCA achieves a 1.5x speedup over hardware designs generated directly from high-level synthesis on Polybench/C.
Moreover, this thesis presents Deacon, a framework that generates FPGA-based DNN accelerators with streaming architectures and advanced pipelining techniques, addressing the challenges posed by heterogeneous convolutions and residual connections. Deacon provides fine-grained pipelining, graph-level optimisation, and heuristic design-space exploration by graph colouring. Compared with prior designs, Deacon improves resource/power efficiency by 1.2x/3.5x for MobileNets and 1.0x/2.8x for SqueezeNets.
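The abstract does not describe Deacon's colouring pass in detail; as a rough illustration of the general heuristic, here is a classic greedy graph colouring over a made-up conflict graph (all names hypothetical, not Deacon's actual data structures).

```python
# Illustrative sketch (not Deacon's actual pass): greedy graph colouring,
# the classic heuristic for grouping mutually conflicting units -- e.g.
# assigning accelerator stages that cannot share a resource to different
# colours. The conflict graph below is a made-up example.

def greedy_colouring(adjacency):
    """Assign each node the smallest colour unused by its neighbours."""
    colour = {}
    # Visiting high-degree nodes first usually reduces the colour count.
    for node in sorted(adjacency, key=lambda n: -len(adjacency[n])):
        used = {colour[nb] for nb in adjacency[node] if nb in colour}
        colour[node] = next(c for c in range(len(adjacency)) if c not in used)
    return colour

conflicts = {                      # edge = "must not share a resource"
    "conv1": {"conv2", "res_add"},
    "conv2": {"conv1", "conv3"},
    "conv3": {"conv2", "res_add"},
    "res_add": {"conv1", "conv3"},
}
print(greedy_colouring(conflicts)) # e.g. 2 colours -> 2 resource groups
```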
All these tools are open source, and some have already gained public engagement. We believe they can make efficient deep learning applications easier to build and deploy.
Hyperbolic Concentration, Anti-concentration, and Discrepancy
The Chernoff bound is a fundamental tool in theoretical computer science. It
has been extensively used in randomized algorithm design and stochastic-type
analysis. Discrepancy theory, which deals with finding a bi-coloring of a set
system such that the coloring of each set is balanced, has a huge number of
applications in the design of approximation algorithms. The Chernoff bound
[Che52] implies that a random bi-coloring of any set system with $m$ sets over
$n$ elements will have discrepancy $O(\sqrt{n \log m})$ with high probability,
while the famous result by Spencer [Spe85] shows that an $O(\sqrt{n})$
discrepancy solution exists (for $m = O(n)$ sets).
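As a quick empirical illustration of the random-coloring bound (our own sketch, not from the paper), one can sample a random set system and a random bi-colouring and compare the observed discrepancy against the $\sqrt{n \log m}$ scale:

```python
import numpy as np

# Quick empirical check (not from the paper): a uniformly random
# bi-colouring x in {-1,+1}^n of a random set system with m sets over
# n elements has discrepancy max_j |sum_{i in S_j} x_i| on the order
# of sqrt(n log m), while Spencer guarantees that a colouring of
# discrepancy O(sqrt(n)) exists.

rng = np.random.default_rng(0)
n, m = 1000, 1000
sets = rng.integers(0, 2, size=(m, n))   # incidence matrix of the sets
x = rng.choice([-1, 1], size=n)          # uniformly random bi-colouring
disc = np.max(np.abs(sets @ x))
print(disc, np.sqrt(n * np.log(m)))      # disc is on the Chernoff scale
```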
The study of hyperbolic polynomials dates back to the early 20th century, when
they were used by G{\aa}rding to solve PDEs [G{\aa}r59]. In recent years, more
applications have been found in control theory, optimization, real algebraic
geometry, and so on. In particular, the breakthrough result by Marcus,
Spielman, and Srivastava [MSS15] uses the theory of hyperbolic polynomials to
prove the Kadison-Singer conjecture [KS59], which is closely related to
discrepancy theory.
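For readers unfamiliar with the object, the standard definition (background, not a result of this paper) is:

```latex
% Background (standard definitions, not results of this paper).
A homogeneous polynomial $p \in \mathbb{R}[x_1, \dots, x_n]$ is
\emph{hyperbolic} with respect to a direction $e \in \mathbb{R}^n$ if
$p(e) \neq 0$ and, for every $x \in \mathbb{R}^n$, the univariate
polynomial $t \mapsto p(x - t e)$ has only real roots
$\lambda_1(x) \ge \dots \ge \lambda_d(x)$. The associated
\emph{hyperbolic cone} is
\[
  \Lambda_+ = \{ x \in \mathbb{R}^n : \lambda_d(x) \ge 0 \}.
\]
% For $p = \det$ on symmetric matrices and $e = I$, the $\lambda_i$ are
% the eigenvalues, so the matrix Chernoff setting is the special case of
% the determinant polynomial mentioned below.
```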
In this paper, we present a list of new results for hyperbolic polynomials:
* We show two nearly optimal hyperbolic Chernoff bounds: one for Rademacher
sums of arbitrary vectors and another for random vectors in the hyperbolic cone.
* We show a hyperbolic anti-concentration bound.
* We generalize the hyperbolic Kadison-Singer theorem [Br\"a18] to vectors
in sub-isotropic position, and prove a hyperbolic Spencer theorem for vectors
of any constant hyperbolic rank.
The classical matrix Chernoff and discrepancy results are based on the
determinant polynomial. To the best of our knowledge, this paper is the first
work that shows either concentration or anti-concentration results for
hyperbolic polynomials. We hope our findings provide more insights into
hyperbolic and discrepancy theories.
Revisiting Quantum Algorithms for Linear Regressions: Quadratic Speedups without Data-Dependent Parameters
Linear regression is one of the most fundamental linear algebra problems.
Given a dense matrix $A \in \mathbb{R}^{n \times d}$ and a vector
$b \in \mathbb{R}^n$, the goal is to find $x'$ such that
$\| A x' - b \|_2^2 \le (1+\epsilon) \min_{x} \| A x - b \|_2^2$. The best
classical algorithm takes $O(nd) + \mathrm{poly}(d/\epsilon)$ time [Clarkson
and Woodruff STOC 2013, Nelson and Nguyen FOCS 2013]. On the other hand,
quantum linear regression algorithms can achieve exponential quantum speedups,
as shown in [Wang Phys. Rev. A 96, 012335, Kerenidis and Prakash ITCS 2017,
Chakraborty, Gily{\'e}n and Jeffery ICALP 2019]. However, the running times of
these algorithms depend on some quantum linear algebra-related parameters, such
as $\kappa(A)$, the condition number of $A$. In this work, we develop a quantum
algorithm that runs in $\widetilde{O}(\epsilon^{-1}\sqrt{n}d^{1.5}) +
\mathrm{poly}(d/\epsilon)$ time. It provides a quadratic quantum speedup in $n$
over the classical lower bound, without any dependence on data-dependent
parameters. In addition, we show that our result can be generalized to multiple
regression and ridge linear regression.
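For context, here is our own sketch of the classical sketch-and-solve baseline in the Clarkson-Woodruff style (not the paper's quantum algorithm): compress the $n$ rows of $A$ with a CountSketch matrix $S$, then solve the small regression.

```python
import numpy as np

# Sketch-and-solve illustration of the classical baseline (Clarkson-
# Woodruff style), not the paper's quantum algorithm: apply a CountSketch
# S (one signed nonzero per row) in O(nd) time, then solve the small
# problem min_x ||S A x - S b||_2 in poly(d/eps) time.

rng = np.random.default_rng(1)
n, d, sketch_rows = 100_000, 20, 2_000

A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

rows = rng.integers(0, sketch_rows, size=n)   # hash each row to a bucket
signs = rng.choice([-1.0, 1.0], size=n)
SA = np.zeros((sketch_rows, d))
Sb = np.zeros(sketch_rows)
np.add.at(SA, rows, signs[:, None] * A)       # accumulate S @ A
np.add.at(Sb, rows, signs * b)                # accumulate S @ b

x_sketch = np.linalg.lstsq(SA, Sb, rcond=None)[0]
x_exact = np.linalg.lstsq(A, b, rcond=None)[0]
# Residual ratio should be close to 1, i.e. a (1+eps)-approximation.
print(np.linalg.norm(A @ x_sketch - b) / np.linalg.norm(A @ x_exact - b))
```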
Collective modes of a collisional anisotropic quark-gluon plasma
In this paper we consider the collective modes of a momentum-space
anisotropic quark-gluon plasma taking into account the effect of collisions
between the plasma constituents. Our analysis is carried out using a
collisional kernel of Bhatnagar-Gross-Krook form and extends prior analyses in
the literature by considering all possible angles of propagation of the gluonic
modes relative to the momentum-anisotropy axis. We extract both the stable and
unstable modes as a function of the collision rate and confirm prior findings
that gluonic unstable modes can be eliminated from the spectrum if the
collision rate is sufficiently large. In addition, we discuss the conditions
necessary for the existence of unstable modes and present evidence that
unstable mode growth rates are maximal for modes with momentum along the
anisotropy direction. Finally, we demonstrate that when there is a finite
collisional rate, gluonic unstable modes are absent from the spectrum at both
small and large momentum anisotropy. These results pave the way for
understanding the impact of collisions on a variety of non-equilibrium
quark-gluon plasma observables.
Comment: 19 pages and 15 figures
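For reference, the Bhatnagar-Gross-Krook kernel has the standard relaxation-time form shown below (a textbook statement; the paper's precise conventions and normalization may differ):

```latex
% Standard relaxation-time (BGK) form of the collisional kernel
% (the paper's precise conventions may differ, e.g. number-conserving
% variants rescale the equilibrium term):
\[
  C[f] \;=\; -\,\nu \left( f(t,\mathbf{x},\mathbf{p})
             - f_{\mathrm{eq}}(\mathbf{p}) \right),
\]
% i.e. the distribution function relaxes toward a local equilibrium
% $f_{\mathrm{eq}}$ at a rate set by the collision rate $\nu$; the
% $\nu \to 0$ limit recovers the collisionless (Vlasov) analysis.
```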
Symmetric Sparse Boolean Matrix Factorization and Applications
In this work, we study a variant of nonnegative matrix factorization where we
wish to find a symmetric factorization of a given input matrix into a sparse,
Boolean matrix. Formally speaking, given $\mathbf{M} \in \mathbb{Z}^{m \times m}$,
we want to find $\mathbf{W} \in \{0,1\}^{m \times r}$ such that
$\| \mathbf{M} - \mathbf{W}\mathbf{W}^\top \|_0$ is minimized among all
$\mathbf{W}$ for which each row is $k$-sparse. This question turns out to be
closely related to a number of questions, like recovering a hypergraph from its
line graph, as well as reconstruction attacks for private neural network
training.
As this problem is hard in the worst case, we study a natural average-case
variant that arises in the context of these reconstruction attacks:
$\mathbf{M} = \mathbf{W}\mathbf{W}^\top$ for a random Boolean matrix
$\mathbf{W}$ with $k$-sparse rows, and the goal is to recover $\mathbf{W}$ up
to column permutation. Equivalently, this can be thought of as recovering a
uniformly random $k$-uniform hypergraph from its line graph.
Our main result is a polynomial-time algorithm for this problem based on
bootstrapping higher-order information about $\mathbf{W}$ and then decomposing
an appropriate tensor. The key ingredient in our analysis, which may be of
independent interest, is to show that such a matrix $\mathbf{W}$ has full
column rank with high probability as soon as $m = \widetilde{\Omega}(r)$, which
we do using tools from Littlewood-Offord theory and estimates for binary
Krawtchouk polynomials.
Comment: 33 pages, to appear in Innovations in Theoretical Computer Science
(ITCS 2022), v2: updated references
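To make the average-case model concrete, here is our own sketch of an instance (the recovery algorithm itself is the paper's contribution and is not reproduced here):

```python
import numpy as np

# Concrete instance of the average-case model. Rows of W are the
# hyperedges of a random k-uniform hypergraph on r vertices; M = W W^T
# is (up to its diagonal) the weighted line graph: M[i, j] counts the
# vertices shared by hyperedges i and j.

rng = np.random.default_rng(0)
m, r, k = 200, 50, 3                  # m hyperedges, r vertices, k-sparse

W = np.zeros((m, r), dtype=np.int64)
for i in range(m):
    W[i, rng.choice(r, size=k, replace=False)] = 1  # a random k-subset

M = W @ W.T
assert (np.diag(M) == k).all()        # the diagonal only reveals k
print(M[0, 1], "vertices shared by hyperedges 0 and 1")
# Goal of the paper's algorithm: recover W from M up to column permutation.
```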