94 research outputs found
Tools for efficient Deep Learning
In the era of Deep Learning (DL), there is a fast-growing demand for building and deploying Deep Neural Networks (DNNs) on various platforms. This thesis proposes five tools to address the challenges of designing DNNs that are efficient in time, resources, and power consumption.
We first present Aegis and SPGC, which address the challenge of improving the memory efficiency of DL training and inference. Aegis makes mixed precision training (MPT) more stable through layer-wise gradient scaling; empirical experiments show that Aegis can improve MPT accuracy by up to 4%. SPGC focuses on structured pruning: replacing standard convolution with group convolution (GConv) to avoid irregular sparsity. SPGC formulates GConv pruning as a channel permutation problem and proposes a novel heuristic polynomial-time algorithm. Common DNNs pruned by SPGC achieve up to 1% higher accuracy than those pruned by prior work.
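The abstract does not spell out Aegis's scaling rule; the following is a minimal sketch of the general idea of layer-wise gradient scaling, assuming numpy, with all helper names hypothetical. It is an illustration of the technique, not the thesis's actual algorithm.

```python
import numpy as np

# Minimal sketch of layer-wise gradient scaling for mixed-precision
# training (illustrative only -- not Aegis's actual algorithm; helper
# names are hypothetical). A single global loss scale can let small
# per-layer gradients underflow in fp16; picking a scale per layer
# keeps each layer's gradients representable.

def layerwise_scale(grad_fp32):
    """Power-of-two scale mapping the layer's largest |gradient| to ~1."""
    gmax = float(np.max(np.abs(grad_fp32)))
    if gmax == 0.0:
        return 1.0
    return 2.0 ** np.floor(np.log2(1.0 / gmax))

def cast_roundtrip(grad_fp32, scale=1.0):
    """Scale, cast to fp16 (as gradients are in MPT), then unscale."""
    return (grad_fp32 * scale).astype(np.float16).astype(np.float32) / scale

grads = np.full(8, 1e-8, dtype=np.float32)  # below fp16's subnormal range
print(cast_roundtrip(grads))                          # flushed to zero
print(cast_roundtrip(grads, layerwise_scale(grads)))  # values survive
```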
This thesis also addresses the challenges arising from the gap between DNN descriptions and executables, with Polygeist for software and POLSCA for hardware. Several novel techniques, e.g. statement splitting and memory partitioning, are explored and used to extend polyhedral optimisation. Polygeist speeds up sequential and parallel software execution by 2.53x and 9.47x respectively on Polybench/C. POLSCA achieves a 1.5x speedup over hardware designs generated directly from high-level synthesis on Polybench/C.
Moreover, this thesis presents Deacon, a framework that generates FPGA-based DNN accelerators with streaming architectures and advanced pipelining techniques, addressing the challenges posed by heterogeneous convolutions and residual connections. Deacon provides fine-grained pipelining, graph-level optimisation, and heuristic design-space exploration by graph colouring. Compared with prior designs, Deacon improves resource/power efficiency by 1.2x/3.5x for MobileNets and 1.0x/2.8x for SqueezeNets.
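The abstract does not describe Deacon's colouring pass in detail; as a rough illustration of the general heuristic, here is a classic greedy graph colouring over a made-up conflict graph (all names hypothetical, not Deacon's actual data structures).

```python
# Illustrative sketch (not Deacon's actual pass): greedy graph colouring,
# the classic heuristic for grouping mutually conflicting units -- e.g.
# assigning accelerator stages that cannot share a resource to different
# colours. The conflict graph below is a made-up example.

def greedy_colouring(adjacency):
    """Assign each node the smallest colour unused by its neighbours."""
    colour = {}
    # Visiting high-degree nodes first usually reduces the colour count.
    for node in sorted(adjacency, key=lambda n: -len(adjacency[n])):
        used = {colour[nb] for nb in adjacency[node] if nb in colour}
        colour[node] = next(c for c in range(len(adjacency)) if c not in used)
    return colour

conflicts = {                      # edge = "must not share a resource"
    "conv1": {"conv2", "res_add"},
    "conv2": {"conv1", "conv3"},
    "conv3": {"conv2", "res_add"},
    "res_add": {"conv1", "conv3"},
}
print(greedy_colouring(conflicts)) # e.g. 2 colours -> 2 resource groups
```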
All these tools are open source, and some have already gained public engagement. We believe they can make efficient deep learning applications easier to build and deploy.
Hyperbolic Concentration, Anti-concentration, and Discrepancy
The Chernoff bound is a fundamental tool in theoretical computer science. It
has been extensively used in randomized algorithm design and stochastic-type
analysis. Discrepancy theory, which deals with finding a bi-coloring of a set
system such that the coloring of each set is balanced, has a huge number of
applications in the design of approximation algorithms. The Chernoff bound
[Che52] implies that a random bi-coloring of any set system with $m$ sets over
$n$ elements will have discrepancy $O(\sqrt{n \log m})$ with high probability,
while the famous result by Spencer [Spe85] shows that an $O(\sqrt{n})$
discrepancy solution exists (for $m = O(n)$ sets).
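As a quick empirical illustration of the random-coloring bound (our own sketch, not from the paper), one can sample a random set system and a random bi-colouring and compare the observed discrepancy against the $\sqrt{n \log m}$ scale:

```python
import numpy as np

# Quick empirical check (not from the paper): a uniformly random
# bi-colouring x in {-1,+1}^n of a random set system with m sets over
# n elements has discrepancy max_j |sum_{i in S_j} x_i| on the order
# of sqrt(n log m), while Spencer guarantees that a colouring of
# discrepancy O(sqrt(n)) exists.

rng = np.random.default_rng(0)
n, m = 1000, 1000
sets = rng.integers(0, 2, size=(m, n))   # incidence matrix of the sets
x = rng.choice([-1, 1], size=n)          # uniformly random bi-colouring
disc = np.max(np.abs(sets @ x))
print(disc, np.sqrt(n * np.log(m)))      # disc is on the Chernoff scale
```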
The study of hyperbolic polynomials dates back to the early 20th century, when
they were used by G{\aa}rding to solve PDEs [G{\aa}r59]. In recent years, more
applications have been found in control theory, optimization, real algebraic
geometry, and so on. In particular, the breakthrough result by Marcus,
Spielman, and Srivastava [MSS15] uses the theory of hyperbolic polynomials to
prove the Kadison-Singer conjecture [KS59], which is closely related to
discrepancy theory.
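For readers unfamiliar with the object, the standard definition (background, not a result of this paper) is:

```latex
% Background (standard definitions, not results of this paper).
A homogeneous polynomial $p \in \mathbb{R}[x_1, \dots, x_n]$ is
\emph{hyperbolic} with respect to a direction $e \in \mathbb{R}^n$ if
$p(e) \neq 0$ and, for every $x \in \mathbb{R}^n$, the univariate
polynomial $t \mapsto p(x - t e)$ has only real roots
$\lambda_1(x) \ge \dots \ge \lambda_d(x)$. The associated
\emph{hyperbolic cone} is
\[
  \Lambda_+ = \{ x \in \mathbb{R}^n : \lambda_d(x) \ge 0 \}.
\]
% For $p = \det$ on symmetric matrices and $e = I$, the $\lambda_i$ are
% the eigenvalues, so the matrix Chernoff setting is the special case of
% the determinant polynomial mentioned below.
```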
In this paper, we present a list of new results for hyperbolic polynomials:
* We show two nearly optimal hyperbolic Chernoff bounds: one for Rademacher
sums of arbitrary vectors and another for random vectors in the hyperbolic cone.
* We show a hyperbolic anti-concentration bound.
* We generalize the hyperbolic Kadison-Singer theorem [Br\"a18] to vectors
in sub-isotropic position, and prove a hyperbolic Spencer theorem for vectors
of any constant hyperbolic rank.
The classical matrix Chernoff and discrepancy results are based on the
determinant polynomial. To the best of our knowledge, this paper is the first
work that shows either concentration or anti-concentration results for
hyperbolic polynomials. We hope our findings provide more insights into
hyperbolic and discrepancy theories.
Revisiting Quantum Algorithms for Linear Regressions: Quadratic Speedups without Data-Dependent Parameters
Linear regression is one of the most fundamental linear algebra problems.
Given a dense matrix $A \in \mathbb{R}^{n \times d}$ and a vector
$b \in \mathbb{R}^n$, the goal is to find $x'$ such that
$\| A x' - b \|_2^2 \le (1+\epsilon) \min_{x} \| A x - b \|_2^2$. The best
classical algorithm takes $O(nd) + \mathrm{poly}(d/\epsilon)$ time [Clarkson
and Woodruff STOC 2013, Nelson and Nguyen FOCS 2013]. On the other hand,
quantum linear regression algorithms can achieve exponential quantum speedups,
as shown in [Wang Phys. Rev. A 96, 012335, Kerenidis and Prakash ITCS 2017,
Chakraborty, Gily{\'e}n and Jeffery ICALP 2019]. However, the running times of
these algorithms depend on some quantum linear algebra-related parameters, such
as $\kappa(A)$, the condition number of $A$. In this work, we develop a quantum
algorithm that runs in $\widetilde{O}(\epsilon^{-1}\sqrt{n}d^{1.5}) +
\mathrm{poly}(d/\epsilon)$ time. It provides a quadratic quantum speedup in $n$
over the classical lower bound, without any dependence on data-dependent
parameters. In addition, we show that our result can be generalized to multiple
regression and ridge linear regression.
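For context, here is our own sketch of the classical sketch-and-solve baseline in the Clarkson-Woodruff style (not the paper's quantum algorithm): compress the $n$ rows of $A$ with a CountSketch matrix $S$, then solve the small regression.

```python
import numpy as np

# Sketch-and-solve illustration of the classical baseline (Clarkson-
# Woodruff style), not the paper's quantum algorithm: apply a CountSketch
# S (one signed nonzero per row) in O(nd) time, then solve the small
# problem min_x ||S A x - S b||_2 in poly(d/eps) time.

rng = np.random.default_rng(1)
n, d, sketch_rows = 100_000, 20, 2_000

A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

rows = rng.integers(0, sketch_rows, size=n)   # hash each row to a bucket
signs = rng.choice([-1.0, 1.0], size=n)
SA = np.zeros((sketch_rows, d))
Sb = np.zeros(sketch_rows)
np.add.at(SA, rows, signs[:, None] * A)       # accumulate S @ A
np.add.at(Sb, rows, signs * b)                # accumulate S @ b

x_sketch = np.linalg.lstsq(SA, Sb, rcond=None)[0]
x_exact = np.linalg.lstsq(A, b, rcond=None)[0]
# Residual ratio should be close to 1, i.e. a (1+eps)-approximation.
print(np.linalg.norm(A @ x_sketch - b) / np.linalg.norm(A @ x_exact - b))
```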
Collective modes of a collisional anisotropic quark-gluon plasma
In this paper we consider the collective modes of a momentum-space
anisotropic quark-gluon plasma taking into account the effect of collisions
between the plasma constituents. Our analysis is carried out using a
collisional kernel of Bhatnagar-Gross-Krook form and extends prior analyses in
the literature by considering all possible angles of propagation of the gluonic
modes relative to the momentum-anisotropy axis. We extract both the stable and
unstable modes as a function of the collision rate and confirm prior findings
that gluonic unstable modes can be eliminated from the spectrum if the
collision rate is sufficiently large. In addition, we discuss the conditions
necessary for the existence of unstable modes and present evidence that
unstable mode growth rates are maximal for modes with momentum along the
anisotropy direction. Finally, we demonstrate that when there is a finite
collisional rate, gluonic unstable modes are absent from the spectrum at both
small and large momentum anisotropy. These results pave the way for
understanding the impact of collisions on a variety of non-equilibrium
quark-gluon plasma observables.
Comment: 19 pages and 15 figures
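For reference, the Bhatnagar-Gross-Krook kernel has the standard relaxation-time form shown below (a textbook statement; the paper's precise conventions and normalization may differ):

```latex
% Standard relaxation-time (BGK) form of the collisional kernel
% (the paper's precise conventions may differ, e.g. number-conserving
% variants rescale the equilibrium term):
\[
  C[f] \;=\; -\,\nu \left( f(t,\mathbf{x},\mathbf{p})
             - f_{\mathrm{eq}}(\mathbf{p}) \right),
\]
% i.e. the distribution function relaxes toward a local equilibrium
% $f_{\mathrm{eq}}$ at a rate set by the collision rate $\nu$; the
% $\nu \to 0$ limit recovers the collisionless (Vlasov) analysis.
```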
Symmetric Sparse Boolean Matrix Factorization and Applications
In this work, we study a variant of nonnegative matrix factorization where we
wish to find a symmetric factorization of a given input matrix into a sparse,
Boolean matrix. Formally speaking, given $\mathbf{M} \in \mathbb{Z}^{m \times m}$,
we want to find $\mathbf{W} \in \{0,1\}^{m \times r}$ such that
$\| \mathbf{M} - \mathbf{W}\mathbf{W}^\top \|_0$ is minimized among all
$\mathbf{W}$ for which each row is $k$-sparse. This question turns out to be
closely related to a number of questions, like recovering a hypergraph from its
line graph, as well as reconstruction attacks for private neural network
training.
As this problem is hard in the worst case, we study a natural average-case
variant that arises in the context of these reconstruction attacks:
$\mathbf{M} = \mathbf{W}\mathbf{W}^\top$ for a random Boolean matrix
$\mathbf{W}$ with $k$-sparse rows, and the goal is to recover $\mathbf{W}$ up
to column permutation. Equivalently, this can be thought of as recovering a
uniformly random $k$-uniform hypergraph from its line graph.
Our main result is a polynomial-time algorithm for this problem based on
bootstrapping higher-order information about $\mathbf{W}$ and then decomposing
an appropriate tensor. The key ingredient in our analysis, which may be of
independent interest, is to show that such a matrix $\mathbf{W}$ has full
column rank with high probability as soon as $m = \widetilde{\Omega}(r)$, which
we do using tools from Littlewood-Offord theory and estimates for binary
Krawtchouk polynomials.
Comment: 33 pages, to appear in Innovations in Theoretical Computer Science
(ITCS 2022), v2: updated references
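To make the average-case model concrete, here is our own sketch of an instance (the recovery algorithm itself is the paper's contribution and is not reproduced here):

```python
import numpy as np

# Concrete instance of the average-case model. Rows of W are the
# hyperedges of a random k-uniform hypergraph on r vertices; M = W W^T
# is (up to its diagonal) the weighted line graph: M[i, j] counts the
# vertices shared by hyperedges i and j.

rng = np.random.default_rng(0)
m, r, k = 200, 50, 3                  # m hyperedges, r vertices, k-sparse

W = np.zeros((m, r), dtype=np.int64)
for i in range(m):
    W[i, rng.choice(r, size=k, replace=False)] = 1  # a random k-subset

M = W @ W.T
assert (np.diag(M) == k).all()        # the diagonal only reveals k
print(M[0, 1], "vertices shared by hyperedges 0 and 1")
# Goal of the paper's algorithm: recover W from M up to column permutation.
```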