Robust Distributed Learning: Tight Error Bounds and Breakdown Point under Data Heterogeneity
The theory underlying robust distributed learning algorithms, designed to
resist adversarial machines, matches empirical observations when data is
homogeneous. Under data heterogeneity, however, which is the norm in practical
scenarios, established lower bounds on the learning error are essentially
vacuous and greatly mismatch empirical observations. This is because the
heterogeneity model considered is too restrictive and does not cover basic
learning tasks such as least-squares regression. We consider in this paper a
more realistic heterogeneity model, namely (G,B)-gradient dissimilarity, and
show that it covers a larger class of learning problems than existing theory.
Notably, we show that the breakdown point under heterogeneity is lower than the
classical fraction 1/2. We also prove a new lower bound on the learning error
of any distributed learning algorithm. We derive a matching upper bound for a
robust variant of distributed gradient descent, and empirically show that our
analysis reduces the gap between theory and practice.
Comment: Accepted to NeurIPS 202
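The robust variant of distributed gradient descent mentioned above replaces the plain average of worker gradients with a robust aggregation rule. As an illustration only (the abstract does not specify the paper's exact aggregator), a coordinate-wise trimmed mean that tolerates up to f adversarial machines can be sketched as:

```python
# Illustrative sketch: coordinate-wise trimmed-mean aggregation, one
# standard robust aggregation rule for distributed gradient descent.
# The gradient values below are made up for demonstration.

def trimmed_mean(vectors, f):
    """Average each coordinate after discarding its f largest and f
    smallest values, bounding the influence of up to f adversaries."""
    n, d = len(vectors), len(vectors[0])
    assert n > 2 * f, "need strictly more than 2f workers"
    out = []
    for j in range(d):
        col = sorted(v[j] for v in vectors)
        kept = col[f:n - f]                  # drop f extremes on each side
        out.append(sum(kept) / len(kept))
    return out

# Five honest gradients plus one adversarial outlier.
grads = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9],
         [1.0, 2.0], [1.05, 1.95], [100.0, -100.0]]
agg = trimmed_mean(grads, f=1)   # close to the honest mean (~[1.0, 2.0])
```

The `n > 2f` requirement mirrors the classical breakdown intuition: once half or more of the machines are adversarial, no aggregator can help, and the abstract's point is that under heterogeneity the tolerable fraction drops even below 1/2.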
BASALISC: Programmable Hardware Accelerator for BGV Fully Homomorphic Encryption
Fully Homomorphic Encryption (FHE) allows for secure computation on encrypted data. Unfortunately, its huge memory, computational, and bandwidth requirements limit its practicality. We present BASALISC, an architecture family of hardware accelerators that aims to substantially accelerate FHE computations in the cloud. BASALISC is the first to implement the BGV scheme with fully-packed bootstrapping – the noise-removal capability necessary for arbitrary-depth computation. It supports a customized version of bootstrapping that can be instantiated with hardware multipliers optimized for area and power.
BASALISC is a three-abstraction-layer RISC architecture, designed for a 1 GHz ASIC implementation and underway toward a 150 mm² die tape-out in a 12 nm GF process. BASALISC's four-layer memory hierarchy includes a two-dimensional conflict-free inner memory layer that enables 32 Tb/s radix-256 NTT computations without pipeline stalls. Its conflict-resolution permutation hardware is generalized and re-used to compute BGV automorphisms without throughput penalty. BASALISC also has a custom multiply-accumulate unit to accelerate BGV key switching.
The BASALISC toolchain comprises a custom compiler and a joint performance and correctness simulator. To evaluate BASALISC, we study its physical realizability, emulate and formally verify its core functional units, and evaluate its performance on a set of benchmarks. Simulation results show a speedup of more than 5,000× over HElib – a popular software FHE library.
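The radix-256 NTT units fed by BASALISC's inner memory layer compute the number-theoretic transform, the modular-arithmetic analogue of the DFT that makes polynomial multiplication in BGV fast. A minimal quadratic-time reference NTT, with tiny illustrative parameters (a real accelerator uses large primes and butterfly networks), might look like:

```python
# Illustrative sketch: a textbook O(n^2) number-theoretic transform and
# its inverse, modulo a small prime. Parameters q=17, n=4, root=4 are
# toy values chosen so the arithmetic is easy to check by hand.

def ntt(a, root, q):
    """Forward NTT: evaluate the polynomial a at powers of a primitive
    n-th root of unity modulo q."""
    n = len(a)
    return [sum(a[j] * pow(root, i * j, q) for j in range(n)) % q
            for i in range(n)]

def intt(A, root, q):
    """Inverse NTT: forward transform at the inverse root, scaled by
    1/n mod q. pow(x, -1, q) needs Python 3.8+."""
    n = len(A)
    inv_n = pow(n, -1, q)
    inv_root = pow(root, -1, q)
    return [x * inv_n % q for x in ntt(A, inv_root, q)]

q, n = 17, 4
root = 4                      # primitive 4th root of unity mod 17
a = [1, 2, 3, 4]
assert intt(ntt(a, root, q), root, q) == a   # exact round trip
```

Pointwise multiplication in the transformed domain corresponds to cyclic convolution of the coefficient vectors, which is why an NTT pipeline without stalls translates directly into ciphertext-multiplication throughput.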
Swift: A modern highly-parallel gravity and smoothed particle hydrodynamics solver for astrophysical and cosmological applications
Numerical simulations have become one of the key tools used by theorists in
all the fields of astrophysics and cosmology. The development of modern tools
that target the largest existing computing systems and exploit state-of-the-art
numerical methods and algorithms is thus crucial. In this paper, we introduce
the fully open-source, highly-parallel, versatile, and modular coupled
hydrodynamics, gravity, cosmology, and galaxy-formation code Swift. The
software package exploits hybrid task-based parallelism, asynchronous
communications, and domain-decomposition algorithms based on balancing the
workload, rather than the data, to efficiently exploit modern high-performance
computing cluster architectures. Gravity is solved using a
fast multipole method, optionally coupled to a particle-mesh solver in Fourier
space to handle periodic volumes. For gas evolution, multiple modern flavours
of Smoothed Particle Hydrodynamics are implemented. Swift also evolves
neutrinos using a state-of-the-art particle-based method. Two complementary
networks of sub-grid models for galaxy formation as well as extensions to
simulate planetary physics are also released as part of the code. An extensive
set of output options, including snapshots, light-cones, power spectra, and a
coupling to structure finders, is also included. We describe the overall code
architecture, summarize the consistency and accuracy tests that were performed,
and demonstrate the excellent weak-scaling performance of the code using a
representative cosmological hydrodynamical problem with billions of
particles. The code is released to the community alongside extensive
documentation for both users and developers, a large selection of example test
problems, and a suite of tools to aid in the analysis of large simulations run
with Swift.
Comment: 39 pages, 18 figures, submitted to MNRAS. Code, documentation, and
examples available at www.swiftsim.co
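Swift's domain decomposition balances workload rather than data. As a toy illustration of that principle only (Swift's actual scheme partitions a task graph across nodes), a longest-processing-time greedy assignment of per-cell costs to ranks can be sketched as:

```python
# Illustrative sketch: workload-based (not data-based) decomposition.
# Work items with unequal costs are assigned to the currently
# least-loaded rank, largest items first (the classic LPT heuristic).
# Costs below are made-up per-cell work estimates.
import heapq

def balance(costs, n_ranks):
    """Return assignment[i] = rank that item i is placed on."""
    heap = [(0.0, r) for r in range(n_ranks)]  # (current load, rank)
    heapq.heapify(heap)
    assignment = [None] * len(costs)
    for i in sorted(range(len(costs)), key=lambda i: -costs[i]):
        load, r = heapq.heappop(heap)          # least-loaded rank
        assignment[i] = r
        heapq.heappush(heap, (load + costs[i], r))
    return assignment

costs = [4.0, 3.0, 3.0, 2.0, 2.0, 2.0]   # unequal per-cell work
asg = balance(costs, 2)                   # both ranks end up with load 8
```

Splitting by cost rather than by particle count is what keeps ranks busy when the work per region varies strongly, as it does in clustered cosmological volumes.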
Tools for efficient Deep Learning
In the era of Deep Learning (DL), there is a fast-growing demand for building and deploying Deep Neural Networks (DNNs) on various platforms. This thesis proposes five tools to address the challenges of designing DNNs that are efficient in time, resources, and power consumption.
We first present Aegis and SPGC to address the challenges in improving the memory efficiency of DL training and inference. Aegis makes mixed precision training (MPT) more stable via layer-wise gradient scaling. Empirical experiments show that Aegis can improve MPT accuracy by up to 4%. SPGC focuses on structured pruning: replacing standard convolution with group convolution (GConv) to avoid irregular sparsity. SPGC formulates GConv pruning as a channel permutation problem and proposes a novel heuristic polynomial-time algorithm. Common DNNs pruned by SPGC achieve up to 1% higher accuracy than prior work.
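The structured sparsity that SPGC prunes toward shows up directly in the parameter count of group convolution: with g groups, each group mixes only 1/g of the input channels into 1/g of the output channels, so the layer holds 1/g of the dense weights. A small sketch with illustrative layer sizes (not taken from the thesis):

```python
# Illustrative sketch: weight counts of standard vs. group convolution.
# Layer sizes (64 in, 128 out, 3x3 kernel, 4 groups) are examples only.

def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a (grouped) 2D convolution: each of the `groups`
    blocks mixes c_in/groups inputs into c_out/groups outputs."""
    assert c_in % groups == 0 and c_out % groups == 0
    return groups * (c_in // groups) * (c_out // groups) * k * k

std = conv_params(64, 128, 3)             # dense channel mixing
grp = conv_params(64, 128, 3, groups=4)   # block-diagonal channel mixing
assert grp * 4 == std                     # g groups -> 1/g of the weights
```

Because the block-diagonal pattern is fixed, which channels share a block matters; permuting channels before grouping changes which weights survive, which is exactly the channel-permutation view SPGC optimises over.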
This thesis also addresses the challenges arising from the gap between DNN descriptions and executables, with Polygeist for software and POLSCA for hardware. Many novel techniques, e.g. statement splitting and memory partitioning, are explored and used to extend polyhedral optimisation. Polygeist speeds up sequential and parallel software execution by 2.53× and 9.47× respectively on Polybench/C. POLSCA achieves a 1.5× speedup over hardware designs directly generated by high-level synthesis on Polybench/C.
Moreover, this thesis presents Deacon, a framework that generates FPGA-based DNN accelerators with streaming architectures and advanced pipelining techniques to address the challenges of heterogeneous convolutions and residual connections. Deacon provides fine-grained pipelining, graph-level optimisation, and heuristic exploration by graph colouring. Compared with prior designs, Deacon improves resource/power efficiency by 1.2×/3.5× for MobileNets and 1.0×/2.8× for SqueezeNets.
All these tools are open source, and some have already gained public engagement. We believe they can make efficient deep learning applications easier to build and deploy.