580 research outputs found
TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs
Sparse convolution plays a pivotal role in emerging workloads, including
point cloud processing in AR/VR, autonomous driving, and graph understanding in
recommendation systems. Since the computation pattern is sparse and irregular,
specialized high-performance kernels are required. Existing GPU libraries offer
two dataflow types for sparse convolution. The gather-GEMM-scatter dataflow is
easy to implement but not optimal in performance, while the dataflows with
overlapped computation and memory access (e.g.implicit GEMM) are highly
performant but have very high engineering costs. In this paper, we introduce
TorchSparse++, a new GPU library that achieves the best of both worlds. We
create a highly efficient Sparse Kernel Generator that generates performant
sparse convolution kernels at less than one-tenth of the engineering cost of
the current state-of-the-art system. On top of this, we design the Sparse
Autotuner, which extends the design space of existing sparse convolution
libraries and searches for the best dataflow configurations for training and
inference workloads. Consequently, TorchSparse++ achieves 2.9x, 3.3x, 2.2x and
1.7x measured end-to-end speedup on an NVIDIA A100 GPU over state-of-the-art
MinkowskiEngine, SpConv 1.2, TorchSparse and SpConv v2 in inference; and is
1.2-1.3x faster than SpConv v2 in mixed precision training across seven
representative autonomous driving benchmarks. It also seamlessly supports graph
convolutions, achieving 2.6-7.6x faster inference speed compared with
state-of-the-art graph deep learning libraries.Comment: MICRO 2023; Haotian Tang and Shang Yang contributed equally to this
projec
Optimising algorithm and hardware for deep neural networks on FPGAs
This thesis proposes novel algorithm and hardware optimisation approaches to accelerate Deep Neural Networks (DNNs), including both Convolutional Neural Networks (CNNs) and Bayesian Neural Networks (BayesNNs).
The first contribution of this thesis is to propose an adaptable and reconfigurable hardware design to accelerate CNNs. By analysing the computational patterns of different CNNs, a unified hardware architecture is proposed for both 2-Dimension and 3-Dimension CNNs. The accelerator is also designed with runtime adaptability, which adopts different parallelism strategies for different convolutional layers at runtime.
The second contribution of this thesis is to propose a novel neural network architecture and hardware design co-optimisation approach, which improves the performance of CNNs at both algorithm and hardware levels. Our proposed three-phase co-design framework decouples network training from design space exploration, which significantly reduces the time-cost of the co-optimisation process.
The third contribution of this thesis is to propose an algorithmic and hardware co-optimisation framework for accelerating BayesNNs. At the algorithmic level, three categories of structured sparsity are explored to reduce the computational complexity of BayesNNs. At the hardware level, we propose a novel hardware architecture with the aim of exploiting the structured sparsity for BayesNNs. Both algorithmic and hardware optimisations are jointly applied to push the performance limit.Open Acces
SPINN: Synergistic Progressive Inference of Neural Networks over Device and Cloud
Despite the soaring use of convolutional neural networks (CNNs) in mobile
applications, uniformly sustaining high-performance inference on mobile has
been elusive due to the excessive computational demands of modern CNNs and the
increasing diversity of deployed devices. A popular alternative comprises
offloading CNN processing to powerful cloud-based servers. Nevertheless, by
relying on the cloud to produce outputs, emerging mission-critical and
high-mobility applications, such as drone obstacle avoidance or interactive
applications, can suffer from the dynamic connectivity conditions and the
uncertain availability of the cloud. In this paper, we propose SPINN, a
distributed inference system that employs synergistic device-cloud computation
together with a progressive inference method to deliver fast and robust CNN
inference across diverse settings. The proposed system introduces a novel
scheduler that co-optimises the early-exit policy and the CNN splitting at run
time, in order to adapt to dynamic conditions and meet user-defined
service-level requirements. Quantitative evaluation illustrates that SPINN
outperforms its state-of-the-art collaborative inference counterparts by up to
2x in achieved throughput under varying network conditions, reduces the server
cost by up to 6.8x and improves accuracy by 20.7% under latency constraints,
while providing robust operation under uncertain connectivity conditions and
significant energy savings compared to cloud-centric execution.Comment: Accepted at the 26th Annual International Conference on Mobile
Computing and Networking (MobiCom), 202
Approximate Computing Survey, Part II: Application-Specific & Architectural Approximation Techniques and Applications
The challenging deployment of compute-intensive applications from domains
such Artificial Intelligence (AI) and Digital Signal Processing (DSP), forces
the community of computing systems to explore new design approaches.
Approximate Computing appears as an emerging solution, allowing to tune the
quality of results in the design of a system in order to improve the energy
efficiency and/or performance. This radical paradigm shift has attracted
interest from both academia and industry, resulting in significant research on
approximation techniques and methodologies at different design layers (from
system down to integrated circuits). Motivated by the wide appeal of
Approximate Computing over the last 10 years, we conduct a two-part survey to
cover key aspects (e.g., terminology and applications) and review the
state-of-the art approximation techniques from all layers of the traditional
computing stack. In Part II of our survey, we classify and present the
technical details of application-specific and architectural approximation
techniques, which both target the design of resource-efficient
processors/accelerators & systems. Moreover, we present a detailed analysis of
the application spectrum of Approximate Computing and discuss open challenges
and future directions.Comment: Under Review at ACM Computing Survey
- …