GEVO: GPU Code Optimization using Evolutionary Computation
GPUs are a key enabler of the revolution in machine learning and
high-performance computing, functioning as de facto co-processors to accelerate
large-scale computation. As the programming stack and tool support have
matured, GPUs have also become accessible to a wider range of programmers, who
may lack detailed knowledge of the underlying architecture and therefore fail to
fully exploit the GPU's computation power. GEVO (Gpu optimization using EVOlutionary computation) is a
tool for automatically discovering optimization opportunities and tuning the
performance of GPU kernels in the LLVM representation. GEVO uses
population-based search to find edits to GPU code compiled to LLVM-IR and
improves performance on desired criteria while retaining required
functionality. We demonstrate that GEVO reduces the execution time of GPU
programs in the Rodinia benchmark suite and of two machine learning models, SVM
and ResNet18, on an NVIDIA Tesla P100. For the Rodinia benchmarks, GEVO improves
GPU kernel runtime performance by an average of 49.48% and by as much as 412%
over the fully compiler-optimized baseline. If kernel output accuracy is
relaxed to tolerate up to 1% error, GEVO can find kernel variants that
outperform the baseline version by an average of 51.08%. For the machine
learning workloads, GEVO improves SVM kernel performance by 3.24X on the MNIST
handwriting recognition dataset and by 2.93X on the a9a income prediction
dataset, with no loss of model accuracy. For image classification with
ResNet18 on CIFAR-10, GEVO achieves a 1.79X kernel performance improvement with
less than 1% reduction in model accuracy.
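The population-based search described above can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not GEVO's implementation: a tiny interpreted instruction list stands in for a kernel in LLVM-IR, instruction count stands in for measured GPU runtime, and the edit operators (delete, copy, swap), the one-point crossover, and the elitist selection are hypothetical choices. The `tol` parameter mirrors the idea of relaxing output accuracy (e.g. `tol=0.01` for 1% error); any variant whose outputs fall outside it is rejected.

```python
import random
from typing import List, Tuple

Instr = Tuple[str, float]  # hypothetical toy instruction: (opcode, constant)
Program = List[Instr]


def run(prog: Program, x: float) -> float:
    """Interpret a straight-line toy 'kernel' on one input value."""
    acc = x
    for op, k in prog:
        if op == "add":
            acc += k
        elif op == "mul":
            acc *= k
    return acc


def cost(prog: Program, ref: Program, tests: List[float], tol: float) -> float:
    """Instruction count as a runtime proxy; variants outside the error tolerance are rejected."""
    for x in tests:
        want, got = run(ref, x), run(prog, x)
        # Relative error, with an absolute floor of 1.0 for outputs near zero.
        if abs(got - want) > tol * max(abs(want), 1.0):
            return float("inf")
    return float(len(prog))


def mutate(prog: Program, rng: random.Random) -> Program:
    """Apply one hypothetical edit: delete, copy, or swap an instruction."""
    p = list(prog)
    if not p:
        return p
    i = rng.randrange(len(p))
    kind = rng.choice(("delete", "copy", "swap"))
    if kind == "delete":
        del p[i]
    elif kind == "copy":
        p.insert(rng.randrange(len(p) + 1), p[i])
    else:
        j = rng.randrange(len(p))
        p[i], p[j] = p[j], p[i]
    return p


def crossover(a: Program, b: Program, rng: random.Random) -> Program:
    """Splice a prefix of one parent onto a suffix of the other."""
    return a[: rng.randrange(len(a) + 1)] + b[rng.randrange(len(b) + 1):]


def evolve(ref: Program, tests: List[float], tol: float = 0.0,
           pop_size: int = 20, generations: int = 50, seed: int = 0) -> Program:
    rng = random.Random(seed)
    pop = [list(ref) for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            a, b = rng.sample(pop, 2)
            child = crossover(a, b, rng) if rng.random() < 0.5 else list(a)
            children.append(mutate(child, rng))
        # Elitist selection: keep the cheapest valid variants among parents and children.
        pop = sorted(pop + children, key=lambda p: cost(p, ref, tests, tol))[:pop_size]
    return pop[0]


# Usage: a bloated program computing 2x + 6 with redundant instructions.
original = [("add", 3), ("add", 0), ("mul", 2), ("mul", 1), ("add", -3), ("add", 3)]
tests = [-2.0, 0.0, 1.5, 10.0]
best = evolve(original, tests)
print(len(original), "->", len(best), best)
```

Keeping the parents in the selection pool means the population never loses a valid variant, so under this toy cost the returned program is never worse than the original.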
Synthesizing Safe and Efficient Kernel Extensions for Packet Processing
Extended Berkeley Packet Filter (BPF) has emerged as a powerful method to
extend packet-processing functionality in the Linux operating system. BPF
allows users to write code in high-level languages (such as C or Rust) and execute
it at specific hooks in the kernel, such as the network device driver. To
ensure safe execution of a user-developed BPF program in kernel context, Linux
uses an in-kernel static checker. The checker allows a program to execute only
if it can prove that the program is crash-free, always accesses memory within
safe bounds, and avoids leaking kernel data.
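To make the bounds requirement concrete, here is a toy checker in the spirit of how the kernel verifier treats a packet access guarded by a comparison against `data_end`. It is a deliberately tiny, hypothetical model: the instruction tuples are invented, programs are straight-line, and the real verifier tracks register types, value ranges, and every branch path, none of which this sketch does.

```python
from typing import List, Optional

# Hypothetical toy instruction set (not real BPF opcodes):
#   ("check", n)         if (data + n > data_end) return;  guards bytes [0, n)
#   ("load", off, size)  read `size` packet bytes starting at offset `off`
#   ("ret",)             end of program


def verify(prog: List[tuple]) -> Optional[str]:
    """Return None if every packet load is provably in bounds, else a rejection reason."""
    proven = 0  # bytes [0, proven) are known to lie before data_end on the fall-through path
    for pc, insn in enumerate(prog):
        op = insn[0]
        if op == "check":
            proven = max(proven, insn[1])
        elif op == "load":
            off, size = insn[1], insn[2]
            if off < 0 or off + size > proven:
                return f"pc {pc}: packet access [{off}, {off + size}) not proven in bounds"
        elif op == "ret":
            return None
        else:
            return f"pc {pc}: unknown instruction {op!r}"
    return "program falls off the end without ret"


ok = [("check", 14), ("load", 12, 2), ("ret",)]  # EtherType: bytes 12-13 of a 14-byte header
bad = [("load", 12, 2), ("ret",)]                # same load, but no bounds check first
print(verify(ok))   # None
print(verify(bad))  # pc 0: packet access [12, 14) not proven in bounds
```

The checker rejects `bad` even though it might never crash on real traffic: like the in-kernel checker, it accepts only what it can prove safe.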
BPF programming is not easy. First, even modest-sized BPF programs are deemed
too large to analyze and rejected by the kernel checker. Second, the kernel
checker may incorrectly determine that a BPF program exhibits unsafe behaviors.
Third, even small performance optimizations to BPF code (e.g., 5% gains) must
be meticulously hand-crafted by expert developers. Traditional optimizing
compilers for BPF are often inadequate since the kernel checker's safety
constraints are incompatible with rule-based optimizations.
We present K2, a program-synthesis-based compiler that automatically
optimizes BPF bytecode with formal correctness and safety guarantees. K2
produces code that is 6% to 26% smaller, has 1.36% to 55.03% lower average
packet-processing latency, and achieves 0% to 4.75% higher throughput (packets
per second per core) relative to the best clang-compiled program, across
benchmarks drawn from Cilium, Facebook, and the Linux kernel. K2 incorporates
several domain-specific techniques to make synthesis practical by accelerating
equivalence checking of BPF programs by six orders of magnitude.
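The synthesis idea can be sketched as a stochastic search over candidate programs that only ever outputs rewrites proven equivalent to the source. This is a toy in the style of stochastic superoptimization (a Metropolis-style random walk over programs), and it is not claimed to match K2's search, cost function, or proposal moves. The hypothetical instruction set operates on a single 8-bit register, so comparing outputs on all 256 inputs is an exact equivalence check; a real BPF compiler would need a formal checker instead, and would also have to confirm that the result satisfies the kernel checker's safety rules, which this sketch ignores.

```python
import math
import random
from typing import List, Tuple

Op = Tuple[str, int]  # hypothetical toy instruction: (opcode, immediate)
OPS = ("add", "xor", "and", "shl")


def run(prog: List[Op], x: int) -> int:
    """Execute a straight-line program on a single 8-bit register."""
    for op, k in prog:
        if op == "add":
            x = (x + k) & 0xFF
        elif op == "xor":
            x ^= k
        elif op == "and":
            x &= k
        else:  # "shl"
            x = (x << k) & 0xFF
    return x


def errors(prog: List[Op], ref: List[int]) -> int:
    """Number of the 256 possible inputs on which prog differs from the reference outputs."""
    return sum(run(prog, x) != ref[x] for x in range(256))


def random_op(rng: random.Random) -> Op:
    op = rng.choice(OPS)
    return (op, rng.randrange(8) if op == "shl" else rng.randrange(256))


def propose(prog: List[Op], rng: random.Random) -> List[Op]:
    """Delete, replace, or insert one instruction."""
    p = list(prog)
    move = rng.choice(("delete", "replace", "insert")) if p else "insert"
    if move == "delete":
        del p[rng.randrange(len(p))]
    elif move == "replace":
        p[rng.randrange(len(p))] = random_op(rng)
    else:
        p.insert(rng.randrange(len(p) + 1), random_op(rng))
    return p


def synthesize(src: List[Op], iters: int = 2000, beta: float = 1.0, seed: int = 0) -> List[Op]:
    rng = random.Random(seed)
    ref = [run(src, x) for x in range(256)]
    cur, cur_cost, best = list(src), len(src), list(src)
    for _ in range(iters):
        cand = propose(cur, rng)
        err = errors(cand, ref)
        cand_cost = 2 * err + len(cand)  # correctness penalty + size as a performance proxy
        # Metropolis rule: always accept improvements, sometimes accept regressions.
        if cand_cost <= cur_cost or rng.random() < math.exp(-beta * (cand_cost - cur_cost)):
            cur, cur_cost = cand, cand_cost
            # Only programs equivalent on all 256 inputs may become the output.
            if err == 0 and len(cand) < len(best):
                best = cand
    return best


src = [("xor", 0), ("add", 7), ("and", 0xFF), ("xor", 0x0F), ("xor", 0x0F)]
best = synthesize(src)
print(len(src), "->", len(best), best)
```

Letting the walk visit temporarily incorrect candidates (at a cost) allows it to pass through programs that are not yet equivalent, which is often the only route to a shorter equivalent one; the output itself is always an exactly equivalent program.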