We show that program synthesis can generate GPU algorithms as well as their optimized implementations. Using the scan kernel as a case study, we describe our evolving synthesis techniques. Relying on our synthesizer, we can parallelize a serial problem by transforming it into a scan operation, synthesize a SIMD scan algorithm, and optimize it to reduce memory conflicts.

1 The Problem

Parallel codes are almost exclusively hand-written. Their coding is time-consuming in part for these reasons:

• Need for new algorithms. To map a problem to hardware, we may need a new algorithm. For example, by reformulating a serial problem as a prefix sum, we can solve it on data-parallel architectures.
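To make the prefix-sum reformulation concrete, here is a minimal sketch in Python. It is not the output of the synthesizer described in this paper. It assumes a hypothetical serial problem, the linear recurrence x[i] = a[i]*x[i-1] + b[i], and rewrites it as a scan over affine maps under function composition. The scan follows the well-known Hillis-Steele pattern, chosen here only for illustration. All names (`inclusive_scan`, `compose`, `serial_recurrence`, `scan_recurrence`) are illustrative assumptions.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")
Affine = Tuple[int, int]  # (a, b) represents the map x -> a*x + b


def inclusive_scan(xs: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Hillis-Steele style inclusive scan for an associative operator `op`.

    Within one round, every output element depends only on the previous
    round's values, so on data-parallel hardware all elements of a round
    could be computed simultaneously. Here the rounds are simulated serially.
    """
    out = list(xs)
    stride = 1
    while stride < len(out):
        out = [
            out[i] if i < stride else op(out[i - stride], out[i])
            for i in range(len(out))
        ]
        stride *= 2
    return out


def compose(f: Affine, g: Affine) -> Affine:
    """Return the affine map 'apply f first, then g'. Composition is associative."""
    fa, fb = f
    ga, gb = g
    return (ga * fa, ga * fb + gb)


def serial_recurrence(a: List[int], b: List[int], x0: int) -> List[int]:
    """Reference version: each step depends on the previous one."""
    xs, x = [], x0
    for ai, bi in zip(a, b):
        x = ai * x + bi
        xs.append(x)
    return xs


def scan_recurrence(a: List[int], b: List[int], x0: int) -> List[int]:
    """Same recurrence, reformulated as a scan over affine maps."""
    prefix_maps = inclusive_scan(list(zip(a, b)), compose)
    return [pa * x0 + pb for pa, pb in prefix_maps]
```

The design point is that the loop-carried dependence in `serial_recurrence` disappears once each step is viewed as a function and the steps are combined with an associative operator. The scan version takes a logarithmic number of rounds instead of a linear number of dependent steps, at the cost of more total work.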