We propose a soft processor programming
model and architecture, inspired by graphics processing units
(GPUs), that are well matched to the strengths of FPGAs,
namely, highly parallel and pipelinable computation. In
particular, our soft processor architecture exploits multithreading,
vector operations, and predication to keep a 64-stage
floating-point pipeline supplied with work, using hardware
support for up to 256 concurrent thread contexts. The key new
contributions of our architecture are mechanisms for managing
threads and register files that maximize data-level and
instruction-level parallelism while overcoming the port
limitations of FPGA block memories as well as memory
and pipeline latency. Through simulation of a
system that (i) is programmable via NVIDIA's high-level
Cg language, (ii) supports AMD's CTM r5xx GPU ISA, and
(iii) is realizable on an XtremeData XD1000 FPGA-based
accelerator system, we demonstrate the potential for such
a system to achieve 100% utilization of a deeply pipelined
floating-point datapath.