Popular sorting algorithms do not translate well
into hardware implementations. Instead, hardware-based solutions
like sorting networks, systolic sorters, and linear sorters
exploit parallelism to increase sorting efficiency. Linear sorters,
built from identical nodes with simple control, have smaller area
and lower latency than sorting networks, but they are limited in
their throughput. We present a system composed of multiple
linear sorters acting in parallel to increase overall throughput.
Interleaving is used to increase bandwidth and allow sorting
of multiple values per clock cycle, and the amount of
interleaving and depth of the linear sorters can be adapted
to suit specific applications. Contention for available linear
sorters in the system is resolved with buffers that
accumulate conflicting requests, dispatching them in bulk to
reduce latency penalties. Implementing this system on a
field-programmable gate array (FPGA) yields a 68-fold speedup
over a MicroBlaze processor running quicksort