The overarching objective of this thesis was to develop tools for parallelising, optimising,
and implementing algorithms on parallel architectures, in particular General Purpose
Graphics Processors (GPGPUs). Two projects were chosen from different application areas
in which GPGPUs are used: a defence application involving image compression, and a
modelling application in bioinformatics (computational immunology). Each project had its
own specific objectives, as well as supporting the overall research goal.
The defence / image compression project was carried out in collaboration with the Jet
Propulsion Laboratories. The specific questions were: to what extent an algorithm designed
for bit-serial for the lossless compression of hyperspectral images on-board unmanned
vehicles (UAVs) in hardware could be parallelised, whether GPGPUs could be used to
implement that algorithm, and whether a software implementation with or without GPGPU
acceleration could match the throughput of a dedicated hardware (FPGA) implementation.
The dependencies within the algorithm were analysed, and the algorithm parallelised. The
algorithm was implemented in software for GPGPU, and optimised. During the optimisation
process, profiling revealed less than optimal device utilisation, but no further optimisations
resulted in an improvement in speed. The design had hit a local-maximum of performance.
Analysis of the arithmetic intensity and data-flow exposed flaws in the standard optimisation
metric of kernel occupancy used for GPU optimisation. Redesigning the implementation
with revised criteria (fused kernels, lower occupancy, and greater data locality) led to a new
implementation with 10x higher throughput. GPGPUs were shown to be viable for on-board
implementation of the CCSDS lossless hyperspectral image compression algorithm,
exceeding the performance of the hardware reference implementation, and providing
sufficient throughput for the next generation of image sensor as well.
The second project was carried out in collaboration with biologists at the University of
Arizona and involved modelling a complex biological system – VDJ recombination involved
in the formation of T-cell receptors (TCRs). Generation of immune receptors (T cell receptor
and antibodies) by VDJ recombination is an enormously complex process, which can
theoretically synthesize greater than 1018 variants. Originally thought to be a random
process, the underlying mechanisms clearly have a non-random nature that preferentially
creates a small subset of immune receptors in many individuals. Understanding this bias is a
longstanding problem in the field of immunology. Modelling the process of VDJ
recombination to determine the number of ways each immune receptor can be synthesized,
previously thought to be untenable, is a key first step in determining how this special
population is made. The computational tools developed in this thesis have allowed
immunologists for the first time to comprehensively test and invalidate a longstanding theory
(convergent recombination) for how this special population is created, while generating the
data needed to develop novel hypothesis