Parallelisation of greedy algorithms for compressive sensing reconstruction
Compressive Sensing (CS) is a technique which allows a signal to be compressed at the same
time as it is captured. The process of capturing and simultaneously compressing the signal is
represented as linear sampling, which can encompass a variety of physical processes or signal
processing. Instead of explicitly identifying redundancies in the source signal, CS relies on the
property of sparsity in order to reconstruct the compressed signal. Although linear sampling is
much less burdensome than conventional compression, this saving is more than offset by the high
computational cost of reconstructing a signal which has been captured using CS. Even when
using some of the fastest reconstruction techniques, known as greedy pursuits, reconstruction
of large problems can pose a significant burden, consuming a great deal of memory as well as
compute time.
Parallel computing is the foundation of the field of High Performance Computing (HPC).
Modern supercomputers are generally composed of large clusters of standard servers, with a
dedicated low-latency high-bandwidth interconnect network. On such a cluster, an appropriately
written program can harness vast quantities of memory and computational power. However, in
order to exploit a parallel compute resource, an algorithm usually has to be redesigned from
the ground up. In this thesis I describe the development of parallel variants of two algorithms
commonly used in CS reconstruction, Matching Pursuit (MP) and Orthogonal Matching Pursuit
(OMP), resulting in the new distributed compute algorithms DistMP and DistOMP. I present
the results from experiments showing how DistMP and DistOMP can utilise a compute cluster
to solve CS problems much more quickly than a single computer could alone. Speed-up of as
much as a factor of 76 is observed with DistMP when utilising 210 workers across 14 servers,
compared to a single worker. Finally, I demonstrate how DistOMP can solve a problem with a
429 GB equivalent sampling matrix in as little as 62 minutes using a 16-node compute cluster.
Funded by an ICASE award from the Engineering and Physical Sciences Research Council, with sponsorship provided by Thales Research and Technology.
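As an illustration of the greedy pursuits mentioned in this abstract, the following is a minimal single-machine sketch of plain Matching Pursuit. It is not the thesis's DistMP or DistOMP. The function name `matching_pursuit`, its parameters, and the assumption that the sampling matrix is supplied as a list of unit-norm columns ("atoms") are all illustrative choices, not taken from the source.

```python
import math


def matching_pursuit(atoms, y, n_iter=10, tol=1e-9):
    """Greedy sparse reconstruction of y from unit-norm atoms (hypothetical helper).

    atoms: list of columns of the sampling matrix, each a list of floats,
           assumed to have unit Euclidean norm.
    y:     measurement vector (list of floats, same length as each atom).
    Returns (x, residual), where x holds one coefficient per atom.
    """
    residual = list(y)
    x = [0.0] * len(atoms)
    for _ in range(n_iter):
        # Correlate every atom with the current residual.
        corr = [sum(a * r for a, r in zip(atom, residual)) for atom in atoms]
        # Greedy step: pick the atom with the largest absolute correlation.
        k = max(range(len(atoms)), key=lambda j: abs(corr[j]))
        if abs(corr[k]) < tol:
            break  # nothing left to explain
        # Update the coefficient and subtract that atom's contribution.
        x[k] += corr[k]
        residual = [r - corr[k] * a for r, a in zip(residual, atoms[k])]
    return x, residual


# Toy problem: three standard basis vectors plus one extra unit-norm atom.
s = 1.0 / math.sqrt(2.0)
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [s, s, 0.0]]
y = [2.0, 0.0, -3.0]  # sparse signal: 2 * atom 0 - 3 * atom 2
x, residual = matching_pursuit(atoms, y)
```

Each iteration is dominated by correlating every atom with the residual, which is the kind of step that can be split across workers when the matrix is too large for one machine. How DistMP actually partitions the work is not described in the abstract.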
Processor design space exploration and performance prediction
The use of simulation is well established in processor design research to evaluate architectural design trade-offs. In particular, cycle-accurate simulation is widely used to evaluate new designs because it measures processor performance accurately and in detail. However, only configurations in a subspace can be simulated in practice, owing to long simulation times and limited resources, which leads to suboptimal conclusions that might not apply to the larger design space. In this thesis, we propose a performance prediction approach which employs state-of-the-art techniques from experimental design, machine learning and data mining. Our model is trained initially on cycle-accurate simulation results and can then be used to predict processor performance across the entire design space. According to our experiments, our model predicts the performance of a single-core processor with a median percentage error ranging from 0.32% to 3.01% over a design space of about 15 million points, using only 5000 independently sampled design points as a training set. For chip multiprocessors (CMPs), the median percentage error ranges from 0.50% to 1.47% over a design space of about 9.7 million points, using only 5000 independently sampled CMP design points as a training set. In addition, the model provides quantitative interpretation tools such as variable importance and partial dependence of the design parameters.
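For readers unfamiliar with the accuracy metric quoted above, the sketch below shows one common way to compute a median percentage error between predicted and simulated performance values. The function name `median_percentage_error` and the sample numbers are hypothetical and are not data from the thesis. The abstract does not state the exact definition used, so the absolute relative error is an assumption.

```python
from statistics import median


def median_percentage_error(predicted, actual):
    """Median of |predicted - actual| / |actual|, expressed in percent.

    Assumes every actual value is non-zero (hypothetical helper).
    """
    errors = [abs(p - a) / abs(a) * 100.0 for p, a in zip(predicted, actual)]
    return median(errors)


# Made-up example values, for illustration only.
simulated = [100.0, 200.0, 400.0]
predicted = [101.0, 196.0, 412.0]
mpe = median_percentage_error(predicted, simulated)  # errors: 1%, 2%, 3%
```

The median is less sensitive than the mean to a small number of badly predicted design points, which may be one reason to report it for very large design spaces.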