Software-defined radio (SDR) describes a signal processing system for wireless
communications in which major parts of the physical layer processing are performed in
software. SDR systems are more flexible and have lower development costs than traditional
systems based on application-specific integrated circuits (ASICs). Yet, SDR requires
programmable processor architectures that can meet the throughput and energy efficiency
requirements of current third generation (3G) and future fourth generation (4G) wireless
standards for mobile devices.
Single instruction, multiple data (SIMD) processors operate on long data vectors in parallel
data lanes and can achieve a good ratio of computing power to energy consumption. Hence,
SIMD processors could be the basis of future SDR systems. Yet, SIMD processors only
achieve a high efficiency if all parallel data lanes can be utilized.
This thesis investigates the scalability of SIMD processing for algorithms required in 4G
wireless systems; that is, it explores how performance and energy consumption scale with
increasing SIMD vector length. The basis of the exploration is a scalable SIMD processor
architecture, which also supports long instruction word (LIW) execution and can be
configured with four different permutation networks for vector element permutations.
Radix-2 and mixed-radix fast Fourier transform (FFT) algorithms, sphere decoding for
multiple-input, multiple-output (MIMO) systems, and the decoding of quasi-cyclic low-density
parity check (LDPC) codes have been examined, as these are key algorithms for
4G wireless systems. The results show that the performance of all algorithms scales with
the SIMD vector length, yet there are different constraints on the ratios between algorithm
and architecture parameters. The radix-2 FFT algorithm allows close to linear speedups
if the FFT size is at least twice the SIMD vector length, whereas the mixed-radix FFT
algorithm requires the FFT size to be a multiple of the squared SIMD width. The performance of
the implemented sphere decoding algorithm scales linearly with the SIMD vector length.
The scalability of LDPC decoding is determined by the expansion factor of the quasi-cyclic
code. Wider SIMD processors offer better performance and also require less energy
than processors with a shorter vector length for all considered algorithms. The results for
different permutation networks show that a simple permutation network is sufficient for
most applications.
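
As a minimal illustration of the two FFT size constraints stated above, the following Python sketch checks whether a given FFT size fits a given SIMD vector length. The function names and the example parameter values are hypothetical and not taken from the thesis; the checks encode only the conditions as summarized here and say nothing about the achieved speedup itself.

```python
def radix2_fft_fits(fft_size: int, simd_width: int) -> bool:
    """Condition summarized above for the radix-2 FFT:
    the FFT size is at least twice the SIMD vector length."""
    return fft_size >= 2 * simd_width


def mixed_radix_fft_fits(fft_size: int, simd_width: int) -> bool:
    """Condition summarized above for the mixed-radix FFT:
    the FFT size is a multiple of the squared SIMD width."""
    return fft_size % (simd_width ** 2) == 0


if __name__ == "__main__":
    # Hypothetical example sizes, chosen only to exercise both checks.
    for width in (4, 8, 16, 32):
        print(
            f"width={width:2d}  "
            f"radix-2 ok for N=1024: {radix2_fft_fits(1024, width)}  "
            f"mixed-radix ok for N=1536: {mixed_radix_fft_fits(1536, width)}"
        )
```

Under these assumptions, a wider SIMD configuration narrows the set of mixed-radix FFT sizes that satisfy the constraint much faster than it does for the radix-2 case, since the former condition grows with the square of the width.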