Implementation of a Protocol and
Channel Coding Strategy for use in
Ground-Satellite Applications
by
Riaan Wiid
Thesis presented in partial fulfilment of the requirements
for the degree Master of Science in Engineering at
Stellenbosch University
Supervisor: Dr. R. Wolhuter
Department of Electrical & Electronic Engineering
March 2012
Declaration
By submitting this thesis electronically, I declare that the entirety of the work
contained therein is my own, original work, that I am the sole author thereof
(save to the extent explicitly otherwise stated), that reproduction and
publication thereof by Stellenbosch University will not infringe any third party
rights and that I have not previously in its entirety or in part submitted it for
obtaining any qualification.
Date: March 2012
Stellenbosch University   http://scholar.sun.ac.za
Abstract
Implementation of a Protocol and Channel Coding
Strategy for use in Ground-Satellite Applications
R. Wiid
Department of Electrical and Electronic Engineering,
University of Stellenbosch,
Private Bag X1, Matieland 7602, South Africa.
Thesis: MScEng (E&E)
March 2012
A collaboration between the Katholieke Universiteit van Leuven (KUL) and
Stellenbosch University (SU) resulted in the development of a satellite-based
platform for use in agricultural sensing applications. It will primarily serve
as a test platform for a digitally beam-steerable antenna array (SAA) developed
by KUL. SU developed all flight- and ground-station-based hardware and
software, enabling ground-to-flight communications and interfacing with the
KUL SAA. Although most components had already been completed at the
start of this M.Sc.Eng. project, final systems integration was still unfinished,
and the modules necessary for communication were also outstanding. This
project implemented an automatic repeat request (ARQ) strategy for reliable
file transfer across the wireless link. Channel coding has also been implemented
on a field programmable gate array (FPGA). This layer includes an advanced
forward error correction (FEC) scheme, namely a low-density parity-check
(LDPC) code, which outperforms traditional FEC techniques. A flexible channel
coding architecture has been designed that allows speed and complexity
trade-offs on the FPGA. All components have been successfully implemented,
tested and integrated. FPGA-based simulations of the LDPC code show excellent
error-correcting performance. The completed prototype was recently demonstrated
successfully at KUL, during which data was reliably transferred between the
satellite platform and a ground station.
Uittreksel
Implementation of a Communication Protocol and
Channel Coding Strategy for Use in Ground-Satellite
Applications
R. Wiid
Department of Electrical and Electronic Engineering,
Universiteit van Stellenbosch,
Private Bag X1, Matieland 7602, South Africa.
Thesis: MScIng (E&E)
March 2012
Under a collaboration agreement between the Katholieke Universiteit van
Leuven (KUL) and Stellenbosch University (SU), a satellite system was
developed for sensor-network applications in the agricultural industry. This
system will primarily serve as a test medium for a digitally steerable antenna
(SAA) developed by KUL. SU developed all hardware and software components
needed to establish communication between the satellite and a ground station
via the SAA. At the start of this M.Sc.Ing. project, most components had
already been developed and implemented, but final systems integration still
had to be completed. The modules that would establish communication were
also still outstanding. This project implemented an ARQ protocol that transfers
data reliably between the satellite and a ground station. Channel coding was
also implemented on a field programmable gate array (FPGA). An advanced
error correction scheme, namely a low-density parity-check (LDPC) code, which
outperforms traditional error correction schemes, is implemented on this FPGA.
A channel coding architecture was also developed to trade off data processing
speed against the amount of FPGA logic used. All components were successfully
implemented, tested and integrated with the complete system. Simulations of
LDPC on the FPGA delivered excellent error correction results. A working
prototype was recently completed and successfully demonstrated at KUL.
Reliable data transfer between the satellite and the ground station was
confirmed during this demonstration.
Acknowledgements
I would like to express my gratitude to the following people:
• God for always inspiring me to give my best.
• Dr. Riaan Wolhuter, my study leader, for his wisdom and guidance
during the difficult challenges of this project.
• Dr. Gert-Jan van Rooyen, who provided many insights during the software
design of this project.
• Rob Anderson for helping to track down and identify countless RF problems.
• Project colleagues Ewald van der Westhuizen, Wynand van Eden and
Kobus Botha, who provided much technical assistance during systems
integration.
• Dr. Vladimir Volski and Hadi Aliakbarian for helping to successfully
demonstrate this project in Leuven.
• Jaco du Toit for sharing his insights into LDPC FEC with me.
• All my friends and colleagues from the DSP lab for interesting conversations
over a cup of coffee.
• My parents for constantly supporting and motivating me during the
course of this Master's degree.
Contents
Abstract
Uittreksel
Acknowledgements
Contents
List of Figures
List of Tables
List of Abbreviations
Nomenclature
1 Introduction
1.1 Background
1.2 Motivation for Work
1.3 Project Objectives
1.4 Project Contributions and Summary
1.5 Outline of Thesis
2 Previous Work and Literature Review
2.1 IS-HS 2 Theory of Operation
2.2 Existing Work on IS-HS 2
2.3 Protocols
2.4 Inter Protocol Layer Communication
2.5 Error Control Strategies
2.6 Wireless Channels
2.7 Summary
3 Detail Design
3.1 Block Error Probability Analysis
3.2 Channel Coding Design
3.3 BCH FEC Design
3.4 LDPC FEC Design
3.5 IPC Design
3.6 Software Protocol Layers
3.7 FEC Block Error Rate Simulation
3.8 Summary
4 Implementation
4.1 Existing Hardware Layout
4.2 Channel Coding Implementation
4.3 BCH Implementation
4.4 LDPC Implementation
4.5 IPC Implementation
4.6 TM Implementation
4.7 ARQ Implementation
4.8 Summary
5 Testing, Results and Discussion
5.1 Channel Coding
5.2 BCH
5.3 LDPC
5.4 TM and ARQ Protocols
5.5 Belgium Demonstration
5.6 Summary
6 Conclusion, Contributions and Recommendations
6.1 Conclusion and Summary
6.2 Contributions to the Project
6.3 Recommendations
References
Appendices
A Mathematical Derivations
A.1 QPSK Bit Error Probability Analysis
A.2 Signal-to-Noise Ratio for Simulations
List of Figures
2.1 Communications channel block diagram.
2.2 Intended interaction between the satellite and ground station platforms.
2.3 Block diagram of both ground station and satellite platforms.
2.4 Layers of the OSI model [1].
2.5 Shared memory between 2 processes.
2.6 QNX message passing between a client and server process.
2.7 A message queue shared between two processes.
2.8 A FEC encoder and decoder in a communications channel.
2.9 A (7,4) Hamming code's parity bit dependency diagram.
2.10 Visual representation of a parity check matrix.
(a) Parity check matrix of the Tanner graph in Fig. 2.10b.
(b) Tanner graph of parity check matrix in Fig. 2.10a.
2.11 A transmitter and receiver communicating over a wireless channel.
2.12 Section of a communications channel included in a data error probability model.
2.13 A BSC model.
2.14 A BEC model.
2.15 A BI-AWGN channel model.
3.1 QPSK symbol decision and error regions.
(a) A QPSK signal constellation.
(b) Error region for symbol S1 in Fig. 3.1a.
3.2 Gaussian white noise added to S1.
3.3 An aircraft passing over a ground station at altitude h = 3 km.
3.4 Codeword error probability vs. SNR for BCH when using block length n = 511 bits.
3.5 Interaction between channel coding modules on the FPGA for both ground station and satellite platforms.
3.6 CRC encoding procedure.
3.7 Pseudo random sequence LFSR.
3.8 LFSR computed by the Berlekamp-Massey algorithm [2].
3.9 A Chien search circuit [3].
3.10 A QC-LDPC parity matrix structure.
(a) Circulant composition of a QC-LDPC parity check matrix H.
(b) An example of a 5x5 circulant permutation matrix.
3.11 Message passing along the edges of a Tanner graph.
3.12 Dimensions of sub-matrices within H.
3.13 Layout of H using the template in Fig. 3.12.
3.14 Message passing IPC using POSIX semaphores and shared memory.
3.15 Interaction between OSI software layers.
3.16 TM frame layout.
3.17 TM header layout [1].
3.18 Setting FHP when ARQ packet spans multiple TM frames.
3.19 An ARQ packet structure.
3.20 Packet round trip time measurement.
3.21 General operation of simulator.
3.22 Bit probability decision making.
3.23 Hardware based BER simulation.
4.1 Channel coding on the satellite platform.
4.2 Channel coding on the ground station platform.
4.3 Interface of a general channel coding module.
4.4 Timing diagram of the channel coding module from Fig. 4.3.
(a) Timing diagram when receiving data on dat_in.
(b) Timing diagram when outputting data on dat_out.
4.5 State machine diagram of the module in Fig. 4.3.
4.6 Serial implementation of a polynomial division LFSR.
4.7 Codeword as constructed by BCH encoder.
4.8 State machine diagram of a BCH decoder.
4.9 A BCH decoder's hardware layout.
4.10 Cyclic matrix multiplier architecture from [4].
4.11 A reduced complexity cyclic matrix multiplier architecture.
4.12 A parallelised implementation of Fig. 4.11.
4.13 A modified TM frame structure for half code rate LDPC.
4.14 State machine diagram of a LDPC encoder.
4.15 LDPC encoder hardware layout.
4.16 State machine diagram of a LDPC decoder.
4.17 General hardware layout of a LDPC decoder.
4.18 A 5-bit LLR value.
4.19 A CNU's hardware layout.
4.20 A VNU's hardware layout.
4.21 RAM block configuration for H.
4.22 A square permutation matrix stored as a row vector.
4.23 Address partitioning of a 9-kbit RAM block.
4.24 Logical to physical address translation.
4.25 Startup of a TM module.
4.26 The receive thread of a TM module.
4.27 The transmit thread of a TM module.
4.28 Startup of an ARQ module.
4.29 Round trip time measurements.
4.30 The receive thread of an ARQ module.
4.31 The acknowledge procedure from Fig. 4.30.
4.32 The transmit thread of an ARQ module.
5.1 Hardware loopback test in FPGA for channel coding modules.
5.2 Measurement of channel coding's processing delay.
(a) Encoding delay measurement.
(b) Decoding delay measurement.
5.3 BLER plot of a (511,484) BCH code.
5.4 BLER plot of a (511,259) BCH code.
5.5 Optimal α search for the (512, 256) code.
5.6 Optimal α = 0.9 compared against α = 1.
5.7 Termination of α = 0.9 scaling at different iteration counts.
5.8 Comparison between ET = 15 iterations and no ET when using α = 0.9.
5.9 Comparison between FPGA and Matlab simulations for α = 0.9 and ET = 15 iterations.
5.10 Bit error rate comparison between half rate BCH and LDPC implementations from Matlab. LDPC uses ET = 15 and α = 0.9.
5.11 TM and ARQ testing procedure.
5.12 IS-HS 2 demo setup in Belgium.
5.13 Application receiving files from ARQ on ground station FIT-PC.
5.14 CRC error rate while moving from A to B in Fig. 5.12.
5.15 Images received on the ground station after file transfer from the satellite platform.
(a)
(b)
(c)
(d)
A.1 QPSK symbol decision and error regions.
(a) A QPSK signal constellation.
(b) Error region for symbol S1 in Fig. A.1a.
A.2 Vector representation of Gaussian noise added to symbol S1.
(a) Gaussian white noise added to S1.
(b) Unit area of integration when using polar coordinates.
A.3 Gray coding scheme for the symbols of a QPSK constellation.
A.4 Time domain BPSK signal.
A.5 Frequency domain of a BPSK bit.
A.6 A QPSK symbol amplitude in terms of two BPSK symbols on channels I and Q.
List of Tables
2.1 Description of OSI layers in Fig. 2.4.
3.1 Link budget parameters for the uplink when using aircraft altitude h = 3 km and elevation angle θ = 30°.
3.2 PC to FPGA control byte values.
4.1 SH4 to FPGA expansion port lines and their description.
4.2 Message passing IPC functions for Linux Ubuntu 7.10.
5.1 Channel coding FPGA module implementation details.
5.2 BCH FPGA module implementation details.
5.3 LDPC FPGA module implementation details.
List of Abbreviations
A/D Analogue to Digital
API Application Programming Interface
ARQ Automatic Repeat-Request
ASE Aircraft Satellite Emulator
ASM Attached Synchronisation Marker
AWGN Additive White Gaussian Noise
BCH Bose Chaudhuri Hocquenghem
BEC Binary Erasure Channel
BEP Bit Error Probability
BER Bit Error Ratio
BI-AWGN Binary AWGN
BLEP Block Error Probability
BLER Block Error Rate
BMA Berlekamp Massey Algorithm
BP Belief Propagation
BPSK Binary Phase Shift Keying
BSC Binary Symmetric Channel
CAN Controller-Area Network
CCSDS Consultative Committee for Space Data Systems
CFDP CCSDS File Delivery Protocol
CNU Check Node Update
COTS Commercial Off-The-Shelf
CPU Central Processing Unit
CRC Cyclic Redundancy Check
DSP Digital Signal Processing
DVB-S Digital Video Broadcast Satellite
EA Euclidean Algorithm
EDAC Error Detection And Correction
ECSS European Cooperation for Space Standardization
ET Early Termination
FEC Forward Error Correction
FER Frame Error Rate
FHP First Header Pointer
FIFO First In First Out
FPGA Field Programmable Gate Array
FSM Finite State Machine
FTDI Future Technology Devices International
GEO Geostationary Orbit
GF Galois Field
GPS Global Positioning System
GS Ground Station
GUI Graphical User Interface
ID Identifier
I/O Input and Output
IP Internet Protocol
IPC Interprocess Communication
ISE Integrated Software Environment
IS-HS In-Situ-Hyperspectral
ISM Industrial Scientific and Medical
ISO International Organisation for Standardization
KUL Katholieke Universiteit van Leuven
LCM Least Common Multiple
LDPC Low Density Parity Check
LE Logic Element
LEO Low Earth Orbit
LFSR Linear Feedback Shift Register
LLR Log Likelihood Ratio
LUT Lookup Table
MEO Medium Earth Orbit
MPA Message Passing Algorithm
MS Minimum-Sum
OBC On-Board Computer
OS Operating System
OSI Open Systems Interconnection
PC Personal Computer
PDF Probability Distribution Function
PDF Probability Density Function
POSIX Portable Operating System Interface for Unix
QC Quasi-Cyclic
QPSK Quadrature Phase Shift Keying
RAM Random Access Memory
RISC Reduced Instruction Set Computer
RF Radio Frequency
RTT Round Trip Time
RX Receive
SAA Steerable Antenna Array
SCPS-TP Space Communications Protocol Specification Transport Protocol
SCSS Satellite Communication Software System
SDR Software Defined Radio
SP Sum-Product
SNR Signal-to-Noise Ratio
TC Telecommand
TCP Transmission Control Protocol
TM Telemetry
TX Transmit
UART Universal Asynchronous Receiver Transmitter
USB Universal Serial Bus
VHSIC Very High Speed Integrated Circuit
VHDL VHSIC Hardware Description Language
VNU Variable Node Update
XOR Exclusive-OR
Nomenclature
Greek Letters:
$\sigma$ Standard deviation
$\sigma^2$ Variance
$\rho$ Probability density function
$\omega_r$ Row weight
$\omega_c$ Column weight
Matrices and Vectors:
Matrices and vectors will always describe the following unless otherwise specified.
G Generator matrix
H Parity check matrix
I Identity matrix
P Parity matrix
c Codeword vector
c(X) BCH codeword vector in polynomial format
d(X) BCH message vector in polynomial format
e(X) BCH error vector in polynomial format
g(X) BCH generator polynomial
s Syndrome vector
r Remainder after division
r(X) BCH remainder after division in polynomial format
s(X) BCH syndrome vector in polynomial format
x Message vector
φ(X) BCH minimal polynomial
Subscripts and Superscripts:
Subscripts and superscripts will always describe the following unless otherwise
specified.
i Parity matrix column index
j Parity matrix row index
T Matrix transpose
Units:
bps bits per second
dB Decibel
Hz Hertz
m meter
s second
W Watt
Variables:
Variables will always describe the following unless otherwise specified.
$A$ Amplitude
$C_{cap}$ Channel capacity
$E_b$ Bit energy
$E_s$ Symbol energy
$I$ In-phase axis
$k$ Message length
$L_{ji}$ LLR check node to variable node message
$N_0$ Noise spectral density
$n$ Codeword length
$Q$ Quadrature axis
$q_{ij}$ Variable node to check node message
$R$ Code rate
$r_{ji}$ Check node to variable node message
$t$ Number of errors correctable by a BCH code
$Z_{ij}$ LLR variable node to check node message
Chapter 1
Introduction
1.1 Background
Agricultural institutions are often required to track changes in the physical
environment on a regular basis, such as air temperature and ground moisture
levels. Sensor stations collecting this data are sometimes situated in very
remote areas that are difficult to reach on foot or by vehicle. Large areas might
also produce too much sensor data to collect manually. A possible solution is to
deploy a telemetry system that wirelessly gathers data from sensor array
stations and routes it to a central station for further processing. This technique
is known as remote sensing and can be implemented via either a terrestrial
network or a satellite system.
A micro-satellite network offers several advantages over a terrestrial network,
including better coverage of a large area, control through a single operator
and a low cost of adding ground stations to the network [5]. Satellites operate
in a number of different orbits around the earth, including Low Earth Orbit
(LEO), Medium Earth Orbit (MEO) and Geostationary Earth Orbit (GEO).
Typical altitudes for LEO, MEO and GEO are 500-1000 km, 10000 km and
35786 km respectively [6]. Most telemetry and communication satellites operate
in LEO, where transmit power requirements are the lowest, making it ideal for
relatively cheap communication satellites. However, a LEO satellite has a
shorter orbital period than satellites in higher orbits and, at its low altitude,
remains in view of a particular ground station for only a short time. The
resulting communication window is small, so communication has to be efficient.
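The link between altitude and orbital period can be made concrete with Kepler's third law for a circular orbit, $T = 2\pi\sqrt{(R_E + h)^3/\mu}$. The sketch below evaluates the altitudes quoted above. It assumes standard values for Earth's equatorial radius $R_E$ and gravitational parameter $\mu$, neither of which is given in the thesis. It computes the orbital period only; the actual contact window with a ground station also depends on the pass geometry and the minimum usable elevation angle.

```python
import math

# Standard constants (assumed here, not taken from the thesis).
MU_EARTH_KM3_S2 = 398600.4418  # Earth's standard gravitational parameter
R_EARTH_KM = 6378.137          # Earth's equatorial radius


def orbital_period_min(altitude_km: float) -> float:
    """Period of a circular orbit at the given altitude, in minutes (Kepler's third law)."""
    radius_km = R_EARTH_KM + altitude_km  # orbit radius equals the semi-major axis for a circle
    period_s = 2.0 * math.pi * math.sqrt(radius_km ** 3 / MU_EARTH_KM3_S2)
    return period_s / 60.0


if __name__ == "__main__":
    # Altitudes quoted in the text for LEO, MEO and GEO.
    for label, altitude in [("LEO", 500), ("LEO", 1000), ("MEO", 10000), ("GEO", 35786)]:
        print(f"{label} at {altitude} km: {orbital_period_min(altitude):.1f} min")
```

At 500 km the period comes out at roughly 95 minutes. The GEO altitude gives about 1436 minutes, which is one sidereal day, and this is why a GEO satellite appears stationary from the ground.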
Communication quality is typically limited by Doppler frequency shift and by
low signal-to-noise ratios (SNRs) at the satellite's receiver. It is therefore
highly desirable to include technologies that mitigate these problems at
minimal cost. As part of a project known as In-Situ Hyper-Spectral (IS-HS) 2,
a micro-satellite platform has been developed that addresses these
communication problems. It is primarily intended for use in agricultural
research, but is not limited to it. The satellite will gather in-situ sensor data
from ground stations as well as hyper-spectral imagery of the area. Collected
data will then be downloaded to a central server station for further processing.
This project is a collaboration between Stellenbosch University (SU) and
the Katholieke Universiteit van Leuven (KUL). SU designed the digital signal
processing part of the system as well as all software and hardware, except for
the steerable antenna array (SAA). The design includes a software-defined
radio (SDR) modem and channel coding with forward error correction (FEC).
The SDR continuously adjusts its demodulation frequency to compensate for
Doppler frequency shift, while the FEC enables reliable communication at very
low SNRs. KUL designed an SAA that maximises the signal gain at the
satellite's receiver: the antenna's direction of maximum gain is continuously
steered towards the ground station as the satellite passes over it. Combined
with FEC, this allows efficient use of the time-limited communication window.
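The size of the Doppler shift that the SDR must track follows from the first-order relation $f_d = f_c v_r / c$, where $f_c$ is the carrier frequency and $v_r$ is the radial velocity between satellite and ground station. The sketch below is illustrative only. It computes $f_d$ and then removes a known offset from complex baseband samples by mixing them with $e^{-j2\pi f_d t}$. It does not reproduce the SDR's actual frequency-tracking method, and the function names are hypothetical.

```python
import cmath
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def doppler_shift_hz(radial_velocity_m_s: float, carrier_hz: float) -> float:
    """First-order (non-relativistic) Doppler shift.

    radial_velocity_m_s is positive while the satellite approaches the
    ground station, so the received frequency is carrier_hz + shift.
    """
    return carrier_hz * radial_velocity_m_s / SPEED_OF_LIGHT_M_S


def derotate(samples, offset_hz: float, sample_rate_hz: float):
    """Remove a known frequency offset from complex baseband samples
    by mixing each sample n with exp(-j*2*pi*offset*n/fs)."""
    return [
        s * cmath.exp(-2j * math.pi * offset_hz * n / sample_rate_hz)
        for n, s in enumerate(samples)
    ]
```

As a worked example, take a hypothetical 2.45 GHz ISM-band carrier and a radial velocity of 7 km/s, which is of the order of a LEO satellite's orbital speed. `doppler_shift_hz(7000, 2.45e9)` then gives roughly 57 kHz of offset for the receiver to track. The offset changes sign as the satellite passes overhead.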
Feasibility studies and prototypes for all technology to be used on the IS-HS 2
satellite have been completed. This thesis focuses on implementing all the
components necessary to facilitate reliable data exchange between a ground
station and the satellite platform. These components were integrated with
existing subsystems to create a functioning demonstration platform.
1.2 Motivation for Work
An eventual flight model for the project has to be preceded by a fully functional
engineering model, and the work in this project is centred on the latter.
At the start of the project, systems integration of the IS-HS 2 configuration
was in progress but incomplete. The SDR had been implemented on a digital
signal processor (DSP) development board, and software such as the satellite
communication software system (SCSS) had also been completed. The SCSS
schedules communication with a particular ground station, after which data is
exchanged by file transfer. However, no mechanism for this file transfer existed,
so one had to be implemented.
Although a communications protocol following the OSI model had been
selected in principle, the individual layers and the overall implementation
were still outstanding. Provision for integrating specific FEC schemes, such as
LDPC, into this structure was also still required.
Channel coding, and FEC in particular, is too computationally expensive to
run on the satellite's on-board computer (OBC). It was therefore proposed to
implement it on a field programmable gate array (FPGA) connected to the
OBC. Fast, parallel designs use large quantities of logic, while slower serial
designs tend to be more compact. Since other components also have to fit onto
the FPGAs selected for this project, a trade-off between speed and complexity
had to be made.
1.3 Project Objectives
As reliable file transfer between the satellite platform and the ground station is
an essential requirement, the objectives of this work were defined as follows:
• Implement an efficient file transfer protocol between the ground- and
flight-based hosts.
• Ensure compatibility and portability of the relevant protocol layers between
two different operating systems (OS), which requires a platform-independent
design.
• Ensure that the layers of the implemented communication modules conform
to the OSI model.
• Design and implement data processing routines for each corresponding
layer of the OSI model.
• Ensure reliable file transfer by implementing firmware-based (FPGA)
channel coding.
• Choose a suitable block length for FEC.
• Choose an FEC scheme based on the findings of earlier work [1].
• Devise a technique to verify that the implemented FEC performs as
expected.
• Ensure complete and stable integration with all other system components.
• Field-test the complete system in practice.
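Two of the objectives above concern the FEC block length and scheme. A common first-order way to compare such choices is the block error probability of a bounded-distance decoder. A codeword of $n$ bits that corrects up to $t$ errors fails when more than $t$ bits are in error:

$P_B = \sum_{i=t+1}^{n} \binom{n}{i} p^i (1-p)^{n-i}$

The sketch below rests on several assumptions:
- bit errors are independent;
- hard decisions are made at the receiver;
- modulation is Gray-coded QPSK on an AWGN channel, so that $p = Q(\sqrt{2R E_b/N_0})$.

This is a textbook approximation and is not necessarily the exact analysis used later in the thesis. The (511, 484) parameters describe a triple-error-correcting BCH code, with $n - k = 27 = 3 \times 9$ parity bits over GF(2⁹).

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def qpsk_bit_error_prob(ebn0_db: float, code_rate: float = 1.0) -> float:
    """Bit error probability of Gray-coded QPSK on an AWGN channel.

    Each coded bit carries code_rate * Eb of energy, so p = Q(sqrt(2 R Eb/N0)).
    """
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return q_function(math.sqrt(2.0 * code_rate * ebn0))


def block_error_prob(n: int, t: int, p: float) -> float:
    """Probability that more than t of n bits are in error, i.e. that a
    bounded-distance decoder correcting up to t errors fails.
    Assumes independent bit errors with probability p."""
    return sum(
        math.comb(n, i) * p ** i * (1.0 - p) ** (n - i)
        for i in range(t + 1, n + 1)
    )


if __name__ == "__main__":
    n, k, t = 511, 484, 3  # triple-error-correcting (511, 484) BCH code
    for ebn0_db in (4, 6, 8):
        p = qpsk_bit_error_prob(ebn0_db, code_rate=k / n)
        print(f"Eb/N0 = {ebn0_db} dB: p = {p:.2e}, BLEP = {block_error_prob(n, t, p):.2e}")
```

The tail is summed directly instead of computing `1 - P(at most t errors)`. This avoids cancellation when the block error probability is very small.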
1.4 Project Contributions and Summary
The following project-specific contributions have been made:
• Development of a POSIX-compliant interprocess communication (IPC)
message passing library for Linux Ubuntu. It hides OS-specific IPC
implementation details and allows bidirectional communication between
protocol layers.
• Software protocol layers are dynamically executable, which allows new OSI
layers to be easily added for experimentation purposes.
• Synthesisable architectures for Bose-Chaudhuri-Hocquenghem (BCH)
and low-density parity-check (LDPC) FEC have been developed in VHDL.
• The architecture used for LDPC is configurable for speed and complexity
trade-offs.
• A testing procedure has been devised using Matlab that confirms FEC
performance results on the FPGA for a particular FEC scheme.
• Integration of all software protocol layers and channel coding modules into
the final prototype for the IS-HS 2.
• Confirmation of reliable file transfer between the satellite and ground
station platforms.
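The IPC library mentioned in the first contribution passes messages between protocol layers using POSIX semaphores and shared memory. The sketch below shows that pattern in miniature, assuming a single-slot buffer per direction. Threads and `threading.Semaphore` stand in for separate processes and POSIX `sem_wait`/`sem_post`, and an object attribute stands in for the shared memory segment. The class names `Mailbox` and `LayerLink` are hypothetical and are not the library's API.

```python
import threading


class Mailbox:
    """Single-slot message buffer guarded by two counting semaphores.

    Stands in for a shared-memory segment plus a pair of POSIX semaphores:
    `empty` counts free slots and `full` counts messages waiting to be read.
    """

    def __init__(self):
        self._slot = None                     # the "shared memory"
        self._empty = threading.Semaphore(1)  # one free slot initially
        self._full = threading.Semaphore(0)   # no message waiting initially

    def send(self, message: bytes) -> None:
        self._empty.acquire()  # block until the slot is free (cf. sem_wait)
        self._slot = message
        self._full.release()   # signal the receiver (cf. sem_post)

    def receive(self) -> bytes:
        self._full.acquire()   # block until a message is available
        message, self._slot = self._slot, None
        self._empty.release()  # hand the slot back to the sender
        return message


class LayerLink:
    """Bidirectional link between two adjacent protocol layers."""

    def __init__(self):
        self.down = Mailbox()  # upper layer -> lower layer
        self.up = Mailbox()    # lower layer -> upper layer


if __name__ == "__main__":
    link = LayerLink()

    def lower_layer():
        # Hypothetical lower layer that acknowledges every packet it receives.
        for _ in range(3):
            packet = link.down.receive()
            link.up.send(b"ACK:" + packet)

    worker = threading.Thread(target=lower_layer)
    worker.start()
    for i in range(3):
        link.down.send(bytes([i]))
        print(link.up.receive())
    worker.join()
```

Using two semaphores instead of a single lock lets the sender block while the slot is full and the receiver block while it is empty, without busy-waiting. A multi-slot ring buffer follows the same pattern, with `empty` initialised to the buffer size.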
1.5 Outline of Thesis
Chapter 2 provides a short overview of work previously completed for the
IS-HS 2. Some background on FEC and communication protocols is also given.
Chapter 3 presents design details for each component implemented in this
project. A BCH code capable of correcting three random bit errors is chosen for
the initial implementation. An LDPC decoder is designed using the
hardware-friendly minimum-sum (MS) decoding algorithm. Frame and packet
structures for the protocol layers are also given. Finally, a simulation strategy
using a C-based test application and the FPGA is described, which compares
the performance of the Matlab and hardware FEC implementations. Chapter 4
presents a flexible channel coding FPGA module architecture that allows
modules to be easily added or removed. The layouts of both the BCH and LDPC
modules are also given. Packet and frame processing routines for all software
protocol layers are presented as flow charts. Chapter 5 confirms the finding of
[1] that LDPC outperforms BCH. It also finds that the BCH design from
Chapter 3 is adequate for this implementation. A demonstration of the final
system at KUL in Belgium showed that it can transfer files reliably between a
ground station and the satellite platform. Lastly, Chapter 6 summarises the
results of this work and makes recommendations for the current and
next-generation designs.
Chapter 2
Previous Work and Literature Review
The constituent components of the IS-HS 2 communication system are shown as a block diagram in Fig. 2.1. This chapter discusses the key concepts needed to understand the design and implementation of components A, B, F and G. Properties of D, the physical communications channel, as required for bit error probability analysis, are also dealt with. The next section illustrates the intended interaction between the satellite and ground station platforms. This is followed by a short overview of components previously implemented for the IS-HS 2 project.
[Figure: satellite side: file transfer protocol layers (OBC software), channel coding (FPGA firmware), modem and RF hardware; ground station side: file transfer protocol layers (PC software), channel coding (FPGA firmware), modem and RF hardware; the two sides are connected by the physical channel. Blocks are labelled A to G.]
Figure 2.1: Communications channel block diagram.
2.1 IS-HS 2 Theory of Operation
Fig. 2.2 shows an example scenario in which the satellite interacts with sensor ground stations GS1 and GS2. Sensor data is uploaded when the satellite passes over GS1, and the SAA stays directed towards GS1 during this transaction. Upon completion, GS2 is scheduled for communication and the SAA is redirected towards its position. After all ground station information has been collected, it is downloaded to a central ground station where the data is processed.
Figure 2.2: Intended interaction between the satellite and ground station platforms.
2.2 Existing Work on IS-HS 2
The architectures of both the ground station and satellite platforms are shown in Fig. 2.3. The system consists of an uplink and a downlink, allowing bidirectional communication between the satellite and a ground station. The uplink uses quadrature phase shift keying (QPSK) modulation at a symbol rate of 19200 baud. Because each QPSK symbol carries two bits, this corresponds to 38400 bps. Commercial off-the-shelf (COTS) radios operating at 115200 bps are used for the downlink. All IS-HS 2 technologies relevant to communication, including the SDR, SAA and channel coding, are used on the uplink. Therefore, the uplink serves as the evaluation platform for these technologies in the current project. Eventually, the flight model of this
satellite prototype will implement the same SDR, SAA and channel coding
technologies on the downlink as well.
Green component blocks indicate the components as implemented in this
work. Red component blocks were also implemented, but not as part of this
thesis. Dashed connector lines with arrow heads indicate the different commu-
nication paths and directions.
2.2.1 Satellite Platform
The satellite platform in Fig. 2.3 consists of 5 major components :
• An aircraft OBC that supplies avionics information.
• A PC running aircraft satellite emulator (ASE) software.
• The IS-HS 2 communications payload.
• An uplink steerable antenna array (SAA) for receiving data from the
ground station.
• A downlink radio which transmits data to the ground station.
In order to verify that all subsystems of the IS-HS 2 project are working
correctly, a test involving a real satellite passing over a ground station would
be required. Since this cannot be done in the engineering model, it has to
be simulated. Mounting the satellite platform on an aircraft and following
a particular flight path over a ground station would be the most realistic
simulation, as proposed by [7]. The ASE along with the aircraft avionics OBC
forms part of this simulation strategy. Parameters such as ground station GPS
coordinates and the satellite’s simulated LEO altitude can be entered into the
ASE. A flight plan for the aircraft is then generated. This plan includes the
aircraft’s flight path, air speed as well as its altitude above sea level. Adherence
to this flight plan emulates a real satellite passing over a ground station at an
altitude of approximately 600 km. Although this remains a viable option, such a test was ultimately not carried out because of project timelines and constraints. Final testing was ground based, as covered in later sections.
The IS-HS 2 communications payload contains an OBC, an SDR modem running on a DSP, an FPGA that performs general data marshalling and channel coding, and the RF modules. These components initiate and control communications over the satellite link.
The OBC is a Sun Space and Information Systems design. The South African-designed Sumbandila satellite, launched in 2009, also uses this OBC. It has an SH7750R CPU based on the Renesas SH-4 family of 32-bit RISC architectures [8]. Other features include 8 MB of SRAM and a CAN bus for reliable communication with external components. This OBC will henceforth be referred to as the SH4. The SH4 runs the Unix-like QNX real-time OS. Programs running under this OS will be referred to as processes. The SAA control, SCSS and file transfer protocols are processes necessary for communications.
[Figure: satellite platform, comprising the aircraft avionics and aircraft OBC, a PC running the aircraft satellite emulator, the IS-HS II communications payload (OBC running SAA control, SCSS and file transfer protocol; SDR and FPGA for demodulation and channel coding), the uplink steerable antenna array and an off-the-shelf downlink radio; IS-HS II ground station, comprising sensors, a PC running the file transfer protocol and a sensor data collector with GUI, an FPGA for channel coding, an SDR for modulation, uplink RF electronics and an off-the-shelf downlink radio. Uplink at 2.4 GHz, downlink at 915 MHz.]
Figure 2.3: Block diagram of both ground station and satellite platforms.
The SAA control process regularly polls the ASE for updated aircraft avion-
ics information. After receiving this data, a new steering angle for the SAA is
calculated. This angle is then written to the SAA’s FPGA which in turn steers
the antenna. Communication with a particular ground station is scheduled by
the SCSS process. It initiates a communications transaction, based on the
avionics information it receives from the SAA control process. After sensor
data has been uploaded, the next ground station is placed on its schedule.
Information exchange between a ground station and the SCSS happens in
the form of files. The file transfer protocols must segment these files into smaller packets before transmitting them over the link. Once a packet has been sent, the ground station has to acknowledge its reception. If an acknowledgement is received, the next packet is transmitted; otherwise, the current packet is retransmitted. This procedure repeats until the complete file has been transferred.
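The stop-and-wait procedure above can be sketched in Python as follows. This is a minimal illustration, not the IS-HS 2 implementation. The lossy link model, the packet size and the retry limit are hypothetical, and for simplicity a lost acknowledgement is treated the same as a lost packet.

```python
import random

def segment(data, packet_size):
    """Split a file's bytes into packets of at most packet_size bytes."""
    return [data[i:i + packet_size] for i in range(0, len(data), packet_size)]

def make_lossy_link(receiver_store, loss_prob, rng):
    """Hypothetical link model: each packet is lost with probability loss_prob.

    A delivered packet is stored by the receiver under its sequence number,
    and the function returns True to stand for the returned acknowledgement.
    """
    def link(seq, packet):
        if rng.random() < loss_prob:
            return False                    # lost: no acknowledgement arrives
        receiver_store[seq] = packet        # a retransmitted duplicate simply overwrites
        return True
    return link

def send_file(data, packet_size, link, max_tries=50):
    """Stop-and-wait: send one packet, wait for its ACK, retransmit if none."""
    transmissions = 0
    for seq, packet in enumerate(segment(data, packet_size)):
        for _ in range(max_tries):
            transmissions += 1
            if link(seq, packet):
                break                       # acknowledged: move to the next packet
        else:
            raise TimeoutError(f"packet {seq} was never acknowledged")
    return transmissions
```

Because the sender never moves on without an acknowledgement, the receiver can reassemble the file by sequence number. The cost is one round trip per packet, plus extra transmissions whenever packets are lost.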
Base band QPSK data received from the SAA is conveyed by the FPGA
to the SDR for demodulation. The FPGA then routes demodulated data from
the SDR to the channel coding modules. After successful decoding, data is sent
to the protocol layers on the SH4. Should data be corrupted and irrecoverable
by the FEC, it gets rejected on the FPGA.
An off-the-shelf data radio operating in the licence-free 915 MHz industrial, scientific and medical (ISM) band is used for the satellite's downlink. It accepts data from a UART connected to the FPGA and transmits it to the ground station. No additional channel coding is required, since these radios already implement error control schemes. The SAA constructed by KUL operates in the 2.4 GHz ISM band and consists of a four-by-four array of circularly polarised antennas, called elements. It mixes the 2.4 GHz signal down to baseband QPSK before it is sampled by the A/D boards. The signal received by one element differs from those received by the other elements only in phase. These phases are adjusted on the FPGA before the element signals are summed. Correct phase adjustment keeps the array's beam directed towards the ground station at all times. This processing scheme effectively beam-steers the antenna.
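The phase-and-sum principle can be illustrated with a simplified model. The sketch below assumes a uniform linear array of four elements spaced half a wavelength apart, rather than the actual four-by-four SAA. It applies phase-only weights so that a plane wave from the chosen steering angle adds coherently. The function names are hypothetical.

```python
import cmath
import math

def steering_weights(n_elements, spacing_wl, steer_deg):
    """Phase-only weights for a uniform linear array (spacing in wavelengths).

    A plane wave from angle theta reaches element n with an extra phase of
    2*pi*n*spacing*sin(theta); each weight cancels that phase for steer_deg.
    """
    k = 2 * math.pi * spacing_wl * math.sin(math.radians(steer_deg))
    return [cmath.exp(-1j * k * n) for n in range(n_elements)]

def array_gain(weights, spacing_wl, arrival_deg):
    """Normalised magnitude of the phase-adjusted sum for a wave from arrival_deg."""
    k = 2 * math.pi * spacing_wl * math.sin(math.radians(arrival_deg))
    total = sum(w * cmath.exp(1j * k * n) for n, w in enumerate(weights))
    return abs(total) / len(weights)
```

With the weights for 30 degrees, a wave from 30 degrees gives a normalised gain of 1, and waves from other directions give less. A planar array such as the SAA would need one phase term per axis, but the principle is the same.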
2.2.2 Ground Station Platform
An IS-HS 2 ground station platform contains the following components :
• A downlink radio which receives data from the satellite.
• Sensors that collect agricultural or similar types of data.
• A PC for storing sensor data and controlling communication.
• FPGA for channel coding and data marshalling between the SDR and
RF electronics.
• An SDR for QPSK modulation.
• RF electronics including a quadrature mixer and power amplifiers.
Central to the operation of the ground station is the PC. Known as a FIT-PC,
its compact design allows for constructing a small and energy efficient ground
station. It hosts the Linux Ubuntu 7.10 OS. Processes that will run on this
OS include the file transfer protocol and the sensor data collector. Sensor
data to be uploaded to the satellite are generated by the sensor data collector
process. Once implemented, this process will contain a graphical user interface
(GUI) that enables a user to access collected information. This includes sensor
measurements and link information such as time stamps of the last satellite
pass.
Data coming from the file transfer protocol are sent to the FPGA. Here the
channel coding module adds redundancy for the FEC. Data are then forwarded
to the SDR for QPSK modulation. Modulated data are mixed up to 2.4 GHz
by the RF electronics before being amplified. Finally, the 2.4 GHz signal is
transmitted over the link to the satellite. A downlink radio, similar to the one
used on the satellite, receives data from the satellite. It passes data to the file
transfer protocols if no errors are present on the received data. The radio will
discard data if it contains errors.
2.3 Protocols
Previous work [1] suggested that IS-HS 2 communication protocols should con-
form to OSI specifications. This standard is maintained by the International
Organization for Standardization (ISO) [9]. The OSI model defines communication software in terms of layers, where each layer provides a different service, such as end-to-end reliability [9]. Data is transferred from one system to another through interaction between these layers.
Fig. 2.4 shows the OSI layers described by [1]. Descriptions of these layers
are given in Table 2.1. Note that the data link layer has been split into two sublayers. This allows any type of FEC to be implemented without having to adopt a new data link layer standard.
The SCSS and SAA control processes on the SH4 form the application layer. Similarly, the ground station PC has the sensor data collector process on this layer. Hardware such as the SDR and RF electronics forms the physical layer
on both satellite and ground station platforms. Transport as well as data link
layers still had to be implemented as part of the present work.
At the receiving platform, the data link layer guarantees that the data it passes to the transport layer is error free. However, because erroneous frames are rejected, this layer cannot guarantee that data is successfully received every time. At the transmitting platform, the data link layer receives data from the transport layer and divides it into fixed-length data frames, which are transmitted sequentially.

[Figure: OSI layer stack, from top to bottom: application layer; transport layer; data link layer, split into a data link protocol sublayer and a synchronization and channel coding sublayer; physical layer.]
Figure 2.4: Layers of the OSI model [1].

OSI layer | Layer description
Application | User software that controls and initiates communication transactions over the satellite link.
Transport | Ensures end-to-end reliability when transmitting data.
Data Link | Provides error detection and correction services.
Physical | Hardware for transmitting or receiving data on the satellite link.
Table 2.1: Description of OSI layers in Fig. 2.4.
The transport layer is responsible for transmitting high-level data such as files. It guarantees that a transmitted file is correctly reassembled at the receiver.
The transmitting platform breaks a file into packets before sending these to the
data link layer. Should a packet get lost due to errors in the data link layer, the
transport layer will retransmit this lost packet. An Automatic Repeat reQuest
(ARQ) strategy can provide the functionality expected from this layer [1].
2.4 Inter Protocol Layer Communication
The OSI layers from Section 2.3 have to interact with each other. Layers
running on the SH4 and FIT-PC will typically use memory resources to com-
municate with each other. Hardware based layers use electrical interfaces to
communicate with adjacent layers. Unlike the electrical interfaces, a software inter-layer communication strategy had not yet been designed when the project started.
Different processes on an OS provide unique services such as TCP/IP net-
working and file system support [10]. Processes may contact each other to
request a particular service and exchange data if necessary. This is done via
interprocess communication (IPC) facilities provided by the OS.
The OSI layers are regarded as different processes running on an OS. Layers
such as the transport layer must be able to send and receive files as discussed
in Section 2.3. By using threads, both transmit and receive functionalities can
be implemented on a single process. Threads reside within a process and are
the most basic units to be scheduled for execution on a CPU [10]. The OS
rapidly switches between threads on the CPU, when a process is scheduled
to run. This allows transmit and receive functionalities to run simultaneously
and independently of each other.
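A minimal sketch of one layer process with independent transmit and receive threads is given below. As an illustrative assumption, Python `queue.Queue` objects stand in for whatever IPC mechanism connects the layer to its neighbours. `STOP` is a hypothetical sentinel that ends each loop.

```python
import queue
import threading

STOP = None  # hypothetical sentinel that ends a thread's loop

def transmit_loop(from_upper_layer, to_link):
    """Transmit thread: forward everything handed down by the layer above."""
    for item in iter(from_upper_layer.get, STOP):
        to_link.put(item)

def receive_loop(from_link, delivered):
    """Receive thread: collect everything arriving from the layer below."""
    for item in iter(from_link.get, STOP):
        delivered.append(item)

def run_layer(outgoing, incoming):
    """Run both threads inside one process and return (transmitted, received)."""
    tx_in, tx_out, rx_in = queue.Queue(), queue.Queue(), queue.Queue()
    delivered = []
    threads = [threading.Thread(target=transmit_loop, args=(tx_in, tx_out)),
               threading.Thread(target=receive_loop, args=(rx_in, delivered))]
    for t in threads:
        t.start()
    for item in outgoing:
        tx_in.put(item)
    for item in incoming:
        rx_in.put(item)
    tx_in.put(STOP)
    rx_in.put(STOP)
    for t in threads:
        t.join()
    transmitted = [tx_out.get() for _ in range(tx_out.qsize())]
    return transmitted, delivered
```

Each thread blocks only on its own input queue. A slow receive path therefore does not stall transmission, which is the independence the paragraph above describes.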
2.4.1 IPC Schemes
2.4.1.1 Shared Memory
In Fig. 2.5 a section of memory is shared between two different processes.
Processes 1 and 2 call the OS kernel to map this shared memory region into
their separate address spaces. No kernel call is necessary to modify data in
this memory, making it the fastest way to exchange large quantities of data
between two processes [10]. Because data is modified using ordinary memory accesses [10], changes are immediately visible to the other process.
[Figure: processes 1 and 2 both map a shared memory region set up by the OS kernel (Linux, QNX).]
Figure 2.5: Shared memory between 2 processes.
No synchronisation services are provided by default with shared memory. A receiving process should only read data once a sending process has finished modifying the region of interest. Therefore, processes 1 and 2 must either agree on a safe concurrent access technique or resort to OS synchronisation services, discussed in Section 2.4.2. Typical shared memory applications include bulk data transfer to a video display device driver [11].
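The following sketch shows shared memory between two processes on a Unix-like system. It assumes Linux-style `fork()` semantics. An anonymous `mmap` region, which Python maps as shared by default on Unix, is inherited by the child, so the child's write is visible to the parent. Waiting for the child to exit serves as a crude synchronisation step, standing in for the services of Section 2.4.2.

```python
import mmap
import os
import struct

def write_in_child_read_in_parent(value):
    """Share an 8-byte region between a parent and a forked child process."""
    region = mmap.mmap(-1, 8)               # anonymous mapping, MAP_SHARED by default on Unix
    pid = os.fork()
    if pid == 0:                            # child: an ordinary memory write into the region
        try:
            region[0:8] = struct.pack("<q", value)
        finally:
            os._exit(0)
    os.waitpid(pid, 0)                      # synchronisation: read only after the writer is done
    (result,) = struct.unpack("<q", region[0:8])
    region.close()
    return result
```

On QNX or in C, the equivalent setup would typically use `shm_open()` and `mmap()` by name, so that unrelated processes can attach to the same region.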
2.4.1.2 Message Passing
Processes communicate by sending and receiving messages via a mailbox. The
mailbox is a data structure into which messages can be placed and from which they can be removed [10]. A mailbox resides either in the OS kernel or as shared memory
outside the kernel. A process sends message M to mailbox A by calling
send(A,M). The receiving process may call receive(A) to receive this mes-
sage from mailbox A. Unlike shared memory IPC, an OS kernel can provide
a variety of synchronisation services when calling send() and receive() [10].
These services include :
• Blocking send() : The sending process sleeps until the receiving process
collects the message. Alternatively the sender could also be unblocked
once a receiver sends a reply message.
• Blocking receive() : A receiving process waits for an available message.
The OS kernel notifies the receiver to wake up if a message is available.
Message passing is the primary IPC technique used in QNX [11]. It uses a
client-server relationship between communicating processes as shown in Fig.
2.6. The client process sends the server process a message by calling send().
The client then blocks until it receives a reply message from the server. After
processing the received message, the server sends a reply by calling reply().
This reply may contain processed data or could just be a notification that
message processing is done.
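The send/receive/reply pattern of Fig. 2.6 can be imitated with a duplex `multiprocessing.Pipe` and a server thread. This only mimics the blocking behaviour described above; it is not QNX's native message-passing API. The upper-casing "processing" step and the `None` shutdown message are placeholders.

```python
import threading
from multiprocessing import Pipe

def server(conn):
    """Server loop: receive a request, process it, then reply."""
    while True:
        request = conn.recv()            # blocks until a client sends
        if request is None:              # hypothetical shutdown message
            break
        conn.send(request.upper())       # the reply unblocks the waiting client

def send(conn, message):
    """Client send(): transmit a message and block until the reply arrives."""
    conn.send(message)
    return conn.recv()

def run_exchange(messages):
    """Send each message in turn and collect the server's replies."""
    client_end, server_end = Pipe()      # duplex connection
    t = threading.Thread(target=server, args=(server_end,))
    t.start()
    replies = [send(client_end, m) for m in messages]
    client_end.send(None)
    t.join()
    return replies
```

Because the client always waits for a reply before sending again, at most one request is outstanding. This gives the implicit synchronisation that makes this style of IPC convenient.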
2.4.1.3 Pipes
A pipe is a file with a predetermined size that only exists in memory [12].
Reading and writing operations are performed in memory and not via the file
system. Therefore, it acts as a type of shared memory. Pipes only allow one-way communication between two processes, so a pair of pipes is required for bidirectional communication.
The sending process calls write() to add data to one end of the pipe. A
process calling read() on the other end, receives data from this pipe. The
sender is blocked when a pipe becomes full. Similarly, a receiving process is
suspended if no data is available.
[Figure: a client process calls send() to place a message in a mailbox in the OS kernel (Linux, QNX); the server process collects it with receive() and answers with reply().]
Figure 2.6: QNX message passing between a client and server process.

Processes sometimes spawn new processes, called child processes. The spawning process, called the parent, typically communicates with a child by using an unnamed pipe. This pipe is only visible to the parent and child processes. Operating systems such as Linux use this communication scheme between a parent and a child [13]. Named pipes, also known as FIFOs, are visible to all processes. A reference to the pipe's memory mapped file is placed as an entry in the file system [13]. Processes may open this file, allowing them to connect to this pipe.
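An unnamed pipe between a parent and a forked child could look like the sketch below (Unix only). The parent closes its copy of the write end. `os.read()` therefore returns an empty result, meaning end of file, once the child has finished writing and closed its end.

```python
import os

def read_from_child(text):
    """Child writes text into an unnamed pipe; parent reads it back."""
    read_fd, write_fd = os.pipe()      # one-way: the child writes, the parent reads
    pid = os.fork()
    if pid == 0:                       # child process
        try:
            os.close(read_fd)
            os.write(write_fd, text.encode())
            os.close(write_fd)
        finally:
            os._exit(0)
    os.close(write_fd)                 # parent keeps only the read end
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096) # blocks until data arrives or EOF
        if not chunk:                  # EOF: every write end is closed
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks).decode()
```

A named pipe would instead be created with `os.mkfifo(path)`, after which unrelated processes can open it by its path.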
2.4.1.4 Message Queues
A message queue is a linked list of messages that exists inside the OS kernel as
shown in Fig. 2.7. The queue uses a FIFO principle : messages M1 to M3 are
removed in the same order in which they were added. Bidirectional communication is possible between two processes that share the same queue.
Receiving processes call receive() to retrieve the first message from the queue.
This process is blocked if no messages are available. Similarly, a sending pro-
cess is blocked if the queue becomes full. It is unblocked once the receiving
process removes a message. On the Linux OS, a receiving process can be asyn-
chronously notified of an incoming message [14]. This allows the process to
perform other tasks instead of having to regularly poll or wait for incoming
messages. Productivity of a process is therefore increased.
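Python's standard library has no wrapper for the Linux POSIX message queue calls (`mq_open()`, `mq_send()`, `mq_receive()`, `mq_notify()`). The sketch below therefore uses a bounded `queue.Queue` shared by two threads as an in-process analogue. It reproduces the FIFO ordering and blocking rules described above, but not kernel residency or asynchronous notification.

```python
import queue
import threading

def deliver_in_order(messages, capacity=2):
    """Pass messages through a bounded FIFO and return them in arrival order."""
    mailbox = queue.Queue(maxsize=capacity)
    received = []

    def receiver():
        for _ in range(len(messages)):
            received.append(mailbox.get())   # blocks while the queue is empty

    t = threading.Thread(target=receiver)
    t.start()
    for m in messages:
        mailbox.put(m)                       # blocks while the queue is full
    t.join()
    return received

def put_without_blocking(q, item):
    """Return False instead of blocking when the queue is full."""
    try:
        q.put_nowait(item)
        return True
    except queue.Full:
        return False
```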
2.4.1.5 Sockets
Sockets offer connection-oriented communication between two processes. Unlike the other IPC schemes in this chapter, sockets allow processes to communicate over
a network. Sockets may also be used between processes on the same computer.
These local sockets are known as Unix domain sockets [12].
[Figure: process 1 calls send() and process 2 calls receive() on a message queue in the OS kernel (Linux, QNX) holding messages M1, M2 and M3.]
Figure 2.7: A message queue shared between two processes.
Bidirectional communication is possible between two processes. Similar to
pipe IPC, data is written to one end of a socket while data is removed from the
other end. Memory mapped files buffer data on both communication paths
[12]. A client-server model is adopted between communicating processes. The
server process creates a socket and listens for incoming connections. Client
processes can connect to a server after which a bidirectional connection is
established. Many different clients may connect to the same server process.
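A Unix domain socket exchange following the client-server model above might be sketched as below. The socket path is a hypothetical temporary file, and the server handles a single client in a thread. The reversed-bytes reply is only a placeholder for real processing.

```python
import os
import socket
import tempfile
import threading

def round_trip(payload):
    """Send payload to a Unix domain socket server and return its reply."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "layer.sock")    # hypothetical socket path
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(1)                              # server waits for connections

    def serve_one_client():
        conn, _ = server.accept()                 # blocks until a client connects
        with conn:
            request = conn.recv(1024)
            conn.sendall(request[::-1])           # reply on the same connection

    t = threading.Thread(target=serve_one_client)
    t.start()
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(path)
        client.sendall(payload)
        reply = client.recv(1024)
    t.join()
    server.close()
    os.unlink(path)
    os.rmdir(workdir)
    return reply
```

Replacing `AF_UNIX` with `AF_INET` and a host/port address would carry the same exchange over a network, which is the main advantage of sockets over the other schemes.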
2.4.1.6 Files
This is the simplest form of IPC between two processes. By default no syn-
chronisation services are provided by the OS to control file access, hence data
consistency cannot be guaranteed. Communication happens via disk I/O or in
memory through memory mapped files. The latter is faster since modification
happens at memory access speeds. By contrast, disk I/O happens at much
slower speeds.
File locks control access to a file while a particular section of it is being modified. They are either mandatory or advisory. Mandatory locks deny read and write access to all processes other than the one holding the lock. Advisory locks indicate that the file is being modified but do not deny read or write access to other processes. They are only effective if every process checks for the lock before accessing the file.
2.4.2 Synchronisation Schemes
Shared data may be accessed concurrently by several processes or threads. While the data is being modified by one process or thread, no other process or thread may access it. Operating systems provide a variety of solutions to this synchronisation problem.
2.4.2.1 Semaphores
A semaphore controls access to a shared region by using locks. Processes
must request a lock from this semaphore before access is granted to the shared
region. This lock is returned after the shared region has been modified. Semaphores implement these locks by using an integer [10], which is initialised to a specific positive value when the semaphore is created. Each granted lock request decrements this integer. When the integer reaches zero, further lock requests are denied and the requesting process is blocked. A blocked process is woken when another process returns its lock.
Typically, a lock is requested by calling sem_wait() and released by calling sem_post(), the POSIX counterpart of the signal operation. These functions are executed atomically: when multiple processes call sem_wait() simultaneously, only one of them modifies the semaphore at a time [10]. This prevents a situation where two processes both obtain a lock when only one should have received it.
A binary semaphore is a specific implementation that only contains one
lock. Linux for example uses binary semaphores to synchronise threads. These
are known as mutex locks, since they mutually exclude multiple threads from
simultaneously accessing a resource protected by this lock [12].
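The counting behaviour of a semaphore can be observed with Python's `threading.Semaphore`, whose acquire and release operations correspond to the wait and signal operations described above. In this illustrative sketch, a number of hypothetical workers compete for a fixed number of locks. The peak number of simultaneous holders is recorded under a mutex (`threading.Lock`).

```python
import threading
import time

def peak_holders(n_locks, n_workers, hold_s=0.01):
    """Return the largest number of workers observed inside the guarded region."""
    sem = threading.Semaphore(n_locks)   # counter initialised to n_locks
    mutex = threading.Lock()             # binary lock protecting the counters below
    inside = 0
    peak = 0

    def worker():
        nonlocal inside, peak
        with sem:                        # acquire: decrement, or block while zero
            with mutex:
                inside += 1
                peak = max(peak, inside)
            time.sleep(hold_s)           # simulated work in the shared region
            with mutex:
                inside -= 1
        # leaving the with-block releases (signals) the semaphore

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak
```

With one lock the semaphore behaves as a mutex, and the peak can never exceed one.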
2.4.2.2 Signals
Signals inform a process of an event that is in progress or has just taken place [12]. These events could be external or internal with regard to a process. An internal event could be something such as an illegal memory access attempt [10]. Events like these are synchronous, since the process that caused them is signalled immediately. External events cause a signal to be delivered asynchronously to a process. This may happen if a process has modified a file and wishes to inform another process that its modifications are done.
Processes implement routines for handling the different signals delivered to them. Some signals cannot be caught, since a handler routine could otherwise override their original purpose. An example is Linux's kill signal, SIGKILL, which cannot be handled by a signal handling routine [13]. It forces a process to terminate even if some resources owned by the process have not yet been released.
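The sketch below installs a handler for SIGUSR1, a user-defined signal chosen here for illustration, and delivers that signal to the same process. It also shows that the OS refuses a handler for SIGKILL. It assumes a Unix-like system and must run in the main thread, because Python only allows signal handlers to be set there.

```python
import os
import signal
import time

events = []

def on_user_signal(signum, frame):
    """Handler routine: record the event instead of the default action."""
    events.append(signum)

def signal_self(timeout=1.0):
    """Install a SIGUSR1 handler, signal this process, and wait for the handler."""
    previous = signal.signal(signal.SIGUSR1, on_user_signal)
    try:
        os.kill(os.getpid(), signal.SIGUSR1)
        deadline = time.monotonic() + timeout
        while not events and time.monotonic() < deadline:
            time.sleep(0.001)           # Python runs pending handlers between bytecodes
    finally:
        signal.signal(signal.SIGUSR1, previous)
    return list(events)

def can_install_handler(sig):
    """SIGKILL cannot be caught, so the OS rejects any handler for it."""
    try:
        previous = signal.signal(sig, on_user_signal)
    except (OSError, ValueError):
        return False
    signal.signal(sig, previous)
    return True
```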
2.4.2.3 File Locks
These locks are critical for synchronising access to a file. A file can be locked using either a lock-file or a system call that associates a lock with an open file's descriptor inside the OS kernel. Lock-files are typically empty files that are created alongside the file being modified. Their presence indicates that another process is busy modifying the desired file [12]. After modification is complete, the lock-file is removed (unlinked), allowing other processes to recreate a lock-file and modify the same file. When exclusive creation is requested, the Linux OS creates files atomically, meaning that only one of several competing processes can create the lock-file at a time. This ensures exclusive access to the file being modified. Calling fcntl() in QNX and Linux places a lock on a file that is already open. On Linux, such locks are advisory unless mandatory locking has been explicitly enabled. Either the whole file or a specific section can be locked [12]. This system call is performed atomically.
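Both locking approaches can be sketched with Python's `os` and `fcntl` modules on Linux. The lock-file relies on `O_CREAT | O_EXCL`, which makes creation fail atomically if the file already exists. The record lock uses `fcntl.lockf()`, which on Linux is advisory. File names are hypothetical.

```python
import fcntl
import os
import tempfile

def acquire_lockfile(lock_path):
    """Try to create the lock-file atomically; False means another holder exists."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True

def release_lockfile(lock_path):
    """Unlink the lock-file so that another process may create it."""
    os.unlink(lock_path)

def append_with_record_lock(path, data):
    """Append data while holding an exclusive fcntl() lock on the whole file."""
    with open(path, "ab") as f:
        fcntl.lockf(f, fcntl.LOCK_EX)    # blocks until granted; advisory on Linux
        try:
            f.write(data)
            f.flush()
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
```

A lock-file works across any processes that agree to check for it. The `fcntl()` lock is released automatically by the kernel if the holding process dies, so it cannot leave a stale lock behind.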
2.5 Error Control Strategies
Error control is a term referring to both error detection and correction (EDAC)
schemes [15]. An error detection scheme can identify whether data is corrupted
based on the received information. Corrupted data is discarded before request-
ing retransmission of the same information. Forward error correction (FEC)
attempts to identify and correct all errors in the received information. Failing
to do so will result in a retransmission of the same information. Since FEC
can identify erroneous data, it also serves as an error detection scheme.
The noisy channel coding theorem was part of the work done by Claude E. Shannon in 1948 [3]. It states that reliable communication is possible over a noisy channel, such as one containing Gaussian noise. Specifically, if the rate of information transmission, R_inf, is less than the channel capacity, C_cap, then codes exist that make the probability of error arbitrarily small. R_inf and C_cap are both expressed in bits per second (bps).
Error correction adds additional information to a message that needs to
be transmitted. This redundancy along with the original message is known as
a codeword. Redundancy effectively spreads a message’s information across
the whole codeword which averages the effect of noise [16]. Burst errors for
example, may corrupt a certain location in a codeword. However, that section
may be recoverable using the redundancy of the codeword.
Two main classes of error correction codes exist, namely convolutional codes and linear block codes [3]. A convolutional code operates on a continuous stream of data that enters the encoder. Internally, it is synchronised to encode k-bit message sections into n-bit codewords. The decoder aligns itself with these n-bit sections before decoding the stream.
Linear block codes operate on fixed length messages. An encoder will
wait for k message bits before encoding commences. Similarly the decoder
will gather n codeword bits before performing error detection and correction.
According to [1], BCH and LDPC linear block codes are to be considered for implementation in the IS-HS 2. The implementation of these linear block codes formed part of the work set out in this thesis.
2.5.1 Linear Block Codes
In linear block codes, a k-bit message is encoded to form an n-bit codeword, referred to as an (n, k) code [17]. A binary (n, k) code has 2^k different codewords, one for each possible message. Only binary codes are considered in this thesis, so all addition and multiplication operations are done modulo-2.
Fig. 2.8 illustrates how information is processed by linear block codes. A 4-bit codeword c is created by multiplying the 2-bit message x by G, called the generator matrix. This step appends (n − k) = 2 redundant bits to x without altering x. A code whose codewords have this structure is called a systematic code. The code rate R = k/n = 0.5 indicates what fraction of a codeword carries usable information [17]. Noise is added to c as it is sent across the channel, giving c', which may contain errors. Multiplying c' by H^T, where H is known as the parity check matrix, gives the syndrome s. Only an all-zero syndrome indicates that no detectable errors are present in c'. After successfully correcting all errors in c', the decoder strips the codeword's redundant bits and outputs the original message x.
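[Figure: at the transmitter, the FEC encoder (G, 2×4) maps message x (1×2) to codeword c (1×4), which the modem sends over the channel where noise is added; at the receiver, the FEC decoder applies H^T (4×2) to c' (1×4), giving syndrome s (1×2), and outputs x.]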
Figure 2.8: A FEC encoder and decoder in a communications channel.
Note that R_inf equals R multiplied by the bit rate of the coded stream on the channel. Therefore, if less redundancy is used in FEC, R_inf is higher, as expected. A good FEC code corrects many errors while allowing R_inf to approach C_cap.
Linear combinations of codewords will result in another valid codeword
[15]. An encoder can use this property to encode any message using a set
of basic codewords. Eq. 2.5.1 shows a four bit message [x0, x1, x2, x3] being
encoded to a seven bit systematic codeword c by using basic codewords c0 to
c3. Note that these basic codewords are all linearly independent.
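$$
\begin{aligned}
c &= [\, x \mid p \,] = c_0 + c_1 + c_2 + c_3 \\
  &= \begin{bmatrix} x_0 & 0 & 0 & 0 & x_0 p_{00} & x_0 p_{01} & x_0 p_{02} \end{bmatrix}
   + \begin{bmatrix} 0 & x_1 & 0 & 0 & x_1 p_{10} & x_1 p_{11} & x_1 p_{12} \end{bmatrix} \\
  &\quad + \begin{bmatrix} 0 & 0 & x_2 & 0 & x_2 p_{20} & x_2 p_{21} & x_2 p_{22} \end{bmatrix}
   + \begin{bmatrix} 0 & 0 & 0 & x_3 & x_3 p_{30} & x_3 p_{31} & x_3 p_{32} \end{bmatrix} \\
  &= \begin{bmatrix} x_0 & x_1 & x_2 & x_3 & p_0 & p_1 & p_2 \end{bmatrix},
  \qquad p_j = \sum_{i=0}^{3} x_i\, p_{ij}
\end{aligned}
\tag{2.5.1}
$$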
A matrix notation can be used to represent the steps of Eq. 2.5.1. This is
illustrated below.
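$$
c = \begin{bmatrix} x_0 & x_1 & x_2 & x_3 \end{bmatrix}
\times
\begin{bmatrix}
1 & 0 & 0 & 0 & p_{00} & p_{01} & p_{02} \\
0 & 1 & 0 & 0 & p_{10} & p_{11} & p_{12} \\
0 & 0 & 1 & 0 & p_{20} & p_{21} & p_{22} \\
0 & 0 & 0 & 1 & p_{30} & p_{31} & p_{32}
\end{bmatrix}
= x \times [\, I \mid P \,] = x \times G
\tag{2.5.2}
$$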
The matrix G = [I | P] plays the same role as the generator matrix in Fig. 2.8: an encoder requires it to transform a message into a codeword. The identity matrix I ensures that message x is copied unaltered into c, while the parity matrix P adds the redundant (parity) bits p_0 to p_2.
A parity bit's value indicates whether a sequence of bits has an even or odd number of ones. In even parity, the modulo-2 sum of all these bit values, including the parity bit itself, is zero. The parity of a (7,4) Hamming code is illustrated in Fig. 2.9.
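[Figure: three overlapping circles, each containing three of the data bits x0 to x3 and one of the parity bits p0, p1 and p2.]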
Figure 2.9: A (7,4) Hamming code’s parity bit dependency diagram.
Each circle in the diagram contains three data bits and one parity bit. These
illustrate which bits of message x are used to calculate a parity bit’s value.
Parity bits p0 to p2 are calculated using Eqs. 2.5.3 to 2.5.5.
$$ p_0 = x_0 + x_1 + x_3 \tag{2.5.3} $$
$$ p_1 = x_0 + x_2 + x_3 \tag{2.5.4} $$
$$ p_2 = x_1 + x_2 + x_3 \tag{2.5.5} $$
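The encoding step c = x × G can be checked numerically. The sketch below builds G = [I | P] for the (7,4) Hamming code from Eqs. 2.5.3 to 2.5.5 and performs the modulo-2 matrix product. It is an illustration in Python, not the VHDL encoder of this project.

```python
# Parity matrix P from Eqs. 2.5.3 to 2.5.5: row i marks the parity bits that use x_i.
P = [
    [1, 1, 0],  # x0 contributes to p0 and p1
    [1, 0, 1],  # x1 contributes to p0 and p2
    [0, 1, 1],  # x2 contributes to p1 and p2
    [1, 1, 1],  # x3 contributes to p0, p1 and p2
]
K, N = 4, 7

# Generator matrix G = [I | P] as in Eq. 2.5.2.
G = [[1 if j == i else 0 for j in range(K)] + P[i] for i in range(K)]

def encode(x):
    """Systematic codeword c = x * G, computed with modulo-2 arithmetic."""
    return [sum(x[i] * G[i][j] for i in range(K)) % 2 for j in range(N)]
```

For example, the message [1, 0, 1, 1] gives the parity bits p0 = 0, p1 = 1 and p2 = 0. These are the same values obtained by evaluating Eqs. 2.5.3 to 2.5.5 by hand.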
After a codeword has been sent across the channel, the FEC decoder receives a codeword $c' = [x'_0\ x'_1\ x'_2\ x'_3\ p'_0\ p'_1\ p'_2]$ that may contain errors. Continuing the (7,4) Hamming example, the decoder evaluates the following equations:
$$ s_0 = p'_0 + x'_0 + x'_1 + x'_3 \tag{2.5.6} $$
$$ s_1 = p'_1 + x'_0 + x'_2 + x'_3 \tag{2.5.7} $$
$$ s_2 = p'_2 + x'_1 + x'_2 + x'_3 \tag{2.5.8} $$
Eqs. 2.5.6 to 2.5.8 are known as parity check equations. Evaluation of these
equations can be presented in matrix form as shown below.
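$$
s = \begin{bmatrix} s_0 & s_1 & s_2 \end{bmatrix}
= c' \times
\begin{bmatrix}
1 & 1 & 0 \\
1 & 0 & 1 \\
0 & 1 & 1 \\
1 & 1 & 1 \\
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}
= c' \times \begin{bmatrix} P \\ I \end{bmatrix}
= c' \times H^T
\tag{2.5.9}
$$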
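Matrix H is the parity check matrix used in Fig. 2.8. A fundamental property of linear block codes is that G × H^T = 0 [15]. For the systematic form used here, this follows directly, since H^T in Eq. 2.5.9 contains the same matrices P and I as G in Eq. 2.5.2: G × H^T = I·P + P·I = P + P = 0 in modulo-2 arithmetic.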
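As mentioned before, vector s is the syndrome. A non-zero element's position in s indicates which parity check equation failed. In general, the syndrome does not directly reveal the exact location of a bit error in the codeword. For the (7,4) Hamming code, a single bit error at position i produces a syndrome equal to row i of H^T. Since all seven rows are distinct and non-zero, a single error can be located by matching the syndrome against these rows. In codes that correct multiple errors, such as BCH, the syndrome requires further processing by the decoder to determine error locations [3].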
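Syndrome computation (Eq. 2.5.9) and single-error correction for the same (7,4) Hamming code can be sketched as below. The correction step relies on each row of H^T being a distinct non-zero pattern, so it only handles single bit errors. It is an illustrative lookup, not a BCH decoder.

```python
# Parity matrix P from Eqs. 2.5.3 to 2.5.5 (row i: parity bits that use x_i).
P = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]
# H^T = [P ; I] from Eq. 2.5.9: one row per received bit x'0..x'3, p'0..p'2.
HT = P + [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

def syndrome(received):
    """s = c' * H^T in modulo-2 arithmetic (Eqs. 2.5.6 to 2.5.8)."""
    return [sum(received[i] * HT[i][j] for i in range(7)) % 2 for j in range(3)]

def decode(received):
    """Correct at most one bit error; return (message bits, flipped position)."""
    s = syndrome(received)
    corrected = list(received)
    position = None
    if any(s):
        position = HT.index(s)   # a single error at bit i makes s equal to row i of H^T
        corrected[position] ^= 1
    return corrected[:4], position   # systematic code: message is the first k bits
```

If two bits are in error, the syndrome still matches some row. The decoder then flips the wrong bit, which illustrates why codes with larger t, such as BCH, are needed when multiple errors are expected.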
2.5.2 BCH Codes
Named after its inventors Bose, Chaudhuri and Hocquenghem, the BCH code forms part of a powerful class of random-error-correcting codes [3]. Hocquenghem was the first to invent BCH, in 1959. Independently of his result, Bose and Chaudhuri also published their work on the code's design. Unlike the single-error-correcting Hamming code, BCH allows the number of correctable errors, t, to be specified when implementing an (n, k) code [15]. A Hamming code can therefore be seen as a single-error-correcting BCH code with t = 1. Numerous communication standards include BCH error correction. Among these are DVB-S2 [18], a digital satellite television standard, and the TC protocol used in space telemetry systems [19]. BCH codewords are viewed as polynomials. Each bit of $c = [c_{n-1}, \cdots, c_0]$ represents the coefficient of a term in Eq. 2.5.10.
$$c(X) = c_{n-1}X^{n-1} + \cdots + c_1X + c_0 \tag{2.5.10}$$
These codewords are also cyclic: rotating the elements of an existing codeword produces another valid codeword, denoted $c_{new}(X)$. As an example, Eq. 2.5.10 has been rotated to the left by one bit:
$$c_{new}(X) = c_{n-2}X^{n-1} + \cdots + c_0X + c_{n-1} \tag{2.5.11}$$
Cyclic codes also allow their codewords to be created using a generator polynomial, as shown in Eq. 2.5.12. The k-bit message to be encoded is represented by d(X), which has order k − 1. Polynomial g(X) is the generator, of order n − k.
$$c(X) = d(X) \times g(X) \tag{2.5.12}$$
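The cyclic property and Eq. 2.5.12 can be checked numerically with a small example. The sketch below is a minimal illustration that swaps in the cyclic (7,4) code generated by $g(X) = X^3 + X + 1$ (an assumption chosen for its size, not a code used in this thesis) and packs each polynomial into an integer whose bit i holds the coefficient of $X^i$. A one-bit left rotation as in Eq. 2.5.11 is the same as $X c(X) \bmod (X^n + 1)$. Function names are hypothetical.

```python
G = 0b1011  # g(X) = X^3 + X + 1, assumed generator of a small cyclic (7,4) code
N = 7       # codeword length n


def gf2_mul(a, b):
    """Carry-less product of two GF(2) polynomials (bit i = coefficient of X^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def gf2_mod(a, m):
    """Remainder of GF(2) polynomial a divided by m."""
    while a.bit_length() >= m.bit_length():
        a ^= m << (a.bit_length() - m.bit_length())
    return a


def rotate_left(c, n=N):
    """One-bit left rotation of an n-bit codeword, i.e. X c(X) mod (X^n + 1)."""
    c <<= 1
    if c >> n:                       # the X^n term wraps around to X^0
        c = (c ^ (1 << n)) | 1
    return c


d = 0b0011                           # d(X) = X + 1, a k = 4 bit message
c = gf2_mul(d, G)                    # Eq. 2.5.12: c(X) = d(X) g(X)
print(f"{c:07b}")                    # 0011101
rotations = [c]
for _ in range(N - 1):
    rotations.append(rotate_left(rotations[-1]))
print(all(gf2_mod(r, G) == 0 for r in rotations))   # every rotation is a codeword
```

Note that $c(X) = d(X)g(X)$ does not leave the message bits unchanged inside the codeword; the systematic, remainder-based alternative is described in Section 3.3.1.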
A BCH encoder applies either generator matrix G or polynomial g(X) to encode a k-bit message x. Note that Eq. 2.5.12 can be used to construct G from Eq. 2.5.2. Both g(X) and G can generate the same systematic codewords [17]; in polynomial form, however, this requires the remainder-based encoding of Section 3.3.1, since the direct product of Eq. 2.5.12 is not systematic. After computing the syndrome at the decoder, the error locations in c(X) are determined using either linear algebra or iterative decoding techniques.
The first BCH decoding algorithm was introduced by Peterson in 1960 [3]. It involves solving a set of linear equations to identify the error positions in c(X). This technique becomes time-consuming for large codewords and results in a highly complex decoder [15]. Berlekamp and Massey later developed an iterative decoding technique that finds a polynomial σ(X) whose roots can be used to locate the errors in c(X). This is computationally less expensive than Peterson's solution [3].
2.5.3 LDPC Codes
LDPC codes were originally invented by Robert Gallager during the 1960s as part of his PhD work [20]. However, a cost-effective implementation of the codes was not possible at that stage due to limited computer technology and the high complexity of Gallager's decoding algorithm [21]. After being forgotten for a few decades, the codes were accidentally rediscovered by MacKay and Neal [22] and by Wiberg [23]. MacKay showed that LDPC codes can achieve near-Shannon-limit performance when an optimised iterative decoding technique is used [21].
LDPC is a very powerful FEC code and can be found in numerous communications standards such as DVB-S2 [18] and IEEE 802.11n Wi-Fi [24]. Turbo codes, another powerful FEC code and a strong competitor to LDPC, are found in other modern communications standards such as DVB-RCS and 802.16e WiMAX [25]. What makes LDPC attractive compared to turbo codes is that there are no patent issues surrounding the code [26]. Other advantages include better complexity-performance trade-off options [25], lower decoding complexity and error floors that only appear at very low bit error rates (BER) [21].
LDPC codes also fall into the category of linear block codes. Characteristic of LDPC is its sparse H matrix, hence the term low density parity check. A sparse matrix has far fewer non-zero elements than zeros. H is described by both its column weight ωc and its row weight ωr. The weight of a vector refers to the number of non-zero entries it contains, i.e. the number of ones contained in a binary vector. A (ωc, ωr)-regular LDPC code has the same ωc for all its columns and the same ωr for all its rows. An irregular code has different values of ωc and ωr for some or all of its columns and rows.
Matrix H is represented graphically by a Tanner graph [27], as shown in Fig. 2.10. The circles are known as check nodes; each represents a parity check equation, and hence a row of H. The squares are called variable nodes and represent the columns of H. Column positions coincide with bit positions of the received codeword c'. Whenever $H_{ij} = 1$, check node i (row i) and variable node j (column j) are joined by a line, called an edge.
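As a minimal sketch (Python, names hypothetical), the edges and weights of the Tanner graph in Fig. 2.10 can be read directly from the non-zero entries of H. The row and column weights computed here are the ωr and ωc defined above; neighbour lists of this kind are also the structure a message passing decoder works along.

```python
# Parity check matrix H from Fig. 2.10a: rows are check nodes c0..c3,
# columns are variable nodes V0..V5.
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 1, 0, 1, 1, 1],
]


def tanner_edges(h):
    """Return (check node, variable node) pairs for every H[i][j] == 1."""
    return [(i, j) for i, row in enumerate(h) for j, v in enumerate(row) if v]


def weights(h):
    """Row weights (one per check node) and column weights (one per variable node)."""
    row_w = [sum(row) for row in h]
    col_w = [sum(col) for col in zip(*h)]
    return row_w, col_w


edges = tanner_edges(H)
row_w, col_w = weights(H)
print(edges)
print("row weights:", row_w)       # [3, 1, 1, 4]
print("column weights:", col_w)    # [1, 2, 1, 3, 1, 1]
is_regular = len(set(row_w)) == 1 and len(set(col_w)) == 1
print("regular:", is_regular)      # False: this small example is irregular
```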
In general, the BER performance of an LDPC code is governed by the length of its codewords as well as by the technique used to construct H. The minimum Hamming distance dmin increases as the codeword length increases [15]. Hamming distance refers to the number of bits by which two codewords differ; increasing this distance improves the error correction capability of a code. Constructing H to be as random as possible delivers good BER results [15], but increases decoding complexity, because a great deal of information about an unstructured H must be stored in memory. Using a structured code such as quasi-cyclic (QC) LDPC lowers decoding complexity, but can reduce BER performance [28].
Another important property of H that limits decoder performance is girth [27]. The girth is the length of the shortest cycle in the Tanner graph: starting at any check node or variable node, it is the minimum number of edges that must be traversed to return to the starting node without traversing any edge more than once. Such a cycle is indicated by the dashed lines in Fig. 2.10b. Starting at V1, the path V1, C0, V3, C3 and back to V1 traverses 4 edges, giving this code a girth of 4. This cycle can also be seen in H: whenever two different columns both have non-zero elements in the same two rows, a length-4 cycle is present. This is indicated by the bold ones in Fig. 2.10a. The presence of length-4 cycles severely degrades decoder performance, so that more iterations are needed to find the correct codeword [27].

$$H = \begin{bmatrix} 1 & \mathbf{1} & 0 & \mathbf{1} & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & \mathbf{1} & 0 & \mathbf{1} & 1 & 1 \end{bmatrix}$$

(a) Parity check matrix of the Tanner graph in Fig. 2.10b. Rows correspond to check nodes c0 to c3 and columns to variable nodes V0 to V5.
(b) Tanner graph of the parity check matrix in Fig. 2.10a, with check nodes C0 to C3 and variable nodes V0 to V5 (graph not reproduced).
Figure 2.10: Visual representation of a parity check matrix.
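The length-4 test described above (two columns with ones in the same two rows) can be sketched as follows, assuming the H of Fig. 2.10a. This is only a check for the shortest possible cycles, not a general girth computation, and the function name is hypothetical.

```python
from itertools import combinations

# H from Fig. 2.10a (rows c0..c3, columns V0..V5).
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 1, 0, 1, 1, 1],
]


def length4_cycles(h):
    """List ((column a, column b), (row r1, row r2)) for every length-4 cycle."""
    cols = [{i for i, row in enumerate(h) if row[j]} for j in range(len(h[0]))]
    cycles = []
    for a, b in combinations(range(len(cols)), 2):
        shared = sorted(cols[a] & cols[b])      # rows where both columns hold a 1
        for r1, r2 in combinations(shared, 2):
            cycles.append(((a, b), (r1, r2)))
    return cycles


print(length4_cycles(H))   # [((1, 3), (0, 3))]: the cycle V1-c0-V3-c3-V1
```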
2.5.3.1 Decoding
An LDPC decoder computes the syndrome $s = c'H^T$, where c' is the received codeword with errors. Only when $s \neq 0$ is a decoding cycle started to correct the errors in c'. Note that the syndrome is used here only as an error detection method. The best BER performance is achieved with an iterative message passing algorithm (MPA) [1]. The messages are either log-likelihood values or probability values, exchanged between check nodes and variable nodes during an iteration. The MPA decoder's architecture imitates the structure of a Tanner graph [29]. An iteration begins with each variable node passing a message to its connected check nodes. This is followed by each check node passing a message to all its connected variable nodes. Note that messages travel along the edges of the Tanner graph between connected nodes. The messages arriving at each variable node $v_j$, for $0 \le j \le n-1$, are then used to modify the corresponding bit $c'_j$ of codeword c' to form $c_{new}$. The final step in an iteration recomputes the syndrome using $s = c_{new}H^T$. Decoding stops when s = 0; otherwise a new iteration is started. This process continues until s = 0 or until a predefined maximum number of iterations is reached.
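The text does not fix a particular message update rule at this point, so the sketch below assumes the widely used min-sum approximation with log-likelihood ratios (positive values favour a 0) and a flooding schedule. It is a minimal, unoptimised illustration, run on the (7,4) Hamming H of Eq. 2.5.9 only because that matrix is small; it is not itself low density. The function name and parameters are hypothetical.

```python
def min_sum_decode(h, llr, max_iter=20):
    """Flooding min-sum decoding of a binary code with parity check matrix h.

    llr[j] is the channel log-likelihood ratio of bit j; positive values
    favour a 0 (assumed convention). Every check node needs degree >= 2.
    """
    m, n = len(h), len(h[0])
    checks = [[j for j in range(n) if h[i][j]] for i in range(m)]
    assert all(len(nodes) >= 2 for nodes in checks)
    c2v = {(i, j): 0.0 for i in range(m) for j in checks[i]}

    def syndrome_ok(bits):
        return all(sum(bits[j] for j in checks[i]) % 2 == 0 for i in range(m))

    def posterior():
        total = list(llr)
        for (_, j), msg in c2v.items():
            total[j] += msg
        return total

    hard = [0 if value >= 0 else 1 for value in llr]
    for _ in range(max_iter):
        if syndrome_ok(hard):            # s = 0: stop decoding
            return hard, True
        total = posterior()
        # Variable-to-check: everything except the target check's own message.
        v2c = {edge: total[edge[1]] - msg for edge, msg in c2v.items()}
        # Check-to-variable: sign product and smallest magnitude of the others.
        for i in range(m):
            for j in checks[i]:
                others = [v2c[(i, k)] for k in checks[i] if k != j]
                negatives = sum(1 for x in others if x < 0)
                sign = -1.0 if negatives % 2 else 1.0
                c2v[(i, j)] = sign * min(abs(x) for x in others)
        hard = [0 if value >= 0 else 1 for value in posterior()]   # c_new
    return hard, syndrome_ok(hard)


# Illustration with the (7,4) Hamming H of Eq. 2.5.9 (rows are s0, s1, s2).
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]
received_llr = [-0.5, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]   # all-zero word, bit 0 weakly wrong
print(min_sum_decode(H, received_llr))                 # ([0, 0, 0, 0, 0, 0, 0], True)
```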
2.5.3.2 Encoding
Encoding can be done using a generator matrix G. Although H is sparse, matrix G is not necessarily sparse [15]. Encoding by multiplication with G might therefore have a time complexity of $O(n^2)$, with n the length of a codeword. However, another technique exists that lowers this complexity to $O(n)$ [30]. Section 3.4.3 provides more detail about this technique.
2.6 Wireless Channels
Various parameters are associated with a wireless communications channel.
The most important to consider in a design are noise sources, multipath effects,
transmit power and sources of signal attenuation. Taking these into account,
a few concepts will now be explained that are necessary for error probability
analysis of data sent over a wireless channel.
2.6.1 Link Margin
A link budget calculation is the first step in designing any wireless communications system. Important decisions regarding transmitter power and receiver sensitivity are made at this stage. Fig. 2.11 shows a typical setup of a transmitter and receiver communicating over a wireless channel.
[Figure: transmitter A (message signal) → B (power amplifier, GPA_TX) → C (cable losses, LCABLE_TX) → D (antenna gain, GANT_TX) → channel (free space loss, LPATH) → E (antenna gain, GANT_RX) → F (cable losses, LCABLE_RX) → G (receiver).]
Figure 2.11: A transmitter and receiver communicating over a wireless channel.
The transmitter sends a signal from A to amplifier B. The signal continues through cable C, where some power is lost, after which it reaches antenna D. Depending on how directional the antenna is, more gain is added to the transmit path. After losing most of its power over the channel, the signal reaches a receiving station. The antenna at E also adds some gain in the receive path. After experiencing more loss through RF cabling at F, the signal reaches the receiver at G. The signal generated at A has unity power. Expressing all losses and gains in decibels (dB), the signal power in dB reaching G can be expressed as:
$$P_G = G_{\mathrm{PA\_TX}} - L_{\mathrm{CABLE\_TX}} + G_{\mathrm{ANT\_TX}} - L_{\mathrm{PATH}} + G_{\mathrm{ANT\_RX}} - L_{\mathrm{CABLE\_RX}} \tag{2.6.1}$$
A receiver typically has a lower bound on acceptable input signal levels, known as its sensitivity [31]. This value specifically accounts for thermal noise from the antenna and noise added by each amplifier in front of the receiver. Sensitivity thus specifies the minimum acceptable power level of a received signal after the antenna. Signals below this value will disappear into the noise floor of the receiver. Assuming that $P_G$ given in Eq. 2.6.1 is greater than this sensitivity, the following term is formed:
$$P_{\mathrm{LINK}} = P_G - P_{\mathrm{SENSITIVITY},G} \tag{2.6.2}$$
The term $P_{\mathrm{LINK}}$ is known as the link margin [32]. Treating the receiver's sensitivity as its noise floor, $P_{\mathrm{LINK}}$ can be interpreted as a signal-to-noise ratio (SNR). This SNR forms the lower bound on the SNR at which an FEC code must be able to deliver a low BER.
2.6.2 Channel Error Probability Model
A channel model mathematically describes the effects of disturbances such as noise on a transmitted signal [15]. Since these disturbances affect random segments of the transmitted information, a statistical model is applied to each transmitted bit. The model assigns each bit a probability of being received correctly or incorrectly. In Fig. 2.12, the channel model encapsulates the modem and RF components as well as the wireless channel. The FEC encoder inputs bits into this model. Bits are then flipped according to the chosen statistical model, after which they are output to the FEC decoder. A few channel models are considered below.
[Figure: FEC Encoder → Modulator and RF → Wireless Channel → Demodulator and RF → FEC Decoder, with the channel model enclosing the modulator, wireless channel and demodulator blocks.]
Figure 2.12: Section of a communications channel included in a data error probability model.
2.6.2.1 Binary Symmetric Channel
A binary symmetric channel (BSC) is shown in Fig. 2.13. Transmitted bits move from left to right along the arrowed lines. Each transmitted bit has probability Pe of being changed. This is known as the crossover probability and is represented by the diagonal lines. The probability of successful transmission is therefore Ps = 1 − Pe, as indicated by the horizontal lines. The error probabilities for a 1 and a 0 are the same. The model assumes that Pe is constant for a given channel.
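[Figure: inputs 0 and 1 on the left connect to outputs 0 and 1 on the right; the horizontal branches have probability 1 − Pe and the diagonal (crossover) branches have probability Pe.]
Figure 2.13: A BSC model.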
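A BSC is straightforward to simulate. The minimal sketch below, with a hypothetical function name and a fixed seed for repeatability, flips each bit independently with crossover probability Pe; the measured flip rate should approach Pe for long sequences.

```python
import random


def bsc(bits, pe, rng):
    """Binary symmetric channel: flip each bit independently with probability pe."""
    return [bit ^ (rng.random() < pe) for bit in bits]


rng = random.Random(2012)                  # fixed seed for repeatability
sent = [0] * 100_000
received = bsc(sent, 0.01, rng)
print(sum(received) / len(received))       # close to the crossover probability 0.01
```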
2.6.2.2 Binary Erasure Channel
The binary erasure channel (BEC) allows bits to be received either correctly or as unknown. The demodulator marks a bit as erased if it is unsure whether a 1 or a 0 has been received. Erasures are marked as E in Fig. 2.14, and the probability of an erasure is denoted Pe. As with the BSC, the BEC model assumes a constant Pe for a channel. The SDR used in this project outputs demodulated data according to the phase difference between successive received QPSK symbols, and never marks demodulated data as unknown. Hence the BEC model will not be used in this thesis.
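[Figure: inputs 0 and 1 are received unchanged with probability 1 − Pe, or mapped to the erasure symbol E with probability Pe.]
Figure 2.14: A BEC model.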
2.6.2.3 Additive White Gaussian Noise Channel
Noise in devices such as antennas is typically thermal noise, whose behaviour is modelled by a Gaussian random process [33]. An additive white Gaussian noise (AWGN) channel model draws its noise values from a Gaussian random process [34]. A BSC and the AWGN model can be combined [15]: the BSC allows a bit to be in only one of two states, while the AWGN model provides an accurate description of the channel noise. This hybrid model derives Pe of the BSC from a Gaussian probability density function (PDF) with standard deviation σ. It is called a binary AWGN (BI-AWGN) channel [34] and will be used in the rest of this thesis.
In Fig. 2.15, the BI-AWGN channel maps bit values 1 and 0 to values 1 and −1 respectively. A Gaussian PDF with standard deviation σ is centred on each of these values. At the thick vertical line between the PDFs, the demodulator decides between a 1 and a 0. If enough noise is added to a transmitted −1 for it to cross the decision point, the demodulator interprets it as a 1. The converse is also true.
[Figure: Gaussian PDFs with standard deviation σ centred on 1 and −1, separated by the decision point.]
Figure 2.15: A BI-AWGN channel model.
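The sketch below illustrates the BI-AWGN model with the mapping of Fig. 2.15 and a hard decision at 0. For this ±1 mapping, the crossover probability of the equivalent BSC is $Q(1/\sigma)$, which the simulated error rate should approach. Function names and the seed are illustrative assumptions.

```python
import math
import random


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bi_awgn(bits, sigma, rng):
    """Map 1 -> +1 and 0 -> -1 (as in Fig. 2.15) and add Gaussian noise."""
    return [(1.0 if bit else -1.0) + rng.gauss(0.0, sigma) for bit in bits]


def hard_decision(samples):
    """Decision point at 0: positive samples become 1, the rest 0."""
    return [1 if y > 0 else 0 for y in samples]


rng = random.Random(7)
sigma = 1.0
sent = [rng.randint(0, 1) for _ in range(100_000)]
received = hard_decision(bi_awgn(sent, sigma, rng))
errors = sum(s != r for s, r in zip(sent, received))
print(errors / len(sent), q_function(1.0 / sigma))   # simulated vs. expected crossover
```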
2.7 Summary
A description of the interaction between the satellite and ground station platforms has been presented. Previously implemented components of the IS-HS 2 have also been mentioned. By using FEC, the time-limited communications window of a satellite pass can be utilised more efficiently. Linear block codes spread k information bits over a codeword of n bits, adding n − k redundant bits so that errors introduced by noise can be corrected. A BCH code uses polynomial principles to achieve this, which are simple to implement in hardware. It provides good error correcting performance and a choice of the number of correctable errors in a codeword. By using the iterative Berlekamp-Massey (BM) decoding technique, a BCH decoder's hardware complexity can also be kept to a minimum. The LDPC code has been shown to outperform BCH [1] when a random construction technique is used for H. A structured QC-LDPC code can provide the same performance as random constructions, provided H has been properly designed. More importantly, a structured code keeps hardware complexity relatively low.
It has also been shown that the first step in implementing FEC is to determine a suitable codeword length through error probability analysis. This calculation requires parameters from the testing scenario, such as transmit power and receiver sensitivity. Finally, an overview has been given of various IPC schemes, which allow interaction between adjacent protocol layers of the OSI model. Synchronisation is clearly important when two processes share information. Shared memory is the fastest IPC medium, but does not by itself coordinate concurrent access to the data. However, when combined with semaphores or signals, for example, shared memory can provide both fast and synchronised communication between two processes.

Determination of a suitable block length for FEC is covered in Chapter 3, together with the details of encoding and decoding for both BCH and LDPC. A suitable IPC scheme is also chosen there, based on the findings of this chapter.
Chapter 3
Detail Design
This chapter describes the design of the components introduced in Chapter 2 in more detail. Referring to Fig. 2.1, a bottom-up approach is followed. First, an error probability analysis is performed for the wireless channel at D. This indicates which FEC codeword length to use for a given BER constraint. The terms codeword length and block length are used interchangeably in this thesis. The other modules, including B, A, F and G, are designed around this chosen length: the channel coding modules are designed first, after which the software protocol layers are handled. Finally, a block error rate (BLER) simulation application for BCH and LDPC is designed, and simulation strategies are discussed for both Matlab and FPGA platforms.
3.1 Block Error Probability Analysis
This section calculates a QPSK receiver's block error probability (BLEP) when using the BI-AWGN wireless channel model. Derivations of the bit error probabilities and block error probabilities for such a receiver are given in Appendix A.1. This section highlights some of these results and uses them to determine a suitable FEC block length for this implementation.
Fig. 3.1a shows a QPSK signal constellation with four two-bit symbols, S1 to S4. The symbols are indicated as black dots on the quadrature (Q) and in-phase (I) axes. Each symbol differs by π/2 radians in phase from its neighbours. Phase noise added to a received symbol causes it to deviate from its position in Fig. 3.1a. Should a symbol cross the dashed decision boundaries due to phase noise, the modem will make an error. For example, if the phase of symbol S1 deviates into the grey area of Fig. 3.1b, S1 will be mistaken for another symbol. Since QPSK is a phase modulation scheme, only phase noise is considered.
Suppose symbol S1, having amplitude A, has been received with some noise, so that it is now positioned at Snoise in Fig. 3.2. The Gaussian noise is represented by n(t), which has both an in-phase component, ni, and a quadrature component, nq. Adding vectors S1 and n(t) results in Snoise, of amplitude E and phase θ. Angle θ represents the phase noise, which is of interest here.

[Figure 3.1a: QPSK signal constellation with symbols S1 to S4 (two-bit labels 00, 10, 11 and 01) on the I and Q axes; the decision boundaries lie at π/4 on either side of each symbol.]
[Figure 3.1b: error region (grey) for symbol S1 (00), lying outside its decision boundaries at ±π/4.]
Figure 3.1: QPSK symbol decision and error regions.

[Figure: S1, of amplitude A, plus the noise vector n (components nc and ns) gives Snoise, of amplitude E and phase θ; the decision boundary lies at π/4 from S1.]
Figure 3.2: Gaussian white noise added to S1.
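The phase noise θ is a random variable (it is not itself Gaussian), and its probability density function (PDF) is given by:
$$\rho_\theta(\theta) = \frac{1}{2\pi} e^{-\frac{A^2}{2\sigma^2}} \left[ 1 + \left(\frac{A\cos(\theta)}{\sigma}\right) \left(e^{\frac{A^2\cos^2(\theta)}{2\sigma^2}}\right) \left(\sqrt{2\pi}\right) \times \left(1 - Q\left(\frac{A\cos(\theta)}{\sigma}\right)\right) \right] \tag{3.1.1}$$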
where
$$Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-\frac{a^2}{2}}\, da \tag{3.1.2}$$
Integrating $\rho_\theta(\theta)$ over the white region in Fig. 3.1b gives the probability of successfully receiving a symbol:
$$P_{success} = \int_{-\pi/4}^{\pi/4} \rho_\theta(\theta)\, d\theta \tag{3.1.3}$$
The probability of receiving a symbol incorrectly then follows from $P_{success}$:
$$P_{error} = 1 - \int_{-\pi/4}^{\pi/4} \rho_\theta(\theta)\, d\theta \tag{3.1.4}$$
Assuming that both bits of a symbol have an equal probability of being changed by noise, the bit error probability (BEP) is:
$$P_{bep} = \frac{P_{error}}{2} \tag{3.1.5}$$
Finally, if the FEC code corrects up to t bit errors in an n-bit block, the block error probability is:
$$P_{blep} = \sum_{i=t+1}^{n} \binom{n}{i} P_{bep}^{\,i} (1 - P_{bep})^{n-i} \tag{3.1.6}$$
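Eq. 3.1.6 can be evaluated directly. The minimal sketch below (hypothetical function name) sums the binomial terms in the log domain with `math.lgamma`, an implementation choice assumed here to avoid overflowing the binomial coefficients for long blocks.

```python
import math


def block_error_prob(n, t, p_bep):
    """Eq. 3.1.6: probability that more than t of the n bits are in error."""
    if p_bep <= 0.0:
        return 0.0
    if p_bep >= 1.0:
        return 1.0 if t < n else 0.0
    log_p, log_q = math.log(p_bep), math.log1p(-p_bep)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(t + 1, n + 1):
        log_binom = log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        total += math.exp(log_binom + i * log_p + (n - i) * log_q)
    return total


# Block error probability of a 511-bit block for t = 0 .. 3 at an example BEP.
for t in range(4):
    print(t, block_error_prob(511, t, 1e-4))
```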
Eq. 3.1.6 is evaluated over a range of $E_b/N_o$ SNR values. An SNR can be expressed either as $A^2/\sigma^2$ or as $E_b/N_o$. The former is the ratio of average signal power to average noise power, whereas the latter is the ratio of energy per transmitted bit to noise power spectral density. Eq. 3.1.7 shows the relationship between these two SNRs for a QPSK receiver. Using this relationship, Eq. 3.1.1 can be written in terms of $E_b/N_o$. Relationship 3.1.7 is derived in Appendix A.2.
$$\frac{A^2}{\sigma^2} = \frac{2E_b}{N_o} \tag{3.1.7}$$
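The chain from Eq. 3.1.1 to Eq. 3.1.5 can be sketched numerically as below. This is a hedged illustration rather than the implementation used for Fig. 3.4: it assumes σ in Eq. 3.1.1 is the standard deviation of each noise component, uses Eq. 3.1.7 to convert $E_b/N_o$, and integrates with Simpson's rule (an assumed choice of integrator). As a sanity check, integrating over the ±π/4 wedge should agree with the closed form $1 - (1 - Q(A/(\sqrt{2}\sigma)))^2$. Function names are hypothetical.

```python
import math


def q_func(x):
    """Eq. 3.1.2, evaluated through the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def phase_pdf(theta, a):
    """Eq. 3.1.1 with a = A/sigma.

    exp(-a^2/2) * exp(a^2 cos^2(theta)/2) is rewritten as
    exp(-a^2 sin^2(theta)/2) so that large SNRs do not overflow.
    """
    c = a * math.cos(theta)
    return (math.exp(-a * a / 2.0)
            + c * math.sqrt(2.0 * math.pi)
            * math.exp(-a * a * math.sin(theta) ** 2 / 2.0)
            * (1.0 - q_func(c))) / (2.0 * math.pi)


def simpson(f, lo, hi, steps=2000):
    """Composite Simpson's rule; steps must be even."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0


def symbol_error_prob(ebno_db):
    """Eqs. 3.1.3 and 3.1.4, with A^2/sigma^2 = 2 Eb/No from Eq. 3.1.7."""
    a = math.sqrt(2.0 * 10.0 ** (ebno_db / 10.0))
    p_success = simpson(lambda th: phase_pdf(th, a), -math.pi / 4, math.pi / 4)
    return 1.0 - p_success


def bit_error_prob(ebno_db):
    """Eq. 3.1.5: the two bits of a symbol are assumed equally likely to err."""
    return symbol_error_prob(ebno_db) / 2.0


for ebno_db in range(0, 16, 3):
    print(ebno_db, bit_error_prob(ebno_db))
```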
Using a link budget, the link margin at the satellite’s receiver is computed.
This value will determine the range of SNRs to use when evaluating Eq. 3.1.6.
Since the aircraft simulator of Section 2.2.1 will be used for testing, all link
budget calculations are done with regard to this scenario.
Work done in [7] suggests an aircraft altitude of 3 km for this simulation. An aircraft pass is illustrated in Fig. 3.3. The ground station uses a stationary patch antenna that faces towards the sky. Its beam width allows communication within an α = 120° field of view. Communication will therefore begin at elevation angle φ = 90° − α/2 = 30°. At this φ, the distance between the ground station and the aircraft is d = h/sin φ = 3 km/sin 30° = 6 km. Signal loss at this d is the highest during a pass, and therefore places a lower bound on the receiver's link margin. The parameters used in link budget Eq. 2.6.1 are summarised in Table 3.1.
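[Figure: aircraft at altitude h = 3 km, at slant distance d and elevation angle φ from the ground station, whose antenna has a field of view α.]
Figure 3.3: An aircraft passing over a ground station at altitude h = 3 km.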
Parameter     Description                         Value
GPA_TX        Ground station transmit power       30 dBm
LCABLE_TX     Ground station cable losses         0.5 dB
GANT_TX       Ground station antenna gain         2.25 dB
LPATH         Free space signal losses            115 dB
GANT_RX       Payload antenna gain                6 dB
LCABLE_RX     Payload cable losses                0.5 dB
Table 3.1: Link budget parameters for the uplink when using aircraft altitude h = 3 km and elevation angle φ = 30°.
The transmit power and free space losses for this simulation have been determined in [35] and [7]. Cables connecting the amplifiers to the antennas typically have low losses, as indicated in Table 3.1 [5]. The SAA's gain is about GANT_RX = 6 dB when steering towards the ground station [36]. The sensitivity of the receiver on the payload has been designed to be PSENSITIVITY = −93 dBm [37]. Using Eqs. 2.6.1 and 2.6.2 with the aforementioned values leads to:
$$P_{\mathrm{LINK}} = (30 - 0.5 + 2.25 - 115 + 6 - 0.5) - (-93) = 15.25 \text{ dB} \tag{3.1.8}$$
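The arithmetic of Eq. 3.1.8 can be reproduced with a few lines of Python using the values of Table 3.1; the dictionary keys and function names are illustrative.

```python
# Table 3.1 uplink parameters for h = 3 km and elevation 30 degrees.
LINK = {
    "G_PA_TX": 30.0,      # dBm
    "L_CABLE_TX": 0.5,    # dB
    "G_ANT_TX": 2.25,     # dB
    "L_PATH": 115.0,      # dB
    "G_ANT_RX": 6.0,      # dB
    "L_CABLE_RX": 0.5,    # dB
}
P_SENSITIVITY = -93.0     # dBm


def received_power(p):
    """Eq. 2.6.1: signal power at receiver input G, in dBm."""
    return (p["G_PA_TX"] - p["L_CABLE_TX"] + p["G_ANT_TX"] - p["L_PATH"]
            + p["G_ANT_RX"] - p["L_CABLE_RX"])


def link_margin(p, sensitivity):
    """Eq. 2.6.2: margin above the receiver sensitivity, in dB."""
    return received_power(p) - sensitivity


print(received_power(LINK))              # -77.75 dBm
print(link_margin(LINK, P_SENSITIVITY))  # 15.25 dB
```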
Knowing from Section 2.6.1 that $P_{\mathrm{LINK}}$ serves as the receiver's SNR, the value in Eq. 3.1.8 forms the upper bound on the range of simulated SNRs for Eq. 3.1.6. The block length of a BCH code in bits is given as follows [3]:
$$n = 2^i - 1, \quad i \ge 3 \tag{3.1.9}$$
These blocks contain frames that are used by the software protocol layers. Frames are composed of both protocol overhead and file data to be sent over the link. The overhead for this implementation is $N_{ovh} = N_{ovh\_TM} + N_{ovh\_ARQ} = 256$ bits; details of this calculation are discussed in Section 3.6. Since no file data can be fitted into a block with $n = N_{ovh}$, the length must satisfy $n > N_{ovh}$. Because $2^8 - 1 = 255$ does not exceed 256, the smallest length allowed by Eq. 3.1.9 is $n_{min} = 2^9 - 1 = 511$.
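The selection rule above amounts to a small search over Eq. 3.1.9; the helper below is a hypothetical illustration.

```python
def min_bch_length(n_overhead, i_min=3):
    """Smallest n = 2**i - 1 (Eq. 3.1.9, i >= i_min) that exceeds n_overhead bits."""
    i = i_min
    while 2 ** i - 1 <= n_overhead:
        i += 1
    return 2 ** i - 1


print(min_bch_length(256))   # 511
```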
Figure 3.4: Codeword error probability vs. SNR for BCH when using block length
n = 511 bits.
The results of evaluating Eq. 3.1.6 at different SNRs up to $P_{\mathrm{LINK}}$ are shown in Fig. 3.4, where error correction capabilities ranging from t = 0 to t = 3 are plotted for length n = 511. A block error probability of $P_{blep} = 10^{-6}$ is generally considered good. A BCH code correcting up to t = 3 bit errors clearly meets this target between 11 dB and 15 dB SNR. Using the lookup table of standard BCH codes in [3], an (n, k) = (511, 484) BCH code with t = 3 error correction capability is chosen.
3.2 Channel Coding Design
Work done in [1] suggests that these modules implement standards from the Consultative Committee for Space Data Systems (CCSDS). Specifically, the channel coding standard of the Telemetry (TM) Space Data Link Protocol is implemented. Both the ground station and satellite platforms will contain the following modules:
• A 16-bit cyclic redundancy check (CRC).
• Forward error correction (FEC).
• A pseudo randomiser.
• An attached synchronisation marker (ASM).
[Figure: on the ground station (FIT-PC and FPGA), frame $x_k$ passes through CRC 16 Bit Add ($x'_k$), FEC Encode ($c_n$), Pseudo Randomiser ($c'_n$) and ASM Add ($c'_{n+32}$) before the modem and RF stage. On the satellite (modem and RF, FPGA and SH4 OBC), the received $c^*_{n+32}$ passes through ASM Remove ($c^*_n$), Pseudo Randomiser ($c''_n$), FEC Decode ($x''_k$) and CRC 16 Bit Remove ($x_k$).]
Figure 3.5: Interaction between channel coding modules on the FPGA for both ground station and satellite platforms.
Fig. 3.5 identifies the data vectors entering or exiting each module using different symbols. The subscript of each symbol indicates how many bits the vector contains after being processed by a specific module; for example, vector $x_k$ contains k bits. These symbols are used in the following discussion of each module.
3.2.1 Cyclic Redundancy Check
A CRC code is used for error detection. It is highly effective at detecting both random and burst errors in a set of binary data [3]. If FEC fails to correct all errors in a codeword, the CRC will, with high probability, detect the remaining errors so that the faulty data can be rejected. Like FEC, CRC adds information to a message at the transmitter and removes this information at the receiver.
[Figure: CRC encoding steps shown in binary and polynomial views.
i) Message and generator polynomials: g = [1, 1], g(X) = X + 1; m = [1, 1, 1], m(X) = X² + X + 1.
ii) Zero pad message: m′(X) = Xm(X) = X³ + X² + X + 0, m′ = [1, 1, 1, 0].
iii) Find remainder after division: dividing X³ + X² + X by X + 1 gives quotient X² + 1 and remainder r(X) = 1 (in binary, 1110 divided by 11 leaves remainder 1).
iv) Add remainder to zero padded message: m′′(X) = m′(X) + r(X) = X³ + X² + X + 1, m′′ = m′ + r = [1, 1, 1, 1].]
Figure 3.6: CRC encoding procedure.
Like the BCH encoding procedure, CRC views the message to be encoded as a polynomial m(X). First, a string of h zero bits is appended to m(X), which is equivalent to forming $X^h m(X)$. The value of h equals the order of the CRC's generator polynomial g(X). Using long division, polynomial $X^h m(X)$ is divided by g(X) until remainder r(X) is obtained. This remainder is subtracted modulo-2 from $X^h m(X)$, so that $m'(X) = X^h m(X) - r(X)$ is divisible by g(X) with zero remainder. Fig. 3.6 shows this encoding procedure from both the binary and polynomial perspectives.
The ground station PC sends a k-bit frame, $x_k$, to the channel coding section in Fig. 3.5. This frame implements the TM protocol standard from ECSS and is discussed in Section 3.6.1. Under this standard, $x_k$ already contains the trailing zeros required for the CRC, so the CRC-encoded message $x'_k$ is also k bits long. After FEC decoding at the receiver, $x''_k$ is divided by g(X). Message $x''_k$ is accepted as error free when the remainder r(X) is zero; otherwise $x''_k$ contains errors.
The CRC generator polynomial used in this design is given in Eq. 3.2.1. This polynomial has been specifically adopted by the CCSDS for its low probability of undetected errors [19]. The 16-bit CRC field in frame $x_k$ is filled in using this polynomial.
$$g(X) = X^{16} + X^{12} + X^5 + 1 \tag{3.2.1}$$
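The division-based CRC procedure can be sketched in Python as follows, with polynomials packed into integers (bit i holds the coefficient of $X^i$). It reproduces the Fig. 3.6 example and then applies the generator of Eq. 3.2.1. This sketch performs plain polynomial division only; a standards-conformant CCSDS implementation may additionally preset the shift register, which changes the resulting CRC values. Without such a preset, the remainder for the ASCII string "123456789" is 0x31C3, the check value catalogued for the CRC-16/XMODEM variant. Function names are hypothetical.

```python
def poly_mod2_remainder(dividend, divisor):
    """Remainder of GF(2) polynomial long division (bit i = coefficient of X^i)."""
    d_len = divisor.bit_length()
    while dividend.bit_length() >= d_len:
        dividend ^= divisor << (dividend.bit_length() - d_len)
    return dividend


def crc_encode(message, generator):
    """Form X^h m(X) and add the remainder so that g(X) divides the result."""
    h = generator.bit_length() - 1
    padded = message << h                       # append h zero bits
    return padded ^ poly_mod2_remainder(padded, generator)


def crc_check(codeword, generator):
    """True when the received word leaves a zero remainder."""
    return poly_mod2_remainder(codeword, generator) == 0


# Fig. 3.6 example: m = [1, 1, 1], g(X) = X + 1, giving m'' = [1, 1, 1, 1].
print(bin(crc_encode(0b111, 0b11)))             # 0b1111

# Eq. 3.2.1: g(X) = X^16 + X^12 + X^5 + 1
CRC16 = (1 << 16) | (1 << 12) | (1 << 5) | 1
frame = int.from_bytes(b"123456789", "big")
print(hex(poly_mod2_remainder(frame << 16, CRC16)))   # 0x31c3
encoded = crc_encode(frame, CRC16)
print(crc_check(encoded, CRC16), crc_check(encoded ^ (1 << 40), CRC16))  # True False
```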
3.2.2 Forward Error Correction
Both BCH and LDPC will be implemented. The ground station adds (n − k) redundant bits to $x'_k$ to create the n-bit codeword $c_n$. The satellite's FEC decoder attempts to correct all errors, after which it strips the redundancy to produce the error-free message $x''_k$. Details regarding BCH and LDPC are provided later in this chapter.
3.2.3 Pseudo Randomiser
Randomising the codeword ensures that it contains sufficient bit transitions. This allows optimal utilisation of bandwidth for the given baud rate of the system [19], since it prevents the transmitted power from being concentrated in a narrow bandwidth due to long sequences of ones or zeros in $c_n$. In addition, the SDR modem uses a timing error detector to update how many symbol samples to wait for before the next symbol starts. This timing error detector requires frequent symbol transitions to work properly.
A pseudo-random sequence is generated using a linear feedback shift register (LFSR). Its output sequence is determined by polynomial g(X), as specified by the CCSDS [19]:
$$g(X) = X^8 + X^7 + X^5 + X^3 + 1 \tag{3.2.2}$$
Fig. 3.7 shows the 8-bit LFSR's output being XORed with the FEC encoder's output to produce a randomised sequence of data. Since the data is only XORed with this sequence, $c'_n$ has the same length as $c_n$ in Fig. 3.5.
The LFSR's output at x1 is also fed back to its input at x8, and is XORed with other LFSR bits along the way. Polynomial g(X) determines which LFSR bits are XORed with output x1. It also determines the period of the sequence, i.e. how many bits the LFSR produces before the sequence repeats itself. This design's sequence repeats after 255 bits, the maximum of $2^8 - 1$ for an 8-bit register.
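[Figure: 8-bit shift register with stages x8 to x1, initialised to ones on reset. The LFSR output at x1 is XORed with the FEC data to give the randomised data, and is fed back to x8 through XOR operations with other register bits.]
Figure 3.7: Pseudo random sequence LFSR.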
At the receiver in Fig. 3.5, codeword $c^*_n$ is XORed with the same pseudo-random sequence used at the transmitter. This removes the randomisation that was applied to $c_n$.
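The randomiser can be sketched as below. Rather than modelling the individual register stages of Fig. 3.7, the sketch assumes the equivalent linear recurrence a[k+8] = a[k+7] XOR a[k+5] XOR a[k+3] XOR a[k], whose characteristic polynomial is Eq. 3.2.2, seeded with eight ones. Under that assumption it produces a sequence beginning 0xFF, 0x48, 0x0E that repeats after 255 bits. Because XOR is its own inverse, the same function also de-randomises. Function names are hypothetical.

```python
def ccsds_pn_bits(length):
    """Pseudo-random bits for g(X) = X^8 + X^7 + X^5 + X^3 + 1, seeded with ones.

    Written as the recurrence a[k+8] = a[k+7] ^ a[k+5] ^ a[k+3] ^ a[k].
    """
    a = [1] * 8                          # register initialised to all ones
    while len(a) < length:
        k = len(a) - 8
        a.append(a[k + 7] ^ a[k + 5] ^ a[k + 3] ^ a[k])
    return a[:length]


def randomise(bits):
    """XOR data with the pseudo-random sequence; applying it twice restores the data."""
    return [b ^ p for b, p in zip(bits, ccsds_pn_bits(len(bits)))]


def to_bytes(bits):
    """Pack a bit list (length a multiple of 8, MSB first) into bytes."""
    return bytes(int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, len(bits), 8))


pn = ccsds_pn_bits(510)
print(to_bytes(pn[:24]).hex())           # ff480e
print(pn[:255] == pn[255:510])           # True: the sequence repeats after 255 bits
```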
3.2.4 Attached Synchronisation Marker
The attached synchronisation marker (ASM) synchronises the randomiser, FEC and CRC at the receiver with the start of a new codeword. This 32-bit sequence precedes the randomised codeword $c'_n$ to produce $c'_{n+32}$ in Fig. 3.5. At the receiver, the ASM is removed before $c^*_n$ enters the de-randomiser. The sequence used in this design is shown below in hexadecimal and binary format, where the rightmost bit represents the least significant bit (LSB):
1ACFFC1D (hexadecimal) = 0001 1010 1100 1111 1111 1100 0001 1101 (binary)
This specific sequence contains sufficient bit transitions and, according to the
CCSDS, can also be used by the receiver’s SDR for symbol synchronisation
purposes [38]. However, in this project the ASM will only be used to identify
the start of a codeword.
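A receiver might locate the ASM by sliding a 32-bit window over the received bits, as in the minimal sketch below. The names are hypothetical, and the `max_errors` tolerance is an added assumption for noisy links, not something specified above.

```python
ASM = 0x1ACFFC1D
ASM_BITS = [(ASM >> (31 - i)) & 1 for i in range(32)]   # most significant bit first


def find_asm(bits, max_errors=0):
    """Index of the first 32-bit window within max_errors bit errors of the ASM."""
    for start in range(len(bits) - 31):
        window = bits[start:start + 32]
        errors = sum(a != b for a, b in zip(window, ASM_BITS))
        if errors <= max_errors:
            return start
    return None


def strip_asm(bits, n):
    """Return the n-bit codeword that follows the first ASM, or None."""
    start = find_asm(bits)
    if start is None or start + 32 + n > len(bits):
        return None
    return bits[start + 32:start + 32 + n]


stream = [0, 1, 1, 0, 1] + ASM_BITS + [1, 0, 0, 1]
print(find_asm(stream), strip_asm(stream, 4))   # 5 [1, 0, 0, 1]
```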
3.3 BCH FEC Design
This section discusses the encoding and decoding techniques for the (511, 484) BCH code of Section 3.1. Note that BCH makes extensive use of Galois field (GF) arithmetic. A Galois field is a finite set of unique elements [3] that is closed under addition and multiplication: adding or multiplying any two or more elements of the field results in another element of the same field.
A Galois field with $q^m$ elements, where q is a prime, is denoted $GF(q^m)$. The prime q indicates which alphabet is used. For a binary alphabet, q = 2 and the field consists of the $2^m$ m-bit binary numbers. The (511, 484) BCH code uses $GF(2^9)$.
Finally, there exists an m-bit element α in $GF(2^m)$, called a primitive element, such that successive powers of α generate $2^m - 1$ unique elements of the field. The field's $2^m$ unique elements are therefore $0, 1, \alpha, \alpha^2, \cdots, \alpha^{2^m-2}$.
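The field $GF(2^9)$ used by the (511, 484) code can be generated by repeatedly multiplying by α, as in the sketch below. It assumes the primitive polynomial $X^9 + X^4 + 1$, a commonly tabulated choice; the text does not state which primitive polynomial the implementation uses. Addition in $GF(2^m)$ is bitwise XOR, and multiplication uses the log and antilog tables. Function names are hypothetical.

```python
def gf_tables(m, prim_poly):
    """Antilog (exp) and log tables of GF(2^m), generated by alpha = X."""
    size = (1 << m) - 1
    exp = [0] * (2 * size)
    log = [0] * (1 << m)             # log[0] is unused
    x = 1
    for i in range(size):
        exp[i] = x
        log[x] = i
        x <<= 1                      # multiply by alpha
        if x >> m:                   # reduce modulo the primitive polynomial
            x ^= prim_poly
    for i in range(size, 2 * size):
        exp[i] = exp[i - size]       # duplicate so exp[log a + log b] needs no modulo
    return exp, log


def gf_mul(a, b, exp, log):
    """Multiply two field elements through their logarithms."""
    if a == 0 or b == 0:
        return 0
    return exp[log[a] + log[b]]


# GF(2^9) with the assumed primitive polynomial X^9 + X^4 + 1.
EXP, LOG = gf_tables(9, (1 << 9) | (1 << 4) | 1)
print(len(set(EXP[:511])))           # 511 distinct non-zero elements
```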
3.3.1 Encoder
A generator polynomial g(X) is used to create a codeword, as mentioned in Section 2.5.2. The generator polynomial of the (511, 484) code, obtained from lookup tables in either Matlab or [3], is:
$$g(X) = X^{27} + X^{26} + X^{24} + X^{22} + X^{21} + X^{16} + X^{13} + X^{11} + X^9 + X^8 + X^6 + X^5 + X^4 + X^3 + 1 \tag{3.3.1}$$
Encoding is performed similarly to the CRC encoding in Fig. 3.6. The (511, 484) code appends 27 zeros to the 484-bit message m(X) to form a 511-bit vector m′(X). Using long division, m′(X) is divided by g(X) until a remainder r(X) of order less than 27 is found. This r(X) is added modulo-2 to m′(X) to form the codeword c(X), which is divisible by g(X).
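The encoding steps above can be sketched in Python with the generator polynomial of Eq. 3.3.1 packed into an integer (bit e set for each term $X^e$). This is a minimal illustration with hypothetical names; a hardware encoder would typically perform the same division with a shift register.

```python
import random

# Eq. 3.3.1, with bit e set for each term X^e.
G_511_484 = sum(1 << e for e in (27, 26, 24, 22, 21, 16, 13, 11, 9, 8, 6, 5, 4, 3, 0))
N, K = 511, 484


def poly_mod2_remainder(dividend, divisor):
    """Remainder of GF(2) polynomial long division."""
    d_len = divisor.bit_length()
    while dividend.bit_length() >= d_len:
        dividend ^= divisor << (dividend.bit_length() - d_len)
    return dividend


def bch_encode(message):
    """Systematic encoding: c(X) = X^27 m(X) + (X^27 m(X) mod g(X))."""
    assert message.bit_length() <= K
    shifted = message << (N - K)                 # append 27 zeros
    return shifted ^ poly_mod2_remainder(shifted, G_511_484)


msg = random.Random(7).getrandbits(K)
cw = bch_encode(msg)
print(poly_mod2_remainder(cw, G_511_484) == 0)   # True: divisible by g(X)
print(cw >> (N - K) == msg)                      # True: message bits unchanged
```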
Since the (511, 484) code can correct $t = 3$ bit errors, the above polynomial has a special property: it has the GF elements $\alpha^i$ for $i = 1, 2, \ldots, 2t$ as its roots, such that $g(\alpha^i) = 0$ [3]. Furthermore, $g(X)$ is the least common multiple (LCM) of the minimal polynomials $\phi_i(X)$ for $i = 1, 2, \ldots, 2t$, several of which may coincide. A minimal polynomial is irreducible (it cannot be factored further) and has $\alpha^i$ as a root, such that $\phi_i(\alpha^i) = 0$ [3]. Polynomial $g(X)$ can therefore be written as:
$$ g(X) = \mathrm{LCM}\{\phi_1(X), \phi_2(X), \ldots, \phi_{2t}(X)\} \tag{3.3.2} $$
These minimal polynomials are needed to calculate the syndromes in the decoder, as discussed below.
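The degree of $g(X)$ in Eq. 3.3.2 can be cross-checked by counting roots. For a binary code, the roots of $\phi_i(X)$ are the conjugates $\alpha^{i}, \alpha^{2i}, \alpha^{4i}, \ldots$ (exponents modulo $2^m - 1$), so $\phi_2 = \phi_4 = \phi_1$ and $\phi_6 = \phi_3$. The sketch below, with hypothetical names, collects these exponent sets for $t = 3$ and recovers $\deg g(X) = n - k = 27$.

```python
N = 511   # 2^9 - 1
T = 3     # correctable errors


def cyclotomic_coset(i: int, n: int = N) -> frozenset:
    """Exponents {i, 2i, 4i, ...} mod n; alpha^e for e in the set are the roots of phi_i(X)."""
    coset, e = set(), i % n
    while e not in coset:
        coset.add(e)
        e = (2 * e) % n
    return frozenset(coset)


# phi_1 ... phi_2t with duplicates removed, as the LCM in Eq. 3.3.2 does
distinct = {cyclotomic_coset(i) for i in range(1, 2 * T + 1)}
deg_g = sum(len(c) for c in distinct)   # degree of g(X) = number of distinct roots
```

Three distinct minimal polynomials of degree 9 remain, which accounts for the 27 parity bits of the code.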
3.3.2 Decoder
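Section 2.5.1 mentioned the use of parity check matrix $H$ to compute syndrome $s$. For BCH codes, matrix $H$ is defined as [3]:
$$ H = \begin{bmatrix} 1 & \alpha & \alpha^2 & \cdots & \alpha^{n-1} \\ 1 & \alpha^2 & (\alpha^2)^2 & \cdots & (\alpha^2)^{n-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \alpha^{2t} & (\alpha^{2t})^2 & \cdots & (\alpha^{2t})^{n-1} \end{bmatrix} \tag{3.3.3} $$
Note that $H$ has only $2t$ rows, which is not equal to the number of parity bits present in a codeword. This is because each entry of $H$ is an $m$-bit GF element rather than a single bit, which allows $H$ to have fewer than $(n - k)$ rows.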
The decoder receives a word $c^*$ which may contain errors. Eq. 3.3.4 writes this in terms of the correct codeword $c$ and an error vector $e$. A BCH decoder must find $e$ in order to correct the errors in $c^*$.
$$ c^* = c + e \tag{3.3.4} $$
Using Eq. 2.5.9 and assuming $e = 0$ leads to:
$$ s = c^* H^T = 0 \tag{3.3.5} $$
Writing $c^*$ in polynomial form, row $i$ of $H$ evaluates $c^*(X)$ at $\alpha^i$, so the above equation implies:
$$ s_i = c^*(\alpha^i), \quad i = 1, 2, \ldots, 2t \tag{3.3.6} $$
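As mentioned earlier, $g(\alpha^i) = 0$. Combining Eqs. 3.3.4, 3.3.6 and 2.5.12 gives:
$$ \begin{aligned} s_i &= c^*(\alpha^i) \\ &= c(\alpha^i) + e(\alpha^i) \\ &= d(\alpha^i)g(\alpha^i) + e(\alpha^i) \\ &= e(\alpha^i) \end{aligned} \tag{3.3.7} $$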
Eq. 3.3.7 does not imply that $e(X)$ is the remainder after dividing $c^*(X)$ by $g(X)$, since the degree of $e(X)$ could be greater than that of $g(X)$. By using property 3.3.2 and Eqs. 3.3.4 and 2.5.12, the following is obtained:
$$ \begin{aligned} c^*(X) &= c(X) + e(X) \\ &= d(X)g(X) + e(X) \\ &= a_i(X)\phi_i(X) + b_i(X), \quad i = 1, 2, \ldots, 2t \end{aligned} \tag{3.3.8} $$
where $b_i(X)$ is the remainder after dividing $c^*(X)$ by $\phi_i(X)$. Evaluating Eq. 3.3.8 at $\alpha^i$ gives $s_i = c^*(\alpha^i) = b_i(\alpha^i)$, since $\phi_i(\alpha^i) = 0$. Because $b_i(X)$ has a degree of less than $m$, this evaluation is much cheaper than evaluating $c^*(X)$ directly. Now that the syndromes can be computed, the error locations in $c^*$ need to be identified. Since BCH corrects up to $t$ random errors, the error vector $e(X)$ is written as [3]:
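$$ e(X) = X^{j_1} + X^{j_2} + \cdots + X^{j_v} \tag{3.3.9} $$
where $j_1, \ldots, j_v$ are the (random) error positions in codeword $c^*$ and $v \le t$. Using Eqs. 3.3.7 and 3.3.9 gives:
$$ \begin{aligned} s_i &= (\alpha^i)^{j_1} + (\alpha^i)^{j_2} + \cdots + (\alpha^i)^{j_v} \\ &= (\alpha^{j_1})^i + (\alpha^{j_2})^i + \cdots + (\alpha^{j_v})^i \\ &= \beta_1^i + \beta_2^i + \cdots + \beta_v^i, \quad i = 1, 2, \ldots, 2t, \; v \le t \end{aligned} \tag{3.3.10} $$
where $\beta_l = \alpha^{j_l}$ is the error-location number of the $l$-th error.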
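As an illustration of Eqs. 3.3.6 and 3.3.10, the following sketch evaluates $s_i = c^*(\alpha^i)$ directly for a received word stored as a list of bits, index $j$ holding the coefficient of $X^j$. It assumes the primitive polynomial $X^9 + X^4 + 1$, and the function names are hypothetical. For brevity it evaluates $c^*(\alpha^i)$ directly instead of using the cheaper remainders $b_i(X)$ of Eq. 3.3.8.

```python
M, PRIM = 9, 0b1000010001    # assumed primitive polynomial X^9 + X^4 + 1
N, T = (1 << M) - 1, 3       # n = 511, t = 3

EXP = [0] * N                # EXP[i] = alpha^i
x = 1
for i in range(N):
    EXP[i] = x
    x <<= 1
    if x >> M:
        x ^= PRIM


def syndromes(word: list) -> list:
    """Return [s_1, ..., s_2t] with s_i = c*(alpha^i) (Eq. 3.3.6)."""
    s = []
    for i in range(1, 2 * T + 1):
        acc = 0
        for j, bit in enumerate(word):
            if bit:
                acc ^= EXP[(i * j) % N]   # add (alpha^i)^j for every set bit
        s.append(acc)
    return s
```

If the transmitted codeword is the all-zero word (a member of every linear code), each syndrome reduces to the power sum $\beta_1^i + \cdots + \beta_v^i$ of Eq. 3.3.10.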
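Lin and Costello [3] define the error-location polynomial $\sigma(X)$ shown in Eq. 3.3.11. The inverses of its roots represent the error locations in codeword $c^*$.
$$ \begin{aligned} \sigma(X) &= (1 + \beta_1 X)(1 + \beta_2 X)\cdots(1 + \beta_v X) \\ &= \sigma_0 + \sigma_1 X + \cdots + \sigma_v X^v, \quad v \le t \end{aligned} \tag{3.3.11} $$
It is also shown in [3] that the syndromes in Eq. 3.3.6 and the coefficients of Eq. 3.3.11 are related as follows (Newton's identities):
$$ \begin{aligned} s_1 + \sigma_1 &= 0 \\ s_2 + \sigma_1 s_1 + 2\sigma_2 &= 0 \\ s_3 + \sigma_1 s_2 + \sigma_2 s_1 + 3\sigma_3 &= 0 \\ &\;\;\vdots \\ s_v + \sigma_1 s_{v-1} + \cdots + \sigma_{v-1} s_1 + v\sigma_v &= 0 \\ s_{v+1} + \sigma_1 s_v + \cdots + \sigma_{v-1} s_2 + \sigma_v s_1 &= 0 \end{aligned} \tag{3.3.12} $$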
Three techniques exist to solve the above equations. Peterson's algorithm puts these equations in matrix form and solves for all the coefficients of $\sigma(X)$ [15]. Although this is a simple approach, it involves computing matrix determinants, whose complexity increases for long codewords [15]. The Berlekamp-Massey algorithm (BMA) finds an LFSR whose feedback coefficients equal the coefficients of $\sigma(X)$, as shown in Fig. 3.8. When loaded with the first syndromes, this LFSR reproduces the remaining syndromes, as expressed by the last equation of 3.3.12. Similar to the BMA, the Euclidean algorithm (EA) also finds the coefficients of an LFSR that produces all syndromes [15]. However, the BMA operates on polynomials of lower degree than the EA does [15], which gives the BMA a simpler hardware implementation. Since the BMA is a fast decoding algorithm with low hardware complexity ([2], [15]), it will be used in this design.
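[Figure: a syndrome shift register holding $s_v, s_{v-1}, \ldots, s_2, s_1$, whose outputs are weighted by the feedback coefficients $\sigma_v, \sigma_{v-1}, \ldots, \sigma_2, \sigma_1$ and XORed to form the feedback.]
Figure 3.8: LFSR computed by the Berlekamp-Massey algorithm [2].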
The BMA iteratively finds the coefficients of $\sigma(X)$ by solving each equation in 3.3.12. Letting $k$ denote the current iteration, $\sigma^{(k)}(X)$ is the error-location polynomial at iteration $k$, where $1 \le k \le 2t$:
$$ \sigma^{(k)}(X) = 1 + \sigma_1^{(k)} X + \sigma_2^{(k)} X^2 + \cdots + \sigma_l^{(k)} X^l, \quad l \le v \tag{3.3.13} $$
The BMA is initialised with $\sigma^{(0)}(X) = \sigma_0 = 1$ [3]. At $k = 1$, the first equation of 3.3.12 is evaluated to obtain $\sigma^{(1)}(X)$. Since $\sigma_1 = 0$ at this stage, the equation is not satisfied whenever $s_1 \neq 0$, so a correction term is added to $\sigma^{(0)}(X)$ such that $\sigma^{(1)}(X) = \sigma^{(0)}(X) + T_{corr}(X)$ satisfies the first equation of 3.3.12. Similarly, for $k = 2$ the second equation of 3.3.12 is evaluated using the coefficients of $\sigma^{(1)}(X)$. Again, a correction term would be added to $\sigma^{(1)}(X)$ should the second equation not be satisfied. However, with modulo-2 addition $2\sigma_2 = 0$, since $\sigma_2$ is multiplied by an even value. In addition, $s_2 = s_1^2$ for a binary BCH code, while $\sigma_1^{(1)} = s_1$, so that $s_2 + \sigma_1 s_1 = 0$. Therefore the second equation of 3.3.12 is satisfied by $\sigma^{(1)}(X)$, hence $\sigma^{(2)}(X) = \sigma^{(1)}(X)$. In general, $\sigma^{(k+1)}(X) = \sigma^{(k)}(X)$ when the $(k+1)$-th equation of 3.3.12 is satisfied by the coefficients of $\sigma^{(k)}(X)$. Otherwise, a correction term is added such that $\sigma^{(k+1)}(X) = \sigma^{(k)}(X) + T_{corr}(X)$ satisfies the first $k+1$ equations of 3.3.12 [3]. Note that the correction term $T_{corr}(X)$ is calculated such that $\sigma^{(k+1)}(X)$ has minimal degree [3]. The BMA runs for at most $2t$ iterations before $\sigma(X)$ is found. Further details regarding the BMA are discussed in [3] and [15].
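The iteration described above can be sketched in software with Massey's formulation of the algorithm, which tracks the current LFSR length $L$ and a copy of $\sigma(X)$ from the last length change in order to build $T_{corr}(X)$. This is a generic textbook form, not necessarily the exact variant implemented in the FPGA; it assumes the primitive polynomial $X^9 + X^4 + 1$, and all names are hypothetical.

```python
M, PRIM = 9, 0b1000010001    # assumed primitive polynomial X^9 + X^4 + 1
N, T = (1 << M) - 1, 3
EXP, LOG = [0] * (2 * N), [0] * (N + 1)
x = 1
for i in range(N):
    EXP[i] = EXP[i + N] = x
    LOG[x] = i
    x <<= 1
    if x >> M:
        x ^= PRIM


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_div(a, b):
    return 0 if a == 0 else EXP[LOG[a] - LOG[b] + N]


def berlekamp_massey(synd):
    """Shortest LFSR [sigma_0 = 1, sigma_1, ..., sigma_L] generating synd = [s_1, ..., s_2t]."""
    sigma, prev = [1], [1]     # current sigma(X) and its copy at the last length change
    L, shift, prev_d = 0, 1, 1
    for k in range(len(synd)):
        # discrepancy: how far sigma misses the (k+1)-th equation of 3.3.12
        d = synd[k]
        for i in range(1, min(L, len(sigma) - 1) + 1):
            d ^= gf_mul(sigma[i], synd[k - i])
        if d == 0:             # equation already satisfied: sigma^(k+1) = sigma^(k)
            shift += 1
            continue
        old = sigma[:]
        coef = gf_div(d, prev_d)
        # add the correction term T_corr(X) = (d / prev_d) X^shift prev(X)
        sigma += [0] * max(0, len(prev) + shift - len(sigma))
        for i, p in enumerate(prev):
            sigma[i + shift] ^= gf_mul(coef, p)
        if 2 * L <= k:         # LFSR length must grow
            L, prev, prev_d, shift = k + 1 - L, old, d, 1
        else:
            shift += 1
    return (sigma + [0] * (L + 1))[:L + 1], L


def poly_eval(poly, a):
    """Evaluate poly[0] + poly[1] a + ... with Horner's rule."""
    acc = 0
    for c in reversed(poly):
        acc = gf_mul(acc, a) ^ c
    return acc
```

For $v \le t$ errors and $2t$ syndromes, the returned length $L$ equals the number of errors $v$.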
The final step before finding polynomial $e(X)$ in Eq. 3.3.9 is to locate the roots of $\sigma(X)$. This is done by substituting and testing whether $\sigma(\alpha^i) = 0$ for $i = 0, 1, 2, \ldots, 2^m - 2$. After finding all $\alpha^i$ for which $\sigma(\alpha^i) = 0$, the inverses of these roots are calculated, as required by Eq. 3.3.11. The exponents of these inverted roots then reveal the error locations in the codeword; they coincide with the exponents $j_l$ of error vector $e(X)$ in Eq. 3.3.9. Using Eq. 3.3.4, this $e(X)$ is XORed with $c^*(X)$ to obtain the error-free codeword $c(X)$. The inverse of a GF element is defined as [3]:
$$ (\alpha^i)^{-1} = \alpha^{2^m - 1 - i}, \quad i = 0, 1, 2, \ldots, 2^m - 2 \tag{3.3.14} $$
Since evaluation of $\sigma(\alpha^i)$ starts at $i = 0$, the exponent of $(\alpha^i)^{-1}$ is $j = 2^m - 1 - i$, taken modulo $2^m - 1$ so that $i = 0$ corresponds to $j = 0$.
If codeword $c^*(X)$ is stored in the decoder as shown in Eq. 2.5.10, error correction on $c^*(X)$ takes place from right to left. A specialised circuit to perform error locating and correction has been developed by Chien [3], as shown in Fig. 3.9. Before starting, the registers represented by box A are initialised to 1, and the complete received codeword is present in a FIFO register. First, the contents of the registers in box A are multiplied by the coefficients of $\sigma(X)$ in box B. These outputs go to a summation circuit at C, which evaluates $\sigma(\alpha^i)$ as mentioned earlier. Note that $\sigma_0 = 1$ is not included in this summation, so the summation output is 1 exactly when $\sigma(\alpha^i) = 0$. Bit $c^*_{2^m-1-i}$ is then read from the FIFO, after which it is XORed with either 1 or 0, depending on whether $\sigma(\alpha^i) = 0$ or not. Register $j$ in box A is then multiplied by $\alpha^j$ (the GF multipliers $\alpha, \alpha^2, \ldots, \alpha^t$ in Fig. 3.9) before the next value of $i$ is evaluated.
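The register-based search of Fig. 3.9 can be modelled in software as below. `regs[j]` plays the role of register $j$ in box A (initialised to 1 and multiplied by $\alpha^j$ after each step), the products with $\sigma_j$ correspond to box B, and the XOR accumulation to box C. The helper `locator_from_positions` builds $\sigma(X)$ from Eq. 3.3.11 so that the search can be exercised on its own. The primitive polynomial $X^9 + X^4 + 1$ and all names are assumptions.

```python
M, PRIM = 9, 0b1000010001    # assumed primitive polynomial X^9 + X^4 + 1
N = (1 << M) - 1
EXP, LOG = [0] * (2 * N), [0] * (N + 1)
x = 1
for i in range(N):
    EXP[i] = EXP[i + N] = x
    LOG[x] = i
    x <<= 1
    if x >> M:
        x ^= PRIM


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def locator_from_positions(positions):
    """sigma(X) = prod (1 + alpha^j X) over error positions j (Eq. 3.3.11)."""
    sigma = [1]
    for j in positions:
        beta = EXP[j % N]
        nxt = sigma + [0]
        for i, c in enumerate(sigma):
            nxt[i + 1] ^= gf_mul(beta, c)   # multiply by (1 + beta X)
        sigma = nxt
    return sigma


def chien_search(sigma):
    """Return error positions j by testing sigma(alpha^i) = 0 for i = 0 .. 2^m - 2."""
    t = len(sigma) - 1
    regs = [1] * (t + 1)            # box A: register j holds alpha^(i*j)
    found = []
    for i in range(N):
        total = 0                   # box C: sum_{j=1}^{t} sigma_j alpha^(i*j), sigma_0 excluded
        for j in range(1, t + 1):
            total ^= gf_mul(regs[j], sigma[j])
            regs[j] = gf_mul(regs[j], EXP[j])   # step register j by alpha^j
        if total == 1:              # 1 + total = sigma(alpha^i) = 0
            found.append((N - i) % N)           # exponent of the inverse root, Eq. 3.3.14
    return sorted(found)


def correct(word, sigma):
    fixed = list(word)
    for j in chien_search(sigma):
        fixed[j] ^= 1               # XOR the located error bits (Eq. 3.3.4)
    return fixed
```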
Algorithm 1 displays a pseudo code summary of the BCH decoder. After re-
ceiving a complete codeword c∗, the syndromes are calculated. This is followed
by the BMA which calculates error-locator polynomial σ(X). Using this σ(X),
the Chien circuit determines the error locations in c∗ and corrects them.
Input : Received word c* (n-bit vector)
Output: Message x (k-bit vector)
** Syndrome calculation **
for i ← 1 to 2t do
    Divide c*(X) by φ_i(X) and store the remainder as b_i(X);
    s_i = b_i(α^i);
end
** Calculate σ(X) with the BMA **
Initialise: σ^(0)(X) = σ_0 = 1
for i ← 1 to 2t do
    sum = [Σ_{j=0}^{i−1} σ_j^(i−1) · s_{i−j}] + i · σ_i^(i−1);
    if sum ≠ 0 then
        Calculate correction term T_corr(X);
        σ^(i)(X) = σ^(i−1)(X) + T_corr(X);
    else
        σ^(i)(X) = σ^(i−1)(X);
    end
end
** Chien search **
for i ← 0 to n − 1 do
    sum = Σ_{j=1}^{t} σ_j · (α^j)^i;
    if sum = 1 then
        Flip bit ((n − i) mod n) of c*;
    end
end
x = Strip the (n − k) redundant bits from c*;
Algorithm 1: BCH decoder algorithm.
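For reference, Algorithm 1 can be condensed into the following runnable Python sketch. It is not the FPGA implementation: it assumes the primitive polynomial $X^9 + X^4 + 1$, computes syndromes by direct evaluation rather than through the remainders $b_i(X)$, and, consistent with the systematic encoder of Section 3.3.1 ($c(X) = X^{27}m(X) + r(X)$), takes the message to occupy bit positions 27 to 510. It also declares a decoding failure when the number of roots found differs from the degree of $\sigma(X)$, a common check that Algorithm 1 does not state explicitly. Function names are hypothetical.

```python
M, PRIM = 9, 0b1000010001        # assumed primitive polynomial X^9 + X^4 + 1
N, K, T = (1 << M) - 1, 484, 3   # (511, 484) code, t = 3
EXP, LOG = [0] * (2 * N), [0] * (N + 1)
x = 1
for i in range(N):
    EXP[i] = EXP[i + N] = x
    LOG[x] = i
    x <<= 1
    if x >> M:
        x ^= PRIM


def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def poly_eval(poly, a):
    acc = 0
    for c in reversed(poly):     # Horner's rule
        acc = mul(acc, a) ^ c
    return acc


def bch_decode(word):
    """word: 511 bits, index j = coefficient of X^j. Returns (message bits, success)."""
    # Syndromes s_i = c*(alpha^i), i = 1..2t
    synd = [0] * (2 * T)
    for j, bit in enumerate(word):
        if bit:
            for i in range(1, 2 * T + 1):
                synd[i - 1] ^= EXP[(i * j) % N]
    fixed = list(word)
    if any(synd):
        # Berlekamp-Massey: shortest LFSR sigma(X) that reproduces the syndromes
        sigma, prev, L, shift, prev_d = [1], [1], 0, 1, 1
        for k in range(2 * T):
            d = synd[k]
            for i in range(1, min(L, len(sigma) - 1) + 1):
                d ^= mul(sigma[i], synd[k - i])
            if d == 0:
                shift += 1
                continue
            old, coef = sigma[:], EXP[LOG[d] - LOG[prev_d] + N]
            sigma += [0] * max(0, len(prev) + shift - len(sigma))
            for i, p in enumerate(prev):
                sigma[i + shift] ^= mul(coef, p)
            if 2 * L <= k:
                L, prev, prev_d, shift = k + 1 - L, old, d, 1
            else:
                shift += 1
        sigma = sigma[:L + 1]
        # Chien search: sigma(alpha^i) = 0 marks an error at position 2^m - 1 - i
        roots = [i for i in range(N) if poly_eval(sigma, EXP[i]) == 0]
        if L > T or len(roots) != L:
            return None, False   # uncorrectable: more than t errors
        for i in roots:
            fixed[(N - i) % N] ^= 1
    # Strip the n - k = 27 parity bits (positions 0..26)
    return fixed[N - K:], True
```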
[Figure: Chien search circuit. Box A holds registers that are stepped by GF multipliers $\alpha, \alpha^2, \ldots, \alpha^t$; box B multiplies the register contents by $\sigma_1, \sigma_2, \ldots, \sigma_t$; box C sums the products to form $\sum_{j=1}^{t} \sigma_j \alpha^{ij}$, which is XORed with the bits of $c^*(X) = c_0 + c_1 X + \cdots + c_{n-1}X^{n-1}$ as they leave the codeword FIFO register.]
Figure 3.9: A Chien search circuit [3].
3.4 LDPC FEC Design
The (511, 484) BCH code is the main FEC used during the aircraft-based test. However, it has been shown in [1] that LDPC codes are also worth considering over conventional linear block codes such as BCH. In parallel with the BCH development, an LDPC FEC is therefore implemented as an alternative to the (511, 484) BCH code.
Unlike BCH, LDPC codes do not have a designed error correction capability. Their error correction performance depends on the construction technique of parity check matrix $H$, as mentioned in Section 2.5.3. Random irregular constructions of $H$ deliver the best performance [39] for long block lengths of $n > 10000$ [15]. However, structured codes can outperform random and irregular codes for short to medium block lengths, where $n < 10000$ [39].
Work done in [1] indicates that a half-rate LDPC code delivers good BLER performance at an SNR as low as 4 dB. Using row weight $\omega_r$ and column weight $\omega_c$, the rate of a regular LDPC code is given by:
$$ R = 1 - \frac{\omega_c}{\omega_r} \tag{3.4.1} $$
Among half-rate LDPC codes, (3,6)-regular codes have a low-complexity decoder [40]. Column weights of $\omega_c < 3$ tend to deliver poor BLER results [15], whereas $\omega_c > 3$ lowers the error floor at high SNRs ([41], [42]) at the expense of increased decoder complexity [40]. An error floor refers to the phenomenon where the slope of a code's BLER curve decreases (the curve flattens) as the SNR increases [43]. Irregular LDPC codes also have an encoding complexity that is quadratic in block length $n$ [44]. In general, short regular LDPC codes can be encoded in almost linear time when using the technique suggested by [30]. By using a regular quasi-cyclic (QC-LDPC) structured code, this encoding complexity is further reduced to linear time [4]. Due to the QC-LDPC structure, an encoder can use simple shift registers to perform encoding [4]. This avoids having to store large generator matrices in memory. This implementation will use a (3,6)-regular QC-LDPC code with block length $n = 512$.
The $H$ matrix of a QC-LDPC code is constructed as shown in Fig. 3.10a. The sub-matrices $H_{i,j}$ are square $v \times v$ circulant matrices, where $1 \le i \le a$, $1 \le j \le b$, $a = (n - k)/v$ and $b = n/v$. In a circulant matrix, each row equals the previous row rotated one bit position to the right. Similarly, each column equals the previous column rotated one bit position downwards. Circulant matrices with a row weight of one are referred to as circulant permutation matrices, as shown in Fig. 3.10b.
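$$ H = \begin{bmatrix} H_{1,1} & \cdots & H_{1,b} \\ \vdots & \ddots & \vdots \\ H_{a,1} & \cdots & H_{a,b} \end{bmatrix} $$
(a) Circulant composition of a QC-LDPC parity check matrix $H$.
$$ H_{i,j} = \begin{bmatrix} 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \end{bmatrix} $$
(b) An example of a 5 × 5 circulant permutation matrix.
Figure 3.10: A QC-LDPC parity matrix structure.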
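The following sketch expands a table of circulant shifts into a binary $H$, using the convention of Fig. 3.10b, in which row $r$ of a block with shift $v$ has its 1 in column $(r + v)$ modulo the block size. Marking an all-zero block with `None` and the function names are assumptions made for illustration. The helper `regular_rate` evaluates Eq. 3.4.1.

```python
def circulant_permutation(size: int, shift: int) -> list:
    """size x size permutation matrix; row r has its single 1 in column (r + shift) mod size."""
    return [[1 if c == (r + shift) % size else 0 for c in range(size)] for r in range(size)]


def expand_qc(shifts, size):
    """Build H from a table of block shifts; None marks an all-zero block."""
    rows = []
    for block_row in shifts:
        blocks = [
            circulant_permutation(size, s) if s is not None
            else [[0] * size for _ in range(size)]
            for s in block_row
        ]
        for r in range(size):
            # concatenate row r of every block in this block row
            rows.append([bit for blk in blocks for bit in blk[r]])
    return rows


def regular_rate(col_weight: int, row_weight: int) -> float:
    """Design rate of a regular LDPC code, Eq. 3.4.1."""
    return 1 - col_weight / row_weight
```

Each nonzero block contributes exactly one 1 to every row and column it covers, so the row and column weights of $H$ equal the number of nonzero blocks in the corresponding block row and block column.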
Two classes of decoders exist, namely hard decision and soft decision decoders. Similar to the Chien error correction circuit of BCH in Section 3.3.2, hard decision decoding directly changes a bit's value in the received codeword [15]. It provides the simplest decoding architecture [15] at the expense of BER performance, as shown in [1] and [15]. Performance penalties of up to 1.5 dB at the same BER can be expected when using hard decision instead of soft decision decoding [1]. Soft decision decoders gather bit-probability information from the modem before starting a decoding cycle. The bit probability provides a confidence value for the correctness of each bit as received by the modem. These confidence values are then used to solve each parity check equation of $H$, such that the most probable codeword $c_{new}$ is found for which $c_{new}H^T = 0$ [1]. Since soft decision decoding delivers superior BER performance compared to hard decision decoding [15], it will be used in this design.
3.4.1 Soft-Decision Decoding
Section 2.5.1 mentioned that each row of $H$ represents a parity check equation. Also, each column position of $H$ coincides with a bit position in the received codeword $c^*$. Since this LDPC implementation has a column weight $\omega_c = 3$, each bit of $c^*$ is involved in three different parity check equations. The set of parity check equations in which bit $c^*_i$ participates is denoted $M_i$ for $0 \le i \le n - 1$. The same set excluding parity check equation $p$, where $p \in M_i$, is written as $M_i \setminus p$. Similarly, a row weight $\omega_r = 6$ indicates that six bits of $c^*$ are used per parity check equation. The set of bits participating in parity check $s_j$ is denoted $N_j$ for $0 \le j \le n - k - 1$. The set $N_j \setminus i$ excludes bit $c^*_i$, where $c^*_i \in N_j$.
Soft decision decoding consists of two major steps: horizontal step updating and vertical step updating. During these steps, messages containing probabilities are passed between check nodes and variable nodes in the Tanner graph. Each step is listed below:
• Horizontal updating: Each check node $C_j$ receives a message from all its connected variable nodes $V_i$ of set $N_j$. In return, a message is sent from $C_j$ to each connected $V_i$, which contains the probability $\Pr(s_j = 0 \mid c^*_i = x)$, where $x$ is either 1 or 0. This message indicates the probability of parity check equation $j$ being satisfied given the value of bit $c^*_i$. Only messages from variable nodes of set $N_j \setminus i$ are considered when sending $V_i$ a message.
• Vertical updating: During horizontal updating, each variable node $V_i$ received messages from its connected check nodes $C_j$ of set $M_i$. A message is then generated for each check node $C_j$, which contains the probability $\Pr(c^*_i = x \mid s_j = 0)$ for $x = 1$ or $x = 0$. This message represents the probability of bit $c^*_i$ having its current value given that all parity check equations involving bit $c^*_i$ are satisfied. Again, only messages from check nodes of set $M_i \setminus j$ are considered when sending $C_j$ a message.
A message sent from check node $C_j$ to variable node $V_i$ is denoted $r_{ji}$. Similarly, message $q_{ij}$ is sent from $V_i$ to $C_j$. These messages, along with their directions, are shown in Fig. 3.11. Note that $q_{ij}(0) + q_{ij}(1) = 1$, where $q_{ij}(0)$ is the probability that bit $c^*_i = 0$ and $q_{ij}(1)$ the probability that $c^*_i = 1$ [15]. Both $q_{ij}(0)$ and $q_{ij}(1)$ are sent as part of message $q_{ij}$.
When initialising the decoder, bit-probability information is supplied by the modem. These probabilities are denoted $p_i(0) = \Pr(c^*_i = 0)$ and $p_i(1) = \Pr(c^*_i = 1)$ and serve as initial values for the vertical update messages $q_{ij}$, such that $q_{ij}(0) = p_i(0)$ and $q_{ij}(1) = p_i(1)$. By using $p_i$ along with all $q_{ij}$, a value $q_i$ is calculated from which an initial codeword $c^*$ is determined. Since $q_i(0) + q_i(1) = 1$, bit $c^*_i = 1$ when $q_i(1) > 0.5$, or $c^*_i = 0$ when $q_i(0) > 0.5$. This is followed by evaluation of the syndrome $s = c^* H^T$. If $s \neq 0$, a decoding cycle starts; otherwise, when $s = 0$, the codeword is error free.
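A minimal sketch of the initial hard decision and syndrome check described above follows. Any binary $H$ given as a list of rows can be used; the function names are hypothetical, and ties at exactly 0.5 are resolved to 0 here, a case the text does not specify.

```python
def hard_decision(prob_one: list) -> list:
    """c*_i = 1 when q_i(1) > 0.5, otherwise 0."""
    return [1 if p > 0.5 else 0 for p in prob_one]


def syndrome(H: list, c: list) -> list:
    """s = c H^T over GF(2): one parity bit per row (check equation) of H."""
    return [sum(h & b for h, b in zip(row, c)) % 2 for row in H]


def is_codeword(H: list, c: list) -> bool:
    """Decoding stops as soon as every parity check is satisfied (s = 0)."""
    return not any(syndrome(H, c))
```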
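[Figure: a Tanner graph with check nodes $C_0$, $C_1$ and variable nodes $V_0$ to $V_3$; messages $r_{ji}$ travel from check nodes to variable nodes and messages $q_{ij}$ from variable nodes to check nodes.]
Figure 3.11: Message passing along the edges of a Tanner graph.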
A decoding cycle may run many iterations before finding an error-free codeword. The first step in an iteration performs horizontal updating for all check nodes. This is followed by vertical updating, where new messages $q_{ij}$ are generated. Calculating $q_i$ as stated earlier, a new codeword $c_{new}$ is determined before evaluating $s = c_{new}H^T$. If $s \neq 0$, a new iteration is started. Decoding is successful as soon as $s = 0$. Failure is declared when $s \neq 0$ and a predefined maximum number of iterations is exceeded. The decoding technique described in this section is known as a belief propagation (BP) or sum-product (SP) decoder.
3.4.2 Min-Sum Decoder
The SP decoder's messages are all expressed in the probability domain. Calculating messages $r_{ji}$ and $q_{ij}$ requires many multiplication operations [15], which consume a large amount of logic when implemented on an FPGA. Working in the log domain transforms many of these multiplications into simple additions [27]. Specifically, the bit probability $\Pr(c^*_i = x)$ is expressed as a log-likelihood ratio (LLR):
$$ \mathrm{LLR} = \log\left(\frac{\Pr(c^*_i = 0)}{\Pr(c^*_i = 1)}\right) = \log\left(\frac{\Pr(c^*_i = 0)}{1 - \Pr(c^*_i = 0)}\right) \tag{3.4.2} $$
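The conversion in Eq. 3.4.2 and its inverse are shown below as a short sketch, assuming the natural logarithm (the base is not stated in the text). A positive LLR favours bit 0, which is the convention Algorithm 2 uses when making hard decisions; the function names are hypothetical.

```python
import math


def llr_from_prob(p0: float) -> float:
    """Eq. 3.4.2: LLR = log(Pr(c_i = 0) / (1 - Pr(c_i = 0))); positive favours bit 0."""
    return math.log(p0 / (1.0 - p0))


def prob0_from_llr(llr: float) -> float:
    """Inverse mapping: Pr(c_i = 0) = 1 / (1 + exp(-LLR))."""
    return 1.0 / (1.0 + math.exp(-llr))
```

Because $\log$ turns products of probability ratios into sums of LLRs, the variable-node updates that follow become plain additions.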
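Message $r_{ji}$ of the SP decoder is now replaced by a log-domain message $L_{ji}$:
$$ L_{ji} = 2\tanh^{-1}\left(\prod_{i' \in N_j \setminus i} \tanh\left(\frac{Z_{i'j}}{2}\right)\right) \tag{3.4.3} $$
Similarly, message $q_{ij}$ is replaced by:
$$ Z_{ij} = L_c c^*_i + \sum_{j' \in M_i \setminus j} L_{j'i} \tag{3.4.4} $$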
Term $L_c$ is known as the channel reliability and is given by:
$$ L_c = \frac{2\sqrt{R E_b}}{\sigma^2} \tag{3.4.5} $$
where $R$ is the FEC code rate, $E_b$ the energy per bit and $\sigma^2$ the noise power. Term $q_i$ from the previous section is likewise replaced by $Y_i$ [15]:
$$ Y_i = L_c c^*_i + \sum_{j \in M_i} L_{ji} \tag{3.4.6} $$
However, Eq. 3.4.3 still contains products and non-linear $\tanh(\cdot)$ functions that would consume a large amount of logic on an FPGA. Secondly, $L_c$ in Eq. 3.4.5 requires the noise power, which is difficult to measure [27]. By using the minimum-sum (MS) algorithm, the hardware complexity of an LLR SP decoder can be reduced further. Here, Eq. 3.4.3 is approximated as:
$$ L_{ji} \approx \left(\prod_{i' \in N_j \setminus i} \operatorname{sign}(Z_{i'j})\right) \cdot \min_{i' \in N_j \setminus i} |Z_{i'j}| \tag{3.4.7} $$
The value of $L_{ji}$ in Eq. 3.4.3 is dominated by the $Z_{i'j}$ with the smallest magnitude, hence the approximation in Eq. 3.4.7. Term $\operatorname{sign}(Z_{i'j})$ refers to the $+1$ or $-1$ sign of $Z_{i'j}$. The $\operatorname{sign}(\cdot)$ function also has a low-complexity hardware implementation [39]. In Eq. 3.4.4, the term $L_c c^*_i$ represents the initial value of $Z_{ij}$ obtained from the modem. In MS, however, the channel reliability is omitted and $L_c c^*_i$ is replaced by the LLR of $c^*_i$ [27], denoted $(c^*_i)_{LLR}$. An additional improvement to MS is the normalised MS algorithm. Since $L_{ji}$ is approximated in MS, its magnitude tends to be slightly higher than expected [29]. Term $L_{ji}$ is therefore replaced with:
$$ L_{ji} \rightarrow \alpha L_{ji} \tag{3.4.8} $$
where $\alpha < 1$ is a scaling value. The best $\alpha$ for an LDPC implementation is determined by simulation [1]. Typical values for $\alpha$ range between 0.7 and 0.9 [29]. Terminating the scaling a few iterations before decoding finishes provides an improvement of about 0.25 dB in BER performance [29]. It has been found that $\alpha$ shrinks the value of $L_{ji}$ too much when a maximum of 20 to 30 iterations per decoding cycle is used. This thesis will use a normalised MS decoder with early termination of $\alpha$ for an FPGA implementation of LDPC. Algorithm 2 explains a soft decision MS decoder.
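To illustrate why normalisation is needed, the following sketch compares the exact tanh rule of Eq. 3.4.3 with the min-sum approximation of Eq. 3.4.7 for one check node. The magnitude produced by the exact rule never exceeds the smallest input magnitude, so the min-sum value is an overestimate, which the factor $\alpha$ of Eq. 3.4.8 compensates for. The function names are hypothetical.

```python
import math


def check_update_exact(z: list) -> float:
    """Eq. 3.4.3 (tanh rule) for the incoming messages Z_{i'j}, i' in N_j \\ i."""
    prod = 1.0
    for v in z:
        prod *= math.tanh(v / 2.0)
    return 2.0 * math.atanh(prod)


def check_update_min_sum(z: list, alpha: float = 1.0) -> float:
    """Eq. 3.4.7, optionally normalised by alpha as in Eq. 3.4.8."""
    sign = 1.0
    for v in z:
        if v < 0:
            sign = -sign
    return alpha * sign * min(abs(v) for v in z)
```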
Input : Received word c* as channel LLRs (c*_i)_LLR, i = 0, …, n − 1
Output: Message x (k-bit vector)
** Initialise **
foreach bit i of c* do
    foreach row j where H_ji = 1 do
        Z_ij ← (c*_i)_LLR;
    end
end
iteration ← 0; done ← 0;
while iteration < MAX_ITERATIONS and done == 0 do
    ** Check node update **
    foreach row j in H do
        find first and second minimum of |Z_ij| over i ∈ N_j;
        calculate the product of the signs of Z_ij over i ∈ N_j;
        foreach i where H_ji = 1 do
            sign(L_ji) = (product of signs) × sign(Z_ij);
            if |Z_ij| == first minimum then
                |L_ji| = α × second minimum;
            else
                |L_ji| = α × first minimum;
            end
        end
    end
    ** Variable node update **
    foreach column i in H do
        sum = 0;
        foreach j where H_ji = 1 do
            sum = sum + L_ji;
        end
        foreach j where H_ji = 1 do
            Z_ij = (c*_i)_LLR + sum − L_ji;
        end
        (c_new,i)_LLR = (c*_i)_LLR + sum;
    end
    ** Compute new codeword **
    foreach bit i of c_new do
        if (c_new,i)_LLR > 0 then c_new,i = 0; else c_new,i = 1;
    end
    s = c_new × H^T;
    if s == 0 then done = 1; else done = 0;
    iteration = iteration + 1;
end
x = first k bits of c_new;
Algorithm 2: Minimum-Sum LDPC decoder algorithm.
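Algorithm 2 can be expressed as the Python sketch below. Messages are kept in dictionaries keyed by edge, rows of $H$ are check nodes $C_j$ and columns are variable nodes $V_i$. A fixed $\alpha$ is used throughout, so the early termination of scaling is not modelled. The sketch assumes every row of $H$ contains at least two ones, and its names are hypothetical; it is not the FPGA architecture.

```python
def min_sum_decode(H, llr, alpha=0.8, max_iter=30):
    """Normalised min-sum decoding; llr[i] > 0 favours bit 0. Returns (bits, converged)."""
    m, n = len(H), len(H[0])
    rows = [[i for i in range(n) if H[j][i]] for j in range(m)]   # N_j
    cols = [[j for j in range(m) if H[j][i]] for i in range(n)]   # M_i

    def decide(post):
        return [0 if v > 0 else 1 for v in post]

    def satisfied(c):
        return all(sum(c[i] for i in rows[j]) % 2 == 0 for j in range(m))

    Z = {(i, j): llr[i] for i in range(n) for j in cols[i]}      # variable-to-check
    c = decide(llr)
    if satisfied(c):                 # initial syndrome check
        return c, True
    for _ in range(max_iter):
        L = {}
        # check node update: Eq. 3.4.7 scaled by alpha (Eq. 3.4.8)
        for j in range(m):
            mags = sorted((abs(Z[(i, j)]), i) for i in rows[j])
            (min1, arg1), min2 = mags[0], mags[1][0]
            sign = 1
            for i in rows[j]:
                if Z[(i, j)] < 0:
                    sign = -sign
            for i in rows[j]:
                own = -1 if Z[(i, j)] < 0 else 1     # dividing out own sign
                mag = min2 if i == arg1 else min1    # exclude own magnitude
                L[(j, i)] = alpha * sign * own * mag
        # variable node update (Eq. 3.4.4) and a posteriori LLR (Eq. 3.4.6)
        post = []
        for i in range(n):
            total = sum(L[(j, i)] for j in cols[i])
            for j in cols[i]:
                Z[(i, j)] = llr[i] + total - L[(j, i)]
            post.append(llr[i] + total)
        c = decide(post)
        if satisfied(c):
            return c, True
    return c, False
```

Keeping only the first and second minimum per row means each check node stores two magnitudes, one index and the sign product, rather than one value per edge.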
3.4.3 Parity Check Matrix Construction
Minimum Hamming distance and girth are the main parameters that influence the BLER performance of a QC-LDPC code. The minimum Hamming distance of a QC-LDPC code is bounded by the column weight of $H$ [45]. Unlike for a random construction of $H$, this distance does not increase linearly with block length $n$ for QC-LDPC codes [45]. A good Hamming distance primarily lowers the error floor mentioned before. A high girth allows the decoder to converge quickly towards a correct codeword. The (3,6)-regular LDPC code in [1] achieves a BER of $10^{-6}$ at an SNR as low as 4 dB without encountering an error floor. Therefore, only girth optimisations will be performed when constructing $H$. By using the encoding technique from [30], matrix $H$ is designed to be approximately lower triangular:
$$ H = \begin{bmatrix} A & B & T \\ C & D & E \end{bmatrix} \tag{3.4.9} $$
with $T$ a lower triangular matrix having only zeros above its diagonal. Given this $H$, codeword $c$ is:
$$ c = [x, p_1, p_2] \tag{3.4.10} $$
where $x$ is a $k$-bit message. Parity vectors $p_1$ and $p_2$ have a combined length of $n - k$ bits and are given by:
$$ p_1^T = -\phi^{-1}\left[-E T^{-1} A + C\right] x^T \tag{3.4.11} $$
$$ p_2^T = -T^{-1}\left[A x^T + B p_1^T\right] \tag{3.4.12} $$
with
$$ \phi = -E T^{-1} B + D \tag{3.4.13} $$
Over GF(2) the minus signs have no effect, since $-1 = 1$.
Dimensions of the sub-matrices in Eq. 3.4.9 appear in Fig. 3.12.
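The encoding equations reduce to simple GF(2) operations once $\phi = I$, because $\phi^{-1}$ disappears and $T^{-1}$ can be applied by forward substitution instead of being stored. The sketch below, with hypothetical names and dense lists of bits for clarity, first checks the $\phi = I$ design assumption and then evaluates Eqs. 3.4.11 and 3.4.12. The QC-LDPC encoder described earlier uses shift registers rather than dense matrices.

```python
def matvec(mat, v):
    """Matrix-vector product over GF(2)."""
    return [sum(a & b for a, b in zip(row, v)) % 2 for row in mat]


def forward_sub(T, y):
    """Solve T z = y over GF(2) for lower triangular T with ones on the diagonal."""
    z = []
    for r, row in enumerate(T):
        z.append((y[r] + sum(row[c] & z[c] for c in range(r))) % 2)
    return z


def ru_encode(A, B, T, C, D, E, x):
    """Approximate lower triangular encoding (Eqs. 3.4.10-3.4.13), assuming phi = I."""
    g = len(D)
    # verify phi = E T^-1 B + D = I, one column at a time
    for col in range(g):
        b_col = [row[col] for row in B]
        e_part = matvec(E, forward_sub(T, b_col))
        phi_col = [(e + d[col]) % 2 for e, d in zip(e_part, D)]
        if phi_col != [int(r == col) for r in range(g)]:
            raise ValueError("phi is not the identity matrix")
    ax = matvec(A, x)
    # Eq. 3.4.11 with phi^-1 = I: p1 = (E T^-1 A + C) x
    p1 = [(u + v) % 2 for u, v in zip(matvec(E, forward_sub(T, ax)), matvec(C, x))]
    # Eq. 3.4.12: p2 = T^-1 (A x + B p1)
    p2 = forward_sub(T, [(u + v) % 2 for u, v in zip(ax, matvec(B, p1))])
    return x + p1 + p2                # c = [x, p1, p2], Eq. 3.4.10
```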
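Dimension $g$ is conveniently chosen such that $g = n - k - g$. With $k = 256$ and $n = 512$, matrices $B$, $T$, $D$ and $E$ are all square of size $g \times g$, where $g = 128$. Circulant permutation matrices of weight $\omega = 1$ are chosen as the building blocks of $H$. This choice is important when performing girth optimisation as outlined in [45]. The permutation matrices cannot have dimension $128 \times 128$, otherwise $H$ would have only two block rows and its column weight would be $\omega_c = 2$. Dividing the $n - k = 256$ rows by three does not result in an integer. Choosing a $64 \times 64$ permutation matrix leads to $\omega_c = 4$ and $\omega_r = 8$. Such an $H$ is given in Fig. 3.13, with $H_{ij}$ a $64 \times 64$ matrix for $0 \le i \le 3$ and $0 \le j \le 7$.
[Figure: $H$ partitioned as $[A\;B\;T;\;C\;D\;E]$; $A$ and $C$ are $k$ columns wide, $B$ and $D$ are $g$ columns wide, and $T$ and $E$ are $n - k - g$ columns wide; the top block row ($A$, $B$, $T$) has height $n - k - g$ and the bottom block row ($C$, $D$, $E$) has height $g$.]
Figure 3.12: Dimensions of sub-matrices within H.
$$ H = \left[\begin{array}{cccc|cc|cc} H_{00} & H_{01} & H_{02} & H_{03} & H_{04} & H_{05} & H_{06} & H_{07} \\ H_{10} & H_{11} & H_{12} & H_{13} & H_{14} & H_{15} & H_{16} & H_{17} \\ \hline H_{20} & H_{21} & H_{22} & H_{23} & H_{24} & H_{25} & H_{26} & H_{27} \\ H_{30} & H_{31} & H_{32} & H_{33} & H_{34} & H_{35} & H_{36} & H_{37} \end{array}\right] $$
(The three column groups, from left to right, form $A$/$C$, $B$/$D$ and $T$/$E$; the horizontal line separates $A$, $B$, $T$ from $C$, $D$, $E$.)
Figure 3.13: Layout of H using the template in Fig. 3.12.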
Permutation matrix $H_{ij}$ is rewritten as $I_{ij}(v)$ for $0 \le v < 64$, where $v$ indicates the column position of the 1 in the first row. By setting some $I_{ij}(v) = 0$ (an all-zero block), a (3,6)-regular code is obtained. Matrix $\phi$ in Eq. 3.4.13 is designed to be an identity matrix, which guarantees the invertibility required in Eq. 3.4.11 [46]. Using the technique from [46] to find this $\phi$ leads to the $H$ in Eq. 3.4.14.
$$ H = \begin{bmatrix} I_{00}(b_0) & 0 & I_{02}(b_1) & I_{03}(b_2) & I_{04}(b_{12}) & I_{05}(b_{13}) & I_{06}(0) & 0 \\ I_{10}(b_3) & I_{11}(b_4) & 0 & I_{13}(b_5) & 0 & I_{15}(a_0) & I_{16}(b_{15}) & I_{17}(0) \\ I_{20}(b_6) & I_{21}(b_7) & I_{22}(b_8) & 0 & I_{24}(0) & 0 & I_{26}(a_1) & I_{27}(b_{17}) \\ 0 & I_{31}(b_9) & I_{32}(b_{10}) & I_{33}(b_{11}) & I_{34}(a_3) & I_{35}(0) & 0 & I_{37}(a_2) \end{bmatrix} \tag{3.4.14} $$
Matrices $I_{ij}(v)$ with $v = b_l$ are generated at random. Values $v = a_k$ can then be solved in terms of the $b_l$ such that $\phi$ in Eq. 3.4.13 is an identity matrix. A Matlab script has been developed to generate this $H$, after which girth optimisations are applied. Using the girth optimisations explained in [45], cycles of length up to 10 have been removed from $H$, thereby ensuring a girth of 12. These optimisations involve comparing all $b_l$ and $a_k$ against certain criteria set in [45]. If some $b_l$ have to be changed to satisfy these criteria, all $a_k$ are regenerated. The script runs until all girth criteria are satisfied.
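The girth criteria of [45] are not reproduced here, but the simplest case illustrates the idea. For circulant permutation blocks of size $z$ ($z = 64$ here), four nonzero blocks at block rows $i_1, i_2$ and block columns $j_1, j_2$ form a length-4 cycle exactly when $v_{i_1j_1} - v_{i_1j_2} + v_{i_2j_2} - v_{i_2j_1} \equiv 0 \pmod{z}$. The sketch below, with hypothetical names, applies this test to a shift table laid out like Eq. 3.4.14 (with `None` for zero blocks); removing cycles of length 6 to 10 requires the longer conditions of [45].

```python
from itertools import combinations


def has_4_cycle(shifts, size):
    """True if the QC matrix defined by `shifts` (None = zero block) contains a 4-cycle.

    Uses s[i1][j1] - s[i1][j2] + s[i2][j2] - s[i2][j1] == 0 (mod size).
    """
    for i1, i2 in combinations(range(len(shifts)), 2):
        for j1, j2 in combinations(range(len(shifts[0])), 2):
            quad = (shifts[i1][j1], shifts[i1][j2], shifts[i2][j2], shifts[i2][j1])
            if None in quad:
                continue          # a zero block breaks this rectangle
            a, b, c, d = quad
            if (a - b + c - d) % size == 0:
                return True
    return False
```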
3.5 IPC Design
A message passing scheme is selected for IPC between software protocol layers.
Since it uses a client-server relationship, it provides synchronisation between
two communicating processes. Message passing is recommended in the QNX
OS for its speed [11]. Among all QNX’s IPC schemes, message passing is the
most flexible. For example, an entire message or only part of it can be read by
a receiving process. However, Linux Ubuntu 7.10 does not come with message
passing as standard. Therefore a basic message passing library for Ubuntu will
be designed by using other IPC schemes.
Linux supports two different IPC standards, namely System V and portable
operating system interface for Unix (POSIX). Since System V is outdated,
this design uses the latter for portability between modern POSIX operating
systems. Shared memory is chosen, as it is the fastest IPC scheme in Linux. As
mentioned in Section 2.4.1.1, shared memory has no synchronisation support
for concurrent data access between two processes. Combining a semaphore
with shared memory will ensure synchronisation.
A server sleeps until a client initiates a communications transaction with it. A transaction is complete once a client has sent its message and received a reply from the server. On startup, a server process creates a shared memory structure along with three binary semaphores, SEM0, SEM1 and SEM2, as shown in Fig. 3.14. On creation, SEM0, SEM1 and SEM2 are initialised with the values 1, 0 and 0 respectively. Afterwards, the server calls sem_wait(SEM1), which requests a lock from SEM1. Since no lock is available, the server sleeps.
[Figure: client processes 0 and 1 and the server process exchange data through a shared memory segment, coordinated by semaphores SEM0, SEM1 and SEM2.]
Figure 3.14: Message passing IPC using POSIX semaphores and shared memory.
Only one client may communicate with a server at any given time, hence multiple clients first compete for SEM0. The client holding this lock may initiate a communications transaction with the server. A client writes data to the shared memory before calling sem_post(SEM1) and sem_wait(SEM2). This causes the client to block and the server to wake up. After processing the client's data in shared memory, the server calls sem_post(SEM2) and sem_wait(SEM1), putting itself to sleep and waking the client. A communications transaction consists of several such function calls from both client and server. A complete transaction proceeds as follows:
1. Client : Write size of message in bytes to shared memory.
2. Server : Read this message size.
3. Client : Write message data.
4. Server : Read message data. Process data. Do not reply yet.
5. Server : Write size of reply message.
6. Client : Read size of reply message.
7. Server : Write reply message data.
8. Client : Read reply message data.
After completing the procedure above, the client releases SEM0 while the
server sleeps again.
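The handshake can be simulated in a single Python process as a sketch of the protocol above. `threading.Semaphore.acquire()` and `release()` stand in for `sem_wait()` and `sem_post()`, and a dictionary stands in for the POSIX shared memory segment; the class and method names are hypothetical, and this is not the Ubuntu library developed in this thesis.

```python
import threading


class MessageChannel:
    """In-process model of the SEM0/SEM1/SEM2 + shared memory scheme (hypothetical)."""

    def __init__(self, handler):
        self.shm = {}                       # stands in for the shared memory segment
        self.sem0 = threading.Semaphore(1)  # one client transaction at a time
        self.sem1 = threading.Semaphore(0)  # posted by the client to wake the server
        self.sem2 = threading.Semaphore(0)  # posted by the server to wake the client
        self.handler = handler

    def serve(self, transactions):
        for _ in range(transactions):
            self.sem1.acquire()             # sleep until a client has written
            size = self.shm["size"]         # step 2: read message size
            self.sem2.release()
            self.sem1.acquire()
            reply = self.handler(self.shm["data"][:size])  # step 4: read, process
            self.shm["size"] = len(reply)   # step 5: write reply size
            self.sem2.release()
            self.sem1.acquire()
            self.shm["data"] = reply        # step 7: write reply data
            self.sem2.release()

    def send(self, msg):
        with self.sem0:                     # compete with other clients for SEM0
            self.shm["size"] = len(msg)     # step 1
            self.sem1.release()
            self.sem2.acquire()
            self.shm["data"] = msg          # step 3
            self.sem1.release()
            self.sem2.acquire()
            size = self.shm["size"]         # step 6
            self.sem1.release()
            self.sem2.acquire()
            return self.shm["data"][:size]  # step 8; SEM0 is released on exit
```

Several client threads can call `send()` concurrently; SEM0 serialises them, so the eight steps of one transaction never interleave with those of another.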
3.6 Software Protocol Layers
The data link protocol sublayer in Fig. 2.4 implements standards from the TM Space Data Link Protocol of the CCSDS. This thesis refers to this protocol sublayer as TM. The transport layer in Fig. 2.4 implements a mission-specific stop-and-wait ARQ protocol. Other standardised transport layer protocols, such as the TCP-based Space Communications Protocol Specification Transport Protocol (SCPS-TP) and the CCSDS File Delivery Protocol (CFDP), would add unnecessary complexity to this design. The CFDP has the ability to manage files remotely over a space link; however, this is already done by the SCSS running on the SH4. The IS-HS 2 satellite only communicates with one ground station at a time, hence the network-management features of SCPS-TP would be excessive. Fig. 3.15 depicts the interaction between the application layer, ARQ, TM and the channel coding layer. The dashed lines indicate the direction of data flow between two layers.
Data transfer between software layers happens via the message passing IPC discussed in Section 3.5. A process receiving data acts as a server, while the sender is a client. The interface between TM and the channel coding layer differs between the SH4 and FIT-PC platforms. On the SH4, a QNX resource manager [35] is used to write to and read from the FPGA, whereas the FIT-PC uses USB-to-RS232 modules to communicate with the channel coding layer. Each module contains both a receive (RX) and a transmit (TX) thread. In general, the TX thread creates and transmits data, while the RX thread processes and extracts data from the received packet or frame. Variable-length packets are sent and received by ARQ, and fixed-length frames are sent and received by TM. A detailed discussion of the ARQ packet and TM frame structures is given later in this section.
[Figure: the software stack of application layer, ARQ protocol (transport layer) and TM protocol (data link layer) above the FPGA (channel coding) in hardware; each protocol layer has a TX and an RX path.]
Figure 3.15: Interaction between OSI software layers.
In Fig. 3.15, the application layer sends file names to ARQ's TX thread for transmission over the link. At the receiver, ARQ's RX thread notifies the application layer of received files. Also note that ARQ's RX and TX threads are connected to each other. Upon receiving an ARQ packet, the receiver has to acknowledge reception of this packet. A request to generate and transmit such an acknowledge is sent from RX to TX via this path. Both the application layer and ARQ's RX thread communicate with ARQ's TX thread, which is the reason for semaphore SEM0 in Fig. 3.14 of Section 3.5.
3.6.1 TM Protocol
The TM protocol manages data being sent and received over a space link. For example, it can carry the data streams of several devices, known as virtual channels (VC), simultaneously over a single space link. Data sent and received by TM are fixed-length frames of $k'$ bits. These frames are sent directly to and received from the channel coding layer in the FPGA. Since communication takes place between only one ground station and the satellite at any given moment, VC support is not required for the IS-HS 2. The most important function of TM is to split and reassemble ARQ packets at the transmitter and receiver respectively. A TM frame, shown in Fig. 3.16, consists of a header, a packet length field, a data section and a CRC field. Assuming the (511, 484) BCH FEC, the TM frame length of $k'$ bits must not exceed $k = 484$. Both the SH4 and FIT-PC address data at byte level, so $k'$ is the largest multiple of 8 that fits, $k' = 480$, and $k - k' = 4$ bits are not used in the 511-bit block. Information necessary for TM frame processing is present in the 48-bit header. The packet length field indicates the length in bytes of a new ARQ packet; this field is only present in a frame containing the start of an ARQ packet. The data section contains ARQ packet segments. Finally, the 16-bit CRC field is set by the channel coding layer, as stated in Section 3.2.1. The total TM frame overhead in bytes is:
$$ N_{ovh,TM} = N_{hdr\_len} + N_{arq\_len} + N_{crc\_len} = 6 + 4 + 2 = 12 \text{ bytes} \tag{3.6.1} $$
The 48-bit header is expanded in Fig. 3.17. Three fields are of interest here :
the master channel frame count, virtual channel frame count and first header
pointer (FHP). The remaining header fields are set as recommended by [1].
The master and virtual channel counts are the same as this design does not
use virtual channels. These counters are incremented by one for each TM
frame sent. A receiver uses them to detect a dropped frame in a sequence of
received frames. A dropped frame causes TM to discard all received frames
for the ARQ packet currently being reassembled.
The FHP indicates the offset after the frame header where a new ARQ packet starts. At this offset, the 32-bit packet length field shown in Fig. 3.16 is present, followed by an ARQ packet. Setting FHP = 0x7FF indicates that a packet spans the whole data section of the TM frame. FHP = 0x7FE indicates idle or unusable data in the whole frame. Such a frame is sent to synchronise the master frame count of the receiver and transmitter. An FHP < 0x7FE indicates an offset, as mentioned earlier. The FHP works well with ARQ schemes that transmit multiple packets at a time. Due to the operation of stop-and-wait ARQ, only one packet needs to be sent. Therefore, in this implementation, the FHP points to idle data when the first packet ends.
[Figure: TM frame of $k' = 480$ bits, consisting of a 48-bit header, a 32-bit packet length field, a 384-bit data section and a 16-bit CRC field.]
Figure 3.16: TM frame layout.
[Figure: 48-bit TM header fields in order: Master Channel Identifier, made up of the Transfer Frame Version Number (2 bits) and Spacecraft Identifier (10 bits); Virtual Channel Identifier (3 bits); Operational Control Field Flag (1 bit); Master Channel Frame Count (8 bits); Virtual Channel Frame Count (8 bits); and Transfer Frame Data Field Status, made up of the Transfer Frame Secondary Header Flag (1 bit), Synch. Flag (1 bit), Packet Order Flag (1 bit), Segment Length ID (2 bits) and First Header Pointer (11 bits).]
Figure 3.17: TM header layout [1].
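The following sketch packs and unpacks the 48-bit header of Fig. 3.17, most significant field first, and shows how a receiver could detect dropped frames from the 8-bit master channel frame count, assuming the counter wraps modulo 256. The field order follows the CCSDS TM Space Data Link Protocol; the dictionary keys and function names are hypothetical, and unspecified fields default to 0 here rather than to the values recommended by [1].

```python
# (field name, width in bits) in transmission order, per Fig. 3.17
TM_HEADER_FIELDS = (
    ("version", 2), ("spacecraft_id", 10), ("vcid", 3), ("ocf_flag", 1),
    ("mc_frame_count", 8), ("vc_frame_count", 8),
    ("sec_hdr_flag", 1), ("sync_flag", 1), ("packet_order_flag", 1),
    ("segment_length_id", 2), ("fhp", 11),
)
FHP_NO_PACKET_START = 0x7FF   # whole data section continues an earlier packet
FHP_IDLE = 0x7FE              # data section holds only idle data


def pack_tm_header(**fields) -> bytes:
    """Pack the 48-bit TM header, most significant field first."""
    value = 0
    for name, width in TM_HEADER_FIELDS:
        v = fields.get(name, 0)
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        value = (value << width) | v
    return value.to_bytes(6, "big")


def unpack_tm_header(raw: bytes) -> dict:
    value = int.from_bytes(raw, "big")
    out = {}
    for name, width in reversed(TM_HEADER_FIELDS):   # peel fields off the LSB end
        out[name] = value & ((1 << width) - 1)
        value >>= width
    return out


def frames_dropped(prev_count: int, new_count: int) -> int:
    """Frames missing between two consecutive 8-bit master channel frame counts."""
    return (new_count - prev_count - 1) % 256
```

A nonzero result from `frames_dropped` corresponds to the case in which TM discards the partially reassembled ARQ packet.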
Suppose a 110-byte ARQ packet has to be sent across the space link. The generated TM frames are shown in Fig. 3.18. Note that the shaded area in the data section of frame 2 indicates idle data.
A detailed TM frame processing implementation is provided in the next chapter.
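The FHP rules for a single stop-and-wait packet can be sketched in C as follows. This is a hypothetical helper, not the thesis's TM code. It assumes the packet, preceded by its 4-byte length field, starts at offset 0 of the first frame. The number of bytes per frame data section (data_len) is a parameter.

```c
#include <stddef.h>
#include <stdint.h>

#define FHP_NO_PACKET_START 0x7FF /* data section fully spanned by the packet */
#define LEN_FIELD_BYTES     4     /* 32-bit packet length field (Fig. 3.16) */

/* Number of TM frames needed for one ARQ packet of pkt_len bytes. */
size_t tm_frames_needed(size_t pkt_len, size_t data_len)
{
    size_t total = pkt_len + LEN_FIELD_BYTES;
    return (total + data_len - 1) / data_len;
}

/* FHP for frame idx of that sequence: the first frame starts the packet
 * (offset 0), middle frames are spanned (0x7FF), and in the last frame the
 * FHP points at the first idle byte after the packet. Assumption: a packet
 * that exactly fills its last frame leaves no idle data, so that frame is
 * also marked as spanned. */
uint16_t tm_fhp_for_frame(size_t idx, size_t pkt_len, size_t data_len)
{
    size_t total = pkt_len + LEN_FIELD_BYTES;
    size_t frames = tm_frames_needed(pkt_len, data_len);
    if (idx == 0)
        return 0;
    size_t tail = total % data_len;   /* packet bytes in the last frame */
    if (idx == frames - 1 && tail != 0)
        return (uint16_t)tail;
    return FHP_NO_PACKET_START;
}
```

With data_len = 56, a 110-byte packet needs three frames with FHP values 0, 0x7FF and 0x002. These match the per-frame byte counts (52, 56 and 2) and the FHP values drawn in Fig. 3.18.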
3.6.2 ARQ Protocol
Since TM cannot guarantee reliable data transfer, an ARQ strategy in the transport layer ensures reliable data transfer across the space link. The transmitting ARQ receives a file from the application layer and splits it into ARQ packets to be sent over the link. A receiving ARQ then reassembles the file from the received packets.
Figure 3.18: Setting FHP when ARQ packet spans multiple TM frames (a 110-byte ARQ packet, preceded by its packet length field of 110, carried as 52, 56 and 2 bytes in TM frames 0, 1 and 2 with FHP = 0, 0x7FF and 0x002).
In an ARQ strategy, packets sent by the transmitter must be acknowledged by the receiver. Each transmitted packet contains a sequence number in its header. Upon receiving a packet, the receiver places this sequence number in an ARQ acknowledge packet destined for the transmitter. Various ARQ strategies exist for handling this send and acknowledge scheme [47] :
• Stop-And-Wait ARQ : Only one packet is sent at a time. The trans-
mitter measures the duration between sending a packet and receiving its
acknowledge. Should a threshold duration be exceeded, the transmit-
ted packet or its acknowledge is assumed to be lost. The same packet
is then retransmitted. Data transmission is aborted when a maximum
retransmit count is exceeded for the current packet.
• Go-Back-N ARQ : This is an instance of sliding window ARQ. A window of l packets is sent while up to l acknowledges are outstanding. Packets are sent without waiting for an acknowledge between packet transmissions, and acknowledges are received in the order in which the packets were transmitted. However, should packet i for 0 ≤ i < l contain errors, the receiver rejects all subsequently received packets. The receiver sends a reject message indicating that it expects packet i, and all packets from i onwards are then retransmitted. Should the reject message get lost, a time out similar to that of stop-and-wait ARQ causes all packets from i onwards to be retransmitted.
• Selective-Reject ARQ : The transmitter continually sends packets and receives their respective acknowledges. A negative acknowledge message is received when a packet arrives at the receiver with errors. Unlike go-back-N ARQ, only the rejected packet is retransmitted. Lost acknowledges cause a time out at the transmitter, after which the particular packet is retransmitted.
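The stop-and-wait behaviour in the first bullet can be sketched in C. The transmission and time-out mechanics are hidden behind a hypothetical callback. A deterministic loss pattern stands in for the space link so that both the retransmit path and the abort path can be exercised.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical link callback: transmit packet seq and return true if its
 * acknowledge arrives before the time out expires, false otherwise. */
typedef bool (*send_and_wait_fn)(unsigned seq, void *ctx);

/* Stop-and-wait transfer of n_packets packets. A packet is resent until it
 * is acknowledged; once max_retx retransmissions have failed the transfer
 * is aborted. Returns the number of packets acknowledged. */
size_t saw_send_file(size_t n_packets, unsigned max_retx,
                     send_and_wait_fn send_and_wait, void *ctx)
{
    for (size_t seq = 0; seq < n_packets; seq++) {
        unsigned retx = 0;
        while (!send_and_wait((unsigned)seq, ctx)) {
            if (++retx > max_retx)
                return seq;          /* abort: packet seq never acknowledged */
        }
    }
    return n_packets;
}

/* Example link for trying the loop: attempt number k (counted over all
 * packets) is lost when lost[k] is nonzero. */
typedef struct { const int *lost; size_t n, next; } loss_pattern;

bool pattern_channel(unsigned seq, void *ctx)
{
    (void)seq;
    loss_pattern *p = ctx;
    bool lost = p->next < p->n && p->lost[p->next];
    p->next++;
    return !lost;
}
```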
Go-Back-N and selective-reject schemes have the best channel utilisation of all ARQ types [47]. Delays between subsequent packet transmissions are minimal, hence the data throughput is high. This implementation is expected to handle file sizes of 10 kB or less, so a high-throughput ARQ scheme is unnecessary. Owing to its simplicity [47], stop-and-wait ARQ is chosen for this design. The next chapter shows that stop-and-wait ARQ has sufficient data throughput for transferring 10 kB files during an aircraft pass.
3.6.2.1 Packet Structure
As shown in Fig. 3.19, an ARQ packet consists of a header and a data section. Each ground station and the satellite is assigned a unique 32-bit ID. A transmitter places the receiver's ID and its own ID in the destination and source address fields respectively. The satellite knows each ground station's ID. Since the satellite initiates communication, a ground station obtains the satellite's ID from the initial ARQ packets. Ground stations may be located close to each other and could all receive the same ARQ packet. By checking the packet's destination ID, a ground station can decide whether the packet is intended for it.
Figure 3.19: An ARQ packet structure (header fields: destination address, sequence number, data length, source address and packet type, 32 bits each; followed by a variable-length data section).
The packet type field distinguishes between normal and acknowledge types. Normal types are further divided into four subtypes : first, message, last and single. Type first signals the first packet of a new file. A packet that is neither the first nor the last is classified as message. Type single is used for the special case in which a file is only one packet long. Finally, type last marks the final packet of the current file.
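A possible C representation of the ARQ header follows. Two things are assumptions read from Fig. 3.19: the field order and the big-endian byte order. The numeric packet-type codes are hypothetical, since the text names the types but does not give their values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical numeric codes for the packet types named in the text. */
enum arq_type { ARQ_FIRST = 1, ARQ_MESSAGE, ARQ_LAST, ARQ_SINGLE, ARQ_ACK };

typedef struct {
    uint32_t dst;      /* destination address (receiver ID) */
    uint32_t seq;      /* sequence number */
    uint32_t data_len; /* length of the data section in bytes */
    uint32_t src;      /* source address (transmitter ID) */
    uint32_t type;     /* one of enum arq_type */
} arq_header;

#define ARQ_HDR_BYTES 20 /* five 32-bit fields, Eq. 3.6.2 */

static void put_u32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24); p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);  p[3] = (uint8_t)v;
}

static uint32_t get_u32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Serialise the header in the assumed field order, big-endian. */
void arq_header_pack(const arq_header *h, uint8_t out[ARQ_HDR_BYTES])
{
    put_u32(out, h->dst);
    put_u32(out + 4, h->seq);
    put_u32(out + 8, h->data_len);
    put_u32(out + 12, h->src);
    put_u32(out + 16, h->type);
}

/* A ground station keeps a packet only if it is the addressee. */
int arq_is_for_me(const uint8_t pkt[ARQ_HDR_BYTES], uint32_t my_id)
{
    return get_u32(pkt) == my_id;
}

/* Normal packet type for packet i of a file split into n packets. */
enum arq_type arq_type_for(size_t i, size_t n)
{
    if (n == 1)
        return ARQ_SINGLE;
    if (i == 0)
        return ARQ_FIRST;
    if (i == n - 1)
        return ARQ_LAST;
    return ARQ_MESSAGE;
}
```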
The sequence number field is used by the send-acknowledge scheme. After sending a packet and receiving its acknowledge, the transmitter increments this sequence number for the next packet. When acknowledging a sequence number, the receiving ARQ knows the next sequence number to expect. If the next packet's number does not match this expected number, a retransmitted packet has been received. Retransmitted packets are only acknowledged; their contents are discarded by the receiver. Lastly, the packet length field indicates the length in bytes of the data section. The total packet overhead in bytes is :
$$N_{ovh\_ARQ} = N_{src\_adr} + N_{des\_adr} + N_{pck\_typ} + N_{seq\_num} + N_{pck\_len} = 4 + 4 + 4 + 4 + 4 = 20 \text{ bytes} \tag{3.6.2}$$
A detailed packet processing flow diagram is provided in the next chapter.
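The receiver-side sequence check described above can be sketched as follows. The names are hypothetical, and a 32-bit sequence counter is assumed. Every normal packet is acknowledged with its own sequence number, but only a packet carrying the expected number is delivered to the file being reassembled.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t expected_seq; /* next sequence number the receiver expects */
} arq_rx;

/* Handle a received normal packet. *ack_seq receives the number to put in
 * the acknowledge packet. Returns true if the data is new (deliver it) and
 * false for a retransmission, whose contents are discarded. */
bool arq_rx_accept(arq_rx *rx, uint32_t seq, uint32_t *ack_seq)
{
    *ack_seq = seq;                 /* always acknowledge */
    if (seq != rx->expected_seq)
        return false;               /* retransmitted packet: ack only */
    rx->expected_seq++;
    return true;
}
```

A retransmission reaches the receiver when its earlier acknowledge was lost. Acknowledging it again lets the transmitter move on without the receiver duplicating data.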
3.6.2.2 Round Trip Time Calculation
A round trip time (RTT) refers to the delay between sending an ARQ packet
and receiving its acknowledge [3]. This parameter accounts for delays intro-
duced by data processing in the various OSI layers, and signal propagation
delays through the physical link. The RTT is a lower bound on the ARQ time-out value. Both the satellite and ground station platforms consist of many hardware devices and software layers, which makes it difficult to calculate an accurate RTT. This parameter is therefore measured once both the ground station and satellite platforms have been implemented. A measurement setup is shown in Fig. 3.20. The current time tA is taken when an ARQ packet leaves A. This packet travels through the downlink until it reaches the ARQ at B on the ground station. An acknowledge travels from B over the uplink before reaching C, where the current time tC is taken. Time stamps tA and tC are accurate to the millisecond when using the Unix clock_gettime() function. Finally, RTT = tC − tA.
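A minimal sketch of the RTT measurement with clock_gettime() follows. The text does not name the clock, so CLOCK_MONOTONIC is an assumption. Because tA and tC are both taken by the ARQ at A and C, presumably on the same platform, a monotonic clock keeps wall-clock adjustments out of the difference. The helper names are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

/* Time stamp taken at point A (packet leaves) or C (acknowledge arrives). */
struct timespec rtt_stamp(void)
{
    struct timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    return t;
}

/* RTT = tC - tA in milliseconds. */
double rtt_ms(const struct timespec *t_a, const struct timespec *t_c)
{
    return (double)(t_c->tv_sec - t_a->tv_sec) * 1e3 +
           (double)(t_c->tv_nsec - t_a->tv_nsec) / 1e6;
}
```

In use, rtt_stamp() would be called just before the packet is handed to TM and again when the matching acknowledge is received. The largest RTT measured then bounds the ARQ time out from below.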
3.7 FEC Block Error Rate Simulation
A receiver’s BLER at different Eb/No SNRs is simulated when transmitting
data over a noisy AWGN channel. Simulation is performed in Matlab and
on FPGA for both BCH and LDPC FEC schemes. Each simulation platform
consists of the following components :
• Message Generator : Creates a random k-bit message x.
• FEC Encoder : Encodes message x into n-bit codeword c for either
LDPC or BCH.
• QPSK Modulator : Modulates codeword c.
• AWGN Channel : Adds Gaussian distributed amplitude noise to both I
and Q channels.
• QPSK Demodulator : Demodulates the noisy QPSK signal.
• FEC Decoder : Decodes the received codeword c∗ using either LDPC or BCH.
Figure 3.20: Packet round trip time measurement (ARQ on the satellite and on the ground station, each above OSI software and hardware layers, linked by a 900 MHz downlink and a 2.4 GHz uplink; measurement points A, B and C).
A detailed flow diagram of the simulation is given in Fig. 3.21. First, the range of simulated SNRs and the maximum number of iterations per simulated SNR are set. This is followed by the simulation's main loop, in which a message is created, encoded, sent over an AWGN channel and decoded in each iteration.
The main loop begins by generating a random message x of k bits. This message is encoded with either the BCH or the LDPC scheme. The resulting codeword c is then mapped to QPSK symbols using Fig. 3.1a as reference. The symbol amplitude A_qpsk is derived from the simulated SNR using Eq. A.2.9 :
$$\frac{A_{qpsk}^2}{\sigma^2} = \frac{2E_b}{N_o} \tag{3.7.1}$$
where the noise variance σ² = 1. After noise has been added to the QPSK symbols, demodulation is performed. For BCH, the constellation diagram in Fig. A.1a is used to translate the received symbols into codeword c∗. Section 3.4.2 explained that MS LDPC decoders require an input LLR for each bit in c∗. These LLRs are calculated inside the decoder using only Pr(c∗i = 0) as measured by the demodulator.
Figure 3.21: General operation of simulator (set the SNR range SNR_Sim[0...max] and maximum iterations Itr_Max[0...max], zero Err_Cnt[0...max]; per iteration: create message, FEC encode, QPSK modulate, add AWGN noise, QPSK demodulate, FEC decode, count errors; if Err_Cnt[index] = 0 increment it; index++ until index > max; calculate BER and plot BER vs SNR).
An example of this measurement is now explained. Fig. 3.22 shows the position of a received symbol, SR, on the QPSK constellation diagram. As SR falls in S2's quadrant, SR is interpreted as S2. With S2 = [1, 0] = [b0, b1], probability values are assigned to bits b0 and b1. Since SR lies between S1 and S2, the demodulator only decides between b0 = 1 and b0 = 0, because b1 = 0 for both S1 and S2. Using angle φ in Fig. 3.22 to decide on b0, the following probabilities for b0 and b1 are obtained :
$$\Pr(b_0 = 0) = 1 - \Pr(b_0 = 1) = 1 - \frac{\frac{\pi}{2} - \phi}{\frac{\pi}{2}} \tag{3.7.2}$$
$$\Pr(b_1 = 0) = 1 \tag{3.7.3}$$
Figure 3.22: Bit probability decision making (QPSK constellation with S1 (00), S2 (10), S3 (11) and S4 (01), the I/Q decision boundaries, the angle π/4, received symbol SR and angle φ).
A BCH decoder receives c∗ and declares either failure or success after decoding. In case of a failure, Err_Cnt[index] is incremented by one. The LDPC decoder receives a vector containing Pr(c∗i = 0) for 0 ≤ i ≤ n − 1. After decoding completes, c is compared with the returned c′, and Err_Cnt[index] is incremented when c ≠ c′.
After Itr_Max[index] iterations have been completed for the current SNR, the next SNR is selected. Before this step, Err_Cnt[index] is incremented if Err_Cnt[index] = 0. A zero count would give BLER = 0, which cannot be displayed on the logarithmic BLER graph. When all SNRs have been simulated, the BLER for each SNR is calculated :
$$\mathrm{BLER} = \frac{Err\_Cnt[index]}{Itr\_Max[index]} \tag{3.7.4}$$
where Itr_Max[index] is the total number of FEC blocks received by the decoder at that SNR. The BLER values of Eq. 3.7.4 are then plotted against the SNR values.
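The loop of Fig. 3.21, together with Eq. 3.7.4, reduces to a few lines of C. The per-block work (encode, modulate, add noise, demodulate, decode, compare) is abstracted into a hypothetical callback that returns 1 on a block error. The example callback exists only to exercise the counting.

```c
#include <stddef.h>

/* Hypothetical hook: run one message through the chain at snr_db and
 * return 1 on a block error (decoder failure or c != c'), else 0. */
typedef int (*block_trial_fn)(double snr_db, void *ctx);

/* BLER for each simulated SNR, following Fig. 3.21 and Eq. 3.7.4. */
void simulate_bler(const double *snr_db, const unsigned long *itr_max,
                   size_t n_snr, block_trial_fn trial, void *ctx,
                   double *bler_out)
{
    for (size_t idx = 0; idx < n_snr; idx++) {
        unsigned long err_cnt = 0;
        for (unsigned long it = 0; it < itr_max[idx]; it++)
            err_cnt += (unsigned long)trial(snr_db[idx], ctx);
        if (err_cnt == 0)
            err_cnt = 1;            /* keep the point plottable on a log axis */
        bler_out[idx] = (double)err_cnt / (double)itr_max[idx];
    }
}

/* Example trial: every period-th call fails.
 * ctx points to {period, call counter}. */
int every_nth_fails(double snr_db, void *ctx)
{
    (void)snr_db;
    unsigned long *state = ctx;
    return (++state[1] % state[0]) == 0;
}
```

With the zero-count rule, an SNR at which no error was observed is plotted at 1/Itr_Max[index]. This value should be read as an upper bound rather than a measured BLER.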
3.7.1 Matlab Simulation
The software implementation of Fig. 3.21 is done in Matlab. Message x is a 1 × k row vector containing ones and zeros. These binary values are randomly generated using Matlab's rand() function.
Matlab also comes with a BCH toolkit, hence encoding is done with the function c = bchenc(x, n, k). Functions that perform LDPC encoding as per Section 3.4.3 have been written.
Since QPSK has two bits per symbol, the modulated codeword w is a (1 × n/2) row vector. This vector contains complex numbers whose real and imaginary parts represent the I and Q channels respectively. Using Matlab's randn(), Gaussian noise with variance σ² = 1 is added to both I and Q in w.
For BCH decoding, the toolkit function x = bchdec(c∗, n, k) is used. A function implementing the MS decoder has been written for LDPC. As the iterative message passing decoder requires the structure of the parity-check matrix, the function c′ = MinSum(c∗prob, H) is called with H as an argument. Argument c∗prob is a (1 × n) vector of Pr(c∗i = 0).
3.7.2 FPGA Simulation
This hardware simulator verifies the Matlab BLER simulation results using the FPGA implementations of both BCH and LDPC. Fig. 3.23 illustrates the simulator layout. A PC running a C application houses some of the components of the simulator in Fig. 3.21. An Altera Cyclone III EP3C120F780 FPGA, part of a Cyclone III DSP development board, contains both the encoder and the decoder for either BCH or LDPC. Bidirectional communication between the PC and FPGA occurs via a high speed 1.5 Mbps FTDI FT-232BL USB-To-UART module.
Figure 3.23: Hardware based BER simulation (a PC running the C simulation application connects through a USB-To-UART module to the FPGA's UART controller, which feeds the encoder (BCH or LDPC) and the decoder (BCH or LDPC)).
Sending data from the PC to the FPGA is preceded by a control byte bcntl. This
byte’s value signals the FPGA to route incoming data to either the encoder
or decoder. Values of bcntl are listed in Table 3.2. After receiving all data
for either the encoder or decoder, the FPGA acknowledges its reception by
sending byte 0x00 to the PC.
Value of bcntl   Description
0                Send data to encoder.
2                Send data to decoder.
Table 3.2: PC to FPGA control byte values.
During simulation, message x is created on the PC as an array of bytes having
a total of k bits. These are created at random using the standard rand()
function. After sending x to the FPGA for encoding, the PC waits for the
returned codeword c. By calling POSIX read() on the UART, the PC sleeps
until it receives data from the encoder.
Similar to Matlab’s simulation, c gets modulated after which Gaussian
noise is added. Since the C language does not provide a Gaussian random number generator, the GNU Scientific Library (GSL) is used to generate such numbers. After demodulation, c∗ is sent for decoding. The hardware decoder module for either LDPC or BCH does not declare a decoding failure; it simply discards the codeword c∗ without notifying external components. Therefore the PC sets an interrupt timer before calling read() to wait for the decoder. This timer allows the C application to wake up and continue the simulation instead of waiting indefinitely after a decoding failure.
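The PC side of the control-byte protocol, and the bounded wait for the decoder, can be sketched with POSIX calls. This is a swapped-in technique: poll() with a time out replaces the interrupt timer described above, and the function names are hypothetical. In the real setup fd would be the opened USB-To-UART device; a pipe is enough to try the logic.

```c
#include <poll.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

enum { BCNTL_ENCODER = 0, BCNTL_DECODER = 2 };  /* Table 3.2 */

/* Send the control byte followed by the payload. Returns 0 on success. */
int fpga_send(int fd, uint8_t bcntl, const uint8_t *data, size_t len)
{
    if (write(fd, &bcntl, 1) != 1)
        return -1;
    size_t done = 0;
    while (done < len) {
        ssize_t w = write(fd, data + done, len - done);
        if (w <= 0)
            return -1;
        done += (size_t)w;
    }
    return 0;
}

/* Read up to len bytes, giving up if nothing arrives within timeout_ms.
 * Returns the number of bytes read, 0 on time out (for example a silent
 * decoding failure), or -1 on error. */
ssize_t fpga_read_timeout(int fd, uint8_t *buf, size_t len, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };
    int r = poll(&p, 1, timeout_ms);
    if (r <= 0)
        return r;                   /* 0: timed out, -1: error */
    return read(fd, buf, len);
}
```

A time out is then counted as a block error, in the same way that a declared failure is counted in the Matlab simulation.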
After simulation, the BLER data is saved to a file. Using Matlab, this data
is plotted for comparison to the Matlab simulation results. These simulation
results are shown in Sections 5.2 and 5.3 of Chapter 5.
3.8 Summary
An FEC block length of n = 511 bits has been determined in this chapter. A (511,484) BCH code capable of correcting t = 3 errors would be sufficient for the estimated link quality. Detailed descriptions of the channel coding modules for both the satellite and ground station platforms have also been provided. Pseudo code segments described the behaviour of both the BCH and LDPC decoders. An LDPC code with a soft decision MS decoder will provide superior performance compared to a hard decision based decoder, and applying the two optimisation strategies mentioned will further enhance its performance. A message passing IPC scheme that uses shared memory and semaphores has been designed for Linux Ubuntu. This will provide a fast and synchronised communications medium between ARQ and TM. Frame and packet structures for TM and ARQ have also been presented. Finally, a Matlab and FPGA simulation strategy for FEC implementations has been described.
Chapter 4 will provide hardware architectures for all channel coding mod-
ules. The message passing API for IPC is also described. Finally, detailed
frame and packet processing routines for TM and ARQ are also presented.
Chapter 4
Implementation
This chapter starts with a description of the existing hardware configurations for both the satellite and ground station platforms. An FPGA implementation of channel coding is then presented. Finally, the packet processing routines for both TM and ARQ are provided.
4.1 Existing Hardware Layout
Section 2.2 showed which hardware modules are connected to each other on both the satellite and ground station platforms. This section explains in detail those component interfaces that are needed to implement channel coding. Note that the term de-assert refers to a logic high state or '1', whereas assert indicates a logic low state or '0'.
4.1.1 Satellite Platform
Figure 4.1: Channel coding on the satellite platform (SH4 OBC with QNX resource manager; expansion port lines Data 32-bit, Address 8-bit, CS, RDnWR, nRD, nWR, nIRL and clk; on the FPGA a VHDL expansion port module, glue logic, VHDL channel coding with D32, DS1 and ACK1, other VHDL modules, a clock generator, a 1-bit rst line and a VHDL UART connected over RS-232 to the XTend radio).
The Sun Space SH4 OBC interfaces with a Xilinx ML501 Virtex 5 FPGA
development board via its expansion port as per Fig. 4.1. A Digi International
9XTend-PKG-R radio used by the downlink is also connected via RS-232 to
the FPGA. Black connector lines between components indicate data direction.
Expansion port lines entering the FPGA are described and listed in Table
4.1. This interface provides bidirectional communication between the SH4
and FPGA. A VHDL expansion port module written by [35] allows the SH4 to
access a command register, status register and data FIFOs on the FPGA. The
command register is 32 bits wide and is used to reset all modules on the FPGA.
Modules are reset by de-asserting the red rst line. A transmit data FIFO buffers data between the SH4 and the XTend radio, whereas a receive FIFO stores data from the channel coding module. The status register indicates whether these FIFOs are full or empty. By asserting nIRL, the FPGA interrupts the SH4 when the receive FIFO is full. The interrupt is acknowledged only when the SH4 reads the status register, after which nIRL is de-asserted.
Expansion Port Line   Description
Data                  CPU's 32-bit data bus.
Address               An 8-bit segment of the CPU's address bus.
CS                    Chip select line. Asserted when the SH4 accesses the FPGA.
RDnWR                 De-asserted when the SH4 reads from the FPGA at the address on the address bus; asserted when the SH4 writes to the FPGA at that address.
nRD                   Asserted when reading a value from the data bus.
nWR                   Asserted when writing a value to the data bus.
nIRL                  Interrupts the SH4 when asserted.
clk                   Clock from the SH4.
Table 4.1: SH4 to FPGA expansion port lines and their description.
Channel coding interfaces with this expansion port module and other FPGA modules via glue logic, i.e. a module that connects two modules with different interfaces. Channel coding's interface consists of three entities : a 32-bit data bus D32, data strobe DS1 and acknowledge line ACK1. Note that each symbol's subscript indicates the number of bits on that line. Since the expansion port's FIFO has a 32-bit data input, channel coding also uses a 32-bit data bus.
4.1.2 Ground Station Platform
The FIT-PC connects to a Xilinx Spartan 3E starter kit FPGA board and a
9XTend-PKG-R radio via USB-To-RS232 modules in Fig. 4.2. Black connec-
tor lines indicate the direction of data flow between components.
Figure 4.2: Channel coding on the ground station platform (FIT-PC command interface application, USB and two USB-To-RS232 modules, XTend radio; on the FPGA a command interface module, USB glue logic, glue logic, VHDL channel coding with D32, DS1 and ACK1, other VHDL modules, a clock generator and a 1-bit rst line).
A command interface application on the FIT-PC transfers TM frames to the FPGA. The previously developed FPGA command interface module receives and acknowledges these TM frames. This module may also receive a reset command from the FIT-PC's command interface application, in which case all FPGA modules are reset by asserting the red rst line for a short period.
Channel coding also interfaces via glue logic to the command interface module and other FPGA modules. To keep the design procedure simple, channel coding's modules use the same interface as described in Section 4.1.1.
4.2 Channel Coding Implementation
4.2.1 Module Interface
Fig. 3.5 from Section 3.2 showed the order in which channel coding’s modules
are connected to each other. All modules connect with the same interface
which allows modules to be easily added, removed or replaced. No additional
glue logic is required when connecting any two modules with each other. This
design is very convenient when changing from BCH to LDPC and vice versa.
An interface for a general channel coding module is shown in Fig. 4.3. The
width in bits of each input and output is provided below the arrowed line. All
inputs and outputs are synchronised to the rising edge of the clock. Inputs
for clock, clk, and asynchronous reset, rst, are obtained from the blue and red
lines in Fig. 4.2. Data strobe ds_in is de-asserted when valid data is available
on dat_in. Input ds_in is rising-edge sensitive, meaning that the line must be
asserted for at least 1 clock cycle before de-asserting. After latching dat_in,
the module de-asserts acknowledge ack_out to inform the previous module
that dat_in has been received. Line ack_out is de-asserted for 1 clock cycle.
When outputting values on dat_out, the module keeps data strobe ds_out de-asserted until acknowledge ack_in is de-asserted. This is followed by assertion of ds_out.
Figure 4.3: Interface of a general channel coding module (CRC, BCH, LDPC, ASM, pseudo randomiser; inputs clk, rst, ds_in, ack_in of 1 bit and dat_in of 32 bits; outputs ack_out, ds_out of 1 bit and dat_out of 32 bits).
Timing diagrams for receiving data on dat_in and outputting data on dat_out are provided in Figs. 4.4a and 4.4b respectively. The vertical arrows on clk indicate that all other lines are toggled on the rising edge of clk.
Figure 4.4: Timing diagram of the channel coding module from Fig. 4.3 ((a) receiving data on dat_in: clk, ds_in, ack_out, dat_in stable; (b) outputting data on dat_out: clk, ds_out, ack_in, dat_out stable).
4.2.2 Module Internals
Each module implements a finite state machine (FSM), which controls when the module inputs, outputs and processes data. A single VHDL process is used for the FSM, as recommended by Xilinx [48]. The CRC, pseudo randomiser and ASM modules process data as it arrives on dat_in. In general this is true for all channel coding modules except FEC decoding, which requires 512 bits of data before processing starts. A general state diagram of a module is provided in Fig. 4.5.
Figure 4.5: State machine diagram of the module in Fig. 4.3.
There are four states, namely Idle, Busy, Output and Reset. Upon de-asserting rst in Fig. 4.3, the Idle state is entered. During reset, an expected data counter, exp_dat, is set to the number of data bits the module expects. Since dat_in is 32 bits wide, exp_dat is decremented by 32 after data has been latched.
During Idle the module waits for valid data on dat_in. Upon receiving
data, the Busy state is entered where data processing occurs. To keep this
state diagram simple, the Busy state represents a collection of many sub-
states during data processing. After data processing finishes, the Output state
puts data on dat_out for the next module. Variable exp_dat is evaluated
before returning to Idle. Should exp_dat = 0, no more data is expected by
this module. A Reset state is then entered where all internal registers and
exp_dat are restored as done for an asynchronous reset.
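A software model of the state machine in Fig. 4.5 can help to check a module's word accounting before the VHDL is written. The following C sketch is hypothetical and purely sequential. Each call carries one 32-bit word through Idle, Busy and Output, and the model enters Reset once exp_dat reaches zero.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { ST_IDLE, ST_BUSY, ST_OUTPUT, ST_RESET } mod_state;

typedef struct {
    mod_state state;
    unsigned exp_init;   /* bits expected per block, restored on reset */
    unsigned exp_dat;    /* bits still expected for the current block */
    uint32_t reg;        /* latched and processed word */
    unsigned blocks;     /* completed blocks (observation only) */
} chan_module;

/* Reset: load exp_dat, clear registers and enter Idle. */
void module_reset(chan_module *m, unsigned expected_bits)
{
    m->exp_init = expected_bits;
    m->exp_dat = expected_bits;
    m->reg = 0;
    m->state = ST_IDLE;
}

/* Carry one 32-bit word through Idle -> Busy -> Output, then back to Idle,
 * or through Reset to Idle when exp_dat reaches 0. process models the Busy
 * sub-states; NULL passes the word through unchanged.
 * Assumes expected_bits is a multiple of 32. */
uint32_t module_accept_word(chan_module *m, uint32_t dat_in,
                            uint32_t (*process)(uint32_t))
{
    m->reg = dat_in;             /* Idle: latch dat_in, pulse ack_out */
    m->exp_dat -= 32;
    m->state = ST_BUSY;
    if (process)
        m->reg = process(m->reg);
    m->state = ST_OUTPUT;        /* drive dat_out and ds_out */
    uint32_t dat_out = m->reg;
    if (m->exp_dat == 0) {
        m->state = ST_RESET;     /* restore registers and exp_dat */
        m->blocks++;
        module_reset(m, m->exp_init);
    } else {
        m->state = ST_IDLE;
    }
    return dat_out;
}
```

For a module that expects one 480-bit TM frame, the model returns to Idle after each of the first 14 words. The 15th word passes through Reset, and the model is then ready for the next frame.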
4.2.3 Parallel Data Processing
Both CRC and pseudo randomiser modules use shift registers when processing
their input data. As an example, Fig. 3.7 shows a serial implementation of the
randomiser’s register. Since the module accepts 32-bit data, it would require
32 clock cycles to process data with this register. To minimise this latency, a
parallel processing scheme is adopted. A technique proposed by [49] transforms
a serial shift register into a parallel shift register. Continuing the randomiser
example, a single shift and feedback operation is characterised by Eq. 4.2.1.
$$x' = Tx \tag{4.2.1}$$
$$\begin{bmatrix} x'_8 \\ x'_7 \\ x'_6 \\ x'_5 \\ x'_4 \\ x'_3 \\ x'_2 \\ x'_1 \end{bmatrix} = \begin{bmatrix} 0&1&0&1&1&1&1&1 \\ 1&0&0&0&0&0&0&0 \\ 0&1&0&0&0&0&0&0 \\ 0&0&1&0&0&0&0&0 \\ 0&0&0&1&0&0&0&0 \\ 0&0&0&0&1&0&0&0 \\ 0&0&0&0&0&1&0&0 \\ 0&0&0&0&0&0&1&0 \end{bmatrix} \begin{bmatrix} x_8 \\ x_7 \\ x_6 \\ x_5 \\ x_4 \\ x_3 \\ x_2 \\ x_1 \end{bmatrix}$$
Vector x is the serial register's state before a shift-feedback operation. Matrix T applies a single shift and feedback operation to x in Fig. 3.7. Therefore, multiplying any given state x by T yields the next state x′. The i-th state of x, denoted x^{(i)}, is :
$$x^{(i)} = T^{i}x \tag{4.2.2}$$
Matrix T^i is pre-computed in Matlab using modulo-2 arithmetic. Each row j of T^i indicates which bits of x are XORed to obtain row j of x^{(i)}. An example will now be given. Using Eq. 4.2.2 with T from Eq. 4.2.1, element x^{(2)}_8 of x^{(2)} = [x^{(2)}_8, x^{(2)}_7, x^{(2)}_6, x^{(2)}_5, x^{(2)}_4, x^{(2)}_3, x^{(2)}_2, x^{(2)}_1] is computed as follows :
$$x^{(2)}_8 = x_8 + x_6 + x_5 + x_4 + x_3 + x_2 \tag{4.2.3}$$
where '+' indicates XOR. The other elements of x^{(2)} are calculated similarly. This technique allows x^{(2)} to be computed in less than a clock cycle. Since the randomiser register is only 8 bits wide, it can be configured to randomise a 32-bit word in less than a clock cycle. A 32-bit register, y32, is initialised with values y32 = [x^{(24)}, x^{(16)}, x^{(8)}, x^{(0)}], which have been pre-computed in Matlab. Register y32 now contains the first 32 bits as output by a serial LFSR. Input data from dat_in is bit-wise XORed with register y32 before the result is output on dat_out. Finally, y32 is updated by letting y32 ← T^{32} × y32, such that y32 = [x^{(24+32)}, x^{(16+32)}, x^{(8+32)}, x^{(0+32)}].
The polynomial division circuit for CRC also uses an LFSR. By determining its T, it could be parallelised in a similar way.
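The matrix form of the randomiser can be checked in C. The sketch below encodes T from Eq. 4.2.1 as eight row masks (bit 7 = x8, bit 0 = x1) and computes T^i by repeated squaring over GF(2). It then compares the result with a serial shift register. The bit ordering and helper names are assumptions; the thesis pre-computes T^i in Matlab.

```c
#include <stdint.h>

/* 8x8 GF(2) matrix: row r is a mask over the state (bit 7 = x8 ... bit 0 = x1);
 * row 0 produces x8', row 7 produces x1'. */
typedef struct { uint8_t row[8]; } gf2mat8;

/* Eq. 4.2.1: single shift-and-feedback matrix T of Fig. 3.7. */
static const gf2mat8 T1 = {{ 0x5F, 0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02 }};

static int parity8(uint8_t v)
{
    v ^= v >> 4; v ^= v >> 2; v ^= v >> 1;
    return v & 1;
}

/* y = M x over GF(2): each output bit is the XOR of the selected state bits. */
uint8_t gf2_apply(const gf2mat8 *m, uint8_t x)
{
    uint8_t y = 0;
    for (int r = 0; r < 8; r++)
        y |= (uint8_t)(parity8(m->row[r] & x) << (7 - r));
    return y;
}

/* C = A B over GF(2): row r of C is the XOR of the rows of B selected by row r of A. */
gf2mat8 gf2_mul(const gf2mat8 *a, const gf2mat8 *b)
{
    gf2mat8 c = {{0}};
    for (int r = 0; r < 8; r++)
        for (int j = 0; j < 8; j++)
            if ((a->row[r] >> (7 - j)) & 1)
                c.row[r] ^= b->row[j];
    return c;
}

/* T^i by repeated squaring (Eq. 4.2.2). */
gf2mat8 gf2_pow(const gf2mat8 *t, unsigned i)
{
    gf2mat8 result = {{0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01}};
    gf2mat8 base = *t;
    while (i) {
        if (i & 1)
            result = gf2_mul(&result, &base);
        base = gf2_mul(&base, &base);
        i >>= 1;
    }
    return result;
}

/* Reference serial step x' = T x: shift towards x1, feed back into x8. */
uint8_t lfsr_step(uint8_t x)
{
    return (uint8_t)((x >> 1) | (parity8(x & 0x5F) << 7));
}

/* y32 <- T^32 y32, applied to each of the four 8-bit states in y32. */
uint32_t y32_advance(uint32_t y32, const gf2mat8 *t32)
{
    uint32_t out = 0;
    for (int k = 0; k < 4; k++) {
        uint8_t s = (uint8_t)(y32 >> (8 * k));
        out |= (uint32_t)gf2_apply(t32, s) << (8 * k);
    }
    return out;
}
```

Row 0 of T^2 comes out as 0xBE, i.e. x8 + x6 + x5 + x4 + x3 + x2, which reproduces Eq. 4.2.3.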
4.3 BCH Implementation
4.3.1 Polynomial Division Register
An example circuit in Fig. 4.6 divides polynomial c(X) by g(X) until the remainder r(X) remains in the register. Letting g(X) = 1 + X + X² results in r(X) having a degree of at most 1. Polynomial g(X) = g0 + g1X + g2X² is depicted as g = [g0, g1, g2] = [1, 1, 1] in Fig. 4.6. Similarly, the coefficients of c(X) are denoted c = [c0, c1, · · · , cn−1]. Register r is zeroed before division starts. Starting at cn−1, c is shifted into r, one bit per clock cycle. Whenever r1 is a 1, the next shift results in r(X) = r0 + r1X + X². In this case g(X) is subtracted from r(X), so that r(X) is again of degree at most 1. This circuit essentially performs long division, as illustrated by the binary long division example in Fig. 3.6 of Section 3.2.1.
Figure 4.6: Serial implementation of a polynomial division LFSR (input c(X) = [c0, c1, · · · , cn−1] shifted into the r(X) register [r0, r1], with feedback set by the g(X) coefficients g0, g1, g2).
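The serial division register of Fig. 4.6 can be modelled in C. Coefficients enter highest order first. Whenever the register's top coefficient becomes 1, g(X) is subtracted (XORed). Polynomials are represented as bit masks with bit i holding the coefficient of X^i; this representation is an assumption, so g(X) = 1 + X + X² is written as 0x7.

```c
#include <stddef.h>
#include <stdint.h>

/* Remainder of c(X) / g(X) over GF(2), computed like the serial LFSR of
 * Fig. 4.6. c[i] holds coefficient c_i; bits enter from c_{n-1} down to c_0.
 * g includes its leading term, and deg_g must be at most 31. */
uint32_t poly_mod(const uint8_t *c, size_t n, uint32_t g, unsigned deg_g)
{
    uint32_t r = 0;
    for (size_t k = n; k-- > 0;) {
        r = (r << 1) | (c[k] & 1u);   /* multiply by X, add next coefficient */
        if ((r >> deg_g) & 1u)
            r ^= g;                   /* subtract g(X): degree back below deg_g */
    }
    return r;
}
```

For systematic encoding, the parity bits are the remainder of X^(n−k)m(X) divided by g(X). As a small check, the (7,4) Hamming code is the single-error-correcting BCH code with g(X) = 1 + X + X³ (0xB). Here X³ mod g(X) = 1 + X, so 1 + X + X³ is a codeword and leaves a zero remainder.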
4.3.2 Encoder
This module's state machine behaves exactly as depicted in Fig. 4.5. A value received on dat_in is processed before being output on dat_out. Since BCH uses polynomial division for encoding, the module uses an LFSR to implement this division. After a complete TM frame has been processed, this 27-bit LFSR contains the remainder after division; how this value is handled is explained below. By applying the parallelisation technique of Section 4.2.3, this module also processes a 32-bit word in one clock cycle.
Fig. 4.7 shows a codeword as output by the encoder module. Since the module uses a 32-bit data bus, the 511-bit codeword is modified to be 32-bit aligned. Note first that the polynomial view of codeword c(X) in Eq. 2.5.10 has its highest-order term on the left. Adding a 0 to the left of this term aligns the codeword while leaving c(X) unchanged. The first TM frame segment to be received and processed is TM0 in Fig. 4.7. After TM0 has been processed, the first codeword segment, CDW0, is output. Processing continues until the final segment, CDW15, is constructed. Here, the all-zero 4-bit segment in CDW15 is unused space, as explained in Section 3.6.1. The 27-bit FEC redundancy is obtained from the polynomial division LFSR, as mentioned earlier.
Figure 4.7: Codeword as constructed by BCH encoder (the 480-bit TM frame, in 32-bit segments TM0 to TM14, placed in a 512-bit codeword of 32-bit segments CDW0 to CDW15 as a leading 0 bit, the TM frame, a 4-bit all-zero field and the 27-bit FEC redundancy).
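The 32-bit alignment of Fig. 4.7 can be reproduced in software as follows. This sketch follows one reading of the figure. The bit stream, most significant bit first, is a leading 0, the 480 TM bits, four zero bits and the 27-bit LFSR remainder. Because of the leading 0, each output word is its TM word shifted right by one bit, with the last bit of the previous TM word carried in. The function name is hypothetical.

```c
#include <stdint.h>

/* Assemble the 16 codeword words CDW0..CDW15 from TM0..TM14 and the
 * 27-bit remainder: 1 zero bit | 480 TM bits | 0000 | 27 parity = 512 bits. */
void bch_assemble(const uint32_t tm[15], uint32_t parity27, uint32_t cdw[16])
{
    uint32_t carry = 0;                 /* bit pushed out of the previous word */
    for (int i = 0; i < 15; i++) {
        cdw[i] = (carry << 31) | (tm[i] >> 1);
        carry = tm[i] & 1u;
    }
    /* CDW15: last TM bit | 4 unused zero bits | 27-bit FEC redundancy */
    cdw[15] = (carry << 31) | (parity27 & 0x07FFFFFFu);
}
```

In hardware, the one-bit carry is a single register between successive output words, so the alignment adds no latency.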
4.3.3 Decoder
4.3.3.1 State Diagram
Fig. 4.8 shows the decoder’s state diagram. On asynchronous reset, it enters
the Idle state. Received 32-bit data is stored in a FIFO, while at the same
time being passed to the syndrome calculation module. Here, the received
codeword c∗(X) is simultaneously divided by each primitive polynomial φi(X),
as mentioned in Section 3.3.2. The decoder returns to Idle if it expects more
data for the current c∗(X).
After all 512 bits of c∗(X) have been received, the 2t = 6 calculated syndromes are moved to the Berlekamp-Massey module, which finds σ(X). The coefficients of σ(X) are then moved to the Chien search module. Before the search starts, counter is set to 32, since c∗(X) is processed in 32-bit segments. Polynomial evaluation σ(α^i) is performed as depicted in Fig. 3.9 from Section 3.3.2. After σ(α^i) has been evaluated for an α^i, counter is decremented, until 32 such evaluations have been done. This yields a 32-bit error vector e, which is XORed with a 32-bit segment from the c∗(X) FIFO. The corrected segment is then output on dat_out for the next module. Chien search continues until all data in the FIFO has been processed and output. Finally, a Reset state clears all registers and restores the counters before the Idle state is entered again.
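A textbook Chien search over GF(2^9) is sketched below. The field polynomial x^9 + x^4 + 1 is an assumption; the thesis defines its field in Section 3.3. The sketch also evaluates all 511 positions one at a time, whereas the hardware evaluates them 32 per segment. A root σ(α^i) = 0 marks an error at bit position (511 − i) mod 511.

```c
#include <stdint.h>

#define GF_M    9
#define GF_N    511      /* 2^9 - 1 */
#define GF_PRIM 0x211    /* x^9 + x^4 + 1: assumed field polynomial */

static uint16_t gf_exp[2 * GF_N];
static int16_t  gf_log[GF_N + 1];

/* Build the power (exp) and logarithm tables of GF(2^9). */
void gf_init(void)
{
    uint16_t v = 1;
    for (int i = 0; i < GF_N; i++) {
        gf_exp[i] = gf_exp[i + GF_N] = v;
        gf_log[v] = (int16_t)i;
        v <<= 1;
        if (v & (1u << GF_M))
            v ^= GF_PRIM;
    }
    gf_log[0] = -1;      /* log of 0 is undefined */
}

uint16_t gf_mul(uint16_t a, uint16_t b)
{
    if (a == 0 || b == 0)
        return 0;
    return gf_exp[gf_log[a] + gf_log[b]];
}

/* Chien search: evaluate sigma(alpha^i) for i = 1..GF_N, where sigma[0..deg]
 * are the error-locator coefficients. Writes up to max_pos error positions
 * to pos and returns how many were written. */
int chien_search(const uint16_t *sigma, int deg, int *pos, int max_pos)
{
    int found = 0;
    for (int i = 1; i <= GF_N; i++) {
        uint16_t sum = 0;
        for (int j = 0; j <= deg; j++)
            sum ^= gf_mul(sigma[j], gf_exp[(i * j) % GF_N]);
        if (sum == 0 && found < max_pos)
            pos[found++] = (GF_N - i) % GF_N;
    }
    return found;
}
```

For errors at positions 5 and 300, σ(X) = (1 + α^5 X)(1 + α^300 X). The search finds both positions, reporting 300 first because i increases from 1.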
Figure 4.8: State machine diagram of a BCH decoder.
4.3.3.2 Hardware Layout
A hardware layout is presented in Fig. 4.9. The FSM component implements
the state machine in Fig. 4.8. Bold blue lines represent a collection of wires
that allows the FSM to control a component and read its status bits. These
status bits indicate whether a component is done with its current operation,
therefore allowing the FSM to make state change decisions if necessary. Control
lines are used for example to read from the FIFO during Chien search.
Syndrome coefficients are transferred sequentially to the BM module via the indicated interface. A value on data bus D9 is transferred when data strobe DS1 is de-asserted, and the BM module acknowledges data reception by de-asserting ACK1. Coefficients are 9-bit GF values, as explained in Section 3.3, hence D9 is also 9 bits wide. Data is transferred between the BM and Chien modules in the same way. Once error vector e is ready, the FSM issues a read command to the FIFO, and FIFO_Data is XORed with e before being output on dat_out.
Figure 4.9: A BCH decoder's hardware layout (FSM logic with control logic driving the syndrome calculator, Berlekamp-Massey and Chien search modules, linked by D9, DS1 and ACK1, and a 512-bit FIFO; 32 XOR logic gates combine FIFO_Data with e; ports dat_in, ds_in, ack_out, dat_out, ack_in, ds_out).
4.4 LDPC Implementation
4.4.1 Matrix Multiplication Circuit
A circuit for multiplying a row vector by a cyclic matrix is presented in this section. Suppose a (1 × 4) row vector x has to be multiplied by the (4 × 4) cyclic matrix T. Example values are provided below :
$$z = x \times T = \begin{bmatrix} x_0 & x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} 1&1&0&0 \\ 0&1&1&0 \\ 0&0&1&1 \\ 1&0&0&1 \end{bmatrix} \tag{4.4.1}$$
Assuming modulo-2 arithmetic, the multiplier architecture discussed in [4] is applied to Eq. 4.4.1 in Fig. 4.10. Register zreg = [z0, z1, z2, z3] contains the result after multiplication.
Initially zreg = 0, xreg = [x0, x1, x2, x3] and Treg = [1, 1, 0, 0], which is the first row of T. First, the AND gates multiply x0 with Treg, and the XOR gates add this product to zreg. Treg is then rotated one bit to the right, so that it becomes the second row of T, and xreg is rotated one bit position upwards, such that xreg ← [x1, x2, x3, x0]. Value x1 is then multiplied with Treg and the result is added to zreg. Continuing this multiply-add procedure for each element of xreg eventually yields zreg = (x × T).
Figure 4.10: Cyclic matrix multiplier architecture from [4] (registers Treg, xreg and zreg connected through four AND gates and four XOR gates).
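The multiply-add sequence above can be sketched in C as a behavioural model. This is an illustration under stated assumptions, not the thesis's VHDL: registers are modelled as arrays of 0/1 bytes, the dimension is fixed at N = 4 to match Eq. 4.4.1, and the function and helper names are hypothetical. Each outer loop iteration corresponds to one clock cycle, in which the AND gates multiply the top element of xreg with Treg and the XOR gates accumulate the product into zreg.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N 4  /* matrix dimension of the example in Eq. 4.4.1 */

/* Rotate right by one: [a0, a1, a2, a3] -> [a3, a0, a1, a2] (next row of T). */
static void rotate_right(uint8_t v[N])
{
    uint8_t last = v[N - 1];
    memmove(&v[1], &v[0], N - 1);
    v[0] = last;
}

/* Rotate "upwards" by one: [a0, a1, a2, a3] -> [a1, a2, a3, a0]. */
static void rotate_up(uint8_t v[N])
{
    uint8_t first = v[0];
    memmove(&v[0], &v[1], N - 1);
    v[N - 1] = first;
}

/* z = x * T over GF(2) for a cyclic T described only by its first row
   (hypothetical name). One outer iteration models one clock of Fig. 4.10. */
static void cyclic_mul_and_xor(const uint8_t x[N], const uint8_t t_row0[N], uint8_t z[N])
{
    uint8_t xreg[N], treg[N];
    memcpy(xreg, x, N);
    memcpy(treg, t_row0, N);
    memset(z, 0, N);                              /* zreg = 0 */
    for (int clk = 0; clk < N; clk++) {
        for (int j = 0; j < N; j++)
            z[j] ^= (uint8_t)(xreg[0] & treg[j]); /* AND gates, then XOR into zreg */
        rotate_right(treg);                       /* Treg becomes the next row of T */
        rotate_up(xreg);                          /* next element of x moves to the top */
    }
}
```

With x = [1, 0, 1, 1] the model returns z = [0, 1, 1, 0], the XOR of rows 0, 2 and 3 of T.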
The multiplier in Fig. 4.10 allows any (4 × 4) cyclic matrix T to be multiplied by any (1 × 4) vector x. Removing Treg reduces the logic usage; however, the multiplier is then hard-wired to one specific T and can no longer be used for an arbitrary T. An optimised circuit is shown in Fig. 4.11. Note that all AND gates and three XOR gates have been removed.
Figure 4.11: A reduced complexity cyclic matrix multiplier architecture (a single XOR gate between registers xreg and zreg).
The two inputs of the XOR gate are x0 and x3, which represents x multiplied by the leftmost (first) column of T, since that column has ones in rows 0 and 3. The result is stored in z0, after which both xreg and zreg are rotated such that xreg ← [x1, x2, x3, x0] and zreg ← [z1, z2, z3, z0]. The same gate now multiplies x by the second column of T, and z1 stores the result. This rotate-add operation is repeated for each element of zreg.
Figure 4.12: A parallelised implementation of Fig. 4.11 (two XOR gates between registers xreg and zreg).
Fig. 4.12 shows a parallelised implementation of the design in Fig. 4.11, which performs the multiplication twice as fast: instead of multiplying x with one column of T per clock cycle, it multiplies x with two consecutive columns of T. Both zreg and xreg are therefore rotated by two bit positions instead of one. Parallelisation can be extended until zreg = (x × T) is computed in a single clock cycle.
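The reduced-complexity multiplier of Fig. 4.11 and its parallelised form in Fig. 4.12 can both be captured by one parameterised C model, sketched below under assumptions: registers are arrays of 0/1 bytes, MAX_N = 512 is an arbitrary bound, and cyclic_mul_parallel() is a hypothetical name. Each of the F XOR trees computes one column of T per clock; its taps depend only on T, so in hardware they reduce to fixed wiring, while the model simply recomputes them. F = 1 corresponds to Fig. 4.11, F = 2 to Fig. 4.12, and F = n computes x × T in a single cycle.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_N 512

/* Rotate v[0..n-1] upwards by k: v <- [v_k, ..., v_{n-1}, v_0, ..., v_{k-1}]. */
static void rotate_up_by(uint8_t *v, int n, int k)
{
    uint8_t tmp[MAX_N];
    for (int m = 0; m < n; m++)
        tmp[m] = v[(m + k) % n];
    memcpy(v, tmp, (size_t)n);
}

/* z = x * T over GF(2), where row r of the (n x n) cyclic T is t_row0 rotated
   right by r. F columns are produced per clock by F XOR trees.
   Returns the number of clock cycles, n / F. */
static int cyclic_mul_parallel(const uint8_t *x, const uint8_t *t_row0,
                               int n, int F, uint8_t *z)
{
    assert(n >= 1 && n <= MAX_N && F >= 1 && n % F == 0);
    uint8_t xreg[MAX_N], zreg[MAX_N];
    memcpy(xreg, x, (size_t)n);
    memset(zreg, 0, (size_t)n);
    int cycles = 0;
    for (int col = 0; col < n; col += F) {
        for (int f = 0; f < F; f++) {
            uint8_t acc = 0;
            /* xreg[m] holds x_{(col+m) mod n}, so tap m is needed iff
               T[col+m][col+f] == 1, i.e. iff t_row0[(f - m) mod n] == 1 */
            for (int m = 0; m < n; m++)
                if (t_row0[(f - m + n) % n])
                    acc ^= xreg[m];
            zreg[f] = acc;
        }
        rotate_up_by(xreg, n, F);   /* both registers rotate by F positions */
        rotate_up_by(zreg, n, F);
        cycles++;
    }
    memcpy(z, zreg, (size_t)n);     /* after n/F steps zreg is back in order */
    return cycles;
}
```

For the T of Eq. 4.4.1, F = 1 taps positions 0 and 3 of xreg, exactly the single XOR gate described above.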
4.4.2 Encoder
4.4.2.1 Frame Structure
A 480-bit TM frame would require a half-rate LDPC code with (n, k) = (960, 480). However, all channel coding modules have been designed around the 512-bit block length of BCH. To fit the LDPC module into the current hardware configuration, which uses BCH as FEC, a TM frame is split into two 240-bit sections. These two sections are encoded individually using an (n, k) = (512, 256) LDPC code. Fig. 4.13 illustrates how a 480-bit TM frame is split: in each codeword, a 16-bit frame ID precedes the 240-bit TM section. The decoder expects codeword 0 to arrive first, followed by codeword 1. If the decoder does not receive two successive codewords in this order, it discards all received TM sections.
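The ordering rule at the decoder can be sketched as a small C state machine. The byte layout (a big-endian 16-bit frame ID in the first two bytes of each 256-bit information word) and all names are assumptions for illustration; the sketch only shows that codeword 0 must be followed directly by codeword 1, and that any other sequence discards the partially assembled frame.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TM_FRAME_BYTES   60   /* 480-bit TM frame */
#define TM_SECTION_BYTES 30   /* 240-bit TM section */
#define INFO_BYTES       32   /* 16-bit frame ID + 240-bit TM section */

/* Hypothetical reassembly state; frame ID assumed big-endian in bytes 0-1. */
typedef struct {
    uint8_t frame[TM_FRAME_BYTES];
    int expect_id;            /* 0: waiting for codeword 0, 1: for codeword 1 */
} tm_reassembler;

/* Feed one decoded information word. Returns 1 when a complete TM frame is in
   r->frame, else 0. An unexpected frame ID discards everything received. */
static int tm_reassemble(tm_reassembler *r, const uint8_t info[INFO_BYTES])
{
    int id = (info[0] << 8) | info[1];
    if (id != r->expect_id) {
        memset(r->frame, 0, sizeof r->frame);  /* drop the partial frame */
        r->expect_id = 0;
        return 0;
    }
    memcpy(&r->frame[id * TM_SECTION_BYTES], &info[2], TM_SECTION_BYTES);
    r->expect_id = (id == 0) ? 1 : 0;
    return id == 1;
}
```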
Figure 4.13: A modified TM frame structure for half code rate LDPC (the 480-bit TM frame is split into two 240-bit TM sections; each 512-bit codeword contains a 16-bit frame ID, a 240-bit TM section and 256 bits of FEC redundancy, with frame ID = 0 in codeword 0 and frame ID = 1 in codeword 1).
4.4.2.2 State Diagram
On reset, the Idle state is entered with variable exp_dat = 480, as shown in Fig. 4.14. Data received from the CRC module in 32-bit words is stored directly in a 256-bit register called xreg, whose first 16 bits hold the frame ID depicted in Fig. 4.13. After the first 256 bits have been received from CRC, exp_dat = 224. Only 16 bits of the last received 32-bit word are stored in xreg; the other 16 bits are buffered in a register, tmp_16. Register xreg is then used to create the codeword register creg = [xreg, preg], where register preg holds the parity section.
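The bookkeeping between xreg and tmp_16 can be checked with the following C model, under assumptions: the 480-bit TM frame arrives as fifteen 32-bit words, bits are ordered MSB first within each word, and xreg is modelled as sixteen 16-bit halves with the frame ID in the first half. The type and function names are hypothetical. Codeword 0 receives the frame ID and 7.5 words; the remaining half-word is held in tmp_16 and becomes the first TM data of codeword 1, followed by the remaining 224 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a 256-bit xreg image as sixteen 16-bit halves,
   element 0 being the most significant (the frame ID). */
typedef struct { uint16_t half[16]; } xreg_image;

/* Distribute the fifteen 32-bit words of a 480-bit TM frame over the xreg
   images of codeword 0 and codeword 1. */
static void gather_codewords(const uint32_t words[15], xreg_image out[2])
{
    uint16_t tmp_16 = 0;
    out[0].half[0] = 0;                              /* frame ID 0 */
    for (int k = 0; k < 8; k++) {                    /* first 256 bits from CRC */
        out[0].half[1 + 2 * k] = (uint16_t)(words[k] >> 16);
        if (k < 7)
            out[0].half[2 + 2 * k] = (uint16_t)(words[k] & 0xFFFFu);
        else
            tmp_16 = (uint16_t)(words[k] & 0xFFFFu); /* buffered for codeword 1 */
    }
    out[1].half[0] = 1;                              /* frame ID 1 */
    out[1].half[1] = tmp_16;
    for (int k = 0; k < 7; k++) {                    /* remaining exp_dat = 224 bits */
        out[1].half[2 + 2 * k] = (uint16_t)(words[8 + k] >> 16);
        out[1].half[3 + 2 * k] = (uint16_t)(words[8 + k] & 0xFFFFu);
    }
}
```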
Vector preg = [p1, p2], where p1 is calculated from Eq. 3.4.11 using the following multiplication sequence [30]:
1. $a_0 = A\,x_{reg}^T$ and $a_1 = C\,x_{reg}^T$
2. $a_2 = T^{-1} a_0 = a_0$
3. $a_3 = E\,a_2$
4. $a_4 = \phi^{-1} a_3 = a_3$
Finally, $p_1^T = a_1 \oplus a_4$, where $\oplus$ denotes a bitwise XOR. Since $T^{-1}$ and $\phi^{-1}$ are both identity matrices, the multiplications in steps 2 and 4 are omitted. Next, vector p2 is calculated using Eq. 3.4.12:
1. $b_0 = A\,x_{reg}^T$ and $b_1 = B\,p_1^T$
2. $b_2 = T^{-1}(b_0 \oplus b_1) = b_0 \oplus b_1$
This gives $p_2^T = b_0 \oplus b_1$; as for p1, the multiplication by $T^{-1}$ is omitted. The StoreRegister state then stores xreg and preg in creg, as explained earlier. In vector $c_{reg} = [c_{n-1}, c_{n-2}, \dots, c_0]$, bit $c_{n-1}$ is the most significant bit (MSB). The most significant 32 bits of creg are output during the Output state. When ack_in is de-asserted, creg is shifted 32 bits to the left, so that $c_{reg} \leftarrow [c_{n-33}, c_{n-34}, \dots, c_0, \mathbf{0}]$, where the bold $\mathbf{0}$ denotes 32 zeros. Variable dat_rem tracks the number of bits remaining in creg. After codeword 0 has been sent, frame_id = 1 is selected for codeword 1; likewise, once codeword 1 has been sent, frame_id = 0 is selected. The Reset state clears all multiplier registers and variables except frame_id and tmp_16, which are cleared only on asynchronous reset. Note that exp_dat is only restored after codeword 1 has been sent. Register xreg is then loaded with frame_id and tmp_16, and the remaining exp_dat bits from CRC are gathered and stored in xreg. Codeword 1 is created by following the same procedure as codeword 0.
Figure 4.14: State machine diagram of an LDPC encoder.
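Once $T^{-1}$ and $\phi^{-1}$ are identity matrices, the parity sequence above reduces to two products per parity vector. The C sketch below assumes dense, row-major GF(2) matrices of 0/1 bytes and the usual approximate lower-triangular partition H = [A B T; C D E]; ru_encode_parity(), gf2_matvec() and MAXD are hypothetical names, and the dense representation is for clarity only (the hardware uses the cyclic multipliers of Section 4.4.1). It computes p1 = C x ⊕ E(A x) and p2 = A x ⊕ B p1, reusing a0 as b0 as the hardware does.

```c
#include <assert.h>
#include <stdint.h>

#define MAXD 256

/* y = M * v over GF(2); M is rows x cols, stored row-major as 0/1 bytes. */
static void gf2_matvec(const uint8_t *M, int rows, int cols,
                       const uint8_t *v, uint8_t *y)
{
    for (int r = 0; r < rows; r++) {
        uint8_t acc = 0;
        for (int c = 0; c < cols; c++)
            acc ^= (uint8_t)(M[r * cols + c] & v[c]);
        y[r] = acc;
    }
}

/* Parity computation assuming T^{-1} = I and phi^{-1} = I.
   A: (m-g) x k, B: (m-g) x g, C: g x k, E: g x (m-g). */
static void ru_encode_parity(const uint8_t *A, const uint8_t *B,
                             const uint8_t *C, const uint8_t *E,
                             int k, int m, int g, const uint8_t *x,
                             uint8_t *p1, uint8_t *p2)
{
    uint8_t a0[MAXD], a1[MAXD], a3[MAXD], b1[MAXD];
    assert(g >= 1 && m - g >= 1 && m - g <= MAXD && g <= MAXD);
    gf2_matvec(A, m - g, k, x, a0);   /* a0 = A x^T (re-used as b0) */
    gf2_matvec(C, g, k, x, a1);       /* a1 = C x^T */
    gf2_matvec(E, g, m - g, a0, a3);  /* a3 = E a2, with a2 = T^{-1} a0 = a0 */
    for (int i = 0; i < g; i++)
        p1[i] = a1[i] ^ a3[i];        /* p1^T = a1 XOR a4, a4 = a3 */
    gf2_matvec(B, m - g, g, p1, b1);  /* b1 = B p1^T */
    for (int i = 0; i < m - g; i++)
        p2[i] = a0[i] ^ b1[i];        /* p2^T = b0 XOR b1 */
}
```

As a small check, with A = [1 1], B = [1], C = [1 0], D = [0], E = [1] and T = [1] (so that φ = D ⊕ E B = [1]), the message x = [1, 0] gives p1 = 0 and p2 = 1, and the codeword [1, 0, 0, 1] satisfies both parity checks.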
4.4.2.3 Hardware Layout
The main components of the encoder are shown in Fig. 4.15. An FSM module implements the states of Fig. 4.14 and controls the data flow between all components. As in the BCH decoder of Fig. 4.9, the bold blue lines represent data and control lines between a component and the FSM.
Figure 4.15: LDPC encoder hardware layout (FSM logic, 256-bit xreg, 16-bit tmp_16, matrix multiplier and 512-bit creg connected by 256-bit buses; 32-bit dat_in and dat_out buses with 1-bit ds and ack lines).
Data received on dat_in is routed by the FSM to the correct position in xreg during the StoreRegister state. Register xreg is connected to the matrix multiplier component via a 256-bit bus. The matrix multiplier component contains four individual matrix multipliers because, of the products in the previous section, only a0, a1, a3 and b1 need to be computed (b0 = a0). Each multiplier implements the logic-efficient architecture of Section 4.4.1, together with the parallelisation technique described there. A symbol Fm is now introduced to represent the degree of parallelisation: it indicates how many times faster the circuit is with parallelisation than without it. Fm = 32 is chosen so that the multipliers are more than an order of magnitude faster. Should this encoder, together with the other channel coding components, not fit onto the target FPGA, Fm can be lowered to reduce logic usage. After multiplication, xreg and preg are moved to creg via 256-bit data buses, and creg is finally output 32 bits at a time, as shown.
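The Output state of Section 4.4.2.2 can be modelled in C as below. The 512-bit creg is held as sixteen 32-bit words, with element 0 carrying the most significant bits c_{n-1} to c_{n-32}; this layout and the names creg_shifter, creg_load() and creg_next() are assumptions. Each call models one acknowledged transfer on dat_out: the top word is emitted, the register shifts left by 32 bits with zeros filling the bottom, and dat_rem decreases by 32.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CREG_WORDS 16                  /* 512-bit codeword register */

typedef struct {
    uint32_t creg[CREG_WORDS];         /* creg[0] holds the 32 MSBs */
    int dat_rem;                       /* bits still to be output */
} creg_shifter;

static void creg_load(creg_shifter *s, const uint32_t words[CREG_WORDS])
{
    memcpy(s->creg, words, sizeof s->creg);
    s->dat_rem = 32 * CREG_WORDS;
}

/* Emit the most significant 32 bits on *dat_out, then shift creg left by 32
   bits, filling with zeros. Returns 0 once dat_rem has reached zero. */
static int creg_next(creg_shifter *s, uint32_t *dat_out)
{
    if (s->dat_rem == 0)
        return 0;
    *dat_out = s->creg[0];
    memmove(&s->creg[0], &s->creg[1], (CREG_WORDS - 1) * sizeof s->creg[0]);
    s->creg[CREG_WORDS - 1] = 0;       /* the 32 shifted-in zeros */
    s->dat_rem -= 32;
    return 1;
}
```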
4.4.3 Decoder
During development of the two LDPC decoders, both the satellite and ground station platforms had to be prepared for a demonstration in Belgium, and development on the Xilinx Virtex 5 and Spartan 3e FPGAs was suspended at that time. LDPC development therefore continued on the Altera FPGA referred to in Section 3.7.2. The Altera FPGA has considerably more RAM and logic resources than the Xilinx Virtex 5, so a design that fits on the Altera FPGA will not necessarily fit on the Xilinx FPGA. However, reducing the degree of parallelism in the design can lower a decoder's logic usage so that it fits on the Xilinx FPGA.
4.4.3.1 State Diagram
Figure 4.16: State machine diagram of an LDPC decoder.
Upon reset, the decoder enters the Idle state of Fig. 4.16. The probability values received from the modem have been quantised. Work done in [1] showed that 3-bit quantisation performs only 0.4 dB worse than 32-bit floating point values. To fit a whole number of probabilities into one 32-bit bus word, Nq = 4-bit quantisation has been chosen, giving eight values per word. The total number of data bits expected by the decoder is therefore exp_dat = Nq × n = 4 × 512 = 2048. After a 32-bit word is received, the LLR state runs each of its Pr(c*_i = 0) values through a lookup table (LUT), which computes the LLR. These LLRs are stored in a register called creg before the decoder returns to the Idle state. The values stored in creg represent (c*_i)LLR from Section 3.4.2. This process continues until exp_dat = 0.
A RAM bank, which buffers messages Zij and Lji between horizontal and vertical updating, is initialised with values from creg. Messages are arranged in RAM in the same pattern as the H matrix: accessing a row of H corresponds to accessing a row in RAM, and likewise a column access in H corresponds to a column access in RAM. For ease of explanation, the rest of this section refers to row and column accesses in H. The blue dashed box in Fig. 4.16 encloses the states associated with horizontal updating, and the red box encloses all the vertical update states.
After the iteration count is incremented, horizontal updating is performed, in which H is accessed row-wise. First, ωr = 6 elements are read from row j of H, where 0 ≤ j < 256. A check node update (CNU) is then performed as described by Eq. 3.4.7, and the result is written back to row j. This continues until all 256 rows have been processed. Vertical updating follows, in which H is accessed column-wise. The first state reads ωc = 3 elements from column i of H, where 0 ≤ i < 512. A variable node update (VNU) is then performed using the MS version of Eq. 3.4.4:
$$ Z_{ij} = (c^*_i)_{LLR} + \sum_{j' \neq j} L_{j'i} \tag{4.4.2} $$
where the sum runs over the rows j' ≠ j in which column i of H contains a one. During the VNU for column i, the value of (cnew_i)LLR is also computed with the MS version of Eq. 3.4.6, this time summing over all ωc rows connected to column i:
$$ (c^{new}_i)_{LLR} = (c^*_i)_{LLR} + \sum_{j'} L_{j'i} \tag{4.4.3} $$
The messages Zij are then written back to column i of H. Value (cnew_i)LLR is converted to a bit value, cnew_i, before being stored at position i in register cnew. According to Eq. 3.4.2, a positive LLR means Pr(c*_i = 0) > Pr(c*_i = 1), which gives c*_i = 0; similarly, a negative LLR gives c*_i = 1. This criterion determines cnew_i from (cnew_i)LLR.
After vertical updating, the syndrome is evaluated as $s = c_{new} H^T$, using a matrix multiplier from Section 4.4.1 with parallelisation Fp = 32. If s ≠ 0, a new iteration is started. A maximum of 20 iterations is performed before a decoding failure is declared. According to [1], a soft decision decoder needs at most 20 iterations to be efficient: allowing up to 100 iterations would provide only about 0.05 dB improvement.
If s = 0, the TM section is extracted from cnew and passed to xreg. If its frame ID is not the one expected by the decoder, this TM section is discarded together with the current contents of xreg. After exp_dat is reset in the Reset state, the decoder waits for the next TM section. Variables TM_full = 1 and TM_empty = 0 are set when xreg contains a complete TM frame, which is then output on dat_out in the same way as creg in the LDPC encoder.
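The syndrome test that ends each iteration can be expressed compactly with bit-packed rows. In the C sketch below, H is stored row by row as eight 64-bit words per row (an assumed layout, chosen for brevity rather than to mirror the FPGA), and syndrome_is_zero() is a hypothetical name. Bit j of s = cnew H^T is the parity of row j ANDed with cnew, so the check can stop at the first nonzero bit; the control flow described above would call it after each vertical update, for at most 20 iterations.

```c
#include <assert.h>
#include <stdint.h>

#define N_BITS  512
#define N_WORDS (N_BITS / 64)

/* XOR of all 64 bits of v. */
static unsigned parity64(uint64_t v)
{
    v ^= v >> 32; v ^= v >> 16; v ^= v >> 8;
    v ^= v >> 4;  v ^= v >> 2;  v ^= v >> 1;
    return (unsigned)(v & 1u);
}

/* h: m rows of H, N_WORDS words each (bit b of word w is column 64w + b).
   Returns 1 if s = cnew * H^T is all-zero, 0 as soon as some s_j = 1. */
static int syndrome_is_zero(const uint64_t *h, int m, const uint64_t cnew[N_WORDS])
{
    for (int j = 0; j < m; j++) {
        uint64_t acc = 0;
        for (int w = 0; w < N_WORDS; w++)
            acc ^= h[j * N_WORDS + w] & cnew[w];
        if (parity64(acc))
            return 0;                  /* s_j = 1: another iteration is needed */
    }
    return 1;
}
```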
4.4.3.2 Hardware Layout
The decoder layout is presented in Fig. 4.17. Blue lines represent FSM control lines and black lines represent data buses.
Figure 4.17: General hardware layout of an LDPC decoder (FSM logic, LLR LUT, (c*)LLR RAM, Lji/Zij RAM, CNUs, VNUs, LLR-to-bit converter, register cnew, syndrome calculator and xreg module; 32-bit dat_in and dat_out buses with 1-bit ds and ack lines).
The LLR LUT component consists of 8 individual LUTs, each with an Nq = 4-bit input. Since Nq = 4, only 2^4 = 16 output values need to be stored in each LUT. Each LUT outputs an N_LLR = 5-bit value, consisting of a 4-bit magnitude and a 1-bit sign; an example is given in Fig. 4.18. A sign bit of 1 indicates a negative value. These LLRs are in sign-magnitude format, not 2’s complement.
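A software model of the LLR LUT stage is sketched below. The LUT contents are not listed here, so the table is purely illustrative: it assumes quantisation level q encodes Pr(c = 0) ≈ (q + 0.5)/16 and stores the magnitude of ln(P0/P1) scaled by 4 and rounded. The nibble order within the 32-bit word (most significant nibble first) and the function names are also assumptions. The 5-bit output follows Fig. 4.18: sign bit at the MSB (1 = negative) followed by a 4-bit magnitude.

```c
#include <assert.h>
#include <stdint.h>

/* ILLUSTRATIVE table (assumption): level q encodes Pr(c = 0) ~ (q + 0.5)/16;
   entry = round(4 * |ln(P0 / P1)|). Levels 0-7 have P0 < 0.5 (negative LLR). */
static const uint8_t LLR_MAG[16] = {14, 9, 7, 5, 4, 3, 2, 1,
                                     1, 2, 3, 4, 5, 7, 9, 14};

/* One LUT: 4-bit level in, 5-bit sign-magnitude LLR out (Fig. 4.18). */
static uint8_t llr_lut(uint8_t q)
{
    uint8_t sign = (q & 0xFu) < 8u ? 1u : 0u;   /* 1 = negative */
    return (uint8_t)((sign << 4) | LLR_MAG[q & 0xFu]);
}

/* Eight LUTs in parallel: split one 32-bit bus word into eight 4-bit levels,
   most significant nibble first (assumed order). */
static void word_to_llrs(uint32_t word, uint8_t llr[8])
{
    for (int i = 0; i < 8; i++)
        llr[i] = llr_lut((uint8_t)((word >> (28 - 4 * i)) & 0xFu));
}
```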
Multiple rows and columns of H can be processed simultaneously by adding more CNUs and VNUs, which requires the Zij/Lji RAM component to access many rows or columns at once. As for the encoder, a parallelism factor Fm = 64 is chosen, making the decoder more than an order of magnitude faster than a non-parallel implementation. This means that Fm rows or columns of H are accessed simultaneously. Since each row of H contains ωr = 6 LLR values, the CNUs and Zij/Lji RAM are joined by a (64 × 6 × 5)-bit bus in Fig. 4.17. Similarly, the VNUs and Zij/Lji RAM are joined by a (64 × 3 × 5)-bit bus.
Figure 4.18: A 5-bit LLR value (example 10110: the MSB is the sign bit, 1, followed by the 4-bit magnitude 0110).
During VNU processing, Fm new values of cnew are calculated. Each value (cnew_i)LLR is converted to the bit value cnew_i by the LLR-to-Bit combinational logic circuit, which outputs cnew_i = 0 when the LLR's sign bit is 0 and cnew_i = 1 otherwise. A 512-bit bus transfers cnew to both the syndrome calculator and the xreg module. Once the syndrome check passes, the xreg module is notified to process cnew: it parses the frame ID and then stores the TM frame section. A complete TM frame is output from xreg via a 32-bit bus.
A detailed description of the CNU, VNU and RAM Zij/Lji modules will
now be given.
4.4.3.3 CNU Module
Figure 4.19: A CNU’s hardware layout (first and second minimum comparator, two α scaling multipliers with bypass multiplexers, final comparator and Sign() combiner; inputs 6 × 4-bit |Zij|, 6 × 1-bit sign(Zij), start and en_scaling; outputs 6 × 5-bit Lji and done).
A CNU module is shown in Fig. 4.19. Asynchronous reset, rst, and clock, clk, are both connected to the decoder's corresponding inputs. The module expects ωr = 6 Zij messages from the RAM Zij/Lji component; their magnitudes and signs are input on the |Zij| and sign(Zij) lines respectively. De-asserting start tells the module to begin processing its input data, and once done is de-asserted, output Lji is valid.
First, the comparator finds the first and second minimum magnitudes among the six |Zij| values and outputs them on lines F and S respectively. Line done is de-asserted one clock cycle later, which gives the remaining combinational logic one clock cycle to process F and S. Both F and S are multiplied by the scaling term α from Section 3.4.2. The early termination scheme, also described in that section, asserts en_scaling to bypass the multiplier after a certain number of decoding iterations. Both multiplexer (MUX) outputs go to a final comparator, where each original input |Zij| is replaced by the first minimum value, unless |Zij| itself equals the first minimum, in which case it is replaced by the second minimum. Using Eq. 3.4.7, the Sign() module computes a sign for each output: the sign returned along an edge is the XOR of the signs of the other five inputs. Finally, the corresponding signs and magnitudes are joined and output on Lji.
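The CNU data path can be sketched behaviourally in C. The sketch assumes α = 3/4, implemented as a truncating integer multiply (the actual α is defined in Section 3.4.2 and may differ), models the bypass multiplexers with an apply_scaling flag rather than the en_scaling signal polarity, and uses hypothetical names. The replacement rule (first minimum, or second minimum for the input holding the first minimum) and the XOR-of-other-signs rule follow the description above.

```c
#include <assert.h>
#include <stdint.h>

#define WR 6  /* omega_r: Zij messages per check node */

/* 5-bit sign-magnitude message (Fig. 4.18): bit 4 = sign (1 = negative). */
static uint8_t sm_pack(unsigned sign, unsigned mag)
{
    return (uint8_t)(((sign & 1u) << 4) | (mag & 0xFu));
}

/* Scaled min-sum check node update for one row of H.
   ASSUMPTION: alpha = 3/4, truncated; apply_scaling = 0 models the bypass. */
static void cnu_update(const uint8_t z[WR], uint8_t l[WR], int apply_scaling)
{
    unsigned first = 0xF, second = 0xF, sign_xor = 0;
    for (int i = 0; i < WR; i++) {             /* first/second minimum comparator */
        unsigned mag = z[i] & 0xFu;
        sign_xor ^= (z[i] >> 4) & 1u;
        if (mag < first) { second = first; first = mag; }
        else if (mag < second) { second = mag; }
    }
    unsigned f = first, s = second;            /* lines F and S */
    if (apply_scaling) { f = (3u * f) / 4u; s = (3u * s) / 4u; }
    for (int i = 0; i < WR; i++) {             /* final comparator and Sign() */
        unsigned mag = z[i] & 0xFu;
        unsigned own = (z[i] >> 4) & 1u;
        unsigned out_mag = (mag == first) ? s : f;
        l[i] = sm_pack(sign_xor ^ own, out_mag); /* XOR of the other five signs */
    }
}
```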
4.4.3.4 VNU Module
Figure 4.20: A VNU’s hardware layout (inputs 3 × 5-bit Lji and 5-bit (c*_i)LLR, converted to 2’s complement; three 3-input adders and one 4-input adder; outputs 3 × 5-bit Zij and 5-bit (cnew_i)LLR, converted back from 2’s complement; start and done handshake).
A VNU module is shown in Fig. 4.20. During vertical updating, it receives ωc = 3 Lji messages from column i in RAM Zij/Lji, as well as (c*_i)LLR from creg. Before processing starts, combinational logic converts all VNU data inputs to 2’s complement format.
When start is de-asserted, both adder modules sum their inputs. The 3-input adder module consists of three individual 2’s complement adders, each of which calculates one Zij according to Eq. 4.4.2. The 4-input adder calculates (cnew_i)LLR using Eq. 4.4.3. Because it sums four terms, this adder takes one clock cycle longer than the 3-input adders to finish; therefore, only this adder de-asserts done when it has finished. Combinational logic converts the outputs of all adders back to sign-magnitude format before they leave the module.
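The VNU arithmetic can be modelled as follows. Inputs are converted from the 5-bit sign-magnitude format to signed integers (standing in for the 2’s complement conversion), summed, and converted back. How the hardware handles sums that exceed a 4-bit magnitude is not stated, so saturation to ±15 is an assumption, as are the names used. The returned hard decision is the LLR-to-bit rule: the sign bit of (cnew_i)LLR.

```c
#include <assert.h>
#include <stdint.h>

#define WC 3  /* omega_c: Lji messages per variable node */

/* Sign-magnitude (bit 4 sign, bits 3..0 magnitude) to signed integer. */
static int sm_to_int(uint8_t v)
{
    int mag = v & 0xF;
    return (v & 0x10) ? -mag : mag;
}

/* Signed integer back to sign-magnitude, saturating at +/-15 (assumption). */
static uint8_t int_to_sm(int x)
{
    unsigned sign = x < 0;
    int mag = sign ? -x : x;
    if (mag > 15)
        mag = 15;
    return (uint8_t)((sign << 4) | (unsigned)mag);
}

/* VNU for column i: z[k] = channel LLR + the other WC-1 inputs (Eq. 4.4.2),
   *llr_new = channel LLR + all WC inputs (Eq. 4.4.3). Returns cnew_i. */
static unsigned vnu_update(uint8_t llr_ch, const uint8_t l[WC],
                           uint8_t z[WC], uint8_t *llr_new)
{
    int ch = sm_to_int(llr_ch), in[WC], total = ch;
    for (int k = 0; k < WC; k++) {             /* 4-input adder */
        in[k] = sm_to_int(l[k]);
        total += in[k];
    }
    for (int k = 0; k < WC; k++) {             /* three 3-input adders */
        int acc = ch;
        for (int m = 0; m < WC; m++)
            if (m != k)
                acc += in[m];
        z[k] = int_to_sm(acc);
    }
    *llr_new = int_to_sm(total);
    return (*llr_new >> 4) & 1u;               /* LLR-to-bit: the sign bit */
}
```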
4.4.3.5 RAM Zij/Lji Module
Eq. 3.4.14 shows that each of the ωr ones in a row of H lies in a different (64 × 64) submatrix; the same holds for the ωc ones in each column. Fig. 4.21 shows how H is mapped onto a (1 × 256) array of 9-kbit RAM blocks on the FPGA.
Figure 4.21: RAM block configuration for H (256 9-kbit RAM blocks divided into segments R0 to R3; row sectors 0 to 3 and column sectors 0 to 7, with logical and physical addresses).
This (1 × 256) RAM array is divided into four (1 × 64) segments, R0 to R3. Each (64 × 64) permutation matrix in H can be stored in one such (1 × 64) RAM segment, because each column of a permutation matrix contains exactly one nonzero entry. For example, Fig. 4.22 reduces a (4 × 4) permutation matrix to a (1 × 4) array; for illustration, the ones have been replaced with symbols xi. The address space of a 9-kbit RAM block is partitioned as shown in Fig. 4.23. Bits a1 and a0 select which row sector is accessed in the blue dashed box of Fig. 4.21.
$$ \begin{bmatrix} 0 & 0 & x_0 & 0 \\ 0 & 0 & 0 & x_1 \\ x_2 & 0 & 0 & 0 \\ 0 & x_3 & 0 & 0 \end{bmatrix} \;\rightarrow\; \begin{bmatrix} x_2 & x_3 & x_0 & x_1 \end{bmatrix} $$
Figure 4.22: A square permutation matrix stored as a row vector.
Bit a2, the column sector physical address, allows the RAM blocks to access data in the green dashed box of Fig. 4.21.
Figure 4.23: Address partitioning of a 9-kbit RAM block (10-bit address 0000000 a2 a1 a0: 7 unused bits, a 1-bit column sector physical address a2 and a 2-bit row sector physical address a1 a0).
Note that the column sector addresses in Fig. 4.21 are 3-bit logical addresses. They are passed to the RAM array's column controller, which uses them to set bit a2 of Fig. 4.23. A logical address [b2, b1, b0] is translated to a physical address as shown in Fig. 4.24: the MSB, b2, gives the value of a2, while bits b1 and b0 select a particular (1 × 64) RAM segment among R0 to R3 in Fig. 4.21.
Figure 4.24: Logical to physical address translation (b2, the MSB, gives the value of a2; b1 and b0 select RAM segment R0 to R3).
A single row controller and a single column controller are used in this design. During horizontal updating, the row controller receives a row sector address. Because it knows where the all-zero submatrices of Eq. 3.4.14 are located, the controller reads only the relevant RAM segments among R0 to R3. It also knows how the ones of each permutation matrix are ordered, and uses this information to group the ωr messages of each of the 64 rows before passing them to the CNUs. Data from the CNUs is written back to memory by reversing this procedure. The column controller likewise knows where all the zero submatrices are located. It receives a logical column address, after which the ωc elements of each column in a (1 × 64) RAM segment are accessed sequentially.
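One reading of the mapping in Figs. 4.21 to 4.24 is sketched below in C: column i of H lies in logical column sector c = i / 64 at offset i mod 64, bits b1 b0 of c select segment R0 to R3, b2 becomes address bit a2, and the row sector supplies a1 a0. The assumption that the block index within a segment equals the column offset follows the column-wise storage of Fig. 4.22; the names are hypothetical, and the actual controllers may differ in detail.

```c
#include <assert.h>

typedef struct {
    unsigned block;   /* RAM block index, 0..255 */
    unsigned addr;    /* used address bits a2 a1 a0 (upper 7 bits unused) */
} ram_loc;

/* Location of the Zij/Lji message for the one in row sector r (0..3) and
   column i (0..511) of H. */
static ram_loc h_to_ram(unsigned r, unsigned i)
{
    unsigned c = i / 64u;                          /* logical column sector [b2 b1 b0] */
    unsigned k = i % 64u;                          /* column within the 64 x 64 matrix */
    ram_loc loc;
    loc.block = (c & 3u) * 64u + k;                /* b1 b0 select segment R0..R3 */
    loc.addr  = (((c >> 2) & 1u) << 2) | (r & 3u); /* a2 = b2; a1 a0 = row sector */
    return loc;
}
```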
4.5 IPC Implementation
Function | Description
--- | ---
∗ipc_info = name_attach(∗server_name) | Creates an IPC server named server_name. Returns ipc_info, which points to the created IPC resources.
∗ipc_info = name_open(∗server_name) | Connects a client to an existing server named server_name. Returns ipc_info, which points to the server’s IPC resources.
success = name_close(∗ipc_info) | Disconnects from a server when called by a client; releases the IPC resources when called by a server. Returns whether the operation succeeded.
success = msg_send(∗ipc_info, ∗msg, msg_len, ∗rep, rep_len) | The client sends the server a message, msg, of length msg_len bytes, and blocks until a reply, rep, of length rep_len bytes has been received.
success = msg_receive(∗ipc_info, ∗msg, msg_len, ∗msg_inf) | The server waits for message msg. If msg = NULL, the length of the received message is given by msg_inf; the server then allocates memory for the message before calling msg_read().
success = msg_reply(∗ipc_info, ∗msg, msg_len) | The server sends the client a reply message, msg, of length msg_len bytes, which unblocks the client.
success = msg_read(∗ipc_info, ∗msg, msg_len, offset) | Allows a server to retrieve a client’s message while the client is blocked. The retrieved message is stored in msg; offset allows a particular segment of the client’s message to be read.
Table 4.2: Message passing IPC functions for Linux Ubuntu 7.10.
Table 4.2 lists the functions provided by the Linux message passing scheme. They mirror the QNX message passing calls, so that TM and ARQ, as described in this chapter, need not distinguish between QNX and Linux IPC function calls. The input and return types of each function have been omitted; the interested reader is referred to the C source code.
During msg_receive(), a client’s message is normally copied from the shared memory segment into buffer msg. However, a server does not need to allocate memory for msg before calling msg_receive(), which saves memory on the SH4 and FIT-PC when communication is not active. After receiving the length of a client’s message via pointer msg_info, the server calls msg = malloc() to allocate memory. Function msg_read() is then called to read the message from the IPC shared memory segment.
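The two-phase receive pattern (learn the length first, allocate, then copy) is illustrated below with a self-contained C model. The shm_segment type and the model_receive_len(), model_read() and server_fetch() functions are hypothetical stand-ins for the IPC shared memory and for msg_receive() and msg_read(); they are not the actual implementation, whose signatures are not given here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the shared memory holding a blocked client's message. */
typedef struct {
    const unsigned char *data;
    size_t len;
} shm_segment;

/* Phase 1 (cf. msg_receive() with msg == NULL): report only the length. */
static size_t model_receive_len(const shm_segment *shm)
{
    return shm->len;
}

/* Phase 2 (cf. msg_read()): copy up to msg_len bytes starting at offset. */
static size_t model_read(const shm_segment *shm, void *msg, size_t msg_len, size_t offset)
{
    if (offset >= shm->len)
        return 0;
    size_t n = shm->len - offset;
    if (n > msg_len)
        n = msg_len;
    memcpy(msg, shm->data + offset, n);
    return n;
}

/* Server side: memory is allocated only once a message exists, sized exactly. */
static unsigned char *server_fetch(const shm_segment *shm, size_t *out_len)
{
    size_t len = model_receive_len(shm);
    unsigned char *msg = malloc(len ? len : 1);
    if (msg == NULL)
        return NULL;
    *out_len = model_read(shm, msg, len, 0);
    return msg;
}
```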
4.6 TM Implementation
At startup, TM parses a configuration file, as shown in Fig. 4.25. This file contains the IPC server and client names that connect TM to ARQ in Fig. 3.15. Both the RX and TX threads of TM communicate with the channel coding layer via the two specified device names. In QNX and Linux, devices are listed as file entries under the /dev/ directory. On the SH4, both device names are /dev/exp, the QNX resource manager written by [35]. On Linux, the devices are two USB-to-RS232 modules named /dev/ttyUSB0 and /dev/ttyUSB1: the downlink radio is connected to ttyUSB0 and the FPGA to ttyUSB1.
The functions in section A of Fig. 4.25 are contained in separate source files for each OS platform: NEUTRINO_IPC.c is compiled for QNX and LINUX_IPC.c for Ubuntu. Next, a set of callbacks is created. Each callback is a function pointer that resides in TM’s source files and points to a function in one of the aforementioned IPC source files. TM calls the callback, which in turn calls the appropriate IPC routine, namely msg_send(), msg_receive() or msg_reply() from Table 4.2. This separates TM’s frame processing routines from platform-dependent IPC and allows easy switching between OS platforms. Finally, the TX and RX threads of TM are created with the POSIX pthread_create() function.
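The callback arrangement can be sketched with a struct of C function pointers. All names here are hypothetical and the two platform routines are stubs: in the real build only one of NEUTRINO_IPC.c or LINUX_IPC.c is compiled, whereas both stubs are included here so the sketch runs on its own. The frame processing function calls through the pointer and never refers to a platform routine directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signature for a platform msg_send() routine. */
typedef int (*ipc_send_fn)(const void *msg, size_t len, void *reply, size_t reply_len);

typedef struct {
    const char *platform;
    ipc_send_fn send;      /* would point into NEUTRINO_IPC.c or LINUX_IPC.c */
} ipc_callbacks;

/* Stubs standing in for the platform-specific routines. */
static int linux_ipc_send(const void *msg, size_t len, void *reply, size_t reply_len)
{
    (void)msg; (void)len;
    if (reply_len > 0) ((char *)reply)[0] = 'L';
    return 0;
}

static int qnx_ipc_send(const void *msg, size_t len, void *reply, size_t reply_len)
{
    (void)msg; (void)len;
    if (reply_len > 0) ((char *)reply)[0] = 'Q';
    return 0;
}

static const ipc_callbacks LINUX_CALLBACKS = { "linux", linux_ipc_send };
static const ipc_callbacks QNX_CALLBACKS   = { "qnx",   qnx_ipc_send };

/* Frame processing code sees only the callback, never the platform routine. */
static int deliver_packet(const ipc_callbacks *cb, const void *pkt, size_t len, char *status)
{
    return cb->send(pkt, len, status, 1);
}
```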
4.6.1 Receive Thread
A TM frame is retrieved from the channel coding layer using the callback function tm_from_lower_layer(), as shown in Fig. 4.26. First, the frame’s master frame count is compared with the frame count expected by TM RX. If the two differ, any partial ARQ packet held in buffer ARQPacket is discarded. A frame with FHP = 0x7FE is used for synchronisation purposes, as explained in Section 3.6.1. Frames with any other FHP > 0 are dropped until a frame with FHP = 0, which contains the start of an ARQ packet, is found. The expected frame count is then set to that of the current frame before processing starts.
Figure 4.25: Startup of a TM module.
Processing begins by extracting the ARQ packet’s length, as shown in Fig. 3.16, and allocating memory for buffer ARQPacket. All remaining data in the frame is then extracted. If more data is expected, tm_from_lower_layer() is called again. Finally, once all ARQ packet data has been received, tm_to_upper_layer() is called to send the assembled packet to ARQ.
Figure 4.26: The receive thread of a TM module.
4.6.2 Transmit Thread
By calling tm_from_upper_layer(), the TX thread waits for an ARQ packet. Upon receiving ARQPacket and its length, ARQPacketLength, memory is allocated for buffer TMFrame, which will store a TM frame. A frame header is added, followed by the packet length shown in Fig. 3.16. Next, ARQ data is copied into the frame’s data section until it is full, and the FHP is set as explained in Section 3.6.1. Calling tm_to_lower_layer() sends the frame to the channel coding layer. The MasterFrameCount variable is then incremented and the frame counter in TMFrame is updated. If ARQ data remains, the data section of TMFrame is filled again. This process repeats until all packet data has been sent.
Figure 4.27: The transmit thread of a TM module.
4.7 ARQ Implementation
Figure 4.28: Startup of an ARQ module.
As with TM, a configuration file is parsed on startup, as shown in Fig. 4.28. The station ID from Section 3.6.2.1 is stored as My_ID. The three IPC servers present in Fig. 3.15 are as follows:
• ARQ TX Server : Receives data from both application layer and
ARQ’s RX thread.
• ARQ RX Server : Receives ARQ packets from TM.
• ARQ Reporter : Resides on the RX side of ARQ. Receives requests from the application layer for files newly received by ARQ, and replies as soon as a new file has been received.
Next, IPC resources are allocated. Note that separate IPC source files for both
QNX and Linux exist, as in the case of TM. Callback functions are then also
created to separate IPC functionality from ARQ packet processing routines.
Finally, TX, RX and reporter threads are created by using pthread_create().
4.7.1 Packet RTT
An ARQ packet’s RTT is measured in this section. Using the setup depicted in
Fig. 3.20 from Section 3.6.2.2, both ground station and satellite platforms are
placed 1 metre apart. Signal propagation delay is negligible at this distance.
The packet retransmission feature of ARQ has been disabled for this test. The
RTT of every tenth ARQ packet that has been sent was measured. Fig. 4.29
shows a section of this measurement.
Figure 4.29: Round trip time measurements.
Averaging the measurements in Fig. 4.29 gives RTTavg = 60.9 ms. Values deviating from RTTavg are suspected to be caused by high data traffic on the SH4: apart from ARQ and TM requesting CPU time, the SAA program on the SH4 performs extensive file I/O and communication over the CAN bus. Since drivers have a higher scheduling priority than user-level processes, TM and ARQ may be delayed unnecessarily. At distance d = 6 km in Fig. 3.3 from Section 3.1, the signal propagation delay is
$$ t_{del} = \frac{d}{c} = \frac{6 \times 10^3}{3 \times 10^8} = 20\ \mu\text{s} \tag{4.7.1} $$
where c = 3 × 10^8 m/s is the speed of light [5]. Since tdel is more than an order of magnitude smaller than 1 ms, it is omitted from the final RTT. Based on the results in Fig. 4.29, ARQ’s time-out value is chosen as 100 ms.
4.7.2 Receive Thread
As shown in Fig. 4.30, this thread calls arq_rx_from_lower_layer() to wait for an ARQ packet from TM. The first parsing step compares the packet’s ID with My_ID; a mismatch causes the packet to be discarded, since it is not intended for this station. If the packet is an acknowledge packet, its sequence number is checked first. If this check passes, global variable ack_received is set and POSIX’s pthread_cond_signal() is used to notify the TX thread of its reception. Other packets, intended for file transfer, proceed to the next processing step.
First, the packet’s sequence number is compared with the expected sequence number. A mismatch occurs when an acknowledgement has been lost or when the transmitter rebooted during a file transfer. If the packet is of type First or Single, a new transfer starts, and the expected sequence number is therefore updated to that of the packet. In any other case, the packet is acknowledged without its data contents being processed.
After sequence number verification, the expected sequence number is incremented before packet reception is acknowledged. The acknowledge procedure is illustrated in Fig. 4.31. Since the RX thread of ARQ cannot transmit packets itself, it sends an acknowledge request to TX by calling arq_tx_to_upper_layer(). This request contains all the information TX needs to construct an acknowledge ARQ packet.
Upon receiving a packet of type First or Single, a temporary file is created, or cleared if it was already opened by a previous transfer attempt. The packet’s data section is then written to the file. The file is closed if the current packet is of type Last or Single, and the reporter thread is then notified of the new file using pthread_cond_signal(). If the file transfer is not yet complete, arq_rx_from_lower_layer() is called to receive the next packet.
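The packet acceptance rules of the receive thread can be condensed into a single C decision function, sketched below. The packet type PKT_MIDDLE (for packets that are neither First, Last nor Single), 8-bit sequence numbers and all other names are assumptions; file I/O and the actual signalling are left to the caller, which would act on the returned rx_action.

```c
#include <assert.h>
#include <stdint.h>

/* PKT_MIDDLE is an assumed name; 8-bit sequence numbers are also assumed. */
typedef enum { PKT_FIRST, PKT_MIDDLE, PKT_LAST, PKT_SINGLE, PKT_ACK } pkt_type;

typedef enum {
    RX_DISCARD,        /* wrong station ID or unexpected acknowledge */
    RX_NOTIFY_TX,      /* valid acknowledge: set ack_received, signal TX */
    RX_ACK_ONLY,       /* out-of-sequence data: acknowledge, ignore contents */
    RX_ACK_AND_STORE   /* in-sequence data: acknowledge and write to file */
} rx_action;

typedef struct {
    uint8_t my_id;
    uint8_t expected_seq;
} arq_rx_state;

/* tx_sequence_number is TX's sequence_number, already incremented when the
   packet was sent, hence the comparison with tx_sequence_number - 1. */
static rx_action arq_rx_decide(arq_rx_state *st, uint8_t dest_id, pkt_type type,
                               uint8_t seq, uint8_t tx_sequence_number)
{
    if (dest_id != st->my_id)
        return RX_DISCARD;                   /* not intended for this station */
    if (type == PKT_ACK)
        return seq == (uint8_t)(tx_sequence_number - 1) ? RX_NOTIFY_TX : RX_DISCARD;
    if (seq != st->expected_seq) {
        if (type == PKT_FIRST || type == PKT_SINGLE)
            st->expected_seq = seq;          /* a new transfer starts */
        else
            return RX_ACK_ONLY;              /* e.g. resend after a lost ack */
    }
    st->expected_seq++;                      /* wraps modulo 256 */
    return RX_ACK_AND_STORE;
}
```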
Figure 4.30: The receive thread of an ARQ module.
Figure 4.31: The acknowledge procedure from Fig. 4.30.
4.7.3 Transmit Thread
A request is received by calling arq_tx_from_upper_layer(), as shown in Fig. 4.32. This request comes from either ARQ’s RX thread or the application layer. Note that the IPC reply is not sent immediately when a request is received via arq_tx_from_upper_layer(); by calling this function again after data transmission, TX can reply with transfer success or failure to the requesting client.
A file request contains a file name, a file directory and the destination station’s ID. Memory is allocated for packet buffer ARQPacket according to the number of bytes read from the file. The packet header is constructed next, with the sequence number set according to variable sequence_number. Since this variable is incremented before packet transmission, ARQ RX must compare the acknowledge packet’s sequence number with (sequence_number − 1), as shown in Fig. 4.30.
Before transmission, variable TransmitRetryCount is cleared. Calling arq_tx_to_lower_layer() sends the packet to TM, after which the TX thread waits up to 100 ms before retrying the transmission. If an acknowledgement is received before this time expires, RX notifies TX via a pthread_cond_signal() call. A maximum of 5 retransmissions is allowed before the file transmission is aborted. After a file has been transferred successfully, arq_tx_from_upper_layer() is called to notify the application layer of the successful transfer.
Figure 4.32: The transmit thread of an ARQ module.
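The time-out and retry behaviour can be sketched with POSIX threads. The sketch waits on a condition variable with pthread_cond_timedwait() and is woken by pthread_cond_signal() from the receive side; the arq_sync type, the function names, and the send callback standing in for arq_tx_to_lower_layer() are hypothetical. With timeout_ms = 100 and max_retx = 5 it matches the parameters above. The two trivial send functions at the end only stand in for a channel that is acknowledged immediately and one that loses every packet.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  ack_cond;
    bool            ack_received;   /* cf. the global ack_received in the text */
} arq_sync;

/* Called by the RX thread when a matching acknowledge arrives. */
static void arq_signal_ack(arq_sync *s)
{
    pthread_mutex_lock(&s->lock);
    s->ack_received = true;
    pthread_cond_signal(&s->ack_cond);
    pthread_mutex_unlock(&s->lock);
}

/* Absolute deadline ms from now on CLOCK_REALTIME, the default clock used
   by pthread_cond_timedwait(). */
static struct timespec deadline_after_ms(long ms)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec  += ms / 1000;
    ts.tv_nsec += (ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) { ts.tv_sec += 1; ts.tv_nsec -= 1000000000L; }
    return ts;
}

/* Stop-and-wait: returns the number of transmissions used, or -1 if no
   acknowledge arrived within 1 + max_retx attempts. */
static int arq_send_and_wait(arq_sync *s, void (*send_packet)(void *ctx), void *ctx,
                             long timeout_ms, int max_retx)
{
    for (int tx = 1; tx <= 1 + max_retx; tx++) {
        pthread_mutex_lock(&s->lock);
        s->ack_received = false;
        pthread_mutex_unlock(&s->lock);

        send_packet(ctx);                     /* stand-in for arq_tx_to_lower_layer() */

        struct timespec deadline = deadline_after_ms(timeout_ms);
        int rc = 0;
        pthread_mutex_lock(&s->lock);
        while (!s->ack_received && rc == 0)   /* loop guards against spurious wake-ups */
            rc = pthread_cond_timedwait(&s->ack_cond, &s->lock, &deadline);
        bool acked = s->ack_received;
        pthread_mutex_unlock(&s->lock);
        if (acked)
            return tx;
    }
    return -1;                                /* abort the file transmission */
}

static void send_loopback_ack(void *ctx) { arq_signal_ack((arq_sync *)ctx); }
static void send_lost(void *ctx) { (void)ctx; }
```

The flag is checked under the mutex before waiting, so an acknowledge that arrives between sending and waiting is not missed.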
4.7.4 Stop-And-Wait ARQ Data Throughput
Utilising the test aircraft’s flight duration and an ARQ packet’s RTT, the
amount of file data transferable during a satellite pass can be determined. A
worst case scenario where 5 packets have to be retransmitted is assumed. The
time to transmit a single packet is therefore :
Stellenbosch University   http://scholar.sun.ac.za
CHAPTER 4. IMPLEMENTATION 98
tpck = (4× tARQ_T ime_Out) +RTTavg = (4× 0.1) + 0.0609 = 460ms (4.7.2)
Using Eq. 4.7.2 and expecting the aircraft pass to last 15 minutes, the total
number of transmitted ARQ packets are :
NPck =
tflight
tpck
=
15× 60
0.46
= 1956 packets (4.7.3)
Since each packet carries 28 bytes of file data, the total number of bytes transferable during a pass is:
$$N_{Bytes} = N_{Pck} \times 28 = 1956 \times 28 = 54\,768\ \text{bytes} \approx 54.8\ \text{kB} \tag{4.7.4}$$
Section 3.6.2 gives the expected file size as 10 kB. Since $N_{Bytes}$ exceeds this by a factor of more than five, the data throughput of stop-and-wait ARQ is more than adequate for this project.
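The arithmetic of Eqs. 4.7.2 to 4.7.4 can be reproduced with a few C helper functions. This is a small sketch for checking the numbers; the function names are hypothetical. The text rounds $t_{pck}$ to 0.46 s before dividing. With the unrounded 0.4609 s, the same functions give 1952 packets (54 656 bytes), which does not change the conclusion.

```c
#include <assert.h>

/* Worst-case time per packet, Eq. 4.7.2: n_timeouts expired time-outs
 * followed by one successful round trip (all times in seconds). */
double arq_packet_time_s(int n_timeouts, double timeout_s, double rtt_avg_s)
{
    assert(n_timeouts >= 0 && timeout_s > 0.0 && rtt_avg_s > 0.0);
    return n_timeouts * timeout_s + rtt_avg_s;
}

/* Whole packets that fit into a pass, Eq. 4.7.3 (truncated). */
long arq_packets_per_pass(double pass_s, double t_pck_s)
{
    assert(pass_s >= 0.0 && t_pck_s > 0.0);
    return (long)(pass_s / t_pck_s);
}

/* File bytes delivered during a pass, Eq. 4.7.4. */
long arq_bytes_per_pass(long packets, int payload_bytes)
{
    assert(packets >= 0 && payload_bytes > 0);
    return packets * payload_bytes;
}
```

For example, arq_packets_per_pass(15 * 60, 0.46) gives the 1956 packets of Eq. 4.7.3. arq_bytes_per_pass(1956, 28) then gives 54 768 bytes.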
4.8 Summary
It has been shown how the channel coding modules interact with the existing FPGA firmware interfaces on both the satellite and ground station platforms. These modules implement an interface that allows new modules to be added, or existing ones removed, easily. This is particularly useful when switching between the BCH and LDPC implementations. The implementation of all channel coding modules has also been described by state machine and hardware layout diagrams. Parallelisation techniques to reduce the data processing latency of each module have also been presented.
A message passing API for IPC has also been presented, with various functions implemented in C to hide unnecessary IPC details from the user. Finally, detailed packet and frame processing routines for both ARQ and TM have been given as flow diagrams. The data throughput of the stop-and-wait ARQ protocol has been shown to be sufficient for this project.
The next chapter evaluates these implementation results for both satellite
and ground station platforms. Results from the project’s demonstration at
KUL are also highlighted.
Chapter 5
Testing, Results and Discussion
This chapter presents the results of implementing the system components discussed in Chapter 4, together with the unit test procedures carried out for each developed component. The channel coding section reports the FPGA logic usage and processing delay of each of its modules. Forward error correction BLER and BER plots are then presented for both BCH and LDPC. The test procedures and performance results of TM and ARQ follow. Finally, the results of the demonstration in Belgium are presented.
5.1 Channel Coding
5.1.1 Testing
Unit testing of each VHDL module has been performed by writing a VHDL test bench. This test bench simulates a module's operation for the given inputs on its interface, as per Fig. 4.3. The module's output is then compared to a known correct result. For example, the CRC polynomial division has been performed in Matlab using [q, r] = deconv(m, g), with the remainder reduced modulo 2 (r = mod(r, 2)). Vectors m and g contain the binary coefficients of the example in Section 3.2.1. When m is input to the VHDL module, the same remainder r is expected as in Matlab.
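A software golden model for the CRC check can be written directly as GF(2) polynomial long division, which is what the Matlab deconv and mod combination computes. In this C sketch, polynomials are stored as bit masks (bit i holds the coefficient of $x^i$), which limits them to degree 31. The generator polynomial of Section 3.2.1 is not reproduced here, so the usage example takes the textbook generator $g(x) = x^4 + x + 1$ as a hypothetical stand-in. The function names are also hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Degree of a GF(2) polynomial stored as a bit mask (-1 for p = 0). */
int gf2_degree(uint32_t p)
{
    int d = -1;
    while (p != 0) {
        p >>= 1;
        d++;
    }
    return d;
}

/* Remainder of m(x) / g(x) over GF(2): long division in which
 * subtraction is XOR. */
uint32_t gf2_mod(uint32_t m, uint32_t g)
{
    assert(g != 0);
    int dg = gf2_degree(g);
    for (int i = gf2_degree(m); i >= dg; i--)
        if (m & ((uint32_t)1 << i))
            m ^= g << (i - dg);   /* cancel the x^i term */
    return m;
}

/* CRC check bits: remainder of m(x) * x^deg(g) divided by g(x).
 * The caller must ensure deg(m) + deg(g) <= 31. */
uint32_t gf2_crc(uint32_t m, uint32_t g)
{
    int dg = gf2_degree(g);
    assert(dg >= 1 && gf2_degree(m) + dg <= 31);
    return gf2_mod(m << dg, g);
}
```

For the message 1101011011 (0x35B) and $g(x) = x^4 + x + 1$ (0x13), gf2_crc() returns the check bits 1110. Appending them yields a codeword that leaves a zero remainder, and this is the property a CRC module's output can be checked against.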
Integration testing comprised the loopback test shown in Fig. 5.1. This test checks whether all modules communicate correctly with each other. It also checks whether each module correctly resets its registers and variables after processing a complete TM frame. The Channel Coding Encode module contains all channel coding modules of the ground station in Fig. 3.5. Similarly, the satellite platform's modules are encapsulated by the Channel Coding Decode module. A test application on the SH4 generates TM frames and sends them from A (denoted TMA). These frames are encoded and decoded on the FPGA and then received at B (denoted TMB). Once the application has verified that TMA = TMB, a new frame is generated and sent. The test ran for an hour and was to be regarded as successful if no failure occurred. No failure occurred.
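[Block diagram: on the SH4 (QNX), a C test application generating TM frames and the QNX resource manager; on the ML501, the expansion port module, Channel Coding Encode and Channel Coding Decode. A marks where TM frames leave the test application and B where they return.]
Figure 5.1: Hardware loopback test in FPGA for channel coding modules.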
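The comparison loop performed by the test application can be sketched in C. The FPGA encode/decode chain is abstracted as a function pointer. The following are hypothetical stand-ins, not the thesis's test application: the frame length TM_FRAME_BYTES, the use of rand() to generate frames, and the two example channels. The one-hour run would correspond to a time-based loop in place of n_frames.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TM_FRAME_BYTES 64   /* hypothetical TM frame length for the sketch */

/* Encode + decode path under test (the FPGA chain in the real setup). */
typedef void (*channel_fn)(const uint8_t *in, uint8_t *out, size_t len);

/* Sends n_frames random frames TMA through the channel and compares each
 * received frame TMB with its source; returns how many frames differed. */
long loopback_test(channel_fn channel, long n_frames, unsigned seed)
{
    assert(channel != NULL && n_frames >= 0);
    uint8_t tm_a[TM_FRAME_BYTES], tm_b[TM_FRAME_BYTES];
    long failures = 0;

    srand(seed);
    for (long f = 0; f < n_frames; f++) {
        for (size_t i = 0; i < sizeof tm_a; i++)
            tm_a[i] = (uint8_t)(rand() & 0xFF);    /* new frame at A */
        memset(tm_b, 0, sizeof tm_b);
        channel(tm_a, tm_b, sizeof tm_a);          /* encode and decode */
        if (memcmp(tm_a, tm_b, sizeof tm_a) != 0)  /* TMA == TMB ? */
            failures++;
    }
    return failures;
}

/* Example channels: an ideal one, and one that leaves a bit error behind. */
void ideal_channel(const uint8_t *in, uint8_t *out, size_t len)
{
    memcpy(out, in, len);
}

void faulty_channel(const uint8_t *in, uint8_t *out, size_t len)
{
    memcpy(out, in, len);
    out[len / 2] ^= 0x01;   /* uncorrected single-bit error */
}
```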
5.1.2 Results
Table 5.1 shows the implementation results of both encoder and decoder com-
ponents from Fig. 5.1. Logic usage on both FPGAs are expressed as a per-
centage of the available resources.
| Component | FPGA | Logic Elements Used | Processing Delay at 100 MHz Clock |
|---|---|---|---|
| Channel Coding Encode | Xilinx Spartan3E XC3S500E | 390 of 4656 flip-flops; 818 of 9312 LUTs | 1.275 µs |
| Channel Coding Decode | Xilinx Virtex5 XC5VLX50 | 2470 of 28800 flip-flops; 3664 of 28800 LUTs | 4.43 µs |

Table 5.1: Channel coding FPGA module implementation details.
The processing delay refers to the time taken to encode or decode a complete TM frame. Fig. 5.2a shows a timing diagram for the Channel Coding Encode component, obtained from a VHDL test bench simulation in Xilinx's ISim simulator. Blue markers indicate the start and finish times of encoding. The first 32-bit section of a TM frame enters the module at 20 ns, and the last 32-bit section leaves at 1.295 µs, giving an encoding delay of 1.275 µs. Similarly, the decoding delay of 4.43 µs (markers at 1.335 µs and 5.765 µs) has been determined from Fig. 5.2b. Note that three randomly chosen bits of this codeword have been corrupted.
[Waveform: signals clk, rst, enc_dat_in[31:0], enc_ds_in, enc_ack_out, enc_dat_out[31:0], enc_ds_out and enc_ack_in; markers at 20,000 ps and 1,295,000 ps.]
(a) Encoding delay measurement.
[Waveform: signals clk, rst, dec_dat_in[31:0], dec_ds_in, dec_ack_out, dec_ds_out, dec_ack_in and dec_dat_out[31:0]; markers at 1,335,000 ps and 5,765,000 ps.]
(b) Decoding delay measurement.
Figure 5.2: Measurement of channel coding's processing delay.
5.1.3 Discussion
Logic usage on both FPGAs is low for either the encoding or the decoding chain of modules. In Table 5.1, no resource type exceeds 12.7% usage (the decoder's LUTs). This leaves ample space for other FPGA modules to be added during final systems integration, as well as for a future upgrade to half-rate BCH or LDPC. The time to process a TM frame is also very short, even when the (511,484) BCH has to correct the maximum of 3 random bit errors. Both the encoder and the decoder add a delay of a few microseconds. This is more than four orders of magnitude smaller than the measured $RTT_{avg}$ of 60 ms in Section 4.7.1. Channel coding is therefore very fast compared to the rest of the communications channel.
5.2 BCH
Results of the (511, 484) code and a (511, 259) code are discussed in this section.
The latter code of rate R = (259/511) ≈ 0.5 has been implemented to compare
BCH’s BLER performance with R = 0.5 LDPC.
5.2.1 Testing
Unit and integration testing for the (511, 484) code has been part of testing in
Section 5.1.1. Sample codewords have been simulated in a VHDL test bench.
These codewords contained three random bit errors for the decoder to correct.
5.2.2 Results
| Component | FPGA | Logic Elements Used | Processing Delay at 100 MHz Clock |
|---|---|---|---|
| (511,484) Encoder | Xilinx Spartan3E XC3S500E | 102 of 4656 flip-flops; 322 of 9312 LUTs | 1.11 µs |
| (511,259) Encoder | Altera Cyclone III EP3C120F780C7N | 1680 of 120000 LEs | 2.515 µs |
| (511,484) Decoder | Xilinx Virtex5 XC5VLX50 | 1595 of 28800 flip-flops; 2334 of 28800 LUTs | 3.83 µs |
| (511,259) Decoder | Altera Cyclone III EP3C120F780C7N | 9960 of 120000 LEs | 45.47 µs |

Table 5.2: BCH FPGA module implementation details.
Table 5.2 shows FPGA implementation details for both encoder and decoder modules. Altera's synthesis report expresses FPGA logic usage in terms of logic elements (LEs) rather than flip-flops or LUTs. As for channel coding, the processing delay measures how long it takes to encode or decode a complete TM frame. Here, the (511,484) decoder had to correct 3 bit errors. Since the (511,259) code is half rate, it uses the TM frame splitting technique discussed in Section 4.4.2.1. The (511,259) decoder corrected a maximum of 30 errors per sub-frame.
Figure 5.3: BLER plot of a (511,484) BCH code.
Fig. 5.3 shows the BLER performance of the (511, 484) code. The predicted values have been obtained from Eq. 3.1.6 in Section 3.1. They match the Matlab BLER simulation except at Eb/No = 11 dB.
The BLER of the (511, 259) code is shown in Fig. 5.4, with the predicted values again determined from Eq. 3.1.6. For this code, the FPGA, Matlab and predicted results match each other. The exception is Eb/No = 6 dB, where the Matlab result differs from the other two by exactly an order of magnitude.
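The predicted curves can be reproduced with a short C function. This assumes Eq. 3.1.6 is the usual bounded-distance expression $P_B = \sum_{i=t+1}^{n} \binom{n}{i} p^i (1-p)^{n-i}$. Here t is the number of correctable errors (t = 3 for the (511,484) code) and p is the channel bit error probability. Eq. 3.1.6 itself is not reproduced in this chapter, so this form and the function name are assumptions. To keep the sketch free of the maths library, p is passed in directly. For coherent QPSK with Gray mapping it would follow from $p = Q(\sqrt{2R\,E_b/N_o})$, with R the code rate.

```c
#include <assert.h>

/* Block error probability of a bounded-distance decoder that corrects up
 * to t errors in an n-bit block, for independent bit errors with
 * probability p:  P_B = sum_{i=t+1}^{n} C(n,i) p^i (1-p)^(n-i).
 * Terms are generated with T(i+1) = T(i) * (n-i)/(i+1) * p/(1-p), which
 * avoids huge binomial coefficients. p <= 0.5 keeps T(0) = (1-p)^n from
 * underflowing for n up to roughly 1000. */
double bch_block_error(int n, int t, double p)
{
    assert(n > 0 && t >= 0 && t < n && p > 0.0 && p <= 0.5);
    double term = 1.0;                  /* T(0) = (1-p)^n */
    for (int i = 0; i < n; i++)
        term *= (1.0 - p);

    double ratio = p / (1.0 - p);
    double tail = 0.0;
    for (int i = 0; i < n; i++) {
        term *= (double)(n - i) / (double)(i + 1) * ratio;  /* now T(i+1) */
        if (i + 1 > t)                  /* more errors than the code corrects */
            tail += term;
    }
    return tail;
}
```

For a (3, 1) toy code with p = 0.1, the function returns $3p^2(1-p) + p^3 = 0.028$, which can be checked by hand.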
5.2.3 Discussion
The logic usage figures show that the (511,484) BCH accounts for most of the logic used by channel coding in Table 5.1. At first, the logic usage of the (511,259) code may seem low, but this is relative to the 120000 LEs of the Altera FPGA. Take one LE (a four-input LUT with a register) as roughly equivalent to one flip-flop. On that basis, the encoder's 1680 LEs would correspond to about 36% of the 4656 flip-flops on the Spartan3E FPGA. Similarly, the decoder's 9960 LEs would take up almost 35% of the 28800 flip-flops on the Virtex 5 FPGA. Even though this code requires more logic, it would therefore still fit on both Xilinx platforms.
Figure 5.4: BLER plot of a (511,259) BCH code.
The block error probability predictions for the (511,484) BCH coincide with Matlab's BLER simulation in Fig. 5.3. Owing to the time constraints of this project, this code could not also be simulated on the FPGA. However, the module did pass the VHDL test bench simulations, as well as the systems integration testing of Section 5.1.1. Furthermore, the Matlab and FPGA platforms implement the same (511,484) decoder, and the Matlab simulation in Fig. 5.3 agrees with the prediction. It can therefore be deduced that the FPGA decoder is functioning correctly.
The (511,259) module clearly functions correctly, since all three sets of results in Fig. 5.4 coincide. This decoder implements the same syndrome computation, BM and Chien search algorithms as the (511,484) code, which gives further evidence that the (511,484) code functions correctly. The order-of-magnitude difference in BLER at Eb/No = 6 dB is due to too few codewords being simulated in Matlab. Estimating a BLER of $10^{-6}$ requires simulating at least a million codewords. Only 100 000 codewords were simulated in Matlab, since a million would have taken a few days to run. With 100 000 codewords, a single block error already gives a BLER of $10^{-5}$, the lowest non-zero value such a simulation can report. This explains the BLER of $10^{-5}$ at Eb/No = 6 dB for the Matlab simulation. Since the hardware processing delay is short, the FPGA easily simulated a million codewords within 3 hours.
5.3 LDPC
5.3.1 Testing
Like the (511,259) BCH modules, LDPC has not been part of the loopback testing of Fig. 5.1. There was not enough time during development of the greater project to implement bit confidence (soft-decision) measurement on the QPSK modem. Only a BLER performance comparison between the FPGA and Matlab implementations has therefore been done.
5.3.2 Results
Implementation details for the FPGA modules are given in Table 5.3. The decoder assembles a TM frame from two consecutive codewords, as discussed in Section 4.4.2.1. After the first codeword had been processed, the next one was input immediately. A maximum of 20 iterations per codeword were performed while measuring the time to process a complete TM frame.
| Component | FPGA | Logic Elements Used | Processing Delay at 100 MHz Clock |
|---|---|---|---|
| Encoder | Altera Cyclone III EP3C120F780C7N | 3000 of 120000 LEs | 2.63 µs |
| Decoder | Altera Cyclone III EP3C120F780C7N | 38880 of 120000 LEs | 151.91 µs |

Table 5.3: LDPC FPGA module implementation details.
The α scaling value of Eq. 3.4.8 in Section 3.4.2 has been determined by the Matlab simulation in Fig. 5.5. Note that a bit error rate (BER) rather than a block error rate (BLER) is used here. Fig. 5.6 compares the optimal value of α = 0.9 against no scaling, i.e. α = 1.
Scaling by α is terminated before 20 decoding iterations are reached, as discussed in Section 3.4.2. The iteration count at which scaling is terminated, denoted early termination (ET), has been determined by simulation in Matlab. Fig. 5.7 shows these results for α = 0.9, and ET = 15 clearly delivers the best BER performance. Fig. 5.8 compares this result with the absence of an ET scheme. Note the improvement of about 0.5 dB at a BER of $10^{-7}$.
With ET = 15 and α = 0.9, Fig. 5.9 compares the Matlab and FPGA results, which agree except at Eb/No = 6 dB. Finally, Fig. 5.10 compares the BER performance of the (511,259) BCH and (512,256) LDPC implementations in Matlab.
Figure 5.5: Optimal α search for the (512, 256) code.
Figure 5.6: Optimal α = 0.9 compared against α = 1.
Figure 5.7: Termination of α = 0.9 scaling at different iteration counts.
Figure 5.8: Comparison between ET = 15 iterations and no ET when using
α = 0.9.
Figure 5.9: Comparison between FPGA and Matlab simulations for α = 0.9 and
ET = 15 iterations.
Figure 5.10: Bit error rate comparison between half rate BCH and LDPC imple-
mentations from Matlab. LDPC uses ET = 15 and α = 0.9.
5.3.3 Discussion
Resource usage on the FPGA for the LDPC decoder is very high compared to
BCH. This is due to the parallelisation introduced in Section 4.4.3.2. Reducing
parallelisation by a factor of 4 would more than half its current logic usage,
therefore allowing implementation on the Virtex 5 FPGA. However, this would
increase processing delay almost a factor of four. Still, the total delay will be
less than a millisecond and therefore an order of magnitude less than the
current RTT of an ARQ packet.
Scaling by α = 0.9 delivered an improvement at Eb/No = 3 dB in Fig. 5.6. However, it also degraded performance by almost 0.1 dB between 4 and 6 dB SNR. Using the ET technique of [29] counters this degradation, and a clear improvement of 0.1 to 0.7 dB between 4 and 6 dB SNR can be seen. This improvement only requires adding a simple multiplexer to the check node processor unit, which enables or disables scaling by α.
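The check node update with switchable α scaling can be sketched as a scaled min-sum rule in C. This is an illustrative floating-point sketch under assumptions, not the thesis's fixed-point VHDL. The function name, the float message type and the 0-based iteration counter (scaling applied while iter < et) are hypothetical. α = 0.9 and ET = 15 are the values found above. The line that selects `scale` plays the role of the multiplexer.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* One check node update of a scaled min-sum LDPC decoder.
 * in[k]  : variable-to-check messages (LLRs) on the node's d edges
 * out[k] : check-to-variable messages, excluding edge k itself:
 *          out[k] = s * prod_{j != k} sign(in[j]) * min_{j != k} |in[j]|
 * s = alpha while iter < et, and s = 1 afterwards (early termination of
 * the scaling, i.e. the multiplexer switching alpha off). */
void check_node_update(const float *in, float *out, size_t d,
                       int iter, int et, float alpha)
{
    assert(in != NULL && out != NULL && d >= 2);
    float min1 = FLT_MAX, min2 = FLT_MAX;  /* smallest, second-smallest |in| */
    size_t min_idx = 0;
    int neg_count = 0;

    for (size_t k = 0; k < d; k++) {
        float mag = in[k] < 0.0f ? -in[k] : in[k];
        if (in[k] < 0.0f)
            neg_count++;
        if (mag < min1) {
            min2 = min1;
            min1 = mag;
            min_idx = k;
        } else if (mag < min2) {
            min2 = mag;
        }
    }

    float scale = (iter < et) ? alpha : 1.0f;
    for (size_t k = 0; k < d; k++) {
        /* the sign and the minimum must exclude edge k's own message */
        int neg_others = neg_count - (in[k] < 0.0f ? 1 : 0);
        float sign = (neg_others % 2 != 0) ? -1.0f : 1.0f;
        float mag = (k == min_idx) ? min2 : min1;
        out[k] = scale * sign * mag;
    }
}
```

Only the two smallest magnitudes and the overall sign parity are needed per check node, which is what keeps min-sum hardware cheap. The scaling step adds a single multiply and select.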
The exact order-of-magnitude difference between the Matlab and FPGA BLER simulations is again attributed to the duration of simulation. To complete the simulations within a reasonable time, only 100 000 codewords were processed in Matlab, so block error rates much below $10^{-5}$ cannot be resolved there. By contrast, a million codewords were processed on the FPGA within 3 hours. Finally, Fig. 5.10 clearly shows that LDPC outperforms BCH: between 3 and 5 dB SNR, its bit error rate is at least an order of magnitude lower. Since a BER of $10^{-6}$ is considered error free [17], LDPC has a 0.5 dB lead over BCH.
5.4 TM and ARQ Protocols
Unit testing took place on both the FIT-PC and SH4 platforms, since TM and ARQ use different IPC modules on these platforms. Fig. 5.11 shows the loopback test setup used on both platforms. All instances of ARQ and TM run on the same computer and connect to each other via IPC. A test application, which mimics the application layer of Fig. 3.15, connects to ARQ at A. Files are then transferred from ARQ at A to ARQ at D. This test ensures that the frame and packet processing routines of TM and ARQ behave according to the flow diagrams of Chapter 4. Files ranging from 10 Mb to 400 Mb have been transferred to check for stability issues.
[Diagram: a C test application sends a file name to ARQ instance A; ARQ A, TM B, TM C and ARQ D are linked through their TX and RX interfaces.]
Figure 5.11: TM and ARQ testing procedure.
After both the ground station and satellite platforms had been implemented, file transfers between them were interrupted at random times. Each interruption was caused either by cutting the power to one of the platforms or by terminating the TM and ARQ processes. Once the interrupt condition had been removed, the platform that had not been interrupted was expected to be fully functional. For example, a receiver is expected to discard an incomplete file and to adapt its expected sequence number on the next file transfer. Successfully transferred files were compared against their source for data corruption, using file comparison applications such as Meld in Linux. Testing continued until the program flow of both TM and ARQ coincided with the flow diagrams of Sections 4.6 and 4.7. At the end of testing, no concerns remained regarding file transfer integrity and stability.
5.5 Belgium Demonstration
An aircraft-borne test was originally scheduled for August 2011. However, due to changes in project scope, this test was cancelled. A new test was then scheduled for October 2011 in Leuven, Belgium. The SAA had previously been tested for functionality and compatibility with the platform, but the entire integrated system still had to be field tested and proven. This included the SAA, satellite platform, communication subsystems and ground station, with all constituent hardware and software. Fig. 5.12 illustrates the location of the new test, which took place on the ground.
[Map of the test site at the KUL Engineering Faculty. Legend: A: satellite starting point; B: satellite stop point; GS: ground station position.]
Figure 5.12: IS-HS 2 demo setup in Belgium.
A trolley, carrying the satellite, moved from A to B, while transmitting files to
the ground station at GS. These files contained either statistical information
or webcam images, obtained from a camera connected to the ASE’s PC, as per
Fig. 2.3. Statistical information included the trolley’s current GPS position,
CRC error counts on the uplink and other relevant SAA information. A special
register on the FPGA allowed retrieval of the current CRC error count. The
SCSS on the SH4 has been removed for this demonstration and replaced by
an application which transmits these files.
Figure 5.13: Application receiving files from ARQ on ground station FIT-PC.
Fig. 5.13 shows the ground station application that received these files. It connects to ARQ and is informed by the reporter thread (see Section 4.7) when new files arrive. Webcam images are displayed in the red dashed box. The blue dashed box shows the trolley's position on a map, updated from the received GPS coordinates. Cyclic redundancy check information is displayed in the green box. During the final demonstration, the CRC error counts were logged, as shown in Fig. 5.14.
Figure 5.14: CRC error rate while moving from A to B in Fig. 5.12.
The error rate was high between 0 and 300 seconds. The map in Fig. 5.12 shows a row of trees next to the blue line between the satellite and the ground station. Because the trees were wet, they severely attenuated the uplink's signal at the frequency concerned. This in turn led to many decoding failures in the (511,484) BCH decoder, which were then caught by the CRC module. At destination B all trees were out of sight, hence the low CRC error count between 300 and 500 seconds in Fig. 5.14.
Figure 5.15 (panels a to d): Images received on the ground station after file transfer from the satellite platform.
A few samples of the received webcam images are shown in Fig. 5.15. Low-resolution images were chosen to enable fast transfer over the link. Corrupted data are almost always visible in an image. The exact error count measurements already indicate correct operation. Beyond them, the absence of visible errors in the images is further evidence that TM and ARQ function correctly in satellite-to-ground station file transfer. Given the high CRC error rate in Fig. 5.14, the images also show that channel coding ensures data integrity. Corrupt TM frames are dropped, while error-free frames are passed on.
5.6 Summary
This chapter presented the unit and system testing procedures used to verify the correct functioning of the components described in Chapter 4. With FEC implemented on an FPGA, data processing is at least an order of magnitude faster than the RTT of an ARQ packet. Given the amount of processing required for BCH and LDPC, the FPGAs substantially reduce the processing load on both the SH4 and FIT-PC platforms. It has been confirmed that the (511,484) BCH delivers the intended error correcting performance. Both R = 0.5 BCH and LDPC are worthy competitors to the (511,484) BCH. Furthermore, properly optimising an R = 0.5 QC-LDPC enables it to outperform the equivalent R = 0.5 BCH. Finally, the successful demonstration at KUL showed that both the ground station and satellite platforms function as designed.
Chapter 6
Conclusion, Contributions
and Recommendations
6.1 Conclusion and Summary
The IS-HS 2 project is quite complex, requiring thorough integration of a relatively large number of hardware and software modules. At the commencement of this Master's project, a significant portion of the overall system had been completed, but some important gaps remained. These mostly concerned the ground-satellite file transfer mechanisms and their reliability, as well as their stable integration into the rest of the system. The research and development work documented herein therefore focused on channel coding and on implementing a suitable file transfer protocol.
The flexible modular layout of channel coding allows modules to be easily
added or removed. Other channel coding standards which adopt different
CRC and randomisation schemes can be implemented by simply replacing
the appropriate modules. This also holds true when testing and evaluating
FEC schemes other than LDPC or BCH. Interfaces of other modules utilising
channel coding would not have to be redesigned.
Channel coding ensured data integrity as required. The error rate during the Belgian demonstration was clearly very high. Due to low signal power, the (511,484) BCH decoder sometimes failed to correct all errors within a codeword. This is to be expected, and even when data containing errors passed through the decoder, the CRC detected those errors and the frame was dropped. Adding FEC to channel coding reduces the number of frames that must be retransmitted, so a communication time window can be used more efficiently. As shown by simulation, both R = 0.5 BCH and LDPC significantly lower the signal power required for reliable communication. An uncoded system requires approximately Eb/No = 16 dB for a block error rate of $10^{-6}$. By contrast, LDPC requires Eb/No = 6 dB for the same block error rate, an improvement of 10 dB.
The FEC simulations also confirmed the finding in [1] that LDPC outperforms BCH. Properly optimising the low-hardware-complexity min-sum (MS) decoder allows it to match BCH's performance at 6 dB SNR, and at SNRs below 6 dB it shows a clear improvement over BCH. Reducing the parallelisation of both the BCH and LDPC implementations would allow implementation on smaller FPGAs, reducing the total system cost. The worst case of five transmissions per ARQ packet (four time-outs followed by one successful round trip) has been shown to take about 460 ms per packet. This is still sufficient for transferring 10 kB files during a 15-minute communications window. Using no parallelisation, it is estimated that both decoders would introduce a latency of no more than 10 ms. The parallelisation of both the BCH and LDPC decoders can therefore be reduced without affecting the required data throughput.
Interprocess communication has been separated from the frame and packet processing routines of both TM and ARQ. Only the IPC module therefore has to be changed when switching between Windows and Linux Ubuntu on a ground station. Furthermore, the message passing API hides shared memory and semaphore details from the user. It thus provides a simple way of connecting two OSI layers to exchange data.
Finally, the protocol layers ensured reliable file transfer between a ground station and the satellite platform. The simple stop-and-wait ARQ strategy assembled files correctly, even when many lost packets had to be retransmitted. Beyond the TM and ARQ testing procedures, reliable file transfer was confirmed when photos were transferred without error during the demonstration in Belgium. A very satisfying outcome was the field-proven ability of the coding and protocol combination to transfer data reliably under quite adverse practical conditions.
6.2 Contributions to the Project
This thesis documents specific inputs and contributions to the IS-HS 2 project, as well as a number of interesting results of a more general nature. Contributions specific to the project are:
• Provided the software and firmware components necessary for final com-
munication system integration on both satellite and ground station plat-
forms.
• Added reliable and efficient file transfer mechanisms to the communica-
tions channel.
• Designed specific subunit and integration testing procedures to verify the
implemented strategies.
Further contributions of a more general nature include:
• Confirmed the findings of previous investigations that BCH and LDPC
are viable FEC strategies for this type of project, to a high degree of
confidence.
• Designed a parallelisable FPGA architecture for both BCH and LDPC
decoders.
• Determined speed and complexity trade-offs by adjusting the amount of
parallelism in the particular FEC scheme.
• Implemented a robust FEC performance verification strategy, using a combination of Matlab and FPGA results to establish the functionality of the FPGA design beyond doubt. The same strategy is also valuable for identifying faulty implementations.
• Provided TM and ARQ functionality to their respective layers as described by the OSI specifications, thereby making the scheme available for wider applications.
• Developed a simple message passing API for operating systems other than QNX.
• Enabled easy porting of TM and ARQ to operating systems other than QNX or Linux Ubuntu, by separating interprocess communication from the service provided by TM and ARQ.
• Implemented a TM protocol according to ECSS specifications. This can
be used with any other transport layer protocol, as no packet specific
information from that layer is required by TM.
• Implemented an elegantly simple stop-and-wait ARQ which provided
reliable and sufficient data throughput for this project, but again has
wider application under poor communications environments.
6.3 Recommendations
The following recommendations are presented for possible improvements of the current system design, as well as for a next-generation design:
• Use higher girth construction techniques for LDPC’s parity check ma-
trix. High girth reduces the number of iterations required for successful
decoding. This is useful when reducing the processing latency for low
complexity, non-parallel LDPC decoders.
• Program crashes happen frequently when developing application layer software. They tend to leave ARQ's IPC structures in an unknown state, after which all OSI layers have to be restarted manually. Self-restarting routines could be added to IPC for when it loses connection with another layer.
• Move TM and ARQ parameters, such as time-out settings, to configuration files. These parameters could then be tweaked during operation to optimise protocol performance for the current situation. Currently these parameters are set at compile time.
• At the time of writing this thesis, the satellite concept for this project
has been cancelled. All channel coding components could be ported from
FPGA to software running on a modern powerful PC. Removing FPGAs
and the expensive SH4 will reduce system cost and hardware complexity.
• Move the SDR modem from a DSP to the SAA’s FPGA. This would
further reduce hardware complexity and system cost.
• Remove IPC between the TM and ARQ layers, as it adds unnecessary complexity to the software layers. The layers used for a particular implementation are chosen before development starts, so dynamic addition or removal of layers is not really required. Using C preprocessor directives, only the necessary layers can be compiled to run as one process.
• Investigate new construction techniques for the parity check matrix of
QC-LDPC. Some techniques allow the minimum Hamming distance for
QC-LDPC to grow linearly as the block length increases. Larger block
lengths would lead to better BLER performance.
• Consider the use of FPGA-based FEC simulators. Such a simulator would avoid the slow serial link between the PC and the FPGA used by the simulator in this thesis. Hardware simulators also tend to be much faster than software simulators. Design tools from Xilinx, such as AccelDSP, help in translating Matlab implementations quickly to VHDL.
• A new initiative between SU and KUL would investigate the application of the SAA concept to mobile communications.
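The recommendation to move TM and ARQ parameters into configuration files could start from something as small as the following C sketch. The file format (one `key = value` pair per line, `#` for comments) is hypothetical. So are the key names arq_timeout_ms and arq_max_retries and the function names. The defaults are the compile-time values used in Chapter 4 (100 ms time-out, 5 retransmissions), and both parameters must be positive in this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical run-time protocol parameters. */
typedef struct {
    int arq_timeout_ms;    /* ARQ retransmission time-out */
    int arq_max_retries;   /* retransmissions before a file is aborted */
} proto_config_t;

/* Defaults equal to the compile-time values used in Chapter 4. */
void config_defaults(proto_config_t *c)
{
    c->arq_timeout_ms = 100;
    c->arq_max_retries = 5;
}

/* Parses one "key = value" line. Blank lines and '#' comments are ignored.
 * Returns 0 on success, -1 for an unknown key or a non-positive value. */
int config_parse_line(proto_config_t *c, const char *line)
{
    char key[64];
    int value;

    assert(c != NULL && line != NULL);
    while (*line == ' ' || *line == '\t')
        line++;
    if (*line == '\0' || *line == '\n' || *line == '\r' || *line == '#')
        return 0;
    if (sscanf(line, "%63[^= \t] = %d", key, &value) != 2 || value <= 0)
        return -1;
    if (strcmp(key, "arq_timeout_ms") == 0)
        c->arq_timeout_ms = value;
    else if (strcmp(key, "arq_max_retries") == 0)
        c->arq_max_retries = value;
    else
        return -1;
    return 0;
}

/* Applies a configuration file on top of the current values. Returns -1 if
 * the file cannot be opened (values unchanged) or any line is invalid. */
int config_load(proto_config_t *c, const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char buf[256];
    int rc = 0;
    while (fgets(buf, sizeof buf, f) != NULL)
        if (config_parse_line(c, buf) != 0)
            rc = -1;
    fclose(f);
    return rc;
}
```

Calling config_load() again during operation would allow the parameters to be tuned on the fly, as the recommendation suggests. One possible trigger is a command received over IPC, although that reload mechanism is likewise an assumption.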
References
[1] F. Olivier, “An LDPC Error Control Strategy for Low Earth Orbit Satellite Communication Link Applications,” Master’s thesis, University of Stellenbosch, 2009.
[2] L. Zhou, “Implementation of the Berlekamp-Massey algorithm and Peterson’s algorithm in C programming language,” Apr 2007, ECE Dept., University of Toronto.
[3] S. Lin and D. Costello, Error Control Coding: Fundamentals and Applications. Prentice-Hall, Inc., 1983.
[4] Z. Li, L. Chen, L. Zeng, S. Lin, and W. H. Fong, “Efficient Encoding of
Quasi-Cyclic Low-Density Parity-Check Codes,” IEEE Transactions on
Communications, vol. 54, no. 1, pp. 71–81, Jan 2006.
[5] B. R. Elbert, The Satellite Communication Applications Handbook.
Artech House, 1997.
[6] R. A. Nelson, “A Primer on Satellite Communications,” Via Satellite,
1998.
[7] I. Kruger, “An aircraft based emulation platform and control model for
LEO satellite antenna beam steering,” Master’s thesis, University of Stel-
lenbosch, 2010.
[8] SH7750, SH7750S, SH7750R Group, 7th ed., Renesas Electronics Corpo-
ration, Oct 2008.
[9] (2009, Dec.) Internetworking Basics. [Online]. Available: http://docwiki.cisco.com/wiki/Internetworking_Basics
[10] Silberschatz et al., Operating System Concepts, 7th ed. John Wiley &
Sons, 2005.
[11] System Architecture, 5th ed., QNX Software Systems, May 2004.
[12] M. Beck et al., Linux Kernel Internals. Addison Wesley Longman, 1996.
[13] K. Robbins and S. Robbins, Unix Systems Programming. Prentice Hall,
2003.
[14] (2011, Nov) mq_overview(7) - Linux man page. [Online]. Available:
http://linux.die.net/man/7/mq_overview
[15] T. Moon, Error Correction Coding. John Wiley & Sons, 2005.
[16] P. Sweeney, Error Control Coding: From Theory to Practice. John Wiley & Sons, 2004.
[17] B. P. Lathi, Modern Digital and Analog Communication Systems, 3rd ed.
New York: Oxford University Press, Inc., 1998.
[18] “DVB Fact Sheet: 2nd Generation Satellite,” Digital Video Broadcast, Tech. Rep., Sep 2010.
[19] ECSS, “Space engineering: Space data links - Telecommand protocols, synchronisation and channel coding,” European Space Agency, Tech. Rep. ECSS-C-50-04A, Nov 2007.
[20] R. G. Gallager, “Low-Density Parity-Check Codes,” Ph.D. dissertation,
Cambridge, Mass, Jul 1963.
[21] D. J. MacKay, “Good Error-Correcting Codes Based On Very Sparse Matrices,” IEEE Transactions on Information Theory, vol. 45, no. 2, pp. 399–431, Mar 1999.
[22] D. MacKay and R. Neal, “Near Shannon Limit Performance of Low Density Parity Check Codes,” Electronics Letters, vol. 33, no. 6, pp. 457–458, Mar 1997.
[23] N. Wiberg, “Codes and Decoding on General Graphs,” Ph.D. dissertation,
Dept. Elect. Eng. Linköping Univ., Linköping, Sweden, Oct 1996.
[24] “IEEE Standard for Information Technology - Part 11: Wireless LAN Medium Access Control and Physical Layer Specifications,” IEEE, Tech. Rep. 802.11-2007, Jun 2007.
[25] T. Lestable, E. Zimmerman, M.-H. Hamon, and S. Stiglmayr, “Block-
LDPC Codes Vs Duo-Binary Turbo-Codes for European Next Generation
Wireless System,” IEEE, Feb 2007.
[26] Turbo Codes. [Online]. Available: http://www.francetelecom.com/en_EN/innovation/intellectual_property/turbo_codes
[27] W. E. Ryan, “An Introduction to LDPC Codes,” Dept. Elec. Eng. and
Comp. Eng., Univ. Arizona, University of Arizona, Box 210104, Tucson,
AZ 85721, Tech. Rep., Aug 2003.
[28] J. Lu and M. Moura, “Structured LDPC Codes for High-Density Recording: Large Girth and Low Error Floor,” IEEE Transactions on Magnetics, vol. 42, no. 2, pp. 208–213, Feb 2006.
[29] Efficient LDPC Decoder Implementation for DVB-S2 System, Apr 2010.
[30] T. Richardson and R. Urbanke, “Efficient Encoding of Low-Density
Parity-Check Codes,” IEEE Transactions on Information Theory, vol. 47,
no. 2, pp. 638–656, Feb 2001.
[31] R. Ziemer and W. Tranter, Principles of Communications, 5th ed. John
Wiley & Sons, 2002.
[32] H. Killen, Digital Communications with Fiber Optics and Satellite Applications. Prentice-Hall, Inc., 1988.
[33] P. Z. Peebles, Probability, Random Variables and Random Signal Principles. McGraw-Hill Book Co, 2001.
[34] S. J. Johnson, Iterative Error Correction: Turbo, Low-Density Parity-Check and Repeat-Accumulate Codes. Cambridge University Press, Nov 2009.
[35] J. Botha, “A Reusable Signal Processing Architecture for Satellite based
Communication Systems,” Master’s thesis, University of Stellenbosch,
2011.
[36] H. Aliakbarian, V. Volski, and G. Vandenbosch, “Maximum gain of the
antenna,” University of Leuven, Tech. Rep., 2009.
[37] H. Aliakbarian, “An Estimation of SAA Sensitivity,” University of Leuven,
Tech. Rep., 2009.
[38] ECSS, “Space engineering: Communications Guidelines,” European Space Agency, Tech. Rep. ECSS-E-HB-50A draft 1.4, Apr 2008.
[39] S. J. Johnson. (2008, Apr) Introducing Low-Density Parity-Check Codes. [Online]. Available: http://materias.fi.uba.ar/6624/index_files/outline_archivos/SJohnsonLDPCintro.pdf
[40] C. A. Cole, E. K. Hall, S. G. Wilson, and T. R. Giallorenzi, “Analysis
and design of moderate length regular LDPC codes with low error floors,”
Univ. of Virginia and L-3 Communications, Tech. Rep., May 2006.
[41] C. A. Cole, S. G. Wilson, E. K. Hall, and T. R. Giallorenzi, “Regular
(4,8) LDPC codes and their low error floors,” Univ. of Virginia and L-3
Communications, Tech. Rep., May 2006.
[42] L. Sun, H. Song, V. Kumar, and Z. Keirn, “Field-Programmable Gate-Array-Based Investigation of the Error Floor of Low-Density Parity Check Codes for Magnetic Recording Channels,” IEEE Transactions on Magnetics, vol. 41, no. 10, pp. 2983–2985, Oct 2005.
[43] Y. Han and W. E. Ryan, “LDPC Decoder Strategies for Achieving Low
Error Floors,” Dept. Elec. Eng. and Comp. Eng., Univ. Arizona, Tech.
Rep., Dec 2007.
[44] S. J. Johnson and S. R. Weller, “A Family of Irregular LDPC Codes With
Low Encoding Complexity,” IEEE Communications Letters, vol. 7, no. 2,
pp. 79–81, Feb 2003.
[45] M. P. C. Fossorier, “Quasi-Cyclic Low-Density Parity-Check Codes From
Circulant Permutation Matrices,” IEEE Transactions on Information
Theory, vol. 50, no. 8, pp. 1788–1793, Aug 2004.
[46] S. Myung, K. Yang, and J. Kim, “Quasi-Cyclic LDPC Codes for Fast
Encoding,” IEEE Transactions on Information Theory, vol. 51, no. 8, pp.
2894–2901, Aug 2005.
[47] W. Stallings, Data and Computer Communications, 7th ed. Pearson
Prentice Hall, 2004.
[48] Synthesis and Simulation Design Guide, v11.4 ed., Xilinx, Dec 2009.
[49] S. Ruckmani and P. Angbalagan, “High Speed Cyclic Redundancy Check
for USB,” DSP Journal, vol. 6, no. 1, pp. 45–50, Sep 2006.
[50] L. W. Couch II, Digital and Analog Communication Systems, 7th ed. Pearson Prentice Hall, 2007.
Appendices
Appendix A
Mathematical Derivations
A.1 QPSK Bit Error Probability Analysis
Fig. A.1a shows a QPSK signal constellation with four two-bit symbols, S1 to S4. The symbols are indicated by the black dots on the axes. The in-phase axis is labelled I and the quadrature axis Q. Each symbol differs in phase by $\pi/2$ radians from its neighbours. If phase noise causes a received symbol's phase to deviate by more than $\pi/4$ radians, the modem makes an error. The dashed lines represent the decision boundaries that a symbol may not cross without being mistaken for another symbol. For example, if the phase of symbol S1 deviates into the grey area of Fig. A.1b, it will be mistaken for another symbol. Since all symbols are equally affected by noise, the BEP analysis that follows uses S1 as an example.
(a) A QPSK signal constellation: symbols S1 to S4 (bit pairs 00, 10, 11 and 01) on the I and Q axes, with decision boundaries at $\pi/4$ on either side of each symbol. (b) Error region for symbol S1 in Fig. A.1a.
Figure A.1: QPSK symbol decision and error regions.
Eq. A.1.1 shows the noise vector $n(t)$ [17], consisting of in-phase and quadrature components $n_i(t)$ and $n_q(t)$ respectively:

$$n(t) = n_i(t) + n_q(t) \tag{A.1.1}$$
Adding $n(t)$ to symbol S1, represented by the phasor $y_{S1}(t) = Ae^{j\omega t}$, results in the noisy symbol Snoise with phasor $y_{noise}(t) = Ee^{j(\omega t+\theta)}$, as shown in Fig. A.2a. The scalar $E$ is the amplitude of S1 with added amplitude noise, while $\theta$ is the phase noise added by $n(t)$.
(a) Gaussian white noise $n$, with components $n_c$ and $n_s$, added to S1 of amplitude $A$, giving Snoise of amplitude $E$ and phase $\theta$ relative to the decision boundary at $\pi/4$. (b) Unit area of integration, $dE$ by $d\theta$, when using polar coordinates.
Figure A.2: Vector representation of Gaussian noise added to symbol S1.
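Since $n_i(t)$ and $n_q(t)$ of Eq. A.1.1 are statistically independent zero-mean Gaussian random variables, each with variance $\sigma^2$, we have:

$$\begin{aligned}
\rho(n_i,n_q) &= \rho(n_i)\,\rho(n_q)\\
&= \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{n_i^2}{2\sigma^2}}\cdot\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{n_q^2}{2\sigma^2}}\\
&= \frac{1}{2\pi\sigma^2}e^{-\frac{n_i^2+n_q^2}{2\sigma^2}}
\end{aligned}\tag{A.1.2}$$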
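The term $\rho(n_i,n_q)$ is a joint Gaussian PDF. To write Eq. A.1.2 in terms of the amplitude $E$ and the phase noise $\theta$, note that $A + n_i = E\cos\theta$ is the in-phase component of Snoise in Fig. A.2a, and show the following:

$$\begin{aligned}
E^2 &= (A+n_i)^2 + n_q^2\\
n_i^2 + n_q^2 &= E^2 - A^2 - 2An_i\\
&= E^2 - 2A(A+n_i) + A^2\\
&= E^2 - 2AE\cos\theta + A^2
\end{aligned}\tag{A.1.3}$$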
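As a quick numerical sanity check of Eq. A.1.3, the sketch below draws Gaussian noise samples, forms the noisy phasor $A + n_i + jn_q$, converts it to polar form $(E, \theta)$ and confirms that $n_i^2 + n_q^2$ matches $E^2 - 2AE\cos\theta + A^2$. The amplitude, noise level, trial count and seed are hypothetical illustration values, not parameters from this work.

```python
import math
import random


def polar_from_noise(A, ni, nq):
    """Return (E, theta) for the noisy phasor A + ni + j*nq."""
    E = math.hypot(A + ni, nq)          # amplitude including amplitude noise
    theta = math.atan2(nq, A + ni)      # phase noise relative to S1
    return E, theta


def max_identity_residual(A=1.0, sigma=0.5, trials=1000, seed=1):
    """Largest |lhs - rhs| of Eq. A.1.3 over random noise samples."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        ni = rng.gauss(0.0, sigma)
        nq = rng.gauss(0.0, sigma)
        E, theta = polar_from_noise(A, ni, nq)
        lhs = ni**2 + nq**2
        rhs = E**2 - 2 * A * E * math.cos(theta) + A**2
        worst = max(worst, abs(lhs - rhs))
    return worst


print(max_identity_residual())  # only floating-point round-off remains
```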
Substituting Eq. A.1.3 into Eq. A.1.2 leads to:
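$$\begin{aligned}
\rho(\theta,E) &= \left(\frac{1}{2\pi\sigma^2}\right)e^{-\frac{E^2-2AE\cos\theta+A^2}{2\sigma^2}}\\
&= \left(\frac{1}{2\pi\sigma^2}e^{-\frac{A^2}{2\sigma^2}}\right)e^{-\frac{E^2-2AE\cos\theta}{2\sigma^2}}
\end{aligned}\tag{A.1.4}$$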
Symbol Snoise in Fig. A.2a will be interpreted by the modem as S1 as long as it stays in the white area of Fig. A.1b. Integrating the PDF of Eq. A.1.4 over this area gives the probability of Snoise falling in this region, and hence the probability of receiving S1 successfully. Using polar coordinates and the unit area in Fig. A.2b, whose area element is $E\,d\theta\,dE$, leads to the following:

$$\begin{aligned}
P_{\mathrm{success},S1,\mathrm{unit}} &= \rho(\theta,E)\,E\,d\theta\,dE\\
P_{\mathrm{success},S1} &= \int_{-\pi/4}^{\pi/4}\left(\int_0^\infty \rho(\theta,E)\,E\,dE\right)d\theta
\end{aligned}\tag{A.1.5}$$
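Eq. A.1.5 is the probability that the phase of the noisy phasor stays inside the wedge $|\theta| \le \pi/4$. A minimal Monte Carlo sketch, under the same independent Gaussian noise assumption, estimates this probability directly by simulation. The amplitude, noise level, trial count and seed are hypothetical choices for illustration.

```python
import math
import random


def p_success_monte_carlo(A, sigma, trials=200_000, seed=2012):
    """Estimate P(|theta| <= pi/4) for S1 = A plus Gaussian I/Q noise."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        ni = rng.gauss(0.0, sigma)      # in-phase noise component
        nq = rng.gauss(0.0, sigma)      # quadrature noise component
        theta = math.atan2(nq, A + ni)  # phase of the received phasor
        if abs(theta) <= math.pi / 4:   # inside the white (correct) region
            hits += 1
    return hits / trials


print(p_success_monte_carlo(A=1.0, sigma=0.5))
```

Because the estimate is statistical, its standard error shrinks only as $1/\sqrt{N}$ in the number of trials $N$; the analytical route that evaluates the integrals avoids this.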
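Evaluating the bracketed inner integral of Eq. A.1.5 marginalises the PDF of Eq. A.1.4 over $E$, leaving the PDF of the phase noise $\theta$ alone:

$$\begin{aligned}
\rho_\theta(\theta) &= \int_0^\infty \rho_{\theta E}(\theta,E)\,E\,dE\\
&= \frac{1}{2\pi\sigma^2}e^{-\frac{A^2}{2\sigma^2}}\int_0^\infty e^{-\frac{E^2-2AE\cos\theta}{2\sigma^2}}\,E\,dE\\
&= C_0\int_0^\infty e^{-\frac{E^2-2AE\cos\theta}{2\sigma^2}}\,E\,dE
\end{aligned}\tag{A.1.6}$$

where

$$C_0 = \frac{1}{2\pi\sigma^2}e^{-\frac{A^2}{2\sigma^2}} \tag{A.1.7}$$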
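Rearranging Eq. A.1.6 so that the first integrand becomes the derivative of the exponential with respect to $E$, we have:

$$\begin{aligned}
\rho_\theta(\theta) &= \left(-2\sigma^2C_0\right)\left(\tfrac{1}{2}\right)\int_0^\infty\left(\frac{-2E+2A\cos\theta}{2\sigma^2}\right)e^{\frac{-E^2+2AE\cos\theta}{2\sigma^2}}\,dE\\
&\quad+\left(2\sigma^2C_0\right)\left(\tfrac{1}{2}\right)\int_0^\infty\left(\frac{2A\cos\theta}{2\sigma^2}\right)e^{\frac{-E^2+2AE\cos\theta}{2\sigma^2}}\,dE\\
&= \left(-\sigma^2C_0\right)\left.e^{\frac{-E^2+2AE\cos\theta}{2\sigma^2}}\right|_0^\infty + \left(AC_0\cos\theta\right)\int_0^\infty e^{\frac{-E^2+2AE\cos\theta}{2\sigma^2}}\,dE\\
&= \left(-\sigma^2C_0\right)(0-1) + \left(AC_0\cos\theta\right)\int_0^\infty e^{\frac{-E^2+2AE\cos\theta}{2\sigma^2}}\,dE\\
&= \sigma^2C_0 + \left(AC_0\cos\theta\right)\int_0^\infty e^{-\frac{E^2-2AE\cos\theta+A^2\cos^2\theta-A^2\cos^2\theta}{2\sigma^2}}\,dE\\
&= \sigma^2C_0 + \left(AC_0\cos\theta\right)\int_0^\infty e^{-\frac{(E-A\cos\theta)^2-A^2\cos^2\theta}{2\sigma^2}}\,dE\\
&= \sigma^2C_0 + \left(AC_0\cos\theta\right)e^{\frac{A^2\cos^2\theta}{2\sigma^2}}\int_0^\infty e^{-\frac{(E-A\cos\theta)^2}{2\sigma^2}}\,dE
\end{aligned}\tag{A.1.8}$$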
Now let $y = (E - A\cos\theta)/\sigma$; then:

$$\frac{dy}{dE} = \frac{1}{\sigma} \quad\Rightarrow\quad dE = \sigma\,dy \tag{A.1.9}$$
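Applying the transformation of Eq. A.1.9 to Eq. A.1.8, where the lower limit $E = 0$ becomes $y = -A\cos\theta/\sigma$, gives:

$$\begin{aligned}
\rho_\theta(\theta) &= \sigma^2C_0 + \left(\sigma AC_0\cos\theta\right)e^{\frac{A^2\cos^2\theta}{2\sigma^2}}\int_{-A\cos\theta/\sigma}^{\infty}e^{-\frac{y^2}{2}}\,dy\\
&= \sigma^2C_0\left[1 + \left(\frac{\sigma A\cos\theta}{\sigma^2}\right)e^{\frac{A^2\cos^2\theta}{2\sigma^2}}\left(\frac{\sqrt{2\pi}}{1}\cdot\frac{1}{\sqrt{2\pi}}\right)\int_{-A\cos\theta/\sigma}^{\infty}e^{-\frac{y^2}{2}}\,dy\right]\\
&= \sigma^2C_0\left[1 + \left(\frac{A\cos\theta}{\sigma}\right)e^{\frac{A^2\cos^2\theta}{2\sigma^2}}\sqrt{2\pi}\left(1 - Q\!\left(\frac{A\cos\theta}{\sigma}\right)\right)\right]
\end{aligned}\tag{A.1.10}$$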
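where

$$Q(x) = \frac{1}{\sqrt{2\pi}}\int_x^\infty e^{-\frac{a^2}{2}}\,da \tag{A.1.11}$$

and the symmetry of the Gaussian PDF gives $\frac{1}{\sqrt{2\pi}}\int_{-x}^{\infty}e^{-y^2/2}\,dy = 1 - Q(x)$.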
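The closed-form phase PDF of Eqs. A.1.10 and A.1.11 can be evaluated directly. The sketch below implements it in Python, writing $Q(x) = \tfrac{1}{2}\operatorname{erfc}(x/\sqrt{2})$ and merging $C_0$ with $e^{A^2\cos^2\theta/(2\sigma^2)}$ into the single factor $e^{-A^2\sin^2\theta/(2\sigma^2)}/(2\pi\sigma^2)$ so that large $A/\sigma$ does not overflow. As a check, it integrates the PDF over a full turn with Simpson's rule (a choice made for this sketch, not a method from this work) and confirms that the PDF reduces to the uniform density $1/(2\pi)$ when $A = 0$. The values of $A$ and $\sigma$ are hypothetical.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x) of Eq. A.1.11."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def phase_pdf(theta, A, sigma):
    """Phase-noise PDF rho_theta(theta) of Eq. A.1.10."""
    snr = A / sigma
    x = snr * math.cos(theta)
    # sigma^2 * C0 = exp(-A^2 / (2 sigma^2)) / (2 pi), from Eq. A.1.7
    first = math.exp(-snr**2 / 2) / (2 * math.pi)
    # sigma^2 * C0 * exp(A^2 cos^2 / (2 sigma^2)) = exp(-A^2 sin^2 / (2 sigma^2)) / (2 pi)
    envelope = math.exp(-(snr * math.sin(theta)) ** 2 / 2) / (2 * math.pi)
    second = envelope * x * math.sqrt(2 * math.pi) * (1 - q_func(x))
    return first + second


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


area = simpson(lambda t: phase_pdf(t, A=1.0, sigma=0.5), -math.pi, math.pi)
print(area)  # a valid PDF integrates to 1 over [-pi, pi]
```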
Substituting Eq. A.1.10 into Eq. A.1.5 gives:

$$P_{\mathrm{success},S1} = \int_{-\pi/4}^{\pi/4}\rho_\theta(\theta)\,d\theta \tag{A.1.12}$$
Only the phase noise matters in the BEP analysis of QPSK, since the symbols differ only in phase. Note that $P_{\mathrm{success},S1}$ is the probability of successfully receiving a symbol in the presence of phase noise $\theta$. Using Eq. A.1.12, the symbol error probability is:

$$\begin{aligned}
P_{\mathrm{error},S1} &= 1 - P_{\mathrm{success},S1}\\
&= 1 - \int_{-\pi/4}^{\pi/4}\rho_\theta(\theta)\,d\theta
\end{aligned}\tag{A.1.13}$$
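Eqs. A.1.12 and A.1.13 reduce the symbol error probability to a one-dimensional integral of $\rho_\theta(\theta)$ over $[-\pi/4, \pi/4]$. The sketch below evaluates it numerically with Simpson's rule (an assumption of this sketch). As an independent cross-check, a $45^\circ$ rotation turns the wedge $|\theta| < \pi/4$ into a quadrant in which the two rotated noise components remain independent with variance $\sigma^2$ and S1 lies at $(A/\sqrt{2}, A/\sqrt{2})$, so $P_{\mathrm{success},S1} = \left(1 - Q\!\left(A/(\sqrt{2}\sigma)\right)\right)^2$. The listed values of $A/\sigma$ are hypothetical.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x) of Eq. A.1.11."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def phase_pdf(theta, A, sigma):
    """Phase-noise PDF of Eq. A.1.10 (exponentials merged to avoid overflow)."""
    snr = A / sigma
    x = snr * math.cos(theta)
    first = math.exp(-snr**2 / 2) / (2 * math.pi)
    envelope = math.exp(-(snr * math.sin(theta)) ** 2 / 2) / (2 * math.pi)
    return first + envelope * x * math.sqrt(2 * math.pi) * (1 - q_func(x))


def symbol_error_probability(A, sigma, n=2000):
    """Eq. A.1.13: 1 minus the Simpson integral of rho_theta over [-pi/4, pi/4]."""
    a, b = -math.pi / 4, math.pi / 4
    h = (b - a) / n
    total = phase_pdf(a, A, sigma) + phase_pdf(b, A, sigma)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * phase_pdf(a + k * h, A, sigma)
    return 1 - total * h / 3


for ratio in (0.5, 1.0, 2.0, 4.0):  # hypothetical A / sigma values
    numeric = symbol_error_probability(ratio, 1.0)
    closed = 1 - (1 - q_func(ratio / math.sqrt(2))) ** 2
    print(f"A/sigma={ratio}: numeric={numeric:.6e}, rotated quadrant={closed:.6e}")
```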
However, this analysis requires a BEP rather than a symbol error probability. Most symbol errors occur when the modem mistakes the correct symbol for one of its neighbours [17]. As Fig. A.3 shows, the symbols are arranged so that each differs from its neighbours in only one bit. This is called Gray coding [17], a technique that minimises the number of bit errors per symbol error.
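(Symbols S1 to S4 on the I and Q axes, labelled with the bit pairs 00, 01, 10 and 11 so that neighbouring symbols differ in one bit; the bit-change probability $P_{bep}$ is indicated between neighbours.)
Figure A.3: Gray coding scheme for the symbols of a QPSK constellation.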
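Fig. A.3 also assumes that each bit of a symbol is equally likely, with probability $P_{bep}$, to change its value. Since, with Gray coding, an error to a neighbouring symbol corrupts exactly one of its two bits, this leads to the following BEP:

$$P_{bep} = \frac{P_{\mathrm{error},S1}}{2} \tag{A.1.14}$$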
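Eq. A.1.14 can be compared with an exact per-bit result within the same noise model. Rotating the constellation by $45^\circ$ turns the decision wedge $|\theta| < \pi/4$ into a quadrant, in which each of the two Gray-coded bits is decided by one independent rotated noise component. Each bit is then wrong with probability $q = Q(A/(\sqrt{2}\sigma))$, and $P_{\mathrm{error},S1} = 1 - (1-q)^2 = 2q - q^2$, so Eq. A.1.14 gives $q - q^2/2$. The minimal sketch below, using hypothetical values of $A/\sigma$, compares the two.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x) of Eq. A.1.11."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bep_eq_a114(A, sigma):
    """P_bep = P_error / 2 (Eq. A.1.14), using the rotated-quadrant P_error."""
    q = q_func(A / (math.sqrt(2) * sigma))
    p_error = 1 - (1 - q) ** 2
    return p_error / 2


def bep_exact_gray(A, sigma):
    """Per-bit error probability when each bit is one independent decision."""
    return q_func(A / (math.sqrt(2) * sigma))


for ratio in (1.0, 2.0, 4.0, 6.0):  # hypothetical A / sigma values
    approx, exact = bep_eq_a114(ratio, 1.0), bep_exact_gray(ratio, 1.0)
    print(f"A/sigma={ratio}: Eq. A.1.14={approx:.4e}, exact={exact:.4e}")
```

The two differ by exactly $q^2/2$, so halving the symbol error probability is accurate whenever $P_{bep}$ is small.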
An FEC codeword consists of a sequence of bits, as discussed in Chapter 2, Section 2.5. Assuming independent bit errors, the bit error probability of Eq. A.1.14 can be used to determine the length of such a codeword by means of Bernoulli trials. For a $t$-error-correcting code with $n$-bit codewords, the codeword error probability is:

$$P_{cep} = \sum_{i=t+1}^{n}P(i,n) \tag{A.1.15}$$
Since the FEC can correct up to $t$ errors, only the occurrence of more than $t$ bit errors affects the codeword error probability. The term $P(i,n)$ of Eq. A.1.15 is therefore the probability of making exactly $i > t$ errors in an $n$-bit codeword. Since there are $\binom{n}{i}$ ways to make $i$ errors in a codeword [17], $P(i,n)$ can be written as:
$$P(i,n) = \binom{n}{i}P_{bep}^{\,i}\left(1-P_{bep}\right)^{n-i} \tag{A.1.16}$$
Eq. A.1.16 is the binomial probability of $i$ errors in $n$ independent Bernoulli trials [33]. Substituting Eq. A.1.16 into Eq. A.1.15 gives the final codeword error probability:
$$P_{cep} = \sum_{i=t+1}^{n}\binom{n}{i}P_{bep}^{\,i}\left(1-P_{bep}\right)^{n-i} \tag{A.1.17}$$
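Eq. A.1.17 is straightforward to evaluate with exact binomial coefficients. The minimal sketch below does so in Python; the code length $n = 7$, correction capability $t = 1$ and bit error probability in the example are hypothetical illustration values, not parameters of the code selected in this work.

```python
import math


def codeword_error_probability(n, t, p_bep):
    """Eq. A.1.17: probability of more than t bit errors in an n-bit codeword,
    assuming independent bit errors, each with probability p_bep."""
    return sum(
        math.comb(n, i) * p_bep**i * (1 - p_bep) ** (n - i)
        for i in range(t + 1, n + 1)
    )


# Hypothetical example: a 7-bit codeword that corrects t = 1 error
print(codeword_error_probability(7, 1, 1e-2))
```

The complementary form $1 - \sum_{i=0}^{t}$ gives the same value with fewer terms when $t \ll n$, but it loses relative precision when $P_{cep}$ is very small.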
A.2 Signal-to-Noise Ratio for Simulations
This section derives the relationship between the signal-to-noise ratios $A^2/\sigma^2$ and $E_b/N_o$ for QPSK. The latter is the ratio of energy per bit to noise power spectral density, whereas the former relates average signal power to average noise power. Fig. A.4 shows a time domain binary phase shift keying (BPSK) signal.
(BPSK signal $f_{bpsk}(t)$ of amplitude $A$ plotted against time $t$ over successive bit intervals of length $T_b$.)
Figure A.4: Time domain BPSK signal.
$T_b$ denotes the transmission time per BPSK symbol. Note that a BPSK symbol is equivalent to a bit. A single bit is represented by:

$$f_{bit}(t) = \Pi\!\left(\frac{t - nT_b/2}{T_b}\right)\times A\sin(2\pi f_0 t + \theta),\quad n > 0,\ n\ \text{odd} \tag{A.2.1}$$
where $\Pi$ is a time-shifted rectangular pulse of width $T_b$ and unit amplitude. Taking the Fourier transform of $f_{bit}(t)$ gives the frequency spectrum of a BPSK bit, shown in Fig. A.5.
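(Magnitude spectrum $|h_{bpsk}(f)|$ with main lobes of peak height $AT_b/2$ centred at $\pm f_0$, each extending a bandwidth $B$ on either side of the carrier.)
Figure A.5: Frequency domain of a BPSK bit.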
The Fourier transform of $\Pi$ in Eq. A.2.1 is a $\sin(x)/x$ function. Although this function has infinite bandwidth, most of its power lies in the main lobe, so its effective bandwidth is $B = 1/T_b$ [17], as shown in Fig. A.5. Noise is also present on this signal when it enters the receiver. Placing a bandpass filter (BPF) of bandwidth $2B$, unity gain and centre frequency $f_0$ at the receiver limits this noise power. The average power of a sinusoid of amplitude $A$ is $A^2/2$. In an AWGN channel, the variance $\sigma^2$ represents the average noise power [15]. Expressing the ratio of these powers leads to the SNR for BPSK:
$$\mathrm{SNR} = \frac{A^2/2}{\sigma^2} = \frac{A^2}{2\sigma^2} \tag{A.2.2}$$
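The statement that most of the power of the $\sin(x)/x$ spectrum lies in its main lobe can be quantified numerically. Assuming the rectangular pulse $\Pi$ of Eq. A.2.1, the energy spectral density is proportional to $\mathrm{sinc}^2(fT_b)$ with $\mathrm{sinc}(u) = \sin(\pi u)/(\pi u)$, whose integral over all $u$ equals 1. The sketch below integrates it over the main lobe $|f| \le 1/T_b$ with Simpson's rule (a choice made for this sketch).

```python
import math


def sinc(u):
    """Normalised sinc, sin(pi u) / (pi u)."""
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)


def main_lobe_fraction(n=20000):
    """Fraction of the energy of sinc^2(f Tb) lying within |f| <= 1/Tb.

    With u = f Tb the main lobe is u in [-1, 1]; since the total integral
    of sinc^2(u) is 1, the main-lobe integral is already the fraction.
    """
    a, b = -1.0, 1.0
    h = (b - a) / n
    total = sinc(a) ** 2 + sinc(b) ** 2
    for k in range(1, n):
        total += (4 if k % 2 else 2) * sinc(a + k * h) ** 2
    return total * h / 3


print(round(main_lobe_fraction(), 4))
```

The result is roughly 0.90, i.e. about 90% of the pulse energy falls within $1/T_b$ of the carrier, which supports using $B = 1/T_b$ as the effective bandwidth.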
White Gaussian noise has a two-sided power spectral density of $N_o/2$, where $N_o$ is a constant [15]. After filtering the signal of Fig. A.5 with the BPF described above, which passes a band of width $2B$ around both $+f_0$ and $-f_0$, the total noise power is:
$$N_p = 2\times\frac{N_o}{2}\times 2B \tag{A.2.3}$$
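Since $\sigma^2$ represents the average noise power, i.e. $\sigma^2 = N_p$, Eq. A.2.2 is transformed using Eq. A.2.3:

$$\begin{aligned}
\mathrm{SNR} &= \frac{A^2}{2\sigma^2}\\
&= \frac{A^2}{2\times 2\times\frac{N_o}{2}\times 2B}
\end{aligned}\tag{A.2.4}$$

Since $B = 1/T_b$, we have: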
$$\mathrm{SNR} = \frac{A^2T_b}{2\times 2\times N_o} \tag{A.2.5}$$
The average energy of a sinusoid of amplitude $A$ over a bit period $T_b$ is $E_b = A^2T_b/2$ [50]. Using this, Eq. A.2.5 becomes:
$$\mathrm{SNR} = \frac{E_b}{2\times N_o} \tag{A.2.6}$$
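The chain of substitutions from Eq. A.2.2 to Eq. A.2.6 can be checked numerically. The sketch below takes hypothetical values for $A$, $T_b$ and $N_o$, computes the SNR once from the amplitude and the filtered noise power, and once from the bit energy, and returns both so that they can be compared.

```python
import math


def bpsk_snr_two_ways(A, Tb, No):
    """Return the BPSK SNR from Eqs. A.2.2/A.2.3 and from Eq. A.2.6."""
    B = 1.0 / Tb                       # effective bandwidth of the main lobe
    sigma2 = 2 * (No / 2) * (2 * B)    # Eq. A.2.3: noise power after the BPF
    snr_power = A**2 / (2 * sigma2)    # Eq. A.2.2
    Eb = A**2 * Tb / 2                 # energy of the sinusoid over one bit
    snr_energy = Eb / (2 * No)         # Eq. A.2.6
    return snr_power, snr_energy


# Hypothetical example values: A = 2, Tb = 1 ms, No = 1e-4
print(bpsk_snr_two_ways(2.0, 1e-3, 1e-4))
```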
Comparing Eqs. A.2.2 and A.2.6 and letting $A = A_{bpsk}$, we obtain the following for BPSK:

$$\mathrm{SNR}_{bpsk} = \frac{A_{bpsk}^2}{\sigma^2} = \frac{E_b}{N_o} = \frac{E_{s,bpsk}}{N_o} \tag{A.2.7}$$
where $E_b$ and $E_{s,bpsk}$ represent the bit and symbol energies. A QPSK receiver uses a BPSK receiver for each of its orthogonal channels, I and Q. Fig. A.6 shows that, with channels I and Q each carrying amplitude $A_{bpsk}$, the amplitude of QPSK symbol S1 is $A_{qpsk} = \sqrt{A_{bpsk}^2 + A_{bpsk}^2}$. Substituting $A_{qpsk}$ for $A_{bpsk}$ in Eq. A.2.7 leads to the following:
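$$\begin{aligned}
\mathrm{SNR}_{qpsk} &= \frac{A_{qpsk}^2}{\sigma^2}\\
&= \frac{2A_{bpsk}^2}{\sigma^2} = \frac{2E_b}{N_o}
\end{aligned}\tag{A.2.8}$$

Therefore we have:

$$\frac{A_{qpsk}^2}{\sigma^2} = \frac{2E_b}{N_o} = \frac{E_{s,qpsk}}{N_o} \tag{A.2.9}$$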
Comparing Eqs. A.2.7 and A.2.9 shows that a QPSK symbol carries twice the energy of a BPSK symbol, $E_{s,qpsk} = 2E_b$, since each QPSK symbol conveys two bits.
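For simulations, Eqs. A.2.7 and A.2.9 are typically used in reverse: given a target $E_b/N_o$ in decibels and a chosen symbol amplitude, find the noise standard deviation $\sigma$ to apply to each of the I and Q noise components. The helper functions below are a hypothetical sketch of that conversion; their names and the default amplitude of 1 are assumptions of this example.

```python
import math


def sigma_for_bpsk(ebno_db, A_bpsk=1.0):
    """Noise std sigma such that A_bpsk^2 / sigma^2 = Eb/No (Eq. A.2.7)."""
    ebno = 10 ** (ebno_db / 10)        # convert dB to a linear ratio
    return A_bpsk / math.sqrt(ebno)


def sigma_for_qpsk(ebno_db, A_qpsk=1.0):
    """Noise std sigma such that A_qpsk^2 / sigma^2 = 2 Eb/No (Eq. A.2.9)."""
    ebno = 10 ** (ebno_db / 10)
    return A_qpsk / math.sqrt(2 * ebno)


for ebno_db in (0, 3, 6, 9):           # hypothetical simulation points
    print(ebno_db, sigma_for_bpsk(ebno_db), sigma_for_qpsk(ebno_db))
```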
(Symbol S1 with component $A_{bpsk}$ on both the I and Q axes and resultant amplitude $A_{qpsk}$.)
Figure A.6: A QPSK symbol amplitude in terms of two BPSK symbols on channels I and Q.
