Intelligent SpaceWire Router Design by SKVARČ BOŽIČ, GAŠPER
University of Ljubljana
Faculty of Electrical Engineering
Gašper Skvarč Božič
Intelligent SpaceWire Router
Design
Master’s thesis
2nd cycle postgraduate study program in Electrical Engineering
Supervisors:
Prof. Dr. Andrej Žemva, University of Ljubljana
Dr.-Ing Markus Plattner, Technical University of Munich
Dr.-Ing Sabine Ott, MPE
Munich, 2019

Acknowledgments
I would first like to thank Dr.-Ing Markus Plattner of the Integrated Systems
at the Technical University of Munich for giving me the opportunity to work on
my thesis under his supervision. The door to Dr.-Ing Plattner was always open
whenever I ran into a trouble spot or had a question about my research or writing.
I offer my sincere gratitude to my supervisor at the Max Planck Institute for
Extraterrestrial Physics, Dr.-Ing Sabine Ott, who supported me throughout my
master's thesis by answering all my questions, advising me on how to tackle
problems, and guiding me through the writing process.
I would like to thank Prof. Dr. Andrej Žemva of the Laboratory for Integrated
Circuit Design at the University of Ljubljana for supporting me throughout this
thesis and providing feedback, even though for most of the time we could
only exchange emails.
I would like to acknowledge all my colleagues at the Max Planck Institute for
Extraterrestrial Physics for helping me whenever I needed something, be it
software packages, components, or lab equipment. Working at the institute has
been an excellent experience.
Finally, I must express my very profound gratitude to my family and friends for
providing me with unfailing support and continuous encouragement throughout
my years of study and through the process of researching and writing this thesis.
This accomplishment would not have been possible without them. Thank you.
Gašper Skvarč Božič
Munich, September 2019
Summary
SpaceWire is a communication standard used to transfer information between
instruments and other units of a spacecraft, such as high-data-rate sensors,
processing units, memory units, and telemetry subsystems. The aim of the
standard is to provide a high-performance data-handling system, help reduce the
cost of system integration, ensure compatibility between data-handling
subsystems, and encourage the reuse of data-handling equipment across several
different missions. The SpaceWire standard allows point-to-point links and
network connections. Routers are needed to interconnect several network nodes
within a network.
The routers currently available mostly offer only two-level (high and low)
priority arbitration of packets contending for the same output port and do not
support multicast transmission at all. Because of this limited functionality,
an intelligent router for the SpaceWire communication protocol was designed and
implemented in the course of this master's thesis. The implemented router
supports multilevel dynamic priority arbitration and multicast transmission. It
is implemented as a VHDL component so that the implementation is flexible, new
functions can be added easily, and the router can be included in other
projects. For testing purposes, the router was adapted to the RTG4 FPGA target
device.
The following requirements were set before the router design began. The router
must be implemented as a flexible IP component. The router must provide a
connection from every input port to every output port. Simultaneous access to
the same output port must be resolved by a multilevel priority arbiter. The
router ports must be able to communicate with network nodes according to the
SpaceWire standard and to identify the different parts of a packet. The router
must support both packet addressing modes, path addressing and logical
addressing. The router must support multicast transmission. Finally, the router
must be remotely configurable.
The router presented in this master's thesis is based on the stated
requirements, the architecture of a generic router, the architectures of
existing routers for the SpaceWire communication protocol, and the
architectures of routers from the field of networks on chip. Wormhole switching
was chosen as the switching technique for the router. In contrast to the
better-known packet switching from the world of local area networks, where the
entire packet is stored in the router, wormhole switching stores only part of
the packet. This considerably speeds up packet processing, so communication
between network nodes can proceed faster. The drawback of this technique is
that packets can stretch across several routers. If the network is not designed
properly, it can deadlock.
As mentioned, the router must be remotely configurable. For this purpose the
router has an additional configuration port, which decodes SpaceWire packets
into commands on the internal bus. All configurable components of the router
are connected to each other through the internal bus, on which the
configuration port acts as the bus master. The APB bus, which offers simple
operations and is used mainly as a peripheral bus, was chosen as the internal
bus. Every configurable component in the router, except the routing table, has
only a few registers, so a high-performance bus is not needed. The registers
are designed to have the same data width as the bus, so a single operation
suffices to read or write a register. This bus is also sufficient for accessing
the routing table, since the table does not change substantially while the
router is operating. If it does change, only part of a routing table entry
changes, and that part also has the same data width as the bus.
In SpaceWire networks, a dedicated protocol called RMAP is used to access
internal memory units in network nodes. It is a standardized SpaceWire protocol
that defines the packet format according to the desired operation on the memory
unit. Three operations are defined: read, write, and modify. For the modify
operation the data length is limited, because the data must be stored locally
before modification. The configuration port is designed to comply with the RMAP
standard.
The routing table stores information about where the router must route an
incoming packet according to its address. The routing table is accessed only
for a logical address. With path addressing, the packet header itself
determines the output port of the packet, so no access to the routing table is
needed. Besides the target output port for a given address, the routing table
also stores the packet priority, the list of output ports for multicast
transmission, and a few additional parameters. The routing table is shared by
all input ports. Because only one input port can access the table at a given
moment, an arbiter is needed to resolve simultaneous access. This arbiter works
on the round-robin principle, so all input ports have equal priority and take
turns accessing the routing table in circular order.
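The lookup rule described above can be illustrated with a short behavioural sketch in Python. It is not the VHDL implementation from this thesis. The address ranges (0 to 31 for path addresses, 32 to 254 for logical addresses) are taken from the SpaceWire standard rather than from the text above, and the names `RouteEntry`, `route`, and `default_priority` are hypothetical and introduced only for this example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class RouteEntry:
    """Hypothetical routing table entry: output ports plus a priority level."""
    ports: Tuple[int, ...]   # one port for unicast, several for multicast
    priority: int            # assumption: higher value means higher priority


def route(address: int, table: Dict[int, RouteEntry],
          default_priority: int = 0) -> RouteEntry:
    """Return the output ports and priority for the leading address byte."""
    if 0 <= address <= 31:
        # Path addressing: the header byte names the output port directly,
        # so the routing table is not consulted.
        return RouteEntry(ports=(address,), priority=default_priority)
    if 32 <= address <= 254:
        # Logical addressing: look the address up in the shared table.
        if address not in table:
            raise KeyError(f"no routing table entry for logical address {address}")
        return table[address]
    raise ValueError(f"invalid address {address}")


table = {40: RouteEntry(ports=(3,), priority=2),
         41: RouteEntry(ports=(1, 2, 4), priority=5)}   # multicast entry
print(route(3, table))    # path address: port 3, table not used
print(route(41, table))   # logical address: multicast to ports 1, 2 and 4
```

The priority assigned to path-addressed packets is an assumption of this sketch; the summary above does not say how it is chosen.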
The router ports communicate with network nodes according to the SpaceWire
standard and are able to identify the individual parts of a packet. Each port
therefore consists of a SpaceWire encoder/decoder (CODEC) and, on the receive
side, a packet processing unit. The SpaceWire CODEC encodes and decodes the
characters defined in the SpaceWire standard, which are needed to transfer
information between network nodes and to establish a link. The packet
processing unit extracts the address, the payload, and the end-of-packet marker
from a packet. In addition, it controls the packet flow, accesses the routing
table, and communicates with the control logic. All input ports are connected
to all output ports through a crossbar switch.
The control logic of the router grants input ports access to output ports and
controls the position of the crossbar switch. Requests from several input ports
for simultaneous access to the same output port are resolved by a multilevel
priority arbiter. The priority arbiter is designed as a three-stage pipeline.
In the first stage, the priorities and all requests of the input ports are
stored. In the second stage, the input port holding the packet with the highest
priority, or the group of input ports holding packets with the highest
priority, is found. In the third stage, the grant for access to the output port
is issued. When several input ports hold packets with the highest priority, the
grant is determined on the round-robin principle. For a multicast packet,
access to the output ports is granted only when all output ports in the list
are ready to accept a new packet.
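The three arbitration steps described above can be sketched behaviourally in Python. This is a minimal model under stated assumptions, not the pipelined VHDL arbiter of this thesis: the three stages run inside a single function call, a higher number is assumed to mean a higher priority, and the names `PriorityArbiter`, `arbitrate`, and `multicast_ready` are hypothetical.

```python
from typing import Dict, Optional, Sequence


class PriorityArbiter:
    """Behavioural model of the arbiter for one output port (hypothetical names)."""

    def __init__(self, num_inputs: int) -> None:
        self.num_inputs = num_inputs
        self.last_granted = num_inputs - 1   # round-robin pointer

    def arbitrate(self, requests: Dict[int, int]) -> Optional[int]:
        """requests maps input port -> priority; returns the granted input port."""
        if not requests:
            return None
        # Stage 1: latch the requests and their priorities.
        latched = dict(requests)
        # Stage 2: find the maximum priority and the ports that hold it.
        top = max(latched.values())
        candidates = {port for port, prio in latched.items() if prio == top}
        # Stage 3: grant; break ties round-robin, starting after the last winner.
        for offset in range(1, self.num_inputs + 1):
            port = (self.last_granted + offset) % self.num_inputs
            if port in candidates:
                self.last_granted = port
                return port
        return None


def multicast_ready(output_ports: Sequence[int], ready: Sequence[bool]) -> bool:
    """A multicast packet may proceed only if every listed output port is ready."""
    return all(ready[p] for p in output_ports)


arb = PriorityArbiter(num_inputs=4)
print(arb.arbitrate({0: 1, 2: 3}))   # input 2 wins on priority
print(arb.arbitrate({1: 3, 2: 3}))   # tie: input 2 won last, so input 1 is granted
print(multicast_ready([1, 2], [True, True, False, True]))   # port 2 busy: False
```

In hardware the round-robin pointer would typically be a register updated on each grant; here it is a plain attribute.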
A test environment was prepared to verify the operation of the implemented
router and to characterize it. The test environment consists of a simulation
environment and a laboratory environment with selected hardware. The simulation
environment was used to validate the operation of the router before synthesis
and hardware implementation. The laboratory environment was used to validate
the hardware implementation of the SpaceWire router. It comprised an NI PXI
system as the source and sink of SpaceWire packets, an RTG4 Development Kit
with an FMC expansion board for SpaceWire as the hardware implementation of the
router, and an oscilloscope.
With the test environment, two key router parameters were determined: the
routing delay and the packet router latency. The routing delay specifies how
much time the router needs to determine the output port of a packet from the
moment the packet arrives through the input port. The packet router latency
specifies how much time an entire packet needs to travel from the input port to
the output port. In other words, it is the delay added to a packet by passing
through the router. Packet traffic from several different sources with equal
and different priorities was simulated to verify the operation of the control
logic.
Based on the tests performed, the router achieves satisfactory performance,
with transmission rates of up to 200 Mbps. Moreover, the implemented router
increases the capability of existing satellite systems, eases the design of
more complex networks, and, with its multicast option, enables new ways of
building redundant systems.
Key words: SpaceWire, router, network, link, packet, RMAP, configuration port,
priority arbiter, multicast
Abstract
SpaceWire (SpW) is a communication standard used to connect high-data-rate
sensors, processing units, memory units, and telemetry subsystems onboard
a spacecraft. It supports point-to-point and network connections. SpW routers
are needed to interconnect several network nodes within a network.
Currently available devices support only two-level priority arbitration and do
not support multicast packet transmission at all. To overcome this, an
intelligent SpW router was designed in the course of this thesis. This router
supports dynamic multilevel priority arbitration based on a maximum finder
circuit, as well as multicast packet transmission. The SpW router was
implemented as a VHDL component to keep the design flexible and to allow it to
be included in other projects. The implemented SpW router has a configuration
port with RMAP support for configuring the router remotely. A test setup was
designed to characterize and test the SpW router functionality. The routing
delay and the packet router latency were determined. Simultaneous packet
traffic from different sources with the same and different priority levels was
emulated to demonstrate the arbitration functionality. Based on the tests
performed, the implemented router achieved satisfactory performance and is
capable of handling transmission rates of up to 200 Mbps. Moreover, the
designed router increases payload system functionality, eases the design of
more complicated networks, and enables a new way to design redundant systems
with its multicast transmission support.
Key words: SpaceWire, router, network, links, packet, RMAP, configuration
port, priority arbiter, multicast
Table of contents
1 Introduction
1.1 Motivation
1.2 Task Description
1.3 Thesis structure
2 SpaceWire standards and router theory
2.1 SpaceWire standard
2.1.1 SpaceWire protocol stack
2.1.2 SpaceWire links
2.1.3 SpaceWire networks
2.2 Remote Memory Access Protocol
2.2.1 RMAP operations
2.2.2 RMAP command and reply fields
2.2.3 Write command
2.2.4 Read Command
2.2.5 Read-Modify-Write Command
2.3 Routers
2.3.1 Switching Techniques
2.3.2 Routing Algorithms
2.3.3 Arbiters
3 Design Implementation
3.1 Design overview
3.2 Internal interconnect
3.3 Registers
3.4 Configuration port
3.5 Routing Table
3.6 SpW router ports
3.7 Crossbar
3.8 Control logic
4 Test setup
4.1 Test by simulation
4.2 Hardware test setup
4.2.1 Hardware capabilities of the PXI system test
4.2.2 Packet Router Latency test
4.2.3 Control Logic Functionality test
4.2.4 Multicast Packet Transmission test
4.2.5 Logical Address Deletion test
5 Evaluation and discussion of results
5.1 Initialization time
5.2 Routing delay
5.3 Synthesis
5.4 Hardware capabilities of the PXI system
5.5 Packet router latency
5.6 Control logic
5.7 Multicast
5.8 Logical Address Deletion
6 Conclusion
6.1 Summary
6.2 Outlook
Bibliography
A Detailed state machine diagrams
B Detailed block diagrams
C Code snippets
C.1 Bit selector (multiplexer) component
C.2 Vector selector (multiplexer) component
C.3 Is power of 2 function
C.4 Binary Encoder component
C.5 Ready signal - crossbar
D Register descriptions
D.1 Configuration port registers
D.2 SpaceWire port registers
E Routing Table entry detailed description
F RMAP conformance statement
F.1 RMAP write command
F.2 RMAP read command
F.3 RMAP read-modify-write command
G Simulation Timing Diagrams
List of Figures
2.1 Example of a SpW architecture [1]
2.2 SpaceWire protocol stack
2.3 SpaceWire port architecture [2]
2.4 SpaceWire Data and Control characters and Control Codes [2]
2.5 SpW Parity Coverage [2]
2.6 SpW Packet
2.7 Simplified SpW router
2.8 Path Addressing example
2.9 Logical Addressing example
2.10 RMAP Write Command Format [11]
2.11 RMAP Write Reply Format [11]
2.12 RMAP Read Command Format [11]
2.13 RMAP Read Reply Format [11]
2.14 RMAP Read-Modify-Write Command Format [11]
2.15 RMAP Read-Modify-Write Reply Format [11]
2.16 Generic router model [3]
2.17 Synopsis of an arbiter
2.18 Two-Phase or Non-Return-to-Zero (NRZ) protocol
2.19 Four-Phase or Return-to-Zero (RTZ) protocol
2.20 Passing Round Robin token
3.1 Proposed SpW router architecture
3.2 APB state machine diagram [4]
3.3 APB slave wrapper state machine for SRAM
3.4 Libero SmartDesign block diagram of SRAM as APB slave
3.5 Initializer APB state machine diagram
3.6 Initializer transfer operation state machine diagram
3.7 Initializer initialization phases state machine diagram
3.8 Configuration port block diagram
3.9 RMAP decoder - receive data/calculate CRC state machine
3.10 RMAP decoder - bus access state machine
3.11 RMAP decoder - transmit data/calculate CRC
3.12 Libero SmartDesign block diagram of RMAP target device
3.13 SpW router port block diagram
3.14 Control logic block diagram
3.15 Grant signal array example
3.16 Priority Arbiter block diagram
3.17 Round Robin arbiter topology
3.18 Arbitration of two packets with matching priority
3.19 Arbitration of packets with matching priority
3.20 Arbitration of two packets with different priorities
3.21 Arbitration of multiple packets with different priorities
3.22 Arbitration of three packets with matching priority
3.23 Arbitration of three packets with matching priority
4.1 Hardware test setup block diagram
4.2 Hardware test setup in a lab environment
4.3 Target to host VI data flow [5]
5.1 Initialization time - simulation result
5.2 Initialization time - scope measurement
5.3 Routing delay - path addressed packets 1
5.4 Routing delay - path addressed packets 2
5.5 Routing delay - logically addressed packets 1
5.6 Routing delay - logically addressed packets 2
5.7 SpW router components LUT usage
5.8 SpW router components Dff usage
5.9 SpW PXI RMAP card performance
5.10 Router packet latency
5.11 Control logic - Round-Robin arbitration
5.12 Control logic - Priority arbitration
5.13 Multicast packet transmission - simulation
5.14 Logical address deletion
A.1 Initializer APB state machine
A.2 Initializer APB transfer operation state machine
A.3 Initializer APB initialization phase state machine
A.4 Receive data/calculate CRC state machine
A.5 Transmit data/calculate CRC state machine
A.6 Internal bus access state machine
A.7 RMAP command decoder state machine
A.8 RMAP reply encoder state machine
A.9 Packet Processor state machine
B.1 Configuration port
B.2 SpW router port
B.3 SpW router ports with external components
G.1 APB bus simulation - normal
G.2 APB bus simulation - wait states
G.3 Crossbar switch - Ready in signal simulation
G.4 Control logic - high priority request simulation
List of Tables
2.1 RMAP command codes [11]
2.2 Reply Address field size
2.3 Example Reply Address field to Reply SpW Address mappings
2.4 Fixed priority arbiter grant signals
3.1 Bus architecture comparison
3.2 APB signals description [4]
3.3 Input port to output port signal connections
3.4 Grant signal in case of a multicast packet
5.1 SpW synthesis results - resource usage
F.1 RMAP write command characteristics
F.2 RMAP read command characteristics
F.3 RMAP read-modify-write command characteristics
Symbols
The following quantities and symbols are used in this thesis:

Quantity                            Unit
Name                    Symbol      Name               Symbol
data rate               -           bits per second    bps
transmission rate       -           bits per second    bps
routing delay           t_r         second             s
packet router latency   P_rl        second             s
packet length           P_len       -                  -

A more detailed explanation of the symbols and their indices is given in the
relevant figures or in the accompanying text where the symbol is used.
Acronyms
AbT Array-Based Topology
AHB Advanced High-performance Bus
AMBA Advanced Microcontroller Bus Architecture
APB Advanced Peripheral Bus
API Application Programming Interface
ASIC Application Specific Integrated Circuit
AXI Advanced eXtensible Interface
BC Broadcast code
CCC Clock Conditioning Circuit
CODEC Coder/Decoder
CRC Cyclic Redundancy Check
Dff D flip-flop
DMA Direct Memory Access
ECC Error Correction Codes
ECSS European Cooperation for Space Standardization
EEP Error End of Packet
EMC Electromagnetic Compatibility
EOP End Of Packet
ESA European Space Agency
ESC Escape
FCT Flow Control Token
FIFO First In First Out
FPGA Field Programmable Gate Array
FSM Finite State Machine
GPIO General Purpose Input/Output
HDL Hardware Description Language
IO Input/Output
IP Intellectual Property
JAXA Japan Aerospace Exploration Agency
JTAG Joint Test Action Group
LAN Local Area Network
LC Link Controller
LSB Least Significant Bit
LUT LookUp Table
LVDS Low-Voltage Differential Signalling
LVTTL Low-Voltage Transistor-Transistor Logic
MPE Max Planck Institute for Extraterrestrial Physics
MSB Most Significant Bit
NASA National Aeronautics and Space Administration
NoC Network on Chip
NRZ Non-Return-to-Zero
OPB On-chip Peripheral Bus
PCB Printed Circuit Board
PLB Processor Local Bus
RMAP Remote Memory Access Protocol
ROM Read Only Memory
RTZ Return-to-Zero
RX Receive
SEU Single Event Upset
SoC System-On-Chip
SOP Start Of Packet
SPI Serial Peripheral Interface
SpW SpaceWire
SRAM Static Random-Access Memory
TX Transmit
UART Universal Asynchronous Receiver-Transmitter
VCT Virtual Cut-Through
VHDL VHSIC Hardware Description Language
VHSIC Very High Speed Integrated Circuit
VI Virtual Instrument
1 Introduction
This chapter outlines the motivation for this thesis. It provides information
about currently available devices and points out some of their key features. It
lists requirements for the SpW router design which were derived from missing
features in current devices. Also, it presents the overall structure of the thesis.
1.1 Motivation
All spacecraft use some onboard communication network for transferring data
between different subsystems. One commonly chosen communication network is
the European Space Agency (ESA)’s SpW standard supporting both point-to-
point and network connections. For constructing a SpW network, one not only
needs network nodes but also routers, which route incoming packets to the target
output port.
ESA recommends devices that it supports directly, but the choice is limited.
The list comprises two Application Specific Integrated Circuit (ASIC) routers:
the ten-port AT7910E SpW router, which is also available as an Intellectual
Property (IP) core from STAR-Dundee, and the eighteen-port GR718 SpW router.
Both support two-level (high and low) priority arbitration for resolving output
port contention. The GR718 also implements Serial Peripheral Interface (SPI)
and Universal Asynchronous Receiver-Transmitter (UART) serial interfaces, as
well as General Purpose Input/Output (GPIO) and Joint Test Action Group (JTAG)
interfaces, for easier configuration and status checking of the SpW router [6, 7].
The list expands if one looks at available commercial products. These include
the Flexible SpaceWire router with 2 to 32 ports from 4Links, the 4-port
UT200SpW4RTR SpaceWire router with round-robin output arbitration from Cobham,
a SpaceWire router from NEC Japan, and an open-source six-port SpaceWire router
IP core with round-robin arbitration for output ports from Shimafuji Electric
[8, 9]. None of these routers supports multicast packet transmission.
Arbitration schemes are fixed to two-level priority or round-robin arbitration,
and ASIC implementations lack the flexibility that is available with IP cores.
The motivation for this master's thesis stems from the features missing in the
routers mentioned above. A customizable IP core is also attractive, since it
can be adapted to a specific target application. Moreover, an in-house design
allows new features to be integrated quickly.
1.2 Task Description
The goal of this master's thesis is to design an intelligent SpW router that
can be used in future projects of the Max Planck Institute for Extraterrestrial
Physics (MPE). The requirements are as follows. The SpW router should
be designed as a soft IP core to achieve the desired flexibility. Therefore,
VHSIC Hardware Description Language (VHDL) shall be used to describe the SpW
router's architecture. The target device shall be an RTG4 Field Programmable
Gate Array (FPGA), since this is MPE's main choice of radiation-hardened FPGA
in its electronics development for spacecraft. The SpW router should be able to
connect all input ports to all output ports. To achieve this, a non-blocking
crossbar switch shall be implemented. Contention for the same output port shall
be resolved by priority arbitration. The ports of the router must be able to
communicate with network nodes using the SpW standard and identify the packet
fields. Consequently, a SpW Coder/Decoder (CODEC) shall be implemented in each
port alongside a packet processor. The SpW router should support both path
addressing and logical addressing. Therefore, a routing table shall be
implemented. The SpW router must also be remotely configurable. Thus, a
configuration port with support for Remote Memory Access Protocol (RMAP) shall
be implemented.
As can be seen, the router is composed of several components. Certain
considerations and trade-offs must be taken into account when designing each
component. After its design is completed, each component should undergo a set
of tests to prove its functionality and correct operation.
In the scope of this thesis, some additional features shall be implemented.
These include support for multicast operation and multi-level priority arbitration.
Another aspect of this thesis is to prepare a test environment for testing the
capabilities of the router design.
1.3 Thesis structure
The thesis is divided into six chapters. The first chapter is an introduction, where
the motivation for this thesis is described, and a brief description of the targeted
task is provided.
The second chapter focuses on the SpW standard and switching theory. It
provides an in-depth description of the SpW communication protocol and the
layers of its operation. This is followed by an explanation of how the Remote
Memory Access Protocol works, since it is essential for understanding the
functionality of the configuration port in the SpW router. This chapter also
provides an in-depth description of the most commonly used router
architectures, routing algorithms, and essential components. Furthermore, it
gives a baseline for some design decisions described in the next chapter.
The third chapter covers the design implementation of the SpW router. It
outlines the thought process behind the design choices and walks through the
design process for the individual components.
The fourth chapter outlines the test setup and applied test procedures that
were used to test and characterize the final design.
The fifth chapter presents the results of the tests introduced in the previous
chapter. It explains how simulation results were used to derive a baseline for
the hardware tests and how typical router characterization parameters were
measured. Furthermore, it provides an evaluation of the hardware measurements
that were taken to prove the functionality of the SpW router.
The sixth chapter provides a summary of the achieved goals and a critical
assessment of the final design. The outlook shows possible improvements to the
design and suggests additional features and functionality to be implemented in
future iterations of the SpW router design. Furthermore, it provides some examples
where parts of the implementation can be used in other, non-router-related designs.
2 SpaceWire standards and router theory
This chapter describes the SpW standard. It explains the SpW protocol stack
and its layers, SpW links, and SpW networks. It describes the RMAP protocol
and supported commands. It gives an overview of a generic router architecture,
possible switching techniques, routing algorithms, and arbiters.
2.1 SpaceWire standard
The SpaceWire standard is a standard for high-speed links and networks for
use onboard spacecraft, easing the interconnection of sensors, mass-memories,
processing units, and downlink telemetry sub-systems. SpaceWire was developed
under ESA in the late 1990s and formally standardized by the European Cooperation
for Space Standardization (ECSS) to provide space users with directly applicable
specifications. It is mainly used by ESA, the National Aeronautics and Space
Administration (NASA), the Japan Aerospace Exploration Agency (JAXA), and
Roscosmos, but it has also been adopted by other agencies and commercial
companies. It is used on a variety of spacecraft, including scientific, Earth
observation, and commercial missions. High-profile missions that used SpW for
onboard communications include Gaia, the ExoMars rover, the James Webb Space
Telescope, GOES-R, the Lunar Reconnaissance Orbiter, and Astro-H.
SpW links provide serial, high-speed (2 Mbps to 200 Mbps), bi-directional,
full-duplex communication between two devices. SpW links can be combined with
routers to form data-handling networks targeting specific applications. Within a
SpW network, the processing nodes are connected through low-error rate, low-
footprint, low-cost, low-latency, full-duplex, point-to-point links, and wormhole
switches. The objectives of the SpW standard are:
• to facilitate the construction of high-performance onboard data-handling
systems,
• to help reduce system integration cost,
• to facilitate compatibility between data-handling sub-systems,
• to encourage re-use of data handling equipment across several different
missions [10].
Use of the SpW standard ensures that equipment is compatible at both the
component and sub-system levels. Processing units, mass-memory units, and
downlink telemetry systems with a SpW interface developed for one mission can
be readily used on another mission. This:
• reduces the cost of development (Cheaper),
• reduces development timescales (Faster),
• improves reliability (Better),
• increases the amount of scientific work that can be achieved within a limited
budget (More) [1].
SpaceWire can support many different payload processing architectures us-
ing point-to-point links and SpW routers. An architecture can be tuned to the
requirements of specific missions. An example of a SpW architecture is shown
in Figure 2.1. It uses two SpW routers to provide the interconnectivity between
instruments, memory, and processing units. A detailed description can be found
in [1].
2.1.1 SpaceWire protocol stack
The following sections describe the SpaceWire standard in more detail. They
describe the SpaceWire protocol stack, explain the functionality of SpW links
and the layers of the protocol stack on which they operate, and show how SpW
networks can be constructed and which components are needed for that.
Figure 2.1: Example of a SpW architecture [1]
As a communication standard, SpaceWire has a defined protocol stack. The SpW
protocol stack is composed of a Network layer, a Data Link layer, an Encoding
layer, a Physical layer, and a Management Information Base, as can be seen in
Figure 2.2.
Network layer
The SpW Network layer provides three essential services[2]:
Figure 2.2: SpaceWire protocol stack
• A packet service which sends and receives packets over a SpW network.
• A time-code service which sends and receives time-codes over a SpW network.
• A distributed interrupt service which sends and receives distributed
interrupts over a SpW network.
This means it is responsible for transferring SpW packets, time-codes, and
distributed interrupts over a SpW network. Since it is the top layer of the
stack, it can also handle requests from user applications and issue the
aforementioned transfer operations based on these requests. A SpW implementation
always supports the packet service; the time-code and distributed interrupt
services are optional and can be implemented based on application needs.
The Network layer covers SpW packets, nodes, routers, networks, time-code
broadcasting, and distributed interrupt operation. Some of the aspects of the network
layer, which are closely related to the topic of this thesis, are described in more
detail in section 2.1.3, for example, the structure of a SpW network and how
packets are transferred from a source to a destination node across a network.
Data Link layer
The SpW Data Link layer provides two services[2]:
• An N-Char service which sends and receives N-Chars (the components of
packets) over a SpW link.
• A broadcast code service which sends and receives broadcast codes
(time-codes and distributed interrupt codes) over a SpW link.
It is responsible for link initialization, flow control, sending and receiving
N-Chars, sending and receiving broadcast codes, link error detection, and link
error recovery. It also handles requests from the Network layer.
Encoding layer
The SpW Encoding layer provides two services[2]:
• A character encoding service which encodes characters and control codes
into symbols, serializes those symbols, and data-strobe encodes them ready
for transmission over the SpW Physical layer.
• A character decoding service which recovers the data bit stream from the
data-strobe signals received from the Physical layer, de-serializes that bit
stream, and decodes the resulting symbols into characters and control codes.
It handles requests from the Data Link layer and defines the data and control
characters used to manage the flow of data across a SpW link.
Physical layer
The SpW Physical layer provides two services[2]:
• A transmit service which transmits the data and strobe signals from the
encoding layer over a physical medium.
• A receive service which receives the data and strobe signals from the
physical medium and passes them to the encoding layer.
It is responsible for transmitting and receiving the data and strobe signals over
Printed Circuit Board (PCB) tracks, connectors, and cable assemblies. It defines
the type of connectors to be used, cables, cable assemblies, and specifications for
the PCB tracks. It also defines the signal standard to be used covering signal
encoding, voltage levels, noise margins, and data signaling rates used in SpW.
Management Information Base
The SpW Management Information Base provides two services[2]:
• A set parameter service which writes control or configuration information
to the other SpW layers.
• A get status service which reads the status or current configuration or
control values of the other SpW layers.
The Management Information Base has direct access to relevant configuration pa-
rameters, control parameters, and status parameters in all layers of the SpaceWire
protocol stack. It is directly accessible from the user application, which can
start or disable a SpW link or configure it to auto-start. It also allows the
user application to configure the transmission rate and to check the link status
to determine whether any errors have occurred.
2.1.2 SpaceWire links
SpaceWire links are point-to-point data links that provide the means of connect-
ing two nodes (e.g., processors, instruments, memory, etc.), a node and a router
port, or two router ports. SpW links allow data transmission in both directions
at the same time. Each link is a full-duplex, bi-directional, serial data link
which can operate at data rates between 2 Mbps and 200 Mbps.
The term SpaceWire link describes two components. The first is the SpW port,
sometimes also called the SpW interface, which operates on the Physical,
Encoding, and Data Link layers of the SpW protocol stack. The second is the SpW
cable, which belongs exclusively to the Physical layer and is responsible for
transferring voltage signals across longer distances.
Figure 2.3 depicts the SpW router port architecture.
Figure 2.3: SpaceWire port architecture[2]
SpW was developed to meet the Electromagnetic Compatibility (EMC)
specifications of a typical spacecraft. As mentioned in section 2.1.1, the
Physical layer covers the specification of PCB tracks, cables, and cable
assemblies and defines the signal standard used to transmit and receive a serial
bit stream over a SpW link. For the signal levels, the Low Voltage Differential
Signaling (LVDS) standard is used, which provides an adequate noise margin to
enable the use of low voltages in practical systems. Moreover, lower power
consumption can be achieved due to the small voltage swings, which result in
shorter signal transition times. The SpW port sends information as a serial bit
stream via two signals, data and strobe, in each direction. These signals are
driven by LVDS drivers and received by LVDS receivers, which means two wires per
signal and four screened twisted pairs altogether. For interfacing with the SpW
cable, micro-miniature D-type connectors are used, with pin three connected to
circuit ground. Other connectors may be used provided they have connection
pairs with a differential impedance of 100 Ohm. The described components can be
seen in the lower part of Figure 2.3, enclosed by a dashed rectangle named
Physical layer.
The functionality of the Encoding layer is achieved with a transmitter encoder
and a receiver decoder. They are responsible for encoding/decoding of characters
into symbols, serializing/de-serializing encoded symbols into a bit-stream, and
data-strobe encoding/decoding of a serial bit stream. SpW employs three different
character groups: data characters, control characters, and control codes, as
depicted in Figure 2.4.
There is only one type of data character. It is used to transfer eight bits of
packet data, Least Significant Bit (LSB) first, over a SpW link. A data
character is encoded into ten bits, resulting in a data symbol containing a
parity bit, a data-control flag, and eight bits of data. The data-control flag
is set to zero to indicate that this is a data character. The arrow next to the
data symbol in Figure 2.4 indicates the order in which the bits are serialized.
The control character group has four control characters. They are encoded into
four bits, resulting in a control symbol containing a parity bit, a data-control
flag, and two bits determining the type of the control character. The
data-control flag is set to one to indicate that this is a control character.
The first control character is the Flow Control Token (FCT), identified by type
bits set to 0b00. It is used by the Data Link layer for managing the flow of
N-Chars (Normal characters) over a SpW link. N-Chars are defined as the
characters used for sending packets, which are data characters and end of packet
markers. The second control character is the normal End Of Packet (EOP),
identified by type bits set to 0b01; it is one of the two end of packet markers.
It is passed to the Data Link layer to indicate the end of an error-free packet.
The third control character is the Error End of Packet (EEP), identified by type
bits set to 0b10; it is the second end of packet marker. It is used in the Data
Link layer to terminate a packet prematurely, indicating that an error occurred
while the packet was being transferred. The last control character is Escape
(ESC), identified by type bits set to 0b11. It is used exclusively to form
control codes.
Figure 2.4: SpaceWire Data and Control characters and Control Codes [2]
There are two types of control codes: NULL and Broadcast Code (BC). The NULL
control code is an ESC control character followed by an FCT with an
appropriately set parity bit. It is used by the Data Link layer to keep the link
active when no traffic is present on the link and to support link disconnect
detection. A Broadcast Code is an ESC followed by a single data character with
parity set as described in the following paragraph. It is used to transfer
time-codes and distributed interrupt codes over a SpW link.
The parity coverage for SpW is somewhat unusual, as it follows that of IEEE
1355-1995. Because control and data characters have different lengths, the
parity field that covers the previous character also includes the data-control
flag of the next character. This ensures that the length of the following
character is validated by the parity bit before that character is decoded, which
avoids incorrect decoding of a character when its data-control flag is in
error [1]. The parity bit is set to produce odd parity, so that the total number
of ones in the field covered is odd. The parity coverage and an example for a
data character followed by a NULL control code are depicted in Figure 2.5.
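As a minimal sketch of the odd-parity rule described above, the following
Python function computes the parity bit of a symbol. It assumes that the covered
field consists of the data or type bits of the previous character, the parity
bit itself, and the data-control flag of the symbol the parity bit belongs to.
The function and constant names are illustrative and are not taken from the
standard or from the router design.

```python
# Type bits of the four control characters, as listed in the text.
FCT, EOP, EEP, ESC = 0b00, 0b01, 0b10, 0b11


def parity_bit(prev_payload: int, flag: int) -> int:
    """Return the parity bit of a symbol (odd parity).

    prev_payload: data bits (data character) or type bits (control
                  character) of the previously sent character.
    flag:         data-control flag of the current symbol
                  (0 = data character, 1 = control character).
    """
    if prev_payload < 0:
        raise ValueError("payload must be non-negative")
    if flag not in (0, 1):
        raise ValueError("data-control flag must be 0 or 1")
    ones = bin(prev_payload).count("1") + flag
    # Pick the parity bit so that the covered field holds an odd number of ones.
    return 0 if ones % 2 == 1 else 1


# Data character 0x00 followed by a NULL (ESC followed by FCT):
esc_parity = parity_bit(0x00, 1)  # parity bit of the ESC symbol
fct_parity = parity_bit(ESC, 1)   # parity bit of the FCT symbol
```

Only the number of ones matters for the parity, so the sketch does not need to
model the order in which the bits are serialized.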
Described characters and control codes have a fixed transmission priority.
They are prioritized as follows:
• Time-codes – highest priority
• FCTs
• N-Chars
• NULLs – lowest priority
Time-codes have the highest priority because they are used to transfer time
or carry synchronization information and have to be delivered with low jitter.
FCTs have higher priority than N-chars to ensure adequate flow control even in
the case of a large amount of data being sent. NULLs have the lowest priority
since they are only used to maintain an active link status in case there is
nothing else to send.
Figure 2.5: SpW Parity Coverage [2]
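The fixed transmission priority listed above can be sketched as a simple
selection function. This is an illustration only: the names TxItem and
next_tx_item are hypothetical, and the flag n_char_pending is assumed to mean
that an N-Char is waiting and is currently allowed to be sent.

```python
from enum import IntEnum


class TxItem(IntEnum):
    """Items a transmitter can send, ordered from highest to lowest priority."""
    TIME_CODE = 0
    FCT = 1
    N_CHAR = 2
    NULL = 3


def next_tx_item(time_code_pending: bool,
                 fct_pending: bool,
                 n_char_pending: bool) -> TxItem:
    """Pick the next item to transmit according to the fixed priority."""
    if time_code_pending:
        return TxItem.TIME_CODE
    if fct_pending:
        return TxItem.FCT
    if n_char_pending:
        return TxItem.N_CHAR
    # Nothing else to send: keep the link active with a NULL.
    return TxItem.NULL
```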
A SpaceWire port provides a simple mechanism for starting a link, keeping the
link running, sending data over the link, ensuring that data is not sent if the
receiver at the other end is not ready for it, and for recovering from any error on
the link. All this is handled by the link state-machine in the SpW interface and
is transparent to the user application.
When the SpW link is initialized and running, it is ready to receive and
transmit packets. When a packet is scheduled for transmission, it is passed to
the Transmit (TX) First In First Out (FIFO) buffer character by character.
Received packets are stored in the Receive (RX) FIFO buffer, where only N-Chars
(data, EOP, and EEP) are stored and passed to the higher layer, the Network layer.
At this point, no detailed explanation is given of how SpW links are initialized
and how they perform error recovery, since this is done by the CODEC state
machine, which is transparent to the user application. For now, it is enough to
know that links can be started, disabled, or configured for auto-start via the
Management Information Base. More information on this topic can be found
in [1, 2].
When any communication protocol is described, one must consider the
synchronization between the source and sink nodes. Bit synchronization is
achieved by sending a clock with the data. The clock signal is encoded into the
strobe signal in such a way that XORing the data and strobe signals recovers the
clock. Encoding the clock relaxes the clock-to-data skew requirements. Character
synchronization is performed only once, on link start. If it is lost, this is
detected as a parity error, and the link is restarted to recover the character
synchronization.
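The XOR relationship between data, strobe, and clock can be illustrated with a
short sketch. For the XOR of both signals to toggle in every bit period, exactly
one of them has to change per bit, so the strobe toggles whenever the data value
stays the same. The function names and the assumption that both signals start
at 0 are illustrative choices.

```python
from typing import Iterable, List, Tuple


def ds_encode(bits: Iterable[int], data0: int = 0,
              strobe0: int = 0) -> List[Tuple[int, int]]:
    """Return one (data, strobe) pair per transmitted bit."""
    data, strobe = data0, strobe0
    pairs = []
    for bit in bits:
        if bit == data:
            strobe ^= 1  # data line unchanged, so the strobe line toggles
        data = bit       # otherwise only the data line changes
        pairs.append((data, strobe))
    return pairs


def recovered_clock(pairs: Iterable[Tuple[int, int]]) -> List[int]:
    """Recover the clock by XORing data and strobe."""
    return [d ^ s for d, s in pairs]


bits = [0, 1, 0, 0, 1, 1, 0, 1, 1, 0]
pairs = ds_encode(bits)
clock = recovered_clock(pairs)
```

In this sketch the recovered clock changes level once per bit period, so the
receiver can sample the data line on every clock edge.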
2.1.3 SpaceWire networks
SpaceWire networks are constructed using SpW point-to-point links and routers.
Routers connect many links and provide the means of routing packets from one
port to any other port. Since links are bi-directional, each link interface can be
seen as both an input and an output port. A network node can be a source device,
a sink device, or both. The two main components of a SpW router are a switch
matrix and SpW link interfaces. The switch matrix is used to connect any input
port to any output port. It is usually implemented as a non-blocking NxN
crossbar switch. A packet is routed through the switch based on its address,
which is provided as the first data character of the packet, also called the
header. The appropriate ports are connected when the output port is not busy. If
two input ports compete for the same output port, arbitration determines which
one gets to send first.
Network Node
A device is considered to be a network node if it is connected to a SpW network
via a SpW link and can transmit and/or receive SpW packets. For example, a
network node can be an instrument, a mass-memory, or a processor. These nodes
can have multiple SpW links connected to them and be identified by multiple
logical addresses.
SpaceWire packet
In the SpaceWire standard, information is transferred in packets. Packets can
be sent in both directions, provided there is space in the receiver for the
incoming packet. As shown in Figure 2.6, a SpW packet has three parts: a packet
header, cargo, and an End Of Packet marker. The header contains information
about how the packet should traverse the network to reach the destination node.
It can contain the destination node’s address or the path which needs to be
taken to reach the destination node. The header can be omitted in point-to-point
communication. The cargo section of the packet contains the actual data to be
transmitted; it can be of arbitrary length. The EOP indicates the end of the
packet. The character following an EOP is treated as the start of a new packet.
Packets are simple, have little overhead, and are capable of carrying a range of
user-defined protocols.
Figure 2.6: SpW Packet
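As a small illustration of this packet framing, the sketch below splits a
received stream of N-Chars into packets. Data characters are modeled as integers
and the two end of packet markers as the strings "EOP" and "EEP"; this
representation and the function name are assumptions made for the example.

```python
from typing import Iterable, List, Tuple, Union

EOP = "EOP"  # normal end of packet marker
EEP = "EEP"  # error end of packet marker

NChar = Union[int, str]


def split_packets(n_chars: Iterable[NChar]) -> List[Tuple[List[int], str]]:
    """Split an N-Char stream into (data characters, end marker) tuples.

    The first data character of each packet is its header, if one is used.
    Characters after the last end of packet marker belong to a packet that
    is still in transit and are not returned.
    """
    packets = []
    current = []
    for ch in n_chars:
        if ch in (EOP, EEP):
            packets.append((current, ch))
            current = []  # the next character starts a new packet
        else:
            current.append(ch)
    return packets


stream = [40, 0x01, 0x02, EOP, 41, 0x03, EEP, 0x07]
packets = split_packets(stream)
```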
SpaceWire router
The purpose of the router is to connect multiple nodes and to route packets from
any input port to any output port based on the first character of the packet.
Figure 2.7 depicts a very simplified SpW router. Structurally it is composed of a
switch matrix and physical ports. The switch matrix is responsible for physical
connections between ports and is usually implemented as a non-blocking crossbar
switch. Physical ports can be either SpW router ports or so-called FIFO ports. As
described in section 2.1.2, SpW links can receive and transmit at the same time.
Therefore, one instance of a SpW port or interface can be viewed as an input
and output port in one. FIFO ports are used for applications local to the router.
They are also called external (to the network) or parallel (as opposed to the
serial SpW link) ports. In the router, packets are routed based on their leading
data character (header). For routing purposes, a routing table can be
implemented. Each entry holds the information needed to route an incoming packet
based on its header value. Whenever two or more arriving packets have the same
output port as their destination, arbitration is needed to resolve the output
port contention. A fair arbitration scheme should be implemented, since an
unfair arbitration algorithm can cause starvation on some ports. A SpW router is
implemented as a wormhole router.
Figure 2.7: Simplified SpW router
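To illustrate what a fair arbitration scheme for one output port can look like,
the sketch below implements a round-robin arbiter. Round-robin is chosen here
only as a common example of a starvation-free scheme; it is an assumption of
this sketch and not necessarily the scheme used in the router design. The class
and method names are hypothetical.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Grants one output port to requesting input ports in rotating order."""

    def __init__(self, num_inputs: int) -> None:
        if num_inputs < 1:
            raise ValueError("at least one input port is required")
        self.num_inputs = num_inputs
        # Start so that input 0 is the first candidate.
        self.last_granted = num_inputs - 1

    def grant(self, requests: List[bool]) -> Optional[int]:
        """Return the granted input port, or None if nobody requests."""
        if len(requests) != self.num_inputs:
            raise ValueError("one request flag per input port is required")
        for offset in range(1, self.num_inputs + 1):
            candidate = (self.last_granted + offset) % self.num_inputs
            if requests[candidate]:
                # The winner gets the lowest priority in the next round.
                self.last_granted = candidate
                return candidate
        return None


arbiter = RoundRobinArbiter(3)
grants = [arbiter.grant([True, True, False]) for _ in range(3)]
```

Because the most recently served input is checked last in the following round,
no requesting input can be passed over indefinitely.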
Node addressing
There are two forms of addressing in the SpW network: path addressing and
logical addressing. With path addressing, the packet header contains a list of
data characters, each holding the output port number for one router encountered
on the way from the source to the destination node; in other words, the path
that the packet needs to traverse to reach the destination. At each router, the
first data character (header) is checked to determine the output port and then
deleted to expose the next data character (header) for use in the next router.
Routers can have up to 31 physical ports, also called external ports (numbers 1
to 31), and one internal configuration port (number 0). Each data character
forming the path address is therefore in the range 0 to 31. Figure 2.8 shows how
the path address at the start of a packet is modified as the packet traverses
the network.
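Header deletion in path addressing can be sketched as follows. A packet is
modeled as a list of data characters without the end of packet marker; the
function name and the example packet are illustrative assumptions.

```python
from typing import List, Tuple


def forward_path_addressed(packet: List[int]) -> Tuple[int, List[int]]:
    """Return (output port, packet as forwarded) for a path-addressed packet."""
    if not packet:
        raise ValueError("empty packet")
    port = packet[0]
    if not 0 <= port <= 31:
        raise ValueError("leading character is not a path address")
    # The leading character is deleted, exposing the header for the next router.
    return port, packet[1:]


# A packet crossing three routers: path address 2, 5, 1 followed by cargo.
packet = [2, 5, 1, 0xCA, 0xFE]
ports = []
for _ in range(3):
    port, packet = forward_path_addressed(packet)
    ports.append(port)
```

After the third router only the cargo remains, which is what the destination
node receives.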
The second form of addressing in SpW networks is logical addressing. In this
case, each destination is given an identifier, a number between 32 and 255. To
successfully route packets through the network, a routing table needs to be
added to every router. Each entry in the routing table corresponds to one
logical address and specifies the output port to which the packet should be
routed.
Figure 2.8: Path Addressing example
The leading data character is now set to the required destination identifier.
At each router, the leading data character is used to look up the appropriate
output port in the routing table. The leading data character is not discarded,
since it is needed for routing decisions in subsequent routers. The use of
logical addressing is illustrated in Figure 2.9. Logical addressing uses just a
single data character, which holds a value in the range 32 to 255 to identify
the destination, so that it cannot be confused with a path address.
Comparing the two addressing methods, logical addressing needs only one data
character for the address but requires routing tables in the routers, whereas
path addressing requires the whole path of a packet to be specified but needs no
routing tables.
Figure 2.9: Logical Addressing example
There exists a third addressing scheme called regional logical addressing which
requires a router capable of deleting logical address characters. A SpW network
is divided into regions, each region having its own logical address. This allows the
user application to reuse logical addresses already used in other regions, making
it very useful when designing a large SpW network architecture. Whether the
logical address character is deleted or not is usually determined by a specific
control bit set in the routing table entry.
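Logical and regional logical addressing can be sketched with a small routing
table lookup. The entry layout (an output port plus a delete_header control bit)
and all names are assumptions for illustration. What a router does when no table
entry exists is not covered by the text, so the sketch simply lets the lookup
fail with a KeyError.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RouteEntry:
    output_port: int
    delete_header: bool = False  # set for regional logical addressing


def forward_logical(packet: List[int],
                    table: Dict[int, RouteEntry]) -> Tuple[int, List[int]]:
    """Return (output port, packet as forwarded) for a logically addressed packet."""
    if not packet:
        raise ValueError("empty packet")
    address = packet[0]
    if not 32 <= address <= 255:
        raise ValueError("leading character is not a logical address")
    entry = table[address]  # KeyError if the address has no entry
    # Plain logical addressing keeps the header, regional addressing deletes it.
    forwarded = packet[1:] if entry.delete_header else list(packet)
    return entry.output_port, forwarded


table = {
    40: RouteEntry(output_port=3),
    50: RouteEntry(output_port=5, delete_header=True),
}
plain = forward_logical([40, 0xAA], table)
regional = forward_logical([50, 41, 0xAA], table)
```

In the regional case the forwarded packet starts with 41, the logical address
that is valid inside the next region.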
Additional features
Routing table entries usually map a single logical address to a single output port.
However, multiple output ports can be assigned to some logical addresses. This
is called group adaptive routing, and if enabled, it allows packets to be routed
through any port specified in the group. Whenever a packet arrives at a router,
the control logic checks whether the first output port in the group is busy. If
the port is available, the packet is routed through that port. If the port is
busy, the control logic moves to the next port in the group and continues until
it finds an available port in the group. This can be used to implement redundant
paths to the same destination node: if one of the paths to the destination node
breaks, packets can still reach the destination. Group adaptive routing can also
be used as a quality of service feature, allowing packets to traverse a router
even when the primary output port (e.g., the first one in the list) is busy.
The revised SpW standard supports multicasting [2]. It allows a source node to
send a single packet to multiple destinations. A new field is added to the
routing table, containing a multicast set for a particular logical address. The
multicast set defines the output ports through which the packet should be
routed. The packet is routed only when all of the targeted output ports are
ready to receive a new packet.
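The two port-selection rules described in this section differ in one point:
group adaptive routing needs any one free port of the group, while multicast
needs all ports of the set to be free. A minimal sketch, with hypothetical
function names and with busy ports modeled as a set of port numbers:

```python
from typing import Iterable, List, Optional, Sequence, Set


def select_group_port(group: Sequence[int], busy: Set[int]) -> Optional[int]:
    """Group adaptive routing: first port of the group that is not busy."""
    for port in group:
        if port not in busy:
            return port
    return None  # all ports of the group are busy, the packet has to wait


def select_multicast_ports(multicast_set: Iterable[int],
                           busy: Set[int]) -> Optional[List[int]]:
    """Multicast: all ports of the set, but only if every one of them is free."""
    ports = list(multicast_set)
    if any(port in busy for port in ports):
        return None  # at least one target port is busy, the packet has to wait
    return ports
```

For example, with ports 2 and 3 busy, select_group_port([2, 3, 4], {2, 3})
selects port 4, whereas select_multicast_ports([2, 4], {2, 3}) returns None
until port 2 becomes free.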
2.2 Remote Memory Access Protocol
Remote Memory Access Protocol (RMAP) is a standardized SpW protocol which
supports read and write operations to memory, registers, FIFO memory,
mailboxes, etc., in a remote SpW node. It is defined by the ECSS-E-ST-50-52C
standard [11] and is designed for a variety of SpW applications. However, its
primary purposes are to configure a SpW network, to control SpW nodes, and to
gather data and status information from those nodes.
In a SpW node, input/output registers, control/status registers, and FIFOs are
memory-mapped, which means they can be accessed as memory and all standard
memory operations can be performed on them.
RMAP can be used to configure SpW routers, set their operating parameters,
and modify routing table entries. It can also be used to monitor the status of
those routers. Likewise, RMAP can be used to configure SpW nodes and read their
state, for example to change the transmission rate or enable auto-start mode.
2.2.1 RMAP operations
All read and write operations defined in the RMAP protocol are posted
operations, i.e., the initiator does not wait for a reply to be received. This
means that many read and write commands can be outstanding at any time. RMAP
implements no timeout mechanism for missing replies. If a reply timeout
mechanism is needed, it is implemented in the initiator user application [11].
Write commands
The write command provides a means for one node, the initiator, to write zero
or more bytes of data into a specified area of memory in another node, the
target, on a SpW network.
Write commands can request an acknowledgment from the target node when a
write operation is performed. This informs the initiator whether the command
was received successfully. A reply packet containing an error code is sent back
to the initiator, but only if the header was received correctly, since it holds
the reply address information. If no acknowledgment is requested, the error code
is stored in one of the status registers of the target node, which the initiator
can check later to see whether any errors have occurred.
Another option specified by the write command is whether data should be
verified or not before the execution of the command. Verification is only viable
for smaller write operations since data needs to be buffered and buffer space in
the target node is usually limited. It is used when writing data to critical memory
locations such as configuration and control registers, which affect the functionality
of the target node. A large amount of data can be directly written to a specified
memory location without verification and can be checked after the write operation
has completed.
Based on described acknowledged/non-acknowledged and verified/non-
verified options for a write command, four different write operations are possible:
• Write non-acknowledged, non-verified - writes zero or more bytes to a
specified memory location in a target. Before data is written, the header is
checked using a Cyclic Redundancy Check (CRC); the data is not. No reply is
sent to the initiator. This type of write operation is used for writing a
large amount of data to a target where it can be safely assumed that the
write operation completed successfully. For example, writing camera data to
a temporary buffer.
• Write non-acknowledged, verified - writes zero or more bytes to a
specified memory location in a target. Before data is written, the header
and the data are checked using CRCs. No reply is sent to the initiator. Only
a limited amount of data can be written in a single write operation due to
the limited buffer space in the target. However, it is improbable that
erroneous data would be written. It is typically used for writing a small
amount of data, such as control register values, to a target where it can be
safely assumed that the write operation completed successfully. For example,
many write commands can be issued to different control registers, and the
status register can be checked for possible errors after the operations have
completed.
• Write acknowledged, non-verified - writes zero or more bytes to a
specified memory location in a target. Before data is written, the header is
checked using a CRC; the data is not. A reply packet is sent back to the
initiator to indicate the status of the command, but only when the header
was received intact. It is used for writing a large amount of data to a
target where it can be safely assumed that the write operation completed
successfully, but an acknowledgment is required. For example, writing sensor
data.
• Write acknowledged, verified - writes zero or more bytes to a specified
memory location in a target. Before data is written, the header and the data
are checked using CRCs. Only a limited amount of data can be written in a
single write operation due to the limited buffer space in the target. It is
used for writing a small amount of data to a target where it is important to
receive a confirmation of a successful write operation. For example, writing
to configuration registers.
Read commands
Read commands are used for reading zero or more bytes from memory in a target.
Data is returned in a reply packet, normally to the initiator.
Read-Modify-Write commands
The read-modify-write command reads data from a specified memory location in
a target, modifies it, and writes it back to the same address. The modification
is done locally in the target rather than by the initiator performing separate
read and write commands. The original data is returned in a reply packet to the
initiator.
2.2.2 RMAP command and reply fields
Target SpW address field holds data characters determining the path address
of a target if path addressing or regional addressing is used to address the target.
It should be omitted when only logical addressing is used.
Target Logical Address field holds the logical address of a target if
logical addressing is used to address the target. When path addressing is used,
this field can be used to confirm that the packet arrived at the correct
destination. If a target does not have a defined logical address, this field
should be set to the default logical address 0xFE. It is then up to the target
whether it accepts or rejects packets with logical address 0xFE.
Protocol Identifier field links the packet with a specific SpW protocol
being used for communication. For the RMAP protocol, the protocol identifier
has the value 0x01. Therefore, when sending RMAP packets to a target, this
field should be set to 0x01. If the RMAP target receives a packet with a different
protocol identifier, it will discard the packet.
Instruction field defines the packet type, the command to be executed, and the
reply address length. The first two bits determine the type of an RMAP packet,
i.e., command (0b01) or reply (0b00). The other two combinations (0b10 and 0b11)
are reserved and should not be used. The four-bit command section of the
instruction field encodes the command to be executed when the packet type is
command, or identifies the command that caused the reply in the case of a reply
packet. All possible commands are listed in Table 2.1. The last two bits
determine the number of bytes in the reply address field.
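The layout of the instruction field can be sketched as a decoder and encoder
for a single byte. The sketch assumes that the first two bits are bits 7 and 6
of the byte, that the command bits are bits 5 to 2 as numbered in Table 2.1, and
that the last two bits are bits 1 and 0. The class and function names are
hypothetical, and command codes marked as not used are not rejected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    is_command: bool     # packet type: True = command (0b01), False = reply (0b00)
    write: bool          # bit 5: write (1) or read (0)
    verify: bool         # bit 4: verify data before write
    ack: bool            # bit 3: acknowledge
    increment: bool      # bit 2: increment address
    reply_addr_len: int  # bits 1-0: Reply Address field holds 4 * value bytes


def decode_instruction(byte: int) -> Instruction:
    if not 0 <= byte <= 0xFF:
        raise ValueError("instruction must fit in one byte")
    packet_type = (byte >> 6) & 0b11
    if packet_type not in (0b00, 0b01):
        raise ValueError("reserved packet type")
    return Instruction(
        is_command=(packet_type == 0b01),
        write=bool(byte & 0x20),
        verify=bool(byte & 0x10),
        ack=bool(byte & 0x08),
        increment=bool(byte & 0x04),
        reply_addr_len=byte & 0b11,
    )


def encode_instruction(ins: Instruction) -> int:
    if not 0 <= ins.reply_addr_len <= 3:
        raise ValueError("reply address length must be in the range 0 to 3")
    return ((0b01 if ins.is_command else 0b00) << 6
            | int(ins.write) << 5 | int(ins.verify) << 4
            | int(ins.ack) << 3 | int(ins.increment) << 2
            | ins.reply_addr_len)


# Verified, acknowledged, incrementing write with a 4-byte Reply Address field.
write_cmd = decode_instruction(0b01111101)
```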
Key field provides a one-byte key which must be matched by the target
user application if an RMAP command is to be accepted. It should be used only
for command authorization and nothing else.
Table 2.1: RMAP command codes [11].
Command field bits: Bit 5 = Write/Read, Bit 4 = Verify Data Before Write,
Bit 3 = Ack, Bit 2 = Increment Address.
Bit 5  Bit 4  Bit 3  Bit 2  Function
0      0      0      0      Not used
0      0      0      1      Not used
0      0      1      0      Read single address
0      0      1      1      Read incrementing address
0      1      0      0      Not used
0      1      0      1      Not used
0      1      1      0      Not used
0      1      1      1      Read-Modify-Write incrementing address
1      0      0      0      Write, single address, don't verify before writing, no acknowledge
1      0      0      1      Write, incrementing address, don't verify before writing, no acknowledge
1      0      1      0      Write, single address, don't verify before writing, send acknowledge
1      0      1      1      Write, incrementing address, don't verify before writing, send acknowledge
1      1      0      0      Write, single address, verify before writing, no acknowledge
1      1      0      1      Write, incrementing address, verify before writing, no acknowledge
1      1      1      0      Write, single address, verify before writing, send acknowledge
1      1      1      1      Write, incrementing address, verify before writing, send acknowledge
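The layout of the instruction field can be illustrated with a short Python
sketch. It assumes that the "first two bits" are the two most significant
bits (7 and 6) and the "last two bits" the two least significant ones, which
agrees with the command bits 5 to 2 in Table 2.1. The function and constant
names are hypothetical and only serve the illustration.

```python
# Hypothetical helper names; bit positions assume bits 7-6 = packet type,
# bits 5-2 = command (Table 2.1), bits 1-0 = reply address length.
PACKET_TYPE_REPLY = 0b00
PACKET_TYPE_COMMAND = 0b01


def encode_instruction(packet_type, command, reply_addr_len):
    """Pack the three sub-fields into one instruction byte."""
    if packet_type not in (PACKET_TYPE_REPLY, PACKET_TYPE_COMMAND):
        raise ValueError("packet types 0b10 and 0b11 are reserved")
    if not 0 <= command <= 0b1111:
        raise ValueError("command code must fit in four bits")
    if not 0 <= reply_addr_len <= 0b11:
        raise ValueError("reply address length must fit in two bits")
    return (packet_type << 6) | (command << 2) | reply_addr_len


def decode_instruction(byte):
    """Split an instruction byte into its sub-fields."""
    command = (byte >> 2) & 0b1111
    return {
        "packet_type": (byte >> 6) & 0b11,
        "write": bool(command & 0b1000),      # bit 5: Write / Read
        "verify": bool(command & 0b0100),     # bit 4: Verify Data Before Write
        "ack": bool(command & 0b0010),        # bit 3: Ack
        "increment": bool(command & 0b0001),  # bit 2: Increment Address
        "reply_addr_len": byte & 0b11,
    }


# Write, incrementing address, don't verify, send acknowledge, with a
# 4-byte reply address field.
instr = encode_instruction(PACKET_TYPE_COMMAND, 0b1011, 0b01)
print(f"0x{instr:02X}", decode_instruction(instr))
```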
Reply Address field provides the path address of the initiator when path
addressing is used for the reply packet. The number of bytes in the reply
address is determined by the Reply Address Length bits in the instruction
field. Table 2.2 lists the possible sizes of the Reply Address field. The
size is always a multiple of four; therefore, whenever the length of a reply
address is not a multiple of four, the field is padded with leading zero
bytes. Table 2.3 lists some examples of how the reply address should be
constructed and depicts the resulting reply addresses used when sending a
reply back to the initiator. This field should be omitted when a single
logical address is used for routing the reply back to the initiator.
Table 2.2: Reply Address field size
Value of Reply Address Length field    Size of Reply Address field
0b00                                   0 bytes
0b01                                   4 bytes
0b10                                   8 bytes
0b11                                   12 bytes
Table 2.3: Example Reply Address field to Reply SpW Address mappings
Reply Address Field                        Resulting Reply SpW Address
0x00 0x00 0x00 0x00                        0x00
0x00 0x00 0x03 0x04                        0x03 0x04
0x00 0x01 0x04 0x00                        0x01 0x04 0x00
0x00 0x00 0x00 0x01 0x02 0x03 0x04 0x05    0x01 0x02 0x03 0x04 0x05
0x00 0x00 0x77 0x06                        0x77 0x06
0x00 0x62 0x09 0x00                        0x62 0x09 0x00
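The mappings in Tables 2.2 and 2.3 can be reproduced with the following
Python sketch. It is an illustration only: the function names are
hypothetical, and the rule that leading zero bytes are stripped (with an
all-zero field yielding a single 0x00) is inferred from the examples in
Table 2.3.

```python
def build_reply_address_field(path):
    """Pad a reply path with leading zeros to 0, 4, 8 or 12 bytes.

    Returns (reply_address_length_bits, field_bytes), see Table 2.2.
    """
    path = bytes(path)
    if len(path) > 12:
        raise ValueError("the Reply Address field holds at most 12 bytes")
    words = (len(path) + 3) // 4            # number of 4-byte groups
    field = bytes(4 * words - len(path)) + path
    return words, field


def reply_spw_address(field):
    """Derive the Reply SpW Address from a Reply Address field (Table 2.3)."""
    field = bytes(field)
    if not field:
        return b""                          # field omitted: logical addressing
    stripped = field.lstrip(b"\x00")        # drop leading zero padding only
    return stripped if stripped else b"\x00"


length_bits, field = build_reply_address_field([0x01, 0x02, 0x03, 0x04, 0x05])
print(bin(length_bits), field.hex(" "), "->", reply_spw_address(field).hex(" "))
```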
Initiator Logical Address field represents the logical address of the
initiator or is set to 0xFE if logical addressing is not used.
Transaction Identifier field is a 16-bit value that can be used to uniquely
identify and match a reply packet with a command that caused it. The
initiator of the command gives a command a unique transaction identity. This
transaction identifier is then returned to the initiator in the reply. Typically, it
is an incrementing integer sequence that increments whenever a new command
is sent.
Extended Address field is used to extend the 32-bit memory address to
40 bits, allowing a 1 Terabyte address space to be accessed directly in each
node. When a target does not support the 40-bit address space, this byte
should be set to zero.
Address field contains the lower 32 bits of the memory address.
Data Length field specifies the amount of data to be written or read
from memory, starting from the address provided in the Address field.
Header CRC field holds the 8-bit CRC value calculated by the initiator
over the fields from Target Logical Address down to and including the Data
Length field. It is used by the target to confirm that the header is correct
before executing the command.
Data field has a variable number of bytes based on the Data Length field. It
holds data to be written to a memory location in case of a write command,
data that was read from memory in a read reply, or data that is read and
written in a read-modify-write command and reply.
Mask field is used in read-modify-write command and defines which bits
should be updated in the specified memory location.
Data CRC field holds the 8-bit CRC value calculated by the sender over the
Data and Mask fields. It is used by the receiver to confirm that the data is
correct before it is written in a verified write command, or that it was
correctly transferred in a non-verified write command or a read reply.
Reply SpW address field in a reply holds a value of the path address
of the initiator if it was provided in the Reply Address field of the command.
Status field contains the status/error code.
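The Header CRC and Data CRC fields described above can be computed with a
bitwise routine such as the following Python sketch. The CRC parameters are
not given in this section; the sketch assumes the parameters of the RMAP
standard [11] (generator polynomial x^8 + x^2 + x + 1, register initialised
to zero, bits processed least significant bit first). The function name and
the example header bytes are hypothetical.

```python
def rmap_crc8(data, crc=0):
    """Bitwise CRC-8, assuming polynomial x^8 + x^2 + x + 1 processed
    LSB first (reflected constant 0xE0) with a zero initial value."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; apply the reflected polynomial when a 1 falls out.
            crc = (crc >> 1) ^ 0xE0 if crc & 1 else crc >> 1
    return crc


# Illustrative header bytes only (not a complete, valid RMAP header).
header = bytes([0xFE, 0x01, 0x6C, 0x00, 0xFE, 0x00, 0x01])
header_crc = rmap_crc8(header)

# A receiver can run the CRC over the header including the CRC byte;
# with these parameters the result is zero for an undamaged header.
print(f"CRC = 0x{header_crc:02X}, check = {rmap_crc8(header + bytes([header_crc]))}")
```

Running the CRC over the covered bytes plus the received CRC byte and
comparing the result with zero is one simple way for a receiver to perform
the check.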
2.2.3 Write command
Figure 2.10 depicts the format of an RMAP write command, which is used to
write data to a remote SpW node, together with a detailed view of the
instruction field. Each field should be set to an appropriate value as
described in section 2.2.2. It is important to understand the functionality
of the increment / no increment address bit in the instruction field. If this
bit is set (1), the address is incremented for every byte (or word, as
determined by the destination unit) written to the target node, so that the
bytes are written to consecutive memory locations. If the bit is not set (0),
the address does not change, and each consecutive byte is written to the same
memory location. Note that the memory width is determined by the target, and
it can be any multiple of 8 bits. For example, if the target node has a
memory with a width of 32 bits, four bytes from the data field in the write
command will be written to the same address. Normally, the memory address
would be aligned on a 32-bit boundary when doing 32-bit writes. Fields with
turquoise color in Figure 2.10 can be omitted. For example, if only logical
addressing is being used, one does not need to provide the target's SpW
address (i.e., path address) and the reply address, since the packets will be
routed based on the logical address. One can also choose to send zero bytes
of data; the purpose of such a command (with the acknowledge bit set) is to
check whether the target can be reached via the network.
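The effect of the increment / no increment address bit can be sketched in
Python as follows. The sketch assumes a byte-wide (8-bit) target memory
modelled as a dictionary; the function name is hypothetical.

```python
def apply_write(memory, address, data, increment):
    """Write the data bytes to a byte-wide memory modelled as a dict."""
    for offset, byte in enumerate(data):
        # With the increment bit set, consecutive bytes go to consecutive
        # addresses; otherwise every byte goes to the same address.
        target = address + offset if increment else address
        memory[target] = byte
    return memory


print(apply_write({}, 0x10, b"\x01\x02\x03", increment=True))
print(apply_write({}, 0x10, b"\x01\x02\x03", increment=False))
```

Without incrementing, only the last byte remains at the address, which is
the behavior one would want when the address maps to a FIFO or a register
rather than to ordinary memory.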
Figure 2.11 depicts the format of an RMAP write reply, which is sent by the
target when a write command requests an acknowledgment, together with a
detailed view of the instruction field. The first two bits of the instruction
field are set to 0b00 to indicate a reply packet type. All other bits are
copied from the command that caused the reply.
Figure 2.10: RMAP Write Command Format [11]
Figure 2.11: RMAP Write Reply Format [11]
2.2.4 Read Command
Figure 2.12 depicts the format of an RMAP read command, which is used to
read data from a remote SpW node, together with a detailed view of the
instruction field. Each field should be set to an appropriate value as
described in section 2.2.2. Most of the bits in the instruction field have a
predefined value, since there are only two possible read operations defined
by the standard. One can choose between the two by changing the value of the
increment / no increment address bit. If it is set (1), the memory address is
incremented after each byte (or word, as determined by the destination unit)
has been read, so that bytes are read from consecutive memory locations. If
it is not set (0), bytes are read from the same memory location.
Figure 2.12: RMAP Read Command Format [11]
Figure 2.13 depicts the format of an RMAP read reply to be used by the
target when a read command is issued by the initiator and a detailed view of the
instruction field. First two bits of the instruction field are set to 0b00 to indicate
a reply packet type. All other bits are copied from the command that caused the
reply. The reply contains read data from the target.
Figure 2.13: RMAP Read Reply Format [11]
2.2.5 Read-Modify-Write Command
Figure 2.14 depicts the format of an RMAP read-modify-write command, which is
used to update a particular memory location in a remote SpW node, together
with a detailed view of the instruction field. Each field should be set to
an appropriate value as described in section 2.2.2. The Read-Modify-Write
command has a slightly different format than a normal write command. The
Data Length field is limited to the values 0x00, 0x02, 0x04, 0x06, and 0x08
for the least significant byte and 0x00 for the upper two bytes. This results
in a maximum data length of 8 bytes. If any other length is specified, a
reply with an appropriate error code is sent back to the initiator. Only even
values are allowed because, for each data byte one wants to modify, a
corresponding mask byte needs to be sent, indicating which bits are to be
changed.
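A possible target-side handling of a read-modify-write command is sketched
below in Python. Two assumptions are made that are not stated in the text:
the payload carries the data bytes first, followed by the same number of
mask bytes, and a mask bit of 1 selects the new data bit while a mask bit of
0 keeps the old memory bit. The names are hypothetical.

```python
VALID_RMW_LENGTHS = (0, 2, 4, 6, 8)   # allowed Data Length values


def read_modify_write(memory, address, payload):
    """Apply a read-modify-write payload to a bytearray memory.

    Returns the original data, as it would be sent back in the reply.
    """
    if len(payload) not in VALID_RMW_LENGTHS:
        raise ValueError("invalid read-modify-write data length")
    half = len(payload) // 2
    data, mask = payload[:half], payload[half:]   # assumed order: data, mask
    original = bytes(memory[address:address + half])
    for i in range(half):
        kept = original[i] & ~mask[i] & 0xFF      # bits left unchanged
        new = data[i] & mask[i]                   # bits taken from the data
        memory[address + i] = kept | new
    return original


mem = bytearray([0xF0, 0x0F])
old = read_modify_write(mem, 0, bytes([0xAA, 0x55, 0x0F, 0xF0]))
print(old.hex(" "), "->", mem.hex(" "))
```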
Figure 2.15 depicts the format of an RMAP read-modify-write reply to be
used by the target when a read-modify-write command is issued by the initiator
and a detailed view of the instruction field. First two bits of the instruction field
are set to 0b00 to indicate a reply packet type. All other bits are copied from
the command that caused the reply. The reply contains up to four bytes of the
original data before it was modified.
Figure 2.14: RMAP Read-Modify-Write Command Format [11]
Figure 2.15: RMAP Read-Modify-Write Reply Format [11]
2.3 Routers
Routers are devices used to connect multiple network nodes forming a communi-
cation network. They can resolve competition for output ports by employing an
arbitration scheme which is executed by an arbiter. They provide the means for
implementing a routing algorithm.
A generic router [3] is shown in Figure 2.16. The router is composed of the
following major components:
• FIFO buffers. They are used for storing messages in transit. They are
present in input and output ports. In some alternate designs, only input or
output buffers are used.
• Switch. Connects inputs to outputs. In high-speed implementations where
full connectivity is desired, a crossbar is used, whereas lower-speed
implementations may utilize networks without full connectivity.
• Routing and arbitration unit. It executes the routing algorithm, selects
the appropriate output port for an incoming message, and sets the switch
accordingly.
• Link Controllers (LCs). They are used for controlling message flow over
a physical channel.
Figure 2.16: Generic router model [3]
Router performance is defined by two parameters. When a message arrives
at the router, its routing information must be extracted to determine the
output port through which the message is to be routed. The time this takes is
known as the routing delay. The second parameter is the internal flow control
latency, which is defined by the rate at which the message traverses the
switch. This rate is determined by the propagation delay through the switch
(message router delay) and the signaling rate for synchronizing the transfer
of data between the input and output buffers [3].
2.3.1 Switching Techniques
The SpaceWire standard was inspired by the IEEE 1355 standard for
heterogeneous interconnect, which can be used for parallel system
construction. Therefore, SpW networks are related to multiprocessor networks
and, nowadays, to Networks-on-Chip (NoCs). This means that some solutions
from these networks can be adapted and used in SpW networks.
Switching techniques in multiprocessor networks initially followed those of
local and wide area communication networks, e.g., circuit switching and
packet switching. However, with increasing computation demand, switching
techniques borrowed from Local Area Networks (LANs) soon became a bottleneck
and a limiting factor for multiprocessor systems. The low latency that
parallel programs demand drove the evolution of new switching techniques
that are better suited to this type of application.
Switching techniques differ in the relationship between the sizes of the
physical and message flow control units. In general, a message can be split
into fixed-length parts called packets. Packets can be further broken down
into flow control units called flits. As the physical channel width is
limited, multiple physical channel cycles may be needed to transfer a single
flit. A flit is therefore split into phits, which are the units of
information that can be sent over a physical channel in one cycle. A flit is
a logical unit of information, whereas a phit is a physical quantity, i.e.,
the number of bits transferred in parallel in a single cycle. Transfer
operations between routers are necessarily constructed in terms of phits.
However, switching techniques may operate on flits (which could be defined to
be the complete message packet, e.g., in packet switching).
In the SpW standard, the phit size is 1 bit, since SpW links use serial data
transfer as described in section 2.1.2. The flit size is 9 bits, since the
upper layers of the SpW standard operate with characters (flits).
Circuit switching
Circuit switching reserves a physical path from source to destination before
the transmission begins. To reserve a path, a routing header is injected into
the network. This routing probe, which is formed from the destination address
and some additional control information, traverses the network, reserving the
path as it passes each router. When it reaches the destination, the path is
complete, and an acknowledgment is sent back to the source. The message can
now be sent using the full bandwidth. When the transmission completes, the
path needs to be released. A release can be triggered by the destination or
by the last few bits of the message.
Circuit switching can be advantageous when messages are infrequent and long,
i.e., when the message transmission time is much longer than the path setup
time. A disadvantage of circuit switching is that a reserved path may block
other messages. For example, when a routing probe is blocked at a router
waiting for a physical link to become free, all links reserved by the probe
up to that point block other circuits and prevent them from setting up a
path. Thus, if the message is not much bigger than the probe, the message
could be sent together with the probe. Moreover, all previously reserved
links can be released, since the message is buffered in the current router
where the probe is waiting for a free link. This alternative technique is
called packet switching.
Packet switching
The message is split into fixed-length packets. The first few bytes of a
packet form the packet header, which contains the routing information. Each
packet is individually routed from source to destination. Packets are
completely buffered in each router before they are forwarded through an
output port. Thus, packet switching is also referred to as store-and-forward
switching.
Packet switching is advantageous when short and frequent messages are sent
over a network. Multiple packets of the same message can be present in the
network even though the first packet might not yet have reached the
destination. A disadvantage of packet switching is the additional overhead
that is needed to route each packet successfully.
Virtual Cut-Through (VCT) switching
As described above, when packet switching is used, packets must be received
in their entirety before a routing decision can be made. However, this is not
strictly necessary. Usually, the channel width is smaller than the packet
size; therefore, the packet needs several clock cycles to cross the physical
channel. The first few bytes containing the routing information, however,
are available at the next node after only a few clock cycles. Rather than
waiting for the whole packet, the packet header can be processed as soon as
it is received. The router can start forwarding the packet as soon as it has
determined the output port and the buffer at the output port is free. In
fact, the message does not need to be buffered at the output and can be cut
through to the input of the next router before the complete packet has been
received at the current router. This technique is referred to as Virtual
Cut-Through (VCT) switching.
If the header gets blocked at a busy output port, the complete packet is
buffered at the input. At high network load, VCT therefore acts like packet
switching.
Wormhole switching
The message packet is split into flits. The first flit, called the header,
holds the routing information, i.e., the destination address. As soon as the
first flit (header) arrives at the router, it is checked for the routing
information and forwarded through the specified output port. The remaining
flits of the packet follow as they are received, occupying the output port
until the whole packet has been transmitted. Since flits are used as the
unit of message flow control, input and output buffers typically store only
a few flits.
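The forwarding behavior of wormhole switching on a single input port can be
sketched in Python as shown below. This is a simplified model under several
assumptions: the header flit is a key of a routing table, an end-of-packet
marker (here the hypothetical value EOP) terminates each packet, and output
ports are never busy, so blocking and buffering are not modelled.

```python
EOP = "EOP"  # hypothetical end-of-packet marker


def wormhole_forward(flits, routing_table):
    """Return (output_port, flit) pairs for the flit stream of one input."""
    forwarded = []
    port = None                      # None: waiting for the next header flit
    for flit in flits:
        if port is None:
            if flit == EOP:          # empty packet, nothing to route
                continue
            port = routing_table[flit]   # header flit selects the output
        forwarded.append((port, flit))   # following flits use the same port
        if flit == EOP:
            port = None              # packet done, release the output port
    return forwarded


stream = [0x40, 1, 2, EOP, 0x41, 9, EOP]
print(wormhole_forward(stream, {0x40: 2, 0x41: 5}))
```

Only the current output port has to be remembered per input, which is why
the buffers can be as small as a few flits.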
2.3.2 Routing Algorithms
Many routing algorithms exist, as stated in [3]. However, most of them apply
to multiprocessor networks and NoCs with regular network topologies. A
SpaceWire network, in contrast, usually has an irregular topology similar to
other switch-based interconnects like Autonet, Myrinet, and ServerNet.
Switch-based networks are constructed from a set of switches. Each switch has
a set of ports: a subset of these ports is connected to network nodes, some
are left open, and others are connected to other switches. An example of such
a network is shown in Figure 2.1. Such networks typically provide irregular
connectivity; the only guarantee is that the network is connected. They
support bidirectional, full-duplex links and allow multiple links between
two switches.
The benefits of irregular topology are that it provides wiring flexibility that
is required in LANs and allows the design of scalable systems. However, the
irregularity also makes the routing on such systems quite complicated.
There are two routing techniques that can be used in irregular networks. The
first one is source routing where the information on how the destination can be
reached is provided by the source node in terms of a path that needs to be taken
to reach the destination. The second one is distributed routing where each switch
is equipped with a routing table that contains the routing information. Before
any packets can be routed, a network mapping algorithm needs to be executed
to populate the routing tables.
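The difference between the two routing techniques can be illustrated with a
small Python sketch of a single routing decision. It assumes the SpaceWire
convention that a leading byte in the range 0 to 31 is a path address (an
output port number that is removed before forwarding), whereas higher values
are logical addresses that are looked up in the routing table. The function
name and table contents are hypothetical.

```python
def route(packet, routing_table):
    """Return (output_port, packet_to_forward) for one router hop."""
    first = packet[0]
    if first <= 31:
        # Source routing: the source listed the ports to take; this router
        # consumes the leading byte and forwards the rest.
        return first, packet[1:]
    # Distributed routing: the router's own table maps the logical address
    # to an output port; the address stays in the packet.
    return routing_table[first], packet


print(route(bytes([3, 0xFE, 0x01]), {}))          # path addressing
print(route(bytes([0x40, 0x01]), {0x40: 5}))      # logical addressing
```

With source routing, the router needs no prior knowledge of the network,
whereas distributed routing only works after the table has been populated.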
2.3.3 Arbiters
Arbiters are electronic circuits or devices that resolve conflicts when
multiple clients try to access a shared resource at the same time. Depending
on the observed system, a client can be a processor or peripheral accessing a
bus, or an input port in a router. Typical examples where arbiters are needed
are systems with a shared interconnect bus, multi-port memories, routers,
etc. To resolve competition for the shared resource, they employ an
arbitration scheme. Some simple and common schemes are Fixed Priority, Round
Robin, First Come First Serve, and Dynamic Priority.
The conceptual design of an arbiter is depicted in Figure 2.17. The arbiter
has a set of input request channels connecting clients, which request
resources, to the arbiter, and a set of output resource channels connecting
resources to the arbiter. The aim of the arbiter is to serve requests
optimally based on the resources available over a period of time. The arbiter
employs a handshaking scheme utilizing request and acknowledge signals for
communication with a client. A client issues a request signal, which
indicates that it requires a resource. Once a resource is available, the
arbiter issues a grant (acknowledge) signal, which enables the client to
initiate communication with the requested resource. As can be seen in
Figure 2.17, additional attributes can accompany each request signal. An
attribute can be a priority level, the type of resource, or the amount of
resources to be used. Attributes are not allowed to change between the
request and grant signals, since they are used in conjunction with the
request signal to determine whether a grant signal can be issued.
Figure 2.17: Synopsis of an arbiter
Resources tell the arbiter whether they can be accessed with an available
signal, which can be accompanied by additional attributes such as the
performance and capacity of a resource, since resources can differ from each
other: some can be faster and larger than others. A release signal is used to
inform a resource that it is no longer needed by the previous client and that
it can change its state back to available. As with the client's attributes,
this data must not change between the available and release signals.
A channel, as depicted in Figure 2.17, is a request and a grant signal
coupled together, enabling bidirectional communication with a client. The
same goes for the available and release signals in channels between the
arbiter and a resource. Arbiters use some form of handshaking protocol, of
which many exist. The two most common general classes are two-phase and
four-phase protocols, which are distinguished by the signaling method used:
two-phase protocols use transition signaling, whereas four-phase protocols
use level signaling.
Two-Phase or Non-Return-to-Zero (NRZ) protocols
With Two-Phase protocols, only transitions (events) of a signal matter; not
even the initial state of the signals is important. Figure 2.18 depicts a
timing diagram of request and grant signals for the Two-Phase protocol.
Clients need to abide by the protocol rules to ensure correct arbitration.
The client cannot issue a new request before the grant signal for the current
request has been received. Likewise, the arbiter cannot issue a new grant
signal before a new request event is captured by the arbiter.
Figure 2.18: Two-Phase or Non-Return-to-Zero (NRZ) protocol
Four-Phase or Return-to-Zero (RTZ) protocols
Four-Phase protocols use level-based signaling. Figure 2.19 depicts a timing
diagram of request and grant signals for the Four-Phase protocol. Here, the
initial state of the signals is important. Whenever the request signal goes
from low to high, the arbitration process is triggered, and a grant signal is
issued. Later, when the request signal goes from high to low, it resets the
grant signal and completes the four-phase protocol. Again, client and arbiter
must abide by the protocol rules. Request (grant) must remain stable at low
or high until grant (request) has changed its state.
Figure 2.19: Four-Phase or Return-to-Zero (RTZ) protocol
In the protocols described above, it was assumed that the release of the
resource is communicated via a separate channel between the client and the
resource, which is not a part of the arbitration process. Sometimes arbiters
are designed in a way that the release signal goes through the arbiter. In
the case of Two-Phase signaling, this requires an additional wire between the
client and the arbiter carrying a signal called done, whereas with Four-Phase
signaling, a release is often associated with the request going to zero.
Fixed priority arbiter
As the name suggests, clients have a fixed priority, e.g., the first one has
the highest priority. Table 2.4 shows which grant signal is issued based on
the input request vector. The LSB of the request vector corresponds to the
client with the highest priority. Fixed priority arbiters, sometimes also
called simple priority arbiters, are most commonly used when there are only a
few clients.
Table 2.4: Fixed priority arbiter grant signals
requests grant
XXX1 0001
XX10 0010
X100 0100
1000 1000
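The behavior in Table 2.4 amounts to isolating the least significant set bit
of the request vector. A minimal Python sketch (the function name is
hypothetical) could look as follows:

```python
def fixed_priority_grant(requests):
    """Return a one-hot grant vector for the lowest-numbered requester.

    requests is an integer bit vector; bit 0 (LSB) has the highest priority.
    In two's complement, x & -x keeps only the least significant set bit.
    """
    return requests & -requests


for req in (0b0001, 0b0110, 0b1100, 0b1000):
    print(f"{req:04b} -> {fixed_priority_grant(req):04b}")
```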
Round Robin arbiter
The Round Robin arbiter solves some of the drawbacks of a fixed priority
arbiter. For example, with fixed priority arbiters, there is no way to
determine how long a lower-priority request will have to wait because of
higher-priority requests. Round Robin arbiters solve this fairness issue by
introducing a token which is passed among the clients. After each arbitration
cycle, the token is passed to the next client. Whichever client holds the
token has the highest priority at that moment. The longest time a client must
wait depends only on the number of clients.
There are two ways in which the token can be passed among clients.
Figure 2.20 depicts the process for both methods. Request signals are
identified by the letter 'R' and a number, whereas grant signals are
identified by the letter 'G' and a number. If a signal is asserted, its color
is green; otherwise, it is red. The dot corresponds to the current location
of the token. One method is to always pass the token cyclically. The other is
to pass it to the client next to the one that just received a grant signal.
At first glance, these two methods seem to be identical. This is true when
the system is busy and all requests are constantly present. However, things
change when there are only a few active requests at a time.
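The difference between the two token-passing methods can be reproduced with
a short Python sketch. It assumes that the arbiter grants the first
requesting client found when searching upwards from the token position; the
function names are hypothetical.

```python
def round_robin_grant(requests, token, n):
    """Return the index of the granted client, searching from the token."""
    for offset in range(n):
        idx = (token + offset) % n
        if (requests >> idx) & 1:
            return idx
    return None                      # no request pending


def simulate(requests, n, cycles, pass_after_grant):
    """Run several arbitration cycles with a constant request vector."""
    token, grants = 0, []
    for _ in range(cycles):
        granted = round_robin_grant(requests, token, n)
        grants.append(granted)
        if pass_after_grant and granted is not None:
            token = (granted + 1) % n    # method 2: next to the granted one
        else:
            token = (token + 1) % n      # method 1: always move by one
    return grants


# Only clients 0 and 1 of four are requesting.
print(simulate(0b0011, 4, 4, pass_after_grant=False))
print(simulate(0b0011, 4, 4, pass_after_grant=True))
```

In this example, purely cyclic passing grants client 0 three times out of
four, because the token spends two cycles at idle clients, whereas passing
the token next to the granted client alternates between the two requesters.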
First Come First Serve arbiter
First Come First Serve arbiter issues grant signal based on which request signal
arrived first.
Dynamic priority arbiter
Unlike in fixed priority arbiters, where client priorities are fixed during
execution, in dynamic priority arbiters the clients' priorities can change
during operation and are passed to the arbiter as an attribute. Therefore,
each client can have a higher priority at one point and a lower priority at
another. The question is how to resolve requests with the same priority
level, which can occur since priorities can change between two arbitration
cycles. The simplest way is to always issue a grant signal to the first
client in the group. This might not produce good results, since it introduces
additional unfairness. Therefore, two-stage arbiters are implemented, meaning
that a fair arbitration algorithm such as Round Robin is used to resolve
requests with the same priority level.
Figure 2.20: Passing Round Robin token
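A two-stage arbiter of this kind can be sketched in Python as follows. The
sketch assumes that a larger number means a higher priority and that the
second stage searches upwards from a Round Robin token; the names and the
representation of the requests are hypothetical.

```python
def two_stage_grant(priorities, token):
    """Grant the highest-priority request; break ties with Round Robin.

    priorities[i] is the priority attribute of client i, or None if
    client i is not requesting. Returns the granted index or None.
    """
    active = [p for p in priorities if p is not None]
    if not active:
        return None
    top = max(active)                        # stage 1: highest priority level
    n = len(priorities)
    for offset in range(n):                  # stage 2: Round Robin within it
        idx = (token + offset) % n
        if priorities[idx] == top:
            return idx
    return None


# Clients 0 and 2 share the highest priority; the token decides between them.
print(two_stage_grant([2, None, 2, 1], token=0))
print(two_stage_grant([2, None, 2, 1], token=1))
```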
3 Design Implementation
This chapter presents the SpW router architecture. It describes the implementa-
tion of specific components of the SpW router. It explains how RMAP commands
are decoded and RMAP replies encoded. It describes in detail how the control
logic works and presents several arbitration examples.
3.1 Design overview
The target device for the SpW router hardware implementation is Microsemi's
RTG4 Radiation-Tolerant FPGA. The target device determines the set of
development tools for firmware development. At the time of writing this
thesis, this was Microsemi's Libero SoC v12, which ships with Mentor Graphics
ModelSim for Hardware Description Language (HDL) simulations and Synopsys
Synplify for design synthesis [12].
Figure 3.1 depicts the proposed SpW router architecture, which employs the
wormhole switching technique. The architecture was determined based on the
requirements presented in section 1.2, the generic router depicted in
Figure 2.16, solutions presented in the field of NoCs [13, 14, 15, 16], and
the architectures of currently available devices [6, 7]. Most new research
focuses on improving NoC routers, as many high-speed SoC and multiprocessor
devices move away from traditional interconnect to a NoC architecture to
further improve their performance. However, not all features of NoC routers
can be used in a SpW router implementation. For example, routing algorithms
in NoC routers exploit the regular topology of a NoC (e.g., mesh networks),
whereas a SpW network usually has an irregular topology where routing is
performed with a routing table, as described in section 2.3.2. Nonetheless,
a SpW network is very similar to Myrinet, which is a high-speed LAN used in
multicomputer processing clusters for parallel computing [17]. It includes
support for high-performance wormhole switching in arbitrary LAN topologies.
Myrinet comprises host interfaces and switches connected by full-duplex
physical links with 9-bit-wide channels. The 9-bit character may be a data
byte or one of several control characters [3, p. 434] [17]. Based on these
similarities, Myrinet routers could prove to be a good reference for how
routers are designed for irregular-topology networks.
Figure 3.1: Proposed SpW router architecture
Looking back at the requirements in section 1.2 and proposed SpW router
architecture in Figure 3.1 it is obvious that several different components are
needed to satisfy all of the requirements. As was stated in section 1.2, one of the
requirements was that the SpW router could be configured during run-time. The
question is where to store configuration data and how to access it. If the amount
of information to be stored is small, one can use registers to store data in each
configurable component. This is how peripherals in microcontrollers, IP cores,
and different SoC devices are designed. To access those registers, components
need to be somehow connected. This is done with an internal interconnect bus
which was the first thing to be determined and implemented in the router design.
3.2 Internal interconnect
An internal interconnect bus, usually referred to simply as a bus, is a
structure of data, address, and control wires connecting different
components. To ensure the integrity of data and the correctness of write and
read operations, the interconnect is designed as a master-slave system in
which master devices drive the bus and issue bus operation commands, whereas
slave devices only listen on the bus and give appropriate responses. Where
multiple master devices are supported, an arbiter is needed to resolve which
master device gets access and for how long. Many different SoC internal
interconnect bus architectures exist. Three of the most popular are ARM's
Advanced Microcontroller Bus Architecture (AMBA), IBM's CoreConnect
architecture, and the Wishbone architecture, which is mainly used in
open-source IP core designs.
The AMBA architecture defines three bus architectures: Advanced eXtensible
Interface (AXI), Advanced High-performance Bus (AHB), and Advanced Peripheral
Bus (APB), each designed for a specific set of applications. AXI is targeted
at high-performance, high-clock-frequency system designs and includes
features that make it suitable for high-speed sub-micrometer interconnect.
It has independent data buses for read and write operations. It supports
burst operations, pipelined operations, multi-master operations, and
out-of-order operations, meaning that outstanding bus operations need not be
completed in the order of their arrival. The data bus can, therefore, carry a
mix of slow and fast transfers, where reordering of data occurs only among
multiple masters and multiple transfers of the same master, but not within a
burst [18].
AHB is a bus interface suitable for high-performance synthesizable designs.
It defines the interface between components, such as masters, interconnects, and
slaves. It can be configured to support a substantial bus width. It supports split
transfer operations along the standard pipeline and burst transfer operations of
a high-performance bus [19].
APB is a bus architecture used for connecting slow peripherals. It has a
straightforward single-master design; in larger systems, the master may be a
bus bridge to AXI or AHB. It supports only simple read and write operations
without any advanced features like burst mode or pipelined operations [4].
A block diagram of a microcontroller featuring ARM Cortex-M4 processor
depicted in [20, p. 20] is an example how AMBA architecture may be used. For
the high-performance bus AHB is used and APB for the peripheral bus.
The CoreConnect architecture, similar to AMBA, provides a high-performance
bus and a slower bus for connecting low-bandwidth peripherals. The Processor
Local Bus (PLB) is a high-performance, low-latency interconnect with the
design flexibility needed in a highly integrated SoC. It has separate read
and write buses. It supports both pipelined and burst transfers. The On-chip
Peripheral Bus (OPB) is a secondary bus architected to alleviate system
performance bottlenecks by reducing capacitive loading on the PLB. In
contrast to APB, it supports up to 4 master devices [21] even though it is a
peripheral bus.
A comparison of the three bus architecture families is summarized in
Table 3.1.
In the end, APB was picked as the internal interconnect of the SpW router.
There were three reasons for this choice. First, Libero comes with AMBA IP cores
included. In particular, the APB core lets the user select up to 16 slave
devices, splitting the address range into 16 equal parts, with a configurable
address space based on the address width [22]. Second, since each configurable
component is
planned to have only a few registers, there is no need for a high-performance bus.
Moreover, registers are usually designed to match the data width of the bus, so
only one read or write operation is needed for accessing a register. Third, it is
simple to implement, as it has a small number of control signals. All the signals
of the APB bus are listed in Table 3.2.
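The equal-slot address decoding mentioned above can be illustrated with a short
Python model. This is only a sketch under stated assumptions: it assumes that the
upper four address bits select one of the 16 slots, and the function names and
the 16-bit default address width are hypothetical, not taken from the Libero IP
core documentation.

```python
NUM_SLOTS = 16  # up to 16 slave devices, as stated in the text


def decode_slot(addr: int, addr_width: int = 16) -> tuple[int, int]:
    """Split a bus address into (slot index, offset inside the slot).

    Assumption: the address range is divided into 16 equal slots, so the
    upper 4 address bits pick the slot and the remaining bits are the offset.
    """
    if not 0 <= addr < (1 << addr_width):
        raise ValueError("address outside the configured range")
    offset_bits = addr_width - 4
    return addr >> offset_bits, addr & ((1 << offset_bits) - 1)


def psel_vector(addr: int, addr_width: int = 16) -> int:
    """Return a one-hot PSELx vector: exactly one slave is selected."""
    slot, _ = decode_slot(addr, addr_width)
    return 1 << slot
```

With a 16-bit address, each slot spans 4096 addresses, so address 0x3004 falls
into slot 3 at offset 4.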
Table 3.1: Bus architecture comparison
Family                     CoreConnect               AMBA                                   Wishbone
Bus                        OPB          PLB          APB          AHB          AXI          Wishbone
Address                    32/64 bit    32/64 bit    32 bit       32 bit       32 bit       1-64 bit
Bus width                  32/64 bit    32-256 bit   8-32 bit     8-1024 bit   8-1024 bit   8/16/32/64 bit
# Masters                  4            16           1            16           n/a          n/a
Bursts                     yes          var. length  no           var. length  var. length  yes
Pipelining                 no           yes          no           yes          yes          no
Separate rd/wr data buses  no           yes          no           yes          yes          n/a
Split transfer             no           yes          no           yes          yes          no
Out-of-order               no           no           no           no           yes          no
Table 3.2: APB signals description [4]
Signal    Source              Description
PADDR     APB bridge/master   Address. This is the APB address bus. It can be up to 32 bits wide and is driven by the peripheral bus bridge/master unit.
PSELx     APB bridge/master   Select. The APB bridge/master unit generates this signal to each peripheral bus slave. It indicates that the slave device is selected and that a data transfer is required. There is a PSELx signal for each slave.
PENABLE   APB bridge/master   Enable. This signal indicates the second and subsequent cycles of an APB transfer.
PWRITE    APB bridge/master   Direction. This signal indicates an APB write access when HIGH and an APB read access when LOW.
PWDATA    APB bridge/master   Write data. This bus is driven by the peripheral bus bridge/master unit during write cycles when PWRITE is HIGH. This bus can be up to 32 bits wide.
PREADY    Slave interface     Ready. The slave uses this signal to extend an APB transfer.
PRDATA    Slave interface     Read data. The selected slave drives this bus during read cycles when PWRITE is LOW. This bus can be up to 32 bits wide.
PSLVERR   Slave interface     This signal indicates a transfer failure. APB peripherals are not required to support the PSLVERR pin. This is true for both existing and new APB peripheral designs. Where a peripheral does not include this pin, the appropriate input to the APB bridge/master is tied LOW.
Because of its simple design, the APB bus is easy to drive. The bus
is driven by a three-state machine with the states IDLE, SETUP, and ACCESS, as
depicted in Figure 3.2. Each operation on the APB bus takes at least two clock
cycles. The bus stays in the default IDLE state until a transfer operation is
triggered by the master (transition 1). When the transfer is triggered, the bus
moves to the SETUP state (transition 2), at which point the slave select signal
PSELx is asserted. The bus remains in the SETUP state for only one clock cycle
and moves to the ACCESS state at the next rising edge (transition 3). In the
ACCESS state, PENABLE is asserted. The address, write, select, and write data
signals must remain stable during the transition between SETUP and ACCESS. Exit
from the ACCESS state is controlled by the PREADY signal from the slave. If it
is held LOW, the bus stays in the ACCESS state (transition 4). If it is set to
HIGH, the bus exits the ACCESS state and moves to the IDLE state (transition 5).
Alternatively, if there is another pending transfer request, it moves directly
to the SETUP state (transition 6).
Figure 3.2: APB state machine diagram [4]
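The state transitions described above can be summarized in a small behavioral
model. The following Python sketch is an illustration only and not the VHDL
used in this thesis; the names ApbState, next_state, and run_transfer are
hypothetical, and the transition numbers in the comments refer to the
description above.

```python
from enum import Enum


class ApbState(Enum):
    IDLE = 0
    SETUP = 1
    ACCESS = 2


def next_state(state: ApbState, transfer: bool, pready: bool) -> ApbState:
    """Next state of the APB master FSM at a rising clock edge."""
    if state is ApbState.IDLE:
        # transition 2 when a transfer is requested, otherwise transition 1
        return ApbState.SETUP if transfer else ApbState.IDLE
    if state is ApbState.SETUP:
        return ApbState.ACCESS  # transition 3, always after one cycle
    if not pready:
        return ApbState.ACCESS  # transition 4, slave inserts a wait state
    # transition 6 if another transfer is pending, otherwise transition 5
    return ApbState.SETUP if transfer else ApbState.IDLE


def run_transfer(wait_states: int) -> list[ApbState]:
    """States visited by a single transfer (the surrounding IDLE excluded)."""
    trace = []
    remaining = wait_states
    state = next_state(ApbState.IDLE, transfer=True, pready=False)
    while state is not ApbState.IDLE:
        trace.append(state)
        pready = state is ApbState.ACCESS and remaining == 0
        if state is ApbState.ACCESS and remaining > 0:
            remaining -= 1
        state = next_state(state, transfer=False, pready=pready)
    return trace
```

A transfer without wait states occupies one SETUP and one ACCESS cycle; every
wait state requested by the slave adds one more ACCESS cycle.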
3.3 Registers
As mentioned before, registers store configuration, control, and
status data of certain components of the SpW router. They are memory-mapped,
as this is common practice when designing peripherals for SoC devices, e.g.,
memory-mapped registers in a microcontroller [20, p. 74-78][23, p. 1018]. Memory
mapping allows master devices to access the control data of a component
as if it were stored in memory. More importantly, registers must be
memory-mapped to be accessible with the RMAP protocol, as stated in section
2.2. The register width was set to 32 bits in order to have enough space for
multiple fields of similar data. Consequently, the APB data bus width was set to
match the width of the registers; thus, only one read or write operation is
needed to access a register.
Traditionally, registers are implemented as a set of D flip-flops (Dff), each
flip-flop storing one bit of information; e.g., a 32-bit register needs 32 Dffs.
However, the SpW router is meant to be used in space applications, where
ordinary flip-flop registers are susceptible to radiation events, in particular
to Single Event Upsets (SEUs), which are common in memories and registers. An
SEU causes a bit-flip, which changes the stored information. Therefore, instead
of using the traditional register description, which has no data recovery
mechanism, SRAM cells with Error Correction Code (ECC) support were used as
registers. The ECC logic circuitry provides a mechanism for single error
correction and double error detection and, when enabled, raises flags to
indicate single-bit-correct and double-bit-detect events [24]. To infer SRAM
cells from a VHDL component description, one must follow the recommendations
provided in [25].
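To illustrate what single error correction and double error detection mean in
practice, the following Python sketch implements a textbook extended Hamming
code for a 4-bit value. It is a generic illustration only: the code actually
used by the RTG4 ECC circuitry is not described here, and the function names
and status strings are hypothetical.

```python
from typing import Optional


def secded_encode(nibble: int) -> list[int]:
    """Encode 4 data bits into an 8-bit extended Hamming codeword.

    Index 0 holds the overall parity bit; indices 1..7 hold a Hamming(7,4)
    code with parity bits at positions 1, 2 and 4.
    """
    c = [0] * 8
    c[3], c[5], c[6], c[7] = [(nibble >> i) & 1 for i in range(4)]
    c[1] = c[3] ^ c[5] ^ c[7]
    c[2] = c[3] ^ c[6] ^ c[7]
    c[4] = c[5] ^ c[6] ^ c[7]
    c[0] = sum(c[1:]) % 2
    return c


def secded_decode(codeword: list[int]) -> tuple[Optional[int], str]:
    """Return (data, status); status is 'ok', 'corrected' or 'double_error'."""
    c = list(codeword)
    syndrome = 0
    for position in range(1, 8):
        if c[position]:
            syndrome ^= position      # XOR of the positions of all set bits
    parity_error = sum(c) % 2 == 1    # overall parity over all 8 bits
    if parity_error:
        c[syndrome] ^= 1              # single error; syndrome 0 is bit 0 itself
        status = "corrected"
    elif syndrome != 0:
        return None, "double_error"   # parity fine but syndrome set: two flips
    else:
        status = "ok"
    data = c[3] | (c[5] << 1) | (c[6] << 2) | (c[7] << 3)
    return data, status
```

The two status values other than "ok" play the role of the single-bit-correct
and double-bit-detect flags mentioned above.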
Inferring SRAM cells might be overkill for a register implementation on an
RTG4, which has SEU-hardened registers [26]. However, one could argue that
control and configuration data is very delicate, as it is used to control and
configure the SpW router operation. Besides, this way the implementation is
compatible with other FPGAs as long as they feature SRAM cells with some ECC
logic circuit.
Although inferring SRAM cells is not hard, writing data to them is not as
trivial as writing data to a registered signal. A dedicated Direct Memory
Access (DMA) controller had to be implemented so that components can write data
into registers, in particular, status registers. The transfer operations allowed
for a register depend on its purpose. Seen from the component side, control and
configuration registers are read-only, whereas both read and write operations
are possible from the bus side. Status registers are write-only from the
component side, and only read operations are allowed from the bus side.
The inferred SRAM memory has two ports: a write port for writing data and
a read port for reading data. The write port has three input signals: write
enable (WEN), write address (WADDR), and write data (WD). The read port has
two input signals, read enable (REN) and read address (RADDR), and one output
signal, read data (RD). In order to access this memory block via the APB
bus, a wrapper had to be written to convert bus signals to DMA signals. The
implementation of the wrapper was derived from the example provided in [27].
The implemented state machine shown in Figure 3.3 has three additional states
compared to that example. Two additional states enable support for read and
write transfers with wait states [4], and the third state is needed for read
operations when the output pipeline register is enabled, as one needs to wait
for one more clock cycle before data is available.
The APB slave wrapper state machine stays in the default IDLE state until the
slave is selected by the master, which has to assert the correct PSEL signal
(transition 1). The state machine moves to the next state only when the
component is not accessing the registers itself. From here, there are three
possibilities. First, if a read transfer is requested (PWRITE is LOW) and read
transfers with wait states are enabled, the FSM moves to the WAIT RD state
(transition 2), where it waits (transition 3) for a specified number of clock
cycles before moving to the next state S0 (transition 4). Second, if a write
transfer is requested (PWRITE is HIGH) and write transfers with wait states are
enabled, it moves to the WAIT WR state (transition 5), where it waits
(transition 6) for a specified number of clock cycles before it moves to the
next state S0 (transition 7). Third, if wait states are disabled for the
requested transfer operation, the state machine moves directly to the S0 state
(transition 8). The state machine remains in the S0 state for one clock cycle
and moves to the next state at the next rising edge.
When PWRITE is HIGH, it moves back to the IDLE state (transition 9), thus
Figure 3.3: APB slave wrapper state machine for SRAM
completing the write operation. When PWRITE is LOW, the next state depends on
the output pipeline register. If a pipeline register is enabled, the state
machine moves to the RD0 state (transition 10) and then, at the next rising
edge, to the RD1 state (transition 11). If there is no pipeline register, it
moves to the RD1 state directly (transition 12). From the RD1 state, it moves
back to the IDLE state at the next rising edge (transition 13), thus
completing the read operation.
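The sequence of wrapper states for one transfer can be written down compactly.
The Python sketch below only restates the transitions described above as a
state trace; it assumes that the component is not accessing the registers and
that the wait state is held for exactly the configured number of clock cycles.
The function name and the underscore spelling of the state names are
illustrative.

```python
def wrapper_trace(pwrite: bool, wait_cycles: int = 0,
                  pipeline: bool = False) -> list[str]:
    """States visited by the APB slave wrapper for one selected transfer."""
    trace = ["IDLE"]
    if wait_cycles > 0:
        # transitions 2-4 (read) or 5-7 (write): hold the wait state
        trace += ["WAIT_WR" if pwrite else "WAIT_RD"] * wait_cycles
    trace.append("S0")             # transition 8 when wait states are disabled
    if not pwrite:
        if pipeline:
            trace.append("RD0")    # extra cycle for the output pipeline register
        trace.append("RD1")
    trace.append("IDLE")           # transition 9 (write) or 13 (read)
    return trace
```

For example, a read with the pipeline register enabled and no wait states
visits IDLE, S0, RD0, RD1, and IDLE again, whereas a write visits only IDLE,
S0, and IDLE.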
With the APB slave wrapper written, all components needed for a simple test
setup are complete. The setup tests the behavior of the APB interconnect, the
APB master, the SRAM memory, and the APB slave wrapper. A block diagram of the
components instantiated in the Libero SmartDesign canvas is depicted in Figure
3.4. The Sysreset component generates the system reset signal. The RCOSC with
the RTG4CCC Clock Conditioning Circuit (CCC) is used to generate a 50 MHz
system clock. The CoreAPB3 component represents the APB interconnect
with address decoding included. It is configured for only one slave device. The
RTG4 SRAM user-defined SmartDesign component is a top-level entity for the
APB slave wrapper and the inferred SRAM memory. The APB master drives the
APB bus and performs predefined write and read operations.
Figure 3.4: Libero SmartDesign block diagram of SRAM as APB slave
Figure G.1 in Appendix G shows simulation results for the circuit described
above. Write and read transfer operations without additional wait states
were simulated. The master waits in the IDLE state (i apb state signal, mark 1)
for the transfer initiation (i transfer signal, mark 2). When triggered, it
performs four consecutive write operations (mark 3) and then four consecutive
read operations to verify whether the data was successfully written into the
SRAM (mark 5). At the end, the done flag is asserted to indicate the end of the
test (mark 7), at which point the bus goes back to the IDLE state (mark 8).
Mark 4 indicates the point at which the transition from write to read
operations occurs. Mark 6 indicates the asserted valid data signal, which tells
the master that data on the PRDATA bus is valid. The APB slave wrapper state is
indicated by the i slave state signal.
Figure G.2 in Appendix G shows simulation results for the same transfer
operations. However, in this simulation run, the slave device waits for four
clock cycles before executing the transfer operation. The wait states are marked
in blue. For convenience, all other key events are marked the same way as
in the previous example.
Since SRAM is a volatile type of memory, its content is lost whenever the
device loses power. Therefore, to have a defined default state after power-up or
after a reset, an initializer circuit, later referred to as the initializer, was
designed to perform the initialization of the SRAM registers. Default
(initialization) data is stored in the RTG4 Read Only Memory (ROM), as it
retains data even on power loss. A binary file is used to configure the content
of the ROM. The ROM file can be updated after synthesis and place and route but
before the Generate FPGA Array Data step in the Libero software, which can save
a lot of time. The initializer reads data from the ROM and writes it to the
appropriate register in the next stage of the initialization cycle. The
initializer is composed of three state machines. The first one controls the
APB bus, since the initializer acts as a master device, and determines whether
a read or a write operation is performed. The second one is responsible for
incrementing the current address. The third one sets the current initialization
phase: it starts with the routing table initialization, then moves to the
configuration port initialization, and finally to the SpW router ports
initialization.
Figure 3.5 depicts the state machine that controls the APB bus during the
initialization process. Compared to the one depicted in Figure 3.2, it has an
additional transition between the ACCESS and IDLE states, which is taken when
the top address of the component currently being initialized is reached
(transition 7). This ensures that the APB bus is in the IDLE state between two
initialization phases and at the end of the whole initialization process. Also,
on a transition between the ACCESS and SETUP states, it toggles between read
and write transfer operations (shown in red). Other than that, the functionality
is the same as described before.
Figure 3.6 depicts the state machine that increments the address after each
initialization cycle, where an initialization cycle consists of a ROM read
operation followed by a register write operation. Write operations are done with
the same address as read operations, since data in the ROM is stored as if it
were in registers. One has to apply the correct address offset to the ROM
address in
Figure 3.5: Initializer APB state machine diagram
each initialization phase. This FSM has three states: IDLE, RD (read), and
WR (write). After a reset, the state machine is in the default IDLE state. When
a read operation is scheduled and the same transfer signal as for the APB state
machine is asserted, the FSM moves to the next state RD (transition 2). It stays
in this state (transition 3) until a write operation is scheduled. Then it moves
to the WR state (transition 5), where it waits for another read operation
(transition 6). When a new read operation is scheduled, the state machine moves
back to the RD state (transition 7) and the address on the bus is incremented.
The FSM switches between the RD and WR states until the address reaches the top
address in the address range of the component currently being initialized. Then
it moves back to the IDLE state (transition 8) and, during the transition,
asserts a signal to indicate the end of the current initialization phase.
Figure 3.7 depicts the state machine which determines the current
initialization phase. The initialization process starts with initializing the
routing table in the router (transition 1). This is indicated by the RT init
state, where the address is configured to point to the routing table and the
correct section in the ROM. When the routing table initialization is done, the
state machine moves to the first wait state WAIT 0 (transition 2) and waits for
a few clock cycles (transition 3). This was implemented mainly for simulation
purposes so that the transition between
Figure 3.6: Initializer transfer operation state machine diagram
two initialization phases can be easily observed. The next initialization phase
is represented by the Config Port init state. During this phase, as the
name suggests, the configuration port registers are initialized. Again, the
address is configured to point to the configuration port registers and the
correct section in the ROM. When the configuration port initialization is
completed, the state machine moves to the second wait state WAIT 1 (transition
6), where it waits for a few clock cycles (transition 7) before moving to the
last initialization phase (transition 8). In the last state, the SpW router port
registers are initialized. When the last initialization phase completes, a
signal is asserted to indicate the end of the initialization process. A more
detailed view of the described state machine diagrams, with all the signal names
that were used in the VHDL implementation, can be found in Appendix A.
Figure 3.7: Initializer initialization phases state machine diagram
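The interplay of the three initializer state machines amounts to a nested loop
over phases and addresses. The following Python sketch models this behavior at
a high level. It is an illustration only: the base addresses, word counts, and
ROM offsets in PHASES are hypothetical values, not the memory map used in this
thesis.

```python
# Hypothetical map: (phase name, register base address, word count, ROM offset)
PHASES = [
    ("routing_table", 0x000, 4, 0x00),
    ("config_port", 0x400, 2, 0x10),
    ("router_ports", 0x800, 3, 0x20),
]


def run_initializer(rom: list[int]) -> tuple[dict[int, int], list[str]]:
    """Copy default values from the ROM into the registers, phase by phase.

    Returns the register contents and the order in which phases finished.
    """
    registers: dict[int, int] = {}
    finished: list[str] = []
    for name, base, words, rom_offset in PHASES:
        for index in range(words):
            data = rom[rom_offset + index]   # RD state: read one word from ROM
            registers[base + index] = data   # WR state: write it to the register
        finished.append(name)                # end-of-phase signal, back to IDLE
    return registers, finished
```

Each pass of the inner loop corresponds to one initialization cycle (a ROM read
followed by a register write), and each pass of the outer loop to one
initialization phase.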
3.4 Configuration port
A solution for implementing registers and how to access them was presented
in previous paragraphs. At this point, registers or SRAM memory blocks are
accessible only internally from a master device connected to the internal inter-
connect. However, in order for the SpW router to be remotely configurable, an
interface between SpW and the internal interconnect is needed. This is achieved
by implementing a component called a configuration port [2, clause 5.6.9, p. 105]
which is capable of decoding and encoding RMAP packets and translating de-
coded commands into bus transfer operations as was stated in the requirements
in section 1.2.
The core of the configuration port implementation was taken from the
open-source SpW router IP core from Shimafuji Electric Inc. [28]. The design was
modified to be compatible with STAR-Dundee's SpW CODEC IP. The decoding
error handling was changed to report the first error that occurs in the decoding
process. Lastly, the CRC is calculated on the fly, as suggested in the standard
[11, Annex A] for VHDL implementations, instead of using the lookup table
approach of the open-source IP core.
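The on-the-fly calculation updates the CRC once per received byte instead of
indexing a precomputed table. A minimal Python sketch of such a bitwise update
is given below. It assumes the RMAP CRC-8 parameters (polynomial
x^8 + x^2 + x + 1, least significant bit processed first, initial value 0); the
function names are hypothetical, and the sketch is not a transcription of the
VHDL used in the configuration port.

```python
def rmap_crc_update(crc: int, byte: int) -> int:
    """Update an 8-bit CRC with one data byte, one bit at a time.

    0xE0 is the bit-reversed form of the polynomial x^8 + x^2 + x + 1,
    which is what a least-significant-bit-first shift register needs.
    """
    crc ^= byte & 0xFF
    for _ in range(8):
        if crc & 1:
            crc = (crc >> 1) ^ 0xE0
        else:
            crc >>= 1
    return crc


def rmap_crc(data: bytes) -> int:
    """CRC over a whole byte sequence, starting from zero."""
    crc = 0
    for byte in data:
        crc = rmap_crc_update(crc, byte)
    return crc
```

A useful property for the receiver is that running the update over the data
followed by its own CRC byte yields zero, so a non-zero result after the last
byte indicates a corrupted header or data field.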
Figure 3.8 shows a block diagram of the configuration port, which is composed
of several components. The configuration register holds the configuration
port's logical address and an 8-bit key which is used to authenticate
RMAP commands, as described in section 2.2.2. The APB slave wrapper is used as
an interface bridge between the register and the APB bus. The key component of
the configuration port is the RMAP decoder, which is responsible for command
decoding and reply encoding. It is composed of six state machines: one for
RMAP packet decoding, one for receiving SpW data characters and calculating
the CRC, one for RMAP packet encoding, one for transmitting SpW characters
and calculating the CRC, one for driving the APB bus, and one for determining
the bus operation to be executed. The control logic interface represents a group
of signals connecting the configuration port to the control logic, e.g., request
and grant signals. The SpaceWire interface represents a group of signals for
interfacing with the SpW router ports, e.g., data in and data out signals.
Figure 3.8: Configuration port block diagram
Before one can start decoding an RMAP packet, one needs to obtain data
characters from the SpW router port that is receiving the RMAP packet. This
is done by the receive data/calculate CRC state machine depicted in Figure 3.9.
The FSM stays in the default IDLE state (transition 1) if there is no active
link between a SpW router port and the configuration port. After the
initialization process has completed and if the link is active, the state
machine moves to the next state (transition 2): a data character is popped from
the receiver FIFO in the receiving SpW router port, and a data ready flag is
asserted. The data ready flag is used to control the RMAP command decoder state
machine. When the receive data/calculate CRC state machine is in the READ 0
state, it moves back (transition 3) to the IDLE state on the next rising edge.
During the transition, the CRC is calculated, and the data ready flag is
de-asserted. The process continues until all characters of the packet are
processed.
Figure 3.9: RMAP decoder - receive data/calculate CRC state machine
Meanwhile, the RMAP command decoder state machine decodes each received
data character. It is enabled only when the data ready flag is asserted. The
state machine was derived from all possible command fields present in an RMAP
command, where each field shown in Figures 2.10, 2.12, and 2.14 represents its
own state in the FSM. The only exception is the Target SpW Address, since it is
deleted before the RMAP packet arrives at the RMAP decoder. The RMAP command
decoder state machine determines the RMAP command to be executed, issues a
request for a bus operation, and produces an error code when an error occurs.
The internal bus access state machine depicted in Figure 3.10 handles requests
from the RMAP command decoder state machine. Based on the decoded command,
it executes the proper bus operation. The state machine waits in the default
IDLE state (transition 1) until bus access is requested. When bus access is
requested and the RMAP command to be executed is a read or read-modify-write
command, the state machine moves to the READ 0 state (transition 2). Here it
waits (transition 3) for a valid data signal. In case of a read command, it then
moves back to the IDLE state (transition 4), thus completing the read command.
In case of a read-modify-write command, the state machine moves from the READ 0
state to the READ 1 state (transition 5), storing the read data in a buffer. At
the next rising edge, it moves to the WRITE 0 state (transition 6), modifying
the data and writing it back to the same location. The state machine waits in
the WRITE 0 state (transition 8) until an acknowledge is received and then moves
back to the IDLE state (transition 9), thus completing the read-modify-write
command. When the state machine is in the default IDLE state, bus access is
requested, and the command to be executed is a write command, the state machine
moves to the WRITE 0 state (transition 7). It waits in the WRITE 0 state
(transition 8) until an acknowledge is received and then moves back to the IDLE
state (transition 9), thus completing the write command. This state machine
controls the APB state machine which was described in section 3.3.
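The three command types handled by the bus access state machine can be sketched
as follows. The Python model below is an illustration under stated assumptions:
the memory is a plain dictionary, wait cycles are omitted from the trace, and
the modify step uses the mask rule (mask AND data) OR (NOT mask AND read data),
which is the usual RMAP read-modify-write rule and is assumed here rather than
taken from the text above. The function name and command strings are
hypothetical.

```python
def bus_access(memory: dict[int, int], command: str, addr: int,
               data: int = 0, mask: int = 0) -> tuple[int, list[str]]:
    """Execute one decoded command; return (read value, visited states)."""
    trace = ["IDLE"]
    value = 0
    if command in ("read", "rmw"):
        trace.append("READ_0")                 # wait for valid read data
        value = memory[addr]
        if command == "rmw":
            trace.append("READ_1")             # buffer the value that was read
            new_value = (mask & data) | (~mask & value)
            trace.append("WRITE_0")            # write the modified word back
            memory[addr] = new_value & 0xFFFFFFFF
    elif command == "write":
        trace.append("WRITE_0")                # wait for the acknowledge
        memory[addr] = data & 0xFFFFFFFF
    else:
        raise ValueError(f"unknown command: {command}")
    trace.append("IDLE")
    return value, trace
```

For a read-modify-write command, only the bits selected by the mask are taken
from the command data; all other bits keep the value that was read.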
When an RMAP command completes, a reply packet needs to be constructed
if the acknowledge bit was set in the instruction field of the command (always
true for read and read-modify-write commands). This is the responsibility of
the RMAP reply encoder state machine. The state machine was derived from
all possible reply fields present in an RMAP reply, where each field shown in
Figures 2.11, 2.13, and 2.15 represents its own state in the FSM. The state
machine waits in the current state until the transmit CRC is calculated.
The transmission of a reply packet is executed by the transmit data/calculate
CRC state machine depicted in Figure 3.11. The FSM stays in the default IDLE
state (transition 1) if there is no available reply data and the SpW (path)
address of the reply was not set, meaning that only the logical address is used
for routing the packet back to its source. If a SpW address is set for the reply
packet, it is the first thing to be transmitted. The state machine moves to the
WRITE 1 state (transition 2), de-asserting the CRC byte calculated flag, where
it waits
Figure 3.10: RMAP decoder - bus access state machine
until the SpW router port transmitting the packet is ready to accept a new data
character (transition 3). Once the transmitting SpW router port can receive a
new data character, the state machine moves back to the IDLE state (transition 4)
asserting the CRC byte calculated flag, which is used to control the progression
of the RMAP reply encoder FSM. This process repeats until the whole SpW
address has been transmitted. From the beginning or after the SpW address
was transmitted, the state machine waits in the IDLE state until reply data is
ready (transition 1). When the reply data is ready, the state machine moves to
the WRITE 0 state (transition 5), de-asserting the CRC byte calculated flag, where
it waits until the SpW router port transmitting the packet is ready to accept a
new data character (transition 6). Once the transmitting SpW router port can
receive a new data character the state machine moves back to the IDLE state
(transition 7) calculating the CRC and asserting the CRC byte calculated flag.
This process repeats until the whole reply packet has been transmitted. Reply
data can never be transmitted before the SpW address, since character
queuing is done by the RMAP reply encoder state machine, in which the SpW
address field is always encoded before the other reply fields. A more detailed
view of the described state machine diagrams, with all the signal names that
were used in the VHDL implementation, can be found in Appendix A.
Figure 3.11: RMAP decoder - transmit data/calculate CRC
With the configuration port completed, internal memory blocks can be accessed
remotely via SpW. Figure 3.12 depicts a block diagram in the Libero
SmartDesign canvas of an RMAP target device. The spw rtg4 port1 dev top 0
component represents the SpW CODEC IP, which acts as an interface between the
RMAP decoder component and a SpW link. The APB interconnect, status reg,
Routing Table, control reg, and RTG4 uPROM 0 represent the internal interconnect
with a few slave devices. The APB maste sw component is a multiplexer
which selects one of the master devices to be connected to the APB bus, since
only one master device can be present on the APB bus. During the initialization
process, the Initializer component is the active master device, whereas during
normal operation the active master device is the RMAP decoder. Which of
the two is the active master device is controlled by the Init done signal from
the Initializer. The RMAP target device is capable of receiving RMAP packets via
SpW. It can decode packets and send replies as defined in [11, clause 5.7.1.3].
It was used to test the RMAP decoder component.
Figure 3.12: Libero SmartDesign block diagram of RMAP target device
3.5 Routing Table
A routing table is needed to store the routing information for each logical address
in order to successfully route SpW packets with a logical address. Each routing
table entry is represented by a 32-bit value where each bit location corresponds
to a physical port of the SpW router with the same number as a bit position.
A 32-bit value was chosen because the SpW standard allows up to 31 physical
ports, and it matches the bus data width. Port numbers are not binary encoded
because multiple output ports can be assigned to one logical address in case of
group adaptive routing. Since there are 256 possible values for a SpW address, the
routing table depth is set to match this number, thus creating a 256x32-bit large
memory space. However, the first 32 entries are not used since they represent the
path address values. At the time the idea was to directly use the logical address
of an incoming packet as an address to the routing table without performing any
address decoding.
One of the requirements stated in section 1.2 was to include support
for multicast packets, meaning that a packet from one source is sent through
multiple output ports. To achieve this, another field was added to the routing
table entry. This field represents a so-called multicast set, where again each
bit position corresponds to a physical port of the SpW router, except for bit 0.
Since it does not make sense to send a multicast packet to port 0, which is the
configuration port, this bit is instead used to enable or disable multicast for
a particular logical address.
Another requirement was that the SpW router is capable of routing packets
based on their priority. Therefore, a third (control) field was added to the routing
table entry which contains an 8-bit binary encoded priority value and a few control
flags.
With the multicast set and control fields, the routing table size increased to
256x96 bits. Even though inferred memory blocks can be of arbitrary size (with
some upper limit), the design synthesis did not produce optimal results: the
synthesis tool did not handle the proposed aspect ratio of the memory block
well. Therefore, instead of creating one big memory block, the routing table was
split into smaller memory banks of size 256x32 bits, one bank for each field in
the routing table entry. If an additional field is needed in the routing table
entry, one can simply add another memory bank.
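How the three banks combine into one routing table entry can be shown with a
small lookup model. The Python sketch below is an illustration only: the bank
names, the function names, and the placement of the 8-bit priority in the low
byte of the control field are assumptions, since the exact layout of the control
field is not given here.

```python
DEPTH = 256

# One 256x32-bit bank per routing table field (names are hypothetical).
port_bank = [0] * DEPTH
multicast_bank = [0] * DEPTH
control_bank = [0] * DEPTH


def ports_from_bitmap(bitmap: int) -> list[int]:
    """Bit n set means physical port n is part of the set."""
    return [port for port in range(32) if (bitmap >> port) & 1]


def lookup(logical_address: int) -> dict:
    """Read the full 96-bit entry and split it into its fields."""
    if not 32 <= logical_address <= 255:
        raise ValueError("values 0..31 are path addresses, not logical ones")
    multicast_word = multicast_bank[logical_address]
    return {
        "ports": ports_from_bitmap(port_bank[logical_address]),
        "multicast_enabled": bool(multicast_word & 1),   # bit 0 is the enable
        "multicast_ports": ports_from_bitmap(multicast_word & ~1),
        "priority": control_bank[logical_address] & 0xFF,  # assumed low byte
    }
```

Because the logical address indexes all banks directly, adding a field only
requires one more bank and one more key in the returned entry.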
Since the routing table needs to be accessed from the APB bus and SpW
router ports, a multiplexer was added to the DMA signal lines. The default
connection is to the SpW router ports.
As with the other SRAM memory blocks, an APB slave wrapper was added as
an interface between the DMA and APB bus signals. The read operation on
the routing table is slightly different from the one performed for registers.
The reason is that the entire routing table entry is read (96 bits, the width
of the RD signal); therefore, some additional address decoding is necessary. The
whole routing table entry is read because all fields are used at the same time
by the SpW router port.
Another thing to consider was that all SpW router ports need to access the
routing table. Therefore, a routing table arbiter was implemented to resolve con-
tention for a shared resource. A round-robin arbitration scheme is used because,
at this point, the priority of the packet is not known yet since this information
is stored in the routing table. Therefore, the best option is to implement a fair
arbitration scheme. The implementation of the round-robin arbiter is presented
in section 3.8 since it is also used in the control logic.
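For orientation, a round-robin scheme can be sketched in a few lines of Python.
This is a generic textbook model, not the arbiter implemented in section 3.8;
the class and method names are hypothetical.

```python
from typing import Optional


class RoundRobinArbiter:
    """Grant one requester per call, rotating the starting point fairly."""

    def __init__(self, num_ports: int) -> None:
        self.num_ports = num_ports
        self.last = num_ports - 1      # so that port 0 is checked first

    def grant(self, requests: list[bool]) -> Optional[int]:
        """Return the granted port index, or None if nobody is requesting."""
        for step in range(1, self.num_ports + 1):
            port = (self.last + step) % self.num_ports
            if requests[port]:
                self.last = port       # next search starts after this port
                return port
        return None
```

Since the search always starts right after the most recently granted port, no
requesting port can be starved, which is the fairness property required while
packet priorities are still unknown.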
3.6 SpW router ports
With all the control and configuration aspects of the SpW router sorted out,
the focus shifted to designing components that enable the processing of
incoming packets. In order to receive and transmit packets from the
router, a SpW interface is needed. Therefore, each physical SpW router port
includes a SpW CODEC IP from STAR-Dundee and a packet processor on the
receiver side to determine the packet destination, to communicate with the
control logic, and to control the flow of packets.
Figure 3.13 represents a block diagram of a SpW router port. As mentioned
each SpW router port includes a SpW CODEC IP and a packet processor. Not
Figure 3.13: SpW router port block diagram
much had to be done with the SpW CODEC IP since it came with an example for
RTG4 implementation from where all configuration parameters and constraints
were copied. Routing table interface represents a group of signals that are used
to communicate with the routing table arbiter. Control logic interface represents
a group of signals that are used to communicate with the control logic, e.g.,
request and grant signals. Internal connections signals are signals which connect
one input port to one or multiple (in case of multicast packet) output ports when
the connection is approved by the control logic. Dout and Sout signals are output
signals from the output LVDS buffers within the SpW CODEC IP and can be
therefore directly connected to LVDS supported FPGA IOs. Rx Data Port and
Rx Clk signals are received data and recovered clock signals driven by a clock
recovery circuit which is external to the SpW router port as is suggested in [29,
p. 44]. A detailed view of a SpW router port block diagram can be found in
Appendix B.
Clock recovery circuit is an essential part of the SpW router port as it recovers
data and clock signals from the data-strobe encoding. It can be implemented in
various FPGA fabric architectures, but its design poses serious challenges specific
to the FPGA device architecture. The major challenge is related to the tight
timing constraints between data and strobe signals to generate a glitch-free, and
low jitter recovered clock as is stated in the application note from Microsemi [30].
Fortunately, RTG4 CCC already comes included with the clock recovery circuit,
which makes the designer’s life a little easier. However, this limits the number
of physical ports in a router to 16. There are eight CCCs available in an RTG4
3.6 SpW router ports 69
each having two clock recovery circuits thus resulting number of 16 ports. For
the SpW router implementation in this thesis, only a few ports are needed since
the focus is on the functionality of the router. Therefore, the CCC was used as
a clock recovery circuit. A detailed view of SpW router ports and all external
components can be found in Appendix B.
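The principle behind clock recovery from data-strobe encoding can be illustrated with a short behavioral model. The following Python sketch is not the RTG4 CCC circuit and not part of the thesis implementation. It only assumes the standard data-strobe rule that the strobe toggles whenever two consecutive data bits are equal, so that the XOR of data and strobe toggles once per bit. The function names and the initial line levels are hypothetical.

```python
def ds_encode(bits, d0=0, s0=0):
    """Return (data, strobe) level lists for a bit sequence.

    d0 and s0 are the assumed line levels before the first bit.
    """
    data, strobe = [], []
    prev_d, prev_s = d0, s0
    for b in bits:
        # The strobe changes only when the data line does not change.
        s = prev_s if b != prev_d else prev_s ^ 1
        data.append(b)
        strobe.append(s)
        prev_d, prev_s = b, s
    return data, strobe


def recover_clock(data, strobe):
    """Recovered clock level per bit: XOR of the data and strobe levels."""
    return [d ^ s for d, s in zip(data, strobe)]


bits = [0, 1, 1, 0, 1, 0, 0, 1]
data, strobe = ds_encode(bits)
clock = recover_clock(data, strobe)
```

Because exactly one of the two lines changes per bit period, the recovered clock alternates on every bit, which is why skew between data and strobe translates directly into glitches or jitter on the recovered clock.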
The SpW CODEC encodes and decodes SpW characters, of which only data and
EOP characters (N-Chars) are routed further through the router, as was explained
in section 2.1.2. Before N-Chars can be routed, their destination needs to be
determined. The next crucial component of a SpW router port is, therefore, a
packet processor. The packet processor was implemented in the form of a state
machine. In its IDLE state, it continuously checks if there is any data in the
receive FIFO buffer by reading the empty flag from the CODEC IP. When a
character appears in the receive FIFO in this state, the packet processor assumes
that it is the first character of the packet. Therefore, it checks if the character
contains a path or a logical address value. In case of a path address (value ≤ 31),
it checks if the value is in the correct range based on the number of implemented
physical SpW router ports. For example, if the router has four physical ports, the
maximum allowed value for a path address is 4. If the value is out of range, the packet
is dropped. On a valid path address, the packet processor then checks whether
the destination (output) port is active. If it is not, the packet is dropped.
If the output port is active the packet processor issues a request signal to the
control logic informing it that it wants to connect to the output port specified
by the path address. The packet processor now waits for the grant signal from
the control logic before it starts sending characters to the output port. Once the
grant signal is received, the packet processor reads a character from the receive
FIFO and forwards it to the output port if the transmit FIFO in the output
port can store the new character; otherwise, it waits until there is free space in
the transmit FIFO. The packet processor continuously reads characters from the
receive FIFO and forwards them to the transmit FIFO in the output port until an
EOP character is read, which indicates the end of the current packet. The request
signal is then de-asserted. Soon after, the control logic de-asserts the grant signal,
and the output port is released and free to connect to another input port. The packet
processor moves back to the IDLE state and waits for a new packet, or it starts
processing it if there are already characters waiting in the receive FIFO.
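The decision taken on the first character of a packet can be summarized with a small behavioral sketch. The Python code below is an illustration only and not the VHDL state machine of the thesis. The function name, the returned action labels, and the representation of port activity as a dictionary are assumptions. Port 0 is assumed to be the configuration port.

```python
NUM_PHYSICAL_PORTS = 4  # assumption: the four-port example from the text


def classify_first_char(value, port_active):
    """Decide what to do with the first character of a packet.

    value       -- 8-bit address value of the first character
    port_active -- dict mapping port number to True/False (hypothetical)
    Returns (action, argument) with action in {"lookup", "drop", "request"}.
    """
    if value > 31:
        # Logical address: the routing table has to be consulted.
        return ("lookup", value)
    if value > NUM_PHYSICAL_PORTS:
        # Path address outside the implemented port range.
        return ("drop", None)
    if not port_active.get(value, False):
        # Destination port exists but is not active.
        return ("drop", None)
    # Valid path address: request the output port from the control logic.
    return ("request", value)


active = {0: True, 1: True, 2: True, 3: False, 4: True}
```

With this activity map, a first character of 2 leads to a request for output port 2, while 3 (inactive port) and 5 (out of range) both lead to the packet being dropped.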
When the first character read by the packet processor contains the value
of a logical address (value > 31), it sets the routing table address to the value of
the character and issues a request signal to the routing table arbiter. The packet
processor now waits for the grant signal from the routing table arbiter before
moving forward. Once the grant signal is received, it asserts the read enable
signal of the routing table. At the next rising clock edge, it de-asserts the read enable
signal and waits one clock cycle for the data to appear, since a pipeline register is
used on the read data signal. After the data from the routing table is received, it
checks whether this is a multicast packet or not. If it is, it checks if all the output
ports in the multicast set are active. If all targeted output ports are active, it
sends a request signal to the control logic with the list of output ports and priority
level of the multicast packet. If one of the output ports in the multicast set is
not active, the packet is dropped. If the packet is regular, the packet processor
sends a request for only one output port and the priority level of the packet to
the control logic. From here on the processing flow is the same as in case of a
path address. A detailed view of the described state machine diagram with all
the signal names that were used in the VHDL implementation can be found in
Appendix A.
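The logical address branch described above can be sketched in the same behavioral style. The routing table layout used here (a dictionary from logical address to an output port bit mask and a priority level) is hypothetical, since the actual entry format is defined elsewhere in the thesis. The sketch also assumes that an unconfigured address and an inactive single destination both lead to a dropped packet.

```python
def route_logical(address, routing_table, active_mask):
    """Return (port_mask, priority) to request, or None to drop the packet.

    routing_table -- dict: logical address -> (port_mask, priority), hypothetical
    active_mask   -- bit i set when output port i is active
    """
    entry = routing_table.get(address)
    if entry is None:
        return None  # assumption: unconfigured address means drop
    port_mask, priority = entry
    if port_mask & ~active_mask:
        # At least one targeted output port is inactive: drop the packet.
        return None
    # Regular packets have one bit set, multicast packets several.
    return port_mask, priority


table = {35: (0b00100, 0), 70: (0b00110, 1)}
```

For a multicast entry such as address 70 in this example table, a single inactive port in the set is enough to drop the packet, which mirrors the rule stated in the text.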
Request and grant signals for both the control logic and the routing table arbiter
are issued based on the four-phase handshake protocol, which was described in
section 2.3.3. The release of a resource (in this case an output port) is signaled
by de-asserting the grant signal.
In the current implementation of the SpW router, each SpW router port has
one control register and one status register. The control register sets
the transmission rate divider value, can flush the transmitter FIFO, and controls
the state of the port (auto-start, disable, start). The status register contains
information about link errors, link-state, and link status. A detailed description
of the registers and their fields can be found in Appendix D.
For performance testing purposes, trigger circuits were implemented. The triggers
fire on a received Start Of Packet (SOP), a received EOP, a transmitted SOP, and
a transmitted EOP.
3.7 Crossbar
At this point, the output port of an incoming packet is known. However, there
is no physical connection between an input and an output port just yet. This is
where the crossbar circuitry comes into play. It is used to fully connect the SpW
router, meaning all input ports can be connected to all output ports. To achieve
this, the crossbar is composed of multiplexers.
Two types of generic multiplexers (in code referred to as selectors) were designed,
one for single-bit signals and one for vector signals. The select signal in
the entity description is one-hot encoded, meaning the input is selected based on
which bit is set in the select vector. Before the input is connected to the output,
the select vector is checked to be a power of two (only one bit set in
the vector). This ensures that the output port is connected to only one input
port and is disconnected from all input ports when none of them has requested a
connection (select vector equals zero) or an invalid select vector is applied to
the multiplexer.
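The power-of-two check on the select vector can be illustrated with a few lines of Python. This is a behavioral sketch under the assumption that a disconnected output is represented by an idle value of zero; it is not the VHDL selector from the thesis, and the function names are hypothetical.

```python
def is_one_hot(select):
    """True when exactly one bit is set, i.e. select is a power of two."""
    return select != 0 and (select & (select - 1)) == 0


def selector(inputs, select, idle=0):
    """One-hot multiplexer: bit i of select connects inputs[i] to the output.

    An all-zero or multi-bit select vector disconnects the output (idle value).
    """
    if not is_one_hot(select):
        return idle
    return inputs[select.bit_length() - 1]


chars = [0x10, 0x20, 0x30, 0x40]
```

The expression `select & (select - 1)` clears the lowest set bit, so it evaluates to zero only when at most one bit was set. Together with the non-zero test this gives the one-hot condition.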
Internal connection signals in the router are:
• Data - represents a SpW character (9 bits).
• Strobe - indicates that the SpW character is valid (used as a write enable
signal for the transmit FIFO in the output port, single bit).
• Request - used as a control signal to enable the connection between the transmit
FIFO write enable and Strobe signals in the output port (single bit).
• Source port - only used to pass the source port number to the configuration
port, so that it knows for which port it should request access when sending a
reply packet (8 bits).
• Ready - indicates whether the transmit FIFO in the output port has space to
receive a new character (single bit).
Connections from input to output port are listed in Table 3.3.
Data, Strobe, and Request signals are all multiplexed from all input ports to
all output ports. Source port signals from all input ports are multiplexed only
to the configuration port. However, the Ready signal is multiplexed from output
ports to input ports since the output port is the source of this signal. In addition,
in case of a multicast packet, all output ports targeted by the multicast
packet must be able to receive a character. Therefore, the resulting Ready signal
is an AND function of the Ready signals from all targeted output ports. The code
snippet for the Ready signal and multiplexers implementation can be found in
Appendix C and the simulation results demonstrating how the Ready signal is
determined in Appendix G.
Table 3.3: Input port to output port signal connections
Input port               Output port                                 Width
Data out           –>    Data in                                     9 bits
Strobe out         –>    Strobe in                                   1 bit
Request out        –>    Request in                                  1 bit
Source port out    –>    Source port in (only configuration port)    8 bits
Ready in           <–    Ready out                                   1 bit
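The combination of Ready signals for a multicast packet reduces to a masked AND, as the following Python sketch shows. It is a behavioral illustration only; the actual implementation is the VHDL code referenced in Appendix C. The bit-mask representation and the handling of an empty destination set are assumptions.

```python
def ready_for_input(destination_mask, ready_mask):
    """Ready seen by an input port: AND of Ready over all targeted outputs.

    destination_mask -- bit o set when output port o is targeted
    ready_mask       -- bit o set when the transmit FIFO of port o has space
    """
    if destination_mask == 0:
        return False  # assumption: no connection means not ready
    return (ready_mask & destination_mask) == destination_mask
```

For a regular packet the destination mask has a single bit set, so the same expression degenerates to reading the Ready signal of that one output port.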
3.8 Control logic
The last component of the SpW router, where everything comes together is the
control logic. It employs an arbitration scheme for issuing the grant signals to
the input ports and is responsible for generating select signals for the crossbar
switch. These two parts of the control logic are referred to as an arbiter and a
switch allocator, respectively. One of the requirements stated in section 1.2 was
that the router should be able to route packets based on their priority. Before the
final decision was made how the control logic is going to be implemented, two
options were considered regarding the priority requirement.
The first one was to have a rather simple control logic executing a fair
arbitration algorithm (e.g., round-robin) for the switch allocation, combined with a
priority-sorted FIFO buffer or a priority queue at the output port. The
implementation of the priority queue would be based on the work presented in [31].
However, due to the complexity and limited time for the implementation, this
idea was dropped. A much simpler approach would be to have a FIFO buffer for
each priority level as described in [32, section 2.2 FIFO Priority]. However, this
limits the number of priority levels and introduces additional buffers, which use
many resources, at the output ports. The second option was to have a control
logic which would employ a priority-based arbitration scheme. The latter was
chosen to avoid additional buffers and to have a flexible number of priority levels.
Figure 3.14: Control logic block diagram
Figure 3.14 depicts a block diagram of the control logic for a SpW router
with 5 ports (4 physical and 1 configuration port). It is important to point
out that the control logic was implemented as a generic component. Therefore,
it can be configured for any number of ports in the SpW router. The control
logic accepts request, priority, and destination signals from the ports. A fanout of
the request signals is created to connect all request signals to all output port
arbiters. However, the value of the request signals is passed to the arbiter only
when that particular output port is requested by the input port (determined by
the destination input). PA block represents a priority arbiter for one output port.
The output grant vector signals have the length equal to the number of ports,
and they each represent one row in the grant signal array. Because of how
vector arrays are represented in VHDL (columns are not of std_logic_vector
type), the grant signal array is transposed so that reduction operators and other
vector operations can be used on the grant signal array columns (now represented
by rows of std_logic_vector type in the transposed grant signal array).
Figure 3.15: Grant signal array example
The best way to explain how grant signal and crossbar select signal generators
work is through a simple example. If input port 2 issues a request to access output
port 1, then with no other traffic in the router, the resulting grant signal array
is depicted in Figure 3.15. The rows in the array correspond to the output grant
signal vectors from the output port priority arbiters. Marked with a red circle is
a grant vector from the priority arbiter of the output port 1. It can be directly
used as a select vector signal for output port multiplexers as it already indicates
that the input port 2 should be connected to the output port 1 by a set bit in
position 2. However, to correctly connect the Ready signal from the output port
to the input port the bit in select vector should be set in position 1 and the
select vector should be connected to the multiplexer of input port 2. One can see
that the select vector can be obtained by taking the second column of the grant
signal array marked by a green circle. The grant signal for the input port 2 is
generated by applying an or-reduce operation to the same column. This can be
done because in case of normal packets, only one bit will be set in the column
and an or operation is an easy way to determine if a vector has a non-zero value.
In case of a multicast packet, the grant signal for the input port is determined
with the following logical equation:
\[ \mathrm{Grant\_To\_Port}(i) = \bigwedge \bigl( \mathrm{grantT\_row}(i) \oplus \lnot\,\mathrm{destination\_of\_port}(i) \bigr) \]
An example of how the grant signal is determined when one of the input ports
requests to send a multicast packet to output ports 1 and 2 is depicted in Table
3.4. As can be seen from this example, grant signals for all targeted output ports
must be asserted before a grant signal is issued to the input port.
Table 3.4: Grant signal in case of a multicast packet
(a) One grant signal missing
  destination          00110
  ¬destination         11001   (not)
  grantT_row vector    00100
  xor result           11101   (xor)
  grant = and(11101) = 0
(b) All grant signals asserted
  destination          00110
  ¬destination         11001   (not)
  grantT_row vector    00110
  xor result           11111   (xor)
  grant = and(11111) = 1
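The grant signal array example and the multicast grant equation can be reproduced with a short behavioral model. The Python sketch below is not the VHDL implementation; it represents each row of the grant signal array as an integer bit mask, and all function names are hypothetical. A router with five ports is assumed, as in the example.

```python
def transpose(grant_rows, n):
    """Transpose the grant signal array.

    grant_rows[o] has bit i set when output port o is granted to input port i.
    The result has bit o set in element i for the same grant.
    """
    return [sum(((grant_rows[o] >> i) & 1) << o for o in range(n))
            for i in range(n)]


def regular_grant(grant_t_row):
    """Or-reduce of one transposed row: any granted output port."""
    return grant_t_row != 0


def multicast_grant(grant_t_row, destination, n):
    """And-reduce of (grantT_row xor not destination), as in the equation."""
    mask = (1 << n) - 1
    return (grant_t_row ^ (~destination & mask)) == mask


# Input port 2 requests output port 1, no other traffic.
rows = [0b00000, 0b00100, 0b00000, 0b00000, 0b00000]
grant_t = transpose(rows, 5)
```

Here `grant_t[2]` equals `0b00010`, which is the select vector for the Ready multiplexer of input port 2, and the two calls with the values from Table 3.4 return the grant values shown there.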
The key component of the control logic is the Priority Arbiter. It was designed
as a two-stage arbiter with three pipeline stages. A block diagram of the Priority
Arbiter is depicted in Figure 3.16. A key function for determining which input
request signal has the highest priority is finding a maximum/minimum of n k-
bit numbers. However, it was a major hurdle to find and design an optimal
maximum/minimum finder for an FPGA implementation. Fortunately, a good
article was found [33], which describes old and new approaches for tackling this
problem, focusing on the hardware implementations. The design of a maximum
finder for the Priority Arbiter was based on the fastest topology presented in [33]
called Array-Based Topology (AbT). It should be pointed out that their targeted
technology was an ASIC design with UMC Faraday 180 nm standard cell library.
It can be assumed that the relationship between results should be similar for an
FPGA. However, actual values might increase. The AbT design was modified to
find a minimum number among n k-bit numbers and to produce a vector where
a set bit indicates an input with the minimum number. Multiple bits can be set
if more than one input number has the minimum value. This is necessary since
packets might have the same priority.
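The expected output of the modified minimum finder can be stated compactly in software. The following Python sketch only models the input/output behavior described above. It does not reproduce the Array-Based Topology from [33] or its timing, and the function name is hypothetical.

```python
def min_position_vector(values):
    """Return a list with 1 at every position holding the minimum value."""
    smallest = min(values)
    return [int(v == smallest) for v in values]
```

For the inputs `[3, 1, 2, 1]` the result is `[0, 1, 0, 1]`: two bits are set because two inputs share the minimum value, which is the case that later requires a second arbitration stage.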
Figure 3.16: Priority Arbiter block diagram
The first pipeline stage (Stage0) of the Priority Arbiter captures the Request
and Priority signals. In the second pipeline stage (Stage1), the priority signals
are masked with the Request signals so that only the priority values of active
requests are considered. The filtered Priority signals are then applied to the Max/Min
finder. The results of the Max/Min finder are then stored in a register. In the third
pipeline stage (Stage2), the Maximum position vector is first checked for
multiple set bits, which would indicate that multiple inputs have the same minimum
value. If only one bit is set, an appropriate grant signal is asserted, and the Mux
Sel circuit produces a select signal which will configure the output multiplexer to
connect OH Grant signal to the output Grant signal. This completes the arbitra-
tion process, where only one arbitration stage was needed - finding a minimum
number. However, when the Maximum position vector has multiple bits set, a
second arbitration stage is needed to determine which of the inputs with the
same priority value shall get granted access. This is determined with the Round
Robin arbiter. The request signals are masked so that arbitration takes place
only among input requests with the same minimum value. The result from the
Round Robin arbiter is used to set an appropriate grant signal, and the Mux
Sel circuit produces a select signal which will configure the output multiplexer to
connect RR Grant signal to the output Grant signal. This completes the arbi-
tration process with two arbitration stages - finding a set of minimum numbers
and round-robin arbitration.
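The two arbitration stages can be summarized in a behavioral sketch that ignores the pipeline registers. The Python function below is an illustration under stated assumptions and not the VHDL Priority Arbiter: a numerically lower priority value wins, the round-robin stage is modeled as a search starting after the input last granted by that stage, and all names are hypothetical.

```python
def priority_arbiter(requests, priorities, rr_last):
    """Return the index of the granted input, or None without requests.

    requests   -- list of 0/1 request flags, one per input port
    priorities -- list of priority values (lower value wins, assumption)
    rr_last    -- input last granted by the round-robin stage
    """
    n = len(requests)
    active = [i for i in range(n) if requests[i]]
    if not active:
        return None
    # First stage: find the minimum among priorities of active requests.
    lowest = min(priorities[i] for i in active)
    candidates = [i for i in active if priorities[i] == lowest]
    if len(candidates) == 1:
        return candidates[0]
    # Second stage: round-robin among inputs sharing the minimum value.
    for step in range(1, n + 1):
        i = (rr_last + step) % n
        if i in candidates:
            return i
```

With five ports (port 0 being the configuration port), requests from input ports 1 and 3 with equal priority and `rr_last = 4` result in a grant for input port 1, while a better priority value on one of the two ports decides the arbitration in the first stage regardless of `rr_last`.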
Figure 3.17: Round Robin arbiter topology
The Round Robin arbiter was designed based on the topology depicted in
Figure 3.17, which was derived from coding style proposals for round-robin arbiters
in [34]. The Token Register indicates which request signal currently has the highest
priority. The width of the register is equal to the number of input request
signals. Each bit of the register drives the enable input pin of a Simple
Priority circuit, which is an arbiter that was described in section 2.3.3. The input
request signals connected to each Simple Priority circuit are shifted
by one position relative to the previous Simple Priority circuit. The Grant(i) signal is
driven by an OR gate which takes the outputs of the Simple Priority circuits that
correspond to the Request(i) input. This way, it does not matter which of
the Simple Priority circuits is active when a Grant signal is asserted.
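The token-based topology can be modeled in a few lines. The Python sketch below follows the structure described above (one fixed-priority circuit per token bit, shifted request inputs, OR-combined grants), but it is only a behavioral approximation of the VHDL design. It assumes that the shift wraps around and that the token moves to the input following the granted one; both assumptions and all names are hypothetical.

```python
def simple_priority(requests):
    """Fixed-priority arbiter: grant the lowest-index asserted request."""
    grant = [0] * len(requests)
    for i, r in enumerate(requests):
        if r:
            grant[i] = 1
            break
    return grant


def round_robin(requests, token):
    """token is one-hot; bit k enables the circuit whose top priority is k."""
    n = len(requests)
    grant = [0] * n
    for k in range(n):
        if not token[k]:
            continue
        # Circuit k sees the requests rotated so that input k comes first.
        rotated = [requests[(k + j) % n] for j in range(n)]
        g = simple_priority(rotated)
        for j in range(n):
            grant[(k + j) % n] |= g[j]  # OR gate per Grant(i)
    return grant


def next_token(grant, token):
    """Assumed update rule: highest priority moves past the granted input."""
    if 1 not in grant:
        return token
    n = len(grant)
    new = [0] * n
    new[(grant.index(1) + 1) % n] = 1
    return new
```

Starting with the token at input 0 and requests from inputs 1 and 3, the sketch grants input 1 and moves the token to input 2, so a request arriving on input 2 in the meantime is served before input 3.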
In the following paragraphs, a few examples are provided which showcase
different arbitration scenarios that might occur in the control logic during router
operation. In all provided examples a router with four input and output ports
plus a configuration port is considered. Figure 3.18 illustrates an arbitration
process for packets with matching priority.
At stage one, input ports 1 and 3 have packets to be routed to output port 2.
The previous input port that used output port 2 was input port 4; therefore,
the next input port selected by the router will be input port 1 (round-robin
arbitration).
At stage two, input port 1 is selected by the router, and the packet is routed
through output port 2.
At stage three, the transmission of the packet from input port 1 has completed.
Now input port 3 can transmit its packet to output port 2.
Figure 3.18: Arbitration of two packets with matching priority
Figure 3.19 illustrates another arbitration example for packets with matching
priority. At stage one, input ports 1 and 3 have packets to be routed to output
port 2. The previous input port that used output port 2 was input port 4;
therefore, the next input port selected by the router will be input port 1.
At stage two, input port 1 is selected by the router, and the packet is routed
through output port 2. Meanwhile, another packet with the same priority as the
packets on input ports 1 and 3 arrives at input port 2 with output port 2 as its
destination.
At stage three, the transmission of the packet from input port 1 has completed.
Now input port 2 is selected by the router, and the packet is routed through
output port 2 even though the packet arrived later than the one on input port
3. That is because round-robin arbitration is active, and the previous input port
that had access to output port 2 was input port 1. Therefore, the input with the
highest priority is now input port 2.
At stage four, the transmission of the packet from input port 2 has completed.
Now input port 3 can transmit its packet to output port 2.
Figure 3.19: Arbitration of packets with matching priority
Figure 3.20 illustrates an arbitration process for packets with different priority
levels. Logical address value 70 corresponds to a packet with lower priority
than a packet with logical address value 35. At stage one, input ports 1 and 3
have packets to be routed through output port 2. The previous input port that used
output port 2 was input port 2. If both packets had the same priority, the
next input port selected by the router would have been input port 3. However,
since the packet on input port 1 has higher priority than the packet on input port 3,
the next input port selected by the router will be input port 1.
At stage two, input port 1 is selected by the router, and the packet is routed
through output port 2.
At stage three, the transmission of the packet from input port 1 has completed.
Now input port 3 can transmit its packet to output port 2.
Figure 3.20: Arbitration of two packets with different priorities
Figure 3.21 illustrates another example for packets with different priority
levels. As in the previous example, logical address value 70 corresponds to a packet
with lower priority than a packet with logical address value 35. At stage
one, input ports 1 and 3 have packets to be routed through output port 2, and both
have the same priority. The previous input port that used output port 2 was input
port 4; therefore, the next input port selected by the router will be input port 1.
At stage two, input port 1 is selected by the router, and the packet is routed
through output port 2. Meanwhile, another packet with logical address value 35
arrives on input port 4 with output port 2 as its destination.
At stage three, the transmission of the packet from input port 1 has completed.
Now input port 4 can transmit its packet to output port 2 since it has the highest
priority. Input port 3 must wait. Meanwhile, another packet with logical address
value 70 arrives on input port 1.
At stage four, the transmission of the packet from input port 4 has completed. Now
the router must decide between the two pending packets on input ports 1 and 3.
Because the last input port that used output port 2 with the same priority as the
pending packets was input port 1, the next input port selected by the router would
have been input port 2. However, since input port 2 is inactive (no pending packets),
input port 3 is selected by the router, and the packet is routed through output
port 2.
At stage five, the transmission of the packet from input port 3 has completed.
Now input port 1 can transmit its packet to output port 2. Control logic simulation
results for a similar case can be found in Appendix G, where four ports
continuously transmit low priority packets and one port high priority packets.
Figure 3.21: Arbitration of multiple packets with different priorities
Figure 3.22 illustrates router behavior in case of a multicast packet. Logical
address value 70 has an assigned multicast set 10101, meaning a packet with
logical address value 70 should be routed to output ports 2 and 4. At stage one,
a multicast packet arrives on input port 1. Since there is no other traffic in the
router, all output ports are ready to receive a packet. Therefore, input port 1 is
immediately selected by the router.
At stage two, input port 1 is selected by the router, and the packet is routed
through output ports 2 and 4.
Figure 3.22: Arbitration of three packets with matching priority
Figure 3.23 illustrates another example for a multicast packet. As in the
previous example, logical address value 70 has an assigned multicast set 10101,
meaning a packet with logical address value 70 should be routed to output ports
2 and 4. At stage one, a multicast packet arrives on input port 1 during the
transmission of a packet from input port 2 to output port 2.
At stage two, input port 1 must wait since one of the output ports in the multicast
set is busy transmitting a packet.
At stage three, the transmission of the packet from input port 2 has completed. Now
both output ports in the multicast set are ready to receive a packet. Therefore,
input port 1 is selected by the router, and the packet is routed through output
ports 2 and 4.
Figure 3.23: Arbitration of three packets with matching priority
4 Test setup
This chapter describes simulation tests that were used to confirm router function-
ality before synthesis. It describes the hardware test setup in a lab environment
that was used for testing the hardware implementation of the SpW router. It
describes test procedures that were used to confirm the functionality of the hard-
ware implementation of the router and to evaluate its performance.
For all tests described in the following sections, a SpW router implementation
with four physical ports and the internal clock frequency set to 50 MHz was used.
4.1 Test by simulation
Before moving to hardware tests, the router was first tested in a simulation en-
vironment. A VHDL testbench was written to test all basic functionalities of
the router. It was used to check whether the initialization procedure completes
successfully or not.
Then it was used to check if the configuration port works as expected since the
RMAP decoder component had to be slightly modified compared to the one used
in standalone tests. The RMAP decoder variant for the configuration port has
to have the support for four-phase handshake protocol for the request and grant
signals as it needs a way to connect to the control logic and to mimic the behavior
of an actual SpW router port. In order to test this, a few RMAP packets were
sent from a source port connected to one of the input ports to the configuration
port. After a while, the receive FIFO in the source port was checked to see whether
a correct response/reply packet had been received.
After that, a few different packets were sent through the router to determine
the routing delay t_r, which is the time that the router needs to determine the output
port of a packet and to make a connection between an input and output port. It
was measured for both path and logically addressed packets. The routing delay
was determined based on how long it takes for the SOP to appear at the
output port. This was easily measured since triggers for the receive SOP in the
input port and for the transmit SOP in the output port were implemented. The
time difference between those two trigger pulses is the routing delay.
Also, a few packets with different lengths were sent through the router to
determine the packet router latency.
In addition, a couple more tests were performed. A single multicast packet
was sent through the router to see if the multicast function is working. A single
packet with a logical address marked for deletion by the router was sent
through the router to see if logical address deletion works as expected. Lastly,
multiple packets were sent from multiple source ports at the same time to ensure
that control logic works as intended.
4.2 Hardware test setup
Figure 4.1 depicts the test setup that was used for testing the hardware
implementation of the SpW router. The RTG4 Development Kit with the FMC
SpaceWire/SpaceFiber Board connected to the HPC1 connector is used as the
hardware representation of the SpW router. With four SpW connectors on the FMC
board, the router can have up to four physical ports. The router is connected with
four SpW cables to four SpW ports on the PXI system. The PXI system is used as
a packet source and sink. For precise timing measurements, an additional FPGA is
used. Because only an LVDS IO adapter was available for the FPGA, a single-ended
to LVDS converter board had to be designed to convert single-ended trigger signals
to the LVDS standard. An oscilloscope is used to confirm timing measurements and
to do additional analysis on captured signals. The lab test setup that corresponds
to the block diagram shown in Figure 4.1 is depicted in Figure 4.2.
Figure 4.1: Hardware test setup block diagram
Figure 4.2: Hardware test setup in a lab environment
The PXI System has the following configuration:
• PXI-Chassis PXIe-1071 with one controller and three hybrid slots for
connecting PXI components.
• PXI-Controller PXIe-8115 with a 2.5 GHz Dual-Core i5 processor running
Windows OS as host for the LabVIEW applications (populates the controller
slot).
• PXI FPGA Module for FlexRIO with a Virtex-5 SX50T FPGA and an NI
6585 LVDS IO adapter module connected to it, used as a precise timer (it
populates slot 2 as indicated in the block diagram).
• SpW PXI RMAP card from STAR-Dundee with four SpW ports and
four bi-directional trigger IOs (it populates slot 3 and covers slot 4 as indicated
in the block diagram).
For executing different tests, several different LabVIEW applications called
Virtual Instruments (VIs) were written in LabVIEW 2017 using STAR-Dundee’s
LabVIEW Application Programming Interface (API). One of the applications
is called a receiver VI. Its sole purpose is to receive SpW packets; however,
depending on the test to be performed, it can also log packets or just a part of them
to keep the log files readable since the packets can be very long. In addition, it logs
the source of a packet, a timestamp to indicate when the packet arrived, the time
between two packets, and the packet latency based on its source. The receiver
VI has three while loops: one for handling the user interface, one for receive
operations, and one for logging. At first, the receiver loop included a so-called
Event Structure since functions in the API can produce user events, e.g., a transfer
completion event. The idea behind this was that no receive operation
could be missed this way. However, it was later observed that the Event Structure limits
the maximum receiver data rate to around 70 kbps, which is far lower than the
theoretical limit of 160 Mbps. Therefore, the receiver while loop was changed to
a polling while loop, which executes every 100 ms in its idle state and as fast as
the system allows during receive operations. The data rate of the receiver improved
drastically.
A transmitter VI was written to act as a source of SpW packets. Several
different variations were written to enable different tests. In some cases, packets
are sent continuously, in some only a specified number of packets is sent, and in
some packets are sent continuously with a defined period.
For the FPGA within the PXI system, a target VI is needed to program
the FPGA behavior. In this case, the FPGA acts as a precise timing device
implemented as a counter with the internal clock frequency set to 100 MHz, which
gives a time measuring unit of 10 ns. Alongside that, multiple edge detection
circuits were implemented to detect trigger pulses from the SpW router and the
PXI system. These trigger signals are used for capturing time. For example, if
one wants to measure packet latency, one needs to save a timer value to a register
when a receive SOP trigger is asserted and then subtract it from a captured timer
value when a transmit EOP is asserted.
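The conversion from captured counter values to a time interval can be sketched as follows. The Python function is an illustration and not the LabVIEW target VI; the counter width of 32 bits is an assumption, as the text only states the 100 MHz clock and the resulting 10 ns resolution.

```python
TICK_NS = 10  # one counter tick at 100 MHz


def elapsed_ns(start_ticks, stop_ticks, counter_bits=32):
    """Time between two captured counter values in nanoseconds.

    The modulo handles a single wrap-around of the free-running counter
    (counter_bits is a hypothetical width).
    """
    return ((stop_ticks - start_ticks) % (1 << counter_bits)) * TICK_NS
```

For example, a receive SOP trigger captured at tick 100 and a transmit EOP trigger captured at tick 350 correspond to a packet latency of 2500 ns.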
In order to get data from the FPGA to the Windows environment for logging,
another VI called the host VI is needed. The target VI has to write data into a data
structure called a DMA channel, which consists of two FIFO buffers, one on the host
computer and one on the FPGA target. The FPGA writes data to the target-to-host
DMA FIFO. Later, the host VI can read this data by invoking the read method
on the same DMA FIFO. Figure 4.3 represents the data flow between a target and a
host VI.
Figure 4.3: Target to host VI data flow [5]
One final VI was written as an RMAP packet generator. It is used for manip-
ulating register values and routing table entries. However, it is not suitable for
configuring the whole routing table. Therefore, the routing table content should
be configured with a ROM file. This VI is used only for making small changes in
the routing table and checking the status registers.
First, each SpW router component was synthesized on its own to determine
how many resources it uses. Each component synthesis report was compared to
the synthesis report of the whole SpW router to determine what percentage of
the router it represents. This is important as one might want to include the SpW
router in a bigger project. Synthesis results and results from all test
procedures described in this chapter are presented in Chapter 5.
4.2.1 Hardware capabilities of the PXI system test
This test determines the capabilities of the SpW PXI RMAP card, e.g., achievable data
rate and time between packets in relation to packet length, similarly to what
STAR-Dundee did for one of their other products [35]. Based on the results, one
can determine how fast packets can be transmitted. This test was done to ensure
that one does not try to simulate faster traffic than the hardware can produce.
4.2.2 Packet Router Latency test
This test determines the packet latency caused by the router. The test was
executed by sending a continuous stream of packets and measuring the latency of
each packet for a specified number of packets before increasing the packet
length. In the first stage, there was no other traffic on the router except one
input port sending packets to one output port. In the next stage of the test,
the number of active input ports was increased to two, both sending to the same
output port. Priority and packet length were the same for both input ports. The
idea was to show that packet latency doubles when packets arrive at the same
time, compared to the results with only one input port.
4.2.3 Control Logic Functionality test
This test demonstrates that the control logic works as intended. The first stage
of the test was executed by sending several packets on multiple input ports at
the same time with the same priority. This testing stage was done to demonstrate
that round-robin arbitration works. The test was executed for both path
addressed packets (which have the same priority by default) and logically
addressed packets. The second stage of the test was executed by sending several
packets on multiple input ports at the same time; however, this time, one of the
input ports had higher priority packets. This testing stage was done to
demonstrate that priority arbitration works.
4.2.4 Multicast Packet Transmission test
This test demonstrates multicast packet transmission. The test was executed by
assigning a multicast set to one logical address, sending a packet with this
logical address through one of the input ports of the router, and observing all
output ports.
4.2.5 Logical Address Deletion test
This test demonstrates the logical address character deletion. The test was
executed by setting the logical address deletion flag in the routing entry for
one of the logical addresses. One packet with this logical address was sent
through the router, and an output port was observed to verify that the logical
address is no longer present in the received packet.
5 Evaluation and discussion of results
This chapter presents results from simulations and tests described in the previous
chapter. It shows simulations and hardware measurements for the initialization
time, routing delay, and packet router latency. It derives router components
resource usage based on synthesis results. It evaluates router functionality based
on control logic, multicast, and logical address deletion tests.
5.1 Initialization time
The implemented SpW router has to initialize all its registers and the routing
table with values from the ROM after a power-up or a reset. The time this takes
is referred to as the initialization time. The initialization time was first
determined with a simulation. It is measured as the time interval between the
point when the reset signal is de-asserted (RESET n goes from LOW to HIGH; it
is an active-low signal) and the point when the initialization done signal is
asserted. Figure 5.1 depicts a simulation result with a measured initialization
time of 259.45 µs. This value was taken as the expected value for the hardware
tests. Figure 5.2 depicts a scope measurement of the initialization time. The
first cursor is set at the LOW to HIGH transition of the RESET n signal, whereas
the second cursor is set at the LOW to HIGH transition of the Init Done signal.
The difference between these two cursors is the initialization time, with a
value of 259.46 µs indicated by delta x in the bottom right corner. This is
within 0.01 µs of the expected value.
Figure 5.1: Initialization time - simulation result
Figure 5.2: Initialization time - scope measurement
5.2 Routing delay
One of the two interesting parameters of the SpW router is the routing delay, as
was stated in section 2.3. The routing delay differs between path addressed and
logically addressed packets since the output port is determined differently for
these two types of packets, as was described in section 3.6. In the case of path
addressed packets, the packet processor needs 7 clock cycles to connect an input
port to an output port: 4 clock cycles correspond to 4 state transitions in the
packet processor (transitions 2, 4, 6, 8 in the packet processor state machine
diagram, which can be found in Appendix A), and 3 additional clock cycles are
needed for the grant signal to be issued by the control logic (three pipeline
stages). With the internal clock frequency set to 50 MHz (a clock period of
20 ns), the resulting routing delay for path addressed packets is 140 ns.
Figure 5.3a depicts a simulation result which confirms the derived routing
delay. The routing delay is measured as the time interval between the receive
SOP trigger and the transmit SOP trigger, as was described in section 4.1.
Figure 5.3b depicts a scope measurement of the routing delay in the case of path
addressed packets. The measured routing delay is 140 ns, indicated by delta x in
the bottom right corner. This result applies to a scenario in which only one
input port is sending packets to one of the output ports. For this test, input
port 1 was sending packets to output port 1.
The same results are achieved if traffic in the router has a one-to-one mapping,
meaning that at any point in time each input port is sending packets to a
different output port, so that no two input ports compete for the same output
port. In any other case, the routing delay also depends on the packet router
latency of the packet currently being transmitted from a particular output port,
since input ports need to wait for the output port to be free before
transmitting a new packet. For this test, input port 1 was sending packets to
output port 1, input port 2 to output port 2, input port 3 to output port 3,
and input port 4 to output port 4. Figure 5.4a depicts a simulation result where
each input port experiences a routing delay of 140 ns. However, the receive
SOP trigger in input port 3 is delayed by one clock cycle and, consequently, so
is the transmit SOP trigger. Since the anomaly did not affect the functionality
of the router, it was not investigated further at this point. One speculation is
that clock skew delays the processing of a packet by one clock cycle; however,
clock skew would not be expected in a pre-synthesis simulation, so the cause
remains open. Figure 5.4b depicts a scope measurement where the same anomaly is
present. However, this anomaly does not always appear on the same input port.
(a) Simulation
(b) Scope measurement
Figure 5.3: Routing delay - path addressed packets - one active input port
(a) Simulation
(b) Scope measurement
Figure 5.4: Routing delay - path addressed packets - multiple active input ports
In the case of logically addressed packets, the packet processor also needs to
request access to the routing table, as was described in section 3.6. Therefore,
the routing delay increases compared to the routing delay in the case of path
addressed packets. The packet processor needs 12 clock cycles to connect an
input port to an output port: 7 clock cycles correspond to 7 state transitions
in the packet processor (transitions 2, 15, 17, 18, 19, 20, 23 in the packet
processor state machine diagram, which can be found in Appendix A), 2 additional
clock cycles are needed to receive a grant signal from the routing table
arbiter, and 3 additional clock cycles are needed for the grant signal to be
issued by the control logic (three pipeline stages). With the internal clock
frequency set to 50 MHz, the resulting routing delay for logically addressed
packets is 240 ns. Figure 5.5a depicts a simulation result which confirms the
derived routing delay. Figure 5.5b depicts a scope measurement of the routing
delay in the case of logically addressed packets. The measured routing delay is
240 ns, indicated by delta x in the bottom right corner. This result applies to
a scenario in which only one input port is sending packets to one of the output
ports. For this test, input port 1 was sending packets to output port 1.
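The two cycle budgets given above can be checked with a few lines of Python.
This is only an arithmetic sketch: the dictionary keys and function names are
chosen here for illustration and are not identifiers from the VHDL design.

```python
CLOCK_HZ = 50_000_000  # internal router clock stated in the text (50 MHz)

# Cycle budgets as listed in the text for a single active input port.
PATH_CYCLES = {
    "packet processor state transitions": 4,
    "control logic grant (three pipeline stages)": 3,
}
LOGICAL_CYCLES = {
    "packet processor state transitions": 7,
    "routing table arbiter grant": 2,
    "control logic grant (three pipeline stages)": 3,
}


def cycles_to_ns(cycles, clock_hz=CLOCK_HZ):
    """Convert a number of clock cycles to nanoseconds."""
    return cycles * 1e9 / clock_hz


def routing_delay_ns(budget):
    """Sum a cycle budget and convert it to a routing delay in ns."""
    return cycles_to_ns(sum(budget.values()))


print(routing_delay_ns(PATH_CYCLES))     # path addressed packets
print(routing_delay_ns(LOGICAL_CYCLES))  # logically addressed packets
```

With a 20 ns clock period, the 7-cycle and 12-cycle budgets give the 140 ns and
240 ns quoted in the text.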
In the case of logically addressed packets, the routing delay increases when the
router is busy (all inputs are active but keeping the one-to-one mapping). The
routing delay increases because all input ports need to access the routing table and
must wait on each other. Based on simulation results depicted in Figure 5.6a, an
equation for worst-case routing delay for logically addressed packets was derived.
$$t_r = \left(1 + \tfrac{1}{2}(n - 1)\right) \cdot 240\,\text{ns}, \qquad n = \text{number of active input ports}$$
The 240 ns correspond to the routing delay when only one input port is active.
If the equation is rewritten into a different form,
$$t_r = 240\,\text{ns} + 120\,(n - 1)\,\text{ns},$$
one can see that for each additional active input port, the routing delay
increases by 120 ns. This value corresponds to the 6 clock cycles that are
needed by an input port to release the shared resource (the routing table).
Figure 5.6b depicts a scope measurement of the routing delay in the case of
logically addressed packets and multiple active input ports.
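As a quick check of the worst-case relation, the following sketch evaluates it
for one to four active input ports. The function name and default arguments are
illustrative assumptions; the constants (240 ns, 6 clock cycles, 20 ns clock
period at 50 MHz) are taken from the text.

```python
def worst_case_routing_delay_ns(n_active, single_port_ns=240,
                                release_cycles=6, clock_period_ns=20):
    """Worst-case routing delay for logically addressed packets.

    Each additional active input port adds the time one port needs to
    release the routing table (release_cycles * clock_period_ns).
    """
    if n_active < 1:
        raise ValueError("at least one input port must be active")
    return single_port_ns + release_cycles * clock_period_ns * (n_active - 1)


for n in range(1, 5):
    print(n, worst_case_routing_delay_ns(n), "ns")
```

For four active ports this gives 600 ns, which equals (1 + 3/2) times 240 ns,
so both forms of the equation agree.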
(a) Simulation
(b) Scope measurement
Figure 5.5: Routing delay - logically addressed packets - one active input port
(a) Simulation
(b) Scope measurement
Figure 5.6: Routing delay - logically addressed packets - multiple active input
ports
5.3 Synthesis
Resource usage results for the implemented SpW router, taken from the Libero
synthesis report, are listed in Table 5.1. In the current implementation, the
main concern is the number of used CCC components (6 of the 8 available).
Table 5.1: SpW synthesis results - resource usage
Type                    Used    Total    Percentage (%)
4LUT                    13642   151824   8.99
DFF                     5173    151824   3.41
I/O Register            12      2160     0.56
User I/O                70      720      9.72
Single-ended I/O        38      720      5.28
Differential I/O Pairs  16      360      4.44
RAM64x18                15      210      7.14
RAM1K18                 3       209      1.44
MACC                    0       462      0
H-Chip Globals          10      48       20.83
CCC                     6       8        75
RCOSC 50MHZ             0       1        0
SYSRESET                1       1        100
SERDESIF Blocks         0       6        0
FDDR                    0       2        0
UPROM                   1       1        100
GRESET                  1       1        100
RGRESET                 2       206      0.97
Figure 5.7 depicts a pie chart which shows the percentage of LUTs used by each
component in the router relative to the whole router design. As can be seen
from the chart, the control logic uses 51% of the LUTs used by the router. A
little more than one-third of the LUTs are used by the SpW router ports. The
remaining LUTs are divided among the other components. Figure 5.8 depicts a pie
chart which shows the percentage of Dffs used by each component in the router
relative to the whole router design. With 61%, the majority of Dffs are used by
the SpW router ports. The control logic uses 13% of the Dffs, the configuration
port 11%, the routing table arbiter 7%, and the remaining Dffs are divided among
the other components.
Based on these results, one can derive that one SpW router port uses around
1227 LUTs and 788 Dffs, whereas the control logic uses around 1739 LUTs and
168 Dffs for each port added to the router. It is interesting to look at how
many resources a router with the maximum number of ports (31) would use.
Since the control logic and the SpW router ports together use more resources
than the other components combined, it is enough to multiply the resource usage
for one port by the number of ports to get a rough estimate. Moreover, the other
components do not change much if additional ports are added. The results for 31
ports would be 38037 LUTs and 24428 Dffs for the SpW router ports, and 53909
LUTs and 5208 Dffs for the control logic. Therefore, a 31-port router would use
at least 91946 LUTs, which is more than 60% of the available LUTs in the RTG4,
and 29636 Dffs, which is more than 19% of the available Dffs in the RTG4. This
result shows that the router resource usage must be considered when including
the router in a bigger design project.
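The rough estimate above can be reproduced for any port count with a short
script. It is a sketch under the same simplification as the text: only the SpW
router ports and the per-port share of the control logic are counted, and the
other components are ignored. The function and constant names are chosen here
for illustration.

```python
# Per-port figures derived in the text (LUTs, Dffs).
PORT_LUTS, PORT_DFFS = 1227, 788            # one SpW router port
CTRL_LUTS, CTRL_DFFS = 1739, 168            # control logic share per port
RTG4_LUTS = RTG4_DFFS = 151_824             # totals from Table 5.1


def estimate_resources(n_ports):
    """Lower-bound estimate of LUT/Dff usage for an n-port router."""
    luts = n_ports * (PORT_LUTS + CTRL_LUTS)
    dffs = n_ports * (PORT_DFFS + CTRL_DFFS)
    return {
        "luts": luts,
        "dffs": dffs,
        "lut_percent": 100 * luts / RTG4_LUTS,
        "dff_percent": 100 * dffs / RTG4_DFFS,
    }


print(estimate_resources(31))
```

For 31 ports this yields 91946 LUTs (about 60.6%) and 29636 Dffs (about 19.5%),
matching the figures quoted above.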
Figure 5.7: SpW router components LUT usage
Figure 5.8: SpW router components Dff usage
5.4 Hardware capabilities of the PXI system
Figure 5.9 depicts results from the data rate and average time between packets
tests described in section 4.2.1 for the SpW PXI RMAP card. Figure 5.9a shows
how the data rate increases with increasing packet length. The maximum
achievable data rate with this setup is around 147 Mbps, which is close to the
theoretical limit of 160 Mbps of the SpW standard. The maximum transmission rate
of the SpW is 200 Mbps. However, a data rate of 200 Mbps cannot be achieved
since data characters are encoded with 10 bits, of which only 8 (80%) represent
actual data. As a result, the maximum data rate drops by 20%, to 160 Mbps.
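The limit and the share of it reached by the card can be written out as a short
worked calculation. The symbols $R_{link}$ and $R_{data}$ are introduced here
only for illustration:

$$R_{data} = R_{link} \cdot \frac{8}{10} = 200\,\text{Mbps} \cdot 0.8 = 160\,\text{Mbps}, \qquad \frac{147\,\text{Mbps}}{160\,\text{Mbps}} \approx 0.92$$

The measured maximum therefore corresponds to roughly 92% of the theoretical
limit.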
Figure 5.9b shows how the average time between packets changes when the packet
length is increased. The minimum average time between packets is 60 µs for
packets with a packet length of 10. For packets with a packet length below 1000,
the average time between packets is defined by the fastest possible transmission
speed of the hardware, which is between 60 µs and 90 µs. However, for packets
with a packet length above 1000, the time between packets is defined by the
processing speed, since the hardware cannot process the packets as fast as they
could be transmitted due to their length. Therefore, the average time between
packets has a linear correlation with packet length for packet lengths above
1000.
5.5 Packet router latency
The second interesting parameter of a router is the router propagation delay or
router packet latency. Based on simulation results, an equation was derived for
calculating the packet router latency in relation to the packet length. For path
addressed packets:
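$$P_{rl} = \begin{cases}
t_r + (7 \cdot 40\,\text{ns}) + \left\lceil (P_{len} - 7)\tfrac{1}{2} \right\rceil 40\,\text{ns} + \left\lfloor (P_{len} - 7)\tfrac{1}{2} \right\rfloor 60\,\text{ns}, & \text{if } P_{len} > 7 \\
t_r + P_{len} \cdot 40\,\text{ns}, & \text{if } P_{len} \le 7
\end{cases}$$
and for logically addressed packets:
$$P_{rl} = \begin{cases}
t_r + (17 \cdot 40\,\text{ns}) + \left\lceil (P_{len} - 17)\tfrac{1}{2} \right\rceil 40\,\text{ns} + \left\lfloor (P_{len} - 17)\tfrac{1}{2} \right\rfloor 60\,\text{ns}, & \text{if } P_{len} > 17 \\
t_r + P_{len} \cdot 40\,\text{ns}, & \text{if } P_{len} \le 17
\end{cases}$$
where $P_{rl}$ is the packet router latency, $t_r$ the routing delay, and
$P_{len}$ the packet length.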
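The piecewise latency model can be evaluated with the sketch below. It reads the
two bracketed terms as the ceiling and floor of half the number of characters
beyond the threshold (7 for path addressed, 17 for logically addressed packets),
and it assumes the single-port routing delays of 140 ns and 240 ns from
section 5.2 as default values for the routing delay. Function names are
illustrative.

```python
def packet_router_latency_ns(p_len, t_r_ns, threshold):
    """Packet router latency in ns for a packet of p_len characters."""
    if p_len <= threshold:
        return t_r_ns + p_len * 40
    extra = p_len - threshold
    ceil_half = (extra + 1) // 2   # ceiling of extra / 2
    floor_half = extra // 2        # floor of extra / 2
    return t_r_ns + threshold * 40 + ceil_half * 40 + floor_half * 60


def path_latency_ns(p_len, t_r_ns=140):
    """Path addressed packets (threshold of 7 characters)."""
    return packet_router_latency_ns(p_len, t_r_ns, 7)


def logical_latency_ns(p_len, t_r_ns=240):
    """Logically addressed packets (threshold of 17 characters)."""
    return packet_router_latency_ns(p_len, t_r_ns, 17)


for length in (5, 7, 8, 9, 1007):
    print(length, path_latency_ns(length), "ns")
```

Beyond the threshold, every pair of characters adds 40 ns + 60 ns = 100 ns, that
is, 50 ns per character on average, which is the time a 10-bit data character
occupies on a 200 Mbps link.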
Figure 5.10 depicts the results of the packet router latency test described in
section 4.2.2. Data series which have “single source” written in parentheses by
their name correspond to a test where only one input port was sending packets to
one output port. Data series which have “multiple sources” written in
parentheses by their name correspond to a test where two input ports were
sending packets to the same output port. For both tests, path addressed packets
were used. The dotted line “Expected Latency (single source)” plots the expected
latency in the case of one active input port. At a packet length of 9000, the
measured packet router latency (“Packet Latency src1 (single source)”) starts to
deviate from the expected latency line because the receiver data rate (green
line, “Data Rate (single source)”) reaches its maximum of 147 Mbps. This value
corresponds to the result found with the hardware capabilities test in
section 5.4. From here on, packets experience additional latency because the
receiver is not able to process packets fast enough.
(a) Data Rate
(b) Average Time Between Packets
Figure 5.9: SpW PXI RMAP card performance
Figure 5.10: Router packet latency
For the second part of the test, two input ports were sending packets to the
same output port. In this case, the expected latency doubles and is plotted by
the yellow dotted line “Expected latency (multiple sources)”. The expected
latency doubles because when sending a continuous stream of packets from two
input ports to one output port, one input port must always wait for the other
one to finish transmitting its packet. Therefore, the packets experience a
combined latency of packets from both input ports. When packets have the same
length, this equals two times the latency compared to the case where only one
input port is sending packets. The gray (“Packet Latency src3 (multiple
sources)”) and orange (“Packet Latency src1 (multiple sources)”) curves plot the
measured packet router latency for two different sources which were sending
packets to the same output port. At the start, both curves follow the expected
latency line for two sources, but at packet length 2000 they start to follow the
expected latency line for one source. This is because the tests are executed on
a Windows machine, where there is no guarantee that tasks will be executed
before a deadline. At packet lengths 2000, 3000, and 4000, the tasks responsible
for sending packets are scheduled in such a way that packets are sent one after
the other and not at the same time, e.g., source one sends a packet, and the
second source sends a packet right after the packet from source one has finished
transmission. Therefore, packets from both sources experience only their own
packet router latency. Similar to the first test, at packet length 5000, the
receiver data rate reaches its maximum, and the packet router latency starts to
deviate from the expected latency.
During extensive testing, some of the packets were not received in their
entirety; a few characters were missing. Since the pattern seemed to be random,
the hardware test setup was not sufficient to locate the problem. The testing
environment had to be switched back to a VHDL testbench, and signals that might
cause the problem had to be checked. The first suspicion was that the control
logic issues a grant signal before the previous packet has been completely
transmitted. This assumption was based on the fact that packet headers were
received intact. However, a separate test of the control logic, performed before
it was included in the final design, had produced flawless results, so the
problem had to be somewhere else. The remaining candidate was the SpW router
port, more precisely the FIFO buffers in the SpW CODEC, and this turned out to
be the cause. For unknown reasons, a pipeline register had been added to the
Ready Out signal, which caused a timing difference between signals and
consequently caused SpW characters to be written into the transmit FIFO even
when it was full.
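The effect of a registered ready signal can be illustrated with a small
cycle-based model. This is a behavioral sketch and not the SpW CODEC logic: it
assumes a writer that writes one character in every cycle in which it sees the
FIFO as ready, a FIFO that is not drained during the observed interval, and a
ready signal delayed by a configurable number of register stages. All names are
hypothetical.

```python
def count_overflow_writes(depth, cycles, ready_delay):
    """Count writes that hit a full FIFO when 'ready' is seen late.

    depth       -- FIFO capacity in characters
    cycles      -- number of clock cycles to simulate
    ready_delay -- register stages between the FIFO and the writer
    """
    occupancy = 0
    overflow = 0
    # Registered copies of 'ready'; an empty FIFO is ready after reset.
    pipeline = [True] * ready_delay
    for _ in range(cycles):
        ready_now = occupancy < depth            # true FIFO state this cycle
        ready_seen = pipeline[0] if ready_delay else ready_now
        if ready_seen:
            if occupancy < depth:
                occupancy += 1                   # normal write
            else:
                overflow += 1                    # write into a full FIFO
        if ready_delay:
            pipeline = pipeline[1:] + [ready_now]
    return overflow


for delay in (0, 1, 2):
    print(delay, count_overflow_writes(depth=4, cycles=10, ready_delay=delay))
```

In this model, every register stage added to the ready path lets one more
character be written after the FIFO has filled, which is consistent with the
randomly missing characters observed in the hardware tests.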
5.6 Control logic
Figure 5.11 depicts output port traffic with packets from four different sources
as a result of the test described in section 4.2.3. All packets have the same
priority level; therefore, round-robin arbitration takes place, as can be seen
in the figure. Markings indicate the order in which the packets were received.
Figure 5.12 depicts output port traffic with packets from four different
sources. All packets have the same priority level except packets from source
port 4, which have a higher priority level than the others; therefore, they are
routed as soon as the current packet finishes transmission. The markings
indicate the order in which the packets were received. Black vertical lines
indicate when a high priority packet arrived. A high priority packet arrived
while the packet marked with number 3 was being received. As soon as this packet
finished transmission, the packet with high priority was received. After that,
packets with lower priority continued to follow a cyclic pattern. A
similar situation occurred when the packet with number 8 was being received.
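The observed behavior can be mimicked with a small behavioral model of an
arbiter for one output port. This is a sketch, not the control logic
implemented in VHDL: it assumes that a larger number means a higher priority,
that ports are numbered from 1, and that a separate round-robin pointer is kept
per priority level so that the cyclic order of lower-priority packets continues
after a high-priority packet has been served. All names are illustrative.

```python
from typing import Dict, Optional


def arbitrate(requests: Dict[int, int], pointers: Dict[int, int],
              n_ports: int) -> Optional[int]:
    """Grant one input port.

    requests -- maps requesting input port -> priority of its packet
    pointers -- maps priority level -> last port granted at that level
                (updated in place)
    """
    if not requests:
        return None
    top = max(requests.values())          # highest pending priority wins
    last = pointers.get(top, 0)           # 0 means nothing granted yet
    # Round-robin: search cyclically, starting after the last granted port.
    for offset in range(1, n_ports + 1):
        port = (last - 1 + offset) % n_ports + 1
        if requests.get(port) == top:
            pointers[top] = port
            return port
    return None


pointers: Dict[int, int] = {}
same_priority = {1: 0, 2: 0, 3: 0, 4: 0}
print([arbitrate(same_priority, pointers, 4) for _ in range(6)])
print(arbitrate({1: 0, 3: 0, 4: 1}, pointers, 4))   # port 4 has high priority
print(arbitrate({1: 0, 3: 0}, pointers, 4))         # cyclic order resumes
```

In a wormhole router the grant is only issued once the current packet has
finished transmission, so a call to `arbitrate` corresponds to one such
decision point.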
Figure 5.11: Control logic - Round-Robin arbitration
Figure 5.12: Control logic - Priority arbitration
5.7 Multicast
Figure 5.13 shows a simulation result of a multicast packet transmission from
the test described in section 4.2.4. One packet with logical address x"29" and a
multicast set “01101” was sent from source port 1. Based on the multicast set
value, the packet is routed to output ports 2 and 3. First, packet characters
are stored in the transmit FIFO of source port 1 (signal tx data source port1).
After the packet is processed by the router, the packet appears at the same time
in the receive FIFOs of source ports 2 (signal rx data source port2) and 3
(signal rx data source port3), which are connected to ports 2 and 3 of the
router.
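How the multicast set “01101” maps to output ports 2 and 3 can be shown with a
short decoding sketch. It assumes that the set is written most significant bit
first, that bit 0 is the multicast enable flag (as stated in the outlook), and
that bit i selects output port i. The function name is hypothetical.

```python
from typing import List


def decode_multicast_set(bits: str) -> List[int]:
    """Return the output ports selected by a multicast set bit string."""
    value = int(bits, 2)
    if not value & 1:            # bit 0: multicast enable flag
        return []
    # Bit i (i >= 1) selects output port i.
    return [port for port in range(1, len(bits)) if (value >> port) & 1]


print(decode_multicast_set("01101"))
```

With the enable flag cleared, for example “01100”, the function returns an
empty list, i.e., the entry is not treated as a multicast entry in this model.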
5.8 Logical Address Deletion
The result of the test described in section 4.2.5 is depicted in Figure 5.14. A
packet with a logical address is routed through the router; when it leaves the
router, its logical address has been deleted. This feature is necessary when
regional addressing is used in the SpW network in order to expose the second
logical address, which
is applicable within a region.
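The effect on the packet contents can be sketched as follows. The leading
address x"20" is the one used in the test; the second address 0x45 and the
payload bytes are made-up example values, and the function name is
hypothetical.

```python
def forward_packet(packet: bytes, delete_logical_address: bool) -> bytes:
    """Return the packet as it leaves the router.

    If the deletion flag of the routing table entry is set, the leading
    logical address character is stripped, exposing the next one.
    """
    if delete_logical_address and packet:
        return packet[1:]
    return packet


# Regional addressing: region address 0x20, then node address 0x45 (example).
packet = bytes([0x20, 0x45, 0xAA, 0xBB])
print(forward_packet(packet, delete_logical_address=True).hex())
print(forward_packet(packet, delete_logical_address=False).hex())
```

After deletion, the next router or node sees 0x45 as the first character and
uses it as the logical address within the region.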
Figure 5.13: Multicast packet transmission - simulation
Figure 5.14: Logical address deletion: on the left is a transmitted packet with
logical address x"20" (marked with a red circle); on the right is the received
packet without the logical address
6 Conclusion
This chapter concludes the thesis. It provides a summary which explains what
was done and achieved in the scope of this thesis and points out some problems.
It also gives an outlook on what features and improvements could be
implemented in future SpW router design iterations.
6.1 Summary
The goal of this thesis was to design a SpW router based on a set of requirements.
The design is written as a VHDL implementation to have a certain degree of
flexibility to implement new features and to provide a simple way to iterate
the current design. The target device for the SpW router design was an RTG4
FPGA. The SpW router design task was split into smaller tasks focusing on
separate router components. Each component was separately tested before it
was included in the final design to minimize the number of possible errors.
The SpW router is based on a wormhole switching architecture in order to
meet the speed requirements of the SpW standard. It supports transmission
speeds up to 200 Mbps. It is based on a fully connected topology, meaning
any input port can be connected to any output port. In order to achieve full
connectivity, a crossbar switch was implemented. The router supports both path
and logical addressing.
The implemented SpW router has a configuration port which enables configuring
the router from anywhere within the network. It is used for accessing the
internal registers of the router and the routing table via the SpW network. It
supports all three types of RMAP commands (read, write, read-modify-write).
Since SpW networks are usually of irregular topology, the router does not
have an implemented circuit for executing a routing algorithm but instead uses
a routing table to determine the destination of packets. The routing table is
considered as a shared resource for all input ports. Therefore, a routing table
arbiter was implemented to resolve simultaneous access to the routing table by
multiple input ports. A round-robin arbitration scheme is used to ensure fair
arbitration since the priority of packets is not known at this point.
In order for the SpW router to connect to other network nodes, each SpW
router port needs a circuit for encoding/decoding SpW characters. For this
purpose, STAR-Dundee’s SpW CODEC IP core was used. Not only does it
encode/decode SpW characters but it also serializes them into a bitstream to
create a SpW standard-compliant link. Each SpW router port has a dedicated
packet
processor which is used to control the flow of incoming packets and to determine
the output port of an incoming packet. The packet processor also has an option
to delete the logical address character if an appropriate flag is set in the control
field of the routing table entry. This function is needed when regional addressing
is used since packets, in this case, have two logical addresses, one for the region
and one for addressing the correct node within the region. In order to expose the
second one, the first one is deleted upon packet arrival into the network region.
During SpW router design, the main focus was on the control logic design.
Compared to existing SpW router implementations, the control logic in the SpW
router presented in this thesis employs a dynamic priority arbitration scheme,
which means that the priority level of packets can change during run time.
Moreover, it supports multiple priority levels, in contrast to existing
implementations, where usually only two priority levels exist. Also, the control
logic has additional support for multicast packet transmission, which is not
offered by any device currently available on the market.
Key components of the SpW router, e.g., the control logic, the crossbar switch,
and the routing table arbiter, were designed with a generic approach in mind.
Unfortunately, the current SpW router implementation is not fully generic
because the SpW router ports use a CCC component as a clock recovery circuit,
which cannot be used as a generic component.
The implemented SpW router was tested on an RTG4 development kit board.
The router functionalities work as intended. However, during the routing delay
tests, it was found that packet processing is delayed by one clock cycle in some
of the input ports for a reason that is not yet known. The speculation is that
this might be due to clock skew.
One more thing to be careful about when using this SpW router implemen-
tation is to make sure to design a deadlock-free SpW network since the current
implementation does not have an implemented recovery mechanism in case of a
deadlock.
6.2 Outlook
The first thing to be done in the near future is to create a generic SpW router
soft IP core with a generic parameter which sets the number of SpW router ports.
In order to do that, the usage of CCC component for clock recovery should be
omitted. An alternative generic solution for the clock recovery is needed. Also, a
top-level HDL file should be used to connect all router components instead of a
Smart Design canvas, as is the case in the current implementation. The soft IP
core would allow the router to be easily included in a larger project.
A significant concern when including the router in a larger design is its
resource usage. In a future router design iteration, it should be investigated
whether the control logic can be optimized in order to reduce its resource
usage, since it accounts for about half of the LUTs used by the router.
The current router implementation does not support group adaptive routing.
The foundation for this feature is already prepared, e.g., the control field in
the routing table entry has a bit dedicated to enabling group adaptive routing.
However, additional modifications are needed in the packet processor and the
control logic.
An important component to add to the router is a watchdog timer. It would
be used as a recovery mechanism in case of a deadlock or a network stall.
Another feature missing in the router is a transmission of broadcast codes
which include timecodes and distributed interrupts. Therefore, in the future
design iterations, support for broadcast codes should be implemented.
At the moment, SpW router ports only have a few registers. Additional registers
could be added, for example, a status register indicating the number of
packets dropped by the input port.
The SpW standard defines two types of router ports. One is the SpW router
port, and the other one is the FIFO port which can be used to directly connect
devices local to the router without using a SpW link but rather writing data
directly into a FIFO buffer in the router port. In later design iterations, a generic
parameter could be used to define the number of regular SpW router ports and
the number of FIFO ports.
The routing table size could be reduced since the destination field and the
multicast set field both represent almost the same thing. They both indicate to
which output port or ports a packet should be routed. If the multicast enable
flag were moved from bit 0 in the multicast set to the control field, both
mentioned fields could be combined into one. Also, a routing table ROM file
generator program should be written (e.g., a Python script) to eliminate binary
ROM file editing.
Lastly, the RMAP decoder component can be used to implement an RMAP
target device, e.g., a network-connected memory. It can be easily modified to
support larger read and write operations.
Bibliography
[1] S. Parkes, SpaceWire User Guide. STAR-Dundee Limited, 2012.
[2] ECSS-E-ST-50-12C-Rev1 SpaceWire - Links, Nodes, Routers and Networks,
ESA Requirements and Standards Division Std., Rev. 1.0, May 2019.
[3] J. Duato, S. Yalamanchili, and L. M. Ni, Interconnection Networks: An
Engineering Approach. San Francisco, CA: Morgan Kaufmann, 2003.
[4] AMBA APB Protocol Specification, C ed., ARM, April 2010.
[5] How DMA Transfers Work (FPGA Module) - LabVIEW 2016
FPGA Module Help - National Instruments. Accessed 11.9.2019. [On-
line]. Available: https://zone.ni.com/reference/en-XX/help/371599M-01/
lvfpgaconcepts/fpga dma how it works/
[6] SpW-10X SpaceWire Router DATASHEET, G ed., Atmel, February 2013.
[Online]. Available: http://ww1.microchip.com/downloads/en/DeviceDoc/
doc7796.pdf
[7] GR718B Radiation-Tolerant 18x SpaceWire Router DATASHEET, 3rd ed.,
Cobham, July 2018, accessed 5.8.2019. [Online]. Available: https:
//www.gaisler.com/doc/gr718/gr718b-ds-um.pdf
[8] SpaceWire - Commercial Products. ESA. Accessed 5.8.2019. [Online].
Available: http://spacewire.esa.int/content/Devices/Commercial.php
[9] Shimafuji SpaceWire open IP cores. Shimafuji Electric Inc. Ac-
cessed 28.8.2019. [Online]. Available: https://github.com/shimafujigit?
tab=repositories
[10] SpaceWire. ESA. Accessed 30.6.2019. [Online]. Avail-
able: http://www.esa.int/Our Activities/Space Engineering Technology/
Onboard Computer and Data Handling/SpaceWire
[11] ECSS-E-ST-50-52C SpaceWire - Remote Memory Access Protocol, ESA
Requirements and Standards Division Std., February 2010.
[12] Libero SoC v12.0 and later — Microsemi. Microsemi. Ac-
cessed 20.8.2019. [Online]. Available: https://www.microsemi.com/
product-directory/design-resources/1750-libero-soc
[13] B. Chemli and A. Zitouni, “A Turn Model Based Router Design for 3D
Network on Chip,” p. 8, 2014.
[14] M. Dridi, S. Rubini, M. Lallali, M. J. S. Florez, F. Singhoff, and J.-P. Diguet,
“DAS: An Efficient NoC Router for Mixed-Criticality Real-Time Systems,”
in 2017 IEEE International Conference on Computer Design (ICCD).
Boston, MA: IEEE, November 2017, pp. 229–232, accessed 3.7.2019.
[Online]. Available: http://ieeexplore.ieee.org/document/8119215/
[15] L. Rooban and S. Dhananjeyan, “Design of Router Architecture Based On
Wormhole Switching Mode For NoC,” ISS N, vol. 3, no. 3, p. 5, 2012.
[16] S. Swapna, “Efficient Router Design for Network On Chip,” Master’s thesis,
National Institute Of Technology Rourkela, Rourkela, 2013.
[17] N. J. Boden, D. Cohen, R. E. Felderman, A. E. Kulawik, C. L. Seitz, J. N.
Seizovic, and W.-K. Su, “Myrinet: A Gigabit-per-Second Local Area Net-
work,” IEEE Micro, vol. 15, pp. 29–36, 1995.
[18] AMBA AXI and ACE Protocol Specification AXI3, AXI4, and AXI4-Lite
ACE and ACE-Lite, D ed., ARM, October 2011.
[19] ARM AMBA 5 AHB Protocol Specification AHB5, AHB-Lite, B.b ed., ARM,
October 2015.
[20] ARM Cortex-M4 32b MCU+FPU, 210DMIPS, up to 1MB Flash/192+4KB
RAM, Crypto, USB OTG HS/FS, Ethernet, 17 TIMs, 3 ADCs, 15 Comm.
Interfaces & Camera, 8th ed., STMicroelectronics, September 2016.
[21] The CoreConnect Bus Architecture, IBM, 1999, accessed 24.8.2019.
[Online]. Available: http://www.scarpaz.com/2100-papers/SystemOnChip/
ibm core connect whitepaper.pdf
[22] CoreAPB3 v4.1 Handbook, 3rd ed., Microsemi, December 2014.
[23] RM0090 Reference Manual, 18th ed., STMicroelectron-
ics, February 2019, accessed 27.8.2019. [Online]. Avail-
able: https://www.st.com/content/ccc/resource/technical/document/
reference manual/3d/6d/5a/66/b4/99/40/d4/DM00031020.pdf/files/
DM00031020.pdf/jcr:content/translations/en.DM00031020.pdf
[24] RTG4 FPGA Fabric User Guide, 4th ed., Microsemi, 2018.
[25] Inferring Microsemi RTG4 RAM Blocks, Synopsys, June 2018.
[26] RTG4 Radiation-Tolerant FPGAs — Microsemi. Accessed 28.8.2019.
[Online]. Available: https://www.microsemi.com/product-directory/
rad-tolerant-fpgas/3576-rtg4
[27] Building an APB3 Core for SmartFusion cSoC FPGAs, 3rd ed., Microsemi,
February 2012.
[28] (2019, May) SpaceWireRouterIP 6PortVersion. shimafujigit. Ac-
cessed 28.8.2019. [Online]. Available: https://github.com/shimafujigit/
SpaceWireRouterIP 6PortVersion
[29] C. McClements, SpaceWire Interface User Manual, 2nd ed., STAR-Dundee,
January 2017.
[30] Implementing SpaceWire Clock and Data Recovery in RTG4 FPGAs Appli-
cation Note, 3rd ed., Microsemi, March 2018.
[31] I. Benacer, F.-R. Boyer, and Y. Savaria, “A Fast, Single-
Instruction–Multiple-Data, Scalable Priority Queue,” IEEE Transactions
on Very Large Scale Integration (VLSI) Systems, vol. 26, no. 10,
pp. 1939–1952, October 2018, accessed 7.9.2019. [Online]. Available:
https://ieeexplore.ieee.org/document/8374984/
118 BIBLIOGRAPHY
[32] K. Shin, J. Rexford, and Sung-Whan Moon, “Scalable hardware priority
queue architectures for high-speed packet switches,” IEEE Transactions
on Computers, vol. 49, no. 11, pp. 1215–1227, November 2000, accessed
27.6.2019. [Online]. Available: http://ieeexplore.ieee.org/document/895938/
[33] B. Yuce, H. F. Ugurdag, S. Gören, and G. Dündar, “Fast and Efficient
Circuit Topologies forFinding the Maximum of n k-Bit Numbers,” IEEE
Transactions on Computers, vol. 63, no. 8, pp. 1868–1881, August 2014.
[34] M. Weber, Arbiters: Design Ideas and Coding Styles. SNUG Boston, 2001.
[35] S. Mills, “Performance of STAR-System,” STAR-Dundee, Tech. Rep.
Appendix
A Detailed state machine diagrams
This appendix provides detailed state machine diagrams. All state and
transition names match the signal names used in the VHDL implementation.
Figure A.1: Initializer APB state machine
Figure A.2: Initializer APB transfer operation state machine
Figure A.3: Initializer APB initialization phase state machine
Figure A.4: Receive data/calculate CRC state machine
Figure A.5: Transmit data/calculate CRC state machine
Figure A.6: Internal bus access state machine
Figure A.7: RMAP command decoder state machine
Figure A.8: RMAP reply encoder state machine
Figure A.9: Packet Processor state machine
B Detailed block diagrams
This appendix provides detailed block diagrams of the configuration port and
the SpW router ports.
Figure B.1 depicts a detailed version of the configuration port block diagram
from Figure 3.8. It includes all signals that are used to connect the
configuration port to the router.
Figure B.2 depicts a detailed version of the SpW router port block diagram
from Figure 3.13. It includes all signals that are used to connect the SpW
router port to the router.
Figure B.3 depicts a block diagram of the top entity of the implemented SpW
router ports with all external components. As mentioned, the CCC component is
used as a clock recovery circuit. The Port Registers block represents all
control and status registers of the SpW router ports. The Link Up signal
generator generates signals that indicate whether a SpW router port has an
active link connected to it. The Tx rate update block asserts an update signal
whenever a transmission rate divider changes in a control register. The LED
logic generates the LED color signals that correspond to the link status and
drive the LEDs on the FMC board. The LEDs serve as a debugging feature.
Figure B.1: A detailed block diagram of a configuration port
Figure B.2: A detailed block diagram of a SpW router port
Figure B.3: A detailed block diagram of SpW router ports with external components
C Code snippets
C.1 Bit selector (multiplexer) component
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use ieee.math_real.all;
-- is_power_of_2 (Section C.3) is assumed to be visible through a package

entity bit_selector is
  generic (
    N : positive := 5
  );
  port (
    -- inputs
    A   : in  std_logic_vector(N-1 downto 0);
    SEL : in  std_logic_vector(N-1 downto 0);
    -- outputs
    Y   : out std_logic
  );
end bit_selector;

architecture RTL_0 of bit_selector is
  constant log2_N : integer := integer(ceil(log2(real(N))));
  signal i_sel : std_logic_vector(log2_N-1 downto 0);
begin
  -- convert the one-hot select vector into a binary index
  B_ENC : entity work.binary_encoder
    generic map (N => N)
    port map (A => SEL, BCODE => i_sel);

  -- forward the selected bit only when exactly one select bit is set
  Y <= A(to_integer(unsigned(i_sel))) when is_power_of_2(SEL) else '0';
end architecture;
C.2 Vector selector (multiplexer) component
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use ieee.math_real.all;
-- slv_array and is_power_of_2 (Section C.3) are assumed to be visible
-- through a package

entity vector_selector is
  generic (
    N : positive := 5;
    K : positive := 8
  );
  port (
    -- inputs
    A   : in  slv_array(0 to N-1)(K-1 downto 0);
    SEL : in  std_logic_vector(N-1 downto 0);
    -- outputs
    Y   : out std_logic_vector(K-1 downto 0)
  );
end vector_selector;

architecture RTL_0 of vector_selector is
  constant log2_N : integer := integer(ceil(log2(real(N))));
  signal i_sel : std_logic_vector(log2_N-1 downto 0);
begin
  -- convert the one-hot select vector into a binary index
  B_ENC : entity work.binary_encoder
    generic map (N => N)
    port map (A => SEL, BCODE => i_sel);

  -- forward the selected vector only when exactly one select bit is set
  Y <= A(to_integer(unsigned(i_sel))) when is_power_of_2(SEL) else
       (others => '0');
end architecture;
C.3 Is power of 2 function
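-- requires ieee.std_logic_1164 and ieee.numeric_std
function is_power_of_2 (x : std_logic_vector) return boolean is
  variable u : unsigned(x'length-1 downto 0) := unsigned(x);
begin
  -- true when exactly one bit is set: x is non-zero and clearing its
  -- lowest set bit leaves zero
  return (u /= 0) and ((u and (u - 1)) = 0);
end function;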
C.4 Binary Encoder component
library ieee;
use ieee.std_logic_1164.all;
use ieee.math_real.all;

entity binary_encoder is
  generic (
    N : natural := 8
  );
  port (
    -- inputs
    A     : in  std_logic_vector(N-1 downto 0);
    -- outputs
    BCODE : out std_logic_vector(integer(ceil(log2(real(N))))-1 downto 0)
  );
end binary_encoder;

architecture arch_0 of binary_encoder is
  constant log2_N : integer := integer(ceil(log2(real(N))));

  type mask_2d_type is array (log2_N-1 downto 0) of
    std_logic_vector(N-1 downto 0);
  signal mask : mask_2d_type;

  -- row i marks every input index k whose binary code has bit i set
  function gen_or_mask return mask_2d_type is
    variable or_mask : mask_2d_type;
  begin
    for i in log2_N-1 downto 0 loop
      for k in N-1 downto 0 loop
        if (k / (2**i)) mod 2 = 1 then
          or_mask(i)(k) := '1';
        else
          or_mask(i)(k) := '0';
        end if;
      end loop;
    end loop;
    return or_mask;
  end function;
begin
  mask <= gen_or_mask;

  process (mask, A)
    variable tmp_row : std_logic_vector(N-1 downto 0);
  begin
    for i in log2_N-1 downto 0 loop
      tmp_row  := A and mask(i);
      BCODE(i) <= or tmp_row;  -- VHDL-2008 reduction OR
    end loop;
  end process;
end architecture;
C.5 Ready signal - crossbar
-- connect the correct destination and source ready pins
READY_sw : for i in 0 to N-1 generate
  -- single destination: select the ready bit of the chosen output port
  READY_sel : entity work.bit_selector
    generic map (N => N)
    port map (Ready_Out_port, sel_for_source_port(i), i_ready_in(i));

  -- multicast: AND together the ready bits of all selected output ports
  process (all)
    variable tmp : std_logic;
  begin
    tmp := '1';
    for j in 0 to sel_for_source_port(i)'length-1 loop
      if sel_for_source_port(i)(j) = '1' then
        tmp := tmp and Ready_Out_port(j);
      end if;
    end loop;
    if unsigned(sel_for_source_port(i)) = 0 then
      i_ready_in_mc(i) <= '0';  -- no output port selected
    else
      i_ready_in_mc(i) <= tmp;
    end if;
  end process;

  Ready_In_port(i) <= i_ready_in(i) when is_power_of_2(sel_for_source_port(i))
                      else i_ready_in_mc(i);
end generate;
D Register descriptions
This appendix provides a detailed description of the SpW router registers.
D.1 Configuration port registers
CONFIG0 - Configuration register 0
Base address: 0x20000000
Offset address: 0x00
Reset value: 0x00000000
• Bits 31..16: Reserved. Must be kept at reset value.
• Bits 15..8 - KEY7..0: RMAP key. These bits define the RMAP key to
be used for authenticating RMAP commands received by the configuration
port.
• Bits 7..0 - LADDR7..0: Logical Address. These bits define the logical
address of the configuration port.
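The field layout above can be captured in a small VHDL package. The following is a minimal sketch, not the thesis implementation: the package name config0_fields and the functions get_key, get_laddr, and pack_config0 are hypothetical names introduced for illustration; only the bit positions are taken from the description of CONFIG0.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package config0_fields is
  -- Bit positions from the CONFIG0 description (names are illustrative)
  subtype KEY_RANGE   is natural range 15 downto 8;
  subtype LADDR_RANGE is natural range 7 downto 0;

  function get_key   (reg : std_logic_vector(31 downto 0)) return std_logic_vector;
  function get_laddr (reg : std_logic_vector(31 downto 0)) return std_logic_vector;
  function pack_config0 (key, laddr : std_logic_vector(7 downto 0))
    return std_logic_vector;
end package;

package body config0_fields is
  -- Extract the 8-bit RMAP key
  function get_key (reg : std_logic_vector(31 downto 0)) return std_logic_vector is
  begin
    return reg(KEY_RANGE);
  end function;

  -- Extract the 8-bit logical address
  function get_laddr (reg : std_logic_vector(31 downto 0)) return std_logic_vector is
  begin
    return reg(LADDR_RANGE);
  end function;

  -- Build a register word; reserved bits 31..16 stay at zero
  function pack_config0 (key, laddr : std_logic_vector(7 downto 0))
    return std_logic_vector is
    variable reg : std_logic_vector(31 downto 0) := (others => '0');
  begin
    reg(KEY_RANGE)   := key;
    reg(LADDR_RANGE) := laddr;
    return reg;
  end function;
end package body;
```

With these definitions, pack_config0(x"20", x"FE") yields the word 0x000020FE, and get_key applied to that word returns 0x20.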
D.2 SpaceWire port registers
Pn CR0 - Port n control register 0
Base address: 0x30000000
Offset address: 0x00+n
Reset value : 0x00000000
• Bits 31..9: Reserved. Must be kept at reset value.
• Bits 8..4 - TXR4..0: Transmission rate divider value. These bits
define the clock divider value and control the output transmission speed.
• Bit 3 - FLU: Flush transmitter FIFO buffer. This bit is used to flush
the transmitter FIFO buffer in the SpW router port. This bit should be set
only when the SpW link is not running.
• Bits 2..0 - LCTRL2..0: Link control. These bits are used to control
the state of the SpW link.
– LCTRL2 - Auto start: Auto start the link. When this bit is set,
the link state machine waits in the Ready state until the first NULL
character is received.
– LCTRL1 - Link Disable: When set, this bit disables or stops the
link. It overrides the LCTRL0 bit.
– LCTRL0 - Link Start: When set, this bit enables the link to
establish a connection.
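As an illustration of how a control word for this register could be assembled, consider the following sketch. The package name port_cr0_fields, the constants, and the function make_cr0 are hypothetical and not part of the thesis code; the bit positions follow the description above. The FLU bit is left at zero here because it should only be set while the link is not running.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package port_cr0_fields is
  -- Bit positions from the Pn CR0 description (names are illustrative)
  constant LCTRL_START_BIT     : natural := 0;
  constant LCTRL_DISABLE_BIT   : natural := 1;
  constant LCTRL_AUTOSTART_BIT : natural := 2;
  constant FLU_BIT             : natural := 3;
  subtype  TXR_RANGE is natural range 8 downto 4;

  function make_cr0 (
    txr_div                              : natural range 0 to 31;
    auto_start, link_disable, link_start : std_logic
  ) return std_logic_vector;
end package;

package body port_cr0_fields is
  function make_cr0 (
    txr_div                              : natural range 0 to 31;
    auto_start, link_disable, link_start : std_logic
  ) return std_logic_vector is
    -- reserved bits 31..9 and the FLU bit stay at zero
    variable reg : std_logic_vector(31 downto 0) := (others => '0');
  begin
    reg(TXR_RANGE)           := std_logic_vector(to_unsigned(txr_div, 5));
    reg(LCTRL_AUTOSTART_BIT) := auto_start;
    reg(LCTRL_DISABLE_BIT)   := link_disable;
    reg(LCTRL_START_BIT)     := link_start;
    return reg;
  end function;
end package body;
```

For example, make_cr0(4, '1', '0', '1') produces 0x00000045: divider value 4 in bits 8..4, Auto start and Link Start set, Link Disable cleared.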
Pn SR0 - Port n status register 0
Base address: 0x30000000
Offset address: 0x40+n
Reset value : 0x00000000
• Bits 31..21: Reserved. Must be kept at reset value.
• Bits 20..10 - LERRS10..0: Link Error Status
– LERRS10 - Disc Run Err: When this bit is set, it indicates a
disconnect error in the Run state.
– LERRS9 - Parity Run Err: When this bit is set, it indicates a
parity error in the Run state.
– LERRS8 - Escape Run Err: When this bit is set, it indicates an
escape error in the Run state.
– LERRS7 - Credit Run Err: When this bit is set, it indicates a
transmit or receive credit error in the Run state.
– LERRS6 - Disc Err: When this bit is set, it indicates a disconnect
error detected by the receiver.
– LERRS5 - Parity Err: When this bit is set, it indicates a parity
error detected by the receiver.
– LERRS4 - Escape Err: When this bit is set, it indicates an escape
error detected by the receiver.
– LERRS3 - Rx Credit Err: When this bit is set, it indicates that a
receive credit error was detected.
– LERRS2 - Tx Credit Err: When this bit is set, it indicates that a
transmit credit error was detected.
– LERRS1 - Nchar Seq Err: When this bit is set, it indicates that a
start-up sequence error was detected (time-code/data before first FCT).
– LERRS0 - Tcode Seq Err: When this bit is set, it indicates that a
start-up sequence error was detected (time-code/data before first FCT).
• Bits 9..7 - LSTATE2..0: Link State. These bits represent the interface
state encoded into three bits.
LSTATE2  LSTATE1  LSTATE0  State
0        0        0        ErrorReset
0        0        1        ErrorWait
0        1        0        Ready
0        1        1        Started
1        0        0        Connecting
1        0        1        Run
1        1        0        Undefined
1        1        1        Undefined
• Bits 6..0 - LS6..0: Link Status
– LS6 - Link Running: When this bit is set, it indicates that the
interface state machine is in the Run state.
– LS5 - Got NULL: When this bit is set, it indicates that the receiver
got a NULL. Remains asserted after the first NULL.
– LS4 - Got FCT: When this bit is set, it indicates that the receiver
got an FCT. Remains asserted after the first FCT.
– LS3 - Got NChar: When this bit is set, it indicates that the receiver
got an N-Char. Remains asserted after the first N-Char.
– LS2 - Got Timecode: When this bit is set, it indicates that the
receiver got a time-code. Remains asserted after the first time-code.
– LS1 - Tx Has Credit: When this bit is set, it indicates that the
transmitter has credit to send one more data character.
– LS0 - Rx Expecting Data: When this bit is set, it indicates that
the receive buffer has sent FCTs and is therefore expecting data from
the other end.
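A status word read from this register can be decoded as sketched below. The package name port_sr0_fields, the enumeration literals, and the two functions are hypothetical names chosen for this example; the bit positions and the state encoding follow the description and table above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package port_sr0_fields is
  -- Link states as encoded in LSTATE2..0 (literal names are illustrative)
  type link_state_t is (ST_ERROR_RESET, ST_ERROR_WAIT, ST_READY, ST_STARTED,
                        ST_CONNECTING, ST_RUN, ST_UNDEFINED);

  function get_link_state (reg : std_logic_vector(31 downto 0)) return link_state_t;
  function has_link_error (reg : std_logic_vector(31 downto 0)) return boolean;
end package;

package body port_sr0_fields is
  -- Decode bits 9..7 into the link state
  function get_link_state (reg : std_logic_vector(31 downto 0)) return link_state_t is
    variable lstate : std_logic_vector(2 downto 0) := reg(9 downto 7);
  begin
    case lstate is
      when "000"  => return ST_ERROR_RESET;
      when "001"  => return ST_ERROR_WAIT;
      when "010"  => return ST_READY;
      when "011"  => return ST_STARTED;
      when "100"  => return ST_CONNECTING;
      when "101"  => return ST_RUN;
      when others => return ST_UNDEFINED;
    end case;
  end function;

  -- True when any of the LERRS10..0 bits (bits 20..10) is set
  function has_link_error (reg : std_logic_vector(31 downto 0)) return boolean is
    constant NO_ERROR : std_logic_vector(10 downto 0) := (others => '0');
  begin
    return reg(20 downto 10) /= NO_ERROR;
  end function;
end package body;
```

For instance, a status word of 0x000002C0 has bits 9..7 equal to "101" and LS6 set, so get_link_state returns ST_RUN and has_link_error returns false.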
E Routing Table entry detailed description
The routing table holds the information necessary for routing packets. This
information includes the destination port or ports (multicast packet) of an
incoming packet, the priority of the packet, and some additional control flags.
The routing table is implemented as an SRAM memory block with separate read
and write ports. The write port has an aspect ratio of 768 x 32 bits, whereas
the read port has an aspect ratio of 256 x 96 bits. A 10-bit address is used
to address the routing table. In case of a bus read operation, the 8 LSBs of
the address are used as the SRAM address and the remaining MSBs are used to
decode which field of the routing table entry should be put on the APB data
bus. Input ports, on the other hand, always read the whole routing table entry
(96 bits), since all fields are needed to route a packet correctly.
Destination fields are addressed with base address 0x000, Multicast set fields
with base address 0x100, and Control fields with base address 0x200.
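The address split described above can be sketched as follows. This is an illustration only: the package name routing_table_addr, the type rt_field_t, and the functions rt_field and rt_entry are hypothetical, and the handling of addresses from 0x300 upwards is an assumption, since the text does not describe that range.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package routing_table_addr is
  -- Which 32-bit field of the 96-bit entry is addressed (illustrative names)
  type rt_field_t is (RT_DESTINATION, RT_MULTICAST, RT_CONTROL, RT_INVALID);

  function rt_field (addr : std_logic_vector(9 downto 0)) return rt_field_t;
  function rt_entry (addr : std_logic_vector(9 downto 0)) return natural;
end package;

package body routing_table_addr is
  -- The two MSBs of the 10-bit address select the field
  function rt_field (addr : std_logic_vector(9 downto 0)) return rt_field_t is
    variable sel : std_logic_vector(1 downto 0) := addr(9 downto 8);
  begin
    case sel is
      when "00"   => return RT_DESTINATION;  -- base address 0x000
      when "01"   => return RT_MULTICAST;    -- base address 0x100
      when "10"   => return RT_CONTROL;      -- base address 0x200
      when others => return RT_INVALID;      -- assumption: range not described
    end case;
  end function;

  -- The eight LSBs select one of the 256 routing table entries
  function rt_entry (addr : std_logic_vector(9 downto 0)) return natural is
  begin
    return to_integer(unsigned(addr(7 downto 0)));
  end function;
end package body;
```

Under these assumptions, address 0x120 selects the Multicast set field of entry 32, and address 0x220 selects the Control field of the same entry.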
Routing Table entry
Destination field
• Bits 31..0 - Destination port31..0: Destination port. These bits define
the output port of an incoming packet when the logical address is used. Each
bit position corresponds to an output port with the same number. E.g., if
packets need to be routed through output port 3, bit 3 should be set.
Multicast set field
• Bits 31..1 - Multicast set31..1: Multicast set. These bits define the
output ports of an incoming packet when the logical address is used and
multicast is enabled. Each bit position corresponds to an output port with
the same number. E.g., if packets need to be routed through output ports 2
and 3, bits 2 and 3 should be set.
• Bit 0 - MCEN: Multicast enable. When set, it enables multicast
transmission of a packet.
Control field
• Bits 31..10: Reserved. Must be kept at reset value.
• Bit 9 - GAE: Group Adaptive routing enable. When set, it enables
group adaptive routing.
• Bit 8 - LAD: Logical Address Deletion. When set, it enables logical
address deletion.
• Bits 7..0 - Priority7..0: Priority level. These bits define the priority
level of a packet.
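The three fields of a routing table entry could be unpacked into a record as sketched below. The package routing_entry_fields, the record routing_entry_t, and both functions are hypothetical names. The function output_ports additionally assumes that the Multicast set replaces the Destination field whenever MCEN is set; the field descriptions suggest this, but the sketch does not claim to reproduce the implemented routing logic.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package routing_entry_fields is
  type routing_entry_t is record
    destination : std_logic_vector(31 downto 0);  -- Destination field, bits 31..0
    multicast   : std_logic_vector(31 downto 1);  -- Multicast set, bits 31..1
    mcen        : std_logic;                      -- Multicast set field, bit 0
    gae         : std_logic;                      -- Control field, bit 9
    lad         : std_logic;                      -- Control field, bit 8
    priority    : std_logic_vector(7 downto 0);   -- Control field, bits 7..0
  end record;

  function decode_entry (dest_field, mc_field, ctrl_field : std_logic_vector(31 downto 0))
    return routing_entry_t;
  function output_ports (e : routing_entry_t) return std_logic_vector;
end package;

package body routing_entry_fields is
  -- Split the three 32-bit fields into named members
  function decode_entry (dest_field, mc_field, ctrl_field : std_logic_vector(31 downto 0))
    return routing_entry_t is
    variable e : routing_entry_t;
  begin
    e.destination := dest_field;
    e.multicast   := mc_field(31 downto 1);
    e.mcen        := mc_field(0);
    e.gae         := ctrl_field(9);
    e.lad         := ctrl_field(8);
    e.priority    := ctrl_field(7 downto 0);
    return e;
  end function;

  -- Assumption: with MCEN set the multicast set selects the output ports,
  -- otherwise the destination field does. Bit 0 is MCEN, so port 0 is
  -- never part of the returned multicast mask.
  function output_ports (e : routing_entry_t) return std_logic_vector is
  begin
    if e.mcen = '1' then
      return e.multicast & '0';
    else
      return e.destination;
    end if;
  end function;
end package body;
```

As an example, a Multicast set field of 0x0000000D (bits 3 and 2 plus MCEN) yields the 32-bit port mask 0x0000000C, i.e. output ports 2 and 3.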
F RMAP conformance statement
This product conforms to the SpaceWire RMAP Target-only specification of the
ECSS SpaceWire Protocols Standard (ECSS-E-ST-50-52).
F.1 RMAP write command
This product conforms to the SpaceWire RMAP Write specification of the ECSS
SpaceWire Protocols Standard (ECSS-E-ST-50-52).
F.2 RMAP read command
This product conforms to the SpaceWire RMAP Read specification of the ECSS
SpaceWire Protocols Standard (ECSS-E-ST-50-52).
F.3 RMAP read-modify-write command
This product conforms to the SpaceWire RMAP Read-Modify-Write specification
of the ECSS SpaceWire Protocols Standard (ECSS-E-ST-50-52).
Table F.1: RMAP write command characteristics
Action           Supported   Maximum data length (bytes)   Not-aligned access accepted
8-bit write      No          -                             -
16-bit write     No          -                             -
32-bit write     Yes         4                             -
64-bit write     No          -                             -
Verified write   Yes         4                             -

Word or byte address              Word address
Endian order                      Little endian, i.e. first byte received goes in
                                  least significant byte of memory location
Accepted logical address          0xFE at power-on; later it can be changed in
                                  the control register
Target logical address in reply   What was in command
Accepted keys                     0x20; later it can be changed in the control
                                  register
Accepted address ranges           0x10000000-0x100002FF,
                                  0x20000000-0x20000003,
                                  0x30000000-0x30000003
Accepted incrementation           No increment
Status code returned              All
Table F.2: RMAP read command characteristics
Action        Supported   Maximum data length (bytes)   Not-aligned access accepted
8-bit read    No          -                             -
16-bit read   No          -                             -
32-bit read   Yes         4                             -
64-bit read   No          -                             -

Word or byte address              Word address
Endian order                      Little endian, i.e. first byte received goes in
                                  least significant byte of memory location
Accepted logical address          0xFE at power-on; later it can be changed in
                                  the control register
Target logical address in reply   What was in command
Accepted keys                     0x20; later it can be changed in the control
                                  register
Accepted address ranges           0x10000000-0x100002FF,
                                  0x20000000-0x20000003,
                                  0x30000000-0x30000003,
                                  0x30000040-0x30000043
Accepted incrementation           No increment
Status code returned              All
Table F.3: RMAP read-modify-write command characteristics
Action                       Supported   Maximum data length (bytes)   Not-aligned access accepted
8-bit read-modify-write      No          -                             -
16-bit read-modify-write     No          -                             -
32-bit read-modify-write     Yes         4                             -
64-bit read-modify-write     No          -                             -

Word or byte address              Word address
Endian order                      Little endian, i.e. first byte received goes in
                                  least significant byte of memory location
Accepted logical address          0xFE at power-on; later it can be changed in
                                  the control register
Target logical address in reply   What was in command
Accepted keys                     0x20; later it can be changed in the control
                                  register
Accepted address ranges           0x10000000-0x100002FF,
                                  0x20000000-0x20000003,
                                  0x30000000-0x30000003
Accepted incrementation           No increment
Status code returned              All
G Simulation Timing Diagrams
Figure G.1 shows simulation results for the circuit described in Section 3.3.
Write and read transfer operations without additional wait states were
simulated. The master waits in the IDLE state (i_apb_state signal, mark 1) for
the transfer initiation (i_transfer signal, mark 2). When triggered, it
performs four consecutive write operations (mark 3), followed by four
consecutive read operations to verify that the data was successfully written
into the SRAM (mark 5). At the end, the done flag is asserted to indicate the
end of the test (mark 7), at which point the bus returns to the IDLE state
(mark 8). Mark 4 indicates the point at which the transition from write to
read operations occurs. Mark 6 indicates the assertion of the valid data
signal, which tells the master that the data on the PRDATA bus is valid. The
state of the APB slave wrapper is indicated by the i_slave_state signal.
Figure G.2 shows simulation results for the same transfer operations. In this
simulation run, however, the slave device waits for four clock cycles before
executing the transfer operation. The wait states are marked in blue. For
convenience, all other key events are marked in the same way as in the
previous example.
Figure G.3 shows simulation results of the Ready In signal generated in the
crossbar switch for an ordinary and a multicast packet. Three ports were
simulated, so eight different combinations of output port states exist,
indicated by the Ready_Out_port signals. Likewise, eight combinations of
selected ports exist, indicated by sel_for_source_port. The
sel_for_source_port signal is observed because the Ready signal is connected
from an output port to an input port. Mark 1 indicates a scenario in which a
multicast packet is processed with output ports 2 and 3 as the destination.
The Ready In signal is therefore active only when both output ports are ready,
indicated by a yellow line. Mark 2 indicates a scenario in which a multicast
packet is processed with all output ports as the destination. The Ready In
signal is therefore active only when all output ports are ready, indicated by
a yellow line.
Figure G.4 shows simulation results of the control logic signals when traffic
from multiple sources is simulated. Five ports are simulated, four of which
(index 0-3) have the same priority, while the fifth (index 4) has a higher
priority, indicated by the priority_port signal (the lower the number, the
higher the priority). All ports send packets to the same destination,
indicated by the destination_of_port signal. At the beginning, all ports issue
their request signals at the same time (mark 1). Based on the arbitration
rules, the port with the highest priority receives the grant signal (mark 2).
After that, the grant signal is issued to one of the ports with lower
priority, based on the current position of the round-robin token (mark 3).
Between two grant signal assertions, all grant signals are zero (mark 4),
which is consistent with the four-phase handshake protocol. Whenever a high
priority packet arrives, the port request is handled immediately after the
current packet finishes its transmission (grant signal de-asserted, mark 5).
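The arbitration behavior observed in Figure G.4 can be summarized by a small behavioral model. The sketch below is not the control logic implemented in the router; it is a simplified combinational function under stated assumptions: the package name prio_rr_arbiter, the type prio_array_t, and the function arbitrate are hypothetical, the number of ports is fixed at five to match the simulation, and the update of the round-robin token and the four-phase handshake are left out.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package prio_rr_arbiter is
  constant N_PORTS : positive := 5;  -- matches the five simulated ports
  type prio_array_t is array (0 to N_PORTS-1) of natural range 0 to 255;

  -- Returns a one-hot grant vector, or all zeros when nothing is requested.
  function arbitrate (
    request : std_logic_vector(N_PORTS-1 downto 0);
    prio    : prio_array_t;                   -- lower number, higher priority
    token   : natural range 0 to N_PORTS-1    -- round-robin start position
  ) return std_logic_vector;
end package;

package body prio_rr_arbiter is
  function arbitrate (
    request : std_logic_vector(N_PORTS-1 downto 0);
    prio    : prio_array_t;
    token   : natural range 0 to N_PORTS-1
  ) return std_logic_vector is
    variable grant  : std_logic_vector(N_PORTS-1 downto 0) := (others => '0');
    variable found  : boolean := false;
    variable best   : natural range 0 to 255 := 255;
    variable winner : natural range 0 to N_PORTS-1 := 0;
    variable idx    : natural range 0 to N_PORTS-1;
  begin
    -- Scan all ports starting at the token. Only a strictly better priority
    -- replaces the current winner, so among equal priorities the first
    -- requesting port at or after the token position is kept.
    for k in 0 to N_PORTS-1 loop
      idx := (token + k) mod N_PORTS;
      if (request(idx) = '1') and ((not found) or (prio(idx) < best)) then
        found  := true;
        best   := prio(idx);
        winner := idx;
      end if;
    end loop;

    if found then
      grant(winner) := '1';
    end if;
    return grant;
  end function;
end package body;
```

With priorities (1, 1, 1, 1, 0) for ports 0 to 4 and all five requests active, the function grants port 4 regardless of the token, which corresponds to mark 2. Once port 4 no longer requests and the token points at port 2, the grant goes to port 2, which corresponds to the round-robin selection at mark 3.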
Figure G.1: APB bus simulation - write/read data from SRAM
Figure G.2: APB bus simulation - write/read data from SRAM with wait states
Figure G.3: Crossbar switch - Ready in signal simulation
Figure G.4: Control logic - high priority request simulation
