Performance Analysis in IP-Based Industrial Communication Networks by Beran, Jan
BRNO UNIVERSITY OF TECHNOLOGY (VYSOKÉ UČENÍ TECHNICKÉ V BRNĚ)
FACULTY OF ELECTRICAL ENGINEERING AND COMMUNICATION
DEPARTMENT OF CONTROL AND INSTRUMENTATION
PERFORMANCE ANALYSIS IN IP-BASED
INDUSTRIAL COMMUNICATION NETWORKS
(Czech title: ANALÝZA VÝKONNOSTI V IP PRŮMYSLOVÝCH KOMUNIKAČNÍCH SÍTÍCH)
DOCTORAL THESIS
AUTHOR: Ing. JAN BERAN
SUPERVISOR: prof. Ing. FRANTIŠEK ZEZULKA, CSc.
BRNO 2010
Abstrat
With the growing sale of ontrol systems and their distributed nature, ommuniation net-
works have been gaining importane and new researh hallenges have been appearing. The
major problem, ontrary to previously used ontrol systems with dediated ommuniation ir-
uits, is time-varying delay of ontrol and measurement signals introdued by paket-swithed
networks, suh as Ethernet. The real-time issues in these networks have been takled by
proper adaptations.
Nevertheless, market trend analyses foresee also future adoptions of IP-based ommuni-
ation networks in industrial automation for time-ritial run-time data exhange. IP-based
ommuniation has only a limited support from the existing instrumentation in industrial
automation. This hallenge has reently been tehnially takled within the Virtual Automa-
tion Networks (VAN) projet by adopting the quality of servie (QoS) arhiteture delivering
soft-real-time ommuniation behaviour. This dissertation fouses on the real-time perform-
ane aspets from the analytial point of view and provides means for appliability assessment
of IP-based ommuniation for future industrial appliations.
The main objetive of this dissertation is establishment of a relevant modelling framework
based on network alulus whih will assist worst-ase performane analysis of temporal be-
haviour of IP-based ommuniation networks and networking devies intended for future use
in industrial automation.
Empirial analysis was used to identify dominant fators inuening the temporal perform-
ane of networking devies and for model parameter identiation. The empirial analysis
makes use of the TestQoS tool developed for this purpose. Minor extensions to the network
alulus framework were proposed enabling to model the required temporal behaviour of net-
working devies. Several exemplary models were inferred as a result of lassiation of dierent
networking devie arhitetures and empirially identied dominant fators. A novel method
for parameter identiation was used with the modelled devies. Finally, two temporal models
of networking devies (a swith and a router) were validated against empirial observations.
Keywords: Performane Analysis, Industrial Communiation, Network Calulus, Quality
of Servie, IP Router, TestQoS, Virtual Automation Networks
i
Abstract (translated from Czech)
With the growing number of control systems and their increasingly distributed nature,
communication networks are gaining importance and new research trends are emerging. The
main issue in this area, in contrast to earlier control systems that used dedicated
communication circuits, is the time-varying delay of measurement and control signals
caused by packet-oriented means of communication, such as Ethernet. The real-time
communication aspects of these networks have already been resolved successfully.
Nevertheless, market trend analyses predict that IP networks will also be used in
industrial communication for time-critical process data exchange. IP communication,
however, has only limited support in instrumentation for industrial automation. This
challenge was recently addressed technically within the Virtual Automation Networks (VAN)
project by employing quality of service (QoS) mechanisms, which are able to provide a soft
level of real-time communication. The presented dissertation focuses on real-time
performance aspects from the analytical point of view and offers a means of assessing the
usability of IP communication for future industrial applications.
The main objective of this dissertation is to create a suitable modelling framework based
on network calculus that will help to carry out worst-case performance analysis of the
temporal behaviour of IP communication networks and of their elements intended for future
use in industrial automation.
Empirical analysis was used in this work to determine the dominant factors influencing
the temporal behaviour of networking devices and to identify the parameters of the models
of these devices. The empirical analysis uses the TestQoS tool developed for these
purposes. Minor extensions of the network calculus framework were proposed, which were
necessary for modelling the temporal behaviour of the devices used. Several typical device
models were created as a result of the classification of different networking device
architectures and of the empirically found dominant factors. A new parameter
identification method was used with the modelled devices. The work concludes with the
validation of the temporal models of two networking devices (a switch and a router)
against empirical observations.
Keywords: performance analysis, industrial communication, network calculus, quality of
service, IP router, TestQoS, virtual automation networks
Bibliographi Referene
BERAN, J. Performane analysis in ip-based industrial ommuniation networks. Brno: Brno
University of Tehnology, Faulty of Eletrial Engineering and Communiation, 2010. 140 p.
Supervisor: prof. Ing. Franti²ek Zezulka, CS.
iii
Declaration
"I declare that I have written my dissertation on the topic Performance Analysis in
IP-Based Industrial Communication Networks independently, under the guidance of my
supervisor and using the technical literature and other sources of information, all of
which are cited in the thesis and listed in the bibliography at the end of the thesis.
As the author of the dissertation, I further declare that in connection with the creation
of this dissertation I have not infringed the copyrights of third parties; in particular,
I have not unlawfully interfered with the personal copyrights of others, and I am fully
aware of the consequences of violating Section 11 et seq. of Copyright Act No. 121/2000
Coll., including the possible criminal consequences arising from Section 152 of Criminal
Code No. 140/1961 Coll."
In Brno, date: Signature:
Aknowledgements
Hereby, I would like to thank my supervisor Prof. Ing. Franti²ek Zezulka, CS. for providing
me with interesting opportunities and valuable advie supporting my work. Many thanks
belong to Ing. Tomá² Neuºil, Ph.D. and Ing. Jakub Hrabe, Ph.D., and other olleagues and
friends for their kind support and motivation.
Finally, the greatest deal of appreiation belongs to my wife Dá²a for her unremitting and
unonditional support for the whole duration of this work.
This work has been supported by the Virtual Automation Networks, an Integrated Projet
funded by the European Commission under Information Soiety Tehnology (IST) priority
within the 6th Framework Programme (6FP) - FP6/2004/IST/NMP/2 - 016969 VAN.
v
Contents
1 Introduction
1.1 Dissertation Structure
1.2 Related Work
2 State of the Art
2.1 Internetworking Technologies
2.1.1 Open System Interconnection
2.1.2 Ethernet Technology
2.1.3 Internet Protocol
2.1.4 Transport Control Protocol and User Datagram Protocol
2.2 Communication in Industrial Automation
2.2.1 Real-Time Systems and Real-Time Communication
2.2.2 Real-Time Communication in Ethernet-based Fieldbuses
2.3 Networking Device Architectures
2.3.1 Hubs and Switches
2.3.2 IP Routers
2.3.3 Buffering Strategies
2.4 Quality of Service
2.4.1 Approaches and Mechanisms
2.4.2 Classification
2.4.3 Traffic Shaping and Policing
2.4.4 Congestion Management
2.4.5 Congestion Avoidance
2.5 Analytical Framework for QoS Modelling
2.5.1 Introduction to Network Calculus and (min,+) Algebra
2.5.2 Arrival Curves
2.5.3 Service Curves
2.5.4 Performance Bounds
2.5.5 From Generic Service Nodes to Complex Topologies
2.5.6 Complementary Issues in Network Calculus
3 Dissertation Objectives
3.1 Motivation for the Research
3.2 Main Objectives
4 Empirical Analysis of Performance Bounds
4.1 TestQoS: Quality of Service Test Bed
4.1.1 Test Bed Architecture
4.1.2 Measurement Principle
4.2 Measurement Results
4.2.1 Switch-Related Measurements
4.2.2 Router-Related Measurements
5 Network Calculus Extensions
5.1 Rate-Variable-Latency Service Curve
5.2 Extended Results on Systems with Losses
5.2.1 Confrontation of Loss-Rate Analysis with Backlog Bounds
6 Networking Device Modelling
6.1 Definition of Model Structure
6.1.1 Networking Device Buffering Strategy
6.1.2 Switch Fabric Model Structure
6.1.3 Outgoing Interface Model Structure
6.2 Port-to-Port Service Curve of a Networking Device
6.2.1 PBOO/PMOO Approach
6.2.2 Extended PBOO Approach
6.2.3 PMOO/PBOO vs. EPBOO Comparison
6.3 Identification of the Model Parameters
6.3.1 Switch-Fabric Parameters
6.3.2 Outgoing Interface Parameters
6.3.3 Conclusion
7 Validation of Models of Networking Devices
7.1 HP ProCurve 1800-8G Switch
7.1.1 Switch Parametrisation
7.1.2 Port-To-Port Service Curve
7.1.3 Model Simulation
7.2 Cisco 2811 Router with HWIC-2FE Module
7.2.1 Switch Fabric Parametrisation
7.2.2 Outgoing Interface Parametrisation
7.2.3 Port-to-Port Service Curves
7.2.4 Model Simulation
7.3 Validation Assessment
8 Conclusion
8.1 Unresolved Issues and Further Research
8.2 Final Remarks
Bibliography
A Results of the Measurements
Abbreviations
Symbols
List of Figures
2.1 Relation of OSI and Ethernet layers
2.2 Ethernet frame content
2.3 VLAN frame tagging
2.4 IP datagram format
2.5 UDP datagram header format
2.6 General switch architecture
2.7 General router architecture
2.8 Bus-based router architecture (single processor)
2.9 Bus-based router architecture (multiple processors)
2.10 Router architecture with switch fabric
2.11 Crossbar switch fabric [15, 184]
2.12 Router outgoing interface
2.13 Switch virtual interface and hardware ports
2.14 DiffServ architecture
2.15 Service element
2.16 Service curve of arbitrary multiplexer
2.17 Service curve of strict priority multiplexer
2.18 Service curve concatenation
3.1 Real-time communication grid
4.1 TestQoS architecture
4.2 3D histogram of the packet latency distribution
4.3 Progress of the maximum latency in time
4.4 Maximum and average packet latencies vs. outgoing port load
4.5 Maximum packet latencies vs. outgoing port load and packet length
4.6 Maximum packet latencies vs. outgoing port load with additional switches
4.7 Packet latencies passing through 2 and 4 switches
4.8 Maximum packet latency percentile (99.5 %) vs. packet length and forwarding mechanism
4.9 Maximum packet latency vs. switch fabric load
4.10 Maximum packet latency vs. outgoing port load and scheduling algorithm
5.1 Service curve parameter T vs. arrival curve parameter r
5.2 Rate-variable-latency service curve
5.3 Concatenation of service curves
5.4 Θ(t) for k < 0
5.5 Θ(t) for k > 0
6.1 General router architecture
6.2 Port-to-port service offered to flows
6.3 Blocking switch fabric (SF-B)
6.4 Non-blocking switch fabric (SF-N)
6.5 Process of SF identification
6.6 Model of an OI
6.7 Flow aggregation and deaggregation with SF-B
6.8 Flow aggregation and deaggregation with SF-N
6.9 Flows during SF-B identification
6.10 SF-B parameter identification
6.11 Flows during OI-FCFS identification
7.1 Switched topology for validation
7.2 Reconstructed latency of packets passing two concatenated switches
7.3 Reconstructed SF parameters: PQ (left) and CBWFQ (right)
7.4 Reconstructed SF parameters: WFQ
7.5 Reconstructed OI parameters: PQ
7.6 Routed topology for validation
7.7 Reconstructed latency of packets passing a router with compound loading (PBOO)
7.8 Reconstructed latency of packets passing a router with compound loading (EPBOO)
List of Tables
2.1 Open system interconnection layers
2.2 Real-time communication architectures
2.3 IP precedence and DSCP mapping to DiffServ classes
2.4 IP precedence and ToS fields
4.1 Measured parameters
6.1 Effect of scheduling mechanism to the observed flows (Fh and F1)
6.2 SF service curves
6.3 OI service curves
7.1 Flow parameters
7.2 Switch parameters
7.3 Reconstructed SF parameters
7.4 Reconstructed OI parameters
7.5 Flow parameters
7.6 Router parameters
A.1 SW.FL test case statistics
A.2 SW.OPC test case statistics
A.3 SW.PL test case statistics
A.4 SW.CON test case statistics
A.5 RTR.PLCEF test case statistics
A.6 RTR.FL test case statistics - PQ
A.7 RTR.FL test case statistics - WFQ
A.8 RTR.FL test case statistics - CBWFQ
A.9 RTR.OPC test case statistics - FIFO
A.10 RTR.OPC test case statistics - PQ
A.11 RTR.OPC test case statistics - WFQ
Chapter 1
Introduction
With the growing scale of control systems and their distributed nature, communication
networks have been gaining importance and new research challenges have been appearing.
In [59], Zampieri categorises the upcoming research challenges in terms of networked
control systems (NCS), i.e., feedback control systems using packet-oriented networks,
such as Ethernet. The major problem, contrary to previously used control systems with
dedicated communication circuits, is the time-varying delay of control and measurement
signals. Hence, the major efforts according to Zampieri are: control of networks (i) and
control over networks (ii). Many efforts within the latter domain try to design control
systems so as to decrease the susceptibility of control to signal latency.
For instance, Chow proposes in [36] an optimal regulator that takes the networking aspects
as a design criterion.
On the other hand, the former effort refers to the real-time aspects of the
communication networks and their control by means of quality of service (QoS) mechanisms
providing on-demand resource sharing of the networking devices. In a local scope, NCS
in industrial environments are most often based on Ethernet-based fieldbuses. Producers
of fieldbuses have managed to handle real-time communication aspects and deliver the
expected behaviour in a scalable way. A summary of the existing solutions can be found in
[17]. In larger communication scopes, reliable industrial solutions emerge only slowly
despite customers’ requirements [5].
The Virtual Automation Networks (VAN) project, introduced in [6], was an integrated
project within the 6th Framework Programme dealing with future communication
technologies in industrial automation. One of the key topics in the project was the
specification of a real-time framework for network topologies with enterprise-wide
communication scope. The approach adopted in the project was to employ IP-based
infrastructures with QoS capabilities, as used in telecommunication networks.
IP-based communication architectures are not very amenable to adaptations which would
provide a perfect match with the industrial requirements on real-time behaviour. Hence,
it is the opinion of the author that a modelling framework assisting performance analysis
of the temporal behaviour would leverage the existing commercial-off-the-shelf (COTS)
networking devices, such as routers and switches, and would encourage their use in
industrial automation.
To that end, this work is dedicated to performance analysis of the temporal behaviour
of internetworking technologies intended for future industrial applications and to
modelling of their worst-case performance.
1.1 Dissertation Structure
Chapter 2 surveys the state of the art of the related fields. The fields of interest
are internetworking technologies, including the TCP/IP communication architecture,
specific communication aspects in industrial automation, architectures of networking
devices, an introduction to the QoS framework, and finally, an introduction to
network calculus as the chosen framework for analytical modelling of temporal
performance.
Chapter 3 introduces the main dissertation objectives based on the identified needs
and existing technological gaps and outlines the research approach.
Chapter 4 is dedicated to the empirical analysis of the QoS metrics based on the
TestQoS test bed. The results provide both qualitative and quantitative findings used
further on for design of model structures and model parametrisation.
Chapter 5 presents extensions to the existing framework of network calculus that make it
possible to model the identified phenomena, i.e., the bimodal RVL service curve and the
analysis of data losses.
Chapter 6 is dedicated to the analytical modelling of the networking devices based
on the internal device architectures. Consequently, port-to-port service curves of the
networking devices are inferred for different types of devices and their parametrisation.
Chapter 7 presents two case studies, one for the HP ProCurve 1800-8G switch and one for
the Cisco 2811 router. Finally, the analytically obtained latency models are compared
to empirical observations and the results are assessed.
Chapter 8 presents the final conclusions and identifies unresolved issues and further
research directions.
1.2 Related Work
The main inspiration has been taken from Georges, who made a similar attempt to model
switched infrastructure in [29], [30], [31], [32], and [33]. In contrast to the present
work, his measurements were less precise and the models were not based on min-plus algebra.
Similarly, Jasperneite has provided a classification of different types of industrial
flows and an application of network calculus to industrial Ethernet in [41] and [42].
Entirely white-box modelling of a network processor using network calculus has been
proposed in [24]. Profiling of the model structure and parameter identification based on
measurement of networking devices is outlined in [16].
Several measurements of QoS parameters of IP-based networking devices have been
reported in [47] and [38]. However, serious consideration of the influence and confluence
of dominant factors, such as loading, is missing.
Chapter 2
State of the Art
2.1 Internetworking Technologies
An internetwork is a network that encompasses several individual networks and interconnects
them with intermediate networking devices [28]. The individual networks can be based on
different standardised technologies. Internetworking refers to the effort, products, and
procedures that meet the challenge of creating and administering internetworks [28]. The
main challenges of internetworking are connectivity, reliability, security, and network
management. Internetworks can be coarsely divided into two groups.
Local Area Networks (LANs) enable multiple users in a close geographical area to
access information and share common resources. Nowadays, LANs are predominantly
based on the Ethernet technology.
Wide Area Networks (WANs) interconnect spatially dispersed LANs. WANs are
tailored to higher traffic aggregation and rely on less shared resources. WANs are usually
infrastructures provided by internet service providers (ISPs). They can be divided into
access networks and core networks.
Run-time industrial communication is, apart from exceptions, related to LANs.
Admittedly, industrial automation has special needs that call for WAN technologies;
however, the real-time expectations in those cases are not very high. LANs, on the other
hand, are the subject of performance analyses and correspond to the scope in which
real-time communication is essential.
2.1.1 Open System Interconnection
Internetworking of networks based on heterogeneous technologies is provided by a unified
layered communication model. The Open Systems Interconnection (OSI) reference model is
a cornerstone of internetworking. The OSI reference model describes how information from
an application in one end device passes through a network medium to an application in
another end device. It is a conceptual model composed of seven layers, each specifying
particular network functions. The OSI layers are summarised in Table 2.1.
Industrial automation has usually made use of L1, L2, and L7 only, because of the reduced
temporal overhead and because its topologies are intrinsically simpler than is usual
with internetworks. Ethernet-based fieldbuses follow this approach in their
soft-real-time modes, as discussed in Section 2.2.
Table 2.1: Open system interconnection layers
Layer | Abbr. | Mission | Description
Physical | L1 | Data transport | Defines the electrical, mechanical, procedural, and functional specifications for activating, maintaining, and deactivating the physical link between communicating network systems.
Data Link | L2 | Data transport | Provides reliable transit of data across a physical network link including physical addressing, network topology, error notification, sequencing of frames, and flow control. Defines physical addressing of devices.
Network | L3 | Data transport | Defines the network address. Some implementations, such as the Internet Protocol (IP), define network addresses in a way that route selection can be determined systematically. Defines logical network layout, hence, routers can use this layer to determine how to forward packets. Because of this, much of the design and configuration work for internetworks happens at L3.
Transport | L4 | Data transport | Accepts data from the session layer and segments the data for transport across the network. Generally, the transport layer is responsible for making sure that the data is delivered error-free and in the proper sequence. Flow control generally occurs at the transport layer.
Session | L5 | Application | Establishes, manages, and terminates communication sessions. Communication sessions consist of service requests and service responses that occur between applications located in different network devices.
Presentation | L6 | Application | Provides a variety of coding and conversion functions that are applied to application layer data. These functions ensure that information sent from the application layer of one system is readable by the application layer of another system.
Application | L7 | Application | Interacts with software applications that implement a communicating component. Its functions typically include identifying communication partners, determining resource availability, and synchronising communication.
2.1.2 Ethernet Technology
The term Ethernet refers to the family of local-area network (LAN) products covered
by the IEEE 802.3 standard that defines what is commonly known as the CSMA/CD
protocol [28, 7-1]. Commonly available device data rates are 10, 100, and 1000 Mb/s.
Device interconnection was initially based on a bus topology using coaxial cable. The
medium access method is known as Carrier Sense Multiple Access / Collision Detection
(CSMA/CD). In principle, a device ready for transmission sensed the traffic on the bus.
If the bus was free of transmission, the device could start transmitting. In case of a
collision, the device waited for a randomised time and reattempted to transmit. The
randomised back-off time was the source of the stochastic nature of Ethernet.
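The randomised waiting time can be illustrated with a short sketch. The text above does not specify the back-off rule; the sketch below assumes the truncated binary exponential back-off of IEEE 802.3 with a slot time of 512 bit times (valid for 10 and 100 Mb/s), and the function names are chosen for illustration only.

```python
import random

SLOT_TIME_BITS = 512   # slot time in bit times (assumption: 10/100 Mb/s Ethernet)
ATTEMPT_LIMIT = 16     # a frame is abandoned after 16 unsuccessful attempts
BACKOFF_LIMIT = 10     # the back-off range stops growing after 10 collisions


def backoff_slots(collision_count, rng=random):
    """Number of slot times to wait after the n-th collision of one frame."""
    if not 1 <= collision_count < ATTEMPT_LIMIT:
        raise ValueError("back-off is defined for collisions 1..15 only")
    exponent = min(collision_count, BACKOFF_LIMIT)
    # Uniformly distributed integer in 0 .. 2^exponent - 1
    return rng.randrange(2 ** exponent)


def worst_case_backoff_seconds(collision_count, bit_rate):
    """Longest possible wait after the n-th collision for a bit rate in bit/s."""
    exponent = min(collision_count, BACKOFF_LIMIT)
    return (2 ** exponent - 1) * SLOT_TIME_BITS / bit_rate


if __name__ == "__main__":
    rng = random.Random(1)
    for n in (1, 2, 3, 10, 15):
        sampled = backoff_slots(n, rng) * SLOT_TIME_BITS / 10e6
        worst = worst_case_backoff_seconds(n, 10e6)
        print(f"collision {n:2d}: sampled {sampled * 1e6:8.1f} us, "
              f"worst case {worst * 1e6:8.1f} us")
```

The sketch shows why a shared collision domain is difficult to bound: the worst-case wait grows with every collision, and a frame may be dropped altogether once the attempt limit is reached.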
The most widespread version of Ethernet today is based on a star topology and copper
twisted-pair cable (100Base-TX/1000Base-TX). From the point of view of determinism, there
are two different ways to engineer an Ethernet segment, depending on the networking
devices used and the extent of the collision domains. A collision domain is the scope in
which transmitting devices must share the link access.
The former method is based on hubs. Using hubs retains collision domains as with the
bus topology. The latter method is based on switches, which reduce the collision domain to
the point-to-point connection between the end device and the switch. In full-duplex
operation, collisions are eliminated altogether. The price for eliminating collisions is
the necessity of buffering and the potential congestion of interfaces. This topic is
detailed in Section 2.3.
The information necessary to follow the focus of the dissertation is introduced in the
remainder of this subsection.
Ethernet operates at L1 and L2. L1 is common for all devices, while L2 is different
for end devices and networking devices as shown in Figure 2.1.
[Figure 2.1 shows three layer stacks side by side. OSI reference model: Application (L7),
Presentation (L6), Session (L5), Transport (L4), Network (L3), Media Access (L2),
Physical (L1). Ethernet end device: Logical Link Control (LLC) - 802.2, Media Access
(MAC) - 802.3, Physical (PHY) - 802.3. Ethernet bridge: Bridge - 802.1, Media Access
(MAC) - 802.3, Physical (PHY) - 802.3.]
Figure 2.1: Relation of OSI and Ethernet layers
The Ethernet data unit is called a frame. A frame has a predefined structure, shown in
Figure 2.2 together with the lengths of the individual fields in bytes. The frame fields
are as follows:
• Preamble (PRE) is a sequence of alternating bits used to announce starting trans-
mission and de-saturation of the receiving device. Receiver synchronises to the
preamble sequence.
• Start of Frame (SOF) is a sequence of alternating bits ending with two consecutive
ones to announce the start of the frame.
• Destination Address (DA) is the physical address of the destination device, broad-
cast address, or multicast address. This address does not change when passing a
router. However it is changed when passing any L3 device. The updated destina-
tion address is the physical address of the adjacent network hop.
• Source Address (SA) is the physical address of the sending device. This address
remains the same throughout the network pass.
• Length/Type has two meanings. If the value is 1500 or less, it represents the length of the Data
field. If the value is 1536 (0x0600) or greater, it identifies the type of the frame being
sent. For instance, the type of an IP packet is 0x0800.
• Data/Padding represents the transmitted information. If the number of transmit-
ted bytes is less than 46, padding is appended to reach a minimal length of 46
bytes.
• Frame Check Sequence (FCS) is a sequence containing a 32-bit cyclic error check
value. The value is calculated by the sending MAC layer and verified by the
receiving MAC layer. Damaged frames may be discarded.
Field           PRE  SFD  DA  SA  L/T  Data/Padding  FCS
Length (bytes)    7    1   6   6    2       46-1500    4
Figure 2.2: Ethernet frame content
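The frame layout of Figure 2.2 can be illustrated by a minimal Python sketch. The function names build_frame and parse_header are hypothetical and serve illustration only. The preamble and the start delimiter are omitted, and the little-endian placement of the FCS is an assumption of this sketch rather than a statement taken from the cited sources.

```python
import struct
import zlib

MIN_DATA = 46     # bytes; shorter payloads are padded
MAX_DATA = 1500   # bytes

def build_frame(dst: bytes, src: bytes, length_or_type: int, payload: bytes) -> bytes:
    """Build DA + SA + L/T + Data/Padding + FCS (PRE and SFD are omitted)."""
    if len(dst) != 6 or len(src) != 6:
        raise ValueError("physical addresses must be 6 bytes long")
    if len(payload) > MAX_DATA:
        raise ValueError("payload exceeds 1500 bytes")
    data = payload.ljust(MIN_DATA, b"\x00")          # pad to the minimal length
    header = dst + src + struct.pack("!H", length_or_type)
    # 32-bit cyclic check value over all fields from DA to Data/Padding
    # (byte order of the appended value is an assumption of this sketch).
    fcs = struct.pack("<I", zlib.crc32(header + data))
    return header + data + fcs

def parse_header(frame: bytes):
    """Return (DA, SA, L/T) of a frame built by build_frame."""
    return struct.unpack("!6s6sH", frame[:14])
```

A payload of five bytes, for example, is padded to 46 bytes, so that the resulting frame occupies the minimal 64 bytes.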
If the frame type is 0x8100, the frame carries Virtual Local Area Network (VLAN)
information standardised by 802.1q. The VLAN tag is inserted between the SA and
the L/T field as shown in Figure 2.3.
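[Figure 2.3: the 4-byte VLAN tag, consisting of TPID (2 bytes) and TCI (2 bytes), placed between the SA field (6 bytes) and the L/T field.]
Figure 2.3: VLAN frame tagging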
The VLAN tag consists of the tag protocol identifier (TPID) and the tag control
information (TCI). TPID has the fixed, aforementioned value 0x8100. TCI carries 3
bits of priority, 1 flag bit, and 12 bits of VLAN identifier. VLAN technology allows L2 network
segmentation, which is normally provided by routers at L3. Furthermore, thanks to the priority
information, it provides a basic L2 QoS architecture, provided that all L2 networking
devices support the 802.1q and 802.1p features.
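The bit layout of the tag can be made concrete by a short Python sketch. The names build_vlan_tag and parse_vlan_tag are hypothetical. The sketch assumes the 802.1Q layout in which the 16-bit TCI holds 3 priority bits, one flag bit (here called dei), and the 12-bit VLAN identifier, in this order.

```python
import struct

TPID_8021Q = 0x8100

def build_vlan_tag(priority: int, vlan_id: int, dei: int = 0) -> bytes:
    """Pack TPID and TCI into the 4-byte tag placed between SA and L/T."""
    if not 0 <= priority <= 7 or not 0 <= vlan_id <= 0x0FFF or dei not in (0, 1):
        raise ValueError("field value out of range")
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)

def parse_vlan_tag(tag: bytes):
    """Split a 4-byte tag into (priority, dei, vlan_id)."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not a VLAN tag")
    priority = tci >> 13         # upper 3 bits
    dei = (tci >> 12) & 0x1      # single flag bit
    vlan_id = tci & 0x0FFF       # lower 12 bits
    return priority, dei, vlan_id
```

A switch supporting priority treatment would only need the first returned value to select the queue of a frame.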
Ethernet technology is typically implemented as a network interface card (NIC) ac-
companied by appropriate drivers interfacing the operating system.
Further information can be found in [28], [17], and the respective IEEE 802 standard
family.
2.1.3 Internet Protocol
The Internet Protocol (IP) is an L3 protocol that contains addressing information and
some control information that enable packets to be routed [28, 30-2]. IP is standardised
by RFC 791. It provides connectionless delivery of datagrams through a network and
provides fragmentation and reassembly of datagrams.
IP is a routed protocol, i.e., routing protocols, such as OSPF, can perform routing
of the datagrams through a network based on information contained in the IP header.
Routing in IP networks is based on IP addresses. IP addresses are logical addresses of
devices carried in the IP header.
An IP datagram carries several pieces of information. Figure 2.4 shows the complete
structure, whose precise semantics can be found in [28, 30-3]. For the purposes of this work,
only several entries are important:
• DiffServ Code Point is a byte which provides QoS behaviour at L3. This code
point will be discussed in detail in Section 2.4.
• Flags are used to control fragmentation. For industrial applications focusing on
L3 real-time behaviour, fragmentation is banned to increase determinism.
• Time-to-Live defines how many hops the datagram may pass before it is discarded.
The value is decremented on every hop, and the datagram is discarded when the value
reaches 0. Setting an appropriate
value prevents datagrams from congesting the network if the packets circulate due
to a routing flaw.
• Source Address is the logical address of the sending device. IP source address
remains constant throughout the network pass.
• Destination Address is the logical address of the destination device. IP destination
address remains constant throughout the network pass.
Bit offset  Bits 0-3   Bits 4-7         Bits 8-15             Bits 16-18   Bits 19-31
0           Version    Header Length    DiffServ Code Point   Total Length (bits 16-31)
32          Identification (bits 0-15)                        Flags        Fragment Offset
64          Time to Live (bits 0-7)     Protocol              Header Checksum (bits 16-31)
96          Source Address (bits 0-31)
128         Destination Address (bits 0-31)
160         Options (optional)
160/192     Data
Figure 2.4: IP datagram format
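The entries relevant for this work can be extracted from the first 20 bytes of a datagram as sketched below. The function name parse_ipv4_header and the dictionary keys are hypothetical. The sketch assumes an IPv4 header as in Figure 2.4 and ignores options; the code point is taken as the upper six bits of the second header byte.

```python
import struct

def parse_ipv4_header(header: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (options are ignored)."""
    (ver_ihl, ds_byte, total_length, ident, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", header[:20])
    return {
        "version": ver_ihl >> 4,
        "header_length": (ver_ihl & 0x0F) * 4,        # in bytes
        "dscp": ds_byte >> 2,                         # upper 6 bits of the byte
        "total_length": total_length,
        "dont_fragment": bool(flags_frag & 0x4000),   # fragmentation banned
        "more_fragments": bool(flags_frag & 0x2000),
        "fragment_offset": flags_frag & 0x1FFF,
        "ttl": ttl,
        "protocol": protocol,
        "src": ".".join(str(b) for b in src),
        "dst": ".".join(str(b) for b in dst),
    }
```

In a real-time segment as discussed above, one would expect dont_fragment to be set and fragment_offset to be zero for every datagram.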
The IP layer of the end devices is a part of the TCP/IP stack provided by the operating
system. Hence, all operations are subject to hardly predictable OS scheduling unless a
real-time operating system is used. As for the networking devices, implementations of the IP
layer range from a software component to a highly optimised process offloaded to dedicated
hardware.
2.1.4 Transmission Control Protocol and User Datagram Protocol
While the IP layer provides interconnection of devices, the Transmission Control Protocol (TCP)
and the User Datagram Protocol (UDP) are L4 protocols providing connection between
applications, i.e., communication clients and servers. There is a significant difference
between the functionality of TCP and UDP.
TCP is a connection-oriented acknowledged transmission protocol. TCP provides
stream data transfer, reliability, efficient flow control, full-duplex operation, and multi-
plexing. Due to its complicated architecture, it can handle lost, delayed, duplicate, or
misread packets.
UDP is a connectionless unacknowledged transmission protocol. Contrary to TCP,
the functional richness of UDP is very low. In principle, UDP is merely an interface between
higher layers and the IP layer. Thus, it provides neither reliability of transmission nor error
recovery.
Despite the vast difference in reliability, UDP is the more suitable transport protocol for
industrial automation. The UDP data overhead is only 8 bytes (see Figure 2.5) and the temporal
overhead is negligible. Moreover, retransmission mechanisms triggered by packet loss, which
would significantly decrease determinism, are not applied.
Bit offset  Bits 0-15     Bits 16-31
0           Source Port   Destination Port
32          Length        Checksum
Figure 2.5: UDP datagram header format
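The small data overhead of UDP can be illustrated by the following Python sketch. The function names build_udp_datagram and bytes_on_wire are hypothetical. The sketch assumes an IPv4 header without options (20 bytes) and an untagged Ethernet frame; the checksum is left at zero, which over IPv4 indicates that no checksum was computed.

```python
import struct

UDP_HEADER_LEN = 8      # bytes (Figure 2.5)
IPV4_HEADER_LEN = 20    # bytes, assuming no IP options
ETH_HEADER_LEN = 14     # DA + SA + L/T
ETH_FCS_LEN = 4
ETH_PRE_SFD_LEN = 8     # PRE + SFD
ETH_MIN_DATA = 46

def build_udp_datagram(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Prepend the 8-byte UDP header to the payload (checksum not computed)."""
    length = UDP_HEADER_LEN + len(payload)
    return struct.pack("!HHHH", src_port, dst_port, length, 0) + payload

def bytes_on_wire(payload_len: int) -> int:
    """Bytes occupied on an Ethernet link by one UDP/IPv4 datagram."""
    data = max(ETH_MIN_DATA, IPV4_HEADER_LEN + UDP_HEADER_LEN + payload_len)
    return ETH_PRE_SFD_LEN + ETH_HEADER_LEN + data + ETH_FCS_LEN
```

For the small payloads typical of process data, the Ethernet padding to 46 bytes dominates, so a 10-byte and an 18-byte payload occupy the same number of bytes on the link.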
2.2 Communication in Industrial Automation
Industrial automation is based mainly on fieldbuses. A fieldbus is the lowest-level industrial
network in the computer communication hierarchy of factory automation and process
control systems [37]. Apart from the communication mission, a fieldbus also contains a
communication object model (COM) which provides the engineer with a certain level of
abstraction. Legacy fieldbuses are based on proprietary communication technologies
or ageing standards, such as RS485.
With the growth of Ethernet, the legacy fieldbuses discussed in [48] are being slowly
yet steadily superseded by their successors, the Ethernet-based fieldbuses, which are becoming
the state of the art. Clear evidence is the existence of standardised
solutions introduced in IEC 61158 as Type 10 to Type 16 (among them Profinet IO, EtherCAT,
Ethernet Powerlink, EPA, and SERCOS III). Ethernet has been adopted for its
high availability, low cost, and compatibility with office communications. Ethernet makes
integration possible across all levels of industrial automation, reaching from Enterprise
Resource Planning (ERP) over Manufacturing Execution System (MES) down to the operator
level and process instrumentation. Nowadays, fieldbuses are very complex systems
providing an application engineer with a vast variety of tools and functionalities.
In the subsequent chapters, only issues of real-time behaviour will be regarded.
Firstly, the notion of real time in industrial automation will be explained. Subsequently,
architectural principles providing real-time behaviour in Ethernet-based fieldbuses
will be shown in relation to my work.
2.2.1 Real-Time Systems and Real-Time Communication
The criterion of real-time behaviour has always been crucial in industrial automation. However,
there are several definitions of real time. I recognise two aspects of real time as
they appear in various sources. They are explained in the following.
Real-Time Systems
Real-time systems are systems requiring or providing real-time behaviour. Douglas introduces
in [22] a definition of a real-time system based on a utility function. The utility
function represents the utility of a system response over time. Its value range is ⟨0, 1⟩.
In real-time systems requiring timeliness, the utility function decreases with time,
indicating that a later response is valued less than an earlier one. This type of real-time
behaviour is expected in event-driven systems.
In real-time systems requiring synchronism, the function maximum is reached at the
time when a system response is expected; both earlier and later responses have a lower
utility value. This type of real-time behaviour is expected in time-driven systems of a
cyclic nature.
Should the function decrease smoothly, the system is soft real-time. Should the utility
function decrease in a step-wise manner, we speak of a hard real-time system. Finally, to
complete the classification, if the function is constant or nearly constant in time, the
system is non-real-time.
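The classification can be illustrated by simple utility functions written in Python. The concrete shapes (a step at the deadline, a linear decay over a tolerance interval) and all function names are assumptions chosen for illustration; [22] does not prescribe them.

```python
def hard_timeliness(t: float, deadline: float) -> float:
    """Step-wise utility: full value up to the deadline, none afterwards."""
    return 1.0 if t <= deadline else 0.0

def soft_timeliness(t: float, deadline: float, tolerance: float) -> float:
    """Smooth utility: decays linearly to 0 within `tolerance` after the deadline."""
    if t <= deadline:
        return 1.0
    return max(0.0, 1.0 - (t - deadline) / tolerance)

def soft_synchronism(t: float, expected: float, tolerance: float) -> float:
    """Utility peaks at the expected instant; earlier and later responses score lower."""
    return max(0.0, 1.0 - abs(t - expected) / tolerance)

def non_real_time(t: float) -> float:
    """Constant utility: the response time does not matter."""
    return 1.0
```

All four functions stay within the value range from 0 to 1; a hard real-time variant of synchronism would replace the linear decay by a step on both sides of the expected instant.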
The aforementioned approach is quantitative. The utility of a latency is evaluated and
consequently a deadline (for timely systems) is defined within which the system must
respond. From a design point of view, a real-time system should be highly available,
free of deadlocks, fully synchronised, and with controlled preemptive access to shared
resources, as discussed in [22].
Real-Time Communication
Real-time communication is communication required by a real-time system and satisfying
predefined performance metrics. The classification introduced for real-time systems is
not used in this case. Essentially, temporal performance requirements are inferred from
the requirements of the real-time systems. These requirements are expressed by various
metrics. IAONA proposes in [17] a fine classification based on the triplet of bandwidth, latency,
and jitter. This classification is coherent with the metrics of the quality of service
mechanisms shown in Section 2.4. A coarser classification based on the application point of view,
introduced in [5], recognises non-real-time, soft real-time, hard real-time, and isochronous
real-time communication.
Determinism is a notion often mentioned with regard to real time. Communication
determinism refers to the predictability of a communication path in terms of availability
and timeliness. Hence, from the determinism point of view, latency, contrary to jitter, is
not an issue.
2.2.2 Real-Time Communication in Ethernet-based Fieldbuses
The original Ethernet was developed by Xerox as an experimental coaxial cable network
in the 1970s [28]. The access method was carrier sense multiple access with collision detection
(CSMA/CD). The stochastic nature of this version of Ethernet arises from the fact that,
in case of a transmission collision, the device stops and waits for a random time before
retransmitting the frame.
The most popular version is 100Base-TX, which is based on Cat5 cable and is full
duplex. The most significant difference from the coaxial variant is the fact that the physical
topology changed from bus to star. Thus, each networking device can be connected to
a switch or a hub with its own cable. Since a switch buffers frames and
each connection to a switch is full-duplex, there are no more collisions in the network.
On the other hand, a new problem arises if several network devices want to transmit a
frame to a single network device. In such a case, provided that the capacity of the outgoing
interface is lower than necessary for transmitting the frames on the fly, the frames have
to be buffered. In short, one can say that congestion replaces collisions.
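The effect of such a congestion can be estimated by a simple worst-case calculation, sketched here in Python. The function names are hypothetical. The sketch assumes that all senders deliver one frame of equal length at the same instant, that the outgoing interface serves the frames one after another, and that the preamble and inter-frame gaps are neglected.

```python
def frame_time(frame_bytes: int, link_rate_bps: float) -> float:
    """Transmission time of one frame on the outgoing interface, in seconds."""
    return frame_bytes * 8 / link_rate_bps

def worst_case_queuing_delay(n_senders: int, frame_bytes: int,
                             link_rate_bps: float) -> float:
    """Waiting time of the frame served last when n_senders frames arrive at once."""
    return (n_senders - 1) * frame_time(frame_bytes, link_rate_bps)

def worst_case_backlog(n_senders: int, frame_bytes: int) -> int:
    """Bytes that have to be buffered while the first frame is being transmitted."""
    return (n_senders - 1) * frame_bytes
```

Because the number of senders and the frame lengths are known in advance in many automation applications, a bound of this kind can be stated deterministically.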
One can observe a certain improvement of determinism in switched Ethernet: if the transmission
scheduling of all flows is deterministic, the congestion is also deterministic.
This makes it possible to use deterministic analytical methods and thus to determine upper
bounds. Moreover, it is typical of industrial automation that the transmission
scheduling is known in advance for run-time process data. However, trend investigations
show that fieldbuses also have to provide bandwidth for non-real-time TCP/IP communication
used for similar purposes as on the office floor [45]. Consequently, Ethernet needs a real-time
extension to provide stringent real-time behaviour.
On the office floor, Ethernet is used only as the physical and link layers (L1 and
L2), on top of which the TCP/IP stack is positioned. The TCP/IP protocol suite is provided by
the operating system. From a real-time point of view, this poses a risk of deterioration
of real-time behaviour. Passing through the stack adds overhead in terms of both latency
and jitter. Real-time operating systems (RTOS) diminish this effect. For instance, RTX
from Ardence provides a real-time TCP/IP stack together with the RTOS. Timmerman
presents in [58] an evaluation of RTOS from the real-time communication point of view. The
evaluated systems are VxWorks 5.3/6.1, Windows CE 5.0, Windows XP Embedded,
MontaVista Linux 2.1/4.0, and Linux. However, real-time stacks are often not available.
Therefore, most of the Ethernet fieldbuses bypass the TCP/IP stack to establish a more
deterministic and faster communication infrastructure. It should be noted that, with the growing
market of dedicated embedded systems, an improvement of temporal performance can be
expected. Nevertheless, one step towards jitter reduction is the replacement of TCP by
UDP.
Three general approaches applied in Ethernet-based fieldbuses with the aim of improving
real-time behaviour follow. A summary of the investigation is given in Table 2.2.
Scheduling on top of TCP/IP
Scheduling on top of TCP/IP is an approach used for the softest real-time constraints.
Real-time traffic is usually sent via UDP sockets while non-real-time traffic is sent via
TCP sockets. An arbiter within the device favours real-time traffic over non-real-time
traffic. There is no special support from the network infrastructure. Therefore,
non-real-time traffic incoming from a different segment has to be filtered or shaped on a
gateway to the real-time segment.
Scheduling on top of MAC
Scheduling on top of MAC represents bypassing of the TCP/IP stack. Real-time traffic is not
carried by L4 packets or datagrams but directly by L2 Ethernet frames. This approach
diminishes the communication stack overhead. On the other hand, routed communication
is not possible, as the frame does not contain L3 information. For this reason, the
communication is limited to a single segment. There can be support from the network
infrastructure in the form of 802.1p or 802.1d, as is the case with Profinet V2. Both
are based on recognising frames by a priority tag and queuing
different priority classes in different queues. This approach is a germ of the powerful
quality of service methods implemented at L3, as will be discussed in Section 2.4.
Ethernet Powerlink, on the other hand, is based on device synchronisation and strict
Master/Slave data polling [17], [20].
Proprietary MAC
Proprietary MAC is the most specialised solution with the hardest real-time behaviour.
Both the TCP/IP and the Ethernet stacks are bypassed. For instance, Profinet V3 is based
on time-division multiplex (TDM) and device synchronisation according to the IEEE 1588
standard. All devices are informed about the cycle length and the start of the cycle. The
cycle is divided into an IRT slot at the beginning of the cycle and, optionally, SRT and NRT
slots. The IRT slot is further subdivided into several subslots. Each subslot is dedicated
to point-to-point communication between two devices. The proprietary switch forwards the
packets based on the current subslot; neither the IP nor the MAC address is used. The scope
of the IRT segment interconnected by this proprietary switch is limited. Details can be
found in [50].
Contrary to this approach, EtherCAT employs a virtual ring infrastructure with a
circulating frame (token). It makes use of the full-duplex feature of Ethernet. The frame
is not stored, processed, and transmitted in the Ethernet interface, but is processed on
the fly. Every device has a dedicated data slot in the frame into which it inserts data. It can
read all slots. The communication cycle is given by the number of devices in the virtual ring.
See [17, 80] for details.
2.3 Networking Device Architectures
Internetworking technologies employ plenty of networking device types providing data
transmission between end devices. Industrial automation has adopted only a subset of these
networking devices and, occasionally, provided adaptations as introduced in Section 2.2.
Network devices which are used in state-of-the-art industrial automation and network
devices considered for future applications are introduced in this section. The investigation
is focused on the aspects related to real-time behaviour, as these are important for
further performance analysis and modelling of temporal behaviour. Basic findings in this
field are summarised in [7]. Moreover, only IP-based Ethernet devices are considered.
Table 2.2: Real-time communication architectures
Real-Time Type         RT Metrics        Real-Time                    Network              Network               Application
                       (Latency/Jitter)  Implementation               Topology             Devices
Non-Real-Time          1 s/1 s           None                         Arbitrary            No Constraints        Engineering, Diagnostics
Soft Real-Time         100 ms/100 ms     Scheduling on top of TCP/IP  Isolated Network     Switches, Gateways    Alarms, HMI, Operator Control
Hard Real-Time         10 ms/1 ms        Scheduling on top of MAC     Isolated Segment     Hubs, Switches        Process Control (Closed Loop)
Isochronous Real-Time  1 ms/1 µs         Synchronised Devices         Constrained Segment  Proprietary Switches  Motion Control
2.3.1 Hubs and Switches
A hub is the simplest networking device, operating at the physical layer (L1). The name
originates from the adopted hub-and-spoke principle, i.e., a frame received by a hub is
broadcast to all network interfaces simultaneously without either analysing or modifying
the frame content. Thanks to this simple functionality, a hub is the fastest
networking device considered. Its disadvantage is that it multiplies network traffic by
the number of network interfaces, which is undesirable, and that it extends the collision
domain; the increased amount of traffic increases the frequency of link congestion. Nevertheless,
hubs are used in some Ethernet-based fieldbuses, such as Ethernet Powerlink, where
congestion is prevented by a master-slave data-polling communication scheme.
A switch is a network device operating at the link layer (L2). Its extension over a hub is that
a switch accommodates a forwarding table caching the physical addresses of the neighbouring
network devices paired with the numbers of the network interfaces to which the devices
are connected. Upon arrival of a frame, the forwarding table is consulted and, if
the destination MAC address matches an entry, the frame is forwarded solely to the
appropriate interface. Otherwise, the frame is forwarded in a hub manner. A switch does not
modify frames unless extended features are applied.
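The forwarding decision described above can be sketched in Python as follows. The class name LearningSwitch is hypothetical. The sketch additionally assumes that the forwarding table is filled by learning the source address of every received frame, which is the usual bridge behaviour but is not stated in the text above.

```python
class LearningSwitch:
    """Minimal model of the forwarding decision of an L2 switch."""

    def __init__(self, n_ports: int):
        self.ports = range(n_ports)
        self.table = {}   # physical address -> interface number

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> list:
        """Return the list of interfaces the frame is forwarded to."""
        self.table[src_mac] = in_port          # learn where the sender is attached
        out_port = self.table.get(dst_mac)
        if out_port is None:                   # no match: forward in a hub manner
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:                # destination on the receiving port
            return []
        return [out_port]                      # match: forward to one interface
```

The first frame towards an unknown destination is thus flooded, whereas subsequent frames are forwarded to a single interface once the destination itself has sent a frame.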
The generic processing mode is store-and-forward. In this mode, a frame is com-
pletely accepted, analysed for consistency and advanced to the forwarding module which
determines the outgoing interface. This mode represents extended temporal overhead.
A faster mode is cut-through, i.e., a frame is advanced to the forwarding module upon
reception of the frame preamble and the destination MAC address. This mode is faster,
but the consistency of the frame cannot be guaranteed.
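The difference between the two modes can be quantified by the time a switch has to wait before it can start forwarding, as sketched below in Python. The function names are hypothetical. The sketch considers only the reception time at the incoming interface and neglects the processing time inside the switch.

```python
PRE_SFD_BYTES = 8     # preamble (7) + start delimiter (1)
DEST_MAC_BYTES = 6

def reception_time(n_bytes: int, rate_bps: float) -> float:
    """Time needed to receive n_bytes at the given line rate, in seconds."""
    return n_bytes * 8 / rate_bps

def store_and_forward_wait(frame_bytes: int, rate_bps: float) -> float:
    """The complete frame (including PRE and SFD) is received before forwarding."""
    return reception_time(PRE_SFD_BYTES + frame_bytes, rate_bps)

def cut_through_wait(rate_bps: float) -> float:
    """Forwarding starts once PRE, SFD, and the destination MAC have arrived."""
    return reception_time(PRE_SFD_BYTES + DEST_MAC_BYTES, rate_bps)
```

For a maximum-size frame of 1518 bytes at 100 Mb·s⁻¹, the sketch yields about 122 µs for store-and-forward and about 1.1 µs for cut-through; the latter does not depend on the frame length.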
Ethernet switch functionality is standardised by the IEEE 802.1 standard family. From
the QoS point of view, two standard extensions are worth mentioning. The 802.1q extension
defines the Virtual Local Area Network (VLAN). VLAN is a solution for segmenting a single
network infrastructure into several virtual segments. With VLANs, the segmentation is
delivered without using L3 network devices, such as routers. The frames are tagged with
an additional four bytes containing the VLAN ID and the VLAN priority. Therefore, this tag can
be used for QoS purposes. Furthermore, the 802.1p standard defines seven classes of traffic
corresponding to the VLAN priorities, thus giving them real semantics. In summary,
802.1q and 802.1p establish an L2 QoS capability which is widely used in Ethernet-based
fieldbuses, such as Profinet IO.
Switch Architecture and Performance
Figure 2.6 depicts a generalised architecture of a four-port Ethernet switch. Incoming
interfaces (II) pass frames to a packet forwarding unit (PFU). The PFU resolves the outgoing
port by consulting a forwarding table. Consequently, the frame can be forwarded through
a switch fabric (SF) to a specific outgoing interface (OI) and finally to the interface
hardware.
[Figure 2.6: incoming interfaces (FE0 to FE3), the packet forwarding unit, the switch fabric, and outgoing interfaces (FE0 to FE3).]
Figure 2.6: General switch architecture
Switches have become so well established on the market that high-throughput switches with
highly efficient switch fabrics are easily affordable. For instance, the eight-port HP
Procurve 1800-8G switch, used further in the empirical part, has a switching capacity of
16 Gb·s⁻¹ [39]. This capacity can provide each port with 1 Gb·s⁻¹ throughput in both
directions. The same source states that the latency of a traversing packet is 3.9 µs for
a 64-byte packet at a 1 Gb·s⁻¹ interface. The forwarding table can hold up to 8000
entries.
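The relation between the switching capacity and the port throughput can be checked by a short calculation, sketched here in Python. The function names are hypothetical; the sketch merely restates the arithmetic given above.

```python
def required_switching_capacity(n_ports: int, port_rate_bps: int) -> int:
    """Capacity needed so that every port can send and receive at line rate."""
    return n_ports * port_rate_bps * 2   # factor 2: both directions

def is_non_blocking(capacity_bps: int, n_ports: int, port_rate_bps: int) -> bool:
    """True if the switch fabric cannot become the bottleneck."""
    return capacity_bps >= required_switching_capacity(n_ports, port_rate_bps)
```

With eight ports at 1 Gb·s⁻¹, the required capacity equals the 16 Gb·s⁻¹ quoted for the switch, so the fabric itself does not limit the throughput; congestion can then arise only at the outgoing interfaces.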
2.3.2 IP Routers
A router is an L3 networking device providing interconnection of network segments based on
network layer (L3) information, in our case IP. Routers can accommodate an immense
number of features and protocols. However, there are two mandatory tasks a router has
to provide: routing and forwarding [15].
Routing refers to determining the optimal path a packet should take to reach the
destination end device. Routers exchange routing information with each other to
collaboratively find the optimal path based on the routing metrics. The result is stored
in the routing table of a router. There are several protocols used for routing. Examples of
routing protocols are the Interior Gateway Routing Protocol (IGRP), Open Shortest Path
First (OSPF), the Exterior Gateway Protocol (EGP), and the Border Gateway
Protocol (BGP). Each protocol is tailored for different purposes, depending on whether a core or
an access network is concerned. Details can be found in [28], [21], and [15]. The important
point is that routing is not a time-critical task and is left to router CPUs, as path changes
occur seldom.
Forwarding is the process of transferring a packet from an II to an OI¹ based on the
records in the Routing Table (RT). Forwarding is time-critical and is decisive for the total
throughput of traffic through the router. The rest of the section deals with
the real-time aspects of traffic forwarding.
[Figure 2.7: three incoming interfaces (line cards) and three outgoing interfaces (line cards) connected by the switch plane; the control plane comprises the routing table, CPU, and memory.]
Figure 2.7: General router architecture
A general router architecture is shown in Figure 2.7. However, specific architectures
differ. A router contains physical incoming and outgoing interfaces. The interfaces
range from the simplest ones, similar to those in office PCs, to very sophisticated ones housing
forwarding modules, queue management features, various hardware accelerators, etc.
Packet forwarding from incoming to outgoing interfaces is provided by an SF. An SF can be
implemented in various ways (shared memory, crossbar, etc.). The implementation of the switch
plane is decisive for managing flow aggregation and traffic congestion. Finally, the control
plane takes care of configuration, routing, management of the routing table, and providing
the routing information to the forwarding modules.
Router Architectures
Router performance is predetermined by its hardware architecture. Router architectures
are scaled to their purposes to provide the required functionality. The range starts with the
Small-Office-Home-Office (SOHO) low-end class. Such routers are typically equipped with
four LAN ports and one WAN port and provide Network Address Translation (NAT)
between the WAN and LAN.
¹ Every line card contains one II and one OI. IIs and OIs are drawn separately in the figures
for the sake of clarity.
The mid-range routers are dedicated to small- or mid-branch service aggregation and are
optimised for providing integrated services.
Finally, the high-end routers are tailored to service providers and network core providers.
Such routers are optimised for switching performance and provide fewer service
features.
It should be noted that there are no commercially available routers tailored to industrial
needs, namely time-optimised packet forwarding in relatively small network segments.
Switch Plane Architectures
This section introduces a classification of the main architectures available on the market
with regard to forwarding performance.
Bus-Based Routers are basic but very common routers. The determining
factor is that all incoming and outgoing interfaces are connected to the router core via
a bus. Consequently, concurrent traffic must be time-multiplexed on the bus, which thus
forms a potential congestion point. The line rate $r_{\mathrm{line}}$ is limited to
$r_{\mathrm{bus}} / (2 N)$, where $N$ is the
number of interfaces. With a typical bus rate of 20 Gb·s⁻¹, the bandwidth is excellent.
However, the same cannot be said of latency.
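The bound on the line rate can be evaluated numerically as sketched below in Python. The function names are hypothetical. The sketch takes the relation between the line rate, the bus rate, and the number of interfaces N as given above; the factor 2 presumably reflects that every packet crosses the bus twice, which is an interpretation rather than a statement of the source.

```python
def max_line_rate(bus_rate_bps: float, n_interfaces: int) -> float:
    """Upper limit of the line rate per interface: r_bus / (2 * N)."""
    return bus_rate_bps / (2 * n_interfaces)

def max_interfaces(bus_rate_bps: float, line_rate_bps: float) -> int:
    """Largest N for which every interface can still run at line_rate_bps."""
    return int(bus_rate_bps // (2 * line_rate_bps))
```

With a bus rate of 20 Gb·s⁻¹, eight interfaces can each be served at 1.25 Gb·s⁻¹, and at most ten interfaces can be served at 1 Gb·s⁻¹.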
[Figure 2.8: incoming interfaces (II) and outgoing interfaces (OI), each with PHY, MAC, and driver (DRV) blocks, attached to a common bus together with the CPU, memory, routing table (RT), RT cache, and programmable logic; every OI has an output queue (OQ).]
Figure 2.8: Bus-based router architecture (single processor)
Figure 2.8 shows a bus-based architecture with a single processor. IIs and OIs contain
the Ethernet physical and link layers and a driver which can access the memory via direct
memory access (DMA). Upon arrival, a packet is forwarded to memory. The memory
is structured into logical queues for different purposes. In principle, the packet is
written only once and read only once. Forwarding is based on passing pointers or labels from
an input to an output logical queue. This approach reduces read/write overhead.
There are two approaches to resolving forwarding. Process forwarding relies on
the CPU and the routing table. The interface driver stores the packet and the pointer is
queued. The queue is served in FIFO order. When the CPU resolves forwarding, the pointer
is stored in an output buffer. The output buffer is dedicated to a particular OI. The OI is
notified and can transmit the packet. Process forwarding is time-consuming but is the
only universal method. For frequently used destination IP addresses, fast forwarding can be
used. Fast forwarding uses a route cache containing a subset of the routing table in the form
of a look-up table. The II driver first consults the route cache and, if a match is found, the II
driver queues the packet directly to the OI buffer. If no match exists, process forwarding
is applied. The architecture and management of the route cache are subject to investigation
and optimisation. Cisco offers an advanced method, Cisco Express Forwarding (CEF),
which can manage a larger scope of entries with less effort and look the entries up faster
[21].
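The interplay of the two forwarding methods can be sketched in Python as follows. The class name Router and its methods are hypothetical and do not correspond to any vendor implementation. The sketch assumes that process forwarding performs a longest-prefix match over the routing table and that the route cache is filled with the result of every successful process-forwarding decision; both are assumptions made for illustration.

```python
import ipaddress

class Router:
    """Minimal model of process forwarding with a route cache in front of it."""

    def __init__(self, routes: dict):
        # routes maps a prefix such as "10.0.0.0/8" to an outgoing interface name
        self.routing_table = [(ipaddress.ip_network(p), oi) for p, oi in routes.items()]
        self.route_cache = {}   # destination address -> outgoing interface

    def process_forward(self, dst: str):
        """Slow path: longest-prefix match over the full routing table (CPU)."""
        addr = ipaddress.ip_address(dst)
        matches = [(net.prefixlen, oi) for net, oi in self.routing_table if addr in net]
        if not matches:
            return None
        return max(matches, key=lambda m: m[0])[1]

    def forward(self, dst: str):
        """Fast path first; fall back to process forwarding on a cache miss."""
        if dst in self.route_cache:
            return self.route_cache[dst], "fast"
        oi = self.process_forward(dst)
        if oi is not None:
            self.route_cache[dst] = oi   # assumed cache-fill policy
        return oi, "process"
```

The first packet towards a destination takes the slow path, whereas subsequent packets to the same destination are resolved from the route cache.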
[Figure 2.9: as Figure 2.8, but every incoming interface (II) additionally contains its own forwarding engine and cache; the IIs, the outgoing interfaces (OI) with their output queues (OQ), and the CPU, memory, and routing table (RT) share a common bus.]
Figure 2.9: Bus-based router architecture (multiple processors)
Figure 2.9 shows a bus-based architecture with multiple processors. This architecture
employs more sophisticated line cards. Every line card contains its own forwarding engine
including a route cache. The packets do not traverse the router memory; instead, each packet
is forwarded directly to the specific OI. Hence, the CPU is not involved in forwarding but
only in route table management. A critical issue is the synchronisation of the interface route
caches with the common routing table. This can happen on a cyclic basis or
on demand. Still, the route cache of an II contains only a subset of the route table.
The bus-based architecture is used in routers up to the mid-range class. The architecture
is affordable and capable of delivering the required throughput. Moreover, for applications
demanding only a limited number of interfaces, the architecture is very suitable.
However, for purposes requiring stringent QoS parameters, its suitability is questionable
because of its internally blocking nature. Nevertheless, bus-based routers will be the only
affordable choice for industrial applications.
Routers with Switch Fabrics belong to the high-performance networking devices handling
high traffic aggregations. The reason is that switch fabrics provide routers with a
certain level of forwarding independence.
[Figure 2.10: incoming interfaces (II), each with PHY, MAC, driver (DRV), and a forwarding engine with cache, connected through the switch fabric to outgoing interfaces (OI) with output queues (OQ); the routing table, CPU, and memory are shown as a separate central block.]
Figure 2.10: Router architecture with switch fabric
The switch-fabric-based architecture is shown in Figure 2.10. A prerequisite is that every
line card contains a forwarding engine with a cache. The route cache is updated from the
centrally managed routing table. Routers also exist which have distributed forwarding
engines, i.e., an II forwards the packet to a forwarding engine through the switch fabric.
The forwarding engine resolves the outgoing port and forwards the packet to the particular OI.
Such an architecture has to handle one extra transition over the switch fabric. Nevertheless,
it is more scalable and provides better utilisation of the forwarding
engines.
Time-Division Switch Fabrics apply TDM access to a common resource. The
architecture of the switch fabric must not be confused with the bus-based router architecture,
though. There are two main sub-types of time-division switch fabric:
Shared-medium switch fabric has N inputs and N outputs for an N-port router. In
principle, a packet arriving at one of the IIs is forwarded to the shared medium in a given
time slot and the packet physically arrives at every output. Address filters are located at
every output and filter packets based on the routing cache. In case of a match, the packet
is forwarded to the OI; otherwise it is dropped. The shared-medium switch fabric is optimal
for packet broadcasting, yet it has limited throughput.
Shared-memory switch fabric also has N inputs and N outputs for an N-port router.
Packets arriving at the switch fabric are multiplexed into a single stream and forwarded to
the switch fabric outputs after forwarding resolution. The output port resolution
is based on linked lists, content-addressable memory, or a space-time-space approach [15].
Although the principle is very straightforward and resembles the bus-based router
architecture, it allows many modifications to optimise throughput and memory utilisation.
In principle, the switch fabric is used as a basic building block from which more complex
structures are built. Details can be found in [15, 217].
Space-Division Switch Fabrics provide independent transmission of flows passing
through the switch fabric. This category is further subdivided into single-path switch fabrics,
such as the crossbar, and multi-path switch fabrics, which are out of the scope of my
investigations.
[Figure 2.11: a 3×3 crossbar with inputs 1 to 3, outputs 1 to 3, and crosspoints (1,1) to (3,3); the cross state and the bar state of a crosspoint are illustrated.]
Figure 2.11: Crossbar switch fabric [15, 184]
A crossbar SF is shown in Figure 2.11. In principle, every cell of the crossbar can
be in a bar state or a cross state, as depicted in the figure. In crossbars, forwarding of flows is
non-blocking unless the flows contend for the same output port. All three aforementioned
buffering strategies are available with crossbars, while their advantages and disadvantages
persist.
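The non-blocking property and its limit can be illustrated by a simple arbitration sketch in Python. The function name schedule_crossbar is hypothetical. The sketch assumes that each input requests at most one output per time slot and that contention is resolved in favour of the lowest input number, which is only one of many possible arbitration policies.

```python
def schedule_crossbar(requests: dict):
    """Decide which inputs may use the crossbar in one time slot.

    requests maps an input port to the output port it wants to reach.
    Returns (granted, blocked), both as input -> output dictionaries.
    """
    granted, blocked, busy_outputs = {}, {}, set()
    for inp in sorted(requests):          # assumed policy: lowest input wins
        out = requests[inp]
        if out in busy_outputs:
            blocked[inp] = out            # output contention: wait for a later slot
        else:
            granted[inp] = out
            busy_outputs.add(out)
    return granted, blocked
```

Flows towards distinct outputs are all granted in the same slot, whereas flows contending for one output are served one after another, regardless of how the crossbar itself is built.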
Other single-path space-division switch fabrics exist, such as fully interconnected or
banyan-based ones; see [15, 183] for details.
Outgoing Interface Architecture
The previous subsection was dedicated to performance issues of packet forwarding
through a router. Issues of traffic management at the outgoing interface are investigated
in this subsection. As noted above, excess traffic aggregation in the router core
can be resolved by parallel structures, such as a crossbar, to avoid congestion. However,
this holds only for traffic being forwarded to different OIs. Traffic being forwarded to
the same interface has to contend for the outgoing bandwidth regardless of the
implementation. For this reason, most of the priority-treatment mechanisms are located at the
OIs.
When a packet arrives at the OI, it is first decided whether the OI is free for transmission or
whether the packet has to be queued and wait. In the latter case, either a FIFO buffer or
a more advanced congestion management mechanism can be employed. Figure 2.12 depicts the
latter option. The classifier stores packets in different queues based on the agreed metrics.
The packet scheduler polls the packets according to the scheduling policy and transmits them to
the OI. This topic is detailed in Section 2.4. It is obvious that the data path in case of
congestion is computationally more demanding and will cause larger latency and a decrease
of forwarding performance.
[Figure: outgoing interface data path with a congestion decision (Congestion? +/-), a sorter, a scheduler, MAC, and PHY.]
Figure 2.12: Router outgoing interface
Further Issues Influencing Performance
Apart from the key architectural features, there are further issues which have to be
considered but which are difficult to capture in a formal description of a router. The
following summary provides a closer look at such issues.
Forwarding Cache Initialisation. The first incoming packet of a new flow is subject
to extensive forwarding latency even if a fast-forwarding implementation based on
route caching is employed. There are two main scenarios in which extensive latency is
experienced.
Firstly, if the packet is to be forwarded via a further router, the particular destination IP
address is not yet stored in the route cache; hence, the common routing table must be
consulted. This operation is performed by the CPU. Only then can the destination MAC address
be overwritten with the MAC address of the incoming interface of the adjacent hop.
Secondly, if the packet is to cross the last hop and arrive at the destination device,
the destination MAC address must be overwritten with the MAC address of the destination
device. This piece of information is not stored in the routing table. Therefore, unless
the address is already cached, an Address Resolution Protocol (ARP) request must be broadcast
to retrieve the MAC address corresponding to the destination IP address. Routers are usually
not optimised for such operations. Hence, extensive latency overhead is
experienced again.
After the route cache and the ARP cache have been initialised, packet latencies become ergodic
in time. Still, cache entries expire after preconfigured intervals and have to be updated. A
typical ARP cache retention time at routers is 5 minutes.
Network Control Traffic. As will be shown in Section 2.4, the control traffic
exchanged among network devices is marked with the highest priority. Hence, any control
traffic can overtake even high-priority run-time traffic exchanged among end devices
if queuing mechanisms providing priority treatment are employed. Fortunately, control
traffic is not voluminous and would probably not cause queue starvation. However, in
case of a collapsing network or denial-of-service attacks, extensive volumes of control traffic
have to be expected.
Hardware Ports and Switch Virtual Interfaces. Mid-range Cisco routers contain
WAN ports which are connected to the Control Plane via a bus. Further, they may
contain an in-built switch with several ports (4, 8, 16, or 32). WAN ports are usually
implemented in hardware, either on the same board as the rest of the router or as a separate
plug-in module. They provide full L3 functionality, i.e., they can be assigned an IP address.
Hence, each of them can form a separate network segment.
[Figure: router with a Control Plane, two WAN ports (FE0, FE1), PHY blocks, an SVI, and an in-built switch with switch ports FE2 to FE9.]
Figure 2.13: Switch virtual interface and hardware ports
The ports of the in-built switch, on the other hand, cannot be assigned an IP address. Cisco IOS offers a way to
form subdomains on these ports using the Switch Virtual Interface (SVI) feature. In such a
case, the administrator must create a VLAN and assign it an IP address and an IP range by a mask.
Any port of the in-built switch can then be assigned to the VLAN subdomain. From
the commissioning point of view, the problem is solved. However, from the performance
point of view, no performance guarantee can be expected. Such a port cannot make
use of the route cache implemented in hardware, and thus fast-forwarding performance
benefits are lost. Accordingly, throughput is limited, and congestion and consequent
packet losses appear at higher loads.
Figure 2.13 shows the port assignment of the Cisco 1812 router, where 2 ports provide
full L3 functionality and the remaining 8 ports are only switching interfaces. For instance,
FE0 could form a subdomain 192.168.1.0, FE1 a subdomain 192.168.2.0, and the ports
FE3, FE4, FE7, and FE8 could be switch ports of a VLAN 192.168.3.0.
2.3.3 Buffering Strategies
A decisive factor of networking device performance is the buffer location. Buffering is
used in case of traffic congestion to temporarily store the traversing frames. Chao and
Liu recognise four generic buffer locations in [15].
Output Queuing (OQ) allows the packet forwarding unit (PFU) to forward a frame
to an outgoing interface (OI) directly. The frames are queued only at the OI. This
implementation is advantageous when QoS mechanisms, such
as priority queuing, are employed. A disadvantage of this method is poor memory utilisation
and thus a costly design. Moreover, efficiency decreases with higher demands on
throughput and port count.
Shared-Memory Queuing (SMQ) accommodates a common shared memory. The
queues are dedicated to output ports, thus, it falls into a category of OQ im-
plementations. This method utilises the queue memory optimally. However, the
switch throughput is limited as the memory read/write access has a limited rate.
Consequently, SMQ is suitable for small-scale switches.
Input Queuing (IQ) queues frames coming from an incoming interface
(II) in a common queue. The PFU polls frames from the queue and forwards them
onwards only if the connection to the particular OI is available, i.e., if there
is no congestion. This implementation suffers from head-of-line (HOL) blocking:
all packets queued behind a blocked frame have to wait for its transmission
regardless of whether their own connection to the OI is available.
This architecture is valued in high-throughput PFUs with either high
rates or a high number of ports [15]. As no buffers are considered at the OIs,
special algorithms have to be employed to resolve contention. Most of these algorithms
are matching-based (Maximum Weight Matching (MWM), Maximum Size Matching
(MSM), randomised matching algorithms, frame-based matching algorithms,
etc.). Details can be found in Chapter 7 in [15].
Virtual Output Queueing (VOQ) is a special case of IQ which avoids HOL blocking.
Hence, VOQ is the most practical implementation of IQ. In principle, each of the N inputs
maintains N queues, each dedicated to one of the N OIs. Consequently, the switch
accommodates N × N queues. The algorithms used are the same as with IQ.
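The difference between plain IQ and VOQ can be illustrated by a small slotted model. The following Python sketch is only an illustration under simplifying assumptions that are not taken from [15]: time is slotted, every input sends at most one packet per slot, every output accepts at most one packet per slot, and a simple greedy matching in input order stands in for the matching algorithms named above. The function names are hypothetical.

```python
from collections import deque

def drain_fifo_inputs(inputs):
    """Slots needed to empty the switch when each input is one FIFO queue.

    inputs: one list per input port holding the destination output of each
    queued packet, head of the queue first.
    """
    queues = [deque(q) for q in inputs]
    slots = 0
    while any(queues):
        busy_outputs = set()
        for q in queues:
            # Only the head-of-line packet is eligible in a FIFO input queue.
            if q and q[0] not in busy_outputs:
                busy_outputs.add(q.popleft())
        slots += 1
    return slots

def drain_voq_inputs(inputs, n_outputs):
    """Slots needed when each input keeps one queue per output (VOQ)."""
    # voqs[i][o] counts the packets waiting at input i for output o.
    voqs = [[q.count(o) for o in range(n_outputs)] for q in inputs]
    slots = 0
    while any(any(row) for row in voqs):
        busy_outputs = set()
        for row in voqs:
            # Greedy matching (assumption): first non-empty VOQ with a free output.
            for o in range(n_outputs):
                if row[o] and o not in busy_outputs:
                    row[o] -= 1
                    busy_outputs.add(o)
                    break
        slots += 1
    return slots

if __name__ == "__main__":
    demand = [[0, 1], [0, 1]]  # both inputs hold a packet for output 0, then one for output 1
    print(drain_fifo_inputs(demand), drain_voq_inputs(demand, 2))
```

With this demand, the FIFO variant needs one slot more than the VOQ variant, because the second input is blocked by its head-of-line packet although output 1 is idle.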
2.4 Quality of Service
Quality of Service (QoS) can be defined in numerous ways. However, I will most often
refer to the terms defined by Cisco, an originator of QoS mechanisms and the owner
of numerous patents on their implementations.
QoS refers to the capability of a network to provide better service to selected network
traffic over various technologies [28]. The primary goal of QoS is to provide priority
including dedicated bandwidth, controlled jitter and latency (required by some real-time
and interactive traffic), and improved loss characteristics. Also important is making sure
that providing priority for one or more flows does not make other flows fail [28].
As Firoiu et al. introduce in [27], there are two general drivers for QoS
in communication networks:
• Applications with stringent QoS requirements need performance bounds in terms of, e.g.,
latency and bandwidth. QoS mechanisms are able to provide such bounds under
the conditions introduced further in this section. Examples of these applications
are VoIP and IPTV.
• Competition among the provided services and the lack of bandwidth form a market
challenge to provide premium services offering corresponding service levels based
on service level agreements (SLA). For example, virtual leased lines represent a
special class of virtual private network providing bandwidth, delay, jitter, and loss-rate
guarantees.
From the industrial automation perspective, the categorisation of motivations is the
same. There are domains requiring strict performance guarantees, which correspond to
the former case. On the other hand, in case of bottlenecks caused especially by heavy
flow aggregation, the competing traffic has to be differentiated, which corresponds to the
latter case in telecommunications.
The main difference between QoS in telecommunications and in industrial automation
is the scale of the QoS parameters. While guarantees in telecommunications are most often
in the range of tens to hundreds of milliseconds, sub-millisecond guarantees are common
in single-segment real-time communication in industrial automation.
A common set of metrics is defined to compare the QoS level delivered by the QoS
mechanisms. The most frequently used quadruple of metrics for QoS evaluation is:
Bandwidth is the service capacity of the link. While bandwidth represents the physical
capacity, throughput represents the rate at which traffic is successfully transmitted,
thus accounting for losses and retransmissions.
Packet Delay is the delay of a packet from the source node to the destination node. The
delay is composed of the transport delay and the service latencies of the service
nodes on the path. The delay is defined in [15] as an α-fractile of the packet delay
distribution, where α is the chosen level of significance.
Jitter is the difference between the maximum and the minimum delay. Contrary to the
standard deviation of delay, jitter does not reflect the probability of occurrence of the
maximum or minimum delay. From this point of view, the jitter parameter is very
strict.
Packet Loss is the ratio of the number of lost packets to the number of sent packets,
expressed in percent.
Chao and Liu introduce in [15] further metrics such as Throughput, Residual Error
Rate (RER), Spurious Packet Rate, and Availability.
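As an illustration of the delay, jitter, and loss definitions above, the following Python sketch evaluates them from a set of measured one-way delays. It is a minimal example under stated assumptions: the function name is hypothetical, the α-fractile is estimated by the nearest-rank method (one of several possible estimators, not prescribed by [15]), and every packet without a recorded delay is counted as lost.

```python
import math

def qos_metrics(delays_ms, sent, alpha=0.99):
    """Return (alpha-fractile of delay, jitter, packet loss in percent).

    delays_ms: one-way delays of the delivered packets in milliseconds
    sent:      number of packets sent by the source
    alpha:     chosen level for the delay fractile
    """
    ordered = sorted(delays_ms)
    # Nearest-rank estimate of the alpha-fractile (assumed estimator).
    rank = max(1, math.ceil(alpha * len(ordered)))
    delay_fractile = ordered[rank - 1]
    # Jitter as defined above: maximum minus minimum delay.
    jitter = ordered[-1] - ordered[0]
    # Packets that never arrived are counted as lost.
    loss_percent = 100.0 * (sent - len(ordered)) / sent
    return delay_fractile, jitter, loss_percent

if __name__ == "__main__":
    print(qos_metrics([3.0, 1.0, 4.0, 2.0], sent=5, alpha=0.5))
```

Note that a single outlier changes the jitter but hardly affects the fractile, which is the reason why the jitter parameter is described as very strict.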
2.4.1 Approaches and Mechanisms
There are two general mechanisms providing a required QoS level.
[Figure: a DS domain interconnecting Network 1, Network 2, and Network 3. DS domain edge nodes apply Cla, TS, and CA; DS domain core nodes apply CM and CA. Legend: Cla: Classification, TS: Traffic Shaping, CA: Congestion Avoidance, CM: Congestion Management.]
Figure 2.14: DiffServ architecture
Integrated Services (IntServ) is an end-to-end flow-based mechanism providing QoS
[15]. IntServ reserves resources for the given flow along the path.
Every hop along the path must confirm the reservation and keep track of it
throughout its existence. IntServ can provide an excellent QoS level but is not feasible
in larger networks: keeping track of all reservations for every single flow causes
significant overhead as the network grows. Therefore, it is less often employed
in present-day networks. The Resource Reservation Protocol (RSVP) is used to implement
this mechanism.
Differentiated Services (DiffServ) is a class-based mechanism. DiffServ priority
treatment is coarser as it does not recognise flows, but their aggregates. The
class information is carried in the Differentiated Services Code Point (DSCP)
in the IP packet header; neither signalling nor reservations are used. All flows falling
into the same class are treated according to the same per-hop behaviour (PHB).
The DiffServ architecture was established by [12] and updated by [35].
From [27] and [15] it is obvious that DiffServ reflects the trends in internetworking
QoS much better. Therefore, a decision was made to employ DiffServ for the application in
industrial automation, and the investigations hereafter are focused on DiffServ.
The scope within which the DiffServ classes are offered the same PHB is denoted as a
DiffServ domain. Traffic traversing a DiffServ domain enters the domain at a DiffServ
ingress node and leaves the domain at a DiffServ egress node, both referred to as DiffServ
boundary nodes. A DiffServ network uses four types of mechanisms to provide
the expected traffic treatment: Classification, Traffic Shaping, Congestion Management,
and Congestion Avoidance. Each mechanism is positioned in the DiffServ domain as depicted
in Figure 2.14. The functionality of the mechanisms is explained in the following
subsections.
2.4.2 Classification
Classification is the process of labelling an incoming packet with a proper DSCP code.
Classification consists of two steps: identification and marking. Classification is performed
at a DiffServ ingress node, typically an L3 switch or an edge router. An emerging
trend is to classify the traffic directly at the traffic source, i.e., at the connected device
itself.
Identification represents the assignment of the packet to one of the service classes defined
within a DiffServ domain. Network Based Application Recognition (NBAR) is an identification
mechanism which can identify traffic based on the packet content, including protocol
types. Policy Based Routing (PBR) also analyses the packet content and can assign the
traffic to a particular class. The most straightforward method of flow identification
is the access control list (ACL). ACLs are sets of rules defined on a networking device
by the administrator. If an incoming packet matches a rule, it is assigned to the appropriate
class. Otherwise, the packet is assigned to a default class with a default
priority.
Marking refers to the insertion of a particular DiffServ code into the DSCP field in the
IP packet header. The DSCP code consists of 6 bits, and their mapping was defined
by [46]. A comprehensive summary can also be found in [21, 1167] and [15, 127]. The
mapping is shown in Table 2.3. DSCP recognises two types of PHB: assured forwarding
and expedited forwarding.
Assured forwarding (AF) is used to provide the traffic with assured services. AF
defines four classes. The meaning of the AF classes depends on the forwarding strategy
of the DiffServ domain. Each AF class is further split into 3 subclasses, each of which
represents a different drop precedence. The drop precedence provides additional information
on which packets should be discarded first in case of node congestion. Hence, AF
provides finer granularity than its predecessors, Type of Service (ToS) and IP Precedence,
discussed further.
Expedited forwarding (EF), sometimes also called a fifth AF class, is dedicated to
premium services with low-delay and low-jitter requirements. The EF class is not further
divided into subclasses. Since the EF class is offered hard QoS guarantees,
there is a risk of starvation of other classes, which cannot be allowed given the purpose
of QoS. Therefore, EF traffic exceeding the agreed bandwidth and shape is discarded
in case of congestion.
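The AF and EF code points follow a regular bit pattern: the three most significant DSCP bits carry the class, which coincides with the IP precedence value, and for AF the next two bits carry the drop precedence. The following Python sketch reproduces the decimal values listed in Table 2.3; the function and constant names are hypothetical and serve illustration only.

```python
EF_DSCP = 0b101110  # expedited forwarding, decimal 46

def af_dscp(af_class, drop_precedence):
    """DSCP value of AF<af_class><drop_precedence>, e.g. AF32 -> 28."""
    if not 1 <= af_class <= 4:
        raise ValueError("AF defines classes 1 to 4")
    if not 1 <= drop_precedence <= 3:
        raise ValueError("drop precedence is 1 (low) to 3 (high)")
    # Bits 5..3 hold the class, bits 2..1 the drop precedence, bit 0 is zero.
    return (af_class << 3) | (drop_precedence << 1)

def ip_precedence(dscp):
    """IP precedence seen by a legacy device: the three most significant bits."""
    return dscp >> 3

if __name__ == "__main__":
    for cls in range(1, 5):
        print([af_dscp(cls, drop) for drop in range(1, 4)])
    print(EF_DSCP, ip_precedence(EF_DSCP))
```

The second function shows the backward compatibility of the mapping: a device which evaluates only the precedence bits still ranks EF above AF4, and AF4 above AF1.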
It can be seen in Table 2.3 that the highest DSCP priorities are not bound to any
PHB class and that they belong to the control traffic. This ensures that the protocol
signalling and network management is always treated with the highest priority. This has
to be taken into account in case of stringent QoS requirements.
The DSCP mapping is designed with regard to backward compatibility with its predecessors,
the ToS and IP precedence fields defined in [1]. The IP precedence field defines 7 priority
classes directly corresponding to AF1 through AF4, EF, and the reserved classes for traffic
management. ToS defines the specific requirements of the traffic class, such as maximum
reliability or minimum delay. The bit ordering is shown in Table 2.4. Although the
concept of ToS tagging is obsolete, it must be understood, as some network devices
still honour its classification.

Table 2.3: IP precedence and DSCP mapping to DiffServ classes
IP Precedence (3 bits): Name, Value, Bits. DSCP (6 bits): PHB, Class selector, Drop precedence, CP name, DSCP bits (decimal).
Name                  Value  Bits  PHB      Class selector  Drop precedence  CP name  DSCP bits (decimal)
Routine               0      000   Default  -               -                Default  000 000 (0)
Priority              1      001   AF       1               1: Low           AF11     001 010 (10)
                                                            2: Medium        AF12     001 100 (12)
                                                            3: High          AF13     001 110 (14)
Immediate             2      010   AF       2               1: Low           AF21     010 010 (18)
                                                            2: Medium        AF22     010 100 (20)
                                                            3: High          AF23     010 110 (22)
Flash                 3      011   AF       3               1: Low           AF31     011 010 (26)
                                                            2: Medium        AF32     011 100 (28)
                                                            3: High          AF33     011 110 (30)
Flash Override        4      100   AF       4               1: Low           AF41     100 010 (34)
                                                            2: Medium        AF42     100 100 (36)
                                                            3: High          AF43     100 110 (38)
Critical              5      101   EF       5               -                EF       101 110 (46)
Internetwork Control  6      110   -        -               -                -        48-55
Network Control       7      111   -        -               -                -        56-63

Table 2.4: IP precedence and ToS fields
Bit    0  1  2  3  4  5  6  7
Field  IP Precedence | Type of Service (ToS) | Unused
2.4.3 Traffic Shaping and Policing
Traffic shaping (TS) is used at a DiffServ edge. The traffic transport over a DiffServ
domain is typically subject to an SLA. SLAs define the traffic shape and the transport
policy. Therefore, the customer uses traffic shaping, i.e., delaying of traffic, to conform
to the burstiness and bandwidth constraints. If TS fails to provide conformance, a traffic
policing (TP) mechanism takes corrective measures at the ingress to the DiffServ domain
by dropping the non-conforming traffic. A typical TS mechanism is Generic Traffic Shaping
(GTS), and a typical TP mechanism is Committed Access Rate (CAR). Comprehensive information
can be found in [28], [21], and [15].
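The difference between shaping (delaying) and policing (dropping) can be sketched with a token bucket, which is a common way to express burstiness and bandwidth constraints. The Python example below is a minimal sketch and not a description of how GTS or CAR are implemented; the function names, the units (bytes and seconds), and the assumption that no packet is larger than the bucket depth are illustrative choices.

```python
def police(packets, rate, burst):
    """Token-bucket policer: non-conforming packets are dropped.

    packets: list of (arrival_time_s, size_bytes), sorted by arrival time
    rate:    token fill rate in bytes per second
    burst:   bucket depth in bytes
    Returns one boolean per packet (True = forwarded, False = dropped).
    """
    tokens, last = burst, 0.0
    verdicts = []
    for arrival, size in packets:
        tokens = min(burst, tokens + rate * (arrival - last))
        last = arrival
        conforming = size <= tokens
        if conforming:
            tokens -= size
        verdicts.append(conforming)
    return verdicts

def shape(packets, rate, burst):
    """Token-bucket shaper: non-conforming packets are delayed, none is dropped.

    Assumes size <= burst for every packet. Returns the release times.
    """
    tokens, clock = burst, 0.0
    releases = []
    for arrival, size in packets:
        now = max(arrival, clock)  # packets leave in arrival order
        tokens = min(burst, tokens + rate * (now - clock))
        if size > tokens:
            now += (size - tokens) / rate  # wait for the missing tokens
            tokens = size
        tokens -= size
        clock = now
        releases.append(now)
    return releases

if __name__ == "__main__":
    traffic = [(0.0, 1500), (0.0, 1500), (3.0, 1500)]
    print(police(traffic, rate=1000.0, burst=1500))
    print(shape(traffic, rate=1000.0, burst=1500))
```

For the same back-to-back burst, the policer drops the second packet whereas the shaper releases it 1.5 s later, which is the behaviour expected at the customer side and at the domain ingress, respectively.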
2.4.4 Congestion Management
Congestion management (CM) is the core of the DiffServ architecture. Note
that CM does not apply its measures unless congestion occurs. In other words, if
packets arrive at a network node at such a rate that they can be processed at wire
speed, no priority treatment is necessary. On the contrary, if the incoming traffic rate exceeds
the processing capacity of the node, packet queuing starts.
Hence, queuing is the first task of CM. With queuing disciplines other than FIFO, there
is more than one queue. Packets are buffered in the appropriate queues based on the DSCP
code in the packet.
The second part of CM is packet scheduling, i.e., polling of packets from
the queues and passing them either directly to the outgoing interface or to an output buffer.
Various algorithms can be applied to packet scheduling. Choosing a proper
packet-scheduling algorithm is a trade-off between efficiency, fairness, determinism, and
starvation of low-priority traffic. The main principles of packet scheduling are introduced
in the remainder of this subsection, ranging from the simplest to the most complicated
ones tailored to specific applications.
First-In-First-Out Queuing
First-in-first-out (FIFO) queuing/scheduling is a generic queuing algorithm where neither
differentiation nor prioritisation is employed. FIFO uses a single queue. FIFO is
the intrinsic queuing algorithm if no QoS capabilities are available in the regarded
device. Moreover, it is employed as the default queuing algorithm in more powerful devices
unless configured otherwise.
Priority Queuing
Priority Queuing (PQ), also referred to as Strict Priority (SP) queuing, is an extreme
scheduling algorithm. The scheduler serves the queues in a strict priority manner,
i.e., a higher-priority queue is served until it is empty. Only then is a lower-priority
queue served. This algorithm is able to provide perfect service to a premium QoS class,
such as EF, but exposes all other classes to starvation. For this reason, this algorithm is
recommended only for special applications.
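The starvation effect of strict priority scheduling can be demonstrated with a few lines of Python. The sketch below assumes a slotted model in which exactly one packet is transmitted per slot; the function name and the packet labels are hypothetical.

```python
import heapq

def serve_strict_priority(arrivals, n_slots):
    """Return the transmission order under strict priority scheduling.

    arrivals: dict mapping a slot number to a list of (priority, label);
              a lower priority number means a more important packet
    n_slots:  number of slots to simulate, one packet is sent per slot
    """
    backlog, order, seq = [], [], 0
    for slot in range(n_slots):
        for priority, label in arrivals.get(slot, []):
            # seq keeps FIFO order among packets of the same priority.
            heapq.heappush(backlog, (priority, seq, label))
            seq += 1
        if backlog:
            order.append(heapq.heappop(backlog)[2])
    return order

if __name__ == "__main__":
    arrivals = {0: [(1, "low"), (0, "ef1")], 1: [(0, "ef2")], 2: [(0, "ef3")]}
    print(serve_strict_priority(arrivals, n_slots=4))
```

Although the low-priority packet arrives first, it is transmitted only after the high-priority queue has run empty; with a sustained high-priority load it would never be served.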
Round Robin Scheduling
Round Robin (RR) scheduling delivers a basic approach to long-term fairness. RR serves
the queues in a cyclic manner and in its generic form treats all flows equally. There are
several modifications of the RR scheduler.
Round Robin (RR) as a generic algorithm exhibits significant lag when the number
of queues is very large. Therefore, modifications exist which maintain the indices of
non-empty queues in a departure queue, so that only the non-empty queues are served.
The algorithm is suitable for fixed-length cells.
Weighted Round Robin (WRR) extends the RR algorithm by weights assigned to
each queue. Given that every flow $i$ is assigned a weight $w_i$, the algorithm assigns
the flow $i$ the rate
\[ g_i = \frac{w_i}{\sum_j w_j}. \]
The assigned rate determines how many packets will be polled from a queue in
one round. Departure queues are used in the same way as with RR. Again, this
algorithm is suitable for fixed-length cells.
Deficit Round Robin (DRR) is an extension for variable-length packets. Fairness
is delivered by a deficit counter which accounts for the deficit of each queue. A
deficit arises when the scheduler declines to serve a queued packet because
the dedicated rate is too low to serve a packet of the given length. The packet is
polled once the rate together with the deficit counter is sufficient.
The RR schedulers are said to be packet-wise fair, as fairness is met from a long-term
perspective. However, fairness is not guaranteed in IP networks due to different
packet lengths. On that account, RR is not suitable in case of stringent QoS parameter
requirements. On the other hand, the algorithm performs well in Asynchronous Transfer
Mode (ATM) networks, where the cells (see footnote 2) are of equal length [15, 138]. Cisco refers to
these bandwidth provisioning algorithms as Custom Queuing (CQ) in [28].
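The deficit counter mechanism of DRR described above can be sketched as follows. The Python example follows the commonly published form of the algorithm, in which each backlogged queue is credited a quantum per round; it assumes positive quanta and no new arrivals during service, and the function name is hypothetical.

```python
from collections import deque

def deficit_round_robin(queues, quanta):
    """Return the service order of a DRR scheduler as (queue index, size).

    queues: one list of packet sizes in bytes per queue, head first
    quanta: bytes credited to each backlogged queue per round (must be > 0)
    """
    pending = [deque(q) for q in queues]
    deficit = [0] * len(queues)
    order = []
    while any(pending):
        for i, q in enumerate(pending):
            if not q:
                continue
            deficit[i] += quanta[i]
            # Serve packets while the accumulated credit covers the head packet.
            while q and q[0] <= deficit[i]:
                size = q.popleft()
                deficit[i] -= size
                order.append((i, size))
            if not q:
                deficit[i] = 0  # an empty queue does not accumulate credit
    return order

if __name__ == "__main__":
    print(deficit_round_robin([[600, 600], [1500]], quanta=[500, 500]))
```

In this example, the queue with the 1500-byte packet has to wait three rounds until its credit is sufficient, whereas the shorter packets of the other queue are served from the second round on.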
Generalised Processor Sharing Algorithms
Generalised Processor Sharing (GPS) is an algorithm which provides ideal fairness among
the competing flows. It allocates the whole service capacity to the backlogged flows using
a max-min allocation scheme [15, 141]. The algorithm is based on a fluid model, i.e., the
data units can be infinitesimally short. The principle of the scheduler follows.
Let us have a set of flows $F = \{F_i,\ i = 1 \ldots N\}$. Each flow is assigned a
minimum rate $r_i$ with the bounding condition $\sum_{i=1}^{N} r_i \le r$, where $r$ is the service rate of
the scheduler. Let $B(t)$ be the set of flows backlogged at time $t$. According to the GPS
definition in [15, 144], a backlogged session $i$ will be allocated the service rate
\[ g_i(t) = \frac{r_i}{\sum_{j \in B(t)} r_j} \times r. \]
GPS cannot be realised in its generic form because real packets have a finite length and need to
be served as a whole. However, many practical implementations exist. They are based on a
global virtual clock function which tracks the progress of GPS. The packets are scheduled
in increasing order of their virtual finish times. The virtual finish time determines when
the service of the packet would be finished [15, 146].
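The GPS rate allocation can be evaluated directly from the formula for the service rate of a backlogged session. The following Python sketch is a small numerical illustration; the function name and the example values are hypothetical.

```python
def gps_rates(min_rates, backlogged, r):
    """Service rate g_i(t) of every backlogged flow under GPS.

    min_rates:  dict flow -> minimum rate r_i, with sum(r_i) <= r
    backlogged: set of flows backlogged at time t, i.e. B(t)
    r:          service rate of the scheduler
    """
    total = sum(min_rates[j] for j in backlogged)
    # The whole capacity r is shared among the backlogged flows in
    # proportion to their minimum rates.
    return {i: r * min_rates[i] / total for i in backlogged}

if __name__ == "__main__":
    rates = {"a": 2.0, "b": 3.0, "c": 5.0}
    print(gps_rates(rates, {"a", "b", "c"}, r=10.0))
    print(gps_rates(rates, {"a", "b"}, r=10.0))
```

When flow c is idle, its share is redistributed: flows a and b receive 4 and 6 units instead of 2 and 3, so no capacity is left unused while any flow is backlogged.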
Footnote 2: In IP networks, we refer to packets. In ATM-type networks and synchronous networks, where the
traffic is provided with a slot to insert data, we speak of cells.
Weighted Fair Queuing (WFQ) is the most powerful implementation of GPS. Cisco
recommends WFQ as the default scheduling algorithm if QoS is required. WFQ
performs well even without classification of flows by DSCP. As shown in [21, 470],
WFQ can maintain up to 4096 queues and needs no configuration in common applications.
The functional difference from GPS stems from the fact that the scheduler
cannot switch to a later packet while an earlier packet is being transmitted, even
though the later packet may have a lower virtual finish time. Let $t_j$ be the time of the
$j$-th arrival or departure event, let $B_j$ be the set of flows backlogged between $t_{j-1}$
and $t_j$, and let $\tau$ be the time elapsed since the event at $t_{j-1}$. Then
the virtual time of the scheduler evolves as
\[ V(0) = 0, \]
\[ V(t_{j-1} + \tau) = V(t_{j-1}) + \frac{r\,\tau}{\sum_{i \in B_j} r_i}, \quad \text{for } \tau \le t_j - t_{j-1},\ j = 2, 3, \ldots \tag{2.1} \]
It can be deduced from the definition of the virtual time that the higher the load,
the slower the virtual time flows. More precisely, the speed of the virtual
time flow is
\[ \frac{\mathrm{d}V(t_{j-1} + \tau)}{\mathrm{d}\tau} = \frac{r}{\sum_{i \in B_j} r_i}. \]
The motivation for this is obvious: a higher load causes slower traffic processing,
which has to be reflected in the virtual finish time. Finally, the virtual finish time of
the $k$-th packet of the $i$-th flow is
\[ F_{i,k} = \max\{F_{i,k-1},\ V(a_{i,k})\} + \frac{L_{i,k}}{r_i}, \]
where $a_{i,k}$ is the arrival time of the packet and $L_{i,k}$ is its length.
Virtual Clock Queuing (VCQ) is a simplified version of WFQ. VCQ takes real
time as the global virtual time, hence $V^{VC}(t) = t$ for $t \ge 0$. Therefore, the virtual
finish time estimation reduces to
\[ F_{i,k} = \max\{F_{i,k-1},\ a_{i,k}\} + \frac{L_{i,k}}{r_i}. \]
Despite its simplicity, this algorithm has a drawback in that it ignores the scheduler
load. As a result, a flow may receive limited service even though it did not displace
any other traffic. It is shown in [15, 151] how a flow is penalised in scheduling
when a previously idle flow starts to transmit.
Self-Clocked Fair Queuing (SCFQ) is a version of WFQ in which the global virtual
time is updated as $V^{SCFQ}(t) = F_{j,l}$ if the $l$-th packet of session $j$ departs at time
$t \ge 0$. Therefore, the virtual finish time is
\[ F_{i,k} = \max\{F_{i,k-1},\ V^{SCFQ}(a_{i,k})\} + \frac{L_{i,k}}{r_i}. \]
SCFQ performs better than VCQ as its virtual time progress is closer to that
of WFQ. However, while $V^{VC}(t) \le V(t)$, the same does not hold for $V^{SCFQ}(t)$.
Consequently, the latency of the SCFQ scheduler can be higher than that of the WFQ
scheduler.
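The virtual time and virtual finish time relations of the GPS implementations above can be turned into a few helper functions. The Python sketch below is only an illustration under stated assumptions: the names are hypothetical, the initial finish time of every flow is taken as 0, and only the Virtual Clock case with V(t) = t is evaluated end to end, because a complete WFQ implementation would additionally have to track the backlogged set of the GPS reference system.

```python
def advance_virtual_time(v_prev, tau, r, backlogged_rates):
    """One step of Equation (2.1): V grows with slope r / sum of backlogged r_i."""
    return v_prev + r * tau / sum(backlogged_rates)

def finish_tag(prev_finish, virtual_arrival, length, rate):
    """Virtual finish time F_{i,k} = max(F_{i,k-1}, V(a_{i,k})) + L_{i,k} / r_i."""
    return max(prev_finish, virtual_arrival) + length / rate

def virtual_clock_tags(packets, rates):
    """Finish tags under Virtual Clock, where the virtual time equals real time.

    packets: list of (flow, arrival_time, length) in arrival order
    rates:   dict flow -> reserved rate r_i
    """
    last = {}
    tags = []
    for flow, arrival, length in packets:
        tag = finish_tag(last.get(flow, 0.0), arrival, length, rates[flow])
        last[flow] = tag
        tags.append(tag)
    return tags

if __name__ == "__main__":
    packets = [("a", 0.0, 1000), ("a", 0.0, 1000), ("b", 0.5, 500)]
    print(virtual_clock_tags(packets, {"a": 1000.0, "b": 500.0}))
    print(advance_virtual_time(0.0, 2.0, 10.0, [2.0, 3.0]))
```

Serving the packets in increasing order of their tags interleaves the packet of flow b between the two packets of flow a, although it arrived last.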
There are a few more scheduling algorithms, such as Worst-Case Fair Weighted Fair Queuing
(WF2Q) and its extended version WF2Q+, which are, however, beyond the scope of my
work. As will be shown in Chapter 4, the scheduling algorithms of the greatest significance
and widest availability in routers are FIFO, PQ, and WFQ. These scheduling
algorithms will also be considered in Chapter 6.
2.4.5 Congestion Avoidance
Congestion Avoidance (CA) mechanisms are supporting DiffServ mechanisms which prevent
congestion. Router buffers are of finite length. For instance, the Cisco 2610 Series
has a 75-packet-long input buffer. The idea of CA mechanisms is to diminish the risk of
congestion before it appears by discarding queued packets.
The intrinsic mechanism is Tail Drop (TD), which is enforced by the queue lengths
themselves. The administrator can enforce a drop precedence by engineering different
queue lengths for different classes. This is usually done with PQ scheduling, where the
highest-priority queue is the shortest one to diminish starvation.
Random Early Detection (RED) is a mechanism which randomly picks queued packets
and discards them. There are several modifications of this algorithm, such as Fair RED
(FRED), Stabilised RED (SRED), and Weighted RED (WRED), which can be found in [28],
[21], or [15].
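The random discarding of RED is usually controlled by a drop probability which grows with the average queue length. The following Python sketch shows the drop profile of the classic RED formulation; the two thresholds and the maximum probability are parameters of that formulation and are not taken from the text above, and the function name is hypothetical.

```python
def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Drop probability of classic RED as a function of the average queue length.

    avg_queue: averaged queue length in packets
    min_th:    below this threshold no packet is dropped
    max_th:    at or above this threshold every packet is dropped
    max_p:     drop probability reached just below max_th
    """
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    # Linear increase between the two thresholds.
    return max_p * (avg_queue - min_th) / (max_th - min_th)

if __name__ == "__main__":
    for avg in (10, 40, 60):
        print(avg, red_drop_probability(avg, min_th=20, max_th=60, max_p=0.1))
```

Tail Drop corresponds to the degenerate case in which both thresholds coincide with the queue length limit; WRED can be thought of as using a separate parameter set per class or drop precedence.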
2.5 Analytical Framework for QoS Modelling
Many frameworks exist for performance analysis of communication networks and net-
working devices. In the following, reasoning for the choice of network calculus for the
purpose of the performance modelling of industrial communication is provided.
For instance, Manita et al. in [44] provide a Markov model of Ethernet Switch. They
show the statistical behaviour of the device depending on the incoming flow characterist-
ics and inner switch architecture. Markov models are probabilistic, and thus, they offer
rather statistical results than hard-real-time guarantees.
Srikant introduced in [57] a framework for the analysis of Internet congestion, methods
to analyse network stability, and traffic shaping algorithms which diminish congestion
in a feedback-control manner. Srikant's approach is close to worst-case analysis
in some aspects, e.g., see [57, 17]. Nevertheless, the idea of feedback traffic control
cannot be used in industrial communication, where the communication patterns are usually
given in advance. Moreover, the investigations are focused on the TCP transport layer (L4),
while real-time industrial communication is based predominantly on UDP.
Temporal logic is a method used for real-time system modelling which provides
worst-case execution time (WCET) analysis. It is an extension of propositional logic.
Temporal logic adds new operators (see [4]) and thus considers temporal aspects of the
modelled system. Temporal logic could outperform other approaches provided that
detailed knowledge of the modelled system is available. Therefore, the method is not
suitable for modelling network devices, whose complexity is enormous and whose inner
architecture is not known precisely. The approach presented here uses grey-box modelling,
i.e., the system architecture and its parameters are partially known, but investigation of
input-output observations is still important for successful modelling.
Network calculus is a theory of deterministic queuing systems found in computer
networks [13]. It is a system theory that provides an analytical insight into
the performance of communication networks. Network calculus is a worst-case
analysis; thus, it offers upper bounds on the flow characteristics and the network performance.
The basics of network calculus were defined by Cruz in [18] and [19], where it is referred
to as a calculus for network delay. Cruz's approach uses convex optimisation to infer
characteristics of the corresponding service elements, i.e., shapers, buffers, etc. Le Boudec
and Thiran brought network calculus to its current state by basing it on min-plus
algebra. Chang provides further extensions in [14]. A comprehensive summary of the evolution
of network calculus can be found in [27]. Network calculus has been further extended
by several research groups in different respects. For instance, Giacomazzi and Saddemi
propose in [34] a novel framework, bounded-variance network calculus, which utilises
min-plus algebra for statistical upper-bound evaluation. Echagüe and Cholvi define in
[23] service curves for complementary scheduling policies, such as Earliest Deadline First
(EDF). Ayyorgun and Cruz propose in [2] and [3] component service algorithms for systems
with losses. Schmitt and Zdarsky propose new concatenation approaches in order
to obtain tighter worst-case bounds in [52] and [53] and define sufficiently strict service
curves allowing analysis of non-FIFO systems in [54]. Furthermore, they provide DISCO
Network Calculator in and [55]. Fidler proposes extended pay-burst-only-once principle
in [25] and extends network calculus to be able to model complex infrastructures in [26].
2.5.1 Introduction to Network Calculus and (min,+) Algebra
Network calculus is based on min-plus algebra. Min-plus algebra is a commutative dioid
formed by a structure (R∪{+∞},∧,+), while in traditional algebra we work in a struc-
ture (R,+,×), which is a commutative field. Min-plus algebra has various interesting
properties which can be found in Chapter 3 in [13].
In control theory and electrical engineering, one is used to working with signals and
systems under the concept of system theory. The theory of linear systems is the basic
subset of the system theory. One of the properties of a linear system is the principle
of superposition: response to the sum of two input signals is equal to the sum of the
responses of the system to two input signals transmitted separately. This is the main
prerequisite to the proposition of signal convolution defined as
\[ y(t) = (h \otimes x)(t) = \int_{-\infty}^{\infty} h(t - \tau)\, x(\tau)\, \mathrm{d}\tau, \tag{2.2} \]
where y(t) is the response to the input signal x(t) and h(t) is the impulse response
of the system.
In network calculus, thanks to min-plus algebra, a linear system possesses the prop-
erty of superposition: system response to the minimum of two input flows is the minimum
of the responses of the system to each input taken separately [13]. An illustration of this
can be shown on a buffer. Let us have two flows described by rate functions x1(t) and
x2(t), respectively. The rate function represents the number of bits per second passing
through a point of observation at the time t. If one considers the responses of the system
to these two signals transmitted separately and takes the minimum of the two responses, it
will be equal to the response to the minimum of the two input signals.
Having satisfied the superposition property, one can also introduce signal convolution
analogically to the classical system theory; operation of summing is replaced by the
operation of minimum and the operation of multiplication is replaced by summing:
\[ R^*(t) \ge (\beta \otimes R)(t) = \inf_{0 \le s \le t} \{\beta(t-s) + R(s)\}. \tag{2.3} \]
Note that there are also differences apart from the analogies. In general, the convolution
only lower-bounds the output signal; sharp equality of the output signal holds only for some
types of service element, such as greedy shapers, see [13, 38].
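The min-plus convolution can be illustrated numerically. The following is a minimal sketch on an integer time grid, not part of the original framework; the function name `min_plus_conv` and the sample curves are assumptions chosen for illustration only.

```python
def min_plus_conv(f, g):
    """Min-plus convolution (f ⊗ g)[t] = min over 0 <= s <= t of f[t - s] + g[s].

    f and g are lists sampled on the same integer time grid 0, 1, 2, ...
    """
    n = min(len(f), len(g))
    return [min(f[t - s] + g[s] for s in range(t + 1)) for t in range(n)]


# Hypothetical rate-latency service curve: rate 2 units/slot, latency 3 slots.
beta = [2 * max(0, t - 3) for t in range(10)]
# Hypothetical cumulative input R: 3 units/slot for 4 slots, then silence.
R = [min(3 * t, 12) for t in range(10)]

# Lower bound on the cumulative output R* guaranteed by the service curve.
R_out_lower = min_plus_conv(beta, R)
print(R_out_lower)
```

Since β(0) = 0 in this example, the computed lower bound never exceeds the input R itself, which matches the intuition that a service element cannot emit more than it has received.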
In the following, a way of describing flows and systems is shown. Furthermore,
it is explained how the elementary service elements, such as delays, buffers, and
multiplexers, can be concatenated to form complex structures. Finally, it is shown how
end-to-end performance guarantees can be evaluated.
2.5.2 Arrival Curves
A flow can be characterised by a rate function r(t) representing a number of bits per
second passing through a point of observation at a moment t. In network calculus, it
is more convenient to characterise a flow by a cumulative rate function R(t) which is
defined as
\[ R(t) = \int_{0}^{t} r(\tau)\,\mathrm{d}\tau. \]
The reason for employing the cumulative rate function is that one has information of
the system history. System history is important as most of the service elements have
memory in form of unfinished work. As it always holds that r(t) ≥ 0, R(t) is a wide-sense
increasing function.
As network calculus deals with worst-case performance analysis, one works with upper
bounds of the flow and system characteristics. With flow characteristics, one defines an
arrival curve which upper-bounds the cumulative rate-function by a definition introduced
in [13, 8]:
Definition 2.1 (Arrival Curve). Given a wide-sense increasing function α defined for
t ≥ 0 (namely, α ∈ F), we say that the flow R is constrained by α if and only if for all
s ≤ t:
R(t)−R(s) ≤ α(t− s).
We say that R has α as an arrival curve, or also that R is α-smooth.
The most often used arrival curve is the affine function γr,b defined as
\[ \gamma_{r,b}(t) = \begin{cases} rt + b & \text{if } t > 0 \\ 0 & \text{otherwise,} \end{cases} \]
introduced in [13, 128]. The r parameter represents the average rate of the flow
throughout the period of observation. The b parameter represents the burstiness of the
flow, i.e., the largest backlog (virtually pending traffic) the flow may build up. An
explanation can be given using the leaky bucket principle: There is a bucket serving a
fluid flow. A γr,b bucket has the volume b and a hole in the bottom that lets the fluid flow
out at the rate r. The fluid flows into the bucket from the top. If the flow passes through
the bucket without overflowing the bucket volume b, it is γr,b-conforming. Otherwise, the
flow is γr,b-non-conforming.
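The conformance test of Definition 2.1 can be sketched directly for a sampled cumulative function. The helper names `gamma` and `is_conforming` and the sample trace are hypothetical; the check is only exact on the chosen integer grid.

```python
def gamma(r, b):
    """Affine arrival curve gamma_{r,b}: r*t + b for t > 0, 0 otherwise."""
    return lambda t: r * t + b if t > 0 else 0


def is_conforming(R, alpha):
    """Check R[t] - R[s] <= alpha(t - s) for all 0 <= s <= t on an integer grid."""
    n = len(R)
    return all(R[t] - R[s] <= alpha(t - s)
               for t in range(n) for s in range(t + 1))


# Hypothetical cumulative trace: a burst of 5 units, then 1 unit per slot.
R_cum = [0, 5, 6, 7, 8]

ok_b4 = is_conforming(R_cum, gamma(1, 4))   # bucket volume 4 absorbs the burst
ok_b3 = is_conforming(R_cum, gamma(1, 3))   # bucket volume 3 overflows
print(ok_b4, ok_b3)
```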
The α-conformance can be inherent to the nature of the flow. This is the case for
pre-scheduled cyclic traffic; pre-scheduling is a common method in cyclic industrial
communication. On the other hand, if the α-conformance is not an intrinsic characteristic
of the regarded flow, the flow can be shaped by a traffic shaper. Note that α-conformance is
a qualitative (not quantitative) feature; it is not required to shape the flow to infeasible
characteristics. The point is to determine a loose enough, but still reasonable upper bound
on the flow.
2.5.3 Service Curves
Figure 2.15: Service element (system S with input R(t) and output R∗(t))
Flow characteristics may change after the flow has passed a service element. A service
element is an atomic input-output block, depicted in Figure 2.15, which is described by a
service curve. More complex systems can be described using these service elements.
Definition 2.2, introduced in [13, 23], defines the service curve of a service element.
Definition 2.2 (Service Curve). Consider a system S and a flow through S with input
and output functions R and R∗. We say that S offers to the flow a service curve β if
and only if β ∈ F and
\[ R^*(t) \ge (R \otimes \beta)(t). \]
There are several min-plus functions representing different types of the service ele-
ments.
Constant Rate Node
A constant rate node is a service element which serves the passing flow at a constant
rate R and imposes no additional delay on it.
\[ \beta(t) = \lambda_R(t) = \begin{cases} Rt & \text{if } t > 0 \\ 0 & \text{otherwise} \end{cases} \]
Service elements having this function as their service curve can be Generalized Processor
Sharing (GPS) nodes, Guaranteed Rate (GR) schedulers, and traffic shapers.
Guaranteed Delay Node
A guaranteed delay node imposes no rate limitations, but it may impose a delay bounded
by T on the flow passing through the node.
\[ \beta(t) = \delta_T(t) = \begin{cases} +\infty & \text{if } t > T \\ 0 & \text{otherwise} \end{cases} \]
Rate-Latency Node
The rate-latency server is the most frequent service element; it combines the features
of both aforementioned service elements.
\[ \beta(t) = \beta_{R,T}(t) = R[t-T]^+ = \begin{cases} R(t-T) & \text{if } t > T \\ 0 & \text{otherwise} \end{cases} \]
Hence, it holds that βR,T (t) = (λR ⊗ δT )(t). The rate-latency service curve is the
cornerstone of the service curves and is suitable for modelling most of the
single-input-single-output (SISO) service nodes. It is also a generic service curve for
modelling of multiplexers.
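The three elementary service curves and the identity βR,T = λR ⊗ δT can be checked numerically. The sketch below evaluates the convolution on an integer grid only; the function names `lam`, `delta`, `rate_latency` and `min_plus_conv_at` are assumptions introduced for this illustration.

```python
import math


def lam(R):
    """Constant-rate curve lambda_R(t) = R*t for t > 0, 0 otherwise."""
    return lambda t: R * t if t > 0 else 0


def delta(T):
    """Guaranteed-delay curve delta_T(t) = +inf for t > T, 0 otherwise."""
    return lambda t: math.inf if t > T else 0


def rate_latency(R, T):
    """Rate-latency curve beta_{R,T}(t) = R * [t - T]^+."""
    return lambda t: R * max(0, t - T)


def min_plus_conv_at(f, g, t):
    """(f ⊗ g)(t) with the infimum taken over integer s in 0..t."""
    return min(f(t - s) + g(s) for s in range(t + 1))


# Hypothetical parameters: rate 2 units/slot, latency 3 slots.
R_rate, T_lat = 2, 3
identity_holds = all(
    min_plus_conv_at(lam(R_rate), delta(T_lat), t) == rate_latency(R_rate, T_lat)(t)
    for t in range(20)
)
print(identity_holds)
```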
Multiplexers are multi-input-single-output (MISO) service nodes. A multiplexer has
a total service curve β(t) which can be offered to the aggregate of the passing flows.
A service curve offered to a particular flow is a portion of the total service β(t). By
convention, the following text considers the service offered to the flow F1, referred
to as βF1(t). The flow F2 will be considered as a flow competing for the service of the
multiplexer. Furthermore, it will be assumed that the flows are α-smooth.
Arbitrary Multiplexer
An arbitrary multiplexer, also referred to as first-come-first-serve (FCFS) in [18] or as a
FIFO multiplexer, does not favour any flow over another. That means that both flows have
equal chances. Therefore, in the worst case, it must be considered that the service curve
βF1 is the leftover after serving the competing flow F2. The following theorem is a
modification of Corollary 2.4.1 in [13, 106]:
Theorem 2.1 (FCFS Multiplexer Service Curve). Let us have an FCFS multiplexer with
a total service curve βR,T (t) passed through by the flows F1 and F2. The service curve
offered to the flow F1 is
\[ \beta_{F_1}(t) = [\beta(t) - \alpha_2(t)]^+. \]
Provided that the flow F2 has γr2,b2 as an arrival curve, the service curve offered to the
flow F1 is
\[ \beta_{F_1}(t) = (R - r_2)\left[t - \frac{RT + b_2}{R - r_2}\right]^+ = \beta_{R - r_2,\ \frac{RT + b_2}{R - r_2}}(t). \]
Figure 2.16: Service curve of arbitrary multiplexer (axes: time t and R(t); curve β(t) with
latency T and rate R = tan(ω), curve βF1(t) with latency TF1 and rate RF1 = tan(ω1))
It is obvious from Figure 2.16 that the multiplexer introduces additional latency
caused by multiplexing. The rate offered to the flow is decreased by the rate of the
competing flow.
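The leftover rate-latency parameters of Theorem 2.1 can be computed and cross-checked against the bracket form [β(t) − α2(t)]+. This is a minimal sketch; the function name `leftover_fcfs` and the numeric values (arbitrary consistent units) are assumptions.

```python
def leftover_fcfs(R, T, r2, b2):
    """Return (rate, latency) of the service curve left to F1 per Theorem 2.1.

    R, T   : rate and latency of the total rate-latency service curve
    r2, b2 : rate and burst of the competing flow's affine arrival curve
    """
    if r2 >= R:
        raise ValueError("the competing rate r2 must be below the service rate R")
    rate = R - r2
    latency = (R * T + b2) / rate
    return rate, latency


# Hypothetical values in arbitrary consistent units.
R_tot, T_tot, r2, b2 = 100, 2, 20, 40
rate_f1, latency_f1 = leftover_fcfs(R_tot, T_tot, r2, b2)


def bracket_form(t):
    """[beta_{R,T}(t) - gamma_{r2,b2}(t)]^+ evaluated directly for t > 0."""
    return max(0, R_tot * max(0, t - T_tot) - (r2 * t + b2))


print(rate_f1, latency_f1)
```

The closed form and the bracket form agree pointwise because the leftover latency is never smaller than T, so the bracket is zero wherever β itself is zero.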
Strict Priority Multiplexer
A strict priority (SP) multiplexer favours one flow absolutely over the other. However,
there is an additional latency caused by the fact that a packet of the low-priority flow may
be in service at the moment of arrival of a packet of the high-priority flow. In that case,
the high-priority packet must wait until the low-priority packet is served. Corollary 6.2.1
introduced in [13, 208] infers the service curve for an SP multiplexer with two inputs. The
service curve was extended to multiple inputs in [30]. The following theorem is based on
these sources:
Theorem 2.2 (SP Multiplexer Service Curve). Let us have an SP multiplexer with a total
service curve βR,T (t) passed through by flows Fi, i = 1, …, n, bounded by arrival curves
αi(t), where the priority decreases with increasing i. The service curve offered to the flow
Fi is
\[ \beta_{F_i}(t) = \Big[\beta(t) - l^L_{\max} - \sum_{0<j<i} \alpha_j(t)\Big]^+, \quad i = 1, \dots, n-1, \]
\[ \beta_{F_n}(t) = \Big[\beta(t) - \sum_{0<j\le n-1} \alpha_j(t)\Big]^+, \]
where l^L_max is the maximum packet length among all flows. Provided that the flows Fi are
constrained by αi(t) = γri,bi(t) and the service curve offered to the aggregate of the flows
is β(t) = βR,T (t), the service curve offered to the flow Fi is
\[ \beta_{F_i}(t) = \Big(R - \sum_{0<j<i} r_j\Big)\left[t - \frac{RT + l^L_{\max} + \sum_{0<j<i} b_j}{R - \sum_{0<j<i} r_j}\right]^+, \quad i = 1, \dots, n-1, \]
\[ \beta_{F_n}(t) = \Big(R - \sum_{0<j\le n-1} r_j\Big)\left[t - \frac{RT + \sum_{0<j\le n-1} b_j}{R - \sum_{0<j\le n-1} r_j}\right]^+. \]
Figure 2.17 shows the service curve of the SP multiplexer offered to the flow F1. The
rate remains the same, while the latency is extended. The service curve offered to any
flow Fi, i > 1 has a similar effect as in Figure 2.16.
Figure 2.17: Service curve of strict priority multiplexer (axes: time t and R(t); curve β(t)
with latency T and curve βF1(t) with latency TF1, both with rate R = tan(ω))
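The rate-latency parameters offered to a flow by an SP multiplexer can be sketched by expanding the bracket form of Theorem 2.2 for affine arrival curves. The function name `sp_leftover` and all numbers are hypothetical, and the latency term assumes that the blocking packet length l^L_max is charged to every flow except the lowest-priority one, as in the bracket form.

```python
def sp_leftover(R, T, flows, i, l_max):
    """Return (rate, latency) offered to flow i (1-based) by an SP multiplexer.

    flows : list of (r_j, b_j) ordered by decreasing priority
    l_max : maximum packet length that may block a higher-priority packet
    Obtained by expanding [beta_{R,T}(t) - l_max - sum_{j<i} gamma_{rj,bj}(t)]^+.
    """
    higher = flows[: i - 1]                      # flows with higher priority than i
    rate = R - sum(r for r, _ in higher)
    if rate <= 0:
        raise ValueError("higher-priority flows exhaust the service rate")
    blocking = l_max if i < len(flows) else 0    # lowest priority is never blocked
    latency = (R * T + blocking + sum(b for _, b in higher)) / rate
    return rate, latency


# Hypothetical example in arbitrary consistent units.
flows = [(10, 5), (20, 10), (30, 15)]
high = sp_leftover(100, 1, flows, 1, l_max=12)
mid = sp_leftover(100, 1, flows, 2, l_max=12)
low = sp_leftover(100, 1, flows, 3, l_max=12)
print(high, mid, low)
```

For the highest-priority flow the rate stays at R and only the latency grows by l_max/R, which corresponds to the behaviour shown in Figure 2.17.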
Weighted Fair Queuing (WFQ) Multiplexer
A WFQ multiplexer dedicates a certain portion of bandwidth to every flow based on the
predefined weights φ. If the total service node capacity is C, then the ith flow is allotted
the processing capacity
\[ c_i = \frac{\phi_i}{\sum_j \phi_j}\, C. \]
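The capacity split can be written out in a few lines. The function name `wfq_shares` and the weights are hypothetical values for illustration.

```python
def wfq_shares(weights, C):
    """Return the capacity c_i = phi_i / sum(phi_j) * C for each weight phi_i."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("the weights must sum to a positive value")
    return [C * w / total for w in weights]


# Hypothetical weights 1:2:1 on a node with capacity 100 (arbitrary units).
shares = wfq_shares([1, 2, 1], 100)
print(shares)
```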
Georges defines in [30] a simplified service curve for a WFQ multiplexer which is
introduced in the following theorem.
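Theorem 2.3 (WFQ Multiplexer Service Curve). Let us have a WFQ multiplexer with
a total service curve βR,T (t) passed through by the flows F1 and F2 with dedicated weights
φ1 and φ2, respectively. The service curves offered to the flows are
\[ \beta_{F_1}(t) = R\,\frac{\phi_1 - l^L_{\max}}{\phi_1 + \phi_2 - l^L_{\max}}\left[t - \frac{\phi_2}{R}\right]^+ = \beta_{R\frac{\phi_1 - l^L_{\max}}{\phi_1 + \phi_2 - l^L_{\max}},\ \frac{\phi_2}{R}}(t), \]
\[ \beta_{F_2}(t) = R\,\frac{\phi_2 - l^L_{\max}}{\phi_1 + \phi_2 - l^L_{\max}}\left[t - \frac{\phi_1}{R}\right]^+ = \beta_{R\frac{\phi_2 - l^L_{\max}}{\phi_1 + \phi_2 - l^L_{\max}},\ \frac{\phi_1}{R}}(t). \]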
Again, the service curves βF1(t) and βF2(t) are analogous to the one in Figure 2.16.
Weighted round robin (WRR) scheduling can be represented in the same way, although WFQ
scheduling provides more effective utilisation of resources. Nevertheless, from the
worst-case perspective the performance of both appears equal.
2.5.4 Performance Bounds
Let us consider Figure 2.15 once more. The most important benefit of network
calculus, analogously to the classical signal and system theory, is that the characteristics
of the flow leaving a system can be derived from the characteristics of the incoming
flow and the system. In network calculus, the characteristics are given by the arrival
curve and the service curve. Therefore, Le Boudec introduces in [13, 28] the following
three theorems representing the basic performance bounds: the backlog bound, the delay
bound and the output bound.
Theorem 2.4 (Backlog Bound). Assume a flow, constrained by arrival curve α, traverses
a system that offers a service curve β. The backlog R(t)−R∗(t) for all t satisfies
\[ R(t) - R^*(t) \le \sup_{s \ge 0} \{\alpha(s) - \beta(s)\}. \]
The backlog bound gives the size of the buffer needed in the service node to be able to
process the incoming flow. Practically, the backlog bound is the largest vertical distance
between the arrival curve and the service curve, provided that the arrival curve is concave
and the service curve is convex. For affine arrival curves and rate-latency service curves,
this prerequisite is fulfilled.
This bound is studied by designers of networking devices. Based on the analysis, the
maximum buffer length can be inferred for the flow characteristics for which the
device is designed. The analysis optimises the ratio between resources and packet-loss
rate.
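For an affine arrival curve γr,b and a rate-latency service curve βR,T with r ≤ R, the supremum in Theorem 2.4 is attained at s = T and evaluates to b + rT. The sketch below compares this well-known closed form with a brute-force evaluation on an integer grid; the function names and numbers are assumptions.

```python
def backlog_bound(r, b, R, T):
    """Closed-form backlog bound b + r*T for gamma_{r,b} and beta_{R,T}, r <= R."""
    if r > R:
        raise ValueError("the backlog is unbounded when r > R")
    return b + r * T


def backlog_bound_numeric(r, b, R, T, horizon=50):
    """Largest vertical distance alpha(s) - beta(s) over integer s in 1..horizon."""
    return max((r * s + b) - R * max(0, s - T) for s in range(1, horizon + 1))


# Hypothetical parameters in arbitrary consistent units.
closed = backlog_bound(2, 10, 5, 4)
numeric = backlog_bound_numeric(2, 10, 5, 4)
print(closed, numeric)
```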
Theorem 2.5 (Delay Bound). Assume a flow, constrained by arrival curve α, traverses
a system that offers a service curve β. The maximum delay d̂(t) for all t satisfies
d̂(t) ≤ h(α, β), where
\[ h(\alpha, \beta) = \sup_{s \ge 0} \{ \inf \{ \tau \ge 0 : \alpha(s) \le \beta(s + \tau) \} \}. \]
Practically, the maximum delay is the longest horizontal distance between the service
curve and arrival curve provided that the arrival curve is concave and the service curve
is convex, and that the flows are FIFO. For non-FIFO flows the delay is delimited by the
maximum duration of the congested period, as proposed in [54]. However, in our case
the theorem’s prerequisites are satisfied.
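For γr,b and βR,T with r ≤ R, the horizontal distance of Theorem 2.5 reduces to the well-known closed form T + b/R. The following sketch evaluates both the closed form and the per-point horizontal distance; the function names and numbers are hypothetical.

```python
def delay_bound(r, b, R, T):
    """Closed-form delay bound T + b/R for gamma_{r,b} and beta_{R,T}, r <= R."""
    if r > R:
        raise ValueError("the delay is unbounded when r > R")
    return T + b / R


def horizontal_distance_at(s, r, b, R, T):
    """Smallest tau >= 0 with alpha(s) <= beta(s + tau), for s > 0."""
    return max(0.0, T + (r * s + b) / R - s)


# Hypothetical parameters in arbitrary consistent units.
d_max = delay_bound(2, 10, 5, 4)
distances = [horizontal_distance_at(k / 10, 2, 10, 5, 4) for k in range(1, 200)]
print(d_max, max(distances))
```

The per-point distances approach the bound as s tends to zero, i.e., the worst case is experienced by the bits of the initial burst.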
Theorem 2.6 (Output Bound). Assume a flow, constrained by arrival curve α, traverses
a system that offers a service curve β. The output flow is constrained by the arrival
curve
\[ \alpha^* = \alpha \oslash \beta. \]
The operation ⊘ represents min-plus deconvolution defined as
\[ (f \oslash g)(t) = \sup_{u \ge 0} \{f(t+u) - g(u)\}. \tag{2.4} \]
The proofs for the performance bounds can be found in [13].
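As a worked illustration of Theorem 2.6 (a sketch under the assumption of an affine arrival curve γr,b, a rate-latency service curve βR,T and r ≤ R), the deconvolution can be evaluated in closed form:

```latex
\begin{align*}
\alpha^*(t) &= (\gamma_{r,b} \oslash \beta_{R,T})(t)
             = \sup_{u \ge 0} \{\, r(t+u) + b - R[u-T]^+ \,\} \\
            &= r(t+T) + b
               \qquad \text{(the supremum is attained at } u = T \text{, since } r \le R\text{)} \\
            &= \gamma_{r,\,b+rT}(t), \qquad t > 0.
\end{align*}
```

The output flow thus keeps its rate r, while its burst grows from b to b + rT.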
2.5.5 From Generic Service Nodes to Complex Topologies
Having described the generic service elements, modelling of more complex structures
is possible if there is an approach which can employ the performance bounds of the
generic service nodes to provide end-to-end performance bounds. The observed end-to-
end performance bounds are:
End-To-End Delay Bound provides the maximum end-to-end latency of a bit of the
observed flow traversing the modelled topology.
End-To-End Service Curve is an equivalent service curve providing the same service
as the topology formed by the generic service curves. End-to-end service curve only
conforms to one of the generic service curves if the parameters of the competing
flows are declared constant and their parameters are known.
End-To-End Output Bound provides the performance bound of the observed flow
leaving the complex topology. End-to-end service curve is a prerequisite for deriving
this bound.
Complex topologies can represent a network of general routers where every
router is described typically by a single rate-latency service curve. In this work, however,
the complex topology refers to a single network device and the generic service nodes to
particular logical parts of the networking device. This approach is substantiated by
the fact that the granularity of the time-scale is finer for industrial applications than
for typical QoS applications of the telecommunication networks, as was introduced in
Section 2.2.
There are several approaches to concatenation of the generic service nodes to form a
complex system traversed by a flow along a given path and thus to inferring the
end-to-end performance bounds. Schmitt et al. have provided a classification of the
available approaches in [52]:
Total Flow Analysis (TFA)
TFA is the most straightforward method of the end-to-end bounds’ determination. The
idea is to determine delay bounds for the sum of the flows traversing each service node.
The end-to-end delay is then equal to the sum of the nodes’ delays along the flow’s path.
This approach is shown in [19] and [29].
A disadvantage of this method is that only the end-to-end delay bound can be obtained;
there is no way to infer the end-to-end service curve. Moreover, the tightness of the
bounds is questionable, as neither pay burst only once nor pay multiplexing only once is
considered. On the other hand, the method makes no presumption on the FIFO behaviour
of the flows.
Separated Flow Analysis (SFA)
The SFA method separates the services offered to each microflow at each node and thus
infers the left-over service available to the regarded flow. In the subsequent step, the
left-over service curves are concatenated using the concatenation theorem. Using
concatenation is advantageous as the bounds determined by the concatenation theorem are
tighter than those inferred by the TFA. The phenomenon is referred to as pay burst only once
(PBOO). It is based on the observation that the total additional burst caused by
concatenated service nodes is not equal to the sum of the bursts caused by the single
nodes [13], [52]. However, SFA makes an assumption of FIFO behaviour of the flows, and
Rizzo and Le Boudec show in [51] that PBOO does not hold for non-FIFO flows. Subsequently,
Schmitt et al. show in [54] that PBOO is retained even under non-FIFO conditions under
certain presumptions.
The following theorem is based on Theorem 1.4.6 in [13]. The proof can be found
ibidem.
Theorem 2.7 (Service Curve Concatenation). Let us have two systems S1 and S2 with
service curves β1 and β2, respectively. The system S formed by concatenation of the
systems S1 and S2 offers the flow a service curve
β(t) = β1(t)⊗ β2(t).
Figure 2.18: Service curve concatenation (the input R(t) passes through β1(t) and β2(t) in
series and leaves as R∗(t); the equivalent service curve is β(t))
Furthermore, it can be shown that under the presumption of concatenation of rate-
latency service curves the following theorem holds.
Theorem 2.8 (Rate-Latency Service Curve Concatenation). Let us have two systems
S1 and S2 with service curves β1(t) = βR1,T1(t) and β2(t) = βR2,T2(t), respectively. The
system S formed by concatenation of the systems S1 and S2 offers the flow a service
curve
\[ \beta(t) = \beta_{\min\{R_1, R_2\},\ T_1 + T_2}(t). \]
SFA is the principal method of the further modelling and the Theorem 2.7 will be
used for inferring the models of the network devices.
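The benefit of concatenation (pay burst only once) can be illustrated by comparing the end-to-end delay bound obtained from Theorem 2.8 with the sum of per-node delay bounds. The sketch assumes a single γr,b-constrained flow, rate-latency nodes and the standard closed forms T + b/R for the delay bound and b + rT for the output burst; all names and numbers are hypothetical.

```python
def concat(node1, node2):
    """Concatenate two rate-latency nodes given as (R, T) per Theorem 2.8."""
    (R1, T1), (R2, T2) = node1, node2
    return (min(R1, R2), T1 + T2)


def delay_bound(b, R, T):
    """Delay bound T + b/R of a flow with burst b at a rate-latency node (r <= R)."""
    return T + b / R


# Hypothetical flow gamma_{r,b} and two nodes in arbitrary consistent units.
r, b = 1, 10
node1, node2 = (5, 2), (4, 3)

# End-to-end analysis: the burst b is paid once at the bottleneck rate.
R_e2e, T_e2e = concat(node1, node2)
d_e2e = delay_bound(b, R_e2e, T_e2e)

# Node-by-node analysis: the burst is paid at node 1 and, enlarged, at node 2.
d1 = delay_bound(b, *node1)
b_after_node1 = b + r * node1[1]          # output burst b + r*T1
d2 = delay_bound(b_after_node1, *node2)
d_sum = d1 + d2

print(d_e2e, d_sum)
```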
Pay Multiplexing Only Once (PMOO-SFA)
PMOO-SFA is an extension of the SFA. The core idea is that under the assumption of
cascaded FCFS aggregation the burst caused by the aggregations can be paid only once.
In other words, it is possible to infer the e2e service curve first and subsequently
apply the multiplexing.
The main idea can be explained on an example of two concatenated FCFS nodes
given by β1 and β2 into which two flows are multiplexed (α1 and α2), while we observe
α1 and consider α2 as a competing flow. The PMOO approach considers the following
two expressions of βR1(t) as approximately equally valid:
\[ [(\beta_1 \otimes \beta_2) - \alpha_2]^+ \approx [\beta_1 - \alpha_2]^+ \otimes [\beta_2 - (\alpha_2 \oslash \beta_1)]^+. \]
Details on PMOO are introduced in [52] and [55]. This approach will be used for
modelling of networking devices. The reason is that, as will become obvious in Chapter 6,
modelling of a networking device becomes under certain circumstances extremely laborious
without providing a rewarding gain in precision, contrary to the PMOO approach.
Optimisation-Based Bounding (OBB)
In some cases the SFA and PMOO-SFA fail to provide satisfying results. SFA, and
even more PMOO-SFA, ignore a certain amount of the topological knowledge, i.e., there
could be more topologies recovered from the equational notation of a former topology.
This may result in overlooking certain special aspects which may play significant role
for the bound tightness. For such cases a different approach can be taken. OBB is a
method, introduced in [52], which no longer uses the concatenation theorem and returns
to the roots of network calculus: min-max optimisation. The idea is that the arrival
curves, the service curves and their topology are modelled by a set of constraints. The
optimisation function favours the tightest bounds. As a result of this approach, the burst
is paid only once, yet at the node where it is the most effective. The method operates
with strict service curves (see [13, 27]) which cannot always be guaranteed. Therefore,
this approach will not be considered further.
Extended PBOO
Another approach to obtaining a tight end-to-end service curve is introduced in [25]. The
main idea of the approach is that if all the flows are γr,b-constrained and all the service
elements of the path are of βR,T-type, then the end-to-end service curve is also of
βR,T-type such that: (i) the rate is the minimum of the remaining rates offered to the
observed flow along the path, and (ii) the latency is given by the sum of the latencies
along the path plus the burst contribution from the interfering flows. The burst
contribution is given by the ratio of the burst parameter of the interfering flow and the
smallest rate experienced on the common part of the path. The resulting service curve is
proposed as
\[ \beta_i(t) = \min_{j \in J_i}\Big(R^j - \sum_{k \in K_j} r_k\Big) \cdot \left[t - \sum_{j \in J_i} T^j - \sum_{k \in K_i} \frac{b_k^{j_{\min}}}{\min_{j \in J_{i,k}} [R^j]}\right]^+, \tag{2.5} \]
where i is the flow index, j is the index of the service element β^j_{R,T} along the path
and k is the index of flows interfering with the flow i at service element j. Ji is the set
of service elements of flow i, Kj is the set of flows interfering with the flow i at service
element j, Ki = ⋃_{j∈Ji} Kj is the set of all interfering flows along the path of flow i,
and Ji,k = Ji ∩ Jk is the set of service elements used by both flow i and flow k.
The use of this formula is straightforward and it can be easily applied to most of the
problems introduced in this work. A prerequisite is that all the service elements are
FCFS. Extended PBOO, contrary to PMOO, addresses flow aggregation and deaggregation
along the path, which makes this approach attractive.
2.5.6 Complementary Issues in Network Calculus
There are several more issues complementary to modelling of real-world systems which
will be briefly introduced.
Flow Packetisation
So far, flows were considered as bit streams and no presumption on packet-based
processing was made, despite the fact that the IP-based networks for which the calculus
is intended are packet-based.
Le Boudec et al. introduce in [13] the term packetizer, i.e., a block receiving a stream of
bits and emitting packets. This means that the output of a packetizer R∗(t) experiences
step-wise changes. In other words, the cumulative function increases by the number of
bits in a packet, in an infinitely short time, at the moment of obtaining the last bit of
the packet at its input. Hence, this behaviour essentially contributes to the output flow
burstiness.
The effect of the packetizer is well documented by Theorem 1.7.1 in [13, 52]. It considers a
system with a service curve β(t) with an appended packetizer P(L). Such a system is said to
have the same maximum latency d̂pac as if the system had no packetizer, i.e., d̂pac = d̂.
The backlog bound b̂ increases by the maximum packet length, i.e., b̂pac ≤ b̂ + lmax.
Burstiness of the output bound α∗(t) increases by lmax, i.e., α∗pac(t) = α∗(t) + lmax·1{t>0}.
The overall service curve incorporating the packetizer yields βpac(t) = [β(t) − lmax]+.
The last statement is probably the most important for this work. It says that the
service curve parameter T is increased by lmax/R, i.e., the effect is purely quantitative
and has no qualitative influence. Since the parameters of the modelled
devices are obtained by empirical identification, further considerations of the packetizer
are neglected.
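The shift of the latency parameter can be verified in one line, assuming a rate-latency service curve βR,T in front of the packetizer:

```latex
\begin{align*}
\beta_{pac}(t) = [\beta_{R,T}(t) - l_{\max}]^+
               = \big[R[t-T]^+ - l_{\max}\big]^+
               = R\Big[t - T - \frac{l_{\max}}{R}\Big]^+
               = \beta_{R,\ T + \frac{l_{\max}}{R}}(t).
\end{align*}
```

The rate R is unchanged; only the latency grows by the transmission time of one maximum-length packet.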
Systems with Losses
Obviously, congested service elements with insufficient buffer lengths must
experience traffic loss. Systems with losses in network calculus are studied in [13, 251].
However, it is admitted by the authors that the conclusions are rather preliminary and
should serve as a baseline for further investigations. A result of importance for further
work is the definition of the deterministic bound on the loss rate in a lossy system.
Definition 2.3 (Loss Rate Function). Loss rate function l(t) represents the fraction of
bits lost in a lossy service node within an interval (0, t]. Loss rate function is defined as
\[ l(t) = \frac{L(t)}{R(t)}, \]
where L(t) is the cumulative rate function representing the lost traffic and R(t) is the
cumulative rate function representing the arriving traffic.
Theorem 2.9 (Bound on Loss Rate). Consider a system with a storage capacity X,
offering to a flow with an arrival curve α(t) a service curve β(t). Then the bound on the
loss rate l̂(t) is
\[ \hat{l}(t) = 1 - \inf_{0 < s \le t} \frac{\beta(s) + X}{\alpha(s)}. \tag{2.6} \]
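Equation (2.6) can be evaluated numerically on a uniform grid. The following is only a sketch: the function name `loss_rate_bound`, the grid resolution and the sample curves are assumptions, and clipping the result at zero (no loss when the buffer is large enough) is an addition made here for readability rather than part of the formula above.

```python
def loss_rate_bound(alpha, beta, X, t, steps=1000):
    """Evaluate 1 - inf_{0 < s <= t} (beta(s) + X) / alpha(s) on a uniform grid.

    alpha, beta : callables returning the arrival and service curve values
    X           : storage capacity of the node
    The result is clipped at 0.0 (assumption: a negative value means no loss).
    """
    grid = (t * k / steps for k in range(1, steps + 1))
    ratio_min = min((beta(s) + X) / alpha(s) for s in grid)
    return max(0.0, 1.0 - ratio_min)


# Hypothetical overloaded node: arrivals gamma_{10,20}, service beta_{5,1}.
alpha = lambda s: 10 * s + 20
beta = lambda s: 5 * max(0.0, s - 1.0)

bound_small_buffer = loss_rate_bound(alpha, beta, X=10, t=10.0)
bound_large_buffer = loss_rate_bound(alpha, beta, X=1000, t=10.0)
print(bound_small_buffer, bound_large_buffer)
```

In this example the infimum is reached at s = 1, the end of the latency period, where the node has not yet served anything while the arrival curve has already grown.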
Extended results stemming from Definition 2.3 and Theorem 2.9 are introduced in
Section 5.2.
Chapter 3
Dissertation Objectives
Having summarised the state of the art and state of the practice in the related areas,
motivation for the research is defined and the main objectives to be achieved within the
dissertation are introduced. Finally, the planned research steps are outlined.
3.1 Motivation for the Research
In 2006, a consortium of the VAN project’s work package dedicated to the real-time
aspects was established. The consortium was to formulate the main contribution to the
final project solution with regard to the real-time aspects of the future virtual automation
networks. An extensive survey into roadmaps of the industrial partners contributing to
the work package (Phoenix-Contact, Siemens AG, and Schneider-Electric) was performed,
resulting in the definition of several dozens of use cases reflecting future customers' needs
and overall technical research challenges. The survey is introduced in [5]. The summary
was presented in a form of a grid showing correlation between the type of real-time
communication and the communication scope. The grid is shown in Figure 3.1.
Figure 3.1: Real-time communication grid (rows: communication scope; columns: type of
industrial communication)
Local scope
- Non-RT: Required: Yes. Exists: Yes. Example: all Ethernet fieldbuses. Utilization:
engineering, parametrisation, web & OPC servers.
- RT: Required: Yes. Exists: Yes. Example: Profinet IO, Powerlink. Utilization: cyclic
run-time data exchange, alarm handling.
- IRT: Required: Yes. Exists: Yes. Example: Profinet IO (IRT), Sercos III. Utilization:
motion control, high-speed applications.
Enterprise scope
- Non-RT: Required: Yes. Exists: Yes. Example: Profinet CBA, etc. Utilization:
parametrisation, SW update, MES, recipes.
- RT: Required: Yes. Exists: No. Example: Profinet (RT over UDP). Utilization: resource
reservation, cell synchronisation, alarm handling.
- IRT: Required: No. Exists: No.
WAN scope
- Non-RT: Required: Yes. Exists: Yes. Example: Profinet CBA, etc. Utilization: remote
monitoring, remote diagnosis.
- RT: Required: Possibly. Exists: No. Example: Profinet (RT over UDP), MPLS, SDH.
Utilization: telecontrol, telepresence.
- IRT: Required: No. Exists: No.
The type of communication is coarsely divided into non-real-time, real-time, and
isochronous-real-time communication as introduced in Section 2.2.
The scope is divided into local, enterprise, and WAN (public). Local scope represents an
Ethernet-based switched topology with up to several dozens of devices. This communication
scope is nowadays dominated by Ethernet-based and legacy fieldbuses. Enterprise
scope represents a segmented IP-based infrastructure accommodating switches, routers,
firewalls, gateways, etc. An important prerequisite is that all devices can be managed by
local administration and no third parties are involved. WAN scope represents a network
topology including public infrastructures provided by Internet service providers (ISP)
and telecommunication service providers (TSP), i.e., infrastructures over which the user
has no administrative control.
Classification of each cell of the communication grid was performed by cross-checking
whether a solution for the case exists and whether customers require the solution. The
resulting categories are the following: green represents existing solutions, yellow
represents non-existent, yet required solutions, and orange represents those neither
existent nor required.
The market obviously expects future existence of at least SRT communication level in
enterprise networks. Real-time communication over public networks would be welcome
according to many use cases. However, despite the fact that it is technically feasible
through providers on the basis of service level agreements (SLA), it is hardly affordable
for industry. In other words, the existing gap, being a fertile research field at the same
time, is the SRT communication in enterprise networks.
As enterprise networks require IP-based communication and hence must adopt L3
networking devices, the existing real-time mechanisms provided by Ethernet-based field-
buses have to be extended by IP-based QoS mechanisms.
Many experts may argue that for the existing applications, local real-time networks
based on L2 networking devices are sufficient and using L3 networking devices is
questionable. However, the following list of arguments shows that research in the field of
enterprise networks accommodating L3 networking devices faces the future challenges in
industrial automation in a more flexible way:
Network segmentation support of routers makes it possible to segment networks
into domains and subdomains. One could argue that with VLAN technology
segmentation is also possible with switches. However, at larger scales this proves
inefficient. Segmentation of domains confines broadcast traffic and provides a better
framework for limiting flow aggregation.
Abundance of interfaces gives routers a better possibility of interlinking more distant
domains, for instance using wireless connections or optical fibres. Mid-range
routers have modular interfaces which can accommodate any WAN technology
available. This proves effective for spatially distributed real-time domains, where a
mere Ethernet connection is not possible. Distributed real-time domains are usually
applications with fast spreading media, for instance alerting for wind parks,
pressure control in pipelines, etc.
Granular QoS architecture. The DiffServ architecture used in this dissertation was defined
primarily at L3. A complete summary of the DiffServ architecture is given in
Section 2.4. L2 usually accommodates only IEEE 802.1Q and 802.1p, thus providing only
classification by means of VLAN tagging and only priority queuing for congestion
management.
Merging Automation and Office Networks is a key to virtualisation of automation
networks explained in Chapter 1. Hence, one has to automatically anticipate ne-
cessity of IP-based communication. Though this argument is simple, it is very
strong.
Despite these attractive features, there are also several practical reasons why the
key industrial automation vendors refrain from making use of the IP-based networks and
L3 networking devices for time-critical run-time data exchange:
1. The industrial automation market has been too small to attract telecommunication
device vendors and force them to provide L3 networking devices tailored to the
industrial requirements.¹
2. IP-based QoS architectures, such as DiffServ, have not been used in industrial
communication. However, the RToUDP technology uses DSCP packet marking to
enable L3 QoS.
3. The real-time behaviour of L3 networking devices is not trusted, because the devices
are considered too complicated and unpredictable in their behaviour.
The first argument is outside the scope of any research. The second argument has been
weakened by providing a technical solution within the VAN project. The solution is
based on a combination of the end-device deterministic stack delivered by the Real-Time
over User Datagram Protocol (RToUDP) technology provided by Siemens and adoption
of the DiffServ QoS approach. The classification of packets is handled in the end devices
and is controlled by mapping of the Profinet classes to DiffServ classes. The mapping is
a result of an investigation of the DiffServ classification definitions with regard to specific
treatment of different classes and an investigation of Profinet IO frame types with specific
timeliness requirements and loss criticality. An introduction to the VAN solutions is given
in [60]. A detailed description of the overall approach can be found in [8]. Finally, it is
the opinion of the author that the third argument deserves dedicated research supporting
a serious discussion on the level of feasible real-time behaviour and thus promoting the use
of the IP-based networks for future industrial real-time communication.
3.2 Main Objectives
Hence the main objective of the dissertation is establishment of a relevant modelling
framework based on network calculus which will assist worst-case performance analysis
of temporal behaviour of IP-based industrial communication networks.
The high-level objective was broken down into partial milestones which must be met
in order to fulfil it:
Empirical Analysis is used for several reasons. Firstly, it is necessary for identifying the
dominant factors influencing the temporal performance of networking devices.
Secondly, it is used for model parameter identification. Finally, it is used for final
model validation.
¹ An initial effort can be traced to the joint work of Rockwell Automation and Cisco in 2006.
Predefined Model Structures are to be used for rapid model structure design of
networking devices. Based on the investigations of networking device architectures
introduced in Section 2.3, basic model structures are to be developed and port-to-
port service quantifications inferred.
Methods for Parameter Identification have to be defined and applied, as this
topic has not been tackled by researchers so far. This objective is
well aligned with the upcoming research directions identified by
Schmitt in [56].
Device Concatenation is a strategy for concatenating networking devices in order to
analyse complex networks.
Chapters 4 through 6 summarise fulfillment of the partial milestones. Chapter 7
summarises validation of the inferred models.
Chapter 4
Empirical Analysis of Performance Bounds
Empirical analysis of performance bounds is important not only for finding the QoS
parameters of interest, but also for retrieving the dependencies of
QoS parameters on different combinations of flows, networking device parametrisation,
and the networking devices themselves. The purpose of empirical analysis can be divided
into two categories:
• Qualitative analysis represents empirical analysis in which dependencies of QoS
behaviour on combination of certain phenomena, such as device architecture, flow
aggregation, and device parametrisation, are studied without particular interest
in specific values of the QoS parameters. Hence, qualitative analysis provides
conclusions on dominant factors influencing the QoS behaviour of either a single
network device or the whole communication network, and subsequently their effects
on the temporal performance bounds.
• Quantitative analysis represents empirical analysis in which QoS parameters of
interest are studied in order to parameterise a model of a single network device or to
verify performance bounds of a single network device or the whole communication
network.
Qualitative analysis therefore precedes further research into networking
device modelling. Quantitative analysis is used in the further steps of the work for device
model parametrisation.
In the remainder of the chapter, a test bed which was designed
and implemented within this work is introduced first. Secondly, a set of measurements focusing
on the investigation of the networking devices is presented and commented on.
4.1 TestQoS: Quality of Service Test Bed
The QoS parameters of interest are given by QoS metrics introduced in Section 2.4. The
most important metrics are packet latency which can be measured directly, and latency
jitter, which can be inferred from repetitive measurements. Furthermore, packet loss has
to be regarded for modelling of loss behaviour.
From Section 2.2 it follows that sub-millisecond resolution of packet latency
is required in order to obtain meaningful measurements. Such a temporal resolution can be
obtained either by specialised hardware test beds or by a real-time extension of an
operating system providing precise clock signals. General-purpose operating systems with Ethernet
NICs are usually not capable of delivering acceptable precision.
4.1.1 Test Bed Architecture
The test bed is based on the Siemens Evaluation Board 200 (EB200). The EB200 boards
were designed by Siemens to provide customers with a test tool for the Siemens
Enhanced Real-Time Ethernet Controller 200 (ERTEC 200). The ERTEC 200 can be used
either as a 2-port Profinet IO switch or as a Profinet IO controller. The EB200 is a PCI card
which is delivered with MS Windows XP drivers and the Siemens Ethernet Device Driver
(EDD). The EDD forms a service-oriented interface to user applications and provides
Profinet communication services.
The decisive factor for using the EB200 is the access to a 100 MHz free-running
counter whose value can be captured upon packet arrival and departure and is passed
to the user application via an EDD service callback.
[Block diagram with the elements: TestQoS Application (Message Generator, Receiver, Graphic User Interface, Evaluation Unit), EDD, EB200 Board A (Port 1, Port 2), EB200 Board B (Port 1, Port 2), Network Infrastructure]
Figure 4.1: TestQoS architecture
Figure 4.1 shows the TestQoS architecture. It is based on two EB200 boards, which
makes it suitable for future extensions providing packet latency measurements over larger
distances. The Graphical User Interface is used for test flow parametrisation and test progress
control. The Message Generator generates the test traffic, which is transmitted via Port 1
and Port 2 of Board A. At the same time, the counter is captured and the timestamps
are passed to the Evaluation Unit. The traffic is received by the Receiver, which captures the
packets at Port 1 and Port 2 of Board B together with the corresponding counter values. The
latency of every packet pair is calculated in the Evaluation Unit. The raw data are processed and
visualised online. Furthermore, they can be exported to Matlab and post-processed
offline.
4.1.2 Measurement Principle
TestQoS measures directly two metrics: packet loss and packet latency. Measurement
of packet loss is straightforward; packets are tagged by sequence numbers and missing
matches reported. Packet latency measurement employs the following approach:
Ports 1 are connected over the measured network infrastructure and Ports 2 are
connected by a patch cable forming a shortcut. Capturing counter values proceeds in
the following manner:
• the i-th packet is transmitted via Port 1 of Board A, and the timestamp $t^{Tx1}_i$ is captured
• the duplicated i-th packet is transmitted via Port 2 of Board A, and the timestamp $t^{Tx2}_i$ is captured
• the i-th packet is received by Port 1 of Board B, and the timestamp $t^{Rx1}_i$ is captured
• the duplicated i-th packet is received by Port 2 of Board B, and the timestamp $t^{Rx2}_i$ is captured
The latency of the i-th packet is
$d_i = t^{Rx1}_i - t^{Tx1}_i + t^{off}(t^{Tx1}_i)$, (4.1)
where $t^{off}(t^{Tx1}_i)$ is the offset between the values of the free-running counters at the
moment of packet transmission. The offset is not guaranteed to be constant. Therefore,
it has to be considered as a variable. As the packet latency between Ports 2 is negligible
(for a 0.5 m long cable, the latency is 1 ns), it holds that
$t^{Rx2}_i - t^{Tx2}_i + t^{off}(t^{Tx2}_i) = 0$. (4.2)
Substituting (4.2) into (4.1) results in
$d_i = \lim_{t^{off}(t^{Tx2}_i) \to t^{off}(t^{Tx1}_i)} \{ t^{Rx1}_i - t^{Tx1}_i - t^{Rx2}_i + t^{Tx2}_i \}$. (4.3)
According to the EDD documentation, the limit condition is satisfied if
$t^{Tx1}_i - t^{Tx2}_i \le 5$ ms, which is always true in the test bed. Hence, (4.3) can be simplified to
$d_i = t^{Rx1}_i - t^{Tx1}_i - t^{Rx2}_i + t^{Tx2}_i$. (4.4)
Table 4.1 lists the parameters provided online by TestQoS and the way they
are calculated. The parameters are calculated over all N sent packets. Details on the
TestQoS architecture are given in [9]. The use of this tool is documented in [43]
for wireless networks and in [40] for industrial switches.
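The evaluation of (4.4) can be illustrated by a minimal Python sketch. It is not the TestQoS implementation; the function name and all numeric captures are hypothetical, and counter wrap-around is ignored. The only value taken from the text is the 100 MHz counter frequency, which corresponds to a resolution of 10 ns per tick.

```python
# Sketch of (4.4); all capture values below are hypothetical.
COUNTER_HZ = 100e6  # free-running counter frequency stated for the EB200


def packet_latency_s(t_tx1, t_rx1, t_tx2, t_rx2, counter_hz=COUNTER_HZ):
    """Latency in seconds from four counter captures (in ticks), following (4.4).

    The shortcut pair (Tx2, Rx2) cancels the unknown offset between the
    counters of Board A and Board B.
    """
    ticks = (t_rx1 - t_tx1) - (t_rx2 - t_tx2)
    return ticks / counter_hz


# Hypothetical captures: the counter of Board B is 1_000_000 ticks ahead.
offset = 1_000_000
t_tx1, t_tx2 = 5_000, 5_100        # captured on Board A
t_rx1 = t_tx1 + 12_000 + offset    # measured path takes 12_000 ticks
t_rx2 = t_tx2 + offset             # shortcut cable, latency neglected
latency = packet_latency_s(t_tx1, t_rx1, t_tx2, t_rx2)
print(f"{latency * 1e6:.1f} us")   # 12_000 ticks at 100 MHz are 120 us
```

The offset cancels only as long as it does not change between the two transmissions, which is the limit condition of (4.3).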
4.2 Measurement Results
This section introduces a set of test cases in which the networking devices intended for
multi-segment IP-based real-time communication are investigated. The measurements
were performed using the TestQoS tool. The additional UDP traffic used to load or congest the
devices was generated using UDPFlooder.
Table 4.1: Measured parameters
Parameter Name          Calculation                                               Note
Packet Latency          $d_i$                                                     -
Average Latency         $\bar{d} = \frac{1}{N} \sum_{i=0}^{N} d_i$                N ... number of packets
Std. Dev. of Latency    $s_d = \frac{1}{N} \sum_{i=0}^{N} (\bar{d} - d_i)^2$      N ... number of packets
Jitter                  $j = \max_N \{d_i\} - \min_N \{d_i\}$                     N ... number of packets
Packet Loss             $l = \frac{N_{received}}{N_{sent}} \cdot 100\,\%$         N ... number of packets
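The parameters of Table 4.1 can be computed from a list of per-packet latencies as in the following Python sketch. It is an illustration only, not the TestQoS code. Several points are assumptions about the intended meaning of the table: the statistics are taken over the received packets, the standard deviation is obtained as the square root of the mean squared deviation, and both the received share and its complement (the lost share) are reported. The sample values are hypothetical.

```python
import math


def summarise(latencies, n_sent):
    """Summary statistics in the spirit of Table 4.1 (assumptions in comments)."""
    n = len(latencies)  # assumption: statistics over received packets only
    mean = sum(latencies) / n
    # Assumption: standard deviation as the root of the mean squared deviation.
    variance = sum((mean - d) ** 2 for d in latencies) / n
    return {
        "average": mean,
        "std_dev": math.sqrt(variance),
        "jitter": max(latencies) - min(latencies),
        "received_pct": 100.0 * n / n_sent,
        # Assumption: loss understood as the share of packets not received.
        "loss_pct": 100.0 * (n_sent - n) / n_sent,
    }


# Hypothetical sample: four packets received out of five sent, latencies in us.
stats = summarise([100.0, 110.0, 120.0, 130.0], n_sent=5)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

The jitter is the only parameter here that is driven entirely by the two extreme samples, which is why the number of test packets matters for it.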
Both switches and routers are regarded as the main representatives of future
automation networks. The switch under test is the HP ProCurve 1800-8G. The
router under test is the Cisco 2811¹. These devices represent well the affordable class of
devices available on the market that satisfy the basic QoS requirements.
General remarks to the measurements follow:
• TestQoS application can generate test traffic of at most 150 kb · s−1. This traffic is
not meant to congest the devices’ bottlenecks but to test observed QoS parameters,
particularly, packet latency.
• Occasional packet losses of less than 0.1 % can be caused by dropouts at the
TestQoS receiver due to implementation problems. This may occur especially in
case of extremely bursty traffic.
• The measurement of the maximum latency, which is particularly important for the later
worst-case analysis using network calculus, can be questionable. Therefore, especially
in scenarios where the dispersion of the latency is high, the standard error is evaluated
and the number of test packets was adapted so that the confidence in the result
is reasonable.
4.2.1 Switch-Related Measurements
The main architectural components of switches influencing the temporal behaviour are the
implementations of the switch fabric (SF) and the outgoing interface (OI). Therefore, the test cases
target these parts of the switch. More specifically, it is important for the later
modelling whether the SF is blocking or non-blocking, what the OI architecture is in terms
of scheduling policy, and whether any bimodal behaviour appears when the OI is congested.
Additionally, aspects of concatenation are investigated.
¹ Originally, Cisco 871 and 1812 routers were also investigated. However, initial measurements showed
insufficient QoS support in the temporal domain. Hence, these devices are not presented.
Switch-Fabric Loading
Test Name: SW.FL - Switch Fabric Loading
Purpose: This test measures the packet latency of the observed flow while the SF is traversed by
additional flows. The purpose is to investigate the temporal influence of the additional traffic
passing through the SF on the maximum packet latency of the observed flow.
Topology: [diagram with the elements UDP Flooder #1, UDP Flooder #2, TestQoS, UDP Dummy #1, UDP Dummy #2, Switch Under Test]
Network Devices: Switch HP ProCurve 1800-8G
Parametrisation:
  Device Name          Parameters
  TestQoS Sender       IP: 192.168.3.1, Port: 1000, VLAN: 0, DSCP: 0
  TestQoS Receiver     IP: 192.168.3.2, Port: 1000
  UDPFlooder #1        IP: 192.168.3.3, Port: 1000, VLAN: 0, DSCP: 0
  UDPFlooder #2        IP: 192.168.3.4, Port: 1000, VLAN: 0, DSCP: 0
  UDPDummy #1, #2      IP: 192.168.3.5, Port: 2000; IP: 192.168.3.6, Port: 2000
  Switch Under Test    Flow Control: None, Line Speed: 100 Mbps (TestQoS), 1 Gbps otherwise, Full Duplex
Variable Parameters: Switch Fabric Load (UDP Flooders' traffic)
Observed Parameters: Packet Latency, Packet Drop (TestQoS traffic)
Test Duration: 10000 packets, ca 5 minutes
Test Results: Table A.1
It can be observed in Table A.1 that the packet latency does not depend on the SF
load, i.e., the SF is non-blocking despite its bus-based architecture. The
reasoning will be given in Section 6.1 and in Section 7.1. The statement is valid for
loads up to 500 Mb · s−1 and is expected to remain valid for higher loads.
Outgoing Port Loading
Test Name: SW.OPC - Switch Outgoing Port Congestion
Purpose: This test observes the effect of a congested outgoing interface on packet latency and drop.
The purpose is to investigate the temporal influence of the additional traffic congesting the outgoing
interface on the maximum packet latency of the observed flow.
Topology: [diagram with the elements UDP Flooder #1, UDP Flooder #2, TestQoS, UDP Dummy, Switch Under Test, Separation Switch]
Network Devices: Switch HP ProCurve 1800-8G under test
                 Switch HP ProCurve 1800-8G for flow separation
Parametrisation:
  Device Name          Parameters
  TestQoS Sender       IP: 192.168.3.1, Port: 1000, VLAN: 0, DSCP: 0
  TestQoS Receiver     IP: 192.168.3.2, Port: 1000
  UDPFlooder #1        IP: 192.168.3.3, Port: 1000, VLAN: 0, DSCP: 0
  UDPFlooder #2        IP: 192.168.3.4, Port: 1000, VLAN: 0, DSCP: 0
  UDPDummy             IP: 192.168.3.5, Port: 2000
  Switch Under Test    Flow Control: None, Line Speed: 100 Mbps Full Duplex
Variable Parameters: Outgoing Port Load (UDP Flooder's traffic)
Observed Parameters: Packet Latency, Packet Drop (TestQoS traffic)
Test Duration: 10000 packets, ca 5 minutes
Test Results: Table A.2, Figure 4.2, Figure 4.3, Figure 4.4
Table A.2 summarises the measured results. The parameters can only serve for
orientation, because the distribution of the latencies is not unimodal under all outgoing
port loads, as can be seen in Figure 4.2. The figure is a 3D representation of
16 histograms of packet latencies for different outgoing port loads. The graphs can be
divided into three zones:
Load (ca 0-80 % load), in which the packet latency has a unimodal distribution with low
dispersion. The dispersion increases with the load, because the probability that the
observed packet has to be arbitrated against a packet belonging to a different flow
grows with the load. The skewness of the distribution increases with the
load as well, because some of the packets are likely to be processed longer,
whereas none are processed faster than under a low load.
This phenomenon holds for latency distributions in general.
Occasional congestion (ca 80-100 % load), which is characterised by a bimodal distribution of packet
latencies or by a very flat unimodal distribution. This zone represents a state in which
a passing packet is either buffered or forwarded directly. Buffering introduces a
significantly higher latency overhead than direct transmission.
Congestion (100 % load and higher), which is characterised by a unimodal distribution with a higher mean
value and higher dispersion, although the relative dispersion may be equal to that of the loaded
interface. This zone represents a state in which almost all packets pass through a
buffer.
It can be seen in Figure 4.3 that the number of sent packets representing the real-time
flow was sufficient to reach the required confidence. It can be seen in Figure 4.4 that
the maximum latency of a transmitted packet depends on the OI load. This parameter
is the most important one for the quantitative analysis and will serve as a parameter in the
subsequent worst-case modelling.
Finally, a compensation measurement showed that most of the latency is
caused by the first switch (the switch under test). The reason is that any excess traffic
is processed or discarded at this stage. Consequently, the SF of the separation switch can be
loaded at most with the OI rate, i.e., 100 Mb · s−1. Moreover, the
load that the test traffic puts on the OI is low. As the temporal behaviour of a switch under such
conditions is well known, it can easily be separated, as will be shown in Section 6.3.
[3D histogram; axes: Packet Latency [ms], Outgoing Port Load [Mb · s−1], Frequency of Occurrence [−]]
Figure 4.2: 3D histogram of the packet latency distribution
[Line plot; axes: Packet [−], Maximal Latency [ms]; series: 110, 120, 130, 140, and 150 Mbps Load]
Figure 4.3: Progress of the maximum latency in time
[Line plot; axes: Outgoing Port Load [Mb · s−1], Latency [ms]; series: Average Latency, Maximal Latency]
Figure 4.4: Maximum and average packet latencies vs. outgoing port load
Effect of Packet Length
Test Name: SW.PL - Effect of Packet Length on Maximum Latency
Purpose: This test measures the packet latencies of the observed flow for different packet lengths
and different loads of the outgoing interface. The purpose is to investigate the combined
temporal effect of two factors, (i) outgoing port congestion and (ii) packet length, on the maximum packet
latency of the observed flow.
Topology: [diagram with the elements UDP Flooder #1, UDP Flooder #2, TestQoS, UDP Dummy, Switch Under Test, Separation Switch]
Network Devices: Switch HP ProCurve 1800-8G
                 100Mbit Hub for flow separation
Parametrisation:
  Device Name          Parameters
  TestQoS Sender       IP: 192.168.3.1, Port: 1000, VLAN: 0, DSCP: 0
  TestQoS Receiver     IP: 192.168.3.2, Port: 1000
  UDPFlooder #1        IP: 192.168.3.3, Port: 1000, VLAN: 0, DSCP: 0
  UDPFlooder #2        IP: 192.168.3.4, Port: 1000, VLAN: 0, DSCP: 0
  UDPDummy             IP: 192.168.3.5, Port: 2000
  Switch Under Test    Flow Control: None, Line Speed: 100 Mbps Full Duplex
Variable Parameters: Outgoing Port Load (UDP Flooder's traffic), Packet Length (TestQoS traffic)
Observed Parameters: Packet Latency (TestQoS traffic)
Test Duration: 10000 packets, ca 5 minutes
Test Results: Table A.3, Figure 4.5
It can be observed that with an unloaded and a loaded outgoing port, the packet latency
is proportional to the packet length. As the switch has a store-and-forward
architecture, such behaviour was expected. In this case, the combined effect of the two factors is
additive. With a congested outgoing port, the latency is no longer proportional to the packet
length, probably because more dominant phenomena apply when the
port is congested. Hence, under congestion the combined effect of the factors is no longer
additive.
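The proportionality to packet length follows from the store-and-forward principle: a frame has to be received completely before it is forwarded, so every store-and-forward hop adds at least the serialisation time of the frame on the line. The following Python sketch evaluates this lower bound for the packet lengths used in the test. It is an idealisation under stated assumptions: the preamble, the inter-frame gap, and any processing time inside the device are ignored, and the function name is hypothetical.

```python
def serialisation_delay_us(frame_bytes, line_rate_bps=100e6):
    """Time in microseconds needed to clock a frame onto a line of the given rate.

    Idealised: preamble, inter-frame gap, and device processing are ignored.
    """
    return frame_bytes * 8 / line_rate_bps * 1e6


# Packet lengths used in the SW.PL test, evaluated for a 100 Mbps line.
for size in (64, 128, 256, 512, 1024):
    print(f"{size:5d} bytes: {serialisation_delay_us(size):6.2f} us")
```

Doubling the packet length doubles this contribution, which is consistent with the proportional behaviour observed without congestion.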
[Line plot; axes: Outgoing Port Load [Mb · s−1], Maximal Packet Latency [ms]; series: 64, 128, 256, 512, and 1024 Bytes]
Figure 4.5: Maximum packet latencies vs. outgoing port load and packet length
Device Concatenation
Test Name: SW.CON - Switch Concatenation
Purpose: This test measures the packet latencies of the observed flow for different numbers of
concatenated switches in the path and for different loads of the infrastructure. The purpose is to investigate
the combined temporal effect of two factors, (i) infrastructure load and (ii) number of concatenated
switches, on the maximum packet latency of the observed flow.
Topology: [diagram with the elements UDP Flooder #1, UDP Flooder #2, TestQoS, UDP Dummy, Switch Under Test #1, Switch Under Test #2, ..., Switch Under Test #n]
Network Devices: 6x Switch HP ProCurve 1800-8G
Parametrisation:
  Device Name          Parameters
  TestQoS Sender       IP: 192.168.3.1, Port: 1000, VLAN: 0, DSCP: 0
  TestQoS Receiver     IP: 192.168.3.2, Port: 1000
  UDPFlooder #1        IP: 192.168.3.3, Port: 1000, VLAN: 0, DSCP: 0
  UDPFlooder #2        IP: 192.168.3.4, Port: 1000, VLAN: 0, DSCP: 0
  UDPDummy             IP: 192.168.3.5, Port: 2000
  Switch Under Test    Flow Control: None, Line Speed: 100 Mbps Full Duplex
Variable Parameters: Outgoing Port Load (UDP Flooder's traffic), Number of Switches
Observed Parameters: Packet Latency, Packet Drop (TestQoS traffic)
Test Duration: 10000 packets, ca 5 minutes
Test Results: Table A.4, Figure 4.6
The measurements of packet latency across concatenated switches yield interesting results
that call for explanation. Table A.4 and Figure 4.6 show that the latencies
of the unloaded and slightly loaded topology behave as expected: with every added switch,
the packet latency increases accordingly, and thus one can infer the
latency contribution of every single switch to the total latency.
However, the same cannot be stated once the first switch is
congested by traffic exceeding the capacity of the outgoing interface. Once this happens, the first
switch employs additional mechanisms, such as buffering and packet scheduling, which
consequently contribute to higher latencies. The traffic transmitted by the switch is
limited to 100 Mb · s−1. Hence, all of the subsequent switches experience a maximum
load of only 100 Mb · s−1. Adding a third switch to the topology approximately doubles the maximum
latency. On the other hand, adding a fourth, fifth, and sixth switch has no
additional effect on the delay.
Due to the nature of the measurement, it is difficult to state which portion of the latency
stems from which device. Hence, any reasoning would be speculative. As a result, the
combined effect of the two factors is not even additive.
[Line plot; axes: Outgoing Port Load [Mb · s−1], Maximal Latency [ms]; series: 2, 3, 4, 5, and 6 Switches]
Figure 4.6: Maximum packet latencies vs. outgoing port load with additional switches
[Line plot; axes: Packet [−], Packet Latency [ms]; series: 2 Switches, 4 Switches]
Figure 4.7: Packet latencies passing through 2 and 4 switches
4.2.2 Router-Related Measurements
Routers are more complex networking devices from both the architectural and the configuration
point of view. Nevertheless, there are again two main components: the SF and the OI. Most of
the QoS mechanisms at the OI can be configured explicitly, whereas the architecture of the switching is
less transparent. An important feature to be investigated is
the forwarding implementation, i.e., whether it is process-based or cache-based.
Effect of Packet Length and Forwarding Mechanism
Test Name: RTR.PLCEF - Effect of Packet Length and Unloaded Forwarding on Maximum Latency
Purpose: This test measures the maximum packet latencies of the observed flow for different packet
lengths and forwarding mechanisms in the unloaded state. The purpose is to investigate the temporal
influence of the packet length and the forwarding mechanism on the packet latency.
Topology: [diagram with the elements TestQoS, Router Under Test]
Network Devices: Router Cisco 2811 with 1 HWIC-2FE
Parametrisation:
  Device Name          Parameters
  TestQoS Sender       IP: 192.168.1.1, Port: 1000, VLAN: 0, DSCP: 0
  TestQoS Receiver     IP: 192.168.2.1, Port: 1000
  Router Under Test    Scheduling at outgoing port: FIFO (implicit), Line Speed: 100 Mbps Full Duplex
Variable Parameters: Forwarding Mechanism: Process Forwarding vs. Cisco Express Forwarding (Router), Packet Length (TestQoS traffic)
Observed Parameters: Packet Latency (TestQoS traffic)
Test Duration: 5000 packets, ca 3 minutes
Test Results: Table A.5, Figure 4.8
Figure 4.8 shows that the maximum packet latency increases linearly with the packet
length. This matches the corresponding measurements on the switch and again confirms the
store-and-forward nature of the device.
The effect of the forwarding mechanism is self-explanatory. Process forwarding causes
higher packet latency and jitter, which can also be verified in Table A.5. With additional
load, which is not documented in this test, it was found that the forwarding capacity dropped
to ca 50 Mb · s−1. CEF had far better characteristics and was used throughout the further
measurements as the only viable option.
Figure 4.8 shows the 99.5 % percentile of both distributions so that the
measurements are visually comparable, as the course of the process forwarding latencies contained
numerous peaks of ca 30 ms.
[Line plot; axes: Packet Length [Bytes], Latency Percentile (p = 99.5 %) [µs]; series: Cisco Express Forwarding, Process Forwarding]
Figure 4.8: Maximum packet latency percentile (99.5 %) vs. packet length and forwarding
mechanism
Switch-Fabric Loading
Test Name: RTR.FL - Effect of Switch Fabric Loading on the Maximum Latency
Purpose: This test measures the maximum packet latencies of the observed flow with additional
traffic traversing the SF. The purpose is to investigate the temporal influence of different scheduling
mechanisms (PQ, WFQ, and CBWFQ) combined with additional SF load on the packet latency.
Topology: [diagram with the elements UDP Flooder, TestQoS, UDP Dummy, Router Under Test]
Network Devices: Router Cisco 2811 with 1 HWIC-2FE
Parametrisation:
  Device Name          Parameters
  TestQoS Sender       IP: 192.168.1.1, Port: 1000, VLAN: 0, DSCP: 0
  TestQoS Receiver     IP: 192.168.2.1, Port: 1000
  UDPFlooder           IP: 192.168.3.1, Port: 1000, VLAN: 0, DSCP: 0
  UDPDummy             IP: 192.168.4.1, Port: 2000
  Router Under Test    Line Speed: 100 Mbps Full Duplex, Forwarding: CEF
Variable Parameters: Scheduling Mechanism of OI (PQ, WFQ, CBWFQ), Switch Fabric Load (UDP Flooder)
Observed Parameters: Packet Latency (TestQoS traffic)
Test Duration: 10000 packets, ca 5 minutes
Test Results: Table A.6, Table A.7, Table A.8, Figure 4.9
From Figure 4.9, it is obvious that the configuration of the scheduling mechanism also
influences the forwarding capacity of the SF. Firstly, the forwarding capacity differs
between the mechanisms, ranging from 180 Mb · s−1 to 370 Mb · s−1.² Secondly, the
latency caused by the scheduling mechanism differs; WFQ clearly outperforms the other
mechanisms. Ford claims in [28] that the Cisco implementation of WFQ has been successful
enough to become the implicit scheduling mechanism of their routers. The fact that
the latency first increases as expected and then drops at a load of 240 Mb · s−1 is probably
caused by the triggering of an additional mechanism favouring low-volume flows, such as the
one generated by TestQoS.
² The calculation of the capacity is shown in Section 7.2.
[Line plot; axes: Switch Fabric Load [Mb · s−1], Maximum Latency [ms]; series: PQ, WFQ, CBWFQ]
Figure 4.9: Maximum packet latency vs. switch fabric load
Outgoing Port Loading
Test Name: RTR.OPC - Effect of Outgoing Port Congestion on Maximum Latency
Purpose: This test measures the maximum packet latencies of the observed flow with additional traffic
loading the outgoing interface. The purpose is to investigate the temporal influence of the outgoing port
load on the packet latency under different scheduling mechanisms (FIFO, PQ, WFQ).
Topology: [diagram with the elements TestQoS, Switch, Router Under Test, UDP Flooder #1, UDP Flooder #2, UDP Dummy]
Network Devices: Router Cisco 2811 with 1 HWIC-2FE
                 Switch HP ProCurve 1800-8G
Parametrisation:
  Device Name          Parameters
  TestQoS Sender       IP: 192.168.1.1, Port: 1000, VLAN: 0, DSCP: 0xB8
  TestQoS Receiver     IP: 192.168.2.1, Port: 1000
  UDPFlooder #1        IP: 192.168.3.1, Port: 1000, VLAN: 0, DSCP: 0
  UDPFlooder #2        IP: 192.168.4.1, Port: 1000, VLAN: 0, DSCP: 0
  UDPDummy             IP: 192.168.2.2, Port: 2000, 2001
  Router Under Test    Line Speed: 100 Mbps Full Duplex, Forwarding: CEF
Variable Parameters: Scheduling Algorithm FIFO, PQ, WFQ (Router), Outgoing Port Load (UDP Flooder)
Observed Parameters: Packet Latency (TestQoS traffic)
Test Duration: 10000 packets, ca 5 minutes
Test Results: Table A.9, Table A.10, Table A.11, Figure 4.10
Figure 4.10 shows three curves belonging to the different scheduling mechanisms. The
behaviour can be analysed as follows.
The FIFO scheduler is capable of processing the traffic as long as the OI load is less than
100 Mb · s−1. After this threshold is reached, packets start to be dropped. Those which
the scheduler manages to process are transmitted with a latency of ca 35 ms. The
losses are shown in Table A.9.
The PQ scheduler processes the observed packets with absolute priority, as their DSCP
is 0xB8 (Expedited Forwarding). Hence, after the additional traffic crosses the
100 Mb · s−1 threshold, the latency increases due to scheduling overhead but then remains
constant. The latency growth starting at 150 Mb · s−1 is most probably caused by the
limited capacity of the traffic classifier. The saturation at 170 Mb · s−1 is caused by
the dropping of a certain amount of packets, which serves as negative feedback and thus stabilises
the packet latency.
The WFQ scheduler performs in a similar way. Because the TestQoS traffic
has a low rate, the behaviour is identical to that of PQ, which can be proven: even without
the EF priority, the flow would obtain a large share of the bandwidth. However, no
saturation was measured when the rate of the additional flow reached 200 Mb · s−1. This
can be attributed to an efficient implementation of the scheduling algorithm, which is able to
process higher data rates.
Nevertheless, as the general approach of this work is grey-box
modelling, the reasoning behind the behaviour is far less important than the ability to represent
the behaviour in the analytical device model introduced in Chapter 7.
[Line plot; axes: Outgoing Port Load [Mb · s−1], Maximum Latency [ms]; series: FIFO, PQ, WFQ]
Figure 4.10: Maximum packet latency vs. outgoing port load and scheduling algorithm
Chapter 5
Network Calculus Extensions
This chapter presents the network calculus extensions necessary to model the observed
behaviour.
5.1 Rate-Variable-Latency Service Curve
In the cases shown in the SW.OPC and RTR.OPC tests, it is difficult to model the observed
latency dependency with the means available in the network calculus framework. The test
cases show behaviour in which the packet latency increases step-wise once the load of
the service element reaches its limit, despite the fact that the observed flow should
receive absolute priority. In such a case, a service curve that represents this
behaviour well is desirable.
Such behaviour corresponds to a service curve with a bimodal latency parameter, as
shown in Figure 5.1. The parameter T changes from the value $T_1$ to
$T_2$ once the parameter r of the arrival curve α(t) reaches or exceeds $r_T$. Thus, the
rate-latency service curve becomes a rate-variable-latency (RVL) service curve and has the form
$\beta_{R,T_1,T_2,r_T}(\alpha(t), t)$. Obviously, the RVL service curve is no longer a function with constant
parameters, but depends on the parameters of the incoming flow R(t), similarly to
multiplexers.
Definition 5.1 (RVL Service Curve Function). Let us have a rate-latency service curve
$\beta_{R,T}(t) = R[t - T]^+$ with constant parameters of rate R and latency T, and an arrival
curve $\alpha(t) = \gamma_{r,b}(t) = rt + b$. If the latency T of the service curve is dependent on the
parameter r of the arrival curve α(t) such that $T = \{T_1, r < r_T;\ T_2, r \ge r_T\}$, where the
parameter $r_T$ is a trigger rate, then the service curve becomes a Rate-Variable-Latency (RVL)
service curve and is defined as
$\beta_{R,T_1,T_2,r_T}(\alpha(t), t) = R[t - T_1 - (T_2 - T_1) 1_{\{\alpha(1)-\alpha(0) \ge r_T\}}]^+$. (5.1)
[Plot of T(r) over r: a step from $T_1$ to $T_2$ at $r = r_T$]
Figure 5.1: Service curve parameter T vs. arrival curve parameter r
[Plot of β(t) over t: two parallel rate-latency curves with slope R = tan(ω), starting at $T_1$ for $r < r_T$ and at $T_2$ for $r \ge r_T$]
Figure 5.2: Rate-variable-latency service curve
The step function $v_T(t) = 1_{\{t \ge T\}}$ used here is defined in [13, 129] as
$v_T(t) = 1_{\{t \ge T\}} = \begin{cases} 1 & : t \ge T \\ 0 & : t < T. \end{cases}$
Furthermore, if the arrival curve is an affine curve, it holds that
$r = \alpha(1) - \alpha(0)$. (5.2)
In some cases the following form of the RVL function is more suitable:
$\beta_{R,T_1,T_{con},r_T}(\alpha(t), t) = R[t - T_1 - T_{con} 1_{\{\alpha(1)-\alpha(0) \ge r_T\}}]^+$, (5.3)
where $T_{con} = T_2 - T_1$.
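Definition 5.1 can be made concrete with a short Python sketch that evaluates the RVL service curve of (5.1) for an affine arrival curve, so that the indicator reduces to the comparison of r with the trigger rate. The function name and all parameter values are hypothetical and serve only as an illustration; rates are in bits per second and times in seconds.

```python
def rvl_service_curve(t, r, R, T1, T2, r_T):
    """RVL service curve (5.1) for an affine arrival curve with rate r.

    The latency is T1 below the trigger rate r_T and T2 at or above it;
    [x]^+ is implemented as max(0, x).
    """
    latency = T2 if r >= r_T else T1
    return R * max(0.0, t - latency)


# Hypothetical parameters: 100 Mbps service rate, latency step at 80 Mbps.
params = dict(R=100e6, T1=20e-6, T2=1.5e-3, r_T=80e6)
below = rvl_service_curve(2e-3, r=50e6, **params)   # uses T1
above = rvl_service_curve(2e-3, r=90e6, **params)   # uses T2
print(below, above)
```

For a fixed arrival rate the function is an ordinary rate-latency curve; only the choice between the two latencies depends on the incoming flow.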
The modelling relies on several properties of the rate-latency service curve.
Therefore, it is necessary to investigate whether the required properties
persist for the RVL service curve. The most important properties are the bounds
introduced in Theorem 2.5 and Theorem 2.6 and the concatenation properties introduced
in Theorem 2.7.
Theorem 5.1 (RVL Service Curve Bounds). If a flow R(t) upper-bounded by an arrival curve
$\alpha(t) = \gamma_{r,b}(t)$ passes through a service element offering an RVL service curve
$\beta_{R,T_1,T_2,r_T}(\alpha(t), t)$, then the resulting flow $R^*(t)$ is upper-bounded by an arrival curve
$\alpha^*(t) = rt + b + r(T_1 + (T_2 - T_1) 1_{\{\alpha(1)-\alpha(0) \ge r_T\}})$, (5.4)
and the virtual delay is
$d(\alpha(t), t) = T_1 + (T_2 - T_1) 1_{\{\alpha(1)-\alpha(0) \ge r_T\}} + \frac{b}{R}$. (5.5)
Proof. $\beta_{R,T_1,T_2,r_T}(\alpha(t), t)$ has piecewise constant parameters $\{R, T_1\}$ for $r < r_T$ and
$\{R, T_2\}$ for $r \ge r_T$. Therefore, from [13], it holds that
$r < r_T: \alpha^*(t) = rt + b + rT_1$,
$r \ge r_T: \alpha^*(t) = rt + b + rT_2$. (5.6)
Putting the previous expressions together results in
$\alpha^*(t) = (rt + b + rT_1) 1_{\{r < r_T\}} + (rt + b + rT_2) 1_{\{r \ge r_T\}}$,
$\alpha^*(t) = rt + b + r(T_1 1_{\{r < r_T\}} + T_2 1_{\{r \ge r_T\}})$. (5.7)
As $T_1 1_{\{r < r_T\}} = T_1 (1 - 1_{\{r \ge r_T\}})$, it holds that
$\alpha^*(t) = rt + b + r(T_1 - T_1 1_{\{r \ge r_T\}} + T_2 1_{\{r \ge r_T\}})$. (5.8)
Substituting (5.2) into (5.8), one obtains (5.4).
[Block diagram: the arrival curve α(t) enters β1(t), whose output α∗(t) enters β2(α∗(t), t), producing α∗∗(t); the overall system is β(α(t), t)]
Figure 5.3: Concatenation of service curves
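The bounds of Theorem 5.1 can be evaluated numerically as in the following Python sketch. It is an illustration under stated assumptions: the arrival curve is affine, the arrival rate does not exceed the service rate R (otherwise the delay is unbounded), and all function names and parameter values are hypothetical.

```python
def rvl_latency(r, T1, T2, r_T):
    """Latency of the RVL service curve selected by the arrival rate r."""
    return T2 if r >= r_T else T1


def output_arrival_curve(r, b, T1, T2, r_T):
    """(rate, burst) of the output arrival curve alpha* according to (5.4)."""
    return r, b + r * rvl_latency(r, T1, T2, r_T)


def delay_bound(r, b, R, T1, T2, r_T):
    """Virtual delay bound according to (5.5); assumes r <= R."""
    if r > R:
        raise ValueError("arrival rate exceeds service rate: delay unbounded")
    return rvl_latency(r, T1, T2, r_T) + b / R


# Hypothetical parameters: burst of 12000 bits, latency step at 80 Mbps.
cfg = dict(T1=20e-6, T2=1.5e-3, r_T=80e6)
d_low = delay_bound(50e6, 12000, R=100e6, **cfg)
d_high = delay_bound(90e6, 12000, R=100e6, **cfg)
_, burst_low = output_arrival_curve(50e6, 12000, **cfg)
_, burst_high = output_arrival_curve(90e6, 12000, **cfg)
print(d_low, d_high, burst_low, burst_high)
```

Crossing the trigger rate increases both the delay bound and the output burst by the step in latency, while the sustained rate of the output flow is unchanged.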
Theorem 5.2 (RVL Service Curve Concatenation). Let us have a concatenation of
two service elements bounded by the service curves $\beta_1(t) = \beta_{R_1,T_1}(t)$ and
$\beta_2(\alpha(t), t) = \beta_{R_2,T_{2,1},T_{2,2},r_T}(\alpha(t), t)$. If the flow R(t) passing through the system is
upper-bounded by an arrival curve $\alpha(t) = \gamma_{r,b}(t)$, then the service curve of the concatenated system is
$\beta(\alpha(t), t) = (R_1 \wedge R_2)[t - T_1 - T_{2,1} - (T_{2,2} - T_{2,1}) 1_{\{r \ge r_T\}}]^+$. (5.9)
Proof. As the RVL service curve has piecewise constant parameters, one can apply the
convolution piecewise. However, as the RVL service curve is at the second place of the
concatenated system, the threshold is evaluated against the parameter $r$ of the flow $R^*(t)$, as can be
seen in Figure 5.3:
\[
\begin{aligned}
r < r_T &: \beta(\alpha^*(t), t) = (R_1 \wedge R_2)[t - T_1 - T_{2,1}]^+, \\
r \ge r_T &: \beta(\alpha^*(t), t) = (R_1 \wedge R_2)[t - T_1 - T_{2,2}]^+,
\end{aligned}
\tag{5.10}
\]
where $\alpha^*(t) = \alpha(t) \oslash \beta_1(t)$. From [13, 30],
\[ \alpha^*(t) = \gamma_{r^*,b^*} = \gamma_{r,b+rT_1}. \]
Hence, $r = r^*$ and it holds that
\[
\begin{aligned}
r < r_T &: \beta(\alpha(t), t) = (R_1 \wedge R_2)[t - T_1 - T_{2,1}]^+, \\
r \ge r_T &: \beta(\alpha(t), t) = (R_1 \wedge R_2)[t - T_1 - T_{2,2}]^+,
\end{aligned}
\tag{5.11}
\]
which is equivalent to (5.9).
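The case distinction (5.11) can be sketched in a few lines of Python. The function name `concat_rate_latency_rvl` is hypothetical; the sketch only returns the rate and latency of the resulting rate-latency curve.

```python
def concat_rate_latency_rvl(R1, T1, R2, T21, T22, r_T, r):
    """Parameters (R, T) of the concatenation of beta_{R1,T1} with an RVL curve.

    Follows (5.11): the rate is the minimum of both rates; the latency is
    T1 + T21 below the trigger r_T and T1 + T22 at or above it.
    """
    R = min(R1, R2)                       # R1 ^ R2
    T = T1 + (T22 if r >= r_T else T21)   # piecewise constant latency
    return R, T
```

Since the first element leaves the flow rate unchanged (r = r*), the rate of the original arrival curve can be passed directly as `r`.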
Hence, the necessary properties of the RVL service curve that are needed for the
modelling have been proven. Consequently, one can use the operations of concatenation
and evaluate the performance bounds.
5.2 Extended Results on Systems with Losses
As introduced in Theorem 2.9, (2.6) expresses the data loss rate based on the characteristics
of the traffic and the service element. This initial proposition for loss-rate analysis is
extended here. Firstly, a theorem derived from Theorem 2.9 with a minor extension is
proposed. Subsequently, a theorem expressing the loss behaviour in terms of the flow and
loss-node parameters is introduced in order to obtain practicable results.
Theorem 5.3. Let us consider a system with a storage capacity $X$, offering to a flow
with an arrival curve $\alpha(t)$ a service curve $\beta(t)$. Then the bound on the loss rate $\hat{l}(t)$ is
\[ \hat{l}(t) = \left[ 1 - \inf_{0 < s \le t} \frac{\beta(s) + X}{\alpha(s)} \right]^+. \tag{5.12} \]
Proof. It can be shown that without limiting lˆ(t) to positive values, such parameters of
α(t) and β(t) exist, which drive lˆ(t) to negative values. Negative values of lˆ(t) do not
make sense.
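Theorem 5.4. Consider a system with a storage capacity $X$, offering to a flow with an
arrival curve $\alpha(t) = \gamma_{r,b}$ a service curve $\beta(t) = \lambda_R$. The bound on the loss rate $\hat{l}(t)$ is
\[ rX - Rb < 0: \quad \hat{l} = \left[ 1 - \frac{X}{b} \right]^+, \tag{5.13} \]
\[ rX - Rb = 0: \quad \hat{l} = \left[ 1 - \frac{R}{r} \right]^+, \tag{5.14} \]
\[ rX - Rb > 0: \quad \hat{l}(t) = \left[ 1 - \frac{Rt + X}{rt + b} \right]^+, \qquad \lim_{t \to \infty} \hat{l}(t) = \left[ 1 - \frac{R}{r} \right]^+. \]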
Proof. By inserting $\beta(t) = Rt$ and $\alpha(t) = rt + b$ into (5.12), we obtain
\[ \hat{l}(t) = \left[ 1 - \inf_{0 \le s \le t} \frac{Rs + X}{rs + b} \right]^+ = \left[ 1 - \inf_{0 \le s \le t} \Theta(s) \right]^+. \]
Essentially, the loss depends on the linear rational expression $\Theta(s)$. To make the function
easier to analyse, $\Theta(s)$ is rewritten as
\[ \Theta(s) = \frac{Rs + X}{rs + b} = m + \frac{k}{s + l}, \]
where $k = \frac{Xr - Rb}{r^2}$, $l = \frac{b}{r}$, and $m = \frac{R}{r}$. The parameter $k$ represents the curvature of
the rational function, $-l$ is the x-coordinate of the vertical asymptote, and $m$
is the y-coordinate of the horizontal asymptote. Generally, the shape of $\Theta(s)$
falls into one of three categories depending on whether $k$ is positive, negative, or zero. Each
case has to be handled separately:
• $k < 0 \Leftrightarrow Xr - Rb < 0$: $\Theta(s)$ is increasing on the interval $0 \le s \le t$ and
$\inf_{0 \le s \le t} \Theta(s) = \Theta(0) = \frac{X}{b}$, which can be seen in Figure 5.4. Hence, $\hat{l}(t) = \hat{l} = [1 - \frac{X}{b}]^+$.
• $k = 0 \Leftrightarrow Xr - Rb = 0$: $\Theta(s)$ reduces to the constant function $\Theta(s) = \frac{R}{r}$.
Consequently, $\inf_{0 \le s \le t} \Theta(s) = \frac{R}{r}$. Hence, $\hat{l}(t) = \hat{l} = [1 - \frac{R}{r}]^+$.
• $k > 0 \Leftrightarrow Xr - Rb > 0$: $\Theta(s)$ is decreasing on the interval $0 \le s \le t$ and
$\inf_{0 \le s \le t} \Theta(s) = \Theta(t) = \frac{Rt + X}{rt + b}$, which can be seen in Figure 5.5. Hence, $\hat{l}(t) = [1 - \frac{Rt + X}{rt + b}]^+$, and
\[ \lim_{t \to \infty} \hat{l}(t) = \left[ 1 - \frac{R}{r} \right]^+. \]
Figure 5.4: Θ(t) for k < 0
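The closed-form result of Theorem 5.4 can be cross-checked against a brute-force evaluation of (5.12). The following Python sketch is only an illustration under stated assumptions: the function names and the grid-based infimum with a default of 10 000 steps are choices of this example, and b > 0 is assumed so that Θ(0) is defined.

```python
def loss_bound_rate(t, r, b, R, X):
    """Closed-form loss-rate bound of Theorem 5.4 (service curve lambda_R)."""
    control = r * X - R * b                    # sign of k
    if control < 0:
        theta_min = X / b                      # infimum attained at s = 0
    elif control == 0:
        theta_min = R / r                      # Theta is constant
    else:
        theta_min = (R * t + X) / (r * t + b)  # infimum attained at s = t
    return max(0.0, 1.0 - theta_min)


def loss_bound_numeric(t, r, b, R, X, steps=10_000):
    """Brute-force infimum of Theta(s) = (R*s + X)/(r*s + b) on a grid over [0, t]."""
    grid = (t * i / steps for i in range(steps + 1))
    theta_min = min((R * s + X) / (r * s + b) for s in grid)
    return max(0.0, 1.0 - theta_min)
```

Because Θ(s) is monotone in each of the three cases, the grid always contains the point where the infimum is attained (s = 0 or s = t), so both functions should agree up to rounding.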
Discussion
Packet loss can be caused by an undersized buffer and/or an insufficient rate of the
service element. The following explains how this fits the inferred
mathematical results.
The control expression in Theorem 5.4 can be rewritten as $X < \frac{R}{r} b$. This expression
evaluates the robustness of the buffer towards the flow burstiness, taking into account the ratio
of the service rate and the flow rate. Moreover, loss can also be caused by an insufficient
service rate. It applies that if:
• $X < \frac{R}{r} b$, the buffer may be insufficient. Moreover, if $R \le r$, loss is caused by an insufficient
rate, and if $X < b$, loss is caused by an insufficient buffer. A combination of both is
possible. Notably, all three cases are covered by the resulting equation
(5.13);
• $X = \frac{R}{r} b$, the buffer is marginally sufficient. Consequently, the burstiness is fully covered
by the buffer and loss can only be caused by an insufficient rate if $R < r$, as can
be seen in (5.14);
• $X > \frac{R}{r} b$, the buffer is sufficient. Again, loss can only be caused by an insufficient rate.
If $R \ge r$ then $m \ge 1$, and $\hat{l}(t) = 0$ for all $t$. If $R < r$, the loss bound starts from
0 and grows until it reaches $1 - \frac{R}{r}$. The reason is that the oversized buffer can compensate
for the insufficient service rate only for a certain period of time and starts failing
when the buffer becomes full.
Figure 5.5: Θ(t) for k > 0
The situation becomes less transparent, yet more usable for further application, when
the rate function λR is substituted by rate-latency function βR,T .
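Theorem 5.5. Consider a system with a storage capacity $X$, offering to a flow with an
arrival curve $\alpha(t) = \gamma_{r,b}$ a service curve $\beta(t) = \beta_{R,T}$. Then the bound on the loss rate $\hat{l}(t)$
is
\[
\begin{aligned}
r(X - RT) - Rb < 0 &: \hat{l}(t) = \left[ 1 - \frac{X}{rt + b} \right]^+ 1_{\{0 < t \le T\}} + \left[ 1 - \frac{X}{rT + b} \right]^+ 1_{\{t > T\}}, \\
r(X - RT) - Rb = 0 &: \hat{l}(t) = \left[ 1 - \frac{X}{rt + b} \right]^+ 1_{\{0 < t \le T\}} + \left[ 1 - \frac{R}{r} \right]^+ 1_{\{t > T\}}, \\
r(X - RT) - Rb > 0 &: \hat{l}(t) = \left[ 1 - \frac{X}{rt + b} \right]^+ 1_{\{0 < t \le T\}} + \left[ 1 - \frac{R(t - T) + X}{rt + b} \right]^+ 1_{\{t > T\}}.
\end{aligned}
\]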
Proof. As in the proof of Theorem 5.4, $\Theta(s)$ must be expressed. However, the loss with
$\beta_{R,T}$ must be investigated on two disjoint intervals.
As $\beta_{R,T}(s) = 0$ for $0 < s \le T$,
\[ \Theta(s) = \frac{X}{rs + b}. \]
One can assume that $X$ is positive without loss of generality. Then $\Theta(s)$ is decreasing
on the given interval and $\inf_{0 < s \le t} \Theta(s) = \Theta(t)$. Hence, for $0 < t \le T$ it
holds that
\[ \hat{l}(t) = [1 - \Theta(t)]^+ = \left[ 1 - \frac{X}{rt + b} \right]^+. \]
As $\beta_{R,T}(s) = R(s - T)$ for $s > T$,
\[ \Theta(s) = \frac{R(s - T) + X}{rs + b}. \]
$\Theta(s)$ becomes a linear rational function whose shape depends on the constellation of
parameters. It is beneficial to manipulate $\Theta(s)$ into the form
\[ \Theta(s) = m + \frac{k}{s + l}, \]
yielding the coefficients
\[ k = \frac{r(X - RT) - Rb}{r^2}, \quad l = \frac{b}{r}, \quad m = \frac{R}{r}. \]
The shape of $\Theta(s)$ falls into one of three categories depending on whether $k$ is positive,
negative, or zero. Each case has to be handled separately:
• $k < 0 \Leftrightarrow r(X - RT) - Rb < 0$: $\Theta(s)$ is increasing on the interval $T < s \le t$ and
$\inf_{T < s \le t} \Theta(s) = \Theta(T) = \frac{X}{rT + b}$. Hence, $\hat{l}(t) = \hat{l} = [1 - \frac{X}{rT + b}]^+$.
• $k = 0 \Leftrightarrow r(X - RT) - Rb = 0$: $\Theta(s)$ reduces to the constant function $\Theta(s) = \frac{R}{r}$.
Consequently, $\inf_{T < s \le t} \Theta(s) = \frac{R}{r}$. Hence, $\hat{l}(t) = \hat{l} = [1 - \frac{R}{r}]^+$.
• $k > 0 \Leftrightarrow r(X - RT) - Rb > 0$: $\Theta(s)$ is decreasing on the interval $T < s \le t$ and
$\inf_{T < s \le t} \Theta(s) = \Theta(t) = \frac{R(t - T) + X}{rt + b}$. Hence, $\hat{l}(t) = [1 - \frac{R(t - T) + X}{rt + b}]^+$, and
\[ \lim_{t \to \infty} \hat{l}(t) = \left[ 1 - \frac{R}{r} \right]^+. \]
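The case analysis of Theorem 5.5 translates directly into a small function. This is a sketch only; the function name `loss_bound_rate_latency` and the example parameters are assumptions made for illustration.

```python
def loss_bound_rate_latency(t, r, b, R, T, X):
    """Closed-form loss-rate bound of Theorem 5.5 (service curve beta_{R,T})."""
    if t <= 0:
        raise ValueError("the bound is defined for t > 0 only")
    if t <= T:
        # Delay period: nothing is served yet, Theta(s) = X / (r*s + b).
        theta_min = X / (r * t + b)
    else:
        control = r * (X - R * T) - R * b   # sign of k
        if control < 0:
            theta_min = X / (r * T + b)     # infimum attained at s = T
        elif control == 0:
            theta_min = R / r               # Theta is constant for s > T
        else:
            theta_min = (R * (t - T) + X) / (r * t + b)  # infimum at s = t
    return max(0.0, 1.0 - theta_min)
```

With r = 2, b = 2, R = 1 and T = 1, a buffer of X = 2 gives the control expression 0, so the bound for t > T equals 1 − R/r.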
Discussion
One can observe an increasing loss-rate bound on the interval $0 < t < T$. The reason is that,
despite the incoming traffic, the node is in the delay period and serves no traffic.
If $t > T$, the form of $\Theta(s)$ is similar to the one in Theorem 5.4. The control
expression changes from $X \lesseqgtr \frac{R}{r} b$ to $X \lesseqgtr \frac{R}{r} b + RT$. The difference is $RT$. This buffer
extension requirement compensates for the traffic accumulated during the delay
period $0 < t \le T$. Otherwise, the interpretation of the three cases is similar to the one
in Theorem 5.4.
5.2.1 Confrontation of Loss-Rate Analysis with Backlog Bounds
Loss-rate bounds reveal the maximum loss rate caused either by an insufficient rate or by an
insufficient buffer. Based on the results, a buffer can be sized to conform to the traffic
requirements by inverse application of the loss-rate analysis. However, there is a simpler
way to size a buffer. Theorem 2.4 gives the backlog bound, i.e., the bound on the amount of
unprocessed bits in the service element, from which the required buffer size follows directly.
A question arises why loss-rate analysis should be used at all when the backlog bound
provides the required results. The answer is that while the backlog bound can only be
used for buffer sizing, loss-rate analysis allows losses to be calculated in case of an insufficient
buffer and/or processing rate. This is not possible with the backlog bound analysis, as
Theorem 2.4 assumes that $r < R$ and an infinite buffer size. Otherwise, it provides no
results.
Nevertheless, the loss-rate results tempt the author to compare the buffer requirements
imposed by both methods under the same conditions.
Rate Function and Affine Function
The first case considers $\alpha(t) = \gamma_{r,b}$ as the arrival curve and $\beta(t) = \lambda_R$ as the service curve.
It is automatically assumed that $r < R$. Application of Theorem 2.4 yields the buffer
requirement $X = b$. Loss-rate analysis provides a result only once the parameters $r$ and $R$
are given. The worst-case situation for buffer sizing appears when
$r = R$. If at the same time $X = b$, then $rX - Rb = 0$ and the loss bound from Theorem 5.4 is
\[ \hat{l} = \left[ 1 - \frac{R}{r} \right]^+ = [1 - 1]^+ = 0. \]
If we now keep the buffer $X = b$ and decrease $r$ so that $r < R$, the condition changes to
$rX - Rb < 0$ and the loss bound is
\[ \hat{l} = \left[ 1 - \frac{X}{b} \right]^+ = [1 - 1]^+ = 0. \]
Hence, the results of both methods agree.
Rate-Latency Function and Affine Function
The second case considers $\alpha(t) = \gamma_{r,b}$ as the arrival curve and $\beta(t) = \beta_{R,T}$ as the service
curve. It is automatically assumed that $r < R$. Application of Theorem 2.4 yields the
buffer requirement $X = b + rT$, as the largest amount of unprocessed bits in the rate-latency
node occurs at $t = T$. Given that $0 < t < T$ and assuming $X = b + rT$, the loss-rate
bound is
\[ \hat{l}(t) = \left[ 1 - \frac{X}{rt + b} \right]^+ = \left[ \frac{rt + b - X}{rt + b} \right]^+ = \left[ \frac{rt + b - rT - b}{rt + b} \right]^+ = \left[ \frac{r(t - T)}{rt + b} \right]^+ = 0. \]
Now, let us shorten the buffer by a positive number of bits $c^+$ so that $X = b + rT - c^+$.
Then the loss-rate bound is
\[ \hat{l}(t) = \left[ \frac{r(t - T) + c^+}{rt + b} \right]^+. \]
The greatest usage of the buffer occurs at $t = T$, as before that moment no traffic
leaves the node. So, even if the shortened buffer covered the traffic for $t < T$, it would
not do so at $t = T$, where the loss would be
\[ \hat{l}(T) = \left[ \frac{c^+}{rT + b} \right]^+ > 0. \]
In other words, there is no $c^+ > 0$ for which $\hat{l}(T) = 0$.
For the same case with $t > T$, the control expression $(X - RT)r - Rb$ has to be analysed.
However, it offers no such illustrative interpretation as in the previous case. Hence, it
is easier to analyse all three cases separately. Again, the conditions $r < R$, $t > T$, and
$X = b + rT$ are assumed.
For $(X - RT)r - Rb < 0$ the loss-rate bound from Theorem 5.5 is
\[ \hat{l} = \left[ 1 - \frac{X}{rT + b} \right]^+ = \left[ \frac{rT + b - X}{rT + b} \right]^+ = \left[ \frac{rT + b - rT - b}{rT + b} \right]^+ = 0. \]
For $(X - RT)r - Rb = 0$ the loss-rate bound is
\[ \hat{l} = \left[ 1 - \frac{R}{r} \right]^+ = 0. \]
For $(X - RT)r - Rb > 0$ the loss-rate bound is
\[
\begin{aligned}
\hat{l}(t) &= \left[ 1 - \frac{R(t - T) + X}{rt + b} \right]^+ = \left[ 1 - \frac{R(t - T) + rT + b}{rt + b} \right]^+ \\
&= \left[ \frac{rt + b - R(t - T) - rT - b}{rt + b} \right]^+ = \left[ \frac{r(t - T) - R(t - T)}{rt + b} \right]^+ \\
&= \left[ \frac{(r - R)(t - T)}{rt + b} \right]^+ = 0.
\end{aligned}
\]
Shortening the buffer to $X = rT + b - c^+$ has the same effect as in
the former case. The proof is left to the kind reader.
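The comparison with the backlog bound can be reproduced numerically. The sketch below evaluates (5.12) for a rate-latency service curve by brute force; the helper names, the grid resolution and the parameter values (r = 1, b = 2, R = 3, T = 4, c = 1.5) are assumptions chosen for this example only.

```python
def theta(s, r, b, R, T, X):
    """Theta(s) = (beta_{R,T}(s) + X) / alpha(s) for alpha = gamma_{r,b}."""
    beta = max(0.0, R * (s - T))
    return (beta + X) / (r * s + b)


def loss_bound(t, r, b, R, T, X, steps=2000):
    """Brute-force evaluation of (5.12) on a grid over (0, t]."""
    grid = (t * i / steps for i in range(1, steps + 1))
    return max(0.0, 1.0 - min(theta(s, r, b, R, T, X) for s in grid))


# Hypothetical example parameters with r < R.
r, b, R, T = 1.0, 2.0, 3.0, 4.0
X_backlog = b + r * T        # buffer size taken from the backlog bound
c = 1.5                      # number of bits by which the buffer is shortened

loss_full = loss_bound(10.0, r, b, R, T, X_backlog)
loss_short = loss_bound(T, r, b, R, T, X_backlog - c)
```

Under these assumptions `loss_full` evaluates to zero, while `loss_short` equals c/(rT + b), in line with the expression for the shortened buffer at t = T.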
Summary
The comparison of the backlog bound results with the loss-rate analysis has shown that, for the
given two cases, both analyses impose the same buffer requirements. It was shown
that the buffer requirement explicitly given by the backlog bound analysis fits the loss-rate
analysis if additional parameters are taken into account, such as the relation of $r$ to $R$. As a result,
it is recommended to size buffers using the backlog bound theorem, as the loss-rate analysis
requires tedious effort to come to the same conclusions without additional value. On
the other hand, if the main objective is the investigation of lossy systems, loss-rate analysis
should be the choice.
Chapter 6
Networking Device Modelling
A prerequisite to network performance analysis is designing networking device models.
As the dissertation is focused on industrial automation employing IP-based real-time
communication, the modelled devices are switches and routers, representing L2 and L3
networking devices, respectively. The objective of this work is to provide such a modelling
framework which is able to describe networking devices without loss of general applicab-
ility, i.e., taking into account dominant implementation peculiarities contributing to the
port-to-port temporal performance.
A resulting networking device model must provide a set of service curves for all
combinations of incoming and outgoing interfaces so that a service offered to each flow
traversing the device is available. Initial contribution to this topic is presented in [10].
6.1 Definition of Model Structure
Figure 6.1 recalls the general hardware architecture of a networking device. Let us
now consider the contribution of the individual hardware components to latency:
• Incoming interface (II) performs packet reception and its transmission to the switch
fabric. Depending on the interface type, it can contain buffers and forwarding
units. In some implementations, it can even recognize packet priorities and
provide faster forwarding to the corresponding traffic (this usually applies with
under-dimensioned switch fabrics). Hence, it potentially
represents a service element. On the other hand, line cards can easily be dimensioned
to the interface capacity; they are thus usually able to operate at line rate and
contribute nothing to latency.
• Switch fabric (SF) performs switching of packets from the incoming interface to
the outgoing interface. It can either take care only of the physical switching or
it can be the switching decision-maker. The influence on the QoS depends on the
type of the SF and the schedulers involved. The SF is a potential bottleneck due to
congestion, unless the speedup of the SF is equal to the number of interfaces.
• Outgoing interface (OI) receives packets from the SF and transmits them to the
physical interface. Congestion is most likely here. The interface cannot be
over-provisioned, as all interfaces are supposed to have equal rates. For that reason,
OIs are the first place to implement QoS congestion management.
Figure 6.1: General router architecture (incoming and outgoing interface line cards connected by the switch plane; control plane with routing table, CPU, and memory)
Although the hardware architecture is composed of three stages, the
networking device model is two-stage, i.e., it is represented by a concatenation
of two service elements. The former service curve represents the incoming interfaces
together with the SF. The latter service curve represents the OI.
The whole system could be represented by a single service curve, as is done, e.g., in
Chapter 2 in [13]. This is possible only with networking devices in which the influence
of SF can be neglected and more complex networks are considered. However, empirical
analysis of the considered devices breaks this assumption. As a result, segregation to
several service elements is inevitable.
Separation of OIs is evident; the interface has its outgoing queues and schedulers
which are dedicated to congestion management. Contrary to this, incoming interfaces
cannot be isolated in such a straightforward way for the following reasons:
• It is sometimes difficult to recognize and pointless at the same time to argue, if a
buffer is located at the incoming interface or a SF. The same applies for forwarding.
The forwarding unit can either be centralised or distributed. However, the physical
location does not account for latency.
• There is no technical way to empirically identify parameters of an incoming in-
terface and the SF separately. The 2-stage model provides the according number
of points where aggregation and de-aggregation of traffic can take place. See Fig-
ure 6.2.
The applied network calculus framework is per-flow based. With respect to network-
ing device modelling, port-to-port service curve represents a service offered to a flow
when traversing the networking device. The port-to-port service curve is inferred in
such a way that the service curves for the SF and the OI are obtained at first and then
concatenated.
Figure 6.2: Port-to-port service offered to flows (switch fabric $\beta_{SF}(t)$ followed by the outgoing interfaces $\beta_{OI_i}(t)$)
Naming Conventions
$F^l_{i,j,k}$ denotes a flow entering the device via interface $i$, to be forwarded via interface $j$, and
belonging to the priority class $k$, if applicable. The upper index $l$ denotes the internal
hop. The entering flow is upper-indexed as 0. $\beta^{F_{i,j,k}}_{se}$ denotes the service curve of the service
element $se$ offered to the flow $F_{i,j,k}$. The same indexing also applies to the service curve
parameters. So, if $\beta^{F_{i,j,k}}_{se} = \beta_{R,T}$, then $R = R^{F_{i,j,k}}_{se}$ and $T = T^{F_{i,j,k}}_{se}$.
After considering the influence of the buffering strategies and putting them into the context
of particular implementations, the subsequent subsections deal with the modelling of the SF and the OI.
Finally, the adopted strategy for the port-to-port service curve is presented.
6.1.1 Networking Device Buffering Strategy
Buffering strategies were introduced in Section 2.3. As every device can adopt a different
buffering strategy, it is advisable to briefly consider its effect on the QoS behaviour.
The classifications introduced in the consulted resources do not optimally fit the objectives
of this work. Consequently, a suitable classification is introduced:
Inport Queuing (IQ) must be used if the SF does not provide a capacity
equal to the sum of the capacities of the incoming interfaces. This applies to devices with
high rates and/or a high number of interfaces. Moreover, IQ is used with bus-based SFs,
as there is no storage capacity in the SF, contrary to shared-memory SFs. The HOL blocking
problem limits the use of IQ. Hence, VOQ is adopted to diminish the problem. IQ may be
beneficial if congestion management mechanisms are used to schedule forwarding based
on priorities.
Switch-Fabric Queuing (SFQ) represents either shared-memory queuing or crosspoint
queuing, depending on the SF implementation. Contrary to IQ and VOQ, the
number of congestions may be limited and, under special traffic patterns, eliminated
altogether.
Outport Queuing (OQ) is a congestion management measure for OI congestion.
OQ is obligatory except for special VOQ forwarding and mechanisms based
on matching.
Most networking devices adopt a combination of the queuing strategies based
on their internals. However, an enumeration of the possible buffering constellations would
be exhausting and in some cases very speculative.
Consequently, for QoS-enabled networking devices, OQ is obligatory. IQ and SFQ
are a part of the SF and the forwarding process. Knowledge of the buffer placement is
advantageous as it allows for choosing a proper SF model (Figure 6.3 or Figure 6.4).
6.1.2 Switch Fabric Model Structure
There are two aspects important for inferring a representative SF model: blocking prop-
erty and scheduling algorithm.
Switch Fabric Blocking
Blocking is closely connected to the SF type and the buffer location. Non-blocking means
that any inport/outport pair of the switch can be connected if neither
of the ports is occupied [15, 179]. In the context of the model, this means that the QoS
parameters of a communicating input/output pair remain unchanged when another pair
of ports is active.
Non-blocking is an intrinsic property of crossbar-based devices with distributed
forwarding units. Yet, such behaviour can also be observed with devices with an overprovisioned
SF and forwarding.
Blocking is a decisive factor for the device behaviour. Hence, let us establish a
coarse classification of SF models based on this feature. If the SF is blocking, the SF-B
constellation in Figure 6.3 is recommended. In case of a non-blocking SF, the
constellation SF-N in Figure 6.4 can be employed.
Figure 6.3: Blocking switch fabric (SF-B)
Figure 6.3 shows a general SF-B architecture including all possible blocks. Buffers 1
and 2 represent IQ. Mux represents the scheduler polling the packets from the buffers
and passing them to the forwarding unit. Every input interface is given a portion of the
service curve of the Mux depending on the scheduling mechanism, which is studied further below. The
rate of the scheduler is given by the shared-medium capacity. Buffer 3 represents SFQ
and the forwarding functionality with a rate-latency service type. The stored packets are
served in FCFS order and forwarded to the OI. The rate of the forwarding unit is given
by its forwarding capacity and the latency is the implicit forwarding overhead. Even in
this case, the buffers are optional. In crossbar SFs, crosspoint queuing can also be used;
it is not regarded in the model, as VOQ is used more often [15, 191].
Figure 6.4: Non-blocking switch fabric (SF-N)
Figure 6.4 shows a general SF-N architecture including all possible blocks. Buffers 1
and 2 represent IQ, and the rate-latency nature of the output is given by the forwarding
unit as in the SF-B case. Buffers 3 and 4 represent the OQ buffers needed in case of
congestion of the outgoing link. The rate of the buffer output is given by the rate of the bus
connected to the OI of the networking device.
There are two major differences between SF-B and SF-N in the form of the service
curves: (i) the aggregation level of flows is different, and (ii) the order of service elements
is reversed.
Switch-Fabric Scheduling Mechanism
If the SF architecture is not known, discovering it at this level of
granularity is straightforward. Using the test cases SW.FL or RTR.FL, it can be decided
whether the implementation is SF-based with a multi-path architecture or an over-provisioned
bus-based architecture on the one hand, or bus-based with forwarding limitations on the other. This
coarse classification is sufficient.
However, more heuristics are necessary to determine the scheduling mechanism of the
SF. The discovery is based on the test scenario introduced in SW.FL and RTR.FL.
$F_H$ and $F_1$ represent flows generated by the TestQoS. The former is high-priority tagged
and the latter is best-effort untagged. $F_L$ and $F_2$ represent additional flows generated
by the UDP Flooder, both untagged. All flows use the same packet lengths. $R_{SF}$
represents the processing rate of the SF. Table 6.1 summarises the qualitative effect of the
flows' rates ($r_i$ corresponds to $F_i$) on the latency and the loss rate of the observed flows
(rates $r_1$ and $r_H$) under different configurations. Hence, by performing different tests and
observing the latency and loss-rate behaviour, the scheduling mechanism can be discovered.
The reasoning behind this speculative approach is the following. The FCFS mechanism serves the
two flows as a single flow. Since under all test configurations the sum of
the flows' rates exceeds the processing rate, loss must occur. The latency also increases
with higher additional load.
Table 6.1: Effect of the scheduling mechanism on the observed flows (FH and F1)

No.  Test configuration                              FCFS           PQ             RR             WFQ
                                                     Latency Loss   Latency Loss   Latency Loss   Latency Loss
1    rH ≪ rL, rH + rL > RSF, rH < RSF                Yes     Yes    No      No     No      No     No      No
2    rH ∼ rL, rH + rL > RSF, RSF/2 < rH < RSF        Yes     Yes    No      No     Yes     Yes    Limited No
3    rH ∼ rL, rH + rL > RSF, rH < RSF/2              Yes     Yes    No      No     No      No     No      No
4    r1 ≪ r2, r1 + r2 > RSF, r1 < RSF                Yes     Yes    Yes     Yes    No      No     No      No
5    r1 ∼ r2, r1 + r2 > RSF, RSF/2 < r1 < RSF        Yes     Yes    Yes     Yes    Yes     Yes    Yes     Yes
6    r1 ∼ r2, r1 + r2 > RSF, r1 < RSF/2              Yes     Yes    Yes     Yes    No      No     No      No
The PQ mechanism gives absolute priority to the high-priority flow. Hence, neither loss
nor increased latency can occur. If both flows are untagged, PQ cannot tell the
flows apart and delivers the same performance as FCFS.
RR is intrinsically fair and applies no priority treatment. Hence, increased latency
and loss occur once the flow rates are comparable and congest the SF. Losses occur for those
flows which exceed their portion of the bandwidth (second and fifth row). Moreover, RR
favours low-volume traffic; thus, neither increased latency nor packet loss occurs
if the flow's rate is significantly lower than the other's (first and fourth row).
WFQ is an extension of RR and delivers priority treatment. This manifests
itself by preventing loss in case of congestion even though the high-priority flow
demands a higher portion of the bandwidth (second and fifth row).
Finally, the proposed method for the SF architecture discovery is given by the decision
tree shown in Figure 6.5.
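The lookup implied by Table 6.1 can be sketched as a small Python routine that returns all scheduling mechanisms compatible with a set of observations. The encoding of the table as a dictionary, the string labels "yes", "no" and "limited", and the function name `matching_mechanisms` are assumptions of this example; it is not a transcription of the decision tree in Figure 6.5.

```python
YES, NO, LIMITED = "yes", "no", "limited"

# Rows 1..6 of Table 6.1 as (increased latency, loss) per mechanism.
TABLE_6_1 = {
    "FCFS": [(YES, YES)] * 6,
    "PQ":   [(NO, NO)] * 3 + [(YES, YES)] * 3,
    "RR":   [(NO, NO), (YES, YES), (NO, NO), (NO, NO), (YES, YES), (NO, NO)],
    "WFQ":  [(NO, NO), (LIMITED, NO), (NO, NO), (NO, NO), (YES, YES), (NO, NO)],
}


def matching_mechanisms(observed):
    """Return the mechanisms consistent with the observed test results.

    observed maps a test number (1..6) to the pair (latency, loss)
    measured for the observed flow in that test configuration.
    """
    return [name for name, rows in TABLE_6_1.items()
            if all(rows[no - 1] == result for no, result in observed.items())]
```

A single test rarely suffices: observing no effect in test 1 only excludes FCFS, whereas combining tests 2 and 4 already separates RR from the other mechanisms.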
Summary
Finally, the service curve offered to the flow $F^l_{i,j,k}(t)$ by the SF under the PBOO
assumption can be expressed as
\[ \beta^{F_{i,j,k}}_{SF}(t) = \left( \beta^{F_{i,j,k}}_{agg} \otimes \beta^{F_{i,j,k}}_{fwd} \right)(t), \tag{6.1} \]
where $\beta^{F_{i,j,k}}_{agg}$ represents the aggregation of flows being forwarded to the common output
link and $\beta^{F_{i,j,k}}_{fwd}$ represents the forwarding resolution for the given flows. Table 6.2
shows the forms of the service curves and the flows which are aggregated in the given service
elements. One can observe that the SF-B constellation is subject to aggregation of more
flows. The table introduces only the types of the curves. Specific parameters depend
on the scheduling mechanisms employed. The most common scheduling mechanisms
are FCFS and RR. Advanced devices adopt PQ or WFQ/WRR based on, e.g., L2/L3
packet tags. The forms of the service curves under different scheduling mechanisms were
introduced in Section 2.5.
Figure 6.5: Process of SF identification (decision tree over SF loading tests without and with priorities at equal rates; outcomes: no shared resources (crossbar with OQ, overprovisioning with OQ), shared resources (shared memory/medium, process switching), and the scheduling mechanisms FIFO, RR, PQ, WFQ)
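Table 6.2: SF service curves

SF Type   $\beta^{F_{i,j}}_{agg}$                          $\beta^{F_{i,j}}_{fwd}$
SF-B      $\lambda_R$, $R = R^{\sum_j F_{i,j}}_{agg}$      $\beta_{R,T}$, $R = R^{\sum_{i,j} F_{i,j}}_{fwd}$, $T = T^{\sum_{i,j} F_{i,j}}_{fwd}$
SF-N      $\lambda_R$, $R = R^{\sum_i F_{i,j}}_{agg}$      $\beta_{R,T}$, $R = R^{\sum_j F_{i,j}}_{fwd}$, $T = T^{\sum_j F_{i,j}}_{fwd}$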
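For the curve types in Table 6.2, the convolution in (6.1) has a simple closed form: a rate curve convolved with a rate-latency curve is again a rate-latency curve with the smaller of the two rates and the unchanged latency. The Python sketch below illustrates this and checks it against a brute-force min-plus convolution; all function names and parameter values are assumptions of this example.

```python
def sf_service_curve(R_agg, R_fwd, T_fwd):
    """Parameters (R, T) of lambda_{R_agg} (x) beta_{R_fwd,T_fwd}, cf. (6.1)."""
    return min(R_agg, R_fwd), T_fwd


def rate(R):
    """Rate curve lambda_R."""
    return lambda t: R * t


def rate_latency(R, T):
    """Rate-latency curve beta_{R,T}."""
    return lambda t: max(0.0, R * (t - T))


def min_plus_conv(f, g, t, steps=1000):
    """Brute-force min-plus convolution (f (x) g)(t) on a grid over [0, t]."""
    return min(f(t * i / steps) + g(t - t * i / steps) for i in range(steps + 1))
```

For instance, with R_agg = 2, R_fwd = 3 and T_fwd = 2, the closed form gives β_{2,2}, and the brute-force convolution at t = 8 should return the same value 2·(8 − 2).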
6.1.3 Outgoing Interface Model Structure
The architecture of an OI is more straightforward. The mechanisms used at OIs are better
documented and easier to identify. As a result, fewer heuristics are needed than
with SFs. On the other hand, especially with routers, OIs allow for an immense
amount of parametrisation potentially influencing the QoS behaviour. Finally, congested
interfaces are considered throughout this work. This raises the question of how to model the
service of an element that is subject to loss.
Scheduling Mechanism Consideration
Identification of the OI architecture usually does not require any heuristics. Low-end
switches apply no priority mechanisms. Switches recognizing the IEEE 802.1p and IEEE
802.1q standards usually implement the PQ, RR, or WRR mechanism at OIs. PQ is used
with several types of HP ProCurve switches. Per-packet scheduling is often used with RR
and WRR for lower complexity, as is the case with Cisco Catalyst switches [30], rather than
advanced adaptive scheduling mechanisms such as WFQ. On the other hand, routers
implement very sophisticated algorithms and allow the user to configure plenty of
parameters.
The OI can be modelled with a rate-latency service curve $\beta_{OI} = \beta_{R,T}(t)$. Its actual form
depends on the scheduler type, the number of queues, the bandwidth allocations (if applicable),
and the queue lengths. A comprehensive summary of service curves respecting different types
of schedulers is given by Theorem 2.1, Theorem 2.2, and Theorem 2.3.
The results of the test cases SF.OPC and RTR.OPC show that additional latency
occurs when the OI becomes congested. The dependence of latency on the OI load
is smooth, yet it experiences a step-wise change at the load equal to the OI physical
capacity. The state switching is controlled by the total data rate and the
trigger is equal to the link capacity. The theoretical background of this behaviour is
given in Section 5.1 and presented in [11].
Data-Loss Evaluation
While the SF can be protected from data loss by over-provisioning, no such
treatment is possible at OIs, where the service rate is limited by the interface capacity. It can also be
observed that when more complicated QoS rules are applied at the OI without
hardware support, the service rate can drop below the physical capacity of the interface.
Consequently, data loss is an intrinsic part of the OI and should be regarded in
the networking device model. [13, 251] gives a basic theorem evaluating the data loss.
However, within this work, the closed-form results introduced in Section 5.2 are
used. Using these results, it is possible to evaluate losses with PQ, FIFO and WFQ
schedulers of the OI.
However, the original idea of retrieving tighter delay bounds of the competing flows
subject to finite buffer lengths remained unresolved. Let us follow a simple consideration.
Let us have a scheduler of a rate type $\lambda_R(t)$ serving a high-priority flow $F_H(t)$ bounded
by $\alpha_H(t)$ and a low-priority flow $F_L(t)$ bounded by $\alpha_L(t)$. The service offered to $F_L(t)$
in case of PQ is $\beta^{F_L}(t) = [\lambda_R(t) - \alpha_H(t)]^+$ provided that the buffer $X_H$ of the flow $F_H(t)$
is infinite. Now, if $X_H$ is finite, a potential loss caused by an insufficient buffer could leave
a larger left-over portion for $F_L(t)$ and thus a better service $\beta^{F_L}(t)$.
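For an affine high-priority arrival curve, the left-over service under PQ with an infinite high-priority buffer reduces to a rate-latency curve. The following Python sketch illustrates this under the assumption $\alpha_H = \gamma_{r_H,b_H}$ with $r_H < R$, which is an assumption of this example and not stated for this passage in the text; the function name is hypothetical.

```python
def pq_leftover_service(R, r_H, b_H):
    """Left-over service [R*t - (r_H*t + b_H)]^+ for the low-priority flow.

    Assumes alpha_H = gamma_{r_H,b_H} and an infinite high-priority buffer.
    Returns (rate, latency) of the equivalent rate-latency curve, since
    [(R - r_H)*t - b_H]^+ = (R - r_H) * [t - b_H / (R - r_H)]^+.
    """
    if r_H >= R:
        raise ValueError("no left-over service: r_H must be smaller than R")
    rate = R - r_H
    latency = b_H / rate
    return rate, latency
```

With R = 10, r_H = 4 and b_H = 3, the low-priority flow is left with a rate of 6 and a latency of 0.5.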
Nevertheless, there is apparently no suitable explicit definition
of a service curve for a "loss node". The problem is that this application does not
require an upper bound on the loss but a lower bound. Ayyorgun and Cruz
establish a service curve with loss in [3] and extend the results in [2]. The results are not
directly applicable to OI modelling. However, an interesting result is that, in simple form,
the latency of a packet passing through a lossy delay node is bounded by the same value as
if the delay node was lossless (Lemma 1 in [3]). This result justifies using the original arrival
curves regardless of the potential losses when a flow passes a lossy node.
Figure 6.6: Model of an OI (classifier/demultiplexer $\beta_{cla}(t)$, one clipper and buffer $X_i$ per flow $F_i$, scheduler/delay element $\beta_{sch}(t)$)
Summary
Finally, the block diagram of an exemplary OI is shown in Figure 6.6. The interface
has three queues representing either flows or classes of flows depending on the chosen
congestion management mechanism.
The packets arriving from the SF are demultiplexed if congestion management is
employed, i.e., other than FCFS queuing is used. Demultiplexing is based on a decision
of a classifier which analyses the packet content and forwards the packet to the respective
queue. The processing capacity of the classifier, represented by the service curve βcla(t),
can be limited depending on the complexity of the service policy scheme.
Clippers represent data loss. The concept is based on the idea introduced in [13, 252]
that the flow Li(t) is a part of the flow Fi(t) which is discarded in order that the residual
flow can traverse the rest of the system with finite buffer of Xi in a lossless manner.
The packets advance to queues and are served by the scheduler based on the chosen
mechanism. Hence, the total service capacity of λR(t) is split to λRFi (t) accordingly.
Finally, the forwarding service element represents the overhead of the OI caused by
the processing delays. The latency is bimodal depending on the OI congestion condition
given by the trigger $r_T$ as defined in Section 5.1. $T_1$ represents the contention-free latency
and $T_2$ represents the congested latency. Note that the trigger is based on the
rates of the original flows. As a result, the service offered to a flow $F_i(t)$ by the OI is

$$\beta^{F_i}_{OI}\Bigl(t, \sum r_i\Bigr) = \bigl(\beta^{F_i}_{cla} \otimes \beta^{F_i}_{sch}\bigr)\Bigl(t, \sum r_i\Bigr), \tag{6.2}$$

where the parameters of the service curves are given in Table 6.3.
Table 6.3: OI service curves
Curve            Type              Parameters
$\beta_{cla}$    $\lambda_R$       $R = R_{cla}$
$\beta_{sch}$    $\beta_{R,T}$     $R = R^{F_i}_{agg}$, $T = T_{sch} + T_{sch,con} \cdot 1_{\{\sum_{i=1}^{n} r_i \ge r_T\}}$
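The loss rate of the flow $F_i(t)$ bounded by $\alpha_i = \gamma_{r_i,b_i}$ experienced at the OI
is bounded by

$$\hat{l}_{F_i}(t) = \left[ 1 - \inf_{0 < s \le t} \frac{\bigl(\beta^{F_i}_{cla} \otimes \beta^{F_i}_{sch}\bigr)\bigl(s, \sum r_i\bigr) + X_i}{\alpha_i(s)} \right]^+ . \tag{6.3}$$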
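The bound in (6.3) can be evaluated numerically. The following is a minimal sketch, not the procedure used in this work: it assumes that the convolved OI service curve is replaced by a single rate-latency curve, that the flow is token-bucket constrained, and that the infimum is approximated on a uniform grid. All function names and numbers are hypothetical.

```python
def rate_latency(R, T):
    """Rate-latency service curve beta_{R,T}(t) = R * [t - T]^+."""
    return lambda t: R * max(t - T, 0.0)


def token_bucket(r, b):
    """Token-bucket arrival curve gamma_{r,b}(t) = r * t + b for t > 0."""
    return lambda t: r * t + b


def loss_rate_bound(beta, alpha, X, t, steps=10000):
    """Approximate [1 - inf_{0<s<=t} (beta(s) + X) / alpha(s)]^+ on a uniform grid.

    The grid is an assumption of this sketch: the infimum is only as exact
    as the sampling of s allows.
    """
    grid = (t * k / steps for k in range(1, steps + 1))
    smallest_ratio = min((beta(s) + X) / alpha(s) for s in grid)
    return max(1.0 - smallest_ratio, 0.0)


# Hypothetical numbers: 10 Mb/s service, 1 ms latency, 50 kb buffer,
# flow of 20 Mb/s with a 10 kb burst, observed over 1 s.
beta = rate_latency(R=10e6, T=1e-3)
alpha = token_bucket(r=20e6, b=10e3)
bound = loss_rate_bound(beta, alpha, X=50e3, t=1.0)
```

With a sustained arrival rate of twice the service rate, the bound approaches one half as the observation interval grows, which is the expected long-term loss share.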
6.2 Port-to-Port Service Curve of a Networking Device
So far, modelling of SF and OI has been accomplished in Section 6.1. Due to traffic
aggregation and deaggregation, it is worth considering which approach to take to obtain
the port-to-port service curve of a modelled device. Figure 6.7 and Figure 6.8 depict
all possible flow aggregates which can pass through networking devices with SF-B and
SF-N, respectively. The flow of interest is denoted as $F_{I,J,K}$, i.e., the source interface is
I, the destination interface is J, and the priority is K. The priority is considered only
at the OI. For the sake of simplicity, the figures adopt a simplified notation, in which
$F_{i,j,k} \equiv \sum_{i, i \ne I} \sum_{j, j \ne J} \sum_{k, k \ne K} F_{i,j,k}$, i.e., the aggregate of flows.
[Figure 6.7: Flow aggregation and deaggregation with SF-B]
It can be observed that with the SF-B architecture, more flow aggregates have to
be considered than with the SF-N architecture. The aggregation is more questionable
at the OI. The flows forwarded to the same OI aggregate at the SF output. However,
the flows deaggregate to different outgoing queues provided that the flows belong to
different service classes and aggregate again at the scheduler. A question arises how
to treat such flows. The most straightforward approach is to separate such interfering
flows, i.e., when the flow deaggregates at D2, consider it as leaving traffic and treat the
flow aggregating at M2 as fresh arriving traffic. This approach is also recommended in [52].

[Figure 6.8: Flow aggregation and deaggregation with SF-N]
The objective is to find the most suitable method to obtain the port-to-port service curve.
The considered approaches are introduced in Section 2.5. TFA provides bounds that are too
loose and insufficient resulting information; for instance, it does not provide the
required port-to-port service curve. PBOO-SFA (further only PBOO) performs considerably better.
However, due to multiple aggregations, the approach explodes in complexity without
providing added value in some cases. PMOO cannot be used as a universal approach,
as under some aggregation/deaggregation scenarios the method cannot be used at all.
Therefore, two main approaches will be applied: firstly, PBOO with ad hoc application
of PMOO where possible; secondly, Extended PBOO.
In the rest of the section, it is supposed that there is no loss at the SF. Data loss
is allowed only at the OIs. Moreover, the scheduling algorithm at the SF is arbitrary
(FCFS). Hence, the QoS measures are applied at the OI of the device.
6.2.1 PBOO/PMOO Approach
The PBOO mechanism is based on the assumption that the port-to-port service curve
offered to a flow is a convolution of the service curves offered to the traversing flow.
These service curves represent the service elements which the flow traverses. This
principle is introduced in Section 2.5 and is given by (6.4).

$$\beta^{F_{I,J,K}}_{p2p} = \bigotimes_{h \in H} \beta^{F_{I,J,K}}_{h}, \tag{6.4}$$
where $F_{I,J,K}$ is the observed flow, h represents the order of the service element (hop)
traversed by the flow, and H represents the set of hops on the flow's path. If applicable,
the principle can be extended by the PMOO principle also introduced in Section 2.5. The
main idea is that if a flow traverses a number of adjacent service elements together with
another aggregated flow, the aggregation with that flow can be considered once, after
obtaining the concatenated service curve, rather than at each service curve. The principle
is expressed by (6.5).

$$\beta^{F_{I,J,K}}_{h,h+1} = \bigl[(\beta_h \otimes \beta_{h+1}) - \alpha_{i,j,k}\bigr]^+ \approx \bigl[\beta_h - \alpha_{h+1}\bigr]^+ \otimes \bigl[\beta_{h+1} - (\alpha_{i,j,k} \oslash \beta_{i,j,k})\bigr]^+. \tag{6.5}$$
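For rate-latency service curves and token-bucket cross traffic, the per-hop left-over curve and the concatenation in (6.4) reduce to simple parameter arithmetic. The sketch below is illustrative only: the left-over form follows the closed form used later in (6.26), the convolution of rate-latency curves takes the minimum rate and the sum of latencies, and the function names and numbers are assumptions.

```python
def left_over(R, T, r, b):
    """[beta_{R,T} - gamma_{r,b}]^+ as a rate-latency pair, valid for r < R."""
    if r >= R:
        raise ValueError("cross traffic rate must stay below the service rate")
    return R - r, (R * T + b) / (R - r)


def concatenate(curves):
    """Min-plus convolution of rate-latency curves: minimum rate, summed latency."""
    rates, latencies = zip(*curves)
    return min(rates), sum(latencies)


# Hypothetical two-hop path; cross traffic is subtracted hop by hop (PBOO).
hop1 = left_over(R=100e6, T=50e-6, r=20e6, b=16e3)
hop2 = left_over(R=100e6, T=10e-6, r=40e6, b=8e3)
R_p2p, T_p2p = concatenate([hop1, hop2])
```

Here the burst of the cross traffic enters the latency term at every hop where it is subtracted, which is the effect PMOO tries to avoid where the aggregation scheme permits.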
In the following, port-to-port service curves offered to the observed flow will be in-
ferred for four most convenient combinations of SF and OI types using PBOO, and
PMOO where applicable, i.e., SF-B or SF-N and FCFS or PQ at OI.
SF-B and OI-FCFS Architecture
It is possible to use the PMOO with the SF-B architecture as all flows pass the same part
of the subpath, as can be seen in Figure 6.7. However, it no longer applies for the OI
as the flows forwarded to different OI deaggregate. With FCFS interface, there is only
one output buffer at the observed OI and no traffic is queued elsewhere. The resulting
service curve is given by (6.6).
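$$\beta^{F_{I,J,K}}_{p2p} = \Bigl[(\beta_{agg} \otimes \beta_{fwd}) - \sum_{\{i,j,k\} \in S_1} \alpha^0_{i,j,k}\Bigr]^+ \otimes \Bigl[\beta_{sch} - \sum_{\{i,j,k\} \in S_2} \alpha^1_{i,j,k}\Bigr]^+ \tag{6.6}$$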
The equation represents concatenation of two FCFS multiplexors based on
Theorem 2.1. The sets of flows aggregated at the SF and OI are given by the following:
S1 = {[i, j, k]; ∪ − [I, J, K]},
S2 = {[i, j, k]; j = J} − [I, J, K].
Finally, it is necessary to infer the bounds of the aggregated flow passing through the
observed OI to complete the knowledge of all elements in (6.6). $\alpha^1_{i,j,k}$ can be obtained
by the following equation:

$$[i,j,k] \in S_2: \quad \alpha^1_{i,j,k} = \alpha^0_{i,j,k} \oslash \bigl(\beta^{i,j,k}_{agg} \otimes \beta^{i,j,k}_{fwd}\bigr). \tag{6.7}$$
In other words, for each additional flow $F_{i,j,k}$ accompanying $F_{I,J,K}$ to the same output
interface, the service offered to such a flow by the SF must be inferred. This point makes
the derivation of $\beta^{F_{I,J,K}}_{p2p}$ rather awkward and the bounds less tight.
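For a token-bucket flow and a rate-latency service curve, the deconvolution in (6.7) has a well-known closed form: the rate is preserved and the burst grows by the rate multiplied by the latency, provided that the flow rate does not exceed the service rate. A minimal sketch under these assumptions follows; the function name and the numbers are hypothetical.

```python
def output_bound(r, b, R, T):
    """Return (r, b') of gamma_{r,b} deconvolved by beta_{R,T}; requires r <= R."""
    if r > R:
        raise ValueError("output is unbounded when r exceeds R")
    return r, b + r * T


# Hypothetical flow: 10 Mb/s, 12 kb burst, crossing a 100 Mb/s, 50 us element.
r1, b1 = output_bound(r=10e6, b=12e3, R=100e6, T=50e-6)
```

The service curve passed to such a function would be the one offered to the additional flow itself, which is why a separate left-over curve has to be inferred for every accompanying flow.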
SF-B and OI-PQ Architecture
Here, the consideration differs for the OI, which is PQ. From Theorem 2.2, it holds that
the observed flow competes only with flows of higher priority, and potentially has to wait
for completion of transmission of any lower-priority flow. These facts are reflected in
(6.8).

$$\begin{aligned}
\beta^{F_{I,J,K}}_{p2p} &= \beta^{F_{I,J,K}}_{SF} \otimes \beta^{F_{I,J,K}}_{OI}, \\
\beta^{F_{I,J,K}}_{SF} &= \Bigl[(\beta_{agg} \otimes \beta_{fwd}) - \sum_{\{i,j,k\} \in S_1} \alpha^0_{i,j,k}\Bigr]^+, \\
\beta^{F_{I,J,K}}_{OI} &= \Bigl[\beta_{cla} - \sum_{\{i,j,k\} \in S_2} \alpha^1_{i,j,k}\Bigr]^+ \otimes \Bigl[\beta_{sch} - \sum_{\{i,j,k\} \in S_3} \alpha^2_{i,j,k} - \bigl\{l^L_{max}\bigr\}_{K < K_{max}}\Bigr]^+.
\end{aligned} \tag{6.8}$$
The service curve is based on FCFS and PQ multiplexors with service curves in
Theorem 2.1 and Theorem 2.2. The sets are changed accordingly as follows:
S1 = {[i, j, k];∪ − [I, J,K]},
S2 = {[i, j, k]; j = J} − [I, J,K],
S3 = {[i, j, k]; j = J, k ≤ K} − [I, J,K].
Finally, the same operation must be performed for the bounds of the additional flows,
$\alpha^1_{i,j,k} \in S_2$ and $\alpha^2_{i,j,k} \in S_3$.
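The interfering sets for the SF-B and OI-PQ architecture can be built by filtering the flow triplets. The sketch below assumes that each flow is identified by a unique triplet (i, j, k) and, following the condition k ≤ K in S3, that a numerically lower k competes with the observed flow. The function name and the example flows are hypothetical.

```python
def interfering_sets(flows, observed):
    """Return (S1, S2, S3) for the SF-B and OI-PQ case as lists of (i, j, k)."""
    _, J, K = observed
    others = [f for f in flows if f != observed]
    s1 = others                             # every other flow shares the SF
    s2 = [f for f in others if f[1] == J]   # same outgoing interface
    s3 = [f for f in s2 if f[2] <= K]       # same OI and k <= K
    return s1, s2, s3


flows = [(1, 2, 1), (1, 2, 2), (3, 2, 1), (3, 2, 3), (3, 4, 1)]
s1, s2, s3 = interfering_sets(flows, observed=(1, 2, 2))
```

In this example the flow (3, 2, 3) shares the classifier with the observed flow but is excluded from S3, so it contributes only through the non-preemption term at the scheduler.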
SF-N and OI-FCFS Architecture
Due to the different aggregation scheme (see Figure 6.8) of the SF-N architecture, the PMOO
concept cannot be used in a general way as previously. However, due to the simple
architecture of the OI in case of FCFS, and thus no deaggregation, the PMOO can be used as
shown in (6.9).

$$\beta^{F_{I,J,K}}_{p2p} = \Bigl[\beta_{fwd} - \sum_{\{i,j,k\} \in S_1} \alpha^0_{i,j,k}\Bigr]^+ \otimes \Bigl[(\beta_{agg} \otimes \beta_{sch}) - \sum_{\{i,j,k\} \in S_2} \alpha^1_{i,j,k}\Bigr]^+. \tag{6.9}$$
The sets of flows aggregated at the SF and OI are given below. Due
to the distributed nature of forwarding, the maximum number of aggregated flows is lower
than with the SF-B architecture. Thus,
S1 = {[i, j, k]; i = I} − [I, J,K],
S2 = {[i, j, k]; j = J} − [I, J,K].
Finally, it is necessary to infer the bounds of the aggregated flow passing through the
observed OI to complete the knowledge of all elements in (6.9). $\alpha^1_{i,j,k}$ can be obtained
by the following equation:

$$[i,j,k] \in S_2: \quad \alpha^1_{i,j,k} = \alpha^0_{i,j,k} \oslash \beta^{i,j,k}_{fwd}. \tag{6.10}$$

Note that $\beta^{i,j,k}_{fwd}$ in (6.10) belongs to a different II than that of $F_{I,J,K}$ unless
$i = I$. Hence, the flows aggregated at such an II differ from those observed at the II
of $F_{I,J,K}$.
SF-N and OI-PQ Architecture
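At this point, PMOO cannot be used at all due to the aggregation scheme. Hence, $\beta^{F_{I,J,K}}_{p2p}$
must be expressed as a concatenation of four service curves according to (6.11).

$$\begin{aligned}
\beta^{F_{I,J,K}}_{p2p} &= \beta^{F_{I,J,K}}_{SF} \otimes \beta^{F_{I,J,K}}_{OI}, \\
\beta^{F_{I,J,K}}_{SF} &= \Bigl[\beta_{fwd} - \sum_{\{i,j,k\} \in S_1} \alpha^0_{i,j,k}\Bigr]^+ \otimes \Bigl[\beta_{agg} - \sum_{\{i,j,k\} \in S_2} \alpha^1_{i,j,k}\Bigr]^+, \\
\beta^{F_{I,J,K}}_{OI} &= \Bigl[\beta_{cla} - \sum_{\{i,j,k\} \in S_3} \alpha^2_{i,j,k}\Bigr]^+ \otimes \Bigl[\beta_{sch} - \sum_{\{i,j,k\} \in S_4} \alpha^3_{i,j,k} - \bigl\{l^L_{max}\bigr\}_{K < K_{max}}\Bigr]^+.
\end{aligned} \tag{6.11}$$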
The sets of flows aggregated at the SF and OI are given by the following conditions.
S1 = {[i, j, k]; i = I} − [I, J,K],
S2 = {[i, j, k]; j = J} − [I, J,K],
S3 = {[i, j, k]; j = J} − [I, J,K],
S4 = {[i, j, k]; j = J, k ≤ K} − [I, J,K].
Finally, it is necessary to infer the bounds of the aggregated flow passing through the
observed OI to complete the knowledge of all elements in (6.11). $\alpha^h_{i,j,k}$ can be obtained
by the following equations:

$$[i,j,k] \in S_4: \quad \alpha^3_{i,j,k} = \alpha^2_{i,j,k} \oslash \beta^{i,j,k}_{cla}, \tag{6.12}$$
$$[i,j,k] \in S_3: \quad \alpha^2_{i,j,k} = \alpha^1_{i,j,k} \oslash \beta^{i,j,k}_{agg}, \tag{6.13}$$
$$[i,j,k] \in S_2: \quad \alpha^1_{i,j,k} = \alpha^0_{i,j,k} \oslash \beta^{i,j,k}_{fwd}. \tag{6.14}$$
PBOO/PMOO Approach Conclusion
Acquiring the final parameters $R^{F_{I,J,K}}_{p2p}$ and $T^{F_{I,J,K}}_{p2p}$ needs some additional manipulations
according to Theorem 2.1. However, an example will be shown in Section 7.2.
6.2.2 Extended PBOO Approach
The Extended PBOO (EPBOO) approach is not based on the concatenation theorem. Instead,
it relies on stronger assumptions that deliver tighter bounds. One of the assumptions is
that all points of aggregation, i.e., multiplexors, are based on arbitrary scheduling. As
the general assumption of this work is that the SF applies arbitrary scheduling, EPBOO
can be applied directly provided that the OI schedules the outgoing traffic in an arbitrary
manner. Nevertheless, it will be shown that an OI with PQ can also make use of this
approach after a small consideration.
The naming convention used in [52] and introduced in (2.5) had to be changed to
conform to the rest of the work according to the following:
• $F_{I,J,K}$ refers to the observed flow. The index triplet refers to the II, OI, and priority,
respectively.
• $F_{i,j,k}$ refers to any additional flow such that $\{i,j,k\} \ne \{I,J,K\}$. The flow is
bounded by the parameters $r_{i,j,k}$ and $b_{i,j,k}$.
• $J_{\{I,J,K\}}$ is the set of service elements along the flow $F_{I,J,K}$. The service elements
are indexed by h in ascending order.
• $K_h$ is the set of additional flows $F_{i,j,k}$ at the service element h.
• $K_{\{I,J,K\}}$ is the set of additional flows $F_{i,j,k}$ which use at least one service element
along the path of the flow $F_{I,J,K}$.
• $J_{F_{I,J,K},F_{i,j,k}}$ is the set of service elements which are used both by the flow $F_{I,J,K}$ and
the flow $F_{i,j,k}$.
• $h_{min}$ represents the service element at which the respective flow enters the system.
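Finally, the rate-latency parameters of the port-to-port service curve $\beta^{F_{I,J,K}}_{p2p}$ offered
to the flow $F_{I,J,K}$ are

$$R^{F_{I,J,K}}_{p2p} = \min_{h \in J_{\{I,J,K\}}} \Bigl[R_h - \sum_{\{i,j,k\} \in K_h} r_{i,j,k}\Bigr], \tag{6.15}$$

$$T^{F_{I,J,K}}_{p2p} = \sum_{h \in J_{\{I,J,K\}}} T_h + \sum_{\{i,j,k\} \in K_{\{I,J,K\}}} \frac{b^{h_{min}}_{i,j,k}}{\min_{h \in J_{\{I,J,K\},\{i,j,k\}}} [R_h]}. \tag{6.16}$$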
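Equations (6.15) and (6.16) map directly onto a short routine. The following sketch is one possible reading of the formulas, not the implementation used in this work. It assumes that each hop on the path of the observed flow is given as a pair (R_h, T_h), and that each additional flow is given by its rate, its burst at the hop where it enters, and the indices of the hops it shares with the observed flow. The data layout and names are hypothetical.

```python
def epboo(hops, cross_flows):
    """Port-to-port rate-latency parameters following (6.15) and (6.16).

    hops:        list of (R_h, T_h) along the path of the observed flow
    cross_flows: list of (r, b, shared), where shared is the non-empty set
                 of hop indices used by both flows
    """
    # (6.15): the bottleneck after removing the cross-traffic rates per hop
    rate = min(
        R - sum(r for r, _, shared in cross_flows if h in shared)
        for h, (R, _) in enumerate(hops)
    )
    # (6.16): fixed latencies plus each burst paid once, at the slowest shared hop
    latency = sum(T for _, T in hops) + sum(
        b / min(hops[h][0] for h in shared) for _, b, shared in cross_flows
    )
    return rate, latency


hops = [(100.0, 1.0), (50.0, 2.0)]
cross = [(10.0, 20.0, {0}), (5.0, 10.0, {0, 1})]
R_p2p, T_p2p = epboo(hops, cross)
```

Each additional flow appears exactly once in the latency sum, regardless of how many hops it shares with the observed flow.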
SF-B and OI-FCFS Architecture
From Figure 6.3, Figure 6.7, and Table 6.2, it is obvious that there are four service
elements. Hence, combining the architectural knowledge with (6.15) and (6.16) results
in the following parameters:

$$R^{F_{I,J,K}}_{p2p} = \min\Bigl[\min[R_{agg}, R_{fwd}] - \sum_{S_{R1}} r_{i,j,k},\; R_{cla} - \sum_{S_{R2}} r_{i,j,k},\; R_{sch} - \sum_{S_{R3}} r_{i,j,k} - \sum_{S_{R'3}} r'_{i,j,k}\Bigr], \tag{6.17}$$

$$T^{F_{I,J,K}}_{p2p} = T_{fwd} + T_{sch} + \sum_{S_{B1}} \frac{b_{i,j,k}}{\min[R_{agg}, R_{fwd}]} + \sum_{S_{B2}} \frac{b_{i,j,k}}{\min[R_{agg}, R_{fwd}, R_{cla}, R_{sch}]} + \sum_{S_{B3}} \frac{b_{i,j,k}}{\min[R_{agg}, R_{fwd}, R_{cla}]} + \sum_{S_{B3}} \frac{b'_{i,j,k}}{R_{sch}}. \tag{6.18}$$
(6.17) represents the port-to-port rate offered to $F_{I,J,K}$. Note that the rate offered
by the OI is decreased by the flows passing through the same buffer as $F_{I,J,K}$. Yet, it
is also decreased by the flows of different traffic classes (priorities) competing for the
bandwidth, denoted by $r'_{i,j,k}$.
(6.18) represents the port-to-port latency offered to $F_{I,J,K}$. It is composed of the
T parameters and the burst components. The burst components are divided according
to the common part of the path they take with $F_{I,J,K}$. Again, the bursts caused by
aggregation of competing traffic classes at the scheduler are given by the $b'$ terms.
The conditions which hold for different sets introduced in (6.17) and (6.18) are evident
from Figure 6.7. Thus,
SR1 = {[i, j, k]; ∪ − [I, J, K]},
SR2 = {[i, j, k]; j = J} − [I, J, K],
SR3 = {[i, j, k]; j = J, k = K} − [I, J, K],
SR′3 = {[i, j, k]; j = J, k ≠ K},
SB1 = {[i, j, k]; j ≠ J} ∪ {[i, j, k]; j = J, k ≠ K},
SB2 = {[i, j, k]; j = J, k = K} − [I, J, K],
SB3 = {[i, j, k]; j = J, k ≠ K}.
Note that the resulting formulas are based on the idea that the traffic forwarded
to the OI J with priority k ≠ K competes for the bandwidth for the second time, i.e.,
it aggregates for the second time. This presumption is not precise. However, it is
compliant with other priority-capable architectures and explanatory. Dropping the
class-based output queuing would result in the following sets, with (6.17) and (6.18)
unchanged:
SR1 = {[i, j, k]; ∪ − [I, J, K]},
SR2 = {[i, j, k]; j = J} − [I, J, K],
SR3 = {[i, j, k]; j = J} − [I, J, K],
SR′3 = ∅,
SB1 = {[i, j, k]; j ≠ J},
SB2 = {[i, j, k]; j = J} − [I, J, K],
SB3 = ∅.
SF-B and OI-PQ Architecture
Extended PBOO is dedicated to FIFO service elements only. However, with a simple
extension it can also be used for the PQ-based OI architecture. The proposition is based on
Theorem 2.1 and Theorem 2.2. The idea is that the observed flow competes with all flows
with k ≤ K in an arbitrary manner. Consequently, all flows with k > K can be neglected.
Furthermore, the observed flow has to wait for completion of the transmission of the flow
which is transmitting at the moment, as the transmission cannot be preempted. Hence, the
observed flow has to wait for $l^L_{max} / R_{OI}$ if $K < K_{max}$. Finally, the resulting parameters of
$\beta^{F_{I,J,K}}_{p2p}$ are

$$R^{F_{I,J,K}}_{p2p} = \min\Bigl[\min[R_{agg}, R_{fwd}] - \sum_{S_{R1}} r_{i,j,k},\; R_{cla} - \sum_{S_{R2}} r_{i,j,k},\; R_{sch} - \sum_{S_{R3}} r_{i,j,k} - \sum_{S_{R'3}} r'_{i,j,k}\Bigr], \tag{6.19}$$

$$T^{F_{I,J,K}}_{p2p} = T_{fwd} + T_{sch} + \Bigl\{\frac{l^L_{max}}{R_{OI}}\Bigr\}_{K < K_{max}} + \sum_{S_{B1}} \frac{b_{i,j,k}}{\min[R_{agg}, R_{fwd}]} + \sum_{S_{B2}} \frac{b_{i,j,k}}{\min[R_{agg}, R_{fwd}, R_{cla}, R_{sch}]} + \sum_{S_{B3}} \frac{b_{i,j,k}}{\min[R_{agg}, R_{fwd}, R_{cla}]} + \sum_{S_{B3}} \frac{b'_{i,j,k}}{R_{sch}}. \tag{6.20}$$
The conditions which hold for the different sets introduced in (6.19) and (6.20) are evident
from Figure 6.7. Thus,
SR1 = {[i, j, k]; ∪ − [I, J, K]},
SR2 = {[i, j, k]; j = J} − [I, J, K],
SR3 = {[i, j, k]; j = J, k = K} − [I, J, K],
SR′3 = {[i, j, k]; j = J, k < K},
SB1 = {[i, j, k]; j ≠ J} ∪ {[i, j, k]; j = J, k ≠ K},
SB2 = {[i, j, k]; j = J, k = K} − [I, J, K],
SB3 = {[i, j, k]; j = J, k < K}.
SF-N and OI-FCFS Architecture
From Figure 6.4, Figure 6.8, and Table 6.2, it is obvious that there are four service
elements. Hence, combining the architectural knowledge with (6.15) and (6.16) results
in the following parameters of $\beta^{F_{I,J,K}}_{p2p}$:

$$R^{F_{I,J,K}}_{p2p} = \min\Bigl[R_{fwd} - \sum_{S_{R1}} r_{i,j,k},\; R_{agg} - \sum_{S_{R2}} r_{i,j,k},\; R_{cla} - \sum_{S_{R3}} r_{i,j,k},\; R_{sch} - \sum_{S_{R4}} r_{i,j,k} - \sum_{S_{R'4}} r'_{i,j,k}\Bigr], \tag{6.21}$$

$$T^{F_{I,J,K}}_{p2p} = T_{SF} + T_{OI} + \sum_{S_{B1}} \frac{b_{i,j,k}}{R_{fwd}} + \sum_{S_{B2}} \frac{b_{i,j,k}}{\min[R_{fwd}, R_{agg}, R_{cla}]} + \sum_{S_{B3}} \frac{b_{i,j,k}}{\min[R_{agg}, R_{cla}]} + \sum_{S_{B4}} \frac{b_{i,j,k}}{\min[R_{agg}, R_{cla}, R_{sch}]} + \sum_{S_{B5}} \frac{b_{i,j,k}}{\min[R_{agg}, R_{cla}]} + \sum_{S_{B5}} \frac{b'_{i,j,k}}{R_{sch}}. \tag{6.22}$$
It can be observed that the SF-N architecture is less aggregated and possesses fewer
bottlenecks. Yet, the expression is more complex. The conditions which hold for the different
sets introduced in (6.21) and (6.22) are evident from Figure 6.8. Thus,
SR1 = {[i, j, k]; i = I} − [I, J, K],
SR2 = {[i, j, k]; j = J} − [I, J, K],
SR3 = {[i, j, k]; j = J} − [I, J, K],
SR4 = {[i, j, k]; j = J, k = K} − [I, J, K],
SR′4 = {[i, j, k]; j = J, k ≠ K},
SB1 = {[i, j, k]; i = I, j ≠ J},
SB2 = {[i, j, k]; i = I, j = J, k ≠ K},
SB3 = {[i, j, k]; i ≠ I, j = J, k ≠ K},
SB4 = {[i, j, k]; i ≠ I, j = J, k = K},
SB5 = {[i, j, k]; j = J, k ≠ K}.
As with the SF-B and OI-FCFS architecture, the resulting formulas are based on
the idea that the traffic forwarded to the OI J with priority k ≠ K competes for the
bandwidth for the second time, i.e., it aggregates for the second time. Dropping the
class-based output queuing would result in the following sets, with (6.21) and
(6.22) unchanged:
SR1 = {[i, j, k]; i = I} − [I, J, K],
SR2 = {[i, j, k]; j = J} − [I, J, K],
SR3 = {[i, j, k]; j = J} − [I, J, K],
SR4 = {[i, j, k]; j = J} − [I, J, K],
SR′4 = ∅,
SB1 = {[i, j, k]; i = I, j ≠ J},
SB2 = ∅,
SB3 = ∅,
SB4 = {[i, j, k]; i ≠ I, j = J},
SB5 = ∅.
SF-N and OI-PQ Architecture
By applying the same consideration as with the SF-B and OI-PQ architecture, the
resulting parameters of the service curve $\beta^{F_{I,J,K}}_{p2p}$ are

$$R^{F_{I,J,K}}_{p2p} = \min\Bigl[R_{fwd} - \sum_{S_{R1}} r_{i,j,k},\; R_{agg} - \sum_{S_{R2}} r_{i,j,k},\; R_{cla} - \sum_{S_{R3}} r_{i,j,k},\; R_{sch} - \sum_{S_{R4}} r_{i,j,k} - \sum_{S_{R'4}} r'_{i,j,k}\Bigr], \tag{6.23}$$

$$T^{F_{I,J,K}}_{p2p} = T_{SF} + T_{OI} + \Bigl\{\frac{l^L_{max}}{R_{OI}}\Bigr\}_{K < K_{max}} + \sum_{S_{B1}} \frac{b_{i,j,k}}{R_{fwd}} + \sum_{S_{B2}} \frac{b_{i,j,k}}{\min[R_{fwd}, R_{agg}, R_{cla}]} + \sum_{S_{B3}} \frac{b_{i,j,k}}{\min[R_{agg}, R_{cla}]} + \sum_{S_{B4}} \frac{b_{i,j,k}}{\min[R_{agg}, R_{cla}, R_{sch}]} + \sum_{S_{B5}} \frac{b_{i,j,k}}{\min[R_{agg}, R_{cla}]} + \sum_{S_{B5}} \frac{b'_{i,j,k}}{R_{sch}}. \tag{6.24}$$
Finally, the sets respect the arbitration rules of the PQ at the OI:
SR1 = {[i, j, k]; i = I} − [I, J, K],
SR2 = {[i, j, k]; j = J} − [I, J, K],
SR3 = {[i, j, k]; j = J} − [I, J, K],
SR4 = {[i, j, k]; j = J, k = K} − [I, J, K],
SR′4 = {[i, j, k]; j = J, k < K},
SB1 = {[i, j, k]; i = I, j ≠ J},
SB2 = {[i, j, k]; i = I, j = J, k ≠ K},
SB3 = {[i, j, k]; i ≠ I, j = J, k ≠ K},
SB4 = {[i, j, k]; i ≠ I, j = J, k = K},
SB5 = {[i, j, k]; j = J, k < K}.
Note that the burst caused by aggregation of the flows entering the device via the
same II is never accounted for within the model. The reason is that the aggregation takes
place at the preceding device. If there is no networking device preceding the modelled
device, it is necessary to add a dummy multiplexer $\lambda_R$, where R corresponds to the
capacity of the II. This is not the case with PBOO.
6.2.3 PMOO/PBOO vs. EPBOO Comparison
This section has shown how to infer the port-to-port service curve of the modelled
networking device and its parameters. The basic assumption is that the service curve is
of the rate-latency type. The methods can be compared from several points of view:
Resulting Form. PBOO results in a form which needs further manipulation to obtain
a closed-form solution. In contrast, EPBOO provides the resulting rate-latency
parameters directly. Hence, EPBOO requires less computational effort.
Theoretical Bound Tightness. EPBOO accounts for the burst caused by aggregation
once and only once for each aggregated flow. PBOO accounts for aggregation
at every common hop. If PMOO can be used, the number of aggregations accounted
for decreases. As a result, the bounds provided by EPBOO are tighter, as shown
in [52].
Algorithmisation Potential. The decision to take advantage of PMOO needs additional
knowledge. Moreover, PBOO requires an extensive amount of backward calculations
of arrival curves in case of cascaded aggregations. EPBOO requires only
calculations related to the common path of the aggregating flows and per-hop
information. My experience with both methods leads me to the conclusion that EPBOO
is easier to implement as an algorithm given the networking device architecture.
Modularity. The concatenation used with the PBOO/PMOO approach makes
the method more modular. EPBOO requires the flow's end-to-end information. Hence,
the calculations related to device parts or a complete device cannot be reused for
other flows, contrary to the PBOO/PMOO approach.
Generality. The literature provides a broader range of stronger results on the PBOO/PMOO
approach than on the EPBOO approach, which makes the PBOO/PMOO approach
a more universal tool.
6.3 Identification of the Model Parameters
If suitable parameters of a networking device are not available from the
device manufacturer, it is necessary to perform parameter identification in order to have
a valid model of the networking device. Nevertheless, identification of the model
parameters is the most problematic part of the networking device modelling. The principal
problem is that any model based on network calculus operates with bounds, i.e., the
worst-case situations. On the other hand, any empirical analysis is subject to randomness,
and thus there is no guarantee that the worst-case situation occurs during the test
life-cycle. For that reason, one has to treat the worst-case model as valid at a given level
of trustworthiness. Moreover, the following complications persist:
Environmental Stability. This is an issue with higher loads of the networking devices and
computers. In case of increased temperature, the devices lower the networking
processor unit (NPU) power and are able to process less data, which causes packet
drops.
Protocol Stability. It may happen that the Address Resolution Protocol (ARP) records
are renewed within the test run or that the router cache is renewed after
some time. These actions cause outlier latencies which should be compensated. A
certain remedy is to take into account a latency percentile at the level of 99.99% to
eliminate them. However, it can never be ruled out that a discarded outlying latency
is exactly the relevant one.
Knowledge of Traffic Shapes. The traffic generated as additional load can have a random
burst level unless the traffic is shaped by a professional tool. This is a problem
in traditional OS-based platforms unless a real-time kernel is used.
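The percentile remedy mentioned under Protocol Stability can be sketched as follows. The nearest-rank definition of a percentile is an assumption of this example, as the text does not state which definition is used; the function name and the sample data are hypothetical.

```python
import math


def percentile(latencies, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    ordered = sorted(latencies)
    rank = max(1, math.ceil(p / 100.0 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical run: 19999 regular samples of 1.0 ms and one 100 ms outlier,
# e.g. caused by an ARP renewal during the test.
samples = [1.0] * 19999 + [100.0]
robust_max = percentile(samples, 99.99)
```

Note that with fewer than ten thousand samples the 99.99% percentile under this definition coincides with the maximum, so the remedy only takes effect for sufficiently long test runs.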
The most limiting factor is that one can rely solely on the model structure and the
input/output temporal characteristics of the traffic and traffic delay. Consequently, not all
parameters of the previously inferred service elements may be identified. On the other
hand, the precise distribution of the contributions of the service elements to the overall
port-to-port delay is not important. For instance, the implicit latencies caused by the SF
and the OI are $T_{SF}$ and $T_{OI}$, respectively, and the sum of these parameters appears in the
port-to-port service curve. It is not important whether $T_{SF} = 50\,\mu s$ and $T_{OI} = 100\,\mu s$, or
vice versa; it is important to know that $T_{SF} + T_{OI} = 150\,\mu s$, as each flow passes both
service elements. However, it will be shown further that in case of multiplexors, this
assumption does not apply generally.
6.3.1 Switch-Fabric Parameters
It makes sense to identify parameters of a SF if it is blocking. Hence, SF-B will be
considered in this section. For SF-N, a different method would have to be proposed.
Figure 6.9 shows two flows used for the identification. FT is the flow generated and
captured in the TestQoS application. FL is additional loading flow generated by the
UDPFlooder or any other application with controllable traffic shape.
[Figure 6.9: Flows during SF-B identification]
The only common part of the path is the SF. The end-to-end service curve offered to
the flow $F_T$ under the PBOO assumption is

$$\beta^{F_T}_{p2p} = \beta^{F_T}_{SF} \otimes \beta^{F_T}_{OI}. \tag{6.25}$$
It holds from (6.2) that $R_{SF} = \min[R_{agg}, R_{fwd}]$. However, these parameters cannot
be distinguished by identification. Hence, $R_{SF}$ will be used. As the SF is shared by both
flows, the service curve offered to $F_T$ is

$$\beta^{F_T}_{SF} = [\beta_{SF} - \alpha_L]^+ = (R_{SF} - r_L) \left[ t - \frac{R_{SF} T_{SF} + b_L}{R_{SF} - r_L} \right]^+ . \tag{6.26}$$
Similarly, the OI is used only by $F_T$. Hence,

$$\beta^{F_T}_{OI} = \beta_{OI} = R_{OI} [t - T_{OI}]^+ . \tag{6.27}$$

Inserting (6.26) and (6.27) into (6.25) yields

$$\beta^{F_T}_{p2p} = \min[(R_{SF} - r_L), R_{OI}] \cdot \left[ t - \frac{R_{SF} T_{SF} + b_L}{R_{SF} - r_L} - T_{OI} \right]^+ . \tag{6.28}$$

The bound of the virtual latency $\hat{d}_{F_T}$ from Theorem 2.5 is

$$\hat{d}_{F_T} = T^{F_T}_{p2p} + \frac{b_T}{R^{F_T}_{p2p}}. \tag{6.29}$$

Finally, inserting (6.28) into (6.29) yields

$$\hat{d}_{F_T} = \frac{R_{SF} T_{SF} + b_L}{R_{SF} - r_L} + T_{OI} + \frac{b_T}{\min[(R_{SF} - r_L), R_{OI}]}. \tag{6.30}$$
(6.29) is of importance as the port-to-port latency is the only measurable latency. As
the most important question is what the influence of flow aggregation is, the dependence
$\hat{d}_{F_T} = f(F_L)$ is observed and used for identification.
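The dependence of the delay bound on the load rate given by (6.30) can be evaluated directly. The sketch below is a plain transcription of (6.30) under the PBOO assumption; the function name, the argument order and the numeric values are hypothetical.

```python
def delay_bound(r_L, b_L, b_T, R_SF, T_SF, R_OI, T_OI):
    """Delay bound of the test flow from (6.30); infinite once r_L reaches R_SF."""
    if r_L >= R_SF:
        return float("inf")
    left_over_rate = R_SF - r_L
    return ((R_SF * T_SF + b_L) / left_over_rate
            + T_OI
            + b_T / min(left_over_rate, R_OI))


# Hypothetical device: 1 Gb/s fabric with 50 us latency, 100 Mb/s OI with 100 us.
params = dict(b_T=12e3, R_SF=1e9, T_SF=50e-6, R_OI=100e6, T_OI=100e-6)
unloaded = delay_bound(r_L=0.0, b_L=0.0, **params)
loaded = [delay_bound(r_L=r, b_L=8e3, **params) for r in (1e8, 5e8, 9e8)]
```

Without the loading flow, the function returns the sum of both fixed latencies and the term caused by the burst of the test flow at the OI rate, and the bound grows hyperbolically as the load rate approaches the fabric rate.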
[Figure 6.10: SF-B parameter identification. The delay bound $\hat{d}_{F_T}(r_L)$ is plotted against $r_L$ between 0 and $R_{SF}$; the value at $r_L = 0$ is $T_{SF} + T_{OI} + b_T / R_{OI}$.]
Figure 6.10 shows the dependence $\hat{d}_{F_T} = f(r_L)$. It follows from (6.29) that setting
$r_L$ to zero results in

$$\hat{d}_{F_T}(r_L = 0) = T_{SF} + T_{OI} + \frac{b_T}{R_{OI}}.$$
The $b_T$ parameter is known, as the flow is generated in TestQoS. $\hat{d}_{F_T}(r_L = 0)$ is the
delay measured on an offloaded device. Provided that $R_{OI}$ is known, $T_{SF} + T_{OI}$ can
be inferred directly.
Furthermore, from the shape of the dependence $\hat{d}_{F_T} = f(r_L)$, $R_{SF}$ is the asymptote
of the dependence. In a SF with an infinite buffer, $\lim_{r_L \to R_{SF}} \hat{d}_{F_T} = \infty$. However, in a
system with a finite buffer, the delay bound saturates and remains at a level corresponding
to the buffer length according to the considerations in Section 5.2. The saturation is
accompanied by the corresponding loss.
Hence, provided that the SF rate limit can be reached experimentally, the parameter
can be revealed directly. However, if suitable hardware equipment is lacking, flows reaching
the device's limits are difficult to generate. In such a case, an extrapolation method can be
employed. The following procedure is proposed to find $R_{SF}$. Moreover, as a result,
$T_{SF} + T_{OI}$ can be verified or refined.
1. In a topology with the flow setup introduced in Figure 6.10, $\hat{d}_{F_T}(r_L)$ must be measured
for a number of values of $r_L$. The more measurements, the finer the extrapolation.
2. Provided that $T_{OI} \ll T_{SF}$ and $b_T$ is sufficiently small, (6.30) can be expressed in
the following simpler form

$$\hat{d}_{F_T} \approx \frac{R_{SF}(T_{SF} + T_{OI}) + b_L + b_T}{R_{SF} - r_L} = \frac{A}{B + r_L}. \tag{6.31}$$

The first assumption is reachable as forwarding is supposed to impose a higher fixed
latency than the OI in the offloaded state ($r_T \ll r_L$). The imposed error $\xi_{T_{OI}}(r_L)$ given
by the difference between (6.30) and (6.31) is

$$\xi_{T_{OI}}(r_L) = \frac{R_{SF} T_{OI} - T_{OI}(R_{SF} - r_L)}{R_{SF} - r_L} = T_{OI} \frac{r_L}{R_{SF} - r_L}.$$

Consequently, $\lim_{r_L \to R_{SF}} \xi_{T_{OI}} = \infty$. Nevertheless, this method is intended for cases in
which $r_L \ll R_{SF}$. In other cases, a different regression function has to be employed.
The second assumption causes the error $\xi_{b_T}(r_L)$ in the dependence if $r_L < R_{SF} - R_{OI}$:

$$\xi_{b_T}(r_L) = \left\{ \frac{b_T}{R_{SF} - R_{OI} - r_L} \right\}_{r_L < R_{SF} - R_{OI}}.$$

As $b_T$ can be controlled by the TestQoS application, it can be minimised, e.g., by
introducing short packets.
3. Hyperbolic regression is applied, which provides the required parameters A and B
in (6.31). Application of hyperbolic regression is described in [49].
4. Finally,

$$R_{SF} = -B, \qquad T_{SF} + T_{OI} = \frac{A + b_T + b_L}{B}. \tag{6.32}$$
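The extrapolation procedure for the SF parameters can be sketched in a few lines. The regression below is an assumption of this example and does not reproduce the method of [49]: it linearises the model of (6.31) as 1/d = r_L/A + B/A and applies ordinary least squares, which weights the samples differently from a direct non-linear fit. The function names and the synthetic data are hypothetical.

```python
def fit_hyperbola(rates, delays):
    """Fit d = A / (B + r) via least squares on the linearised form 1/d = r/A + B/A."""
    n = len(rates)
    inv = [1.0 / d for d in delays]
    mean_r = sum(rates) / n
    mean_inv = sum(inv) / n
    s_rr = sum((r - mean_r) ** 2 for r in rates)
    s_ry = sum((r - mean_r) * (y - mean_inv) for r, y in zip(rates, inv))
    slope = s_ry / s_rr                    # equals 1 / A
    intercept = mean_inv - slope * mean_r  # equals B / A
    return 1.0 / slope, intercept / slope


def identify_sf(rates, delays, b_L, b_T):
    """Apply (6.32): R_SF = -B and T_SF + T_OI = (A + b_T + b_L) / B."""
    A, B = fit_hyperbola(rates, delays)
    return -B, (A + b_T + b_L) / B


# Synthetic measurements generated from (6.31) with known parameters.
R_SF, T_sum, b_L, b_T = 1e9, 150e-6, 8e3, 512.0
rates = [k * 1e8 for k in range(1, 9)]
delays = [(R_SF * T_sum + b_L + b_T) / (R_SF - r) for r in rates]
R_est, T_est = identify_sf(rates, delays, b_L, b_T)
```

On noise-free synthetic data the known parameters are recovered; with measured data the quality of the fit depends on how close the load rates come to the fabric rate.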
6.3.2 Outgoing Interface Parameters
Figure 6.11 shows the two flows used for the identification. FT is the flow generated
and captured in the TestQoS application. FL is additional loading flow generated by the
UDPFlooder or any other application with controllable traffic shape.
[Figure 6.11: Flows during OI-FCFS identification]
The $T_{OI}$ parameter has been made a part of the common latency parameter within the
SF identification and needs no additional treatment. The $R_{sch}$ parameter can be identified
directly or is known explicitly from the physical capacity of the OI.
According to Section 5.1, where the RVL function was introduced, the scheduler at the
OI has a bimodal latency parameter: a fixed part $T_{sch}$ and an additional part $T_{sch,con}$
caused by congestion. Since the change to a different mode is triggered at
the OI, it is not possible to incorporate $T_{sch,con}$ into the SF. Hence, let $T_{sch}$ be a part of
$T_{OI}$ and let the congestion overhead $T_{sch,con}$ be a part of the OI. Finally, the task of the
OI identification is to find $T_{sch,con}$.
Supposing that $r_L$ is incremented during the identification by $\Delta$, then $r_L = i\Delta$,
$i = 1, 2, \dots, I$. Consequently,

$$T_{sch,con} \approx \hat{d}(R_{OI} + \Delta) - \hat{d}(R_{OI} - \Delta). \tag{6.33}$$

Under the PBOO assumption, the error of the approximation $\xi$ is

$$\xi = (R_{SF} T_{SF} + b_L + b_T) \cdot \left( \frac{1}{R_{SF} - R_{OI} - \Delta} - \frac{1}{R_{SF} - R_{OI} + \Delta} \right).$$

The proof is left to the kind reader.
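The estimate in (6.33) amounts to a difference of two neighbouring measurements around the OI capacity. The sketch below assumes that the delays are stored in a list ordered by the load rate, with the first element measured at a load rate of one increment, and that the OI capacity is a multiple of the increment. The names and numbers are hypothetical.

```python
def congestion_latency(delays, delta, R_OI):
    """Estimate T_sch,con following (6.33).

    delays[i - 1] holds the delay measured at the load rate i * delta.
    """
    m = round(R_OI / delta)        # index i of the step at which r_L equals R_OI
    above = delays[m]              # delay at (m + 1) * delta
    below = delays[m - 2]          # delay at (m - 1) * delta
    return above - below


# Hypothetical run: 10 Mb/s steps, 100 Mb/s OI, delay jump of 200 us at congestion.
delays = [1e-4] * 10 + [3e-4] * 3
T_sch_con = congestion_latency(delays, delta=10e6, R_OI=100e6)
```

A smaller increment reduces the approximation error, at the cost of more measurement runs.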
Finally, $R_{cla}$ must be identified. This step applies only to OIs with enabled congestion
management using separate queuing; with FIFO, the classifier does not exist.
Hence, $F_L$ with a lower priority than that of $F_T$ must be used. Consequently, the
only shared service elements are the SF and the classifier, and the port-to-port service
curve parameters under the PBOO assumption are:

$$R^{F_T}_{p2p} = \min[R_{SF} - r_L, R_{cla} - r_L, R_{sch}], \tag{6.34}$$

$$T^{F_T}_{p2p} = \frac{R_{SF} T_{SF} + b_L}{R_{SF} - r_L} + \frac{R_{cla} T_{cla} + b_L + r_L T_{SF}}{R_{cla} - r_L} + \frac{l_{max}}{R_{sch}} + T_{con} \cdot 1_{\{r_T + R_L \ge R_{sch}\}}.$$
Application of this procedure will be given in Section 7.2. As it proved more efficient
for the identification to employ rates $r_L > R_{sch}$, and because it proved that $R_{SF} > R_{cla}$,
let $R^{F_T}_{p2p} = R_{cla} - r_L$. Moreover, in this particular case, $R_{SF}$ and $R_{cla}$ were not
significantly different, as observed in test cases RTR.SF and RTR.OPC. Finally, still assuming
$T_{OI} \ll T_{SF}$, the maximum latency can be expected in the form:

$$\hat{d}_{F_T} \approx \frac{R_{SF}(T_{SF} + T_{OI}) + 2 b_L + b_T + r_L (T_{SF} + T_{OI})}{R_{SF} - r_L} + \frac{l_{max}}{R_{sch}} + T_{con} \cdot 1_{\{r_T + R_L \ge R_{sch}\}}. \tag{6.35}$$

Hyperbolic regression cannot be applied directly to this function $\hat{d}_{F_T}(r_L)$; a decomposition
of this linear rational function had to be employed. The resulting expression used
for the identification is

$$\hat{d}_{F_T}(r_L) = \frac{A}{B + r_L} + C, \quad \text{where} \quad A = -2 b_L - b_T, \quad B = -R_{cla}, \quad C = \frac{l_{max}}{R_{sch}} + T_{con} \cdot 1_{\{r_T + R_L \ge R_{sch}\}} - (T_{SF} + T_{OI}). \tag{6.36}$$
The approach can be summarised by the following steps:
1. $\hat{d}_{F_T}(r_L)$ is measured as with the SF parameter identification. The measurements
should focus on the range $R_{sch} \le r_L \le R_{cla}$.
2. $R_{sch}$ and $T_{sch,con}$ must be known from the previous identification step.
3. $\hat{d}'_{F_T}(r_L) = \hat{d}_{F_T}(r_L) - C$ must be calculated for all $r_L$. All parameters of C are
known from the previous points.
4. Hyperbolic regression is applied to $\hat{d}'_{F_T}(r_L)$; consequently, A and B are obtained.
5. Finally, $R_{cla} = -B$, and it can be verified that $A \approx -2 b_L - b_T$ with a reasonable
tolerance.
6.3.3 Conclusion
It may happen that the empirical analysis provides results which do not correspond to
any typical behaviour analysable by network calculus. Fine implementation details
may dominate the intrinsic mechanisms in their influence on
the port-to-port behaviour and thus make the worst-case analysis impossible without
detailed implementation knowledge. Such efforts are out of the scope of this work.
Chapter 7
Validation of Models of Networking Devices
This chapter provides validation of the derived latency models of the networking devices.
In the first validation case, a parameterised model of the maximum latency of a switch is
inferred and the behaviour is validated on a topology consisting of two switches. The
second validation case is dedicated to a router. A parameterised model of the maximum
latency of the router is inferred and the behaviour is validated on a topology consisting
of one switch and one router.
7.1 HP ProCurve 1800-8G Switch
The switch is a configurable Hewlett-Packard 8-port switch with full-duplex interfaces
with a link capacity of 1 Gb·s⁻¹. The switch is intended for small-office-home-office
(SOHO) applications. Despite its compact design, the switch is extremely time efficient,
as can be seen in the switch-related measurements in Section 4.2. The switch is based
on the VSC7388 SparX-G8™ system on chip (SoC). The chip accommodates functional
blocks providing the run-time operations and an 8051 processor for configuration and
management. The functional blocks comprise 8 ports representing the I/O interfaces,
an interconnect bus with an arbiter representing the SF, and the CPU representing
administration and management functionality.
Since the capacity of the SF equals the sum of the throughputs of the interfaces, the SF
is over-provisioned and can thus be considered non-blocking (modelled as SF-N). This
is also documented by the SF.FL test case. Moreover, in the regarded topologies, the
switch interfaces are operated in full-duplex mode with a link capacity of 100 Mb·s⁻¹,
thus reaching only a fraction of the maximum throughput.
The I/O blocks accommodate an advanced QoS architecture at both input and output,
which differentiates its QoS architecture from that of many mid-range routers. Owing to
the non-disclosure agreement (NDA) under which these details were obtained, I must
refrain from publishing technical details. Although a description of the architecture
by means of network calculus would be possible, the trade-off between precision
improvement and complexity favours a simple description. Hence, let us ignore deep
architectural details and limit our a priori information to that necessary to model the
switch using the SF-N/OI-PQ model architecture. The PQ assumption for the OI is valid
if $r_T \ll r_L$ and $r_T \ll R_{OI}$, which is the condition under which the measurements were
made.¹ The reason for adopting PQ is that a round-robin mechanism is deployed in the
switch, which automatically favours the low-volume flows.

¹ This assumption is applicable even in a more general sense in industrial automation where the flows
with the highest criticality have low volumes.
7.1.1 Switch Parametrisation
Based on the aforementioned characteristics, the switch is an SF-N/OI-PQ device with
service curve parameters according to (6.11). The product documentation claims that
the device throughput is equivalent to the sum of the link capacities of the interfaces, so
that the fully loaded switch can handle all traffic at wire speed. Hence, it is assumed
that $R_{SF} = R_{fwd} = R_{agg} = 8$ Gb·s⁻¹.
The distribution of the fixed latencies given by the $T$ parameters along the device
cannot be inferred. Hence, for simplicity, the packet latency of the offloaded switch is
assigned to the SF, i.e., $T_{SF} = 90.5$ µs.
The contribution of the OI to the total packet latency in the case of a congested OI is
given by Figure 4.6 and Table A.4, i.e.
$$T_{sch,con} = \hat{d}(r_L = 110\ \mathrm{Mb \cdot s^{-1}}) - \hat{d}(r_L = 80\ \mathrm{Mb \cdot s^{-1}}) = 1.858\ \mathrm{ms} - 300\ \mathrm{\mu s} = 1.558\ \mathrm{ms}. \tag{7.1}$$
As the latency in the SW.OPC test case does not experience any further saturation in the
observed range of loading after congestion of the OI, it can be assumed that
$\beta_{cla} = \infty$.
Finally, the OI link rates configured at the switch are $R_{OI} = 100$ Mb·s⁻¹.
7.1.2 Port-To-Port Service Curve
Let us first focus on the behaviour of a single switch. For the model validation, a scenario
with the switch in the topology shown in Figure 7.1 was chosen. The test flow $F_T$ is
disturbed by two additional flows: $F_{L1}$, used for loading the OI, passes along the same
path as $F_T$ to the same OI, and $F_{L2}$, used for SF loading, enters the switch through a
different II, traverses the SF and then continues to a different OI.
PBOO/PMOO Port-To-Port Service Curve
According to the PBOO/PMOO principle, the port-to-port service curve of the switch
is (6.11) and the flows belong to the sets as follows:
$$S_1 = \emptyset, \quad S_2 = \{\alpha^1_{L1}\}, \quad S_3 = \{\alpha^2_{L1}\}, \quad S_4 = \emptyset. \tag{7.2}$$
Note that the flow $F_{L2}$ has no influence owing to the SF-N architecture. As $\beta_{cla} = \infty$,
$S_3$ is not applicable. $S_4$ is considered empty because of the condition $r_T \ll r_{L1}$, under
which the round-robin principle guarantees strict priority. Taking this distribution of
additional flows into account and using Theorem 2.1, the port-to-port service curve is
calculated according to
$$\beta^{F_T}_{p2p} = \beta^{F_T}_{SF} \otimes \beta^{F_T}_{OI} = R^{F_T}_{p2p}\left[t - T^{F_T}_{p2p}\right]^+, \tag{7.3}$$
$$\beta^{F_T}_{SF} = \beta_{fwd} \otimes \left[\beta_{agg} - \alpha^0_{L1}\right]^+,$$
$$\beta^{F_T}_{OI} = \left[\beta_{cla} - \alpha^1_{L1}\right]^+ \otimes \left[\beta_{sch} - l_{max}\right]^+.$$
For the sake of simplicity, $\beta_{fwd}$ is absorbed by $\beta_{agg}$, as it does not add value and
cannot be investigated separately. Hence, the service curve of the SF can be inferred
directly, provided that $\alpha^0_{L1} = r_{L1}t + b_{L1}$:
$$\beta^{F_T}_{SF} = (R_{SF} - r_{L1})\left[t - \frac{R_{SF}T_{SF} + b_{L1}}{R_{SF} - r_{L1}}\right]^+. \tag{7.4}$$
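The step from the rate-latency curve and the affine arrival curve to (7.4) can be checked numerically. The following sketch is illustrative only: the helper name `leftover_rate_latency` and the operating point $r_{L1} = 100$ Mb·s⁻¹ with $b_{L1} = 12000$ b are assumptions chosen for the example. Rates are expressed in Mb·s⁻¹ and latencies in µs, so that their product is in bits.

```python
def leftover_rate_latency(rate, latency, cross_rate, cross_burst):
    """Return (rate, latency) of [R(t - T)^+ - (r*t + b)]^+.

    For r < R this is again a rate-latency curve:
    R(t - T) - r*t - b = (R - r) * (t - (R*T + b) / (R - r)).
    """
    if cross_rate >= rate:
        raise ValueError("cross-traffic rate must stay below the service rate")
    residual_rate = rate - cross_rate
    residual_latency = (rate * latency + cross_burst) / residual_rate
    return residual_rate, residual_latency


# Switch fabric as parametrised in the text: R_SF = 8000 Mb/s, T_SF = 90.5 us.
# Cross flow F_L1 (assumed operating point): r_L1 = 100 Mb/s, b_L1 = 12000 b.
sf_rate, sf_latency = leftover_rate_latency(8000.0, 90.5, 100.0, 12000.0)
print(f"leftover SF curve: R = {sf_rate:.0f} Mb/s, T = {sf_latency:.2f} us")
```

Because $R_{SF}$ is far larger than any load used in the measurements, the leftover latency stays close to $T_{SF}$, which matches the observation that the SF of the switch is effectively non-blocking.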
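As $\beta_{cla} = \infty$, $\left[\beta^{F_T}_{cla} - \alpha^1_{L1}\right]^+ = \infty$ as well. As $\infty$ is a neutral element of the
convolution operator, $\beta^{F_T}_{OI} = \left[\beta_{sch} - l_{max}\right]^+$. Consequently, $\alpha^1_{L1}$ does not have to be calculated.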
Finally, the service curve of the scheduler must be inferred. The service curve is of
the RVL type according to (5.1). Moreover, $F_T$ is considered to be of the highest
priority and does not compete with any other traffic. Thus,
$$\beta^{F_T}_{sch} = R_{sch}\left[t - T_{sch,con} \cdot 1_{\{r_{L1} + r_T \ge R_{sch}\}} + \frac{l_{max}}{R_{sch}}\right]^+. \tag{7.5}$$
Finally, by inserting (7.4) and (7.5) into (7.3), the parameters of the port-to-port RVL
service curve are
$$R^{F_T}_{p2p} = \min\left[R_{SF} - r_{L1}, R_{sch}\right],$$
$$T^{F_T}_{p2p} = \frac{R_{SF}T_{SF} + b_{L1}}{R_{SF} - r_{L1}} + T_{sch,con} \cdot 1_{\{r_{L1} + r_T \ge R_{sch}\}} + \frac{l_{max}}{R_{sch}}. \tag{7.6}$$
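The structure of (7.6) follows from the standard property that the min-plus convolution of rate-latency curves is a rate-latency curve with the minimum of the rates and the sum of the latencies. A minimal sketch, assuming the switch values of this section and an illustrative load of $r_{L1} = 100$ Mb·s⁻¹ with $b_{L1} = 12000$ b (the helper name `convolve_rate_latency` is hypothetical):

```python
def convolve_rate_latency(curves):
    """Min-plus convolution of rate-latency curves given as (rate, latency)."""
    rates, latencies = zip(*curves)
    return min(rates), sum(latencies)


# Units: rates in Mb/s, latencies in us, bursts and packet lengths in bits.
R_SF, T_SF = 8000.0, 90.5
R_SCH, T_SCH_CON, L_MAX = 100.0, 1558.0, 12000.0
r_T, r_L1, b_L1 = 0.145, 100.0, 12000.0  # assumed operating point

# Leftover SF curve of the test flow, cf. (7.4).
sf_curve = (R_SF - r_L1, (R_SF * T_SF + b_L1) / (R_SF - r_L1))

# OI curve: the congestion step applies only if the OI is congested.
congested = (r_L1 + r_T) >= R_SCH
oi_latency = (T_SCH_CON if congested else 0.0) + L_MAX / R_SCH
oi_curve = (R_SCH, oi_latency)

r_p2p, t_p2p = convolve_rate_latency([sf_curve, oi_curve])
print(f"R_p2p = {r_p2p:.0f} Mb/s, T_p2p = {t_p2p:.1f} us")
```

At this operating point the OI is congested, so the port-to-port latency is dominated by $T_{sch,con}$, while the rate is limited by $R_{sch}$.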
EPBOO Port-To-Port Service Curve
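According to the EPBOO principle, the parameters of the port-to-port service curve of the
switch are (6.23) and (6.24), and the flows belong to the sets as follows:
$$S_{R1} = \emptyset, \quad S_{R2} = \{r_{L1}\}, \quad S_{R3} = \{r_{L1}\}, \quad S_{R4} = \emptyset, \quad S_{R'4} = \emptyset,$$
$$S_{B1} = \emptyset, \quad S_{B2} = \emptyset, \quad S_{B3} = \emptyset, \quad S_{B4} = \{b_{L1}\}, \quad S_{B5} = \emptyset. \tag{7.7}$$
Consequently, the parameters of the port-to-port service curve are given by
$$R^{F_T}_{p2p} = \min\left[R_{agg} - r_{L1}, R_{sch}\right],$$
$$T^{F_T}_{p2p} = T_{SF} + T_{sch,con} \cdot 1_{\{r_{L1} + r_T \ge R_{sch}\}} + \frac{l_{max}}{R_{sch}} + \frac{b_{L1}}{R_{sch}}. \tag{7.8}$$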
7.1.3 Model Simulation
Figure 7.1 shows the test topology used for comparing the theoretical and empirical
latencies. Table 7.1 summarises the parameters of the flows traversing the switch under
test and Table 7.2 summarises the parameters of the switch under test. Figure 7.2
depicts the measured latencies and the latencies reconstructed using the PBOO/PMOO
approach.
Table 7.1: Flow parameters

| Parameter | Value | Unit | Note |
|---|---|---|---|
| $r_T$ | 145 | kb·s⁻¹ | limited by TestQoS |
| $r_{L1}$ | 0 - 190 | Mb·s⁻¹ | variable parameter |
| $r_{L2}$ | 100 | Mb·s⁻¹ | fixed parameter |
| $b_T$ | 8192 | b | 1024-byte long packet |
| $b_{L1}$ | 12000/24000 | b | 2 IIs, 1500-byte long packets |
| $b_{L2}$ | 12000 | b | 1 II, 1500-byte long packets |
| $l_{Lmax}$ | 12000 | b | 1500-byte long packet |

Table 7.2: Switch parameters

| Parameter | Value | Unit | Note |
|---|---|---|---|
| $R_{SF}$ | 8000 | Mb·s⁻¹ | |
| $T_{SF}$ | 90.5 | µs | |
| $R_{cla}$ | ∞ | Mb·s⁻¹ | |
| $T_{cla}$ | 0 | µs | |
| $R_{sch}$ | 100 | Mb·s⁻¹ | |
| $T_{sch}$ | 1.558 | ms | $r_T + r_{L1} \ge R_{sch}$ |
As the general approach to inferring the end-to-end bounds from the port-to-port
bounds has not been formalised in this work, the following ad hoc approach is
adopted: the left switch in Figure 7.1 is the device under test, and the right switch is
only used for deaggregation of the flows. The service curves of both switches are the same.
Under the PBOO concatenation, the maximum latency is given by
$$\hat{d}^{F_T}_{e2e}(F_{L1}, F_{L2}) = h\left(\alpha_T,\ \beta^{F_T}_{SW1}(F_{L1}, F_{L2}) \otimes \beta^{F_T}_{SW2}(F_{L1})\right).$$
Both switches can be modelled by a service curve with parameters from (7.6). The
first switch aggregates all flows in the SF, and the aggregate of $F_T$ and $F_{L1}$ advances
to the second switch, in whose SF the flows deaggregate and continue to different OIs;
i.e., the OI of SW2 is not congested. On that account, we can use the PMOO
principle for the SF and OI of the first switch (SW1) and the SF of the right switch (SW2).
Having done so, we can also append the OI of SW2. Hence,
$$\beta^{F_T}_{SW1} \otimes \beta^{F_T}_{SW2} \approx \left[\beta_{SW1,SF} \otimes \beta_{SW1,OI} \otimes \beta_{SW2,SF} - \alpha_{L1}\right]^+ \otimes \beta^{F_T}_{SW2,OI}. \tag{7.9}$$
It can be shown that the PMOO part of (7.9) has the same form as (7.6),
only with the SF latency constant of $2T_{SF}$. Further, the OI of SW2 does not add any
constraint to the final delay bound, as the OI limit is already accounted for
at SW1 and the OI of SW2 is never congested.
Hence, the end-to-end delay of the packets of the flow $F_T$ is given by
$$\hat{d}^{F_T}_{e2e}(F_{L1}, F_{L2}) = \frac{2R_{SF}T_{SF} + b_{L1}}{R_{SF} - r_{L1}} + T_{sch,con} \cdot 1_{\{r_{L1} + r_T \ge R_{sch}\}} + \frac{l_{max}}{R_{sch}} + \frac{b_T}{\min\left[R_{agg} - r_{L1}, R_{sch}\right]}. \tag{7.10}$$
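To illustrate how (7.10) behaves over the load range, the bound can be evaluated with the switch and flow parameters of this section. The sketch below is not the simulation code used for Figure 7.2; the function name, the default arguments and the selected load points are assumptions. It takes $R_{agg} = R_{SF}$ as stated in the parametrisation, $b_{L1} = 24000$ b, and $b_T = 8192$ b for the 1024-byte test packet.

```python
def switch_e2e_delay_us(r_l1, b_l1=24000.0, r_t=0.145, b_t=8192.0,
                        r_sf=8000.0, t_sf=90.5, r_sch=100.0,
                        t_sch_con=1558.0, l_max=12000.0):
    """Evaluate the end-to-end bound (7.10) in microseconds.

    Rates are in Mb/s, latencies in us, bursts and packet lengths in bits.
    R_agg is taken equal to R_SF, as in the switch parametrisation.
    """
    if r_l1 >= r_sf:
        raise ValueError("load must stay below the SF capacity")
    fabric = (2.0 * r_sf * t_sf + b_l1) / (r_sf - r_l1)   # two SFs in series
    step = t_sch_con if (r_l1 + r_t) >= r_sch else 0.0    # congested OI of SW1
    packetisation = l_max / r_sch
    burst = b_t / min(r_sf - r_l1, r_sch)
    return fabric + step + packetisation + burst


for load in (0.0, 50.0, 99.0, 110.0, 190.0):
    print(f"r_L1 = {load:5.1f} Mb/s -> bound = {switch_e2e_delay_us(load):8.1f} us")
```

The bound is almost flat below $R_{sch}$ and jumps by $T_{sch,con}$ once the OI becomes congested, which is the step-wise shape discussed for the two concatenated switches.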
Figure 7.1: Switched topology for validation (nodes: UDP Flooder #1, UDP Flooder #2, TestQoS, Switch Under Test, Deaggregation Switch, UDP Dummy #1, UDP Dummy #2)
Figure 7.2: Reconstructed latency of packets passing two concatenated switches (maximum latency [ms] versus outgoing port load [Mb·s⁻¹]; series: measured latency, theoretical latency)
The bounds seem pessimistic. On the other hand, this approach is valid for a
concatenation of any number of switches, and the theoretical latencies are comparable
with the measured latencies introduced in the test case SW.CON.
7.2 Cisco 2811 Router with HWIC-2FE Module
The Cisco 2811 is a mid-range Cisco router with QoS capability. It houses two full-duplex
ports with a link capacity of 100 Mb·s⁻¹. Using the HWIC-2FE module, the router
is extended to house 4 fully routable ports. The router is a shared-memory router.
Forwarding can be offloaded to a cache using CEF, so that the forwarding overhead is
diminished and the throughput increases, which can be seen in the test case RTR.PLCEF.
However, owing to shared resources, the router has to be considered a router with a
blocking SF, i.e., SF-B. From the qualitative point of view, this decision is supported by
the RTR.FL test case, which clearly demonstrates the blocking nature.
The OIs can be configured independently in terms of scheduling algorithms, buffer
lengths and decision rules assigning the packets to traffic classes based on different
metrics, e.g., DSCP, II, protocol, etc. The chosen scheduling mechanisms are FCFS, PQ,
WFQ, and CBWFQ. In the case of the FCFS configuration, the number of queues is reduced
to one (see Figure 6.6). In the case of PQ, the maximum number of queues is four, and the
queue assignment must be given by configuration. In our case, only high-priority and
low-priority queues are used, with buffers of 20 and 80 packets, respectively. WFQ creates
as many queues as there are active flows on the interface. CBWFQ best represents
the complete DiffServ model: it delivers PQ scheduling to EF traffic without sharing
the bandwidth with any other class, whereas all AFx traffic shares the defined
portion of bandwidth.
One could observe that the throughput of the SF differs depending on the scheduling
mechanism of the OI. This should not be possible, as the mechanisms of the SF and OI
are assumed to be independent. For the black-box analysis, however, this is not
important. What matters is that the rate offered to the flows by the SF depends
on the scheduling mechanism and that an appropriate identification is provided.
7.2.1 Switch Fabric Parametrisation
First, for demonstration, the SF parameters are identified for various configurations of
the scheduling mechanism of the OI. The values of the maximum delay measured by
TestQoS under different loads and scheduling algorithms are given in Table A.6,
Table A.7, and Table A.8. The FCFS scheduler was not evaluated, as its measured
dependence was of little evidential value. While it was possible to reach the SF rate limit
with the PQ and CBWFQ schedulers, it was not possible with the WFQ scheduler.
Identification according to Section 6.3 was carried out for the validation.
From the basic observation, it can be concluded that $T_{OI} \ll T_{SF}$. Consequently,
(6.31) can be used. Using hyperbolic regression, the $A$ and $B$ parameters were inferred.
The parameters $R_{SF}$ and $T_{OI} + T_{SF}$ were obtained from (6.32), where $b_T = 8192$ b and
$b_L = 11200$ b, which respects the packet lengths. Note that the PQ and WFQ test
cases contained two more switches (not shown in the figure) in the test topology. This
difference can be compensated for by subtracting 180.9 µs, which is the latency
caused by two offloaded switches if the packet length is 1024 B (see Table A.1). The
obtained results are summarised in Table 7.3.

Table 7.3: Reconstructed SF parameters

| Scheduler | $A$ [ms] | $B$ [Mb·s⁻¹] | $(T_{OI} + T_{SF})'$ [µs] | $R_{SF}$ [Mb·s⁻¹] | $T_{OI} + T_{SF}$ [µs] | $\hat{d}_{F_T}(0) - b_T/R_{OI}$ [µs] |
|---|---|---|---|---|---|---|
| PQ | -78.098 | -233.1 | 248.42 | 233.1 | 67.52 | 94.48 |
| CBWFQ | -34.354 | -199.4 | 75.00 | 199.4 | 75.00 | 94.93 |
| WFQ | -129.423 | -367.5 | 299.38 | 367.5 | 118.48 | 89.21 |
The fifth column of the table shows that the limit rates differ for each scheduling
mechanism. While it was possible to reach the limit rate with the PQ and CBWFQ
configurations, it was not possible with the WFQ mechanism for lack of hardware.
Nevertheless, the expected rate limit provided by the extrapolation may be valid. Let us
assume that there is an additional mechanism implemented in the router which cannot be
modelled. Even after repeated measurements, the latency did not increase after the load
had crossed 250 Mb·s⁻¹. Speculatively, this may be caused by the WFQ nature
favouring low-volume traffic even in the SF, contrary to PQ and CBWFQ configured
at the OI.
The fourth column of Table 7.3 gives the reconstructed values of $T_{OI} + T_{SF}$, and
the sixth column gives the final values after compensation for the additional switches.
The last column gives the direct value of $T_{OI} + T_{SF}$ inferred from the single latency
value $\hat{d}_{F_T}(r_L = 0)$. I consider the deviation between the reconstructed and single-point
values sufficiently small.
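The compensation for the two additional switches is plain arithmetic and can be reproduced from the values of Table 7.3. The sketch below only re-computes the sixth column from the fourth one; the dictionary layout and variable names are illustrative assumptions.

```python
# Latency of two offloaded switches for 1024 B packets (see Table A.1), in us.
SWITCH_PAIR_LATENCY_US = 180.9

# scheduler: (reconstructed T_OI + T_SF [us], two extra switches in topology?,
#             single-point value d(0) - b_T / R_OI [us]), all from Table 7.3
reconstructed = {
    "PQ": (248.42, True, 94.48),
    "CBWFQ": (75.00, False, 94.93),
    "WFQ": (299.38, True, 89.21),
}

compensated = {}
for name, (raw, extra_switches, single_point) in reconstructed.items():
    value = raw - SWITCH_PAIR_LATENCY_US if extra_switches else raw
    compensated[name] = value
    print(f"{name:6s} compensated = {value:7.2f} us, "
          f"deviation from single point = {value - single_point:+7.2f} us")
```

The compensated values agree with the sixth column of Table 7.3; the printed deviations are the quantities judged in the text above.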
The reconstructed dependencies are shown in Figure 7.3 for PQ and CBWFQ, and
Figure 7.4 for WFQ.
Figure 7.3: Reconstructed SF parameters: PQ (left) and CBWFQ (right) (both panels: maximum latency [ms] versus switch fabric load [Mb·s⁻¹]; series: measured latency, reconstructed latency, rate limit)
7.2.2 Outgoing Interface Parametrisation
The validation case is based on the PQ scheduling mechanisms at the OI. Hence, let us
focus on the PQ scheduling parametrisation.
Figure 7.4: Reconstructed SF parameters: WFQ (maximum latency [ms] versus switch fabric load [Mb·s⁻¹]; series: measured latency, reconstructed latency, rate limit)
Table 7.4: Reconstructed OI parameters

| Scheduler | $C$ [µs] | $A$ [b] | $B$ [Mb·s⁻¹] | $2b_L + b_T$ [b] | $R_{cla}$ [Mb·s⁻¹] |
|---|---|---|---|---|---|
| PQ | 108 | -29418 | -175.86 | 32192 | 175.86 |
$R_{sch} = 100$ Mb·s⁻¹, which is a limit imposed by the physical interface itself.
$T_{sch,con}$ must be inferred. The step values follow from (6.33) and Table A.9, Table A.10,
and Table A.11. For FCFS, the value is not meaningful, as the traffic becomes lossy; this
behaviour represents reaching the rate limit, as in the SF analysis. For the PQ and WFQ
configurations, which favour the high-priority flows, on the other hand, the values are
important. Identification of the OI is based on the topology of test case
RTR.OPC. Hence, from Table A.5 it holds that
$$T_{RTR,sch,con} = \hat{d}(r_L = 100\ \mathrm{Mb \cdot s^{-1}}) - \hat{d}(r_L = 90\ \mathrm{Mb \cdot s^{-1}}) = 8071.0\ \mathrm{\mu s} - 460.5\ \mathrm{\mu s} = 7610.5\ \mathrm{\mu s}.$$
Finally, using the approach introduced in Section 6.3, the rate of the classifier was
identified as $R_{cla} = 175.86$ Mb·s⁻¹. Details are summarised in Table 7.4. The measured
and reconstructed maximum latencies are shown in Figure 7.5. The important point is
the vertical asymptote corresponding to $R_{cla}$.
7.2.3 Port-to-Port Service Curves
Figure 7.6 shows the topology chosen for the model validation. The topology consists of
one switch from the previous validation case and the router configured with the PQ
scheduling mechanism at the OI and with CEF enabled. The high-priority test flow $F_T$
is disturbed by two additional flows: $F_{L1}$, passing along the same path to the same OI
but with no priority indication, and $F_{L2}$, passing over the SF and then continuing to a
different OI. Hence, it is verified that the superposition of two loads can be handled by
the model.

Figure 7.5: Reconstructed OI parameters: PQ (maximum latency [ms] versus outgoing port load [Mb·s⁻¹]; series: measured latency, reconstructed latency, rate limit)
PBOO/PMOO Port-to-Port Service Curve
According to the PBOO/PMOO principle, the port-to-port service curve is given by
(6.8) inferred for the SF-B/OI-PQ device type. The flows belong to the sets as follows:
$$S_1 = \{\alpha^0_{L1}, \alpha^0_{L2}\}, \quad S_2 = \{\alpha^1_{L1}\}, \quad S_3 = \emptyset. \tag{7.11}$$
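Taking this distribution of additional flows into account and using Theorem 2.1, the
port-to-port service curve is calculated according to
$$\beta^{F_T}_{p2p} = \beta^{F_T}_{SF} \otimes \beta^{F_T}_{OI} = R^{F_T}_{p2p}\left[t - T^{F_T}_{p2p}\right]^+, \tag{7.12}$$
$$\beta^{F_T}_{SF} = \left[(\beta_{agg} \otimes \beta_{fwd}) - (\alpha^0_{L1} + \alpha^0_{L2})\right]^+,$$
$$\beta^{F_T}_{OI} = \left[\beta_{cla} - \alpha^1_{L1}\right]^+ \otimes \left[\beta_{sch} - l_{max}\right]^+.$$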
Aggregation and forwarding can be merged in this case; thus the service curve of the
SF can be inferred directly, provided that $\alpha^0_{L1} = r_{L1}t + b_{L1}$ and $\alpha^0_{L2} = r_{L2}t + b_{L2}$:
$$\beta^{F_T}_{SF} = (R_{SF} - r_{L1} - r_{L2})\left[t - \frac{R_{SF}T_{SF} + b_{L1} + b_{L2}}{R_{SF} - r_{L1} - r_{L2}}\right]^+. \tag{7.13}$$
Subsequently, $\alpha^1_{L1}$ must be calculated prior to inferring the service curve of the OI.
It follows immediately from [13, 30] that
$$\alpha^1_{L1} = \alpha^0_{L1} \oslash \beta^{L1}_{SF} = r_{L1}t + b_{L1} + r_{L1}T^{L1}_{SF}. \tag{7.14}$$
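Hence, it is also necessary to compute $\beta^{F_{L1}}_{SF}$ to obtain $T^{L1}_{SF}$:
$$\beta^{F_{L1}}_{SF} = \left[\beta_{SF} - (\alpha^0_T + \alpha^0_{L2})\right]^+ = (R_{SF} - r_T - r_{L2})\left[t - \frac{R_{SF}T_{SF} + b_T + b_{L2}}{R_{SF} - r_T - r_{L2}}\right]^+ \Rightarrow$$
$$T^{L1}_{SF} = \frac{R_{SF}T_{SF} + b_T + b_{L2}}{R_{SF} - r_T - r_{L2}}. \tag{7.15}$$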
Consequently, $\alpha^1_{L1}$ has the following form:
$$\alpha^1_{L1} = r^1_{L1}t + b^1_{L1},$$
$$r^1_{L1} = r_{L1}, \tag{7.16}$$
$$b^1_{L1} = b_{L1} + r_{L1}\frac{R_{SF}T_{SF} + b_T + b_{L2}}{R_{SF} - r_T - r_{L2}}. \tag{7.17}$$
Using Theorem 2.1 and the inferred parameters given by (7.16) and (7.17), the
classifier has the service curve
$$\beta^{F_T}_{cla} = (R_{cla} - r_{L1})\left[t - \frac{R_{cla}T_{cla} + b_{L1} + r_{L1}\dfrac{R_{SF}T_{SF} + b_T + b_{L2}}{R_{SF} - r_T - r_{L2}}}{R_{cla} - r_{L1}}\right]^+. \tag{7.18}$$
Finally, the service curve of the scheduler must be inferred. The service curve is of
the RVL type according to (5.1). Thus,
$$\beta^{F_T}_{sch} = R_{sch}\left[t - T_{sch,con} \cdot 1_{\{r_{L1} + r_T \ge R_{sch}\}} + \frac{l_{max}}{R_{sch}}\right]^+. \tag{7.19}$$
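Finally, by inserting (7.13), (7.18), and (7.19) into (7.12), the parameters of the
port-to-port service curve are
$$R^{F_T}_{p2p} = \min\left[R_{SF} - r_{L1} - r_{L2},\ R_{cla} - r_{L1},\ R_{sch}\right],$$
$$T^{F_T}_{p2p} = \frac{R_{SF}T_{SF} + b_{L1} + b_{L2}}{R_{SF} - r_{L1} - r_{L2}} + \frac{R_{cla}T_{cla} + b_{L1} + r_{L1}\dfrac{R_{SF}T_{SF} + b_T + b_{L2}}{R_{SF} - r_T - r_{L2}}}{R_{cla} - r_{L1}} + T_{sch,con} \cdot 1_{\{r_{L1} + r_T \ge R_{sch}\}} + \frac{l_{max}}{R_{sch}}. \tag{7.20}$$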
EPBOO Port-to-Port Service Curve
According to the EPBOO principle, the port-to-port service curve is given by (6.19) and
(6.20) for the SF-B/OI-PQ architecture. The flows belong to the sets as follows:
$$S_{R1} = \{r_{L1}, r_{L2}\}, \quad S_{R2} = \{r_{L1}\}, \quad S_{R3} = \emptyset, \quad S_{R'3} = \{r'_{L1}\}, \tag{7.21}$$
$$S_{B1} = \{b_{L2}\}, \quad S_{B2} = \emptyset, \quad S_{B3} = \{b_{L1}\}. \tag{7.22}$$
Inserting this distribution into (6.19) and (6.20), and having evaluated the minima
of the rate parameters, the parameters of the port-to-port service curve are
$$R^{F_T}_{p2p} = \min\left[R_{SF} - (r_{L1} + r_{L2}),\ R_{cla} - r_{L1},\ R_{sch}\right],$$
$$T^{F_T}_{p2p} = T_{SF} + T_{sch,con} \cdot 1_{\{r_{L1} + r_T \ge R_{sch}\}} + \frac{l_{Lmax}}{R_{sch}} + \frac{b_{L2}}{R_{SF}} + \frac{b_{L1}}{R_{cla}} + \frac{b'_{L1}}{R_{sch}}. \tag{7.23}$$
7.2.4 Model Simulation
The validated topology incorporates OI congestion provided by the flow FL1 and addi-
tional SF loading of the router provided by the flow FL2 . The parameters of the router
are introduced in Table 7.6. The parameters of the test flow are introduced in Table 7.5.
The parameters of interest of the switch are introduced in Table 7.2.
Figure 7.6: Routed topology for validation (nodes: UDP Flooder #1, UDP Flooder #2, TestQoS, Router Under Test, Deaggregation Switch, UDP Dummy #1, UDP Dummy #2)
So far, the port-to-port service curve of a device has been modelled according to the
PBOO/PMOO approach. Nevertheless, the end-to-end latency of a packet of the flow $F_T$ is
inferred using the TFA approach. There are two reasons for doing so: (i) the computation
simplifies significantly, and (ii) the latency caused by the switch is far lower
than that of the router.
Table 7.5: Flow parameters

| Parameter | Value | Unit | Description |
|---|---|---|---|
| $r_T$ | 145 | kb·s⁻¹ | limited by TestQoS |
| $r_{L1}$ | 0 - 190 | Mb·s⁻¹ | variable parameter |
| $r_{L2}$ | {0, 50, 100} | Mb·s⁻¹ | one value per test |
| $b_T$ | 8192 | b | 1024-byte long packet |
| $b_{L1}$ | 24000 | b | 2 IIs, 1500-byte long packets |
| $b_{L2}$ | 12000 | b | 1 II, 1500-byte long packets |
| $l_{Lmax}$ | 12000 | b | 1500-byte long packet |

Table 7.6: Router parameters

| Parameter | Value | Unit | Description |
|---|---|---|---|
| $R_{SF}$ | 233 | Mb·s⁻¹ | capacity of the SF |
| $T_{SF}$ | 94 | µs | |
| $R_{cla}$ | 175 | Mb·s⁻¹ | |
| $T_{cla}$ | 0 | µs | |
| $R_{sch}$ | 100 | Mb·s⁻¹ | |
| $T_{sch}$ | 7.61 | ms | $r_T + r_{L1} \ge R_{sch}$ |

The resulting dependence shows how the maximum latency of the flow $F_T$ evolves with
the rates of the loading flows $F_{L1}$ and $F_{L2}$. The switch serves only for
de-aggregation and thus behaves in a predictable way, as is evident from Section 7.1. The
maximum latency of the switch is $\hat{d}^{F_T}_{p2p,sw} = 90.5$ µs. According to the TFA, the
end-to-end delay of a packet of the flow $F_T$ is given by the sum of the port-to-port
latencies of the concatenated devices. Hence,
$$\hat{d}^{F_T}_{e2e}(F_{L1}, F_{L2}) = \hat{d}^{F_T}_{RTR}(F_{L1}, F_{L2}) + \hat{d}^{F_T}_{SW} = T^{F_T}_{RTR} + \frac{b_{F_T}}{R^{F_T}_{RTR}} + \hat{d}^{F_T}_{SW},$$
where the parameters are those introduced in (7.20) and the values are given
in Table 7.5 and Table 7.6. Figure 7.7 shows the measured and theoretical latencies for
the mixed topology and the given load distribution under the PBOO/PMOO approach.
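The bound above can be evaluated directly from (7.20) and the TFA sum. The following sketch is not the code used to produce Figure 7.7; the function names, the default arguments and the explicit validity check are assumptions added for illustration. It uses the rounded router parameters, $T_{sch,con} = 7610$ µs, $b_T = 8192$ b for the 1024-byte test packet and the switch latency of 90.5 µs.

```python
def router_p2p(r_l1, r_l2, *, r_t=0.145, b_t=8192.0, b_l1=24000.0,
               b_l2=12000.0, l_max=12000.0, r_sf=233.0, t_sf=94.0,
               r_cla=175.0, t_cla=0.0, r_sch=100.0, t_sch_con=7610.0):
    """Return (rate [Mb/s], latency [us]) of the port-to-port curve (7.20)."""
    # Assumed validity check: the loss-free model needs positive leftover rates.
    if r_l1 + r_l2 >= r_sf or r_l1 >= r_cla or r_t + r_l2 >= r_sf:
        raise ValueError("load outside the validity range of the loss-free model")
    t_l1_sf = (r_sf * t_sf + b_t + b_l2) / (r_sf - r_t - r_l2)        # (7.15)
    sf_term = (r_sf * t_sf + b_l1 + b_l2) / (r_sf - r_l1 - r_l2)      # (7.13)
    cla_term = (r_cla * t_cla + b_l1 + r_l1 * t_l1_sf) / (r_cla - r_l1)  # (7.18)
    step = t_sch_con if (r_l1 + r_t) >= r_sch else 0.0                # (7.19)
    latency = sf_term + cla_term + step + l_max / r_sch
    rate = min(r_sf - r_l1 - r_l2, r_cla - r_l1, r_sch)
    return rate, latency


def e2e_delay_us(r_l1, r_l2, d_sw=90.5, b_t=8192.0):
    """TFA end-to-end bound: router delay bound plus the switch latency."""
    rate, latency = router_p2p(r_l1, r_l2, b_t=b_t)
    return latency + b_t / rate + d_sw


for sf_load in (0.0, 50.0, 100.0):
    for op_load in (50.0, 99.0, 120.0):
        bound_ms = e2e_delay_us(op_load, sf_load) / 1000.0
        print(f"r_L2 = {sf_load:5.1f}, r_L1 = {op_load:5.1f} Mb/s -> {bound_ms:6.2f} ms")
```

The function raises an error once a leftover rate would become non-positive, which corresponds to the point beyond which the loss-free model no longer applies.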
The theoretical latencies computed from the inferred parameters represent the upper
bound of the measured latencies well enough. In the cases with an SF load
of 0 and 50 Mb·s⁻¹, the latency growth towards saturation is caused by the load
approaching the maximum capacity of the classifier. In the case with an SF load of
100 Mb·s⁻¹, the latency growth towards saturation is caused by the load approaching the
SF capacity.
The model ceases to be valid once the limit is reached. In such a case, some of the
traffic starts to be dropped and the latency saturates. Modelling such
behaviour requires additional effort to extend the model, as described in Section 5.2.
Figure 7.8 shows the measured and reconstructed latencies for the mixed topology
and the given load distribution. The maximum latencies are modelled according to the
EPBOO approach. It can be observed that the EPBOO approach does not represent
the real behaviour satisfactorily. Most probably, the conditions under which EPBOO
is valid do not hold in this case.
Figure 7.7: Reconstructed latency of packets passing a router with compound loading (PBOO) (maximum latency [ms] versus outgoing port load [Mb·s⁻¹]; series: measured and theoretical latency for SF loads of 0, 50 and 100 Mb·s⁻¹)
7.3 Validation Assessment
In the first validation case, the behaviour of two overloaded concatenated switches is
modelled. The step-wise change of the latency after crossing the threshold load can have
two reasons, or even a combination of both. The first reason may be the execution of the
QoS mechanisms, which consume more resources, so that the fixed processing delay
increases accordingly.
The second reason may be that, as the rates of the additional flows approach the capacity
of the OI, the latency asymptotically increases to infinity. However, as the interface
cannot transmit at a rate higher than its capacity and the buffer is finite, the latency
saturates. Nevertheless, both effects can be modelled with satisfactory precision by the
RVL function, as these validations show.
It is also shown that in this case, where the rate of the observed traffic is significantly
lower than the rate of the additional flow, the OI can be modelled as a PQ scheduler.
For different constellations of the flows, the model would have to be improved.
Generally, the behaviour of the HP ProCurve 1800-8G switch is well representable by
a simple model, as can be seen in Figure 7.1.
In the second validation case, the behaviour of a topology accommodating one router
and one switch is modelled. Contrary to the switch, the router exhibits more complicated
behaviour. The most distinguishing factor is the variable contribution of the SF to the
port-to-port packet latency. On the one hand, this makes the model more complicated; on
the other hand, it provides a good test bed for modelling and assessing the SF and OI
concatenation using network calculus. The SF parameters were first identified by
the procedure described in Section 6.3 using the measurements obtained in the RTR.FL
test. Although the effect of the SF saturation is similar to the saturated behaviour of an
OI, the classical rate-latency service curve was used. A drawback of the approach is that
the validity of the model is upper-bounded by the SF saturation, as no mechanism
accounting for the SF losses is introduced.

Figure 7.8: Reconstructed latency of packets passing a router with compound loading (EPBOO) (maximum latency [ms] versus outgoing port load [Mb·s⁻¹]; series: measured and theoretical latency for SF loads of 0, 50 and 100 Mb·s⁻¹)
Furthermore, the parameters of the OI were identified from the RTR.OPC test. The
classifier and scheduler are jointly modelled by the RVL service curve. In this
identification phase, the simulation result is very precise.
It can be seen that the PBOO/PMOO approach represents the upper bounds of the
latency of the traversing packet well. Contrary to this, the EPBOO approach does not fit
the real behaviour at all. Most probably, it is not possible to account for multiplexing
only once because the packets arbitrate at more points of the path.
In summary, the PBOO/PMOO model represents the behaviour of the router well in
the given constellation of data flows. More complex validation was not possible owing to
insufficient testing resources and limitations of the TestQoS application as introduced in
Section 4.1. Moreover, closed-form models of lossy service elements are urgently missing,
which makes modelling of the networking devices at their performance edge very difficult.
Chapter 8
Conclusion
The objective of this dissertation was the establishment of a comprehensive modelling
framework based on network calculus which would assist worst-case performance analysis
of the temporal behaviour of IP-based industrial communication networks. Contrary to the
currently employed Ethernet-based fieldbuses, future IP-based industrial communication
will have to employ existing COTS networking devices such as routers, at least until
telecommunication device vendors have been attracted by industrial customers.
Consequently, soft-real-time (SRT) run-time communication will have to rely on the
quality of service (QoS) capabilities of these networking devices. IP-based networks
typically make use of the DiffServ approach based on packet-oriented prioritisation with
defined per-hop behaviour (PHB). Hence, expedited treatment of the SRT run-time traffic
can be provided.
It was shown by initial experiments that a QoS-enabled router decreases packet
latency and jitter. However, should such mechanisms be employed in an industrial
safety-relevant environment, worst-case temporal analysis must be performed in order to
provide reasonable evidence for the applicability of IP-based communication in the
industrial environment. The problem area is twofold: (i) qualitative analysis is required
to reveal the dominant factors influencing the QoS behaviour of the IP-based networking
devices, and (ii) quantitative analysis must be performed to reveal upper bounds of
latencies, jitters, bandwidth, and packet loss.
Qualitative analysis was subject to structured empirical observations and extensive
studies into networking device architectures. Results of the qualitative analysis were used
for reasoning about the model structure of a networking device. Quantitative analysis
was subject to dedicated empirical observations with focus on measurement stability and
result reliability. Results of the quantitative analysis were used for identification of the
parameters of the networking device and final validation of the device models.
Empirical observations introduced in Chapter 4 were performed with the TestQoS test bed,
which was developed by the author within the Virtual Automation Networks (VAN)
project dealing with the issues of IP-based industrial communication. TestQoS measures
the latencies of packets injected into the network/device under test. The salient feature
of TestQoS is the precision of the measured packet latency, which reaches a resolution of
10 ns. A drawback of TestQoS is the limited rate of packet injection, which reaches
ca. 150 kb·s⁻¹.
Having investigated several approaches to temporal analysis of IP-based communication
networks, the application of network calculus to network/device modelling proved
to be very relevant on the one hand, and very challenging owing to its practical
immaturity on the other hand. According to the available resources, such a massive
practical application of network calculus has never been performed before.
There were many obstacles to be overcome on the way to consistent networking device
models, for instance the bi-modal behaviour of latency depending on the device load,
insufficient closed-form representations of lossy nodes, and a lack of parameter
identification methods. Most of these challenges were tackled by developing the needed
network calculus extensions summarised in Chapter 5. Four exemplary models were inferred
in Chapter 6 for two different switch-fabric architectures and two different
outgoing-interface types as a result of the classification of different networking device
architectures and empirically identified dominant factors. These types form a basis for a
future broader taxonomy of designs. Special attention has been paid to the reusability of
the model when employing networking devices other than those considered in this work.
Approaches to the model design of SFs and OIs have been proposed. Two concatenation
approaches were considered for the design modelling: PBOO/PMOO and EPBOO.
The final validation of the models of the HP ProCurve 1800-8G switch and the Cisco 2811
router is introduced in Chapter 7. The proposed method for model parametrisation proved
applicable. The PBOO/PMOO approach provided good results, and the worst-case upper
bounds of latency conform to the measurements. The EPBOO approach failed to deliver the
required behaviour.
8.1 Unresolved Issues and Further Research
Despite extensive research effort related to this topic, there are numerous issues which
require further effort. The following points suggest the successive research agenda in this
field.
Extension of the TestQoS capabilities. The most limiting factor is insufficient rate
of the generated test traffic to be injected into the network under test. Upgrade
of the TestQoS is subject to significant change in the architecture towards multi-
threaded version of the traffic generation block. Accomplishment of this step would
allow for generating of self-loading traffic necessary for more advanced analysis.
Validation of the model in a broader scope. Accomplishing the previous goal
would allow new test scenarios to be investigated, for instance a full investigation
of the WRR, WFQ, and CBWFQ scheduling mechanisms, arbitration among several
high-priority flows, etc. Consequently, the modelling approach introduced in this
dissertation could be verified and the model validity extended accordingly.
Service curves of other scheduling mechanisms. The scheduling mechanisms
mentioned in the previous point could be represented more rigorously, and a full
verification should be performed. This is feasible only if the TestQoS application is
extended or commercially available tools are employed. A promising and
cost-effective validation of the models could be performed in a network-on-chip (NoC)
application.
Representation of loss nodes. The network calculus framework has a high application
potential due to its modularity, i.e., its block-based nature. Nevertheless, there is
no closed-form service curve representing a loss node. Several attempts to model
lossy behaviour are mentioned in Section 5.2. Establishing a service curve with
losses would extend the validity of the models to boundary conditions, which are of
particular interest for safety-critical applications.
Analysis of complex topologies. Although applicable models of networking
devices have been inferred, the scope of the research is more ambitious. A
formalised approach to the concatenation of numerous networking devices should be
established. The EPBOO approach has been shown not to provide
satisfactory results in this case, hence the TFA or SFA concatenation approach should
be considered. A proper trade-off between the complexity of the network model and
bound tightness has to be found.
Promotion of the modelling framework to an engineering tool. Tool development
requires a formalised modular approach with clearly defined interfaces. This issue
has received proper attention throughout the dissertation, especially in the
typology of the models of networking devices. A similar approach has
already been applied in the ns2 framework, which is, however, dedicated
to theoretical blocks without device-oriented semantics. On the other hand, the
intended approach could potentially support assisted definition of the model
structure, similar to the Opnet modelling environment.
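As a pointer to what a service curve for a weighted scheduler might start from, the sketch below computes the rates guaranteed by idealised fluid GPS, where a backlogged flow with weight w_i on a link of rate C receives at least C · w_i / Σ w_j. This is only the fluid idealisation; packetised WFQ or CBWFQ implementations add latency terms that are not modelled here. The function names, flow names, weights and link rate are hypothetical.

```python
def gps_guaranteed_rates(link_rate: float, weights: dict) -> dict:
    """Guaranteed rate of each flow under idealised fluid GPS."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {flow: link_rate * w / total for flow, w in weights.items()}


def fluid_gps_delay_bound(burst: float, guaranteed_rate: float) -> float:
    """Delay bound b/g for a token-bucket flow whose sustained rate
    does not exceed its guaranteed rate g (fluid model, zero latency)."""
    if guaranteed_rate <= 0:
        raise ValueError("guaranteed rate must be positive")
    return burst / guaranteed_rate


# Hypothetical configuration: two classes sharing a 100 Mb/s link.
rates = gps_guaranteed_rates(100e6, {"control": 3, "video": 1})
print(rates, fluid_gps_delay_bound(12_000, rates["control"]))
```

A rate-latency service curve for a real scheduler would use such a guaranteed rate together with a latency term identified from measurements, which is the part that requires the extended test capabilities discussed above.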
8.2 Final Remarks
This dissertation lays a cornerstone for upcoming research in worst-case
temporal performance analysis based on network calculus and for the related software
engineering challenges. The motivation for this research
topic was formulated on the basis of industrial innovation requirements presented by
researchers in the VAN project. The results should be applied in the same domain, i.e.,
industrial automation. Yet performance analysis, a topical superset of temporal
performance analysis, has been gaining importance with the growing market of embedded
systems, which are currently undergoing an evolutionary leap towards systems of systems
(SoS), where networking and its temporal performance are the main non-functional aspects.
Bibliography
[1] Almquist, P. Type of Service in the Internet Protocol Suite (RFC 1349). 1992 Updated
by RFC 2474.
[2] Ayyorgun, S., Cruz, R. A service-curve model with loss and a multiplexing problem.
In Distributed Computing Systems, 2004. Proceedings. 24th International Conference on,
p. 756–765, 2004.
[3] Ayyorgun, S., Cruz, R. A composable service model with loss and a scheduling al-
gorithm. In INFOCOM 2004. Twenty-third AnnualJoint Conference of the IEEE Computer
and Communications Societies, vol. 3, p. 1950–1961 vol.3, March 2004.
[4] Bellini, P., Mattolini, R., Nesi, P. Temporal logics for real-time system specification.
ACM Computing Surveys, vol. 32, no. 1, p. 12–42, 2000.
[5] Beran, J., Elia, A., Hundt, L., Heutger, H., Meo, F., Messerschmitt, R., Schnm-
ller, B., Werner, T., Wolframm, M. D04.2.1 - results of modelling of rt mechanisms
in automation systems and rt extensions of existing industrial solutions. Deliverable, VAN
Project, 2006.
[6] Beran, J. Virtual automation networks - challenge in industrial automation. In Proceed-
ings of the IFAC Workshop on Programmable Devices and Embedded Systems. PDES 2006,
p. 473–478, 2006.
[7] Beran, J., Zezulka, F., Fiedler, P. Findings on qos metrics of l3 network devices inten-
ded for future factory automation. In Proceedings of the IFAC Workshop on Programmable
Devices and Embedded Systems. PDES 2009, p. 208 – 213, 2009.
[8] Beran, J., Fiedler, P., Zezulka, F. Virtual automation networks: An evolutionary
step towards industrial internet. IEEE Industrial Electronics Magazine [Accepted].
[9] Beran, J., Zezulka, F. Evaluation of real-time behaviour in virtual automation networks.
In Proceedings of the 17th IFAC World Congress, Seoul, Korea, 2008.
[10] Beran, J., Fiedler, P., Zezulka, F. Modeling a router temporal performance using
network calculus. In Proceedings of the 8th International PhD Student’ s Workshop on
Control and Information Technology, p. 197–205, 2009.
[11] Beran, J., Fiedler, P., Zezulka, F. Rate-variable-latency service curve as an exten-
sion to network calculus. In Control and Automation, 2009. MED ’09. 17th Mediterranean
Conference on, p. 286–291, June 2009.
BIBLIOGRAPHY 117
[12] Black, S., Black, D., Carlson, M., Davies, E., Wang, Z., Weiss, W. An Architec-
ture for Differentiated Services (RFC 2475). 1997 Updated by RFC 3260.
[13] Boudec, J.-Y. L., Thiran, P. Network Calculus - A Theory of Deterministic Queuing
Systems for the Internet., vol. 2050 Springer Verlag, 2001.
[14] Chang, C.-S. Performance Guarantees in Communication Networks (Telecommunication
Networks and Computer Systems). Springer Verlag, 2000.
[15] Chao, H. J., Liu, B. High Performance Switches and Routers. Willey-IEEE Press, 2007.
[16] Chertov, R.; Fahmy, S. S. N. B. A black-box router profiler. 05 2007.
[17] Collective. IAONA Handbook - Industrial Ethernet. IAONA e.V, 2005.
[18] Cruz, R. L. A calculus for network delay: Part ii: Network analysis. In IEEE Transactions
on Information Theory, vol. 37, p. 132–141, 1991.
[19] Cruz, R. L. A calculus for network delay: Part i: Network elements in isolation., vol. 37,
no. 1, p. 114–131, 1991.
[20] Decotignie, J.-d. The many faces of industrial ethernet [past and present]. Industrial
Electronics Magazine, IEEE, vol. 3, no. 1, p. 8–19, March 2009.
[21] Dooley, K., Brown, I. J. Cisco IOS Cookbook. O’Reilly, 2003.
[22] Douglas, B. P. Doing Hard Time. Addison-Wesley, 2000.
[23] Echagu¨e, J., Cholvi, V. Worst case burstiness increase due to arbitrary aggregate mul-
tiplexing. In valuetools ’06: Proceedings of the 1st international conference on Performance
evaluation methodolgies and tools, p. 10, New York, NY, USA, 2006. ACM.
[24] Faria, F. D., Strum, M., Chau, W. J. A system-level performance evaluation method-
ology for netwrok processors based on network calculus analytical modeling. VLSI, IEEE
Computer Society Annual Symposium on, vol. 0, p. 265–272, 2007.
[25] Fidler, M. Quality of Service in Multiservice IP Networks., vol. 2601, chapter Extending
the Network Calculus Pay Bursts Only Once Principle to Aggregate Scheduling, p. 19–34
Springer Berlin / Heidelberg, 2003.
[26] Fidler, M., Schmitt, J. B. On the way to a distributed systems calculus: an end-to-end
network calculus with data scaling. In SIGMETRICS ’06/Performance ’06: Proceedings
of the joint international conference on Measurement and modeling of computer systems,
p. 287–298, New York, NY, USA, 2006. ACM.
[27] Firoiu., V., Boudec, J.-Y. L., Towsley, D., Zhang, Z.-L. Theories and models for
internet quality of service. In Proceedings of the IEEE, 2002.
[28] Ford, M. Internetworking Technologies Handbook. Cisco Press, 1998.
[29] Georges, J.-P., Divoux, T., Rondeau, E. Validation of the network calculus approach
for the performance evaluation of switched ethernet-based industrial communication. Pro-
ceedings of the 16th IFAC World Congress, 2005.
[30] Georges, J.-P., Divoux, T., Rondeau, E. Strict priority versus weighted fair queuing
in switched ethernet networks for time-critical applications. In Proceedings of the 19th IEEE
International and Distributed Processing Symposium, 2005.
BIBLIOGRAPHY 118
[31] Georges, J.-P., Divoux, T., Rondeau, E. Comparison of switched ethernet architec-
tures models. In Proceedings of IEEE Conference on Emerging Technologies and Factory
Automation 2003, 2003.
[32] Georges, J.-P., Divoux, T., Rondeau, E. Confronting the the performances of a
switched ethernet network with industrial constraints by using network calculus. Inter-
national Journal of Communication Systems, vol. 18, p. 877–903, 2005.
[33] Georges, J.-P., Divoux, T., Rondeau, E. A formal method to guarantee a deterministic
behaviour of switched ethernet networks for time-critical applications. In Proceedings of the
IEEE International Symposium on Computer Aided Control Systems Design, p. 255–260,
2004.
[34] Giacomazzi, P., Saddemi, G. Bounded-variance network calculus: Computation of tight
approximations of end-to-end delay. In Proceedings of the IEEE International Conference
on Communications, 2008. ICC’08, p. 170–175, 2008.
[35] Grossman, D. New Terminology and Clarifications for DiffServ (RFC 3260). 2002.
[36] Gupta, R. A., Chow, M.-Y. Performance assessment and compensation for secure net-
worked control systems. In Proceedings of the 34th Annual Conference of IEEE Industrial
Electronics (IECON 2008), p. 2929–2934, 2008.
[37] Hanzalek, Z., Pacha, T. Use of the fieldbus systems in academic setting. In Proceedings
of Real-Time Systems Education III, p. 93–97, 1998.
[38] Hohn, N., Veitch, D., Papagiannaki, K., Diot, C. Bridging router performance and
queuing theory. In SIGMETRICS ’04/Performance ’04: Proceedings of the joint interna-
tional conference on Measurement and modeling of computer systems, p. 355–366, New York,
NY, USA, 2004. ACM.
[39] Hewlett-Packard Development Company HP ProCurve Switch 1800 Series., January 2009.
[40] Hundt, L., Hoffmann, M., Schwab, C., Beran, J. Quality of service measurement in
virtual automation networks. In Proceedings WAMS 2007 - 1st International Workshop on
Advanced Manufacturing Systems, p. 40–48, 2007.
[41] Jasperneite, J., Neumann, P. Deterministic real-time communication with switched
Ethernet. In Proceedings of 4th IEEE International Workshop on Factory Communication
Systems, p. 11–18, 2002.
[42] Jasperneite, J., Neumann, P. Measurement, analysis and modelling of real-time source
data traffic in factory communication systems. In Proceedings of IEEE International Work-
shop on Factory Communication Systems, p. 327–333, 2000.
[43] Kadlec, J., Beran, J., Vrba, R. Precise measurement of wireless network roaming func-
tionality and network component parameters applied for automation systems. In Proceedings
of the Third International Conference on Systems. ICONS 08, p. 373–376, 2008.
[44] Manita, A., Simonot, F., Song, Y. Multi-dimensional markov model for performance
evaluation of an Ethernet switch. Research Report 4813, Institut National de Recherche en
Informatique et en Automatique, May 2003.
[45] Messerschmitt, R., Beran, J., Werner, T., and, H. H. Real time for embedded auto-
mation systems including status and analysis and closed-loop real-time control. Technical
report, VAN Consortium, 2006 url: http://www.van-eu.eu/deliverables.
BIBLIOGRAPHY 119
[46] Nichols, K., Blake, S., Baker, F., Black, D. Definition of the Differentiated Services
Field (DS Field) in the IPv4 and IPv6 Headers (RFC 2474). 1998 Updated by RFC 3260
and RFC 3168.
[47] Papagiannaki, K. ; Moon, S. . F. C. . T. P. . D. C. Measurement and analysis of single-
hop delay on an ip backbone network. IEEE Journal on Selected Areas in Communications,
vol. 21, no. 6, p. 908–921, 2003.
[48] Park, S. G. Fieldbus in iec61158 standard. In Proceedings on the 15th CISL Winter
Workshop Kushu, 2002.
[49] Ponce, V., M. Engineering Hydrology: Principles and Practices. Prentice Hall, 1994.
[50] Popp, M. Das PROFINET IO-Buch - Grundlagen und Tipps fr Anvender. Hthig, 2005.
[51] Rizzo, G., Boudec, J.-Y. L. ”pay burst only once” does not hold for non-fifo guaranteed
rate nodes. Performance Evaluation, vol. 62, p. 366–381, 2005.
[52] Schmitt, J. B., Zdarsky, F. A., Fidler, M. Delay bounds under arbitrary multiplexing:
When network calculus leaves us in the lurch.... In Proceedings of the 27th IEEE Conference
on Computer Communications, p. 2342–2350, 2008.
[53] Schmitt, J. B., Zdarsky, F. A., Martinovic, I. Performance bounds in feed-forward
networks under blind multiplexing. Technical Report 349/06, Distributed Computer Systems
Lab (DISCO), University Kaiserslautern, Germany, 2006.
[54] Schmitt, J. B., Gollan, N., Martinovic, I. End-to-end worst-case analysis of non-fifo
systems. Technical Report 370/08, Distributed Computer Systems Lab (DISCO), University
Kaiserslautern, Germany, 2008.
[55] Schmitt, J. B., Zdarsky, F. A. The disco network calculator: a toolbox for worst
case analysis. In Proceedings of the 1st international conference on Performance evaluation
methodolgies and tools, 2006.
[56] Schmitt, J. B. Network calculus: what it can do for you and where it needs your help. In
AINTEC ’08: Proceedings of the 4th Asian Conference on Internet Engineering, p. 57–58,
New York, NY, USA, 2008. ACM.
[57] Strikant, R. The Mathematics of Internet Congestion Control. Birkhuser, system &
control: foundation and applications edition, 2003.
[58] Timmerman, M., Perneel, L. State of the art: operating systems for realtime embedded
applications. Industrial Ethernet Book, no. 36, February 2007.
[59] Zampieri, S. Trends in networked control systems. In Proceeding of the 17th IFAC World
Congress, Seoul, Korea, 2008.
[60] Zezulka, F., Beran, J. Virtual automation networks - architectural principles and the
current state of development. In Proceeding of the 34th Annual Conference of IEEE Indus-
trial Electronics (IECON 2008), p. 1545–1550, 2008.
Appendix A
Results of the Measurements
Table A.1: SW.FL test case statistics
Outgoing Port Load [Mb·s⁻¹] | Average Latency [µs] | Latency Deviation [µs] | Latency Median [µs] | Minimum Latency [µs] | Maximum Latency [µs] | Packet Pass Rate [%]
0 88.40 0.05 88.40 88.30 88.50 99.97
500 88.39 0.05 88.40 88.06 88.50 99.97
Table A.2: SW.OPC test case statistics
Outgoing Port Load [Mb·s⁻¹] | Average Latency [µs] | Latency Deviation [µs] | Latency Median [µs] | Minimum Latency [µs] | Maximum Latency [µs] | Packet Pass Rate [%]
0 180.69 0.07 180.69 180.43 180.87 99.99
20 192.87 28.50 180.69 180.40 299.98 99.98
30 198.93 33.49 180.71 180.30 299.94 99.98
40 204.53 36.36 180.74 180.33 299.99 100.00
50 211.37 38.56 185.25 180.24 299.99 99.97
60 216.72 39.44 202.03 180.44 300.04 99.91
70 224.55 39.49 218.88 180.49 300.00 99.89
80 230.52 38.35 228.55 179.89 300.02 99.98
90 249.33 50.45 245.56 180.47 544.04 99.95
100 1466.16 169.53 1507.62 180.74 1762.74 99.97
110 1559.14 102.17 1569.19 1250.58 1857.87 99.98
120 1479.57 173.06 1513.49 1092.37 1859.83 99.98
130 1492.00 183.68 1531.79 180.64 1862.07 99.98
140 1596.69 114.39 1604.29 1234.98 1877.21 99.89
150 1591.53 119.38 1600.94 1242.68 1886.23 99.83
160 1571.46 153.08 1597.03 1093.15 1886.43 99.99
175 1595.53 116.40 1603.04 1240.60 1874.60 99.99
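The jump in average latency in Table A.2 can be located programmatically. The sketch below copies the (load, average latency) pairs from the table and reports the first load at which the average latency exceeds a multiple of the previous row's value. The function name and the factor of 2 are arbitrary assumptions for illustration, not a criterion used in the dissertation.

```python
from typing import Optional, Sequence, Tuple

# (outgoing port load [Mb/s], average latency [µs]) copied from Table A.2.
TABLE_A2 = [
    (0, 180.69), (20, 192.87), (30, 198.93), (40, 204.53), (50, 211.37),
    (60, 216.72), (70, 224.55), (80, 230.52), (90, 249.33), (100, 1466.16),
    (110, 1559.14), (120, 1479.57), (130, 1492.00), (140, 1596.69),
    (150, 1591.53), (160, 1571.46), (175, 1595.53),
]


def saturation_onset(rows: Sequence[Tuple[float, float]],
                     factor: float = 2.0) -> Optional[float]:
    """Return the first load whose average latency exceeds `factor`
    times the average latency of the preceding row, or None."""
    for (_, prev_avg), (load, avg) in zip(rows, rows[1:]):
        if avg > factor * prev_avg:
            return load
    return None


print(saturation_onset(TABLE_A2))
```

The same helper can be applied to the other load-sweep tables in this appendix by substituting their rows.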
Table A.3: SW.PL test case statistics
Outgoing Port Load [Mb·s⁻¹] | Packet Length [B] | Average Latency [µs] | Latency Deviation [µs] | Latency Median [µs] | Minimum Latency [µs] | Maximum Latency [µs] | Packet Pass Rate [%]
0 64 17.05 0.07 17.05 16.64 17.25 99.99
0 128 27.71 0.07 27.71 27.49 27.89 100.00
0 256 49.01 0.07 49.01 48.74 49.19 100.00
0 512 91.61 0.06 91.61 91.44 91.79 100.00
0 1024 176.80 0.07 176.80 176.48 176.98 100.00
20 64 30.43 30.26 17.15 16.84 139.49 99.98
20 128 41.02 30.16 27.80 27.46 150.01 99.99
20 256 62.50 30.18 49.10 48.82 170.68 100.00
20 512 105.03 30.02 91.69 91.37 212.65 99.97
20 1024 189.66 29.27 176.89 176.49 296.09 99.98
80 64 69.61 38.87 67.86 17.13 139.78 99.99
80 128 79.60 39.12 78.08 27.47 150.05 99.94
80 256 102.05 39.09 100.50 48.77 170.94 99.94
80 512 143.72 38.83 142.27 91.35 212.73 99.93
80 1024 226.64 38.28 224.83 176.48 296.34 99.97
120 64 1376.99 227.69 1461.45 930.33 1813.53 99.77
120 128 1542.34 86.09 1541.23 1185.70 1790.04 99.96
120 256 1433.97 190.94 1490.97 958.36 1805.62 100.00
120 512 1585.75 77.29 1580.02 1325.92 1821.21 99.97
120 1024 1570.00 110.42 1577.03 1221.04 1860.43 100.00
160 64 1545.82 111.78 1557.57 1153.60 1790.38 100.00
160 128 1463.39 226.00 1543.56 27.67 1801.01 100.00
160 256 1465.48 219.93 1541.01 48.99 1801.49 99.98
160 512 1504.67 209.48 1573.96 91.64 1825.38 99.98
160 1024 1619.91 111.57 1627.73 1234.88 1874.48 99.95
Table A.4: SW.CON test case statistics
Outgoing Port Load [Mb·s⁻¹] | Packet Length [B] | Average Latency [µs] | Latency Deviation [µs] | Latency Median [µs] | Minimum Latency [µs] | Maximum Latency [µs] | Packet Pass Rate [%]
2 Switches 0 176.80 0.07 176.80 176.48 176.98 100.00
20 189.66 29.27 176.89 176.49 296.09 99.98
80 226.64 38.28 224.83 176.48 296.34 99.97
100 1325.00 198.57 1214.69 176.76 1749.46 99.98
120 1434.90 197.44 1477.41 176.73 1868.14 99.99
160 1510.02 197.96 1566.44 176.74 1878.68 100.00
3 Switches 0 265.12 0.08 265.12 264.88 265.37 99.97
20 276.25 31.84 265.14 264.77 420.08 99.96
80 317.60 52.09 305.42 264.89 420.03 99.98
100 2192.51 589.08 2047.37 329.61 3321.55 99.98
120 2450.75 595.08 2656.77 265.02 3433.87 99.99
160 2531.75 613.33 2747.62 265.06 3447.40 100.00
4 Switches 0 353.63 0.09 353.63 353.09 353.91 99.95
20 385.28 54.34 353.70 353.23 543.89 99.94
80 429.91 64.47 430.56 353.33 544.24 99.99
100 2295.31 495.04 2320.29 449.11 3194.53 100.00
120 2410.67 640.57 2302.75 353.51 3536.66 99.99
160 2355.36 576.14 2256.64 353.46 3607.92 99.99
5 Switches 0 442.09 0.10 442.09 440.44 442.42 99.97
20 486.93 68.91 442.15 441.69 667.89 100.00
80 596.70 41.85 596.93 441.89 667.79 99.99
100 2476.65 550.62 2706.98 442.07 3205.59 99.97
120 2740.65 434.50 2926.30 1543.03 3312.81 99.98
160 3043.79 115.10 3050.07 2683.72 3332.58 99.99
6 Switches 0 530.61 0.11 530.61 529.86 531.00 100.00
20 592.14 83.23 530.79 530.09 791.51 100.00
80 720.13 41.16 720.14 530.60 792.04 100.00
100 2309.15 390.21 2282.41 751.79 3210.09 99.99
120 2703.54 524.09 2946.87 530.44 3439.56 99.99
160 2854.10 437.12 3039.81 1707.12 3446.67 99.99
Table A.5: RTR.PLCEF test case statistics
Forwarding Mechanism [−] | Packet Length [B] | Average Latency [µs] | Latency Deviation [µs] | Latency Median [µs] | Minimum Latency [µs] | Latency Percentile (0.5%) [µs] | Packet Pass Rate [%]
CEF 64 133.51 14.08 129.26 115.81 184.93 100.00
CEF 128 137.90 14.26 133.58 118.93 189.98 100.00
CEF 256 146.86 14.19 142.63 129.53 200.79 100.00
CEF 512 163.00 14.11 157.10 148.83 220.26 100.00
CEF 1024 207.99 15.21 201.67 191.06 265.84 100.00
CEF 1280 239.29 16.65 232.14 217.51 293.04 100.00
Process Switching 64 673.65 247.29 660.77 632.25 757.68 99.88
Process Switching 128 686.68 385.73 672.75 638.24 758.92 99.98
Process Switching 256 710.04 601.30 681.61 650.97 782.42 100.00
Process Switching 512 725.95 443.00 708.35 674.24 796.75 99.98
Process Switching 1024 772.95 333.38 756.57 718.26 860.77 100.00
Process Switching 1280 804.48 393.43 789.55 745.58 877.59 100.00
Table A.6: RTR.FL test case statistics - PQ
Switch Fabric Load [Mb·s⁻¹] | Average Latency [µs] | Latency Deviation [µs] | Latency Median [µs] | Minimum Latency [µs] | Maximum Latency [µs] | Packet Pass Rate [%]
0 311.75 9.95 307.80 301.10 357.30 100.00
20 308.88 7.52 306.67 300.08 367.17 99.98
30 309.50 8.14 307.02 300.36 377.36 100.00
40 309.96 8.41 307.42 300.10 419.77 100.00
50 310.29 8.74 307.55 299.76 416.84 100.00
60 309.99 9.58 305.87 300.31 421.09 99.94
70 311.10 9.48 308.03 300.27 441.50 100.00
80 310.77 9.41 307.40 299.23 422.05 99.98
90 311.26 10.05 306.95 300.20 486.90 100.00
100 313.41 10.41 310.11 301.14 515.05 99.98
110 471.98 98.43 457.98 307.54 792.24 99.96
120 492.18 106.42 481.70 323.05 888.70 99.98
130 505.58 116.59 488.26 317.07 893.83 100.00
140 539.57 133.19 525.81 323.94 1183.32 100.00
150 577.03 155.49 561.77 326.91 1064.08 100.00
160 592.62 177.15 568.28 321.69 1123.53 99.96
170 607.43 198.66 574.02 318.78 1280.41 99.98
180 643.58 202.66 628.71 320.50 1394.91 100.00
190 926.51 384.51 901.98 303.80 1918.31 100.00
200 1017.45 494.96 973.58 296.25 2345.91 100.00
210 1108.35 500.37 1070.78 310.00 2473.60 100.00
220 910.99 529.57 788.60 294.87 2609.39 100.00
230 2135.56 1110.66 2069.44 320.51 7522.58 99.98
240 3657.62 2013.69 3528.32 331.71 8034.63 100.00
250 3846.10 2135.89 3784.65 297.65 8053.09 100.00
260 2810.82 2085.08 1748.69 293.65 8083.44 98.98
Table A.7: RTR.FL test case statistics - WFQ
Switch Fabric Load [Mb·s⁻¹] | Average Latency [µs] | Latency Deviation [µs] | Latency Median [µs] | Minimum Latency [µs] | Maximum Latency [µs] | Packet Pass Rate [%]
0 308.87 8.08 306.03 300.01 352.03 100.00
20 307.87 6.58 306.12 299.67 365.79 100.00
30 308.42 6.78 306.53 300.18 363.94 100.00
40 308.53 7.12 306.45 299.75 371.95 100.00
50 309.09 7.75 306.74 300.46 393.08 100.00
60 309.56 7.96 307.14 300.68 436.27 100.00
70 310.16 8.46 307.48 300.73 461.90 100.00
80 310.23 8.58 307.29 299.88 432.19 100.00
90 314.82 8.93 311.55 303.76 438.69 100.00
100 316.09 9.46 312.80 304.70 447.82 100.00
110 367.07 32.54 360.85 323.41 570.57 100.00
120 369.82 34.44 362.54 323.67 566.05 100.00
130 372.82 37.67 364.63 325.02 594.10 100.00
140 377.43 41.85 367.79 307.01 635.11 100.00
150 382.94 46.21 372.26 318.62 687.07 100.00
160 386.30 48.30 376.38 321.06 709.99 100.00
170 391.79 51.96 380.82 322.68 702.22 100.00
180 394.51 52.35 384.40 323.10 699.69 100.00
190 402.11 55.59 391.99 325.08 663.50 100.00
195 352.93 48.63 334.05 304.52 657.11 100.00
210 432.91 84.63 417.67 307.55 799.52 100.00
220 418.50 105.29 391.51 293.72 865.54 99.98
230 372.55 99.99 324.84 294.27 834.17 100.00
240 505.04 119.21 499.53 308.91 892.14 100.00
250 317.93 16.46 312.54 292.59 515.05 100.00
260 319.44 17.95 313.94 291.99 492.80 99.98
270 319.34 18.77 313.26 291.22 501.24 100.00
Table A.8: RTR.FL test case statistics - CBWFQ
Switch Fabric Load [Mb·s⁻¹] | Average Latency [µs] | Latency Deviation [µs] | Latency Median [µs] | Minimum Latency [µs] | Maximum Latency [µs] | Packet Pass Rate [%]
0 133.50 8.45 129.78 123.66 176.85 99.95
20 134.88 7.03 132.79 125.72 213.49 99.98
30 134.29 7.34 132.03 125.49 189.78 100.00
40 135.11 7.73 132.72 125.91 219.62 100.00
50 134.36 7.86 131.49 124.77 193.81 99.98
60 135.23 8.36 132.55 125.03 231.23 99.97
70 135.71 9.13 132.71 125.46 306.73 99.99
80 136.28 9.76 133.31 124.85 331.12 99.95
90 136.39 9.43 133.55 125.11 329.18 99.97
100 137.22 8.97 134.13 124.37 259.31 99.94
110 142.41 9.92 138.84 128.74 301.46 99.94
120 252.82 75.58 237.31 147.36 607.78 99.99
130 270.28 83.11 254.33 141.59 612.99 99.96
140 293.63 92.90 281.96 143.08 781.75 99.97
150 312.50 104.83 300.47 141.01 684.75 99.98
160 387.01 146.68 376.79 141.34 910.39 99.98
170 424.69 166.75 417.61 128.53 894.26 99.99
180 486.13 202.39 478.18 131.98 1269.43 99.98
190 990.90 641.63 871.59 369.84 7551.11 100.00
195 3959.94 2227.66 3917.09 155.55 7967.64 100.00
Table A.9: RTR.OPC test case statistics - FIFO
Outgoing Port Load [Mb·s⁻¹] | Average Latency [µs] | Latency Deviation [µs] | Latency Median [µs] | Minimum Latency [µs] | Maximum Latency [µs] | Packet Pass Rate [%]
0 221.00 8.92 217.96 210.06 271.89 99.91
20 225.22 27.08 213.71 205.54 342.60 99.89
30 229.79 31.69 213.61 206.24 372.94 99.90
40 234.75 32.50 215.79 206.55 339.84 99.88
50 235.67 36.55 216.82 208.21 374.26 99.91
60 247.40 40.56 226.08 208.72 371.51 99.90
70 250.94 36.84 240.64 208.86 358.21 99.94
80 255.19 35.17 251.00 208.67 370.34 99.92
90 260.18 34.82 255.08 210.54 460.64 99.93
100 34787.58 55.83 34796.28 33545.99 34895.99 66.86
110 34823.06 37.44 34822.55 34657.29 34960.61 53.62
120 34827.11 41.51 34825.85 34621.45 35068.17 58.71
130 34824.53 42.39 34823.92 34654.77 34979.02 60.00
140 34832.30 42.21 34830.39 34379.58 34998.01 56.85
150 34842.96 50.13 34837.88 34571.15 35090.95 53.69
160 34856.41 57.11 34849.98 34627.87 35187.97 52.01
170 34873.11 65.44 34865.00 34654.08 35212.46 46.95
180 34895.06 81.55 34884.01 34591.34 35229.89 49.15
190 34988.23 225.68 34993.48 31227.08 35510.70 42.38
Table A.10: RTR.OPC test case statistics - PQ
Outgoing Port Load [Mb·s⁻¹] | Average Latency [µs] | Latency Deviation [µs] | Latency Median [µs] | Minimum Latency [µs] | Maximum Latency [µs] | Packet Pass Rate [%]
0 220.46 8.23 217.77 210.17 270.69 99.90
20 223.78 28.77 211.20 204.62 368.12 100.00
30 232.42 36.38 213.74 205.10 377.44 100.00
40 234.70 33.91 215.18 205.99 374.55 100.00
50 241.05 36.29 221.75 202.08 411.64 99.99
60 244.09 36.74 227.27 205.74 356.44 99.98
70 252.51 38.32 243.71 207.45 368.43 99.99
80 258.99 33.47 262.71 207.71 397.08 99.99
90 271.76 44.41 258.65 209.75 460.53 100.00
100 7697.84 59.20 7694.91 7485.67 8070.97 100.00
110 7758.21 103.94 7741.13 7466.73 8229.94 99.98
120 7782.18 121.22 7761.55 7460.39 8315.13 100.00
130 7843.13 162.10 7830.91 303.08 8371.34 99.92
140 7864.32 114.24 7848.05 6830.75 8420.29 99.94
150 7921.29 144.05 7902.13 6832.31 8520.59 99.97
160 8448.4 651.30 8240.06 7548.66 10945.8 99.96
170 9063.72 1456.24 8661.12 7508.56 15298.62 99.95
180 10533.01 2067.85 10123.43 7533.59 15453.18 99.96
190 11928.23 2292.4 12311.57 7557.27 15780.77 99.95
Table A.11: RTR.OPC test case statistics - WFQ
Outgoing Port Load [Mb·s⁻¹] | Average Latency [µs] | Latency Deviation [µs] | Latency Median [µs] | Minimum Latency [µs] | Maximum Latency [µs] | Packet Pass Rate [%]
0 222.54 7.92 220.34 211.43 277.38 99.93
20 223.70 28.38 211.01 205.24 345.69 99.89
30 228.81 31.84 212.22 205.70 346.33 99.86
40 236.24 34.01 215.38 206.40 364.58 99.90
50 238.81 37.29 218.06 206.34 371.97 99.85
60 250.43 43.37 232.97 205.07 373.40 99.86
70 252.03 37.03 244.77 207.12 409.24 99.96
80 258.27 35.14 258.61 207.56 367.60 99.94
90 266.19 38.61 261.52 207.83 394.10 99.94
100 7640.29 430.07 7674.92 213.75 7841.27 99.93
110 7696.55 66.23 7686.68 7496.02 7980.89 99.96
120 7727.35 77.37 7713.40 7520.82 8011.12 99.93
130 7734.57 80.06 7719.58 7515.60 7985.36 99.92
140 7740.42 76.42 7733.64 7544.17 8000.16 99.94
150 7771.34 74.80 7777.96 7465.78 8086.76 99.89
160 7791.90 79.16 7798.50 7479.43 8144.00 99.90
170 7783.52 79.83 7789.80 7458.35 8215.81 99.92
180 7790.49 87.00 7794.59 7441.55 8191.26 99.95
190 7813.06 89.98 7815.94 7459.77 8213.98 99.96
Abbreviations
AF Assured Forwarding
CEF Cisco Express Forwarding
CQ Custom Queuing
DSCP Differentiated Services Code Point
EF Expedited Forwarding
EPBOO Extended PBOO
FCFS First Come First Served
FIFO First In First Out
GPS Generalised Processor Sharing
HOL Head of Line
II Incoming Interface
IP Internet Protocol
IQ Inport Queuing
ISP Internet Service Provider
L1 Physical Layer
L2 Link Layer
L3 Network Layer
L4 Transport Layer
LAN Local Area Network
NCS Networked Control System
NIC Network Interface Card
NoC Network on Chip
OBB Optimisation-Based Bounding
OI Outgoing Interface
OQ Outport Queuing
OSI Open Systems Interconnection
PBOO Pay Burst Only Once
PFU Packet-Forwarding Unit
PHB Per-Hop Behaviour
PMOO Pay Multiplexing Only Once
PQ Priority Queuing
QoS Quality of Service
RR Round Robin
RT Routing Table
RTOS Real-Time Operating System
RtoUDP Real-Time over UDP
RVL Rate-Variable-Latency
SF Switch Fabric
SF-B Blocking Switch Fabric
SF-N Nonblocking Switch Fabric
SFA Separated Flow Analysis
SFQ Switch-Fabric Queuing
SLA Service Level Agreement
SMQ Shared-Memory Queuing
SoC System on Chip
SOHO Small-Office-Home-Office
SoS System of Systems
SP Strict Priority
SVI Switch Virtual Interface
TCP Transmission Control Protocol
TDM Time-Division Multiplex
TFA Total Flow Analysis
ToS Type of Service
TSP Telecommunication Service Provider
UDP User Datagram Protocol
VAN Virtual Automation Networks
VLAN Virtual Local Area Network
VOQ Virtual Outport Queuing
WAN Wide Area Network
WCET Worst-Case Execution Time
WFQ Weighted Fair Queuing
WRR Weighted Round Robin
Symbols
R(t) Input cumulative rate function
R∗(t) Output cumulative rate function
α(t) Arrival curve
γ_{r,b}(t) Affine function with rate r and burst b parameters
β(t) Service curve
β_{R,T}(t) Rate-latency function with rate R and latency T parameters
λ_R(t) Rate function with rate R parameter
δ_T(t) Latency function with latency T parameter
β^F_{R,T}(t) Service curve offering service to the flow F
β_{SF}(t) Service curve of the switch fabric
β_{fwd}(t) Service curve of the PFU (at SF)
β_{agg}(t) Service curve of aggregation (at SF)
β_{OI}(t) Service curve of the outgoing interface
β_{cla}(t) Service curve of the classifier (at OI)
β_{sch}(t) Service curve of the scheduler (at OI)
β_{R,T_1,T_2,r_T}(t) RVL service curve with rate R, latencies T_1 and T_2 and trigger rate r_T
⊗ Min-plus convolution
⊘ Min-plus deconvolution
d̂ Delay bound
b̂ Backlog bound
l̂ Loss bound
F_{I,J,K} Flow passing through inport I to outport J with priority K
F_{i,j,k} Flow passing through inport i ≠ I to outport j ≠ J with priority
k ≠ K
F_T Observed test flow
F_{Li} Loading flow i
J_{I,J,K} Set of service elements along the flow F_{I,J,K}
K_h Set of additional flows F_{i,j,k} at the service element h
K_{I,J,K} Set of additional flows F_{i,j,k} which use at least one service element along
the path of the flow F_{I,J,K}
J_{F_{I,J,K}, F_{i,j,k}} Set of service elements which are used both by the flow F_{I,J,K} and the
flow F_{i,j,k}
h_min Service element at which the flow under consideration enters the system
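
For orientation, the elementary functions, operators and bounds listed above can be written out explicitly. The block below gives the standard textbook forms from the network calculus literature; the dissertation's own definitions in the earlier chapters take precedence wherever conventions differ (for example at t = 0 or at t = T). The last line is the well-known special case of a γ_{r,b} arrival curve served by a β_{R,T} node with r ≤ R.

```latex
\begin{align*}
\gamma_{r,b}(t) &= \begin{cases} rt + b, & t > 0 \\ 0, & \text{otherwise} \end{cases} \\
\beta_{R,T}(t) &= R\,[t - T]^{+} = R \max(0,\, t - T) \\
\lambda_{R}(t) &= R\,t, \quad t \ge 0 \\
\delta_{T}(t) &= \begin{cases} 0, & 0 \le t \le T \\ +\infty, & t > T \end{cases} \\
(f \otimes g)(t) &= \inf_{0 \le s \le t} \{\, f(t - s) + g(s) \,\} \\
(f \oslash g)(t) &= \sup_{u \ge 0} \{\, f(t + u) - g(u) \,\} \\
\hat{d} &= \sup_{t \ge 0} \, \inf \{\, d \ge 0 : \alpha(t) \le \beta(t + d) \,\} \\
\hat{b} &= \sup_{t \ge 0} \{\, \alpha(t) - \beta(t) \,\} \\
\alpha = \gamma_{r,b},\ \beta = \beta_{R,T},\ r \le R
  &\;\Longrightarrow\; \hat{d} = T + \frac{b}{R}, \qquad \hat{b} = b + rT
\end{align*}
```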
