An investigation into buffer management mechanisms for the Diffserv assured forwarding traffic class by Mentz, Joshua
The copyright of this thesis vests in the author. No 
quotation from it or information derived from it is to be 
published without full acknowledgement of the source. 
The thesis is to be used for private study or non-
commercial research purposes only. 
Published by the University of Cape Town (UCT) in terms 










An Investigation into Buffer 
Management Mechanisms for the 





Mr. Neco Ventura 
Department of Electrical Engineering 
University of Cape Town 
September 2003 
This dissertation is submitted to the University of Cape Town in fulfilment of the 











I declare that this thesis is my own work. Where information from other sources has 
been used herein the relevant material has been referred to in the references. 
This work is being submitted for the Master of Science Degree in Electrical 
Engineering at the University of Cape Town. It has not been submitted to any other 
university for any other degree or examination. 
Joshua Mentz 












I would like to thank Siemens and the National Research Foundation for providing 
financial support to me during this project. 
Thank you to Neco Ventura for giving guidance and for providing a conducive 
learning environment. Thank you to Sven Shepstone for providing invaluable support 
during the formative stages of my project. Thank you to the folks at The Applied 
Research Laboratory, Washington University for creating the MSR research platform 
and for taking the time to address my queries. Thanks also to Adam Burke for an 
initial collaboration that spurred me on. 
Thank you to my family for your rich support. Thank you also to Rowena Quinan and 












The IETF proposed the Diffserv architecture as a means to provide preferential 
service for delay and loss sensitive applications. One of the service classes offered by 
Diffserv is the Assured Forwarding (AF) class. Because of scalability concerns, IETF 
specifications recommend that microflow and aggregate-unaware active buffer 
management mechanisms such as RIO (Random early detection with In/Out-of-profile) 
be used in the core of Diffserv networks implementing AF. Such mechanisms 
have, however, been shown to provide poor performance with regard to fairness, 
stability and network control. Furthermore, recent advances in router technology now 
allow routers to implement more advanced scheduling and buffer management 
mechanisms on high-speed ports. 
This thesis evaluates the performance improvements that may be realized when 
implementing the Diffserv AF core using a hierarchical microflow and aggregate-aware 
buffer management mechanism instead of RIO. The author motivates, proposes 
and specifies such a mechanism. The mechanism, referred to as H-MAQ or 
Hierarchical multi drop-precedence queue state Microflow-Aware Queuing, is 
evaluated on a testbed that compares the performance of a RIO network core with an 
H-MAQ network core. The results of these evaluations are presented. 
The results obtained show that H-MAQ provides fairness that is comparable to RIO at 
the microflow level. At the aggregate level, H-MAQ networks were shown to be 
effective in implementing precise quantitative allocation of resources to aggregates on 











Table of Contents 
DECLARATION 
ACKNOWLEDGEMENTS 
SYNOPSIS 
TABLE OF CONTENTS 
LIST OF FIGURES 
GLOSSARY 
INTRODUCTION 
1.1 BACKGROUND INFORMATION 
1.2 PROBLEM DESCRIPTION 
1.3 OBJECTIVES 
1.4 SCOPE AND LIMITATIONS 
1.5 DEVELOPMENT PLAN 
LITERATURE REVIEW 
2.1 INTERNET PROTOCOLS 
2.1.1 The Layered Structure of Internet Protocols 
2.1.2 Network Layer Protocols 
2.1.3 Transport Layer Protocols 
2.1.3.1 Transmission Control Protocol (TCP) 
2.1.3.2 TCP Flow-control 
2.1.3.3 User Datagram Protocol (UDP) 
2.1.4 The Effect of Transport Protocols on Fairness 
2.2 THE NATURE OF INTERNET TRAFFIC 
2.2.1 Internet Core Traffic Volume 











2.2.3 Distribution of Internet Traffic according to Transport Protocol 
2.2.4 Distribution of Internet Traffic according to Packet Size 
2.3 OVER-PROVISIONING FOR QUALITY OF SERVICE 
2.4 INTSERV FOR QUALITY OF SERVICE 
2.4.1 The Intserv Mechanism 
2.4.2 Intserv Classes of Service 
2.5 DIFFSERV FOR QUALITY OF SERVICE 
2.5.1 The Diffserv Mechanism 
2.5.2 Diffserv Classes of Service 
2.6 DIFFSERV ARCHITECTURES 
2.6.1 The Diffserv AF Mechanism 
2.6.1.1 Diffserv AF Edge Router Behaviour and Architecture 
2.6.1.2 Diffserv AF Core Router Behaviour and Architecture 
2.6.2 Performance of the Diffserv AF Mechanism 
2.6.2.1 Shortcomings of the Diffserv AF Mechanism 
2.6.2.2 Workarounds for the Diffserv AF Shortcomings 
2.7 END-TO-END QUALITY OF SERVICE 
ROUTER ARCHITECTURE AND OPERATION 
3.1 THE ROLE OF THE ROUTER 
3.2 THE HISTORY OF ROUTER ARCHITECTURES 
3.3 PRESENT AND FUTURE ROUTER ARCHITECTURES 
3.4 SCALABILITY AND MICROFLOW-AWARENESS 
3.4.1 Limitations in Switch Fabric Technology 
3.4.2 Limitations in Port Processor Technology 
3.4.2.1 Performing Microflow Lookups 
3.4.2.2 Supporting Separate Queues for all Microflows on Port Processors 
3.5 BEST PRACTICES FOR PER-MICROFLOW QUEUING 
3.6 A HIGH-PERFORMANCE ROUTER ARCHITECTURE 
3.6.1 The Hardware Components of the MSR 
3.6.1.1 Control Processor 
3.6.1.2 ATM Switch Core 
3.6.1.3 Field Programmable Port Extender 











3.6.1.5 Line Card 
3.6.2 Booting and Configuration of the MSR 
3.6.3 The Operation of the MSR 
DESIGN CONSIDERATIONS 
4.1 MOTIVATION FOR H-MAQ 
4.2 OVERVIEW OF H-MAQ NETWORKS 
4.2.1 Edge Network Element Behaviour 
4.2.2 H-MAQ Core Network Element Behaviour 
4.2.3 Overall Network Design 
4.3 FEASIBILITY OF PROPOSED MECHANISM 
4.3.1 Signalling Feasibility of Proposed Mechanism 
4.3.2 User Traffic Feasibility of Proposed Mechanism 
4.4 SCOPE OF IMPLEMENTATION 
TECHNICAL SPECIFICATION FOR IMPLEMENTATION 
5.1 EDGE ROUTER MECHANISM 
5.2 CORE ROUTER MECHANISM 
5.2.1 H-MAQ Internal Buffer Allocation 
5.2.2 H-MAQ Data Flow 
5.2.3 H-MAQ Drop Policy 
DESIGN AND IMPLEMENTATION OF TESTBED 
6.1 CHOICE OF PLATFORM 
6.2 HARDWARE COMPONENTS 
6.3 NETWORK TOPOLOGY 
6.4 TRAFFIC SOURCES 
6.5 EDGE ROUTER 
6.6 CORE ROUTER 
6.7 TRAFFIC SINK 
6.8 DELAY EMULATOR 
6.9 DATA COLLECTION 
EVALUATIONS, RESULTS AND ANALYSIS 











7.1.1 Evaluation 
7.1.2 Results 
7.1.3 Analysis of Results 
7.2 MICROFLOW FAIRNESS - ROUND TRIP TIMES 
7.2.1 Evaluation 
7.2.2 Results 
7.2.3 Analysis of Results 
7.3 AGGREGATE FAIRNESS AND CONTROL - TRANSPORT PROTOCOL 
7.3.1 Evaluation 
7.3.2 Results 
7.3.3 Analysis of Results 
7.4 AGGREGATE FAIRNESS AND CONTROL - ROUND TRIP TIMES 
7.4.1 Evaluation 
7.4.2 Results 
7.4.3 Analysis of Results 
7.5 AGGREGATE FAIRNESS AND CONTROL - AGGREGATE SERVICE LEVEL 
7.5.1 Evaluation 
7.5.2 Results 
7.5.3 Analysis of Results 
7.6 AGGREGATE FAIRNESS AND CONTROL - NUMBER OF FLOWS PER AGGREGATE 
7.6.1 Evaluation 
7.6.2 Results 
7.6.3 Analysis of Results 
CONCLUSIONS 
8.1 MICROFLOW FAIRNESS 
8.2 AGGREGATE FAIRNESS AND CONTROL 
8.3 OVERALL PERFORMANCE OF H-MAQ 
RECOMMENDATIONS AND FUTURE WORK 
APPENDIX A: ASYNCHRONOUS TRANSFER MODE 
A.1 INTRODUCTION TO ATM 
A.2 IP OVER ATM 











B.1 RIO DEVELOPMENT 
B.2 RIO TESTING 











List of Figures 
FIGURE 1: THE OSI MODEL 
FIGURE 2: THE IPv4 HEADER FORMAT (20 BYTES TOTAL) 
FIGURE 3: THE TCP HEADER FORMAT (20 BYTES TOTAL) 
FIGURE 4: AN EXAMPLE TCP CONGESTION WINDOW PROFILE OVER TIME 
FIGURE 5: THE UDP HEADER FORMAT (8 BYTES TOTAL) 
FIGURE 6: APPLICATION BYTE VOLUMES IN 1997 AND 200112 
FIGURE 7: ROUTER ARCHITECTURE WITH FOUR RIO QUEUES 
FIGURE 8: THE PROBABILITY OF RED DROPPING AN INCOMING PACKET AS A FUNCTION OF AVERAGE BUFFER OCCUPANCY 
FIGURE 9: STAGGERED PARAMETERS - A COMPARISON OF PACKET DROP PROBABILITY GRAPHS FOR IN-PROFILE AND OUT-OF-PROFILE PACKETS 
FIGURE 10: AN END-TO-END CONNECTION WITH INTSERV, DIFFSERV, AND OVER-PROVISIONED BEST EFFORT NETWORKS 
FIGURE 11: THE SHARED PARALLEL PROCESSORS SPACE SWITCHING ROUTER ARCHITECTURE 
FIGURE 12: THE DATA PATH OF A ROUTER WITH A SWITCH FABRIC ARCHITECTURE AND DISTRIBUTED BUFFER MEMORY AND FORWARDING 
FIGURE 13: AN OVERVIEW OF THE MSR HARDWARE SHOWING THE SWITCH FABRIC, SWITCH PORTS, FIELD PROGRAMMABLE PORT EXTENDERS (FPXS), SMART PORT CARDS (SPCS) AND LINE CARDS 
FIGURE 14: A BENES TOPOLOGY - THE INTERNAL STRUCTURE OF A WUGS 
FIGURE 15: AN OVERVIEW OF THE FPX ARCHITECTURE SHOWING THE DATA PATH 
FIGURE 16: THE SPC ARCHITECTURE 
FIGURE 17: A SIMPLIFIED SPC DATA PATH 
FIGURE 18: AN EXAMPLE OF AN H-MAQ LOGICAL DATA PATH 
FIGURE 19: CONCEPTUAL TOPOLOGY OF TESTBED 
FIGURE 20: IMPLEMENTATION OF TESTBED 











FIGURE 22: THE PROPORTION OF BANDWIDTH CONSUMED BY THE TCP SOURCES WHEN THEY MAKE UP 90% OF THE COMPETING SOURCES 
FIGURE 23: THE PROPORTION OF BANDWIDTH CONSUMED BY THE DELAYED SOURCES ACCORDING TO TARGET RATE WHEN ALL PACKETS ARE IN-PROFILE 
FIGURE 24: THE PROPORTION OF BANDWIDTH CONSUMED BY THE DELAYED SOURCES ACCORDING TO POLICE RATE WHEN THE TARGET RATE IS 50 KBPS AND THE DELAY IS 200MS 
FIGURE 25: THE PROPORTION OF BANDWIDTH CONSUMED BY THE TCP AGGREGATE ACCORDING TO TARGET RATE WHEN ALL PACKETS ARE IN-PROFILE 
FIGURE 26: THE PROPORTION OF BANDWIDTH CONSUMED BY THE TCP AGGREGATE ACCORDING TO POLICE AND TARGET RATES WHEN RIO IS USED IN THE NETWORK CORE 
FIGURE 27: THE PROPORTION OF BANDWIDTH CONSUMED BY THE TCP AGGREGATE ACCORDING TO POLICE AND TARGET RATES WHEN H-MAQ IS USED IN THE NETWORK CORE 
FIGURE 28: THE PROPORTION OF BANDWIDTH CONSUMED BY THE DELAYED AGGREGATE ACCORDING TO TARGET RATE WHEN ALL PACKETS ARE IN-PROFILE 
FIGURE 29: THE PROPORTION OF BANDWIDTH CONSUMED BY THE DELAYED AGGREGATE ACCORDING TO TARGET AND POLICE RATES WHEN THE DELAY IS APPROXIMATELY 50MS AND RIO IS USED 
FIGURE 30: THE PROPORTION OF BANDWIDTH CONSUMED BY THE DELAYED AGGREGATE ACCORDING TO TARGET AND POLICE RATES WHEN THE DELAY IS APPROXIMATELY 50MS AND H-MAQ IS USED 
FIGURE 31: THE PROPORTION OF BANDWIDTH CONSUMED BY THE AGGREGATE WITH A QUARTER-SIZED SERVICE LEVEL ACCORDING TO THE TOTAL NUMBER OF TCP SOURCES 
FIGURE 32: THE PROPORTION OF BANDWIDTH CONSUMED BY THE AGGREGATE WITH FEWER MICROFLOWS ACCORDING TO THE PERCENT OF DELIVERED PACKETS THAT ARE IN-PROFILE 






























Assured Forwarding - a Diffserv service class. 
ACKnowledgement packet - a small TCP packet used to 
acknowledge receipt of a data packet. 
A collection of microflows that come from or are bound for a given 
client and form part of that client's SLA. 
ATM Port Interface Controller - a network interface chip. 
Asynchronous Transfer Mode - refer Appendix A. 
Data that is transferred or transmitted in short, uneven spurts. 
Deficit Round Robin - a fair packet scheduling mechanism. 
Diffserv Code Point - a field in the IP header of a packet that 
specifies the packet's service level in a Diffserv network. 
A router that lies on the boundary of a network domain. Edge 
routers are often responsible for admission control. 
Expedited Forwarding - a Diffserv service class. 
A network flow without hard bandwidth or delay requirements. 
First In First Out - a queuing / dequeuing policy. 
Fast IP Lookup - an efficient algorithm for finding a packet's next 
hop. 
A series of packets travelling between a TCP or UDP socket pair. 
Equivalent to microflow. 
Field Programmable Gate Array - an integrated circuit that can be 
programmed in the field after manufacture. 
Field Programmable port eXtender - an FPGA based port processor 
on the MSR. Designed primarily to perform IP lookups. 
Fair Random Early Detection - a RED based buffering mechanism. 
A link or port that can carry traffic in both directions 





























The amount of data correctly received by a traffic sink in a specified 
amount of time. Compare with throughput. 
Hierarchical multi drop-precedence queue state Microflow-Aware 
Queuing. The proposed buffer management and scheduling 
mechanism for use in core routers. 
Internet Engineering Task Force - an Internet standards body. 
The fourth version of IP. This IP version currently predominates in 
the Internet. 
A new version of IP that includes an increased address space and the 
addition of a flow label. 
Internet Service Provider - a company selling network resources. 
Large variations in the transit time for packets travelling through a 
network. Also called delay variation. 
Packets that are identical in terms of the 5-tuple: source address; 
destination address; source port; destination port and protocol. A 
flow as seen at the packet level. 
Multi Protocol Label Switching - a label-switching technology in 
which packets are appended with locally significant MPLS labels at 
the MPLS network's i 
Multi-Service Router - a router developed for research at The 
Applied Research Laboratory, Washington University. 
Maximum Transmission Unit - the largest size packet that a 
network can transmit without fragmentation. 
A computer chip that implements the CPU interface and memory 
interface. Compare with South Bridge. 
Network Time Protocol - a mechanism for synchronising computers. 
Network Time Protocol Query - a program that queries NTP servers 
about their current state. 
Open Systems Interconnect - a layered model describing protocols 
for communication systems 
The maximum allowable data rate as enforced by a policing system. 
Permanent Virtual Circuit - refer Appendix A - ATM 





























Random Early Detection - a microflow unaware active buffer 
management mechanism. 
Resource reSerVation setup Protocol - a protocol for reserving 
network resources. 
Request For Comments - an IETF standards track document. 
Random early detection with In/Out-of-profile - a microflow-unaware 
active buffer management mechanism with multi drop 
precedence levels. 
Switching Element - a component of an ATM switch fabric. 
Service Level Agreement - an agreement between a user and a 
network operator that specifies the level of service the user receives. 
A network connection that expires if not periodically refreshed. 
A computer chip that runs onboard devices such as IDE control 
and sound. Compare with North Bridge. 
Smart Port Card - a multipurpose MSR port processor. 
The rate at which a traffic source attempts to send data. 
Transmission Control Protocol - a flow-controlled transport layer 
protocol. 
The amount of data transferred across a link in a specified time. 
Compare with goodput. 
Time Sliding Window - a meter of a microflow's data rate. 
User Datagram Protocol - a non-flow-controlled transport layer 
protocol. 
Virtual Channel Identifier - refer Appendix A - ATM 
Virtual Output Queue - a queue on the input port of a router. There 
is a VOQ associated with each destination port. 
Washington University Gigabit Switch - the switching component of 











Chapter 1 
Introduction 
1.1 Background Information 
Initially, the Internet was used as a medium for data communications with 
predominantly non real-time applications such as email, newsgroups and file transfer. 
Such traffic is termed elastic as it can tolerate a fair delay before reaching the 
receiver. With the current convergence of data and telecommunications networks, as 
well as with the increase in the use of multimedia applications, there is an increasing 
need for the Internet to carry real-time traffic such as voice and video. Such traffic 
types have stringent requirements in terms of bandwidth, delay and correct delivery. 
One of the technologies that were designed to enable the Internet to provide adequate 
service levels to non-elastic traffic was Diffserv. This mechanism provides improved 
service levels to non-elastic traffic by prioritising such traffic classes. The advent of 
this technology is considered important in the Internet's development, as it will equip 









The Diffserv model was designed to maximise simplicity and scalability. Diffserv 
involves the use of a state-aware network edge that polices incoming microflows¹ on 
an individual basis. The network edge is also responsible for marking the Diffserv 
Code Point (DSCP) in the IP headers of incoming packets. The value of a packet's 
DSCP determines the level of service that it should receive in the network core. The 
network elements in the core are assumed to be unaware of individual microflows. 
These elements simply use DSCPs to determine the level of service that each packet 
should receive. Such microflow-unaware mechanisms are said to be desirable, as they 
are computationally simple. 
The Diffserv model maximises network scalability by pushing the complexity to the 
edge of the network. The rationale for this is that because data rates are highest in the 
network core, the data path in core network elements should be as simple and efficient 
as possible. 
One of the service classes offered by Diffserv is the Assured Forwarding (AF) class. 
This class has multiple drop precedence levels. This is intended to maximise 
utilization by allowing traffic sources to inject additional traffic into the network at 
the risk of a higher proportion of packets being dropped. The mechanism works as 
follows: Should a source exceed its negotiated sending rate, all offending packets are 
remarked with a higher drop precedence level at the edge of the Diffserv network. 
Should congestion occur in the core of the Diffserv network, the core network 
elements will drop the packets with the higher drop precedence levels first. By 
gambling on there not being any congestion in the network core, traffic sources may 
receive increased data rates. 
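The two steps of the AF mechanism described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the edge or core implementation used in this thesis: the names `Packet`, `police_at_edge` and `drop_in_core` are hypothetical, the edge is modelled as a simple byte budget for one interval, and the core is modelled as a deterministic capacity limit rather than a probabilistic mechanism such as RIO.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    size_bytes: int
    drop_precedence: int = 0  # 0 = in-profile; higher values are dropped first


def police_at_edge(packets, negotiated_bytes):
    """Re-mark packets that exceed the negotiated byte budget for one interval.

    Assumption: the negotiated sending rate is modelled as a byte budget
    over a single measurement interval.
    """
    sent = 0
    for packet in packets:
        sent += packet.size_bytes
        if sent > negotiated_bytes:
            packet.drop_precedence = 1  # offending packet: higher drop precedence
    return packets


def drop_in_core(packets, capacity_bytes):
    """Keep packets up to the available capacity, lowest drop precedence first.

    Assumption: congestion is modelled as a fixed capacity for the interval.
    sorted() is stable, so arrival order is preserved within each level.
    """
    kept, used = [], 0
    for packet in sorted(packets, key=lambda p: p.drop_precedence):
        if used + packet.size_bytes <= capacity_bytes:
            kept.append(packet)
            used += packet.size_bytes
    return kept


# A source sends 5000 bytes against a 3000-byte budget.
marked = police_at_edge([Packet(1000) for _ in range(5)], negotiated_bytes=3000)
congested = drop_in_core(marked, capacity_bytes=4000)
uncongested = drop_in_core(marked, capacity_bytes=10000)
```

With no congestion all five packets are delivered, so the source gains from exceeding its rate. With congestion, the packets lost are those that were re-marked at the edge.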
It has been recommended that active queuing mechanisms such as RIO (Random 
early detection with In/Out-of-profile) be used to implement the Diffserv AF class in 
core routers. Such mechanisms are considered desirable, as 
• They are computationally simple. 
¹ The term microflow, as used in this thesis, refers to a correlated series of packets that consist of the 
five-tuple: source address, destination address, protocol number, source port and destination port. This 










• They limit average queue size, 
• They handle multiple drop precedence levels, and 
• They provide better fairness than simple drop-tail queues. 
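The drop decision of a RIO-style mechanism can be sketched as follows. This is a minimal sketch, not the RIO implementation evaluated in this thesis. It assumes the commonly described form of RIO, in which in-profile packets are judged against the average queue of in-profile packets and out-of-profile packets against the average total queue, each with its own RED parameter set. All threshold and probability values below are illustrative assumptions, and the function names are hypothetical.

```python
def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Linear RED drop curve as a function of average buffer occupancy."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


def update_average(avg, instantaneous, weight=0.002):
    """Exponentially weighted moving average of the queue length."""
    return (1 - weight) * avg + weight * instantaneous


# Staggered parameters (assumed values): out-of-profile packets start being
# dropped earlier, and more aggressively, than in-profile packets.
IN_PARAMS = {"min_th": 40, "max_th": 70, "max_p": 0.02}
OUT_PARAMS = {"min_th": 10, "max_th": 30, "max_p": 0.10}


def rio_drop_probability(avg_in, avg_total, in_profile):
    """Select the RED curve and the queue average according to the marking."""
    if in_profile:
        return red_drop_probability(avg_in, **IN_PARAMS)
    return red_drop_probability(avg_total, **OUT_PARAMS)
```

The sketch keeps no per-microflow or per-aggregate state: the decision depends only on a packet's marking and on two running averages, which is why such mechanisms are computationally simple.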
1.2 Problem Description 
Although the use of microflow-unaware active queuing mechanisms, such as RIO, has 
been recommended for the Diffserv AF implementation, such mechanisms have been 
found to have a number of drawbacks. These are listed below. 
• There is unfairness between competing microflows as well as competing 
aggregates² when they differ in terms of: 
• Transport protocol makeup (TCP or UDP), 
• Average round trip time, and 
• Average packet size. 
Because the Internet's traffic makeup is so very diverse, any unfairness relating to 
traffic type will have very real repercussions. 
• There is unfairness between aggregates with different allowable data rates. This is 
explained as follows: At the edge of the Diffserv network packets are policed 
according to their aggregate's Service Level Agreement (SLA). Once in the core 
of the Diffserv network, identically marked packets from aggregates with higher 
service levels are treated the same as those from aggregates with lower service 
levels. This causes unfairness because, for example, excess network capacity is 
not allocated according to the SLAs of the competing aggregates. This can put the 
aggregates with higher SLAs at a disadvantage. 
• There is unfairness between competing aggregates that have the same SLAs, but 
differ in terms of the number of microflows that they contain. For example, two 
companies may have identical SLAs with an Internet Service Provider. These 
SLAs would specify the company's network connectivity at aggregate level. If 
one of the companies has more microflows than the other, the company with fewer 
microflows will often be disadvantaged. 
² An aggregate is a collection of microflows that come from or are bound for a given client and form 
part of that client's SLA. 










• The use of mechanisms such as RIO has been shown to result in instability when 
capacities are high and round trip times are long. This causes jitter, packet losses, 
under-utilisation and reduced responsiveness. 
In addition to the intrinsic unfairness, these factors result in the behaviour of the 
Diffserv network core not being predictable, but being affected by a number of factors 
relating to the traffic being carried. This means that although Diffserv does provide 
relative service differentiation, it is unable to provide quantifiable service levels 
without being used in conjunction with network over-provisioning. The value of 
providing relative, rather than quantifiable service levels, is questionable as SLAs are 
specified in absolute, not relative terms. Furthermore, because the Diffserv core is not 
aware of competing aggregates, it is not possible to explicitly provision for them. This 
makes the precise control of Diffserv networks an impossible task. 
1.3 Objectives 
Recent advances in router technology enable core network elements to provide
microflow and aggregate-aware buffer management and scheduling on high-speed
network ports. This obviates the need for Diffserv networks to use simple
microflow-unaware mechanisms such as RIO in Diffserv network cores. This study explores the
degree to which the performance of Diffserv networks implementing the AF class of
service may be improved by the use of a microflow, aggregate and DSCP-aware
buffer management mechanism in core network elements.
The need for such a buffer management mechanism is motivated and a specification is
provided. The specified mechanism, referred to as H-MAQ or Hierarchical multi
drop-precedence queue state Microflow-Aware Queuing, was evaluated on a testbed.
The results of the evaluations are given. Using these results, this study compares the
performance of a Diffserv network where H-MAQ is implemented in the core with
that where RIO is implemented. The benefits of using H-MAQ rather than a










1.4 Scope and Limitations 
This study considers the performance of Diffserv networks according to whether the
buffer management mechanisms in core network elements are microflow and
aggregate-aware or not. A microflow, aggregate and DSCP-aware mechanism is
motivated and specified. Thereupon its performance is compared with that of a
microflow and aggregate-unaware mechanism. This comparison is made based on a
series of comparative evaluations that were performed on a network testbed. The
comparisons are made using factors such as fairness and controllability as indices.
Employing H-MAQ instead of RIO would in general require the addition of a number
of signalling paths. These would be needed to allocate resources to aggregates on
internal links of the network. These signalling paths are, however, not considered in
this study. Only the functionality of the network elements with regard to user traffic is
considered. Further, only the Diffserv AF class of service is considered.
This study is concerned with IP network functionality only. Little attention is given to
lower layer protocols. In particular, no attention is given to the synergies that exist
when quality of service-aware protocols such as ATM are used in conjunction with
IP.
1.5 Development Plan 
The remainder of this document is organised as follows: 
• Chapter 2 provides the reader with the necessary background information on a
number of topics. It begins with a discussion on Internet protocols. The nature of
Internet traffic is investigated next. The chapter continues with a discussion of
three mechanisms for implementing improved quality of service. These are
over-provisioning, Intserv and Diffserv respectively. The Diffserv AF class of service
is examined in greater detail next. A section follows this on the interoperation of
quality of service mechanisms to provide end-to-end quality of service.
• Because of the relative complexity of H-MAQ, and the fact that it is run on
routers, it is necessary to analyse router architecture in detail. Chapter 3 begins
with an overview of the role of the router in IP networks. A description of past,










the capabilities of current routers. There is a discussion on router buffer
management and scheduling best practice. Finally, the architecture of the gigabit
router used in the author's implementation is described in detail.
• Chapter 4 describes the design considerations of H-MAQ. It motivates the use of a
microflow and aggregate-aware mechanism by noting the performance limitations
of using mechanisms such as RIO to implement the Diffserv AF core. Thereafter,
the overall design of H-MAQ is considered. This is followed by a discussion on
H-MAQ's feasibility. Finally, the scope of the author's implementation is given.
• Chapter 5 provides a technical specification for the implementation of H-MAQ.
This is done in two parts. Firstly, the modifications required on the edge routers
are discussed, whereupon the implementation of the H-MAQ core is specified.
• Chapter 6 describes the design and implementation of the testbed used to evaluate
H-MAQ and compare its performance with that of RIO. The testbed's topology,
hardware implementation, traffic sources, router mechanisms, and data collection
mechanism are all described here.
• Chapter 7 describes the evaluations that were performed on the testbed. It also
describes the results of the evaluations and gives an analysis of the results.
• Chapter 8 gives the conclusions drawn from the evaluations as applicable to
Diffserv AF network performance in general.
• Chapter 9 makes recommendations for future work.
• Finally, two appendices give details on ATM as well as on the development and










Chapter 2 
Literature Review 
2.1 Internet Protocols 
The following section introduces the Open Systems Interconnect (OSI) model for
describing the hierarchical protocol stack of computer networks. Individual Internet
protocols are then considered with reference to this model.
2.1.1 The Layered Structure of Internet Protocols 
The task of allowing applications that are hosted on separate machines to
communicate is a complex one. Because of this inherent complexity, it is useful to use
a layered model to describe the process. This approach facilitates the abstraction of
detail at many levels. Although the OSI model has been criticised for being too rigid,
it will be reviewed here as it provides a context for a description of the Internet's












[Figure 1 diagram: two identical seven-layer stacks, one per end-system (Layer 7: Application, Layer 6: Presentation, Layer 5: Session, Layer 4: Transport, Layer 3: Network, Layer 2: Data Link, Layer 1: Physical), connected by the Physical Link]
Figure 1: The OSI model 
Figure 1 demonstrates that identical layered architectures are duplicated on both the
transmitting and the receiving end-systems. The diagram illustrates that when an
application on one end-system needs to send data to that on another end-system, the
data is initially sent down the stack to the Physical Layer. The data is then sent to the
receiving end-system before being translated back up the stack. The following points
describe the role of each layer on the OSI stack [1].
• The Application Layer consists of the user applications that need to be connected.
This refers to applications such as web browsers and FTP.
• The Presentation Layer translates data from the application's format to a format
understood by the network and vice versa.
• The Session Layer is responsible for establishing, managing, and terminating
connections between applications.
• The Transport Layer provides transparent transfer of data between end systems.
This includes error recovery and flow-control.
• The Network Layer addresses the routing and switching issues required to create a
logical circuit between the end-systems.
• The Data Link Layer is concerned with the data being encoded into bits. This layer











• The Physical Layer consists of the physical medium, be it air, fibre or wire. 
2.1.2 Network Layer Protocols 
The Internet Protocol (IP) operates at the network layer of the OSI protocol stack. 
When a packet is passed down to the IP Layer, an IP header is appended to the packet. 
This header contains information relating to the packet such as the source and the 







[Figure 2 diagram: IPv4 header bit layout (bits 0 to 31); recoverable fields: Protocol Version, Header Length, DSCP, Unused, Total Length, Packet ID, Flags, Fragment Offset]
Figure 2: The IPv4 header format (20 bytes total) 
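As an illustration of the header layout referred to in Figure 2, the following minimal sketch unpacks the fixed 20-byte part of an IPv4 header. The function name, the dictionary keys and the example addresses are hypothetical; the field widths follow the standard IPv4 layout, with the former Type of Service byte split into the six-bit DSCP and two unused bits.

```python
import struct


def parse_ipv4_header(header: bytes) -> dict:
    """Unpack the fixed 20-byte IPv4 header (options are not handled)."""
    (ver_ihl, tos, total_length, packet_id, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack(
        "!BBHHHBBH4s4s", header[:20])
    return {
        "version": ver_ihl >> 4,
        "header_length_bytes": (ver_ihl & 0x0F) * 4,
        "dscp": tos >> 2,            # upper six bits of the old ToS byte
        "unused": tos & 0x03,        # lower two bits
        "total_length": total_length,
        "packet_id": packet_id,
        "flags": flags_frag >> 13,
        "fragment_offset": flags_frag & 0x1FFF,
        "ttl": ttl,
        "protocol": protocol,
        "checksum": checksum,
        "source": ".".join(str(b) for b in src),
        "destination": ".".join(str(b) for b in dst),
    }


# Hypothetical header: version 4, 20-byte header, DSCP 10, protocol 6 (TCP).
example = struct.pack("!BBHHHBBH4s4s", 0x45, 10 << 2, 40, 1, 0, 64, 6, 0,
                      bytes([192, 0, 2, 1]), bytes([198, 51, 100, 7]))
fields = parse_ipv4_header(example)
print(fields["dscp"], fields["source"], fields["destination"])
```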
A new version of IP, namely IPv6 is currently on the IETF standards track [2]. Two 
notable differences between IPv4 and IPv6 follow. 
• IPv6 has a far larger address space than IPv4. 
• The IPv6 packet header includes a flow label. The flow label, together with the 
source address, may be used by network elements to uniquely identify traffic 
sources [3]. This is beneficial as it means that higher-layer protocol headers need 
not be examined at routers for flows to be uniquely identified. 
IP provides a datagram service to the protocols above it. By a datagram service, it is 
meant that the IP service has the following characteristics: 
• IP is a connectionless packet delivery service. This means that no IP layer 
connection set-up procedure occurs before data transmission and that the 
communicating end-systems don't maintain one another's state. 
• IP is unreliable. This means that IP senders have no verification mechanisms to 











IP does not include any flow-control mechanisms. These are implemented in the 
layers above IP. 
2.1.3 Transport Layer Protocols 
Both Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) are 
found above the Network Layer. Although the position of these protocols corresponds 
to the Transport Layer of the OSI Model, only TCP provides error recovery and flow-
control as specified by the OSI model. However, because UDP is considered a peer to 
TCP, it will be included in this section too. 
2.1.3.1 Transmission Control Protocol (TCP) 
TCP provides an end-to-end connection abstraction. This means that TCP hides 
transport details such as multiplexing and error recovery from the above layers. 
When TCP receives data from the layer above it, it splits that data into packets. Each
packet gets a TCP header appended to it. The TCP header format is illustrated in 
Figure 3. 
Source Port | Destination Port
Sequence Number
Acknowledgment Number
Data Offset | Header Information | Window
Checksum | Urgent Pointer
Figure 3: The TCP header format (20 bytes total) 
Data packets are transmitted using connection orientated data transfers. A connection 
must thus be negotiated between two communicating parties before data may be sent. 
TCP is said to provide guaranteed delivery. This means that should a TCP packet be 
lost in the network, the TCP sender will re-send it until the packet is delivered. This is 
achieved as follows: Before transmission, a TCP sender saves a local copy of all 
packets. It then transmits them using the unreliable IP datagram service described in 











it. If the receiver acknowledges that it has got a packet, the transmitter may delete its 
local copy. 
Because TCP provides an in-order service, if a packet is lost in a network, no
subsequent packets may be delivered to the above layers of the receiver until the lost 
packet has been retransmitted by the sender and received by the receiver. 
2.1.3.2 TCP Flow-control 
In a scenario where data is being transferred from a sender to a receiver, there is
two-way communication between the two end points. The sender transmits data packets to
the receiver. The receiver then signifies receipt of the data packets by sending
acknowledgement packets (ACKs) back to the sender.
TCP senders use a congestion window to determine the maximum number of packets
that may be in the network at any given time. Whenever a TCP sender receives an
acknowledgement it knows the acknowledged packet has been received and is no
longer in transit on the network. It is thus safe for the sender to transmit another
packet. In a scenario where there are few senders and the network is not congested,
senders may safely be allowed a large congestion window. However, if the network is
approaching congestion, the senders should reduce their congestion windows so that
the total number of packets in the network is reduced. TCP senders receive feedback
about the amount of congestion in the network by monitoring dropped packets.
TCP's sliding window mechanism will now be considered in detail. When a
connection is first established, the congestion window for that connection is set to one
packet. TCP then enters an initial phase of multiplicative congestion window increase.
First, the one packet is transmitted. Upon the receipt of that packet's ACK, the
congestion window is doubled. Two packets are transmitted. Upon receipt of these
two ACKs the congestion window is again doubled. This multiplicative increase
continues until a packet is lost in the network. When a packet is lost, the receiver
indicates this by sending duplicate ACKs to the source upon the receipt of each packet
subsequent to the lost packet. When the source receives these, it deduces which
packet was lost and re-transmits the lost packet. Because the loss of a packet indicates











From this point on, the source goes into congestion avoidance mode. Here the TCP 
window size increases additively until another packet is lost. When this happens, the 
window size is again halved. In the case of multiple packet losses, it is possible that a 
source will not receive any ACKs. If this happens, the source is forced to wait for a 
timeout to occur before it can resend the packets. During this wait, the congestion 
window size is reduced to one packet. This occurrence is very damaging to TCP 
throughput. Figure 4 shows a possible TCP congestion window profile over time. 
[Figure 4 plot: TCP Congestion Window Over Time, with phases labelled Initial Increase, Single Packet Loss, Congestion Avoidance, Single Packet Loss, Congestion Avoidance]




Figure 4: An example TCP congestion window profile over time 
This demonstrates that with the exception of the start of a new connection, TCP uses a 
linear increase and an exponential decrease when manipulating its congestion 
window. The exponential decrease is necessary because congestion grows 
exponentially in networks. Figure 4 also demonstrates that TCP congestion control 
results in bursty traffic profiles. 
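The window behaviour described in this section can be reproduced with a short simulation. This is a minimal sketch under simplifying assumptions: the window is counted in whole packets, it is updated once per round trip, single losses occur only in the rounds listed, and timeouts are not modelled. The function and parameter names are hypothetical.

```python
def congestion_window_profile(rounds, loss_rounds):
    """Return the congestion window (in packets) used in each round trip.

    The window doubles every round trip until the first loss, is halved
    on each loss, and grows by one packet per round trip thereafter.
    """
    cwnd = 1
    congestion_avoidance = False
    profile = []
    for rtt in range(rounds):
        profile.append(cwnd)
        if rtt in loss_rounds:
            cwnd = max(1, cwnd // 2)      # halve on a single packet loss
            congestion_avoidance = True
        elif congestion_avoidance:
            cwnd += 1                     # additive increase
        else:
            cwnd *= 2                     # initial multiplicative increase
    return profile


profile = congestion_window_profile(12, loss_rounds={4, 9})
print(profile)  # [1, 2, 4, 8, 16, 8, 9, 10, 11, 12, 6, 7]
```

The resulting saw-tooth shape is the source of the bursty traffic profiles noted above.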
The increase in TCP sending rates is affected by the connection's round trip time. 
This is so because TCP sources need to wait for the receipt of ACK packets before 
sending further data. If the round trip time for a connection is long, the increased 
delay in receiving the ACKS results in the connection ramping up its sending rates 











2.1.3.3 User Datagram Protocol (UDP) 
By re-transmitting packets that are lost or corrupted in the network, TCP provides a 
guaranteed service. There are, however, applications for which this service is not 
desirable. An example of this is real-time voice transfer. When a real-time voice 
stream is being sent across an IP network, the stream can tolerate the loss of a certain 
proportion of the packets without significant degradation of sound quality. If a 
network transmitting a voice stream loses a packet, it is preferable for the receiver to 
compensate for the lost packet rather than wait for it to be re-sent by the source. This 
is because the time required for the packet to be re-sent is often prohibitively high. 
One can use UDP as an alternative to TCP for time-dependent applications. UDP is a 
datagram protocol. As such it provides a connectionless, non-guaranteed service. 
Figure 5 shows the UDP header. 
Source Port Destination Port 
Length Checksum 
Figure 5: The UDP header format (8 bytes total) 
It is clear from its header that UDP is a very simple protocol. Should a source need to 
send data, it splits the data into packets, adds the UDP header and sends the packets 
down to the IP layer. The UDP receiver does not acknowledge the receipt of packets. 
UDP does not include any form of flow-control or congestion avoidance. Users are 
thus able to send as much data into networks as they wish. Sending more packets into 
a network than the network can handle does, of course, result in high packet losses. 
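The simplicity of UDP can be seen by building the 8-byte header of Figure 5 directly. The sketch below is illustrative only: the function name, port numbers and payload are hypothetical, and the checksum is left at zero, which in UDP over IPv4 signals that no checksum was computed.

```python
import struct


def build_udp_datagram(source_port, destination_port, payload, checksum=0):
    """Prefix a payload with the four 16-bit UDP header fields."""
    length = 8 + len(payload)  # the length field covers header and data
    header = struct.pack("!HHHH", source_port, destination_port,
                         length, checksum)
    return header + payload


datagram = build_udp_datagram(5004, 5004, b"voice sample")
source_port, destination_port, length, checksum = struct.unpack(
    "!HHHH", datagram[:8])
print(source_port, destination_port, length, checksum)
```

No sequence number, acknowledgement or window field is present, which is why the protocol cannot itself detect loss or slow down a sender.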
2.1.4 The Effect of Transport Protocols on Fairness 
The previous sections describe how although TCP includes a flow-control 
mechanism, UDP contains none. This section considers link fairness in the light of 
transport layer flow-control mechanisms. 
In general, routers receive packets and send them out on links. In the case where a
given link's available bandwidth is limited, the router should share the link's available











sources into microflows and sharing the output link bandwidth between them.
Microflows are packets that are identical in terms of the 5-tuple: source address;
destination address; source port; destination port and protocol. This is the finest
granularity at which routers are able to identify similarity between packets.
Microflows are equivalent to the transport layer flows mentioned in Section 2.1.3.
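A minimal sketch of 5-tuple classification follows. The `Packet` record, the function names and the example addresses are hypothetical and serve only to show how packets sharing the same 5-tuple are grouped into one microflow.

```python
from collections import defaultdict, namedtuple

Packet = namedtuple(
    "Packet", "src_addr dst_addr src_port dst_port protocol size")


def microflow_key(packet):
    """The 5-tuple that identifies the microflow a packet belongs to."""
    return (packet.src_addr, packet.dst_addr,
            packet.src_port, packet.dst_port, packet.protocol)


def group_into_microflows(packets):
    """Group packets by their 5-tuple."""
    flows = defaultdict(list)
    for packet in packets:
        flows[microflow_key(packet)].append(packet)
    return dict(flows)


packets = [
    Packet("10.0.0.1", "10.0.1.1", 40000, 80, "TCP", 1500),
    Packet("10.0.0.1", "10.0.1.1", 40000, 80, "TCP", 44),
    Packet("10.0.0.1", "10.0.1.1", 40001, 80, "TCP", 1500),
    Packet("10.0.0.2", "10.0.1.1", 5004, 5004, "UDP", 200),
]
flows = group_into_microflows(packets)
print(len(flows))  # 3
```

Note that two connections between the same pair of hosts count as separate microflows because their source ports differ.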
A theoretically fair link-sharing mechanism may be implemented on a router by
providing a separate queue for each microflow present. Arriving packets are placed on
their corresponding queue. A deficit round robin scheduler serves the queues. The
real life fairness of such a mechanism is, however, strongly affected by the transport
protocols that are run on the end stations. The resultant unfairness when traffic
sources with different transport layer protocols compete for resources is described
below.
Should a TCP packet be dropped due to queue overflow, that packet's source will
subsequently reduce its data rate by at least half. This is due to the transport layer
flow-control of TCP. Should a UDP packet be dropped because of queue overflow,
the sending rate of the source will be unaffected. This is because UDP has no
flow-control. On the contrary, many UDP application sources increase their sending rate
when packets are dropped to provide better forward error correction.
In the above example, even though both the TCP and the UDP sources were treated
fairly at the packet level, their resultant data rates were far from fair. Transport layer
protocols thus make providing fine granularity fairness to microflows a near
impossible task. This is due to large differences in how TCP and UDP respond to
packet losses and TCP's coarse-grained response to packet losses. The fact that TCP
flow-control is affected by round trip times also limits microflow fairness.
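The per-microflow queuing with a deficit round robin scheduler mentioned earlier in this section can be sketched as follows. This is a textbook form of deficit round robin written for illustration, not the scheduler used in this study; the function name, the quantum and the queue contents are hypothetical.

```python
from collections import deque


def deficit_round_robin(queues, quantum, max_rounds):
    """Serve per-microflow queues of packet sizes (bytes) in DRR order.

    Each backlogged queue earns `quantum` bytes of credit per round and
    may send packets for as long as its accumulated credit covers them.
    Returns the list of (flow_id, packet_size) in service order.
    """
    deficits = {flow: 0 for flow in queues}
    sent = []
    for _ in range(max_rounds):
        for flow, queue in queues.items():
            if not queue:
                deficits[flow] = 0        # idle flows keep no credit
                continue
            deficits[flow] += quantum
            while queue and queue[0] <= deficits[flow]:
                size = queue.popleft()
                deficits[flow] -= size
                sent.append((flow, size))
            if not queue:
                deficits[flow] = 0
    return sent


# One microflow sends large packets, the other sends small ones.
queues = {"a": deque([1500] * 4), "b": deque([500] * 12)}
sent = deficit_round_robin(queues, quantum=1500, max_rounds=4)
bytes_sent = {}
for flow, size in sent:
    bytes_sent[flow] = bytes_sent.get(flow, 0) + size
print(bytes_sent)  # {'a': 6000, 'b': 6000}
```

Both microflows receive the same number of bytes despite their different packet sizes. As the text explains, this byte-level fairness does not translate into fair end-to-end data rates once the sources react differently to loss.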
2.2 The Nature of Internet Traffic 
In order to perform relevant evaluations of buffer management and scheduling
mechanisms, it is necessary for testbed conditions to model as closely as possible
those experienced in networks. It is necessary to consider the nature of











carriers release such data. Using the data that was found, this section describes the
volume and composition of Internet traffic. It also discusses how Internet traffic may
change in the future.
When characterising Internet traffic, the following attributes should be considered:
• The volume of traffic and the number of concurrent microflows on core links.
• The application types that source the traffic.
• Transport Layer protocols (TCP or UDP).
• Predominant packet sizes.
Each of these topics will be dealt with individually in the sections below.
2.2.1 Internet Core Traffic Volume 
The growth in Internet usage has been described as explosive. Some have even
claimed that Internet usage has been doubling every three months. Andrew Odlyzko
debunked these claims in late 2000 as being nothing more than a pitch aimed at
selling network hardware. Odlyzko stated that growth rates have levelled off at
approximately 100% per year [4]. This suggests compliance to Moore's law.
Overestimations of the Internet's growth rate resulted in companies increasing their
network capacities greatly. The realisation that network capacity far outstripped
demand was one of the primary factors leading to the Telecom Crash in 2000.
Subsequent to the crash, increases in network capacity have slowed dramatically.
TeleGeography, Inc. found that total growth of international link bandwidth had
shrunk to less than 4( during 2001 and 2002 [5]. This reduction in the growth of
capacity suggests that service providers are waiting for demand to catch up with the
supply of bandwidth.
In terms of Internet core link capacity, Internet service providers currently tend to use
links of up to Gigabits per second (OC-48) [6,7]. Such trunks can carry thousands
to hundreds of thousands of concurrent microflows [8,9].
2.2.2 Application SouI'ces of Internet Traffic 
Figure 6 presents the application byte composition for selected applications using two











1997 data was collected by MCI along one of their trunks [10]. The 2001/2 data was
found at the CAIDA (Cooperative Association for Internet Data Analysis) web site [11] and
provides a breakdown of the traffic composition at the San Diego Network Access
Point over a year beginning on 14 March 2001.
[Figure 6 charts: 1997 Byte Volume and 2001/2 Byte Volume; recoverable labels: NNTP < 1%, DNS < 1%, RealPlayer < 1%]
Figure 6: Application byte volumes in 1997 and 2001/2
Figure 6 demonstrates a marked change in the application byte-composition between
the samples taken in 1997 and in 2001/2. This signifies that the distribution of
application traffic found on the Internet is dynamic and reflects the public's shifting
favour. A notable observation is that a large number of new applications seem to be
displacing existing applications such as HTTP. Also noteworthy is that in 2001 the
amount of real-time traffic remained modest.
Quantitatively predicting future application traffic compositions is not possible. It is,
however, plausible that should the Internet's quality of service improve to the extent
where real-time traffic can be adequately carried, there will be a rapid uptake of
services such as Voice Over IP and video-conferencing. This will markedly increase











2.2.3 Distribution of Internet Traffic according to Transport Protocol 
During 1997, MCI collected traffic data from their trunks [10]. The data collected
demonstrated that 95% of the byte composition was TCP traffic and that the
remaining 5% was UDP. A report of traffic at the NASA Ames Internet exchange
between May 1999 and March 2000 had an identical transport protocol byte
composition [12].
In summary, the majority of Internet traffic is TCP and the proportion of TCP to UDP
traffic remained constant between the years 1997 and 2000. The transport protocol
distribution is governed by the mix of applications. Should there be an increase in
the amount of real-time applications in the future, it is likely that the proportion of
UDP traffic will increase. This point is explained in Section 2.1.3.3.
2.2.4 Distribution of Internet Traffic according to Packet Size 
During March 1998, data collected at FIX West demonstrated that almost half of the
packets were less than 44 bytes, 18% were between 552 and 576 bytes, and almost
18% were 1500 bytes. Although there were many 44-byte packets, they only formed a
5% byte volume whereas the 1500-byte packets made up more than half of the byte
volume [13]. A report of traffic at the NASA Ames Internet exchange during
February 2000 had a similar packet size distribution [12].
The 44-byte packets may be attributed to ACK packets and small packets such as
Telnet packets. Packets of approximately 576 bytes come from TCP stacks that do not
implement Maximum Transmission Unit (MTU) discovery and 1500-byte packets
come from Ethernets [14].
2.3 Over-Provisioning for Quality of Service 
One proposed mechanism for providing quality of service in networks is to simply
over-provision a best effort network. In this model, Internet Service Providers (ISPs)
ensure that the sustainable throughput of a given link or exchange is greater than the
maximum expected demand. Such a network would theoretically never have any











This model for providing quality of service does, however, have the following
shortcomings.
• It is not possible for networks to be transparent at all times. Equipment failures,
inaccurate forecasts, and unexpected traffic surges can all result in situations where
demand exceeds the available capacity [15]. Under such conditions, there is no
mechanism for prioritising non-elastic microflows.
• There is a tendency for ISPs to overbook in an effort to improve network
utilization and increase profits. That ISPs will overbook by factors of 5 to 10 is to
be expected [15]. The extent of overbooking can be expected to increase in future.
This is because the global growth in Internet traffic is currently far higher than the
growth in network capacity. This is quantified in Section'
• The model of best effort with over-provisioning is an inefficient use of network
resources. This is so because all traffic, whether it is elastic or not, is treated
identically. By rather prioritising the non-elastic traffic, it is possible to realise
acceptable network performance for all traffic with far greater utilization of
network capacity.
For the reasons stated above, it can be argued that over-provisioning does not provide
a desirable solution to the quality of service problem.
2.4 Intserv for Quality of Service 
One of the first mechanisms proposed for providing tiered quality of service in IP
networks was Intserv. The Internet Engineering Task Force (IETF) Intserv Working
Group specified Intserv [16].
2.4.1 The Intserv Mechanism
The Intserv model allows users to make end-to-end reservations. This ensures that the
network can provide the required service to the users. The mechanism works
hand-in-hand with Resource ReSerVation setup Protocol (RSVP) [17]. Should an application
require a given end-to-end service, it must first make a reservation using RSVP. In
RSVP, a path message is sent from the source to the recipient. The path message
primes each Intserv router along the path to expect a reservation message. Upon











source along the same path. As the message moves upstream, each network element
makes a reservation for that particular microflow. If the resources aren't available, the
request is denied. Conversely, if the resources are available on all routers along the
path, a soft-state connection is created and data may be sent.
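The hop-by-hop admission decision described above can be sketched as follows. This is a simplified model rather than an RSVP implementation: the path message is not modelled, soft-state refreshes are ignored, and a denied request is assumed to release immediately any capacity already reserved at earlier hops. The function name, router names and capacities are hypothetical.

```python
def reserve_along_path(path, available, requested_rate):
    """Walk from the recipient back to the source, reserving at each router.

    `path` lists routers from source to recipient; `available` maps each
    router to its spare capacity. Returns True if every router admits the
    request; otherwise rolls back partial reservations and returns False.
    """
    reserved = []
    for router in reversed(path):          # reservation travels upstream
        if available[router] < requested_rate:
            for done in reserved:          # simplification: immediate release
                available[done] += requested_rate
            return False
        available[router] -= requested_rate
        reserved.append(router)
    return True


capacity = {"r1": 10, "r2": 4, "r3": 10}
first = reserve_along_path(["r1", "r2", "r3"], capacity, 3)
second = reserve_along_path(["r1", "r2", "r3"], capacity, 3)
print(first, second, capacity)
```

Even in this reduced form, every router on the path holds per-microflow state, which is the root of the scalability concern discussed next.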
Intserv is not considered to be an adequate solution for providing Internet scale
quality of service, as there are scalability concerns. Should Intserv be implemented in
the Internet, core routers would be required to perform RSVP resource allocation, to
store per-microflow state and to perform per-microflow queuing for a large number of
microflows. Providing this functionality for a large number of microflows is
considered excessively demanding in terms of computational resources.
2.4.2 Intserv Classes of Service
There are two main service classes that are provided by Intserv networks. These are
Guaranteed Service and Controlled Load Service. The Guaranteed Service class is
suitable for applications with strict real-time delivery requirements. This class of
service provides an assured level of bandwidth and a firm maximum end-to-end delay
guarantee. The Controlled Load Service class provides no firm quantitative
guarantees. The service provided by the network is equivalent to that experienced by a
best effort microflow on a lightly loaded network. This service is suitable for
applications that can tolerate a small amount of loss and delay.
2.5 Diffserv for Quality of Service 
The IETF Diffserv Working Group (concluded in March 2003), specified Diffserv as
a mechanism for providing tiered quality of service in IP networks [18].
2.5.1 The Diffserv Mechanism
Diffserv was proposed as a solution to Intserv's scalability problem. Diffserv solves
this problem by introducing traffic aggregation. The mechanism works as follows: If
the packets of a given microflow are destined for a Diffserv network, then the packets
are classified and marked before entering the network. This can happen at the traffic
source or at one of the Diffserv network's edge routers. The packets are marked











in the packet's Diffserv Code Point (DSCP), which is stored in the IP header. The six
bit DSCP, together with two currently unused bits, replaces the eight bit IPv4 Type of
Service field [19]. Refer to Figure 2 for an illustration of the IP header. Core routers in
Diffserv networks only have to consider the DSCP of a packet to decide what service
that packet should receive. A router's response to a DSCP is known as its "per hop
behaviour." Employing the mechanism described above, relative differential services
may be provided. In order to further improve the performance of Diffserv, it is
necessary for policing to be performed at the edges of the Diffserv network. By
limiting the traffic entering a Diffserv network and providing class-aware scheduling
in the core, it is possible to provide bounded relative quality of service across Diffserv
networks.
Should a client wish to use the services of a Diffserv network, it negotiates a Service
Level Agreement (SLA) with that network. It may then begin sending data into the
network. At the border of the network, the traffic coming from that client is policed
according to its SLA. This can involve dropping offending packets, or marking them
so that they are the first to be dropped in the event of network congestion.
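The text does not prescribe a particular policing algorithm. As one common possibility, the sketch below uses a token bucket to decide whether a packet conforms to a contracted rate; the class name, the parameters and the verdict strings are hypothetical, and the token bucket itself is an assumption made for illustration.

```python
class TokenBucketPolicer:
    """Police a traffic stream against a contracted rate and burst size."""

    def __init__(self, rate_bytes_per_s, bucket_bytes):
        self.rate = rate_bytes_per_s
        self.capacity = bucket_bytes
        self.tokens = bucket_bytes        # the bucket starts full
        self.last_time = 0.0

    def police(self, arrival_time, packet_bytes, drop_excess=False):
        """Return the verdict for a packet arriving at `arrival_time`."""
        elapsed = arrival_time - self.last_time
        self.last_time = arrival_time
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return "in-profile"
        # Offending packets are either dropped or marked for early discard.
        return "dropped" if drop_excess else "marked"


policer = TokenBucketPolicer(rate_bytes_per_s=1000, bucket_bytes=1500)
verdicts = [policer.police(t, 1000) for t in (0.0, 0.1, 0.2, 1.5)]
print(verdicts)  # ['in-profile', 'marked', 'marked', 'in-profile']
```

The two packets that arrive in quick succession exceed the contracted rate and are marked, while the packet arriving after a pause again finds enough tokens.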
The Diffserv model assumes that aggregate and microflow-aware policing and buffer
management are possible on the edge routers of the network but not in the core. This
is based on the assumption that the traffic volume in the core is prohibitively high.
Because aggregate and microflow-unaware buffer management and scheduling is
used in the core, the extent to which the network engineer can control and monitor the
network's performance is severely limited. The network's core behaves, in many
respects, like a black box. As a result of this, for adequate levels of service to be
achieved, it is necessary for an appreciable amount of over-provisioning to take place.
Refer to Section 2.3 for comments relating to this practice.
2.5.2 Diffserv Classes of Service 
The IETF defines the Expedited Forwarding (EF) and the Assured Forwarding (AF)
classes of service. EF provides a virtual leased line service. It was designed for traffic
that requires strict bounds on delay as well as guaranteed bandwidth. EF packets that











contrast AF was designed to provide a service similar to that which a lightly loaded
network provides. It is intended for traffic types that require guaranteed bandwidth
with less stringent delay requirements. AF packets that exceed their negotiated rate
are re-marked to a higher drop precedence level at the Diffserv ingress.
Whereas EF is allocated a single DSCP, the AF class comprises four independent
sub-classes of service, each containing three drop precedence levels [20]. AF thus has a
total of 12 DSCP values allocated to it. The four sub-classes are intended to provide
tiered levels of service within AF. The three drop precedence levels are intended for
situations where traffic sources wish to inject additional traffic into the network at the
risk of a higher number of packets being dropped. This mechanism works as follows:
Should a source exceed its negotiated sending rate, all offending packets are
re-marked with a higher drop precedence level when they enter the Diffserv network.
Traffic sources may thus trade off throughput with packet losses. Networks may
aggregate the three drop precedence levels into two [20]. It has been suggested that
the use of 12 sub-classes and drop precedence levels in AF has not been properly
motivated and that a smaller number would have sufficed [21].
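The 12 AF code points can be enumerated with a few lines of code. The sketch below uses the recommended code point values of RFC 2597, in which the upper three bits of the DSCP carry the sub-class and the next two bits carry the drop precedence; the function and variable names are hypothetical.

```python
def af_dscp(af_class, drop_precedence):
    """Return the recommended DSCP for AF sub-class x and drop precedence y."""
    if af_class not in (1, 2, 3, 4) or drop_precedence not in (1, 2, 3):
        raise ValueError("AF has sub-classes 1-4 and drop precedences 1-3")
    return (af_class << 3) | (drop_precedence << 1)


af_codepoints = {
    f"AF{af_class}{drop}": af_dscp(af_class, drop)
    for af_class in range(1, 5)
    for drop in range(1, 4)
}
for name, dscp in af_codepoints.items():
    print(f"{name}: {dscp:2d} ({dscp:06b})")
```

Re-marking an out-of-profile packet at the ingress therefore amounts to raising the drop precedence bits while leaving the sub-class bits unchanged, for example from AF11 to AF12.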
The use of strict priority queuing is recommended for EF packets in the core of
Diffserv networks, as EF traffic should not be affected by traffic of lower service
classes [22]. Priority queuing ensures that no non-EF packet is sent whilst an EF
packet is waiting to be sent. The use of priority queuing for EF packets means that EF
is an extremely expensive service. This is because EF requires a substantial
reservation of network resources and EF traffic can excessively disadvantage traffic
classes with lower priorities [23]. It also results in an increase in the burstiness and
jitter of EF traffic profiles. This is especially problematic on long paths [24].
Because this study COlcerns the AF traffic class, AF architectures will be considered 











2.6 Diffserv AF Architectures 
This section describes the commonly accepted way of implementing AF and then
some of the problems surrounding this implementation. Finally, some of the proposed
workarounds to these problems are considered.
2.6.1 The Diffserv AF Mechanism
This description of the accepted mechanism for implementing AF was compiled using
the relevant IETF RFCs as well as technical reports on experiments evaluating which
mechanisms are preferable. In Section .1, it was explained that the behaviour of
Diffserv edge routers is very different from that of routers in the core of Diffserv
networks. As a result, edge and core router behaviour and architecture will be dealt
with in separate sections.
2.6.1.1 Diffserv AF Edge Router Behaviour and Architecture
As described in Section I, it is desirable for edge routers on Diffserv networks to
implement a policing and packet tagging mechanism. This ensures that the amount
and type of traffic entering the network does not exceed that which the network can
handle. The traffic may also be monitored for billing purposes at the edge of the
Diffserv network.
In the scenario where a Diffserv network is to provide an AF connectivity service to
an aggregate client such as a business, the client is required to set up a data link
leading to an edge router of the Diffserv domain. Further, a service level agreement
must be compiled detailing the amount of data that the client may send on each of the
AF sub-classes. The client is required to mark the DSCP of packets within its network
to indicate which packets should receive what AF service. The router at the edge of
the AF network is required to police, re-mark and possibly drop the packets before
sending them into the network core. This is necessary as the client network may
accidentally or maliciously send packets in excess of its service level agreement.
In the scenario where a client's service level agreement allows it to send X Kbps of
AF1 traffic into a Diffserv network, it marks packets of that aggregate as AF1 and











Sliding Window (TSW) to monitor the incoming traffic. If the client uses less than its
allowable data rate, all packets are marked at the lowest drop precedence level. If the
client sends traffic at a rate higher than X Kbps, the policing algorithm marks the
excess packets to a higher drop precedence level. A microflow-aware policing
mechanism is considered desirable as it helps to ensure that should a client exceed its
service level agreement, all users from that client receive a similarly reduced service
[25].
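The policing behaviour described above may be sketched as follows. The text names the Time Sliding Window only. The rate-estimator update and the probabilistic marking of excess packets below follow the TSW scheme as it is commonly described and should be read as assumptions, as should the names TswMarker, target_rate and win_length. The sketch polices a single aggregate and is not microflow-aware.

```python
import random


class TswMarker:
    """Illustrative Time Sliding Window rate estimator and marker (a sketch)."""

    def __init__(self, target_rate, win_length=1.0, rng=None):
        self.target_rate = target_rate    # contracted rate, bytes per second
        self.win_length = win_length      # averaging window, seconds
        self.avg_rate = target_rate       # estimate starts at the contracted rate
        self.t_front = 0.0                # arrival time of the previous packet
        self.rng = rng if rng is not None else random.Random()

    def update_rate(self, now, pkt_bytes):
        """Fold one packet arrival into the decaying rate estimate."""
        bytes_in_window = self.avg_rate * self.win_length
        new_bytes = bytes_in_window + pkt_bytes
        self.avg_rate = new_bytes / (now - self.t_front + self.win_length)
        self.t_front = now
        return self.avg_rate

    def mark(self, now, pkt_bytes):
        """Return 'low' or 'high' drop precedence for the arriving packet."""
        rate = self.update_rate(now, pkt_bytes)
        if rate <= self.target_rate:
            return "low"
        # Assumption: mark the excess fraction of the traffic probabilistically.
        excess_fraction = (rate - self.target_rate) / rate
        return "high" if self.rng.random() < excess_fraction else "low"
```

A client that stays within its contracted rate sees every packet marked "low", whereas a client sending at ten times its contracted rate would, under these assumptions, have roughly nine in ten packets re-marked.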
2.6.1.2 Diffserv AF Core Router Behaviour and Architecture
In this section an overview of the architecture of core routers implementing AF is
given, followed by a description of some active random drop algorithms.
2.6.1.2.a Diffserv AF Core Router Architecture Overview
Active random droppers such as Random Early Detection (RED) are required for AF
to minimise long-term congestion [20,22]. Although no recommendations are made
about which specific random dropper mechanisms should be implemented, Random
Early Detection with In/Out-of-profile (RIO) has been found to work better than other
microflow-unaware active queuing mechanisms such as Weighted RED [26]. RIO is
thus a popular choice for AF networks.
Figure 7 shows a router architecture that may be used to achieve the service specified


















The AF core router's buffer management and scheduling architecture consists of four
RIO queues that lead to a weighted round robin scheduler. There are four queues as
each of the four AF sub-classes is assigned its own RIO queue. This provides
isolation between the AF sub-classes. The output of the scheduler leads to the output
port with the possible inclusion of, for example, a priority scheduler on the data path
if EF is implemented on the same network.
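The scheduling stage of this architecture may be sketched as below. This is a minimal packet-count-based weighted round robin. The weights and the function name are illustrative assumptions, and the RIO drop decision that would guard each queue on enqueue is omitted.

```python
from collections import deque


def weighted_round_robin(queues, weights):
    """Yield packets until every queue is empty.

    In each round, queue i may send up to weights[i] packets. An empty
    queue forfeits its turn, so the scheduler is work-conserving.
    """
    while any(queues):                      # an empty deque is falsy
        for queue, weight in zip(queues, weights):
            for _ in range(weight):
                if not queue:
                    break
                yield queue.popleft()
```

With one deque per AF sub-class, a heavier weight gives that sub-class a larger share of the output link whenever the queues are backlogged.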
Because in-order delivery is specified for microflows of the same AF sub-class [27],
packets from the same sub-class with differing drop precedence levels share the same
RIO queue. The probability of arriving packets being enqueued versus being dropped,
however, depends on the drop priority of the packet. Note that although the IP DSCP
makes provision for three drop precedence levels within each AF sub-class, AF
compliant networks need only implement two drop precedence levels [27].
RED was first proposed in 1993 by S. Floyd and V. Jacobson [28]. RIO was proposed
in 1998 [25]. RIO is similar to RED except that it includes provision for multiple drop
precedence levels. RED will be described next. This will enable an easy
understanding of RIO, which will be described in the subsequent section.
2.6.1.2.b Random Early Detection (RED) 
The RED mechanism uses a single First In First Out (FIFO) queue. The buffer
management algorithm stores the average buffer occupancy (avg) of this queue, as
well as minimum and maximum thresholds for avg. These are called Thresh and
MaxThresh respectively. It also stores a variable called Linterm, which determines the
probability of a packet being dropped when avg is between Thresh and MaxThresh.
Precisely how the above variables determine whether an arriving packet will be











[Figure: Packet Drop Probability versus Average Buffer Occupancy. The drop probability P(avg) is plotted against the average buffer occupancy, with 1/Linterm marked on the vertical axis and MaxThresh on the horizontal axis. The regions are labelled "No Early Drop", "Possibility of Early Drop" and "All arriving packets are dropped".]
Figure 8: The probability of RED dropping an incoming packet as a function of average buffer
occupancy
When a packet arrives, if avg is below Thresh, the packet is enqueued. If avg is above
MaxThresh, the packet is dropped. If avg is between Thresh and MaxThresh, the
packet is dropped with a probability P(avg) which increases as avg moves from
Thresh to MaxThresh. By the appropriate tuning of Thresh, MaxThresh and Linterm,
it is possible to specify the desired average buffer occupancy level.
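The decision just described may be sketched as follows. The sketch assumes that P(avg) rises linearly from zero at Thresh to a maximum of 1/Linterm at MaxThresh, and that a packet arriving when avg equals MaxThresh is dropped. Both choices, and the function names, are assumptions made for illustration.

```python
import random


def red_drop_probability(avg, thresh, max_thresh, linterm):
    """Return the probability of dropping a packet arriving at occupancy avg."""
    if avg < thresh:
        return 0.0                      # no early drop
    if avg >= max_thresh:
        return 1.0                      # all arriving packets are dropped
    # Linear ramp that reaches 1/linterm as avg approaches max_thresh.
    return (avg - thresh) / (max_thresh - thresh) / linterm


def red_should_drop(avg, thresh, max_thresh, linterm, rng=random):
    """Make one random drop decision; rng needs a random() method."""
    return rng.random() < red_drop_probability(avg, thresh, max_thresh, linterm)
```

A larger Linterm flattens the ramp, so that early drops become rarer for the same average occupancy.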
The setting of Thresh determines the level of buffer occupancy at which the gateway
starts requesting reduced sending rates from senders. Setting this value too high
results in bursty traffic sources being penalised. Setting it too low results in reduced
link utilization [28]. Linterm is used to determine the rate at which the probability of
dropping a packet increases as avg exceeds Thresh. It is recommended that the
gradient of P(avg) not be too great, otherwise excessive oscillation results [28].
The average queue size (avg) is determined by an exponentially weighted moving











As can be seen, q_weight may be used to alter the rate at which avg responds to
changes in the queue length. By increasing q_weight, avg is made more responsive to
current buffer occupancy levels.
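The effect of q_weight may be illustrated with the usual form of the exponentially weighted moving average used by RED. The update rule below is the standard one from the RED literature and is assumed here. The function name and the sample values are illustrative.

```python
def update_avg(avg, queue_len, q_weight):
    """Return the new average occupancy after observing queue_len."""
    return (1.0 - q_weight) * avg + q_weight * queue_len


def track(samples, q_weight, avg=0.0):
    """Apply update_avg to each sample in turn and return the final average."""
    for queue_len in samples:
        avg = update_avg(avg, queue_len, q_weight)
    return avg
```

For a queue that jumps from empty to ten packets, a q_weight of 0.5 moves avg halfway to the new level on the first arrival, whereas a q_weight of 0.002 moves it by only 0.02 packets. A short burst therefore barely affects a slowly weighted average.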
When avg is between Thresh and MaxThresh, the function that determines whether a
packet should be dropped should perform drops according to a uniform rather than a
geometric random function. This minimises the probability of many packets being
dropped consecutively.
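One way of obtaining uniformly spaced drops, taken from the original RED proposal rather than from the text above, is to scale the base probability by the number of packets enqueued since the last drop. The sketch below assumes that approach. The names p_b and count are illustrative.

```python
def uniform_drop_probability(p_b, count):
    """Return the adjusted drop probability for the next arriving packet.

    p_b is the base probability P(avg). count is the number of packets
    enqueued since the last drop. Using p_b unchanged for every packet
    would give geometrically distributed gaps between drops. Scaling by
    count spreads the drops out more evenly.
    """
    denominator = 1.0 - count * p_b
    if denominator <= 0.0:
        return 1.0                      # a drop is now overdue
    return min(1.0, p_b / denominator)
```

With p_b equal to 0.1, the adjusted probability grows from 0.1 immediately after a drop to 0.2 after five packets and reaches 1 by the tenth, so long runs without a drop and clusters of consecutive drops both become unlikely.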
Packets are dequeued from RED queues in the same way that normal drop-tail FIFO
queues are dequeued.
The benefits of using RED buffer management rather than a simple drop-tail queue
are as follows [28].
• Through the astute selection of parameters within the router, notably the maximum
and minimum thresholds, it is possible to specify the desired average buffer
occupancy.
• When packet losses do occur, they tend to occur in isolation rather than
consecutively, as often happens with normal drop-tail FIFO queuing. As a result,
TCP flows are better able to recover from packet losses.
• Fairness is improved without the need for per-channel-awareness. This is because
the probability of a packet from a given microflow being dropped is roughly
proportional to that microflow's buffer occupancy level.
• Bursty traffic is not penalised, as is the case with drop-tail FIFO queuing. This is
because using the average queue size to determine drop probability amortizes the
effect of short bursts.
• The global synchronisation problem is avoided. Because the probability of a drop
occurring increases gradually with avg, it is unlikely that many flows will
simultaneously suffer packet drops with the result that they all reduce their
congestion window, thus causing the link to be under-utilized.
2.6.1.2.c Random Early Detection with In/Out-of-profile (RIO)
As stated previously, the RIO mechanism is very similar to RED except that there is











Chapter 2: Literature Review
of-profile at the network edge will be treated differently from an in-profile packet
once it reaches the network core. Whereas RED uses avg to determine the average
buffer occupancy, RIO uses both avg_in and avg_total. avg_in stores the average
number of in-profile packets stored in the buffer. avg_total stores the average number
of in-profile as well as out-of-profile packets.
RIO also has separate "in" and "out" thresholds for each of the RED thresholds:
Thresh, MaxThresh and Linterm. Simulation studies have found that these RIO
parameters should be staggered to ensure that in-profile packets aren't appreciably
disadvantaged by the presence of out-of-profile packets [26]. Staggered parameters
are illustrated in Figure 9. The top graph in Figure 9 describes the probability of an
arriving out-of-profile packet being discarded when it arrives at a router whilst the
bottom graph in Figure 9 describes the probability of an arriving in-profile packet
being discarded. Note that the in-profile graph uses avgIn on its X-axis, whereas the
out-of-profile graph uses avgTotal.







[Figure: two packet drop probability graphs. The recoverable axis labels are "Average In-profile Buffer Occupancy", ThreshOut and MaxThreshOut.]
Figure 9: Staggered parameters - A comparison of packet drop probability graphs for in-profile and out-of-profile packets











The parameters for P(avgIn) and P(avgTotal) in Figure 9 are staggered. This is so
because ThreshIn is larger than MaxThreshOut. In effect this means that before RIO
starts dropping in-profile packets, it will be in a state where it drops all arriving
out-of-profile packets. This provides greater isolation of in-profile packets from
out-of-profile packets.
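The staggering described above may be sketched as follows. The linear ramp peaking at 1/Linterm, the parameter values and all names below are assumptions chosen for illustration. The only property taken from the text is that ThreshIn exceeds MaxThreshOut.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedParams:
    thresh: float
    max_thresh: float
    linterm: float


def ramp(avg, params):
    """RED-style drop probability for one set of thresholds."""
    if avg < params.thresh:
        return 0.0
    if avg >= params.max_thresh:
        return 1.0
    return (avg - params.thresh) / (params.max_thresh - params.thresh) / params.linterm


def rio_drop_probability(in_profile, avg_in, avg_total, params_in, params_out):
    """In-profile packets are judged on avg_in, out-of-profile on avg_total."""
    if in_profile:
        return ramp(avg_in, params_in)
    return ramp(avg_total, params_out)


# Hypothetical staggered settings: ThreshIn (40) exceeds MaxThreshOut (30).
PARAMS_IN = RedParams(thresh=40, max_thresh=70, linterm=50)
PARAMS_OUT = RedParams(thresh=10, max_thresh=30, linterm=10)
```

With an average total occupancy of 35 packets, of which 20 are in-profile, every arriving out-of-profile packet is dropped while in-profile packets are still enqueued unconditionally.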
2.6.2 Performance of the Diffserv AF Mechanism 
This section considers the performance of the AF traffic class. It first deals with some
of the shortcomings of Diffserv AF and then proposed solutions to those shortcomings
are considered.
2.6.2.1 Shortcomings of the Diffserv AF Mechanism
The Diffserv architecture means that core routers need only consider the DSCP of
packets as well as current congestion levels when making buffer management
decisions. This results in simple core router implementations. It does, however,
introduce a number of problems. The problems, which fall into three categories, are
listed below.
1. There is per-microflow unfairness within each AF sub-class [21]. In particular,
there is unfairness when competing microflows differ in the following areas:
• Transport protocol: microflows with unresponsive transport layer protocols
such as UDP receive higher throughput than microflows with responsive
transport layer protocols such as TCP.
• Round trip times: TCP microflows with longer round trip times are less robust
than microflows with shorter round trip times. They thus receive less than
their fair share of the available bandwidth. This means that unfairness would
result from the following scenario: Two end-users are in contention for a
limited resource, but whereas the one user is downloading from a server in the
same city, the other user is downloading from a server on the other side of the
world. Refer to Section 1.1 .. for an explanation of this phenomenon.
In the author's opinion, the experiments in this paper were flawed by the fact that the out-of-profile
AF packets were placed on the best effort queue. This is contrary to Diffserv
Nonetheless, the author feels that many of the results of this paper remain valid as










• Packet size: microflows with larger packet sizes receive higher throughput of
data than microflows with smaller packet sizes.
2. There is per-aggregate unfairness when the aggregates, or the typical microflows
within them, differ in terms of the following [21,29,30].
• Transport protocol: aggregates containing predominantly unresponsive
transport layer protocols receive higher throughput than aggregates with
predominantly responsive transport layer protocols. A client company that has
a UDP-heavy traffic profile would thus be advantaged over a company that
has predominantly TCP traffic.
• Round trip times: TCP microflows with longer round trip times are less robust
than those with shorter round trip times. Aggregates containing TCP
microflows with predominantly longer round trip times thus receive less than
their fair share of the available bandwidth. For example, a client company
whose data traffic is made up predominantly of TCP flows to a branch at a
distant location would be disadvantaged should it be in contention with a
company that communicates predominantly with a branch in the same city.
• Packet size: aggregates containing predominantly larger packet sizes receive
higher throughput than aggregates with predominantly smaller packet sizes.
• Service Level: once in the core of the Diffserv network, identically
marked packets from aggregates with larger allowable data rates are treated
the same as those from aggregates with smaller allowable data rates. This can
be unfair to the aggregates with larger allowable data rates. For example, a
company with a large contract would be disadvantaged when contending for
excess bandwidth against a company with a small contract.
• Number of microflows: when aggregates containing many TCP microflows
compete with aggregates containing fewer TCP microflows, the aggregates
containing many microflows often receive more than their fair share of the
available bandwidth.
3. Using Random Early Detection or similar mechanisms such as RIO on TCP traffic
has been shown to result in instability [31]. This is most problematic in situations
where link capacities are high and delays are long. These conditions are exactly
what one would expect to find across Diffserv networks in the Internet. The











• Increased end-to-end jitter
• Increased packet losses
• Under-utilized network links
• Reduced responsiveness for interactive applications.
The factors mentioned above result in the behaviour of the Diffserv network core not
being predictable but depending on a number of factors relating to traffic type. As a
result, the AF class cannot, without significant over-provisioning, offer absolute
guarantees. It can at best offer relative service differentiation between classes
[26,32,33]. The value of providing a relative service to applications is questionable as
real applications have requirements that are absolutely quantifiable.
A further implication is that because the Diffserv core is not aggregate-aware, it is not
possible to control the allocation of bandwidth to aggregates. This means that
providing an adequate service once again necessitates over-provisioning.
2.6.2.2 Workarounds for the Diffserv AF Shortcomings
This section describes some of the mechanisms that have been proposed as an
improvement on the classic Diffserv model. These mechanisms were touted as solving
one or more of the problems listed in the previous section.
Dynamic Packet State was proposed in 1999 to improve fairness by facilitating
communication between core and edge Diffserv routers [34]. This was achieved by
placing per-microflow state in the header of each packet that was sent into the
Diffserv core. The mechanism is problematic, as it is not compatible with the current
IPv4 header. It also incurs a great overhead on the forwarding plane.
Fair Allocation Derivative Estimation (FADE) with feedback control was proposed in
1999 [35]. This mechanism improved fairness by facilitating communication between
core and edge Diffserv routers. This was achieved by using dynamic feedback
signalling. The mechanism was expanded upon in 2001 with the proposal of an
efficient fair allocation estimation technique [ ]. The work done on FADE should be











use of a stateless core, its signalling and dynamic bandwidth allocation mechanisms
may be integrated into the mechanism presented in this thesis.
In a 2000 proposal [36], it was suggested that edge routers monitor packet losses in
the Diffserv network by querying core routers. If the edge routers detect that certain
microflows/aggregates aren't responding to packet losses, they throttle them before
admitting them into the network. A 2001 paper by A. Habib proposed a similar
solution [37].
A 2000 paper suggested the use of Fair RED (FRED) in the core of Diffserv networks
[38]. FRED keeps state for currently buffered microflows and is thus able to isolate the
non-responsive ones. Although this mechanism did improve fairness in simulations,
non-responsive microflows still received approximately 40% more than their fair
share of the available bandwidth using FRED. Further, a paper contended that
there were no appreciable benefits in using a random drop policy if the buffer
management mechanism stores state for backlogged microflows [39].
A 2002 paper reported on simulations aimed at evaluating whether microflow-aware
buffer management schemes were able to produce better network performance than
microflow-unaware mechanisms such as RED [39]. This paper also evaluated the
relative performance of a number of variants of the schemes. It was found that better
fairness and microflow isolation may be achieved when using microflow-aware buffer
management mechanisms.
A 2002 paper considered the performance consequences of a modification to RED
that takes packet size into account when determining whether to enqueue or drop
arriving packets. It was found that by performing early drops on large packets with a
higher probability than on small packets, it was possible to improve fairness when
competing flows differed in terms of their average packet size [40]. A further











2.7 End-to-End Quality of Service 
Previous sections have introduced a number of quality of service mechanisms. It is 
notable that these mechanisms are not mutually exclusive. On the contrary, an end-to-
end connection with a guaranteed service level may involve a number of quality of 














Figure 10: An end-to-end connection with Intserv, Diffserv, and over-provisioned best effort 
networks 
To illustrate the sequence of events needed to establish a connection involving many 
quality of service mechanisms, we will consider the scenario that End-user A wishes 
to set up a one-way broadcast to End-user B. This sequence assumes that both end-
users are Intserv capable. 
• End-user A sends an RSVP path message towards End-user B. This message
describes the requirements of the data traffic that it intends to send. 
• The RSVP message goes through the routers in Intserv Network 1. This initiates 
standard RSVP processing. 
• The RSVP path message passes transparently through the Diffserv as well as the
over-provisioned best effort networks [41]. 
• The RSVP path message goes through the routers in Intserv Network 2. This 
initiates standard RSVP processing. 












• The RSVP reservation message is sent back along the path specified by the path
message. As it passes through the routers on Intserv Network 2, they perform
admission control. If they have the available resources they store the necessary
state for a connection. If not, then the request is rejected.
• The RSVP reservation message reaches Intserv Network 2's edge router. If that
router deems the necessary resources to be available on the adjacent best effort
network then it stores the necessary state for the connection.
• The RSVP reservation message is sent transparently through the over-provisioned
best effort network as well as the Diffserv network.
• When the RSVP reservation message reaches the edge router on Intserv Network
1, admission control processing is triggered. It is necessary for this router to
establish whether there are sufficient resources available on the virtual link
between it and Intserv Network 2. This may involve a comparison between the
current service level specification and current reservations. It may also involve a
re-negotiation of the service level specification with the Diffserv and best effort
networks - although this would not happen with all new RSVP connections. If that
router deems the necessary resources to be available, it stores the necessary state
for a connection. This could include the appropriate DSCP for the packets.
• The RSVP reservation message goes through the routers of Intserv Network 1. As
it passes through, the routers perform admission control. If the resources are
available, they store the necessary state for the connection.
• The RSVP reservation message reaches End-user A. This message indicates that
the connection has been accepted. It may also specify the DSCP to be used on
subsequent packets.
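The hop-by-hop admission control performed as the reservation message travels back towards End-user A may be sketched as follows. This is a simplification under stated assumptions: each router is modelled as a single capacity figure, all names are hypothetical, and state already installed on earlier hops is not torn down when a later hop rejects the request, which a real RSVP implementation would have to handle.

```python
class IntservRouter:
    """Hypothetical router that admits flows against one capacity figure."""

    def __init__(self, capacity_kbps):
        self.capacity_kbps = capacity_kbps
        self.reservations = {}              # flow identifier -> reserved Kbps

    def admit(self, flow_id, rate_kbps):
        """Store state for the flow if resources allow, else reject it."""
        reserved = sum(self.reservations.values())
        if reserved + rate_kbps > self.capacity_kbps:
            return False
        self.reservations[flow_id] = rate_kbps
        return True


def reserve_along_path(routers, flow_id, rate_kbps):
    """Pass a reservation through each router in turn, stopping on rejection."""
    for router in routers:
        if not router.admit(flow_id, rate_kbps):
            return False
    return True
```

In this model the Diffserv and best effort networks would appear as a single virtual link whose capacity is taken from the service level specification held by the edge router of Intserv Network 1.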
Once a connection has been established, data may begin to flow. The following
sequence describes the events as a data packet travels from End-user A to End-user B.
• End-user A sends a data packet. This may or may not have the appropriate DSCP.
• The packet passes through Intserv Network 1 where it follows the path installed
by the RSVP path message.
• At Intserv Network 1's border router, the packet's DSCP is marked. The packet is










• When the packet reaches the Diffserv network edge router, it is policed. If
necessary, it is re-marked to a higher drop precedence level. The packet then
passes through the Diffserv network - receiving its corresponding service level.
• The packet passes through the over-provisioned best effort network where it
receives the same service as all other packets.
• The packet passes through Intserv Network 2 in the same way it passed through
Intserv Network 1.
• End-user B receives the packet.
The above steps demonstrate how it is possible for guaranteed end-to-end connections
to be established across networks with disparate quality of service mechanisms. The
key factor necessary in such connections is for the various networks to be able to
interoperate. For example, when the edge router on Intserv Network 1 was required to
assess the availability of resources on the Diffserv network, it had to determine
whether the Diffserv network was able to provide the necessary service. Similarly,
the Diffserv network had to be aware of the level of service that could be expected
from the best effort network. It is in this way that service-level information may be












Router Architecture and Operation 
3.1 The Role of the Router 
The role of the router is primarily to forward IP packets towards their destinations.
Further, it is necessary for routers to buffer packets so that bursts and temporary
congestion are absorbed. Routers have a number of ports that connect them to other
routers or end systems. Routers use routing tables and the destination address stored
in the packet header to determine where a packet should be sent next. The next
destination of a packet is referred to as its next hop. Because Internet routes and
end-systems are always changing, it is necessary that routers frequently update their
routing tables. This is done automatically using protocols that enable communication
between routers. Common protocols for updating routing tables include Border
Gateway Protocol (BGP), Open Shortest Path First (OSPF), Intermediate System to
Intermediate System (IS-IS), and Routing Information Protocol (RIP).
3.2 The History of Router Architectures
The evolution of router architectures has been spurred on by ever increasing data rate











The first generation of routers consisted of general-purpose single processor
computers with multiple network cards. The network cards were connected to the
CPU by a shared bus. Arriving packets were forwarded to the CPU. The CPU
determined which output port they should be sent to and sent them to the appropriate
network card. The data path thus required packets to traverse the shared bus twice. In
addition to forwarding packets, the CPU was required to perform routing table
updates as well as implement control protocols.
The second generation of routers addressed both the CPU and the shared bus
bottlenecks by introducing the multiple processor - shared bus architecture. With this
architecture, more of the forwarding computation was performed on the incoming port
cards. Arriving packets were thus usually sent directly from the input port to the
output port. The mechanism worked as follows: The first arriving packet of a flow
was sent to the CPU. The CPU determined the output port for the packet. It sent the
packet to this port as well as sending a cache entry to the input port. When further
packets belonging to this flow arrived at the incoming port, the card was able to use
its cache entry to send the packet directly to the output port. This mechanism reduced
the load on the shared bus as well as on the CPU. Further enhancements to this
architecture involved copying the entire routing table onto each port card. This meant
that the initial packet of a connection no longer had to be sent to the CPU.
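The caching behaviour of the second-generation port card may be sketched as follows. The class and attribute names are hypothetical, and the cache is keyed on the destination address for simplicity, whereas the text describes entries per flow.

```python
class SecondGenPortCard:
    """Hypothetical port card that caches forwarding decisions from the CPU."""

    def __init__(self, cpu_lookup):
        self.cpu_lookup = cpu_lookup        # slow path: destination -> output port
        self.cache = {}                     # fast path held on the port card
        self.cpu_requests = 0               # how often the CPU had to be consulted

    def forward(self, destination):
        """Return the output port, consulting the CPU only on a cache miss."""
        if destination not in self.cache:
            self.cpu_requests += 1
            self.cache[destination] = self.cpu_lookup(destination)
        return self.cache[destination]
```

Only the first packet towards a destination reaches the CPU. Copying the entire routing table onto each card, as in the later enhancement, corresponds to filling the cache in advance so that no packet takes the slow path.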
The third generation of routers had multiple processors on the port cards as before,
but used space switching instead of the shared bus. This design alleviated the shared
bus bottleneck of second-generation routers by introducing a high speed switching
fabric that connected the port cards. This allowed data rates to be increased by a
number of orders of magnitude.
The next bottleneck in throughput was packet processing at the port cards. This
problem was alleviated with the shared parallel processors - space switching
architecture. This optimisation was made possible by the observation that it is unusual
for all port cards to be backlogged at the same time. The sharing of forwarding
engines was made possible by separating them from the port cards. This architecture











[Figure: block diagram showing a Router Controller, Port Cards and Forwarding Engines connected by a Switch Fabric (High Speed Interconnect).]
Figure 11: The shared parallel processors - space switching router architecture
The architecture illustrated in Figure 11 worked as follows: When a packet arrived at 
a port card, its header was sent over the switch fabric to one of many dedicated 
forwarding engines. The forwarding engine determined the outgoing port and 
informed the incoming port of this. The packet was then sent to the appropriate output 
port. Because only packet headers were sent to the forwarding engine, the overhead of 
sending payloads across the switch fabric was minimised. This architecture made it 
possible to increase the port density of routers. 
The previous architecture was motivated by the assumption that port cards are not 
able to perform destination lookups at the rate of packet arrivals during a traffic burst. 
However port processor chips that are able to perform lookups at line speeds of up to 
160 Gigabits per second have subsequently been released [42]. 
3.3 Present and Future Router Architectures 
Most modern high-performance routers as well as many routers that are currently in 
development use a switch fabric architecture with distributed buffer memory and 
forwarding capabilities [43]. This architecture has a centralised route processor that 
duplicates its routing table in each of the port cards. The port cards, which are full 
duplex, each have local processors, input queues, Virtual Output Queues (VOQs), 
internal buffers, as well as output queues. The input queue stores newly arrived 
packets. The VOQs store packets whose IP lookup has been performed and are 
waiting to be sent to the appropriate outgoing port via the switch fabric. The internal 











mechanism. The output queue stores packets that are waiting to be sent to their next
hop. The switch fabric connects the port cards to each other as well as to the route
processor. Figure 12 shows the data path of a typical high-performance router with
three ports.
[Figure: data path diagram. The recoverable labels are Port Card 1, Input Queue, VOQ2, VOQ3 and Switch Fabric (High Speed Interconnect).]
Figure 12: The data path of a router with a switch fabric architecture and distributed buffer
memory and forwarding
The following sequence of events occurs when a packet from an external source
arrives at Port Card 1.
• The packet is placed on Port Card 1's input queue. The contents of the packet
header are read. The port card performs a routing lookup based on the packet
header's destination address and determines the packet's next hop.
• The packet is sent to the VOQ corresponding to its destination port.
• The packet is sent through the switch fabric to the appropriate port card's internal










It is here that temporary congestion is absorbed and output link sharing is
performed. Buffer management and scheduling schemes such as RIO and DRR
are implemented here.
• The packet is sent to the output queue whereupon it is sent to its next hop.
The following paragraphs motivate and elaborate on the above data path. 
The input buffer's role is to slore packets from traffic bursts should the lookup engine 
become backlogged. However. it is not desirable that packets be stored in this buffer 
for too long. as the pI ioritisation of delay senSItive traffic is not possible here due to 
the packets' headers having not yet been analysed. A further drawback of having long 
queues here is head of-line blocking. This will be dealt vvnh in the following 
paragraph. Because mJdem port cards are now able to perform packet lookups at line 
speeds ['+4], there is 10 reason why more than one packet should be stored ill this 
buffer at any given tine. 
After the next hop lookup the packets are sent to one of the port card's VOQs where
they wait for the switch fabric to send them to the receiving port card. There is a
separate VOQ corresponding to each destination port card. This prevents head-of-line
blocking. Head-of-line blocking can occur in a scenario where one output port is
backlogged and can't accept any more packets, but another output port is idle. If a
single queue were used for packets destined for the switch fabric and the packet at the
front of the queue was destined for the backlogged port, it would be necessary to wait
for that packet to be dequeued before any packets could be sent to the idle port.
Because the input port processor would be blocked from sending packets to the idle
port, it would be wasteful of network resources.
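The effect of head-of-line blocking can be illustrated with a minimal Python sketch. It is not taken from any router implementation: the function names are hypothetical, each packet is represented only by its destination port number, and a backlogged port is assumed simply to refuse all packets.

```python
from collections import deque

def drain_single_fifo(packets, blocked_ports):
    """Return the packets a single FIFO can forward in one pass.

    Only the head packet may be sent, so the queue stalls as soon as the
    head is destined for a backlogged (blocked) output port.
    """
    queue = deque(packets)
    sent = []
    while queue and queue[0] not in blocked_ports:
        sent.append(queue.popleft())
    return sent

def drain_voqs(packets, blocked_ports):
    """Return the packets forwarded when one VOQ is kept per output port."""
    voqs = {}
    for dest in packets:
        voqs.setdefault(dest, deque()).append(dest)
    sent = []
    for dest, queue in voqs.items():
        if dest in blocked_ports:
            continue  # only this VOQ waits; the others are unaffected
        sent.extend(queue)
    return sent

# Port 2 is backlogged and port 3 is idle.
arrivals = [2, 3, 3, 2, 3]
print(drain_single_fifo(arrivals, blocked_ports={2}))  # nothing can be sent
print(drain_voqs(arrivals, blocked_ports={2}))         # all port 3 packets are sent
```

With a single queue the packet for port 2 at the head holds back every packet behind it, whereas with separate VOQs the three packets for the idle port are forwarded.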
As stated previously, the internal buffer of the sending port card is where most of the
router's queuing occurs. It is here where quality of service policies are implemented
and packet marking or dropping should occur. The reason that the majority of the
buffering must be performed on the output port rather than the input port is that all











The output queue must be short as, because it is a FIFO, it has no quality of
service capabilities. This queue's primary role is to provide a simple interface for the
transmission hardware to dequeue.
3.4 Scalability and Microflow-awareness 
Although router capacity has increased by a factor of about ten in the past five years
[45], the similarly rapid growth of the Internet means that the ability of Internet core
routers to process data sufficiently quickly is still of critical concern. This section
addresses the ability of routers to scale up to high data rates. The issue of scalability is
particularly relevant to this study as H-MAQ is demanding in terms of computational
resources. It is thus necessary to consider its viability in this regard. The issue of
scalability was touched on previously when considering the relative merits of the
Diffserv and Intserv models. There are two aspects to this topic, namely the amount of
data or packets that routers can process in a given time, as well as the complexity of
operations that the routers are able to perform on the packets given these time
constraints. Some examples of complex operations are prioritising packets based on
their DSCPs, performing microflow lookups, and performing per-microflow fair
queuing. This section considers the performance bottlenecks of modern routers as
well as the state of the art technologies that define the maximum capabilities on these
bottlenecks. Note that the issue of routing table scalability will not be considered, as it
does not relate to this study.
In Section 3.3 it was shown that user traffic passes through two types of elements in
modern routers. These are the switch fabric and the port processors. Both of these
elements will be dealt with in turn.
3.4.1 Limitations in Switch Fabric Technology 
Modern switch fabrics are able to switch data at rates in the order of Terabits per
second [46]. Because the switch fabric's role is simply to forward packets, no











3.4.2 Limitations in Port Processor Technology 
The second element that will be dealt with is the port processor. The basic operations
that are performed here are next hop lookups as well as buffering. Current port
processors are able to perform these functions as well as providing service
differentiation based on the DSCP of packets at line speeds of l()0 Gigabits per
second [47].
A more challenging service offering is to provide all of the functions mentioned
above as well as to provide per-microflow lookups and queuing for packets in the
internal buffers. Providing such a service poses the following challenges.
• The processor needs to perform microflow lookups for all packets before they
may be enqueued by the internal buffer.
• The processor needs to be capable of supporting separate queues for all
microflows.
These challenges are dealt with separately in the sections below.
3.4.2.1 Performing Microflow Lookups
It has been verified using Verilog that, using currently available hardware
components, it is possible to perform per-microflow lookups and queuing on port
processors at line speeds of 10 Gigabits per second [6]. Furthermore, the company
EZchip has, since October 2001, been producing port processors that are capable of
performing per-microflow lookups and buffering at line speeds of 10 Gigabits per
second [44]. As described in Section 2.1.2, the adoption of IPv6 will make future
microflow lookups easier due to the introduction of the IP flow label.
In Section 2.2.1 it was stated that Internet service providers currently use links of up
to 2.4 Gigabits per second. As described above, currently available routers are able to
perform microflow-aware packet lookups and buffering at above this rate.
3.4.2.2 Supporting Separate Queues for all Microflows on Port Processors
The number of active microflows generally increases with increased data rates. There
is thus a concern that Internet core routers will be unable to store state for all active
microflows. This section demonstrates that this is not the case by first discussing the











then demonstrating that current routers are able to efficiently support this number of
queues. 
In Section 2.2.1, it was stated that Internet core links carry thousands to hundreds of
thousands of concurrent flows. Microflow-aware core routers do not, however, need
to have separate queues for this many microflows. For a router to implement per-
microflow queuing, it is not necessary to have a separate queue for all active
microflows, but only for backlogged microflows. Because of the bursty nature of TCP
traffic, the number of backlogged microflows at any given time is far less than the
number of microflows constituting active connections across a given link [5]. In
addition to this, the number of microflows for which a router must store state is
limited by its internal buffer size. Consider the case of a router buffering for a
maximum of 100 ms on a 2.4 Gigabit per second output link. If the mean packet size
is 500 bytes, the worst-case scenario in which each buffered packet belongs to a
different microflow requires 60 000 separate queues.
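The worst-case figure can be reproduced with a short calculation. The sketch below is illustrative only; the function and parameter names are assumptions, and integer arithmetic is used so that the result is exact.

```python
def worst_case_queues(link_rate_bps, buffer_time_ms, mean_packet_bytes):
    """Upper bound on per-microflow queues: one queue per buffered packet."""
    buffer_bits = link_rate_bps * buffer_time_ms // 1000  # bits held in the buffer
    buffer_bytes = buffer_bits // 8
    return buffer_bytes // mean_packet_bytes

# 2.4 Gigabit per second link, 100 ms of buffering, 500 byte mean packet size
print(worst_case_queues(2_400_000_000, 100, 500))  # prints 60000
```

The buffer holds 240 Megabits, or 30 Megabytes, which corresponds to 60 000 packets of 500 bytes.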
The above points give a ceiling to the number of separate queues that port processors
need to support. Because River Delta Networks was producing port modules that can
support 256 000 queues before 2001 [48], it is clear that this perceived limitation is of
little concern. Furthermore, continuing technological advances will increase the
number of queues that routers can support.
3.5 Best Practices for Per-Microflow Queuing 
This section lists some best practice techniques to be implemented on routers 
employing per-microflow queuing. These techniques are recommended based on the
results of simulation experiments that compared fair queuing mechanisms. The
relevance of these practices is that they are all employed in H-MAQ.
• When packets need to be dropped from buffers, they should be dropped from the
front of the queues that they are in. This policy, known as drop-from-front, is
beneficial as it provides TCP with an early indication of network congestion. It










encouraging fast retransmit [39]. Furthermore, it reduces queuing delay when
packet-drop levels are high.
• Using fair schedulers does not alone provide adequate fairness. Fair schedulers
should be used in conjunction with appropriate drop policies [39].
• If per-microflow queuing is used, there is little value in persisting to use global
random drop mechanisms. Using per-queue drop policies such as longest queue
drop helps to provide near perfect microflow isolation [39].
• The Queue State Deficit Round Robin (QSDRR) scheduling and drop mechanism
has been found to provide better fairness than conventional DRR [49]. QSDRR is
similar to DRR except that whereas when DRR is used, a packet is always
dropped from the longest queue, with QSDRR, once a drop queue has been
selected, that queue remains the drop queue until there is no queue that is shorter
than the drop queue. The QSDRR drop policy is also more computationally
efficient because when determining the next drop queue in QSDRR, it is usually
not necessary to examine the queue lengths of all microflow queues.
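The difference between the two drop policies in the last item can be sketched as follows. This is a minimal Python illustration that follows the wording above rather than the algorithm as specified in [49]; the class, function and variable names are hypothetical, and queue lengths are assumed to be counted in packets.

```python
class QsdrrDropSelector:
    """Remembers the current drop queue between drops (hypothetical helper)."""

    def __init__(self):
        self.drop_queue = None  # key of the queue currently being dropped from

    def select(self, queue_lengths):
        """Return the key of the queue to drop from.

        queue_lengths maps a microflow key to its current queue length.
        """
        current = self.drop_queue
        if current in queue_lengths and queue_lengths[current] > 0:
            current_len = queue_lengths[current]
            # Keep the current drop queue while some queue is shorter than
            # it. any() stops at the first shorter queue found, so a full
            # scan of all queues is usually avoided.
            if any(n < current_len for n in queue_lengths.values()):
                return current
        # Otherwise fall back to a full scan for the longest queue.
        self.drop_queue = max(queue_lengths, key=queue_lengths.get)
        return self.drop_queue


def longest_queue_drop(queue_lengths):
    """Plain longest-queue drop: rescan all queues on every drop."""
    return max(queue_lengths, key=queue_lengths.get)


selector = QsdrrDropSelector()
print(selector.select({"a": 5, "b": 3, "c": 4}))    # "a" is the longest queue
print(selector.select({"a": 4, "b": 3, "c": 6}))    # "a" is retained as drop queue
print(longest_queue_drop({"a": 4, "b": 3, "c": 6})) # longest-queue drop moves to "c"
```

In the second call queue "c" has become the longest, but the sketched QSDRR rule keeps dropping from "a" because a shorter queue ("b") still exists, and it establishes this without comparing every queue length.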
3.6 A High-performance Router Architecture Examined
This section examines the architecture of a high-performance router in detail. The
router examined here is the Multi-Service Router (MSR) developed by The Applied
Research Laboratory, Washington University, USA [49]. There are three motivations
for using the MSR as an example implementation. Firstly, the design of the MSR is
such that it reflects proprietary high-performance routers closely. Secondly, because
this router is not proprietary, the author had access to details about its inner workings.
And finally, because the MSR forms the basis of the author's implementation, it is
useful that its architecture be described thoroughly.
The MSR has the following key features.
• Line speeds of up to 2.4 Gigabits per second.
• A switch fabric capable of 160 Gigabits per second.
• Wire speed packet classification and routing.
• Microflow-specific processing of data streams.
• Support for quality of service guarantees.










• Support for various signalling protocols such as MPLS and RSVP. 
• Active networking functionality. 
• Dynamic loading of kernel modules from the control processor. 
Firstly the hardware components of the MSR are examined in detail. Then its boot up 
and configuration procedures are described. Finally its overall operation is described. 
3.6.1 The Hardware Components of the MSR 
The MSR consists of an ATM switch fabric with embedded programmable processors 
on each port. (Please refer to Appendix A for an introduction to ATM.) Figure 13 




Figure 13: An overview of the MSR hardware showing the switch fabric, switch ports, Field
Programmable port eXtenders (FPXs), Smart Port Cards (SPCs) and line cards
The MSR's ATM switch core comprises a Washington University Gigabit Switch 











as well as Smart Port Cards (SPCs). Each element of the MSR will now be dealt with
in detail.
3.6.1.1 Control Processor 
The control processor is a software program that runs on a stand-alone machine. This
machine is attached to a network that is physically connected to one of the MSR line
cards. Dedicated ATM Permanent Virtual Circuits (PVCs) logically connect the
control processor to the MSR. The control processor uses ATM control cells to
communicate explicitly with the switch core as well as with all port processors. The
control processor runs software that is responsible for booting, configuring and
controlling the MSR as well as monitoring its status.
3.6.1.2 ATM Switch Core
As stated previously, the ATM switch core comprises a WUGS [51]. This consists of
an ATM switch fabric with eight input ports and eight output ports. The features of
the WUGS are as follows:
• Each port is capable of operating at rates up to 2.4 Gigabits per second.
• The switch fabric is capable of throughputs of 160 Gigabits per second.
• The WUGS handles multicast efficiently.
• The WUGS supports packet level discard, otherwise known as early packet
discard. This means that should congestion cause the dropping of ATM cells, the
WUGS will drop all cells belonging to a single IP packet rather than dropping
cells from many packets. This minimises the number of corrupted IP packets.
The WUGS's ports are connected by four parallel matrices of Switching Elements
(SEs). These parallel planes each operate deterministically to distribute the load. The
SE interconnect is shown in Figure 14 and is a good example of a Benes Topology
with k = 2. Load distribution thus occurs on the first stage of SEs whilst the second













Figure 14: A Benes Topology - the internal structure of a WUGS
When the input port receives a cell from an external link, the cell is sent to a receive 
buffer. Once the switch is ready to receive the cell, it is sent to the Virtual 
Path/Circuit Translation table. This provides the necessary ATM-layer routing
information. The cell is then ascribed an internal cell header whereupon it is 
effectively split up into four parts so that each part, together with its internal header, 
may be sent to one of the four switch element networks. 
The cell fragments are sent to the first stage of switch elements where load 
distribution is performed. Here, adjacent switch elements employ hardware flow-
control to distribute the flow of cells to successive stages. This eliminates the 
possibility of cell loss within the SE network and minimises buffer requirements. 
Latter switching elements use internal cell headers to switch cell fragments towards 
their proper output ports. 
Once through the SE network, the cell fragments are sent to the appropriate output
ports. At the output ports they are reassembled and sent to the output buffers before 
being transmitted. 
The control processor controls the WUGS using a dedicated PVC. This control is 











3.6.1.3 Field Programmable Port Extender 
The FPX is a Field Programmable Gate Array (FPGA) based system [52,53]. The 
FPX, which is used as a port processor on the MSR, has the following features. 
• The FPX interfaces at line speeds of up to 2.4 Gigabits per second.
• The FPX is guaranteed to perform 9 million IPv4 lookups per second in the worst 
case. Assuming an average packet size of 256 bytes, this translates to 18.4 
Gigabits per second. 
• The FPX can support 10 000 forwarding table updates per second with less than 
9% degradation in lookup performance. 
• The FPX is implemented using dynamic hardware plugins, which combine the re-
programmability of a software-based system with the performance of a hardware 
system. 
• The FPX can be reprogrammed over the network by the MSR's control processor. 
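The lookup throughput quoted in the list above follows directly from the lookup rate and the assumed packet size. The short sketch below repeats the arithmetic; the function name is hypothetical and it assumes one lookup per forwarded packet.

```python
def lookup_limited_throughput_bps(lookups_per_second, mean_packet_bytes):
    """Throughput sustained if each lookup forwards one packet of the given size."""
    return lookups_per_second * mean_packet_bytes * 8

rate_bps = lookup_limited_throughput_bps(9_000_000, 256)
print(rate_bps / 1e9)          # lookup-limited throughput in Gigabits per second
print(rate_bps / 2.4e9)        # headroom relative to the 2.4 Gigabit per second line speed
```

Nine million lookups per second on 256 byte packets gives 18.432 Gigabits per second, which the text rounds to 18.4 and which is several times the line speed of a single port.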
Figure 15 gives an overview of the FPX architecture. The FPX uses SRAM to store 
the forwarding table. It also contains two FPGAs. One of these is used to implement 
the network interface device and the other performs functions related to IP lookups 
and forwarding table updates. The lookup engines, which implement Eatherton and 
Dittia's Tree Bitmap Algorithm [54], are time division multiplexed to use all available 




Figure 15: The FPX architecture (diagram labels: SRAM Interface (Forwarding Tree Bitmap), FIPL Engine Controller, FIPL Wrapper, Packet I/O)

















The FPX module needs only perform IP lookups on data packets arriving from the 
line cards. Packets arriving from the switch do not, of course, require lookups as they 
have already had their lookup performed. These packets may thus pass transparently 
through the FPX. When a data packet from the line card arrives at the network 
interface device, it is sent to the Fast IP Lookup (FIPL) wrapper. The FIPL wrapper 
extracts the packet's destination address and sends this to the FIPL engine controller. 
The FIPL engine controller enqueues the address and sends it to the next free FIPL 
engine. It also arbitrates the various FIPL engines' forwarding tree accesses on the 
SRAM. The FIPL lookup engine performs a longest-prefix match and returns the next
hop value for the given packet to the FIPL engine controller. The controller passes
this on to the IP wrapper. Based on the next hop value, the IP wrapper sets the ATM
VCI that the packet should be sent to the switch on. This VCI determines which 
output port the packet will be switched to. The packet is then sent to the switch via the 
network interface device. 
Packets that contain forwarding table updates are forwarded to the control processor. 
The control processor then updates the forwarding tree on the FPXs' SRAM. 
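The lookup that a FIPL engine performs can be illustrated in software. The sketch below is not the Tree Bitmap Algorithm used on the FPX; it is a plain linear scan that returns the same answer, written with Python's standard ipaddress module. The forwarding table contents and the use of an output port number as the next hop value are assumptions made for illustration.

```python
import ipaddress

def longest_prefix_match(forwarding_table, destination):
    """Return the next hop of the most specific prefix containing destination.

    forwarding_table is a list of (prefix, next_hop) pairs; next_hop is a
    hypothetical output port number. Returns None if no prefix matches.
    """
    address = ipaddress.ip_address(destination)
    best_length = -1
    best_next_hop = None
    for prefix, next_hop in forwarding_table:
        network = ipaddress.ip_network(prefix)
        if address in network and network.prefixlen > best_length:
            best_length = network.prefixlen
            best_next_hop = next_hop
    return best_next_hop

table = [("0.0.0.0/0", 0), ("10.0.0.0/8", 1), ("10.1.0.0/16", 2)]
print(longest_prefix_match(table, "10.1.2.3"))   # prints 2 (the /16 is most specific)
print(longest_prefix_match(table, "192.0.2.1"))  # prints 0 (only the default route matches)
```

A hardware engine avoids the linear scan by walking a compressed trie held in SRAM, but the result, the next hop of the longest matching prefix, is the same.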
3.6.1.4 Smart Port Card 
The SPC is effectively a compact Intel computer that serves as a port processor when 
sandwiched between the line card and the FPX on an MSR [55]. Figure 16 illustrates 
the SPC architecture. 
Figure 16: The SPC architecture (diagram labels: 166 MHz Intel MMX embedded processor, L2 cache, North Bridge, DRAM, APIC, UART1 and UART2 interfaces, BIOS ROM, link interface, switch interface)











A 166 MHz MMX Intel chip constitutes the North Bridge processor of the SPC whilst 
a system FPGA provides the South Bridge functionality. The SPC has 64 MBytes of 
DRAM, as well as a Washington University ATM Port Interface Controller (APIC)
host-network interface [56]. The APIC is a network interface chip that allows the SPC 
to selectively intercept data travelling between the FPX and the switch. Each port of 
the APIC is capable of full duplex data rates up to 1.2 Gigabits per second. 
The SPC runs a modified version of NetBSD and can be booted over the network by
the control processor. The control processor first separately downloads the NetBSD 
kernel and filesystem to the SPC before initiating the booting of the SPC. The SPC 
contains a serial port that can be connected to a standalone workstation. This allows a 
user to open an SPC terminal window from the workstation. 
The SPC's role in the MSR is to perform buffering as well as to perform any non-
standard operations such as:
• Providing differential quality of service 
• Packet monitoring and marking 
• IP lookups in the absence of the FPX. 
Figure 17 illustrates a simplified IP processing data path for the SPC. 
Figure 17: A simplified IP processing data path for the SPC (diagram label: queue of packets from previous hop)




































When data packets arrive at the SPC APIC from the previous hop, they are sent to a
receive buffer. The SPC then performs device specific processing on the packets. This
could include operations such as monitoring for the purposes of billing, policing or
marking packets. If no FPX is present on that port, then the SPC must perform IP
lookups. The SPC is, however, not as fast as the FPX at performing this function.
Once all local processing is completed, the SPC sends the packets to the output
queues. There is a separate output queue for every destination port on the router. This
is designed to prevent head-of-line blocking. The packets are dequeued from these
output buffers according to a distributed queueing algorithm. This will be explained
further in Section 3.6.3.
When data packets arrive at the SPC APIC from the switch side, they are enqueued.
Device specific processing is performed on these packets as before. The packets are
then sent to the internal buffer. This is the primary output buffer for each port of the
router. It is here that buffer management mechanisms such as RED or fair queuing are
implemented. Packets are removed from this buffer as per the scheduling mechanism
and sent to the FIFO output queue. This queue, which is always kept short, is
dequeued by the APIC.
The SPC II, which was released in 2002, is an upgrade to the SPC. This port card
contains a 700 MHz Pentium-III processor as well as 256 MB of SDRAM [57] and is
intended for use by port applications requiring intensive computational power as well
as high data rates.
3.6.1.5 Line Card
The role of the line card is simply to provide physical layer translation between the
MSR and the connected network(s). The following media are supported: dual 155
Mbps OC-3 SONET, 622 Mbps OC-12 SONET, 1.2 Gbps HP G-Link as well as dual
1.2 Gbps HP G-Link. A Gigabit Ethernet line card is currently under development.
3.6.2 Booting and Configuration of the MSR
As stated previously, the control processor uses dedicated PVCs to communicate with
the MSR. There are VCIs that are dedicated to communication with the switch as well











send data to the control processor on VCI number 40 + i where i is their port number.
Two-way communication is thus possible.
MSR system configuration begins with ATM switch configuration. This sets up
communication paths between the control processor and all port processors.
Following this, the switch discovery phase can begin. Here, the control processor
ascertains which port processors and line cards are on each port. This is done by
sending control cells to each potential port processor. The receiving port processors in
turn report their presence as well as that of the adjacent cards. The control processor
then begins downloading a NetBSD kernel and a memory-resident file system to each
SPC. This is done using a multicast VC. The SPCs are then booted whereupon each
one is sent its port location using MSR control cells. The FPXs are then
initialised in a similar manner. In the case of the FPX, a program and configuration is
loaded into the application FPGA's reprogram memory. The control processor then
sends a control cell, which initiates the reprogramming of the FPGA using the
contents of the reprogram memory.
The MSR then runs Zebra [58], which is a route-finding program. The routing data
received is used to populate the control processor's routing table. Based on this, the
control processor computes the forwarding table for each port. If there is an FPX on a
given port, the forwarding table is sent there. If no FPX is present then the forwarding
table must be sent to the SPC, which will perform the lookups instead. Any changes
made to the routing table are propagated down to the forwarding tables.
3.6.3 The Operation of the MSR
The general mechanism whereby the MSR processes packets is as follows: When a
packet arrives at an input port, an IP lookup is performed. This lookup determines
which output port the packet should be switched to. The port processor sends the
packet to the switch using the ATM VCI that corresponds to this port. The switch then
forwards the packet to its appropriate output port based on the VCI that it was












It has been stated previously that although packets should primarily be buffered at the
output port, there are a number of packet queues throughout the MSR. In order to
ensure that the MSR behaves as an output queuing router, a distributed queuing
mechanism is employed [59]. This mechanism prevents sustained congestion on the
internal links of the MSR and in so doing prevents reductions in throughput. Put
differently: the MSR uses virtual output queuing to prevent internal congestion and
head-of-line blocking.
The mechanism employs a coarse scheduling approach whereby each input port
periodically (every )I)Ous) broadcasts its backlog to each output port. Output ports
similarly broadcast information about their output rates and queue lengths. Input ports
are thus able to calculate the rate at which they should send data to each output port.
Because a separate VCI is used to send data to each output port, the input ports are
easily able to adjust their VC's pacing on the APIC to control the flow of data towards
the output port. It is thus possible to keep data moving to the output ports in a timely
manner without causing internal congestion.
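The text does not give the rule by which input ports calculate their sending rates. As a hedged illustration only, the Python sketch below assumes a simple proportional rule: the rate that one output port can accept is divided among the input ports in proportion to the backlog each has reported for that output. The function name, the units and the proportional rule itself are assumptions and should not be read as the algorithm of [59].

```python
def paced_rates(backlogs, output_rate):
    """Split one output port's rate among input ports in proportion to backlog.

    backlogs maps an input port number to the bytes it holds for this output
    port; output_rate is the rate the output port can accept, in any
    consistent unit. The proportional rule is an illustrative assumption.
    """
    total = sum(backlogs.values())
    if total == 0:
        return {port: 0.0 for port in backlogs}
    return {port: output_rate * b / total for port, b in backlogs.items()}

# Three input ports report their backlog for one output port (rate in Mbps).
rates = paced_rates({1: 3000, 2: 1000, 3: 0}, output_rate=2400)
print(rates)
```

Because the allocated rates sum to the rate that the output port can accept, the internal link towards that output is never offered more traffic than it can carry, which is the property the distributed queuing mechanism is described as providing.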
Chapter 4 will describe the design issues taken into consideration when the proposed










Chapter 4 
Design Considerations 
4.1 Motivation for H-MAQ
Diffserv is a popular technology for implementing tiered service levels in IP
networks. One of the main reasons for this is its ability to scale to large network sizes.
The conventional wisdom is that a microflow-unaware active buffer management
mechanism such as RIO should be used in the core of Diffserv networks. The
drawbacks of using such mechanisms were discussed in detail in Section 2.6.2.1 and
can be summarised as being limited fairness, unpredictability, instability and
uncontrollability.
Not being able to predict the behaviour of a network quantitatively means that it is
only possible to offer relative service differentiation between classes and not absolute
guarantees [32]. The value of providing a relative service is questionable as SLAs are
defined in absolute, not relative terms. This means that network operators once again
have to resort to the extensive use of over-provisioning to guarantee services. This
results in the shortcomings given in Section 2.3. A further drawback is that because
the core's behaviour is unpredictable and highly dependent on traffic type, a large and











to provide a contracted service to a client. Such an outage can result in large penalties
being incurred.
As a solution to these problems, this study proposes the employment of core network
elements that are aware of competing microflows and aggregates, as well as being
aware of competing packets' DSCPs like RIO. Using such a mechanism has the
following benefits:
• Network managers would be able to explicitly allocate network resources to
aggregates making it possible to provide service guarantees on an aggregate basis.
• Fairness would be improved within aggregates. This would provide a more
predictable service to end-users.
• Access control could be improved to make better use of network resources. One
reason is that core routers would have access to explicit information about the
reservation and utilization of resources by aggregates along all internal network
links. Another reason is that resources would be explicitly allocated to aggregates
on all links of the network.
• Network utilization would be increased because a more predictable, controllable
and stable network core would reduce the amount of over-provisioning needed.
• Should a Diffserv network migrate to the proposed mechanism, external interfaces
with other networks or clients would not need to be changed. This is because the
mechanism concerns only the inner workings of the Diffserv network.
• Because the proposed mechanism uses a drop-from-front packet drop policy, TCP
responsiveness would be improved, as described in Section 3.5.
• Because the proposed mechanism provides the means to allocate different
maximum buffer sizes to different aggregates, it would be simple to support
specialized services such as a low delay service for individual aggregates. This is
described in Section 5.2.1.
Although it is possible to implement the proposed routers incrementally on a Diffserv
network, it is only once all core routers in the network implement H-MAQ and
effective admission control exists that the behaviour of traffic would become
predictable and controllable. Only once this predictability and controllability have










offered by a Diffserv network without significant over-provisioning. As was
mentioned above, should a network migrate to H-MAQ, it is only the network's
internal interfaces that would have to be modified. The network would still offer a
Diffserv service to the outside world and, as a result, existing external interfaces and
service level agreements would not need to be changed.
4.2 Overview of H-MAQ Networks 
This section gives an overview of the operation of H-MAQ networks. This is done in
three parts: first the behaviour required of the edge network elements is described, and
then the behaviour of the H-MAQ core network elements. Finally, the overall network
design is described.
4.2.1 Edge Network Element Behaviour 
With regards to user traffic, the behaviour of routers implementing Diffserv AF
where H-MAQ is implemented in the core is identical to that when standard RIO
routers exist in the core. These routers perform policing of aggregates to limit load on
the core. Aggregates are policed on a per-microflow basis. Excess is re-marked to a
higher drop precedence level or, in the extreme case, dropped.
The behaviour required of routers when H-MAQ is implemented in the core
does, however, differ from the standard case with regards to signalling. In addition to
standard Diffserv signalling, the proposed routers are required to reserve resources for
aggregates on core routers within the Diffserv network explicitly.
4.2.2 H-MAQ Core Network Element Behaviour 
The behaviour of core routers implementing H-MAQ differs markedly from that of
classic Diffserv core routers. The proposed routers implement a hierarchical
scheduling mechanism that provides guarantees to aggregates by explicitly allocating
link capacity to them. Within each aggregate there is a microflow-aware fair queuing
mechanism that allocates a separate FIFO queue to each microflow. These queues are
serviced using Deficit Round Robin (DRR) [60]. A drop precedence-aware variation
of QSDRR's drop policy is implemented on each aggregate. The benefits of this drop-










4.2.3 Overall Network Design 
The overall design of H-MAQ enabled Diffserv AF networks allows the control of
resource allocation at all network elements both on the edge and the core of the
network. This model differs from the standard Diffserv model where only the edge of
the network is controllable, and the core is both largely uncontrollable and
unpredictable. The design of H-MAQ will be specified more clearly in Chapter 5.
4.3 Feasibility of Proposed Mechanism 
The proposed design does raise the issue of scalability. This is because it has a
microflow-aware core and because it performs resource allocation at a finer
granularity than conventional Diffserv networks. There are two areas where the
scalability of the proposed mechanism could be brought into question. These are in
network signalling and in the processing of user traffic. These areas will be
considered next.
4.3.1 Signalling Feasibility of Proposed Mechanism
The proposed mechanism employs signalling paths whereby edge routers explicitly
allocate resources to aggregates on core links. Because this resource allocation is
performed on an aggregate basis, rather than on a per-microflow basis, as is the case
with Intserv, there is no reason why any scalability problems should be incurred. In
the extreme case, static provisioning would eliminate the need for such signalling
paths. Using dynamic mechanisms such as aggregate RSVP [61] would, however,
result in more flexible networks. Furthermore, the use of label-switching technologies
such as Multi Protocol Label Switching (MPLS) would make aggregate detection
simpler in Diffserv networks.
4.3.2 User Traffic Feasibility of Proposed Mechanism
It has been suggested that per-microflow fair link sharing is prohibitively expensive to
perform on router port processors because of its computational overhead. This issue
was considered in detail in Section 3.4.2 with the conclusion being that current port
processors are able to perform these functions at line speeds above those of current











In Section 1.1.1, it was stated that the usage of the Internet is doubling on a yearly
basis. Section 3.4 stated that router capacity increased by only a factor of ten in the
last five years. It is thus clear that router capacity is not currently matching the
Internet's growth. This situation is more sustainable than it may at first appear as not
only are router capacities increasing, but so too is the number of routers in the
Internet. Nonetheless, if the Internet of the future is to provide guaranteed service
levels together with high levels of resource utilization, router manufacturers will be
required to rise to the challenge of producing increasingly powerful microflow and
aggregate-aware routers.
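The gap between the two growth rates quoted above can be made explicit with a short calculation. The sketch below is illustrative; the function name is hypothetical and it assumes that both growth figures are compounded over the same five-year period.

```python
def growth_comparison(years, traffic_doubling_per_year, router_growth_over_period):
    """Compare compounded traffic growth with router capacity growth."""
    traffic_growth = traffic_doubling_per_year ** years
    annual_router_growth = router_growth_over_period ** (1 / years)
    shortfall = traffic_growth / router_growth_over_period
    return traffic_growth, annual_router_growth, shortfall

traffic, annual_router, shortfall = growth_comparison(5, 2, 10)
print(traffic)                   # traffic multiplies 32-fold in five years
print(round(annual_router, 2))   # router capacity grows about 1.58-fold per year
print(shortfall)                 # traffic outgrows a single router's capacity 3.2-fold
```

Under these assumptions, traffic grows by a factor of 32 in five years against a factor of ten for router capacity, leaving a shortfall of 3.2 that has to be absorbed by deploying more routers.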
4.4 Scope of Implementation 
This project aims to address the shortcomings related to Diffserv AF that were
described in Section 1.6.2.1. This is done through the proposal and evaluation of an
alternative buffer management mechanism (H-MAQ) to be used in core network
routers. The evaluations will consider all of the shortcomings discussed in Section
2.6.1.1 excepting the following two issues.
• The issue of the effect of packet size on fairness will be omitted as it can be 
remedied simply by employing a mechanism that takes packet size into account 
when determining whether to enqueue or drop arriving packets. This mechanism 
was described in Section 2.6.2.2. 
• The issue of instability will not be considered, as the author was unable to 
demonstrate that H-MAQ is more stable than RIO. This was due to limitations in 
the ability of the traffic sources to generate sufficient traffic volumes. It is, 
however, the author's opinion that H-MAQ does provide better stability than 
mechanisms such as RIO. This is because H-MAQ, unlike RIO, includes no 
low-pass filtering component. 
The following chapter gives a technical specification for H-MAQ. This specification 
forms the blueprint of the author's implementation. It is notable that this specification 
considers buffer management and scheduling only. No signalling is considered. It can, 
however, be argued that the required signalling functions are trivial due to the fact 
that should reservations be performed, this would happen on a per-aggregate basis. 











sub-class of service but the results apply to all four AF classes. This is because the AF 
classes are specified to behave independently. Only two drop precedence levels will 
be considered rather than the maximum permissible number of three. This conforms 











Chapter 5: Technical Specification for Implementation 
Chapter 5 
Technical Specification for 
Implementation 
5.1 Edge Router Mechanism 
The edge router is responsible for policing traffic entering the Diffserv AF network. 
The policing should occur on an aggregate basis as well as on a microflow basis. This 
is implemented as follows: The router periodically divides the data rate for each 
aggregate between its competing microflows. This yields the police rate (Rp) for each 
microflow. If a microflow's sending rate exceeds its police rate, the edge router 
begins marking that microflow's packets as being out-of-profile. This is done using 
the Time Sliding Window (TSW) tagger [25]. 
The TSW tagger consists of two distinct parts: a rate estimator and a tagging 
algorithm. The rate estimator estimates each microflow's sending rate upon each 
packet arrival. It maintains three state variables: Win_length, which is a 
pre-configured window length measured in units of time; Avg_rate, which is the current 



















Upon each packet arrival, the state variables are updated as follows: 
Bytes_in_TSW = Avg_rate * Win_length; 
New_bytes = Bytes_in_TSW + pkt_size; 
Avg_rate = New_bytes / ( current_time - T_front + Win_length ); 
T_front = current_packet_arrival_time; 
This decaying rate estimator produces the Avg_rate variable that is used by the 
tagging algorithm. The tagging algorithm looks for the point where a microflow rate 
exceeds 1.33 * Rp. At this point the tagger starts tagging packets as out-of-profile. 
Both the in- and out-of-profile packets are sent into the core of the Diffserv network. 
In the extreme case where the link connecting the edge router to the rest of the 
Diffserv network is congested, the edge router may drop packets. 
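The edge router behaviour described in this section can be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes an equal split of the aggregate rate between microflows, rates in bytes per second, a deterministic tag once the estimate exceeds 1.33 * Rp, and an initial rate estimate equal to the police rate. The class and function names are hypothetical.

```python
def police_rate(aggregate_rate, n_microflows):
    """Equal split of an aggregate's rate between its microflows (assumed policy)."""
    return aggregate_rate / n_microflows


class TswTagger:
    """Time Sliding Window rate estimator with a threshold tagger."""

    def __init__(self, rp, win_length, factor=1.33):
        self.rp = rp                  # police rate Rp, bytes per second
        self.win_length = win_length  # Win_length, seconds
        self.factor = factor          # tagging threshold multiplier
        self.avg_rate = rp            # assumption: estimate starts at Rp
        self.t_front = 0.0            # T_front, time of the previous packet

    def on_packet(self, now, pkt_size):
        """Update the estimate; return True if the packet is out-of-profile."""
        bytes_in_tsw = self.avg_rate * self.win_length
        new_bytes = bytes_in_tsw + pkt_size
        self.avg_rate = new_bytes / (now - self.t_front + self.win_length)
        self.t_front = now
        return self.avg_rate > self.factor * self.rp


rp = police_rate(100_000, 100)  # 1000 bytes/s per microflow
conforming = TswTagger(rp, win_length=1.0)
fast = TswTagger(rp, win_length=1.0)
# 500-byte packets every 0.5 s (at Rp) and every 0.25 s (twice Rp).
tags_ok = [conforming.on_packet(0.5 * (i + 1), 500) for i in range(20)]
tags_fast = [fast.on_packet(0.25 * (i + 1), 500) for i in range(20)]
```

In this example the flow sending at Rp is never tagged, whereas the flow sending at twice Rp is tagged once its decaying estimate climbs past the threshold.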
5.2 Core Router Mechanism 
The proposed H-MAQ core router mechanism is described in this section. The section 
begins with a discussion on the allocation of internal buffers. Thereafter, the data flow 
is described. Finally, H-MAQ's drop policy is described. 
5.2.1 H-MAQ Internal Buffer Allocation 
As described in Section 3.3, the internal buffer of a router port is the primary buffer 
for packets that are to exit on that port. In general, ports with higher data output rates 
can tolerate larger internal buffers without incurring excessive delay. This is because 
they are dequeued more frequently. The amount of buffer space that is allocated to the 
internal buffers of output links is governed by the trade-off between link utilization 
and permissible delay. Allowing longer queues results in increased delays whilst 
shorter queues can result in underflow leading to reduced link utilization. 
H-MAQ uses a per-aggregate buffer allocation policy. This means that each aggregate 











space that is allocated to an aggregate should be proportional to its share of the output 
link. This would provide similar delay and utilization characteristics for all 
aggregates. Nonetheless, this need not be the case. Should a given client require a 
customised service, such as a low-delay service, this could be provided simply by 
reducing the amount of internal buffer space allocated to that aggregate. This point 
highlights the high degree of flexibility that the proposed solution provides. 
5.2.2 H-MAQ Data Flow 
H-MAQ routers implement a hierarchical scheduling mechanism that provides 
guarantees to aggregates by explicitly allocating link capacity to them. The core 
routers are required to perform aggregate-aware and microflow-aware queuing. In 
addition, the mechanism takes into account the drop precedence of packets. Figure 18 
shows an example of a logical data path that may be found on the output port of a core 
router implementing H-MAQ. 
Figure 18: An example of an H-MAQ logical data path 
This is an example of a data path for the case where there are two aggregates sharing 
an output link. In practice, the number of aggregates could be far greater. The data 











allocated to a separate aggregate. Within each bank of queues, there is a separate 
queue for each microflow. There is a microflow scheduler allocated to each aggregate. 
This scheduler services each microflow queue belonging to that aggregate in turn 
using Deficit Round Robin (DRR). In the final stage, the aggregate scheduler puts the 
traffic from each aggregate onto the output link according to the aggregates' service 
level agreements. Any excess available bandwidth is shared out amongst backlogged 
aggregates in proportion to their service level agreements. 
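The per-aggregate microflow scheduler described above can be sketched with a textbook Deficit Round Robin round. This is an illustration under stated assumptions rather than the MSR code: every microflow is given the same quantum (the text does not specify one), queues hold packet sizes in bytes, and the names are hypothetical.

```python
from collections import deque


def drr_round(queues, deficits, quantum):
    """Serve one DRR round over an aggregate's microflow queues.

    queues maps a flow id to a deque of packet sizes in bytes; deficits maps
    a flow id to its current deficit counter. Returns the packets sent, in
    order, as (flow id, size) pairs.
    """
    sent = []
    for fid, q in queues.items():
        if not q:
            deficits[fid] = 0          # idle flows do not accumulate credit
            continue
        deficits[fid] += quantum
        # Send head-of-line packets while the deficit covers them.
        while q and q[0] <= deficits[fid]:
            size = q.popleft()
            deficits[fid] -= size
            sent.append((fid, size))
        if not q:
            deficits[fid] = 0          # queue emptied, discard leftover credit
    return sent


queues = {"a": deque([500, 500, 500]), "b": deque([200, 200])}
deficits = {"a": 0, "b": 0}
first_round = drr_round(queues, deficits, quantum=500)
second_round = drr_round(queues, deficits, quantum=500)
```

Each backlogged microflow receives up to one quantum of service per round, which is what gives the microflows within an aggregate an equal share irrespective of how aggressively they send.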
5.2.3 H-MAQ Drop Policy 
H-MAQ uses a drop policy that is an enhancement of the QSDRR drop policy. This 
enhancement extends QSDRR to be able to handle multiple drop precedence levels as 
well as multiple aggregates. The drop policy maintains a separate drop queue pointer 
for each aggregate. This improves isolation between aggregates. The drop policy for a 
single aggregate is explained below. This mechanism is duplicated for each aggregate. 
If the arrival of a packet means that an aggregate has exceeded its allowable buffer 
space, a search is performed to find the microflow queue with the highest number of 
bytes belonging to out-of-profile packets. This queue becomes the drop queue. The 
out-of-profile packet that is closest to the front of the drop queue is discarded. 
Assuming that the aggregate remains in excess of its allowable buffer space, packets 
will continue to be removed from the drop queue until it has fewer out-of-profile 
bytes than any other queue. Once this happens, the queue with the most out-of-profile 
bytes becomes the new drop queue. This process continues until the aggregate is no 
longer in excess of its allowable buffer space. 
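The out-of-profile drop procedure above can be sketched for a single aggregate as follows. This is a hedged illustration and not the author's kernel module: buffer space is assumed to be counted in bytes, the class and attribute names are hypothetical, and "fewer out-of-profile bytes than any other queue" is read literally, so the drop queue is retained until every other queue holds more out-of-profile bytes or it has none left. The case in which no out-of-profile packets are enqueued is deliberately left out.

```python
from collections import deque


class AggregateBuffer:
    """Per-aggregate buffer with one queue per microflow (illustrative)."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.queues = {}        # flow id -> deque of (size_bytes, in_profile)
        self.drop_queue = None  # drop queue pointer, kept between drops
        self.dropped = []       # flow ids of discarded packets, in order

    def occupancy(self):
        return sum(size for q in self.queues.values() for size, _ in q)

    def out_bytes(self, fid):
        return sum(size for size, in_p in self.queues[fid] if not in_p)

    def _select_drop_queue(self):
        out = {fid: self.out_bytes(fid) for fid in self.queues}
        cur = self.drop_queue
        others = [b for fid, b in out.items() if fid != cur]
        # Keep the current drop queue unless it is exhausted or has fewer
        # out-of-profile bytes than every other queue.
        keep = (cur in out and out[cur] > 0
                and not (others and out[cur] < min(others)))
        if not keep:
            cur = max(out, key=out.get)
        self.drop_queue = cur
        return cur if out[cur] > 0 else None

    def enqueue(self, fid, size, in_profile):
        self.queues.setdefault(fid, deque()).append((size, in_profile))
        while self.occupancy() > self.limit_bytes:
            victim = self._select_drop_queue()
            if victim is None:   # no out-of-profile packets remain
                break
            q = self.queues[victim]
            # Discard the out-of-profile packet closest to the front.
            idx = next(i for i, (_, in_p) in enumerate(q) if not in_p)
            del q[idx]
            self.dropped.append(victim)


buf = AggregateBuffer(limit_bytes=1500)
buf.enqueue("A", 500, False)
buf.enqueue("A", 500, False)
buf.enqueue("B", 500, False)
for _ in range(3):
    buf.enqueue("C", 500, True)   # in-profile arrivals force three drops
```

In this run the two out-of-profile packets of flow A are discarded before the drop queue pointer moves to flow B, which shows the hysteresis that the separate per-aggregate pointer provides.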
Should the network be ill provisioned, it is possible that an aggregate could be in 
excess of its allowable buffer space whilst that aggregate has no out-of-profile packets 
enqueued. If this happens, a search is performed to find the queue with the most bytes 
in it. This queue becomes the drop queue. The packet that is at the front of the drop 
queue is discarded. Assuming that the aggregate remains in excess of its allowable 
buffer space and that no out-of-profile packets arrive, packets will continue to be 
removed from the drop queue until it has fewer bytes than any other queue. Once this 











an out-of-profile packet arrives, the drop queue will be changed to the queue with the 











Chapter 6: Design and Implementation of Testbed 
Chapter 6 
Design and Implementation of Testbed 
6.1 Choice of Platform 
Both discrete event simulations and network emulation can be used to evaluate 
network architectures. Network emulation is considered to be more desirable as it 
models real networks more closely. The following points relate to these models: 
• Emulated networks run real protocols. This improves their accuracy in modelling 
real networks. 
• The processing overhead in emulated networks is real. This, once again, improves 
their accuracy. 
• The accuracy of simulators as well as the modelling assumptions that they use are 
questionable. 
For these reasons the author chose network emulation to evaluate the proposed 
network architecture. 
The author considered using The Click Modular Router as his testbed router 
implementation [62]. Click routers run on multipurpose Linux machines and are 










against using The Click Modular Router because multipurpose machines are not 
representative of the physical architectures of real Internet routers. 
The Washington University MSR was chosen as the platform for the author's 
implementation. This router's architecture is described in detail in Section 3.6 and is 
representative of current high-performance routers. Because the MSR is a 
development router, the code running on its SPCs is open source. This gave the author 
full control of the router's behaviour. It also meant that the author had access to many 
programming tools and macros that had been written for MSR developers. 
6.2 Hardware Components 
The testbed was implemented using a single MSR and two end stations. The MSR 
consisted of a WUGS with an SPC and a line card on two of its ports. No FPXs were 
used in the testbed as performing fast IP lookups was not considered to be of critical 
importance. 
The end stations were each implemented on 1GHz Pentium machines running 
NetBSD 1.4.1 as their operating systems. Each end station had an APIC card that was 
connected to one of the two MSR line cards using fibre-optic cables. The MSR's 
control processor was run on one of the end stations. 
6.3 Network Topology 

























This topology includes two aggregate traffic sources that send data to a Diffserv 
network. When the packets reach the Diffserv network edge routers, policing and 
remarking takes place. This occurs on a microflow basis according to the aggregates' 
SLAs. The packets are then sent to the core router and from there to the traffic sink. 
The link from the core router to the traffic sink is the primary resource that traffic 
sources compete for. The ACKs return to the traffic sources along the same path. 
There is no network congestion along the ACK path. A delay is incurred on all links 
and network elements. 
The previous conceptual topology was emulated using the hardware components 
described in Section 6.2. Figure 20 shows how the topology was implemented using 
these components. 
[Diagram: NetBSD machine (2 aggregate traffic sources; 2 edge routers), MSR (core router; delay emulator), NetBSD machine (traffic sink; data collection)] 
Figure 20: Implementation of testbed 
Figure 20 shows that both aggregate traffic sources and both edge routers were 
implemented on a single NetBSD machine. The traffic sink for both aggregate traffic 
sources was also implemented on a single NetBSD machine. The delay emulator was 
implemented on Port 1 of the MSR and affected packets on the return path only. The 
delay emulator's implementation allowed the author to select which traffic flows or 
aggregates should incur a delay, and how long that delay should be. Both RIO and H-
MAQ were implemented alternately on both ports of the MSR. The following sections 
describe the components of the testbed in detail. 
6.4 Traffic Sources 
It is important that the traffic sources are able to respond to feedback from the 
network. As a result, the use of traces as traffic sources is not appropriate. A number 
of existing software packages simulate network traffic sources. The author considered 











not run on NetBSD. The author thus decided to write his own traffic generator 
program. 
When designing a network traffic generator, it is necessary to consider the nature of 
Internet traffic. Not only is the current makeup of Internet traffic applications 
divergent, as shown in Section 2.2.2, but this is changing continually. Furthermore, 
one needs to consider what the effect on application traffic makeup will be when 
technologies such as Diffserv make the transport of real-time traffic on IP networks 
possible. Presumably the best effort data traffic will continue to increase as it has been 
doing, but there will be a great increase in real-time traffic such as streaming voice 
and video. Because such traffic is likely to be carried using Diffserv, the application 
makeup of future Diffserv traffic may be very different from what it is currently. 
The author considered the viability of simulating streaming GSM encoded voice as 
well as streaming MPEG 2 encoded video. These would require data rates of 13 Kbps 
and 6 Mbps respectively. Because the author's evaluations required hundreds of traffic 
sources to be interleaved, it was clear that the modelling of all of these traffic 
sources would not be possible on the testbed due to hardware limitations. 
The author thus decided to implement a generic traffic source that did not attempt to 
mimic the data flow of any application in particular. An advantage of this approach is 
that using a single generic traffic source instead of simulating a number of 
applications eliminated an extraneous variable from the experiments. Conclusions 
could thus be drawn more clearly from testbed data. 
The aggregate traffic source consisted of between 50 and 300 independent processes. 
Each process represented an application level traffic source. Each application traffic 
source opened a single TCP or UDP socket that was used to send packets into the 
network. The packets were sent into the network with an exponentially distributed 
random inter-arrival time. This minimized correlation between the packet send times 
of each application traffic source. In times of extreme network congestion, the 
exponential traffic sources would decay into persistent sources. The TCP sources 
were, of course, also limited by TCP flow-control. The application traffic source 











resources. The packet size used was 500 bytes, which is close to the mean value for 
Internet traffic. The average data rate of the traffic and the transport protocol were 
determined by the experiment. 
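The send schedule of one application traffic source, as described in this section, can be sketched as follows. This is an illustration only: it assumes 1 Kbps = 1000 bits per second, uses Python's standard random module in place of whatever generator the author's program used, and omits the socket handling. The function names are hypothetical.

```python
import random

PACKET_SIZE_BYTES = 500  # packet size stated in the text


def mean_interval(target_rate_bps, packet_size=PACKET_SIZE_BYTES):
    """Mean time between packets needed to average the target rate."""
    return packet_size * 8 / target_rate_bps


def send_times(target_rate_bps, duration_s, rng):
    """Packet send times with exponentially distributed gaps."""
    mean = mean_interval(target_rate_bps)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(1.0 / mean)  # expovariate takes the rate 1/mean
        if t >= duration_s:
            return times
        times.append(t)


# A 10 Kbps source observed over a long interval.
times = send_times(10_000, 10_000.0, random.Random(1))
achieved_bps = len(times) * PACKET_SIZE_BYTES * 8 / 10_000.0
```

A 10 Kbps source with 500-byte packets therefore sends one packet every 0.4 seconds on average, and because each source draws its gaps independently, the send times of different sources are uncorrelated.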
6.5 Edge Router 
As stated in Section 5.1, the role of the edge router is to police and, where necessary, 
re-mark packets. These functions are performed on a per-microflow basis using the 
algorithm described in Section 5.1. The policing and re-marking was performed on 
the same NetBSD machines as the traffic sources. Because the policing and re-
marking is performed on a per-microflow basis, it was decided that each application 
traffic source process would also be responsible for policing that microflow. The 
DSCP of each departing packet thus only needed to be set once. The police rate was 
determined by the experiment. 
6.6 Core Router 
The core router was implemented using an MSR. The MSR's external link rates were 
set to 4 Mbps and its internal link rates were set to 6 Mbps. To ensure that the router's 
backlogged packets were queued primarily at the internal buffers of SPCs, the rate at 
which packets were dequeued from here was set to 1.5 Mbps. The size of the internal 
buffers was limited to 200 packets. Because of the low load on the input ports of the 
router and the fact that only two ports were used, it was not necessary to use 
distributed queuing on the MSR. 
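The figures above imply an upper bound on the queuing delay at the internal buffer, which the following sketch computes. It assumes that every buffered packet is 500 bytes (the packet size used by the traffic sources) and that 1 Mbps = 10^6 bits per second; the function name is illustrative.

```python
def drain_time_s(buffer_packets, packet_bytes, dequeue_rate_bps):
    """Time needed to drain a full buffer at the given dequeue rate."""
    return buffer_packets * packet_bytes * 8 / dequeue_rate_bps


# 200 packets of 500 bytes drained at 1.5 Mbps.
max_queuing_delay = drain_time_s(200, 500, 1.5e6)
print(round(max_queuing_delay, 3))
```

Under these assumptions a full 200-packet buffer takes roughly 0.53 seconds to drain, which is the largest queuing delay a packet could experience at this point in the testbed.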
The MSR uses a hashing table to identify which microflow the packets that are to be 
enqueued in the internal buffer belong to. Because the microflows in the experiments 
differed only in destination port and protocol, it was necessary to modify the hashing 
function to give greater weight to these elements. It was also necessary to increase the 
total size of the hash table to 10 000. Only once these modifications had been made 
did each microflow tend to hash to a unique hash value. 
In order to evaluate the performance of H-MAQ, it was necessary to test its 
performance as well as the performance of the best performing microflow-unaware 











mechanisms were thus implemented on the router. The mechanisms were responsible 
for buffering packets before sending them towards their destination as well as for 
providing fair link utilization. 
The mechanisms were implemented as kernel modules on the SPCs. The RIO 
implementation was written following the relevant specifications [25,28]. A simple 
FIFO queue that was written by F. Kuhns of Washington University was used as the 
basis for the author's RIO implementation. Refer to Appendix B for a description of the 
RIO development and testing procedure. As stated in Section 2.6.1.2, RIO includes a 
number of tuneable parameters. The author chose values for the in- and out-of-profile 
Linterm (LintermIn and LintermOut) as recommended by the RIO designers. The 
choice of ThreshIn, MaxThreshIn, ThreshOut and MaxThreshOut was determined 
firstly by the acceptable buffer occupancy levels for experiments, as well as by the 
designers' recommendation of having staggered parameters with maximum thresholds 
of at least twice the minimum thresholds. The q_weight parameter was chosen so as to 
fall within the range set by the RED designers. The RED designers specify that 
q_weight should be above 0.001 and below the value that is determined by the 
following equation: 
$$ L + 1 + \frac{(1 - q\_weight)^{L+1} - 1}{q\_weight} < Thresh $$
Where L = the maximum burst size. Letting Thresh = MaxThreshIn = 30 and 
allowing a maximum burst size of 40 packets, the following constraint was arrived at: 
q_weight < 0.09. Thus, 
0.09 > q_weight > 0.001. 
For this simulation it was found that a value of 0.04 for q_weight produced good 
results by allowing bursty traffic whilst converging to equilibrium in an acceptable 
time. 
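The upper bound on q_weight quoted above can be checked numerically. The sketch below assumes that the constraint is the burst inequality given by the RED designers, L + 1 + ((1 - q_weight)^(L+1) - 1) / q_weight < Thresh, and finds the largest admissible q_weight by bisection; the function names are illustrative.

```python
def red_burst_lhs(q_weight, burst):
    """Left-hand side of the RED burst constraint for a burst of L packets."""
    return burst + 1 + ((1 - q_weight) ** (burst + 1) - 1) / q_weight


def max_q_weight(burst, thresh, lo=0.001, hi=1.0, iters=60):
    """Largest q_weight satisfying the constraint, found by bisection.

    The left-hand side increases with q_weight on (0, 1], so a single
    crossing point exists between lo and hi for the values used here.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        if red_burst_lhs(mid, burst) < thresh:
            lo = mid
        else:
            hi = mid
    return lo


bound = max_q_weight(burst=40, thresh=30)
print(round(bound, 3))
```

With L = 40 and Thresh = 30 the bound evaluates to just under 0.09, in agreement with the constraint stated in the text, and the chosen value of 0.04 satisfies the inequality comfortably.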



















MaxThreshIn = 160 
ThreshIn = 80 
MaxThreshOut = 60 
ThreshOut = 30 
H-MAQ was implemented according to the specification given in Section 5.2. The 
author's implementation used static provisioning of resources to the two aggregates as 
well as static drop thresholds on each queue. If ever an aggregate was to be dequeued, 
but it had no backlogged packets, the excess capacity would be given to the other 
aggregate. The author's implementation was created by extending a QSDRR buffer 
management mechanism written by A. Kantawala and F. Kuhns of Washington 
University. 
6.7 Traffic Sink 
The traffic sink was implemented on a single NetBSD machine. This machine, which 
was connected to Port 2 of the MSR, was responsible for listening for incoming UDP 
and TCP connections. Once a connection had been made, the sink would receive all 
incoming packets. Upon receiving a packet, the sink would write the data pertaining 
to that packet to a file. Once a packet's relevant data had been stored the packet was 
discarded. 
6.8 Delay Emulator 
The author used a loadable kernel module, written by A. Kantawala, on the MSR to 
emulate network delay. This module buffered arriving packets for a fixed duration as 
determined by the experiment. Because it is possible to specify which microflows the 
delay emulator affects, the author was able to emulate different round trip times for 
different microflows or aggregates. The delay emulator was run on Port 1 of the MSR 











6.9 Data Collection 
Data collected during experiments needed to include the data rate of microflows and 
aggregates as well as the transit delay for packets. Experimental data collection was 
performed as follows. 
All end stations were synchronised using the Network Time Protocol (NTP), which is 
an open source program used to synchronise computers over networks [65]. NTP is 
said to provide sub-millisecond precision when used to synchronise computers that 
are on the same LAN [66]. The NTPQ query program was used to inspect the offsets 
between two of the end stations. The results of this command, run at 11:40 on 05 
March 2003, confirmed an offset of only 7 microseconds. This is illustrated in Figure 
21. 
refid st t when reac~ delay offset 
*yucatan.ucc.eXI LOCAL(O) u 1 64 377 0.420.0G7 C.93 
Figure 21: A sample ntpq -p command's output 
Before sending data packets, traffic sources would check the current time using the 
gettimeofday function and place this in the packet payload. The packets would pass 
through the testbed network until being received by the traffic sink. Upon receipt, the 
sink checked the current time and wrote this, together with the time that the packet 
was sent, to a log file. Using the send time and the arrival time, it was possible to 
determine the transit delay of the packet. In addition to the send and receive times, 
other information pertaining to the packets was stored. This information included the 
DSCP of the packet, the source address of the packet, the source port of the packet 
and the packet size. A log file was thus produced that detailed the network behaviour 
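The timestamping scheme described above can be sketched as follows. This is an illustration under stated assumptions: the payload layout (two unsigned 32-bit fields holding seconds and microseconds) is hypothetical, and Python's time.time stands in for the gettimeofday call used on the end stations.

```python
import struct
import time

# Assumed payload header: seconds and microseconds, network byte order.
HEADER = struct.Struct("!II")


def stamp_payload(payload_size, now=None):
    """Build a payload of the given size that starts with the send time."""
    now = time.time() if now is None else now
    sec = int(now)
    usec = int((now - sec) * 1_000_000)
    return HEADER.pack(sec, usec).ljust(payload_size, b"\0")


def transit_delay(payload, recv_time):
    """Transit delay in seconds, given the receiver's clock reading."""
    sec, usec = HEADER.unpack_from(payload)
    return recv_time - (sec + usec / 1_000_000)


packet = stamp_payload(500, now=1000.25)
delay = transit_delay(packet, recv_time=1000.375)
```

The computed delay is only meaningful because the sender and receiver clocks are synchronised, which is why the small NTP offset reported above matters.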












Chapter 7: Evaluations, Results and Analysis 
Chapter 7 
Evaluations, Results and Analysis 
This chapter describes the evaluations that were performed on the testbed, the results 
obtained and an analysis of the results. The evaluations compare the performance of a 
Diffserv AF network with RIO implemented in its core routers to a Diffserv AF 
network with H-MAQ implemented in its core routers. For each round of evaluations, 
the set-up of the testbed is described. The results of the evaluation and then an 
analysis of the results follow this. Note that because all data was collected at the 
traffic sink, all statistics for throughput and fairness quote the effective throughput 
values, also known as goodput. 
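The proportions of goodput reported in this chapter can be derived from the sink's log along the following lines. This is a sketch, not the author's analysis script: the record layout and function name are hypothetical, and the default window reflects the 10 to 90 second collection interval used in the evaluations that follow.

```python
def goodput_share(records, group, start=10.0, end=90.0):
    """Fraction of received bytes that belong to a group of flows.

    records is an iterable of (recv_time_s, flow_id, size_bytes) tuples taken
    from the sink's log; only packets received inside the window count.
    """
    total = in_group = 0
    for recv_time, flow_id, size in records:
        if start <= recv_time < end:
            total += size
            if flow_id in group:
                in_group += size
    return in_group / total if total else 0.0


log = [
    (5.0, "tcp1", 500),    # before the window, ignored
    (20.0, "tcp1", 500),
    (30.0, "udp1", 500),
    (40.0, "tcp2", 500),
    (95.0, "udp1", 500),   # after the window, ignored
]
tcp_share = goodput_share(log, group={"tcp1", "tcp2"})
```

Because only packets that actually reach the sink are counted, the resulting share is a goodput figure rather than a measure of offered load.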
7.1 Microflow Fairness - Transport Protocol 
Following is a comparison between the performance of a Diffserv network that has 
RIO in its core and that of a Diffserv network with H-MAQ implemented in the core, 
using microflow fairness according to transport protocol as its performance metric. 
The scenario of competing individual end-users within an aggregate using different 
transport layer protocols is being emulated here. An example could be the case where 
one user is browsing the web (TCP), and their colleague is simultaneously using a 











7.1.1 Evaluation 
In this round of evaluations, 100 microflow traffic sources were started. Of these, 90 
were TCP sources and the remaining 10 were UDP sources. The TCP to UDP ratio 
was thus similar to what would be found in the Internet. The sources all started 
between 0 and 2 seconds. They sent packets through the network for 100 seconds. 
Data was collected between 10 and 90 seconds. The target rate of the sources and the 
police rate were as described in the Results section. 
7.1.2 Results 
Figure 22 gives the results of the evaluation that considers microflow fairness 
according to transport protocol. 
[Plot: Proportion of Bandwidth Consumed by TCP Sources according to Target Rate] 
Figure 22: The proportion of bandwidth consumed by the TCP sources when they make up 90% 
of the competing sources 
Figure 22 shows the results of 20 experiments in which the source target rates, the 
police rates, as well as the core router mechanism were all varied. The graph shows 
that when the sources' target rates were set to 10 Kbps, the TCP sources consumed 
between 89 and 90% of the used bandwidth. This was irrespective of whether RIO or 
H-MAQ was used in the core. When the source target rates were set to 30 Kbps and 











bandwidth. When H-MAQ was used in the core, the TCP sources consumed 
approximately 77% of the used bandwidth. Variations in the police rate had little 
effect on the proportion of bandwidth consumed by the TCP sources. 
7.1.3 Analysis of Results 
Because the TCP sources make up 90% of the total number of sources, they should 
ideally consume that proportion of the bandwidth. However, because only TCP has 
transport level flow-control, it receives significantly less than 90% as congestion 
levels increase. This is demonstrated in Figure 22. A comparison between the 
performance of RIO and H-MAQ reveals that when the target rates were 10 Kbps and 
there was thus no congestion, both the RIO and H-MAQ networks showed good 
results. When the target rates were 30 Kbps, both the RIO and the H-MAQ network 
cores showed reduced fairness. The RIO core was, however, 4% fairer to the TCP 
sources than the H-MAQ core. 
The RIO core performed better than the H-MAQ core because of differences between 
the traffic profiles of TCP and UDP. Whereas UDP has a smooth traffic profile, TCP 
has a more bursty traffic profile due to its flow-control mechanism. This is explained 
in Section 2.1.3.2. Because the RIO drop mechanism monitors the average buffer 
occupancy level rather than the current buffer occupancy level as H-MAQ does, 
TCP's bursts are more easily absorbed by the RIO buffer. This puts TCP at an 
advantage. 
The performance comparison between H-MAQ and RIO may be summarised as 
follows: In times of low congestion, H-MAQ and RIO had similar performance. 
When the congestion levels were high, RIO performed 4% better than H-MAQ. 
7.2 Microflow Fairness - Round Trip Times 
Following is a comparison between the performance of a Diffserv network that has 
RIO in its core and that of a Diffserv network with H-MAQ implemented in the core, 
using microflow fairness according to round trip time as its performance metric. The 
scenario being emulated here is that of competing individual TCP end-users within an 











where one user is performing an FTP transfer to a nearby branch, and their colleague 
is simultaneously doing an FTP transfer to a branch in a distant city. Only TCP is 
considered because the concept of round trip time is meaningless in the case of UDP. 
7.2.1 Evaluation 
In this round of evaluations, 120 TCP microflow traffic sources were started between 
0 and 2 seconds. They sent packets through the network for 100 seconds. Data was 
collected between 10 and 90 seconds. The target rate of the sources and the police rate 
were as described in the Results section. On the return path, half of the microflow 
traffic sources were subject to a delay with approximate values of 50, 200 or 400ms. 
7.2.2 Results 
Figure 23 gives the results of the evaluation that considers TCP microflow fairness 












[Plot: Proportion of Bandwidth Consumed by Delayed Microflows According to Target Rate; x-axis: Sender Target Rate (Kbps); series: H-MAQ and RIO, by delay] 
Figure 23: The proportion of bandwidth consumed by the delayed sources according to target 











Figure 23 shows the results of 30 experiments in which the round trip times for half of 
the microflows were varied, as well as the source target rates and the core router 
mechanism. This shows that in general, as the delay time increases, fairness 
decreases. On the left-hand side of the graph, the delayed sources received 
approximately 50% of the resources. This proportion decreased persistently as the 
















[Plot: Proportion of Bandwidth Consumed by Delayed Microflows; x-axis: Police Rate (Kbps)] 
Figure 24: The proportion of bandwidth consumed by the delayed sources according to police 
rate when the target rate is 50 Kbps and the delay is 200ms 
Figure 24 shows the results of 8 experiments where the target rate was set to 50 Kbps 
and the delay for half of the microflows was set to 200ms, but both the police rate and 
the core router mechanism were varied. The resultant series are flat graphs for both 
RIO and H-MAQ. 
7.2.3 Analysis of Results 
Because half of the TCP microflows were delayed in the experiments, the delayed 
sources should ideally consume half of the used bandwidth. The delayed sources 
tended to receive less than this proportion due to the fact that having long round trip 











shows that for both RIO and H-MAQ, the longer the round trip time of the delayed 
microflows, the less bandwidth was consumed by these sources. 
A further observation is that as the target rates increase, the fairness decreases. This 
phenomenon is explained as follows: In Figure 23, when the target rates were only 5 
Kbps, TCP flow-control did not limit sending rates. As the target rates increased, the 
inability of the delayed sources to ramp up their sending rates proportionately became 
a limiting factor. This trend continued towards the right-hand side of the graph where 
the delayed sources were at the greatest disadvantage. 
Figure 23 shows that H-MAQ provided fairness that varied from being equal to that of 
RIO to being 18% better than RIO. 
Figure 24 shows that varying the police rate had no appreciable effect on the relative 
throughput of the sources that varied in terms of their round trip times. This held true 
for both RIO and H-MAQ. 
7.3 Aggregate Fairness and Control - Transport Protocol 
Following is a comparison between the performance of a Diffserv network that has 
RIO in its core and that of a Diffserv network with H-MAQ implemented in the core, 
using aggregate fairness and control according to transport protocol as its 
performance metric. This round of evaluations emulates the scenario when two 
competing aggregate clients of a Diffserv network are identical in terms of SLA and 
number of microflows, but whilst one client sends predominantly TCP traffic, the 
other sends predominantly UDP traffic. 
7.3.1 Evaluation 
In this round of evaluations, 300 microflow traffic sources were started. Of these, half 
belonged to Aggregate 1 and half belonged to Aggregate 2. Aggregate 1's sources all 
used TCP and Aggregate 2's sources all used UDP. The sources all started between 0 
and 2 seconds. They sent packets through the network for 100 seconds. Data was 











were as described in the Results section. In the case of H-MAQ, each aggregate was 
explicitly allocated half of the available bandwidth. 
7.3.2 Results 
Figure 25 gives the results of the evaluation that considers aggregate fairness 
according to transport protocol. 
[Plot: Proportion of Bandwidth Consumed by TCP Traffic Aggregate According to Target Rate; x-axis: Target Rate (Kbps); series: H-MAQ, RIO] 
Figure 25: The proportion of bandwidth consumed by the TCP aggregate according to target 
rate when all packets are in-profile 
Figure 25 shows the proportion of used bandwidth consumed by the TCP aggregate 
when all packets are marked as being in-profile. When the target rate was 5 Kbps,
50% of the throughput was from the TCP aggregate. This was so for both RIO and H-
MAQ. In the case of RIO, as the target rates increased the proportion of TCP 
aggregate traffic decreased linearly. In the case of H-MAQ, increases in the target
rate had no effect on the proportion of TCP aggregate traffic.
[Figure 26 plot. Axes: Police Rate (Kbps) and Target Rate (Kbps).]
Figure 26: The proportion of bandwidth consumed by the TCP aggregate according to police 
and target rates when RIO is used in the network core 
Figure 26 shows the proportion of used bandwidth consumed by the TCP aggregate 
when RIO is implemented in the core of the network. This figure shows how this 
proportion varies according to both the target rate and the police rate. The graph
shows that although the TCP aggregate's proportion of bandwidth decreases with
increased target rates, there is an optimal police rate that increases the TCP
aggregate's proportion. In the case when the target rate is 10 Kbps, the proportion of
bandwidth consumed by the TCP aggregate levels off at 30% but peaks at 41% where
[Figure 27 plot. Axes: Police Rate (Kbps) and Target Rate (Kbps).]
Figure 27: The proportion of bandwidth consumed by the TCP aggregate according to police 
and target rates when H-MAQ is used in the network core 
Figure 27 shows the results for the same evaluation as before, except that in this case 
the core router implemented H-MAQ. This graph shows that for all police rates and 
target rates, the TCP aggregate received a constant 50% of the used bandwidth. 
7.3.3 Analysis of Results 
Because the aggregates are identical in terms of their SLA, they should each consume 
half of the used bandwidth. In Figure 25, when the target rate was only 5 Kbps, the
TCP aggregate received 50% of the bandwidth. This is because there was little 
contention within the network and the microflow sources were all able to send packets 
at their target rates. In the case where RIO was implemented in the core, as the target 
rate increased, the proportion of bandwidth consumed by the TCP aggregate 
decreased linearly. This decrease signified degradation in the network's performance. 
In the case of H-MAQ, the TCP aggregate consumed a fixed 50% for all target rates.
Figure 26 demonstrates that when the RIO network is congested, aggregate fairness 
may be improved by applying effective policing. The "hump" in the graph is
explained as follows: When the police rate was low, all packets were out-of-profile
and were thus treated identically. When the police rate was high, all packets were
in-profile and were still treated identically. When police rates were similar to the target
rates, the situation arose where most UDP packets were out-of-profile and most TCP
packets were in-profile. This is because TCP's flow-control was causing the TCP
packets to be preferentially marked. It was thus shown that RIO's performance can be
improved by up to 37% when effective policing and marking are performed at the
network edge. But even so, RIO's performance continued to degrade with increased
target rates.
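The 37% figure is consistent with the two values quoted from Figure 26 for a target rate of 10 Kbps, if it is read as the relative gain of the peak proportion over the levelled-off proportion. A quick check, assuming that this is how the percentage was derived:

```python
def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, as a fraction."""
    return (improved - baseline) / baseline

# 30% is where the TCP aggregate's share levels off and 41% is its peak;
# both values are quoted in the Results section for a 10 Kbps target rate.
gain = relative_improvement(0.30, 0.41)
print(f"{gain:.1%}")
```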
By contrast, Figure 27 demonstrates that when H-MAQ was used, all combinations of 
police rates and target rates resulted in the TCP aggregate consuming 50% of the used 
bandwidth. 
In summary, although effective policing was shown to improve RIO's performance,
RIO's performance was still shown to degrade consistently with increased target rates. The network
that had H-MAQ in its core was shown to be fair in all cases and thus demonstrated 
the precise controllability of H-MAQ networks. 
7.4 Aggregate Fairness and Control - Round Trip Times 
Following is a comparison between the performance of a Diffserv network that has 
RIO in its core and that of a Diffserv network with H-MAQ implemented in the core. 
This section uses aggregate fairness and control according to round trip time as its 
performance metric. This round of evaluations compares the throughput for two 
identical aggregates that have different round trip times on their TCP flows. The 
situation being emulated is one where a Diffserv network has two competing 
aggregate clients. The clients are identical in terms of their SLA and number of 
microflows, but they differ in that one client is sending TCP traffic to a nearby
7.4.1 Evaluation 
In this round of evaluations, 300 TCP traffic sources were started, half belonging to 
Aggregate 1 and half belonging to Aggregate 2. Aggregate 1's packets were subject to
a delay on the return path. The duration of this delay was determined by the
experiment. The sources all started between 0 and 2 seconds. They sent packets
through the network for 100 seconds. Data was collected between 10 and 90 seconds.
The target rate of the sources was as described in the Results section, as was the police
rate. In the case of H-MAQ, each aggregate was explicitly allocated half of the
available bandwidth.
7.4.2 Results 
Figure 28 gives the results of the evaluation that considers aggregate fairness when 
one of the aggregates is subject to a greater round trip time.
[Figure 28 plot. Recovered legend entries: RIO Delay 50ms, RIO Delay 200ms.]
Figure 28: The proportion of bandwidth consumed by the delayed aggregate according to target 
rate when all packets are in-profile 
Figure 28 shows the proportion of used bandwidth consumed by the delayed 
aggregate when all packets are marked as being in-profile. When the sender target rate 
was 5 Kbps, the delayed aggregate consumed 50% of the bandwidth. This was so for
both RIO and H-MAQ. In the case of
RIO, as the target rate increased the proportion of TCP aggregate traffic decreased
linearly. Furthermore, increases in the delay time resulted in further decreases to the 
delayed aggregate's proportion of bandwidth. In the case of H-MAQ, increases in the 
target rate had no effect on the proportion of TCP aggregate traffic. This was true for 
both the 50ms and 200ms delay. The proportion of bandwidth for the delayed 
aggregate remained a constant 50%.
[Figure 29 plot: RIO: Proportion of Bandwidth Consumed by Delayed Aggregate. Axes: Police Rate (Kbps) and Target Rate (Kbps).]
Figure 29: The proportion of bandwidth consumed by the delayed aggregate according to target
and police rates when the delay is approximately 50ms and RIO is used
Figure 29 shows the proportion of used bandwidth consumed by the delayed 
aggregate when RIO is implemented in the core of the network and Aggregate 1 is 
subject to an additional delay of 50ms. Figure 29 shows how the delayed aggregate's 
proportion of bandwidth consumed varies according to both the target rate and the 
police rate. The graph also shows that although the delayed aggregate's proportion of
bandwidth decreases with increased target rates, there is an optimal police rate, which
increases the delayed aggregate's proportion of bandwidth. For example, when the
target rate is 15 Kbps, the proportion of bandwidth consumed by the delayed
[Figure 30 plot: H-MAQ: Proportion of Bandwidth Consumed by Delayed Aggregate. Axes: Police Rate (Kbps) and Target Rate (Kbps).]
Figure 30: The proportion of bandwidth consumed by the delayed aggregate according to target
and police rates when the delay is approximately 50ms and H-MAQ is used
Figure 30 shows the results for the same evaluation as before, except that in this case 
the core router implemented H-MAQ. This graph shows that for all police rates and 
target rates, the delayed aggregate received a constant 50% of used bandwidth. 
7.4.3 Analysis of Results 
Because the aggregates are identical in terms of their SLA, they should both receive 
50% of the used bandwidth. On the far left-hand side of Figure 28, the delayed 
aggregate received 50% of the bandwidth. This was so because there was no
contention within the network. The microflow sources were able to send packets at
their target rates without being hindered by TCP flow-control or packet losses. In the
case where RIO was implemented in the core, as the target rate increased, the 
proportion of traffic consumed by the delayed aggregate decreased linearly. 
Furthermore, increased delays resulted in still lower fairness for the delayed 
aggregate. This decrease signified degradation in the network's performance. In the
case of H-MAQ, the delayed aggregate consumed a fixed 50% for all target rates
and delays. This demonstrated that when H-MAQ was used, the fixed allocation of
resources to aggregates was effective. 
Figure 29 demonstrates that when the RIO network is congested, aggregate fairness
may be improved by applying effective policing. The "hump" in the graph is
explained as follows: When the police rate was low, all packets were out-of-profile
and were thus treated identically. When the police rate was high, all packets were
in-profile and were still treated identically. When police rates were similar to target
rates, the situation arose where most delayed packets were in-profile and most other
packets were out-of-profile. This is because TCP's flow-control was causing the
delayed packets to be preferentially marked. It was thus shown that RIO's
performance can be improved by up to 52% when effective policing and marking are
performed at the network edge. But even so, RIO's performance continued to degrade
with increased target rates. Furthermore, perfect fairness could never be achieved by
RIO under these circumstances because should the delayed sources indeed achieve
their allocated 50% bandwidth consumption, the edge policing and remarking
mechanism would no longer provide them with an advantage and the delayed
aggregate's consumption would once again drop.
By contrast, Figure 30 demonstrates that when H-MAQ was used, all combinations of
police rates and target rates resulted in the delayed aggregate receiving 50% of the used
bandwidth.
In summary, although effective policing was shown to improve RIO's performance, 
RIO's performance was still shown to degrade consistently with increased target rates. 
The network that had H-MAQ in its core was shown to be fair under all tested 
conditions and thus demonstrated the precise controllability of H-MAQ networks. 
7.5 Aggregate Fairness and Control - Aggregate Service Level
Following is a comparison between the performance of a Diffserv network that has
RIO in its core and that of a Diffserv network with H-MAQ implemented in the core,
using aggregate fairness and control as its performance metric. This round of
evaluations compares the throughput for two aggregates. One of the aggregates has an
SLA that allows it to send a quarter of the total traffic. The other aggregate's SLA
allows it to send three quarters of the total traffic. Both aggregates have the same 
number of traffic sources. The emulation thus reflects the scenario where a Diffserv 
network has two competing aggregate clients. Both clients are the same size, but one 
of the clients has subscribed to a cheaper service option. 
7.5.1 Evaluation 
In this round of evaluations, between 50 and 200 traffic sources were started. Of 
these, half belonged to Aggregate 1 and half belonged to Aggregate 2. For each 
experiment, the target rate for all sources was set to 20 Kbps. The police rate for the
traffic sources belonging to the aggregate with the higher service level was set to 20
Kbps. The police rate for the traffic sources belonging to the aggregate with
the lower service level was set to 6.667 Kbps. The police rates were thus equivalent to
those that a commercial Diffserv network would use to control the aggregates'
utilization of the network according to their SLAs. The network's performance was
evaluated for both TCP and UDP traffic. The sources all started between 0 and 2
seconds. They sent packets through the network for 100 seconds. Data was collected
between 10 and 90 seconds. In the case of H-MAQ, the smaller aggregate was
explicitly allocated a quarter of the available bandwidth. The larger aggregate was 
explicitly allocated the remaining three-quarters. 
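The two police rates correspond to the quarter and three-quarter service levels because both aggregates have the same number of sources. The short check below illustrates this; the function name is an assumption made for the sketch.

```python
def sla_shares(police_rates_kbps, sources_per_aggregate):
    """Share of the total in-profile rate that each aggregate may send."""
    in_profile = [rate * n for rate, n in zip(police_rates_kbps, sources_per_aggregate)]
    total = sum(in_profile)
    return [rate / total for rate in in_profile]

# Police rates from the evaluation. Both aggregates have the same number of
# sources, so the source count cancels out (25 each corresponds to 50 sources).
lower, higher = sla_shares([6.667, 20.0], [25, 25])
print(round(lower, 3), round(higher, 3))
```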
7.5.2 Results 
Figure 31 gives the results of the evaluation that considers aggregate fairness when
[Figure 31 plot: Proportion of Bandwidth Consumed by the Aggregate with a Quarter-Sized Service Level. Horizontal axis: Total Number of Sources, 50 to 200. Series: H-MAQ UDP, H-MAQ TCP, RIO UDP, RIO TCP.]
Figure 31: The proportion of bandwidth consumed by the aggregate with a quarter-sized service 
level according to the total number of TCP sources 
Figure 31 shows that when there were only 50 sources, the aggregate with the lower 
service level consumed approximately half of the used bandwidth. In the case of both 
RIO TCP and RIO UDP, as the number of sources increased, the proportion of 
bandwidth consumed by the aggregate with the lower service level decreased slowly. 
This rate of decrease accelerated after 100 sources so that when there were 200 
sources, the aggregate with the lower service level received less than 18% of the used 
bandwidth. 
In the case of H-MAQ, as the number of sources increased, the proportion of 
bandwidth consumed by the aggregate with the lower service level decreased. This 
trend continued until the aggregate with the lower service level had reached one 
quarter of the total bandwidth. This point was reached at 100 sources for UDP and 
150 sources for TCP. Beyond this point, the proportion of bandwidth consumed by
7.5.3 Analysis of Results 
Because the aggregate with the lower service level has an SLA that entitles it to a
quarter of the total bandwidth, it should indeed consume that proportion of the used
bandwidth. Similarly, the aggregate with the higher service level should receive three
quarters of the bandwidth. That is, of course, contingent on its traffic sources having
sufficient packets to send. Figure 31 shows the proportion of bandwidth received by
that aggregate for an increasing number of sources. Because the police rate was 
unaltered as the number of sources increased, what this graph demonstrates is the 
changing proportion of bandwidth received by the aggregate with the lower service 
level as network congestion increases. When there were only 50 sources, there was no 
congestion in the network. Because no packets were dropped and target rates were 
reachable, each source was able to send packets at its target rate. The aggregates' 
consumption of bandwidth was thus proportional to the number of microflows in each 
aggregate. 
In the case of the RIO UDP and RIO TCP series, the increasingly negative slope
reflects increasing network congestion. This is explained as follows: Under all
network congestion levels, more of the packets belonging to the aggregate with the
lower service level were marked as being out-of-profile. When there was little
congestion, as was the case when there were 50 sources, few packets were dropped,
and so the fact that the aggregate with the lower service level was more strictly 
policed was of little consequence. As the number of sources increased, so too did 
congestion levels. This resulted in more packets being dropped. Because the 
aggregate with the lower service level had more out-of-profile packets in the network, 
a higher proportion of that aggregate's packets were dropped. The aggregate with the 
lower service level was thus over-penalized. The steep slope of the RIO UDP and 
RIO TCP series at the point where they intersect the "0.25 proportion of bandwidth" 
line indicates the rareness of the conditions under which Diffserv networks with RIO 
cores provide adequate fairness. Furthermore, it is not possible to control the 
operating point for Diffserv networks with regards to their position on the x axis of
In the case of H-MAQ UDP and H-MAQ TCP, the proportion of bandwidth
consumed by the aggregate with the lower service level converged to one quarter as
the number of sources increased. On the left-hand side of these convergence points,
the aggregate with the lower service level received more than a quarter of
the bandwidth because, at this low utilization level, the aggregate with the
higher service level was not able to consume all of the resources that had been
allocated to it. These resources were thus unused or were handed over to the other
aggregate. The UDP sources converged before the TCP sources because
UDP sources consume bandwidth more aggressively, due to having no
transport layer flow-control.
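The convergence towards one quarter can be illustrated with a generic weighted max-min (water-filling) allocation, in which capacity that one aggregate cannot use is offered to the other. This is a textbook model used here only as an illustration; it is not the H-MAQ algorithm, and the 2000 Kbps link capacity is a hypothetical value.

```python
def weighted_allocation(capacity, demands, weights):
    """Weighted max-min (water-filling) split of `capacity`.

    Each aggregate is offered capacity in proportion to its weight; whatever
    an aggregate cannot use is re-offered to the aggregates that still want
    more. This is a generic model, not the H-MAQ mechanism itself.
    """
    alloc = [0.0] * len(demands)
    active = set(range(len(demands)))
    remaining = float(capacity)
    while active and remaining > 1e-9:
        total_weight = sum(weights[i] for i in active)
        offer = {i: remaining * weights[i] / total_weight for i in active}
        satisfied = [i for i in active if demands[i] <= offer[i] + 1e-9]
        if not satisfied:
            # Every remaining aggregate wants more than its offer.
            for i in active:
                alloc[i] = offer[i]
            break
        for i in satisfied:
            alloc[i] = float(demands[i])
            remaining -= demands[i]
            active.remove(i)
    return alloc

def low_service_share(total_sources, capacity_kbps=2000.0, target_kbps=20.0):
    """Share of used bandwidth taken by the quarter-allocation aggregate."""
    per_aggregate_demand = (total_sources / 2) * target_kbps
    low, high = weighted_allocation(
        capacity_kbps,
        [per_aggregate_demand, per_aggregate_demand],
        [0.25, 0.75],
    )
    return low / (low + high)

for n in (50, 100, 120, 150, 200):
    print(n, round(low_service_share(n), 3))
```

With few sources both demands fit within the link and the share is one half. Once the higher service level aggregate can fill its three-quarter allocation, the share of the other aggregate settles at one quarter.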
In summary, the network with the RIO core was shown to be ineffective in providing 
services to aggregates in proportion to their SLAs, and the network with the H-MAQ 
core was found to provide precise fairness and controllability with regards to resource 
allocation. 
7.6 Aggregate Fairness and Control - Number of Flows Per Aggregate
Following is a comparison between the performance of a Diffserv network that has
RIO in its core and that of a Diffserv network with H-MAQ implemented in the core,
using aggregate fairness and control as its performance metric. This round of
evaluations compares the throughput of two aggregates. Both aggregates have identical SLAs. They do,
however, differ in that one of the aggregates has half as many traffic sources as
the other, that is, one third of the total. The emulation thus reflects the scenario where a Diffserv network has two
competing aggregate clients. Both clients have the same SLA, but one client has twice
as many end-users as the other.
7.6.1 Evaluation 
In this round of evaluations, 400 traffic sources were started. Of these, 133 belonged 
to Aggregate 1 and 267 belonged to Aggregate 2. For each experiment, the target rate
for all sources was set to 15 Kbps. The police rates were varied with the constraint
that the product of the number of sources for Aggregate 1 and the police rate for
Aggregate 1 equalled the product of the number of sources for Aggregate 2 and
the police rate for Aggregate 2. The police rates were thus equivalent to those that a
commercial Diffserv network would use to control aggregates' utilization of the 
network according to their SLAs. The network performance was evaluated for both
TCP and UDP traffic. The sources all started between 0 and 2 seconds. They sent 
packets through the network for 100 seconds. Data was collected between 10 and 90 
seconds. In the case of H-MAQ, each aggregate was explicitly allocated half of the 
available bandwidth. 
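For identical SLAs, the police rates can be understood as giving each aggregate the same total in-profile rate, so that the product of source count and per-source police rate is equal for the two aggregates. A minimal sketch follows, assuming a hypothetical in-profile budget of 1000 Kbps per aggregate (this value is not taken from the evaluation).

```python
def per_source_police_rates(aggregate_rate_kbps, sources_per_aggregate):
    """Per-source police rate giving every aggregate the same in-profile total."""
    return [aggregate_rate_kbps / n for n in sources_per_aggregate]

# 133 and 267 sources as in the evaluation; the 1000 Kbps in-profile budget
# per aggregate is a hypothetical value chosen for this sketch.
rate_1, rate_2 = per_source_police_rates(1000.0, [133, 267])
print(round(rate_1, 2), round(rate_2, 2))
```

Each source in the smaller aggregate is thus policed at roughly twice the rate of a source in the larger aggregate.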
7.6.2 Results 
Figure 32 gives the results of the evaluation that considers aggregate fairness when
[Figure 32 plot: Proportion of Bandwidth Consumed by Aggregate with Fewer Microflows. Horizontal axis: Percent of Delivered Packets that are In-Profile, 0 to 100.]
Figure 32: The proportion of bandwidth consumed by the aggregate with fewer microflows 
according to the percent of delivered packets that are in-profile 
Figure 32 shows that in the case of the RIO UDP and RIO TCP series, when the
percent of delivered packets that were in-profile was either 0 or 100 (i.e. all delivered 
packets were out-of-profile or in-profile respectively) the aggregate with fewer 
sources consumed 33% of the used bandwidth. Between 45 and 95 on the x axis, the
aggregate with fewer sources consumed more than half of the used bandwidth.
In the case of both H-MAQ UDP and H-MAQ TCP, each aggregate consumed 50% 
of the used bandwidth for all x axis values. 
7.6.3 Analysis of Results 
Because the aggregates are identical in terms of their SLA, they should both receive 
50% of the bandwidth. In the case of the RIO UDP and RIO TCP series, when all
packets were marked the same, as was the case on the far left and right-hand sides of 
the graph, the aggregate with a third of the microflows consumed that proportion of 
the used bandwidth. This is the part of the graph where policing was indiscriminate 
and all packets were treated identically. Each aggregate thus consumed bandwidth in 
proportion to its number of constituent sources. Between 45 and 95 on the x axis, the 
aggregate with fewer sources consumed more than half of the used bandwidth. This is 
explained as follows: Because Aggregate 1 had fewer sources, each of its traffic 
sources was allowed a higher data rate before being marked as out-of-profile. 
Between 45 and 95 on the x axis, most Aggregate 1 packets that were sent were in-
profile and most Aggregate 2 packets that were sent were out-of-profile. The policing 
mechanism was thus shown to be too effective at putting Aggregate 1 at an advantage. 
This was unfair to Aggregate 2. On the x axis, as one moves from 45 to 0, the 
proportion of Aggregate 1 packets that were in-profile also decreased. This evened 
out the treatment that Aggregate 1 and Aggregate 2 packets received. Although there 
were two instantaneous points at which the RIO network gave 50% of used bandwidth 
to both aggregates, the percentage of packets that are in-profile at any given time 
cannot be kept constant as this is governed by network utilization levels. 
In the case of H-MAQ UDP and H-MAQ TCP, the fact that each aggregate always
consumed 50% of the bandwidth indicates the high level of fairness and 
controllability of Diffserv networks with H-MAQ cores. 
In summary, a Diffserv network with an H-MAQ core was shown to provide precise
fairness and controllability to aggregates with differing numbers of microflows for
both TCP and UDP traffic. This is in stark contrast to the case of a RIO network core

Chapter 8: Conclusions 
Chapter 8 
Conclusions 
From the results of Chapter 7, a number of conclusions may be drawn regarding the 
performances of Diffserv networks with RIO and H-MAQ network cores. These 
conclusions are presented below in two sections: microflow fairness, and aggregate
fairness and control. Finally, the overall performance of the proposed solution is
considered in light of the previous sections. 
8.1 Microflow Fairness 
The problem statement of this thesis noted that networks with RIO cores have been 
shown to provide poor microflow fairness when competing microflows differ in terms 
of their transport protocol and round trip time. An alternative mechanism, namely H-
MAQ, was proposed, specified and implemented. The per-microflow fairness of a 
Diffserv network with H-MAQ in its core was compared with that where RIO was 
used in its core. The following points assess the relative performance of the two 
mechanisms. 
• When TCP traffic sources compete for resources against UDP traffic sources,
• When TCP sources with long round trip times compete for resources against TCP
sources with shorter round trip times, networks with an H-MAQ core were shown
to provide equal or better performance than networks with a RIO core.
8.2 Aggregate Fairness and Control
The problem statement of this thesis noted that it is not possible to adequately control 
resource allocation in Diffserv networks when RIO is used in core network elements. 
An alternative mechanism, namely H-MAQ, was thus proposed, specified and 
implemented. This mechanism permits precise resource allocation at an aggregate
level. H-MAQ's performance was compared to that of RIO for a number of cases
where aggregates were competing for resources. The following points describe the
performance of the two mechanisms for different network conditions.
• When aggregates with TCP traffic sources compete for resources against
aggregates with UDP sources, the networks with H-MAQ cores were found to
provide fair and controllable resource allocation according to the aggregates'
SLAs. This was in contrast to the networks with RIO cores, which are unfair to
aggregates with TCP sources.
• When TCP aggregates with long round trip times compete for resources against
aggregates with shorter round trip times, the networks with H-MAQ cores were
found to provide fair and controllable resource allocation according to the
aggregates' SLAs. This is in contrast to the networks with RIO cores, which are
unfair to the aggregates with long round trip times.
• When aggregates with smaller SLAs compete for resources against aggregates
with larger SLAs, the networks with H-MAQ cores were found to be effective in 
providing precise resource allocation according to the aggregates' SLAs. This was 
in contrast to networks with RIO cores, which can be unfair to aggregates of both 
smaller and larger SLAs depending on network congestion levels. This holds true 
for both TCP and UDP traffic. 
• When aggregates with many traffic sources compete for resources against 
aggregates with fewer traffic sources, networks with H-MAQ cores were found to 
provide fair and controllable resource allocation according to the aggregates'
SLAs. This was in contrast to networks with RIO cores, which can be unfair to
either aggregate, depending on network utilization levels. This holds true for both
TCP and UDP traffic.
8.3 Overall Performance of H-MAQ
At the microflow level, the performance of Diffserv networks with H-MAQ cores was 
shown to be comparable to that of Diffserv networks with RIO cores. At the aggregate 
level, networks with H-MAQ cores were shown to be capable of providing exact 
control of resource allocation. Being able to provide this high level of network control 
means that should H-MAQ be used in Diffserv AF networks instead of RIO, over-

Chapter 9: Recommendations and Future Work 
Chapter 9 
Recommendations and Future Work
There is scope for research aimed at improving the fairness of H-MAQ at the
microflow level. This work should consider alternative drop policies as well as the
feasibility and performance of a system whereby a separate RIO queue is allocated to 
each aggregate. 
In the H-MAQ core, when an out-of-profile packet needs to be dropped from a
microflow queue, a search is performed to find the out-of-profile packet that is closest
to the front of the queue. This packet is then dropped. The author suggests that a
modification to the H-MAQ mechanism be investigated which causes the packet at
the front of the queue to be dropped. This is regardless of whether it is in- or
out-of-profile. Because all packets in a given microflow queue are from the same traffic
source, it should not negatively affect that microflow source if an in-profile packet
were dropped instead of an out-of-profile packet. The benefits of this modification
would be as follows:
• Dropping the packet that is at the front of the queue rather than doing a search to
find the first out-of-profile packet would result in a more computationally efficient
algorithm.
• Because the dropped packet would always be the one at the front of the queue, the
traffic source would be informed of the packet loss sooner. This would improve
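The two drop policies discussed above, the existing search for the first out-of-profile packet and the suggested drop-from-front modification, can be contrasted in a short sketch. The `Packet` type and function names are hypothetical, and the behaviour when no out-of-profile packet is queued is an assumption, since it is not described here.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Packet:
    seq: int
    in_profile: bool

def drop_first_out_of_profile(queue):
    """Existing policy: search from the front for an out-of-profile packet."""
    for index, packet in enumerate(queue):
        if not packet.in_profile:
            del queue[index]  # linear-time search and removal
            return packet
    return None  # assumption: nothing is dropped if no such packet is queued

def drop_front(queue):
    """Suggested modification: drop the head packet without searching."""
    return queue.popleft() if queue else None

def make_queue():
    return deque([Packet(1, True), Packet(2, True), Packet(3, False), Packet(4, True)])

searched = make_queue()
dropped_by_search = drop_first_out_of_profile(searched)

fronted = make_queue()
dropped_from_front = drop_front(fronted)
```

In this example the search removes packet 3 from the middle of the queue, whereas the modification removes packet 1 in constant time, so the loss becomes visible to the source as soon as the following packet is delivered.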
The effect of different packet marking policies needs to be investigated. This is
especially necessary when microflow- and aggregate-unaware mechanisms such as
RIO are used in the network core. In particular, there should be investigations into the
performance of Diffserv networks when marking mechanisms mark only surplus
packets as being out-of-profile rather than marking all packets as being out-of-profile

Appendix A: Asynchronous Transfer Mode (ATM) 
Appendices 
Appendix A: Asynchronous Transfer Mode
This appendix introduces ATM and describes its use in carrying IP traffic.
A.1 Introduction to ATM
Asynchronous Transfer Mode (ATM) is a networking technology based on
asynchronously switching small 53-byte cells through networks. ATM is connection
orientated, which means that an explicit connection setup is required before data may
be sent across an ATM network. Because ATM was designed to support real-time
traffic such as voice and video, it uses sophisticated congestion control mechanisms
that interleave data traffic with control cells. These congestion control mechanisms
are able to limit traffic on a per-connection basis as well as a per-aggregate basis.
ATM data links are divided into a number of virtual paths, each of which is divided
into a number of virtual channels. This configuration enables network elements to
easily perform shaping or policing on paths or on individual channels. A Virtual
Channel Identifier (VCI) together with a Virtual Path Identifier (VPI) may be used to
(PVC) is a permanent connection between two ATM end systems that uses a fixed
VPI/VCI. 
A.2 IP Over ATM
ATM is implemented extensively throughout the Internet. This is done using the IP
over ATM architecture. IP over ATM works as follows: IP packets entering the ATM
network are broken into a number of ATM cells. The cells are sent through the ATM
network. At the egress of the ATM network, the IP packets are re-assembled
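The segmentation step can be illustrated by counting the cells needed for one IP packet. The appendix does not state which adaptation layer is used, so the sketch below assumes AAL5 framing (an 8-byte trailer, with the result padded to a whole number of 48-byte cell payloads). The `encapsulation_bytes` parameter is a hypothetical hook for any additional encapsulation header.

```python
import math

CELL_BYTES = 53          # total ATM cell size, as stated above
CELL_HEADER_BYTES = 5    # standard ATM cell header
CELL_PAYLOAD_BYTES = CELL_BYTES - CELL_HEADER_BYTES  # 48 bytes of payload
AAL5_TRAILER_BYTES = 8   # assumption: AAL5 framing is used

def cells_for_packet(ip_packet_bytes, encapsulation_bytes=0):
    """Number of cells needed to carry one IP packet, assuming AAL5."""
    pdu = ip_packet_bytes + encapsulation_bytes + AAL5_TRAILER_BYTES
    return math.ceil(pdu / CELL_PAYLOAD_BYTES)

cells = cells_for_packet(500)
wire_bytes = cells * CELL_BYTES
print(cells, wire_bytes)
```

Under these assumptions a 500-byte packet occupies 11 cells, or 583 bytes on the wire.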











Appendix B: Rio Development and Testing 
Appendix B: RIO Development and Testing
Because the author's implementation of RIO was to be used as the benchmark against
which the proposed mechanism was to be compared, it was necessary that the RIO
implementation be correct. Note that although the proposed mechanism was tested in
a similar way to that of RIO, its verification is not presented in a separate appendix
as the results of this study adequately demonstrate its correctness.
B.1 RIO Development
The RIO mechanism was implemented using a simple FIFO drop-tail queue as a 
template. This kernel module was written by Fred Kuhn of Washington University's 
Applied Research Laboratory. Using a drop-tail module as a template ensured that the 
basic buffer management operations such as initialising a packet scheduler and 
dropping an arriving packet had already been verified. The RIO module was 
implemented according to pseudo-code obtained from the relevant papers [25,28]. 
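The general shape of a RIO drop decision can be sketched as follows. This is a simplified illustration based on the commonly published description of RIO: two RED drop curves, with in-profile packets judged against the average in-profile queue length and out-of-profile packets against the average total queue length. It is not the author's kernel module. The class name, the averaging weight and the maximum drop probabilities are assumptions, and RED refinements such as drop spacing and idle-time handling are omitted.

```python
import random

def red_drop_probability(avg_queue, min_thresh, max_thresh, max_prob):
    """Linear RED drop curve: 0 below min_thresh, 1 at or above max_thresh."""
    if avg_queue < min_thresh:
        return 0.0
    if avg_queue >= max_thresh:
        return 1.0
    return max_prob * (avg_queue - min_thresh) / (max_thresh - min_thresh)

class RioQueue:
    """Simplified RIO: one RED curve for in-profile, one for out-of-profile."""

    def __init__(self, in_params, out_params, weight=0.002, rng=random.random):
        self.in_params = in_params      # (min_thresh, max_thresh, max_prob)
        self.out_params = out_params    # (min_thresh, max_thresh, max_prob)
        self.weight = weight            # averaging weight; value is an assumption
        self.rng = rng
        self.avg_in = 0.0               # average number of queued in-profile packets
        self.avg_total = 0.0            # average number of all queued packets

    def should_drop(self, in_profile, queued_in, queued_total):
        """Update the averages and decide whether to drop an arriving packet."""
        if in_profile:
            self.avg_in += self.weight * (queued_in - self.avg_in)
        self.avg_total += self.weight * (queued_total - self.avg_total)
        if in_profile:
            p = red_drop_probability(self.avg_in, *self.in_params)
        else:
            p = red_drop_probability(self.avg_total, *self.out_params)
        return self.rng() < p

# Hypothetical thresholds and drop probabilities, for illustration only.
rio = RioQueue(in_params=(40, 80, 0.02), out_params=(90, 140, 0.1))
```

Because in-profile packets are judged only against the in-profile average, a queue that is filled mainly by out-of-profile traffic causes drops among out-of-profile packets first.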
B.2 RIO Testing 
The first stage of testing consisted of a number of complete code walkthroughs.
During this process, each line was critically evaluated in the context of the program.
Once this was complete, the second stage of testing began. During this stage, the
author performed real-time monitoring of the variables that formed part of the RIO
mechanism. This took place whilst packets were moving through the router. These
variables were output to a terminal window by the SPCs on which the RIO kernel
modules were run. Only once the author was satisfied that RIO's internal variables
were correct did the final round of tests begin. The final round of tests was aimed at











Figure 33 not only demonstrates that the RIO implementation performed as expected,
but also offers an insight into how the mechanism works. The graph shows the delay
incurred from the time that each packet was transmitted until the time that it reached
the receiver. This delay is a function of the time spent by packets in the buffers of the
router. The network topology was as described in Section 6.3. Figure 33 compares the
delay when a simple FIFO drop-tail queue was implemented at the router with the
delay when a RIO queue was used. In the case of RIO, there is a further comparison
between packets policed as in-profile and packets policed as out-of-profile.
This distinction is meaningless in the case of drop-tail, as drop-tail ignores the
DSCP of packets. In this experiment, 70 UDP traffic sources were started
simultaneously. They each sent out 500 byte packets that had exponentially
distributed inter-arrival times with an average data rate of 40 Kbps. This meant that a
total of 2.8 Mbps was being sent. A maximum of 200 packets was permitted
in the MSR's output buffer. In the case of RIO, ThreshIn, MaxThreshIn, ThreshOut
and MaxThreshOut were set to 40, 80, 90 and 140 respectively. The router's output
link data rate was throttled to approximately 160 Kbps to ensure congestion in the
router's output buffer. Note that the delay incurred in the testbed router was far






[Graph: UDP Packet Delay Comparison Between Drop-Tail and RIO. Data series: Drop-Tail, 70 UDP sources at 40 Kbps; RIO, 70 UDP sources at 40 Kbps, in-profile packets; RIO, 70 UDP sources at 40 Kbps, out-of-profile packets]
Figure 33: Relative queuing delay for RIO and drop-tail
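The load figures quoted for this experiment can be checked with a few lines of arithmetic. The sketch below is illustrative only and rests on several assumptions that the text does not state: 1 Kbps is taken as 1000 bit/s, the 500 bytes are taken as the full size of each packet on the output link (ATM cell overhead is ignored), and the "approximately 160 Kbps" link rate is treated as exact. The function names are hypothetical.

```python
# Figures quoted in the text for the Figure 33 experiment.
SOURCES = 70
RATE_PER_SOURCE_KBPS = 40
PACKET_BYTES = 500
BUFFER_PACKETS = 200
LINK_KBPS = 160  # the text says "approximately"; treated as exact here


def offered_load_kbps(sources: int, rate_kbps: float) -> float:
    """Aggregate sending rate of all sources."""
    return sources * rate_kbps


def transmit_time_s(packets: int, packet_bytes: int, rate_kbps: float) -> float:
    """Time to send `packets` packets at `rate_kbps`, with 1 Kbps = 1000 bit/s."""
    return packets * packet_bytes * 8 / (rate_kbps * 1000)


load = offered_load_kbps(SOURCES, RATE_PER_SOURCE_KBPS)
mean_gap = transmit_time_s(1, PACKET_BYTES, RATE_PER_SOURCE_KBPS)
full_buffer_delay = transmit_time_s(BUFFER_PACKETS, PACKET_BYTES, LINK_KBPS)

print(f"offered load: {load / 1000} Mbps")
print(f"overload factor on the output link: {load / LINK_KBPS}")
print(f"mean inter-arrival time per source: {mean_gap} s")
print(f"delay behind a full 200-packet buffer: {full_buffer_delay} s")
```

Under these assumptions the offered load is 2.8 Mbps, as stated in the text, which is 17.5 times the throttled link rate, and a packet arriving behind a full 200-packet buffer would wait about 5 s before transmission.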











Figure 33 shows that in the case of the drop-tail queue, the buffers filled up
progressively at the start of the experiment and remained at that high level. In the case
of RIO with in-profile packets, the buffers filled up as with drop-tail. They then
stabilized at a lower level. In the case of RIO with out-of-profile packets, the buffers
filled up as before, but stabilized at a still lower level. These three levels reflect the
maximum number of packets allowed in the FIFO drop-tail queue, MaxThreshIn and
MaxThreshOut respectively.
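RED-style mechanisms such as RIO compare their thresholds against an exponentially weighted moving average of buffer occupancy rather than against the instantaneous queue length [28], so the average trails the real queue while the buffer is filling. The sketch below illustrates that lag on a synthetic ramp. The averaging weight, the ramp shape and the threshold of 80 are assumptions chosen for illustration, not measurements from the testbed, and the helper names are hypothetical.

```python
def ewma_trace(samples, weight):
    """Return the running exponentially weighted average of `samples`."""
    avg = 0.0
    trace = []
    for q in samples:
        avg += weight * (q - avg)
        trace.append(avg)
    return trace


def first_crossing(values, threshold):
    """Index of the first value at or above `threshold`, or None."""
    for i, v in enumerate(values):
        if v >= threshold:
            return i
    return None


# Hypothetical ramp: the queue grows by one packet per arrival, up to 200.
queue = [min(n, 200) for n in range(1, 1001)]
avg = ewma_trace(queue, weight=0.02)  # weight is an assumed value

threshold = 80
i_inst = first_crossing(queue, threshold)
i_avg = first_crossing(avg, threshold)

print("instantaneous queue reaches the threshold at arrival", i_inst)
print("averaged queue reaches the threshold at arrival", i_avg)
print("instantaneous queue length at that point:", queue[i_avg])
```

By the time the averaged value reaches the threshold, the instantaneous queue has already grown beyond it. A smaller weight lengthens this lag and a larger one shortens it.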
A further point that is of interest is that for both RIO data series, there was an initial
overshoot before buffer occupancy levels stabilized. This overshoot was expected, as
there is a lag before RIO's average buffer occupancy variables reach their maximum













[1] Webopedia: "The 7 Layers of the OSI Model", [Online]. Available:
webopedia.internet.com/quick_ref/OSI_Layers.asp, August 2003
[2] IETF Charter: "IP Version 6 Working Group", [Online]. Available:
www.ietf.org/html.charters/ipv6-charter.html, August 2003
[3] S. Deering, R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification", RFC
2460, December 1998
[4] A. Odlyzko, "Internet growth: Myth and reality, use and abuse", iMP:
Information Impacts Magazine, November 2000
[5] TeleGeography, Inc. Press Release: "Global Internet Backbone Growth Slows
Dramatically", [Online]. Available:
www.telegeography.com/press/releases/2002/16-oct-2002.html, October 2002
[6] A. Nikologiannis, M. Katevenis, "Efficient Per-Flow Queuing in DRAM at OC-
192 Line Rate using Out-of-Order Execution Techniques", Proc. of IEEE
International Conference on Communications, Helsinki, June 2001
[7] RedHerring Magazine: "Explained: How Internet core routers deliver", [Online].
Available: www.redherring.com/mag/issue102/800020080.html, September
2001
[8] V. Kumar, T. Lakshman, D. Stiliadis, "Beyond Best Effort: Router Architectures
for the Differentiated Services of Tomorrow's Internet", IEEE Communications
Magazine, page 152, May 1998
[9] Light Reading: "Internet Core Router Test", [Online]. Available:
https://www.juniper.net/products/features/core/core_router_test.pdf, March
2001
[10] K. Thompson, G. Miller, R. Wilder, "Wide-Area Internet Traffic Patterns and 
Characteristics (Extended version)", [Online]. Available: www.vbns.net/ 












[11] Caida: "Top applications (bytes) for Subinterface 0[0]: SD-NAP Traffic",
[Online]. Available: www.caida.org/analysis/workload/byapplication/sdnap/,
June 2002
[12] S. McCreary, K. Claffy, "Trends in Wide Area IP Traffic Patterns", [Online].
Available: www.caida.org/outreach/papers/2000/AIX0005/AIX0005.html,
September 2002
[13] Caida: "Packet Sizes and Sequencing", [Online]. Available:
www.caida.org/outreach/resources/learn/packetsizes/, August 2002
[14] A. Broido, K. Claffy, E. Nemeth, "Packet arrivals on rate-limited Internet links",
[Online]. Available: www.caida.org/~broido/coral/packarr.html, November
2000
[15] T. Bonald, S. Oueslati-Boulahia, J. Roberts, "IP Traffic and QoS Control -
Towards a Flow-aware Architecture", Proc. of World Telecommunications
Congress, Paris, September 2002
[16] IETF Charter: "Integrated Services Working Group", [Online]. Available:
www.ietf.org/html.charters/intserv-charter.html, September 2000
[17] R. Braden, L. Zhang, S. Berson, S. Herzog, S. Jamin, "Resource ReSerVation
Protocol (RSVP) Version 1 Functional Specification", RFC 2205, September
1997
[18] IETF Charter: "Differentiated Services Working Group", [Online]. Available:
www.ietf.org/html.charters/diffserv-charter.html, March 2002
[19] K. Nichols, S. Blake, F. Baker, D. Black, "Definition of the Differentiated
Services Field (DS Field) in the IPv4 and IPv6 Headers", RFC 2474, December
1998
[20] J. Heinanen, F. Baker, W. Weiss, J. Wroclawski, "Assured Forwarding PHB
Group", RFC 2597, June 1999
[21] N. Seddigh, B. Nandy, P. Pieda, "Bandwidth Assurance Issues for TCP Flows in 
a Differentiated Services Network", Proc. of the IEEE GLOBECOM, Rio De 
Janeiro, page 6, December 1999 
[22] K. Chan, R. Sahita, S. Hahn, K. McCloghrie, "Differentiated Services Quality of 
Service Policy Information Base", RFC 3317, March 2003 
[23] N. Li, M. Borrego, S. Li, "Achieving per-flow fair rate allocation in Diffserv",
ACM Transactions on Modelling and Computer Simulation, Volume 11(2),












[24] T. Ferrari, "End-To-End Performance Analysis with Traffic Aggregation", Proc.
of TNC, Volume 34(6), page 905, May 2000
[25] D. Clark, W. Fang, "Explicit Allocation of Best Effort Packet Delivery Service",
IEEE/ACM Transactions on Networking, Volume 6(4), August 1998
[26] R. Makkar, I. Lambadaris et al., "Empirical Study of Buffer Management
Schemes for Diffserv Assured Forwarding PHB", Proc. of Ninth International
Conference on Computer Communications and Networks, Las Vegas, October
2000
[27] J. Heinanen, F. Baker, W. Weiss, J. Wroclawski, "Assured Forwarding PHB
Group", RFC 2597, June 1999
[28] S. Floyd, V. Jacobson, "Random Early Detection Gateways for Congestion 
Avoidance", IEEE/ACM Transactions on Networking, Volume 1(4), page 397, 
August 1993 
[29] M. El-Gendy, K. Shin, "Equation-Based Packet Marking for Assured 
Forwarding Services", IEEE INFOCOM, New York, 2002 
[30] I. Yeom, A. Reddy, "Modelling TCP Behaviour in a Differentiated Services
Network", Texas A&M University ECE Technical Report, May 1999
[31] S. Low, F. Paganini, J. Wang, S. Adlakha, J. Doyle, "Dynamics of TCP/RED and
a Scalable Control", IEEE INFOCOM, New York, June 2002
[32] N. Christin, J. Liebeherr, T. Abdelzaher, "A Quantitative Assured Forwarding
Service", Proc. of IEEE INFOCOM, Volume 2, page 864, New York, June 2002
[33] J. Ibanez, K. Nichols, "Preliminary Simulation Evaluation of an assured
service", [Online]. Available:
www.globecom.net/ietf/draft/draft-ibanez-diffserv-assured-eval-00.html, August 1998
[34] I. Stoica, H. Zhang, "Providing Guaranteed Services without Per-Flow
Management", ACM SIGCOMM Computer Communication Review, Volume
29(2), page 81, October 1999
[35] H. Chow, A. Leon-Garcia, "A Feedback Control Extension to Differentiated
Services", [Online]. Available:
www.globecom.net/ietf/draft/draft-chow-diffserv-fbctrl-00.html, March 1999
[36] L. Breslau, E. Knightly, S. Shenker, I. Stoica, H. Zhang, "Endpoint Admission
Control: Architectural Issues and Performance", Proc. of ACM SIGCOMM












[37] A. Habib, B. Bhargava, "Unresponsive Flow Detection and Control Using the
Differentiated Services Framework", [Online]. Available:
citeseer.nj.nec.com/habib01unresponsive.html, 2001
[38] I. Andrikopoulos, L. Wood, G. Pavlou, "A Fair Traffic Conditioner for the
Assured Service in a Differentiated Services Internet", Proc. of the IEEE
International Conference on Communication, Volume 2, page 806, New
Orleans, June 2000
[39] B. Suter, T. Lakshman, D. Stiliadis, A. Choudhury, "Buffer Management
Schemes for Supporting TCP in Gigabit Routers with Per-Flow Queuing", IEEE
Journals in Selected Areas in Communications, Volume 17(6), August 1999
[40] W. Eddy, M. Allman, "A Comparison of RED's Byte and Packet Modes",
Computer Networks, Volume 42(2), June 2003
[41] Y. Bernet, P. Ford, R. Yavatkar, F. Baker, L. Zhang, M. Speer, R. Braden, B.
Davie, J. Wroclawski, E. Felstaine, "A Framework for Integrated Services
Operation over Diffserv Networks", RFC 2998, November 2000
[42] ClearSpeed White Paper: "Layer 3/4 Classification and Routing", [Online].
Available: www.clearspeed.com/technology.php?wp, 2002
[43] Cisco White Paper: "The Evolution of High-End Router Architectures",
[Online]. Available: www.cisco.com/warp/public/cc/pd/rt/12000/tech/ruar_wp.htm,
January 2001
[44] EZchip press release: "EZchip Introduces a 10G/OC-192 Traffic Manager to
Broaden its Next-generation Network Processor Architecture", [Online].
Available: www.ezchip.com/html/press_011022.html, October 2001
[45] J. Henry, E. McGinnis, "Banking on the Internet", [Online]. Available:
www.sagharbor.com/banking/banking-chapter%200ne.htm
[46] Lucent Technologies Press Release: "Lucent Technologies Introduces New
Multi-Terabit Switch Router for Small Regional Networks", [Online].
Available: www.lucent.com/press/0600/000606.nsc.html, June 2000
[47] ClearSpeed Reference Design: "A ClearConnect 40G Packet Processor",
[Online]. Available: www.clearspeed.com/technology.php?wp, 2002
[48] Cable Datacom White Paper: "QoS: One HFC Network, Multiple Revenue
Streams", [Online]. Available: www.cabledatacomnews.com/whitepapers/
[49] S. Choi, J. Dehart, R. Keller, F. Kuhns, J. Lockwood, P. Pappu, J. Parwatikar, 












Performance Dynamically Extensible Router", Proc. DARPA Active Networks
Conference Exposition, San Francisco, page 42, May 2002
[50] F. Kuhns, J. DeHart, R. Keller, J. Lockwood, P. Pappu, J. Parwatikar, E.
Spitznagel, D. Richards, D. Taylor, J. Turner, K. Wong, "Implementation of an
Open Multi-Service Router", Washington University in St. Louis, Technical
Report, August 2001
[51] T. Chaney, J. Fingerhut, M. Flucke, J. Turner, "Design of a Gigabit ATM 
Switch", Proc. of INFOCOM, page 2, March 1997 
[52] D. Taylor, J. Lockwood, T. Sproull, J. Turner, D. Parlour, "Scalable IP Lookup
for Programmable Routers", IEEE INFOCOM, New York, June 2002
[53] J. Lockwood, D. Taylor, "Design and Implementation of a Field Programmable 
Port Extender", [Online]. Available: www.arl.wustl.edu/gigabitkits/ 
Workshop_OO_Ollslides/fpx-workshop_OllOOO_lockwood.pdf, January 2001 
[54] W. Eatherton, "Hardware-Based Internet Protocol Prefix Lookups", MS thesis,
Washington University in St. Louis, May 1999
[55] J. DeHart, "A Smart Port Card Tutorial", [Online]. Available:
www.arl.wustl.edu/projects/gigabitkits/Workshop_00_07/spc_tutorial/slides/spc
hw.pdf, July 2000
[56] Z. Dittia, J. Coe, G. Parulkar, "Design of the APIC: A High Performance ATM 
Host-Network Interface Chip", IEEE INFOCOM, Boston, page 179, April 1995 
[57] W. Richard, "Smart Port Card Version 2 (SPC-II) Architecture", [Online].
Available: www.arl.wustl.edu/gigabitkits/Workshop_02_06/slides/richard_spc2.pdf,
June 2002
[58] The GNU Zebra Home Page, [Online]. Available: www.zebra.org, September 
2003 
[59] P. Pappu, J. Parwatikar, J. Turner, K. Wong, "Distributed Queuing in Scalable 
High Performance Routers", INFOCOM 2003, San Francisco, March 2003 
[60] M. Shreedhar, G. Varghese, "Efficient Fair Queuing Using Deficit Round 
Robin", SIGCOMM, Boston, page 231, October 1995 
[61] F. Baker, C. Iturralde, F. Le Faucheur, B. Davie, "Aggregation of RSVP for
IPv4 and IPv6 Reservations", RFC 3175, September 2001
[62] E. Kohler, R. Morris, B. Chen, J. Jannotti, M. Kaashoek, "The Click Modular 













[63] P. Heegaard, "GenSyn - a Java Based Generator of Synthetic Internet Traffic
Linking User Behaviour Models to Real Network Protocols", Presentation at
ITC Specialist Seminar on IP Traffic Measurement, Modelling and
Management, Monterey, September 2000
[64] The Public Netperf Homepage, [Online]. Available: www.netperf.org/netperf/ 
NetperfPage.html, September 2003 
[65] The Network Time Project Home Page, [Online]. Available: www.ntp.org/, 
September 2003 
[66] D. Mills, "NTP Precision Time Synchronization", [Online]. Available:
www.eecis.udel.edu/~mills/database/brief/precise/precise.pdf, January 2003
