VHDL description for IP Engine / Zeliall Bathich by Zeliall, Bathich
Faculty of Computer Science & Information 
Technology 
Perpustakaan SKTM 
VHDL Description for IP Engine 
WXES3182 
Name: Zeliall Bathich
Matrix: WEK010401 
Supervisor : Mr. Noorzaily Mohd. Noor
Moderator : Mr. Yamani Mohd. Idna Idris 
University of Malaya
Table Of Contents
Abstract
Acknowledgement
Chapter 1: Introduction
1.1 Introduction
1.2 Problem Definition
1.3 Scope
1.4 Objectives
1.5 Constraints
1.6 Scheduling
Chapter 2: Literature Review
2.1 Introduction
2.2 Network Protocol Layers
2.3 Layering Models
2.4 The TCP/IP Stack
2.5 TCP/IP Protocols
2.6 Internet Protocol
2.7 IP Address
2.8 IP Address Classes
2.9 Netmasks
2.10 Subnet Address
2.11 Directed Broadcast Address
2.12 Limited Broadcast Address
2.13 IP Routing
2.14 ARP
2.15 IP Packet Structure
2.16 IP Packet Processing
2.17 IP Fragmentation Processing at a Router
2.18 IP Fragmentation Processing at the Receiving End System
2.19 Reception of a Frame from the Ethernet
2.20 Comparison between Existing System and Proposed System Architecture
2.20.1 Existing System
2.20.2 System Proposed
2.20.3 Protocol Processor (Proposed System)
2.20.4 IP Engine
Chapter 3: Methodology
3.1 Methodology
Chapter 4: System Analysis
4.1 VHDL
4.2 What Is VHDL
4.3 VHDL Advantages
4.4 VHDL And Verilog Comparison
Chapter 5: System Design
5.1 System Design
5.2 IP Engine Block Diagram
5.3 Internal Block Diagram
5.4 Process Flow
Chapter 6: System Implementation
6.1 Introduction
6.2 Design Entry
6.3 Modeling Entity
6.4 Model Analysis
6.5 Synthesis
Chapter 7: System Testing And Evaluation
7.1 Simulation and Testing
7.2 Cycle Simulation
7.3 System Testing
Chapter 8: Discussion
Chapter 9: Conclusion
References
Appendix A: VHDL Source Code
Appendix B: PeakFPGA User Manual
Appendix C: Main References
Abstract 
Developing hardware support for network layer protocol processing is a very complex
and demanding task. However, hardware acceleration can be required for optimal
performance. To cope with this situation, this project presents a high-level design
approach that targets the development of configurable and reusable components. It
therefore integrates advanced tools for the development of the IP Engine into the
design environment.
This process is illustrated with a TCP/IP header analysis and validation component,
for which initial performance results are presented. The development of this engine is
embedded in an approach to developing flexible and configurable protocol engines that
can be optimized for specific applications.
Implementing the IP Engine in hardware helps to reduce communication bottlenecks by
replacing expensive software solutions, which are based on 32-bit processor cores. With
its small-footprint design, it offers a low-power, highly cost-effective solution that
performs all protocol functions of TCP/IP and UDP/IP connections for sustained bit
rates of up to 100 Mbps, independent of packet payload sizes and other connection
parameters.
Acknowledgment 
My utmost gratitude goes to Almighty Allah for the confidence and patience granted to me
in completing part 1 and part 2 of this thesis. I wish to record my indebtedness and
appreciation to everyone who has been so helpful and supportive in this project work and
helped bring it to success.
I would like to express my deep gratitude to my supervisor, Mr. Noorzaily Mohd. Noor, for
the tremendous help, technical advice and thoughtful comments he has given me during this
project, and to my examiner, Mr. Yamani Mohd. Idna Idris, for his guidance and for
sharing his experience and knowledge. Their valuable advice and motivation will be
cherished and will help shape my personal values in the future.
I also take this opportunity to thank all my fellow students, and especially
the family of Computer Science and Networking, for their constructive criticism and
support in facing difficult and challenging times.
Finally, I am much obliged to my dear parents, who have given me
invaluable support and inspiration throughout my university life. My gratitude
also goes to all the unnamed others who directly or indirectly helped me to complete this
interesting and challenging project. With this sheet of paper, I can only say thank you
with all my heart.
Chapter 1 
Introduction 
1.1 Introduction 
Most systems today that require embedded Internet connectivity use a 32-bit
processor core and implement the TCP/IP protocol stack in software.
This approach, however, often places heavy demands on processor performance and keeps
system costs for Internet applications high.
To improve performance, we undertook a hardware implementation of a signaling
protocol to remove this burden from the host CPU, dramatically reduce bottlenecks in the server
and help achieve faster and more accurate data transmission in a large network environment.
The system is designed to improve the performance and power consumption of embedded systems. It
performs all protocol functions of TCP/IP and UDP/IP connections for sustained bit rates of up
to 100 Mbps, independent of packet payload sizes and other connection parameters. It provides IP
connectivity even without any external processor interaction, which also makes it an ideal
Internet access solution for existing applications.
The system will support:
• 100 Mb/s throughput for all packet sizes
• Support of up to 64K connections
• Stand-alone capability
• Complete TCP/IP solution
1.2 Problem Definition 
During the analysis phase of the project I identified a few problems that need to be overcome:
• Limited references.
• Limited time to finish the first part of the project, which is to analyse and design
the system.
• The difficulty of turning theoretical definitions and approaches into a real system.
• The overall system is hard to implement because of the other
protocols that it includes.
1.3 Scope 
The project scope determines the part of the project process that will ease the burden of the
overall system development.
• Implementing the Internet Protocol (IP) engine in hardware, which off-loads
performance-intensive Internet protocols from processor sub-systems and allows for
separate system optimization.
• Several options to overcome the large cost of the hardware implementation:
Simulation: Simulation is one of the important steps in complex hardware
design. Open hardware designers may simulate their designs without
implementing them. In this way they can complete the design using free simulators without
the cost of implementation.
The use of programmable logic: Programmable logic devices have
become very popular and offer enough hardware resources to compete with older
ASICs. Real, complex designs have been built
using them. They can be programmed in the field using a PC or a small programmer.
This brings hardware design close to software design, since anyone can
design his or her own hardware and program it onto one of these devices.
The entire IP Engine can be constructed out of interrelated sub-modules, but this is very
complicated when it is implemented directly. It is difficult to follow which input lines correspond
to certain variables and what their values would be. It is more efficient to construct the
IP Engine using a method that models it in a way that is easier to understand. It is for this
reason that hardware description languages like VHDL were created. VHDL is a very popular
language for describing, modeling and synthesizing digital circuits and systems. Its powerful but
narrow field of usage makes it difficult to find software packages that make testing
VHDL code easy.
1.4 Objectives 
While software implementations require very fast processors to keep up with Ethernet transmission
speeds, the IP Engine provides a sustained bit rate of 100 Mbps up to the TCP/UDP layer of the Internet
protocol stack. Instead of assigning considerable resources to interrupt-driven processor context
switches and memory access operations, the protocol processor implements a hardware
architecture that operates directly on the communication data stream. With a total of 100k
logic gates and an operating frequency of only 25 MHz, this hardware engine consumes less
power and provides a competitive, small-footprint solution.
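As a rough check on these figures, the sketch below works out how many bits the data path would have to accept per clock cycle to sustain 100 Mbps at 25 MHz. The function name and the assumption of one transfer per clock cycle are illustrative and are not taken from the design itself.

```python
def bits_per_cycle(bit_rate_bps: int, clock_hz: int) -> float:
    """Bits that must be handled each clock cycle to sustain bit_rate_bps."""
    return bit_rate_bps / clock_hz

# 100 Mbps line rate and 25 MHz clock, the figures quoted in the text.
width = bits_per_cycle(100_000_000, 25_000_000)
print(width)  # 4.0
```

Four bits per cycle matches the nibble-wide data path of the standard 100 Mbps Ethernet media-independent interface (MII), which is also clocked at 25 MHz.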
The main objectives of implementing the Internet Protocol Engine in hardware are:
• Improving price, performance and power consumption of embedded systems
• Reducing network latency
• Reducing system overhead
• Accelerating network performance to full wire speed
• 100 Mb/s throughput for all packet sizes
• Support of up to 64K connections
• Complete TCP/IP solution
1.5 Constraints 
The constraints expected during system development are:
• Development takes a lot of time (including VHDL or Verilog design and simulation).
• There are several key limitations to the design of the IP stack, most of which are due to
the limited amount of hardware, RAM and buffer space available on an FPGA (assuming
the IP stack should not take up more than half of the FPGA).
• The lack of buffer space also creates problems if multiple datagrams are being received
and reassembled at once. If the transport layer protocols are busy, datagrams
will have to be dropped because no IP buffers will be free unless more IP buffers or transport
layer buffers are allocated. Increasing the number of buffers results in a large increase in
memory usage and in the logic needed to control them.
• A robust FPGA is required.
1.6 Scheduling
The bar chart below shows the activities of each process phase that will be carried out during
the development of the system. The whole thesis project will take approximately 9 months to
finish. The first phase, system analysis, runs from June until July. In this
phase, information is collected on the systems available and the methodology that
will be used in this project is studied.
The second phase, system design, runs from August until September.
At the beginning of October the second part of the thesis starts with the implementation of
the system, which is the system coding. System testing will be carried out from the middle of
December until the end of January. The system will be tested to check that it is free from errors.
The last phase of system development is the system evaluation. It runs from the end of January
until the end of February. The required system output will be checked in this phase.
Figure 1.1 Activities Schedule Bar Chart
Chapter 2 
Literature Review 
2.1 Introduction 
A protocol is a set of rules and conventions used to impose a standardized, structured
language for the communication between multiple parties. For example, a protocol might
define the order in which information is exchanged between two parties. In fact, a data
exchange can only take place between two computers using the same protocol.
Transmitting data across computer networks is an arduous task. Network functionality
has been decomposed into modules called layers to simplify and separate the tasks
associated with data transmission. Each layer is a unit of code that performs a small,
well-defined set of tasks. A protocol suite (or protocol stack) is a set of many such
layers, and is usually a part of the operating system kernel on machines connected to the
Internet.
A protocol stack is organized such that the highest level of abstraction resides at the top
layer. For example, the highest layer may deal with streaming audio or video frames,
whereas the lowest layer deals with raw voltages or radio signals. Every layer in a stack
builds upon the services provided by the layer immediately below it.
2.2 Network Protocol Layers 
Computers on a network communicate in agreed-upon ways called protocols. The
complexity of networking protocol software calls for the problem to be divided into
smaller pieces. A layering model aids this division and provides the conceptual basis for
understanding how software protocols together with hardware devices provide a powerful 
communication system. 
2.3 Layering Models 
In the early days of networking, before the rise of the ubiquitous Internet, the 
International Organization for Standardization (ISO) developed a layering model whose 
terminology persists today. 
Layer     Name of Layer   Purpose of Layer
Layer 7   Application     Specifies how a particular application uses a network.
Layer 6   Presentation    Specifies how to represent data.
Layer 5   Session         Specifies how to establish communication with a remote system.
Layer 4   Transport       Specifies how to reliably handle data transfer.
Layer 3   Network         Specifies addressing assignments and how packets are forwarded.
Layer 2   Data Link       Specifies the organization of data into frames and how to send
                          frames over a network.
Layer 1   Physical        Specifies the basic network hardware.
Table 2.1. ISO 7-Layer Reference Model
The 7-layer model has been revised to the 5-layer TCP/IP reference model to meet the
current needs of protocol designers.
2.4 The TCP/IP Stack
The picture below is an example of a simple data transfer between two computers and
shows how the data is sent and received through the TCP/IP stack.
Computer 1: user data converted for transmission
Upper layer data
TCP Header | Upper layer data
IP Header | TCP Header | Upper layer data
MAC Header | LLC Header | IP Header | TCP Header | Upper layer data
11010010110001011011100101101010...
Figure 2.1. Data Sent and Received through the Stack
The computer in the above diagram needs to send some data to another computer. The
Application layer is where the user interface exists; here the user interacts with the
application he or she is using. This data is then passed to the Presentation layer and
then to the Session layer. These three layers add some extra information to the original
data that came from the user and then pass it to the Transport layer. Here the data is
broken into smaller pieces (one piece is transmitted at a time) and the TCP header is added.
At this point, the data at the Transport layer is called a segment.
Each segment is sequenced so the data stream can be put back together on the receiving
side exactly as transmitted. Each segment is then handed to the Network layer for
network addressing (logical addressing) and routing through the internetwork. At the
Network layer, we call the data (which includes at this point the transport header and the
upper layer information) a packet. The Network layer adds its IP header and then sends it
off to the Data link layer. Here we call the data (which includes the Network layer header,
Transport layer header and upper layer information) a frame.
The Data link layer is responsible for taking packets from the Network layer and placing
them on the network medium (cable). The Data link layer encapsulates each packet in a
frame that contains the hardware (MAC) addresses of the source and destination computers
(hosts) and the LLC information, which identifies the protocol in the layer above
(Network layer) to which the packet should be passed when it arrives at its destination. At the
end of the frame is the FCS field, the Frame Check Sequence. It is used for error
checking and is also added by the Data link layer.
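The nesting of segment, packet and frame described above can be sketched as follows. This is only an illustration: the bracketed byte strings are hypothetical stand-ins for the real binary headers and trailer, and the function name is an assumption.

```python
def encapsulate(upper_layer_data: bytes) -> bytes:
    """Wrap user data layer by layer, using placeholder headers."""
    segment = b"[TCP]" + upper_layer_data            # Transport layer: segment
    packet = b"[IP]" + segment                       # Network layer: packet
    frame = b"[MAC][LLC]" + packet + b"[FCS]"        # Data link layer: frame
    return frame

print(encapsulate(b"hello"))  # b'[MAC][LLC][IP][TCP]hello[FCS]'
```

Each layer treats everything handed down from the layer above as opaque payload, which is why the headers appear in the frame in the reverse of the order in which they were added.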
If the destination computer is on a remote network, then the frame is sent to the router or
gateway to be routed to the destination. To put this frame on the network, it must be put
into a digital signal. Since a frame is really a logical group of 1's and 0's, the Physical
layer is responsible for encoding these digits into digital signals, which are read by
devices on the same local network.
On the receiving side, the computer synchronizes with the digital signal by reading the few
extra 1's and 0's (the preamble) that precede the frame. Once the synchronization is complete
and it has received the whole frame, it passes the frame to the layer above, which is the Data link
layer.
The Data link layer performs a Cyclic Redundancy Check (CRC) on the frame. This is a
computation that the computer carries out; if the result matches the value in the
FCS field, it assumes that the frame has been received without any errors. Once
that is done, the Data link layer strips off any information or header that
was added by the remote system's Data link layer and passes the rest (now we are moving
from the Data link layer to the Network layer, so we call the data a packet) to the
layer above, which is the Network layer.
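The check can be sketched with the CRC-32 function from Python's standard `zlib` module, which uses the same generator polynomial as the Ethernet FCS. The frame layout here (payload followed by a 4-byte big-endian CRC) and the helper names are simplifying assumptions and do not reproduce the exact bit ordering used on the wire.

```python
import zlib

def make_frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 as a stand-in for the FCS field."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def check_frame(frame: bytes):
    """Return the payload if the recomputed CRC matches the FCS, else None."""
    payload, fcs = frame[:-4], frame[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") == fcs:
        return payload   # match: strip the trailer and pass the packet upward
    return None          # mismatch: the frame is treated as corrupted

good = make_frame(b"packet bytes")
bad = bytes([good[0] ^ 0x01]) + good[1:]   # flip one bit to simulate a line error

print(check_frame(good))  # b'packet bytes'
print(check_frame(bad))   # None
```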
At the Network layer the IP address is checked, and if it matches the machine's own
IP address then the Network layer header is stripped off from the packet and the rest is
passed to the layer above, which is the Transport layer. Here the rest of the data is now
called a segment.
The segment is processed at the Transport layer, which rebuilds the data stream (at this
level on the sender's computer it was split into pieces so they could be transferred)
and acknowledges to the transmitting computer that it received each piece. It is obvious
that, since we are sending an ACK back to the sender from this layer, we are using
TCP and not UDP.
When analyzing the way data travels from one computer to another, most
people never analyze in detail any layers above the Transport layer. This is because the
whole process of getting data from one computer to another usually involves layers 1 to 4
(Physical to Transport), or layer 5 (Session) at the most, depending on the type of data.
2.5 TCP/IP Protocols
This section discusses the protocols available in the TCP/IP protocol suite. The following
figure shows how they correspond to the 5-layer TCP/IP Reference Model. This is not a
perfect one-to-one correspondence; for instance, the Internet Protocol (IP) uses the Address
Resolution Protocol (ARP), but both are shown here at the same layer in the stack.
[Protocol stack diagram; legible labels include SMTP and Data Link]
Figure 2.2. TCP/IP Protocol Flow
2.6 Internet Protocol (IP) 
Every machine on the Internet has a unique identifying number, called an IP address.
IP stands for Internet Protocol, which is the language that computers use to
communicate over the Internet. A protocol is the pre-defined way that someone who
wants to use a service talks with that service. The "someone" could be a person, but more
often it is a computer program like a Web browser.
A typical IP address looks like this:
216.27.61.137
To make it easier for humans to remember, IP addresses are normally expressed in
decimal format as a dotted decimal number like the one above. But computers
communicate in binary form. Look at the same IP address in binary:
11011000.00011011.00111101.10001001
The four numbers in an IP address are called octets, because they each have eight
positions when viewed in binary form. If you add all the positions together, you get 32,
which is why IP addresses are considered 32-bit numbers. Since each of the eight
positions can have two different states (1 or 0), the total number of possible
combinations per octet is 2^8 or 256. So each octet can contain any value between 0 and
255. Combine the four octets and you get 2^32 or 4,294,967,296 possible unique values.
Out of the almost 4.3 billion possible combinations, certain values are restricted from use
as typical IP addresses. For example, the IP address 0.0.0.0 is reserved for the default
network and the address 255.255.255.255 is used for broadcasts.
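The conversion between the dotted decimal and binary forms shown above can be reproduced with a short sketch. The function name is illustrative only.

```python
def to_binary(dotted: str) -> str:
    """Render a dotted decimal IPv4 address as four 8-bit binary octets."""
    octets = [int(part) for part in dotted.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError("expected four octets in the range 0 to 255")
    return ".".join(format(o, "08b") for o in octets)

print(to_binary("216.27.61.137"))  # 11011000.00011011.00111101.10001001
print(2 ** 8, 2 ** 32)             # 256 4294967296
```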
The octets serve a purpose other than simply separating the numbers. They are used to 
create classes of IP addresses that can be assigned to a particular business, government or 
other entity based on size and need. The octets are split into two sections: Net and Host. 
The Net section always contains the first octet. It is used to identify the network that a 
computer belongs to. Host (sometimes referred to as Node) identifies the actual computer
on the network. The Host section always contains the last octet. There are five IP classes
plus certain special addresses.
When the Internet was in its infancy, it consisted of a small
number of computers hooked together with modems and telephone lines. You could only
make connections by providing the IP address of the computer you wanted to establish a
link with. For example, a typical IP address might be 216.27.22.162. This was fine when
there were only a few hosts out there, but it became unwieldy as more and more systems
came online.
The first solution to the problem was a simple text file maintained by the Network
Information Center that mapped names to IP addresses. Soon this text file became so
large it was too cumbersome to manage. In 1983, the Domain Name System (DNS) was
introduced, which maps text names to IP addresses automatically.
This way you only need to remember www.um.edu.my, for example, instead of UM's IP
address.
2.7 IP Address 
IP defines an addressing scheme that is independent of the underlying physical address
(e.g. 48-bit MAC address). IP specifies a unique 32-bit number for each host on a
network. This number is known as the Internet Protocol Address, the IP Address or the
Internet Address. These terms are interchangeable. Each packet sent across the Internet
contains the IP address of the source of the packet and the IP address of its destination.
For routing efficiency, the IP address is considered in two parts: the prefix, which
identifies the physical network, and the suffix, which identifies a computer on the
network. A unique prefix is needed for each network in an internet. For the global
Internet, network numbers are obtained from Internet Service Providers (ISPs). ISPs
coordinate with a central organization called the Internet Assigned Numbers Authority
(IANA).
2.8 IP Address Classes
The first four bits of an IP address determine the class of the network. The class specifies
how many of the remaining bits belong to the prefix (aka Network ID) and to the suffix
(aka Host ID). The first three classes, A, B and C, are the primary network classes.
Class   First 4 Bits   Number Of     Max Number Of   Number Of     Max Number Of Hosts
                       Prefix Bits   Networks        Suffix Bits   Per Network
A       0xxx           7             128             24            16,777,216
B       10xx           14            16,384          16            65,536
C       110x           21            2,097,152       8             256
D       1110           Multicast
E       1111           Reserved for future use.
Table 2.2. IP addressing classes
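The rule in Table 2.2 can be sketched as a test on the leading bits of the first octet. The function name is illustrative; the bit patterns are those of the classful scheme described above.

```python
def address_class(dotted: str) -> str:
    """Return the classful address class (A to E) from the leading bits."""
    first_octet = int(dotted.split(".")[0])
    if first_octet & 0b10000000 == 0b00000000:   # leading bits 0xxx
        return "A"
    if first_octet & 0b11000000 == 0b10000000:   # leading bits 10xx
        return "B"
    if first_octet & 0b11100000 == 0b11000000:   # leading bits 110x
        return "C"
    if first_octet & 0b11110000 == 0b11100000:   # leading bits 1110
        return "D"
    return "E"                                   # leading bits 1111

print(address_class("140.211.0.0"))    # B
print(address_class("216.27.61.137"))  # C
```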
When interacting with humans, software uses dotted decimal notation: each group of 8 bits is
treated as an unsigned binary integer and written in decimal, with the groups separated by
periods. IP reserves host address 0 to denote a network. For example, 140.211.0.0 denotes
the network that was assigned the class B prefix 140.211.
2.9 Netmasks 
Netmasks are used to identify which part of the address is the Network ID and which part
is the Host ID. This is done by a logical bitwise-AND of the IP address and the netmask.
For class A networks the netmask is always 255.0.0.0; for class B networks it is
255.255.0.0 and for class C networks the netmask is 255.255.255.0.
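A minimal sketch of this bitwise AND follows. The helper names are illustrative, and the host address 140.211.5.9 is a made-up example on the class B network used earlier.

```python
def to_int(dotted: str) -> int:
    """Pack a dotted decimal address into a 32-bit integer."""
    value = 0
    for part in dotted.split("."):
        value = (value << 8) | int(part)
    return value

def to_dotted(value: int) -> str:
    """Unpack a 32-bit integer into dotted decimal notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))

def split_address(ip: str, netmask: str) -> tuple:
    """Return (network ID, host ID) by masking the address with the netmask."""
    ip_value, mask_value = to_int(ip), to_int(netmask)
    network_id = ip_value & mask_value               # bits kept by the mask
    host_id = ip_value & ~mask_value & 0xFFFFFFFF    # bits cleared by the mask
    return to_dotted(network_id), to_dotted(host_id)

print(split_address("140.211.5.9", "255.255.0.0"))      # ('140.211.0.0', '0.0.5.9')
print(split_address("216.27.61.137", "255.255.255.0"))  # ('216.27.61.0', '0.0.0.137')
```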
2.10 Subnet Address 
All hosts are required to support subnet addressing. While the IP address classes are the
convention, IP addresses are typically subnetted into smaller address sets that do not match
the class system. The suffix bits are divided into a subnet ID and a host ID. This makes
sense for class A and B networks, since no one attaches as many hosts to these networks
as is allowed. Whether to subnet and how many bits to use for the subnet ID is
determined by the local network administrator of each network.
If subnetting is used, then the netmask has to reflect this fact. On a class B network
with subnetting, the netmask would not be 255.255.0.0. The bits of the Host ID that are
used for the subnet ID also need to be set in the netmask.
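As an illustration, the sketch below builds the netmask for a network whose classful mask covers `network_bits` leading bits and whose administrator has taken `subnet_bits` further bits from the Host ID. Both parameter names and the 4-bit and 8-bit examples are assumptions chosen for illustration.

```python
def subnet_mask(network_bits: int, subnet_bits: int) -> str:
    """Netmask with the network bits and the subnet bits all set to 1."""
    ones = network_bits + subnet_bits
    if not 0 <= ones <= 32:
        raise ValueError("a netmask cannot have more than 32 bits set")
    value = (0xFFFFFFFF << (32 - ones)) & 0xFFFFFFFF
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))

print(subnet_mask(16, 0))  # 255.255.0.0    (class B, no subnetting)
print(subnet_mask(16, 4))  # 255.255.240.0  (class B, 4-bit subnet ID)
print(subnet_mask(16, 8))  # 255.255.255.0  (class B, 8-bit subnet ID)
```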
2.11 Directed Broadcast Address 
IP defines a directed broadcast address for each physical network as all ones in the host
ID part of the address. The network ID and the subnet ID must be valid network and
subnet values. When a packet is sent to a network's broadcast address, a single copy
travels to the network, and then the packet is sent to every host on that network or
subnetwork.
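The directed broadcast address can be derived by keeping the network (and subnet) bits and setting every host bit to one. The sketch below assumes the netmask is known; the function names and the example host address are illustrative.

```python
def to_int(dotted: str) -> int:
    """Pack a dotted decimal address into a 32-bit integer."""
    value = 0
    for part in dotted.split("."):
        value = (value << 8) | int(part)
    return value

def directed_broadcast(ip: str, netmask: str) -> str:
    """Keep the network and subnet bits, set all host ID bits to 1."""
    mask = to_int(netmask)
    value = (to_int(ip) & mask) | (~mask & 0xFFFFFFFF)
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))

print(directed_broadcast("140.211.5.9", "255.255.0.0"))    # 140.211.255.255
print(directed_broadcast("140.211.5.9", "255.255.255.0"))  # 140.211.5.255
```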
2.12 Limited Broadcast Address 
If the IP address is all ones (255.255.255.255), this is a limited broadcast address; the
packet is addressed to all hosts on the current (sub)network. A router will not forward this
type of broadcast to other (sub)networks.
2.13 IP Routing
Each IP datagram travels from its source to its destination by means of routers. All hosts 
and routers on an Internet contain IP protocol software and use a routing table to 
determine where to send a packet next. The destination IP address in the IP header 
contains the ultimate destination of the IP datagram, but it might go through several other 
IP addresses (routers) before reaching that destination. 
Routing table entries are created when TCP/IP initializes. The entries can be updated 
manually by a network administrator or automatically by employing a routing protocol 
such as Routing Information Protocol (RIP). Routing table entries provide needed 
information to each local host regarding how to communicate with remote networks and
hosts. When IP receives a packet from a higher-level protocol, like TCP or UDP, the
routing table is searched for the route that is the closest match to the destination IP
address. Routes are tried in the following order, from the most specific to the least specific:
• A route that matches the destination IP address (host route).
• A route that matches the network ID of the destination IP address (network route).
• The default route.
If a matching route is not found, IP discards the datagram.
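The search order above amounts to picking the matching route with the longest prefix. The sketch below uses Python's standard `ipaddress` module; the table contents, the next-hop addresses and the prefix-length notation are assumptions made for illustration.

```python
import ipaddress

def lookup(routing_table, destination: str):
    """Return the next hop of the most specific matching route, or None."""
    dest = ipaddress.ip_address(destination)
    best = None
    for network, next_hop in routing_table:
        net = ipaddress.ip_network(network)
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return None if best is None else best[1]   # None: the datagram is discarded

table = [
    ("0.0.0.0/0", "192.168.1.1"),        # default route
    ("140.211.0.0/16", "192.168.1.2"),   # network route
    ("140.211.5.9/32", "192.168.1.3"),   # host route
]

print(lookup(table, "140.211.5.9"))  # 192.168.1.3 (host route)
print(lookup(table, "140.211.7.7"))  # 192.168.1.2 (network route)
print(lookup(table, "8.8.8.8"))      # 192.168.1.1 (default route)
```

With the default route removed from the table, a lookup for an unknown destination returns `None`, which corresponds to the datagram being discarded.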
2.14ARP 
The Address Resolution Protocol is used to translate virtual addresses to physical ones. 
The network hardware does not understand the software-maintatned IP addresses. IP uses 
ARP to translate the 32-bit IP address to a physical address that matches the addressing 
scheme of the underlying hardware (for Ethernet, the 48-bit MAC address). 
TCP/IP can use any of the three. ARP employs the third strategy, message exchange. 
ARP defines a request and a response. A request message is placed in a hardware frame 
(e.g., an Ethernet frame), and broadcast to all computers on the network. Only the 
computer whose IP address matches the request sends a response. 
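As an illustration of the request message, the sketch below packs a 28-byte ARP request for IPv4 over Ethernet. The function name `build_arp_request` and the example addresses are assumptions made for this example; the field order follows the standard ARP packet layout.

```python
import socket
import struct


def build_arp_request(sender_mac, sender_ip, target_ip):
    """Pack an ARP request asking who owns `target_ip` (illustrative helper)."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # hardware address length (48-bit MAC)
        4,                            # protocol address length (32-bit IP)
        1,                            # operation: 1 = request, 2 = response
        sender_mac,                   # sender hardware address
        socket.inet_aton(sender_ip),  # sender IP address
        b"\x00" * 6,                  # target hardware address, not yet known
        socket.inet_aton(target_ip),  # target IP address being resolved
    )


request = build_arp_request(b"\x02\x00\x00\x00\x00\x01", "10.0.0.1", "10.0.0.2")
```

The request would then be placed in an Ethernet frame with the broadcast destination address, so that every computer on the network sees it.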
2.15 IP Packet Structure 
Before introducing the proposed system and comparing it with the existing system, it is 
helpful to study the IP packet structure and learn how the different fields affect the 
processing of datagrams. 
All IP packets or datagrams consist of a header part and a text part. The IP Header 
consists of a 20-byte fixed part plus a variable part. Its size is optimized to maximize the 
packet-processing rate without utilizing excessive resources. The header begins with a 4-
bit version field that keeps track of the version of the IP protocol to which the datagram 
belongs. This field helps smooth the transition from one version of IP to another, which 
can take months or even years. All IP packets are structured the same way: an IP header 
followed by a variable-length data field. 
Bits 0-3: Version | Bits 4-7: IHL | Bits 8-15: TOS | Bits 16-31: Total length 
Bits 0-15: Identification | Bits 16-18: Flags | Bits 19-31: Fragment offset 
Bits 0-7: TTL | Bits 8-15: Protocol | Bits 16-31: Header checksum 
Bits 0-31: Source IP address 
Bits 0-31: Destination IP address 
Figure 2.3 Packet Structure 
Version: The Version field indicates the format of the Internet header. 
IHL: Internet Header Length is the length of the Internet header in 32-bit words, and thus 
points to the beginning of the data. Note that the minimum value for a correct header is 5. 
Type of Service: The Type of Service provides an indication of the abstract parameters 
of the quality of service desired. The type of service is used to specify the treatment of 
the datagram during its transmission through the internet system. 
Total Length: Total Length is the length of the datagram, measured in octets, including 
Internet header and data. This field allows the length of a datagram to be up to 65,535 
octets. Such long datagrams are impractical for most hosts and networks. 
Identification: An identifying value assigned by the sender to aid in assembling the 
fragments of a datagram. 
Flags: Various Control Flags. 
Fragment Offset: This field indicates where in the datagram this fragment belongs. The 
fragment offset is measured in units of 8 octets (64 bits). The first fragment has offset 
zero. 
Time to Live: This field indicates the maximum time the datagram is allowed to remain 
in the Internet system. If this field contains the value zero, then the datagram must be 
destroyed. 
Protocol: This field indicates the next level protocol used in the data portion of the 
Internet datagram. 
Header Checksum: A checksum on the header only. Since some header fields change 
(e.g., time to live), this is recomputed and verified at each point that the Internet header is 
processed. The checksum field is the 16 bit one's complement of the one's complement 
sum of all 16-bit words in the header. For purposes of computing the checksum, the value 
of the checksum field is zero. 
Options: Options may or may not appear in datagrams. In some environments the 
security option may be required in all datagrams. The options field is variable in length. 
There may be zero or more options. 
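The field descriptions above can be tied together with a short software sketch that unpacks the fixed 20-byte part of a header and verifies the header checksum. This is only an illustrative Python model; the function names and the sample header bytes are assumptions made for this example, not part of the hardware design.

```python
import struct


def ones_complement_sum16(data):
    """One's complement sum of all 16-bit big-endian words in `data`."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def header_checksum(header):
    """Checksum of `header`, treating the checksum field (bytes 10-11) as zero."""
    zeroed = header[:10] + b"\x00\x00" + header[12:]
    return ~ones_complement_sum16(zeroed) & 0xFFFF


def parse_ipv4_header(packet):
    """Unpack the fixed part of the header into a dictionary of fields."""
    (ver_ihl, tos, total_length, ident, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    ihl = ver_ihl & 0x0F
    return {
        "version": ver_ihl >> 4,
        "ihl": ihl,                                  # in 32-bit words
        "tos": tos,
        "total_length": total_length,                # in octets
        "identification": ident,
        "more_fragments": bool(flags_frag & 0x2000),
        "fragment_offset": flags_frag & 0x1FFF,      # in units of 8 octets
        "ttl": ttl,
        "protocol": protocol,
        "source": ".".join(str(b) for b in src),
        "destination": ".".join(str(b) for b in dst),
        "checksum_ok": header_checksum(packet[:ihl * 4]) == checksum,
    }


# Sample header (assumed for illustration): UDP datagram, 192.168.0.1 to 192.168.0.199.
sample = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
fields = parse_ipv4_header(sample)
```

For the sample header, the parser reports version 4, an IHL of 5, protocol 17 (UDP) and a valid checksum.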
2.16 IP Packet Processing 
Transmission of a frame over Ethernet 
[Figure: IP output and other protocols pass through an Ethernet demux keyed on the Ethernet frame type (EtherType values such as 0x800 and 0x806).] 
Figure 2.4 Transmission of a frame over Ethernet 
The IP packet is placed in an Ethernet frame as follows: 
IP Broadcast/Multicast Address: The IP destination address is checked to see if the 
system should also receive a copy of the packet. This happens if this is an IP network 
broadcast address (or a multicast address is used that matches one of the registered IP 
multicast filters set by the IP receiver). If a copy is required, it is sent to the loop back 
interface. This directly delivers the packet to the IP input routine. The original packet 
continues to be processed. 
IP Unicast Address: The IP destination address is checked to see if the address is the 
unicast (source) IP address of the sending system. Such packets are sent directly to the 
loop back interface (i.e. never reach the physical Ethernet interface). 
MTU: The size of the packet is checked against the MTU of the link on which it is to be 
sent. (Note the MTU of the loop back interface may be different to that of Ethernet). If 
required, fragmentation is performed. 
Next Hop IP Address: The sender then determines the next hop address, that is, the IP 
address of the next Intermediate System/End System to receive the packet. Once this 
address is known, the Address Resolution Protocol (ARP) is used to find the appropriate 
MAC address to be used in the Ethernet frame. This is a two-stage process: (i) the ARP 
cache is consulted to see if the MAC address is already known, in which case the correct 
address is added and the packet queued for transmission. (ii) If the MAC address is not in 
the ARP cache, the ARP protocol is used to request the address, and the packet is queued 
until an appropriate response (or timeout) occurs. 
Encapsulation: The Ethernet frame is completed, by inserting the Destination, Source 
and Ethernet Type fields. 
Transmit: The frame is transmitted using the MAC procedure for Ethernet. 
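The next hop and encapsulation steps can be sketched in software as follows. This is a simplified Python model: the ARP cache is assumed to be a dictionary from IP address to MAC address, the pending queue is a dictionary of lists, and the name `transmit_ip_packet` is illustrative. Loopback delivery, MTU checking and the ARP request itself are left out.

```python
import struct

ETHERTYPE_IP = 0x0800


def transmit_ip_packet(packet, next_hop_ip, src_mac, arp_cache, pending):
    """Return a complete Ethernet frame, or None if the packet had to be queued."""
    dst_mac = arp_cache.get(next_hop_ip)
    if dst_mac is None:
        # Stage (ii): MAC address unknown, so hold the packet until ARP answers.
        pending.setdefault(next_hop_ip, []).append(packet)
        return None
    # Stage (i) and encapsulation: destination, source and Ethernet type fields.
    header = struct.pack("!6s6sH", dst_mac, src_mac, ETHERTYPE_IP)
    return header + packet


arp_cache = {"10.0.0.1": b"\xaa" * 6}
pending = {}
frame = transmit_ip_packet(b"data", "10.0.0.1", b"\xbb" * 6, arp_cache, pending)
queued = transmit_ip_packet(b"data", "10.0.0.2", b"\xbb" * 6, arp_cache, pending)
```

In this sketch the first call returns a 14-byte Ethernet header followed by the packet, while the second call returns nothing and leaves the packet in the pending queue for 10.0.0.2.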
2.17 IP Fragmentation processing at a Router 
To fragment/segment a long internet packet, an Intermediate System using the Internet 
Protocol (for example, a router) creates two new IP packets and copies the contents of 
the IP header fields from the long packet into BOTH new IP headers. 
The data of the long packet is divided into two portions on an 8-byte (64-bit) boundary. The 
data length of every packet that has the more-fragments (MF) flag set must be an integral 
multiple of 8 bytes; packets without this flag set need not meet this requirement. 
Let NFB (Number of Fragment Blocks) be the number of 8-byte blocks in the first portion. 
The first portion of the data is placed in the first new IP packet, and the total 
length field is set to the length of the FIRST IP packet. The more-fragments flag (MF) is 
set to one. 
The second portion of the data is placed in the second new IP packet, and the total length 
field is set to the length of the SECOND packet. The more-fragments flag (MF) carries 
the same value as the long packet. The fragment offset field of the second new IP packet is set to 
the value of that field in the long IP packet plus NFB. 
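The splitting rule can be checked with a small worked example. The sketch below is a Python model under stated assumptions: the header is taken to be 20 bytes with no options, `max_data` is the largest data portion the outgoing link allows, and the function name `split_in_two` and the dictionary layout are made up for illustration.

```python
def split_in_two(data, offset, more_fragments, max_data, header_len=20):
    """Split the data of a long packet into two fragments on an 8-byte boundary.

    `offset` is the fragment offset of the long packet, in 8-byte units.
    """
    nfb = max_data // 8          # Number of Fragment Blocks in the first portion
    first_len = nfb * 8
    if nfb == 0 or first_len >= len(data):
        raise ValueError("data does not need splitting or max_data is too small")
    first = {
        "data": data[:first_len],
        "total_length": header_len + first_len,
        "more_fragments": True,            # MF is set to one
        "fragment_offset": offset,         # copied from the long packet
    }
    second = {
        "data": data[first_len:],
        "total_length": header_len + len(data) - first_len,
        "more_fragments": more_fragments,  # same value as the long packet
        "fragment_offset": offset + nfb,
    }
    return first, second


first, second = split_in_two(bytes(100), offset=0, more_fragments=False, max_data=60)
```

With 100 bytes of data and room for 60, NFB is 7, so the first fragment carries 56 bytes and the second carries the remaining 44 bytes at fragment offset 7.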
2.18 IP Fragmentation processing at the Receiving End System 
An end system that accepts an IP packet (with a destination IP address that matches its 
own IP source address) will also reassemble any fragmented IP packets before these are 
passed to the next higher protocol layer. 
The system stores all received fragments (i.e., IP packets with a more-fragments flag 
(MF) set to one, OR where the fragment offset is non-zero), in one of a number of buffers 
(memory space). Packets with the same 16-bit Identification value are stored in the same 
buffer, at the offset specified by the fragment offset field specified in the packet header. 
Packets which are incomplete remain stored in the buffer until either all fragments are 
received, OR a timer expires, indicating that the receiver does not expect to receive any 
more fragments. Completed packets are forwarded to the next higher protocol layer. 
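A minimal software model of this buffering is sketched below. It follows the description above by keying buffers on the Identification value only; the class name `Reassembler` is an assumption, the expiry timer is omitted, and overlapping fragments are not handled.

```python
class Reassembler:
    """Collects fragments per Identification value (illustrative model only)."""

    def __init__(self):
        # identification -> {"chunks": {byte_offset: data}, "total": int or None}
        self.buffers = {}

    def add_fragment(self, ident, fragment_offset, more_fragments, data):
        """Store one fragment; return the full data once it is complete, else None."""
        buf = self.buffers.setdefault(ident, {"chunks": {}, "total": None})
        start = fragment_offset * 8          # offset field is in 8-byte units
        buf["chunks"][start] = data
        if not more_fragments:
            # The last fragment fixes the total data length.
            buf["total"] = start + len(data)
        if buf["total"] is None:
            return None
        # Complete only if the stored chunks cover the data without gaps.
        position = 0
        for chunk_start in sorted(buf["chunks"]):
            if chunk_start != position:
                return None
            position += len(buf["chunks"][chunk_start])
        if position != buf["total"]:
            return None
        del self.buffers[ident]
        return b"".join(buf["chunks"][s] for s in sorted(buf["chunks"]))


reassembler = Reassembler()
early = reassembler.add_fragment(7, 7, False, b"B" * 44)   # last fragment arrives first
whole = reassembler.add_fragment(7, 0, True, b"A" * 56)    # first fragment completes it
```

The example delivers the fragments out of order: the first call returns nothing, and the second call returns the reassembled 100 bytes.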
2.19 Reception of a frame from Ethernet 
The following summary shows the processing performed by an end system in an IP 
network. It is assumed that the system is connected to an Ethernet network. 
[Figure: an incoming frame enters the Ethernet driver, which demultiplexes on the Ethernet frame type (for example 0x806); IP then demultiplexes on the IP protocol type.] 
Figure 2.5 Reception of a frame from Ethernet 
The received frames are processed as follows: 
1. MAC Protocol: The Ethernet controller in the network interface card verifies that 
the frame: 
o Is not less than the minimum frame length and not greater than the maximum 
length (1500 B) 
o Contains a valid CRC at the end 
o Does not contain a residue (i.e. extra bits which do not form a byte) 
2. MAC Address: The frame is then filtered based on the MAC destination address 
and accepted only if: 
o It is a broadcast frame (i.e. all bits of the destination address field are set 
to 1) 
o It is a multicast frame to a registered MAC group address 
o It is a unicast frame to the node's own MAC address 
o Or the interface is acting in promiscuous mode (i.e. as a bridge) 
3. MAC SAP: The frame is then demultiplexed based on the specified MAC packet 
type (SAP) 
o It is passed to the appropriate protocol layer (e.g. LLC, ARP, IP) 
o Packets destined for IP have a type field of 0x0800 and those for ARP 
have a value 0x0806 
4. IP Check: The IP packet header is checked, including: 
o By checking the version = 4 (i.e. current version of IP) 
o By verifying the header checksum 
o By checking the header packet length 
5. IP Address: The destination IP network address is then checked: 
o If it matches an IP address of the node then it is accepted 
o If it is a network broadcast packet to the node's network it is accepted 
o If it is a multicast packet to an IP multicast address which is in use then it 
is accepted 
o If it is none of these, it is forwarded using the routing table (if possible) or 
discarded 
6. IP Fragmentation: Packets for the node are then checked concerning whether 
reassembly is required: 
o The fragmentation offset value and more-fragments flag are inspected 
o Fragments are placed in a buffer until other fragments are received to 
complete the packet. 
7. IP SAP: The IP protocol field (SAP) is checked: 
o The SAP field identifies the transport protocol (1 = ICMP; 6 = TCP; 17 = 
UDP) 
o The complete packet is passed to the appropriate transport layer protocol. 
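Steps 3, 4, 5 and 7 above can be summarised as a chain of demultiplexing decisions. The sketch below is a reduced Python model: it assumes the MAC-level checks have already passed, it skips checksum verification, broadcast, multicast and reassembly, and the function name `demux_frame` and its return strings are illustrative assumptions.

```python
import struct

ETHERTYPES = {0x0800: "IP", 0x0806: "ARP"}
IP_PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}


def demux_frame(frame, my_ip):
    """Return the name of the layer that should receive the frame contents."""
    if len(frame) < 14:
        return "drop"
    # Step 3: demultiplex on the Ethernet type field (bytes 12-13).
    (ethertype,) = struct.unpack("!H", frame[12:14])
    layer = ETHERTYPES.get(ethertype, "drop")
    if layer != "IP":
        return layer
    packet = frame[14:]
    # Step 4: basic header checks (length and version only in this sketch).
    if len(packet) < 20 or packet[0] >> 4 != 4:
        return "drop"
    # Step 5: unicast destination address check.
    if packet[16:20] != my_ip:
        return "forward-or-drop"
    # Step 7: demultiplex on the IP protocol field (byte 9).
    return IP_PROTOCOLS.get(packet[9], "drop")


# Sample IP header (assumed for illustration): UDP to 192.168.0.199.
ip_header = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
ip_frame = bytes(12) + b"\x08\x00" + ip_header
arp_frame = bytes(12) + b"\x08\x06" + bytes(28)
```

Here `demux_frame(ip_frame, bytes([192, 168, 0, 199]))` selects UDP, while the same frame seen by a node with a different address is handed to forwarding.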
2.20 System Comparison Architecture 
2.20.1 Existing System 
The diagram below shows the architecture of the existing system, which is implemented 
in software. 
[Figure: an incoming packet enters the network interface card (Ethernet PHY, packet buffer, DMA controller) and is passed to the host processor, where IP processing and TCP/UDP processing run in the kernel memory area; the data is then copied to the user memory area for application processing.] 
Figure 2.6 Existing System Architecture 
In general, this system receives packets through the network interface card, processes them, 
and stores them in a buffer before sending them to main memory. Later on, the 
IP header and the TCP/UDP header are processed by the operating system. A 
checksum operation is performed on each packet to make sure it is free from errors. The 
operating system divides memory into two parts: the kernel 
part for the operating system and the user part for the application. Packets are stored in 
and retrieved from memory. 
2.20.2 System Proposed 
The figure below shows the architecture of the proposed system, which consists of a 
protocol processor that includes the Ethernet MAC, IP Engine, Packet Multiplexer, TCP 
Engine and UDP Engine. These Engines work together to form an independent 
protocol processor that will enhance packet processing by removing the burden of 
protocol processing from the host CPU and dramatically reducing bottlenecks in the server. 
[Figure: an incoming packet enters the network interface card through the Ethernet PHY and the protocol processor (IP Engine, TCP Engine and IP Raw Engine implementation, with a supporting microcontroller); the data then passes by DMA to the user memory area of the host processor for application processing.] 
Figure 2.7 Proposed System Architecture 
2.20.3 Protocol Processor (proposed system) 
Figure 2.7 shows the inner sub modules of the protocol processor, which together 
form a packet processing system. Here is a brief explanation of how packets flow 
through the protocol processor and how each Engine functions. On the receive path, raw 
data received from the Ethernet MAC is sent to the IP Engine. The IP Engine 
checks the IP address; if it matches the machine's own address, the network layer 
header is stripped off the packet and the rest of the packet is passed to the 
multiplexer. The rest of the data is now called a segment. 
The multiplexer sends the segment to the TCP or UDP Engine according to the protocol 
named in the header. If the segment is meant for the TCP Engine, it is 
processed by rebuilding the data stream (on the sender's computer the stream was 
split into pieces at this level so that it could be transferred), and the engine acknowledges to the transmitting 
computer that it received each piece. 
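The receive path just described can be modelled in a few lines of software. This Python sketch is only a behavioural illustration, not the hardware design: the function names `ip_engine_receive` and `multiplex`, the dictionary of engine callbacks and the sample header are all assumptions, and checksum verification is omitted.

```python
def ip_engine_receive(packet, my_ip):
    """Return (protocol, segment) if the packet is addressed to us, else None."""
    if len(packet) < 20:
        return None
    header_words = packet[0] & 0x0F                 # IHL, in 32-bit words
    header_len = header_words * 4
    total_length = int.from_bytes(packet[2:4], "big")
    if header_words < 5 or total_length < header_len or len(packet) < total_length:
        return None
    if packet[16:20] != my_ip:                      # destination address check
        return None
    # Strip the network layer header; what remains is the segment.
    return packet[9], packet[header_len:total_length]


def multiplex(protocol, segment, engines):
    """Hand the segment to the engine registered for `protocol`, if any."""
    handler = engines.get(protocol)
    return None if handler is None else handler(segment)


# Sample header (assumed for illustration): UDP, total length 115, to 192.168.0.199.
header = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
payload = bytes(range(95))
engines = {6: lambda s: ("TCP", len(s)), 17: lambda s: ("UDP", len(s))}
result = ip_engine_receive(header + payload, bytes([192, 168, 0, 199]))
```

For the sample packet the IP Engine model returns protocol 17 and the 95-byte segment, which the multiplexer model passes to the UDP handler.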
Scalable, distributed memory architecture offers connection data buffers with 
programmable thresholds for variable interrupt latency and allows for simultaneous 
operation of up to 64k connections. Alternatively, an optional SRAM controller provides 
a high-speed memory interface for single memory architectures. IP connection set up and 
management is made easy by a message-based interface of the comprehensive control 
unit, which fully supports IP management protocols and allows for system configuration. 
Its auto-configuration and remote management capability enable it to provide IP 
connectivity even without any external processor interaction, which makes it an ideal 
communication extension for existing applications. 
[Figure: protocol processor block diagram with the following blocks: host interface, system interface (processor interface, direct media interface), TCP Engine, UDP Engine and Raw IP Engine, each with connection memory, Packet Multiplexer, IP Engine, Routing Table, ARP Controller, Ethernet MAC, SRAM Controller, Management Packet Buffer, and System Control & Connection Management.] 
2.20.4 IP Engine 
In this thesis I will focus on the system module that will be developed, which is the 
IP Engine. This system will lead to the implementation of the Internet Protocol in 
hardware, which will help in fast and accurate data transmission in a large-scale 
network environment. The diagram below shows the IP Engine process flow. 
[Figure: process flow steps recoverable from the diagram: strip network header, then send to multiplexer.] 
Figure 2.9 IP Engine Process Flow 
Figure 2.10 shows the minimal header encapsulation, where the network header is 
stripped out of the packet and the rest of the packet is passed to the multiplexer. This 
process is performed in the IP Engine. 
Bits 0-7: Protocol | Bit 8: S | Bits 9-15: Reserved | Bits 16-31: Header checksum 
Figure 2.10 Packet Encapsulation 
In Chapter 5 I will focus on the IP Engine system architecture, which explains in 
detail each sub module and process of the Engine. 
Chapter 3 
Methodology 
Methodology WXES3182 
3.1 Methodology 
This chapter describes the methodology used in this project and the advantages 
of the implemented architecture in developing the IP Engine. The methodology is based on the 
hardware description language VHDL. 
[Figure: VHDL design flow. Requirements -> RTL model (simulate) -> synthesize -> gate level model (simulate) -> place and route -> timing model (simulate) -> ASIC or FPGA.] 
The diagram above shows the basic VHDL design methodology, from the requirements through to 
loading the design into the FPGA or ASIC. Firstly, we have to consider the requirements of 
each architecture. According to the requirements we can develop the RTL model and the test 
bench using VHDL. 
The test bench is then used for simulation to ensure the RTL model meets the requirements. Later on 
the RTL model is synthesized and translated into a gate level model, or netlist. The netlist 
is placed and routed, optimizing for area or speed. The file produced 
after the place and route stage is downloaded into the ASIC or FPGA. From the diagram 
above, we can see that simulation is done at the level of the RTL model, the gate level model and also 
the timing model. 
Chapter 4 
System Analysis 
System Analysis WXES3182 
4.1 VHDL 
The hardware description language used in this project is VHDL. 
It is increasingly being used for the description and modeling of digital systems, 
leading to its use in system design fields, especially robotics and microprocessor 
development. It is useful for describing hardware for the purposes of simulation, modeling, testing, 
design and documentation. It provides a convenient and compact format for the hierarchical 
representation of the functional and wiring details of digital systems. 
4.2 What is VHDL 
VHDL is an acronym for VHSIC Hardware Description Language; VHSIC is an acronym for 
very high-speed integrated circuit. It is a hardware description language that can be used to model 
a digital system at many levels of abstraction, ranging from the algorithmic level to the logic 
level. The complexity of the digital system being modeled could vary from that of a simple gate 
to a complete digital electronic system, or anything in between. The digital system can also be 
described hierarchically, and timing can be explicitly modeled in the same description. 
The VHDL language can be regarded as an integrated amalgamation of the following languages: 
• Sequential language 
• Concurrent language 
• Net-list language 
• Waveform generation language 
Therefore, the language has constructs that enable the user to express the concurrent or 
sequential behavior of a digital system with or without timing. It also allows users to model 
the system as an interconnection of components. Test waveforms can also be generated using 
the same constructs. All the above constructs may be combined to provide a comprehensive 
description of the system in a single model. 
The language defines not only the syntax but also very clear simulation semantics for each 
language construct. Therefore, models written in this language can be verified using a VHDL 
simulator. It is a strongly typed language and is often verbose to write. It inherits many of its 
features, especially the sequential language part, from the Ada programming language. 
Because VHDL provides an extensive range of modeling capabilities, it is often difficult to 
understand. Fortunately, it is possible to quickly assimilate a core subset of the language that 
is both easy and simple to understand without learning the more complex features. This 
subset is usually sufficient to model most applications. However, the complete language has 
sufficient power to capture the descriptions of anything from the most complex chips to a complete 
electronic system. 
4.3 VHDL Advantages 
VHDL offers the following advantages for digital design: 
• Standard: VHDL is an IEEE standard. Just like any standard (such as the X Window 
graphics standard, bus communication interface standards, high-level programming 
languages and so on), it reduces confusion and makes interfaces between tools, 
companies and products easier. Any development to the standard would have better 
chances of lasting longer and have less chance of becoming obsolete due to 
incompatibility with others. 
• Industry support: With the advent of more powerful and efficient VHDL tools has 
come the growing support of the electronics industry. Companies use VHDL tools not 
only with regard to defense contracts, but also for their commercial designs. 
• Portability: The same VHDL code can be simulated and used in many design tools, 
which reduces dependence on tools whose limited capability may not be competitive in 
later markets. The VHDL standard also makes design data much easier to transfer than 
the design database of a proprietary design tool. 
• Modeling Capability: VHDL was developed to model all levels of designs, from 
electronic boxes to transistors. VHDL can accommodate behavioral constructs and 
mathematical routines that describe complex models, such as queuing networks and 
analog circuits. It allows multiple architectures to be associated with the 
same design during various stages of the design process. 
• Reusability: Certain common designs can be described, verified and modified 
slightly in VHDL for future use. This eliminates reading and marking changes to 
schematic pages, which is time consuming and subject to error. For example, a 
parameterized multiplier in VHDL can be reused easily by changing the width 
parameter so that the same code can do either 16 by 16 or 12 by 8 multiplication. 
• Technology and Foundry Independence: The functionality and behavior of the 
design can be described with VHDL and verified, making it foundry and technology 
independent. This frees the designer to proceed without having to wait for the foundry 
and technology to be selected. 
• Documentation: VHDL is a design description language, which allows 
documentation to be located in a single place by embedding it in the code. Combining 
comments with the code that actually dictates what the design should do 
reduces the ambiguity between specification and implementation. 
• New Design Methodology: Using VHDL and synthesis creates a new methodology 
that increases design productivity, shortens the design cycle and lowers costs. It 
amounts to a revolution comparable to that introduced by the automatic semi-custom 
layout synthesis tools of the last few years. 
4.4 VHDL and Verilog comparison 
This section compares and contrasts individual aspects of the two languages, VHDL and 
Verilog. 
i. Capability 
Hardware structure can be modeled equally effectively in both VHDL and Verilog. When 
modeling abstract hardware, the capability of VHDL can sometimes only be achieved in 
Verilog when using the PLI. The choice of which to use is therefore not based solely on 
technical capability but on: 
• Personal preferences 
• EDA tool availability 
• Commercial, business and marketing issues 
The modeling constructs of VHDL and Verilog cover a slightly different spectrum across 
the levels of behavioral abstraction; see Figure 1. 
[Figure: levels of behavioral abstraction covered by Verilog and by VHDL.] 
Figure 1. HDL modeling capability 
ii. Compilation 
Multiple VHDL design units (entity/architecture pairs) that reside in the same system 
file may be separately compiled if so desired. However, it is good design practice to keep 
each design unit in its own system file, in which case separate compilation should not be 
an issue. The Verilog language is still rooted in its native interpretative mode. 
Compilation is a means of speeding up simulation, but has not changed the original 
nature of the language. As a result, care must be taken with both the compilation order of 
code written in a single file and the compilation order of multiple files. Simulation results 
can change by simply changing the order of compilation. 
iii. Data types 
In VHDL a multitude of language or user defined data types can be used. This means 
dedicated conversion functions are needed to convert objects from one type to another. 
The choice of which data types to use should be considered wisely, especially 
enumerated (abstract) data types. This will make models easier to write, clearer to read 
and avoid unnecessary conversion functions that can clutter the code. VHDL may be 
preferred because it allows a multitude of language or user defined data types to be used. 
In Verilog, data types are very simple, easy to use and very much geared towards 
modeling hardware structure as opposed to abstract hardware modeling. Unlike VHDL, 
all data types used in a Verilog model are defined by the Verilog language and not by the 
user. 
iv. Design reusability 
VHDL procedures and functions may be placed in a package so that they are available to 
any design unit that wishes to use them. 
There is no concept of packages in Verilog. Functions and procedures used within a 
model must be defined in the module. To make functions and procedures generally 
accessible from different module statements, the functions and procedures must be placed 
in a separate system file and included using the `include compiler directive. 
v. High level constructs 
There are more constructs and features for high-level modeling in VHDL than there are in 
Verilog. Abstract data types can be used along with the following statements: 
• Package statements for model reuse 
• Configuration statements for configuring design structure 
• Generate statements for replicating structure 
• Generic statements for generic models that can be individually characterized, for 
example, bit width. 
All these language statements are useful in synthesizable models. 
Except for being able to parameterize models by overloading parameter 
constants, there is no equivalent to the high-level VHDL modeling statements in 
Verilog. 
Chapter 5 
System Design 
System Design WXES3182 
5.1 System Design 
The system module which will be developed is the IP Engine. This system will lead to 
the implementation of the Internet Protocol in hardware, which will help in fast and 
accurate data transmission in a large-scale network environment. This chapter 
explains in detail each process of the IP Engine by illustrating the system design, flow 
charts and block diagrams that will ease the understanding of the system and module 
functions. 
The IP Engine internal block diagram is designed in part one of this thesis and will be 
developed in the second part of the thesis using the peakFPGA software. Each sub 
process module will be developed and then integrated to form the whole Engine as an 
Internet Protocol Engine which will process packets independently. The diagram below 
shows the Engine black box, explaining each input and output pin, followed by the internal 
black box of the IP Engine describing the sub process modules and the signals used to 
activate each process. 
5.2 IP Engine Block Diagram 
[Figure: IP Engine black box containing Module 1, Module 2 and Module 3. Inputs: clk, rstn, frameData (8 bits), complete, newFrame, frameType, newFrameByte, frameValid, endFrame. Outputs: bufferSelect, protocol (8 bits), sourceIP (32 bits), timeLED0, timeLED1, newDatagram, datagramSize (16 bits), wrRAM, wrData (8 bits), wrAddr (19 bits).] 
Figure 5.1 IP Engine Block Diagram 
Figure 5.1 shows the IP Engine block diagram, which consists of 9 input pins and 9 output 
pins. The input pins are: 
1-clk is the clock signal; the design acts on its positive transition. 
2-rstn is an asynchronous active-low reset. 
3-complete is a control signal from the RAM arbitrator indicating that the process is complete. 
4-newFrame is an input pin which receives a new-frame indication from the layer below. 
5-frameType indicates that the frame is for IP when it is set to 1. 
6-newFrameByte signals that there is a new byte in the stream. 
7-endFrame signals the end of the frame. 
8-frameData carries the data streamed in this frame. 
9-frameValid gives the validity of the frame when endFrame is high (activated). 
The 9 output pins are: 
10-newDatagram shows that an IP datagram has been fully received. 
11-bufferSelect indicates the location in RAM. 
12-datagramSize shows the size of the datagram received. 
13-protocol gives the type of the datagram, that is, whether it is meant for TCP or UDP. 
14-sourceIP lets the upper protocol know the source IP. 
15-wrRAM is the signal to write to RAM. 
16-wrData is the data to be written to RAM. 
17-wrAddr carries the address lines to be written in RAM. 
18-timeLED0/timeLED1 indicate whether buffer 0/1 is busy. 
5.3 Internal Block diagram 
The diagram below shows the internal block diagram of the IP Engine sub modules, 
which illustrates the sub processes of the Engine. This design is for the process of receiving 
a packet from the Ethernet and processing it, then passing it to the upper layer of the 
TCP/IP stack. 
[Figure: state diagram of the IP Engine receive process with the states stIdle, stGetHeaderLen, stGetHeaderByte, stStoreHeaderByte, stGetDataByte, stGetNewByte, stSetupWriteDataByte, stDoWrite and stCompleteFragment.] 
Figure 5.2 IP Engine Internal Block Diagram 
stIdle 
1-clk 
2-rstn 
3-newFrame 
4-frameType 
5-nextstate 
This process will wait for a new frame arrival and check on the frame type. If both 
signals are enabled then it will determine the next state by getting the header length. 
stGetHeaderLen 
1-newByte 
2-inByte 
3-framdatalatch 
4-nextstate 
The IP version is checked here. If it is not equal to 4, the next state is idle; 
otherwise the next state is stGetHeaderByte. 
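The first two states described so far can be expressed as a small transition function. The sketch below is a behavioural Python model for illustration only, not the VHDL implementation: the function name `next_state` is an assumption, as is the choice to stay in stGetHeaderLen until a byte is available and to read the version from the upper four bits of that byte.

```python
def next_state(state, new_frame=False, frame_type=False, new_byte=False, in_byte=0):
    """Return the next state name for the stIdle and stGetHeaderLen states."""
    if state == "stIdle":
        # Leave idle only when a new frame has arrived and it is an IP frame.
        return "stGetHeaderLen" if (new_frame and frame_type) else "stIdle"
    if state == "stGetHeaderLen":
        if not new_byte:
            return "stGetHeaderLen"   # assumption: wait for the first header byte
        version = in_byte >> 4        # version is the upper nibble of the first byte
        return "stGetHeaderByte" if version == 4 else "stIdle"
    raise ValueError("state not covered by this sketch: " + state)
```

For example, a first byte of 0x45 (version 4, header length 5) moves the model on to stGetHeaderByte, while a first byte of 0x65 returns it to stIdle.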
stGetHeaderByte 
1-bufferSelect 
2-nextState 
3-cnt 
4-checksum 
5-ident0 
6-position0 
7-ident1 
8-position1 
Once the header has been received and processed, start on the data and determine 
which buffer should be used to handle the data. 
stStoreHeaderByte 
1-framedataLatch 
2-nextState 
3-newByte 
4-inByte 
Operate on each value of the header received according to count and store the header in 
RAM. 
stGetDataByte 
1-newFrame 
2-endFrame 
3-cnt 
4-fragmentoffset 
5-getNewbyte 
6-nextstate 
If we have not finished receiving the data, the counter is checked: if it is not equal to the 
datagram length, the next state is stSetupWriteDataByte. Otherwise, if endFrame and 
frameValid are asserted, the frame is finished and was valid, and the 
next state is stSetupwriteData. 
stCompleteFragment 
1-morefragment 
2-bufferSelecting 
3-nextState 
4-resetIdent 
5-newdatagram 
6-datagramSize 
In this state a signal is sent to the transport layer informing it that the datagram is 
finished; otherwise the Engine awaits the next frame. 
stDoWrite 
1-complete 
2-bufferSelecting 
3-incWrCnt 
4-nextState 
5-wrRAM 
6-wrAddr 
7-wrData 
In this state we wait for the RAM write request to be serviced; when complete is asserted, 
the data can be written to RAM. 
[Block diagram: stGetNewByte. Signals: 1-newFrameByte, 2-incCnt, 3-latchFrameData, 4-nextState]
The last state checks newFrameByte. While no new byte is available, the machine waits for it to arrive; once it arrives, the byte is latched and the machine goes to returnState.
5.4 Process Flow 
Figure 5.3 Process Flow (states recoverable from the diagram: GetHeaderLen, GetHeaderByte, StoreHeaderByte, SetupWriteDataByte)
Chapter 6 
System Implementation 
6.1 Introduction 
After the design of the architecture had been fixed, implementation of the IP Engine
began. Figure 6.1 shows a simplified design process, including both synthesis and
simulation, for one or more programmable logic designs. The key to understanding this process,
and to understanding how to use VHDL, is to know the importance of test development. Test
development should begin as soon as the general requirements of the system are known.
Figure 6.1 Steps in Implementing a VHDL Module (blocks in the diagram: Design Entry, Synthesis, Functional Simulation, Device Mapping)
VHDL is used for design entry. After being captured in a design entry system using a text
editor, the VHDL source code is passed directly to synthesis tools for implementation in
a specified type of device. The same source is also input to simulation, allowing it to be functionally verified.
On the test development side, script files or VHDL test benches can be created to exercise the
circuit and verify that it meets the functional and timing constraints of the specification. These
script files may be entered using a text editor, or may be generated from other forms of test
stimulus information such as graphical waveforms.
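As an illustration of what such a test bench can look like, the following is a minimal sketch. The device under test, byte_counter, is a hypothetical entity invented for this example (it is not part of the IP Engine); only the clk, rstn and newByte naming follows the signals used in this report. The sketch uses ieee.numeric_std rather than the std_logic_unsigned package used in the project source.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical device under test: counts the bytes flagged by newByte.
entity byte_counter is
    port (
        clk     : in  std_logic;
        rstn    : in  std_logic;                     -- asynchronous active-low reset
        newByte : in  std_logic;
        cnt     : out std_logic_vector(3 downto 0)
    );
end entity byte_counter;

architecture rtl of byte_counter is
    signal cnt_i : unsigned(3 downto 0);
begin
    cnt <= std_logic_vector(cnt_i);

    process (rstn, clk)
    begin
        if rstn = '0' then
            cnt_i <= (others => '0');
        elsif rising_edge(clk) then
            if newByte = '1' then
                cnt_i <= cnt_i + 1;
            end if;
        end if;
    end process;
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity byte_counter_tb is
end entity byte_counter_tb;

architecture sim of byte_counter_tb is
    signal clk     : std_logic := '0';
    signal rstn    : std_logic := '0';
    signal newByte : std_logic := '0';
    signal cnt     : std_logic_vector(3 downto 0);
    signal done    : boolean := false;
begin
    dut : entity work.byte_counter
        port map (clk => clk, rstn => rstn, newByte => newByte, cnt => cnt);

    -- 20 ns clock that stops once the stimulus process has finished
    clk <= not clk after 10 ns when not done else '0';

    stimulus : process
    begin
        wait for 25 ns;
        rstn    <= '1';              -- release reset
        newByte <= '1';              -- present three bytes on consecutive clocks
        for i in 1 to 3 loop
            wait until rising_edge(clk);
        end loop;
        newByte <= '0';
        wait until rising_edge(clk);
        assert unsigned(cnt) = 3
            report "expected three bytes to be counted" severity error;
        done <= true;
        wait;
    end process stimulus;
end architecture sim;
```

The stimulus process plays the role of the script file, and the assert statement checks the functional requirement automatically instead of relying on waveform inspection.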
6.2 Design Entry 
An external editor was used for design entry. Xilinx Foundation offers three types of
design entry editor to suit the preferences of the user:
• HDL editor
• FSM editor
• Schematic editor
In this IP Engine modeling project, the HDL editor was chosen as the design entry editor for
all the entities. The HDL editor includes an HDL design wizard that helps users code the
ports of every entity. The user only has to specify the input and output ports of the entity
by following the wizard, and the editor then generates the entity declaration code. After that, the
user codes the architecture of the entity in VHDL, in either
behavioral or structural form. In addition, the HDL editor provides synthesis tools for syntax
checking of the written code.
6.3 Modeling Entity
The behavior description of each entity is listed below.
[Block diagram: stIdle. Signals: 1-clk, 2-rstn, 3-newFrame, 4-frameType, 5-nextState]
This process waits for a new frame to arrive and checks the frame type. If both signals are
enabled, it moves to the next state to get the header length.
when stIdle =>
    -- wait for the arrival of a new frame that has a frameType of 1
    if newFrame = '0' or frameType = '0' then
        nextState <= stIdle;
    else
        -- reset the counters for the next datagram
        rstCnt <= '1';
        rstWrCnt <= '1';
        newHeader <= '1';
        nextState <= stGetHeaderLen;
        -- get header length and version information
        getNewByte <= '1';
    end if;
[Block diagram: stGetHeaderLen. Signals: 1-newByte, 2-inByte, 3-frameDataLatch, 4-nextState]
The IP version is checked here. If it is not equal to 4, the next state is stIdle; otherwise the
machine goes to the next state, which is stGetHeaderByte.
when stGetHeaderLen =>
    -- check ip version
    if frameDataLatch (7 downto 4) /= 4 then
        nextState <= stIdle;
    else
        nextState <= stGetHeaderByte;
        -- send data to checksum machine
        inByte <= frameDataLatch;
        newByte <= '1';
        -- get the header length in bytes, rather than 32-bit words
        nextHeaderLen <= frameDataLatch (3 downto 0) & "00";
    end if;
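The handling of the first header byte can be illustrated in isolation. The sketch below is not taken from the project source: the package ip_first_byte_pkg and its function names are hypothetical, and ieee.numeric_std is used in place of std_logic_unsigned. It shows that appending "00" to the 4-bit header length field multiplies it by four, converting 32-bit words into bytes.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package ip_first_byte_pkg is
    -- true when the upper nibble of the first header byte is 4 (IPv4)
    function is_ipv4 (first_byte : std_logic_vector(7 downto 0)) return boolean;
    -- header length in bytes: the 4-bit length in 32-bit words, times four
    function header_len_bytes (first_byte : std_logic_vector(7 downto 0))
        return std_logic_vector;
end package ip_first_byte_pkg;

package body ip_first_byte_pkg is
    function is_ipv4 (first_byte : std_logic_vector(7 downto 0)) return boolean is
    begin
        return unsigned(first_byte(7 downto 4)) = 4;
    end function is_ipv4;

    function header_len_bytes (first_byte : std_logic_vector(7 downto 0))
        return std_logic_vector is
    begin
        return first_byte(3 downto 0) & "00";   -- 6-bit result, as for headerLen
    end function header_len_bytes;
end package body ip_first_byte_pkg;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.ip_first_byte_pkg.all;

entity ip_first_byte_check is
end entity ip_first_byte_check;

architecture sim of ip_first_byte_check is
begin
    process
    begin
        -- 0x45: version 4, header length 5 words = 20 bytes
        assert is_ipv4(x"45")
            report "0x45 should be accepted as IPv4" severity error;
        assert to_integer(unsigned(header_len_bytes(x"45"))) = 20
            report "a length field of 5 should give 20 bytes" severity error;
        -- 0x60: version 6, so the state machine would return to stIdle
        assert not is_ipv4(x"60")
            report "0x60 should be rejected" severity error;
        wait;
    end process;
end architecture sim;
```

A typical IPv4 header without options therefore begins with the byte 0x45, giving a header length of 20 bytes.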
[Block diagram: stGetHeaderByte. Signals: 1-bufferSelect, 2-nextState, 3-cnt, 4-checksum, 5-ident0, 6-position0, 7-ident1, 8-position1]
Once the header has been fully received and processed, start on the data and determine which
buffer should be used to handle the data.
when stGetHeaderByte =>
    -- if we've finished getting the headers and processing them, start on the data
    -- once finished, refragmenting will come next
    if cnt = headerLen then
        -- only operate on data meant for us, or broadcast data
        if checksum = 0 then
            -- determine which buffer should be used to handle the data
            if ident0 = targetIdent and timeout0 /= FULLTIME then
                -- the ident matches and the timeout counter has not expired
                nextBufferSelect <= '0';
                -- accept the frame if its offset matches what we think it should be
                -- this drops out of order and duplicate frames.
                if position0 = fragmentOffset & "000" then
                    nextState <= stGetDataByte;
                else
                    nextState <= stIdle;
                end if;
            elsif ident1 = targetIdent and timeout1 /= FULLTIME then
                -- the ident matches and the timeout counter has not expired
                nextBufferSelect <= '1';
                -- accept the frame if its offset matches what we think it should be
                -- this drops out of order and duplicate frames.
                if position1 = fragmentOffset & "000" then
                    nextState <= stGetDataByte;
                else
                    nextState <= stIdle;
                end if;
            elsif (ident0 = 0 or timeout0 = FULLTIME) and fragmentOffset = 0 then
                -- The ident doesn't match either of the buffers so check if buffer 0
                -- is free. If ident = 0 or the timeout has expired then the buffer is free
                -- This must be the first fragment if it is to go here so also check the offset
                nextState <= stGetDataByte;
                nextBufferSelect <= '0';
            elsif (ident1 = 0 or timeout1 = FULLTIME) and fragmentOffset = 0 then
                -- The ident doesn't match either of the buffers so check if buffer 1
                -- is free. If ident = 0 or the timeout has expired then the buffer is free
                -- This must be the first fragment if it is to go here so also check the offset
                nextState <= stGetDataByte;
                nextBufferSelect <= '1';
            else
                nextState <= stIdle;
            end if;
        else
            -- ignore frame as it wasn't for us
            nextState <= stIdle;
        end if;
    -- otherwise get the next header byte from RAM
    else
        nextState <= stStoreHeaderByte;
        getNewByte <= '1';
    end if;
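The buffer selection priority in the listing above can be summarised as a pure function. This is only a sketch for illustration: the package buffer_select_pkg, the type buffer_choice and the boolean parameter names are hypothetical and do not appear in the project source. Each parameter stands for one of the comparisons made in stGetHeaderByte.

```vhdl
package buffer_select_pkg is
    type buffer_choice is (USE_BUFFER0, USE_BUFFER1, DROP_FRAME);

    function choose_buffer (
        match0, match1       : boolean;  -- ident matches and timeout has not expired
        in_order0, in_order1 : boolean;  -- stored position equals fragment offset * 8
        free0, free1         : boolean;  -- ident = 0 or timeout has expired
        first_fragment       : boolean   -- fragment offset = 0
    ) return buffer_choice;
end package buffer_select_pkg;

package body buffer_select_pkg is
    function choose_buffer (
        match0, match1       : boolean;
        in_order0, in_order1 : boolean;
        free0, free1         : boolean;
        first_fragment       : boolean
    ) return buffer_choice is
    begin
        if match0 then
            -- continuing a datagram in buffer 0: accept only the expected fragment
            if in_order0 then
                return USE_BUFFER0;
            else
                return DROP_FRAME;
            end if;
        elsif match1 then
            -- continuing a datagram in buffer 1
            if in_order1 then
                return USE_BUFFER1;
            else
                return DROP_FRAME;
            end if;
        elsif free0 and first_fragment then
            return USE_BUFFER0;   -- start a new datagram in buffer 0
        elsif free1 and first_fragment then
            return USE_BUFFER1;   -- start a new datagram in buffer 1
        else
            return DROP_FRAME;    -- no buffer available, or not a first fragment
        end if;
    end function choose_buffer;
end package body buffer_select_pkg;
```

Written this way, it is easier to see that a matching buffer always takes priority over a free one, and that a fragment arriving out of order for a matching buffer is dropped rather than redirected to the other buffer.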
[Block diagram: stStoreHeaderByte. Signals: 1-frameDataLatch, 2-nextState, 3-newByte, 4-inByte]
Operate on each value of the header received according to count and store the header in RAM. 
when stStoreHeaderByte =>
    nextState <= stGetHeaderByte;
    -- operate on each value of the header received according to count
    -- count will be one higher than the last byte received, as it is incremented
    -- at the same time as the data is streamed in, so
    -- when the data is seen to be available, count should also be one higher
    -- Send data to checksum process
    newByte <= '1';
    inByte <= frameDataLatch;
    -- Operate on data in the header
    case cnt(4 downto 0) is
        when "00011" =>
            nextDatagramLen (10 downto 8) <= frameDataLatch (2 downto 0);
        when "00100" =>
            nextDatagramLen (7 downto 0) <= frameDataLatch;
        when "00101" | "00110" =>
            shiftInIdentification <= '1';
        when "00111" =>
            shiftInFragmentOffset <= '1';
            latchMoreFragments <= '1';
        when "01000" =>
            shiftInFragmentOffset <= '1';
        when "01010" =>
            latchProtocol <= '1';
        when "01101" | "01110" | "01111" | "10000" =>
            shiftInSourceIP <= '1';
        when "10001" | "10010" | "10011" | "10100" =>
            shiftInTargetIP <= '1';
        when others =>
    end case;
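The case statement above maps counter values onto IPv4 header fields. Because cnt is one higher than the index of the byte just received, cnt = 3 corresponds to header byte 2 (the high byte of the total length), and so on. The following sketch restates that mapping with decimal counter values; the package ip_header_map_pkg and the header_field type are hypothetical names introduced only for this illustration.

```vhdl
package ip_header_map_pkg is
    type header_field is (F_TOTAL_LENGTH, F_IDENTIFICATION, F_FLAGS_FRAG_OFFSET,
                          F_PROTOCOL, F_SOURCE_IP, F_TARGET_IP, F_OTHER);

    -- cnt is the byte counter value seen in stStoreHeaderByte,
    -- i.e. one more than the zero-based index of the header byte
    function field_for_count (cnt : natural) return header_field;
end package ip_header_map_pkg;

package body ip_header_map_pkg is
    function field_for_count (cnt : natural) return header_field is
    begin
        case cnt is
            when 3 | 4    => return F_TOTAL_LENGTH;       -- header bytes 2-3
            when 5 | 6    => return F_IDENTIFICATION;     -- header bytes 4-5
            when 7 | 8    => return F_FLAGS_FRAG_OFFSET;  -- header bytes 6-7
            when 10       => return F_PROTOCOL;           -- header byte 9
            when 13 to 16 => return F_SOURCE_IP;          -- header bytes 12-15
            when 17 to 20 => return F_TARGET_IP;          -- header bytes 16-19
            when others   => return F_OTHER;              -- not stored by this state
        end case;
    end function field_for_count;
end package body ip_header_map_pkg;
```

The bytes that fall under F_OTHER (version and header length, type of service, time to live, and the checksum bytes) are still passed to the checksum process through inByte, but they are not latched into any register by this state.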
[Block diagram: stGetDataByte. Signals: 1-newFrame, 2-endFrame, 3-cnt, 4-fragmentOffset, 5-getNewByte, 6-nextState]
If the data has not all been received, that is, the counter is not equal to the datagram
length, the next state is stSetupWriteDataByte. Otherwise, if endFrame and frameValid are both
enabled, the frame is finished and was valid, and the machine goes to the next state,
which is stCompleteFragment.
when stGetDataByte =>
    -- if we haven't finished receiving the data, then
    if cnt /= datagramLen then
        nextState <= stSetupWriteDataByte;
        -- read an IP data byte from the data stream ...
        getNewByte <= '1';
    elsif endFrame = '1' and frameValid = '1' then
        -- this means that the frame is finished and was valid
        -- so update the buffer data and go to final state
        nextState <= stCompleteFragment;
        resetTimeout <= '1'; -- start/restart timer
        latchIdent <= '1'; -- allocate buffer to data
        if fragmentOffset = 0 then -- check if this is the first fragment
            resetPosition <= '1'; -- give position initial value
        else
            updatePosition <= '1'; -- or add to the amount of data stored
        end if;
    elsif endFrame = '1' then
        -- the frame is complete but not valid so ignore it
        nextState <= stIdle;
    else
        -- the frame is not complete so keep looping until it is
        nextState <= stGetDataByte;
    end if;
[Block diagram: stCompleteFragment. Signals: 1-moreFragments, 2-bufferSelectSig, 3-nextState, 4-resetIdent, 5-newDatagram, 6-datagramSize]
In this state a signal is sent to the transport layer to indicate that the datagram is finished;
otherwise the machine awaits the next frame.
when stCompleteFragment =>
    -- Signal the transport protocols if the datagram is finished
    -- or await next frame.
    nextState <= stIdle;
    if moreFragments = '0' then
        -- Last frame so:
        newDatagram <= '1'; -- notify higher protocols it's ready
        resetIdent <= '1'; -- free buffer for next time
        if bufferSelectSig = '0' then -- output datagram size from correct buffer
            datagramSize <= position0;
        else
            datagramSize <= position1;
        end if;
    end if;
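The datagram size reported here is the position register of the selected buffer, which accumulates the amount of data stored as fragments arrive. A sketch of that bookkeeping is given below. It is an illustration under stated assumptions rather than the project code: the package reassembly_pkg and its function names are hypothetical, ieee.numeric_std replaces std_logic_unsigned, and the update rule (set position to the data length for the first fragment, otherwise add the data length) is taken from the comments on resetPosition and updatePosition in the source listing.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package reassembly_pkg is
    -- a fragment is in order when its offset (in 8-byte units) times eight
    -- equals the number of bytes already stored in the buffer
    function in_order (position        : unsigned(15 downto 0);
                       fragment_offset : unsigned(12 downto 0)) return boolean;

    -- position after a valid fragment carrying data_len bytes has been stored
    function next_position (position        : unsigned(15 downto 0);
                            fragment_offset : unsigned(12 downto 0);
                            data_len        : unsigned(10 downto 0)) return unsigned;
end package reassembly_pkg;

package body reassembly_pkg is
    function in_order (position        : unsigned(15 downto 0);
                       fragment_offset : unsigned(12 downto 0)) return boolean is
    begin
        -- equivalent to fragmentOffset & "000" in the listing
        return position = shift_left(resize(fragment_offset, 16), 3);
    end function in_order;

    function next_position (position        : unsigned(15 downto 0);
                            fragment_offset : unsigned(12 downto 0);
                            data_len        : unsigned(10 downto 0)) return unsigned is
    begin
        if fragment_offset = 0 then
            return resize(data_len, 16);             -- resetPosition
        else
            return position + resize(data_len, 16);  -- updatePosition
        end if;
    end function next_position;
end package body reassembly_pkg;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.reassembly_pkg.all;

entity reassembly_check is
end entity reassembly_check;

architecture sim of reassembly_check is
begin
    process
        variable pos : unsigned(15 downto 0) := (others => '0');
    begin
        -- first fragment: offset 0, 1480 data bytes
        pos := next_position(pos, to_unsigned(0, 13), to_unsigned(1480, 11));
        assert pos = 1480 report "position should be 1480" severity error;
        -- the next fragment must carry offset 185, since 185 * 8 = 1480
        assert in_order(pos, to_unsigned(185, 13))
            report "offset 185 should be accepted" severity error;
        assert not in_order(pos, to_unsigned(0, 13))
            report "a duplicate first fragment should be rejected" severity error;
        -- second (last) fragment: 520 data bytes, giving a 2000-byte datagram
        pos := next_position(pos, to_unsigned(185, 13), to_unsigned(520, 11));
        assert pos = 2000 report "datagram size should be 2000" severity error;
        wait;
    end process;
end architecture sim;
```

The fragment sizes 1480 and 520 are example values chosen for the check, not measurements from the project.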
[Block diagram: stDoWrite. Signals: 1-complete, 2-bufferSelectSig, 3-incWrCnt, 4-nextState, 5-wrRAM, 6-wrAddr, 7-wrData]
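In this state the machine waits for the RAM write request to be serviced. While complete is low,
the write signals are kept asserted; once complete goes high, the data has been written to
RAM and the machine moves to returnState.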
when stDoWrite =>
    -- Wait for RAM write request to be serviced
    if complete = '0' then
        -- keep signals asserted until complete is high
        nextState <= stDoWrite;
        wrRAM <= '1';
        -- The address is based on the fragment offset and buffer
        if bufferSelectSig = '0' then
            wrAddr <= "001" & (wrCnt + (fragmentOffset & "000"));
        else
            wrAddr <= "010" & (wrCnt + (fragmentOffset & "000"));
        end if;
        wrData <= frameDataLatch;
    else
        -- when write is finished, go to returnState
        nextState <= returnState;
        incWrCnt <= '1';
    end if;
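The write address formed in stDoWrite is a 3-bit buffer prefix followed by a 16-bit byte address, giving the 19 bits of wrAddr. The sketch below isolates that calculation so it can be checked with concrete numbers. The package ram_addr_pkg and the function write_address are hypothetical names, and ieee.numeric_std is used here instead of std_logic_unsigned.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package ram_addr_pkg is
    function write_address (buffer_select   : std_logic;
                            wr_cnt          : unsigned(15 downto 0);
                            fragment_offset : unsigned(12 downto 0))
        return std_logic_vector;
end package ram_addr_pkg;

package body ram_addr_pkg is
    function write_address (buffer_select   : std_logic;
                            wr_cnt          : unsigned(15 downto 0);
                            fragment_offset : unsigned(12 downto 0))
        return std_logic_vector is
        variable byte_addr : unsigned(15 downto 0);
    begin
        -- fragment offset is in 8-byte units, so shift left by three
        byte_addr := wr_cnt + shift_left(resize(fragment_offset, 16), 3);
        if buffer_select = '0' then
            return "001" & std_logic_vector(byte_addr);   -- buffer 0 region
        else
            return "010" & std_logic_vector(byte_addr);   -- buffer 1 region
        end if;
    end function write_address;
end package body ram_addr_pkg;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.ram_addr_pkg.all;

entity ram_addr_check is
end entity ram_addr_check;

architecture sim of ram_addr_check is
begin
    process
    begin
        -- buffer 0, fifth byte of a fragment at offset 185 (byte 1480)
        assert to_integer(unsigned(
                   write_address('0', to_unsigned(5, 16), to_unsigned(185, 13))))
               = 65536 + 1485
            report "unexpected buffer 0 address" severity error;
        -- same byte in buffer 1 lands one 64 KB region higher
        assert to_integer(unsigned(
                   write_address('1', to_unsigned(5, 16), to_unsigned(185, 13))))
               = 2 * 65536 + 1485
            report "unexpected buffer 1 address" severity error;
        wait;
    end process;
end architecture sim;
```

Since each prefix selects a separate 64 KB region, a fragment is written directly at its final place in the reassembled datagram, which is why no copying is needed when the last fragment arrives.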
[Block diagram: stGetNewByte. Signals: 1-newFrameByte, 2-incCnt, 3-latchFrameData, 4-nextState]
The last state checks newFrameByte. While no new byte is available, the machine waits for it to arrive; once it arrives, the byte is latched and the machine goes to returnState.
when stGetNewByte =>
    if newFrameByte = '0' then
        -- wait for new byte to arrive
        nextState <= stGetNewByte;
    else
        -- latch new byte and go to returnState
        nextState <= returnState;
        incCnt <= '1';
        latchFrameData <= '1';
    end if;
6.4 Model Analysis
Once an entity is described in VHDL, it is validated using an analyzer and a simulator that
are part of a VHDL system. The first step in the validation process is analysis. The analyzer
takes a file that contains one or more design units and compiles them into an intermediate form.
During compilation, the analyzer validates the syntax and performs static semantic checks. The
generated intermediate form is stored in a specific design library that has been designated as
the working library. The language analyzer always compiles descriptions into this library;
therefore, at any given time, only one library is updated.
A design library is a location in the host environment where compiled descriptions are stored.
Each design library has a logical name that is used when referring to the library. The mapping of
a logical library name to a physical storage location is provided externally by the host
environment and is not defined by the language. One possible way of providing this mapping is to
specify it in a special file that the VHDL system can interpret.
6.5 Synthesis
After the VHDL entities are described, the next step is to synthesize them. Synthesis in
the domain of digital design is a process of translation and optimization. For example, layout
synthesis is the process of taking a design netlist and translating it into a form of data that facilitates
placement and routing, resulting in optimized timing and chip design.
Logic synthesis, on the other hand, is the process of taking a form of input, translating it into
an internal form, and then optimizing it in terms of propagation delay and/or area.
After the VHDL code is translated into the internal form, the optimization process can be
performed based on constraints such as speed, area, and power. After the synthesis process
is complete, the whole module is simulated to verify that the behavior description
is correct. The simulation is discussed in the next chapter.
Chapter 7 
System Testing & Evaluation 
7.1 Simulations and Testing
A simulator is nothing more than a program written in some high-level language. The
program is designed to model the features of a computer system that the performance
analyst is interested in studying. Since the simulator is just a program, it can be easily
modified (at least in principle) to change the behavior of the system being studied. For
instance, you may want to change a processor simulator to increase the number of
pipeline stages, or to modify some characteristics of the cache.
While simulators are very flexible, it is impossible to model all of the numerous details 
and complex interactions that occur in real systems. Simplifying assumptions must be 
made to make the simulation tractable. These simplifications limit the accuracy of the 
results of a simulation study compared to the results obtained when measuring real 
systems. Even with their limitations, simulators are very powerful tools in the computer 
systems performance evaluation tool chest. They are particularly useful when trying to
predict the performance of a system that has not yet been built. 
Simulators used in computer systems performance evaluation come in two main varieties. 
Execution-driven simulators actually execute an application program, such as a standard
benchmark program, as they perform the simulation of the system being evaluated. That 
is, the output of the application program being simulated will be the same as if the 
program were executed on some real system. At the same time, the simulator will be 
modeling the behavior of the system being evaluated. The output of the simulator then, is 
the information about the system.
In contrast to execution-driven simulators, trace-driven simulators read and process a list 
of events. The simulation steps performed then depend on the type and sequence of 
events read. A cache simulator, for instance, could be driven with the sequence of 
addresses that were referenced by a processor when executing some benchmark program. 
A list of input events, such as the list of addresses for the cache simulator, can be 
recorded from the execution of an actual system, or from an execution-driven simulator. 
Another alternative is to generate a sequence of random numbers that follow a desired 
probability distribution. The assumption in this latter case is that the distribution of 
random numbers will be similar to the sequence of events that would be observed in a 
real system. 
At this point in the design cycle, the block diagram has been generated and the process
of testing and debugging the design with the VHDL simulator begins. The simulator used is the
PeakFPGA simulator. Note that, in contrast to the conventional approach, there is no need
for a gate-level description yet, and the design is tested at the higher behavioral level called RTL.
The code for the VHDL design and the test patterns, typed in binary and hexadecimal
base, are the input sources for the VHDL simulation tools. In addition, the simulator
has a tool called the script editor that can assist in creating portions of the test code and is
useful in reducing the effort of entering I/O signals and the structure of the
major design blocks. In either case, the simulation process is similar to traditional logic
simulation. The source files are compiled and checked for errors, and are then linked together
with any other blocks that are part of the design. The simulation flow is the same in that
the desired input and output signals can be monitored either as graphical waveforms or as
a tabular listing in order to verify the proper behavior of the circuit. The simulator also
provides the basic capabilities to run the simulation for a short or long
time interval, to step through operations one at a time, and to run to a desired
breakpoint condition.
7.2 Cycle Simulation 
Cycle simulation is a technique for simulating digital circuits that does not take into
account the detailed circuit timing. Rather, cycle simulation computes the steady-state
response of the circuit at each clock cycle boundary. The main benefit of cycle simulation
over event-driven simulation is faster simulation speed, provided the circuit being
simulated has event activity over 15%. Although a design can never be exhaustively
tested by functional verification alone, cycle simulation allows more testing to be done on
larger circuits in the same amount of time. Since only steady-state responses need to be computed for each
clock cycle, it is possible to perform circuit levelization operations at compile time as
opposed to simulation time, which reduces the number of circuit evaluations.
7.3 System Testing 
The system is tested using the simulator, which validates the description of each
module by showing the result corresponding to each given value. The table below shows
the behavior of the state machine process for given input values.
Input Conditions                        Output/States
newFrame = '1' and frameType = '1'      A packet is received from the layer below and
                                        processed by going to the next state, which is
                                        stGetHeaderLen.
frameDataLatch (7 downto 4) = "0100"    The IP version is checked; if it is four, the
                                        packet is accepted and the machine goes to the
                                        next state, which is stGetHeaderByte.
cnt = headerLen and checksum = 0        If the number of bytes received equals the header
                                        length and the header is free from error, the
                                        process continues with receiving the rest of the
                                        packet.
endFrame = '1' and frameValid = '1'     The frame is finished and was valid, so the buffer
                                        data is updated and the machine goes to the next
                                        state, stCompleteFragment.
complete = '0'                          Wait for the RAM write request to be serviced.
Table 7.1 State Machine Process According to Inputs
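The condition checksum = 0 in the table relies on a property of the Internet checksum: when the 16-bit words of a valid header, including the checksum field itself, are added with one's complement arithmetic, the complemented sum is zero. The sketch below demonstrates this property. It is an assumption that the checksum signal in the design holds the complemented sum; the package ip_checksum_pkg, its functions, and the sample header words are illustrative and are not taken from the project or its test data.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package ip_checksum_pkg is
    type word_array is array (natural range <>) of unsigned(15 downto 0);

    -- 16-bit one's complement addition (carry out is added back in)
    function ones_complement_add (a, b : unsigned(15 downto 0)) return unsigned;
    -- complemented one's complement sum over all header words
    function header_checksum (words : word_array) return unsigned;
end package ip_checksum_pkg;

package body ip_checksum_pkg is
    function ones_complement_add (a, b : unsigned(15 downto 0)) return unsigned is
        variable sum17 : unsigned(16 downto 0);   -- 17 bits to hold the carry
    begin
        sum17 := resize(a, 17) + resize(b, 17);
        if sum17(16) = '1' then
            return sum17(15 downto 0) + 1;        -- end-around carry
        else
            return sum17(15 downto 0);
        end if;
    end function ones_complement_add;

    function header_checksum (words : word_array) return unsigned is
        variable acc : unsigned(15 downto 0) := (others => '0');
    begin
        for i in words'range loop
            acc := ones_complement_add(acc, words(i));
        end loop;
        return not acc;
    end function header_checksum;
end package body ip_checksum_pkg;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.ip_checksum_pkg.all;

entity ip_checksum_check is
end entity ip_checksum_check;

architecture sim of ip_checksum_check is
    -- example 20-byte header as ten 16-bit words; word 5 is the checksum field
    constant HEADER : word_array(0 to 9) := (
        x"4500", x"0073", x"0000", x"4000", x"4011",
        x"B861", x"C0A8", x"0001", x"C0A8", x"00C7");
begin
    process
    begin
        assert header_checksum(HEADER) = 0
            report "a valid header should give a zero result" severity error;
        wait;
    end process;
end architecture sim;
```

The 17-bit intermediate sum mirrors the role suggested by the checksumLong and checksumInt declarations in the source listing, where the wider signal holds the plain binary sum and the 16-bit signal holds the one's complement sum.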
The snapshot below shows the output simulation of the receiving Packet module. 
Figure 7.1 A Snapshot of the System Simulation Output
Chapter 8
Discussion
Perceived drawbacks of hardware implementations are inflexibility and an inability to
handle complex tasks. The reconfigurable Field Programmable Gate Arrays (FPGAs)
proposed for this project solve the inflexibility problem.
This device is a compromise between general-purpose processors used in software
implementations at one end of the flexibility-performance spectrum, and Application
Specific Integrated Circuits (ASICs) at the opposite end of this spectrum. FPGAs can be
reprogrammed with updated versions as signaling protocols evolve, while significantly
improving call handling capacities relative to software implementations.
As for the challenge posed by the complexity of signaling protocols, this project implements
the basic and frequently used operations of the protocol in hardware and relegates the
complex and infrequently used operations (for example, processing of optional
parameters) to software. The proposed work item for this
project is the implementation of a typical signaling protocol in FPGAs.
Chapter 9 
Conclusion 
This research is on developing a TCP/IP engine in hardware, which completely off-loads
performance-intensive Internet protocols from processor systems. When compared to
software implementations, which often consume 60-90% of available processor cycles,
the new offload architecture offers cost-effective system advantages.
Implementation of signaling protocols in hardware poses a considerably larger number of
problems than implementing user-plane protocols such as IP. The implementation
demonstrates the hardware handling of IP header processing. Overall, this prototype
implementation of a signaling protocol in FPGA hardware has demonstrated the potential
for a 100x-1000x speedup vis-a-vis software implementations on state-of-the-art
processors.
Appendices A: VHDL Source Code 
--------------------------------------------------------------------------------
-- IP layer for network stack project. This accepts byte-streams of data from
-- the ethernet layer and decodes the IP information to send data to the upper
-- protocols. Reassembly is implemented and two incoming packets can be
-- reassembled at once. Reassembly only works if incoming packets come in
-- order.
--------------------------------------------------------------------------------

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;
use work.global_constants.all;

entity internet is
    port (
        clk: in std_logic; -- clock
        rstn: in std_logic; -- asynchronous active low reset
        complete: in std_logic; -- control signal from ram arbitrator
        newFrame: in std_logic; -- new frame received from the layer below
        frameType: in std_logic; -- frame type = '1' for IP
        newFrameByte: in std_logic; -- signals a new byte in the stream
        frameData: in std_logic_vector (7 downto 0); -- data is streamed in here
        endFrame: in std_logic; -- signals the end of a frame
        frameValid: in std_logic; -- determines validity of frame when endFrame is high
        newDatagram: out std_logic; -- an IP datagram has been fully received
        bufferSelect: out std_logic; -- indicates location of data in RAM
        datagramSize: out std_logic_vector (15 downto 0); -- size of the datagram received
        protocol: out std_logic_vector (7 downto 0); -- protocol type of datagram
        sourceIP: out std_logic_vector (31 downto 0); -- lets upper protocol know the source IP
        wrRAM: out std_logic; -- signal to write to the RAM
        wrData: out std_logic_vector (7 downto 0); -- data to write to the RAM
        wrAddr: out std_logic_vector (18 downto 0); -- address lines to the RAM for writing
        timeLED0: out std_logic; -- indicates if buffer 0 is busy
        timeLED1: out std_logic -- indicates if buffer 1 is busy
    );
end internet;

architecture internet_arch of internet is

-- signal declarations
type STATETYPE is (stIdle, stGetHeaderLen, stGetHeaderByte, stStoreHeaderByte,
    stGetDataByte, stSetupWriteDataByte, stCompleteFragment, stDoWrite,
    stGetNewByte);
signal presState: STATETYPE;
signal nextState: STATETYPE;
signal returnState: STATETYPE; -- Used to return from RAM 'subroutines'

signal headerLen: std_logic_vector (5 downto 0); -- IP datagram header length
signal nextHeaderLen: std_logic_vector (5 downto 0); -- signal for the next header length
signal datagramLen: std_logic_vector (10 downto 0); -- IP datagram total length in bytes
signal nextDatagramLen: std_logic_vector (10 downto 0); -- signal for the next datagram length
signal dataLen: std_logic_vector (10 downto 0); -- IP datagram data length in bytes
signal nextDataLen: std_logic_vector (10 downto 0); -- signal for the next data length

signal incCnt: std_logic; -- increments byte address counter
signal rstCnt: std_logic; -- resets byte address counter
signal cnt: std_logic_vector (10 downto 0); -- byte address counter for the frame received

signal incWrCnt: std_logic; -- increments the write address counter
signal rstWrCnt: std_logic; -- resets the write address counter
signal wrCnt: std_logic_vector (15 downto 0); -- write address counter for storing that data

signal doWrite: std_logic; -- tell RAM controller to write data
signal getNewByte: std_logic; -- wait for new data on the stream

signal latchFrameData: std_logic; -- latch in the data from the stream
signal frameDataLatch: std_logic_vector (7 downto 0); -- register to hold latched data

signal targetIP: std_logic_vector (31 downto 0); -- stores target IP (destination)
signal shiftInTargetIP: std_logic; -- signal to shift in target IP

signal shiftInSourceIP: std_logic; -- signal to shift in source IP
signal latchProtocol: std_logic; -- signal to latch the protocol field

-- checksum signals
signal checkState: std_logic;
constant stMSB: std_logic := '0';
constant stLSB: std_logic := '1';

signal checksumLong: std_logic_vector (16 downto 0); -- stores 2's complement sum
signal checksumInt: std_logic_vector (15 downto 0); -- stores 1's complement sum

signal latchMSB: std_logic_vector (7 downto 0); -- latch in first byte

signal newHeader: std_logic; -- resets checksum
signal newByte: std_logic; -- indicate new byte
signal lastNewByte: std_logic; -- detect changes in newByte
signal inByte: std_logic_vector (7 downto 0); -- byte to calculate
signal checksum: std_logic_vector (15 downto 0); -- current checksum

-- bufferSelect is used both to indicate which area in RAM to write to
-- and to indicate which buffer control signals are to operate on
signal nextBufferSelect: std_logic; -- allows memory of bufferSelect
signal bufferSelectSig: std_logic; -- allows memory of bufferSelect

signal identification: std_logic_vector (15 downto 0); -- identification field
signal shiftInIdentification: std_logic; -- signal to shift in identification
signal fragmentOffset: std_logic_vector (12 downto 0); -- fragment offset field
signal shiftInFragmentOffset: std_logic; -- signal to shift in offset
signal moreFragments: std_logic; -- more fragments flag
signal latchMoreFragments: std_logic; -- signal to determine MF flag

-- The ident signals are of the form "source IP : protocol : identification" and
-- are used in reassembly.
signal targetIdent: std_logic_vector (55 downto 0); -- incoming frame's ident
signal ident0: std_logic_vector (55 downto 0); -- current ident for buffer 0
signal ident1: std_logic_vector (55 downto 0); -- current ident for buffer 1
signal latchIdent: std_logic; -- latch targetIdent into specified buffer ident
signal resetIdent: std_logic; -- clear ident of specified buffer to indicate a vacant buffer

signal position0: std_logic_vector (15 downto 0); -- stores expected offset of next fragment
signal position1: std_logic_vector (15 downto 0); -- stores expected offset of next fragment
signal updatePosition: std_logic; -- add dataLen to current position
signal resetPosition: std_logic; -- set position to be dataLen

constant TIMERWIDTH: INTEGER := 30; -- can be used to vary timeout length
signal timeout0: std_logic_vector (TIMERWIDTH - 1 downto 0); -- timeout counter
signal timeout1: std_logic_vector (TIMERWIDTH - 1 downto 0); -- timeout counter
signal resetTimeout: std_logic; -- start timeout counter
constant FULLTIME: std_logic_vector (TIMERWIDTH - 1 downto 0) := (others => '1'); -- last value of timeout counter

signal sourceIPSig: std_logic_vector (31 downto 0); -- internal signal for output
signal protocolSig: std_logic_vector (7 downto 0); -- internal signal for output

begin
    -- These signals are used instead of buffer ports
    sourceIP <= sourceIPSig;
    protocol <= protocolSig;
    bufferSelect <= bufferSelectSig;

    -- Indicate when buffers are busy
    timeLED0 <= '0' when timeout0 = FULLTIME or ident0 = 0 else '1';
    timeLED1 <= '0' when timeout1 = FULLTIME or ident1 = 0 else '1';

    -- Some definitions to make further code simpler
    targetIdent <= sourceIPSig & protocolSig & identification;
    dataLen <= datagramLen - ("00000" & headerLen);

    -- main clocked process
    process (rstn, clk)
    begin
        if rstn = '0' then -- only need to reset required signals
            presState <= stIdle;
            returnState <= stIdle;
            ident0 <= (others => '0');
            ident1 <= (others => '0');
            timeout0 <= FULLTIME;
            timeout1 <= FULLTIME;
        elsif clk'event and clk = '1' then
            -- Go to next state either directly or via a RAM state.
            -- If a RAM write or a new byte from the data stream are requested,
            -- the state machine stores nextState in returnState and goes to the
            -- required state. After completion, the state machine will go to
            -- returnState. This is like a 'subroutine' in the state machine.
            if doWrite = '1' then
                presState <= stDoWrite;
                returnState <= nextState;
            elsif getNewByte = '1' then
                presState <= stGetNewByte;
                returnState <= nextState;
            else
                presState <= nextState;
            end if;

            -- increment and reset the counter synchronously to avoid race conditions
            if incCnt = '1' then
                cnt <= cnt + 1;
            elsif rstCnt = '1' then
                cnt <= (others => '0');
            end if;

            -- increment and reset the write address counter synchronously
Un
iv
rsi
ty 
of 
Ma
lay
a
• 
• ... Y•.:. ... _ 
ifincWrCnt = 111 then 
wrCnt <= wrCnt + 1 : 
e lsifrstWrCnt = 11 I then 
wrCnt <= (others => '0'); 
end if; 
-- .latch data read from RAM 
i r latchFramcData = '1 I then 
frameDataLatch <= frameData; 
end if; - \ ( 
      -- these signals must remember their values once set
      headerLen <= nextHeaderLen;
      datagramLen <= nextDatagramLen;
      -- shift registers and latches to hold important data
      if shiftInSourceIP = '1' then
        sourceIPSig <= sourceIPSig(23 downto 0) & frameDataLatch;
      end if;
      if shiftInTargetIP = '1' then
        targetIP <= targetIP(23 downto 0) & frameDataLatch;
      end if;
      if latchProtocol = '1' then
        protocolSig <= frameDataLatch;
      end if;
      if shiftInFragmentOffset = '1' then
        fragmentOffset <= fragmentOffset (4 downto 0) & frameDataLatch;
      end if;
      if latchMoreFragments = '1' then
        moreFragments <= frameDataLatch(5);
      end if;
      if shiftInIdentification = '1' then
        identification <= identification (7 downto 0) & frameDataLatch;
      end if;
      -- bufferSelect will remember its previous value
      bufferSelectSig <= nextBufferSelect;
      -- handle timeout counters, resetTimeout will only reset the current buffer
      if resetTimeout = '1' then
        if bufferSelectSig = '0' then
          timeout0 <= (others => '0');
        else
          timeout1 <= (others => '0');
        end if;
      else
        -- increment timeout counters but don't let them overflow
        if timeout0 /= FULLTIME then
          timeout0 <= timeout0 + 1;
        else
          timeout0 <= FULLTIME;
        end if;
        if timeout1 /= FULLTIME then
          timeout1 <= timeout1 + 1;
        else
          timeout1 <= FULLTIME;
        end if;
      end if;
      -- the following signals will operate only on the current buffer which
      -- is chosen with bufferSelect.
      if bufferSelectSig = '0' then
        -- manage the ident register of the buffer
        if latchIdent = '1' then
          ident0 <= targetIdent;
        elsif resetIdent = '1' then
          ident0 <= (others => '0');
        end if;
        -- manage the position register of the buffer
        if resetPosition = '1' then
          position0 <= "00000" & dataLen;
        elsif updatePosition = '1' then
          position0 <= position0 + dataLen;
        end if;
      else
        -- manage the ident register of the buffer
        if latchIdent = '1' then
          ident1 <= targetIdent;
        elsif resetIdent = '1' then
          ident1 <= (others => '0');
        end if;
        -- manage the position register of the buffer
        if resetPosition = '1' then
          position1 <= "00000" & dataLen;
    resetTimeout <= '0';
    case presState is
      when stIdle =>
        -- wait for the arrival of a new frame that has a frameType of 1
        if newFrame = '0' or frameType = '0' then
          nextState <= stIdle;
        else
          -- reset the counters for the next datagram
          rstCnt <= '1';
          rstWrCnt <= '1';
          newHeader <= '1';
          nextState <= stGetHeaderLen;
          -- get header length and version information
          getNewByte <= '1';
        end if;
      when stGetHeaderLen =>
        -- check ip version
        if frameDataLatch (7 downto 4) /= 4 then
          nextState <= stIdle;
        else
          nextState <= stGetHeaderByte;
          -- send data to checksum machine
          inByte <= frameDataLatch;
          newByte <= '1';
          -- get the header length in bytes, rather than 32-bit words
          nextHeaderLen <= frameDataLatch(3 downto 0) & "00";
        end if;
      when stGetHeaderByte =>
        -- if we've finished getting the headers and processing them, start on the data
        -- once finished, refragmenting will come next
        if cnt = headerLen then
          -- only operate on data meant for us, or broadcast data
          if checksum = 0 then
            -- determine which buffer should be used to handle the data
            if ident0 = targetIdent and timeout0 /= FULLTIME then
              -- the ident matches and the timeout counter has not expired
              nextBufferSelect <= '0';
              -- accept the frame if its offset matches what we think it should be
              -- this drops out of order and duplicate frames.
              if position0 = fragmentOffset & "000" then
                nextState <= stGetDataByte;
              else
                nextState <= stIdle;
              end if;
            elsif ident1 = targetIdent and timeout1 /= FULLTIME then
              -- the ident matches and the timeout counter has not expired
              nextBufferSelect <= '1';
              -- accept the frame if its offset matches what we think it should be
              -- this drops out of order and duplicate frames.
              if position1 = fragmentOffset & "000" then
                nextState <= stGetDataByte;
              else
                nextState <= stIdle;
              end if;
            elsif (ident0 = 0 or timeout0 = FULLTIME) and fragmentOffset = 0 then
              -- The ident doesn't match either of the buffers so check if buffer 0
              -- is free. If ident = 0 or the timeout has expired then the buffer is free
              -- This must be the first fragment if it is to go here so also check the offset
              nextState <= stGetDataByte;
              nextBufferSelect <= '0';
            elsif (ident1 = 0 or timeout1 = FULLTIME) and fragmentOffset = 0 then
              -- The ident doesn't match either of the buffers so check if buffer 1
              -- is free. If ident = 0 or the timeout has expired then the buffer is free
              -- This must be the first fragment if it is to go here so also check the offset
              nextState <= stGetDataByte;
              nextBufferSelect <= '1';
            else
              nextState <= stIdle;
            end if;
          else
            -- ignore frame as it wasn't for us
            nextState <= stIdle;
          end if;
        -- otherwise get the next header byte from RAM
        else
          nextState <= stStoreHeaderByte;
          getNewByte <= '1';
        end if;
      when stStoreHeaderByte =>
        nextState <= stGetHeaderByte;
        -- operate on each value of the header received according to count
        -- count will be one higher than the last byte received, as it is incremented
        -- at the same time as the data is streamed in, so
        -- when the data is seen to be available, count should also be one higher
        -- Send data to checksum process
        newByte <= '1';
        inByte <= frameDataLatch;
        -- Operate on data in the header
        case cnt(4 downto 0) is
          when "00011" =>
            nextDatagramLen (10 downto 8) <= frameDataLatch (2 downto 0);
          when "00100" =>
            nextDatagramLen (7 downto 0) <= frameDataLatch;
          when "00101" | "00110" =>
            shiftInIdentification <= '1';
          when "00111" =>
            shiftInFragmentOffset <= '1';
            latchMoreFragments <= '1';
          when "01000" =>
            shiftInFragmentOffset <= '1';
          when "01010" =>
            latchProtocol <= '1';
          when "01101" | "01110" | "01111" | "10000" =>
            shiftInSourceIP <= '1';
          when "10001" | "10010" | "10011" | "10100" =>
            shiftInTargetIP <= '1';
          when others =>
        end case;
      when stGetDataByte =>
        -- if we haven't finished receiving the data, then
        if cnt /= datagramLen then
          nextState <= stSetupWriteDataByte;
          -- read an IP data byte from the data stream
          getNewByte <= '1';
        elsif endFrame = '1' and frameValid = '1' then
          -- this means that the frame is finished and was valid
          -- so update the buffer data and go to final state
          nextState <= stCompleteFragment;
          resetTimeout <= '1'; -- start/restart timer
          latchIdent <= '1'; -- allocate buffer to data
          if fragmentOffset = 0 then -- check if this is the first fragment
            resetPosition <= '1'; -- give position initial value
          else
            updatePosition <= '1'; -- or add to the amount of data stored
          end if;
        elsif endFrame = '1' then
          -- the frame is complete but not valid so ignore it
          nextState <= stIdle;
        else
          -- the frame is not complete so keep looping until it is
          nextState <= stGetDataByte;
        end if;
      when stSetupWriteDataByte =>
        nextState <= stGetDataByte;
        -- Set up to write the byte that was read in stGetDataByte to RAM
        doWrite <= '1';
        wrData <= frameDataLatch;
      when stCompleteFragment =>
        -- Signal the transport protocols if the datagram is finished
        -- or await next frame.
        nextState <= stIdle;
        if moreFragments = '0' then
          -- Last frame so:
          newDatagram <= '1'; -- notify higher protocols it's ready
          resetIdent <= '1'; -- free buffer for next time
          if bufferSelectSig = '0' then -- output datagram size from correct buffer
            datagramSize <= position0;
          else
            datagramSize <= position1;
          end if;
        end if;
      when stDoWrite =>
        -- Wait for RAM write request to be serviced
        if complete = '0' then
          -- keep signals asserted until complete is high
          nextState <= stDoWrite;
          wrRAM <= '1';
          -- The address is based on the fragment offset and buffer
          if bufferSelectSig = '0' then
            wrAddr <= "001" & (wrCnt + (fragmentOffset & "000"));
          else
            wrAddr <= "010" & (wrCnt + (fragmentOffset & "000"));
          end if;
          wrData <= frameDataLatch;
        else
          -- when write is finished, go to returnState
          nextState <= returnState;
          incWrCnt <= '1';
        end if;
      when stGetNewByte =>
        if newFrameByte = '0' then
          -- wait for new byte to arrive
          nextState <= stGetNewByte;
        else
          -- latch new byte and go to returnState
          nextState <= returnState;
          incCnt <= '1';
          latchFrameData <= '1';
        end if;
      when others =>
    end case;
  end process;
  -- Perform 2's complement to one's complement conversion, and invert output
  checksumInt <= checksumLong(15 downto 0) + checksumLong(16);
  checksum <= NOT checksumInt;
  process (clk, rstn)
  begin
    if rstn = '0' then
      checkState <= stMSB;
      latchMSB <= (others => '0');
      checksumLong <= (others => '0');
      lastNewByte <= '0';
    elsif clk'event and clk = '1' then
      -- this is used to check only for positive transitions
      lastNewByte <= newByte;
      case checkState is
        when stMSB =>
          if newHeader = '1' then
            -- reset calculation
            checkState <= stMSB;
            checksumLong <= (others => '0');
          elsif newByte = '1' and lastNewByte = '0' then
            -- latch MSB of 16 bit data
            checkState <= stLSB;
            latchMSB <= inByte;
          else
            checkState <= stMSB;
          end if;
        when stLSB =>
          if newHeader = '1' then
            -- reset calculation
            checkState <= stMSB;
            checksumLong <= (others => '0');
          elsif newByte = '1' and lastNewByte = '0' then
            -- add with 2's complement arithmetic (convert to 1's above)
            checkState <= stMSB;
            checksumLong <= ('0' & checksumInt) + ('0' & latchMSB & inByte);
          else
            checkState <= stLSB;
          end if;
        when others =>
          checkState <= stMSB;
      end case;
    end if;
  end process;
end internet_arch;
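The checksum logic at the end of the listing adds each 16-bit header word with ordinary binary arithmetic, folds the carry out of bit 16 back into the low 16 bits, and inverts the result, so a header whose checksum field is correct should give zero. The simulation-only sketch below works through that arithmetic on one example header. The names checksum_sketch, word_array and ip_checksum, and the sample header words, are assumptions made for illustration and are not part of the listing.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity checksum_sketch is
end entity checksum_sketch;

architecture sim of checksum_sketch is
  type word_array is array (natural range <>) of unsigned(15 downto 0);

  -- Illustrative 20-byte IPv4 header as ten 16-bit words, with the
  -- checksum field (sixth word) already filled in.
  constant HEADER : word_array(0 to 9) := (
    x"4500", x"0073", x"0000", x"4000", x"4011",
    x"B861", x"C0A8", x"0001", x"C0A8", x"00C7");

  -- One's complement sum built from two's complement additions:
  -- add in 17 bits, then fold the carry (bit 16) back into bit 0.
  function ip_checksum (words : word_array) return unsigned is
    variable long : unsigned(16 downto 0) := (others => '0');
    variable acc  : unsigned(15 downto 0) := (others => '0');
  begin
    for i in words'range loop
      long := resize(acc, 17) + resize(words(i), 17);
      acc  := long(15 downto 0);
      if long(16) = '1' then
        acc := acc + 1;  -- end-around carry
      end if;
    end loop;
    return not acc;      -- inverted output, as for the checksum signal
  end function ip_checksum;
begin
  process
  begin
    -- A header with a valid checksum field sums to all ones, so the
    -- inverted result is zero (the "checksum = 0" test in the listing).
    assert ip_checksum(HEADER) = 0
      report "header checksum mismatch" severity failure;
    report "header checksum verified";
    wait;
  end process;
end architecture sim;
```

The function handles whole 16-bit words for brevity, whereas the listing assembles each word from two bytes (latchMSB and inByte) as they arrive.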
Appendices B: PeakFPGA User Manual
PeakFPGA is now a part of nVisage DXP
The powerful and versatile VHDL-based FPGA design entry, simulation and synthesis
solution, PeakFPGA, has now been incorporated into Altium's new multi-dimensional
front-end electronic design tool, nVisage DXP.
nVisage DXP delivers all of the features and functions of PeakFPGA plus a whole lot
more! With nVisage DXP you can capture your design using multiple, integrated design
entry methods. nVisage allows you to freely mix schematic-based circuit design,
schematic-based FPGA design, text-based VHDL and CUPL code, or mixed schematic
and VHDL-driven FPGA design.
Running a Sample Project
This tutorial will introduce you to the features of PeakFPGA by taking you step-by-step
through the simulation and synthesis of a sample project included in the product. The
sample project we've used is a controller for a video frame capture unit that has been
implemented in VHDL as a state machine.
The steps within this tutorial are as follows:
Step 1: Open the sample project
Step 2: Prepare the project for simulation
Step 3: Simulate the project
Step 4: Synthesize to an FPGA
Step 1: Open the sample project
Begin by launching the PeakFPGA application and clicking on the Open Existing
Project toolbar button as shown below:
Navigate to the examples directory and choose the VIDEO.ACC file from the Video
subdirectory as shown below:
When the project is open, you will see the Hierarchy Browser child window, listing the
two VHDL source files associated with this project:
The Hierarchy Browser is the place where you will select files for editing, invoke
processes for simulation and synthesis, and otherwise manage your design files. The
Hierarchy Browser also gives you valuable information about the structure of your
VHDL design, such as the relationship of lower-level VHDL design units (entities,
architectures, components, etc.).
To view the complete hierarchy for your design, for example, you can select the Show
Hierarchy button in the Hierarchy Browser toolbar (or click on the small + icons next to
each file name) to expand the view and see what each design module (VHDL source file)
is composed of:
In this simple project there are two VHDL source files, each of which includes one entity
and one architecture. Notice that the top-most VHDL module (VTEST.VHD) also
includes a component entry that makes reference to an entity in the second module,
VCONTROL. This is how the Hierarchy Browser displays VHDL hierarchy
information, and how it determines file dependencies (and order of compilation) when
processing your designs.
To edit a VHDL source file, simply double-click on the desired module name, or on any
lower-level design unit name (such as an entity or architecture name) in the Hierarchy
Browser. Double-clicking on a name in the Browser invokes the built-in text editor as
shown below:
Note that in this example we have double-clicked on the entry for ARCHITECTURE
STIMULUS, and the editor has jumped to the corresponding section of VHDL code.
Note: PeakFPGA allows you to specify an external text editor to be invoked when project
files are edited. Refer to the PeakFPGA Help files for more information.
Step 2: Prepare the project for simulation
The first step in processing this design is to prepare it for simulation. This involves two
steps: first compiling each of the source files, and then linking the resulting compiled
output files together to create a simulation executable.
PeakFPGA can, if desired, perform these steps automatically (using the dependency
checking features of the Hierarchy Browser) each time you invoke the simulator. For
illustrative purposes, however, we will compile each file in this sample project
individually.
Before we compile this design for simulation, let's first look at the compiler options
available. To see the compile options, select the Options menu or click on the Options
icon (the one that looks like a wrench) to open the Options dialog:
The Compile tab (which is the default tab) of the Options dialog shows the options
available during compilation. These options are documented in the Help system (directly
accessible via that question mark icon to the right). The options we have selected for this
project are:
Bottom-up to selected. This option instructs PeakFPGA to look for any lower-level VHDL
files (those that the current file depends on) and compile them before compiling the
selected file.
Compile only if out of date. This option prevents files from being re-compiled if they
are already compiled and up-to-date. This can be a big time saver for larger designs
consisting of many source files.
Compile into library. This option specifies that the currently highlighted module (the
one being compiled) is to be compiled into a library called WORK. (As specified by the
VHDL language standard, this is the default compile library.)
To compile a VHDL module (in this case the VTEST.VHD module), you first highlight
the module in the Hierarchy Browser (as shown below) and select the Compile button (or
select Compile from the Simulate menu):
When you start the compile process, a transcript window appears and displays status
messages, as well as reporting syntax or other errors found in your source file:
Tip: click on a line with a numbered error message and click Jump to Line or Error
Summary.
There were no errors in this sample source file, so we repeat the process by selecting and
compiling the VTEST.VHD module.
Note: it is not actually necessary to compile every source file individually as we are
doing in this sample. Instead, we could have simply compiled the top-level module,
which in this case is VTEST.VHD. The Hierarchy Browser would have automatically
compiled the lower-level VCONTROL.VHD module first if it were found to be out of
date.
The next step in preparing the project for simulation is to link together the two compiled
modules and create a simulation executable. A simulation executable is a special kind of
executable file that can be executed in PeakFPGA's VHDL simulation and debugging
environment.
As with compiling, there are options available (in the Link tab of the Options dialog) that
control the linking process. For example, the options set are:
Update object files before linking. This option specifies that the Hierarchy Browser will
check to make sure all relevant VHDL modules have been successfully compiled before
starting the link process. When this option is selected, it is not necessary to manually
compile the VHDL source files (as we did in the previous step) before invoking the Link
process.
Link only if out of date. This option (which is similar to the Compile only if out of date
option discussed previously) prevents the link process from being run if the simulation
executable is already up to date as indicated by its time stamp. (If the simulation
executable is newer than all VHDL files and object files that it depends upon, it will not be
re-linked.)
Enable source-level debugging. This option instructs the code generation software (the
portion of the linker that actually generates Windows executable code) to insert
additional information to allow debugging of the design at the source code level.
Simulation configuration. These two text entry fields (which are left blank in this
example) allow you to re-specify the default top-level entity and architecture used for
simulation. This can be a convenience for certain kinds of test benches.
To start the Link process, select the top-level design unit (or the top-level module) and
select the Link button (or select Link from the Simulate menu) as shown below:
The transcript will again appear and the Link process will execute. Errors (if any) will be
reported to the transcript. The project is now compiled, linked and ready for simulation.
Step 3: Add functionality to the new module
The VHDL source file created by the New Module Wizard is not a complete VHDL file.
(PeakFPGA cannot read your mind and know what you want this new module to do, no
matter how descriptive a name you give it.) The next step, then, is to complete the source
file by adding the needed functionality.
At this point it is a good idea to scan through the generated VHDL source file and get a
feel for what has been created. If you examine the file, you will find that it has created a
comment header followed by a few standard library references, which are commented to
help you understand their purpose. After this header you'll see the entity declaration with
the ports that you described using the Wizard. (The entity declaration for this example
was shown in the previous screen image.)
After the entity declaration you will find a template architecture declaration containing
some sample code that you can modify for your particular needs.
Step 4: Compile the VHDL module
Once we have entered the VHDL code and modified the template to our liking, we can
check our work by invoking the simulation compiler. To invoke the compiler, make sure
the appropriate module (at this point there is only one) is selected in the Hierarchy
Browser and select the Compile button from the toolbar.
A Transcript window appears as shown below:
Notice that the compiler has reported an error (an incorrect expression width specified at
line 45 in the file). When errors such as this appear in the transcript, it generally means
that we have made some mistake in entering the VHDL code. Fortunately it is easy to
find such errors.
Notice that the transcript window includes two buttons in addition to the Close button in
the lower part of the dialog. The Jump to Line and Error Summary buttons will
(respectively) open the text editor and take you to the appropriate line in the source file,
and provide you with more detailed error message information and (in some cases)
suggestions on how to resolve the problem.
Step 5: Create a test bench
We now have a completed VHDL module, ready for synthesis into an FPGA (or for use
in a larger hierarchy of VHDL modules). But how do we know that the VHDL we wrote
is functionally correct? The answer, of course, is to simulate it.
Simulation in VHDL, as we saw in the first tutorial, requires that you not only describe
the design (or component of a design) itself, but that you also provide a test bench. A test
bench is a VHDL source file that describes stimulus to be applied to the design, which for
this purpose is often referred to as the unit under test or device under test. Test benches
can be quite simple, applying a sequence of inputs to the unit under test, or can be much
more complex, perhaps reading stimulus information from external files and
automatically comparing simulation results.
Regardless of their complexity, all test benches share some common traits: they all
reference the lower-level design module (the unit under test) as a component, and they all
include some means of providing stimulus to the unit under test.
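As a rough illustration of these two traits, the sketch below pairs a small test bench with a stand-in for the unit under test. The entity shift8_sketch, its ports (clk, load, dir, data_in, data_out) and the stimulus values are assumptions made for this example; they are not the tutorial's shifter module or the template produced by the Test Bench Wizard.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical unit under test: an 8-bit rotating shifter.
entity shift8_sketch is
  port (
    clk      : in  std_logic;
    load     : in  std_logic;                     -- '1' loads data_in
    dir      : in  std_logic;                     -- '0' = left, '1' = right
    data_in  : in  std_logic_vector(7 downto 0);
    data_out : out std_logic_vector(7 downto 0));
end entity shift8_sketch;

architecture rtl of shift8_sketch is
  signal q : std_logic_vector(7 downto 0);
begin
  data_out <= q;
  process (clk)
  begin
    if rising_edge(clk) then
      if load = '1' then
        q <= data_in;
      elsif dir = '0' then
        q <= q(6 downto 0) & q(7);   -- rotate left
      else
        q <= q(0) & q(7 downto 1);   -- rotate right
      end if;
    end if;
  end process;
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;

-- The test bench has no ports; it only drives and observes the unit under test.
entity test_shift8_sketch is
end entity test_shift8_sketch;

architecture stimulus of test_shift8_sketch is
  -- Trait 1: the unit under test is referenced as a component.
  component shift8_sketch
    port (
      clk      : in  std_logic;
      load     : in  std_logic;
      dir      : in  std_logic;
      data_in  : in  std_logic_vector(7 downto 0);
      data_out : out std_logic_vector(7 downto 0));
  end component;

  signal clk      : std_logic := '0';
  signal load     : std_logic := '0';
  signal dir      : std_logic := '0';
  signal data_in  : std_logic_vector(7 downto 0) := (others => '0');
  signal data_out : std_logic_vector(7 downto 0);
  signal done     : boolean := false;
begin
  uut : shift8_sketch
    port map (clk => clk, load => load, dir => dir,
              data_in => data_in, data_out => data_out);

  -- 20 ns clock that stops once the stimulus process has finished.
  clock : process
  begin
    while not done loop
      clk <= '0'; wait for 10 ns;
      clk <= '1'; wait for 10 ns;
    end loop;
    wait;
  end process clock;

  -- Trait 2: a process that applies stimulus and checks the responses.
  stim : process
  begin
    data_in <= "10000001"; load <= '1';
    wait until rising_edge(clk); wait for 1 ns;
    load <= '0'; dir <= '0';
    wait until rising_edge(clk); wait for 1 ns;
    assert data_out = "00000011" report "rotate left failed" severity error;
    dir <= '1';
    wait until rising_edge(clk); wait for 1 ns;
    assert data_out = "10000001" report "rotate right failed" severity error;
    done <= true;
    wait;
  end process stim;
end architecture stimulus;
```

Even for this small design the test bench is longer than the unit it exercises, mostly because of the signal and component declarations.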
In general, you can expect your test benches to be similar in size and complexity to the
actual design being verified. In fact, for many designs (including our simple shifter) the
test bench can be significantly greater in size due to the overhead of declaring signals,
creating the component declaration and component instance, etc. PeakFPGA's Test Bench
Wizard saves you time by automatically generating much of this overhead code.
To start the Test Bench Wizard, first highlight the module that will be tested (in our case
SHIFT8.VHD), then click the Create New Module toolbar button as shown:
As in the earlier step, the New Module dialog appears with the three Wizard buttons. This
time, select the Test Bench Wizard button.
You'll see right away that this Wizard looks very much like the previous Wizard. The
only difference this time is that the fields for the Entity name, Architecture name, and
the Port declarations are already filled out for you, based on the port declarations found
in the already-generated VHDL module. (If the port declarations list is empty, you may
have neglected to select an existing VHDL module first, or the selected module does not
have port declarations in a form recognizable to the Wizard. In the latter situation you
may need to enter the port list yourself, or paste it in from the clipboard after copying it
from the original lower-level file.)
Here is the Test Bench Wizard dialog for our design:
At this point, all we need to do is quickly verify that the port list is intact (has the ports
that we defined in the original module) and click the Create button.
The Wizard prompts for a VHDL file name, and supplies a default name. As before,
we can simply accept this name and allow the file to be added to the project.
When the test bench template has been added to the project, we need to click the Rebuild
Hierarchy button (described earlier) to establish the hierarchy information in the
Hierarchy Browser. Once we have done that, we can modify the test bench template to
complete the stimulus and add any other verification-related VHDL source file
statements.
Step 6: Simulate the design
Now that we have the test bench we are ready to simulate the design. First we compile
the test bench by highlighting the TEST_SHIFT8.VHD module and clicking the
Compile button. If we are lucky enough to have no VHDL coding errors, our transcript
looks like this:
Next we link the project by again highlighting the test bench module
(TEST_SHIFT8.VHD) and clicking the Link button:
Linking the two compiled modules together provides us (behind the scenes) with a file
named TEST_SHIFT8.VX. This is the simulation executable, and it is this file that the
VHDL simulator will load and execute when we select the Load button.
Before selecting Load, however, let's take a moment to examine and modify the
simulation options. We open the Options dialog and click on the Simulate tab. In the
Simulate Options dialog, we'll set the Vector display format to binary (to more clearly
see the shifter behavior) and set the Run to time to 500 ns as shown below:
Now, after clicking the Close button to dismiss the Options dialog, we can select the
Load selected simulation button (first making sure that the test bench is the highlighted
module) to start the simulator:
As in the first tutorial, the Select Display Objects dialog appears. We will simply use the
Add Primaries button to make all top-level signals in the design available for display.
We'll also spend a moment arranging the objects to place the signals of interest in a more
useful vertical display order:
When the signals are selected and ordered to our liking, we can click the Close button to
dismiss the selection dialog.
Click the Go button to run the simulation. Notice that the simulation results show us that
the shifter appears to be working properly, wrapping the left-most or right-most bit
(depending on direction) to the opposite side of the collection of bits while shifting them
accordingly:
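The wrap-around behaviour described here is a rotate rather than a plain shift: the bit that leaves one end re-enters at the other. A minimal simulation-only sketch, assuming an 8-bit value and using the standard rotate_left and rotate_right functions from ieee.numeric_std, is given below. The entity name rotate_sketch and the test values are illustrative and do not come from the tutorial's shifter source.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity rotate_sketch is
end entity rotate_sketch;

architecture sim of rotate_sketch is
begin
  process
    variable v : unsigned(7 downto 0) := "10000001";
  begin
    -- Rotating left by one moves the left-most '1' around to bit 0.
    v := rotate_left(v, 1);
    assert v = "00000011" report "rotate_left check failed" severity failure;

    -- Rotating right by two moves the two right-most bits to bits 7 and 6.
    v := rotate_right(v, 2);
    assert v = "11000000" report "rotate_right check failed" severity failure;

    report "rotate checks passed";
    wait;
  end process;
end architecture sim;
```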
So far so good! We've created a new module, and we've verified that it works as
expected. Now we can synthesize the module to create an FPGA compatible netlist.
Step 7: Synthesize the design
Synthesis is a straightforward process, particularly when dealing with a single VHDL
source file. For this example we'll select the Altera family of devices. Notice when we
select Altera there are fewer synthesis options:
The specific set of options available for synthesis is determined by the device family that
has been selected. For your reference, all of the synthesis options are documented in the
on-line help. Simply click the Help button in the dialog.
After choosing a target FPGA family, setting relevant options and closing the Options
dialog, we start the synthesis process by highlighting the SHIFT8.VHD module and
selecting the Synthesize toolbar button as shown below:
When synthesis is complete, we can review the generated transcript:
When synthesis is complete and you have an FPGA netlist, you can then move on to
place-and-route using the FPGA place-and-route tools provided by your FPGA vendor.
Appendices C: Main Reference 
Embedded Internet Solutions from ADESCOM 
IPAC - The Internet Protocol in Hardware 
for System-on-Chip Applications 
Systems today which require [...] Internet connectivity make use of a 32 bit processor core and 
implement the TCP/IP protocol stack in software. This realization, however, often results in strong 
processor performance requirements and keeps [...] Internet applications [...]. 
IPAC-E100 
IPAC-E100 is the first member of ADESCOM's family of embedded Internet soft-macros around the IPAC 
core. Its integrated Ethernet MAC provides an MII interface to a 10/100 Ethernet transceiver. To the 
system side, IPAC-E100 offers an Intel/Motorola compliant processor bus with DMA support and an 
optional, programmable synchronous interface for direct connectivity to audio/video and data streams. 
IPAC-E100 performs all protocol functions of TCP/IP and UDP/IP connections for sustained bit rates of 
up to 100 Mbps independent of packet payload sizes and other connection parameters. A scalable memory 
architecture offers connection data buffers with programmable thresholds for variable interrupt latency 
and allows for simultaneous operation of up to 64k connections. Alternatively, an optional SRAM 
controller provides a high speed interface to external memory. 
IP connection set up and management is made easy by a set of user registers of IPAC's comprehensive 
control unit, which fully supports IP management protocols and allows for system configuration. Its 
auto-configuration and remote management capability enable IPAC-E100 to provide IP connectivity even 
without any external processor interaction, which makes it also an ideal Internet access solution for 
existing applications. 
[Block diagram of IPAC-E100: IPAC core, memory, MII interface, SRAM controller and SRAM interface.] 
• 100 Mb/s throughput for all packet sizes 
• Support of up to 64k connections 
• Complete TCP/IP solution 
  - IPv4 
  - TCP, UDP, raw IP 
  - ICMP 
  - client DHCP and DNS 
  - ARP 
• IP multicast and IGMP 
• Integrated 10/100 Ethernet MAC with MII interface 
• Connection data buffers with programmable thresholds 
• Scalable buffer memory 
• SRAM controller for external memory 
• 8/16/32 bit processor interface with DMA support 
• Stand-alone capability 
  - direct media interface 
  - auto-configuration 
  - remote management 
• Fully synthesizable soft-macro 
  - VHDL RTL source code 
  - synchronous design 
• Also available as pre-configured FPGA macro for time-to-market solutions 
ADESCOM 
A hardware implementation of a signaling protocol 
Haobo Wang, Malathi Veeraraghavan and Ramesh Karri* 
Polytechnic University, New York 
ABSTRACT 
Signaling protocols in switches are primarily implemented in software for two important reasons. First, signaling 
protocols are quite complex with many messages, parameters and procedures. Second, signaling protocols are updated 
often, requiring a certain amount of flexibility for upgrading field implementations. While these are two good reasons for 
implementing signaling protocols in software, there is an associated performance penalty. Even with state-of-the-art 
processors, software implementations of signaling protocols are rarely capable of handling over 1000 calls/sec. 
Correspondingly, call setup delays per switch are in the order of milliseconds. Towards improving performance, we 
implemented a signaling protocol in reconfigurable hardware. Our implementation demonstrates the feasibility of 
100x-1000x speedup vis-à-vis software implementations on state-of-the-art processors. The impact of this work can be 
quite far-reaching by allowing connection-oriented networks to support a variety of new applications, even those with 
short call holding times. 
Keywords: Hardware, Signaling protocols, VHDL, FPGA, SONET/SDH, GMPLS 
1. INTRODUCTION 
Signaling protocols are used in connection-oriented networks primarily to set up and release connections. Examples of 
signaling protocols include Signaling System 7 (SS7) in telephony networks [1], User Network Interface (UNI) and Private 
Network Network Interface (PNNI) signaling protocols in Asynchronous Transfer Mode (ATM) networks [2, 3], Label 
Distribution Protocol (LDP) [4], Constraint-based Routing LDP (CR-LDP) [5] and Resource reServation Protocol (RSVP) [6] in 
Multi-Protocol Label Switched (MPLS) networks, and the extension of these protocols for Generalized MPLS 
(GMPLS) [7-11], which supports Synchronous Optical Network (SONET), Synchronous Digital Hierarchy (SDH) and Dense 
Wavelength Division Multiplexed (DWDM) networks. 
Signaling protocols are implemented in the end devices that request the setup and release of connections as well as in the 
switches of connection-oriented networks. These switches could be circuit-switched, e.g., telephony switches, 
SONET/SDH switches, DWDM switches, or packet-switched, e.g., MPLS switches, ATM switches, X.25 switches. The 
end devices requesting the setup/release of connections could be end hosts, e.g., PCs, workstations, or other network 
switches with interfaces into a connection-oriented network, e.g., Ethernet switches or IP routers with an ATM interface. 
Signaling protocol implementations in switches are primarily done in software. There are two important reasons for this 
choice. First, signaling protocols are quite complex with many messages, parameters and procedures. Second, signaling 
protocols are updated often, requiring a certain amount of flexibility for upgrading field implementations. While these are 
two good reasons for implementing signaling protocols in software, the price paid is performance. Even with the latest 
processors, signaling protocol implementations are rarely capable of handling over 1000 calls/sec. Correspondingly, call 
setup delays per switch are in the order of milliseconds [12]. 
Towards improving performance, we undertook a hardware implementation of a signaling protocol. We used 
reconfigurable hardware, i.e., Field Programmable Gate Arrays (FPGAs) [13, 14], to solve the inflexibility problem. These 
devices are a compromise between general-purpose processors used in software implementations at one end of the 
flexibility-performance spectrum, and Application Specific Integrated Circuits (ASICs) at the opposite end of this 
spectrum. FPGAs can be reprogrammed with updated versions as signaling protocols evolve while significantly 
improving the call handling capacities relative to software implementation. As for the challenge posed by the complexity 
of signaling protocols, our approach is to only implement the basic and frequently used operations of the signaling 
protocol in hardware, and relegate the complex and infrequently used operations (for example, processing of optional 
parameters, error handling, etc.) to software. 
* Phone (718) 260-3384; phone (718) 260-3493; phone (718) 260-3596; fax (718) 260-3740; 
Polytechnic University, 6 Metrotech Center, NY, USA 11201 
We modeled the signaling protocol in VHDL* and then mapped it onto two FPGAs on the WILDFORCE™ reconfigurable 
board: a Xilinx® XC4036XLA FPGA with 62% resource utilization and an XC4013XLA with 8% resource utilization. 
From the timing simulations, we determined that a call can be processed in 6.6 µs assuming a 25 MHz clock (this includes 
the processing time for four signaling messages, Setup**, Setup-Success, Release, and Release-Confirm), yielding a call 
handling capacity of 150,000 calls/sec. Optimizing this implementation will reduce the protocol processing time even 
further. 
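The relation between the quoted processing time, clock rate and call handling capacity can be checked with a few lines of arithmetic. The sketch below is only an illustration; the function names are hypothetical, and it assumes calls are processed back to back with no idle time.

```python
def call_capacity(call_time_s):
    """Calls per second if calls are processed back to back."""
    return 1.0 / call_time_s

def cycles_per_call(call_time_s, clock_hz):
    """Clock cycles spent on one call at the given clock frequency."""
    return call_time_s * clock_hz

CALL_TIME_S = 6.6e-6   # per-call processing time quoted from the timing simulation
CLOCK_HZ = 25e6        # clock frequency assumed in the text

print(round(call_capacity(CALL_TIME_S)))               # roughly 1.5e5 calls/sec
print(round(cycles_per_call(CALL_TIME_S, CLOCK_HZ)))   # 165 clock cycles
```

Under these assumptions, the four messages of one call fit in about 165 clock cycles, which agrees with the quoted capacity of roughly 150,000 calls/sec.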
The impact of this work is quite far-reaching. By decreasing call processing delays, it becomes conceivable to set up and 
tear down calls more often, leading to a finer granularity of resource sharing and hence better utilization. For example, if a 
SONET circuit is set up and held for a long duration, given that data traffic using the SONET circuit is bursty, the circuit 
utilization can be low. However, if fast call setup/teardown is possible, circuits can be dynamically allocated and held for 
short durations, leading to improved utilization. 
Section 2 presents background material on connection setup and teardown procedures and surveys prior work on this 
topic. Section 3 describes the signaling protocol we implemented in hardware. Section 4 describes our FPGA 
implementation while Section 5 summarizes our conclusions. 
2. BACKGROUND AND PRIOR WORK 
In this section, as background material, we provide a brief review of connection setup and release. We also describe prior 
work on this topic. 
2.1 Background 
An end device that needs to communicate with another end device initiates connection setup. When the ingress switch 
(e.g., switch SW1 in Figure 1) receives such a request, it uses the destination address carried in the Setup message to 
determine the next-hop switch toward which it should route the connection. This task can be accomplished in different 
ways. It could be a simple routing table lookup if the routing table is pre-computed. Routing table pre-computation could 
be done either by a centralized network management station and downloaded to all switches or by a routing process 
implemented within each switch that processes distributed routing protocol messages and then executes a shortest-path 
algorithm, such as Bellman-Ford or Dijkstra's [15]. Alternately, the signaling protocol processor could perform an 
on-the-fly route computation upon receipt of a Setup message. Typically switches use a combination of pre-computed route 
lookups and on-the-fly computation if no pre-computed route exists to meet the requirements of the connection. 
[Figure 1 label: Connection (circuit or virtual circuit) established] 
Figure 1: Illustration of connection setup 
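The combination of pre-computed lookup and on-the-fly computation described above can be sketched as follows. This is a simplified illustration, not the implementation used in the paper: the table and graph layouts, the function names and the switch names are all hypothetical, and "no route meets the requirements" is reduced to "no table entry exists".

```python
import heapq

def dijkstra_next_hop(graph, source, destination):
    """Return the first hop on a shortest path, or None if unreachable."""
    # Each heap entry: (cost so far, node, first hop taken from the source).
    heap = [(0, source, None)]
    visited = set()
    while heap:
        cost, node, first_hop = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == destination:
            return first_hop
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited:
                hop = neighbor if first_hop is None else first_hop
                heapq.heappush(heap, (cost + weight, neighbor, hop))
    return None

def next_hop(routing_table, graph, switch, destination):
    """Try the pre-computed table first, then fall back to on-the-fly computation."""
    hop = routing_table.get(destination)
    if hop is None:
        hop = dijkstra_next_hop(graph, switch, destination)
    return hop

# Hypothetical three-switch topology with link costs.
graph = {"SW1": {"SW2": 1, "SW3": 4}, "SW2": {"SW3": 1}, "SW3": {}}
routing_table = {"SW2": "SW2"}  # only one pre-computed entry

print(next_hop(routing_table, graph, "SW1", "SW2"))  # table hit
print(next_hop(routing_table, graph, "SW1", "SW3"))  # computed on the fly
```

A real switch would also check the candidate route against the bandwidth and other requirements of the connection before accepting it.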
After determining the next-hop switch toward which the connection should be routed, each switch performs the following 
four steps: 
1. Check for availability of required resources (link capacity and optionally buffer space) and reserve them. 
2. Assign "labels" for the connection. The exact form of the "label" is dependent on the type of connection-oriented 
network in question. For example, in SONET/SDH switches, the label identifies a time slot, while in ATM 
networks, it is a Virtual Path Identifier/Virtual Channel Identifier (VPI/VCI) pair. 
* VHDL stands for VHSIC Hardware Description Language, where VHSIC stands for Very High Speed Integrated Circuits. 
** Here we use a generic name for the message, i.e., Setup. Different signaling protocols call this message by different names, e.g., Label request 
message in LDP. 
3. Program the switch fabric to map incoming labels to outgoing labels. This will allow user data bits flowing on 
the connection after it is set up to be forwarded through the switch fabric based on these configurations. We refer 
to this configuration information as a Switch-Mapping table. 
4. Set control parameters for scheduling and other run-time algorithms. For example, in packet switched networks, 
if weighted fair queueing is used in the switch fabric to schedule packets, the computed equivalent capacity and 
buffer space allocated for this connection are used to program the scheduler. Even in circuit-switched networks, 
such as a SONET network, there could be certain parameters. An example is the transparency requirement for 
how the SONET switch handles bytes in the overhead portions of the incoming and outgoing signals. 
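The four steps above can be sketched in software as a single function. This is a minimal sketch under stated assumptions: the dictionary layout, the function name and the label representation are hypothetical, and rejection handling is left out.

```python
def process_setup(switch, next_hop, bandwidth, in_label):
    """Apply the four per-switch setup steps; return the outgoing label or None."""
    link = switch["links"][next_hop]

    # Step 1: check resource availability and reserve link capacity.
    if link["available_bw"] < bandwidth or not link["free_labels"]:
        return None  # rejection handling is outside this sketch
    link["available_bw"] -= bandwidth

    # Step 2: assign a label for the connection on the outgoing link.
    out_label = link["free_labels"].pop(0)

    # Step 3: program the switch fabric (one Switch-Mapping table entry).
    switch["mapping"][in_label] = (next_hop, out_label)

    # Step 4: record control parameters for run-time algorithms.
    switch["scheduler"][(next_hop, out_label)] = bandwidth
    return out_label

# Hypothetical switch state with a single outgoing link.
switch = {
    "links": {"SW2": {"available_bw": 10, "free_labels": [0, 1, 2]}},
    "mapping": {},
    "scheduler": {},
}
label = process_setup(switch, "SW2", bandwidth=4, in_label=("IF0", 7))
print(label, switch["links"]["SW2"]["available_bw"])
```

All four steps run here as the Setup message moves forward; as the text goes on to note, only the first step has to happen in the forward direction.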
In a classical connection setup procedure as illustrated in Figure 1, the setup progresses from the calling end device 
toward the called end device, and the success indication messages travel in the reverse direction. In this scenario, the first 
step should be performed in the forward direction so that resources are reserved as the setup proceeds, but the last three 
steps could be performed as signaling proceeds in the forward direction or in the reverse direction. Other variants of this 
procedure are possible, such as reverse direction resource reservation [6]. 
After connection setup, user-plane data arriving at a switch is forwarded by the switch hardware according to the Switch-
Mapping table. Upon completion of data exchange, the connection is released with a similar end-to-end release procedure. 
Typically release messages are also confirmed. Switches processing the release messages free up bandwidth, optionally 
buffer, and label resources for usage by the next connection. 
To support the above-described connection setup and release procedures, signaling messages with parameters in each 
message, some mandatory and some optional, are defined in a typical signaling protocol. In addition, other messages to 
support notifications, keep-alive exchanges, etc. are also present in signaling protocols. 
With regards to implementation, we illustrate the internal architecture of a switch (unfolded view) in a connection-
oriented network in Figure 2. The user-plane hardware consists of a switch fabric and line cards that terminate interfaces 
carrying user data. In packet switches, the line cards perform network-layer protocol processing to determine how to 
forward packets. In circuit switches, the line cards are typically multiplexers/demultiplexers. The control-plane unit 
consists of a signaling protocol engine, which could have a hardware accelerator as we are proposing, or be completely 
implemented in the software resident on the microprocessor. The routing process handles routing protocol messages and 
manages routing tables. Network Interface Cards (NICs) are shown in the control-plane unit. These cards are used to 
process the lower layers of the signaling protocols on which the signaling messages are carried. For example, in SS7 
networks, the NICs process the Message Transfer Part (MTP) layers, which are the lower layers of the SS7 protocol stack. 
In optical networks, the expectation is that an out-of-band IP network will be used to carry signaling messages between 
switches. In this case, the NICs may be Ethernet cards. It is also possible to carry the signaling messages on the same 
interface as the user data. An example occurs in ATM networks where signaling messages are carried on VCI 5 within 
interfaces that carry user data on other virtual channels. Management-plane processing is omitted from this figure, e.g., 
Management Information Bases (MIBs), agents, etc. Also, all the software processes required for initialization, 
maintenance of the switch, error handling, etc., and various other details are not shown. 
[Figure 2 labels: input signaling interfaces, input interfaces, switch fabric, output interfaces] 
Figure 2: Unfolded view of a switch 
We note that the signaling hardware accelerator unit shown in Figure 2 is part of our proposal and not typical in current-
day switches. The illustration in Figure 2 shows that the processing of signaling messages is comparable to packet 
processing in a packet router, where a Setup message comes in on one interface and is "forwarded" on another interface; 
in reality, many actions are performed on the Setup message, which makes the signaling protocol engine more complex 
than a simple router. 
2.2 Prior work 
There are many signaling protocols as listed in Section 1. In addition, many other signaling protocols have also been 
proposed in the literature [16-25]. Some of these protocols such as Fast Reservation Protocol (FRP) [25], fast reservation 
schemes [18, 19], YESSIR [16], UNITE [20] and PCC [21] have been designed to achieve low call setup delays by improving the 
signaling protocols themselves. FRP is the only signaling protocol that has been implemented in ASIC hardware. Such an 
ASIC implementation is inflexible because upgrading the signaling protocol implementation entails a complete redesign 
of the ASIC. More recently, Molinero-Fernandez and McKeown [26] are implementing a technique called TCP Switching in 
which the TCP SYNchronize segment is used to trigger connection setup and TCP FINish segment is used to trigger 
release. By processing these inside switches, it becomes comparable to a signaling protocol for connection setup/release. 
They are implementing this technique in FPGAs. 
3. SIGNALING PROTOCOL 
In this section, we describe the signaling protocol that we implemented in hardware. It is not a complete signaling 
protocol specification because our assumption is that all aspects of the signaling protocol other than those described 
below will be implemented in the software signaling process shown in Figure 2. Therefore, often in this description, we 
will leave out details that are handled by the software. 
3.1 Signaling messages 
We defined a set of four signaling messages, Setup, Setup-Success, Release, and Release-Confirm. Figure 3 illustrates the 
detailed fields of these four messages. 
Bit 31 ... 16 15 ... Bit 0 
Setup message: 
  Message Length | TTL | Msg Type (0001) | Connection Reference (prev.) 
  Destination IP Address 
  Source IP Address 
  Previous Node's IP Address 
  Bandwidth | Reserved | Interface Number | Timeslot Number 
  Pad Bits | Checksum 
Setup-Success message: 
  Message Length | Bandwidth | Msg Type (0010) | Connection Reference (prev.) 
  Connection Reference (own) | Reserved | Checksum 
Release / Release-Confirm message: 
  Message Length | Cause | Msg Type (0011/0100) | Connection Reference (prev.) 
  Connection Reference (own) | Reserved | Checksum 
Figure 3: Signaling messages 
The Setup message is of variable length while the other three messages are of fixed length. The Message Length field 
specifies the length of the message. The Time-to-Live (TTL) field is used to avoid routing loops. It is initialized by the 
sender to some value and decremented by every switch along the end-to-end path. If the value reaches 0, a TTL expired 
error is recognized; error handling is in the part of the protocol implemented in software. The Message Type field is used 
to distinguish the different messages. The Connection Reference is used to identify a connection locally. The Source IP 
Address and Destination IP Address specify the end hosts of the connection. The Previous Node's IP Address specifies 
the previous node along the connection. The reason we included this field is that the lower layers of the protocol on 
which these signaling messages are carried may not indicate the sender of the message, but a switch would need to know 
the downstream switch's identity in order to process the Setup. The Bandwidth field specifies the bandwidth requirement 
of the connection. The Interface/timeslot pairs are used to identify the "labels" assigned to the connection, which are used 
to program the switch fabric. Since there may be an odd number of interface/timeslot pairs, a 16-bit Pad Bits field is used 
to make all messages 32-bit aligned. The Checksum field covers the whole message. 
In the Setup-Success message, the Bandwidth field records the allocated bandwidth. In the Release and Release-Confirm 
messages, the Cause field explains the reason for release. Some fields are common to all messages, such as Message 
Length, Message Type and Connection Reference. These fields are in the same relative position for all messages. Such an 
arrangement simplifies hardware design. 
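Because the common fields sit at the same position in every message, the first 32-bit word can be decoded with fixed shifts and masks. The sketch below illustrates this under assumed field widths: the 4-bit Message Type codes follow Figure 3 and the 12-bit Connection Reference follows Section 4.4, but the 8-bit widths for Message Length and TTL are assumptions, and the function names are hypothetical.

```python
MSG_TYPES = {
    0b0001: "Setup",
    0b0010: "Setup-Success",
    0b0011: "Release",
    0b0100: "Release-Confirm",
}

def pack_header(length, ttl, msg_type, conn_ref):
    """Pack the first 32-bit word: length(8) | ttl(8) | type(4) | conn_ref(12)."""
    if not (0 <= length < 256 and 0 <= ttl < 256):
        raise ValueError("length and ttl must fit in 8 bits (assumed width)")
    if not (0 <= msg_type < 16 and 0 <= conn_ref < 4096):
        raise ValueError("msg_type must fit in 4 bits, conn_ref in 12 bits")
    return (length << 24) | (ttl << 16) | (msg_type << 12) | conn_ref

def unpack_header(word):
    """Extract the common fields with fixed shifts and masks."""
    return {
        "length": (word >> 24) & 0xFF,
        "ttl": (word >> 16) & 0xFF,
        "msg_type": (word >> 12) & 0xF,
        "conn_ref": word & 0xFFF,
    }

def forward_ttl(word):
    """Decrement the TTL; return None if it reaches 0 (TTL expired)."""
    fields = unpack_header(word)
    if fields["ttl"] <= 1:
        return None  # error handling belongs to the software part
    return pack_header(fields["length"], fields["ttl"] - 1,
                       fields["msg_type"], fields["conn_ref"])

word = pack_header(length=7, ttl=2, msg_type=0b0001, conn_ref=12)
print(MSG_TYPES[unpack_header(word)["msg_type"]], unpack_header(word))
```

In hardware the same extraction is pure wiring, with no parsing loop, which is one way to read the remark that the fixed arrangement simplifies the design.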
3.2 State transition diagram for a connection at a switch 
[Figure 4 transition labels: Setup Received, Setup-Success Received, Release Received, Release-Confirm Received] 
Figure 4: State transition diagram 
In connection-oriented networks, each connection goes through a certain sequence of states at each switch. The state of 
each connection must be maintained at each switch. In our protocol, we define four states, Setup-Sent, Established, 
Release-Sent and Closed. Figure 4 shows the state transition diagram of a connection at a switch. Initially, the connection 
is in the Closed state. When a switch accepts a connection request, it allocates a connection reference to identify the 
connection, reserves the necessary resources including the labels, programs the switch fabric, and marks the state associated 
with the connection as Setup-Sent after sending the Setup message to the next switch on the path. When the switch 
receives a Setup-Success message for a particular connection, which means all switches along the path have successfully 
established the connection, then the state of the connection is changed to Established. Release-Sent means the switch has 
received the Release message, freed the allocated resources, and sent the outgoing Release message to the next node. 
When the switch receives the Release-Confirm message, the connection is successfully terminated, and the state of the 
connection returns to Closed. 
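The four states and their transitions can be written down as a small lookup table. This sketch only mirrors the transitions named in the text; raising an exception for an unexpected message is an assumption standing in for the hand-off to software, and the function name is hypothetical.

```python
# (current state, received message) -> next state, as described in the text.
TRANSITIONS = {
    ("Closed", "Setup"): "Setup-Sent",
    ("Setup-Sent", "Setup-Success"): "Established",
    ("Established", "Release"): "Release-Sent",
    ("Release-Sent", "Release-Confirm"): "Closed",
}

def next_state(state, message):
    """Return the next connection state, or raise if the message is unexpected."""
    try:
        return TRANSITIONS[(state, message)]
    except KeyError:
        # Assumption: unexpected messages are handed to the software path.
        raise ValueError(f"unexpected {message!r} in state {state!r}")

state = "Closed"
for message in ["Setup", "Setup-Success", "Release", "Release-Confirm"]:
    state = next_state(state, message)
    print(message, "->", state)
```

A complete call walks through all four transitions and ends in Closed again.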
3.3 Data tables 
Routing table (Index | Return value): 
  Destination address | Next node address | Next node interface# 
CAC table (Index | Return/Written value): 
  Next node address | Total bandwidth | Available bandwidth 
Connectivity table (Index | Return value): 
  Neighbor address | Neighbor interface# | Own interface# 
State table (Index | Return/Written value): 
  Own connection reference | Connection reference (Previous, Next) | State | Bandwidth | Node address (Previous, Next) 
Switch mapping table (Index | Return/Written value): 
  Own connection reference | Sequential offset (0 to BW-1) | Incoming Ch. ID (Interface#, Timeslot#) | Outgoing Ch. ID (Interface#, Timeslot#) 
Figure 5: Data tables used by the signaling protocol 
There are five tables associated with the signaling protocol, namely, Routing table, Connection Admission Control (CAC) 
table, Connectivity table, State table, and Switch-Mapping table, shown in Figure 5. The Routing table is used to 
determine the next-hop switch. The index is the destination address; the fields include the address of the next switch and 
the corresponding output interface. The CAC table maintains the available bandwidth on the interfaces leading to 
neighboring switches. The Connectivity table is used to map the interface numbers used at neighboring switches to local 
interface numbers. This information will be used to program the switch fabric. 
The State table maintains the state information associated with each connection. The connection reference is the index 
into the table. The fields include the connection references and addresses of the previous and next switches, the 
bandwidth allocated for the connection, and most importantly, the state information as defined in Figure 4. 
Switch fabrics, such as PMC-Sierra's PM5372, Agere's TDCS6440G and Vitesse's VSC9182, have similar programming 
interfaces. For example, VSC9182 has an 11-bit address bus A[10:0] and a 10-bit data bus D[9:0]. The switch is 
programmed by presenting the output interface/timeslot number on A[10:0] and the input interface/timeslot number on 
D[9:0]. We define a generic Switch-Mapping table to emulate this programming interface, with the connection reference 
as the index, and the incoming interface/timeslot pair and the outgoing interface/timeslot pair as the fields. 
3.4 Discussion 
Aspects of signaling protocols that make it difficult for hardware implementation include the maintaining of state 
information, the usage of timers, the need to initiate messages from a switch instead of simply forwarding messages (e.g., 
a release message aborting a connection setup if resources are not available), the Tag-Length-Value (TLV) structure used 
to carry parameters within messages instead of fixed location fields, choices specific to parameters (e.g., values with 
global or local significance), and most importantly, the current drive toward generalizing protocols with the goal of making 
them applicable to a large variety of networks. 
Starting with the last reason first, consider the evolution of LDP. It has evolved from LDP to CR-LDP to CR-LDP with 
extensions for GMPLS networks, such as SONET/SDH and DWDM. This complex protocol is now targeting almost all 
connection-oriented networks, both packet-switched and circuit-switched. This drive impacts almost all fields in 
parameters within messages. For example, the address field identifying the destination address of the connection allows 
for different address families, IP, telephony E.164, ATM End System Addresses, etc. Next, with regards to choices made 
for specific parameters, consider a simple parameter such as a connection identifier or connection reference. Most 
signaling protocols have this parameter. If this is chosen to be globally unique, then connection related data tables need to 
be searched with a much larger key than if this is chosen to be locally significant. Next, the TLV structure was designed 
for flexibility, allowing protocol designers to add parameters in arbitrary order. But this construct makes parameter 
extraction in hardware a complex task. Finally, with regards to state information, signaling protocol engines have to 
maintain the states of a connection as shown in Figure 4. While the type of state information is quite different, the notion 
of maintaining some state information is already in practice in IP packet and ATM cell forwarding engines for policing 
purposes. Other aspects that complicate signaling protocols are the support for a variety of procedures, such as third-party 
connection control and multiparty connection control. 
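The contrast between TLV parameters and fixed-location fields can be made concrete with a short sketch. The encoding used here (one byte of tag, one byte of length) is a hypothetical simplification chosen for illustration and is not the CR-LDP wire format; the function names are likewise illustrative.

```python
def parse_tlvs(data):
    """Scan a byte string of (tag, length, value) triples into a dict."""
    params = {}
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated TLV header")
        tag, length = data[offset], data[offset + 1]
        value = data[offset + 2: offset + 2 + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        params[tag] = value
        offset += 2 + length  # the next position depends on this length
    return params

def read_fixed(data, offset, size):
    """Fixed-location field: the position is known without scanning."""
    return data[offset: offset + size]

# The same two parameters in two different orders.
a = bytes([2, 1, 0x09, 1, 2, 0xAA, 0xBB])
b = bytes([1, 2, 0xAA, 0xBB, 2, 1, 0x09])
print(parse_tlvs(a) == parse_tlvs(b))
```

The TLV parser has to walk the message sequentially because each parameter's position depends on the lengths before it, whereas a fixed field is read at a constant offset. This is the extraction cost the paragraph above refers to.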
The signaling protocol described in this section is limited to the part implemented in hardware. Thus, the specification of 
error handling, aborting setups for lack of resources, checking timers, handling connections more complex than simple 
two-party connections, etc. have been delegated to the remaining part of the protocol implemented in software. Our 
approach is to define a large enough subset of the protocol that a significant percentage of users' requirements can be 
handled with this subset. Infrequent operations are delegated to the slower software path. Nevertheless, there are many 
aspects of the complex CR-LDP-like protocols that we have omitted here. Examples include TLV processing, handling 
larger parameters (such as global connection references, called "label-switched path identifier" in CR-LDP), handling 
many choices such as the different types of addresses, etc. We are currently implementing CR-LDP for SONET networks 
in VHDL for an FPGA implementation. This is an NSF-sponsored project [27]. At the end of that experiment, we hope to 
answer the question of whether a complex signaling protocol such as CR-LDP can be implemented in this mode of 
handling frequent operations in hardware and infrequent operations in software, or whether simpler lightweight signaling 
protocols targeted for specific networks need to be defined, as we have done here. 
4. FPGA-BASED IMPLEMENTATION OF SIGNALING PROTOCOL 
Figure 6: Architecture of WILDFORCE™ board 
To demonstrate the feasibility and advantage of hardware signaling, we implemented a signaling hardware accelerator in 
FPGA. We used the WILDFORCE™ multi-FPGA reconfigurable computing board shown in Figure 6, which consists of 
five XC4000XLA series Xilinx® FPGAs, one XC4036XLA (CPE0) and four XC4013XLA (PE1-PE4). These five 
FPGAs can be used to implement user logic while the crossbar provides programmable interconnections between the 
FPGAs. In addition, there are three FIFOs on the board, and one Dual Port RAM (DPRAM) attached to CPE0. The board 
is hooked to the host system through a PCI bus. The board supports a C language based API through which the host system 
can dynamically configure the FPGAs and access the on-board FIFOs and RAMs. 
Figure 7 illustrates our prototype implementation. We use CPE0, PE1, FIFO0, FIFO1 and DPRAM. The CPE0 
implements the signaling hardware accelerator state machine, the State and Switch Mapping tables, FIFO0 controller, and 
DPRAM controller. The DPRAM implements the Routing, CAC and Connectivity tables. FIFO0 and FIFO1 work as 
receive and transmit buffers for signaling messages. PE1 implements the FIFO1 controller and provides the data path 
between CPE0 and FIFO1. 
[Figure 7 label: State, Switch mapping tables] 
Figure 7: Implementation of signaling protocol on WILDFORCE™ board 
In the following subsections, we describe the design considerations for the routing table, and the state transition diagram of 
the hardware accelerator. We also present two novel approaches for managing timeslots and connection references. 
4.1 Routing table look up 
In recent years, there has been significant progress in fast table lookup in both research literature and commercial 
products [28-30]. Lookup co-processors are widely available, such as Silicon Access Networks' iAP, SiberCore 
Technologies' Ultra-9M, Netlogic Microsystems' NSE4256, MOSAID Semiconductor's DC9288, etc. These chips can 
easily process up to 100 million lookups/sec [30]. In our prototype implementation, we assumed that routing table lookups 
can be offloaded to an external co-processor, and used the equivalent of three memory accesses to emulate a routing table 
lookup. 
4.2 State transition diagram of the signaling hardware accelerator 
Figure 8 shows the detailed state transition diagram of the signaling hardware accelerator. When a signaling message 
arrives, it is temporarily buffered in FIFO0. The signaling hardware accelerator reads the messages from FIFO0 and 
delimits the messages according to the Message Length field. The Checksum field is verified. The State table is consulted 
to check the current state of the connection. Based on the Message Type field, the signaling hardware accelerator 
processes messages accordingly. The processing of the Setup message involves checking the TTL field, reading the 
Routing table to determine the next switch and corresponding output interface, updating the CAC table, reading the 
Connectivity table to determine the input interface, allocating a connection reference to identify the connection, allocating 
timeslots and programming the Switch-mapping table. The Setup-Success message requires no special processing. The 
processing of the Release message involves updating the CAC table, and releasing the timeslots reserved for the 
connection. When processing the Release-Confirm message, the allocated connection reference is freed and thus, the 
connection is terminated. After processing any message, the State table is updated. The new message is generated and 
buffered in FIFO1 temporarily, and then transmitted to the next switch on the path. 
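The first two stages, delimiting messages by the Message Length field and verifying the checksum, can be sketched as follows. Several details are assumptions made for illustration only: the FIFO is modeled as a list of 32-bit words, the length is taken to be the top 8 bits of the first word and to count 32-bit words, and the checksum is taken to be a 16-bit one's-complement sum stored in the low half of the last word. The source does not specify the checksum algorithm, and the function names are hypothetical.

```python
def ones_complement_sum16(words):
    """16-bit one's-complement sum over a list of 32-bit words."""
    total = 0
    for w in words:
        total += (w >> 16) & 0xFFFF
        total += w & 0xFFFF
    while total >> 16:  # fold carries back in (end-around carry)
        total = (total & 0xFFFF) + (total >> 16)
    return total

def seal(words):
    """Fill the checksum field (low 16 bits of the last word)."""
    body = words[:-1] + [words[-1] & 0xFFFF0000]
    checksum = ~ones_complement_sum16(body) & 0xFFFF
    return body[:-1] + [body[-1] | checksum]

def delimit(fifo):
    """Split a word stream into (message, checksum_ok) pairs."""
    messages = []
    pos = 0
    while pos < len(fifo):
        length = (fifo[pos] >> 24) & 0xFF  # assumed: length in 32-bit words
        if length == 0 or pos + length > len(fifo):
            break  # malformed or incomplete message; stop here
        msg = fifo[pos: pos + length]
        messages.append((msg, ones_complement_sum16(msg) == 0xFFFF))
        pos += length
    return messages

# Two hypothetical messages of 2 and 3 words queued back to back.
m1 = seal([(2 << 24) | (0b0010 << 12) | 5, 0x00070000])
m2 = seal([(3 << 24) | (0b0011 << 12) | 9, 0x12345678, 0x00010000])
for msg, ok in delimit(m1 + m2):
    print(len(msg), "words, checksum ok:", ok)
```

After these two stages, processing would branch on the Message Type field as described in the paragraph above.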
[Figure 8 labels (partly legible): route, bandwidth] 
Figure 8: State transition diagram of the signaling hardware accelerator 
4.3 Managing the available timeslots 
The management of timeslots and connection references is easy in software through simple array manipulations. 
However, this poses a challenge in hardware implementations. Our solution is to use a priority decoder. 
Figure 9 illustrates our implementation of a timeslot manager. Each entry in the timeslot table is a bit-vector, 
corresponding to an output interface with the bit-position determining the timeslot number and the bit-value determining 
availability of the timeslot ('0' available, '1' used). The priority decoder is used to select the first available timeslot. 
When an interface number is provided by the signaling state machine to the timeslot manager, the bit-vector 
corresponding to the interface is sent into the priority decoder and the first available timeslot is returned. Then the bit 
corresponding to the timeslot is marked as used (from 0 to 1) and the updated bit-vector is written back to the table. In 
the example shown in Figure 9, the timeslot manager was asked to find a free timeslot on interface 3. It returns timeslot 
14 and marks it as 'used'. De-allocating a timeslot follows a similar pattern but the timeslot number is needed as an input 
in addition to the interface number in order for the timeslot manager to free the timeslot. 
[Figure 9 labels: bit positions 15 14 13 12 ... 3 2 1 0 of a timeslot bit-vector] 
Figure 9: Timeslot manager 
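A software model of the timeslot manager helps to see what the priority decoder does. This is a sketch under assumptions: the class and method names are hypothetical, the decoder is assumed to pick the lowest-numbered free bit (the source does not state the scan direction), and the example vector for interface 3 is made up so that timeslot 14 is the first free one.

```python
class TimeslotManager:
    """One bit-vector per output interface; bit value 0 = available, 1 = used."""

    def __init__(self, num_interfaces, slots_per_interface=16):
        self.width = slots_per_interface
        self.table = [0] * num_interfaces

    @staticmethod
    def _first_zero(vector, width):
        """Priority decoder model: lowest-numbered 0 bit, or None if full."""
        for bit in range(width):
            if not (vector >> bit) & 1:
                return bit
        return None

    def allocate(self, interface):
        """Return the first free timeslot on the interface and mark it used."""
        slot = self._first_zero(self.table[interface], self.width)
        if slot is not None:
            self.table[interface] |= 1 << slot  # write the updated vector back
        return slot

    def free(self, interface, slot):
        """De-allocation needs both the interface and the timeslot number."""
        self.table[interface] &= ~(1 << slot)

mgr = TimeslotManager(num_interfaces=4)
mgr.table[3] = 0x3FFF          # hypothetical state: timeslots 0..13 already used
print(mgr.allocate(3))         # first free timeslot on interface 3
print(hex(mgr.table[3]))
```

In hardware the loop in `_first_zero` corresponds to a purely combinational priority decoder, so allocation takes one table read and one write-back.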
4.4 Managing the connection references 
A connection reference is used to identify a connection locally. It is allocated when establishing a connection and 
de-allocated when terminating it. A straightforward implementation of a connection reference manager is a bit-vector 
combined with a priority decoder. The priority decoder finds the first available bit-position (a bit marked as '0'), sends its 
index as the connection reference and updates the bit as used (a bit marked as '1'). However, this approach is impractical 
when there are a large number of connections. While our actual implementation only used 32 connections per switch, we 
designed the connection reference manager to handle 2^12 simultaneous connections, which requires a bit-vector with 
4096 entries. This is too large for the simple priority decoder implementation as used for timeslots. 
Figure 10: Connection reference manager
Our improvement to this basic approach is to use a table with 256 entries of 16-bit vectors to record the availability of a
total of 4096 connection references. Figure 10 illustrates this approach. With 4096 connections, we need a 12-bit
connection reference. The first 8 bits of the connection reference correspond to the table pointer, while the remaining 4
bits correspond to the first available connection reference from among the 16 pointed to by the table pointer. The
connection reference manager starts with the table pointer set to 0. If any of the 16 connection references corresponding
to this row of the table are available (i.e., a bit position is 0), the priority decoder will identify this index and write the
output connection reference as a concatenation of the 8-bit table pointer and the 4-bit index extracted. In the example
shown in Figure 10, the 12th bit in the first row is a 0. Therefore it outputs the connection reference number 12. The
bit-position is marked as used, as illustrated with steps 5 - 7 of Figure 10.
De-allocating follows a similar approach; the bit corresponding to the connection reference is reset to 0 and the updated
bit-vector is written back to the table. We can parallelize this approach by partitioning the table into several smaller
tables, each with a pointer and a priority decoder, forming several smaller managers. All these managers work
concurrently. A round-robin style counter can be used to choose a connection reference among the managers. Thus, this
approach can be generalized if more than 4096 connections are to be handled.
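The two-level scheme (an 8-bit table pointer concatenated with a 4-bit index) can be sketched in software as follows. This is an illustrative model only: the class name is hypothetical, and the policy of advancing the pointer when a row is full is an assumption, since the text only states that the manager starts with the pointer at 0.

```python
ROWS = 256  # 256 entries of 16-bit vectors = 4096 connection references

class ConnRefManager:
    """Hypothetical software model of the connection reference manager."""

    def __init__(self):
        self.table = [0] * ROWS
        self.pointer = 0                                  # 8-bit table pointer

    def allocate(self):
        for _ in range(ROWS):
            row = self.table[self.pointer]
            free = ~row & 0xFFFF                          # a 1 marks a free reference
            if free:
                index = (free & -free).bit_length() - 1   # 4-bit index (priority decoder)
                self.table[self.pointer] = row | (1 << index)
                return (self.pointer << 4) | index        # 12-bit connection reference
            # Assumption: move on to the next row when this one is full.
            self.pointer = (self.pointer + 1) % ROWS
        return None                                       # all 4096 references in use

    def free(self, ref: int):
        row, index = ref >> 4, ref & 0xF
        self.table[row] &= ~(1 << index)                  # reset the bit to 0
```

If bits 0 to 11 of row 0 are already set, `allocate()` returns 12, as in the Figure 10 example. Partitioning into several managers would amount to running several such objects side by side and selecting among them with a round-robin counter.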
4.5 Simulation
We developed a prototype VHDL model for the signaling hardware accelerator, used Synplify for synthesizing the
design and Xilinx Alliance for the placement and routing of the design. CPE0 (Xilinx XC4036XLA FPGA) uses 62%
of its resources while PE1 (XC4013XLA) uses 8% of its resources.
Figure 11: Timing simulation for Setup message
We performed timing simulations of the signaling hardware accelerator using the ModelSim simulator. The simulation
results are shown in Figures 11 to 14. From the timing simulation of the Setup message (Figure 11), it can be seen
that while receiving and transmitting a Setup message (requesting a bandwidth of OC-12 at a cross connect rate of OC-1)
consumes 12 clock cycles each, processing of the Setup message consumes 53 clock cycles. Overall, this translates into
77 clock cycles to receive, process and transmit a Setup message.
Figure 12: Timing simulation for Setup-Success message
Figure 13: Timing simulation for Release message
Figure 14: Timing simulation for Release-Confirm message
Processing Setup-Success (Figure 12), Release (Figure 13) and Release-Confirm (Figure 14) messages consumes about
70 clock cycles total since these messages are much shorter (2 32-bit words versus 11 32-bit words for Setup) and require
simpler processing. A detailed breakdown of the clock cycles consumed to process each of these signaling messages is
shown in Table 1.
                Setup     Setup-Success   Release   Release-Confirm
Clock cycles    77-101    9               51        10
Table 1: Clock cycles consumed by the various messages.
Assuming a 25 MHz clock, this translates into 3.1 to 4.0 microseconds for Setup message processing and about 2.8
microseconds for the combined processing of Setup-Success, Release and Release-Confirm messages. Thus, a complete
setup and teardown of a connection consumes about 6.6 microseconds. Compare this with the millisecond-based
software implementations of signaling protocols. We are currently optimizing the design to operate at 100 MHz, thereby
reducing the processing time even further. We are also exploring pipelined processing of signaling messages by
selectively duplicating the data path to further improve the throughput.
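The conversion from clock cycles to processing time can be checked with a short calculation. This is a sketch under stated assumptions: the cycle counts are read from Table 1, with the lower Setup bound taken as the 77 cycles quoted in the text, and the helper name is hypothetical.

```python
CLOCK_HZ = 25_000_000  # 25 MHz clock, as assumed in the text

def cycles_to_us(cycles: int, clock_hz: int = CLOCK_HZ) -> float:
    """Convert a cycle count to microseconds at the given clock frequency."""
    return cycles / clock_hz * 1e6

setup_min, setup_max = 77, 101   # Setup: best and worst case
others = 9 + 51 + 10             # Setup-Success + Release + Release-Confirm

print(round(cycles_to_us(setup_min), 1))   # 3.1
print(round(cycles_to_us(setup_max), 1))   # 4.0
print(round(cycles_to_us(others), 1))      # 2.8
```

If the cycle counts stayed the same at 100 MHz, each of these times would shrink by a factor of four; whether the counts do stay the same after optimization is not stated.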
5. CONCLUSIONS 
Implementation of signaling protocols in hardware poses a considerably larger number of problems than implementing
user-plane protocols such as IP, ATM, etc. Our implementation has demonstrated the hardware handling of functions
such as parsing out various fields of messages, maintaining state information, writing resource availability tables and
switch mapping tables, etc., all of which are operations not encountered when processing IP headers or ATM headers.
We also demonstrated the significant performance gains of hardware implementation of signaling protocols, i.e., call
handling within a few microseconds. Overall, this prototype implementation of a signaling protocol in FPGA hardware has
demonstrated the potential for 100x-1000x speedup vis-à-vis software implementations on state-of-the-art processors.
Our current work is implementing CR-LDP for SONET networks in hardware.
ACKNOWLEDGMENTS 
This work is sponsored by an NSF grant, 0087487, and by NYSTAR (The New York Agency of Science, Technology and
Academic Research) through the Center for Advanced Technology in Telecommunications (CATT) at Polytechnic
University.
We thank Reinette Grobler for helping specify the signaling protocol, and Brian Douglas and Shao Hui for the initial
prototype VHDL model of the protocol.
Based on a worst-case search through a four-option routing table.
Hardware/Software-Architecture and High Level Design 
Approach for Protocol Processing Acceleration 
Mirko Benz, Georg H. Overbeck
Department of Computer Science
Dresden University of Technology
D-01062 Dresden, Germany
{benz, overbeck}@ibdr.inf.tu-dresden.de
Klaus Feske, Jens Grusa
FhG IIS Erlangen
Department EAS Dresden, Zeunerstr. 38
D-01069 Dresden, Germany
{feske, grusa}@eas.iis.fhg.de
Abstract
Developing hardware support for transport layer protocol processing is a very complex and
demanding task. However, for optimal performance hardware acceleration can be required. To
cope with this situation we present a high level design approach which targets the
development of configurable and reusable components. Therefore we outline the integration
of advanced tools for the development of controller systems into our design environment. This
process is illustrated based on a TCP/IP header analysis and validation component for which
initial performance results are presented. The development of these specialised components is
embedded in an approach to develop flexible and configurable protocol engines that can be
optimised for specific applications.
1 Introduction and Related Work
Today's communication environments are mainly influenced by the tremendous success of the
Internet. As a result the Internet Protocol (IP) and standard layers above - especially TCP [18,
19] - are now the common denominator. This means that although these protocols have a
number of limitations concerning functionality, flexibility and performance, other protocol
approaches like XTP [17] have failed to gain broad acceptance. This is also partly true for
other superior technologies like ATM which compete with IP. Hence it is important to
transfer the alternatives and ideas developed in various research projects to improve
implementations of these standard protocols.
On the other hand, the Internet has encouraged huge investments in fibre optical networks and
technologies to exploit them more efficiently like Wave Division Multiplex (WDM).
Furthermore, new technologies like xDSL and cable modems will also provide high speed
communication in the access networks. Altogether this will contribute to an emerging global
high speed networking infrastructure based on IP.
In contrast to using the same base protocol everywhere, communication devices are extremely
diversified. This includes standard workstation and server class computers as well as laptops
up to Wireless Application Protocol (WAP) mobile phones. Therefore, architectures for
protocol processing acceleration have to be adaptive to various network interfaces, their
properties as well as processor architectures or optimisation goals concerning performance as
well as memory, CPU and power limitations. Furthermore, the ongoing development within
and the commercialisation of the Internet have produced quite a number of applications,
protocols and service proposals and standards for higher layers or extensions to IP. Examples
are IP security for virtual private networks (VPN), voice over IP, video conferencing or the
WWW. Especially the real-time requirements of multimedia data transmission stimulated the
development of resource reservation protocols, priority mechanisms, accounting or the
differentiated services approach.
The ongoing research and deployment of WDM technologies and the direct transmission of IP
datagrams over specific wavelengths will contribute to very high bandwidth capacities at low
error rates. These optical networks will again shift the protocol processing overhead to the
access routers and into the end systems. On the other hand they will probably provide no or
only limited quality of service (QoS) features. Combined with data touching intensive or real
time requirements of specific services this adds to processing power that is required within
end systems. Hence, flexible architectures for protocol processing acceleration are necessary to
cope with these conditions.
There have been quite a number of approaches for hardware support of communication
protocols in the early nineties. However they were only successful for lower layer protocols
(e.g. MAC sublayer) [1, 13, 9]. The most significant aspect is probably the complexity of
standard communication stacks like TCP/IP. This makes it impossible to perform all the
processing in custom hardware because the design, implementation, testing, validation and
maintenance effort would be too high. This would also lead to extreme costs and limit the
ability to adopt standard modifications and improvements. Another obstacle is the lack of a
formal specification. Hence, many research projects had concentrated on the hardware support
of specialised light weight protocols [8, 15]. Although this was successfully demonstrated,
hardware support for complex transport level protocols is still an open issue.
On the other hand, hardware support for communication protocols is again a very active topic.
Especially so-called network processors are very popular. They include custom hardware for
standard protocol specific computations as well as multiple programmable RISC cores, which
makes them more flexible than custom ASIC designs that are used in routers today. This
approach provides benefits concerning time to market and allows development to continue
after purchase and the devices to be updated as required. This development is supported by
improved tool support for simulation, verification and synthesis, because of the expansion of
the component idea onto the hardware design in the form of intellectual property cores and
subsequent reuse as well as higher abstraction levels like hardware compilation approaches.
An example is the integration of high level design tools [20] in the process of developing
hardware components. This is shown in [16, 7] and can improve productivity. Furthermore
ASIC and especially FPGA technologies for hardware prototyping were drastically improved.
Combined, these achievements facilitate the design and implementation of complex and
heterogeneous hardware/software architectures for protocol processing acceleration and
system-on-a-chip solutions. One such example is the hardware support for ATM, which often
integrates transport layer functionality like segmentation and reassembling, congestion control
(ABR) or traffic shaping in hardware.
In the following chapter we outline our protocol engine approach. First we state design goals
and present our general approach. Then we outline the aspired architecture, discuss
possible configurations and describe the involved components. In chapter 3 we illustrate a
possible TCP/IP partitioning, explain the required protocol processing and outline a
synchronisation between the software and hardware parts. Then, we describe our validation
architecture for a hardware implementation of TCP/IP core functionality and present initial
evaluations and performance data.
2 Protocol Engine Project 
The protocol engine project is a joint effort of multiple research groups with a computer
science and electrical engineering background. In this context communication protocols are
analysed, evaluated and optimised. Specific protocols tailored for ATM networks and
multimedia applications have been developed. For existing standard protocols hardware
support is evaluated and designed. The focus of this paper is to describe the basic approach
and architecture of the protocol engine with hardware support for TCP/IP in mind as well as a
prototype implementation. The overall goal however is to see networking in its entirety -
ranging from applications, over protocols to the actual hardware. This way, by not looking at
one specific layer alone, system performance can be improved.
2.1 Assumptions, Approach and Design Goals 
Today's transport protocols like TCP are far too complex to be completely implemented in
custom hardware. Furthermore this approach would limit the flexibility and maintainability of
the solution. On the other hand, only a very small part of the protocol has real-time processing
requirements. The rest consists of lower priority tasks like exception handling, buffer
registration or connection management. This means that only a relatively small part of the
protocol has to be accelerated.
Within modern local high speed networks the error probability is very low. Hence exception
handling due to corrupted data, loss or duplicates can be considered as a rare condition. As a
consequence, non real-time processing like connection setup or exception handling in case of
errors can still be performed by a modified software stack because relatively expensive
synchronisation is tolerable. This way, investments in high performance TCP/IP
implementations can be reused. Another advantage is that a stepwise optimisation - based on
existing implementations and tailored to specific requirements - is possible.
Advances in CPU development have greatly improved the processing power that is available
in end systems. However, due to added services which demand very high processing capacity
or exhibit very tight real-time requirements, hardware support could still make sense. Another
issue is that the operating system becomes a bottleneck for simple protocol processing tasks
because of context changes, synchronisation or the communication overhead inherent in
layered architectures relative to the protocol processing itself. Hence, the data path between
the application and the network is very important too. From these requirements and
constraints we derive the following design goals:
• Flexible, adaptive architecture to support different protocols based on IP. Allow easy
extensions to support new features and services.
• Scalable performance according to specific requirements like cost, power consumption
and network conditions.
• Development of components for specific protocol functions. Design of common
interfaces to allow reuse, hardware emulation in early design phases and flexibility
concerning the implementation architecture.
• Allow integration of specific hardware to benefit from existing solutions or for very 
high performance requirements. Furthermore enable easy integration of additional 
processing resources like DSPs or micro controllers. 
• Take advantage of existing software based protocol implementations and allow 
stepwise hardware support. 
• Consider protocol processing as a whole. This means that the network interface, the
protocol processing and the communication with the application have to be regarded
and optimised in their entirety.
• On wire compatibility to other (software based) implementations.
• Enable existing applications to take advantage of the protocol processing acceleration 
in a transparent manner. 
2.2 Architecture and Configuration Opportunities 
Due to the number of requirements and environmental conditions a protocol engine has to be
designed specifically to address these challenges. As a consequence there will be many
configurations. The general idea however is to add hardware support as required, to allow a
smooth upgrade path and enable scalability. Figure 1 presents the overall architecture of the
protocol engine with all possible components.
Fig. 1: General Protocol Engine Architecture
Depending on the specific application requirements there can be various configurations of this
protocol engine. One opportunity consists in using currently emerging network processor
designs which contain a standard embedded RISC processor combined with multiple
programmable micro engines and very little network specific hardware like the LEVEL ONE
IXP 1200 architecture [11]. These architectures are especially suited for technology
demonstrations of emerging standards like the integrated services approach. Because they are
fully programmable, specification modifications can be quickly adopted. Furthermore they
offer relatively high performance.
Another direction would be to leave the majority of the protocol processing tasks on the host
and support only the standard data path with specific hardware. The high bandwidth of today's
networks combined with extremely low error rates makes this approach feasible. As a
consequence, synchronisation handling would be required only in the rare case of exceptions
due to errors or connection management. Supported by a high level design tool, a scheme
for the development of such protocol accelerators was outlined in [4] for TCP/IP packet
validation within the receive path.
For greater flexibility we plan the integration of a specific I/O processor [5]. This processor
will possess a customisable instruction set. Hence it can be optimised for stream parsing and
bit operations for example. This can lead to a reduction of code size and required processing
cycles which is a general goal for power consumption sensitive applications.
On the other hand we plan to use the I/O processor to control the operation of the protocol
engine. This means that it can be programmed to control the processing flow. As such it must
communicate with the other entities of the protocol engine. Hence it must have knowledge of
the number and capabilities of the integrated components and optimise their utilisation. We
plan to develop a message abstraction layer for the exchange of such processing requests
(TASK). This enables the I/O processor to communicate asynchronously with other processing
elements without having to know their implementation architecture.
Therefore, this adaptation contains a common message buffer and a bus access part as well as
a specific interface to the component. This way, specific hardware could be emulated during
early stages of the development process. This could be a very high performance DSP or
protocol specific hardware. On the other hand this abstraction opens the way to scalability
since multiple components of the same type could be integrated as well. In this configuration
the I/O processor would be responsible for synchronising the processing results of the entities
and communicating with a standard embedded RISC processor.
Due to the complexity of transport layer protocols it is usually not beneficial to perform the
entire protocol processing in specialised components. Hence we intend to use a general
processor which runs a modified software TCP/IP stack. Here non real-time processing tasks
like connection management or exception handling are performed. Furthermore, it is
responsible for efficient communication with the application on the host system. To avoid
operating system overhead we consider using a modified implementation of the virtual
interface architecture (VIA) for efficient communication [21]. Further information, for
example how applications could transparently - without modifications - benefit from this
acceleration, can be found in [2, 3].
3 TCP/IP Partitioning and Fast Path Processing
Judged by the lines of code TCP is a relatively complex protocol [12]. However, assuming
bulk data transfer within local area networks only a fraction of the code is actually required to
process most packets. This is called the fast path. This state is reached after the connection is
established and holds as long as normal data packets with no control flags in the header are
transmitted. A further requirement is the absence of error conditions due to loss, congestion or
data corruption. Most of these conditions are relatively rare within today's local high speed
networks. To achieve good performance it is therefore necessary to optimise the fast path. As
a consequence it is not necessary or beneficial to implement complex protocols completely in
hardware but to only support the common path with specific accelerators. Hence the majority
of the protocol will still reside in software.
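The fast path conditions named above (established connection, no control flags, no error indication) can be expressed as a simple predicate. The following is only a sketch: the `Context` type and function name are hypothetical, the flag bit values are those of the standard TCP header, and treating ACK and PSH as compatible with the fast path while using an in-order sequence number as the "no loss" test is an assumption.

```python
from dataclasses import dataclass

# TCP header flag bits
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20

@dataclass
class Context:
    established: bool   # connection setup has completed
    rcv_nxt: int        # next expected sequence number

def on_fast_path(ctx: Context, flags: int, seq: int) -> bool:
    """Return True if a received segment may be handled by the fast path."""
    if not ctx.established:
        return False                       # setup is left to the software stack
    if flags & (FIN | SYN | RST | URG):
        return False                       # control flags are exceptions
    return seq == ctx.rcv_nxt              # out-of-order data hints at loss
```

Anything that fails the predicate would be handed to the software stack, which keeps the accelerated logic small.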
Within the fast path the sending instance checks whether available data can be transmitted.
This is for example the case if the transfer unit of the network can be fully utilised. Large
messages can be fragmented or small ones are aggregated. Thus, message boundaries are not
preserved. The header fields are set and a checksum is computed, then IP is invoked for the
actual transmission. At the receiver, fast path processing consists of header analysis, context
lookup, checksum calculation, packet validation, packet reassembling and hand over to the
higher layers. Important optimisations include header prediction, context caching and
integration of checksum computation in buffer copy operations [10, 14]. As a consequence of
these optimisations it normally takes very few instructions to process a packet. Assuming
faster and faster processors while the protocol processing remains essentially the same, the
ratio between actually required processing and overheads is getting worse. Since protocol
processing is not the bottleneck (unless additional services like IP security are used) user mode
implementations present an alternative. However, without intelligent hardware support they
can not fully exploit today's networks.
3.1 Fast Path Protocol Processing 
Figure 2 illustrates the tasks that are involved in the fast path processing for bulk data
transmission. The sender accepts the application's data and transmits them if certain criteria
are met. The receiving protocol instance takes and validates the data and eventually hands
them over to the application. Both protocol instances are coupled by a window based flow
control that ensures that enough buffer space is available at the receiver. Usually for every
other received protocol data unit the receiver generates an acknowledgement. Based on this
information the sender can release transmitted data that was saved for eventual retransmission.
Furthermore, the transmission window is enlarged enabling the transmission of new data.
The fast path consists of three major components: TcpSend, TcpRecv and SendAck. These
tasks may run concurrently but since they access shared context data, synchronisation has to
be applied. Furthermore, they signal each other required processing like the receiver
indicating necessity of sending an acknowledgement. Depending on the communication
behaviour only some tasks are active on each instance. The context data include information
describing the connection and variables for flow and congestion control as well as for error
detection. These data sets are kept separate for every connection. Hence connections can be
processed concurrently. Statistics data however, are gathered for every connection and thus
have to be periodically synchronised with the software stack.
Fig. 2: TCP Fast Path Processing Flow for Bulk Data Transfer 
3.2 Software Stack Synchronisation
According to the proposed partitioning the complex tasks of connection management and
error handling are still performed in software. Thus, we can benefit from existing well
performing and stable implementations. Therefore, when an application initiates a connection
establishment the software stack is invoked. On the receiving side the protocol data unit can
not be mapped onto a known fast path entry, therefore it is passed to the software stack. If the
user decides to accept the communication the context information for this connection is
transferred to the fast path processing unit and marked active. The same happens on the client
side. The number of connections that should be accelerated may be limited to participants of
the same or corresponding networks because normally only here very high performance is
required and the above mentioned conditions are met. The data units that are exchanged
afterwards are entirely processed by this unit and transferred directly to the application. This
happens without involving the operating system or the TCP/IP software stack. This means that
no interrupts have to be dealt with, no operating system contexts or processor modes have to
be changed and caches could remain intact.
When a connection is idle for some time, errors occur or the user terminates it, the context
information and remaining user data are transferred to the software stack. Then the
communication is treated as it would have been without an accelerator. In case an error
condition was successfully managed, the fast path unit could be reinitialised for this
connection.
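The hand-over of connection contexts between the software stack and the fast path unit can be pictured with a small model. This is a sketch only: the class, method names and the use of a dictionary keyed by the address and port tuple are assumptions made for illustration, not the mechanism of the actual design.

```python
class FastPathTable:
    """Hypothetical model of the set of connections marked active."""

    def __init__(self):
        self.active = {}                      # connection key -> context data

    def activate(self, key, context):
        """Software stack transfers the context after connection setup."""
        self.active[key] = context

    def deactivate(self, key):
        """Idle timeout, error or close: return the context to software."""
        return self.active.pop(key, None)

    def dispatch(self, key) -> str:
        """Decide which side processes a received protocol data unit."""
        return "fast_path" if key in self.active else "software_stack"
```

A data unit for an unknown connection therefore falls through to the software stack, and the same entry can be activated again once an error has been handled.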
4 A TCP/IP Receive Side Processing Component
Hardware support and parallelism to improve protocol processing performance were not
considered within the specification of TCP/IP. As a consequence, there are a lot of
dependencies between protocol functions, shared access to the connection state data (the
transmission control block) and a high communication and synchronisation effort between
protocol functions. To make matters worse, this mainly depends on the actual transported
data. Therefore, a functional decomposition of the protocol is a difficult task. Hence, a prior
analysis and simulation of the entire system is necessary to achieve good results.
In this section we describe how hardware support can be integrated within the receive path
processing of TCP/IP. Again, we assume low error rates and high network bandwidth.
Therefore, the required processing is mainly to assure error free data reception and hand over
to the user process. However, this has to be done very efficiently to cope with gigabit data
rates.
4.1 Simulation Model and Test Environment 
Figure 3 represents the simulation test bench currently in use. The major goals were
prototyping of the fast path unit, exploration of different configurations, bottleneck
investigation and derivation of initial performance data. For this to be accomplished, the
interface to the network (e.g. MAC in the case of Ethernet) is currently emulated.
Furthermore, only one connection is actively processed.
All models except the interfaces to the application and to the remote TCP/IP instance are
written in VHDL. The SRam model allows its contents to be updated dynamically via a file
interface. This is used, for example, by the application interface when a packet should be
transmitted. First the user data is transferred to the init file. Then the contents of the SRam
are updated. The position and length of this data are stored in a transmission queue that is
available for each connection. Via this queue a synchronisation between the software
application and the hardware unit is performed. The input control generates a header and
transfers the described buffer to the Input FIFO. The header contains a packet type field. This
could be application (send/recv), network (send/recv) as well as synchronisation and
statistics. Furthermore, it contains the length and packet specific data - for example the
connection to which this packet belongs or the network interface that received the packet.
After the packet is transferred the queue entry is cleared and can be
reused by the application. The fast path unit is invoked if a configurable threshold for the
Input FIFO filling is reached. The processing of this unit is described in the next section. Here
a local SRam can be utilised to store connection and statistic information. The data extraction
from the Output FIFO and the transfer to the application or network emulation works in a
similar manner. The models for FIFO elements and the local Ram target Block Select Ram of
Xilinx Virtex FPGA devices.
Fig. 3: Simulation Model
4.2 TCP Fast Path Unit Receive Processing Flow
Figure 4 represents the general processing flow within the TCP fast path unit. After leaving
the reset state the unit communicates with the input FIFO and waits for available data. It then
checks whether the received packet contains synchronisation information. If this is the case,
the connection context data is initialised. Since currently only the receive path is implemented,
a filter for those packets is inserted within the processing flow. Other packets are directly
transferred to the output FIFO. The next step consists of extracting the connection description
(IP addresses and TCP ports) and comparing it with the currently loaded context information.
If the packet does not belong to the accelerated connection it is simply forwarded. Otherwise
it is analysed as explained in section 3.1. Within the data validation step the TCP checksum is
computed for the entire packet. If this succeeds the data is transferred to the corresponding
application. Furthermore an acknowledgement packet for the received data is generated.

Fig. 4: TCP Fast Path Unit Processing Flow
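The decision steps of this receive flow can be summarised in a short C sketch. The type names, the enum values and the function classify are assumptions made for illustration; the original design expresses this flow with the Protocol Compiler and VHDL, not in C.

```c
#include <stdint.h>
#include <stdbool.h>

/* Connection description compared against the loaded context
 * (IP addresses and TCP ports, as named in the text). */
struct conn_id {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

/* Hypothetical outcomes of the classification step. */
enum action { INIT_CONTEXT, FORWARD, ANALYSE };

/* Sync packets initialise the context; packets outside the receive path
 * and packets of other connections are forwarded unchanged; everything
 * else goes on to header analysis and checksum validation. */
static enum action classify(bool is_sync, bool is_receive,
                            const struct conn_id *pkt,
                            const struct conn_id *ctx)
{
    if (is_sync)
        return INIT_CONTEXT;
    if (!is_receive)
        return FORWARD;
    if (pkt->src_ip != ctx->src_ip || pkt->dst_ip != ctx->dst_ip ||
        pkt->src_port != ctx->src_port || pkt->dst_port != ctx->dst_port)
        return FORWARD;
    return ANALYSE;
}
```

Only a packet that reaches ANALYSE would have its TCP checksum computed and, on success, trigger the acknowledgement.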
4.3 Design Flow, Implementation and First Results 
Legend of Fig. 5: unspecified frame action (default, user), terminal frame, reference frame,
epsilon operator, qualifier operator, if-frame operator, repeat operator, sequential operator,
alternative operator
Conventionally, the controller design of structured data stream processing is not well
supported by EDA tools. So we are facing a bottleneck in the design process, especially for
protocol processing hardware components. To fill the gap, we incorporated modelling and
synthesis facilities of the Protocol Compiler from Synopsys [20] into a proven FPGA based
rapid prototyping (RPT) design flow [6] and utilised it for designing fast path protocol
processing components. Our RPT design flow starts at RT-Level with a VHDL description of the
design specification. As a new element in the sequence of this flow, the Protocol Compiler is
set on top of the whole process, by means of which the high level specification is graphically
composed. Furthermore the Protocol Compiler provides the following features: formal protocol
analysis, back annotation simulation, controller logic partitioning and synthesis, and VHDL
code generation. Being similar to the Backus-Naur notation, the Frame Modelling Language FML
[20] and the graphical symbolic format (Figure 5) closely match requirements of high-level
protocol specifications like
• Recognition of header patterns and synchronisation,
• Parsing and reassembling of structured data streams,
• Interface issues between data stream processing modules (such as synchronisation or
stall).

Fig. 5: Types of frames and frame operators
As an example Figure 6 shows the Protocol Compiler description of a simplified 32 bit IP
header analysis implementation. A 32 bit wide register p_data_in is used to process the data
sequentially. First the checksum is computed. Here, within each cycle portions of 16 bit are
added. Then the length of the IP header is extracted. Next we check whether the received data
is an IPv4 fragment.

Fig. 6: Protocol Compiler IP Header Analysis Specification (excerpt, 32 bit)
Assignments legible in the excerpt:
    (v_checksum = p_data_in[15:0] + p_data_in[31:16])
    (v_iphcounter = p_data_in[7:4] - 1)
    (p_pseudoh = p_data_in[31:16])
    (p_rest = p_data_in[17:16])
    (v_iplength[15:0] = {"00", p_data_in[31:18]})
Header fields shown: Version, IHL, TOS, Total Length, IP Identification, Flags, Frag. Offset,
TTL, Protocol, Header Checksum, Source IP Address, Destination IP Address

Within the next step the pseudo header is initialised, which is required for the checksum
computations of higher layers. The next step calculates the number of 32 bit words
(v_iplength) that have to be processed and determines any remaining bytes (p_rest). All
these actions are performed within one clock cycle. Then the rest is read in units of 32 bits.
Within these steps header fields like the source IP address are extracted. Furthermore option
fields are taken into account. After the header fields are extracted we perform validation
operations as outlined in the previous section to verify the received fragment.
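The first analysis steps can be illustrated in C. This is only a sketch under stated assumptions: the first header word is taken in network byte order (version in bits 31:28, IHL in bits 27:24, total length in bits 15:0), which is not necessarily the bit layout used in Figure 6, and the names ip_first_word, decode_first_word and header_sum are hypothetical.

```c
#include <stdint.h>

struct ip_first_word {
    uint8_t  version;      /* 4 for IPv4 */
    uint8_t  ihl_words;    /* header length in 32 bit words */
    uint16_t total_words;  /* total length divided by 4 */
    uint8_t  rest_bytes;   /* total length modulo 4 */
};

/* Decode the first 32 bit header word (assumed network byte order). */
static struct ip_first_word decode_first_word(uint32_t w)
{
    struct ip_first_word f;
    uint16_t total = (uint16_t)(w & 0xFFFF);
    f.version     = (uint8_t)(w >> 28);
    f.ihl_words   = (uint8_t)((w >> 24) & 0xF);
    f.total_words = (uint16_t)(total >> 2);
    f.rest_bytes  = (uint8_t)(total & 3);
    return f;
}

/* Add both 16 bit halves of each 32 bit word, one word per step,
 * then fold the carries back in. Over a header that still contains
 * a correct checksum field the result is 0xFFFF. */
static uint16_t header_sum(const uint32_t *words, unsigned n)
{
    uint32_t sum = 0;
    unsigned i;
    for (i = 0; i < n; i++)
        sum += (words[i] >> 16) + (words[i] & 0xFFFF);
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)sum;
}
```

The split into total_words and rest_bytes corresponds to the roles of v_iplength and p_rest in the text: whole 32 bit words to process, plus the bytes left over.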
One part of the top level is shown in figure 7. It illustrates three alternatives of processing
after receiving valid data. The first four bits of the incoming packet decide which of the
alternatives will run. If DIN is equal to "5" (decimal) the following data is used to initialise a
new connection. The packet is transferred directly to the output FIFO if DIN is "3", because
only the receive path is implemented at this time (see part 4.2). A normal analysis of a packet
begins only if DIN is equal to "1". After starting the packet analysis by checking the Ethernet
MAC address, referenced as "ethernet_input", the fast path unit parses the IP header and the
TCP header. The last step is the generation of an acknowledgment if the encapsulated data in
the packet was valid.

Fig. 7: Fast Path Unit Top Level Protocol Compiler Description (alternatives selected by
DIN[3:0]: "0011" user_send, "0001" network_input, "0101" sync_input; attributes shown:
pipelined=false, control_style=min_area)
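The selection among the three alternatives amounts to a decode of the low four bits of DIN. The following C sketch assumes the code values given in the text (5, 3 and 1); the enum, the function name select_partition and the UNKNOWN case for all other codes are illustrative assumptions, since the text does not say how other values are handled.

```c
#include <stdint.h>

enum partition { SYNC_INPUT, USER_SEND, NETWORK_INPUT, UNKNOWN };

/* Select the processing alternative from DIN[3:0]:
 * 5 = initialise a new connection, 3 = pass straight to the output FIFO,
 * 1 = full packet analysis (MAC, IP header, TCP header, acknowledgment). */
static enum partition select_partition(uint32_t din)
{
    switch (din & 0xF) {
    case 0x5: return SYNC_INPUT;
    case 0x3: return USER_SEND;
    case 0x1: return NETWORK_INPUT;
    default:  return UNKNOWN;   /* assumption: not specified in the text */
    }
}
```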
The Protocol Compiler offers the possibility to synthesize the design applying certain
optimisation criteria. For example in figure 7 the attribute "Control_Style" is set to "Min
Area". That means the partition "user_send" will be optimised for minimum area during high
level synthesis. Table 1 shows results of the synthesis for a Xilinx Virtex FPGA XCV300
device and the differences between the Control_Style "Min Area" and "Min Delay".

Table 1: Synthesis Results
Property               Minimum Area    Minimum Delay
Maximum Frequency      191 MHz         22~MHz
Slices                 967             982
External I/O (IOBs)    71              71
5 Status and Future Work
We have presented a concept for the design and implementation of protocol processing
accelerators. This was demonstrated with a packet classification and validation unit for
hardware supported TCP/IP receive path processing. First steps in modelling and synthesis of
the hardware partition were made utilising a usual FPGA rapid prototyping design flow,
which we extended by a graphical high level design entry and by reusable protocol
components. This approach supports an application-oriented modelling style in order to
enhance design efficiency and quality for structured data stream processing controllers.
Additionally, this leads to a quick design exploration, easy changeability, and design cycle
reduction. A further controller design improvement can be expected by utilising reuse
methodologies. Consequently, our future work aims at extending our rapid prototyping design
flow by inserting a library of reusable protocol templates and components.
The design of accelerators based on these components is relatively difficult because a number
of conditions have an influence on the achievable performance. Due to the inherent
complexity of transport layer protocols it is furthermore advantageous to perform the non
real-time processing on standard programmable architectures. We will also evaluate the
integration of signal processors. As a consequence there are a lot of design alternatives
combined with protocol engine configurations and requirements to take into account. Hence
an automated and integrated design approach to determine which architecture is best suited for
a specific protocol processing task would be beneficial. Further issues would be performance
prediction as well as simulation and validation of the entire communication system.
6 References
[1] Balraj, T.S.; Yemini, Y.: "Putting the Transport Layer on VLSI - the PROMPT
Protocol Chip", in: Pehrson, B.; Gunningberg, P.; Pink, S. (ed.): Protocols for High-
Speed Networks, III, North Holland, Stockholm, May 1992, pp. 19-34
[2] Benz, M.: "The Protocol Engine Project - An Integrated Hardware/Software
Architecture for Protocol Processing Acceleration", SDA '2000 workshop
[3] Benz, M.; Engel, F.: "Hardware Supported Protocol Processing for Gigabit Networks",
SDA - Workshop on System Design Automation, 1998
[4] Benz, M.; Feske, K.: "A Packet Classification and Validation Unit for Hardware
Supported TCP/IP Receive Path Processing", SDA '2000 workshop
INTERNET PROTOCOL (IP)
(see RFC 791)

The Internet Protocol (IP) provides a frame for encapsulating other protocols like TCP
and UDP. The IP header informs the recipient, among other things, of the destination and
source addresses of the packet, the number of octets in the packet, whether the packet can be
fragmented or not, how many hops the packet can traverse, the protocol that the packet
carries, etc. The IP version currently utilized is 4.
IP HEADER FORMAT
OCTET 1             Version (4 bit) + IHL (4 bit)               (VER, IHL)
OCTET 2             Type of Service                             (TOS)
OCTET 3,4           Total Length                                (TOL)
OCTET 5,6           Identification                              (ID)
OCTET 7,8           Flags (3 bit) + Fragment Offset (13 bit)    (FLG, FRO)
OCTET 9             Time to Live                                (TTL)
OCTET 10            Protocol                                    (PRO)
OCTET 11,12         Header Checksum                             (IP_SUM)
OCTET 13,14,15,16   Source Address                              (SRC)
OCTET 17,18,19,20   Destination Address                         (DEST)
OCTET 21,22,23      Options                                     (OPT)
OCTET 24            Padding
OCTET 25,26...      Data
In the example shown the IP packet is encapsulated in a PPP frame including the
flag sequence (7E), address (FF 03), protocol (00 21) and FCS (0B 81). In turn, the IP
packet encapsulates a UDP message.
Example:
7E FF 03 00 21 45 00 00 40 00 01 00 00 3C 11 E0 31 CE D9 8F 1F C7 B6 78 CB
04 89 00 35 00 2C A9 B4 00 01 01 00 00 01 00 00 00 00 00 00 04 70 6F 70 64
02 69 78 06 6E 65 74 63 6F 6D 03 63 6F 6D 00 00 01 00 01 0B 81 7E

Start       7E
Address     FF 03
SEP         00 21
IP Header   45 00 00 40 00 01 00 00 3C 11 E0 31 CE D9 8F 1F C7 B6 78 CB
Data        04 89 00 35 00 2C A9 B4 00 01 01 00 00 01 00 00 00 00 00 00 04
            70 6F 70 64 02 69 78 06 6E 65 74 63 6F 6D 03 63 6F 6D 00 00 01
            00 01
FCS         0B 81
Stop        7E

IP Header
VER=4 IHL=5 TOS=0 TOL=64 ID=1 FLG=00 FRO=00 TTL=60 PRO=17 IP_SUM=E031
SRC=206.217.143.31 DEST=199.182.120.203 OPT=00000000
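The decoding of the example header can be reproduced with a few lines of C. The sketch below follows the octet layout of the header format table; the struct, the helper be16 and the function parse_ip_header are hypothetical names introduced for illustration, and options are ignored.

```c
#include <stdint.h>

struct ip_header {
    uint8_t  ver, ihl, tos, flg, ttl, pro;
    uint16_t tol, id, fro, ip_sum;
    uint8_t  src[4], dest[4];
};

/* Read a 16 bit value stored most significant octet first. */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Decode the fixed 20 octet part of an IP header; o[0] is OCTET 1. */
static struct ip_header parse_ip_header(const uint8_t *o)
{
    struct ip_header h;
    uint16_t frag = be16(o + 6);
    int i;
    h.ver    = o[0] >> 4;
    h.ihl    = o[0] & 0x0F;
    h.tos    = o[1];
    h.tol    = be16(o + 2);
    h.id     = be16(o + 4);
    h.flg    = (uint8_t)(frag >> 13);      /* upper 3 bits */
    h.fro    = (uint16_t)(frag & 0x1FFF);  /* lower 13 bits */
    h.ttl    = o[8];
    h.pro    = o[9];
    h.ip_sum = be16(o + 10);
    for (i = 0; i < 4; i++) {
        h.src[i]  = o[12 + i];
        h.dest[i] = o[16 + i];
    }
    return h;
}
```

Applied to the 20 header octets of the example, this yields the field values listed above (for instance TTL=60 from 3C and PRO=17 from 11).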
IP CHECKSUM CALCULATION

The IP Header Checksum is computed on the header fields only.
Before starting the calculation, the checksum fields (octets 11 and 12)
are made equal to zero.

In the example code,
u16 buff[] is an array containing all octets in the header with octets
11 and 12 equal to zero.
u16 len_ip_header is the length (number of octets) of the header.

    /*
    **************************************************************************
    Function: ip_sum_calc
    Description: Calculate the 16 bit IP sum.
    **************************************************************************
    */
    typedef unsigned short u16;
    typedef unsigned long u32;

    u16 ip_sum_calc(u16 len_ip_header, u16 buff[])
    {
        u16 word16;
        u32 sum = 0;
        u16 i;

        // make 16 bit words out of every two adjacent 8 bit words in
        // the packet and add them up
        for (i = 0; i < len_ip_header; i = i + 2) {
            word16 = ((buff[i] << 8) & 0xFF00) + (buff[i + 1] & 0xFF);
            sum = sum + (u32) word16;
        }

        // take only 16 bits out of the 32 bit sum and add up the carries
        while (sum >> 16)
            sum = (sum & 0xFFFF) + (sum >> 16);

        // one's complement the result
        sum = ~sum;

        return ((u16) sum);
    }
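As a worked check, the same calculation can be run on the header of the PPP example. The sketch below is a variant written for this illustration, not the listing above: it takes an octet buffer (uint8_t) instead of one octet per u16 element, and the names ip_checksum and example_hdr are assumptions.

```c
#include <stdint.h>
#include <stddef.h>

/* One's complement checksum over len octets (len is assumed even,
 * which always holds for an IP header). */
static uint16_t ip_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    size_t i;
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    while (sum >> 16)                      /* fold the carries */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Header of the PPP example with octets 11 and 12 set to zero. */
static const uint8_t example_hdr[20] = {
    0x45, 0x00, 0x00, 0x40, 0x00, 0x01, 0x00, 0x00, 0x3C, 0x11,
    0x00, 0x00, 0xCE, 0xD9, 0x8F, 0x1F, 0xC7, 0xB6, 0x78, 0xCB
};
```

The 16 bit words add up to 0x31FCB, the carries fold to 0x1FCE, and the one's complement is 0xE031, so ip_checksum(example_hdr, 20) gives the IP_SUM value of the example.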
TRANSMISSION CONTROL PROTOCOL (TCP)
(RFC 793)
~:Pi~ a ~~ry cle~et J~si9n! o~~ ca~ q!~-kl; 
r-o-'"'t..·.,_e;. TCP ar.:J a 
-:-::!"l~rent. t·~lepho:~·" cctv.::r!:::-ic·r. t--:-o:·:~-:r" :. r 
.llt.l.:t .. - .. 
::: r L 
:-;-. .,:, sen:!-:-r starts v.~ith "H"'ll:.o . :-:~~- _ s;.7-!: ·:.>!·. Ah:. ':". 1..-:- r.:.:i.;:.-nt. 
:-::..."~li.?s 
".'•Jr·:- . H"y I ask vrh•_o ' f' :alling 
i- -='-"=L" This 3 st· f.• 
~: :..-: -ess is 11.-:;-: TCP it i::iat~: c. 
.:·-=n-:ht•·:r ct l.lrt>~s i~ 
~'=''It , r_h,? r8::ir:·io:-11L ans• =-=rs 
:.:-:: l•J i~ r1·J his C~:l:lr ··-:~ 
. Jit;. 
.:::.:! 'i:-:: :;-:-ndsr et ~n···l.=...!::-=: 
~ i:.•io;,nt at1.S' '-:-l 
" :~s . Tltis i~: l\1-.. :-: S! •·:·elf: in:· " 
~~~ntiri~d hims~lr ol 
::. i .::- 1 oint . 
'·lit.:-. 
j e 
.a:-. ; =-t•- .._ .... 
~~:-, .. ·;. 
rl-:'t :.1::. 
I 
~:.;:m h-=zE:, U.-:: c ·t•V·~l s .. ,ticn :ar. :-:·1.:. :.·. · 
t:-.:tr: ei::h.:-r em c·f thc-
):·.::..rti·:s h~-,s clo:-arly un.ler.s:_o•:•:l ··?-.at :h-;-
t?-.i.:- th.ro•Jgh a 
.?.;]ll·:mc.:. <SE':•) nd ar. J..-:}:r.o~·l.;;::;~.:- _::..rr 
·:: ... =- .:.-
'.1.•. 
't-
"- ! -
..... 
:i_ .. _;· u li: .. tl..;-
.... :- e .:'11 ... ~· . - •• ·:;1) . 
u- . . 
• I • rh.:- . !I I .... - .L ...... ..... __ . 
:=~---- ----- --
~ ....... _.:.. I1 tt r 't 
tr::!)SrL.:.ssi: .. c- ... r. a~~ .. tr. .. ) . 
~=~r.;; t.;,j. Ju~t as 
,• - f p -~ r 
·-
• I 
- .. 
.. :1 " ·r 
·=~.:-. · ~ foll.·ll" ... ,r.: ! l r 'J .; f- i ::. !;' i ": L:o • -= - :·. I - • ' • l'C' F • 
u: .. " .. 
t • ~ 
: 
OCTET 1,2           Source Port                                 (SRC PORT)
OCTET 3,4           Destination Port                            (DEST PORT)
OCTET 5,6,7,8       Sequence Number                             (SEQ)
OCTET 9,10,11,12    Acknowledgement Number                      (ACK)
OCTET 13,14         Data offset (4 bit) + Reserved (6 bit) +
                    Control flags (6 bit)                       (DTO, FLG)
OCTET 15,16         Window                                      (WIN)
OCTET 17,18         Checksum                                    (TCP SUM)
OCTET 19,20         Urgent Pointer                              (URP)
OCTET 21,22,23      Options                                     (OPT)
OCTET 24            Padding
OCTET 25,26...      Data
TCP/ IP Packet Example: 
7~ 
.!.£ :::1 u.~ (\ Ot) 4B 57 4 (, er: -- I':: :s 7- --\.. BG E r= DG a~ 
0.:1 !?F "'!! ::.r.. SE f\ .. 
S. 9.:. JU n::. 50 18 :4 . l_j\:_:. 4F ':!: I. 7 r. :--. I • 
~0 7':. :35 7! 75 ()~· 
:2 6S 6~ :;r-; 
' 
F 7 0 
-
.. o: ~~ ~5 - OD J ·..-r.; -F. 
St:art 7F 
~ :.F 21 
!P H.;-c:do:-r 45 or (1(1 t; t- ~~ H • i} F:. 0· ec c· f~ 78 'E 
so 
TSf' H-=a,i.;-r 6E l· 4 9?, ~ SB 
·' 
I) 
. " 
~ 
E-::. -.· ~.c. -~.:- r . ~f~:. 0•1 Uz.\ L.X, ~ 
r ::.t- ~ I. ::. ,. 
. 6t; •. 
' 
,., 
. 
··F ~-' .. , 
~~ (l 
' 
-~ 
.. 
' 
= ~ 7 .. _E ( 
F:.: (-7 D .. 
::tor .. 7E 
TCP Header : 
-
--. 
-I ~--
. J =-=-
. I 
F:.. ~= . . 
~ ~ 
Da t a : 
I ; I 
• .2JL •)E 
' 
t .,~ 
rE . c.s 
~~ 
,. 
TCP CHECKSUM CALCULATION

To calculate the TCP checksum a "pseudo header" is added to the TCP header.
This includes:

IP Source Address         4 bytes
IP Destination Address    4 bytes
TCP Protocol              2 bytes
TCP Length                2 bytes

The checksum is calculated over all the octets of the pseudo header,
TCP header and data.
If the data contains an odd number of octets, a pad octet of zero is added
to the end of the data. The pseudo header and the pad are not transmitted
with the packet.

In the example code,
u16 buff[] is an array containing all the octets in the TCP header and
data.
u16 len_tcp is the length (number of octets) of the TCP header and
data.
BOOL padding is 1 if the data has an odd number of octets, so that a pad
octet is needed, and 0 for an even number.
u16 src_addr[4] and u16 dest_addr[4] are the IP source and destination
address octets.

    /*
    **************************************************************************
    Function: tcp_sum_calc()
    **************************************************************************
    Description:
        Calculate TCP checksum
    */
    typedef unsigned short u16;
    typedef unsigned long u32;
    typedef int BOOL;

    u16 tcp_sum_calc(u16 len_tcp, u16 src_addr[], u16 dest_addr[], BOOL padding, u16 buff[])
    {
        u16 prot_tcp = 6;
        u16 padd = 0;
        u16 word16;
        u16 i;
        u32 sum;

        // Find out if the length of data is even or odd number. If odd,
        // add a padding byte = 0 at the end of packet
        // (buff must have room for one extra element)
        if ((padding & 1) == 1) {
            padd = 1;
            buff[len_tcp] = 0;
        }

        // initialize sum to zero
        sum = 0;

        // remaining steps follow the description above and ip_sum_calc:
        // add up the 16 bit words of the TCP header and data
        for (i = 0; i < len_tcp + padd; i = i + 2) {
            word16 = ((buff[i] << 8) & 0xFF00) + (buff[i + 1] & 0xFF);
            sum = sum + (u32) word16;
        }
        // add the pseudo header: addresses, protocol number and TCP length
        for (i = 0; i < 4; i = i + 2) {
            sum = sum + (((src_addr[i] << 8) & 0xFF00) + (src_addr[i + 1] & 0xFF));
            sum = sum + (((dest_addr[i] << 8) & 0xFF00) + (dest_addr[i + 1] & 0xFF));
        }
        sum = sum + prot_tcp + len_tcp;

        // keep only the last 16 bits of the 32 bit sum and add the carries
        while (sum >> 16)
            sum = (sum & 0xFFFF) + (sum >> 16);

        // one's complement the result
        sum = ~sum;
        return ((u16) sum);
    }
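The pseudo header rule can also be sketched with plain octet buffers. The version below is an illustration under assumptions, not the listing from the page: the names sum_octets and transport_sum are hypothetical, the protocol number is passed as a parameter (6 for TCP, 17 for UDP), and an odd trailing octet is padded inside the function, so the caller needs neither a padding flag nor a spare buffer element.

```c
#include <stdint.h>
#include <stddef.h>

/* Add len octets to a running sum as 16 bit words, most significant
 * octet first; an odd trailing octet is treated as if a zero pad
 * octet followed it. */
static uint32_t sum_octets(uint32_t sum, const uint8_t *p, size_t len)
{
    size_t i;
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    if (len & 1)
        sum += (uint32_t)p[len - 1] << 8;
    return sum;
}

/* Checksum over pseudo header (addresses, protocol, length) plus the
 * transport header and data in seg. */
static uint16_t transport_sum(uint8_t proto, const uint8_t src[4],
                              const uint8_t dst[4],
                              const uint8_t *seg, uint16_t len)
{
    uint32_t sum = 0;
    sum = sum_octets(sum, src, 4);
    sum = sum_octets(sum, dst, 4);
    sum += proto;
    sum += len;
    sum = sum_octets(sum, seg, len);
    while (sum >> 16)                      /* fold the carries */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}
```

A sender calls transport_sum with the checksum field set to zero and stores the result in that field. A receiver can run the same function over the segment as received; with an intact segment the result is 0, because the stored checksum and the sum of everything else add up to 0xFFFF.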
http://www.netfor2.com/Udp.htm
USER DATAGRAM PROTOCOL (UDP)
(See RFC 768)
Us~r DatacraM Pto-J:ol is utili~~rl to s~n~ rl~ta ~tat~=~- ' t 
r1.:-c:r::ss:tt: i l:: 1 .:."='d ~·· J..,~ ·,.:.:oty 
l~liabl.:: . -~i~~ i.iDF r-:.::~:-=~ i~ .:.n:·q:.~ul-::.t-1 ir. "'<n E r·.-:-_- , :. 'i:·. '.lr r, 
is ~n:aps.l:at~J i~ 
a PPf pa-:k-::t. Sotr. ur~ an·i Ii .. :t-.:-= :r.-=d:su:.s oct-=-:..· c.:.:. t-:--:- ::-.;: f-. .. _ . ... 
h~s its FCE c:·~·= 
lt•:·~I-?V.;o.:: t~~iS rr:tn :•! .:!.:_.; •;;t.:alC.I':~I£:0:, th:.t tit,;. :i~'.-:t a11:: t.l~-; 
r··tro::-rt. E:···-=·:-r , 
t :1-:-.1:..:: is ~ F··.-:-sii.i:ic.y r:.it:~" :.· . .:.s -:\.:.~:-:. :k·~s l.:•t b-=-:!.·x~:· 
r. -::ssaa~ s-: :!U.!..:. :-=- 1 · .. '"-
is rc.ttl·t.;.!. ;:·a.rt .... ! ar .. :tr .. ..:.:. z-=- e,':3~~; tna-.. iu~t. La!:·!:..;::-=~ 
c~-:.-stirr:.ci ::-. . ::,i~ .:...osU-:: 
.:..~ adi:e~=~~ ty th~ TC? I ,r .-.--
iJi)l.' is a .::::!.:t! l,;. to: i~•F·l-:::-:-.L'. 1- t.: ~ .l L·e:- 1 1l .·e it: :t-: .. ;-_ r .• -
~=-=~r- trad: .•f -:- ·.-~=-:· 
:-:.:l.ct SE-!·,- <:·!' 1:. :-::i·:.:--: <~ !.! 1- d·:•-:::s r .. ·t !1-;>~::i r-. .:..:.i:..i"t:-
transni~s!~n . s~ca~~~ ~~ 
this it is .ac..;.:1ly :!-:"~iqr •. 
~.~r~ v..::1at. ~:!>.;:. .:as}: _.ls~ 
- :.h-::-
·-; -:-:.d .:t 
;:ill b-=" C•l j"•:•. j•l'"'t:.y :r.u:-:1 _b, •. · it.. U:f lU-;:S:::C:<j-::S cH.:O ;;'=··.,;.r:t:.2.:· ta~:-:- .. 
th~n TCP p:-vii-::i that 
•~!t ·? c:•rr,:nun:!.:;F..t.i:>r. lid: f• .. r.-'_i ·:."' !=r•:·p-=-rl~· - Ui.Jl- i:: o-·.:d.;:· ~.:-.:.:!.:.::.;J .... 
S· no-! DliS t;:--;:;n .i:: !i . :;-.-:- S-;;-:JrCh) 
r.:-qu·?Sts , _, E-.:~L..n:;"" c:ht r. !'&res , ct to a~c-s::. - l-1=: r. ! ... :\'b~!~ vi 
Illl:t:In~t. . 
UDP HEADER FORMAT
OCTET 1,2           Source Port
OCTET 3,4           Destination Port
OCTET 5,6           Length
OCTET 7,8           Checksum
OCTET 9,10...       Data
UDP PACKET E: . .-
(, 
( ! 
"t. 
... 
I {1 71 
~ 
I. • I I 
··-
·l . 
--
UDP CHECKSUM CALCULATION

To calculate the UDP checksum a "pseudo header" is added to the UDP header.
This includes:

IP Source Address         4 bytes
IP Destination Address    4 bytes
Protocol                  2 bytes
UDP Length                2 bytes

The checksum is calculated over all the octets of the pseudo header,
UDP header and data.
If the data contains an odd number of octets, a pad octet of zero is added
to the end of the data. The pseudo header and the pad are not transmitted
with the packet.

In the example code,
u16 buff[] is an array containing all the octets in the UDP header and
data.
u16 len_udp is the length (number of octets) of the UDP header and
data.
BOOL padding is 1 if the data has an odd number of octets, so that a pad
octet is needed, and 0 for an even number.
u16 src_addr[4] and u16 dest_addr[4] are the IP source and destination
address octets.

    /*
    **************************************************************************
    Function: udp_sum_calc()
    Description: Calculate UDP checksum
    **************************************************************************
    */
    typedef unsigned short u16;
    typedef unsigned long u32;
    typedef int BOOL;

    u16 udp_sum_calc(u16 len_udp, u16 src_addr[], u16 dest_addr[],
                     BOOL padding, u16 buff[])
    {
        u16 prot_udp = 17;
        u16 padd = 0;
        u16 word16;
        u16 i;
        u32 sum;

        // Find out if the length of data is even or odd number. If odd,
        // add a padding byte = 0 at the end of packet
        // (buff must have room for one extra element)
        if ((padding & 1) == 1) {
            padd = 1;
            buff[len_udp] = 0;
        }

        // initialize sum to zero
        sum = 0;

        // remaining steps follow the description above and ip_sum_calc:
        // add up the 16 bit words of the UDP header and data
        for (i = 0; i < len_udp + padd; i = i + 2) {
            word16 = ((buff[i] << 8) & 0xFF00) + (buff[i + 1] & 0xFF);
            sum = sum + (u32) word16;
        }
        // add the pseudo header: addresses, protocol number and UDP length
        for (i = 0; i < 4; i = i + 2) {
            sum = sum + (((src_addr[i] << 8) & 0xFF00) + (src_addr[i + 1] & 0xFF));
            sum = sum + (((dest_addr[i] << 8) & 0xFF00) + (dest_addr[i + 1] & 0xFF));
        }
        sum = sum + prot_udp + len_udp;

        // keep only the last 16 bits of the 32 bit sum and add the carries
        while (sum >> 16)
            sum = (sum & 0xFFFF) + (sum >> 16);

        // one's complement the result
        sum = ~sum;
        return ((u16) sum);
    }
Tutorial on TCP/IP Packet Format
• Ethernet headers
• TCP header
• IP header
• C·)DE BITS ,_, ti-:..:) ·:F.S • .:..:-~: EH F:T S'.'': Fit: (Lf.:\PF • ..:-= 
u :r~y !"it:-:ly i]n:·:---=- t!1~ " ch.;~.:l:s_:a .:.:.-or.:. ....... r-'='_r--;t~ : r _ .. ~ ... :r·:!.t:~ 
.arre;S :::on • \ 
~ . l32 . IH: .l 7(l . This is du-:.- to th-= s:-c..:all ... l " ..::!o:--:ke-. :::1:·-~ding ". 
t..:• 
h·•·:-.-:. -:-t:t.:-r-=-al . C•:•r:;/ lit"t.S,. .;.th;.;...;:-.11-·J~· r.., - 0 •)21 J t .. ~ --=- L . r.-cml#~"•O J7U 
~;:::.;, 1 r:;:: l:.yt-'?D :-n vir-:-, ~~ by't:-=. r.:a:':uJ. .. 
il::ri·;al 'l':i.:!\-=- : D-:.-c: ::: , 2(•11_:: ll : 1· : -~ .4l r: .. ~I • DU 
Tin:-:- 0:1-:.lt~ fr·:·m pr:::"i01 .. s p::·::}:"'t : r1.rJo·1 · :'lr,~o f: .. ~<n . :i. 
'l'i1 .• ~ ro;-:a':iv-:- to first pa·.::~:-:-t : n;OO• ~ .. r;c·n··l.: 
Fra;t-:- tlur:-0-=-r : 1 
Pack~~ L-?n~th : E: b~te~ 
Car·:.·tr.; LO?n•Jth : 6::: b~·:-=::; 
•rl-?t. II, E'l- : 0(1 : 0l : C: : '<) : .:-.b : 7:, c-- · 0 : - : 07 : - · 1 
D,;-e-tination : 00 : (,0 : 0-:- : 0- : :::::- : (I' 0(' : ·O : 7 : a : 01 i 
Sourc~ : 00 : 01 : 0~ : $3 : a~ : -- ' : n] : n_ : 83 : -~ : 7z 
Type : If (0;.; 800) 
r: t-=·~ frot..c~•l, Src l..ddr : 1:.:: .. 1:,_ . 1.! . 1- 15 . 13 ... . _ ... _ .• 7 1 , Dst 
: J07 . 202 . ~ ! .1 3~ (207 . 20 . 21~ . 132 
·,.,_rsion : .; 
Hc.a:i~r lt-n '".! : 2C t:~~.:.::. 
Dif~~re~t:at~d S~r~ice~ ?~.ld : 
(•(t(ira : ) . • [liff~r-E-r:1_iat.;,.i _.;:.r- i.e(: 
. .. : .. r . - E•:tr-C-::!.oa: J.o. Trc..sy:-~t_ 
l - ..... · •·. ;! tl : 4 '3 
Id-;-n- i:<":.:-::ti.:·;, : (l:-:-~~.:;. 
Fl :';F : ( :-: r ~ 
. 1..- [..rt ' t fra;r. . .;;-r,r: : ;:;._ .. 
. · ' . - i::.:- • frc.":tt.-~nts : :l:.t c-_, 
I' : 
. -
.. I 
•. 
.. '": ... - -· . . i ~ 
: :..:.! : . . lo: ; Eet:: O:·:l•v 
_in : · t" :1- { r\:-:0 ·,; 
111·.'..1.·· 
Destination port: 1729 (1729)
Sequence number: 3351904732
Acknowledgement number: 3639616448
Header length: 28 bytes
Flags: 0x0012 (SYN, ACK)
0... .... = Congestion Window Reduced (CWR): Not set
.0.. .... = ECN-Echo: Not set
..0. .... = Urgent: Not set
...1 .... = Acknowledgment: Set
.... 0... = Push: Not set
.... .0.. = Reset: Not set
.... ..1. = Syn: Set
.... ...0 = Fin: Not set
Window size: 16384
Checksum: 0xb168 (correct)
Options: (8 bytes)
~ 
··oo i)U 1 ; (\~ 83 c~b '"l"i .... 01.' (t l 97 2ct E·3 -· !4 : r . ~ -:::0 l)(i 0 0 0 0 .l 0 0 0 -
·::: . . E . 
l.: 10 1)(1 ... c 97 Ocj Oo (I (I ~.-.... ~ (lc~ l e 4.::: •:f :::t ( : •"" t: ~1 :~A 8.1 
. , . • I . N. . 
('~(I -, 4 oe1 1•0 17 or cl c7 cc. 01 de do .:: Cl I 12 ~· u 
.. :J: . . 
I 
·:;o 40 G•J bl 68 1)0 (10 0~ ·)~ 0~ b4 00 1! . i.. 
::=<-~-:1-= 3 (::.4 b:·tes .1. "lir:"'" , 50: b:·::-=.=: ·:ar::.·---=·· 
_::.:.:-i·.-~~ Tim-e : D.:>c 2, .?00.2 l! : H : ::'L:578_:_ 
T:m~ d~lta fie~ pr~vioJe p~~~~~ : I = ,0, • s~~r~ 
':"i!'!o:. re-lati·:~ to first p~c!-~t : ~ · :-:·tc:i 
F.:-ar-.o:- r:u:.,ber : 3 
Ee:ck-:-t. L!2ngt.h : 54 byt.;:s 
Ce:pt1Jl e Length : 5-. t·yt ~ 
=:::h-.::rr.-::.t II, Src : 00 : • : l'~ : f.!j : b : 72 , D£:. : C· : 00 : {:: : 07 : .. : tJl 
0-::~tin:~+;io:-. : A : 1 • : : : '•t : :::t.: : Ol (c.O : ' • : ~·. : 07 : · : f'\1) 
s:·:r:c·· : '10 : 1~ : •: : : ,} : -.:. ,no :o:. :o:: : : : c.:~ : 
'; • ~·-= ; { t I • I I' 
;-.-:=-r!. t r:,t.~-ol , ~l· • • --..l:t : .:.::' . 132 . !..,: . _-
.~: .. :· :1 r : 2. (, -t' . .l.. v _ .. : 1 ;1 • 1 ~.: t .. , t I .. "'"': : .. .:: l ~ .. .:. _: 
\'-:-rsi\:r1 : ,~ 
.. 
•I 
. '
: 
l 
.. 
.. 
. l 0. - 0 1 
, '". 
I 
·'. 
1 1 ~ 
·r : 
,. 
tl 
... 0 
£C!l-FC!ic• : !io'- .::-:-t. 
Ut-;:r·::n• : llc·':. sr..-:. 
::-.d:nc• ::."'dgillo?nt : t:c•: S?-
' . • • i: us! : :lot ~.::t 
t • • . -
.. l . . :·r, : 3 
.... n :-i11 : 1: .. ~~'t.. 
~·:it,.·!_.; .si:.;, : 1 G36-! 
('h·:-d:.; 'JIT: : 0::601 ';~ ( CC•l' r-: ':: t) 
.pt~:!-..s : (8 l:t)·t-=.-sl 
;:~:-:ir.t:J:" ~-=-.:· Itt i- .. · 146() }-.yt.;.; 
:. p 
., : 
- " i>·: !" ::'.l '+: ~ .. .: ·.z 
1) n: 7 .:. · ·11 
........... t .. s . 
: 11 1'· Ot :~·1 4-::.-0-:. 1!:) t)U 8o JG l't) t)•: 9~ :4 O:o-! 8•:\ o;f ·-e 
J ~~ ~ ,_: • . 
•1, d• -.J OG :1 I) I) _-, 1' .. 
. - . • I=· · 
13• ~I) 0 60 19 0 ' 1)_ . 
: :: ::: ..- _ i: )' t -::- t ·r: .. i ~ .:: 1 
;!:: i·:d Tir~ : D-e> 
:ir .".;.lt.a :::""' 
Ti::~· .-:..._ati·:-=- t 
?r _,r,l':' t:umL-:-l : ,_ 
r "'r-k._: L.;n :.i .. L : \j 
!'0 ,.3. bl: (1(1 r 0 ( (l(l 70 (1,2 
•)4 I 5 1:-~ (Jl '1 .. , .. ... 02 
GO tyt~2 ~a~:ure~) 
~? 11 : }= : 2: . 6,~34~ 0 I 
·~ p·~·=-~=-· : 0 . .:::Sl75500 ~~c ... : • .::.o 
l . :~:7550UO s-c ... nj~ 
. =:: -I • • L- :···]t.b: :-.. r.-;+ -5 
..  .: 
1.:' · - . 
, L~-:. : I) : 01 : 2 : B-.> : a;: : 7.: 
:yr. ·: IP (f:;.08 s 
:'tail=-;. : 0010 
; r.-\ .. C• :r: , 
.. 
.. . -J-... .._ .. 
: 1 
! •. f ... 
I • 
It- • 
Fj-=-ld : 
·~j 
1·" 
I • 
OC : )l : 0~ : 83 : ab : 72' 
:·:-:0•. (1.~.:.·,..,, OY· •• : r ~:'f'. lt ; :: !I : .. :11• 
:·-=-r .-:.._...-s ,-, I•·F··iltt : --u:~·-1 It•:·: ·• 
Transmission Control Protocol, Src Port: 1729 (1729), Dst Port: telnet
(23), Seq: 3639616448, Ack: 3351904733, Len: 0
Source port: 1729 (1729)
Destination port: telnet (23)
Sequence number: 3639616448
Acknowledgement number: 3351904733
Header length: 20 bytes
Flags: 0x0010 (ACK)
0... .... = Congestion Window Reduced (CWR): Not set
.0.. .... = ECN-Echo: Not set
..0. .... = Urgent: Not set
...1 .... = Acknowledgment: Set
.... 0... = Push: Not set
.... .0.. = Reset: Not set
.... ..0. = Syn: Not set
.... ...0 = Fin: Not set
Window size: 17520
Ci:,:.:k.:utu : 0:-:··l:•(\:-' ,:_,,~.:rr-:-.:t , s!'o:.ul.:: J:.o? ;o;:;.,J:7.) 
r<, 1 1).: I 
"'' ct: 01 II , ('1 ')2 
- .~ at 0 ~c 
. r . . E. 
r (· ~.;., 4·.: (l: .,!) ()I E'O l) ·5 Ill) :) ~)-= =·~ ~0"' a:t : f _._. 
I I 
':C1 
on:c: 0 
( :'1'21) 
·=•·., 
:~ oc cl 'u 17 j.; :• -~ :L ~ -; ::-?, .:i :.o l -
. . 11 . • P • 
l.) .,,! . ('! Cl 00 0 
?.ta:n~ ., bj·t-- on ··ir.-? , 60 !.yt~.: :a1_::.t4~ 
;1:~·~1 Time : o~~ ., 2v •- ll : lS : 22 . ~37 
Tin~ -tlta I!~ ~r vious pac~~~ : ' . :SF~-: ( se~on~ 
'L&.e :eLo'tth·o: to first packst : . 5.:.076~ .. t""I:cond 
Fr:a:~-= Hum~ t : ~ 
F -.:.i:~t. I.Hn ;;t: 1: 60 l:yt- ~ 
C-'=?':.U!.> L ., 1tit : lit byt:-:=s 
E·- ~1-:-r: . .::-- :' ,:. : OU : C : 97 : :J : •:.:; : ;~, D.::. : C. : ':'.: : .-.:: : 83 : 
L':!'r.':·. r-·-·· : Jfl : C1 : (_ : ~:3 : ~i: :- :. (Ott : l: r : ·: : .!.·: .,2) 
s--.--:· : L- :· . : .,.: 
:ro 
L 
: 
r. 
. -- .. 
. 
. 0- ' [·~-
D~stination : 158 . 1JL . ~1- . 17' 15~ . ::~ . ~~~ - --
Transcission Con~rol Fr·• 
1 1729), S-?q : 33~l~ti47.: I 
S.:·urc.;- port : t_,-:.lt.- .. 
[ ?s~ .i:t,ati·:>n t= :.rt : 
S -:-r;uo211C o2 nunt·- t : 
N~:~t. ::-;-'".:JII.:O!IC-:- !aU 
A-::knov;l.:-d 3.::-r::.::nt 1. .:·L-:r : 
Hea1~r l~ngLh : ~u t~t~z 
Fla9s : O:·:ti018 'L':-! 1 !-.r::· 
11 = r .. ~ ··::-t ·. •r: . : ... 
. . ~ ·' 
. 0 .. 
. . 0 . 
E-i:-E-h : k- ~J:. 
= U:~en~ : ~~~ ~~· 
... 1 = A · ··n ·,· ;l.:.:i;;i'·.:::.:. : 
1 ... = r·.i, . S-t. 
. () .. 
.. • I) . ~:·:: : I .. ,~ ~~·-
1.1 F~a : t:-.- <-: ... ·i 
Winje~ si~~ : !75_ 
Cho2cksuM : u~~~u~ -·zt t) 
!.::ln.;,t 
Cc>Uuuand : Do .:.utl.e::· i · .t.:. :·n 
~::' )0 00 (! 02 ' a!; - 1 .. 
-::::! . . E . 
I .1:1 (''.) 2b en 
·'' 
n 16 
,. 
. - . ". • I . . 
(•I .:.0 ;• . a a IJO 1'7 or ca 
.. • F. . f. 
I i .. o ~e 70 9f ft r ) tf 0 
,.-
.. ~ 
1 
,_ 
.:.::. 
- ·t 
..::-, 
i 
~ .. 
-·-:: 
·- . 
..... 
;t d· ~., 
.. F 1... ·, 
. . 
-:: 
. 
I • 
Flow control is implemented to control the flow of packets relating to a single call
in order to overcome the difference between the rate at which a source system
sends packets and the rate at which a destination can accept packets. If the
destination can accept packets faster than the source can send them, clearly there
is no problem. However, if the reverse is true a harmonization (flow control)
function must be provided.
Congestion control is concerned with a similar function within the network
itself. If the composite rate at which packets enter the network exceeds the rate at
which packets leave, then the network becomes congested. Similarly, at a more
local level, if packets arrive at a network node (for example, an IS) faster than
they can be processed and forwarded, then the node becomes congested, thus
affecting the flow of packets relating to all calls through that node.
With a connection-oriented network such as X.25, flow control is performed
on a VC basis across the local DTE-DCE and DCE-DTE interfaces. A send
window is defined and when this number of packets has been sent (typically two),
the sender must wait until an acknowledgment relating to either of them is
received. Since this function is being performed at the periphery of the network on
a per call basis, in addition to regulating the flow of packets into the network, it
helps to control congestion. However, it does not prevent congestion completely.
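The send window rule can be made concrete with a small C sketch. It is a simplified model written for illustration: the struct and the functions can_send, on_send and on_ack are hypothetical names, and sequence numbering and timers of a real X.25 packet layer are left out.

```c
#include <stdbool.h>

struct send_window {
    unsigned size;         /* packets allowed in flight (typically two) */
    unsigned outstanding;  /* packets sent but not yet acknowledged */
};

/* A new packet may be sent only while the window is not full. */
static bool can_send(const struct send_window *w)
{
    return w->outstanding < w->size;
}

/* Record a transmission; returns false if the sender has to wait. */
static bool on_send(struct send_window *w)
{
    if (!can_send(w))
        return false;
    w->outstanding++;
    return true;
}

/* An acknowledgment for one outstanding packet reopens the window. */
static void on_ack(struct send_window *w)
{
    if (w->outstanding > 0)
        w->outstanding--;
}
```

With a window of two, a third on_send is refused until on_ack has been called for one of the first two packets.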
In contrast, with a connectionless network no flow control is applied to
packets associated with a call within the network. Instead it is left to the transport
protocol entity within each ES to perform flow control on an end-to-end basis. If
congestion occurs within the network, flow control information is delayed and the
source transport protocol entities stop sending new data into the network. Again,
although stopping the flow of new data helps to relieve network congestion, as with a
connection-oriented network, it does not always avoid it. Therefore, with both
schemes we must incorporate a congestion control algorithm within the network.
Moreover, for internets comprising multiple network types, the congestion control
algorithm must harmonize between the different network algorithms.
Error reporting
The way in which errors are reported varies from one network type to another.
Consequently, we must establish a means of error reporting across multiple
networks.
All these problems must be addressed by any internetworking solution.
9.3 Network layer structure
The role of the network layer in each ES is to provide an end-to-end, internetwide
network service to its local NS_user(s). This can be either a CONS or a CLNS. In
both cases the NS_users should be unaware of the presence of multiple, possibly
different, network types. Hence the routing and all other functions relating to the
relaying of NSDUs must be carried out in a transparent way by the network layer
entities in each of the end and intermediate systems.

Figure 9.6 Network layer structure: (a) sublayer protocols; (b) IS
(SNICP = Subnet independent convergence protocol, SNDCP = Subnet dependent
convergence protocol, SNDAP = Subnet dependent access protocol, ES = End system,
IS = Intermediate system)
To achieve this goal, in the context of the ISO reference model the network
layer in each ES and IS consists not just of a single protocol but rather of three
(sublayer) protocols, each performing a complementary role in providing the
network layer service. In ISO terminology, each network that makes up an
internet is known as a subnet and hence the three protocols are known as follows:
• Subnetwork independent convergence protocol (SNICP)
• Subnetwork dependent convergence protocol (SNDCP)
• Subnetwork dependent access protocol (SNDAP)
The relative position of the three protocols in an ES is given in Figure 9.6(a); part
(b) shows the protocols in relation to an IS.
The SNICP [...] the network service provided to NS_users to
interface with the internet. Its role is to carry out the various harmonizing
(convergence) functions which may be necessary to route and relay user data
(transport protocol data units) across the internet. Its operation is independent of
the characteristics of the specific subnets (networks) used in the internet and
assumes a standard network service from them.
The SNDAP is the access protocol associated with a specific subnet
(network) in the internet. Examples are the X.25 packet layer protocol for an
X.25 network and the connectionless network protocol that is often used with
LANs. Because the service and operational characteristics associated with
SNDAPs differ from one network type to another, an intermediate sublayer must
be provided between the SNICP and the SNDAP. This is the role of the SNDCP.
Clearly, the detailed mapping operation that it performs will vary for different
subnet/network types.
9.4 Internet protocol standards 
As we discussed in Chapter 8, multiple X.25 WANs can be interconnected using
X.75-based gateways. The introduction of a standard specifying the operation of
the X.25 packet layer protocol for use with LANs means that one approach to
internetworking is to adopt X.25 as an internetwide protocol. The latter can be
operated in either a connection-oriented mode or in a pseudoconnectionless mode
by using fast select.
This solution has the appeal that the various internetworking functions are
much reduced. The disadvantage is that the overheads associated with X.25 packet
switching are high and hence the packet throughput of these networks is relatively
low. This is also true with fast select, since the same VC/error control functions are
still used. Moreover, the much improved bit error rate performance of the new
generation of WANs, such as ISDN, means that frame relay and cell (fast packet)
switching will be the preferred operational modes rather than conventional
packet switching.
The solution adopted by ISO is based on a connectionless internet service
and an associated connectionless SNICP. The SNICP is defined in ISO 8473; it is
based on the internet protocol that has been developed as part of research into
internetworking funded by the US Defense Advanced Research Projects Agency
(DARPA). The early DARPA internet - ARPANET - was used to interconnect
the computer networks associated with a small number of research and university
sites with those of DARPA. When it first came into being,
ARPANET involved just a small number of networks and associated host
computers. Since that time, the internet has grown steadily. Instead of a
small number of mainframe computers at each site, there are now large numbers
of workstations. Moreover, the introduction of LANs means that there are now
several thousand networks/subnets. ARPANET is now linked to other internets.
The combined internet, which is jointly funded by a number of agencies, is
known simply as the Internet.
Figure 9.7 Internetwide IP.
[Figure legend: IS = Intermediate system; GW = Gateway; ES = End system;
IP = Internet protocol; AP = Application process; TL = Transport layer]
The internet protocol is only one protocol associated with the complete
protocol suite (stack) used with the Internet. The complete suite, known as
TCP/IP, includes transport and application protocols which are now used as the
basis of many other commercial and research networks. All the TCP/IP
specifications are publicly available, as a result of which the Internet is by far the
largest currently operational internet based on open standards. The two protocols
we shall discuss in this chapter are the internet protocol associated with the
Internet - known as the Internet IP or simply IP - and the ISO Internet Protocol
known as ISO-IP or ISO CLNP, which is intended for use with OSI stacks. The
general approach of both standards is illustrated in Figure 9.7.
IP is an internetwide protocol that enables two transport protocol entities
resident in different ESs/hosts to exchange message units (NSDUs) in a
transparent way. This means that the presence of multiple, possibly different,
networks/subnets and ISs/gateways is completely transparent to both
communicating transport entities. As the IP is a connectionless protocol, message
units are transferred using an unacknowledged best-try approach.
Although the operational features associated with ISO CLNP are based on
experience gained from the evolution and use of IP, there are differences both in
terms of terminology and operational detail. Hence we shall discuss each protocol
separately.
9.5 Internet IP 
TCP/IP is now widely used in many commercial and research internets in addition
to the Internet. Nevertheless, almost all the protocols associated with TCP/IP
have been researched and developed as part of the Internet. Indeed, new protocols
are introduced relatively frequently as research associated with the combined
[...] part of all TCP/IP implementations. Other optional protocols are intended for
open systems of varying size and complexity. We shall consider only the core
protocols.
9.5.1 Address structure 
Figure 9.8 IP address formats: (a) frame; (b) subnet addressing; (c) modified class B
address
[Figure content: (a) class A = 0, netid (7 bits), hostid (24 bits); class B = 10,
netid (14 bits), hostid (16 bits); class C = 110, netid (21 bits), hostid (8 bits);
class D = 1110, multicast address (28 bits); remaining format reserved.
netid = network identifier; hostid = host identifier. (b) Class A/B/C address
divided into an internet routing part (netid) and a local part (subnetid, hostid).
(c) Class B address: 10, netid, subnetid, hostid.]
Recall that there are two network addresses associated with a host/ES attached to
an internet. In ISO terminology, these are the network service access point
(NSAP) address and the subnet point of attachment (SNPA) address. With
TCP/IP, these are the IP address and the network point of attachment (NPA)
address, respectively. The NPA address is different for each network/subnet type
whereas the IP address is a unique internetwide identifier. The structure of an IP
address is shown in Figure 9.8. In order to give the authority establishing the
internet some flexibility in assigning addresses, the address structure shown in part
(a) has been adopted.
To ensure that all hosts have a unique identifier, a 32-bit integer is used for
each IP address. Then three different address formats are defined to allow for
[...]. The
primary classes are A, B, and C; each is intended for use with a different size of
network. The class to which an address belongs can be determined from the
position of the first zero bit in the first four bits. The remaining bits specify two
subfields - a network identifier (netid) and a host identifier (hostid). The subfield
boundaries are located on byte boundaries to simplify decoding.
Class A addresses have 7 bits for the netid and 24 bits for the hostid;
class B addresses have 14 bits for the netid and 16 bits for the hostid; and class C
addresses have 21 bits for the netid and 8 bits for the hostid. Class A addresses are
intended for use with networks that have a large number of attached hosts (up to
2^24) while class C addresses allow for a large number of networks each with a
small number of attached hosts (up to 256). An example of a class A network is
ARPANET; an example of a class C network is a single site-wide LAN.
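The decoding rule described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name classify is an assumption made here, and the netid it returns excludes the leading class bits, so it holds 7, 14, or 21 bits.

```python
def classify(addr: int):
    """Return (class, netid, hostid) for a 32-bit classful IP address.

    The class is found from the position of the first zero bit in the
    first four bits; netid and hostid are the remaining subfields.
    """
    if not 0 <= addr <= 0xFFFFFFFF:
        raise ValueError("address must be a 32-bit unsigned integer")
    if addr >> 31 == 0b0:          # 0...   class A: 7-bit netid, 24-bit hostid
        return "A", (addr >> 24) & 0x7F, addr & 0xFFFFFF
    if addr >> 30 == 0b10:         # 10..   class B: 14-bit netid, 16-bit hostid
        return "B", (addr >> 16) & 0x3FFF, addr & 0xFFFF
    if addr >> 29 == 0b110:        # 110.   class C: 21-bit netid, 8-bit hostid
        return "C", (addr >> 8) & 0x1FFFFF, addr & 0xFF
    if addr >> 28 == 0b1110:       # 1110   class D: multicast, no netid/hostid
        return "D", None, None
    return "reserved", None, None  # 1111   reserved
```

For instance, classify(0x0A000000) treats the address 10.0.0.0 as class A with netid 10 and hostid 0.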
An address with a hostid of zero is used to refer to the network in the netid
field rather than to a host. Similarly, an address with a hostid of all 1s refers to all
hosts attached to the network in the netid field or, if the latter is all 1s also, then to
all hosts in the internet. Such addresses are used for broadcast purposes.
To make it easier to communicate IP addresses, the 32 bits are broken into
four bytes. These are converted into their equivalent decimal form with a dot
(period) between each. This is known as dotted decimal. Example addresses are as
follows:
00001010 00000000 00000000 00000000 = 10.0.0.0 = class A
                                    = netid 10 (ARPANET)
10000000 00000011 00000010 00000011 = 128.3.2.3 = class B
                                    = netid 128.3, hostid 2.3
11000000 00000000 00000001 11111111 = 192.0.1.255 = class C
                                    = all hosts broadcast on netid
                                      192.0.1
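The conversion between the binary and dotted decimal forms is mechanical, as the following Python sketch suggests. The helper names to_dotted and from_dotted are assumptions chosen for this illustration.

```python
def to_dotted(bits: str) -> str:
    """Convert a 32-bit binary string (spaces allowed) to dotted decimal."""
    bits = bits.replace(" ", "")
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 32 binary digits")
    # Each group of 8 bits becomes one decimal number.
    return ".".join(str(int(bits[i:i + 8], 2)) for i in range(0, 32, 8))


def from_dotted(text: str) -> int:
    """Convert dotted decimal back to a 32-bit integer."""
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError("expected four decimal fields")
    value = 0
    for part in parts:
        octet = int(part)
        if not 0 <= octet <= 255:
            raise ValueError("each field must be in the range 0-255")
        value = (value << 8) | octet
    return value
```

Applied to the three example bit patterns above, to_dotted gives 10.0.0.0, 128.3.2.3, and 192.0.1.255.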
Class D addresses are reserved for multicasting. In a LAN, a frame may be
sent to an individual, broadcast, or group address. The last one allows a group of
hosts - for example, workstations - that are cooperating in some way, to arrange
for network transmissions to be sent to all members of the group. This is often
referred to as computer-supported cooperative working (CSCW); class D addresses
allow this mode of working to be extended across an internet.
Although this basic structure is adequate for most addressing purposes, the
introduction of multiple LANs at each site can mean unacceptably high overheads
in terms of routing. As Chapter 7 described, MAC bridges are normally used to
interconnect LANs of the same type. This solution is attractive for routing
purposes, since the combined LAN then behaves like a single network. When
interconnecting dissimilar LAN types, the differences in frame format and, more
importantly, frame length, mean that routers are normally used since the
fragmentation and reassembly of packets/frames is a function of the network
layer rather than the MAC sublayer. However, the use of routers means that each
LAN must have its own netid. In the case of large sites, there may be a large
number of such LANs.
This means that with the basic addressing scheme, all the routers relating to
a site need to take part in the overall internet routing function. The efficiency of
any routing scheme is strongly influenced by the number of routing nodes that
make up the internet. The concept of subnets has been introduced to decouple the
routers - and hence routing - associated with a single site from the overall internet
routing function. Essentially, instead of each LAN associated with a site having
its own netid, only the site is allocated an internet netid. The identity of each
LAN then forms part of the hostid field. This refined address format is shown in
Figure 9.8(b).
The same address classes and associated structure are used, but the netid
now relates to a complete site rather than to a single network. Hence, since a
single gateway attached to a local site network performs internetwide routing, the
netid is considered as the internet part. For a single netid with a number of
associated subnetworks the hostid part consists of two subfields: a subnetid part
and a local hostid part. Because these have only local significance, they are known
collectively as the local part.
Because of the possibly wide range of subnets associated with different site
networks, no attempt has been made to define rigid subaddress boundaries for
the local address part. Instead, an address mask is used to define the subaddress
boundaries for a particular network (and hence netid). The address mask is kept
by the internet gateway and the routers at the site. It consists of binary 1s in
those bit positions that contain a network address - including the netid and
subnetid - and binary 0s in positions that contain the hostid. Hence an address
mask of
11111111 11111111 11111111 00000000
means that the first three bytes (octets) contain a network/subnet identifier and the
fourth octet contains the host identifier.
For example, if the address is a class B address - a zero bit in the second bit
position - this is readily interpreted as: the first two octets are the internetwide
netid, the next octet the subnetid, and the last octet the hostid on this subnet. Such
an address is shown in Figure 9.8(c).
Dotted decimal is normally used to define address masks, in which case the
above mask is written:
255.255.255.0
Byte boundaries are normally chosen to simplify address decoding. Hence with
this mask, and assuming the netid was, say, 128.10, then all the hosts attached
to this network would have this same internet routing part. The presence of a
possibly large number of subnets and associated routers is thus transparent to the
internet gateways for routing purposes.
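To make the role of the address mask concrete, here is a minimal Python sketch that splits a class B address into its internet part, subnetid, and hostid. The function name split_class_b and the sample host address 128.10.4.7 are assumptions for illustration; the sketch also assumes the 1 bits of the mask are contiguous. It uses the standard ipaddress module only to parse dotted decimal.

```python
import ipaddress


def split_class_b(address: str, mask: str = "255.255.255.0"):
    """Split a class B address into (netid, subnetid, hostid) using a mask.

    netid is returned in dotted form (the first two octets); subnetid and
    hostid are integers taken from the 16-bit local part.
    """
    addr = int(ipaddress.IPv4Address(address))
    mask_bits = int(ipaddress.IPv4Address(mask))
    if addr >> 30 != 0b10:
        raise ValueError("not a class B address")
    netid = f"{addr >> 24}.{(addr >> 16) & 0xFF}"
    # Binary 0s in the mask mark the hostid positions within the local part.
    host_mask = ~mask_bits & 0xFFFF
    hostid = addr & host_mask
    # The remaining 1 bits of the mask in the local part mark the subnetid.
    subnetid = (addr & mask_bits & 0xFFFF) >> host_mask.bit_length()
    return netid, subnetid, hostid
```

With the mask 255.255.255.0, the address 128.10.4.7 splits into netid 128.10, subnetid 4, and hostid 7. A mask that does not fall on a byte boundary, such as 255.255.240.0, works in the same way.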
To ensure IP addresses are unique, they must be assigned by the central
authority that is setting up the open system environment. For a small internet, this
is relatively straightforward. However, in the case of large internets, such as
[...] allocate netids and multicast addresses. Secondly, an authority associated with
each network assigns hostids on that network. The central authority for the
Internet is known as the Network Information Center (NIC).
9.5.2 Datagrams 
Before we consider the various functions and protocols associated with the IP, let
us describe the format of an IP data unit. This is known as a datagram. The format
and contents of a datagram are shown in Figure 9.9.
The version field contains the version of the IP used to create the datagram
and ensures that all other systems - gateways and hosts - that process the
datagram during its transit across the internet interpret the various fields correctly.
The current version number is 4 and is referred to as IP version 4, or, simply, IPv4.
The header can be of variable length. The header length specifies the actual
length of the header in multiples of 32-bit words. The minimum length -
without options - is 5. If the datagram contains options, these must be in multiples
of 32 bits. Any unused bytes must be filled with padding bytes.
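As a small illustration of these two fields, the sketch below decodes the first octet of a header. It assumes the IPv4 layout in which the version occupies the upper four bits and the header length the lower four bits of that octet; the function name is hypothetical.

```python
def parse_version_and_header_length(first_octet: int):
    """Return (version, header length in octets) from the first header octet."""
    if not 0 <= first_octet <= 0xFF:
        raise ValueError("expected a single octet")
    version = first_octet >> 4            # upper 4 bits
    length_words = first_octet & 0x0F     # lower 4 bits, in 32-bit words
    if length_words < 5:
        raise ValueError("header length below the minimum of 5 words")
    return version, length_words * 4      # 4 octets per 32-bit word
```

A first octet of 0x45 therefore denotes version 4 with the minimum header of 5 words, that is, 20 octets.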
[Figure 9.9: datagram format and contents. Header fields, in 16-bit rows: Version,
Header length, Type of service; Total length; Identification; D and M flags,
Fragment offset; Time-to-live, Protocol; Header checksum; Source IP address;
Destination IP address; Options. The header is followed by the Data field
(≤ 65 536 bytes). Type of service bits: Priority (0-7), Low delay, High throughput,
High reliability.]
The type of service field [...]
networks. It allows an application process to specify the preferred attributes
associated with the route and is, therefore, used by each gateway during route
selection. For example, if a reliable delivery service is preferred to a best-try
transfer, then given a choice the gateway should choose a connection-oriented
network rather than a connectionless network. The total length defines the total
length of the datagram including the header and user data parts. The
maximum length is 65 536 bytes.
As we shall explain in Section 9.5.4, user messages may be transferred across
the internet in multiple datagrams, with the identification field being used to allow
a destination host to relate different datagrams to the same user message.
The next three bits are known as flag bits of which two are currently used.
The first, known as the don't fragment or D bit, is again intended for use by
intermediate gateways. A set D bit indicates that a network should be chosen that
can handle the datagram as a single entity rather than as multiple smaller
datagrams - known as fragments. Hence if the destination host is connected to
that network (or subnet) it will receive the user data in a single datagram or not at
all. The transit delay of the user data can therefore be more accurately quantified.
The second flag bit, known as the more fragments or M bit, is used
during the reassembly procedure associated with user data transfers involving
multiple datagrams. The fragment offset is also used by the same procedure to
indicate the position of the (data) contents of the datagram in relation to the initial
user data message. We shall describe the reassembly procedure in Section 9.5.4.
The time-to-live value defines the maximum time for which a datagram can
be in transit across the internet. The value, in seconds, is set by the source IP. It is
then decremented by each gateway by a defined amount. Should the value become
zero, the datagram is discarded. This procedure allows the destination IP to wait a
known maximum time for a datagram fragment during the reassembly procedure.
It also enables datagrams that are looping to be discarded.
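The time-to-live handling at a gateway can be pictured with the following Python sketch. The function name and the rule that every gateway subtracts at least 1 second are assumptions made for this example; the text only states that the value is decremented by a defined amount.

```python
def decrement_time_to_live(ttl_seconds: int, transit_delay_seconds: int):
    """Return the remaining lifetime, or None if the datagram is discarded.

    Assumption: each gateway subtracts the transit delay of the network
    just crossed, and never less than 1 second.
    """
    remaining = ttl_seconds - max(1, transit_delay_seconds)
    return remaining if remaining > 0 else None
```

A looping datagram is eventually discarded under this rule, because every pass through a gateway reduces its lifetime by at least one unit.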
More than one protocol is associated with the TCP/IP suite. The protocol
field is used to enable the destination IP to pass the datagram to the required
protocol.
The header checksum, which applies just to the header part of the datagram,
is a safeguard against corrupted datagrams being routed to incorrect destinations.
It is computed by treating each 16-bit field as an integer and adding them
together using 1's-complement (end-around-carry) arithmetic. The checksum is
then the 1's-complement (inverse) of the sum.
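The checksum rule can be written out directly. The Python sketch below is illustrative (the function name is an assumption): it adds the header as 16-bit integers with end-around carry and returns the inverse of the sum. It assumes the checksum field itself is set to zero while the checksum is being computed.

```python
def header_checksum(header: bytes) -> int:
    """1's-complement checksum over a header given as an even number of octets."""
    if len(header) % 2:
        raise ValueError("header must contain a whole number of 16-bit fields")
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]   # next 16-bit field
        total = (total & 0xFFFF) + (total >> 16)    # end-around carry
    return ~total & 0xFFFF                          # inverse of the sum
```

A receiver can apply the same function to the header as received, checksum field included; an undamaged header then yields zero.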
The source address and destination address are the internetwide IP
addresses of the source and destination hosts.
Finally, the options field is used in selected datagrams to carry additional
information relating to the following:
• Security The data field may be encrypted for example, or made
accessible only to a specified user group.
• Source routing If known, the actual route to be followed through the
internet may be specified in this field as a list of gateway addresses.
• Route recording This field is used by each gateway visited during the
passage of a datagram through the internet to record its address. The
resulting list can be used, for example, in the source routing field of
subsequent datagrams.
• Stream identification This enables a source to indicate the type of data
being carried in the datagram if this is not computer data, for example,
samples of speech.
• Timestamp If present, this is used by each gateway along the path followed
by the datagram to record the time it processed the datagram.
9.5.3 Protocol functions
The IP provides a number of core functions and associated procedures to carry
out the various harmonizing functions that are necessary when interworking
across dissimilar networks. These include the following:
• Fragmentation and reassembly This concerns the transfer of user messages
across networks/subnets which support smaller packet sizes than the user
data.
• Routing To perform the routing function, the IP in each source host must
know the location of the internet gateway or local router that is attached to
the same network or subnet. Also, the IP in each gateway must know the
route to be followed to reach other networks or subnets.
• Error reporting When routing or reassembling datagrams within a host or
gateway, the IP may discard some datagrams. This function is concerned
with reporting such occurrences back to the IP in the source host and with a
number of other reporting functions.
We shall discuss each of these functions separately.
9.5.4 Fragmentation and reassembly
The size of the user data - normally referred to as an NSDU - associated with an
NS_user request can be up to 64K or 65 536 octets (bytes). The maximum packet
sizes associated with different types of network are much less than this, ranging
from 128 octets for some X.25 packet switching networks to over 8000 octets for
some LANs. The fragmentation and reassembly functions associated with the IP
fragment the NSDU associated with an NS_user request into smaller fragments -
segments in ISO terminology - so that they can be transferred across a particular
network in appropriately sized datagrams. On receipt of the fragments of data
relating to the same NSDU contained in each IP datagram, the IP reassembles the
NSDU before passing it on to the destination NS_user.
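The reassembly step can be sketched as follows in Python. This is a simplified illustration, not the procedure of any particular implementation: the function name and the tuple layout are assumptions, the fragment offset is taken to be in units of 8 octets (as in IPv4), all fragments are assumed to carry the same identification value, and duplicate or overlapping fragments are not handled.

```python
def reassemble(fragments):
    """Rebuild an NSDU from (fragment_offset, more_fragments, data) tuples.

    Returns the reassembled bytes, or None while fragments are still missing.
    """
    parts = sorted(fragments, key=lambda fragment: fragment[0])
    # The final fragment is the one whose more fragments (M) bit is 0.
    if not parts or parts[-1][1] != 0:
        return None
    nsdu = bytearray()
    for offset, _more, data in parts:
        if offset * 8 != len(nsdu):   # a gap means a fragment has not arrived
            return None
        nsdu += data
    return bytes(nsdu)
```

Fragments may arrive in any order; the offset places each one, and the M bit shows whether the last one has been seen.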
One of two approaches may be adopted since the maximum packet size may
vary from one network to another. Either the fragmentation and reassembly
functions can be performed on a per network basis - intranet fragmentation - or
on an end-to-end basis - internet fragmentation. The two
approaches are shown in Figure 9.10(a) and (b), respectively.
Figure 9.10 Fragmentation alternatives: (a) intranet; (b) internet.
[Figure legend: GW = Gateway; MTU = Message transfer unit; NSDU = Network
service data unit]
In general, the IP in a host knows only the maximum packet size associated
with its local network. Similarly, the IP in each gateway knows only the maximum
packet sizes associated with the two networks to which it is connected. With
intranet fragmentation, the IP in the source host first fragments the user
data - the NSDU - into a number of individually addressed datagrams as dictated
by the network to which it is attached. It initiates the sending of these either to
[...] obtaining the NPA address of the host or gateway.
We shall discuss the way in which it obtains the NPA address in Section
9.5.5. On receipt of each datagram, the IP in the host or gateway reassembles the
NSDU. Next it refragments the reassembled NSDU into a possibly different set of
individually addressed datagrams as dictated by the maximum packet size of the
second network.
This procedure is repeated by each gateway until the datagram reaches the
IP in the destination host, where the NSDU is again reassembled and passed to
the destination NS_user.
With internet fragmentation, the IP in the source host carries out the same
fragmentation procedure as before and sends the resulting datagrams to the IP in
the first gateway. However, this time the IP does not reassemble the NSDU.
Instead it either modifies the appropriate fields and sends the received datagrams
directly onto the second network (if the latter can support this size of datagram),
or refragments the datagram into smaller fragments (datagrams). In Figure 9.10,
we assume that the maximum packet size associated with the second
network/subnet is smaller than that used by the first. Consequently, the IP will
segment each datagram it receives into a number of smaller datagrams, each with
the same source and destination addresses.
This procedure is repeated at the next gateway. However, since in Figure
9.10 the last network/subnet can support a larger packet size than the datagrams it
receives, the received datagrams are transmitted directly with only selected
modifications to some header fields. As before, the IP in the destination host
reassembles the user data from each datagram it receives and passes the resulting
NSDU to the destination NS_user.
We can deduce, especially from the packet flows associated with the third
network in Figure 9.10, that intranet fragmentation allows the maximum packet
size of each network to be used, since the individual fragments are reassembled by
each gateway in the route. With internet fragmentation this is not necessarily the
case, but it has the advantage that the reassembly processing is not needed at each
gateway.
The IP does in fact use internet fragmentation. This may at first appear
surprising but it is used because of the problem of lost datagrams. Some networks
operate with a best-try connectionless protocol with the possibility that one or
more datagrams relating to a single NSDU may be corrupted while being
transmitted. As we have seen, with intranet fragmentation the receiving IP in
each gateway reassembles the complete NSDU before relaying it to the next
network. If any fragments are missing (for example, a datagram is discarded
because it has been corrupted), the receiving IP must decide at what point to abort
the reassembly function.
To determine this the IP in the source host defines a maximum time limit
that a gateway may wait for any datagrams relating to an NSDU during each
reassembly operation. Known as the time-to-live, this limit is carried in the header
of all datagrams relating to the NSDU. It is set by the IP in the source host and
is then decremented by each IP that processes the datagram. If a datagram is
lost in transmission and the time-to-live
reaches zero at any point during the reassembly processing in a gateway (or host),
the reassembly function is aborted and all fragments relating to that NSDU are
discarded.
The time-to-live field in each datagram is in multiples of 1 second, so the
amount it is decremented by each IP varies depending on the (known)
transit delay of the associated network. In the case of internet fragmentation,
the IP in each gateway still decrements the time-to-live lifetime field in each
datagram it receives and discards any datagrams for which the value becomes
zero. The IP in the destination host aborts its reassembly operation in the same
way. In both cases, if fragments are missing and the reassembly operation is
aborted, a time exceeded error message is generated and returned to the IP in the
source host.
Example 9.1
An NSDU of 1000 octets is to be transmitted over a network which supports a
maximum NS_user data size of 256 octets. Assuming the header in each IP
datagram requires 20 octets, derive the number of datagrams (fragments)
required and the contents of the following fields in each datagram header:
• Identification
• Total length
• Fragment offset
• More fragments flag
Maximum usable data per datagram = 256 - 20 = 236 octets
Use, say, 29 x 8 = 232 octets
Hence five datagrams are required, four with 232 octets of user data and one
with 72 octets.
The fields are as follows:
Identification       20 (say)  20   20   20   20
Total length         252       252  252  252  92
Fragment offset      0         29   58   87   116
More fragments flag  1         1    1    1    0
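The arithmetic of this example can be checked with a short Python sketch. The function name fragment and the dictionary keys are assumptions chosen for illustration; the sketch rounds the usable data per datagram down to a multiple of 8 octets, so that the fragment offset can be expressed in units of 8 octets.

```python
def fragment(nsdu_length: int, max_packet: int,
             header_length: int = 20, identification: int = 20):
    """Return the header fields of each fragment needed to carry an NSDU."""
    max_data = max_packet - header_length
    per_fragment = (max_data // 8) * 8      # largest multiple of 8 octets
    if per_fragment <= 0:
        raise ValueError("packet size too small to carry any user data")
    fields = []
    position = 0
    while position < nsdu_length:
        size = min(per_fragment, nsdu_length - position)
        fields.append({
            "identification": identification,
            "total_length": size + header_length,
            "fragment_offset": position // 8,
            "more_fragments": 1 if position + size < nsdu_length else 0,
        })
        position += size
    return fields
```

Calling fragment(1000, 256) yields five entries whose total lengths, offsets, and more fragments flags match the table above.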
9.5.5 Routing 
As each network (or subnet) in an internet may use different types of point of
attachment addresses, a system - host or gateway - attached to a network can
send a datagram directly to another system only if it is attached to the same
network. To route datagrams across multiple networks, the IP in each
internetwork gateway must know either the point of attachment address of the
destination host or that of the next gateway along the route [...]. The
next gateway must be attached to a network to which the gateway is attached.
The major problem with routing is how the hosts and gateways within the internet
obtain and maintain their routing information.
Figure 9.11 [...] routing [...] within a host.
Two basic approaches are used for routing within an internet: centralized
and distributed. With a centralized routing scheme, the routing information
associated with each gateway is downloaded from a central site using the network
and special network management messages. The network management system
endeavors to maintain their contents up to date as networks and hosts are added
or removed and faults are diagnosed and repaired. In general, for all but the
smallest internets, this is a viable solution only as long as each individual network
has its own network management system which incorporates sophisticated
configuration and fault management procedures.
With a distributed routing scheme, all the hosts and gateways cooperate in
a distributed way to ensure that the routing information held by each system -
hosts and gateways - is up to date and consistent. Routing information is
retained by each system in the form of a routing table which contains the
NPA address to be used to forward each datagram. The Internet uses such a
scheme.
The routing procedure associated with the IP first reads the destination IP
(NSAP) address from within a datagram and then uses this to find the
corresponding point of attachment address - of a host or a gateway - from
the routing table. In addition, a set of routing protocols is used to create and
maintain the contents of each routing table in a distributed way. The general
scheme used within a host IP is shown in Figure 9.11.
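A much simplified view of the lookup performed within a host is sketched below in Python. The function name, the table layout, and the NPA strings are all hypothetical; the sketch assumes the host keeps the IP/NPA pairs of hosts on its own network and the NPA address of one local gateway, and it uses the standard ipaddress module to test whether the destination lies on the local network.

```python
import ipaddress


def lookup_npa(destination_ip: str, local_network: str,
               local_hosts: dict, gateway_npa: str):
    """Return the NPA address to which a datagram should be forwarded.

    Destinations on the local network are looked up in the host table
    (None if no entry exists); all others are sent to the local gateway.
    """
    network = ipaddress.ip_network(local_network)
    if ipaddress.ip_address(destination_ip) in network:
        return local_hosts.get(destination_ip)
    return gateway_npa
```

For example, with a local network of 128.10.4.0/24, a datagram for 128.10.4.7 is sent to that host's own NPA address, while one for 192.0.1.9 is handed to the gateway.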
[Figure 9.11 legend: SNICP = Subnet independent convergence protocol; SNDCP =
Subnet dependent convergence protocol; SNDAP = Subnet dependent access
protocol; NPA = (Sub)net point of attachment (address)]
Figure 9.12 General internet architecture and terminology.
Before we discuss the various routing protocols relating to the Internet, let us look
at its architecture and the associated terminology. To reflect the fact that the
Internet is made up of a number of separately managed and run internets,
each internet is treated as an autonomous system with its own internal routing
algorithms and management authority. The combined Internet is considered as a
core backbone network to which a number of autonomous systems are attached.
The general architecture is shown in Figure 9.12 together with some (very
simplified) autonomous system topologies.
To discriminate between the gateways used within an autonomous system
and those used to connect an autonomous system to the core network, we use the
terms interior gateway and exterior gateway, respectively. The corresponding
routing protocols are the interior gateway protocol (IGP) and the exterior gateway
protocol (EGP). Since the Internet consists of an interconnected set of internets,
each of which has evolved over a relatively long period of time, each autonomous
system has its own IGP. However, the Internet EGP is, as indeed it must be, an
internetwide standard.
[Figure 9.12 legend: exterior gateway; interior gateway; subnet router; network
(netid); subnetwork (subnetid)]
[...] number of networks/subnets. In a more general application an autonomous
system might consist of just one network managed and run by a single
corporation. Others might consist of a set of subnets connected to a site-wide
backbone network with a single exterior gateway. An example is a site with
multiple LANs interconnected by routers. To simplify the discussion, we shall
consider only autonomous systems that consist of multiple networks since the
presence of subnets just adds another level of routing between an interior gateway
and the hosts.
If every gateway and host system in an internet contained a separate entry in
its routing table for all other systems, the size of the routing tables and the amount
of processing and transmission capacity needed to maintain the tables would be
excessive and, for the Internet, unmanageable. Instead, the total routing
information is organized hierarchically as follows:
• Hosts maintain sufficient routing information to forward datagrams to other
hosts or an interior gateway(s) that is (are) attached to the same network.
• Interior gateways maintain sufficient routing information to forward
datagrams to hosts or other interior gateways within the same autonomous
system.
• Exterior gateways maintain sufficient routing information to forward
datagrams either to an interior gateway, if the datagram is for the same
autonomous system, or to another exterior gateway, if it is not.
A number of routing protocols have been developed to implement this
scheme. These include an intranet protocol known as the address resolution
protocol (ARP), a number of interior gateway protocols (IGPs), and an exterior
gateway protocol (EGP). The scope of each protocol and the associated routing
tables are shown in Figure 9.13. We shall discuss each separately.
Address resolution protocol 
To enable an interior gateway to forward any datagrams it receives for hosts that
are attached to one of its local networks, it must keep a record of the hostid and
corresponding NPA address - known as an address pair - for all the hosts attached
to each of these networks. To obtain this information, each host simply informs
the local gateway of its existence by sending it its IP/NPA address pair. Typically,
this is stored at the host in permanent storage (such as the hard disk) and is then
broadcast. With nonbroadcast networks, the address pair of its local gateway(s) is
(are) also stored and used directly. As a result, each interior gateway builds up a
local routing table with the IP/NPA address pairs of all hosts that are attached to
each of the networks to which the interior gateway is itself attached.
When a host wishes to send a datagram to another host on the same
network, the IP simply sends the datagram to its local gateway for forwarding.
Although this must be done for datagrams addressed to hosts on other networks,
for hosts attached to the same network it can lead to excessively high overheads,
especially if a large number of hosts are attached to the network. To avoid
this, the IP in each host endeavors to obtain the hostid/NPA address pair of
hosts on the same network with which it communicates. This enables a host to
send a datagram to hosts on the same network directly without involving the
gateway. The protocol used for this is the address resolution protocol
(ARP). ARP forms an integral part of the IP and there is an ARP in
each interior gateway, as shown in Figure 9.13(a).

Figure 9.13
Routing protocols:
(a) general architecture; (b) ARP/IGP scope and routing tables; (c) IGP/EGP
scope and routing protocols.
Whenever the fragmentation procedure associated with the IP creates a 
datagram for forwarding, it first passes the address pointer of the memory buffer 
in which the datagram is stored to the ARP. The ARP maintains a local routing 
table which contains the hostid/NPA address pairs of all the hosts connected to
this network with which the host communicates. If the destination IP address in
the datagram is present in the table, then the ARP simply passes the datagram
address pointer with the corresponding NPA address to the SNDAP protocol,
with the netid field of the IP address set to zero to indicate this network. The
SNDAP then initiates the sending of the datagram either by broadcast or directly. 
If the NPA address is not present, the ARP endeavors to find it by creating 
and sending an ARP request message and waiting for a reply. The request message 
contains both its own IP/NPA address pair and the required (target) IP address.
Again, this can either be broadcast- in which case it is received by the ARP in all 
hosts- or sent directly to the ARP in the gateway using the gateway's (known) 
NPA address. In the second case, the ARP in the gateway simply relays the 
message to the required host using its own local routing table and the required 
destination IP address in the request message. 
The ARP in the required destination host recognizes its own IP address in 
the request message and proceeds to process it. It first checks to see whether the
source hostid/NPA address pair is within its own routing table; if not, it enters
them. It then responds by returning an ARP reply message containing its own
NPA address to the ARP in the requesting host, using the latter's NPA address
from the request message. On receipt of the reply message, the ARP in the source
host first makes an entry of the requested hostid/NPA pair in its own routing table
and then passes the waiting datagram address pointer to the SNDAP protocol 
together with the corresponding NPA address which indicates where it should be 
sent. The hostid/NPA pair is recorded by the destination since it is highly probable 
that the destination host will require it later when the higher-layer protocol 
responds to the datagram. 
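The request/reply exchange described above can be illustrated with a small model. This is a sketch under stated assumptions rather than the procedure of any real stack: the class name `ArpModule`, the use of a shared dictionary to stand in for a broadcast network, and the string addresses are all hypothetical.

```python
class ArpModule:
    """Toy model of the ARP in one host; all names are illustrative."""

    def __init__(self, ip, npa, network):
        self.ip, self.npa = ip, npa
        self.table = {}         # local routing table: IP address -> NPA address
        self.network = network  # shared dict standing in for a broadcast network
        network[ip] = self

    def handle_request(self, sender_ip, sender_npa, target_ip):
        """Process a broadcast request; only the target host replies."""
        if target_ip != self.ip:
            return None
        # Record the requester's pair if absent: it will probably be
        # needed when the higher-layer protocol responds.
        self.table.setdefault(sender_ip, sender_npa)
        return (self.ip, self.npa)

    def resolve(self, target_ip):
        """Return the NPA address for target_ip, broadcasting on a table miss."""
        if target_ip in self.table:
            return self.table[target_ip]
        for peer in self.network.values():
            if peer is self:
                continue
            reply = peer.handle_request(self.ip, self.npa, target_ip)
            if reply is not None:
                self.table[reply[0]] = reply[1]  # cache the requested pair
                return reply[1]
        return None  # no host recognized the target IP address
```

After one host resolves another, both tables hold the other's pair, while hosts that merely overheard the broadcast record nothing.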
As we indicated earlier, the IP/NPA address pair of a host is normally held 
in permanent storage and read by the computer operating system at startup. With
diskless hosts, this is not possible, so an associated protocol known as the reverse
address resolution protocol (RARP) is used. The server associated with each set of
diskless hosts has a copy of the IP/NPA address pair of all the hosts it serves.
When a diskless host first comes into service, it broadcasts an RARP request
message to the server containing its own physical hardware network address, that
is, its NPA. On receipt of such messages, the RARP in the server responds with a
reply message containing both the IP address of the requester and its own IP/NPA
address pair. In practice, the format of the request and reply messages associated
with ARP and RARP is the same, as shown in Figure 9.14.
The operation field specifies the particular message type: ARP request/reply,
RARP request/reply. When making an ARP request, the sender writes its own
hardware address (HA) and IP address in the appropriate fields together with the
destination IP address in the target IP field. In the case of a RARP, the sender
simply includes its own HA address. To ensure that the HA address is interpreted
correctly, the hardware type field identifies the type of LAN; for example, CSMA/
CD is 1. The protocol type field indicates the type of protocol being used: ARP,
RARP, and others to be defined in subsequent sections.

Figure 9.14
ARP and RARP message formats.
Fields, in order: hardware type; protocol type; HLEN; PLEN; operation;
sender hardware address; sender IP address; target hardware address;
target IP address.
HLEN = hardware address length
PLEN = IP address length
Operation = 1 ARP request
          = 2 ARP response
          = 3 RARP request
          = 4 RARP response
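The field order in Figure 9.14 can be made concrete by packing and unpacking a message. The sketch below is illustrative only. The field widths (two bytes each for hardware type, protocol type, and operation; one byte each for HLEN and PLEN) and the default protocol type value 0x0800 follow the layout commonly used for Ethernet and IPv4 and are assumptions here, since the figure does not state them. The function names are hypothetical.

```python
import struct

# Operation codes as listed in Figure 9.14.
OPERATIONS = {1: "ARP request", 2: "ARP response",
              3: "RARP request", 4: "RARP response"}


def pack_arp(operation, sender_ha, sender_ip, target_ha, target_ip,
             hardware_type=1, protocol_type=0x0800):
    """Build a message; hardware type 1 is CSMA/CD, widths are assumed."""
    hlen, plen = len(sender_ha), len(sender_ip)
    if len(target_ha) != hlen or len(target_ip) != plen:
        raise ValueError("sender and target address lengths must match")
    fixed = struct.pack("!HHBBH", hardware_type, protocol_type,
                        hlen, plen, operation)
    return fixed + sender_ha + sender_ip + target_ha + target_ip


def unpack_arp(message):
    """Split a message into its fields using HLEN and PLEN."""
    hardware_type, protocol_type, hlen, plen, operation = struct.unpack(
        "!HHBBH", message[:8])
    parts, offset = [], 8
    for size in (hlen, plen, hlen, plen):
        parts.append(message[offset:offset + size])
        offset += size
    return {
        "hardware_type": hardware_type, "protocol_type": protocol_type,
        "hlen": hlen, "plen": plen,
        "operation": OPERATIONS.get(operation, "unknown"),
        "sender_ha": parts[0], "sender_ip": parts[1],
        "target_ha": parts[2], "target_ip": parts[3],
    }
```

In an ARP request the target hardware address is not yet known, so a caller would typically pass zero bytes of the right length for `target_ha`.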
Interior gateway protocol 
As we indicated earlier, the interior gateway routing protocol can vary from one 
autonomous system to another. The most widely used protocol is the IP routing
information protocol (RIP). It is a distributed routing protocol which is based on a
technique known as the distance vector algorithm (DVA). A more recently
introduced protocol is based on two algorithms known as the link state (LS) and
shortest-path-first algorithms (SPF). The link state open shortest-path-first (link
state OSPF) protocol has been adopted as the international standard for use with
the ISO CLNP. Since the DVA is specific to the TCP/IP we shall discuss it here.
We shall discuss the link state OSPF in Section 9.8.3 in the context of the ISO 
CLNP. 
The term distance is used as a routing metric between two gateways. For 
example, if the metric is hops, then this is the number of intermediate networks
between two gateways. If the metric is delay, then this is the mean transit
delay between the two gateways, and so on. Whichever metric is used, the DVA
uses a distributed algorithm to enable each interior gateway in an autonomous 
system to build up a table containing the distance between itself and all the other 
networks in that system. 
Initially, each gateway knows only the netid of each network to which it is
attached as well as the IP/NPA address pair of each gateway attached to these
networks. Typically this information is entered by management when the gateway, 
is first initialized. The routing table
associated with an interior gateway was shown in Figure 9.13(b). If the metric is
hops, the remote routing table contains simply the netid of each of its local 
networks, a distance of zero, and its own IP address as the gateway from which the 
distance applies. Similarly, if the metric is delay, this is determined by a gateway 
which sends a message (datagram) to each of the gateways attached to its own
networks and measures the time delay before it receives the responses. The 
distance is then set to, say, half of these values. 
Periodically, each gateway sends the current contents of its (remote) routing 
table to each of its neighbors. Based on the contents of its neighbors' tables which 
it receives, it updates, or adds to, the contents of its own routing table. The 
receiving gateway simply adds the known distances to each of its immediate
neighbors to the distances contained in the received tables. Since this procedure
repeats, after each iteration the routing table starts to build up as new distances
are reported. If a reported distance to a network is less than a current entry, the
entry is updated. After a number of iterations, each gateway has an entry for each
of the networks in the autonomous system. The time taken to achieve this is a
function of the size of the system and the frequency with which routing
information is exchanged. The time for routing information to propagate
throughout the system is known as the route propagation delay. 
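The table exchange described above can be sketched in a few lines. This is a minimal illustration assuming a hop metric, where each neighbor is one hop away; the function names, the table representation (netid mapped to a distance and the neighbor to use), and the three-gateway example topology in the usage note are hypothetical and do not reproduce Figure 9.15.

```python
def merge_neighbor_table(own, neighbor, neighbor_table, link_distance=1):
    """Fold one neighbor's table into ours; return True if anything changed.

    Tables map netid -> (distance, gateway to use).
    """
    changed = False
    for netid, (distance, _via) in neighbor_table.items():
        candidate = distance + link_distance  # add the known distance to the neighbor
        if netid not in own or candidate < own[netid][0]:
            own[netid] = (candidate, neighbor)  # new or strictly shorter route
            changed = True
    return changed


def exchange_round(tables, neighbors):
    """One periodic exchange: every gateway sends its table to each neighbor."""
    snapshot = {gateway: dict(table) for gateway, table in tables.items()}
    changed = False
    for gateway, table in tables.items():
        for neighbor in neighbors[gateway]:
            if merge_neighbor_table(table, neighbor, snapshot[neighbor]):
                changed = True
    return changed
```

As a usage example, take gateways G1, G2, and G3 in a line, with G1 attached to netids 1 and 2, G2 to 2 and 3, and G3 to 3 and 4, each starting with distance zero to its own networks. Two calls to `exchange_round` complete every table, and a third call reports no change. Because only strictly shorter routes replace an entry, equal-distance alternatives are ignored, which matches the behavior the text goes on to describe.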
For example, consider the simple network shown in Figure 9.15(a) and 
assume the metric is hops. The way in which the routing tables build up is shown 
in part (b). The initial contents of each gateway simply contain the netid of its 
local networks. For this network, the contents of each routing table are complete 
after just two exchanges of routing tables. The final routing table for each gateway
contains the distances to each network in the system and the immediate neighbor
gateway to be used to reach it. Thus from gateway 1, the distance to netid 6 is two
hops - that is, two intermediate networks - via gateway 2. At gateway 2, netid 6 is
a distance of one hop via gateway 3 to which netid 6 is attached. 
We can readily deduce that a metric of hops can lead to the selection of
inferior routes. For example, if the delay metric associated with each network is
the same as its netid, it would be quicker to go from gateway 4 to netid 6 via
gateways 1, 2, and 3 in three hops rather than 5 and 6 with two hops. The delay
metric often gives a better performance. A protocol that uses delay is HELLO. As
its name implies, the delay is determined by periodically sending hello messages to
each of its neighbors and timing their responses. 
To ensure that table entries reflect the current topology of the network when
faults develop, each entry has an associated timer. If the entry is not confirmed
within a defined time, then it is timed-out. This means that each gateway transmits
its complete routing table at regular intervals, typically 30 seconds. For a small
network this is not necessarily a problem, but for large networks the overheads
associated with the distance vector algorithm can be very high. Also, gateways
may have dissimilar routes to the same destination since entries are made in the
order in which they are received and equal distance routes are discarded. As a
result, datagrams between certain routes may loop rather than going directly to
different systems must first agree to exchange such information. This is the role of
the neighbor acquisition and termination procedure. When two gateways agree to
such an exchange, they are said to have become neighbors. When a gateway first
wants to exchange routing information, it sends an acquisition request message to
the EGP in the appropriate gateway which then returns either an acquisition
confirm message or, if it does not want to accept the request, an acquisition refuse
message which includes a reason code.
Once a neighbor relationship has been established between two gateways -
and hence autonomous systems - they periodically confirm their relationship. This
is done either by exchanging specific messages - hello and I-heard-you - or by
embedding confirmation information into the header of normal routing information
messages.
The actual exchange of routing information is carried out by one of the
gateways, which sends a poll request message to the other gateway asking it for
the list of networks (netids) that are reachable via that gateway and their distances
from it. The response is a routing update message which contains the requested
information. Finally, if any request message is incorrect, an error message is
returned as a response with an appropriate reason code.
As with the other IP protocols, all the messages (PDUs) associated with the
EGP are carried in the user data field of an IP datagram. All EGP messages have
the same fixed header; the format is shown in Figure 9.16.
The version field defines the version number of the EGP. The type and code
fields collectively define the type of message while the status field contains
message-dependent status information. The checksum, which is used as a
safeguard against the processing of erroneous messages, is the same as that used in
IP. The autonomous system number is the assigned number of the autonomous
system to which the sending gateway is attached; the sequence number is used to
synchronize responses to their corresponding request message.
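The fixed header and its checksum can be sketched as follows. This is an illustration under assumptions, not a definitive encoder: the field widths (one byte each for version, type, code, and status; two bytes each for checksum, autonomous system number, and sequence number) are assumed from the commonly documented EGP layout, and the function names are hypothetical. The checksum is the 16-bit one's-complement sum used by IP.

```python
import struct

# Assumed widths: version, type, code, status (1 byte each), then
# checksum, autonomous system number, sequence number (2 bytes each).
EGP_HEADER = struct.Struct("!BBBBHHH")
FIELD_NAMES = ("version", "type", "code", "status",
               "checksum", "as_number", "sequence")


def internet_checksum(data):
    """16-bit one's-complement of the one's-complement sum, as used by IP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF


def build_egp_message(version, msg_type, code, status, as_number, sequence,
                      body=b""):
    """Pack the header with a zero checksum, then fill in the real one."""
    draft = EGP_HEADER.pack(version, msg_type, code, status,
                            0, as_number, sequence) + body
    checksum = internet_checksum(draft)
    return EGP_HEADER.pack(version, msg_type, code, status,
                           checksum, as_number, sequence) + body


def parse_egp_header(message):
    """Return the fixed header fields as a dictionary."""
    return dict(zip(FIELD_NAMES, EGP_HEADER.unpack(message[:EGP_HEADER.size])))
```

A receiver can guard against erroneous messages by recomputing `internet_checksum` over the whole received message: with an intact checksum field the result is zero.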
Neighbor reachability messages contain only a header with a type field of 5,
a code of 0 = hello, and 1 = I-heard-you.
Neighbor acquisition messages have a type field of 3; the code field
defines the specific message type. The hello interval specifies the frequency with
which hello messages should be sent; the poll interval performs the same function
for poll messages.
A poll message has a type field of 2. The code field is used to piggyback
neighbor reachability information: a code of 0 = hello and a code of 1 = I-heard-
you. The source network IP address in both the poll and the routing update
response messages indicates the network linking the two exterior gateways. This
allows the core network itself to consist of multiple networks.
The routing update message contains the list of networks (netids) that are
reachable via each gateway within the autonomous system arranged in distance
order from the responding exterior gateway. As indicated, this enables the
requesting gateway to select the best exterior gateway through which to send a
datagram for forwarding within an autonomous system. Notice that to save
space, each netid address is sent in three bytes (24 bits) only with the least
significant 8-bit hostid field missing. The latter is redundant for all class types.
Figure 9.16
EGP message formats.
Fixed header: version; type; code; status; checksum; autonomous system number;
sequence number.
Type = 5 = Neighbor reachability: code 0 = hello, 1 = I-heard-you.
Type = 3 = Neighbor acquisition: code 0 = acquisition request, 1 = acquisition
confirm, 2 = acquisition refuse, 3 = cease request, 4 = cease confirm; followed by
the hello interval and poll interval.
Type = 2 = Poll: code 0 = hello, 1 = I-heard-you; followed by the source network
IP address.
Type = 1, code = 0 = Routing update: source network IP address, then for each
gateway its IP address, the number of distances, and for each distance the list of
netids at that distance, in distance order.

9.5.6 Internet control message protocol
The internet control message protocol (ICMP) forms an integral part of all IP
implementations. It is used by both hosts and gateways for a variety of functions,
and especially by network management. The main functions associated with the
ICMP are as follows:
• Error reporting
• Reachability testing
• Congestion control
