M-VIA on the PowerPC architecture by Hiedajat, Nicky Sagitta
Retrospective Theses and Dissertations Iowa State University Capstones, Theses and Dissertations 
1-1-2003 
M-VIA on the PowerPC architecture 
Nicky Sagitta Hiedajat 
Iowa State University 
Recommended Citation 
Hiedajat, Nicky Sagitta, "M-VIA on the PowerPC architecture" (2003). Retrospective Theses and 
Dissertations. 19990. 
https://lib.dr.iastate.edu/rtd/19990 
This Thesis is brought to you for free and open access by the Iowa State University Capstones, Theses and 
Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Retrospective Theses 
and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, 
please contact digirep@iastate.edu. 
M-VIA on the PowerPC architecture 
by 
Nicky Sagitta Hiedajat
A thesis submitted to the graduate faculty
in partial fulfillment of the requirements for the degree of
MASTER OF SCIENCE 
Major: Computer Engineering 
Program of Study Committee: 
Brett Bode, Co-major Professor 
Srinivas Aluru, Co-major Professor 
Ricky A. Kendall 
Iowa State University
Ames, Iowa 
2003 
Graduate College 
Iowa State University 
This is to certify that the master's thesis of
Nicky Sagitta Hiedajat
has met the thesis requirements of Iowa State University
Signatures have been redacted for privacy 
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENTS
ABSTRACT
CHAPTER 1. INTRODUCTION
CHAPTER 2. OVERVIEW ON VIA
2.1 Network architecture
2.2 VIA model
2.3 VIA data flow
2.4 Memory management
2.5 M-VIA (Modular Virtual Interface Architecture)
2.6 VIA performance on Intel and Alpha
CHAPTER 3. IMPLEMENTATION OF M-VIA ON POWERPC
3.1 Endian system overview
3.2 Endian Synchronization
3.3 Fast Trap
CHAPTER 4. IMPLEMENTATION OF SUNGEM DEVICE DRIVER FOR M-VIA
4.1 M-VIA network device driver module system overview
4.2 Initialization and registration
4.3 Ring buffer management
4.3.1 Ring buffer design
4.3.2 Ring buffer cleanup
4.4 Packet Type Management
4.4.1 Packet type management in the M-VIA Device driver
4.4.2 Packet type management in M-VIA Kernel Agent
4.4.3 Packet type management in M-VIA User Agent
4.5 Send
4.6 Receive
4.7 Deregistration
CHAPTER 5. TEST METHODOLOGY
5.1 Test Platform
5.2 Vnettest
5.3 MP_Lite
5.4 NetPIPE
CHAPTER 6. RESULTS
CHAPTER 7. DISCUSSION
BIBLIOGRAPHY
LIST OF TABLES
Table 2.1 Reliability Level
LIST OF FIGURES
Figure 2.1 Contemporary network architecture
Figure 2.2 VIA network architecture
Figure 2.3 VIA model
Figure 2.4 VIA endpoint connection process
Figure 2.5 VIA and TCP/IP performance on Alpha with SysKonnect gigabit controller
Figure 2.6 VIA and TCP/IP performance on Intel with Intel Pro 100 controller
Figure 2.7 VIA and TCP/IP performance on Intel with SysKonnect gigabit controller
Figure 3.1 Endian system overview
Figure 3.2 An example of unsynchronized endian order
Figure 4.1 M-VIA device driver module system overview
Figure 4.2 Visual descriptions of TX and RX ring buffer
Figure 6.1 Bandwidth graphs of VIA and TCP-IP on Intel Pro 100
Figure 6.2 Bandwidth graphs of VIA and TCP-IP on SysKonnect
Figure 6.3 Bandwidth graphs of VIA and TCP-IP on Sun GEM Fast Ethernet controller
Figure 6.4 Bandwidth graphs of VIA and TCP-IP on Sun GEM Gigabit Ethernet controller
ACKNOWLEDGEMENTS 
I am grateful to Dr. Brett Bode, Dr. Dave Turner and Troy Benjegerdes for their
support and understanding when I wrote this thesis. Their support and encouragement helped 
make this thesis possible. A special thanks to Dr. Brett Bode for his guidance from the 
beginning until the end of my research. Not only did he guide me, he also helped me a great 
deal in writing and correcting grammar errors in this thesis. My committee member, Dr. 
Srinivas Aluru, also offered me help and advice. Mike Welcome and Paul Hargrove at 
NERSC answered email questions of mine, sometimes lots of questions, which improved my 
understanding about M-VIA and MVICH. Thanks to Al Cox and my girl friend, Aini 
Christina Utomo for correcting grammar errors and praying for this research 
accomplishment. 
ABSTRACT 
One of the basic principles of cluster communication is to have the smallest cost in 
the time consumed on delivering messages between nodes. The Virtual Interface Architecture 
(VIA) is a communication protocol for system area networks (SAN) that bypasses much of
the overhead of traditional network protocol stacks and provides more direct access to the 
network interface controller (NIC). The aim of our research was to investigate if VIA would 
perform well on PowerPC processors which have a different architecture than the previous 
processors used with VIA. For that reason, VIA was implemented on PowerPC and new 
driver support was added. The results indicate that VIA performs better than TCP/IP on large 
message sizes but not on small message sizes. 
CHAPTER 1. INTRODUCTION 
If only one worker builds a house, it might take several years to finish, but if several
workers take part, the house could be done in several months. The same concept is used in
parallel computation: many processors, connected to each other either inside one computer
or across different computers, work on the same computation.
There are two different types of parallel computation: SIMD (Single Instruction
Multiple Data) and MIMD (Multiple Instruction Multiple Data). SIMD systems perform the
same operation on different data at the same time. MIMD systems can perform different
operations on different data concurrently.
There are two types of MIMD systems: shared memory and distributed memory. In
shared memory MIMD, the same memory bus is shared among a collection of processors and
memory modules. In distributed memory MIMD systems, each processor handles its own
memory. Several computers or servers that are connected to each other, each with its own
processors and memory, constitute a distributed memory MIMD system. MIMD
systems have many different interconnection topologies (e.g., trees, flat networks, 2D or 3D
meshes). Some MIMD systems are made from a group of PCs or workstations.
These kinds of MIMD systems are called clusters.
The performance of a cluster or parallel network is highly dependent on the
underlying network architecture [1]. The architecture of the traditional protocol stack is
costly and inefficient. Support for sockets and synchronization overhead (sleep/wakeup) are
included in operating system (OS) overhead. Allocating and freeing buffer descriptors in the
network software subsystem adds further overhead in TCP/IP. The drawback of this
organization is that all communication operations require a call or trap into the OS kernel,
which may be expensive to execute. The de-multiplexing process and reliability protocols
add further expensive computation and are part of the communication overhead.
Considering the importance of faster cluster computing, the VIA (Virtual Interface
Architecture) protocol was initiated to improve the speed of packet transmission in cluster
interconnection communication. VIA, jointly defined by Intel, Compaq and Microsoft, was
designed to work faster than TCP/IP [2]. VIA was designed to improve the throughput in
high performance network systems by substantially reducing the system software processing
required to exchange messages as compared with the traditional network interface
architecture [1]. VIA eliminates OS overhead by providing a protected and directly
accessible process interface to the network hardware, which is called a virtual interface (VI).
VIA is designed to be relatively simple and easy to implement in hardware [2]. Such
hardware (e.g., GigaNet cLAN [3]) directly reads data structures in user memory and moves
data to and from user memory, requiring no intervention from the host processor in normal
operation except for the connection setup. However, this requires custom hardware that is
expensive and not widely available.
M-VIA, a software implementation of VIA, consists of a loadable kernel module and 
a user level library, and requires a modified device driver. M-VIA is designed to be easily 
portable to new network devices. M-VIA does not require hardware support for VIA. 
Although VIA was designed to be easy to accelerate with hardware, the user interface does 
not require hardware support [2]. 
ParMa tried to port LAM/MPI to use M-VIA as the underlying data transport
using Intel Pro/100 Fast Ethernet cards, Fast Ethernet DEC 21x4x (Tulip) based cards,
Packet Engines GNIC-I (Yellowfin) Gigabit Ethernet cards and Packet Engines GNIC-II
(Hamachi) Gigabit Ethernet cards [4]. Lawrence Berkeley National Laboratory also deployed
MPI on VIA in September 2000 and called it MVICH.
M-VIA was previously implemented only on the Intel and Alpha architectures. However,
the cluster world was not concentrating on only those two architectures. Other platforms have
also been used for clusters, including the PowerPC architecture. For example, Terra Soft
Solutions, Inc., which believed PowerPC to be a strong and fast platform, introduced 8-node
PowerPC Linux clusters [5]. Total Impact, working with Terra Soft, offered the briQ, a
small footprint, single board PowerPC Linux network appliance computer that can be used
alone, as an embedded system, or within a multiple bay chassis for a high density server or
cluster solution, and that was integrated into a 4 or 8 PowerPC node cluster with Black Lab Linux
[6] [7]. With the availability of LinuxPPC and Yellow Dog Linux as native ports of Linux to
the PowerPC, it became feasible to do cluster computing with Linux on PowerPC. In this
case, the Scalable Computer Laboratory installed a cluster using Yellow Dog Linux and
Black Lab Linux with 16 single and 16 dual processor Macintosh PowerPC G4 nodes running
at 400-500 MHz with 512 Mbytes to 1 Gbyte of RAM [8].
The PowerPC has a significant advantage in numerical processing for scientific
computing, namely the AltiVec vector parallel processing extension. The PowerPC has also
been shown to have lower power consumption than Intel processors [9]. Lower
power consumption reduces expenses, since life cycle cost and deployment considerations
for any significantly sized system also include power consumption and cooling.
The G3, G4 and G5 processors, which are part of PowerPC computers, have been widely
recognized as direct competitors of the x86 processors released by Intel. With AltiVec
technology support, comparable to MMX technology in the Pentium family, G4
systems became more competitive with x86 systems [10]. AltiVec expanded the capabilities
of PowerPC microprocessors by delivering high-bandwidth data processing and
algorithmic-intensive computations in a single-chip solution.
More interestingly, the Sun GEM, a Gigabit and Fast Ethernet controller, is
built in to Apple's G4 systems. All Apple G4 and G5 PowerMacs come with this
built-in network controller. Even most of the newest PowerBook products come with
this network controller integrated.
Because the PowerPC offers these advantages, this research was conducted on the G4,
one member of the PowerPC family. The question was whether M-VIA would perform better
than TCP/IP on the G4. To support the implementation of M-VIA on the PowerPC
architecture, research on implementing a driver for the Sun GEM, the built-in network
controller of Apple's G4 systems, was also conducted. M-VIA would be implemented on
PowerPC by adapting the byte order of packet transmission to PowerPC byte ordering. The
traditional Sun GEM driver would be changed and compiled as a Linux module with support
for the VIA protocol. It is hoped that the information in this study will be useful in
identifying the strengths and weaknesses of using VIA in parallel computing on different
kinds of processor architectures. Results of this study may suggest a better implementation
of M-VIA, which will bring better performance. More device drivers may need to be
implemented in order to extend the usefulness of M-VIA to many different kinds of NIC.
CHAPTER 2. OVERVIEW ON VIA
VIA's goal was to reduce system software processing in exchanging messages [1] [2].
Wilf Sullivan observed in his Virtual Interface Architecture primer that
performance on cluster systems had been reduced by the overhead of standard communication
protocols and the inefficiencies of their interaction with operating systems [11]. VIA improves
performance by bypassing much of this overhead and providing direct access to the network
interface hardware. More information about VIA is presented in the following subchapters.
2.1 Network architecture 
Wilf Sullivan [11] illustrated the overhead inside the layers of the traditional network
architecture (Figure 2.1). Such overhead is caused by the required address translation from
virtual to physical addresses in TCP/IP, data copying between application and network
hardware, operating system (OS) context switches and inefficient protocol stacks, all of
which result in costly CPU overhead.
[Figure: layer stack, top to bottom: Application, Library, Session (kernel), Comm Protocol Stack, Device Driver, Host Bus Adapter, Interconnect; annotation: Overhead]
Figure 2.1 Contemporary network architecture
In VIA, every registered buffer is tied to a physical memory address so that the virtual
to physical translation remains fixed without participation of the operating system. VIA
provides direct data copies between application memory and the NIC. VIA also provides direct
access from user level to the NIC via a Kernel Agent, which reduces user to kernel context
switches.
[Figure: layer stack, top to bottom: Application, Virtual Interface Provider Library, Network Interface Card, Interconnect; components shown: VI User Agent, VI Kernel Agent]
Figure 2.2 VIA network architecture
2.2 VIA model 
VIA user space is associated with the application process, while kernel space refers
to operating system functions, including I/O Control or system calls that directly access the
network controller. Kernel space in VIA is implemented and run with participation of the
device driver in protected mode. The device driver provides intermediate data copying from
user to kernel space, which reduces operating system context switches. Functions handled by the
Kernel Agent in VIA are infrequent since the establishment of the VI connection usually
occurs only once and is long lived, thereby reducing context switches between user and
kernel space.
[Figure: Application (message passing interface) above the VI User Agent in user space, using Send-Receive/RDMA; VI Kernel Agent in kernel space, with Register memory; VIs, each with a Send queue and a Receive queue]
Figure 2.3 VIA model
To interface the application program with the operating system (e.g., Linux), VIA
consumers use the VIPL (VI Provider Library), which is a standard API that is easy to use
and independent of the underlying hardware implementation. VIPL was implemented in
Modular Virtual Interface Architecture (M-VIA) [2], the software implementation of VIA.
2.3 VIA data flow 
VIA has the virtual interface (VI), which provides a mechanism of interaction between
the VIA kernel agent and the VIA user agent. VIA gives the user the ability to create multiple
processes and multiple connections (VIs), each with direct access to the network hardware.
Each NIC can support at least 1024 VIs, where each VI contains a send queue and a receive
queue. Each queue holds uncompleted descriptors that the sender uses to send information to
the receiver. Each descriptor holds pointers to data buffers and other information
required by the provider. Descriptors that have completed are removed
from the queue.
During connection initiation, the memory used for data copying has to be registered and
protected by a protection tag. A VI connection is initiated when one side requests a connection
from the other side by sending a request descriptor. As Figure 2.4 [1] shows,
the client sends the connection request and the server accepts the connection. The descriptors
used in processing a request contain the data length, data memory address, status of transfer,
queue information and a scatter-gather style buffer pointer list.
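The descriptor contents listed above can be pictured as a C structure. The following is only a minimal sketch: the names sketch_descriptor, sketch_segment and SKETCH_MAX_SEGMENTS, the field types and the status encoding are hypothetical and are not taken from the VIA specification or the M-VIA source. It only illustrates how a scatter-gather list lets one descriptor describe several separate buffers.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_SEGMENTS 4   /* hypothetical limit, for illustration only */

/* One entry of the scatter-gather list: a buffer and its length. */
struct sketch_segment {
    void    *addr;     /* data memory address */
    uint32_t length;   /* number of bytes in this buffer */
};

/* Hypothetical descriptor holding the items named in the text. */
struct sketch_descriptor {
    uint32_t total_length;    /* data length of the whole transfer */
    uint32_t status;          /* status of transfer (assumed: 0 = pending) */
    uint32_t queue_id;        /* queue information */
    uint32_t segment_count;   /* entries used in the list below */
    struct sketch_segment segments[SKETCH_MAX_SEGMENTS];
};

/* Append a buffer to the descriptor; returns 0 on success, -1 if full. */
static int sketch_add_segment(struct sketch_descriptor *d,
                              void *addr, uint32_t length)
{
    if (d->segment_count >= SKETCH_MAX_SEGMENTS)
        return -1;
    d->segments[d->segment_count].addr = addr;
    d->segments[d->segment_count].length = length;
    d->segment_count++;
    d->total_length += length;
    return 0;
}
```

In this sketch a message made of a header buffer and a payload buffer would be described by two calls to sketch_add_segment() on one zero-initialized descriptor, without first copying both parts into a single contiguous buffer.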
During the data flow the user can choose to send data using the traditional method of
data transfer (the Send-Receive model) or using RDMA (Remote Direct Memory Access). In a
Send-Receive data transfer, each end has its own access rights to data memory space. In an
RDMA transfer, the sender has been given the memory address that the receiver side
uses for the written data. Completion of a data transmission is usually indicated
in the completion queue and in the send/receive queue. With this completion
action, data transfer is more synchronized. MP_Lite [6], one implementation of MPI using
M-VIA, uses both Send-Receive and RDMA data transfer. It uses Send-Receive for the first
1 ~ Kbytes of data and then uses RDMA for bigger data sizes. This was done in order to bring
more stability and bandwidth.
[Figure: state diagram. Server process: ConnectWait issued with criteria for Connection; Waiting (ConnectionWait times out); ConnectRequest matching ConnectionAddress criteria received; Examining Connection Attributes; Attributes Unacceptable: ConnectionReject issued; Connection Attributes Acceptable: ConnectionAccept issued on ConnectionId; Waiting for Accept to return (ConnectionAccept returns error, or ConnectionAccept returns successfully); Connected. Client process: ConnectionRequest to ConnectionAddress; Wait on Response (ConnectionRequest rejected or timeout); Connected.]
Figure 2.4 VIA endpoint connection process
Table 2.1 Reliability Level
Property / Level of Reliability                        Unreliable   Reliable Delivery   Reliable Reception
Corrupt data detected                                  Yes          Yes                 Yes
Data delivered at most once                            Yes          Yes                 Yes
Data delivered exactly once                            No           Yes                 Yes
Data order guaranteed                                  No           Yes                 Yes
Data loss detected                                     No           Yes                 Yes
Connection broken on error                             No           Yes                 Yes
RDMA Write Support                                     Yes          Yes                 Yes
State of Send/RDMA Write when request completed        In-flight    In-flight           Completed on remote end also
State of in-flight Send/RDMA Write when error occurs   Unknown      Unknown             First one unknown, others not delivered
Furthermore, VIA supports three levels of reliability at the NIC level: Unreliable
Delivery, Reliable Delivery and Reliable Reception. Unreliable Delivery
guarantees to deliver a data packet at most once. At this level of reliability, corrupted data
must be detected on the receiver side; however, lost data may go undetected.
Reliable Delivery guarantees to deliver a packet exactly once, in the order submitted. Data loss
and errors caused by a broken connection are detected. However, errors occurring after a
descriptor completes with a successful status are not reported to both sender and receiver;
either the send or the receive side can have the error message. Reliable Reception completes a
descriptor with a successful status only when the data has been delivered into the target user
memory. M-VIA version 1.2 does not implement Reliable Reception over the supported
Ethernet devices, which are Sun GEM Gigabit Ethernet, SysKonnect Gigabit
Ethernet, Intel Pro 100 Fast Ethernet and DEC DC21x4x (Tulip) Fast Ethernet. Table 2.1
lists the differences between the levels of reliability. More explanation is provided in the VIA
specification.
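The Yes/No rows of Table 2.1 can also be written down as a small lookup table, which makes it easy to check what an application may assume at each level. This is only an illustrative sketch: the enum, structure and function names are hypothetical and do not come from VIPL or the M-VIA source.

```c
#include <stdbool.h>

/* The three reliability levels named in the text. */
enum sketch_reliability {
    SKETCH_UNRELIABLE = 0,
    SKETCH_RELIABLE_DELIVERY,
    SKETCH_RELIABLE_RECEPTION,
    SKETCH_LEVEL_COUNT
};

/* Guarantees per level, transcribed from the Yes/No rows of Table 2.1. */
struct sketch_guarantees {
    bool corrupt_data_detected;
    bool delivered_at_most_once;
    bool delivered_exactly_once;
    bool order_guaranteed;
    bool loss_detected;
    bool connection_broken_on_error;
};

static const struct sketch_guarantees sketch_table[SKETCH_LEVEL_COUNT] = {
    [SKETCH_UNRELIABLE]         = { true, true, false, false, false, false },
    [SKETCH_RELIABLE_DELIVERY]  = { true, true, true,  true,  true,  true  },
    [SKETCH_RELIABLE_RECEPTION] = { true, true, true,  true,  true,  true  },
};

/* True when a completed send also means the data has reached the remote
 * end; per Table 2.1 this holds only for Reliable Reception. */
static bool sketch_completion_means_remote_done(enum sketch_reliability level)
{
    return level == SKETCH_RELIABLE_RECEPTION;
}
```

The two reliable levels differ only in the last two rows of the table, which is why the sketch expresses that difference as a separate function instead of another boolean column.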
2.4 Memory management 
Every buffer that is used for VIA has to be registered and recognized by a VI 
connection. Every registration will return a memory handle that uniquely identifies the 
registered physical memory region. Every buffer registered is tied to a physical memory 
address so that the virtual to physical translation remains fixed without participation of the 
operating system. This eliminates the overhead of intermediate data copying from user to 
kernel space. 
VI provides memory protection in the form of protection tags, which ensure that a user
process cannot send to or receive from a memory location that is not associated with a pair of
connected VIs. The memory protection tag uniquely identifies the memory region associated
with the VI.
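The protection check described above can be sketched as a comparison of tags followed by a bounds test. The structure and function below are hypothetical and serve only as an illustration; they are not the M-VIA data structures, and the real provider performs this check on registered memory handles.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record created when a buffer is registered. */
struct sketch_mem_region {
    uintptr_t start;    /* first byte of the registered region */
    size_t    length;   /* size of the region in bytes */
    uint32_t  ptag;     /* protection tag given at registration */
};

/* Returns 1 when [addr, addr + len) lies inside the region and the VI's
 * protection tag matches the tag of the region, otherwise 0. */
static int sketch_access_allowed(const struct sketch_mem_region *r,
                                 uint32_t vi_ptag, uintptr_t addr, size_t len)
{
    if (vi_ptag != r->ptag)
        return 0;                       /* region belongs to another tag */
    if (addr < r->start || len > r->length)
        return 0;                       /* starts before, or is too long */
    /* Written without addr + len so that the test cannot overflow. */
    return (addr - r->start) <= (r->length - len);
}
```

A transfer is refused in this sketch both when the tags differ and when the requested range runs past the end of the registered region.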
2.5 M-VIA (Modular Virtual Interface Architecture) 
M-VIA is a complete high-performance software implementation of the
Virtual Interface Architecture for Linux [2]. M-VIA provides a loadable kernel module and
also defines a software level library, VIPL (Virtual Interface Provider Library), to
provide an easy interface with hardware. M-VIA is zero-copy only on the send side; the
receive side requires one copy. Each new network controller requires a modified
device driver, which also needs to be compiled as a Linux module. M-VIA was designed
to be easily ported to new network device controllers, so that only a few device classes need
to be added or modified. M-VIA is designed so that TCP/IP and VIA can be used
simultaneously on the same device [2].
2.6 VIA performance on Intel and Alpha
[Figure: plot; vertical axis: throughput]
Figure 2.5 VIA and TCP/IP performance on Alpha with SysKonnect gigabit controller
As reported in the thesis Implementation of MP_Lite for the VI Architecture by
Weiyi Chen [12], M-VIA was previously tested on the Alpha and Intel architectures, as
Figures 2.5 to 2.7 show, using MP_Lite and MPICH [13] as the message-passing interface.
The tests were conducted on two Intel Pentium 450 MHz machines with 256 Mbytes of memory
and two Compaq DS20 500 MHz machines with 1.5 Gbytes of memory. The OS used was Red Hat
Linux 6.2 with kernel version 2.2.19. Tests for the Intel architecture were conducted on two
different network controllers: SysKonnect Gigabit Ethernet cards and Intel
Pro 100 Fast Ethernet cards. Tests for the Alpha architecture were
conducted only on SysKonnect Gigabit Ethernet cards.
[Figure: plot; vertical axis: throughput]
Figure 2.6 VIA and TCP/IP performance on Intel with Intel Pro 100 controller
Figure 2.5 shows that on the Alpha architecture, MP_Lite M-VIA provides higher
bandwidth than MPICH but lower performance than TCP with jumbo
frames. TCP/IP with jumbo frames reaches 880 Mbps and MP_Lite reaches 720 Mbps. The
Intel implementation of M-VIA, used with MP_Lite, was better than TCP/IP. Figures 2.6
and 2.7 also show that M-VIA with jumbo frames completely outperforms
TCP/IP. With the Intel Pro 100 Fast Ethernet controller, MP_Lite M-VIA has a latency of
40 µs and TCP/IP has 52 µs. With the SysKonnect Gigabit Ethernet controller, MP_Lite M-VIA
has 45 µs and TCP/IP has 53 µs. TCP/IP has a maximum bandwidth of 290 Mbps and MP_Lite
M-VIA has a maximum bandwidth of 425 Mbps.
[Figure: plot; vertical axis: throughput]
Figure 2.7 VIA and TCP/IP performance on Intel with SysKonnect gigabit controller
CHAPTER 3. IMPLEMENTATION OF M-VIA ON POWERPC 
This implementation started by porting the Intel version of M-VIA, version 1.2b2, to the
PowerPC architecture. The only differences found between the Intel and PowerPC architectures
for this implementation were byte ordering and the fast trap implementation. The Intel
architecture uses little endian byte ordering, in contrast with the PowerPC, which uses big
endian byte ordering. Therefore, to make M-VIA work on the PowerPC, M-VIA needed to be
modified to provide byte order synchronization. More detail on the synchronization
implementation is given in the following subchapters.
3.1 Endian system overview 
Layer                     Byte order
Application               Machine dependent endian
VI User Agent             Machine dependent endian
VI Kernel Agent           Default to Little Endian
Network Interface Card    Little or Big Endian
Interconnection           Big Endian
Figure 3.1 Endian system overview
The application and VI user agent layers in VIPL both read and write data in the
machine's native byte order. In the VI kernel agent layer, M-VIA was programmed by default to
process data in little endian mode (the kernel agent can be reconfigured to read and write
data in big endian byte order by changing a macro value before compilation). In the
network layer, data was delivered in either little endian or big endian order, depending on the
controller specification. In the bottom layer, data was delivered in big endian byte order.
3.2 Endian Synchronization 
The kernel agent in M-VIA version 1.2b2, which was tested on Intel and Alpha, always
assumes little endian addressing. However, when it was tested on PowerPC, it was found
that some parts of the connection request packet were written on the sender in big endian byte
order, while at the receiver side the packet was read as little endian. This problem was caused
by unsynchronized byte ordering in the kernel layer. In M-VIA version 1.2b2, the kernel agent
did not change the byte order of the control packets sent during VIA connection
establishment. The kernel agent also did not change the byte order of packets received
from the network into the kernel layer. Therefore, the receiver saw the same packet in a
different byte order. Figure 3.2 gives an example of how misreading occurs
with unsynchronized byte ordering.
Layers                     Application   Kernel   Network   Physical
Endian type on sender      Big           Big      Little    Big
Endian type on receiver    Little        Little   Little    Big
Problem occurs because there is no endian synchronization
Figure 3.2 An example of unsynchronized endian order
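The misreading can be reproduced with a single 16-bit field. The sketch below is illustrative only: the helper names are hypothetical, and the byte arrays stand in for a packet field as it appears on the wire. It stores a value in big endian order, as a big endian sender would, and then interprets the same two bytes as little endian, as the receiving kernel agent does.

```c
#include <stdint.h>

/* Store a 16-bit value in big endian order (most significant byte first). */
static void sketch_store_be16(uint8_t out[2], uint16_t v)
{
    out[0] = (uint8_t)(v >> 8);
    out[1] = (uint8_t)(v & 0xFF);
}

/* Interpret two bytes as a little endian 16-bit value. */
static uint16_t sketch_load_le16(const uint8_t in[2])
{
    return (uint16_t)(in[0] | (in[1] << 8));
}

/* Interpret two bytes as a big endian 16-bit value. */
static uint16_t sketch_load_be16(const uint8_t in[2])
{
    return (uint16_t)((in[0] << 8) | in[1]);
}
```

With these helpers, a field holding the value 0x0001 that is stored with sketch_store_be16() reads back as 0x0100 through sketch_load_le16(), while sketch_load_be16() recovers 0x0001. A length or opcode field damaged in this way is enough to make a connection request unusable.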
Once the byte synchronization error was found, it was solved by adding the
ByteSwapPacket() function to the kernel agent to swap the request packet to the desired byte
order before sending and to swap the packet back to the original byte order on the receiver
side. ByteSwapPacket() was defined by Paul Hargrove at the Ernest Orlando Lawrence
Berkeley National Laboratory (LBNL) as in the code below.
void ByteSwapPacket(VIPK_CCHAN_PACKET *Pkt)
{
    /* Pkt->PSN is opaque */
    /* Pkt->Token is opaque */
    VIPK_SWAB16S(&Pkt->Op);
    VIPK_SWAB16S(&Pkt->SrcAddr.HostAddressLen);
    VIPK_SWAB16S(&Pkt->SrcAddr.DiscriminatorLen);
    VIPK_SWAB16S(&Pkt->DstAddr.HostAddressLen);
    VIPK_SWAB16S(&Pkt->DstAddr.DiscriminatorLen);
    VIPK_SWAB32S(&Pkt->Session);
    VIPK_SWAB32S(&Pkt->SrcConnHandle);
    VIPK_SWAB32S(&Pkt->DstConnHandle);
    VIPK_SWAB32S(&Pkt->ViHandle);
    /* Pkt->ViAttribs is encoded in canonical order */
    VIPK_SWAB32S(&Pkt->Sequence);
}
ByteSwapPacket() takes a packet as the parameter and converts the byte order of each
multi-byte field in the packet to the desired byte order, which is little endian. The packet
swapped here is the packet for requesting connections from the client to the server (sender
to receiver). The packet contains the source address, destination address, source address
length, destination address length, source connection handle, destination connection handle,
VI handle and sequence number. The byte order of the packet data is swapped with the
VIPK_SWAB functions, which are inline functions defined in the M-VIA kernel code for byte
order conversion. A more detailed explanation of connection handles and VI handles is
given in the Virtual Interface Architecture Specification written by Compaq
Computer Corporation, Intel Corporation and Microsoft Corporation [1].
With this function, every packet written in the kernel layer is guaranteed to be in little
endian order, so that the receiver reads it in the same byte order in which the sender sent it.
Because the packet is now written and read in the same order, M-VIA could be tested with all
available M-VIA network devices (e.g., Intel Pro 100 Fast Ethernet card, SysKonnect Gigabit
Ethernet card).
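How an in-place swap helper of this kind might look is sketched below. These are not the M-VIA VIPK_SWAB definitions, which are not reproduced in this thesis; the names sketch_swab16s, sketch_swab32s, sketch_host_is_big_endian and sketch_le32_convert are hypothetical, and making the swap conditional on the host byte order is an assumption of this sketch.

```c
#include <stdint.h>

/* Reverse the two bytes of a 16-bit field in place. */
static void sketch_swab16s(uint16_t *p)
{
    *p = (uint16_t)((*p >> 8) | (*p << 8));
}

/* Reverse the four bytes of a 32-bit field in place. */
static void sketch_swab32s(uint32_t *p)
{
    uint32_t v = *p;
    *p = (v >> 24) | ((v >> 8) & 0x0000FF00u) |
         ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Returns 1 on a big endian host and 0 on a little endian host. */
static int sketch_host_is_big_endian(void)
{
    const uint16_t probe = 1;
    return *(const uint8_t *)&probe == 0;
}

/* Convert a 32-bit field between host order and little endian in place.
 * A byte swap is its own inverse, so the same call serves the sender
 * (host to little endian) and the receiver (little endian to host). */
static void sketch_le32_convert(uint32_t *p)
{
    if (sketch_host_is_big_endian())
        sketch_swab32s(p);
}
```

Because swapping twice restores the original value, calling the same routine before sending and after receiving leaves the field in little endian order in the packet and in host order on both machines, whatever their native byte order.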
3.3 Fast Trap 
In the Intel implementation of M-VIA, fast traps were coded in assembly in
order to speed up the packet sending process. Using fast traps eliminates the kernel
I/O Control process. When a packet was ready to be sent in the user layer, a trap interrupt was
used to send a signal to the network hardware; the M-VIA kernel agent then skipped the I/O
Control process and called the appropriate kernel send function.
Fast trap hardware signals are not available in the PowerPC architecture. In addition,
using I/O Control instead of fast traps was assumed to make only a small
difference in performance. For these reasons, the PowerPC implementation of M-VIA
uses I/O Control instead of fast traps.
CHAPTER 4. IMPLEMENTATION OF SUNGEM DEVICE
DRIVER FOR M-VIA
Porting to a different CPU architecture was not the only challenge in this research
project. A new network device driver was also added. The network device chosen was the
Sun GEM Fast/Gigabit Ethernet controller built in to Apple Macintosh G4 systems. The new
M-VIA compatible Sun GEM driver was not built to be used for M-VIA only; it had to
work for both TCP/IP and M-VIA. Therefore, the original Sun GEM driver was not completely
rewritten. It was modified by adding functions for M-VIA.
4.1 M-VIA network device driver module system overview 
The M-VIA device driver module system was implemented inside the
traditional device driver module system. The module system of previous M-VIA device
driver implementations has the structure shown in Figure 4.1.
The device driver for M-VIA had to provide M-VIA device initialization, registration,
send and receive processing, ring management, packet type management and M-VIA device
deregistration.
[Figure: module structure: M-VIA device initialization; Registration; Send (data mapping, data flagging); Receive (data type differentiation); Basic send receive; RDMA Write; TX Ring and RX Ring; Ring Buffer Management; Packet Type Management; M-VIA device deregistration]
Figure 4.1 M-VIA device driver module system overview
The M-VIA network device starts with device memory allocation and initialization of
device attributes, such as the name of the device (e.g., via_eth1), the maximum length of the
sender or receiver ring buffers and the maximum number of VI connections allowed per device.
The attributes are registered with the M-VIA kernel agent using a device pointer. Sending and
receiving take place each time the device detects or receives a send or receive interrupt
signal. The device driver signals the device after it finishes filling the transmit (TX)
ring with the mapped memory address of the packet that is going to be transmitted. When
the device receives a signal for receiving packets, it places the packets in the receive buffer
defined by the device driver. Buffer management functionality includes buffer blocking, buffer
setup and buffer cleanup. Packet type management is critical for differentiating between
TCP/IP packets and VIA packets. The deregistration stage frees the allocated memory
and shuts down all device processes, after which the device can no longer be used.
4.2 Initialization and registration. 
The initialization is done in the VIPK_ERING_INIT() macro. This macro is 
called by the device initialization function (e.g., the gem_init_one() function in the SunGEM 
driver). During initialization, device attributes are allocated in memory and set to 
default values. Some of the important device attributes are described as follows: 
■ LocalNicAddress. This attribute provides the MAC address of the device. 
■ MaxVI. The maximum number of VI connections in each peer-to-peer connection. 
■ MaxSegmentsPerDesc. The maximum number of segments allowed per 
descriptor. 
■ MaxCQ. The maximum number of Completion Queues that can be used in one VI 
connection. 
■ MaxCQEntries. The maximum number of entries allowed in each Completion Queue. 
■ MaxTransferSize. The maximum size delivered in each VIA transmission. 
■ MaxPtags. The maximum number of different protection tags allowed. 
■ NativeMTU. The MTU size specified by the network device controller. 
(A more complete description and explanation can be found in the VIA documentation [1].) 
After initialization, registration is performed by the VipkERingRegister() function. 
This function is called by the SunGEM open function, gem_open(). VipkERingRegister() 
checks the device attribute values and inserts the device pointer into the array of device 
pointers provided to the kernel agent. 
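The two steps described above can be sketched as follows. This is a minimal illustration only: the structure `sketch_nic_attrs`, the functions `sketch_ering_init()` and `sketch_ering_register()`, the table size `SKETCH_MAX_DEVICES`, the field types and all default values are assumptions made for this sketch and are not taken from the M-VIA sources. Only the attribute names come from the list above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_DEVICES 4  /* assumed table size, not taken from M-VIA */

/* Hypothetical subset of the attributes listed above; types are assumed. */
struct sketch_nic_attrs {
    unsigned char LocalNicAddress[6];  /* MAC address of the device */
    unsigned int  MaxVI;
    unsigned int  MaxSegmentsPerDesc;
    unsigned int  MaxCQ;
    unsigned int  MaxCQEntries;
    unsigned int  MaxTransferSize;
    unsigned int  MaxPtags;
    unsigned int  NativeMTU;
};

/* Stand-in for the array of device pointers kept by the kernel agent. */
static struct sketch_nic_attrs *sketch_devices[SKETCH_MAX_DEVICES];

/* Initialization: zero the attributes, copy the MAC address and set
 * placeholder defaults (the numbers are illustrative only). */
static void sketch_ering_init(struct sketch_nic_attrs *a,
                              const unsigned char mac[6])
{
    assert(a != NULL && mac != NULL);
    memset(a, 0, sizeof(*a));
    memcpy(a->LocalNicAddress, mac, sizeof(a->LocalNicAddress));
    a->MaxVI = 64;
    a->MaxSegmentsPerDesc = 16;
    a->MaxCQ = 8;
    a->MaxCQEntries = 256;
    a->MaxTransferSize = 32768;
    a->MaxPtags = 32;
    a->NativeMTU = 1500;  /* standard Ethernet MTU */
}

/* Registration: check the attributes, then store the pointer in the
 * first free table entry. Returns the table index, or -1 on failure. */
static int sketch_ering_register(struct sketch_nic_attrs *a)
{
    int i;

    if (a == NULL || a->MaxVI == 0 || a->NativeMTU == 0)
        return -1;
    for (i = 0; i < SKETCH_MAX_DEVICES; i++) {
        if (sketch_devices[i] == NULL) {
            sketch_devices[i] = a;
            return i;
        }
    }
    return -1;  /* table full */
}
```

In this sketch the init step would correspond to the call made from gem_init_one() and the register step to the call made from gem_open().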
4.3 Ring buffer management. 
For the M-VIA implementation, it is very important to understand the ring buffer 
design and how the ring buffer should be cleaned for use by both the TCP/IP and VIA 
protocols. The next sub-chapters first explain the traditional ring buffer design and then 
ring buffer cleanup management. 
4.3.1 Ring buffer design 
The ring buffers for both the transmitter (TX) and the receiver (RX) are by default initialized 
to 128 slots. Each slot contains a 64-bit control or status word and a 64-bit 
pointer buffer. The control word provides flags for the TX ring buffer that indicate the value, 
size and type of the packet mapped into the pointer buffer. Similarly, the status 
word gives the size and the checksum value. The pointer buffer points to the mapped memory 
address of the transmitted or received packet. Figure 4.2 shows the ring 
layout visually. 
In the control word, bits 0 to 14 are reserved for the length of the packet at the 
pointer address. Bits 15 to 20 hold the start offset of the checksum. Bits 21 to 
28 hold the offset at which the checksum is placed. Bit 29 indicates whether the checksum is 
enabled. If bit 30 is set, the buffer is the last segment of the message sent. If bit 
31 is set, the buffer is the first segment of the message. When bit 32 
is set, an IRQ signal is sent to the processor. 
In the status word, bits 0 to 15 are reserved for the TCP checksum. Bits 16 to 30 indicate 
the size of the packet received. The remaining bits of the status word carry the hash filter 
result, the alternate MAC address match and the CRC error indication. 
[Figure 4.2: layout of one TX ring slot and one RX ring slot. In each slot, bits 0 to 63 hold the 
control word (TX) or status word (RX) and bits 64 to 127 hold the pointer buffer. 
Control word fields: bits 0-14 TXDCTRL_BUFSZ; bits 15-20 TXDCTRL_CSTART; 
bits 21-28 TXDCTRL_COFF; bit 29 TXDCTRL_CENAB; bit 30 TXDCTRL_EOF; 
bit 31 TXDCTRL_SOF; bit 32 TXDCTRL_INTME; bit 33 TXDCTRL_NOCRC; bits 34-63 undefined. 
Status word fields: bits 0-15 RXDCTRL_TCPCSUM; bits 16-30 RXDCTRL_BUFSZ; 
bits 31-63 flag bits for TCP.] 
Figure 4.2 Visual descriptions of TX and RX ring buffer 
In the SunGEM driver, struct gem_txd represents one slot of the TX ring buffer and 
struct gem_rxd represents one slot of the RX ring buffer. An array txd[], which contains 
elements of type gem_txd, and an array rxd[], which contains elements of type gem_rxd, 
represent the TX and RX ring buffers. 
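The slot layout described in this sub-chapter can be sketched as below. The structure and helper names (`sketch_txd`, `sketch_rxd`, `sketch_next_slot()`, `sketch_rx_len()`, `sketch_rx_csum()`) are hypothetical, and the mask values are derived only from the bit ranges stated in the text, so this should be read as an illustration of the layout rather than as the driver's own definitions.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_RING_SIZE 128  /* default slot count stated in the text */

/* One TX slot: 64-bit control word plus 64-bit buffer pointer. */
struct sketch_txd {
    uint64_t control_word;
    uint64_t buffer;
};

/* One RX slot: 64-bit status word plus 64-bit buffer pointer. */
struct sketch_rxd {
    uint64_t status_word;
    uint64_t buffer;
};

/* Status word masks derived from the bit ranges given in the text. */
#define SKETCH_RXDCTRL_TCPCSUM 0x000000000000ffffULL  /* bits 0-15  */
#define SKETCH_RXDCTRL_BUFSZ   0x000000007fff0000ULL  /* bits 16-30 */

/* Advance a ring index, wrapping from the last slot back to slot 0. */
static unsigned sketch_next_slot(unsigned i)
{
    assert(i < SKETCH_RING_SIZE);
    return (i + 1) & (SKETCH_RING_SIZE - 1);
}

/* Extract the received packet size from an RX status word. */
static unsigned sketch_rx_len(uint64_t status)
{
    return (unsigned)((status & SKETCH_RXDCTRL_BUFSZ) >> 16);
}

/* Extract the TCP checksum field from an RX status word. */
static unsigned sketch_rx_csum(uint64_t status)
{
    return (unsigned)(status & SKETCH_RXDCTRL_TCPCSUM);
}
```

Because the ring size is a power of two, the wrap-around can be done with a bit mask instead of a modulo operation.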
To implement an M-VIA device driver, it is important to know 
how to identify the first and the last segment of a VIA packet sent to the receiver and how to 
identify the size of a segment. To mark the first segment of the message delivered, the 
driver sets the TXDCTRL_SOF value in the segment. For the last segment, the TXDCTRL_EOF 
value is set in the segment. The size of the VIA segment is also inserted, and is read back 
by masking with TXDCTRL_BUFSZ. The VIA protocol provides neither a checksum nor a 
cyclic redundancy code to prevent packet corruption. Therefore, only bits 0 to 14, 30 and 31 
of the TX descriptor are used for the M-VIA implementation. 
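A minimal sketch of how such a control word could be composed is given below. The helper `sketch_via_tx_ctrl()` is hypothetical, and the mask values are not copied from the driver header; they are derived from the bit positions stated above (bits 0 to 14, bit 30 and bit 31).

```c
#include <assert.h>
#include <stdint.h>

/* Mask values derived from the bit positions given in the text. */
#define TXDCTRL_BUFSZ 0x0000000000007fffULL  /* bits 0-14: segment length */
#define TXDCTRL_EOF   0x0000000040000000ULL  /* bit 30: last segment      */
#define TXDCTRL_SOF   0x0000000080000000ULL  /* bit 31: first segment     */

/* Build the TX control word for one VIA segment (hypothetical helper).
 * Only the length, SOF and EOF fields are used; the checksum fields
 * stay zero because VIA does not use them. */
static uint64_t sketch_via_tx_ctrl(uint32_t len, int first, int last)
{
    uint64_t ctrl;

    assert(len <= TXDCTRL_BUFSZ);  /* length must fit in 15 bits */
    ctrl = (uint64_t)len & TXDCTRL_BUFSZ;
    if (first)
        ctrl |= TXDCTRL_SOF;
    if (last)
        ctrl |= TXDCTRL_EOF;
    return ctrl;
}
```

A message that fits in a single segment would carry both flags, while the middle segments of a longer message would carry only the length.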
4.3.2 Ring buffer cleanup 
Another important issue in managing the ring buffer is to 
clean up the TX ring after the controller has sent the packet. This ensures that VIA 
segments will not be sent during TCP/IP transmission, and that TCP segments will not be sent 
during VIA transmission. The TX ring cleanup process in M-VIA is performed by the 
VIPK_ERING_SEND_UNMAP() function. This function unmaps the memory address 
of the packet sent and clears the buffer pointer and the control word of the TX ring slot. 
VIPK_ERING_SEND_UNMAP() is called by the gem_tx() inline function and the 
gem_clean_rings() procedure in the SunGEM driver. The gem_tx() function is called every 
time the device finishes sending packets. The gem_clean_rings() procedure is called when 
the device is closed. 
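The cleanup step can be illustrated with the following sketch. It is not the body of VIPK_ERING_SEND_UNMAP(); the names `sketch_txd` and `sketch_tx_clean()` and the two index arguments are assumptions, and the DMA unmapping that a real driver performs is reduced to a comment.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_RING_SIZE 128

/* One TX slot, as described in section 4.3.1 (hypothetical name). */
struct sketch_txd {
    uint64_t control_word;
    uint64_t buffer;
};

/* Clear every slot from *clean up to, but not including, done, where
 * done is the first slot the controller has not yet sent. Returns the
 * number of slots cleared and advances *clean to done. */
static unsigned sketch_tx_clean(struct sketch_txd *ring,
                                unsigned *clean, unsigned done)
{
    unsigned n = 0;

    assert(*clean < SKETCH_RING_SIZE && done < SKETCH_RING_SIZE);
    while (*clean != done) {
        /* A real driver would unmap the packet's memory address here. */
        ring[*clean].buffer = 0;
        ring[*clean].control_word = 0;
        *clean = (*clean + 1) & (SKETCH_RING_SIZE - 1);
        n++;
    }
    return n;
}
```

Clearing the control word removes any stale SOF, EOF and length values, so a slot last used for VIA cannot be misread when it is reused for TCP/IP, and vice versa.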
4.4 Packet Type Management 
Since the device driver must serve both the TCP/IP and VIA protocols, 
the driver needs to differentiate the type of packet to be delivered and run 
a different protocol function for each type of packet. The M-VIA user and kernel 
agents also have different types of packets for establishing connections, for message 
transmission and for acknowledging received packets. The following sub-chapters 
explain how the M-VIA device driver, the M-VIA user agent and the M-VIA kernel agent 
differentiate each type of packet in a transmission. 
4.4.1 Packet type management in the M-VIA Device driver 
The capability to differentiate the VIA and TCP/IP protocols is provided by a variable 
of the enumerated type VIPK_ERING_TX_TYPE, called TxType, in the device driver. This 
type was defined by the M-VIA developer team at Berkeley Lab [2]. The following 
listing shows the values defined by VIPK_ERING_TX_TYPE. 
typedef enum { 
    VIPK_ERING_TX_TRADITIONAL, 
    VIPK_ERING_TX_VIA_UR, 
    VIPK_ERING_TX_VIA_RD, 
    VIPK_ERING_TX_VIA_IGNORE, 
    VIPK_ERING_TX_VIA_CONTROL 
} VIPK_ERING_TX_TYPE; 
When TxType is set to VIPK_ERING_TX_TRADITIONAL, the driver runs traditional 
transmission, which means the packets sent are TCP/IP packets. The remaining values of 
VIPK_ERING_TX_TYPE are associated with the VIA protocol. VIPK_ERING_TX_VIA_UR is 
used when VIA is set to Unreliable Delivery and VIPK_ERING_TX_VIA_RD is associated with 
Reliable Delivery. VIPK_ERING_TX_VIA_IGNORE means that the type of the transmitted 
packet is ignored. This type is usually used for packet headers. 
VIPK_ERING_TX_VIA_CONTROL is used during VIA connection establishment. 
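As an illustration of how a dual-protocol driver might branch on TxType, consider the sketch below. The enumeration repeats the listing above; `sketch_path` and `sketch_tx_path()` are hypothetical names introduced only for this example and do not appear in M-VIA.

```c
#include <assert.h>

typedef enum {
    VIPK_ERING_TX_TRADITIONAL,
    VIPK_ERING_TX_VIA_UR,
    VIPK_ERING_TX_VIA_RD,
    VIPK_ERING_TX_VIA_IGNORE,
    VIPK_ERING_TX_VIA_CONTROL
} VIPK_ERING_TX_TYPE;

/* Hypothetical tag for the two code paths in a dual-protocol driver. */
enum sketch_path { SKETCH_PATH_TCPIP, SKETCH_PATH_VIA };

/* Select the code path for a TX slot from its TxType. */
static enum sketch_path sketch_tx_path(VIPK_ERING_TX_TYPE type)
{
    switch (type) {
    case VIPK_ERING_TX_TRADITIONAL:
        return SKETCH_PATH_TCPIP;   /* ordinary TCP/IP transmission */
    case VIPK_ERING_TX_VIA_UR:      /* Unreliable Delivery */
    case VIPK_ERING_TX_VIA_RD:      /* Reliable Delivery */
    case VIPK_ERING_TX_VIA_IGNORE:  /* e.g. packet headers */
    case VIPK_ERING_TX_VIA_CONTROL: /* connection establishment */
        return SKETCH_PATH_VIA;
    }
    assert(!"unknown TxType");
    return SKETCH_PATH_TCPIP;
}
```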
4.4.2 Packet type management in M-VIA Kernel Agent 
Different packet types are also processed differently in the M-VIA Kernel Agent. 
Flags are added to each packet so that a different action can be taken for each packet 
type. The flags used by the M-VIA Kernel Agent are listed and explained 
below. 
• VIPK_CONTROL_LAST_FRAG. This flag identifies the last VIA segment of a packet. 
The M-VIA Kernel Agent also specifies which segment is the first segment and 
which segment is the last segment of the packet sent, so that the kernel knows when to 
start and end processing of the packet. This may seem redundant, since the 
SunGEM device driver already differentiates between the first and last segments. 
However, different device drivers have different mechanisms for identifying the first 
and last segments of packets, and thus M-VIA has to have its own identification 
mechanism. 
• VIPK_CONTROL_FIRST_FRAG. This flag identifies the first segment of a VIA packet. 
• VIPK_CONTROL_OP_ACK. An acknowledgement segment is sent every time the receiver 
has completed receiving one segment of a packet. 
• VIPK_CONTROL_IS_NACK. A segment that is not used as an acknowledgement 
segment is identified as a Non-Acknowledgement segment. 
• VIPK_CONTROL_OP_CTRL. This type of packet is used in the connection declaration. 
4.4.3 Packet type management in M-VIA User Agent 
VIA has two data transfer models: Send-Receive and RDMA. RDMA 
comprises RDMA write and RDMA read. The M-VIA user agent differentiates the data transfer 
models using flags inside the data segments. The flags for the data 
transfer models are explained below. 
• VIP_CONTROL_OP_SENDRECV. No RDMA process runs on the data transfer. Both the 
sender and the receiver register their own buffers. The receiver places data received 
from the sender in a new, unused buffer. 
• VIP_CONTROL_OP_RDMAWRITE. The sender knows the destination memory address and 
can write messages without consuming the remote receive queue. 
• VIP_CONTROL_OP_RDMAREAD. The receiver knows the remote source memory 
address and can read messages without consuming the remote send queue. 
RDMA read has not been implemented in the 
current version of M-VIA. 
4.5 Send 
When packets are ready to be sent, the memory address of the packets is mapped to 
the ring buffer. Different segments from one packet are placed in each slot of the ring buffer 
and identified by the flags introduced in the previous sub-chapters. 
Packets are sent by the functions 
VIPK_ERING_SEND_HDR() and VIPK_ERING_SEND_SEG(). 
VIPK_ERING_SEND_HDR() maps the header of a packet to the ring buffer in order to identify 
the sequence number of packets. The header has to be the first segment of the packet. The 
other segments are mapped by the function VIPK_ERING_SEND_SEG(). 
Each time the device sends a packet, the driver calls a function to clean up the 
ring buffer. In the SunGEM driver, this is done by the function gem_tx(). Inside this 
function, VIPK_ERING_SEND_UNMAP() is called to unmap the VIA packet memory 
address mapped in the ring buffer, as explained in section 4.3.2. 
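The mapping order described above (header first, then the remaining segments) can be sketched as follows. This is an assumption-laden illustration, not the code of VIPK_ERING_SEND_HDR() or VIPK_ERING_SEND_SEG(): the names `sketch_txd`, `sketch_seg` and `sketch_send_packet()` are hypothetical, the mask values are derived from the bit positions in section 4.3.1, and the check for free ring space that a real driver needs is omitted.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_RING_SIZE 128
#define TXDCTRL_BUFSZ 0x0000000000007fffULL  /* bits 0-14 */
#define TXDCTRL_EOF   0x0000000040000000ULL  /* bit 30    */
#define TXDCTRL_SOF   0x0000000080000000ULL  /* bit 31    */

struct sketch_txd { uint64_t control_word; uint64_t buffer; };

/* A mapped memory region: bus address and length (hypothetical type). */
struct sketch_seg { uint64_t addr; uint32_t len; };

/* Place the header and then nseg data segments in consecutive TX slots
 * starting at head. The header slot is flagged SOF and the final slot
 * EOF. Returns the index of the next free slot. */
static unsigned sketch_send_packet(struct sketch_txd *ring, unsigned head,
                                   struct sketch_seg hdr,
                                   const struct sketch_seg *segs,
                                   unsigned nseg)
{
    unsigned i;

    assert(head < SKETCH_RING_SIZE && nseg < SKETCH_RING_SIZE);
    assert(hdr.len <= TXDCTRL_BUFSZ);

    /* The header is always the first segment of the packet. */
    ring[head].buffer = hdr.addr;
    ring[head].control_word = hdr.len | TXDCTRL_SOF;
    if (nseg == 0)
        ring[head].control_word |= TXDCTRL_EOF;
    head = (head + 1) & (SKETCH_RING_SIZE - 1);

    for (i = 0; i < nseg; i++) {
        assert(segs[i].len <= TXDCTRL_BUFSZ);
        ring[head].buffer = segs[i].addr;
        ring[head].control_word = segs[i].len;
        if (i == nseg - 1)
            ring[head].control_word |= TXDCTRL_EOF;
        head = (head + 1) & (SKETCH_RING_SIZE - 1);
    }
    return head;
}
```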
4.6 Receive 
The receive process is performed by the function VipkERingRecv(), which is called by 
the function gem_rx() in the SunGEM driver after the device detects a receive interrupt 
signal. This function takes the mapped memory from the receive buffer and passes it to user 
space. The receive ring buffer is then cleaned so that it is ready for the next received 
packet. 
4.7 Deregistration 
M-VIA device deregistration is performed by the VipkERingDeregister() function, 
called in the gem_close() function of the SunGEM device driver. VipkERingDeregister() 
removes the SunGEM network device pointer from the array of registered devices in the 
M-VIA kernel agent. As a result, no further M-VIA processing runs on the Sun GEM device. 
CHAPTER 5. TEST METHODOLOGY 
5.1 Test Platform 
All tests were run on two Apple Power Macintosh G4 systems, each with a 450 MHz PowerPC 7400 
CPU with AltiVec support and 512 Mbytes of memory. The OS used was 
Yellow Dog Linux with kernel version 2.4.19. Tests were conducted on three different 
network controllers: the Intel Pro 100 Fast Ethernet card, the SysKonnect Gigabit Ethernet card 
and the Sun GEM Fast and Gigabit Ethernet network controller built into PowerMac G4 systems. 
5.2 Vnettest 
Vnettest is a simple test code included with M-VIA. The original vnettest used the 
unreliable delivery level. The vnettest code was slightly modified for the purpose of testing. 
It tests only the send/receive data transfer mode and omits RDMA. In effect, vnettest 
performs a "ping" on top of VIA using different packet sizes. Since a 
driver built for VIA must serve TCP/IP at the same time, a simple test for 
TCP/IP was also run with the ping program included with the Linux system. 
5.3 MP_Lite 
The tests were run using MP_Lite [6], which was built by Turner and Chen. MP_Lite is a 
lightweight message-passing library designed to deliver the best performance on many 
different types of protocols. MP_Lite supports only a few of the basic MPI functions, 
including blocking and asynchronous sends and receives and common global operations such 
as broadcast, synchronization, sum, min and max. It does not offer more advanced features 
such as communicators other than MPI_COMM_WORLD, derived data types or advanced 
I/O functions. Using MP_Lite, MPI applications can run unmodified on top of TCP or 
VIA (GigaNet hardware or M-VIA on Gigabit Ethernet) on PC/workstation clusters, and can use 
the high-performance native SHMEM library on Cray T3E and SGI Origin systems and on SMP 
machines. 
M-VIA was tested with MP_Lite using the reliable delivery level for data 
transmission. MP_Lite uses RDMA for messages longer than 16 Kbytes 
and traditional send-receive data transmission for shorter messages. 
5.4 NetPIPE 
NetPIPE [14] is a bandwidth and latency measurement tool developed by Snell, 
Mikler, Gustafson and Helmer. It provides simple ping-pong tests, sending and receiving 
messages of exponentially increasing size between two nodes. NetPIPE can be run across a 
network or within an SMP system. NetPIPE provides tests for different message-passing 
interfaces such as MPICH, MVICH, MP_Lite, PVM and TCGMSG, and for the 1-sided 
message-passing standards MPI-2 and SHMEM. For this research, NetPIPE was run only 
on MP_Lite. NetPIPE determines the bandwidth and latency for each packet 
size using the chosen message-passing interface and network protocol (i.e., VIA or 
TCP/IP). The results were saved in output files containing rows of message sizes, bandwidth 
and latency. This information was used to plot the bandwidth graphs. 
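The arithmetic behind a ping-pong measurement can be sketched as follows. This is not NetPIPE's code: the function names are hypothetical, the one-way time is assumed to be half of the measured round-trip time, and the message sizes are assumed to double at each step as a simple form of exponential growth.

```c
#include <assert.h>
#include <stddef.h>

/* One-way latency in microseconds, taken as half the round-trip time. */
static double sketch_latency_us(double rtt_us)
{
    assert(rtt_us > 0.0);
    return rtt_us / 2.0;
}

/* Throughput in Mbps for a message of the given size. Bits per
 * microsecond is numerically equal to megabits per second. */
static double sketch_mbps(size_t bytes, double rtt_us)
{
    return ((double)bytes * 8.0) / sketch_latency_us(rtt_us);
}

/* Fill sizes[] with start, 2*start, 4*start, ... not exceeding max.
 * Returns the number of sizes written (at most cap). */
static size_t sketch_sizes(size_t start, size_t max,
                           size_t *sizes, size_t cap)
{
    size_t n = 0;
    size_t s = start;

    assert(start > 0);
    while (s <= max && n < cap) {
        sizes[n++] = s;
        if (s > max / 2)
            break;  /* next doubling would exceed max */
        s *= 2;
    }
    return n;
}
```

For example, a 1250000-byte message with a measured round trip of 200000 µs gives a one-way time of 100000 µs and a throughput of 100 Mbps.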
CHAPTER 6. RESULTS 
[Figure 6.1: plot of bandwidth in Mbps (axis 10 to 100) versus message size in bytes 
(logarithmic axis, 1 to 10000000). Series: eepro-TCPIP, eepro-VIA.] 
Figure 6.1 Bandwidth graphs of VIA and TCP-IP on Intel Pro 100 
Figure 6.1 compares bandwidth versus message size for 
MP_Lite M-VIA and TCP/IP running on the Intel Pro 100 Fast Ethernet card. The 
graph suggests that VIA performed slightly better than TCP/IP in both latency and 
bandwidth. Although VIA's bandwidth is slightly lower starting at messages of about 8000 
bytes, it overtakes TCP/IP for messages above 90000 bytes. TCP/IP has a maximum 
bandwidth of 89.75 Mbps with a latency of 58.87 µs. MP_Lite M-VIA has a maximum bandwidth 
of 91.09 Mbps with a latency of 48.47 µs. Both TCP/IP and MP_Lite M-VIA on PowerPC have 
higher latency than on Intel processors. 
[Figure 6.2: plot of bandwidth in Mbps (axis 0 to 900) versus message size in bytes 
(logarithmic axis, 1 to 10000000). Series: sk98-TCPIP-1500, sk98-VIA-1500, 
sk98-TCPIP-9000, sk98-VIA-9000.] 
Figure 6.2 Bandwidth graphs of VIA and TCP-IP on SysKonnect 
Figure 6.2 compares VIA and TCP/IP on SysKonnect SK-98xx Gigabit Ethernet 
cards. With a Maximum Transmission Unit (MTU) of 1500 bytes, VIA's 
latency and bandwidth are similar to those of TCP/IP for messages below 90000 bytes, but VIA 
reaches higher bandwidth for messages above 90000 bytes. TCP/IP has a latency of 40.64 µs and 
MP_Lite M-VIA has a latency of 43.60 µs. With an MTU of 9000 bytes (jumbo frames), 
VIA performed worse than TCP/IP for messages below 260 Kbytes, but again gave higher 
bandwidth above that message size. MP_Lite M-VIA has a latency of 43.99 µs and TCP/IP has a 
latency of 41.48 µs. The throughputs of TCP/IP and MP_Lite M-VIA are 830.70 Mbps and 
845.96 Mbps, respectively. With both the 1500-byte and the 9000-byte MTU, MP_Lite 
M-VIA has higher latency than TCP/IP. The findings do not show the 
lower (better) latency expected of VIA for small message sizes (i.e., below 100 
Kbytes) compared with TCP/IP. This result raises questions that are discussed in the next 
chapter. 
[Figure 6.3: plot of bandwidth in Mbps versus message size in bytes 
(logarithmic axis, 1 to 1000000). Series: fast-sungem-TCPIP, fast-sungem-VIA.] 
Figure 6.3 Bandwidth graphs of VIA and TCP-IP on Sun GEM Fast Ethernet controller 
[Figure 6.4: plot of bandwidth in Mbps (axis 100 to 800) versus message size in bytes 
(logarithmic axis, 1 to 1000000). Series: gigabit-sungem-TCP, gigabit-sungem-VIA.] 
Figure 6.4 Bandwidth graphs of VIA and TCP-IP on Sun GEM Gigabit Ethernet 
controller 
The results for VIA and TCP/IP on the Sun GEM Fast Ethernet controller are 
presented in Figure 6.3. VIA on the Fast Ethernet controller had 
higher latency and lower bandwidth than TCP/IP, and performed better than TCP/IP only 
when the message size reached 390 Kbytes and above. The latencies of TCP/IP 
and MP_Lite M-VIA are 85.33 µs and 192.48 µs, respectively. With the built-in Sun GEM Gigabit 
Ethernet controller, shown in Figure 6.4, MP_Lite M-VIA has better bandwidth but worse 
latency than TCP/IP when the message size is above 1500 bytes. TCP/IP has a 
latency of 61.68 µs and MP_Lite M-VIA has a latency of 134.03 µs. The throughputs of MP_Lite 
M-VIA and TCP/IP are 690.32 Mbps and 486.25 Mbps, respectively. Again, VIA does not perform 
well for small message sizes, but its results improve for large message sizes 
(i.e., above 100 Kbytes). 
CHAPTER 7. DISCUSSION 
Our research measured the performance of VIA on the PowerPC architecture, since 
previous research had shown that VIA performed better than TCP/IP on Intel processors 
but not on Alpha processors [2][12]. 
M-VIA version 1.2b2 is designed and tested only for Intel and Alpha processors, 
which use little endian byte order. By adding a byte synchronization function, M-VIA has 
been successfully ported to Apple's G4 systems, which use big endian byte order. M-VIA 
can now be run on both big and little endian architectures. M-VIA is also 
designed to be easily portable to new network drivers. The Sun GEM device driver was 
successfully adapted through the addition of the functions required for M-VIA 
functionality. This implementation adds one more choice of network controller for M-VIA. 
In theory, VIA should give lower latency and higher bandwidth, since 
OS bypass should use fewer CPU cycles. However, our results show turning points: 
VIA gave better performance only for large message sizes on every network 
interface we used. TCP/IP, the traditional protocol in cluster and parallel computing 
systems, would still be considered to perform better than M-VIA for small message sizes 
(below 100 Kbytes). Our results show that MP_Lite M-VIA performs better 
on the Intel Pro 100 in bandwidth but not in latency. Both the SysKonnect SK-98xx and the Sun 
GEM Gigabit Ethernet controllers gave promising bandwidth performance with 
large message sizes. 
It was well understood that bypassing the operating system should give better 
performance with the VIA protocol. However, the results show that M-VIA on PowerPC 
performed better only for large message sizes. 
It is possible that the reduction in performance is caused by the different 
processor architecture of the PowerPC. It is also possible that TCP/IP has steadily 
improved in the current kernel implementation of the communication layers (i.e., the 
Open Systems Interconnection (OSI) layers). The kernel implementation of TCP/IP continues 
to be updated, which will make TCP/IP more competitive with VIA, especially for small data 
transmissions. 
It should be noted that the implementation of M-VIA on PowerPC was not done with 
assembly coding as it was in the Intel implementation. The assembly code for Intel makes 
it possible to use hardware fast trap interrupt signals to speed up sending 
and receiving VIA descriptors. This limitation may be one of the reasons why 
VIA performs worse than TCP/IP in this research. M-VIA would need further updates to become 
more reliable and more competitive with TCP/IP. 
More NIC implementations should be built and tested to see how latency and 
bandwidth performance depends on the NIC when using VIA. The reliable reception level 
and RDMA read have not yet been implemented. These need to be implemented in 
order to give M-VIA more capabilities. 
BIBLIOGRAPHY 
[1] Compaq Computer Corporation, Intel Corporation and Microsoft Corporation. "Virtual 
Interface Architecture Specification." December 4, 1997. 
[2] National Energy Research Scientific Computing Center (NERSC). "M-VIA: A High 
Performance Modular VIA for Linux." December 13, 2000. 
http://www.nersc.gov/research/FTG/via/. (Date retrieved: July 7, 2003). 
[3] Linux Cluster Institute (LSI). "M-VIA and MVICH: Status and Future Plans." 
http://www.linuxclustersinstitute.org/Linux-HPC-Revolution/Archive/PDF00/Welcome.pdf. 
(Date retrieved: October 1, 2003). 
[4] Panella, Marco. "ParMa2: porting VIA in LAM/MPI." October 11, 2001. 
 (Date retrieved: August 29, 2003). 
[5] Emmen, Ad. "Terra Soft Offers Portable, 8-node PowerPC Linux Clusters." Terra Soft 
Solutions, Inc. September 7, 2000. http://www.terrasoftsolutions.com/news/2000-09-07.shtml. 
(Date retrieved: May 3, 2003). 
[6] Turner, Dave; Chen, Weiyi and Kendall, Ricky. "Performance of the MP_Lite 
message-passing library on Linux clusters." Scalable Computing Laboratory, Ames Laboratory. 
http://www.scl.ameslab.gov/Projects/MP_Lite/index.html. (Date retrieved: February 15, 
2003). 
[7] Terra Soft. "The briQ." http://www.terrasoftsolutions.com/products/briQ/. (Date 
retrieved: October 24, 2003). 
[8] Scalable Computing Laboratory, Ames Laboratory. "PPC Cluster." 
http://www.scl.ameslab.gov/Projects/PPC_cluster/. (Date retrieved: October 20, 2003). 
[9] Every, David K. "Hardware Dojo Learning about Computer Hardware." 1999. 
http://www.mackido.com/Hardware/. (Date retrieved: October 19, 2003). 
[10] Every, David K. "What is AltiVec." 
http://www.mackido.com/Hardware/AltiVecVsMMX.html. (Date retrieved: November 3, 2003). 
[11] Sullivan, Wilf. "Virtual Interface Architecture Primer." DY 4 Systems Inc. 
http://www.omimo.be/Magazine/OOgI/2000g1~012.pdf. (Date retrieved: October 10, 2003). 
[12] Chen, Weiyi. "Implementation of MP_Lite for the VI Architecture." Iowa State 
University, Dept. of Computer Science. 2001. 
[13] University of Chicago. "MPICH-A Portable Implementation of MPI." 
http://www-unix.mcs.anl.gov/mpi/mpich/. (Date retrieved: November 14, 2003). 
[14] Snell, Quinn O.; Mikler, Armin R.; Gustafson, John L.; Helmer, Guy. "NetPIPE: a 
Network Protocol Independent Performance Evaluator." Scalable Computing Laboratory, 
Ames Laboratory. http://www.scl.ameslab.gov/Projects/NetPIPE/index.html. (Date 
retrieved: December 20, 2002). 
[15] Bovet, Daniel and Cesati, Marco. "Understanding the Linux Kernel." O'Reilly. 2001. 
[16] McClanahan, Kip. "PowerPC Programming for Intel Programmers." IDG Books. 1995. 
[17] Rubini, Alessandro and Corbet, Jonathan. "Linux Device Drivers, 2nd Edition." O'Reilly. 
2001. 
[18] Sorensen, Stefan. "Low Latency Cluster Communication." September 26, 2000. 
http://www.daimi.au.dk/~pmn/sc~0/pro4/report.html. (Date retrieved: July 10, 2003). 
[19] National Energy Research Scientific Computing Center (NERSC). "MVICH - MPI for Virtual 
Interface Architecture." September 1, 2000. http://www.nersc.gov/research/FTG/mvich/. 
(Date retrieved: May 24, 2003). 
