AUTOMATED TESTING OF 10GbE DEVICES by Avramović, Nikola
  
 
 
 
 
 
 
 
 
 
 
 
VYSOKÉ UČENÍ TECHNICKÉ V BRNĚ 
BRNO UNIVERSITY OF TECHNOLOGY 
 
 
FAKULTA ELEKTROTECHNIKY 
A KOMUNIKAČNÍCH TECHNOLOGIÍ 
FACULTY OF ELECTRICAL ENGINEERING AND COMMUNICATION 
 
 
ÚSTAV MIKROELEKTRONIKY 
DEPARTMENT OF MICROELECTRONICS 
 
 
 
 
 
 
AUTOMATICKÉ TESTOVÁNÍ 10GBE ZAŘÍZENÍ 
AUTOMATED TESTING OF 10GBE DEVICES 
 
 
 
BAKALÁŘSKÁ PRÁCE 
BACHELOR'S THESIS 
 
AUTOR PRÁCE 
AUTHOR 
 
Nikola Avramović 
VEDOUCÍ PRÁCE 
SUPERVISOR 
doc. Ing. Lukáš Fujcik, Ph.D. 
BRNO 2016   
 
 ABSTRACT 
This thesis deals with the design of a functional verification model and a synthesizable 
tester for 10Gb Ethernet devices that use the XGMII interface. The model is described in 
VHDL. The work consists of the creation of a bus functional model and the design of a 
tester that is implemented as a generic self-test module. The resulting design allows 
verification and testing of the PHY and MAC layers. The tester was implemented on the 
DE5-Net development board, which is fitted with a Stratix V FPGA by Altera. 
 
KEYWORDS 
Verification, Ethernet, IEEE, OSI, BFM, Generator, Monitor, DUT, PHY, MAC, XGMII, 
FPGA, TB, IP, Self-checking, SFP+, Synthesis 
 
 
 
 
 
 
 
ABSTRAKT 
Tato práce se zabývá návrhem modelu pro funkční verifikaci a návrhem syntetizovatelného 
testru 10Gb Ethernet zařízení, které používají XGMII rozhraní. Pro popis modelu je použit 
programovací jazyk VHDL. Práce zahrnuje vytváření bus functional modelu a návrh testru, 
který se implementuje jako genericky self-test modul. Výsledný návrh umožňuje verifikaci a 
testování PHY a MAC vrstve. Pro implementaci testru byla použita vývojová deska DE5-Net 
osazena FPGA obvodem Stratix V GX od firmy Altera. 
 
 
KLÍČOVÁ SLOVA 
Verifikace, Eternet, IEEE, OSI, FMS, Generator, Monitor, DUT, PHY, MAC, XGMII, 
FPGA, TB, IP, Vlastní-kontrola, SFP+, Syntetizace  
  
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
AVRAMOVIĆ, N. Automatické testování 10GbE zařízení. Brno: Vysoké učení 
technické v Brně, Fakulta elektrotechniky a komunikačních technologií, 2016. 50 s. 
Vedoucí bakalářské práce doc. Ing. Lukáš Fujcik, Ph.D.  
  
STATEMENT OF ORIGINALITY 
The work contained in this thesis has not been previously submitted for a degree or 
diploma at any other higher education institution. To the best of my knowledge and 
belief, the thesis contains no material previously published or written by another person 
except where due references are made.  
 
 
 
V Brně dne 30. 5. 2016    …………………………………. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
ACKNOWLEDGEMENT 
I give my thanks to doc. Ing. Lukáš Fujcik, Ph.D. for directing and guiding my work. 
Additionally, I give my thanks to the TTTech Computertechnik AG company for 
providing me with the devices and the information which helped in making this 
thesis. 
 
 
 
V Brně dne 30. 5. 2016    …………………………………. 
 
CONTENTS 
Introduction 
1 Theoretical analysis 
1.1 Digital design 
1.1.1 Types of abstraction 
1.1.2 Hardware description languages (HDL) 
1.1.3 Classification of device technologies 
1.2 Verification 
1.2.1 Verification plan 
1.2.2 Types of Verification 
1.2.3 Types of device under test 
1.2.4 Hardware Verification Languages (HVL) 
1.2.5 Coverage 
1.3 Design methodologies 
1.3.1 Bottom-up 
1.3.2 Top-down 
1.4 Test bench (TB) 
1.4.1 Raw test vectors (RTV) 
1.4.2 Complete functional model (CFM) 
1.4.3 Bus functional model (BFM) 
1.4.4 Stimulus Generator 
1.4.5 Response Monitor 
1.4.6 Self-Checking 
1.4.7 Types of BFM tests 
1.5 Ethernet 
1.5.1 Physical layer (PHY) 
1.5.2 Reconciliation Sublayer (RS) and 10 Gigabit Media Independent Interface (XGMII) 
1.5.3 XGMII data stream 
1.5.4 Data link layer 
1.5.5 Media Access Control (MAC) 
1.6 Small Form Factor Pluggable Module (SFP+) 
1.7 Altera Stratix® V GX FPGA (5SGXEA7N2F45C2) 
1.7.1 Intellectual property core 
2 Practical part 
2.1 Verification IP module 
2.1.1 Structure of verification IP module 
2.1.2 Generator (TX_MODEL) 
2.1.3 Monitor (RX_MODEL) 
2.2 Synthesizable tester 
2.2.1 Generator 
2.2.2 Monitor 
2.2.3 Functional verification of synthesizable design 
3 Conclusion 
Literature 
Definitions and Acronyms 
Figures 
APPENDIX 
INTRODUCTION 
With increasing demands for higher data throughput, it became evident that more complex 
digital devices were needed to satisfy these demands. Each of these devices is required 
to pass a rigorous verification process before it is released to the market. Devices need to 
pass multiple stages of verification, and every stage is equally important. The 
whole verification process is done using specialized computer software, which initially 
was not capable enough to support this kind of in-depth testing, so it had to be 
significantly improved in order to be viable as a testing environment. The key to 
successful verification is strict adherence to the standards, which go into detail on how 
the process should be undertaken. 
It is estimated that verification and testing represent as much as half of all the 
work that goes into completing a project. As mentioned before, there are multiple 
stages of verification that a newly created IP needs to go through. One of the initial 
stages of the process is called “functional verification”. The main goal of this stage is to 
determine that the circuit functions as specified. If the circuit passes this 
initial stage, it can proceed to the netlist simulation stage. Netlist 
simulation of the circuit is done on the logic level, and it is typically 
done when an application-specific integrated circuit (ASIC) is being produced. Finally, 
hardware tests come after the manufacturing of the ASIC. Their purpose is to test the 
frequency, voltage levels, interference, etc. 
The main goal of the thesis is to verify a non-synthesized Intellectual Property (IP) 
module and to test the implemented IP module on an FPGA circuit. Verification and 
testing are done on an IP module that has an XGMII interface. Altera is the company that 
offers IP modules for the Stratix V GX FPGA. The goal of this thesis can be divided 
into two parts. The first part consists of the design of a non-synthesizable bus 
functional model, which is written in VHDL. 
The second part consists of the design of a synthesizable tester and its 
implementation in the FPGA. The IP models are used by the company TTTech 
Computertechnik AG (hereinafter TTTech) to verify 
devices that use Deterministic Ethernet. 
The thesis is divided into five sections. Section two is purely theoretical: 
it explains Digital design, Verification, Design methodologies, 
Testbench, Ethernet, the SFP+ input module, and Altera’s FPGA kit. Section three 
consists of two subsections: the first contains the verification IP model of the generator 
and the monitor, and the second covers the design of the synthesizable tester and its 
implementation in the FPGA. 
 
1 THEORETICAL ANALYSIS 
There are many different methods used for verification and testing of integrated circuits; 
some of them are mechanical, optical, electrical, and software methods. These methods 
are used in the research and design of integrated circuits. The languages used most often 
when designing and verifying integrated circuits are Hardware Description 
Languages (HDL). The advantages of using these languages are increased production 
speed, accurate verification, and lower cost. 
Devices that are designed in an HDL need to be 
verified. This verification can have several stages. For example, after the code is written, 
a program first verifies whether the syntax is valid; afterwards, the designer 
must check the functions for errors before proceeding to more detailed checks. 
After the behavioral check is completed, the new device needs to be checked for 
delays and for whether it performs its function within the correct time interval. 
1.1 Digital design 
With increasing complexity and gate count, it became impossible for a designer to design 
a digital circuit manually, so a new way of designing digital circuits had to be 
developed. The solution to this problem was to define a hierarchical design 
abstraction. A high-level abstraction is focused and contains only the most vital data; 
a low-level abstraction is more detailed and takes account of previously ignored 
information.[1]  
Four levels of abstraction are defined in digital design: transistor level, gate level, 
register transfer level, and processor level. 
1.1.1 Types of abstraction 
Transistor level is the lowest level of abstraction, which describes the interconnection 
between basic electrical structures such as transistors, capacitors and resistors.[1]  
 
 
 
 
 
 
 
 
 
Figure 1-1: Timing characteristic of transistor and gate level [1] 
Gate level describes the interconnection between logic gates, simple logic gate 
circuits, such as and, nand, or, xor and a 1-bit 2-to-1 multiplexer, and basic memory 
elements, such as latches and flip-flops.[1]  
The signal is interpreted in binary logic (logic 0 and logic 1), which enables the use of 
Boolean logic operations and simplifies circuit calculation significantly. 
The difference between the transistor and gate levels can be seen in Figure 1-1, which 
demonstrates that the gate-level abstraction does not take level transition skews into 
account.  
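The gate-level view can be illustrated with a minimal sketch. It is written in Python rather than in the VHDL used in this thesis, and the function name mux2to1 is only an illustrative assumption. It builds the 1-bit 2-to-1 multiplexer mentioned above from and, or and not operations on the values 0 and 1, ignoring all timing:

```python
def mux2to1(a: int, b: int, sel: int) -> int:
    """1-bit 2-to-1 multiplexer built only from AND, OR and NOT on bits."""
    not_sel = 1 - sel                  # NOT gate on a single bit
    return (a & not_sel) | (b & sel)   # two AND gates feeding one OR gate


# Print the full truth table of the multiplexer.
for sel in (0, 1):
    for a in (0, 1):
        for b in (0, 1):
            print(f"sel={sel} a={a} b={b} -> y={mux2to1(a, b, sel)}")
```

With sel = 0 the output follows a, with sel = 1 it follows b; no transition skew or delay is modeled, which is exactly what the gate-level abstraction leaves out.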
Register transfer level (RTL) is an abstraction whose basic building blocks are 
functional units (e.g. adders, comparators), storage components (e.g. registers, 
flip-flops) and data-routing components (e.g. multiplexers).[1] 
“The data representation at the RTL becomes more abstract. Signals are frequently 
grouped together and interpreted as a special kind of data type, such as an unsigned 
integer or system state. The behavioral description at this level uses general expressions 
to specify the functional operation and data routing, and uses an extended finite state 
machine (FSM) to describe a system designed using RTL methodology. 
A major feature of the RTL description is the use of a common clock signal in the 
storage components. The clock signal functions as a sampling and synchronizing pulse, 
putting data into the storage component at a particular point, normally the rising or 
falling edge of the clock signal. In a properly designed system, the clock period is long 
enough so that all data signals are stabilized within the clock period. Since the data 
signals are sampled only at the clock edge, the difference in propagation delays and 
glitches have no impact on the system operation. This allows us to consider timing in 
terms of number of clock cycles rather than by keeping track of all the propagation 
delays.”([1], page 12) 
Processor level is the highest level of abstraction, where the system is described without 
taking timing and logic levels into consideration. The coding is similar to that in 
conventional languages (Java, C++, …). The basic blocks are IP modules. An IP module written 
at this level can be implemented on several parallel pieces of hardware, which can then 
communicate using a standard interface.[1] 
Note: Some sources use the term “behavioral” for this level. 
1.1.2 Hardware description languages (HDL) 
A digital system (circuit) can be described at various levels of abstraction. This 
abstraction is expressed using an HDL. An HDL, as the name implies, describes hardware. 
This makes coding in it very different from conventional programming languages. The 
most widely used HDLs are VHDL (Very High Speed Integrated Circuit HDL) and Verilog, 
both developed in the 1980s. This thesis uses VHDL. 
VHDL is used for two purposes: as input to a simulator and as input to a synthesizer. 
Input to a simulator describes a model by which a digital design 
is verified, or a test bench (an environment in which a device is connected and 
tested). This code is written at the highest abstraction level and cannot be used as 
synthesizable code.[1]  
Input to a synthesizer describes the digital circuits to be synthesized. It consists of an 
entity, which describes the input/output signals, and an architecture, where the RTL 
design is described. RTL describes basic blocks: state machines, multiplexers and logic gates. 
Modern synthesizers transform this code into a lower level of abstraction, 
the gate level or the transistor level.[1]  
1.1.3 Classification of device technologies 
There are many technologies for designing and manufacturing integrated circuits. One 
characteristic of a technology is its circuit manufacturing process.  
Production can be completed using the so-called “in the field” process, in which the 
device is configured after manufacture. 
This allows us to implement a new IP core, or to repair the old one. On the other hand, 
for some applications a new device needs to be manufactured. Such a device cannot be 
changed just by implementing a new IP core; instead, it needs to be manufactured again.[1] 
Devices are divided by their manufacturing technology: 
ASIC (Application-Specific Integrated Circuit)  
This technology is meant for one specific application. It is necessary to maintain 
complete control over the circuit during its manufacturing process; this means that the 
circuit needs to be checked at the layout level (the last level before manufacture). The 
circuit needs to be optimized for maximum performance. The ASIC manufacturing 
process is very complex and expensive, and it is mostly used in the manufacture of small 
circuits.[1] 
CPLD (Complex Programmable Logic Device) 
In this technology, a device consists of an array of generic logic cells and a general 
interconnect structure; such a circuit is programmed “in the field”. The logic cells and the 
general interconnect structure are manufactured in a way that allows for reprogramming. 
Programming the logic cells gives a higher level of abstraction: the cells can be seen 
as switches or fuses, and combining several switches yields a logic 
circuit.[1] 
A Programmable Logic Device (PLD) is normally constructed as a two-level array, with an 
and plane and an or plane. The interconnect of one or both planes can be programmed to 
perform a logic function expressed in sum-of-products format. One of the disadvantages 
of this device is that it does not have a general interconnect structure, and thus its 
functionality is severely limited. The CPLD was 
invented in order to correct this disadvantage. Logic cells in a CPLD are much more 
sophisticated than the logic cells in a PLD; they normally consist 
of a D-type flip-flop and a PAL-like unit with configurable product terms. The CPLD 
connection pattern is reprogrammable, which represents a huge step forward 
in this technology.[1] 
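The two-level and/or structure of a PLD can be illustrated with a small sketch. It is written in Python, not in the VHDL used in this thesis, and the names pld_output, AND_PLANE and OR_PLANE as well as the programmed function are hypothetical. The and plane is modeled as a list of product terms and the or plane as the selection of terms that are summed into the output:

```python
from itertools import product


def pld_output(inputs, and_plane, or_plane):
    """Evaluate one PLD output in sum-of-products form.

    and_plane: list of product terms; each term maps input index -> required bit.
    or_plane:  indices of the product terms that are ORed into this output.
    """
    products = [all(inputs[i] == bit for i, bit in term.items())
                for term in and_plane]
    return int(any(products[k] for k in or_plane))


# Hypothetical programming: y = (a AND NOT b) OR (NOT a AND b), i.e. XOR.
AND_PLANE = [{0: 1, 1: 0}, {0: 0, 1: 1}]
OR_PLANE = [0, 1]

for a, b in product((0, 1), repeat=2):
    print(a, b, pld_output((a, b), AND_PLANE, OR_PLANE))
```

Changing the contents of AND_PLANE and OR_PLANE corresponds to reprogramming the interconnect of the two planes; the evaluation function itself stays fixed, as the silicon does.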
FPGA (Field Programmable Gate Array)  
An FPGA has basically the same structure as a CPLD, with the exception that the logic cell is 
usually smaller. The main cell of an FPGA device consists of a D-type flip-flop 
and a small look-up table or a set of multiplexers. The connection between cells is 
more flexible than in a CPLD. 
The tendency when developing an ASIC is to create an IP core, which is first tested 
in an FPGA device.[1] 
1.2 Verification 
Verification is often identified with a test bench, or a series of test benches; in a broader 
sense, verification is a process that proves that the implementation is correct. Because the term 
“verification” is used differently in different applications (e.g. the word is 
also used in the testing of physical elements), this thesis will use the term to mean the 
verification of an IP core. There are 
different kinds of verification: formal verification, model verification, functional 
verification, etc.[2]  
1.2.1 Verification plan 
The success of a verification project depends on a correct approach to the verification 
plan. A good plan consists of a detailed goal, all kinds of measurements (e.g. test 
programs such as ModelSim), and a realistic time estimate. A further advantage of a good 
plan is that it allows engineers to begin working on the creation of a functional model 
before they begin designing the IP module itself. One way to proceed 
with the verification is displayed in Figure 1-2.[3]  
A verification plan can come in many different forms; for example, it can be 
a spreadsheet, a document or a simple text file. Ideally, 
the same type of plan is used company-wide; this greatly improves the speed with which 
engineers can find the documentation they need.[2], [3]  
 
 
 
 
 
 
 
 
 
 
 
 
Figure 1-2: Verification plan [3] 
The following is one example of how to approach the creation of a 
verification plan [3]: 
• Overview 
• Resources, Budget and Schedule 
• Verification Environment 
• Verification Flow 
• Feature Extraction 
• Stimulus Generation Plan 
• Checker Plan 
• Coverage Plan 
• Details of reusable components 
1.2.2 Types of Verification 
Different sources define verification differently; furthermore, there is no single 
agreed-upon standard for how a device should be verified. This thesis considers the following two 
aspects: functional and timing verification. Functional verification checks whether 
the IP module is functioning correctly, whereas timing verification checks whether the 
module meets certain timing constraints.[2], [3] 
Formal verification 
Formal verification is classified as functional verification, because it confirms whether 
the code is correct and whether the synthesis of the RTL code was 
correct. When designing an ASIC circuit, it is necessary to consider whether the 
synthesized result is correct, i.e. it is necessary to consider formal verification. 
Formal verification has two uses: the first is equivalence checking 
and the second is model checking. 
“Equivalence Checking compares two netlists to ensure that some netlist 
post-processing, such as scan-chain insertion, clock-tree synthesis or manual modification, 
did not change the functionality of the circuit.” ([2], page 9) The other use arises 
from the necessity to check whether the RTL code is written correctly.[2] 
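The idea of equivalence checking can be sketched on a toy example. The sketch is in Python, not in the VHDL used in this thesis; the two "netlists" and all names are hypothetical, and exhaustive enumeration of the inputs stands in for the formal techniques that real tools apply to large designs:

```python
from itertools import product


def netlist_before(a, b, c):
    # Hypothetical original netlist: y = (a AND b) OR (a AND c)
    return (a & b) | (a & c)


def netlist_after(a, b, c):
    # Hypothetical post-processed netlist: y = a AND (b OR c)
    return a & (b | c)


def equivalent(f, g, n_inputs):
    """Return (True, None) if f and g agree on every input,
    otherwise (False, first differing input combination)."""
    for bits in product((0, 1), repeat=n_inputs):
        if f(*bits) != g(*bits):
            return False, bits
    return True, None


print(equivalent(netlist_before, netlist_after, 3))
```

When the two descriptions differ, the returned input combination serves as a counterexample that can be replayed in simulation.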
Model checking is relatively new, and its main purpose is to check whether the 
behavior of the module is correct; for example, a model checker can warn that a collision is 
possible, or that a state machine has unreachable states.[2] 
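One of the checks mentioned above, the search for unreachable states, can be sketched as a graph traversal. The sketch is in Python, not in the VHDL used in this thesis, and the state machine, its state names and the function unreachable_states are hypothetical:

```python
from collections import deque

# Hypothetical FSM: state -> set of possible next states.
TRANSITIONS = {
    "IDLE": {"START"},
    "START": {"DATA"},
    "DATA": {"DATA", "END"},
    "END": {"IDLE"},
    "ERROR": {"IDLE"},  # no transition leads into this state
}


def unreachable_states(transitions, reset_state):
    """Breadth-first search from the reset state; report states never visited."""
    seen = {reset_state}
    queue = deque([reset_state])
    while queue:
        state = queue.popleft()
        for nxt in transitions[state]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(set(transitions) - seen)


print(unreachable_states(TRANSITIONS, "IDLE"))
```

In this example the ERROR state is reported, because no transition from any reachable state leads into it.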
Today, there are many powerful synthesis programs 
(which translate from higher to lower levels of abstraction). Some of these programs are ISE 
Design Suite and Quartus II, by the companies Xilinx and Altera, respectively. 
They offer tools with formal verification, which enables them to correctly 
synthesize an IP model for their FPGA devices.  
Functional Verification  
As already mentioned, the main goal of functional verification is to verify the 
correct function of the proposed IP module. This verification can be realized by sending 
stimuli to the inputs of the design under test (hereinafter DUT) and 
monitoring the output signals to verify whether the circuit is correct. It is 
possible to realize this verification by writing input combinations in a test bench, which is 
monitored in the simulation program. The more sophisticated way is to create a model 
that creates the stimuli and monitors the outputs by itself. This model can, when 
necessary, check whether the simulation was successful. The creation of a model and the 
related testing is in some sources referred to as a Verification Model (more on model 
creation in 1.4).[2]  
Functional verification is used at the higher levels of abstraction of a given IP module 
(RTL and behavioral level), because the test itself is not as demanding. Functional 
verification of an IP module is realized by applying stimuli to the input signals and 
monitoring the responses of the IP module on the outputs, without taking delays 
into consideration. Stimuli are created in a generator. The generator can be written in a 
hardware verification language (HVL), an object-oriented language such as OpenVera, e, or 
SystemVerilog, in which case a driver adapts the stimuli to the inputs (translating them from 
the behavioral level to the RTL level). The generator can also be written in an HDL, 
in which case the stimuli do not have to be adapted to the IP module. The monitor 
model can be written in either an HDL or an HVL, the same as the generator.[2] 
Note: More about abstraction in chapter 1.1.1. 
Timing verification 
Timing verification checks whether the system meets its timing goal, which is 
expressed as a maximum delay or a minimum frequency. The delay at the behavioral 
or RTL level can be estimated from the components that are present 
within the circuit, because each of these components has its own delay, which is given 
in the libraries. With further synthesis (down to the gate or transistor level), we 
obtain rough estimates and must start taking into consideration 
conductor delays at the gate level and conductor length at the transistor level.[1]  
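The estimate described above can be sketched as a sum of library delays along one register-to-register path, compared with the clock period. The sketch is in Python, not in the VHDL used in this thesis; the delay values, component names and function names are hypothetical, and 156.25 MHz is used only as an example clock rate (it is the XGMII clock frequency defined in IEEE 802.3):

```python
# Hypothetical component delays in nanoseconds, as they might be listed in a library.
LIBRARY_DELAY_NS = {
    "register_clk_to_q": 0.5,
    "adder": 2.1,
    "multiplexer": 0.6,
    "register_setup": 0.4,
}


def path_delay_ns(path):
    """Sum the library delays of the components along one path."""
    return sum(LIBRARY_DELAY_NS[name] for name in path)


def meets_timing(path, clock_mhz):
    """A path meets timing if its total delay fits into one clock period."""
    period_ns = 1000.0 / clock_mhz
    return path_delay_ns(path) <= period_ns


PATH = ["register_clk_to_q", "adder", "multiplexer", "register_setup"]
print(path_delay_ns(PATH))
print(meets_timing(PATH, 156.25))  # period of 6.4 ns
print(meets_timing(PATH, 400.0))   # period of 2.5 ns
```

The sketch ignores conductor delays, which is the simplification that stops being acceptable once the design reaches the gate or transistor level.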
1.2.3 Types of device under test 
There are three approaches to the device under test: black-box, white-box, and grey-box. 
Black-Box  
Black-box verification of a DUT is performed without any knowledge of the actual 
implementation of the design. The interface of the device is standardized, which facilitates 
easy testing without requiring previous knowledge of the device’s design. The disadvantage of 
the black-box approach is that it is difficult for the designer to precisely locate errors when 
performing error assessment, or worse, when it comes to delays, finding 
where the delay has occurred is all but impossible. Its advantage is its price, 
because the tests can be used several times.[2] 
White-Box 
As the name suggests, the designer has a complete view of the device’s internals. 
The advantage of this approach is that an interesting combination of 
states and inputs can be set up quickly and that a particular function can be isolated. The 
disadvantage of this approach is that it cannot be used for low-level abstraction testing, 
such as gate-level or transistor-level.[2]  
 
Grey-Box 
Grey-box verification represents a compromise between the black-box and white-box 
approaches. It is mostly used to increase coverage. While the model is not completely 
known to the designer, the access at the top level is, which enables them to change 
the device dynamics (e.g. speed up a real-time counter, force the raising of an exception or 
modify the size of the processed data to minimize verification time).[2]  
1.2.4 Hardware Verification Languages (HVL) 
Hardware Verification Languages are object-oriented languages. Their advantage is that 
they can work with complex data types, inheritance, and polymorphism, which allows us 
to easily create a non-synthesizable model for functional verification of an IP module 
(e.g. a bus functional model). Using an HVL, it is possible to create random 
sources which can then verify most input combinations. A random generator makes it 
possible to find holes in the system, which are then closed by adding coverage; 
adding coverage completes the verification. The disadvantage is that it is not possible to 
verify low-level, synthesizable digital circuits.[2] 
The following languages are available on the market: SystemVerilog, OpenVera, 
SystemC, e, etc. Their purpose is largely the same; they differ in 
syntax and in the constructs for structuring code (architecture, class, model, unit, …).  
In this thesis, an HDL is used, because the company TTTech provides the license for it. 
1.2.5 Coverage 
Coverage is a tool which shows which parts of our defined coverage groups still need to be 
covered (which test scenarios we are missing). This methodology was first used in software 
engineering. With the use of higher-level abstraction in HDL, this 
methodology also became useful in digital design verification. One additional 
reason in favor of using coverage is that it helps engineers to determine during 
verification whether the design is correct.[2]  
The role of functional coverage metrics is to identify the code describing the IP 
module that is not exercised and the state machine (FSM) states that cannot be reached.[2], [4]  
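The notion of coverage groups and missing scenarios can be sketched as a set of named bins with hit counters. The sketch is in Python, not in the VHDL used in this thesis; the class CoverGroup and the bins are hypothetical, and 64 and 1518 bytes are used as the minimum and maximum lengths of a standard untagged Ethernet frame:

```python
class CoverGroup:
    """Counts how often each named bin (test scenario) was hit."""

    def __init__(self, bins):
        # bins: name -> predicate deciding whether a sample falls into the bin
        self.bins = bins
        self.hits = {name: 0 for name in bins}

    def sample(self, value):
        for name, predicate in self.bins.items():
            if predicate(value):
                self.hits[name] += 1

    def missing(self):
        """Bins that no sample has hit yet, i.e. the scenarios still to be tested."""
        return [name for name, count in self.hits.items() if count == 0]

    def percent(self):
        covered = sum(1 for count in self.hits.values() if count > 0)
        return 100.0 * covered / len(self.hits)


# Hypothetical bins over Ethernet frame lengths in bytes.
frame_length = CoverGroup({
    "minimum": lambda n: n == 64,
    "typical": lambda n: 64 < n < 1518,
    "maximum": lambda n: n == 1518,
})
for length in (64, 128, 512):
    frame_length.sample(length)
print(frame_length.percent(), frame_length.missing())
```

Here two of the three bins are hit, and the report shows that a test with a maximum-length frame is still missing.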
1.3 Design methodologies  
When designing a new application, it is important to formulate a plan for how to 
proceed with design, verification, and lastly testing. The main issue is 
deciding which design methodology is going to be used: top-down or bottom-up. 
Which strategy is more advantageous depends on integration density, 
performance, and packaging advantages, as well as on the desired product differentiation in 
features, functions, size, and cost.[5]  
1.3.1 Bottom-up 
In bottom-up design methodology, the design team starts with the subsystem design or 
system components (blocks in a synthesizable design). Critical components are used by 
the entire team, which is why they are created first; the rest of the components are 
created alongside each other. Each block is designed and verified/tested according to 
personal preference. Afterwards, all blocks are brought together in order for the entire 
system to be verified/tested.[5] 
The advantages of bottom-up are that it allows the designer to concentrate on the 
critical areas of the system and that the components created in this 
way can be reused. Disadvantages are price, difficult communication between 
designers, etc.[5] 
1.3.2 Top-down 
In a top-down methodology, the specification is used to create a behavioral model, 
which is divided into smaller blocks; afterwards, a synthesizable IP core is created. These 
smaller blocks are created with regard to the entire system. 
The advantages of this methodology lie in the fact that the team can work on 
designing and verifying/testing simultaneously, and the team is able to analyze trade-offs in 
system performance, partitioning, and packaging.[5] 
1.4 Test bench (TB) 
A test bench is in effect simulation code which is used to send a predetermined input 
sequence to the Design Under Test/Verification (DUT/V, hereinafter DUT) and, if necessary, 
to observe the response. Figure 1-3 shows how a test bench interacts with the 
DUT.[2]  
 
Figure 1-3: Testbench and DUT 
 
Designers usually have three techniques available for sending stimuli [6]: 
1) Raw test vectors (RTV) 
2) Complete functional model (CFM) 
3) Bus functional model (BFM)  
By using a series of test benches, it is possible to verify the functional model of an IP 
module. This is one of the most widely used methods of verification. 
 
1.4.1 Raw test vectors (RTV) 
Raw test vectors are input signals which are precisely defined according to a pattern 
and then sent to the DUT. When both the input and the function implemented in the DUT are 
known, the expected outputs are known as well; if the observed outputs match them, the DUT 
is verified. RTV are useful in small systems; however, as the design scale increases, the 
number of test vectors increases, which means that the number of errors and the difficulty 
of writing the vectors increase as well.[7]  
One of the advantages of HDL is that it offers a behavioral description, which does 
not necessarily need to be synthesizable.[7] 
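The raw test vector technique can be sketched as a hand-written table of inputs and expected outputs. The sketch is in Python, not in the VHDL used in this thesis; the full adder standing in for the DUT and all names are hypothetical:

```python
def dut_full_adder(a, b, cin):
    """Stand-in for the DUT: a 1-bit full adder returning (sum, carry)."""
    total = a + b + cin
    return total & 1, total >> 1


# Raw test vectors: (a, b, cin, expected_sum, expected_carry), written by hand.
TEST_VECTORS = [
    (0, 0, 0, 0, 0),
    (0, 1, 0, 1, 0),
    (1, 1, 0, 0, 1),
    (1, 1, 1, 1, 1),
]


def run_vectors(dut, vectors):
    """Apply every vector and collect the inputs whose outputs differ from the expectation."""
    failures = []
    for a, b, cin, exp_sum, exp_carry in vectors:
        if dut(a, b, cin) != (exp_sum, exp_carry):
            failures.append((a, b, cin))
    return failures


print(run_vectors(dut_full_adder, TEST_VECTORS))
```

Even for this three-input circuit the table covers only four of the eight input combinations, which hints at how quickly hand-written vectors stop scaling.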
1.4.2 Complete functional model (CFM) 
A complete functional model is a model that has the same internal operations as the 
component being modeled. You can either write it yourself, which is very complex, or 
buy an existing one. Whether the complete functional model is written or purchased, 
the user is confronted with two issues: first, the information about the 
internal components of a complex IP model is proprietary and thus not available to the 
user; second, the complete functional model is very complex, 
and an engineer needs many hours to write it. The increase in man-hours 
translates directly into an increase in price.[7] 
1.4.3 Bus functional model (BFM) 
A bus functional model describes the behavior of a part at the interface level (bus 
transaction level), without modeling its internals. A BFM is a compromise between 
CFM and RTV: it is not as complex as the complete functional model, because it can be 
created directly from the interface information given in the data sheet, and it is much 
simpler to write and maintain than RTV, where writing out all the combinations is very 
hard. Compared to the CFM, a BFM has the advantage of being much faster, smaller, 
and cheaper; compared to the RTV, it makes searching for errors much easier.[2], [7] 
As its name implies, it is a model, which means that it is written at the highest 
(behavioral) level of abstraction. It is non-synthesizable and can be written in either an 
HDL or an HVL. An HVL allows a much more sophisticated BFM design (e.g. random 
functions, which are easier to create in an HVL), with the caveat that the model must 
have a translator to HDL. 
The basics of a BFM can be broken down into two elements: a stimulus generator 
driving the DUT and a response monitor observing the DUT, as shown in Figure 1-4. 
• The stimulus generator is designed so that its output can be used as the input 
signal for the DUT. Some of the design methods are: CFM, RTV, BFM… 
• The response monitor is a block that checks the validity of the data received 
from the DUT. [7] 
Since the first task of this thesis is the creation of a BFM for the standard XGMII 
interface, I will deal with the types of BFM, the Generator, the Monitor, self-checking, 
etc. in more detail. 
1.4.4 Stimulus Generator 
As mentioned in the previous section, the stimulus generator is one of the main blocks. 
Its role is to create the stimuli and send them to the standardized interface in the right 
order. 
Each DUT input is connected to a generator output. 
There is no definition or standard that states which types of generator can be 
designed; it all depends on the application. For the sake of simplicity, we will divide 
generators into regular and random.[3]  
Regular stimulus generators can produce simple stimuli, which are the same as in 
RTV (more in 1.4.1), and complex stimuli. With complex stimuli, the timing, or the 
stimulus itself, depends on the responses of the DUT. The user has full control over the 
timing of the input signals. [1] “However, if the interface being driven contains 
handshaking or flow-control signals, the generation of the stimulus requires cooperation 
with the design under verification (test).” ([3], pages 262-263) 
Random stimulus generators are used in the verification/testing of wide inputs: a 
64-bit input, for example, has 2^64 possible combinations (values 0 to 2^64 - 1), which 
cannot realistically be tested exhaustively. This kind of testing is done using random 
numbers and statistical calculations. A random generator can also be used to produce 
random delays, such as those that can occur on the chip. 
The simplest random generator is the Linear Feedback Shift Register (LFSR). A 
shift register is a set of registers connected in such a way that the information is shifted 
from register N to register N+1 with each clock cycle. Feeding the output of the shift 
register back to its input yields a circular shift register. If selected register outputs are 
combined with “xor” in the feedback path, the register generates a pseudo-random 
sequence. The following literature is recommended in order to facilitate understanding 
of the topic: Efficient Shift Registers, LFSR Counters, and Long Pseudo-Random 
Sequence Generators from Xilinx. [8] An LFSR generator with xor feedback has one 
disadvantage: it cannot generate the all-zeroes combination, because it could never 
leave that state again.  
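The LFSR principle described above can be sketched in VHDL as follows. This is only a minimal illustration, not the generator used in this thesis: the entity name lfsr16, its ports, the seed value and the tap positions are assumptions made for the example. A tap set that gives a maximal-length sequence for a particular register width should be taken from a table such as the one in [8].

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Illustrative 16-bit Fibonacci LFSR (names, seed and taps are assumptions).
entity lfsr16 is
  port (
    clk : in  std_logic;
    rst : in  std_logic;                      -- synchronous reset, active high
    en  : in  std_logic;                      -- advance by one state when '1'
    q   : out std_logic_vector(15 downto 0)   -- current pseudo-random state
  );
end entity lfsr16;

architecture rtl of lfsr16 is
  -- Any non-zero seed can be used. The all-zero state must be avoided:
  -- the xor of zeros is zero, so the register would stay stuck there.
  constant C_SEED : std_logic_vector(15 downto 0) := x"ACE1";
  signal   state  : std_logic_vector(15 downto 0) := C_SEED;
begin

  process (clk)
    variable feedback : std_logic;
  begin
    if rising_edge(clk) then
      if rst = '1' then
        state <= C_SEED;
      elsif en = '1' then
        -- Assumed taps 16, 15, 13, 4 (1-based) are indices 15, 14, 12, 3.
        feedback := state(15) xor state(14) xor state(12) xor state(3);
        -- Shift left by one position and insert the feedback bit at index 0.
        state <= state(14 downto 0) & feedback;
      end if;
    end if;
  end process;

  q <= state;

end architecture rtl;
```

Because the description uses only a clocked process and xor gates, the same block can be used both in simulation and in a synthesizable tester.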
1.4.5 Response Monitor 
The reactions of the DUT need to be validated somehow; one way to accomplish this is 
to write the output signals into a file during the simulation. Once the simulated model 
behaves correctly, this file is saved as a so-called “golden file”, to which the results of 
all following design modifications are compared. While this approach is possible, it is 
plagued by a multitude of disadvantages: human error (e.g. whether the values 
contained in the golden file are actually correct); a large number of files, which, 
depending on the size of the model, can take up too much disk space; and difficulties in 
comparing delays due to differences in the length of the idle state.[7]  
Figure 1-4: Basic Test Bench Block Diagram 
A better way to approach data control is to automate the checking of the DUT output 
data, and the best way is to use a self-checking test bench (more about self-checking in 
the following section). Such a test bench should understand the protocol or timing of 
the DUT, check for violations that may arise, and be written so that every response is 
accounted for in some manner. Self-checking removes the human factor from 
validating the responses to the stimuli.[7]  
Sometimes we are not interested in the delays of the output signals; instead, we 
only want to know whether the output signals are correct. In this case, the response 
monitor is extended by a memory unit (e.g. a FIFO) in which the signals are saved, so 
that they can be compared with the input signals regardless of time. Figure 1-5 shows 
an example of a response monitor extended by a single memory unit. [2] 
 
1.4.6 Self-Checking 
In verification and testing, efforts are made to reduce the human factor and increase 
speed; the solution lies in self-checking. This technique lets the test bench detect errors 
and declare success or failure on its own. Coding error detection into the test bench lets 
the engineer dedicate their time to other projects while the self-checking test bench 
completes fully automated simulations on its own. 
There are many different methods of self-checking: Hard Coded Response, Data 
Tagging, Reference Models, Transfer Function, Scoreboarding, Golden Vectors, etc. 
The next part describes only the self-checking methods that are pertinent to this thesis; 
for easier understanding, see the literature. [2] 
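As a minimal sketch of the simplest of these methods (a hard coded response), the following test bench applies a few raw test vectors to a stand-in DUT and declares success or failure on its own. The DUT (a two-input AND gate), the entity name and the vector table are assumptions chosen only for illustration; they are not part of the design in this thesis.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity selfcheck_tb is
end entity selfcheck_tb;

architecture sim of selfcheck_tb is
  signal a, b, y : std_logic := '0';
begin

  -- Stand-in DUT (assumption): a two-input AND gate.
  y <= a and b;

  stimulus_and_check : process
    -- One test vector: two inputs and the hard coded expected output.
    type tVec is record
      a, b, y : std_logic;
    end record;
    type tVecArray is array (natural range <>) of tVec;
    constant C_VECTORS : tVecArray :=
      (('0', '0', '0'), ('0', '1', '0'), ('1', '0', '0'), ('1', '1', '1'));
    variable errors : natural := 0;
  begin
    for i in C_VECTORS'range loop
      a <= C_VECTORS(i).a;
      b <= C_VECTORS(i).b;
      wait for 10 ns;                        -- let the DUT output settle
      if y /= C_VECTORS(i).y then
        report "mismatch at vector " & integer'image(i) severity error;
        errors := errors + 1;
      end if;
    end loop;

    -- The test bench itself decides between success and failure.
    assert errors = 0 report "TEST FAILED" severity failure;
    report "TEST PASSED";
    wait;
  end process stimulus_and_check;

end architecture sim;
```

No waveform has to be inspected by hand: the simulation log alone shows whether the run passed.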
Data Tagging 
Many designs use only part of the input information for processing; some sections 
of the data may be changed, while other sections are passed on untouched.  
These untouched data sections are called the payload, and the term packet or frame is 
often used to describe the unit of data processed by the design. The design only has to 
process the appropriate data in order to forward the payload. The test bench can 
therefore place in the payload information about the expected destination, position and 
transformation of the packet. The output monitor uses this information from the 
payload of each sent packet and decides whether the packet was correctly processed.[2]  
Examples of such applications are Ethernet hubs, IP routers, switches, and 
SONET framers.[2]  
Figure 1-5: Response Monitor and FIFO [2] 
Scoreboarding 
A scoreboard is a data structure that stores expected data and compares it with the 
data actually received. Figure 1-6 shows that the input data goes through an auxiliary 
“transfer function” (more in literature [2]), which fulfills the same role as the DUT and 
thus produces the expected data; on the other side, the response monitor checks the 
DUT output and sends the data to the scoreboard. If the right data arrives at the right 
time, the scoreboard accepts it as valid; if it does not, the scoreboard reports an error. 
Since there is no standard or definition of what a scoreboard must look like, the 
scoreboard is designed according to the needs of the self-checking.  
 
Figure 1-6: Scoreboard with FIFO [2] 
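The scoreboard with a FIFO can be sketched in VHDL as follows. This is a simplified, simulation-only illustration under stated assumptions: the package name scoreboard_pkg, the types and the procedures sb_push_expected and sb_check are hypothetical and are not the procedures of the BFM described later in this thesis. Only the order of the data is checked, not its timing.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Simulation-only scoreboard sketch; all names are illustrative assumptions.
package scoreboard_pkg is
  constant C_DEPTH : positive := 16;
  subtype tWord is std_logic_vector(7 downto 0);
  type tWordArray is array (0 to C_DEPTH-1) of tWord;

  type tScoreboard is record
    mem    : tWordArray;   -- expected values, oldest at index 0
    count  : natural;      -- number of stored expected values (starts at 0)
    errors : natural;      -- number of errors seen so far (starts at 0)
  end record;

  procedure sb_push_expected(variable sb : inout tScoreboard; constant data : in tWord);
  procedure sb_check(variable sb : inout tScoreboard; constant data : in tWord);
end package scoreboard_pkg;

package body scoreboard_pkg is

  -- Called on the stimulus side with the output of the transfer function.
  procedure sb_push_expected(variable sb : inout tScoreboard; constant data : in tWord) is
  begin
    if sb.count = C_DEPTH then
      report "scoreboard full, expected value dropped" severity error;
      sb.errors := sb.errors + 1;
    else
      sb.mem(sb.count) := data;
      sb.count := sb.count + 1;
    end if;
  end procedure sb_push_expected;

  -- Called by the response monitor with the value observed on the DUT output.
  procedure sb_check(variable sb : inout tScoreboard; constant data : in tWord) is
  begin
    if sb.count = 0 then
      report "unexpected data: nothing was expected" severity error;
      sb.errors := sb.errors + 1;
    else
      if sb.mem(0) /= data then
        report "data mismatch" severity error;
        sb.errors := sb.errors + 1;
      end if;
      -- Remove the oldest expected value (FIFO behaviour).
      sb.mem(0 to C_DEPTH-2) := sb.mem(1 to C_DEPTH-1);
      sb.count := sb.count - 1;
    end if;
  end procedure sb_check;

end package body scoreboard_pkg;

library ieee;
use ieee.std_logic_1164.all;
use work.scoreboard_pkg.all;

entity scoreboard_tb is
end entity scoreboard_tb;

architecture sim of scoreboard_tb is
begin
  process
    variable sb : tScoreboard;
  begin
    sb_push_expected(sb, x"A5");
    sb_push_expected(sb, x"5A");
    sb_check(sb, x"A5");   -- matches the oldest expected value
    sb_check(sb, x"5B");   -- deliberate mismatch: reported and counted
    assert sb.errors = 1 report "expected exactly one mismatch" severity failure;
    wait;
  end process;
end architecture sim;
```

The small test bench at the end injects one deliberate mismatch, so the simulation log shows a single "data mismatch" error and no failure.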
1.4.7 Types of BFM tests 
Driving the BFM From External Files 
One of the methods is to run files that contain Hardware Verification Language 
(HVL) code. This code is organized in a model-specific format from which the BFM 
reads and executes operations, much like low-level assembly code, which uses an 
operation type and data to carry out an instruction. The code generally resides in a text 
file and is read into the BFM during simulation, either all at once or one line per 
instruction, depending on how the model is written.[7]  
HVL Code 
One of the best uses of HVL code is the command and use of special BFM 
operations. It also gives the engineer the maximum amount of freedom to expand the 
code, which also means that they have to be mindful of the capabilities of the VHDL 
code. HVL code is not without its flaws: file I/O is slow; complex operations (e.g. read, 
modify, and write) are not possible; synchronization of multiple BFMs is complex and 
cumbersome; and sampling and driving signals in the test bench is difficult. One of the 
main drawbacks of using HVL code is that the HVL code must be “knowledgeable” of 
the data operations. Communication of the HVL code across multiple BFMs is another 
problem. 
Another method of using HVL is to compile the HVL code into an object that is 
understood by the BFM. This method is practical because it can be used in many 
different applications that are written specifically for these objects. [7] 
Complex BFM  
Using a pure VHDL procedural-driven methodology eliminates HVL code hazards 
by embedding the test code, written in VHDL, into the test bench and utilizing the full 
power of the VHDL event-driven simulator, as it is shown in Figure 1-7. 
 
 
 
 
 
 
 
 
 
The use of packages, types and records enables the engineer to design the BFM at a 
high level of abstraction. Using high abstraction throughout the architecture enables the 
engineer to design more complex tests and to disregard the primitive functions.[7],[8] 
1.5 Ethernet 
Ethernet is a family of computer networking technologies used in local and 
metropolitan area networks. Ethernet was created in 1973 at Xerox PARC by Robert 
Metcalfe and his colleagues; it was inspired by ALOHAnet, which Metcalfe had studied 
in his doctoral dissertation.[10] 
Standardization by the IEEE is responsible for the commercialization of Ethernet. 
This commercialization was greatly accelerated by the computer industry’s meteoric 
rise. The cooperation of three companies, Digital Equipment, Intel, and Xerox, resulted 
in the creation of the IEEE 802 standard in March 1980. This collaboration was 
mentioned in THE INSTITUTE magazine [10]. Rapid advances in communication 
technology resulted in the creation of additional standards: IEEE 802.1, 802.2, 802.3, 
802.11, 802.15, etc. The most important standard for this thesis is IEEE 802.3. This 
standard defines the inner workings of Ethernet and its uses, more precisely the two 
lowest layers of the OSI model (the physical and data link layers). 
Figure 1-7: Test Bench Components [7] 
The OSI model was standardized by the International Organization for 
Standardization (ISO) as ISO/IEC 7498-1 and is also published as ITU-T 
Recommendation X.200 [12]. This model defines seven layers, as can be seen in 
Figure 1-8. 
[Figure 1-8 content: OSI model layers and example protocols of the TCP/IP model] 
Application, Presentation, Session: HTTP, DNS, SMTP, POP3, RTP, SSH, ... 
Transport: TCP, UDP, SPX 
Network: IP, IPX, AppleTalk 
Data Link, Physical: Ethernet 802, Token Ring, PPP 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Standard X.200 also defines the interlayer communication, boundary creation, 
sublayer creation, layer function, etc.[11]  
1.5.1 Physical layer (PHY) 
The physical layer is the lowest layer in the OSI model. It converts high and low 
electrical levels to bits. Its main functions are: establishing and releasing physical 
connections, checking sent and received information when a full multiplex connection 
is not available, multiplexing incoming bits, and managing the physical layer, as 
defined in standard [13]. This thesis focuses on data transfer over a 10 Gbps Ethernet 
connection and on explaining the sublayers defined by the IEEE. 
To make the following sublayers easier to explain, the basic functions defined by 
the standards are used.  
Physical Medium Attachment (PMA) 
The Physical Medium Attachment sublayer has six distinct functions: PHY Reset, 
PHY Control, PMA Transmit, PMA Receive, Link Monitor and Clock Recovery, which 
are defined in standard [13], clause 55.4.2. Its role is to synchronize in time the signals 
that are sent to or received from the MDI connector, using the MASTER/SLAVE 
technique. 
In the Altera Stratix V device, the main role of the PMA is data 
serialization/deserialization. In essence, during transmission (TX) it converts parallel 
data streams to serial data streams, whereas during reception (RX) it converts serial 
data streams to parallel data streams. Clock data recovery (CDR) is performed on the 
received serial data. 
 
Figure 1-8: ISO/OSI model 
 
Physical Coding Sublayer (PCS) 
The Physical Coding Sublayer is defined by clause 55.3, which refers to clause 46 of 
standard [13], where the function of the XGMII is explained. Because this is not the 
main subject of this thesis, only the main functions, “transmit” and “receive”, are 
explained. 
The role of the transmit function is to group eight data octets arriving from the 
XGMII into a 65-bit block and to assemble the blocks into a CRC-checked Ethernet 
payload (CRC8). Adding the check bits to 50 such blocks yields a CRC-checked 
Ethernet payload of 50 × 65 + 8 = 3258 bits. The role of the receive function is the 
reverse: the CRC8 bits are removed, and each received 65-bit block is decoded into 
two XGMII data transfers, each consisting of RXD<31:0> and RXC<3:0>, which are 
sent to the XGMII. 
1.5.2 Reconciliation Sublayer (RS) and 10 Gigabit Media Independent Interface (XGMII)  
The RS and the XGMII are located between the data link layer and the physical layer. 
This allocation of functions also recognizes that implementations can benefit from a 
close coupling between the PLS sublayer or PCS and the PMA sublayer, as defined in 
standard [13]. The ISO/IEC (IEEE) OSI reference model is shown in Figure 1-8. 
The XGMII protocol is one of the most important parts of Ethernet for this thesis, 
because the Stratix V FPGA is tested using this protocol. The main aim of this thesis is 
to test the data flow through the FPGA at full speed. 
The XGMII is composed of independent transmit and receive paths. Each direction 
uses 32 data signals (TXD<31:0> or RXD<31:0>), four control signals (TXC<3:0> or 
RXC<3:0>) and one clock (TX_CLK or RX_CLK). The TXD and TXC signals are 
sampled on both edges (rising and falling) of TX_CLK; likewise, the RXD and RXC 
signals are sampled on both edges of RX_CLK. 
 
Figure 1-9 : Reconciliation Sublayer (RS) inputs and outputs [13] 
The thirty-two TXD and four TXC signals are organized into four lanes, as can be 
seen in Figure 1-9, and so are the RXD and RXC signals. The four lanes in each 
direction share one clock (TX_CLK for transmit and RX_CLK for receive). Octets are 
assigned to the lanes sequentially: during transmission, eight PLS_DATA.request 
transactions are represented as a single octet; the first octet goes to lane 0, the second 
to lane 1, the third to lane 2, the fourth to lane 3, the fifth to lane 0 again, and so on. 
The four lanes are transferred in parallel, so one clock edge carries four octets. Each 
direction is timed by its own clock, TX_CLK or RX_CLK, and the control bits, TXC 
and RXC, indicate for each lane whether the octet on that lane is a data octet or a 
control character. 
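The lane organization described above can be sketched as a small VHDL function that packs four octets and their control flags into one 32-bit XGMII word, with lane n occupying bits 8n+7 down to 8n of the data signals and bit n of the control signals. The package name xgmii_lane_pkg, the types and the function pack_lanes are assumptions introduced for this illustration only; they are not part of xgmii_bfm_pkg.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Illustrative lane packing; all names are assumptions.
package xgmii_lane_pkg is
  subtype tOctet is std_logic_vector(7 downto 0);
  type tOctetArray is array (natural range <>) of tOctet;

  type tXgmiiWord is record
    d : std_logic_vector(31 downto 0);  -- TXD<31:0> or RXD<31:0>
    c : std_logic_vector(3 downto 0);   -- TXC<3:0> or RXC<3:0>
  end record;

  -- octets(n) is placed on lane n; is_ctrl(n) = '1' marks a control character.
  function pack_lanes(octets  : tOctetArray(0 to 3);
                      is_ctrl : std_logic_vector(0 to 3)) return tXgmiiWord;
end package xgmii_lane_pkg;

package body xgmii_lane_pkg is
  function pack_lanes(octets  : tOctetArray(0 to 3);
                      is_ctrl : std_logic_vector(0 to 3)) return tXgmiiWord is
    variable w : tXgmiiWord;
  begin
    for lane in 0 to 3 loop
      w.d(8*lane + 7 downto 8*lane) := octets(lane);
      w.c(lane)                     := is_ctrl(lane);
    end loop;
    return w;
  end function pack_lanes;
end package body xgmii_lane_pkg;

library ieee;
use ieee.std_logic_1164.all;
use work.xgmii_lane_pkg.all;

entity xgmii_lane_tb is
end entity xgmii_lane_tb;

architecture sim of xgmii_lane_tb is
begin
  process
    variable w : tXgmiiWord;
  begin
    -- Two data octets on lanes 0 and 1, two control characters on lanes 2 and 3.
    w := pack_lanes((x"11", x"22", x"FD", x"07"), "0011");
    assert w.d = x"07FD2211" report "unexpected data word" severity failure;
    assert w.c = "1100" report "unexpected control bits" severity failure;
    wait;
  end process;
end architecture sim;
```

In the test bench, the first octet ends up in the least significant byte of the data word, which matches the "first octet goes to lane 0" rule.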
The RS maps the XGMII signals to the following PLS service primitives: 
PLS_DATA.request 
PLS_DATA.indication 
PLS_CARRIER.indication 
PLS_SIGNAL.indication 
PLS_DATA_VALID.indication 
These primitives are standard; manufacturers may implement more or fewer of 
them depending on the needs of their product. 
1.5.3 XGMII data stream 
A data stream is a sequence of bytes, where each byte conveys either a data octet or a 
control character. The data stream is defined in clause 46.2 of standard [13]. 
The XGMII data stream is one of the most important for this thesis; it is therefore 
explained in detail in this section and shown in Figure 1-10. 
Figure 1-10 : XGMII stream [13] 
Inter-frame <inter-frame> 
The XGMII inter-frame period is an interval during which no frame data is 
transmitted or received; it appears whenever the data stream carries no frame. The 
inter-frame period corresponds to the inter-packet gap (IPG) of the MAC. It begins 
with the Terminate control character, continues with Idle control characters, and ends 
with the last Idle control character before a Start control character. The length of the 
inter-packet gap may differ between the transmitting and the receiving MAC, because 
it can be changed by one or more functions: RS lane alignment, PHY clock rate 
compensation, or the 10GBASE-W data rate adaptation function. The minimum IPG 
at the XGMII of the receiving RS is five octets. 
Inter-frame or Idle, as specified in [13], is the bit sequence: 
00000111 
Preamble <preamble> and start of frame delimiter <sfd> 
The preamble <preamble> begins a frame transmission by a MAC as specified in 
[14] and when generated by a MAC consists of 7 octets with the following bit values: 
10101010 10101010 10101010 10101010 10101010 10101010 10101010 
The start of frame delimiter <sfd> indicates the start of a frame and immediately 
follows the preamble. The bit value of <sfd> at the XGMII is unchanged from the Start 
Frame Delimiter (SFD) specified in [14] and is the bit sequence: 
10101011 
Data <data> 
For the data to be well formed within the frame, it must consist of a whole number 
of octets. The RXC and TXC signals make it possible to recognize the start and the end 
of the data. 
End of frame delimiter <efd> 
On the transmit data stream, the end of frame delimiter <efd> is indicated by 
asserting TXC together with the Terminate control character encoded on TXD.  
On the receive data stream, the end of the frame is likewise indicated by RXC and 
the Terminate control character encoded on RXD. 
The XGMII should be able to recognize the end of frame delimiter on any of the 
four lanes. The EFD, or Terminate, as specified in [13], is the bit sequence: 
11111101 
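The control characters above can be collected in a small VHDL package, together with a function that searches all four lanes of a received word for the Terminate character. This is a hedged sketch: the package name xgmii_ctrl_pkg, the constant names and the function find_terminate are assumptions made for illustration. The Idle (07 hexadecimal) and Terminate (FD hexadecimal) values correspond to the bit sequences given in the text; the Start value (FB hexadecimal) is not given in the text and is quoted from clause 46 of the standard.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Illustrative XGMII control characters; all names are assumptions.
package xgmii_ctrl_pkg is
  constant C_XGMII_IDLE      : std_logic_vector(7 downto 0) := x"07";  -- 00000111
  constant C_XGMII_START     : std_logic_vector(7 downto 0) := x"FB";  -- from clause 46
  constant C_XGMII_TERMINATE : std_logic_vector(7 downto 0) := x"FD";  -- 11111101

  -- Returns the lane (0 to 3) that carries Terminate, or -1 if there is none.
  function find_terminate(d : std_logic_vector(31 downto 0);
                          c : std_logic_vector(3 downto 0)) return integer;
end package xgmii_ctrl_pkg;

package body xgmii_ctrl_pkg is
  function find_terminate(d : std_logic_vector(31 downto 0);
                          c : std_logic_vector(3 downto 0)) return integer is
  begin
    for lane in 0 to 3 loop
      -- An octet is a control character only if its control bit is set.
      if c(lane) = '1' and d(8*lane + 7 downto 8*lane) = C_XGMII_TERMINATE then
        return lane;
      end if;
    end loop;
    return -1;
  end function find_terminate;
end package body xgmii_ctrl_pkg;

library ieee;
use ieee.std_logic_1164.all;
use work.xgmii_ctrl_pkg.all;

entity xgmii_ctrl_tb is
end entity xgmii_ctrl_tb;

architecture sim of xgmii_ctrl_tb is
begin
  process
  begin
    -- Data on lanes 0 and 1, Terminate on lane 2, Idle on lane 3.
    assert find_terminate(x"07FD2211", "1100") = 2
      report "Terminate should be found on lane 2" severity failure;
    -- FD as plain data (control bit cleared) must not be treated as Terminate.
    assert find_terminate(x"07FD2211", "1000") = -1
      report "data octet FD must not be detected" severity failure;
    wait;
  end process;
end architecture sim;
```

The second assertion shows why the control bits are needed: the value FD hexadecimal may also occur as ordinary frame data.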
1.5.4 Data link layer 
The data link layer is the second layer in the OSI model. It is a protocol layer in which 
communication between nodes is defined. One of its roles is to check received frames 
and, where possible, to correct errors that occurred in the physical layer. The data link 
layer thereby enables the network layer to rely on the validity of the data arriving from 
the physical layer. The data link layer is defined by standard [15]. 
In order to explain how Ethernet works, this section describes the data link layer, 
which could also be tested using the BFM tester with minor modifications. 
1.5.5 Media Access Control (MAC) 
Clauses 2, 3, and 4 of standard [14] define the Media Access Control (MAC) sublayer. 
The MAC sublayer and the MAC client sublayer, e.g. Logical Link Control (LLC), are 
sublayers of the data link layer. The role of the MAC is to bridge the LLC sublayer to 
the physical layer. The services provided by the MAC sublayer allow the local MAC 
client entity to exchange LLC data units with peer LLC sublayer entities. After a reset, 
the MAC should return to a known state. 
MAC frame and packet 
Clause 3 of standard [14] specifies three types of MAC frames:  
a) A basic frame 
b) A Q-tagged frame 
c) An envelope frame 
All three types use the same frame format, and all three can be implemented in the 
BFM tester if need be. 
A packet is divided into the following fields: the Preamble, the Start Frame 
Delimiter (SFD), the addresses of the MAC frame’s destination and source, a length or 
type field that indicates the length or protocol type of the following field containing the 
MAC client data, a field that contains padding if required, and the Frame Check 
Sequence (FCS) field, which contains a cyclic redundancy check value used to detect 
errors in a received MAC frame. 
Preamble field is a seven-octet field that allows the receiving circuitry to 
synchronize with the timing of the received packet (see 1.5.3). 
Start Frame Delimiter (SFD) field is the sequence 10101011, which immediately 
follows the preamble pattern. The receiver checks whether the received bits match this 
octet and, if they do, reports the beginning of the frame. 
Address fields: each MAC frame has two address fields, the Destination Address 
field and the Source Address field, in that order. Both addresses must be 48 bits long. 
The first bit (the LSB) marks whether the address is individual (0) or group (1). The 
second bit shows whether the address is globally (0) or locally (1) administered. Each 
octet of an address is sent least significant bit first. 
Destination Address field specifies the station or stations for which the MAC 
frame is intended. A MAC sublayer address is one of two types: 
a) Individual Address. The address associated with a particular station on the 
network. 
b) Group Address. A multidestination address, associated with one or more 
stations on a given network. There are two kinds of multicast addresses: 
Multicast-Group Address and Broadcast Address. 
The Source Address field specifies the station sending the MAC frame. This 
address is not interpreted by the MAC. 
Length/Type field: this two-octet field takes one of two meanings, depending on 
its numeric value. For numerical evaluation, the first octet is the most significant octet 
of this field. Its meanings are:  
a) If the value is less than or equal to 1500 decimal (05DC hexadecimal), the 
field indicates the number of MAC client data octets in the following MAC 
Client Data field (Length interpretation). 
b) If the value is greater than or equal to 1536 decimal (0600 hexadecimal), the 
field indicates the protocol type of the MAC client (Type interpretation). The 
Length and Type interpretations are mutually exclusive. 
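The two interpretations of the Length/Type field can be sketched as a small VHDL function, for example for use in a monitor. The package name, the enumeration tLenTypeKind and the function classify_len_type are assumptions introduced for illustration. Values between 1501 and 1535 fall into neither range and are reported separately here.

```vhdl
-- Illustrative classification of the Length/Type field; names are assumptions.
package mac_len_type_pkg is
  type tLenTypeKind is (LT_LENGTH, LT_TYPE, LT_UNDEFINED);

  -- value is the two-octet field, first octet most significant.
  function classify_len_type(value : natural) return tLenTypeKind;
end package mac_len_type_pkg;

package body mac_len_type_pkg is
  function classify_len_type(value : natural) return tLenTypeKind is
  begin
    if value <= 1500 then        -- up to 05DC hexadecimal: Length
      return LT_LENGTH;
    elsif value >= 1536 then     -- from 0600 hexadecimal: Type
      return LT_TYPE;
    else                         -- 1501 to 1535: neither interpretation
      return LT_UNDEFINED;
    end if;
  end function classify_len_type;
end package body mac_len_type_pkg;

use work.mac_len_type_pkg.all;

entity mac_len_type_tb is
end entity mac_len_type_tb;

architecture sim of mac_len_type_tb is
begin
  process
  begin
    assert classify_len_type(46) = LT_LENGTH
      report "46 should be a length" severity failure;
    assert classify_len_type(16#0800#) = LT_TYPE
      report "0800 hexadecimal should be a type" severity failure;
    assert classify_len_type(1510) = LT_UNDEFINED
      report "1510 should be undefined" severity failure;
    wait;
  end process;
end architecture sim;
```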
MAC Client Data field contains a sequence of octets whose number must not 
exceed a maximum that depends on the frame type. An Ethernet implementation 
supports at least one of three maximum MAC Client Data field sizes:  
a) 1500 decimal: basic frames  
b) 1504 decimal: Q-tagged frames  
c) 1982 decimal: envelope frames 
Pad field is used to reach the minimum frame size that is necessary for correct 
operation of the CSMA/CD protocol. If needed, the pad is appended after the MAC 
Client Data field, before the FCS is calculated.  
Frame Check Sequence (FCS) field: a cyclic redundancy check (CRC) is used by 
the transmit and receive algorithms to generate a CRC value for the FCS field. The 
FCS field contains a 4-octet (32-bit) CRC value. This value is computed as a function 
of the contents of the Destination Address, Source Address, Length/Type field, MAC 
Client Data, and Pad.[16] 
Extension field follows the FCS field and is made up of a sequence of extension 
bits; it is used only to lengthen short frames in half-duplex operation at 1000 Mb/s and 
therefore does not occur on 10 Gb/s links. 
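The CRC computation used for the FCS field can be sketched as a behavioral VHDL function that processes one octet at a time. This is a simulation-oriented illustration, not the implementation used in this thesis: the package name crc32_pkg, the type tOctetArray and the function crc32 are assumptions. The function uses the widely known bit-reflected form of the CRC-32 algorithm (polynomial 04C11DB7 hexadecimal, reflected to EDB88320, initial value of all ones, final inversion). The order in which the resulting 32 bits are placed on the wire is defined in [14] and is not shown here.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Illustrative CRC-32 function; all names are assumptions.
package crc32_pkg is
  type tOctetArray is array (natural range <>) of std_logic_vector(7 downto 0);

  -- Returns the 32-bit CRC over all octets, processed from left to right.
  function crc32(data : tOctetArray) return std_logic_vector;
end package crc32_pkg;

package body crc32_pkg is
  function crc32(data : tOctetArray) return std_logic_vector is
    -- Bit-reflected form of the generator polynomial 04C11DB7 hexadecimal.
    constant C_POLY : std_logic_vector(31 downto 0) := x"EDB88320";
    variable crc    : std_logic_vector(31 downto 0) := (others => '1');
  begin
    for i in data'range loop
      crc(7 downto 0) := crc(7 downto 0) xor data(i);
      for b in 0 to 7 loop
        if crc(0) = '1' then
          crc := ('0' & crc(31 downto 1)) xor C_POLY;
        else
          crc := '0' & crc(31 downto 1);
        end if;
      end loop;
    end loop;
    return not crc;   -- final inversion
  end function crc32;
end package body crc32_pkg;

library ieee;
use ieee.std_logic_1164.all;
use work.crc32_pkg.all;

entity crc32_tb is
end entity crc32_tb;

architecture sim of crc32_tb is
begin
  process
    -- ASCII characters "123456789", the usual CRC check message.
    constant C_MSG : tOctetArray(0 to 8) :=
      (x"31", x"32", x"33", x"34", x"35", x"36", x"37", x"38", x"39");
  begin
    assert crc32(C_MSG) = x"CBF43926"
      report "CRC-32 check value mismatch" severity failure;
    report "CRC-32 check value matches";
    wait;
  end process;
end architecture sim;
```

In a monitor, such a function could be applied to the octets from the Destination Address up to and including the Pad, and the result compared with the received FCS.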
1.6 Small Form Factor Pluggable Module (SFP+) 
SFP+ is a transceiver used in telecommunications and data transfer to carry the signal 
between the physical layer and the network cable. If the cable is optical, the SFP+ 
module contains a laser that transmits the information on a specific wavelength, and a 
receiver for the incoming light; different wavelengths come with different 
transmission ranges, as can be seen in the standard [17]. Labels for the different 
wavelengths are assigned by the manufacturer; Agilestar [18], for example, uses: SX 
(850 nm), LX (1310 nm), ZX (1550 nm), EZX (1550 nm)... The module can use a 
single fibre or a pair of fibres: with a single fibre, transmitted and received information 
travel over the same cable, whereas with a pair, one fibre is used to transmit and the 
other to receive information. 
XFP (10 Gigabit Small Form Factor Pluggable) is an addition to the SFP+ 
standard. These standards are catalogued in document SFF 8431, which is an addition 
to the SFF80 standard. 
The SFP+ specification is important for this thesis because the I/O of the FPGA 
is connected in a loop in order to test the validity of the FPGA circuit. 
1.7 Altera Stratix® V GX FPGA (5SGXEA7N2F45C2) 
The Altera Stratix V GX is an important FPGA circuit for this thesis, because it is used 
to implement the synthesizable tester. The IP core can be verified by using the 
non-synthesizable BFM.  
BittWare's S5-PCIe-DS (S5PE-DS) is a PCIe x16 card featuring two high-
bandwidth, power-efficient Altera Stratix V GX or GS FPGAs. Designed for high-end 
applications, the Stratix V provides a high level of system integration and flexibility for 
I/O, routing, and processing. The S5PE-DS provides up to 64 GBytes of DDR3 
SDRAM as well as options for RLDRAM3 and QDRII+. Additional flexibility is 
provided by four front-panel QSFP cages, which allow four 40GigE interfaces, sixteen 
10GigE interfaces, or four QDR/FDR InfiniBand interfaces to connect directly to the 
PHYs built into the FPGAs for the lowest possible latency. With almost 2 million logic 
elements available (952,000 per FPGA), the board is ideal for high-performance 
computing and, with the reduced latency provided by the network interfaces, for 
high-frequency trading, military/government agency secure communications, and 
network processing applications. For more information, see [19]. 
 
 
 
 
 
 
 
 
 
1.7.1 Intellectual property core  
An intellectual property core (IP core, or IP module) is a reusable unit of logic, cell, or 
chip layout design. IP cores are divided into soft and hard cores. A soft IP core is a 
plain or encrypted netlist that is licensed by the company which holds the rights to it; a 
hard IP core is a building block of a logic circuit (ASIC or FPGA).  
Altera offers various standardized building blocks for a given FPGA circuit, for 
example the physical layer (PHY) IP core and the Media Access Control (MAC) IP 
core. The PHY IP core and the MAC IP core can be verified using the BFM, since they 
communicate over the standardized XGMII, which is used in this thesis. More about 
the PHY IP core can be found in [20], and more about the MAC IP core in [21]. 
The main software used for generating the IP module is Quartus II, by Altera. 
More information about this software can be found in the data sheet [22]. 
 
Figure 1-11: Block diagram Stratix V [19] 
2 PRACTICAL PART 
The goal of this thesis is to explore the available options for automated testing of the 
10GbE XGMII interface and to propose suitable testers. Testing is divided into two 
parts: the first is the non-synthesizable IP module, also called the verification IP 
module, and the second is the synthesizable IP module, also called the synthesizable 
tester. The testers are created for the XGMII interface (more in 1.5.2) and can be used 
for testing the MAC and PHY layers (more in chapter 1.5).  
When creating a new application, it is very important to plan the verification and 
testing of the device in advance. Standardized interfaces (e.g. XGMII, XAUI...) and 
devices (e.g. 10Gb Ethernet) make it possible to work on the design and on its 
verification and testing simultaneously. Once a proper verification plan has been 
created (an example of how to create a verification plan can be seen in 1.2.1), the 
verification engineers can proceed to create the IP module. As already mentioned, the 
goal of this thesis is to complete the non-synthesizable verification IP module (more in 
1.2.2) and to test the synthesizable IP module on an Altera Stratix V FPGA circuit 
(more in 1.7). 
A digital design is, in most cases, written in an HDL (more in 1.1.2) at the RTL 
level. This level is synthesized, which means that it is translated to a lower level, such 
as the gate and transistor levels (more on abstraction in 1.1.1). Verification is a process, 
which means that each level must be tested and verified (steps on how to go about this 
can be found in chapter 1.2). Some programs available on the market (e.g. ModelSim, 
Quartus II, ISE Design Suite, Vivado Design Suite, etc.) automate several steps of the 
verification (formal verification). Formal verification is, however, not the focus of this 
thesis and is not discussed further. 
2.1 Verification IP module  
Verifying the functionality of the DUT is one of the basic steps in verification. 
Functional verification is done in a test bench. A test bench can use several methods: 
raw test vectors (RTV), a complete functional model (CFM) in combination with a bus 
functional model, or a bus functional model alone (more in chapter 1.4). 
The most suitable verification IP module for PHY-level testing via the XGMII 
interface in this thesis is the bus functional model (BFM), because it is a compromise 
between RTV and CFM. RTV is simple enough to describe but difficult to debug, and 
creating a CFM for the entire PHY level is much more difficult.   
The non-synthesizable IP module is created at the highest level of abstraction 
(process level), which allows for a behavioral description of the function. The 
methodology chosen for creating the non-synthesizable IP module is bottom-up (more 
in 1.3.1), since it is easier to design each BFM subsystem (Generator and Monitor) 
separately and then add further functions as needed, which is demonstrated in the next 
sub-chapter. 
We can treat the PHY layer as a black box (more about black box in 1.2.3), 
because the IP module that is created is universal; it is compatible with all kinds of 
10Gb Ethernet (10GBASE-W, 10GBASE-S, etc.). 
2.1.1 Structure of verification IP module 
Figure 2-1 shows the basic structure of the verification IP module. The blocks are 
instantiated in the test bench and then interconnected. They consist of: the DUT, the 
BFM, xgmii_bfm_pkg, and test_procedure. In this setup the DUT is a short circuit 
(loopback), because the aim is to test and validate the function of the BFM itself. The 
BFM consists of two subsystems (more about BFM in 1.4.3): the Generator, which is 
in this case called TX_MODEL (more about this in 2.1.2), and the Monitor, in this case 
called RX_MODEL (more about this in sub-chapter 2.1.3). 
Note: Additional literature is recommended due to the complexity of VHDL’s 
behavioral description [22].  
 
The xgmii_bfm_pkg library contains the following (more on libraries, records, 
procedures and VHDL functions in literature [22]): 
• basic constants and the record of the XGMII data stream (more about the 
XGMII stream in 1.5.3) 
• functions and procedures (to_string converts the entire XGMII stream to a 
string; the FIFO memory has the following tasks: push, pop, get FIFO depth, 
and clear data)  
Note: “The difference between the two is that a procedure encapsulates a 
collection of sequential statements that are executed for their effect, whereas a function 
encapsulates a collection of statements that compute a result.” ([22], page 207) 
Figure 2-1: Verification of DUT by non-synthesizable IP module 
An example of the main XGMII stream is described by the following record: 
---------------------------------------------------------------------- 
type tXGMIISyncDataFrame is record 
  preambule : std_logic_vector((C_PREABULE_BYTS*C_DATA_BITS)-1 downto 0); 
  sfd       : std_logic_vector((C_SFD_BYTS*C_DATA_BITS)-1 downto 0); 
  data      : tFrameData; 
  terminate : std_logic_vector((C_TERMINATE_BYTS*C_DATA_BITS)-1 downto 0); -- end of frame delimiter 
  len       : natural;   -- Data Bit Len 
  err       : boolean;   -- Error detected 
  err_cycle : integer; 
end record; 
---------------------------------------------------------------------- 
tp_xgmii_tx/rx_bfm_pkg is a test procedure. It is used to facilitate the calling of 
procedures from the generator, from the monitor, or from both. The function of the test 
procedure could be described directly in the test bench; it is, however, best to avoid 
this in favor of better readability. 
The following code contains an example of a process in the test procedure: 
---------------------------------------------------------------------- 
  testProcedure_p: process 
  begin 

    enable_scoreboard(handle => hXGMII_RX_BFM(0)); 
    wait for 100 ns; 

    send(preambule      => X"AAAAAAAAAAAAAA", 
         sfd            => X"AB", 
         data           => X"0123456789ABCDEF0123454816", 
         terminate      => X"FD", 
         handle         => hXGMII_TX_BFM(0), 
         blocking       => false, 
         ctxdv_deassert => -1); 

    add_expected(data      => X"0123456789ABCDEF0123454816", 
                 handle    => hXGMII_RX_BFM(C_XGMII_RX_BFM), 
                 sfd       => X"AB", 
                 terminate => X"FD", 
                 err       => false, 
                 err_cycle => -1); 

    wait for 1 us; 
    -- TEST DONE 
    test_finished <= true; 
    wait; 
  end process testProcedure_p; 
---------------------------------------------------------------------- 
2.1.2 Generator (TX_MODEL) 
Generator contains two subsystems: xgmii_tx_bfm and xgmii_tx_pkg. Its main goal is 
to send the correct impulses in the correct order. xgmii_tx_pkg is a library in which 
procedures are defined (a procedure is similar to a function, more about their 
differences in 2.1.1). Calling of the procedure is done in tp_xgmii_rx/tx_bfm_pkg. 
Functions called in this manner are executed in the BFM model architecture 
(xgmii_tx_bfm) via package (xgmii_tx_bfm_pkg). 
As already mentioned, xgmii_tx_bfm_pkg is a package that contains procedures. Procedures
can be overloaded, which means that the same name can be used for more than one
procedure, as long as the parameter lists differ in number or type
(e.g. send(std_logic, integer, natural); send(boolean, std_logic_vector);).
xgmii_tx_bfm_pkg contains public and internal procedures. Public procedures
communicate with external packages (e.g. tp_xgmii_tx/rx_bfm_pkg), whereas internal
procedures communicate with the packages used inside the BFM, such as CoveragePkg,
xgmii_bfm_pkg, and RandomPkg. The main overloaded procedure is send: in its basic form
the frame is given explicitly, whereas other forms can send random data or use a
random send time.
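As a minimal sketch of procedure overloading, the following package declares two procedures
named send whose parameter lists differ. The package name overload_sketch_pkg, the parameter
n_random_bytes and the report messages are assumptions made for illustration; they are not
taken from xgmii_tx_bfm_pkg. The compiler selects the variant from the types of the actual
parameters.
----------------------------------------------------------------------
library ieee;
use ieee.std_logic_1164.all;

package overload_sketch_pkg is
   -- Variant 1: the caller supplies the payload explicitly
   procedure send (constant data : in std_logic_vector);
   -- Variant 2: the caller only asks for a number of random bytes (hypothetical)
   procedure send (constant n_random_bytes : in natural);
end package overload_sketch_pkg;

package body overload_sketch_pkg is
   procedure send (constant data : in std_logic_vector) is
   begin
      report "send(data): " & integer'image(data'length) & " bits";
   end procedure send;

   procedure send (constant n_random_bytes : in natural) is
   begin
      report "send(random): " & integer'image(n_random_bytes) & " bytes";
   end procedure send;
end package body overload_sketch_pkg;

library ieee;
use ieee.std_logic_1164.all;
use work.overload_sketch_pkg.all;

entity overload_sketch_tb is
end entity overload_sketch_tb;

architecture sim of overload_sketch_tb is
begin
   process
   begin
      send(data => X"0123");       -- resolves to variant 1
      send(n_random_bytes => 64);  -- resolves to variant 2
      wait;
   end process;
end architecture sim;
----------------------------------------------------------------------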
Only the basic procedures are implemented in this thesis, because the more complex ones
are difficult to implement.
The following code is an example of the send procedure:
---------------------------------------------------------------------- 
-- Send Frame
procedure send (
   constant preambule      : std_logic_vector((C_PREABULE_BYTS*C_DATA_BITS)-1 downto 0) := X"AAAAAAAAAAAAAA";
   constant sfd            : std_logic_vector((C_SFD_BYTS*C_DATA_BITS)-1 downto 0) := X"AB";
   constant data           : std_logic_vector;
   constant terminate      : std_logic_vector((C_TERMINATE_BYTS*C_DATA_BITS)-1 downto 0) := X"FD";
   signal   handle         : inout tBFMHandle;
   constant blocking       : boolean := false;
   constant ctxdv_deassert : integer := -1
) is
   variable frame : tXGMIISyncDataFrame;
begin
   frame.preambule                  := preambule;
   frame.sfd                        := sfd;
   frame.len                        := data'high-data'low+1;
   frame.data(frame.len-1 downto 0) := data;
   frame.terminate                  := terminate;
   frame.err                        := false;
   -- Request Transaction Sending
   BFMCmd.op        := SEND;
   BFMCmd.frame     := frame;
   BFMCmd.ctxdv_err := ctxdv_deassert;
   -- Wait until BFM is ready to process command
   if handle.ready /= '1' then
      wait until handle.ready = '1';
   end if;
   -- Send Command Request
   handle.req <= '1';
   -- Wait for acknowledge when blocking
   if blocking then
      wait on handle.ack;
   else
      wait for 0 ns;
   end if;
   -- Deassert request
   handle.req <= '0';
   -- Wait for delta
   wait for 0 ns;
end send;
------------------------------------------------------------------- 
xgmii_tx_bfm is the most important part of the generator. It contains the actual procedure
for sending the XGMII stream. As soon as a procedure in xgmii_tx_bfm_pkg is called, the
architecture, which watches the handle signal, takes over the request and starts to send
data in the way defined by the standard.
An example of sending the XGMII stream is given below:
------------------------------------------------------------------- 
when SEND =>
   -- frame length in bits (preamble + SFD + data)
   frame_len := local_params.frame.len + local_params.frame.sfd'length
                + local_params.frame.preambule'length;
   send_data := (others => '0');
   send_data(frame_len-1 downto 0) :=
      local_params.frame.data(local_params.frame.len-1 downto 0)
      & local_params.frame.sfd & local_params.frame.preambule;

   -- wait 12 clock cycles before the frame starts
   for i in 0 to 11 loop
      wait until rising_edge(xgmii_clk);
   end loop;
   wait until rising_edge(xgmii_clk);
   if frame_len > 0 then
      line_x := 0;
      for i in 0 to (frame_len/C_DATA_BITS)-1 loop
         if i = 0 then
            -- first byte on lane 0, marked as a control character
            RXD(C_DATA_BITS-1 downto 0) <= send_data(C_DATA_BITS-1 downto 0);
            RXC(line_x) <= '1';
         else
            line_x := (i mod C_LINES);
            RXD(line_x*C_DATA_BITS+C_DATA_BITS-1 downto line_x*C_DATA_BITS) <=
               send_data(i*C_DATA_BITS+C_DATA_BITS-1 downto i*C_DATA_BITS);
            RXC(line_x) <= '0';
            -- a full column has been driven, advance to the next clock
            if line_x = C_LINES-1 then
               wait until rising_edge(xgmii_clk);
            end if;
         end if;
      end loop;
      -- terminate character on the next lane
      line_x := line_x+1;
      RXD(line_x*C_DATA_BITS+C_DATA_BITS-1 downto line_x*C_DATA_BITS) <= local_params.frame.terminate;
      RXC(line_x) <= '1';
      -- idle characters on the remaining lanes
      for j in line_x+1 to C_LINES-1 loop
         RXD(j*C_DATA_BITS+C_DATA_BITS-1 downto j*C_DATA_BITS) <= X"07";
         RXC(j) <= '1';
      end loop;
      wait until rising_edge(xgmii_clk);
      -- back to idle on all lanes
      RXC <= (others => '1');
      RXD <= X"07070707_07070707";
   end if;
------------------------------------------------------------------- 
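The last step of the SEND branch (terminate character on one lane, idle characters on all
higher lanes) can be isolated into a small function. The following is only a sketch under
stated assumptions: the package name xgmii_term_sketch_pkg, the record t_column and the
function terminate_column are hypothetical and are not part of xgmii_tx_bfm; eight lanes
(64-bit SDR column) are assumed. The values X"07" (idle) and X"FD" (terminate) are the ones
used in the code above.
----------------------------------------------------------------------
library ieee;
use ieee.std_logic_1164.all;

package xgmii_term_sketch_pkg is
   constant C_LANES     : positive := 8;  -- assumption: 64-bit SDR column
   constant C_IDLE      : std_logic_vector(7 downto 0) := X"07";
   constant C_TERMINATE : std_logic_vector(7 downto 0) := X"FD";

   type t_column is record
      d : std_logic_vector(C_LANES*8-1 downto 0);  -- data, lane 0 in bits 7..0
      c : std_logic_vector(C_LANES-1 downto 0);    -- control, one bit per lane
   end record;

   -- Keep the data below term_lane, put terminate on term_lane, idle above it
   function terminate_column (
      data      : std_logic_vector(C_LANES*8-1 downto 0);
      term_lane : natural) return t_column;
end package xgmii_term_sketch_pkg;

package body xgmii_term_sketch_pkg is
   function terminate_column (
      data      : std_logic_vector(C_LANES*8-1 downto 0);
      term_lane : natural) return t_column is
      variable col : t_column;
   begin
      col.d := data;
      col.c := (others => '0');
      for lane in 0 to C_LANES-1 loop
         if lane = term_lane then
            col.d(lane*8+7 downto lane*8) := C_TERMINATE;
            col.c(lane) := '1';
         elsif lane > term_lane then
            col.d(lane*8+7 downto lane*8) := C_IDLE;
            col.c(lane) := '1';
         end if;
      end loop;
      return col;
   end function terminate_column;
end package body xgmii_term_sketch_pkg;

library ieee;
use ieee.std_logic_1164.all;
use work.xgmii_term_sketch_pkg.all;

entity xgmii_term_sketch_tb is
end entity xgmii_term_sketch_tb;

architecture sim of xgmii_term_sketch_tb is
begin
   process
      variable col : t_column;
   begin
      -- two data bytes (lanes 0 and 1), terminate on lane 2
      col := terminate_column(X"1122334455667788", 2);
      assert col.d = X"0707070707FD7788" report "unexpected data" severity failure;
      assert col.c = "11111100" report "unexpected control" severity failure;
      wait;
   end process;
end architecture sim;
----------------------------------------------------------------------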
Simulation  
The generator is checked by calling the simple procedure send in
tp_xgmii_tx/rx_bfm_pkg. The stimuli are checked in the ModelSim simulator (Figure 2-2).
2.1.3 Monitor (RX_MODEL) 
The main role of the monitor is to split the received data into the individual parts of the
XGMII stream (preamble, SFD, data, etc.), to save them to memory (more in chapter 1.4.5)
and finally to check automatically that the data are correct (more about the
scoreboard in 1.4.6).
Like the generator, the monitor has two subsystems: xgmii_rx_bfm and
xgmii_rx_bfm_pkg. Its procedures are called in the test procedure. Both the generator
and the monitor can be called in the same test procedure (see the example in chapter
2.1.1). Once a procedure has been called, the architecture takes over and immediately
begins to monitor the DUT output.
xgmii_rx_bfm_pkg contains internal and public procedures, the same as the generator
model (more in the previous chapter). The internal procedures are just get and set; they
are used solely for internal communication in the BFM. Public procedures can check either
the entire stream or only one part of it. The advantage of bottom-up design is clearly
displayed here, because xgmii_rx_bfm_pkg can be expanded according to the needs of the test.
Examples of procedures: Wait for Preamble, Wait for SFD, Get Last Received
Frame, Add Expected Transaction, Enable Scoreboard, Disable Scoreboard, etc. One
of the overloads of the procedure add_expected is shown in the following code:
---------------------------------------------------------------------- 
-- Add Expected Transaction (No Time Compare)
procedure add_expected (
   constant data      : std_logic_vector;  -- Frame Data
   signal   handle    : inout tBFMHandle;
   constant sfd       : std_logic_vector((C_SFD_BYTS*C_DATA_BITS)-1 downto 0) := X"AB";  -- Frame sfd
   constant terminate : std_logic_vector((C_TERMINATE_BYTS*C_DATA_BITS)-1 downto 0) := X"FD";  -- Frame terminate
   constant err       : boolean := false;  -- Frame Error
   constant err_cycle : integer := -1      -- Clock cycle in which the error was asserted
) is
begin
   BFMCmd.op                           := ADD_EXPECTED;
   BFMCmd.expected_tr.frame.preambule  := X"AAAAAAAAAAAAAA";
   BFMCmd.expected_tr.frame.terminate  := terminate;
   BFMCmd.expected_tr.frame.sfd        := sfd;
   BFMCmd.expected_tr.frame.len        := data'high-data'low+1;
   BFMCmd.expected_tr.frame.data(BFMCmd.expected_tr.frame.len-1 downto 0) := data;
   BFMCmd.expected_tr.frame.err        := err;
   BFMCmd.expected_tr.frame.err_cycle  := err_cycle;
   BFMCmd.expected_tr.exp_sim_cnt_from := 0;
   BFMCmd.expected_tr.exp_sim_cnt_to   := 0;
   handle.req <= '1';
   wait on handle.ack;
   handle.req <= '0';
   wait for 0 ns; -- Wait for delta cycle
end add_expected;
-------------------------------------------------------------------
Figure 2-2: Signals checked by ModelSim
xgmii_rx_bfm is an architecture comprised of three processes, the most important of which
is the monitor. Its role is to check the DUT output signals, to split them into the
correct parts of the stream (preamble, SFD, data, terminate, idle) once it receives the
standardized bit patterns (more about the standardized combinations in 1.5.4), and
finally to save them into the FIFO memory. The FIFO memory is located in xgmii_bfm_pkg
(more about xgmii_bfm_pkg in 2.1.1). The role of the command process is to switch
between the requested procedures. These procedures then report (using the report package)
whether they were executed correctly and whether their result was correct. The scoreboard
process compares the expected (saved) data with the received data. When exactly the
scoreboard starts to compare depends on the procedure.
The following example contains part of the code of the monitor process:
-- ---------------------------------------------------------------- 
-- Monitor 
-- ---------------------------------------------------------------- 
monitor_p: process
   …

   while TXC /= X"FF" loop
      if line_x = C_LINES then
         wait until rising_edge(TX_CLK);
         if TXC = X"FF" then
            …
            -- -------------------------------------------
            -- Update Last Frame
            last_frame       := observed_tr.frame;
            last_frame_valid := true;
            …
      else
         if j < C_PREABULE_BYTS then
            i := i+C_DATA_BITS;
            observed_tr.frame.preambule(i-1 downto i-C_DATA_BITS) :=
               TXD(line_x*C_DATA_BITS+C_DATA_BITS-1 downto line_x*C_DATA_BITS);
            line_x := line_x+1;
         elsif j < C_SFD_BYTS+C_PREABULE_BYTS then
            k := k+C_DATA_BITS;
            observed_tr.frame.sfd(k-1 downto k-C_DATA_BITS) :=
               TXD(line_x*C_DATA_BITS+C_DATA_BITS-1 downto line_x*C_DATA_BITS);
            line_x := line_x+1;
         else
            len := len+C_DATA_BITS;
            if TXC(line_x) = '1' and
               TXD(line_x*C_DATA_BITS+C_DATA_BITS-1 downto line_x*C_DATA_BITS) = X"FD" then
               wait until rising_edge(TX_CLK);
            end if;
            observed_tr.frame.data(len-1 downto len-C_DATA_BITS) :=
               TXD(line_x*C_DATA_BITS+C_DATA_BITS-1 downto line_x*C_DATA_BITS);
   …
end process monitor_p;
------------------------------------------------------------------- 
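The end-of-frame test used in the monitor process (control bit set and data byte equal to
X"FD" on the same lane) can also be written as a stand-alone function. The sketch below
assumes a 64-bit data word with eight lanes; the package name xgmii_mon_sketch_pkg and the
function find_terminate are hypothetical and do not appear in xgmii_rx_bfm.
----------------------------------------------------------------------
library ieee;
use ieee.std_logic_1164.all;

package xgmii_mon_sketch_pkg is
   -- Returns the lane (0..7) that carries the terminate character, or -1 if none
   function find_terminate (
      txd : std_logic_vector(63 downto 0);
      txc : std_logic_vector(7 downto 0)) return integer;
end package xgmii_mon_sketch_pkg;

package body xgmii_mon_sketch_pkg is
   function find_terminate (
      txd : std_logic_vector(63 downto 0);
      txc : std_logic_vector(7 downto 0)) return integer is
   begin
      for lane in 0 to 7 loop
         -- X"FD" counts as terminate only when the control bit of the lane is set
         if txc(lane) = '1' and txd(lane*8+7 downto lane*8) = X"FD" then
            return lane;
         end if;
      end loop;
      return -1;
   end function find_terminate;
end package body xgmii_mon_sketch_pkg;

library ieee;
use ieee.std_logic_1164.all;
use work.xgmii_mon_sketch_pkg.all;

entity xgmii_mon_sketch_tb is
end entity xgmii_mon_sketch_tb;

architecture sim of xgmii_mon_sketch_tb is
begin
   process
   begin
      -- terminate on lane 2, idle above it
      assert find_terminate(X"0707070707FD7788", "11111100") = 2
         report "terminate lane not found" severity failure;
      -- X"FD" as plain data (control bit cleared) is not a terminate
      assert find_terminate(X"00000000000000FD", "00000000") = -1
         report "data byte mistaken for terminate" severity failure;
      -- idle column
      assert find_terminate(X"0707070707070707", "11111111") = -1
         report "idle mistaken for terminate" severity failure;
      wait;
   end process;
end architecture sim;
----------------------------------------------------------------------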
Simulation  
Provided that the generator model (TX_MODEL) is correct, a single add_expected call is
added to tp_xgmii_tx/rx_bfm_pkg. The simulation state (which data were received by the
monitor, and in what order) can be displayed using the package report_pkg. Figure 2-3
shows a correctly received stream, whereas Figure 2-4 shows a stream with an error.
 
Figure 2-3: Correct data stream
Figure 2-4: Error data stream
2.2 Synthesizable tester
In digital circuit design it is important to demonstrate the full functionality of the IP
module (RTL code) that is going to be implemented in the FPGA. After the functionality has
been proven, synthesis tools (e.g. Quartus II, ISE Design Suite) are used to synthesize
the IP module. For this kind of design it is necessary to create a synthesizable tester.
The synthesizable tester is created at the RTL level of abstraction (more about abstraction
in 1.1.1), the same as the digital design itself. The chosen design method is top-down,
which suits the RTL abstraction. The advantage of this method is that the designer has a
complete view of what the design must look like from the very start. The design is then
divided into smaller blocks (more about the top-down method in 1.3.2).
Because the I/O signals can be designed in different ways, it is necessary to know the
top-level structure of the device. For example, the standard defines XGMII as a DDR
(Double Data Rate) interface (more in 1.5.2); if the DUT uses SDR (Single Data Rate)
instead, verification should reveal this design mismatch, which is possible only with
gray-box testing.
Figure 2-5: Block diagram of synthesizable tester
Figure 2-5 shows a block diagram of the synthesizable tester. The generator is located on
the receive side of the DUT and is responsible for sending the correct frame delimiters
(preamble, SFD, terminate), random data and the CRC. The monitor is located on the transmit
side and has the following functions: to split the received stream into its parts and to
calculate the CRC at the same time. The CRC is calculated continuously until the end of
the frame is detected. Afterwards, the calculated CRC is compared with the received CRC
(the last 32 bits of the frame). More about CRC in chapter 1.5.5.
Note: The following chapters are explained using block diagrams and state machines.
The code is explained in the appendix.
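To illustrate the CRC check, the following sketch computes the Ethernet CRC-32 one byte at a
time (reflected polynomial X"EDB88320", initial value X"FFFFFFFF", final inversion). It is a
byte-serial reference model only, assuming nothing about the downloaded 64-bit CRC block used
in the tester; the names crc32_sketch_pkg and crc32_byte are hypothetical. The testbench
checks the function against the well-known check value X"CBF43926" for the ASCII string
"123456789".
----------------------------------------------------------------------
library ieee;
use ieee.std_logic_1164.all;

package crc32_sketch_pkg is
   subtype t_crc is std_logic_vector(31 downto 0);
   constant C_POLY_REFLECTED : t_crc := X"EDB88320";
   -- Update a running CRC with one byte (bit 0 of the byte is processed first)
   function crc32_byte (
      crc       : t_crc;
      data_byte : std_logic_vector(7 downto 0)) return t_crc;
end package crc32_sketch_pkg;

package body crc32_sketch_pkg is
   function crc32_byte (
      crc       : t_crc;
      data_byte : std_logic_vector(7 downto 0)) return t_crc is
      variable c : t_crc := crc;
   begin
      c(7 downto 0) := c(7 downto 0) xor data_byte;
      for i in 0 to 7 loop
         if c(0) = '1' then
            c := ('0' & c(31 downto 1)) xor C_POLY_REFLECTED;
         else
            c := '0' & c(31 downto 1);
         end if;
      end loop;
      return c;
   end function crc32_byte;
end package body crc32_sketch_pkg;

library ieee;
use ieee.std_logic_1164.all;
use work.crc32_sketch_pkg.all;

entity crc32_sketch_tb is
end entity crc32_sketch_tb;

architecture sim of crc32_sketch_tb is
begin
   process
      type t_bytes is array (natural range <>) of std_logic_vector(7 downto 0);
      -- ASCII "123456789"
      constant C_MSG : t_bytes(0 to 8) :=
         (X"31", X"32", X"33", X"34", X"35", X"36", X"37", X"38", X"39");
      variable crc : t_crc := X"FFFFFFFF";
   begin
      for i in C_MSG'range loop
         crc := crc32_byte(crc, C_MSG(i));
      end loop;
      crc := not crc;  -- final inversion
      assert crc = X"CBF43926" report "CRC-32 mismatch" severity failure;
      wait;
   end process;
end architecture sim;
----------------------------------------------------------------------
A monitor built this way would run the same update over the received data bytes and compare
the result with the 32 bits received at the end of the frame.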
2.2.1 Generator 
The generator was designed with the top-down methodology. First, its function was examined:
what it needs to accomplish, what the XGMII stream looks like, etc. Next, the possible
solutions were compared to find the best one; only then were the individual blocks created.
The first step in the creation of the individual blocks was the examination of the IEEE
802.3 standard (see [11], [10], [14]) and a detailed examination of the XGMII stream. The
first idea was that the generator and the monitor should not share any signals other than
those that pass through the DUT (so that, e.g., the generator can be implemented on one
device and the monitor on another). This means that the checking had to be done by means
other than a scoreboard. One of the best-known techniques used for Ethernet checking is
the CRC (Cyclic Redundancy Check) (more about CRC in chapter 1.5.5). The second idea was
to generate the data randomly, in order to improve DUT checking. To keep the design
simple, an LFSR random generator is used (more about the LFSR generator in 1.4.4).
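For illustration, a small LFSR of the kind described in the Xilinx application note [8] can
be sketched as follows. This is not the downloaded LFSR used in the tester: the entity name
lfsr8_sketch and the 8-bit width are assumptions chosen to keep the example short, whereas
the tester needs 64 random bits per clock. The feedback uses XNOR of taps 8, 6, 5 and 4, so
the all-ones state is the only state the register must not be seeded with.
----------------------------------------------------------------------
library ieee;
use ieee.std_logic_1164.all;

entity lfsr8_sketch is
   port (
      clk       : in  std_logic;
      set_seed  : in  std_logic;                      -- load seed when '1'
      seed      : in  std_logic_vector(7 downto 0);
      rand_data : out std_logic_vector(7 downto 0));
end entity lfsr8_sketch;

architecture rtl of lfsr8_sketch is
   -- stages are numbered 1..8 so that the tap numbers can be read directly
   signal q : std_logic_vector(8 downto 1) := (others => '0');
begin
   process (clk)
      variable fb : std_logic;
   begin
      if rising_edge(clk) then
         if set_seed = '1' then
            q <= seed;
         else
            -- XNOR feedback from taps 8, 6, 5, 4 into stage 1
            fb := not (q(8) xor q(6) xor q(5) xor q(4));
            q  <= q(7 downto 1) & fb;
         end if;
      end if;
   end process;

   rand_data <= q;
end architecture rtl;
----------------------------------------------------------------------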
Figure 2-6 shows the first design of the generator, in which the LFSR and the CRC32
(32-bit) block are connected. The LFSR random generator creates 8 bits at a time, which
are written to a FIFO memory. A special kind of FIFO memory is used here, with an 8-bit
input and a 64-bit output. The CRC is calculated until the FIFO signals that it is almost
full, after which the CRC is sent. Eight such block combinations (LFSR, CRC + FIFO) would
be necessary in order to send data in every clock cycle. Control_FIFO would control all
block combinations.
The advantages of a generator created in this manner are high speed, high reliability of
data generation and the high speed of the LFSR. The disadvantages are that it takes up a
lot of space and that the design is complex.
Figure 2-6: Generator with FIFO
(Block diagram: CONTROL_FIFO controls n pairs of Generator_i (LFSR, CRC32) and FIFO_i
(8 -> 64 bit) and drives RXC (8 bit), RXD (64 bit) and RX_CLK.)
The second generator design is shown in Figure 2-7. It is comprised of a random
generator (LFSR), a 32-bit cyclic redundancy check (CRC), a control block and a mux. When
the signal enable_send is set to logic 1, the control block tells the CRC32 block to begin
calculating. As soon as the data are ready, the control block controls frame_mux, which
switches between the parts of the XGMII stream. The advantages of this kind of generator
are that it is less complex than the previous design and that it occupies less space on
the chip; one disadvantage is that it is unusable at higher speeds.
This thesis uses the second design, due to its relative simplicity compared to the first
design.
Generator inputs are (Figure 2-8):
• frame_length (16-bit vector) – sets the frame length; a basic frame is 1500 bytes
long (more in chapter 1.5.5). The generator rounds the length up to an integer
multiple of 64 bits (X*64 = length of frame).
• enable_send – request to begin the transaction
• seed – random generator seed
• clk – clock signal of the whole digital design
• reset – reset signal
Generator outputs are:
• busy – signals that a transaction is ongoing
• RXC – receive 10GbE control, 8-bit vector (4 bits according to the standard, more in
1.5.2), because an SDR clock is used
• RXD – receive 10GbE data, 64-bit vector (32 bits according to the standard, more in
1.5.2), because an SDR clock is used
• RX_CLK – device clock signal
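The interface listed above can be written down as a VHDL entity, and the rounding of the
frame length as a small function. The following is a sketch under stated assumptions, not
the code of the tester: the names generator_sketch_pkg, words_for and generator_sketch, as
well as the generic G_SEED_BITS (the seed width is not given in the text), are hypothetical,
and frame_length is assumed to be given in bytes.
----------------------------------------------------------------------
library ieee;
use ieee.std_logic_1164.all;

package generator_sketch_pkg is
   constant C_WORD_BYTES : positive := 8;  -- one 64-bit SDR word
   -- Number of 64-bit words needed for a frame of length_bytes bytes (rounded up)
   function words_for (length_bytes : natural) return natural;
end package generator_sketch_pkg;

package body generator_sketch_pkg is
   function words_for (length_bytes : natural) return natural is
   begin
      return (length_bytes + C_WORD_BYTES - 1) / C_WORD_BYTES;
   end function words_for;
end package body generator_sketch_pkg;

library ieee;
use ieee.std_logic_1164.all;

entity generator_sketch is
   generic (
      G_SEED_BITS : positive := 64);  -- assumption, width not stated in the text
   port (
      clk          : in  std_logic;
      reset        : in  std_logic;
      enable_send  : in  std_logic;
      frame_length : in  std_logic_vector(15 downto 0);
      seed         : in  std_logic_vector(G_SEED_BITS-1 downto 0);
      busy         : out std_logic;
      RXC          : out std_logic_vector(7 downto 0);
      RXD          : out std_logic_vector(63 downto 0);
      RX_CLK       : out std_logic);
end entity generator_sketch;

use work.generator_sketch_pkg.all;

entity generator_sketch_tb is
end entity generator_sketch_tb;

architecture sim of generator_sketch_tb is
begin
   process
   begin
      -- 1500 bytes do not fill a whole number of words and are rounded up
      assert words_for(1500) = 188 report "rounding failed" severity failure;
      -- 64 bytes are already a multiple of 8 bytes
      assert words_for(64) = 8 report "exact multiple failed" severity failure;
      wait;
   end process;
end architecture sim;
----------------------------------------------------------------------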
The LFSR random generator was downloaded from [24] and is based on a Xilinx design [8];
it is not the subject of this thesis.
The 32-bit CRC block was downloaded from [23], because it is a standardized circuit
which is not the subject of this thesis.
Figure 2-7: Generator block diagram
(Blocks: CONTROL_GEN, LFSR, GEN_CRC32, FRAME_MUX. Inputs: clk, reset, enable_send,
frame_length, seed. Outputs: rxc (8 bit), rxd (64 bit), rx_clk, busy.)
CONTROL_GEN is limited by the fact that it can only send frames whose length is an
integer multiple of 64 bits. Figure 2-8 shows the state machine which, when the
enable_send signal changes, starts to send the preamble and SFD, then the data, and
finally the CRC and terminate.
Figure 2-8: CONTROL_GEN
(State outputs as given in the diagram; len is an auxiliary signal. Transition conditions
shown in the diagram: rst, en = 1, en = 0, len > 0, len <= 0.)
ST_0 (Idle):      len := 0;         RXC = (others => '1');                    set_seed = '1'; crc_en = '0'; sel = "00"; busy = '0'
ST_1 (pream_sfd): len := frame_len; RXC = (0 => '1', others => '0');          set_seed = '1'; crc_en = '0'; sel = "01"; busy = '1'
ST_2 (data):      len := len-8;     RXC = (others => '0');                    set_seed = '0'; crc_en = '1'; sel = "10"; busy = '1'
ST_3 (crc32_efd): len := 0;         RXC = (3 downto 0 => '0', others => '1'); set_seed = '1'; crc_en = '0'; sel = "11"; busy = '1'
FRAME_MUX sends the correct part of the frame according to the value of the select
signal.
2.2.2 Monitor
The monitor is designed with the same approach as the generator (more in the previous
chapter): the generator and the monitor communicate only through the DUT. The monitor, as
seen in Figure 2-9, is comprised of two blocks: CRC32 (which uses the same code as the
CRC32 in the generator) and CONTROL_MON.
Figure 2-9: Monitor block diagram
(Blocks: CONTROL_MON, MON_CRC32. Inputs: TXC (8 bit), TXD (64 bit), TX_CLK, clk, restart.
Outputs: err_control (2 bit), err_idle, err_preambule, err_sfd, err_data, err_efd.)
Input signals are:
• TXC – transmit 10GbE control, 8-bit vector (4 bits according to the standard, more in
1.5.2), because an SDR clock is used
• TXD – transmit 10GbE data, 64-bit vector (32 bits according to the standard, more in
1.5.2), because an SDR clock is used
• TX_CLK – clock output from the DUT
Output signals indicate whether the test passed:
• err_control – two-bit vector ("00" – off, "01" – idle, "10" – receiving, "11" –
received)
• err_idle – checks that the inter-frame gap had the minimum length of 5 clocks
(if >= 5, err_idle = '1'; if < 5, err_idle = '0')
• err_preamble – '1' means correct, '0' means wrong
• err_sfd – '1' means correct, '0' means wrong
• err_efd – '1' means correct, '0' means wrong
• err_data – if the CRC32 calculated by the monitor is the same as the one sent by the
generator, err_data = '1'; otherwise err_data = '0'
The main role of CONTROL_MON is to split the received frame into its parts and to compare
them with the expected values (e.g. the received SFD is compared with the standard
hexadecimal value AB). Its other function is to send the data (RXD) to the CRC32 block
after the start of the preamble has been received (RXC(0) = '1'), and to compare the
calculated CRC with the last 32 bits after the data ends (RXC(any) = '1' & RXD = X"FD").
All this is shown in the state diagram in Figure 2-10.
Figure 2-10: CONTROL_MON
(State outputs as given in the diagram. Transition conditions shown in the diagram: rst;
RXC = (others => '1'); RXC(0) = '1'; RXC = (others => '0'); RXC = (any = '1') & RXD = "FD";
others.)
ST_0 (Idle): crc_enable = '0'; err_control = "01"; err_idle <= (counter >= con_IPG) ? 1 : 0;
err_preamble, err_sfd, err_data, err_efd <= (counter >= con_IPG) ? 0 : last.st
ST_1 (pream_sfd): crc_data <= (txc = 0) ? txd : 0; crc_enable = (txc = 0) ? 1 : 0;
err_control = "10"; err_idle = '0'; err_preamble = con_pre ? 1 : 0; err_sfd = con_sfd ? 1 : 0; len <= 0
ST_2 (data): crc_data <= (txc = 0) ? txd : 0; crc_enable = (txc = 0) ? 1 : 0;
err_control = "10"; err_idle = '0'
ST_3 (crc32_efd): crc_data <= (txc = 0) ? txd : 0; crc_enable = '0'; err_control = "11";
err_idle = '0'; err_data = con_result_crc32 ? 1 : 0; err_efd = con_teminate ? 1 : 0
2.2.3 Functional verification of synthesizable design
RX_MODEL is used for the functional verification of the synthesizable design (the generator
and the monitor). During the design of the generator, its output was monitored with
RX_MODEL. The generator was treated as the DUT (shown in Figure 2-11), and its function
was validated using RX_MODEL and ModelSim. Figure 2-12 shows a correctly sent frame with
random data. The clk and reset processes are created within the testbench (they are the
basic processes of the testbench).
Figure 2-12: Generator signal from ModelSim
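The basic clk and reset processes of a testbench mentioned above could look as follows. This
is a minimal sketch, not the testbench of the thesis: the entity name tb_clk_reset_sketch,
the reset length of four clocks and the 1 us run time are assumptions. The clock period of
6400 ps corresponds to 156.25 MHz, which is the rate needed to carry 10 Gb/s on a 64-bit SDR
bus; the simulator resolution must be 100 ps or finer for the half period to be exact.
----------------------------------------------------------------------
library ieee;
use ieee.std_logic_1164.all;

entity tb_clk_reset_sketch is
end entity tb_clk_reset_sketch;

architecture sim of tb_clk_reset_sketch is
   constant C_CLK_PERIOD : time := 6400 ps;  -- 156.25 MHz
   signal clk           : std_logic := '0';
   signal reset         : std_logic := '1';
   signal test_finished : boolean   := false;
begin
   -- Clock generator, stops when the test is finished so the simulation can end
   clk_p : process
   begin
      while not test_finished loop
         clk <= '0';
         wait for C_CLK_PERIOD/2;
         clk <= '1';
         wait for C_CLK_PERIOD/2;
      end loop;
      wait;
   end process clk_p;

   -- Reset is held for four rising clock edges, then released
   reset_p : process
   begin
      reset <= '1';
      for i in 1 to 4 loop
         wait until rising_edge(clk);
      end loop;
      reset <= '0';
      wait;
   end process reset_p;

   -- Placeholder for the test procedure: end the test after 1 us
   stop_p : process
   begin
      wait for 1 us;
      assert reset = '0' report "reset was not released" severity failure;
      test_finished <= true;
      wait;
   end process stop_p;
end architecture sim;
----------------------------------------------------------------------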
Monitor verification was performed using the synthesizable generator, RX_MODEL and
ModelSim. The block diagram in Figure 2-11 shows the testing of the monitor. The
resulting ModelSim bit stream proves that the monitor is valid (Figure 2-13).
Figure 2-11: Functional verification of synthesizable design
(Blocks inside TESTBENCH: GENERATOR, MONITOR, TX_MODEL, RX_MODEL. Buses: RXC (8 bit),
RXD (64 bit), RX_CLK, TXC (8 bit), TXD (64 bit), TX_CLK.)
Figure 2-13: Monitor signal from ModelSim
After the entire design is implemented (generator + monitor), logic utilization is
357 ALMs, which is less than 1% of the total ALMs and is a satisfactory value. 198 pins
are used, which is 19%. The number of used pins could be reduced by using a standardized
interface (e.g. UART); however, this is outside the scope of this thesis. More
information on the device and its resource utilization is shown in Figure 2-14.
 
 
 
 
 
 
 
 
 
 
 
 
Figure 2-14: Flow summary 
3 CONCLUSION 
The main goal of this thesis was the testing of a 10 Gbps Ethernet device, so the first
step towards this goal was to study the standards. The standard that describes the first
two layers of the OSI model is IEEE 802.3. This standard was studied in detail and was
used to explain the basic functions and the way the sublayers work. The XGMII interface,
which connects the layers, is used in the same way as during the verification of the
Ethernet device.
After the thorough study of the standards, the different approaches to verification and
testing were addressed. The first part of the thesis dealt with the functional
verification of the 10 Gbps Ethernet device; the second part dealt with the testing of
the synthesizable IP core.
The Bus Functional Model was chosen as the most suitable verification model. The BFM
consists of two parts: TX_MODEL (generator) and RX_MODEL (monitor). The BFM was used to
demonstrate the creation and calling of procedures in VHDL. ModelSim was used to prove
that the procedures were correct and that they could be used for functional verification
of the PHY and MAC layers.
The synthesizable tester is comprised of two parts: the generator and the monitor. The
main idea behind the tester was to have the generator and the monitor communicate only
via the standardized XGMII interface. The generator sends the correct frame delimiters
and random data created by the LFSR generator. The last 32 bits of the data are the CRC
calculated over the sent data. The monitor receives the XGMII stream and divides it into
the parts of the packet (preamble, SFD, data, EFD). Once the XGMII stream starts to be
received, the monitor recalculates the CRC and finally compares it with the one sent by
the generator. RX_MODEL was used to verify the generator in ModelSim, whereas the monitor
was verified using the synthesizable generator and ModelSim. The tester was implemented
using Quartus II in a Stratix V FPGA. This has shown that the tester occupies only
462 ALMs, which is less than 1%.
The BFM has no limitations, and its procedures can be improved according to application
needs. The limitation of the synthesizable tester is that it can only send packets whose
length is an integer multiple of 64 bits. The synthesizable tester can be improved by
making a small adjustment in the CONTROL_GEN block within the generator. The models and
testers can be used with all 10 Gbps Ethernet devices.
The goal of this bachelor's thesis was achieved: the design is functional and allows
verification and testing using generic modules that can be used as a device self-test.
 
LITERATURE 
[1] CHU, Pong P. RTL hardware design using VHDL: coding for efficiency, portability, and 
scalability. Hoboken, N.J.: Wiley-Interscience, c2006. ISBN 9780471720928 
[2] BERGERON, Janick. Writing Testbenches: Functional Verification of HDL Models.
Second edition. Boston, MA: Springer US, 2003. ISBN 9781461503026.
[3] PHASES OF VERIFICATION: Verification Plan.Http://www.testbench.in: Systemverilog 
for Verification [online]. 2016 [cit. 2016-05-31]. Available from: 
http://www.testbench.in/TS_22_PHASES_OF_VERIFICATION.html 
[4] Coverage Analysis Techniques for HDL Design Validation[online]. Taiwan, 2001 [cit. 
2016-06-01]. Available from: http://front.cc.nctu.edu.tw/Richfiles/6502-apchdl99.pdf. 
Department of Electronics Engineering National Chiao Tung University. 
[5] BROWY, Chris, Glenn GULLIKSON a Mark INDOVINA. A TOP-DOWN APPROACH 
TO IC DESIGN: INTEGRATED CIRCUIT DESIGN METHODOLOGY GUIDE [online]. 
V1.4. Texas: Copyright (c), 2014 [cit. 2016-06-01]. ISBN /. Available from: 
http://www.indovina.us/~mai/a_top_down_approach_to_ic_design.pdf 
[6] PrimeNet: Designing Procedural-Based Behavioral Bus Functional Models for High
Performance Verification [online]. Florida: PrimaNet, 1999, 2015-12-17 [cit. 2015-12-17].
Available from: http://www.primenet.com/
[7] Dedicated System [online]. California: Synapsis, 2001 [cit. 2016-06-01]. Available from:
http://web.archive.org/web/20040122033338/http://www.omimo.be/Magazine/01q2/2001q2_p027.pdf
[8] Efficient Shift Registers, LFSR Counters, and Long Pseudo- Random Sequence 
Generators. In: Http://www.xilinx.com/ [online]. U.S.A: XILINX, 1996 [cit. 2016-06-01]. 
Available from: 
http://www.xilinx.com/support/documentation/application_notes/xapp052.pdf 
[9] Universal Verification Methodology: (UVM) 1.2 User’s Guide. 2. /: Accellera, 2015. 
[10] Theinstitute: IEEE 802 Committee Celebrates 30th Anniversary [online]. /: IEEE Xplore
Digital Library, 2010, 2010-6-5 [cit. 2015-12-17]. Available from:
http://theinstitute.ieee.org/benefits/standards/ieee-802-committee-celebrates-30th-anniversary668
[11] IEEE at a Glance: IEEE Quick Facts [online]. /: IEEE, 2015, 2015-12-17 [cit. 2015-12-17].
Available from: https://www.ieee.org/about/today/index.html
[12] Data networks and open system communications: x.200. 1. /: INTERNATIONAL 
TELECOMMUNICATION UNION, 1994. 
[13] IEEE Standard for Ethernet: SECTION 4. 2. New York: IEEE Std 802.3™, 2012. 
[14] IEEE Standard for Ethernet: SECTION 1. 2. New York: IEEE Std 802.3™, 2012. 
[15] ISO/IEC 7498-1:1994. Information technology - Open Systems Interconnection - Basic
Reference Model: The Basic Model. 2. Switzerland: ISO/IEC, 1994.
[16] Hammond, J. L., Brown, J. E., and Liu, S. S. Development of a Transmission Error Model 
and Error Control Model. Technical Report RADC-TR-75-138. Rome: Air Development 
Center (1975).  
[17] SFP+ 10 Gb/s and Low Speed Electrical Interface: SFF-8431. 4.1. Saratoga: SFF
Committee, 2009.
[18] USA. 10 Gigabit Ethernet and the XAUI interface: 71612C. In: . /: Agilent Technologies,
2002, No. 5988-5509EN.
[19] From BittWare Inc.: Order Information. Https://www.altera.com [online]. America: Altera,
2016 [cit. 2016-06-01]. Available from:
https://www.altera.com/solutions/partners/partner-profile/bittware-inc-/board/s5-pcie-ds-dual-altera-stratix-v-gx-pcie-board-with-quad-qsfp---ddr3--qdrii---and-rldram3.html
[20] Altera Transceiver PHY IP Core User Guide. In: Https://www.altera.com/ [online]. U.S.A,
San Jose, CA 951: ALTERA, 2016 [cit. 2016-05-25]. Available from:
https://www.altera.com/content/dam/altera-www/global/en_US/pdfs/literature/ug/xcvr_user_guide.pdf
[21] 10-Gbps Ethernet MAC MegaCore Function User Guide. In: Https://www.altera.com/
[online]. U.S.A, San Jose, CA 951: ALTERA, 2016 [cit. 2016-05-25]. Available from:
https://www.altera.com/content/dam/altera-www/global/en_US/pdfs/literature/ug/10gbps_mac.pdf
[22] Quartus II Handbook Volume 1: Design and Synthesis. In: Https://www.altera.com/
[online]. U.S.A, San Jose, CA 951: ALTERA, 2015.05.04 [cit. 2016-05-25]. Available from:
https://www.altera.com/content/dam/altera-www/global/en_US/pdfs/literature/hb/qts/qts_qii5v1.pdf
[23] OutputLogic: CRC Generator [online]. WordPress, 2016 [cit. 2016-06-01]. Available 
from: http://outputlogic.com/ 
[24] OpenCores: Projects [online]. Sweden, 2016 [cit. 2016-06-01]. Available from:
http://opencores.org/
[25] ModelSim [online]. U.S.A: Mentor Graphics, 2016 [cit. 2016-06-01]. Available from: 
https://www.mentor.com/products/fv/modelsim/ 
[26] ALTERA: Quartus Prime [online]. U.S.A: Altera, 2016 [cit. 2016-06-01]. Available from: 
https://www.altera.com/downloads/download-center.html 
 
DEFINITIONS AND ACRONYMS 
Architecture VHDL module body (implementation) 
ASIC Application Specific Integrated Circuit 
BFM Bus Functional Model 
CPLA Complex Programmable Logic Array 
DUT Design Under Test 
e.g. exempli gratia (In English: for example) 
entity Specification of a VHDL module interface (ports and generics) 
etc. et cetera 
FIFO First In First Out 
FPGA Field-programmable gate array 
HVL Hardware Verification Language  
IP Intellectual Property (commonly used for a complex VHDL design) 
IEEE Institute of Electrical and Electronics Engineers 
LSB Least Significant Bit 
MAC Media Access Control 
MSB Most Significant Bit 
OSI Open Systems Interconnection 
PHY Physical Layer 
PLA Programmable Logic Array 
RTL Register Transfer Level
RXC Receive Control 
RXD Receive Data 
TB Test Bench 
TXC Transmit Control 
TXD Transmit Data 
signal Basic VHDL interconnect element (used to connect ports between modules 
and device pins as well as to define registers) 
VHDL VHSIC Hardware Description Language 
VHSIC Very High Speed Integrated Circuit 
XGMII 10 Gigabit Media Independent Interface 
FIGURES 
Figure 1-1: Timing characteristic of transistor and gate level [1] .................................... 9 
Figure 1-2: Verification plan [3] ..................................................................................... 12 
Figure 1-3: Testbench and DUT ..................................................................................... 16 
Figure 1-4: Basic Test Bench Block Diagram ................................................................ 18 
Figure 1-5: Response Moniotor and FIFO [2] ................................................................ 19 
Figure 1-6: Scoreboard with FIFO [2] ............................................................................ 20 
Figure 1-7: Test Bench Components [7]......................................................................... 21 
Figure 1-8: ISO/OSI model ............................................................................................. 22 
Figure 1-9: Reconciliation Sublayer (RS) inputs and outputs [13] ................................ 23 
Figure 1-10: XGMII stream [13] .................................................................................... 24 
Figure 1-11: Block diagram Stratix V [19] ..................................................................... 28 
Figure 2-1: Verification of DUT by non-synthesizable IP module ................................. 30 
Figure 2-2: Signals checked by ModelSim ..................................................................... 34 
Figure 2-3: Correct data stream ...................................................................................... 36 
Figure 2-4: Error data stream .......................................................................................... 37 
Figure 2-5: Block diagram of synthesizable tester ......................................................... 37 
Figure 2-6: Generator with FIFO .................................................................................... 39 
Figure 2-7: Generator block diagram .............................................................................. 40 
Figure 2-8: CONTROL_GEN ........................................................................................ 40 
Figure 2-9: Monitor block diagram ................................................................................ 41 
Figure 2-10: CONTROL_MON ..................................................................................... 42 
Figure 2-11: Functional verification of synthesizable design ......................................... 43 
Figure 2-12: Generator signal from ModelSim ............................................................... 43 
Figure 2-13: Monitor signal from ModelSim ................................................................. 44 
Figure 2-14: Flow summary ........................................................................................... 44 
APPENDIX 
A USER GUIDE 
 
This thesis requires the following programs: ModelSim (see [25]) and 
QUARTUS II (see [26]). 
 
The BFM is in the folder XGMII_BFM_and_Tester => models => xgmii_bfm => src 
The tester is in the folder XGMII_BFM_and_Tester => src 
 
To run the tester in ModelSim, the following procedure is recommended: 
 
Open the Command Prompt (CMD), change to the project folder (e.g. 
C:\Users\...\Desktop\XGMII_BFM_and_TESTER), then run setup_env. When setup_env 
has completed, run run.bat. ModelSim then starts automatically. 
 
To implement the Generator and Monitor, the following procedure is recommended: 
 
Start QUARTUS II and open the project file XGMII_BFM_and_Tester => impl 
=> result => top_quartus.qpf. 
 
 
