Processor Microarchitecture for Implementation of Ephemeral State Processing within Network Routers by Muthukumarasamy, Muthulakshmi
University of Kentucky 
UKnowledge 
Theses and Dissertations--Electrical and 
Computer Engineering Electrical and Computer Engineering 
2003 
Processor Microarchitecture for Implementation of Ephemeral 
State Processing within Network Routers 
Muthulakshmi Muthukumarasamy 
University of Kentucky, mmuthulakshmi@gmail.com 
Recommended Citation 
Muthukumarasamy, Muthulakshmi, "Processor Microarchitecture for Implementation of Ephemeral State 
Processing within Network Routers" (2003). Theses and Dissertations--Electrical and Computer 
Engineering. 142. 
https://uknowledge.uky.edu/ece_etds/142 
This Master's Thesis is brought to you for free and open access by the Electrical and Computer Engineering at 
UKnowledge. It has been accepted for inclusion in Theses and Dissertations--Electrical and Computer Engineering by 
an authorized administrator of UKnowledge. For more information, please contact UKnowledge@lsv.uky.edu. 
STUDENT AGREEMENT: 
I represent that my thesis or dissertation and abstract are my original work. Proper attribution 
has been given to all outside sources. I understand that I am solely responsible for obtaining 
any needed copyright permissions. I have obtained needed written permission statement(s) 
from the owner(s) of each third-party copyrighted matter to be included in my work, allowing 
electronic distribution (if such use is not permitted by the fair use doctrine) which will be 
submitted to UKnowledge as Additional File. 
I hereby grant to The University of Kentucky and its agents the irrevocable, non-exclusive, and 
royalty-free license to archive and make accessible my work in whole or in part in all forms of 
media, now or hereafter known. I agree that the document mentioned above may be made 
available immediately for worldwide access unless an embargo applies. 
I retain all other ownership rights to the copyright of my work. I also retain the right to use in 
future works (such as articles or books) all or part of my work. I understand that I am free to 
register the copyright to my work. 
REVIEW, APPROVAL AND ACCEPTANCE 
The document mentioned above has been reviewed and accepted by the student’s advisor, on 
behalf of the advisory committee, and by the Director of Graduate Studies (DGS), on behalf of 
the program; we verify that this is the final, approved version of the student’s thesis including all 
changes required by the advisory committee. The undersigned agree to abide by the statements 
above. 
Muthulakshmi Muthukumarasamy, Student 
Dr. J. Robert Heath, Major Professor 
Information not available, Director of Graduate Studies 
ABSTRACT OF THESIS 
PROCESSOR MICROARCHITECTURE FOR IMPLEMENTATION OF 
EPHEMERAL STATE PROCESSING WITHIN NETWORK ROUTERS 
The evolving concept of Ephemeral State Processing (ESP) is overviewed. ESP 
allows development of new scalable end-to-end network user services. A 
macro-level language is being developed to support ESP at the network node level. Three 
approaches for implementing ESP services at network routers can be considered. One 
approach is to use the existing processing capability within commercially available 
network routers. Another approach is to add a small-scale, existing ASIC-based 
general-purpose processor to an existing network router. This thesis research concentrates on a 
third approach: developing a special-purpose programmable Ephemeral State 
Processor (ESPR) Instruction Set Architecture (ISA) and an implementing 
microarchitecture for deployment within each ESP-capable node to implement the ESP 
service within that node. A unique architectural characteristic of the ESPR is its scalable 
and temporal Ephemeral State Store (ESS) associative memory, required by the ESP 
service for storage/retrieval of bounded (short) lifetime ephemeral (tag, value) pairs of 
application data. The ESPR will be implemented in Programmable Logic Device (PLD) 
technology within a network node. This offers the advantages of reconfigurability and 
in-field upgrade capability, and supports the evolving growth of ESP services. Correct functional 
and performance operation of the presented ESPR microarchitecture is validated via 
Hardware Description Language (HDL) post-implementation (virtual prototype) 
simulation testing. Suggestions for future research related to improving the performance 
of the ESPR microarchitecture and experimental deployment of ESP are discussed. 
KEYWORDS: Ephemeral State Processing, Ephemeral State Store, Ephemeral State 
Processor, PLD Technology, HDL Virtual Prototyping. 
PROCESSOR MICROARCHITECTURE FOR IMPLEMENTATION OF 
EPHEMERAL STATE PROCESSING WITHIN NETWORK ROUTERS 
THESIS 
A thesis submitted in partial fulfillment of the 
requirements for the degree of Master of Science in Electrical 
Engineering in the College of Engineering 









ACKNOWLEDGEMENTS 
My sincere thanks and gratitude are due to my academic advisor and thesis 
director, Dr. J. Robert Heath, for his guidance and support throughout the thesis. I am 
very thankful for his constant encouragement, suggestions and evaluations, and for the 
help he provided in editing various versions of this thesis. I would also like to express my 
sincerest thanks to Dr. Ken Calvert and Dr. James Griffioen for providing me an 
opportunity to work on this thesis research and for the support and inspiration they 
provided. I would like to extend my thanks to Dr. Hank Dietz and Dr. William Dieter for 
serving on my thesis committee and providing me with invaluable comments and 
suggestions for improving the thesis and for possible future research. 
I extend my deepest gratitude and thanks to my parents for their support and 
belief in me. My heartfelt thanks are due to my friend Balaji, who provided on-going 
support throughout this thesis and encouraged me in difficult times to complete this 
thesis. 
TABLE OF CONTENTS 
Acknowledgements 
List of Tables 
List of Figures 
Chapter One: Introduction 
1.1. Background and Positioning of Research 
1.2. Goals and Objectives 
Chapter Two: Ephemeral State Processing (ESP) 
2.1. Introduction 
2.2. Ephemeral State Store (ESS) 
2.3. ESP Packet Format and Processing 
2.4. Macro Instructions of ESP 
2.5. Example End-to-End Applications using ESP 
2.6. Prologue to Ephemeral State Processor (ESPR) 
Chapter Three: Ephemeral State Processor (ESPR) 
3.1. ESPR Requirements Summary 
3.2. Highest Level Functional Organization of ESPR 
3.3. Micro Instruction Set Format, Types, Architecture and Definition 
3.3.1. Micro Instruction Format 
3.3.2. Micro Instruction Types (Classes) 
3.3.2.1. ALU/SHIFT Type Instructions 
3.3.2.2. Immediate Type Instructions 
3.3.2.3. Branch/Jump Type Instructions 
3.3.2.4. LFPR/STPR Type Instructions 
3.3.2.5. GET/PUT Type Instructions 
3.3.2.6. Packet Related Instructions 
3.3.3. Further ESPR Architecture Definition 
3.4. Micro Instruction Implementation of ESP Macro Instructions 
Chapter Four: Associative Ephemeral State Store (ESS) 
4.1. ESS Design 
4.2. Content Addressable Memory (CAM) 
4.2.1. Write Operation 
4.2.2. Read Operation 
4.3. Random Access Memory (RAM) for storage of Value, Expiration Time and Empty bit 
4.4. Expiration Time Calculation Block 
4.5. Empty Location Calculating Block 
4.6. ESS Controller 
4.7. Operations performed in ESS and Flow chart 
4.7.1. GET Operation 
4.7.2. PUT Operation 
4.7.3. Branch on GET Failed (BGF) / Branch on PUT Failed (BPF) Operation 
4.8. ESS Scalability, Size and Performance 
Chapter Five: Ephemeral State Processor Version 1 (ESPR.V1) Architecture 
5.1. Basic Register/Register Architecture Development with ESPR Components 
5.2. Four Stage Pipelined Architecture (ESPR.V1) 
5.3. Micro Controller 
5.4. ESPR.V1 Requirements Evaluation 
5.5. Special-Purpose Versus General-Purpose approach to ESP 
5.6. Analytical Performance Model for ESPR 
Chapter Six: Post-Synthesis and Post-Implementation Simulation Validation of ESPR.V1 Architecture 
6.1. Introduction 
6.2. Post-Synthesis Simulation Validation of ESPR.V1 Architecture 
6.2.1. Simulation Validation of Single Micro Instructions 
6.2.1.1. Validation of 'ADD' Micro Instruction 
6.2.1.2. Validation of 'GET' Micro Instruction 
6.2.2. Micro Instruction Program Sequence Validation of ESPR.V1 Architecture 
6.2.2.1. Micro Instruction Program Sequence for ALU/SHIFTER Validation 
6.2.2.2. Micro Instruction Program Sequence for ESS Validation 
6.2.2.3. Micro Instruction Program Sequence for Branch/Forward Unit Validation 
6.3. Post-Implementation Simulation Validation of ESPR.V1 Architecture 
6.3.1. Post-Implementation Simulation Validation of 'COUNT' Macro Instruction 
6.3.2. Post-Implementation Simulation Validation of 'COMPARE' Macro Instruction 
6.4. Results and Conclusions 
Chapter Seven: Ephemeral State Processor Version 2 (ESPR.V2) Architecture 
7.1. Pipelined ESS 
7.1.1. Tag Match (TM) Stage 
7.1.2. Empty Location and Lifetime Check (ELTC) Stage 
7.1.3. ESS Update (EUD) Stage 
7.2. Five-Stage Pipelined ESPR.V2 Architecture 
Chapter Eight: VHDL Design Capture of ESPR Architectures 
8.1. Design Partitioning and Design Capture 
8.2. Initializing the Memory Contents 
8.2.1. Initializing a RAM Primitive via a Constraints File 
8.2.2. Initializing a Block RAM in VHDL 
8.3. Timing Constraints 
Chapter Nine: Post-Implementation Simulation Validation of ESPR.V2 Architecture 
9.1. Validation of Correct Execution of Single Micro Instructions 
9.2. Small Micro Program and Individual Functional Unit Testing of ESPR.V2 
9.2.1. Validation of ALU Unit and JMP Instruction of ESPR.V2 
9.2.2. Validation of Packet Processing Unit of ESPR.V2 
9.2.3. Validation of ESS of ESPR.V2 
9.3. Validation of Macro Instructions of ESP on ESPR.V2 
Chapter Ten: Conclusions and Future Research 
Appendices 
Appendix A: Micro Instruction Set Architecture and Definition 
Appendix B: Macro Level System Flow Chart 
Appendix C: System Flow Chart for ESPR.V1 Architecture 
Appendix D: System Flow Chart for ESPR.V2 Architecture 
Appendix E: VHDL Code for ESPR.V2 Architecture 
References 
Vita 
LIST OF TABLES 
Table 5.1. Function Table for Shifter 
Table 5.2. Function Table for ALU 
Table 5.3. Function Table for Branch Detection Unit 
Table 5.4. Control Signals for Micro Instructions 
Table 6.1. Logic Resources Utilization for ESPR.V1 Architecture 
Table 8.1. Comparison of Designs for Instruction Memory 
Table 8.2. Block RAM Initialization Properties 
Table 9.1. Logic Resources Utilization for ESPR.V2 Architecture 
Table 10.1. Throughput of ESP Macro Instructions in ESPR.V2 Architecture 
LIST OF FIGURES 
Figure 1.1. ESP Processing in Router 
Figure 2.1. ESP Packet Format 
Figure 2.2. FLAG Field of ESP Packet 
Figure 2.3. COUNT Operation 
Figure 2.4. COMPARE Operation 
Figure 2.5. COLLECT Operation 
Figure 2.6. RCHLD Operation 
Figure 2.7. RCOLLECT Operation 
Figure 2.8. Finding Path Intersection 
Figure 2.9. Reducing Implosion using Two-Phase Tree Structured Computations 
Figure 3.1. Functional Units of the ESPR System 
Figure 3.2. Packet Processing in Packet RAM of ESPR 
Figure 3.3. Micro Instruction Format 
Figure 3.4. Field Definitions 
Figure 3.5. ALU/SHIFT Type Instruction Format and Definition 
Figure 3.6. Immediate Type Instruction Format and Definition 
Figure 3.7. Branch/Jump Type Instruction Format and Definition 
Figure 3.8. LFPR/STPR Type Instruction Format and Definition 
Figure 3.9. GET/PUT Type Instruction Format and Definition 
Figure 3.10. Packet Related Instruction Format and Definition 
Figure 3.11. Equivalent Micro Instruction Sequence for COUNT 
Figure 3.12. Equivalent Micro Instruction Sequence for COMPARE 
Figure 3.13. Equivalent Micro Instruction Sequence for COLLECT 
Figure 3.14. Equivalent Micro Instruction Sequence for RCHLD 
Figure 3.15. Equivalent Micro Instruction Sequence for RCOLLECT 
Figure 4.1. Functional Block Diagram of ESS 
Figure 4.2. 16x8 CAM Macro 
Figure 4.3. 16x64 CAM using 8 16x8 CAMs 
Figure 4.4. Expiration Time Calculating Block 
Figure 4.5. ESS Operations - Flowchart 
Figure 5.1. Basic ESPR Architecture 
Figure 5.2. Four-Stage Pipelined ESPR.V1 Architecture 
Figure 6.1. Validation of 'ADD' Micro Instruction 
Figure 6.2. Validation of 'GET' Micro Instruction 
Figure 6.3. Program for Validating ALU/Shifter 
Figure 6.4. Simulation Output for ALU/Shifter Validation 
Figure 6.5. Program for Validating ESS 
Figure 6.6. Simulation Output for ESS Validation 
Figure 6.7. Program for Validating Conditional Branch Control 
Figure 6.8. Simulation Output for Conditional Branch Validation 
Figure 6.9. Simulation Output for COUNT 
Figure 6.10. Simulation Output for COMPARE 
Figure 7.1. High-Level Block Diagram of ESS 
Figure 7.2. High-Level View of Three-Stage Pipelined ESS 
Figure 7.3. CAM in the Tag Match (First) Stage of Pipelined ESS 
Figure 7.4. Components of ELTC (Second) Stage of Pipelined ESS 
Figure 7.5. Component of EUD (Third) Stage of Pipelined ESS 
Figure 7.6. Five-Stage Pipelined ESPR.V2 Architecture 
Figure 7.7. High-Level View of Packet Processing Unit 
Figure 8.1. High-Level Hierarchy of ESPR.V1 
Figure 8.2. High-Level Hierarchy of ESPR.V2 
Figure 8.3. High-Level Hierarchy of IF Stage for both ESPR.V1 and ESPR.V2 
Figure 8.4. High-Level Hierarchy of ID Stage for both ESPR.V1 and ESPR.V2 
Figure 8.5. High-Level Hierarchy of EX Stage of ESPR.V1 
Figure 8.6. High-Level Hierarchy of WB Stage of ESPR.V1 
Figure 8.7. High-Level Hierarchy of ETM Stage of ESPR.V2 
Figure 8.8. High-Level Hierarchy of LTC Stage of ESPR.V2 
Figure 8.9. High-Level Hierarchy of UD Stage of ESPR.V2 
Figure 8.10. NCF File for Initializing Instruction Memory 
Figure 8.11. Example Micro Instruction Sequence 
Figure 8.12. VHDL Code for Instruction Memory Using Block RAM 
Figure 9.1. Simulation Output for SHR Micro Instruction 
Figure 9.2. Simulation Output for LFPR Micro Instruction 
Figure 9.3. Program for Validating ALU Unit and JMP Instruction 
Figure 9.4. Simulation Output for ALU Unit and JMP Instruction Validation 
Figure 9.5. Simulation Output for Packet Processing Unit (IN) Validation 
Figure 9.6. Simulation Output for Packet Processing Unit (OUT) Validation 
Figure 9.7. Program Sequence for Validating ESS 
Figure 9.8. Simulation Output for ESS Validation 
Figure 9.9. Simulation Output for Validation of COUNT Macro Instruction 
Figure 9.10. Simulation Output for Validation of COMPARE Macro Instruction 
Figure 9.11. Simulation Output for Validation of RCHLD Macro Instruction 
Figure 9.12. Simulation Output for Validation of RCOLLECT Macro Instruction 




Chapter One: Introduction 
This chapter presents the background needed to understand the research work, 
followed by the goals and objectives of the thesis. 
1.1. Background and Positioning of Research 
In order for the Internet to support new end-to-end communication services 
required by emerging network applications, additional network-level mechanisms are 
needed. Three approaches can provide the needed network-level mechanisms, each 
in its own way. The first, more traditional approach is to target a specific end-to-end 
problem and develop a focused, stand-alone network-based solution [1,2]. The second 
approach is to deploy a flexible infrastructure (e.g., active networks [3,4,5]) that can be 
reprogrammed to provide any needed functionality. The third approach is to extend the 
network functionality through simple building blocks, which can be composed and 
combined by end-systems in different ways to create new services. The viability of the 
third approach depends on two factors: it must be sufficiently general and 
useful to support a wide range of end-to-end network applications, and it must 
justify the cost (financial, operational and performance) of deployment in the network 
infrastructure. 
Ephemeral State Processing (ESP) [6,7,8] is one such network-layer building-block 
approach, which offers a possible solution for the development of new Internet 
end-to-end services and capabilities. The basic idea of ESP is that ESP packets 
create, retrieve and compute on temporary state held at router nodes in the 
network. Each ESP packet carries a macro instruction, a 'program' 
(described in the following chapters), and collections of such programs implement specific 
end-to-end network applications/services. Other publications [6,7] describe end-to-end 
services based on ESP, including (1) services for large-scale group applications, in which 
a relatively modest amount of in-network processing can pay big dividends in terms of 
scalability [10]; and (2) topology-exploring services, in which network elements having 
specific characteristics are found and flagged as locations for special processing [11]. 
ESP can be considered a form of active networking that offers (1) a 
lightweight packet processing service and (2) computations involving multiple packets 
and multiple nodes. The service is primarily intended to be implemented in fully 
programmable routers. 
Ephemeral State Processing (ESP) [6,7,8] is an evolving research area of active 
networking. The service offers very limited programmability that can be easily 
implemented in hardware. Multiple implementations of the ESP service are currently 
being investigated. One alternative is the use and adaptation of commercially available 
routers [26], such as the network processor described in [18]. Implementations based on 
commodity components (e.g., Linux-based routers) have also been explored; these 
implement the service as a module and user-level daemon on the lower-level traffic 
routers often found near the periphery of the network. The goal is to implement ESP in 
core routers, which should be able to offer the service at line rates by implementing it on 
the interface card. To that end, the current research approach targets PLD platforms that 
can be field upgraded to meet changing ESP functional and performance requirements. 
Under this approach, another processor, such as the one described in [18], would 
implement the routing function within the node. 
This thesis research work aims at implementing ESP on a Special Purpose (SP) 
programmable processor within each network node - an Ephemeral State Processor 
(ESPR). ESP is implemented by a set of macro-instructions, which can be invoked on an 
Ephemeral State Processor (ESPR) at ESP-capable node routers as they receive, process 
and possibly forward specially-marked ESP packets in IP datagrams. Separation of ESP 
packets from other packets (such as the Internet Protocol (IP) packets) is carried out by 
logic inside the router and the ESPR only sees ESP packets. Figure 1.1 provides a high-
level view of how an ESPR can be deployed in a router to perform the ESP service. 
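The separation described above can be pictured with a minimal Python sketch. It is an illustration only, not the thesis design: the `Packet` fields, the `Espr` class, the `route` function, the `ECHO` and `SINK` macro-instruction names, and the policy of dropping unknown macro-instructions are all hypothetical stand-ins. The sketch shows router logic passing only ESP packets to the processor, which then invokes a single macro-instruction per packet and either forwards or drops it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Packet:
    is_esp: bool                  # hypothetical marker set by the router's classifier
    opcode: Optional[str] = None  # hypothetical macro-instruction name
    operand: int = 0              # hypothetical operand carried in the packet


# A macro-instruction handler returns the packet to forward, or None to drop it.
MacroHandler = Callable[[Packet], Optional[Packet]]


class Espr:
    """Stand-in for the Ephemeral State Processor; it only ever receives ESP packets."""

    def __init__(self, macros: Dict[str, MacroHandler]) -> None:
        self.macros = macros
        self.seen: List[Packet] = []  # record of packets handed to the processor

    def process(self, pkt: Packet) -> Optional[Packet]:
        self.seen.append(pkt)
        handler = self.macros.get(pkt.opcode or "")
        if handler is None:
            return None  # assumed policy: unknown macro-instructions are dropped
        return handler(pkt)  # a single macro-instruction is invoked per packet


def route(packets: List[Packet], espr: Espr) -> List[Packet]:
    """Router-side logic: ordinary IP packets bypass the ESPR entirely."""
    forwarded: List[Packet] = []
    for pkt in packets:
        if not pkt.is_esp:
            forwarded.append(pkt)
            continue
        result = espr.process(pkt)
        if result is not None:
            forwarded.append(result)
    return forwarded


def echo(pkt: Packet) -> Optional[Packet]:
    return pkt  # hypothetical macro-instruction: forward unchanged


def sink(pkt: Packet) -> Optional[Packet]:
    return None  # hypothetical macro-instruction: consume the packet


espr = Espr({"ECHO": echo, "SINK": sink})
traffic = [Packet(is_esp=False), Packet(True, "ECHO", 7), Packet(True, "SINK")]
out = route(traffic, espr)
```

Keeping the classifier outside the processor mirrors the division of labor in the text: the processor model never has to inspect non-ESP traffic.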
At most one macro-instruction is invoked by each ESP datagram (packet) as it 
enters an ESP capable node. An ESP packet's macro-instruction is executed by a 
successor ESP-capable node every time the packet is forwarded by an ESP-capable 
router. An ESP macro-instruction implemented by the ESPR of an ESP-capable node 
operates on values carried in the packet or stored at a node router in an associative 
memory, the Ephemeral State Store (ESS). 

Figure 1.1. ESP Processing in Router (input traffic: IP and ESP packets) 












The ESS allows data values to be associated with keys or tags for subsequent retrieval 
and/or update. The unique characteristic of the ESS is that it supports only ephemeral 
storage of (tag, value) pairs. Each (tag, value) binding is accessible for only a fixed 
interval of time after it is created. The lifetime of a (tag, value) binding in the ESS is 
defined by the parameter τ, which is globally specified and required to be 
approximately the same at each node. The value in the binding may be updated by any 
number of instructions (packets) during the lifetime τ. The ESS must provide fast 
associative creation, access, and reclamation of bindings in order to process packets at 
"wire speeds". For given rates of instruction processing (instructions/sec) and binding 
creation (new bindings/instruction) and a given lifetime (seconds), the size of the ESS 
necessary to sustain those rates is fixed. 
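The ESS behavior described above can be sketched in a few lines of Python. This is a software illustration under stated assumptions, not the hardware organization developed in this thesis: the class and method names, the dictionary-based storage, the explicit `now` argument, and the choice to make `put` fail when the store is full are all hypothetical. The sizing helper assumes steady state, in which the number of live bindings equals the binding creation rate multiplied by the lifetime.

```python
import math
from typing import Dict, Optional, Tuple


class EphemeralStateStore:
    """Associative (tag, value) store whose bindings expire a fixed time after creation."""

    def __init__(self, lifetime: float, capacity: int) -> None:
        self.lifetime = lifetime  # binding lifetime in seconds (same for every binding)
        self.capacity = capacity  # maximum number of live bindings
        self._slots: Dict[int, Tuple[int, float]] = {}  # tag -> (value, expiry time)

    def _reclaim(self, now: float) -> None:
        # Free every binding whose lifetime has elapsed.
        expired = [tag for tag, (_, expiry) in self._slots.items() if expiry <= now]
        for tag in expired:
            del self._slots[tag]

    def get(self, tag: int, now: float) -> Optional[int]:
        entry = self._slots.get(tag)
        if entry is None or entry[1] <= now:
            return None  # no binding, or the binding has expired
        return entry[0]

    def put(self, tag: int, value: int, now: float) -> bool:
        self._reclaim(now)
        entry = self._slots.get(tag)
        if entry is not None:
            # Updating a live binding changes its value but not its expiry,
            # because the lifetime is counted from creation.
            self._slots[tag] = (value, entry[1])
            return True
        if len(self._slots) >= self.capacity:
            return False  # assumed policy: creation fails when the store is full
        self._slots[tag] = (value, now + self.lifetime)
        return True


def required_ess_entries(instr_per_sec: float,
                         new_bindings_per_instr: float,
                         lifetime_sec: float) -> int:
    """Steady-state number of live bindings the store must be able to hold."""
    return math.ceil(instr_per_sec * new_bindings_per_instr * lifetime_sec)
```

For example, with the purely illustrative figures of one million instructions per second, one new binding for every two instructions, and a 10-second lifetime, `required_ess_entries(1_000_000, 0.5, 10.0)` evaluates to 5,000,000 entries.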
To our knowledge, no research group has developed an SP programmable and 
reconfigurable network node processor architecture to implement ESP, such as the one 
described in this thesis. The ESPR microarchitecture can be implemented as an 
Application Specific Integrated Circuit (ASIC) chip or on a Programmable Logic Device 
(PLD) platform and fast/dense/cheap commodity memory chip technology. PLD 
technology is of interest because of its rapidly increasing density and performance at 
decreasing cost. Moreover, the use of PLD technology allows the ESP hardware to 
evolve over time as the concept of ESP evolves. Special-purpose fixed-architecture 
communications node processors have been developed and implemented in the past, 
particularly in the context of ASICs, but they lack the programmability and flexibility 
of a PLD platform. These ASIC-based technologies offer no reasonable opportunity 
for in-field upgrades of the architecture or its instruction set architecture in response to 
changing network processing requirements. Another approach that has been gaining 
momentum is the use of general-purpose network processors [18]. Although such 
platforms have been used in earlier implementations of ESP [8], their general-purpose 
nature imposes limitations on their performance. 
Implementation of the ESP service via an ESPR on a PLD platform allows ESPRs in
a multi-node network environment to be dynamically and remotely re-programmed to 
incorporate architectural improvements or changes to the macro and micro instruction set 
as ESP evolves. Utilization of PLD technology for implementation of the ESPR within 
ESP capable nodes of a network would promote in-field upgrade capability of an ESPR 
instantiation whenever line speeds may increase or when the density of ESP packets in 
the total IP datagram traffic increases to a level requiring a higher performance ESPR. A 
higher performance ESPR architecture may be obtained by deeper pipelining, by 
instantiating multiple copies of the ESPR to a node PLD platform in a multiprocessor 
configuration, or by other architectural performance enhancements. 
1.2. Goals and Objectives 
The main goal of this thesis research work is to develop a processor
microarchitecture, the Ephemeral State Processor (ESPR), to implement ESP. Two versions
of the ESPR architecture, ESPR Version 1 (ESPR.V1) and ESPR Version 2 (ESPR.V2),
are developed for performance improvement reasons. Development of both versions is
accomplished by means of the following objectives:
(1) understand the concepts of ESP
(2) understand the ESP macro instructions and develop an implementing micro
instruction set
(3) develop functional/operational/performance requirements for ESPR
architecture versions
(4) develop a unique organization/architecture for the associative ESS
(5) develop a special-purpose programmable high-performance pipelined
architecture for ESPR (ESPR.V1 and ESPR.V2)
(6) perform the design capture of ESPR.V1 and ESPR.V2 onto a Xilinx Virtex
FPGA [17] using behavioral VHDL
(7) test ESPR.V1 and ESPR.V2 to validate correct execution of the micro
and macro instructions of ESP
This thesis research work was conducted following the above sequence of objectives to 
develop and validate ESPR. Brief contents of the chapters of this thesis are outlined as 
follows. 
Chapter 2 - The concept of ESP and a description of example end-to-end network
service applications are presented in detail.
Chapter 3 - The highest-level functional organization/architecture of an ESPR is described.
The instruction set format, types and additional architectural details are described in this
chapter.
Chapter 4 - A detailed description of an associative ESS design using Content Addressable
Memory (CAM) is presented here.
Chapter 5 - The design of the first version of ESPR (ESPR.V1), a comparison of ESPR with
general-purpose processors and an analytical performance model for ESPR are described
in this chapter.
Chapter 6 - This chapter deals with the detailed post-synthesis and post-implementation
simulation validation testing of the ESPR.V1 architecture.
Chapter 7 - This chapter outlines the need for the second version of ESPR (ESPR.V2)
and its design description. The description of a pipelined version of the previously
designed ESS, for ESPR.V2, is also presented here.
Chapter 8 - This chapter discusses the details of Hardware Description Language (HDL)
design capture of the ESPR.V1 and ESPR.V2 processor systems.
Chapter 9 - Post-implementation simulation validation of ESPR.V2 is presented here.
Chapter 10 - This chapter concludes the thesis work and gives an insight into possible
future research and investigation that can be done in this area.
Chapter Two 
Ephemeral State Processing (ESP) 
This chapter discusses the basic concepts of the ESP mechanism, a brief 
explanation of the Ephemeral State Store (ESS), ESP packets - format and processing, 
network macro instructions, and example practical applications of ESP followed by a 
brief introduction to the design of an Ephemeral State Processor (ESPR). 
2.1 Introduction 
Ephemeral State Processing (ESP) has been proposed as a network layer protocol 
to be implemented in routers to support a range of new scalable end-to-end network 
services and to improve scalability and performance of existing network services. It gives 
control to the end systems to support scalable network applications such as collecting 
network feedback, locating services, identifying 'branch points' [6], topology discovery
and other auxiliary functions. The main idea of ESP is to carry service-specific
instructions (macro instructions) in specially marked packets. ESP-capable
router nodes process these packets, leave temporary state in the node according to
the carried macro instructions, and either forward the packets to the next node or drop
them, with the state already set for identification. This leads to the key
requirements [8] for ESP development:
• provide a means for packets to leave information at a router for other packets to
modify or pick up later as they pass along the path
• have a space-time product of storage available for storing state
• bound the space-time product of storage consumed as a result of any packet
• keep per-packet processing at each node comparable to that of IP
The ESP protocol and network macro instructions (shown later in this chapter) are
designed to meet the first requirement; it is also up to the
application services to meet this requirement by using ESP wisely. The design of an
associative Ephemeral State Store (ESS) with a constant lifetime allows meeting the next
two requirements. Each ESP packet carries a single macro instruction, so the per-packet
processing time is known and bounded. The current goal is to process packets
at or near wire speeds of 100 Mbps, which allows nearly a million packets to be
processed per second. With these requirements, the ESP mechanism is based on three
building blocks:
• an Ephemeral State Store (ESS), which allows packets to deposit small amounts 
of arbitrary state at routers for a short time 
• the ESP protocol and packet format, which defines the way by which the packets 
are processed and forwarded through the network. 
• a set of network macro instructions, which defines the computations on ESP 
packets at the nodes 
Ephemeral State Processing is initiated in any ESP-capable router when the router 
receives an ESP packet. Each router carries out only local operations and the 
responsibility for controlling and coordinating the system lies in the end-systems. The 
ESP header carries a network macro instruction out of a set of predefined macro
instructions. An instruction may create or update the contents of the ESS and/or fields in
the ESP header and may place some information in the packet. A sequence of network
macro instructions carried in ESP packets forms a practical ESP-based application.
2.2 Ephemeral State Store (ESS) 
Scalability of ESP is provided by the availability of an associative ESS at each 
network node. The associative ESS will allow data values to be associated with keys or 
tags for subsequent retrieval and/or update. The ESS will be unique in that it supports 
only ephemeral storage of (tag, value) pairs. Each (tag, value) binding is accessible for 
only a fixed interval of time after it is created and each tag has at most one value bound 
to it. Both tags and values are fixed-size bit strings; the current design uses 64-bit tags
and 64-bit values to reduce the probability of collision [8].
The lifetime of a (tag, value) binding in the ESS is defined by the parameter 'τ',
which is assumed to be approximately the same for each node. Once created, a binding
remains in the store for 'τ' seconds and then vanishes; the value in the binding may be
updated (overwritten and read) any number of times during the lifetime. For scalability,
the value of 'τ' should be as short as possible. For robustness, the value of 'τ' needs to be
long enough for interesting end-to-end services to be completed. The ESS supports two
operations:
• put (x, e): Bind the value e to tag x. After this operation, the pair (x, e) is in the set
of bindings of the store for 'τ' seconds.
• get (x): Retrieve the value bound to tag x, if any. If no pair (x, e) is in the store
when this operation is invoked, or if the associated pair's lifetime has expired, the
special value '⊥', meaning failure of the operation, is returned. ('⊥' indicates that the
lifetime of the value has expired or that the value is not in the store.)
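The two ESS operations can be illustrated with a minimal software sketch. This is not the CAM-based hardware design developed later in the thesis. The class name, the injectable clock, and the lazy reclamation of expired entries are assumptions made for illustration, as is the choice that updating a live binding does not extend its lifetime (the lifetime is counted from creation).

```python
import time

BOTTOM = None  # stands in for the special failure value returned by get


class EphemeralStateStore:
    """Toy associative store whose bindings expire tau seconds after creation."""

    def __init__(self, tau, clock=time.monotonic):
        self.tau = tau
        self.clock = clock
        self._bindings = {}  # tag -> (value, expiry_time)

    def put(self, tag, value):
        now = self.clock()
        entry = self._bindings.get(tag)
        if entry is not None and now < entry[1]:
            # Live binding: overwrite the value, keep the original expiry
            # (assumption: updates do not extend the lifetime).
            self._bindings[tag] = (value, entry[1])
        else:
            # No live binding: create one that lasts tau seconds from now.
            self._bindings[tag] = (value, now + self.tau)

    def get(self, tag):
        entry = self._bindings.get(tag)
        if entry is None:
            return BOTTOM
        value, expiry = entry
        if self.clock() >= expiry:
            del self._bindings[tag]  # reclaim the expired binding lazily
            return BOTTOM
        return value
```

Passing a fake clock (for example `clock=lambda: now[0]`) makes expiry deterministic when experimenting with different values of the lifetime.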
2.3 ESP Packet Format and Processing 
ESP packets are processed in ESP supporting routers as they travel through the 
network. Whenever an ESP packet arrives at a node, it is recognized as such and passed 
to the ESPR module for processing. These packets either propagate through to the 
original destination or are discarded along the path. Many end-to-end applications can be 
constructed using two steps - the first set of packets from end-systems establish and 
compute on the state while a second set of packets are used to collect the computed 
information. Two forms of ESP packets are supported: dedicated and piggybacked. A
dedicated packet carries the ESP packet in an IP payload, whereas a piggybacked ESP packet
carries the ESP opcode and operands in an IP option (IPv4) or extension header (IPv6), as well
as the regular application data (e.g., TCP/HTTP data) [8]. The ESP packet format is
shown in Figure 2.1.
FL    OP    LEN    CID    < VAR. FIELD >             CRC
(8)   (8)   (16)   (64)   (From 128 to 3968 bits)    (32)
FL - Flags (8 bits)
OP - Opcode (8 bits)
LEN - Length of the packet (16 bits)
CID - Computation ID (64 bits)
VAR. FIELD - Variable operands field that contains Tag and/or
Value and/or a micro opcode (From 128 to 3968 bits,
depending on the macro opcode)
CRC - Cyclic Redundancy Check (32 bits)
Figure 2.1. ESP Packet Format
The 8-bit FL (flag) field is organized as follows:
LOC   E     R     U
(3)   (1)   (1)   (3)
LOC - Location (3 bits)
E - Error (1 bit)
R - Reflector (1 bit)
U - Unused (3 bits)
Figure 2.2. FLAG field of ESP Packet
The LOC field identifies where the ESP processing should occur in the router [8], 
either the input side, output side or in the centralized ESP location, or any combination of 
these three locations. The E bit is set when an error occurs while processing an ESP 
packet (e.g., when a tag is not found in the ESS, when ESS is full, etc.). Such packets are 
forwarded to the destination without further processing allowing the end-systems to 
discover that the operation failed. R is the reflector bit; ESP routers forward packets with
the reflector bit set without processing them [8].
CID (Computation ID) is a demultiplexing key: different packets that need to
access the same state must have the same CID. The OP field identifies the ESP macro
instruction to be performed, the LEN field indicates the length of the ESP packet, the VAR.
FIELD carries the opcode-specific operands, and the CRC field carries the Cyclic
Redundancy Check code for the entire ESP packet.
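As an illustration of the field layout in Figures 2.1 and 2.2, the sketch below packs and parses an ESP packet in software. Several details are assumptions because the text does not specify them: network byte order, LOC occupying the three most significant bits of FL, LEN counting bytes, and the use of `zlib.crc32` over all preceding bytes as the CRC. The function names `build_esp` and `parse_esp` are hypothetical.

```python
import struct
import zlib

# FL (8 bits), OP (8 bits), LEN (16 bits), CID (64 bits); network byte order assumed.
HEADER = struct.Struct("!BBHQ")
CRC_BYTES = 4


def build_esp(fl, op, cid, var_field):
    """Assemble a packet; LEN is assumed to count bytes of the whole packet."""
    length = HEADER.size + len(var_field) + CRC_BYTES
    body = HEADER.pack(fl, op, length, cid) + var_field
    # CRC polynomial is an assumption: zlib's CRC-32 over everything before it.
    return body + struct.pack("!I", zlib.crc32(body))


def parse_esp(packet):
    """Split a packet into its fields and check the trailing CRC."""
    fl, op, length, cid = HEADER.unpack_from(packet)
    (crc,) = struct.unpack("!I", packet[-CRC_BYTES:])
    return {
        "loc": fl >> 5,              # assumed: LOC is the top three bits
        "error": (fl >> 4) & 1,      # E bit
        "reflector": (fl >> 3) & 1,  # R bit
        "op": op,
        "len": length,
        "cid": cid,
        "var": packet[HEADER.size:-CRC_BYTES],
        "crc_ok": crc == zlib.crc32(packet[:-CRC_BYTES]),
    }


# A packet with the minimum 128-bit (16-byte) variable field.
pkt = build_esp(fl=0b10101000, op=1, cid=0xABCD, var_field=bytes(16))
fields = parse_esp(pkt)
```

A router following the rules above would skip processing when `fields["reflector"]` is set and would forward without further processing when `fields["error"]` is set.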
2.4 Macro Instructions of ESP 
Network macro instructions are the second building block of the ESP service. 
Each node in the network supports a predefined set of ESP instructions that can be 
invoked by ESP packets to operate on the ESS. Each ESP macro instruction takes zero or 
more operands, where each operand is one of the following types: 
• a value stored in the local ESS (i.e. identified by a tag carried in the ESP packet) 
• an 'immediate value' (i.e. one carried directly in the packet) 
• a well known router value (i.e. the node's address) 
• an associative or commutative operator (e.g., <, >=, etc.)
Each ESP packet initiates exactly one network macro instruction, and all macro
instructions are carried out locally in the node. An instruction may update the state and/or
the immediate values in the packet; after execution completes, the packet that initiated it
is either dropped or forwarded towards its original destination. A network macro (high-level
language) instruction is implemented by a program comprised of micro (assembly 
language level) instructions. Macro instructions are combined and executed to implement 
emerging end-to-end application services. The defined macro instructions [8] are 
explained as follows: 
COUNT: 
The COUNT instruction takes two operands (carried in the ESP packet): a tag
identifying a 'Count (pkt.count)' value in the ESS and an immediate value 'Threshold'.
This instruction increments or initializes a counter and forwards or drops the packet,
according to whether the resulting value is below or above a threshold value. It is used
for counting packets passing through the router. The Ephemeral State Store (ESS)
contains a number of (tag, value) pairs. The ephemeral part of the ESS is that a value
bound to a tag is active only for a particular period of time 'τ'. In this operation, if the
specified tag in the packet is not currently bound (i.e., no such tag is found), a
location is created for that tag in the ESS and the value associated with it is set to '1',
marking this as the first packet passing through the node. Otherwise, if the tag is found, the value
associated with it is incremented by one. If the resultant value reaches the 'Threshold' 
value, subsequent COUNT packets will increment the counter but will not be forwarded. 
This operation was devised based on networking applications such as Finding 
Path Intersection and Aggregating Multicast receiver feedback. The basis of this 
operation is to determine the number of members of a particular group and is useful for 
counting the number of children (nodes) sending packets through a node. COUNT is 
often used as a 'setup' message for subsequent collection messages. The values set in the 
ESS based on this packet allow later packets to retrieve useful information in performing 
network applications. For example, in Finding a Path Intersection the COUNT operation 
is the first step. The basic idea here is to count the number of router nodes in a particular 
path. If an ESPR module in a router receives a packet with a COUNT operation, this router
is observed to be in that path and a 'setup' message is set in that node by creating a (tag,
value) pair in ESS. If a tag is not found, a location for this tag is created and the
associated value is set to '1' to initiate a 'setup' message. Based on the appropriate
'Threshold' value the resultant packet is forwarded or dropped to avoid implosion. Figure
2.3 shows the macro level description of the COUNT operation.
t0 ← get (pkt.count);
if (t0 != ⊥) { put (pkt.count, t0 + 1); }
else { put (pkt.count, 1); }
if (t0 <= threshold) forward;
else drop;
Figure 2.3. COUNT Operation 
The macro level COUNT operation of Figure 2.3 can be explained on a line-by-line
basis as follows.
Line 1: The value corresponding to tag-count in the packet is retrieved to a register t0.
Line 2: The value is checked for its availability in ESS. '⊥' indicates lifetime expiry of
this value. If a value is found in ESS and its lifetime has not expired, it is incremented
and then placed in the ESS binding it to the corresponding tag-count.
Line 3: If a value is not found in ESS, a location is created for this tag-count in ESS with
a value of 1, meaning counting the initial packet.
Line 4: If the resultant value is less than or equal to the threshold value carried in packet,
the packet is forwarded.
Line 5: Else the packet is discarded.
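A minimal software sketch of COUNT follows. The ESS is modeled as a plain Python dict, so expiry is not modeled and `None` stands in for the failure value returned by get. The sketch compares the resultant (post-update) counter against the threshold, following the line-by-line description; Figure 2.3 as printed tests t0, so this reading is an assumption. The function name and return strings are hypothetical.

```python
def count(ess, tag, threshold):
    """Process one COUNT packet; return 'forward' or 'drop'."""
    t0 = ess.get(tag)                  # None means no live binding for the tag
    new_value = 1 if t0 is None else t0 + 1
    ess[tag] = new_value               # create or update the counter
    # Assumption: the resultant value is the one compared with the threshold.
    return "forward" if new_value <= threshold else "drop"


ess = {}
decisions = [count(ess, "pkt.count", threshold=2) for _ in range(3)]
```

With a threshold of 2, the first two packets are forwarded and the third is dropped, while the counter still advances to 3.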
COMPARE: 
The COMPARE instruction carries three operands (carried in the ESP packet): a
tag 'V' identifying the value of interest in the ESS, an immediate value 'pkt.value' that
carries the 'best' value found so far, and an immediate value '<op>' used to select a
comparison operator to apply (e.g., min, max, etc.). The COMPARE instruction tests
whether the tag 'V' has an associated value in the ESS within its lifetime and tests
whether the relation specified by <op> holds between the value carried in the packet and
the value in the ESS. If so, the value from the packet replaces the value in the ESS, and
the packet is forwarded. If not, the packet is silently dropped. The COMPARE instruction
can be used in a variety of ways but is particularly useful in situations where only packets
containing the highest or lowest value seen by the node so far are allowed to continue on.
This operation is mainly used as a second step in Finding Path Intersection after a
COUNT operation. Figure 2.4 shows a macro level description of the COMPARE
operation.
t0 ← get (pkt.v);
if (t0 = ⊥)
{ put (pkt.v, pkt.value);
forward; }
else
if (t0 <op> pkt.value)
{ put (pkt.v, pkt.value);
forward; }
else
drop;
Figure 2.4. COMPARE Operation 
Below is a line-by-line description of the macro level COMPARE operation of 
Figure 2.4. 
Line 1: The value corresponding to tag-v in the packet is retrieved to a register t0.
Line 2&6: The value is checked for its availability in ESS, its lifetime expiry and it is 
also checked whether the relation specified by <op> holds between this value and the 
value carried in the packet. 
Line 3&7: If so, the value from the packet replaces the value in the ESS. 
Line 4&8: The resultant packet is forwarded. 
Line 10: If not, the packet is dropped. 
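The COMPARE behavior described above can be sketched in the same style. The ESS is again a plain dict without expiry, and `None` stands in for the failure value. The relation is passed as a Python callable applied as `relation(t0, pkt_value)`, mirroring the operand order of Figure 2.4; how `<op>` is actually encoded in the packet is not modeled here, and the names are hypothetical.

```python
import operator


def compare(ess, tag, pkt_value, relation):
    """Process one COMPARE packet; return 'forward' or 'drop'."""
    t0 = ess.get(tag)
    if t0 is None or relation(t0, pkt_value):
        ess[tag] = pkt_value           # the packet's value becomes the stored value
        return "forward"
    return "drop"


# Keep the maximum seen so far: replace when the stored value is less than the packet's.
ess = {}
results = [compare(ess, "pkt.v", v, operator.lt) for v in (5, 3, 9)]
```

Passing `operator.gt` instead keeps the minimum, so only packets carrying a new lowest value continue on.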
COLLECT: 
The COLLECT macro instruction carries four operands (carried in the ESP
packet): a tag identifying the 'Count' value in the ESS, a tag identifying a 'Value' in the
ESS on which to perform an associative or commutative operation, an immediate value
'pkt.data', which carries the resultant value from the operation performed at child
nodes, and an immediate value '<op>' that indicates the actual operator to be applied.
The COLLECT macro operation is used by a network node to compute an
associative or commutative operation on values sent back by its children nodes. If register
t0 contains the count for the number of children nodes, each COLLECT packet from a
child node is applied to the node's current result and t0 is decremented. The parent node
holds the current result, which is obtained by performing associative or commutative
operations on values sent by its children nodes. After all children have reported their
value, the computed result is forwarded to the next hop. Figure 2.5 illustrates the macro
level description of the COLLECT operation.
This operation is mainly used in aggregating receiver feedback, for example, loss 
rate corresponding to a group. After obtaining information back on the number of 
children in a group from the COUNT operation, this operation is performed on values 
sent by the children and on corresponding conditions in this operation. This macro 
operation allows particular feedback information such as loss rate to be determined. 
t0 ← get (pkt.count);
if (t0 != ⊥) {
t1 ← get (pkt.value);
if (t1 != ⊥) {
t1 ← t1 <op> pkt.data; }
else { t1 ← pkt.data; }
put (pkt.value, t1);
t0 ← t0 - 1;
put (pkt.count, t0);
if (t0 = 0) { pkt.data := t1; forward; }
else { drop; }
} else abort;
Figure 2.5. COLLECT Operation 
Below is a line-by-line description of the macro level COLLECT operation of 
Figure 2.5. 
Line 1: The value corresponding to tag-count in the packet is retrieved to a register t0.
Line 2: The value is checked for its availability in ESS. '⊥' indicates lifetime expiry of
this value. If the corresponding tag with value is found, it indicates the number of
children nodes in a particular group. If there is no such tag found, Line 12 is performed.
Line 3: The value corresponding to tag-value in the packet is retrieved to a register t1. It
corresponds to a value sent by a child node.
Line 4: The value is checked for its availability in ESS. '⊥' indicates lifetime expiry of
this value.
Line 5: If the corresponding tag with value is found, then an associative or commutative 
operation indicated by <op> (a micro opcode carried in the packet) is performed on this 
value and the value (pkt.data) carried in the packet, and the result is placed in t1. 
Line 6: If no such tag with value is found, then the value (pkt.data) carried in the packet 
is placed in t1. 
Line 7: The resultant value in t1 is written into ESS with its associated tag-value. 
Line 8&9: After performing the operation on one child node, the number of children 
nodes is decremented by one and this new value is placed in ESS with its associated tag-
count. 
Line 10: It is now checked whether the number of children nodes is zero, i.e.,
whether the operation is completed on all children nodes. If there is no child left, then the
final result from t1 is placed in the packet and the resultant packet is forwarded.
Line 11: But if there are some children left, the packet is dropped.
Line 12: This line indicates an abort statement if the parent node doesn't have the count
on number of children. It sets the 'E' bit to '1' and the 'LOC' bits to zero in the
'FLAGS' part of the packet and forwards the packet to the next node.
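The COLLECT steps of Figure 2.5 translate almost line for line into a short sketch. The ESS is a plain dict without expiry, `None` stands in for the failure value, and the packet is a dict whose keys (`count`, `value`, `data`, `E`, `LOC`) are hypothetical names for the corresponding operands and flag bits. The abort branch follows the description of Line 12: set E, clear LOC, and forward.

```python
def collect(ess, pkt, op):
    """Process one COLLECT packet; return 'forward' or 'drop'."""
    t0 = ess.get(pkt["count"])
    if t0 is None:
        # Abort: no child count at this node. Flag the error and pass the packet on.
        pkt["E"] = 1
        pkt["LOC"] = 0
        return "forward"
    t1 = ess.get(pkt["value"])
    t1 = pkt["data"] if t1 is None else op(t1, pkt["data"])
    ess[pkt["value"]] = t1             # running result for this node
    t0 -= 1
    ess[pkt["count"]] = t0             # one fewer child still to report
    if t0 == 0:
        pkt["data"] = t1               # last child: send the aggregate upstream
        return "forward"
    return "drop"


# Three children report loss rates; only the last packet leaves, carrying the maximum.
ess = {"cnt": 3}
packets = [{"count": "cnt", "value": "val", "data": d} for d in (4, 9, 2)]
decisions = [collect(ess, p, max) for p in packets]
```

The count of 3 would normally have been left in the ESS by an earlier COUNT phase; it is preloaded here to keep the example short.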
RCHLD: 
The RCHLD macro instruction carries four operands ( carried in the ESP packet), 
a tag specifying the Identifier Bitmap 'tagb' and an immediate identifier value 'idval', a 
tag 'C' identifying count of forwarded packets and an immediate threshold 'thresh'. The 
RCHLD macro instruction is similar to the COUNT macro instruction except that it also 
records the identifiers in packets received from its children. For example, tree-structured
[8] computations for collecting information from the group members can be carried out in
two phases:
• The first phase corresponds to a RCHLD instruction, which uses ESP to record
the identifiers.
• The second phase corresponds to the RCOLLECT instruction (which will be
described next), in which the group members send their identifier values up the
tree (towards destination) and each node uses RCOLLECT to compute and
forward the result only after having heard from every child.
Each group member sends the RCHLD instruction towards the root; this 
instruction causes the interior node or the immediate parent node to receive packets 
carrying this instruction from each of its children. For some useful computational 
applications [8], it is useful to determine whether a packet comes from a child that has 
not been heard from previously. To accomplish this, Bloom Filters [8,9] are used to 
determine a random bit sized identifier for each node called bitmap identifier. Figure 2.6 
illustrates the macro level description of the RCHLD operation. 
t0 ← get (pkt.tagb);
if (t0 != ⊥) { t0 ← t0 <OR> pkt.idval; }
else { t0 ← 0; }
put (pkt.tagb, t0);
t1 ← get (pkt.C);
if (t1 != ⊥) { put (pkt.C, t1 + 1); }
else { put (pkt.C, 0); }
if (t1 <= thresh)
{ pkt.idval := current node's identifier value;
forward; }
else drop;
Figure 2.6. RCHLD Operation 
Below is a line-by-line description of the macro level RCHLD operation of Figure 
2.6. 
Line 1: The value corresponding to tag-tagb in the packet is retrieved to a register t0.
Line 2: The value is checked for its availability in ESS. '⊥' indicates lifetime expiry of
this value. If the corresponding tag with value is found, indicating the bitmap identifier(s)
of the other children nodes for an immediate parent, the immediate value carried in the
packet is bitwise ORed with the value found in ESS, meaning the bit corresponding to the
bitmap identifier of the current child is turned on and is also included (added) as children
for the immediate parent.
Line 3: If it is not found, the value is set to '0'.
Line 4: The resulting new value is written into ESS.
Line 5: The value corresponding to tag-C in the packet is retrieved to a register t1.
Line 6: The value is checked for its availability in ESS. '⊥' indicates lifetime expiry of
this value. If a value is found in ESS and its lifetime has not expired, it is incremented
and then placed in ESS binding it to the corresponding tag-C.
Line 7: If a value is not found in ESS, a location is created for tag-C in ESS with
a value of 0.
Line 8, 9&10: If the resultant value is less than or equal to the threshold value carried in
packet, the current node's bitmap identifier value is written into the packet, and the
resultant packet is forwarded.
Line 11: Else the packet is discarded.
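A hedged sketch of RCHLD is given below, with the ESS as a plain dict (no expiry) and `None` as the failure value. Two choices are assumptions rather than transcriptions of Figure 2.6: an absent bitmap is treated as empty and the child's bit is still OR-ed in, so that the first child is recorded (the figure as printed sets the value to 0 in that case), and the resultant counter value is compared with the threshold, following the line-by-line description. Packet keys and the function name are hypothetical.

```python
def rchld(ess, pkt, node_id):
    """Process one RCHLD packet; return 'forward' or 'drop'."""
    bitmap = ess.get(pkt["tagb"])
    # Assumption: a missing bitmap counts as empty, and the child's bit is set anyway.
    ess[pkt["tagb"]] = (bitmap or 0) | pkt["idval"]

    t1 = ess.get(pkt["C"])
    forwarded = 0 if t1 is None else t1 + 1   # counter starts at 0, as in Figure 2.6
    ess[pkt["C"]] = forwarded

    if forwarded <= pkt["thresh"]:
        pkt["idval"] = node_id                # the packet now identifies this node
        return "forward"
    return "drop"


# Two children with one-bit identifiers report to a parent whose identifier is 0b100.
ess = {}
p1 = {"tagb": "b", "idval": 0b01, "C": "c", "thresh": 0}
p2 = {"tagb": "b", "idval": 0b10, "C": "c", "thresh": 0}
decisions = [rchld(ess, p, node_id=0b100) for p in (p1, p2)]
```

With a threshold of 0, only the first packet is forwarded upstream, yet both children are recorded in the bitmap for use in the second phase.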
RCOLLECT: 
In addition to the 'Value', 'pkt.data' and '<op>' operands carried in a COLLECT
packet, the RCOLLECT macro instruction carries four more operands in the packet: a tag
'tagb1' identifying the Bloom filter used in the previous RCHLD instruction, a tag
'tagb2' identifying another bitmap for detecting duplicates, a tag 'D' for identifying the
count of packets forwarded and an immediate threshold value 'thresh' to control the
number of duplicated transmissions.
This instruction is used as a second phase after the RCHLD macro instruction for 
tree-structured computations. The main difference between COLLECT and RCOLLECT 
is that in RCOLLECT the condition for forwarding is when the two Bloom filters match, 
rather than when the count is zero. This packet is sent after a short delay to allow phase 
one packets to be processed. As each packet arrives, the bit corresponding to its bitmap 
identifier is set in the second bitmap, and the value is added into the existing binding. If 
the resulting bitmap is equal to the one from the first phase, it means that all children 
identified in the first phase have been heard from. In that case the accumulated value is 
written into the packet, the bitmap identifier in the packet is replaced with that node's 
identifier, and the packet is forwarded. Otherwise, the packet is discarded. Figure 2.7 
illustrates the macro level description of RCOLLECT operation. 
t0 ← get (pkt.tagb1);
if (t0 != ⊥) {
t1 ← get (pkt.tagb2);
if (t1 != ⊥) { t1 ← t1 <AND> pkt.idval; }
else { t1 ← 0; }
if (t1 != pkt.idval) { t1 ← t1 <OR> pkt.idval; }
put (pkt.tagb2, t1);
t2 ← get (pkt.value);
if (t2 != ⊥) {
t2 ← t2 <op> pkt.data; }
else { t2 ← pkt.data; }
put (pkt.value, t2);
if (t0 = t1) {
t3 ← get (pkt.D);
if (t3 != ⊥) { put (pkt.D, t3 + 1); }
else { put (pkt.D, 0); }
if (t3 <= thresh)
{ pkt.value := t2;
pkt.idval := node's identifier value;
forward; }
else { drop; }
} else { drop; }
} else { abort; }
Figure 2.7. RCOLLECT Operation 
Below is a line-by-line description of the macro level RCOLLECT operation of 
Figure 2.7. 
Line 1: The value corresponding to tag-tagb1 in the packet is retrieved to a register t0.
Line 2: The value is checked for its availability in ESS. '⊥' indicates lifetime expiry of
this value. If the corresponding tag with value is found, indicating the identifier bitmap(s)
obtained from previous phase one (RCHLD) operations, the operations described in Line 3 - Line 22
are executed. If there is no such tag found, Line 23 is performed.
Line 3: The value corresponding to tag-tagb2 in the packet is retrieved to a register t1.
Line 4&5: The value is checked for its availability in ESS. '⊥' indicates lifetime expiry
of this value. If the corresponding tag with value is found, indicating the bitmap
identifier(s) of the other children nodes for an immediate parent, the value is added to the
existing value, and if the value is not found, it is set to '0'.
Line 6: Then it is checked whether the resulting value is equal to the one carried in the
packet. If it is not equal, the immediate value carried in the packet is bitwise ORed with the
value found in ESS, meaning the bit corresponding to the bitmap identifier of the current
child is turned on and is also included (added) as children for the immediate parent.
Line 7: The resulting new value is written into ESS.
Line 8: The value corresponding to tag-value in the packet is retrieved to a register t2. It
corresponds to a value sent by a child node.
Line 9: The value is checked for its availability in ESS. '⊥' indicates lifetime expiry of
this value.
Line 10: If the corresponding tag with value is found, then an associative or commutative
operation indicated by <op> (a micro opcode carried in the packet) is performed on this
value and the value (pkt.data) carried in the packet, and the result is placed in t2.
Line 11: If no such tag with value is found, then the value (pkt.data) carried in the packet
is placed in t2.
Line 12: The resultant value in t2 is written into ESS with its associated tag-value.
Line 13: The bitmap identifiers are compared for equality to check whether the values
from all children nodes have been heard and to forward the packet.
Line 14: If they are equal, then the value corresponding to tag-D in the packet is retrieved
to a register t3 to have the count of packets.
Line 15: The value is checked for its availability in ESS. '⊥' indicates lifetime expiry of
this value. If a value is found in ESS and its lifetime has not expired, it is incremented
and then placed in ESS binding it to the corresponding tag-D, indicating that this packet is
counted.
Line 16: If a value is not found in ESS, a location is created for tag-D in ESS
with a value of 0, starting to count the packets.
Line 17, 18, 19&20: If the resultant value is less than or equal to the threshold value (for
the maximum number of packets) carried in packet, the resultant value in t2 is placed in
the output packet's 'pkt.value' field, the current node's bitmap identifier value is placed in
the 'pkt.idval' field and the resultant packet is forwarded.
Line 21: Else the packet is discarded.
Line 22: If the bitmap identifiers do not match, meaning there are still some child nodes to
hear from, then the packet is silently dropped.
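The second phase can be sketched in the same way. The sketch follows the prose description of RCOLLECT (set the child's bit in the second bitmap, fold the packet's value into the existing binding, forward once the two bitmaps match) rather than transcribing Figure 2.7 literally. Its assumptions: the ESS is a plain dict without expiry, `None` is the failure value, the AND/OR steps of the figure are read as a plain bitwise OR into the second bitmap, the accumulated result is written to the packet's data field as in COLLECT, and a missing first-phase bitmap simply returns 'abort'. Packet keys and the function name are hypothetical.

```python
def rcollect(ess, pkt, op, node_id):
    """Process one RCOLLECT packet; return 'forward', 'drop' or 'abort'."""
    expected = ess.get(pkt["tagb1"])          # bitmap recorded during phase one
    if expected is None:
        return "abort"

    seen = (ess.get(pkt["tagb2"]) or 0) | pkt["idval"]
    ess[pkt["tagb2"]] = seen                  # children heard from in phase two

    t2 = ess.get(pkt["value"])
    t2 = pkt["data"] if t2 is None else op(t2, pkt["data"])
    ess[pkt["value"]] = t2                    # accumulated result

    if seen != expected:
        return "drop"                         # still waiting for some child

    t3 = ess.get(pkt["D"])
    sent = 0 if t3 is None else t3 + 1        # count of forwarding attempts
    ess[pkt["D"]] = sent
    if sent <= pkt["thresh"]:
        pkt["data"] = t2                      # assumption: result travels in pkt.data
        pkt["idval"] = node_id
        return "forward"
    return "drop"                             # duplicate beyond the threshold


def make_pkt(idval, data):
    return {"tagb1": "b1", "tagb2": "b2", "value": "v", "D": "d",
            "idval": idval, "data": data, "thresh": 0}


ess = {"b1": 0b11}                            # phase one recorded two children
first, second, dup = make_pkt(0b01, 4), make_pkt(0b10, 9), make_pkt(0b10, 1)
decisions = [rcollect(ess, p, max, node_id=0b100) for p in (first, second, dup)]
```

The first packet is dropped because one child is still missing, the second completes the bitmap and is forwarded with the aggregate, and the duplicate is suppressed by the threshold on 'D'.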
2.5 Example End-to-End Applications using ESP 
End systems utilize ESP to perform various applications. Many applications can 
be constructed using two-step network macro instructions. For example, in Finding Path 
Intersection as shown in Figure 2.8, the first step utilizes the COUNT macro instruction 
in determining the number of nodes along the path and defining a state in each node's 
ESS as it travels. The next step, the COMPARE instruction, examines the value left by
the previous COUNT instruction and determines the nearest intersection node along the 
path. 
[Figure labels: A; S; "COUNT Packet to A"; "COMPARE Packet to B"; "node along two paths"]
Figure 2.8. Finding Path Intersection
Another example is Aggregating Multicast Receiver Feedback, in which first
the number of children maintaining a state is counted and then some operation is
performed on the values collected to deliver useful information such as the maximum loss
rate. ESP facilitates such operations without the risk of implosion (see Figure 2.9).
These computations are viewed as tree structured computations by ESP and are generally 
carried out in two phases. 
In the first phase each group member sends an RCHLD macro-instruction towards 
the root; this instruction causes the interior node or the immediate parent node to receive 
packets carrying this instruction from each of its children and records their identifiers 
(Figure 2.9a). The identifiers are helpful in determining whether the parent has heard 
from all children nodes, and this information is useful in some specific applications
[8,26]. After a short delay (for processing RCHLD packets), phase two RCOLLECT
packets are sent towards the destination root (Figure 2.9b). The parent node receives
packets from each of its children one by one, and sets the bit corresponding to its bitmap 
identifier in the second bitmap, and the value carried in the packet is added to the existing 
binding in ESS. If the resulting bitmap is equal to the one from the first phase, it means 
that the parent has heard from all its children and the accumulated value is written into 
the packet, the bitmap identifier in the packet is replaced with that node's identifier, and 
the packet is forwarded. Otherwise, the packet is discarded. 
Figure 2.9a. Phase 1 (RCHLD Packet to Root Node) 
Figure 2.9b. Phase 2 (RCOLLECT Packet to Root Node) 
[Figure 2.9b diagram. Recoverable labels: RCOLLECT() packets sent at time T0 and at time T1.]
Figure 2.9. Reducing Implosion using Two-Phase Tree Structured Computations 
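The two-phase computation described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the ESP implementation: the class name `TreeNode`, its methods, and the use of plain integers as bitmaps are hypothetical, the accumulation is fixed to addition, and the expiration of ephemeral state is ignored.

```python
class TreeNode:
    """Interior-node state for the two-phase computation (illustrative only)."""

    def __init__(self, node_id_bit):
        self.node_id_bit = node_id_bit  # this node's own bitmap identifier (assumed one-hot)
        self.children_seen = 0          # bitmap built during phase 1 (RCHLD)
        self.children_done = 0          # bitmap built during phase 2 (RCOLLECT)
        self.accumulated = 0            # running value bound in the ESS

    def rchld(self, child_bit):
        """Phase 1: record the identifier of a child that sent RCHLD."""
        self.children_seen |= child_bit

    def rcollect(self, child_bit, value):
        """Phase 2: fold in one child's value.

        Returns (identifier, value) for the packet to forward once every
        child recorded in phase 1 has reported, otherwise None (discard).
        """
        self.children_done |= child_bit
        self.accumulated += value
        if self.children_done == self.children_seen:
            return (self.node_id_bit, self.accumulated)
        return None


node = TreeNode(node_id_bit=0b100)
node.rchld(0b001)
node.rchld(0b010)
first = node.rcollect(0b001, 5)   # one child still missing: packet discarded
second = node.rcollect(0b010, 7)  # all children heard: packet forwarded
```

Only the last RCOLLECT packet to arrive is forwarded upstream, which is what keeps the root from being flooded by one packet per group member.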
Various other applications may be implemented, such as thinning group feedback 
within a network to prevent the implosion problem [10], simple distributed 
computations requiring data gathering across the network, identifying network topology 
information [8] and network bottleneck identification [2]. 
2.6 Prologue to Ephemeral State Processor (ESPR) 
It is envisioned that a special purpose programmable processor architecture can be 
developed which will allow ESP to be programmed into programmable logic within 
network level routers to support functional applications. This architecture can be 
described in a hardware description language (HDL) and then simulated using an 
appropriate simulator for a first level of validation of its architectural functionality. Final 
architectural functionality, design correctness and performance can be verified by 
implementing the design in a Field Programmable Gate Array (FPGA) chip through 
virtual prototype implementation and testing. 
Beyond correct operational functionality, other high priorities for the development 
of the ESPR will be to obtain high performance processing of ESP packets, as 
stated above, and to use resources efficiently within the FPGA chip where it 
may be implemented. The fundamental functionality needed from the ESPR architecture and a 
highest-level organization can be developed from the defined macro level instruction set. 
Micro level instructions and a detailed ESPR architecture can then be 
developed to implement the presented macro level instructions. Keeping the ESP 
requirements (described above) in mind, it is envisioned to design a version of the ESPR 
to process packets at or near line speeds of 100 Mbps. Depending on the network macro 
instructions, the highest performance architectural design of ESPR (ESPR.V2) is being 




Ephemeral State Processor (ESPR) 
This chapter discusses the characterization and requirements needed for designing 
an Ephemeral State Processor (ESPR), its highest level functional organization, and its basic 
micro instruction set formats and types. Finally, it presents the equivalent micro 
instruction implementation of the already defined high level macro instructions. 
3.1 ESPR Requirements Summary 
Based on the given ESP mechanism, to be implemented into Programmable Logic 
Device (PLD) technology, the basic processor building blocks and processor 
characteristics/requirements of ESP service can be given as, 
• An Ephemeral State Store (ESS) is needed. 
• Compatibility with the ESP protocol and packet format is required. 
• Must support a predefined set of network macro-instructions. 
• Develop an ESP architecture that has an upgrade path and can be 
performance boosted via systematic steps (such as deeper pipelining of a 
pipelined architecture, move from issuing one instruction per-clock-cycle to 
two instructions per-clock-cycle, instantiation of multiple copies of the basic 
ESPR architecture to a single PLD platform in a network node resulting in a 
multiprocessor configuration). 
• Support an in-field upgrade path (e.g., via a software upload) 
Following the above ESP requirements, the ESP processor requires a reduced 
latency ESS, which is designed as an associative memory to store ephemeral (tag, value) 
pairs. A packet storage unit is required to store and send packets to the output, and a "code 
register" is used to indicate the state of this packet to the next node in the path. The third 
requirement, being able to implement the network macro-instructions in the node, requires 
development of a set of micro-instructions, a supporting high-level architectural 
configuration and the Instruction Set Architecture (ISA) which will be used to implement 
ESP macro-instructions. 
Requirements/characteristics of the ISA of an ESPR can be highlighted as 
follows: 
• The number of micro-instructions should be minimized in support of the 
concept of "lightweight" ESP. 
• The number of instruction formats should be minimized. 
• All instructions should be of the same length allowing simplification of the 
architecture. 
• A minimum number of addressing modes should be utilized within the 
instructions. 
• Most data to be processed is in 64-bit format. 
• The architecture should offer high performance, yet it should be kept 
simple (pipelined initial version issuing one instruction per-clock-cycle in the 
spirit of being "lightweight"). 
• The ESS should be integrated into the ESPR pipelined architecture in a 
manner to hide latency. 
3.2 Highest Level Functional Organization of ESPR 
Based on the previously-presented macro instructions of ESP, an initial ESPR 
functional organization can be developed as follows. The required functional units of an 
ESPR will be the ESS, Instruction Memory, Packet Storage RAM, Macro and Micro 
Controllers, Register blocks and basic processor modules. A high level view of an ESPR 
illustrating its main functional units is shown in Figure 3.1. Primary inputs and outputs of 
the system are also shown. 
The overall operation of the ESPR within a network node will be as follows. The 
distinction between an ESP packet and other packets is carried out by external logic 
inside the router and the ESPR sees only the ESP packets. When ESP is activated, the 
Packet RAM in ESPR receives the ESP Packet and the Macro Controller decodes the 
macro opcode in the packet to point to a sequence of micro level instructions held in the 
Micro Instruction Memory, which must be executed to implement the incoming macro 
instruction. The remaining ESPR functional modules implement the sequence of micro 












































[Figure 3.1 diagram. Recoverable labels: register blocks (GPR, TR, VR) and the 8-bit Output Code.]
GPR - General Purpose Registers 
TR - Tag Registers 
VR - Value Registers 
Thus, the ESPR processes an incoming packet and the resultant packet is either 
forwarded or dropped. Primary outputs of the ESPR are the resultant Output Packet, if it is 
forwarded, and the resultant Output Code for DROP, FORWARD or ABORT. The 
8-bit Output Code Register (OCR) generates the Output Code for the corresponding 
FORWARD, ABORT or DROP instruction to indicate the status of the current packet 
to the next available ESP capable router. 
An ESPR_ON input starts the ESPR and a main Reset input resets the entire 
ESPR system. A Configuration Input (CFG_in) provides the Internet Protocol (IP) 
address of the current node and is loaded to the Configuration Register (R2), and a 
Bitmap Input gives the Bloom filter bitmap identifier value of the current node and is 
placed in the Bitmap Register (R3). The entire ESP packet is sent to the Packet RAM in ESPR in 
32-bit blocks and is output in 32-bit blocks. An Input Packet RAM can be placed off chip 
to buffer the input packets and an Output Packet RAM can also be placed off the ESPR 
chip to test the output packets. The maximum length of an ESP packet is 512 bytes (4096 
bits), so the Packet RAM can receive up to 128 blocks. 
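The block arithmetic above (512 bytes = 4096 bits = 128 blocks of 32 bits) can be checked with a small Python sketch. The function name `split_into_blocks`, the big-endian byte order within a block and the zero padding of a short final block are assumptions made for illustration; the text does not specify them.

```python
BLOCK_BITS = 32          # packet is transferred in 32-bit blocks
MAX_PACKET_BYTES = 512   # maximum ESP packet length stated in the text
BLOCK_BYTES = BLOCK_BITS // 8


def split_into_blocks(packet: bytes) -> list:
    """Split a packet into 32-bit blocks (byte order and padding are assumed)."""
    if len(packet) > MAX_PACKET_BYTES:
        raise ValueError("ESP packet exceeds 512 bytes")
    # Pad the tail with zeros so the length is a multiple of 4 bytes.
    padded = packet + b"\x00" * (-len(packet) % BLOCK_BYTES)
    return [
        int.from_bytes(padded[i:i + BLOCK_BYTES], "big")
        for i in range(0, len(padded), BLOCK_BYTES)
    ]


max_blocks = len(split_into_blocks(bytes(MAX_PACKET_BYTES)))
```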
A typical ESP packet format is shown in Figure 2.1 and Figure 2.2. The Flags 
field (8 bits) has 'LOC' (3 bits), E (Error - 1 bit), R (Reflector - 1 bit) and 3 unused bits 
reserved for future use. In some cases, as one of the normal outcomes of packet 
processing, the packet needs to be prevented from being processed any further on the way 
to its destination. To accomplish this, the LOC bits are set to '0' and the packet is 
simply forwarded to the destination. In other cases, to indicate an error encountered in 
packet processing, the E bit is set to '1' in the processed packet so that 
downstream routers keep the packet from further processing. 
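The Flags field handling can be sketched as follows. The field widths (LOC 3 bits, E 1 bit, R 1 bit, 3 unused) come from the text, but the placement of the bits inside the byte (LOC in the three most significant bits, then E, then R) is an assumption made for this sketch, as are the function names.

```python
# Assumed layout of the 8-bit Flags field: LOC[7:5] E[4] R[3] unused[2:0]
LOC_SHIFT = 5
LOC_MASK = 0b111
E_BIT = 1 << 4
R_BIT = 1 << 3


def decode_flags(flags: int) -> dict:
    """Unpack the Flags byte into its LOC, E and R components."""
    return {
        "loc": (flags >> LOC_SHIFT) & LOC_MASK,
        "e": bool(flags & E_BIT),
        "r": bool(flags & R_BIT),
    }


def clear_loc(flags: int) -> int:
    """Set the LOC bits to zero so no further node processes the packet."""
    return flags & ~(LOC_MASK << LOC_SHIFT) & 0xFF


def set_error(flags: int) -> int:
    """Set the E bit to signal a processing error to downstream routers."""
    return flags | E_BIT


example = decode_flags(0b10101000)  # LOC = 5, E = 0, R = 1 under the assumed layout
```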
An ESP packet is retrieved into the Packet RAM (PR) of the ESPR of Figure 3.1 
as 32-bit blocks from the Input Packet RAM (IPRAM) placed off the ESPR chip, and the 
processed ESP packet is output to the Output Packet RAM (OPRAM). This is shown, 
focusing on the involved functional units, in Figure 3.2. 
[Figure 3.2 diagram: Input Packet RAM (IPRAM) -> Packet RAM in ESPR (PR) -> Output Packet RAM (OPRAM), connected by 32-bit block buses. Signals shown: IDV, EOP_in, ACK_in, PRready, OPRAMready, EOP_out.]
Figure 3.2. Packet Processing in Packet RAM of ESPR 
When the ESPR is switched on, it is ready to receive and process packets, and 
the PR (Packet RAM) in ESPR waits until the IDV (Input Data Valid) signal from IPRAM 
goes high. When IPRAM is ready to send packet blocks, it asserts the IDV signal 
and places 32-bit packet blocks onto the 32-bit Packet Block bus. The IDV signal 
should remain high for at least 2 blocks, and the PR starts receiving packet blocks. The 
ACK_in (Acknowledge input) signal goes high for every packet block, indicating proper 
receipt of that block. When the end of a packet is reached, the IPRAM sends the 
EOP_in signal with the CRC value for the entire packet. 
Similarly, when OPRAM is ready to receive a processed ESP packet, it sends the 
OPRAMready (Output RAM ready) signal to PR. The PR then sends the address and 
32-bit blocks to OPRAM. The PR sends packet blocks to OPRAM until the length of the entire 
packet is reached and then sends the EOP_out (End of Packet output) signal to indicate 
the end of the ESP packet. Then the PRready signal goes high to indicate that the PR is 
ready to receive the next packet. This packet processing module also has a 64-bit input 
(not shown here) from a multiplexer, so it can choose between the values from registers 
or different pipeline stages of ESPR when executing the STPR (Store To Packet RAM) 
micro instruction. It also has a 64-bit output (not shown here) to the register blocks for 
the LFPR (Load From Packet RAM) micro instruction, which is explained later in the 
description of the pipelined ESPR architecture. 
3.3 Micro Instruction Format, Types, Architecture and Definition 
In this section the goals, objectives and approach in designing a basic micro 
instruction set architecture on the basis of the defined macro instructions are discussed. The 
section also covers the different instruction types (classes) of micro instructions and their basic 
instruction format. These micro instructions allow one to implement the previously 
presented macro instructions. 
A high priority goal and objective of this instruction set architecture design is to 
have an instruction set which will lead to high performance and low cost/complexity of 
the ESPR system. This leads to the design of an instruction set that has fixed length 
instructions and a minimum number of formats and classes. It gives the ESPR the 
potential for easy decoding and implementation of the instructions, with less time spent in 
decoding and implementation, potentially leading to a high performance and low 
cost/complexity system. 
The use of a 64-bit width for the micro instructions is addressed as follows. The 
'Branch' type instructions are identified as using the largest number of bits (32) in their 
instruction format for implementation of the existing ESP macro instructions. The micro 
instruction sequences required to represent the above presented macro instructions exceed 
256 address locations in memory, and so a convenient number (16 bits) is used to represent 
the instruction memory address locations. Considering the evolving growth of ESP and 
the future possibility of additional, more complicated macro instructions, the micro 
instruction width was felt to be best set at 64 bits. Also, to achieve the goal of designing a 
lightweight ESPR, to avoid complexities such as register renaming in the design of 
future ESPR versions, and to support the growth of micro opcode and register file(s) size, 
64-bit instructions will be supported by the ESPR. 
3.3.1. Micro Instruction Format 
The basic instruction format for all micro instructions will be as shown in the 
following Figure 3.3 with the individual field definitions given in Figure 3.4. 
OP | RD | RS1 | RS2 | TR | T | VR | V | U | W | L | S | AIO | SHAMT 
(AIO: 16-bit Br./Jmp Addr/Imm/Offset) 
Figure 3.3. Micro Instruction Format 
The opcode specifies the micro operation. Some of the defined macro instructions 
require arithmetic, associative and commutative operations, which are performed in these 
micro instructions using operands specified by RS1 and RS2, with the result written into 
RD. The T and V fields indicate whether TR and VR are used as a source or a destination. 
The W field specifies a general-purpose register write, indicating that the 
instruction uses the destination register operand RD. The AIO and SHAMT fields are also 
sometimes required in these operations. ESS is accessed using the fields TR and VR. 
LMOR is set to '1' when the ESPR encounters an operator <op> in the COMPARE, 
COLLECT or RCOLLECT macro instruction. 
OP - Opcode (6 bits) 
RD - Register Destination (5 bits) 
RS1 - Register Source 1 (5 bits) 
RS2 - Register Source 2 (5 bits) 
TR - Tag Register (5 bits) 
T - Tag Register Source or Destination (1 bit) 
    0 - Source, 1 - Destination 
VR - Value Register (5 bits) 
V - Value Register Source or Destination (1 bit) 
    0 - Source, 1 - Destination 
U - Unused 
W - General Purpose Register Write (1 bit) 
L - LMOR (Load Micro Opcode Register) (1 bit) 
S - Sign bit used in Immediate Type Instructions to denote the sign 
    of the immediate value. 
AIO - Address, Immediate, Offset [Address, Immediate Value and Offset (16 bits)] 
SHAMT - Shift Amount (6 bits) 
Figure 3.4. Field Definitions 
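As an illustration of the fixed 64-bit format, the sketch below packs and unpacks the multi-bit fields. The bit positions used (OP 63-58, RD 57-53, RS1 52-48, RS2 47-43, TR 42-38, T 37, VR 36-32, V 31, AIO 21-6, SHAMT 5-0) are read off the bit rulers of the per-type format figures in this chapter; the single-bit W, L and S flags are left out because their exact positions cannot be pinned down from those rulers. The helper names and the opcode value in the example are hypothetical.

```python
# name -> (least significant bit position, width in bits)
FIELDS = {
    "OP": (58, 6), "RD": (53, 5), "RS1": (48, 5), "RS2": (43, 5),
    "TR": (38, 5), "T": (37, 1), "VR": (32, 5), "V": (31, 1),
    "AIO": (6, 16), "SHAMT": (0, 6),
}


def encode(**values) -> int:
    """Pack the named fields into one 64-bit micro instruction word."""
    word = 0
    for name, value in values.items():
        shift, width = FIELDS[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
    return word


def decode(word: int) -> dict:
    """Unpack a 64-bit micro instruction word into its fields."""
    return {
        name: (word >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


# Hypothetical branch-style instruction: opcode 0x2A, sources R4 and R5,
# 16-bit target address 0x0123.
word = encode(OP=0x2A, RS1=4, RS2=5, AIO=0x0123)
fields = decode(word)
```

Because every instruction has the same length and the fields sit at fixed positions, decoding reduces to a handful of shifts and masks, which is the simplification the fixed-format requirement is after.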
3.3.2. Micro Instruction Types (Classes) 
The basic micro instruction types (classes) are designed based on fundamental 
micro operations required to implement the macro instructions and are developed to 
implement the macro operations correctly and completely. A description of the 
instruction types and their functionality is as follows. Detailed descriptions and formats 
of individual micro instructions are described in Appendix A. 
3.3.2.1. ALU / SHIFT Type Instructions 
The need for this type of instruction arises from the COLLECT macro 
operation, which needs associative and commutative operations. Other macro instructions 
also need increment and decrement operations. The instructions of this type are 
ADD, SUB, INCR, DECR, OR, AND, EXOR, COMP, SHL, SHR, ROL and ROR. 
The instruction format for this type of instruction is shown in Figure 3.5. 
3.3.2.2. Immediate Type Instruction 
The one instruction of this type is MOVI. It loads immediate values into registers, 
and its format and definition are shown in Figure 3.6. 
63 58 57 53 52 48 47 43 42 25 24 23 6 5 0 





u 14 u ISHAMT 
Instruction  Operation      Description 
ADD          Addition       Computes sum of two operands 
SUB          Subtraction    Computes difference of two operands 
INCR         Increment      Increments an operand by 1 
DECR         Decrement      Decrements an operand by 1 
OR           Logical OR     Logical OR of two operands 
AND          Logical AND    Logical AND of two operands 
EXOR         Logical EXOR   Logical EXOR of two operands 
COMP         Complement     Logical NOT of an operand 
SHL          Shift Left     Logical Left Shift 
SHR          Shift Right    Logical Right Shift 
ROL          Rotate Left    Logical Rotate Left 
ROR          Rotate Right   Logical Rotate Right 
Figure 3.5. ALU/SHIFT Type Instruction Format and Definition 
63 58 57 53 52 24 23 22 21 6 5 0 
U   16-bit Imm Val   U 
Instruction  Operation       Description 
MOVI         Move Immediate  Moves immediate value to register 
Figure 3.6. Immediate Type Instruction Format and Definition 
3.3.2.3. Branch / Jump Type Instructions 
These instructions check conditions and conditionally execute instructions based 
on the checked conditions. All macro instructions involve checking conditions based on 
high-level language constructs such as IF ... ELSE. These micro instructions perform 
similar functions at a lower level. The instructions of this type are BRNE, BREQ, 
BRGE, BLT, BNEZ, BEQZ, JMP and RET. Figure 3.7 shows the format and 
definition. 
63 58 57 53 52 48 47 43 42 22 21 6 5 0 
OP   RS1   RS2   U   16-bit Br./Jmp Addr   U 
Instruction  Operation                     Description 
BRNE         Branch on NOT Equal           Branches to a different location specified by 
                                           16-bit Address on inequality of two operand values 
BREQ         Branch on Equal               Branches to a different location specified by 
                                           16-bit Address on equality of two operand values 
BRGE         Branch on Greater or Equal    Branches to a different location specified by 
                                           16-bit Address on greater than or equality of two 
                                           operand values 
BLT          Branch on Less Than           Branches to a different location specified by 
                                           16-bit Address on comparison of less than operation 
                                           of two operand values 
BNEZ         Branch on NOT Equal to Zero   Branches to a different location specified by 
                                           16-bit Address, if the operand value is not equal to zero 
BEQZ         Branch on Equal to Zero       Branches to a different location specified by 
                                           16-bit Address, if the operand value is equal to zero 
JMP          Jump                          Jumps to a location specified by 16-bit Address 
RET          Return                        Returns from a location to the normal PC value 
Figure 3.7. Branch/Jump Type Instruction Format and Definition 
3.3.2.4. LFPR / STPR Type Instructions 
LFPR (Load From Packet RAM) and STPR (Store To Packet RAM) instructions 
move information between the packet and the registers. All macro operations require 
(tag, value) operands in the packet to be loaded into separate registers or stored back to 
Packet RAM. The retrieved values are used to perform local calculations and operations 
in modules of ESPR. These instructions get a tag or value from a specific field at a 
particular offset of the packet into local General Purpose, Tag or Value Registers 
(GPR / TR / VR), or put a register value into such a field. The instructions of this type 
are LFPR and STPR, which have the format shown in Figure 3.8. 
63 58 57 53 52 43 42 38 37 36 32 31 30 24 23 22 21 6 5 0 
[field labels for bits 63-22 not recoverable]   16-bit Offset   U 
Instruction  Operation              Description 
LFPR         Load From Packet RAM   Load value at a particular offset from the packet to register 
STPR         Store To Packet RAM    Stores values to a particular offset in packet from a register 
Figure 3.8. LFPR/STPR Type Instruction Format and Definition 
3.3.2.5. GET / PUT Type Instructions 
These instructions are directly equivalent to the macro get/put instructions and are 
used for detailed access to the ESS. The GET instruction checks whether the 
specified tag exists in ESS; if so, it checks the validity of the value and returns the value if 
it is valid. The PUT instruction places the (tag, value) pair in ESS. The BGF and BPF 
instructions branch to a different location specified by Br.Addr on failure of GET and 
PUT operations respectively. Figure 3.9 shows the format and definition for GET and 
PUT instructions. 











22 21 6 5 0 
16-bit Br. Addr   U 
Instruction  Operation              Description 
GET                                 Retrieves the value bound to a tag in ESS 
PUT                                 Places a (tag, value) pair in ESS 
BGF          Branch on GET Failed   Branches to a different location specified by 
                                    16-bit address on failure of GET operation 
BPF          Branch on PUT Failed   Branches to a different location specified by 
                                    16-bit address on failure of PUT operation 
Figure 3.9. GET/PUT Type Instruction Format and Definition 
3.3.2.6. Packet Related Instructions 
The instructions of this type are IN, OUT, FWD, DROP, SETLOC, ABORT1 
and ABORT2. IN, OUT, FWD and DROP are used to Input, Output, Forward or Drop a packet 
respectively. The ABORT instructions set the LOC bits to zero, set (ABORT2) or leave unset 
(ABORT1) the E bit in the packet, and then forward the resultant packet. The format and 
definition are shown in Figure 3.10. 
63 58 57 0 
OP   U 
Instruction  Operation     Description 
IN           Input         Inputs a packet to Packet RAM 
OUT          Output        Outputs resultant code for either DROP or FWD 
FWD          Forward       Forwards the packet 
DROP         Drop          Drops the packet 
ABORT1       Abort         Sets LOC bits to zero in packet and forwards 
ABORT2       Abort         Sets LOC bits to zero and E bit to '1' in packet and forwards 
SETLOC       Set LOC bits  Sets LOC bits to a specified LOC (Location) value 
Figure 3.10. Packet Related Instruction Format and Definition 
3.3.3. Further ESPR Architecture Definition 
Based on the above-defined Instruction Types/Classes and their formats, 
additional specific functional units and components of an ESPR system required to 
complete the definition of its Instruction Set Architecture (ISA) can be defined as 
follows: 
• The ESPR architecture will be Register / Register (R/R), Reduced Instruction Set 
Computer (RISC) type architecture. 
• 32 General purpose 64-bit registers (R0, R1 ... R31) - 28 available to Programmer 
(R4, R5 ... R31) 
• Restricted registers 
• R0 - loaded with '000 ... 0' 
• R1 - loaded with '000 ... 1' 
• R2 - Configuration Register which holds the node's IP address 
• R3 - Bitmap Register which holds the current node's bitmap identifier value 
• PR - Packet RAM to store Input Packets 
• 32 sixty-four (64) bit Tag Registers (TR) and 32 sixty-four (64) bit Value 
Registers (VR) - 31 of each available to Programmer (TR1, TR2 ... TR31), (VR1, 
VR2 ... VR31) 
• TR0, VR0 - loaded with '000 ... 0' 
• 8-bit Output Code Register (OCR) to indicate status of the packet in current node 
• 0 - No status, 1 - FWD code, 2 - ABORT1, 3 - DROP, 4 - ABORT2 
• 8 bit Flag Register (FLR). FLR consists of the bit pattern to set in flags field of the 
packet. 
• 8-bit Micro Opcode Register (MOR) to store the micro opcode in packet (<op>) 
instructions particularly used in a defined 'COMPARE', 'COLLECT' and 
'RCOLLECT' operation 
• Associative Memory - Ephemeral State Store (ESS) 
• 64-bit wide Instruction Memory addressed by 16-bit pointer (MAX - 2**16 
locations) 
• CRC block - To calculate Cyclic Redundancy Check (CRC) of the received packet 
and to place it back at the end of the packet in PR. 
• 2-bit Condition Code Register (CCR) to indicate the Failure of GET and PUT 
operations in ESS. 
• PC- Program Counter 
• Macro Controller - Decodes the macro opcode and generates the equivalent micro 
code location address to PC. 
• Micro Controller - Controls the ESPR system at the micro level by generating control 
signals 
• 64 bit ALU and SHIFTER - used in the arithmetic and logical computations 
• Decoders and Multiplexers. 
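Two items from the list above, the Output Code Register values and the restricted general purpose registers, can be modeled in a few lines. This is a behavioral sketch only: the names `OutputCode` and `GeneralRegisters` are hypothetical, and rejecting writes to R0-R3 is an assumption about what "restricted" means here (the hardware may simply never issue such writes).

```python
from enum import IntEnum

MASK64 = (1 << 64) - 1


class OutputCode(IntEnum):
    """Values of the 8-bit Output Code Register as listed in the text."""
    NO_STATUS = 0
    FWD = 1
    ABORT1 = 2
    DROP = 3
    ABORT2 = 4


class GeneralRegisters:
    """32 x 64-bit register file with R0-R3 preloaded and restricted."""

    def __init__(self, node_ip: int, bitmap_id: int):
        self._regs = [0] * 32
        self._regs[1] = 1                    # R1 holds the constant 1
        self._regs[2] = node_ip & MASK64     # R2: Configuration Register
        self._regs[3] = bitmap_id & MASK64   # R3: Bitmap Register

    def read(self, index: int) -> int:
        return self._regs[index]

    def write(self, index: int, value: int) -> None:
        if index < 4:
            # Assumption: programmer writes to R0-R3 are rejected.
            raise ValueError("R0-R3 are restricted")
        self._regs[index] = value & MASK64   # keep results 64 bits wide


regs = GeneralRegisters(node_ip=0xC0A80001, bitmap_id=0b100)
regs.write(4, regs.read(1) + regs.read(1))   # R4 <- 1 + 1
```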
3.4 Micro Instruction Implementation of ESP Macro Instructions 
The previously presented five ESP macro instructions can now be implemented 
with sequences of the presented micro instructions which can be shown in the following 
figures - Figure 3.11 through Figure 3.15. 
        LFPR <Offset - 3> TR1 
        GET VR1, TR1 
        BGF Addr1 
        INCR R4, VR1 
        MOV VR1, R4 
        PUT TR1, VR1 
        BPF Addr2 
Addr3:  LFPR <Offset - 5> R4 
        MOV R5, VR1 
        BGE R4, R5, Addr4 
        DROP 
Addr1:  MOV VR1, R0 







Figure 3.11. Equivalent Micro Instruction Sequence for 'COUNT' 
        LFPR <Offset - 3> TR1 
        GET VR1, TR1 
        LFPR <Offset - 5> R5 
        BGF Addr1 
        MOV R4, VR1 
        LFPR <Offset - 7> MOR 
        NOP 
        R4 <OP> R5 Addr1 
        DROP 
Addr1:  MOV VR1, R5 






Figure 3.12. Equivalent Micro Instruction Sequence for 'COMPARE' 
        LFPR <Offset - 3> TR1 
        GET VR1, TR1 
        BGF Addr1 
        LFPR <Offset - 7> TR2 
        GET VR2, TR2 
        LFPR <Offset - 5> R4 
        BGF Addr2 
        MOV R5, VR2 
        LFPR <Offset - 9> MOR 
        NOP 




Addr2:  MOV VR2, R4 
Addr3:  PUT TR2, VR2 
        BPF Addr1 
        DECR R6, VR1 
        MOV VR1, R6 
        PUT TR1, VR1 
        BPF Addr1 
        BEQZ VR1, Addr4 
        DROP 
Addr4:  STPR <Offset - 5> VR2 
        FWD 
        OUT 
Figure 3.13. Equivalent Micro Instruction Sequence for 'COLLECT' 
        LFPR <Offset - 3> TR2 
        GET VR2, TR2 
        BGF Addr5 
        LFPR <Offset - 7> R8 
        MOV R6, VR2 
        OR R7, R6, R8 
        MOV VR2, R7 
        PUT TR2, VR2 
        BPF Addr2 
Addr0:  LFPR <Offset - 5> TR1 
        GET VR1, TR1 
        BGF Addr1 
        INCR R4, VR1 
        MOV VR1, R4 
        PUT TR1, VR1 
        BPF Addr2 
Addr3:  LFPR <Offset - 9> R4 
        MOV R5, VR1 
        BGE R4, R5, Addr4 
        DROP 
Figure 3.14. Equivalent Micro Instruction Sequence for 'RCHLD' 
Addr1:  MOV VR1, R1 





Addr4:  STPR <Offset - 7> R3 
        FWD 
        OUT 
Addr5:  MOV VR2, R0 
        PUT TR2, VR2 
        BPF Addr2 
        JUMP Addr0 
Figure 3.14. Equivalent Micro Instruction Sequence for 'RCHLD' (continued) 
        LFPR <Offset - 3> TR1 
        GET VR1, TR1 
        BGF Addr1 
        LFPR <Offset - 5> TR2 
        GET VR2, TR2 
        LFPR <Offset - B> R4 
        BGF Addr2 
        MOV R5, VR2 
        AND R6, R5, R4 
        BEQ R6, R4, Addr3 
Addr4:  OR R7, R5, R4 
        MOV VR2, R7 
        PUT TR2, VR2 
        BPF Addr1 
Addr3:  LFPR <Offset - 7> TR3 
        GET VR3, TR3 
        LFPR <Offset - D> R5 
        BGF Addr5 
        MOV R9, VR3 
        LFPR <Offset - F> MOR 
        NOP 
        VR3 ← R5 <op> R9 
        JUMP Addr6 
Addr2:  MOV VR2, R0 
        MOV R5, VR2 
        BNEZ R4, Addr4 
        JUMP Addr3 
Addr5:  MOV VR3, R5 
Addr6:  PUT TR3, VR3 
        BPF Addr1 
        MOV R10, VR1 
        MOV R11, VR2 
        BEQ R10, R11, Addr7 
        DROP 
Figure 3.15. Equivalent Micro Instruction Sequence for 'RCOLLECT' 
Addr7:  LFPR <Offset - 9> TR4 
        GET VR4, TR4 
        BGF Addr8 
        INCR R12, VR4 
        MOV VR4, R12 
        PUT TR4, VR4 
        BPF Addr1 
Addr10: LFPR <Offset - 10> R13 
        MOV R14, VR4 
        BGE R13, R14, Addr9 
        DROP 
Addr8:  MOV VR4, R0 





Addr9:  STPR <Offset - B> R3 
        FWD 
        STPR <Offset - D> VR3 
        OUT 
Figure 3.15. Equivalent Micro Instruction Sequence for 'RCOLLECT' (continued) 
The above presented micro instruction sequences for the five defined macro instructions 
utilize most of the Packet Related instructions, all of the GET / PUT type instructions, the 
LFPR/STPR type instructions, most of the JUMP/BRANCH type instructions, some 
(INCR, DECR, OR, AND) of the ALU/SHIFT type instructions and the MOV instruction. 
The rest of the ALU/SHIFT type instructions are included to be utilized in the 
COMPARE, COLLECT and RCOLLECT macro instructions. The remaining unused 
micro instructions are reserved for future macro instructions. 
Chapter Four 
Associative Ephemeral State Store (ESS) 
A unique requirement of ESP is a temporal Ephemeral State Store (ESS), an 
associative memory where values are bound to tag fields and a (tag, value) pair is active 
only for a given time period, resulting in a reduced capacity store that allows a more 
lightweight and scalable processing system. The ephemeral part of ESS is that the value 
corresponding to the tag is accessible only for a fixed amount of time, and bindings 
disappear after the expiration time, 'τ' seconds. Packets leave useful information in the 
ESS after computations for later packets to retrieve, which helps in implementing various 
end-to-end network services. This chapter discusses the detailed design of ESS and its 
individual components. 
4.1 ESS Design 
The ESS design is based on the method of combining some extra logic with a 
normal random access memory to create associative access. Each location stores a 
Value, an Expiration time and a control bit (empty bit - E) for the associated logic. Tags are 
stored in separate storage space and are used to find whether the required value exists in 
the ESS. The ESS supports two operations, GET and PUT. 
• GET (x): Retrieves the value bound to tag x, if any. 
• PUT (x, e): Binds the value e to the tag x. 
Depending on the result of GET and PUT, the ESS supports two more 
operations. 
• BGF addr: Branch on GET Failed to address location indicated by 'addr' 
• BPF addr: Branch on PUT Failed to address location indicated by 'addr' 
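The four operations can be illustrated with a small behavioral model of the store. This is a sketch under stated assumptions rather than the hardware design: the class name, the integer tick counter standing in for the global clock, the default lifetime, and the choice to keep the original expiration time when an existing tag is overwritten are all assumptions. GET and PUT report failure through a flag, which is what BGF and BPF would branch on.

```python
class EphemeralStateStore:
    """Behavioral model of an ESS with a fixed binding lifetime (illustrative)."""

    def __init__(self, capacity: int = 32, lifetime: int = 10):
        self.capacity = capacity   # number of (tag, value) locations
        self.lifetime = lifetime   # expiration time tau, in clock ticks
        self.clock = 0             # stands in for the global clock register
        self._entries = {}         # tag -> (value, expiration_time)

    def tick(self, ticks: int = 1) -> None:
        self.clock += ticks

    def _purge_expired(self) -> None:
        self._entries = {
            tag: entry for tag, entry in self._entries.items()
            if self.clock < entry[1]
        }

    def get(self, tag):
        """Return (value, get_failed); fails if the tag is absent or expired."""
        self._purge_expired()
        if tag in self._entries:
            return self._entries[tag][0], False
        return None, True

    def put(self, tag, value) -> bool:
        """Bind value to tag; return put_failed (True when no location is free)."""
        self._purge_expired()
        if tag in self._entries:
            # Assumption: overwriting a live binding does not extend its lifetime.
            self._entries[tag] = (value, self._entries[tag][1])
            return False
        if len(self._entries) >= self.capacity:
            return True
        self._entries[tag] = (value, self.clock + self.lifetime)
        return False


ess = EphemeralStateStore(capacity=2, lifetime=5)
put_failed = ess.put(0x1234, 1)          # new binding, expires at tick 5
value, get_failed = ess.get(0x1234)      # live binding: value 1, no failure
```

A COUNT-style update would then read as: GET the tag, branch on failure to an initialization path, otherwise increment and PUT the new value back.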
The functional blocks of ESS are: 
• Block Select Random Access Memory (RAM) used as Content Addressable 
Memory (CAM) 
• Random Access Memory (RAM) 
• Expiration time Calculating block 
• Empty Location Calculating block 
• ESS Controller 
A functional level block diagram is shown in Figure 4.1. Primary Inputs to the 
ESS are TAG, VALUE and GET or PUT operation and the primary outputs are the value 
(for GET operation) and GET Failed (GF) or PUT Failed (PF) depending on the 
operation. The main operation of the ESS is as follows. The CAM is used as a storage 
space for tags and is used to find whether there is a match for the incoming tag. On a 
match it gives the address for the RAM where the values are stored with its respective 
expiration time. Depending on the match, values are accessed based on expiration time. 
The RAM also has a separate empty bit (E) to indicate which location in RAM is empty. 
This is helpful for the PUT operation when writing a new (tag, value) pair. The empty 
location-calculating block is used to determine the empty location in RAM and CAM to 
write a new value and tag based on the empty bits from RAM. A global clock register in 
the expiration time calculating block is used to check for validity of the (tag, value) pair 
by comparing its value with the expiration time field in the ESS. The ESS controller 
generates control signals to all the blocks depending on a GET or PUT operation. 
Components of ESS can be described as follows. 
4.2 Content Addressable Memory (CAM) 
A Content Addressable Memory is a storage array designed to quickly find the 
location of a particular stored value. By comparing the input against the data in memory, 
a CAM determines if an input value matches a value stored in the array. The basic core of 
a CAM has a storage location value and a comparator between the storage location value 
and the input data. The main advantage of a CAM is that its memory size is not limited 
by its address lines and can be easily expanded. It offers increased data search speed by 
















[Figure 4.1 diagram. Recoverable labels: VALUE (from Value Register (VR)), GET Failed (GF), PUT Failed (PF).]
Figure 4.1. Functional Block Diagram of ESS 
CAM is used in the ESS design to check whether the (tag, value) pair resides in 
the ESS by comparing the incoming tag with the tags stored in it. To obtain efficient 
search of tags and high performance GET and PUT operations, a Dual-Port Block 
Select RAM of Virtex FPGA devices will be used in the later presented experimental 
model of ESPR to operate as a CAM. In the current design, the CAM is 32x64 (built 
using two 16x64 CAMs) and the depth can be increased if need be. It is built (width wise) 
using 8 basic 16x8 block RAM macros, and the depth can also be increased in a similar 
manner by including more basic blocks. As the CAM output is a decoded address, the 
depth is expandable without additional logic. Each CAM location has a single address bit 
output. When data is present at a particular address, the corresponding address line goes 
high, and it goes low when the data is not present. The basic 16x8 CAM and 16x64 CAM for the 
ESS design are shown in Figure 4.2 and Figure 4.3 respectively. 
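[Figure 4.2 (basic 16x8 CAM) and Figure 4.3 (16x64 CAM) diagrams. Recoverable labels: TAG[63:0] and 8-bit slice [7:0]; Write Port (A) (1 x 4096) with CLK_WRITE; Read Port (B) (16 x 256) with 8-bit Data Match, MATCH_ENABLE, MATCH_RST and CLK_MATCH; 16-bit MATCH[15:0] output and MATCH SIGNAL.]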
The unique Virtex block RAM approach is used to build the 16x8 CAM block. This 
methodology is based upon the true Dual-Port feature of the block Select RAM. Ports A 
and B can be configured independently, anywhere from 4096-word x 1-bit to 256-word 
x 16-bit. Each port has separate clock inputs and control signals. The internal address 
mapping of the block Select RAM is the primary feature in designing a CAM in a true 
Dual-Port block RAM. Each port accesses the same set of 4096 memory locations using 
an addressing scheme dependent on the port width. This design technique configures port 
A as 4096-word x 1-bit wide and port B as 256-word x 16-bit wide. Port A is the CAM 
write port, and port B is the CAM read or match port. Both the read and write CAM ports 
are fully synchronous and have dedicated clock and control signals. 
4.2.1. Write Operation 
The CAM write port inputs are an 8-bit data bus (Data_Write) in Figure 4.2, an 
address bus (ADDR - four bits to address the 16 locations), control signals 
(ERASE_WRITE and WRITE_ENABLE) and the clock (CLK_WRITE). The 4-bit 
address bus selects a memory location. Writing new data into this location is equivalent 
to decoding the 8-bit data into a 256-bit 'one-hot' word and storing the 256-bit word. The 
location of the 'one' is determined by the 'one-hot' decoded 8-bit value. Port A, 
configured as 4096 x 1, has a 1-bit data input and a 12-bit address input. The data input is 
set to 'one' for a write and 'zero' for an erase, and the 8-bit data plus the 4-bit 
address are merged into a single 12-bit address input. With the 8-bit data as MSB and the 4-bit 
address as LSB, the resulting 12-bit address input decodes the 8-bit data and selects one 
of the 16 memory locations simultaneously. The clock edge stores a 'one' or a 'zero' at 
the corresponding location depending on write or erase. 
4.2.2. Read Operation 
Port B of Figure 4.2 is configured as 16x256, and the 8-bit data (Data_Match) to be 
searched is connected as an 8-bit address bus. Because a particular location 
corresponds to the decoded 8-bit data, the matching operation is equivalent to searching 
16 locations for specific 8-bit data at the same time, and port B generates the matches 
concurrently. The MATCH SIGNAL is asserted high when a match occurs and the 16-bit 
output is the decoded value. MATCH_ENABLE and MATCH_RST are the control 
signals for port B. 
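The write and match operations described above can be illustrated with a small behavioural model. This is only a sketch in Python, not the VHDL of the design: the class and method names are hypothetical, and the model assumes that bit i of the 16-bit port B word corresponds to CAM location i.

```python
class Cam16x8:
    """Behavioural sketch of a 16-location x 8-bit CAM held in 4096 one-bit cells."""

    def __init__(self):
        # 4096 single-bit cells, all 'zero': no data stored at any location.
        self.cells = [0] * 4096

    @staticmethod
    def _port_a_address(data, addr):
        # 12-bit port A address: 8-bit data as MSBs, 4-bit location as LSBs.
        return ((data & 0xFF) << 4) | (addr & 0xF)

    def write(self, addr, data):
        # Write: store a 'one' at the cell selected by {data, addr}.
        self.cells[self._port_a_address(data, addr)] = 1

    def erase(self, addr, data):
        # Erase: store a 'zero' at the same cell.
        self.cells[self._port_a_address(data, addr)] = 0

    def match(self, data):
        # Port B view (256 words x 16 bits): the searched data is the word
        # address, and the word read out is the 16-bit match vector.
        base = (data & 0xFF) << 4
        vector = 0
        for location in range(16):
            vector |= self.cells[base + location] << location
        return vector


cam = Cam16x8()
cam.write(3, 0xA5)
cam.write(7, 0xA5)
match_vector = cam.match(0xA5)   # bits 3 and 7 are set
```

Because one port B read returns all 16 cells that share the same data value, every location is compared in a single access, which is the property the block RAM approach relies on.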
The base 16x64 CAM for this ESS design (32x64) can be obtained by using eight 
16x8 CAMs and extra AND gates. Eight 16x8 CAMs allow for a 64-bit width, with the 
first 8 bits stored in CAM0, next 8 bits in CAM1 and so on. A match is found only if all 
8-bit locations match the specified incoming 64-bit tag. An 8-input AND gate for each 
CAM output signal provides the final decoded address. The 16-bit MATCH output is 
then encoded to provide the 4-bit Address for ESS where value and expiration time are 
stored. Currently the ESS design has only 32 locations. This is a sufficient depth to 
validate the functionality and design of the ESS and to later experimentally validate the 
ESPR with the ESS included. 
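The width expansion can be sketched in the same behavioural style. The Python below is illustrative only: the function names are hypothetical, the least significant byte is assumed to go to CAM0 (the text does not say which byte is "first"), and the encoder is assumed to pick the lowest matching location.

```python
def slice_match(stored_bytes, key_byte):
    """16-bit match vector of one 16x8 slice: bit i set if location i holds key_byte."""
    vector = 0
    for location, byte in enumerate(stored_bytes):
        if byte is not None and byte == key_byte:
            vector |= 1 << location
    return vector


def cam16x64_match(stored_tags, tag):
    """AND the eight slice vectors, then encode the result to a 4-bit address.

    stored_tags is a list of 16 entries, each a 64-bit integer or None (unused).
    Returns (match_signal, address).
    """
    combined = 0xFFFF
    for k in range(8):
        shift = 8 * k
        key_byte = (tag >> shift) & 0xFF
        column = [None if t is None else (t >> shift) & 0xFF for t in stored_tags]
        combined &= slice_match(column, key_byte)   # 8-input AND, per location
    if combined == 0:
        return False, None
    # Encode the one-hot MATCH vector: index of the lowest set bit.
    address = (combined & -combined).bit_length() - 1
    return True, address


tags = [None] * 16
tags[5] = 0x1122334455667788
hit = cam16x64_match(tags, 0x1122334455667788)    # (True, 5)
miss = cam16x64_match(tags, 0x1122334455667700)   # (False, None)
```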
4.3 Random Access Memory (RAM) for Storage of Value, Expiration Time and 
Empty Bit 
The RAM storage of Figure 4.1 is used to store value (64 bits), expiration time (8 
bits) and an empty bit (1 bit). The current design has 32 locations and the address bits 
come from the CAM and are used to store and retrieve value, expiration time and empty 
bit. The empty bit in all locations is set to '1' initially indicating the location is empty and 
is changed to '0' whenever the location is written with a value. 
When a new (tag, value) binding is created in the ESS, the 1-bit empty 
flag of each location is checked to find available space, rather than examining the 
existing 8-bit expiration time values of all locations. This trade-off, one additional 
bit of storage for each location in the ESS, is preferred over comparing the 8-bit 
expiration time of every location with a value of zero. It replaces an 8-bit comparator 
for each location of the ESS with a 1-bit value check on each location, which 
significantly reduces the logic overhead. 
4.4 Expiration Time Calculating Block 
This block is shown in Figure 4.1 and Figure 4.4. A counter operating as a very low 
frequency clock functions as a global clock register (see Figure 4.4). Whenever a value is 
written (corresponding to any new Tag) to any location, the expiration time is calculated by 
adding the lifetime 'τ' to the global clock register value, and it is written in the 
expiration time field in the RAM. The validity of the value in the RAM is checked by 
comparing the entry in the expiration time field with the global clock register value: the 
value is treated as expired when its expiration time is less than the global clock value. 
When 8 bits are used to represent the expiration time values, 'wrap around' 
situations are possible, in which case values may incorrectly remain in 
the ESS for a longer time. As this is an initial functionality testing version of ESPR, 8 
bits are used to represent the expiration time value to check the functionality of the 
expiration time calculating block, whereas [8] suggests 10-bit values are sufficient for the 
assumption of a 10 second lifetime and a 0.1 second resolution clock. For correct 
functional operation of ESS in an experimental deployment, expiration time should be 
represented by a larger number of bits (e.g., 10 bits or more) to avoid the possibility of 
'wrap around' situations. 
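The wrap-around problem can be made concrete with a short numerical sketch. The Python below is an illustration under assumptions, not the hardware block: the function names are hypothetical, the global clock is modelled as an unsigned counter that rolls over at 2^bits, and validity uses the plain "less than" comparison described above.

```python
def expiration_time(global_clock, lifetime, bits=8):
    """Expiration time = global clock + lifetime, truncated to the field width."""
    return (global_clock + lifetime) % (1 << bits)


def is_expired(expiration, global_clock):
    """An entry is treated as expired when its expiration time is below the clock."""
    return expiration < global_clock


# Entry written at clock 200 with a lifetime of 10 ticks.
exp = expiration_time(200, 10)        # 210
just_after = is_expired(exp, 211)     # True: correctly seen as expired
after_wrap = is_expired(exp, 5)       # False: the 8-bit clock has wrapped to 5,
                                      # so the stale entry looks valid again
```

In this model an entry that is not examined before the counter rolls over appears valid again, which is the "longer time" case noted in the text. A wider field lengthens the interval between roll-overs but does not remove the effect.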
[Figure 4.4 labels: EXP. TIME (8 bits) from RAM; EXP. TIME (8 bits) to RAM; Lifetime Expired to ESS; GLOBAL CLOCK REGISTER]
Figure 4.4. Expiration Time Calculating Block 
4.5 Empty Location Calculating Block 
This block (see Figure 4.1) is used to determine the new location to write value 
and expiration time in RAM and tag in CAM. The empty bits from RAM are input to this 
block and it determines the output address bits to write a new (tag, value) pair. 
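A minimal sketch of this block, assuming a lowest-address-first priority (the text does not state which empty location is chosen when several are free), is shown below. The function name and return convention are hypothetical.

```python
def find_empty_location(empty_bits):
    """Scan the empty bits ('1' = empty) and return (check_empty, address).

    check_empty is True when at least one location is free; address is the
    lowest free location, or None when the store is full.
    """
    for address, bit in enumerate(empty_bits):
        if bit == 1:
            return True, address
    return False, None


# 32-location store in which locations 0 and 1 are already filled.
bits = [0, 0] + [1] * 30
found, address = find_empty_location(bits)   # (True, 2)
```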
4.6 ESS Controller 
Depending on the GET or PUT operations, the controller of Figure 4.1 generates 
control signals to all the blocks in ESS. The inputs to the controller are GET or PUT 
operation from the instruction decode stage of ESPR, the check empty signal from the 
empty location calculating block, the lifetime expired signal from the expiration time 
calculating block and the MATCH SIGNAL from the CAM. Two outputs are the control 
signals GF (GET FAILED) and PF (PUT FAILED). WRITE_ENABLE, 
ERASE_WRITE, MATCH_RST and MATCH_ENABLE are output control signals to 
the CAM, we (write enable) is a control signal to the RAM and cnt (count) is a control signal 
to the expiration time calculating block. 
4.7 Operations Performed in ESS and Flowchart 
The flowchart of ESS operations shown in Figure 4.5 describes the GET and PUT 
operations, which are explained below. 
4.7.1. GET Operation 
The incoming tag from the tag register is given to the CAM to check for a match 
and for the availability of a value in the ESS. If a match occurs, the CAM asserts the 
MATCH SIGNAL high and gives the address to the RAM to get the value. The 
expiration time in that address is read out and given to the expiration time block to check 
the validity of the data. If the value is not expired, it is read out. If it is expired, a null 
value is returned and the controller gives a GF (GET FAILED) output indicating a GET 
failure. This location is then cleaned up, and the empty bit in that location is set to '1' 
indicating that the location is empty. If there is no match, a null value is returned and the 
controller generates a GF (GET FAILED) output indicating a GET failure. 
4.7.2. PUT Operation 
The incoming tag from the tag register is given to the CAM to check whether the 
(tag, value) binding already exists in the ESS. If a match occurs, the CAM asserts the 
MATCH SIGNAL high and gives the address to the RAM to get the value. The 
expiration time in that address is read out and given to the expiration time block to check 
the validity of the data. If the value is not expired, a new value is written in that location 
and the empty bit is set to '0'. If the value is expired, a new value is written, the empty bit 
is set to '0' and the expiration time is reset again in that location. If there is no match, the 
empty location calculating block looks for an empty location in which to write the new tag 
and value. If there is an empty location, the address of the new location is given to the CAM 
to write the tag and to the RAM to write the value and expiration time, and the empty bit is 
set to '0'. If there is no empty location, the controller generates a PF (PUT FAILED) 
output indicating a PUT failure. Whenever the PUT operation is completed successfully 
the empty bit of the location is reset to '0' indicating that the location is filled. 
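The GET and PUT sequences described above can be summarised in a behavioural sketch. The Python below is not the ESS controller itself: the class and method names are hypothetical, the clock is an unbounded integer (so wrap-around is not modelled), the null value is taken to be 0, and erasing the tag when an expired entry is cleaned up is an assumption about what "cleaned up" involves.

```python
class EphemeralStateStore:
    """Behavioural sketch of the ESS: tag match, expiry check, empty bits."""

    def __init__(self, depth=32):
        self.tags = [None] * depth     # CAM contents
        self.values = [0] * depth      # RAM: value field
        self.expires = [0] * depth     # RAM: expiration time field
        self.empty = [1] * depth       # RAM: empty bit, '1' = empty

    def _match(self, tag):
        # CAM lookup: address of the matching location, or None.
        for address, stored in enumerate(self.tags):
            if stored is not None and stored == tag:
                return address
        return None

    def get(self, tag, now):
        """Return (value, GF); GF = 1 signals GET FAILED."""
        address = self._match(tag)
        if address is None:
            return 0, 1                        # no match: null value, GF = 1
        if self.expires[address] < now:
            self.tags[address] = None          # assumed part of the clean-up
            self.empty[address] = 1
            return 0, 1                        # expired: null value, GF = 1
        return self.values[address], 0

    def put(self, tag, value, now, lifetime):
        """Return PF; PF = 1 signals PUT FAILED."""
        address = self._match(tag)
        if address is not None:
            self.values[address] = value
            if self.expires[address] < now:    # expired: reset expiration time
                self.expires[address] = now + lifetime
            self.empty[address] = 0
            return 0
        for address, bit in enumerate(self.empty):
            if bit == 1:                       # first empty location
                self.tags[address] = tag
                self.values[address] = value
                self.expires[address] = now + lifetime
                self.empty[address] = 0
                return 0
        return 1                               # store full: PF = 1


ess = EphemeralStateStore()
pf = ess.put(0xAA, 7, now=0, lifetime=10)
value, gf = ess.get(0xAA, now=5)     # value 7, GF = 0
```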
4.7.3. Branch on GET Failed (BGF) / Branch on PUT Failed (BPF) Operation 
The BGF / BPF micro instructions branch to a 16-bit address specified in the 
instruction, on failure of GET / PUT respectively. These micro instructions are actually 
performed by the Branch Detection Unit of the ESPR depending on the result of ESS 
operations. 
4.8 ESS Scalability, Size and Performance 
The ESS organization, architecture and design form a key functional unit related to 
the performance and scalability of the ESP service. We now discuss the scalability and 
performance of the current ESS design. The main components of the ESS described 
above are CAM and RAM, and the scalability has to be defined in terms of these 
memories. The CAM and RAM memories described above can be implemented using 
core block RAM of the PLD technology chips in which the ESPR would be implemented. 
The size of these memories can be expanded by adding the required core RAM on-chip 
and the required bits in the existing design, without any change to the existing 
controller design. 
Thus the presented design for ESS is scalable, dependent upon the capacity of 
block RAM memory available in PLD chips, and the depth of ESS can be extended 
accordingly. Under this limitation the same organization, architecture and design 
for ESS can be used to implement ESS off chip. Therefore, an off-chip implementation of 
ESS is also scalable. The price paid here is a slight decrease in performance of ESS 
because of the time required by the signals to travel through the additional circuitry in 
reconfigurable PLD chips. 
The current ESS design has only 32 (32x137) locations. This was sufficient to 
allow its functional validation. The same design can be expanded to a deeper ESS 
assuming sufficient on-chip core RAM. For ESPRs implemented in PLD technology, the 
core RAM determines the size and performance of ESS, and it can be tuned as necessary 
by the utilizing application by including additional block RAM in the design. 
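The 137-bit entry width appears to follow from the field widths given earlier in this chapter (64-bit tag, 64-bit value, 8-bit expiration time, 1-bit empty flag). The short Python sketch below works through that arithmetic; the decomposition and the function name are assumptions made for illustration, and the figures count logical storage bits only, not the block RAM actually consumed by the CAM.

```python
TAG_BITS = 64      # tag held in the CAM
VALUE_BITS = 64    # value held in the RAM
EMPTY_BITS = 1     # empty flag held in the RAM


def ess_storage_bits(depth, exp_time_bits=8):
    """Return (bits per entry, total logical bits) for an ESS of the given depth."""
    entry_bits = TAG_BITS + VALUE_BITS + exp_time_bits + EMPTY_BITS
    return entry_bits, depth * entry_bits


current = ess_storage_bits(32)        # (137, 4384)
wider = ess_storage_bits(32, 10)      # (139, 4448) with a 10-bit expiration time
```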
[Flowchart: for both GET and PUT, the TAG is given to the CAM, which returns the MATCH SIGNAL and MATCH ADDRESS. GET: on no match or an expired entry, GF = 1 and VALUE = 0 (an expired entry also has EMPTY written to RAM); otherwise GF = 0, ADDR is given to RAM and VALUE is read out. PUT: on a match, ADDR is given to RAM and EXP. TIME is read out, then VALUE and EMPTY (and EXP. TIME if expired) are written to RAM; on no match, the EMPTY check selects a location for writing VALUE, EXP. TIME and EMPTY, or PF = 1 if no location is empty.]
Figure 4.5. ESS Operations - Flow Chart 
Chapter Five 
Ephemeral State Processor Version 1 (ESPR.V1) Architecture 
This chapter deals with the development of the Ephemeral State Processor 
Architecture - Version 1. Later in this thesis, to improve performance, a 
second version of ESPR will be developed and tested. The chapter describes the overall 
system architecture design of ESPR including all connectivity between functional 
modules, and a performance improving pipelined version - ESPR.V1 with its micro 
controller design. 
It is envisioned that the Ephemeral State Processor (ESPR) that performs ESP 
functions will be hardwired in the network layer of routers. It does processing on 
incoming packets and packet processing can occur before or after the routing lookup. The 
packets will come in through the input ports and be processed by the ESPR and passed 
out to the route lookup or output ports and forwarded to the next available ESP capable 
router. 
The incoming packet is stored in the Packet RAM and the ESPR Macro 
Controller decodes the macro opcode network instruction and generates the address of the 
first micro instruction that must be executed to implement the decoded macro instruction. 
The micro instruction memory (see Figure 3.1) is preloaded with the set of micro code 
sequences for corresponding network macro instructions. After a particular micro level 
program for a network instruction is initiated in memory, the Micro Controller takes over 
and generates control signals for all functional modules of the architecture as each micro 
level instruction executes. After the processing is over, according to instructions, the 
packet is either silently dropped or passed on to the next available ESP capable router. 
The pipelined architecture implements required processing of received packets. 
This chapter also evaluates the requirements for the development of this 
processor and how they are met, and briefly discusses the general-purpose versus 
special-purpose approach to ESPR design. Finally an analytical 
performance model for ESPR is devised and presented. 
5.1 Basic Register/Register Architecture Development with ESPR Components 
The Micro Instruction Memory of the basic ESPR architecture shown in Figure 
5.1 is preloaded with the micro code sequences for corresponding network macro 












[Figure 5.1 block diagram. Recoverable labels: From Inst. (16) To Mux; From Microcontroller; Output Packet; <VAR. FIELD>; PR (64); Fwded From PR; Output value to Register files.]
Figure 5.1. Basic ESPR Architecture 
Flag Register (FLR) is an 8-bit register which holds the corresponding bit patterns for the 
setting of 'LOC' and 'E' (Error) bits in the packet and it is always given to the Flag field 
in the first location of Packet RAM. The CRC-32 block calculates the Cyclic Redundancy 
Check code (CRC) using CRC-32 polynomial and places the resultant CRC code back in 
the packet. The Output Code Register (OCR) generates output code depending on 
whether the packet is Aborted, Forwarded or Dropped. The opcode field in the packet is 
given to the Macro Controller, which on decoding the opcode generates the required 
address to store in the PC of the micro instruction memory. This address corresponds to 
the address location in the Instruction Memory where the micro code sequence for a 
particular network macro instruction is stored. Register (REG) is used to hold the PC 
value which helps in the RET micro instruction. 
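As an illustration of the CRC-32 computation mentioned above, the following Python sketch computes a CRC bit by bit and checks it against the standard library. It assumes the common IEEE 802.3 CRC-32 parameters (reflected polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF), which are the ones `zlib.crc32` implements; the text only says "CRC-32 polynomial", so the exact conventions used by the ESPR block may differ.

```python
import zlib


def crc32_bitwise(data: bytes) -> int:
    """Bit-serial CRC-32 (assumed IEEE 802.3 parameters, reflected form)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


payload = b"123456789"
code = crc32_bitwise(payload)
same_as_library = code == zlib.crc32(payload)
```

The inner loop processes one bit per iteration, which mirrors how a shift-register implementation in hardware would advance one bit per clock.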
After a particular micro code sequence is initiated in the instruction memory, 
every instruction is decoded and processed. The source and destination General Purpose 
Register (GPR) numbers are decoded from the micro instruction and values are loaded 
from/to the General Purpose Register (GPR) file for further computations. The Tag 
Register (TR), Value Register (VR) numbers are also decoded from the micro instruction 
and values are loaded from/to corresponding register files. The register write signal for 
the three register files is provided from the micro instruction and the register read signal 
for the register files is obtained from the micro controller. 
Ephemeral State Store (ESS) performs the GET and PUT instructions of storing 
the (tag, value) pairs and the Lifetime calculation circuit calculates the expiration time for 
each (tag, value) pair. The Condition Code Register (CCR) stores the resultant GF (GET 
FAILED) and PF (PUT FAILED) bits from ESS. The micro instruction opcode is given 
to the Micro Controller, which decodes it and generates control signals for all functional 
units of the architecture. An 8-bit Micro Opcode Register (MOR) stores the micro opcode 
carried in the packet for COMPARE, COLLECT and RCOLLECT operations and is also 
given to the Micro Controller for decoding. The Load Micro Opcode Register (LMOR) 
control signal for this MOR is obtained from the micro instruction. The ALU and Shifter 
perform the arithmetic and logical computations and store the result back into registers. 
The ALU provides overflow, sign and zero status signals that are stored in a 3-bit status 
register. On an overflow exception, the PC is loaded with a specific address by which the 
microcode sequence aborts and the processing stops. The SIGN EXTEND unit is used to 
extend the 16-bit value to a 64-bit value which helps in the MOVI micro instruction. 
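A minimal sketch of the sign extension is given below. Because the thesis states elsewhere that the sign bit is supplied by the micro instruction, the sign is passed in here as a separate argument rather than derived inside the function; the function name and this interface are assumptions made for illustration.

```python
def sign_extend_16_to_64(imm16, sign_bit):
    """Extend a 16-bit immediate to 64 bits by replicating sign_bit into bits 63..16."""
    imm16 &= 0xFFFF
    upper = 0xFFFFFFFFFFFF0000 if sign_bit else 0
    return upper | imm16


positive = sign_extend_16_to_64(0x0005, 0)   # 0x0000000000000005
negative = sign_extend_16_to_64(0xFFFB, 1)   # 0xFFFFFFFFFFFFFFFB, i.e. -5 in two's complement
```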
The Macro level system flowchart in Appendix B shows the overall macro level 
operation of the ESPR of Figure 5.1 as packets arrive at the input and the appropriate 
macro level instruction is executed. Each step in the macro instruction flowchart 
corresponds to the execution of a micro instruction. The Micro level system flow charts 
in Appendix C and Appendix D represent the clock cycle by clock cycle operation of the 
micro instructions as they are fetched and executed. Each rectangular function block of 
the micro level system flow chart contains a Register Transfer Level (RTL) description of 
the micro operations executed during the clock cycle associated with the function block. 
The first two function blocks, common to all micro instructions, represent instruction 
fetching, and the micro operation(s) in each block represent activation of the corresponding 
functional modules in the architecture for every clock cycle. After the instruction is 
fetched each instruction is decoded and executed separately. The ESPR architecture and 
system flow charts are developed concurrently. 
5.2 Four Stage Pipelined Architecture (ESPR.V1) 
To improve the performance of the ESPR by increasing the processing speed, the 
basic ESPR architecture of Figure 5.1 is transformed to a pipelined architecture as shown 
in Figure 5.2. It is a 4-stage pipeline with Instruction Fetch (IF), Instruction Decode (ID), 
Instruction Execute (EX) and Write Back (WB) stages. All instructions advance during 
each clock cycle from one pipeline register to the next. The first stage, Instruction Fetch, 
is common to all instructions. This stage contains the Program Counter (PC), Micro 
Instruction Memory, Register (REG) and a multiplexer. PC is loaded with an address 
from the multiplexer depending on micro/macro instructions, overflow exception from 
ALU or incremented PC value. Instructions are read from the instruction memory using 
the address in PC and then placed in the IF/ID pipeline register. The PC address is 
incremented by one and then loaded back into the PC to be ready for the next clock cycle. 
REG is used to hold the address from PC whenever the JMP micro instruction is 
encountered. 
Figure 5.2. Four-Stage Pipelined ESPR.V1 Architecture 
The Hazard detection unit generates the control signals for the 
PC and the IF/ID pipeline register. The instruction is then supplied from the IF/ID 
pipeline register to the Instruction Decode (ID) stage. It supplies a 16-bit offset, used 
to calculate the offset for the packet register in the Execute stage, and a 16-bit immediate 
field to the Sign Extend block that sign extends the 16-bit value to a 64-bit value. The 
sign bit for the Sign Extend unit comes from the micro instruction. It also supplies the 
register numbers to read Tag Registers (TR), Value Registers (VR), or General Purpose 
Registers (RS1, RS2 and RD). The Register Write signal and Write data value for the 
register files come from the WB stage. All these values are stored in the ID/EX pipeline 
register along with the output values Read data 1 and Read data 2 from the general purpose 
register file, tag readout from the tag register file, value readout from the Value register file 
and the sign extended output value for computations in the EX stage. The ID stage also 
contains the Micro Controller, which decodes the opcode in the instruction and generates 
control signals for the Execute (EX) stage and Write Back (WB) stage. These control 
signals are forwarded to the ID/EX and EX/WB pipeline registers where they are utilized. 
The Micro Controller also generates values to be stored in the Flag Register (FLR) and 
Output Code Register (OCR) in the EX stage. 
Execution then takes place in the Execute (EX) stage either in the ESS, 
ALU/SHIFTER, in the Packet Register or in the Branch detection unit. The values stored 
in the ID/EX pipeline register from the ID stage are given to the corresponding execution 
modules. The multiplexers at the input of ESS choose tag and value for ESS 
computations. The tag and value to the multiplexers come either from registers in the ID 
stage, from the packet register or from the ALU output. The Condition Code Register 
(CCR) holds Get Failed (GF) and Put Failed (PF) outputs from the ESS. The multiplexers 
at the input of the ALU choose values for ALU computations either from registers in the 
ID stage, from the packet register, from the ALU output, from the ESS output, or from 
the sign extend block. The Shifter gets values mostly from the general-purpose registers 
through the ALU pass through mode. The multiplexer at the input of PR chooses values 
for the STPR micro instruction either from registers in the ID stage, from the ALU output 
or from the ESS output. The FLR gets its value from the ID stage micro controller and 
connects it to the flag field of PR. The OCR gives the output code from the ID stage 
micro controller to an output port. The jump and conditional branch type micro 
instructions are executed using the Branch detection unit. Two register values are given 
as input to the branch detection unit to check for the equality or inequality depending on 
the type of micro instructions. The multiplexers in front of the Branch detection unit 
choose value from general-purpose registers or from the ALU output. The micro 
controller generated control signals for the execution modules are given to the respective 
modules and the control signals for the WB stage are forwarded to the EX/WB pipeline 
register. The resultant values of execution are also stored in the EX/WB pipeline register. 
After the execution of instructions, results are written back to registers and this 
takes place in the Write Back (WB) stage. The WB stage result is written back to 
registers using a multiplexer. The control signal for this multiplexer comes from the WB 
stage control signal and it chooses between ALU output and ESS output to write back to 
registers in the ID stage. 
Potential hazards such as Data hazards and Branch hazards may arise in a 
pipelined architecture. The hazard detection unit detects any data hazard and stalls the 
pipeline when necessary. This hazard detection unit controls the writing of the PC and 
IF/ID registers plus the multiplexers that choose between the real control values and all 
0s. A multiplexer in the ID stage and EX stage is used to reset the control signals to '0' 
for stalls. 
A data hazard is detected when the write register of the previous instruction is the 
same as the read register of the next instruction. So in this case, the next instruction reads 
the wrong value of the read register because the write register would not contain the 
correct value in this stage. The forwarding unit in the EX stage helps in eliminating data 
hazards by forwarding the result from the ALU output back as the register value for the 
next instruction instead of waiting to get the result from the WB stage. This forwarding 
unit generates control signals for the multiplexers in front of the ALU, ESS, PR and 
Branch detection unit to choose the value from the ALU output directly instead of from 
the register input. The WB control signals, opcode from the ID stage and register 
numbers are given to the forwarding unit that helps to forward the result for correct 
execution. 
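The data hazard condition described above (the write register of the previous instruction equals a read register of the next instruction) can be sketched as a small decision function. The Python below is illustrative only: the record type, field names and the boolean select outputs are hypothetical, and it covers only the register-number comparison, not the opcode-dependent cases that the real forwarding unit also considers.

```python
from dataclasses import dataclass


@dataclass
class PreviousInstruction:
    """Write-back information of the instruction one stage ahead (hypothetical fields)."""
    reg_write: bool   # REGWRITE control signal
    rd: int           # destination register number


def forward_selects(prev, rs1, rs2):
    """Return (forward_a, forward_b) for the two operand multiplexers.

    True selects the forwarded ALU output; False selects the value read
    from the register file in the ID stage.
    """
    forward_a = prev.reg_write and prev.rd == rs1
    forward_b = prev.reg_write and prev.rd == rs2
    return forward_a, forward_b


# ADD R3, R1, R2 followed by SUB R5, R3, R4: operand a of SUB must be forwarded.
selects = forward_selects(PreviousInstruction(reg_write=True, rd=3), rs1=3, rs2=4)
```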
One solution to resolve a branch hazard is to stall the pipeline until the branch is 
complete. A common improvement over stalling upon fetching a 
branch is to assume the branch will not be taken and to continue executing down the 
sequential instruction stream. If the branch is taken, the instructions that are being fetched 
and decoded must be discarded, and execution continues at the branch target. To discard the 
instructions the controller flushes the instructions in the IF, ID and EX stages of the 
pipeline. After the branch condition is evaluated in the Branch detection unit, if the 
branch has to be taken, the multiplexer in front of the PC chooses the new branch 
target address. To flush instructions in the IF stage, a control line called IF Flush is 
added, which resets the instruction field of the IF/ID pipeline register to '0' to flush the 
fetched instruction. A control signal called IDFlush is used to flush instructions in the ID 
stage. The EXFlush control signal is used to flush the already executed instructions in the 
EX stage. The micro controller determines whether to send a flush signal depending on 
the instruction opcode and the value of the branch condition being tested. 
The pipelined architecture system flow chart in Appendix C shows the stage-by-stage 
operation of all the micro instructions in a pipelined architecture. Most of the 
instructions take a single execution phase. The ESS (GET / PUT) instructions may take 
more than one clock cycle (at most 5 clock cycles) to execute. So the ESS has to operate 
at five times the frequency of the overall ESPR. 
5.3 Micro Controller 
The Micro Controller is located in the ID stage of the pipeline and will be 
required to generate 25 control signals to implement all defined micro instructions. The 
final Micro Controller may be predominantly pipelined combinational logic whose input 
is the Opcode (6 bits) and whose outputs are the control signals identified within this 
section. It generates control signals for the ID stage, EX stage and WB stage. The ID 
stage control signals are REGREAD, JMPINST and RETINST. REGREAD is supplied 
to the General Purpose, Tag and Value Register files. JMPINST and RETINST are used 
to determine the flushing of pipeline stage registers. 
The EX stage control signals are given to the Packet processing unit for the PR, 
ESS controller in ESS, ALU, Shifter and Branch detection unit. The control signals for 
the Packet processing unit are LFPRINST, STPRINST, ININST, OUTINST, LDPKREG, 
LDOCR and LDFLR. LFPRINST, STPRINST and ININST correspond to the micro 
instructions LFPR, STPR and IN. The control signal OUTINST corresponds to the OUT 
micro instruction. LDPKREG, LDOCR and LDFLR are given to the Packet RAM (PR), 
Output Code Register (OCR) and Flag Register (FLR) respectively. The control signals 
for the ESS unit are GETINST, PUTINST and LDCCR. GETINST and PUTINST signals 
are given to the ESS controller to perform GET and PUT operations. LDCCR is the 
control signal for the Condition Code Register (CCR). The Shifter control signals are S0, 
S1 and S2 and ALU control signals are S3, S4, S5 and Ci. The function tables for the 
Shifter, ALU and Branch detection unit are shown in Table 5.1, Table 5.2 and Table 5.3 
respectively. 
Table 5.1. Function Table for Shifter 
CTRL SIGS (S0, S1, S2)    OPERATION 
000                       PASS THROUGH 
001                       SHIFT LEFT (SHL) 
010                       SHIFT RIGHT (SHR) 
011                       ROTATE LEFT (ROL) 
100                       ROTATE RIGHT (ROR) 
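The function table for the Shifter can be expressed as a small behavioural sketch on 64-bit operands. The Python below is an illustration under assumptions: the control code is read as a 3-bit number with S0 as the most significant bit, the shift amount argument (shamt) is taken modulo 64, and SHR is modelled as a logical shift, none of which is stated explicitly in the table.

```python
MASK64 = (1 << 64) - 1


def shifter(ctrl, value, shamt):
    """Apply the Table 5.1 operation selected by ctrl to a 64-bit value."""
    value &= MASK64
    shamt %= 64
    if ctrl == 0b000:                       # PASS THROUGH
        return value
    if ctrl == 0b001:                       # SHIFT LEFT (SHL)
        return (value << shamt) & MASK64
    if ctrl == 0b010:                       # SHIFT RIGHT (SHR), assumed logical
        return value >> shamt
    if ctrl == 0b011:                       # ROTATE LEFT (ROL)
        return ((value << shamt) | (value >> (64 - shamt))) & MASK64
    if ctrl == 0b100:                       # ROTATE RIGHT (ROR)
        return ((value >> shamt) | (value << (64 - shamt))) & MASK64
    raise ValueError("control code not defined in Table 5.1")


shifted = shifter(0b001, 1, 4)              # 16
rotated = shifter(0b100, 1, 1)              # bit 0 moves to bit 63
```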
Table 5.2. Function Table for ALU 
CTRL SIGS (S3, S4, S5, Ci) OPERATION 
0000 PASS THROUGH for a 
0001 PASS THROUGH for b 
0010 ONES COMPLEMENT for a 










1101 NEGATIVE of a 
1110 NEGATIVE of b 
Table 5.3. Function Table for Branch detection unit 
CTRL SIGS (BRANCH TYPE) OPERATION 
000 BLT 






111 BPF 
The WB stage control signals generated by the micro controller are S6 and REGWRITE. 
The S6 control signal is given to the multiplexer in the WB stage to choose between the 
ALU and ESS outputs. The REGWRITE control signal is connected back to the General 
Purpose Register file in the ID stage. Two additional control signals for the WB stage, 
TAG REGISTER WRITE and VALUE REGISTER WRITE, come from the micro 
instruction and are given to the Tag Register file and Value Register file respectively. 
The active control signals involved in the proper execution of each micro instruction are 
shown below in Table 5.4. For each micro instruction the remaining control signals apart 
from the active ones are interpreted as inactive during its execution. Each control signal 
is a control point identified within the final ESPR.V1 architecture shown in Figure 
5.2. 
5.4 ESPR.V1 Requirements Evaluation 
In designing a processor to handle special functions, a choice must be made 
between the available design options: either designing a special 
functionality coprocessor to perform special functions, connected to a 
general purpose processor that handles other general purpose functions, or designing a stand 
alone special purpose processor. This section discusses requirements evaluation of the 
Ephemeral State Processor and the next section discusses the options available in 
designing ESPR. The components and functional unit requirements of the ESPR system 
defined in Chapter 3 - Section 3.1 can be evaluated as follows to give support to the 
architectural design of ESPR. 
Table 5.4. Control Signals for Micro Instructions 
MICRO INSTRUCTIONS    CONTROL SIGNALS 
NOP                   No Active Signals 
IN                    ININST, LDPKREG 
OUT                   OUTINST 
FWD                   LDOCR, OCR = 0000 0001 
ABORT1                LDOCR, LDFLR, OCR = 0000 0010, FLR = 0000 0000 
DROP                  LDOCR, OCR = 0000 0011 
CLR                   REGREAD, REGWRITE, S6 
MOVE                  REGREAD, REGWRITE, S6 
MOVI                  REGWRITE, S6 
ADD                   S3, S4, S5, Ci, REGREAD, REGWRITE, S6 
SUB                   S3, S4, S5, Ci, REGREAD, REGWRITE, S6 
INCR                  S3, S4, S5, Ci, REGREAD, REGWRITE, S6 
DECR                  S3, S4, S5, Ci, REGREAD, REGWRITE, S6 
OR                    S3, S4, S5, Ci, REGREAD, REGWRITE, S6 
AND                   S3, S4, S5, Ci, REGREAD, REGWRITE, S6 
EXOR                  S3, S4, S5, Ci, REGREAD, REGWRITE, S6 
COMP                  S3, S4, S5, Ci, REGREAD, REGWRITE, S6 
SHL                   S0, S1, S2, REGREAD, REGWRITE, S6 
SHR                   S0, S1, S2, REGREAD, REGWRITE, S6 
ROL                   S0, S1, S2, REGREAD, REGWRITE, S6 
ROR                   S0, S1, S2, REGREAD, REGWRITE, S6 
LFPR                  LFPRINST, REGREAD, REGWRITE, S6 
STPR                  STPRINST, REGREAD 
BRNE                  BRANCH TYPE, REGREAD 
BREQ                  BRANCH TYPE, REGREAD 
BRGE                  BRANCH TYPE, REGREAD 
BNEZ                  BRANCH TYPE, REGREAD 
BEQZ                  BRANCH TYPE, REGREAD 
JMP                   JMPINST 
RET                   RETINST 
GET                   GETINST, REGREAD, LDCCR 
PUT                   PUTINST, REGREAD, LDCCR 
BGF                   BRANCH TYPE, REGREAD 
BPF                   BRANCH TYPE, REGREAD 
ABORT2                LDOCR, LDFLR, OCR = 0000 0100, FLR = 0000 0001 
BLT                   BRANCH TYPE, REGREAD 
SETLOC                LDFLR 
The conceptual description of Ephemeral State Processing (ESP) requires no data 
memory in the design except the Ephemeral State Store (ESS), and so the ESPR is 
designed as a basic Register/Register Architecture. Based on the ESP requirements 
described in Chapter 2, the main component in designing the ESPR is the ESS, 
and a scalable associative memory is designed to meet this requirement. To differentiate 
special operations carried out on tags and values, a separate tag register file and value 
register file are included along with the general-purpose register file for normal 
operations. With the current set of macro level instructions, the number of registers is 
designed to be 32 for validation purposes. This set may grow over time depending upon 
the ESP design requirements. The existing unused bits in the micro instruction can be 
used to add register numbers. The tags and values in ESP are 64 bits wide and so the 
registers are designed to hold 64-bit wide values. All the basic operations in the ESPR are 
carried out on 64-bit wide operands and so the ALU, Shifter and rest of the components 
are designed to handle operands in this manner. 
A RAM (Packet RAM - PR) is designed for storing and processing incoming 
packets. The PR has 128 locations, each 32 bits wide. These dimensions are 
based on the maximum packet size and the incoming packet block width. Also, offset handling is 
easy in RAM because the operands in a packet are either 8 or 64 bits wide. Thus a RAM 
is used to store packets to provide efficient PLD resource utilization. As the status of the 
current packet, after processing, has to be indicated to the next node, an Output Code 
Register is used to store the status of the current ESP packet. The Flag Register is used to 
operate on flag fields individually. The Micro Opcode Register is used to store the micro 
opcode carried in a packet which may be further used for 'COMPARE', 'COLLECT' or 
'RCOLLECT' macro instructions. The Status Register is used to hold the status after 
ALU operation and the Condition Code Register is used to hold the status after ESS 
operation. 
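The offset handling described for the Packet RAM can be illustrated with a minimal sketch. It models 128 locations of 32 bits and reads 8-bit or 64-bit operands by byte offset. The class name, method names and the big-endian byte order are assumptions made for illustration only; they are not taken from the ESPR design.

```python
class PacketRAM:
    """Toy model of the Packet RAM: 128 locations, each 32 bits wide."""

    DEPTH = 128

    def __init__(self):
        self.words = [0] * self.DEPTH

    def write_word(self, index, value):
        # One 32-bit block of the incoming packet is stored per location.
        self.words[index] = value & 0xFFFFFFFF

    def read_byte(self, byte_offset):
        # Assumption: byte 0 is the most significant byte of word 0.
        word = self.words[byte_offset // 4]
        shift = 8 * (3 - byte_offset % 4)
        return (word >> shift) & 0xFF

    def read_operand(self, byte_offset, width_bits):
        # Operands in a packet are either 8 or 64 bits wide.
        if width_bits not in (8, 64):
            raise ValueError("operand width must be 8 or 64 bits")
        value = 0
        for i in range(width_bits // 8):
            value = (value << 8) | self.read_byte(byte_offset + i)
        return value


pr = PacketRAM()
pr.write_word(0, 0x0A000000)  # an 8-bit operand in the first byte
pr.write_word(1, 0x00000000)  # upper half of a 64-bit operand
pr.write_word(2, 0x00000005)  # lower half of a 64-bit operand
first_byte = pr.read_operand(0, 8)
wide_operand = pr.read_operand(4, 64)
```

A 64-bit operand spans exactly two 32-bit locations when it starts on a word boundary, which is one way to see why a 32-bit wide RAM keeps offset arithmetic simple.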
A separate block is needed for calculation of the Cyclic Redundancy Check (CRC), 
which helps to detect whether an error occurred when the packet was received, 
before processing (refer to Appendix B). Macro and Micro Controllers are necessary for 
their respective control operations. As network macro instructions grow over time, there 
may be a future need for additional micro instructions and 
additional registers. To reflect this and also that basic operations are over 64-bit values, 
the micro instruction width is chosen to be 64 bits. Additional components such as 
the program counter, decoders and multiplexers meet basic requirements for a processor 
design. 
5.5 Special-Purpose Versus General-Purpose approach to ESP 
The two options available in designing the ESPR are, 
• Designing a special-purpose processor for ESP as the one described in this 
Chapter 
• Designing a special functionality coprocessor that can be linked to a 
general purpose processor 
A special-purpose processor can be designed with the pipelined architecture 
described in Section 5.2 of this Chapter. The second alternative is to first design a 
coprocessor that handles the functions corresponding mainly to the ESS and the Packet RAM, 
together with its control unit, and then to connect a general-purpose processor to this 
coprocessor module so that the two together perform ESP. 
Referring to the ESPR system requirements, most of the main components are 
special functional units, including the ESS, Packet RAM, Tag and Value 
register files, Macro controller, Output Code Register, Flag Register, Micro Opcode 
Register, Condition Code Register and CRC block. In addition, most of the micro 
instructions in the micro instruction sequences that represent macro instructions use 
the special functional units described above, mainly the ESS and the Packet Register. 
Few of the micro instruction sequences for macro instructions involve general-purpose 
components. The instruction set includes many general-purpose type 
instructions only in anticipation of future needs. Considering the above ESPR 
requirements, and to eliminate I/O overhead between the general-purpose processor 
and the coprocessor, an initial step was taken in designing a special-purpose pipelined 
processor for ESP rather than following a general-purpose approach. A comparison of the cost and 
performance of the special-purpose versus general-purpose ESPR has not been conducted. 
The special-purpose ESPR design is an initial step towards implementing ESP in routers. 
5.6 Analytical Performance Model for ESPR 
The performance of any processor can be measured by the time it takes to 
complete a specific task, which is commonly described as CPU execution time. CPU 
execution time depends on how fast the hardware can complete basic functions, which in 
turn can be a function of the clock frequency at which the processor performs its 
operations, in addition to other factors. A simple formula that defines the basic 
performance measure - CPU execution time [27], for a processor that performs its 
operations sequentially can be given as, 
$$\text{CPU execution time for a program} = (\text{Instruction count for a program}) \times (\text{avg. clock cycles per instruction}) \times (\text{clock cycle time}) \tag{1}$$
A program is comprised of a number of instructions and in a sequential processor, 
each instruction takes a different number of clock cycles (more than one) to complete its 
required function, so the term average clock cycles per instruction is used, and clock 
cycle time is the basic clock cycle period for the processor. The above equation makes it 
clear that, the performance can be improved by reducing either the clock cycle period or 
the number of clock cycles required for a program. Accordingly, performance 
improvement can be obtained by a pipelined implementation of the processor, as was 
done for the ESPR. Pipelining thus reduces the average execution time per instruction; 
however, there is degradation in the expected performance of pipelined processors due to 
pipeline stalls. If the stages of the pipeline are perfectly balanced, then the time per 
instruction in a pipelined machine [27] is equal to, 
$$\frac{\text{Time per instruction on unpipelined machine}}{\text{Number of pipe stages}} \tag{2}$$
Under these conditions, the speed up from pipelining equals the number of pipe stages. 
The ideal CPI ( clock cycles per instruction) for a pipelined machine [27] can be given as, 
$$\text{Ideal CPI} = \frac{\text{Number of clock cycles per instruction on an unpipelined machine}}{\text{Number of pipe stages}} \tag{3}$$
The ideal CPI is almost always 1. The pipelined CPI [27] is the sum of the base CPI and a 
contribution from stalls. 
$$\text{Pipeline CPI} = \text{Ideal CPI} + \text{Pipeline stall cycles per instruction} \tag{4}$$
By reducing the terms on the right hand side, the overall Pipeline CPI can be reduced, 
thus increasing the instruction throughput per clock cycle. The CPU execution time of a 
program for a pipelined processor [27] can now be given as, 
$$(\text{CPU execution time})_{\text{pipelined}} = (\text{Number of clock cycles for instruction count} + \text{stall cycles}) \times (\text{Ideal CPI}) \times (\text{pipelined clock cycle time}) \tag{5}$$
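As a hedged numeric illustration of equations (1), (2) and (4), the following sketch evaluates them for made-up values. The function names and all numbers are hypothetical, and the pipelined clock is assumed to equal the sequential clock purely to keep the arithmetic simple.

```python
def cpu_time(instruction_count, avg_cpi, clock_cycle_time):
    """Equation (1): execution time of a program."""
    return instruction_count * avg_cpi * clock_cycle_time


def ideal_time_per_instruction(unpipelined_time, pipe_stages):
    """Equation (2): time per instruction with perfectly balanced stages."""
    return unpipelined_time / pipe_stages


def pipeline_cpi(ideal_cpi, stall_cycles_per_instruction):
    """Equation (4): ideal CPI plus the contribution from stalls."""
    return ideal_cpi + stall_cycles_per_instruction


# Hypothetical workload: 100 instructions, 10 ns clock, 4 cycles per
# instruction when executed sequentially, 4 pipe stages.
sequential_ns = cpu_time(instruction_count=100, avg_cpi=4, clock_cycle_time=10)
per_instruction_ns = ideal_time_per_instruction(unpipelined_time=40, pipe_stages=4)
cpi = pipeline_cpi(ideal_cpi=1, stall_cycles_per_instruction=0.25)
pipelined_ns = cpu_time(instruction_count=100, avg_cpi=cpi, clock_cycle_time=10)
speedup = sequential_ns / pipelined_ns
```

With no stalls the speed up would equal the number of pipe stages (4); the assumed 0.25 stall cycles per instruction lower it to 3.2.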
Both versions of the Ephemeral State Processor (ESPR.V1 and the ESPR.V2 to be 
presented later) are pipelined processors and their performance model can be 
derived from the basic pipelined performance equation described in (5). Hereafter we 
refer to the performance model in terms of the ESPR architecture, which refers to both 
versions. The ideal CPI for ESPR can be given as, 
$$CPI_{ESPR} = \frac{\text{No. of clock cycles for an instruction in unpipelined ESPR}}{\text{No. of pipe stages}} \tag{6}$$
All instructions except the IN and OUT instruction can be included in the above 
CPI equation. The IN instruction gets the input packet into ESPR in 32-bit blocks and it 
takes more than one pipelined ESPR clock cycle to complete it depending on the input 
packet size. The number of clock cycles for completion of the IN instruction can be 
determined from studying the macro system flow chart in Appendix B and it includes the 
basic count of clock cycles for the IN instruction and additional cycles to check whether 
the EOP is reached to stop getting input. Similarly, the OUT instruction also takes more 
than one pipelined ESPR clock cycle to output the packet. Depending on the packet size, 
IN and OUT take an equal number of clock cycles to complete their respective 
operations. The number of pipe stages has no effect in determining the CPI for IN 
and OUT. The CPI for IN and OUT are given in equations (7) and (8) respectively. 
$$CPI_{IN} = \text{No. of clock cycles for getting input packet using IN instruction} \tag{7}$$
$$CPI_{OUT} = \text{No. of clock cycles for OUT instruction} \tag{8}$$
The performance of ESPR is determined based on the time it takes to complete 
the processing of a particular Ephemeral State Processing (ESP) packet after the ESPR is 
switched on. The CPU execution time for the packet is comprised of different execution 
times, which are described below, and it depends on type of input packet and type of 
resultant packet and also depends on whether the packet and Ephemeral State Store (ESS) 
checking failed or succeeded. The complete performance equation for ESPR can be 
derived from the following equations. The base CPU execution time is given as, 
$$CPU_{BASE} = \big((\text{avg. micro instruction count for a macro instruction} \times CPI_{ESPR}) + \text{total number of stall cycles in a macro instruction}\big) \times CLK_{ESPR} \tag{9}$$
Any ESP packet carries a macro opcode for performing a particular macro 
instruction and so the performance of ESPR is measured in terms of per packet 
processing and is determined by the execution time of a particular macro instruction. A 
macro instruction consists of a sequence of micro instructions in instruction memory, and 
the number of micro instructions executed for a particular macro instruction depends on 
the type of input packet and the availability of the required (tag, value) pair in ESS, as 
can be seen from the macro level system flow chart in Appendix B. So the term avg. 
micro instruction count is used in the above CPU_BASE equation and it excludes any IN 
and OUT micro instruction. Stalls in ESPR arise due to branch or jump instructions. 
CLK_ESPR is the basic clock cycle period for ESPR. Similarly the CPU execution times for 
the IN and OUT micro instructions are given in equations (10) and (11) as follows. 
$$CPU_{IN} = CPI_{IN} \times CLK_{ESPR} \tag{10}$$
$$CPU_{OUT} = CPI_{OUT} \times CLK_{ESPR} \tag{11}$$
The instruction count for the IN or OUT micro instruction is 1, and there are no 
stall cycles during their execution. After the packet is received in ESPR, it is checked for 
any errors and conditions and then the ESS is checked for its availability before the 
packet gets processed. So the CPU execution time spent in checking the packet and ESS, 
is given by, 
$$CPU_{CHECK} = (\text{No. of clock cycles for packet and ESS checking}) \times CLK_{ESPR} \tag{12}$$
$$CPU_{FA} = (\text{No. of clock cycles for FWD or ABORT1 or ABORT2}) \times CLK_{ESPR} \tag{13}$$
CPU_FA is the CPU execution time for either the FWD, ABORT1 or ABORT2 
micro instruction. In case of failure of any checking as described above, eq. (13) helps in 
determining the total execution time. The CPU execution time of the resultant packet, 
depending on the FWD/DROP of packets or failure of checking, can be obtained by 
combining the above five CPU execution time equations. CPU 
execution time for forwarded and dropped packets is given as, 
$$CPU_{FWD\text{-}PACKET} = CPU_{BASE} + CPU_{IN} + CPU_{OUT} + CPU_{CHECK} \tag{14}$$
$$CPU_{DROP\text{-}PACKET} = CPU_{BASE} + CPU_{IN} + CPU_{CHECK} \tag{15}$$
If any packet or ESS checking fails, the CPU execution time in that case is given as, 
$$CPU_{CHK\text{-}FAILED} = CPU_{IN} + CPU_{OUT} + CPU_{FA} + \big((\text{1, 2 or 3 clock cycles, depending on which checking fails}) \times CLK_{ESPR}\big) \tag{16}$$
Thus, the above performance model is designed to measure the time taken by 
ESPR to complete processing of an incoming packet, depending on the type of packet 
and the processed results of packet. This is a theoretical description of the performance of 
ESPR. Real performance results can be obtained from the post implementation simulation 
results to be described and seen in later chapters. 
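The per-packet model of equations (9) through (16) can be sketched as a few small functions. All cycle counts and the 10 ns clock period used below are hypothetical placeholders chosen for easy arithmetic; they are not measured ESPR figures, and the function names are illustrative.

```python
def cpu_base(avg_micro_count, cpi_espr, stall_cycles, clk):
    """Equation (9): base execution time of one macro instruction."""
    return (avg_micro_count * cpi_espr + stall_cycles) * clk


def cpu_cycles(cycles, clk):
    """Equations (10)-(13): a cycle count multiplied by the clock period."""
    return cycles * clk


def cpu_fwd_packet(base, t_in, t_out, check):
    """Equation (14): packet processed and forwarded."""
    return base + t_in + t_out + check


def cpu_drop_packet(base, t_in, check):
    """Equation (15): packet processed and dropped (no OUT)."""
    return base + t_in + check


def cpu_chk_failed(t_in, t_out, fa, failing_check_cycles, clk):
    """Equation (16): failing_check_cycles is 1, 2 or 3."""
    return t_in + t_out + fa + failing_check_cycles * clk


CLK = 10  # ns, hypothetical clock period
base = cpu_base(avg_micro_count=12, cpi_espr=1, stall_cycles=4, clk=CLK)
t_in = cpu_cycles(20, CLK)    # hypothetical IN cycle count
t_out = cpu_cycles(20, CLK)   # IN and OUT take equal cycles for a given packet
check = cpu_cycles(3, CLK)    # hypothetical packet and ESS checking cycles
fa = cpu_cycles(1, CLK)       # hypothetical FWD/ABORT1/ABORT2 cycle count

forwarded_ns = cpu_fwd_packet(base, t_in, t_out, check)
dropped_ns = cpu_drop_packet(base, t_in, check)
failed_ns = cpu_chk_failed(t_in, t_out, fa, failing_check_cycles=2, clk=CLK)
```

Under these assumed numbers the IN and OUT transfers dominate the per-packet time, which is the kind of observation the model is meant to make visible.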
Chapter Six 
Post-Synthesis and Post-Implementation Simulation Validation of 
ESPR.V1 Architecture 
The hardware design of any system starts with the design specifications, design 
architecture and design description using Hardware Description Languages (HDL). The 
next step in the design cycle is simulation and design verification using CAD tools to 
synthesize and implement the hardware system to a desired programmable logic 
technology. Field Programmable Gate Array (FPGA) technology using reconfigurable 
logic is a widely used programmable logic technology, as it offers the advantages of cost 
effectiveness, flexibility, easy reconfiguration, a large gate count in a 
single chip and support for rapid prototyping and design iteration. This Chapter discusses 
the Post-Synthesis and Post-Implementation validation testing of the ESPR.V1 
architecture synthesized to a Virtex FPGA chip. 
6.1 Introduction 
Post synthesis simulation and Post Implementation simulation are two major 
essential phases in the design process of a hardware system in terms of organization, 
architecture and design validation. Post synthesis simulation can functionally validate the 
design architecture for its implementation to a specific FPGA chip. This simulation offers 
output results without considering the specific gate and other logic resource delays while 
configuring the chip. The gate and all other logic resource delays in the chip are included 
by the CAD tool while performing Post Implementation Simulation. 
Post Synthesis and Post Implementation simulation and design validation of the final 
ESPR architecture was done using Xilinx Foundation series 3.1i software. It provides 
logic simulators for post synthesis and post implementation testing. One can monitor all 
the inputs, outputs and intermediate signals using this logic simulator. Because the 
simulator cannot accept test vectors from a file, test vectors for validation were input manually 
and exhaustive testing was not performed. In all cases, however, the simulation results were 
compared with valid outputs for validation. 
The CAD tool provided by Xilinx for Post Synthesis Simulation validation was run 
on a PC (Personal Computer) system with a Pentium III 550 MHz processor, 
the Windows 98 platform and 384 MB (Megabytes) of RAM. The design 
capture of ESPR.V1 was synthesized into a Xilinx Virtex 800 FPGA chip and 
implemented on a Virtex2 - 4000 FPGA chip. Post Implementation simulation 
verification, with desired timing constraints on the design, was performed on a PC 
system with a Pentium III 550 MHz processor, the Windows 2000 platform and 
640 MB of RAM. The logic resources utilized in the Xilinx Virtex2 
- 4000 FPGA chip to implement the described ESPR.V1 architecture are given in the 
following Table 6.1. 
Table 6.1 Logic Resources Utilization for ESPR.V1 Architecture 
Resources                  Utilization 
4 Input LUTs               13,840 
Flip flops                 7,388 
Block RAMs                 16 
Equivalent System Gates    1,386,196 
6.2 Post-Synthesis Simulation Validation of ESPR.V1 Architecture 
Post synthesis simulation validation provides for the functional validation of the 
ESPR.V1 architecture on an FPGA chip. All the components of ESPR.V1 were first 
developed, synthesized and verified separately for functional correctness. Then the whole 
ESPR.V1 system was connected from the individual modules, synthesized and tested for 
functional validation. Most of the micro instructions were tested on a clock cycle by clock 
cycle basis for proper generation of internal and external signals, and outputs. The 
individual components and the whole ESPR.V1 system were not tested exhaustively but 
were tested for a large set of varying input conditions. 
The pipelined ESPR.V1 system is synthesized to run at a clock cycle (clk_pipe) of 
10 nanoseconds (frequency of 100 MHz). The simulation validation was first done for 
individual micro instructions and then for small programs using the micro instructions. 
For this simulation the instruction memory is first written with the specific micro instructions 
to be tested and then simulated for proper execution in the corresponding clock cycles. 
Synthesis was not optimized to run at a maximum clock rate since achieving this is a long 
drawn out process. Our top priority was functional validation of the architecture. Also, 
the Xilinx Virtex FPGA chip used for implementation is an older FPGA chip with long 
propagation delays through its logic resources. 
6.2.1. Simulation Validation of Single Micro Instructions 
Most of the micro instructions were tested and verified individually and then 
programs with sequences of micro instructions were tested for correct execution. The first 
micro instruction to be verified was the 'ADD' (ALU/SHIFTER type instructions) 
instruction and the next instruction to be verified was the 'GET' (GET/PUT type 
instruction for ESS) instruction. Both of these micro instructions exercise most functional 
units in the processor. 
6.2.1.1. Validation of 'ADD' Micro Instruction 
To verify the ADD micro instruction, the bit pattern for this instruction was 
written to the instruction memory. This micro instruction utilizes PC, micro instruction 
memory, General Purpose Register File, Controller, ALU and Shifter functional units. 
The ADD instruction is interpreted as, 
ADD RD, RS1, RS2 
The ADD instruction verified here was, 
ADD R5, R4, R3 
- 001001 00101 00100 00011 00000 0 00000 0 000000 100 0000000000000000 000000 
- 24A4180001000000 (Hex. Equivalent for binary representation) 
The source registers R3 and R4 were initially loaded with values 6 and A respectively 
using the MOVI micro instruction. The verification of this instruction is shown in Figure 
6.1. The verification starts with the IF stage and the instruction (instchk) is fetched during 
the first clock cycle. 
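The bit pattern and its hexadecimal equivalent given above for ADD R5, R4, R3 can be cross-checked with a short sketch. The field widths are read off the spacing of the printed bit pattern; only the first four fields (opcode, RD, RS1, RS2) are named in the text, so the remaining fields are left unnamed here. The function names are illustrative assumptions.

```python
# Field widths of the 64-bit ADD pattern, left to right, as printed.
ADD_WIDTHS = [6, 5, 5, 5, 5, 1, 5, 1, 6, 3, 16, 6]


def pack_fields(values, widths):
    """Concatenate fields, most significant first, into one integer."""
    word = 0
    for value, width in zip(values, widths):
        if value >= (1 << width):
            raise ValueError("field value does not fit its width")
        word = (word << width) | value
    return word


def decode_registers(word):
    """Extract opcode, RD, RS1 and RS2 from the top 21 bits of a 64-bit word."""
    return {
        "opcode": (word >> 58) & 0x3F,
        "RD": (word >> 53) & 0x1F,
        "RS1": (word >> 48) & 0x1F,
        "RS2": (word >> 43) & 0x1F,
    }


# Field values copied from the printed bit pattern for ADD R5, R4, R3.
add_fields = [0b001001, 0b00101, 0b00100, 0b00011, 0, 0, 0, 0, 0, 0b100, 0, 0]
add_word = pack_fields(add_fields, ADD_WIDTHS)
add_hex = format(add_word, "016X")
decoded = decode_registers(add_word)
```

Packing the fields reproduces the hexadecimal word 24A4180001000000, and decoding it returns opcode 9 with RD = 5, RS1 = 4 and RS2 = 3.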
[Waveform not reproduced. Values shown: instchk = 24A4180001000000, opcode (opsigex) = 09, RS1 = 04, RS2 = 03, RD = 05, GPR1rs = 000000000000000A, GPR2rs = 0000000000000006, aluoutswb = 0000000000000010.] 
Figure 6.1. Validation of 'ADD' Micro Instruction 
During the second clock cycle (ID stage), the opcode for ADD is decoded as '9' 
(opsigex) and the source and destination registers are decoded as '4' (RS1os), '3' (RS2os) 
and '5' (Rdosl) respectively. In the same ID stage the source registers R4 (GPR1rs) and 
R3 (GPR2rs) are read as 'A' and '6' respectively. In the next clock cycle (EX stage), the 
ALU output (aluoutswb) for the addition A + 6 = 10 in hex (16 in decimal) is obtained as 
"0000000000000010". This validates the ADD micro instruction. 
6.2.1.2. Validation of 'GET' Micro Instruction 
To validate the GET micro instruction, the bit pattern was written in the 
instruction memory. This micro instruction utilizes PC, micro instruction memory, Tag 
Register File, Controller, and ESS functional units. The GET instruction is interpreted as, 
GET TR, VR 
The GET instruction validated here was, 
GET TR1, VR1 
- 011110 00000 00000 00000 00001 0 000011000000000 0000000000000000 000000 
- 7800004180000000 (Hex Eq.) 
The source register TR1 was initially loaded with a Tag of 5 using the MOVI 
micro instruction. A Value of 2 was already associated with this Tag and stored in ESS 
using the PUT micro instruction. The GET instruction is executed after some time to 
validate whether it checks the ESS for this Tag 5 and the lifetime of its Value. The 
validation of this instruction is shown in Figure 6.2. 
[Waveform not reproduced. Values shown: instchk = 7800004180000000, ESS output = 0000000000000002.] 
Figure 6.2. Validation of 'GET' Micro Instruction 
The first clock cycle (IF stage) fetches the instruction (instchk) for GET. The ID 
stage decodes the opcode (opsigex) for GET and the source Tag Register (TRos) and the 
destination Value Register (VRos). The ESS in the EX stage runs on a clock (clk_ess) with a 
period of 2 nanoseconds, that is, at five times the frequency of the processor. The clock 
(clk_cnt) for the counter in ESS, which counts the lifetime of the value, runs at a much 
lower frequency and has a period of 80 nanoseconds. The value is read out of the ESS in 
the EX stage as "0000000000000002". This validates the GET 
micro instruction. 
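The behaviour exercised by this test, a value bound to a tag that remains retrievable only for a limited lifetime, can be sketched as follows. This is a toy model under stated assumptions: the class name, the `lifetime_ticks` parameter and the return convention are hypothetical, and the hardware ESS signals failures through the Condition Code Register rather than a return value.

```python
class EphemeralStateStore:
    """Toy associative store: each (tag, value) binding expires after a lifetime."""

    def __init__(self, lifetime_ticks):
        self.lifetime_ticks = lifetime_ticks  # hypothetical lifetime parameter
        self.now = 0                          # advanced by the slow counter clock
        self.entries = {}                     # tag -> (value, expiry tick)

    def tick(self, count=1):
        # Models the lifetime counter clock (clk_cnt in the simulation).
        self.now += count

    def put(self, tag, value):
        self.entries[tag] = (value, self.now + self.lifetime_ticks)

    def get(self, tag):
        """Return (found, value); a missing or expired tag reports not found."""
        entry = self.entries.get(tag)
        if entry is None:
            return False, None
        value, expiry = entry
        if self.now >= expiry:
            del self.entries[tag]
            return False, None
        return True, value


ess = EphemeralStateStore(lifetime_ticks=100)
ess.put(tag=5, value=2)   # Tag 5 bound to Value 2, as in the test above
ess.tick(10)              # the GET is issued some time later
hit = ess.get(5)
ess.tick(100)             # let the binding expire
miss = ess.get(5)
```

The first GET returns the stored value because the binding is still alive; the second one fails because the assumed lifetime has elapsed.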
6.2.2. Micro Instruction Program Sequence Validation of ESPR.V1 Architecture 
After functional validation of the ESPR.V1 architecture executing single micro 
instructions, the pipelined ESPR.V1 architecture was validated by loading the micro 
instruction memory with sequences of micro instructions that form small programs. Three 
example programs are shown in this section with their simulation validation. These 
programs are written to show each of the features and components of the 
pipelined processor at work. Most of these programs have data hazards, which are 
automatically reduced or eliminated by the forwarding unit. One program with a Branch 
type instruction uses the hazard detection unit to control branch hazards as explained in 
the architectural description of ESPR.V1. This section briefly explains pipelined 
execution of the micro instruction sequences and shows the simulation validation results. 
The first program is explained in detail and the remaining two programs are 
briefly explained. 
6.2.2.1. Micro Instruction Program Sequence for ALU/SHIFTER Validation 
The following micro instruction program was loaded in sequence in the micro 
instruction memory. Figure 6.3 shows the micro instruction program sequence, which 
provides an example for testing the ALU/Shifter type instruction execution. 
0. MOV R3, R1     - 1C61000001000000 
1. ADD R4, R3, R3 - 2483180001000000   (Data Hazard - R3) 
2. ADD R5, R4, R3 - 24A4180001000000   (Data Hazard - R4) 
3. ADD R6, R4, R4 - 24C4200001000000 
4. SHL R7, R4, 2  - 44E4000001000001 
5. MOV R8, R3     - 1D03000001000000 
Figure 6.3. Program for Validating ALU/Shifter 
This program also tests the forwarding unit, which eliminates the data hazards that arise in the 
ADD instructions. In the first two ADD instructions there is a danger of the old data 
being read instead of the new value for R3 from the MOV instruction and the new value 
for R4 from the first ADD instruction. This is called a Data Hazard. It arises because the 
new values are not written until the WB stage, but the next instruction in sequence, in the 
ID stage, wants to read the new value before it is written. The forwarding unit allows for 
the proper execution of the instruction without any hazard by forwarding the required 
data from the ALU output to the computational unit inputs. Figures 6.4a, 6.4b and 6.4c 
show the post synthesis simulation validation results for the above program. 
All the instructions take a single clock cycle to execute. Figure 6.4a shows the IF 
stage for the first three instructions, ID stage for the first two instructions and EX stage 
for the first instruction. Figure 6.4b shows the fetching of the remaining three micro 
instructions, WB stage for first three instructions, EX stage for second, third and fourth 
instructions and ID stage for third, fourth and fifth instruction. Figure 6.4c shows the WB 
stage for the fourth, fifth and sixth instruction and the remaining stages for the remaining 
instructions. All instructions were executed correctly and their execution also validated 
the functional operation of the forwarding units. 
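The effect of the forwarding unit on the program above can be illustrated with a small functional model. It is a sketch under stated assumptions, not a model of the actual ESPR.V1 datapath: a result is assumed to reach the register file two instructions after it is computed, unwritten registers are assumed to read as 0, and R1 is assumed to hold 1.

```python
def run(program, initial_regs, forwarding, wb_delay=2):
    """Run register operations in order and return each instruction's result.

    A result reaches the register file only wb_delay instructions after it is
    computed (an assumed stand-in for the distance between ID and WB). With
    forwarding enabled, a reader takes the newest in-flight value instead.
    """
    regs = dict(initial_regs)
    in_flight = []  # entries are [instructions_left, dest, value]
    results = []

    def read(reg):
        if forwarding:
            for _, dest, value in reversed(in_flight):
                if dest == reg:
                    return value
        return regs.get(reg, 0)

    for op, rd, rs1, src2 in program:
        # Commit results whose write-back has happened by now.
        for entry in in_flight:
            entry[0] -= 1
            if entry[0] == 0:
                regs[entry[1]] = entry[2]
        in_flight[:] = [e for e in in_flight if e[0] > 0]

        if op == "MOV":
            value = read(rs1)
        elif op == "ADD":
            value = read(rs1) + read(src2)
        elif op == "SHL":
            value = read(rs1) << src2  # src2 is an immediate shift amount
        else:
            raise ValueError(f"unsupported op {op}")
        results.append(value)
        in_flight.append([wb_delay, rd, value])
    return results


PROGRAM = [
    ("MOV", 3, 1, None),  # MOV R3, R1
    ("ADD", 4, 3, 3),     # ADD R4, R3, R3  (needs the new R3)
    ("ADD", 5, 4, 3),     # ADD R5, R4, R3  (needs the new R4)
    ("ADD", 6, 4, 4),     # ADD R6, R4, R4
    ("SHL", 7, 4, 2),     # SHL R7, R4, 2
    ("MOV", 8, 3, None),  # MOV R8, R3
]

with_forwarding = run(PROGRAM, {1: 1}, forwarding=True)
without_forwarding = run(PROGRAM, {1: 1}, forwarding=False)
```

With forwarding the six results are 1, 2, 3, 4, 8 and 1. Without it, the first ADD reads the stale R3 and the error propagates through the later instructions that depend on R4.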
[Waveform not reproduced. Signals shown: clk_pipe, pcout, instchkout, Readdata1, Readdata2, aluin1out, aluin2out and the write-back data output. Annotation: R1 being read in ID stage.] 
Figure 6.4a. Simulation Output for ALU/Shifter Validation 
[Waveform not reproduced. Annotations: output of the first MOV instruction (0000000000000001) in the 4th clock cycle; output of the first ADD instruction (0000000000000002) in the 5th clock cycle; output of the second ADD instruction (0000000000000003) in the 6th clock cycle.] 
Figure 6.4b. Simulation Output for ALU/Shifter Validation (continued) 
[Waveform not reproduced. Annotations: output of the third ADD (0000000000000004); output of SHL (0000000000000008); output of MOV (0000000000000001).] 
Figure 6.4c. Simulation Output for ALU/Shifter Validation (continued) 
6.2.2.2. Micro Instruction Program Sequence for ESS Validation 
Figure 6.5 shows the micro instruction program sequence, which provides an 
example for validating the ESS. First a series of ALU operations were performed to 
obtain a value for the Tag and Value register. Then the (tag, value) pair was stored in 
ESS using the PUT micro instruction. A GET was performed to obtain the value bound to 
that respective tag. This program also tests the hazard elimination circuitry of the 
ESPR.V1. 
0. MOV R3, R1     - 1C61000001000000 
1. ADD R4, R3, R3 - 2483180001000000 
2. ADD R5, R4, R3 - 24A4180001000000 
3. MOV TR1, R5    - 1C05006001000000   (Data Hazard - R5) 
4. MOV VR1, R5    - 1C05000181000000 
5. PUT TR1, VR1   - 7C00004100000000   (Data Hazard - VR1) 
6. BPF 44h        - 8400000000001100 
7. SHR R7, R4, 3  - 48E4000001000003 
8. GET TR1, VR1   - 7800004180000000 
9. BGF 44h        - 8000000000001100 
Figure 6.5. Program for Validating ESS 
Figures 6.6a, 6.6b, 6.6c and 6.6d show the post synthesis simulation validation results for 
the above program. Figure 6.6a shows a series of ALU operations similar to the previous 
program. Figure 6.6b shows the operation and result of the ALU operations and the fetching 
of the PUT instruction. It also shows a value of '3' being written to the Tag register. 
After the Tag and Value registers are written with '3', the PUT instruction stores them in 
the ESS. Figure 6.6c shows the fetching of the BPF, SHR and GET instructions. Figure 
6.6d shows that the correct Value of '3' bound to Tag register TR1 is obtained as ESS 
output from the GET micro instruction. The signal 'essouts' in Figure 6.6d shows the 
correct value of '3'. 
[Waveform not reproduced. Program counter values 0000 to 0003; instructions fetched: 1C61000001000000, 2483180001000000, 24A4180001000000 and the first MOV to TR1.] 
Figure 6.6a. Simulation Output for ESS Validation 
[Waveform not reproduced. Program counter values 0003 to 0006. Annotation: Output Value '3' given to Tag Register TR1.] 
Figure 6.6b. Simulation Output for ESS Validation (continued) 
[Figure 6.6c: waveform and caption not recoverable from the source. Program counter values 0006 to 0009.] 
[Waveform not reproduced. Annotation: ESS Output from 'GET' Micro Instruction (0000000000000003).] 
Figure 6.6d. Simulation Output for ESS Validation (continued) 
6.2.2.3. Micro Instruction Program Sequence for Branch/Forward Unit Validation 
Figure 6.7 shows a micro instruction program sequence for branch control 
validation. 
0. MOV R3, R1     - 1C61000001000000 
1. MOV R4, R1     - 1C81000001000000 
2. BEQ R3, R4, 6h - 6003200000000180   (Branch Hazard and Data Hazard - R4) 
3. ADD R5, R3, R4 - 24A4180001000000 
4. SHL R7, R4, 5  - 44E4000001000003 
5. AND R6, R5, R7 - 38C5380001000000 
6. MOV R8, R6     - 1D06000001000000 
7. MOVI R7, Fh    - 20E00000[illegible] 
8. INCR R3        - [illegible] 
9. ROR R9, R3, 2  - 5123000001000002   (Data Hazard - R3) 
Figure 6.7. Program for Validating Conditional Branch Control 
The first two MOV micro instructions were used to write the registers that are checked by the 
branch on equality (BEQ) condition. The pipelined ESPR.V1 assumes the "branch not taken" 
condition to reduce branch hazards. The BEQ condition in the above program succeeds 
because the two registers R3 and R4 are written with the same value of '1', and so the 
program execution has to branch to the address (6h) specified by the branch instruction. 
Because of the branch-not-taken assumption, the ADD and SHL instructions following 
BEQ will be fetched and decoded. When the BEQ instruction reaches the EX stage, the 
correct branch decision is taken by the branch detection unit, the instructions fetched, 
decoded or executed after BEQ are flushed, and program execution branches to the 
address specified by BEQ; the micro instructions starting from that address then continue 
to execute in normal sequence. This can be validated from the following post synthesis 
simulation results shown in Figures 6.8a, 6.8b, 6.8c and 6.8d. 
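The fetch behaviour described above, where sequential fetching continues under the "branch not taken" assumption until the branch is resolved in the EX stage, can be sketched as follows. The function name and the `resolve_delay` parameter are assumptions for illustration; the delay of 2 corresponds to the branch passing through ID and then EX after it is fetched.

```python
def fetch_trace(program_length, branch_addr, target, taken, resolve_delay=2):
    """Return (fetched addresses, flushed addresses) for one conditional branch.

    The fetch unit assumes "branch not taken" and keeps fetching sequentially.
    The outcome is known resolve_delay cycles after the branch is fetched.
    """
    if taken and target <= branch_addr:
        raise ValueError("this sketch only handles a single forward branch")
    fetched, flushed = [], []
    pc = 0
    while pc < program_length:
        fetched.append(pc)
        if pc == branch_addr and taken:
            # Instructions fetched while the branch moves through ID and EX.
            wrong_path = [addr for addr in range(pc + 1, pc + 1 + resolve_delay)
                          if addr < program_length]
            fetched.extend(wrong_path)
            flushed.extend(wrong_path)
            pc = target  # PC becomes the branch address
        else:
            pc += 1
    return fetched, flushed


# The program above: BEQ at address 2 branches to 6h and is taken.
fetched, flushed = fetch_trace(program_length=10, branch_addr=2, target=6, taken=True)
not_taken_fetched, not_taken_flushed = fetch_trace(10, 2, 6, taken=False)
```

For the taken branch the fetch sequence is 0, 1, 2, 3, 4, 6, 7, 8, 9, with the ADD at address 3 and the SHL at address 4 flushed. When the branch is not taken, nothing is flushed and no cycles are lost, which is the motivation for the not-taken assumption.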
[Waveform not reproduced. Program counter values 0000 to 0003. Annotation: Fetching 'BEQ' Micro Instruction.] 
Figure 6.8a. Simulation Output for Conditional Branch Validation 
[Waveform not reproduced. Program counter values 0003, 0004, 0006, 0007. Annotations: continuous fetching of 'ADD' assuming branch not taken; continuous fetching of 'SHL' assuming branch not taken; PC becomes branch address; fetching of instruction at branch address 6h; flushing of previous instructions.] 
Figure 6.8b. Simulation Output for Conditional Branch Validation (continued) 
[Waveform not reproduced. Program counter values 0007 and 0008. Annotation: continuous fetching and execution of instructions in sequence starting from branch address.] 
Figure 6.8c. Simulation Output for Conditional Branch Validation (continued) 
[Waveform not reproduced. Program counter values 000A to 000D. Annotations: Output for MOVI; Output for INCR; Output for ROR.] 
Figure 6.8d. Simulation Output for Conditional Branch Validation (continued) 
6.3 Post-Implementation Simulation Validation of ESPR.V1 Architecture 
Unlike post synthesis validation of an HDL system design, post implementation 
validation checks the design from both a functional and a timing perspective: the 
designed system is tested on a chip at its basic operating frequency with its gate and 
propagation delays included. It therefore provides the important information of how 
fast the system can run and still produce functionally correct results. Post 
implementation validation was done for the individual ESPR.V1 components and for 
the whole of ESPR.V1, and simulation results for two macro instructions are 
provided in this section. The post synthesis validation was done at a basic ESPR.V1 clock 
frequency of 100 MHz for functional validation. Keeping in mind the different types of 
delays associated with implementing a design on an FPGA chip, post implementation 
testing was done at an operating frequency that produced functionally correct results. 
Maximizing the system clock frequency was not a goal of our post implementation 
simulation validation of the ESPR architecture; how to do that is known but time 
consuming. Our goal was simply to determine a frequency at which the ESPR would 
function correctly. This frequency becomes the base frequency, which can be 
improved upon via synthesis and VHDL coding optimizations in addition to deeper 
pipelining of the ESPR.V1 architecture. 
Initially the ESPR.V1 operated at a frequency of 16.7 MHz without any timing 
constraints. The Xilinx 4.2i CAD tool has a feature called the timing analyzer, which gives 
information about the delay associated with various signals. Once the signal with the 
maximum delay is found, constraints on the signals and clock signals related to the 
longest delay path can be imposed in the 'UCF' (User Constraints File) 
associated with the HDL design project. This can be done using the Constraints Editor 
available in Xilinx 4.2i. After imposing constraints on some signals, an improvement in 
timing was obtained and the ESPR.V1 operated at a frequency of 20 MHz. The post 
implementation simulation validation was done at the level of macro instructions. For 
this simulation of the entire ESPR, the instruction memory was preloaded with the micro 
instruction sequences for all five macro instructions and the design was then simulated 
for proper execution of the macro instruction selected by the input packet. The 
initialization of the instruction memory was done by writing a constraints file (NCF) in 
the desired format and saving this file under the specific HDL design project in Xilinx 4.2i. 
6.3.1. Post-Implementation Simulation Validation of 'COUNT' Macro Instruction 
All macro instructions were tested for functionality and timing correctness; the 
first one tested was the COUNT macro instruction. The following simulation results 
show the execution and completion of the COUNT macro instruction initiated by an ESP 
(COUNT) packet in the ESPR.V1. 
Figure 6.9a shows the start of the post implementation simulation output of the 
COUNT macro instruction. After the ESPR.V1 is switched on, the IDV signal goes high 
and the IN micro instruction, which gets the input packet, is fetched from the preloaded 
instruction memory. This initial step is the same for all macro instructions. The ESPR.V1 
clock (clk_pipe) frequency is 20 MHz and is the same for the instruction memory (clk_im). 
The clock (clk_p) for the packet processing modules is two times faster than clk_pipe. The 
clock (clk_e) for the ESS is five times faster than the pipeline clock. All this can be seen 
in Figure 6.9a. After the IN instruction is fetched, the ESPR.V1 starts getting the input 
packet in 32-bit blocks as shown in Figure 6.9b. The input packet for COUNT has the 
following format: 00070004, 00000001, 00000000, 00000001, 00000000, 00000002, 
00000000, CRC. 
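The clock relationships stated above can be checked with a few lines of arithmetic. The 
following is only an illustrative sketch: the function name `derived_clocks` is 
hypothetical, and it simply applies the multipliers given in the text (clk_im equal to 
clk_pipe, clk_p at twice and clk_e at five times the pipeline clock) to the 20 MHz value. 

```python
def derived_clocks(clk_pipe_mhz):
    """Return frequency (MHz) and period (ns) of each clock derived from clk_pipe."""
    # Multipliers relative to the pipeline clock, as stated in the text.
    multipliers = {"clk_pipe": 1, "clk_im": 1, "clk_p": 2, "clk_e": 5}
    clocks = {}
    for name, mult in multipliers.items():
        freq_mhz = clk_pipe_mhz * mult
        clocks[name] = {"freq_mhz": freq_mhz, "period_ns": 1000.0 / freq_mhz}
    return clocks


clocks = derived_clocks(20)
for name, c in clocks.items():
    print(f"{name}: {c['freq_mhz']} MHz, period {c['period_ns']:.1f} ns")
```

At a 20 MHz pipeline clock this gives a 50 ns pipeline period, with the packet processing 
modules and the ESS running at 40 MHz and 100 MHz respectively. 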
[Waveform not reproduced. Annotations: ESPR_ON is high; opcode for IN instruction is 
fetched; IDV signal goes high.] 
Figure 6.9a. Simulation Output for COUNT 
[Waveform not reproduced. Annotation: start of input COUNT.] 
Figure 6.9b. Simulation Output for COUNT (continued) 
Figure 6.9c shows the CRC and the EOP_in signal. The packet is then checked for CRC 
and loc bits, and the ESS is checked for its availability. After these checks complete 
successfully, the program counter starts fetching the micro code sequence for the 
COUNT macro instruction. 
[Waveform not reproduced. Annotations: CRC for COUNT packet; End Of input Packet (EOP); 
CRC check OK signal; start of micro instruction.] 
Figure 6.9c. Simulation Output for COUNT (continued) 
Figure 6.9d shows the continuation of execution for COUNT. As this is the first packet 
for ESPR.V1, there is no tag placed in the ESS and therefore the GET instruction fails for 
that tag (0x0000000000000001) and execution jumps to location 0x14. Then the instructions 
starting from location 0x14 are executed as shown in Figure 6.9e below. 
[Waveform not reproduced. Annotations: LFPR inst.; GET for the tag (0x0000000000000001) 
taken from the packet by the preceding LFPR inst.; GET fails and execution jumps to 
location 14 after address 7; the fetched instructions at addresses 8 and 9 are not executed.] 
Figure 6.9d. Simulation Output for COUNT (continued) 
[Figure 6.9e: waveform not reproduced. Annotations: PUT inst. for creating ...; JUMP inst. 
jumps to location 0xC.] 
Location 0x15 has a PUT instruction, which creates a state in the ESS that 
later packets can retrieve, and the JUMP micro instruction at location 0x17 makes 
execution jump to location 0xC. Execution continues from there, and the count value is 
checked to see whether it has reached the threshold, in which case forwarding of packets 
must be stopped to avoid the problem of implosion. This is done by a branch condition, and 
as the threshold is not reached the current packet has to be forwarded to the next 
available node. The branch is performed at location 0xE and it branches to location 0x20, 
where the packet is forwarded. This is shown in Figure 6.9f, which demonstrates correct 
execution of the COUNT macro instruction. Then the FWD and OUT micro instructions 
are executed and the output code for FWD (01) is given as the primary output. The 
packet is also sent to the output port of ESPR as shown in Figure 6.9g and Figure 6.9h. 
When the entire COUNT packet has been output, the EOP_out signal goes high to 
indicate the end of packet and the PRready signal goes high to indicate that the Packet 
RAM (PR) is ready to accept the next packet, as shown in Figure 6.9h. 
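The control flow traced above (GET fails, PUT creates the state, the count is compared 
with the threshold, the packet is forwarded) can be summarized in a small behavioral 
sketch. This is not the micro code of the ESPR: the function `count_macro`, the use of a 
Python dict in place of the ESS, the initial count of 1, the increment on later packets 
and the `<=` comparison are all assumptions made for illustration. Only the output codes 
(01 for FWD, 3 for dropped packets) are taken from the text. 

```python
FWD_CODE = 0x01   # output code for forwarded packets (from the text)
DROP_CODE = 0x03  # output code for dropped packets (from the text)


def count_macro(ess, tag, threshold):
    """Hypothetical model of COUNT; `ess` is a plain dict standing in for the ESS."""
    value = ess.get(tag)      # GET: None models 'GET failed' (no binding for the tag)
    if value is None:
        value = 1             # assumption: the first packet creates the binding with count 1
    else:
        value += 1            # assumption: later packets increment the stored count
    ess[tag] = value          # PUT: create or update the (tag, value) binding
    # Assumption: forward while the count has not exceeded the threshold.
    return FWD_CODE if value <= threshold else DROP_CODE


ess = {}
codes = [count_macro(ess, 0x0000000000000001, threshold=2) for _ in range(3)]
print(codes)
```

Under these assumptions the first packet finds no binding, creates one and is forwarded, 
which matches the path seen in the simulation; once the count passes the threshold the 
sketch returns the drop code instead. 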
[Waveform not reproduced. Annotations: BGE inst. branches to location 0x20; OUT inst.; 
OCR produces the output code for FWD (01).] 
Figure 6.9f. Simulation Output for COUNT (continued) 
[Figures 6.9g and 6.9h: waveforms not reproduced. Annotations: OCR produces the output 
code for FWD (01); LDOPRAM to load the output RAM; output packet for COUNT.] 
6.3.2. Post-Implementation Simulation Validation of 'COMPARE' Macro 
Instruction 
The next validated macro instruction was COMPARE. To test correct 
functionality, the COMPARE instruction was tested after the COUNT instruction so that 
it could use the state left in the ESS by COUNT. The initial stages of getting the input 
packet and error checking are similar to those of the previously described COUNT instruction 
and are shown in Figure 6.10a and Figure 6.10b. Figure 6.10c shows the start 
of execution of the COMPARE packet. The GET micro instruction at location 0x27 does 
not fail because the tag 0x0000000000000001 carried in the packet was already associated 
with a value by the COUNT packet, and GET retrieves this value. Figure 6.10d 
shows the correct execution by dropping the packet and outputting code '3' for 
'DROP(ed)' packets. A branch condition opcode (e.g., BGE) carried in the packet is 
applied at location 0x2E to the existing value and the value carried in the packet, and 
execution branches according to the condition. Thus the resultant packet gets dropped 
based on the condition. 
[Waveform not reproduced. Annotation: start of input COMPARE.] 
Figure 6.10a. Simulation Output for COMPARE 
[Waveform not reproduced. Annotations: CRC for COMPARE packet; CRC check OK signal.] 
Figure 6.10b. Simulation Output for COMPARE (continued) 
[Waveform not reproduced. Annotation: GET microinstruction.] 
Figure 6.10c. Simulation Output for COMPARE (continued) 
[Waveform not reproduced. Annotations: GET does not fail and follows normal execution; 
branch condition carried in packet is retrieved using LFPR.] 
Figure 6.10d. Simulation Output for COMPARE (continued) 
6.4 Results and Conclusions 
The ESPR.V1 operates correctly at a frequency of 20 MHz without 
architectural, VHDL coding or synthesis optimizations. From the post implementation 
simulation results, an average COUNT packet takes 2.15 microseconds to be processed in 
ESPR.V1 and an average COMPARE packet takes 1.43 microseconds to be processed in 
ESPR.V1. 
Thus, verification and validation of the final pipelined ESPR.V1 architecture was 
achieved by testing ESPR.V1 with example packets and testing macro instructions 
using post synthesis and post implementation simulation verification/validation 
techniques. Performance results have been calculated for the COUNT and COMPARE 
macro instructions. The ESPR.V1 system was not tested exhaustively but was validated 
for correct functionality at a given performance level (20 MHz), for varying input packet 
formats with different macro instruction opcodes. 
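To put the reported processing times in terms of pipeline clock cycles, the arithmetic 
can be sketched as follows. The helper `latency_summary` is a hypothetical name, and the 
packets-per-second figure assumes that packets are processed strictly one after another 
with no overlap, which is an assumption of this sketch rather than a measured result. 

```python
CLK_PIPE_HZ = 20e6  # ESPR.V1 pipeline clock used in the post implementation simulation


def latency_summary(latency_us, clk_hz=CLK_PIPE_HZ):
    """Convert a per-packet latency in microseconds to clock cycles and a packet rate."""
    latency_s = latency_us * 1e-6
    return {
        "cycles": latency_s * clk_hz,       # pipeline clock cycles per packet
        "packets_per_s": 1.0 / latency_s,   # assumes back-to-back, non-overlapped packets
    }


count_stats = latency_summary(2.15)    # average COUNT packet
compare_stats = latency_summary(1.43)  # average COMPARE packet
print(round(count_stats["cycles"], 1), round(compare_stats["cycles"], 1))
```

At 20 MHz the reported times correspond to about 43 pipeline cycles for COUNT and about 
28.6 cycles for COMPARE. 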
Chapter Seven 
Ephemeral State Processor Version 2 (ESPR.V2) Architecture 
The ESPR.V1 architecture described in Chapter 5 is a Four-Stage pipelined 
architecture with the ESS placed with all other execution units in the EX stage of 
the pipeline. The ESS operated at a clock frequency five times faster than that of the 
pipeline clock. Because of the number of functional units and their structures in the EX 
stage and because of the complexity of some of these units, long signal propagation 
delays (latency) can occur within this stage. To overcome this problem and also to design 
an architecture with better overall performance, a second version of the ESPR architecture 
was designed. 
The main objective in the design of ESPR.V2 is to increase the speed 
(performance) at which the processor can operate to meet ESP service needs. Study and 
analysis of ESPR.V1 revealed that, as anticipated, a bottleneck lies in the EX 
stage of ESPR.V1. To reduce the bottleneck, the EX stage of ESPR.V1 was 
partitioned into multiple stages, resulting in a more deeply pipelined ESPR. The essence of 
improving the performance of ESPR.V1 is to hide the latency of the ESS by partitioning the 
ESS such that it can be implemented over three stages of a pipeline. This was done in 
addition to other architectural adjustments, resulting in a Five-Stage pipelined ESPR.V2. 
The performance enhancing pipelined design of the Ephemeral State Store (ESS), and 
hence the five stage pipelined architectural design of ESPR (ESPR.V2), is discussed in 
the following sections of this chapter. 
7.1. Pipelined ESS 
In order to partition the work being done in the EX stage of ESPR.V1, the ESS, 
which was fully in the EX stage, was transformed into a pipelined version. The flow of 
operations to be performed in the ESS for the 'GET' and 'PUT' instructions can be seen 
in Figure 4.5 of Chapter 4. To hide the latency and to achieve the functionality needed 
for 'GET' and 'PUT', the operational work of the ESS is distributed over three pipeline 
stages. The functional block diagram of the ESS can be seen in Figure 7.1 and the three-stage 
pipelined version of the ESS can be seen in Figure 7.2. Figures 7.1 and 7.2 show only a 
high-level view of the ESS and its pipelined version with its primary inputs and outputs. 
A detailed description of each stage of the pipelined ESS, with supplementing diagrams and 
additional control signals, is given in the following subsections with Figures 7.3, 7.4 and 
7.5. 
[Block diagram not reproduced. Recoverable labels: clk, clock, GET, PUT, VALUE (64), GF.] 
Figure 7.1. High-Level Block Diagram of ESS 
[Block diagram not reproduced. Recoverable labels: CAM for Tag, MUX, match, empty, 
addr (5), mux addr, exp time out. 
Legend: L1 - TM/ELTC Stage Latch; L2 - ELTC/EUD Stage Latch; 
ELC - Empty Location Calculating block.] 
Figure 7.2. High-Level View of Three-Stage Pipelined ESS 
The first stage is the 'Tag Match (TM)' stage, in which the tag to match is given as 
input along with the operation ('GET' or 'PUT') to be performed. As all other 
components of the ESS such as Value, Expiration Time and Empty values are placed in 
RAM, the second pipeline stage, called the 'Empty Location and Lifetime Check 
(ELTC)' stage, checks the lifetime of the corresponding (tag, value) binding in the 
Expiration Time RAM if there is a tag match, or checks for an empty location if it is a 
'PUT' operation and there is no match. The third stage, called the 'ESS Update (EUD)' 
stage, updates the (tag, value) binding in the ESS if it is a 'PUT' operation or retrieves a 
value if it is a 'GET' operation. The failure of a 'GET' operation, 'GET Failed' (GF), 
is known from stage 1 or stage 2, and the failure of a 'PUT' operation, 'PUT Failed' (PF), 
is known from stage 2. The following subsections describe each pipeline stage in detail. 
7.1.1 Tag Match (TM) Stage 
The first stage, called the 'Tag Match (TM)' stage, contains only the CAM 
(32x64) with its necessary control signals. The design of the CAM and its operation have 
already been discussed in Chapter 4. The CAM and its control signals form the entire first 
stage. 










[Block diagram not reproduced. Recoverable labels: TAG (64), clk, CAM for Tag.] 
Figure 7.3. CAM in Tag Match (First) Stage of Pipelined ESS 
For both 'PUT' and 'GET' functionality, the tag to match is given to the CAM along with the 
specified operation, and if there is a match, the match signal (match sig) goes high and 
the corresponding 5-bit match address (match addr) is produced in one clock cycle. The control 
signals Match Enable (ME), Match Reset (MR), Write Ram (WR), Write Enable (WE) 
and Erase Ram (ER) are generated internally in this pipeline stage from the existing 
signals of this stage and control signals from the remaining two stages. 'ME' is 
activated on either 'GET' or 'PUT'. The CAM also takes part in the third stage of the 
pipeline, where a new tag is written if it is a 'PUT' operation. So the 'WR' and 'WE' 
signals are activated in the third stage if it is a 'PUT' operation, there is no match in 
the first stage and there is an empty location for the new tag. The 'mux addr' comes from 
choosing between the empty address and the match address. 
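The lookup performed in the TM stage can be illustrated with a small software stand-in for 
the 32x64 CAM. This is a sketch only: the class `TagCam` and its method names are 
hypothetical, the loop replaces what the hardware does as a parallel comparison in one 
clock cycle, and returning address 0 when nothing matches is an assumption. 

```python
CAM_DEPTH = 32   # number of entries, addressed with 5 bits
TAG_BITS = 64    # width of each stored tag


class TagCam:
    """Software stand-in for the tag CAM of the TM stage."""

    def __init__(self):
        self.tags = [None] * CAM_DEPTH   # None marks an entry that holds no tag

    def write(self, addr, tag):
        """Store a tag at a given address (done in the third stage on a new 'PUT')."""
        assert 0 <= addr < CAM_DEPTH and 0 <= tag < (1 << TAG_BITS)
        self.tags[addr] = tag

    def match(self, tag):
        """Return (match_sig, match_addr); match_addr is 0 when nothing matches."""
        for addr, stored in enumerate(self.tags):
            if stored == tag:
                return 1, addr
        return 0, 0


cam = TagCam()
cam.write(5, 0x0000000000000001)
print(cam.match(0x0000000000000001))   # tag present
print(cam.match(0x0000000000000002))   # tag absent
```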
7.1.2 Empty Location and Lifetime Check (ELTC) Stage 
The ELTC stage, shown in Figure 7.4, is the second stage of the pipelined ESS. It 
consists of a Multiplexer (MUX) for choosing either the empty address or the match address, 
the Empty RAM, the Empty Location Calculating (ELC) block, the Expiration Time RAM and 
the Expiration Time Calculating (ETC) block. All components and control signals for the 
ELTC stage are shown in Figure 7.4. 













[Block diagram not reproduced. Recoverable labels: empty, match sig, data out, ELC, 
match addr. Legend: ELC - Empty Location Calculating block.] 
Figure 7.4. Components of ELTC (Second) Stage of Pipelined ESS 
Depending on the tag match from the first stage, the address for the whole of the ESS is 
chosen by the multiplexer. The operation of the ELC and ETC is the same as 
described in Chapter 4. If it is a 'GET' operation and there is a match from the first 
stage, the lifetime of the (tag, value) binding is checked by the ETC block by comparing 
the current clock value with the value read from the Expiration Time RAM at the match 
address location. 'Get Failed (GF)' is generated either from the first stage if there is no 
match or from the second stage if the lifetime of the binding has expired. On success of 
'GET', the 'mux addr' is given to the third stage for retrieving the value. If it is a 'PUT' 
operation and there is a match from the first stage, the second stage checks the 
expiration time to decide whether or not to update it in the third stage. On failure of a 
match from the first stage, the Empty RAM is checked for an empty location in which to 
place the new (tag, value) binding in the ESS. 'Put Failed (PF)' is generated in this stage 
if there is no match and no empty location. Writing to the Empty RAM and the Expiration 
Time RAM takes place in the third stage when needed: the Empty RAM is updated on a 'PUT' 
operation for a new (tag, value) pair and the Expiration Time RAM on the expiration of 
the lifetime of an existing (tag, value) pair. 
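The decisions made in the ELTC stage can be sketched as a single function. This is a 
simplified model under stated assumptions: the names `first_empty` and `eltc_stage` are 
hypothetical, a set bit in the 32-bit empty vector is taken to mean "location is empty", 
the ELC is modelled as picking the lowest empty address, and a binding is treated as 
expired once the current clock value reaches the stored expiration time. The actual ELC 
and ETC are those of Chapter 4. 

```python
def first_empty(empty_bits):
    """ELC stand-in: address of the lowest set bit in the 32-bit empty vector, or None."""
    for addr in range(32):
        if (empty_bits >> addr) & 1:
            return addr
    return None


def eltc_stage(op, match_sig, match_addr, empty_bits, exp_time, now):
    """Second-stage decision; returns (mux_addr, gf, pf, expired)."""
    if match_sig:
        expired = now >= exp_time[match_addr]   # ETC: compare clock with stored expiry
        gf = op == "GET" and expired            # an expired binding cannot be read
        return match_addr, gf, False, expired
    if op == "GET":
        return None, True, False, False         # no match: GF already known from stage 1
    empty_addr = first_empty(empty_bits)        # 'PUT' of a new tag needs a free location
    return empty_addr, False, empty_addr is None, False


exp_time = [0] * 32
exp_time[5] = 100
print(eltc_stage("GET", 1, 5, 0, exp_time, now=50))        # live binding
print(eltc_stage("GET", 1, 5, 0, exp_time, now=150))       # expired binding
print(eltc_stage("PUT", 0, 0, 0b1000, exp_time, now=50))   # new tag, free location 3
print(eltc_stage("PUT", 0, 0, 0, exp_time, now=50))        # new tag, ESS full
```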
7.1.3 ESS Update (EUD) Stage 











Figure 7.5. Main Component of EUD (Third) Stage of Pipelined ESS 
The EUD stage contains the Value RAM for retrieving the value if there is a successful 
'GET' operation and for updating (writing) the value if there is a 'PUT' operation. Other 
components of the ESS such as the CAM, Expiration Time RAM and Empty RAM are also 
updated in this third stage if it is a 'PUT' operation for a new (tag, value) binding 
(see Figure 7.2). When the lifetime of an existing (tag, value) pair has expired on a 
'PUT' operation, only the Expiration Time RAM gets updated. No operation is 
performed in the third stage if 'GET' or 'PUT' fails ('GF' or 'PF'). 
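The updates made in the EUD stage can be illustrated with the following behavioral 
sketch. It is not the VHDL design: the ESS is modelled as a dict of Python lists, the 
names `new_ess`, `eud_stage` and `LIFETIME` are hypothetical, and the lifetime value is 
an arbitrary placeholder. The sketch also assumes that the Value RAM is written on every 
successful 'PUT', and that for an existing but expired binding the Expiration Time RAM is 
the only one of the other components that changes. 

```python
LIFETIME = 10  # hypothetical binding lifetime, in clock ticks


def new_ess(depth=32):
    """Create empty stand-ins for the CAM, Value RAM, Expiration Time RAM and Empty RAM."""
    return {"cam": [None] * depth, "value": [0] * depth,
            "exp": [0] * depth, "empty": [True] * depth}


def eud_stage(ess, op, tag, value, mux_addr, matched, expired, failed, now):
    """Apply third-stage updates to `ess`; return the value read by 'GET', else None."""
    if failed:                                # GF or PF: no operation in this stage
        return None
    if op == "GET":
        return ess["value"][mux_addr]         # successful 'GET' reads the Value RAM
    ess["value"][mux_addr] = value            # assumption: every successful 'PUT' writes it
    if not matched:                           # new binding: update CAM, expiry, empty flag
        ess["cam"][mux_addr] = tag
        ess["exp"][mux_addr] = now + LIFETIME
        ess["empty"][mux_addr] = False
    elif expired:                             # existing but expired: refresh the expiry
        ess["exp"][mux_addr] = now + LIFETIME
    return None


ess = new_ess()
eud_stage(ess, "PUT", 0x1, 7, mux_addr=3, matched=False, expired=False, failed=False, now=0)
got = eud_stage(ess, "GET", 0x1, None, mux_addr=3, matched=True, expired=False,
                failed=False, now=5)
print(got)
```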
Operations formerly performed in five clock cycles in the original ESS 
organization/architecture have been distributed over a three stage pipelined ESS. The 
next section deals with how this three stage ESS is incorporated into the existing 
Four-Stage pipelined ESPR.V1, resulting in a Five-Stage pipelined ESPR.V2 architecture. 
7.2. Five-Stage Pipelined ESPR.V2 Architecture 
To improve the performance of the ESPR architecture further, the ESS is 
pipelined as described above, and the ESPR.V1 architecture is further pipelined into the 
ESPR.V2 architecture by including the pipelined ESS and making some necessary modifications 
to the existing ESPR.V1 architecture. The basic operations performed by the ESPR, its 
functionality, and the macro and micro instructions of the already defined ISA of 
ESPR.V1 remain the same for ESPR.V2. 
ESPR.V2 is a Five-Stage pipelined architecture with Instruction Fetch (IF), 
Instruction Decode (ID), Instruction Execute/Tag Match (ETM), Branch Detection/Life 
Time Check (LTC), and ESS/Register Update (UD) stages, allowing five instructions to be 
active in the pipeline at the same time. Figure 7.6 shows the ESPR.V2 pipelined 
microarchitecture. The IF and ID stages of ESPR.V2 are similar to those of ESPR.V1. The 
EX stage of ESPR.V1 is split into two execute stages, ETM and LTC, 
in ESPR.V2. The WB stage of ESPR.V1 is transformed into the UD stage of ESPR.V2 
for updating both the register file and the ESS. All sequential functional units, including 
the ESS components in each stage, operate at the Master Clock (MC - clk_pipe) frequency, 
and the Packet RAM operates at twice the MC frequency to enable proper packet 
processing. The architecture contains full hazard detection and elimination capability. 




[Figure 7.6: block diagram of the ESPR.V2 pipelined microarchitecture not reproduced. 
Recoverable labels: From Instruction Overflow Exception; From Macctrlr Address; Branch 
from LTC stage; EXFlush; ctrl sigs; CAM in ESS; PKT PROC; Output.] 
The Micro Controller in the ID stage generates the required control signals for all 
remaining functional units in the ETM, LTC and UD stages. The ETM stage consists of 
the first stage of the pipelined ESS (the CAM) and other execution units: the ALU, the 
Shifter, the Packet Processing unit, registers related to the packet processing module 
such as the Flag Register (FLR) and Output Code Register (OCR), the Micro Opcode Register 
(MOR), a Forwarding Unit to eliminate hazards, and some multiplexers. Values read from the 
ID stage register files, the sign extended value or forwarded values from the ETM or LTC 
stage are given to the ALU and Shifter for arithmetic, logical and shift operations. The 
Forwarding Unit provides control signals to the Forward Multiplexers to choose 
input values for the ALU, Shifter, Packet Processing Module (for the 'STPR' micro 
instruction) and the CAM. 
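The choice made by a Forward Multiplexer can be illustrated with a short sketch. The 
selection rule shown here is the conventional one for pipelined processors (the most 
recently produced result wins) and is an assumption, not a description of the actual 
Forwarding Unit logic; the function `select_operand` and the dictionary fields 
`reg_write`, `dest` and `value` are hypothetical names. 

```python
def select_operand(src_reg, regfile_value, etm_result, ltc_result):
    """Choose an operand: a forwarded ETM or LTC stage result, or the register file value.

    etm_result and ltc_result describe the two older instructions still in the pipeline:
    whether each writes a register, which register, and the value it produced.
    """
    if etm_result["reg_write"] and etm_result["dest"] == src_reg:
        return etm_result["value"]    # newest producer takes priority
    if ltc_result["reg_write"] and ltc_result["dest"] == src_reg:
        return ltc_result["value"]    # older producer, not yet written back
    return regfile_value              # no hazard: use the value read in the ID stage


etm = {"reg_write": True, "dest": 2, "value": 11}
ltc = {"reg_write": True, "dest": 2, "value": 22}
print(select_operand(2, 99, etm, ltc))   # forwarded from the ETM stage result
print(select_operand(3, 99, etm, ltc))   # no dependency, register file value
```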
The Packet Processing Unit of the ETM pipeline stage consists of a Packet RAM 
(PR), a Packet Processing Unit Controller, a Cyclic Redundancy Check (CRC) calculation 
unit and processing modules for the Load From Packet RAM (LFPR) and Store To Packet 
RAM (STPR) instructions. A high-level functional view of the Packet Processing Unit is 
shown in Figure 7.7. 
[Block diagram not reproduced. Recoverable labels: Input from MUX; IN, OUT, LFPR, STPR, 
EOP_in, OR_Ready, IDV; Packet Processing Unit Controller; Packet RAM (PR) (128 x 32); 
CRC Calculation Module; Processing Unit for LFPR and STPR; Offset from ID stage; Value; 
ctrl sigs.] 
Figure 7.7. High-Level View of Packet Processing Unit 



















The Controller in the Packet Processing unit generates the necessary control 
signals for the PR, the CRC module and the processing module for LFPR and STPR 
instructions. The PR is 32 bits wide, so that it accepts one 32-bit block of an incoming 
packet in each clock cycle, and 128 words deep to hold the maximum packet size; it can 
be made deeper without any change to the existing design. As the PR is 32 bits wide, 
both LFPR and STPR instructions take 2 clock cycles to handle 64-bit data, and so the 
whole packet processing module operates at twice the ESPR.V2 pipeline frequency 
(clk_pipe). The CRC calculation module checks the CRC of each incoming packet before 
the ESP macro instruction is processed further. It also calculates the CRC for outgoing 
ESP packets and places it at the end of the packet before the packet is given to the 
output port. 
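The two-cycle access described above can be illustrated with a small behavioral sketch in Python. It is not the thesis's VHDL: the class name, the method names, the interpretation of the offset as a count of 64-bit units, and the high-word-first ordering are all assumptions made for illustration. Only the 128 x 32 geometry and the two-accesses-per-64-bit-value behavior come from the text.

```python
class PacketRAM:
    """Toy model of a 128 x 32 packet RAM; names and layout are hypothetical."""

    DEPTH = 128            # words, from "(128 X 32)" in Figure 7.7
    MASK32 = 0xFFFFFFFF

    def __init__(self):
        self.words = [0] * self.DEPTH
        self.clk_p_cycles = 0   # cycles of the fast packet-unit clock

    def _read32(self, addr):
        self.clk_p_cycles += 1  # one 32-bit access per fast-clock cycle
        return self.words[addr]

    def _write32(self, addr, value):
        self.clk_p_cycles += 1
        self.words[addr] = value & self.MASK32

    def lfpr(self, offset):
        """Load 64 bits as two 32-bit reads (ASSUMED: high word first)."""
        hi = self._read32(2 * offset)
        lo = self._read32(2 * offset + 1)
        return (hi << 32) | lo

    def stpr(self, offset, value):
        """Store 64 bits as two 32-bit writes (same assumed ordering)."""
        self._write32(2 * offset, value >> 32)
        self._write32(2 * offset + 1, value)


ram = PacketRAM()
ram.stpr(3, 0x0000000100070004)
loaded = ram.lfpr(3)
# The fast clock runs at twice the pipeline clock, so two fast-clock
# cycles fit into one pipeline cycle.
pipeline_cycles = ram.clk_p_cycles // 2
```

Under these assumptions, one STPR followed by one LFPR costs four fast-clock cycles, which is two pipeline cycles, so each 64-bit access fits in a single pipeline stage time.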
Depending on the micro instruction, the Micro Controller in the ID stage of 
Figure 7.6 generates the Flag Register code to be placed in the FLR and PR, and the 
Output Code to be placed in the OCR. The control signals needed for the ETM, LTC and 
UD stages and the necessary inputs to the ETM stage are placed in the ID/ETM pipeline 
register for further operations of the current micro instruction. The fourth stage, the LTC 
stage, holds the Branch Detection Unit and the second stage of the ESS. The instruction 
following a 'PUT' or a 'GET' is always the Branch on PUT Failed ('BPF') or Branch on 
GET Failed ('BGF') instruction, respectively. The Branch Detection Unit placed in this 
stage makes use of the 'Put Failed (PF)' or 'Get Failed (GF)' signal from the second stage 
of the ESS to make the branch decision. Inputs to the Branch Detection Unit come from 
either the register files or the ETM stage. The control signals for the LTC and UD stages 
are forwarded from the ETM stage pipeline register to the ETM/LTC and LTC/UD 
registers, respectively. The final stage, the UD stage, holds the third stage of the ESS, 
which updates the Value RAM and CAM or retrieves from the Value RAM. Write-back to 
the register files, either from the previous stages or from the Value RAM, also happens in 
this stage. 
Chapter Eight 
VHDL Design Capture of ESPR Architectures 
Use of a Hardware Description Language (HDL) is one of the best ways to 
describe a system so that the design is vendor-independent, reusable and retargetable. 
Downloading the HDL design of a system to an FPGA chip is also convenient for 
systems that require reconfigurability. There are various HDL coding styles, 
including the behavioral, Register Transfer Level (RTL) and gate-level coding styles. 
The behavioral coding style is the easiest to write and to understand, but the 
synthesis and implementation CAD tool may not implement a behavioral design so that 
it operates as needed. In that case the existing behavioral design can be modified, or 
coding styles can be combined, so that the CAD tool implements the design on a silicon 
chip efficiently. After the architectural design of the ESPR had been developed, most of 
the time was spent on describing the functional modules in VHDL, on ways to improve 
them, and on applying constraints to the design with the CAD tool to improve functional 
and timing performance. This chapter discusses the HDL design approach used in 
describing the ESPR architecture, ways of initializing on-chip memory, and the 
constraints that can be applied to the design. Design capture of both the ESPR.V1 and 
ESPR.V2 architectures was done using the Xilinx Foundation 4.2i CAD tools with VHDL 
as the description language, and the described ESPR was implemented to a Virtex FPGA 
chip. The design was synthesized and post-synthesis simulated for functional 
validation, then implemented (as a virtual prototype) to the FPGA chip and 
post-implementation simulated for timing and performance validation. 
8.1 Design Partitioning and Design Capture 
Most of the functional units of the ESPR are described using behavioral code, or 
a combination of behavioral and RTL code where needed. Gate-level coding 
is also used for some modules to achieve the exact desired functionality on chip. 
Both ESPR.V1 and ESPR.V2 are designed with a bottom-up, hierarchical and 
modular approach. Since the design of the pipelined architectures involves many 
functional units, it was necessary to design and test the individual lower-level modules 
before using them to build the whole processor. The design of the ESPR is therefore 
partitioned into separate stages, which was straightforward to do on the basis of the 
pipeline stages. Bottom-up, hierarchical coding is needed in a design that interfaces 
separate functional modules, because each of the low-level modules must be shown to 
function correctly. The modular approach also helps to separate out 
individual modules and to reuse them where their functionality is identical. Figure 8.1 and 
Figure 8.2 illustrate how the code was laid out at a high level for ESPR.V1 and ESPR.V2. 










ESPR Functional Modules for 4-stage 
pipeline 




















ESPR.V2 has the same functional hierarchy as ESPR.V1, except that the EX stage is 
split into the Execute/Tag Match (ETM) stage and the Branch Detection and Lifetime 
Check (LTC) stage, and the WB stage is transformed into the updating stage for ESS and 

































ETM/L TC Functional 
Stage Components 
Register 
Figure 8.2. High-Level Hierarchy of ESPR.V2 
The 4-stage functional components of ESPR.V1 and the 5-stage functional components 
of ESPR.V2 are shown in the following figures, Figure 8.3 through Figure 8.9. The 
detailed hierarchies of the HDL design of both architectures are illustrated here to 
show the complexity involved in the pipelined processor design of these special purpose 
architectures. Utmost care was taken in the HDL description of the individual functional 
modules, and they were optimized for on-chip speed rather than chip area. After 
successful synthesis, simulation and implementation, the performance and the area 
occupied by the design on the chip are compared. Figure 8.3 and Figure 8.4 show the 
hierarchy of the IF and ID stage functional components, respectively, that are utilized by 












































Figure 8.4. High-Level Hierarchy of ID Stage for both ESPR.V1 and ESPR.V2 
Instruction memory in the IF stage was initially designed using the Lookup Tables (LUTs) 
of the Virtex FPGA in the ESPR.V1 design and later modified to use the core block RAM 
available on chip, which gave a significant performance improvement in the memory design. 
Various options for designing the register files GPR, TR and VR of the ID stage were 
studied, coded in VHDL and tested, and an optimized final design is used in the ESPR.V1 
and ESPR.V2 architectures. The following Figure 8.5 and Figure 8.6 show the high level 






















ALU, Shifter, ESS, 
Br.Det. Unit and 
PKT. PROC UNIT 






Figure 8.6. High-Level Hierarchy of WB Stage of ESPR.V1 
The Packet Processing unit and ESS of the EX stage have their own internal functional 
components, which are described in the previous chapters and are not shown here. The 
packet processing unit is the same for both architectures, ESPR.V1 and ESPR.V2. The 
whole of the ESS, placed in the EX stage of ESPR.V1, is split into three stages in 
ESPR.V2, as can be seen from the high-level hierarchy of the ETM, LTC and UD stages of 
ESPR.V2. 


















ALU, Shifter, CAM 
and PKT. PROC 
UNIT 




















Figure 8.9. High-Level Hierarchy of UD Stage of ESPR.V2 
8.2 Initializing the Memory Contents 
Using the Xilinx CAD tool, there are various ways to describe a memory unit 
using HDLs. A memory module can be hard coded as array-structured storage, built by 
stacking an existing RAM module primitive that is implemented as LUTs on chip, or 
built from the block (core) RAM available on chip. Of these three approaches, the use of 
core RAM turned out to be the most efficient and resulted in higher performance of the 
ESPR. Table 8.1 compares the LUT and Block RAM designs of the instruction memory 
in detail. 
Table 8.1. Comparison of designs for Instruction Memory 
Parameters          LUT Design          Block RAM Design 
Frequency (MHz)     27.59               63.9 
Delay (ns)          24.37               7.10 
Block RAMs Used     0                   4 
Gate Count          136,553             67,333 
Number of Slices    730/19,200 (3%)     87/19,200 (1%) 
Both the LUT and Block RAM designs were used in testing the instruction 
memory, and the above performance results were obtained. As required by the ESPR 
design, the instruction memory has to be preloaded with the micro instruction sequences 
that represent the macro code of an ESP service. The instruction memory therefore needs 
to be initialized with these micro instruction sequences. Considerable time was spent 
finding ways [17] to initialize the memory contents using the Xilinx HDL CAD tool. 
One easy way is to write the contents into memory and then read them out while 
performing the necessary simulation. Xilinx also provides a way to edit the memory 
contents in the simulation editor before performing the simulation. Neither of these two 
ways is practical, because the micro instruction sequences that must be in memory to 
provide an ESP service are large and occupy up to nearly 256 instruction memory 
locations. It is troublesome to write each micro instruction while performing the 
simulation, and also difficult to edit the contents for every simulation run. A further 
way is to initialize the memory through the constraints editor [17] provided by Xilinx. 
The same initialization can also be written in an external constraints file prior to 
synthesis of the whole design, or in the VHDL design file for the instruction memory. 
The following description shows these two ways of initializing the instruction memory. 
8.2.1. Initializing a RAM Primitive via a Constraints File 
An 'NCF' (Netlist Constraints File), 'filename.ncf', is used to initialize the 
memory contents. The NCF file must have the same root name as the input netlist (e.g., if 
the input netlist is named 'inst_mem.edf' then the NCF file should be named 
'inst_mem.ncf'), and the instance name(s) of the RAM primitive must also be known. The 
initialization is written in the NCF file as follows, 
INST "instname" INIT = Value; 
where 'instname' is the instance name of the RAM, which must be a RAM primitive, 
enclosed in quotes, and 'Value' is a hexadecimal number. 
For example, if the instance name of a 'RAM32X1S' primitive is 'RAM1', then the 
contents of 'RAM1' could be set by placing the following line in the NCF file. 
INST "RAM1" INIT = ABCD0000; 
The following example shows how the instruction memory can be initialized in the way 
described above. Consider the following instruction sequence to be initialized into 
memory. 
MOV R3, R1 - 1C61000001000000 (Eq. HEX Value for instruction) 
ADD R4, R3, R3 - 2483180001000000 
ADD R5, R4, R3 - 24A4180001000000 
The hexadecimal numbers on the right are the equivalent values for the micro instructions 
on the left, and the 64-bit values are laid out in the order (63 down to 0). Figure 8.10 
shows the contents of the NCF file for the above sequence. The instance name 
"esprcomp/IFFULL/ifpipecomp/instrmemnew/IMEM/R320/R321" describes the level of 
hierarchy, with the top level module 'esprcomp' on the left and the lowest level module 
'R321' at the end. 
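The INIT values of Figure 8.10 can be reproduced with a short Python sketch. It rests on one inference from the figure rather than on a statement in the text: each 'R32<n>' instance appears to hold bit n of the 64-bit instruction word, and bit a of its INIT value is that bit for the instruction at address a. The function name and its parameters are hypothetical.

```python
def ncf_init_lines(words, prefix, width=64, depth=32):
    """Return one NCF INIT line per bit position of the instruction word.

    ASSUMPTION (inferred from Figure 8.10): instance R32<bit> stores bit
    <bit> of every instruction, one RAM32X1S location per address.
    """
    if len(words) > depth:
        raise ValueError("a RAM32X1S holds at most 32 locations")
    lines = []
    for bit in range(width):
        init = 0
        for addr, word in enumerate(words):
            # Bit 'addr' of INIT is bit 'bit' of the word at that address.
            init |= ((word >> bit) & 1) << addr
        lines.append(f'INST "{prefix}/R32{bit}/R321" INIT={init:08X};')
    return lines


program = [
    0x1C61000001000000,  # MOV R3, R1
    0x2483180001000000,  # ADD R4, R3, R3
    0x24A4180001000000,  # ADD R5, R4, R3
]
prefix = "esprcomp/IFFULL/ifpipecomp/instrmemnew/IMEM"
lines = ncf_init_lines(program, prefix)
```

For the three-instruction example, bit 24 is set in all three words, so the line for R3224 carries INIT=00000007, which agrees with the corresponding entry in Figure 8.10.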
8.2.2. Initializing a Block RAM in VHDL 
The block RAM structures can be initialized in VHDL for both synthesis and 
simulation. The VHDL code uses a 'generic' to pass the initialization. Generic types 
are not supported by the present-day Synopsys FPGA Compiler, so a built-in dc_script 
directive (e.g., translate_off) is used and the initialization is attached to the RAM as 
attributes. The following Table 8.2 illustrates the RAM initialization properties to be 
used along with the generics in VHDL. Figure 8.11 shows the example instruction 
sequence, starting from address location zero, for the VHDL code described below. 
Figure 8.12 shows example VHDL code for initializing the block RAM for the instruction 
memory. 
INST "esprcomp/IFFULL/ifpipecomp/instrmemnew/IMEM/R320/R321" INIT=00000000; 
INST "esprcomp/IFFULL/ifpipecomp/instrmemnew/IMEM/R321/R321" INIT=00000000; 
INST "esprcomp/IFFULL/ifpipecomp/instrmemnew/IMEM/R322/R321" INIT=00000000; 
INST "esprcomp/IFFULL/ifpipecomp/instrmemnew/IMEM/R3223/R321" INIT=00000000; 
INST "esprcomp/IFFULL/ifpipecomp/instrmemnew/IMEM/R3224/R321" INIT=00000007; 
INST "esprcomp/IFFULL/ifpipecomp/instrmemnew/IMEM/R3225/R321" INIT=00000000; 
INST "esprcomp/IFFULL/ifpipecomp/instrmemnew/IMEM/R3243/R321" INIT=00000006; 
INST "esprcomp/IFFULL/ifpipecomp/instrmemnew/IMEM/R3244/R321" INIT=00000006; 
INST "esprcomp/IFFULL/ifpipecomp/instrmemnew/IMEM/R3245/R321" INIT=00000000; 
INST "esprcomp/IFFULL/ifpipecomp/instrmemnew/IMEM/R3246/R321" INIT=00000000; 
INST "esprcomp/IFFULL/ifpipecomp/instrmemnew/IMEM/R3247/R321" INIT=00000000; 
INST "esprcomp/IFFULL/ifpipecomp/instrmemnew/IMEM/R3248/R321" INIT=00000003; 
INST "esprcomp/IFFULL/ifpipecomp/instrmemnew/IMEM/R3249/R321" INIT=00000002; 
INST "esprcomp/IFFULL/ifpipecomp/instrmemnew/IMEM/R3250/R321" INIT=00000004; 
INST "esprcomp/IFFULL/ifpipecomp/instrmemnew/IMEM/R3251/R321" INIT=00000000; 
INST "esprcomp/IFFULL/ifpipecomp/instrmemnew/IMEM/R3252/R321" INIT=00000000; 
INST "esprcomp/IFFULL/ifpipecomp/instrmemnew/IMEM/R3253/R321" INIT=00000005; 
INST "esprcomp/IFFULL/ifpipecomp/instrmemnew/IMEM/R3254/R321" INIT=00000001; 
INST "esprcomp/IFFULL/ifpipecomp/instrmemnew/IMEM/R3255/R321" INIT=00000006; 
INST "esprcomp/IFFULL/ifpipecomp/instrmemnew/IMEM/R3256/R321" INIT=00000000; 
INST "esprcomp/IFFULL/ifpipecomp/instrmemnew/IMEM/R3257/R321" INIT=00000000; 
INST "esprcomp/IFFULL/ifpipecomp/instrmemnew/IMEM/R3258/R321" INIT=00000007; 
INST "esprcomp/IFFULL/ifpipecomp/instrmemnew/IMEM/R3259/R321" INIT=00000001; 
INST "esprcomp/IFFULL/ifpipecomp/instrmemnew/IMEM/R3260/R321" INIT=00000001; 
INST "esprcomp/IFFULL/ifpipecomp/instrmemnew/IMEM/R3261/R321" INIT=00000006; 
INST "esprcomp/IFFULL/ifpipecomp/instrmemnew/IMEM/R3262/R321" INIT=00000000; 
INST "esprcomp/IFFULL/ifpipecomp/instrmemnew/IMEM/R3263/R321" INIT=00000000; 
Figure 8.10. NCF file for Initializing Instruction Memory 
Table 8.2. Block RAM Initialization Properties 
Property    Memory Cells 
INIT_00     255 to 0 
INIT_01     511 to 256 
...         ... 




MOVI R4, 1 
ADD R5, R4, R3 
MOV TR1, R5 
MOV VR1, R5 
PUT TR1, VR1 
BPF ADDR1 (0x41) 
NOP 
NOP 
LFPR <Off-3> TR1 
GET TR1, VR1 
BGF ADDR2 (0x1B) 
NOP 
NOP 
INCR R4, VR1 
Figure 8.11. Example Micro Instruction Sequence 
-- Instruction Memory Design using Block RAM 
library IEEE; 
use IEEE.std_logic_1164.all; 
--synopsys translate_off 
library unisim; 
use unisim.vcomponents.all; 
--synopsys translate_on 
entity INSTMEM is 
port(clk, we, en, rst: in std_logic; 
addr: in std_logic_vector(7 downto 0); 
inst_in: in std_logic_vector(63 downto 0); 
inst_out: out std_logic_vector(63 downto 0)); 
end entity INSTMEM; 
architecture behavioural of INSTMEM is 
component RAMB4_S16 is 
port(ADDR: in std_logic_vector(7 downto 0); 
CLK: in std_logic; 
DI: in std_logic_vector(15 downto 0); 
DO: out std_logic_vector(15 downto 0); 
EN, RST, WE: in std_logic); 
end component RAMB4_S16; 
Figure 8.12. VHDL Code for Instruction Memory using Block RAM 
attribute INIT_00: string; 
attribute INIT_01: string; 
attribute INIT_02: string; 
attribute INIT_03: string; 
attribute INIT_04: string; 
attribute INIT_05: string; 
attribute INIT_06: string; 
attribute INIT_07: string; 
attribute INIT_08: string; 
attribute INIT_09: string; 
attribute INIT_0A: string; 
attribute INIT_0B: string; 
attribute INIT_0C: string; 
attribute INIT_0D: string; 
attribute INIT_0E: string; 
attribute INIT_0F: string; 
attribute INIT_00 of Instram0 : label is 
"00000000000006C0000000C00000000010400000000000000000004000000000"; 
attribute INIT_00 of Instram1 : label is 
"0100000000000000800000000000000000000000810001000100010000000000"; 
attribute INIT_00 of Instram2 : label is 
"0001000000000000004100600000000000000041000100601800000000000000"; 
attribute INIT_00 of Instram3 : label is 
"2C80000000008000780054000000000084007C001C051C0524A4208000000400"; 
begin 
Instram0: RAMB4_S16 
--synopsys translate_off 
GENERIC MAP (INIT_00 => 
X"00000000000006C0000000C00000000010400000000000000000004000000000") 
--synopsys translate_on 
port map(ADDR=>addr, CLK=>clk, DI=>inst_in(15 downto 0), DO=>inst_out(15 
downto 0), EN=>en, RST=>rst, WE=>we); 
Instram1: RAMB4_S16 
--synopsys translate_off 
GENERIC MAP (INIT_00 => 
X"0100000000000000800000000000000000000000810001000100010000000000") 
--synopsys translate_on 
port map(ADDR=>addr, CLK=>clk, DI=>inst_in(31 downto 16), DO=>inst_out(31 
downto 16), EN=>en, RST=>rst, WE=>we); 
Instram2: RAMB4_S16 
--synopsys translate_off 
GENERIC MAP (INIT_00 => 
X"0001000000000000004100600000000000000041000100601800000000000000") 
--synopsys translate_on 
port map(ADDR=>addr, CLK=>clk, DI=>inst_in(47 downto 32), DO=>inst_out(47 
downto 32), EN=>en, RST=>rst, WE=>we); 
Instram3: RAMB4_S16 
--synopsys translate_off 
GENERIC MAP (INIT_00 => 
X"2C80000000008000780054000000000084007C001C051C0524A4208000000400") 
--synopsys translate_on 
port map(ADDR=>addr, CLK=>clk, DI=>inst_in(63 downto 48), DO=>inst_out(63 
downto 48), EN=>en, RST=>rst, WE=>we); 
end architecture behavioural; 
Figure 8.12. VHDL Code for Instruction Memory using Block RAM (continued) 
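The INIT_00 strings in Figure 8.12 can be derived mechanically from the 64-bit instruction words. The Python sketch below shows one way to do this, assuming (as the port maps in the figure suggest) that each of the four RAMB4_S16 blocks stores one 16-bit slice of the instruction, and that the rightmost four hex digits of a string correspond to address 0. The function name and its parameters are hypothetical, and only the first 16 addresses (property INIT_00) are covered.

```python
def init_00_strings(words, lanes=4, lane_bits=16, words_per_init=16):
    """Build the INIT_00 hex string for each 16-bit lane of the program.

    Lane 0 holds instruction bits 15..0 and lane 3 holds bits 63..48.
    ASSUMPTION: address 0 occupies the rightmost four hex digits.
    """
    padded = list(words[:words_per_init])
    padded += [0] * (words_per_init - len(padded))   # unused cells stay zero
    mask = (1 << lane_bits) - 1
    strings = []
    for lane in range(lanes):
        slices = [(w >> (lane * lane_bits)) & mask for w in padded]
        # Emit the highest address first so address 0 ends up rightmost.
        strings.append("".join(f"{s:04X}" for s in reversed(slices)))
    return strings


program = [
    0x1C61000001000000,  # MOV R3, R1
    0x2483180001000000,  # ADD R4, R3, R3
    0x24A4180001000000,  # ADD R5, R4, R3
]
init_00 = init_00_strings(program)
```

Each returned string is 64 hex digits long (256 bits), matching the 255 to 0 cell range that Table 8.2 gives for INIT_00. The same layout can be checked against Figure 8.12: the slice 24A4 of 'ADD R5, R4, R3' appears in the Instram3 string at the position of the address where that instruction is stored.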
8.3 Timing Constraints 
The Xilinx Foundation 4.2i CAD tool provides a means (the constraints editor) for 
specifying constraints on timing, placement, mapping, routing, etc., for a specific design 
to improve the area and/or speed of the design. It is up to the designer to consult the 
Xilinx Constraints Guide [17] and apply the constraints needed for the design. The 
constraints can be specified externally using a 'UCF' (User Constraints File) or can be 
entered using the constraints editor. In the ESPR architecture design, only timing 
constraints of limited extent were applied to ESPR.V1 and ESPR.V2 to determine the 
performance. It is believed that better performance can be obtained for any design by 
applying tighter constraints, at the expense of longer synthesis and implementation time. 
Chapter Nine 
Post-Implementation Simulation Validation of ESPR.V2 Architecture 
Following the design layout and design description of the ESPR.V2, the next step 
is to synthesize and simulate the design. After HDL post-synthesis simulation to validate 
functional correctness, and prior to implementing and prototyping the design on an FPGA 
prototype board, the ESPR.V2 has to be validated for both functional and timing 
(performance) correctness via Post-Implementation simulation. This process is referred to 
as virtual hardware prototyping, as it involves timing validation of the system. This 
chapter presents the Post-Implementation HDL simulation validation of the ESPR.V2 
architecture. Simulation results are presented in a step-by-step fashion. The ESPR.V2 
was first simulated executing single micro instructions to validate their correct functional 
and timing operation. The ESPR.V2 architecture was then simulated executing short 
sequences of micro instructions. Lastly, it was validated that the ESPR.V2 architecture 
correctly executes all macro instructions for which it was developed. 
Post-Implementation simulation validation of the architectural design was performed on a 
PC (Personal Computer) system with a Pentium III 550 MHz processor, the Windows 2000 
platform and 640 MB (megabytes) of RAM. The HDL simulator used is 
contained within the Xilinx Foundation 4.2i CAD tool set utilized during this research 
project. The logic resources utilized in the Xilinx Virtex2 - 4000 FPGA chip to 
implement the described ESPR.V2 architecture are given in the following Table 9.1. 
Table 9.1. Logic Resources Utilization for ESPR.V2 Architecture 
Resources                  Utilization 
4-Input LUTs               5,902 
Flip flops                 916 
Block RAMs                 33 
Equivalent System Gates    2,256,291 
9.1 Validation of Correct Execution of Single Micro Instructions 
All the micro instructions described for the ESPR.V2 architecture were tested 
separately for their functional and timing correctness. The individual pipeline stages for 
each instruction were also tested for proper generation of control and data signals. This 
section presents two micro instructions flowing through the five pipeline stages to show 
correct execution in all stages. The first micro instruction to be presented is the Shift 
Right (SHR) micro instruction, shown in the following Figures 9.1a and 9.1b. The micro 
instruction with its equivalent hexadecimal format is given as, 
SHR R7, R4, 1 - 48E4000001000001 (Hex. Format) 
From Figure 9.1a, after the Instruction Fetch (IF) stage at approximately 10 µs, 
the SHR instruction is read out from the 'instchk' value. Prior to that, a value of '0xA' is 
written into register R4 using the 'MOVI' micro instruction, so that the 'SHR' 
instruction results in a value of '0x5' in register R7. The opcode for the SHR instruction 
(0x12) is decoded and read from the 'op2o' variable at the end of the Instruction Decode 
(ID) stage at approximately 12 µs. The Shifter is placed in the ETM stage of ESPR.V2; 
the value of '0xA' from register R4 is shifted right by one position as specified by 
the instruction, and a value of '0x5' is read out from the ETM stage. 
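The decode and shift steps just described can be checked with a few lines of Python. The field positions used here are not stated in this section; they are inferred from the example encodings in this document (for instance, 0x48E4... yields opcode 0x12, destination R7 and source R4) and should be treated as assumptions. The function names are hypothetical, and the shift is assumed to be logical.

```python
def decode(word):
    """Split a 64-bit micro instruction into fields (positions inferred)."""
    return {
        "opcode": (word >> 58) & 0x3F,   # bits 63..58
        "rd":     (word >> 53) & 0x1F,   # bits 57..53, destination register
        "rs1":    (word >> 48) & 0x1F,   # bits 52..48, first source
        "rs2":    (word >> 43) & 0x1F,   # bits 47..43, second source
    }


def shr(value, amount):
    """Shift a 64-bit value right (ASSUMED logical, not arithmetic)."""
    return (value & 0xFFFFFFFFFFFFFFFF) >> amount


fields = decode(0x48E4000001000001)    # SHR R7, R4, 1
result = shr(0xA, 1)                   # R4 holds 0xA before the shift
add_fields = decode(0x24A4180001000000)  # ADD R5, R4, R3
```

With these assumed field positions, the SHR word decodes to opcode 0x12 with destination 7 and source 4, and shifting 0xA right by one gives 0x5, in agreement with the values reported for Figure 9.1a.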
[Simulation waveform. Traced signals: clk_pi, clr, inst_in, pcout, instchk, datachkID, 
op2o, se2o, alu3o, GPR12o. Annotations: SHR instruction at the end of the IF stage; 
opcode decoded for SHR and output at the end of the ID stage; Shifter output of the 
third (ETM) stage.] 
Figure 9.1a. Simulation Output for SHR Micro Instruction 
Figure 9.1b shows the continuation of the simulation output of the SHR micro 
instruction. The LTC stage passes the data value of '0x5' through, and the value is 
written back to register R7 during the UD stage. This can be seen from the value also 
being read out by the variable 'GPR12o' of Figure 9.1b at approximately 18 µs. 
[Simulation waveform, continued. Annotations: pass-through output of the fourth (LTC) 
stage; value being written into register R7 during the UD stage.] 
Figure 9.1b. Simulation Output for SHR Micro Instruction (continued) 
Figures 9.2a and 9.2b show another micro instruction, 'Load From Packet RAM 
(LFPR)'. This instruction utilizes the packet processing unit. LFPR with its hexadecimal 
code is given as, 
LFPR <Off-3> TR1 - 54000060000000C0 
The instruction is fetched and the opcode is decoded in the same way as for the previous 
SHR micro instruction, as can be seen from Figure 9.2a. In Figure 9.2a, 'clk_p' is the 
clock used within the packet processing unit, which operates at twice the frequency of 
'clk_pi' (the pipeline clock of ESPR.V2). During the ETM stage, the value of '0x1' is 
retrieved from the Packet RAM at offset '3' as two 32-bit values in two clock cycles 
(clk_p), visible on the variable 'po' starting at nearly 34.5 µs, and the ETM stage 
outputs the 64-bit value '0x1' from the packet processing unit at 36 µs. 
[Simulation waveform. Annotations: LFPR instruction output at the end of the ID stage; 
PKT PROC Unit output from the third (ETM) stage of ESPR.V2.] 
Figure 9.2a. Simulation Output for LFPR Micro Instruction 
Figure 9.2b shows the continuation of the simulation output for the LFPR instruction. At 
38 µs, the output value for the LFPR instruction is passed through the LTC stage, and 
during the UD stage the value of '0x1' is written to the tag register TR1. 
[Simulation waveform, continued. Annotations: pass-through output of the fourth (LTC) 
stage; value being written into tag register TR1 during the UD stage.] 
Figure 9.2b. Simulation Output for LFPR Micro Instruction (continued) 
9.2 Small Micro Program and Individual Functional Unit Testing of ESPR.V2 
In the testing process of ESPR.V2, proper execution of single micro instructions 
was validated first, followed by the validation of small micro program sequences. This 
section presents the validation of two small program sequences and a couple of 
instructions, which also validates the functionality of the main individual 
functional units such as the ALU, the Packet Processing Unit and the ESS. As ESPR.V2 is 
a five-stage pipelined processor, the micro program sequences explained below contain 
hazards, and the following simulation results also validate the hazard detection unit and 
the forwarding unit that eliminates the hazards. 
9.2.1 Validation of ALU Unit and JMP Instruction of ESPR.V2 
Figure 9.3 shows the micro instruction program sequence used to validate the 
ALU unit and the JMP instruction; it also provides an example to test the forwarding 
unit. The micro instruction memory is preloaded with the bit patterns for this instruction 
sequence. As can be seen from the program sequence, a data hazard arises when the next 
instruction in sequence, in the ID stage, wants to read a register before the preceding 
instruction has written its new value into that register in the UD stage. The forwarding 
unit of ESPR.V2 resolves the hazard by forwarding the needed value from either 
the ETM stage or the LTC stage to the input of the ETM stage. 
0. MOVI R3, 1 - 2060000001000040 
1. ADD R4, R3, R3 - 2483180001000000 
2. ADD R5, R4, R3 - 24A4180001000000 (Data Hazard - R4) 
3. JMP 32h - 7000000000000C80 
Figure 9.3. Program for Validating ALU Unit and JMP Instruction 
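The forwarding decisions for the program in Figure 9.3 can be sketched in Python as follows. This is a simplified model, not the thesis's forwarding unit: it assumes the instructions issue back to back without stalls, that the result of the immediately preceding instruction is available at the ETM stage output, and that the result of the instruction before that is available at the LTC stage output. All function and variable names are hypothetical.

```python
def forward_source(src_reg, etm_dest, ltc_dest):
    """Choose where an ETM-stage operand comes from.

    etm_dest / ltc_dest: destination register of the older instruction whose
    result sits at the ETM or LTC stage output (None if there is none).
    """
    if src_reg == etm_dest:      # the most recent producer takes priority
        return "ETM"
    if src_reg == ltc_dest:
        return "LTC"
    return "REGFILE"


def forwarding_plan(program):
    """Return, per instruction, the chosen source for each operand."""
    plan = []
    for i, (_, _, sources) in enumerate(program):
        etm_dest = program[i - 1][1] if i >= 1 else None
        ltc_dest = program[i - 2][1] if i >= 2 else None
        plan.append([forward_source(s, etm_dest, ltc_dest) for s in sources])
    return plan


# (mnemonic, destination, source registers); immediates are omitted
program = [
    ("MOVI", "R3", []),
    ("ADD", "R4", ["R3", "R3"]),
    ("ADD", "R5", ["R4", "R3"]),
]
plan = forwarding_plan(program)
```

In this model the second ADD takes R4 from the ETM stage output and R3 from the LTC stage output, which are the two forwarding paths the text describes. Producers more than two instructions ahead are read from the register file here; how the real design handles a simultaneous write and read in the UD stage is not modeled.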
The following Figures 9.4a and 9.4b show the Post-Implementation simulation validation 






















[Simulation waveform. The 'instchk' trace shows the fetched words 2060000001000040, 
2483180001000000, 24A4180001000000 and 7000000000000C80. Annotations: fetching of 
instructions of the micro program sequence from instruction memory; output of the first 
MOVI instruction at the UD stage (5th stage).] 
Figure 9.4a. Simulation Output for ALU Unit and JMP Instruction Validation 
[Simulation waveform, continued. Annotations: Jump micro instruction; JUMP signal; PC 
value changed to the JUMP address; output of the first ADD instruction; output of the 
second ADD instruction at the UD stage (5th stage).] 
Figure 9.4b. Simulation Output for ALU Unit and JMP Instruction 
Validation (continued) 
The instructions before the 'JMP' instruction were executed correctly, and as the 'JMP' 
instruction is encountered, it is identified in the ID stage and the Program Counter (PC) is 
loaded with the JMP address (0x32) and the execution is continued from there on. 
9.2.2 Validation of Packet Processing Unit of ESPR.V2 
This section shows the simulation results for validating the packet processing unit 
using the IN and OUT micro instructions individually rather than a sequence of 
instructions. The instructions that utilize the Packet Processing unit are Load From 
Packet RAM (LFPR), Store To Packet RAM (STPR), IN and OUT. As LFPR has already been 
discussed in the previous section, this section discusses the IN and OUT instructions to 
validate the Packet Processing unit. Figures 9.5a and 9.5b show the Post-Implementation 
simulation output of the initial and final segments for the IN instruction, and Figures 
9.6a and 9.6b show the result for the OUT instruction. 
[Simulation waveform. Annotations: start of the Input instruction; IDV being high for 
two clock cycles; input packet, first 32-bit block; second 32-bit block; AK signal going 
high on receiving each 32-bit block.] 
Figure 9.5a. Simulation Output for Packet Processing Unit (IN) Validation 
After the IN instruction is fetched from memory, the IDV signal in the ETM stage 
has to be high for two Packet Processing unit clock cycles (clk_p) and then the input 
packet from the Input Packet RAM is fetched in 32-bit blocks in each pipeline clock 
(clk_pi). As can be seen from Figure 9.5a, the AK (Acknowledge) signal goes high on 
receiving each 32-bit block of the input packet. 
Figure 9.5b shows the final segment of the IN instruction. The end of the input 
packet is determined by the EPi (End of Packet Input) signal going high, when receiving 
the Cyclic Redundancy Check (CRC) for the packet. On receiving the CRC for the input 
packet, the CRC calculation unit performs the CRC check for the entire packet and the 
execution for the corresponding ESP packet continues from there on. 
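The end-of-packet check described above can be illustrated with a minimal Python sketch. 
It assumes, purely for illustration, that the final 32-bit block carries a standard CRC-32 
(as computed by Python's zlib.crc32) over the preceding blocks, stored big-endian. The 
text here does not state the polynomial or byte order used by the ESPR CRC calculation 
unit, and the function names are hypothetical. 

```python
import zlib
from typing import List

BLOCK_BYTES = 4  # the packet is transferred in 32-bit blocks


def split_blocks(packet: bytes) -> List[int]:
    """Split a packet into the 32-bit blocks in which it is transferred."""
    if len(packet) % BLOCK_BYTES != 0:
        raise ValueError("packet length must be a multiple of 4 bytes")
    return [int.from_bytes(packet[i:i + BLOCK_BYTES], "big")
            for i in range(0, len(packet), BLOCK_BYTES)]


def append_crc(payload: bytes) -> bytes:
    """Append the CRC as the final 32-bit block (assumed CRC-32, big-endian)."""
    return payload + zlib.crc32(payload).to_bytes(BLOCK_BYTES, "big")


def crc_check_ok(packet: bytes) -> bool:
    """Recompute the CRC over all blocks but the last and compare with the last."""
    payload, received = packet[:-BLOCK_BYTES], packet[-BLOCK_BYTES:]
    return zlib.crc32(payload) == int.from_bytes(received, "big")


# The first three input blocks visible in Figure 9.5a, followed by a CRC block.
payload = bytes.fromhex("00070004" "00000001" "00000000")
packet = append_crc(payload)
blocks = split_blocks(packet)
```

Here blocks[0] is 0x00070004, the first block of the input packet, and the last entry of 
blocks plays the role of the block that arrives while EPi is high. 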
[Waveform annotations: CRC for the packet; CRC check OK.] 
Figure 9.5b. Simulation Output for Packet Processing Unit Validation (continued) 
The following Figures 9.6a and 9.6b illustrate validation of the OUT instruction of the 
Packet Processing unit. 
Figure 9.6a. Simulation Output for Packet Processing Unit (OUT) Validation 
Figure 9.6b. Simulation Output for Packet Processing Unit Validation (continued) 
The Load Output RAM ('ldor') signal goes high at the ETM stage on executing the OUT 
instruction and it goes high for each block as the packet is output in 32-bit blocks. The 
End of Output Packet signal goes high on receiving the CRC (final 32-bit block) of the 
packet and the Packet RAM Ready (PRr) signal goes high indicating the Packet RAM is 
ready to receive input packets. 
9.2.3 Validation of ESS of ESPR.V2 
Figure 9.7 shows the micro program sequence for the ESPR.V2 ESS validation. 
0. MOVI R4, 1 
1. ADD R5, R4, R3 
2. MOV TR1, R5 
3. MOV VR1, R5 
4. PUT TR1, VR1 - 7C00004100000000 
5. BPF 41h - 8400000000001040 
6. OP - 48E4000001000003 
7. LFPR <0 - 3> 
8. GET TR1, R1 - 7800004180000000 
9. BGF 1Bh - 80000000000006C0 
Data hazards annotated in the figure: R4, R5, VR1, TR1 
Figure 9.7. Program Sequence for Validating ESS 
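The legible 64-bit encodings in Figure 9.7 can be checked against the field layout of 
Appendix A with a short Python sketch. It assumes the opcode occupies bits 63-58 and the 
16-bit branch address occupies bits 21-6, which is how the bit-position rows in Appendix A 
read. The helper names and the partial mnemonic table are hypothetical and cover only the 
four opcodes needed here. 

```python
OPCODE_SHIFT = 58    # opcode field: bits 63..58
OPCODE_MASK = 0x3F
ADDR_SHIFT = 6       # 16-bit branch address field: bits 21..6 (assumed reading)
ADDR_MASK = 0xFFFF

# Partial table taken from the instruction numbering in Appendix A.
MNEMONICS = {30: "GET", 31: "PUT", 32: "BGF", 33: "BPF"}


def opcode(word: int) -> int:
    """Extract the 6-bit opcode from a 64-bit micro instruction word."""
    return (word >> OPCODE_SHIFT) & OPCODE_MASK


def branch_address(word: int) -> int:
    """Extract the 16-bit branch address field from a branch instruction word."""
    return (word >> ADDR_SHIFT) & ADDR_MASK


# Instruction words 4, 5, 8 and 9 of the ESS validation sequence.
program = [
    0x7C00004100000000,  # PUT TR1, VR1
    0x8400000000001040,  # BPF 41h
    0x7800004180000000,  # GET
    0x80000000000006C0,  # BGF 1Bh
]
decoded = [MNEMONICS[opcode(word)] for word in program]
```

Under these assumptions the two branch words decode to the addresses 41h and 1Bh given in 
the listing, which suggests the assumed field positions are consistent with the figure. 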
To get a value into the tag and value registers for performing the 'PUT' operation, 
a series of ALU operations were performed initially and then a 'PUT' is invoked to place 
a specific (tag, value) pair in ESS. The LFPR instruction is used to get a tag value into the 
tag register TR1 from the packet which was previously placed in Packet RAM using the 
IN instruction. Later a 'GET' operation is performed to retrieve the value bound to the 
tag. Figures 9.8a, 9.8b, 9.8c and 9.8d show the Post-Implementation simulation output for 
the ESS Validation via the above program sequence. 
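The PUT and GET behaviour exercised by this sequence can be summarised with a small 
Python model that follows the GET and PUT definitions of Appendix A. It is only a sketch: 
the class name, the capacity and lifetime parameters, and the use of a dictionary in place 
of the associative memory are assumptions made for illustration, and returning PF = 0 on 
a successful PUT is assumed since Appendix A only states when PF is set. 

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Binding:
    value: int
    expires_at: float


class EphemeralStateStore:
    """Toy model of the ESS; capacity and lifetime are hypothetical parameters."""

    def __init__(self, capacity: int, lifetime: float) -> None:
        self.capacity = capacity
        self.lifetime = lifetime
        self.bindings: Dict[int, Binding] = {}

    def put(self, tag: int, value: int, now: float) -> bool:
        """Bind value to tag; return the PF flag (True means the PUT failed)."""
        binding = self.bindings.get(tag)
        if binding is not None:
            if now < binding.expires_at:
                binding.value = value  # live match: only the value is updated
            else:
                # expired match: rebind and reset the expiration time
                self.bindings[tag] = Binding(value, now + self.lifetime)
            return False
        if len(self.bindings) < self.capacity:
            # empty location available: store tag, value and expiration time
            self.bindings[tag] = Binding(value, now + self.lifetime)
            return False
        return True  # no match and no empty location

    def get(self, tag: int, now: float) -> Tuple[int, bool]:
        """Return (value, GF); GF is True when the GET failed."""
        binding = self.bindings.get(tag)
        if binding is not None and now < binding.expires_at:
            return binding.value, False
        if binding is not None:
            del self.bindings[tag]  # expired: clean the location
        return 0, True


ess = EphemeralStateStore(capacity=2, lifetime=10.0)
pf = ess.put(tag=0x1, value=0x1, now=0.0)   # PUT of the (0x1, 0x1) pair
value, gf = ess.get(tag=0x1, now=1.0)       # later GET of the same tag
```

With these illustrative parameters the GET returns the value 0x1 with GF clear, mirroring 
the retrieval shown at the end of the validation sequence, while a GET issued after the 
lifetime has elapsed fails and frees the location. 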
[Waveform annotation: start of fetching of instructions from memory.] 
Figure 9.8a. Simulation Output for ESS Validation 
[Waveform annotations: fetching of PUT instructions from memory; tag (0x1) and value (0x1) 
being placed in ESS through 'PUT', which is not shown here; value of 0x1 retrieved from 
ESS.] 
Figure 9.8d. Simulation Output for ESS Validation (continued) 
9.3 Validation of Macro Instructions of ESP on ESPR.V2 
After successful validation of individual micro instructions and testing of 
individual functional units, the goal is to now validate the ESP macro instructions. All 
five macro instructions were validated through virtual prototype simulation. This section 
concentrates on only four of the macro instructions - COUNT, COMPARE, RCHLD and 
RCOLLECT. These are the four macro instructions used in the ESP applications 
described in Chapter 2. 
Figures 9.9a through 9.9f show the simulation validation output for the COUNT 
macro instruction. Figure 9.9a shows the initiating packet sequence blocks for COUNT. 
A different sequence of micro instructions (not shown) is executed before the execution 
of the COUNT macro instruction to place a (tag, value) pair in ESS. This avoids the 
ESS failure of the initial 'GET' micro instruction in the sequence of micro instructions 
for COUNT (see Figure 3.11), as can be seen from Figure 9.9b. 
[Waveform annotations: input packet blocks (32-bit); ACK signal for input packet blocks.] 
Figure 9.9a. Simulation Output for Validation of COUNT Macro Instruction 
[Waveform annotation, Figure 9.9b: retrieving a value of 0x01 from ESS.] 
The execution continues followed by the 'INCR' and 'PUT' micro instructions. As a 
binding is already placed in ESS indicating that a 'COUNT' packet has already passed 
through this node earlier, the current packet increments the ESS value to include its count 
of passage through the node. Then the ESS state is updated to this value by a 'PUT' 
micro instruction as shown in Figure 9.9c. 
Figure 9.9c. Simulation Output for Validation of COUNT Macro Instruction (continued) 
Then a threshold check is performed between a value carried in the packet and the value 
in the ESS. The value carried in the packet, '0x02' at offset '4', is retrieved using an 
'LFPR' instruction as shown in Figure 9.9d. The current binding in the ESS for tag 'TR1' 
has a value of '0x02', the incremented value. A 'BGE' instruction is invoked as shown in 
Figure 9.9d to perform the threshold check for COUNT. The values are equal, indicating 
the threshold is reached, so the packet is forwarded to the next node as shown in Figure 
9.9e. Figure 9.9f shows the final segment of the resultant output packet being forwarded. 
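The GET, INCR, PUT and BGE steps walked through above can be condensed into a short 
Python sketch. It models only the path validated here, in which a binding for the tag 
already exists. A plain dictionary stands in for the ESS, the function name is 
hypothetical, and treating the threshold as reached when the count is greater than or 
equal to the packet value is an assumption about the operand order of the BGE. 

```python
from typing import Dict, Tuple


def count_step(ess: Dict[int, int], tag: int, threshold: int) -> Tuple[int, bool]:
    """Process one COUNT packet at a node that already holds a binding for tag.

    Returns (new_count, threshold_reached). The path taken when the initial
    GET fails is not modelled here.
    """
    current = ess[tag]                 # GET: retrieve the value bound to the tag
    new_count = current + 1            # INCR: include this packet's passage
    ess[tag] = new_count               # PUT: update the ESS state
    reached = new_count >= threshold   # BGE: threshold check (assumed operand order)
    return new_count, reached


# Values from the validation run: 0x01 already bound to tag TR1 = 0x1,
# threshold 0x02 carried in the packet.
ess = {0x1: 0x01}
new_count, reached = count_step(ess, tag=0x1, threshold=0x02)
```

For the values of the validation run the count becomes 0x02 and equals the threshold, so 
the threshold is reported as reached, which corresponds to the forwarding seen in Figure 
9.9e. 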
[Waveform annotations, Figure 9.9d: value 0x02 retrieved from packet using LFPR 
instruction; MOV instruction following the LFPR instruction to move the value (0x02) from 
ESS into register R5; BGE instruction; execution branches to this address.] 
[Waveform annotations, Figure 9.9e: FWD instruction; OUT instruction; FWD code; LDOR 
signal going high to output packet to the next node.] 
[Waveform annotations: LDOR signal going high for each 32-bit block; output packet in 
32-bit blocks; CRC; End of Packet Output (EPo); Packet RAM ready (PRr) signal going high, 
ready to store next packet.] 
Figure 9.9f. Simulation Output for Validation of COUNT Macro Instruction (continued) 
The next macro instruction to be validated is the COMPARE instruction. Figure 
9.10a shows the fetching of the IN micro instruction and initial segment of the 32-bit 
input packet blocks. 
[Waveform annotation: input packet blocks.] 
Figure 9.10a. Simulation Output for Validation of COMPARE Macro Instruction 
Tag TR1 (0x01) is retrieved from the packet using the 'LFPR' instruction and the 
following 'GET' instruction for this tag fails as can be seen from Figure 9.10b. Then a 
value (0x02) is obtained from the packet to bind with the tag TR1 using the 'PUT' 
instruction as shown in Figure 9.10c. Then the packet is forwarded to the next ESP 
capable node as shown in Figures 9.10d and 9.10e with the output code set to 0x01 
(FWD). 
[Waveform annotations: value (0x01) obtained for tag TR1 using LFPR instruction; GET 
instruction; GET fails and branches to address 0x15.] 
Figure 9.10b. Simulation Output for Validation of COMPARE Macro Instruction (continued) 
[Waveform annotations: retrieving value 0x02 from Packet RAM; fetching of PUT instruction 
to bind this value with tag TR1 in ESS.] 
Figure 9.10c. Simulation Output for Validation of COMPARE Macro Instruction (continued) 
Figure 9.10d. Simulation Output for Validation of COMPARE Macro Instruction (continued) 
[Waveform annotations: CRC value; end of output.] 
Figure 9.10e. Simulation Output for Validation of COMPARE Macro Instruction (continued) 
Figure 9.11a shows the initiating packet block sequence for the RCHLD macro 
instruction on execution of the IN instruction and Figure 9.11b shows the ending 
sequence of the input packet block with the CRC. 
[Waveform annotations: IN instruction; start of input packet.] 
Figure 9.11a. Simulation Output for Validation of RCHLD Macro Instruction 
[Waveform annotation: CRC check OK.] 
Figure 9.11b. Simulation Output for Validation of RCHLD Macro Instruction (continued) 
To avoid the initial failure of the 'GET' instruction in the ESS, a value for the tag TR2 
(see the micro instruction sequence representation for 'RCHLD' in Figure 3.14 of 
Chapter 3) is written into ESS beforehand using a sequence of micro instructions. This 
makes the RCHLD macro instruction execute a different and more extensive set of the micro 
instructions that represent it. Then, the initial checks for availability of ESS and the CRC 
check are performed and the initiating micro instruction sequence for the RCHLD 
[Waveform annotation: output value (0x1) of LFPR instruction.] 
Figure 9.11c. Simulation Output for Validation of RCHLD Macro Instruction (continued) 
The GET instruction does not fail in retrieving the identifier bitmap value, as can be seen 
from Figure 9.11d, because of the external PUT instruction which placed a (tag, value) 
pair in the ESS. The sequence continues executing until it encounters another GET 
instruction (for counting the passing packets) where it fails as shown in Figure 9.11e. 
[Waveform annotations, Figure 9.11d: value retrieved by GET; GET does not fail; 
continuous execution.] 
[Waveform annotation: branches to address 0x23.] 
Figure 9.11e. Simulation Output for Validation of RCHLD Macro Instruction (continued) 
The instruction sequence continues executing, as can be followed from the micro 
instruction representation of the RCHLD macro instruction (see Figure 3.14). Finally a 
'BGE' instruction is executed which checks the threshold value to either FWD or DROP 
the packet. The value of the input packet block at offset 0x9 is 0x4 (threshold). This value 
is placed in register R4 using the LFPR instruction which is not shown here. The value 
from register VR1 (0x1) is moved into register R5. When a 'BGE R4, R5, 2Ch' 
instruction is executed, the value of R4 is greater than R5 indicating the threshold is not 
reached and the packet has to be forwarded. The instruction execution branches to 
address 0x2C as can be seen from Figure 9.11f. 
Then a STPR instruction is executed at address 0x2C followed by a FORWARD and an 
OUT, as shown in Figures 9.11g and 9.11h. The CRC of the output packet is different 
from that of the input packet because of the STPR instruction. 
[Waveform annotations: FWD instruction; OUT instruction.] 
Figure 9.11g. Simulation Output for Validation of RCHLD Macro Instruction (continued) 
[Waveform annotations, Figure 9.11h: blocks of output packet; end of output packet; CRC 
value of C94C9D04.] 
RCOLLECT is the macro instruction which requires execution of most of the 
micro instructions of ESPR.V2. The following description briefly explains the 
Post-Implementation validation of the RCOLLECT macro instruction of ESP. Figure 9.12a 
shows the initial input packet for the RCOLLECT macro instruction. Figure 9.12b shows 
the initiating sequence of micro instructions to implement the functionality of the 
RCOLLECT macro instruction. 
[Waveform annotations: IN instruction; start of input packet; ACK signal.] 
Figure 9.12a. Simulation Output for Validation of RCOLLECT Macro Instruction 
After the ESPR is switched on, the Packet RAM is loaded with the input packets for the 
corresponding macro instruction. The packet is then checked for CRC and other conditions, 
such as whether the ESS is full. After these checks are performed successfully, the 
program counter starts fetching the micro code sequence for the RCOLLECT macro 
instruction as shown in Figure 9.12b. Similar to the previous RCHLD instruction, a (tag, 
value) pair is placed in the ESS prior to the fetching of the initiating sequence for 
RCOLLECT, and so the GET instruction in Figure 9.12b does not fail and continues 
execution from there on. The second GET fails and execution proceeds to the JMP 
instruction in the ADDR2 (0x26) block because R4 has a value of zero. Then the GET 
instruction in the ADDR3 (0x1B) block fails and execution branches to the ADDR5 (0x2B) 
block. In the ADDR5 block, the execution of 'BEQ R10, R11, ADDR7' fails because R10 has 
a value of 0x1 from VR1 and R11 has a value of 0x0 from VR2, and so the packet gets 
dropped as can be seen from Figure 9.12c. 
[Waveform annotations: value obtained from offset '3' of the packet using LFPR 
instruction; BEQ fails and continuous execution; DROP instruction; DROP code.] 
Figure 9.12c. Simulation Output for Validation of RCOLLECT Macro Instruction (continued) 
Chapter Ten 
Conclusions and Future Research 
The main goal of this thesis research was to develop and validate a hardware 
processor architecture for implementing the ESP service within network routers using PLD 
technology. The goal was achieved by studying the concepts of ESP, developing a 
"lightweight ISA" (37 micro instructions) for the existing macro level instruction set of 
ESP, and then developing ESPR architectures (ESPR.V1 and ESPR.V2) to implement the 
micro instructions of the developed ISA. Both architectures were validated via HDL 
post-synthesis and post-implementation simulation testing. It is felt that the developed set 
of 37 micro instructions of the ISA of both architectures should be sufficient in number and 
functionality to support a much larger and more extensive macro level instruction set one 
may use to support ESP. 
The second version of the ESPR architecture, ESPR.V2, was designed with the goal of 
increasing performance over that of ESPR.V1, and the aim was achieved. 
ESPR.V1 could operate at a frequency of 20 MHz with some timing constraints applied. 
On the other hand, ESPR.V2, the five-stage pipelined architecture, could operate at 30 
MHz in the same technology FPGA chip. The performance improvement was achieved 
strictly from architectural enhancements to ESPR.V1. A comparison graph of the 
performance of both architectures and their main functional units is shown in Figure 
10.1. Both ESPR architectures are pipelined, contain an associative ESS for 
storage/retrieval of ephemeral data, and are evaluated in terms of suitability for 
implementation to a PLD platform. For a commercial "production" implementation, the 
ESS probably would be implemented off the PLD platform using cheap and fast 
commodity memory implementing the ESS organization. 
Table 10.1 gives the approximate throughput measured in packets per second 
(pps) obtained using the ESPR.V2 architecture through virtual prototype simulation. 
Since each macro instruction executes a different set of micro instructions according to 
the previous state in the ESS, and also since it has not been experimentally tested, the 
throughput results using post-implementation simulation are considered to be an 
approximate but reliable estimate. It should also be noted that the post-implementation 
simulation results of Table 10.1 were achieved after implementation of the ESPR.V2 
architecture to a moderate speed and older FPGA chip. The Kpps rates shown in Table 
10.1 could and would be significantly increased via implementation of the ESPR.V2 
architecture to a more modern and higher speed FPGA chip. 
[Chart: Performance Comparison of ESPR Architectures.] 
Figure 10.1. Performance Comparison of ESPR.V1 and ESPR.V2 
Table 10.1. Throughput of ESP Macro Instructions in ESPR.V2 Architecture 
Macro Operation    Throughput in ESPR.V2 (Kpps, approx.) 
COUNT ()           810 
COMPARE ()         857 
COLLECT ()         833 
RCHLD ()           500 
RCOLLECT ()        517 
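The throughput figures of Table 10.1 can be related to the 30 MHz operating frequency 
reported for ESPR.V2 with a short Python calculation. This is only an estimate under two 
assumptions that the text does not state explicitly: that the throughput was obtained at 
the 30 MHz clock, and that one packet is processed at a time, so that clock cycles per 
packet equal the clock frequency divided by the packet rate. 

```python
CLOCK_HZ = 30_000_000  # ESPR.V2 operating frequency (30 MHz)

# Approximate throughput from Table 10.1, in Kpps.
THROUGHPUT_KPPS = {
    "COUNT": 810,
    "COMPARE": 857,
    "COLLECT": 833,
    "RCHLD": 500,
    "RCOLLECT": 517,
}

# Estimated clock cycles spent per packet for each macro instruction.
cycles_per_packet = {
    name: CLOCK_HZ / (kpps * 1000) for name, kpps in THROUGHPUT_KPPS.items()
}

for name, cycles in cycles_per_packet.items():
    print(f"{name}: about {cycles:.1f} cycles per packet")
```

Under these assumptions the shorter macro instructions (COUNT, COMPARE, COLLECT) come out 
at roughly 35 to 37 cycles per packet and the longer ones (RCHLD, RCOLLECT) at roughly 58 
to 60 cycles per packet. 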
The experimental results obtained using an Intel IXP 1200 [18] router as stated in 
[8] give an estimate of 340 Kpps and 232 Kpps for the COUNT () and COMPARE () 
macro instructions respectively using an SRAM implementation of ESS. The HDL 
simulation results obtained through post-implementation simulation of ESPR.V2 cannot 
be directly compared to the experimental results of [8], because of issues such as the 
size of the ESS and the non-experimental nature of the ESPR.V2 results. The comparison 
does, though, give a fairly reliable indication that the ESPR.V2 architecture as implemented 
to the Xilinx Virtex2 4000 FPGA chip can process ESP packets 2-4 times faster than the 
Intel IXP 1200 as reported in [18]. 
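The "2-4 times" figure follows directly from the quoted throughput numbers, as the 
following Python calculation shows. The dictionary names are illustrative, and the ratios 
inherit the caveat above that simulated and experimental results are not directly 
comparable. 

```python
# Throughput in Kpps: ESPR.V2 from Table 10.1, Intel IXP 1200 as stated in [8].
ESPR_V2_KPPS = {"COUNT": 810, "COMPARE": 857}
IXP1200_KPPS = {"COUNT": 340, "COMPARE": 232}

# Ratio of ESPR.V2 throughput to IXP 1200 throughput per macro instruction.
speedup = {op: ESPR_V2_KPPS[op] / IXP1200_KPPS[op] for op in ESPR_V2_KPPS}

for op, ratio in speedup.items():
    print(f"{op}: {ratio:.2f}x")
```

The ratio is about 2.4 for COUNT and about 3.7 for COMPARE, which is the basis for the 
stated range. 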
In summary, the ESPR architecture and its design have been successfully mapped, 
placed, and routed to a single chip PLD platform and successfully tested via 
post-implementation HDL functional and performance virtual prototype simulation testing. It 
has also been proved that the pipelined processor architectures can be successfully 
synthesized and implemented into an FPGA chip with the design capture being done 
mostly at the behavioral level of HDL abstraction. 
This validates the research goal of being able to develop Special Purpose ESP 
processors, program them into PLD platforms in communications node routers, and 
reprogram architectural changes/updates and entire new ESP processor architectures 
into the PLD platform in the field when needed for implementation of new ESP 
functionality and/or increased performance as communications line speeds increase. 
Future research can address issues such as experimental testing of ESP and 
ESPR architectures at the network level and improving the performance of ESPR 
architectures via deeper pipelining, using multiple-issue superscalar or VLIW 
architectural concepts, and considering a single-chip packet-driven multiprocessor 
approach to ESP. Use of commercially available simple-pipeline-architecture GP 
processors can also be evaluated and compared on a cost/performance/adaptability basis 
to the ESP implementation approach addressed within this thesis. 
Static and dynamically reconfigurable processor architectures are currently an active 
research area [20,21,22,23]. Unfortunately, none of these past reconfigurable 
architectures can directly and immediately meet our application requirements. Our 
current ESPR architecture could obtain a future performance boost via deeper pipelining, 
inclusion of one additional pipeline within a single ESPR resulting in a dual-issue ESPR 
architecture, and through use of the ESPR as a basic processor module in an envisioned 
dynamically reconfigurable single-chip multiprocessor ESPR system. This system could 
possibly be based upon some of the framework presented in [23,24,25,26]. It is felt some 
of the architectural framework of [23,24,25,26] could potentially be used to meet network 
node processing performance needs imposed by expected extremely high 
communications line speeds of the future. 
Appendices 
Appendix A - Presents the Micro Instruction Set Architecture and Definition for the 
ESPR Architectures. 
Appendix B - Presents the Macro System Flowchart for ESPR. 
Appendix C - Shows the Micro System Flowchart for ESPR.V1. 
Appendix D - Shows the Micro System Flowchart for ESPR.V2. 
Appendix E - Presents the VHDL Code for ESPR.V2. 
VHDL Code for ESPR.Vl can be obtained from [28]. 
Appendix A 
Micro Instruction Set Architecture and Definition 
0. NOP (OTHER Type Instruction) - No Operation 
63 58 57 0 
| 000000 | | 
1. IN (OTHER Type Instruction) - Input Packet to Packet Register 
63 58 57 0 
| 000001 | | 
If (IDV = 1) then { 
    PR ← Input Packet 
    ACK_in ← 1 
} Else wait. 
2. OUT (OTHER Type Instruction) - Outputs the Packet to Output port and also sends Output 
Code Register as Output 
63 58 57 0 
| 000010 | | 
If (OPRAMready == 1) then { 
    Output port ← Packet Register 
    Output Code ← Output Code Register 
} Else wait. 
3. FWD (OTHER Type Instruction) - Sets Forward Code in Output Code Register to Forward the 
packet. 
63 58 57 0 
| 000011 | | 
Output Code Register ← 1 (FWD Code) 
4. ABORT1 (OTHER Type Instruction) - Sets the LOC bits to zero in packet by loading Flag 
Register to Flag field of Packet and the packet is forwarded. 
FLR ← "00000000" 
Output Code Register ← 2 (ABORT1 Code) 
Flag field of PR ← Flag Register 
5. DROP (OTHER Type Instruction) - Drops the packet and is indicated by setting Drop code in 
Output Code Register 
63 58 57 0 
| 000101 | | 
Output Code Register ← 3 (DROP code) 
Output Code ← Output Code Register 
6. CLR - Clears the register RD by moving R0, which contains 0, to RD 
63 58 57 53 52 48 47 24 0 
| 000110 | RD | R0 | | | 
RD ← R0 
7. MOVE RD, RS - Move value in RS to RD 
63 58 57 53 52 48 47 24 0 
| 000111 | RD | RS | | | 
RD ← RS 
8. MOVI RD, Imm. Val (I Type Instruction) - Move Sign Extended Immediate value to RD 
63 58 57 53 52 24 23 22 21 6 5 0 
| 001000 | RD | 16 bit Imm Val | 
RD ← Sign Extended Imm. Val 
9. ADD RD, RS1, RS2 (ALU Type Instruction) - Adds RS1 and RS2 and places the result in RD 
63 58 57 53 52 48 47 43 42 24 0 
| 001001 | RD | RS1 | RS2 | | 
RD ← RS1 + RS2 
10. SUB RD, RS1, RS2 (ALU Type Instruction) - Subtracts RS2 from RS1 and places the result in 
RD 
63 58 57 53 52 48 47 43 42 24 0 
| 001010 | RD | RS1 | RS2 | | 
RD ← RS1 - RS2 
11. INCR RS (ALU Type Instruction) - Increments RS by adding it with R1, which contains 1, and 
places the result in RD 
63 58 57 53 52 48 47 43 42 24 0 
| 001011 | RS | RS | R1 | | 
RS ← RS + R1 
12. DECR RS (ALU Type Instruction) - Decrements RS by subtracting R1 from RS and places the 
result in RD 
63 58 57 53 52 48 47 43 42 24 0 
| 001100 | RS | RS | R1 | | 
RS ← RS - R1 
13. OR RD, RS1, RS2 (ALU Type Instruction) - Logical OR of RS1 and RS2 and places result in 
RD 
63 58 57 53 52 48 47 43 42 24 0 
RD ← RS1 (OR) RS2 
14. AND RD, RS1, RS2 (ALU Type Instruction) - Logical AND of RS1 and RS2 and places result in 
RD 
63 58 57 53 52 48 47 43 42 24 0 
RD ← RS1 (AND) RS2 
15. EXOR RD, RS1, RS2 (ALU Type Instruction) - Logical EXOR of RS1 and RS2 and places 
result in RD 
63 58 57 53 52 48 47 43 42 24 0 
| 001111 | RD | RS1 | RS2 | | 
RD ← RS1 (EXOR) RS2 
16. COMP RD, RS (ALU Type Instruction) - Logical NOT of RS and place result in RD 
63 58 57 53 52 48 47 24 0 
| 010000 | RD | RS | | 
RD ← (NOT) RS 
17. SHL RD, RS, SHAMT (SHIFT Type Instruction) - Logical shift left of RS by SHAMT and 
result is placed in RD 
63 58 57 53 52 48 47 24 0 
| 010001 | RD | RS | SHAMT | 
RD ← RS << SHAMT (Default shift by 1) 
18. SHR RD, RS, SHAMT (SHIFT Type Instruction) - Logical shift right of RS by SHAMT and 
result is placed in RD 
63 58 57 53 52 48 47 24 0 
| 010010 | RD | RS | SHAMT | 
RD ← RS >> SHAMT (Default shift by 1) 
19. ROL RD, RS, SHAMT (SHIFT Type Instruction) - Logical rotate left of RS by SHAMT and 
result is placed in RD 
63 58 57 53 52 48 47 24 0 
| 010011 | RD | RS | SHAMT | 
RD ← RS << SHAMT 
20. ROR RD, RS, SHAMT (SHIFT Type Instruction) - Logical rotate right of RS by SHAMT and 
result is placed in RD 
63 58 57 53 52 48 47 24 0 
| 010100 | RD | RS | SHAMT | 
RD ← RS >> SHAMT 
21. LFPR <Offset> RD (LFPR / STPR Type Instruction) - Loads 64 bit value at a given offset from 
Packet Register (PR) to RD 
63 58 57 53 52 24 22 21 6 5 0 
| 010101 | RD | | 16 bit Offset | | 
RD ← PR[Offset] to PR[Offset + 63] 
22. STPR <Offset> RS (LFPR / STPR Type Instruction) - Stores 64 bit value at a given offset in 
Packet Register (PR) from RS 
63 58 57 53 52 48 47 22 21 6 5 0 
| 010110 | | 16 bit Offset | 
PR[Offset] to PR[Offset + 63] ← RS 
23. BRNE RS1, RS2, Addr (JUMP / BRANCH Type Instruction) - Checks if RS1 not equal to RS2; 
if yes, execution branches to sequence of instructions starting at Br. Addr by placing Br. Addr in PC, 
else PC is incremented and resumes execution of normal sequence of instructions. 
63 58 57 53 52 48 47 43 42 22 21 6 5 0 
| 010111 | | RS1 | RS2 | | 16 bit Br. Addr | | 
If RS1 != RS2 then 
    PC ← Br. Addr 
Else 
    PC ← PC + 1 
24. BREQ RS1, RS2, Addr (JUMP / BRANCH Type Instruction) - Checks if RS1 equal to RS2; if 
yes, execution branches to sequence of instructions starting at Br. Addr by placing Br. Addr in PC, 
else PC is incremented and resumes execution of normal sequence of instructions. 
63 58 57 53 52 48 47 43 42 22 21 6 5 0 
| 011000 | | RS1 | RS2 | | 16 bit Br. Addr | | 
If RS1 = RS2 then 
    PC ← Br. Addr 
Else 
    PC ← PC + 1 
25. BRGE RS1, RS2, Addr (JUMP / BRANCH Type Instruction) - Checks if RS1 greater than or 
equal to RS2; if yes, execution branches to sequence of instructions starting at Br. Addr by placing 
Br. Addr in PC, else PC is incremented and resumes execution of normal sequence of instructions. 
63 58 57 53 52 48 47 43 42 22 21 6 5 0 
| 011001 | | RS1 | RS2 | | 16 bit Br. Addr | | 
If RS1 >= RS2 then 
    PC ← Br. Addr 
Else 
    PC ← PC + 1 
26. BNEZ RS, Addr (JUMP / BRANCH Type Instruction) - Checks if RS not equal to R0 (0); if 
yes, execution branches to sequence of instructions starting at Br. Addr by placing Br. Addr in PC, 
else PC is incremented and resumes execution of normal sequence of instructions. 
63 58 57 53 52 48 47 43 42 22 21 6 5 0 
| 011010 | | 16 bit Br. Addr | | 
If RS != R0 then 
    PC ← Br. Addr 
Else 
    PC ← PC + 1 
27. BEQZ RS, Addr (JUMP / BRANCH Type Instruction) - Checks if RS equal to R0 (0); if yes, 
execution branches to sequence of instructions starting at Br. Addr by placing Br. Addr in PC, else 
PC is incremented and resumes execution of normal sequence of instructions. 
63 58 57 22 21 6 5 0 
| 011011 | | 16 bit Br. Addr | | 
If RS = R0 then 
    PC ← Br. Addr 
Else 
    PC ← PC + 1 
28. JMP Addr (JUMP / BRANCH Type Instruction) - Jumps to a location specified by Br. Addr by 
placing Br. Addr in PC 
63 58 57 22 21 6 5 0 
| 011100 | | 16 bit Br. Addr | | 
PC ← Br. Addr 
29. RET (JUMP / BRANCH Type Instruction) - Returns from execution of a subroutine to normal 
sequence execution by placing Reg in PC. 
63 58 57 0 
| 011101 | | 
PC ← Reg 
30. GET VR, TR (GET / PUT TYPE INSTRUCTION) - Gets Value in VR Corresponding to Tag 
TR and Sets CCR as GF = 1, for Failure of GET operation. 
| 011110 | | 
Tag and Value given to ESS 
If match found: then, 
    If Lifetime not expired then, 
        VR ← Value 
        GF ← 0 
    Else 
        GF ← 1, VR ← 0 
        Clean that location and sets Empty (E) bit to 1 
Else 
    GF ← 1, VR ← 0 
31. PUT TR, VR (GET / PUT TYPE INSTRUCTION) - Puts Tag and Value (creates a tag, value 
binding) in ESS by placing tag from TR and value from VR into ESS. Sets CCR as PF = 1, for failure 
of PUT operation 
| 011111 | TR | 
Tag and Value given to ESS 
If match found: then, 
    If Lifetime not expired then, 
        Value ← VR 
    Else 
        Tag ← TR 
        Value ← VR 
        Reset Expiration Time 
Else 
    If Empty Location then, 
        Tag ← TR 
        Value ← VR 
        Store Expiration Time 
        Empty bit ← 0 
    Else 
        PF ← 1 
32. BGF Addr (GET / PUT Type Instruction) - Checks the Condition Code Register (CCR)
for failure of the GET operation. If GF is 1, indicating failure of GET, execution branches to the sequence of
instructions starting at Br. Addr by placing Br. Addr in PC; else PC is incremented and
execution of the normal sequence of instructions resumes.
Opcode: 100000; 16-bit Br. Addr
If GF = 1 then PC <- Br. Addr
Else PC <- PC + 1
33. BPF Addr (GET / PUT Type Instruction) - Checks the Condition Code Register (CCR)
for failure of the PUT operation. If PF is 1, indicating failure of PUT, execution branches to the sequence of
instructions starting at Br. Addr by placing Br. Addr in PC; else PC is incremented and
execution of the normal sequence of instructions resumes.
Opcode: 100001; 16-bit Br. Addr
If PF = 1 then PC <- Br. Addr
Else PC <- PC + 1
34. ABORT2 (OTHER Type Instruction) - Sets the LOC bits to zero and the E bit to '1' in the packet by
loading the Flag Register into the Flag field of the packet; the packet is then forwarded.
Field boundaries (bits): 63-58 | 57-0
Opcode (bits 63-58): 100010
FLR <- "00000001"
Output Code Register <- 4 (ABORT2 Code)
Flag field of PR <- Flag Register
35. BLT RS1, RS2, Addr (JUMP / BRANCH Type Instruction) - Checks if RS1 is less than RS2; if
yes, execution branches to the sequence of instructions starting at Br. Addr by placing Br. Addr in PC;
else PC is incremented and execution of the normal sequence of instructions resumes.
Fields: RS1 | RS2 | ... | 16-bit Br. Addr (bits 21-6) | bits 5-0
If RS1 < RS2 then
    PC <- Br. Addr
Else
    PC <- PC + 1
36. SETLOC (OTHER Type Instruction) - Sets the LOC bits in the packet to a specified value.
Opcode (bits 63-58): 100100
FLR (7 downto 5) <- LOC
Flag field of PR <- FLR
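The flag-register effects of ABORT2 and SETLOC can be illustrated with a short Python sketch. It assumes an 8-bit FLR whose bits 7 to 5 hold LOC and whose bit 0 is the E bit (consistent with the ABORT2 value "00000001"), and it assumes SETLOC leaves the remaining FLR bits unchanged. The function names are hypothetical.

```python
# Sketch of the Flag Register (FLR) updates; names are hypothetical.

ABORT2_CODE = 4  # value loaded into the Output Code Register by ABORT2


def setloc(flr, loc):
    """FLR(7 downto 5) <- LOC; other FLR bits are assumed unchanged."""
    return (flr & 0x1F) | ((loc & 0x7) << 5)


def abort2():
    """Return (FLR, OCR) after ABORT2: LOC bits zero, E bit one."""
    return 0b00000001, ABORT2_CODE
```

In both cases the resulting FLR value is then copied into the Flag field of the packet register PR.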





MACRO LEVEL SYSTEM FLOW CHART
[Flow chart not reproduced; the box layout could not be recovered from the extracted text. The chart begins at Start and gives the instruction sequence for each macro instruction, with labelled panels including RCHLD and RCOLLECT. The sequences are built from LFPR, STPR, GET, PUT, BGF, BPF, MOV, INCR, DECR, OR, AND, BEQ, BGE, BEQZ and JMP, use ADDRn labels as branch targets, and end in DROP, FWD or OUT.]
SYSTEM FLOW CHART FOR ESPR.V1 ARCHITECTURE
[Flow chart not reproduced; the box layout could not be recovered from the extracted text. Legible content: the chart separates an ID stage from an EX & WB stage (an EX stage for the branch instructions). Boxes include OP12 DECR, OP13 OR, OP14 AND, OP18 SHR, OP21 LFPR (RD <- PR[Off : Off+63]), OP22 STPR, ABORT2 (OCR <- 4, FLR <- "00000001", Flag of PR <- FLR), OP36 SETLOC (FLR(7 to 5) <- LOC), the branches BRNE, OP25 BRGE, OP35 BLT, OP26 BNEZ and OP28 JMP (PC <- Inst[16 bits Addr]), and OP30 GET (TAG and VALUE given to ESS Module, CCR <- Condition Code). The OUT box performs OUTPUT <- PR.]
SYSTEM FLOW CHART FOR ESPR.V2 ARCHITECTURE
[Flow chart not reproduced; the box layout could not be recovered from the extracted text. Legible content is summarised below.]
Stages shown: ID, ETM, LTC and UD. The chart starts at Start with a "ready = 1?" test and ends with OUTPUT <- PR.
ALU-type instructions (operands go to the ALU in the ETM stage, pass through the LTC stage, and the result is written to RD in the UD stage):
OP7 MOVE: RD <- RS
OP8 MOVI: RD <- sign-extended immediate value (ALU pass-through)
OP9 ADD: RD <- RS1 + RS2
Subtract (box label lost): RD <- RS1 - RS2
OP11 INCR, OP12 DECR
OP13 OR: RD <- RS1 (OR) RS2
OP14 AND, OP15 EXOR, OP16 NOT
Shift left (box label lost): RD <- RS << SHAMT
OP18 SHR: RD <- RS >> SHAMT
OP19 ROL: RD <- RS (ROL) SHAMT
OP20 ROR
Packet instructions:
OP21 LFPR: RD <- PR[Off : Off+63]
OP22 STPR: PR[Off : Off+63] <- RS
OP34 ABORT2: OCR <- 4, FLR <- "00000001", Flag of PR <- FLR
OP36 SETLOC: FLR(7 to 5) <- LOC, Flag of PR <- FLR
ESS instructions:
OP30 GET: tag given to the TM stage of the ESS; lifetime check stage in the ESS; value retrieved from the ESS.
OP31 PUT: tag given to the TM stage of the ESS; lifetime / empty location check stage in the ESS; ESS updated with tag, value, expiration time and empty location.
Branch instructions:
OP23 BRNE, OP24 BREQ, OP25 BRGE, OP35 BLT: RS1 - RS2 is evaluated; if the branch is taken, PC <- Inst[16 bits Addr].
OP26 BNEZ, OP27 BEQZ: RS is compared with R0; if the branch is taken, PC <- Inst[16 bits Addr].
OP32 BGF, OP33 BPF: if the branch is taken, PC <- Inst[16 bits Addr].
OP28 JMP: PC <- Inst[16 bits Addr]
OP29 RET: PC <- REG
APPENDIX E
VHDL CODE FOR ESPR.V2 ARCHITECTURE 
1. ESPR Top-Level Module with Instruction Memory for 'RCOLLECT' Macro Instruction 
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity esprtop is
  generic(N: positive := 64;
          M: positive := 32;
          Addr: positive := 16);
  port(
    cfg_in, bitmapin: in std_logic_vector(N-1 downto 0);
    clk_im, clk_pi, clk_c, clk_p, clr, macctrlr, we_im, fmmacsig, putin, IDV, EPi, ORr: in std_logic;
    loc: in std_logic_vector(2 downto 0);
    inp: in std_logic_vector(M-1 downto 0);
    fm_mac_ctrlr: in std_logic_vector(Addr-1 downto 0);
    inst_in: in std_logic_vector(N-1 downto 0);
    pcout: out std_logic_vector(Addr-1 downto 0);
    instchk: out std_logic_vector(N-1 downto 0);
    oo: out std_logic_vector(7 downto 0);
    stag: out std_logic_vector(2 downto 0);
    po, fl: out std_logic_vector(M-1 downto 0);
    outp: out std_logic_vector(M-1 downto 0);
    AK, EPo, ldor, PRr, cok, lz: out std_logic;
    datachkID: out std_logic_vector(N-1 downto 0));
end entity esprtop;
architecture esprtop_beh of esprtop is
-- All Components
-- IF Stage
component ifst_ifidreg is
  generic(N: positive := 64;
          Addr: positive := 16);
  port(jump_in, branch_in, retin, macctrlr, oflow, fmmacsig: in std_logic;
       fm_inst_reg_EX, fm_inst_reg_ID, fm_mac_ctrlr: in std_logic_vector(Addr-1 downto 0);
       clk, clr, we_im, clock: in std_logic;
       NOP_out: out std_logic;
       instrin: in std_logic_vector(N-1 downto 0);
       pcout: out std_logic_vector(Addr-1 downto 0);
       inst_out: out std_logic_vector(N-1 downto 0));
end component ifst_ifidreg;
-- ID stage
component idstreg is
  generic(N: positive := 64;
          Addr: positive := 16);
  port(inst_in: in std_logic_vector(N-1 downto 0);
       --cfg_in, bitmapin: in std_logic_vector(N-1 downto 0);
       WB_write_data: in std_logic_vector(N-1 downto 0);
       loc: in std_logic_vector(2 downto 0);
       ffpin: in std_logic_vector(7 downto 0);
       clk, NOP_in, ID_flush_BR, regwr_sig, trw, vrw, jmpin, retin, lmfmex: in std_logic;
       morfmex: in std_logic_vector(5 downto 0);
       TRDstin, VRDstin, RDstin: in std_logic_vector(4 downto 0);
       IDFout: out std_logic;
       WB_ctrl_out: out std_logic_vector(3 downto 0);
       EX_ctrl_out: out std_logic_vector(12 downto 0);
       PKT_ctrl_out: out std_logic_vector(6 downto 0);
       GPR_read1_out, GPR_read2_out, sign_ext_out: out std_logic_vector(N-1 downto 0);
       TR_read_out_ID, VR_read_out_ID: out std_logic_vector(N-1 downto 0);
       Br_Addr_out, PKT_Offset_out: out std_logic_vector(Addr-1 downto 0);
       shamt_out: out std_logic_vector(5 downto 0);
       lmor_out, TRD_out, VRD_out, jumps, rets: out std_logic;
       ocr_val_out_id, aer_val_out_id: out std_logic_vector(7 downto 0);
       opcodeexout: out std_logic_vector(5 downto 0);
       ctrlsigsoutID: out std_logic_vector(24 downto 0);
       wrdataout: out std_logic_vector(N-1 downto 0);
       RS1_out, RS2_out, RD_out, TR_out, VR_out: out std_logic_vector(4 downto 0));
end component idstreg;
-- ETM Stage
component ex3top is
  port(clk, clock, clk_pkt, clr, IDV, EPi, ORr, EX_Flush_in, putin, lm: in std_logic;
       iram: in std_logic_vector(31 downto 0);
       flag, ocrID: in std_logic_vector(7 downto 0);
       PKToffid: in std_logic_vector(6 downto 0); -- for LFPR and STPR
       braddrin: in std_logic_vector(15 downto 0);
       ctrlinEX: in std_logic_vector(24 downto 0);
       WBinfmid: in std_logic_vector(3 downto 0);
       RS1rgid, RS2rgid, RDrgid, TRrgid, VRrgid: in std_logic_vector(4 downto 0);
       FSTRD, FSTTRD, FSTVRD, VSTRD, VSTTRD, VSTVRD: in std_logic_vector(4 downto 0); --new
       op_in, prop_in: in std_logic_vector(5 downto 0);
       GPR1id, GPR2id, TRidv, VRidv, extid, WBdatain, aofmex: in std_logic_vector(63 downto 0);
       EXctid: in std_logic_vector(9 downto 0);
       PKTctid: in std_logic_vector(6 downto 0);
       shamt: in std_logic_vector(5 downto 0);
       regrd, trwx, trww, vrwx, vrww, rwx, rww: in std_logic; --new
       alu_O: out std_logic;
       ctrloutEX: out std_logic_vector(24 downto 0);
       opoutEX, mo: out std_logic_vector(5 downto 0);
       aluout, GPR1out, GPR2out, tagsigout: out std_logic_vector(63 downto 0);
       RS1_out, RS2_out, RD_out, TR_out, VR_out: out std_logic_vector(4 downto 0);
       WBct_out: out std_logic_vector(3 downto 0);
       braddrout: out std_logic_vector(15 downto 0);
       gf, pf, ess_full, le, AK, PRr, ldor, EPo, cok, lz: out std_logic;
       outvalue: out std_logic_vector(63 downto 0);
       oo: out std_logic_vector(7 downto 0);
       stag: out std_logic_vector(2 downto 0);
       oram, fl: out std_logic_vector(31 downto 0);
       po: out std_logic_vector(31 downto 0));
end component ex3top;
-- LTC Stage
component ex4top is
  port(clk: in std_logic;
       WBctrlin: in std_logic_vector(3 downto 0);
       out_fm_alu: in std_logic_vector(63 downto 0);
       RS1in, RS2in, VRin, VSTRD, VSTVRD: in std_logic_vector(4 downto 0);
       RDin_fm4, VRDin_fm4, TRDin_fm4: in std_logic_vector(4 downto 0);
       op_in: in std_logic_vector(5 downto 0);
       GPRin1, GPRin2, PTin: in std_logic_vector(63 downto 0);
       brtype: in std_logic_vector(2 downto 0);
       ccr_inp, ccr_ing: in std_logic;
       branch: out std_logic;
       WBctout: out std_logic_vector(3 downto 0);
       WBdataout: out std_logic_vector(63 downto 0);
       WBRDout, WBVRDout, WBTRDout: out std_logic_vector(4 downto 0));
end component ex4top;
-- UD Stage
component stage5 is
  port(WB_in1: in std_logic;
       aluout_fm_ex, essout_fm_st5: in std_logic_vector(63 downto 0);
       dataout: out std_logic_vector(63 downto 0));
end component stage5;
-- signals
signal ffpsig: std_logic_vector(7 downto 0);
-- IF
signal instsig: std_logic_vector(63 downto 0);
signal bsig_EX4o, ovf, NOP_IFo: std_logic;
signal brao: std_logic_vector(15 downto 0);
-- ID
signal data_WBo: std_logic_vector(63 downto 0);
signal grw_EX4o, trw_EX4o, vrw_EX4o, IDFL, jmp_IDo, ret_IDo: std_logic;
signal RS1o, RS2o, TRo, VRo, RDo: std_logic_vector(4 downto 0);
signal IDFo, Lm, TRWR_IDo, VRWR_IDo: std_logic;
signal WBo: std_logic_vector(3 downto 0);
signal EX34ct_IDo: std_logic_vector(12 downto 0);
signal PKct2o: std_logic_vector(6 downto 0);
signal GR12o, GR22o, se2o, Tda2o, Vda2o, wrdata_IDo: std_logic_vector(N-1 downto 0);
signal PKTOff_IDo: std_logic_vector(Addr-1 downto 0);
signal sh2o: std_logic_vector(5 downto 0);
signal ocro, aero: std_logic_vector(7 downto 0);
signal op2o: std_logic_vector(5 downto 0);
signal ctlo: std_logic_vector(24 downto 0);
-- ETM
signal EXFL, rfsig: std_logic;
signal op3o, mo: std_logic_vector(5 downto 0);
signal alu3o, GR13o, GR23o: std_logic_vector(63 downto 0);
signal ctl3o: std_logic_vector(24 downto 0);
signal RS1E3o, RS2E3o, TRE3o, VRE3o, RDE3o: std_logic_vector(4 downto 0);
signal WBct3o: std_logic_vector(3 downto 0);
signal braddr_EX3o: std_logic_vector(15 downto 0);
signal GFo, PFo, EFo, leo: std_logic;
signal POff: std_logic_vector(6 downto 0);
signal EX34cto: std_logic_vector(9 downto 0);
signal flo: std_logic_vector(31 downto 0);
-- LTC
signal WBct4o: std_logic_vector(3 downto 0);
signal TRE4o, VRE4o, RDE4o: std_logic_vector(4 downto 0);
signal da4o: std_logic_vector(63 downto 0);
-- UD
signal esso: std_logic_vector(63 downto 0);
-- Other signals
signal ts: std_logic_vector(63 downto 0);
begin
-- Output
instchk <= instsig;
datachkID <= data_WBo;
fl <= flo;
ffpsig <= flo(7 downto 0); -- ID
-- other signals
-- ID
grw_EX4o <= WBct4o(0); -- reg write
trw_EX4o <= WBct4o(3); -- tag reg write
vrw_EX4o <= WBct4o(2); -- val reg write
IDFL <= bsig_EX4o or ovf;
-- ETM
EXFL <= IDFL;
POff <= PKTOff_IDo(6 downto 0);
EX34cto <= EX34ct_IDo(9 downto 0);
-- UD
--esso <= (others => '0');
IFCOMP: ifst_ifidreg port map(
  jump_in=>jmp_IDo, branch_in=>bsig_EX4o, retin=>ret_IDo, macctrlr=>macctrlr, oflow=>ovf,
  fmmacsig=>fmmacsig, fm_inst_reg_EX=>braddr_EX3o, fm_inst_reg_ID=>brao, fm_mac_ctrlr=>fm_mac_ctrlr,
  clk=>clk_pi, clr=>clr, we_im=>we_im, clock=>clk_im, NOP_out=>NOP_IFo, instrin=>inst_in,
  pcout=>pcout, inst_out=>instsig);
IDCOMP: idstreg port map(
  inst_in=>instsig, WB_write_data=>data_WBo, loc=>loc, ffpin=>ffpsig, clk=>clk_pi, NOP_in=>NOP_IFo,
  ID_flush_BR=>IDFL, regwr_sig=>grw_EX4o, trw=>trw_EX4o, vrw=>vrw_EX4o, jmpin=>jmp_IDo,
  retin=>ret_IDo, lmfmex=>Lm, morfmex=>mo, TRDstin=>TRE4o, VRDstin=>VRE4o, RDstin=>RDE4o,
  IDFout=>IDFo, WB_ctrl_out=>WBo, EX_ctrl_out=>EX34ct_IDo, PKT_ctrl_out=>PKct2o,
  GPR_read1_out=>GR12o, GPR_read2_out=>GR22o, sign_ext_out=>se2o, TR_read_out_ID=>Tda2o,
  VR_read_out_ID=>Vda2o, Br_Addr_out=>brao, PKT_Offset_out=>PKTOff_IDo, shamt_out=>sh2o,
  lmor_out=>Lm, TRD_out=>TRWR_IDo, VRD_out=>VRWR_IDo, jumps=>jmp_IDo, rets=>ret_IDo,
  ocr_val_out_id=>ocro, aer_val_out_id=>aero, opcodeexout=>op2o, ctrlsigsoutID=>ctlo,
  wrdataout=>wrdata_IDo, RS1_out=>RS1o, RS2_out=>RS2o, RD_out=>RDo, TR_out=>TRo, VR_out=>VRo);
EX3COMP: ex3top port map(
  clk=>clk_pi, clock=>clk_c, clk_pkt=>clk_p, clr=>clr, IDV=>IDV, EPi=>EPi, ORr=>ORr,
  EX_Flush_in=>EXFL, putin=>putin, lm=>Lm, iram=>inp, flag=>aero, ocrID=>ocro, PKToffid=>POff,
  braddrin=>brao, ctrlinEX=>ctlo, WBinfmid=>WBo, RS1rgid=>RS1o, RS2rgid=>RS2o, RDrgid=>RDo,
  TRrgid=>TRo, VRrgid=>VRo, FSTRD=>RDE3o, FSTTRD=>TRE3o, FSTVRD=>VRE3o, VSTRD=>RDE4o,
  VSTTRD=>TRE4o, VSTVRD=>VRE4o, op_in=>op2o, prop_in=>op3o, GPR1id=>GR12o, GPR2id=>GR22o,
  TRidv=>Tda2o, VRidv=>Vda2o, extid=>se2o, WBdatain=>da4o, aofmex=>alu3o, EXctid=>EX34cto,
  PKTctid=>PKct2o, shamt=>sh2o, regrd=>ctlo(22), trwx=>WBct3o(3), trww=>WBct4o(3),
  vrwx=>WBct3o(2), vrww=>WBct4o(2), rwx=>WBct3o(0), rww=>WBct4o(0), alu_O=>ovf,
  ctrloutEX=>ctl3o, opoutEX=>op3o, mo=>mo, aluout=>alu3o, GPR1out=>GR13o, GPR2out=>GR23o,
  tagsigout=>ts, RS1_out=>RS1E3o, RS2_out=>RS2E3o, RD_out=>RDE3o, TR_out=>TRE3o, VR_out=>VRE3o,
  WBct_out=>WBct3o, braddrout=>braddr_EX3o, gf=>GFo, pf=>PFo, ess_full=>EFo, le=>leo, AK=>AK,
  PRr=>PRr, ldor=>ldor, EPo=>EPo, cok=>cok, lz=>lz, outvalue=>esso, oo=>oo, stag=>stag,
  oram=>outp, fl=>flo, po=>po);
EX4COMP: ex4top port map(
  clk=>clk_pi, WBctrlin=>WBct3o, out_fm_alu=>alu3o, RS1in=>RS1E3o, RS2in=>RS2E3o, VRin=>VRE3o,
  VSTRD=>RDE4o, VSTVRD=>VRE4o, RDin_fm4=>RDE3o, VRDin_fm4=>VRE3o, TRDin_fm4=>TRE3o,
  op_in=>op3o, GPRin1=>GR12o, GPRin2=>GR22o, PTin=>da4o, brtype=>ctl3o(19 downto 17),
  ccr_inp=>PFo, ccr_ing=>GFo, branch=>bsig_EX4o, WBctout=>WBct4o, WBdataout=>da4o,
  WBRDout=>RDE4o, WBVRDout=>VRE4o, WBTRDout=>TRE4o);
WBCOMP: stage5 port map(
  WB_in1=>WBct4o(1), aluout_fm_ex=>da4o, essout_fm_st5=>esso, dataout=>data_WBo);
end architecture esprtop_beh;
2. IF STAGE
-- Individual Components
-- IF STAGE FULL
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity ifst_ifidreg is
  generic(N: positive := 64;
          Addr: positive := 16);
  port(jump_in, branch_in, retin, macctrlr, oflow, fmmacsig: in std_logic;
       fm_inst_reg_EX, fm_inst_reg_ID, fm_mac_ctrlr: in std_logic_vector(Addr-1 downto 0);
       clk, clr, we_im, clock: in std_logic;
       NOP_out: out std_logic;
       instrin: in std_logic_vector(N-1 downto 0);
       pcout: out std_logic_vector(Addr-1 downto 0);
       inst_out: out std_logic_vector(N-1 downto 0));
end entity ifst_ifidreg;
architecture ifidstregbeh of ifst_ifidreg is
-- IF pipe component
component if_pipe is
  generic(N: positive := 64;
          Addr: positive := 16); -- 16
  port(jump_in, branch_in, retin, macctrlr, oflow, fmmacsig: in std_logic;
       fm_inst_reg, fm_mac_ctrlr: in std_logic_vector(Addr-1 downto 0);
       clk, clr, we_im, clock: in std_logic;
       instrin: in std_logic_vector(N-1 downto 0);
       NOP_out: out std_logic;
       inst_out: out std_logic_vector(N-1 downto 0);
       pc_out: out std_logic_vector(Addr-1 downto 0));
end component if_pipe;
-- IFID register component
component ifidreg is
  port(clr, clk: in std_logic;
       instrin: in std_logic_vector(63 downto 0);
       instrouttoid: out std_logic_vector(63 downto 0));
end component ifidreg;
-- Incr PC Gen
component ipcchk is
  port(opfipcin: in std_logic_vector(5 downto 0);
       opipcout: out std_logic);
end component ipcchk;
-- MUX to choose inst reg address
component mux_inst is
  port(a: in STD_LOGIC_VECTOR (15 downto 0);
       b: in STD_LOGIC_VECTOR (15 downto 0);
       s: in STD_LOGIC;
       y: out STD_LOGIC_VECTOR (15 downto 0));
end component mux_inst;
-- signals
signal instoutsig: std_logic_vector(N-1 downto 0);
signal ipc, fmmacsig1, sinstmux: std_logic;
signal muxinstaddr: std_logic_vector(15 downto 0);
begin
sinstmux <= jump_in or retin;
fmmacsig1 <= fmmacsig and ipc;
ifpipecomp: if_pipe port map(jump_in=>jump_in, branch_in=>branch_in, retin=>retin,
  macctrlr=>macctrlr, oflow=>oflow, fmmacsig=>fmmacsig1, fm_inst_reg=>muxinstaddr,
  fm_mac_ctrlr=>fm_mac_ctrlr, clk=>clk, clr=>clr, we_im=>we_im, clock=>clock, instrin=>instrin,
  NOP_out=>NOP_out, inst_out=>instoutsig, pc_out=>pcout);
ifidregcomp: ifidreg port map(clr=>clr, clk=>clk, instrin=>instoutsig, instrouttoid=>inst_out);
ipcgencomp: ipcchk port map(opfipcin=>instoutsig(63 downto 58), opipcout=>ipc);
instmuxcomp: mux_inst port map(a=>fm_inst_reg_EX, b=>fm_inst_reg_ID, s=>sinstmux, y=>muxinstaddr);
end architecture ifidstregbeh;
-- Incr PC generation
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity ipcchk is
  port(opfipcin: in std_logic_vector(5 downto 0);
       opipcout: out std_logic);
end entity ipcchk;
architecture ipcchkbeh of ipcchk is
begin
process(opfipcin) is
begin
  if (opfipcin = "000010") then
    opipcout <= '0';
  else
    opipcout <= '1';
  end if;
end process;
end architecture ipcchkbeh;
-- MUX for choosing inst reg addr
library IEEE;
use IEEE.std_logic_1164.all;
entity mux_inst is
  port(a: in STD_LOGIC_VECTOR (15 downto 0);
       b: in STD_LOGIC_VECTOR (15 downto 0);
       s: in STD_LOGIC;
       y: out STD_LOGIC_VECTOR (15 downto 0));
end entity mux_inst;
architecture mux_inst_arch of mux_inst is
begin
process (a, b, s)
begin
  if (s = '0') then
    y <= a;
  else
    y <= b;
  end if;
end process;
end architecture mux_inst_arch;
-- IF pipe stage
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity if_pipe is
  generic(N: positive := 64;
          Addr: positive := 16); -- 16
  port(jump_in, branch_in, retin, macctrlr, oflow, fmmacsig: in std_logic;
       fm_inst_reg, fm_mac_ctrlr: in std_logic_vector(Addr-1 downto 0);
       clk, clr, we_im, clock: in std_logic;
       instrin: in std_logic_vector(N-1 downto 0);
       NOP_out: out std_logic;
       inst_out: out std_logic_vector(N-1 downto 0);
       pc_out: out std_logic_vector(Addr-1 downto 0));
end entity if_pipe;
architecture if_pipe_beh of if_pipe is
-- reg below pc
component reg0 is
  port(in_fm_pc: in std_logic_vector(15 downto 0);
       jump_in, branch_in, clr, clk: in std_logic;
       out_to_pc: out std_logic_vector(15 downto 0));
end component reg0;
-- Mux before pc
component mux_bf_pc is
  generic(Addr: positive := 16);
  port(fm_mac_ctrlr, fm_inst_reg, fm_reg, incdpc: in std_logic_vector(Addr-1 downto 0);
       jump_in, branch_in, retin, macctrlr, oflow, incpc: in std_logic;
       pcaddr: out std_logic_vector(Addr-1 downto 0));
end component mux_bf_pc;
-- Instruction memory
component INSTMEM is
  port(clk, we, en, rst: in std_logic;
       addr: in std_logic_vector(7 downto 0);
       inst_in: in std_logic_vector(63 downto 0);
       inst_out: out std_logic_vector(63 downto 0));
end component INSTMEM;
-- program counter
component pc is
  port(clk, clr, lpc, incpc: in std_logic;
       in_addr: in std_logic_vector(Addr-1 downto 0);
       out_addr: out std_logic_vector(Addr-1 downto 0));
end component pc;
-- IF stage signals
component ifsigfmbr is
  port(branchsig, jsig, rsig, macctrlr, fmmacsig: in std_logic;
       lpc_out, NOP_out, incpcout: out std_logic);
end component ifsigfmbr;
signal sigreg, sigincrpc, siginpc, sigoutpc: std_logic_vector(Addr-1 downto 0);
signal incrpcsig, lpc, oneen: std_logic;
begin
pc_out <= sigoutpc;
oneen <= '1';
muxpc: mux_bf_pc port map(fm_mac_ctrlr=>fm_mac_ctrlr, fm_inst_reg=>fm_inst_reg, fm_reg=>sigreg,
  incdpc=>sigoutpc, jump_in=>jump_in, branch_in=>branch_in, retin=>retin, macctrlr=>macctrlr,
  oflow=>oflow, incpc=>incrpcsig, pcaddr=>siginpc);
pctr: pc port map(clk=>clk, clr=>clr, lpc=>lpc, incpc=>incrpcsig, in_addr=>siginpc, out_addr=>sigoutpc);
pcreg: reg0 port map(in_fm_pc=>sigoutpc, jump_in=>jump_in, branch_in=>branch_in, clr=>clr, clk=>clk,
  out_to_pc=>sigreg);
instrmemnew: INSTMEM port map(clk=>clock, we=>we_im, en=>oneen, rst=>clr,
  addr=>sigoutpc(7 downto 0), inst_in=>instrin, inst_out=>inst_out);
IFsigs: ifsigfmbr port map(branchsig=>branch_in, jsig=>jump_in, rsig=>retin, macctrlr=>macctrlr,
  fmmacsig=>fmmacsig, lpc_out=>lpc, NOP_out=>NOP_out, incpcout=>incrpcsig);
end architecture if_pipe_beh;
-- register below pc
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity reg0 is
  port(in_fm_pc: in std_logic_vector(15 downto 0);
       jump_in, branch_in, clr, clk: in std_logic;
       out_to_pc: out std_logic_vector(15 downto 0));
end entity reg0;
architecture reg_beh of reg0 is
signal lreg: std_logic;
signal cl: std_logic_vector(1 downto 0);
signal out_to_pcs: std_logic_vector(15 downto 0);
begin
lreg <= jump_in or branch_in;
cl <= clr & lreg;
process(clk, cl, in_fm_pc, out_to_pcs) is
begin
  if (rising_edge(clk)) then
    case cl is
      when "10" => out_to_pcs <= (others => '0');
      when "11" => out_to_pcs <= (others => '0');
      when "01" => out_to_pcs <= in_fm_pc;
      when "00" => out_to_pcs <= out_to_pcs;
      when others => null;
    end case;
  end if;
  out_to_pc <= out_to_pcs;
end process;
end architecture reg_beh;
-- Mux before PC
library IEEE;
use IEEE.std_logic_1164.all;
entity mux_bf_pc is
  generic(Addr: positive := 16);
  port(fm_mac_ctrlr, fm_inst_reg, fm_reg, incdpc: in std_logic_vector(Addr-1 downto 0);
       jump_in, branch_in, retin, macctrlr, oflow, incpc: in std_logic;
       pcaddr: out std_logic_vector(Addr-1 downto 0));
end entity mux_bf_pc;
architecture mux_arch of mux_bf_pc is
signal jb_ret_mac: std_logic_vector(4 downto 0);
signal jorb_in: std_logic;
signal pcsig: std_logic_vector(Addr-1 downto 0);
begin
jorb_in <= jump_in or branch_in;
jb_ret_mac <= jorb_in & retin & macctrlr & oflow & incpc;
-- "0000100000000000" is the overflow exception address; this address has the
-- abort and out instructions. (An earlier value, "10000", is commented out.)
process (fm_mac_ctrlr, fm_inst_reg, fm_reg, jb_ret_mac, pcsig, incdpc) is
begin
  case jb_ret_mac is
    when "00000" => pcsig <= (others => '0');
    when "00001" => pcsig <= incdpc;
    when "00010" => pcsig <= "0000100000000000"; -- Overflow exception
    when "00011" => pcsig <= "0000100000000000"; -- Overflow exception
    when "00100" => pcsig <= fm_mac_ctrlr;
    when "00101" => pcsig <= fm_mac_ctrlr;
    when "00110" => pcsig <= "0000100000000000"; -- Overflow exception
    when "00111" => pcsig <= "0000100000000000"; -- Overflow exception
    when "01000" => pcsig <= fm_reg;
    when "01001" => pcsig <= incdpc;
    when "01010" => pcsig <= "0000100000000000"; -- Overflow exception
    when "01011" => pcsig <= "0000100000000000"; -- Overflow exception
    when "01100" => pcsig <= fm_mac_ctrlr;
    when "01101" => pcsig <= fm_mac_ctrlr;
    when "01110" => pcsig <= "0000100000000000"; -- Overflow exception
    when "01111" => pcsig <= "0000100000000000"; -- Overflow exception
    when "10000" => pcsig <= fm_inst_reg;
    when "10001" => pcsig <= fm_inst_reg;
    when "10010" => pcsig <= "0000100000000000"; -- Overflow exception
    when "10011" => pcsig <= "0000100000000000"; -- Overflow exception
    when "10100" => pcsig <= fm_inst_reg;
    when "10101" => pcsig <= fm_inst_reg;
    when "10110" => pcsig <= "0000100000000000"; -- Overflow exception
    when "10111" => pcsig <= "0000100000000000"; -- Overflow exception
    when "11000" => pcsig <= fm_inst_reg;
    when "11001" => pcsig <= fm_inst_reg;
    when "11010" => pcsig <= "0000100000000000"; -- Overflow exception
    when "11011" => pcsig <= "0000100000000000"; -- Overflow exception
    when "11100" => pcsig <= fm_inst_reg;
    when "11101" => pcsig <= fm_inst_reg;
    when "11110" => pcsig <= "0000100000000000"; -- Overflow exception
    when "11111" => pcsig <= "0000100000000000"; -- Overflow exception
    when others => null;
  end case;
  pcaddr <= pcsig;
end process;
end architecture mux_arch;
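The 32-entry case table of mux_bf_pc appears to reduce to a fixed priority among the five select bits (jump or branch, return, macro controller, overflow, increment PC). The Python sketch below is not part of the thesis code: `CASE_TABLE` is a transcription of the VHDL table by source name, and `priority` is a hypothetical restatement of it as a priority chain, which can be checked exhaustively against the table.

```python
# Transcription of the mux_bf_pc case table: select string
# (jorb_in & retin & macctrlr & oflow & incpc) -> name of the selected source.
CASE_TABLE = {"00000": "zero", "00001": "incdpc",
              "01000": "fm_reg", "01001": "incdpc"}
for sel in ("00010 00011 00110 00111 01010 01011 01110 01111 "
            "10010 10011 10110 10111 11010 11011 11110 11111").split():
    CASE_TABLE[sel] = "overflow"          # "0000100000000000"
for sel in "10000 10001 10100 10101 11000 11001 11100 11101".split():
    CASE_TABLE[sel] = "fm_inst_reg"
for sel in "00100 00101 01100 01101".split():
    CASE_TABLE[sel] = "fm_mac_ctrlr"


def priority(sel):
    """Hypothetical priority-chain reading of the same table."""
    jorb, ret, mac, oflow, incpc = (bit == "1" for bit in sel)
    if oflow:
        return "overflow"       # overflow exception wins over everything
    if jorb:
        return "fm_inst_reg"    # jump or branch target
    if mac:
        return "fm_mac_ctrlr"   # address from the macro controller
    if ret and not incpc:
        return "fm_reg"         # return address from the register below PC
    if incpc:
        return "incdpc"
    return "zero"


mismatches = [format(n, "05b") for n in range(32)
              if priority(format(n, "05b")) != CASE_TABLE[format(n, "05b")]]
```

If the transcription is right, `mismatches` is empty, which would mean the case statement could equally be written as an if/elsif chain in that priority order.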
-- Full Instruction Memory Design - Initialised for 'RCOLLECT' with the initial PUT
library IEEE;
use IEEE.std_logic_1164.all;
--synopsys translate_off;
library unisim;
use unisim.vcomponents.all;
--synopsys translate_on;
entity INSTMEM is
  port(clk, we, en, rst: in std_logic;
       addr: in std_logic_vector(7 downto 0);
       inst_in: in std_logic_vector(63 downto 0);
       inst_out: out std_logic_vector(63 downto 0));
end entity INSTMEM;
architecture behavioural of INSTMEM is
component RAMB4_S16 is
  port(ADDR: in std_logic_vector(7 downto 0);
       CLK: in std_logic;
       DI: in std_logic_vector(15 downto 0);
       DO: out std_logic_vector(15 downto 0);
       EN, RST, WE: in std_logic);
end component RAMB4_S16;
attribute INIT_00: string;
attribute INIT_01: string;
attribute INIT_02: string;
attribute INIT_03: string;
attribute INIT_04: string;
attribute INIT_05: string;
attribute INIT_06: string;
attribute INIT_07: string;
attribute INIT_08: string;
attribute INIT_09: string;
attribute INIT_0A: string;
attribute INIT_0B: string;
attribute INIT_0C: string;
attribute INIT_0D: string;
attribute INIT_0E: string;
attribute INIT_0F: string;
attribute INIT_00 of Instram0 : label is
"0000014000001240000000C00000000010400000000000000000004000000000";
attribute INIT_01 of Instram0 : label is
"034000000AC0000001C00000124000000000000006C00000000002C000000940";
attribute INIT_02 of Instram0 : label is
"00001240000000000340000006C005800000000002C000000B40000003C00000";
attribute INIT_03 of Instram0 : label is
"0000040000001240000000000000000010C000000240000000000D4000000000";
attribute INIT_04 of Instram0 : label is
"00000340000002C000000000000000000F800000124000000000000000001300";
attribute INIT_05 of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_06 of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_07 of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_08 of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_09 of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0A of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0B of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0C of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0D of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0E of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0F of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_00 of Instram1 : label is
"8000000000000000800000000000000000000000810001000100010000000000";
attribute INIT_01 of Instram1 : label is
"0100000000008000000000000000000080000100000001000100010000000000";
attribute INIT_02 of Instram1 : label is
"0000000000008000010000000000000001008000010000000000800000800100";
attribute INIT_03 of Instram1 : label is
"0100010000000000000080000100000000008000000000000000000001000100";
attribute INIT_04 of Instram1 : label is
"0000000000000000000000000000000000000000000000008000000000000000";
attribute INIT_05 of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_06 of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_07 of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_08 of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_09 of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0A of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0B of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0C of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0D of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0E of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0F of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT 00 oflnstrarn2 : label is 
"008200A000000000004 l 0060000000000000004l00010060 l 800000000000000"; 
attribute INIT 0 I of lnstrarn2 : label is 
"OOOOOOOOOOOOOOC300E000000000008200022000200020000002000000000000"; 
attribute INIT 02 of Instrarn2 : label is 
"ooooooooooc300030000000000000000000200020000000000004803ooooooo3"; 
attribute INIT 03 oflnstrarn2 : label is 
"00040000000000000 I 0400040004000000000 I 04012 000000000 5 8000002000 I"; 
attribute INIT 04 oflnstram2 : label is 
"0000000300000000000000000000000000000000000001040004000000007000"; 
attribute INIT 05 of Instram2 : label is 
"0000000000000000000000000000000000000000000000000000000000000000"; 
attribute INIT 06 of Instram2 : label is 
"0000000000000000000000000000000000000000000000000000000000000000"; 
attribute INIT_07 of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_08 of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_09 of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0A of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0B of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0C of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0D of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0E of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0F of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_00 of Instram3: label is
"7800540000008000780054000000000084007C001C051C0524A4208000000400";
attribute INIT_01 of Instram3: label is
"55000000800078005400000084007C001C0734E5600638C51CA0548000008000";
attribute INIT_02 of Instram3: label is
"000084007C001C085500000070005C041CA01C00548000007000000854001D20";
attribute INIT_03 of Instram3: label is
"1DC055A0000084007C001C0C2D80000080007800540000001400600A1D601D40";
attribute INIT_04 of Instram3: label is
"080058000C00580300000800880000007000000084007C001C0000001400640D";
attribute INIT_05 of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_06 of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_07 of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_08 of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_09 of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0A of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0B of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0C of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0D of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0E of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0F of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
begin
Instram0: RAMB4_S16
--synopsys translate_off
GENERIC MAP (
INIT_00 => X"0000014000001240000000C00000000010400000000000000000040000000000",
INIT_01 => X"034000000AC0000001C00000124000000000000006C00000000002C000000940",
INIT_02 => X"00001240000000000340000006C005800000000002C000000B40000003C00000",
INIT_03 => X"0000040000001240000000000000000010C000000240000000000D4000000000",
INIT_04 => X"00000340000002C000000000000000000F800000124000000000000000001300",
INIT_05 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_06 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_07 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_08 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_09 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0A => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0B => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0C => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0D => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0E => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0F => X"0000000000000000000000000000000000000000000000000000000000000000")
--synopsys translate_on
port map(ADDR=>addr, CLK=>clk, DI=>inst_in(15 downto 0), DO=>inst_out(15 downto 0), EN=>en,
RST=>rst, WE=>we);
Instram1: RAMB4_S16
--synopsys translate_off
GENERIC MAP (
INIT_00 => X"8000000000000000800000000000000000000000810001000100010000000000",
INIT_01 => X"0100000000008000000000000000000080000100000001000100010000000000",
INIT_02 => X"0000000000008000010000000000000001008000010000000000800000800100",
INIT_03 => X"0100010000000000000080000100000000008000000000000000000001000100",
INIT_04 => X"0000000000000000000000000000000000000000000000008000000000000000",
INIT_05 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_06 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_07 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_08 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_09 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0A => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0B => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0C => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0D => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0E => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0F => X"0000000000000000000000000000000000000000000000000000000000000000")
--synopsys translate_on
port map(ADDR=>addr, CLK=>clk, DI=>inst_in(31 downto 16), DO=>inst_out(31 downto 16), EN=>en,
RST=>rst, WE=>we);
Instram2: RAMB4_S16
--synopsys translate_off
GENERIC MAP (
INIT_00 => X"008200A000000000004100600000000000000041000100601800000000000000",
INIT_01 => X"00000000000000C300E000000000008200022000200020000002000000000000",
INIT_02 => X"0000000000C30003000000000000000000020002000000000000480300000003",
INIT_03 => X"0004000000000000010400040004000000000104012000000000580000020001",
INIT_04 => X"0000000300000000000000000000000000000000000001040004000000007000",
INIT_05 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_06 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_07 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_08 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_09 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0A => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0B => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0C => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0D => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0E => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0F => X"0000000000000000000000000000000000000000000000000000000000000000")
--synopsys translate_on
port map(ADDR=>addr, CLK=>clk, DI=>inst_in(47 downto 32), DO=>inst_out(47 downto 32), EN=>en,
RST=>rst, WE=>we);
Instram3: RAMB4_S16
--synopsys translate_off
GENERIC MAP (
INIT_00 => X"7800540000008000780054000000000084007C001C051C0524A4208000000400",
INIT_01 => X"55000000800078005400000084007C001C0734E5600638C51CA0548000008000",
INIT_02 => X"000084007C001C085500000070005C041CA01C00548000007000000854001D20",
INIT_03 => X"1DC055A0000084007C001C0C2D80000080007800540000001400600A1D601D40",
INIT_04 => X"080058000C00580300000800880000007000000084007C001C0000001400640D",
INIT_05 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_06 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_07 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_08 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_09 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0A => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0B => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0C => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0D => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0E => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0F => X"0000000000000000000000000000000000000000000000000000000000000000")
--synopsys translate_on
port map(ADDR=>addr, CLK=>clk, DI=>inst_in(63 downto 48), DO=>inst_out(63 downto 48), EN=>en,
RST=>rst, WE=>we);
end architecture behavioural;
-- Program counter
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity pc is
port(clk, clr, lpc, incpc: in std_logic;
in_addr: in std_logic_vector(15 downto 0);
out_addr: out std_logic_vector(15 downto 0));
end entity pc;
architecture behavioral of pc is
signal clipc: std_logic_vector(2 downto 0);
signal out_addrs: std_logic_vector(15 downto 0);
begin
clipc <= clr & lpc & incpc;
process(clk, clipc, in_addr, out_addrs) is
begin
  if(rising_edge(clk)) then
    case clipc is
      when "110" => out_addrs <= (others => '0');
      when "111" => out_addrs <= (others => '0');
      when "101" => out_addrs <= (others => '0');
      when "100" => out_addrs <= (others => '0');
      when "010" => out_addrs <= in_addr;
      when "001" => out_addrs <= in_addr + 1;
      when "011" => out_addrs <= in_addr + 1;
      when "000" => out_addrs <= out_addrs;
      when others => null;
    end case;
  end if;
  out_addr <= out_addrs;
end process;
end architecture behavioral;
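-- Illustrative testbench sketch for the program counter above (not part of
-- the original design). It assumes the pc entity is compiled into library
-- "work" with ports clk, clr, lpc, incpc, in_addr and out_addr; the names
-- pc_tb, sim, dut, stim and tick are hypothetical. It exercises the four
-- behaviours of the case statement: clear, load, increment and hold.
library IEEE;
use IEEE.std_logic_1164.all;
entity pc_tb is
end entity pc_tb;
architecture sim of pc_tb is
signal clk, clr, lpc, incpc: std_logic := '0';
signal in_addr: std_logic_vector(15 downto 0) := (others => '0');
signal out_addr: std_logic_vector(15 downto 0);
begin
dut: entity work.pc port map(clk=>clk, clr=>clr, lpc=>lpc, incpc=>incpc,
in_addr=>in_addr, out_addr=>out_addr);
stim: process is
  -- One clock cycle: inputs settle during the low phase, then a rising edge.
  procedure tick is
  begin
    clk <= '0'; wait for 5 ns;
    clk <= '1'; wait for 5 ns;
  end procedure tick;
begin
  -- clr = '1' clears the counter regardless of lpc and incpc.
  clr <= '1'; tick;
  assert out_addr = x"0000" report "clear failed" severity failure;
  -- lpc = '1' alone loads in_addr (branch or jump target).
  clr <= '0'; lpc <= '1'; in_addr <= x"0010"; tick;
  assert out_addr = x"0010" report "load failed" severity failure;
  -- incpc = '1' alone produces in_addr + 1.
  lpc <= '0'; incpc <= '1'; tick;
  assert out_addr = x"0011" report "increment failed" severity failure;
  -- With all controls low the previous value is held.
  incpc <= '0'; tick;
  assert out_addr = x"0011" report "hold failed" severity failure;
  wait;
end process stim;
end architecture sim;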
-- IF stage signals
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity ifsigfmbr is
port(branchsig, jsig, rsig, macctrlr, fmmacsig: in std_logic;
lpc_out, NOP_out, incpcout: out std_logic);
end entity ifsigfmbr;
architecture ifsigfmbr_beh of ifsigfmbr is
signal bjr: std_logic;
signal bf: std_logic_vector(1 downto 0);
begin
bjr <= branchsig or jsig or rsig or macctrlr;
bf <= bjr & fmmacsig;
process(bf) is
begin
case bf is
when "00" =>
lpc_out <= '0';
NOP_out <= '0';
incpcout <= '0';
when "01" =>
lpc_out <= '1';
NOP_out <= '0';
incpcout <= '1';
when "10" =>
lpc_out <= '1';
NOP_out <= '1';
incpcout <= '0';
when "11" =>
lpc_out <= '1';
NOP_out <= '1';
incpcout <= '0';
when others =>
lpc_out <= '0';
NOP_out <= '0';
incpcout <= '0';
end case;
end process;
end architecture ifsigfmbr_beh;
-- IF-ID stage register
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity ifidreg is
port(clr, clk: in std_logic;
instrin: in std_logic_vector(63 downto 0);
instrouttoid: out std_logic_vector(63 downto 0));
end entity ifidreg;
architecture ifidreg_beh of ifidreg is
begin
rpr: process(clk, clr, instrin) is
begin
  if(falling_edge(clk)) then
    case clr is
      when '1' =>
        instrouttoid <= (others => '0');
      when '0' =>
        instrouttoid <= instrin;
      when others => null;
    end case;
  end if;
end process rpr;
end architecture ifidreg_beh;
3. ID STAGE
-- ID/ETM stage components and register
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity idstreg is
generic(N: positive := 64;
Addr: positive := 16);
port(inst_in: in std_logic_vector(N-1 downto 0);
--cfg_in, bitmapin: in std_logic_vector(N-1 downto 0);
WB_write_data: in std_logic_vector(N-1 downto 0);
loc: in std_logic_vector(2 downto 0);
ffpin: in std_logic_vector(7 downto 0);
clk, NOP_in, ID_flush_BR, regwr_sig, trw, vrw, jmpin, retin, lmfmex: in std_logic;
morfmex: in std_logic_vector(5 downto 0);
TRDstin, VRDstin, RDstin: in std_logic_vector(4 downto 0);
IDFout: out std_logic;
WB_ctrl_out: out std_logic_vector(3 downto 0);
EX_ctrl_out: out std_logic_vector(12 downto 0);
PKT_ctrl_out: out std_logic_vector(6 downto 0);
GPR_read1_out, GPR_read2_out, sign_ext_out: out std_logic_vector(N-1 downto 0);
TR_read_out_ID, VR_read_out_ID: out std_logic_vector(N-1 downto 0);
Br_Addr_out, PKT_Offset_out: out std_logic_vector(Addr-1 downto 0);
shamt_out: out std_logic_vector(5 downto 0);
lmor_out, TRD_out, VRD_out, jumps, rets: out std_logic;
ocr_val_out_id, aer_val_out_id: out std_logic_vector(7 downto 0);
opcodeexout: out std_logic_vector(5 downto 0);
ctrlsigsoutID: out std_logic_vector(24 downto 0);
wrdataout: out std_logic_vector(N-1 downto 0);
RS1_out, RS2_out, RD_out, TR_out, VR_out: out std_logic_vector(4 downto 0));
end entity idstreg;
architecture idstreg_beh of idstreg is
-- ID stage component
component id_stage is
generic(N: positive := 64;
Addr: positive := 16);
port(inst_in: in std_logic_vector(N-1 downto 0);
WB_write_data: in std_logic_vector(N-1 downto 0);
loc: in std_logic_vector(2 downto 0);
ffpin: in std_logic_vector(7 downto 0);
clk, NOP_in, ID_flush_BR, regwr_sig, jmpin, retin, lmfmex: in std_logic;
morfmex: in std_logic_vector(5 downto 0);
trw, vrw: in std_logic;
RDstin: in std_logic_vector(4 downto 0);
TRDstin, VRDstin: in std_logic_vector(4 downto 0);
opcodeout: out std_logic_vector(5 downto 0);
ID_Flush: out std_logic;
WB_ctrl_out: out std_logic_vector(3 downto 0);
EX_ctrl_out: out std_logic_vector(12 downto 0);
PKT_ctrl_out: out std_logic_vector(6 downto 0);
GPR_read1_out, GPR_read2_out, sign_ext_out: out std_logic_vector(N-1 downto 0);
TR_read_out, VR_read_out: out std_logic_vector(N-1 downto 0);
Br_Addr, PKT_Offset: out std_logic_vector(Addr-1 downto 0);
shamt_out: out std_logic_vector(5 downto 0);
lmor_out, TRD, VRD, jump, ret: out std_logic;
ocr_val_stout, aer_val_out: out std_logic_vector(7 downto 0);
ctrlsigsout: out std_logic_vector(24 downto 0);
wrdataout: out std_logic_vector(N-1 downto 0);
RS1_out, RS2_out, RD_out, TR_out, VR_out: out std_logic_vector(4 downto 0));
end component id_stage;
-- ID/ETM register component
component ess_idexreg is
generic(N: positive := 64;
Addr: positive := 16);
port(clk, ID_Flush: in std_logic;
ctrlin: in std_logic_vector(24 downto 0);
WB_in: in std_logic_vector(3 downto 0);
EX_in: in std_logic_vector(12 downto 0);
PKT_in: in std_logic_vector(6 downto 0);
GPR_read1_in, GPR_read2_in, sign_ext_in: in std_logic_vector(N-1 downto 0);
TR_read_in, VR_read_in: in std_logic_vector(N-1 downto 0);
Br_Addr_in, PKT_Offset_in: in std_logic_vector(Addr-1 downto 0);
shamt_in: in std_logic_vector(5 downto 0);
lmor_in, jin_id, rin_id: in std_logic;
ocr_in_id, aer_in_id: in std_logic_vector(7 downto 0);
RS1_in, RS2_in, RD_in, TR_in, VR_in: in std_logic_vector(4 downto 0);
opcodein: in std_logic_vector(5 downto 0);
opcodeexout: out std_logic_vector(5 downto 0);
ctrlout: out std_logic_vector(24 downto 0);
WB_out: out std_logic_vector(3 downto 0);
EX_out: out std_logic_vector(12 downto 0);
PKT_out: out std_logic_vector(6 downto 0);
GPR_read1_out, GPR_read2_out, sign_ext_out: out std_logic_vector(N-1 downto 0);
TR_read_out_ID, VR_read_out_ID: out std_logic_vector(N-1 downto 0);
Br_Addr_out, PKT_Offset_out: out std_logic_vector(Addr-1 downto 0);
shamt_out: out std_logic_vector(5 downto 0);
lmor_out, TRD_out, VRD_out, jout_id, rout_id: out std_logic;
ocr_out_id, aer_out_id: out std_logic_vector(7 downto 0);
RS1_out, RS2_out, RD_out, TR_out, VR_out: out std_logic_vector(4 downto 0));
end component ess_idexreg;
-- JBR component
component jbrchk is
port(clk, IDFin: in std_logic;
IDF_out1: out std_logic);
end component jbrchk;
-- signals declaration
signal IDFs, IDFs1, IDFs2: std_logic;
signal WBs: std_logic_vector(3 downto 0);
signal EXs: std_logic_vector(12 downto 0);
signal PKTs: std_logic_vector(6 downto 0);
signal TRrs, VRrs, GPR1s, GPR2s, signs: std_logic_vector(N-1 downto 0);
signal BrAddrs, PKTOs: std_logic_vector(Addr-1 downto 0);
signal shamts: std_logic_vector(5 downto 0);
signal lmors, TRDs, VRDs, js, rs: std_logic;
signal ocrs, aers: std_logic_vector(7 downto 0);
signal RS1s, RS2s, RDs, TRns, VRns: std_logic_vector(4 downto 0);
signal ops: std_logic_vector(5 downto 0);
signal ctrls: std_logic_vector(24 downto 0);
begin
IDFout <= IDFs2;
IDFs2 <= IDFs or IDFs1;
idstagecomp: id_stage port map(inst_in=>inst_in, WB_write_data=>WB_write_data, loc=>loc,
ffpin=>ffpin, clk=>clk, NOP_in=>NOP_in, ID_flush_BR=>ID_flush_BR, regwr_sig=>regwr_sig,
jmpin=>jmpin, retin=>retin, lmfmex=>lmfmex, morfmex=>morfmex, trw=>trw, vrw=>vrw,
RDstin=>RDstin, TRDstin=>TRDstin, VRDstin=>VRDstin, opcodeout=>ops, ID_Flush=>IDFs,
WB_ctrl_out=>WBs, EX_ctrl_out=>EXs, PKT_ctrl_out=>PKTs, GPR_read1_out=>GPR1s,
GPR_read2_out=>GPR2s, sign_ext_out=>signs, TR_read_out=>TRrs, VR_read_out=>VRrs,
Br_Addr=>BrAddrs, PKT_Offset=>PKTOs, shamt_out=>shamts, lmor_out=>lmors, TRD=>TRDs,
VRD=>VRDs, jump=>js, ret=>rs, ocr_val_stout=>ocrs, aer_val_out=>aers, ctrlsigsout=>ctrls,
wrdataout=>wrdataout, RS1_out=>RS1s, RS2_out=>RS2s, RD_out=>RDs, TR_out=>TRns,
VR_out=>VRns);
idexregcomp: ess_idexreg port map(clk=>clk, ID_Flush=>IDFs2, ctrlin=>ctrls, WB_in=>WBs,
EX_in=>EXs, PKT_in=>PKTs, GPR_read1_in=>GPR1s, GPR_read2_in=>GPR2s, sign_ext_in=>signs,
TR_read_in=>TRrs, VR_read_in=>VRrs, Br_Addr_in=>BrAddrs, PKT_Offset_in=>PKTOs,
shamt_in=>shamts, lmor_in=>lmors, jin_id=>js, rin_id=>rs, ocr_in_id=>ocrs, aer_in_id=>aers,
RS1_in=>RS1s, RS2_in=>RS2s, RD_in=>RDs, TR_in=>TRns, VR_in=>VRns, opcodein=>ops,
opcodeexout=>opcodeexout, ctrlout=>ctrlsigsoutID, WB_out=>WB_ctrl_out, EX_out=>EX_ctrl_out,
PKT_out=>PKT_ctrl_out, GPR_read1_out=>GPR_read1_out, GPR_read2_out=>GPR_read2_out,
sign_ext_out=>sign_ext_out, TR_read_out_ID=>TR_read_out_ID, VR_read_out_ID=>VR_read_out_ID,
Br_Addr_out=>Br_Addr_out, PKT_Offset_out=>PKT_Offset_out, shamt_out=>shamt_out,
lmor_out=>lmor_out, TRD_out=>TRD_out, VRD_out=>VRD_out, jout_id=>jumps, rout_id=>rets,
ocr_out_id=>ocr_val_out_id, aer_out_id=>aer_val_out_id, RS1_out=>RS1_out, RS2_out=>RS2_out,
RD_out=>RD_out, TR_out=>TR_out, VR_out=>VR_out);
jbrchkcomp: jbrchk port map(clk=>clk, IDFin=>IDFs, IDF_out1=>IDFs1);
end architecture idstreg_beh;
--Individual components
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity id_stage is
generic(N: positive := 64;
Addr: positive := 16);
port(inst_in: in std_logic_vector(N-1 downto 0);
--cfg_in, bitmapin: in std_logic_vector(N-1 downto 0);
WB_write_data: in std_logic_vector(N-1 downto 0);
loc: in std_logic_vector(2 downto 0);
ffpin: in std_logic_vector(7 downto 0);
clk, NOP_in, ID_flush_BR, regwr_sig, jmpin, retin, lmfmex: in std_logic;
morfmex: in std_logic_vector(5 downto 0);
trw, vrw: in std_logic;
RDstin: in std_logic_vector(4 downto 0);
TRDstin, VRDstin: in std_logic_vector(4 downto 0);
opcodeout: out std_logic_vector(5 downto 0);
ID_Flush: out std_logic;
WB_ctrl_out: out std_logic_vector(3 downto 0);
EX_ctrl_out: out std_logic_vector(12 downto 0);
PKT_ctrl_out: out std_logic_vector(6 downto 0);
GPR_read1_out, GPR_read2_out, sign_ext_out: out std_logic_vector(N-1 downto 0);
TR_read_out, VR_read_out: out std_logic_vector(N-1 downto 0);
Br_Addr, PKT_Offset: out std_logic_vector(Addr-1 downto 0);
shamt_out: out std_logic_vector(5 downto 0);
lmor_out, TRD, VRD, jump, ret: out std_logic;
ocr_val_stout, aer_val_out: out std_logic_vector(7 downto 0);
ctrlsigsout: out std_logic_vector(24 downto 0);
wrdataout: out std_logic_vector(N-1 downto 0);
RS1_out, RS2_out, RD_out, TR_out, VR_out: out std_logic_vector(4 downto 0));
end entity id_stage;
architecture id_stage_beh of id_stage is
--components
--tag and val reg file
component tagregfile is
port(TRNUMS, TRNUMD: in std_logic_vector(4 downto 0);
tag_in: in std_logic_vector(63 downto 0);
clk, tr_write: in std_logic;
tag_out: out std_logic_vector(63 downto 0));
end component tagregfile;
--controller
component cntunit0 is
port(opcode: in std_logic_vector(5 downto 0);
loc: in std_logic_vector(2 downto 0);
ffpin: in std_logic_vector(7 downto 0);
ocr_val, aer_val: out std_logic_vector(7 downto 0); -- for 8 bit output code register
ctrlsigs: out std_logic_vector(24 downto 0));
end component cntunit0;
--GPR file
component regfile is
port(RD, RS1, RS2: in std_logic_vector(4 downto 0);
-- cfg_in, bitmapin: in std_logic_vector(N-1 downto 0);
writedata: in std_logic_vector(63 downto 0);
clk, regwrite, regread: in std_logic;
-- rdwrdataout: out std_logic_vector(63 downto 0); (need to have later)
readdata1, readdata2: out std_logic_vector(63 downto 0));
end component regfile;
-- SIGN EXT UNIT COMP
component signext is
generic(N: positive := 64;
imm: positive := 16);
port(immval: in std_logic_vector(imm-1 downto 0);
sign: in std_logic;
extdval: out std_logic_vector(N-1 downto 0));
end component signext;
-- extra comp
component idextra is
generic(N: positive := 64);
port(regw, trwr, vrwr: in std_logic;
wrregin, wrtagin, wrvalin: in std_logic_vector(N-1 downto 0);
wrdataout: out std_logic_vector(N-1 downto 0));
end component idextra;
-- opcode for controller
component mux_ct is
port (n_op, lm_op: in STD_LOGIC_VECTOR (5 downto 0);
lm: in STD_LOGIC;
optoct: out STD_LOGIC_VECTOR (5 downto 0));
end component mux_ct;
signal opcodesig, optoctsig: std_logic_vector(5 downto 0);
signal RS1_sig, RS2_sig, TR_sig, VR_sig: std_logic_vector(4 downto 0);
signal immsig: std_logic_vector(Addr-1 downto 0);
signal temp_ctrl_sigs: std_logic_vector(24 downto 0);
signal regrdsig: std_logic;
signal wrtagsig, wrvalsig, wrregsig: std_logic_vector(N-1 downto 0);
begin
opcodesig <= inst_in(63 downto 58);
opcodeout <= optoctsig;
ctrlsigsout <= temp_ctrl_sigs;
RS1_sig <= inst_in(52 downto 48);
RS2_sig <= inst_in(47 downto 43);
TR_sig <= inst_in(42 downto 38);
VR_sig <= inst_in(36 downto 32);
immsig <= inst_in(21 downto 6);
regrdsig <= temp_ctrl_sigs(22);
ID_Flush <= ID_flush_BR or NOP_in or jmpin or retin;
jump <= temp_ctrl_sigs(21);
ret <= temp_ctrl_sigs(20);
WB_ctrl_out <= inst_in(37) & inst_in(31) & temp_ctrl_sigs(5) & inst_in(24);
EX_ctrl_out <= temp_ctrl_sigs(19 downto 17) & temp_ctrl_sigs(14 downto 6) & temp_ctrl_sigs(1);
PKT_ctrl_out <= temp_ctrl_sigs(24 downto 23) & temp_ctrl_sigs(16 downto 15) & temp_ctrl_sigs(3
downto 2) & temp_ctrl_sigs(0);
Br_Addr <= inst_in(21 downto 6);
PKT_Offset <= inst_in(21 downto 6);
shamt_out <= inst_in(5 downto 0);
lmor_out <= inst_in(23);
TRD <= inst_in(37);
VRD <= inst_in(31);
RD_out <= inst_in(57 downto 53);
RS1_out <= inst_in(52 downto 48);
RS2_out <= inst_in(47 downto 43);
TR_out <= inst_in(42 downto 38);
VR_out <= inst_in(36 downto 32);
GPR_read1_out <= wrregsig;
TR_read_out <= wrtagsig;
VR_read_out <= wrvalsig;
GPRfile: regfile port map(RD=>RDstin, RS1=>RS1_sig, RS2=>RS2_sig, writedata=>WB_write_data,
clk=>clk, regwrite=>regwr_sig, regread=>regrdsig, readdata1=>wrregsig, readdata2=>GPR_read2_out);
TRfile: tagregfile port
map(TRNUMS=>TR_sig, TRNUMD=>TRDstin, tag_in=>WB_write_data, clk=>clk, tr_write=>trw,
tag_out=>wrtagsig);
VRfile: tagregfile port
map(TRNUMS=>VR_sig, TRNUMD=>VRDstin, tag_in=>WB_write_data, clk=>clk, tr_write=>vrw,
tag_out=>wrvalsig);
Control: cntunit0 port map(opcode=>optoctsig, loc=>loc, ffpin=>ffpin, ocr_val=>ocr_val_stout,
aer_val=>aer_val_out, ctrlsigs=>temp_ctrl_sigs);
signextunit: signext port map(immval=>immsig, sign=>inst_in(22), extdval=>sign_ext_out);
extraunit: idextra port map(regw=>regwr_sig, trwr=>trw, vrwr=>vrw, wrregin=>wrregsig,
wrtagin=>wrtagsig, wrvalin=>wrvalsig, wrdataout=>wrdataout);
muxctcomp: mux_ct port map(n_op=>opcodesig, lm_op=>morfmex, lm=>lmfmex, optoct=>optoctsig);
end architecture id_stage_beh;
--Individual Components
-- New Design using block ram - TAG/VAL reg file
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity tagregfile is
port(TRNUMS, TRNUMD: in std_logic_vector(4 downto 0);
tag_in: in std_logic_vector(63 downto 0);
clk, tr_write: in std_logic;
tag_out: out std_logic_vector(63 downto 0));
end entity tagregfile;
architecture tagregfile_beh of tagregfile is
--components
component tag_block is
port(addr: in std_logic_vector(4 downto 0);
din: in std_logic_vector(31 downto 0);
dout: out std_logic_vector(31 downto 0);
clk: in std_logic;
wtr: in std_logic);
end component tag_block;
component muxreg1 is
port(SRC, DST: in std_logic_vector(4 downto 0);
s_wr: in std_logic;
RSRD: out std_logic_vector(4 downto 0));
end component muxreg1;
--signals
signal regaddress: std_logic_vector(4 downto 0);
begin
muxreg_comp_tag: muxreg1 port map(SRC=>TRNUMS, DST=>TRNUMD, s_wr=>tr_write,
RSRD=>regaddress);
tag_blk_comp0: tag_block port map(addr=>regaddress, din=>tag_in(63 downto 32), dout=>tag_out(63
downto 32), clk=>clk, wtr=>tr_write);
tag_blk_comp1: tag_block port map(addr=>regaddress, din=>tag_in(31 downto 0), dout=>tag_out(31
downto 0), clk=>clk, wtr=>tr_write);
end architecture tagregfile_beh;
--Individual Components
-- Tag reg Design using Block RAM
library IEEE;
use IEEE.std_logic_1164.all;
entity tag_block is
port(addr: in std_logic_vector(4 downto 0);
din: in std_logic_vector(31 downto 0);
dout: out std_logic_vector(31 downto 0);
clk: in std_logic;
wtr: in std_logic);
end entity tag_block;
architecture tag_behave of tag_block is
component RAMB4_S16_S16 is
port(ADDRA, ADDRB: in std_logic_vector(7 downto 0);
CLKA, CLKB: in std_logic;
DIA, DIB: in std_logic_vector(15 downto 0);
DOA, DOB: out std_logic_vector(15 downto 0);
ENA, ENB, RSTA, RSTB, WEA, WEB: in std_logic);
end component RAMB4_S16_S16;
signal vcc, gnd: std_logic;
signal addr_ablk, addr_bblk: std_logic_vector(7 downto 0);
begin
vcc <= '1';
gnd <= '0';
addr_ablk <= "00" & addr & vcc;
addr_bblk <= "00" & addr & gnd;
tagram0: RAMB4_S16_S16 port map(ADDRA=>addr_ablk, ADDRB=>addr_bblk, CLKA=>clk,
CLKB=>clk, DIA=>din(31 downto 16), DIB=>din(15 downto 0), DOA=>dout(31 downto 16),
DOB=>dout(15 downto 0), ENA=>vcc, ENB=>vcc, RSTA=>gnd, RSTB=>gnd, WEA=>wtr, WEB=>wtr);
end architecture tag_behave;
-- MUX for choosing btw RS1 and RD
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity muxreg1 is
port(SRC, DST: in std_logic_vector(4 downto 0);
s_wr: in std_logic;
RSRD: out std_logic_vector(4 downto 0));
end entity muxreg1;
architecture muxreg1_beh of muxreg1 is
begin
process(SRC, DST, s_wr) is
begin
  case s_wr is
    when '0' => RSRD <= SRC;
    when '1' => RSRD <= DST;
    when others => null;
  end case;
end process;
end architecture muxreg1_beh;
-- GPR REG FILE
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity regfile is
port(RD, RS1, RS2: in std_logic_vector(4 downto 0);
-- cfg_in, bitmapin: in std_logic_vector(N-1 downto 0);
writedata: in std_logic_vector(63 downto 0);
clk, regwrite, regread: in std_logic;
-- rdwrdataout: out std_logic_vector(63 downto 0); (need to have later)
readdata1, readdata2: out std_logic_vector(63 downto 0));
end entity regfile;
architecture reg_beh of regfile is
-- components
component blockram is
port(addr1, addr2: in std_logic_vector(4 downto 0); -- for RS1 and RS2, need to have mux for choosing
-- either RS1 or RD
din1, din2: in std_logic_vector(15 downto 0);
dout1, dout2: out std_logic_vector(15 downto 0);
clk: in std_logic;
wr1, enr1, enr2: in std_logic); -- wr1 - write is always in port 1; enr1, enr2 - for reg reads from 2 ports
end component blockram;
component muxreg is
port(SRC, DST: in std_logic_vector(4 downto 0);
s_wr: in std_logic;
RSRD: out std_logic_vector(4 downto 0));
end component muxreg;
--signals
signal regaddr1: std_logic_vector(4 downto 0);
signal en1, en2: std_logic;
begin
en1 <= regread or regwrite;
en2 <= regread;
muxreg_comp: muxreg port map(SRC=>RS1, DST=>RD, s_wr=>regwrite, RSRD=>regaddr1);
bram_comp1: blockram port map(addr1=>regaddr1, addr2=>RS2, din1=>writedata(63 downto 48),
din2=>writedata(63 downto 48), dout1=>readdata1(63 downto 48), dout2=>readdata2(63 downto 48),
clk=>clk, wr1=>regwrite, enr1=>en1, enr2=>en2);
bram_comp2: blockram port map(addr1=>regaddr1, addr2=>RS2, din1=>writedata(47 downto 32),
din2=>writedata(47 downto 32), dout1=>readdata1(47 downto 32), dout2=>readdata2(47 downto 32),
clk=>clk, wr1=>regwrite, enr1=>en1, enr2=>en2);
bram_comp3: blockram port map(addr1=>regaddr1, addr2=>RS2, din1=>writedata(31 downto 16),
din2=>writedata(31 downto 16), dout1=>readdata1(31 downto 16), dout2=>readdata2(31 downto 16),
clk=>clk, wr1=>regwrite, enr1=>en1, enr2=>en2);
bram_comp4: blockram port map(addr1=>regaddr1, addr2=>RS2, din1=>writedata(15 downto 0),
din2=>writedata(15 downto 0), dout1=>readdata1(15 downto 0), dout2=>readdata2(15 downto 0),
clk=>clk, wr1=>regwrite, enr1=>en1, enr2=>en2);
end architecture reg_beh;
--Individual components
-- MUX for choosing btw RS1 and RD
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity muxreg is
port(SRC, DST: in std_logic_vector(4 downto 0);
s_wr: in std_logic;
RSRD: out std_logic_vector(4 downto 0));
end entity muxreg;
architecture muxreg_beh of muxreg is
begin
process(SRC, DST, s_wr) is
begin
  case s_wr is
    when '0' => RSRD <= SRC;
    when '1' => RSRD <= DST;
    when others => null;
  end case;
end process;
end architecture muxreg_beh;
-- Block Ram
library IEEE;
use IEEE.std_logic_1164.all;
entity blockram is
port(addr1, addr2: in std_logic_vector(4 downto 0); -- for RS1 and RS2, need to have mux for choosing
-- either RS1 or RD
din1, din2: in std_logic_vector(15 downto 0);
dout1, dout2: out std_logic_vector(15 downto 0);
clk: in std_logic;
wr1, enr1, enr2: in std_logic); -- wr1 - write is always in port 1; enr1, enr2 - for reg reads from 2 ports
end entity blockram;
architecture ram_behave of blockram is
component RAMB4_S16_S16 is
port(ADDRA, ADDRB: in std_logic_vector(7 downto 0);
CLKA, CLKB: in std_logic;
DIA, DIB: in std_logic_vector(15 downto 0);
DOA, DOB: out std_logic_vector(15 downto 0);
ENA, ENB, RSTA, RSTB, WEA, WEB: in std_logic);
end component RAMB4_S16_S16;
signal gnd: std_logic;
signal addr_ablk, addr_bblk: std_logic_vector(7 downto 0);
begin
gnd <= '0';
addr_ablk <= "000" & addr1;
addr_bblk <= "000" & addr2;
gpregram0: RAMB4_S16_S16 port map(ADDRA=>addr_ablk, ADDRB=>addr_bblk, CLKA=>clk,
CLKB=>clk, DIA=>din1, DIB=>din2, DOA=>dout1, DOB=>dout2, ENA=>enr1, ENB=>enr2,
RSTA=>gnd, RSTB=>gnd, WEA=>wr1, WEB=>gnd);
end architecture ram_behave; 
--Sign Extend Unit 
library IEEE; 
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity signext is
generic(N: positive := 64;
imm: positive := 16);
port(immval: in std_logic_vector(imm-1 downto 0);
sign: in std_logic;
extdval: out std_logic_vector(N-1 downto 0));
end entity signext;
architecture signextd_beh of signext is
signal stoint, intval1: integer;
begin
process(immval, sign, stoint, intval1)
begin
stoint <= conv_integer(immval);
case sign is
when '0' => intval1 <= stoint;
when '1' => intval1 <= -stoint;
when others => null;
end case;
extdval <= conv_std_logic_vector(intval1, N);
end process;
end architecture signextd_beh;
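-- Testbench sketch for signext (illustrative only; the name signext_tb, the
-- stimulus values, and the 10 ns settle time are assumptions and not part of
-- the thesis design). As coded above, immval is read as an unsigned magnitude
-- and the separate sign input selects negation, so the value 5 with sign = '1'
-- should extend to the 64-bit two's-complement encoding of -5.
library IEEE;
use IEEE.std_logic_1164.all;
entity signext_tb is
end entity signext_tb;
architecture tb of signext_tb is
component signext is
generic(N: positive := 64;
imm: positive := 16);
port(immval: in std_logic_vector(imm-1 downto 0);
sign: in std_logic;
extdval: out std_logic_vector(N-1 downto 0));
end component signext;
signal immval: std_logic_vector(15 downto 0);
signal sign: std_logic;
signal extdval: std_logic_vector(63 downto 0);
begin
dut: signext port map(immval=>immval, sign=>sign, extdval=>extdval);
stim: process is
begin
immval <= X"0005"; sign <= '0'; -- positive immediate
wait for 10 ns; -- let the combinational process settle
assert extdval = X"0000000000000005" report "positive extension mismatch" severity error;
sign <= '1'; -- same magnitude, negated
wait for 10 ns;
assert extdval = X"FFFFFFFFFFFFFFFB" report "negative extension mismatch" severity error;
wait; -- end of stimulus
end process stim;
end architecture tb;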
-- micro instructions controller for ESPR 
library IEEE; 
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity cntunit0 is
port(opcode: in std_logic_vector(5 downto 0);
loc: in std_logic_vector(2 downto 0);
ffpin: in std_logic_vector(7 downto 0);
ocr_val, aer_val: out std_logic_vector(7 downto 0); -- for 8 bit output code register
ctrlsigs: out std_logic_vector(24 downto 0));
end entity cntunit0;
architecture cntbeh of cntunit0 is
begin
process(opcode, loc, ffpin) is
begin
case opcode is
when "000000" =>
ctrlsigs <= (others => '0'); -- NOP for opcode '000000'
ocr_val <= (others => '0'); -- No Status
aer_val <= (others => '0');
when "000001" =>
-- IN goes to inp_pkt_ctrlr
ctrlsigs <= "0100000000000000000000001";
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "000010" =>
-- OUT
ctrlsigs <= "1000000000000000000000000";
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "000011" =>
-- FWD goes to outp_pkt_ctrlr
ctrlsigs <= "0000000000000000000001100";
ocr_val <= "00000001";
aer_val <= ffpin;
when "000100" =>
-- ABORT1 goes to outp_pkt_ctrlr -- ABORT1 sets LOC bits to '0'
ctrlsigs <= "0000000000000000000001100";
ocr_val <= "00000010";
aer_val(3) <= '0';
aer_val(7 downto 4) <= ffpin(7 downto 4); -- AER = Unused(7-5), R(4), E(3), LOC(2-0)
aer_val(2 downto 0) <= ffpin(2 downto 0);
when "000101" =>
-- DROP
ctrlsigs <= "0000000000000000000001000";
ocr_val <= "00000011";
aer_val <= (others => '0');
when "000110" =>
-- CLR - for GPRs - has reg write ctrl signal on
ctrlsigs <= "0010000000000000000110000"; -- S6 - 1 for ALU out in WB, 0 for ESS out in WB
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "000111" =>
-- MOVE for GPRs - has reg write ctrl signal on
ctrlsigs <= "0010000000000000000110000";
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "001000" =>
-- MOVI for GPRs
ctrlsigs <= "0000000000000000000110000";
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "001001" =>
-- ADD for GPRs last bit is for load status reg
ctrlsigs <= "0010000000000000100110000";
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "001010" =>
-- SUB for GPRs last bit is for load status reg
ctrlsigs <= "0010000000000000101110000";
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "001011" =>
-- INCR for GPRs last bit is for load status reg
ctrlsigs <= "0010000000000000110110000";
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "001100" =>
-- DECR for GPRs last bit is for load status reg
ctrlsigs <= "0010000000000000111110000";
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "001101" =>
-- OR for GPRs last bit is for load status reg
ctrlsigs <= "0010000000000001010110000";
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "001110" =>
-- AND for GPRs last bit is for load status reg
ctrlsigs <= "0010000000000001011110000";
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "001111" =>
-- EXOR for GPRs last bit is for load status reg
ctrlsigs <= "0010000000000001100110000";
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "010000" =>
-- ONES COMP for GPRs last bit is for load status reg
ctrlsigs <= "0010000000000000010110000";
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "010001" =>
-- SHL for GPRs
ctrlsigs <= "0010000000000010000110000";
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "010010" =>
-- SHR for GPRs
ctrlsigs <= "0010000000000100000110000";
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "010011" =>
-- ROL for GPRs
ctrlsigs <= "0010000000000110000110000";
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "010100" =>
-- ROR for GPRs
ctrlsigs <= "0010000000001000000110000";
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "010101" =>
-- LFPR
ctrlsigs <= "0000000010000000000110000";
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "010110" =>
-- STPR
ctrlsigs <= "0010000001000000000000000";
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "010111" =>
-- BRNE
ctrlsigs <= "0010000100000000000000000";
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "011000" =>
-- BREQ
ctrlsigs <= "0010001000000000000000000";
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "011001" =>
-- BRGE
ctrlsigs <= "0010001100000000000000000";
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "011010" =>
-- BNEZ
ctrlsigs <= "0010010000000000000000000";
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "011011" =>
-- BEQZ
ctrlsigs <= "0010010100000000000000000";
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "011100" =>
-- JMP
ctrlsigs <= "0001000000000000000000000";
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "011101" =>
-- RET
ctrlsigs <= "0000100000000000000000000";
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "011110" =>
-- GET
ctrlsigs <= "0010000000010000000000010";
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "011111" =>
-- PUT
ctrlsigs <= "0010000000100000000000010";
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "100000" =>
-- BGF -- have this as branch type instr - no connection with ESS ctrl
ctrlsigs <= "0000011000000000000000000";
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "100001" =>
-- BPF -- have this as branch type instr - no connection with ESS ctrl
ctrlsigs <= "0000011100000000000000000";
ocr_val <= (others => '0');
aer_val <= (others => '0');
-- NEWLY ADDED as on 5-6-02
when "100010" =>
-- ABORT2 goes to outp_pkt_ctrlr -- ABORT2 sets LOC bits to '0' and sets E bit to '1'
ctrlsigs <= "0000000000000000000001100";
ocr_val <= "00000100";
-- AER = Unused(7-5), R(4), E(3), LOC(2-0)
aer_val(3) <= '1';
aer_val(7 downto 4) <= ffpin(7 downto 4); -- AER = Unused(7-5), R(4), E(3), LOC(2-0)
aer_val(2 downto 0) <= ffpin(2 downto 0);
when "100011" =>
-- BLT
ctrlsigs <= "0010000000000000000000000";
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "100100" =>
-- SETLOC
ctrlsigs <= "0000000000000000000000100";
ocr_val <= (others => '0');
aer_val(2 downto 0) <= loc;
aer_val(7 downto 3) <= ffpin(7 downto 3);
when "100101" =>
ctrlsigs <= (others => '0');
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "100110" =>
ctrlsigs <= (others => '0');
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "100111" =>
ctrlsigs <= (others => '0');
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "101000" =>
ctrlsigs <= (others => '0');
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "101001" =>
ctrlsigs <= (others => '0');
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "101010" =>
ctrlsigs <= (others => '0');
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "101011" =>
ctrlsigs <= (others => '0');
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "101100" =>
ctrlsigs <= (others => '0');
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "101101" =>
ctrlsigs <= (others => '0');
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "101110" =>
ctrlsigs <= (others => '0');
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "101111" =>
ctrlsigs <= (others => '0');
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "110000" =>
ctrlsigs <= (others => '0');
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "110001" =>
ctrlsigs <= (others => '0');
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "110010" =>
ctrlsigs <= (others => '0');
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "110011" =>
ctrlsigs <= (others => '0');
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "110100" =>
ctrlsigs <= (others => '0');
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "110101" =>
ctrlsigs <= (others => '0');
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "110110" =>
ctrlsigs <= (others => '0');
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "110111" =>
ctrlsigs <= (others => '0');
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "111000" =>
ctrlsigs <= (others => '0');
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "111001" =>
ctrlsigs <= (others => '0');
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "111010" =>
ctrlsigs <= (others => '0');
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "111011" =>
ctrlsigs <= (others => '0');
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "111100" =>
ctrlsigs <= (others => '0');
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "111101" =>
ctrlsigs <= (others => '0');
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "111110" =>
ctrlsigs <= (others => '0');
ocr_val <= (others => '0');
aer_val <= (others => '0');
when "111111" =>
ctrlsigs <= (others => '0');
ocr_val <= (others => '0');
aer_val <= (others => '0');
when others =>
ctrlsigs <= (others => '0');
ocr_val <= (others => '0');
aer_val <= (others => '0');
end case; 
end process; 
end architecture cntbeh; 
-- extra circuit for getting written values 
library IEEE; 
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity idextra is
generic(N: positive := 64);
port(regw, trwr, vrwr: in std_logic;
wrregin, wrtagin, wrvalin: in std_logic_vector(N-1 downto 0);
wrdataout: out std_logic_vector(N-1 downto 0));
end entity idextra;
architecture idextra_beh of idextra is
signal wrtv: std_logic_vector(2 downto 0);
begin
wrtv <= regw&trwr&vrwr; 
process(wrtv, wrregin, wrtagin, wrvalin) is 
begin 
case wrtv is 
when "100" => wrdataout <= wrregin;
when "010" => wrdataout <= wrtagin;
when "001" => wrdataout <= wrvalin;
when others => wrdataout <= (others => '0');
end case; 
end process; 
end architecture idextra_beh; 
-- MUX for choosing the INST opcode for controller 
library IEEE; 
use IEEE.std_logic_1164.all;
entity mux_ct is
port (n_op, lm_op: in STD_LOGIC_VECTOR (5 downto 0);
lm: in STD_LOGIC;
optoct: out STD_LOGIC_VECTOR (5 downto 0));
end entity mux_ct;
architecture mux_ct_arch of mux_ct is
begin
process (n_op, lm_op, lm) is
begin
case lm is
when '0' => optoct <= n_op;
when '1' => optoct <= lm_op;
when others => null;
end case;
end process;
end architecture mux_ct_arch;
-- ID/EX stage Register
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity ess_idexreg is
generic(N: positive := 64;
Addr: positive := 16);
port(clk, ID_Flush: in std_logic;
ctrlin: in std_logic_vector(24 downto 0);
WB_in: in std_logic_vector(3 downto 0);
EX_in: in std_logic_vector(12 downto 0);
PKT_in: in std_logic_vector(6 downto 0);
GPR_read1_in, GPR_read2_in, sign_ext_in: in std_logic_vector(N-1 downto 0);
TR_read_in, VR_read_in: in std_logic_vector(N-1 downto 0);
Br_Addr_in, PKT_Offset_in: in std_logic_vector(Addr-1 downto 0);
shamt_in: in std_logic_vector(5 downto 0);
lmor_in, jin_id, rin_id: in std_logic;
ocr_in_id, aer_in_id: in std_logic_vector(7 downto 0);
RS1_in, RS2_in, RD_in, TR_in, VR_in: in std_logic_vector(4 downto 0);
opcodein: in std_logic_vector(5 downto 0);
opcodeexout: out std_logic_vector(5 downto 0);
ctrlout: out std_logic_vector(24 downto 0);
WB_out: out std_logic_vector(3 downto 0);
EX_out: out std_logic_vector(12 downto 0);
PKT_out: out std_logic_vector(6 downto 0);
GPR_read1_out, GPR_read2_out, sign_ext_out: out std_logic_vector(N-1 downto 0);
TR_read_out_ID, VR_read_out_ID: out std_logic_vector(N-1 downto 0);
Br_Addr_out, PKT_Offset_out: out std_logic_vector(Addr-1 downto 0);
shamt_out: out std_logic_vector(5 downto 0);
lmor_out, TRD_out, VRD_out, jout_id, rout_id: out std_logic;
ocr_out_id, aer_out_id: out std_logic_vector(7 downto 0);
RS1_out, RS2_out, RD_out, TR_out, VR_out: out std_logic_vector(4 downto 0));
end entity ess_idexreg;
architecture ess_idexreg_beh of ess_idexreg is
begin
process(clk, ID_Flush, ctrlin, WB_in, EX_in, PKT_in, GPR_read1_in, GPR_read2_in, sign_ext_in,
TR_read_in, VR_read_in, jin_id, rin_id, Br_Addr_in, PKT_Offset_in, lmor_in, shamt_in, ocr_in_id,
aer_in_id, RS1_in, RS2_in, RD_in) is
begin
if(falling_edge(clk)) then
case ID_Flush is
when '0' =>
WB_out <= WB_in;
EX_out <= EX_in;
PKT_out <= PKT_in;
GPR_read1_out <= GPR_read1_in;
GPR_read2_out <= GPR_read2_in;
TR_read_out_ID <= TR_read_in;
VR_read_out_ID <= VR_read_in;
sign_ext_out <= sign_ext_in;
Br_Addr_out <= Br_Addr_in;
PKT_Offset_out <= PKT_Offset_in;
shamt_out <= shamt_in;
lmor_out <= lmor_in;
TRD_out <= WB_in(3);
VRD_out <= WB_in(2);
ocr_out_id <= ocr_in_id;
aer_out_id <= aer_in_id;
RS1_out <= RS1_in;
RS2_out <= RS2_in;
RD_out <= RD_in;
TR_out <= TR_in;
VR_out <= VR_in;
opcodeexout <= opcodein;
jout_id <= jin_id;
rout_id <= rin_id;
ctrlout <= ctrlin;
shamt: in std_logic_vector(5 downto 0); 
regrd, trwx, trww, vrwx, vrww, rwx, rww: in std_logic; --new 
alu_0: out std_logic;
ctrloutEX: out std_logic_vector(24 downto 0);
opoutEX, mo: out std_logic_vector(5 downto 0);
aluout, GPR1out, GPR2out, tagsigout: out std_logic_vector(63 downto 0);
RS1_out, RS2_out, RD_out, TR_out, VR_out: out std_logic_vector(4 downto 0);
WBct_out: out std_logic_vector(3 downto 0);
braddrout: out std_logic_vector(15 downto 0);
gf, pf, ess_full, le, AK, PRr, ldor, EPo, cok, lz: out std_logic;
outvalue: out std_logic_vector(63 downto 0);
oo: out std_logic_vector(7 downto 0);
stag: out std_logic_vector(2 downto 0);
oram, fl: out std_logic_vector(31 downto 0);
po: out std_logic_vector(31 downto 0));
end entity ex3top; 
architecture ex3top_beh of ex3top is
--components
component ex3stage is
port(clk, clock, clk_pkt, clr, IDV, EOP_in, OPRAMready: in std_logic;
inp_fm_ram: in std_logic_vector(31 downto 0);
flag, ocrID: in std_logic_vector(7 downto 0);
PKToffid: in std_logic_vector(6 downto 0); -- for LFPR and STPR
RS1rgid, RS2rgid, TRrgid, VRrgid: in std_logic_vector(4 downto 0);
FSTRD, FSTTRD, FSTVRD, VSTRD, VSTTRD, VSTVRD: in std_logic_vector(4 downto 0); --new
op_in, prop_in: in std_logic_vector(5 downto 0);
GPR1id, GPR2id, TRidv, VRidv, extid, WBdatain, aofmex: in std_logic_vector(63 downto 0);
EXctid: in std_logic_vector(9 downto 0); -- 12 downto 10 (branch) get in next stage
PKTctid: in std_logic_vector(6 downto 0);
shamt: in std_logic_vector(5 downto 0);
regrd, trwx, trww, vrwx, vrww, rwx, rww, putin, lmor: in std_logic; --new
alu_0, ACK_in, PRready, ldopram, EOP_out, crcchkok, locz: out std_logic;
ashout, a1out, a2out, tagmuxout, pkttoregs, outvalue: out std_logic_vector(63 downto 0);
oo: out std_logic_vector(7 downto 0);
stag: out std_logic_vector(2 downto 0);
gf, pf, ess_full, le: out std_logic;
mopregout: out std_logic_vector(5 downto 0);
out_to_ram, firstoutp: out std_logic_vector(31 downto 0);
pkt_out: out std_logic_vector(31 downto 0));
end component ex3stage;
component ex3_ex4_reg is
port(clk, EX_Flush_in: in std_logic;
braddrin: in std_logic_vector(15 downto 0);
ctrlinEX: in std_logic_vector(24 downto 0);
opinEX: in std_logic_vector(5 downto 0);
WB_in_fm_ex: in std_logic_vector(3 downto 0);
RS1_in_fm_ex, RS2_in_fm_ex, RD_in_fm_ex, TR_in_fm_ex, VR_in_fm_ex: in std_logic_vector(4
downto 0);
aluout_fm_ex, pktout_fm_ex, GPR1in, GPR2in: in std_logic_vector(63 downto 0);
braddrout: out std_logic_vector(15 downto 0);
ctrloutEX: out std_logic_vector(24 downto 0);
opoutEX: out std_logic_vector(5 downto 0);
aluout_to_wb, pktout_to_wb, GPR1out, GPR2out: out std_logic_vector(63 downto 0);
RS1_out_to_regs, RS2_out_to_regs, RD_out_to_regs, TR_out_to_regs, VR_out_to_regs: out
std_logic_vector(4 downto 0);
WB_out_fm_wb: out std_logic_vector(3 downto 0));
end component ex3_ex4_reg;
--signals
signal ashoutsig, a1sig, a2sig, pktinsig, pktoutsig, taginsig: std_logic_vector(63 downto 0);
begin
tagsigout <= taginsig;
ex3comp: ex3stage port
map(clk=>clk, clock=>clock, clk_pkt=>clk_pkt, clr=>clr, IDV=>IDV, EOP_in=>EPi, OPRAMready=>ORr,
inp_fm_ram=>iram, flag=>flag, ocrID=>ocrID, PKToffid=>PKToffid, RS1rgid=>RS1rgid, RS2rgid=>RS2rgid,
TRrgid=>TRrgid, VRrgid=>VRrgid, FSTRD=>FSTRD, FSTTRD=>FSTTRD, FSTVRD=>FSTVRD,
VSTRD=>VSTRD, VSTTRD=>VSTTRD, VSTVRD=>VSTVRD, op_in=>op_in, prop_in=>prop_in,
GPR1id=>GPR1id, GPR2id=>GPR2id, TRidv=>TRidv, VRidv=>VRidv, extid=>extid, WBdatain=>WBdatain,
aofmex=>aofmex, EXctid=>EXctid, PKTctid=>PKTctid, shamt=>shamt, regrd=>regrd, trwx=>trwx,
trww=>trww, vrwx=>vrwx, vrww=>vrww, rwx=>rwx, rww=>rww, putin=>putin, lmor=>lm, alu_0=>alu_0,
outvalue=>outvalue, ACK_in=>AK, PRready=>PRr, ldopram=>ldor, EOP_out=>EPo, crcchkok=>cok,
locz=>lz, ashout=>ashoutsig, a1out=>a1sig, a2out=>a2sig, tagmuxout=>taginsig, pkttoregs=>pktinsig,
oo=>oo, stag=>stag, gf=>gf, pf=>pf, ess_full=>ess_full, le=>le, mopregout=>mo, out_to_ram=>oram,
firstoutp=>fl, pkt_out=>po);
ex3regcomp: ex3_ex4_reg port
map(clk=>clk, EX_Flush_in=>EX_Flush_in, braddrin=>braddrin, ctrlinEX=>ctrlinEX, opinEX=>op_in,
WB_in_fm_ex=>WBinfmid, RS1_in_fm_ex=>RS1rgid, RS2_in_fm_ex=>RS2rgid, RD_in_fm_ex=>RDrgid,
TR_in_fm_ex=>TRrgid, VR_in_fm_ex=>VRrgid, aluout_fm_ex=>ashoutsig, pktout_fm_ex=>pktinsig,
GPR1in=>GPR1id, GPR2in=>GPR2id, braddrout=>braddrout, ctrloutEX=>ctrloutEX, opoutEX=>opoutEX,
aluout_to_wb=>aluout, pktout_to_wb=>pktoutsig, GPR1out=>GPR1out, GPR2out=>GPR2out,
RS1_out_to_regs=>RS1_out, RS2_out_to_regs=>RS2_out, RD_out_to_regs=>RD_out,
TR_out_to_regs=>TR_out, VR_out_to_regs=>VR_out, WB_out_fm_wb=>WBct_out);
end architecture ex3top_beh;
--Individual components 
-- EX 3rd STAGE 
library IEEE; 
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity ex3stage is
port(clk, clock, clk_pkt, clr, IDV, EOP_in, OPRAMready: in std_logic;
inp_fm_ram: in std_logic_vector(31 downto 0);
flag, ocrID: in std_logic_vector(7 downto 0);
PKToffid: in std_logic_vector(6 downto 0); -- for LFPR and STPR
RS1rgid, RS2rgid, TRrgid, VRrgid: in std_logic_vector(4 downto 0);
FSTRD, FSTTRD, FSTVRD, VSTRD, VSTTRD, VSTVRD: in std_logic_vector(4 downto 0); --new
op_in, prop_in: in std_logic_vector(5 downto 0);
GPR1id, GPR2id, TRidv, VRidv, extid, WBdatain, aofmex: in std_logic_vector(63 downto 0);
EXctid: in std_logic_vector(9 downto 0); -- 12 downto 10 (branch) get in next stage
PKTctid: in std_logic_vector(6 downto 0);
shamt: in std_logic_vector(5 downto 0);
regrd, trwx, trww, vrwx, vrww, rwx, rww, putin, lmor: in std_logic; --new
alu_0, ACK_in, PRready, ldopram, EOP_out, crcchkok, locz: out std_logic;
ashout, a1out, a2out, tagmuxout, pkttoregs, outvalue: out std_logic_vector(63 downto 0);
oo: out std_logic_vector(7 downto 0);
stag: out std_logic_vector(2 downto 0);
gf, pf, ess_full, le: out std_logic;
mopregout: out std_logic_vector(5 downto 0);
out_to_ram, firstoutp: out std_logic_vector(31 downto 0);
pkt_out: out std_logic_vector(31 downto 0));
end entity ex3stage;
architecture ex3_beh of ex3stage is
--Components
--ALU
component alu_chk0 is
generic (N: integer := 64);
port (a, b: in std_logic_vector(N-1 downto 0);
S3, S4, S5, Cin: in std_logic;
result: out std_logic_vector(N-1 downto 0);
o: out std_logic);
end component alu_chk0;
--Shifter
component shift is
generic (N: positive := 64;
M: positive := 6);
port(input: in std_logic_vector (N-1 downto 0);
S0, S1, S2: in std_logic;
shamt: in std_logic_vector (M-1 downto 0);
output: out std_logic_vector (N-1 downto 0));
end component shift;
--MUX before ALUSH
component muxalush is
port(GPR_in, TR_in, VR_in, ALU_Sh_out, ext_in, FST_out, PR_in: in std_logic_vector(63 downto 0);
S8: in std_logic_vector(2 downto 0);
alumuxout: out std_logic_vector(63 downto 0));
end component muxalush;
--MUX after ALUSH
component muxout is
port(aluout_in, shout_in: in std_logic_vector(63 downto 0);
Sout: in std_logic;
alu_sh_out: out std_logic_vector(63 downto 0));
end component muxout;
--FWD
component fwd_new is
port(curop_in, prevop_in: in std_logic_vector(5 downto 0);
regrd, trwx, trww, vrwx, vrww, rwx, rww: in std_logic;
RS1_in, RS2_in, EX_WB_RD_in, EX_WB_TRD_in, EX_WB_VRD_in, RDoswbtryout,
VRDoswbtryout, TRDoswbtryout, TR_in, VR_in: in std_logic_vector(4 downto 0);
pktmuxtopk: out std_logic_vector(2 downto 0);
essmux_tag: out std_logic_vector(1 downto 0);
S8: out std_logic_vector(2 downto 0);
S9: out std_logic_vector(2 downto 0);
SSh: out std_logic_vector(2 downto 0);
Salush_out: out std_logic);
end component fwd_new;
--MUX before ESS/TAG
component muxtag is
port(TR_in, ALU_Sh_out, FST_out, PR_in: in std_logic_vector(63 downto 0);
Stag: in std_logic_vector(2 downto 0); 
tagmuxout: out std_logic_vector(63 downto 0));
end component muxtag;
--MUX before PKT
component muxpkt is
port(GPR_in, TR_in, VR_in, ALU_Sh_out, FST_out, PR_in: in std_logic_vector(63 downto 0);
Spkt: in std_logic_vector(2 downto 0);
pktmuxout: out std_logic_vector(63 downto 0));
end component muxpkt;
--PKT PROC UNIT
component pktproc is
generic(M: positive := 32;
N: positive := 64);
port(clk, ininst_p, IDV, EOP_in_p, outinst_p, OPRAMready, lfpr_p, stpr_p: in std_logic;
inp_fm_ram: in std_logic_vector(M-1 downto 0);
inp_fm_mux: in std_logic_vector(N-1 downto 0);
flaginp: in std_logic_vector(7 downto 0);
crcchkok: out std_logic;
lfstoff: in std_logic_vector(6 downto 0);
ldopram, EOP_out, PRready, ACK_in, locz: out std_logic;
foutp: out std_logic_vector(M-1 downto 0);
out_to_regs: out std_logic_vector(N-1 downto 0);
out_to_ram, pktout: out std_logic_vector(M-1 downto 0));
end component pktproc;
--aereg
component aereg is
port(clk, ldaer: in std_logic;
flagval_in: in std_logic_vector(7 downto 0);
aerout: out std_logic_vector(7 downto 0));
end component aereg;
--ocreg
component ocreg is
port(clk, ldocr: in std_logic;
val_in: in std_logic_vector(7 downto 0);
ocrout: out std_logic_vector(7 downto 0));
end component ocreg;
--moreg
component moreg is
port(clk, ldmor: in std_logic;
mop_fmpkt_in: in std_logic_vector(5 downto 0);
mopout: out std_logic_vector(5 downto 0));
end component moreg;
--ESS
--In order not to mess up with the existing one, I have the whole of ESS here
component esstop0 is
port(tag_in, value_in: in std_logic_vector(63 downto 0);
clk, clock, ess_we, ess_re, putin: in std_logic;
gf, pf, ess_full, le: out std_logic;
outvalue: out std_logic_vector(63 downto 0));
end component esstop0;
-- Signals
signal s0_sh, s1_sh, s2_sh, s3_alu, s4_alu, s5_alu, cin_alu: std_logic;
signal lfpr_pctrl, stpr_pctrl, ldpkreg_pctrl, ldocr_pctrl, ldaer_pctrl, in_pctrl, out_pctrl: std_logic;
signal aeroutsig, moroutsig, morout, ocrout: std_logic_vector(7 downto 0);
signal gf_fm_ess, pf_fm_ess, ccroutsigg, ccroutsigp, Osig: std_logic;
signal aluin1sig, aluin2sig, aluoutsig, shftinsig, shftoutsig, essoutvalue, tag_to_ess, val_to_ess, bdusig1,
bdusig2, PRmuxsigin, pktmuxoutsig, tagmuxoutsig: std_logic_vector(63 downto 0);
signal S8sig, S9sig, SShsig: std_logic_vector(2 downto 0);
signal stag_sig, spkt_sig: std_logic_vector(2 downto 0);
signal Ssig, we_ess, re_ess: std_logic;
signal stag_fmfwd: std_logic_vector(1 downto 0);
begin
alu_0 <= Osig;
a1out <= aluin1sig;
a2out <= aluin2sig;
pkttoregs <= PRmuxsigin;
stag <= stag_sig;
tagmuxout <= tagmuxoutsig;
lfpr_pctrl <= PKTctid(4); -- ctrlsigs(16)
stpr_pctrl <= PKTctid(3); -- ctrlsigs(15)
we_ess <= EXctid(9); -- ctrlsigs(14)
re_ess <= EXctid(8); -- ctrlsigs(13)
ldocr_pctrl <= PKTctid(2); -- ctrlsigs(3)
ldaer_pctrl <= PKTctid(1); -- ctrlsigs(2)
ldpkreg_pctrl <= PKTctid(0); -- ctrlsigs(0)
in_pctrl <= PKTctid(5); -- ctrlsigs(23)
out_pctrl <= PKTctid(6); -- ctrlsigs(24)
s0_sh <= EXctid(7); -- ctrlsigs(12)
s1_sh <= EXctid(6); -- ctrlsigs(11)
s2_sh <= EXctid(5); -- ctrlsigs(10)
s3_alu <= EXctid(4); -- ctrlsigs(9)
s4_alu <= EXctid(3); -- ctrlsigs(8)
s5_alu <= EXctid(2); -- ctrlsigs(7)
cin_alu <= EXctid(1); -- ctrlsigs(6)
oo <= ocrout;
stag_sig <= clr & stag_fmfwd;
-- Mapping
alucomp: alu_chk0 port map(a=>aluin1sig, b=>aluin2sig, S3=>s3_alu, S4=>s4_alu, S5=>s5_alu,
Cin=>cin_alu, result=>aluoutsig, o=>Osig);
alumux1comp: muxalush port map(GPR_in=>GPR1id, TR_in=>TRidv, VR_in=>VRidv,
ALU_Sh_out=>aofmex, ext_in=>extid, FST_out=>WBdatain, PR_in=>PRmuxsigin, S8=>S8sig,
alumuxout=>aluin1sig);
alumux2comp: muxalush port map(GPR_in=>GPR2id, TR_in=>TRidv, VR_in=>VRidv,
ALU_Sh_out=>aofmex, ext_in=>extid, FST_out=>WBdatain, PR_in=>PRmuxsigin, S8=>S9sig,
alumuxout=>aluin2sig);
shiftcomp: shift port map(input=>shftinsig, S0=>s0_sh, S1=>s1_sh, S2=>s2_sh, shamt=>shamt,
output=>shftoutsig);
shmuxcomp: muxalush port map(GPR_in=>GPR1id, TR_in=>TRidv, VR_in=>VRidv,
ALU_Sh_out=>aofmex, ext_in=>extid, FST_out=>WBdatain, PR_in=>PRmuxsigin, S8=>SShsig,
alumuxout=>shftinsig);
alushmuxoutcomp: muxout port map(aluout_in=>aluoutsig, shout_in=>shftoutsig, Sout=>Ssig,
alu_sh_out=>ashout);
fwdcomp: fwd_new port map(curop_in=>op_in, prevop_in=>prop_in, regrd=>regrd, trwx=>trwx,
trww=>trww, vrwx=>vrwx, vrww=>vrww, rwx=>rwx, rww=>rww, RS1_in=>RS1rgid,
RS2_in=>RS2rgid, EX_WB_RD_in=>FSTRD, EX_WB_TRD_in=>FSTTRD,
EX_WB_VRD_in=>FSTVRD, RDoswbtryout=>VSTRD, VRDoswbtryout=>VSTVRD,
TRDoswbtryout=>VSTTRD, TR_in=>TRrgid, VR_in=>VRrgid, pktmuxtopk=>spkt_sig,
essmux_tag=>stag_fmfwd, S8=>S8sig, S9=>S9sig, SSh=>SShsig, Salush_out=>Ssig);
tagmuxcomp: muxtag port map(TR_in=>TRidv, ALU_Sh_out=>aofmex, FST_out=>WBdatain,
PR_in=>PRmuxsigin, Stag=>stag_sig, tagmuxout=>tagmuxoutsig);
pktmuxcomp: muxpkt port map(GPR_in=>GPR1id, TR_in=>TRidv, VR_in=>VRidv,
ALU_Sh_out=>aofmex, FST_out=>WBdatain, PR_in=>PRmuxsigin, Spkt=>spkt_sig,
pktmuxout=>pktmuxoutsig);
pktcomp: pktproc port map(clk=>clk_pkt, ininst_p=>in_pctrl, IDV=>IDV, EOP_in_p=>EOP_in,
outinst_p=>out_pctrl, OPRAMready=>OPRAMready, lfpr_p=>lfpr_pctrl, stpr_p=>stpr_pctrl,
inp_fm_ram=>inp_fm_ram, inp_fm_mux=>pktmuxoutsig, flaginp=>aeroutsig, crcchkok=>crcchkok,
lfstoff=>PKToffid, ldopram=>ldopram, EOP_out=>EOP_out, PRready=>PRready, ACK_in=>ACK_in,
locz=>locz, foutp=>firstoutp, out_to_regs=>PRmuxsigin, out_to_ram=>out_to_ram, pktout=>pkt_out);
aeregcomp: aereg port map(clk=>clk, ldaer=>ldaer_pctrl, flagval_in=>flag, aerout=>aeroutsig);
ocregcomp: ocreg port map(clk=>clk, ldocr=>ldocr_pctrl, val_in=>ocrID, ocrout=>ocrout);
esscomp: esstop0 port map(tag_in=>tagmuxoutsig, value_in=>VRidv, clk=>clk, clock=>clock,
ess_we=>we_ess, ess_re=>re_ess, putin=>putin, gf=>gf, pf=>pf, ess_full=>ess_full, le=>le,
outvalue=>outvalue);
morcomp: moreg port map(clk=>clk, ldmor=>lmor, mop_fmpkt_in=>PRmuxsigin(5 downto 0),
mopout=>mopregout);
end architecture ex3_beh;
--Individual Components 
-- Behavioral level description 
-- overflow table 
-- 1stnum  2ndnum  sign  o
--   +       +      -    1   -- addition
--   -       -      +    1   -- addition
--   +       -      -    1   -- subtraction
--   -       +      +    1   -- subtraction
library IEEE; 
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity alu_chk0 is
generic (N: integer := 64);
port (a, b: in std_logic_vector(N-1 downto 0);
S3, S4, S5, Cin: in std_logic;
result: out std_logic_vector(N-1 downto 0);
o: out std_logic);
end entity alu_chk0;
architecture behavioral of alu_chk0 is
signal sig: std_logic_vector(N-1 downto 0);
signal sel: std_logic_vector(3 downto 0);
begin
sel <= S3&S4&S5&Cin;
addsubprocess: process(a, b, sel, sig) is 
begin 
case sel is 
when "0000" => 
sig <= a; 
o <= '0';
when "0001" =>
sig <= b;
o <= '0';
when "0010" =>
sig <= not a;
o <= '0';
when "0011" =>
sig <= not b;
o <= '0';
when "0100" =>
sig <= a+b;
if(a(N-1) = '0' and b(N-1) = '0' and sig(N-1) = '1') then
o <= '1';
elsif(a(N-1) = '1' and b(N-1) = '1' and sig(N-1) = '0') then
o <= '1';
else
o <= '0';
end if;
when "0101" => 
sig <= a-b; 
if(a(N-1) = '0' and b(N-1) = '1' and sig(N-1) = '1') then
o <= '1';
elsif(a(N-1) = '1' and b(N-1) = '0' and sig(N-1) = '0') then
o <= '1';
else
o <= '0';
end if;
when "0110" => 
sig <= a+"0000000000000000000000000000000000000000000000000000000000000001";
if(a(N-1) = '0' and sig(N-1) = '1') then
o <= '1';
else
o <= '0';
end if;
when "1000" => 
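sig <= b+"0000000000000000000000000000000000000000000000000000000000000001";
-- overflow check missing in the source text; assumed by analogy with the a+1 case
if(b(N-1) = '0' and sig(N-1) = '1') then
o <= '1';
else
o <= '0';
end if;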
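when "0111" =>
sig <= a-"0000000000000000000000000000000000000000000000000000000000000001";
-- overflow check missing in the source text; assumed by analogy with the b-1 case
if(a(N-1) = '1' and sig(N-1) = '0') then
o <= '1';
else
o <= '0';
end if;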
when "1001" =>
sig <= b-"0000000000000000000000000000000000000000000000000000000000000001";
if(b(N-1) = '1' and sig(N-1) = '0') then
o <= '1';
else
o <= '0';
end if;
when "1010" =>
sig <= a or b;
o <= '0';
when "1011" =>
sig <= a and b;
o <= '0';
when "1100" =>
sig <= a xor b;
o <= '0';
when "1101" =>
sig <= a;
o <= '0';
when "1110" =>
sig <= a;
o <= '0';
when "1111" =>
sig <= a;
o <= '0';
when others => null; 
end case; 
result <= sig; 
end process addsubprocess; 
end architecture behavioral; 
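-- Testbench sketch for alu_chk0 (illustrative only; the name alu_chk0_tb, the
-- stimulus values, and the 10 ns settle time are assumptions and not part of
-- the thesis design). It exercises the first addition row of the overflow
-- table above: two positive operands whose sum has the sign bit set should
-- raise o. Select code "0100" (S3 S4 S5 Cin) is the a+b case of the ALU.
library IEEE;
use IEEE.std_logic_1164.all;
entity alu_chk0_tb is
end entity alu_chk0_tb;
architecture tb of alu_chk0_tb is
component alu_chk0 is
generic (N: integer := 64);
port (a, b: in std_logic_vector(N-1 downto 0);
S3, S4, S5, Cin: in std_logic;
result: out std_logic_vector(N-1 downto 0);
o: out std_logic);
end component alu_chk0;
signal a, b, result: std_logic_vector(63 downto 0);
signal S3, S4, S5, Cin, o: std_logic;
begin
dut: alu_chk0 port map(a=>a, b=>b, S3=>S3, S4=>S4, S5=>S5, Cin=>Cin, result=>result, o=>o);
stim: process is
begin
S3 <= '0'; S4 <= '1'; S5 <= '0'; Cin <= '0'; -- "0100" selects a+b
a <= X"7FFFFFFFFFFFFFFF"; -- largest positive 64-bit value
b <= X"0000000000000001";
wait for 10 ns; -- let the combinational process settle
assert result = X"8000000000000000" report "unexpected sum" severity error;
assert o = '1' report "overflow flag not raised" severity error;
a <= X"0000000000000001"; -- 1 + 1 = 2, sign bit stays clear
wait for 10 ns;
assert result = X"0000000000000002" report "unexpected sum" severity error;
wait; -- end of stimulus
end process stim;
end architecture tb;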
--Shifter 
library IEEE; 
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity shift is
generic (N: positive := 64;
M: positive := 6); 
port(input: in std_logic_vector (N-1 downto 0);
S0, S1, S2: in std_logic;
shamt: in std_logic_vector (M-1 downto 0);
output: out std_logic_vector (N-1 downto 0));
end entity shift;
architecture shifter_beh of shift is
signal s: std_logic_vector (2 downto 0);
begin
s <= S0&S1&S2;
shftprocess: process(shamt, input, s) is
variable shft, inpt: integer;
variable shftout: std_logic_vector(N-1 downto 0);
variable inpu, outpu: unsigned(N-1 downto 0);
variable shfu: unsigned(M-1 downto 0);
variable in_var, temp_reg: std_logic_vector (N-1 downto 0);
begin
shft := conv_integer(shamt);
inpt := conv_integer(input); -- unsigned.all
inpu := conv_unsigned(inpt, N); -- arith.all
shfu := conv_unsigned(shft, M);
in_var := input;
temp_reg := input;
case s is
when "000" => shftout := input; -- pass thru
-- LEFT SHIFT
when "001" =>
outpu := shl(inpu, shfu);
shftout := conv_std_logic_vector(outpu, N);
-- RIGHT SHIFT
when "010" =>
outpu := shr(inpu, shfu);
shftout := conv_std_logic_vector(outpu, N);
-- ROTATE LEFT
when "011" =>
for i in shamt'low to shamt'high loop
if(shamt(i) = '1') then
for j in 0 to ((2**i)-1) loop
temp_reg(j) := in_var((N-(2**i))+j);
end loop;
for k in (2**i) to N-1 loop
temp_reg(k) := in_var(k-(2**i));
end loop;
in_var := temp_reg;
end if;
end loop;
shftout := temp_reg;
206 
--ROT A TE RIGHT 
when "l 00" => 
for i in shamt'low to shamt'high loop 
if (shamt(i) ='I ') then 
for j in N-1 downto N-{2**i) loop 
temp_regU) := in_var(j-{N-(2**i))); 
end loop; 
fork in ((N-{2**i))-l) downto 0 loop 
temp_reg(k) := in_var(k+{2**i)); 
end loop; 
in_ var := temp _reg; 
end if; 
end loop; 
shftout := temp_reg; 
when "IO l" => shftout := input; -- pass thru 
when "11 0" => shftout := input; -- pass thru 
when "111" => shftout := input; -- pass thru 
when others => shftout := input; -- pass thru 
end case; 
output <= shftout; 
end process; 
end architecture shifter_beh; 
--MUX used as mux before ALU/Shifter 
library IEEE; 
use IEEE.std_logic_1164.all;
entity muxalush is
port(GPR_in, TR_in, VR_in, ALU_Sh_out, ext_in, FST_out, PR_in: in std_logic_vector(63 downto 0);
S8: in std_logic_vector(2 downto 0);
alumuxout: out std_logic_vector(63 downto 0));
end entity muxalush;
architecture muxalush_beh of muxalush is
signal alumuxout1: std_logic_vector(63 downto 0);
begin
process(S8, GPR_in, TR_in, VR_in, ALU_Sh_out, ext_in, FST_out, PR_in, alumuxout1) is
begin
case S8 is
when "000" => alumuxout1 <= GPR_in;
when "001" => alumuxout1 <= TR_in;
when "010" => alumuxout1 <= VR_in;
when "011" => alumuxout1 <= ALU_Sh_out;
when "101" => alumuxout1 <= FST_out;
when "110" => alumuxout1 <= PR_in;
when "111" => alumuxout1 <= ext_in;
when others => alumuxout1 <= alumuxout1;
end case;
alumuxout <= alumuxout1;
end process;
end architecture muxalush_beh;
library IEEE;
use IEEE.std_logic_1164.all;
entity fwd_new is
port(curop_in, prevop_in: in std_logic_vector(5 downto 0);
regrd, trwx, trww, vrwx, vrww, rwx, rww: in std_logic;
RS1_in, RS2_in, EX_WB_RD_in, EX_WB_TRD_in, EX_WB_VRD_in, RDoswbtryout,
VRDoswbtryout, TRDoswbtryout, TR_in, VR_in: in std_logic_vector(4 downto 0);
pktmuxtopk: out std_logic_vector(2 downto 0);
essmux_tag: out std_logic_vector(1 downto 0);
S8: out std_logic_vector(2 downto 0);
S9: out std_logic_vector(2 downto 0);
SSh: out std_logic_vector(2 downto 0);
Salush_out: out std_logic);
end entity fwd_new;
architecture fwd_new_beh of fwd_new is
begin 
-- FOR ALU MUX1
s8p:process(regrd, trwx, trww, vrwx, vrww, rwx, rww, RS1_in, RS2_in, EX_WB_RD_in,
EX_WB_TRD_in, EX_WB_VRD_in, RDoswbtryout, TRDoswbtryout, VRDoswbtryout, curop_in,
prevop_in, TR_in, VR_in) is
begin
if(prevop_in = "010101") then -- LFPR (LFPR o/p in PKt proc unit will change, so giving from there itself
-- to ALU as passthru)
S8 <= "110"; -- PKTREG o/p as ALU input
elsif(curop_in = "001000") then -- MOVI
S8 <= "111"; -- sign ext val as ALU input
elsif((regrd = '1' and RS1_in /= "00000" and EX_WB_RD_in = RS1_in and rwx = '1') or (regrd = '1' and
TR_in /= "00000" and EX_WB_TRD_in = TR_in and trwx = '1') or (regrd = '1' and VR_in /= "00000" and
EX_WB_VRD_in = VR_in and vrwx = '1')) then
S8 <= "011"; -- ALU output as ALU input
elsif((regrd = '1' and RS1_in /= "00000" and RDoswbtryout = RS1_in and rww = '1') or (regrd = '1' and
TR_in /= "00000" and TRDoswbtryout = TR_in and trww = '1') or (regrd = '1' and VR_in /= "00000" and
VRDoswbtryout = VR_in and vrww = '1')) then
S8 <= "101"; -- 4th stage output as input
elsif(regrd = '1' and TR_in /= "00000" and RS1_in = "00000" and trwx = '0' and trww = '0') then
S8 <= "001"; -- TR as ALU input
elsif(regrd = '1' and VR_in /= "00000" and RS1_in = "00000" and vrwx = '0' and vrww = '0') then
S8 <= "010"; -- VR as ALU input
elsif(regrd = '1' and RS1_in /= "00000" and EX_WB_RD_in /= RS1_in and RDoswbtryout /= RS1_in)
then
S8 <= "000"; -- GPR as ALU input
else
S8 <= "000"; -- GPR as ALU input
end if;
end process s8p; 
-- FOR ALU MUX2 
s9p:process(regrd, trwx, trww, vrwx, vrww, rwx, rww, RS1_in, RS2_in, EX_WB_RD_in,
EX_WB_TRD_in, EX_WB_VRD_in, RDoswbtryout, TRDoswbtryout, VRDoswbtryout, TR_in, VR_in) is
begin
if((regrd = '1' and RS2_in /= "00000" and EX_WB_RD_in = RS2_in and rwx = '1') or (regrd = '1' and
TR_in /= "00000" and EX_WB_TRD_in = TR_in and trwx = '1') or (regrd = '1' and VR_in /= "00000" and
EX_WB_VRD_in = VR_in and vrwx = '1')) then
S9 <= "011"; -- ALU output as ALU input
elsif((regrd = '1' and RS2_in /= "00000" and RDoswbtryout = RS2_in and rww = '1') or (regrd = '1' and
TR_in /= "00000" and TRDoswbtryout = TR_in and trww = '1') or (regrd = '1' and VR_in /= "00000" and
VRDoswbtryout = VR_in and vrww = '1')) then
S9 <= "101"; -- 4th stage output as input
elsif(regrd = '1' and TR_in /= "00000" and RS2_in = "00000" and trwx = '0' and trww = '0') then
S9 <= "001"; -- TR as ALU input
elsif(regrd = '1' and VR_in /= "00000" and RS2_in = "00000" and vrwx = '0' and vrww = '0') then
S9 <= "010"; -- VR as ALU input
elsif(regrd = '1' and RS2_in /= "00000" and EX_WB_RD_in /= RS2_in and RDoswbtryout /= RS2_in)
then 
S9 <= "000"; -- GPR as ALU input 
else 
S9 <= "000"; -- GPR as ALU input 
end if; 
end process s9p; 
-- For Shifter MUX 
sshp:process(regrd, trwx, trww, vrwx, vrww, rwx, rww, RS1_in, RS2_in, EX_WB_RD_in,
EX_WB_TRD_in, EX_WB_VRD_in, RDoswbtryout, TRDoswbtryout, VRDoswbtryout, curop_in,
prevop_in, TR_in, VR_in) is
begin
if(prevop_in = "010101") then -- LFPR (LFPR o/p in PKt proc unit will change, so giving from there itself
-- to ALU as passthru)
SSh <= "110"; -- PKTREG o/p as ALU input
elsif(curop_in = "001000") then -- MOVI
SSh <= "111"; -- sign ext val as ALU input
elsif((regrd = '1' and RS1_in /= "00000" and EX_WB_RD_in = RS1_in and rwx = '1') or (regrd = '1' and
TR_in /= "00000" and EX_WB_TRD_in = TR_in and trwx = '1') or (regrd = '1' and VR_in /= "00000" and
EX_WB_VRD_in = VR_in and vrwx = '1')) then
SSh <= "011"; -- ALU output as ALU input
elsif((regrd = '1' and RS1_in /= "00000" and RDoswbtryout = RS1_in and rww = '1') or (regrd = '1' and
TR_in /= "00000" and TRDoswbtryout = TR_in and trww = '1') or (regrd = '1' and VR_in /= "00000" and
VRDoswbtryout = VR_in and vrww = '1')) then
SSh <= "101"; -- 4th stage output as input
elsif(regrd = '1' and TR_in /= "00000" and RS1_in = "00000" and trwx = '0' and trww = '0') then
SSh <= "001"; -- TR as ALU input
elsif(regrd = '1' and VR_in /= "00000" and RS1_in = "00000" and vrwx = '0' and vrww = '0') then
SSh <= "010"; -- VR as ALU input
elsif(regrd = '1' and RS1_in /= "00000" and EX_WB_RD_in /= RS1_in and RDoswbtryout /= RS1_in)
then
SSh <= "000"; -- GPR as ALU input
else
SSh <= "000"; -- GPR as ALU input
end if;
end process sshp;
aluoutp:process(curop_in) is
begin
if(curop_in = "010001" or curop_in = "010010" or curop_in = "010011" or curop_in = "010100") then -- all
-- the shift operations
Salush_out <= '1';
else
Salush_out <= '0';
end if; 
end process aluoutp; 
-- ESS MUX FOR TAGREG
emtp:process(regrd, trwx, trww, vrwx, vrww, rwx, rww, RS1_in, RS2_in, EX_WB_TRD_in,
TRDoswbtryout, curop_in, prevop_in, TR_in) is
begin
if(curop_in = "011110" or curop_in = "011111") then
if(prevop_in = "010101") then -- LFPR (LFPR o/p in PKt proc unit will change, so giving from there itself
-- to ALU as passthru)
essmux_tag <= "11"; -- PKTREG o/p as tag input
elsif(regrd = '1' and TR_in /= "00000" and EX_WB_TRD_in = TR_in and trwx = '1') then
essmux_tag <= "01"; -- ALU output as ESS input
elsif(regrd = '1' and TR_in /= "00000" and EX_WB_TRD_in /= TR_in and TRDoswbtryout = TR_in and
trww = '1') then
essmux_tag <= "10"; -- FST output as ESS input
elsif(regrd = '1' and TR_in /= "00000" and EX_WB_TRD_in /= TR_in and TRDoswbtryout /= TR_in)
then
essmux_tag <= "00"; -- TR as ESS input
else
essmux_tag <= "00"; -- normal TR as input
end if; 
end if; 
end process emtp; 
-- PKREG mux 
pmp:process(regrd, trwx, trww, vrwx, vrww, rwx, rww, RS1_in, RS2_in, EX_WB_VRD_in,
VRDoswbtryout, EX_WB_RD_in, RDoswbtryout, EX_WB_TRD_in, TRDoswbtryout, curop_in,
prevop_in, TR_in, VR_in) is
begin
if(curop_in = "010110") then -- STPR
if(prevop_in = "010101") then -- LFPR (LFPR o/p in PKt proc unit will change, so giving from there itself
-- to ALU as passthru)
pktmuxtopk <= "101"; -- PKTREG o/p as pkt input
elsif((regrd = '1' and RS1_in /= "00000" and EX_WB_RD_in = RS1_in and rwx = '1') or (regrd = '1' and
TR_in /= "00000" and EX_WB_TRD_in = TR_in and trwx = '1') or (regrd = '1' and VR_in /= "00000" and
EX_WB_VRD_in = VR_in and vrwx = '1')) then
pktmuxtopk <= "011"; -- ALU output as PKT input
elsif((regrd = '1' and RS1_in /= "00000" and EX_WB_RD_in /= RS1_in and RDoswbtryout = RS1_in and
rww = '1') or (regrd = '1' and TR_in /= "00000" and EX_WB_TRD_in /= TR_in and TRDoswbtryout =
TR_in and trww = '1') or (regrd = '1' and VR_in /= "00000" and EX_WB_VRD_in /= VR_in and
VRDoswbtryout = VR_in and vrww = '1')) then
pktmuxtopk <= "100"; -- FST data as PKT input
elsif(regrd = '1' and TR_in /= "00000" and RS1_in = "00000" and trwx = '0' and trww = '0') then
pktmuxtopk <= "001"; -- TR as PKT input
elsif(regrd = '1' and VR_in /= "00000" and RS1_in = "00000" and vrwx = '0' and vrww = '0') then
pktmuxtopk <= "010"; -- VR as PKT input
elsif(regrd = '1' and RS1_in /= "00000" and EX_WB_RD_in /= RS1_in and RDoswbtryout /= RS1_in)
then
pktmuxtopk <= "000"; -- GPR as PKT input
else
pktmuxtopk <= "110"; -- hold on to prev value
end if;
else
pktmuxtopk <= "111"; -- zero it out
end if;
end process pmp;
end architecture fwd_new_beh;
--MUX used as mux after ALU/Shifter output 
library IEEE; 
use IEEE.std_logic_1164.all;
entity muxout is
port(aluout_in, shout_in: in std_logic_vector(63 downto 0);
Sout: in std_logic;
alu_sh_out: out std_logic_vector(63 downto 0));
end entity muxout;
architecture muxout_beh of muxout is
signal alush1: std_logic_vector(63 downto 0);
begin
process(Sout, aluout_in, shout_in, alush1) is
begin
case Sout is
when '0' => alush1 <= aluout_in;
when '1' => alush1 <= shout_in;
when others => alush1 <= alush1;
end case;
alu_sh_out <= alush1;
end process;
end architecture muxout_beh;
--MUX used as mux before ESS-TAG
library IEEE;
use IEEE.std_logic_1164.all;
entity muxtag is
port(TR_in, ALU_Sh_out, FST_out, PR_in: in std_logic_vector(63 downto 0);
Stag: in std_logic_vector(2 downto 0);
tagmuxout: out std_logic_vector(63 downto 0));
end entity muxtag;
architecture muxtag_beh of muxtag is
begin
process(Stag, TR_in, ALU_Sh_out, FST_out, PR_in) is
begin
case Stag is
when "100" => tagmuxout <= (others => '0'); -- first bit is 'clr'
when "101" => tagmuxout <= (others => '0');
when "110" => tagmuxout <= (others => '0');
when "111" => tagmuxout <= (others => '0');
when "000" => tagmuxout <= TR_in;
when "001" => tagmuxout <= ALU_Sh_out;
when "010" => tagmuxout <= FST_out;
when "011" => tagmuxout <= PR_in;
when others => tagmuxout <= (others => '0'); -- assumed default; this line is missing from the scanned source
end case;
end process;
end architecture muxtag_beh;
--MUX used as mux before PKT 
library IEEE; 
use IEEE.std_logic_1164.all;
entity muxpkt is
port(GPR_in, TR_in, VR_in, ALU_Sh_out, FST_out, PR_in: in std_logic_vector(63 downto 0);
Spkt: in std_logic_vector(2 downto 0);
pktmuxout: out std_logic_vector(63 downto 0));
end entity muxpkt;
architecture muxpkt_beh of muxpkt is
signal pktmuxout1: std_logic_vector(63 downto 0);
begin
process(Spkt, GPR_in, TR_in, VR_in, ALU_Sh_out, FST_out, PR_in, pktmuxout1) is
begin
case Spkt is
when "111" => pktmuxout1 <= (others => '0');
when "000" => pktmuxout1 <= GPR_in;
when "001" => pktmuxout1 <= TR_in;
when "010" => pktmuxout1 <= VR_in;
when "110" => pktmuxout1 <= pktmuxout1;
when "011" => pktmuxout1 <= ALU_Sh_out;
when "100" => pktmuxout1 <= FST_out;
when "101" => pktmuxout1 <= PR_in;
when others => pktmuxout1 <= pktmuxout1;
end case;
pktmuxout <= pktmuxout1;
end process;
end architecture muxpkt_beh;
-- PKT PROCESSING TOP MODULE 
library IEEE; 
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity pktproc is
generic(M: positive := 32;
N: positive := 64);
port(clk, ininst_p, IDV, EOP_in_p, outinst_p, OPRAMready, lfpr_p, stpr_p: in std_logic;
inp_fm_ram: in std_logic_vector(M-1 downto 0);
inp_fm_mux: in std_logic_vector(N-1 downto 0);
flaginp: in std_logic_vector(7 downto 0);
crcchkok: out std_logic;
lfstoff: in std_logic_vector(6 downto 0);
ldopram, EOP_out, PRready, ACK_in, locz: out std_logic;
foutp: out std_logic_vector(M-1 downto 0);
out_to_regs: out std_logic_vector(N-1 downto 0);
out_to_ram, pktout: out std_logic_vector(M-1 downto 0));
end entity pktproc;
architecture pktproc_beh of pktproc is
-- Main PKt PROC 
component pktram is 
port(off_addr: in std_logic_vector(6 downto 0); 
din: in std_logic_vector(31 downto 0);
dout: out std_logic_vector(31 downto 0);
clk: in std_logic;
wepr: in std_logic);
end component pktram;
-- PKT CTRLR
component pktctrl is
port(clk, ininst, IDV, EOP_in, outinst, OPRAMready, zsig, lfpr, stpr: in std_logic;
weipr, ldfreg, incr_ag, clrag, ldopram, ldlenreg, EOP_out, subo, lfclk, lsclk, sfclk, ssclk, ackin, ldf_FR,
LD_CRCreg, crc_Z, outcrc, clrcrc: out std_logic);
end component pktctrl;
--ADDRGEN
component addgen is
port(clk, clr, incag: in std_logic;
inad_ag: in std_logic_vector(6 downto 0);
outad_ag: out std_logic_vector(6 downto 0));
end component addgen;
--FIRST REG
component freg is
port(ldfreg, clk: in std_logic;
addr: in std_logic_vector(6 downto 0);
inp: in std_logic_vector(31 downto 0);
outp: out std_logic_vector(31 downto 0));
end component freg;
-- Length Reg
component lenreg0 is
generic(al: positive := 16);
port(leninp1: in std_logic_vector(al-1 downto 0);
clk, ldlenreg, subsig: in std_logic;
lenoutp: out std_logic_vector(al-1 downto 0));
end component lenreg0;
-- Offset Length Equality
component offleneq is
generic(al: positive := 16);
port(lenin: in std_logic_vector(al-1 downto 0);
zo: out std_logic);
end component offleneq;
-- LA sigs
component la is
port(ai1, ai2, ai3: in std_logic_vector(6 downto 0);
ls: in std_logic_vector(1 downto 0);
ao: out std_logic_vector(6 downto 0));
end component la;
-- Length Selection
component lsel is
port(ss: in std_logic;
flensig, slensig: in std_logic_vector(15 downto 0);
lselout: out std_logic_vector(15 downto 0));
end component lsel;
--LFPRCKT
component lfprckt is
port(lfc, lsc: in std_logic;
inp32: in std_logic_vector(31 downto 0); -- for LFPR
offsi: in std_logic_vector(6 downto 0);
offso: out std_logic_vector(6 downto 0);
outp64: out std_logic_vector(63 downto 0));
end component lfprckt;
--STPRCKT
component stprckt is
port(sfc, ssc: in std_logic;
inp64: in std_logic_vector(63 downto 0); -- for STPR
offi: in std_logic_vector(6 downto 0);
offo: out std_logic_vector(6 downto 0);
outp32: out std_logic_vector(31 downto 0));
end component stprckt;
-- STPR SEL
component ssel is
port(inp1, inp2, fos, outinp: in std_logic_vector(31 downto 0);
fin: in std_logic_vector(7 downto 0);
sts, ldfmfr, oc: in std_logic;
inpo: out std_logic_vector(31 downto 0));
end component ssel;
-- CRC MODULE
component crcmod is
port(ldcr, crcz: in std_logic;
infmpkt: in std_logic_vector(31 downto 0);
crcinfmpkt, crcin: in std_logic_vector(31 downto 0);
crccalc_out: out std_logic_vector(31 downto 0);
crcchkok: out std_logic);
end component crcmod;
--CRC STORE
component crcst is
port(clk: in std_logic;
crc_calc_in: in std_logic_vector(31 downto 0);
crc_calc_out: out std_logic_vector(31 downto 0));
end component crcst;
-- OPRAM DATA OUT
component crcout_ram is
port(epout: in std_logic;
crc_cin, outramin: in std_logic_vector(31 downto 0);
outramout: out std_logic_vector(31 downto 0));
end component crcout_ram;
-- Activating Inst
component tstin is
port(in_inst, clk: in std_logic;
ininstout: out std_logic);
end component tstin;
signal clrsig, incagsig, weprsig, zsig1, crcensig, ldfregsig, ldlenrsig, subsig, lfsig, lssig, sfsig, sssig, lforls,
sforss, EOP_outsig, lffsig, ldcrsig, crczero, ocsig, clrcrcsig: std_logic;
signal lfssfs: std_logic_vector(1 downto 0);
signal add_off, outagsig, offsig, offsig1: std_logic_vector(6 downto 0);
signal lengthsig, lengthoutpsig: std_logic_vector(15 downto 0);
signal foutpsig, outram, toram, dintoram: std_logic_vector(M-1 downto 0);
signal calccrcin, crcintomod: std_logic_vector(M-1 downto 0);
signal ininst_s, outinst_s, lfpr_s, stpr_s, EOP_in_s: std_logic;
begin
foutp <= foutpsig;
out_to_ram <= outram;
EOP_out <= EOP_outsig;
PRready <= EOP_outsig;
locz <= not(foutpsig(0) or foutpsig(1) or foutpsig(2));
lforls <= lfsig or lssig;
sforss <= sfsig or sssig;
lfssfs <= lforls & sforss;
-- Activating instructions
INcomp: tstin port map(in_inst=>ininst_p, clk=>clk, ininstout=>ininst_s);
OUTcomp: tstin port map(in_inst=>outinst_p, clk=>clk, ininstout=>outinst_s);
LFPRcomp1: tstin port map(in_inst=>lfpr_p, clk=>clk, ininstout=>lfpr_s);
STPRcomp1: tstin port map(in_inst=>stpr_p, clk=>clk, ininstout=>stpr_s);
EOPcomp: tstin port map(in_inst=>EOP_in_p, clk=>clk, ininstout=>EOP_in_s);
--PKT PROC COMPONENTS
addgencomp: addgen port map(clk=>clk, clr=>clrsig, incag=>incagsig, inad_ag=>add_off,
outad_ag=>outagsig);
pktramcomp: pktram port map(off_addr=>add_off, din=>dintoram, dout=>outram, clk=>clk,
wepr=>weprsig);
pktctrlcomp: pktctrl port map(clk=>clk, ininst=>ininst_s, IDV=>IDV, EOP_in=>EOP_in_s,
outinst=>outinst_s, OPRAMready=>OPRAMready, zsig=>zsig1, lfpr=>lfpr_s, stpr=>stpr_s,
weipr=>weprsig, ldfreg=>ldfregsig, incr_ag=>incagsig, clrag=>clrsig, ldopram=>ldopram,
ldlenreg=>ldlenrsig, EOP_out=>EOP_outsig, subo=>subsig, lfclk=>lfsig, lsclk=>lssig, sfclk=>sfsig,
ssclk=>sssig, ackin=>ACK_in, ldf_FR=>lffsig, LD_CRCreg=>ldcrsig, crc_Z=>open, outcrc=>ocsig,
clrcrc=>clrcrcsig);
fregcomp: freg port map(ldfreg=>ldfregsig, clk=>clk, addr=>outagsig, inp=>dintoram, outp=>foutpsig);
lengthreg: lenreg0 port map(leninp1=>lengthoutpsig, clk=>clk, ldlenreg=>ldlenrsig, subsig=>subsig,
lenoutp=>lengthsig);
offleneqcalc: offleneq port map(lenin=>lengthsig, zo=>zsig1);
lselcomp: lsel port map(ss=>subsig, flensig=>foutpsig(31 downto 16), slensig=>lengthsig,
lselout=>lengthoutpsig);
lacomp: la port map(ai1=>outagsig, ai2=>offsig, ai3=>offsig1, ls=>lfssfs, ao=>add_off);
lfprcomp: lfprckt port map(lfc=>lfsig, lsc=>lssig, inp32=>outram, offsi=>lfstoff, offso=>offsig,
outp64=>out_to_regs);
stprcomp: stprckt port map(sfc=>sfsig, ssc=>sssig, inp64=>inp_fm_mux, offi=>lfstoff, offo=>offsig1,
outp32=>toram);
stprselcomp: ssel port map(inp1=>inp_fm_ram, inp2=>toram, fos=>foutpsig, outinp=>outram,
fin=>flaginp, sts=>sforss, ldfmfr=>lffsig, oc=>ocsig, inpo=>dintoram);
CRCmodcomp: crcmod port map(ldcr=>ldcrsig, crcz=>clrcrcsig, infmpkt=>dintoram,
crcinfmpkt=>dintoram, crcin=>crcintomod, crccalc_out=>calccrcin, crcchkok=>crcchkok);
CRCstorecomp: crcst port map(clk=>clk, crc_calc_in=>calccrcin, crc_calc_out=>crcintomod);
outramcomp: crcout_ram port map(epout=>EOP_outsig, crc_cin=>calccrcin, outramin=>outram,
outramout=>pktout);
end architecture pktproc_beh;
--For PKT RAM 
-- Using RAM128X1 for 128X32 RAM
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity pktram is
port(off_addr: in std_logic_vector(6 downto 0);
din: in std_logic_vector(31 downto 0);
dout: out std_logic_vector(31 downto 0);
clk: in std_logic;
wepr: in std_logic);
end entity pktram;
architecture pktram_beh of pktram is
component ram_128x1s is
port(clk, we: in std_logic;
addr: in std_logic_vector(6 downto 0);
data_in: in std_logic;
data_out: out std_logic);
end component ram_128x1s;
begin
R1281: ram_128x1s port map(clk=>clk, we=>wepr, addr=>off_addr, data_in=>din(0), data_out=>dout(0));
R1282: ram_128x1s port map(clk=>clk, we=>wepr, addr=>off_addr, data_in=>din(1), data_out=>dout(1));
R1283: ram_128x1s port map(clk=>clk, we=>wepr, addr=>off_addr, data_in=>din(2), data_out=>dout(2));
R1284: ram_128x1s port map(clk=>clk, we=>wepr, addr=>off_addr, data_in=>din(3), data_out=>dout(3));
R1285: ram_128x1s port map(clk=>clk, we=>wepr, addr=>off_addr, data_in=>din(4), data_out=>dout(4));
R1286: ram_128x1s port map(clk=>clk, we=>wepr, addr=>off_addr, data_in=>din(5), data_out=>dout(5));
R1287: ram_128x1s port map(clk=>clk, we=>wepr, addr=>off_addr, data_in=>din(6), data_out=>dout(6));
R1288: ram_128x1s port map(clk=>clk, we=>wepr, addr=>off_addr, data_in=>din(7), data_out=>dout(7));
R1289: ram_128x1s port map(clk=>clk, we=>wepr, addr=>off_addr, data_in=>din(8), data_out=>dout(8));
R12810: ram_128x1s port map(clk=>clk, we=>wepr, addr=>off_addr, data_in=>din(9), data_out=>dout(9));
R12811: ram_128x1s port map(clk=>clk, we=>wepr, addr=>off_addr, data_in=>din(10), data_out=>dout(10));
R12812: ram_128x1s port map(clk=>clk, we=>wepr, addr=>off_addr, data_in=>din(11), data_out=>dout(11));
R12813: ram_128x1s port map(clk=>clk, we=>wepr, addr=>off_addr, data_in=>din(12), data_out=>dout(12));
R12814: ram_128x1s port map(clk=>clk, we=>wepr, addr=>off_addr, data_in=>din(13), data_out=>dout(13));
R12815: ram_128x1s port map(clk=>clk, we=>wepr, addr=>off_addr, data_in=>din(14), data_out=>dout(14));
R12816: ram_128x1s port map(clk=>clk, we=>wepr, addr=>off_addr, data_in=>din(15), data_out=>dout(15));
R12817: ram_128x1s port map(clk=>clk, we=>wepr, addr=>off_addr, data_in=>din(16), data_out=>dout(16));
R12818: ram_128x1s port map(clk=>clk, we=>wepr, addr=>off_addr, data_in=>din(17), data_out=>dout(17));
R12819: ram_128x1s port map(clk=>clk, we=>wepr, addr=>off_addr, data_in=>din(18), data_out=>dout(18));
R12820: ram_128x1s port map(clk=>clk, we=>wepr, addr=>off_addr, data_in=>din(19), data_out=>dout(19));
R12821: ram_128x1s port map(clk=>clk, we=>wepr, addr=>off_addr, data_in=>din(20), data_out=>dout(20));
R12822: ram_128x1s port map(clk=>clk, we=>wepr, addr=>off_addr, data_in=>din(21), data_out=>dout(21));
R12823: ram_128x1s port map(clk=>clk, we=>wepr, addr=>off_addr, data_in=>din(22), data_out=>dout(22));
R12824: ram_128x1s port map(clk=>clk, we=>wepr, addr=>off_addr, data_in=>din(23), data_out=>dout(23));
R12825: ram_128x1s port map(clk=>clk, we=>wepr, addr=>off_addr, data_in=>din(24), data_out=>dout(24));
R12826: ram_128x1s port map(clk=>clk, we=>wepr, addr=>off_addr, data_in=>din(25), data_out=>dout(25));
R12827: ram_128x1s port map(clk=>clk, we=>wepr, addr=>off_addr, data_in=>din(26), data_out=>dout(26));
R12828: ram_128x1s port map(clk=>clk, we=>wepr, addr=>off_addr, data_in=>din(27), data_out=>dout(27));
R12829: ram_128x1s port map(clk=>clk, we=>wepr, addr=>off_addr, data_in=>din(28), data_out=>dout(28));
R12830: ram_128x1s port map(clk=>clk, we=>wepr, addr=>off_addr, data_in=>din(29), data_out=>dout(29));
R12831: ram_128x1s port map(clk=>clk, we=>wepr, addr=>off_addr, data_in=>din(30), data_out=>dout(30));
R12832: ram_128x1s port map(clk=>clk, we=>wepr, addr=>off_addr, data_in=>din(31), data_out=>dout(31));
end architecture pktram_beh;
--For PKT RAM-RAM128X1
library IEEE;
use IEEE.std_logic_1164.all;
entity ram_128x1s is
port(clk, we: in std_logic;
addr: in std_logic_vector(6 downto 0);
data_in: in std_logic;
data_out: out std_logic);
end entity ram_128x1s;
architecture behvram of ram_128x1s is
component RAM128x1S is
port(WE, D, WCLK, A0, A1, A2, A3, A4, A5, A6: in std_logic;
O: out std_logic);
end component RAM128x1S;
begin
R1281: RAM128x1S port map(WE=>we, D=>data_in, WCLK=>clk, A0=>addr(0), A1=>addr(1),
A2=>addr(2), A3=>addr(3), A4=>addr(4), A5=>addr(5), A6=>addr(6), O=>data_out);
end architecture behvram;
-- PKT Controller 
library IEEE; 
use IEEE.std_logic_1164.all;
entity pktctrl is
port(clk, ininst, IDV, EOP_in, outinst, OPRAMready, zsig, lfpr, stpr: in std_logic;
weipr, ldfreg, incr_ag, clrag, ldopram, ldlenreg, EOP_out, subo, lfclk, lsclk, sfclk, ssclk, ackin, ldf_FR,
LD_CRCreg, crc_Z, outcrc, clrcrc: out std_logic);
end entity pktctrl;
architecture pktctrl_beh of pktctrl is
component FD is
port(D, C: in std_logic;
Q: out std_logic);
end component FD;
component FD_1 is
port(D, C: in std_logic;
Q: out std_logic);
end component FD_1;
signal idv_bar, eopi_bar, oprbar, zbar: std_logic;
signal pd0, pd1, pd2, pd3, pd4, pd5, pd6, pd7, pd8, pd9, pd10, pd11, pd12, pd13: std_logic;
signal pt0, pt1, pt2, pt3, pt4, pt5, pt6, pt7, pt8, pt9, pt10, pt11, pt12, pt13: std_logic;
begin
idv_bar <= not(IDV);
eopi_bar <= not(EOP_in);
oprbar <= not(OPRAMready);
zbar <= not(zsig);
dff_p0: FD port map(D=>pd0, C=>clk, Q=>pt0);
dff_p1: FD port map(D=>pd1, C=>clk, Q=>pt1);
dff_p2: FD port map(D=>pd2, C=>clk, Q=>pt2);
dff_p3: FD port map(D=>pd3, C=>clk, Q=>pt3);
dff_p4: FD port map(D=>pd4, C=>clk, Q=>pt4);
dff_p5: FD port map(D=>pd5, C=>clk, Q=>pt5);
dff_p6: FD port map(D=>pd6, C=>clk, Q=>pt6);
dff_p7: FD port map(D=>pd7, C=>clk, Q=>pt7);
dff_p8: FD port map(D=>pd8, C=>clk, Q=>pt8);
dff_p9: FD port map(D=>pd9, C=>clk, Q=>pt9);
dff_p10: FD port map(D=>pd10, C=>clk, Q=>pt10);
dff_p11: FD_1 port map(D=>pd11, C=>clk, Q=>pt11);
dff_p12: FD port map(D=>pd12, C=>clk, Q=>pt12);
dff_p13: FD_1 port map(D=>pd13, C=>clk, Q=>pt13);
--next state equations
pd0 <= (ininst or (idv_bar and pt0));
pd1 <= ((IDV and pt0) or (eopi_bar and pt2));
pd2 <= pt1;
pd3 <= EOP_in and pt2;
pd4 <= pt3;
pd5 <= (outinst or (oprbar and pt5));
pd6 <= (OPRAMready and pt5);
pd7 <= (pt6 or (zbar and pt8));
pd8 <= pt7;
pd9 <= zsig and pt8;
pd10 <= lfpr;
pd11 <= pt10;
pd12 <= stpr;
pd13 <= pt12;
--output equations
weipr <= pt1 or pt12 or stpr or pt13 or pt6;
incr_ag <= pt1 or pt7;
ldfreg <= pt1;
clrag <= pt9 or pt6 or pt4;
LD_CRCreg <= pt0 or pt2 or pt3 or pt4 or pt5 or pt8 or pt9;
ldopram <= pt7 or pt9;
ldlenreg <= pt6;
subo <= pt7;
EOP_out <= pt9;
lfclk <= pt10;
lsclk <= pt11;
sfclk <= pt12;
ssclk <= pt13;
ackin <= pt1;
ldf_FR <= pt6;
crc_Z <= pt6;
clrcrc <= pt6;
outcrc <= pt6 or pt7 or pt8;
end architecture pktctrl_beh;
-- Address Generator for pktram 
library IEEE; 
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity addgen is
port(clk, clr, incag: in std_logic;
inad_ag: in std_logic_vector(6 downto 0);
outad_ag: out std_logic_vector(6 downto 0));
end entity addgen;
architecture addgen_beh of addgen is
signal ciag: std_logic_vector(1 downto 0);
signal outad_ags: std_logic_vector(6 downto 0);
begin
ciag <= clr & incag;
process(clk, ciag, inad_ag, outad_ags) is
begin
if(rising_edge(clk)) then
case ciag is
when "10" => outad_ags <= (others => '0');
when "11" => outad_ags <= (others => '0');
when "01" => outad_ags <= inad_ag + 1;
when "00" => outad_ags <= outad_ags;
when others => outad_ags <= outad_ags; -- assumed default; this line is missing from the scanned source
end case;
end if;
outad_ag <= outad_ags;
end process;
end architecture addgen_beh;
-- For Firstregister 
library IEEE; 
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity freg is
port(ldfreg, clk: in std_logic;
addr: in std_logic_vector(6 downto 0);
inp: in std_logic_vector(31 downto 0);
outp: out std_logic_vector(31 downto 0));
end entity freg;
architecture freg_beh of freg is
begin
process(clk, ldfreg, addr, inp) is
begin
if(rising_edge(clk)) then
if(ldfreg = '1') then
if(addr = "0000000") then
outp <= inp;
end if;
end if;
end if;
end process;
end architecture freg_beh;
-- Length Reg 
library IEEE; 
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity lenreg0 is
generic(al: positive := 16); -- 16
port(leninp1: in std_logic_vector(al-1 downto 0);
clk, ldlenreg, subsig: in std_logic;
lenoutp: out std_logic_vector(al-1 downto 0));
end entity lenreg0;
architecture lenreg0_beh of lenreg0 is
begin
process(clk, ldlenreg, leninp1, subsig) is
begin
if(rising_edge(clk)) then
if(ldlenreg = '1') then
lenoutp <= leninp1;
elsif(subsig = '1') then





end architecture lenreg0_beh;
-- Equality check unit 
library IEEE; 
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity offleneq is
generic(al: positive := 16); --4
port(lenin: in std_logic_vector(al-1 downto 0);
zo: out std_logic);
end entity offleneq;
architecture offleneq_beh of offleneq is
signal leninsig: std_logic_vector(al-1 downto 0);
begin
process(lenin, leninsig) is
variable ole_or: std_logic;
begin
ole_or := '0';
for i in al-1 downto 0 loop
ole_or := ole_or or lenin(i);
end loop;
zo <= not (ole_or);
end process;
end architecture offleneq_beh;
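The zero detector above can be exercised with a small self-checking testbench. The sketch below is not part of the original listing: the entity name offleneq_tb, the stimulus values, and the 10 ns settling delay are assumptions chosen only for illustration, and it presumes the offleneq unit has been compiled into library work.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical testbench (not from the thesis) for the offleneq zero detector.
entity offleneq_tb is
end entity offleneq_tb;

architecture sim of offleneq_tb is
  signal lenin: std_logic_vector(15 downto 0) := (others => '0');
  signal zo: std_logic;
begin
  -- Device under test, instantiated directly from library work.
  dut: entity work.offleneq
    generic map(al => 16)
    port map(lenin => lenin, zo => zo);

  stim: process is
  begin
    -- All bits zero: the OR-reduction is '0', so zo should be '1'.
    lenin <= (others => '0');
    wait for 10 ns;
    assert zo = '1' report "zo should be '1' when lenin is zero" severity error;

    -- Any single bit set: zo should drop to '0'.
    lenin <= "0000000000000100";
    wait for 10 ns;
    assert zo = '0' report "zo should be '0' when lenin is non-zero" severity error;

    wait; -- stop the stimulus process
  end process stim;
end architecture sim;
```

In pktproc this output drives zsig, which the packet controller uses to leave its output loop once the length register reaches zero.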
-- For len and add signals
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity la is
port(ai1, ai2, ai3: in std_logic_vector(6 downto 0);
ls: in std_logic_vector(1 downto 0);
ao: out std_logic_vector(6 downto 0));
end entity la;
architecture la_beh of la is
begin
process(ls, ai1, ai2, ai3) is
begin
case ls is
when "10" =>
ao <= ai2; -- LFPR Offset
when "01" =>
ao <= ai3; -- STPR Offset
when "00" =>
ao <= ai1; -- PKRAM address
when "11" =>
ao <= (others => '0');
when others =>
ao <= (others => '0');
end case; 
end process; 
end architecture la_beh; 
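The address selection performed by la can be checked in simulation with a short testbench. This is only an illustrative sketch and not part of the original design files: the entity name la_tb, the three distinct address patterns, and the 10 ns delay are assumptions, and the la unit is presumed to be compiled into library work.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical testbench (not from the thesis) for the la address selector.
entity la_tb is
end entity la_tb;

architecture sim of la_tb is
  signal ai1, ai2, ai3, ao: std_logic_vector(6 downto 0);
  signal ls: std_logic_vector(1 downto 0);
begin
  dut: entity work.la
    port map(ai1 => ai1, ai2 => ai2, ai3 => ai3, ls => ls, ao => ao);

  stim: process is
  begin
    -- Distinct patterns so each selection is distinguishable at the output.
    ai1 <= "0000001"; -- packet RAM address from the address generator
    ai2 <= "0000010"; -- LFPR offset
    ai3 <= "0000100"; -- STPR offset

    ls <= "00"; wait for 10 ns;
    assert ao = "0000001" report "ls=00 should select ai1" severity error;

    ls <= "10"; wait for 10 ns;
    assert ao = "0000010" report "ls=10 should select ai2" severity error;

    ls <= "01"; wait for 10 ns;
    assert ao = "0000100" report "ls=01 should select ai3" severity error;

    ls <= "11"; wait for 10 ns;
    assert ao = "0000000" report "ls=11 should drive zero" severity error;

    wait; -- stop the stimulus process
  end process stim;
end architecture sim;
```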
-- MUX for length sel 
library IEEE; 
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity lsel is
port(ss: in std_logic;
flensig, slensig: in std_logic_vector(15 downto 0);
lselout: out std_logic_vector(15 downto 0));
end entity lsel;
architecture lsel_beh of lsel is
begin
process(ss, flensig, slensig) is
begin
case ss is
when '0' => lselout <= flensig;
when '1' => lselout <= slensig;
when others => lselout <= (others => '0');
end case; 
end process; 
end architecture lsel_beh; 
-- FOR LFPR CKT 
library IEEE; 
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity lfprckt is
port(lfc, lsc: in std_logic;
inp32: in std_logic_vector(31 downto 0); -- for LFPR
offsi: in std_logic_vector(6 downto 0);
offso: out std_logic_vector(6 downto 0);
outp64: out std_logic_vector(63 downto 0));
end entity lfprckt;
architecture lfprckt_beh of lfprckt is
signal sig64: std_logic_vector(63 downto 0);
signal lfsc: std_logic_vector(1 downto 0);
signal offs: std_logic_vector(6 downto 0);
begin
lfsc <= lfc & lsc;
process(lfsc, inp32, sig64, offs, offsi) is
begin
case lfsc is
when "10" =>
sig64(31 downto 0) <= inp32;
sig64(63 downto 32) <= sig64(63 downto 32);
offs <= offsi;
when "01" =>
sig64(63 downto 32) <= inp32;
sig64(31 downto 0) <= sig64(31 downto 0);
offs <= offsi + 1;
when "00" =>
sig64 <= sig64;
offs <= offs;
when "11" =>
sig64 <= sig64;
offs <= offs;
when others =>
sig64 <= sig64;
offs <= (others => '0');
end case;
offso <= offs;
outp64 <= sig64;
end process;
end architecture lfprckt_beh;
-- FOR STPR CKT
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity stprckt is
port(sfc, ssc: in std_logic;
inp64: in std_logic_vector(63 downto 0); -- for STPR
offi: in std_logic_vector(6 downto 0);
offo: out std_logic_vector(6 downto 0);
outp32: out std_logic_vector(31 downto 0));
end entity stprckt;
architecture stprckt_beh of stprckt is
signal sfsc: std_logic_vector(1 downto 0);
signal ofs: std_logic_vector(6 downto 0);
begin
sfsc <= sfc & ssc;
process(sfsc, inp64, offi, ofs) is
begin
case sfsc is
when "10" =>
outp32 <= inp64(31 downto 0);
ofs <= offi;
when "01" =>
outp32 <= inp64(63 downto 32);
ofs <= offi + 1;
when "00" =>
outp32 <= (others => '0');
ofs <= ofs;
when "11" =>
outp32 <= (others => '0');
ofs <= ofs;
when others =>
outp32 <= (others => '0');
ofs <= (others => '0');
end case;
offo <= ofs;
end process;
end architecture stprckt_beh;
-- For STPR SEL
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity ssel is
port(inp1, inp2, fos, outinp: in std_logic_vector(31 downto 0);
fin: in std_logic_vector(7 downto 0);
sts, ldfmfr, oc: in std_logic;
inpo: out std_logic_vector(31 downto 0));
end entity ssel;
architecture ssel_beh of ssel is
signal slo: std_logic_vector(2 downto 0);
begin
slo <= sts & ldfmfr & oc;
process(slo, inp1, inp2, fos, fin, outinp) is
begin
case slo is
when "100" =>
inpo <= inp2; -- LFPR, STPR Offset
when "000" =>
inpo <= inp1; -- PKRAM address
when "011" =>
inpo <= fos(31 downto 8) & fin;
when "010" =>
inpo <= fos(31 downto 8) & fin;
when "001" =>
inpo <= outinp;
when others =>
inpo <= (others => '0');
end case;
end process;
end architecture ssel_beh;
-- CRC MODULE
library IEEE;
use IEEE.std_logic_1164.all;
entity crcmod is
port(ldcr, crcz: in std_logic;
infmpkt: in std_logic_vector(31 downto 0);
crcinfmpkt, crcin: in std_logic_vector(31 downto 0);
crccalc_out: out std_logic_vector(31 downto 0);
crcchkok: out std_logic);
end entity crcmod;
architecture crcmod_beh of crcmod is
-- components for CRC-32 calculation
component crc32w16 is
port(crcin: in std_logic_vector(31 downto 0);
Data_in: in std_logic_vector(31 downto 0);
CRCout: out std_logic_vector(31 downto 0));
end component crc32w16;
component compcrc is
port(calccrc, crcin: in std_logic_vector(31 downto 0);
crcchkout: out std_logic);
end component compcrc;
component crcout is
port(ldcr, crcz: in std_logic;
crcoutin, cin2: in std_logic_vector(31 downto 0);
crcoutout: out std_logic_vector(31 downto 0));
end component crcout;
signal CRCsignal, crcc_out: std_logic_vector(31 downto 0);
begin
crccalc_out <= crcc_out;
CRCoutcomp: crcout port map(ldcr=>ldcr, crcz=>crcz, crcoutin=>CRCsignal, cin2=>crcin,
crcoutout=>crcc_out);
CRC32: crc32w16 port map(crcin=>crcin, Data_in=>infmpkt, CRCout=>CRCsignal);
CRCcmp: compcrc port map(calccrc=>crcin, crcin=>crcinfmpkt, crcchkout=>crcchkok);
end architecture crcmod_beh;
-- CRCtop
library IEEE;
use IEEE.std_logic_1164.all;
entity crc32w16 is
port(crcin: in std_logic_vector(31 downto 0);
Data_in: in std_logic_vector(31 downto 0);
CRCout: out std_logic_vector(31 downto 0));
end entity crc32w16;
architecture crc32w16_beh of crc32w16 is
component nextCRC32_D32 is
port(Data: in std_logic_vector(31 downto 0);
CRC: in std_logic_vector(31 downto 0);
NewCRC: out std_logic_vector(31 downto 0));
end component nextCRC32_D32;
begin
crc32comp: nextCRC32_D32 port map(Data=>Data_in(31 downto 0), CRC=>crcin, NewCRC=>CRCout);
end architecture crc32w16_beh;
-- CRC-32 for 32-input width
library IEEE;
use IEEE.std_logic_1164.all;
entity nextCRC32_D32 is
port(Data: in std_logic_vector(31 downto 0);
CRC: in std_logic_vector(31 downto 0);
NewCRC: out std_logic_vector(31 downto 0));
end entity nextCRC32_D32;
architecture crc_beh of nextCRC32_D32 is
signal D: std_logic_vector(31 downto 0);
signal C: std_logic_vector(31 downto 0);
begin
process(Data, CRC, D, C) is
begin
D <= Data;
C <= CRC;
NewCRC(0) <= D(31) xor D(30) xor D(29) xor D(28) xor D(26) xor D(25) xor
D(24) xor D(16) xor D(12) xor D(10) xor D(9) xor D(6) xor
D(0) xor C(0) xor C(6) xor C(9) xor C(10) xor C(12) xor
C(16) xor C(24) xor C(25) xor C(26) xor C(28) xor C(29) xor
C(30) xor C(31);
NewCRC(1) <= D(28) xor D(27) xor D(24) xor D(17) xor D(16) xor D(13) xor
D(12) xor D(11) xor D(9) xor D(7) xor D(6) xor D(1) xor
D(0) xor C(0) xor C(1) xor C(6) xor C(7) xor C(9) xor
C(11) xor C(12) xor C(13) xor C(16) xor C(17) xor C(24) xor
C(27) xor C(28);
NewCRC(2) <= D(31) xor D(30) xor D(26) xor D(24) xor D(18) xor D(17) xor
D(16) xor D(14) xor D(13) xor D(9) xor D(8) xor D(7) xor
D(6) xor D(2) xor D(1) xor D(0) xor C(0) xor C(1) xor
C(2) xor C(6) xor C(7) xor C(8) xor C(9) xor C(13) xor
C(14) xor C(16) xor C(17) xor C(18) xor C(24) xor C(26) xor
C(30) xor C(31);
NewCRC(3) <= D(31) xor D(27) xor D(25) xor D(19) xor D(18) xor D(17) xor
D(15) xor D(14) xor D(10) xor D(9) xor D(8) xor D(7) xor
D(3) xor D(2) xor D(1) xor C(1) xor C(2) xor C(3) xor
C(7) xor C(8) xor C(9) xor C(10) xor C(14) xor C(15) xor
C(17) xor C(18) xor C(19) xor C(25) xor C(27) xor C(31);
NewCRC(4) <= D(31) xor D(30) xor D(29) xor D(25) xor D(24) xor D(20) xor
D(19) xor D(18) xor D(15) xor D(12) xor D(11) xor D(8) xor
D(6) xor D(4) xor D(3) xor D(2) xor D(0) xor C(0) xor
C(2) xor C(3) xor C(4) xor C(6) xor C(8) xor C(11) xor
C(12) xor C(15) xor C(18) xor C(19) xor C(20) xor C(24) xor
C(25) xor C(29) xor C(30) xor C(31);
NewCRC(5) <= D(29) xor D(28) xor D(24) xor D(21) xor D(20) xor D(19) xor
D(13) xor D(10) xor D(7) xor D(6) xor D(5) xor D(4) xor
D(3) xor D(1) xor D(0) xor C(0) xor C(1) xor C(3) xor
C(4) xor C(5) xor C(6) xor C(7) xor C(10) xor C(13) xor
C(19) xor C(20) xor C(21) xor C(24) xor C(28) xor C(29);
NewCRC(6) <= D(30) xor D(29) xor D(25) xor D(22) xor D(21) xor D(20) xor
D(14) xor D(11) xor D(8) xor D(7) xor D(6) xor D(5) xor
D(4) xor D(2) xor D(1) xor C(1) xor C(2) xor C(4) xor
C(5) xor C(6) xor C(7) xor C(8) xor C(11) xor C(14) xor
C(20) xor C(21) xor C(22) xor C(25) xor C(29) xor C(30);
NewCRC(7) <= D(29) xor D(28) xor D(25) xor D(24) xor D(23) xor D(22) xor
D(21) xor D(16) xor D(15) xor D(10) xor D(8) xor D(7) xor
D(5) xor D(3) xor D(2) xor D(0) xor C(0) xor C(2) xor
C(3) xor C(5) xor C(7) xor C(8) xor C(10) xor C(15) xor
C(16) xor C(21) xor C(22) xor C(23) xor C(24) xor C(25) xor
C(28) xor C(29);
NewCRC(8) <= D(31) xor D(28) xor D(23) xor D(22) xor D(17) xor D(12) xor
D(11) xor D(10) xor D(8) xor D(4) xor D(3) xor D(1) xor
D(0) xor C(0) xor C(1) xor C(3) xor C(4) xor C(8) xor
C(10) xor C(11) xor C(12) xor C(17) xor C(22) xor C(23) xor
C(28) xor C(31);
NewCRC(9) <= D(29) xor D(24) xor D(23) xor D(18) xor D(13) xor D(12) xor
D(11) xor D(9) xor D(5) xor D(4) xor D(2) xor D(1) xor
C(1) xor C(2) xor C(4) xor C(5) xor C(9) xor C(11) xor
C(12) xor C(13) xor C(18) xor C(23) xor C(24) xor C(29);
NewCRC(10) <= D(31) xor D(29) xor D(28) xor D(26) xor D(19) xor D(16) xor
D(14) xor D(13) xor D(9) xor D(5) xor D(3) xor D(2) xor
D(0) xor C(0) xor C(2) xor C(3) xor C(5) xor C(9) xor
C(13) xor C(14) xor C(16) xor C(19) xor C(26) xor C(28) xor
C(29) xor C(31);
NewCRC(11) <= D(31) xor D(28) xor D(27) xor D(26) xor D(25) xor D(24) xor
D(20) xor D(17) xor D(16) xor D(15) xor D(14) xor D(12) xor
D(9) xor D(4) xor D(3) xor D(1) xor D(0) xor C(0) xor
C(1) xor C(3) xor C(4) xor C(9) xor C(12) xor C(14) xor
C(15) xor C(16) xor C(17) xor C(20) xor C(24) xor C(25) xor
C(26) xor C(27) xor C(28) xor C(31);
NewCRC(12) <= D(31) xor D(30) xor D(27) xor D(24) xor D(21) xor D(18) xor
D(17) xor D(15) xor D(13) xor D(12) xor D(9) xor D(6) xor
D(5) xor D(4) xor D(2) xor D(1) xor D(0) xor C(0) xor
C(1) xor C(2) xor C(4) xor C(5) xor C(6) xor C(9) xor
C(12) xor C(13) xor C(15) xor C(17) xor C(18) xor C(21) xor
C(24) xor C(27) xor C(30) xor C(31);
NewCRC(13) <= D(31) xor D(28) xor D(25) xor D(22) xor D(19) xor D(18) xor
D(16) xor D(14) xor D(13) xor D(10) xor D(7) xor D(6) xor
D(5) xor D(3) xor D(2) xor D(1) xor C(1) xor C(2) xor
C(3) xor C(5) xor C(6) xor C(7) xor C(10) xor C(13) xor
C(14) xor C(16) xor C(18) xor C(19) xor C(22) xor C(25) xor
C(28) xor C(31);
NewCRC(14) <= D(29) xor D(26) xor D(23) xor D(20) xor D(19) xor D(17) xor
D(15) xor D(14) xor D(11) xor D(8) xor D(7) xor D(6) xor
D(4) xor D(3) xor D(2) xor C(2) xor C(3) xor C(4) xor
C(6) xor C(7) xor C(8) xor C(11) xor C(14) xor C(15) xor
C(17) xor C(19) xor C(20) xor C(23) xor C(26) xor C(29);
NewCRC(15) <= D(30) xor D(27) xor D(24) xor D(21) xor D(20) xor D(18) xor
D(16) xor D(15) xor D(12) xor D(9) xor D(8) xor D(7) xor
D(5) xor D(4) xor D(3) xor C(3) xor C(4) xor C(5) xor
C(7) xor C(8) xor C(9) xor C(12) xor C(15) xor C(16) xor
C(18) xor C(20) xor C(21) xor C(24) xor C(27) xor C(30);
NewCRC(16) <= D(30) xor D(29) xor D(26) xor D(24) xor D(22) xor D(21) xor
D(19) xor D(17) xor D(13) xor D(12) xor D(8) xor D(5) xor
D(4) xor D(0) xor C(0) xor C(4) xor C(5) xor C(8) xor
C(12) xor C(13) xor C(17) xor C(19) xor C(21) xor C(22) xor
C(24) xor C(26) xor C(29) xor C(30);
NewCRC(17) <= D(31) xor D(30) xor D(27) xor D(25) xor D(23) xor D(22) xor
D(20) xor D(18) xor D(14) xor D(13) xor D(9) xor D(6) xor
D(5) xor D(1) xor C(1) xor C(5) xor C(6) xor C(9) xor
C(13) xor C(14) xor C(18) xor C(20) xor C(22) xor C(23) xor
C(25) xor C(27) xor C(30) xor C(31);
NewCRC(18) <= D(31) xor D(28) xor D(26) xor D(24) xor D(23) xor D(21) xor
D(19) xor D(15) xor D(14) xor D(10) xor D(7) xor D(6) xor
D(2) xor C(2) xor C(6) xor C(7) xor C(10) xor C(14) xor
C(15) xor C(19) xor C(21) xor C(23) xor C(24) xor C(26) xor
C(28) xor C(31);
NewCRC(19) <= D(29) xor D(27) xor D(25) xor D(24) xor D(22) xor D(20) xor
D(16) xor D(15) xor D(11) xor D(8) xor D(7) xor D(3) xor
C(3) xor C(7) xor C(8) xor C(11) xor C(15) xor C(16) xor
C(20) xor C(22) xor C(24) xor C(25) xor C(27) xor C(29);
NewCRC(20) <= D(30) xor D(28) xor D(26) xor D(25) xor D(23) xor D(21) xor
D(17) xor D(16) xor D(12) xor D(9) xor D(8) xor D(4) xor
C(4) xor C(8) xor C(9) xor C(12) xor C(16) xor C(17) xor
C(21) xor C(23) xor C(25) xor C(26) xor C(28) xor C(30);
NewCRC(21) <= D(31) xor D(29) xor D(27) xor D(26) xor D(24) xor D(22) xor
D(18) xor D(17) xor D(13) xor D(10) xor D(9) xor D(5) xor
C(5) xor C(9) xor C(10) xor C(13) xor C(17) xor C(18) xor
C(22) xor C(24) xor C(26) xor C(27) xor C(29) xor C(31);
NewCRC(22) <= D(31) xor D(29) xor D(27) xor D(26) xor D(24) xor D(23) xor
D(19) xor D(18) xor D(16) xor D(14) xor D(12) xor D(11) xor
D(9) xor D(0) xor C(0) xor C(9) xor C(11) xor C(12) xor
C(14) xor C(16) xor C(18) xor C(19) xor C(23) xor C(24) xor
C(26) xor C(27) xor C(29) xor C(31);
NewCRC(23) <= D(31) xor D(29) xor D(27) xor D(26) xor D(20) xor D(19) xor
D(17) xor D(16) xor D(15) xor D(13) xor D(9) xor D(6) xor
D(1) xor D(0) xor C(0) xor C(1) xor C(6) xor C(9) xor
C(13) xor C(15) xor C(16) xor C(17) xor C(19) xor C(20) xor
C(26) xor C(27) xor C(29) xor C(31);
NewCRC(24) <= D(30) xor D(28) xor D(27) xor D(21) xor D(20) xor D(18) xor
D(17) xor D(16) xor D(14) xor D(10) xor D(7) xor D(2) xor
D(1) xor C(1) xor C(2) xor C(7) xor C(10) xor C(14) xor
C(16) xor C(17) xor C(18) xor C(20) xor C(21) xor C(27) xor
C(28) xor C(30);
NewCRC(25) <= D(31) xor D(29) xor D(28) xor D(22) xor D(21) xor D(19) xor
D(18) xor D(17) xor D(15) xor D(11) xor D(8) xor D(3) xor
D(2) xor C(2) xor C(3) xor C(8) xor C(11) xor C(15) xor
C(17) xor C(18) xor C(19) xor C(21) xor C(22) xor C(28) xor
C(29) xor C(31);
NewCRC(26) <= D(31) xor D(28) xor D(26) xor D(25) xor D(24) xor D(23) xor
D(22) xor D(20) xor D(19) xor D(18) xor D(10) xor D(6) xor
D(4) xor D(3) xor D(0) xor C(0) xor C(3) xor C(4) xor
C(6) xor C(10) xor C(18) xor C(19) xor C(20) xor C(22) xor
C(23) xor C(24) xor C(25) xor C(26) xor C(28) xor C(31);
NewCRC(27) <= D(29) xor D(27) xor D(26) xor D(25) xor D(24) xor D(23) xor
D(21) xor D(20) xor D(19) xor D(11) xor D(7) xor D(5) xor
D(4) xor D(1) xor C(1) xor C(4) xor C(5) xor C(7) xor
C(11) xor C(19) xor C(20) xor C(21) xor C(23) xor C(24) xor
C(25) xor C(26) xor C(27) xor C(29);
NewCRC(28) <= D(30) xor D(28) xor D(27) xor D(26) xor D(25) xor D(24) xor
D(22) xor D(21) xor D(20) xor D(12) xor D(8) xor D(6) xor
D(5) xor D(2) xor C(2) xor C(5) xor C(6) xor C(8) xor
C(12) xor C(20) xor C(21) xor C(22) xor C(24) xor C(25) xor
C(26) xor C(27) xor C(28) xor C(30);
NewCRC(29) <= D(31) xor D(29) xor D(28) xor D(27) xor D(26) xor D(25) xor
D(23) xor D(22) xor D(21) xor D(13) xor D(9) xor D(7) xor
D(6) xor D(3) xor C(3) xor C(6) xor C(7) xor C(9) xor
C(13) xor C(21) xor C(22) xor C(23) xor C(25) xor C(26) xor
C(27) xor C(28) xor C(29) xor C(31);
NewCRC(30) <= D(30) xor D(29) xor D(28) xor D(27) xor D(26) xor D(24) xor
D(23) xor D(22) xor D(14) xor D(10) xor D(8) xor D(7) xor
D(4) xor C(4) xor C(7) xor C(8) xor C(10) xor C(14) xor
C(22) xor C(23) xor C(24) xor C(26) xor C(27) xor C(28) xor
C(29) xor C(30);
NewCRC(31) <= D(31) xor D(30) xor D(29) xor D(28) xor D(27) xor D(25) xor
D(24) xor D(23) xor D(15) xor D(11) xor D(9) xor D(8) xor
D(5) xor C(5) xor C(8) xor C(9) xor C(11) xor C(15) xor
C(23) xor C(24) xor C(25) xor C(27) xor C(28) xor C(29) xor
C(30) xor C(31);
end process;
end architecture crc_beh;
-- CRC Compare
library IEEE;
use IEEE.std_logic_1164.all;
entity compcrc is
port(calccrc, crcin: in std_logic_vector(31 downto 0);
crcchkout: out std_logic);
end entity compcrc;
architecture compcrc_beh of compcrc is
component XOR2 is
port(I0, I1: in std_logic;
O: out std_logic);
end component XOR2;
component OR16 is
port(I6, I9, I8, I7, I5, I4, I3, I2, I15, I14, I13, I12, I11, I10, I1, I0: in std_logic;
O: out std_logic);
end component OR16;
signal x: std_logic_vector(31 downto 0);
signal EQ1, EQ2: std_logic;
begin
xorcomp1: XOR2 port map(I0=>calccrc(0), I1=>crcin(0), O=>x(0));
xorcomp2: XOR2 port map(I0=>calccrc(1), I1=>crcin(1), O=>x(1));
xorcomp3: XOR2 port map(I0=>calccrc(2), I1=>crcin(2), O=>x(2));
xorcomp4: XOR2 port map(I0=>calccrc(3), I1=>crcin(3), O=>x(3));
xorcomp5: XOR2 port map(I0=>calccrc(4), I1=>crcin(4), O=>x(4));
xorcomp6: XOR2 port map(I0=>calccrc(5), I1=>crcin(5), O=>x(5));
xorcomp7: XOR2 port map(I0=>calccrc(6), I1=>crcin(6), O=>x(6));
xorcomp8: XOR2 port map(I0=>calccrc(7), I1=>crcin(7), O=>x(7));
xorcomp9: XOR2 port map(I0=>calccrc(8), I1=>crcin(8), O=>x(8));
xorcomp10: XOR2 port map(I0=>calccrc(9), I1=>crcin(9), O=>x(9));
xorcomp11: XOR2 port map(I0=>calccrc(10), I1=>crcin(10), O=>x(10));
xorcomp12: XOR2 port map(I0=>calccrc(11), I1=>crcin(11), O=>x(11));
xorcomp13: XOR2 port map(I0=>calccrc(12), I1=>crcin(12), O=>x(12));
xorcomp14: XOR2 port map(I0=>calccrc(13), I1=>crcin(13), O=>x(13));
xorcomp15: XOR2 port map(I0=>calccrc(14), I1=>crcin(14), O=>x(14));
xorcomp16: XOR2 port map(I0=>calccrc(15), I1=>crcin(15), O=>x(15));
xorcomp17: XOR2 port map(I0=>calccrc(16), I1=>crcin(16), O=>x(16));
xorcomp18: XOR2 port map(I0=>calccrc(17), I1=>crcin(17), O=>x(17));
xorcomp19: XOR2 port map(I0=>calccrc(18), I1=>crcin(18), O=>x(18));
xorcomp20: XOR2 port map(I0=>calccrc(19), I1=>crcin(19), O=>x(19));
xorcomp21: XOR2 port map(I0=>calccrc(20), I1=>crcin(20), O=>x(20));
xorcomp22: XOR2 port map(I0=>calccrc(21), I1=>crcin(21), O=>x(21));
xorcomp23: XOR2 port map(I0=>calccrc(22), I1=>crcin(22), O=>x(22));
xorcomp24: XOR2 port map(I0=>calccrc(23), I1=>crcin(23), O=>x(23));
xorcomp25: XOR2 port map(I0=>calccrc(24), I1=>crcin(24), O=>x(24));
xorcomp26: XOR2 port map(I0=>calccrc(25), I1=>crcin(25), O=>x(25));
xorcomp27: XOR2 port map(I0=>calccrc(26), I1=>crcin(26), O=>x(26));
xorcomp28: XOR2 port map(I0=>calccrc(27), I1=>crcin(27), O=>x(27));
xorcomp29: XOR2 port map(I0=>calccrc(28), I1=>crcin(28), O=>x(28));
xorcomp30: XOR2 port map(I0=>calccrc(29), I1=>crcin(29), O=>x(29));
xorcomp31: XOR2 port map(I0=>calccrc(30), I1=>crcin(30), O=>x(30));
xorcomp32: XOR2 port map(I0=>calccrc(31), I1=>crcin(31), O=>x(31));
orcomp1: OR16 port map(I6=>x(6), I9=>x(9), I8=>x(8), I7=>x(7), I5=>x(5), I4=>x(4), I3=>x(3),
I2=>x(2), I15=>x(15), I14=>x(14), I13=>x(13), I12=>x(12), I11=>x(11), I10=>x(10), I1=>x(1), I0=>x(0),
O=>EQ1);
orcomp2: OR16 port map(I6=>x(16), I9=>x(17), I8=>x(18), I7=>x(19), I5=>x(20), I4=>x(21), I3=>x(22),
I2=>x(23), I15=>x(24), I14=>x(25), I13=>x(26), I12=>x(27), I11=>x(28), I10=>x(29), I1=>x(30),
I0=>x(31), O=>EQ2);
crcchkout <= not (EQ1 or EQ2);
end architecture compcrc_beh;
-- CRC OUT
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity crcout is
port(ldcr, crcz: in std_logic;
crcoutin, cin2: in std_logic_vector(31 downto 0);
crcoutout: out std_logic_vector(31 downto 0));
end entity crcout;
architecture crcout_beh of crcout is
signal sig: std_logic_vector(1 downto 0);
begin
sig <= crcz & ldcr;
process(sig, crcoutin, cin2) is
begin
case sig is
when "00" => crcoutout <= crcoutin;
when "01" => crcoutout <= cin2;
when "10" => crcoutout <= (others => '0');
when "11" => crcoutout <= (others => '0');
when others => crcoutout <= (others => '0');
end case;
end process;
end architecture crcout_beh;
-- CRC CALCULATED VALUE STORE
library IEEE;
use IEEE.std_logic_1164.all;
entity crcst is
port(clk : in std_logic;
crc_calc_in : in std_logic_vector(31 downto 0);
crc_calc_out : out std_logic_vector(31 downto 0));
end entity crcst;
architecture crcst_beh of crcst is
begin
process(clk, crc_calc_in)
begin
if (rising_edge(clk)) then
crc_calc_out <= crc_calc_in;
end if;
end process;
end architecture crcst_beh;
-- CRC OUTRAM MODULE
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity crcout_ram is
port(epout: in std_logic;
crc_cin, outramin: in std_logic_vector(31 downto 0);
outramout: out std_logic_vector(31 downto 0));
end entity crcout_ram;
architecture crcout_ram_beh of crcout_ram is
begin
process(epout, crc_cin, outramin) is
begin
case epout is
when '0' => outramout <= outramin;
when '1' => outramout <= crc_cin;
when others => outramout <= (others => '0');
end case;
end process;
end architecture crcout_ram_beh;
-- For getting instructions
library IEEE;
use IEEE.std_logic_1164.all;
entity tstin is
port(in_inst, clk: in std_logic;
ininstout: out std_logic);
end entity tstin;
architecture tstin_beh of tstin is
component FD is
port(D, C: in std_logic;
Q: out std_logic);
end component FD;
component gdi is
port(clk, fb: in std_logic;
diout: out std_logic);
end component gdi;
signal inbar0, inbar1, in0, in1, ininstout1: std_logic;
begin
dff_st0: FD port map(D=>in_inst, C=>clk, Q=>in0);
dff_st1: FD port map(D=>in0, C=>clk, Q=>in1);
inbar0 <= not(in0);
inbar1 <= not(in1);
gdicomp: gdi port map(clk=>clk, fb=>inbar0, diout=>ininstout1);
ininstout <= in_inst and ininstout1;
end architecture tstin_beh;
-- Getting desired inst
library IEEE;
use IEEE.std_logic_1164.all;
entity gdi is
port(clk, fb: in std_logic;
diout: out std_logic);
end entity gdi;
architecture gdi_beh of gdi is
begin
process(clk, fb) is
begin
if (falling_edge(clk)) then
diout <= fb;
end if;
end process;
end architecture gdi_beh;
-- 8 bit FLAG register
library IEEE;
use IEEE.std_logic_1164.all;
entity aereg is
port(clk, ldaer : in std_logic;
flagval_in : in std_logic_vector(7 downto 0);
aerout : out std_logic_vector(7 downto 0));
end entity aereg;
architecture aereg_beh of aereg is
begin
process(clk, ldaer, flagval_in)
begin
if (rising_edge(clk)) then
if (ldaer = '1') then
aerout <= flagval_in;
end if;
end if;
end process;
end architecture aereg_beh;
-- 8 bit OCR register
library IEEE;
use IEEE.std_logic_1164.all;
entity ocreg is
port(clk, ldocr : in std_logic;
val_in : in std_logic_vector(7 downto 0);
ocrout : out std_logic_vector(7 downto 0));
end entity ocreg;
architecture ocreg_beh of ocreg is
begin
process(clk, ldocr, val_in)
begin
if (rising_edge(clk)) then
if (ldocr = '1') then
ocrout <= val_in;
end if;
end if;
end process;
end architecture ocreg_beh;
-- 6 bit MOR register
library IEEE;
use IEEE.std_logic_1164.all;
entity moreg is
port(clk, ldmor : in std_logic;
mop_fmpkt_in: in std_logic_vector(5 downto 0);
mopout : out std_logic_vector(5 downto 0));
end entity moreg;
architecture moreg_beh of moreg is
begin
process(clk, ldmor, mop_fmpkt_in)
begin
if (rising_edge(clk)) then
if (ldmor = '1') then
mopout <= mop_fmpkt_in;
end if;
end if;
end process;
end architecture moreg_beh;
--ESS
--FULL ESS from 3 Stages
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
use IEEE.numeric_std.all;
library SYNOPSYS;
use SYNOPSYS.ATTRIBUTES.all;
entity esstop0 is
port(tag_in, value_in: in std_logic_vector(63 downto 0);
clk, clock, ess_we, ess_re, putin: in std_logic;
gf, pf, ess_full, le: out std_logic;
outvalue: out std_logic_vector(63 downto 0));
end entity esstop0;
architecture fulless_beh of esstop0 is
--Components
component First_Stage is
port(tag_in: in std_logic_vector(63 downto 0);
clk, put, get, empty, putin_fmTS, match_fmTS: in std_logic;
fmmux_addr: in std_logic_vector(4 downto 0);
matchout, putout, getout: out std_logic;
match_outaddr: out std_logic_vector(4 downto 0));
end component First_Stage;
component Second_Stage is
port(clk, clock, get, put, empin_fmTS, lexpd_fmTS, put_fmTS, matchin_fmTS: in std_logic;
matchin_fmFS: in std_logic;
mataddr_fmFS: in std_logic_vector(4 downto 0);
matchout_sec, putout_sec, getout_sec: out std_logic;
emptysig_out_sec, life_expd_out_sec, GF_out, PF_out: out std_logic;
mux_outaddr_sec: out std_logic_vector(4 downto 0));
end component Second_Stage;
component TSPE is
port(clk, get, put: in std_logic;
GF_fmsec, PF_fmsec, empty_fmsec, lifeexpd_fmsec, match_fmsec: in std_logic;
value_in: in std_logic_vector(63 downto 0);
muxaddr_fmsec: in std_logic_vector(4 downto 0);
GFOUT, PFOUT, ESSFULL, le: out std_logic;
OUTVALUE: out std_logic_vector(63 downto 0));
end component TSPE;
----signals
----FS signals
signal matchout_FS, getout_FS, putout_FS: std_logic;
signal muxaddr_fmsecst, matchaddr_FS: std_logic_vector(4 downto 0);
--SS signals
signal match_SS, EO_SS, put_SS, get_SS, LE_SS, GF_SS, PF_SS: std_logic;
----TS signals
signal gf_TS, pf_TS: std_logic;
begin
gf <= GF_SS;
pf <= PF_SS;
FS_comp: First_Stage port map(tag_in=>tag_in, clk=>clk, put=>ess_we, get=>ess_re, empty=>EO_SS,
putin_fmTS=>putin, match_fmTS=>match_SS, fmmux_addr=>muxaddr_fmsecst,
matchout=>matchout_FS, putout=>putout_FS, getout=>getout_FS, match_outaddr=>matchaddr_FS);
SS_comp: Second_Stage port map(clk=>clk, clock=>clock, get=>getout_FS, put=>putout_FS,
empin_fmTS=>EO_SS, lexpd_fmTS=>LE_SS, put_fmTS=>put_SS, matchin_fmTS=>match_SS,
matchin_fmFS=>matchout_FS, mataddr_fmFS=>matchaddr_FS, matchout_sec=>match_SS,
putout_sec=>put_SS, getout_sec=>get_SS, emptysig_out_sec=>EO_SS, life_expd_out_sec=>LE_SS,
GF_out=>GF_SS, PF_out=>PF_SS, mux_outaddr_sec=>muxaddr_fmsecst);
TS_comp: TSPE port map(clk=>clk, get=>get_SS, put=>put_SS, GF_fmsec=>GF_SS,
PF_fmsec=>PF_SS, empty_fmsec=>EO_SS, lifeexpd_fmsec=>LE_SS, match_fmsec=>match_SS,
value_in=>value_in, muxaddr_fmsec=>muxaddr_fmsecst, GFOUT=>gf_TS, PFOUT=>pf_TS,
ESSFULL=>ess_full, le=>le, OUTVALUE=>outvalue);
end architecture fulless_beh;
--Individual Components
--First Stage Pipeline ESS
library IEEE;
use IEEE.std_logic_1164.all;
entity First_Stage is
port(tag_in: in std_logic_vector(63 downto 0);
clk, put, get, empty, putin_fmTS, match_fmTS: in std_logic;
fmmux_addr: in std_logic_vector(4 downto 0);
matchout, putout, getout: out std_logic;
match_outaddr: out std_logic_vector(4 downto 0));
end entity First_Stage;
architecture First_Stage_beh of First_Stage is
--components
component FSPE is
port(tag_in: in std_logic_vector(63 downto 0);
clk, put, get, empty, putin_fmTS, match_fmTS: in std_logic;
fmmux_addr: in std_logic_vector(4 downto 0);
matchsig: out std_logic;
matchoutaddr: out std_logic_vector(4 downto 0));
end component FSPE;
component FSL is
port(clk: in std_logic;
putin, getin, matchin: in std_logic;
match_inaddr: in std_logic_vector(4 downto 0);
putout, getout, matchout: out std_logic;
match_outaddr: out std_logic_vector(4 downto 0));
end component FSL;
--signals
signal mat_signal: std_logic;
signal mat_address: std_logic_vector(4 downto 0);
begin
FSPE_comp: FSPE port map(tag_in=>tag_in, clk=>clk, put=>put, get=>get, empty=>empty,
putin_fmTS=>putin_fmTS, match_fmTS=>match_fmTS, fmmux_addr=>fmmux_addr,
matchsig=>mat_signal, matchoutaddr=>mat_address);
FSL_comp: FSL port map(clk=>clk, putin=>put, getin=>get, matchin=>mat_signal,
match_inaddr=>mat_address, putout=>putout, getout=>getout, matchout=>matchout,
match_outaddr=>match_outaddr);
end architecture First_Stage_beh;
--Individual Components
--First Stage of Pipeline ESS
library IEEE;
use IEEE.std_logic_1164.all;
entity FSPE is
port(tag_in: in std_logic_vector(63 downto 0);
clk, put, get, empty, putin_fmTS, match_fmTS: in std_logic;
fmmux_addr: in std_logic_vector(4 downto 0);
matchsig: out std_logic;
matchoutaddr: out std_logic_vector(4 downto 0));
end entity FSPE;
architecture FSPE_beh of FSPE is
component camfull is
port(tag_in : in std_logic_vector(63 downto 0);
ADDR : in std_logic_vector(4 downto 0);
WRITE_ENABLE : in std_logic;
ERASE_WRITE : in std_logic;
WRITE_RAM : in std_logic;
CLK : in std_logic;
MATCH_ENABLE : in std_logic;
MATCH_RST : in std_logic;
MATCH_SIG_OUT : out std_logic;
MATCH_ADDR: out std_logic_vector(4 downto 0));
end component camfull;
signal pg, write: std_logic;
begin
pg <= put or get;
write <= (empty and putin_fmTS and (not(match_fmTS))); -- has to be empty and put (not empty alone)
camfull_comp: camfull port map(tag_in=>tag_in, ADDR=>fmmux_addr, WRITE_ENABLE=>write,
ERASE_WRITE=>write, WRITE_RAM=>write, CLK=>clk, MATCH_ENABLE=>pg,
MATCH_RST=>pg, MATCH_SIG_OUT=>matchsig, MATCH_ADDR=>matchoutaddr);
end architecture FSPE_beh;
--Full Original CAM
-- single CAM module
library IEEE;
use IEEE.std_logic_1164.all;
entity camfull is
port(tag_in : in std_logic_vector(63 downto 0);
ADDR : in std_logic_vector(4 downto 0);
WRITE_ENABLE : in std_logic;
ERASE_WRITE : in std_logic;
WRITE_RAM : in std_logic;
CLK : in std_logic;
MATCH_ENABLE : in std_logic;
MATCH_RST : in std_logic;
MATCH_SIG_OUT : out std_logic;
MATCH_ADDR: out std_logic_vector(4 downto 0));
end entity camfull;
architecture camfull_beh of camfull is
component cam16x64_1 is
port(tag_in: in std_logic_vector(63 downto 0);
ADDR : in std_logic_vector(3 downto 0); -- Used by erase/write operation only
WRITE_ENABLE : in std_logic; -- Write Enable during 2 clock cycles
ERASE_WRITE : in std_logic; -- if '0' ERASE else WRITE, generate from WRITE_ENABLE at the CAMs' top level
WRITE_RAM : in std_logic; -- if '1' DATA_IN is WRITE in the RAM16x1s, generate from WRITE_ENABLE at the CAMs' top level
CLK : in std_logic;
MATCH_ENABLE : in std_logic;
MATCH_RST : in std_logic; -- Synchronous reset => MATCH = "00000000000000000"
MATCH : out std_logic_vector(15 downto 0));
end component cam16x64_1;
component ENCODE_4_LSB is
port(BINARY_ADDR : in std_logic_vector(31 downto 0);
MATCH_ADDR : out std_logic_vector(4 downto 0); -- Match address found
MATCH_OK : out std_logic); -- '1' if Match found
end component ENCODE_4_LSB;
signal match_sig1: std_logic_vector(15 downto 0);
signal match_sig2: std_logic_vector(15 downto 0);
signal match_sig: std_logic_vector(31 downto 0);
signal WE_1, WE_2, EW_1, EW_2, WR_1, WR_2, adnot: std_logic;
begin
adnot <= not (ADDR(4));
WE_1 <= adnot and WRITE_ENABLE;
WE_2 <= ADDR(4) and WRITE_ENABLE;
EW_1 <= adnot and ERASE_WRITE;
EW_2 <= ADDR(4) and ERASE_WRITE;
WR_1 <= adnot and WRITE_RAM;
WR_2 <= ADDR(4) and WRITE_RAM;
camfinal0: cam16x64_1 port map(tag_in=>tag_in, ADDR=>ADDR(3 downto 0),
WRITE_ENABLE=>WE_1, ERASE_WRITE=>EW_1, WRITE_RAM=>WR_1, CLK=>CLK,
MATCH_ENABLE=>MATCH_ENABLE, MATCH_RST=>MATCH_RST, MATCH=>match_sig1);
camfinal1: cam16x64_1 port map(tag_in=>tag_in, ADDR=>ADDR(3 downto 0),
WRITE_ENABLE=>WE_2, ERASE_WRITE=>EW_2, WRITE_RAM=>WR_2, CLK=>CLK,
MATCH_ENABLE=>MATCH_ENABLE, MATCH_RST=>MATCH_RST, MATCH=>match_sig2);
match_sig <= match_sig2 & match_sig1;
encoder: ENCODE_4_LSB port map(BINARY_ADDR=>match_sig,
MATCH_ADDR=>MATCH_ADDR, MATCH_OK=>MATCH_SIG_OUT);
end architecture camfull_beh;
--CAM 16x64
library IEEE;
use IEEE.std_logic_1164.all;
entity cam16x64_1 is
port(tag_in: in std_logic_vector(63 downto 0);
ADDR : in std_logic_vector(3 downto 0); -- Used by erase/write operation only
WRITE_ENABLE : in std_logic; -- Write Enable during 2 clock cycles
ERASE_WRITE : in std_logic; -- if '0' ERASE else WRITE, generate from WRITE_ENABLE at the CAMs' top level
WRITE_RAM : in std_logic; -- if '1' DATA_IN is WRITE in the RAM16x1s, generate from WRITE_ENABLE at the CAMs' top level
CLK : in std_logic;
MATCH_ENABLE : in std_logic;
MATCH_RST : in std_logic; -- Synchronous reset => MATCH = "00000000000000000"
MATCH : out std_logic_vector(15 downto 0));
end entity cam16x64_1;
architecture camtry1_beh of cam16x64_1 is
component CAM_RAMB4 is
port(DATA_IN : in std_logic_vector(7 downto 0); -- Data to compare or to write
ADDR : in std_logic_vector(3 downto 0); -- Used by erase/write operation only
WRITE_ENABLE : in std_logic; -- Write Enable during 2 clock cycles
ERASE_WRITE : in std_logic; -- if '0' ERASE else WRITE, generate from WRITE_ENABLE at the CAMs' top level
WRITE_RAM : in std_logic; -- if '1' DATA_IN is WRITE in the RAM16x1s, generate from WRITE_ENABLE at the CAMs' top level
CLK : in std_logic;
MATCH_ENABLE : in std_logic;
MATCH_RST : in std_logic; -- Synchronous reset => MATCH = "00000000000000000"
MATCH_OUT : out std_logic_vector(15 downto 0));
end component CAM_RAMB4;
signal match_out0, match_out1, match_out2, match_out3, match_out4, match_out5, match_out6,
match_out7: std_logic_vector(15 downto 0);
begin
camtry0: CAM_RAMB4 port map(DATA_IN=>tag_in(63 downto 56), ADDR=>ADDR,
WRITE_ENABLE=>WRITE_ENABLE, ERASE_WRITE=>ERASE_WRITE,
WRITE_RAM=>WRITE_RAM, CLK=>CLK, MATCH_ENABLE=>MATCH_ENABLE,
MATCH_RST=>MATCH_RST, MATCH_OUT=>match_out0);
camtry1: CAM_RAMB4 port map(DATA_IN=>tag_in(55 downto 48), ADDR=>ADDR,
WRITE_ENABLE=>WRITE_ENABLE, ERASE_WRITE=>ERASE_WRITE,
WRITE_RAM=>WRITE_RAM, CLK=>CLK, MATCH_ENABLE=>MATCH_ENABLE,
MATCH_RST=>MATCH_RST, MATCH_OUT=>match_out1);
camtry2: CAM_RAMB4 port map(DATA_IN=>tag_in(47 downto 40), ADDR=>ADDR,
WRITE_ENABLE=>WRITE_ENABLE, ERASE_WRITE=>ERASE_WRITE,
WRITE_RAM=>WRITE_RAM, CLK=>CLK, MATCH_ENABLE=>MATCH_ENABLE,
MATCH_RST=>MATCH_RST, MATCH_OUT=>match_out2);
camtry3: CAM_RAMB4 port map(DATA_IN=>tag_in(39 downto 32), ADDR=>ADDR,
WRITE_ENABLE=>WRITE_ENABLE, ERASE_WRITE=>ERASE_WRITE,
WRITE_RAM=>WRITE_RAM, CLK=>CLK, MATCH_ENABLE=>MATCH_ENABLE,
MATCH_RST=>MATCH_RST, MATCH_OUT=>match_out3);
camtry4: CAM_RAMB4 port map(DATA_IN=>tag_in(31 downto 24), ADDR=>ADDR,
WRITE_ENABLE=>WRITE_ENABLE, ERASE_WRITE=>ERASE_WRITE,
WRITE_RAM=>WRITE_RAM, CLK=>CLK, MATCH_ENABLE=>MATCH_ENABLE,
MATCH_RST=>MATCH_RST, MATCH_OUT=>match_out4);
camtry5: CAM_RAMB4 port map(DATA_IN=>tag_in(23 downto 16), ADDR=>ADDR,
WRITE_ENABLE=>WRITE_ENABLE, ERASE_WRITE=>ERASE_WRITE,
WRITE_RAM=>WRITE_RAM, CLK=>CLK, MATCH_ENABLE=>MATCH_ENABLE,
MATCH_RST=>MATCH_RST, MATCH_OUT=>match_out5);
camtry6: CAM_RAMB4 port map(DATA_IN=>tag_in(15 downto 8), ADDR=>ADDR,
WRITE_ENABLE=>WRITE_ENABLE, ERASE_WRITE=>ERASE_WRITE,
WRITE_RAM=>WRITE_RAM, CLK=>CLK, MATCH_ENABLE=>MATCH_ENABLE,
MATCH_RST=>MATCH_RST, MATCH_OUT=>match_out6);
camtry7: CAM_RAMB4 port map(DATA_IN=>tag_in(7 downto 0), ADDR=>ADDR,
WRITE_ENABLE=>WRITE_ENABLE, ERASE_WRITE=>ERASE_WRITE,
WRITE_RAM=>WRITE_RAM, CLK=>CLK, MATCH_ENABLE=>MATCH_ENABLE,
MATCH_RST=>MATCH_RST, MATCH_OUT=>match_out7);
MATCH <= match_out0 and match_out1 and match_out2 and match_out3 and match_out4 and
match_out5 and match_out6 and match_out7;
end architecture camtry1_beh;
-- Individual CAM module 
library IEEE; 
use IEEE.std_logic_1164.all;
entity CAM_RAMB4 is
port( DATA_IN : in std_logic_vector(7 downto 0); -- Data to compare or to write
ADDR : in std_logic_vector(3 downto 0); -- Used by erase/write operation only
WRITE_ENABLE : in std_logic; -- Write Enable during 2 clock cycles
ERASE_WRITE : in std_logic; -- if '0' ERASE else WRITE, generate from WRITE_ENABLE at
-- the CAMs' top level
WRITE_RAM : in std_logic; -- if '1' DATA_IN is WRITE in the RAM16x1s, generate from
-- WRITE_ENABLE at the CAMs' top level
CLK : in std_logic;
MATCH_ENABLE : in std_logic;
MATCH_RST : in std_logic; -- Synchronous reset => MATCH = "00000000000000000"
MATCH_OUT : out std_logic_vector(15 downto 0));
end CAM_RAMB4;
architecture CAM_RAMB4_arch of CAM_RAMB4 is
-- Components Declarations:
component INIT_8_RAM16x1s
port( DATA_IN : in std_logic_vector(7 downto 0);
ADDR : in std_logic_vector(3 downto 0);
WRITE_RAM : in std_logic;
CLK : in std_logic;
DATA_WRITE : out std_logic_vector(7 downto 0));
end component;
component INIT_RAMB4_S1_S16
port( DIA : in std_logic;
ENA : in std_logic;
ENB : in std_logic;
WEA : in std_logic;
RSTB : in std_logic;
CLK : in std_logic;
ADDRA : in std_logic_vector (11 downto 0);
ADDRB : in std_logic_vector (7 downto 0);
DOB : out std_logic_vector (15 downto 0));
end component;
-- Signal Declarations:
signal DATA_WRITE : std_logic_vector(7 downto 0); -- Data to be written in the RAMB4
signal ADDR_WRITE : std_logic_vector(11 downto 0); -- Combine write address from ADDR and DATA_WRITE
signal B_MATCH_RST : std_logic; -- inverter MATCH_RST active high
begin
B_MATCH_RST <= not MATCH_RST;
-- SelectRAM instantiation = 8 x RAM16x1s_1
RAM_ERASE: INIT_8_RAM16x1s
port map (
DATA_IN => DATA_IN,
ADDR => ADDR,
WRITE_RAM => WRITE_RAM,
CLK => CLK,
DATA_WRITE => DATA_WRITE
);
-- Select the write data for addressing
ADDR_WRITE(3 downto 0) <= ADDR(3 downto 0);
ADDR_WRITE(11 downto 4) <= DATA_WRITE(7 downto 0);
-- Select BlockRAM RAMB4_S1_S16 instantiation
RAMB4 : INIT_RAMB4_S1_S16
port map (
DIA => ERASE_WRITE,
ENA => WRITE_ENABLE,
ENB => MATCH_ENABLE,
WEA => WRITE_ENABLE,
RSTB => B_MATCH_RST,
CLK => CLK,
ADDRA => ADDR_WRITE(11 downto 0),
ADDRB => DATA_IN(7 downto 0),
DOB => MATCH_OUT(15 downto 0) );
end CAM_RAMB4_arch;
-- Init_RAMB4_S1_S16 module
library IEEE;
use IEEE.std_logic_1164.all;
entity INIT_RAMB4_S1_S16 is
port (
DIA : in std_logic;
ENA : in std_logic;
ENB : in std_logic;
WEA : in std_logic;
RSTB : in std_logic;
CLK : in std_logic; -- Same clock on ports A & B
ADDRA : in std_logic_vector (11 downto 0);
ADDRB : in std_logic_vector (7 downto 0);
DOB : out std_logic_vector (15 downto 0)
); -- unused input ports are tied to GND
end INIT_RAMB4_S1_S16;
architecture INIT_RAMB4_S1_S16_arch of INIT_RAMB4_S1_S16 is
component RAMB4_S1_S16
-- pragma synthesis_off
generic(
INIT_00 : bit_vector(255 downto 0) :=
X"0000000000000000000000000000000000000000000000000000000000000000";
INIT_01 : bit_vector(255 downto 0) :=
X"0000000000000000000000000000000000000000000000000000000000000000";
INIT_02 : bit_vector(255 downto 0) :=
X"0000000000000000000000000000000000000000000000000000000000000000";
INIT_03 : bit_vector(255 downto 0) :=
X"0000000000000000000000000000000000000000000000000000000000000000";
INIT_04 : bit_vector(255 downto 0) :=
X"0000000000000000000000000000000000000000000000000000000000000000";
INIT_05 : bit_vector(255 downto 0) :=
X"0000000000000000000000000000000000000000000000000000000000000000";
INIT_06 : bit_vector(255 downto 0) :=
X"0000000000000000000000000000000000000000000000000000000000000000";
INIT_07 : bit_vector(255 downto 0) :=
X"0000000000000000000000000000000000000000000000000000000000000000";
INIT_08 : bit_vector(255 downto 0) :=
X"0000000000000000000000000000000000000000000000000000000000000000";
INIT_09 : bit_vector(255 downto 0) :=
X"0000000000000000000000000000000000000000000000000000000000000000";
INIT_0A : bit_vector(255 downto 0) :=
X"0000000000000000000000000000000000000000000000000000000000000000";
INIT_0B : bit_vector(255 downto 0) :=
X"0000000000000000000000000000000000000000000000000000000000000000";
INIT_0C : bit_vector(255 downto 0) :=
X"0000000000000000000000000000000000000000000000000000000000000000";
INIT_0D : bit_vector(255 downto 0) :=
X"0000000000000000000000000000000000000000000000000000000000000000";
INIT_0E : bit_vector(255 downto 0) :=
X"0000000000000000000000000000000000000000000000000000000000000000";
INIT_0F : bit_vector(255 downto 0) :=
X"0000000000000000000000000000000000000000000000000000000000000000"
);
-- pragma synthesis_on
port (
DIA : in std_logic_vector (0 downto 0);
DIB : in std_logic_vector (15 downto 0);
ENA : in std_logic;
ENB : in std_logic;
WEA : in std_logic;
WEB : in std_logic;
RSTA : in std_logic;
RSTB : in std_logic;
CLKA : in std_logic;
CLKB : in std_logic;
ADDRA : in std_logic_vector (11 downto 0);
ADDRB : in std_logic_vector (7 downto 0);
DOA : out std_logic_vector (0 downto 0);
DOB : out std_logic_vector (15 downto 0)
);
end component;
attribute INIT_00: string;
attribute INIT_01: string;
attribute INIT_02: string;
attribute INIT_03: string;
attribute INIT_04: string;
attribute INIT_05: string;
attribute INIT_06: string;
attribute INIT_07: string;
attribute INIT_08: string;
attribute INIT_09: string;
attribute INIT_0A: string;
attribute INIT_0B: string;
attribute INIT_0C: string;
attribute INIT_0D: string;
attribute INIT_0E: string;
attribute INIT_0F: string;
attribute INIT_00 of RAMB4: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_01 of RAMB4: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_02 of RAMB4: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_03 of RAMB4: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_04 of RAMB4: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_05 of RAMB4: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_06 of RAMB4: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_07 of RAMB4: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_08 of RAMB4: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_09 of RAMB4: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0A of RAMB4: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0B of RAMB4: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0C of RAMB4: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0D of RAMB4: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0E of RAMB4: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0F of RAMB4: label is
"0000000000000000000000000000000000000000000000000000000000000000";
-- Signal Declarations: 
signal DIA_TMP : std_logic_vector(0 downto 0); -- to match RAMB4 input type
signal BUS16_GND : std_logic_vector(15 downto 0);
signal GND : std_logic;
begin
GND <= '0';
BUS16_GND <= (others => '0');
DIA_TMP(0) <= DIA;
-- Select BlockRAM RAMB4_S1_S16 instantiation
RAMB4: RAMB4_S1_S16
port map (
DIA => DIA_TMP,
DIB => BUS16_GND,
ENA => ENA,
ENB => ENB,
WEA => WEA,
WEB => GND,
RSTA => GND,
RSTB => RSTB,
CLKA => CLK,
CLKB => CLK,
ADDRA => ADDRA,
ADDRB => ADDRB,
DOA => open,
DOB => DOB);
end INIT_RAMB4_S1_S16_arch;
-- Init_8_RAM16x1s module
library IEEE;
use IEEE.std_logic_1164.all;
entity INIT_8_RAM16x1s is
port (
DATA_IN : in std_logic_vector(7 downto 0);
ADDR : in std_logic_vector(3 downto 0); -- Used by erase/write operation only
WRITE_RAM : in std_logic; -- if '1' DATA_IN is WRITE in the RAM16x1s
CLK : in std_logic;
DATA_WRITE : out std_logic_vector(7 downto 0)
);
end INIT_8_RAM16x1s;
architecture INIT_8_RAM16x1s_arch of INIT_8_RAM16x1s is
component RAM16x1s_1
-- pragma synthesis_off
generic(
INIT : bit_vector(15 downto 0) := X"0000"
);
-- pragma synthesis_on
port (
WE : in std_logic;
WCLK : in std_logic; -- inverted Clock
D : in std_logic;
A0 : in std_logic;
A1 : in std_logic;
A2 : in std_logic;
A3 : in std_logic;
O : out std_logic
);
end component;
attribute INIT: string;
attribute INIT of RAM_ERASE_0: label is "0000";
attribute INIT of RAM_ERASE_1: label is "0000";
attribute INIT of RAM_ERASE_2: label is "0000";
attribute INIT of RAM_ERASE_3: label is "0000";
attribute INIT of RAM_ERASE_4: label is "0000";
attribute INIT of RAM_ERASE_5: label is "0000";
attribute INIT of RAM_ERASE_6: label is "0000";
attribute INIT of RAM_ERASE_7: label is "0000";
begin 
-- Select RAM instantiation = 8 x RAM16x1s_1
RAM_ERASE_0: RAM16x1s_1
port map (
WE => WRITE_RAM,
WCLK => CLK,
D => DATA_IN(0),
A0 => ADDR(0),
A1 => ADDR(1),
A2 => ADDR(2),
A3 => ADDR(3),
O => DATA_WRITE(0)
);
RAM_ERASE_1: RAM16x1s_1
port map (
WE => WRITE_RAM,
WCLK => CLK,
D => DATA_IN(1),
A0 => ADDR(0),
A1 => ADDR(1),
A2 => ADDR(2),
A3 => ADDR(3),
O => DATA_WRITE(1)
);
RAM_ERASE_2: RAM16x1s_1
port map (
WE => WRITE_RAM,
WCLK => CLK,
D => DATA_IN(2),
A0 => ADDR(0),
A1 => ADDR(1),
A2 => ADDR(2),
A3 => ADDR(3),
O => DATA_WRITE(2)
);
RAM_ERASE_3: RAM16x1s_1
port map (
WE => WRITE_RAM,
WCLK => CLK,
D => DATA_IN(3),
A0 => ADDR(0),
A1 => ADDR(1),
A2 => ADDR(2),
A3 => ADDR(3),
O => DATA_WRITE(3)
);
RAM_ERASE_4: RAM16x1s_1
port map (
WE => WRITE_RAM,
WCLK => CLK,
D => DATA_IN(4),
A0 => ADDR(0),
A1 => ADDR(1),
A2 => ADDR(2),
A3 => ADDR(3),
O => DATA_WRITE(4)
);
RAM_ERASE_5: RAM16x1s_1
port map (
WE => WRITE_RAM,
WCLK => CLK,
D => DATA_IN(5),
A0 => ADDR(0),
A1 => ADDR(1),
A2 => ADDR(2),
A3 => ADDR(3),
O => DATA_WRITE(5)
);
RAM_ERASE_6: RAM16x1s_1
port map (
WE => WRITE_RAM,
WCLK => CLK,
D => DATA_IN(6),
A0 => ADDR(0),
A1 => ADDR(1),
A2 => ADDR(2),
A3 => ADDR(3),
O => DATA_WRITE(6)
);
RAM_ERASE_7: RAM16x1s_1
port map (
WE => WRITE_RAM,
WCLK => CLK,
D => DATA_IN(7),
A0 => ADDR(0),
A1 => ADDR(1),
A2 => ADDR(2),
A3 => ADDR(3),
O => DATA_WRITE(7)
);
end INIT_8_RAM16x1s_arch;
-- 32 to 5 encoder
library IEEE;
use IEEE.std_logic_1164.all;
entity ENCODE_4_LSB is
port (
BINARY_ADDR : in std_logic_vector(31 downto 0);
MATCH_ADDR : out std_logic_vector(4 downto 0); -- Match address found
MATCH_OK : out std_logic -- '1' if MATCH found
);
end entity ENCODE_4_LSB;
architecture ENCODE_4_LSB_arch of ENCODE_4_LSB is
begin
GENERATE_ADDRESS: process (BINARY_ADDR)
begin
case BINARY_ADDR(31 downto 0) is
when "00000000000000000000000000000001" => MATCH_ADDR <= "00000";
when "00000000000000000000000000000010" => MATCH_ADDR <= "00001";
when "00000000000000000000000000000100" => MATCH_ADDR <= "00010";
when "00000000000000000000000000001000" => MATCH_ADDR <= "00011";
when "00000000000000000000000000010000" => MATCH_ADDR <= "00100";
when "00000000000000000000000000100000" => MATCH_ADDR <= "00101";
when "00000000000000000000000001000000" => MATCH_ADDR <= "00110";
when "00000000000000000000000010000000" => MATCH_ADDR <= "00111";
when "00000000000000000000000100000000" => MATCH_ADDR <= "01000";
when "00000000000000000000001000000000" => MATCH_ADDR <= "01001";
when "00000000000000000000010000000000" => MATCH_ADDR <= "01010";
when "00000000000000000000100000000000" => MATCH_ADDR <= "01011";
when "00000000000000000001000000000000" => MATCH_ADDR <= "01100";
when "00000000000000000010000000000000" => MATCH_ADDR <= "01101";
when "00000000000000000100000000000000" => MATCH_ADDR <= "01110";
when "00000000000000001000000000000000" => MATCH_ADDR <= "01111";
when "00000000000000010000000000000000" => MATCH_ADDR <= "10000";
when "00000000000000100000000000000000" => MATCH_ADDR <= "10001";
when "00000000000001000000000000000000" => MATCH_ADDR <= "10010";
when "00000000000010000000000000000000" => MATCH_ADDR <= "10011";
when "00000000000100000000000000000000" => MATCH_ADDR <= "10100";
when "00000000001000000000000000000000" => MATCH_ADDR <= "10101";
when "00000000010000000000000000000000" => MATCH_ADDR <= "10110";
when "00000000100000000000000000000000" => MATCH_ADDR <= "10111";
when "00000001000000000000000000000000" => MATCH_ADDR <= "11000";
when "00000010000000000000000000000000" => MATCH_ADDR <= "11001";
when "00000100000000000000000000000000" => MATCH_ADDR <= "11010";
when "00001000000000000000000000000000" => MATCH_ADDR <= "11011";
when "00010000000000000000000000000000" => MATCH_ADDR <= "11100";
when "00100000000000000000000000000000" => MATCH_ADDR <= "11101";
when "01000000000000000000000000000000" => MATCH_ADDR <= "11110";
when "10000000000000000000000000000000" => MATCH_ADDR <= "11111";
when others =>
MATCH_ADDR <= (others => 'X');
end case;
end process GENERATE_ADDRESS;
-- Generate the match signal if one or more match(es) is/are found
GENERATE_MATCH: process (BINARY_ADDR)
begin
if (BINARY_ADDR = "00000000000000000000000000000000") then
MATCH_OK <= '0';
else
MATCH_OK <= '1';
end if;
end process GENERATE_MATCH;
end architecture ENCODE_4_LSB_arch;
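-- Illustrative testbench sketch for the 32 to 5 encoder above. It is not part of
-- the original design. It assumes ENCODE_4_LSB is compiled into library work with
-- the ports BINARY_ADDR, MATCH_ADDR and MATCH_OK used in its architecture. The
-- name ENCODE_4_LSB_tb and the stimulus values are hypothetical.
library IEEE;
use IEEE.std_logic_1164.all;
entity ENCODE_4_LSB_tb is
end entity ENCODE_4_LSB_tb;
architecture tb of ENCODE_4_LSB_tb is
signal BINARY_ADDR : std_logic_vector(31 downto 0) := (others => '0');
signal MATCH_ADDR : std_logic_vector(4 downto 0);
signal MATCH_OK : std_logic;
begin
dut: entity work.ENCODE_4_LSB
port map(BINARY_ADDR=>BINARY_ADDR, MATCH_ADDR=>MATCH_ADDR, MATCH_OK=>MATCH_OK);
stimulus: process is
begin
-- No match line asserted: MATCH_OK should be '0'
wait for 10 ns;
assert MATCH_OK = '0' report "MATCH_OK should be '0' when no line is set" severity error;
-- One-hot input on bit 5: the case table maps it to address "00101"
BINARY_ADDR <= (5 => '1', others => '0');
wait for 10 ns;
assert MATCH_OK = '1' and MATCH_ADDR = "00101"
report "bit 5 should encode to 00101" severity error;
-- Two lines set: the input is not one-hot, so the address falls into the
-- "others" branch, but MATCH_OK is still '1'
BINARY_ADDR <= (5 => '1', 9 => '1', others => '0');
wait for 10 ns;
assert MATCH_OK = '1' report "MATCH_OK should be '1' for multiple matches" severity error;
wait;
end process stimulus;
end architecture tb;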
--First Stage Latch of Pipeline ESS 
library IEEE;
use IEEE.std_logic_1164.all;
entity FSL is
port(clk: in std_logic;
putin, getin, matchin: in std_logic;
match_inaddr: in std_logic_vector(4 downto 0);
putout, getout, matchout: out std_logic;
match_outaddr: out std_logic_vector(4 downto 0));
end entity FSL;
architecture FSL_beh of FSL is
signal psig, gsig: std_logic;
begin
FSL_Process1: process(clk, putin, getin) is
begin
if (rising_edge(clk)) then
psig <= putin;
gsig <= getin;
end if;
end process FSL_Process1;
FSL_Process2: process(clk, psig, gsig, matchin, match_inaddr) is
begin
if (falling_edge(clk)) then
putout <= psig;
getout <= gsig;
matchout <= matchin;
match_outaddr <= match_inaddr;
end if;
end process FSL_Process2;
end architecture FSL_beh;
--Second Stage Pipeline ESS 
library IEEE; 
use IEEE.std_logic_1164.all;
entity Second_Stage is
port(clk, clock, get, put, empin_fmTS, lexpd_fmTS, put_fmTS, matchin_fmTS: in std_logic;
matchin_fmFS: in std_logic;
mataddr_fmFS: in std_logic_vector(4 downto 0);
matchout_sec, putout_sec, getout_sec: out std_logic;
emptysig_out_sec, life_expd_out_sec, GF_out, PF_out: out std_logic;
mux_outaddr_sec: out std_logic_vector(4 downto 0));
end entity Second_Stage;
architecture Second_Stage_beh of Second_Stage is
--Components
component SSPE is
port(clk, clock, get, put, empin_fmTS, lexpd_fmTS, put_fmTS, matchin_fmTS: in std_logic;
matchin_fmFS: in std_logic;
mataddr_fmFS: in std_logic_vector(4 downto 0);
mux_addrout: out std_logic_vector(4 downto 0);
empty_out, life_expd_out, GF, PF: out std_logic);
end component SSPE;
component SSL is
port(clk: in std_logic;
matchin_sec, putin_sec, getin_sec: in std_logic;
emptysig_in, life_expd_in, GF_in, PF_in: in std_logic;
mux_inaddr_sec: in std_logic_vector(4 downto 0);
matchout_sec, putout_sec, getout_sec: out std_logic;
emptysig_out_sec, life_expd_out_sec, GF_out, PF_out: out std_logic;
mux_outaddr_sec: out std_logic_vector(4 downto 0));
end component SSL;
--signals
signal muxoutsig: std_logic_vector(4 downto 0);
signal emptyoutsig, lifeexpdsig, GF_sig, PF_sig: std_logic;
begin
SSPE_comp: SSPE port map(clk=>clk, clock=>clock, get=>get, put=>put, empin_fmTS=>empin_fmTS,
lexpd_fmTS=>lexpd_fmTS, put_fmTS=>put_fmTS, matchin_fmTS=>matchin_fmTS,
matchin_fmFS=>matchin_fmFS, mataddr_fmFS=>mataddr_fmFS, mux_addrout=>muxoutsig,
empty_out=>emptyoutsig, life_expd_out=>lifeexpdsig, GF=>GF_sig, PF=>PF_sig);
SSL_comp: SSL port map(clk=>clk, matchin_sec=>matchin_fmFS, putin_sec=>put, getin_sec=>get,
emptysig_in=>emptyoutsig, life_expd_in=>lifeexpdsig, GF_in=>GF_sig, PF_in=>PF_sig,
mux_inaddr_sec=>muxoutsig, matchout_sec=>matchout_sec, putout_sec=>putout_sec,
getout_sec=>getout_sec, emptysig_out_sec=>emptysig_out_sec, life_expd_out_sec=>life_expd_out_sec,
GF_out=>GF_out, PF_out=>PF_out, mux_outaddr_sec=>mux_outaddr_sec);
end architecture Second_Stage_beh;
--Individual Components 
-- Second Stage of Pipeline ESS 
library IEEE; 
use IEEE.std_logic_1164.all;
entity SSPE is
port(clk, clock, get, put, empin_fmTS, lexpd_fmTS, put_fmTS, matchin_fmTS: in std_logic;
matchin_fmFS: in std_logic;
mataddr_fmFS: in std_logic_vector(4 downto 0);
mux_addrout: out std_logic_vector(4 downto 0);
empty_out, life_expd_out, GF, PF: out std_logic);
end entity SSPE;
architecture SSPE_beh of SSPE is
--components
--empty ram
component empram0 is
port(addr: in std_logic_vector(4 downto 0);
data_in_emp0: in std_logic;
data_out_emp0: out std_logic;
emp_loc_addr: out std_logic_vector(4 downto 0);
empout: out std_logic_vector(31 downto 0);
clk: in std_logic;
we_emp0: in std_logic);
end component empram0;
--check empty
component empcount is
port(emptysig: in std_logic_vector(31 downto 0);
chk_empty: out std_logic);
end component empcount;
--mux address
component mux1 is
port (a1: in STD_LOGIC_VECTOR (4 downto 0);
b1: in STD_LOGIC_VECTOR (4 downto 0);
s1: in STD_LOGIC;
y1: out STD_LOGIC_VECTOR (4 downto 0) );
end component mux1;
--exp. time ram
component exptime_ram is
port(clk, we, en, rst: in std_logic;
addr: in std_logic_vector(4 downto 0);
din: in std_logic_vector(7 downto 0);
dout: out std_logic_vector(7 downto 0));
end component exptime_ram;
--exp. time calc
component exp_calc is
port(exptime_in: in std_logic_vector(7 downto 0); -- originally it has to be 10 bits but now for checking 8 bits
clock, chklife: in std_logic;
life_expd: out std_logic;
exptime_out: out std_logic_vector(7 downto 0));
end component exp_calc;
--signals
signal int_mux_addr, empty_addr: std_logic_vector(4 downto 0);
signal empout_full: std_logic_vector(31 downto 0);
signal expdatain, expdataout: std_logic_vector(7 downto 0);
signal empsig, data_outsig, we_exp_sig, we_emp_sig, life_expd_sig, ensig, rstsig, chksig: std_logic;
begin
mux_addrout <= int_mux_addr;
empty_out <= empsig;
life_expd_out <= life_expd_sig;
GF <= ( ((not(matchin_fmFS)) and get) or (matchin_fmFS and life_expd_sig and get));
PF <= ( (not(matchin_fmFS)) and (not(empsig)) and put);
we_exp_sig <= ( (put_fmTS and (not(matchin_fmTS)) and empin_fmTS) or (put_fmTS and matchin_fmTS
and lexpd_fmTS) );
we_emp_sig <= ( (put_fmTS and matchin_fmTS) or (put_fmTS and (not(matchin_fmTS)) and
empin_fmTS) );
ensig <= '1';
rstsig <= '0';
chksig <= matchin_fmFS;
empram_comp: empram0 port map(addr=>int_mux_addr, data_in_emp0=>empin_fmTS,
data_out_emp0=>data_outsig, emp_loc_addr=>empty_addr, empout=>empout_full, clk=>clk,
we_emp0=>we_emp_sig);
empcnt_comp: empcount port map(emptysig=>empout_full, chk_empty=>empsig);
addrmux_comp: mux1 port map(a1=>empty_addr, b1=>mataddr_fmFS, s1=>matchin_fmFS,
y1=>int_mux_addr);
expram_comp: exptime_ram port map(clk=>clk, we=>we_exp_sig, en=>ensig, rst=>rstsig,
addr=>int_mux_addr, din=>expdatain, dout=>expdataout);
expcalc_comp: exp_calc port map(exptime_in=>expdataout, clock=>clock, chklife=>chksig,
life_expd=>life_expd_sig, exptime_out=>expdatain);
end architecture SSPE_beh;
--Individual components 
-- Exp Mem Design using Block RAM 
library IEEE; 
use IEEE.std_logic_1164.all;
entity exptime_ram is
port(clk, we, en, rst: in std_logic;
addr: in std_logic_vector(4 downto 0);
din: in std_logic_vector(7 downto 0);
dout: out std_logic_vector(7 downto 0));
end entity exptime_ram;
architecture behaviour of exptime_ram is
component RAMB4_S8 is
port(ADDR: in std_logic_vector(8 downto 0);
CLK: in std_logic;
DI: in std_logic_vector(7 downto 0);
DO: out std_logic_vector(7 downto 0);
EN, RST, WE: in std_logic);
end component RAMB4_S8;
signal msbaddr: std_logic_vector(3 downto 0);
signal addr_expram: std_logic_vector(8 downto 0);
begin
msbaddr <= "0000";
addr_expram <= msbaddr & addr;
ram0: RAMB4_S8 port map(ADDR=>addr_expram, CLK=>clk, DI=>din, DO=>dout, EN=>en,
RST=>rst, WE=>we);
end architecture behaviour;
-- EXPIRATION TIME CALCULATION MODULE 
library IEEE; 
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity exp_calc is
port(exptime_in: in std_logic_vector(7 downto 0); -- originally it has to be 10 bits but now for checking 8 bits
clock, chklife: in std_logic;
life_expd: out std_logic;
exptime_out: out std_logic_vector(7 downto 0));
end entity exp_calc;
architecture expcalc_beh of exp_calc is
signal gcrsig: std_logic_vector(7 downto 0);
begin
expcalcprocess: process(gcrsig, exptime_in, chklife) is
begin
if (chklife = '1') then
if (gcrsig <= exptime_in) then
life_expd <= '0'; -- life time is not expired
else
life_expd <= '1'; -- life time expired
end if;
else
life_expd <= '0'; -- default
end if;
end process expcalcprocess;
gcrprocess: process(clock, gcrsig) is
variable tau: std_logic_vector(7 downto 0);
begin
tau := "00001111";
exptime_out <= gcrsig + tau; -- new life time
if (rising_edge(clock)) then
gcrsig <= gcrsig + 1;
end if;
end process gcrprocess;
end architecture expcalc_beh;
-- EMPTY RAM Module 
library IEEE; 
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity empram0 is
port(addr: in std_logic_vector(4 downto 0);
data_in_emp0: in std_logic;
data_out_emp0: out std_logic;
emp_loc_addr: out std_logic_vector(4 downto 0);
empout: out std_logic_vector(31 downto 0);
clk: in std_logic;
we_emp0: in std_logic);
end entity empram0;
architecture behavioural of empram0 is 
-- function for getting integer 
function getint(signal data: std_logic_vector) return integer is 
variable count: integer range 0 to 32; 
begin 
for i in data'range loop 





end function getint; 
type mem_array is array(0 to 31) of std_logic;
signal emptyout: std_logic_vector(31 downto 0);
signal empty_mem: mem_array;
signal address: integer;
signal emploc: integer range 0 to 32;
begin
address <= conv_integer(addr);
emploc <= getint(emptyout);
emp_loc_addr <= conv_std_logic_vector(emploc, 5);
mem_process: process(clk, addr, we_emp0, data_in_emp0, empty_mem, emptyout) is
begin
if (rising_edge(clk)) then
if (we_emp0 = '1') then
empty_mem(address) <= data_in_emp0;
else
data_out_emp0 <= empty_mem(address);
end if;
end if;
for i in 31 downto 0 loop
emptyout(i) <= empty_mem(i);
end loop;
empout <= emptyout;
end process mem_process;
end architecture behavioural;
-- Count the number for zeros for empty location 
library IEEE; 
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity empcount is
port(emptysig: in std_logic_vector(31 downto 0);
chk_empty: out std_logic);
end entity empcount;
architecture empcnt_beh of empcount is
signal c: std_logic;
begin
process(emptysig, c) is
begin
c <= ((emptysig(31) and emptysig(30) and emptysig(29) and emptysig(28)) and (emptysig(27) and
emptysig(26) and emptysig(25) and emptysig(24)) and (emptysig(23) and emptysig(22) and emptysig(21)
and emptysig(20)) and (emptysig(19) and emptysig(18) and emptysig(17) and emptysig(16)) and
(emptysig(15) and emptysig(14) and emptysig(13) and emptysig(12)) and (emptysig(11) and emptysig(10)
and emptysig(9) and emptysig(8)) and (emptysig(7) and emptysig(6) and emptysig(5) and emptysig(4)) and
(emptysig(3) and emptysig(2) and emptysig(1) and emptysig(0)));
chk_empty <= not (c);
end process;
end architecture empcnt_beh;
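-- Illustrative testbench sketch for empcount above. It is not part of the
-- original design. It assumes empcount is compiled into library work. As coded,
-- chk_empty is the inverse of the AND of all 32 bits, so it is '0' only when
-- every bit of emptysig is '1' and '1' as soon as any bit is '0'. The name
-- empcount_tb and the stimulus values are hypothetical.
library IEEE;
use IEEE.std_logic_1164.all;
entity empcount_tb is
end entity empcount_tb;
architecture tb of empcount_tb is
signal emptysig : std_logic_vector(31 downto 0) := (others => '1');
signal chk_empty : std_logic;
begin
dut: entity work.empcount port map(emptysig=>emptysig, chk_empty=>chk_empty);
stimulus: process is
begin
-- All 32 bits set: the AND reduction is '1', so chk_empty should be '0'
wait for 10 ns;
assert chk_empty = '0' report "chk_empty should be '0' when all bits are '1'" severity error;
-- Clear a single bit: the AND reduction drops to '0', so chk_empty should be '1'
emptysig(7) <= '0';
wait for 10 ns;
assert chk_empty = '1' report "chk_empty should be '1' when any bit is '0'" severity error;
wait;
end process stimulus;
end architecture tb;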
--MUX for address 
library IEEE; 
use IEEE.std_logic_1164.all;
entity mux1 is
port (a1: in STD_LOGIC_VECTOR (4 downto 0);
b1: in STD_LOGIC_VECTOR (4 downto 0);
s1: in STD_LOGIC;
y1: out STD_LOGIC_VECTOR (4 downto 0) );
end entity mux1;
architecture mux_arch1 of mux1 is
begin
process (a1, b1, s1)
begin
case s1 is
when '0' => y1 <= a1;
when '1' => y1 <= b1;
when others => y1 <= (others => '0');
end case;
end process;
end mux_arch1;
--Second Stage Latch of Pipeline ESS 
library IEEE; 
use IEEE.std_logic_1164.all;
entity SSL is
port(clk: in std_logic;
matchin_sec, putin_sec, getin_sec: in std_logic;
emptysig_in, life_expd_in, GF_in, PF_in: in std_logic;
mux_inaddr_sec: in std_logic_vector(4 downto 0);
matchout_sec, putout_sec, getout_sec: out std_logic;
emptysig_out_sec, life_expd_out_sec, GF_out, PF_out: out std_logic;
mux_outaddr_sec: out std_logic_vector(4 downto 0));
end entity SSL;
architecture SSL_beh of SSL is
signal m2sig, p2sig, g2sig, e2sig, l2sig, GF2sig, PF2sig: std_logic;
signal maddr2_sig: std_logic_vector(4 downto 0);
begin
SSL_Process1: process(clk, matchin_sec, putin_sec, getin_sec, emptysig_in, life_expd_in, GF_in, PF_in,
mux_inaddr_sec) is
begin
if (rising_edge(clk)) then
m2sig <= matchin_sec;
p2sig <= putin_sec;
g2sig <= getin_sec;
e2sig <= emptysig_in;
--l2sig <= life_expd_in;
GF2sig <= GF_in;
PF2sig <= PF_in;
maddr2_sig <= mux_inaddr_sec;
end if;
end process SSL_Process1;




matchout_sec <= m2sig; 
putout_sec <= p2sig;
getout_sec <= g2sig;
emptysig_out_sec <= e2sig;
life_expd_out_sec <= life_expd_in;
GF_out <= GF2sig;
PF_out <= PF2sig;
mux_outaddr_sec <= maddr2_sig;
end if;
end process SSL_Process2;
end architecture SSL_beh;
--Third Stage of Pipeline ESS 
library IEEE; 
use IEEE.std_logic_1164.all;
entity TSPE is
port(clk, get, put: in std_logic;
GF_fmsec, PF_fmsec, empty_fmsec, lifeexpd_fmsec, match_fmsec: in std_logic;
value_in: in std_logic_vector(63 downto 0);
muxaddr_fmsec: in std_logic_vector(4 downto 0);
GFOUT, PFOUT, ESSFULL, le: out std_logic;
OUTVALUE: out std_logic_vector(63 downto 0));
end entity TSPE;
architecture TSPE_beh of TSPE is
--components
component ram_val is
port(clk, we_val, en, rst: in std_logic;
addr: in std_logic_vector(4 downto 0);
data_in_val: in std_logic_vector(63 downto 0);
data_out_val: out std_logic_vector(63 downto 0));
end component ram_val;
component mux is
port(a: in STD_LOGIC_VECTOR (63 downto 0);
b: in STD_LOGIC_VECTOR (63 downto 0);
s: in STD_LOGIC;
y: out STD_LOGIC_VECTOR (63 downto 0) );
end component mux;
--signals
signal zerosig, muxvalout, OUTsig: std_logic_vector(63 downto 0);
signal rst_zero, en_one, wesig, ggfsig: std_logic;
begin
zerosig <= (others => '0');
rst_zero <= '0';
en_one <= '1';
wesig <= ( (get and match_fmsec and lifeexpd_fmsec) or (get and (not(match_fmsec))) or (put and
match_fmsec) or (put and (not(match_fmsec)) and empty_fmsec) );
--outputs
GFOUT <= GF_fmsec;
PFOUT <= PF_fmsec;
le <= lifeexpd_fmsec;
ESSFULL <= (not(empty_fmsec));
ggfsig <= GF_fmsec or (not(get));
muxval_comp: mux port map(a=>value_in, b=>zerosig, s=>GF_fmsec, y=>muxvalout);
muxout_comp: mux port map(a=>OUTsig, b=>zerosig, s=>ggfsig, y=>OUTVALUE);
valram_comp: ram_val port map(clk=>clk, we_val=>wesig, en=>en_one, rst=>rst_zero,
addr=>muxaddr_fmsec, data_in_val=>muxvalout, data_out_val=>OUTsig);
end architecture TSPE_beh;
--Individual Components 
-- For VALUE RAM 
library IEEE; 
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity ram_val is
port(clk, we_val, en, rst: in std_logic;
addr: in std_logic_vector(4 downto 0);
data_in_val: in std_logic_vector(63 downto 0);
data_out_val: out std_logic_vector(63 downto 0));
end entity ram_val;
architecture ramval_behave of ram_val is
component valram is
port(clk, we, en, rst: in std_logic;
addr: in std_logic_vector(4 downto 0);
din: in std_logic_vector(15 downto 0);
dout: out std_logic_vector(15 downto 0));
end component valram;
begin
valueram0: valram port map(clk=>clk, we=>we_val, en=>en, rst=>rst, addr=>addr, din=>data_in_val(15
downto 0), dout=>data_out_val(15 downto 0));
valueram1: valram port map(clk=>clk, we=>we_val, en=>en, rst=>rst, addr=>addr, din=>data_in_val(31
downto 16), dout=>data_out_val(31 downto 16));
valueram2: valram port map(clk=>clk, we=>we_val, en=>en, rst=>rst, addr=>addr, din=>data_in_val(47
downto 32), dout=>data_out_val(47 downto 32));
valueram3: valram port map(clk=>clk, we=>we_val, en=>en, rst=>rst, addr=>addr, din=>data_in_val(63
downto 48), dout=>data_out_val(63 downto 48));
end architecture ramval_behave;
-- Value Mem Design using Block RAM
library IEEE;
use IEEE.std_logic_1164.all;
entity valram is
port(clk, we, en, rst: in std_logic;
addr: in std_logic_vector(4 downto 0);
din: in std_logic_vector(15 downto 0);
dout: out std_logic_vector(15 downto 0));
end entity valram;
architecture behave_valram of valram is
component RAMB4_S16 is
port(ADDR: in std_logic_vector(7 downto 0);
CLK: in std_logic;
DI: in std_logic_vector(15 downto 0);
DO: out std_logic_vector(15 downto 0);
EN, RST, WE: in std_logic);
end component RAMB4_S16;
signal msbvaladdr: std_logic_vector(2 downto 0);
signal addr_valram: std_logic_vector(7 downto 0);
begin
msbvaladdr <= "000";
addr_valram <= msbvaladdr & addr;
ram0: RAMB4_S16 port map(ADDR=>addr_valram, CLK=>clk, DI=>din, DO=>dout, EN=>en,
RST=>rst, WE=>we);
end architecture behave_valram;
--MUX for VALUE 
library IEEE; 
use IEEE.std_logic_1164.all;
entity mux is
port(a: in STD_LOGIC_VECTOR (63 downto 0);
b: in STD_LOGIC_VECTOR (63 downto 0);
s: in STD_LOGIC;
y: out STD_LOGIC_VECTOR (63 downto 0));
end entity mux;
architecture mux_arch of mux is
begin
process (a, b, s)
begin
if (s = '0') then
y <= a;
else y <= b;
end if;
end process;
end architecture mux_arch;
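-- Illustrative self-checking testbench for the VALUE mux (a sketch, not part of the original design)
-- Assumptions: the name mux_tb, the test patterns, and the 10 ns settle time are hypothetical.
library IEEE;
use IEEE.std_logic_1164.all;
entity mux_tb is
end entity mux_tb;
architecture mux_tb_beh of mux_tb is
component mux is
port(a: in STD_LOGIC_VECTOR (63 downto 0);
b: in STD_LOGIC_VECTOR (63 downto 0);
s: in STD_LOGIC;
y: out STD_LOGIC_VECTOR (63 downto 0));
end component mux;
signal a_sig: std_logic_vector(63 downto 0) := x"00000000000000AA";
signal b_sig: std_logic_vector(63 downto 0) := x"0000000000000055";
signal s_sig: std_logic := '0';
signal y_sig: std_logic_vector(63 downto 0);
begin
dut: mux port map(a=>a_sig, b=>b_sig, s=>s_sig, y=>y_sig);
stim: process is
begin
-- s = '0' should pass input a through to y
s_sig <= '0';
wait for 10 ns;
assert y_sig = a_sig report "mux: expected y = a when s = '0'" severity error;
-- s = '1' should pass input b through to y
s_sig <= '1';
wait for 10 ns;
assert y_sig = b_sig report "mux: expected y = b when s = '1'" severity error;
wait;
end process stim;
end architecture mux_tb_beh;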
-- ETM/LTC stage Register
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity ex3_ex4_reg is
port(clk, EX_Flush_in: in std_logic;
braddrin: in std_logic_vector(15 downto 0);
ctrlinEX: in std_logic_vector(24 downto 0);
opinEX: in std_logic_vector(5 downto 0);
WB_in_fm_ex: in std_logic_vector(3 downto 0);
RS1_in_fm_ex, RS2_in_fm_ex, RD_in_fm_ex, TR_in_fm_ex, VR_in_fm_ex: in std_logic_vector(4
downto 0);
aluout_fm_ex, pktout_fm_ex, GPR1in, GPR2in: in std_logic_vector(63 downto 0);
braddrout: out std_logic_vector(15 downto 0);
ctrloutEX: out std_logic_vector(24 downto 0);
opoutEX: out std_logic_vector(5 downto 0);
aluout_to_wb, pktout_to_wb, GPR1out, GPR2out: out std_logic_vector(63 downto 0);
RS1_out_to_regs, RS2_out_to_regs, RD_out_to_regs, TR_out_to_regs, VR_out_to_regs: out
std_logic_vector(4 downto 0);
WB_out_fm_wb: out std_logic_vector(3 downto 0));
end entity ex3_ex4_reg;
architecture ex34_beh of ex3_ex4_reg is
signal chkoutalu: std_logic_vector(63 downto 0);
begin
ex3process: process(clk, chkoutalu, braddrin, ctrlinEX, opinEX, WB_in_fm_ex, RD_in_fm_ex,
TR_in_fm_ex, VR_in_fm_ex, aluout_fm_ex, pktout_fm_ex, GPR1in, GPR2in, EX_Flush_in) is
begin
if(falling_edge(clk)) then
case EX_Flush_in is
when '0' =>
WB_out_fm_wb <= WB_in_fm_ex;
RD_out_to_regs <= RD_in_fm_ex;
TR_out_to_regs <= TR_in_fm_ex;
VR_out_to_regs <= VR_in_fm_ex;
aluout_to_wb <= chkoutalu;
pktout_to_wb <= pktout_fm_ex;
ctrloutEX <= ctrlinEX;
opoutEX <= opinEX;
RS1_out_to_regs <= RS1_in_fm_ex;
RS2_out_to_regs <= RS2_in_fm_ex;
GPR1out <= GPR1in;
GPR2out <= GPR2in;
braddrout <= braddrin;
when '1' =>
WB_out_fm_wb <= (others => '0');
RD_out_to_regs <= (others => '0');
TR_out_to_regs <= (others => '0');
VR_out_to_regs <= (others => '0');
aluout_to_wb <= (others => '0');
pktout_to_wb <= (others => '0');
ctrloutEX <= (others => '0');
opoutEX <= (others => '0');
RS1_out_to_regs <= (others => '0');
RS2_out_to_regs <= (others => '0');
GPR1out <= (others => '0');
GPR2out <= (others => '0');
when others => null;
end case;
end if;
end process ex3process;
chkprocess: process(chkoutalu, opinEX, aluout_fm_ex, pktout_fm_ex) is
begin
case opinEX is
when "010101" =>
chkoutalu <= pktout_fm_ex;
when others =>
chkoutalu <= aluout_fm_ex;
end case;
end process chkprocess;
end architecture ex34_beh;
5. LTC Stage 
-- LTC stage Top
library IEEE;
use IEEE.std_logic_1164.all;
entity ex4top is
port(clk: in std_logic;
WBctrlin: in std_logic_vector(3 downto 0);
out_fm_alu: in std_logic_vector(63 downto 0);
RS1in, RS2in, VRin, VSTRD, VSTVRD: in std_logic_vector(4 downto 0);
RDin_fm4, VRDin_fm4, TRDin_fm4: in std_logic_vector(4 downto 0);
op_in: in std_logic_vector(5 downto 0);
GPRin1, GPRin2, PTin: in std_logic_vector(63 downto 0);
brtype: in std_logic_vector(2 downto 0);
ccr_inp, ccr_ing: in std_logic;
branch: out std_logic;
WBctout: out std_logic_vector(3 downto 0);
WBdataout: out std_logic_vector(63 downto 0);
WBRDout, WBVRDout, WBTRDout: out std_logic_vector(4 downto 0));
end entity ex4top;
architecture ex4top_beh of ex4top is
--components
component ex4stage is
port(op_in: in std_logic_vector(5 downto 0);
RS1in, RS2in, VRin, VSTRD, VSTVRD: in std_logic_vector(4 downto 0);
GPRin1, GPRin2, PTin: in std_logic_vector(63 downto 0);
brtype: in std_logic_vector(2 downto 0);
ccr_inp, ccr_ing: in std_logic;
branch: out std_logic);
end component ex4stage;
component ex4_ex5_reg is
port(clk: in std_logic;
WBctrlin: in std_logic_vector(3 downto 0);
out_fm_alu: in std_logic_vector(63 downto 0);
RDin_fm4, VRDin_fm4, TRDin_fm4: in std_logic_vector(4 downto 0);
WBctout: out std_logic_vector(3 downto 0);
WBdataout: out std_logic_vector(63 downto 0);
WBRDout, WBVRDout, WBTRDout: out std_logic_vector(4 downto 0));
end component ex4_ex5_reg;
begin
ex4stcomp: ex4stage port map(op_in=>op_in, RS1in=>RS1in, RS2in=>RS2in, VRin=>VRin,
VSTRD=>VSTRD, VSTVRD=>VSTVRD, GPRin1=>GPRin1, GPRin2=>GPRin2, PTin=>PTin,
brtype=>brtype, ccr_inp=>ccr_inp, ccr_ing=>ccr_ing, branch=>branch);
ex4regcomp: ex4_ex5_reg port map(clk=>clk, WBctrlin=>WBctrlin, out_fm_alu=>out_fm_alu,
RDin_fm4=>RDin_fm4, VRDin_fm4=>VRDin_fm4, TRDin_fm4=>TRDin_fm4, WBctout=>WBctout,
WBdataout=>WBdataout, WBRDout=>WBRDout, WBVRDout=>WBVRDout,
WBTRDout=>WBTRDout);
end architecture ex4top_beh;
--Individual components
-- LTC Module
library IEEE;
use IEEE.std_logic_1164.all;
entity ex4stage is
port(op_in: in std_logic_vector(5 downto 0);
RS1in, RS2in, VRin, VSTRD, VSTVRD: in std_logic_vector(4 downto 0);
GPRin1, GPRin2, PTin: in std_logic_vector(63 downto 0);
brtype: in std_logic_vector(2 downto 0);
ccr_inp, ccr_ing: in std_logic;
branch: out std_logic);
end entity ex4stage;
architecture ex4_beh of ex4stage is
--components
component bdetunit is
port(brtype: in std_logic_vector(2 downto 0);
op_in: in std_logic_vector(5 downto 0);
RS1, RS2: in std_logic_vector(63 downto 0);
ccr_inp, ccr_ing: in std_logic;
branch: out std_logic);
end component bdetunit;
component muxbr is
port(GPRin, PTin: in std_logic_vector(63 downto 0);
Sbr: in std_logic;
Brin: out std_logic_vector(63 downto 0));
end component muxbr;
component fwd_br is
port(opcode_in: in std_logic_vector(5 downto 0);
RS1in, RS2in, VRin, VSTRD, VSTVRD: in std_logic_vector(4 downto 0);
Sbr1_out, Sbr2_out: out std_logic);
end component fwd_br;
--signals
signal brRS1in, brRS2in: std_logic_vector(63 downto 0);
signal Sbr1sig, Sbr2sig: std_logic;
begin
bcomp: bdetunit port map(brtype=>brtype, op_in=>op_in, RS1=>brRS1in, RS2=>brRS2in,
ccr_inp=>ccr_inp, ccr_ing=>ccr_ing, branch=>branch);
mbcomp1: muxbr port map(GPRin=>GPRin1, PTin=>PTin, Sbr=>Sbr1sig, Brin=>brRS1in);
mbcomp2: muxbr port map(GPRin=>GPRin2, PTin=>PTin, Sbr=>Sbr2sig, Brin=>brRS2in);
fcomp: fwd_br port map(opcode_in=>op_in, RS1in=>RS1in, RS2in=>RS2in, VRin=>VRin,
VSTRD=>VSTRD, VSTVRD=>VSTVRD, Sbr1_out=>Sbr1sig, Sbr2_out=>Sbr2sig);
end architecture ex4_beh;
--Individual Components
-- Branch Detect Unit
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity bdetunit is
port(brtype: in std_logic_vector(2 downto 0);
op_in: in std_logic_vector(5 downto 0);
RS1, RS2: in std_logic_vector(63 downto 0);
ccr_inp, ccr_ing: in std_logic;
-- ccr_inp, ccr_ing, overflow, jumpin, retin: in std_logic;
-- jumpout, retout: out std_logic;
-- branch, ID_Flush_br: out std_logic);
branch: out std_logic);
end entity bdetunit;
architecture bdetunit_beh of bdetunit is
-- comparator function
function compare(signal a, b: std_logic_vector) return std_logic is
variable equal: std_logic;
variable res_or: std_logic;
variable res_xor: std_logic_vector(63 downto 0);
begin
res_or := '0';
res_xor := a xor b;
for i in 63 downto 0 loop
res_or := res_or or res_xor(i);
end loop;
equal := not (res_or);
return equal;
end function compare;
signal temp, br_sig: std_logic;
signal zerosig: std_logic_vector(63 downto 0);
begin
brprocess: process(brtype, RS1, RS2, temp, zerosig, br_sig, ccr_inp, ccr_ing, op_in) is
begin
zerosig <= (others => '0');
case brtype is
when "001" => --BRNE
temp <= compare(RS1, RS2);
br_sig <= not(temp);
when "010" => --BREQ
temp <= compare(RS1, RS2);
br_sig <= temp;
when "011" => --BGE
if(RS1 >= RS2) then
temp <= '1';
else
temp <= '0';
end if;
br_sig <= temp;
when "100" => -- BNEZ
temp <= compare(RS1, zerosig);
br_sig <= not(temp);
when "101" => --BEQZ
temp <= compare(RS1, zerosig);
br_sig <= temp;
when "110" => -- BGF
temp <= ccr_ing;
br_sig <= temp;
when "111" => -- BPF
temp <= ccr_inp;
br_sig <= temp;
when "000" => -- BLT
if(op_in = "100011") then
if(RS1 < RS2) then
temp <= '1';
else
temp <= '0';
end if;
br_sig <= temp;
else
temp <= '0';
br_sig <= temp;
end if;
when others =>
br_sig <= '0';
temp <= '0';
end case;
branch <= br_sig;
end process brprocess;
end architecture bdetunit_beh;
-- MUX used as BR. Det unit in mux 
library IEEE; 
use IEEE.std_logic_1164.all;
entity muxbr is
port(GPRin, PTin: in std_logic_vector(63 downto 0);
Sbr: in std_logic;
Brin: out std_logic_vector(63 downto 0));
end entity muxbr;
architecture muxbr_beh of muxbr is
signal brsig: std_logic_vector(63 downto 0);
begin
process(GPRin, PTin, Sbr, brsig) is
begin
case Sbr is
when '0' => brsig <= GPRin;
when '1' => brsig <= PTin;
when others => brsig <= brsig;
end case;
Brin <= brsig;
end process;
end architecture muxbr_beh;
-- Simple FWD unit for Br.Det 
library IEEE; 
use IEEE.std_logic_1164.all;
entity fwd_br is
port(opcode_in: in std_logic_vector(5 downto 0);
RS1in, RS2in, VRin, VSTRD, VSTVRD: in std_logic_vector(4 downto 0);
Sbr1_out, Sbr2_out: out std_logic);
end entity fwd_br;
architecture fwd_br_beh of fwd_br is
begin
b1p: process(opcode_in, RS1in, VRin, VSTRD, VSTVRD) is
begin
if(opcode_in = "010111" or opcode_in = "011000" or opcode_in = "011001" or opcode_in = "011010" or
opcode_in = "011011" or opcode_in = "100011") then
if((RS1in /= "00000" and RS1in = VSTRD) or (VRin /= "00000" and VRin = VSTVRD)) then
Sbr1_out <= '1';
else
Sbr1_out <= '0';
end if;
else
Sbr1_out <= '0';
end if;
end process b1p;
b2p: process(opcode_in, RS2in, VRin, VSTRD, VSTVRD) is
begin
if(opcode_in = "010111" or opcode_in = "011000" or opcode_in = "011001" or opcode_in = "011010" or
opcode_in = "011011" or opcode_in = "100011") then
if((RS2in /= "00000" and RS2in = VSTRD) or (VRin /= "00000" and VRin = VSTVRD)) then
Sbr2_out <= '1';
else
Sbr2_out <= '0';
end if;
else
Sbr2_out <= '0';
end if;
end process b2p;
end architecture fwd_br_beh;
-- LTC/UD stage reg
library IEEE;
use IEEE.std_logic_1164.all;
entity ex4_ex5_reg is
port(clk: in std_logic;
WBctrlin: in std_logic_vector(3 downto 0);
out_fm_alu: in std_logic_vector(63 downto 0);
RDin_fm4, VRDin_fm4, TRDin_fm4: in std_logic_vector(4 downto 0);
WBctout: out std_logic_vector(3 downto 0);
WBdataout: out std_logic_vector(63 downto 0);
WBRDout, WBVRDout, WBTRDout: out std_logic_vector(4 downto 0));
end entity ex4_ex5_reg;
architecture ex45_beh of ex4_ex5_reg is
begin
process(clk, WBctrlin, out_fm_alu, RDin_fm4, VRDin_fm4, TRDin_fm4) is
begin
if(falling_edge(clk)) then
WBctout <= WBctrlin;
WBdataout <= out_fm_alu;
WBRDout <= RDin_fm4;
WBVRDout <= VRDin_fm4;
WBTRDout <= TRDin_fm4;
end if;
end process;
end architecture ex45_beh;
6. UD STAGE
-- UD Stage Top
library IEEE;
use IEEE.std_logic_1164.all;
entity stage5 is
port(WB_in1: in std_logic;
aluout_fm_ex, essout_fm_st5: in std_logic_vector(63 downto 0);
dataout: out std_logic_vector(63 downto 0));
end entity stage5;
architecture stage5_beh of stage5 is
component wbstage is
port(WB_in1_fm_exreg: in std_logic;
aluout_fm_exreg, essout_fm_exreg: in std_logic_vector(63 downto 0);
dataoutfmwb: out std_logic_vector(63 downto 0));
end component wbstage;
begin
st5comp: wbstage port map(WB_in1_fm_exreg=>WB_in1, aluout_fm_exreg=>aluout_fm_ex,
essout_fm_exreg=>essout_fm_st5, dataoutfmwb=>dataout);
end architecture stage5_beh;
-- UD STAGE MUX
library IEEE;
use IEEE.std_logic_1164.all;
entity wbstage is
port(WB_in1_fm_exreg: in std_logic;
aluout_fm_exreg, essout_fm_exreg: in std_logic_vector(63 downto 0);
dataoutfmwb: out std_logic_vector(63 downto 0));
end entity wbstage;
architecture wbstage_beh of wbstage is
signal s6wbmux: std_logic;
signal Write_data_out_fmwb: std_logic_vector(63 downto 0);
begin
s6wbmux <= WB_in1_fm_exreg;
s6process: process(s6wbmux, aluout_fm_exreg, essout_fm_exreg, Write_data_out_fmwb) is
begin
case s6wbmux is
when '0' => Write_data_out_fmwb <= essout_fm_exreg;
when '1' => Write_data_out_fmwb <= aluout_fm_exreg;
when others => null;
end case;
dataoutfmwb <= Write_data_out_fmwb;
end process s6process;
end architecture wbstage_beh;
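-- Illustrative self-checking testbench for the UD stage mux (a sketch, not part of the original design)
-- Assumptions: the name wbstage_tb, the test patterns, and the 10 ns settle time are hypothetical.
library IEEE;
use IEEE.std_logic_1164.all;
entity wbstage_tb is
end entity wbstage_tb;
architecture wbstage_tb_beh of wbstage_tb is
component wbstage is
port(WB_in1_fm_exreg: in std_logic;
aluout_fm_exreg, essout_fm_exreg: in std_logic_vector(63 downto 0);
dataoutfmwb: out std_logic_vector(63 downto 0));
end component wbstage;
signal sel: std_logic := '0';
signal alu_val: std_logic_vector(63 downto 0) := x"0000000000001234";
signal ess_val: std_logic_vector(63 downto 0) := x"000000000000ABCD";
signal result: std_logic_vector(63 downto 0);
begin
dut: wbstage port map(WB_in1_fm_exreg=>sel, aluout_fm_exreg=>alu_val,
essout_fm_exreg=>ess_val, dataoutfmwb=>result);
stim: process is
begin
-- select = '0' should forward the ESS output
sel <= '0';
wait for 10 ns;
assert result = ess_val report "wbstage: expected ESS data when select = '0'" severity error;
-- select = '1' should forward the ALU output
sel <= '1';
wait for 10 ns;
assert result = alu_val report "wbstage: expected ALU data when select = '1'" severity error;
wait;
end process stim;
end architecture wbstage_tb_beh;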
7. MACRO CONTROLLER 
library IEEE; 
use IEEE.std_logic_1164.all;
entity topmac is
port(ESPR_on, EOP, crcchkin, essfullin, locchk, clk: in std_logic;
macop: in std_logic_vector(7 downto 0); -- decoded 3to8 macro opcode
fmmactrlrout: out std_logic_vector(15 downto 0);
incr_pc, macctl: out std_logic);
end entity topmac;
architecture topmac_beh of topmac is
component macctrl is
port(ESPR_on, EOP, crcchkin, essfullin, locchk, clk: in std_logic;
dec_macop: in std_logic_vector(7 downto 0); -- decoded 3to8 macro opcode
fm0, fm1, fm2, fm3, fmO, fmT, fmF, fmC, fmA, fmRC: out std_logic; -- for "fmmacctrlr"
macctl, incr_pc: out std_logic);
end component macctrl;
component fmm is
port(fm0in, fm1in, fm2in, fm3in, fmOin, fmTin, fmFin, fmCin, fmAin, fmRCin: in std_logic;
fmmactrlrout: out std_logic_vector(15 downto 0));
end component fmm;
component dec_3to81 is
port(inp: in std_logic_vector(2 downto 0);
outp: out std_logic_vector(7 downto 0));
end component dec_3to81;
signal insig0, insig1, asig, fsig, osig, Tsig, FMsig, Csig, ACsig, RCsig: std_logic;
signal macop1: std_logic_vector(7 downto 0);
begin
maccomp: macctrl port map(ESPR_on=>ESPR_on, EOP=>EOP, crcchkin=>crcchkin, essfullin=>essfullin,
locchk=>locchk, clk=>clk, dec_macop=>macop1, fm0=>insig0, fm1=>insig1, fm2=>asig, fm3=>fsig,
fmO=>osig, fmT=>Tsig, fmF=>FMsig, fmC=>Csig, fmA=>ACsig, fmRC=>RCsig, macctl=>macctl,
incr_pc=>incr_pc);
fmmcomp: fmm port map(fm0in=>insig0, fm1in=>insig1, fm2in=>asig, fm3in=>fsig, fmOin=>osig,
fmTin=>Tsig, fmFin=>FMsig, fmCin=>Csig, fmAin=>ACsig, fmRCin=>RCsig,
fmmactrlrout=>fmmactrlrout);
decodecomp: dec_3to81 port map(inp=>macop(2 downto 0), outp=>macop1);
end architecture topmac_beh;
-- MACRO CONTROLLER 
library IEEE; 
use IEEE.std_logic_1164.all;
entity macctrl is
port(ESPR_on, EOP, crcchkin, essfullin, locchk, clk: in std_logic;
dec_macop: in std_logic_vector(7 downto 0); -- decoded 3to8 macro opcode
fm0, fm1, fm2, fm3, fmO, fmT, fmF, fmC, fmA, fmRC: out std_logic; -- for "fmmacctrlr"
macctl, incr_pc: out std_logic);
end entity macctrl;
architecture macctrl_beh of macctrl is
component FD is
port(D, C: in std_logic;
Q: out std_logic);
end component FD;
signal startespr, st0, st0_bar, st1, st1_bar: std_logic;
signal md0, md1, md2, md3, md4, md5, md6, md7, md8, md9, mdA, mdB, mdC, mdD, mdE, mdF:
std_logic;
signal md10, md11, md12, md13, md14, md15, md16, md17, md18, md19, md1A, md1B, md1C, md1D,
md1E, md1F: std_logic;
signal md20, md21, md22, md23, md24, md25, md26, md27, md28, md29, md2A, md2B, md2C, md2D,
md2E, md2F: std_logic;
signal md30, md31, md32, md33, md34, md35, md36, md37, md38, md39, md3A, md3B, md3C, md3D,
md3E, md3F: std_logic;
signal md40, md41, md42, md43, md44, md45, md46, md47, md48, md49, md4A, md4B, md4C, md4D,
md4E, md4F: std_logic;
signal md50, md51, md52, md53, md54, md55, md56, md57, md58, md59, md5A, md5B, md5C, md5D,
md5E, md5F: std_logic;
signal md60, md61, md62, md63, md64, md65, md66, md67, md68, md69, md6A, md6B, md6C, md6D,
md6E, md6F: std_logic;
signal md70, md71, md72, md73, md74, md75, md76, md77, md78, md79, md7A, md7B, md7C, md7D,
md7E, md7F: std_logic;
signal md80, md81, md82, md83, md84, md85, md86, md87, md88, md89, md8A, md8B, md8C, md8D,
md8E, md8F: std_logic;
signal md90, md91, md92, md93, md94, md95, md96, md97, md98, md99, md9A, md9B, md9C, md9D,
md9E, md9F: std_logic;
signal mdA0, mdA1, mdA2, mdA3, mdA4, mdA5, mdA6, mdA7, mdA8, mdA9, mdAA, mdAB, mdAC,
mdAD, mdAE, mdAF: std_logic;
signal mdB0, mdB1, mdB2, mdB3, mdB4, mdB5, mdB6, mdB7, mdB8, mdB9, mdBA, mdBB, mdBC,
mdBD, mdBE, mdBF: std_logic;
signal mdC0, mdC1, mdC2, mdC3, mdC4, mdC5, mdC6, mdC7, mdC8, mdC9, mdCA, mdCB, mdCC,
mdCD, mdCE, mdCF: std_logic;
signal mdD0, mdD1, mdD2, mdD3, mdD4, mdD5, mdD6, mdD7, mdD8, mdD9, mdDA, mdDB, mdDC,
mdDD, mdDE, mdDF: std_logic;
signal mt0, mt1, mt2, mt3, mt4, mt5, mt6, mt7, mt8, mt9, mtA, mtB, mtC, mtD, mtE, mtF: std_logic;
signal mt10, mt11, mt12, mt13, mt14, mt15, mt16, mt17, mt18, mt19, mt1A, mt1B, mt1C, mt1D, mt1E,
mt1F: std_logic;
signal mt20, mt21, mt22, mt23, mt24, mt25, mt26, mt27, mt28, mt29, mt2A, mt2B, mt2C, mt2D, mt2E,
mt2F: std_logic;
signal mt30, mt31, mt32, mt33, mt34, mt35, mt36, mt37, mt38, mt39, mt3A, mt3B, mt3C, mt3D, mt3E,
mt3F: std_logic;
signal mt40, mt41, mt42, mt43, mt44, mt45, mt46, mt47, mt48, mt49, mt4A, mt4B, mt4C, mt4D, mt4E,
mt4F: std_logic;
signal mt50, mt51, mt52, mt53, mt54, mt55, mt56, mt57, mt58, mt59, mt5A, mt5B, mt5C, mt5D, mt5E,
mt5F: std_logic;
signal mt60, mt61, mt62, mt63, mt64, mt65, mt66, mt67, mt68, mt69, mt6A, mt6B, mt6C, mt6D, mt6E,
mt6F: std_logic;
signal mt70, mt71, mt72, mt73, mt74, mt75, mt76, mt77, mt78, mt79, mt7A, mt7B, mt7C, mt7D, mt7E,
mt7F: std_logic;
signal mt80, mt81, mt82, mt83, mt84, mt85, mt86, mt87, mt88, mt89, mt8A, mt8B, mt8C, mt8D, mt8E,
mt8F: std_logic;
signal mt90, mt91, mt92, mt93, mt94, mt95, mt96, mt97, mt98, mt99, mt9A, mt9B, mt9C, mt9D, mt9E,
mt9F: std_logic;
signal mtA0, mtA1, mtA2, mtA3, mtA4, mtA5, mtA6, mtA7, mtA8, mtA9, mtAA, mtAB, mtAC, mtAD,
mtAE, mtAF: std_logic;
signal mtB0, mtB1, mtB2, mtB3, mtB4, mtB5, mtB6, mtB7, mtB8, mtB9, mtBA, mtBB, mtBC, mtBD,
mtBE, mtBF: std_logic;
signal mtC0, mtC1, mtC2, mtC3, mtC4, mtC5, mtC6, mtC7, mtC8, mtC9, mtCA, mtCB, mtCC, mtCD,
mtCE, mtCF: std_logic;
signal mtD0, mtD1, mtD2, mtD3, mtD4, mtD5, mtD6, mtD7, mtD8, mtD9, mtDA, mtDB, mtDC, mtDD,
mtDE, mtDF: std_logic;
signal crc_bar, loc_bar, ess_bar, eopbar, incr_pc1, incr_pc2, incr_pc3: std_logic;
begin
dff_st0: FD port map(D=>ESPR_on, C=>clk, Q=>st0);
dff_st1: FD port map(D=>st0, C=>clk, Q=>st1);
st0_bar <= not (st0);
st1_bar <= not (st1);
startespr <= st0 and st1_bar;
dffm0: FD port map(D=>md0, C=>clk, Q=>mt0);
dffm1: FD port map(D=>md1, C=>clk, Q=>mt1);
dffm2: FD port map(D=>md2, C=>clk, Q=>mt2);
dffm3: FD port map(D=>md3, C=>clk, Q=>mt3);
dffm4: FD port map(D=>md4, C=>clk, Q=>mt4);
dffm5: FD port map(D=>md5, C=>clk, Q=>mt5);
dffm6: FD port map(D=>md6, C=>clk, Q=>mt6);
dffm7: FD port map(D=>md7, C=>clk, Q=>mt7);
dffm8: FD port map(D=>md8, C=>clk, Q=>mt8);
dffm9: FD port map(D=>md9, C=>clk, Q=>mt9);
dffmA: FD port map(D=>mdA, C=>clk, Q=>mtA);
dffmB: FD port map(D=>mdB, C=>clk, Q=>mtB);
dffmC: FD port map(D=>mdC, C=>clk, Q=>mtC);
dffmD: FD port map(D=>mdD, C=>clk, Q=>mtD);
dffmE: FD port map(D=>mdE, C=>clk, Q=>mtE);
dffmF: FD port map(D=>mdF, C=>clk, Q=>mtF);
dffm10: FD port map(D=>md10, C=>clk, Q=>mt10);
dffm11: FD port map(D=>md11, C=>clk, Q=>mt11);
dffm12: FD port map(D=>md12, C=>clk, Q=>mt12);
dffm13: FD port map(D=>md13, C=>clk, Q=>mt13);
dffm14: FD port map(D=>md14, C=>clk, Q=>mt14);
dffm15: FD port map(D=>md15, C=>clk, Q=>mt15);
dffm16: FD port map(D=>md16, C=>clk, Q=>mt16);
dffm17: FD port map(D=>md17, C=>clk, Q=>mt17);
dffm18: FD port map(D=>md18, C=>clk, Q=>mt18);
dffm19: FD port map(D=>md19, C=>clk, Q=>mt19);
dffm1A: FD port map(D=>md1A, C=>clk, Q=>mt1A);
dffm1B: FD port map(D=>md1B, C=>clk, Q=>mt1B);
dffm1C: FD port map(D=>md1C, C=>clk, Q=>mt1C);
dffm1D: FD port map(D=>md1D, C=>clk, Q=>mt1D);
dffm1E: FD port map(D=>md1E, C=>clk, Q=>mt1E);
dffm1F: FD port map(D=>md1F, C=>clk, Q=>mt1F);
dffm20: FD port map(D=>md20, C=>clk, Q=>mt20);
dffm21: FD port map(D=>md21, C=>clk, Q=>mt21);
dffm22: FD port map(D=>md22, C=>clk, Q=>mt22);
dffm23: FD port map(D=>md23, C=>clk, Q=>mt23);
dffm24: FD port map(D=>md24, C=>clk, Q=>mt24);
dffm25: FD port map(D=>md25, C=>clk, Q=>mt25);
dffm26: FD port map(D=>md26, C=>clk, Q=>mt26);
dffm27: FD port map(D=>md27, C=>clk, Q=>mt27);
dffm28: FD port map(D=>md28, C=>clk, Q=>mt28);
dffm29: FD port map(D=>md29, C=>clk, Q=>mt29);
dffm2A: FD port map(D=>md2A, C=>clk, Q=>mt2A);
dffm2B: FD port map(D=>md2B, C=>clk, Q=>mt2B);
dffm2C: FD port map(D=>md2C, C=>clk, Q=>mt2C);
dffm2D: FD port map(D=>md2D, C=>clk, Q=>mt2D);
dffm2E: FD port map(D=>md2E, C=>clk, Q=>mt2E);
dffm2F: FD port map(D=>md2F, C=>clk, Q=>mt2F);
dffm30: FD port map(D=>md30, C=>clk, Q=>mt30);
dffm31: FD port map(D=>md31, C=>clk, Q=>mt31);
dffm32: FD port map(D=>md32, C=>clk, Q=>mt32);
dffm33: FD port map(D=>md33, C=>clk, Q=>mt33);
dffm34: FD port map(D=>md34, C=>clk, Q=>mt34);
dffm35: FD port map(D=>md35, C=>clk, Q=>mt35);
dffm36: FD port map(D=>md36, C=>clk, Q=>mt36);
dffm37: FD port map(D=>md37, C=>clk, Q=>mt37);
dffm38: FD port map(D=>md38, C=>clk, Q=>mt38);
dffm39: FD port map(D=>md39, C=>clk, Q=>mt39);
dffm3A: FD port map(D=>md3A, C=>clk, Q=>mt3A);
dffm3B: FD port map(D=>md3B, C=>clk, Q=>mt3B);
dffm3C: FD port map(D=>md3C, C=>clk, Q=>mt3C);
dffm3D: FD port map(D=>md3D, C=>clk, Q=>mt3D);
dffm3E: FD port map(D=>md3E, C=>clk, Q=>mt3E);
dffm3F: FD port map(D=>md3F, C=>clk, Q=>mt3F);
dffm40: FD port map(D=>md40, C=>clk, Q=>mt40);
dffm41: FD port map(D=>md41, C=>clk, Q=>mt41);
dffm42: FD port map(D=>md42, C=>clk, Q=>mt42);
dffm43: FD port map(D=>md43, C=>clk, Q=>mt43);
dffm44: FD port map(D=>md44, C=>clk, Q=>mt44);
dffm45: FD port map(D=>md45, C=>clk, Q=>mt45);
dffm46: FD port map(D=>md46, C=>clk, Q=>mt46);
dffm47: FD port map(D=>md47, C=>clk, Q=>mt47);
dffm48: FD port map(D=>md48, C=>clk, Q=>mt48);
dffm49: FD port map(D=>md49, C=>clk, Q=>mt49);
dffm4A: FD port map(D=>md4A, C=>clk, Q=>mt4A);
dffm4B: FD port map(D=>md4B, C=>clk, Q=>mt4B);
dffm4C: FD port map(D=>md4C, C=>clk, Q=>mt4C);
dffm4D: FD port map(D=>md4D, C=>clk, Q=>mt4D);
dffm4E: FD port map(D=>md4E, C=>clk, Q=>mt4E);
dffm4F: FD port map(D=>md4F, C=>clk, Q=>mt4F);
dffm50: FD port map(D=>md50, C=>clk, Q=>mt50);
dffm51: FD port map(D=>md51, C=>clk, Q=>mt51);
dffm52: FD port map(D=>md52, C=>clk, Q=>mt52);
dffm53: FD port map(D=>md53, C=>clk, Q=>mt53);
dffm54: FD port map(D=>md54, C=>clk, Q=>mt54);
dffm55: FD port map(D=>md55, C=>clk, Q=>mt55);
dffm56: FD port map(D=>md56, C=>clk, Q=>mt56);
dffm57: FD port map(D=>md57, C=>clk, Q=>mt57);
dffm58: FD port map(D=>md58, C=>clk, Q=>mt58);
dffm59: FD port map(D=>md59, C=>clk, Q=>mt59);
dffm5A: FD port map(D=>md5A, C=>clk, Q=>mt5A);
dffm5B: FD port map(D=>md5B, C=>clk, Q=>mt5B);
dffm5C: FD port map(D=>md5C, C=>clk, Q=>mt5C);
dffm5D: FD port map(D=>md5D, C=>clk, Q=>mt5D);
dffm5E: FD port map(D=>md5E, C=>clk, Q=>mt5E);
dffm5F: FD port map(D=>md5F, C=>clk, Q=>mt5F);
dffm60: FD port map(D=>md60, C=>clk, Q=>mt60);
dffm61: FD port map(D=>md61, C=>clk, Q=>mt61);
dffm62: FD port map(D=>md62, C=>clk, Q=>mt62);
dffm63: FD port map(D=>md63, C=>clk, Q=>mt63);
dffm64: FD port map(D=>md64, C=>clk, Q=>mt64);
dffm65: FD port map(D=>md65, C=>clk, Q=>mt65);
dffm66: FD port map(D=>md66, C=>clk, Q=>mt66);
dffm67: FD port map(D=>md67, C=>clk, Q=>mt67);
dffm68: FD port map(D=>md68, C=>clk, Q=>mt68);
dffm69: FD port map(D=>md69, C=>clk, Q=>mt69);
dffm6A: FD port map(D=>md6A, C=>clk, Q=>mt6A);
dffm6B: FD port map(D=>md6B, C=>clk, Q=>mt6B);
dffm6C: FD port map(D=>md6C, C=>clk, Q=>mt6C);
dffm6D: FD port map(D=>md6D, C=>clk, Q=>mt6D);
dffm6E: FD port map(D=>md6E, C=>clk, Q=>mt6E);
dffm6F: FD port map(D=>md6F, C=>clk, Q=>mt6F);
dffm70: FD port map(D=>md70, C=>clk, Q=>mt70);
dffm71: FD port map(D=>md71, C=>clk, Q=>mt71);
dffm72: FD port map(D=>md72, C=>clk, Q=>mt72);
dffm73: FD port map(D=>md73, C=>clk, Q=>mt73);
dffm74: FD port map(D=>md74, C=>clk, Q=>mt74);
dffm75: FD port map(D=>md75, C=>clk, Q=>mt75);
dffm76: FD port map(D=>md76, C=>clk, Q=>mt76);
dffm77: FD port map(D=>md77, C=>clk, Q=>mt77);
dffm78: FD port map(D=>md78, C=>clk, Q=>mt78);
dffm79: FD port map(D=>md79, C=>clk, Q=>mt79);
dffm7A: FD port map(D=>md7A, C=>clk, Q=>mt7A);
dffm7B: FD port map(D=>md7B, C=>clk, Q=>mt7B);
dffm7C: FD port map(D=>md7C, C=>clk, Q=>mt7C);
dffm7D: FD port map(D=>md7D, C=>clk, Q=>mt7D);
dffm7E: FD port map(D=>md7E, C=>clk, Q=>mt7E);
dffm7F: FD port map(D=>md7F, C=>clk, Q=>mt7F);
dffm80: FD port map(D=>md80, C=>clk, Q=>mt80);
dffm81: FD port map(D=>md81, C=>clk, Q=>mt81);
dffm82: FD port map(D=>md82, C=>clk, Q=>mt82);
dffm83: FD port map(D=>md83, C=>clk, Q=>mt83);
dffm84: FD port map(D=>md84, C=>clk, Q=>mt84);
dffm85: FD port map(D=>md85, C=>clk, Q=>mt85);
dffm86: FD port map(D=>md86, C=>clk, Q=>mt86);
dffm87: FD port map(D=>md87, C=>clk, Q=>mt87);
dffm88: FD port map(D=>md88, C=>clk, Q=>mt88);
dffm89: FD port map(D=>md89, C=>clk, Q=>mt89);
dffm8A: FD port map(D=>md8A, C=>clk, Q=>mt8A);
dffm8B: FD port map(D=>md8B, C=>clk, Q=>mt8B);
dffm8C: FD port map(D=>md8C, C=>clk, Q=>mt8C);
dffm8D: FD port map(D=>md8D, C=>clk, Q=>mt8D);
dffm8E: FD port map(D=>md8E, C=>clk, Q=>mt8E);
dffm8F: FD port map(D=>md8F, C=>clk, Q=>mt8F);
dffm90: FD port map(D=>md90, C=>clk, Q=>mt90);
dffm91: FD port map(D=>md91, C=>clk, Q=>mt91);
dffm92: FD port map(D=>md92, C=>clk, Q=>mt92);
dffm93: FD port map(D=>md93, C=>clk, Q=>mt93);
dffm94: FD port map(D=>md94, C=>clk, Q=>mt94);
dffm95: FD port map(D=>md95, C=>clk, Q=>mt95);
dffm96: FD port map(D=>md96, C=>clk, Q=>mt96);
dffm97: FD port map(D=>md97, C=>clk, Q=>mt97);
dffm98: FD port map(D=>md98, C=>clk, Q=>mt98);
dffm99: FD port map(D=>md99, C=>clk, Q=>mt99);
dffm9A: FD port map(D=>md9A, C=>clk, Q=>mt9A);
dffm9B: FD port map(D=>md9B, C=>clk, Q=>mt9B);
dffm9C: FD port map(D=>md9C, C=>clk, Q=>mt9C);
dffm9D: FD port map(D=>md9D, C=>clk, Q=>mt9D);
dffm9E: FD port map(D=>md9E, C=>clk, Q=>mt9E);
dffm9F: FD port map(D=>md9F, C=>clk, Q=>mt9F);
dffmA0: FD port map(D=>mdA0, C=>clk, Q=>mtA0);
dffmA1: FD port map(D=>mdA1, C=>clk, Q=>mtA1);
dffmA2: FD port map(D=>mdA2, C=>clk, Q=>mtA2);
dffmA3: FD port map(D=>mdA3, C=>clk, Q=>mtA3);
dffmA4: FD port map(D=>mdA4, C=>clk, Q=>mtA4);
dffmA5: FD port map(D=>mdA5, C=>clk, Q=>mtA5);
dffmA6: FD port map(D=>mdA6, C=>clk, Q=>mtA6);
dffmA7: FD port map(D=>mdA7, C=>clk, Q=>mtA7);
dffmA8: FD port map(D=>mdA8, C=>clk, Q=>mtA8);
dffmA9: FD port map(D=>mdA9, C=>clk, Q=>mtA9);
dffmAA: FD port map(D=>mdAA, C=>clk, Q=>mtAA);
dffmAB: FD port map(D=>mdAB, C=>clk, Q=>mtAB);
dffmAC: FD port map(D=>mdAC, C=>clk, Q=>mtAC);
dffmAD: FD port map(D=>mdAD, C=>clk, Q=>mtAD);
dffmAE: FD port map(D=>mdAE, C=>clk, Q=>mtAE);
dffmAF: FD port map(D=>mdAF, C=>clk, Q=>mtAF);
dffmB0: FD port map(D=>mdB0, C=>clk, Q=>mtB0);
dffmB1: FD port map(D=>mdB1, C=>clk, Q=>mtB1);
dffmB2: FD port map(D=>mdB2, C=>clk, Q=>mtB2);
dffmB3: FD port map(D=>mdB3, C=>clk, Q=>mtB3);
dffmB4: FD port map(D=>mdB4, C=>clk, Q=>mtB4);
dffmB5: FD port map(D=>mdB5, C=>clk, Q=>mtB5);
dffmB6: FD port map(D=>mdB6, C=>clk, Q=>mtB6);
dffmB7: FD port map(D=>mdB7, C=>clk, Q=>mtB7);
dffmB8: FD port map(D=>mdB8, C=>clk, Q=>mtB8);
dffmB9: FD port map(D=>mdB9, C=>clk, Q=>mtB9);
dffmBA: FD port map(D=>mdBA, C=>clk, Q=>mtBA);
dffmBB: FD port map(D=>mdBB, C=>clk, Q=>mtBB);
dffmBC: FD port map(D=>mdBC, C=>clk, Q=>mtBC);
dffmBD: FD port map(D=>mdBD, C=>clk, Q=>mtBD);
dffmBE: FD port map(D=>mdBE, C=>clk, Q=>mtBE);
dffmBF: FD port map(D=>mdBF, C=>clk, Q=>mtBF);
dffmC0: FD port map(D=>mdC0, C=>clk, Q=>mtC0);
dffmC1: FD port map(D=>mdC1, C=>clk, Q=>mtC1);
dffmC2: FD port map(D=>mdC2, C=>clk, Q=>mtC2);
dffmC3: FD port map(D=>mdC3, C=>clk, Q=>mtC3);
dffmC4: FD port map(D=>mdC4, C=>clk, Q=>mtC4);
dffmC5: FD port map(D=>mdC5, C=>clk, Q=>mtC5);
dffmC6: FD port map(D=>mdC6, C=>clk, Q=>mtC6);
dffmC7: FD port map(D=>mdC7, C=>clk, Q=>mtC7);
dffmC8: FD port map(D=>mdC8, C=>clk, Q=>mtC8);
dffmC9: FD port map(D=>mdC9, C=>clk, Q=>mtC9);
dffmCA: FD port map(D=>mdCA, C=>clk, Q=>mtCA);
dffmCB: FD port map(D=>mdCB, C=>clk, Q=>mtCB);
dffmCC: FD port map(D=>mdCC, C=>clk, Q=>mtCC);
dffmCD: FD port map(D=>mdCD, C=>clk, Q=>mtCD);
dffmCE: FD port map(D=>mdCE, C=>clk, Q=>mtCE);
dffmCF: FD port map(D=>mdCF, C=>clk, Q=>mtCF);
dffmD0: FD port map(D=>mdD0, C=>clk, Q=>mtD0);
dffmD1: FD port map(D=>mdD1, C=>clk, Q=>mtD1);
dffmD2: FD port map(D=>mdD2, C=>clk, Q=>mtD2);
dffmD3: FD port map(D=>mdD3, C=>clk, Q=>mtD3);
dffmD4: FD port map(D=>mdD4, C=>clk, Q=>mtD4);
dffmD5: FD port map(D=>mdD5, C=>clk, Q=>mtD5);
dffmD6: FD port map(D=>mdD6, C=>clk, Q=>mtD6); 
dffmD7: FD port map(D=>mdD7, C=>clk, Q=>mtD7); 
dffmD8: FD port map(D=>mdD8, C=>clk, Q=>mtD8); 
dffmD9: FD port map(D=>mdD9, C=>clk, Q=>mtD9); 
dffmDA: FD port map(D=>mdDA, C=>clk, Q=>mtDA); 
dffmDB: FD port map(D=>mdDB, C=>clk, Q=>mtDB); 
dffmDC: FD port map(D=>mdDC, C=>clk, Q=>mtDC); 
dffmDD: FD port map(D=>mdDD, C=>clk, Q=>mtDD); 
dffmDE: FD port map(D=>mdDE, C=>clk, Q=>mtDE); 
dffmDF: FD port map(D=>mdDF, C=>clk, Q=>mtDF); 
crc_bar <= not (crcchkin);
loc_bar <= not (locchk);
ess_bar <= not (essfullin); 
eopbar <= not (EOP); 
-- state equations 
md0 <= startespr; 
md1 <= (mt0 or (mt2 and eopbar));
md2 <= mt1;
md3 <= mt2 and EOP;
md4 <= mt3 and crc_bar;
md5 <= mt4;
md6 <= mt3 and crcchkin;
md7 <= mt6 and locchk;
md8 <= mt7;
md9 <= mt6 and loc_bar;
mdA <= mt9 and essfullin;
mdB <= mtA;
--COUNT
mdC <= mt9 and ess_bar and dec_macop(0);
mdD <= mtC;
mdE <= mtD;
mdF <= mtE;
md10 <= mtF;
md11 <= mt10;
md12 <= mt11;
md13 <= mt12;
md14 <= mt13;
md15 <= mt14;
md16 <= mt15;
md17 <= mt16;
md18 <= mt17;
md19 <= mt18;
md1A <= mt19;
md1B <= mt1A;
md1C <= mt1B;
md1D <= mt1C;
md1E <= mt1D;
md1F <= mt1E;
md20 <= mt1F;
md21 <= mt20;
md22 <= mt21;
md23 <= mt22;
md24 <= mt23;
md25 <= mt24;
md26 <= mt25;
md27 <= mt26;
md28 <= mt27;
md29 <= mt28;
md2A <= mt29;
md2B <= mt2A;
md2C <= mt2B;
--COMPARE
md2D <= mt9 and ess_bar and dec_macop(1);
md2E <= mt2D;
md2F <= mt2E;
md30 <= mt2F;
md31 <= mt30;
md32 <= mt31;
md33 <= mt32;
md34 <= mt33;
md35 <= mt34;
md36 <= mt35;
md37 <= mt36;
md38 <= mt37;
md39 <= mt38;
md3A <= mt39;
md3B <= mt3A;
md3C <= mt3B;
md3D <= mt3C;
md3E <= mt3D;
md3F <= mt3E;
md40 <= mt3F;
md41 <= mt40;
md42 <= mt41;
md43 <= mt42;
md44 <= mt43;
md45 <= mt44;
md46 <= mt45;
--COLLECT
md47 <= mt9 and ess_bar and dec_macop(2);
md48 <= mt47;
md49 <= mt48;
md4A <= mt49;
md4B <= mt4A;
md4C <= mt4B;
md4D <= mt4C;
md4E <= mt4D;
md4F <= mt4E;
md50 <= mt4F;
md51 <= mt50;
md52 <= mt51;
md53 <= mt52;
md54 <= mt53;
md55 <= mt54;
md56 <= mt55;
md57 <= mt56;
md58 <= mt57;
md59 <= mt58; 
md5A <= mt59; 
md5B <= mt5A; 
md5C <= mt5B; 
md5D <= mt5C; 
md5E <= mt5D; 
md5F <= mt5E; 
md60 <= mt5F; 
md61 <= mt60;
md62 <= mt61;
md63 <= mt62;
md64 <= mt63;
md65 <= mt64;
md66 <= mt65;
md67 <= mt66;
md68 <= mt67;
md69 <= mt68;
--RCHLD
md6A <= mt9 and ess_bar and dec_macop(3);
md6B <= mt6A;
md6C <= mt6B;
md6D <= mt6C;
md6E <= mt6D;
md6F <= mt6E;
md70 <= mt6F;
md71 <= mt70;
md72 <= mt71;
md73 <= mt72;
md74 <= mt73;
md75 <= mt74;
md76 <= mt75;
md77 <= mt76;
md78 <= mt77;
md79 <= mt78;
md7A <= mt79;
md7B <= mt7A;
md7C <= mt7B;
md7D <= mt7C;
md7E <= mt7D;
md7F <= mt7E;
md80 <= mt7F;
md81 <= mt80;
md82 <= mt81;
md83 <= mt82;
md84 <= mt83;
md85 <= mt84;
md86 <= mt85;
md87 <= mt86;
md88 <= mt87;
md89 <= mt88;
md8A <= mt89;
md8B <= mt8A;
md8C <= mt8B;
md8D <= mt8C;
md8E <= mt8D;
md8f <= mt8E; 
md90 <= mt8F; 
md9 l <= mt90; 
md92 <= mt9 l; 
md93 <= mt92; 
md94 <= mt93; 
md95 <= mt94; 
md96 <= mt95; 
--RCOLLECT 
md97 <= mt9 and ess_bar and dec_macop(4); 
md98 <= mt97; 
md99 <= mt98; 
md9A <= mt99; 
md9B <= mt9A; 
md9C <= mt9B; 
md9D <= mt9C; 
md9E <= mt9D; 
md9F <= mt9E; 
mdA0 <= mt9F; 
mdA1 <= mtA0;
mdA2 <= mtA1;
mdA3 <= mtA2;
mdA4 <= mtA3;
mdA5 <= mtA4;
mdA6 <= mtA5;
mdA7 <= mtA6;
mdA8 <= mtA7; 
mdA9 <= mtA8; 
mdAA <= mtA9; 
mdAB <= mtAA; 
mdAC <= mtAB; 
mdAD <= mtAC; 
mdAE <= mtAD; 
mdAF <= mtAE; 
mdB0 <= mtAF; 
mdB1 <= mtB0;
mdB2 <= mtB1;
mdB3 <= mtB2;
mdB4 <= mtB3;
mdB5 <= mtB4;
mdB6 <= mtB5;
mdB7 <= mtB6; 
mdB8 <= mtB7; 
mdB9 <= mtB8; 
mdBA <= mtB9; 
mdBB <= mtBA; 
mdBC <= mtBB; 
mdBD <= mtBC; 
mdBE <= mtBD; 
mdBF <= mtBE; 
mdC0 <= mtBF; 
mdC1 <= mtC0;
mdC2 <= mtC1;
mdC3 <= mtC2;
mdC4 <= mtC3;
mdC5 <= mtC4;
mdC6 <= mtC5;
mdC7 <= mtC6; 
mdC8 <= mtC7; 
mdC9 <= mtC8; 
mdCA <= mtC9; 
mdCB <= mtCA; 
mdCC <= mtCB; 
mdCD <= mtCC; 
mdCE <= mtCD; 
mdCF <= mtCE; 
mdD0 <= mtCF; 
mdD1 <= mtD0;
mdD2 <= mtD1;
mdD3 <= mtD2;
mdD4 <= mtD3;
mdD5 <= mtD4;
mdD6 <= mtD5;
mdD7 <= mtD6; 
mdD8 <= mtD7; 
mdD9 <= mtD8; 
mdDA <= mtD9; 
mdDB <= mtDA; 
mdDC <= mtDB; 
mdDD <= mtDC; 
mdDE <= mtDD; 
mdDF <= mtDE; 
-- Output equations 
macctl <= startespr or mt0 or mt4 or mt5 or mt7 or mt8 or mtA or mtB or mtC or mt2D or mt47 or mt6A or
mt97;
fm0 <= startespr;
fm1 <= mt0; -- IN
fm2 <= mt4 or mtA; -- ABORT2 for example
fm3 <= mt7; -- FWD
fmO <= mt5 or mt8 or mtB; -- OUT
fmT <= mtC;
fmF <= mt2D;
fmC <= mt47;
fmA <= mt6A;
fmRC <= mt97;
incr_pc1 <= mtC or mtD or mtE or mtF or mt10 or mt11 or mt12 or mt13 or mt14 or mt15 or mt16 or mt17
or mt18 or mt19 or mt1A or mt1B or mt1C or mt1D or mt1E or mt1F or mt20 or mt21 or mt22 or mt23 or
mt24 or mt26 or mt27 or mt28 or mt29 or mt2A or mt2B or mt2C or mt2D or mt2E or mt2F or mt30 or
mt31 or mt32 or mt33 or mt34 or mt35 or mt36 or mt37 or mt38 or mt39 or mt3A or mt3B or mt3C or
mt3D or mt3E or mt40 or mt41 or mt42 or mt43 or mt44 or mt45 or mt46 or mt47 or mt48 or mt49 or
mt4A or mt4B or mt4C or mt4D or mt4E or mt4F or mt50 or mt51 or mt52 or mt53 or mt54 or mt55 or
mt56 or mt57 or mt58 or mt59 or mt5A or mt5B or mt5C or mt5D or mt5E or mt5F;
incr_pc2 <= mt60 or mt61 or mt63 or mt64 or mt65 or mt66 or mt67 or mt68 or mt69 or mt6A or mt6B or
mt6C or mt6D or mt6E or mt6F or mt70 or mt71 or mt72 or mt73 or mt74 or mt75 or mt76 or mt77 or
mt78 or mt79 or mt7A or mt7B or mt7C or mt7D or mt7E or mt7F or mt80 or mt81 or mt82 or mt83 or
mt84 or mt85 or mt86 or mt87 or mt88 or mt89 or mt8A or mt8B or mt8C or mt8D or mt8E or mt90 or
mt91 or mt92 or mt93 or mt94 or mt95 or mt96 or mt97 or mt98 or mt99 or mt9A or mt9B or mt9C or
mt9D or mt9E or mt9F or mtA0 or mtA1 or mtA2 or mtA3 or mtA4 or mtA5 or mtA6 or mtA7 or mtA8 or
mtA9 or mtAA or mtAB or mtAC or mtAD or mtAE or mtAF;
incr_pc3 <= mtB0 or mtB1 or mtB2 or mtB3 or mtB4 or mtB5 or mtB6 or mtB7 or mtB8 or mtB9 or mtBA
or mtBB or mtBC or mtBD or mtBE or mtBF or mtC0 or mtC1 or mtC2 or mtC3 or mtC4 or mtC5 or mtC6
or mtC7 or mtC8 or mtC9 or mtCA or mtCB or mtCC or mtCD or mtCE or mtCF or mtD0 or mtD1 or
mtD2 or mtD3 or mtD4 or mtD6 or mtD7 or mtD8 or mtD9 or mtDA or mtDB or mtDC or mtDD or mtDE
or mt25 or mt3F or mt62 or mt8F or mtD5;
incr_pc <= incr_pc1 or incr_pc2 or incr_pc3;
end architecture macctrl_beh;
-- For getting fmmactrlr address for specific macro and micro instructions
library IEEE;
use IEEE.std_logic_1164.all;
entity fmm is
port(fm0in, fm1in, fm2in, fm3in, fmOin, fmTin, fmFin, fmCin, fmAin, fmRCin: in std_logic;
fmmactrlrout: out std_logic_vector(15 downto 0));
end entity fmm;
architecture fmm_beh of fmm is
signal fmsig: std_logic_vector(9 downto 0);
begin
fmsig <= fm0in & fm1in & fm2in & fm3in & fmOin & fmTin & fmFin & fmCin & fmAin & fmRCin;
process(fmsig) is
begin
case fmsig is
when "1000000000" => fmmactrlrout <= (others => '0'); -- address 0 for IN
when "0100000000" => fmmactrlrout <= "0000000000000001"; -- 1 for IN
when "0010000000" => fmmactrlrout <= "0000000000000010"; -- 2 for Abort2
when "0001000000" => fmmactrlrout <= "0000000000000011"; -- 3 for Fwd
when "0000100000" => fmmactrlrout <= "0000000000011100"; -- 1C for Out
when "0000010000" => fmmactrlrout <= "0000000000000101"; -- 5 for THRESH
when "0000001000" => fmmactrlrout <= "0000000000100110"; -- 26 for FINDM
when "0000000100" => fmmactrlrout <= "0000000001000000"; -- 40 for COLLECT
when "0000000010" => fmmactrlrout <= "0000000001100011"; -- 63 for RCHLD
when "0000000001" => fmmactrlrout <= "0000000010010000"; -- 90 for RCOLLECT
when others => fmmactrlrout <= "0000000000100010"; -- address 22 for NOP
end case;
end process;
end architecture fmm_beh;
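-- Illustrative testbench sketch; it is not part of the original listing.
-- It assumes the fmm entity above is compiled into library work. The names
-- fmm_tb, sel and addr are hypothetical. Only two cases are exercised: the
-- COLLECT select line alone (expected address 40 hex) and no select line
-- at all (expected NOP address 22 hex), both taken from the case statement.
library IEEE;
use IEEE.std_logic_1164.all;
entity fmm_tb is
end entity fmm_tb;
architecture tb of fmm_tb is
  signal sel: std_logic_vector(9 downto 0) := (others => '0');
  signal addr: std_logic_vector(15 downto 0);
begin
  -- sel(9) drives fm0in and sel(0) drives fmRCin, matching the order of fmsig
  dut: entity work.fmm port map(
    fm0in => sel(9), fm1in => sel(8), fm2in => sel(7), fm3in => sel(6),
    fmOin => sel(5), fmTin => sel(4), fmFin => sel(3), fmCin => sel(2),
    fmAin => sel(1), fmRCin => sel(0), fmmactrlrout => addr);
  stim: process is
  begin
    sel <= "0000000100"; -- only fmCin asserted
    wait for 10 ns;
    assert addr = "0000000001000000"
      report "fmm: expected address 40 (hex) for COLLECT" severity error;
    sel <= (others => '0'); -- no select line asserted
    wait for 10 ns;
    assert addr = "0000000000100010"
      report "fmm: expected NOP address 22 (hex)" severity error;
    wait; -- end of stimulus
  end process stim;
end architecture tb;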
-- 3to8 Decoder
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
entity dec_3to81 is
port(inp: in std_logic_vector(2 downto 0);
outp: out std_logic_vector(7 downto 0));
end entity dec_3to81;
architecture dec_3to81_beh of dec_3to81 is
begin
process(inp) is
begin
case inp is
when "000" => outp <= "00000001";
when "001" => outp <= "00000010";
when "010" => outp <= "00000100";
when "011" => outp <= "00001000";
when "100" => outp <= "00010000";
when "101" => outp <= "00100000";
when "110" => outp <= "01000000";
when "111" => outp <= "10000000";
when others => outp <= (others => '0');
end case;
end process;
end architecture dec_3to81_beh;
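-- Illustrative testbench sketch; it is not part of the original listing.
-- It assumes the 3to8 decoder entity above is compiled into library work
-- under the name dec_3to81. The names dec_3to81_tb, inp_s and outp_s are
-- hypothetical. The bench steps through all eight input codes and checks
-- that only the output bit with the same index is set.
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;
entity dec_3to81_tb is
end entity dec_3to81_tb;
architecture tb of dec_3to81_tb is
  signal inp_s: std_logic_vector(2 downto 0) := "000";
  signal outp_s: std_logic_vector(7 downto 0);
begin
  dut: entity work.dec_3to81 port map(inp => inp_s, outp => outp_s);
  stim: process is
    variable expected: std_logic_vector(7 downto 0);
  begin
    for i in 0 to 7 loop
      inp_s <= std_logic_vector(to_unsigned(i, 3));
      wait for 10 ns; -- let the combinational output settle
      expected := (others => '0');
      expected(i) := '1';
      assert outp_s = expected
        report "dec_3to81: mismatch for input " & integer'image(i)
        severity error;
    end loop;
    wait; -- end of stimulus
  end process stim;
end architecture tb;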
8. Instruction Memory Initialization 
--Instruction Memory for 'COUNT' Macro Instruction
library IEEE;
use IEEE.std_logic_1164.all;
--synopsys translate_off;
library unisim;
use unisim.vcomponents.all;
--synopsys translate_on;
entity INSTMEM is
port(clk, we, en, rst: in std_logic;
addr: in std_logic_vector(7 downto 0);
inst_in: in std_logic_vector(63 downto 0);
inst_out: out std_logic_vector(63 downto 0));
end entity INSTMEM;
architecture behavioural of INSTMEM is
component RAMB4_S16 is
port(ADDR: in std_logic_vector(7 downto 0);
CLK: in std_logic;
DI: in std_logic_vector(15 downto 0);
DO: out std_logic_vector(15 downto 0);
EN, RST, WE: in std_logic);
end component RAMB4_S16;
attribute INIT_00: string;
attribute INIT_01: string;
attribute INIT_02: string;
attribute INIT_03: string;
attribute INIT_04: string;
attribute INIT_05: string;
attribute INIT_06: string;
attribute INIT_07: string;
attribute INIT_08: string;
attribute INIT_09: string;
attribute INIT_0A: string;
attribute INIT_0B: string;
attribute INIT_0C: string;
attribute INIT_0D: string;
attribute INIT_0E: string;
attribute INIT_0F: string;
attribute INIT_00 of Instram0: label is
"00000000000006C0000000C00000000010400000000000000000004000000000";
attribute INIT_01 of Instram0: label is
"0000000005400800000000000000000009000000014000000000080000000000";
attribute INIT_02 of Instram0: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_03 of Instram0: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_04 of Instram0: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_05 of Instram0: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_06 of Instram0: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_07 of Instram0: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_08 of Instram0: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_09 of Instram0: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0A of Instram0: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0B of Instram0: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0C of Instram0: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0D of Instram0: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0E of Instram0: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0F of Instram0: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_00 of Instram1: label is
"0100000000000000800000000000000000000000810001000100010000000000";
attribute INIT_01 of Instram1: label is
"0000000000000000000000000000000000000100010000000000000000008000";
attribute INIT_02 of Instram1: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_03 of Instram1: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_04 of Instram1: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_05 of Instram1: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_06 of Instram1: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_07 of Instram1: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_08 of Instram1: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_09 of Instram1: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0A of Instram1: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0B of Instram1: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0C of Instram1: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0D of Instram1: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0E of Instram1: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0F of Instram1: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_00 of Instram2: label is
"0001000000000000004100600000000000000041000100601800000000000000";
attribute INIT_01 of Instram2: label is
"0000000000000000004100000000000028000001000000000000000000410001";
attribute INIT_02 of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_03 of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_04 of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_05 of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_06 of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_07 of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_08 of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_09 of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0A of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0B of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0C of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0D of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0E of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0F of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_00 of Instram3: label is
"2C80000000008000780054000000000084007C001C051C0524A4208000000400";
attribute INIT_01 of Instram3: label is
"00000000700084007C0000000000140064041CA054800000000084007C001C04";
attribute INIT_02 of Instram3: label is
"000000000000000000000000000000000000000008000C000000000008008800";
attribute INIT_03 of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_04 of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_05 of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_06 of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_07 of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_08 of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_09 of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0A of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0B of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0C of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0D of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0E of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0F of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
begin
Instram0: RAMB4_S16
--synopsys translate_off
GENERIC MAP (
INIT_00 => X"00000000000006C0000000C00000000010400000000000000000004000000000",
INIT_01 => X"0000000005400800000000000000000009000000014000000000080000000000",
INIT_02 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_03 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_04 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_05 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_06 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_07 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_08 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_09 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0A => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0B => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0C => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0D => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0E => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0F => X"0000000000000000000000000000000000000000000000000000000000000000")
--synopsys translate_on
port map(ADDR=>addr, CLK=>clk, DI=>inst_in(15 downto 0), DO=>inst_out(15 downto 0), EN=>en,
RST=>rst, WE=>we);
Instram1: RAMB4_S16
--synopsys translate_off
GENERIC MAP (
INIT_00 => X"0100000000000000800000000000000000000000810001000100010000000000",
INIT_01 => X"0000000000000000000000000000000000000100010000000000000000008000",
INIT_02 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_03 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_04 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_05 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_06 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_07 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_08 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_09 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0A => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0B => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0C => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0D => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0E => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0F => X"0000000000000000000000000000000000000000000000000000000000000000")
--synopsys translate_on
port map(ADDR=>addr, CLK=>clk, DI=>inst_in(31 downto 16), DO=>inst_out(31 downto 16), EN=>en,
RST=>rst, WE=>we);
Instram2: RAMB4_S16
--synopsys translate_off
GENERIC MAP (
INIT_00 => X"0001000000000000004100600000000000000041000100601800000000000000",
INIT_01 => X"0000000000000000004100000000000028000001000000000000000000410001",
INIT_02 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_03 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_04 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_05 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_06 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_07 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_08 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_09 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0A => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0B => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0C => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0D => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0E => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0F => X"0000000000000000000000000000000000000000000000000000000000000000")
--synopsys translate_on
port map(ADDR=>addr, CLK=>clk, DI=>inst_in(47 downto 32), DO=>inst_out(47 downto 32), EN=>en,
RST=>rst, WE=>we);
Instram3: RAMB4_S16
--synopsys translate_off
GENERIC MAP (
INIT_00 => X"2C80000000008000780054000000000084007C001C051C0524A4208000000400",
INIT_01 => X"00000000700084007C0000000000140064041CA054800000000084007C001C04",
INIT_02 => X"000000000000000000000000000000000000000008000C000000000008008800",
INIT_03 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_04 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_05 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_06 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_07 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_08 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_09 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0A => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0B => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0C => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0D => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0E => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0F => X"0000000000000000000000000000000000000000000000000000000000000000")
--synopsys translate_on
port map(ADDR=>addr, CLK=>clk, DI=>inst_in(63 downto 48), DO=>inst_out(63 downto 48), EN=>en,
RST=>rst, WE=>we);
end architecture behavioural;
--Instruction Memory for 'COMPARE' Macro Instruction
library IEEE;
use IEEE.std_logic_1164.all;
--synopsys translate_off;
library unisim;
use unisim.vcomponents.all;
--synopsys translate_on;
entity INSTMEM is
port(clk, we, en, rst: in std_logic;
addr: in std_logic_vector(7 downto 0);
inst_in: in std_logic_vector(63 downto 0);
inst_out: out std_logic_vector(63 downto 0));
end entity INSTMEM;
architecture behavioural of INSTMEM is
component RAMB4_S16 is
port(ADDR: in std_logic_vector(7 downto 0);
CLK: in std_logic;
DI: in std_logic_vector(15 downto 0);
DO: out std_logic_vector(15 downto 0);
EN, RST, WE: in std_logic);
end component RAMB4_S16;
attribute INIT_00: string;
attribute INIT_01: string;
attribute INIT_02: string;
attribute INIT_03: string;
attribute INIT_04: string;
attribute INIT_05: string;
attribute INIT_06: string;
attribute INIT_07: string;
attribute INIT_08: string;
attribute INIT_09: string;
attribute INIT_0A: string;
attribute INIT_0B: string;
attribute INIT_0C: string;
attribute INIT_0D: string;
attribute INIT_0E: string;
attribute INIT_0F: string;
attribute INIT_00 of Instram0: label is
"0000014000000540000000C00000000000000000000000000000000000000000";
attribute INIT_01 of Instram0: label is
"00000000000000000000000000000780000000000140000000000000054001C0";
attribute INIT_02 of Instram0: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_03 of Instram0: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_04 of Instram0: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_05 of Instram0: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_06 of Instram0: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_07 of Instram0: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_08 of Instram0: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_09 of Instram0: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0A of Instram0: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0B of Instram0: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0C of Instram0: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0D of Instram0: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0E of Instram0: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0F of Instram0: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_00 of Instram1: label is
"0100010000000000800000000000000000000000000000000000000000000000";
attribute INIT_01 of Instram1: label is
"0000000000000000000000000000000000008000010000000000000000000080";
attribute INIT_02 of Instram1: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_03 of Instram1: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_04 of Instram1: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_05 of Instram1: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_06 of Instram1: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_07 of Instram1: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_08 of Instram1: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_09 of Instram1: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0A of Instram1: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0B of Instram1: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0C of Instram1: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0D of Instram1: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0E of Instram1: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0F of Instram1: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_00 of Instram2: label is
"0001000000000000004100600000000000000000000000000000000000000000";
attribute INIT_01 of Instram2: label is
"0000000000000000000000000000000000410001000000000000000042800000";
attribute INIT_02 of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_03 of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_04 of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_05 of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_06 of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_07 of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_08 of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_09 of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0A of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0B of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0C of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0D of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0E of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0F of Instram2: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_00 of Instram3: label is
"1C8054A000008000780054000000000000000000000000000000000000000400";
attribute INIT_01 of Instram3: label is
"080088000000000008000C00000084007C001C0554A000000000140000005400";
attribute INIT_02 of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_03 of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_04 of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_05 of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_06 of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_07 of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_08 of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_09 of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0A of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0B of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0C of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0D of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0E of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0F of Instram3: label is
"0000000000000000000000000000000000000000000000000000000000000000";
begin
Instram0: RAMB4_S16
--synopsys translate_off
GENERIC MAP (
INIT_00 => X"0000014000000540000000C00000000000000000000000000000000000000000",
INIT_01 => X"00000000000000000000000000000780000000000140000000000000054001C0",
INIT_02 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_03 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_04 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_05 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_06 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_07 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_08 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_09 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0A => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0B => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0C => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0D => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0E => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0F => X"0000000000000000000000000000000000000000000000000000000000000000")
--synopsys translate_on
port map(ADDR=>addr, CLK=>clk, DI=>inst_in(15 downto 0), DO=>inst_out(15 downto 0), EN=>en,
RST=>rst, WE=>we);
Instram1: RAMB4_S16
--synopsys translate_off
GENERIC MAP (
INIT_00 => X"0100010000000000800000000000000000000000000000000000000000000000",
INIT_01 => X"0000000000000000000000000000000000008000010000000000000000000080",
INIT_02 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_03 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_04 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_05 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_06 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_07 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_08 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_09 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0A => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0B => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0C => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0D => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0E => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0F => X"0000000000000000000000000000000000000000000000000000000000000000")
--synopsys translate_on
port map(ADDR=>addr, CLK=>clk, DI=>inst_in(31 downto 16), DO=>inst_out(31 downto 16), EN=>en,
RST=>rst, WE=>we);
Instram2: RAMB4_S16
--synopsys translate_off
GENERIC MAP (
INIT_00 => X"0001000000000000004100600000000000000000000000000000000000000000",
INIT_01 => X"0000000000000000000000000000000000410001000000000000000042800000",
INIT_02 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_03 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_04 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_05 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_06 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_07 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_08 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_09 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0A => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0B => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0C => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0D => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0E => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0F => X"0000000000000000000000000000000000000000000000000000000000000000")
--synopsys translate_on
port map(ADDR=>addr, CLK=>clk, DI=>inst_in(47 downto 32), DO=>inst_out(47 downto 32), EN=>en,
RST=>rst, WE=>we);
Instram3: RAMB4_S16
--synopsys translate_off
GENERIC MAP (
INIT_00 => X"1C8054A000008000780054000000000000000000000000000000000000000400",
INIT_01 => X"080088000000000008000C00000084007C001C0554A000000000140000005400",
INIT_02 => X"000000000000000000000000000000000000000008000C000000000008008800",
INIT_03 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_04 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_05 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_06 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_07 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_08 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_09 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0A => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0B => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0C => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0D => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0E => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0F => X"0000000000000000000000000000000000000000000000000000000000000000")
--synopsys translate_on
port map(ADDR=>addr, CLK=>clk, DI=>inst_in(63 downto 48), DO=>inst_out(63 downto 48), EN=>en,
RST=>rst, WE=>we);
end architecture behavioural; 
--Instruction Memory for 'COLLECT' Macro Instruction 
library IEEE; 
use IEEE.std_logic_1164.all;
--synopsys translate_off
library unisim;
use unisim.vcomponents.all;
--synopsys translate_on;
entity INSTMEM is
port(clk, we, en, rst: in std_logic;
addr: in std_logic_vector(7 downto 0);
inst_in: in std_logic_vector(63 downto 0);
inst_out: out std_logic_vector(63 downto 0));
end entity INSTMEM;
architecture behavioural of INSTMEM is
component RAMB4_S16 is
port(ADDR: in std_logic_vector(7 downto 0);
CLK: in std_logic;
DI: in std_logic_vector(15 downto 0);
DO: out std_logic_vector(15 downto 0);
EN, RST, WE: in std_logic);
end component RAMB4_S16;
attribute INIT_00: string;
attribute INIT_01: string;
attribute INIT_02: string;
attribute INIT_03: string;
attribute INIT_04: string;
attribute INIT_05: string;
attribute INIT_06: string;
attribute INIT_07: string;
attribute INIT_08: string;
attribute INIT_09: string;
attribute INIT_0A: string;
attribute INIT_0B: string;
attribute INIT_0C: string;
attribute INIT_0D: string;
attribute INIT_0E: string;
attribute INIT_0F: string;
attribute INIT_00 of Instram0 : label is
"01C0000000000640000000C00000000010400000000000000000004000000000";
attribute INIT_01 of Instram0 : label is
"0000000001400000000000000000000007C007C0024000000140000007400000";
attribute INIT_02 of Instram0 : label is
"000000000000000001400000000000000AC00000064000000000000000000640";
attribute INIT_03 of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_04 of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_05 of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_06 of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_07 of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_08 of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_09 of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0A of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0B of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0C of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0D of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0E of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0F of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_00 of Instram1 : label is
"0000000000000000800000000000000000000000810001000100010000000000";
attribute INIT_01 of Instram1 : label is
"0000800001000000000000000000000000008000008001000100000000008000";
attribute INIT_02 of Instram1 : label is
"0000000000000000000000000000000000000000000000008000010000000000";
attribute INIT_03 of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_04 of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_05 of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_06 of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_07 of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_08 of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_09 of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0A of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0B of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0C of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0D of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0E of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0F of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_00 of Instram2 : label is
"00A0000000000000004100600000000000000041000100601800000000000000";
attribute INIT_01 of Instram2 : label is
"0082000200000000000000000000000000002002000000020000000000000082";
attribute INIT_02 of Instram2 : label is
"0000000000000000000200000000000000010000000000410001000100000000";
attribute INIT_03 of Instram2 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_04 of Instram2 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_05 of Instram2 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_06 of Instram2 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_07 of Instram2 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_08 of Instram2 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_09 of Instram2 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0A of Instram2 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0B of Instram2 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0C of Instram2 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0D of Instram2 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0E of Instram2 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0F of Instram2 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_00 of Instram3 : label is
"5400000000008000780054000000000084007C001C051C0524A4208000000400";
attribute INIT_01 of Instram3 : label is
"7C001C045480000000000800880000007000000554001CA05480000080007800";
attribute INIT_02 of Instram3 : label is
"0000000008000C0058000000000014006C00000084007C001C0630C000008400";
attribute INIT_03 of Instram3 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_04 of Instram3 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_05 of Instram3 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_06 of Instram3 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_07 of Instram3 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_08 of Instram3 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_09 of Instram3 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0A of Instram3 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0B of Instram3 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0C of Instram3 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0D of Instram3 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0E of Instram3 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0F of Instram3 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
begin 
Instram0: RAMB4_S16
--synopsys translate_off
GENERIC MAP (
INIT_00 => X"01C0000000000640000000C00000000010400000000000000000004000000000",
INIT_01 => X"0000000001400000000000000000000007C007C0024000000140000007400000",
INIT_02 => X"000000000000000001400000000000000AC00000064000000000000000000640",
INIT_03 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_04 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_05 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_06 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_07 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_08 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_09 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0A => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0B => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0C => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0D => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0E => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0F => X"0000000000000000000000000000000000000000000000000000000000000000")
--synopsys translate_on
port map(ADDR=>addr, CLK=>clk, DI=>inst_in(15 downto 0), DO=>inst_out(15 downto 0), EN=>en,
RST=>rst, WE=>we);
Instram1: RAMB4_S16
--synopsys translate_off
GENERIC MAP (
INIT_00 => X"0000000000000000800000000000000000000000810001000100010000000000",
INIT_01 => X"0000800001000000000000000000000000008000008001000100000000008000",
INIT_02 => X"0000000000000000000000000000000000000000000000008000010000000000",
INIT_03 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_04 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_05 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_06 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_07 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_08 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_09 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0A => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0B => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0C => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0D => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0E => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0F => X"0000000000000000000000000000000000000000000000000000000000000000")
--synopsys translate_on
port map(ADDR=>addr, CLK=>clk, DI=>inst_in(31 downto 16), DO=>inst_out(31 downto 16), EN=>en,
RST=>rst, WE=>we);
Instram2: RAMB4_S16
--synopsys translate_off
GENERIC MAP (
INIT_00 => X"00A0000000000000004100600000000000000041000100601800000000000000",
INIT_01 => X"0082000200000000000000000000000000002002000000020000000000000082",
INIT_02 => X"0000000000000000000200000000000000010000000000410001000100000000",
INIT_03 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_04 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_05 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_06 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_07 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_08 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_09 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0A => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0B => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0C => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0D => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0E => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0F => X"0000000000000000000000000000000000000000000000000000000000000000")
--synopsys translate_on
port map(ADDR=>addr, CLK=>clk, DI=>inst_in(47 downto 32), DO=>inst_out(47 downto 32), EN=>en,
RST=>rst, WE=>we);
Instram3: RAMB4_S16
--synopsys translate_off
GENERIC MAP (
INIT_00 => X"5400000000008000780054000000000084007C001C051C0524A4208000000400",
INIT_01 => X"7C001C045480000000000800880000007000000554001CA05480000080007800",
INIT_02 => X"0000000008000C0058000000000014006C00000084007C001C0630C000008400",
INIT_03 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_04 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_05 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_06 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_07 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_08 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_09 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0A => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0B => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0C => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0D => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0E => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0F => X"0000000000000000000000000000000000000000000000000000000000000000")
--synopsys translate_on
port map(ADDR=>addr, CLK=>clk, DI=>inst_in(63 downto 48), DO=>inst_out(63 downto 48), EN=>en,
RST=>rst, WE=>we);
end architecture behavioural; 
--Instruction Memory for 'RCHLD' Macro Instruction 
library IEEE; 
use IEEE.std_logic_1164.all;
--synopsys translate_off
library unisim;
use unisim.vcomponents.all;
--synopsys translate_on;
entity INSTMEM is
port(clk, we, en, rst: in std_logic;
addr: in std_logic_vector(7 downto 0);
inst_in: in std_logic_vector(63 downto 0);
inst_out: out std_logic_vector(63 downto 0));
end entity INSTMEM;
architecture behavioural of INSTMEM is
component RAMB4_S16 is
port(ADDR: in std_logic_vector(7 downto 0);
CLK: in std_logic;
DI: in std_logic_vector(15 downto 0);
DO: out std_logic_vector(15 downto 0);
EN, RST, WE: in std_logic);
end component RAMB4_S16;
attribute INIT_00: string;
attribute INIT_01: string;
attribute INIT_02: string;
attribute INIT_03: string;
attribute INIT_04: string;
attribute INIT_05: string;
attribute INIT_06: string;
attribute INIT_07: string;
attribute INIT_08: string;
attribute INIT_09: string;
attribute INIT_0A: string;
attribute INIT_0B: string;
attribute INIT_0C: string;
attribute INIT_0D: string;
attribute INIT_0E: string;
attribute INIT_0F: string;
attribute INIT_00 of Instram0 : label is
"000001C000000C00000000C00000000010400000000000000000004000000000";
attribute INIT_01 of Instram0 : label is
"0000024000000A40000000000000000008C00000014000000A40000000000000";
attribute INIT_02 of Instram0 : label is
"00000000000001C00000000000000000078000000A4000000000000000000B00";
attribute INIT_03 of Instram0 : label is
"00000000000000000000000000000000000000000000054000000A4000000000";
attribute INIT_04 of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_05 of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_06 of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_07 of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_08 of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_09 of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0A of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0B of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0C of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0D of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0E of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0F of Instram0 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_00 of Instram1 : label is
"0100010000000000800000000000000000000000810001000100010000000000";
attribute INIT_01 of Instram1 : label is
"0100010000000000000080000100000000008000000000000000000080000100";
attribute INIT_02 of Instram1 : label is
"0000000000000000000000000000000000000000000000008000000000000000";
attribute INIT_03 of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000008000";
attribute INIT_04 of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_05 of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_06 of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_07 of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_08 of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_09 of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0A of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0B of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0C of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0D of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0E of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0F of Instram1 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_00 of Instram2 : label is
"0002000000000000008200A00000000000000041000100601800000000000000";
attribute INIT_01 of Instram2 : label is
"0001000000000000004100010001000000000041006000000000008200024000";
attribute INIT_02 of Instram2 : label is
"0000000000000000000000000000000000000000000000410001000000002800";
attribute INIT_03 of Instram2 : label is
"0000000000000000000000000000000000000000000000000000000000820002";
attribute INIT_04 of Instram2 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_05 of Instram2 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_06 of Instram2 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_07 of Instram2 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_08 of Instram2 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_09 of Instram2 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0A of Instram2 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0B of Instram2 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0C of Instram2 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0D of Instram2 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0E of Instram2 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0F of Instram2 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_00 of Instram3 : label is
"1CC0550000008000780054000000000084007C001C051C0524A4208000000400";
attribute INIT_01 of Instram3 : label is
"1CA05480000084007C001C042C800000800078005400000084007C001C0734E6";
attribute INIT_02 of Instram3 : label is
"000008000C00580300000800880000007000000084007C001C00000014006404";
attribute INIT_03 of Instram3 : label is
"000000000000000000000000000000000000000000007000000084007C001C00";
attribute INIT_04 of Instram3 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_05 of Instram3 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_06 of Instram3 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_07 of Instram3 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_08 of Instram3 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_09 of Instram3 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0A of Instram3 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0B of Instram3 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0C of Instram3 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0D of Instram3 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0E of Instram3 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
attribute INIT_0F of Instram3 : label is
"0000000000000000000000000000000000000000000000000000000000000000";
begin
Instram0: RAMB4_S16
--synopsys translate_off
GENERIC MAP (
INIT_00 => X"000001C000000C00000000C00000000010400000000000000000004000000000",
INIT_01 => X"0000024000000A40000000000000000008C00000014000000A40000000000000",
INIT_02 => X"00000000000001C00000000000000000078000000A4000000000000000000B00",
INIT_03 => X"00000000000000000000000000000000000000000000054000000A4000000000",
INIT_04 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_05 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_06 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_07 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_08 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_09 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0A => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0B => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0C => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0D => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0E => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0F => X"0000000000000000000000000000000000000000000000000000000000000000")
--synopsys translate_on
port map(ADDR=>addr, CLK=>clk, DI=>inst_in(15 downto 0), DO=>inst_out(15 downto 0), EN=>en,
RST=>rst, WE=>we);
Instram1: RAMB4_S16
--synopsys translate_off
GENERIC MAP (
INIT_00 => X"0100010000000000800000000000000000000000810001000100010000000000",
INIT_01 => X"0100010000000000000080000100000000008000000000000000000080000100",
INIT_02 => X"0000000000000000000000000000000000000000000000008000000000000000",
INIT_03 => X"0000000000000000000000000000000000000000000000000000000000008000",
INIT_04 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_05 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_06 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_07 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_08 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_09 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0A => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0B => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0C => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0D => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0E => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0F => X"0000000000000000000000000000000000000000000000000000000000000000")
--synopsys translate_on
port map(ADDR=>addr, CLK=>clk, DI=>inst_in(31 downto 16), DO=>inst_out(31 downto 16), EN=>en,
RST=>rst, WE=>we);
Instram2: RAMB4_S16
--synopsys translate_off
GENERIC MAP (
INIT_00 => X"0002000000000000008200A00000000000000041000100601800000000000000",
INIT_01 => X"0001000000000000004100010001000000000041006000000000008200024000",
INIT_02 => X"0000000000000000000000000000000000000000000000410001000000002800",
INIT_03 => X"0000000000000000000000000000000000000000000000000000000000820002",
INIT_04 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_05 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_06 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_07 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_08 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_09 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0A => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0B => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0C => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0D => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0E => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0F => X"0000000000000000000000000000000000000000000000000000000000000000")
--synopsys translate_on
port map(ADDR=>addr, CLK=>clk, DI=>inst_in(47 downto 32), DO=>inst_out(47 downto 32), EN=>en,
RST=>rst, WE=>we);
Instram3: RAMB4_S16
--synopsys translate_off
GENERIC MAP (
INIT_00 => X"1CC0550000008000780054000000000084007C001C051C0524A4208000000400",
INIT_01 => X"1CA05480000084007C001C042C800000800078005400000084007C001C0734E6",
INIT_02 => X"000008000C00580300000800880000007000000084007C001C00000014006404",
INIT_03 => X"000000000000000000000000000000000000000000007000000084007C001C00",
INIT_04 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_05 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_06 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_07 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_08 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_09 => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0A => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0B => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0C => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0D => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0E => X"0000000000000000000000000000000000000000000000000000000000000000",
INIT_0F => X"0000000000000000000000000000000000000000000000000000000000000000")
--synopsys translate_on
port map(ADDR=>addr, CLK=>clk, DI=>inst_in(63 downto 48), DO=>inst_out(63 downto 48), EN=>en,
RST=>rst, WE=>we);
end architecture behavioural; 
References 
1. T. T. Speakman, D. Farinacci, S. Lin, and A. Tweedly. The PGM Reliable Transport
Protocol, August 1998. RFC (draft-speakman-pgm-spec-02.txt).
2. H. W. Holbrook and D. R. Cheriton, IP Multicast Channels: EXPRESS Support for
Large-Scale Single Source Applications. In Proceedings of SIGCOMM '99, 1999.
3. D. J. Wetherall, J. V. Guttag, and D. L. Tennenhouse, ANTS: A Toolkit for Building
and Dynamically Deploying Network Protocols, 1998.
4. M. Hicks, P. Kakkar, T. Moore, C. A. Gunter, and S. Nettles, PLAN: A Packet 
Language for Active Networks. 1998. International Conference on Functional 
Programming. 
5. J. T. Moore, M. Hicks, and S. Nettles, Practical Programmable Packets. In IEEE 
INFOCOM, Anchorage, AK, April 2001. 
6. S. Wen, J. Griffioen, and K. Calvert, Building Multicast Services from Unicast 
Forwarding and Ephemeral State, In Proceedings of 2001 Open Architectures and 
Network Programming Workshop, Anchorage, AK, April 27-28, 2001. 
7. S. Wen, J. Griffioen, and K. Calvert, CALM: Congestion-Aware Layered Multicast, 
In Proceedings of 2002 Open Architectures and Network Programming Workshop, 
New York, NY, June, 2002.
8. Kenneth L. Calvert, James Griffioen and Su Wen. Lightweight Network Support for
Scalable End-to-End Services. In Proceedings of SIGCOMM 2002, Pittsburgh, PA,
August 19-23, 2002.
9. Burton Bloom, Space/time trade-offs in hash coding with allowable errors.
Communications of the ACM, 13(7):422-426, July 1970.
10. S. Pingali, D. Towsley, and J. Kurose, A Comparison of Sender-initiated and 
Receiver-initiated Reliable Multicast Protocols. In Proceedings of the ACM 
SIGMETRICS '94 Conference, Pages 221-230, 1994.
11. I. Stoica, T. S. Eugene Ng, and H. Zhang. REUNITE: A Recursive Unicast Approach
to Multicast. In Proceedings of INFOCOM 2000, 2000. 
12. S. Savage, D. Wetherall, A. Karlin, and T. Anderson, Practical Network Support for 
IP Traceback. In ACM SIGCOMM, Stockholm, Sweden, August, 2000. 
13. A. C. Snoeren, C. E. Jones, F. Tchakountio, S. T. Kent, and W. T. Strayer. Hash-Based
IP Traceback. In ACM SIGCOMM, San Diego, CA, August, 2001.
14. K. Park and H. Lee, On the Effectiveness of Route-Based Packet Filtering for
Distributed DoS Attack Prevention in Power-Law Internets. In ACM SIGCOMM, San
Diego, CA, August 2001.
15. K. Calvert, J. Griffioen, and S. Wen. Concast: Design and Implementation of a New
Network Service. In Proceedings of 1999 International Conference on Network 
Protocols, Toronto, Ontario, November, 1999. 
16. B. Schwartz, A. Jackson, W. Strayer, W. Zhou, R. Rockwell, and C. Partridge. Smart 
Packets for Active Networks. In 1999 IEEE Second Conference on Open
Architectures and Network Programming, Pages 90-97, March, 1999. 
17. http://www.xilinx.com 
18. T. Halfhill, "Intel Network Processor Targets Routers: IXP1200 Integrates Seven
Cores for Multithreaded Packet Routing", MICROPROCESSOR REPORT, Vol. 13,
No. 12, Sept. 13, 1999.
19. J. Vuillemin, P. Bertin, D. Roncin, M. Shand, H. Touati, and P. Boucard,
"Programmable Active Memories: Reconfigurable Systems Come of Age", IEEE
Transactions on VLSI Systems, Vol. 4, Issue 1, pp. 56-69, March, 1996.
20. S. Hauck, "The Roles of FPGAs in Reprogrammable Systems", Proceedings of the 
IEEE, Vol. 86, No. 4, pp. 615-638, April, 1998. 
21. K. Bondalapati and V. K. Prasanna, "Reconfigurable Computing: Architectures, 
Models and Algorithms", CURRENT SCIENCE: Special Section on Computational 
Science, Vol. 78, No. 7, pp. 828-837, April, 2000.
22. K. Compton, S. Hauck, "Reconfigurable Computing: Survey of Systems and
Software", ACM Computing Surveys (CSUR), Vol. 34, Issue 2, pp. 171-210, June,
2002.
23. J.R. Heath, S. Ramamoorthy, C.E. Stroud, and A. Hurt, "Modeling, Design, and
Performance Analysis of a Parallel Hybrid Data/Command Driven Architecture 
System and its Scalable Dynamic Load Balancing Circuit", IEEE Trans. on Circuits 
and Systems, II: Analog and Digital Signal Processing, Vol. 44, No. 1, pp. 22-40, 
January, 1997. 
24. J.R. Heath and B. Sivanesa, "Development, Analysis, and Verification of a Parallel 
Hybrid Data-flow Computer Architectural Framework and Associated Load Balancing 
Strategies and Algorithms via Parallel Simulation", SIMULATION, Vol. 69, No. 1, pp. 
7-25, July, 1997.
25. J.R. Heath and A. Tan, "Modeling, Design, Virtual and Physical Prototyping, Testing, 
and Verification of a Multifunctional Processor Queue for a Single-Chip 
Multiprocessor Architecture", Proceedings of 2001 IEEE International Workshop on 
Rapid Systems Prototyping, Monterey, California, 6 pp., June 25-27, 2001.
26. Su Wen, "Supporting Group Communication on a Lightweight Programmable 
Network", Ph.D. Thesis, Department of Computer Science, University of Kentucky, 
May, 2003. 
27. John L. Hennessy and David A. Patterson, Computer Organization and Design - The 
Hardware/Software Interface, Morgan Kaufmann Publishers, Inc., San Francisco,
California, 1994. 
28. http://www.engr.uky.edu/~heath/M_Muthulakshmi_Thesis_ESPR.V1_VHDL_Code.pdf
