FPGA based label processor for large scale optical packet switch node by Osers Benaim, Alan
 Eindhoven University of Technology  
Faculty of Electrical Engineering  
Electro-optical Communication (ECO) Group 
 
 
 
 
 
FPGA BASED LABEL PROCESSOR  
FOR LARGE SCALE  
OPTICAL PACKET SWITCH NODE 
 
By Osers Benaim, Alan 
Student ID: s103139 
 
 
 
 
 
 
 
 
 
Master of Science Thesis  
carried out from February 2011 to July 2011 
 
Supervisor:  
Dr. Nicola Calabretta 
 
Graduation professor:  
Prof. Dr. Harm J.S. Dorren 
Prof. Josep Prat Gomà 
 
ERASMUS student from the Polytechnic University of Catalonia (Barcelona Tech). 
The Faculty of Electrical Engineering of Eindhoven University of Technology disclaims all 
responsibility for the contents of traineeship and graduation reports.  
 
TABLE OF CONTENTS 
 
 
GRAPHICS CONTENTS
ABSTRACT
1 INTRODUCTION
1.1 Thesis objective
1.2 Thesis structure
2 FPGA
2.1 XtremeDSP Development Kit-IV
2.2 Fixed Point
2.3 DSP48
2.4 FIR Filters
3 LABEL PROCESSOR METHODS
3.1 THE FILTER METHOD
3.1.1.1 MATHEMATICAL MODEL
3.1.1.2 FPGA MODEL
3.1.1 Matlab Simulation Filter Method
3.1.2 Rising and falling time study
3.1.2.1 Selected parameters of the BPF and LPF
3.1.3 Experimental set-up filter method
3.1.3.1 PERFORMANCES OF THE LABEL PROCESSOR
3.1.3.1.1 OSNR TOLERANCE
3.1.3.1.2 CROSSTALK TOLERANCE
3.1.3.1.3 DYNAMIC RANGE
3.1.3.1.4 SCALABILITY TEST
3.1.3.2 INVESTIGATION OF THE LABEL PROCESSOR INCLUDING THE PAYLOAD
3.1.3.2.1 Experiment 1 without carving
3.1.3.2.2 Experiment 2 with carving
3.1.3.2.3 Experiment 3 1xN optical packet switch
3.1.4 Summary of the filter method
3.2 THE MIXER METHOD
3.2.1.1 MATHEMATICAL MODEL
3.2.1.2 FPGA MODEL
3.2.1 Matlab simulation Mixer Method
3.2.2 Rising and falling time study
3.2.2.1 Selected parameters of the LPF
3.2.3 Experimental set-up mixer method
3.2.3.1 PERFORMANCE OF THE LABEL PROCESSOR
3.2.3.1.1 OSNR TOLERANCE
3.2.3.1.2 CROSSTALK TOLERANCE
3.2.3.1.3 DYNAMIC RANGE
3.2.3.1.4 SCALABILITY
3.2.4 Summary of the mixer method
4 COMPARISON FILTER VS. MIXER METHODS
5 VIRTEX 4 VS. VIRTEX 6
6 RECOMMENDATION
7 CONCLUSION
8 REFERENCES
9 APPENDIX
ACRONYMS
ACKNOWLEDGEMENTS
 
  
 
GRAPHICS CONTENTS 
Chart 1 A) Rise Time B) #Channels Experiment 1 Filter Method
Chart 2 A) Rise Time B) #Channels Experiment 2 Filter Method
Chart 3 A) Rise Time B) #Channels Experiment 3 Filter Method
Chart 4 A) Rise Time B) #Channels Experiment 4 Filter Method
Chart 5 Transfer Function Modulator
Chart 6 Transfer Function O/E Receiver 1.25 Gbit/s
Chart 7 Minimum Voltage For Detection Filter Method
Chart 8 Minimum SNR For Detection
Chart 9 Eye Dynamics Filter Method
Chart 10 Dynamic Range Filter Method
Chart 11 Optical Switch Transfer Function
Chart 12 A) Rise Time B) #Channels Experiment 1 Mixer Method
Chart 13 A) Rise Time B) #Channels Experiment 2 Mixer Method
Chart 14 Minimum Voltage For Detection Mixer Method
Chart 15 Minimum SNR For Detection Mixer Method
Chart 16 Eye Dynamics Mixer Method
Chart 17 Dynamic Range Mixer Method
  
Figure 1 Data Center Architecture (Google Picture)
Figure 2 NxN Switch
Figure 3 Fixed-Point Notation Qm.F
Figure 4 Diagram Of The Semi-Parallel FIR Filter
Figure 5 Threshold Decision For Type Of FIR Filter Implementation Virtex Family
Figure 6 Photonic Switch
Figure 7 Optical Packet Generation With In-Band Labels
Figure 8 Wavelength Extraction Process
Figure 9 FPGA Label Processor
Figure 10 Filter Method Mathematical Model
Figure 11 Simulink Model Filter Method
Figure 12 Nyquist Sampling Theorem - Frequency Domain
Figure 13 Nyquist Sampling Theorem - Time Domain
Figure 14 Array Band-Pass Filter Implemented With FIR Compiler Tool
Figure 15 A) Schematic Circuit Implementation Of The Decoder B) Decoder Truth Table
Figure 16 Error Correction Simulink Model
Figure 17 Digital I/O Diagram
Figure 18 Label Generation FPGA
Figure 19 Label Generation + Label Processor Filter Method
Figure 20 Simulation Stage After The Envelope Detection (X Axis Time, Y Axis Voltage)
Figure 21 Seven Different Outputs From The Decoder (X Axis Time, Y Axis Voltage)
Figure 22 The Simulation Estimates 520 ns Of Delay (X Axis Time, Y Axis Voltage)
Figure 23 Simulink Model For Rising And Falling Time Study
Figure 24 Amplitude Tone Behavior
Figure 25 Experimental Set-Up With Payload
Figure 26 Spectrum Of The 160 Gbit/s Packets With Carving (Y Axis dB, X Axis nm)
Figure 27 Decoder Switching Seven Different Channels
Figure 28 Mixer Model
Figure 29 Mixer Method Simulink Model
Figure 30 Simulink Model Experiment Phase
Figure 31 RAM Implementation - A) 15 MHz B) 30 MHz Sinusoidal Signals
Figure 32 Label Generator Mixer Method
Figure 33 Label Generation + Label Processor Mixer Method
Figure 34 Envelope Detection Matlab Simulation (X Axis Time, Y Axis Voltage)
Figure 35 Decoder Matlab Simulation Mixer Method (X Axis Time, Y Axis Voltage)
Figure 36 System Delay On Matlab Simulation Mixer Method (X Axis Time, Y Axis Voltage)
Figure 37 Simulink Model For Rise Time Study Mixer Method
Figure 38 A) Hilbert Transform For Envelope Detection B) Subsystem Normalizes The Signal
 
Picture 1 XtremeDSP Development Kit-IV Board
Picture 2 Effect Of The MACC FIR On The Sampling Frequency
Picture 3 Filter Implementation A) Matlab Band-Pass Filter Shape B) FPGA Implementation
Picture 4 A) Frequency Domain After The Band-Pass Filter B) Time Domain After The BPF
Picture 5 A) Frequency Domain After The Low-Pass Filter B) Time Domain After The LPF
Picture 6 Error Bit On The Purple Signal
Picture 7 Error Correction Implemented
Picture 8 A) Schematic Experimental Set-Up B) Experimental Set-Up
Picture 9 AWG - 15 Tones With A 500 ns Pulse At 2 Vpp
Picture 10 Equal Power Distribution A) 3 Tones B) 30 Tones
Picture 11 OSNR Measurement A) 49 dB, B) 30 dB, C) 20 dB
Picture 12 FPGA Input Signal Filter Method
Picture 13 Signal Distortion Non-Linear Modulator
Picture 14 A) Spectrum 50 Tones B) Detection 50 Tones
Picture 15 A) Spectrum 60 Tones B) Error Detection 60 Tones
Picture 16 Spectrum Of The 160 Gbit/s Packets
Picture 17 Spectrum 160 Gbit/s Without Carving
Picture 18 A) Output 1 Optical Switch B) Output 2 Optical Switch
Picture 19 A) Correct Output B) Phase Problem When Overflow
Picture 20 Spectrum After Mixing With 30 MHz Local Oscillator
Picture 21 A) Effect Of Insufficient Low-Pass Filter B) Proper LPF
Picture 22 FPGA Input Signal Mixer Method
Picture 23 Virtex 6 Device
 
 
Table 1 Technical Specification Of The XtremeDSP Development Kit-IV
Table 2 Fixed-Point Implementation
Table 3 Device Utilization Summary - DSP48 85%
Table 4 Optimization Filter Method Measurement
Table 5 Comparison 3 Tones With Payload Vs Without Payload (No Carving)
Table 6 Comparison 8 Tones With Payload Vs Without Payload (No Carving)
Table 7 Measurement Using Payload (Carving)
Table 8 Filter Method Summary
Table 9 Output Frequencies After The Mixer
Table 10 Optimization Of Mixer Method Measurement
Table 11 Mixer Method Summary
Table 12 Comparison Filter Method Vs Mixer Method
Table 13 Comparison Virtex Family (5), (6)
Table 14 Estimation Table
 
 
 
 
  
 
ABSTRACT 
 
Growing Internet demand is driving the creation of faster and more efficient networks. At present, the servers that hold the major part of the cloud are grouped by the thousands inside clusters. A cluster is formed by server racks and by the switches that connect all the racks together. A rack usually has one switch and several servers. These networks are interconnected in a fat-tree topology, where each layer has a different hierarchy level. All the servers in a rack are connected together through the rack switch, which also provides a reduced number of uplink ports that connect it to a higher layer of the hierarchy. This is the main problem of the tree topology, because it creates a bandwidth bottleneck. Moreover, different connections in the fat-tree topology experience different latencies. 
 
The objective of the thesis is to investigate and realize a fast and scalable switch control subsystem as part of the implementation of a 1000x1000 optical switch that flattens today's hierarchical network topology. As a result, the cluster will have lower latency in its internal communications. In order to control the I/O ports of an optical switch interconnecting thousands of servers, the switch control requires a fast label processor capable of processing the labels on a time scale of a few nanoseconds. Therefore, a new labeling technique that allows parallel, on-the-fly operation is introduced. This technique avoids the clock recovery and serial-to-parallel circuits within the label processor, which introduce a large delay. The new labeling technique uses one optical wavelength that carries N subcarriers in band with the payload's spectrum. Each subcarrier is a different RF tone, and by combining them 2^N addresses can be obtained, which makes the method highly scalable. The great advantage is that the labels are processed in parallel. 
  
This thesis implements two label processors on an FPGA. Two different methods were used to extract the binary information carried by each RF tone, process it and decode the packet addresses. The first method filters the tones with band-pass filters centered at the frequency of each tone. The other method demodulates the RF tones using an array of RF mixers. Both methods were tested in the lab with a 160 Gbit/s payload and were able to detect up to 50 tones, with total delays of 470 ns and 370 ns, respectively. These results were obtained with an FPGA that has since been discontinued. However, the results allow us to estimate the potential of these methods. The advantage is that the algorithm is completely modular and easy to scale up. If a state-of-the-art FPGA is employed, we estimate that up to 60 tones can be processed with a total delay of 30 ns, which means an approximately 10x faster processing time. A higher number of tones means a higher number of addressable servers, and a shorter processing time reduces the cluster's latency. 
 
 
 
 
 
 
1 INTRODUCTION 
 
Cloud services are driving the creation of data centers that hold tens to hundreds of thousands of servers in order to manage the amount of data produced every day. A data center is organized in several racks, as illustrated in Figure 1. The servers inside each rack are connected to each other through a rack switch. This device manages the ports that interconnect the servers and provides the uplink ports that connect the rack with the others. All the racks are connected to a switch of a higher hierarchy level called the cluster switch. All of these switches together form a cluster. A data center has several clusters interconnected by a switch of an even higher hierarchy level. The topology created in this way is called a tree topology. The problem of this architecture is the bottleneck created at each layer. The rack switch manages the connections inside the rack, supplying M ports for the M servers that compose it, and also provides a reduced number of uplink ports to connect the rack to the cluster switch; this is the point where the bottleneck is generated. As a consequence, connections between servers in different racks suffer higher latency. 
 
FIGURE 1 DATA CENTER ARCHITECTURE (GOOGLE PICTURE) 
By reducing the number of layers in the tree topology, the latency can be reduced. To implement this, an optical switch with a large number of ports is needed. The idea is to connect the N clusters of a data center using an NxN switch, so that any server can connect to any other server inside the data center with low latency, as shown in Figure 2.

FIGURE 2 NXN SWITCH 
The distance between clusters ranges from a few tens of meters to hundreds of meters. Therefore, optical technology is employed to transport the data packets at a high data rate. An optical packet consists of two main parts: the payload, which carries the actual data, and the optical label, which provides the packet's destination. Typically, optical switches extract the labels from the payload. The optical labels are converted into the electrical domain, while the payload is kept in the optical domain. This is because the intelligence needed to process the labels cannot, at present, be implemented with optical components. 
Generally, the routing information is sent serially. Thus, the label processor must implement a serial-to-parallel circuit, which includes a clock recovery circuit; this process takes a certain number of clock cycles. As a consequence, the first stage of the label processor adds delay without processing any information. 
Moreover, if all the servers are connected to one switch, the address length in bits increases; if the labels are sent serially, even more delay is needed to process them. For that reason, a different technique must be used to encode the addresses and to avoid the serial-to-parallel conversion. 
The labeling technique used in this work consists of adding one wavelength inside the payload's spectrum. This wavelength carries N subcarriers; each subcarrier represents one binary digit of the address, giving 2^N addresses. The N subcarriers are sent in parallel, so they can be processed in parallel by the FPGA without the need for a clock recovery circuit or a serial-to-parallel converter. 
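The mapping between tones and addresses can be illustrated with a short Python sketch. It assumes that tone k carries bit k of the address (tone present = 1, tone absent = 0); the bit ordering and the function names are illustrative assumptions, not the encoding used in the experiments.

```python
from typing import List

# Hypothetical mapping: tone k carries bit k of the address
# (tone present = 1, tone absent = 0), least significant bit first.


def address_to_tones(address: int, n_tones: int) -> List[int]:
    """On/off state of each of the N RF subcarriers for a given address."""
    if not 0 <= address < 2 ** n_tones:
        raise ValueError("address does not fit in the available tones")
    return [(address >> k) & 1 for k in range(n_tones)]


def tones_to_address(tones: List[int]) -> int:
    """Rebuild the address from the detected tone states.

    All N bits are available at the same time, so no clock recovery or
    serial-to-parallel conversion is needed before this step."""
    return sum(bit << k for k, bit in enumerate(tones))


if __name__ == "__main__":
    n = 3
    print(2 ** n)                        # addresses available with 3 tones: 8
    print(address_to_tones(5, n))        # [1, 0, 1]
    print(tones_to_address([1, 0, 1]))   # 5
```

Every additional tone doubles the number of addresses, which is why the scheme scales well with the number of subcarriers.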
The scalability of the system in number of addresses is limited by the intrinsic boundaries of the electrical system: the FPGA requires a minimum SNR to correctly detect the N tones. Moreover, since the FPGA is connected to an optical system, it is also limited by the OSNR. For this reason, the FPGA was tested in different scenarios to characterize its behavior at different SNR and OSNR values and to measure the dynamic range of the device. 
1.1 THESIS OBJECTIVE 
The objective of the thesis is to extract each label, process it independently and decode the address it carries. Since the signal arrives at the label processor in parallel, the philosophy of the project is to parallelize the electrical processing as much as possible. For that reason, an FPGA (Field-Programmable Gate Array) is used. The FPGA provides a versatile tool to implement high-speed parallel algorithms. 
The FPGA offers, in a small area, the performance needed to decode the addresses. It also has the advantage that the signal is digitized, so more routing logic, such as packet contention resolution, can be implemented. 
Because the signal arrives as an array of RF tones, two methods are implemented to process the information. The first one places a band-pass filter on each label tone. The other one is based on an array of mixers. Both techniques are designed in a modular way, which simplifies further scaling. In addition, the two methods are experimentally evaluated using an optical set-up including 160 Gbit/s payloads. 
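The difference between the two approaches can be illustrated with a floating-point Python sketch of one label slot. Everything in it is an assumption chosen for illustration, not a parameter of the FPGA design: the 100 MHz sampling rate, the three tone frequencies, the 64th-order Hamming-windowed band-pass FIR filters, the 20-sample moving average used as low-pass filter, the thresholds and the helper names. The mixer branch also assumes a quadrature (I/Q) local oscillator, so that the detected amplitude does not depend on the unknown phase of each tone.

```python
import math

FS = 100e6                   # assumed sampling rate (Hz)
TONES = [10e6, 25e6, 40e6]   # assumed RF label tones (Hz), one bit each
N = 400                      # assumed samples per label slot
L = 20                       # moving-average length used as low-pass filter


def label_signal(bits, phases):
    """Sampled label: the RF tones whose bit is 1, each with its own phase."""
    return [sum(b * math.cos(2 * math.pi * f * n / FS + p)
                for b, f, p in zip(bits, TONES, phases))
            for n in range(N)]


def lpf_last(x, length=L):
    """Boxcar (moving-average) low-pass filter, evaluated at the last sample."""
    return sum(x[-length:]) / length


def bandpass_taps(f_lo, f_hi, order=64):
    """Hamming-windowed sinc FIR band-pass filter with order + 1 taps."""
    def sinc(v):
        return 1.0 if v == 0 else math.sin(math.pi * v) / (math.pi * v)
    lo, hi = f_lo / FS, f_hi / FS                 # normalized cut-offs
    taps = []
    for n in range(order + 1):
        m = n - order / 2                         # centre the impulse response
        ideal = 2 * hi * sinc(2 * hi * m) - 2 * lo * sinc(2 * lo * m)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / order)
        taps.append(ideal * window)
    return taps


def fir(x, taps):
    """Causal direct-form FIR filtering."""
    return [sum(h * x[n - i] for i, h in enumerate(taps) if n - i >= 0)
            for n in range(len(x))]


def filter_method(x, half_bw=4e6, threshold=0.25):
    """Band-pass filter per tone, then rectify + low-pass (envelope), then threshold."""
    bits = []
    for f in TONES:
        y = fir(x, bandpass_taps(f - half_bw, f + half_bw))
        envelope = lpf_last([abs(v) for v in y])  # about 2/pi for a unit tone
        bits.append(int(envelope > threshold))
    return bits


def mixer_method(x, threshold=0.5):
    """Mix with an I/Q local oscillator per tone, low-pass, take magnitude, threshold."""
    bits = []
    for f in TONES:
        i_mix = [v * math.cos(2 * math.pi * f * n / FS) for n, v in enumerate(x)]
        q_mix = [v * math.sin(2 * math.pi * f * n / FS) for n, v in enumerate(x)]
        # mixing halves the amplitude, so scale the magnitude by 2
        amplitude = 2 * math.hypot(lpf_last(i_mix), lpf_last(q_mix))
        bits.append(int(amplitude > threshold))
    return bits


if __name__ == "__main__":
    x = label_signal([1, 0, 1], phases=[0.3, 1.1, 2.0])
    print("filter method:", filter_method(x))
    print("mixer method: ", mixer_method(x))
```

The 20-sample moving average is convenient here because, at the assumed sampling rate, every unwanted mixing product lies at a multiple of 5 MHz (100 MHz / 20) and therefore averages to zero over the window; a real design would dimension the low-pass filter from the actual tone spacing.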
 
1.2 THESIS STRUCTURE 
The thesis is organized in five chapters. The first chapter presents a tutorial on the installation and operation of the FPGA XtremeDSP kit for Virtex-4 and provides the technical specifications of the FPGA employed in the thesis. The theoretical key points of the implementation on the FPGA board are also explained in this chapter. The second chapter deals with the two label processing methods developed and tested in this thesis: the Filter method and the Mixer method. This chapter provides the simulations, the implementation and the experimental verification of both methods. The third chapter compares the two previously validated methods. Finally, the fourth chapter summarizes the thesis and lists the main conclusions, and the fifth chapter provides future outlooks, including the potential scalability of the two methods if a state-of-the-art FPGA board is employed. 
 
 
 
  
 
2 FPGA 
 
A Field-Programmable Gate Array (FPGA) is an integrated circuit that can be reprogrammed. The device can implement any logic function using components called "logic blocks". By wiring these blocks together, the algorithm designed by the user is implemented. 
The FPGA configuration is generally specified using a Hardware Description Language (HDL), which allows the designer to describe the algorithm in a high-level language and to modify and optimize it later by reprogramming the device; this gives the FPGA a flexibility advantage over an Application-Specific Integrated Circuit (ASIC). The disadvantage of the FPGA is that it works with an array of generic logic blocks that are less optimal than the dedicated circuitry of an ASIC. (Xilinx, Support: Virtex-4 FPGA User Guide, 2008) 
Once the design has been simulated using a test bench, which helps prevent possible errors, the algorithm is configured on the FPGA by means of a bitstream. Each FPGA has a different internal structure, and for that reason each device has its own bitstream format. When a bitstream is generated, the model and brand of the FPGA must be specified, so that the clock, the I/O pins and the logic block connections are properly implemented. For that reason, each vendor uses its own software to ensure compatibility between the bitstream and the FPGA. 
The FPGA used in this project is the Virtex-4 User FPGA included in the XtremeDSP Development Kit-IV from Xilinx and Nallatech. Applications for the board can be developed directly from the MATLAB environment using System Generator, the Xilinx block library for Simulink. Nallatech System Software is also provided, together with the FUSE Probe Tool, to accelerate and simplify the configuration of the User FPGA. See Appendix 1 for an implementation tutorial. 
2.1 XTREMEDSP DEVELOPMENT KIT-IV 
The XtremeDSP Development Kit-IV serves as a platform for the Xilinx Virtex-4 FPGA and provides the tools needed for Digital Signal Processing (DSP) applications. See Picture 1. 
 
PICTURE 1 XTREMEDSP DEVELOPMENT KIT-IV BOARD 
 
The XtremeDSP Development Kit-IV contains two FPGAs, a Virtex-II and a Virtex-4. The clock, the I/O connections and the interface with the host PC (through a Spartan-II) are managed by the Virtex-II; all of these processes are transparent to the user. The Virtex-4, on the other hand, is the user FPGA that implements the developed algorithm (Xilinx, Support: XtremeDSP Development Kit-IV User Guide, 2005). 
TECHNICAL SPECIFICATION
VIRTEX II
• One USB interface managed by a Spartan II, which provides the communication between the user FPGA and the host PC: the bitstream file is loaded into the FPGA through the USB connection
• One Adjacent Bus Digital I/O Header, a 28-bit general-purpose bus that can be programmed to connect to inputs/outputs of the user FPGA
• Internal 105 MHz crystal clock
VIRTEX 4
• One external clock input via MCX connector, which allows a suitable external signal to be used on board as the clock signal
• Two DAC outputs via MCX connectors; each DAC expects offset-binary input with 14-bit resolution (fixed point with 13 fractional bits) and returns an analog output between -2.15 V and 1.56 V
• Two ADC inputs via MCX connectors; the ADC input is recommended to be between -1.15 V and 1.25 V to obtain full-scale output; each ADC returns a two's complement output with 14-bit resolution (fixed point with 13 fractional bits)
• 96x40 logic block array, 34,560 logic cells
• 240 Kb of distributed RAM
• 192 XtremeDSP slices (DSP48)
• 3,456 Kb of block RAM
TABLE 1 TECHNICAL SPECIFICATION OF THE XTREMEDSP DEVELOPMENT KIT-IV
2.2 FIXED POINT 
The digital signals inside the Virtex 4 are expressed in Fixed Point. The term "Fixed-Point" 
refers to the position of the binary point. The binary point is analogous to the decimal point of a 
base-ten number, but since this is binary rather than decimal, a different term is used. In binary, 
bits can be either 0 or 1, and therefore we don't have a separate symbol to designate where the 
binary point lies. However, we imagine, or assume, that the binary point resides at a fixed 
location between designated bits in the number.  
For instance, in a 32-bit number, we can assume that the binary point exists directly between bits 
15 (15 because the first bit is numbered 0, not 1) and 16, giving 16 bits of whole numbers, and 
16 bits for the fractional part. Note that the most significant bit in the whole number field is 
nearly always designated as the sign bit.  
 
In the Qm.f notation, m refers to the integer part and f to the number of fractional bits; the first (most significant) bit is used as the sign bit. See Figure 3 for an example.
 
Q2.3 
 
1 0 . 1 0 1 
FIGURE 3 FIXED-POINT NOTATION QM.F 
Not all numbers can be represented exactly by a fixed-point number, and so the closest 
approximation is used.  
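Table 2 shows how a decimal number is quantized with this method. In Table 2, as in System Generator, the first number of the format is the total word length: Q14.13 is the 14-bit ADC word with 13 fractional bits, which cannot hold 1.5 and therefore saturates to its largest value, whereas Q15.13 represents 1.5 exactly.

Decimal number   Format   Equation    Fractional    Round   Integer   Result
0.5              Q14.13   0.5*2^13    4096          4096    0         0.5
1/3              Q14.13   1/3*2^13    2730.666667   2731    0         0.3333740234375
1.5              Q14.13   1.5*2^13    12288         12288   0         0.9998779296875
1.5              Q15.13   1.5*2^13    12288         12288   1         1.5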
TABLE 2 FIXED-POINT IMPLEMENTATION 
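The quantization in Table 2 can be reproduced with a short Python sketch. It assumes round-to-nearest and saturation on overflow (System Generator also offers truncation and wrap-around; the values in Table 2 are consistent with rounding and saturation), and `quantize` is an illustrative name, not a Xilinx function.

```python
import math


def quantize(x: float, word_bits: int, frac_bits: int) -> float:
    """Quantize x to a signed fixed-point word: round to nearest, then saturate.

    word_bits is the total word length including the sign bit, as in
    Table 2 (Q14.13 is a 14-bit word with 13 fractional bits).
    """
    scale = 1 << frac_bits                  # 2^f, e.g. 2^13 = 8192
    code = math.floor(x * scale + 0.5)      # "Fractional" -> "Round" columns
    lo = -(1 << (word_bits - 1))            # most negative two's complement code
    hi = (1 << (word_bits - 1)) - 1         # most positive code (8191 for 14 bits)
    code = max(lo, min(hi, code))           # saturate instead of wrapping around
    return code / scale                     # "Result" column


for value, word, frac in [(0.5, 14, 13), (1 / 3, 14, 13), (1.5, 14, 13), (1.5, 15, 13)]:
    print(f"{value:.6f} in Q{word}.{frac} -> {quantize(value, word, frac)}")
```

With wrap-around instead of saturation, 1.5 in Q14.13 would become -0.5, which is why saturation is usually preferred for signal paths.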
Because the position of the binary point is entirely conceptual, the logic for adding and 
subtracting fixed-point numbers is identical to the logic required for adding and subtracting 
integers. Thus, when adding one half plus one half in Q3.4 format, we would expect to see:  
0000.1000 + 0000.1000 = 0001.0000 
which is equal to one, as we would expect.
This applies equally to subtraction. In other words, when we add or subtract fixed-point numbers, the binary point in the sum (or difference) is located in exactly the same place as in the two operands.
When multiplying two 8-bit fixed-point numbers, 16 bits are needed to hold the product. Since the result has a different number of bits than the inputs, the binary point should be expected to move. It works exactly the same way in binary as in decimal: the number of fractional bits of the product is the sum of the fractional bits of the two operands. (Xilinx, Support: System Generator for DSP, 2008)
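A minimal Python sketch of the two rules above, working directly on integer codes: addition keeps the binary point, while the fractional bits of a product are the sum of those of its operands. The helper names are hypothetical.

```python
def to_code(x: float, frac_bits: int) -> int:
    """Integer code of x with frac_bits fractional bits (x assumed exactly representable)."""
    return round(x * (1 << frac_bits))


def to_value(code: int, frac_bits: int) -> float:
    """Real value of an integer code with frac_bits fractional bits."""
    return code / (1 << frac_bits)


F = 4  # fractional bits of the 8-bit Q3.4 operands used in the text

# Addition: both operands share the binary point, so integer addition is enough.
half = to_code(0.5, F)            # 0000.1000 -> code 8
total = half + half               # code 16   -> 0001.0000
print(format(total, "08b"), to_value(total, F))    # prints: 00010000 1.0

# Multiplication: two 8-bit codes give a product of up to 16 bits whose
# fractional bits are F + F = 8, i.e. the binary point moves.
a, b = to_code(1.5, F), to_code(2.25, F)
product = a * b
print(product, to_value(product, F + F))           # prints: 864 3.375
```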
 
2.3 DSP48 
The DSP48 slice has a 48-bit output and is primarily intended for use in digital-signal processing 
applications. 
 
A basic DSP48 slice consists of a multiplier followed by an adder. The multiplier accepts two signed 18-bit samples and produces a 36-bit signed result, which is sign-extended to 48 bits.
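The arithmetic of one DSP48 multiply-add can be sketched behaviourally in Python. This is an idealised model (pipeline registers, OPMODE and the cascade ports are ignored), and the function names are illustrative only.

```python
def wrap_signed(value: int, bits: int) -> int:
    """Keep the low `bits` bits of value and read them as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def dsp48_multiply_add(a: int, b: int, c: int = 0) -> int:
    """Idealised DSP48 step: signed 18 x 18 multiply, result sign-extended
    to 48 bits and added to c in a 48-bit adder."""
    limit = 1 << 17
    if not (-limit <= a < limit and -limit <= b < limit):
        raise ValueError("a and b must be signed 18-bit integers")
    product = wrap_signed(a * b, 36)     # an 18 x 18 product always fits in 36 bits
    return wrap_signed(product + c, 48)  # a Python int is already sign-extended


print(dsp48_multiply_add(-(1 << 17), -(1 << 17)))  # largest product, 2**34
print(dsp48_multiply_add(3, -4, 10))               # -12 + 10 = -2
```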
 
2.4 FIR FILTERS 
Using the Xilinx System Generator toolbox for MATLAB, the Virtex 4 can implement a FIR filter with coefficients designed in the MATLAB FDATool, which reduces the design time.
The Virtex family can implement three types of filters: the MACC FIR filter, the parallel FIR filter and the semi-parallel FIR filter.
• The MACC FIR filter has a number of taps corresponding to the order of the filter and processes them sequentially. Its advantage is that it uses few multipliers, which saves DSP48 slices, a limited resource. Its disadvantage is that the maximum input sample rate, and therefore the rate of ADC samples it can process, decreases with the number of taps:
Maximum Input Sample Rate = Clock Speed / Number of Taps

The operational bandwidth therefore decreases as the order of the filter increases, as shown in Picture 2. The filter is centered at 10 MHz and the input is a 10 MHz sinusoid. The upper trace is the filtered output, which shows that the signal is being blocked; the lower trace, which does not pass through the filter, is converted to a frequency of 1 MHz. This is a consequence of the reduced sample rate.
 
PICTURE 2 EFFECT OF THE MACC FIR ON THE SAMPLING FREQUENCY 
 
In the parallel FIR filter, all the required multiplications and additions occur simultaneously. However, the memory bandwidth increases dramatically because all N coefficients must be processed at the same time. The performance of the parallel FIR filter is:
Maximum Input Sample Rate = Clock Speed
 
The semi-parallel FIR filter works similarly to the fully parallel one, but it combines registers and DSP48 slices: the coefficients are stored in the registers instead of in the DSP48. The registers allow a slower FPGA to implement high-performance filters.
Maximum Input Sample Rate = (Clock Speed / Number of Coefficients) x Number of Multipliers
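The three throughput formulas can be compared with a small Python helper; the 100 MHz clock, 16 coefficients and 4 multipliers are example values only, not the configuration used in this thesis.

```python
def macc_rate(clock_hz: float, taps: int) -> float:
    """MACC FIR: one multiply-accumulate per tap and clock cycle."""
    return clock_hz / taps


def parallel_rate(clock_hz: float) -> float:
    """Fully parallel FIR: one new input sample every clock cycle."""
    return clock_hz


def semi_parallel_rate(clock_hz: float, coefficients: int, multipliers: int) -> float:
    """Semi-parallel FIR: each multiplier handles coefficients / multipliers taps."""
    return clock_hz / coefficients * multipliers


clock = 100e6  # example 100 MHz clock
print(macc_rate(clock, 16) / 1e6, "MHz")
print(parallel_rate(clock) / 1e6, "MHz")
print(semi_parallel_rate(clock, 16, 4) / 1e6, "MHz")
```

With as many multipliers as coefficients, the semi-parallel formula reduces to the parallel one.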
 
Each multiplier has its own register, which stores several coefficients in a vector. The structure synchronizes the register with the multiplier: the DSP48 reads a coefficient, multiplies it by the sample, moves the register address one position, multiplies the next coefficient, and so on.
The advantage is that fewer multipliers than coefficients are needed, which reduces the required resources. The input rate is reduced accordingly, which makes the synchronization between the register and the DSP48 possible. See Figure 4.
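The coefficient scheduling of the semi-parallel structure can be checked against the direct-form FIR equation with a short Python simulation. It models only the arithmetic: each multiplier owns a bank of consecutive coefficients and handles one of them per clock cycle, and a final accumulator adds the partial results. Pipeline delays and SRL16E addressing are not modeled, and the function names are hypothetical.

```python
from typing import List


def fir_direct(x: List[float], h: List[float]) -> List[float]:
    """Reference: y[n] = sum_i h[i] * x[n - i] (samples before x[0] are zero)."""
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
            for n in range(len(x))]


def fir_semi_parallel(x: List[float], h: List[float], multipliers: int) -> List[float]:
    """Semi-parallel schedule: multiplier m owns coefficients
    h[m*K:(m+1)*K] and processes one of them per clock cycle."""
    if len(h) % multipliers:
        # Pad with zeros when the coefficient count is not divisible.
        h = h + [0.0] * (multipliers - len(h) % multipliers)
    per_bank = len(h) // multipliers                 # K = clock cycles per output
    banks = [h[m * per_bank:(m + 1) * per_bank] for m in range(multipliers)]
    y = []
    for n in range(len(x)):
        acc = 0.0                                    # final accumulator, reset per output
        for cycle in range(per_bank):                # one coefficient per cycle ...
            for m, bank in enumerate(banks):         # ... in every multiplier at once
                i = m * per_bank + cycle             # tap index handled by multiplier m
                if n - i >= 0:
                    acc += bank[cycle] * x[n - i]
        y.append(acc)
    return y


x = [1, 0, -1, 2, 3, -2, 0, 1, 4]
h = [1, 2, 3, 4, 5, 6, 7, 8]
print(fir_semi_parallel(x, h, 4) == fir_direct(x, h))
```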
 
 
FIGURE 4 DIAGRAM OF THE SEMI-PARALLEL FIR FILTER 
[Figure 4 reproduces pages of Xilinx UG073 (v2.7, May 15, 2008), XtremeDSP for Virtex-4 FPGAs, Chapter 5: Figure 5-2, Four-Multiplier, Semi-Parallel FIR Filter in Accumulation Mode (four cascaded DSP48 multiply-add slices, each fed by an SRL16E coefficient memory of 4 x 18 coefficients, followed by a fifth DSP48 slice that accumulates the partial results); Figure 5-3, Detailed Diagram of a Single Multiply-Add Element; and Figure 5-6, Coefficient Memory Arrangement (coefficients h0-h3, h4-h7, h8-h11 and h12-h15 drive the first to fourth DSP48 slices).]
 
The semi-parallel filter can emulate a fully parallel filter by using as many registers (and therefore multipliers) as there are coefficients.
 
FIGURE 5 THRESHOLD DECISION FOR TYPE OF FIR FILTER IMPLEMENTATION VIRTEX FAMILY  
 
The ADC of the development kit samples at 100 MHz, which means that filters of order 10 or above must be implemented as semi-parallel FIR filters (see Figure 5).
The three types of implementation all compute the general FIR filter equation:

$$y_n = \sum_{i=0}^{N-1} x_{n-i}\, h_i$$

that is, each output sample y_n is the sum of the N most recent input samples x_{n-i} multiplied by the coefficients h_i. As explained above, the coefficients are expressed in fixed point, so after implementation on the FPGA each coefficient h_i only approximates its designed value. Therefore, the overall shape of the filter is preserved, but it is not exact. (Xilinx, Support: XtremeDSP for Virtex-4 FPGAs, 2008)
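To illustrate how coefficient quantization changes the response, the Python sketch below quantizes a hypothetical 3-tap moving average (not one of the filters of this thesis) to 13 fractional bits, the precision of Q14.13 in Table 2. The coefficient width used by the FIR Compiler is configurable, so the numbers are only indicative.

```python
import cmath
import math


def quantize_coefficients(h, frac_bits):
    """Round each coefficient to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return [math.floor(c * scale + 0.5) / scale for c in h]


def gain(h, f_hz, fs_hz):
    """Magnitude response |H(f)| of an FIR filter with coefficients h."""
    w = 2 * math.pi * f_hz / fs_hz
    return abs(sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(h)))


h = [1 / 3] * 3                       # designed DC gain: exactly 1
hq = quantize_coefficients(h, 13)     # each coefficient becomes 2731/8192
print(hq[0], gain(h, 0, 100e6), gain(hq, 0, 100e6))
```

The quantized filter has a DC gain of 8193/8192 instead of 1: the shape is kept, but the values are slightly off, as stated above.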
  
[Figure 5 reproduces Figure 5-1, Selecting Filter Architectures, of Xilinx UG073 (v2.7, May 15, 2008): sample rate (MHz, log scale) versus number of coefficients N (log scale), divided into regions for sequential (MACC) FIR filters (Chapter 3), semi-parallel FIR filters (Chapter 5) and parallel FIR filters (Chapter 4).]
 
3 LABEL PROCESSOR METHODS

This chapter explains the two methods implemented in the FPGA to perform the label processor function. For both methods, the FPGA building blocks and the underlying theory are explained in detail. The experimental results obtained with both methods in laboratory trials are also presented.
Both methods are intended to process the optical labels transmitted within the input optical packets. The 1xN optical packet switch includes the label processor and the optical switch, as shown in Figure 6. This chapter focuses on the realization of the label processor.
 
FIGURE 6 PHOTONIC SWITCH
The label is generated as shown in Figure 7. Each label consists of N bits, which are represented by baseband signals. Each baseband signal modulates an RF tone, so that N tones are generated and 2^N combinations (labels) are possible. The combined RF signals modulate a laser, and the generated wavelength carries the N subcarriers. The wavelength of the label is chosen to be in-band with the payload.
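The label encoding can be sketched in Python as follows, assuming ideal on/off keying of each tone (2-PAM with levels 0 and 1) and a 100 MHz sample rate. The tone frequencies are those used later in this chapter, and `label_signal` is an illustrative name.

```python
import math
from itertools import product


def label_signal(bits, tones_hz, fs_hz, n_samples):
    """Sum of N RF tones; tone i is present when label bit b_i = 1."""
    return [sum(b * math.cos(2 * math.pi * f * n / fs_hz) for b, f in zip(bits, tones_hz))
            for n in range(n_samples)]


tones = [15e6, 25e6, 35e6]                           # N = 3 subcarriers
labels = list(product([0, 1], repeat=len(tones)))    # all 2**N bit combinations
print(len(labels))
print(label_signal((1, 0, 1), tones, 100e6, 4))      # first samples of label 101
```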
 
FIGURE 7 OPTICAL PACKET GENERATION WITH IN-BAND LABELS 
 
The generated optical packets are fed into the 1xN optical packet switch. The labels are extracted from the optical payload using a Fiber Bragg Grating (FBG), which reflects particular wavelengths and transmits all others. Figure 8 schematically shows the operation of the label extractor. By using the reflected signal, the FBG acts as an optical band-pass filter. The bandwidth of the FBG is 6 GHz, and the reflected band is centered at the wavelength of the label. Once the label has been extracted by the FBG, a photodetector converts it to the electrical domain.
 
FIGURE 8 WAVELENGTH EXTRACTION PROCESS 
The electrical signal contains all the RF tones. The label processor was implemented on an FPGA. The first stage of the FPGA algorithm isolates each RF signal and recovers the baseband signal (the label bits). Two methods were used to realize the label processor. The first one is based on an array of band-pass filters, each centered at the corresponding RF frequency; each filtered signal is then down-converted to baseband, where a low-pass filter is applied. The second one is based on an array of mixers that down-convert the labels to baseband and then filter them.
Once the baseband signal is recovered, a threshold decision is used to extract the binary information. The output of the threshold is fed into the decoder, which has N inputs and 2^N outputs. The output signals of the decoder drive the optical switch. See Figure 9.
 
FIGURE 9 FPGA LABEL PROCESSOR 
  
 
3.1 THE FILTER METHOD
The filter method applies a band-pass filter at each subcarrier. The labels are thus isolated, and an envelope detector removes the RF carrier. The modulation used is 2-PAM.
3.1.1.1 MATHEMATICAL MODEL 
 
 
FIGURE 10 FILTER METHOD MATHEMATICAL MODEL  
 
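1. Received RF tones signal:
$$S_{RF}(t) = b_1(t)\cos(\omega_1 t) + \dots + b_i(t)\cos(\omega_i t) + \dots + b_N(t)\cos(\omega_N t)$$
where $b_i(t)$, with $i = 1, \dots, N$, are the baseband label bits.

2. Band-pass filter:
$$S_{F1}(t) = b_1(t)\cos(\omega_1 t), \quad \dots, \quad S_{Fi}(t) = b_i(t)\cos(\omega_i t), \quad \dots, \quad S_{FN}(t) = b_N(t)\cos(\omega_N t)$$

3. Envelope detection:
$$S_{2Fi}(t) = S_{Fi}^2(t) = b_i^2(t)\cos^2(\omega_i t) = \frac{b_i^2(t)}{2}\left[1 + \cos(2\omega_i t)\right]$$

4. Low-pass filter and gain:
$$S_{Oi}(t) = G \cdot \frac{b_i^2(t)}{2}$$
where G is the gain.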
 
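The mathematical model shows that each tone is separated and detected in its own branch for further processing. Because the label bits only take the values 0 and 1, $b_i^2(t) = b_i(t)$, so a gain G = 2 makes $S_{Oi}(t)$ equal to the label bit $b_i(t)$.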
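(Signals in Figure 10: $S_{RF}(t)$ at the input; $S_{F1}(t)$, $S_{Fi}(t)$, $S_{FN}(t)$ after the band-pass filters; $S_{2F1}(t)$, $S_{2Fi}(t)$, $S_{2FN}(t)$ after the envelope detection; $S_{Oi}(t)$ at the output.)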
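The chain of the mathematical model can be checked numerically with a Python sketch. A mean over 20 samples (200 ns at 100 MHz), which covers an integer number of periods of the 2ω_i term for all three tones, stands in for the low-pass filter; this is a simplifying assumption, and `detect_bit` is an illustrative name.

```python
import math


def detect_bit(b: float, f_hz: float, fs_hz: float, n_samples: int, gain: float = 2.0) -> float:
    """Band-passed tone -> squaring -> ideal low-pass -> gain G."""
    s = [b * math.cos(2 * math.pi * f_hz * n / fs_hz) for n in range(n_samples)]  # S_Fi
    squared = [v * v for v in s]           # S_2Fi = (b^2 / 2) * (1 + cos 2wt)
    mean = sum(squared) / len(squared)     # low-pass filter keeps b^2 / 2
    return gain * mean                     # S_Oi = G * b^2 / 2


for bit in (0, 1):
    print(bit, round(detect_bit(bit, 15e6, 100e6, 20), 6))
```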
 
3.1.1.2 FPGA MODEL 
 
The FPGA implementation is shown in Figure 11. It implements the blocks of the mathematical model together with the decision and output stages:

1.- ADC
2.- Band-pass filter
3.- Shift
4.- Square
5.- Low-pass filter
6.- Shift
7.- AGC
8.- Threshold
9.- Decoder
10.- Error correction
11.- Digital I/O
FIGURE 11 SIMULINK MODEL FILTER METHOD 
 
1. The ADC provides a 14-bit fixed-point word with 13 fractional bits. The sampling frequency is 100 MHz, which, by the Nyquist theorem, allows an operational band of 50 MHz.
 
The Nyquist sampling theorem states that a signal x band-limited to (−B, B) is completely determined by its samples if the sampling rate ωs is greater than 2B. To avoid aliasing, B must therefore be below 50 MHz. See Figure 12.
 
FIGURE 12 NYQUIST SAMPLING THEOREM - FREQUENCY DOMAIN 
Figure 13 shows aliasing in the time domain: when the input frequency exceeds the Nyquist limit, the signal is reconstructed at a lower frequency. This effect can be modeled as follows.
 
FIGURE 13 NYQUIST SAMPLING THEOREM – TIME DOMAIN 
If ωs/2 < B < ωs, the output frequency is fout = ωs − B. Applied to the Virtex 4 board, a tone above 50 MHz, for example at 60 MHz, appears after the ADC at 100 MHz − 60 MHz = 40 MHz.
[Embedded excerpt: Xilinx UG638 (v12.4, December 14, 2010), System Generator for DSP Reference Guide, Chapter 3, "XtremeDSP Analog to Digital Converter": the ADC block produces a signed 14-bit fixed-point output with 13 fractional bits, is driven in hardware by a 14-bit AD6644 converter on the BenADDA board, and in hardware co-simulation requires a free-running clock not set higher than 64 MHz.]
 
This is why only RF tones below 45 MHz have been considered.
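The folding of out-of-band tones can be sketched in Python. The function generalizes fout = ωs − B to any input frequency by first reducing it modulo the sampling rate; `alias_frequency` is a hypothetical name.

```python
def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Apparent frequency of a real tone f after sampling at fs (folded into [0, fs/2])."""
    f = f_hz % fs_hz               # sampling cannot distinguish f from f mod fs
    return fs_hz - f if f > fs_hz / 2 else f


print(alias_frequency(60e6, 100e6) / 1e6)   # the 60 MHz example of the text
print(alias_frequency(45e6, 100e6) / 1e6)   # below the Nyquist limit: unchanged
```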
 
2. The band-pass filter is crucial for the correct operation of the concept, because it determines several factors: the maximum number of RF channels within the 45 MHz band, the delay contribution, the maximum baud rate and the crosstalk tolerance.
The filter is designed with the Matlab FDATool, which reduces the design time; once the coefficients are generated, they are loaded into the Xilinx toolbox, in this case the FIR Compiler (see Figure 14). The FIR Compiler works as a black box for the rest of the system, so one of its output signals, the ready signal, is used to synchronize it with the rest of the system. The ready signal indicates that the sample has been multiplied by all the coefficients; it is therefore used to enable a register that stores the result for the following stages.
 
FIGURE 14 ARRAY BAND PASS FILTER IMPLEMENTED WITH FIR COMPILER TOOL 
The FIR Compiler can implement the filter in different configurations; a semi-parallel FIR filter was used in this project, for the reason explained in Section 2.4 (FIR filters).
The operational band is 50 MHz, and the tones were placed at 15 MHz, 25 MHz and 35 MHz. Although there is bandwidth for more tones, each filter consumes a limited resource, in this case the DSP48 slices, and each additional channel requires another filter. The FPGA does not have the resources to implement it (see Table 3): with three channels the DSP48 utilization is already 85%, and another channel would raise it to 114%.
 
 
TABLE 3 DEVICE UTILIZATION SUMMARY - DSP48 85% 
The filters were designed with the following configuration: Hamming window, 2 MHz bandwidth measured at -6 dB (because the pulses have a duration of 500 ns), and order 30. The filters are spaced 10 MHz apart. Picture 3 shows A) the shape of the Matlab band-pass filter used and B) the FPGA implementation of the filter array.
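A band-pass filter of this kind can be sketched in Python as a Hamming-windowed sinc low-pass prototype shifted to the tone frequency by cosine modulation. This is not the FDATool design (its coefficients are not given here). With only 31 taps at 100 MHz, the passband of this sketch is several MHz wide rather than 2 MHz.

```python
import cmath
import math


def hamming_bandpass(order: int, f0_hz: float, half_bw_hz: float, fs_hz: float) -> list:
    """Band-pass FIR of the given order centered at f0_hz: a Hamming-windowed
    sinc low-pass prototype (cut-off half_bw_hz) modulated by a cosine."""
    fc = half_bw_hz / fs_hz                    # normalized cut-off (cycles/sample)
    mid = order / 2
    lowpass = []
    for n in range(order + 1):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / order)   # Hamming
        lowpass.append(ideal * window)
    dc = sum(lowpass)                          # normalize the prototype to unit DC gain
    w0 = 2 * math.pi * f0_hz / fs_hz
    return [2 * c / dc * math.cos(w0 * (n - mid)) for n, c in enumerate(lowpass)]


def gain(h: list, f_hz: float, fs_hz: float) -> float:
    """Magnitude response |H(f)| of the FIR coefficients h."""
    w = 2 * math.pi * f_hz / fs_hz
    return abs(sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(h)))


fs = 100e6
for f0 in (15e6, 25e6, 35e6):                  # one filter per RF tone
    h = hamming_bandpass(30, f0, 1e6, fs)      # order 30, nominal 2 MHz prototype
    print(f0 / 1e6, [round(gain(h, f, fs), 3) for f in (15e6, 25e6, 35e6)])
```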
 
 
A) B) 
 
PICTURE 3 FILTER IMPLEMENTATION A) MATLAB BAND-PASS FILTER SHAPE B) FPGA IMPLEMENTATION 
 
The filters work in semi-parallel mode, which means that each filter needs one clock cycle per coefficient. The clock period is 10 ns, so the filters of order 30 generate a delay of 300 ns.
The maximum transmission speed is determined by the duration of the envelope pulse and the guard time: the smaller these values, the higher the speed, which depends on the filter implementation. A narrower pulse has a larger bandwidth, and if the filter does not have a passband wide enough to accommodate the pulse, the shape of the signal changes, as explained below.
In addition, the pulse signal can be studied as an infinite sum of cosine and sine signals; in other words, from the frequency-domain point of view, as an infinite sequence of harmonics. This behavior is modeled by the Fourier series:

$$x(t) = \frac{a_0}{2} + \sum_{n=1}^{\infty}\left[a_n\cos\left(\frac{2\pi n t}{T}\right) + b_n\sin\left(\frac{2\pi n t}{T}\right)\right] \qquad (2)$$

To generate the pulse signal, the harmonics that compose it must be spaced at multiples of 1/T, T being the period of the signal and a_n, b_n (or both) being the amplitudes of harmonic n. The shape of the signal depends on the number of harmonics that compose it: the more harmonics the signal has, the more square it is, and consequently the sharper its rising and falling edges. If the signal is filtered and harmonics are removed, the rise and fall times increase, and the signal eventually approaches a sinusoidal shape.
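The effect of removing harmonics can be quantified with a Python sketch of the partial sums of equation (2) for a 0/1 square wave with 50% duty cycle, for which a_n = 0 (n ≥ 1) and b_n = 2/(πn) for odd n. The 1000 ns period (500 ns pulses) is an example value.

```python
import math


def square_partial_sum(t: float, period: float, n_harmonics: int) -> float:
    """First n_harmonics odd harmonics of a 0/1 square wave (50% duty cycle)
    whose rising edge is at t = 0."""
    total = 0.5                                   # a0 / 2
    for m in range(n_harmonics):
        k = 2 * m + 1                             # only odd harmonics are non-zero
        total += 2 / (math.pi * k) * math.sin(2 * math.pi * k * t / period)
    return total


def rise_time(period: float, n_harmonics: int, steps: int = 20000) -> float:
    """10-90% rise time. x(t) - 0.5 is odd, so x(-t) = 1 - x(t) and the rise
    time is twice the first t > 0 at which x(t) reaches 0.9."""
    dt = (period / 4) / steps
    for i in range(steps + 1):
        t = i * dt
        if square_partial_sum(t, period, n_harmonics) >= 0.9:
            return 2 * t
    raise ValueError("0.9 not reached within a quarter period")


for n in (1, 5, 25):
    print(n, "harmonics:", round(rise_time(1000.0, n), 1), "ns")
```

Fewer harmonics, as left by a narrow filter, give a visibly longer rise time.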
3. The shift block is used as an amplifier: each sample is shifted two bit positions to the left, which is equivalent to a multiplication by 4:
$$y(t) = 4\,x(t)$$
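The equivalence between a left shift and a multiplication by a power of two can be sketched in Python. The sketch assumes that the 14-bit word width is kept, so bits shifted out of the top are lost; whether the design widens the word instead is not stated in the text.

```python
def shift_left(code: int, positions: int, word_bits: int = 14) -> int:
    """Left shift of a signed word_bits code, keeping only word_bits bits."""
    mask = (1 << word_bits) - 1
    shifted = (code << positions) & mask
    return shifted - (1 << word_bits) if shifted >> (word_bits - 1) else shifted


x = 1000                         # Q14.13 code of about 0.122
print(shift_left(x, 2), 4 * x)   # equal while 4 * x still fits in 14 bits
print(shift_left(3000, 2))       # 12000 does not fit: the result wraps around
```

The amplification is therefore only safe while the shifted samples stay within the word range.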
 
4. The square block is a multiplier, as in the mathematical model, and is used to bring the signal to baseband. The multiplication increases the word length: the multiplier is a DSP48, which accepts words of at most 18 bits, so the output is 36 bits long.
The next stage in the algorithm is the low-pass filter, which is also built from DSP48 slices. Since these DSP48 slices cannot read a 36-bit word, the LSBs at the output of the square block are truncated.
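The squaring and truncation step can be sketched in Python on integer codes. The 18-bit input and output widths follow the DSP48 limits quoted above, while the exact number of LSBs removed in the design is an assumption. If the input has f fractional bits, the product has 2f and the truncated result 2f minus the number of dropped bits.

```python
def square_and_truncate(code: int, in_bits: int = 18, out_bits: int = 18) -> int:
    """Square a signed in_bits code and keep the out_bits most significant bits
    of the 2*in_bits-bit product by discarding the LSBs (truncation)."""
    limit = 1 << (in_bits - 1)
    if not -limit <= code < limit:
        raise ValueError("code does not fit in in_bits signed bits")
    product = code * code                # full product, 2 * in_bits bits wide
    dropped = 2 * in_bits - out_bits     # number of LSBs removed
    return product >> dropped            # product >= 0, so this is a plain floor


print(square_and_truncate(-(1 << 17)))   # largest square: 2**34 >> 18 = 2**16
print(square_and_truncate(1000))         # small inputs lose most of their precision
```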
 
5. The low-pass filter then removes the component at twice the tone frequency created by the multiplier. As explained in the mathematical model, its output is the original baseband signal. The pulses are distorted as described for the band-pass filter block.
 
Picture 4A shows the signal at the output of the band-pass filter centered at 15 MHz in the frequency domain. The spectrum shows the effect of the coefficient approximation explained in Section 2.4 (FIR filters): the amplitudes of the harmonics are not symmetric, as a consequence of the asymmetry of the filter shape. Picture 4B shows the time-domain signals at different stages of the label processor. The green trace is the input signal, given by

$$x(t) = \mathrm{pulse}(0, 500\,\mathrm{ns}) \cdot \left[\cos(2\pi \cdot 15\,\mathrm{MHz} \cdot t) + \cos(2\pi \cdot 25\,\mathrm{MHz} \cdot t) + \cos(2\pi \cdot 35\,\mathrm{MHz} \cdot t)\right]$$

The yellow trace is the signal after the band-pass filter centered at 15 MHz, which can be approximated as

$$y(t) \approx \mathrm{pulse}(0, 500\,\mathrm{ns}) \cdot \cos(2\pi \cdot 15\,\mathrm{MHz} \cdot t)$$

This trace shows the distortion caused by the removal of harmonics of the pulse signal. The purple trace is the signal after the decoder.
    
A)      B) 
PICTURE 4 A) FREQUENCY DOMAIN AFTER THE BAND PASS FILTER B) TIME DOMAIN AFTER THE BPF  
 
Picture 5A and Picture 5B show the signal after the low-pass filter in the frequency and time domains, respectively. The signal has been down-converted to baseband, as can also be seen in the spectrum of Picture 5A. In the time-domain picture, the green trace is the input signal. The purple trace is output channel 7 of the decoder, which corresponds to the combination of all three frequencies. The blue trace shows the 25 MHz signal after the low-pass filter, and the yellow trace shows the 15 MHz signal after the low-pass filter. The pictures also confirm the proper functioning of the envelope detection.
 
 
    
A)       B) 
PICTURE 5 A) FREQUENCY DOMAIN AFTER THE LOW PASS FILTER B) TIME DOMAIN AFTER THE LPF 
6. The shift block amplifies the signal to compensate for the factor 1/2 introduced by the squaring (see the mathematical model). Each sample is shifted one position to the left:
$$y(t) = 2\,x(t)$$
 
7. The AGC block compares each sample with a threshold: if the sample is below the threshold, it is set to zero; if it is above, it is replaced by a known fixed-point value.
The code is written in MATLAB syntax, and System Generator converts it to VHDL, which provides a powerful tool to the designer.
function z = AGC(x)
if x < xfix({xlSigned,14,13},0.05)
   z = xfix({xlSigned,14,13},0);
else
   z = xfix({xlSigned,14,13},0.99);
end
 
 
The variable x is the input signal and z is the output of the block. The function xfix returns a fixed-point value, and xlSigned denotes a signed fixed-point type, here 14 bits with 13 fractional bits (Q14.13). The first xfix call returns the threshold value 0.05. If the comparison is false (the sample is not below the threshold), the output is assigned the value 0.99.
 
8. Threshold: once the pulses have been reshaped to a known amplitude, the threshold block converts each fixed-point sample to a Boolean, which is then passed to the decoder. The code is again written in MATLAB syntax.
function z = threshold(x)
if x > xfix({xlSigned,14,13},0.5)
    z = xfix({xlBoolean},1);
else
    z = xfix({xlBoolean},0);
end
 
The output z is a Boolean because xfix is called with the xlBoolean type, so it is 1 or 0 depending on the value assigned. Since the output of the AGC block is either 0 or 0.99 in Q14.13, the threshold can be placed anywhere between these two values; here it is 0.5, expressed in Q14.13.
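For readers without System Generator, the AGC and threshold functions above can be mirrored in Python on Q14.13 integer codes. Converting the constants with rounding is an assumption (the MCode conversion may differ by one LSB), and the names are illustrative.

```python
SCALE = 1 << 13                         # Q14.13: 13 fractional bits


def to_q14_13(value: float) -> int:
    """Q14.13 integer code of value (rounded; value assumed within [-1, 1))."""
    return round(value * SCALE)


AGC_THRESHOLD = to_q14_13(0.05)         # xfix({xlSigned,14,13},0.05)
AGC_HIGH = to_q14_13(0.99)              # xfix({xlSigned,14,13},0.99)
DECISION_LEVEL = to_q14_13(0.5)         # xfix({xlSigned,14,13},0.5)


def agc(code: int) -> int:
    """Zero below the threshold, otherwise the fixed level 0.99."""
    return 0 if code < AGC_THRESHOLD else AGC_HIGH


def threshold(code: int) -> bool:
    """Boolean decision between the two AGC output levels."""
    return code > DECISION_LEVEL


samples = [to_q14_13(v) for v in (0.01, 0.04, 0.2, 0.6)]
print([threshold(agc(s)) for s in samples])   # [False, False, True, True]
```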
 
9. The decoder uses the N Boolean inputs to provide 2^N - 1 outputs (the "all zeros" input combination is not used). The decoder circuit is shown in Figure 15A. The implementation code for the decoder is in Appendix 3 and is based on the truth table presented in Figure 15B.
 
A)        B) 
 
FIGURE 15  A) SCHEMATIC CIRCUIT IMPLEMENTATION OF THE DECODER B) DECODER TRUTH TABLE 
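The truth table of Figure 15B can be expressed as a small Python function. This is an illustrative sketch, not the decoder code of Appendix 3; it assumes that the first input bit is the most significant, as in the truth table.

```python
from itertools import product
from typing import List, Sequence


def decode(bits: Sequence[int]) -> List[int]:
    """N-to-2^N one-hot decoder. Output O_k is 1 only when the input word
    equals k; the all-zeros input drives no output (O0 stays unused)."""
    k = 0
    for b in bits:                  # bits[0] is the most significant bit
        k = (k << 1) | (1 if b else 0)
    outputs = [0] * (1 << len(bits))
    if k != 0:                      # "all zeros" means no label: keep every output low
        outputs[k] = 1
    return outputs


for word in product([0, 1], repeat=3):
    print("".join(map(str, word)), decode(word))
```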
10. Error correction is needed because the pulses that arrive at the AGC stage present some jitter. When the decision takes place, the three pulses are not completely synchronized; the resulting time offset Δτ introduces a pattern different from all zeros, which the decoder converts into a short pulse, as shown in the purple trace in Picture 6. This short pulse is an error. In the worst case, the error pulse lasts less than 40 ns, that is, 4 samples, so the error correction is programmed based on this value.
Decoder truth table (Figure 15B):
Input  O0 O1 O2 O3 O4 O5 O6 O7
000    0  0  0  0  0  0  0  0
001    0  1  0  0  0  0  0  0
010    0  0  1  0  0  0  0  0
011    0  0  0  1  0  0  0  0
100    0  0  0  0  1  0  0  0
101    0  0  0  0  0  1  0  0
110    0  0  0  0  0  0  1  0
111    0  0  0  0  0  0  0  1
 
 
PICTURE 6 ERROR BIT ON THE PURPLE SIGNAL 
 
In the example shown in Picture 6, the spike in the purple trace appears because the blue trace, which represents the tone at 15 MHz, and the yellow trace, which represents the tone at 25 MHz, are not synchronized.
The error correction counts the first 4 samples of the pulse. If the fourth sample of the pulse is a 
Boolean "1", this means that the signal is not a spike (an error), and it generates a pulse of 500 
ns. Otherwise the spike is forced to a zero. This block therefore introduces 4 samples delay, 
which means a delay of 40 ns.  
The algorithm is shown in Figure 16. It is divided into two parts.

The first part, marked in blue, counts the “1” bits. When a “1” arrives at the subsystem, it first passes through a gate. If the gate is open, the “1” is counted; otherwise, a zero is sent.

The second part, marked in green, controls the gate. A “1” that passes the first stage is compared against a constant with a value of 1 to verify its value. If the comparison returns true, a counter starts and runs until it reaches a variable that defines the number of samples to be checked; in this example the number of samples is 4. When four “1s” are counted consecutively, the blue module sends an “enable” to the green module. If instead the counter receives, for example, three “1s” and one “0”, it is automatically restarted, and it is at this point that the error is corrected.

When four “1s” have been counted, the green module sends a “disable” to the gate, preventing any further “1s” from entering the subsystem, and a 500 ns pulse is generated. To generate the pulse, a counter is configured to count up to 52, since each clock cycle is 10 ns long. The counter value is compared with 51 using a “less than or equal to” block. While this comparison returns true, the pulse is generated. Once the counter reaches 52, the comparison fails, and the green module sends an “enable” to the blue module, allowing the system to receive more inputs. In the meantime, the whole system resets itself.
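The gate and counter logic of Figure 16 can be sketched as a small behavioral model. This is a minimal sketch, not the Simulink design itself. It assumes the input has already been thresholded to one bit per 10 ns clock cycle. The names `correct_spikes`, `n_check` and `pulse_cycles` are hypothetical. The regenerated pulse is set to 50 cycles (500 ns), so the 52-count setting of the Simulink counter described above is not reproduced exactly.

```python
def correct_spikes(samples, n_check=4, pulse_cycles=50):
    """Behavioral sketch of the spike-rejection gate/counter.

    samples: thresholded input bits, one per 10 ns clock cycle.
    n_check: consecutive '1' samples required before a pulse is accepted.
    pulse_cycles: length of the regenerated pulse in clock cycles
        (assumed 50 cycles = 500 ns at 10 ns per cycle).
    """
    out = [0] * len(samples)
    run = 0   # consecutive '1' samples counted while the gate is open
    busy = 0  # cycles of the output pulse still to be generated (gate closed)
    for t, s in enumerate(samples):
        if busy > 0:
            # Gate closed: inputs are ignored while the pulse is generated.
            out[t] = 1
            busy -= 1
            continue
        # A '0' restarts the counter, which discards spikes shorter than n_check.
        run = run + 1 if s == 1 else 0
        if run == n_check:
            run = 0
            busy = pulse_cycles  # the pulse starts on the next clock cycle
    return out


# A 3-sample spike is removed; a long pulse is regenerated 4 samples (40 ns) late.
print(sum(correct_spikes([0, 1, 1, 1, 0, 0, 0, 0])))  # 0
out = correct_spikes([1] * 52 + [0] * 20)
print(out.index(1), sum(out))  # 4 50
```

The 4-sample latency of the model matches the 40 ns delay stated for the block.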
 
 
FIGURE 16 ERROR CORRECTION SIMULINK MODEL 
 
The result after the error correction is shown in Picture 7: the spike was successfully corrected.
 
PICTURE 7 ERROR CORRECTION IMPLEMENTED 
 
11. The digital I/Os are assigned only as outputs; no inputs were used. The board provides the ports shown in Figure 17. Each pin has a coordinate within the FPGA array; the pin identification and the implementation tutorial are given in Appendix 2. This pin assignment is how the design knows which pin each signal must be routed to.
 
 
  
FIGURE 17 DIGITAL I/O DIAGRAM 
 
3.1.1 MATLAB SIMULATION FILTER METHOD  
 
The system was tested in the MATLAB environment before being implemented. The simulation was used to test the behavior of the algorithm and to catch possible errors.

For the simulation, a label generator was implemented. Three RAMs were used as inputs for the generator block, each one generating a different frequency tone. The three tones are mixed in different combinations in periods of 500 ns plus a guard time. The generator uses an external clock as the period time and keeps the tones synchronized by generating a trigger signal. With three tones, the outputs of the generator therefore provide 2³ = 8 possible combinations (see Figure 18).
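The label generator can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the RAM-based System Generator block:
- The sample rate is taken as 100 MS/s, one sample per 10 ns clock.
- The tone frequencies are the 15, 25 and 35 MHz used in the simulation.
- The 100 ns guard time is hypothetical, since the text does not give its value.
- The mapping of label bit k to the k-th tone is also an assumption.

```python
import numpy as np

FS = 100e6                  # sample rate: one sample per 10 ns clock (assumed)
TONES = (15e6, 25e6, 35e6)  # tone frequencies used in the simulation
PULSE_S = 500e-9            # duration of one label
GUARD_S = 100e-9            # guard time (hypothetical; not given in the text)


def generate_labels(labels):
    """Concatenate one 500 ns slot plus guard time per 3-bit label.

    Bit k of each label switches on TONES[k] (bit order is an assumption).
    """
    n_pulse = round(PULSE_S * FS)
    n_guard = round(GUARD_S * FS)
    # Every slot uses the same time base, so the tones start in phase,
    # mimicking the trigger that keeps the generated tones synchronized.
    t = np.arange(n_pulse) / FS
    slots = []
    for label in labels:
        slot = np.zeros(n_pulse + n_guard)
        for bit, f in enumerate(TONES):
            if (label >> bit) & 1:
                slot[:n_pulse] += np.sin(2 * np.pi * f * t)
        slots.append(slot)
    return np.concatenate(slots)


signal = generate_labels(range(8))  # all 2**3 = 8 tone combinations
print(signal.shape)  # (480,)
```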
 
 FIGURE 18 LABEL GENERATION FPGA   
 
FIGURE 19 LABEL GENERATION + LABEL PROCESSOR FILTER METHOD 
System Generator allows the design to be simulated with the Simulink toolbox. Scope blocks were used to monitor three points of the system:
- right after the envelope detection;
- after the decoder;
- a third point used to measure the delay of the label processor.
The three generated tones are then processed by the label processor illustrated in Figure 19. Figure 20 shows the output after the envelope detection stage. The yellow trace represents the 15 MHz tone, the purple one the 25 MHz tone, and the blue one the 35 MHz tone. The seven combinations that contain at least one tone are shown.

The figure clearly shows the correct operation of the envelope detector. A slight distortion of the pulses due to crosstalk is observed for combinations in which two or more tones are sent at the same time.
 
FIGURE 20 SIMULATION STAGE AFTER THE ENVELOPE DETECTION  (X AXIS TIME, Y AXIS VOLTAGE) 
 
Figure 21 illustrates the output signal after the decoder stage. The seven pulses have been 
detected, reshaped, and each one is shown in a different color.  
 
 
FIGURE 21 SEVEN DIFFERENT OUTPUTS FROM THE DECODER (X AXIS TIME, Y AXIS VOLTAGE)
 
To measure the delay, the input signal was overlaid with one bit at the decoder output. The first pulse was compared with its corresponding decoded bit, as shown in Figure 22. The Simulink system period is configured to 10 ns to simulate the FPGA clock. The estimated delay was 520 ns.
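The delay reading of Figure 22 amounts to counting clock periods between two rising edges. This is a minimal sketch under stated assumptions:
- Both traces are sampled at the 10 ns Simulink period.
- A fixed threshold of 0.5 marks the edges; this threshold is an assumption.
- The helper names are hypothetical.
- The traces below are synthetic, built so that the edges are 52 samples apart; they are not the thesis data.

```python
import numpy as np

CLOCK_NS = 10  # Simulink sample period, chosen to match the FPGA clock


def first_edge(x, threshold=0.5):
    """Index of the first sample at or above the threshold."""
    idx = np.flatnonzero(np.asarray(x) >= threshold)
    if idx.size == 0:
        raise ValueError("no rising edge found")
    return int(idx[0])


def delay_ns(reference, delayed, threshold=0.5):
    """Delay between the first rising edges of two sampled traces, in ns."""
    return (first_edge(delayed, threshold) - first_edge(reference, threshold)) * CLOCK_NS


# Synthetic traces: the decoded bit rises 52 samples after the input pulse.
ref = [0] * 10 + [1] * 50
out = [0] * 62 + [1] * 50
print(delay_ns(ref, out))  # 520
```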
 
FIGURE 22 THE SIMULATION ESTIMATES 520 NS OF DELAY (X AXIS TIME, Y AXIS VOLTAGE)
  
 
3.1.2 RISING AND FALLING TIME STUDY 
 
In this section, the effect of the bandwidth of the band pass and low pass filters on the quality of the down-converted pulses is investigated. Different orders of the band pass and low pass filters implemented on the FPGA result in different rise and fall times of the down-converted pulses. The filter design also determines the maximum number of channels that can be allocated in the 50 MHz band of the FPGA. The Simulink model implementing the band pass and low pass filters is illustrated in Figure 23.
 
FIGURE 23 SIMULINK MODEL FOR RISING AND FALLING TIME STUDY
The study was performed on a single channel. The implementation consists of one band-pass filter, the square function and the low pass filter, which together detect the envelope. The signal was monitored before and after the envelope detector, which shows the filtering effects on the down-converted pulses. By counting the harmonics present in the signal, its bandwidth (BW) can be estimated. The rising and falling times were also measured after the envelope detector.
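The band-pass, square and low-pass chain, together with a rise-time measurement, can be sketched in NumPy. This is a floating-point model under stated assumptions, not the fixed-point System Generator design:
- Both filters are Hamming-windowed sinc FIRs. The low pass filter in this chapter uses a Chebyshev window, which NumPy does not provide.
- The 100 MS/s sample rate follows from the 10 ns clock.
- The 20-30 MHz band, the 5 MHz low-pass cut-off and the 31-tap lengths are illustrative choices.
- Rise time uses the common 10 %-90 % definition, which is itself an assumption.
- All helper names are hypothetical.

```python
import numpy as np

FS = 100e6  # sample rate: one sample per 10 ns clock (assumed)


def fir_lowpass(num_taps, fc):
    """Hamming-windowed sinc low-pass FIR with unity DC gain."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = 2 * fc / FS * np.sinc(2 * fc / FS * n) * np.hamming(num_taps)
    return h / h.sum()


def fir_bandpass(num_taps, f1, f2):
    """Hamming-windowed band-pass FIR: difference of two ideal low-pass responses."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    ideal = 2 * f2 / FS * np.sinc(2 * f2 / FS * n) - 2 * f1 / FS * np.sinc(2 * f1 / FS * n)
    return ideal * np.hamming(num_taps)


def envelope(x, bpf, lpf):
    """Band-pass -> square -> low-pass envelope detector for one channel."""
    y = np.convolve(x, bpf, mode="same")     # isolate one tone
    y = y * y                                # squaring gives a baseband term plus a 2f term
    return np.convolve(y, lpf, mode="same")  # the low-pass removes the 2f term


def rise_time_ns(env):
    """10 %-90 % rise time of the first rising edge, in ns."""
    peak = env.max()
    i10 = int(np.argmax(env >= 0.1 * peak))
    i90 = int(np.argmax(env >= 0.9 * peak))
    return (i90 - i10) * 1e9 / FS


# One 1 us burst (100 samples) of a 25 MHz tone, filtered with 31-tap FIRs.
n = np.arange(400)
x = np.where((n >= 100) & (n < 200), np.sin(2 * np.pi * 25e6 * n / FS), 0.0)
env = envelope(x, fir_bandpass(31, 20e6, 30e6), fir_lowpass(31, 5e6))
print(rise_time_ns(env))
```

Changing `num_taps` (the filter order plus one) or the band edges and re-running reproduces, qualitatively, the kind of sweep reported in Charts 1 to 4.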
 
Four experiments were performed using two pulse durations, each tested with different filter orders. The number of harmonics and the rise time were measured in each of the four cases. Because of the Nyquist limit imposed by the speed of the ADC, the maximum number of channels that the FPGA can process depends on the measured bandwidth that each tone occupies. From these results, the optimal filter orders could be estimated.
  
 
 
Pulse duration 1 µs
Band pass filter: Hamming window. Cut-off frequencies Fc1 = 24 MHz, Fc2 = 26 MHz at -6 dB
Low pass filter: Chebyshev window. Cut-off frequency Fc = 1 MHz at -6 dB. Sidelobe attenuation 40 dB
  
A)        B) 
CHART 1 A) RISE TIME B) #CHANNELS EXPERIMENT 1 FILTER METHOD 
Chart 1A shows the rising and falling times as a function of the band pass filter order. For higher-order BPFs the rising time is longer, because a sharper filter suppresses more of the harmonics of the signal.

The order-5 filter has a bandwidth of 50 MHz at -6 dB, so the signal suffers no attenuation from the filter. The 40 ns rising time in that case is therefore attributed to the ADC.

Chart 1B shows that a sharper band pass filter allows more channels to be placed in the 50 MHz bandwidth supported by the FPGA. The order-15 BPF shows that the implemented filters are only an approximation: because of the fixed-point approximation, it allows the same number of channels as the order-25 filter.
Pulse duration 1 µs
Band pass filter: Hamming window. Cut-off frequencies Fc1 = 20 MHz, Fc2 = 30 MHz at -6 dB
Low pass filter: Chebyshev window. Cut-off frequency Fc = 9 MHz at -6 dB. Sidelobe attenuation 40 dB
  
A)        B) 
CHART 2 A) RISE TIME B) #CHANNELS EXPERIMENT 2 FILTER METHOD 
(Charts 1 and 2 data: panel A, rising and falling time (ns) vs. band pass order (5 to 30); panel B, number of channels vs. band pass order)
 
Chart 2A shows the results for the same pulse duration but with wider filter bandwidths. Fewer harmonics are therefore suppressed, and the rising time is shorter. On the other hand, the number of possible channels decreases because each filter covers more spectrum. Once again, the order-5 filter shows the same rising time, set by the ADC. With order 5 only one channel can be implemented, because it occupies the entire 50 MHz operational band.
Pulse duration 500 ns
Band pass filter: Hamming window. Cut-off frequencies Fc1 = 24 MHz, Fc2 = 26 MHz at -6 dB
Low pass filter: Chebyshev window. Cut-off frequency Fc = 1 MHz at -6 dB. Sidelobe attenuation 40 dB
  
A)       B) 
CHART 3 A) RISE TIME B) #CHANNELS EXPERIMENT 3 FILTER METHOD 
Experiment 3 used the same filters as experiment 1 but with 500 ns input pulses. The shorter pulse translates into a larger bandwidth in the frequency domain, so fewer channels can be implemented, as shown in Chart 3B. On the other hand, the band pass filter has a bandwidth of 2 MHz at -6 dB, which is almost the bandwidth of the signal. For that reason, the rising time curve in Chart 3A is similar to the one obtained in experiment 1.
 
 
 
 
 
 
 
(Chart 3 data: panel A, rising time (ns) vs. band pass order (5 to 30); panel B, number of channels vs. band pass order)
 
Pulse duration 500 ns
Band pass filter: Hamming window. Cut-off frequencies Fc1 = 20 MHz, Fc2 = 30 MHz at -6 dB
Low pass filter: Chebyshev window. Cut-off frequency Fc = 9 MHz. Sidelobe attenuation 40 dB
  
A)       B) 
CHART 4 A) RISE TIME B) #CHANNELS EXPERIMENT 4 FILTER METHOD 
Experiment 4 used the same filters as experiment 2. Chart 4B shows that at most two channels can be implemented. Within this limit, order 10 is the best choice, since it produces the shortest rising time (see Chart 4A). However, the crosstalk observed in this experiment is what limits the system to two channels, and for this reason the configuration was not chosen.
The results of the study confirm what Fourier series theory predicts: when fewer harmonics pass the filtering process, the rise and fall times increase.
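The Fourier argument can be checked numerically. The sketch below builds partial sums of a ±1 square wave with an increasing number of odd harmonics and measures the 10 %-90 % rise time of its edge. It is an idealized illustration with no filters and no ADC, and the helper names are hypothetical; it is not a model of the FPGA chain.

```python
import numpy as np


def square_edge(n_harmonics, samples=4001):
    """Fourier partial sum of a +/-1 square wave around its rising edge at t = 0.

    Time is in units of the period; only odd harmonics are present.
    """
    t = np.linspace(-0.25, 0.25, samples)
    x = np.zeros_like(t)
    for k in range(n_harmonics):
        m = 2 * k + 1
        x += (4 / np.pi) * np.sin(2 * np.pi * m * t) / m
    return t, x


def rise_time(t, x):
    """10 %-90 % rise time (levels -0.8 and +0.8 of the +/-1 wave), in periods."""
    i0 = len(t) // 2                            # index of t = 0
    lo = np.flatnonzero(x[:i0] <= -0.8)[-1]     # last sample still at the 10 % level
    hi = i0 + np.flatnonzero(x[i0:] >= 0.8)[0]  # first sample at the 90 % level
    return t[hi] - t[lo]


for k in (1, 5, 25):
    print(k, rise_time(*square_edge(k)))
```

Keeping 1, 5 and 25 harmonics gives successively shorter rise times, which is the same trend seen when a wider band pass filter lets more harmonics through.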
 
3.1.2.1 SELECTED PARAMETERS OF THE BPF AND LPF 
To summarize, the pulses used in the project last 500 ns. The inverse of the pulse duration approximates the bandwidth occupied by the signal, that is, 2 MHz. For that reason, the configuration used in the project was:
- a band pass filter of order 30 with a Hamming window and a 2 MHz bandwidth;
- a low pass filter of order 10 with a 1 MHz bandwidth.

This configuration produces the least crosstalk. In theory, this combination allows the system to use 4 channels. However, this could not be implemented for two reasons. First, the Virtex-4 does not have enough resources. Second, the fourth channel would be too close to the Nyquist limit, producing distortion.
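The two quantities that bound this choice follow directly from the pulse duration and from the sample rate. The 100 MS/s rate is assumed here from the 10 ns FPGA clock:

$$B_{\text{signal}} \approx \frac{1}{T_{\text{pulse}}} = \frac{1}{500\ \text{ns}} = 2\ \text{MHz}, \qquad f_{\text{Nyquist}} = \frac{f_s}{2} = \frac{1}{2 \times 10\ \text{ns}} = 50\ \text{MHz}$$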
(Chart 4 data: panel A, rising and falling time (ns) vs. band pass order (5 to 30); panel B, number of channels vs. band pass order)
 
3.1.3 EXPERIMENTAL SET-UP FILTER METHOD 
 
The previous section presented the implementation and verification of the filter method on an FPGA. In this section, the FPGA label processor is investigated in a typical optical packet switch system, including optical packet generation, detection and processing, and switching.
 
The system allows the following aspects to be benchmarked:

• OSNR tolerance: the tolerance of the label processor to the optical noise accumulated in the optical system.
• Crosstalk tolerance: the minimum SNR and the crosstalk level that still allow correct detection.
• Dynamic range: the behavior of the FPGA when the optical power fluctuates.
• Scalability: the maximum number of RF tones that can be detected.
 
 The experimental set-up employed to validate the FPGA within the optical systems is shown in 
Picture 8.  
 
A) 
  
B) 
PICTURE 8 A) SCHEMATIC EXPERIMENTAL SET-UP B) EXPERIMENTAL SET-UP 
 
The optical labels were generated as follows. The arbitrary waveform generator (AWG), a Lecroy LW42A, provides the electrical RF tones. It can generate labels with up to 15 different tones, with the amplitude fixed at 2 Vpp. Picture 9 shows the trace of the 15 tones generated at the same time. The electrical tones drive the optical modulator, which modulates the continuous-wave light generated by the laser, a Santec MLS-2100 set at 1546.8 nm. In this way, the RF tones are converted into the optical domain.

PICTURE 9 AWG - 15 TONES WITH A 500 ns PULSE AT 2Vpp
The output power versus bias voltage of the external modulator, shown in Chart 5, was measured to select the optimal bias, which makes the modulator work on the linear slope. The optimal bias was 2.51 V.

CHART 5 TRANSFER FUNCTION MODULATOR
(Chart 5 axes: output power vs. input voltage, 0 to 10 V)

The set-up employs an EDFA as a noise loader, which is needed to analyze the system performance at different OSNR values. The generated optical labels are fed into the FBG, which acts as the label extractor: its reflected signal separates the label from the payload.
A digital optical attenuator, the Agilent 83434A, changes the optical power delivered to the photodetector and therefore the amplitude of the electrical signals fed to the FPGA. This allows us to study the optical power dynamic range supported by the system. The crosstalk tolerance is measured as more tones are added.
The optical signal carrying the RF tones is fed into an optical-to-electrical (O/E) converter. The O/E converter has a bandwidth of 1.25 Gbit/s and the transfer function shown in Chart 6.

CHART 6 TRANSFER FUNCTION O/E RECEIVER 1.25GBIT/S
The O/E converter provides a low-amplitude signal. For that reason, an amplifier (a Miteq model with 37 dB gain) is used to increase the dynamic range. The 50 MHz low pass filter after the amplifier prevents interference from tones beyond the Nyquist limit. Because this filter is placed just before the FPGA, it allows us to study the effect of using more tones on the performance of the entire system.
Expected behavior from the set-up configuration 
Before the experiments, part of the behavior can be anticipated from the set-up configuration, which is constrained by the intrinsic limitations of the devices used. Three assumptions have therefore been made:

• First, the AWG provides a fixed 2 Vpp output voltage regardless of the number of generated tones. The amplitude is therefore distributed equally among the tones, as illustrated in Figure 24.
 
FIGURE 24 AMPLITUDE TONE BEHAVIOR 
(Figure 24 axes: output voltage (mV) vs. input optical power (dBm), -30 to -10 dBm; curves for 1 tone, 2 tones and N tones, with the fixed amplitude and the ADC sensitivity marked)
$$V_{\text{tone}} = \frac{V_{\text{AWG}}}{N_{\text{tones}}}$$
This means that with a fixed amplitude, the maximum number of tones is limited by the 
sensitivity of the ADC. 
  
A)      B) 
PICTURE 10 EQUAL POWER DISTRIBUTION A) 3 TONES B) 30 TONES 
Pictures 10A and 10B show the distributed power for 3 tones and 30 tones, respectively. With 30 tones the amplitude per tone is clearly lower.
• Second, the OSNR tolerance. The optical noise sets a lower boundary: when more noise is added, more signal is required on each label to ensure the minimum voltage for detection. On the other hand, a large power saturates the maximum voltage accepted by the ADC, in this case 3 Vpp, so the ADC dynamic range is reached quickly. Consequently, fewer tones can be transmitted.
 
• Last, the crosstalk. This phenomenon arises mainly from two factors. The first is the filtering process: the more accurate the filter, the less interference it creates. The second is the number of tones: the more tones are sent, the more interference is generated. From the scalability point of view, the tones themselves therefore limit the number of labels that can be sent.
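The first assumption reduces to a division, and inverting it gives the tone-count limit. This is a minimal sketch with hypothetical helper names. It assumes the minimum detectable per-tone amplitude `v_min` is expressed at the same point as the AWG output, which ignores the gain and loss of the optical link between the AWG and the ADC. The 0.25 V value in the example is made up for illustration.

```python
import math


def amplitude_per_tone(v_awg, n_tones):
    """Per-tone amplitude when a fixed AWG output is split equally among the tones."""
    return v_awg / n_tones


def max_tones(v_awg, v_min):
    """Largest number of tones whose per-tone amplitude still reaches v_min."""
    return math.floor(v_awg / v_min)


print(round(amplitude_per_tone(2.0, 3), 3))  # 0.667
print(max_tones(2.0, 0.25))  # 8 (v_min = 0.25 V is a made-up example value)
```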
3.1.3.1 PERFORMANCE OF THE LABEL PROCESSOR
 
The FPGA was tested in different scenarios, and the maximum dynamic range was measured in each one. The number of tones was increased from 1 up to 15, and the process was repeated for OSNR values of 49 dB, 30 dB and 20 dB.
3.1.3.1.1 OSNR TOLERANCE 
The different OSNR values can be seen in Picture 11. The yellow trace represents the laser, centered at 1546.8 nm, carrying the RF tones. By changing the input power to the EDFA, signals with 49 dB, 30 dB and 20 dB of OSNR can be generated; they are shown in Pictures 11A, 11B and 11C, respectively.
  
A)       B) 
 
C) 
PICTURE 11 OSNR MEASUREMENT A) 49dB, B) 30dB, C) 20dB 
Chart 7 shows the minimum amplitude required by the ADC to detect a specific number of tones. The trend is linear, which confirms the first assumption: more power is required to compensate for the division of the amplitude among the tones.

The ADC dynamic range is limited to 2.5 Vpp, so the ADC sets another boundary for scalability: only a limited number of tones can be placed before the signal saturates at 2.5 Vpp. For this ADC, the estimated maximum number of tones is 39.

The effect of the different OSNRs is shown in different colors. The optical noise appears as an offset, which corroborates the second assumption.

For example, when the OSNR is 20 dB, the 15 tones could not be detected, even though the estimation indicates that the amplitude required for 15 tones is below 2.5 Vpp. The reason is that the input voltage is not the only factor for correct detection: as shown later, a minimum SNR is also required. With an OSNR of 20 dB and 15 tones, this minimum SNR is not reached.
 
CHART 7 MINIMUM VOLTAGE FOR DETECTION FILTER METHOD 
3.1.3.1.2 CROSSTALK TOLERANCE 
The minimum SNR required to detect the RF tones correctly must be at least 9.1 dB, as shown in Chart 8. The SNR measurements were performed in three steps:
1. The optical power was varied to find the minimum point of detection.
2. The SNR was measured at this point.
3. The measurements from all cases were gathered. This showed that no pulses were detected when the SNR was below 9.1 dB.

The effect of the crosstalk on the SNR can be studied through the slope of the trend line, which indicates the variation of the SNR as more tones are added. In this case, the crosstalk degrades the SNR by approximately 0.52 dB per tone. Consequently, from the SNR point of view, the maximum number of tones that can be detected is 22.

Continuing the example above of an OSNR of 20 dB with 15 tones: even though the voltage generated for the 15 tones lies within the dynamic range of the ADC, the SNR is 5 dB, which does not exceed the minimum required SNR.
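The 22-tone figure follows from the linear fit reported in Chart 8 for the OSNR 49 dB data, y = -0.52x + 20.968, together with the 9.1 dB threshold. This is a minimal sketch; the constant and function names are hypothetical.

```python
import math

SLOPE_DB_PER_TONE = -0.52  # linear fit reported in Chart 8 (OSNR 49 dB data)
INTERCEPT_DB = 20.968
MIN_SNR_DB = 9.1           # minimum SNR for correct detection


def snr_db(n_tones):
    """SNR predicted by the linear fit for a given number of tones."""
    return SLOPE_DB_PER_TONE * n_tones + INTERCEPT_DB


def max_tones_snr():
    """Largest tone count whose predicted SNR stays at or above MIN_SNR_DB."""
    return math.floor((MIN_SNR_DB - INTERCEPT_DB) / SLOPE_DB_PER_TONE)


print(max_tones_snr())  # 22
```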
   
 
(Chart 7 axes: voltage (mV) vs. number of tones, 1 to 39; series: back to back, OSNR 49 dB, OSNR 30 dB, OSNR 20 dB, and a linear fit to the OSNR 20 dB data giving the estimated maximum number of tones)
 
 
 
CHART 8 MINIMUM SNR FOR DETECTION 
3.1.3.1.3 DYNAMIC RANGE 
The output voltage of the AWG determines the quality of the FPGA input signal in two ways: through the effect of crosstalk on the signal and through the optical power required. Together, these two effects determine the dynamic range. For that reason, the dynamic range was characterized as the output voltage of the AWG varies.

The FPGA has resources to implement 3 detection channels, so the dynamic range as a function of the AWG voltage was studied with 3 tones. Chart 9 shows that when more voltage is applied to the system, the FPGA provides a larger dynamic range. On the other hand, the minimum voltage to detect three tones in a back-to-back configuration was 224.3 mV, and the same value was detectable with the set-up. Since 224.3 mV was also detectable in the set-up, the quality of the FPGA input signal depends mostly on the output voltage of the AWG.
(Chart 8 axes: SNR (dB) vs. number of tones, 1 to 22; series: OSNR 49 dB, OSNR 30 dB, OSNR 20 dB; linear fit to the OSNR 49 dB data: y = -0.52x + 20.968; the minimum SNR for detection is marked)
 
 
CHART 9 EYE DYNAMICS FILTER METHOD 
  
The dynamic range has an upper limit of 7 dB, set by the maximum voltage allowed at the ADC input, which is 3 Vpp. However, in this project the maximum input voltage was limited to 2.5 Vpp to protect the device.

Picture 12 shows the FPGA input signal: the purple trace shows the pulses, and the yellow trace shows the signal after the envelope detector. The crosstalk is also visible as small spikes. The threshold in this project is implemented with a rising-edge trigger, and the threshold in the FPGA is fixed, so the spikes cannot be avoided. Therefore, the spikes reduce the dynamic range: increasing the input voltage produces larger spikes, which the FPGA threshold detects as ‘1’. On the other hand, a threshold decision with an upper value cannot be implemented because of the maximum voltage limit of the ADC.

When more optical power is used, the pulses grow and the decision eye opens until the signal saturates at 2.5 Vpp. Increasing the power further results in larger spikes that limit the dynamic range.
 
PICTURE 12 FPGA INPUT SIGNAL FILTER METHOD 
 
(Chart 9 axes: dynamic range (dB) vs. AWG input voltage (mV): 224.3, 250, 500, 1000, 2000; series: 3 tones at OSNR 47 dB, 30 dB and 20 dB)
 
The effect of the three assumptions is reflected in the dynamic range of the device, which was measured while varying the number of tones and the OSNR. When more tones are used, the dynamic range decreases because of the crosstalk, as illustrated in Chart 10. In addition, decreasing the OSNR further reduces the dynamic range. These results corroborate the three assumptions. The estimated maximum number of tones is 36, which is in accordance with the previous estimations.
 
 CHART 10 DYNAMIC RANGE FILTER METHOD 
 
OPTIMIZATION 
The previous measurements were taken using the optimal voltage from the AWG, so that the modulator operates in the linear part of its transfer function. Nevertheless, a larger dynamic range can be obtained by increasing the AWG voltage, at the price of adding distortion, as shown in Picture 13.
 
 
PICTURE 13 SIGNAL DISTORTION, NON-LINEAR MODULATOR
 
(Chart 10 axes: dynamic range (dB) vs. number of tones, 1 to 37; series: OSNR 47 dB, OSNR 30 dB, OSNR 20 dB, and the estimated maximum number of tones)
 
For signals with OSNR ≥ 30 dB, the dynamic range can be improved to approximately 6.5 dB. However, when OSNR ≤ 20 dB, the distortion introduced by the modulator even decreases the dynamic range.
 
OSNR 30 dB, 8 tones
AWG (mV)   Optical power min / max (dBm)   Dynamic range (dB)
2000       -23.7 / -20.1                   3.6
3500       -26.1 / -19.9                   6.2
4000       -25.9 / -15.8                   10.1

OSNR 20 dB, 8 tones
AWG (mV)   Optical power min / max (dBm)   Dynamic range (dB)
2000       -26.3 / -25.6                   0.7
4000       -29.1 / -28.6                   0.5
TABLE 4 OPTIMIZATION FILTER METHOD MEASUREMENT 
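Each dynamic range entry in Table 4 is the span between the minimum and maximum optical powers at which the tones are still correctly detected. The sketch below recomputes it from the table. The helper name is hypothetical. Reading the two 4000 mV powers at OSNR 30 dB as a minimum of -25.9 dBm and a maximum of -15.8 dBm is an interpretation of the flattened table.

```python
# (AWG mV, min optical power dBm, max optical power dBm, reported dynamic range dB)
TABLE4 = {
    "OSNR 30 dB": [(2000, -23.7, -20.1, 3.6), (3500, -26.1, -19.9, 6.2), (4000, -25.9, -15.8, 10.1)],
    "OSNR 20 dB": [(2000, -26.3, -25.6, 0.7), (4000, -29.1, -28.6, 0.5)],
}


def dynamic_range_db(p_min_dbm, p_max_dbm):
    """Span of received optical power (dB) over which detection is correct."""
    return p_max_dbm - p_min_dbm


for osnr, rows in TABLE4.items():
    for awg_mv, p_min, p_max, reported in rows:
        print(osnr, awg_mv, round(dynamic_range_db(p_min, p_max), 1), reported)
```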
 
3.1.3.1.4 SCALABILITY TEST
The scalability was investigated by generating and sending the largest possible number of tones. The AWG was set to 6 Vpp. Picture 14A shows the reception of 50 tones, with the power equally distributed among them. Picture 14B shows that the tones can be correctly detected.
         
A)     B) 
PICTURE 14 A) SPECTRUM 50 TONES B) DETECTION 50 TONES 
 
Picture 15B shows that 60 tones produce distortion and cannot be detected correctly. Picture 15A shows that the power is still equally distributed. However, the power per tone does not generate enough voltage to exceed the ADC sensitivity.
      
A)    B) 
PICTURE 15 A) SPECTRUM 60 TONES B) ERROR DETECTION 60 TONES 
  
 
3.1.3.2 INVESTIGATION OF THE LABEL PROCESSOR INCLUDING THE PAYLOAD
 
The previous set-up employed only optical labels, without the payload. The new hypothesis is that, from the label's point of view, the payload only degrades the label OSNR. The payload consists of OTDM pulses at 160 Gbit/s. Picture 16 shows the spectrum of the payload.
 
PICTURE 16 SPECTRUM OF THE 160GBIT/S PACKETS 
Figure 25 shows the new experimental set-up, including the generation and transmission of the payload. The set-up also includes the optical switch, to test the actual label processing and control signal generation of the FPGA.
 
 
FIGURE 25 EXPERIMENTAL SET-UP WITH PAYLOAD 
 
• FBG1 carves the spectrum at 1546.8 nm (centered at the optical label wavelength).
• The label is added to the payload via a circulator. 
• FBG2 extracts the label on the reflected port 3 and the payload is sent through the 
transmitted channel. 
• The FPGA processes the labels and generates the control signals to configure the switch. 
An electrical amplifier is needed to amplify the control signal for providing the proper 
voltage to drive the optical switch.  
Three different measurements were performed. The first and second experiments study the operation of the FPGA with and without FBG1. Their results can then be compared with those obtained when only the label was present. Finally, the third experiment tests the whole packet switch operation, including the control signal generated by the FPGA.
 
3.1.3.2.1 EXPERIMENT 1 WITHOUT CARVING 
In the first experiment, FBG1 was not employed, so the label had to be inserted with 30 dB more power than the payload. The yellow trace in Picture 17 clearly shows the power difference between the payload and the label. The optical power and the SNR required for correct detection were measured and compared with the previous results obtained without payload.
 
PICTURE 17 SPECTRUM 160GBIT/S WITHOUT CARVING 
 
Table 5 shows that without the payload the SNR is on average 10 dB higher than with the payload. On the other hand, more optical power was required with the payload. This reflects a trade-off between the SNR and the optical power needed to achieve similar values of dynamic range.
3 tones
Payload   AWG (mV)   Optical power min / max (dBm)   Dynamic range (dB)   SNR at min / max power (dB)
With      2000       -26.4 / -20.5                   5.9                  10.2 / 18.9
With      228        -18.7 / -18                     0.7                  10.2 / 12
Without   2000       -28.1 / -22.4                   5.7                  23.6 / 35.9
Without   228        -20.2                           0                    21.3
TABLE 5 COMPARISON 3 TONES WITH PAYLOAD VS WITHOUT PAYLOAD (NO CARVING)
 
Similar behavior is observed when more tones are used, as shown in Table 6.
8 tones
Payload   AWG (mV)   Optical power min / max (dBm)   Dynamic range (dB)   SNR at min / max power (dB)
With      2000       -22.6 / -18.6                   4                    11.2 / 11.9
With      470        -20.7 / -17.5                   3.2                  11.8 / 12.2
Without   2000       -23.7 / -20.1                   3.6                  31.1 / 44.6
Without   470        -19.9 / -19.1                   0.8                  25.1 / 24.2
TABLE 6 COMPARISON 8 TONES WITH PAYLOAD VS WITHOUT PAYLOAD (NO CARVING)
The disadvantage of this technique is that the label requires more optical power than the payload. For this reason, an FBG must be inserted in the transmission stage to balance the power of the label.
 
3.1.3.2.2 EXPERIMENT 2 WITH CARVING 
The second experiment employs FBG1 in the transmission stage. FBG1 carves the spectrum of the payload to accommodate the label, which allows the input power of the label to be reduced, as can be seen in Figure 26. The blue trace represents the 160 Gbit/s payload spectrum and the red trace the wavelength that carries the labels. In contrast with experiment 1, the power used for the labels is balanced with the payload.
 
The optical power and the SNR needed to detect 3 and 8 tones were measured and compared with the results obtained in the same cases without payload.
  
 
FIGURE 26 SPECTRUM OF THE 160GBIT/S PACKETS WITH CARVING  (Y-AXIS dB AND X-AXIS nm)  
 
Table 7 shows that the optical power used for the label is close to that of the case without payload. The SNR is what suffers the penalty: in the 3-tone case without payload it reaches a maximum of 35.9 dB (Table 5), whereas in this experiment it is 16 dB.
With payload (carving)
Tones   AWG (mV)   Optical power min / max (dBm)   Dynamic range (dB)   SNR at min / max power (dB)
3       2000       -30 / -24.2                     5.8                  15.2 / 16
3       228        -25.1 / -24.8                   0.3                  16.4 / 16.6
8       2000       -27.6 / -23.5                   4.1                  14.7 / 14.9
8       470        -20.7 / -19.5                   0.8                  13.7 / 14.5
TABLE 7 MEASUREMENT USING PAYLOAD (CARVING) 
To summarize, experiments 1 and 2 demonstrated that, from the label's point of view, the payload can be seen as added optical noise. For this reason, the behavior with more tones can be estimated from the results obtained with the first set-up, in which no payload was used.
 
3.1.3.2.3 EXPERIMENT 3 1XN OPTICAL PACKET SWITCH  
In this experiment, the complete system presented in Figure 25 was tested, with the optical switch controlled through the digital output of the FPGA. The optical switch is an electronically controlled 2x2 switch. The transfer functions of its two output ports are shown in Chart 11: switching from one state to the other requires 3 V. However, the digital outputs of the FPGA provide only 2 V, so an amplifier is used to provide the proper voltage to the optical switch control.
 
CHART 11 OPTICAL SWITCH TRANSFER FUNCTION 
(Chart 11 axes: output power vs. input voltage (V), 0 to 6.5 V; curves for ports I/O 1-1 and I/O 1-2)
The payload generator streams continuous 160 Gbit/s packets. In this experiment, 3-bit labels were employed. According to the optical label, the FPGA sets the optical switch to route the signal to port 1 or port 2 of the switch. Picture 18A shows output 1 of the optical switch, which is driven by a 500 ns control pulse generated by the FPGA; the extinction ratio (ER) was 26 dB. Picture 18B shows output 2 of the switch.
    
A)      B) 
PICTURE 18 A) OUTPUT 1 OPTICAL SWITCH B) OUTPUT 2 OPTICAL SWITCH 
The FPGA used the 7 digital outputs that correspond to the decoder outputs. Depending on the 3-bit label combination, the seven outputs provided distinct control signals to switch the optical packets to 7 different output ports. Using the different digital outputs, the continuous stream was successfully switched to seven different channels, as can be seen in Figure 27.
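The label-to-port mapping behaves as a 3-to-8 one-hot decoder. This is consistent with the truth-table rows at the start of this section; for example, label 110 gives 0 0 0 0 0 0 1 0. The sketch below is a behavioral model with hypothetical names. Treating label 000 as "no output" is an assumption, since only seven outputs are wired to the switch.

```python
def decode(label, width=3):
    """3-to-8 one-hot decoder: output k goes high when the label equals k."""
    if not 0 <= label < 2 ** width:
        raise ValueError("label out of range")
    return [1 if k == label else 0 for k in range(2 ** width)]


def control_lines(label):
    """The seven digital outputs that drive the switch (decoder outputs 1..7).

    Output 0 (label 000, no tones) is assumed to be unused.
    """
    return decode(label)[1:]


print(decode(0b110))         # [0, 0, 0, 0, 0, 0, 1, 0]
print(control_lines(0b000))  # [0, 0, 0, 0, 0, 0, 0]
```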
 
FIGURE 27 DECODER SWITCHING SEVEN DIFFERENT CHANNELS  
(Figure 27 panels: Channel 1 to Channel 7)
 
 
3.1.4 SUMMARY OF THE FILTER METHOD 
 
Table 8 summarizes the results obtained with the filter method.
 
Parameter                                                          Filter    Units
Delay                                                              470       ns
Rise time                                                          120       ns
# Channels implemented                                             3
Dynamic range                                                      7         dB
Max OSNR tolerance                                                 30        dB
Minimum SNR for correct detection                                  9.1       dB
Minimum voltage required for 1 tone                                63        mV
Minimum voltage required for 15 tones                              1.7       V
SNR loss                                                           0.52      dB/tone
Resources used                                                     85        %
Estimated maximum number of tones detectable without distortion*   36        tones
Estimated maximum number of tones detectable with distortion*      50        tones
TABLE 8 FILTER METHOD SUMMARY
 
* The estimated maximum numbers of tones detectable with and without distortion are based on using different AWG output voltages in the set-up shown in Picture 8.
 
3.2 THE MIXER METHOD
 
The mixer method is based on IQ demodulation, which provides the means to separate each of the subcarriers. The basic idea is to multiply the RF input signal by two local oscillators that are 90 degrees out of phase, as shown in Figure 28.
3.2.1.1 MATHEMATICAL MODEL 
        
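[Figure 28 block labels: input S_RF(t); mixer outputs S_I(t) and S_Q(t); low-pass filtered outputs S_FI(t) and S_FQ(t); channel output S_Oi(t); local oscillator LO = cos((ω_i + Δω)t + φ)]
FIGURE 28 MIXER MODEL
1. Received RF tones signal, where S_i(t), with i = 1…N, are the baseband label bits:
$$S_{RF}(t) = S_1(t)\cos(\omega_1 t) + S_2(t)\cos(\omega_2 t) + \dots + S_i(t)\cos(\omega_i t) + \dots + S_N(t)\cos(\omega_N t)$$
2. Mixing with the local oscillator of channel i, LO = cos((ω_i + Δω)t + φ), and with its 90-degree shifted version:
$$S_I(t) = S_{RF}(t)\sin((\omega_i+\Delta\omega)t+\varphi) = \frac{S_i(t)}{2}\sin(\Delta\omega t+\varphi) + \frac{S_i(t)}{2}\sin((2\omega_i+\Delta\omega)t+\varphi) - \frac{S_{i+1}(t)}{2}\sin((\omega_{i+1}-\omega_i-\Delta\omega)t-\varphi) + \frac{S_{i+1}(t)}{2}\sin((\omega_{i+1}+\omega_i+\Delta\omega)t+\varphi) + \dots$$
$$S_Q(t) = S_{RF}(t)\cos((\omega_i+\Delta\omega)t+\varphi) = \frac{S_i(t)}{2}\cos(\Delta\omega t+\varphi) + \frac{S_i(t)}{2}\cos((2\omega_i+\Delta\omega)t+\varphi) + \frac{S_{i+1}(t)}{2}\cos((\omega_{i+1}-\omega_i-\Delta\omega)t-\varphi) + \frac{S_{i+1}(t)}{2}\cos((\omega_{i+1}+\omega_i+\Delta\omega)t+\varphi) + \dots$$
3. After low-pass filtering, which removes the terms at 2ω_i + Δω and at the sum and difference frequencies of the other tones:
$$S_{FI}(t) = \frac{S_i(t)}{2}\sin(\Delta\omega t+\varphi), \qquad S_{FQ}(t) = \frac{S_i(t)}{2}\cos(\Delta\omega t+\varphi)$$
4. Squaring and combining the two branches:
$$S_{Oi}(t) = S_{FI}^2(t) + S_{FQ}^2(t) = \frac{S_i^2(t)}{4}$$
The mathematical model shows that each mixer of the array isolates its own label bit: S_Oi(t) depends only on S_i(t) and, because sin² + cos² = 1, it is independent of the phase offset φ and the frequency offset Δω.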
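The derivation can be checked numerically. The sketch below assumes ideal unit-amplitude tones at the 20 MHz and 35 MHz label frequencies, a 100 MHz sample rate, Δω = 0 and an arbitrary phase φ, and it replaces the FPGA low-pass filter with a 20-tap moving average; `iq_detect` is a hypothetical name, not a block of the Simulink model.

```python
import numpy as np

FS = 100e6                 # ADC sampling rate of the Virtex-4 board (100 MHz)
TONES = (20e6, 35e6)       # label tones used in the mixer experiments


def iq_detect(rf, f_lo, phi=0.7, taps=20):
    """Recover S_i(t)**2 / 4 for the tone at f_lo (assumes delta-omega = 0).

    The 20-tap moving average stands in for the FPGA's Chebyshev-window FIR:
    at 100 MHz it has spectral nulls at every multiple of 5 MHz, so for steady
    tones it removes the 15, 30, 40, 45 and 55 MHz mixing products.
    """
    n = np.arange(rf.size)
    arg = 2 * np.pi * f_lo / FS * n + phi
    s_i = rf * np.sin(arg)                       # I branch: sine LO
    s_q = rf * np.cos(arg)                       # Q branch: cosine LO
    h = np.ones(taps) / taps                     # simple low-pass filter
    s_fi = np.convolve(s_i, h, mode="valid")
    s_fq = np.convolve(s_q, h, mode="valid")
    return s_fi ** 2 + s_fq ** 2                 # square and combine


# Label "10": 20 MHz tone present, 35 MHz tone absent.
bits = (1, 0)
n = np.arange(400)
rf = sum(b * np.cos(2 * np.pi * f / FS * n) for b, f in zip(bits, TONES))
for f in TONES:
    print(f"{f / 1e6:.0f} MHz channel: {iq_detect(rf, f).mean():.3f}")
```

Changing `phi` leaves the 20 MHz output at S_i²/4 = 0.25, which is the phase independence predicted by step 4.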
 
 
3.2.1.2 FPGA MODEL 
The Simulink model, shown in Figure 29, follows the mathematical model by creating an array of mixers.
 
 
FIGURE 29 MIXER METHOD SIMULINK MODEL  
[Figure 29 blocks: 1. ADC; 2. Phase protection; 3. RAM blocks; 4. Mixers; 5. Shift (amplifier, x[n]·4); 6. Low-pass filter; 7. Square; 8. Add; 9. Shift (amplifier, x[n]·2); 10. AGC; 11. Threshold; 12. Decoder; 13. Error correction; 14. Digital I/O]
 
The ADC, AGC, threshold, decoder, error correction and digital I/O stages are explained in the filter method chapter. All the Shift blocks are used as amplifiers: shifting a binary word left by k bits multiplies it by 2^k, so blocks 5 and 9 multiply the signal by 4 and 2, respectively.
 
2. The phase protection block prevents the phase inversion that occurs when the arithmetic overflows. The multipliers mix the internal oscillator with the input signal; however, when a multiplier overflows, the signal changes its phase by 180 degrees, which reduces the operational range of the system.
The Simulink model in Figure 30 emulates this phase change. The input signal is a baseband pulse, because an RF signal is not needed for this demonstration. The pulse is split into two branches, each multiplied by one, and the branches are then added. As a result, the output signal at DAC1 should have twice the amplitude of the signal at DAC2. However, this behavior holds only until overflow occurs.
 
FIGURE 30 SIMULINK MODEL EXPERIMENT PHASE 
 
  
A)                                                   B) 
PICTURE 19 A) CORRECT OUTPUT B) PHASE PROBLEM WHEN OVERFLOW 
 
Picture 19 shows the behavior of the model in Figure 30. The yellow trace is the output of DAC1, one of the two digital-to-analog converters available on the FPGA board, which is used to monitor the output of the model; the model behaves like the mixer method, but in baseband. Picture 19B shows the moment when the output overflows and its phase therefore changes by 180 degrees. The blue trace is the output signal at DAC2, which should have half the amplitude of the yellow trace. Besides the phase problem, the experiment also demonstrated that the two branches have the same delay: the yellow trace has only one amplitude level, whereas a delay mismatch would have produced two different amplitudes.
The phase protection block saturates the input signal to prevent it from reaching the amplitude that produces the 180-degree phase inversion. Saturation also creates distortion and affects the dynamic range, but its effect is smaller than that of the phase inversion.
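The difference between overflow wrap-around and saturation can be illustrated with integer arithmetic. This sketch assumes an 8-bit signed word, a hypothetical width chosen for readability (the word lengths of the actual model are not given here): wrapping turns a positive peak into a negative value, which appears as the 180-degree phase inversion, whereas saturation clips the peak and keeps its sign.

```python
def wrap(value, bits=8):
    """Two's-complement wrap-around: what an unprotected adder/multiplier does on overflow."""
    span = 1 << bits
    return (value + span // 2) % span - span // 2


def saturate(value, bits=8):
    """Clamp to the representable range, as a phase protection (saturation) stage does."""
    hi = (1 << (bits - 1)) - 1
    lo = -(1 << (bits - 1))
    return max(lo, min(hi, value))


# Baseband pulse split into two branches and added (as in Figure 30), 8-bit words.
pulse = [0, 50, 100, 50, 0]
print([wrap(2 * x) for x in pulse])      # peak flips sign: 200 -> -56
print([saturate(2 * x) for x in pulse])  # peak clipped: 200 -> 127
```

The clipped pulse is distorted but keeps its polarity, which is why saturation degrades the dynamic range less than the phase inversion does.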
3. The RAM blocks are used to create the internal oscillators. Each RAM generates a sinusoidal signal with an intrinsic frequency, which has to be multiplied by a factor to obtain the desired oscillation. In this case, the intrinsic oscillation frequency is 51.33 kHz. The factor to generate the 20 MHz signal is 389.63, as shown in Figure 31A, and the factor for the 35 MHz sinusoid is 681.84, as shown in Figure 31B. The RAM is implemented by entering the mathematical function and the number of samples used to compose it; in this case, the sinusoid has 2048 samples.
   
A)                                              B) 
FIGURE 31 RAM IMPLEMENTATION – A) 20 MHZ B) 35 MHZ SINUSOIDAL SIGNALS
 
Each channel requires two RAMs, because the model uses two signals in quadrature. All the RAMs are driven by a single clock, which keeps them synchronized.
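The text above obtains each local oscillator by scaling the intrinsic frequency of a stored sinusoid (51.33 kHz × 389.63 ≈ 20.0 MHz and 51.33 kHz × 681.84 ≈ 35.0 MHz). A common alternative way to obtain a chosen frequency from a single RAM table is a phase accumulator that steps through the table with a fractional address increment. The sketch below shows that technique, not the exact System Generator configuration used in this thesis; it assumes a 100 MHz read clock and a 2048-sample table holding one period of a sine, and `lut_oscillator` is a hypothetical name.

```python
import numpy as np

TABLE_SIZE = 2048            # samples stored in the RAM block
FS = 100e6                   # assumed read-clock frequency (100 MHz)
TABLE = np.sin(2 * np.pi * np.arange(TABLE_SIZE) / TABLE_SIZE)  # one sine period


def lut_oscillator(f_out, n_samples, phase=0.0):
    """Read the sine table with a fractional address step (phase accumulator)."""
    step = f_out / FS * TABLE_SIZE                      # addresses per clock
    start = phase / (2 * np.pi) * TABLE_SIZE            # initial address offset
    addr = (start + step * np.arange(n_samples)) % TABLE_SIZE
    return TABLE[addr.astype(int)]                      # truncate, no interpolation


# Quadrature pair for the 20 MHz channel: sine and cosine (90-degree offset).
lo_sin = lut_oscillator(20e6, 1000)
lo_cos = lut_oscillator(20e6, 1000, phase=np.pi / 2)
peak_bin = np.argmax(np.abs(np.fft.rfft(lo_sin)))
print(peak_bin * FS / 1000 / 1e6, "MHz")
```

A 90-degree offset is simply a quarter-table (512-address) shift, so both quadrature RAMs can hold the same table and be read from the same clock.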
 
4. The mixer block is a multiplier. As shown by the mathematical model, each input tone generates two mixing products, at the difference and sum frequencies, as listed in Table 9. The tones therefore need to be positioned strategically, with the largest possible spectral separation, so that the mixing products cause as little interference as possible.
 
Input signal   LO 20 MHz          LO 35 MHz
20 MHz         0 MHz, 40 MHz      15 MHz, 45 MHz
35 MHz         15 MHz, 45 MHz     0 MHz, 30 MHz
TABLE 9 OUTPUT FREQUENCIES AFTER THE MIXER
Picture 20 shows the output spectrum of the 20 MHz mixer. The baseband component and three other tones, at 15 MHz, 40 MHz and 45 MHz, are visible. The 45 MHz tone is an alias: the actual mixing product is at 55 MHz, which exceeds the 50 MHz Nyquist limit of the 100 MHz ADC and therefore folds back to 100 − 55 = 45 MHz. This effect was explained in the ADC block section of the filter method chapter.
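The entries of Table 9, including the 55 MHz product that appears at 45 MHz, follow from the sum and difference frequencies of each multiplier folded into the 0 to 50 MHz band of the 100 MHz ADC. The short sketch below reproduces them; it assumes ideal multipliers and ignores amplitudes, and the helper names are hypothetical.

```python
FS_MHZ = 100.0   # ADC sampling rate; the Nyquist limit is 50 MHz


def fold(f, fs=FS_MHZ):
    """Apparent frequency of a real tone at f after sampling at fs (range 0..fs/2)."""
    f = f % fs
    return fs - f if f > fs / 2 else f


def mixer_products(f_in, f_lo, fs=FS_MHZ):
    """Difference and sum frequencies of an ideal multiplier, after aliasing."""
    return sorted({fold(abs(f_in - f_lo), fs), fold(f_in + f_lo, fs)})


for f_in in (20.0, 35.0):
    for f_lo in (20.0, 35.0):
        print(f"input {f_in:.0f} MHz x LO {f_lo:.0f} MHz -> {mixer_products(f_in, f_lo)} MHz")
```

The same helper can be used to check a candidate tone plan: a good plan keeps every folded product of the other channels well away from 0 MHz, where the low-pass filter of each channel passes signal.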
 
PICTURE 20 SPECTRUM AFTER MIXING WITH THE 20 MHZ LOCAL OSCILLATOR
 
6. The low-pass filter eliminates all the image tones created by the mixer, and its selectivity determines the number of channels. If the filter does not sufficiently attenuate the RF components, distortion is generated; this effect is explained in the next stage. The low-pass filter also determines the shape of the baseband pulse, as explained in the band-pass filter block section of the filter method chapter, and therefore the rising and falling times.
 
7. The square block is a multiplier and therefore behaves like the mixer block: if its input still contains RF components, it mixes them, down-converting all the tones to baseband and generating distortion. This effect is shown in Picture 21A, where the purple trace is the 35 MHz channel; in this case, the order of the low-pass filter of this channel was reduced to demonstrate the distortion, and the green trace, the 20 MHz channel, is used as a reference. Picture 21B shows the result with a proper low-pass filter. The output is truncated at the LSBs to avoid exceeding the word length allowed by the Add block, a limitation of the DSP48 slice explained in the square block section of the filter method.
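To see why residual RF at the input of the square block matters, consider a channel output x(t) = a + r·cos(2π·15 MHz·t) + r·cos(2π·45 MHz·t), where a is the wanted baseband level and the two residual tones are the 15 MHz and 45 MHz mixing products of Table 9 that an insufficient low-pass filter lets through. Squaring gives a DC level of a² + r² instead of a², plus beat tones such as the 30 MHz difference product. The numerical sketch below assumes ideal tones sampled at 100 MHz; the amplitudes are illustrative only.

```python
import numpy as np

FS = 100e6
N = 1000                       # 100 kHz bins: every tone falls exactly on a bin
t = np.arange(N) / FS


def square_block(a, r, residual_hz=(15e6, 45e6)):
    """Square a baseband level a plus residual RF tones of amplitude r."""
    x = a + sum(r * np.cos(2 * np.pi * f * t) for f in residual_hz)
    return x ** 2


for a in (1.0, 0.0):           # bit "1" and bit "0"
    y = square_block(a, r=0.2)
    spectrum = np.abs(np.fft.rfft(y)) / N
    print(f"a={a}: DC={y.mean():.3f}, 30 MHz beat={spectrum[300]:.3f}")
```

Even with a = 0 the squared output is not zero, so residual RF raises the level of an "off" channel and narrows the margin for the threshold decision.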
 
    
A)      B) 
PICTURE 21 A) EFFECT OF INSUFFICIENT LOW PASS FILTER B) PROPER LPF 
 
8. The Add block sums the squared signals of the two quadrature branches of the mixer. This makes the output independent of the phase difference between the local oscillator and the input signal, as explained by the mathematical model at the beginning of this chapter.
  
 
3.2.1 MATLAB SIMULATION MIXER METHOD 
 
The MATLAB simulation was used to verify the correct operation of the algorithm. The label generator is the same as the one used for the filter method; the new generator is obtained simply by changing the algorithm from three frequencies to two. This ease of changing designs shows the versatility of the FPGA; for this reason, in future projects the FPGA could be used as a label generator instead of the AWG. The new Simulink model is shown in Figure 32.
 
FIGURE 32 LABEL GENERATOR MIXER METHOD 
The model was observed at two points: after the envelope detection and after the decoder. The delay was then simulated by overlapping the input signal with one bit of the output, as shown in Figure 33. The generator provides three different combinations of the two RF tones, and the generated RF signals are fed into the label processor for simulation.
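A periodic two-tone label generator of this kind can be sketched as below. The 500 ns burst length matches the pulses used in this chapter, but the 500 ns gap between labels and the function name `label_stream` are assumptions for illustration; the actual Simulink generator of Figure 32 is not reproduced here.

```python
import numpy as np

FS = 100e6                                 # 10 ns per sample, as in the simulation
TONES_HZ = (20e6, 35e6)                    # the two label tones of the mixer method
COMBINATIONS = ((1, 0), (0, 1), (1, 1))    # the three non-empty 2-bit labels


def label_stream(pulse_ns=500, gap_ns=500, periods=1):
    """Periodic RF label stream: each combination is a burst followed by a gap."""
    pulse_len = int(pulse_ns * FS / 1e9)
    gap_len = int(gap_ns * FS / 1e9)
    frames = []
    for bits in COMBINATIONS * periods:
        frames.append(np.repeat([bits], pulse_len, axis=0))   # bits held for one pulse
        frames.append(np.zeros((gap_len, len(bits))))          # no tones in the gap
    envelope = np.concatenate(frames)                          # shape (samples, tones)
    n = np.arange(envelope.shape[0])[:, None]
    carriers = np.cos(2 * np.pi * np.array(TONES_HZ) / FS * n)
    return (envelope * carriers).sum(axis=1)                   # sum of gated tones


stream = label_stream()
print(stream.size, "samples =", stream.size * 1e9 / FS, "ns")
```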
 
FIGURE 33 LABEL GENERATION + LABEL PROCESSOR MIXER METHOD 
As can be seen in Figure 34, the algorithm extracts the two tones and detects the three combinations: the yellow trace corresponds to the 20 MHz tone, the purple trace to the 35 MHz tone, and the third combination to both tones sent at the same time. The generator works periodically.
 
 
FIGURE 34 ENVELOPE DETECTION MATLAB SIMULATION (X AXIS TIME, Y AXIS VOLTAGE)
Figure 35 shows the output of the decoder stage in the simulation. The three output signals, each shown in a different color, are reshaped and correctly decoded.
  
FIGURE 35 DECODER MATLAB SIMULATION MIXER METHOD (X AXIS TIME, Y AXIS VOLTAGE)
The delay is measured between the first pulse and its corresponding decoded bit. The simulation runs at 10 ns per sample to emulate the 100 MHz FPGA clock. The estimated delay of the mixer method is 350 ns, i.e., 35 clock cycles, as shown in Figure 36.
 
FIGURE 36 SYSTEM DELAY ON MATLAB SIMULATION MIXER METHOD (X AXIS TIME, Y AXIS VOLTAGE)
[Figure annotation: delay 350 ns]
 
3.2.2 RISING AND FALLING TIME STUDY  
 
The simulation shows that the mixer concept can be implemented. This section investigates the rising and falling times, using the same methodology as for the filter method. The envelope detection stage was implemented with only one channel, and the low-pass filter order was varied to study its effect on the rising time. Two experiments were conducted, each with a different LPF passband bandwidth. The implementation is shown in Figure 37.
  
FIGURE 37 SIMULINK MODEL FOR RISE TIME STUDY MIXER METHOD 
The mixer method was studied only with the 500 ns pulse. Chart 12A shows that a sharper filter of higher order results in a longer rising time. On the other hand, the number of channels increases rapidly when a higher-order filter is used, as shown in Chart 12B. This is because this method mixes the RF tones and also produces components at double the frequency, which appear as crosstalk if they are not properly filtered.
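The trend in Chart 12A, where a longer filter gives a slower edge, can be reproduced offline. The sketch below is not the FPGA filter: it assumes a windowed-sinc FIR with a Hamming window (NumPy has no Chebyshev window), the 1 MHz cut-off of experiment 1 and one sample per 10 ns clock, and it measures the 10 %-90 % rise time of the step response. With only 6 to 21 taps the filter is far shorter than 1/Fc, so the rise time grows roughly with the filter length; the absolute values will differ from Chart 12A.

```python
import numpy as np

FS = 100e6          # one sample per 10 ns FPGA clock
FC = 1e6            # 1 MHz cut-off, as in experiment 1


def lowpass_taps(order, fc=FC, fs=FS):
    """Windowed-sinc FIR of the given order (order + 1 taps), unity DC gain.

    A Hamming window is used here as a stand-in for the Chebyshev window.
    """
    m = np.arange(order + 1) - order / 2
    taps = np.sinc(2 * fc / fs * m) * np.hamming(order + 1)
    return taps / taps.sum()


def rise_time_ns(order):
    """10 %-90 % rise time of the filter's step response (edge of a long pulse)."""
    step = np.cumsum(lowpass_taps(order))
    i10 = np.argmax(step >= 0.1)          # first sample at or above 10 %
    i90 = np.argmax(step >= 0.9)          # first sample at or above 90 %
    return (i90 - i10) * 1e9 / FS


for order in (5, 10, 15, 20):
    print(order, rise_time_ns(order), "ns")
```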
Pulse duration: 500 ns
Low-pass filter: Chebyshev window, cut-off frequency Fc = 1 MHz at -6 dB, sidelobe attenuation 40 dB

A)      B)
CHART 12 A) RISE TIME B) #CHANNELS EXPERIMENT 1 MIXER METHOD
[Chart 12: A) rising time and falling time (ns) versus low-pass filter order (5 to 20); B) number of channels versus low-pass filter order (5 to 20)]
Experiment 2 used a wider low-pass filter passband, increasing the bandwidth at -6 dB from 1 MHz to 9 MHz. As expected, the rising and falling times decrease, as shown in Chart 13A, but at the cost of fewer channels (Chart 13B). This configuration therefore performs better in terms of rising time; however, the 9 MHz LPF bandwidth produces very marked crosstalk.
Pulse duration: 500 ns
Low-pass filter: Chebyshev window, cut-off frequency Fc = 9 MHz at -6 dB, sidelobe attenuation 40 dB

A)      B)
CHART 13 A) RISE TIME B) #CHANNELS EXPERIMENT 2 MIXER METHOD
[Chart 13: A) rising time and falling time (ns) versus low-pass filter order (5 to 20); B) number of channels versus low-pass filter order (5 to 20)]
3.2.2.1 SELECTED PARAMETERS OF THE LPF
In summary, the low-pass filter used in this method is a Chebyshev-window filter of order 20 with a cut-off frequency of 1 MHz at -6 dB. The study estimated that this method could support at most 5 channels. However, only two channels were implemented, because the limited resources did not allow a sharper low-pass filter to be generated.
3.2.3 EXPERIMENTAL SET-UP MIXER METHOD 
 
The set-up for the mixer method is the same as the one used in the filter method illustrated in the 
chapter Experimental set-up Filter Method.  
3.2.3.1 PERFORMANCE OF THE LABEL PROCESSOR 
 
The investigated system performance parameters were:
• OSNR tolerance: the system was measured in the same OSNR scenarios as before, at 50 dB, 30 dB and 20 dB.
• Crosstalk tolerance: experiments were performed with different numbers of tones to study the behavior of the crosstalk.
• Dynamic range: the tolerance to fluctuations of the optical power was measured.
• Scalability: a series of estimations was made to study the scalability of the mixer method.
 
3.2.3.1.1 OSNR TOLERANCE 
The mixer technique is more sensitive to crosstalk than to optical noise, as shown by the slopes of the curves in Chart 14. The steep curves indicate that the crosstalk is significant, so more amplitude is needed to exceed the minimum required SNR. The OSNR, on the other hand, acts as an offset that saturates the ADC more quickly. The maximum number of tones is 15 at an OSNR of 50 dB.

[Chart 14: minimum voltage (mV, 0 to 7,000) versus number of tones (1 to 15) for back-to-back, OSNR 50 dB, OSNR 30 dB and OSNR 20 dB]
CHART 14 MINIMUM VOLTAGE FOR DETECTION MIXER METHOD

3.2.3.1.2 CROSSTALK TOLERANCE
Chart 15 shows the minimum SNR for correct detection of the labels. Again, the optical noise acts as an offset. The minimum SNR for detection is 9.5 dB, and the crosstalk causes an SNR loss of approximately 0.62 dB per tone; this value is calculated from the slope of the trend line. The SNR difference between the one-tone and two-tone cases shows that, with the mixer technique, crosstalk causes larger losses than optical noise: the drop in SNR is steeper between one and two tones, i.e., the shift caused by crosstalk is larger than the offset caused by optical noise. The estimated maximum number of tones was 25.
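The estimate of 25 tones follows from extrapolating the linear trend line of Chart 15 (y = -0.6279x + 25.213, with y the SNR in dB and x the number of tones) down to the 9.5 dB minimum SNR. A minimal sketch of that calculation, where `max_tones` is a hypothetical helper name:

```python
import math


def max_tones(slope_db_per_tone, intercept_db, min_snr_db):
    """Largest tone count whose trend-line SNR still meets the detection limit."""
    if slope_db_per_tone >= 0:
        raise ValueError("expected SNR to decrease with the number of tones")
    x_limit = (min_snr_db - intercept_db) / slope_db_per_tone
    return math.floor(x_limit)


# Chart 15 trend line and the 9.5 dB minimum SNR of the mixer method.
print(max_tones(-0.6279, 25.213, 9.5))   # 25
```

The extrapolation assumes the SNR keeps falling linearly beyond the measured range, so it gives an estimate rather than a measured limit.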
 
[Chart 15: SNR (dB) versus number of tones (1 to 25) for OSNR 50 dB, 30 dB and 20 dB, with the estimated maximum number of tones; linear trend line y = -0.6279x + 25.213]
CHART 15 MINIMUM SNR FOR DETECTION MIXER METHOD

3.2.3.1.3 DYNAMIC RANGE
The dynamic range depends on the output voltage of the AWG. The FPGA resources allow only two detection channels to be implemented, so this study was performed with two tones.
As Chart 16 shows, the dynamic range increases when a higher voltage is applied. The set-up used to test the mixer method is the same as that used for the filter method, which was shown to behave as a linear system. The mixer method was tested in a back-to-back configuration, and the minimum voltage to detect the 3 tones was 323.8 mV; the linearity of the set-up makes it possible to determine the minimum voltage required with 3 tones. The measurements were taken up to 2 Vpp to avoid the nonlinearity of the external modulator. As a result, the mixer method reaches up to 10 dB of dynamic range thanks to the phase protection block.
[Chart 16: dynamic range (dB, 0 to 12) versus AWG output voltage (323.8, 500, 1000 and 2000 mV) for OSNR 50 dB, 30 dB and 20 dB]
CHART 16 EYE DYNAMICS MIXER METHOD

Picture 22 shows the FPGA input signal when the mixer method is used. The blue and yellow traces are more distorted than with the filter method [see Picture 12]. However, the threshold decision is clearer than with the filter method because there is no crosstalk; for that reason, the dynamic range is 5 dB larger.

PICTURE 22 FPGA INPUT SIGNAL MIXER METHOD
Chart 17 shows how the dynamic range reflects both the crosstalk influence and the optical power requirements. The dynamic range decreases when more tones are added, for these two reasons: the crosstalk increases with the number of tones, and more optical power is needed to reach the ADC sensitivity. The estimated maximum number of tones was 24.
[Chart 17: dynamic range (dB, 0 to 14) versus number of tones (1 to 25) for OSNR 50 dB, 30 dB and 20 dB, with the estimated maximum number of tones]
CHART 17 DYNAMIC RANGE MIXER METHOD

OPTIMIZATION
The optimization process applied in the filter method consists of using the nonlinear part of the modulator by increasing the output voltage of the AWG. However, this technique adds distortion to the signal, and, as the previous charts show, the mixer method is more sensitive to distortion. For that reason, the improvement is small, as Table 10 shows: the dynamic range improved by only 0.7 dB.
Conditions: OSNR 30 dB, 8 tones
AWG (mV)   Power level   Optical power (dBm)   Dynamic range (dB)   SNR (dB)
2000       min           -23                   4.3                  15.9
           max           -18.7                                      12.5
4000       min           -19.5                 5.1                  10.7
           max           -14.4                                      9.5
5600       min           -19.5                 5                    11.8
           max           -14.5                                      9.1
TABLE 10 OPTIMIZATION OF MIXER METHOD MEASUREMENT
3.2.3.1.4 SCALABILITY 
The payload was not included when testing this method, because the filter method section already demonstrated that the payload acts only as additional optical noise (see the studies with payload in section 3.1.3.2, Experimental set-up Filter Method). Although the estimated maximum number of tones for this method reaches 24, the main limitation on the number of tones is the dynamic range of the ADC, which restricts it to no more than 15 tones.
  
 
3.2.4 SUMMARY OF THE MIXER METHOD 
 
Table 11 summarizes the results obtained with the mixer method.
 
Parameter                                      Mixer     Units
Delay                                          370       ns
Rise time                                      120       ns
# Channels implemented                         2
Dynamic range                                  12        dB
Max OSNR tolerance                             20        dB
Minimum SNR                                    9.5       dB
Minimum voltage, 1 tone                        170       mV
Minimum voltage, 15 tones                      4         V
SNR loss                                       0.62      dB/tone
Resources used                                 72        %
Estimated maximum number of tones detectable   24        tones
Maximum number of tones detectable             15        tones
TABLE 11 MIXER METHOD SUMMARY
 
  
 
4 COMPARISON FILTER VS. MIXER METHODS 
 
Both methods were tested under the same conditions. 
 
Parameter                    Filter    Mixer    Units
Delay                        470       370      ns
Rise time                    120       120      ns
# Channels implemented       3         2
Dynamic range                7         12       dB
Max OSNR tolerance           30        20       dB
Minimum SNR                  9.1       9.5      dB
Minimum voltage, 1 tone      63        170      mV
Minimum voltage, 15 tones    1.7       4        V
SNR loss                     0.52      0.62     dB/tone
Resources used               85        72       %
TABLE 12 COMPARISON FILTER METHOD VS MIXER METHOD
The delay is 100 ns larger in the filter method because of the filter order: the mixer method uses an LPF of order 20, which corresponds to 200 ns of clock cycles, whereas the BPF of the filter method has order 30, which corresponds to 300 ns. The remaining 170 ns are due to the rest of the processing chain, which is the same for both methods.
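The delay budget can be written as a simple sum, assuming, as the comparison does, one 10 ns clock cycle per filter order plus a fixed time for the stages shared by both methods. The 170 ns shared part is inferred from the table (470 ns − 300 ns and 370 ns − 200 ns), and `total_delay_ns` is a hypothetical helper, not part of the FPGA design.

```python
CLOCK_NS = 10.0        # 100 MHz FPGA clock
COMMON_NS = 170.0      # stages shared by both methods (inferred from Table 12)


def total_delay_ns(filter_order, clock_ns=CLOCK_NS, common_ns=COMMON_NS):
    """Filter latency (one clock per order, as assumed in the text) plus common stages."""
    return filter_order * clock_ns + common_ns


print(total_delay_ns(30))   # filter method, BPF order 30 -> 470.0
print(total_delay_ns(20))   # mixer method, LPF order 20 -> 370.0
```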
The rise time is the same in both cases, because in both methods the filters had to be sharp to avoid crosstalk.
The main difference is that the filter method is more sensitive to optical noise, whereas the mixer method is more susceptible to crosstalk.
The mixer technique performs better in terms of delay and dynamic range; however, its scalability is reduced by the amplitude required when more tones are added.
The scalability of the filter method is limited by the capabilities of the FPGA, as discussed in the following chapter.
The scalability of the mixer method could be improved by implementing the array of IQ demodulators with analog components. In this way, the signal sent to the FPGA is already in baseband, so the most critical part of the processing is performed with low delay and, even more importantly, the signal is at a low voltage.
 
5 VIRTEX 4 VS. VIRTEX 6
 
In this thesis, the FPGA used is the main limitation to scalability. Table 13 compares the specifications of the Virtex 6 with those of the Virtex 4. Assuming the same algorithms that were investigated and implemented on the Virtex 4, this chapter estimates the possible performance on the Virtex 6.
Parameter                      Virtex 6            Virtex 4
CLB array                      -                   96 x 40
Logic cells                    476160              34560
Slices                         74400               15360
Distributed RAM (Kb)           7640                240
DSP slices                     2016                192
Block RAM, 18 Kb blocks        2128                192
Block RAM, 36 Kb blocks        1064                -
Max block RAM (Kb)             38304               3456
I/O banks                      21                  11
User I/O                       840                 448
Clock (MHz)                    600                 100
ADC rise time                  90 ps               40 ns
ADC max sampling frequency     5/2.5/1.25 Gbps     100 MHz
TABLE 13 COMPARISON VIRTEX FAMILY (XILINX, SUPPORT: VIRTEX-4 FAMILY OVERVIEW, 2010), (XILINX, SUPPORT: 
VIRTEX-6 FAMILY OVERVIEW, 2011) 
 
PICTURE 23 VIRTEX 6 DEVICE 
For the estimation, different tone spacings were used to calculate the maximum theoretical number of tones that can be placed, and the resources needed to implement that number of tones were also calculated. The filter order decreases as the spacing between tones increases, because a sharp filter is no longer needed. See Table 14.
ADC sampling  BW/tone  BW total  Channels  Needed  Filter  EC, error        Total delay    Total delay       Rise time
frequency     (MHz)    (MHz)     by BW     DSP     order   correction (ns)  with EC (ns)   without EC (ns)   (ns)
100 MHz       10       40        3         165     30      40               470            430               120
1.25 Gbps     10       600       58        3190    30      6.7              78.3           71.7              0.216
2.5 Gbps      10       1250      123       6765    30      6.7              78.3           71.7              0.216
5 Gbps        10       2500      248       13640   30      6.7              78.3           71.7              0.216

100 MHz       15       40        2         110     30      40               470            430               120
1.25 Gbps     15       600       38        2090    30      6.7              78.3           71.7              0.216
2.5 Gbps      15       1245      81        4455    30      6.7              78.3           71.7              0.216
5 Gbps        15       2490      164       9020    30      6.7              78.3           71.7              0.216

100 MHz       20       50        2         90      20      40               320            280               80
1.25 Gbps     20       600       28        1260    20      6.7              53.3           46.7              0.144
2.5 Gbps      20       1240      60        2700    20      6.7              53.3           46.7              0.144
5 Gbps        20       2500      123       5535    20      6.7              53.3           46.7              0.144

100 MHz       40       50        1         35      10      40               220            180               50
1.25 Gbps     40       600       13        455     10      6.7              36.7           30                0.1125
2.5 Gbps      40       1240      29        1015    10      6.7              36.7           30                0.1125
5 Gbps        40       2480      60        2100    10      6.7              36.7           30                0.1125

100 MHz       100      110       1         25      5       40               170            130               40
1.25 Gbps     100      600       4         100     5       6.7              28.3           21.7              0.09
2.5 Gbps      100      1200      10        250     5       6.7              28.3           21.7              0.09
5 Gbps        100      2500      23        575     5       6.7              28.3           21.7              0.09
TABLE 14 ESTIMATION TABLE
The delay is calculated by multiplying the clock period by the number of clock cycles of delay introduced by each component. The error correction is needed on the Virtex 4 because of the slow rising and falling times. However, the ADC used with the Virtex 6 has a much faster rise time, which reduces the probability of errors caused by this problem. For that reason, the delay is estimated without the error correction time.
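The Virtex 6 delay entries of Table 14 can be reproduced by keeping the number of clock cycles of each Virtex 4 design and replacing the 10 ns clock period (100 MHz) with that of the 600 MHz Virtex 6 clock (about 1.67 ns). A minimal sketch of that scaling, with `scale_delay_ns` as a hypothetical name:

```python
V4_CLOCK_MHZ = 100.0
V6_CLOCK_MHZ = 600.0


def scale_delay_ns(delay_v4_ns, f_from_mhz=V4_CLOCK_MHZ, f_to_mhz=V6_CLOCK_MHZ):
    """Same number of clock cycles, executed with a faster clock."""
    cycles = delay_v4_ns * f_from_mhz / 1000.0      # ns * MHz / 1000 = cycles
    return cycles * 1000.0 / f_to_mhz


for v4_ns in (470, 430, 320, 280, 220, 180, 170, 130, 40):
    print(v4_ns, "ns ->", round(scale_delay_ns(v4_ns), 1), "ns")
```

This scaling assumes every pipeline stage keeps its cycle count at 600 MHz; in practice, timing closure at a higher clock may require extra pipeline registers.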
To summarize, the Virtex 6 is about 20 times faster than the Virtex 4. With the same algorithm, only 27% of the Virtex 6 resources would be consumed, compared with the 85% currently used on the Virtex 4. In other words, resources would remain available to implement further routing functions, such as packet contention resolution.
  
 
6 RECOMMENDATION
 
There are four proposals that might improve the performance and scalability of the project. 
1. The envelope detection can be implemented with the Hilbert transform. With X(t) the input signal, the envelope E(t) is given by (Algorithms - Envelope Detection):
$$E(t) = \sqrt{\mathcal{H}\{X(t)\}^2 + X(t)^2}$$
The advantage is that this technique detects the amplitude in real time; therefore, it can be used to detect M-PAM on each tone. In addition, unlike the square law, the Hilbert transform does not create additional frequency components, so less distortion is added to the signal. The disadvantages are that it consumes more resources and requires feedback to implement the adaptive threshold. However, if a higher-performance FPGA is used, this technique broadens the scalability possibilities.
 
The Simulink model can be observed in Figure 38. 
    
A)        B) 
FIGURE 38 A) HILBERT TRANSFORMATION FOR ENVELOPE DETECTION B) SUBSYSTEM NORMALIZES THE SIGNAL
2. The threshold decision uses a fixed boundary. However, if the payload power is used as a reference, a dynamic threshold can be implemented, which would improve the dynamic range of the device.
3. Besides the filter method and the mixer method proposed in this thesis, a third method can be tested. An FFT is applied to the input signal; in each branch, the spectrum is multiplied by a binary vector that lets only the corresponding frequency pass, and an IFFT is then applied. The advantage is that more tones can be placed, because the multiplication by the vector behaves like a very narrow filter. The disadvantages are the processing time required, in terms of clock cycles, and the resources consumed by the FFT.
4. Improve the generated RF tones, which have substantial initial crosstalk. This can be 
done by using a band-pass filter in each RF tone at the transmitter, thus limiting each 
tone band and therefore the crosstalk.  
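Recommendations 1 and 3 can both be prototyped offline with FFTs before committing FPGA resources. The sketch below is an illustration under stated assumptions, not a tested FPGA design: `hilbert_envelope` computes the envelope sqrt(X² + H{X}²) through the FFT-based analytic signal (the same construction used by `scipy.signal.hilbert`), and `fft_mask_filter` keeps only the FFT bins around one tone by multiplying the spectrum by a binary vector before the inverse FFT. The 100 MHz sample rate, the 2 MHz half-width of the mask and both function names are assumptions.

```python
import numpy as np


def hilbert_envelope(x):
    """Envelope |x + jH{x}| from the FFT-based analytic signal (recommendation 1)."""
    n = x.size
    weights = np.zeros(n)
    weights[0] = 1.0                         # keep DC
    if n % 2 == 0:
        weights[1:n // 2] = 2.0              # double positive frequencies
        weights[n // 2] = 1.0                # keep the Nyquist bin
    else:
        weights[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(np.fft.fft(x) * weights)   # negative frequencies removed
    return np.abs(analytic)


def fft_mask_filter(x, fs, f_center, half_width):
    """Multiply the spectrum by a binary vector around one tone, then IFFT (recommendation 3)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    mask = (np.abs(freqs - f_center) <= half_width).astype(float)
    return np.fft.irfft(spectrum * mask, n=x.size)


# Two label tones sampled at 100 MHz; the 20 MHz tone carries a slow amplitude variation.
FS = 100e6
n = np.arange(1000)
amplitude = 1.0 + 0.5 * np.cos(2 * np.pi * 0.5e6 / FS * n)
rf = amplitude * np.cos(2 * np.pi * 20e6 / FS * n) + np.cos(2 * np.pi * 35e6 / FS * n)
tone_20 = fft_mask_filter(rf, FS, 20e6, 2e6)
envelope = hilbert_envelope(tone_20)
print(np.max(np.abs(envelope - amplitude)))   # small: the envelope follows the amplitude
```

Because the envelope tracks the amplitude level rather than only its presence, the same path could in principle distinguish several amplitude levels per tone, which is the M-PAM idea of recommendation 1.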
 
7 CONCLUSION
 
This thesis focused on the implementation of a low-latency, scalable label processor as part of an NxN optical switch that can improve the latency and bandwidth of computer communication networks within a datacenter. The label processor must meet two requirements: fast processing of the labels, to minimize the delay, and processing of a large number of labels, to control scalable optical switches with a large number of ports.
This thesis demonstrated the successful FPGA implementation of such label processor. 
Characterization in optical system lab trials confirmed the performance and scalability of the 
label processor. 
The new labeling technique exploits parallel processing of the encoded labels. Each label is encoded by parallel RF tones (the bits of the label) carried by an optical wavelength in-band with the payload. This label format requires no serial-to-parallel circuit and therefore no clock recovery circuitry, which reduces the total delay. The parallel processing requires an almost constant processing time regardless of the number of labels.
The tests supported up to 50 tones in parallel, thus encoding 2^50 addresses. If this 50-bit code were sent serially to the FPGA, more than 500 ns would be needed just to convert it to parallel for processing. The new technique, in contrast, allows N tones to be added with a fixed delay thanks to the parallel processing, which reduces the total processing delay. The reported experimental results demonstrate that scalability is limited not by the concept of the new labeling technique but by the performance of the FPGA employed.
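The scaling argument can be made concrete with two small formulas: N parallel on/off tones encode 2^N labels, whereas a serial label must first be shifted in one bit per clock cycle. The sketch below assumes one bit per clock cycle and uses the 100 MHz (Virtex 4) and 600 MHz (Virtex 6) clocks discussed in this thesis; the helper names are hypothetical.

```python
def addresses(n_tones):
    """Number of distinct labels carried by n parallel on/off tones."""
    return 2 ** n_tones


def serial_to_parallel_ns(n_bits, clock_mhz):
    """Time to shift n_bits in at one bit per clock cycle."""
    return n_bits * 1000.0 / clock_mhz


print(addresses(50))                        # 1125899906842624 labels
print(serial_to_parallel_ns(50, 100.0))     # 500.0 ns on a 100 MHz clock
print(serial_to_parallel_ns(60, 600.0))     # 100.0 ns on a 600 MHz clock
```

The serial time grows linearly with the label length, whereas the parallel processing time stays roughly fixed, which is the core of the scalability claim.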
Two different algorithms were tested on the FPGA to optimize the processing delay. The first, called the Filter Method, introduces a total delay of 470 ns to process labels with 3 tones spaced by 10 MHz. The number of RF tones is limited by the FPGA resources and by the 100 MHz sampling frequency of the ADC, which provides an operational band of only 50 MHz. This narrow band forced the label processor and the tones to be designed with only 10 MHz spacing, which requires very narrow filters to separate the RF tones with low crosstalk. Such narrow-band filtering needs high-order filters, which results in high FPGA resource utilization and a large overall label processing delay. If ADCs with a higher sampling rate were employed, they would provide a larger operational band, allowing more tones to be added with a larger spacing and reducing both the filter order and the total delay.
The Filter Method was tested in the laboratory and its behavior was characterized under different scenarios. As a result, the input signal must meet several requirements: an SNR of at least 9.1 dB, an OSNR of 30 dB and an input voltage of at least 63 mV per tone. These boundaries also limit the maximum number of tones. Based on the experimental results, up to 37 tones can be implemented by using an FPGA with a higher-sampling-frequency ADC, which makes it possible to address 2^37 possible ports of the optical switch.
The second method, called the Mixer Method, has a total delay of 370 ns. This method performs better than the first in terms of delay, since it does not require high-order filters; however, it is less scalable because of the input power required. The Mixer Method was also tested in the laboratory under the same conditions and scenarios as the Filter Method. The results show that the minimum SNR for correct detection is 9.5 dB, an OSNR of 20 dB is supported and an input voltage of 170 mV per tone is required. Based on the experimental results, up to 15 tones can be implemented by using an FPGA with a higher-sampling-frequency ADC.
The laboratory results show that the filter method is more suitable for implementation in an FPGA.
Finally, the possible performance of the algorithms was estimated for a state-of-the-art FPGA. Using a Virtex 6 with a 600 MHz clock and an ADC with a sampling frequency of up to 5 GHz, the total delay can be reduced from 470 ns to 30 ns and the number of RF tones can be scaled up to 60. This means that, instead of the 100 ns that serial-to-parallel conversion of 60 serial bits alone would take, only 30 ns are required to fully process the 60 bits with the new labeling technique proposed in this thesis.
 
  
 
8 REFERENCES
 
Algorithms - Envelope Detection. (n.d.). Numerix-DSP. Retrieved April 27, 2011, from 
http://www.numerix-dsp.com/envelope.html 
Calabretta, N., Jung, H., Herrera, J., Tangdiongga, E., Koonen, T., & Dorren, H. (2009). All-
Optical Label Swapping of Scalable In-Band Address Labels and 160-Gb/s Data Packets. Vol. 27(1). 
Calabretta, N., Wang, W., Ditewig, T., Raz, O., Gomez, F., Zhang, S., et al. (2010). Scalable 
Optical Packet Switches for Multiple Data Formats and Data Rates Packets. Paper We 6.3.4. 
Luo, J., Yang, H., Calabretta, N., & Dorren, H. (2011). Scalable and Low Latency Label 
Processor for Large Port Optical Packet Switch. COBRA Research Institute, Eindhoven 
University of Technology, Eindhoven, The Netherlands. 
Xilinx, Inc. (2008, March 1). System Generator for DSP. Retrieved February 16, 2011, from 
http://www.xilinx.com/support/sw_manuals/sysgen_gs.pdf 
Xilinx, Inc. (2010, August 30). Virtex-4 Family Overview. Retrieved June 16, 2011, from 
http://www.xilinx.com/support/documentation/data_sheets/ds112.pdf 
Xilinx, Inc. (2008, December 1). Virtex-4 FPGA User Guide. Retrieved February 15, 2011, from 
http://www.xilinx.com/support/documentation/user_guides/ug070.pdf 
Xilinx, Inc. (2011, March 24). Virtex-6 Family Overview. Retrieved June 16, 2011, from 
http://www.xilinx.com/support/documentation/data_sheets/ds150.pdf 
Xilinx, Inc. (2005, March 9). XtremeDSP Development Kit-IV User Guide. Retrieved February 20, 
2011, from http://www.xilinx.com/support/documentation/boards_and_kits/ug_xtremedsp_devkitIV.pdf 
Xilinx, Inc. (2008, May 15). XtremeDSP for Virtex-4 FPGAs. Retrieved March 3, 2011, from 
http://www.xilinx.com/support/documentation/user_guides/ug073.pdf 
 
  
 
9 APPENDIX 
Appendix 1: XtremeDSP for Virtex-4 Start Guide 
1.- The XtremeDSP kit for Virtex-4 includes a CD with the FUSE Probe tool by Nallatech, which is 
used to configure the FPGA. Install the FPGA drivers and the FUSE Probe software. 
 
2.- After installing the FUSE Probe tool, install Matlab with the Signal Processing Toolbox. 
3.- Install the ISE Design Suite, making sure to include the System Generator module. 
 
4.- Xilinx, Inc. does not support ISE Design Suite 12.x or older versions running under Windows 
Vista or Windows 7. For that reason, an environment variable has to be created and configured. 
 
5.- To create the variable, go to My Computer -> Properties -> Advanced system settings -> 
Environment Variables and add the following variable: 
DSP_CACHE_DIR = C:/temp/sysgen_cache 
 
6.- Start System Generator in the Matlab Simulink environment. 
7.- The Simulink Library Browser should now include the Xilinx Blockset. 
 
 
8.- Every Simulink model must contain the System Generator block; this is a restriction imposed 
by Xilinx. The block is located under Xilinx Blockset/Basic Elements. 
9.- The System Generator tool creates the bitstream for the FPGA. Double-clicking the block 
opens its configuration menu. The configuration used is the following: 
Compilation: Bitstream 
Part: Virtex4 xc4vsx35-10ff668 
Target directory: default 
Synthesis tool: default 
FPGA clock period: 10 (ns). This sets the clock speed; the tool approximates it to the closest 
clock that can be generated internally. 
Clock pin location: B15 (Virtex 4) 
Simulink system period: used for simulation; the default value is 1. It is a relative value 
compared to the FPGA clock period: a value of 1 means that one Simulink time step corresponds 
to one cycle of the FPGA internal clock. 
10.- Once the Simulink model is ready, click Generate on the System Generator block. The 
bitstream is saved in the folder of the Simulink model. 
11.- The XtremeDSP Kit has two FPGAs, and the Virtex-II manages all the clocks. For that reason, 
two bitstreams are provided on the XtremeDSP Kit CD: osc_clock_2v80.bit, which implements the 
internal clock, and ext_clock_2v80.bit, which configures the FPGA for an external clock. 
12.- Connect the FPGA board to the PC via USB. 
13.- In the FUSE Probe application, click “Open Card” at the bottom, add the USB interface, and 
open the card. 
  
 
 
14.- As a result, the two cards are opened. 
  
15.- Right-click the Virtex-II and assign the bitfile. This device receives the clock bitfile; choose 
one of the two provided, depending on the clock type to be used. In this project, 
osc_clock_2v80.bit was used. 
 
16.- Repeat the procedure for the Virtex-4 FPGA, using the bitfile generated from Matlab. 
 
17.- Finally, right-click the Nallatech BenADDA and select “Configure all cards”. The FPGA is 
now configured. 
 
18.- When you have finished using the FPGA, click “Close Card”. 
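As a worked illustration of the clock settings in step 9 (FPGA clock period of 10 ns, Simulink system period of 1), the sketch below converts a Simulink sample period into a hardware update rate. It assumes the usual System Generator convention that one Simulink system period maps to one FPGA clock cycle, so a signal with sample period k is updated once every k clock cycles; the helper name hardware_rate_mhz is hypothetical.

```python
FPGA_CLOCK_PERIOD_NS = 10.0   # "FPGA clock period" set in step 9
SIMULINK_SYSTEM_PERIOD = 1.0  # "Simulink system period" (default value)


def hardware_rate_mhz(sample_period: float) -> float:
    """Hypothetical helper: hardware update rate (MHz) of a signal whose
    Simulink sample period is `sample_period`, in Simulink time units."""
    # Assumption: each system period corresponds to one FPGA clock cycle.
    clock_cycles_per_sample = sample_period / SIMULINK_SYSTEM_PERIOD
    return 1e3 / (FPGA_CLOCK_PERIOD_NS * clock_cycles_per_sample)


print(hardware_rate_mhz(1))  # updated every clock cycle: 100.0
print(hardware_rate_mhz(2))  # updated every second clock cycle: 50.0
```

With the default period of 1, every block in the model runs at the full 100 MHz clock rate.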
  
 
 
Appendix 2: System Generator Tutorial for Virtex-4 
 
1) Every Simulink design needs the System Generator block. Its configuration is explained in 
Appendix 1. 
 
2) The ADC blocks are configured as follows. The XtremeDSP kit has two ADCs. Right-click the 
ADC1 block, found in Simulink library/Xilinx XtremeDSP kit, and click “Look Under Mask”. 
 
 
 
 
 
 
 
 
 
 
 
 
3) Double-click the adc1_d block and, in the Implementation tab, enter the pin coordinates in 
order from MSB to LSB. These are listed in the XtremeDSP manual. The procedure is the same for 
ADC2.  
 
 
4) The two DACs are configured in the same way as the ADCs. The DAC blocks are found in 
Simulink library/Xilinx XtremeDSP kit. Right-click the DAC block and click “Look Under Mask”. 
Finally, double-click the dac1_d block and enter the pin coordinates. 
 
 
 
 
5) Configure the remaining parameters with the same procedure, following the table. The 
PLLLOCK signal does not need to be configured. 
6) To configure the digital I/O pins, use the Gateway blocks found in Simulink library/Xilinx 
Blockset/Basic elements. Each pin is configured as an input or an output depending on the block 
(Gateway In or Gateway Out) dropped into the Simulink model. 
 
7) Double-click the Gateway In or Gateway Out block and assign the pin location. The pin 
coordinates are listed in the XtremeDSP kit manual. 
 
                   
8) Creating a FIR filter with the method used in this project requires the FIR Compiler block, 
located in Xilinx Library/Xilinx Blockset/DSP, together with the FDATool block, located in the 
same place. 
 
9) Connect the FIR Compiler to a Register block, located in Xilinx Library/Xilinx Blockset/Basic 
elements. Double-click the register and select the “Provide enable port” option. 
 
10) Design the FIR filter using the FDATool and export the coefficients.  
 
 
11) The name given to the exported coefficients is the one used in the FIR Compiler; in this case, 
LPF. 
 
12) Double-click the FIR Compiler and enter, in the coefficients text box, the name given in the 
FDATool. Setting Hardware_Oversampling_Rate to “1” makes the implementation a fully parallel 
FIR filter. The remaining parameters depend on the Simulink model. 
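Steps 10 to 12 can be mirrored in software to obtain a reference for simulation. The sketch below is a stand-in, not the project's actual filter: FDATool may use a different design method (e.g. equiripple), so a Hamming-windowed sinc low-pass is used here, stored under the name LPF to mirror the export. The tap count and cutoff are illustrative assumptions, and the ParallelFIR class is a hypothetical cycle-level model of a fully parallel direct-form FIR, y[n] = sum of h[k]*x[n-k], ignoring the fixed-point quantisation and pipeline latency of the real core.

```python
import math
from collections import deque
from typing import Sequence

NUM_TAPS = 21   # assumed filter length (illustrative only)
CUTOFF = 0.1    # assumed cutoff, as a fraction of the sampling frequency


def windowed_sinc_lowpass(num_taps: int, cutoff: float) -> list[float]:
    """Hamming-windowed sinc low-pass, normalised to unity gain at DC."""
    if num_taps < 2:
        raise ValueError("need at least two taps")
    m = num_taps - 1
    taps = []
    for n in range(num_taps):
        k = n - m / 2                      # distance from the centre tap
        if k == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / m)
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


class ParallelFIR:
    """Hypothetical model of a fully parallel FIR: each step() consumes
    one input sample and returns one output sample (one clock cycle)."""

    def __init__(self, coeffs: Sequence[float]) -> None:
        self.coeffs = list(coeffs)
        # Tapped delay line, newest sample first, initially all zeros.
        self.delay = deque([0.0] * len(self.coeffs), maxlen=len(self.coeffs))

    def step(self, x: float) -> float:
        self.delay.appendleft(x)  # newest sample in, oldest drops off the end
        # Every tap is multiplied and summed within the same cycle.
        return sum(h * s for h, s in zip(self.coeffs, self.delay))


LPF = windowed_sinc_lowpass(NUM_TAPS, CUTOFF)  # same name as the FDATool export
fir = ParallelFIR(LPF)
# A constant (DC) input settles to unity gain once the delay line is full.
outputs = [fir.step(1.0) for _ in range(2 * NUM_TAPS)]
print(round(outputs[-1], 6))
```

Because one output is produced per input sample, this matches the behaviour expected when Hardware_Oversampling_Rate is 1; a higher oversampling rate would trade throughput for fewer multipliers in hardware.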
 
 
Appendix 3: Decoder code 
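The MCode block of the Xilinx Blockset makes it possible to write code in Matlab syntax inside 
the Simulink environment; System Generator then converts the code to VHDL. The decoder below 
maps the 3-bit input formed by f3, f2 and f1 (f3 being the MSB) onto seven one-hot outputs, x1 
to x7. 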
  
function [x1,x2,x3,x4,x5,x6,x7] = decode(f1,f2,f3)
% 3-to-7 one-hot decoder. xl_concat(f3,f2,f1) forms a 3-bit word with
% f3 as the MSB; output xk is set to 1 when that word equals k.
% An input word of 0 drives all seven outputs to 0.

    if (xl_concat(f3,f2,f1)==1)
        x1=xfix({xlBoolean},1);
        x2=xfix({xlBoolean},0);
        x3=xfix({xlBoolean},0);
        x4=xfix({xlBoolean},0);
        x5=xfix({xlBoolean},0);
        x6=xfix({xlBoolean},0);
        x7=xfix({xlBoolean},0);

    elseif (xl_concat(f3,f2,f1)==2)
        x1=xfix({xlBoolean},0);
        x2=xfix({xlBoolean},1);
        x3=xfix({xlBoolean},0);
        x4=xfix({xlBoolean},0);
        x5=xfix({xlBoolean},0);
        x6=xfix({xlBoolean},0);
        x7=xfix({xlBoolean},0);

    elseif (xl_concat(f3,f2,f1)==3)
        x1=xfix({xlBoolean},0);
        x2=xfix({xlBoolean},0);
        x3=xfix({xlBoolean},1);
        x4=xfix({xlBoolean},0);
        x5=xfix({xlBoolean},0);
        x6=xfix({xlBoolean},0);
        x7=xfix({xlBoolean},0);

    elseif (xl_concat(f3,f2,f1)==4)
        x1=xfix({xlBoolean},0);
        x2=xfix({xlBoolean},0);
        x3=xfix({xlBoolean},0);
        x4=xfix({xlBoolean},1);
        x5=xfix({xlBoolean},0);
        x6=xfix({xlBoolean},0);
        x7=xfix({xlBoolean},0);

    elseif (xl_concat(f3,f2,f1)==5)
        x1=xfix({xlBoolean},0);
        x2=xfix({xlBoolean},0);
        x3=xfix({xlBoolean},0);
        x4=xfix({xlBoolean},0);
        x5=xfix({xlBoolean},1);
        x6=xfix({xlBoolean},0);
        x7=xfix({xlBoolean},0);

    elseif (xl_concat(f3,f2,f1)==6)
        x1=xfix({xlBoolean},0);
        x2=xfix({xlBoolean},0);
        x3=xfix({xlBoolean},0);
        x4=xfix({xlBoolean},0);
        x5=xfix({xlBoolean},0);
        x6=xfix({xlBoolean},1);
        x7=xfix({xlBoolean},0);

    elseif (xl_concat(f3,f2,f1)==7)
        x1=xfix({xlBoolean},0);
        x2=xfix({xlBoolean},0);
        x3=xfix({xlBoolean},0);
        x4=xfix({xlBoolean},0);
        x5=xfix({xlBoolean},0);
        x6=xfix({xlBoolean},0);
        x7=xfix({xlBoolean},1);

    else
        % Input word 0: no output is active
        x1=xfix({xlBoolean},0);
        x2=xfix({xlBoolean},0);
        x3=xfix({xlBoolean},0);
        x4=xfix({xlBoolean},0);
        x5=xfix({xlBoolean},0);
        x6=xfix({xlBoolean},0);
        x7=xfix({xlBoolean},0);
    end
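To check the MCode decoder in simulation, a software golden model of the same mapping is convenient. The Python function below is a hypothetical reference model, not part of the System Generator design: the inputs are concatenated as f3 f2 f1 (f3 the MSB, as with xl_concat above), output xk is 1 only when the 3-bit value equals k, and the value 0 sets all seven outputs to 0.

```python
def decode(f1: int, f2: int, f3: int) -> tuple[int, ...]:
    """Reference model of the MCode decoder: returns (x1, ..., x7)."""
    value = (f3 << 2) | (f2 << 1) | f1   # same bit order as xl_concat(f3,f2,f1)
    return tuple(1 if value == k else 0 for k in range(1, 8))


# Print the full truth table: f3 f2 f1 -> (x1, ..., x7)
for v in range(8):
    f3, f2, f1 = (v >> 2) & 1, (v >> 1) & 1, v & 1
    print(f3, f2, f1, decode(f1, f2, f3))
```

Comparing this table against the Gateway Out values in a Simulink simulation gives a quick exhaustive test, since the input space has only eight combinations.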
  
 
ACRONYMS 
 
FPGA Field-Programmable Gate Array 
IQ In-Phase/Quadrature 
HDL Hardware Description Language 
ASIC Application-Specific Integrated Circuit 
DSP Digital Signal Processing 
FIR Finite Impulse Response 
ADC Analog-to-Digital Converter 
FBG Fiber Bragg Grating 
PAM Pulse Amplitude Modulation 
RF Radio Frequency 
I/O Input/Output 
LSB Least Significant Bit 
AGC Automatic Gain Control 
OSNR Optical Signal-to-Noise Ratio 
AWG Arbitrary Waveform Generator 
EDFA Erbium-Doped Fiber Amplifier 
OSA Optical Spectrum Analyzer 
ATT Attenuator 
PM Power Monitor 
Rx Receiver 
O/E Optical-to-Electrical 
SNR Signal-to-Noise Ratio 
RAM Random Access Memory 
FFT Fast Fourier Transform 
IFFT Inverse Fast Fourier Transform 
 
  
 
ACKNOWLEDGEMENTS 
 
I take this opportunity to express my most sincere gratitude to all the people who helped me 
throughout this project; without them, it would not have been possible. 
 
First of all, I want to thank Prof. Dr. Harm J.S. Dorren and the entire ECO group for welcoming 
me and giving me the opportunity to be part of an innovative project. I want to give special 
thanks to Dr. Nicola Calabretta for his support and guidance throughout the entire project. 
Since the beginning of the project he has given me the right ideas for the next steps; once again, 
thank you. I also want to thank Jun Luo for explaining to me some valuable concepts that helped 
me complete the thesis. I want to thank Stefano Di Lucente for all the help that he gave me 
throughout the project. Thanks to Roy G.H. van Uden and Julio Ramirez for helping me with the 
FPGA at the beginning of the project. 
 
I do not know how to begin thanking my parents: they gave me everything and made it possible for 
me to write these words, and they supported me during the entire project, making me feel as if 
they were here. Above all, I want to thank my brother, who not only helped me with the thesis but 
has always been my guide throughout the degree, and my entire family for the confidence they 
have given me over the last years. Once again, thank you all. 
 
I want to thank Daniela Petersen, who helped me beyond the thesis and supported me day and 
night. I would like to thank my friends in Spain for accompanying me throughout the degree and 
helping me survive it, and my friends in Venezuela, who are like a family to me. I also want to 
thank my office mates and all the people who stumbled into my life, because the affinities and 
differences between us helped me become who I am today. 
 
Thanks. 
 
 
