NIKEL: Electronics and data acquisition for kilopixels kinetic
  inductance camera by Bourrion, O. et al.
Preprint typeset in JINST style - HYPER VERSION
O. Bourrion^{a,∗}, C. Vescovi^{a}, J.L. Bouly^{a}, A. Benoit^{b}, M. Calvo^{b}, L. Gallin-Martel^{a},
J.F. Macias-Perez^{a}, A. Monfardini^{b}.
^{a} Laboratoire de Physique Subatomique et de Cosmologie,
Université Joseph Fourier Grenoble 1,
CNRS/IN2P3, Institut Polytechnique de Grenoble,
53, rue des Martyrs, Grenoble, France
^{b} Institut Néel, CNRS/UJF,
25 rue des Martyrs, Grenoble, France
ABSTRACT: A prototype of digital frequency multiplexing electronics allowing the real-time monitoring of microwave kinetic inductance detector (MKID) arrays for mm-wave astronomy has been developed. Thanks to frequency multiplexing, it can simultaneously monitor 400 pixels over a 500 MHz bandwidth and requires only two coaxial cables to instrument such a large array. The chosen solution and the performance achieved are presented in this paper.
KEYWORDS: Instruments for CMB observations; Electronic detector readout concepts; Data acquisition concepts.
∗Corresponding author.
arXiv:1204.1415v2 [astro-ph.IM] 8 Jun 2012
Contents
1. Introduction
2. Instrumentation methodology
3. Hardware development
4. Polyphase filter design and full chain simulation
4.1 Theoretical formulation
4.2 Polyphase filters
4.3 Optimized reconstruction
4.4 Full chain simulation
4.4.1 Excitation DAC
4.4.2 Polyphase filter
5. Firmware development
5.1 FPGA ‘split’ description
5.2 FPGA ‘proc’ description
6. Prototype performance
6.1 System frequency response
6.2 System noise
7. Conclusion
1. Introduction
Microwave kinetic inductance detectors (MKIDs) have proven to be a solid working alternative to traditional bolometers for millimeter and sub-millimeter astronomy [1, 2, 3, 4]. MKIDs are composed of high-quality superconducting resonant circuits electromagnetically coupled to a transmission line. They are designed to resonate in the microwave domain [5, 6, 7]. For astronomical applications, the resonances typically lie between 1 and 10 GHz and have loaded quality factors around $Q_L = 10^5$, corresponding to a typical bandwidth of $\Delta f = f/Q_L \sim 10-100$ kHz. Since the MKID resonant frequencies can easily be adjusted by layout design, it is possible to couple a large number of MKIDs with different resonance frequencies to a single transmission line [8]. Indeed, a large number of MKIDs can naturally be read out by a frequency-based multiplexing system with no loss of performance [9]. In practice, the average frequency spacing between resonators is between 1 and 2 MHz [1]. Thus, in order to ensure the largest sky coverage and overall signal-to-noise ratio per unit of time with only a few cable feedthroughs into the cryostat, the analog bandwidth and the number of detectors (resonators) managed by the electronics must be maximized. In this respect, we present here a building block for the NIKA camera [1, 2] that is able to monitor 400 pixels simultaneously over a 500 MHz bandwidth.
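The orders of magnitude quoted above can be checked with a few lines of arithmetic. The sketch below is only illustrative: the function name `resonator_bandwidth` is ours, and the numbers are the ones given in the text (loaded quality factor of 10^5, resonances between 1 and 10 GHz, 400 pixels over 500 MHz).

```python
# Illustrative arithmetic only; names are hypothetical, values come from the text.

def resonator_bandwidth(f0_hz, q_loaded):
    """Resonator bandwidth, delta_f = f / Q_L."""
    return f0_hz / q_loaded

Q_L = 1e5
bw_low = resonator_bandwidth(1e9, Q_L)     # resonance at 1 GHz
bw_high = resonator_bandwidth(10e9, Q_L)   # resonance at 10 GHz

# Average tone spacing if 400 pixels share a 500 MHz band.
pixels = 400
bandwidth_hz = 500e6
mean_spacing_hz = bandwidth_hz / pixels

print(f"bandwidth at 1 GHz : {bw_low / 1e3:.0f} kHz")
print(f"bandwidth at 10 GHz: {bw_high / 1e3:.0f} kHz")
print(f"mean tone spacing  : {mean_spacing_hz / 1e6:.2f} MHz")
```

The resulting mean spacing of 1.25 MHz falls inside the 1 to 2 MHz range quoted for the resonators, and remains much larger than the resonator bandwidth.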
2. Instrumentation methodology
The instrumentation setup used for NIKA and its associated electronics is extensively described in [9]. In summary, the excitation frequency comb is generated at baseband in the electronics using the coordinate rotation digital computer (CORDIC) algorithm, up-converted with an IQ mixer to the 1 to 10 GHz frequency range and injected into the resonator line. The returning, and thus modified, frequency comb is down-converted and analyzed by channelized Digital Down Converters (DDC) to determine the amplitude and phase of each tone. Aside from the need for a good signal-to-noise ratio (SNR) over the whole chain, the first limitation on the number of MKIDs managed by this solution is set by the bandwidths of the digital-to-analog converter (DAC) and of the analog-to-digital converter (ADC). The second constraint comes from the available computing power. For an FPGA (Field Programmable Gate Array), the computational power is determined by the available amount of user logic and multiplier blocks times their maximum running frequency. Thanks to the parallelization inherently achievable in FPGAs, this figure is much larger than for DSPs, which have only a few multiplier-accumulators.
Starting from the previous version, which was able to manage a line of 128 tones over a bandwidth of 125 MHz, three solutions are possible to increase the multiplexing factor per line. The first solution would be to juxtapose several of the previous electronic boards, each one managing its share of the bandwidth (see figure 1).
Unfortunately, the analog filters required to separate each share of the bandwidth before down-conversion have such a stringent separation requirement, in order to avoid crosstalk due to image frequencies, that they cannot be constructed.
The second option is to use faster ADCs and DACs combined with a larger computing power (FPGA) in order to cover a larger bandwidth directly. Following this path, two competing approaches remain. The first, “obvious” approach is to directly generate the frequency comb at twice the desired bandwidth and to perform channelized DDC on the ADC signal. Unfortunately, due to the frequency limitation of state-of-the-art FPGAs, this can only be achieved by massively pipelining the design on both the excitation and the analysis sides, which makes it extremely complicated.
Figure 1. Overview of the setup using the juxtaposition of several electronics boards to monitor an MKID array. Each board generates the two frequency combs (each tone phase shifted by 90° between I and Q) and is followed by an IQ up-mixer. The excitation combs, up-converted to high frequencies, are summed and the resulting signal is fed to a programmable attenuator for power adjustment. After passing through the cryostat and the low-noise boost amplifier, each share of the bandwidth is separated by highly selective filters before passing through the down-mixers and being returned to the corresponding electronics.
The third option, which we have chosen, is to use modern DACs featuring a digital modulator and an interpolator followed by very steep half-band filters to generate the excitation comb. With these, the total frequency bandwidth to cover is split into smaller bands in which the frequency combs can be computed at a moderate frequency, digitally up-converted and filtered to avoid unwanted spurious frequencies. The contributions of all bands are then summed before being up-converted to the frequency band of interest by an IQ up-mixer. On the reception side, the returning signal is down-converted to baseband and digitized by a fast ADC. The digitized signal then goes through a polyphase filter bank with overlapping bands of equal bandwidth. This filter separates the total bandwidth into five smaller frequency bands and down-converts each of them to baseband. The sub-bands are chosen so as to match the excitation bands. The filter outputs are fed to the corresponding channelized DDC in order to be analyzed. The benefit of this architecture is that it limits the massive pipelining to the polyphase filter part and thus dramatically reduces the amount of user logic required for the frequency comb generation and the channelized DDC.
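The principle of comb generation followed by channelized digital down conversion can be illustrated with a minimal floating-point sketch. It is not the firmware: the window length `N`, the tone bins and the helper names `make_comb` and `ddc` are assumptions chosen for the illustration, and the tones are placed on exact frequency bins so that they remain orthogonal over the integration window.

```python
import numpy as np

FS = 250e6   # sample rate of one processing band (250 MSPS, from the text)
N = 4096     # integration window length (illustrative, not the firmware value)

def make_comb(freqs, amps, phases):
    """Sum of real tones, standing in for the comb returned by the resonators."""
    t = np.arange(N) / FS
    return sum(a * np.cos(2 * np.pi * f * t + p)
               for f, a, p in zip(freqs, amps, phases))

def ddc(signal, freq):
    """Mix one tone down to DC and average; returns its complex amplitude."""
    t = np.arange(len(signal)) / FS
    return 2 * np.mean(signal * np.exp(-2j * np.pi * freq * t))

# Three tones on exact FFT bins (hypothetical values).
freqs = np.array([100, 350, 900]) * FS / N
amps = [1.0, 0.5, 0.25]
phases = [0.0, 0.3, -1.2]

comb = make_comb(freqs, amps, phases)
results = [ddc(comb, f) for f in freqs]
for f, z in zip(freqs, results):
    print(f"{f / 1e6:8.3f} MHz  amplitude={abs(z):.3f}  phase={np.angle(z):+.3f} rad")
```

Each DDC channel recovers the amplitude and phase of its own tone only, which is what allows many resonators to share a single line.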
3. Hardware development
Following section 2, dedicated hardware, the New Iram KID ELectronics (NIKEL), able to manage 400 resonators over a bandwidth of 500 MHz, was developed. NIKEL is designed to manage five adjacent bands of 100 MHz. This choice was driven by the capabilities of the chosen DAC (AD9125 from Analog Devices). As shown in figure 2, the NIKEL electronic board is composed of a central FPGA (labeled ‘split’), which receives the output data flow of the 12 bit ADC (ADS5400 from Texas Instruments) at 1 GSPS, and of five processing FPGAs (labeled ‘proc’). Each of the latter drives its associated DAC with the appropriate frequency comb, which can feature up to 80 tones. Each ‘proc’ FPGA is connected to the ‘split’ FPGA by two links. The first, labeled ‘fake ADC’, is a 12 bit parallel LVDS link running at 250 MSPS that carries the part of the bandwidth corresponding to the excitation signal. The second, labeled ‘GTX link’, periodically (at ∼953 Hz) conveys the 80 DDC results over a 2 Gb/s serial link. The six FPGAs are identical devices from the same manufacturer (Xilinx XC6VLX75T-2FFG484C). They provide a satisfactory amount of user logic, coupled with a sufficiently large number of multiplier-accumulator (MAC) blocks. They also feature eight high-speed serial links.
An additional slow-speed DAC, driven by the ‘split’ FPGA, is implemented in order to provide a ∼500 Hz modulation signal. Since the board can be clocked with a reference clock, a bidirectional port was provided to allow synchronization between several boards performing acquisition on the same kilo-pixel camera. When several NIKEL boards are used, one board must be configured as master and provide the synchronization signal, while the others are configured as slaves and start their acquisition upon reception of this synchronization signal.
Figure 2. The electronic board is composed of a central FPGA (labeled ‘split’), which receives the output data flow of the 12 bit ADC (ADS5400 from Texas Instruments) at 1 GSPS, and of five processing FPGAs (labeled ‘proc’). Each of the latter drives its associated DAC with the appropriate frequency comb, which can feature up to 80 tones. Each ‘proc’ FPGA is connected to the ‘split’ FPGA by two links. The first, labeled ‘fake ADC’, is a 12 bit parallel LVDS link running at 250 MSPS that carries the part of the bandwidth corresponding to the excitation signal. The second, labeled ‘GTX link’, periodically (at ∼953 Hz) conveys the 80 DDC results over a 2 Gb/s serial link. An additional slow-speed DAC, driven by the ‘split’ FPGA, is implemented to provide a 500 Hz modulation signal. The communication with the hardware is ensured via a USB2 capable micro-controller and an interface FPGA that accommodates the different voltage levels.
The communication with the hardware is ensured via a USB2 capable micro-controller and an interface FPGA that accommodates the different voltage levels. It allows dynamic FPGA reconfiguration, adjustment of the tone frequencies and data readout.
A picture of the board can be seen in figure 3. It is a 14-layer PCB with dimensions of 184 mm × 153 mm. The inner dielectric layers are made of traditional FR4 epoxy, while the outer layers consist of RO4350 high-frequency circuit material [10], which has lower dielectric losses and is therefore well suited to accommodate the 2 Gb/s serial links, the DAC outputs, which provide samples at 1 GSPS, and the ADC input signal.
Figure 3. Picture of the NIKEL board. It is a 14-layer PCB with dimensions of 184 mm × 153 mm.
Due to the extensive FPGA resource usage and the running frequency (250 MHz), special care was taken in designing the power supply of the electronic board. Indeed, each FPGA core supply draws a current of about 5 A when all tones are activated. Thanks to the use of DC/DC converters, the total current drawn from the input power supply is below 20 A, which corresponds to a maximum required total power of 100 W (or 0.25 W per channel).
4. Polyphase filter design and full chain simulation
As introduced in sections 2 and 3, the received signal, sampled at 1 GS/s, must be decomposed into five 250 MS/s data streams, each having a useful bandwidth of 100 MHz, in order to cover the full 0-500 MHz bandwidth. Frequency modulation/demodulation is a well-documented digital signal processing technique [11] for data transmission (channelization) or audio/image coding applications. The classical techniques, based on the Discrete Fourier Transform (DFT), the Modified DFT (MDFT) or cosine-modulated filter banks, suffer from aliasing and data distortion, mainly due to the critical sampling of the created sub-bands. Overlapping polyphase filter banks, as described in [12], offer a computationally efficient solution and have only two drawbacks for our application: the first sub-band has half the bandwidth of the others, and the last sub-band (also half bandwidth) is not usable. The same technique can however be adapted to the specific filter bank required here, at the cost of an acceptable increase in complexity.
4.1 Theoretical formulation
Digital filter bank implementations are often non-intuitive, but they are composed of simple successive digital signal processing blocks, re-arranged in a different form to increase computing efficiency. The simplified processing for each band of the filter bank is as follows: first, a frequency shift is performed to translate the band of interest around 0 Hz; then, low-pass filtering followed by decimation is applied to select the frequencies of interest. This section describes the arrangement of the basic blocks involved in the specific processing used here.
The input data stream is a real signal, sampled at $F_{si} = 1$ GS/s. Four consecutive samples are presented at the filter bank input at each system clock cycle (250 MHz), while the filter bank outputs five different samples, one for each output band. The signal processing for each band $k$ ($k=0..4$) is done in five consecutive steps. An illustration is provided for band $k=2$ in figure 4 and the operations are described hereafter:
1. Perform an input signal frequency shift of $-(2k+1) \cdot F_{si}/20$, where $F_{si}/20 = 50$ MHz. This is obtained by multiplying the input signal by the complex exponential $e^{-j\pi(2k+1)n/10}$, where $n$ is the input signal sample index.
2. Filter the complex signal with a low-pass Finite Impulse Response (FIR) filter having a passband of $F_{si}/20$ and reaching maximum rejection beyond $F_{si}/20 + F_{si}/80 = F_{si}/16$.
3. Decimate the result by a factor of 4. The new data rate becomes $F_{so} = F_{si}/4$ and the resulting filtered signal occupies the band $[-F_{so}/4, F_{so}/4]$.
4. Up-convert by $F_{so}/4$. In practice, this is done by multiplying the previous signal by $e^{j\pi m/2}$, where $m = n/4$ is the sample index of the decimated data stream. The resulting complex signal covers the frequency band $[0, F_{so}/2]$.
5. Finally, keep only the real part of the complex signal. This adds the complex-conjugate negative-frequency image in the frequency plane. The output real signal is then correctly sampled at $F_{so}$ without aliasing.
Figure 4. Illustration of the polyphase filtering algorithm detailed for band k=2.
A given tone $c$ of frequency $f_{ck}$, located in the $k$th band of the input data stream, can have its frequency expressed as $f_{ck} = k \cdot F_{si}/10 + f_c$. It should be noted that, due to the whole processing, this tone will not appear in the $k$th filter bank output at the frequency $f_c$, but at $f_c + F_{so}/20$. Consequently, the tones used for KID excitation must present a frequency shift of $-F_{so}/20$ with respect to the ones used for performing the DDC on the returning signal provided by the filter.
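The five processing steps and the resulting tone position can be reproduced with a short direct-form sketch. It is only an illustration under stated assumptions: the low-pass filter is a generic Hamming-windowed sinc (`lowpass_fir`, a stand-in that is not the FIR designed for the board), the helper names are ours, and the three test tones are those used later in the full polyphase filter simulation (205, 250 and 299 MHz, all in band k=2).

```python
import numpy as np

FSI = 1e9          # input sampling rate (1 GS/s)
FSO = FSI / 4      # output sampling rate after decimation by 4

def lowpass_fir(num_taps=201, cutoff_hz=62.5e6):
    """Hamming-windowed sinc low-pass; an illustrative stand-in filter."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    fc = cutoff_hz / FSI                        # normalized cutoff (cycles/sample)
    return 2 * fc * np.sinc(2 * fc * n) * np.hamming(num_taps)

def band_output(x, k, fir):
    """Direct (inefficient) form of the five steps for band k."""
    n = np.arange(len(x))
    shifted = x * np.exp(-1j * np.pi * (2 * k + 1) * n / 10)   # step 1
    filtered = np.convolve(shifted, fir, mode="same")          # step 2
    decimated = filtered[::4]                                  # step 3
    m = np.arange(len(decimated))
    return np.real(decimated * np.exp(1j * np.pi * m / 2))     # steps 4 and 5

def peak_frequency(y):
    """Frequency of the strongest spectral line of a real signal sampled at FSO."""
    spectrum = np.abs(np.fft.rfft(y))
    return np.fft.rfftfreq(len(y), d=1 / FSO)[np.argmax(spectrum)]

n = np.arange(8000)
k = 2
peaks = {}
for f_tone in (205e6, 250e6, 299e6):
    x = np.cos(2 * np.pi * f_tone * n / FSI)
    peaks[f_tone] = peak_frequency(band_output(x, k, lowpass_fir()))
    f_c = f_tone - k * FSI / 10
    print(f"input {f_tone / 1e6:.0f} MHz: f_c = {f_c / 1e6:.1f} MHz, "
          f"output peak = {peaks[f_tone] / 1e6:.1f} MHz")
```

In each case the output line sits 12.5 MHz (that is, Fso/20) above f_c, which is the offset that the excitation tones must compensate.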
In practice, the FIR filter does not have to be as steep as stated in step 2, given that any aliasing that folds frequencies into the unused sidebands causes no harm. Consequently, a poorer rejection up to $F_{si}/16 + F_{si}/80 = 3F_{si}/40$ can be tolerated, which greatly eases the FIR filter design.
Unfortunately, this processing is very inefficient in this direct form, for several reasons. For instance, the filtering is done for each band and on complex data. Furthermore, the resource-consuming FIR filtering is performed on the frequency-shifted data but is followed by decimation. In other words, samples are computed needlessly. These computing inefficiencies can be considerably reduced by grouping the different frequency shifts and by using polyphase filters.
4.2 Polyphase filters
If $x(n)$ is the input sample signal, $x'_k(n)$, the frequency-shifted data stream for the band $k$, $k=0..4$, is expressed by equation 4.1.
$$x'_k(n) = x(n) \cdot e^{-2j\pi \frac{(2k+1)n}{20}} \tag{4.1}$$
The output, $x''_k(n)$, of the low-pass FIR filter with coefficients $a(p)$ is then
$$x''_k(n) = \sum_p a(p) \cdot x'_k(n-p) = e^{-2j\pi \frac{(2k+1)n}{20}} \sum_p a(p) \cdot x(n-p)\, e^{2j\pi \frac{(2k+1)p}{20}} \tag{4.2}$$
Since the filtered signal is down-sampled by a factor of 4, $x''_k(n)$ needs to be computed only for $n = 4m$. By decomposing the filter into a 20-phase polyphase filter, where the coefficient index $p$ is given by $p = q + 20r$ with $q=0..19$, equation 4.2 can be written in the following form
$$x''_k(m) = e^{-2j\pi \frac{(2k+1)m}{5}} \sum_{q=0}^{19} e^{2j\pi \frac{(2k+1)q}{20}} \cdot w_q(m) \tag{4.3}$$
where $w_q(m)$ is the output of the $q$th phase of the polyphase filter.
$$w_q(m) = \sum_r a(q+20r) \cdot x(4m-q-20r) \tag{4.4}$$
The final step is to up-convert the signal by $F_{so}/4$, which is equivalent to a complex multiplication by $j^m$, and to take the real part of the resulting complex number
$$y_k(m) = \mathrm{Re}\left[ e^{j\frac{\pi}{2}m} \cdot e^{-2j\pi \frac{(2k+1)m}{5}} \cdot \sum_{q=0}^{19} e^{j\pi \frac{(2k+1)q}{10}} \cdot w_q(m) \right] \tag{4.5}$$
The use of the polyphase decomposition of the FIR filter considerably reduces the computation cost. However, it can be seen in equation 4.5 that many calculations still need to be done on complex numbers before keeping only the real part. This leaves some margin for optimization.
4.3 Optimized reconstruction
Since the inputs and outputs of the polyphase filter bank are real signals, it is possible to perform all computations on real numbers only. Equation 4.5 can be re-written as
$$y_k(m) = \mathrm{Re}\left[ \sum_{q=0}^{19} e^{j\frac{\pi}{10}\left[5m+(2k+1)(q-4m)\right]} \cdot w_q(m) \right] \tag{4.6}$$
We can change the order of the polyphase filter outputs $w_q(m)$ in the sum by introducing new data streams $w'_l(m) = w_q(m)$ with $l = (q+m) \bmod 20$. Due to the $2j\pi$ periodicity of the complex exponential function, the output of the filter bank can be expressed by the following formula:
$$y_k(m) = \mathrm{Re}\left[ (-1)^{mk} \cdot \sum_{l=0}^{19} e^{j\frac{\pi(2k+1)l}{10}} \cdot w'_l(m) \right] = (-1)^{mk} \cdot \sum_{l=0}^{19} \cos\left( \frac{(2k+1)l\pi}{10} \right) \cdot w'_l(m) \tag{4.7}$$
This simple rotation of the order of the polyphase filter outputs greatly simplifies the formula. Moreover, each filter bank output can now be computed without complex arithmetic.
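The equivalence between the complex-valued direct form (equations 4.1 to 4.5) and the real-valued reconstruction of equation 4.7 can be checked numerically. The sketch below is a floating-point illustration only: the filter coefficients and the input are random, the function names are ours, and the rotation is read as l = (q + m) mod 20, consistent with the routing equation given for the ‘rotation block’ in section 5.1.

```python
import numpy as np

def direct_form(x, a, k, m):
    """One output sample of band k via equations 4.1, 4.2 (at n = 4m) and 4.5."""
    p = np.arange(len(a))
    n = 4 * m - p                                   # input samples seen by the FIR
    shifted = x[n] * np.exp(-2j * np.pi * (2 * k + 1) * n / 20)
    filtered = np.sum(a * shifted)
    return np.real(np.exp(1j * np.pi * m / 2) * filtered)

def polyphase_form(x, a, k, m):
    """Same sample via equations 4.4 and 4.7, using real arithmetic only."""
    taps = a.reshape(-1, 20)                        # taps[r, q] = a(q + 20 r)
    r = np.arange(taps.shape[0])
    w = np.array([np.sum(taps[:, q] * x[4 * m - q - 20 * r]) for q in range(20)])
    l = np.arange(20)
    w_rot = w[(l - m) % 20]                         # w'_l(m) = w_{(l-m) mod 20}(m)
    return (-1) ** (m * k) * np.sum(np.cos((2 * k + 1) * l * np.pi / 10) * w_rot)

rng = np.random.default_rng(0)
x = rng.standard_normal(400)      # arbitrary real input samples
a = rng.standard_normal(40)       # arbitrary real FIR coefficients (2 taps per phase)

max_error = max(abs(direct_form(x, a, k, m) - polyphase_form(x, a, k, m))
                for k in range(5) for m in range(10, 60))
print(f"largest difference between the two forms: {max_error:.2e}")
```

The two forms agree to within floating-point rounding for every band, which confirms that the rotation removes the need for complex arithmetic without changing the result.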
4.4 Full chain simulation
4.4.1 Excitation DAC
In order to validate the DAC choice and to select its best configuration for each band, namely the digital modulator frequency and the half-band filters to engage, the DAC behavior was simulated. The Frequency Tuning Word (FTW) that configures the modulator frequency is given by $\mathrm{FTW} = \frac{f_{carrier}}{f_{nco}} \times 2^{32}$, where $f_{nco}$ is 500 MHz. Ideally, it is desired to have five frequency bands and thus five different carrier frequencies going from 0 to 400 MHz in steps of 100 MHz. Consequently, the first approach would be to select these exact values, which are perfectly suited to fit with the half-band filters. Unfortunately, these carrier frequencies would yield non-integer FTW values. Using the rounded values would induce a small offset in frequency, which would be observed as a $2\pi/2^{32}$ phase shift every $2^{32}$ clock cycles. Consequently, the carrier frequencies were chosen to obtain a ratio $\frac{f_{carrier}}{500\ \mathrm{MHz}}$ of 0, 7/32, 13/32, 19/32 and 26/32, yielding integer FTW values.
Given the fact that the manufacturer provided the DAC half-band filter coefficients, a thorough
simulation of the DAC, having the appropriate filters selected, was conducted for the five excitation
bands. The results confirm that non optimal carrier frequencies are acceptable. In particular, the
flatness is slightly degraded while the ripple remains below 0.06 dB.
This mandatory carrier frequency shift with respect to the ideal value must be pre-compensated in the ‘proc’ FPGA that builds the excitation frequency comb, using a digital modulator that applies a frequency shift in the opposite direction so as to virtually obtain carrier frequencies at the requested values (from 0 to 400 MHz). The required compensations, expressed as a ratio of $F_{so}$, are respectively: 0, -3/80, -2/80, +2/80 and -4/80.
This shift is applied at the same time as the frequency shift of $-F_{so}/20$ needed to compensate the shift induced by the polyphase filter bank (see section 5.2). Consequently, the final required compensations, again expressed as a ratio of $F_{so}$, are: -1/20, -7/80, -4/80, -3/80 and -3/40. In practice, these are implemented with an 80-value sine and cosine table feeding the digital modulator.
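The integer constraint on the FTW can be checked with exact rational arithmetic. The sketch below only evaluates the FTW formula quoted in this section for the ideal carriers (0 to 400 MHz in steps of 100 MHz) and for the chosen ratios; the helper name `ftw` and the variable names are ours.

```python
from fractions import Fraction

F_NCO = 500_000_000   # Hz, as stated in the text

def ftw(ratio):
    """FTW = (f_carrier / f_nco) * 2**32, kept as an exact fraction."""
    return ratio * 2**32

# Ideal carriers: 0, 100, 200, 300 and 400 MHz.
ideal = [Fraction(k * 100_000_000, F_NCO) for k in range(5)]
# Ratios retained in the text.
chosen = [Fraction(0), Fraction(7, 32), Fraction(13, 32),
          Fraction(19, 32), Fraction(26, 32)]

ideal_is_integer = [ftw(r).denominator == 1 for r in ideal]
chosen_is_integer = [ftw(r).denominator == 1 for r in chosen]
carrier_mhz = [float(r * F_NCO) / 1e6 for r in chosen]

print("ideal carriers give integer FTW :", ideal_is_integer)
print("chosen ratios give integer FTW  :", chosen_is_integer)
print("chosen carrier frequencies (MHz):", carrier_mhz)
```

Apart from the trivial 0 Hz case, none of the ideal carriers gives an integer FTW, since a ratio with a denominator of 5 cannot divide a power of two, whereas every chosen ratio does.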
4.4.2 Polyphase filter
As for the excitation DAC, the polyphase filter was simulated in order to assess its performance and to find the best implementation options matching the available FPGA resources. During the firmware design, the mathematical simulation tool was used to build stimulus files and reference filter outputs, which were then used by the VHDL simulation tool to speed up the design and validate the firmware implementation of the filter.
The simulation was also an asset in designing the FIR filter. As shown in figure 5, the selected FIR has good flatness over the useful bandwidth (<0.01 dB). The choice was made to concede a larger than specified transition band [50-75 MHz] while having an excellent rejection (-170 dB) in the stopband. As explained in section 4.1, the possible resulting aliasing does not impact the DDC performance in the useful bandwidth. Additionally, the quantization noise due to the use of the fixed-point Multiplier ACcumulator (MAC) was evaluated and confirmed to be negligible with respect to the quantization noise of the ADC.
A full polyphase filter simulation, where three tones (205 MHz, 250 MHz, 299 MHz) are injected at the filter input, is shown in figure 6. The top left plot shows the input signal spectrum, and the other plots show the frequency content of each output of the polyphase filter. It can be observed that the tones lie in the expected band k=2, while the spurious tones appearing in bands k=1 and k=3 are in their rejected sidebands, i.e. above 200 MHz for band k=1 and below 300 MHz for band k=3.
[Figure 5 plots: ‘FIR frequency response’ (0 to 500 MHz), ‘FIR passband zoom’ (0 to 50 MHz) and ‘FIR transition band zoom’ (50 to 80 MHz); magnitude (dB) versus frequency (MHz).]
Figure 5. Simulation of the selected FIR filter. The top left plot shows the global filter response. The top right plot shows the gain fluctuation in the passband and the bottom plot shows the steep rejection after the passband.
[Figure 6 plots: ‘Input: 0-500 MHz’ and ‘Output for band k=0: 0-100 MHz’, ‘k=1: 100-200 MHz’, ‘k=2: 200-300 MHz’, ‘k=3: 300-400 MHz’, ‘k=4: 400-500 MHz’; PSD (dB FS) versus frequency (MHz).]
Figure 6. Full polyphase filter simulation, where three tones (205 MHz, 250 MHz, 299 MHz) are injected at the filter input. The top left plot shows the input signal spectrum, and the other plots show the frequency content of each output of the polyphase filter. It can be observed that the tones lie in the expected band k=2, while the spurious tones appearing in bands k=1 and k=3 are in their rejected sidebands, i.e. above 200 MHz for band k=1 and below 300 MHz for band k=3.
5. Firmware development
5.1 FPGA ‘split’ description
Figure 7. Overview of the ‘split’ FPGA firmware. The firmware is divided in two main parts. The first part,
which is the key-point of the overall design, is composed of the ADC interface, the polyphase filter bank
and the five ‘fakeADC’ outputs, each carrying its share of the bandwidth to the dedicated ‘proc’ FPGA. The
second part consists of five GTX receivers that collect the I/Q data provided by each ‘proc’ FPGA, a data
concentrator, a large FIFO and a USB interface.
The ‘split’ FPGA, shown in figure 7, contains two main parts. The first part, which is the
key-point of the overall design, is composed of the ADC interface, the polyphase filter bank and
the five ‘fakeADC’ outputs, each carrying its share of the bandwidth to the dedicated ‘proc’ FPGA.
The second part consists of five GTX receivers that collect the I/Q data provided by each ‘proc’
FPGA, a data concentrator, a large FIFO and a USB interface.
The GTX2IQ receiver blocks are designed to operate at a speed of 2 Gb/s. This is the speed required to carry 32 bit words at 50 MHz with 8b/10b encoding. Every ∼1.05 ms ($2^{18}$ clock cycles at 250 MHz), a 644 byte data frame is received (see section 5.2) and stored in a small reception buffer (1 k words deep). Once all GTX2IQ blocks have received their data frame, the ‘data concentrator’ transfers the data of each link into the global data buffer labeled ‘USB interface FIFO’ (32 k words deep) to make the complete data frame available for data acquisition via the USB interface.
The USB interface is mostly in charge of reading out the ‘USB interface FIFO’ and thus of performing data acquisition. The required data throughput is 644 × 5 × 953 ≈ 3 MB/s. The interface is also used to set the master/slave mode, to arm the acquisition, to select the modulation mode and to configure and recover the status of the GTX transceiver links.
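The link and throughput figures of this section follow from simple arithmetic, sketched below. The variable names are ours; the integration period is taken as 2^18 clock cycles at 250 MHz, which is the reading that matches the quoted ∼1.05 ms and ∼953 Hz.

```python
# Illustrative arithmetic only; all input values are taken from the text.
CLOCK_HZ = 250e6
CYCLES_PER_FRAME = 2**18

frame_period_s = CYCLES_PER_FRAME / CLOCK_HZ        # about 1.05 ms
frame_rate_hz = CLOCK_HZ / CYCLES_PER_FRAME         # about 953 Hz

# Serial line rate: 32 bit words at 50 MHz, with 8b/10b encoding (10 line bits per 8 data bits).
line_rate_bps = 32 * 50e6 * 10 / 8

# USB throughput: one 644 byte frame per 'proc' FPGA (5 of them) per integration period.
FRAME_BYTES = 644
LINKS = 5
throughput_bytes_per_s = FRAME_BYTES * LINKS * frame_rate_hz

print(f"frame period  : {frame_period_s * 1e3:.3f} ms")
print(f"frame rate    : {frame_rate_hz:.1f} Hz")
print(f"GTX line rate : {line_rate_bps / 1e9:.1f} Gb/s")
print(f"USB throughput: {throughput_bytes_per_s / 1e6:.2f} MB/s")
```

The required throughput of about 3 MB/s is well within what a USB2 link can sustain, which is consistent with the choice of a USB2 capable micro-controller.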
The DAC modulation block is used to generate an optional modulation signal, which can take 2 or 4 values depending on whether or not one wishes to compute both the sensitivity (first derivative) and the sensitivity variation (second derivative) of the I/Q measurement. When this block is activated, the modulation signal is modified at every integration cycle. To keep the modulation synchronous with the integration performed in the DDC, the initial start of the modulation is adjustable with a resolution of 4 ns, up to one full integration cycle.
The polyphase filter bank implementation (shown in figure 8) is composed of five successive stages. During the design, several techniques were used to minimize the number of DSP48 blocks used and hence to allow the filter to fit in the chosen FPGA.
The input stage is composed of a shift register bank featuring 20 registers of 12 bit. It receives four new ADC samples every clock cycle and at the same time shifts the data by four samples, from the newest to the oldest. At the output of this stage, the samples $n$ to $n-19$ are provided to the following stage.
The following stage is composed of 20 FIR filters, each processing one of the ‘input stage’ outputs. The FIR filters feature 45 taps and are implemented in the transposed direct form, which perfectly suits the possibilities offered by the DSP48 blocks inside the Virtex 6 FPGA. Given that for each FIR filter only 9 taps out of 45 are non-zero, the zero-coefficient taps are replaced by simple registers. This technique alone makes it possible to use only 180 DSP48 blocks for the whole filter bank instead of 450.
The third stage, named ‘rotation block’, is used to rotate the vector composed of the 20 FIR filter outputs and to provide it to the ‘optimized reconstruction block’. The rotation consists in routing the data according to the following equation: $W'_l(m) = W_{(l-m) \bmod 20}(m)$, where $l=0..19$ and $m$ is the sample index. In practice, this is implemented with 20 high-performance multiplexers having 20 inputs and one output. Each of these multiplexers is controlled by a counter having a 0 to 19 range, which is initialized with a value that depends on the ‘optimized reconstruction block’ input it is connected to.
[Figure 8 block diagram labels: ADC, 4 × 12 bit (4 ADC samples, n−3 to n); input stage (newest sample n to oldest sample n−19); 20 parallel FIRs (20 bit coefficients), 20 × 9 = 180 DSP48; rotation, m modulo 20; optimized reconstruction (10 bit coefficients), 16 × 2 = 32 DSP48; optimized reconstruction output stage (buffer for y0, y2 and y4; inversion of 1 sample out of 2 for y1 and y3); bands out; data widths of 12 bit, 14 bit and 14 bit; 288 DSP48 available in the Virtex 6.]
Figure 8. Polyphase filter bank implementation overview.
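According to equation 4.7 given in section 4.3, with the vectors $Y(m)$, $W''(m)$ and $W'(m)$ being respectively composed of $y_k(m)$, $w''_k(m)$ and $w'_l(m)$ for $k=0..4$ and $l=0..19$, the optimized reconstruction can be computed by $Y(m) = J^m \cdot W''(m) = J^m \cdot A \cdot W'(m)$, where:
$$J = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix} \tag{5.1}$$
and
$$A = \left( \begin{array}{*{20}{r}}
1 & a & b & c & d & 0 & -d & -c & -b & -a & -1 & -a & -b & -c & -d & 0 & d & c & b & a \\
1 & c & -d & -a & -b & 0 & b & a & d & -c & -1 & -c & d & a & b & 0 & -b & -a & -d & c \\
1 & 0 & -1 & 0 & 1 & 0 & -1 & 0 & 1 & 0 & -1 & 0 & 1 & 0 & -1 & 0 & 1 & 0 & -1 & 0 \\
1 & -c & -d & a & -b & 0 & b & -a & d & c & -1 & c & d & -a & b & 0 & -b & a & -d & -c \\
1 & -a & b & -c & d & 0 & -d & c & -b & a & -1 & a & -b & c & -d & 0 & d & -c & b & -a
\end{array} \right) \tag{5.2}$$
with:
$$a = \cos\left(\frac{\pi}{10}\right),\quad b = \cos\left(\frac{\pi}{5}\right),\quad c = \cos\left(\frac{3\pi}{10}\right),\quad d = \cos\left(\frac{2\pi}{5}\right) \tag{5.3}$$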
It can be seen that computing $W''(2)$ (row $k = 2$ of $A$) does not need any multiplier, since the sign inversion can simply be obtained by computing the two's complement of the input value. Moreover, by using 2×16 DSP48 slices to compute the multiplications by the non-zero, non-unity coefficients of the first two rows of the $A$ matrix, the last two rows can be obtained by sign inversion only. The sign inversion is applied to every other multiplication only (when $l$ is odd). Figure 9 provides a visual summary of the block implementation scheme. The whole processing must be pipelined and, as opposed to a FIR filter, each single sample is multiplied by the 20 coefficients of each row and the results are then all summed together. This requires two types of pipelined adders: one with ten inputs for $W''(2)$ (with four pipeline stages) and another with 18 inputs for the other rows (with five pipeline stages). Finally, for a [5,20]×[20,1] matrix multiplication, only 16 DSP48 slices are used.
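The symmetry argument can be checked numerically. The minimal sketch below assumes that the entries of $A$ in equation 5.2 follow $A_{k,l} = \cos((2k+1)\pi l/10)$, a closed form inferred here from the listed values of $a$, $b$, $c$ and $d$ rather than stated in the text. It also assumes that each pipelined adder is a binary tree, so that its depth is the base-2 logarithm of the number of inputs, rounded up. All function and variable names are illustrative.

```python
import math

def build_a_matrix():
    """A[k][l] = cos((2k+1)*pi*l/10): assumed closed form of equation 5.2."""
    return [[math.cos((2 * k + 1) * math.pi * l / 10) for l in range(20)]
            for k in range(5)]

def is_trivial(x, tol=1e-12):
    """True for coefficients 0, +1 or -1, which need no multiplier."""
    return min(abs(x), abs(x - 1.0), abs(x + 1.0)) < tol

A = build_a_matrix()

# Rows k=4 and k=3 rebuilt from rows k=0 and k=1 by inverting the sign for odd l.
row4 = [(-1) ** l * A[0][l] for l in range(20)]
row3 = [(-1) ** l * A[1][l] for l in range(20)]
symmetry_error = max(abs(u - v) for u, v in zip(row4 + row3, A[4] + A[3]))

# Multipliers per row (coefficients other than 0 and +/-1).
multipliers = [sum(not is_trivial(x) for x in row) for row in A]

# Adder inputs per row (non-zero coefficients) and binary-tree depth.
adder_inputs = [sum(abs(x) > 1e-12 for x in row) for row in A]
adder_stages = [math.ceil(math.log2(n)) for n in adder_inputs]

print(multipliers)    # multipliers needed for each row k
print(adder_inputs)   # adder inputs for each row k
print(adder_stages)   # pipeline stages for each row k
```

Under these assumptions, rows $k=0$ and $k=1$ each need 16 multipliers, row $k=2$ needs none, and the 10-input and 18-input adders come out at four and five stages respectively, in line with the figures quoted in the text.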
The last stage is actually associated with the previous stage (‘optimized reconstruction output stage’), but for the sake of clarity it is shown as a separate block. It corresponds to the first term of equation 4.7, which performs an alternate sign inversion for the odd bands, resulting in a frequency shift by half the sampling frequency and in a reversal of the frequency scale.
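The effect of the alternate sign inversion can be illustrated on a short test sequence. The sketch below is not the firmware implementation: it applies a naive DFT to a 16-sample cosine (the length, the bin number and the helper names are arbitrary choices made for illustration) to show that multiplying the samples by $(-1)^m$ moves a tone from bin $f_0$ to bin $N/2 - f_0$, i.e. a shift by half the sampling frequency together with a reversal of the frequency axis.

```python
import cmath
import math

def dft_magnitudes(x):
    """Naive O(N^2) DFT magnitude, adequate for a short illustration."""
    n = len(x)
    return [abs(sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n)))
            for k in range(n)]

def peak_bin(x):
    """Index of the strongest bin between DC and the Nyquist bin."""
    half = dft_magnitudes(x)[: len(x) // 2 + 1]
    return max(range(len(half)), key=half.__getitem__)

N, F0 = 16, 2  # illustrative sequence length and tone bin
x = [math.cos(2 * math.pi * F0 * m / N) for m in range(N)]

# Alternate sign inversion: every other sample is negated.
y = [(-1) ** m * v for m, v in enumerate(x)]

print(peak_bin(x), peak_bin(y))
```

With these values the tone moves from bin 2 to bin 6, that is from $f_0$ to $N/2 - f_0$.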
The whole design uses 216 out of 288 DSP48 blocks, 18442 out of 93120 slice registers and 14879 out of 46560 slice LUTs.
5.2 FPGA ‘proc’ description
As explained before, the processing FPGA, whose block diagram is shown in figure 10, is in charge of generating the frequency comb in its share of the bandwidth and of performing the channelized DDC for each considered tone.
The communication between the USB interface and the FPGA is ensured via a serial link running at 50 MHz. The various commands received are interpreted by the ‘proc_cmd’ state machine.
Figure 9. Optimized operator taking advantage of the sign symmetry in the A matrix between rows k=0 and k=4 and between rows k=1 and k=3. For row k=2 no multiplier is needed and, since there is no phase requirement between the different frequency bands, the delay adjustments needed to compensate for the DSP48 latency are in fact unnecessary.
Figure 10. Overview of the ‘proc’ FPGA firmware.
Commands are of two kinds: the write commands and the read commands. The write commands are used, for instance, to set the individual phase increment values and tone attenuations, the digital gain and the mixing table to use for performing a frequency shift. Configuration and test modes can also be set via this interface. Among the provided test modes, it is possible to record a ‘fakeADC’ signal snapshot of 32 k samples in the ‘fakeADC_mem’. The read commands are used to request data from the FPGA, such as the GTX link status and the ‘fakeADC’ link synchronization status. Moreover, the DAC internal register values can be accessed.
Given that the ‘fakeADC’ data emitted by the ‘split’ FPGA are synchronized by the system-wide reference clock, a dedicated interface (fakeADC_input) is used to adjust the ‘fakeADC’ bus delay in order to compensate for the data sampling phase misalignment and thus to guarantee stable data sampling. The locally synchronized data are provided to the tone managers.
The 80 tone manager outputs are fed to two pipelined adders in order to construct the in-phase and quadrature versions of the frequency comb. Each comb version is then frequency shifted by an IQ mixer in order to compensate for the residual up-conversion due to the polyphase filtering and for the frequency shift due to the non-optimal selection of the DAC internal modulator frequency (see section 4). The digital gain is used to numerically amplify the resulting signal by 0 dB up to 36 dB in steps of 6 dB before driving the DAC. This feature is useful to adapt the signal to the ADC input range when fewer than 80 tones are used.
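The 6 dB step suggests that the digital gain is a multiplication by a power of two, which would be consistent with the $2^0$ to $2^5$ gain labels used in section 6.2. The following sketch assumes exactly that (a left shift by 0 to 6 bits). The function names are hypothetical and any saturation logic of the real firmware is left out.

```python
import math

MAX_SHIFT = 6  # six steps of about 6 dB give the 36 dB maximum quoted in the text

def apply_digital_gain(sample, shift):
    """Amplify an integer sample by 2**shift (assumed power-of-two gain)."""
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift must be between 0 and 6")
    return sample << shift

def gain_db(shift):
    """Amplitude gain of a multiplication by 2**shift, expressed in dB."""
    return 20 * math.log10(2 ** shift)

# Gain of every setting, rounded to 0.01 dB.
gains = [round(gain_db(s), 2) for s in range(MAX_SHIFT + 1)]
print(gains)
```

Each step is strictly 6.02 dB, so the seven settings span 0 dB to about 36.1 dB, which matches the rounded values given in the text.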
The IQ2GTX block is used to transmit the DDC results through the high speed link to the ‘split’ FPGA for data concentration. Along with these data, the detected peak amplitude, in absolute value, is transmitted for monitoring and to avoid DAC clipping. Hence, the data frame is composed of 2×80 32 bit words representing the in-phase/quadrature information.
[Figure 11 diagram labels: Tone generator and processor; PhaseIncrement; 19 bit acc; angle; CORDIC (sin, cos); Digital attenuator (two, with gainSelect); fakeADC; DSP48 (two); Low pass CIC filters (two 48 bit acc, R↓); outputs I and Q.]
Figure 11. Overview of a tone manager. The block comprises a CORDIC generator, two digital attenuators
for individual tone power adjustment and a DDC implemented with DSP48 blocks.
The tone manager, which is depicted in figure 11, features a COordinate Rotation DIgital Computer (CORDIC) [13] block and a DDC that is composed of an I&Q demodulator followed by a Low Pass Filter (LPF). The LPF, which is primarily used to remove the sum-frequency components from the spectrum, also rejects unwanted frequencies (e.g. frequencies tuned to other pixels, white noise, ...). Each CORDIC, implemented in a pipelined fashion and composed only of adders and subtractors, was designed to provide a 10 bit precision on the calculated sine and cosine values. It uses 10 precalculated arc tangent values with 20 bit resolution. The phase accumulator that feeds the CORDIC is used to adjust the frequency with a precision of 250 MHz/$2^{18}$ ≈ 953 Hz. In order to prevent all the CORDICs from starting up in phase at the maximum cosine or sine amplitude, the phase accumulator is initialized at a quarter of its full scale, i.e. each phase accumulator is reset to $\pi/4$.
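The tone generation principle can be sketched as follows. This is a floating-point model written for illustration only, not the fixed-point pipelined firmware: the 18 bit accumulator width is inferred from the quoted 250 MHz/$2^{18}$ resolution, and the quadrant folding and all function names are assumptions. Only the ten micro-rotations with precalculated arc tangents come from the text.

```python
import math

N_ITER = 10      # number of micro-rotations, as in the text
ACC_BITS = 18    # assumed accumulator width, from 250 MHz / 2**18
F_CLK = 250e6    # clock frequency in Hz

# Precalculated arc tangents atan(2**-i) and the constant CORDIC gain.
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITER)]
GAIN = math.prod(math.sqrt(1 + 4.0 ** -i) for i in range(N_ITER))

def cordic_sin_cos(theta):
    """Rotation-mode CORDIC for theta in [-pi/2, pi/2]; returns (sin, cos)."""
    x, y, z = 1.0 / GAIN, 0.0, theta
    for i in range(N_ITER):
        d = 1.0 if z >= 0 else -1.0          # rotate towards z = 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return y, x

def sin_cos_from_phase(acc):
    """Map a phase accumulator value to (sin, cos), folding into the CORDIC range."""
    theta = 2 * math.pi * (acc % 2 ** ACC_BITS) / 2 ** ACC_BITS
    if theta > math.pi:
        theta -= 2 * math.pi                  # bring the angle into (-pi, pi]
    if abs(theta) > math.pi / 2:
        # sin(t) = -sin(t -/+ pi) and cos(t) = -cos(t -/+ pi)
        s, c = cordic_sin_cos(theta - math.copysign(math.pi, theta))
        return -s, -c
    return cordic_sin_cos(theta)

def tone_frequency(phase_increment):
    """Tone frequency in Hz for a given phase increment per clock cycle."""
    return phase_increment * F_CLK / 2 ** ACC_BITS

print(tone_frequency(1))
print(sin_cos_from_phase(2 ** ACC_BITS // 8))
```

With ten iterations the residual angle is bounded by atan($2^{-9}$), about 0.002 rad, which is of the same order as the 10 bit precision targeted in the text.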
The I&Q demodulation is performed by multiplying a copy of the ADC output by replicas of the generated sine and cosine values. For practical reasons (FPGA logic resources), the Low Pass Filter (LPF) is obtained by averaging $2^{18}$ data samples and its bandwidth is thus of the order of a kHz. It must be noted that the accumulator period must be chosen as a multiple of the phase accumulator period in order to avoid beat frequency phenomena. At the end of the accumulation cycle, each tone manager transfers its I&Q data to the IQ2GTX interface for transmission to the ‘split’ FPGA.
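A minimal model of this demodulation is sketched below, under stated assumptions: floating-point arithmetic, a window of 1024 samples instead of $2^{18}$ to keep the run short, and arbitrary tone positions, amplitude and phase. Both tones complete an integer number of periods over the averaging window, which corresponds to the condition on the accumulator period mentioned above.

```python
import math

def ddc(samples, cycles):
    """Mix with cos/sin completing `cycles` periods over the window, then average."""
    n = len(samples)
    i_acc = q_acc = 0.0
    for m, s in enumerate(samples):
        phase = 2 * math.pi * cycles * m / n
        i_acc += s * math.cos(phase)
        q_acc += s * math.sin(phase)
    return i_acc / n, q_acc / n

N = 1024                 # short window for illustration (2**18 in the text)
AMP, PHASE = 0.8, 0.3    # arbitrary amplitude and phase of the tone of interest

# Tone of interest (37 cycles per window) plus a tone tuned to another pixel.
wanted = [AMP * math.cos(2 * math.pi * 37 * m / N + PHASE) for m in range(N)]
other = [math.cos(2 * math.pi * 90 * m / N) for m in range(N)]
adc = [a + b for a, b in zip(wanted, other)]

i_val, q_val = ddc(adc, 37)
amplitude = 2 * math.hypot(i_val, q_val)   # the factor 2 undoes the mixing loss
phase = math.atan2(-q_val, i_val)
print(amplitude, phase)
```

Because every tone fits an integer number of periods in the window, the averaging removes both the sum-frequency component and the neighbouring tone, and the amplitude and phase of the tone of interest are recovered without beating.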
To allow individual tone power adjustment, the sine and cosine waves are passed through digital attenuators before being provided to the block output. Tones can be tuned in the range 0 to 8/8, with a resolution of 1/8th of the input power.
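As an illustration, an attenuator with nine settings from 0 to 8/8 can be modelled as an integer multiplication followed by a 3 bit right shift. This is only a sketch: the function name is hypothetical, and applying the 1/8 steps to the sample amplitude is an assumption, since the text expresses the resolution as a fraction of the input power.

```python
def attenuate(sample, gain_select):
    """Scale an integer sample by gain_select/8, with gain_select in 0..8."""
    if not 0 <= gain_select <= 8:
        raise ValueError("gain_select must be between 0 and 8")
    return (sample * gain_select) >> 3

# Output level of a 504-count sample for each of the nine settings.
levels = [attenuate(504, g) for g in range(9)]
print(levels)
```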
The whole design uses 164 out of 288 DSP48 blocks, 60412 out of 93120 slice registers and 43508 out of 46560 slice LUTs.
6. Prototype performance
6.1 System frequency response
The frequency response of the system was measured for the In phase and Quadrature outputs of the board in loop-back mode, i.e. with one of the board outputs connected directly to the board ADC input. For each measurement, 400 tones uniformly distributed over the system bandwidth were generated and analyzed by the embedded DDC. The amplitude of each tone is plotted in figure 12.
The expected juxtaposition of the five frequency bands of 100 MHz, each corresponding to one DAC contribution, can be observed on the plot. The maximum amplitude variation observed over the full bandwidth is less than 6 dB.
We explain the amplitude variation by several factors. A part of it is due to the active and passive electronic components, which display a certain amount of dispersion. For instance, the DAC gain has a worst case dispersion of ±3.6 %, while the DAC full scale current resistor has a dispersion of ±1 %. Then, there are also the dispersion of the resistors in the passive combiner and the balun transformer loss dispersion (not documented). Additionally, the balun transformers have a frequency dependent loss (-2 dB at 500 MHz), which partly explains the decreasing tendency of the curve. It may also be noted that the original board design was intended to use summing amplifiers to sum the five DAC signals (I and Q). Unfortunately, they were causing distortion and picking up noise from the power supplies. Consequently, they were replaced by passive combiners. This modification required the implementation of wire straps to bypass the amplifiers, which certainly induce attenuation as the frequency increases.
Besides, the DAC outputs of the bands 100-300 MHz and 400-500 MHz were not routed on the outer PCB layers (as microstrips), but in the inner FR4 layers (as striplines) and thus they have higher dielectric losses. From the dielectric manufacturer specification, a loss difference of 0.2 dB can be observed between FR4 and RO4350 microstrips.
[Figure 12 plot: amplitude (dB, axis from 120 to 128) versus frequency (MHz, axis from 0 to 500); curves: I output and Q output.]
Figure 12. Plot of the system frequency response measured for the In phase and Quadrature outputs of the board in loop-back mode, i.e. with one of the board outputs connected directly to the board ADC input. For each measurement, 400 tones uniformly distributed over the system bandwidth were generated and analyzed by the embedded DDC.
Finally, some routing choices were not optimal (bends, stubs, ...) and they certainly cause small impedance variations along the lines, which induce transmission losses as well.
Even though the fluctuation is not fully explained, it remains totally acceptable for such a bandwidth. Moreover, it can be corrected by applying a tone-by-tone power adjustment.
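To put the component tolerances quoted in this section on the same scale as the measured variation, they can be converted to dB. The short calculation below assumes that each tolerance acts as a relative change of the signal amplitude. The function name is illustrative.

```python
import math

def tolerance_to_db(percent):
    """Amplitude change in dB for a relative deviation given in percent."""
    return 20 * math.log10(1 + percent / 100)

dac_gain_db = tolerance_to_db(3.6)   # worst case DAC gain dispersion
resistor_db = tolerance_to_db(1.0)   # full scale current resistor dispersion
print(round(dac_gain_db, 3), round(resistor_db, 3))
```

Under this assumption the two tolerances correspond to roughly 0.31 dB and 0.09 dB, so they account for only a small part of the variation of up to 6 dB. This is consistent with the statement that the frequency dependent losses also contribute and that the fluctuation is not fully explained.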
6.2 System noise
As shown in [9], the main system noise contributors in the KID readout electronic chain are the RF mixing electronics and the cold amplifier. Consequently, this prototype was also tested in loop-back to measure its noise power spectrum distribution. The measurements were performed for one tone generated in the middle of each frequency band and at different output power levels. The output level was digitally adjusted with the digital gain module available in each FPGA ‘proc’ (see figure 10). The highest signal level reached by this method was just slightly above midscale for the $2^5$ gain.
For each tone and for each digital gain setting, 6000 points were recorded at 23.84 Hz and windowed with a Hann function. The resulting data were used to compute the Fast Fourier Transform (FFT), and the 6 dB loss due to the windowing function was compensated. Finally, the resulting FFT was smoothed by FFT filtering (20 bins kept).
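The 6 dB figure corresponds to the coherent gain of the Hann window, whose mean value is 0.5. The sketch below checks this for the 6000 point records and also gives the resulting frequency resolution. It assumes the periodic form of the Hann window; the text does not say which variant was used, and the difference is negligible for this record length.

```python
import math

N_POINTS = 6000   # points per record, from the text
F_S = 23.84       # recording rate in Hz, from the text

def hann(n):
    """Periodic Hann window of length n (assumed variant)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def coherent_gain_db(window):
    """Amplitude loss of a windowed tone relative to a rectangular window."""
    return 20 * math.log10(sum(window) / len(window))

loss_db = coherent_gain_db(hann(N_POINTS))
bin_width_hz = F_S / N_POINTS
print(round(loss_db, 2), round(bin_width_hz * 1e3, 2))
```

The window loss comes out at about -6.02 dB and the bin width at about 3.97 mHz.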
Figure 13 shows the system noise Power Spectrum Distribution (PSD) for one tone in each
frequency band. With the exception of tone 4, all tones have a similar Signal to Noise Ratio (SNR).
This is compatible with the board losses mentioned previously that reduce the signal amplitude by
about 6 dB.
[Figure 13 plot: PSD (dBc, axis from -130 to -90) versus frequency (Hz, logarithmic axis from $10^{-1}$ to $10^{1}$), gain $2^5$; curves: tone 0 (freq=50 MHz), tone 1 (freq=150 MHz), tone 2 (freq=250 MHz), tone 3 (freq=350 MHz), tone 4 (freq=450 MHz).]
Figure 13. Power spectrum distribution plot showing system noise for one tone in each band. With the exception of tone 4, all tones have a similar Signal to Noise Ratio (SNR).
Figure 14 shows the system noise PSD for a given tone but for different excitation signal amplitudes. The noise floor (relative to carrier) is seen to increase with each amplitude decrease. It may be noticed that when all tones are activated in a single band, it is possible to keep a digital gain between $2^1$ and $2^2$ without DAC clipping, because the random distribution of the frequency values minimizes the risk of summing all tones at their maximum amplitude at the same time. Therefore, the $2^1$ and $2^2$ gain curves provide the achievable performance when the full capabilities of the board are used.
[Figure 14 plot: PSD (dBc, axis from -130 to -90) versus frequency (Hz, logarithmic axis from $10^{-1}$ to $10^{1}$), tone 0 (freq=50 MHz); curves: gain $2^0$, $2^1$, $2^2$, $2^3$, $2^4$ and $2^5$.]
Figure 14. Power spectrum distribution for a given tone but for different signal amplitudes. The noise floor (relative to carrier) can be seen to increase with each amplitude decrease.
7. Conclusion
We have presented in this paper a first prototype of the NIKEL electronic board, which was specifically designed for the NIKA camera to be installed at the IRAM 30 m telescope at Pico Veleta, Spain. We have shown that NIKEL is able to perform real-time frequency multiplexing of an array of up to 400 MKIDs over a bandwidth of 500 MHz with outstanding performance in terms of noise. This is due to an innovative solution based on the splitting of the original 500 MHz band into five bands of 100 MHz each, thanks to state-of-the-art electronic components and sophisticated numerical filtering algorithms. The NIKEL multiplexing factor is three times larger than that of previous single board systems and it opens a clear path towards the exploitation and monitoring of future kilo-pixel arrays of MKIDs. The resulting minimization of the cable count towards the cryogenic system is a further asset. Such large arrays will without doubt be a serious alternative to standard bolometric techniques for millimeter astronomy, both because of the intrinsic quality of MKIDs (low noise and fast response) and because of their large multiplexing capabilities.
References
[1] A. Monfardini et al 2010, NIKA: A millimeter-wave kinetic inductance camera, Astronomy and
Astrophysics, Volume 521, id.A29 (2010),arXiv:1004.2209
[2] A. Monfardini et al 2011, The new NIKA: A dual-band millimeter-wave kinetic inductance camera
for the IRAM 30-meter telescope, ApJS 194 24 (2011) arXiv:1102.0870
[3] J. Baselmans, Kinetic inductance detectors, JLTP, online first (2012),
DOI:10.1007/s10909-011-0448-8
[4] A.J. Schlaerth et al, The status of MUSIC: A multicolor sub/millimeter MKID instrument, JLTP,
online first (2012), DOI:10.1007/s10909-012-0541-7
[5] P. K. Day, Nature 425, Issue 6960, pp 817-821 (2003)
[6] B. Mazin et al, 9th International Workshop on Low Temperature Detectors. AIP Conference
Proceedings, Volume 605, pp. 309-312 (2002)
[7] S. Doyle, Lumped Element Kinetic Inductance Detectors. PhD thesis, Cardiff University, United
Kingdom, 2008
[8] Swenson L. J., et al., 2009, AIPC, 1185, 84
[9] O. Bourrion et al 2011, Electronics and data acquisition demonstrator for a kinetic inductance camera,
2011 JINST 6 P06012, arXiv:1102.1314
[10] Rogers corporation web site, http://www.rogerscorp.com/
[11] P.P. Vaidyanathan, Multirate systems and filter banks, PTR Prentice Hall, Englewood Cliffs, NJ,
1993.
[12] G. Comoretto, A. Russo, G. Tuccari, A 16 channel FFT multiplexer, Arcetri technical report no.
1/2009.
[13] J. Volder, The CORDIC Trigonometric Computing Technique, IRE Transactions on Electronic
Computers, pp330-334, September 1959
