The first version Buffered Large Analog Bandwidth (BLAB1) ASIC for high
  luminosity collider and extensive radio neutrino detectors by Varner, Gary S. et al.
L. Ruckman a, G. Varner a,∗, and A. Wong a
aDepartment of Physics and Astronomy, University of Hawaii, 2505 Correa Road, Honolulu HI 96822, USA
Abstract
Future detectors for high luminosity particle identification and ultra high energy neutrino observation would
benefit from a digitizer capable of recording sensor signals with high analog bandwidth and large record depth, in
a cost-effective, compact and low-power way.
A first version of the Buffered Large Analog Bandwidth (BLAB1) ASIC has been designed based upon the lessons
learned from the development of the Large Analog Bandwidth Recorder and Digitizer with Ordered Readout
(LABRADOR) ASIC. While this LABRADOR ASIC has been very successful and forms the basis of a generation
of new, large-scale radio neutrino detectors, its limited sampling depth is a major drawback. A prototype has been
designed and fabricated with 64k deep sampling at multi-GSa/s operation. We present test results and directions
for future evolution of this sampling technique.
Key words: Super B factory, particle identification, CMOS, radio neutrino detection
PACS: 85.40.-e, 85.40.Qx, 87.66.Pm
1. Introduction
Observation of the early universe through neu-
trino messengers of the highest possible energies
requires a detector of enormous instrumented vol-
ume [1]. At the same time, lepton flavor identi-
fication of such radio detection events represents
a completely unique tool for the study of cosmo-
logical evolution of the universe. Particle interac-
tions at extreme energies provide a probe capable
of illuminating the completely unknown accelera-
tion mechanisms of the highest energy cosmic ray
events [2].
∗ Corresponding author. Tel.: +001 808-956-2987.
Email address: varner@phys.hawaii.edu (G. Varner).
Particle identification is also crucial to the
physics program of a next generation “Super”
B Factory. Such an accelerator will produce B
mesons in sufficiently copious quantities to permit
detailed scrutiny of standard model predictions in
the flavor sector [3]. Any new theories for physics
beyond the standard model must leave fingerprints
that can be detected via flavor transformation
of particles in the final state. Therefore, particle
identification is essential and the detector and
readout electronics must survive the very high
signal occupancies expected [4].
We present results from a deep-sampling ASIC
that meets these requirements, based upon exten-
sion of the successful LABRADOR ASIC [5].
Preprint submitted to Elsevier Science 3 October 2018
arXiv:0802.2278v2 [physics.ins-det] 5 May 2008
2. Architectural Details
The BLAB1 ASIC is a single-channel, multi-GSa/s waveform sampler
with a record depth of 2^16 (65,536) analog storage samples. The
BLAB1 analog input is AC coupled through an external capacitor and
terminated in 50 Ω on-chip, as expected for a high-performance RF
device. After the on-chip terminator, an analog buffer tree fans out
copies of the signal to the matrix of 128 rows of 512 samples that
compose the 64k array. Each row may be independently addressed to
initiate a storage cycle. Each Switched Capacitor Array (SCA)
storage cell contains a capacitor and a comparator. A block diagram
of the BLAB1 readout is shown in Fig. 1.
[Fig. 1 diagram labels: BLAB1 ASIC (50 Ω input, analog buffering, sampling switch, 24.9 fF, pedestal voltage, ramping voltage, comparator, TSA CTRL, MUX, x 32 outputs); FPGA (Gray code counter, CLK, 16-bit FD flip-flop x 32, ADC data).]
Fig. 1. A block diagram of the BLAB1 readout, where for
compactness the comparators are located inside the BLAB1
device, while the high-speed time encoding is done inside
a companion FPGA.
When an analog switch is pulsed closed, the in-
stantaneous input signal is stored on a 14 fF capac-
itor. The charge is then held until either overwrit-
ten or discharged due to leakage current. Each sam-
pling capacitor is connected to the negative input
of a comparator. The positive input of each com-
parator is connected to a common voltage ramp.
A wire-bonded BLAB1 die photograph is shown
in Fig. 2, with this storage array contained within
about 5.25 square mm of the die shown.
Fig. 2. A photograph of the wire-bonded BLAB1 ASIC.
The die is 3 mm by 2.8 mm and was fabricated in the
TSMC 0.25µm process.
Conversion of these stored samples is via a Wilkinson ADC method,
where the stored voltage is converted into a transition time of the
in-cell comparator by an applied voltage ramp. This ramp is
generated with a current mirror and can be adjusted by varying
either the ramping current or an external capacitor. The typical
ramping current range is 10-100 µA and the ramp capacitor size is a
few hundred pF. Encoding is performed by measuring the time interval
between the ramp start and the comparator output transition. In a
simple form of time-to-digital conversion, this interval is measured
by counting the number of high-speed clock cycles elapsed. In the
predecessor ASIC [5], the Gray code counter was implemented on-chip,
whereas in BLAB1 it is implemented inside a companion programmable
logic device, in this case a Field Programmable Gate Array (FPGA).
When the voltage ramp is started, a Gray code counter in the FPGA is
enabled synchronously with a high-speed clock (500 MHz), and the
comparator output is used to latch the counter value. Knowing the
ramp slope and the clock frequency, the latched counter value can be
converted into a voltage. A group of 32 comparators is selected, as
illustrated in Fig. 3, and read out during each ramping cycle.
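The count-to-voltage conversion described above can be sketched as follows. This is a minimal illustration assuming an ideal linear ramp (constant current into a fixed capacitor); the function names and the example values of 50 µA and 200 pF are illustrative choices within the ranges quoted in the text, not the authors' firmware or settings.

```python
# Minimal sketch of Wilkinson (ramp-compare) decoding.
# Assumption: ideal linear ramp; names and example values are hypothetical.

def ramp_slope_v_per_s(ramp_current_a, ramp_cap_f):
    """Slope of a ramp produced by a constant current charging a capacitor."""
    return ramp_current_a / ramp_cap_f

def counts_to_volts(count, slope_v_per_s, clock_hz, ramp_start_v=0.0):
    """Ramp voltage at the moment the comparator latched the counter."""
    elapsed_s = count / clock_hz
    return ramp_start_v + slope_v_per_s * elapsed_s

slope = ramp_slope_v_per_s(50e-6, 200e-12)    # 50 uA into 200 pF
volts = counts_to_volts(1000, slope, 500e6)   # 1000 counts of a 500 MHz clock
print(f"slope = {slope / 1e6:.2f} V/us, decoded voltage = {volts:.3f} V")
```

With these example values the ramp rises at 0.25 V/µs, so 1000 counts of the 500 MHz clock correspond to 0.5 V above the ramp start.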
[Fig. 3 diagram labels: analog input, analog fan-out tree, TSA strobe, TSA row select, readout row select, column select, 128 rows, 512 samples (columns) per row, RAMP, output MUX, 32 outputs to FPGA.]
Fig. 3. Schematic of the BLAB1 sampling array.
By addressing a row and selecting a group of 32 columns for each
conversion cycle, the window of interest inside the ASIC is read
out. Importantly, this readout can proceed while sampling continues,
providing continuous pipelining and a corresponding reduction in
deadtime. The decision to move the high-speed clock and registers
off-chip also means that each storage cell can be made significantly
smaller. A schematic of the base BLAB1 storage cell is shown in
Fig. 4, where the comparator is simply a differential NMOS pair. The
corresponding layout is shown in Fig. 5; its overall dimensions are
40λ by 139λ, with λ = 0.12 µm. This corresponds to 4.8 µm by
16.68 µm, or about 80 µm² per storage cell.
Fig. 4. Schematic of the BLAB1 basic storage cell.
Fig. 5. Layout of a single SCA storage cell, where the dimensions
displayed are in units of λ, which is 0.12 µm.
Therefore the core of the sampling array requires only 5.25 mm² of
chip area, permitting more than an order of magnitude improvement in
storage density compared with existing devices [5,6,7,8].
Reducing the cell size, and with it the storage capacitance, also
helps improve the bandwidth that can be coupled into each storage
cell. Since the “on” resistance of the switch is relatively high
($R_{on} \simeq 5$ kΩ), frequencies above
$$ f_{3dB} = \frac{1}{2\pi R_{on} C_{pix}} \tag{1} $$
will roll off for a given pixel capacitance $C_{pix}$. The extracted
capacitance value for the layout in Fig. 5 is approximately 14 fF.
Therefore the expected $f_{3dB}$ from the common input bus line into
each storage cell is approximately 2.3 GHz.
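As a quick numerical check of Eq. (1), the single-pole RC corner can be evaluated with the switch resistance and cell capacitance quoted above. The function name is illustrative only.

```python
import math

def f3db_hz(resistance_ohm, capacitance_f):
    """-3 dB corner frequency of a single-pole RC low-pass filter."""
    return 1.0 / (2.0 * math.pi * resistance_ohm * capacitance_f)

# Values quoted in the text: switch on-resistance ~5 kOhm, cell capacitance ~14 fF.
cell_bw_hz = f3db_hz(5e3, 14e-15)
print(f"storage cell f3dB = {cell_bw_hz / 1e9:.2f} GHz")
```

The result, about 2.27 GHz, agrees with the approximately 2.3 GHz quoted in the text.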
We note that the size of the storage cell can be
reduced further by removing individual sample de-
lay timing chains from each storage row. As seen
in the bottom of Fig. 5, this inverter pair is more
than half the area of the storage cell. For power
dissipation reasons, this removal turns out to be
important, as will be discussed later.
A further benefit of decoupling the latching register and clocking
functionality is that the conversion clock can be run at a much
higher speed inside the FPGA, since FPGAs are routinely fabricated
in a 65 nm or 90 nm process, compared with the relatively coarse
250 nm (0.25 µm) process of BLAB1. With the Xilinx Virtex family
device employed, we are typically able to use a 500 MHz clock and
record the phase of the clock as well, thereby obtaining an
effective 1 ns least count. Separate testing indicates that this TDC
performs very close to the ideal binary interpolation limit of
(1 ns)/√12 (∼300 ps), as reported previously [9]. Moreover, the
number of bits of resolution is completely configurable, which
permits a trade-off between readout latency and sample resolution
for various applications. We note in passing that a potentially much
better method would apply this same waveform sampling technique to
the timing encoding of the comparator output. The TDC least count
would then improve from 1 ns to 170 ps and, by fitting the output
shape, a time resolution better than that of binary encoding may be
possible.
While the coupling into individual storage cells can support high
analog bandwidth (≥ 2 GHz), the cumulative capacitance seen when
trying to drive the array of 64k cells is very problematic. The
extracted capacitance of each of the 2^16 switch drains is about
1.5 fF, which sums to a total array capacitance $C_{array}$ of
98 pF. Clearly, for a reasonably low input coupling impedance of
$Z_{in}$ = 50 Ω, the resulting bandwidth limitation to
$$ f_{3dB} = \frac{1}{2\pi Z_{in} C_{array}} = 32.5~\mathrm{MHz} \tag{2} $$
would be completely unacceptable. Therefore a 3-level buffer tree
has been employed to reduce the loading seen at each stage of signal
fan-out. The unity-gain bandwidth of these buffer amplifiers at zero
load capacitance is in excess of 1 GHz. In retrospect, the choice of
fan-out, 1 → 16 → 128, was not optimal, as the capacitance of the
intermediate stage was rather high and limits the performance, as
will be shown in the testing section.
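The loading argument behind Eq. (2) can be reproduced with a few lines of arithmetic. The sketch below sums the quoted 1.5 fF drain capacitance over all cells and evaluates the corner frequency for a direct 50 Ω drive. The per-row load is included only as an illustration and assumes that each of the 128 final-stage buffers drives exactly one row of 512 switch drains, which is an inference from the 1 → 16 → 128 fan-out rather than a figure stated in the text.

```python
import math

N_CELLS = 2 ** 16        # storage cells in the array
CELLS_PER_ROW = 512      # cells on one row
C_DRAIN_F = 1.5e-15      # extracted switch-drain capacitance per cell (from the text)
Z_IN_OHM = 50.0          # input coupling impedance (from the text)

def f3db_hz(resistance_ohm, capacitance_f):
    """-3 dB corner frequency of a single-pole RC low-pass filter."""
    return 1.0 / (2.0 * math.pi * resistance_ohm * capacitance_f)

c_array_f = N_CELLS * C_DRAIN_F               # whole array driven directly
f_direct_hz = f3db_hz(Z_IN_OHM, c_array_f)
c_row_f = CELLS_PER_ROW * C_DRAIN_F           # assumption: one buffer per row

print(f"C_array = {c_array_f * 1e12:.1f} pF, direct-drive f3dB = {f_direct_hz / 1e6:.1f} MHz")
print(f"switch-drain load on one row = {c_row_f * 1e12:.2f} pF")
```

Using the unrounded 98.3 pF gives about 32.4 MHz, consistent with the 32.5 MHz obtained in Eq. (2) from the rounded 98 pF.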
The sampling speed is controlled by adjusting
the VDD/VSS supply voltages of one of the two
inverter-inverter delay stages between each adja-
cent sampling cell in a particular sampling row.
As mentioned earlier, by addressing a row and pulsing the first cell
of that particular row, a write strobe is launched that propagates
along the row until it reaches the last cell. The leading edge of
the pulse closes the switch and the trailing edge opens the switch,
at which point the analog voltage value is stored.
Upon the determination of an external trigger condition, further
sampling to the row(s) of interest is blocked in firmware and a
ramping voltage is generated using a constant current source and
reference capacitor, as mentioned earlier. The ramping voltage for
the BLAB1 can be generated using either an external capacitor or an
on-chip capacitor. An external capacitor is necessary for slower
ramping speeds. The current source is set by an external resistor. A
unique feature of the BLAB1’s digitization technique is that the ADC
resolution does not have a fixed value. For a fixed clock frequency,
reducing the ramp speed increases the ADC resolution, at the cost of
a longer digitization time.
BLAB1 was designed to be a low power ADC.
Three voltage sources are required to operate the
BLAB1. A voltage source of 2.5 volts is the main
power source. An adjustable VDD source is used
to control the sampling speed. A pedestal voltage,
typically 1.3 volts, is used to set the DC offset of
the RF input. When in quiescent mode, the power draw can be 10 mW
or less. The key BLAB1 specifications are summarized in Table 1.
Table 1
Important BLAB1 ASIC Specifications.
Item                        Value
Sampling input channels     1
Storage rows                128
Storage cells/row           512
Total storage cells         65,536
Sampling speed              0.1 - 6.0 GSa/s
Storage record              10.9 - 655 µs
Wilkinson outputs           32
Operation mode              continuous storage/readout
100 ns window readout       80 µs (5.12 GSa/s, 12 bits)
Full chip readout           ∼10 ms (12 bits)
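The storage record range in Table 1 follows directly from the array depth divided by the sampling rate. A short sketch, with an illustrative function name:

```python
TOTAL_CELLS = 128 * 512   # rows x cells per row = 65,536 samples

def record_length_us(rate_gsa_per_s):
    """Time span covered by the full array at a given sampling rate."""
    return TOTAL_CELLS / (rate_gsa_per_s * 1e9) * 1e6

for rate in (6.0, 0.1):
    print(f"{rate:.1f} GSa/s -> {record_length_us(rate):.1f} us")
```

At the two ends of the sampling speed range this reproduces the 10.9 µs and 655 µs entries of the table.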
3. Readout Test System
A series of printed circuit boards has been fabricated to evaluate
various aspects of BLAB1 performance. Beyond this, these evaluation
devices are proving useful for instrumenting a next generation of
Cherenkov radiation detectors [10]. A photograph of a circuit board
carrying two BLAB1 ASICs (for precision differential timing
evaluation) is shown in Fig. 6.
The three main components on this circuit board
are two BLAB1 chips, an FPGA (largest pack-
age in center), and a Universal Serial Bus (USB)
interface. The external communication protocol
is USB 2.0. A USB microcontroller, the Cypress
CY7C68013-56PVC, located on the circuit board
interprets the USB 2.0 protocol and controls the
flow of data being sent and received from the
FPGA to a computer interface. The FPGA used
is a Xilinx XC3S400 and controls the digital logic
and timing for the BLAB1 readout. An internal
FPGA RAM buffers the data while the data is be-
ing dumped into the USB data stream. A custom
readout and control software utility was developed
using the wxWidgets tool kit [11], a screen shot of
which is shown in Fig. 7.
In this configuration, it becomes apparent that
this BLAB1 “oscilloscope on a chip” can, with this
small readout board, turn any PC (or laptop) into
a high-performance digital signal oscilloscope. This
software package sends commands to the FPGA
and records the BLAB1 data via the USB 2.0 inter-
face. Running this utility on a standard PC, a sus-
tained triggered event rate of approximately 7kHz
(single row readout) has been demonstrated. This
rate should not be considered a hard limit as nei-
ther the software nor the firmware was optimized
for speed. The sampling rate is controlled by set-
ting a DAC, which then adjusts the VDD voltage
(ROVDD) of the on-chip voltage-controlled delays.
4. Basic Sampler Performance
Employing the test system described in the previous section and its
variants, a number of the basic performance parameters of the BLAB1
have been evaluated. Because timing performance is such a critical
feature of this sampling device, it is described in detail in a
subsequent section.
[Fig. 6 photograph labels: USB2 interface, BLAB1 ASICs.]
Fig. 6. Photograph of the BLAB1 differential timing performance
evaluation circuit board, with key components indicated.
4.1. Sampling speed
Fig. 7. Screen capture of the acquisition/control program.
Determination of the sampling speed is made by measuring the time
interval between insertion of the timing strobe and appearance of
the output pulse from the last cell of the row, minus pad buffer
delays. The sampling speed is calculated by dividing the number of
cells in a row by the propagation time for a given control voltage
setting. A plot of the sampling speed versus control voltage (ROVDD)
is shown in Fig. 8, where it is seen that sampling rates from below
1.0 GSa/s to above 6.0 GSa/s are possible.
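The sampling speed determination described above amounts to a single division. In the sketch below the measured strobe-to-output interval (90 ns) and the pad buffer delay (2 ns) are hypothetical numbers chosen only to illustrate the calculation; they are not measurements from the paper.

```python
CELLS_PER_ROW = 512

def sampling_rate_gsa_per_s(strobe_to_output_ns, pad_delay_ns, cells=CELLS_PER_ROW):
    """Cells per row divided by the propagation time along the row.

    Cells per nanosecond is numerically equal to GSa/s.
    """
    propagation_ns = strobe_to_output_ns - pad_delay_ns
    return cells / propagation_ns

# Hypothetical measurement: 90 ns strobe-to-output, of which 2 ns is pad buffering.
rate = sampling_rate_gsa_per_s(90.0, 2.0)
print(f"sampling rate = {rate:.2f} GSa/s")
```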
[Fig. 8 plot "ROVDD versus ADC sampling speed": x-axis ROVDD (volts), 1.2 to 2.8; y-axis sampling speed (GSa/s), 0 to 7.]
Fig. 8. Sampling rate as a function of the ROVDD control
voltage, where extended operation (> 2.5V ) is possible.
One potential disadvantage of this voltage-controlled delay
technique is that the circuit is temperature dependent. This
dependence is seen in Fig. 9; it is roughly 0.2%/°C and matches the
expectation from SPICE simulation. While for many applications this
variation would not be significant, and can potentially be
calibrated out with an external reference clock [5], the delay can
also be monitored and stabilized using a firmware control loop.
[Fig. 9 plot annotations: 0.2%/°C; 6 GSa/s; sample aperture (172 ps = 5.8 GSa/s); matches SPICE simulation.]
Fig. 9. Temperature dependence of the sampling rate.
4.2. Noise performance
Noise distributions were measured for all storage cells in the
process of determining the pedestal values. These measurements are
made by terminating the BLAB1 analog input and reading each cell
multiple times. An example of the noise distribution for a typical
storage cell is shown in Fig. 10, which is representative of the
ensemble mean noise of about 1 mV RMS. With an input dynamic range
of greater than 1 Volt (1.5 V nom.) and this average noise level,
each stored sample represents about 10 real ADC bits of resolution,
which is very competitive with commercially available, large
power-dissipation ADCs [12].
[Fig. 10 histogram "Cell Noise": 10000 entries, mean -0.002435 mV, RMS 0.9714 mV; Gaussian fit χ²/ndf = 57.31/29, constant 1151 ± 14.1, mean -0.02482 ± 0.00971 mV, sigma 0.9649 ± 0.0069 mV; x-axis voltage (mV), -6 to 6.]
Fig. 10. A representative storage cell noise distribution, where a
Gaussian fit yields a noise level of about 1 mV RMS.
For comparison, the expected RMS noise due to the small charge
quantization is
$$ v_{RMS} = \sqrt{\frac{kT}{C_{pix}}} \tag{3} $$
where k is Boltzmann’s constant and we take T to be 300 K. Plugging
in the $C_{pix}$ from above, we expect a contribution due to this
“kTC” noise of
$$ \sqrt{\frac{25.9~\mathrm{meV} \cdot 1.6\times10^{-19}~\mathrm{C/e}}{14~\mathrm{fF}}} = 0.54~\mathrm{mV} \tag{4} $$
which, subtracted in quadrature from the measured 1 mV, indicates
that the excess ASIC and board level noise is approximately 0.84 mV,
and could perhaps be improved through better layout.
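The noise budget above can be checked numerically. The sketch below evaluates the kTC term of Eqs. (3) and (4), removes it in quadrature from the roughly 1 mV measured noise, and expresses the 1 V dynamic range over 1 mV noise as a number of bits. Variable names are illustrative.

```python
import math

K_BOLTZMANN = 1.380649e-23   # J/K

def ktc_noise_v(capacitance_f, temperature_k=300.0):
    """RMS thermal (kTC) noise voltage sampled onto a capacitor."""
    return math.sqrt(K_BOLTZMANN * temperature_k / capacitance_f)

measured_mv = 1.0                                   # measured RMS noise (from the text)
ktc_mv = ktc_noise_v(14e-15) * 1e3                  # 14 fF storage capacitor
excess_mv = math.sqrt(measured_mv**2 - ktc_mv**2)   # quadrature subtraction
bits = math.log2(1.0 / 1e-3)                        # 1 V range over 1 mV noise

print(f"kTC noise    = {ktc_mv:.2f} mV")
print(f"excess noise = {excess_mv:.2f} mV")
print(f"resolution   = {bits:.1f} bits")
```

This gives a kTC contribution of about 0.54 mV, an excess of about 0.84 mV, and just under 10 bits, in line with the values quoted in this subsection.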
4.3. Analog bandwidth
A determination of the analog frequency response of the BLAB1 ASIC
was performed by recording fixed-amplitude sine waves of varying
frequencies and taking the ratio of the recorded amplitude to the
actual amplitude. The amplitude roll-off versus frequency is shown
in Fig. 11, where the -3 dB attenuation point is about 300 MHz, and
the -10 dB point extends beyond 600 MHz.
To illustrate how this performance translates into the ability to
sample an RF sine wave, consider the uncalibrated waveform of
Fig. 12. As the noise is small, deviations from a smooth curve give
an indication of the level of calibration required in the following
precision timing section.
4.4. Leakage current
[Fig. 11 plot annotation: -3 dB at ∼300 MHz.]
Fig. 11. Measured frequency response of the BLAB1.
[Fig. 12 plot annotation: 200 MHz sine wave, 5.9 GSa/s.]
Fig. 12. Reference 125MHz sine wave recorded with the BLAB1.
Because leakage current is a concern for long storage times, and the
array contains a large number of samples, which potentially take a
long period to read out completely, this issue was studied
extensively. A measurement of the leakage current for all 64k
sampling capacitors was performed. This measurement was done by
terminating the BLAB1 analog input and reading out each cell
repeatedly, without a write update, for 20 seconds. A summary
histogram of the leakage current determined for all storage cells
from a fit to each leakage current slope is plotted in Fig. 13.
Fig. 13. Leakage current histogram for all cells of a given BLAB1
device, where the mean leakage current is slightly under 3fA.
For reference, these values are in quite good agreement with leakage
currents measured previously by our group for a similar TSMC CMOS
process in different fabrication runs [13,14]. If the effect of this
leakage current is to be reduced to a level comparable with the
noise, the following condition must be met:
$$ 1~\mathrm{mV} = \Delta V = \frac{\Delta Q}{C_{pix}} = \frac{I_{leak} \cdot \Delta T}{C_{pix}} \tag{5} $$
where $\Delta T$ is the maximum storage to readout interval and
$C_{pix}$ is the pixel storage capacitance. Using a leakage current
of $I_{leak}$ = 25 fA, which is conservatively larger than that of
almost all storage cells, the maximum readout latency is thus
$$ \Delta T = \frac{1~\mathrm{mV} \cdot 14~\mathrm{fF}}{25~\mathrm{fC/s}} = 560~\mu\mathrm{s} \tag{6} $$
and is discussed in the readout speed subsection next. In general,
deep storage is needed for trigger latency buffering and only a far
smaller window of interest need be read out.
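Eqs. (5) and (6) rearrange to a one-line estimate of how long a sample may be held before leakage droop reaches a chosen voltage budget. A minimal sketch, with an illustrative function name:

```python
def max_hold_time_s(droop_budget_v, capacitance_f, leakage_a):
    """Longest storage interval before leakage shifts the sample by the budget."""
    return droop_budget_v * capacitance_f / leakage_a

# Values from the text: 1 mV budget, 14 fF capacitor, conservative 25 fA leakage.
hold_s = max_hold_time_s(1e-3, 14e-15, 25e-15)
print(f"maximum storage-to-readout interval = {hold_s * 1e6:.0f} us")
```

The printed value is 560 µs, matching Eq. (6).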
It had been posited that the more extreme leak-
age current values might correlate with the co-
location of other logic or structures at the sampling
array periphery. This conjecture is tested and re-
jected in Fig. 14, where the measured leakage cur-
rent for each cell is plotted by array location. No
obvious pattern is seen, and the values are consis-
tent with being completely random.
Fig. 14. Array summary plot of the leakage current for all 64k
pixels, where vertical is row number, horizontal is sample number
and color code is in units of femto-Amperes. No pattern is observed
in the layout.
4.5. Readout Speed
As mentioned earlier, there is flexibility in the choice of the
resolution versus speed trade-off. If the entire array is to be read
out, the conversion cycle duration may be expressed as
$$ T_{cycle} = T_{conv} + T_{switch} \tag{7} $$
where $T_{switch}$ is the fixed latency (typically 50 ns) associated
with resetting the voltage ramp and changing addresses, and
$T_{conv}$ is the interval required for the conversion to n bits,
given by the expression
$$ T_{conv} = 2^{n} \cdot (1~\mathrm{ns}) \tag{8} $$
for the 500 MHz, dual-phase clock reference used in our
measurements.
[Fig. 15 plot "Complete BLAB1 Readout": x-axis number of ADC bits, 0 to 12; y-axis time (ms), logarithmic, 0.1 to 10.]
Fig. 15. Time required to read out the entire BLAB1 ASIC
as a function of the number of bits of resolution.
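The full-array readout time can be estimated from Eqs. (7) and (8) together with the 32 cells converted per ramp cycle described in Section 2. The sketch below is a simple model that ignores any data transfer overhead beyond the 50 ns switching latency; names are illustrative.

```python
TOTAL_CELLS = 65536
CELLS_PER_CYCLE = 32     # comparators read out per ramp cycle
T_SWITCH_NS = 50         # ramp reset / address change latency
TICK_NS = 1              # effective least count of the dual-phase 500 MHz clock

def conversion_cycle_ns(n_bits):
    """T_cycle = T_conv + T_switch, with T_conv = 2^n ticks."""
    return (2 ** n_bits) * TICK_NS + T_SWITCH_NS

def full_readout_ms(n_bits):
    """Time to convert every cell in the array at n-bit resolution."""
    cycles = TOTAL_CELLS // CELLS_PER_CYCLE
    return cycles * conversion_cycle_ns(n_bits) / 1e6

for n in (8, 10, 12):
    print(f"{n:2d} bits -> {full_readout_ms(n):.2f} ms")
```

For 12 bits this model gives about 8.5 ms, of the same order as the ∼10 ms full chip readout listed in Table 1.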
As mentioned previously, full chip readout is a rather extreme case.
For a detector of the size of a typical high energy physics
experiment, for “fast timing” signals, something like 100 ns is the
largest window required. Even for a multi-km scale radio neutrino
detector, the aperture of interest would still only be in the µs
range, corresponding to less than 10% of the array, for which the
readout latency would be less than a millisecond. For a 100 Hz radio
trigger, or a 30 kHz collision trigger, the deadtime is negligible
for pipelined operation.
4.6. Power Dissipation
During sampling, the power dissipation can be as low as
$$ P = I \cdot \Delta V = \frac{\delta Q}{\delta t} \cdot (2.5~\mathrm{V}) \simeq 15~\mathrm{mW} \tag{9} $$
where $\delta Q$ is the inverter transition charge and $\delta t$ is
86 ps at the nominal 5.8 GSa/s sampling. During sampling all of the
other biases may be disabled.
Quite unexpectedly, it was observed that lowering ∆V in the delay
chain (running more slowly) dissipated more power, the opposite of
what the expression above would indicate. Below 2 V, significantly
more power was drawn. Returning to SPICE, it was found that as
ROVDD is lowered, the leakage current of the inverters indeed
becomes important, in particular because of the decision to give
each storage cell its own inverter pair. That ∼6 × 10^4 multiplier
proved to be a huge factor and precluded sustained low-speed
sampling due to enormous power dissipation. Data and simulation
agree qualitatively, though at large current draws it is likely that
the voltage drop across the finite resistance of the die power
wiring becomes important (it is ignored in simulation).
During readout, the current surges and the die
subsequently heats substantially due to an over-
sight in the original design. While the comparator
bias currents can be shut down during sampling,
when conversion is required, they must all be op-
erated. Again a large multiplier (64k comparators)
applies, and even a 10µA comparator bias leads to
a 0.65A surge. This is addressed in future designs.
[Fig. 16 plot legend: SPICE simulation, measurement.]
Fig. 16. Simulated versus measured current draw of the entire array
of voltage-controlled delay inverters.
4.7. Concurrent Operation
A key feature of the BLAB1 architecture is the ability to operate in
a multi-hit buffer mode, to effectively reduce the deadtime to
negligible levels. Concurrent readout while continuing to sample can
have a deleterious impact on the quality of stored samples.
Therefore we have performed a noise scan where the delay time of
storage in Row 2 (adjacent row) is varied while Row 1 recording
continues. The result appears in Fig. 17, where a small amount of
cross-talk is observed right around the comparator transition time
for Row 1. The effect is tiny (≈ 1 mV) and can be neglected.
[Fig. 17 plot labels: Row 1, Row 2.]
Fig. 17. Observed noise in storage Channel 2 when simultaneous
readout is performed in Channel 1.
As an example of the potential benefit, for a future 16 channel
BLAB2 ASIC, where a 32 ns window (320 samples at 10 GSa/s) is
recorded from each channel upon receipt of a Level 1 trigger, the
net conversion time to 10 bits is roughly 160 µs if all these
samples are read out. However, with an expected hit occupancy in the
window for each ASIC (monitored by the trigger out signal) of about
3.2%, the mean latency for readout is 5.12 µs. For a 30 kHz maximum
trigger rate, this is a 15.4% deadtime, though with large
fluctuations. With an 8-deep hold buffer for each channel (100 ns
wide), the probability of an overflow becomes a negligible 5 × 10^-8.
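The arithmetic behind this BLAB2 example can be laid out step by step. The sketch assumes, as for BLAB1, that 32 samples are converted per ramp cycle with a 1 ns least count and that switching latency is neglected; these are assumptions about a future device made only to reproduce the quoted numbers. The overflow probability for the 8-deep buffer depends on a queueing model not given in the text and is not reproduced here.

```python
CHANNELS = 16
SAMPLES_PER_CHANNEL = 320      # 32 ns window at 10 GSa/s
CELLS_PER_CYCLE = 32           # assumption: same as BLAB1
N_BITS = 10
TICK_NS = 1.0                  # assumption: same 1 ns least count as BLAB1

cycles = CHANNELS * SAMPLES_PER_CHANNEL // CELLS_PER_CYCLE
net_conversion_us = cycles * (2 ** N_BITS) * TICK_NS / 1000.0   # "roughly 160 us"

def deadtime_fraction(mean_latency_s, trigger_rate_hz):
    """Simple busy-time estimate: mean readout latency times trigger rate."""
    return mean_latency_s * trigger_rate_hz

occupancy = 0.032
mean_latency_us = occupancy * 160.0            # using the rounded 160 us from the text
deadtime = deadtime_fraction(mean_latency_us * 1e-6, 30e3)

print(f"conversion cycles   = {cycles}")
print(f"net conversion time = {net_conversion_us:.1f} us")
print(f"mean latency        = {mean_latency_us:.2f} us")
print(f"deadtime at 30 kHz  = {deadtime * 100:.1f} %")
```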
5. Precision Timing Performance
Recent developments in high-density, high pre-
cision timing photodetectors are finding appli-
cations in Cherenkov detection techniques for
particle identification, as well as medical imaging
applications. To fully exploit the potential of these
devices, robust performance, fine resolution tim-
ing and highly integrated readout electronics are
needed. Over the decades a number of electron-
ics techniques have been explored to maximize
the timing performance of photodetector signals.
These include Constant Fraction Discrimination,
multi-level thresholding, charge integration for
threshold timewalk correction, among a long list
too extensive to adequately summarize here.
However, all of these techniques suffer from a
number of practical limitations in actual applica-
tion, which has served to degrade the realized per-
formance. In the end, one simply cannot do bet-
ter than having a high-fidelity “oscilloscope on a
chip” for every sensor channel. Cost and data vol-
ume precluded this type of waveform recording un-
til recent generations of SCA ASICs [5,6,7] demon-
strated such techniques were practical, especially
for large systems.
We present here some preliminary results of tim-
ing resolution tests with this BLAB1 ASIC. As
these devices are distributed to interested users
around the world, and more clever algorithms for
improved timing performance are considered, fur-
ther improvements on already promising results
may be obtained.
5.1. Calibration
In order to address bin-by-bin timing width
differences, a couple of different calibration tech-
niques have been tried. The first utilizes a sine
wave zero-crossing technique used for calibrating
the LAB3 ASIC[5]. That technique works best
when the frequency of the sine wave is such that
the measured interval between zero crossings can
be uniquely assigned to a limited number of bins
between successive crossings.
[Fig. 18 plot "Two CH Timing": x-axis pixel position, 0 to 500; y-axis bin width (ps), 160 to 180.]
Fig. 18. Residual bin-by-bin sample timing aperture devi-
ations from a nominal bin width using the histogram oc-
cupancy technique described in the text.
Due to intrinsic curvature limitations, this technique has an
irreducible systematic error that is a function of sample rate. A
more successful technique is to histogram the zero crossings of a
sine wave and use the bin occupancy to derive the effective aperture
width; the residual distribution is shown in Fig. 18. The most
striking aspect of this distribution is the linear slope across the
array. Applying only this linear slope correction leads to a 15 ps
RMS jitter in the determination of zero crossings for a subsequent
sine wave data set, as seen in the inset distribution in Fig. 19.
Applying a full bin-by-bin correction improves the distribution to
11 ps RMS, with about an 8 ps core.
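The idea behind the histogram occupancy technique can be illustrated with a toy model. If the sine wave is not phase-locked to the sampling strobe, zero crossings land uniformly in time, so the fraction of crossings falling in a given bin is proportional to that bin's aperture width. The sketch below is not the authors' calibration code; the function names, the five-bin example and the uniform-arrival model are all assumptions made for illustration.

```python
import bisect
import random

def widths_from_occupancy(counts, row_time_ps):
    """Estimate each bin's width as its share of crossings times the total span."""
    total = sum(counts)
    return [row_time_ps * c / total for c in counts]

def simulate_counts(true_widths_ps, n_crossings, seed=1):
    """Toy data: crossings uniform in time, so wider bins collect more of them."""
    rng = random.Random(seed)
    upper_edges = []
    elapsed = 0.0
    for width in true_widths_ps:
        elapsed += width
        upper_edges.append(elapsed)
    counts = [0] * len(true_widths_ps)
    for _ in range(n_crossings):
        t = rng.uniform(0.0, elapsed)
        index = min(bisect.bisect_right(upper_edges, t), len(counts) - 1)
        counts[index] += 1
    return counts

# Hypothetical row of five bins with a linear slope in width, as in Fig. 18.
true_widths = [168.0, 170.0, 172.0, 174.0, 176.0]
counts = simulate_counts(true_widths, 200_000)
estimated = widths_from_occupancy(counts, sum(true_widths))
for true_w, est_w in zip(true_widths, estimated):
    print(f"true {true_w:.1f} ps, estimated {est_w:.1f} ps")
```

The statistical precision of each estimated width improves with the number of crossings recorded, which is why a large sine wave data set is useful for this kind of calibration.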
5.2. Bench Test Signals
Timing performance was then evaluated using a pair of pulses
separated by approximately 30 ns. As seen in Fig. 20, over this
longer timebase separation, a differential error of 27 ps is
obtained. The contribution of each edge is then estimated as
27 ps/√2, or about 20 ps per recorded edge.
[Fig. 19 plot annotations: 6 GSa/s, 400 MHz sine wave; x-axis extracted period (ns); main plot after basic linearity and bin-by-bin correction, ∼11 ps intrinsic (∼8 ps core); inset with linearity correction only, 15 ps.]
Fig. 19. Results for extracting the zero-crossing timing of a
400 MHz sine wave after the application of the histogram occupancy
timing corrections. Inset is the result for a simple linear (slope)
correction, and the main plot after a bin-by-bin correction.
[Fig. 20 plot "Bench Test timing" annotations: 6 GSa/s, ∼30 ns pulse pair; ∼27 ps for two edges, ∼20 ps for each edge.]
Fig. 20. Timing resolution for a pair of pulses separated by
approximately 30 ns. Each edge can be inferred to be extracted a
factor of √2 better.
For complex curvature along the leading edge of the signal, the
timing resolution obtained is seen to be rather sensitive to the
method chosen to characterize the signal “hit” time. Unless the
photodetector signal is for a single photoelectron (p.e.), the
actual shape can be rather complex and dependent upon photon arrival
statistics. Even in this simple case, noise and aperture systematics
on the leading edge can be important, and can be reduced by using
multiple samples to fit to an analytic signal shape. In general, the
estimate error can improve roughly as 1/√N for N samples along the
leading edge. This is perhaps the most powerful aspect of having the
full waveform samples to fit: individual sampling errors can be
averaged out. Examples are provided in the following subsection,
where it is clear that at the sampling rates being studied, this
waveform recording technique logs many samples on the leading edge,
which can be used to improve the signal timing extraction.
5.3. PMT signal observation
A convenient feature of the BLAB1 ASIC is that
a PMT output transmitted over a 50Ω coaxial ca-
ble can be directly connected to the BLAB1 input,
as per the diagram of Fig. 1. Two example pho-
todetector outputs, intended for fast-timing appli-
cations, are recorded in Fig. 21.
Fig. 21. Example waveforms recorded with a Hamamatsu
R6680 fine-mesh PMT (top) and Burle 85011 Micro-Chan-
nel Plate PMT (bottom).
Both photodetectors specialize in fine time resolution and a direct
comparison is informative. In the upper figure, the observed signal
is an aggregate of a number of scintillation photons collected from
a bar scintillator described in the next subsection. At bottom is
the risetime of a Micro-Channel Plate photodetector (MCP-PMT),
intended for precise single photon detection. For future sub-10 ps
devices, the transit-time spread in the single p.e. amplification
process may limit the ultimate resolution.
Finally, affordable fast electronics may be able, on a
channel-by-channel basis, to measure systematic variations and
provide the requisite compensating corrections to achieve the
ultimate resolution.
5.4. Belle TOF Counter
In order to evaluate the waveform sampler per-
formance with a realistic set of pulses, we use cos-
mic muons incident on a spare TOF counter of the
Belle detector [15]. The test set-up is illustrated in
Fig. 22, and is located in the University of Hawaii
Instrumentation Development Laboratory.
[Fig. 22 diagram labels: PMTs at both ends of the TOF counter and on the trigger counters; delay cables; BLAB1 evaluation board (CH1, CH2); NIM discriminator; NIM coincidence; trigger; dimensions 21.5 cm, 255 cm, 130 cm.]
Fig. 22. Schematic of the cosmic test system with a Belle
TOF counter and trigger counters.
A sample of a few thousand cosmic ray muons was recorded using the
test configuration shown in Fig. 22. PMT signals from both ends of
the Bicron BC408 plastic scintillator bar are recorded. The bar is
4 cm thick, 255 cm long and viewed by Hamamatsu R6680 fine-mesh PMTs
at each end. The Cherenkov trigger telescope counters consist of
lucite slabs (approx. 5 cm x 6 cm x 3.5 cm), also viewed by
prototype R6680 fine-mesh PMTs.
To estimate the expected system performance, we recorded the trigger
counter signals and extracted an intrinsic error on the
determination of the trigger time by comparing the observed time
difference in the two trigger counters. This jitter, as shown in
Fig. 23, is quite large and should be improved in the future.
[Fig. 23 plot: histogram "Trigger Timing" (Cosmic Data), horizontal axis time (ns). Entries 2557, Mean 4.52, RMS 0.6523. Double-Gaussian fit, χ²/ndf = 31.29/25. Narrow component: Constant 195.8 ± 20.0, Mean 4.518 ± 0.021, Sigma 0.3407 ± 0.0296. Wide component: Constant 143.8 ± 18.5, Mean 4.52 ± 0.02, Sigma 0.7615 ± 0.0268.]
Fig. 23. Timing results obtained for the trigger counter
time difference.
The contribution of the narrow Gaussian can be
subtracted in quadrature from the width of the time
difference observed between the two ends of the Belle TOF
counter, the distribution of which is shown in Fig. 24.
This common-mode subtraction leads to a resolution of
about 190ps per PMT end. This is comparable to the
single-end time resolutions of about 150ps (MC) and
170ps (data) reported in a detailed Monte Carlo
study [16].
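The quoted per-end resolution can be checked against the fit widths in Figs. 23 and 24. The sketch below is one reading of the subtraction, not necessarily the exact procedure used: it assumes that the narrow trigger width (0.3407 ns) is removed in quadrature from the TOF time-difference width (0.4337 ns), and that the two PMT ends contribute equally and independently, so that the single-end value is the remainder divided by √2. The function and variable names are illustrative only.

```python
import math

# Fit widths quoted in the figures (ns).
SIGMA_TOF_DIFF_NS = 0.4337        # Fig. 24: time difference between the two bar ends
SIGMA_TRIGGER_NARROW_NS = 0.3407  # Fig. 23: narrow Gaussian of the trigger time difference


def subtract_in_quadrature(total: float, component: float) -> float:
    """Remove an independent Gaussian contribution from a total width."""
    if component > total:
        raise ValueError("component width exceeds total width")
    return math.sqrt(total**2 - component**2)


def per_end_resolution_ns(sigma_diff: float, sigma_common: float) -> float:
    """Assumption: both PMT ends contribute equally, so divide by sqrt(2)."""
    return subtract_in_quadrature(sigma_diff, sigma_common) / math.sqrt(2.0)


sigma_end_ns = per_end_resolution_ns(SIGMA_TOF_DIFF_NS, SIGMA_TRIGGER_NARROW_NS)
print(f"single-end resolution: {sigma_end_ns * 1e3:.0f} ps")
```

Under these assumptions the result is close to 190 ps, consistent with the value quoted in the text.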
[Fig. 24 plot: histogram "Two CH Timing" (BLAB1), horizontal axis time (ns). Entries 1428, Mean 3.058, RMS 0.4543. Gaussian fit, χ²/ndf = 29.94/28: Constant 127.2 ± 4.3, Mean 3.047 ± 0.012, Sigma 0.4337 ± 0.0093.]
Fig. 24. Timing difference results from fits to the PMT
signal waveforms at each end of the Belle TOF counter.
6. Future Directions
While the analog bandwidth of the BLAB1 is adequate for
many RF recording applications, a higher bandwidth
device will be explored, based upon the lessons learned
from this first device. In particular, the structure and
design of the analog amplifier tree are being
scrutinized and improved in simulation. It is hoped that
an almost arbitrarily large storage depth can be
accommodated, up to 1GHz of analog bandwidth, through a
careful layout of the buffer amplifier cascade array. In
future devices, it is possible to significantly increase
the number of storage cells. A specific example for the
Particle Identification (PID) readout ASIC for the Belle
upgrade is shown in Fig. 25.
In the upper plot a 4µs storage depth is assumed, or
40,000 storage cells at 10GSa/s. In the lower plot, four
separate curves indicate the storage depth available for
different numbers of input channels versus the linear
dimension of the array (assumed to be square). A die
larger than 1cm per side was not considered for yield
reasons. Also, pin constraints, particularly on making
the output parallel to reduce readout latency, probably
limit the practical number of input channels to 16. It
is noted that many photodetectors operate at gains
requiring additional amplification in order to provide a
signal with sufficient amplitude for either triggering
or recording. Integration of transimpedance and other
input amplifier topologies is being studied, and results
from future devices that use such on-chip, high analog
bandwidth elements will be reported later.
[Fig. 25 plots. Top, "Storage Channel Capacity": number of Belle2 PID channels versus array linear dimension [mm]. Bottom, "Storage Depth Capacity": storage depth in [us] at 10GSa/s sampling (logarithmic axis, 0.1 to 100) versus array linear dimension [mm], with curves for 4, 8, 16 and 32 channels.]
Fig. 25. Packing density estimates for a future Belle
upgrade Particle Identification readout ASIC. At top is
the number of channels versus linear dimension of the
(square) storage array; at bottom, the number of µs of
storage versus channel count, also at 10GSa/s. Pin
limitations will likely limit the practical number to 16
channels.
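The relation between storage depth, sampling rate and cell count assumed for the upper plot of Fig. 25 is simple arithmetic, sketched below. The helper names are hypothetical; only the 4µs, 10GSa/s and 40,000-cell figures come from the text.

```python
def cells_for_depth(depth_us: float, rate_gsa_per_s: float) -> int:
    """Storage cells needed to hold depth_us microseconds at rate_gsa_per_s GSa/s."""
    # 1 GSa/s corresponds to 1000 samples per microsecond.
    return round(depth_us * rate_gsa_per_s * 1000)


def depth_for_cells(cells: int, rate_gsa_per_s: float) -> float:
    """Record depth in microseconds for a given cell count and sampling rate."""
    return cells / (rate_gsa_per_s * 1000)


print(cells_for_depth(4.0, 10.0))     # 40000
print(depth_for_cells(40_000, 10.0))  # 4.0
```

For a fixed cell count, the record depth scales inversely with the sampling rate.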
There is a misconception that waveform sampling is
significantly more expensive than traditional
discriminator + TDC methods. Certainly when packaged as
a full oscilloscope and sold as a commercial unit, with
large buffer depth, this can be true. Fig. 26 lists the
fabricated and quoted prices during the 2007 fiscal year
in the same TSMC 0.25µm process. It is interesting to
note that the first 3 devices correspond to
Multi-Project Wafer runs, and the latter 3 to dedicated
wafer runs. Packaging is not included and costs a
minimum of about $1/die in high volume.
[Fig. 26 plot, "Economy of Scale for Quoted ASICs": cost per channel [2007 $] (logarithmic axis, 0.1 to 1000) versus total number of system channels (logarithmic axis, 10 to 1000000).]
Fig. 26. Channel cost scaling for reference waveform
sampler ASICs based upon recent experience.
A summary of active ASIC designs inspired by the
performance of the BLAB1 ASIC may be found in Table 2.
Table 2
Future BLAB1-inspired ASIC designs.
ASIC      # Chan   Samples    Rate       BW      Power
Acronym            per Chan   [GSa/s]    [GHz]   [mW/chan]
BLAB2     16       2k         2-10       ≥ 1     ≤ 20
TARGET    16       4k         0.5-1      0.5     ≤ 20
RAL64     64       512        DC - 5     ≈ 0.3   ≤ 10
APTD      4        8k         DC - 0.5   0.2     ≤ 1
BIRD      1        256k       1          0.5     ≤ 30
These devices find application niches for the following reasons:
– Timing Performance → BLAB2 is intended for sub-10ps photodetector pulse time recording
– Low Cost → TARGET is intended for the low-cost instrumentation of 1M photodetector channels of a future TeV γ telescope
– High Density → RAL64 is a dense array readout device, where 128 channels or more could be considered in the future
– Low Power → APTD is a demonstrator low-power ADC device for a proposed Advanced Pair Telescope satellite
– Extended Depth → BIRD is a very deep storage ASIC for the future IceRay extended radio neutrino detector at the South Pole
A number of these designs are reaching maturity
and two have already been submitted for fabrica-
tion. Details of the designs and results from opera-
tion of these devices will be reported in the future.
7. Summary
A first generation of deep-storage Switched
Capacitor Array (SCA) CMOS device has been
studied in a 0.25µm process. This architecture is
optimized for concurrent acquisition and readout,
permitting deadtimeless operation. Demonstrated
low-power, high-resolution and exquisite timing
performance make this device and subsequent
variants attractive for readout of a broad range of
particle and astroparticle detectors.
8. Acknowledgements
The authors gratefully acknowledge the gener-
ous support of the MOSIS Educational Program,
which provided the fabrication of the BLAB1
ASIC prototype through their University Research
Program. Testing was supported in part by De-
partment of Energy Advanced Detector Research
Award # DE-FG02-06ER41424.
References
[1] P.W. Gorham et al., Phys. Rev. D 72, 023002 (2005).
[2] D.Seckel and T. Stanev, Phys. Rev. Lett. 95, 141101
(2005).
[3] S. Hashimoto (ed.) et al., KEK-Report-2004-4 (2004).
[4] T. Tsuboyama et al., Nucl. Inst. Meth. A 541 421
(2005).
[5] G.S. Varner, L.L. Ruckman, J.W. Nam, R.J. Nichol,
J. Cao, P.W. Gorham, M. Wilcox, “The large analog
bandwidth recorder and digitizer with ordered readout
(LABRADOR) ASIC,” Nucl. Inst. Meth. A 583 447
(2007).
[6] S. Kleinfelder, IEEE Trans. Nucl. Sci. 50 (2003) 955.
[7] C. Brönnimann, R. Horisberger and R. Schnyder, Nucl.
Instr. Meth. A420 (1999) 264.
[8] S. Ritt, Nucl. Instr. Meth. A518 (2004) 470.
[9] G. Varner, Journal Instr. 1 (2006) P07001.
[10] G. Varner, L. Ruckman, J. Schwiening and J.
Va’vra, “Compact, Low-power and Precision Timing
Photodetector Readout,” PoS (PD07) 026.
[11] wxWidgets is a cross-platform GUI and tools
library for GTK, Linux, MS Windows, and MacOS.
www.wxwidgets.org
[12] For example the ADC08D1500 from National
Semiconductor, which digitizes two channels at
1.5GSa/s to 7.2 effective bits of resolution while
consuming 1.9Watts.
[13] G. Varner et al., Nucl. Inst. Meth. A 541 166 (2005);
Int. J. Mod. Phys. A20:3808-3810, 2005.
[14] M. Barbero, G. Varner et al., IEEE Trans. Nucl. Sci.
52 1187 (2005).
[15] H. Kichimi et al., Nucl. Instr. Meth. A 453 315-320
(2000).
[16] J.W. Nam et al., Nucl. Instr. Meth. A 491 54-68
(2002).
