Data encoding efficiency in binary strip detector readout by Garcia-Sciveres, Maurice & Wang, Xinkang
Maurice Garcia-Sciveres (b) and Xinkang Wang (a)
(a) University of California, Berkeley, CA, USA.
(b) Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
E-mail: mgs@lbl.gov
ABSTRACT: A prescription to calculate the minimum number of bits needed for binary strip detector
readout is presented. This permits a systematic analysis of the readout efficiency relative to
this theoretical minimum number of bits. Different level efficiencies are defined to include context
information and engineering properties needed for reliable transmission, such as DC-balance. A
commonly used encoding method is analyzed as an example and found to have an efficiency of only
order 50%. A new encoding method called Pattern Overlay Compression is introduced to illustrate
how the systematic analysis can guide the construction of more efficient readout methods.
Pattern Overlay Compression significantly outperforms the above example in the occupancy range
of interest.
KEYWORDS: Particle tracking detectors (Solid-state detectors); Data acquisition concepts;
Electronic detector readout concepts (solid-state); Data reduction methods; Information theory.
arXiv:1309.1869v4 [physics.ins-det] 27 Mar 2014
Contents
1. Introduction
2. Level 0 Efficiency
2.1 Choice of n and Entropy per Address
3. Level 1 Efficiency
3.1 Efficiency with Engineering Properties
4. Level 2 Efficiency
5. Case Study of 10-Chip Module with Fixed Packet Format
6. Pattern Overlay Compression
6.1 Practical Application to Interconnected Chips
6.2 Level 1 and Level 2 Efficiency of POC
6.3 Pattern Overlay in the time domain and triggerless readout
7. Conclusion and Outlook
8. Appendix
8.1 Entropy of k Random Addresses
8.2 CAS Efficiency as a Function of n and Occupancy
8.3 Entropy Overhead due to Engineering Properties
1. Introduction
Present silicon tracking detectors at the LHC [1] are read out with an external trigger at a rate of
order 100 kHz. This requires a data bandwidth that was consistent with technology available at the
time of construction. As data transmission technology improves, it is natural to consider reading
out future tracking detectors with a higher rate. The maximum readout rate achievable depends
partly on the efficiency of the data encoding used, and evaluating the limits of readout encoding
efficiency is therefore important. This note develops a general method for analyzing the readout
efficiency by considering the theoretical bound of information encoding. The technical details of
the method are the central topic of the note. To illustrate the method, a data encoding example for
readout of binary silicon systems is analyzed as a function of sensor occupancy. A new encoding
scheme is then constructed to outperform the original example. This demonstrates the power of the
analysis method as a tool to make concrete gains, rather than just a theoretical exercise. By readout
efficiency we mean the number of bits used in practice to extract all the required information from
the detector, relative to the minimum possible number of bits needed to meet the same requirements.
This does not mean the bits used for a single event or readout cycle, but the average bits per event
for a large ensemble of events. In practice we define efficiency as the minimum possible number
over the actual number of bits, so that it has a value between 0 and 1. Clearly our main challenge
is to calculate the minimum possible number of bits for given readout requirements.
Our analysis assumes lossless data compression, with the use-case in mind of reading all
information from a detector at high rate. An alternative is to read partial information at a high
rate, followed by full information at a lower rate. Proposals to implement high rate partial readout
followed by low rate full readout have taken the form of a two-level trigger in regions of interest [2]
or self-seeded high transverse momentum track triggers [3, 4, 5]. The present analysis is specific
to silicon strip detectors with binary output. Analysis of pixel detector readout and of readout
including signal strength information (as opposed to binary) will involve additional concepts which
are the subjects of on-going research, and are beyond the scope of this note.
To simplify some of the discussion, a 256-channel front end chip with a single serial output is
assumed where needed, but the results easily generalize to n channels. Typical channel occupancy
for strip systems is of order 1% hit strips per interaction, and so an occupancy range from 0 to 5%
is considered. The roughly 1% occupancy value is dictated by pattern recognition requirements
and applies to any layer radius. In an actual detector the sensor strip dimensions would be adjusted
accordingly to achieve everywhere an occupancy in this range, so as far as the readout chip is
concerned, every chip in the detector should experience approximately the same input occupancy.
Note that here occupancy always means average occupancy over a fairly long time (e.g. 1 second).
The average occupancy is what determines the output data bandwidth requirement, rather than the
instantaneous occupancy of a single event. Single event fluctuations can be addressed with local
buffering, and we assume that a system will be designed with sufficiently large buffers wherever
needed to address single event occupancy fluctuations.
Figure 1. Schematic representation of the four parts that make up the detector output data.
To carry out the analysis we regard the detector output data as composed of four parts shown
in Fig. 1. The information about which detector strips are hit is contained in the address and cluster
bits as explained below. The context contains all other information that may be needed for operation
or required by the user, while the format is a kind of wrapper to condition the other parts to meet
data transmission requirements, but does not add any new information: all information is contained
in the previous three parts. This partitioning allows us to make minimal assumptions needed for the
analysis. An actual encoding method need not structure the data in this way for the analysis to be
valid. We analyze the readout efficiency in three levels. Level 0 deals only with how efficiently the
address part is encoded. Level 1 adds the effect of formatting needed for data transmission. Level
2 adds the final two parts: cluster bits and context. This ordering of the levels goes from most
general to most specific. Level 2 would be used to analyze a specific detector, where the details of
cluster bits and context are specified. Level 1 can be used to compare detector readout schemes in
general, not having to know which detector one is talking about. Finally, Level 0 applies to data
compression in general, without even having to assume that it is data from silicon detectors.
Figure 2. Example bit pattern (top) and resulting addresses (bottom) assuming 2 cluster bits.
The cluster bits are a characteristic feature of silicon detector readout. They have their origin
in the underlying physics of charged particles passing through detector layers. Charged particles
cross each layer in approximately random fashion, but each particle may cause a cluster of one or
more adjacent strips to fire. In a binary representation of the hit strips, a cluster is a continuous
string of ones, called a run in data compression literature. This is commonly exploited in detector
applications by encoding the position of only the first strip of a run, followed by a fixed number of
cluster bits to specify the length of the run. Thus the address identifies a run and the cluster bits
specify the length of said run. Note that there is not a 1:1 correspondence between clusters and
addresses. The physical process producing a cluster has no limitation on the number of continuous
hits, and multiple clusters can also be merged, overlapping one another. On the other hand, a fixed
number of cluster bits limit the maximum run that can follow each address. So multiple addresses
may be needed to encode a long cluster, or multiple merged clusters may result in a single address.
Fig. 2 illustrates how a sample bit pattern results in a collection of addresses for the case of 2 cluster
bits (maximum run of 4). However, assuming a suitable choice of number of cluster bits will be
made, the remaining address part should be approximately random. We are thus able to analyze the
address encoding using general methods applicable to a random pattern of bits.
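The splitting of a hit pattern into addresses and cluster bits can be sketched in a few lines of Python. This is a minimal illustration and not the readout chip logic: the function names are hypothetical, the example pattern is our own (it is not the pattern of Fig. 2), and we assume the cluster bits store the run length minus one, so that 2 cluster bits cover runs of 1 to 4 strips.

```python
def to_addresses(bits, cluster_bits=2):
    """Split a 0/1 hit pattern into (address, run_length) pairs.

    Assumption: the cluster bits encode run_length - 1, so the longest
    run that a single address can describe is 2**cluster_bits strips.
    """
    max_run = 2 ** cluster_bits
    addresses = []
    i = 0
    while i < len(bits):
        if bits[i]:
            run = 1
            # Extend the run until it ends or the cluster bits are exhausted.
            while run < max_run and i + run < len(bits) and bits[i + run]:
                run += 1
            addresses.append((i, run))
            i += run
        else:
            i += 1
    return addresses


def from_addresses(addresses, n_strips):
    """Rebuild the hit pattern, showing that the representation is lossless."""
    bits = [0] * n_strips
    for start, run in addresses:
        for j in range(start, start + run):
            bits[j] = 1
    return bits


# A 2-strip cluster, a 6-strip cluster (needs two addresses), and a single hit.
pattern = [0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0]
addresses = to_addresses(pattern)
print(addresses)  # [(1, 2), (5, 4), (9, 2), (12, 1)]
```

The 6-strip cluster is split into two addresses because it exceeds the maximum run of 4, which is the lack of 1:1 correspondence between clusters and addresses described above.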
The level 0, 1, and 2 efficiencies are derived in the next three sections. The analysis is
progressively applied to the commonly used Channel Address Sparsification (CAS) encoding scheme,
as an illustrative example. The case of multi-chip module readout is then analyzed in Section 5.
Finally, an example of a new encoding scheme called Pattern Overlay Compression (POC) is
presented in Section 6. Formula derivations as needed are included in the appendix.
2. Level 0 Efficiency
The level 0 efficiency measures how well the address information is compressed by a given encoding
scheme. We begin by considering the theoretical lower bound to the number of bits needed to
encode a random bit pattern. We then refine the analysis to consider random address patterns and
find that the difference is negligible in the occupancy range of interest.
Figure 3. The entropy, or minimum average number of bits needed to count the total number of 256-bit
patterns containing a given number of ones (solid). The dashed line, equal to 8 times the number of ones, is
shown for comparison.
Data compression is critical in many fields and is a vast subject of its own [6]. General algorithms
have been developed for lossless compression of large computer files, such as the commonly
used zip [7]. A given pattern of bits will be denoted by $x_i$ and the collection of all possible $x_i$
patterns by $X$, so that $X = \{x_1, x_2, x_3, ...\}$. The smallest average number of bits needed to encode all
the patterns in $X$ is given by the entropy $H(X)$, where $X$ is regarded as a random variable [8],
$$H(X) = -\sum_i p_i \log_2(p_i) \qquad (2.1)$$
where the sum runs over all possible values of $i$ and $p_i$ is the probability of finding the pattern $x_i$.
By using the base 2 logarithm the entropy is conveniently expressed in units of bits. Note that
$H(X)$ is a real number, not an integer, as it is the smallest possible average number of bits per pattern
and not the number of bits encoding a specific single pattern in $X$. To make this intuitive, consider
all $n$-bit patterns as equally likely. The total number of patterns is $2^n$, and so $n$ bits are needed
to count them. The entropy should therefore be $n$. Equally likely means all patterns occur with
the same probability $p_i = 1/2^n$. Substituting into Eq. 2.1 indeed yields $H(X) = n$. Calculating
the entropy for this special case of equally likely states was easily done by simply counting states,
without resorting to Eq. 2.1, but for an arbitrary probability distribution simple counting will not
be adequate, while Eq. 2.1 applies in general.
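As a small numerical check of Eq. 2.1, the following Python sketch evaluates the entropy of a list of probabilities. The function name `entropy_bits` and the example distributions are illustrative choices only.

```python
import math


def entropy_bits(probabilities):
    """Entropy in bits (Eq. 2.1); zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# All 2^n patterns equally likely: the entropy equals n.
n = 8
h_uniform = entropy_bits([1 / 2**n] * 2**n)

# A skewed distribution over 4 outcomes has entropy below log2(4) = 2 bits.
h_skewed = entropy_bits([0.7, 0.1, 0.1, 0.1])

print(h_uniform, h_skewed)
```

The uniform case reproduces the state-counting argument, while the skewed case shows why counting states alone is not adequate for an arbitrary distribution.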
Since the variable of interest for detector readout is occupancy, let us consider the set of
$n$-bit patterns with a given occupancy, $X^n_k$, where every pattern has exactly $k$ ones. For the case of
completely random occupancy the total number of possible patterns with occupancy $k$ can again be
simply counted, and it is nothing other than the binomial coefficient $\binom{n}{k}$. The entropy should be the
number of bits needed to count the patterns, $\log_2\binom{n}{k}$. The same result is obtained by using Eq. 2.1
for $\binom{n}{k}$ patterns each occurring with equal probability $1/\binom{n}{k}$. Fig. 3 shows the entropy $H(X^n_k)$ as a
function of $k$ for $n = 256$. A dashed line equal to $8k$ (slope 8) is shown for comparison, which corresponds
to the commonly used scheme of listing the 8-bit addresses of all non-zero channels. We will refer to
this as Channel Address Sparsification (CAS). Note that 5% occupancy, which is the upper limit
we are considering, corresponds to just under 13 ones in this figure. Full frame readout (one bit per
strip whether hit or not) would use 256 bits regardless of occupancy.
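The comparison between the counting entropy and the CAS bit count can be reproduced numerically. The sketch below is only illustrative; `pattern_entropy` is a hypothetical helper name, and the chosen values of $k$ are arbitrary sample points in the 0 to 5% range for $n = 256$.

```python
import math


def pattern_entropy(n, k):
    """log2 of the number of n-bit patterns with exactly k ones."""
    return math.log2(math.comb(n, k))


n = 256
for k in (1, 3, 13):
    h = pattern_entropy(n, k)
    cas_bits = 8 * k  # CAS: one 8-bit address per hit strip
    print(f"k={k:2d}  entropy={h:6.2f} bits  CAS={cas_bits:3d} bits  full frame={n} bits")
```

For a single hit the two numbers coincide at 8 bits; the gap between the entropy and the CAS bit count then widens as $k$ grows.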
Rather than the entropy of a random bit pattern, we wish to calculate the entropy of a random
address pattern. The difference is that certain address combinations are excluded by the use
of cluster bits. For example, the hit pattern ...0110... is perfectly valid, but the address pattern
...0XX0... is not. When using cluster bits, the hit pattern ...0110... would result in the single
address ...0X00... This restriction can be expressed in general by the condition that addresses
cannot be contiguous; all addresses must be separated by at least one strip. This condition leads to a
modified expression for the entropy. While the entropy of $k$ random ones was $\log_2\binom{n}{k}$, the entropy
of $k$ random addresses is $\log_2\binom{n-k+1}{k}$ (see Appendix for derivation). The difference between the
expressions grows with increasing $k$ (at $k = 1$ they are identical). However, for $k \ll n$ the difference
is negligible: at $k/n = 5\%$ the difference is only 1.3%. The remainder of this note will consider
address occupancy rather than raw occupancy. Note that if the addresses are not exactly random
then the Eq. 2.1 entropy will be smaller, so the randomness assumption makes our entropy bounds
conservative.
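Both the counting formula for non-contiguous addresses and the size of the correction can be checked with a short Python sketch. The helper names are hypothetical; the brute-force count is only feasible for small $n$ and is included purely as a sanity check of $\binom{n-k+1}{k}$.

```python
import math
from itertools import combinations


def hit_entropy(n, k):
    """Entropy of k random ones among n strips."""
    return math.log2(math.comb(n, k))


def address_entropy(n, k):
    """Entropy of k random addresses, no two of which may be adjacent."""
    return math.log2(math.comb(n - k + 1, k))


def count_separated(n, k):
    """Brute-force count of k positions among n with no two adjacent."""
    return sum(
        all(b - a > 1 for a, b in zip(combo, combo[1:]))
        for combo in combinations(range(n), k)
    )


# Small-n check of the counting formula.
print(count_separated(10, 3), math.comb(10 - 3 + 1, 3))

# Relative difference between the two entropies near 5% occupancy, n = 256.
n, k = 256, 13
rel_diff = 1 - address_entropy(n, k) / hit_entropy(n, k)
print(f"relative difference: {100 * rel_diff:.2f}%")
```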
Figure 4. Address encoding efficiency for channel address sparsification (CAS) as a function of address
occupancy in percent, for a 256-bit pattern.
As mentioned in the introduction, we define the encoding efficiency as the ratio of the entropy
over the number of bits, $B$, used by any given encoding method (the entropy being the smallest
possible number of bits needed to encode the information),
$$\varepsilon_0 = H/B \qquad (2.2)$$
where we have used the symbol $\varepsilon_0$ to denote the level 0 efficiency. This efficiency is shown for
CAS encoding in Fig. 4. Fig. 4 is approximately the ratio of the solid line over the dashed line
in Fig. 3, with the difference that we consider address occupancy rather than raw occupancy, as
explained above.
In Fig. 4 one can see that the efficiency is >90% for 1% occupancy, which seems very good,
yet CAS encoding is not generally used for data compression of sparse bitstreams in commercial
applications. Other methods, such as Huffman codes and prefix compression, are used instead [6].
One may wonder why. The reason is that CAS is only efficient for compressing a short bitstream of
very low occupancy. In general applications, bitstreams are much longer (megabits) and occupancy
is not tightly constrained to very low values. As $n$ increases in $X^n_k$, the fractional occupancy ($k/n$)
at which CAS works efficiently drops. Fig. 5 shows the CAS efficiency vs. address occupancy
for $n = 1024$. The efficiency is lower because CAS encoding is only efficient for patterns where
$k$ is small in absolute terms. Thus, for any given fractional occupancy $k/n$, the efficiency of CAS
encoding can be made arbitrarily low by choosing larger and larger $n$. So even though CAS may be
an efficient choice for compressing a single hit pattern from a single chip, it is not efficient when
considering the aggregated data from a significant number of chips or patterns. The efficiency of
CAS as a function of $n$ and fractional occupancy $\alpha \equiv k/n$ is given in Eq. 2.3 (see Appendix for
derivation). This shows how the CAS efficiency drops with increasing $n$.
$$\varepsilon_0(n,\alpha) \approx \frac{1 - \ln(\alpha)}{\ln(n)} \qquad (2.3)$$
Figure 5. Address encoding efficiency for channel address sparsification (CAS) as a function of address
occupancy in percent, for a 1024-bit pattern. The dotted line is the efficiency curve for a 256-bit pattern, for
comparison.
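The drop of CAS efficiency with $n$ can be illustrated with a short Python sketch that compares the exact ratio of Eq. 2.2 with the approximate form $(1 - \ln\alpha)/\ln n$. We assume here that each CAS address uses $\log_2 n$ bits (8 bits for $n = 256$, 10 bits for $n = 1024$); the function names are hypothetical.

```python
import math


def cas_efficiency(n, k):
    """Exact level 0 efficiency of CAS: address entropy over k addresses
    of log2(n) bits each (an assumption about the address width)."""
    entropy = math.log2(math.comb(n - k + 1, k))
    return entropy / (k * math.log2(n))


def cas_efficiency_approx(n, alpha):
    """Approximate efficiency (1 - ln(alpha)) / ln(n)."""
    return (1 - math.log(alpha)) / math.log(n)


for n in (256, 1024):
    k = round(0.05 * n)  # about 5% address occupancy
    exact = cas_efficiency(n, k)
    approx = cas_efficiency_approx(n, k / n)
    print(f"n={n:4d} k={k:2d}  exact={exact:.3f}  approx={approx:.3f}")
```

The approximation relies on $k$ being large enough for Stirling-type estimates, so it should not be trusted for very small $k$, where it can exceed 1.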
2.1 Choice of n and Entropy per Address
Eq. 2.3 suggests that the efficiency of a given encoding method will depend on the choice of n.
For detector readout applications there are several choices given by the hardware design. Most
obviously there is the number of channels in a single readout IC, but it is a relatively small value:
historically 128 and more currently 256. On the other hand, the number of channels in an entire
detector will be in the millions. The readout must allow the unique location of each hit within the
detector. Therefore, a natural choice of n from an encoding efficiency standpoint is the number of
channels served by a single data link, denoted by nL. As a group, the position of these nL channels
in the detector can be established by “following the readout cable”. Therefore, the information in
the data stream on said “cable” need only uniquely identify each channel within the group, not the
absolute location in the detector. Of course, a data link need not be a physical cable, but rather a
part of the detector that is permanently mapped to a data stream in the data acquisition system by
hardware connections and/or configuration. In typical silicon strip detectors this corresponds to a
module. A module typically contains multiple readout ICs and a number of channels of order 1000.
As we have seen above, CAS encoding is more efficient for n = 256 than for n = 1024. Use
cases of CAS encoding in past detectors in fact relied on a 7-bit address space (corresponding to
128-channel ICs), and identified different ICs within a module using a few-bit IC address space.
This is a form of the Prefix Compression method [6]. For the case of nL = 1024, patterns could
be encoded with a 3-bit IC identifier and a 7-bit subaddress. The reason this is in principle more
efficient than a plain 10-bit address space is that the IC identifier is only transmitted once for
all addresses within that IC, instead of needing a full 10 bits for each address. As long as the
occupancy is high enough that multiple addresses within the same IC are common, splitting the
10-bit address space into an x-bit prefix and a y-bit subaddress seems advantageous. However, a
new problem arises: one must encode how many addresses there are in each IC (or alternatively
when the address words from one IC end and those of the next IC begin). Thus, the efficiency gain
is not as large as naively expected. Nevertheless, this suggests looking to the entropy per address,
$H(X^n_k)/k$, for guidance on how many bits might be ideal to use for subaddress encoding vs. prefix.
Fig. 6 shows the entropy per address as a function of address occupancy for $n_L = 1024$. This shows
that at 1% occupancy the entropy content is about 8 bits per address. Therefore, a Prefix Compression
method should use fewer than 8 bits for the subaddress to have any hope of approaching the entropy
limit.
Figure 6. Entropy per address vs. address occupancy for a 1024-bit pattern.
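The entropy per address can be evaluated directly from the address entropy $\log_2\binom{n-k+1}{k}$. The following Python sketch does so for $n_L = 1024$; the helper name and the sampled occupancy values are illustrative choices.

```python
import math


def entropy_per_address(n, k):
    """Address entropy log2 C(n-k+1, k) divided by the number of addresses k."""
    return math.log2(math.comb(n - k + 1, k)) / k


n_link = 1024  # channels served by one data link
for percent in (0.5, 1, 2, 5):
    k = max(1, round(n_link * percent / 100))
    per_address = entropy_per_address(n_link, k)
    print(f"{percent:>4}% occupancy (k={k:2d}): {per_address:.2f} bits per address")
```

At 1% occupancy this gives a little under 8 bits per address, and the value decreases as the occupancy rises.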
3. Level 1 Efficiency
So far the analysis has been abstract in asking how a pattern of bits can be compressed, without
worrying about how the bits may be stored or transmitted. These practical aspects place important
constraints on the format of the encoded data, which in turn affects the achievable compression.
For reliable transmission a bit pattern with “engineering properties” is needed. These include
DC-balance, framing, and error detection with or without error correction.
DC-balance means that, over any reasonably short time interval, an arbitrarily chosen segment
of data should contain equal numbers of ones and zeroes. Different implementation methods use
different length intervals over which balance is achieved. The effect of balancing is to constrain to
a narrow band the frequency spectrum of a serial bit stream viewed as an AC signal. This permits
optimal functioning of driver and receiver circuits as well as transmission lines. Typically the same
hardware can transmit a significantly higher bit rate without errors when DC balance is used than
when it is not, allowing more information to be transmitted per unit time even if extra bits are
needed to achieve DC balance.
Framing allows the receiver of a serial bit stream to determine where meaningful bit patterns
begin and end, while error detection allows the receiver to know when a transmission error has
occurred. One can simply know that an error occurred and therefore discard the affected data, or
one can have enough information to undo the error and correct the data. The level of error detection
and correction varies widely depending on what the application requires. For silicon detector hit
data error detection alone is appropriate, since no detector is 100% efficient to begin with, and
therefore loss of a small amount of data during transmission is acceptable (typical hit inefficiencies
not related to data transmission are of order 1%).
Several commonly used transforms can be applied to an arbitrary raw bit pattern to produce
a transformed pattern with engineering properties. The inverse transform can then be applied to
the transformed pattern in order to recover the original raw pattern. The 8b/10b (8-bit to 10-bit) [9]
transform divides the raw pattern into 8-bit words or symbols and replaces each one with a 10-bit
symbol. The 10-bit symbols should ideally be chosen to always have five ones and five zeroes, but
since there are only $\binom{10}{5} = 252$ such symbols (while there are 256 possible 8-bit symbols), 10-bit
symbols with 6 or 4 ones are also used, and 8-bit symbols can be mapped to more than one 10-bit
symbol. DC balance is guaranteed over a 20-bit span by alternating the use of symbols with 6 or
4 ones. More than 256 10-bit symbols are used, effectively allowing the transformed
pattern to contain extra information that was not present in the raw pattern, which the user can
exploit in different ways, for example to mark start and end of transmission. Framing is provided
by the fact that the number of valid symbols is much less than the total number (1024) of symbols
with 10 bits, so the frame alignment is varied until “only” valid symbols are seen. Quotes were
used for “only” because invalid symbols are still tolerated at a low rate without triggering frame
adjustment. Such infrequent invalid symbols are interpreted as transmission errors. In fact, flipping
one bit in any of the valid 10-bit symbols results in an invalid symbol. The 8b/10b transform is
simple to implement in hardware and clearly suited to raw data consisting of 8-bit words.
Scrambling transforms calculate each successive bit of the transformed pattern as a function
of previous bits in the transformed pattern itself as well as the incoming bits in the raw pattern. A
descrambling algorithm reverses the operation to recover the raw pattern. This type of transform
provides DC balance only, without increasing the length of the bit pattern. One must then provide
for framing and error detection separately. Ethernet connections use the 64b/66b transform [10],
which as the name suggests uses 66 bits to transmit 64-bit content. Unlike 8b/10b, 64b/66b consists
of a scrambler plus a 2-bit header, which provides framing, the ability to switch between two
encoding formats, and error detection for the header only, not the scrambled data. Another common
method for framing a random bit pattern is to insert fixed synchronization bits at a periodic
interval. The receiver can then scan for such a repeating pattern. Similarly, error detection can be
implemented by inserting extra bits that are a function of the raw pattern. The simplest incarnation
is the parity bit. Hamming coding [11] provides a higher level of detection and even correction at
the expense of more added bits.
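The scrambling idea described above, where each transmitted bit depends on the incoming raw bit and on previously transmitted bits, can be sketched as a self-synchronizing scrambler in Python. This is a minimal illustration and not a description of any specific hardware: the function names are hypothetical, the tap positions (39, 58) are the ones usually quoted for the 64b/66b scrambler and should be treated as an assumption here, and the all-ones starting history is an arbitrary choice.

```python
def scramble(bits, taps=(39, 58), seed=None):
    """Each output bit is the raw bit XORed with earlier *output* bits."""
    history = list(seed) if seed is not None else [1] * max(taps)
    out = []
    for b in bits:
        s = b
        for t in taps:
            s ^= history[-t]
        out.append(s)
        history.append(s)  # feedback uses the transmitted (scrambled) bit
    return out


def descramble(bits, taps=(39, 58), seed=None):
    """Inverse operation: XOR each received bit with earlier *received* bits."""
    history = list(seed) if seed is not None else [1] * max(taps)
    out = []
    for s in bits:
        d = s
        for t in taps:
            d ^= history[-t]
        out.append(d)
        history.append(s)
    return out


raw = [0] * 200                # a worst case for DC balance: all zeroes
tx = scramble(raw)
print(len(tx), sum(tx))        # same length as the input, now with ones mixed in
print(descramble(tx) == raw)   # True: the raw pattern is recovered
```

The transformed pattern has the same length as the raw one, which is why framing and error detection must then be provided separately.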
3.1 Efficiency with Engineering Properties
Figure 7. Overhead E(H) to account for engineering properties needed for data transmission. Analytic
approximation (solid) and numerical calculation (dashed).
It is clear from the above examples that adding engineering properties to a raw bit pattern
entails increasing the number of bits. Therefore, the level 0 efficiency seems “unfair” when
considering data transmission, since it would be impossible to achieve 100% efficiency. The level 1
efficiency, $\varepsilon_1$, takes into account the overhead due to engineering properties. This is accomplished
by making the substitution $H \to H + E(H)$, leading to
$$\varepsilon_1 = \frac{H + E(H)}{B_1} \qquad (3.1)$$
where $B_1$ is the number of bits used by a given encoding method including engineering properties.
Note that we anticipated that the number of bits one needs to add should depend on $H$.
Unfortunately, we have no absolute measure of the smallest possible $E(H)$ needed to achieve DC-balance,
framing, and error detection. Nevertheless we derive a plausible expression for $E(H)$. We do this
by generalizing the 8b/10b transform concept. We consider a pattern of $H$ bits as an $H$-bit symbol.
There are $2^H$ such symbols (note that this expression is valid for non-integer $H$ even though the
concept of $H$ bits is not). We therefore want to find the value $m$ such that $\binom{2m}{m} \geq 2^H$. Here
$\binom{2m}{m}$ is the number of $2m$-bit symbols with exactly $m$ ones, which are thus DC-balanced by construction. As with
8b/10b, by using only DC-balanced symbols one achieves not just DC-balance, but also framing
and error detection, which are all the things we want to include in the level 1 efficiency. (As noted
before, 8b/10b performs a little better by alternating not perfectly DC-balanced symbols, but we
will ignore this.) We can turn the expression around and for every integer $m$ calculate the entropy
content of all the DC-balanced $2m$-bit symbols,
$$H_{2m} = \log_2\binom{2m}{m} \qquad (3.2)$$
For these special, discrete entropy values, $E(H_{2m}) = 2m - H_{2m}$. Eq. 3.3 gives an approximate
analytic expression for a continuous range of $H$ (derived in the Appendix). Fig. 7 plots this expression
(solid) compared to a numerical calculation (dashed).
$$E(H) \approx \frac{\log_2(\pi H) - 1}{2} \qquad (3.3)$$
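The discrete values $E(H_{2m}) = 2m - H_{2m}$ can be compared with the analytic form $[\log_2(\pi H) - 1]/2$ in a few lines of Python. The function names below are hypothetical, and the sampled values of $m$ are arbitrary.

```python
import math


def balanced_entropy(m):
    """H_2m: log2 of the number of 2m-bit symbols with exactly m ones (Eq. 3.2)."""
    return math.log2(math.comb(2 * m, m))


def overhead_exact(m):
    """E(H_2m) = 2m - H_2m for the discrete DC-balanced case."""
    return 2 * m - balanced_entropy(m)


def overhead_approx(h):
    """Analytic estimate E(H) ~ (log2(pi * H) - 1) / 2."""
    return (math.log2(math.pi * h) - 1) / 2


for m in (5, 10, 50, 200):
    h = balanced_entropy(m)
    print(f"2m={2 * m:3d}  H={h:7.2f}  E exact={overhead_exact(m):.3f}  "
          f"E approx={overhead_approx(h):.3f}")
```

The case $m = 5$ corresponds to the 252 perfectly balanced 10-bit symbols, whose entropy is slightly below 8 bits; the analytic estimate improves as $H$ grows.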
4. Level 2 Efficiency
The level 2 efficiency includes all the elements shown in Fig. 1. However, the number of cluster
bits and the amount of context information needed per transmission are detector and
experiment-specific choices. We therefore will not consider how to determine the optimum number of
cluster bits, or how to best encode context information. Instead, we assume values $C$ and $\nu$ for
the number of context and cluster bits, respectively. These lead to a fixed overhead in the level 2
efficiency, which is considered equal for any encoding method. We thus arrive at a level 2 efficiency definition,
$$\varepsilon_2 = \frac{H + E(H) + C + \nu K}{B_2}, \qquad (4.1)$$
where $\nu$ is the number of cluster bits per address used to specify run length and $K$ is the number
of addresses used to encode the raw pattern. $B_2$ is the number of bits used by a particular encoding
method, including context and cluster bits. We did not explicitly break out $C$ and $\nu K$ in the
denominator, because the way they fold into the total number of bits may depend on the encoding method
used.
5. Case Study of 10-Chip Module with Fixed Packet Format
We study the case of encoding the hit data from a 10-chip module with 256 channels per chip,
corresponding to the silicon strip detector readout proposed in [12]. We consider the hit data
encoded in fixed length packets, each packet containing a chip ID and a fixed number of 8-bit
addresses denoted by $n_P$. As already discussed the cluster bits can be separated out and are not
considered in $\varepsilon_0$ of Eq. 2.2. Thus at level 0 the packet length is $4 + 8n_P$, where 4 bits are needed
to count 10 chips. We study this efficiency as a function of address occupancy given by $K/2560$,
where $K$ is the total number of addresses needed for the full module. The $i$-th chip will use $k_i$
addresses and $K = \sum_i k_i$. We calculate the average efficiency $\langle\varepsilon_0\rangle$ assuming the $K$ addresses are
randomly distributed among the 10 chips. The results are shown for several values of $n_P$ in Fig. 8.
It is interesting to note that ε0 has a plateau over a range of occupancy depending on nP, which
seems like an appealing property. However, the value of the plateau is only in the neighborhood of
60% efficiency, without a strong dependence on nP.
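The averaging over random distributions of addresses among chips can be sketched as a small Monte Carlo calculation in Python. Several choices below are assumptions made for illustration and are not stated in the text: the entropy is evaluated for the whole module as $\log_2\binom{2560-K+1}{K}$, a chip with no addresses sends no packet, partially filled packets are padded to full length, and all function names are hypothetical.

```python
import math
import random


def module_entropy(total_addresses, n_channels=2560):
    """Address entropy for K addresses in the full module (assumed form)."""
    k = total_addresses
    return math.log2(math.comb(n_channels - k + 1, k))


def packet_bits_level0(addresses_per_chip, n_p):
    """Bits sent when each chip packs its addresses into packets of
    4 chip-ID bits plus n_p 8-bit address slots."""
    packets = sum(math.ceil(k_i / n_p) for k_i in addresses_per_chip)
    return packets * (4 + 8 * n_p)


def mean_efficiency(total_addresses, n_p, n_chips=10, trials=2000, seed=1):
    """Average of H/B over random assignments of addresses to chips."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        counts = [0] * n_chips
        for _ in range(total_addresses):
            counts[rng.randrange(n_chips)] += 1
        total += module_entropy(total_addresses) / packet_bits_level0(counts, n_p)
    return total / trials


for n_p in (1, 2, 4):
    print(n_p, round(mean_efficiency(26, n_p, trials=500), 3))  # about 1% occupancy
```

With $n_P = 1$ the result does not depend on how the addresses are distributed, since every address costs exactly 12 bits.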
Figure 8. Level 0 average efficiency for fixed size packet CAS encoding in a 10-chip module for several
choices of packet size. A packet size of 1 is equivalent to plain CAS encoding with 12 bits per cluster. The
address occupancy is for the whole module.
We next consider $\varepsilon_2$ of Eq. 4.1, which includes context information, cluster bits, and
engineering properties. We arbitrarily take $\nu = 2$ cluster bits and $C = 6$ context bits for the sake of
illustration. For engineering properties we make the minimal assumption of scrambling plus 2 bits
per packet, one for framing and the other a parity bit. Thus,
$$\varepsilon_2 = \frac{H + E(H) + 6 + 2K}{P\,[2 + 4 + (8+2)n_P + 6]} \qquad (5.1)$$
where $P$ is the average number of packets used for transmission and the terms in the denominator
correspond to: framing and parity (2), chip number (4), address and cluster bits ($(8+2)n_P$), and
finally context bits (6). The results are shown in Fig. 9. Larger packets (i.e. larger $n_P$) now perform
better at large module occupancy, not surprisingly, as the context and engineering property bits
must be added to every packet. The efficiency plateaus flatten and move to higher occupancy, and
in all cases the efficiency for reasonable occupancy is 55% or less. This suggests that a significant
gain is possible relative to this encoding method.
Figure 9. Level 2 average efficiency for fixed size packet CAS encoding in a 10-chip module for several
choices of packet size. Each packet includes 6 context bits and 2 cluster bits per address. The address
occupancy is for the whole module.
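For a single, fixed distribution of addresses among the chips, the ratio of Eq. 5.1 can be evaluated as in the Python sketch below. This is an illustration under stated assumptions: the packet count of one sample stands in for the average $P$, the entropy is taken as $\log_2\binom{2560-K+1}{K}$ for the whole module, $E(H)$ uses the analytic estimate $[\log_2(\pi H) - 1]/2$, and the function names and the example distribution are hypothetical.

```python
import math


def overhead(h):
    """Analytic estimate of the engineering overhead E(H)."""
    return (math.log2(math.pi * h) - 1) / 2


def level2_efficiency(addresses_per_chip, n_p, n_channels=2560):
    """Eq. 5.1 for one sample: 2 cluster bits per address, 6 context bits."""
    k_total = sum(addresses_per_chip)
    h = math.log2(math.comb(n_channels - k_total + 1, k_total))
    packets = sum(math.ceil(k / n_p) for k in addresses_per_chip)
    numerator = h + overhead(h) + 6 + 2 * k_total
    # Per packet: framing and parity (2), chip number (4),
    # address plus cluster bits ((8 + 2) * n_p), context bits (6).
    denominator = packets * (2 + 4 + (8 + 2) * n_p + 6)
    return numerator / denominator


# 26 addresses (about 1% module occupancy) spread over 10 chips.
counts = [3, 2, 3, 2, 3, 3, 2, 3, 2, 3]
eff = level2_efficiency(counts, n_p=4)
print(f"level 2 efficiency: {eff:.3f}")
```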
6. Pattern Overlay Compression
Understanding the sources of inefficiency using the method we have developed enables the
construction of better encoding methods. As an example we develop a new encoding method called
Pattern Overlay Compression (POC), tailored to the specific case of multi-chip readout. This
method is based on the observation that an $N$-bit pattern with $N/2$ ones has entropy that approaches
$N$ for large $N$ (consider the approximation $\binom{2x}{x} \approx 4^x/\sqrt{\pi x}$). Thus, a long pattern with 50% random
occupancy will have high efficiency. One can therefore try to represent the hit data using high
occupancy patterns, rather than compressing low occupancy ones. The POC method constructs
one high occupancy pattern by overlaying multiple low occupancy patterns of equal length. In
order for this compression method to be lossless, the low occupancy source patterns must be
labeled and the labels attached to the resulting high occupancy pattern. The method is best explained
algorithmically as follows.
Let the source patterns all have $N$ bits. The result pattern will consist of $N$ zeroes and $K$ ones,
where $K = \sum_i k_i$ is the total number of ones in the source patterns. Thus the result pattern has a total
of $N + K$ bits. The zeroes in the result pattern act as delimiters, effectively creating a table with
$N$ bins. In each bin of this table, a number of ones is placed, which act as flags to indicate how
many source patterns had a one in this location: 3 flag ones means 3 source patterns and so on.
Table 1 illustrates an example with 4 source patterns with $N = 8$. The result pattern from Table 1
is 10110001100010, where there is one zero for every vertical line in the table, except the first one,
which we have chosen to ignore as it is not needed. The number of flag ones in the result pattern
(six in this case) is the total number of ones from all the source patterns.
The result pattern alone is not enough for lossless compression. We also need to label which
source pattern each flag refers to. In our example from Table 1, the result pattern should be followed
by DACBCB, to indicate which source pattern each of the six flags belongs to. Alternatively, a
label can be placed right after each flag instead of all the labels at the end. Thus, if there are
$n_s$ source patterns, the result pattern must include or be followed by $K \log_2(n_s)$ bits. The full
POC bitstream for our example is therefore: 10110001100010(11)(00)(10)(01)(10)(01) with labels
at the end, or 1(11)01(00)1(10)0001(01)1(10)0001(01)0 with a label after each flag, where the
parentheses are just a visual aid to highlight the pattern labels. Further compression of the pattern
labels is possible whenever there are multiple labels in the same bin. If the labels are always
written monotonically (smallest to largest) then the most significant bits of multiple labels in the
same bin may be known from the value of the first label. In the example given, the pair of labels
(01)(10) could be abbreviated as (01)(0) without loss of information, since from the first label and
the condition that labels are written in increasing order it is already known that the first bit of the
second label must be a one.
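As a hedged sketch of the labeling step, the Python below builds both bitstream layouts for the Table 1 example and decodes the labels-at-end layout back to the source patterns. The names `poc_encode` and `poc_decode` are hypothetical; labels are assumed to be the source pattern index written as a fixed-width binary number (A = 00, B = 01, C = 10, D = 11, as in the example), and no abbreviation of multiple labels in the same bin is attempted.

```python
from math import ceil, log2


def poc_encode(patterns, interleave=False):
    """Return the POC bitstream; labels go at the end or right after each flag."""
    width = max(1, ceil(log2(len(patterns))))       # label bits per flag
    body, trailer = [], []
    for b in range(len(patterns[0])):
        for idx, p in enumerate(patterns):          # increasing label order within a bin
            if p[b]:
                label = format(idx, f"0{width}b")
                if interleave:
                    body.append("1" + label)
                else:
                    body.append("1")
                    trailer.append(label)
        body.append("0")                            # bin delimiter
    return "".join(body) + "".join(trailer)


def poc_decode(stream, n_bins, n_src):
    """Invert poc_encode(..., interleave=False)."""
    width = max(1, ceil(log2(n_src)))
    patterns = [[0] * n_bins for _ in range(n_src)]
    pos, b, flag_bins = 0, 0, []
    while b < n_bins:                               # walk the result pattern
        if stream[pos] == "1":
            flag_bins.append(b)
        else:
            b += 1
        pos += 1
    for i, fb in enumerate(flag_bins):              # then read one label per flag
        label = stream[pos + i * width: pos + (i + 1) * width]
        patterns[int(label, 2)][fb] = 1
    return patterns


table1 = [
    [0, 1, 0, 0, 0, 0, 0, 0],  # A
    [0, 0, 0, 0, 1, 0, 0, 1],  # B
    [0, 1, 0, 0, 1, 0, 0, 0],  # C
    [1, 0, 0, 0, 0, 0, 0, 0],  # D
]
print(poc_encode(table1))                   # 10110001100010 followed by 12 label bits
print(poc_encode(table1, interleave=True))
print(poc_decode(poc_encode(table1), 8, 4) == table1)
```

Decoding relies only on knowing N and ns in advance, which is why the delimiters and labels together make the overlay lossless.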
Source pattern A: | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
Source pattern B: | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
Source pattern C: | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
Source pattern D: | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Result pattern:   | 1 | 11|   |   | 11|   |   | 1 |
                      0   0   0   0   0   0   0   0
Table 1. Example to illustrate the calculation of the result pattern in POC. The zeroes below the table
indicate which bin boundaries (vertical lines) are represented by a zero in the result pattern. This result
pattern would be written as 10110001100010.

Figure 10. Level 0 efficiency for POC encoding in a 10-chip module for different numbers of result
pattern bins (256, 128, 64, 32), as a function of module address occupancy. Also shown for comparison
is the average efficiency of fixed packet CAS encoding with nP=4.

If no compression is applied to multiple labels in the same bin, the total number of bits used by
POC is simple to calculate. For the case of a 10-chip module with address occupancy K, chip-wise
POC encoding (each chip is one source pattern) would lead to B0 = 256+K+4K bits, as there
are 256 bins and 4 bits are needed to label the 10 chips. Since the result pattern is a representation
of a table where each zero marks one bin, it is always possible to re-bin this table and so change
the number of zeroes. For example, instead of 256 bins we can use 128 bins, where each new bin
combines two of the original bins. An additional label bit will then be needed to identify which
original bin was occupied. Thus, B0 = 128+K+(4+1)K bits, B0 = 64+K+(4+2)K bits, etc.,
would also be valid and practical POC encoding lengths.
Fig. 10 shows ε0 for 10-chip module POC encoding for various choices of result pattern binning,
between 256 and 32 bins. No compression has been applied to multiple labels in the same bin, and
the nP = 4 fixed packet ε0 is also shown for comparison.
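The bit counts for the different binnings can be tabulated with a short Python sketch. The helper name `poc_level0_bits` and its parameter names are assumptions for illustration; the formula is the one given above, B0 = bins + K + (chip label bits + re-binning bits)·K, with no compression of labels in the same bin.

```python
from math import ceil, log2


def poc_level0_bits(k, bins=256, full_bins=256, n_chips=10):
    """Level 0 POC length in bits for K = k hits, with the table re-binned to `bins`."""
    chip_bits = ceil(log2(n_chips))                    # 4 bits label the 10 chips
    rebin_bits = (full_bins // bins).bit_length() - 1  # log2(full_bins / bins)
    return bins + k + (chip_bits + rebin_bits) * k


for bins in (256, 128, 64, 32):
    print(bins, [poc_level0_bits(k, bins) for k in (1, 6, 20, 128)])
```

Each halving of the bin count saves a fixed number of delimiter zeroes but costs one extra label bit per hit, so coarser binning pays off only at low occupancy; for example, the 256-bin and 128-bin lengths coincide at K = 128.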
6.1 Practical Application to Interconnected Chips
POC is naturally suited to readout of multiple chips that are daisy-chained together, as shown in
Fig. 11. The format is valid for any number of chips, even a single chip. The first chip in the daisy
chain sends out its data in POC format. The next chip receives this data, adds its bin contents, and
sends the combined data to the next chip. This is repeated down the chain until the last chip, which
produces the module output with the data from all chips.

Figure 11. Diagram illustrating a number of chips connected in a daisy chain configuration. The arrows
indicate data flow direction. The number of chips is arbitrary.
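The daisy-chain step described above can be sketched as follows. The text does not specify the on-wire details, so this Python example makes several assumptions: labels follow each flag (the interleaved layout of Section 6), labels have a fixed width, each chip is labeled by its position in the chain, and the first chip is fed an empty result pattern. The names `chip_merge` and `daisy_chain` are hypothetical.

```python
def chip_merge(incoming, own_pattern, own_label, width):
    """Add one chip's hits to an incoming interleaved POC stream."""
    out, pos = [], 0
    for hit in own_pattern:
        # Pass through upstream flags; each flag is followed by a width-bit label.
        while incoming[pos] == "1":
            out.append(incoming[pos:pos + 1 + width])
            pos += 1 + width
        if hit:
            out.append("1" + format(own_label, f"0{width}b"))
        out.append("0")   # bin delimiter
        pos += 1          # consume the incoming delimiter
    return "".join(out)


def daisy_chain(patterns, width):
    """Run the stream through every chip in chain order."""
    stream = "0" * len(patterns[0])   # empty result pattern: delimiters only
    for label, pattern in enumerate(patterns):
        stream = chip_merge(stream, pattern, label, width)
    return stream


table1 = [
    [0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
]
print(daisy_chain(table1, width=2))
```

Because each chip appends its own flag after the upstream flags in a bin, labels come out in increasing order whenever chips are labeled in chain order, and each chip only needs to buffer one bin at a time.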
6.2 Level 1 and Level 2 Efficiency of POC
For addition of engineering properties it is possible to make the same minimal assumptions of
scrambling, framing, and use of parity bits that were made for analyzing a fixed packet format.
However, the presence of a predetermined number of zeros in every result pattern presents an
opportunity to develop a special method unique to POC, so we propose one such possible method
here. This method is slightly less efficient than the minimal assumptions made for fixed packet
format and therefore makes the example more conservative. In analogy to 8b/10b, we use a counter
to keep track of DC-balance. The counter is incremented for every output one and decremented
for every zero. The counter reports whether the output is so far perfectly balanced, has too many
ones or too many zeroes. The proposed encoding starts from the result pattern with embedded
source pattern labels. Each original zero (bin delimiter) remains a zero if the DC-balance counter is
perfectly balanced or has too many ones, but is replaced with a one otherwise. Thus, for example,
the empty result pattern 000000... will become 010101... The bin content flags, which were always
one in Table 1, remain one if the counter is balanced or has too many ones, but are replaced with
zero otherwise. So far it can be said that bin delimiters act to restore DC-balance, while bin content
flags act to increase DC-imbalance. The ideal effect of a label should therefore be to restore DC
balance. As each flag is followed by a label, the combination of flag plus label would then be
DC-balanced. For the example of a 10-chip module, one can use 5-bit labels containing exactly 3
zeroes and 2 ones. There are exactly 10 such labels. Furthermore, the bit-wise inverse of each label
contains 3 ones and 2 zeroes. If the DC-balance counter is balanced or has too many ones, a label
with 3 zeroes is used, otherwise its bit-wise inverse (which has 3 ones) is used. Thus the labels act
to restore DC-balance. Table 2 summarizes this encoding. Note that this is a special case for this
example, but illustrates a method that can be generalized. Note also that 5-bit symbols are used as
labels where only 4-bit codes would be needed without DC-balance, which is the same overhead
as in 8b/10b.
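The substitution rules for delimiters, flags and labels can be illustrated with a short Python sketch. This is one possible reading of the scheme, under stated assumptions: the counter starts balanced, the label polarity is chosen from the counter value after the flag has been sent, and the assignment of the ten 5-bit words to chip numbers (`LABELS`) is arbitrary. The function names are hypothetical.

```python
from itertools import combinations

# The ten 5-bit words with exactly two ones (three zeroes); the ordering is an assumption.
LABELS = ["".join("1" if i in ones else "0" for i in range(5))
          for ones in combinations(range(5), 2)]


def invert(bits):
    return "".join("1" if b == "0" else "0" for b in bits)


def dc_encode(bin_contents):
    """bin_contents[b] lists the chips (0..9) hit in bin b. Returns (bits, final counter)."""
    out, counter = [], 0            # counter = ones sent minus zeroes sent

    def emit(bits):
        nonlocal counter
        out.append(bits)
        counter += bits.count("1") - bits.count("0")

    for chips in bin_contents:
        for chip in sorted(chips):
            emit("1" if counter >= 0 else "0")                       # content flag
            emit(LABELS[chip] if counter >= 0 else invert(LABELS[chip]))
        emit("0" if counter >= 0 else "1")                           # bin delimiter
    return "".join(out), counter


def dc_decode(stream, n_bins):
    """Recover bin contents by tracking the same counter as the encoder."""
    bins, pos, counter, b = [[] for _ in range(n_bins)], 0, 0, 0

    def take(n):
        nonlocal pos, counter
        bits = stream[pos:pos + n]
        pos += n
        counter += bits.count("1") - bits.count("0")
        return bits

    while b < n_bins:
        flag_symbol = "1" if counter >= 0 else "0"
        if take(1) == flag_symbol:
            inverted = counter < 0                   # polarity the encoder must have used
            word = take(5)
            bins[b].append(LABELS.index(invert(word) if inverted else word))
        else:
            b += 1
    return bins


hits = [[3], [0, 2], [], [], [1, 2], [], [], [1]]    # Table 1 with A..D as chips 0..3
bits, final = dc_encode(hits)
print(bits, final)
print(dc_decode(bits, 8) == hits)
```

In this sketch the counter is either balanced or one zero ahead at every flag or delimiter boundary, which is what lets the decoder tell a flag from a delimiter even though both can appear as 0 or 1.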
DC-balance counter    balanced   too many ones   too many zeroes
Bin delimiter         0          0               1
Content flag          1          1               0
Pattern label         00011      00011           11100
Table 2. Illustration of the method to add engineering properties to the POC result pattern. The pattern
label shown is representative to indicate the number of zeroes and ones in the label, but there are 10
possible such labels (not shown).

The result pattern DC-balanced this way contains the needed labels, but context information
and cluster bits must still be added. For simplicity of this estimate, we propose to add such
information immediately following the result pattern using 8b/10b encoding. We will also use an 8b/10b
comma as a header just before the result pattern (this provides additional framing and marks the
start of transmission; a different comma would fill any idle time between transmissions). The
8b/10b codes as well as the 5-bit pattern labels provide error detection. Since the pattern labels
have the same overhead as 8b/10b, we write the total number of bits for level 2 encoding as:
B2 = 10 + N + K + 1.25(qK + C + νK)    (6.1)
where q is the number of label bits at level 0, C is the number of context bits, and ν the number of
cluster bits. The factor of 1.25 arises from replacing every 8 bits with 10 bits (or 4 label bits with 5
bits). In practice a few padding bits may need to be added on a transmission by transmission basis
to make (qK+C+νK) a multiple of 8. A different number of bins in the result pattern, if desired,
can be accommodated by placing the additional binning bits in the 8b/10b encoded part.
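Eq. 6.1, including the optional padding to a multiple of 8 bits, can be evaluated with the sketch below. The function name and the numerical values of C and ν used in the example call are placeholders chosen for illustration only; they are not values taken from the case study.

```python
from math import ceil


def poc_level2_bits(n_bins, k, q, context_bits, cluster_bits, pad=True):
    """Eq. 6.1: B2 = 10 + N + K + 1.25 (qK + C + vK), optionally padded to whole bytes."""
    payload = q * k + context_bits + cluster_bits * k   # bits carrying 25% overhead
    if pad:
        payload = 8 * ceil(payload / 8)                 # padding bits, if any
    return 10 + n_bins + k + 1.25 * payload


# Hypothetical inputs: 128 bins, 6 hits, q = 5 label bits, C = 16, v = 1.
print(poc_level2_bits(128, 6, q=5, context_bits=16, cluster_bits=1, pad=False))
print(poc_level2_bits(128, 6, q=5, context_bits=16, cluster_bits=1))
```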
Figure 12. Level 2 efficiency, ε2, for POC encoding in a 10-chip module for different numbers of result
pattern bins (256, 128, 64, 32), as a function of module address occupancy. Also shown for comparison
is ε2 for fixed packet CAS encoding with nP=4.
Fig. 12 shows the level 2 efficiency for POC encoding applied to our 10-chip module case
study, compared to fixed packet CAS encoding. Recall that we made an optimistic assumption for
the engineering properties overhead added to fixed packet CAS, while the POC example is more
conservative. POC encoding significantly outperforms fixed packet CAS even with this bias. For
convenience of evaluating the practical impact on module operation, Fig. 13 shows the average
number of bits per event for the case of 128 bins in the POC result pattern, compared to fixed
packet CAS with nP=4, and to the entropy bound, all at level 2. A 0-3% module occupancy range
was used in this figure for clarity. The ratio of the entropy bound to each of the top two curves
gives the corresponding efficiency curve in Fig. 12.
Figure 13. Average number of bits per event used by POC encoding in a 10-chip module for 128 result
pattern bins, as a function of module occupancy, compared to fixed packet CAS encoding with nP=4. Also
shown is the entropy bound for the minimum number of bits possible. All have engineering properties
overhead and context data included.
While we have not included it in the above calculation to keep it simple and conservative,
further compression of multiple pattern labels in the same bin is possible as described in Section 6.
It is instructive to see how that can be implemented within the DC-balanced scheme, because it also
illustrates how the treatment of labels generalizes. For the first label in a bin there are always 10
possibilities in this particular 10-chip case, so a 5-bit symbol is used as described. For subsequent
labels in the same bin, the symbols used depend on how many possibilities remain, assuming the
labels are always written in increasing order. If only 6 possibilities remain, then a 4-bit label with
2 ones and 2 zeroes can be used. If only 3 possibilities remain, a 3-bit label with 2 ones and a
single zero (or its inverse) can be used. While labels with an odd number of bits act to restore
DC-balance as described, labels with an even number of bits have no effect (neither restore balance
nor increase imbalance). Such labels will not compensate the imbalance introduced by the flag bits,
and so cannot be used indiscriminately. In order to ensure that all transmissions will be balanced,
regardless of K or the distribution of the hits, an additional condition must be imposed when
even-bit labels are to be used. This condition is that the number of remaining bins must be greater
than the present imbalance (given by the DC-balance counter). Since bin delimiters act to restore
balance, as long as enough bins are left one can guarantee that balance will be restored by the end
of the result pattern. On the other hand, if use of an even-bit label would violate this condition,
then the next larger odd-bit label must be used instead, which will act to restore DC-balance. This
label selection method can now be used in general, not just for a 10-chip module, with the number
of bits in the longest label as demanded by the number of source patterns being combined.
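One way to read this label selection rule is sketched below in Python. It assumes that the label for m remaining possibilities is the shortest word length w for which at least m words with ⌊w/2⌋ ones exist, which reproduces the 10, 6 and 3 possibility cases quoted above; that general rule, and the names `balanced_width` and `choose_width`, are assumptions of this example rather than a statement of the authors' implementation.

```python
from math import comb


def balanced_width(m):
    """Smallest w such that at least m distinct w-bit words have floor(w/2) ones."""
    w = 0
    while comb(w, w // 2) < m:
        w += 1
    return w


def choose_width(m, bins_left, imbalance):
    """Label width for m remaining possibilities, given the DC-balance state."""
    w = balanced_width(m)
    # An even-bit label does not restore balance; allow it only if enough
    # bin delimiters remain to absorb the present imbalance.
    if w % 2 == 0 and not bins_left > abs(imbalance):
        w += 1                      # next larger odd-bit label
    return w


print([balanced_width(m) for m in (10, 6, 3)])        # [5, 4, 3]
print(choose_width(6, bins_left=100, imbalance=1))    # 4
print(choose_width(6, bins_left=1, imbalance=2))      # 5
```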
6.3 Pattern Overlay in the time domain and triggerless readout
We have analyzed the combination of data from 10 chips in one module and in a single event.
However, POC could also be applied entirely within a single chip to data from multiple events. In
this case the pattern labels would correspond to an event counter rather than identifying different
physical chips. Combination in the time domain has the advantage that it does not need to match
a physical hardware configuration, such as the number of chips in a module, and therefore the
number of source patterns, ns, can be optimized. However, in a triggered system, the combination
of many source patterns each from a different triggered event would imply a significant latency.
Thus POC encoding in the time domain is particularly suited to triggerless readout. For triggerless
readout applications, achieving the maximum possible encoding efficiency will likely be a driving
design consideration.
7. Conclusion and Outlook
We have developed techniques to analyze the efficiency of data encoding and transmission for
binary strip detector readout. This analysis has shown that in order to achieve high encoding
efficiency for readout of low occupancy detectors, it is necessary to aggregate in a non-trivial way
the data from either multiple readout chips or multiple frames. This is because the information
entropy increases logarithmically with channel count, while the number of bits used by any encoding
method applied to a small set of channels (e.g. a chip) will increase linearly with the number of
such sets. We have used this finding to develop an example encoding method called Pattern Overlay
Compression (POC), which indeed achieves high efficiency by aggregating data in a natural way,
well suited to multi-chip modules or to combination of consecutive events in triggerless readout.
Further work is in progress to develop a similar analysis of pixel detector readout, and to include
signal strength information, rather than just binary readout.
8. Appendix
8.1 Entropy of k Random Addresses
We wish to count the possible combinations of k ones and (n− k) zeroes with the condition that
none of the ones are adjacent, as defined in Sec. 2. Obviously only patterns with n ≥ 2k− 1 can
meet this condition. We begin with the unique pattern 1010...101, containing k ones and (k− 1)
zeroes. If n = 2k−1 this unique pattern is the only possibility. For n > 2k−1 (the case of interest
in this paper) there will be a number of zeroes “left over” given by n− (2k− 1). We now must
count all the possible ways to insert these additional zeroes into the 1010...101 pattern, in order
to generate all possible n-bit patterns with exactly k isolated ones. There are (k+1) places where
zeroes can be inserted (because the zeroes being inserted are indistinguishable from the zeroes
already present). This is equivalent to the problem of chopping a string of n− (2k− 1) zeroes
by inserting k boundaries, which can be represented by k ones. The boundaries (or ones) can go
anywhere. For example, if there are 4 zeroes to be chopped up by inserting 3 ones, we can have:
0000111, 0101010, 0110001, 1010010, etc. Thus, this is the familiar problem of choosing k ones
out of (n−k+1) bits, and the solution is the binomial coefficient $\binom{n-k+1}{k}$. Therefore, the
entropy of all the random n-bit patterns with k isolated ones is $\log_2\binom{n-k+1}{k}$.
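The counting argument can be checked by brute force for small n. The Python sketch below enumerates all placements of k ones in n bits, keeps those with no two ones adjacent, and compares the count with the binomial coefficient; the function name `count_isolated` is an assumption of this example.

```python
from itertools import combinations
from math import comb, log2


def count_isolated(n, k):
    """Number of n-bit patterns with exactly k ones, no two of them adjacent."""
    return sum(
        all(b - a > 1 for a, b in zip(pos, pos[1:]))
        for pos in combinations(range(n), k)
    )


for n in range(1, 13):
    for k in range(0, n + 1):
        assert count_isolated(n, k) == comb(n - k + 1, k)

# Entropy in bits for, e.g., k = 6 isolated ones among n = 256 addresses.
print(log2(comb(256 - 6 + 1, 6)))
```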
8.2 CAS Efficiency as a Function of n and Occupancy
The number of bits needed by CAS is $B = k\log_2(n)$. Therefore,
\[ \varepsilon_0 = \ln\binom{n-k+1}{k} \Big/ [k\ln(n)] \tag{8.1} \]
\[ \phantom{\varepsilon_0} = \{\ln[(n-k+1)!] - \ln[(n-2k+1)!] - \ln(k!)\} / [k\ln(n)] \tag{8.2} \]
We now substitute the following approximations, valid for $n \gg k \gg 1$: $\ln(x!) \approx x\ln(x) - x$
(Stirling's approximation) and $(n-k+1)[\ln(n-k+1) - \ln(n-2k+1)] \approx k$. After collecting terms we
obtain
\[ \varepsilon_0(n,k) \approx 1 - \frac{\ln(k) - 1}{\ln(n)} \tag{8.3} \]
Eq. 2.3 follows by substituting ln(k) = ln(n) + ln(α), where α = k/n.
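A quick numerical comparison of Eq. 8.1 with the approximation of Eq. 8.3 can be made as follows. This is only a sanity check under the stated condition n ≫ k ≫ 1; the function names are assumptions of this example.

```python
from math import comb, log


def eps0_exact(n, k):
    """Eq. 8.1: entropy of k isolated ones over the CAS bit count."""
    return log(comb(n - k + 1, k)) / (k * log(n))


def eps0_approx(n, k):
    """Eq. 8.3."""
    return 1 - (log(k) - 1) / log(n)


for n, k in [(10**4, 30), (10**6, 100), (10**8, 1000)]:
    print(n, k, eps0_exact(n, k), eps0_approx(n, k))
```

The residual difference comes mainly from the term ½·ln(2πk) that the simple form of Stirling's approximation drops.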
8.3 Entropy Overhead due to Engineering Properties
We start from Eq. 3.2 of section 3.1. To obtain an expression for a continuous range of H we use
the approximation $\binom{2m}{m} \approx 4^m/\sqrt{\pi m}$ and then make the substitution m → x, where x
is real instead of integer. Clearly this substitution is only possible in the approximate formula.
We can now write,
\[ H_{2x} \approx \log_2\left(\frac{4^x}{\sqrt{\pi x}}\right) \tag{8.4} \]
\[ H_{2x} \approx 2x - \tfrac{1}{2}\log_2(\pi x) \tag{8.5} \]
and
\[ E(H_{2x}) = 2x - H_{2x} \approx \tfrac{1}{2}\log_2(\pi x) \tag{8.6} \]
We numerically plotted E(H) vs. H in Fig. 7 (dotted) by looping over discrete values of x and
calculating the ordered pair (H(x), E(x)) for each x. However, we would like an expression for
E(H) rather than E(x). From Eq. 8.6 we have $x = 4^E/\pi$. Since we know $H \gg E(H)$, we can write,
\[ 2 \times 4^E/\pi \approx H + E \approx H \tag{8.7} \]
from which Eq. 3.3 follows. Eq. 3.3 is also shown in Fig. 7 (solid), where one can see that it is
indeed a good approximation to the numerical result.
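The comparison between the exact overhead and the closed form can be reproduced numerically with the sketch below. Solving Eq. 8.7 for E gives E ≈ ½·log2(πH/2); we assume here that this is the form taken by Eq. 3.3, which is not reproduced in this appendix. The function names are hypothetical.

```python
from math import comb, log2, pi


def overhead_exact(m):
    """Return (H, E) for a DC-balanced 2m-bit word: H = log2 C(2m, m), E = 2m - H."""
    h = log2(comb(2 * m, m))
    return h, 2 * m - h


def overhead_from_entropy(h):
    """E(H) obtained by solving Eq. 8.7, 2 * 4**E / pi = H, for E."""
    return 0.5 * log2(pi * h / 2)


for m in (5, 20, 50, 200):
    h, e = overhead_exact(m)
    print(2 * m, round(h, 3), round(e, 3), round(overhead_from_entropy(h), 3))
```

The agreement improves as the word length grows, as expected from the condition H ≫ E(H) used in the derivation.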
Acknowledgments
This work was supported in part by the Office of High Energy Physics of the U.S. Department
of Energy under contract DE-AC02-05CH11231. We thank A. Grillo and J. Agricola for helpful
comments.
References
[1] L. Evans and P. Bryant (editors), “The CERN Large Hadron Collider: Accelerator and Experiments:
The LHC Machine,” 2008 JINST 3 S08001.
[2] D. Wardrope et al., Instrumentation of the upgraded ATLAS tracker with a double buffer front-end
architecture for track triggering, 2012 JINST 7 C08010.
[3] E. Salvati et al., A Level-1 Track Trigger for CMS with double stack detectors and long barrel
approach, 2012 JINST 7 C08005.
[4] D. Abbaneo et al., A hybrid module architecture for a prompt momentum discriminating tracker at
HL-LHC, 2012 JINST 7 C09001.
[5] A. Schöning et al., A Self Seeded First Level Track Trigger for ATLAS, 2012 JINST 7 C10010.
[6] D. Salomon, “Data Compression,” Springer-Verlag London Limited (2007).
[7] PKWARE, inc., “.ZIP File Format Specification,” www.pkware.com (1989).
[8] T. M. Cover and J. A. Thomas, “Elements of Information Theory” (2nd edition), John Wiley & Sons,
Hoboken, New Jersey (2006).
[9] A. Widmer and P. Franaszek, “A dc-balanced, partitioned-block, 8b/10b transmission code,” IBM
Journal of Research and Development, 27(5), 440-451, (1983).
[10] R. C. Walker and R. Dugan, “64b/66b low-overhead coding proposal for serial links,” in IEEE 802.3
High Speed Study Group, update 1/12/00 (2000).
[11] W. W. Peterson and D. T. Brown, “Cyclic Codes for Error Detection,” in Proceedings of the IRE,
Volume 49, Issue 1 (1961).
[12] F. Campabadal et al., “Design and performance of the ABCD3TA ASIC for readout of silicon strip
detectors in the ATLAS semiconductor tracker”, Nucl. Instrum. Meth. A 552, p.292 (2005).
