Running cross-correlation using bitstream processing by Lande, T S et al.
Running Cross-Correlation using Bitstream Processing 
 
Tor Sverre Lande, Timothy G. Constandinou, Alison Burdett and Chris Toumazou 
 
Abstract: A novel architecture for running cross-correlation and convolution using 
bitstream processing is proposed. The computationally intensive multiplications inherent 
in cross-correlation and convolution are replaced by simple logic operations (AND, XOR) 
using a bitstream representation. The reduced complexity enables compact and 
energy-efficient silicon solutions suitable for small, portable devices such as wearable 
heartbeat-detecting electronics embedded in the ECG patch itself. 
 
Introduction: The time-domain operation known as cross-correlation is a 
computationally intensive algorithm due to the large number of multiplications required: 

$$r_l = \sum_{k=1}^{n} x(k)\, y(k-l)$$

The index l is the shift or offset parameter between the two sampled signals x(n) and y(n). 
For each shift in time, all n samples within the correlation window must be multiplied and 
accumulated, i.e. the products summed. In a system computing a running correlation, these n 
multiplications must be performed in parallel (or multiplexed at higher speed). Since 
multiplication is a power-hungry (or slow) operation, power-efficient hardware for 
cross-correlation is challenging to make. However, by changing the representation or coding 
of the signal, hardware-efficient equivalents of running cross-correlators exist. 

Bitstream processing: A popular data conversion technique is Δ−Σ modulation, which 
uses oversampling to move in-band noise to higher frequencies (noise shaping). As the 
sampling rate is increased, the precision required of the quantiser is relaxed. 
The simplest quantiser is just a single-bit comparator, comparing the input signal to 
some reference level. The output of such a coarse quantiser is known as a bitstream 
encoding of the input signal. This intermediate representation is usually decimated to 
the Nyquist sampling rate. 
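
As an illustration of how such a bitstream may be produced, the following is a minimal 
sketch of a first-order Δ−Σ modulator in Python. It is not taken from the letter: the 
function name, the zero-order-hold upsampling, the input range of [-1, 1] and the 
comparator reference level of 0 are all assumptions made for the example. 

```python
import numpy as np

def delta_sigma_1st_order(x, osr):
    """Encode samples x (assumed to lie in [-1, 1]) as a 1-bit stream.

    Assumptions of this sketch: zero-order-hold upsampling by `osr`,
    an ideal integrator, and a comparator with reference level 0.
    """
    held = np.repeat(np.asarray(x, dtype=float), osr)  # oversample by holding
    bits = np.empty(held.size, dtype=np.uint8)
    integrator = 0.0
    feedback = 0.0  # previous output bit mapped to -1 / +1 (0 before the first bit)
    for i, u in enumerate(held):
        integrator += u - feedback            # accumulate the quantisation error
        bit = 1 if integrator >= 0.0 else 0   # single-bit comparator
        feedback = 1.0 if bit else -1.0
        bits[i] = bit
    return bits

bits = delta_sigma_1st_order([0.5] * 4, osr=64)
density = bits.mean()  # fraction of ones; tracks (x + 1) / 2 for a constant input
```

For a constant input the density of ones in the bitstream follows the input value, which 
is the property that later allows counting ones to stand in for arithmetic on words. 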
 
However, this internal bitstream is another digital representation of the incoming signal 
mixed with high-frequency sampling noise and, quite conveniently, complex operations 
like multiplication simplify to bitwise AND or XOR operations. The idea of signal 
processing on bitstreams was proposed by O'Leary and Maloberti as early as 1990 [1], 
who proposed addition of bitstreams. In [2] and [3] more complex computations are 
proposed, such as reducing multiplication to simple AND gates. The use of bitstreams for 
filtering is also found in the literature [4]. In this letter we extend bitstream 
processing to general cross-correlation or convolution. 
 
In the following we estimate the figure-of-merit (FOM) for the multiplier-based 
correlator shown in Fig. 1 and compare it to a FOM estimate of the bitstream 
implementation shown in Fig. 2. An appropriate FOM for digital microelectronics is: 
$$\mathrm{FOM} = \text{clock speed} \cdot \text{transistor count}$$
The clock speeds are normalized to the Nyquist clock frequency (=1). 
 
Cross-correlation computation: A feasible hardware implementation of discrete-time, 
linear cross-correlation using a standard digital approach is shown in Fig. 1. A 
sequence of n samples (history) must be stored for both the incoming signal and the 
template. A multiply-and-accumulate unit is multiplexed at n times the Nyquist clock 
rate. 
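
The multiplexed multiply-and-accumulate scheme described above can be sketched in 
software as follows. This is only an illustrative model, not the authors' hardware: the 
function name and the example data are hypothetical, and the inner loop stands in for 
the n clock cycles spent per Nyquist-rate sample. 

```python
from collections import deque

def running_cross_correlation(signal, template):
    """Return one correlation value per incoming sample.

    The history register holds the latest n samples; each output needs n
    multiply-and-accumulate steps, which hardware would time-multiplex.
    """
    n = len(template)
    history = deque([0] * n, maxlen=n)  # signal history, oldest sample first
    out = []
    for sample in signal:
        history.append(sample)          # shift in the newest sample
        acc = 0
        for h, t in zip(history, template):
            acc += h * t                # one multiply-and-accumulate step
        out.append(acc)
    return out

r = running_cross_correlation([0, 1, 2, 1, 0, 0], [1, 2, 1])
```

The output peaks at the shift where the stored history lines up with the template, 
which is the event a running correlator is meant to detect. 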
 
The linear cross-correlation of length n is computed for each shift of the incoming signal 
by cycling through both the stored signal and the template and accumulating the result 
in the adder. This assumes a single-cycle multiplier, to avoid requiring a higher clock 
frequency. In [5] a power-optimized and area-efficient array multiplier architecture is 
reported. A rough, but optimistic, estimate of the transistor count extracted from the 
paper is shown in Table 1, where m denotes the number of bits of the multiplier. 
 
We have to include the transistors used for storing the signal history and the template 
profile. The latches proposed in [6] are the most efficient ones in extensive use. We 
adopt a differential 12-transistor static flip-flop based on this topology. The 
multiplexer could be implemented using a bus structure, but for simplicity we implement a 
multiplexer tree based on 3 gates (=12 transistors) per element. A full adder is assumed 
to use 30 transistors per bit. The FOM of the complete multiplier-based correlator array 
may then be estimated as: 
$$\mathrm{FOM}_M = \bigl(14m^2 + 140m + 24mn + 12(n-1) + 30m\bigr) \cdot n$$
It is important to note that this is an optimistic estimate based on the state-of-the-art 
array multiplier architecture excluding control logic. 
 
Bitstream cross-correlation: The key idea of Δ−Σ modulators is to trade an increased 
clock speed for a reduced number of quantization levels of the signal value, known as 
oversampling. The oversampling ratio, OSR, is defined as the ratio of the actual 
modulator clock to the Nyquist frequency. Each Nyquist-rate sample with m bits 
resolution is instead encoded as a sequence of OSR bits in the bitstream. As a 
consequence, the number of stored samples increases from n to OSR*n, but each sample is 
just a single bit (not a word). A bitstream correlator is shown in Fig. 2. 
 
A parallel multiplication architecture is feasible since the multipliers are replaced by 
simple AND gates (NAND gates will do, with a correction in the adder). We assume that 
the input signal, xb(n), is available encoded as a bitstream using some adequate Δ−Σ 
modulator. The history of the incoming bitstream signal is shifted through a register of 
length n*OSR, and the cross-correlation with the template, also stored as a bitstream, is 
computed for each shift. The corresponding cross-correlation is then computed by 
counting the number of ‘1’s at the AND gate outputs. An asynchronous counter/register 
using one RS-latch and 2 gates for each bit is combined with an encoder. No increased 
clock speed is required for encoding the correlation result to be accumulated in the adder. 
The estimated FOM is then: 
$$\mathrm{FOM}_B = \bigl((4 + 24)\, n \cdot \mathrm{OSR} + 16n + 30m\bigr) \cdot \mathrm{OSR}$$
The trade-offs of using bitstream processing for cross-correlation are shown in Fig. 3. 
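
The bitstream correlator described above (shift register, one AND gate per tap and a 
count of the ones at the gate outputs) can be modelled in a few lines of Python. This is 
a behavioural sketch only: the function name and the short example bit patterns are 
hypothetical, and the asynchronous counter and encoder are replaced by a plain sum. 

```python
from collections import deque

def bitstream_correlator(bits, template_bits):
    """Count coinciding ones between the shifted input and the template."""
    length = len(template_bits)  # corresponds to n * OSR in the text
    register = deque([0] * length, maxlen=length)
    counts = []
    for b in bits:
        register.append(b)       # shift the input bitstream by one position
        # One AND gate per tap, then count the ones at the gate outputs.
        counts.append(sum(r & t for r, t in zip(register, template_bits)))
    return counts

counts = bitstream_correlator([0, 1, 0, 1, 1, 0, 0], [1, 0, 1, 1])
```

The count is largest at the shift where the register contents coincide with the stored 
template bits. 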
 
The bitstream advantage is significant at low oversampling ratios (OSRs). The relative 
improvement also increases with the number of bits per sample. In practical 
implementations, high-precision Δ−Σ modulators with low OSR can be hard to make. 
A typical Δ−Σ modulator [7] with OSR=8 and 14 bits resolution will reduce the transistor 
count of the cross-correlator by a factor of 13. For OSR>32, minor or no improvements 
are expected using bitstream processing. 
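
To explore this trade-off numerically, the two FOM estimates can be evaluated side by 
side. The sketch below assumes the estimates read FOM_M = (14m^2 + 140m + 24mn + 
12(n-1) + 30m)*n and FOM_B = ((4 + 24)*n*OSR + 16n + 30m)*OSR. The correlation length n 
is not stated in the text, so n = 64 is purely an assumed value, and the function names 
are hypothetical. 

```python
def fom_multiplier(m, n):
    """FOM of the multiplier-based correlator (clock speed n, Nyquist = 1)."""
    transistors = 14 * m**2 + 140 * m + 24 * m * n + 12 * (n - 1) + 30 * m
    return transistors * n

def fom_bitstream(m, n, osr):
    """FOM of the bitstream correlator (clock speed OSR, Nyquist = 1)."""
    transistors = (4 + 24) * n * osr + 16 * n + 30 * m
    return transistors * osr

m, n = 14, 64  # m from the text; n = 64 is an assumption for illustration
for osr in (8, 16, 32, 64):
    ratio = fom_multiplier(m, n) / fom_bitstream(m, n, osr)
    print(f"OSR={osr:3d}  FOM_M / FOM_B = {ratio:.2f}")
```

Because FOM_B grows roughly with the square of OSR while FOM_M does not depend on it, 
the ratio falls quickly as the oversampling ratio is raised. 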
 
The continuous cross-correlation computed at the oversampling rate is mixed with 
significant high-frequency noise. As for all Δ−Σ modulators, decimation is required. A 
simple first-order low-pass filter (sinc) can be made by averaging the computed results 
down to the Nyquist rate. Depending on the modulator order, more elaborate decimators 
must be used. A higher-order decimator increases hardware complexity somewhat, but since 
moving-average filtering is often used in decimators, no multiplication is required. 
Simple adders are used, and the number of adders increases with the decimator order. The 
increased hardware demand is therefore minor. 
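
The first-order averaging decimator mentioned above amounts to a block average over OSR 
consecutive values. A minimal sketch, assuming non-overlapping blocks and a hypothetical 
function name, could look like this: 

```python
import numpy as np

def decimate_by_averaging(values, osr):
    """First-order sinc decimation: average each block of `osr` values."""
    values = np.asarray(values, dtype=float)
    usable = (values.size // osr) * osr  # drop an incomplete tail block
    return values[:usable].reshape(-1, osr).mean(axis=1)

y = decimate_by_averaging([1, 3, 2, 2, 0, 4, 4, 0, 9], osr=4)
```

Only an accumulator is needed per block, and for a power-of-two OSR the division reduces 
to a bit shift, in line with the remark that no multiplier is required. 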
 
In order to evaluate the signal processing quality on real signals, a complete model of 
the bitstream cross-correlator was programmed in MATLAB and compared to the MATLAB 
xcorr() function. ECG measurements from the MIT-BIH database were used as the signal 
source (10bits@250Hz). A sigma-delta bitstream was created using a simple first-order 
modulator with linear interpolation and 64 times oversampling. The template was created 
from the first beat and upconverted to a bitstream. The simple first-order decimation (as 
proposed above) was used, aiming at reliable heartbeat detection. 
 
The upper trace in Fig. 4 shows the cross-correlation using bitstream processing, while 
the lower trace is a “true” cross-correlation using the xcorr() function in MATLAB. 
Although the bitstream cross-correlation still contains some noise components, 
heartbeats are easy to detect using simple thresholding. 

It should be noted that convolution can be done using the same hardware by simply 
reversing the template sequence. Since convolution in the time domain is equivalent to 
multiplication in the frequency domain, efficient filters can be realized by generating 
the appropriate template. Another improvement is to replace the AND gate with an XOR 
gate, so that ‘0’ states in the bitstreams are also correlated. From a signal-processing 
perspective, multiplication can be simplified to addition/subtraction since both signals 
are scaled. 
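
The equivalence between convolution and correlation with a reversed template can be 
checked numerically. The snippet below uses NumPy's correlate and convolve on ordinary 
sample values rather than bitstreams; the signal and the impulse response h are 
hypothetical values chosen only for the check. 

```python
import numpy as np

x = np.array([1.0, 2.0, 0.0, -1.0, 3.0])
h = np.array([0.5, 1.0, -0.5])  # hypothetical filter impulse response

# Correlating with the time-reversed template yields the convolution.
conv_via_corr = np.correlate(x, h[::-1], mode="full")
conv_direct = np.convolve(x, h, mode="full")
same = np.allclose(conv_via_corr, conv_direct)
```

The same correlator structure can therefore act as an FIR filter once the stored 
template is loaded in reverse order. 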
 
Conclusion: In this letter we have presented a novel, generic hardware architecture for 
running cross-correlation/convolution using bitstream processing, suitable for 
power-efficient implementation in silicon. The advantage of bitstream processing is 
highest for low oversampling ratios and high resolution. 
 
Acknowledgements: The authors wish to acknowledge Toumaz Technology Limited for 
sponsoring this research. 
 
References: 
[1] O'LEARY, P., MALOBERTI, F., ‘Bit stream adder for oversampling coded data,’ IET 
Electronics Letters, 1990, 26 (20), pp. 1708-1709 
[2] MALOBERTI, F., O’LEARY, P., ‘Processing of signals in their oversampled delta-
sigma domain,’ Int Conf. Circ. Syst, China, 1991, 12, pp. 438-441. 
[3] MALOBERTI, F., ‘Non conventional signal processing by the use of sigma delta 
technique: a tutorial introduction,’ IEEE Int Symp. Circ. Syst, 1992, 6, pp. 2645-2648.  
[4] SUMMERFIELD, S., KERSHAW, S.M., SANDLER, M.B., 'Sigma-Delta bitstream 
filtering in VLSI,' IEEE Proc. Midwest Symp. Circ. Syst, 1994, 2, pp. 1200-1203. 
[5] HSU, S., VENKATRAMAN, V., MATHEW, S., KAUL, H., ANDERS, M., DIGHE, S., 
BURLESON, W., KRISHNAMURTHY, R., 'A 2GHz 13.6mW 12 × 9b multiplier 
for energy efficient FFT accelerators,' IEEE ESSCIRC, 2005, pp. 199-202. 
[6] JIREN, Y., SVENSSON, C., 'New single-clock CMOS latches and flipflops with 
improved speed and power savings,' IEEE JSSC, 1997, 32 (1), pp. 62-69. 
[7] JIANG, R., FIEZ, T.S., 'A 1.8 V 14 b ΔΣ A/D converter with 4 MSamples/s 
conversion,' IEEE ISSCC, 2002, 1, pp. 220-461. 
Figure Captions: 
Fig. 1 Cross correlation principle 
Fig. 2 Bitstream correlator 
Fig. 3 The relative improvements of bitstream processing 
Fig. 4 Cross-correlation of measured heartbeats from the MIT-BIH database. Upper 
trace is the bitstream cross-correlation and lower trace the MATLAB xcorr() function. 
 
Table Captions: 
Table 1 Multiplier transistor count extracted from [5] 
Table 1: 
Module        Count    Transistors per module 
Adder         m^2      14 
Compressor    4m       28 
CP adder      2m       14 
 
