Abstract: Ultra-wideband (UWB) systems are hard to implement in standard CMOS technology. In this paper we present a novel spatial RAKE receiver, exploring mixed-mode circuits for symbol detection and inverter delay lines for synchronization. The receiver is implemented as a RAKE structure combining digital shift registers with analog computation in a series of parallel taps of a synchronizing delay line. In each parallel bit stream the incoming signal is cross-correlated with a stored template. By combining a delay line and a mixed-mode correlator we can exploit multipath reflections in a time-domain statistical computation for symbol recovery.
The concept of impulse radio (IR) has interesting properties. The wide transmission band makes penetration through different materials better than narrow band transmission. The lack of carrier may be traded for low power solutions provided a power efficient receiver may be implemented. Unlike narrow band radio, demanding statistical computation must be carried out. This is often done in a parallel (RAKE) architecture.
Although several portable applications are striving for higher bandwidth, there are also demands for short-range, low-bandwidth communication links, as in wearable and implantable microelectronics. In several of these applications ultra-low power is important. In addition, other properties of impulse radio transmissions may be appreciated, such as interference immunity and penetration.
The purpose of this paper is to explore low-power solutions for correlator-based impulse radio receivers. A mixed-mode RAKE-like structure is realized in a standard 0.12 µm CMOS technology. Simulations are carried out and show promising results with regard to power consumption and overall functionality. Measured results are presented confirming basic functionality of the circuit.
II. UWB POWER TRADE-OFFS.
In current impulse radio receivers, symbol recovery accounts for the major part of the power consumption. A typical receiver is shown in Figure 1. The transmitted pulse is received by a broadband antenna. After some crude bandpass filtering (not shown) the impulse is matched with a template. Since the emitted energy is severely restricted, the UWB signal is virtually buried in white noise. Through integration, pulses are recovered and quantized (ADC). With significant noise present and interfering transmissions from other sources, several erroneous detections will occur. In order to transmit a symbol ('0' or '1'), a number of pulses must therefore be combined for one symbol. Periodic repetition of emitted pulses is impossible due to FCC regulations, so pseudo random (PR) sequences of pulses are used for symbol encoding. Typically 50-100 pulses are used for each symbol. An additional benefit of pseudo random coding is that both scrambling and channelization are achieved as well. Sparsely populated PR sequences may enable a large number of simultaneous transmissions with minor interference (robust wireless link).
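As a minimal sketch of why many pulses are combined per symbol, the following Python model encodes one symbol as a pseudo random chip sequence and decides by majority agreement with the stored code. The function names, the use of the complemented code for '0', and the independent bit-flip noise model are assumptions made for illustration; they are not taken from the receiver described in this paper.

```python
import random


def make_pr_sequence(length, seed):
    """Return a pseudo random list of 0/1 chips (hypothetical code generator)."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(length)]


def transmit(symbol, code):
    """Assumed encoding: send the code itself for '1', its complement for '0'."""
    return [c if symbol == 1 else 1 - c for c in code]


def noisy_channel(chips, flip_prob, rng):
    """Flip each detected chip independently with probability flip_prob."""
    return [c ^ 1 if rng.random() < flip_prob else c for c in chips]


def detect(received, code):
    """Decide the symbol from the number of chips that agree with the code."""
    matches = sum(1 for r, c in zip(received, code) if r == c)
    return 1 if 2 * matches > len(code) else 0


if __name__ == "__main__":
    code = make_pr_sequence(64, seed=1)
    rng = random.Random(2)
    received = noisy_channel(transmit(1, code), flip_prob=0.2, rng=rng)
    print("detected symbol:", detect(received, code))
```

With a single pulse per symbol, one erroneous detection would flip the symbol; with a long code, a decision error requires more than half of the chips to be wrong.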
The faint monocycle buried in noise is also reflected by the surroundings, giving delayed copies or reflections. In narrow band systems this kind of interference is destructive (fading). In impulse radio technology multipath pulses are exploited constructively to reconstruct symbols. The pulse sequence is correlated in the time domain with the expected pseudo random pattern. In the presence of reflections, the pseudo random pattern will be repeated. Although a large number of multipath pulses are reported for one emitted pulse [2], only 2-3 fingers are used in the RAKE receiver. The main reason is the increased computational demand when implemented on a DSP.
Aiming at lower power consumption in UWB systems, the DSP-implemented RAKE receiver structure is certainly an obvious candidate for power reduction. Current solutions depend on DSP hardware running at several GHz. In order to understand the trade-offs, three different time scales may be identified (Figure 2). In Figure 3 the bit sequence held in the shift registers is cross-correlated with a stored PR sequence. As bits are shifted through the shift register, a running cross-correlation is done against the stored PR sequence. With more than one symbol, correlators and code registers must be duplicated. The different fingers not only correlate the PR sequences; the PR sequences are also synchronized. We are now able to compute the degree of combined match by simultaneously combining the correlation output from all the correlators. However, we need one combiner for each symbol.
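The running cross-correlation described above can be sketched in a few lines of Python. This is a behavioural model only: the function name and the choice to count agreeing positions on both ones and zeros are assumptions, and the circuit performs this comparison in parallel hardware rather than in a loop.

```python
from collections import deque


def running_correlation(bit_stream, template):
    """Yield the number of matching positions after each new bit is shifted in.

    The register starts out filled with zeros. New bits enter on the right
    and the oldest bit falls out on the left, so the register lines up with
    the template once the whole template has been shifted in.
    """
    n = len(template)
    register = deque([0] * n, maxlen=n)
    for bit in bit_stream:
        register.append(bit)
        yield sum(1 for r, t in zip(register, template) if r == t)


if __name__ == "__main__":
    template = [1, 0, 1, 1]
    stream = [0, 0] + template + [0, 1]
    scores = list(running_correlation(stream, template))
    # The score peaks at len(template) when the stored pattern is aligned.
    print(scores)
```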
By exploiting spatial "maps" using delay lines we are able to trade parallel correlation for lower clock frequency. The question is how this may be implemented to achieve overall lower power consumption [5]. With pulse durations below 1 ns, processing rates of several GHz must be used in order to combine a number of RAKE fingers and determine the probability of a transmitted symbol with a DSP approach. Our aim is to explore mixed-mode circuits to implement a real-time RAKE structure, reducing the clock frequency by two orders of magnitude [5].
III. THE ORTHOGONAL RAKE ARCHITECTURE
As shown in Figure 3 the incoming pulse is detected and quantized [4] in real time without any synchronization or clocking. The received and quantized pulse sequence is "stored" in a delay line using standard inverters. The delay spans one bin (typically 50 ns), possibly containing the received pulse with reflections. The delay line is sampled with a clock reflecting the transmitter's pulse repetition rate. For low-bandwidth transmissions this may be on the order of a few MHz.
The samples from the delay line are clocked into shift registers making up the RAKE fingers. Some finger will ...
The full RAKE receiver structure in Figure 3 is after all a considerable matrix of correlators and shift registers. It is built up as a 50x50 matrix of cells in a finger structure, each cell consisting of one D-latch and one correlator, for a total of 2500 correlators and D-latches.
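To illustrate how the delay line maps arrival time onto finger position, the sketch below models each sampling instant as a list of tap bits and regroups the samples per finger. It is a simplified behavioural model, assuming that the direct path and its reflections land on the same taps in every frame; the function names, the example code and the tap positions are hypothetical.

```python
def build_frames(code, path_taps, n_taps):
    """Place each code chip at the taps hit by the direct path and reflections."""
    frames = []
    for chip in code:
        frame = [0] * n_taps
        for tap in path_taps:
            frame[tap] = chip
        frames.append(frame)
    return frames


def sample_delay_line(frames):
    """Regroup per-frame tap samples into one bit stream per finger.

    frames[f][k] is the bit at tap k when the sampling clock fires for
    frame f, so finger k receives frames[0][k], frames[1][k], and so on.
    """
    n_taps = len(frames[0])
    return [[frame[k] for frame in frames] for k in range(n_taps)]


def finger_scores(fingers, code):
    """Count matching ones per finger (ones-only correlation)."""
    return [sum(1 for b, c in zip(f, code) if b == 1 and c == 1)
            for f in fingers]


if __name__ == "__main__":
    code = [1, 0, 1, 1, 0, 1, 0, 0]
    fingers = sample_delay_line(build_frames(code, [3, 17, 29], n_taps=50))
    scores = finger_scores(fingers, code)
    # Only the fingers aligned with a path collect the full pattern.
    print([k for k, s in enumerate(scores) if s > 0])
```

Each finger is clocked only once per frame, which is why the sampling clock can follow the pulse repetition rate instead of the pulse duration.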
The arriving pulses have been shaped by a front-end and are distributed as 1 ns pulses through the delay line [4]. No delay is needed for the first finger, so the delay line consists of 49 elements with 30 standard minimum-sized inverters in each element, which adds up to 1470 inverters. Aiming at a unit delay of approximately 1 ns, 30 inverters should match this requirement with the process used.
The computation involved is statistical in the sense of finding a probability of symbol occurrence. Striving for lower power, we return to analog computation, as a correlator may be implemented very simply using three transistors.
However, it is interesting to investigate the property of channelization. For PR sequences of a given length there is a trade-off between channelization and bit error rate (BER). An increasing number of channels results in a higher BER. Increasing the length of the PR sequence will improve both.
By including simple digital logic, two different detection modes are available. Detecting only matching '1's is effective in a noisy environment, while correlation of both '1' and '0' may be more efficient, enabling shorter symbol sequences. Thus we needed the possibility to set the operation mode of the correlator between correlation on only ones and on both ones and zeroes, which gives a combined XOR and AND gate. Extra logic had to be included in the correlator circuit in order to achieve the desired functionality, as depicted conceptually in Figure 4.
Figure 4. The implemented correlator circuit
The correlator consists of two parts. Two pMOS transistors control the correlation current in case of a match, and a differential cascode-coupled circuit performs the actual comparison between the two inputs in a pull-down network (PDN). The PDN consists of 13 nMOS transistors implementing the required combinatorial function. This kind of logic combines two concepts, differential logic and positive feedback, and requires that each input is provided in complementary format [6]. In order to meet this requirement three inverters, one for each input, are embedded in the PDN. The Control input decides whether the correlator matches only ones or both binary levels, while the two other inputs are the stored template and the output of the D-latch in each cell. The Vbias input controls the current flowing to the combining output line, or combiner line, which contributes to pulling the voltage level on the combiner line up to the threshold level. Vmatch is the actual connection to the combiner line.
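The logical behaviour of one correlator cell can be summarized in a small truth-function sketch. The function and parameter names are hypothetical, and the polarity of the Control input (here `match_zeros`) is an assumption. Matching on both levels is written as an XNOR, that is, an inverted XOR of the two inputs.

```python
def correlator_match(data_bit, template_bit, match_zeros):
    """Return 1 when the cell should source its unit current, else 0.

    match_zeros=False: ones-only mode, an AND of data and template.
    match_zeros=True:  both levels count, an XNOR of data and template.
    """
    if match_zeros:
        return 1 if data_bit == template_bit else 0
    return data_bit & template_bit


if __name__ == "__main__":
    for mode in (False, True):
        table = [correlator_match(d, t, mode) for d in (0, 1) for t in (0, 1)]
        print("match_zeros =", mode, "->", table)
```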
$$I_{finger} = \sum_{j=1}^{N} C_j I_j$$
where $N$ is the length of the pseudo random sequence, $C_j \in [0,1]$ and $I_j$ is the unit current from each correlator.
The finger current, Ifinger, is directly proportional to the degree of match between the stored pseudo random pattern and the received bit sequence. Simply by matching the finger current with an appropriate pull-down current, the output will be high if and only if an appropriate degree of match is present. The pull-down current also provides a simple pre-charging of the combiner line. This pre-charging is performed by two nMOS transistors in series, like a current-mode AND gate. One transistor runs on the inverted clock to pull down the combiner line between samples, while the other transistor is for current limitation.
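The thresholding principle can be illustrated with a simple numerical model, assuming that the finger current is the sum of equal unit currents from all matching cells and that the output goes high when this sum exceeds the pull-down current. The function names and the current values (given in arbitrary units) are hypothetical; the sketch ignores all analog effects such as line capacitance and transistor mismatch.

```python
def finger_current(register_bits, template, unit_current, match_zeros=False):
    """Sum the unit currents of all matching cells in one finger."""
    total = 0.0
    for d, t in zip(register_bits, template):
        if match_zeros:
            c = 1 if d == t else 0   # match on both ones and zeros
        else:
            c = d & t                # match on ones only
        total += c * unit_current
    return total


def comparator_output(i_finger, i_pulldown):
    """Output is high only if the finger current exceeds the pull-down current."""
    return i_finger > i_pulldown


if __name__ == "__main__":
    template = [1, 1, 0, 1, 1, 0, 1, 1]   # six ones
    good = [1, 1, 0, 1, 1, 0, 1, 1]       # full match
    poor = [1, 0, 0, 1, 0, 1, 1, 1]       # four matching ones
    i_pulldown = 4.5                      # hypothetical threshold, arbitrary units
    for bits in (good, poor):
        i = finger_current(bits, template, unit_current=1.0)
        print(i, comparator_output(i, i_pulldown))
```

Changing the pull-down current moves the required degree of match without touching the digital part of the finger.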
The comparator at the output of each finger (Figure 6) is a simple structure chosen because of its low power consumption and self-biasing properties, in addition to a fairly good range of adjustment [3]. With some additional pulse shaping and driving capability in the output inverters this should be sufficient. In order to reduce the number of output pads required for the output signals, six pre-designed 8-to-1 multiplexers were used. These will operate at a 160 MHz clock frequency in order to keep up with the streaming data from all outputs.
The RAKE receiver is realized in a standard 0.12 µm process on the same silicon as a front-end circuit [4]. The size of the complete receiver structure is 993 µm by 682 µm and it contains just over 103000 transistors. Figure 7 shows a picture of the RAKE receiver. On the left side the delay line can be seen, while on the right side the six multiplexers and the output from the comparators are easily observed. Figure 8 shows the complete chip with pads. The RAKE receiver is easily observed. The front-end is seen in the upper left corner with its decoupling capacitors as black squares. The total size of the chip is 1792 µm by 1333 µm.
So far simulations show promising results with regard to power consumption. Based on simulations, the estimated idle power consumption of the receiver is about 50 nW. The estimate for a fully correlating finger with a sampling frequency of 20 MHz is about 10 µW. Similar hard-wired receiver topologies for comparison are hard to find.
V. PARTIAL RAKE RECEIVER
The proposed structure in Figure 3 is often called a full RAKE receiver, with a finger for every available tap of the delay line. It is however possible to reduce the number of fingers if the interesting bits occur repeatedly in the same position of the delay line. This may be possible if the clock is locked to some property of the bit stream. It is reasonable to assume some clustering [2], since reflections are a consequence of an initial emitted pulse. A possible approach could be to measure the "energy" of the lower part of the delay line. Provided some clustering around the emitted pulse exists, the clock could be tuned to achieve the highest energy at that location. We may again turn to a current-mode approach. If digitally controlled current sources are attached to the lower taps and summed on a wire, the total summed current should be proportional to the number of '1's in the lower part of the delay line. This in turn may be used to synchronize the clock with a PLL. Due to the pseudo random occurrence of the clusters, clock adjustments must be slow.
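The idea of tuning the clock to maximize the energy in the lower taps can be sketched as a search over sampling phases. This is an illustration only: the "lower part" is assumed to be the lowest tap indices, the summed current is replaced by a count of ones, and the exhaustive search stands in for the slow PLL adjustment suggested above. All names and the example frame are hypothetical.

```python
def lower_tap_energy(tap_bits, n_lower):
    """Count the ones in the lowest n_lower taps (stand-in for the summed current)."""
    return sum(tap_bits[:n_lower])


def best_clock_phase(frame_bits, n_lower):
    """Try every cyclic sampling phase; return (phase, energy) of the best one.

    Shifting the sampling phase by p moves the bit at position p to tap 0.
    Ties are resolved in favour of the smallest phase.
    """
    n = len(frame_bits)
    best_phase, best_energy = 0, -1
    for phase in range(n):
        shifted = frame_bits[phase:] + frame_bits[:phase]
        energy = lower_tap_energy(shifted, n_lower)
        if energy > best_energy:
            best_phase, best_energy = phase, energy
    return best_phase, best_energy


if __name__ == "__main__":
    frame = [0] * 50
    for position in (20, 21, 23, 26, 40):   # a cluster plus one stray detection
        frame[position] = 1
    print(best_clock_phase(frame, n_lower=8))
```

Once the cluster sits in the lower taps, only the fingers attached to those taps need to be active.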
The clock adjustment must however cope with the high-frequency bit stream, and significant current must be used to keep up. Based on simulations done on one finger, an increase in sampling frequency from 20 MHz to 40 MHz causes an estimated increase in power consumption of about 30%.
The upside is that the number of fingers may be reduced to taps at the lower part of the delay line. When earlier taps are not used, we may simply remove them and reduce the power consumption.
Another important consequence is that we may reduce the transmitted data rate by increasing bin length and still use the same receiver topology.
VI. SIMULATIONS
Figure 9 shows the result of a simulation of a delay element. The delay is just a few picoseconds less than 1 ns.
... ones, the correlator is set to correlate on only ones and the input of the finger is set to the positive power rail.
Figure 13. Basic functionality of a RAKE finger
Figure 13 shows the result of a measurement confirming the functionality of a RAKE finger at a sampling frequency of 40 MHz. The top waveform is the input data and the middle waveform is the output from the comparator in one finger. The bottom curve is a simulated output with a similar setup, at a higher data rate. The measurement is obtained by comparing input data with a sequence of ones. A match is thereby achieved when the input data is high. The result is an output signal consisting of bursts of pulses similar to the ones in Figure 12. The bottom waveform confirms the appearance of the output signal.
VIII. CONCLUSIONS
The power-efficient UWB receiver presented in this paper combines digital delay lines with analog computation to implement a full RAKE receiver. High-speed clocks are avoided, with an estimated maximum clock rate of 20 MHz. This enables a power-efficient implementation. Real-time correlation is implemented with a simple analog correlator circuit, and symbol probability matching is computed using Kirchhoff's current law. The receiver is realized in a standard 0.12 µm CMOS process, with simulations showing promising results and measurements confirming basic functionality.
