A novel synchronization scheme for mostly digital UWB impulse radio architecture by ZHANG QI
A NOVEL SYNCHRONIZATION  
SCHEME FOR MOSTLY DIGITAL  
UWB IMPULSE RADIO ARCHITECTURE




























A THESIS SUBMITTED 
 
FOR THE DEGREE OF MASTER OF ENGINEERING 
 
DEPARTMENT OF ELECTRICAL AND COMPUTER 
ENGINEERING 
 






Name: Zhang Qi 
Degree: Master of Engineering 
Dept: Electrical and Computer Engineering 




Ultra wideband (UWB) has become a hot area of research in recent years. Its promising
features, such as low power consumption and support for high data rates, make it a suitable
candidate for future short-range wireless solutions. While UWB can provide short-range,
extremely high-speed data communication, it can also be used in low data rate applications
such as biomedical devices and WPAN. Among all UWB system architectures, the impulse
radio (IR) structure facilitates low power, low complexity implementations. The design of
traditional IR UWB transceivers has been studied intensively in the literature. The
continuous downscaling of CMOS technology has led to a shift from analog circuits to
their digital counterparts. Mostly digital UWB transceivers have been reported in the
literature and have demonstrated promising results in terms of cost and power consumption.
However, some challenges remain in implementing a low power architecture. Among them,
synchronization remains a great challenge for UWB receiver design because of the
sub-nanosecond time scale of the transmitted UWB pulses. In this work,
traditional UWB transceiver synchronization architectures are studied and reviewed.
The conclusion is that traditional synchronization suffers from a trade-off
between system complexity and receiver performance. As such, a novel digital
synchronization scheme together with a mostly digital receiver architecture is proposed.
The receiver consists of a low noise amplifier, a threshold detector, a pulse capture block
and a digital signal processing block. The threshold detector performs an early quantization
of the received pulse, while the novel pulse capture block eliminates the traditional
exhaustive search algorithm for synchronization. Transmitted data are Barker-code
modulated and the DSP block in the receiver decodes the received data. The proposed
receiver is implemented in a standard 0.35 µm CMOS process. Simulation and
measurement results are presented; the simulated overall power consumption
of the receiver without the LNA is 1.9 mW and the silicon area is only 0.19 mm².
The low power and small area make the proposed scheme suitable for low power,
low data rate applications.
 









I would like to thank my project supervisor, Associate Professor Lian Yong, for his 
continuous guidance and support throughout the two-year project period. I would also 
like to express my gratitude to my fellow lab mates for valuable discussions. 
 
This work was partially supported by Singapore Agency for Science, Technology and 
Research (A*STAR) under Thematic Strategic Research Programme: UWB Enabled 
Sentient Computing and Faculty of Engineering of National University of Singapore. 
 











Table of Contents 
 
List of Figures…………………………………………………………………………...5  
List of Tables…………………………………………………………………………….9 
List of Symbols…………………………………………………………………………..9  
Summary………………………………………………………………………………..11 
 
Chapter 1 - Introduction and Literature Review                                     12 
 
1.1 A Brief introduction of Ultra Wide Band system……………………………….12 
1.2 The FCC regulations………………………………………………………….....13 
1.3 Literature review of current UWB transceiver architecture…………………….16 
     1.31 A comparison between UWB IR and carrier based RF transceiver              19 
1.4 Possible multiple access scheme for UWB IR architecture…………………….21 
1.5 Impulse Radio UWB transmitter………………………………………………..22 
1.6 Impulse Radio UWB receiver…………………………………………………..26 
1.7 Shortcomings of Traditional Synchronization Schemes………………………..27 
1.8 Objective of This Project…………..…………………………………………...30 
1.81 Organization of the Thesis Main Body                                                        31 
 
Chapter 2 – Digital UWB Receiver Architecture                                   33 
 
     2.1 Threshold Detection Scheme........……………………………………………..34 
  
            2.11 The unity gain buffer                                                                                   36 
            2.12 The threshold detector                                                                                 38 
 
     2.2 Design of High Speed Edge Triggered DFF………….......................................41 
 
            2.21 A close look at a MOS transistor                                                               42 
            2.22 The systematic optimization process                                                         45 
            2.23 An improved version of DFF                   55 
            2.24 CMOS implementation and simulation/measurement Results      57 
 
      2.3 A Novel Pulse Capture Block…………………………………………………61 
        2.31 Proposed structure of a novel pulse capture block                                      61 
           2.32 Layout and Some Measurement Results                                                     65 
 
Chapter 3 – Implementation of the Synchronization Scheme  70 
 
      3.1 DSSS technique.................................................................................................71 
      3.2 Barker sequence.................................................................................................71 
            3.21 Choice of Barker Code            73 
 
      3.3 Proposed Synchronization Scheme.…………………………………………..73 
            3.31 The Search Mechanism           73 
            3.32 Synchronization Algorithm                   75 
            3.33 Declaration of synchronization           76 
      3.4 Bitwise Correlator…………………………………………………………….77 
      3.5 Power saving feature………………………………………………………….78 
      3.6 The DSP Block………………………………………………………………..81 
            3.61 The Binary Merge Adder                                                                           82 
            3.62 Threshold Select Block                                                                              83 
      3.7 The Synchronization Core…………………………………………………….85 
  
            3.71 Synchronization Comparator                                                                     86 
            3.72 Data Validity block                                                                                    88 
                    3.721 The ASIC counter                                                                            89 
                    3.722 The End of Data indicator                                                                92 
                    3.723 Data Reset Block                                                                              93            
                    3.724 Counter control block                                                                    94 
      3.8 Data decoder block………………………………………………………….95 
      3.9 The choice of Barker sequence revisited……………………………………96 
      3.10 Proposed receiver structure and layout diagram…………………………...97 
 
Chapter 4 – Simulation and Measurement Results                           99 
      
      4.1 Simulation result for RF to baseband conversion…………………………..100 
      4.2 Simulation for synchronization and re-synchronization……………………101  
      4.3 PCB design using Altium Designer………………………………………...103 
      4.4 Measurement result for DSP Barker code demodulation…………………..104 
      4.5 Merits of the proposed synchronization scheme ………………………......105 
 
Chapter 5 - Conclusion and Future Work              106 
 
      5.1 A possible waveform for sub-1GHz UWB pulse...........................................106 
      5.2 Another possible modulation using BPSK.....................................................109 
      5.3 Automatic threshold adjustment.....................................................................110 
      5.4 Intermittent LNA operation............................................................................110 
      5.5 Automatic channel threshold selection...........................................................111 
      5.6 Multi-finger for multipath energy harvesting.................................................111 






LIST OF FIGURES 
Fig. 1.1 FCC-regulated spectral mask for UWB indoor communication systems         p.14 
Fig. 1.2 A typical system architecture for an OFDM UWB transceiver                        p.16 
Fig. 1.3 A typical transmitter for UWB impulse radio system                                       p.17 
Fig. 1.4 A typical receiver for UWB impulse radio system                                           p.17 
Fig. 1.5 A typical carrier based RF transmitter                                                              p.19 
Fig. 1.6 A typical carrier based RF receiver                                                                  p.20 
Fig. 1.7 Multiple Access for Impulse Radio UWB system                                            p.21 
Fig. 1.8 Digital UWB pulse generator schematic and timing diagram                          p.23 
Fig. 1.9 Generated digital UWB pulse and its power spectrum                                     p.24 
Fig. 1.10 Direct generation of UWB signal by baseband data                                       p.25 
Fig. 1.11 Receiver architecture proposed in [10]                                                           p.26 
Fig. 1.12 Synchronization algorithm implemented in [10]                                            p.28 
Fig. 1.13 Exhaustive synchronization search algorithm implemented in [12]              p.29 
Fig. 2 Proposed receiver architecture                                                                            p.33 
Fig. 2.1 Structure of the implemented threshold detector                                                       p.35 
Fig. 2.2 CMOS schematic diagram of the unity gain buffer                                                 p.36 
Fig. 2.3 Post layout simulation result for the op-amp                                                    p.37
Fig. 3.4 Schematic of the implemented threshold detector                                           p.38 
Fig. 3.5 CMOS schematic of the implemented threshold detector                               p.39 
Fig. 3.6 Measurement result of threshold detector output                                            p.40 
Fig. 3.7 Basic structure of CMOS transistor                                                                 p.42 
  
Fig. 3.8 CMOS transistor with its source tied to GND                                               p.43 
Fig. 3.9 CMOS transistor in a string of transistors                                                     p.44 
Fig. 2.10 Schematic diagram for a pulse triggered DFF                                             p.46 
Fig. 2.11 Timing diagram for the pulse triggered DFF in toggle configuration         p.47 
Fig. 2.12 Transistor size optimization for state transition 1                                        p.47 
Fig. 2.13 Transistor size optimization for state transition 2                                        p.50 
Fig. 2.14 Transistor size optimization for state transition 3                                        p.51 
Fig. 2.15 Transistor size optimization for state transition 4                                        p.53 
Fig. 2.16 Overall transistor size optimization                                                              p.54 
Fig. 2.17 Overall transistor size optimization for a modified DFF                              p.55 
Fig. 2.18 Overall transistor size optimization for an improved DFF to reduce 
inconsistency                                                                                                                p.56 
Fig. 2.19 CMOS Implemented DFF in toggle configuration                                       p.57 
Fig. 2.20 Structure of an N bit asynchronous counter                                                  p.58 
Fig. 2.21 Layout diagram for counter implemented in standard 0.13µm CMOS         p.58 
Fig. 2.22 Post layout simulation results for counter in standard 0.13µm CMOS         p.59 
Fig. 2.23 Measurement result of high speed DFF in toggle configuration                   p.60 
Fig. 2.24 Proposed pulse capture block                                                                        p.62 
Fig. 2.25 Post layout simulation result for Pulse Capture block                 p.63  
Fig. 2.26 Layout snapshot of the proposed pulse capture block                 p.65 
Fig. 2.27 Measurement result of threshold detector and TFF output                 p.66 
Fig. 2.28 Measurement result for pulse capture block showing conversion of 100mV p-p 
RZ OOK pulse data to full rail NRZ data                          p.67 
  
Fig. 2.29 Measurement results for receiver to decode alternate ‘1’s and ‘0’s         p.68
Fig. 3.1 Barker code autocorrelation illustration                                                     p.71 
Fig. 3.2 Barker code template correlation with corrupted received sequence         p.72 
Fig. 3.3 Search algorithm using barker sequence correlation                                  p.74 
Fig. 3.4 Receiver synchronization algorithm flowchart                                           p.76 
Fig. 3.5 Gate level schematic for a XNOR gate                                                     p.77 
Fig. 3.6 Modified Baseband digital synchronization architecture                            p.80 
Fig. 3.7 Block diagram for implemented receiver in Cadence                                 p.81 
Fig. 3.8 Cascading to form a 5 bit ripple carry adder                                               p.82 
Fig. 3.9 Gate level schematic for a 1bit half adder                                                   p.83 
Fig. 3.10 Schematic of correlation threshold select block                                        p.84 
Fig. 3.11 Schematic of the synchronization core block                                            p.85 
Fig. 3.12 Gate level schematic of the synchronization comparator                          p.88 
Fig. 3.13 Block diagram for data validity block                                                       p.88 
Fig. 3.14 Timing diagram for critical signals in the proposed scheme                     p.89 
Fig. 3.15 Functional representations of a JK flip flop and its truth table                 p.91 
Fig. 3.16 Gate level schematic of the implemented counter                                     p.92 
Fig. 3.17 Gate level schematic for data reset block                                                  p.93 
Fig. 3.18 Gate level schematic for data decoder block                                             p.95 
Fig. 3.19 Illustration of possible wrong location of data frame                                p.96 
Fig. 3.20 Layout snapshot for receiver in standard CMOS 0.35µm technology      p.98 
Fig. 4.1 Simulation input to the synchronization scheme                                         p.99 
Fig. 4.2 Post layout simulated outputs from the shift register                                  p.100 
  
Fig. 4.3 Post layout simulation result for proposed synchronization architecture   p.101 
Fig. 4.4 Post layout simulation result showing sync. lost and re-sync.                   p.102 
Fig. 4.5 Layout of the fabricated PCB using Altium Designer                               p.103 
Fig. 4.6 Measurement result for the proposed synchronization scheme                 p.104 
Fig. 5.1 Time domain 2nd order Gaussian derivative pulse                                     p.107 
Fig. 5.2 Power spectrum of a 2nd order Gaussian derivative pulse                         p.108 
Fig. 5.3 BPSK modulation showing binary phase                                                  p.109 

















LIST OF TABLES 
Table 1 XNOR truth table                                                                                           p.78 
Table 2 JK flip flop input derived from the special counting sequence                      p.91 
Table 3 Feedback control truth table                                                                           p.94 
Table 4 Data decoder truth table                                                                                  p.95 
 
LIST OF SYMBOLS AND ABBREVIATIONS 
Symbols: 
Vth – Threshold Voltage of a MOS Transistor 
Cgs – Gate Source Capacitance of a MOS Transistor 
Cgd – Gate Drain Capacitance of a MOS Transistor 
Csb – Source Bulk Capacitance of a MOS Transistor 
Cdb – Drain Bulk Capacitance of a MOS Transistor 
p – Probability of a Wrongly Decoded Bit 
 
Abbreviations:  
UWB – Ultra Wide Band 
WPAN – Wireless Personal Area Network 
CMOS – Complementary Metal Oxide Semiconductor 
DSP – Digital Signal Processing 
LNA – Low Noise Amplifier 
FCC – Federal Communications Commission 
RF – Radio Frequency  
  
GPS – Global Positioning System 
SNR – Signal to Noise Ratio 
OFDM – Orthogonal Frequency Division Multiplexing 
ADC – Analog to Digital Converter 
OOK – On Off Keying 
PPM – Pulse Position Modulation 
BPSK – Binary Phase Shift Keying 
SQNR – Signal to Quantization Noise Ratio 
IR – Impulse Radio 
LO – Local Oscillator 
CDMA – Code Division Multiple Access 
EIRP – Equivalent Isotropically Radiated Power 
DFF – Data Flip Flop  
DSSS – Direct Sequence Spread Spectrum 
DAC – Digital to Analog Converter 
ASIC – Application Specific Integrated Circuit 
TFF – Toggle Flip Flop  
RZ – Return to Zero  
NRZ – Non Return to Zero  
PN – Pseudo Noise  
BCM – Barker Code Modulation  





At the start of the project, a brief study of wireless communication systems was performed. The 
relevant spectrum and other regulations related to UWB were also studied.  
The aim of this project is to build a mostly digital UWB transceiver in which power 
consumption is a major design consideration. With this objective in mind, current UWB 
transceiver structures were studied intensively, and the impulse radio implementation was chosen 
as a suitable candidate for a low power, low complexity design. During the research, 
the synchronization architecture was identified as one of the greatest challenges 
in implementing power efficient transceivers. An intensive literature review of current 
synchronization schemes was therefore performed and their pros and cons were summarized. 
To address the synchronization challenge, a power efficient synchronization scheme was 
proposed, based on a novel application of a high speed asynchronous toggle flip flop in a 
UWB transceiver system. The high speed pulse triggered flip flop was designed and optimized in 
a systematic manner. It was implemented in standard 0.35 µm CMOS technology, and 
measurement results showed that it can capture extremely narrow pulses of 200 ps width.  
Using this flip flop, a pulse capture block was also proposed and implemented, which performs a 
direct down conversion of the UWB pulses from the radio frequency band to baseband. 
Measurements confirmed its functionality.  
Finally, a complete receiver architecture featuring the new synchronization scheme is proposed 
and implemented. Barker code modulation is introduced to facilitate synchronization 
tracking. Measurements carried out on the fabricated chip confirmed the feasibility of the 
proposed scheme. The total simulated power consumption of the receiver excluding the LNA is 
1.9 mW at a data rate of 2 Mb/s. 
  
Chapter 1 – Introduction and Literature Review 
 
1.1 A Brief introduction of Ultra Wide Band system 
A constantly changing world has brought technological advances that raise our standard 
of living. From the era of posting letters as a means of communication to the era of 
electronic mail, the progress of technology has benefitted us in virtually all 
aspects of life. Communication systems have in turn moved toward greater freedom. 
Wireless transmission has found various applications in daily life, from the mobile 
phones we use to the wireless internet we surf, and we now take the convenience of 
wireless freedom for granted.  
 
Radio waves were first demonstrated by Heinrich Hertz in the late 1880s. Radio has since 
undergone several evolutionary changes, and today the frequency spectrum up to several 
tens of GHz is exploited by various communication systems. With ever increasing 
wireless applications, the frequency spectrum has gradually filled up. This scarcity 
has motivated the search for new ways to accommodate growing communication needs. 
A communication system working at 60 GHz has been proposed [1], which opens a new era 
in utilizing extremely high frequency bands. Another proposal that has become popular in 
recent years attempts to squeeze a communication system with very low emission power 
into frequency spectrum that is already utilized by other licensed systems. This is where 
the idea of ultra wideband comes in. An ultra wideband (UWB) system, as its name suggests, 
transmits ultra narrow pulses so that the frequency spectrum of each pulse is widely spread. 
 
The idea behind this approach originated in the 1890s, when radio communication was 
carried out using sparks to transmit RF energy. The earliest transmitters, built by Bose and 
Marconi in 1895, used spark gap technology that generated radio waves across a 
multi-GHz spectrum in a largely uncontrolled manner. Over the next 25 years, radio 
technologists sought methods to allow more systems to share spectrum on a 
non-interfering basis. Motorized spark generators and LC tank circuits limited the bandwidth 
of spark-based signals and helped control center frequencies. With DeForest’s invention 
of the vacuum tube triode in 1906, it became possible to transmit very narrowband 
signals at a frequency of one’s choice. As a result, spark technology had largely vanished by 
the 1920s. However, as the available spectrum continues to fill up under the demands of 
wireless systems, it has become essential to exploit that spectrum more fully, and 
spark-like transmission has started to revive. If these short duration pulses are properly 
controlled, their power spectral density is low enough to be treated as noise by other 
licensed communication systems.  
 
1.2 The FCC regulations 
 
However, one must ask what level of interference other operating systems can tolerate, 
so that coexistence with UWB does not degrade their performance significantly. To address 
this, the Federal Communications Commission (FCC) has released the unlicensed 
3.1-10.6 GHz frequency band for ultra wideband (UWB) applications. A UWB transmission 
is defined as one whose fractional bandwidth exceeds 20% or whose absolute bandwidth 
exceeds 500 MHz [2].  
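
The two criteria in this definition can be checked with a few lines of code. The following is a minimal sketch rather than a regulatory tool: it assumes the band edges are already known (how they are measured, for example at the -10 dB points, is not covered here), and the function names and example bands are hypothetical.

```python
def fractional_bandwidth(f_low_hz: float, f_high_hz: float) -> float:
    """Occupied bandwidth divided by the centre frequency (band edges in Hz)."""
    if not 0 < f_low_hz < f_high_hz:
        raise ValueError("expected 0 < f_low_hz < f_high_hz")
    centre_hz = (f_high_hz + f_low_hz) / 2.0
    return (f_high_hz - f_low_hz) / centre_hz


def is_uwb(f_low_hz: float, f_high_hz: float) -> bool:
    """Apply the two criteria quoted above: more than 20% fractional
    bandwidth, or more than 500 MHz of absolute bandwidth."""
    absolute_hz = f_high_hz - f_low_hz
    return fractional_bandwidth(f_low_hz, f_high_hz) > 0.20 or absolute_hz > 500e6


if __name__ == "__main__":
    # Illustrative bands only (assumptions, not taken from the thesis).
    print(is_uwb(4.0e9, 4.6e9))    # 600 MHz wide: qualifies on absolute bandwidth
    print(is_uwb(100e6, 130e6))    # about 26% fractional bandwidth: qualifies
    print(is_uwb(2.40e9, 2.42e9))  # 20 MHz narrowband channel: does not qualify
```

A signal therefore qualifies either by being wide in absolute terms or by being wide relative to its centre frequency, which is why a sub-1 GHz pulse can count as UWB with far less than 500 MHz of bandwidth.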
 
 
Fig. 1.1 FCC-regulated spectral mask for UWB indoor communication systems 
 
Fig. 1.1 shows the power spectrum mask of the UWB system. It regulates the emission 
power of the UWB transmitter at different operating frequencies. As seen in Fig. 1.1, 
the FCC emission mask is clearly partitioned into two regions, the sub-1 GHz range 
and the 3.1-10.6 GHz band. This partition is primarily due to the presence of the GPS 
band. The emission limit from 960 MHz to 1.61 GHz, which covers the GPS band, is 
extremely low, which effectively eliminates that band from practical use for UWB 
applications. The emission limit in both usable bands is -41.3 dBm/MHz. 
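
The -41.3 dBm/MHz figure is a power spectral density, so the total power a transmitter may radiate depends on how much bandwidth its pulse occupies. A minimal sketch of that arithmetic follows. It assumes a flat spectrum sitting exactly at the limit, which is an idealization, and the 500 MHz occupied bandwidth is a hypothetical example value.

```python
import math


def total_eirp_dbm(psd_dbm_per_mhz: float, bandwidth_mhz: float) -> float:
    """Total power of a flat spectrum: PSD (dBm/MHz) plus 10*log10(bandwidth in MHz)."""
    if bandwidth_mhz <= 0:
        raise ValueError("bandwidth_mhz must be positive")
    return psd_dbm_per_mhz + 10.0 * math.log10(bandwidth_mhz)


def dbm_to_mw(power_dbm: float) -> float:
    """Convert dBm to milliwatts."""
    return 10.0 ** (power_dbm / 10.0)


if __name__ == "__main__":
    limit_dbm_per_mhz = -41.3   # emission limit quoted in the text
    occupied_mhz = 500.0        # hypothetical occupied bandwidth
    p_dbm = total_eirp_dbm(limit_dbm_per_mhz, occupied_mhz)
    # Roughly -14.3 dBm, i.e. about 0.037 mW of total radiated power.
    print(round(p_dbm, 2), "dBm =", round(dbm_to_mw(p_dbm), 4), "mW")
```

Even a pulse that fills its band right up to the mask radiates only tens of microwatts, which is the reason UWB emissions can be treated as noise by co-existing licensed systems.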
 
The wide bandwidth gives UWB an intrinsic advantage over narrowband systems in terms 
of system performance. According to Shannon’s law, the maximum channel capacity C is 
related to the bandwidth B and the signal to noise ratio (SNR) as follows.  
 
C = B log2(1 + SNR)                                                                         Eq.(1.1) 
 
As equation 1.1 shows, the maximum channel capacity increases only logarithmically 
as the SNR increases linearly. Thus, raising the SNR with high power techniques 
yields limited enhancement of the channel capacity. However, the channel capacity is 
linearly related to the signal bandwidth. This is encouraging for ultra wideband 
signals, which readily meet the bandwidth requirement. Since the FCC permitted the 
unlicensed use of this large portion of the scarce frequency spectrum, numerous 
studies and experimental results have become available.  
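
The contrast between the two ways of raising capacity in Eq. (1.1) can be made concrete with a short numerical sketch. The bandwidth and SNR values below are hypothetical and chosen only to give round numbers. The sketch also holds the SNR fixed when the bandwidth is doubled, which is a simplification: with fixed transmit power and noise density, a wider band would also admit more noise.

```python
import math


def shannon_capacity_bps(bandwidth_hz: float, snr_linear: float) -> float:
    """Eq. (1.1): C = B * log2(1 + SNR), with SNR as a linear ratio (not dB)."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)


if __name__ == "__main__":
    base = shannon_capacity_bps(20e6, 15.0)        # 20 MHz, SNR = 15
    more_snr = shannon_capacity_bps(20e6, 31.0)    # SNR roughly doubled
    more_bw = shannon_capacity_bps(40e6, 15.0)     # bandwidth doubled
    print(base / 1e6, more_snr / 1e6, more_bw / 1e6)  # 80.0 100.0 160.0 (Mb/s)
```

Roughly doubling the SNR adds only one more bit per second per hertz, whereas doubling the bandwidth doubles the capacity.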
 
In the next section, reviews on literature that features a few representative UWB 






1.3 Literature review of current UWB transceiver architecture  
 
 
Fig. 1.2 A typical system architecture for an OFDM UWB transceiver 
 
UWB transceivers can be classified into two broad categories: OFDM based systems 
and impulse based systems. Fig. 1.2 shows a typical implementation of an OFDM 
UWB transceiver pair. Orthogonal frequency division multiplexing is a technique that 
breaks the transmission into several frequency sub-bands that are orthogonal to one 
another. Sub-carriers generated by the local oscillator are used to achieve orthogonal 
frequency multiplexing. It thus allows multiple access, whereby each user occupies 
one dedicated frequency channel. OFDM UWB systems are therefore carrier based, 
and their transceiver architecture closely resembles that of a traditional 
narrowband transceiver system. The implementation complexity is high, and the filter 
required can be of 5th order to reject adjacent band interference. Numerous power 
consuming RF blocks such as a power amplifier, mixer and variable gain amplifier are 
typically adopted. A high speed analog to digital converter (ADC) is also required to 






Fig. 1.3 A typical transmitter for UWB impulse radio system 
 
In contrast, a typical UWB impulse radio system is illustrated in Fig. 1.3. A typical 
transmitter implementation consists of a pulse generator and a pulse shaper, which shapes 
the transmitted pulse to fit the FCC mask. Data modulation is performed by the 
modulator. Popular modulation schemes such as on off keying (OOK), pulse position 
modulation (PPM) and binary phase shift keying (BPSK) are commonly adopted. A 
driver amplifier or a power amplifier may be required, depending on the transmission 
distance and performance desired.  
 
 
Fig. 1.4 A typical receiver for UWB impulse radio system 
  
The receiver has a front end low noise amplifier (LNA). The LNA amplifies the 
weak received signal and also provides some crude selectivity. The correlator/integrator 
and the template pulse generator work together as the energy detector. A local template 
pulse generator is required at the receiver. Generated template pulses should ideally have 
the same shape as the received pulse. When the received pulse is perfectly aligned with the 
template pulse, the correlator output is integrated and a subsequent ADC samples the 
received signal.  
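
Why alignment matters can be seen from a small numerical sketch of the correlate-and-integrate step. Everything specific here is an assumption made for illustration: the pulse is taken to be a plain Gaussian with a hypothetical width of 0.1 ns, the template is an exact copy of it, and noise is ignored.

```python
import math


def gaussian_pulse(t_ns: float, sigma_ns: float = 0.1) -> float:
    """Hypothetical received/template pulse shape (unit peak Gaussian)."""
    return math.exp(-(t_ns ** 2) / (2.0 * sigma_ns ** 2))


def correlate(offset_ns: float, sigma_ns: float = 0.1,
              step_ns: float = 0.005, span_ns: float = 1.0) -> float:
    """Integrate received(t) * template(t - offset) over [-span, +span]
    with a simple Riemann sum, mimicking a correlator followed by an integrator."""
    n_steps = int(round(2.0 * span_ns / step_ns))
    total = 0.0
    for k in range(n_steps + 1):
        t = -span_ns + k * step_ns
        total += gaussian_pulse(t, sigma_ns) * gaussian_pulse(t - offset_ns, sigma_ns) * step_ns
    return total


if __name__ == "__main__":
    aligned = correlate(0.0)
    for offset in (0.0, 0.1, 0.2, 0.5):
        # Output relative to the perfectly aligned case.
        print(offset, "ns ->", round(correlate(offset) / aligned, 4))
```

With these assumed numbers, a timing error of a few pulse widths reduces the integrator output to a fraction of a percent of its aligned value, so the template has to be positioned with sub-nanosecond accuracy before any data can be detected.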
 
As one can observe, the UWB IR architecture is typically easier to implement in 
hardware than a multiband OFDM system. Firstly, OFDM typically requires a variable 
gain amplifier for gain control so that the input to the ADC exercises its full dynamic 
range, thus maximizing the signal to quantization noise ratio (SQNR). Secondly, UWB 
IR systems typically employ a correlator and matched filter to detect signal energy, so 
the filtering requirement is not stringent. OFDM systems, by contrast, typically employ 
high order filters (4th to 5th order) to ensure high selectivity and reject interference 
from adjacent channels. Higher order filters are typically power consuming which might 









1.31 A comparison between UWB IR transceiver and carrier based RF transceiver 
 
 
Fig. 1.5 A typical carrier based RF transmitter 
 
Fig. 1.5 shows a traditional carrier based radio frequency module. The data stream is 
modulated onto an RF carrier produced by a local oscillator. A power amplifier is usually 
required at the transmitter, as traditional RF transceivers usually target long 
distance transmission.  
 
As for the transmitter, the UWB IR architecture is carrier-less and thus does not contain a 
local oscillator. The traditional carrier based transmitter contains an oscillator which 





Fig. 1.6 A typical carrier based RF receiver 
 
A typical carrier based receiver contains a low noise amplifier as the gain block. The 
local oscillator LO1, together with the mixer, brings the RF signal down to the 
intermediate frequency (IF). An appropriate filter is required to reject out of band 
interference, and LO2 brings the received signal to baseband. The subsequent 
demodulator then demodulates the received signal to recover the transmitted data. 
 
As for the receiver, an OFDM UWB system is similar to a traditional carrier based receiver, 
as both have a local oscillator and mixer to down convert the received signal from the RF 
transmission band to intermediate frequencies. In contrast, a UWB IR receiver typically 
uses a local template pulse for energy detection without any down conversion in the 
frequency domain. Thus, it could possibly facilitate the implementation of low 






1.4 Possible multiple access scheme for UWB IR architecture 
 
Fig. 1.7 Multiple Access for Impulse Radio UWB system 
 
Multiple access is also possible for IR UWB systems. As illustrated in Fig. 1.7, the 
pulses are modulated by carriers at different frequencies and are thus distinguishable 
in the frequency domain. The receiver would then require local oscillators and filters for 
channel selectivity and interference rejection, which would make its architecture similar 
to that of an OFDM system. Another possible technique is code division multiple access 
(CDMA), which has been commonly adopted in telecommunication systems.  
 
In this work, single user impulse based UWB would be the main focus of discussion as 





1.5 Impulse Radio UWB transmitter 
 
Firstly, we take a closer look at a typical IR UWB transmitter. A UWB transmitter 
faces very stringent limits on the transmitted signal power in terms of equivalent 
isotropically radiated power (EIRP). The transmitted pulse has to be shaped to fit into the 
FCC spectrum mask so that it does not degrade the performance of other wireless 
systems. Implementations of the popular 1st and 2nd derivatives of the Gaussian pulse 
in CMOS technology were proposed in [3, 4]. However, these pulses must be filtered to 
satisfy the FCC regulation. In addition, the current source used to generate the pulse 
dissipates constant power at all times. In [5], a transmitter with a pulse shaper draws a 
static current of 55 mA from a 1.8 V supply, which translates to a power consumption 
of 99 mW.  
 
To reduce power consumption, one often looks to implement the function in the 
digital domain. Researchers have recognized this possibility, and mostly digital 
UWB transceiver architectures have been proposed [6] [7]. 
 
It was reported in [8] that the 5th derivative of the Gaussian pulse is a single pulse with 
the most effective spectrum under the FCC limitation floor, and this pulse can be 





The coefficient σ defines the output pulse width, while A defines the output amplitude. 
These are fitting parameters to regulate the output power of the transmitted pulse. 
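
The roles of σ and A can be explored numerically. The sketch below is an illustration under stated assumptions, not the expression used in [8]: it takes a unit-area Gaussian with standard deviation σ, differentiates it five times analytically (using the fifth Hermite polynomial) and scales the result by A. The normalization and the example value of σ are assumptions.

```python
import math


def gaussian_fifth_derivative(t: float, sigma: float, amplitude: float = 1.0) -> float:
    """Fifth time derivative of a unit-area Gaussian, scaled by `amplitude`.

    With x = t / sigma, the n-th derivative of a Gaussian g(t) is
    (-1)^n * He_n(x) * g(t) / sigma^n, and He_5(x) = x^5 - 10 x^3 + 15 x.
    Use the same time unit for t and sigma (for example nanoseconds).
    """
    x = t / sigma
    hermite5 = x ** 5 - 10.0 * x ** 3 + 15.0 * x
    gaussian = math.exp(-0.5 * x * x) / (math.sqrt(2.0 * math.pi) * sigma)
    return -amplitude * hermite5 * gaussian / sigma ** 5


if __name__ == "__main__":
    sigma_ns = 0.05  # hypothetical width parameter, in nanoseconds
    for t_ns in (-0.15, -0.1, -0.05, 0.0, 0.05, 0.1, 0.15):
        print(t_ns, gaussian_fifth_derivative(t_ns, sigma_ns))
```

In this form σ stretches or compresses the pulse in time, which moves its spectrum down or up in frequency, while A scales the amplitude and hence the radiated power without changing the shape.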
 
Fig. 1.8 Digital UWB pulse generator schematic and timing diagram 
 
In [7], an all digital UWB transmitter was reported in which the generated 5th order 
Gaussian pulse is fully compliant with FCC regulations. As shown in Fig. 1.8, simple 
digital gates are triggered by a delayed version of the input clock to produce ultra narrow 
pull-up and pull-down pulses. The widths of these pulses can be controlled by the 
voltage controlled delay cells. The pull-up pulse switches on the top PMOS, which charges 
the output node, while the pull-down pulse switches on the bottom NMOS, which 
provides the discharging current. The PMOS and NMOS need to be properly sized to 




Fig. 1.9 Generated digital UWB pulse and its power spectrum 
 
This paper also presents the simulation results for a typical load antenna of 50 ohms. The 
power spectrum density in Fig. 1.9 shows that it is fully compliant with the FCC power 
spectrum mask. 
 
Another transmitter, based on an 8 stage driver, is proposed in [9]. The charge up and
charge down control is accomplished by eight drivers instead of a single driver. The extra
effort involved is again meant to shape the pulse spectrum to fit into the FCC mask.
 
However, in circuit design, there is always a trade-off. The simple implementations
discussed above suffer significantly from process variation, as the pulse shape and
amplitude rely solely on proper device ratio and sizing. Process variation is inevitable and thus





Fig. 1.10 Direct generation of UWB signal by baseband data 
 
Another technique in generating UWB signal is presented in [10]. As shown in Fig. 1.10, 
the baseband data signal is fed into the antenna directly. Since the antenna has a much 
wider bandwidth than the baseband signal, it could essentially be modeled as a dipole 
antenna. Thus, the baseband signal is differentiated by the antenna and the resultant pulse 
could also fit into the FCC spectrum mask. 
 
It can be seen that considerable effort has been spent on implementing fully digital
transmitters. These efforts tally with the continuous trend of CMOS downscaling, whereby
analog designs are becoming less compatible with newer CMOS technologies. In
terms of power consumption, the digital implementation of a UWB transmitter in [9] has
achieved a power on the order of a hundred microwatts for a 100Mbps data rate, which is
significantly lower






1.6 Impulse Radio UWB receiver 
 
We now take a close look at a typical receiver architecture, which comprises a few
current hungry RF blocks, as already shown in Fig. 1.4. Firstly, a template pulse generator
is required to generate a copy of the received UWB pulses for energy detection. High
order Gaussian derivatives are not trivial to generate unless, again, a digital implementation
of the pulse generator is adopted. Secondly, the correlator and the high speed sampling
ADC are power consuming as well. The typical power consumption of an analog correlation
type of receiver could easily exceed 100mW even in a standard 0.18um CMOS
process [5]. This high power requirement prohibits its use in low power mobile
applications, as the development of battery capacity still lags behind the pace at which
circuit complexity rises.
 
Fig. 1.11 Receiver architecture proposed in [10] 
 
Some novel attempts have been made in the literature, such as [10]. For the receiver shown
in Fig. 1.11, the authors propose to adopt analog correlation with a digital template pulse.
This eases the template pulse generation. Furthermore, the receiver low noise amplifier
operates intermittently, which achieves very low average power. Power control is also a
very useful technique for reducing the power consumption in UWB systems, as the duty
cycle of the UWB pulse is typically very low. However, in order to implement power
control, one needs to know exactly when to switch the LNA and other analog blocks on
and off. This could be extremely challenging, as the exact location of the received ultra
narrow pulse is unknown to the receiver. This intrinsically poses another challenge to
UWB receiver design, namely the synchronization process.
 
1.7 Shortcomings of Traditional Synchronization Schemes 
 
Synchronizing the receiver at sub-nanosecond scale is not easy to implement within a low
power budget. The crudest way would be to use a very high speed ADC working at the
Nyquist rate to sample the received pulse, which has a bandwidth in the gigahertz range.
Such a high speed ADC could be difficult to design and its power consumption could be
very high. The synchronization challenge divides receivers into two categories, known as
coherent and non-coherent receivers. A coherent receiver could result in high system
complexity, as the synchronization timing precision needs to be on the order of
sub-nanoseconds. However, it could achieve a higher data rate than its non-coherent counterparts




Fig. 1.12 Synchronization algorithm implemented in [10] 
 
Fig. 1.12 illustrates one commonly adopted tapped delay line synchronization algorithm,
as implemented in [10]. The delay and search algorithm is easy to implement if simple
inverters are used as delay cells. The multiple taps from the delay line are controlled by
digital logic to exhaustively scan through the possible locations of incoming pulses. One
trade-off of such a simple implementation is the accurate delay generation required to
perfectly align the received pulse with the local template pulse. Accurate delay generation
could be achieved by well established techniques such as a delay locked loop [11], but at
the expense of higher area and power. Inaccurate delay could plague receiver performance
due to insufficient analog correlation, while local template pulse generation and analog
correlation could be power hungry and result in high system complexity if higher order
derivatives of Gaussian pulses are used. Furthermore, the simple inverter chain based
slide and correlate synchronization scheme could suffer from process variations, and
supply voltage variation adds more uncertainty to the unit delay of the delay taps. These
uncertainties could not be modeled very accurately in post layout simulations. In [10], a
search interval of 2ns at 1Mb/s would require 500 slide and correlate search cycles, which
results in a long acquisition time. In the event of a false lock, the recovery time could be
long if a nearby back and forth search algorithm is adopted, due to the extremely low duty
cycle and long idle time of the UWB IR pulses.
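
The search cycle count quoted above can be reproduced with a short calculation. The
sketch below is illustrative only: the function names are hypothetical, and it assumes that
the search must cover one full bit period and that one candidate position is tested per bit
period, neither of which is stated explicitly for [10].

```python
def search_cycles(bit_rate_bps: float, search_step_s: float) -> int:
    """Number of slide-and-correlate positions needed to cover one bit period."""
    bit_period_s = 1.0 / bit_rate_bps
    return round(bit_period_s / search_step_s)


def worst_case_acquisition_time_s(bit_rate_bps: float, search_step_s: float) -> float:
    """Worst-case search time, assuming one candidate position is tested per bit period."""
    return search_cycles(bit_rate_bps, search_step_s) / bit_rate_bps


cycles = search_cycles(1e6, 2e-9)                    # 1 Mb/s data rate, 2 ns search step
acq_time_s = worst_case_acquisition_time_s(1e6, 2e-9)
print(cycles, acq_time_s)
```

Under these assumptions, the 1Mb/s, 2ns case gives 500 positions and a worst-case search
time of about 0.5ms before any false lock recovery is taken into account.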
 
Fig. 1.13 Exhaustive synchronization search algorithm implemented in [12] 
 
As shown in Fig. 1.13, a full RAKE receiver is implemented in [12] to perform an
exhaustive search for the correct frame of the incoming pulse. While the full RAKE receiver
structure provides enhancement in heavy multipath environments, only a fraction of the
fingers capture the reflected energy, while the other fingers are redundant apart from
exhaustively scanning the possible locations of the incoming pulses. Each finger contains
simple digital blocks, but the large aggregate number of fingers adds a significant power
and area penalty. The system also needs to provide a certain delay margin due to the
inaccurate delay generation in the simple structures.
 
Synchronization remains a great challenge in UWB transceiver design. The ultra narrow
pulse brings the merit of wide bandwidth, while the trade-off is the ultra fine timing
resolution requirement, which challenges efficient low power designs.
 
1.8 Objective of This Project 
 
In view of the promising UWB applications in various areas, the synchronization algorithm
needs to be addressed so that a low cost and power efficient transceiver system can be
realized.

It is necessary to identify the essential elements of a good synchronization
algorithm. Ideally, one would like the synchronization locking to be fast. It should
consume little silicon area and little power. Also, it should provide some
synchronization tracking mechanism to keep track of the synchronization status. In the
event that synchronization is lost, it should be able to re-synchronize as quickly
as possible.
 
The ultimate difficulty in locating the pulse is due to its extremely low duty cycle.
A traditional UWB receiver actively searches for the possible location of the pulse. An
innovative question to ask is: what if the receiver could wait for the pulse to trigger
some circuitry in it, rather than searching for it exhaustively? For the receiver to take
such a passive role, the concept of asynchronous circuits comes in. An asynchronous
circuit could wait for an event to trigger it rather than requiring a periodic clock to
trigger an event.
 
A suitable candidate to be used in a UWB receiver could simply be an asynchronous Data
Flip Flop (DFF) which is pulse triggered. For every UWB pulse received, ideally we
would like the pulse to be treated as the input clock signal to the DFF.
 
One problem which one could easily foresee is that the received pulse amplitude is
typically very weak, far below the amplitude level required to change the logic
state of a DFF. Free space path loss in a typical indoor environment is very serious. Also,
the DFF needs to be fast enough to capture the ultra narrow pulse received.
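
To give a feel for how weak the received pulse can be, the free space path loss can be
estimated with the standard formula 20*log10(4*pi*d*f/c). The sketch below is an
idealized illustration: the 1m distance, the 4GHz frequency and the helper names are
assumed values, antenna gains are ignored, and a real indoor channel with multipath and
obstructions will generally differ from this free space figure.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def free_space_path_loss_db(distance_m: float, frequency_hz: float) -> float:
    """Free space path loss in dB: 20*log10(4*pi*d*f/c)."""
    ratio = 4.0 * math.pi * distance_m * frequency_hz / SPEED_OF_LIGHT_M_S
    return 20.0 * math.log10(ratio)


def received_amplitude_v(tx_amplitude_v: float, loss_db: float) -> float:
    """Amplitude after the given loss, assuming equal impedances at both ends."""
    return tx_amplitude_v * 10.0 ** (-loss_db / 20.0)


loss_db = free_space_path_loss_db(distance_m=1.0, frequency_hz=4e9)
amplitude_v = received_amplitude_v(1.0, loss_db)   # hypothetical 1 V transmitted pulse
print(f"{loss_db:.1f} dB, {amplitude_v * 1e3:.1f} mV")
```

Even in this optimistic case the pulse arrives at millivolt level, far below a CMOS logic
swing, which is why amplification and threshold detection are needed ahead of the DFF.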
 
The second issue is that, as a DFF only captures one single pulse, it cannot by itself
be utilized in the receiver while a data stream is being transmitted continuously.
 
1.81 Organization of the Thesis Main Body 
 
The overall proposed receiver architecture is illustrated in Chapter 2. To address the first
issue, a threshold detection scheme is introduced in Chapter 2. It serves to quantize the
weak received pulse to a logic level that is distinguishable by a digital flip flop.
Chapter 2 also provides a detailed and systematic approach to designing near optimal
high speed edge triggered circuits.

To address the second issue, a pulse capture block is proposed which captures the UWB
pulses and performs a direct RF to baseband conversion. Chapter 3 introduces the direct
sequence spread spectrum (DSSS) technique, implemented with a Barker code data
modulation scheme. A novel synchronization scheme is proposed in this chapter which
achieves both synchronization tracking and resynchronization capability. The design of
the digital baseband processing circuits is also illustrated in detail. Chapter 4 presents the
simulation and measurement results of the integrated receiver in standard 0.35µm
technology. Chapter 5 suggests possible future work that could improve the receiver























As discussed in Chapter 1, the inherently high power consumption of traditional UWB
receiver systems comes from a number of analog blocks: the LNA, the template pulse
generator, the mixer/integrator and/or the high speed ADC. Attempts should be made in
virtually all transceiver systems to replace analog blocks with their digital counterparts.
On off keying (OOK) modulation has been adopted here, while BPSK modulation is also
compatible with the threshold detection receiver architecture [9].
 
Fig. 2 Proposed receiver architecture 
 
 
Fig. 2 illustrates the proposed receiver architecture. The traditional template pulse
generator, analog correlator and synchronization search blocks have been replaced by
digital domain counterparts. The variable threshold detector is feedback controlled by the
synchronization scheme, which sets a dynamic threshold for different channel conditions.
The proposed pulse capture block performs a direct down conversion of the received
pulse from RF to baseband. The system architecture and detailed design are discussed in
the subsequent chapters.
 
In a mostly digital UWB transceiver, pulse digitization is a crucial function which
converts the received signal from RF to a baseband signal. In the proposed
receiver architecture, the pulse digitization block consists of a variable threshold detector
and a pulse capture block, which are discussed in detail in this chapter.
 
2.1 Threshold Detection Scheme 
 
Fig. 1.11 in Chapter 1 illustrates the commonly adopted UWB receiver architecture,
which is based on analog correlation of the received pulse with a template pulse. There
are a few issues in implementing the analog correlation. Firstly, the mismatch between the
transmitter antenna and the transmitter would alter the pulse shape. For instance, in [10],
as the antenna has a much wider frequency response than the transmitted pulse, it acts as
a differentiator to the transmitted pulse. Thus, at the receiver, the template pulse has to be
the derivative of the transmitted pulse. Higher order Gaussian derivative pulses are
costly to generate in the analog domain. Analog correlation could also be power hungry,
and thus these could add a significant power penalty to the transceiver. Secondly, analog
correlation requires very accurate alignment of the received pulse with the template pulse,
or the receiver performance would be degraded. Template pulses are typically triggered by
a clock edge in the receiver. The perfect alignment challenge then transforms into the
challenge of generating accurate delays. As the UWB pulses are ultra narrow, especially
for the 3-10 GHz high frequency range, the typical resolution of the delay required could
be in the range of tens of picoseconds.
 
In view of the costly analog implementation, the threshold detection algorithm has become
popular in the recent literature [9] [13]. It provides promising results in relatively high
SNR environments and eliminates the need for a template pulse generator and a
mixer/integrator.
 
Essentially, a single level threshold detector can be viewed as a 1-bit ADC which performs
early quantization of the received pulse. The threshold detection mechanism is level based
detection, which is inferior to the energy based detection of analog correlation. This issue
has to be addressed before the threshold detection algorithm can be fully exploited. In
Chapter 5, a DSSS technique is introduced to enhance the reliability of the single level
threshold detection algorithm.
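
The view of the threshold detector as a 1-bit ADC can be sketched in a few lines. This is a
behavioral illustration only, not the circuit of this work: the Gaussian pulse shape, the
amplitudes, the 50mV threshold and the 50ps sample spacing are assumed values chosen
for the example, and all names are hypothetical.

```python
import math


def gaussian_pulse(t_s: float, sigma_s: float, amplitude_v: float) -> float:
    """Gaussian pulse centered at t = 0."""
    return amplitude_v * math.exp(-(t_s ** 2) / (2.0 * sigma_s ** 2))


def threshold_detect(samples_v, threshold_v):
    """1-bit quantization: 1 where the sample exceeds the threshold, else 0."""
    return [1 if v > threshold_v else 0 for v in samples_v]


times_s = [i * 50e-12 for i in range(-20, 21)]       # -1 ns to +1 ns in 50 ps steps
strong = [gaussian_pulse(t, 100e-12, 0.10) for t in times_s]   # 100 mV peak
weak = [gaussian_pulse(t, 100e-12, 0.04) for t in times_s]     # 40 mV peak

strong_bits = threshold_detect(strong, 0.05)
weak_bits = threshold_detect(weak, 0.05)
print(sum(strong_bits), sum(weak_bits))
```

The stronger pulse produces a short run of ones around its peak, while the weaker pulse is
missed entirely even though it still carries energy. This is the sense in which level based
detection is less robust than energy based detection.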
 
Fig. 2.1 Structure of the implemented threshold detector 
 
In our synchronization scheme, a simple threshold detection scheme is adopted to meet
the low power requirements of mobile applications. As illustrated in Fig. 2.1, the simple
threshold detector used is essentially a 1-bit quantizer that consists of a ratioed common
source amplifier, biased in its high gain region by a voltage follower. The threshold detector
together with a unity gain buffer allows a simple implementation of a tunable
threshold detector, as the bias voltage to the threshold detector can be set by a DC voltage
input at the unity gain buffer.
 
2.11 The unity gain buffer 
 
Fig. 2.2 CMOS schematic diagram of the unity gain buffer 
 
The unity gain buffer serves to superimpose a DC voltage to the received pulse so that the 
DC bias to the input gain stage is set to an appropriate value. A simple active current 




The biasing tail current is set to 100µA which is mirrored by the left most current mirror 
branch. The resistive division is achieved by simple NMOS in diode connected 
configuration. 
 
Fig. 2.3 Post layout simulation result for the op-amp
 
Fig. 2.3 shows the post layout simulation results for the open loop gain of the active
current mirror load amplifier. The design target is an open loop gain of 200, which is about
46dB. The -3dB bandwidth is not critical as the input signal is almost a DC voltage
coming from an ADC. Simulation results show a -3dB bandwidth of about 8 MHz which





2.12 The threshold detector 
 
Fig. 3.4 Schematic of the implemented threshold detector 
 
The gain of the threshold detector is set to about 26dB, which is typically sufficient to
quantize the received pulse to a logic level distinguishable by subsequent inverter stages.
As shown in Fig. 3.4, the PMOS load comprises M1 and M3. The feedback transistor
M2 serves to maintain the ratio between the PMOS load and the NMOS, while M1 can then
be downsized, which speeds up the pull down of intermediate node A. For low data rate
applications, due to the narrow pulse width of UWB pulses, the duty cycle is extremely
low. The top PMOS M1 thus can be turned off by switching its gate voltage between Vdd






Fig. 3.5 CMOS schematic of the implemented threshold detector 
 
The purpose of the threshold detector is twofold: firstly, to set an appropriate threshold
that distinguishes the received pulses from noise; secondly, to further amplify the received
pulses after the LNA to the full supply rail for subsequent signal processing. We have
chosen the simple pulse shape in [10], based on a UWB transceiver system working below
1GHz. The 0-960MHz band is more suitable for low-speed applications than the
3.1-10.6GHz band




Fig. 3.6 Measurement result of threshold detector output with 500ps-width 100mV p-p pulse input

Fig. 3.6 shows the simulation and measurement results of the threshold detector
implemented in 0.35µm standard CMOS technology. Based on the measurement results,
the simple threshold detector demonstrates the capability of detecting low amplitude
pulses of 100mV peak to peak and 500ps width. With a more advanced CMOS technology
with smaller gate dimensions, even narrower pulses could be captured. This would be fully
compatible with high order Gaussian pulses that have multiple narrow pulse peaks.
 
One question that could be raised is how higher order Gaussian pulses can be used
in the threshold detection scheme, as such a pulse could have multiple peaks which all
trigger the threshold detector. A simple solution could be a 'delay and add' algorithm,
which delays the detected peaks by a pulse peak interval and adds them to the original
waveform to form a wider pulse. No accurate delay is needed here: although the digital
domain adding may lead to glitches, these glitches would typically be too narrow to be
captured by subsequent processing stages.
 
The receiver antenna may act as a differentiator to the transmitted pulse but threshold 
level detection mechanism remains unchanged.  
 
2.2 Design of High Speed Edge Triggered DFF 
 
Digital circuit design is usually completed with CAD tools such as those from Synopsys.
The advantages of CAD tools include ease of implementation and the fact that no
detailed knowledge of the circuits at transistor level is required. However, they also have
intrinsic limitations, such as high cost due to larger silicon area and higher power
consumption. Another limitation is the speed constraint. CAD tools might not meet the
stringent timing requirements if very high speed circuits are required.
 
This would be the typical case in an ASIC design where a small portion of the circuit
operates at a very high speed that is beyond the capability of CAD synthesis tools. As
such, high speed digital circuits could virtually be treated as analog circuits for which
transistor level optimization is required. In high speed designs, the speed requirements
typically come from the edge triggered circuits, which have to respond to fast
changing clocks. In this Chapter, a systematic approach would be introduced to design a





2.21 A close look at a MOS transistor 
 
Firstly, one could easily identify the speed bottleneck of the basic building block of CMOS
circuits, the MOS transistor.
 
Fig. 3.7 Basic structure of CMOS transistor 
 
Fig. 3.7 shows the basic structure of a transistor. The change of MOS logic state is not
instantaneous, as it requires the charging and discharging of the parasitic capacitances
that are intrinsic to a MOS transistor. Thus, the speed bottleneck comes essentially from
these parasitics, and the analysis of these capacitances helps in identifying the
major cause of speed limitations. Among all these parasitics, the gate source capacitance
typically dominates. Thus, in order to achieve high speed operation, one would like to
minimize Cgs.
 
One way to achieve a small Cgs is to use a more advanced technology, so that the width and
gate length are proportionally scaled down, which leads to a reduction of the parasitic
capacitances. The other way seems to be using a smaller width for a transistor, since Cgs ~
CoxWL. However, this does not help to increase the speed, as the charging and
discharging current has a linear relationship with the transistor width. By reducing the
width W, the current driving capability also decreases proportionally, which degrades the
speed.
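
The argument that reducing W alone does not help can be made explicit with a first order
estimate. The sketch below assumes the long channel square law for the drain current and
a stage that drives an identically sized gate. The symbols μ (carrier mobility), V_ov
(overdrive voltage) and t_d (charging time) are introduced here for illustration and are
not taken from the original analysis.

```latex
C_{gs} \sim C_{ox} W L, \qquad
I_D \approx \tfrac{1}{2}\,\mu C_{ox}\,\frac{W}{L}\,V_{ov}^{2}
\\[4pt]
t_d \approx \frac{C_{gs}\,V_{DD}}{I_D}
    = \frac{C_{ox} W L\,V_{DD}}{\tfrac{1}{2}\,\mu C_{ox}\,\frac{W}{L}\,V_{ov}^{2}}
    = \frac{2 L^{2}\,V_{DD}}{\mu\,V_{ov}^{2}}
```

The width cancels, so under these assumptions the charging time is set by the gate length
rather than by W, which is consistent with the observation that only technology scaling
reduces it.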
 
With the limited CMOS technology accessible at hand, one could have
difficulties in optimizing the circuit, especially when the circuit involves a large number
of transistors. This leads to a non-polynomial problem which has huge complexity
when it comes to optimization. A systematic approach to optimization is much preferred,
but before that, some interesting phenomena concerning transistor parasitics will be
addressed first.
 
Fig. 3.8 CMOS transistor with its source tied to GND 
 
This is the case when both the source and the bulk of a transistor are tied to GND. In this
case, the load that this transistor presents to its driver is Cgs ~ CoxWL. This is typical
when the NMOS transistor is at the bottom of a string of transistors, so that both its
source and bulk are connected to the lowest potential.
                                                                                     
Fig. 3.9 CMOS transistor in a string of transistors 
 
Fig. 3.9 illustrates another case, in which the load transistor seen by the previous stage is
in the middle of a string of transistors. A simplified first order model translates it into the
RC circuit in Fig. 3.9. One could easily work out the equivalent impedance of this simple
RC network, which is
$Z = \frac{1}{sC_{gs}} + \frac{R}{1 + sRC_{db}}$, where R is the channel resistance of the bottom transistor.

We would now consider two cases based on the status of the bottom transistor.

1. If the bottom transistor is ON, then the channel resistance would be small and the
above expression simplifies to $Z \approx \frac{1}{sC_{gs}}$. This simply indicates that the load transistor
presents a load equivalent to its gate source capacitance Cgs.
2. If the bottom transistor is OFF, then the channel resistance R would be very large
and the above expression simplifies to $Z \approx \frac{1}{sC_{gs}} + \frac{1}{sC_{db}}$,
which is a series combination of Cgs and the drain bulk capacitance Cdb.
 
This is a very critical and useful observation, since for high speed digital circuits the
transistor sizing ratio is typically greater than 20. As such, Cdb is much smaller than Cgs,
and a series combination of Cdb and Cgs results in a significant reduction of the load
capacitance presented to the previous stage.
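
A numerical illustration of this reduction is given below. The capacitance values are
hypothetical, chosen only so that Cdb is much smaller than Cgs as described above, and the
function name is introduced for the example.

```python
def series_capacitance(c1_f: float, c2_f: float) -> float:
    """Equivalent capacitance of two capacitors connected in series."""
    return (c1_f * c2_f) / (c1_f + c2_f)


cgs_f = 20e-15   # assumed gate source capacitance, 20 fF
cdb_f = 1e-15    # assumed drain bulk capacitance, 1 fF

load_on_f = cgs_f                               # bottom transistor ON: load is Cgs
load_off_f = series_capacitance(cgs_f, cdb_f)   # bottom transistor OFF: Cgs in series with Cdb

print(f"ON: {load_on_f * 1e15:.2f} fF, OFF: {load_off_f * 1e15:.2f} fF")
```

The series combination is always smaller than the smaller of the two capacitors, so with
the string switched off the previous stage sees less than Cdb instead of the full Cgs.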
 
An important conclusion is that if a load transistor is in the middle of a string of
transistors and this string of transistors is off, then this transistor is very unlikely to be
a speed bottleneck. With this principle in mind, we could eliminate quite a few transistors
from the speed optimization process, which could greatly reduce the optimization
complexity.
 
2.22 The systematic optimization process 
 
In order for the flip flop to capture UWB pulses, it has to be pulse triggered in nature,
since the UWB pulses are extremely narrow. Pulses could be viewed as a very high speed
clock, since the time interval between the rising and falling edges is very short. For clock
edge triggered circuits, the transistors only change their state on the clock edges. Thus, a








Fig. 2.10 Schematic diagram for a pulse triggered DFF 
 
Fig. 2.10 illustrates the implementation of a high speed pulse triggered flip flop in toggle
configuration. To optimize the 9 transistors in the flip flop, the complexity is 2^9, since
each transistor could be scaled up or down for speed optimization. The complexity
rises exponentially with the total number of transistors. We will now see how the above
mentioned observations help to de-emphasise certain transistors in certain state























Fig. 2.11 Timing diagram for the pulse triggered DFF in toggle configuration 
 





















We denote the need to upsize the transistor by an up arrow while downsize would be 
represented by a down arrow. A cross would imply that the sizing of this particular 
transistor does not matter in this state. 
 
Now we would consider the first case A whereby the clock changes from 1 to 0. Qbar is 
at logic 0. For a positive edge triggered flip flop, Qbar would remain at logic 0. We make 
an assumption that the CLK signal has sufficient driving power so that we could upsize a 
clocked transistor if necessary. This assumption could be justified easily if a driver is 
inserted to drive the clocked transistors.  
 
For transistor MN1, since Qbar remains at logic 0, its sizing does not matter in this 
transition and thus a cross is marked on it. 
 
For a CLK transition from 1 to 0, clocked NMOS are off and thus the sizing of MNC1 
and MNC2 does not matter. Again, crosses are marked on them. 
 
MN1, MNC1, MNC2 and MN3 are off and thus, we represent them by broken nets in Fig. 
2.12. There are two critical transition nodes which are marked by N1 and N3.  
 
In order for Qbar to stay at logic 0, MP2 has to be off which means N2 will need to stay 
at logic 1. For the previous state when CLK is 1, node N1 has to be at logic 0 so that N2 




Thus, in the current state when CLK transits from 1 to 0, N1 will be charged up by MP1
and MPC1. These two transistors therefore need to be upsized to speed up the charging
process.
 
Now we consider transistor MN2. One may think that, since it presents
a significant load to node N1, it should be downsized. However, according to the
observations mentioned earlier, since MNC1 is off, the load it presents to node N1 is
much smaller than its Cgs. Thus, it is very unlikely to be the speed bottleneck in this state
transition, and we could mark it with a cross.
 
Since N2 is at logic 1 in the previous state and remains at logic 1 after the CLK transition, 
the size of MPC2 does not matter as it does not charge N2 in this state transition.  
 
Since N2 does not change, the size of the load at N2 does not matter. We mark crosses
on transistors MP2 and MN3. The sizes of MNC1 and MNC2 also do not matter, as they
are off when the clock transits to logic 0.

In such a systematic manner, we easily identify the transistor sizing requirements in Fig.





Fig. 2.13 Transistor size optimization for state transition 2 
 
Now we consider the next state transition B whereby CLK changes from 0 to 1. 
 
Since CLK changes to logic 1, MPC1 and MPC2 are off and we represent this by two 
broken nets at node N1 and N3. 
 
In this transition state, Qbar will toggle from logic 0 to logic 1. Intuitively, the loads to 
node Qbar should be small. Also, since MPC1 is off, N1 does not change its state. Thus, 
MP1 and MN1 should be scaled down so that the load to Qbar is minimized. The size of 
MPC1 does not matter in this state as N1 retains its logic value of 1. 
 
Size of MPC2 does not matter as it is off when CLK is logic 1. For the critical node N2, it 
will have to be discharged to logic 0 by MN2 and MNC1 and thus, both MN2 and MNC1 



















Transistor MP2 presents a load to the discharging node N2, and thus it should be scaled
down. On the other hand, MP2 needs to charge Qbar from logic 0 to logic 1. Thus,
ambiguity arises in choosing the appropriate sizing of MP2.
 
MN3 presents loading to node N2 and needs to be scaled down. For clocked transistor 
MNC2, since we have assumed CLK has sufficient driving power, size of MNC2 does 
not matter. 
 
Fig. 2.14 Transistor size optimization for state transition 3 
 
Next, we consider the state transition C whereby the clock changes from logic 1 to 0 again.
 
Since the CLK is logic low, clocked transistors MNC1 and MNC2 are off. They are 
represented by broken nets. The size of MNC1 and MNC2 does not matter as we are 




















For transistor MPC1, its size does not matter, not only because it is a clocked transistor
but also because N1 does not change its logic state, so MPC1 does not contribute to any
charging or discharging process.

For transistors MP1 and MN1, since Qbar remains at logic 1, they do not present any
loading to their driver. Thus, the size of MP1 and MN1 does not matter either.

Since node N1 remains at logic 0, its load MN2 has no effect in this transition, and thus
the size of MN2 does not matter.
 
For transistor MPC2, since in this transition, it is required to charge N2 from logic 0 to 
logic 1, it needs to be scaled up. 
 
For transistor MP2 and MN3, since they only act as loadings to node N2 in this transition, 






Fig. 2.15 Transistor size optimization for state transition 4 
 
Finally, we consider the last clock transition D as clock changes from 0 to 1. 
 
Since the clock is at logic 1, the clocked transistors MPC1 and MPC2 are off, and thus
broken nets are drawn at nodes N1 and N3. The size of MPC1 and MPC2 does not matter.

Since Qbar discharges from logic 1 to logic 0, its loads MP1 and MN1 need to be sized
down. In this transition, MP1 and MN1 only act as loads on the output node.
 
N1 remains at logic 0 in this transition and thus size of MN2 does not matter. Clocked 
transistor MNC1 does not matter as well since it does not perform any function in this 




















N2 remains at logic 1 and thus transistors which present the load to N2 do not have an 
effect on the speed of this transition. As such, size of MP2 does not matter. 
 
For transistor MNC2 and MN3, since they discharge the output Qbar, they need to be 
scaled up to provide high discharging current. 
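
The bookkeeping of the four transition analyses can be expressed compactly in code. The
sketch below records, for each transition, only the transistors that received an up or down
arrow (crossed transistors are omitted) and then merges the marks. The data structure and
function names are hypothetical. Where the discussion above is incomplete, the marks are
assumptions: MN2 and MNC1 are taken as scaled up in transition B because they discharge
N2, and MP2 and MN3 are taken as scaled down in transition C because they only load N2.

```python
UP, DOWN = "up", "down"

# Per-transition sizing marks; a transistor marked with a cross is simply left out.
transition_marks = {
    "A": {"MP1": {UP}, "MPC1": {UP}},
    "B": {"MP1": {DOWN}, "MN1": {DOWN},
          "MN2": {UP}, "MNC1": {UP},          # assumed direction (they discharge N2)
          "MP2": {UP, DOWN}, "MN3": {DOWN}},
    "C": {"MPC2": {UP},
          "MP2": {DOWN}, "MN3": {DOWN}},      # assumed direction (they only load N2)
    "D": {"MP1": {DOWN}, "MN1": {DOWN}, "MNC2": {UP}, "MN3": {UP}},
}


def merge_marks(marks_by_transition):
    """Collect every direction requested for each transistor over all transitions."""
    merged = {}
    for marks in marks_by_transition.values():
        for name, directions in marks.items():
            merged.setdefault(name, set()).update(directions)
    return merged


def ambiguous_transistors(merged):
    """Transistors asked to be scaled both up and down."""
    return sorted(name for name, directions in merged.items() if len(directions) > 1)


overall = merge_marks(transition_marks)
print(ambiguous_transistors(overall))
```

Transistors that end up with a single direction can be sized once and left alone; only those
requested in both directions need to be explored during optimization.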
 
 
Fig. 2.16 Overall transistor size optimization 
 
After finishing the four state transition analyses, we end up with the sizing diagram
in Fig. 2.16. It is noticed that three transistors, namely MP1, MP2 and MN3, have sizing
ambiguities. Certain transitions require them to be scaled up while other transitions
require them to be scaled down. It is definitely preferable to have as few transistors
with sizing ambiguities as possible, since the optimization complexity is on the order of
2^n, where n is the number of ambiguous transistors. However, with the above systematic
approach, we



















Fig. 2.17 Overall transistor size optimization for a modified DFF 
 
For edge triggered circuits, we can change the position of the clocked transistors without 
affecting its logic functionality. When the position of MN3 and MNC2 is swapped and 
state analysis is carried out, it ends up with 3 ambiguous transistors again as shown in Fig. 
2.17, but this time, MP2, MNC2 and MN3 are all in the same branch which makes it 
possible for further modification to reduce the number of ambiguous transistors. 
 
2.23 An improved version of DFF 
 
A careful state transition analysis reveals that MNC2 and MN3 need to be scaled down 
during the transition when CLK rises from logic 0 to logic 1. In this transition, N2 is 
discharged from logic 1 to logic 0 by MN2 and MNC1. The transition of N2 is not 



















This significantly limits the speed at which Qbar rises, as MNC2 and MN3 form a
discharging path.
 
 
Fig. 2.18 Overall transistor size optimization for an improved DFF to reduce inconsistency 
 
One novel approach is to stack a transistor MN4 on top of MN3. Also, a very small sized
inverter is inserted between node N1 and the gate of transistor MN4. These modifications
serve two purposes. Firstly, during the state when MNC2 and MN3 need to be scaled
down, N1 is at logic 1. Thus MN4 is off. This cuts the initial discharging path,
which speeds up the charging of Qbar from logic 0 to logic 1. Secondly, since MN4 is off,
MNC2 would be in the middle of a string of transistors. Thus, the load it presents to N2 is
much smaller than its gate source capacitance Cgs. As such, its sizing does not matter





















A new sizing diagram is formed in Fig. 2.18. As observed, the only conflicting transistor 
is MP3. This translates to an optimization complexity of 2, which is easily 
manageable by manually tuning the size of MP3. 
 
2.24 CMOS implementation and simulation/measurement Results 
 
For high speed digital circuits, a typical transistor sizing ratio W/L is 
approximately 20. With this guideline, the transistor sizes are chosen and the above flip 
flop is designed and implemented in both 0.13µm and 0.35µm standard CMOS 
technology. 
 
Fig. 2.19 CMOS Implemented DFF in toggle configuration 
 
The circled minimum sized NMOS MN4 is added as shown in Fig. 2.19. It serves a 






0, transistor MP4 is switched on. The channel charges in PMOS MP5 are passed down to 
node y1, which causes glitches at node y1 that are highly undesirable if the TFF is operating 
at high frequencies. Transistor MN4 forms a negative feedback and creates a discharging 




Fig. 2.20 Structure of an N bit asynchronous counter 
For verification purpose, an N bit digital counter is implemented using the implemented 








Fig. 2.22 Post layout simulation results for counter implemented in standard 0.13µm CMOS 
 
As shown in Fig. 2.22, post layout simulation in 0.13µm standard CMOS technology 
verifies the capability of the counter to capture narrow pulses down to about 100ps. 
This is encouraging, as the peaks of a UWB pulse in the 3-5 GHz band are wider than 
100ps. Thus, the pulse peaks can be captured with the high speed pulse 
triggered flip flop implemented with the above mentioned techniques.  
 
A 4-bit counter is also implemented with 0.35 µm standard CMOS technology and the 





Fig. 2.23 Measurement result of high speed DFF in toggle configuration with 200ps wide pulses input 
Fig. 2.23 shows the measurement result of a 4 bit counter implemented with the high speed 
TFF, with 200ps-wide pulses as the clock input. The measurement results further confirm the 
functionality of the implemented DFF. 
 
We have introduced a systematic approach for the implementation of a high speed flip 
flop. A simple yet critical observation about the loading of the gate capacitance is 
also discussed. With the aid of the systematic approach, a novel flip flop is designed and 
implemented in standard CMOS technology. Both simulation and measurement results 




The high speed pulse triggered flip flop only indicates the presence of a pulse; it 
does not reveal anything about the transmitted data pattern in a UWB transceiver system. In 
the following section, a novel pulse capture block is introduced which 
recovers the transmitted pulse pattern. 
 
 
2.3 A Novel Pulse Capture Block 
 
 
2.31 Proposed structure of a novel pulse capture block 
 
A data flip flop merely stores the data while a toggle flip flop changes its logic state once 
it is triggered. However, in practical data communication, the data pattern is random. For a 
DFF, the current data will be overwritten by the next available data.  
 
For a toggle flip flop, on the other hand, we know there is incoming data when its logic 
state changes. If we adopt the on off keying (OOK) modulation scheme, a data bit ‘1’ will 
trigger the TFF to toggle while a data bit ‘0’ will be transparent to the TFF.  
 
With this observation, intuitively, one should check if there is a toggle in the TFF logic 
state to identify the presence of a data bit ‘1’. This could be done by comparing the 
previous and current logic states. If they are different, it indicates that the receiver has 
recovered a data bit ‘1’. If the previous and current states of the TFF are the same, it 
indicates that no data or a data bit ‘0’ is received. To distinguish between no data 
transmission and a data bit ‘0’, we require some sampling mechanism at regular intervals to 
check the availability of data. 
 
Fig. 2.24 Proposed pulse capture block 
 
With such design goals in mind, a pulse capture block is proposed and shown in Fig. 
2.24. It consists of a high speed pulse triggered TFF, an XNOR gate and two standard DFFs. 
The standard DFFs and the XNOR gate could be readily obtained from the standard CMOS 
technology library as they only need to operate at the data rate.  
 
A sampling clock running at the data rate is required for two reasons. Firstly, since OOK 
modulation is used, one cannot distinguish a data bit ‘0’ from idle time 
during which there is no transmission. Thus, a sampling clock checks the data at regular data 
rate intervals to identify a data bit ‘0’. Secondly, since the XNOR operation is carried out 
asynchronously, we need to synchronize the output of the XNOR gate. Thus, the sampling DFF is 




UWB pulses might not be processed directly by a simple high speed TFF due to their weak 
amplitude or their shape. Thus, they are first fed into the threshold detector, as mentioned in 
Chapter II. After the threshold detector, the UWB pulses are digitized into narrow pulses which 
can be readily captured by the high speed TFF. 
 
 
Fig. 2.25 Post layout simulation result for Pulse Capture block 
 
The operation principle of the pulse capture block is illustrated in Fig. 2.25. The input 
digital pulse data is the quantized UWB pulses. A data bit ‘1’ corresponds to one 
narrow pulse while a data bit ‘0’ is just transparent. Every pulse from the threshold 
detector triggers a toggle in the DFF in toggle configuration. Thus, the TFF output 
toggles whenever a data bit ‘1’ is present. A standard DFF tracks the 
previous state of the high speed TFF using a sampling clock at the known data rate. A 
comparison is done using a logic XNOR gate to check for a difference between the 
current and previous states. A change between the previous and current states indicates that 
a pulse is present and has been captured by the high speed TFF for OOK modulation. 
Finally, the circuit is synchronized by a sampling DFF running at the data rate. As such, the 
input return-to-zero (RZ) data is converted and fully recovered as non-return-to-zero (NRZ) data. 
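
The operation described above can be sketched behaviourally. The following Python model is a minimal sketch and not the implemented circuit: it assumes ideal sampling and at most one pulse per data interval, and it reports a detected toggle as logic 1 (the complement of a raw XNOR output). The function name pulse_capture is illustrative.

```python
def pulse_capture(pulses):
    """Behavioural model of the pulse capture block.

    pulses[k] is 1 if the threshold detector produced a narrow pulse
    during data interval k (OOK data bit '1'), otherwise 0.
    Returns the recovered NRZ bit for each interval.
    """
    tff = 0    # state of the high speed toggle flip flop
    prev = 0   # standard DFF holding the TFF state at the previous sample
    recovered = []
    for pulse in pulses:
        if pulse:
            tff ^= 1                     # every pulse toggles the TFF
        same = 1 if tff == prev else 0   # XNOR of current and previous state
        recovered.append(1 - same)       # a change means a pulse was captured
        prev = tff                       # sampling clock updates the tracking DFF
    return recovered


rz_pattern = [1, 0, 1, 1, 0, 0, 1]
print(pulse_capture(rz_pattern))
```

In this model the recovered sequence equals the transmitted OOK pattern, which is the RZ to NRZ conversion described in the text.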
 
Meanwhile, multi-threshold could also be used whereby one threshold level requires one 
threshold detector branch at very little extra area and power cost.  
 
The pulse capture block takes a passive role, waiting for the pulse to arrive rather than 
exhaustively searching for it. The toggle flip flop only consumes power when the preceding 
threshold detector is triggered. Since the duty cycle of UWB pulses is very small, the 
power consumption is expected to be very small as the circuit is 
idle most of the time.  
 
The UWB pulses are transmitted in a wide radio frequency band, while after being processed 
by early quantization and the pulse capture block, we obtain the raw data at the data rate. 
Thus, the pulse capture block can be regarded as performing a direct down conversion of 
the incoming pulse from RF to the baseband data rate, which substantially eases the 





2.32 Layout and Some Measurement Results 
 
 
Fig. 2.26 Layout snapshot of the proposed pulse capture block 
 
Both the TFF and the pulse capture block are fabricated using standard CMOS 0.35µm 




Fig. 2.27 Measurement result of threshold detector and TFF output with 1ns pulse input 
 
To emulate sub-1 GHz UWB pulses, we used an Agilent pulse generator to generate 1ns 
wide digital pulses, which also occupy a frequency band of approximately 1 GHz. Fig. 
2.27 shows the measurement result of the threshold detector in cascade with the high 
speed TFF. The input is 100mV p-p RZ narrow data pulses. The 100mV p-p amplitude 
serves to emulate the path loss, as the amplitude received at the receiver is much smaller 
than the transmitted pulse amplitude. Fig. 2.27 shows the TFF output where it 







Fig. 2.28 Measurement result for pulse capture block showing conversion of 100mV p-p RZ OOK pulse 
data to full rail NRZ data 
 
Fig. 2.28 shows the output of the pulse capture block when 100mV p-p RZ narrow data 
pulses are converted to NRZ data. The input is the complementary RZ data from an Agilent 
3GHz pulse generator. Due to its limitation in data mode, we are only able to obtain a 
50% duty cycle, while the actual data duty cycle should be very low. We justify this as 
follows. The high speed TFF is the most crucial component while the 
rest of the cells could be obtained readily from the technology library. In order to 
demonstrate the capability of converting UWB pulse RZ data of extremely low duty cycle 
to NRZ data at the data rate, the output from the TFF in the pulse capture block is measured 
when we separately provide narrow input pulses of 1ns. Since the functionality of the 
high speed TFF is confirmed by the measurement result, the feasibility of the proposed pulse 
capture block is justified. 
  
To further confirm the functionality of the proposed pulse capture block with the limited 
testing equipment, the following testing method is used. 
 
The input pulse is set at a regular frequency while the pulse width is set at 1ns. If the 
local sampling clock runs at the same speed as the pulse frequency, the captured data 
would be a continuous sequence of ‘1’s. However, if the local sampling clock is set at 
twice the pulse frequency, effectively, the pulse capture block will treat the interval 
between two consecutive ‘1’s to be data bit ‘0’s. As such, the decoded data would be 
similar to a clock signal featuring ‘1’s and ‘0’s in alternate pattern. 
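
The expected outcome of this test can be sketched with a small behavioural model. This is an idealised sketch under stated assumptions: the sampling clock is perfectly aligned with the pulses, a detected toggle is reported as logic 1, and the helper names capture and pulse_train are illustrative only.

```python
def capture(pulse_slots):
    """Return the decoded bit for each sampling interval.

    pulse_slots[k] is 1 if a pulse arrives during sampling interval k.
    """
    tff, prev, decoded = 0, 0, []
    for pulse in pulse_slots:
        tff ^= pulse                 # a pulse toggles the TFF
        decoded.append(tff ^ prev)   # 1 when the state changed since last sample
        prev = tff
    return decoded


def pulse_train(n_pulses, samples_per_pulse_period):
    """Regular pulse train seen on a sampling grid that is an integer
    multiple of the pulse frequency."""
    slots = []
    for _ in range(n_pulses):
        slots.append(1)
        slots.extend([0] * (samples_per_pulse_period - 1))
    return slots


same_rate = capture(pulse_train(4, 1))    # sampling clock = pulse frequency
double_rate = capture(pulse_train(4, 2))  # sampling clock = 2 x pulse frequency
print(same_rate)
print(double_rate)
```

With the sampling clock at the pulse frequency the model decodes a run of ‘1’s, and at twice the pulse frequency it decodes alternating ‘1’s and ‘0’s.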
 
 
Fig. 2.29 Measurement results for receiver to decode alternate ‘1’s and ‘0’s 
The measurement result in Fig. 2.29 illustrates the testing algorithm and the pulse capture 
















































In the previous Chapter, the proposed pulse capture block directly decodes the transmitted 
data pattern. However, one cannot neglect the difference between ideal and 
practical cases. In the simulation and measurement carried out on the proposed pulse capture 
block, the presence of noise is completely ignored. This is not realistic, as an actual 
communication system suffers from noise corruption and also interference from other 
communication systems working in the same frequency band. 
 
A simple single level threshold detection scheme is used, and one could easily see that the 
performance of the threshold detector is significantly inferior to that of a correlator type 
receiver. Threshold detection mainly relies on level detection while a correlator receiver 
relies on energy detection. In the event a pulse is corrupted by a noisy channel, a data bit ‘1’ 
could be missed, while some spurious noise could exceed the threshold level and lead 
to an erroneous detection of a data bit ‘1’. Since a unified model for the threshold detector is 
not yet available, it would not be trivial to simulate the BER performance. Instead, 
a fixed error probability p is assigned as the probability that one pulse is 
corrupted by noise and wrongly identified. Without any modulation techniques, the error 







3.1 DSSS technique 
 
One popular modulation technique in telecommunication systems is DSSS, the 
direct sequence spread spectrum technique. DSSS has been used widely 
in CDMA systems, where it allows multiple access for multiple users. In DSSS, the data 
stream is multiplied (encoded) by a pseudo noise (PN) sequence before it is transmitted. 
This noise signal is a pseudorandom sequence of 1 and −1 values, at a frequency much 
higher than that of the original signal, thereby spreading the energy of the original signal 
into a much wider band. 
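
As a generic illustration of this multiplication, the sketch below spreads a bit stream with a short sequence of 1 and −1 values and recovers it by correlation. It is a minimal sketch of textbook DSSS, not the scheme implemented in this work; the sequence PN and the function names are arbitrary choices made for the example.

```python
PN = [1, -1, 1, 1, -1, -1, 1]   # arbitrary illustrative chip sequence


def spread(data_bits, pn=PN):
    """Map each data bit to +1/-1 and multiply it by the whole PN sequence."""
    chips = []
    for bit in data_bits:
        symbol = 1 if bit else -1
        chips.extend(symbol * c for c in pn)
    return chips


def despread(chips, pn=PN):
    """Correlate each block of len(pn) chips with the PN sequence."""
    n = len(pn)
    bits = []
    for i in range(0, len(chips), n):
        corr = sum(c * p for c, p in zip(chips[i:i + n], pn))
        bits.append(1 if corr > 0 else 0)
    return bits


data = [1, 0, 1, 1]
tx = spread(data)
print(len(tx))        # chip count is len(PN) times the number of data bits
print(despread(tx))
```

The chip stream is len(PN) times longer than the data stream, which is the rate penalty discussed later for the length 11 sequence.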
 
DSSS is used in CDMA systems mainly to allow multiple access, since each user is 
assigned a distinct PN sequence that appears as noise to other users upon 
correlation. In our synchronization scheme, however, DSSS is used to reduce the 
decoding error probability of the received data as well as to facilitate the proposed 
novel synchronization scheme. Instead of an ordinary PN sequence, a sequence 
with good autocorrelation properties is used, which is known as the Barker sequence. 
 
3.2 Barker sequence 
 
Fig. 3.1 Barker code autocorrelation illustration 
 
The Barker code is among those codes with superior autocorrelation properties. A bitwise 
shifted version of the Barker code has a poor correlation sum with its original version. 
One can easily verify that all the bitwise shifted versions have a correlation sum of only 5 
for the Barker code of length 11. Fig. 3.1 illustrates the correlation process using simple 
digital logic XNOR gates. 
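
This property can be checked numerically. The sketch below uses the length 11 sequence 11100010010 quoted in Section 3.21 and counts XNOR agreements between the sequence and each of its cyclic shifts; the helper names are illustrative.

```python
BARKER11 = [1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0]


def xnor_sum(a, b):
    """Correlation sum: number of positions where the two bits agree."""
    return sum(1 for x, y in zip(a, b) if x == y)


def cyclic_shift(seq, k):
    """Rotate the sequence left by k positions."""
    return seq[k:] + seq[:k]


sums = [xnor_sum(BARKER11, cyclic_shift(BARKER11, k)) for k in range(11)]
print(sums)  # [11, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
```

Only the unshifted sequence reaches the full sum of 11; every cyclic shift gives 5.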
 
One could also choose a Barker code of length 7 for data encoding, but in our 
implementation a sequence length of 11 is chosen and the system is designed according 
to the 11 bit Barker code modulation. A single data bit is encoded into a 
corresponding Barker sequence of length 11. 
 
Fig. 3.2 Barker code template correlation with corrupted received sequence 
 
The synchronization scheme should provide a certain degree of tolerance for noise. 
A perfect correlation sum of 11 is not achievable in realistic channels. Upon 
synchronization, if we set a correlation sum threshold of 8 to identify 
valid data, meaning we allow 3 bits inside the Barker sequence to differ from the original 
sequence, the decoding error probability would drop to p^3 (p<1). As shown in Fig. 3.2, 
there are 3 bits in the sequence that are wrongly identified, which gives a correlation sum of 
8. This is still perceived as valid received data since we have allowed a tolerance of 3. The 
tolerance could be set by simple digital logic to cater to different channel conditions. 
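
The effect of the tolerance can be illustrated with a short sketch. It assumes the correlation sum is the count of XNOR agreements with the 11 bit template and that a sequence is accepted when the sum is at least the threshold; the positions of the flipped bits are arbitrary and the function names are illustrative.

```python
BARKER11 = [1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0]


def correlation_sum(received, template=BARKER11):
    """Count the positions where the received bits match the template."""
    return sum(1 for r, t in zip(received, template) if r == t)


def is_valid(received, threshold=8):
    """Accept the sequence when the correlation sum reaches the threshold."""
    return correlation_sum(received) >= threshold


def flip(bits, positions):
    """Return a copy of bits with the given positions inverted."""
    return [1 - b if i in positions else b for i, b in enumerate(bits)]


three_errors = flip(BARKER11, {1, 5, 9})
four_errors = flip(BARKER11, {1, 5, 9, 10})
print(correlation_sum(three_errors), is_valid(three_errors))  # 8 True
print(correlation_sum(four_errors), is_valid(four_errors))    # 7 False
```

Three corrupted bits still leave a correlation sum of 8 and the sequence is accepted, while a fourth error drops the sum below the threshold.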
 
3.21 Choice of Barker Code 
 
For a Barker sequence of length 11, the only available sequence is 11100010010. Thus, we 
choose this sequence to encode a data bit ‘0’. For a data bit ‘1’, a simple choice would be 
a cyclically shifted version of the initial sequence. For instance, one could choose 11000100101 
to encode a data bit ‘1’, since the cyclically shifted version has poor correlation with the 
initial sequence, which makes data bits ‘1’ and ‘0’ distinguishable from one another. 
 
However, the choice is not that simple and straightforward. The choice of Barker 
sequence will be revisited in the next Chapter, where the synchronization scheme is 
presented. 
 
3.3 Proposed Synchronization Scheme 
 
3.31 The Search Mechanism 
 
The Barker sequence, with its superior autocorrelation properties, could be used as a PN 
sequence similar to the case in CDMA systems. However, in our implementation, the Barker 
sequence serves to add error correcting capability, which enhances the simple 
threshold detection scheme.  
 
Before synchronization, the receiver has no prior information about the start of a 
sequence, as the transmitter and receiver are not synchronous: they cannot be 
switched on simultaneously at one instant. In Barker code modulation (BCM), each data 
bit is multiplied by a corresponding Barker sequence before it is transmitted. The receiver 
may miss the 1st bit or a few bits in the Barker sequence. However, if we 
perform a search algorithm that looks for a perfect match of the sequence, we essentially 
find the starting bit of a sequence. This is where the correlation property of the Barker code 
comes in. As discussed earlier, a cyclic shift of a Barker code has a poor correlation sum 
with its original sequence. In the case when any bits in the sequence are missed by the 
receiver, the receiver will continuously search bit by bit until it perfectly aligns with the 




Fig. 3.3 Search algorithm using barker sequence correlation 
[Figure annotations: received bit stream …… 1 1 0 0 0 1 0 0 1 0 1 1 1 0 0 0 1 0 0 1 0. 
Labels: "First N bits missed and receiver start searching here"; "Digital correlation to 
check 11 bits"; "Correlation sum smaller"; "Check next possible location"; "Correct data 
frame for data bit 0 located".] 




As shown in Fig. 3.3, suppose the receiver is switched on after the transmitter has 
transmitted a few bits of a valid Barker sequence. In such a case, the receiver slides bit by 
bit while the correlation sum is monitored. Once the receiver has found the correct timing, 
the starting location of a valid data bit is confirmed and the subsequent decoding can then start. 
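
The bit by bit search can be sketched in software as follows. This is a minimal sketch assuming an error free bit stream and a template for data bit ‘0’ only; the stream reproduces the situation of Fig. 3.3, where the first bit of one sequence is missed and a complete sequence follows. The function name find_frame_start is illustrative.

```python
BARKER11 = [1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0]   # template for data bit '0'


def find_frame_start(stream, template=BARKER11, threshold=11):
    """Slide an 11 bit window over the stream one bit at a time and return
    the first offset whose correlation sum reaches the threshold."""
    n = len(template)
    for offset in range(len(stream) - n + 1):
        window = stream[offset:offset + n]
        corr = sum(1 for a, b in zip(window, template) if a == b)
        if corr >= threshold:
            return offset
    return None   # no aligned sequence found in this stream


# The receiver misses the first bit of one sequence, then a full one follows.
stream = BARKER11[1:] + BARKER11
print(find_frame_start(stream))  # 10
```

Every misaligned window in this stream is a cyclic shift of the template and yields a correlation sum of 5, so the search only stops at offset 10, where the complete sequence begins.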
 
3.32 Synchronization Algorithm 
 
The challenge of synchronization in UWB systems comes naturally from the high timing 
resolution required due to the extremely narrow pulses transmitted. In a traditional UWB IR 
system, an exhaustive timing search algorithm is performed. In the case of delay line 
searching in [10], it could take many clock cycles before synchronization is acquired due 
to the ultra small duty cycle of a UWB pulse. In the case of a crude RAKE search in [12], 
the hardware overhead is too costly to build into a low power mobile device. 
 
However, in the proposed implementation, no exhaustive timing searching is required as 
the pulse capture block waits for the UWB pulse and it only consumes power when a 
pulse is received due to the asynchronous nature. The BCM proposed not only enhances 
the error correction in decoding, but also facilitates the bit wise slide search mechanism 
to locate the starting bit of a valid barker code sequence. 
 
One trade-off of the spread spectrum technique is that the actual bit rate for transmission 
has to be a multiple N of the data rate, where N is the length of the sequence used. In our 
implementation, the bit rate is 11 times the effective data rate since length 11 
Barker code modulation is adopted. 
 
With the above algorithm in mind, the following flowchart in Fig. 3.4 illustrates the 
implementation of the algorithm. 
 
 

















Fig. 3.4 Receiver synchronization algorithm flowchart 
 
 
3.33 Declaration of synchronization 
 
 
In our implementation, we declare that synchronization is achieved after two consecutive 
valid data bit ‘0’s are received. One may ask what happens if the noise happens to 
trigger the threshold detector to give the exact pattern of the Barker sequence chosen for a 




























We provide the following justifications: 
 
1. If one were to compute the probability of the noise pattern being exactly the 
same as the Barker sequence, the highest probability would be 0.5^11. As 
mentioned earlier, since noise tolerance is allowed in the proposed scheme, the 
threshold correlation sum one could use might be 8. This would translate to a 
probability of 0.5^8, which is still very small. 
 
2. Furthermore, the synchronization is only declared after two consecutive valid data 
bit ‘0’s are received. This translates to an even smaller probability of false 
synchronization declaration. 
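
As a rough cross-check of these figures, the sketch below assumes that noise makes each received bit agree with the template independently with probability 0.5, and that two consecutive checks are independent. Both are simplifying assumptions and the function name is illustrative. It counts every 11 bit pattern whose correlation sum reaches a given threshold, so for a threshold below 11 it gives a larger value than a single power of 0.5; requiring two consecutive valid bits then reduces it again.

```python
from math import comb


def false_match_probability(threshold, length=11):
    """Probability that at least `threshold` of `length` random bits
    agree with the template."""
    favourable = sum(comb(length, k) for k in range(threshold, length + 1))
    return favourable / 2 ** length


for t in (11, 8):
    single = false_match_probability(t)
    # Probability for one check, and for two consecutive independent checks.
    print(t, single, single ** 2)
```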
 
3.4 Bitwise Correlator 
 
 
Fig. 3.5 Gate level schematic for a XNOR gate 
 
The bitwise correlation could be done with a simple XNOR gate. As shown in Fig. 3.5, 






A    B    A XNOR B 
0    0    1 
0    1    0 
1    0    0 
1    1    1 
 
Table 1 XNOR truth table 
 
The XNOR gate compares the logic states of its two inputs. A logic one is output if the two 
states are the same and a logic zero is output when the two states are different. Since the 
correlation is done at the bit rate, which is not very high, this simple digital gate can be 
obtained directly from the technology library. 
 
3.5 Power saving feature 
 
One observation for this correlation process is that before synchronization is achieved, 
the correlation needs to be performed at the bit rate, which is 11 times the effective data 
rate. This could cause power overhead as the logic operations are running at 11 times the 
data rate. In fact, after synchronization is achieved, it is redundant to perform bitwise 
correlation at the bit rate. The correlation could instead be performed at the effective data 
rate to check the synchronization status. This could save significant power, as the power 
overhead is large if digital correlation is performed at a much higher rate. 
 
In order to achieve this power saving feature, it is necessary to have some modifications 




Firstly, since the aim is to achieve a digital correlation in one shot after synchronization is 
achieved, it is essential to latch the whole received sequence of length 11. Thus, storage 
registers are required. 
 
Secondly, some digital signal processing block must be present to determine the correct 
interval to perform the bitwise correlation. Some feedback mechanism should be 
provided to latch the received sequence and pass the whole sequence of length 11 to the 
digital correlation block at the correct timing interval. 
 
Thirdly, synchronization is declared after two consecutive valid data bit ‘0’s are received. 
Before synchronization, digital correlation for data bit ‘0’ is done at the bit rate while after 
synchronization, it is done at the much lower data rate. Since they are performed at 
different rates, two paths must be provided for digital correlation for data bit ‘0’. In order 
to save hardware overhead, the optimum solution is to have a multiplexer that selectively 










With the above design considerations, a modified version of the synchronization structure 
is proposed. As shown in Fig. 3.6, feedback is provided by the DSP block to the storage 
register so that it passes the entire 11 bits at the correct interval for digital correlation. 
This feedback signal is denoted as the End of Data (EOD) flag. The DSP block should also 
provide a data Validity Flag which serves as a synchronization indicator. This validity 
flag also serves as a selection input to the multiplexer to selectively pass the data stream 





Fig. 3.7 Block diagram for implemented receiver in Cadence 
 
Fig. 3.7 illustrates the entire receiver architecture without the front end low noise 




3.6 The DSP Block 
 
The DSP block should provide feedback to the storage registers as well as the multiplexer 
as discussed earlier. Also, it should have a functional block that computes the degree of 






3.61 The Binary Merge Adder 
 
Since the digital correlation is done by XNOR gates, the outputs are binary 0s and 1s 
which need to be added together to compute the correlation sum. The correlation 





Fig. 3.8 Cascading to form a 5 bit ripple carry adder 
 
 
The largest possible correlation sum is decimal 11 which is 1011 in binary format. Thus, 
the largest adder required is a 4-bit adder. Ripple carry adder has been adopted for its 
simplicity in implementation.  
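
The pairwise merging of the XNOR outputs can be sketched as follows. This is a behavioural sketch under the assumption that values are added two at a time and that a leftover value is passed unchanged to the next stage; it counts adders per stage but does not model the ripple carry structure itself. The function name merge_add is illustrative.

```python
def merge_add(bits):
    """Sum a list of 1 bit values by repeatedly adding neighbouring pairs.

    Returns the total and the number of adders used in each stage.
    """
    level = list(bits)
    adders_per_stage = []
    while len(level) > 1:
        merged = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        adders_per_stage.append(len(merged))
        if len(level) % 2:            # odd count: carry the last value forward
            merged.append(level[-1])
        level = merged
    return level[0], adders_per_stage


total, stages = merge_add([1] * 11)   # perfect match: all 11 XNOR outputs are 1
print(total, format(total, "04b"), stages)  # 11 1011 [5, 3, 1, 1]
```

For 11 inputs the model uses five adders in the first stage, three in the second and one in each of the last two stages, and the maximum sum of 11 fits in 4 bits.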
 
As illustrated in Fig. 3.8, a total of five 1-bit half adders, three 2-bit adders, one 3-bit and one 






Fig. 3.9 Gate level schematic for a 1bit half adder 
 
 
The basic building block of a 1-bit adder is shown in Fig. 3.9. The design of this adder is 
straightforward as it is not required to perform at very high speed. Before synchronization, 
this 5-bit adder works at the bit rate while after synchronization, this adder only works at 
the data rate which is 11 times slower than the bit rate. Thus, the proposed 
synchronization implementation not only saves power in the digital correlation blocks, 
but also in the adders. 
 
3.62 Threshold Select Block 
 
As mentioned in the previous Chapter, it is practically impossible to get a perfect 
correlation sum of 11 in realistic channels due to not only noise corruption but also 
interference from other communication systems. 
 
Thus, it is mandatory to provide some degree of allowance for the received bit pattern to 
differ slightly and yet still be regarded as valid data. Such a degree of tolerance is defined 
as the synchronization threshold.  
 
In terms of hardware, it is simple to implement based on the threshold that one selects for 
a specific channel. In our implementation, a threshold select block is implemented to 
cater for a threshold of 10 or 8 based on the logic value of the threshold select input. A 
1-bit threshold select input allows switching between 2 threshold levels. A lower 
threshold could be more suitable for a noisier channel. This threshold could be tuned by 
providing an external input. It could also be implemented as auto tuning based on the 
channel estimation performed by the receiver. More threshold levels could be 
accommodated with a slight increase in complexity since it is a purely digital 
implementation. 
 
Fig. 3.10 Schematic of correlation threshold select block 
 
 
Fig. 3.10 shows the implementation of the threshold selection function. The inputs BIT3 and 
BIT1 are from the correlation sum in binary format. As one can see, if Thresh_Sel is set 
at logic 0, the value of BIT3 will pass to the output. This indicates that we have 
selected a threshold of 8, since 1xxx (x denotes don’t care) in binary format 
is at least 8. If Thresh_Sel is set at logic 1, the value of the logic operation (BIT3 AND BIT1) 
will pass to the output. This would indicate that we have selected a threshold which is at 




The threshold select block outputs logic 1 when the correlation sum reaches or exceeds the 
preset level. Two threshold select blocks are required, for data bit ‘0’ and ‘1’ respectively. 
One could easily modify the above circuit to cater for different thresholds. 
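
The selection logic can be checked with a few lines of code. The sketch below is a direct transcription of the two cases described for Fig. 3.10, assuming BIT0 is the least significant bit of the 4 bit correlation sum; the function name threshold_select is illustrative.

```python
def threshold_select(corr_sum, thresh_sel):
    """Return 1 when the correlation sum passes the selected threshold.

    thresh_sel = 0 passes BIT3, thresh_sel = 1 passes BIT3 AND BIT1.
    """
    bit3 = (corr_sum >> 3) & 1
    bit1 = (corr_sum >> 1) & 1
    return bit3 if thresh_sel == 0 else (bit3 & bit1)


# Correlation sums range from 0 to 11 for an 11 bit sequence.
passing_low = [s for s in range(12) if threshold_select(s, 0)]
passing_high = [s for s in range(12) if threshold_select(s, 1)]
print(passing_low)   # [8, 9, 10, 11]
print(passing_high)  # [10, 11]
```

Over the possible sums of 0 to 11, the two settings therefore act as thresholds of 8 and 10.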
 




Fig. 3.11 Schematic of the synchronization core block 
 
 
Fig. 3.11 shows the schematic of the proposed synchronization core. It consists of a 








3.71 Synchronization Comparator 
 
Based on the output of threshold select block, a synchronization comparator is needed to 
decide if synchronization is achieved. This could be easily done based on the threshold 
select block outputs. If both blocks give logic 0, it indicates neither a valid bit 0 nor a 
valid bit 1 is received. Thus, synchronization is not achieved. 
 
The synchronization comparator should provide a logical output to the next stage 
indicating if a valid data bit is received. If a bit 0 is received before synchronization, 
pre-synchronization should be declared, as the next valid data bit 0 would declare a true 
synchronization. This pre-synchronization indicator is useful as it also allows the 
implementation of frame searching. Since pre-synchronization indicates the possible 
presence of valid data at the present timing, we can then locate the timing of the next 
possible valid data bit. This facilitates the implementation of digital correlation and 
addition at the data rate rather than the bit rate, which is 11 times higher. 
 
After synchronization, the synchronization comparator block also serves to check if the 
synchronization status is maintained at each data interval. Thus, synchronization tracking 
is provided as the synchronization status is monitored at runtime. This tracking function 
comes as a bonus and does not incur any extra hardware cost. 
 
It should be noted that this synchronization block must be able to 
distinguish whether the receiver is in a state before or after synchronization. This is 
required as the condition for synchronization declaration is to have two consecutive valid 
data bit ‘0’s. Thus before synchronization, if a data bit 1 is received after a data bit 0, 
meaning a data sequence of 01, synchronization should not be considered achieved, and the 
pre-synchronization output should be at logic 0. However, in the case when 
synchronization is already achieved, if a data bit 1 is received after a data bit 0, the 
pre-synchronization indicator should be at logic 1 since synchronization is already achieved 
and should be maintained. As such, some feedback must be provided for the 
synchronization comparator to distinguish whether the receiver is in a state before 
or after synchronization. 
 
We name this feedback signal feedback_control. It serves as an indicator of whether 
synchronization is still maintained and the counter should remain enabled. Pre_Sync is an 
output provided by the synchronization comparator to the ASIC counter in the next 
stage to determine if the counter should be enabled. 
 
One can deduce the logic expression: 
 
Pre_Sync = valid data bit 0 + feedback_control + bit1 × feedback_control* 
 
The asterisk indicates the previous state of that variable. We interpret the above expression 
as follows: if a valid data bit 0 is received, the counter should be enabled. Also, if feedback 
control is logic 1, or if a valid data bit is received and the previous state of feedback control 
is logic 1, then the counter should remain enabled. This enables the synchronization 
comparator to know the status of synchronization and keep the counter enabled if 
consecutive valid data bits are received. 
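
The expression can be transcribed directly into a small function to check the cases discussed above. This is a sketch under the assumption that bit1 denotes the valid data bit 1 flag from the second threshold select block; the argument names are illustrative and the timing of the real signals is not modelled.

```python
def pre_sync(valid_bit0, valid_bit1, feedback_control, feedback_control_prev):
    """Pre_Sync = valid data bit 0 + feedback_control
                  + bit1 x feedback_control* (previous state)."""
    return valid_bit0 | feedback_control | (valid_bit1 & feedback_control_prev)


# Before synchronization: a valid bit 1 alone must not enable the counter.
print(pre_sync(0, 1, 0, 0))  # 0
# Before synchronization: a valid bit 0 enables the counter.
print(pre_sync(1, 0, 0, 0))  # 1
# After synchronization: a valid bit 1 keeps the counter enabled.
print(pre_sync(0, 1, 0, 1))  # 1
```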
 
 
Fig. 3.12 Gate level schematic of the synchronization comparator 
 
 
Fig. 3.12 shows the realization of the synchronization comparator in Cadence. 
 
3.72 Data Validity block 
 





The data validity block consists of the following functional blocks: the ASIC 
counter, a counter control block, a data reset block and an EOD indicator block, as shown 
in Fig. 3.13. 
 
3.721 The ASIC counter 
 
In order to determine the correct correlation timing interval, the easiest way to implement 














Fig. 3.14 Timing diagram for critical signals in the proposed scheme 
 
[Figure annotations: "Counter Output remain at 0 before a valid data bit 0"; "Count start 
from 0 when a valid data bit 0 is received"; "Count from 1 to 11" (shown twice); "Check is 
performed here to"] 




Fig. 3.14 illustrates the timing diagram of some critical signal paths of the proposed 
synchronization scheme. The counter output is initialized to 0 and it should remain at 0 
before any valid data bit 0 is received. Once a valid data bit 0 is received, the Pre_Sync flag 
becomes logic 1, indicating that the receiver is entering the pre-synchronization stage. 
Once Pre_Sync turns logic 1, the counter is enabled and the counting process starts in the 
next clock cycle. The counter counts from 0001 to 1011. At the count of 0001, 
feedback_control should change to logic 1. This is crucial because, before synchronization, 
End of Data is not yet employed. Thus, in the next clock cycle, the correlation sum will fall 
below the threshold since the bits are misaligned, and the valid data bit 0 flag will change 
back to logic 0. In order to maintain the enable status of the counter, feedback control needs 
to be changed to logic 1. As such, the counter will remain enabled regardless of whether the 
receiver has just entered the pre-synchronization stage or has maintained the synchronization status. 
 
At a counter output of 1011, which is 11, all 11 bits of an encoded data bit are available and 
the correlation check is done during this clock cycle. If the correlation sum exceeds the 
preset threshold, the counter restarts its counting sequence from 1 to 11 again. 
 
Thus, we end up with an ASIC counter that has special counting sequence of 0 to 11 and 



















For the implementation of the required counter, a typical JK flip flop is used as shown in 
Fig. 3.15. Its logical state table is also shown, where Q+ means the next state and Q- 
means the previous state of the JK flip flop output. 
 
Q3 Q2 Q1 Q0 Q3+ Q2+ Q1+ Q0+ J3 K3 J2 K2 J1 K1 J0 K0 
0 0 0 0 0 0 0 1 0 x 0 x 0 x 1 x 
0 0 0 1 0 0 1 0 0 x 0 x 1 x x 1 
0 0 1 0 0 0 1 1 0 x 0 x x 0 1 x 
0 0 1 1 0 1 0 0 0 x 1 x x 1 x 1 
0 1 0 0 0 1 0 1 0 x x 0 0 x 1 x 
0 1 0 1 0 1 1 0 0 x x 0 1 x x 1 
0 1 1 0 0 1 1 1 0 x x 0 x 0 1 x 
0 1 1 1 1 0 0 0 1 x x 1 x 1 x 1 
1 0 0 0 1 0 0 1 x 0 0 x 0 x 1 x 
1 0 0 1 1 0 1 0 x 0 0 x 1 x x 1 
1 0 1 0 1 0 1 1 x 0 0 x x 0 1 x 
1 0 1 1 0 0 0 1 x 1 0 x x 1 x 0 
 
Table 2 JK flip flop input derived from the special counting sequence 
 
According to the counting sequence derived earlier, we arrive at the logic state table in 
Table 2.  
 
Thus, we have the following logical expressions 
J3 = Q2×Q1×Q0 and K3 = Q1×Q0 
J2 = Not_Q3×Q1×Q0 and K2 = Q1×Q0 
J1 = Not_Q1×Q0 and K1 = Q1×Q0 
 
JK flip flop state table: 
J K Q+ 
0 0 Q- 
0 1 0 
1 0 1 




Fig. 3.16 Gate level schematic of the implemented counter 
 
 
Using the logical expressions derived, the counter is implemented according to the 
schematic in Fig. 3.16. 
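
As a quick check of the derivation, the following minimal Python sketch steps a behavioural model of four JK flip flops through the expressions above. The expressions for J0 and K0 are not listed above, so J0 = 1 and K0 = NOT(Q3×Q1) are assumed here as one choice that is consistent with Table 2; the function names are illustrative only and do not correspond to any block in the schematic.

```python
def jk_next(q, j, k):
    """Next state of a JK flip flop: hold, reset, set or toggle."""
    if j and k:
        return 1 - q
    if j:
        return 1
    if k:
        return 0
    return q


def counter_step(q3, q2, q1, q0):
    """Advance the 4-bit counter by one clock edge."""
    j3, k3 = q2 & q1 & q0, q1 & q0
    j2, k2 = (1 - q3) & q1 & q0, q1 & q0
    j1, k1 = (1 - q1) & q0, q1 & q0
    # Assumption: J0 and K0 chosen to satisfy Table 2 (K0 is 0 only at 1011).
    j0, k0 = 1, 1 - (q3 & q1)
    return (jk_next(q3, j3, k3), jk_next(q2, j2, k2),
            jk_next(q1, j1, k1), jk_next(q0, j0, k0))


def count_sequence(steps, start=(0, 0, 0, 0)):
    """Decimal counter values visited over `steps` clock edges."""
    state, values = start, []
    for _ in range(steps):
        state = counter_step(*state)
        q3, q2, q1, q0 = state
        values.append(8 * q3 + 4 * q2 + 2 * q1 + q0)
    return values


print(count_sequence(13))
```

Starting from the idle state 0000, the model should leave 0 once, count up to 11 and then wrap back to 1 rather than 0, which is the special counting sequence required above.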
 
3.722 The End of Data indicator 
 
As mentioned earlier, in order to achieve digital correlation and correlation sum 
computation in one shot, an End of Data (EOD) indicator is required so that the storage 
registers know when to latch and pass all 11 ready bits to the digital correlation block. 
The EOD indicator block uses the counter output to judge whether all 11 bits have been 
received and are ready for collection. Thus, its logic function is easily deduced as 

EOD = BIT3×NOT_BIT2×BIT1×BIT0 where BIT0 to BIT3 are the counter outputs. 
Thus, EOD will only be logic 1 when the count reaches binary 1011, which is 11, since the 
length of the Barker sequence used is 11. 
EOD serves as a sampling clock for the storage registers to fetch all 11 ready bits 
from the shift register in one shot. Before EOD is fed into the storage registers, it must 
be delayed by some margin to ensure that, after the rising edge of the clock signal, the 
data has already been latched by the shift registers before it is sampled by the storage 
registers. 
 
EOD is also useful in the data decoding block, as the data is only valid after EOD is 
declared. Another delay must be provided here, because EOD becomes logic 1 one counter 
delay after the clock rises, whereas the received data still needs to be latched, stored 
and compared with the template Barker sequence. Thus, a sufficient delay must be provided 
to ensure that EOD is declared to the data decoding block only after all the above 
mentioned logic operations are completed. 
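
The EOD expression can be checked exhaustively with a minimal Python sketch. It models only the combinational logic; the delay elements discussed above are not represented, and the function names are illustrative.

```python
def eod(bit3, bit2, bit1, bit0):
    """EOD = BIT3 x NOT_BIT2 x BIT1 x BIT0 on the counter outputs."""
    return bit3 & (1 - bit2) & bit1 & bit0


def counts_asserting_eod():
    """All 4-bit counter values for which EOD evaluates to logic 1."""
    return [n for n in range(16)
            if eod((n >> 3) & 1, (n >> 2) & 1, (n >> 1) & 1, n & 1)]


print(counts_asserting_eod())
```

Only the count 1011 should appear in the list, in line with the length of the Barker sequence.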
 
3.723 Data Reset Block 
 
 
Fig. 3.17 Gate level schematic for data reset block 
 
 
The received data sequence in the shift register needs to be reset on a regular basis. Once 
a sequence is received and compared with the pre-stored Barker sequence, it needs to be 
cleared before the next sequence arrives. Thus, it is chosen to be cleared in count 1, when 
the previous data sequence has already been used and is no longer needed. The reset signal 
of the shift registers is active low and thus the derived logic expression for DATA_RST is 
DATA_RST = Bit3 + Bit2 + Bit1 + Not_Bit0 + clk_Bar 
which resets the shift registers during the high phase of the clock in count 1. 
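
A minimal Python sketch of the DATA_RST expression, assuming ideal logic levels and ignoring gate delays, confirms when the active low reset is asserted. The helper names are illustrative.

```python
def data_rst(bit3, bit2, bit1, bit0, clk):
    """DATA_RST = Bit3 + Bit2 + Bit1 + Not_Bit0 + clk_Bar (active low)."""
    return bit3 | bit2 | bit1 | (1 - bit0) | (1 - clk)


def reset_conditions():
    """Every (count, clk) pair for which DATA_RST is asserted (logic 0)."""
    hits = []
    for count in range(16):
        bits = [(count >> i) & 1 for i in (3, 2, 1, 0)]
        for clk in (0, 1):
            if data_rst(*bits, clk) == 0:
                hits.append((count, clk))
    return hits


print(reset_conditions())
```

The only asserted condition should be count 1 with the clock high, so the shift registers are cleared once per code word.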
 
3.724 Counter control block 
 
As mentioned earlier, the synchronization comparator block requires some form of 
feedback to ensure the counter in the next stage is enabled when synchronization is 
maintained. Pre_Sync from the synchronization comparator serves to enable the counter 
while the count is increasing, before all 11 bits in a Barker sequence are received. Thus, 
once the counter is enabled, it should always count to 11 before any decision can be 
made. Based on the counter’s output, before it reaches a binary count of 1011, the counter 
should always be enabled. 

Pre_Sync will be logic 1 as long as feedback_control is logic 1. Thus, we can 
derive the following truth table for feedback_control: 
 
             B3 B2
B1 B0     00   01   11   10
 00        0    1    1    1
 01        1    1    1    1
 11        1    1    1    0
 10        1    1    1    1
 





When the counter output is 0, the counter has not been enabled yet, thus 
feedback_control should remain at 0. For any other counter output, the counter has 
to keep counting, thus feedback_control should be at logic 1. When the 
counter reaches a count of 1011, which is 11, feedback_control should be set to logic 0, 
as Pre_Sync should now rely on whether the received data is valid to determine the 
synchronization status and whether the counter should remain enabled. 
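
The truth table for feedback_control can be captured in a short Python sketch. The Boolean expression used below, B2 + (B3 XOR B1) + (B1 XOR B0), is our own assumption: it is one expression that reproduces the map, with the unused counts 12 to 15 treated as logic 1 as in the table, and it is not necessarily the form used in the implemented circuit.

```python
# Gray-code ordering used for both the rows (B1 B0) and the columns (B3 B2).
GRAY = [(0, 0), (0, 1), (1, 1), (1, 0)]

# Truth table for feedback_control, one row per B1 B0 value.
KMAP = [
    [0, 1, 1, 1],  # B1 B0 = 00
    [1, 1, 1, 1],  # B1 B0 = 01
    [1, 1, 1, 0],  # B1 B0 = 11
    [1, 1, 1, 1],  # B1 B0 = 10
]


def feedback_from_map(b3, b2, b1, b0):
    """Look up feedback_control directly in the truth table."""
    return KMAP[GRAY.index((b1, b0))][GRAY.index((b3, b2))]


def feedback_control(b3, b2, b1, b0):
    """Assumed expression: B2 + (B3 xor B1) + (B1 xor B0)."""
    return b2 | (b3 ^ b1) | (b1 ^ b0)


def disabled_counts():
    """Counter values for which feedback_control is logic 0."""
    return [n for n in range(16)
            if feedback_control((n >> 3) & 1, (n >> 2) & 1,
                                (n >> 1) & 1, n & 1) == 0]


print(disabled_counts())
```

The expression should be 0 only at count 0 (counter idle) and count 11 (decision point), matching the two zero entries of the table.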
 
3.8 Data decoder block 
 
 
Fig. 3.18 Gate level schematic for data decoder block 
 
 
Fig. 3.18 shows the schematic diagram of the data decoder. The EOD (END) is delayed 
as mentioned earlier. Data Bit0 and Bit1 are latched by EOD.  
 
Bit 0 Received | Bit 1 Received | Output Data | Validity
       0       |       0        |      0      |    0
       0       |       1        |      1      |    1
       1       |       0        |      0      |    1
       1       |       1        |      0      |    0
 




The table above shows the truth table of the data decoding process and thus, 
Output Data = Bit1×NOT_Bit0 
Validity = Bit1 ⊕ Bit0 (exclusive OR, since Validity is 0 when both flags are 1) 
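
The decoding logic can be checked against the truth table with a minimal Python sketch. In the table, Validity is 1 only when exactly one of the two flags is set, so it is modelled here as an exclusive OR; the function and variable names are illustrative.

```python
def decode(bit0_received, bit1_received):
    """Return (output_data, validity) for the two latched correlation flags."""
    output_data = bit1_received & (1 - bit0_received)
    validity = bit0_received ^ bit1_received
    return output_data, validity


# Rows of the truth table: (Bit 0 Received, Bit 1 Received, Output Data, Validity)
TRUTH_TABLE = [
    (0, 0, 0, 0),
    (0, 1, 1, 1),
    (1, 0, 0, 1),
    (1, 1, 0, 0),
]

for b0, b1, out, valid in TRUTH_TABLE:
    print(b0, b1, decode(b0, b1) == (out, valid))
```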
 
3.9 The choice of Barker sequence revisited 
 
As discussed in the previous Chapter, there is only one Barker sequence of length 11 
available, so one could choose the original sequence for data bit ‘0’ 
and a cyclically shifted version of it for data bit ‘1’. 

Consider the case when a bit ‘1’ is transmitted immediately after a bit ‘0’. 
If we choose 11100010010 to encode bit ‘0’ and 11000100101 to encode bit ‘1’, we 
would have a sequence of 1110001001011000100101 for a data sequence of ‘01’. 
 
 
Fig. 3.19 Illustration of possible wrong location of data frame 
 
 
Transmitted bit stream for data sequence ‘01’: 
1 1 1 0 0 0 1 0 0 1 0 1 1 0 0 0 1 0 0 1 0 1 
[Fig. 3.19 annotations: 1st bit missed and receiver starts searching here; correct data 
frame for data bit 0; correct data frame for data bit 1; wrong data frame decoded as a 
data bit 0, as its correlation sum is 10] 

As shown in Fig. 3.19, if the receiver misses some bits of the first sequence, then when it 
starts to locate the starting bit of a valid bit ‘0’ it has no prior knowledge of the 
correct frame window, so it could decode a data bit 0 in the wrong time frame. 
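
The false lock can be reproduced with a minimal Python sketch that slides an 11-bit window over the ‘01’ stream of Fig. 3.19, taking 11100010010 for bit 0 and its cyclic shift 11000100101 for bit 1. The correlation sum is assumed here to be the number of positions in which the window agrees with the template, which is consistent with the maximum of 11 and the value of 10 quoted in the figure.

```python
BIT0_CODE = "11100010010"
BIT1_CODE = "11000100101"  # one-position cyclic shift of BIT0_CODE


def correlation_sum(window, template):
    """Number of positions in which window and template agree."""
    return sum(1 for a, b in zip(window, template) if a == b)


def sliding_sums(stream, template):
    """Correlation sum of every full-length window of the stream."""
    n = len(template)
    return [correlation_sum(stream[i:i + n], template)
            for i in range(len(stream) - n + 1)]


stream = BIT0_CODE + BIT1_CODE          # data sequence '01'
sums = sliding_sums(stream, BIT0_CODE)  # search for a data bit 0
for offset, value in enumerate(sums):
    print(offset, value)
```

Offset 0 is the correct frame for data bit 0. The window starting at offset 10, which straddles the two frames, differs from the bit 0 template in a single position, so any correlation threshold of 10 or below would accept it as a data bit 0.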
 
To solve this problem, a novel solution is proposed. Since bit pattern 11100010010 is 
used to encode a bit ‘0’, if we negate every bit in the pattern we end up with 
00011101101, which is another Barker sequence with the same autocorrelation property. 
The bitwise correlation between 11100010010 and 00011101101 is zero, and a cyclic shift 
of 11100010010 agrees with the original sequence in only 5 positions, so it agrees with 
00011101101 in the remaining 6. We would therefore expect any cyclic shift of 
11100010010 to have a correlation sum of 6 with 00011101101. For instance, one may 
easily verify that 11000100101, which is a cyclic shift of 11100010010, has a correlation 
of only 6 with 00011101101. 
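
This property can be verified for every cyclic shift with a minimal Python sketch. As before, the correlation sum is assumed to be the number of agreeing bit positions, and the helper names are illustrative.

```python
BIT0_CODE = "11100010010"


def negate(code):
    """Invert every bit of a binary string."""
    return "".join("1" if c == "0" else "0" for c in code)


def cyclic_shifts(code):
    """All non-trivial cyclic shifts of the code."""
    return [code[i:] + code[:i] for i in range(1, len(code))]


def correlation_sum(a, b):
    """Number of positions in which a and b agree."""
    return sum(1 for x, y in zip(a, b) if x == y)


NEGATED_CODE = negate(BIT0_CODE)
shift_sums = [correlation_sum(s, NEGATED_CODE) for s in cyclic_shifts(BIT0_CODE)]
print(NEGATED_CODE, correlation_sum(BIT0_CODE, NEGATED_CODE), shift_sums)
```

All ten shifts should give a correlation sum of 6 against the negated sequence, well below the aligned value of 11.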
 
Thus, we could avoid the above mentioned issue and we conclude that 11100010010 is 

3.10 Proposed receiver structure and layout diagram 









The above receiver architecture in Fig. 3.20 is implemented in standard 0.35µm CMOS 
technology. The layout dimension is only 570µm by 335µm, which translates to a silicon 























Fig. 4.1 Simulation input to the synchronization scheme 
 
Fig. 4.1 shows the output from the threshold detector, which also serves as the input 
pulses to the synchronization scheme. The pulse width is about 1ns, which corresponds to our 


















Fig. 4.2 shows the post layout simulated outputs from the storage shift register (b0 to b10) 
when the pulse data input is modulated by a Barker sequence. The input pulses are return to 
zero (RZ) data while the output from the shift registers is non return to zero (NRZ) data. As 
illustrated above, the pulse input at radio frequency is successfully converted to NRZ data 









4.2 Simulation for synchronization and re-synchronization  
 
 
Fig. 4.3 Post layout simulation result for proposed synchronization architecture 
 
 
Fig. 4.3 shows the post layout simulation result for the synchronization scheme 
implemented in 0.35µm standard CMOS technology. The input data are Barker code 
modulated. The first valid data bit ‘0’ triggers the receiver into a pre-synchronization 
state, and a subsequent data bit ‘0’ causes the Data Validity flag to become logic 1, which 
indicates successful synchronization. Upon synchronization, the DSP in the 
synchronization scheme decodes the received data while monitoring the synchronization 







Fig. 4.4 Post layout simulation result showing sync. lost and re-sync. 
 
When the data pattern is missing or its correlation sum falls below the preset threshold 
for the channel, a loss of synchronization is declared. As shown in Fig. 4.4, when there 
are missing tones between the two data bits, a loss of synchronization is indicated by the 
Data Validity flag. This demonstrates the synchronization tracking capability during 
runtime, for which no extra hardware or software overhead is required. 
 
When synchronization is lost, the receiver enters the search state again and waits for a 
valid data bit 0 to trigger it back into the pre-synchronization state. Once a valid data 
bit is received after a valid data bit 0, the receiver re-synchronizes and data decoding 
resumes while the Data Validity flag returns to logic 1. As confirmed by the post layout 
simulation result, the proposed synchronization scheme provides re-synchronization 
capability making use of the existing data pattern. No repeated preamble is required for 
synchronization tracking or for re-synchronization after a loss of synchronization. 
 




Fig. 4.5 Layout of the fabricated PCB using Altium Designer 
 
The fabricated chip is mounted on an FR4 printed circuit board as shown in Fig. 4.5. FR4 
is capable of handling frequencies below 1GHz. The connectors used are simply SMA 
connectors for RF applications. 
  





Fig. 4.6 Measurement result for the proposed synchronization scheme 
 
 
In order to test the functionality of the fabricated chip, a logic analyzer is used for 
Barker code programming due to the absence of a Barker code encoding transmitter. As 
illustrated in Fig. 4.6, the measurement results further confirm the feasibility of the 
proposed scheme. The synchronization, tracking as well as re-synchronization 






4.5 Merits of the proposed synchronization scheme 
 
1. Low power (Post layout simulation shows that the average current consumption is 
130µA with a supply of 3V using 0.35µm AMS technology. If implemented in 0.13µm 
IBM technology with a supply of 1V, even lower power can be achieved.) 
 
2. Fast synchronization (The worst case occurs when the first bit in a Barker 
sequence is missed, so that a search time of another 10 bits is required. Thus the 
worst case synchronization time is less than one data bit period.) 
 
3. Synchronization tracking capability (Synchronization is tracked during runtime at 
no extra cost, i.e. the received data are used to track synchronization.) 
 
4. Ability to re-synchronize once synchronization is lost (Synchronization can be 
regained once a data bit 0 is received, at the cost of losing one data bit, and this 
data bit 0 could be easily recovered by a simple digital algorithm.) 
 
In addition, the implementation of End of Data (EOD) has further reduced the power 
consumption, as the digital correlation and adding process is performed at a much lower 
rate. Simulation results show a power reduction of 40% (when EOD is not employed, the 
simulated current consumption is about 200µA, while with the EOD feature the current is 




Chapter 5 – Conclusion and Future Work 
 
5.1 A possible waveform for sub-1GHz UWB pulse 
 
Since the receiver implementation is proposed for a UWB receiver operating in the band 
below 1 GHz, a suitable transmitter working below 1 GHz is required to complete the 
transceiver system. As studied in the literature review, Gaussian derivatives are suitable 
for UWB pulse generation. In the search for a suitable pulse which fits the sub-1GHz UWB 
mask, the 1st, 2nd and 3rd derivatives of the Gaussian function were tested, and it was 
found that the 2nd derivative of the Gaussian function provides a pulse whose spectrum 
falls completely within the sub-1GHz mask without hitting the GPS band barrier. It has the 
following general form 
   where A is a scaling factor. 
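
To illustrate how such a pulse and its spectrum can be examined numerically, the following is a minimal Python sketch. It assumes the commonly used normalized form A(1 - t^2/sigma^2)exp(-t^2/(2 sigma^2)) for the 2nd derivative Gaussian pulse, which may differ from the exact expression used in this work. The width parameter sigma, the sample rate and the frequency grid are illustrative choices, with sigma picked so that the analytic spectral peak, 1/(sqrt(2) pi sigma), falls at 340 MHz. The computed levels are therefore not expected to reproduce the periodogram values of Fig. 5.2.

```python
import cmath
import math


def pulse(t, sigma, amplitude=1.0):
    """Assumed pulse form: A * (1 - t^2/sigma^2) * exp(-t^2 / (2 sigma^2))."""
    x = (t / sigma) ** 2
    return amplitude * (1.0 - x) * math.exp(-x / 2.0)


def power_spectrum(samples, fs, freqs):
    """Squared magnitude of the discrete-time Fourier transform at each frequency."""
    spectrum = []
    for f in freqs:
        acc = sum(s * cmath.exp(-2j * math.pi * f * n / fs)
                  for n, s in enumerate(samples))
        spectrum.append(abs(acc) ** 2)
    return spectrum


F_PEAK = 340e6                                      # Hz, target spectral peak (assumed)
SIGMA = 1.0 / (math.sqrt(2.0) * math.pi * F_PEAK)   # s, illustrative width parameter
FS = 10e9                                           # Hz, illustrative sample rate

samples = [pulse(n / FS, SIGMA) for n in range(-50, 51)]  # -5 ns to +5 ns
freqs = [k * 10e6 for k in range(101)]                    # 0 to 1 GHz in 10 MHz steps
spectrum = power_spectrum(samples, FS, freqs)

peak_freq = freqs[spectrum.index(max(spectrum))]
drop_db = 10.0 * math.log10(spectrum[88] / max(spectrum))  # level at 880 MHz vs peak
print(peak_freq, drop_db)
```

Scaling the pulse by A shifts the whole spectrum up or down by the same amount, so the relative drop between the peak and the band edge is independent of A.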
 








Pxx = periodogram(z) 
  




Fig. 5.1 Time domain 2nd order Gaussian derivative pulse 
 
 
Fig. 5.1 shows the pulse shape of a 2nd order Gaussian derivative with its critical 




Fig. 5.2 Power spectrum of a 2nd order Gaussian derivative pulse 
 
 
For the power spectral density shown in Fig. 5.2, the peak power density is 13.19dB/Hz 
at about 340 MHz. At about 880 MHz, the power density drops to -23.49 dB/Hz. The 
difference in power density at 880 MHz from the peak power density is therefore 
about -36.7dB. As illustrated in Fig. 1.1 in the introduction, for the frequency band 
below 960MHz, the required power density back off at 960MHz from the peak power density 
is about -35dB. Thus, the simulated 2nd order Gaussian pulse could be a suitable 
candidate which satisfies the power spectrum requirement below 1 GHz as long as the 
pulse is scaled by an appropriate factor A. This pulse could be synthesized digitally 
according to the pulse shape in Fig. 5.1 with a digital pull up and pull down technique. If we 





5.2 Another possible modulation using BPSK 
 
Also, one may observe that our implementation assumes simple on-off keying (OOK), which 
facilitates the use of a single level threshold detection mechanism. A simple OOK 
system allows simple implementation while its performance might not be comparable to 















Fig. 5.3 BPSK modulation showing binary phase 
 
 
For the BPSK shown in Fig. 5.3, since it involves one positive and one negative peak, dual 
level threshold detection could be implemented. This could reduce the error probability, as 
the probability of noise creating the dual peak pattern is lower than the probability of it 
creating a spike which crosses a single threshold level. Furthermore, the 
relative timing between the positive and negative peaks could be utilized to differentiate 
noise from a valid data pattern. 
 
 




5.3 Automatic threshold adjustment 
 
One may also notice that the threshold detector adjustment in the current implementation 
is manually done through an input pin. In fact, the threshold adjustment could be done 
automatically by a Digital to Analog Converter (DAC). 
 
As shown in Fig. 3.6, the synchronization DSP provides a data validity flag which 
indicates the synchronization status. Before synchronization is achieved, the Validity flag 
remains at zero. A simple algorithm to control a DAC would be based on the logical 
level of the Validity flag. If its logical level is 0, a cyclic increment could be applied to 
the digital input of the DAC to change its output. The output of the DAC could then be 
used to set the threshold level in the threshold detector. Once a correct level is set, upon 
receiving two valid data bit 0s, synchronization will be declared and the Validity flag 
will turn to logic 1. Once its logic level is 1, the digital input to the DAC would be 
held and, in turn, the analog output of the DAC would stay fixed, as it provides a correct 
threshold level based on the current channel conditions. 
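
The control loop described above can be sketched in a few lines of Python. This is a behavioural illustration only: `validity_flag` is a hypothetical callback standing in for the Validity flag observed after the receiver has had time to see two valid data bit 0s at a given DAC code, and the 4-bit DAC width and the `demo_flag` channel model are assumptions, not part of the implemented design.

```python
def sweep_threshold(validity_flag, dac_bits=4, max_steps=64):
    """Cyclically increment the DAC code until the Validity flag goes high.

    Returns the DAC code to hold, or None if no code gave synchronization
    within max_steps increments.
    """
    code = 0
    for _ in range(max_steps):
        if validity_flag(code):
            return code                      # flag is 1: freeze the DAC input
        code = (code + 1) % (1 << dac_bits)  # flag is 0: cyclic increment
    return None


def demo_flag(code):
    """Hypothetical channel: synchronization succeeds only for codes 9 to 11."""
    return 1 if 9 <= code <= 11 else 0


print(sweep_threshold(demo_flag))
```

In this toy channel the sweep stops at the first code that yields synchronization. A practical controller would also need to resume the sweep when the Validity flag later returns to 0.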
 
5.4 Intermittent LNA operation 
 
The most power consuming block in the receiver is likely to be the low noise amplifier 
(LNA). Since the duty cycle of the received UWB pulses is extremely low, it is not necessary 
to keep the LNA on all the time. Once synchronization is achieved, the DSP could 
provide a feedback signal to switch the LNA on and off to further reduce the power 
consumption. This intermittent operation could also be applied to the threshold detector, 
as its top PMOS load could also be switched between VDD and GND. 
 
5.5 Automatic channel threshold selection 
 
The correlation sum threshold could also be automatically tuned to adapt to different 
channel conditions. A noisier channel would introduce more distortion into the received 
data pattern and thus the receiver could tune to a lower threshold sum, while for a quiet 
channel, the receiver could choose a higher threshold sum to allow more reliable 
decoding. 
 
5.6 Multi-finger for multipath energy harvesting 
 
In some office environments, multipath components may have high magnitude, which 
could be used to enhance the receiver performance. The multipath phenomenon has been 
exploited in RAKE systems, whereby multiple fingers rake in the received pulse energy from 
different multipath components. This concept could also be used in our implementation, as 
one could use more than one branch of the receiver whereby each branch is preset at a 






Fig. 5.4 Possible implementation of multiple branches to harvest multipath energy 
 
 
As shown in Fig. 5.4, front end analog amplifier could be shared thus no significant 





The mandatory UWB power spectrum requirements necessitate the transmission of ultra 
narrow pulses, which creates challenges for power efficient designs. This work presents a 
novel synchronization scheme that is based on early quantization and the autocorrelation 
property of Barker codes for UWB impulse radio systems. A receiver synchronization 
architecture using OOK and Barker code modulation is proposed. Simple threshold 


















circumvents the need for sub-nanosecond timing resolution in synchronization by 
employing an asynchronous pulse capture block. This asynchronous pulse capture block is 
introduced to perform a direct down conversion of the UWB pulses from the RF band down 
to baseband. The new scheme provides run time synchronization tracking and 
resynchronization capability. The receiver architecture, including the threshold detector, 
pulse capture block and baseband synchronization circuitry, is implemented in a standard 
0.35µm CMOS process. Measurement and simulation results for the threshold detector 
demonstrate the detection of 100mV peak to peak UWB pulses, while the high speed T 
flip-flop used in the pulse capture block successfully captures narrow pulses down to a 
resolution of 240ps. The simulated power consumption of the proposed receiver circuit 
architecture is 2.15mW at a data rate of 2Mb/s. 
 
Although the proposed scheme is compatible with higher order Gaussian UWB pulses, 
the preferred UWB transmission band is below 1GHz. Transmission below 1GHz 
simplifies the transmitted pulse shape, which permits the implementation of low 
power and low complexity transceiver architectures. While working in the 3.1-10.6GHz band 
could offer very high data rates, the maximum data rate supported by UWB systems below 
1GHz is still sufficient for low speed applications. Thus, the proposed architecture could 

















































Layout snapshot of Binary Adder 
 












Layout snapshot of Unity Gain Buffer 
