# REDUCING JITTER UTILISING ADAPTIVE PRE-EMPHASIS FIR FILTER FOR HIGH SPEED SERIAL LINKS

by

## Marius Eugene Goosen

Submitted in partial fulfilment of the requirements for the degree **Master of Engineering (Microelectronic Engineering)** in the Faculty of Engineering, Built Environment & Information Technology

UNIVERSITY OF PRETORIA

February 2011

# REDUCING JITTER UTILISING ADAPTIVE PRE-EMPHASIS FIR FILTER FOR HIGH SPEED SERIAL LINKS

#### BY MARIUS EUGENE GOOSEN

Supervisor: Prof S Sinha

Department of Electrical, Electronic & Computer Engineering

Degree: MEng (Microelectronic Engineering)

Jitter requirements have become more stringent as the speed of serial communication links has increased. This dissertation presents a method of reducing jitter, focusing mainly on data dependent jitter (DDJ), by employing adaptive finite impulse response (FIR) filter pre-emphasis. The adaptive FIR pre-emphasis is implemented in the IBM 7WL 0.18 µm SiGe BiCMOS process. SiGe heterojunction bipolar transistors (HBTs) are high bandwidth, low noise devices that could reduce the total system jitter. The trade-offs between metal oxide semiconductor (MOS) current mode logic (CML) and SiGe bipolar CML are also discussed, in comparison with a very high $f_T$ process (the IBM 8HP process, with $f_T$ = 200 GHz). A reduction in total system jitter can be achieved by keeping the sub-components of the system jitter constant while optimising the DDJ. High speed CML circuits have been employed to allow data rates in excess of 5 Gb/s to be transmitted while still maintaining an internal voltage swing of at least 300 mV. This allows the final FIR filter adaptation scheme, using 6 FIR pre-emphasis filter taps, to reduce the DDJ to within 12.5 % of a unit interval (25 ps) at a data rate of 5 Gb/s over a worst-case copper backplane channel (a 30" FR-4 channel). The integrated circuit (IC) designed as part of the verification process occupies less than 1 mm<sup>2</sup> of silicon area.
This dissertation presents SPICE simulation results, as well as the novel IC implementation of the proposed FIR filter adaptation technique, as part of the hypothesis verification procedure. The implemented transmitter and receiver were tested for functionality, and all the implemented CML gates associated with the first filter tap were shown to function correctly. However, because of the slow charge and discharge rates of the pulse generation circuits in both the transmitter and receiver, only the main operational state of the transmitter could be validated experimentally. The contribution of this research lies in the implemented adaptation scheme: a designer using such an IC can optimise the DDJ with minimal effort, thereby reducing the total system jitter and increasing data fidelity.

**Keywords:** High speed serial links, backplane serial link, jitter, data dependent jitter, inter-symbol interference, FIR pre-emphasis, adaptive pre-emphasis, SiGe, BiCMOS, IBM 7WL 0.18 µm BiCMOS.

## REDUCTION OF JITTER USING A SELF-ADAPTIVE PRE-EMPHASIS FIR FILTER FOR HIGH SPEED SERIAL COMMUNICATION

#### BY MARIUS EUGENE GOOSEN

Supervisor: Prof. S Sinha

Department of Electrical, Electronic and Computer Engineering

Degree: MEng (Microelectronic Engineering)

Jitter specifications have become more stringent with higher speed serial communication links. A reduction in jitter, focusing on the reduction of data dependent jitter (DDJ), is presented using self-adaptive finite impulse response (FIR) filter pre-emphasis. The self-adaptive FIR pre-emphasis is implemented in the IBM 7WL 0.18 µm SiGe BiCMOS process. SiGe heterojunction bipolar transistors (HBTs) provide high bandwidth, low noise components that could reduce the total system jitter.
The trade-offs between the use of metal oxide semiconductor (MOS) current mode logic (CML) and bipolar CML are also discussed in comparison with a very high $f_T$ process (the IBM 8HP process with $f_T$ = 200 GHz). A reduction in total system jitter can be achieved by keeping the sub-components of the system jitter constant while the DDJ is optimised. High speed CML circuits were implemented to enable the transmission of data rates of more than 5 Gb/s while still maintaining an internal voltage swing of more than 300 mV. Consequently, the final self-adaptive FIR pre-emphasis filter, implementing 6 FIR pre-emphasis filter taps at a data rate of 5 Gb/s, minimises the DDJ to within 12.5 % of a unit interval for a poor copper backplane channel (a 30" FR-4 channel). The implemented integrated circuit (IC) was designed as part of the verification process and occupies less than 1 mm² of silicon area. In this dissertation, SPICE simulation results, as well as the novel IC implementation of the proposed self-adaptive FIR filter technique, are presented as part of the hypothesis verification process. Both the transmitter and the receiver of the system were tested for functionality and showed that the transmitter CML circuits associated with the first filter tap work successfully. Unfortunately, owing to the low charge and discharge rates of the pulse sources in both the transmitter and the receiver, only the main state of the system could be tested experimentally. As a result of the implemented self-adaptive scheme, the contribution of the research lies in the fact that a designer using such an IC can optimise the DDJ. As a consequence, the total system jitter can be reduced, thereby improving data integrity with minimal effort.
**Keywords:** High speed serial communication, jitter, data dependent jitter, inter-symbol interference, pre-emphasis FIR filter, self-adaptive pre-emphasis, SiGe, BiCMOS, IBM 7WL 0.18 µm BiCMOS.

## **ACKNOWLEDGMENTS**

I would firstly like to thank our heavenly Father for my talents and the strength to carry this through.

I would also like to thank Karen Mostert and my family (Eugene and Trudie Goosen, Frikkie and Liesel Fourie, Roelf and Lydia Mostert) for all their help and encouragement. I would especially like to thank Karen for her proofreading as well.

I would like to thank my friends in CEFIM, Jannes Venter, Wayne MacLean, Mladen Božanić, Marnus Weststrate, Johan Schoeman and Christo Janse van Rensburg, who have been there with advice whenever I needed it.

I would also like to thank my study leader, Prof Saurabh Sinha, for the support, time and effort that he put into this work, and Mrs Loubser for all the effort she puts into keeping CEFIM running as it does.

I also thank Armscor, the Armaments Corporation of South Africa Ltd (Act 51 of 2003), and the Council for Scientific and Industrial Research (CSIR) for their financial support for the duration of this study. Without their financial support it would not have been possible to carry out this research.

I also thank MOSIS for approving our educational wafer run, which allowed us to tape out a prototype for testing and validating the hypothesis in question.

In the words of Albert Einstein: "Science is the attempt to make the chaotic diversity of our sense-experience correspond to a logically uniform system of thought."
## TABLE OF CONTENTS

- ACKNOWLEDGMENTS
- TABLE OF CONTENTS
- LIST OF ABBREVIATIONS
- CHAPTER 1: INTRODUCTION
  - 1.1 Background to the research
  - 1.2 Research problem and hypothesis
  - 1.3 Justification for the research
  - 1.4 Methodology
  - 1.5 Research contribution
  - 1.6 Outline of the dissertation
  - 1.7 Delimitations of the scope of the research
  - 1.8 Conclusion
- CHAPTER 2: LITERATURE REVIEW
  - 2.1 Introduction
  - 2.2 High speed serial communication
    - 2.2.1 Overview
    - 2.2.2 Trends and standards
    - 2.2.3 Different types of serial links and data transfer rates
    - 2.2.5 Backplane serial communication links
    - 2.2.6 Backplane channel modelling
  - 2.3 Jitter
    - 2.3.1 Jitter in high speed serial links
    - 2.3.2 Causes of jitter in high speed serial links
    - 2.3.3 Mathematical definitions
    - 2.3.4 Eye diagram analysis
    - 2.3.5 Statistical jitter analysis
  - 2.4 Integrated circuit technology
  - 2.5 Reducing jitter in high speed serial communication links
    - 2.5.1 Overcoming data dependent jitter
    - 2.5.2 Equalisation
    - 2.5.3 Pre-emphasis
    - 2.5.5 FIR filter implementation
  - 2.6 Conclusion
- CHAPTER 3: RESEARCH METHODOLOGY
  - 3.1 Introduction
  - 3.2 Justification for the paradigm and methodology
  - 3.3 Research methodology and outline
    - 3.4.1 Mathematical modelling
    - 3.4.2 Circuit level modelling and simulation
    - 3.4.3 Layout design and verification
  - 3.5 SPICE models used in the design
  - 3.6 Packaging and experimental measurement equipment
  - 3.7 Measurement setup
  - 3.8 Conclusion
- CHAPTER 4: MATHEMATICAL AND SYSTEMS DESIGN
  - 4.1 Chapter organisation
  - 4.2 Mathematical modelling of channel response
    - 4.2.1 Package parasitic modelling
    - 4.2.2 Copper backplane channel
    - 4.2.3 Complete channel response
  - 4.3 Mathematical modelling of adaptive FIR filter implementation
    - 4.3.1 Qualitative description of the implemented adaptation scheme
    - 4.3.2 Quantitative description of the implemented adaptation scheme
    - 4.3.3 Mathematical simulation results
  - 4.4 Bipolar CML versus MOS CML
    - 4.4.1 Background
    - 4.4.3 Rise and fall times
  - 4.5 Complete system integration
  - 4.6 Pilot signal generator
  - 4.7 Adaptive FIR pre-emphasis driver
  - 4.8 Control logic design
  - 4.9 Receiver design
  - 4.10 Conclusion
- CHAPTER 5: SIMULATION RESULTS
  - 5.1 Introduction
  - 5.2 Pilot signal generator
  - 5.3 Control logic
  - 5.4 Receiver
  - 5.5 Adaptive FIR pre-emphasis driver
  - 5.6 Conclusion
- CHAPTER 6: LAYOUT AND FABRICATION
  - 6.1 Introduction
  - 6.2 Circuit layouts
  - 6.3 Transceiver configuration
  - 6.4 Layout considerations
  - 6.5 Conclusion
- CHAPTER 7: EXPERIMENTAL RESULTS
  - 7.1 Introduction
  - 7.2 Manufacturing and mounting
  - 7.3 Transmitter results
  - 7.4 Receiver results
  - 7.5 Power dissipation
  - 7.6 Conclusion
- CHAPTER 8: CONCLUSION
  - 8.1 Introduction
  - 8.2 Critical evaluation of the hypothesis
  - 8.3 Limitations and assumptions
- REFERENCES
- APPENDIX A: MATLAB CODE FOR FIR ADAPTATION
- APPENDIX B: DETAILED LAYOUTS OF THE SYSTEM
- APPENDIX C: DATASHEET FOR THE PRE-EMPHASIS IMPLEMENTATION
  - Overview
  - Pin layout and description
  - Biasing resistors
    - Transmitter
    - Receiver
  - Threshold voltage

## LIST OF ABBREVIATIONS

| Abbreviation | Meaning |
|--------------|---------|
| ADE | Analog design environment |
| AMS | Austriamicrosystems |
| BER | Bit error rate |
| BERT | BER tester |
| BCML | Bipolar CML |
| BiCMOS | Bipolar complementary metal oxide semiconductor |
| BGA | Ball grid array |
| BJT | Bipolar junction transistor |
| BUJ | Bounded uncorrelated jitter |
| CDF | Cumulative distribution function |
| CDR | Clock and data recovery |
| CML | Current mode logic |
| CMOS | Complementary metal oxide semiconductor |
| DAC | Digital to analogue converter |
| DCD | Duty cycle distortion |
| DDJ | Data dependent jitter |
| DFE | Decision feedback equaliser |
| DFF | D-flip-flop |
| DJ | Deterministic jitter |
| DRC | Design rule check |
| DSP | Digital signal processor |
| DUT | Device under test |
| ECL | Emitter coupled logic |
| EDA | Electronic design automation |
| EMI | Electromagnetic interference |
| EOC | End of conversion |
| ESD | Electrostatic discharge |
| FFE | Feed forward equaliser |
| FIR | Finite impulse response |
| FPGA | Field programmable gate array |
| FSF | Fractionally spaced filter |
| GaAs | Gallium arsenide |
| HBT | Heterojunction bipolar transistor |
| HIT-Kit | High performance interface toolkit |
| IBM | International business machines corporation |
| IC | Integrated circuit |
| IEEE | Institute of electrical and electronics engineers |
| IIR | Infinite impulse response |
| InP | Indium phosphide |
| I/O | Input/output |
| IP | Internet protocol |
| ISI | Inter-symbol interference |
| LMS | Least mean squares |
| LSB | Least significant bit |
| LTI | Linear time invariant |
| LVDS | Low voltage differential signalling |
| LVS | Layout versus schematic |
| MOS | Metal oxide semiconductor |
| MOSIS | MOS implementation system |
| MEP | MOSIS educational program |
| MOSFET | MOS field effect transistor |
| MPW | Multi-project wafer |
| MSE | Mean square error |
| NPF | Network processing forum |
| NRZ | Non-return to zero |
| OIF | Optical internetworking forum |
| PAM | Pulse amplitude modulation |
| PCB | Printed circuit board |
| PCI-SIG | Peripheral component interconnect – special interest group |
| PDF | Probability density function |
| PJ | Periodic jitter |
| PLL | Phase locked loop |
| PPM | Pulse position modulation |
| PRBS | Pseudo random bit sequence |
| PWM | Pulse width modulation |
| QFN | Quad flat no-lead |
| RF | Radio frequency |
| RJ | Random jitter |
| RMS | Root mean square |
| ROM | Read only memory |
| RX | Receiver |
| RZ | Return to zero |
| SerDes | Serialiser and deserialiser |
| SiGe | Silicon germanium |
| SMA | Sub-Miniature version A |
| SNR | Signal-to-noise ratio |
| SPICE | Simulation program with integrated circuit emphasis |
| SSF | Symbol space filter |
| TJ | Total jitter |
| TX | Transmitter |
| UI | Unit interval |
| VBIC | Vertical bipolar inter-company |
| VCO | Voltage controlled oscillator |
| XOR | Exclusive-or |

## **CHAPTER 1: INTRODUCTION**

#### 1.1 Background to the research

Serial communication links have overtaken conventional parallel links in popularity because they can achieve high bandwidth while being immune to the clock skew between the lines of a parallel data bus [1]. The reduced pin count of serial links, coupled with their high bandwidth capability, increases the bandwidth per pin. This allows lower cost and higher component density in the final implementation.
High speed serial links do, however, suffer from their own set of non-idealities, one of which is jitter. Serial communication links can be implemented in various ways, distinguished mainly by the type of channel used: plain copper cable, fibre optic cable or backplane copper channels. A backplane copper channel is the preferred transmission medium for discrete system implementations and is the focus of this research.

Integrated circuit (IC) technology plays an important part in high speed serial communication links, providing the high bandwidth circuits and devices with which high speed serial link transceivers can be developed and implemented. Complementary metal oxide semiconductor (CMOS) devices have been the popular choice for implementing high speed serial link transceivers. However, improved silicon germanium (SiGe) bipolar CMOS (BiCMOS) processes simultaneously provide high bandwidth, low noise, low base resistance and high current gain, which make them strong contenders in radio frequency (RF) IC design [2].

Jitter requirements in high speed serial links are becoming increasingly stringent because of the small pulse widths associated with increased data transfer rates. Channel bandwidth limitations restrict the maximum data transfer rates achievable by current serial links, and various techniques are being implemented to alleviate or bypass this limitation. Alleviating the channel limitation would be a significant contribution to future high speed serial links over copper channels with data transfer rates in excess of 10 Gb/s. An example of the stringent jitter requirements is presented in [3]. At a data transfer rate of 10 Gb/s using non-return to zero (NRZ) signalling, the root mean square (RMS) value of the total jitter should be below 0.7 ps to achieve a bit error rate (BER) of less than $10^{-12}$. Assuming Gaussian jitter, at this BER the peak-to-peak spread is roughly 14 times the RMS value, or about 10 ps. This is about 10 % of the 100 ps transmitted pulse width.
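The step from 0.7 ps RMS to roughly 10 % of the pulse width can be checked numerically. The sketch below is an illustration under stated assumptions, not a reproduction of the analysis in [3]. It assumes the jitter is purely Gaussian (random jitter only) and uses the common approximation that the peak-to-peak jitter at a given BER is $2Q\sigma$, where $Q$ is the Gaussian tail multiplier for that BER. The helpers `q_factor` and `gaussian_jitter_pp` are hypothetical names.

```python
from statistics import NormalDist


def q_factor(ber: float) -> float:
    """Gaussian tail multiplier Q such that P(X > Q*sigma) = ber for X ~ N(0, sigma^2)."""
    return -NormalDist().inv_cdf(ber)


def gaussian_jitter_pp(sigma_rms: float, ber: float) -> float:
    """Peak-to-peak extent of purely Gaussian (random) jitter at a target BER,
    using the common 2*Q*sigma approximation (no deterministic jitter included)."""
    return 2.0 * q_factor(ber) * sigma_rms


bit_rate = 10e9          # 10 Gb/s NRZ, as in the example from [3]
ui = 1.0 / bit_rate      # unit interval (pulse width): 100 ps
sigma = 0.7e-12          # RMS jitter limit quoted from [3]
ber = 1e-12              # target bit error rate

q = q_factor(ber)
tj_pp = gaussian_jitter_pp(sigma, ber)
print(f"UI = {ui * 1e12:.1f} ps, Q = {q:.3f}")
print(f"peak-to-peak jitter = {tj_pp * 1e12:.2f} ps = {100 * tj_pp / ui:.1f} % of the UI")
```

With the values from [3], this gives $Q \approx 7.03$ and a peak-to-peak spread just under 10 ps, consistent with the "about 10 %" figure. Any deterministic jitter, such as DDJ, adds to this budget, which is why reducing DDJ matters at these rates.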
Data dependent jitter (DDJ) is one of the main contributors to the total system jitter [4]. This type of deterministic jitter (DJ) is caused by the channel bandwidth limitation: the channel behaves as a low pass filter, attenuating the different frequency components of the data signal by different amounts. Higher data rates therefore result in lower amplitude signals at the receiver. They also produce received pulses with a long "tail" that interferes directly with adjacent bits. This creates uncertainty in the position of the pulse edges relative to the optimal sampling instant. To alleviate the DDJ imposed by the backplane channel, the transmitted data is pre-distorted by a pre-emphasis filter. Figure 1.1 illustrates the subsystems within a typical serialiser and deserialiser (SerDes) serial link and the position of the pre-emphasis filter in the system.

Figure 1.1. Subsystems depicting a typical SerDes serial link employing pre-emphasis in the transmitter. Adapted from [5].

As Figure 1.1 shows, *N* low speed data lines are encoded, retimed and combined into a single serial bit stream. In a system employing NRZ signalling, this bit stream is then ready for transmission. Pre-emphasis is applied in the driver just before transmission in order to decrease the inter-symbol interference (ISI) present at the receiver. The data bit stream is then transmitted over the bandwidth limited channel using low voltage differential signalling (LVDS). At the receiver, the distorted data, with its embedded clock, is equalised to open the eye diagram for sampling, which improves data integrity. The clock is recovered from the data and is used to recover the data with as few bit errors as possible. Finally, the recovered serial bit stream is deserialised and decoded into the same number, *N*, of low speed data lines.
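The DDJ mechanism described in this section (a low pass channel turning bit history into edge-timing spread) can be reproduced in a few lines. The following Python sketch is an illustration only, not the model or adaptation scheme of this dissertation, which uses pilot signalling and peak detection. It rests on these assumptions:

- an idealised single-pole (RC) channel with a hypothetical time constant of 0.5 UI stands in for the FR-4 backplane;
- the data is a PRBS7 test pattern;
- the 2-tap de-emphasis coefficient is computed in closed form for that idealised channel.

The helper names (`prbs7`, `fir_levels`, `deemphasis_taps`, `zero_crossings`, `ddj_ui`) are hypothetical.

```python
import math


def prbs7(n_bits: int, seed: int = 0x7F) -> list[int]:
    """Maximal-length PRBS7 from a 7-bit Fibonacci LFSR (b[n] = b[n-6] ^ b[n-7]), period 127."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("LFSR seed must be non-zero")
    bits = []
    for _ in range(n_bits):
        new = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new) & 0x7F
        bits.append(new)
    return bits


def fir_levels(symbols: list[float], taps) -> list[float]:
    """Symbol-spaced FIR pre-emphasis: s[k] = sum_i taps[i] * symbols[k - i].
    Symbols before the start of the stream are taken as zero."""
    return [sum(c * symbols[k - i] for i, c in enumerate(taps) if k - i >= 0)
            for k in range(len(symbols))]


def deemphasis_taps(tau_ui: float) -> tuple[float, float]:
    """2-tap de-emphasis that cancels the ISI of an ideal single-pole channel with
    time constant tau (in UI); taps are normalised so that |c0| + |c1| = 1."""
    e = math.exp(-1.0 / tau_ui)
    return (1.0 / (1.0 + e), -e / (1.0 + e))


def zero_crossings(levels: list[float], tau_ui: float, osr: int = 100) -> list[float]:
    """Drive a single-pole channel with one level per UI and return the output
    zero-crossing times (in UI), located by linear interpolation between samples."""
    dt = 1.0 / osr
    a = math.exp(-dt / tau_ui)  # exact discretisation for a piecewise-constant input
    y_prev, n, crossings = 0.0, 0, []
    for level in levels:
        for _ in range(osr):
            y = a * y_prev + (1.0 - a) * level
            n += 1
            if (y_prev < 0.0 <= y) or (y_prev > 0.0 >= y):
                crossings.append((n - 1 + y_prev / (y_prev - y)) * dt)
            y_prev = y
    return crossings


def ddj_ui(bits: list[int], tau_ui: float, taps=(1.0,), skip_ui: float = 20.0) -> float:
    """Peak-to-peak spread of edge positions relative to the bit boundaries (DDJ, in UI).
    Crossings in the first skip_ui unit intervals are ignored (start-up transient)."""
    symbols = [1.0 if b else -1.0 for b in bits]
    times = [t for t in zero_crossings(fir_levels(symbols, taps), tau_ui) if t >= skip_ui]
    phases = [t - math.floor(t) for t in times]
    return max(phases) - min(phases)


bits = prbs7(254)  # two periods of PRBS7
tau = 0.5          # hypothetical channel time constant, in UI
print(f"DDJ without pre-emphasis: {ddj_ui(bits, tau):.4f} UI")
print(f"DDJ with 2-tap de-emphasis: {ddj_ui(bits, tau, deemphasis_taps(tau)):.2e} UI")
```

For a single-pole channel, the de-emphasis coefficient $\alpha = e^{-T/\tau}/(1 + e^{-T/\tau})$ makes every edge start from the same voltage, so the measured DDJ collapses to numerical noise. Without pre-emphasis, the spread approaches $-\tau \ln(1 - e^{-T/\tau})$, about 0.073 UI for $\tau$ = 0.5 UI. A real backplane has a longer, frequency-dependent loss tail. More taps (six in this work) and automatic adaptation are therefore needed in practice.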
8B/10B encoding has become standard in many high speed serial links [5]. The 8B/10B encoding scheme creates a DC balanced bit stream, that is, one with equal numbers of ones and zeros on average, which prevents distortion caused by AC coupling [6]. Furthermore, the encoding limits the run length to at most five identical consecutive bits, ensuring sufficient signal transitions for stable clock and data recovery [6].

#### 1.2 Research problem and hypothesis

DDJ caused by backplane channel bandwidth limitations restricts the maximum achievable data transfer rate [7]. Alleviating DDJ while keeping the other jitter subcomponents small relative to the data pulse width opens the possibility of higher data transfer rates or, alternatively, lower BERs. Various designers have reduced DDJ by compensating for the bandwidth limitations of the channel [1], [8], [9], [10], [11], [12], [13].

High speed serial links have been widely implemented to achieve data transfer rates that conventional parallel links cannot. Table I lists some implemented high speed serial links and the implementation technology used.

TABLE I. COMPARISON OF IMPLEMENTED SERIAL LINKS

| Reference | Data rate | Technology | Pre-emphasis |
|-----------|-----------|-------------|--------------|
| [1] | 10 Gb/s | CMOS | 5-tap FIR |
| [3] | 10 Gb/s | SiGe BiCMOS | None |
| [8] | 5 Gb/s | CMOS | 3-tap FIR |
| [9] | 10 Gb/s | CMOS | 3-tap FIR |
| [10] | 5 Gb/s | CMOS | 3-tap FIR |
| [11] | 10 Gb/s | CMOS | 3-tap FIR |
| [12] | 10+ Gb/s | CMOS | 8-tap FIR |
| [13] | 8 Gb/s | CMOS | 3-tap FIR |
| [14] | 1 Gb/s | CMOS | None |

Although high speed serial links have been widely implemented using CMOS technology, few attempts have been made using SiGe.
Furthermore, serial link designs that have not used pre-emphasis have only been implemented for use over cables or shorter distances [3]. The proposed research focuses on backplane copper channels, which by their nature exhibit higher distortion and attenuation. CMOS technology is a more economically viable solution for implementing a serial link. SiGe technology, however, offers numerous advantages worth investigating for higher performance serial links.

Pre-emphasis pre-distorts the transmitted signal by emphasising the high frequency components that are attenuated by the channel. Finite impulse response (FIR) filters are the preferred form of transmitter pre-emphasis because they are easy to implement and easy to adjust. However, the channel impulse response differs for each application. The initial installation and tuning of a fixed pre-emphasis filter can therefore become tedious because of its trial-and-error nature. There is thus a clear need for adaptive pre-emphasis, whereby the optimal FIR tap coefficients are reached automatically, without tedious tuning.

The adaptive FIR pre-emphasis filter research can be stated by means of the following hypothesis:

A fully functional high speed serial link transmitter employing adaptive FIR pre-emphasis filtering can be implemented in the 0.18 µm SiGe BiCMOS process to improve off-chip bandwidth and data speeds. The adaptive FIR pre-emphasis filter is proposed to reduce DDJ and the timing and sampling uncertainty it introduces, hence improving the BER of the serial link.

#### 1.3 Justification for the research

Serial communication links are a well established technology, but this does not mean that the market for high speed serial links is saturated. Data transfer rates continue to increase, and overcoming current limitations remains one of the main challenges in achieving higher rates.
The design challenges are thus far from over, as discussed in [15]. Adaptive pre-emphasis has the advantage of always pre-distorting the transmitted data by the optimal amount, leading to an optimally open eye diagram at the far end of the serial link. Higher data transfer rates could therefore be realised if the channel bandwidth limitation is optimally alleviated.

To keep the delays imposed by the implementation technology to a minimum, the gate delay has to be as small as possible to achieve the necessary switching speed. Table II compares the gate delays of three popular SiGe technologies from International Business Machines (IBM) Corporation and Austriamicrosystems (AMS).

TABLE II. GATE DELAY COMPARISON

| Parameter | IBM 8HP SiGe | IBM 7WL SiGe | AMS S35 SiGe |
|----------------------|--------------|--------------|------------------|
| Minimum feature size | 0.13 μm | 0.18 μm | 0.35 μm |
| Gate delay | 0.01 ns | 0.05 ns | 0.1 ns (typical) |

For data rates in the region of 10 Gb/s, the bit period is about 0.1 ns. This equals the typical AMS S35 gate delay and is only twice the IBM 7WL gate delay, whereas the IBM 8HP gate delay is a tenth of the bit period. Table II therefore indicates that the IBM 8HP process should be chosen for this application. The possible advantages of the IBM 8HP process over the IBM 7WL process are presented in Chapter 4. The IBM 7WL process was nevertheless used to validate the hypothesis. This was owing to budgetary constraints and to a research grant from the MOS implementation system (MOSIS¹) for a free wafer run under its MOSIS educational program (MEP). As Table II shows, this process still outperforms the AMS S35 SiGe process, and its fast bipolar devices could potentially provide faster switching circuits. This is discussed in section 4.4.

#### 1.4 Methodology

High speed serial links are mainly integrated on ICs to achieve high speed capability and maintain low noise.
Modern integrated circuit processes have improved in speed and noise performance, allowing designers to improve on previous designs by using higher bandwidth devices and circuits. SiGe processes are becoming the technology of choice where high signal integrity is important. Applications previously dominated by the more expensive indium phosphide (InP) and gallium arsenide (GaAs) processes are moving towards SiGe because of its lower cost and comparable noise performance. However, as discussed in this dissertation, SiGe heterojunction bipolar transistors (HBTs) do not necessarily improve performance under all conditions.

¹ www.mosis.com. MOSIS approved an educational run suitable for this research.

#### 1.5 Research contribution

A method of adaptive FIR pre-emphasis for application in a high speed serial link has been proposed, evaluated and implemented as part of this research. The fundamental idea of pilot signalling and peak detection is not novel [16]. However, the implementation and the testing of the hypothesis through an IC design are, to the author's knowledge, novel. The resulting contributions to the body of knowledge are listed here.

- The pilot signalling and peak detection method of determining the optimal FIR filter tap coefficients has been demonstrated on both a mathematical level and a circuit level.
- The reduction in DDJ achieved by such an FIR pre-emphasis scheme is presented.
- The novel prototype design for testing the proposed hypothesis occupies an area of 0.8 mm<sup>2</sup>. The design incorporates CMOS circuits for low speed switching and control, and current mode logic (CML) circuits for high speed switching in the order of 5 to 10 GHz.
- The advantages of using HBTs in CML drivers have been evaluated in terms of unity gain frequency and compared with optimally sized MOS CML.
The final design used only CMOS devices for the complete system.
- The circuit level implementation has been verified through simulations in Cadence Virtuoso using the chosen IBM high performance interface toolkit (HIT-Kit).

The prototype IC was submitted for fabrication as part of a multi-project wafer (MPW) run. The fabricated IC was mounted on a custom designed printed circuit board (PCB) for experimental hypothesis validation. The experimental validation of the transmitter and receiver is presented, although it was limited by the unexpected behaviour of a critical subsystem.

The following peer reviewed conference articles have been published and presented as part of this research:

- M.E. Goosen, S. Sinha, A. Müller and M. du Plessis, "A low switching time transmitter for high speed adaptive pre-emphasis serial links," *Proc. of IEEE CAS* 2009, pp. 481-484, Sinaia, 12-14 Oct. 2009.
- M.E. Goosen and S. Sinha, "Analysis of adaptive FIR filter pre-emphasis for high speed serial links," *Proc. of IEEE Africon* 2009, Nairobi, 23-26 Sept. 2009.
- M.E. Goosen and S. Sinha, "Adaptive FIR filter pre-emphasis for high speed serial links," *Proc. of the South African conf. on semi and superconductor technology* (SACSST), pp. 37-42, Stellenbosch, 8-9 April 2009.

The author has also submitted two articles to accredited peer reviewed journals (listed by the Institute for Scientific Information (ISI)):

- M.E. Goosen and S. Sinha, "Reducing data dependent jitter utilising adaptive FIR pre-emphasis in 0.18 µm CMOS," submitted to *Elsevier Microelectronics Journal*.
- M.E. Goosen and S. Sinha, "A low switching time BiCMOS CML transmitter for high speed adaptive pre-emphasis serial links," submitted to *Romanian Journal of Information Science and Technology (ROMJIST)*.
#### 1.6 Outline of the dissertation

The dissertation is organised as follows:

#### Chapter 1: Introduction

This chapter introduces the research problem to be addressed, as well as the hypothesis and the motivation behind the research. The research is placed in context with related research conducted globally and with common implementation practices.

#### Chapter 2: Literature review

This chapter presents the body of knowledge on which the research builds. It is divided into four main fields of knowledge: high speed serial communication links, jitter, integrated circuit technology, and pre-emphasis filtering with its associated implementation. The chapter shows how a broadly defined research topic is narrowed down to the specific research questions addressed in the hypothesis.

#### Chapter 3: Research methodology

This chapter elaborates on the research methodology used for data gathering, simulation and analysis in testing the hypothesis. The justification for, and limitations of, the methodology are also presented and discussed. The verification of the implemented IC through experimental testing is also covered. Details of the software suites used to test the hypothesis are also provided.

#### Chapter 4: Mathematical and systems design

This chapter describes the mathematical modelling, system design and simulation. It is divided into three main parts: mathematical design and simulation, system design, and sub-system implementation. System design incorporates all the knowledge needed to move from a working mathematical model to a system that can be implemented. The most important aspects of the design are covered in the section on sub-system implementation.

#### Chapter 5: Simulation results

This chapter contains the relevant simulation program with IC emphasis (SPICE) simulation results used to verify the hypothesis under consideration.
The methodology and tools used in the simulation of the system are discussed in Chapter 3.

#### Chapter 6: Layout and fabrication

The layouts of the main parts of the prototype IC are presented, together with a discussion of the implementation techniques employed for improved matching.

#### Chapter 7: Experimental results

This chapter contains the experimental results obtained from testing the IC after mounting it on a PCB. These results are limited by the slow charge and discharge rates of the pulse generation circuits. Nevertheless, basic functionality of the system in the main operational state is shown.

#### Chapter 8: Conclusion and future work

This chapter concludes the dissertation with a critical evaluation of the research hypothesis. The limitations of the current research are discussed. This leads to a discussion of future areas of research arising from this work, as well as proposed technical improvements.

#### 1.7 Delimitations of the scope of the research

The scope of the research is limited to reducing the DDJ imposed by the bandwidth limited copper backplane channels used in discrete system solutions. The core of the research is the investigation of DDJ and of ways to ease the jitter requirements of current system implementations through adaptive FIR filtering.

#### 1.8 Conclusion

This chapter laid the foundations for the dissertation. The core research problem and hypothesis were presented, together with the background to the research. Abbreviations, definitions and the initial methodology were presented, and the organisation of the dissertation was outlined. Chapter 2 follows with a detailed literature review of the body of knowledge. Chapter 3 continues with a discussion of the methods used for data gathering, simulation and experimental testing.
Chapter 4 contains the mathematical verification, system level design and design considerations used in this research. The simulation results used to verify the hypothesis are presented in Chapter 5. Chapter 6 presents the layout design of the prototype IC. Chapter 7 discusses the limited experimental results achieved, and Chapter 8 concludes the dissertation.

## **CHAPTER 2: LITERATURE REVIEW**

#### 2.1 Introduction

In the development of discrete solutions, regardless of the application field, serial communication links provide a high speed, high bandwidth data interconnect at a lower implementation cost. The interconnection of different systems requires a common path through which all communication takes place. In large discrete system implementations, this common path for data interconnects is provided by a backplane. Figure 2.1 provides an overview of how the backplane fits into a discrete system implementation.

Figure 2.1. Overview of backplane communication in discrete system level design and implementation.

Figure 2.1 shows a simplified block diagram of a system consisting of subsystems responsible for sensing "real-world" signals and processing them to produce information about the application. The sensor system is usually a separate subsystem within a larger system and is usually also physically separated from the digital signal processing unit. A backplane is therefore used to interconnect the sensors and the digital signal processing while maintaining signal integrity.

In determining the effectiveness of a communication system, signal integrity and data recovery are of utmost importance. The transmitted data should be recoverable with as few bit errors as possible.
Impediments inherent in any communication system include noise, limited channel bandwidth and jitter, all of which degrade the performance of the communication system.

Differential serial interconnection provides various advantages over wider bus interconnections (conventional parallel buses) [15]. Increasing the bus width increases the total clock skew between signal lines, which degrades the data integrity of the communication system. The additional signal lines of a wider bus also require a larger connector. Ultimately, this leads to a larger and more expensive solution to provide multiple line interconnections. Serial links have the advantage of a reduced pin count. By reducing the pin count and making good use of differential signalling, high speed serial connections can be established with a bandwidth exceeding that of wider bus implementations. This reduces the overall cost, since the bandwidth per pin is far higher than wider bus implementations can achieve.

When the operating speed of a serial link is increased, the noise margin of the link is reduced, requiring better design techniques to accompany the serial link. The most important technique is differential signalling. Differential signalling, by its very nature, effectively rejects common-mode noise. This is especially important in a discretely implemented system, where matched PCB traces pick up the same induced common-mode noise. Further, as the supply voltages of IC processes decrease to provide faster switching and higher unity gain frequencies, the noise margin becomes an increasingly important factor to consider. LVDS is a low noise, low power and low amplitude method for high speed multi-Gb/s data transmission over copper channels [15]. LVDS further reduces crosstalk and electromagnetic interference (EMI) due to its lower signal amplitudes.
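The common-mode rejection of differential signalling can be illustrated numerically. The minimal Python sketch below assumes perfectly matched traces that pick up an identical noise waveform. The 1.2 V common-mode level, the 0.2 V swing, the noise level and the helper `differential_receive` are all hypothetical values chosen for illustration.

```python
import numpy as np

def differential_receive(v_pos, v_neg):
    """Ideal differential receiver: responds only to the difference between the two legs."""
    return v_pos - v_neg

rng = np.random.default_rng(seed=1)
bits = rng.integers(0, 2, size=64)
swing = 0.2                                       # assumed single-ended swing (V)
data = np.where(bits == 1, swing, -swing)         # each leg moves by +/- swing
noise = rng.normal(0.0, 0.1, size=bits.size)      # identical noise coupled onto both traces

v_pos = 1.2 + data + noise                        # 1.2 V assumed common-mode level
v_neg = 1.2 - data + noise
v_diff = differential_receive(v_pos, v_neg)

# The common-mode level and the shared noise cancel, leaving 2 * data.
print(np.allclose(v_diff, 2 * data))              # True
```

In a real link, mismatch between the two traces converts part of the common-mode noise into differential noise, so the cancellation is only partial.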
The following section covers the main aspects of a high speed serial communication link. These include trends and standards, popular high speed serial link implementations, signalling schemes and, finally, copper backplane serial links, which are the focus area of this research.

#### 2.2 High speed serial communication

#### 2.2.1 Overview

High speed data transport from one point to another and device integration are two of the main requirements of system development and design [15]. Demand for higher bandwidth serial links has been increasing as communication systems require higher quality information at an increased rate [14]. This also reflects the increasing demand for user-end bandwidth [3]. The goal of high speed interconnect design is the optimisation of bandwidth, power, pin count, number of wires and total implementation cost [17].

One popular application of high user-end bandwidth is in multimedia applications where high definition video is transmitted. High definition video requires data rates of hundreds of Mb/s to a few Gb/s to satisfy the need for real time imaging [8]. The increasing demand for high user-end bandwidth pushes the speed of serial transceivers and channel interconnects. It has also led leading field programmable gate array (FPGA) suppliers to integrate cores for these standards on their high-end programmable devices. Although these implementations already provide high speed serial communication links, the challenge of implementing more reliable and faster data interconnects has not been solved. On the contrary, the device community views such offerings as an indication that even higher data transport rates are possible, hence the design challenge continues [15].
System vendors, especially backplane vendors, want to avoid deploying new backplanes because of the high development and production costs. New transmitter and receiver circuit technology therefore needs to be developed to achieve increased bandwidth over legacy copper channels [9]. This also opens a market for short to medium distance serial links for use in ASICs and microprocessors [12].

#### 2.2.2 Trends and standards

In order to understand the trends in high speed serial communication links, the current requirements and limitations of existing communication links need to be understood. Many techniques have been applied to increase the bandwidth per pin, such as increasing the frequency, widening the interface, pipelining and out-of-order completion. However, continuing to work with a parallel bus creates several design issues, such as increased complexity and increased noise [15].

Various standards have been established to govern the fast growing high speed serial communication link market. These cover optical channels, copper cable channels and copper backplane channels. Institutions working on the development of commercial high speed serial links include the following:

- the Optical Internetworking Forum (OIF),
- the Network Processing Forum (NPF),
- the Institute of Electrical and Electronics Engineers (IEEE), and
- the Peripheral Component Interconnect – Special Interest Group (PCI-SIG).

These are only some of the institutions involved. Ultimately, the emerging trend is built-in scalability, which can provide for longer lasting input/output (I/O) standards in terms of both frequency and data bandwidth [15]. An example of this is the implementation of high speed serial link transceivers in leading FPGAs.

#### 2.2.3 Different types of serial links and data transfer rates

There are various types of serial links currently available in industry. These range from backplane serial communication links to cable-based serial communication links, whether over fibre optic cable or simply copper cable.
Fibre optic cables have an advantage over conventional copper cables in their ability to carry information over much longer distances, owing to their low loss. Copper, on the other hand, exhibits a high loss factor, with a -3 dB cut-off frequency in the region of 400 MHz [8]. Using coaxial cable (PE-142LL), a -3 dB cut-off frequency of 1.2 GHz is achieved [10]. This low pass filter effect of copper cable causes an attenuation of more than 40 dB at 10 GHz. Table III lists the different types of serial links and their respective maximum achievable data transfer rates.

TABLE III. DIFFERENT TYPES OF SERIAL LINKS AND THEIR RESPECTIVE DATA RATES.

| Standard             | Speed              | Type                         |
|----------------------|--------------------|------------------------------|
| USB 2.0 (High speed) | 480 Mb/s           | Copper cables                |
| USB 3                | 4.8 Gb/s (Planned) | Fibre and copper cables      |
| IEEE 802.3           | 1 Gb/s             | Copper cables                |
| IEEE 1394b           | 1.6 – 3.2 Gb/s     | Copper cables and backplanes |
| Infiniband           | 2.5 Gb/s           | All                          |
| RapidIO              | 10 Gb/s            | Copper cables and backplanes |
| SATA                 | 3 Gb/s             | Copper cables                |
| Fibre Channel        | 1.0625 Gb/s        | Fibre and copper cables      |
| PCI-Express          | 2.5 Gb/s           | Copper cables and backplanes |

Table III shows some of the most popular serial communication buses and standards. RapidIO is currently increasing in popularity for its high data rates of up to 10 Gb/s. RapidIO primarily provides connectivity between components, including digital signal processors (DSPs) and FPGAs. The problem currently faced in implementing and using RapidIO is its stringent design requirements for achieving a working, high integrity RapidIO serial communication link at 10 Gb/s. RapidIO is therefore usually scaled down to 6.25 Gb/s. Most FPGA manufacturers are following the general trend of implementing RapidIO serial transceivers in their products.
This provides the design engineer with a configurable high speed serial link to interface with other devices and systems. Because of its high speed, and hence its high frequency content, a RapidIO signal undergoes significant distortion when passed through a bandwidth limited channel. The distortion at the far end of the serial link consists mainly of DDJ, to which the clock jitter of the transmitter adds. One way to alleviate the problem of DDJ causing bit errors is to reduce the frequency content of the transmitted signal. To reduce the frequency content of a signal without sacrificing data transfer rate, different signalling and modulation schemes are used.

#### 2.2.4 Signalling

The vast majority of serial links on fibre, backplanes or copper wire implement NRZ signalling [1], [8], [18]. NRZ is popular for its design simplicity and its compatibility with conventional digital logic switching techniques. This comes at the cost of a higher bandwidth requirement. As data rates increase, ISI in NRZ signalling schemes increases to the point at which the simplicity of NRZ is no longer sufficient. A more complex signalling scheme then needs to be implemented to alleviate the problem of ISI. Return-to-zero (RZ) signalling alleviates the problem of increased ISI, but requires a higher bandwidth. This restricts RZ signalling schemes to fibre optic channel implementations, since copper channels have a limited bandwidth compared to fibre optic channels [18].

Multilevel signalling techniques are emerging to increase data rates without increasing the frequency content of the signal. The most popular multilevel signalling technique implemented is pulse amplitude modulation (PAM) [14], [18], [19], while pulse width modulation (PWM) is also used [20]. With N-PAM, the spectral efficiency is given by $2\log_2(N)$ b/s/Hz, where N is the number of PAM levels [15].
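The expression $2\log_2(N)$ can be evaluated directly. The short Python sketch below (the function name is hypothetical) treats NRZ as the N = 2 case and compares it with higher-order PAM, assuming Nyquist-limited signalling.

```python
import math

def pam_spectral_efficiency(levels: int) -> float:
    """Nyquist-limited spectral efficiency of N-PAM in b/s/Hz: 2 * log2(N)."""
    if levels < 2:
        raise ValueError("N-PAM needs at least two levels")
    return 2.0 * math.log2(levels)

nrz = pam_spectral_efficiency(2)   # NRZ is 2-PAM
for n in (2, 4, 8):
    eff = pam_spectral_efficiency(n)
    print(f"{n}-PAM: {eff:.0f} b/s/Hz, {eff / nrz:.1f}x NRZ")
# 2-PAM: 2 b/s/Hz, 1.0x NRZ
# 4-PAM: 4 b/s/Hz, 2.0x NRZ
# 8-PAM: 6 b/s/Hz, 3.0x NRZ
```

The gain grows only logarithmically with the number of levels, while the spacing between levels shrinks for a fixed supply. This is the trade-off behind the receiver decision circuit requirements of N-PAM.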
This implies that by transmitting 2 bits per symbol, which equates to four PAM levels, the spectral efficiency is doubled compared to NRZ (from 2 b/s/Hz to 4 b/s/Hz). In other words, the same data rate can be achieved with half of the original frequency content. Channel spectral efficiency is measured in bits per second per hertz (b/s/Hz) and is determined by the bandwidth of the basis waveform [10], [13]. The idea of achieving higher data rates without increasing the frequency content of the signal is illustrated in Figure 2.2.

Figure 2.2. Illustration of spectral efficiency improvement by using N-PAM techniques [14]. (a) Conventional parallel communication. (b) Serial communication utilising NRZ signalling. (c) Serial communication using 4-PAM signalling.

As illustrated by Figure 2.2, all the data links have exactly the same data transfer rate, namely 1 Gb/s. The link in Figure 2.2 (a) suffers from crosstalk and clock skew due to the increased bus width. To overcome the limitations of parallel communication, serial communication is adopted. Figure 2.2 (b) shows a conventional serial link which utilises NRZ signalling. Figure 2.2 (c) shows a 4-PAM serial link. The maximum frequency content of the 4-PAM signal equals that of two of the lower speed parallel buses multiplexed. Yet it has the same transfer rate as a conventional serial link with four multiplexed signals. This shows that N-PAM can increase the data rate without increasing the frequency content of the signal [14].

Apart from 4-PAM signalling, 4-bit PWM/PAM signalling has been implemented to realise a 1 Gb/s serial link. In this scheme, the data transfer rate of 1 Gb/s corresponds to a symbol rate of 250 MS/s. In other words, the 4-bit PWM/PAM scheme transmits 4 bits per symbol. It therefore achieves a data transfer rate of 1 Gb/s with the frequency content of a 250 Mb/s NRZ signal. *N*-PAM systems with more than four levels have also been implemented [21].
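The relationship between bit rate, symbol rate and frequency content for the 1 Gb/s links discussed above can be sketched as follows. The helper functions are hypothetical. The "Nyquist frequency" is taken here as half the symbol rate, i.e. the fundamental frequency of an alternating symbol pattern.

```python
def symbol_rate(bit_rate_bps: float, bits_per_symbol: int) -> float:
    """Symbol (baud) rate needed to carry bit_rate_bps."""
    return bit_rate_bps / bits_per_symbol

def nyquist_frequency(bit_rate_bps: float, bits_per_symbol: int) -> float:
    """Fundamental of the fastest (alternating) symbol pattern: half the symbol rate."""
    return symbol_rate(bit_rate_bps, bits_per_symbol) / 2

bit_rate = 1e9  # 1 Gb/s, as in Figure 2.2
for name, k in (("NRZ", 1), ("4-PAM", 2), ("4-bit PWM/PAM", 4)):
    print(f"{name}: {symbol_rate(bit_rate, k) / 1e6:.0f} MS/s, "
          f"Nyquist {nyquist_frequency(bit_rate, k) / 1e6:.0f} MHz")
# NRZ: 1000 MS/s, Nyquist 500 MHz
# 4-PAM: 500 MS/s, Nyquist 250 MHz
# 4-bit PWM/PAM: 250 MS/s, Nyquist 125 MHz
```

The 125 MHz figure for the 4-bit scheme equals the Nyquist frequency of a 250 Mb/s NRZ signal.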
When multilevel signalling is used, Gray-code mapping guarantees that an error between neighbouring symbols results in only a single bit error [10]. When coding with added parity bits is used, a single bit error can usually be corrected. A double bit error usually cannot be corrected with the number of parity bits added to the data. *N*-PAM implementations suffer from the limited supply voltages of modern high speed IC processes. The receiver decision circuit must therefore be carefully designed to still achieve the commonly accepted BER of less than 10<sup>-12</sup>. This relies heavily on advances in IC fabrication technology producing faster transistors. Figure 2.3 illustrates the scaling of supply voltage with the minimum drawn channel length of a transistor in order to produce higher speed devices [22].

Figure 2.3. Power supply voltage scaling with minimum drawn channel length [22].

Figure 2.3 illustrates the problem facing design engineers. The power supply voltage and the threshold voltage are edging closer to each other, limiting the number of devices which can be stacked telescopically. Alternative design techniques therefore need to be employed. One such technique is the inversion coefficient design technique [22].

#### 2.2.5 Backplane serial communication links

Backplane channels provide a common, easily used interconnect between different subsystems, producing a final system with a value greater than the sum of its parts. Backplane channels are an example of a mature technology, with various backplanes developed and commercially available. Physical copper interconnects provide the interconnections on the backplane. Although backplane channels have long been used, frequency dependent distortion of the signal is becoming increasingly problematic at higher data transfer rates.
The frequency dependent distortion introduced by the copper backplane channel arises from the skin effect, dielectric losses and signal reflections of the backplane [7], [15]. The skin effect is the phenomenon whereby current flow tends to concentrate at the surface of a conductor at high frequencies due to the conductor self-inductance [7]. Dielectric loss is due to the delay of polarisation in the dielectric material when subjected to changing electric fields [7]. Reflections, on the other hand, are caused by discontinuities in the transmission line path as well as impedance mismatches of the transmission line.

Non-ideal electrical performance can be grouped as channel loss, DJ and random jitter (RJ) [17]. Channel loss is divided into skin effect loss, which is proportional to the square root of frequency, and dielectric loss, which is proportional to frequency. Skin effect loss dominates cable loss, while dielectric loss dominates PCB trace loss at high frequencies (in excess of 10 GHz) [17]. One way of compensating for the non-ideal electrical performance is channel equalisation. Here the channel (cable) contains two types of materials with different resistivities, which effectively equalise the frequency response of the channel itself [17].

Apart from the distortion caused by the backplane, the package in which the chip is finally mounted also produces a significant amount of distortion. This is due to parasitic capacitances and inductances, mainly from the bonding wire (typically 1 nH/mm) [1]. The parasitic components are mainly caused by [23]:

- the parasitic capacitance of the I/O pad,
- the bond wire between chip and package,
- transmission line effects of lines on the package, and
- the connection between package and PCB [24].

Three methods of connecting the chip to the package are discussed in [24]. These are wire bonding, tape automated bonding and flip-chip bonding.
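Earlier in this section, channel loss was split into a skin effect term proportional to $\sqrt{f}$ and a dielectric term proportional to $f$. That split can be sketched with a simple additive loss model. The coefficients below are hypothetical, not measured backplane data. They were chosen so that the dielectric term overtakes the skin effect term at 10 GHz, mirroring the crossover region quoted from [17].

```python
import math

def channel_loss_db(f_hz, k_skin, k_diel):
    """Simple copper channel loss model in dB: skin term ~ sqrt(f), dielectric term ~ f."""
    skin = k_skin * math.sqrt(f_hz)
    diel = k_diel * f_hz
    return skin, diel, skin + diel

def crossover_frequency(k_skin, k_diel):
    """Frequency at which dielectric loss equals skin-effect loss: k_s*sqrt(f) = k_d*f."""
    return (k_skin / k_diel) ** 2

# Hypothetical coefficients (not measured data): 3 dB of skin loss at 1 GHz,
# with the dielectric coefficient chosen so that the crossover falls at 10 GHz.
K_SKIN = 3.0 / math.sqrt(1e9)        # dB / sqrt(Hz)
K_DIEL = K_SKIN / math.sqrt(1e10)    # dB / Hz

for f in (1e9, 5e9, 10e9, 20e9):
    skin, diel, total = channel_loss_db(f, K_SKIN, K_DIEL)
    print(f"{f / 1e9:4.0f} GHz: skin {skin:5.2f} dB, dielectric {diel:5.2f} dB, total {total:5.2f} dB")

print(f"crossover at {crossover_frequency(K_SKIN, K_DIEL) / 1e9:.1f} GHz")
```

Because the dielectric term grows linearly rather than with the square root of frequency, it eventually dominates any such model. Only the crossover frequency depends on the chosen coefficients.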
The last-mentioned method is electrically the best option, with the parasitic components associated with the packaging at a minimum. Mechanically and thermally, however, the flip-chip package performs poorly [24]. A typical flip-chip package attached to a PCB can be modelled as shown in Figure 2.4.

Figure 2.4. Typical parasitic element model for a flip-chip package connected to a PCB [23].

From Figure 2.4 it can be seen that the parasitic elements of the package will contribute significant frequency dependent distortion. The bonding wire and package thus contribute to pattern dependent jitter or distortion. If the backplane channel is physically long, the distortion caused by the channel will dominate and the bonding wire and package distortion will be negligible. For physically short channels, however, the bonding wire and package distortion becomes more critical. In a recent implementation, cable or trace attenuation accounted for roughly half of the total loss. The other half was contributed by the component package, connectors and path discontinuities [17]. The loss imposed by the IC package can be reduced using techniques such as transmitter-side shunt peaking. With shunt peaking, the bandwidth of the bond wire connection can be improved by a factor of 1.8 using a 2 mm length of bond wire [10].

#### 2.2.6 Backplane channel modelling

The characteristics of any backplane channel can be completely modelled by its differential scattering parameters, as presented in [25]. Traditional methods applied a test voltage or current and measured the output at the far end of the link, which has become increasingly difficult at RF and microwave frequencies. Scattering parameters were originally developed for such cases but were only applicable to single-ended systems.
These single-ended scattering parameters were adapted [25] to form differential scattering parameters for use in differential system characterisation.

By first assuming that the backplane channel is a linear time-invariant (LTI) system, the backplane channel can be completely characterised by its impulse response. This is a fair assumption, as the environmental conditions affecting the backplane channel model exhibit large time constants and change very little over time. Due to the low pass filter characteristic of a copper channel, the impulse response exhibits a long tail which interferes with the following bits [10]. The long tail of the impulse response causes ISI, degrading the BER and ultimately the performance of the serial link.

Recent transmitters were designed with prior knowledge of the channel model. The channel was fully characterised beforehand, so the design could be optimised for the specific channel [1], [8]. Because of the cost of designing and manufacturing an IC, this is not a feasible solution for large scale deployment over different legacy channels.

Group delay is defined as the negative derivative of the channel phase response with respect to angular frequency. It should be taken into account in channel characterisation and modelling. A group delay that varies with frequency causes phase distortion, which lengthens the tail of the impulse response and worsens the ISI [26], [27]. ISI is a major factor limiting the maximum distance and data transmission rates for backplane data transmission. FIR filter pre-emphasis at the transmit side is a popular technique for counteracting ISI in backplane data transmission links [1], [17].

#### 2.3 Jitter

#### 2.3.1 Jitter in high speed serial links

Jitter is defined as the deviation of a timing event of a signal from its intended or ideal occurrence in time. Jitter can thus be seen as unwanted pulse position modulation (PPM) [28]. The timing events in serial communication links are the rising and/or falling edges of the pulses.
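Following this definition, the jitter of each edge can be computed as the difference between its measured time and the nearest ideal time on a grid of multiples of the bit period. This quantity is commonly called the time interval error (TIE). The sketch below assumes a 5 Gb/s link (T = 200 ps) and uses invented edge displacements. The function name is hypothetical.

```python
import numpy as np

def time_interval_error(edge_times, bit_period, t0=0.0):
    """Jitter of each edge: measured time minus the nearest ideal time t0 + k*T.

    Assigning edges to the nearest ideal grid point assumes |jitter| < T/2.
    """
    edge_times = np.asarray(edge_times, dtype=float)
    k = np.round((edge_times - t0) / bit_period)   # index of the nearest ideal edge
    return edge_times - (t0 + k * bit_period)

T = 200e-12                                        # unit interval at 5 Gb/s
ideal = np.array([0, 1, 3, 4, 7]) * T              # edges only occur at data transitions
offsets = np.array([3e-12, -2e-12, 5e-12, 0.0, -4e-12])  # illustrative edge displacements
tie = time_interval_error(ideal + offsets, T)

print("TIE (ps):", np.round(tie * 1e12, 2))
print(f"peak-to-peak {np.ptp(tie) * 1e12:.1f} ps, RMS {np.std(tie) * 1e12:.2f} ps")
```

The peak-to-peak and RMS values summarise the jitter sequence. The statistical treatment in Section 2.3.5 works with the full distribution of such values.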
Jitter degrades the performance of high speed serial links by limiting the maximum achievable data rate [29], as it impedes the ability to successfully recover the transmitted data at the far end of the link. In a serial link the clock is embedded within the data signal, so that data and clock are sent as a single serial bit stream. The receiver then extracts the clock and the data using a clock and data recovery (CDR) circuit. The transmitter introduces clock jitter, and the receiver introduces further clock jitter in the CDR circuit, since it extracts a clock signal from a distorted and already jittery signal. Link performance is typically measured in terms of BER using a bit error rate tester (BERT). However, long simulation times are needed to evaluate the performance in this way. Analytical and statistical methods have therefore been presented [29], [30] to reduce the long simulation times required to determine the effect of jitter on signal integrity.

Total jitter (TJ) can be divided into two main categories: RJ and DJ. Figure 2.5 illustrates how the different sub-components of jitter contribute to the TJ.

Figure 2.5. Jitter sub-components constituting the total system jitter. Figure adapted from [4].

Periodic jitter (PJ) refers to periodic variations of the signal edge positions over time. Bounded uncorrelated jitter (BUJ) is typically due to coupling from adjacent data-carrying links or on-chip random logic switching [4]. Duty cycle distortion (DCD) describes the jitter resulting from a signal having unequal pulse widths for high and low values. ISI is jitter that depends on both the data bit stream and the channel I/O bandwidth. DDJ corresponds to a variable jitter that depends on the bit pattern transmitted on the channel under test [7]. RJ can be measured either with histograms or with curve fitting algorithms, depending on whether or not a DJ component is present.
Whenever the RJ distribution is non-Gaussian, other techniques, for example frequency domain analysis, have to be employed to measure the RJ [7]. By transmitting a clock-like pattern, the DCD can be measured directly by noting the durations of the logical high and logical low periods. The DCD distribution can then be developed using a histogram of the experimentally obtained data. ISI and DCD can also be measured using spectral analysis. To measure only ISI, a data pattern containing both long and short bit run lengths can be transmitted. The difference between the pulse edges for the different run lengths is then measured [7]. For this research, DCD is assumed to be negligible and ISI is assumed to dominate the DDJ.

#### 2.3.2 Causes of jitter in high speed serial links

TJ, as illustrated in Figure 2.5, can be subdivided into two categories, namely RJ and DJ. RJ is mainly caused by device noise, such as thermal noise, flicker (1/f) noise and shot noise. These arise from thermal vibrations, semiconductor doping and process variations [7], [17]. Flicker noise has different origins but is mainly caused by traps associated with contamination and crystal defects. These traps capture and release carriers in a random fashion. This gives rise to a noise signal with energy concentrated at lower frequencies [31]. Shot noise is present in any system incorporating diodes, bipolar transistors and metal oxide semiconductor (MOS) transistors. It consists of random fluctuations in the current flowing across a pn-junction [31]. Thermal noise is caused by the random thermal motion of electrons [31].

DJ is mainly caused by crosstalk, switching noise, insufficient power delivery, EMI, DCD, ISI and discontinuities in the transmission path [17]. EMI is the interference caused by energy radiated or conducted from other devices or systems in the vicinity [7].
Discontinuities and impedance mismatches between the load and the transmission line cause signal reflections, which further degrade the signal-to-noise ratio (SNR). These reflections are frequency dependent and therefore fall under the classification of DDJ. Slew rate limitations of the driver at the transmitter further introduce DDJ and can be modelled as a low pass filter. Crosstalk is the main contributor to BUJ and is caused by interference from other signal traces in the vicinity. BUJ measurements have been performed [7] to characterise it and determine its contribution to DJ in a typical communication link.

#### 2.3.3 Mathematical definitions

In data transmission, the data pulse width is determined by the timing instants of the transmit clock at the beginning and end edges of the pulse [29]. NRZ pulses are commonly used as the basis for discrete data transmission. Multi-level signalling schemes such as 4-PAM are also being implemented to increase the spectral efficiency. A data signal transmitted with a jitter-free transmit clock can be written as [29]:

$$\phi(t) = \sum_{k=-\infty}^{\infty} \left(d[kT] - d[kT - T]\right) u(t - kT)$$ (2.1)

where T is the bit period, d[kT] is the data level of the $k^{th}$ bit and u(t) is the unit step function. Since a channel can be approximated as an LTI system, it can be accurately characterised by its impulse response. The output of the channel is thus evaluated by convolving the transmitted signal with the channel impulse response:

$$y(t) = \left[\sum_{k=-\infty}^{\infty} \left(d[kT] - d[kT - T]\right) u(t - kT)\right] \otimes h(t) = \sum_{k=-\infty}^{\infty} \left(d[kT] - d[kT - T]\right) s(t - kT)$$ (2.2)

where h(t) is the impulse response of the channel and y(t) is the output signal. s(t) is the convolution of the unit step function with the channel impulse response, i.e. the channel step response. As seen from (2.2), the timing instant kT of the transmit clock determines the pulse width of the $k^{th}$ transmitted pulse [29].
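Equation (2.2) expresses the received waveform as a superposition of shifted channel step responses, one per data transition. The Python sketch below evaluates (2.2) for an assumed first-order channel whose time constant equals one bit period. The channel model, the ±1 data levels and the function names are illustrative assumptions, not the backplane channel used in this research.

```python
import numpy as np

def step_response(t, tau):
    """Hypothetical first-order (RC) channel step response s(t) = (1 - exp(-t/tau)) u(t)."""
    t = np.asarray(t, dtype=float)
    return np.where(t >= 0.0, 1.0 - np.exp(-np.clip(t, 0.0, None) / tau), 0.0)

def channel_output(d, T, tau, t):
    """Evaluate (2.2): y(t) = sum_k (d[kT] - d[kT - T]) s(t - kT).

    The line is assumed to idle at 0 before the first bit, i.e. d[-T] = 0.
    """
    a = np.diff(np.asarray(d, dtype=float), prepend=0.0)   # step heights a[k]
    y = np.zeros_like(np.asarray(t, dtype=float))
    for k, ak in enumerate(a):
        if ak != 0.0:                                       # only transitions contribute
            y += ak * step_response(t - k * T, tau)
    return y

T = 200e-12                       # 5 Gb/s bit period
tau = T                           # assumed channel time constant of one UI
d = [-1] * 5 + [1] + [-1] * 5     # isolated '1' (+1) after a run of '0's (-1)
t_end = np.array([6 * T])         # end of the isolated bit
print(channel_output(d, T, tau, t_end))  # about 0.27, well short of the +1 level
```

The slow step response has not settled within one bit, so the isolated bit reaches only about 0.27 instead of +1. This residual influence of earlier bits is the ISI that gives rise to DDJ.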
Jitter can be included in the above equation by defining a transmitter jitter sequence $\{j_{tx}\}$ such that $j_{tx}[k]$ is the jitter associated with the $k^{th}$ clock edge. A CDR circuit locked to the serial data stream generates a receiver clock phase that is aligned with the incoming data to maximise the voltage margin at the input of the detector. Due to its inherent noise, the receiver also introduces clock jitter. A receiver jitter sequence $\{j_{rx}\}$ is defined such that $j_{rx}[n]$ is the jitter associated with the $n^{th}$ sampling edge. The output signal, including both transmitter and receiver clock jitter, can then be written as:

$$y(nT) = \sum_{k=-\infty}^{\infty} a[k] \, s\left(nT - kT + j_{rx}[n] + j_{tx}[k]\right)$$ (2.3)

where $a[k] = d[kT] - d[kT - T]$. A first order Taylor series approximation [29] is then applied. Noting that the derivative of the step response s(t) is the impulse response h(t), the following equation is obtained:

$$y[n] \approx a[n] \otimes s[n] + \left(a[n] \otimes h[n]\right) j_{rx}[n] + \left(a[n]\, j_{tx}[n]\right) \otimes h[n]$$ (2.4)

In (2.4), the first term is the channel output when both the transmitter and the receiver clocks are jitter-free. The second and third terms represent the voltage margin degradation due to the receiver and transmitter clock jitter, respectively. This explicit separation of the transmitter clock jitter and the recovered receiver clock jitter enables worst-case simulations without performing long time domain BER simulations [29]. This research focuses on the DDJ introduced, hence the clock signal is assumed to be jitter-free.

#### 2.3.4 Eye diagram analysis

A graphical measure of data integrity is provided by the eye diagram, a composite of all the bit periods superimposed on each other. Figure 2.6 shows an eye diagram with a perfectly open eye and well defined edges.

Figure 2.6. Perfectly open eye diagram with well defined left and right pulse edges.

A perfectly open eye diagram, as shown in Figure 2.6, is the ideal eye diagram to be received at the far end of the transmission link.
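The construction of an eye diagram by superimposing bit periods can be sketched numerically. The Python example below passes a random ±1 NRZ sequence through a hypothetical single-pole discrete-time low-pass channel. It then folds the output into one-UI slices and measures the vertical eye opening. The channel, the 32 samples per UI and all function names are assumptions for illustration. A one-UI window is used for simplicity, whereas oscilloscopes typically display two UIs.

```python
import numpy as np

def first_order_channel(x, alpha):
    """Hypothetical single-pole low-pass channel: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    y = np.empty_like(x)
    acc = 0.0
    for n, xn in enumerate(x):
        acc += alpha * (xn - acc)
        y[n] = acc
    return y

def eye_fold(waveform, samples_per_ui):
    """Cut the waveform into consecutive one-UI traces and stack them (rows = bits)."""
    n_ui = len(waveform) // samples_per_ui
    return np.reshape(waveform[: n_ui * samples_per_ui], (n_ui, samples_per_ui))

def vertical_opening(traces, bits, column):
    """Lowest '1' minus highest '0' at one sampling phase (negative = closed eye)."""
    return traces[bits == 1, column].min() - traces[bits == 0, column].max()

rng = np.random.default_rng(seed=7)
bits = rng.integers(0, 2, size=400)
spu = 32                                            # samples per UI
x = np.repeat(np.where(bits == 1, 1.0, -1.0), spu)  # ideal NRZ waveform, +/-1 levels

for alpha in (0.5, 0.05):                           # wide vs narrow channel bandwidth
    traces = eye_fold(first_order_channel(x, alpha), spu)
    best = max(vertical_opening(traces, bits, c) for c in range(spu))
    print(f"alpha = {alpha}: best vertical opening {best:.2f} (ideal 2.00)")
```

With the narrower bandwidth, earlier bits leave a residual voltage at every sampling instant, so the eye closes vertically. The same residue also shifts the threshold crossings, which closes the eye horizontally.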
Due to DDJ, DCD and channel attenuation, the eye diagram closes both horizontally and vertically. DDJ, which is associated with horizontal eye closure, becomes more critical as pulse widths decrease at higher data transfer rates. Clock jitter, on the other hand, exhibits a Gaussian distribution. The ideal sampling instant shown in Figure 2.6 will thus vary about its centre with a Gaussian distribution, placing further limitations on the jitter requirements. Since most of the jitter components are caused by noise, jitter has to be characterised by means of statistical analysis.

#### 2.3.5 Statistical jitter analysis

The probability density function (PDF) of the TJ is the convolution of its RJ and DJ components [32]. This, however, requires both the RJ and DJ components to be described by their PDFs rather than their peak-to-peak or RMS values [7]. In most practical cases, RJ is characterised as having a Gaussian distribution, described by the well-known equation:

$$f_{RJ}(t) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(t-\mu)^2}{2\sigma^2}}$$ (2.5)

where $\mu$ is the mean and $\sigma$ is the standard deviation [33]. Figure 2.7 shows a Gaussian distribution with a zero mean and a standard deviation of two.

Figure 2.7. Gaussian distribution with a mean of zero and a standard deviation of two. Adapted from [33].

In contrast to RJ, the DJ PDF is usually approximated by two Dirac delta functions. A deconvolution method for determining the DJ, which mainly consists of ISI and DCD, is presented in [28]. This method yields the actual DJ distribution, as opposed to the estimated double Dirac delta distribution [28]. A mathematical expression for the double Dirac delta approximation is:

$$f_{DJ}(t) = \frac{1}{2} \left[ \delta \left( t - \frac{D}{2} \right) + \delta \left( t + \frac{D}{2} \right) \right]$$ (2.6)

where D is the separation between the two Dirac delta functions and $\delta$ is the Dirac delta function.
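Because (2.6) consists of two impulses of weight 1/2, convolving it with the Gaussian RJ PDF of (2.5) (with $\mu = 0$) gives the closed form $f_{TJ}(t) = \frac{1}{2}\left[f_{RJ}(t - D/2) + f_{RJ}(t + D/2)\right]$. The Python sketch below checks this closed form against a direct numerical convolution. It uses $\sigma = 2$ as in Figure 2.7 and an assumed DJ width D = 8, both in arbitrary time units. The function names are hypothetical.

```python
import numpy as np

def gaussian_pdf(t, sigma, mu=0.0):
    """Gaussian RJ PDF of (2.5)."""
    return np.exp(-((t - mu) ** 2) / (2.0 * sigma**2)) / (sigma * np.sqrt(2.0 * np.pi))

def dual_dirac_tj_pdf(t, sigma, D):
    """TJ PDF: zero-mean Gaussian RJ convolved with the dual-Dirac DJ of (2.6)."""
    return 0.5 * (gaussian_pdf(t, sigma, D / 2) + gaussian_pdf(t, sigma, -D / 2))

t = np.linspace(-20.0, 20.0, 4001)      # symmetric time axis (arbitrary units)
dt = t[1] - t[0]
sigma, D = 2.0, 8.0                      # sigma as in Figure 2.7; D is an assumed DJ width

# Numerical cross-check: discretise each Dirac impulse as area 1/2 in one bin.
dj = np.zeros_like(t)
dj[np.argmin(np.abs(t - D / 2))] = 0.5 / dt
dj[np.argmin(np.abs(t + D / 2))] = 0.5 / dt
tj_numeric = np.convolve(gaussian_pdf(t, sigma), dj, mode="same") * dt
tj_closed = dual_dirac_tj_pdf(t, sigma, D)

print("max |numeric - closed form|:", np.max(np.abs(tj_numeric - tj_closed)))
print("area under TJ PDF:", np.sum(tj_closed) * dt)   # close to 1
```

The result is the bimodal TJ shape obtained by combining the RJ and DJ PDFs. Its tails are what determine the BER estimate.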
Figure 2.8 shows an approximation of DJ obtained by taking the PDF of a sinusoidal function, which is a good estimate for PJ. The PJ and DJ PDFs are almost the same due to the double Dirac delta approximation.

Figure 2.8. PDF of DJ estimating two Dirac delta functions. Adapted from [34].

By convolving the RJ PDF of Figure 2.7 with the DJ PDF of Figure 2.8, the TJ of the system can be approximated. The TJ PDF is shown in Figure 2.9.

Figure 2.9. Convolution of RJ and DJ to produce the TJ of the system. Adapted from [34].

The TJ PDF can be used to estimate the BER using Q-factor theory developed for optical channels. The BER is essentially the cumulative distribution function (CDF) of the TJ PDFs of the left and right eye crossings, evaluated over the time interval in which a bit error occurs [7]. Figure 2.10 shows the distortion imposed on the pulse edges by the TJ, overlaid on a perfectly open eye diagram.

Figure 2.10. Distortion imposed by total system jitter affecting pulse edges. Adapted from [34].

Figure 2.10 shows that the TJ distorts the pulse edges at the far end of the transmission link with the probability shown. This closes the eye significantly and ultimately degrades the BER. From a measurement and characterisation perspective, a closed eye diagram at the far end (receiver) does not indicate a data integrity failure if the receiver implements equalisation. Oscilloscopes and data analysers therefore need to implement mathematical routines that emulate receiver equalisation in order to measure the performance of the link meaningfully [17].

Jitter specifications of a serial communication link are usually given in terms of the TJ, RJ and DJ. Adherence to existing protocol specifications can be tested by overlaying the protocol eye masks on the measured eye diagram. Any signal crossing the eye mask is considered a specification violation [7]. Jitter is usually specified to not exceed 10 % of a unit interval (UI).
A UI is defined as the ideal or average time duration of a single bit. The more stringent jitter requirements rely on newer IC technologies to achieve the necessary specifications.

#### 2.4 Integrated circuit technology

The building of networks to handle data, voice and internet protocol (IP) traffic has brought forward a need for ICs that can transmit and receive large amounts of information with high data integrity [18]. High speed backplane serial transceivers face many constraints and limitations, so a designer needs all the help the implementation technology can provide. Among the more mature silicon-based technologies, SiGe BiCMOS successfully fulfils this need by providing high density integration capability as well as high performance and low noise levels [3]. One of the advantages of SiGe devices over bulk silicon devices relates to carrier mobility. The mobility in strained p-channels or tensile strained n-channels is sufficient to build metal oxide semiconductor field effect transistors (MOSFETs) that are faster than a bulk silicon device of similar size and structure [35].

The growing popularity of SiGe technology stems from its ability to simultaneously achieve a high cut-off frequency ($f_T$), low base resistance ($r_b$) and high common emitter current gain ($\beta$). The main reason for the success of SiGe HBT devices is their low noise capability [2]. SiGe technology has reached the point at which more expensive processes such as InP and GaAs are becoming more application specific. This is because the performance of SiGe devices keeps increasing and they provide a lower cost high speed alternative. Due to the competitive device characteristics of SiGe devices, a shift to the lower cost SiGe devices is likely to occur more often [3].
High speed data communication networks of the 21st century will continue to require high performance, low power, low noise and low bit error rates at an affordable price. This pushes the IC industry towards higher integration levels and lower power devices [3]. SiGe technologies will continue to move into market segments previously believed to be the stronghold of more expensive GaAs and InP-based technologies. SiGe processes are thus expected to be the technology of choice for future multi-Gb/s throughput parallel links [3]. Integrated circuit implementation techniques are an important consideration in the implementation of a high speed serial link, since the switching speed of the logic circuits and the noise levels of the devices are integral to a high speed design. Techniques such as emitter coupled logic (ECL) were employed in the past for their low voltage swings, but CML is preferred for higher speed operation [1]. Chapter 4 discusses the considerations taken into account in choosing between conventional MOS CML and bipolar CML.

#### 2.5 Reducing jitter in high speed serial communication links

Jitter can be decomposed into several sub-components, each with its own characteristics and root causes [7]. Traditionally, the performance of a communication link has been measured by its BER. As the data rate increases, however, both the jitter magnitude and the signal amplitude noise must decrease proportionally to achieve the same acceptable BER [7]. A commonly accepted BER for serial links is $10^{-12}$, which typically translates into a jitter requirement of less than 10 % of a UI. One area that has received considerable design effort is the low phase noise voltage controlled oscillator (VCO) [36]. In the time domain, the phase noise of a VCO appears as small variations in the zero crossings of the signal. Jitter is defined as the variation of an event from its expected time.
Thus, by reducing the VCO phase noise, the clock jitter introduced by the phase locked loops (PLLs) at the transmitter and at the receiver can be reduced. The total clock jitter should be kept as low as possible and should be insignificant relative to a UI. Note that power supply induced jitter directly affects the VCO, introducing further phase noise. This jitter arises from random fluctuations in the supply voltage, which influence the tuning voltage and hence the oscillation frequency. DDJ, on the other hand, can close the receiver eye diagram completely, making a correct decision virtually impossible and resulting in a high BER. Techniques to overcome DDJ are therefore needed to improve the BER of high speed serial communication links.

#### 2.5.1 Overcoming data dependent jitter

The two main limitations on the propagation of a multi-Gb/s data signal are the frequency dependent characteristics of the channel and of the package [8]. As discussed earlier, the backplane channel, the chip package and the bonding wires contain frequency dependent elements, which translate into data dependent distortion. The chip pads conventionally contain electrostatic discharge (ESD) protection circuitry, which, together with the chip package and bonding wires, forms an effective low pass LC filter. The high frequency roll-off of this parasitic filter gives rise to DDJ in the signal. The ISI is determined not only by the high frequency roll-off but also by the non-linear phase response of the effective LC filter [18]. Either of two methods can be used to overcome the low pass filter effect of the complete channel and package: pre-emphasis at the transmitter or equalisation at the receiver [37]. Both rely on multiplying the channel transfer function (including the chip and bonding wires) by a compensating transfer function to obtain a flat overall frequency response.
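The low pass behaviour of the package parasitics can be illustrated with a simple lumped model. The following sketch makes these assumptions:

- An ideal voltage source drives a series bond wire inductance into the pad (ESD) capacitance.
- The pad is terminated in 50 Ω.
- The 1.5 nH inductance is chosen within the 1 to 2 nH range quoted for bond wires in Section 3.6.
- The 1 pF pad capacitance is purely hypothetical.

The function name `package_response` is illustrative only.

```python
import math


def package_response(f: float, l_bond: float, c_pad: float, r_load: float = 50.0) -> float:
    """|Vout/Vin| of a series bond wire inductance driving the pad capacitance
    in parallel with a resistive termination, from an ideal voltage source."""
    w = 2.0 * math.pi * f
    z_shunt = r_load / complex(1.0, w * r_load * c_pad)      # R in parallel with C
    return abs(z_shunt / (complex(0.0, w * l_bond) + z_shunt))


if __name__ == "__main__":
    l_bond = 1.5e-9     # within the 1-2 nH range quoted for 1-2 mm bond wires
    c_pad = 1.0e-12     # hypothetical pad plus ESD capacitance
    f_lc = 1.0 / (2.0 * math.pi * math.sqrt(l_bond * c_pad))
    print(f"LC corner: {f_lc / 1e9:.2f} GHz")
    for f in (0.5e9, 2.5e9, 5e9, 10e9):      # 2.5 GHz is the Nyquist frequency at 5 Gb/s
        gain_db = 20.0 * math.log10(package_response(f, l_bond, c_pad))
        print(f"{f / 1e9:5.1f} GHz: {gain_db:6.2f} dB")
```

With these assumed values the LC corner lies near 4.1 GHz. The response shows some peaking close to that frequency before rolling off steeply. Both the roll-off and the associated non-linear phase contribute to ISI at multi-Gb/s data rates.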
Both pre-emphasis and equalisation act as high pass filters, shaping the signal to produce an overall flat frequency response. Ideally, perfect error free transmission requires a flat magnitude response and a linear phase response. These two criteria are expressed mathematically as follows [38]:

$$\begin{aligned} |H_{P}(\omega)H_{C}(\omega)| &= C \\ \theta_{P}(\omega) + \theta_{C}(\omega) &= -\omega t_{d} \end{aligned} \tag{2.7}$$

where:

- $H_P(\omega)$ is the FIR pre-emphasis filter transfer function,
- $H_C(\omega)$ is the channel transfer function,
- C is a constant,
- $\theta_P(\omega)$ is the FIR pre-emphasis filter phase response,
- $\theta_C(\omega)$ is the channel phase response, and
- $t_d$ is the constant delay through the link.

#### 2.5.2 Equalisation

To compensate for the lossy characteristics of the channel, equalisation techniques have been proposed, such as the feed-forward equaliser (FFE), the decision feedback equaliser (DFE) and receiver pre-amplifiers [17]. For cost effective high performance designs, the channel, transmitter and receiver should complement each other, as discussed in [17]. The major limitation of the DFE is the timing constraint of its feedback loop [37], [39]. The most common form of transmitter side equalisation (transmitter pre-emphasis) is the FFE. It combines the bit currently being transmitted with one or more of its adjacent bits, effectively implementing an FIR filter [17]. A high pass filter is used for transmitter pre-emphasis to obtain a flat frequency response over the frequency range of interest. Equalisers at the receiver are easier to implement and to make adaptive, since the signals needed for coefficient updates are already available in the CDR circuit. An adaptive equaliser architecture presented in [11] implements an infinite impulse response (IIR) filter because of its simplicity in an adaptive circuit. The resulting IIR filter can be viewed as an adjustable high pass filter [11].
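The magnitude criterion of Equation 2.7 can be checked numerically. The sketch below makes two assumptions:

- The channel is a hypothetical symbol spaced $H_C(z) = 1 + a z^{-1}$ with a single post-cursor $a$.
- The pre-emphasis taps are the first N terms of the channel's inverse, $h_P(k) = (-a)^k$. These alternate in sign and therefore behave as a high pass filter.

The sketch reports the ratio of the largest to the smallest $|H_P H_C|$ between DC and the Nyquist frequency, which equals 1 only for a perfectly flat response. Only the magnitude condition is checked here, not the phase condition, and the helper names are illustrative.

```python
import cmath
import math


def freq_response(taps, w):
    """H(e^{jw}) = sum_k taps[k] * exp(-j*w*k), with w in radians per symbol."""
    return sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(taps))


def convolve(a, b):
    """Full linear convolution of two tap lists (cascading two FIR responses)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def flatness_ratio(taps, points=257):
    """max|H| / min|H| from DC (w = 0) to Nyquist (w = pi); 1.0 means flat."""
    mags = [abs(freq_response(taps, math.pi * k / (points - 1))) for k in range(points)]
    return max(mags) / min(mags)


if __name__ == "__main__":
    a = 0.5                                   # hypothetical post-cursor of the channel
    channel = [1.0, a]                        # H_C(z) = 1 + a z^-1 (low pass)
    for n_taps in (1, 2, 4, 8):
        pre = [(-a) ** k for k in range(n_taps)]   # truncated inverse (high pass)
        print(n_taps, round(flatness_ratio(convolve(pre, channel)), 3))
```

For a = 0.5 the ratio is 3 with no pre-emphasis. It falls to about 1.67, 1.13 and 1.01 for 2, 4 and 8 taps. Each added tap flattens the combined response further, but a finite number of taps never makes it exactly flat.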
Although equalisation can correct some of the data dependent distortion, an equaliser cannot present an open eye diagram to the sampling and decision circuitry if the signal is too weak or too distorted. This creates a need for effective pre-emphasis or pre-shaping circuits in high speed serial links.

#### 2.5.3 Pre-emphasis

Transmit pre-emphasis (also called pre-distortion or pre-shaping) offers lower power consumption, superior performance and better interoperability compared with receiver equalisation [40]. Pre-emphasis pre-distorts the transmitted data by boosting its high frequency components before they enter the non-ideal channel [26]. FIR filter based pre-emphasis has been used by various authors to overcome ISI (effectively DDJ) in high speed serial links [1], [8], [26]. Phase distortion caused by the non-linear phase response of the combined channel produces group delay distortion, which should be taken into account in channel modelling and FIR filter design [26]. Methods for correcting the group delay distortion as well as the amplitude distortion are discussed in [27], [41], [42]. Two main types of FIR filter are used for pre-emphasis: symbol spaced filters (SSFs) and fractionally spaced filters (FSFs) [26]. The performance of an SSF is limited by aliasing, since its taps are spaced one symbol period T apart, i.e. it samples at 1/T. Spacing the taps at a fraction of T allows the FIR filter response to be defined beyond the Nyquist frequency of 1/(2T). FSF pre-emphasis is therefore expected to perform better than SSF pre-emphasis. An FSF is, however, more complex to implement, so the final implementation is a trade-off between performance and complexity [26]. SSFs are widely used to accomplish the pre-distortion necessary to overcome the channel bandwidth limitations [9].
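The aliasing argument can be made concrete by evaluating the frequency response of a tapped delay line. In the minimal sketch below, the same hypothetical tap weights are used once with symbol spacing T and once with spacing T/2. This is done purely to show the periodicity; a practical FSF would use more taps to span the same time window.

For real taps, the SSF magnitude at a frequency f equals that at 1/T - f. Above the Nyquist frequency of 1/(2T) it can therefore only mirror what happens below it. The FSF response, by contrast, is periodic in 2/T and can be shaped independently up to 1/T.

```python
import cmath
import math


def fir_response(taps, f, tap_spacing):
    """Frequency response of a tapped delay line: sum_k c_k exp(-j 2 pi f k tap_spacing)."""
    return sum(c * cmath.exp(-2j * math.pi * f * k * tap_spacing)
               for k, c in enumerate(taps))


if __name__ == "__main__":
    T = 200e-12                  # symbol period at 5 Gb/s, so 1/(2T) = 2.5 GHz
    taps = [1.0, -0.4, 0.1]      # hypothetical pre-emphasis tap weights
    for f_ghz in (0.5, 1.5, 2.5, 3.5, 4.5):
        f = f_ghz * 1e9
        ssf = abs(fir_response(taps, f, T))       # symbol spaced (T)
        fsf = abs(fir_response(taps, f, T / 2))   # fractionally spaced (T/2)
        print(f"{f_ghz:.1f} GHz: SSF {ssf:.3f}, FSF {fsf:.3f}")
```

In the printed table, the SSF column is symmetric about 2.5 GHz (1.5 GHz and 3.5 GHz give the same magnitude), while the FSF column is not.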
A D flip-flop is often placed before the FIR filter delay taps to synchronise the data with the clock, since in a test environment the transmitter is fed with non-synchronous clock and data signals [9], [12]. A mathematical expression for symbol spaced FIR filters is:

$$y(n) = \sum_{i=-N}^{M} c_i\, x(n-i) \tag{2.8}$$

where $c_i$ are the tap coefficients, N is the number of pre-taps, M is the number of post-taps, $x(n-i)$ is the input data and $c_0$ is the reference tap [1]. The mathematical expression for a symbol spaced FIR filter can also be written as [43]:

$$y(n) = \sum_{k=0}^{N-1} h(k)\, x(n-k) \tag{2.9}$$

where N is the filter length, $h(k)$ are the filter tap coefficients and $x(n-k)$ is the input data. Equation 2.9 assumes no pre-taps. The filter tap coefficients $h(k)$ represent the sampled impulse response of the FIR filter. FIR filters can be designed with a linear phase response, in which case the filter itself introduces no phase distortion. If a signal consists of only one frequency, almost any filter should suffice. For pseudo-random bit sequences containing multiple frequencies, however, the phase response must be considered. With a non-linear phase response, different frequencies take different amounts of time to pass through the filter and the output becomes distorted. The only way to avoid this distortion is a linear phase response. An IIR filter impulse response has a definite starting point but, by definition, continues indefinitely. FIR filters, on the other hand, can have a symmetrical impulse response, which is a requirement for a linear phase response [44]. Phase distortion introduced by the channel causes the channel impulse response to exhibit an unusually long tail, which interferes with adjacent data bits. Implementing a linear phase filter ultimately comes down to a trade-off between performance and implementation complexity, since the required symmetrical impulse response roughly doubles the number of filter taps.
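Equation 2.8 maps directly onto a short loop. The sketch below is an illustrative implementation; the function name and argument layout are hypothetical. The taps are arranged as follows:

- Pre-cursor taps weight future symbols $x(n+1), x(n+2), \dots$
- The reference tap $c_0$ weights the current symbol.
- Post-cursor taps weight past symbols.

Samples outside the input are treated as zero.

```python
from typing import List, Sequence


def fir_pre_post(x: Sequence[float], pre: Sequence[float], c0: float,
                 post: Sequence[float]) -> List[float]:
    """Symbol spaced FIR of Equation 2.8: y(n) = sum_{i=-N}^{M} c_i x(n - i).

    pre[k]  -> c_{-(k+1)}: pre-cursor taps, weighting future symbols x(n+k+1)
    c0      -> reference tap, weighting the current symbol x(n)
    post[k] -> c_{k+1}:    post-cursor taps, weighting past symbols x(n-k-1)
    Samples outside the input sequence are treated as zero.
    """
    coeffs = {0: c0}
    coeffs.update({-(k + 1): c for k, c in enumerate(pre)})
    coeffs.update({k + 1: c for k, c in enumerate(post)})
    y = []
    for n in range(len(x)):
        acc = 0.0
        for i, c in coeffs.items():
            if 0 <= n - i < len(x):
                acc += c * x[n - i]
        y.append(acc)
    return y


if __name__ == "__main__":
    # An isolated symbol reproduces the tap coefficients in order (impulse response).
    print(fir_pre_post([0.0, 0.0, 1.0, 0.0, 0.0], pre=[-0.1], c0=1.0, post=[-0.3, 0.05]))
    # Two-tap de-emphasis of an NRZ pattern: transitions reach +/-1.0 and repeated
    # bits +/-0.5 (the first bit has no predecessor, so it stays at 0.75).
    bits = [1, 1, 1, -1, -1, 1, -1, 1, 1, 1]
    print(fir_pre_post(bits, pre=[], c0=0.75, post=[-0.25]))
```

Feeding in a single isolated symbol returns the coefficients, with the pre-cursor tap appearing one symbol before the main tap. This illustrates that the coefficients are the sampled impulse response of the filter.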
The number of filter coefficients is important in an IC implementation of an FIR filter, since the production cost scales with die area. Also, since only a finite number of tap coefficients can be used to represent the filter, the filter frequency response exhibits ripple and falls off at higher frequencies. FIR filters are, furthermore, unconditionally stable, even when realising relatively high Q-values.

There are three kinds of pre-emphasis filter implementation: passive (fixed), adjustable and adaptive [40].

- Passive pre-emphasis is designed with prior knowledge of the channel model. Its limitation is that it can only compensate for the exact model for which it was designed.
- Adjustable pre-emphasis usually relies on FIR filters because their tap values are easy to adjust. The limitation of externally adjustable FIR filters is that determining near-optimum tap coefficients becomes largely a trial and error exercise [40].

Fixed or programmable pre-emphasis cannot compensate for changes in environmental conditions such as supply voltage, temperature and ageing. An adaptive pre-emphasis scheme would alleviate the long term issues associated with these environmental effects. From a design perspective, an adaptive structure does not require an external programming interface, which can reduce the total die area. Implementing an adaptive scheme is, however, more complex and could be more time consuming [40]. Implementing a transmitter pre-shaping FIR filter only requires weighted adjacent bits to be added to the transmitted signal. During the next *N* symbol periods, *N* modules turn on consecutively to cancel the tail of the channel impulse response. Such a transmitter does not require a faster technology to operate properly [10].
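Cancelling the tail of the channel impulse response with N consecutively weighted symbols can be sketched as a zero-forcing calculation. This is one common way of choosing the taps; it is not claimed to be the method of [10] or of this work.

The sketch starts from a hypothetical pulse response sampled once per symbol, whose first sample is the main cursor. Each new post-cursor tap is chosen to cancel whatever tail remains at its own sample instant. The taps are finally scaled so that the sum of their magnitudes is one, reflecting an assumed peak amplitude limit of the transmitter. The helper names are illustrative.

```python
def convolve(a, b):
    """Full linear convolution of two sample lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def zero_forcing_taps(pulse, n_taps):
    """Return n_taps transmitter taps h (h[0] = 1 is the main tap) such that the
    combined pulse (pulse * h) is zero at samples 1 .. n_taps-1.

    pulse[0] is the main cursor and pulse[1:] the post-cursor tail, sampled once
    per symbol. Tap n is found by forward substitution: it cancels the residual
    tail that the earlier taps leave at sample n.
    """
    h = [1.0]
    for n in range(1, n_taps):
        residual = sum(h[k] * pulse[n - k] for k in range(n) if n - k < len(pulse))
        h.append(-residual / pulse[0])
    return h


if __name__ == "__main__":
    pulse = [1.0, 0.45, 0.2, 0.08, 0.03]    # hypothetical sampled channel pulse response
    h = zero_forcing_taps(pulse, 4)
    scale = sum(abs(c) for c in h)          # assumed peak transmit amplitude constraint
    h = [c / scale for c in h]
    print("taps:    ", [round(c, 4) for c in h])
    print("received:", [round(v, 4) for v in convolve(pulse, h)])
```

The combined pulse is zero at the N-1 sample instants after the main cursor, leaving only a small residual tail beyond the span of the filter. The scaling step shows the price paid: the main cursor shrinks along with the taps.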
One way to implement an *N*-tap FIR filter is to use a digital to analogue converter (DAC) and a counter to adjust the FIR filter tap coefficients [10]. FIR filter optimisation using external tap coefficient adjustment has been studied [26], but it requires a completely characterised channel model prior to optimisation. The least mean squares (LMS) criterion is a popular choice for implementing a convergence engine that determines the optimal filter coefficients [26]. As the algorithm converges, the tap coefficients approach their optimum values. FIR filter optimisation was developed using MATLAB<sup>2</sup> [26]. A restriction of most FIR filter pre-emphasis transmitters is that the coefficients of the FIR filter are fixed or externally adjustable [1], [10], [14], [26]. Adaptive pre-emphasis, which adjusts the filter tap coefficients automatically, is therefore sought after to provide an easy to use system with high data integrity. The FIR filter tap coefficients should be adjusted to their optimal values in order to provide an open eye diagram at the far end of the link. An adaptation method is presented in [16], in which the receiver feeds back a control signal to adjust the FIR pre-emphasis filter tap coefficients. A problem faced by previous designs is sampling the data at the receiver. Since the data is used to adjust the tap coefficients, any errors made before the coefficients converge can result in error propagation. Initial tap coefficients can be calculated using predetermined training sequences, each tap having its own training sequence. The complete training sequence length is therefore equal to the number of FIR taps. Mathematical verification and simulation of adaptive FIR pre-emphasis are presented in [16], leaving room for improvement.

<sup>2</sup> Matlab and Simulink from Mathworks, http://www.mathworks.com
Adaptive pre-emphasis circuitry can significantly reduce ISI as well as the long setup times associated with determining the optimal tap coefficients of the FIR pre-emphasis filter.

#### 2.5.4 Adaptive pre-emphasis techniques

Ideally, the FIR filter response is the exact inverse of the complete channel response. This gives a perfectly flat magnitude response and a linear phase response (Equation 2.7), and hence a perfectly open eye diagram at the receiver. It is, however, not possible to obtain the exact inverse, since only a finite number of FIR filter taps can be used [43]. The achieved frequency response therefore flattens off at high frequencies [11], so the -3 dB cut-off frequency is only extended, not removed. With adaptive pre-emphasis, the adaptation technique has to converge to the final tap coefficient values in order to extend the -3 dB cut-off frequency and improve data integrity. Figure 2.11 illustrates a typical eye diagram taken at the receiver with no pre-emphasis applied.

Figure 2.11. A typical distorted eye diagram at the receiver at a data rate of 5 Gb/s [45].

There are two main methods of adaptive FIR pre-emphasis: an LMS convergence engine [46], [47], and pilot signalling with peak detection [16]. The latter is studied in this research.

##### LMS convergence engine

Using an LMS convergence engine is a popular technique for implementing adaptive FIR pre-emphasis. The filter tap coefficients are updated after every sample in order to minimise the mean square error (MSE). The MSE is calculated between the desired signal and the distorted input signal, so knowledge of both the desired and the distorted signal is a prerequisite. Adaptive receiver equalisers require exactly the same signal information but are easier to implement, since the desired data can be estimated by passing the distorted data through the CDR circuitry.
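The sample-by-sample MSE minimisation described above can be sketched in its simplest form: the adaptive receiver equaliser that the text notes is the easier case. The sketch makes the following assumptions:

- The channel is a hypothetical noiseless $r(n) = d(n) + 0.5\,d(n-1)$, driven by random NRZ symbols.
- Three FIR taps are adapted with the standard LMS update $w_k \leftarrow w_k + \mu\, e(n)\, r(n-k)$.

The sketch also offers a per-sample sign-sign variant. Equation 2.10 additionally accumulates the signs over a block of L symbols before each update. The master-slave transfer of coefficients to the transmitter is not modelled. The names `lms_adapt` and `mse` are illustrative.

```python
import random


def sgn(v: float) -> int:
    """Signum: +1, 0 or -1."""
    return (v > 0) - (v < 0)


def lms_adapt(received, desired, n_taps, mu=0.01, sign_sign=False):
    """Adapt FIR taps w so that y[n] = sum_k w[k] * received[n-k] tracks desired[n].

    LMS:       w[k] += mu * e[n] * received[n-k]
    sign-sign: w[k] += mu * sgn(e[n]) * sgn(received[n-k])
    where e[n] = desired[n] - y[n]. Taps are updated after every sample.
    """
    w = [0.0] * n_taps
    w[0] = 1.0                                   # start as a pass-through filter
    for n in range(n_taps - 1, len(received)):
        window = [received[n - k] for k in range(n_taps)]
        e = desired[n] - sum(wk * rk for wk, rk in zip(w, window))
        for k in range(n_taps):
            step = sgn(e) * sgn(window[k]) if sign_sign else e * window[k]
            w[k] += mu * step
    return w


def mse(w, received, desired):
    """Mean square error of the fixed filter w over the whole sequence."""
    n_taps = len(w)
    errs = [desired[n] - sum(w[k] * received[n - k] for k in range(n_taps))
            for n in range(n_taps - 1, len(received))]
    return sum(e * e for e in errs) / len(errs)


if __name__ == "__main__":
    rng = random.Random(0)
    d = [rng.choice((-1.0, 1.0)) for _ in range(20000)]        # random NRZ symbols
    # hypothetical channel with one post-cursor: r[n] = d[n] + 0.5 d[n-1]
    r = [d[n] + 0.5 * (d[n - 1] if n else 0.0) for n in range(len(d))]
    runs = (("LMS", {"mu": 0.01}), ("sign-sign", {"mu": 0.002, "sign_sign": True}))
    for label, kwargs in runs:
        w = lms_adapt(r, d, 3, **kwargs)
        print(label, [round(x, 3) for x in w], "MSE", round(mse(w, r, d), 4))
    print("no equalisation MSE", round(mse([1.0, 0.0, 0.0], r, d), 4))
```

The standard LMS taps settle close to the three-tap minimum MSE solution for this channel, approximately (0.988, -0.471, 0.188). The sign-sign taps keep dithering around a nearby point but still bring the MSE below the 0.25 obtained without equalisation.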
Implementing an LMS convergence engine for adaptive filtering in the transmitter is more difficult, since the distorted data is not available at the transmitter. This can, however, be overcome by using two separate transceivers connected in a master-slave fashion. In this structure the LMS convergence engine of the second receiver updates the filter tap coefficients of the first pre-emphasis filter [46], [47]. The sign-sign (signum-signum) LMS engine is popular for its ease of implementation using digital circuits. The tap update algorithm for a sign-sign LMS engine can be expressed as

$$h_k(j+1) = h_k(j) + \operatorname{sgn}\left(\sum_{i=0}^{L-1} \operatorname{sgn}(e_{jL-i})\operatorname{sgn}(d_{jL-i-k})\right) \tag{2.10}$$

where $h_k(j)$ is the $k^{th}$ tap coefficient for block j using a block length of L, e is the error signal and d is the data signal. The sign-sign LMS engine is easily implemented using a digital round robin adaptation engine [46], [47]. The LMS convergence engine keeps updating the filter tap coefficients, which continue to dither around their optimal values once converged. Implementing an LMS convergence technique using a master-slave structure does, however, pose some additional problems for the system designer. To implement the adaptive pre-emphasis accurately, two physically and electrically identical high frequency paths are needed. This can be hard to accomplish, since signal reflections are hard to predict. The other fundamental restriction of such a technique is that two transceivers are always needed, increasing implementation cost.

##### Pilot signalling and peak detection

Filter tap coefficients can also be determined by applying simple combinations of pilot signalling and peak detection [16].
By expressing the channel impulse response as

$$y_{chn}(n) = \sum_{k=0}^{N-1} c_k\, \delta(n-k) \tag{2.11}$$

where $c_k$ are the equivalent taps of the channel impulse response, the received signal amplitude at the far-end sampling instants can be approximated as a convolution. It is the convolution of the channel impulse response (2.11), the FIR filter impulse response (2.9) and the applied data. The received signal obtained in this way usually exhibits a long tail, causing severe ISI. The filter tap coefficients can then be determined by minimising the tail of the received signal [16]. Each tap coefficient is updated by applying a different training sequence to the input. Such a training method, however, requires data transfer to be stopped temporarily while the tap weights are determined, whereas an LMS convergence engine can adapt the FIR filter coefficients continuously. Pilot signalling and peak detection, on the other hand, is easier to implement and does not require multiple transceivers to calculate the optimal tap coefficients. Filter updates can also be sent over a lower frequency return path, since the update rate depends solely on the pilot signal, which is user definable. Pilot signalling and peak detection is discussed further in Chapter 4.

#### 2.5.5 FIR filter implementation

Of the various FIR filter implementation structures, the most popular is the transversal structure illustrated in Figure 2.12. The structure is popular for its ability to achieve a linear phase response but, most importantly, for its ease of implementation.

Figure 2.12. Transversal FIR filter implementation structure.

The transversal structure is easily implemented using fast CML circuits [48]. A CML circuit has the same structure as the more conventional ECL circuit, but without the final emitter follower stage [49]. The transversal structure for implementing an FIR filter using CML circuits is illustrated in Figure 2.13.

Figure 2.13.
CML implementation of the FIR filter.

The CML circuit shown above uses tail current sources, and the transistors steer the tail current into whichever branch is selected by the input data. As expected, CML is a fully differential signalling technique based on the ON/OFF switching of the transistors [49], [50], [51]. It is, however, not always possible to switch fully ON or OFF, so switching is commonly specified as 90 % / 10 % current steering. CML relies on the fact that a capacitive load can be charged in a shorter time either by reducing the voltage swing or by increasing the output current drive capability [51]. Conventional CMOS logic swings between the supply rails, and its current drive capability is set by the transistor aspect ratios. CML, on the other hand, gives the designer direct control over both the current drive capability and the voltage swing. The trade-off is thus between increased power consumption and reduced voltage swing. This can be seen from the current-voltage relationship of a capacitor:

$$I = C \frac{dV}{dt} \approx C \frac{\Delta V}{\Delta t} \tag{2.12}$$

where C is the capacitance and $\Delta t$ is the time taken to charge the capacitor by $\Delta V$. Rearranging Equation 2.12 as $\Delta t = C\,\Delta V / I$ shows how to shorten the charge and discharge time of the capacitor: either the current drive must be increased or the voltage swing must be reduced.

#### 2.6 Conclusion

Jitter significantly degrades the performance of a high speed serial link by closing the eye diagram at the far end of the link. Jitter requirements and specifications are becoming ever more stringent due to the quest for higher data transfer rates and hence higher bandwidth. Jitter is composed of sub-components, each with a different statistical distribution and PDF. DDJ, a sub-component of DJ, produces the most significant distortion in high speed serial links over copper backplane channels. Backplane channels have a limited bandwidth due to the skin effect and dielectric loss.
The limited bandwidth of a backplane channel causes significant ISI, so the transmitted pulse sequence affects the magnitude and phase distortion of the received signal. Overcoming DDJ, and more specifically ISI, is therefore of the utmost importance. Ideally, the overall channel response should have a flat magnitude and a linear phase, so that the signal suffers no effective distortion through the channel. Equalisation and pre-emphasis should complement each other to obtain an optimal solution with regard to cost and performance [52]. Adaptive pre-emphasis holds great promise for reducing long setup times and obtaining optimised filter tap coefficients. A further advantage of adaptive FIR pre-emphasis lies in its ability to adapt to any chosen channel model. The challenge associated with adaptive FIR pre-emphasis is the feedback path from the receiver to the transmitter. This feedback loop can, however, operate at a lower frequency, since the filter tap coefficients do not have to be updated at the data rate. Adaptive pre-emphasis therefore holds great promise for increasing the effective data transfer rates of high speed serial communication links. It does so by providing optimal filter tap coefficients that reduce the DDJ produced by the bandwidth limited channel.

# **CHAPTER 3: RESEARCH METHODOLOGY**

#### 3.1 Introduction

This chapter introduces the methodology followed in gathering data and testing the hypothesis in question. The adaptive FIR pre-emphasis filter is implemented in the IBM 7WL 0.18 µm SiGe BiCMOS process [53], [54]. The chapter also describes the technology used to simulate, prototype and experimentally test the system, including the software packages used throughout the course of this research.
#### 3.2 Justification for the paradigm and methodology

Externally adjustable FIR filters, in which the filter coefficients are adjusted by applying a voltage or a current, have been widely implemented [1], [10], [14], [26]. The filter tap coefficients, however, still need to be optimised with regard to the eye diagram at the far end of the channel. MATLAB was used to study the mathematical operation and design of the system, and the results are presented in Chapter 4. The adaptive FIR filter pre-emphasis presented here eliminates the long, manual setup times required to find the optimal tap coefficients. The tap coefficients can also be adjusted to compensate for time-variant conditions such as temperature, which conventional FIR filter pre-emphasis implementations do not compensate for. The adaptive FIR pre-emphasis technique employed can also adapt to any ringing that might occur on the backplane channel, since the adaptation of the FIR filter coefficients is result-driven. The adaptation is aptly termed result-driven because it is based on the signal received at the far end of the link. ICs are the preferred way of implementing very high speed circuits, particularly if they are application specific. Owing to the relatively small die area it occupies, this design could be incorporated into a larger design. The focus of this research, however, is on improving off-chip bandwidth in order to minimise the DDJ present in the system. Bandwidth improvement by means of pre-emphasis is only one way of alleviating the ever increasing need for off-chip bandwidth [55].
#### 3.3 Research methodology and outline

The following list contains the research procedures followed in designing the integrated circuit used for gathering the data and testing the hypothesis:

- *Theoretical background*: a thorough literature study on the effects of jitter on high speed serial links and on the testing of jitter specifications. This was combined with a theoretical study of the technology implemented, the process steps involved, fabrication limits and fabrication recommendations.
- *Mathematical design*: mathematical modelling of adaptive FIR filters using MATLAB. The mathematical design also served as a conceptual design of the adaptation technique proposed in this research.
- *Software aided design*: in addition to the mathematical modelling and design, SPICE is used extensively to simulate and optimise the design. The particular software packages used are listed in Section 3.4.
- *Low level design*: IC design using the IBM 0.18 µm SiGe process, including the layout of the IC, layout optimisation and post layout simulations to verify the functionality of the IC. The low level design was done using the Virtuoso<sup>3</sup> software from Cadence Design Systems.

To test the hypothesis in question, post layout simulations in conjunction with SPICE simulations were used extensively to verify the operation of the IC before prototyping. The manufactured IC will be thoroughly tested on different kinds of copper channels, differing in length and substrate type. Previous designs showed an open eye diagram at the far end of a 34" FR-4 backplane channel [1]. The proposed design and pre-emphasis implementation technique will be compared with previous designs by comparing eye diagrams at the receiving end of the serial link at various data rates. The research methodology followed in this dissertation is illustrated graphically in Figure 3.1.

<sup>3</sup> www.cadence.com

Figure 3.1.
Flow diagram of the integrated design methodology followed to validate the hypothesis under consideration.

Once the prototype had been manufactured, bonded and soldered to a PCB, the hypothesis could be validated further. The experimental results, aimed at further validating the hypothesis, are presented in Chapter 7.

#### 3.4 Modelling, simulation and layout design

The tools in the Cadence Virtuoso package that were used extensively, together with their functionality, are presented in Table IV. Cadence Design Systems was chosen as the electronic design automation (EDA) software for this research. Cadence has 20 years of experience in the field of electronic design and EDA software. Cadence Design Systems is the world's leading EDA company, providing tools for overcoming a wide range of technical and economic hurdles.

TABLE IV. TOOLS OF CADENCE VIRTUOSO USED IN THE DESIGN AS WELL AS THEIR FUNCTIONALITY.

| Package (Cadence Virtuoso v5.1.4.1) | Functionality |
|---|---|
| Virtuoso Schematic Composer | Graphical design environment used for schematic level circuit entry and design [56] |
| Virtuoso Analog Design Environment (ADE) with Spectre and Spectre RF circuit simulators | SPICE-based circuit level simulator [57] |
| Virtuoso Layout Editor with Diva/Dracula DRC and LVS | Layout generation and design [58] |
| Assura design rule check (DRC) and layout versus schematic (LVS) | Assura was the preferred engine for running LVS and DRC, as specified by MOSIS, owing to its thoroughness and success rate. |

#### 3.4.1 Mathematical modelling

The mathematical modelling was done using MATLAB, as mentioned earlier. The full MATLAB script is provided in Appendix A.
The script was developed with two outcomes in mind: firstly as a proof of concept, and secondly as the mathematical design of the implemented system.

#### 3.4.2 Circuit level modelling and simulation

The modelling and simulation were done using the Virtuoso Schematic Composer and ADE. The system was first designed conceptually using mathematical simulation, as discussed earlier. The conceptual design was then transformed into a circuit by instantiating basic components from the foundry library. The IBM 7WL library makes extensive use of parameterised cells (p-cells), giving the designer an extra degree of freedom and greater simulation accuracy during the design. P-cells define relationships between interdependent device parameters; for example, increasing the width of an NMOS transistor increases its drain-substrate capacitance. This allows the instantiated component to include non-idealities of the transistor model already in the schematic design. Schematic level simulations should therefore closely match those of a netlist extracted from layout with its parasitic components. ADE, a SPICE-based simulation environment, was used to create a netlist of the entered schematic, which was then simulated using Spectre. All simulations presented in this thesis were performed on post layout extracted netlists in order to obtain accurate results. ADE supports all the required types of simulation, such as DC operating point, transient, frequency domain and S-parameter analyses. The simulated output waveforms are easily viewed using the Wavescan waveform viewer incorporated in ADE.

#### 3.4.3 Layout design and verification

The next step in the design process is to move from schematic design to layout design. This step could only be taken once satisfactory results had been obtained from the circuit level SPICE-based simulations.
At this level, all the components used in the design, such as transistors, capacitors and resistors, are represented by the shapes and geometries of the actual structures formed during the fabrication process. The drawn shapes must adhere to the foundry design rules so that the foundry can ensure that the layout can be manufactured within its fabrication limits. These limits are set, for example, by the attainable lithography resolution. The layout is put through rigorous checks such as DRC, which verifies that the layout adheres to the foundry rules and limits. The second verification step, LVS, checks, as the name suggests, whether the drawn layout corresponds to the schematic it represents. This is an important verification step, since in large designs incorporating logic circuits a connection could easily be missed, resulting in a faulty IC. During LVS a netlist is extracted from the drawn geometries, giving a more accurate depiction of the actual circuit. This extracted netlist is then used in the post layout SPICE verification. The transmitter and the receiver were completely separated during layout, each with its own supply rails. Each was also enclosed in its own guard ring to ensure that the two subsystems do not interfere with each other. The digital CMOS blocks were also physically separated from the CML circuits to prevent additional switching noise from coupling into the CML circuits.

#### 3.5 SPICE models used in the design

The SPICE models used were provided by the foundry through MOSIS, which provides access to the fabrication of prototype and low volume ICs. Since both HBTs and regular MOS transistors are available in the process, the models for both are briefly described here. Detailed process parameters are not given due to the proprietary nature of such information.
The NPN transistor model used is the vertical bipolar inter-company (VBIC) model, which has built-in support for features including the following [53]:

- a parasitic vertical PNP transistor to the substrate,
- weak avalanche multiplication,
- a self-heating approximation $(dV_{be}/dT)$,
- fixed oxide capacitances for the emitter-base and collector-base junctions,
- quasi-saturation modelling, and
- improved Early effect modelling (compared with the standard Gummel-Poon model).

The MOS transistor model used is the BSIM3v3.2.4 intrinsic model, which includes the following features [53]:

- a user-defined, layout-dependent extrinsic substrate resistance term,
- the BSIM3v3 thermal and 1/f noise equations,
- modelling of impact ionisation, and
- device mismatch as a function of threshold voltage and mobility mismatch.

These models were provided by the foundry and are implemented as sub-circuits, utilising p-cells, in order to incorporate device mismatch and source/drain diffusion geometries in the SPICE simulation. Some simulations were also performed with the high-$f_T$ IBM 8HP SiGe BiCMOS process. More detail on the IBM 8HP process can be found in [59], [60].

## 3.6 Packaging and experimental measurement equipment

The IC dies were packaged in a 64-pin quad flat no-lead (QFN) package, since on-chip (probe) measurement equipment is not readily available and is difficult to use. A 64-pin package was used because this project formed part of a multi-project wafer (MPW) run that incorporated two other designs. The system itself uses only 21 pads: 13 for the transmitter (including 4 additional pins for improved testability) and 8 for the receiver. In total, 15 dies were received and packaged in the QFN package. A further 25 dies were left unpackaged in case on-chip measurement is needed. A photo of the 64-pin QFN package is shown in Figure 3.2.

Figure 3.2. Picture of a 64-pin QFN package as used in this research.
The outer dimensions of the 64-pin QFN package are 7 mm x 7 mm x 1.4 mm. A typical bond wire length is between 1 mm and 2 mm, which results in a typical bond wire inductance of between 1 nH and 2 nH. The bond pad capacitance can be determined from the IBM 7WL model guide for a bond pad of 114 $\mu$m x 114 $\mu$m. The chip parasitic components were fully modelled, together with a typical channel, as presented in Chapter 4.

One of the instruments available for the experimental testing of the IC is the Agilent E4440A PSA spectrum analyser, shown in Figure 3.3. The spectrum analyser is available at the CSIR and has a frequency range of 3 Hz to 26.5 GHz [61], [62]. It can be used to examine the ISI of the implemented system under random data transmission conditions, giving an indication of the resulting DDJ.

Figure 3.3. Photo of the Agilent E4440A PSA high performance spectrum analyser [61], [62].

Eye diagram analysis will be performed using a 12.5 GHz sampling oscilloscope, with the eye diagram and statistical information derived from the oscilloscope data. The high speed oscilloscope is also available at the CSIR.

## 3.7 Measurement setup

The measurement setup for the device under test (DUT) is shown in Figure 3.4. The packaged IC will be placed on a custom-made PCB for ease of testing. The experimental results obtained with this setup are presented in Chapter 7. Should the transmitter be fully functional, the different copper backplane channels will be connected using Sub-Miniature version A (SMA) connectors and cables. Each channel will have a different length, with SMA connectors on either side for interconnection. The transmitter and receiver of the same IC die will be used in a loop-back configuration, with the transmitted signals passed through the channel via the SMA connectors and back to the receiver.
The internal circuits are biased using external resistors (values specified in Appendix C) for easier testability and circuit fine-tuning. The initial design values were populated on the PCB for ease of comparison. The clock and data inputs are both high frequency inputs; hence SMA connectors are used to connect them to a function generator. The SMA connectors used are specified to have a -3 dB bandwidth of at least 5 GHz, although the achieved bandwidth depends strongly on the PCB implementation. The RESET pin is initially held low and is then taken high with a rising edge, remaining high for the duration of the test. The AUTO and EOC\_user inputs are specified in Appendix C. This setup aims to verify the serial link transceiver as well as the adaptation process at the data rate set by the clock input signal.

Figure 3.4. Measurement setup for the DUT.

The same die is used for both the transmitter and receiver by means of loop-back connections. The differentially transmitted data, <code>Dout+</code> and <code>Dout-</code>, are looped back to the differential input of the receiver, <code>Din+</code> and <code>Din-</code>. The data is sent to the receiver and, should the differential voltage exceed the set threshold value, pulses are generated on the two low frequency return paths, <code>SHIFT</code> and <code>ADJUST</code>, changing the transmitter state as necessary for adaptation.

The initial testing entails the main functionality test, in which the control inputs are forced to various values to move through all the states of the transmitter. The receiver is then connected to the transmitter and the generated pulses are monitored to confirm operation as designed. Following the success of the first two tests, speed tests will be performed on the system with the various copper channels attached. The results are based on eye diagrams, in order to acquire the statistical information located at the pulse edges of the eye.
This statistical information can then be used for DDJ analysis, for validation of the hypothesis, and for de-embedding the RJ components introduced by the function generators attached to CLKin and DATin. From the statistical information in the vertical eye crossing at the optimal sampling instant, a statistically optimal BER can be calculated based on quality factor theory. Quality factor theory is based on an SNR approach at the optimal sampling instant, using an optimal threshold voltage, which results in the optimal BER of the system.

## 3.8 Conclusion

Chapter 3 discussed the research methodology followed in this dissertation. A detailed overview of the research methodology was presented, followed by an in-depth look at the tools and technologies used during the design process. The SPICE models used during this research were presented, as well as the measurement equipment that will be used during the experimental testing of the fabricated IC.

## 4.1 Chapter organisation

This chapter describes the mathematical modelling and system design of the pilot signalling and peak detection method of adaptive FIR pre-emphasis for implementation in high speed serial links. Chapter 5 follows on from this chapter with detailed SPICE simulations. This chapter covers only the most important aspects of the design, since all other aspects are captured in the literature review presented in Chapter 2.

## 4.2 Mathematical modelling of channel response

The package parasitic components as well as the actual copper backplane channel have a band-limited frequency response that attenuates the high frequency components of the transmitted data signal. To overcome or alleviate the increased DDJ caused by the band-limited channel, the inverse of the channel response should ideally be applied to the signal before transmission. The band-limited channel response is characterised in this section.
#### 4.2.1 Package parasitic modelling

The package parasitic components, and hence the package response, depend strongly on the type of IC package used. Two high frequency IC packages were considered for validating the hypothesis: ball grid arrays (BGAs) and QFN packages. Wire-bonded QFN packages are preferred for high frequency applications below 40 GHz where cost is a consideration [63]. Wire bonding is also preferred over flip-chip packaging for its higher yield and robustness over varying temperatures [63]. The mounting of an IC die within a QFN package is shown in Figure 4.1.

Figure 4.1. Mounting of an IC die within a QFN package utilising bond wires.

The combination of the bonding wire, I/O pad capacitance and package capacitance introduces a pole in the frequency response, causing frequency-dependent distortion. A typical model for determining the package parasitics is illustrated in Figure 4.2. The model shown in Figure 4.2 also incorporates the mutual inductance caused by adjacent bonding wires, terminated with matching 50 $\Omega$ resistors.

Figure 4.2. Typical package parasitic modelling. Figure adapted from [63].

The inductance of a 1 mil (25.4 µm) thick gold bond wire is estimated as 1 nH/mm [63]. The I/O pad capacitance can be calculated from the physical size of the bond pad; it ranges between 45 fF and 55 fF for bond pad sizes from 75 $\mu$m x 75 $\mu$m to 115 $\mu$m x 115 $\mu$m [45]. The PCB capacitance to the ground plane was estimated to be of the same order of magnitude as the on-chip I/O pad capacitance [63]. The pole introduced in the frequency response depends not only on the reactive components but also on the termination and transmission line impedance. For this implementation, a 50 $\Omega$ system is used, and hence the transmitter is terminated with a 50 $\Omega$ resistor.
The transmitter is terminated on-chip to avoid unnecessary peaking in the frequency response. It is also important to note that the overall frequency response of the link will be worse than that of a single package, since the same parasitic components, with the same frequency response, are encountered again at the receiver. The package frequency response using a 2 mm long, 1 mil thick bond wire and 50 fF I/O pad and PCB-to-ground-plane capacitances is depicted in Figure 4.3.

Figure 4.3. Frequency response of a typical QFN package. Adapted from [63].

The frequency response shown exhibits a -3 dB cut-off frequency of around 8 GHz. This distortion becomes even more severe when both the transmitter and receiver chip parasitic components are taken into account.

#### 4.2.2 Copper backplane channel

As discussed in Chapter 2, the copper channel loss can be attributed to three factors, namely the skin effect, the dielectric loss and signal reflections. Signal reflections are hard to characterise [24], but careful transmission line design can eliminate most of them. These reflections are primarily caused by impedance mismatches and transmission line discontinuities. The remaining copper channel loss is thus the sum of the skin effect loss and the dielectric loss, and can be expressed as:

$$\alpha_{tot} = \alpha_c + \alpha_d \tag{4.1}$$

where $\alpha_c$ is the conductor loss due to the skin effect and $\alpha_d$ is the dielectric loss. The copper conductor loss, in dB per unit length, can be expressed as [64], [65]:

$$\alpha_c = 8.686 \frac{R_{skin}}{Z_o w} \tag{4.2}$$

where the factor 8.686 converts nepers to decibels,

$$R_{skin} = \frac{\sqrt{\pi f \mu \sigma}}{\sigma} \tag{4.3}$$

and $Z_o$ is the transmission line impedance, $w$ the transmission line width, $f$ the operating frequency, $\mu$ the magnetic permeability and $\sigma$ the conductivity of the transmission line. From Equations 4.2 and 4.3 it can be seen that the skin effect results in a frequency-dependent resistance proportional to the square root of the frequency.
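To make the square-root dependence concrete, Equations 4.2 and 4.3 can be evaluated numerically. The Python sketch below is illustrative only: the copper conductivity of 5.8 × 10^7 S/m and the free-space permeability are assumed textbook values, and the 50 $\Omega$, 2.25 mm wide line is simply the example geometry quoted for Figure 4.4.

```python
import math

MU_0 = 4e-7 * math.pi   # permeability of free space (H/m); copper assumed non-magnetic
SIGMA_CU = 5.8e7        # assumed conductivity of copper (S/m)


def skin_resistance(f_hz, mu=MU_0, sigma=SIGMA_CU):
    """Surface resistance R_skin of Equation 4.3, in ohms per square."""
    return math.sqrt(math.pi * f_hz * mu * sigma) / sigma


def conductor_loss_db_per_m(f_hz, z0_ohm, width_m):
    """Conductor loss alpha_c of Equation 4.2, in dB per metre of line."""
    return 8.686 * skin_resistance(f_hz) / (z0_ohm * width_m)


# Example line: 50 ohm, 2.25 mm wide (the geometry quoted for Figure 4.4).
for f in (1e9, 4e9, 16e9):
    loss = conductor_loss_db_per_m(f, 50.0, 2.25e-3)
    print(f"{f / 1e9:5.1f} GHz: {loss:.2f} dB/m")
```

Each fourfold increase in frequency doubles $\alpha_c$, which is the $\sqrt{f}$ behaviour stated above.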
The skin effect is the effect whereby current tends to flow closer to the surface of a conductor at higher frequencies, which increases the effective resistance of the copper conductor. As stated in Chapter 2, the dielectric loss is due to the delay in polarisation of the dielectric material when subjected to changing electric fields. It can be expressed as [65]:

$$\alpha_{d} = \frac{\pi f}{c} \frac{\left(\varepsilon_{eff} - 1\right)}{\sqrt{\varepsilon_{eff}}} \frac{\varepsilon_{r}}{\left(\varepsilon_{r} - 1\right)} \tan \delta_{t} \tag{4.4}$$

where $\varepsilon_{eff}$ is the effective dielectric constant, $\varepsilon_r$ is the relative dielectric constant of the substrate, $c$ is the speed of light and $\tan \delta_t$ is the loss tangent of the dielectric. The dielectric loss is directly proportional to frequency but, fortunately, only becomes comparable to the conductor loss at frequencies in excess of a few tens of GHz. Figure 4.4 illustrates an example in which the conductor loss dominates up to the frequency at which the dielectric loss becomes dominant. The parameters used for the simulation are as follows: an Alumina substrate with a dielectric constant of 3.43 (FR-4), a dielectric thickness of 1 mm, a conductor thickness of 35 $\mu$m, a conductor width of 2.25 mm (for a 50 $\Omega$ transmission line) and a loss tangent of 0.0005.

Figure 4.4. Channel loss contribution as a result of conductor loss and dielectric loss.

From Figure 4.4 it can be seen that the total loss is dominated by the conductor loss for frequencies below 10 GHz. Hence the dielectric loss is neglected in this research.

#### 4.2.3 Complete channel response

The complete channel response can be obtained by combining the copper channel loss with the package parasitic components. Electrically, the channel loss can be approximated as a first order low pass filter with a -3 dB cut-off frequency of 400 MHz. The modelled responses are shown in Figure 4.5.

Figure 4.5. Modelled response of a typical channel.
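The first order approximation can be turned into discrete channel coefficients of the kind used later in Equation 4.5. The Python sketch below is an illustration under stated assumptions, not the lumped-component model of Figure 4.5: it assumes a single-pole channel with the 400 MHz cut-off, a data rate of 5 Gb/s, sampling at the end of each bit period, and an arbitrary truncation to 12 coefficients.

```python
import math


def first_order_channel(f3db_hz, bit_rate_bps, n_coeffs):
    """Sampled single-bit response c_k of a single-pole low-pass channel.

    A unit pulse of width T charges the pole for T seconds and then decays
    exponentially. Sampling at the end of each bit period therefore gives
    c_k = (1 - a) * a**k, with a = exp(-T / tau) and tau = 1 / (2*pi*f3db).
    """
    tau = 1.0 / (2.0 * math.pi * f3db_hz)
    a = math.exp(-(1.0 / bit_rate_bps) / tau)
    return [(1.0 - a) * a ** k for k in range(n_coeffs)]


# Assumed example: 400 MHz cut-off, 5 Gb/s, truncated to 12 coefficients.
c = first_order_channel(400e6, 5e9, 12)
print("c_k        :", [round(ck, 3) for ck in c])
print("main cursor:", round(c[0], 3))
print("tail (ISI) :", round(sum(c[1:]), 3))
```

Under these assumptions the summed post-cursor tail is larger than the main cursor $c_0$, so a worst-case bit pattern would close the eye without compensation; this geometric decay is the long tail referred to in the impulse response discussion.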
The typical channel model has been derived using lumped components. The simulated response, with a loss of approximately 50 dB at 10 GHz as shown in Figure 4.5, compares well with popular FR-4 copper channels [16], [40]. The backplane channel conductors can be plated with gold to reduce the conductor loss term, but this results in a significant increase in implementation cost. The impulse response of the modelled channel is shown in Figure 4.6.

Figure 4.6. Impulse response of the modelled channel showing the long tail characteristic of the capacitive component [64].

The long tail illustrated in the impulse response of Figure 4.6 interferes directly with subsequently transmitted bits. This distortion is strongly data dependent and is discussed in the following sections.

## 4.3 Mathematical modelling of adaptive FIR filter implementation

The adaptive pre-emphasis technique employed in this research is the pilot signalling and peak detection method introduced in Chapter 2. As will become clear later in this chapter, this technique has a few significant advantages, including:

- a simpler receiver design,
- no need for accurate sampling at the receiver to determine the optimum filter tap coefficients,
- easily implementable control logic at the transmitter end, and
- no need for a high frequency return path.

These advantages stand in direct contrast to an LMS adaptation engine, which requires a complex receiver design incorporating accurate sampling for tap adaptation, and complex control logic in the transmitter to implement the LMS convergence engine. The LMS adaptation technique also requires a secondary high frequency return path for the master-slave architecture, complicating the discrete implementation of a chip with such an architecture. Furthermore, both high frequency paths must have similar frequency responses, which is inherently difficult to achieve.
The rest of this section is dedicated to a qualitative and quantitative discussion of the operation of the pilot signalling and peak detection method of adaptive FIR pre-emphasis on high speed serial links.

#### 4.3.1 Qualitative description of the implemented adaptation scheme

The pilot signalling and peak detection method is based on the principle that, by transmitting a pilot signal corresponding to a specific filter tap, the received peak voltage can be compared with an ideal received voltage [16], so that a decision can be made regarding that filter tap coefficient. The filter tap coefficient corresponding to the transmitted pilot signal can thus be adjusted until the received voltage equals the ideal voltage, which results in the optimal filter tap coefficient. Figure 4.7 illustrates the pilot signalling and peak detection process by means of a flowchart.

To start the adaptation process shown in Figure 4.7, all filter taps are initialised to zero. An iterative loop is then entered in which the filter taps are adjusted. The first filter tap is set to its maximum value, so that a value larger than the ideal received value will be detected at the receiver when its corresponding pilot signal is transmitted. It is important to ensure that the initial received pulse is always larger than the ideal value; this is ensured by initialising the filter coefficient to its maximum value before adjustment. The received value (expected to be larger than the ideal value) is compared with the ideal pre-set (or user-adjustable) value to produce an error value. The filter tap coefficient is then decreased until the detected peak value falls below the ideal value (error = ideal value - received value > 0).

Figure 4.7. Flow diagram of the pilot signalling and peak detection method of implementing adaptive FIR pre-emphasis. The first four pilot signals corresponding to the first four filter taps are illustrated on the right hand side.
A low frequency return path communicates with the transmitter to count down the tap coefficient value until an end-of-conversion (EOC) signal is received. The next filter tap is then initialised to its maximum, while the previous tap is kept at its determined value, and the same procedure is used to obtain its optimal coefficient. This process continues until all the optimal filter tap coefficients have been obtained.

#### 4.3.2 Quantitative description of the implemented adaptation scheme

The adaptive FIR filter attempts to realise the inverse of the channel frequency response. To find the optimal filter taps, the time domain characteristics are used. Assuming that the channel can be modelled as a linear time-invariant (LTI) system, both the channel and the filter can be represented by their impulse responses. The channel impulse response can be expressed as

$$I_{CHN}(n) = \sum_{k=0}^{N} c_k \delta(n-k) \tag{4.5}$$

where $c_k$ are the effective channel impulse response coefficients and $\delta$ is the unit impulse. For a low pass channel response, as is the case for copper backplane channels, the first channel coefficient will always be the largest. This is used in determining the first filter tap, which is subsequently used in determining the following tap, and the iterative process continues. The FIR filter can also be represented by its impulse response as

$$I_{FIR}(n) = \sum_{k=0}^{N} h_k \delta(n-k) \tag{4.6}$$

where $h_k$ are the filter tap coefficients and $\delta$ is the unit impulse. Both of these expressions are time domain equations. The total response is thus easily calculated as the convolution of the channel impulse response and the FIR filter impulse response:

$$y(n) = I_{CHN} \otimes I_{FIR} \tag{4.7}$$

where $I_{CHN}$ is the channel impulse response and $I_{FIR}$ is the FIR filter impulse response.
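Equation 4.7 is an ordinary discrete convolution. The minimal Python sketch below uses hypothetical integer taps and channel coefficients (chosen only so the arithmetic is exact) to show that each entry of the result collects the products $h_k c_{n-k}$, exactly as written out term by term in Equation 4.8.

```python
def convolve(x, y):
    """Full discrete linear convolution of two finite sequences (Equation 4.7)."""
    out = [0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj  # product x_i * y_j contributes to index i + j
    return out


# Hypothetical small integer values so every sum can be checked by hand.
h = [3, -2, 1, 0, 0, 0]  # FIR taps h_0 ... h_5
c = [5, 3, 2, 1, 0, 0]   # channel coefficients c_0 ... c_5

y = convolve(c, h)       # y(n) = I_CHN (x) I_FIR
for n in range(6):
    # Entry n is the sum of h_k * c_(n-k) for k = 0..n, as in Equation 4.8.
    expected = sum(h[k] * c[n - k] for k in range(n + 1))
    assert y[n] == expected
print(y[:6])
```

Because convolution is associative, convolving this result with a data sequence gives the same answer as convolving the three sequences in any grouping.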
Equation 4.7 fully characterises the received signal. The received sequence in terms of the FIR filter impulse response coefficients and the channel impulse response coefficients can be expressed as:

$$y(n) = \begin{bmatrix} h_0c_0, h_1c_0 + h_0c_1, h_2c_0 + h_1c_1 + h_0c_2, h_3c_0 + h_2c_1 + h_1c_2 + h_0c_3, \\ h_4c_0 + h_3c_1 + h_2c_2 + h_1c_3 + h_0c_4, h_5c_0 + h_4c_1 + h_3c_2 + h_2c_3 + h_1c_4 + h_0c_5 \end{bmatrix} \tag{4.8}$$

where $y(n)$ is the received sequence for the transmitted data sequence 100000, i.e. a single isolated bit. This data sequence results in only a single filter tap being active at a time. For a general FIR filter implementation, more than one filter tap will be active at a time. This is certainly the case in practice, since data is usually sent through an 8B/10B encoder, which ensures that there are no long runs of 0s or 1s (at most five consecutive identical bits). The final received data can thus easily be calculated by convolving the FIR filter impulse response, the channel impulse response and the specific data sequence:

$$\begin{aligned} y_{FINAL}(n) &= I_{CHN} \otimes I_{FIR} \otimes D_{sequence} \\ &= y(n) \otimes D_{sequence} \end{aligned} \tag{4.9}$$

where $y_{FINAL}$ is the final received data and $D_{sequence}$ is the transmitted data sequence. The data sequences used for the adaptation process can now be used to evaluate the pilot signalling and peak detection method of adaptive FIR pre-emphasis. Table V shows the training (pilot) signals used and the corresponding filter tap being adjusted.

TABLE V. PILOT SIGNALS USED AND THEIR CORRESPONDING FILTER TAPS.

| Pilot signal | Data sequence | Filter tap No. |
|--------------|---------------|----------------|
| 1 | 100000 | Tap 1 |
| 2 | 110000 | Tap 2 |
| 3 | 101000 | Tap 3 |
| 4 | 100100 | Tap 4 |
| 5 | 100010 | Tap 5 |
| 6 | 100001 | Tap 6 |

Upon adjustment, the following data sequences are received for the first three pilot signals transmitted.
It is important to note that the filter taps are initialised to a maximum value and decreased until the ideal value is reached. Filter taps that have not yet been adjusted remain inactive (coefficient of zero). The received data for the first three pilot signals can be expressed as:

$$\begin{aligned} y_{FINAL}(0) &= [h_0c_0] \\ y_{FINAL}(1) &= [h_0c_0, h_0c_0 + h_1c_0 + h_0c_1, h_1c_0 + h_0c_1] \\ &= [h_0c_0, h_0(c_0 + c_1) + h_1c_0, h_1c_0 + h_0c_1] \\ y_{FINAL}(2) &= \begin{bmatrix} h_0c_0, h_1c_0 + h_0c_1, h_0c_0 + h_2c_0 + h_1c_1 + h_0c_2, h_1c_0 + h_0c_1, \\ h_2c_0 + h_1c_1 + h_0c_2 \end{bmatrix} \\ &= \begin{bmatrix} h_0c_0, h_1c_0 + h_0c_1, h_0(c_0 + c_2) + h_2c_0 + h_1c_1, h_1c_0 + h_0c_1, \\ h_2c_0 + h_1c_1 + h_0c_2 \end{bmatrix} \end{aligned} \tag{4.10}$$

For the first pilot signal transmitted, filter taps $h_1$ to $h_5$ are inactive (coefficient = 0), and only $h_0$ is active, with its coefficient initialised to a maximum. Since the channel has a low pass response, the first channel impulse response coefficient $c_0$ is the largest. The maximum value of the filter taps should be chosen carefully such that the initial received value is larger than the user-set ideal value. The filter tap is then decreased by one least significant bit (LSB) at a time, with the same pilot signal retransmitted between decrements, until the first term falls below the ideal value. As soon as the received value falls below the ideal value, the EOC signal is sent, and the second filter tap is made active and initialised to a maximum. The second pilot signal is then transmitted.

For the second pilot signal transmitted, three pulses will be received at the receiver. The first pulse, determined solely by the first filter tap, results in a received value below the ideal value. The third pulse, $h_1c_0 + h_0c_1$, is now controlled only by the second filter tap (since $h_0$ is fixed) and will initially result in a value larger than the ideal value. Hence the filter tap will be decreased by one LSB.
The second pulse term, $h_0(c_0 + c_1) + h_1c_0$, however, combines the contributions of both the first and the second filter taps, resulting in a very large value that far exceeds the ideal value. From the second term, it can be expected by inspection that the second filter tap will become negative, thereby shortening the long tail of the channel impulse response. The third term will at some stage fall below the ideal value while the second term is still above it. The tap will thus continue to be decreased by one LSB until all pulses fall below the ideal value. Hence, for the second pilot signal, the most dominant term is the second pulse term, which is controlled solely by adjusting the second filter tap coefficient, since the first filter tap coefficient has been fixed. For the third pilot signal transmitted, the third term will again be dominant, and the corresponding filter tap can be adjusted until all values fall below the ideal threshold value at the receiver. Note that the ideal threshold value set at the receiver determines the vertical eye amplitude at the receiver; the vertical eye amplitude can therefore be altered by changing this threshold value. The received values for the fourth to sixth pilot signals are expressed in Equations 4.11 to 4.13.
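The adaptation procedure of Figure 4.7, as analysed through Equations 4.10 to 4.13, can be sketched in a few lines of Python. This is an idealised, noise-free sketch and not the implemented circuit: it assumes a perfect peak detector, a hypothetical single-pole channel (400 MHz cut-off at 5 Gb/s, 12 coefficients), and arbitrary illustrative values for the ideal threshold (0.5), the tap range (±2) and the LSB (1/32). Taps are 0-indexed, so `k = 0` corresponds to "Tap 1" in Table V.

```python
import math


def convolve(x, y):
    """Full discrete linear convolution of two finite sequences."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out


def first_order_channel(f3db_hz, bit_rate_bps, n_coeffs):
    """Hypothetical single-pole channel sampled once per bit: c_k = (1 - a) a^k."""
    a = math.exp(-2.0 * math.pi * f3db_hz / bit_rate_bps)
    return [(1.0 - a) * a ** k for k in range(n_coeffs)]


def pilot_signal(k, length):
    """Pilot for 0-indexed tap k as in Table V: 100000, 110000, 101000, ..."""
    seq = [0.0] * length
    seq[0] = 1.0
    seq[k] = 1.0  # for k = 0 this leaves the single isolated bit 100000
    return seq


def adapt_taps(c, n_taps=6, v_ideal=0.5, h_max=2.0, h_min=-2.0, lsb=1.0 / 32):
    """Idealised pilot-signalling / peak-detection adaptation.

    Taps start inactive (zero). Each tap in turn is set to h_max and lowered
    by one LSB per pilot transmission while the received peak is at or above
    v_ideal (an ADJUST pulse). Once the peak falls below v_ideal the tap is
    frozen (EOC) and the next tap is activated (SHIFT).
    """
    h = [0.0] * n_taps
    for k in range(n_taps):
        h[k] = h_max
        pilot = pilot_signal(k, n_taps)
        while True:
            received = convolve(convolve(h, c), pilot)
            if max(received) < v_ideal:
                break          # EOC: tap k has converged
            if h[k] - lsb < h_min:
                break          # guard: tap range exhausted
            h[k] -= lsb        # ADJUST: decrease by one LSB
    return h


channel = first_order_channel(400e6, 5e9, 12)
taps = adapt_taps(channel)
print("adapted taps:", taps)
```

For this single-pole channel the loop drives the second tap negative, cancelling the first post-cursor as anticipated by inspection of Equation 4.10, and leaves the remaining taps close to zero; a multi-pole backplane channel would be expected to require non-zero later taps.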
$$y_{FINAL}(3) = \begin{bmatrix} h_0c_0, h_1c_0 + h_0c_1, h_2c_0 + h_1c_1 + h_0c_2, h_3c_0 + h_2c_1 + h_1c_2 + h_0c_3 + h_0c_0, \\ h_1c_0 + h_0c_1, h_2c_0 + h_1c_1 + h_0c_2, h_3c_0 + h_2c_1 + h_1c_2 + h_0c_3 \end{bmatrix}$$ $$= \begin{bmatrix} h_0c_0, h_1c_0 + h_0c_1, h_2c_0 + h_1c_1 + h_0c_2, h_3c_0 + h_2c_1 + h_1c_2 + h_0(c_3 + c_0), \\ h_1c_0 + h_0c_1, h_2c_0 + h_1c_1 + h_0c_2, h_3c_0 + h_2c_1 + h_1c_2 + h_0c_3 \end{bmatrix}$$ $$(4.11)$$ $$y_{FINAL}(4) = \begin{bmatrix} h_0c_0, h_1c_0 + h_0c_1, h_2c_0 + h_1c_1 + h_0c_2, h_3c_0 + h_2c_1 + h_1c_2 + h_0c_3, \\ h_4c_0 + h_3c_1 + h_2c_2 + h_1c_3 + h_0c_4 + h_0c_0, h_1c_0 + h_0c_1, \\ h_2c_0 + h_1c_1 + h_0c_2, h_3c_0 + h_2c_1 + h_1c_2 + h_0c_3, \\ h_4c_0 + h_3c_1 + h_2c_2 + h_1c_3 + h_0c_4 \end{bmatrix}$$ $$= \begin{bmatrix} h_0c_0, h_1c_0 + h_0c_1, h_2c_0 + h_1c_1 + h_0c_2, h_3c_0 + h_2c_1 + h_1c_2 + h_0c_3, \\ h_4c_0 + h_3c_1 + h_2c_2 + h_1c_3 + h_0(c_4 + c_0), h_1c_0 + h_0c_1, \\ h_2c_0 + h_1c_1 + h_0c_2, h_3c_0 + h_2c_1 + h_1c_2 + h_0c_3, \\ h_4c_0 + h_3c_1 + h_2c_2 + h_1c_3 + h_0c_4 \end{bmatrix}$$ $$(4.12)$$ $$y_{FINAL}(5) = \begin{bmatrix} h_0c_0, h_1c_0 + h_0c_1, h_2c_0 + h_1c_1 + h_0c_2, h_3c_0 + h_2c_1 + h_1c_2 + h_0c_3, \\ h_4c_0 + h_3c_1 + h_2c_2 + h_1c_3 + h_0c_4, \\ h_5c_0 + h_4c_1 + h_3c_2 + h_2c_3 + h_1c_4 + h_0c_5 + h_0c_0, \\ h_1c_0 + h_0c_1, h_2c_0 + h_1c_1 + h_0c_2, h_3c_0 + h_2c_1 + h_1c_2 + h_0c_3, \\ h_4c_0 + h_3c_1 + h_2c_2 + h_1c_3 + h_0c_4, \\ h_5c_0 + h_4c_1 + h_3c_2 + h_2c_3 + h_1c_4 + h_0c_5 \end{bmatrix}$$ $$= \begin{bmatrix} h_0c_0, h_1c_0 + h_0c_1, h_2c_0 + h_1c_1 + h_0c_2, h_3c_0 + h_2c_1 + h_1c_2 + h_0c_3, \\ h_4c_0 + h_3c_1 + h_2c_2 + h_1c_3 + h_0c_4, \\ h_5c_0 + h_4c_1 + h_3c_2 + h_2c_3 + h_1c_4 + h_0(c_5 + c_0), \\ h_1c_0 + h_0c_1, h_2c_0 + h_1c_1 + h_0c_2, h_3c_0 + h_2c_1 + h_1c_2 + h_0c_3, \\ h_4c_0 + h_3c_1 + h_2c_2 + h_1c_3 + h_0c_4, \\ h_5c_0 + h_4c_1 + h_3c_2 + h_2c_3 + h_1c_4 + h_0c_5 \end{bmatrix}$$ $$(4.13)$$ #### 4.3.3 Mathematical simulation results In order to verify 
the operation of the pilot signalling and peak detection method of adaptive FIR pre-emphasis before implementation at circuit level, thorough mathematical simulations were run. These simulations are presented here. The most common method of testing a communication system for performance and data integrity is to inspect the eye diagram of the data at the receiver. The eye diagrams for FIR pre-emphasis depend strongly on the number of FIR filter taps implemented. The eye diagrams for the adaptive pre-emphasis technique used in this research are presented in Figure 4.8 and Figure 4.9.

Figure 4.8. Eye diagrams at the receiver with pre-emphasis applied. (a) 1-tap pre-emphasis applied. (b) 2-tap pre-emphasis applied. (c) 3-tap pre-emphasis applied. (d) 4-tap pre-emphasis applied.

As seen in Figure 4.8 and Figure 4.9, the eye diagram improves significantly with the implementation of the first four filter taps. This is because most of the frequency response shaping of the FIR filter is contributed by the first few filter taps. The later filter taps still improve the eye diagram by further minimising the distortion around the pulse edges. This distortion, as discussed in Chapter 2, is caused by the band-limited channel over which the data is transmitted. It can also be noted that there is no change in the eye diagram between using 4 and 5 filter taps, or between using 7 and 8 filter taps. The ideal values of the 5<sup>th</sup> and 8<sup>th</sup> filter taps are close to zero, resulting in no significant change to the frequency response, and hence no change to the signal in the time domain. These results can be compared with Figure 5.12 and Figure 5.14, which contain the SPICE simulation results.

Figure 4.9. Eye diagrams at the receiver with pre-emphasis applied. (a) 5-tap pre-emphasis applied. (b) 6-tap pre-emphasis applied. (c) 7-tap pre-emphasis applied. (d) 8-tap pre-emphasis applied.
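The trend of the eye opening with the number of taps can be approximated without generating a full eye diagram by peak-distortion analysis. This is a swapped-in technique, not the method used to produce Figures 4.8 and 4.9. The Python sketch below estimates the worst-case vertical opening from the sampled pulse response $p = h \otimes c$, assuming a hypothetical single-pole channel (400 MHz cut-off, 5 Gb/s, 12 coefficients) and a fixed peak transmit swing; the two-tap setting $h = [1, -a]$ is an idealised zero-forcing choice for that channel, not an adapted result from this work.

```python
import math


def convolve(x, y):
    """Full discrete linear convolution of two finite sequences."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out


def worst_case_eye_height(h, c):
    """Peak-distortion estimate of the (half) vertical eye opening.

    The taps are first scaled so that sum(|h|) = 1, i.e. the peak transmit
    swing is held fixed (an assumption). The largest sample of p = h (x) c is
    taken as the main cursor; every other sample is treated as ISI that can
    add with either polarity. A negative result means the eye is closed for
    the worst-case bit pattern.
    """
    scale = sum(abs(t) for t in h)
    p = convolve([t / scale for t in h], c)
    main = max(range(len(p)), key=lambda n: abs(p[n]))
    isi = sum(abs(v) for n, v in enumerate(p) if n != main)
    return abs(p[main]) - isi


# Hypothetical single-pole channel (400 MHz cut-off) sampled at 5 Gb/s.
a = math.exp(-2.0 * math.pi * 400e6 / 5e9)
c = [(1.0 - a) * a ** k for k in range(12)]

print("no pre-emphasis:", round(worst_case_eye_height([1.0], c), 3))
print("2-tap, h1 = -a :", round(worst_case_eye_height([1.0, -a], c), 3))
```

Without pre-emphasis the summed tail exceeds the main cursor and the estimate is negative (eye closed); cancelling the first post-cursor opens the eye at the cost of a smaller main cursor. A real backplane channel is not single-pole, so more post-cursors need cancelling, which is consistent with further taps continuing to improve the simulated eyes.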
A strong emphasis is placed in this research on minimising the jitter caused by the channel bandwidth limitation. The pulse edges shown in Figure 4.8 and Figure 4.9 contain DDJ introduced by the channel. The DDJ is simulated by examining the distribution of the pulse edges around the ideally located pulse edge. These results are captured in Figure 4.10 and Figure 4.11. The ideal pulse edge location in this instance is 50 ps, with the simulation performed for a system operating at a data rate of 10 Gb/s. From Figure 4.10 (a) and (b) it is seen that the jitter around the pulse edges is severe, with the jitter distribution exceeding a single pulse width of 100 ps. In the time domain, this closes the eye diagram at the receiver, as can also be seen in Figure 4.8 and Figure 4.9.

Figure 4.10. Timing deviation around the pulse edge (DDJ). The ideal pulse edge should be situated at 50 ps. (a) 1-tap pre-emphasis applied. (b) 2-tap pre-emphasis applied. (c) 3-tap pre-emphasis applied. (d) 4-tap pre-emphasis applied.

The DDJ around the pulse edges reduces significantly as the number of implemented filter taps increases, as seen from Figure 4.10 (c) through to Figure 4.11 (d). For eight implemented filter taps, the jitter around the pulse edges falls within 1.5 ps of the ideal pulse edge location. It is also seen that the jitter around the pulse edges follows a Dirac-delta distribution for each frequency component, as predicted by the body of knowledge presented in Chapter 2. The DDJ for six implemented FIR filter taps is within 7 ps of the ideally located pulse edge, which is less than 10 % of a pulse width (100 ps), a popular jitter requirement specification.

Figure 4.11. Timing deviation around the pulse edge (DDJ). The ideal pulse edge should be situated at 50 ps. (a) 5-tap pre-emphasis applied. (b) 6-tap pre-emphasis applied. (c) 7-tap pre-emphasis applied.
(d) 8-tap pre-emphasis applied.

The adaptation of the FIR filter taps plays out exactly as presented in Figure 4.7. The filter taps start at a maximum and decrease by one LSB with each iteration. When the ideal value is reached, the adaptation engine moves on to the next filter tap. This process continues until all the filter taps have converged to a final value. The filter tap convergence is illustrated in Figure 4.12.

Figure 4.12. Filter tap convergence from the initialised maximum value to the final ideal value. (a) Filter tap coefficient 1. (b) Filter tap coefficient 2. (c) Filter tap coefficient 3. (d) Filter tap coefficient 4. (e) Filter tap coefficient 5. (f) Filter tap coefficient 6. Filter taps 7 and 8 omitted.

Apart from the eye diagram at the far end of the link and the jitter around the pulse edges, which give a qualitative feel for the working of the system, it is also useful to look at the physical effect of the number of implemented FIR pre-emphasis taps on an arbitrary bit sequence. The transmitted bit sequence is 01010001, and the results are shown in Figure 4.13 and Figure 4.14.

Figure 4.13. Comparison of the effect of the number of filter taps on a predefined transmit sequence. (a) 1-tap pre-emphasis. (b) 2-tap pre-emphasis. (c) 3-tap pre-emphasis. (d) 4-tap pre-emphasis.

The number of FIR filter taps can now be chosen by trading off the eye diagram, the DDJ around the pulse edges and the data integrity at the receiver against the number of FIR filter taps. A good trade-off between data integrity and DDJ is obtained by utilising 4 to 6 filter taps. This research implemented 6 FIR filter taps to allow for a worse channel than the one used in these simulations.

Figure 4.14. Comparison of the effect of the number of filter taps on a predefined transmit sequence. (a) 5-tap pre-emphasis. (b) 6-tap pre-emphasis. (c) 7-tap pre-emphasis.
(d) 8-tap pre-emphasis.

As discussed in Chapter 2, the ideal magnitude frequency response after applying FIR pre-emphasis should be flat over the entire range of frequencies. However, since only a finite number of FIR filter taps can be implemented, the FIR pre-emphasis filter frequency response flattens off at higher frequencies. Hence, the combined magnitude response is not flat over all frequencies; what is achieved is an extension of the -3 dB cut-off frequency. This is illustrated in the simulated frequency responses presented in Figure 4.15.

Figure 4.15. Frequency response of the implemented FIR pre-emphasis. The dashed line represents the filter response, the dotted line the channel response and the solid line the combined response showing the extended -3 dB cut-off frequency.

The presented mathematical simulation results show a significant improvement in the data integrity and DDJ of the received data. The discussion now moves to the implementation level.

## 4.4 Bipolar CML versus MOS CML

### 4.4.1 Background

One of the major advantages of a BiCMOS process is that high speed, low-noise bipolar junction transistors (BJTs), or in this case SiGe HBTs, could result in a performance improvement under certain conditions. CMOS did take over from conventional Si BJTs for digital circuits in the early 1990s, but the wireless and high data rate wireline domains are another matter entirely [66], because of the far more stringent demands placed on the devices than in simple digital logic. The FIR filter design, incorporating CML, drives a large complex load, which includes the large I/O pad capacitance. Also, to compensate for the losses of the channel and the package parasitic components, the transmitter voltage swing must increase. This requires a high current drive, for which high-$\beta$ HBTs could provide an important improvement in the rise/fall time of the transmitted pulse.
This performance improvement depends strongly on the implementation at hand. This section covers the trade-offs involved in choosing between bipolar and MOS devices for the implementation of CML.

### 4.4.2 Propagation delay

Driving a complex load, especially a large capacitive load as is the case for this implementation, requires a large transistor aspect ratio to obtain acceptable propagation delay times (as well as rise/fall times). Figure 4.16 illustrates the propagation delay of a simple CMOS inverter and the relationship between the capacitive load and the transistor aspect ratio.

Figure 4.16. Propagation delay of a simple MOS inverter. Adapted from [67].

As seen from Figure 4.16, for a fixed load capacitance there exists an optimal transistor aspect ratio that results in a minimum propagation delay. Reference [67] reported a propagation delay of 19.1 ps for a load capacitance of 45 fF. Propagation delay itself is not a major concern, since the transmitted data stream is not synchronised with a clock signal; hence the propagation delay does not affect sampling during clock and data recovery. Propagation delay does, however, give a good indication of the relationship between transistor aspect ratio and rise/fall times, which, if too large, will completely close the eye diagram at the receiver. Rise and fall times contribute to the DDJ, but fortunately the component added by the transmitter itself is small compared with the channel contribution. Hence the rise and fall times can under most circumstances be ignored, although any improvement in this regard remains useful. The greatest disadvantage of MOS CML, also called $C^3$MOS logic [68], is the large aspect ratio required to steer enough current to produce the necessary swing across the 50 $\Omega$ source resistors.
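To give a feel for why the $C^3$MOS switching transistors become large, the sketch below estimates the tail current needed for a given output swing and the square-law aspect ratio needed to carry that current. It is a hand-calculation aid only, under stated assumptions: the square-law model ignores the short channel effects discussed later in this chapter, and the names `MU_N_COX` (300 µA/V²), the 0.2 V overdrive and the 300 mV swing are hypothetical illustration values, not IBM 7WL parameters. The "doubly terminated" case assumes the 50 Ω source resistor appears in parallel with a 50 Ω far-end termination.

```python
"""Hand estimate of C3MOS switch size (square-law model, hypothetical values)."""

MU_N_COX = 300e-6  # hypothetical NMOS mu_n*Cox (A/V^2), not a 7WL value


def tail_current_for_swing(swing_v: float, r_load_ohm: float) -> float:
    """Tail current that develops swing_v when fully steered into r_load_ohm."""
    return swing_v / r_load_ohm


def square_law_aspect_ratio(i_d_a: float, v_ov_v: float,
                            mu_cox: float = MU_N_COX) -> float:
    """W/L that carries i_d_a at overdrive v_ov_v: I_D = 0.5*mu*Cox*(W/L)*Vov^2."""
    return 2.0 * i_d_a / (mu_cox * v_ov_v ** 2)


if __name__ == "__main__":
    swing = 0.3   # 300 mV swing (illustrative)
    v_ov = 0.2    # overdrive when the switch carries the full tail current
    for label, r_load in (("50 ohm source resistor only", 50.0),
                          ("doubly terminated (50 || 50 ohm)", 25.0)):
        i_tail = tail_current_for_swing(swing, r_load)
        w_over_l = square_law_aspect_ratio(i_tail, v_ov)
        print(f"{label}: I_tail = {i_tail * 1e3:.1f} mA, W/L = {w_over_l:.0f}")
```

With these assumed numbers the first case needs 6 mA and a W/L of about 1000 (a width of roughly 180 µm at L = 0.18 µm), and halving the effective load doubles both; every filter tap needs two such devices.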
In an FIR structure in particular, this transistor size can dominate the rest of the circuit, since every filter tap requires two large NMOS transistors. A conventional FIR filter structure implementing CML is shown in Figure 4.17.

Figure 4.17. Conventional structure of an FIR filter implementing CML [67].

The FIR implementation structure shown in Figure 4.17 is easily made adaptable, since the FIR filter coefficients are set by the tail current of each transistor pair. Sign control is also easily attained by swapping the data at the inputs of the driving transistors. BiCMOS logic, on the other hand, uses bipolar devices for the current steering and MOS devices for the logic operation, which gives a best-of-both-worlds result. Bipolar devices are better suited to current steering because of their large $\beta$, but this comes at the cost of increased static power dissipation (finite base current). Fortunately, the base current is negligible because of the high $\beta$ of SiGe HBTs. The static power dissipation for bipolar CML (BCML) can be expressed as:

$$P = \sum_{k=0}^{N} I_{tap}(k) V_{DD} \tag{4.14}$$

where $P$ is the static power dissipation, $I_{tap}(k)$ is the $k^{th}$ filter tap current and $V_{DD}$ is the supply voltage. Reference [67] proposed a BiCMOS logic driver in which a CMOS inverter drives the BCML stage, avoiding unnecessary loading of the previous stage (usually a D-flip-flop). The inverter also reduces the propagation delay of the driver by utilising BiCMOS logic: the PMOS device drives the bipolar transistor in the ON state and the NMOS device removes the excess minority charge in the base region in the OFF state [69]. Hence, the CMOS inverter preceding the BCML driver approximates ideal driving conditions. This is achieved by scaling the aspect ratio of the CMOS inverter according to the input capacitance of the HBT.
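The tail-current weighting of Figure 4.17 and Equation 4.14 can be illustrated with a short behavioural sketch. It assumes that each tap steers its tail current $I_{tap}(k)$ into a shared load, that a negative coefficient (swapped data inputs) is represented as a negative current, and that bits map to ±1; the tap currents, the 50 Ω load and the 1.8 V supply used below are hypothetical illustration values. Because a tail current flows whichever side of the pair conducts, the static power uses the magnitude of each tap current.

```python
"""Behavioural sketch of a CML transversal FIR driver and its static power (Eq. 4.14)."""
from typing import List, Sequence


def fir_cml_output(bits: Sequence[int], tap_currents_a: Sequence[float],
                   r_load_ohm: float) -> List[float]:
    """Differential output voltage for every bit that has a full tap history.

    Tap k sees the bit delayed by k bit periods; a negative current models a
    tap whose data inputs are swapped (negative coefficient).
    """
    symbols = [1 if b else -1 for b in bits]  # map 0/1 to -1/+1
    n_taps = len(tap_currents_a)
    out = []
    for n in range(n_taps - 1, len(symbols)):
        i_diff = sum(i_k * symbols[n - k] for k, i_k in enumerate(tap_currents_a))
        out.append(i_diff * r_load_ohm)
    return out


def static_power_w(tap_currents_a: Sequence[float], vdd_v: float) -> float:
    """Eq. 4.14: every tail current flows from V_DD regardless of data or sign."""
    return vdd_v * sum(abs(i) for i in tap_currents_a)


if __name__ == "__main__":
    taps = [8e-3, -2e-3]  # hypothetical main tap plus one de-emphasis post-cursor
    print(fir_cml_output([0, 1, 1, 1, 0], taps, 50.0))
    print(static_power_w(taps, 1.8))
```

With these assumed values, the bits that follow a transition are driven to ±0.5 V while repeated bits settle at 0.3 V, which is the high-frequency boost pre-emphasis relies on; the static power is 1.8 V × 10 mA = 18 mW.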
Figure 4.18 shows the proposed BiCMOS logic driver suitable for FIR filter implementations [67].

Figure 4.18. BiCMOS logic driver for high speed serial links [67].

Since the preceding CMOS inverter approximates ideal driving conditions for the bipolar transistor, the propagation delay of the transmitter driver can be approximated (using conventional Gummel-Poon models) as:

$$\tau_{PD} \approx 0.69 \left( \frac{r_c + r_b}{1 + g_m r_e} C_{be} + r_b C_{bci} \left( 1 + \frac{g_m (r_c + R_C)}{1 + g_m r_e} \right) + N(r_c + R_C) (C_{bci} + C_{bcx} + C_{cs}) + R_C C_L \right) \tag{4.15}$$

where $r_c$, $r_b$ and $r_e$ are the parasitic collector, base and emitter resistances, $g_m$ is the half-circuit transistor transconductance, $R_C$ is the collector load resistance, $C_L$ is the load capacitance and $N$ is the number of filter taps. The capacitances $C_{be}$, $C_{bci}$, $C_{bcx}$ and $C_{cs}$ are defined in [31]. This equation is suitable for pencil-and-paper design, since it is based on the basic transistor model known to most designers. Figure 4.19 compares the propagation delay obtained from Equation 4.15 with that obtained through simulation.

Figure 4.19. Comparison of the propagation delay obtained using the Gummel-Poon model equations and simulation (using VBIC) [67].

Figure 4.19 shows that Equation 4.15 overestimates the propagation delay by about 50 %. This can be attributed to the fact that the conventional Gummel-Poon model used for hand calculations is not accurate for SiGe high-$f_T$ HBTs.

### 4.4.3 Rise and fall times

Although the propagation delay of the transmitter is not a critical measure, since it only delays the data bit stream, the rise and fall times of the transmitter can effectively close the eye diagram completely before transmission. When severe, propagation delay can also cause data bits to be lost in the transmission process, namely when it exceeds the pulse width of a single data bit.
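Because propagation delay is used here as an indicator of the achievable rise and fall times, it can help to evaluate Equation 4.15 term by term. The sketch below is a direct transcription of Equation 4.15; the parameter values in the example are hypothetical round numbers, not extracted IBM 7WL or 8HP model parameters, and given the roughly 50 % overestimate noted for Figure 4.19 the result should be read as a pessimistic hand estimate.

```python
"""Pencil-and-paper propagation delay of the BCML driver, transcribing Eq. 4.15."""


def tau_pd_eq_4_15(r_c: float, r_b: float, r_e: float, g_m: float,
                   R_C: float, C_L: float, C_be: float, C_bci: float,
                   C_bcx: float, C_cs: float, n_taps: int) -> float:
    """Return the Eq. 4.15 estimate in seconds (SI units throughout)."""
    degeneration = 1.0 + g_m * r_e
    t_be = (r_c + r_b) / degeneration * C_be                       # input (C_be) term
    t_miller = r_b * C_bci * (1.0 + g_m * (r_c + R_C) / degeneration)  # Miller term
    # N times the collector-side capacitance, i.e. the N tap devices on the output
    t_output = n_taps * (r_c + R_C) * (C_bci + C_bcx + C_cs)
    t_load = R_C * C_L                                             # external load term
    return 0.69 * (t_be + t_miller + t_output + t_load)


if __name__ == "__main__":
    fF = 1e-15
    # Hypothetical device and load values, for illustration only.
    params = dict(r_c=20.0, r_b=50.0, r_e=5.0, g_m=0.1, R_C=50.0, C_L=50 * fF,
                  C_be=30 * fF, C_bci=5 * fF, C_bcx=5 * fF, C_cs=5 * fF)
    for n in (1, 6):
        tau = tau_pd_eq_4_15(**params, n_taps=n)
        print(f"N = {n}: tau_PD ~ {tau * 1e12:.1f} ps")  # 4.4 ps and 8.0 ps here
```

Splitting the expression into named terms shows which element dominates; with these assumed values the $N$-dependent collector term is the largest contribution at six taps.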
The proposed BiCMOS logic driver of Figure 4.18 can be compared against a conventional MOS CML transmitter that is also driven by an ideal preceding CMOS inverter (ideal in the sense that the load presented by the CML driver is driven by a CMOS inverter with an optimal aspect ratio). The results of the rise and fall time simulations are presented in Table VI.

TABLE VI. BIPOLAR CML VERSUS C<sup>3</sup>MOS LOGIC RISE/FALL TIMES.

| | C<sup>3</sup>MOS | BCML | Optimal<br>C<sup>3</sup>MOS | Optimal<br>BCML |
|---|---|---|---|---|
| $C_L = 50$ fF<br>$I_{SS} = 2$ mA | 32.37 ps | 21.23 ps | 12.27 ps | 13.79 ps |
| $C_L = 70$ fF<br>$I_{SS} = 2$ mA | 32.37 ps | 30.96 ps | 13.28 ps | 14.96 ps |
| $C_L = 50$ fF<br>$I_{SS} = 5$ mA | 32.06 ps | 20.36 ps | 17.13 ps | 11.66 ps |
| $C_L = 70$ fF<br>$I_{SS} = 5$ mA | 50.85 ps | 21.23 ps | 18.06 ps | 12.79 ps |

The rise and fall times are compared while varying two conditions, namely the load capacitance (50 fF and 70 fF) and the tail current (2 mA and 5 mA). A typical load that such a transmitter should be able to drive is 50 fF, but this can increase to about 70 fF. The driving current, which establishes the output voltage swing, can range from a few microamperes to a few milliamperes. From Table VI a few key trends can be noted in the comparison between BCML and C<sup>3</sup>MOS. The first trend is that the rise/fall times of the BCML transmitter stay relatively constant, whereas those of the C<sup>3</sup>MOS transmitter vary more. This can be attributed to the fact that the MOS transistor current depends on both the W/L ratio and the overdrive voltage ($V_{ov}$).
Since the driving MOS transistor aspect ratio cannot be changed to provide an ideal rise and fall time for every tail current value, the actual rise and fall times in Table VI depend strongly on the overdrive voltage, hence the larger variation. This can be confirmed by considering the $g_m$ of a MOS transistor [31],

$$g_m = \mu C_{ox} \frac{W}{L} V_{ov} \tag{4.16}$$

where $g_m$ is the transistor transconductance, $W/L$ the transistor aspect ratio, $V_{ov}$ the overdrive voltage, $\mu$ the mobility and $C_{ox}$ the gate oxide capacitance per unit area. A reduction in current at a fixed $W/L$ requires a reduction in $V_{ov}$, which results in a linear reduction of $g_m$. The unity gain frequency of a MOS transistor can further be expressed as [31],

$$f_T = \frac{g_m}{2\pi \left(C_{gs} + C_{gd} + C_{gb}\right)} \tag{4.17}$$

where $C_{gs}$, $C_{gd}$ and $C_{gb}$ are the gate-source, gate-drain and gate-bulk capacitances. Therefore the linear reduction in $g_m$ results in a direct, linear reduction in $f_T$, effectively producing a slower MOS transistor. A few trade-offs must thus be considered in designing a fast $C^3$MOS logic driver with respect to the optimal transistor aspect ratio. As shown in Equation 4.16, $g_m$ is proportional to $W/L$, while the MOS transistor capacitances in Equation 4.17 are approximately proportional to the gate area $W \cdot L$. For a fixed $V_{ov}$, $f_T$ therefore scales approximately as $1/L^2$, so the gate length should be reduced to increase the transistor speed. Reducing the gate length, however, introduces short channel effects that limit the speed improvement. One of the most significant short channel effects affecting the ON/OFF switching time of a MOS transistor is velocity saturation: above a certain critical electric field, the drift velocity no longer increases in proportion to the field and saturates at the scattering-limited velocity.
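The two scaling arguments above (a lower current at fixed $W/L$ lowers $V_{ov}$, $g_m$ and $f_T$ together, while a shorter gate raises $f_T$ roughly as $1/L^2$) can be checked numerically with the long-channel square-law model. This is a minimal sketch under stated assumptions: `MU_N` and `C_OX` are hypothetical values, only $C_{gs} \approx \frac{2}{3} W L C_{ox}$ is kept in Equation 4.17, and velocity saturation is deliberately not modelled.

```python
"""Square-law illustration of Eqs. 4.16 and 4.17 (hypothetical process values)."""
import math

MU_N = 0.035   # hypothetical electron mobility (m^2/Vs)
C_OX = 8.6e-3  # hypothetical gate oxide capacitance per unit area (F/m^2)


def overdrive_for_current(i_d: float, w: float, l: float) -> float:
    """Square law I_D = 0.5 * mu * Cox * (W/L) * Vov^2, solved for Vov."""
    return math.sqrt(2.0 * i_d / (MU_N * C_OX * (w / l)))


def gm_eq_4_16(w: float, l: float, v_ov: float) -> float:
    """Eq. 4.16: g_m = mu * Cox * (W/L) * Vov."""
    return MU_N * C_OX * (w / l) * v_ov


def ft_eq_4_17(w: float, l: float, v_ov: float) -> float:
    """Eq. 4.17 keeping only C_gs ~ (2/3) W L Cox (C_gd and C_gb neglected)."""
    c_gs = (2.0 / 3.0) * w * l * C_OX
    return gm_eq_4_16(w, l, v_ov) / (2.0 * math.pi * c_gs)


if __name__ == "__main__":
    w, l = 100e-6, 0.18e-6
    for i_d in (4e-3, 2e-3):  # halve the tail current with W/L unchanged
        v_ov = overdrive_for_current(i_d, w, l)
        print(f"I_D = {i_d * 1e3:.0f} mA: Vov = {v_ov:.3f} V, "
              f"gm = {gm_eq_4_16(w, l, v_ov) * 1e3:.1f} mS, "
              f"fT = {ft_eq_4_17(w, l, v_ov) / 1e9:.1f} GHz")
```

In this model halving the current divides $V_{ov}$, $g_m$ and $f_T$ by $\sqrt{2}$, while halving both $W$ and $L$ at the same $V_{ov}$ multiplies $f_T$ by four, which is why minimum gate length is preferred despite the short channel effects.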
Fortunately, the advantages gained from a reduced gate length outweigh the disadvantages, which has driven MOS transistor scaling according to Moore's law. Hence, for each tail current a MOS transistor has a certain *W/L* ratio that gives the optimal speed under those conditions. Fingering the transistor further reduces its capacitance and increases its speed, but again only up to a certain point, beyond which the speed decreases.

The second trend noticed from Table VI is that the rise/fall times of the BCML driver reduce even further at a higher tail current for the same load capacitance. This is directly related to the unity gain frequency of the driving HBT, which for a BJT can be expressed as [31],

$$f_T = \frac{g_m}{2\pi (C_{\pi} + C_{\mu})} \tag{4.18}$$

where $C_{\pi}$ and $C_{\mu}$ are the base-emitter and base-collector capacitances respectively. Since $g_m$ of a BJT is directly proportional to the collector current, an increase in $f_T$ can be expected. In practice, however, the $f_T$ of a transistor falls off at high bias currents, together with the forward current gain. A typical fall-off of the unity gain frequency as a function of the bias current is shown in Figure 4.20.

Figure 4.20. Typical unity gain frequency falloff at high bias currents.

This decline can be attributed to high-level injection and the Kirk effect. Hence, for a given manufacturing process, the $f_T$ of the transistor is expected to peak at a certain bias current, as seen in Figure 4.20. This is strongly process dependent, since it depends on the carrier concentrations in the base, collector and emitter. The process dependency can be seen in the data presented in Table VII, which compares a low $f_T$ (IBM 7WL) and a high $f_T$ (IBM 8HP) SiGe BiCMOS process.

TABLE VII.
BIPOLAR CML RISE/FALL TIMES FOR A HIGH $f_T$ PROCESS VERSUS A LOW $f_T$ PROCESS.

| | Low $f_T$ process | High $f_T$ process |
|---|---|---|
| $C_L = 50$ fF<br>$I_{SS} = 2$ mA | 18 ps | 13.79 ps |
| $C_L = 50$ fF<br>$I_{SS} = 5$ mA | 24 ps | 11.66 ps |

The high $f_T$ process shown has a peak $f_T$ of 180 GHz at a bias current of approximately 5 mA, whereas the low $f_T$ process peaks at 60 GHz at a bias current of 600 $\mu$A. Hence, as the data in Table VII show, for the same load the rise and fall times of the low $f_T$ process increase at the higher bias current. This is in direct contrast to the high $f_T$ process, whose rise and fall times decrease at the higher bias current because its $f_T$ curve peaks around 5 mA. The high bias current in the low $f_T$ process could be accommodated by adding multiple emitters to the transistor, decreasing the current in each emitter and possibly increasing $f_T$. However, adding multiple emitters increases the base-emitter capacitance which, according to Equation 4.15, increases the propagation delay.

Comparing Table VI and Table VII presents some interesting results. The main trend is that the BiCMOS logic transmitter utilising a high $f_T$ process outperforms the conventional MOS CML transmitter driven by an optimally matched preceding CMOS inverter. The low $f_T$ process, in turn, is outperformed by the conventional optimal MOS CML transmitter when ease of implementation and flexibility at low operating current are considered.
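The peaking behaviour behind Table VII can be reproduced qualitatively from Equation 4.18 by writing $C_{\pi} = \tau_F g_m + C_{je}$ (diffusion plus junction capacitance) with $g_m = I_C/V_T$. Equation 4.18 alone only predicts $f_T$ rising towards a plateau, so the sketch below multiplies $\tau_F$ by a hypothetical factor $1 + (I_C/I_K)^2$ purely to mimic the high-current roll-off; this factor and all device values are illustrative assumptions, not a Kirk effect model or fitted 7WL/8HP data.

```python
"""Illustrative fT-versus-bias sweep built on Eq. 4.18 (toy parameters)."""
import math
from typing import List, Tuple

V_T = 0.02585  # thermal voltage at about 300 K (V)


def ft_eq_4_18(i_c: float, tau_f0: float, c_je: float, c_mu: float,
               i_k: float) -> float:
    """f_T = g_m / (2*pi*(C_pi + C_mu)) with C_pi = tau_F*g_m + C_je.

    tau_F is raised by the hypothetical factor (1 + (I_C/I_K)^2) to mimic the
    high-current roll-off; this factor is not a physical device model.
    """
    g_m = i_c / V_T
    tau_f = tau_f0 * (1.0 + (i_c / i_k) ** 2)
    c_pi = tau_f * g_m + c_je
    return g_m / (2.0 * math.pi * (c_pi + c_mu))


def sweep_ft(i_min: float, i_max: float, points: int,
             **device: float) -> List[Tuple[float, float]]:
    """Logarithmic collector-current sweep returning (I_C, f_T) pairs."""
    step = (i_max / i_min) ** (1.0 / (points - 1))
    curve = []
    for n in range(points):
        i_c = i_min * step ** n
        curve.append((i_c, ft_eq_4_18(i_c, **device)))
    return curve


if __name__ == "__main__":
    device = dict(tau_f0=0.6e-12, c_je=15e-15, c_mu=5e-15, i_k=8e-3)  # hypothetical
    curve = sweep_ft(0.1e-3, 20e-3, 200, **device)
    i_peak, ft_peak = max(curve, key=lambda p: p[1])
    print(f"peak fT ~ {ft_peak / 1e9:.0f} GHz at I_C ~ {i_peak * 1e3:.1f} mA")
```

In this toy model, lowering `i_k` moves the peak to a lower current and a lower $f_T$, which is qualitatively the difference between the low $f_T$ device peaking near 600 µA and the high $f_T$ device peaking near 5 mA.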
Cost considerations, together with the opportunity of a free MPW run, resulted in the choice of the low $f_T$ IBM 7WL BiCMOS process. Hence, all CML circuits implemented in this research use only MOS CML. The amplifiers in the receiver were also implemented using MOS transistors, resulting in a complete CMOS integration of the adaptive FIR filter transceiver presented in this dissertation.

## 4.5 Complete system integration

The complete system integration is shown in Figure 4.21. The different subsystems are discussed in detail in the following sections.

Figure 4.21. Final system level design of the proposed adaptive FIR filter pre-emphasis system.

The indicated CML multiplexer chooses between the actual data to be transmitted and the pilot signals generated by the pilot signal generator. The control logic in the transmitter controls the adaptation process and selects the corresponding pilot signal to be transmitted. The FIR pre-emphasis driver incorporates 6 filter taps, each adjustable by means of a current controlled DAC. The receiver amplifies the received pulses and compares their amplitude to a threshold value that the user can adjust by means of an external resistor. The necessary control pulses are generated and sent to the transmitter for filter tap updates.

## 4.6 Pilot signal generator

The pilot signal generator forms an integral part of the adaptive pre-emphasis technique employed. Its implementation is illustrated in Figure 4.22.

Figure 4.22. Pilot signal generator implementation.

The pilot signal generator utilises a read only memory (ROM) to store the separate pilot signal sequences. Depending on the value of the 3-bit counter, the corresponding pilot sequence is applied to the parallel load shift register. The parallel load shift register uses CMOS circuits for loading the new pilot signal, but fast CML circuits for the actual shifting.
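The interplay of the 3-bit counter, ROM and parallel load shift register can be summarised in a small behavioural sketch. It takes the ROM contents to be the six pilot patterns listed in the Figure 5.3 caption and the counter decoding from Table VIII; the pairing of ROM rows with caption order and the MSB-first shift direction are assumptions, and the class and function names are hypothetical.

```python
"""Behavioural sketch of the pilot signal generator (ROM + parallel-load shift register)."""
from typing import List, Optional, Tuple

# ROM rows in the order of the Figure 5.3 caption (row ordering is an assumption).
PILOT_ROM = ["100000", "110000", "101000", "100100", "100010", "100001"]


def decode_counter(c: int) -> Tuple[bool, Optional[int]]:
    """Map the 3-bit counter value C to (DONE, selected ROM row) per Table VIII."""
    if not 0 <= c <= 0b111:
        raise ValueError("counter is 3 bits wide")
    if c <= 0b101:        # Pilot 1..6 select ROM rows 0..5
        return True, c
    return False, None    # 110 = Normal operation, 111 = Initial state


class ParallelLoadShiftRegister:
    """Loaded in parallel (CMOS side), shifted out serially (CML side)."""

    def __init__(self, width: int = 6) -> None:
        self.bits = [0] * width

    def load(self, pattern: str) -> None:   # parallel load while SHIFT_IN is HIGH
        self.bits = [int(b) for b in pattern]

    def shift(self) -> int:                  # one bit per clock once SHIFT_IN is LOW
        out = self.bits[0]
        self.bits = self.bits[1:] + [0]
        return out


def pilot_burst(c: int) -> List[int]:
    """Serial bits transmitted for counter value c (empty when DONE is LOW)."""
    done, row = decode_counter(c)
    if not done:
        return []
    assert row is not None
    reg = ParallelLoadShiftRegister(len(PILOT_ROM[row]))
    reg.load(PILOT_ROM[row])
    return [reg.shift() for _ in range(len(PILOT_ROM[row]))]


if __name__ == "__main__":
    for c in range(8):
        print(f"C = {c:03b}: DONE = {decode_counter(c)[0]}, bits = {pilot_burst(c)}")
```

Keeping the decode separate from the register mirrors the split in the hardware, where only the serial shift path has to run at the full CML data rate.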
This enables the pilot signal generator to shift the applied data at rates exceeding 5 Gb/s. The differential output pilot signal is applied to the CML multiplexer, which chooses between the pilot signal and the actual data. The 3-bit counter, parallel load shift register and ROM are controlled by the CMOS logic according to a state machine. The state diagram of the CMOS control logic is shown in Figure 4.23. The control logic starts in an initial state in which no pilot signals are loaded into the shift register. The DONE output of the logic is LOW, indicating that real data is being transmitted at this stage. The state changes in the event of an EOC pulse (generated by the CMOS control logic).

Figure 4.23. Pilot signal generator state diagram. The states are controlled by the EOC signal.

The state changes to Pilot 1, where the first pilot signal is chosen from the ROM and the DONE signal goes HIGH, indicating to the CML multiplexer to switch to the generated pilot signals. The data is chosen by selecting a specific row in the ROM; the 3-bit counter therefore incorporates a 3-to-8 decoder for selecting a specific ROM row. The pilot signals are loaded while the SHIFT\_IN pulse is HIGH and shifted when it goes from HIGH to LOW. The complete state information is contained in Table VIII, which lists the state names, the state value (counter value C), the multiplexer control signal (DONE) and the selected ROM row.

TABLE VIII. STATE DESCRIPTION TABLE FOR THE PILOT SIGNAL GENERATOR.
| State name | C | DONE | ROM sel 1 | ROM sel 2 | ROM sel 3 | ROM sel 4 | ROM sel 5 | ROM sel 6 |
|---|---|---|---|---|---|---|---|---|
| Initial state | 111 | 0 | | | | | | |
| Pilot 1 | 000 | 1 | X | | | | | |
| Pilot 2 | 001 | 1 | | X | | | | |
| Pilot 3 | 010 | 1 | | | X | | | |
| Pilot 4 | 011 | 1 | | | | X | | |
| Pilot 5 | 100 | 1 | | | | | X | |
| Pilot 6 | 101 | 1 | | | | | | X |
| Normal operation | 110 | 0 | | | | | | |

The final state of the control logic is the Normal operation state, in which the multiplexer switches back to real data and no pilot signals are loaded or shifted. On a user-generated EOC pulse, the adaptation process starts over by forcing the control logic into the Initial state, which allows the filter taps to be reset for a repeat adaptation. The pilot signal generator SPICE simulation results are presented in Figure 5.2 and Figure 5.3.

## 4.7 Adaptive FIR pre-emphasis driver

The implementation of the first two filter taps is illustrated in Figure 4.24. The filter taps can assume both negative and positive values; a negative tap value is obtained by swapping the input data.

Figure 4.24. The adaptive FIR pre-emphasis driver showing only the first two filter taps.

The data and clock signals are initially passed through a CML D-flip-flop (DFF) for synchronisation. The CML exclusive-or (XOR) gates provide sign control (swapping of the input data) of the filter taps and are directly controlled by the most significant bit of the 7-bit counters. The six least significant bits set the magnitude of the filter tap. The structure shown in this figure is an implementation of the transversal structure discussed in Chapter 2. The taps are controlled using the TAP\_Adjust and TAP\_Active control signals generated by the CMOS control logic presented in the next section.
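The sign-magnitude use of each 7-bit tap counter can be sketched as follows. The convention that an MSB of 1 means a negative (data-swapping) tap is an assumption, since the text only states that the MSB drives the XOR gate; the functions are hypothetical helpers. The second function shows why an XOR in front of the CML pair is equivalent to a negative coefficient.

```python
"""Sketch of the 7-bit tap code: MSB drives the XOR sign control, 6 LSBs set the DAC."""


def decode_tap_code(code: int) -> int:
    """Signed tap weight in LSBs (assumes MSB = 1 means a negative tap)."""
    if not 0 <= code <= 0x7F:
        raise ValueError("tap code is 7 bits wide")
    sign_bit = (code >> 6) & 1
    magnitude = code & 0x3F  # 0..63 LSBs of DAC tail current
    return -magnitude if sign_bit else magnitude


def tap_contribution(data_bit: int, code: int) -> int:
    """Tap output (in LSB units) when the XOR gate swaps the data for a negative tap."""
    sign_bit = (code >> 6) & 1
    magnitude = code & 0x3F
    steered = data_bit ^ sign_bit  # XOR gate in front of the tap's CML pair
    return magnitude if steered else -magnitude


if __name__ == "__main__":
    for code in (0b0111111, 0b1000001):
        print(f"code {code:07b} -> weight {decode_tap_code(code)} LSB; "
              f"outputs for data 0/1: {tap_contribution(0, code)}, {tap_contribution(1, code)}")
```

Swapping the data at the pair input produces exactly the same output current as negating the weight, so only the magnitude bits need to reach the current DAC.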
The delayed data from the first CML DFF is used as the input to the CML DFF of the second filter tap. The contributions of the different filter taps are summed in the 50 $\Omega$ termination resistors to form the output signal. The DACs are also reset according to the state of the control logic.

## 4.8 Control logic design

The CMOS control logic is designed to control the adaptive FIR pre-emphasis driver. Its implementation is illustrated in Figure 4.25.

Figure 4.25. Control logic implementation.

The control logic is responsible for controlling the counting and adaptation of the FIR filter. The input signals are coupled directly from the receiver, but can also be controlled manually by the user. Two additional input signals, AUTO\_IN and EOC\_user, have been added to ease experimental testing; they give the user full control over the transmitter. The EOC\_detector converts the SHIFT\_IN input signal into an EOC pulse. The pulse is generated when the SHIFT\_IN pulse was initially present but has not repeated for a certain amount of time. This timeout is set by charging and discharging an on-chip capacitor and corresponds to a number of filter tap adaptation iterations. The counter keeps track of the number of EOC pulses generated and controls the transmitter according to the counter value. The different filter taps are adjusted and activated depending on the current state. The state diagram for the CMOS control logic is presented in Figure 4.26.

Figure 4.26. Control logic state diagram. The states are controlled by the EOC signal generated from the SHIFT\_IN signal.

The state changes only when an EOC pulse has been generated: the system stays in a specific state until an EOC pulse is received and the counter is incremented.
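The EOC timeout and the tap-by-tap state sequence of Figure 4.26 (tabulated in Table IX) can be combined into one behavioural loop. In this sketch the receiver comparator is abstracted as a hypothetical callback `receiver_adjust`, which returns True when the receiver would issue ADJUST and SHIFT pulses for the tap under adaptation; the exact comparator criterion, tap signs and analogue timing are not modelled. The default timeout of 5 iterations follows the EOC behaviour reported in Section 5.3.

```python
"""Behavioural sketch of the EOC-driven, tap-by-tap adaptation (Table IX sequence)."""
from typing import Callable, List, Sequence


def adapt_taps(receiver_adjust: Callable[[Sequence[int], int], bool],
               n_taps: int = 6, max_code: int = 63, eoc_timeout: int = 5,
               max_iterations: int = 10_000) -> List[int]:
    """Return the final tap magnitudes.

    receiver_adjust(active_taps, k) is a hypothetical stand-in for the receiver
    comparator; each loop pass represents one pilot retransmission interval.
    """
    taps = [max_code] * n_taps  # Initial state: all DACs reset to maximum
    k = 0                       # State 1 adjusts tap index 0
    idle = 0                    # passes since the last SHIFT_IN pulse
    for _ in range(max_iterations):
        if k == n_taps:         # Normal operation: all taps active, DONE LOW
            return taps
        active = taps[: k + 1]  # taps 1..k+1 active, only tap k adjustable
        if taps[k] > 0 and receiver_adjust(active, k):
            taps[k] -= 1        # ADJUST pulse: decrease by one LSB
            idle = 0            # SHIFT pulse retransmits the pilot, restarting the timer
        else:
            idle += 1
            if idle >= eoc_timeout:  # EOC detector times out: move to the next state
                k += 1
                idle = 0
    raise RuntimeError("adaptation did not complete")


if __name__ == "__main__":
    target = [40, 12, 5, 2, 1, 0]  # hypothetical stopping points for a toy receiver
    print(adapt_taps(lambda active, k: active[k] > target[k]))
```

With the toy receiver shown, each tap walks down from 63 until the callback stops requesting adjustments, after which the timeout advances the state, giving `[40, 12, 5, 2, 1, 0]`.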
The control logic starts with an initialisation stage in which only the first filter tap is active and all the DACs are reset to their maximum value. Upon the initial EOC pulse (generated by the user), the state changes to State 1, where the first filter tap is active and can be adjusted. Upon the next EOC pulse, the state changes to State 2, in which the second filter tap is activated (in addition to the first) but only the second filter tap can be adjusted. Filter tap activation and adjustment can only occur while the DONE signal, generated by the pilot signal generator, is HIGH, indicating that the adaptation process is in progress. The full specification of each state is presented in Table IX, where the Active column lists the active filter taps and an X marks the filter tap that can be adjusted.

TABLE IX. STATE DESCRIPTION TABLE FOR THE CMOS CONTROL LOGIC.

| State name | C | Active | Tap 1 | Tap 2 | Tap 3 | Tap 4 | Tap 5 | Tap 6 |
|---|---|---|---|---|---|---|---|---|
| Initial state | 111 | 1 | | | | | | |
| State 1 | 000 | 1 | X | | | | | |
| State 2 | 001 | 1-2 | | X | | | | |
| State 3 | 010 | 1-3 | | | X | | | |
| State 4 | 011 | 1-4 | | | | X | | |
| State 5 | 100 | 1-5 | | | | | X | |
| State 6 | 101 | 1-6 | | | | | | X |
| Normal operation | 110 | 1-6 | | | | | | |

The adaptation process continues until all the filter taps have been adjusted. After the last filter tap has been adjusted, the EOC pulse generator moves the state to Normal operation. In this state the multiplexer selects real data instead of the pilot signals, and all the filter taps are active for 6-tap FIR pre-emphasis. The DONE signal also goes LOW, indicating that the adaptation process is over.

## 4.9 Receiver design

The complete receiver design is illustrated in Figure 4.27. The receiver uses DC blocking capacitors to AC-couple the received signal to the differential amplifier.
The differential amplifier boosts the level of the input AC signal in order to reduce erroneous switching caused by a noisy threshold voltage at the comparators.

Figure 4.27. Complete receiver implementation.

The receiver generates control signals from the received pulses depending on the threshold level. The threshold voltage is externally adjustable by means of a resistor, which allows the user to set the acceptable vertical eye amplitude at the receiver. The differential signal is split into two single ended streams before passing through the comparators, which switch depending on the received pulse amplitude. When a comparator switches, the monostable multivibrators produce a pulse that is sent to the transmitter. Two pulses are generated, namely ADJUST\_out and SHIFT\_out. The first adjusts the filter tap in the transmitter by one LSB per pulse. The second causes the pilot signal generator to transmit the corresponding pilot signal stream again for further tap adjustments. The ADJUST\_out and SHIFT\_out signals are shown in Figure 4.28.

Figure 4.28. Relationship between the ADJUST\_out and SHIFT\_out signal pulses. (a) Shows the ADJUST\_out pulse while (b) shows the SHIFT\_out pulse. The pulse width is approximately 10 – 15 ns.

The SHIFT\_out pulse always follows the ADJUST\_out pulse to ensure that the next transmitted pilot signal is based on the updated filter tap. The pulse width of the control signals is chosen such that the CMOS logic in the transmitter has sufficient time to switch. Note that the implemented adaptation technique contains no peak detector. A differential amplifier is used instead, since the adaptation method dictates that the largest received component will always be at the term of interest, while all preceding components will be below the ideal received value.
Hence, a peak detector, or alternatively an RMS detector as presented in [70], [71], [72], is not required for the adaptation method.

## 4.10 Conclusion

This chapter presented detailed mathematical and system design considerations and methods. The pilot signalling and peak detection method of adaptive FIR pre-emphasis was simulated using MATLAB, and the complete MATLAB script can be found in Appendix A. The mathematical simulation results indicated a large reduction in the DDJ around the pulse edges of the received data when the adaptive FIR pre-emphasis method is implemented. The simulated jitter was below 10 % of a pulse width at 10 Gb/s. The number of filter taps for the IC implementation was set to six, since a worse channel than the one simulated may be encountered. Conventional MOS CML and BCML were compared in order to establish criteria for choosing between the two types of implementation. High $f_T$ HBTs showed an improvement in the rise and fall times of the pre-emphasis transmitter over conventional MOS CML. However, since the lower $f_T$ IBM 7WL process was used for the implementation of the adaptive pre-emphasis system, conventional MOS CML was chosen: in the lower operating current range, the advantages of MOS CML far outweigh the small gain in speed achieved with BCML. The system-level implementation of the entire system was also presented. The transmitter was divided into three main subsystems, namely the pilot signal generator, the control logic and the FIR pre-emphasis transmitter, each of which was further broken down into its constituent blocks for clarity. The receiver, responsible for generating the control signals for the FIR adaptation in the transmitter, was also discussed. The following chapter contains SPICE simulation results of the implemented adaptive FIR pre-emphasis filter.
The SPICE simulations make use of p-cells for improved agreement between the circuit models and the actual layout.

# **CHAPTER 5: SIMULATION RESULTS**

## 5.1 Introduction

In order to verify the mathematical simulations as well as the system design introduced in Chapter 4, SPICE simulations were carried out. This chapter contains the SPICE simulation results obtained with the Cadence Virtuoso software suite. The simulations were carried out with p-cell instances using the Spectre RF simulation engine.

## 5.2 Pilot signal generator

The pilot signal generator is a key component in the design of, and research into, the adaptive FIR pre-emphasis technique employed. This section contains the simulation results for pilot signal generator verification and operation. A common problem in high frequency implementations is coupling an external clock signal onto the chip: the package parasitic components distort the input signal, degrading the quality of the clock signal. Furthermore, clocked CML circuits require a differential clock for operation. The differential clock in this research was created by converting a single ended clock input into a fully differential signal. This was done using a super buffer structure with a scaling factor of 2.5, which is close to the ideal value of 2.7 predicted in [69]. The differential clock signal created from a single ended clock input is illustrated in Figure 5.1. This differential clock signal was used in all simulations requiring a clock signal. Although this degrades the performance of the circuits because of clock noise and finite rise and fall times, it results in more accurate simulations. The created clock signal is an accurate representation of the expected on-chip clock signal, since all effects such as the package parasitic components and clock input termination were included in the simulation.

Figure 5.1.
Differential clock signal created with a super buffer structure with a scaling factor of 2.5.

The generated pilot signals need to be properly controlled according to the state diagram presented in Figure 4.23. The control signals as well as the generated pilot signals are shown in Figure 5.2. The pilot signal generator contains a reset input for resetting all the CMOS logic implementing the state diagram. The reset input (a) is set to go HIGH at 5 ns, after which the CMOS logic is reset and assumes the correct values. The SHIFT\_IN input (b) controls the load and shift operations of the pilot signal generator. As seen, the DONE signal (d) has to be HIGH in order for the pilot signal to be loaded and shifted; as soon as DONE goes HIGH, the SHIFT\_IN signal controls the operation of the pilot signal generator. The DONE signal (d) is controlled by the EOC signal (c), which marks either the start of a new adaptation process or the end of a filter tap adaptation. With each new EOC pulse, the state changes according to Figure 4.23 and a new pulse sequence is chosen and transmitted on a HIGH to LOW transition of the SHIFT\_IN input. After the last EOC pulse of an adaptation process has been received, the pilot signal generator no longer loads or shifts any values, since it is now in the Normal operation state depicted in Figure 4.23. Two additional EOC pulses are necessary to retrain the filter taps: the first pulse resets all the DACs, while the second deactivates all the filter taps except the first, for the start of a new adaptation process. The six pilot signals corresponding to the six FIR filter taps are illustrated in Figure 5.3. The noise present in the CML signals is due to the differential clock having a finite rise and fall time.
During this finite switching time, both transistors of each clocked differential pair conduct significant portions of the tail current simultaneously, causing the noise shown.

Figure 5.3. The six pilot signals corresponding to the six FIR filter taps. Top left: 100000. Top right: 110000. Middle left: 101000. Middle right: 100100. Bottom left: 100010. Bottom right: 100001.

To reduce the effect of the non-ideal differential clock, the clock transistors were made smaller, and hence less sensitive to voltage variations. This resulted in a worst case eye opening of 310 mV for the pilot signals. All CML circuits in the design were developed for a voltage swing of approximately 300 mV; hence, the pilot signal generator output is directly compatible with the following CML gates.

## 5.3 Control logic

The EOC pulses control the states of both the pilot signal generator's control logic and the FIR pre-emphasis driver's control logic. The EOC pulses are derived from the received SHIFT\_IN pulses, as discussed in Chapter 4. Figure 5.4 illustrates the generation of the EOC pulse from the SHIFT\_IN pulses, while Figure 5.5 shows the EOC pulse generated from the user input.

Figure 5.4. EOC pulse generated when the SHIFT\_IN pulses, initially present, stop (indicating that an EOC pulse should be generated). (a) SHIFT\_IN pulse. (b) EOC pulse generated after a delay.

The EOC pulse is generated approximately 5 iterations after the last SHIFT\_IN pulse has been received, and then controls the state of the implemented transmitter. Figure 5.4 was generated in automatic mode (AUTO = 1), while Figure 5.5 was generated in manual mode (AUTO = 0). In manual mode, the EOC pulses are not generated from the SHIFT\_IN pulses but are controlled only by the user. The user also has full access to the SHIFT\_IN and ADJUST pins, so full manual control is possible when AUTO is reset.

Figure 5.5.
EOC pulse generated from the user input for manual control. (a) User input for EOC pulse. (b) Generated EOC pulse for state control.

The EOC pulses control the states of the logic for the FIR adaptation process, as illustrated by the simulated control sequences in Figure 5.6 and Figure 5.7. The EOC pulses in Figure 5.6 (b) determine which FIR filter taps are active during the adaptation process and afterwards. As shown in Figure 5.6 (d) to (i), the FIR filter taps turn on one after the other with each new EOC pulse. The filter taps then stay activated until the counter is reset by a rising edge (a), after which only the first filter tap remains active. The DONE signal in relation to the tap activation process is shown in Figure 5.6 (c).

Figure 5.7 shows that the FIR filter tap adjustment pulses adjust only the filter tap corresponding to the current state. The filter taps are adjusted by means of ADJUST pulses (a), and the state changes with every EOC pulse (b) received. The DAC is also reset by a generated rising edge (c). The FIR filter tap adjustments are shown in Figure 5.7 (d) through (i). The state diagram for the FIR pre-emphasis adaptation is presented in Figure 4.26.

## 5.4 Receiver

The receiver, as shown in Figure 4.27, contains a differential amplifier preceding the comparator that compares the received pulses to a user-adjustable threshold voltage. The differential amplifier was characterised in order to determine the resistor values that control the eye amplitude at the receiver. Figure 5.8 illustrates the differential amplifier gain and the single ended output voltage versus the applied differential input pulse amplitude.

Figure 5.8. Receiver amplifier characteristic. (a) Differential amplifier gain versus differential input voltage. (b) Single ended branch output voltage versus differential input voltage.

The gain in Figure 5.8 (a) is non-linear with increasing input voltage, so the single ended output voltage can also be expected to be non-linear. The non-linearity is caused by the tail current source of the amplifier being forced out of the saturation region and into the triode region. This non-linearity does not present a problem, however, since low amplitudes are amplified sufficiently for detection and comparison, whereas high amplitudes (which do not need amplification) can already be detected with minimal gain.

The internal threshold voltage is set directly by an external resistor, which forms a voltage divider with an internal resistor. The threshold voltage is applied to the comparator inputs of the pulse generation circuit. The change in internal threshold voltage with external resistor value is shown in Figure 5.9.

Figure 5.9. Adjustment of the internal threshold voltage using an external resistor.

Combining the differential amplifier characteristic with the dependence of the threshold voltage on the external resistor value yields the graph of differential input voltage versus resistor value shown in Figure 5.10.

Figure 5.10. Graph for choosing an external resistor to control the eye diagram amplitude at the receiver.

The differential input voltage to the amplifier is the received eye diagram signal. Figure 5.10 illustrates that, for any specific external resistor value, the differential input voltage amplitude can be controlled according to the non-linear curve. The resistor value can thus be chosen to control the eye amplitude. While the adaptation process is underway (DONE = 1), the control pulses should be generated as specified in Chapter 4. Figure 5.11 illustrates the ADJUST and SHIFT\_IN pulses generated when the differential input voltage is above the threshold voltage for two consecutive iterations.
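The handshake between the receiver pulses (Section 5.4) and the EOC generation in the transmitter (Section 5.3) can be summarised in a small behavioural model. This is a sketch under stated assumptions, not the implemented circuit: one "iteration" is treated as one pilot-pulse period, the receiver is assumed to keep issuing ADJUST and SHIFT\_IN every iteration once the amplitude has exceeded the threshold for two consecutive iterations, and the EOC timeout of 5 iterations is the approximate value observed in Figure 5.4. The function names `receiver_pulses` and `eoc_iterations` are hypothetical.

```python
"""Hypothetical behavioural model of the ADJUST/SHIFT_IN/EOC handshake.

Assumptions (not taken from the circuit): one iteration = one pilot-pulse
period; pulses repeat every iteration while the amplitude stays above the
threshold; EOC_TIMEOUT = 5 is the approximate delay seen in simulation.
"""

EOC_TIMEOUT = 5  # iterations without SHIFT_IN before an EOC pulse (approximate)


def receiver_pulses(amplitudes, threshold, consecutive=2):
    """Return, per iteration, the pulses the receiver sends back.

    ADJUST is listed before SHIFT_IN because it is sent first, so that the
    next iteration uses the updated tap value.
    """
    events, run = [], 0
    for amplitude in amplitudes:
        run = run + 1 if amplitude > threshold else 0
        events.append(("ADJUST", "SHIFT_IN") if run >= consecutive else ())
    return events


def eoc_iterations(shift_in, auto=True, user_eoc=(), timeout=EOC_TIMEOUT):
    """Return the iterations at which an EOC pulse is generated.

    AUTO = 1: an EOC pulse follows `timeout` iterations without SHIFT_IN,
    but only after at least one SHIFT_IN pulse has been seen.
    AUTO = 0: EOC pulses come only from the user input.
    """
    if not auto:
        return sorted(user_eoc)
    eocs, armed, idle = [], False, 0
    for n, pulse in enumerate(shift_in):
        if pulse:
            armed, idle = True, 0
        elif armed:
            idle += 1
            if idle == timeout:
                eocs.append(n)
                armed = False  # wait for the next SHIFT_IN before re-arming
    return eocs


if __name__ == "__main__":
    # Hypothetical received amplitudes (V) for one tap, threshold of 40 mV
    amps = [0.05, 0.05, 0.05, 0.03] + [0.02] * 6
    events = receiver_pulses(amps, threshold=0.04)
    shift_in = [bool(e) for e in events]
    print(events[:4])
    print("EOC at iteration(s):", eoc_iterations(shift_in))
```

In this model the tap under adaptation keeps being adjusted while the received amplitude stays above the threshold; once it drops below, SHIFT\_IN stops and the timeout produces the EOC pulse that moves the transmitter to the next state.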

Figure 5.11. ADJUST and SHIFT\_IN pulses generated from the differential input voltage.

The resistor was chosen to be 150 $\Omega$, so, according to Figure 5.10, the eye amplitude is controlled at approximately 40 mV. The widths of the generated pulses vary initially, since the capacitor used in the monostable multivibrator is fully charged after the long idle time before adaptation. Once the adaptation process is underway, the pulse widths shorten and settle at a value that depends on the charge and discharge rates of the capacitors. The pulse generation circuit settles at pulse widths of approximately 12 ns, as shown in Figure 5.11. The ADJUST pulse is sent first to ensure that the next iteration of the adaptation process uses the new tap value. The SHIFT\_IN pulse is sent only after the ADJUST pulse to ensure that, for each training pulse received from the transmitter, only one pulse can be sent to the receiver.

## 5.5 Adaptive FIR pre-emphasis driver

The simulated eye diagram at the receiver without pre-emphasis (1-tap pre-emphasis) is illustrated in Figure 5.12. The eye diagram is completely closed, resulting in total loss of data if no receiver equalisation is implemented, as is the case in this research. The circuit simulations use NRZ signalling at a data rate of 5 Gb/s.

Figure 5.12. Eye diagram at the receiver with no pre-emphasis filtering (1-tap pre-emphasis) applied.

With the application of 6-tap FIR pre-emphasis, the eye diagram improves dramatically. The improved eye diagram was adapted for an eye amplitude of 30 mV at the receiver using a 350 $\Omega$ threshold resistor. The simulated channel compares well with a 30" FR-4 channel, which can be regarded as a worst-case channel. The improved eye diagrams at the receiver, for 2 up to 6 FIR filter taps, are illustrated in Figure 5.13.
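To illustrate why a negative second tap reduces the low-frequency amplitude while boosting transitions, the sketch below applies a symbol-spaced FIR pre-emphasis filter to NRZ data and evaluates its magnitude response $|H(f)| = |\sum_k c_k e^{-j2\pi f k T}|$, with $T$ the unit interval, at DC and at the Nyquist frequency (2.5 GHz for 5 Gb/s NRZ). The two-tap coefficients $[1, -0.3]$ are hypothetical and are not the adapted values of this design; with a negative post-cursor tap the filter is high pass, with a DC gain of $1 - 0.3 = 0.7$ and a Nyquist gain of $1 + 0.3 = 1.3$.

```python
"""Sketch: symbol-spaced FIR pre-emphasis on NRZ data (hypothetical taps)."""
import cmath

BIT_RATE = 5e9            # 5 Gb/s NRZ, as used in the circuit simulations
UI = 1.0 / BIT_RATE       # unit interval: 200 ps


def fir_preemphasis(bits, taps):
    """Map bits {0, 1} to NRZ levels {-1, +1} and apply y[n] = sum_k c_k * x[n-k].

    taps[0] is the main (cursor) tap and taps[1:] are post-cursor taps;
    symbols before the start of the sequence are taken as zero.
    """
    symbols = [1.0 if b else -1.0 for b in bits]
    out = []
    for n in range(len(symbols)):
        out.append(sum(c * symbols[n - k] for k, c in enumerate(taps) if n - k >= 0))
    return out


def fir_gain(taps, freq, bit_rate=BIT_RATE):
    """|H(f)| for taps spaced one UI apart: |sum_k c_k exp(-j 2 pi f k / bit_rate)|."""
    return abs(sum(c * cmath.exp(-2j * cmath.pi * freq * k / bit_rate)
                   for k, c in enumerate(taps)))


if __name__ == "__main__":
    taps = [1.0, -0.3]                       # hypothetical 2-tap setting
    print(f"UI = {UI * 1e12:.0f} ps")
    print(f"DC gain = {fir_gain(taps, 0.0):.2f}, "
          f"Nyquist gain = {fir_gain(taps, BIT_RATE / 2):.2f}")
    # Transition bits reach |1.3| while repeated bits settle at |0.7|
    print([round(v, 2) for v in fir_preemphasis([1, 1, 1, 0, 0, 1], taps)])
```

The adapted coefficients in this work come from the pilot-signalling loop rather than being chosen by hand; the sketch only shows the amplitude versus bandwidth trade-off observed when the second tap is introduced.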
Figure 5.13 (a) shows the eye diagram at the receiver with 2 filter taps applied. The eye diagram is already recognisable, because the second filter tap goes a long way towards reducing the long tail present in the channel impulse response. The introduction of the second filter tap, however, comes at the cost of reduced amplitude. The third and fourth filter taps correct for this drop in amplitude, as shown in Figures 5.13 (b) and (c).

Figure 5.13. Improved eye diagrams at the receiver with the application of FIR filter taps. (a) 2-tap FIR pre-emphasis applied. (b) 3-tap FIR pre-emphasis applied. (c) 4-tap FIR pre-emphasis applied. (d) 5-tap FIR pre-emphasis applied. (e) 6-tap FIR pre-emphasis applied.

The introduction of the fourth filter tap (Figure 5.13 (c)) significantly improves the horizontal eye opening. The fourth filter tap helps to extend the -3 dB cut-off frequency, hence a reduction in the DDJ is expected. The fifth (Figure 5.13 (d)) and sixth (Figure 5.13 (e)) filter taps further improve the eye diagram by reducing the DDJ at the pulse edges and increasing the eye amplitude. The total DDJ present in the received pulses versus the number of FIR filter taps implemented is given in Table X.

TABLE X. DDJ PRESENT IN THE RECEIVED SIGNAL.

| FIR taps | Jitter (ps) | % of UI | Eye amplitude (mV) |
|----------|-------------|---------|--------------------|
| 1 | - | ≥ 100 | - |
| 2 | 50 | 25 | 14 |
| 3 | 50 | 25 | 16 |
| 4 | 28 | 14 | 19 |
| 5 | 28 | 14 | 20 |
| 6 | 25 | 12.5 | 20 |

The improved eye diagram at the receiver implementing 6-tap FIR pre-emphasis has a vertical eye amplitude of 20 mV and a horizontal eye opening of 175 ps. At the simulated data rate, the minimum pulse width (one UI) is 200 ps, so the eye diagram is sufficiently open for CDR in the receiver. The jitter present in the received signal is 25 ps, which equates to 12.5 % of a UI.
This is close to the commonly accepted value of 10 %.

## 5.6 Conclusion

This chapter presented the SPICE circuit level simulation results. The eye diagram at the receiver without pre-emphasis is completely closed. However, with the application of 6-tap FIR pre-emphasis, the eye diagram opens up, improving the data integrity of the serial link. As expected, the eye diagram improves dramatically as more pre-emphasis taps are applied, and the DDJ decreases as taps are added, although Table X shows no further reduction from two to three taps or from four to five taps. The simulations were successfully carried out at 5 Gb/s, with noise margins and internal voltage swings sufficient to allow the data rate to be increased even further. The maximum achievable data rate is to be determined through experimental testing. The DDJ present in the simulated eye diagram at the receiver was reduced significantly with the application of more FIR filter taps, to less than 15 % of the minimum pulse width. This results in a high integrity data signal at the receiver.

The control of the adaptation process was also presented, showing the transitions between states with every EOC pulse received. The EOC pulse is generated from the SHIFT\_IN pulses, but manual control is also possible. The receiver amplifier was characterised in order to determine the external resistor value for adjusting the internal threshold voltage, which controls the vertical eye amplitude at the receiver.

## 6.1 Introduction

In order to verify the simulation results given in Chapter 5, the adaptive FIR pre-emphasis system was implemented in the IBM 7WL $0.18~\mu m$ process. The implementation formed part of an MPW run sponsored by MOSIS under its educational programme. This chapter also describes the final layout considerations.

## 6.2 Circuit layouts

The adaptive FIR pre-emphasis system consists of two main subsystems, the transmitter and the receiver.
As discussed in Chapters 4 and 5, the transmitter consists of three main subsystems, namely the pilot signal generator, the multi-tap FIR pre-emphasis driver and the control logic (refer to Appendix B). The control logic is responsible for the adaptation of the FIR pre-emphasis driver, while the pilot signal generator incorporates its own CMOS control logic, as discussed in Chapter 4. The layout of the final adaptive pre-emphasis transmitter is presented in Figure 6.1, and the layout of the receiver in Figure 6.2. Additional layouts showing more detail and indicating specific aspects of the design are provided in Appendix B.

Figure 6.1. Final layout of the 6-tap adaptive FIR pre-emphasis transmitter. The final dimensions are $1000~\mu m \times 670~\mu m$.

Figure 6.2. Final receiver design responsible for detecting the received pulses and generating control signals to adapt the FIR filter coefficients. The dimensions are $340~\mu m \times 160~\mu m$.

## 6.3 Transceiver configuration

The complete transceiver was packaged in a 64-pin QFN package, chosen for its suitable frequency response and its ease of testing and implementation. The wire bond QFN package introduces parasitic components, as discussed in Chapter 4, which can be lumped together with the transceiver as shown in Figure 6.3.

Figure 6.3. Context of the transmitter and the receiver in relation to the channel and the package parasitic components.

The bond wire inductances and the parasitic capacitances were fully modelled, since they form part of the bandwidth limiting channel. The return path and the control signals operate at a lower frequency, hence the package parasitic components can be ignored on those pads, as indicated in Figure 6.3.

## 6.4 Layout considerations

The layout of the prototype IC is an important part of the research process, requiring absolute precision and accuracy.
Most of the deviations between simulated and experimental results are due to the layout. Various design techniques have been utilised, the most important of which is the common centroid design method. This method distributes the segments of a component so that the arrangement is symmetrical with respect to both the *x*-axis and the *y*-axis. Since temperature gradients are inevitably created across the IC during operation, this results in a device that does not vary uncontrollably with temperature. The same principle can be applied to matching between on-chip components.

Accurate matching between components is particularly important in the implemented CML circuits, which contain a pull-up load resistor on each of the two branches. Any mismatch between these resistors degrades the voltage swing at the output of one of the branches, which in turn degrades the integrity of the transmitted data by causing amplitude distortion. Deviations of the actual resistor values due to process tolerances can be tolerated and accounted for during design, by making the CML circuits robust to the resulting changes in voltage swing. An example of the pull-up load resistors implemented in the CML circuits for better internal component matching is shown in Figure 6.4.

Figure 6.4. CML AND-gate showing the layout of the resistors to combat the effect of component mismatches on-chip. The dimensions are $12~\mu m \times 30~\mu m$.

The load resistor of each branch is split into two smaller resistors of equal size, placed on one diagonal of the layout. The other branch's load resistor is likewise split into two equal resistors, placed on the other diagonal. This configuration results in better temperature stability as well as improved matching between the two original load resistors.
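The benefit of the diagonal (cross-coupled) placement can be checked with a first-order model. The sketch below assumes a hypothetical linear resistance gradient $R(x, y) = R_0(1 + g_x x + g_y y)$ across a $2 \times 2$ grid, with each branch's load resistor split into two equal segments connected in series. With the diagonal placement both branches share the same centroid, so the linear gradient cancels, whereas a side-by-side placement leaves a residual mismatch. The gradient coefficients are illustrative and are not extracted from the IBM 7WL process.

```python
"""Sketch: first-order gradient cancellation in a split load-resistor pair."""

# Hypothetical linear gradient coefficients (fractional change per grid step)
GX, GY = 0.01, 0.02


def segment_resistance(x, y, r0=1.0, gx=GX, gy=GY):
    """Resistance of a unit segment at grid cell (x, y) under a linear gradient."""
    return r0 * (1.0 + gx * x + gy * y)


def relative_mismatch(placement):
    """Relative mismatch (R_A - R_B) / mean(R_A, R_B) of two branch load resistors.

    placement maps "A" and "B" to the grid cells of that branch's segments;
    the segments of one branch are assumed to be connected in series.
    """
    r_a = sum(segment_resistance(x, y) for x, y in placement["A"])
    r_b = sum(segment_resistance(x, y) for x, y in placement["B"])
    return (r_a - r_b) / ((r_a + r_b) / 2.0)


DIAGONAL = {"A": [(0, 0), (1, 1)], "B": [(1, 0), (0, 1)]}      # common centroid
SIDE_BY_SIDE = {"A": [(0, 0), (0, 1)], "B": [(1, 0), (1, 1)]}  # offset centroids

if __name__ == "__main__":
    print(f"diagonal placement:     {relative_mismatch(DIAGONAL):+.1e}")
    print(f"side-by-side placement: {relative_mismatch(SIDE_BY_SIDE):+.1e}")
```

Only the linear part of a gradient cancels in this way; curvature in a real thermal or process gradient is reduced but not removed, which is why the CML circuits must still tolerate some residual mismatch.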
## 6.5 Conclusion

This chapter presented the layouts of the adaptive FIR pre-emphasis serial link. The configuration of the transceiver with the package parasitic components was also presented and discussed, as were the layout considerations taken into account to improve the efficiency and yield of the implemented prototype.

# **CHAPTER 7: EXPERIMENTAL RESULTS**

## 7.1 Introduction

This chapter describes the experimental results achieved with the novel $0.18~\mu m$ CMOS adaptive FIR pre-emphasis transceiver. The transceiver was designed in the IBM 7WL $0.18~\mu m$ SiGe BiCMOS process, although only CMOS components were used. The experiments were aimed at further validating the hypothesis established in Chapter 1.

## 7.2 Manufacturing and mounting

As already mentioned, the manufactured die was made up of three sub-projects combined on a single die as part of an MPW run sponsored by the MOSIS educational programme. Figure 7.1 shows the MPW IC die.

Figure 7.1. MPW chip image with the overlaid transmitter and receiver. The MPW run was sponsored by MOSIS under their educational programme.

In Figure 7.1, the transmitter and receiver layouts, as presented in Chapter 6, are overlaid for easy system identification. The manufactured die was wire bonded in a QFN package and mounted on a custom designed PCB. Figure 7.2 shows the mounted IC on one of three identical custom made PCBs.

Figure 7.2. The MPW run IC mounted on a custom made PCB. The input and output SMA connectors of the transmitter and receiver are indicated. The remaining SMA connectors form part of the other projects sharing the same die.

The SMA connectors form the input and output ports of all the data lines; these include the low frequency return paths, the high frequency data paths and the additional control lines implemented for testing flexibility.
From top to bottom, the SMA connectors carry the signals Dout+, Dout-, EOC\_in, CLK\_in, DATA\_in, ADJin, SHIFT\_in, SHIFT\_out, ADJ\_out, Din+ and Din-. These signals were described in Chapters 4 and 5, and are further described in conjunction with the final IC die in Appendices B and C.

## 7.3 Transmitter results

The transmitter state depends on pulse signals generated on-chip by a series of monostable multivibrators. As discussed in Chapter 4, in the default reset state the transmitter should transfer data applied at the data input directly to the output without any signal modification. The output signal is hence a differential replica of the input signal. The first functional state, tested at 10 Mb/s, is illustrated in Figure 7.3.

Figure 7.3. Differential output signal of the transmitter in the first state, the reset state. The data rate used for the functionality tests was 10 Mb/s.

As expected, at such a low data rate the copper cables used have no effect on the DDJ in the system. The transmitter, however, is stuck in the main reset state because the pulse generation circuits are unable to switch the CMOS logic levels and hence change state. This will be elaborated on when discussing the receiver in Section 7.4. Nevertheless, several conclusions about individual subsystems can be drawn from the result in Figure 7.3:

- The current mode DAC utilised as the tail current source in each of the FIR filter taps operates close to the designed values. Averaged over the three custom made PCBs, the output voltage swing was 14 % smaller than the designed value.
- The super buffer structure implemented is functional, since the output signal is a direct replica of the random input data.
- Since there is an even number of FIR filter taps, the transmitter can be said with certainty to be stuck in the main reset state.
If the reset state had not been reached, all filter taps would have been initialised to a maximum value, resulting in output signal cancellation as well as large variations in power dissipation.
- Since the first state of the transmitter is operational, all the CML circuits associated with it are fully functional. Since the pilot signal generator utilises exactly the same CML circuit building blocks, only the CMOS control of the pilot signal generator is in question.

## 7.4 Receiver results

The receiver reacts to a differential signal with an amplitude higher than the set threshold value. Hence, each transmitted pulse whose differential amplitude exceeds the threshold results in both an ADJUST pulse and a SHIFT pulse being sent back to the transmitter via the low frequency return path. The two pulses are shown in Figure 7.4, with a closer view presented in Figure 7.5.

Figure 7.4. Generated pulses transmitted from the receiver to the transmitter via the low frequency return path at each of the transmitted pulses.

Figure 7.5. Close-up view of the pulses sent back to the transmitter. The designed pulse widths are overlaid to distinguish between the two pulses.

The pulses shown in Figure 7.5 are not of sufficient amplitude to switch the CMOS voltage levels in the transmitter pulse generators responsible for switching the FIR filter states. The charge and discharge times of the pulses are clearly too slow, and the pulses exhibit the shape of a capacitor being charged and discharged at a constant current. This is due to the slow PMOS transistor used to charge and discharge a capacitor in the pulse generation circuit. Comparing the experimentally generated pulses with those generated under ideal simulation conditions (Figure 7.6), the problem becomes self-evident. This problem could have been identified and rectified in simulation had a corner analysis been performed to identify the worst speed corner of the PMOS transistor.
A PMOS transistor under ideal conditions can successfully charge and discharge the capacitor, as illustrated in Figure 7.6. A corner analysis could also have provided useful information on the temperature dependence of the implemented system. The ADJUST pulse overlaid on Figure 7.5 has the designed pulse width but cannot fully charge and discharge the capacitor in the required time. The low voltage ADJUST pulse is used to generate the SHIFT pulse; it can barely start to switch the SHIFT pulse, as indicated, and the SHIFT pulse in turn cannot switch any logic levels. The same pulse generation circuits are repeated in the transmitter to switch the FIR filter states, which explains why the transmitter is stuck in the default reset state.

Figure 7.6. ADJUST and SHIFT\_IN pulses generated from the differential input voltage (identical to Figure 5.11, repeated for reading convenience).

## 7.5 Power dissipation

The measured power dissipation of the transmitter (with one active FIR filter tap) and of the full receiver is compared with the simulated power dissipation in Table XI.

TABLE XI. COMPARISON OF THE SIMULATED AND ACHIEVED POWER DISSIPATION

| | Transmitter | Receiver |
|-----------|-------------|----------|
| Simulated | 32.5 mW | 19.8 mW |
| Measured | 36 mW | 18 mW |

The measured and simulated power dissipations are comparable with one filter tap active: the measured transmitter dissipation is about 11 % higher, and the measured receiver dissipation about 9 % lower, than the simulated values. The slight deviation can be attributed to the off-chip biasing, which was not taken into account in the simulated on-chip values. The power dissipation of the transmitter is in the same range as values reported in other literature [47] for self-adaptive FIR pre-emphasis at similar data rates in a smaller CMOS process.

## 7.6 Conclusion

Although the adaptive FIR pre-emphasis transceiver did not function according to simulation, the basic functionality of the transmitter and receiver could be established.
With one FIR filter tap active, the transmitter shows an output swing deviation of only 14 % averaged over the three custom made PCBs, with a standard deviation of less than 2 % between the three measured ICs. Considering the tolerances of the on-chip termination resistors and of the off-chip biasing resistors, a 14 % deviation is acceptable; the voltage swing can, moreover, be corrected by adjusting the external biasing resistor. The receiver showed basic functionality, reacting to a differential signal from the transmitter and generating the necessary control pulses, although these were not of sufficient quality to be useful. The transmitter main state diagram also showed basic functionality, remaining in the reset state that was designed to be the default state.

# **CHAPTER 8: CONCLUSION**

## 8.1 Introduction

This research dealt with the problem of limited off-chip bandwidth causing severe DDJ at the far end of a high speed serial link implemented on a conventional copper backplane. Adaptive FIR pre-emphasis was proposed as a means of alleviating the limited bandwidth problem by extending the -3 dB cut-off frequency of the combined channel response. The pilot signalling and peak detection method of adaptive FIR pre-emphasis was implemented in the $0.18~\mu m$ IBM 7WL SiGe BiCMOS process. This chapter discusses the contributions made to the body of knowledge and the conclusions that can be reached from the simulation results presented in Chapter 5 and the limited experimental results presented in Chapter 7. Future work and improvements to the current design implementation are also presented to conclude the dissertation.

## 8.2 Critical evaluation of the hypothesis

Chapter 1 introduced the hypothesis that a fully functional high speed serial link transmitter employing adaptive FIR pre-emphasis could improve the off-chip bandwidth and data rates.
An adaptive FIR pre-emphasis filter was proposed to reduce the DDJ present in the received signal, resulting in a BER improvement, by extending the -3 dB cut-off frequency of the channel frequency response. The high speed serial link transmitter was implemented in the $0.18~\mu m$ IBM 7WL SiGe BiCMOS process as presented. The presented hypothesis has been proved through the successful design and simulation of the pilot signalling and peak detection method of applying adaptive FIR pre-emphasis.

The pilot signalling and peak detection method does not require a high frequency return path for the tap coefficient updates. Furthermore, a master-slave architecture, requiring an extra transceiver, is not required, since a simple combination of amplifiers and comparators can be used to generate control pulses for the tap coefficient training. However, the tap adaptation process does require the data stream to be interrupted every time the user restarts the adaptation to find new optimal FIR filter coefficients.

Although the idea of pilot signalling and peak detection for high speed serial link transceivers is not novel [16], the CMOS implementation and test of the hypothesis is, to the author's knowledge, novel. The following aspects of the implementation of the adaptation method are important:

- The pilot signal generator incorporated a CMOS ROM for storing the data sequences necessary for the adaptation process. A parallel load shift register was employed for loading and transmitting the pilot signals on application of the control signals.
- The FIR pre-emphasis transmitter utilised a current-mode DAC for adjustment of each of the FIR filter coefficients. Six FIR filter coefficients were employed in this research to allow adaptation to channels worse than those simulated. Sign control was also implemented to allow for negative tap coefficient values.
Tap coefficients of alternating sign are required to obtain a high pass filter effect that extends the -3 dB bandwidth.
- Conventional MOS CML was chosen for implementing the very high frequency circuits on-chip, as discussed in Section 4.4. Conventional CMOS logic circuits were utilised for the lower frequency control of the high speed CML circuits.
- The novel implementation utilised 0.8 mm² of die real estate in the MPW run sponsored by MOSIS. The design further utilised 21 I/O pads for the full implementation, incorporating extra pads for testability, debugging and flexibility during experimental testing.
- The pilot signalling and peak detection method was analysed and evaluated on both a mathematical level and a circuit level. The mathematical simulations were carried out in MATLAB, while the circuit level simulations made use of the Cadence Virtuoso software package in conjunction with the foundry HIT-Kit.
- The mathematical and circuit level simulations showed a great improvement in the data integrity at the receiver. The vertical dimension of the receiver eye diagram can be controlled with the external threshold resistor, while the high pass characteristic of the FIR filter opens the horizontal dimension of the eye diagram at the receiver. The DDJ at the receiver was shown to decrease as FIR filter taps are added, and can be reduced to acceptable levels of less than 15 % of the pulse width of the transmitted signal.

Finally, the layout of the pilot signalling and peak detection method of applying adaptive FIR pre-emphasis, for application in high speed serial links over conventional copper backplane channels, was submitted for prototyping in the $0.18~\mu m$ IBM 7WL SiGe BiCMOS process. The experimental results achieved with the first prototype are presented in Chapter 7.
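The role of the sign-controlled current-mode DAC listed above can be illustrated with the CML output relation $V_{diff}[n] = R_L \sum_k I_k\, s[n-k]$ (up to the choice of output polarity), where $s = \pm 1$ is the NRZ symbol and a negative $I_k$ represents a tap whose differential outputs are swapped by the sign control. The sketch below is illustrative only: the 4-bit resolution, 0.5 mA LSB current and 50 $\Omega$ load are hypothetical values, not those of the implemented transmitter, and `tap_current` and `differential_output` are hypothetical helper names.

```python
"""Hypothetical model of sign-controlled current-mode DAC taps driving a CML load."""

BITS = 4         # hypothetical DAC resolution per tap
I_LSB = 0.5e-3   # hypothetical LSB current (A)
R_LOAD = 50.0    # hypothetical load resistance per branch (ohm)


def tap_current(code, negative):
    """Signed tail current of one tap: magnitude from the DAC code, sign from
    the sign-control bit (which swaps the tap's differential outputs)."""
    if not 0 <= code < 2 ** BITS:
        raise ValueError(f"code must be in 0..{2 ** BITS - 1}")
    return (-1.0 if negative else 1.0) * code * I_LSB


def differential_output(bits, currents, r_load=R_LOAD):
    """V_diff[n] = r_load * sum_k I_k * s[n-k], with s = +1/-1 for bit 1/0.

    Each tap steers its full tail current into one branch, so the branch
    current difference is the signed sum; earlier symbols are taken as zero.
    """
    s = [1.0 if b else -1.0 for b in bits]
    return [r_load * sum(i_k * s[n - k] for k, i_k in enumerate(currents) if n - k >= 0)
            for n in range(len(s))]


if __name__ == "__main__":
    # Main tap positive, post-cursor tap negative (sign bit set)
    currents = [tap_current(15, False), tap_current(5, True)]
    print([round(v, 3) for v in differential_output([1, 1, 0, 0, 1], currents)])
```

With the sign bit of the second tap set, the transitions are emphasised relative to repeated bits (a high pass response); clearing it would instead emphasise repeated bits, which is why sign control is needed to obtain alternating-sign coefficients.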
## 8.3 Limitations and assumptions

The current hypothesis focuses on reducing the total system jitter by reducing the DDJ introduced by the band limited channel. This assumes that all other types of jitter are kept to a minimum and that the total system jitter is dominated by the DDJ, an assumption that holds specifically for copper backplane channels. The other types of jitter inherent in the system, such as clock jitter and DCD, are thus assumed to be negligible. The DCD depends on the FIR filter driver having equal rise and fall times, which ensures a constant 50 % duty cycle when transmitting 1010 data.

The second limitation is the assumption that the externally applied clock signal to the implemented IC is jitter-free. Clock jitter introduces the RJ component, which gives the jitter its typical normal distribution. This RJ component could, however, be improved by implementing an on-chip, low phase noise VCO.

A third limitation of the current design used for hypothesis verification is that six FIR filter taps are always used. This is not a limitation as such, but it increases the static power dissipation. In high loss channels, the increased static power dissipation typically improves the data integrity of the system, but in a low loss channel (e.g. a very short PCB trace) the extra filter taps do not improve the data integrity, so the static power is consumed without purpose.

## 8.4 Future work and improvements

The following list contains suggestions for future research work and improvements arising from this research:

- An investigation into the suitability of implementing adaptive pre-emphasis to improve global clock interconnects on-chip. Similar work for fixed pre-emphasis has been presented in [12], [73], but it still requires an accurate model of the channel or a user interface for tap adjustments.
This could alleviate the interconnect problem currently faced while new innovative solutions are being developed.
- A complete design implementing a low phase noise VCO in the transmitter to reduce clock jitter, taking into account all the sources that contribute to clock jitter. This could further reduce the TJ, further improving the BER achieved for the link.
- Incorporating the current receiver design module into a full receiver design incorporating CDR circuits. If the integration is easily accomplished, the complete system could then be instantiated in a similar way to a standard cell.
- Implementing a pseudo random bit sequence (PRBS) generator in the transmitter, together with an 8B/10B encoder, to represent more accurately the data actually transmitted across a copper backplane serial link.
- Investigating the design trade-offs of using sub-100 nm CMOS processes in terms of implementation size, attainable data rates and power consumption. The sub-100 nm CMOS processes could also be traded off against the very high $f_T$ HBTs available in leading SiGe processes for the implementation of the CML. This trade-off should not be made on a performance basis alone; prototyping cost should also be taken into account.
- Adding inductors in series with the termination resistors at the transmitter to utilise inductive peaking, possibly extending the -3 dB cut-off frequency of the channel response further and reducing the DDJ in the received signal even more. This could lead to even higher attainable data transfer rates.

The pulse generation circuits used to generate the control pulses that switch the state of the transmitter should be redesigned and improved to allow more thorough testing in future implementations.

# REFERENCES

- [1] M. Li, T. Kwasniewski, S. Wang and Y. Tao, "A 10 Gb/s transmitter with multi-tap FIR pre-emphasis in 0.18 µm CMOS technology", *Proc.
of the 2005 IEEE Asia South Pacific design automation conf.*, Shanghai, pp. 679-682, 18-21 Jan. 2005.
- [2] G. Niu, "Noise in SiGe HBT RF technology: physics, modelling and circuit implications", invited paper, *Proc. of the IEEE*, Vol. 93, No. 9, pp. 1583-1597, Sept. 2005.
- [3] D.J. Friedman, M. Meghelli, B.D. Parker, J. Yang, H.A. Ainspan, A.V. Rylyakov, Y.H. Kwark, M.B. Ritter, L. Shan, S.J. Zier, M. Soma and M. Soyuer, "SiGe BiCMOS integrated circuits for high speed serial communication links", *IBM J. of research and development*, Vol. 47, No. 2/3, Mar. 2003. [Online]. Available: <a href="https://www.research.ibm.com/journal/rd/472/friedman.html">https://www.research.ibm.com/journal/rd/472/friedman.html</a>
- [4] A. Kuo, R. Rosales, T. Farahmand, S. Tabatabaei and A. Ivanov, "Crosstalk bounded uncorrelated jitter (BUJ) for high speed interconnects", *IEEE Trans. on instrumentation and measurement*, Vol. 54, No. 5, Oct. 2005.
- [5] C.H. Lin, C.H. Tsai, C.N. Chen and S.J. Jou, "4/2 PAM serial link transmitter with tunable pre-emphasis", *Proc. of the IEEE Int. symp. on circuits and systems*, Vancouver, pp. 952-955, 23-26 May 2004.
- [6] A.X. Widmer and P.A. Franaszek, "A DC-balanced partitioned-block, 8B/10B transmission code", *IBM J. of research and development*, Vol. 27, No. 5, Sept. 1983.
- [7] B. Analui, J.F. Buckwalter and A. Hajimiri, "Data dependent jitter in serial communications", *IEEE Trans. on microwave theory and techniques*, Vol. 53, No. 11, Nov. 2005.
- [8] C.H. Lin, C.H. Wang and S.J. Jou, "5 Gbps serial link transmitter with pre-emphasis", *Proc. of the 2003 IEEE Asia South Pacific design automation conf.*, Kitakyushu, pp. 795-800, 21-24 Jan. 2003.
- [9] F. Weiss, D. Kehrer and A.L. Scholtz, "Transmitter and receiver circuits for serial data transmission over lossy copper channels for 10 Gb/s in 0.13 µm CMOS", *Proc. of the IEEE radio frequency integrated circuits (RFIC) symp.*, San Francisco, pp. 397-400, 11-13 Jun. 2006.
- [10] R.
Farjad-Rad, C.K.K. Yang, M.A. Horowitz and T.H. Lee, "A 0.4 µm CMOS 10 Gb/s 4-PAM pre-emphasis serial link transmitter", *IEEE J. of solid-state circuits*, Vol. 34, No. 5, pp. 580-585, May 1999.
- [11] K. Yoo, G. Han and S. Park, "A 10 Gbps analog adaptive equaliser and pulse shaping circuit for backplane interface", *Proc. of the 5th world scientific and engineering academy and society Int. conf. on circuits, systems, electronics, control & signal processing*, Dallas, pp. 225-229, 1-3 Nov. 2006.
- [12] S. Rylov, S. Reynolds, D. Storaska, B. Floyd, M. Kapur, T. Zwick, S. Gowda and M. Sorna, "10+ Gb/s 90-nm CMOS serial link demo in CBGA package", *IEEE J. of solid-state circuits*, Vol. 40, No. 9, Sept. 2005.
- [13] R. Farjad-Rad, C.K. Yang, M.A. Horowitz and T. Lee, "A 0.3-µm CMOS 8-Gb/s 4-PAM serial link transceiver", *IEEE J. of solid-state circuits*, Vol. 35, No. 5, May 2000.
- [14] C.Y. Yang and Y. Lee, "A 0.18 µm CMOS 1 Gb/s serial link transceiver by using PWM and PAM techniques", *Proc. of the IEEE Int. symp. on circuits and systems*, Kobe, Vol. 2, pp. 1150-1153, 23-26 May 2005.
- [15] F. Zarkeshvari, P. Noel, S. Uhanov and T. Kwasniewski, "An overview of high-speed serial I/O trends, techniques and standards", *Canadian conf. on electrical and computer engineering (CCECE)*, Ontario, pp. 1215-1220, 2-5 May 2004.
- [16] K. Yoo and G. Han, "An adaptation method for FIR pre-emphasis filter on backplane channel", *Proc. of IEEE Int. symp. on circuits and systems*, Island of Kos, pp. 5151-5154, 21-24 May 2006.
- [17] M. Cases, D.N. de Araujo and E. Matoglu, "Electrical design and specification challenges for high speed serial links", *Proc. of the IEEE electronics packaging technology conf.*, Singapore, Vol. 1, pp. 29-33, 7-9 Dec. 2005.
- [18] J.M. Khoury and K.R. Lakshmikumar, "High speed serial transceivers for data communication systems", *IEEE communications magazine*, pp. 160-165, Jul. 2001.
- [19] T. Toifl, C. Menolfi, M. Ruegg, R.
Reutemann, P. Buchmann, M. Kossel, T. Morf, J. Weiss and M.L. Schmatz, "A 22-Gb/s PAM-4 receiver in 90-nm CMOS SOI technology", *IEEE J. of solid-state circuits*, Vol. 41, No. 4, Apr. 2006.
- [20] J.H.R. Schrader, E.A.M. Klumperink, J.L. Visschers and B. Nauta, "Pulse-width modulation pre-emphasis applied in a wireline transmitter, achieving 33 dB loss compensation at 5-Gb/s in 0.13-µm CMOS", *IEEE J. of solid-state circuits*, Vol. 41, No. 4, Apr. 2006.
- [21] D.J. Foley and M.P. Flynn, "A low-power 8-PAM serial transceiver in 0.5 µm digital CMOS", *IEEE J. of solid-state circuits*, Vol. 37, No. 3, pp. 310-319, Mar. 2002.
- [22] D. Foty, S. Sinha, M. Weststrate, C. Coetzee, A.H. Uys and E. Sibanda, "mm-wave radio communications systems: the quest continues", *Proc. of the 3rd Int. radio electronics forum (IREF) on applied radio electronics. The state and prospects of development*, Kharkov, pp. 14-17, 22-24 Oct. 2008.
- [23] S.H. Hall, G.W. Hall and J.A. McCall, "Connectors, packages and vias" in *High-speed digital system design: A handbook of interconnect theory and design practice*, Wiley IEEE Press, 2000.
- [24] H.W. Johnson and M. Graham, *High-speed digital design: A handbook of black magic*, Prentice Hall, New Jersey, 1993.
- [25] D.E. Bockelman and W.R. Eisenstadt, "Combined differential and common-mode scattering parameters: theory and simulation", *IEEE Trans. on microwave theory and techniques*, Vol. 43, No. 7, pp. 1567-1575, Jul. 1995.
- [26] M. Li, T. Kwasniewski, S. Wang and Y. Tao, "FIR filter optimization as pre-emphasis of high speed backplane data transmission", *Proc. of the Int. IEEE conf. on communications, circuits and systems*, Chengdu, Vol. 2, pp. 773-776, 27-29 Jun. 2004.
- [27] L. Zhang and T. Kwasniewski, "Optimal equalization for reducing the impact of channel group delay distortion on high-speed backplane data transmission", *Int. J. of electronics and communications*, doi: 10.1016/j.aeue.2009.04.010, Apr. 2009.
- [28] P.R.
Trischitta and E.L. Varma, *Jitter in digital transmission systems*, Artech House, Inc., 1989.
- [29] P.K. Hanumolu, B. Casper, R. Mooney, G.Y. Wei and U.K. Moon, "Jitter in high-speed serial and parallel links", *Proc. of the Int. symp. on circuits and systems*, Vancouver, Vol. 4, pp. 425-428, 23-26 May 2004.
- [30] D. Hong, C. Ong and T. Cheng, "Bit error rate estimation for high speed serial links", *IEEE Trans. on circuits and systems I: Regular papers*, Vol. 53, No. 12, Dec. 2006.
- [31] P.R. Gray, P.J. Hurst, S.H. Lewis and R.G. Meyer, *Analysis and design of analog integrated circuits*, 4th edition, John Wiley & Sons, Inc., New York, pp. 748-756, 2000.
- [32] J. Sun, M. Li and J. Wilstrup, "A demonstration of deterministic jitter (DJ) deconvolution", *IEEE instrumentation and measurement conf.*, Anchorage, Vol. 1, pp. 293-298, 21-23 May 2002.
- [33] D.C. Montgomery, G.C. Runger and N.F. Hubele, *Engineering statistics*, 3rd edition, John Wiley & Sons, Inc., New York, pp. 61-69, 2004.
- [34] PCI-SIG, *PCI express jitter modelling: Revision 1.0RD*, [Online]. Available: <a href="http://www.pcisig.com/specifications/pciexpress/technical_library/PCI_Express_Jitter_White_Paper_1_0_May_27_20043.pdf">http://www.pcisig.com/specifications/pciexpress/technical_library/PCI_Express_Jitter_White_Paper_1_0_May_27_20043.pdf</a>
- [35] C. Peiyi, "Development of SiGe materials and devices", *Proc. of the Int. conf. on solid-state and integrated circuit tech.*, Shanghai, Vol. 1, pp. 570-574, 22-25 Oct. 2001.
- [36] T. Yu, S. Cho and H. Jeong, "A 10-GHz CMOS LC VCO with wide tuning range using capacitive degeneration", *J. of semiconductor technology and science*, Vol. 6, No. 4, Dec. 2006.
- [37] P.K. Hanumolu, G. Wei and U. Moon, "Equalizers for high speed serial links", *Int. J. of high speed electronics and systems*, Vol. 15, No. 2, Feb. 2005.
- [38] B.P.
Lathi, *Modern digital and analog communication systems*, 3rd edition, Oxford University Press, New York, pp. 567-572, 1998.
- [39] X. Lin, J. Liu, H. Lee and H. Liu, "A 2.5- to 3.5-Gb/s adaptive FIR equalizer with continuous-time wide-bandwidth delay line in 0.25-µm CMOS", *IEEE J. of solid-state circuits*, Vol. 41, No. 8, Aug. 2006.
- [40] L. Lin, P. Noel and T. Kwasniewski, "Implementing a digitally synthesized adaptive pre-emphasis algorithm for use in a high-speed backplane interconnection", *Canadian conf. on electrical and computer engineering*, Ontario, Vol. 3, pp. 1221-1224, 2-5 May 2004.
- [41] L. Zhang and T. Kwasniewski, "FIR filter optimization using bit-edge equalization in high-speed backplane data transmission", *Microelectronics journal*, Elsevier, July 2008.
- [42] J.F. Buckwalter, M. Meghelli, D.J. Friedman and A. Hajimiri, "Phase and amplitude pre-emphasis techniques for low-power serial links", *IEEE J. of solid-state circuits*, Vol. 41, No. 6, June 2006.
- [43] E.C. Ifeachor and B.W. Jervis, "Finite impulse response filter design" in *Digital signal processing: A practical approach*, 2nd edition, Prentice Hall, Essex, England, 2002.
- [44] N.J. Loy, *An engineer's guide to FIR digital filters*, Prentice Hall, New Jersey, 1988.
- [45] M.E. Goosen and S. Sinha, "Adaptive FIR filter pre-emphasis for high speed serial links", *Proc. of the South African conf. on semi and superconductor technology (SACSST)*, Stellenbosch, pp. 37-42, 8-9 April 2009.
- [46] J.T. Stonick, G.Y. Wei, J.L. Sonntag and D.K. Weinlader, "An adaptive PAM-4 5 Gb/s backplane transceiver in 0.25 µm CMOS", *IEEE J. of solid-state circuits*, Vol. 38, No. 3, pp. 436-443, Mar. 2003.
- [47] D. Tonietto, J. Hogeboon, E. Bensoudane, S. Sadeghi, H. Khor and P. Krotnev, "A 7.5 Gb/s transmitter with self-adaptive FIR", *Proc. of IEEE symp. on VLSI circuits digest of technical papers*, Honolulu, pp. 198-199, 18-22 Jun. 2008.
- [48] J. Musicer and J.
Rabaey, "MOS current mode logic for low power, low noise CORDIC computation in mixed-signal environments", *Proc. of Int. symp. on low power electronics and design*, Rapallo, pp. 102-107, 25-27 July 2000.
- [49] M. Alioto and G. Palumbo, "Highly accurate and simple models for CML and ECL gates", *IEEE Trans. on computer-aided design of integrated circuits and systems*, Vol. 18, No. 9, Sept. 1999.
- [50] K.M. Sharaf and M.I. Elmasry, "An accurate analytical propagation delay model for high-speed CML bipolar circuits", *IEEE J. of solid-state circuits*, Vol. 29, No. 1, Jan. 1994.
- [51] H. Hassan, M. Anis and M. Elmasry, "MOS current mode circuits: analysis, design and variability", *IEEE Trans. on very large scale integration systems*, Vol. 13, No. 8, Aug. 2005.
- [52] J.F. Bulzacchelli, M. Meghelli, S.V. Rylov, W. Rhee, A.V. Rylyakov, H.A. Ainspan, B.D. Parker, M.P. Beakes, A. Chung, T.J. Beukema, P.K. Pepeljugoski, L. Shan, Y.H. Kwark, S. Gowda and D.J. Friedman, "A 10-Gb/s 5-tap DFE/4-tap FFE transceiver in 90-nm CMOS technology", *IEEE J. of solid-state circuits*, Vol. 41, No. 12, Dec. 2006.
- [53] IBM microelectronics division, *BiCMOS-7WL Model reference guide*, San Jose: Cadence Design Systems, 12 May 2008.
- [54] IBM microelectronics division, *BiCMOS-7WL Design manual*, San Jose: Cadence Design Systems, 13 May 2008.
- [55] D.G. Kam, M.B. Ritter, T.J. Beukema, J.F. Bulzacchelli, P.K. Pepeljugoski, Y.H. Kwark, L. Shan, X. Gu, C.W. Baks, R.A. John, G. Hougham, C. Schuster, R. Rimolo-Donadio and B. Wu, "Is 25 Gb/s on-board signaling viable?", *IEEE Trans. on advanced packaging*, Vol. 31, No. 4, Nov. 2008.
- [56] Anon, 2007. *Virtuoso schematic composer user guide*, San Jose: Cadence Design Systems.
- [57] Anon, 2007. *Virtuoso analog design environment*, San Jose: Cadence Design Systems.
- [58] Anon, 2007.
*Virtuoso layout suite L user guide*, San Jose: Cadence Design Systems.
- [59] IBM microelectronics division, *BiCMOS8HP Design manual*, San Jose: Cadence Design Systems, 18 July 2007.
- [60] IBM microelectronics division, *BiCMOS8HP Model reference guide*, San Jose: Cadence Design Systems, 24 July 2007.
- [61] Agilent Technologies, *Agilent PSA series spectrum analyzers data sheet*, 1 June 2008.
- [62] Agilent Technologies, *Agilent E4440A PSA spectrum analyzer user manual*.
- [63] K.B. Unchwaniwala and M.F. Caggiano, "Electrical analysis of IC packaging with emphasis on different ball grid array packages", *Proc. of IEEE electronic components and technology conf.*, Orlando, pp. 1496-1501, 29 May-1 Jun. 2001.
- [64] M.E. Goosen and S. Sinha, "Analysis of adaptive FIR filter pre-emphasis for high speed serial links", *Proc. of IEEE Africon 2009*, Nairobi, 23-26 Sept. 2009.
- [65] S.M. Wentworth, *Fundamentals of electromagnetics with engineering applications*, 1st edition, John Wiley & Sons, 2004.
- [66] J.D. Cressler and G. Niu, *Silicon germanium heterojunction bipolar transistors*, Artech House, Massachusetts, 2003.
- [67] M.E. Goosen, S. Sinha, A. Müller and M. du Plessis, "A low switching time transmitter for high speed adaptive pre-emphasis serial links", *Proc. of IEEE CAS 2009*, Sinaia, pp. 481-484, 12-14 Oct. 2009.
- [68] A. Hairapetian, "Current controlled CMOS logic family", *United States Patent* 7215169 B2, 8 May 2007.
- [69] S. Kang and Y. Leblebici, "BiCMOS logic circuits" in *CMOS digital integrated circuits: Analysis and design*, 3rd edition, McGraw Hill, New York, 2003.
- [70] K. Jayaraman, Q.A. Khan and P. Chiang, "Design and analysis of 1-60 GHz, RF CMOS peak detectors for LNA calibration", *Proc. of the IEEE Int. symp. on VLSI design, automation and test*, Taiwan, 28-30 Apr. 2009.
- [71] A. Valdes-Garcia, R. Venkatasubramanian, J. Silva-Martinez and E.
Sánchez-Sinencio, "A broadband CMOS amplitude detector for on-chip RF measurements", *IEEE Trans. on instrumentation and measurement*, Vol. 57, No. 7, July 2008.
- [72] R.G. Meyer, "Low-power monolithic RF peak detector analysis", *IEEE J. of solid-state circuits*, Vol. 30, No. 1, Jan. 1995.
- [73] D. Schinkel, E. Mensink, E.A.M. Klumperink, A.J.M. van Tuijl and B. Nauta, "A 3-Gb/s/ch transceiver for 10-mm uninterrupted *RC*-limited global on-chip interconnects", *IEEE J. of solid-state circuits*, Vol. 41, No. 1, Jan. 2006.

# APPENDIX A

The code for the mathematical design and simulation of the adaptive FIR pre-emphasis system is given below.

```matlab
% Marius Goosen
% Adaptive FIR filtering: finding the optimal tap coefficients
% 2009-03-19
clear all; clc;

time  = 0:100:700;
time2 = 0:100:1400;

% Specifying the channel impulse response coefficients (from SPICE simulation)
CI = [0 0.018 0.017 0.01 0.005 0.003 0.001 0];

% Plotting the sampled impulse response
plot(time, CI);
xlabel('Time (ps)'); xlim([0 800]);
ylabel('Voltage [V]'); ylim([0 0.02]);
% title('Sampled channel impulse response');

% Specifying some data to determine an actual eye diagram
% (256 data patterns of 8 bits each, one pattern per row)
load PRE_FIR_DATA.csv
DATA_PRE_FIR = PRE_FIR_DATA;

% The received data can be determined by convolving the data with the
% channel impulse response (on a digital level)
C = zeros(256, 15);
for k = 1:256
    C(k,:) = conv(DATA_PRE_FIR(k,:), CI);
end

figure; hold on;
for k = 1:256
    plot(time2, C(k,:));
end
xlabel('Time (ps)'); ylabel('Voltage [V]');
title('Superposition of all the data sequences at the receiver');

figure; hold on;
for k = 1:256
    for j = 1:6
        plot(0:100:200, C(k,[j j+1 j+2]));
    end
end
xlabel('Time (ps)'); ylabel('Voltage [V]');
title('Eye diagram at the receiver');

% Say we are using a 6-bit DAC, and for simulation purposes the maximum tap
% value is 10. The DAC resolution is thus 0.15625 (10/64), with a maximum
% value of 9.84375. Negative values are also an option, since the current
% direction can be reversed through the differential transmitter.
taps = zeros(1,8);
Tx_training_sequence = [1 0 0 0 0 0 0 0;
                        1 1 0 0 0 0 0 0;
                        1 0 1 0 0 0 0 0;
                        1 0 0 1 0 0 0 0;
                        1 0 0 0 1 0 0 0;
                        1 0 0 0 0 1 0 0;
                        1 0 0 0 0 0 1 0;
                        1 0 0 0 0 0 0 1];
new_TX_sequence = zeros(8,15);
tap_converge = zeros(8,128);

% Determining the filter taps
for k = 1:8
    idealvalue = 0.1;   % the ideal voltage for CML is about 200 mV p-p
    val = 10;
    taps(k) = 10;
    count = 1;
    tap_converge(k,count) = taps(k);
    while (val >= idealvalue)   % test whether the received peak is still too large
        count = count + 1;
        taps(k) = taps(k) - 0.15625;
        tap_converge(k,count) = taps(k);
        % Pre-emphasised pilot: the full convolution of the tap vector with the
        % training sequence (15 samples), identical to the term-by-term sum
        % new_TX_sequence(k,n) = sum_i taps(i)*Tx_training_sequence(k,n-i+1)
        new_TX_sequence(k,:) = conv(taps, Tx_training_sequence(k,:));
        RX_sequence = conv(new_TX_sequence(k,:), CI);
        val = max(RX_sequence);   % a simple peak detector
    end
    for j = count:128
        tap_converge(k,j) = taps(k);
    end
end

% -----
% Implement the filter taps to check if the system has improved.
new_TX_DATA = zeros(256,15);
% taps(2:8) = 0;   % uncomment to compare against the main tap only
for k = 1:256
    new_TX_DATA(k,:) = conv(taps, DATA_PRE_FIR(k,:));
end

% The received data can be determined by convolving the data with the
% channel impulse response (on a digital level)
C2 = zeros(256,22);
for k = 1:256
    C2(k,:) = conv(new_TX_DATA(k,:), CI);
end

figure; hold on;
time3 = 0:100:2100;
for k = 1:256
    plot(time3, C2(k,:));
end
xlabel('Time (ps)'); ylabel('Voltage [V]');
title('Superposition of all the data sequences at the receiver after pre-emphasis');

figure; hold on;
for k = 1:256
    for j = 1:20
        plot(0:100:200, C2(k,[j j+1 j+2]));
    end
end
xlabel('Time (ps)'); ylabel('Voltage [V]');
% title('Eye diagram at the receiver after pre-emphasis');

% Determining the DDJ component in the pulse edges
countg1 = 0;
countg2 = 0;
for k = 1:256
    for j = 1:20
        if abs((C2(k,j) - C2(k,j+1))/100) > 0.0002
            countg1 = countg1 + 1;
            m1(countg1) = (C2(k,j) - C2(k,j+1))/100;
            x_sect(countg1) = (0.05 - C2(k,j))/m1(countg1);
        end
        if abs((C2(k,j+1) - C2(k,j+2))/100) > 0.0002
            countg2 = countg2 + 1;
            m2(countg2) = (C2(k,j+1) - C2(k,j+2))/100;
        end
    end
end
figure;
hist(-1*x_sect, 1000);
ylabel('No. of occurrences');
xlabel('Timing deviation around pulse edge (ps)');
% figure;
% hist(m2*50, 1000);

% Plotting the tap convergence (taps 1-6, two per figure)
time4 = 0:0.01:1.27;
for p = 1:3
    figure;
    for s = 1:2
        n = 2*(p-1) + s;
        subplot(1,2,s), plot(time4, tap_converge(n,:));
        xlim([0 1.27]); ylim([-10 10]);
        xlabel('Time (us)'); ylabel(sprintf('Tap %d coefficient value', n));
    end
end

data_vir_plot = [0 1 0 1 0 0 0 1 0 0 0 0 0];
figure;
subplot(3,1,1), plot(data_vir_plot);
xlim([0 14]);
title('Ideal data transmitted');
ylabel('Voltage [V]');
subplot(3,1,2), plot(0:1:14, C(139,:));
title('Data received without pre-emphasis');
ylabel('Voltage [V]');
subplot(3,1,3), plot(0:1:21, C2(139,:));
title('Data received with pre-emphasis');
xlim([0 14]); ylim([0 0.1]);
ylabel('Voltage [V]');
xlabel('Bit no.');
```

# APPENDIX B: DETAILED LAYOUTS OF THE SYSTEM

Figure B.1. Layout of one of the FIR filter taps. A single FIR filter tap incorporates its own count logic, driving transistors and controlling current-mode DAC. The different subsystems making up the filter tap are indicated. The dimensions are 150 $\mu$m x 70 $\mu$m.

Figure B.2. Control logic interacting with each individual filter tap control logic.
The control logic shown in this figure selects which tap should be active and adjusts it according to the control signals received. The dimensions are 130 $\mu$m x 65 $\mu$m.

Figure B.3. Layout of the control logic of the pilot signal generator. The circuit keeps track of which pilot signal to transmit at each stage, and stores the actual pilot signals in the ROM as illustrated. The CMOS logic is surrounded by a guard ring to prevent CMOS switching noise from affecting the nearby CML circuits. The dimensions are 100 $\mu$m x 40 $\mu$m.

Figure B.4. Layout of the parallel-load shift register, implemented using current-mode logic and operating in excess of 5 Gb/s. The circuit in Figure B.3 precedes that in Figure B.4, providing control and the necessary pilot sequences. The dimensions are 420 $\mu$m x 50 $\mu$m.

Figure B.5. Final top-level layout of the adaptive pre-emphasis receiver, which generates the necessary control signals to adjust the transmitter. The dimensions, including the I/O pads, are 1200 $\mu$m x 820 $\mu$m.

Figure B.6. Final top-level layout of the adaptive pre-emphasis transmitter, showing its connections to the bond pads. The dimensions, including the I/O pads, are 1200 $\mu$m x 2785 $\mu$m.

Figure B.7. Bonding diagram for the MPW run. The receiver (RX) is situated at the bottom right, while the transmitter (TX) is to the right of the IC.

# APPENDIX C: DATASHEET FOR THE PRE-EMPHASIS IMPLEMENTATION

## Overview

The IC implements the pilot signalling and peak detection method of adaptive FIR pre-emphasis. The adaptation process is result-driven: the control signals that the receiver generates to adjust the transmitter are derived from the measured amplitude of the received signal pulse.
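A minimal sketch of how the pilot signalling and peak detection loop can settle on tap values may help here. It is written in Python rather than MATLAB and mirrors the Appendix A listing: each tap is started at full scale and stepped down by one DAC LSB until the peak of the received pilot falls below the target. The channel impulse response `CI`, the 0.1 V target, the full scale of 10 and the 10/64 step are copied from that listing; the names `pilot` and `adapt_taps` and the `MAX_STEPS` guard are hypothetical additions, and the sketch models the algorithm only, not the CML hardware.

```python
# Sketch of peak-detection tap adaptation, mirroring the Appendix A listing.
# Assumptions: CI, the 0.1 V target, full scale 10 and the 10/64 step come from
# that listing; MAX_STEPS is a hypothetical guard added for this sketch.

CI = [0.0, 0.018, 0.017, 0.01, 0.005, 0.003, 0.001, 0.0]  # sampled channel impulse response (V)
N_TAPS = 8              # the listing adapts an 8-entry tap vector
FULL_SCALE = 10.0       # starting (maximum) tap value in normalised units
LSB = FULL_SCALE / 64   # 6-bit DAC step: 0.15625
TARGET_V = 0.1          # peak-detector target at the receiver (V)
MAX_STEPS = 200         # hypothetical guard against a tap that never meets the target


def conv(a, b):
    """Full linear convolution, equivalent to MATLAB conv(a, b)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def pilot(k):
    """Pilot k: a pulse in bit 0 plus a pulse in bit k (k = 0 is a lone pulse)."""
    seq = [0.0] * N_TAPS
    seq[0] = 1.0
    seq[k] = 1.0
    return seq


def adapt_taps(n_adapt=N_TAPS):
    """Adapt the first n_adapt taps in turn; return (taps, DAC steps taken per tap)."""
    taps = [0.0] * N_TAPS
    steps = []
    for k in range(n_adapt):
        taps[k] = FULL_SCALE
        peak, n = float("inf"), 0
        # As in the listing: step the tap down first, then measure the received
        # peak, until the peak falls below the target (or the guard trips).
        while peak >= TARGET_V and n < MAX_STEPS:
            taps[k] -= LSB
            n += 1
            received = conv(conv(taps, pilot(k)), CI)  # pre-emphasis, then channel
            peak = max(received)                       # the "simple peak detector"
        steps.append(n)
    return taps, steps


taps, steps = adapt_taps(2)
print(taps[:2], steps)
```

Under these assumptions the first two taps settle at 5.46875 and -5.15625 after 29 and 97 DAC steps respectively, so the second tap ends up as a negative (de-emphasis) coefficient. The step guard is a design choice for the sketch: if some pilot could never be driven below the target, the loop stops instead of running forever.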
This section describes the hardware connections of the adaptive FIR pre-emphasis transceiver designed to verify the pilot signal and peak detection method considered in this research.

## Pin layout and description

Figure C.1 illustrates the pin diagram of the designed IC. Table C.1 and Table C.2 list the pin numbers, names and descriptions.

Figure C.1. Pin diagram of the adaptive FIR pre-emphasis transceiver.

TABLE C.1. TRANSMITTER PIN NUMBER, NAME AND DESCRIPTION

| Pin number | Pin name | Brief description |
|------------|----------|-------------------|
| 52 | SHIFT_IN | Input pin for SHIFT pulses received from the receiver. |
| 53 | ADJUST | Input pin for the ADJUST signal received directly from the receiver. |
| 54 | DATA_in | Single-ended input pin for data. |
| 55 | CLK_in | Single-ended input pin for the clock. |
| 56 | AUTO | Switches the transmitter between automatic adaptation and user control: AUTO = 1, automatic; AUTO = 0, user controlled. |
| 57 | EOC_user | User input for EOC, which is a pulse signal. This pin is always active, regardless of the state of AUTO. |
| 58 | RVB | Connection for the external CML biasing resistor. Nominal value approximately 23 k$\Omega$. |
| 59 | RVBias | Connection for the external DAC biasing resistor. Nominal value approximately 9 k$\Omega$. |
| 60 | RESET | Reset pin for the CMOS logic. Resets on a positive pulse, and should stay HIGH throughout operation. |
| 61 | DOUT+ | Differential data output. Polarity not important. |
| 62 | GND | Ground pin. |
| 63 | DOUT- | Differential data output. Polarity not important. |
| 64 | VDD | Supply voltage ($V_{DD,max}$ = 1.8 V). |

TABLE C.2. RECEIVER PIN NUMBER, NAME AND DESCRIPTION

| Pin number | Pin name | Brief description |
|------------|----------|-------------------|
| 44 | Din+ | Differential data input. Polarity not important. |
| 45 | GND | Ground pin. |
| 46 | Din- | Differential data input. Polarity not important. |
| 47 | $R_{\mathrm{Threshold}}$ | Connection for the threshold-adjusting resistor. See Figure C.2 when choosing a value. |
| 48 | $R_{\mathrm{Bias}}$ | Connection for the amplifier and comparator biasing resistor. Nominal value approximately 1.5 k$\Omega$. |
| 49 | ADJUST | Output pin for adjusting the transmitter filter taps. |
| 50 | VDD | Supply voltage ($V_{DD,max}$ = 1.8 V). |
| 51 | SHIFT_OUT | Output pin for controlling the transmitter and the pilot signal generator. |

## Biasing resistors

### Transmitter

Two separate pins are provided for external biasing of the current sources. The biasing was placed off-chip for better testability and flexibility. The on-chip CML circuits were designed for a total tail current of 250 $\mu$A, which can be adjusted by changing the RVB resistor; the designed resistor value is 23 k$\Omega$. Reducing the resistor value increases the CML tail currents, resulting in a higher on-chip voltage swing and improved performance.

The DAC current source can also be controlled off-chip by changing the RVBias resistor. The nominal value of 9 k$\Omega$ results in an LSB current of 78.125 $\mu$A and a most significant bit (MSB) current of 2.5 mA.

### Receiver

The receiver uses one additional pin for external biasing of the receiver amplifier and comparators. This adds flexibility, since the tail current directly affects the voltage gain of the amplifiers.

## Threshold voltage

The threshold voltage, which directly controls the vertical dimension of the receiver eye diagram, can be adjusted according to Figure C.2.
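The transmitter DAC figures quoted under the biasing resistors are consistent with a binary-weighted 6-bit current DAC: 32 LSB currents of 78.125 $\mu$A give exactly the 2.5 mA MSB current. The sketch below works through that arithmetic. The binary weighting is an assumption inferred from these two numbers and from the 6-bit DAC mentioned in the Appendix A listing, and the function names are hypothetical.

```python
# Sketch of the transmitter tap DAC arithmetic.
# Assumption: a binary-weighted 6-bit current DAC in which bit b carries
# 2**b LSB currents. Only the 78.125 uA LSB (at the nominal 9 kOhm RVBias)
# and 2.5 mA MSB figures come from the datasheet text.

N_BITS = 6
LSB_UA = 78.125  # LSB current at nominal RVBias (uA)


def bit_current_ua(bit):
    """Current (uA) switched by a single DAC bit."""
    if not 0 <= bit < N_BITS:
        raise ValueError("bit index out of range")
    return LSB_UA * 2 ** bit


def dac_current_ua(code):
    """Total DAC output current (uA) for an unsigned 6-bit code."""
    if not 0 <= code < 2 ** N_BITS:
        raise ValueError("code must fit in 6 bits")
    return sum(bit_current_ua(b) for b in range(N_BITS) if (code >> b) & 1)


print(bit_current_ua(N_BITS - 1))       # MSB alone: 2500.0 (2.5 mA)
print(dac_current_ua(2 ** N_BITS - 1))  # full scale: 4921.875
```

At the nominal RVBias the full-scale current would then be 63 x 78.125 $\mu$A, roughly 4.92 mA. Changing RVBias would presumably scale `LSB_UA`, and hence every code, in proportion.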
All pins not listed in Table C.1 and Table C.2 are used by other designers' circuits on the same MPW run.

Figure C.2. External threshold voltage adjustment to control the input voltage amplitude at the receiver.
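The decision threshold also sets the level at which the timing of each edge is read. As a hedged illustration, the sketch below reproduces the idea behind the DDJ histogram in the Appendix A listing: the instant at which each received edge crosses a threshold is found by linear interpolation between adjacent samples, and the spread of those instants across data patterns is the DDJ. The 0.05 V threshold and 100 ps sample spacing are taken from the listing. Counting only sample pairs that actually straddle the threshold (the listing filters on slope instead) and reducing the result to a single peak-to-peak value are choices made for this sketch.

```python
# Sketch of a DDJ estimate from interpolated threshold crossings.
# Assumptions: 0.05 V threshold and 100 ps sample spacing from the Appendix A
# listing; the straddle test and peak-to-peak reduction are our own choices.

THRESHOLD_V = 0.05
SAMPLE_PS = 100.0


def crossing_offsets(waveform, threshold=THRESHOLD_V, step_ps=SAMPLE_PS):
    """Offsets (ps) after the preceding sample at which each edge crosses the threshold."""
    offsets = []
    for v0, v1 in zip(waveform, waveform[1:]):
        if (v0 - threshold) * (v1 - threshold) < 0:  # samples on opposite sides: an edge
            offsets.append(step_ps * (threshold - v0) / (v1 - v0))
    return offsets


def ddj_peak_to_peak(waveforms, threshold=THRESHOLD_V, step_ps=SAMPLE_PS):
    """Peak-to-peak spread (ps) of all threshold crossings over all waveforms."""
    offsets = [t for w in waveforms for t in crossing_offsets(w, threshold, step_ps)]
    return max(offsets) - min(offsets) if offsets else 0.0


# Two made-up rising edges that reach different levels because of ISI.
edges = [[0.0, 0.1], [0.0, 0.08]]
print(ddj_peak_to_peak(edges))
```

The two example edges start from the same level but reach different final values, so their interpolated crossings land 12.5 ps apart; in the listing the same idea is applied to all 256 received patterns and plotted as a histogram.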