## UNIVERSITÉ DE MONTRÉAL

# PROVIDING BI-DIRECTIONAL, ANALOG, AND DIFFERENTIAL SIGNAL TRANSMISSION CAPABILITY TO AN ELECTRONIC PROTOTYPING PLATFORM

WASIM HUSSAIN

DEPARTMENT OF ELECTRICAL ENGINEERING, ÉCOLE POLYTECHNIQUE DE MONTRÉAL

Thesis presented in view of obtaining the degree of Philosophiae Doctor (Electrical Engineering)

December 2015

#### This thesis, entitled:

PROVIDING BI-DIRECTIONAL, ANALOG, AND DIFFERENTIAL SIGNAL TRANSMISSION CAPABILITY TO AN ELECTRONIC PROTOTYPING PLATFORM

presented by HUSSAIN Wasim in view of obtaining the degree of <u>Philosophiae Doctor</u>, has been duly accepted by the examination committee composed of:

- Mr. SAWAN Mohamad, Ph. D., chair
- Mr. SAVARIA Yvon, Ph. D., member and research supervisor
- Mr. BLAQUIÈRE Yves, Ph. D., member and research co-supervisor
- Mr. AUDET Yves, Ph. D., member
- Mr. NABKI Frédéric, Ph. D., external member

# **DEDICATION**

To my beloved parents...

#### ACKNOWLEDGEMENTS

I would like to express my deepest gratitude to my supervisors, Professor Yvon Savaria and Professor Yves Blaquière, for their insightful guidance and constant support throughout this research. I feel extremely privileged to have been able to work under their supervision. They provided me with a healthy research environment, with full freedom to develop my work as well as adequate supervision to lead me in the right direction. I am indebted to them.

I would like to express my deep gratitude to Gestion Technocap, the Natural Sciences and Engineering Research Council of Canada and the Mitacs program for supporting my research. I am grateful to CMC Microsystems for the products and services that facilitated this research (CAD tools by Cadence, fabrication services using 0.13 μm CMOS technology from IBM, and packaging services).

I am grateful to Omar Al Terkawi for his help with the CAD tools and with laying out the test chip. I thank Bryan Tremblant for helping me with the PCB.
I thank Rejean Lepage for keeping the lab computers up and running at all times, particularly before tape-out deadlines. Special thanks to Marie Yannick Laplante of the Electrical Engineering Department for being so friendly and supportive, even with last-minute requests. It has been a great pleasure for me to be a part of the Groupe de Recherche en Microélectronique et Microsystèmes (GRM).

Last but surely not least, I am grateful to my family for their endless care and support throughout the long path of my academic endeavour.

#### RÉSUMÉ

Field programmable interconnection networks (FPINs) are widely used in several well-known structures such as FPGAs, prototyping platforms, and many network-on-chip architectures. The aim of this thesis is to enhance current FPIN structures, and the prototyping platforms built on them, by adding further capabilities such as interfaces for open-drain bi-directional signals, analog signals, or differential signals. This thesis presents three different circuits that were implemented to that end. The interconnections of these three circuits can be reconfigured to support an open-drain bi-directional, analog, or differential interface, all through a uni-directional digital configurable interconnection network, or FPIN. The need for such interfaces first arose in the context of the WaferBoard, a reconfigurable prototyping platform for electronic systems. The core of the WaferBoard is a wafer-scale integrated circuit composed of a two-dimensional array of cells.
A large part of the available area is already taken up by the configurable I/Os (CIOs), the crossbar multiplexers of the FPIN, registers dedicated to the JTAG chain, and other control circuitry. It is therefore essential that the open-drain bi-directional, analog, and differential interfaces be as compact as possible. Since these interface circuits are intended for a wafer-scale platform, their architecture must be robust to variations in process, temperature, and power supply.

The first contribution of this thesis is the elaboration and design of an open-drain interface, together with its bi-directional interconnect support, using a uni-directional digital FPIN with single-ended (as opposed to differential) signalling. The proposed interface can interconnect several nodes of an FPIN. With this interface, the interconnection network can imitate the behaviour of an open-drain (or open-collector) bus (such as that used by the $I^2C$ protocol). Thus, several open-drain pins from many different integrated circuits (ICs) can be connected through the FPIN using the proposed interface. The interface was fabricated in a 0.13 µm CMOS technology and occupies an area of 65 µm × 22 µm per pin. Experimental results show that several instances of this interface can be interconnected using the proposed interconnect architecture. This architecture, combining six open-drain pins, was tested.
The propagation delays over this interconnect are approximately $0.26 \cdot n + 51$ ns and $0.26 \cdot n + 94$ ns for rising and falling edges respectively, when each pin has a 15 pF capacitive load at its output, where n is the number of connected interfaces. These delays, combined with the propagation delay of the FPIN, limit the maximum number of interfaces that can be connected simultaneously at a given communication speed. For example, the prototype interface can support more than 20 I<sup>2</sup>C Fast-mode Plus $(3.4 \, \text{Mbit/s})$ devices.

The second contribution of this doctoral thesis describes an analog interface comprising an analog-to-digital (A/D) converter (transmitter) and a digital-to-analog (D/A) converter (receiver) that allow an analog signal to propagate through an FPIN-based prototyping platform. The transmitting user integrated circuit (uIC) supplies the signal to the A/D converter. The latter converts the signal to the digital domain so that it can propagate through the FPIN to the receiver. A D/A converter on the receiving side performs the conversion that reproduces the original analog signal and passes it to the destination uIC. However, because the silicon-area constraints of the target prototyping platform are extreme, a very compact design was required for both converters. Oversampling converters cannot be used because of the performance they demand of their analog components (i.e., amplifiers, comparators, resistors, current sources, or capacitors), in addition to digital filtering that requires a relatively large silicon area.
The second contribution of this thesis therefore lies in the elaboration and development of a very compact circuit that uses an asynchronous version of a $\Delta$ modulator (asynchronous $\Delta$ modulator, ADM) to perform the analog-to-digital conversion. This converter is proposed as a means of transmitting an analog signal through a digital interconnection network. A detailed analysis of the A/D conversion mechanism of the ADM circuit is also presented in this thesis. A graphical analysis method was used to evaluate the oscillation frequency of the ADM in order to set the circuit parameters. Because the frequency spectrum of the modulating input signal is equivalent to the low-frequency spectrum of the ADM output, a simple low-pass filter can be used as the D/A converter to reconstruct the analog input signal. The ADM circuit was fabricated in a 0.13 µm CMOS technology. Measurements on the circuit show an SNR and SNDR of 57 dB and 47 dB respectively for a bandwidth of 2 MHz. The ADM occupies an active silicon area of $45 \,\mu\text{m} \times 22 \,\mu\text{m}$. The complete A/D and D/A converter pair draws a total current of 0.15 mA from a 3.3 V supply and occupies a total area of $45 \,\mu\text{m} \times 46 \,\mu\text{m}$. Compared with similar A/D converters, the ADM circuit supports moderate-bandwidth signals at medium resolution while occupying a very small silicon area.

A spatially reconfigurable differential interface was also developed to support current mode logic (CML) signal transmission through the single-ended uni-directional digital network of an FPIN-based platform (WaferBoard).
This interface was developed in collaboration with Olivier Valorge, a post-doctoral fellow at École Polytechnique de Montréal. Two types of input stage for the differential interface were investigated. The first type is based on a unity-gain buffer using multiplexers and was entirely developed and elaborated by Olivier Valorge. The input stage of this first circuit occupies a relatively large silicon area, which is why a second alternative was developed and elaborated by the author of this thesis to reduce the area cost of the input stage. This doctoral thesis therefore includes a third contribution: the development of a differential input stage based on pass-transistor multiplexers. This stage was laid out in a 0.13 µm CMOS technology, and post-layout validation was carried out to establish the feasibility of the concept. Complementary differential pins can be detected over a maximum area of $2 \,\mathrm{mm} \times 2 \,\mathrm{mm}$ ($1 \,\mathrm{mm} \times 1 \,\mathrm{mm}$ in the worst case) on the surface of the prototyping platform. Both proposed input stages use a configurable H-tree structure to balance the propagation of the differential signals. The input stage based on multiplexed unity-gain buffers can support a data rate of up to 2.5 Gbps with 200 mV of voltage swing under typical conditions compatible with the PCIe specification. In the other approach, the input stage uses pass-transistor multiplexers and can operate at up to 2 Gbps. However, the circuit occupies a smaller area (5%) compared with the first solution.

#### **ABSTRACT**

Field programmable interconnection networks (FPINs) are ubiquitously found embedded in field-programmable gate arrays (FPGAs), in prototyping platforms, and in many Network-on-Chip architectures.
The aim of this research was to augment the application domains of current FPIN-based prototyping and emulation platforms by supporting open-drain bi-directional signals, analog signals or differential signals. Three interface circuits have been elaborated and developed to that end in this thesis. These three interface circuits can support reconfigurable routing of open-drain bi-directional, analog and differential signals through a uni-directional digital FPIN. The need for such interface circuits was originally conceived in the context of the WaferBoard, a system prototyping platform. The core of the WaferBoard is a wafer-scale IC that is composed of a two-dimensional array of unit cells. Available area was already over-utilized by the configurable I/O (CIO) buffers, crossbar multiplexers of the FPIN, registers of the JTAG chain, and other control circuits. Thus, the interface circuits for open-drain bi-directional, analog and differential signalling had to be made very compact. As the implementation of these interface circuits targets "wafer-scale" integration, they also had to be very robust to parametric variations (process, temperature, power supply).

The first contribution of this thesis is the elaboration and development of an open-drain interface circuit and a corresponding interconnect topology to support bi-directional communication through the uni-directional digital FPIN of prototyping platforms. The proposed interface can interconnect multiple nodes in an FPIN. With that interface, the interconnection network imitates the behaviour of open-drain (or open-collector) buses (e.g., those following the I<sup>2</sup>C protocol). Thus, multiple open-drain I/Os from external integrated circuits (ICs) can be connected together through the FPIN by the proposed interface circuit. The interface, which has been fabricated in a $0.13 \,\mu m$ CMOS technology, takes $65 \,\mu m \times 22 \,\mu m$ per pin.
Test results show that several instances of this interface can be interconnected through the proposed interconnect topology. The interconnect topology combining six open-drain I/Os was implemented and tested. The interconnect has propagation delays of approximately $0.26 \cdot n + 51$ ns and $0.26 \cdot n + 94$ ns for rising and falling edge transitions respectively, when each pin has a capacitance of 15 pF, where n is the number of interconnected interfaces. These delays and the propagation delays of the FPIN limit the maximum number of interface circuits that can be interconnected for a given communication speed (e.g., I<sup>2</sup>C Fast-mode Plus with 3.4 Mbit/s). The prototype interface units can support more than 20 I<sup>2</sup>C Fast-mode Plus devices.

The second contribution relates to an analog interface circuit that comprises A/D (transmitter) and D/A (receiver) converters to support analog signal propagation through the digital FPIN of the prototyping platform. The transmitting user integrated circuit (uIC) provides the analog signal to the A/D converter. The A/D converter converts it into a digital format that can be propagated through the digital FPIN to the receiving side. The receiving side comprises a D/A converter that can reproduce the original analog signal and provide it to the receiving user integrated circuit. However, due to the stringent constraints on the available silicon area, a very compact implementation of the A/D and D/A converters was required for compatibility with the prototyping platform. Conventional Nyquist-rate and oversampled converters could not be utilized, because of their respective requirements of high-accuracy analog components (amplifiers, comparators, resistors, current sources or capacitors) and digital filtering, which require comparatively large silicon area.
Thus, the second contribution of this thesis is the elaboration and development of a compact circuit implementation of an asynchronous $\Delta$-modulator (ADM) for A/D conversion. This data converter was proposed as a means to propagate analog signals through digital interconnection networks. A detailed analysis of the A/D conversion mechanism of the proposed ADM circuit is presented in this thesis. A graphical method is used to analyze and evaluate the inherent oscillation frequency of the proposed ADM circuit in terms of its circuit parameters. Due to the equivalence of the spectrum of the modulating input signal and the low-frequency spectrum of the ADM output, a simple low-pass filter can be used as the D/A converter to reconstruct the input analog signal. The proposed ADM was fabricated in a 0.13 µm CMOS technology. Measurement results showed SNR and SNDR of 57 dB and 47 dB respectively for an input bandwidth of $2\,\mathrm{MHz}$. The ADM occupies an active area of $45\,\mu\mathrm{m} \times 22\,\mu\mathrm{m}$. The entire A/D and D/A converter pair consumes $0.15\,\mathrm{mA}$ from a $3.3\,\mathrm{V}$ supply and occupies an area of $45\,\mu\mathrm{m}\times46\,\mu\mathrm{m}$. Compared to other similar A/D converters, the proposed ADM supports moderate signal bandwidth and medium resolution, while requiring very small area.

A spatially reconfigurable differential interface was also developed to support current mode logic (CML) signal transmission through the single-ended digital FPIN of the prototyping platform (WaferBoard). It was developed in collaboration with Olivier Valorge, a post-doctoral fellow at Polytechnique Montréal. Two types of input stage for the differential interface were investigated. The first input stage, based on unity-gain buffer multiplexers, was developed and elaborated by Olivier Valorge. In that first circuit, the input stage occupied a relatively large silicon area.
An alternate input stage was therefore developed and elaborated by the author to reduce the cost of the first input stage. The third contribution of this thesis is thus the elaboration and development of a differential input stage based on pass-transistor multiplexers. This input stage was laid out in a 0.13 µm CMOS technology and post-layout simulation was used to validate the feasibility of the concept. Complementary pins of a differential pair spread over a maximum area of $2\,\mathrm{mm}\times2\,\mathrm{mm}$ ($1\,\mathrm{mm}\times1\,\mathrm{mm}$ in the worst-case scenario) on the surface of the prototyping platform can be supported. Both versions of the proposed input stage utilize configurable H-tree structures for balanced differential signal propagation. The input stage based on unity-gain buffer multiplexers can support data rates of up to $2.5\,\mathrm{Gbps}$, with $200\,\mathrm{mV}$ of voltage swing under typical conditions compatible with PCIe specifications. The input stage based on pass-transistor multiplexers can support data rates of up to $2\,\mathrm{Gbps}$ while occupying significantly less area (5%) compared to the unity-gain buffer based input stage.
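As a worked illustration of the open-drain interconnect delay figures quoted earlier in this abstract, the following minimal Python sketch evaluates the two linear approximations ($0.26 \cdot n + 51$ ns for rising edges and $0.26 \cdot n + 94$ ns for falling edges, with 15 pF per pin). The function and constant names are hypothetical and chosen only for this illustration. The sketch deliberately excludes the FPIN propagation delay, which the abstract names as an additional limit, so it should not be read as a way to derive the maximum number of supported devices.

```python
"""Evaluate the approximate delay model quoted in the abstract (illustration only)."""
from typing import Tuple

# Coefficients as quoted in the abstract, for a 15 pF load on each pin.
# The constant names are hypothetical, introduced only for this sketch.
SLOPE_NS_PER_INTERFACE = 0.26
RISING_OFFSET_NS = 51.0
FALLING_OFFSET_NS = 94.0


def interconnect_delays_ns(n: int) -> Tuple[float, float]:
    """Return (rising, falling) edge delays in ns for n interconnected interfaces."""
    if n < 1:
        raise ValueError("n must be a positive number of interfaces")
    rising = SLOPE_NS_PER_INTERFACE * n + RISING_OFFSET_NS
    falling = SLOPE_NS_PER_INTERFACE * n + FALLING_OFFSET_NS
    return rising, falling


if __name__ == "__main__":
    # FPIN propagation delay is NOT included here; it must be added separately.
    for n in (2, 6, 20):
        rising, falling = interconnect_delays_ns(n)
        print(f"n={n:2d}: rising ~ {rising:.2f} ns, falling ~ {falling:.2f} ns")
```

Under this model each added interface contributes only 0.26 ns, so the fixed offsets dominate: the falling edge stays 43 ns slower than the rising edge regardless of n.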
# TABLE OF CONTENT | DEDIC | ATION | |--------|---------------------------------------------------| | ACKNO | OWLEDGEMENTS iv | | RÉSUM | ſÉ | | ABSTR | ACT viii | | TABLE | OF CONTENT xi | | LIST O | F TABLE xv | | LIST O | F FIGURES xvi | | LIST O | F ABBREVIATION | | LIST O | F APPENDICES | | CHAPT | TER 1 INTRODUCTION | | 1.1 | Active Reconfigurable Board Overview | | 1.2 | Enhanced Programmable Devices (EPDs) | | 1.3 | Motivation | | | 1.3.1 Open-drain Bi-directional communication | | | 1.3.2 Analog Signal Communication | | | 1.3.3 Differential Signal communication | | 1.4 | High Level Objectives of This Research | | 1.5 | Organization of Thesis | | СНАРТ | TER 2 BACKGROUND INFORMATION AND RELATED PREVIOUS | | WO | RKS | | 2.1 | Active Reconfigurable Platform [1] | | | 2.1.1 WaferNet | | | 2.1.2 Silicon Area Constraints of WaferIC | | 2.2 | Bi-Directional Interface | | | 2.2.1 $I^{2}C$ Bus | | 2.3 | Analog Interface | | | 2.3.1 A | Analog to Digital Converter | 14 | |-------|------------------|----------------------------------------------------------------------|----| | | $2.3.2$ $\Delta$ | <b>\</b> -Modulator | 15 | | | $2.3.3$ $\Sigma$ | $\Delta \Delta \operatorname{Modulator} \ldots \ldots \ldots \ldots$ | 17 | | | 2.3.4 A | asynchronous $\Sigma\Delta$ Modulator | 22 | | 2.4 | Different | ial Interface | 24 | | | 2.4.1 C | Compatibility with the WaferBoard | 24 | | | 2.4.2 P | Physical and Electrical Constraints of Differential Signaling | 25 | | CHAPT | TER 3 O | RGANIZATION OF THESIS | 26 | | 3.1 | Organiza | ation of Thesis | 26 | | 3.2 | Article-1 | and Article-2 | 26 | | | 3.2.1 A | Article-1 (Chapter 4) | 26 | | | 3.2.2 A | Article-2 (Chapter 5) | 27 | | 3.3 | Article-3 | 6 (Chapter 6) | 27 | | 3.4 | Article-4 | (Chapter 7) and Chapter 8 | 28 | | | 3.4.1 A | Article-4 (Chapter 7) | 28 | | | 3.4.2 C | Chapter 8 | 29 | | СНАРТ | TER 4 A | ARTICLE 1: AN INTERFACE FOR I <sup>2</sup> C PROTOCOL IN WAFER- | | | BOA | $ARD^{TM}$ | | 30 | | 4.1 | Introduc | 
tion | 31 | | 4.2 | Backgrou | und | 31 | | 4.3 | Proposed | d Interface for I <sup>2</sup> C Compatibility | 33 | | 4.4 | Simulation | on Results | 37 | | 4.5 | Conclusi | on | 38 | | CHAPT | ΓER 5 A | RTICLE 2: AN INTERFACE FOR OPEN-DRAIN BI-DIRECTIONAL | | | COI | MMUNIC. | ATION IN FIELD PROGRAMMABLE INTERCONNECTION NET- | | | WO | RKS | | 39 | | 5.1 | Introduc | tion | 40 | | 5.2 | Backgrou | und | 42 | | | 5.2.1 A | Active Reconfigurable Platform [1] | 42 | | | 5.2.2 C | Open-drain Connection Based Communication | 42 | | 5.3 | Proposed | d Architecture of the Bi-Directional Interface | 42 | | | 5.3.1 V | Vorking Principle of the Bi-Directional Interface | 44 | | | 5.3.2 S | tate-Latching Phenomenon | 46 | | | 533 T | The Ring-Interconnection Network of the Ri-Directional Interface | 47 | | | 5.3.4 | Queue and Dual-Queue Interconnection Topologies | 53 | |------------|--------|------------------------------------------------------------------------------------------|-----| | | 5.3.5 | Proposed Bi-Directional Interface | 54 | | | 5.3.6 | Propagation Delay of Dual-Queue Interconnection Topology | 54 | | | 5.3.7 | Maximun Number of Interface Units in a Dual-Queue Interconnection | | | | | Topology | 58 | | 5.4 | Protot | type Test-Chip and Measurement Results | 59 | | | 5.4.1 | Design Specification of the Bi-directional Interface | 60 | | | 5.4.2 | Delay Characterization of the Bi-directional Interface from Post-Layout | | | | | Simulation | 61 | | | 5.4.3 | Test-chip and Test-bench Specifications | 63 | | | 5.4.4 | Measurement results from dual-queue topology with 8 interface units | 63 | | 5.5 | Concl | usion | 65 | | СНАРТ | ΓER 6 | ARTICLE 3: AN ASYNCHRONOUS DELTA-MODULATOR BASED | | | $A/\Gamma$ | CON | VERTER FOR AN ELECTRONIC SYSTEM PROTOTYPING PLAT- | | | FOI | RM | | 67 | | 6.1 | Introd | luction | 68 | | 6.2 | Backg | round | 69 | | | 6.2.1 | Field Programmable Interconnection Networks | 69 | | | 6.2.2 | The Target Application: Prototyping Platform [1] 
| 70 | | | 6.2.3 | Analog Interface Based on Asynchronous $\Sigma\Delta$ Modulation or $\Delta$ -Modulation | . 7 | | | 6.2.4 | Limitations of Existing Asynchronous $\Sigma\Delta$ Modulator Implementations | 72 | | 6.3 | Propo | sed Asynchronous $\Delta$ -Modulator | 74 | | | 6.3.1 | Proposed Asynchronous $\Delta$ -Modulator | 74 | | | 6.3.2 | Working Principle of the Proposed Asynchronous $\Delta$ -Modulator | 75 | | | 6.3.3 | Behavioral Simulation of the Asynchronous $\Delta$ -Modulator | 77 | | 6.4 | Propo | sed Analog Interface Circuit and Post-Layout Simulation Results | 79 | | | 6.4.1 | ADM-based Analog Interface | 79 | | | 6.4.2 | Implementation | 82 | | | 6.4.3 | Behavioral and Post Layout Simulation of the Asynchronous $\Delta$ -Modulator | 84 | | 6.5 | Protot | type Test-Chip and Measurement Results | 86 | | | 6.5.1 | Measurement Results | 86 | | | 6.5.2 | Comparison with Other Published A/D Converters | 89 | | 6.6 | Concl | usion | 93 | | СНАРТ | ΓER 7 | ARTICLE 4: A NOVEL SPATIALLY CONFIGURABLE DIFFEREN- | | TIAL INTERFACE FOR AN ELECTRONIC SYSTEM PROTOTYPING PLAT- | FOI | RM | | 96 | | | | 
|-------|----------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|--|--|--| | 7.1 | Introd | uction | 97 | | | | | 7.2 | 7.2 Background | | | | | | | | 7.2.1 | Compatibility with WaferBoard, a Prototyping Platform for Electronic | | | | | | | | Systems | 99 | | | | | | 7.2.2 | Physical and Electrical Constraints | 99 | | | | | 7.3 | Propo | sed Architecture and Circuit Implementation of the Differential Interface | 101 | | | | | | 7.3.1 | Propagation Network : WaferNet $^{\text{\tiny TM}}$ | 104 | | | | | | 7.3.2 | Input Stage | 104 | | | | | | 7.3.3 | H-Tree Input Differential Network | 105 | | | | | | 7.3.4 | Output Differential Network | 107 | | | | | 7.4 | Measu | red Results | 108 | | | | | 7.5 | Conclu | usion | 110 | | | | | CHAPT | ΓER 8 | PASS-TRANSISTOR MULTIPLEXER BASED DIFFERENTIAL IN | _ | | | | | PUT | ΓSTAC | E | 112 | | | | | 8.1 | Differe | ential Interface Based on Unity-Gain Buffer Multiplexer | 112 | | | | | 8.2 | Differe | ential Input Stage based on Pass-Transistor Multiplexer | 113 | | | | | | 8.2.1 | H-Tree Input Differential Network | 113 | | | | | | 8.2.2 | Differential-to-Single Ended Converter | 115 | | | | | 8.3 | Simula | ation Results | 117 | | | | | | 8.3.1 | Input stage with differential amplifier and differential output | 117 | | | | | | 8.3.2 | Input stage with current mode differential-to-single-ended converter $% \left( 1\right) =\left( 1\right) \left( 1\right) +\left( 1\right) \left( 
1\right) \left( 1\right) +\left( 1\right) \left( 1\right) \left( 1\right) \left( 1\right) \left( 1\right) +\left( 1\right) \left( \left($ | 121 | | | | | | 8.3.3 | Comparison between the input stage based on unity-gain buffer multi- | | | | | | | | plexer and the input stage based on pass-transistor multiplexer | 124 | | | | | 8.4 | Summ | ary of Contribution | 126 | | | | | СНАРТ | ΓER 9 | GENERAL DISCUSSION | 129 | | | | | 9.1 | Bi-Dir | rectional Interface | 129 | | | | | 9.2 | ADM- | based Analog Interface | 129 | | | | | 9.3 | Differe | ential Signal Interface | 130 | | | | | CONCI | LUSION | 1 | 131 | | | | | REFER | RENCE | S | 134 | | | | | APPEN | IDICES | | 142 | | | | # LIST OF TABLE | Table 2.1 | Area of an unit cell in a test-chip that was previously fabricated in | | |-----------|----------------------------------------------------------------------------------|-----| | | TowerJazz's 0.18 µm CMOS technology | 11 | | Table 4.1 | Corner Simulation | 38 | | Table 4.2 | Area of the interface | 38 | | Table 5.1 | Pull-down current of open-drain buses | 47 | | Table 5.2 | Different states with respect to the voltage level of the BDIO node | 47 | | Table 5.3 | Delays and rise/fall times of the interface circuit | 55 | | Table 5.4 | Design specification of the bi-directional interface in the test-chip ac- | | | | cording to $I^2C$ Fast-mode Plus protocol | 61 | | Table 5.5 | Characterization of the Interface Circuit Based on Post Layout Circuit | | | | Simulations | 62 | | Table 6.1 | The values of $e_{\rm tH}$ and $e_{\rm tL}$ from post-layout simulation | 83 | | Table 6.2 | Comparison with published compact oversampling $\mathrm{A}/\mathrm{D}$ converter | 92 | | Table 7.1 | Characteristics of the differential interface from post-layout simulation. 
- Table 7.2 Areas of the differential interface and their stages for one four-stage interface unit
- Table 8.1 Silicon area of the pass transistor based input stage
- Table 8.2 Comparison between the input stage based on unity-gain buffer multiplexer and the input stage based on pass-transistor multiplexer
- Table B.1 Timeline of the tasks leading to PhD

# LIST OF FIGURES

- Figure 1.1 Generic model of a field programmable interconnection network (FPIN) in an FPGA
- Figure 1.2 Conceptual overview of the active reconfigurable board
- Figure 1.3 Hierarchical description of the active reconfigurable board
- Figure 1.4 Schematic diagram and propagation path of an electrical signal between two uIC pins in contact with the CIO/Nanopads on the WaferIC. The electrical signal propagates from a *source* uIC to a *destination* uIC.
- Figure 1.5 Possible uses of enhanced programmable devices (EPDs)
- Figure 2.1 WaferNet showing the connections between neighboring 1, 2, 4, 8, 16 and 32 in all directions
- Figure 2.2 Architecture of a unit cell [1] with configuration registers, CIOs and crossbar multiplexers
- Figure 2.3 I<sup>2</sup>C Bus
- Figure 2.4 Connection of P82B96/PCA9600 I<sup>2</sup>C bus extension buffers
- Figure 2.5 $\Delta$-modulator
- Figure 2.6 Waveform at various nodes of the $\Delta$-modulation in Fig. 2.5(a)
- Figure 2.7 $\Sigma\Delta$ modulator
- Figure 2.8 Noise-shaping function for the $\Sigma\Delta$ modulator shown in Fig. 2.7(a)
- Figure 2.9 A third-order $\Sigma\Delta$ modulator
- Figure 2.10 Comparison between $\Sigma\Delta$ modulators of given technology process node versus area ($\mu m^2$)
- Figure 2.11 Comparison between $\Sigma\Delta$ modulators of given technology process node versus FoM (pJ)
- Figure 2.12 Asynchronous $\Sigma\Delta$ modulator
- Figure 2.13 Differential buffer structure
- Figure 4.1 WaferIC™ with ICs deposited: (a) top view, (b) cross section view
- Figure 4.2 I<sup>2</sup>C Bus
- Figure 4.3 (a) Proposed interface. (b) Two instances of the interface interconnected together. (c) The level sensing buffer to remove the latching problem. (d) Block diagram
- Figure 4.4 Full schematic of the interface
- Figure 4.5 Simulation result
- Figure 5.1 Generic model of an FPIN in an FPGA
- Figure 5.2 Hierarchical description of the active reconfigurable platform, from system level to configurable I/O (CIO)
- Figure 5.3 Example of an I<sup>2</sup>C-bus configuration
- Figure 5.4 Each *circle* represents an interface unit circuit
- Figure 5.5 Development of the bi-directional interface unit circuit
- Figure 5.6 Development of pseudo-ring interconnection topology. Each circle represents an interface unit circuit and is labelled IU#.
- Figure 5.7 Logical signal flow diagram. Low Detector module of each interface unit (IU#) is labelled LD.
  Each BDIO node belongs to the respective interface unit (IU) and represents distinct physical nodes
- Figure 5.8 Logical signal flow diagram of dual-queue interconnection topology. Two individual queue networks are joined together. Each queue network has five interface units. Four interface units (labelled IU#) are connected to external ODD and one *Master unit* (labelled MU). *Low Detector* module of each interface unit (IU#) is labelled as LD.
- Figure 5.9 Schematic of the interface unit (IU)
- Figure 5.10 Detailed transistor-level schematic of the bi-directional interface unit and micro-photograph of the die
- Figure 5.11 Dual-queue interconnection topology with 8 interface units implemented in the test-chip
- Figure 5.12 Measurement result of dual-queue interconnected network (shown in Fig. 5.11) from the test-chip
- Figure 6.1 Asynchronous $\Sigma\Delta$ modulator
- Figure 6.2 Generic FPIN model of a Field Programmable Interconnection Network.
- Figure 6.3 Hierarchical description of the reconfigurable board to CIO
- Figure 6.4 Different implementations of the generic asynchronous $\Sigma\Delta$ modulator.
- Figure 6.5 The proposed ADM
- Figure 6.6 Proposed linear s-domain model of the ADM shown in Fig. 6.5
- Figure 6.7 Evaluation of the oscillation frequency of the asynchronous $\Sigma\Delta$ modulator (ASDM)
- Figure 6.8 Input signal frequency is 1 MHz
- Figure 6.9 High-level simulation with Simulink® of the proposed ADM architecture model (Fig. 6.5). Input signal frequency is 1 MHz. Representative component noise sources are included but the RC-filter is ideal
- Figure 6.10 ADM-based analog interface
- Figure 6.11 Block diagram and waveform of the ADM-based analog interface
- Figure 6.12 Schematic of the X-OR gate
- Figure 6.13 The effect of mismatch between $e_{\rm tH}$ and $e_{\rm tL}$ on the SNDR
- Figure 6.14 Detailed block diagram of the proposed ADM-based analog interface.
- Figure 6.15 Schematic of the ADM & 3.3 V to 1.2 V converter (transmission side).
- Figure 6.16 Schematic of the D/A converter (receiving side)
- Figure 6.17 Simulation of the proposed ADM based on s-model in Simulink® and post-layout simulations in Cadence ($V_{\rm DD}=3.3~{\rm V}$) respectively. Simulink® model included the non-linearity of the filter
- Figure 6.18 Micro-photograph of the die (the die contained other circuits)
- Figure 6.19 Layout of the ADM and LPF to reconstruct the input signal
- Figure 6.20 Measured DC transfer characteristics from the test-chip ($V_{DD} = 3.3 \text{ V}$) of the asynchronous $\Delta$-modulator shown in Fig. 6.15
- Figure 6.21 Measured noise performances from the test-chip
- Figure 6.22 Simulation of the proposed ADM with a noisy power supply to mimic the actual test-bench scenario
- Figure 6.23 Comparison between $\Sigma\Delta$ modulators and the proposed $\Delta$-modulator on technology process node versus Area ($\mu m^2$) and FoM (pJ)
- Figure 6.24 Measurement result for input frequency of 500 kHz, 1 MHz, and 2 MHz from the test-chip.
- Figure 7.1 CML structure
- Figure 7.2 Conceptual overview of the active reconfigurable platform
- Figure 7.3 Differential pins of user's ICs interfacing with the NanoPad array of the WaferIC (zoom of Fig. 7.2(a)) [3]
- Figure 7.4 Architecture of the embedded differential propagation chain [3]
- Figure 7.5 The input differential configurable network
- Figure 7.6 Tiling of differential interface unit
- Figure 7.7 Continuous floor plan of the architecture with overlap between adjacent interface units. Each of the two shaded rectangles can be configured as a differential interface unit
- Figure 7.8 Schematic of the analog multiplexers
- Figure 7.9 Configurable differential-to-single-ended converter (the multiplexers in these two figures are digital multiplexers)
- Figure 7.10 Schematic of the interface output circuit in each NanoPad
- Figure 7.11 Test-chip
- Figure 7.12 Measured DC transfer characteristics ($V_{DD}=3.3~V$) of the analog multiplexers shown in Fig. 7.8
- Figure 7.13 Measured eye diagrams at different data rates from test-chip
- Figure 8.1 Architecture and floor plan of the pass transistor multiplexer based differential input stage
- Figure 8.2 Pass-transistor based model
- Figure 8.3 Differential amplifier with differential output
- Figure 8.4 Schematic of the differential input stage and differential amplifier with differential output in stage-4
- Figure 8.5 Proposed differential-to-single ended converter
- Figure 8.6 Schematic of the differential input stage with the differential-to-single ended converter at Stage-4
- Figure 8.7 Layout
- Figure 8.8 2 GHz output ($v_{\mathrm{OUT4+}}$ and $v_{\mathrm{OUT4-}}$ in Fig. 8.4) of stage-4 multiplexer from Monte Carlo **mismatch variation** (typical-typical) simulation. Input ($v_{\text{IN-}}$ and $v_{\text{IN+}}$ in Fig. 8.4) common-mode voltage = 1.0 V
- Figure 8.9 2 GHz output ($v_{\mathrm{OUT+}}$ and $v_{\mathrm{OUT-}}$ in Fig. 8.4) of stage-4 fully differential amplifier from Monte Carlo **mismatch variation** (typical-typical) simulation. Input ($v_{\text{IN-}}$ and $v_{\text{IN+}}$ in Fig. 8.4) common-mode voltage = 1.0 V.
- Figure 8.10 2 GHz output ($v_{\rm OUT+}$ and $v_{\rm OUT-}$ in Fig. 8.4) of Stage-4 fully differential amplifier from Monte Carlo **process variation** (typical-typical) simulation. Input ($v_{\text{IN-}}$ and $v_{\text{IN+}}$ in Fig. 8.4) common-mode voltage = 1.0 V.
- Figure 8.11 2 GHz output ($v_\mathrm{OUT}$ in Fig. 8.6) of the differential-to-single-ended converter from Monte Carlo **mismatch variation** (typical-typical) simulation.
  Input ($v_{\text{IN-}}$ and $v_{\text{IN+}}$ in Fig. 8.6) common-mode voltage = 1.2 V.
- Figure 8.12 2 GHz output ($v_\mathrm{OUT}$ in Fig. 8.6) of the differential-to-single-ended converter from Monte Carlo **mismatch variation** (typical-typical) simulation. Input ($v_{\text{IN-}}$ and $v_{\text{IN+}}$ in Fig. 8.6) common-mode voltage = 1.6 V.
- Figure 8.13 1 GHz output ($v_\mathrm{OUT}$ in Fig. 8.6) of the differential-to-single-ended converter from Monte Carlo **mismatch variation** (typical-typical) simulation. Input ($v_{\text{IN-}}$ and $v_{\text{IN+}}$ in Fig. 8.6) common-mode voltage = 1.2 V.
- Figure 8.14 1 GHz output ($v_\mathrm{OUT}$ in Fig. 8.6) of the differential-to-single-ended converter from Monte Carlo **mismatch variation** (typical-typical) simulation. Input ($v_{\text{IN-}}$ and $v_{\text{IN+}}$ in Fig. 8.6) common-mode voltage = 2.0 V.
- Figure 8.15 Schematic of the analog multiplexers simulated in IBM 0.13 µm CMOS technology
- Figure 8.16 The signal path consisting of 3 multiplexer stages that was simulated in IBM 0.13 µm CMOS technology
- Figure 8.17 2 GHz output ($v_{\mathrm{OUT}+/-}$ in Fig. 8.16) of the stage-3 multiplexer from Monte Carlo **mismatch variation** (typical-typical) simulation. Input ($v_{\text{IN}+/-}$ in Fig. 8.16) common-mode voltage = 1.6 V and voltage swing $V_{\text{P-P}}=800\,\text{mV}$ was used
- Figure A.1 The ASDM used in the proposed analog interface
- Figure A.2 Waveform of the hysteresis input assuming $v_{\rm in}$ is a DC value and $0 < v_{\rm in} < V_{\rm DD}$
- Figure A.3 Simulated frequency of oscillation ($V_{DD} = 3.3 \text{ V}$) of the asynchronous $\Sigma\Delta$ modulator shown in Fig. 6.15 for different $v_{\rm in}$
- Figure C.1 Test chip
- Figure C.2 Bonding diagram of test chip. Package type: CQFP44A
- Figure C.3 Pin assignment of the test chip
- Figure C.4 Layout of test chip
- Figure C.5 Test setup of the bi-directional interface
- Figure C.6 Test setup of the ASDM

#### LIST OF ABBREVIATIONS

- **ADC** analog-to-digital converter.
- **ADM** asynchronous $\Delta$-modulator.
- **ASDM** asynchronous $\Sigma\Delta$ modulator.
- **ATCA** advanced telecom computing architecture.
- **CIO** configurable I/O.
- **CLB** configurable logic block.
- **CML** current mode logic.
- **CT-$\Sigma\Delta$** continuous-time $\Sigma\Delta$ modulator.
- **DAC** digital-to-analog converter.
- **DDC** display data channel.
- **DLL** delay-locked loop.
- **ENOB** effective number of bits.
- **EPD** enhanced programmable device.
- **FoM** figure of merit.
- **FPGA** field-programmable gate array.
- **FPIC** field programmable interconnection chip.
- **FPIN** field programmable interconnection network.
- **GPIO** general purpose input output.
- **HSTL** high-speed transceiver logic.
- **I<sup>2</sup>C** Inter-Integrated Circuit.
- **IC** integrated circuit.
- **IPMI** intelligent platform management interface.
- **LVDS** low voltage differential signalling.
- **LVPECL** low voltage positive-emitter-coupled logic.
- **OSR** oversampling ratio.
- **PCB** printed circuit board.
- **PLL** phase-locked loop.
- **PMBus** power management bus.
- **SFDR** spurious-free dynamic range.
- **SMBus** system management bus.
- **SNDR** signal-to-noise and distortion ratio.
- **SNR** signal-to-noise ratio.
- **SQNR** signal-to-quantization noise ratio.
- **SSI** stacked silicon interconnect.
- **TSV** through-silicon via.
- **TTL** transistor-transistor logic.
- **uIC** user integrated circuit.
- **VCO** voltage controlled oscillator.

## LIST OF APPENDICES

- APPENDIX A
- APPENDIX B Time Line
- APPENDIX C Details of Test Chips

#### CHAPTER 1 INTRODUCTION

Following Moore's law, semiconductor technology scaling has ushered in the remarkable progress of microelectronic integration over the past four decades. Every technology generation, introduced every two to three years, has doubled the transistor count per chip, increased the operating frequency by 43%, and reduced the switching energy consumption by 65% on average [4]. Very complex systems with programmability at the user end have been made possible by leveraging such an unprecedented increase in logic density. A very successful class of configurable integrated circuits enabled by these trends is the field-programmable gate array (FPGA). Field programmable interconnection networks (FPINs) are the backbone of emulation and prototyping platforms, such as FPGAs, the ZeBu Server [5], the Veloce verification system [6], and the Cadence® Palladium® series of accelerators/emulators [7], as well as of many Network-on-Chip architectures [8].
FPINs provide reconfigurable interconnections between various endpoints, e.g. the configurable logic blocks (CLBs) in FPGAs. Any digital hardware can be emulated in FPGAs partly by reconfiguring their embedded FPINs. However, reconfigurability of the FPINs sometimes entails long interconnects between endpoints that result in excessive propagation delays. Buffers are typically inserted along these on-chip resistive interconnects to ensure fast signal propagation and a signal delay that increases linearly with distance. Because each buffer drives in a single direction, interconnects are uni-directional once configured.

Systems used for digital hardware emulation can enhance their capability and performance by having programmable interconnections between FPGAs. Commercial logic emulation systems, such as the REALIZER SYSTEM [9], use programmable interconnection devices between FPGAs. These devices are called field programmable interconnection chips (FPICs). Fig. 1.1 illustrates an example where an FPIN provides programmable interconnections between endpoints (I/O or configurable logic blocks) in an FPGA.

Figure 1.1 Generic model of a field programmable interconnection network (FPIN) in an FPGA.

Modern FPGAs, such as those from Xilinx's Virtex-7 family [10], can emulate circuits with up to 2 000 000 logic gates, and that maximum complexity keeps growing. Some of the large FPGA chips are internally implemented as several smaller connected FPGA dies. By combining through-silicon vias (TSVs) and microbump technology, Xilinx has developed a stacked silicon interconnect (SSI) technology that is the foundation of Virtex-7 FPGAs [11]. Xilinx notably uses a passive silicon interposer to combine multiple FPGA Super Logic Regions (SLRs). Instead of creating a 3D stack, the FPGAs are put side-by-side on a passive interposer, and hence this technology is called 2.5D integration. The interposer is built using a low-risk, high-yield 65 nm process with four passive layers of metallization.
It provides tens of thousands of die-to-die connections to enable ultra-high interconnect bandwidth, lower power consumption, and one fifth the latency of standard I/Os [11]. This passive interposer does not contain any transistor and hence it is claimed to be a low-risk and low-cost device that does not introduce TSV-related performance degradations [12].

#### 1.1 Active Reconfigurable Board Overview

In addition to FPGAs, an example of an FPIN-based circuit targeted in this thesis is the active reconfigurable platform named WaferBoard that was proposed in [1]. It was developed as a prototyping platform that provides interconnections among multiple user integrated circuits (uICs) to test and prototype electronic systems. This reconfigurable platform can be easily extended as an *active* silicon interposer because, unlike the aforementioned passive interposer of Xilinx, its interconnection network can be dynamically configured like an FPGA. It has a uni-directional, switch-box-based FPIN that can be programmed by the user to interconnect the component uICs. The reconfigurable platform is primarily designed to provide digital interconnections between component uICs. However, as originally proposed, this FPIN-based prototyping platform did not support open-drain bi-directional signals (notably used in the I<sup>2</sup>C protocol and its derivatives [13–16]), analog signals, or differential signals.

Figure 1.2 Conceptual overview of the active reconfigurable board.

Figure 1.3 Hierarchical description of the active reconfigurable board.

The core of the active reconfigurable board is a wafer scale IC (called WaferIC [1]) upon which user component ICs or uICs are to be deposited. The surface and cross-section of the active reconfigurable board are shown in Fig. 1.2(a) and Fig. 1.2(b), respectively. The surface of the wafer scale IC has a dense array of very fine (tens of microns) conducting pads acting as configurable I/Os (CIOs), as shown in Fig. 1.3.
These CIOs are called NanoPads in [1]. An FPIN is embedded in the wafer scale IC. The FPIN can be configured, similar to an FPGA, to connect any two CIOs. uIC pins are to have physical contacts with the CIOs and communicate through the embedded FPIN as shown in Fig. 1.4. Each CIO has its own configurable I/O buffers. If a CIO is to operate as an input, then its buffer is configured as an input buffer; it receives the signal from a *source* uIC and propagates it through the FPIN to other destination CIOs. The destination CIO's buffer is configured as an output buffer and it propagates the signal to the corresponding *destination* uIC.

Figure 1.4 Schematic diagram and propagation path of an electrical signal between two uIC pins in contact with the CIO/Nanopads on the WaferIC. The electrical signal propagates from a *source* uIC to a *destination* uIC.

### 1.2 Enhanced Programmable Devices (EPDs)

Demands for increased density, higher bandwidths, and lower power pushed IC designs toward 3D IC encapsulation. 3D ICs are manufactured by stacking multiple silicon wafers and/or dies and interconnecting them vertically. Passive silicon interposers with TSVs and several metallization layers are used to align the micro-beads of each silicon die. These dense interconnections between chip layers present a significant challenge to alignment, testing and diagnosis. Enhanced programmable devices (EPDs) are conceived as active silicon interposers that can provide, in addition to configurable interconnections, enhancements in terms of testability and diagnosis. A possible basic structure of an EPD is shown in Fig. 1.5. EPDs are intended to support as many types of signal interfaces as possible. As envisioned, similar to the aforementioned WaferBoard [1], an EPD surface has a dense array of CIOs and each CIO is connected to an internal FPIN that can be configured to connect a CIO to any other.
#### 1.3 Motivation

Configurability of FPINs is extensively utilized in FPGAs, in prototyping platforms, and in many network-on-chip architectures. This thesis work was motivated by the observation that the application domains of such FPIN-based prototyping and emulation platforms could be significantly broadened by supporting open-drain bi-directional signals, analog signals, and differential signals. Three interface circuits were thus elaborated and developed to support reconfigurable routing of open-drain bi-directional, analog or differential signals through a uni-directional digital FPIN. Even though the need for such interface circuits was originally conceived in the context of the WaferBoard [1], the developed interface circuits can be integrated in or with FPGAs, active silicon interposers (*i.e.* EPDs) or any platform that embeds a digital FPIN.

Figure 1.5 Possible uses of enhanced programmable devices (EPDs).

#### 1.3.1 Open-drain Bi-directional communication

Multi-master bi-directional communication is widely used in electronic communication systems. It is used between low-speed peripherals and a motherboard, in embedded systems, in cellphones, and in many other electronic devices. The most widely used bi-directional bus is the I<sup>2</sup>C bus. Several other communication standards are derived from the I<sup>2</sup>C protocol [15, 16]. Some of these I<sup>2</sup>C-derived standards are:

1. System management bus (SMBus) [16].
2. Power management bus (PMBus) [17].
3. Intelligent platform management interface (IPMI) [18].
4. Display data channel (DDC) [19].
5. Advanced telecom computing architecture (ATCA) [20].

The main principle of the I<sup>2</sup>C protocol is that it is an open-drain (or open-collector) bus. All the derived protocols depend on the "wired AND" property of open-drain (or open-collector) connections.
FPINs cannot *directly* support such "wired AND" connections, because each interconnection link is established by uni-directional binary digital signaling. Additional interface circuits are required at the CIOs or endpoints to support "wired AND" connections to the outside world, while the internal connections inside the FPIN are established by uni-directional digital buffers and switch boxes (multiplexers).

#### 1.3.2 Analog Signal Communication

The WaferBoard [1] was primarily developed to prototype digital electronic systems. However, nowadays many electronic systems are at least partly mixed-signal systems. Having the ability to reconfigurably route analog signals through the embedded FPIN can greatly improve the versatility of the WaferBoard or any electronic system prototyping platform. In the context of the WaferBoard, A/D (transmitter) and D/A (receiver) converters are required to support analog signal propagation through the digital FPIN. The A/D converter receives the input analog signal from the transmitting user integrated circuit (uIC) and converts it into a digital format that can be propagated through the digital FPIN to the receiving side. This receiving side must comprise a D/A converter that can reproduce the original analog signal and provide it to the receiving uIC. However, due to the stringent limitations on the available silicon area, very compact implementations of the A/D and D/A converters are required for compatibility with the WaferBoard. Thus, the author was motivated to find a compact A/D and D/A converter solution that can support reconfigurable routing of analog signals within the existing constraints of the WaferBoard [1].

#### 1.3.3 Differential Signal communication

Differential signaling is widely used in high speed data transmission. It sends an electrical signal and its complement as a differential pair of signals through two conductors.
External electromagnetic interferences tend to affect both conductors similarly, and the receiving end only detects the difference between the conductors. Thus, differential signaling mitigates the common-mode electromagnetic coupling that affects single-ended signaling. Standards currently in use for differential signaling include, for instance, low voltage differential signalling (LVDS), low voltage positive-emitter-coupled logic (LVPECL), current mode logic (CML), and high-speed transceiver logic (HSTL) [21].

A differential interface must achieve spatial reconfigurability to support differential signaling in an FPIN-based prototyping platform such as the WaferBoard [1]. As uICs can be randomly placed on the active surface of the WaferBoard, the corresponding "contacted" CIOs/NanoPads can have random physical locations. The differential interface must have the ability to support such randomly located CIOs/NanoPads and still maintain the required symmetry and signal integrity of high-speed differential signaling. The operating speed of standard differential signaling is higher than the target operating speed of the prototyping platform. The primary purpose of the differential interface in the prototyping platform is therefore versatility rather than support for the highest possible speed.

#### 1.4 High Level Objectives of This Research

The aims of this research are:

1. To develop an interface that can support open-drain interconnection based bi-directional buses (such as I<sup>2</sup>C) in any digital FPIN-based prototyping platform.
2. To develop an interface that can support analog signal transmission in the WaferBoard [1] according to WaferBoard's constraints (see Sec. 2.1.2).
3. To develop an interface that can support differential signal transmission through the single-ended digital FPIN of the WaferBoard.

#### 1.5 Organization of Thesis

This thesis is organized as follows.
Chapter 2 presents background information and related previous works, with a brief overview of the prototyping platform [1], open-drain interconnection based bi-directional buses, analog-to-digital converters (ADCs), digital-to-analog converters (DACs), and differential signalling. Chapter 3 presents the detailed organization of the thesis while Chapters 4–8 constitute the core of the thesis. Chapter 9 presents a general discussion on the entire thesis. The contributions from this thesis are finally summarized and possible future works are discussed in the Conclusion.

# CHAPTER 2 BACKGROUND INFORMATION AND RELATED PREVIOUS WORKS

A brief review of the FPIN-based prototyping platform [1] will help the reader understand the topics discussed in this thesis. Characteristics of open-drain, analog, and differential signalling are reviewed from the perspective of the prototyping platform to support the explanation of the proposed solutions. Sec. 2.1 describes the prototyping platform and the constraints it imposes on interface circuits. Sec. 2.2 presents an overview of open-drain interconnection based bi-directional buses. Sec. 2.3 presents different types of A/D and D/A converters, and considers their feasibility for the digital FPIN-based prototyping platform [1]. Sec. 2.4 presents various constraints of differential signaling in the target environment of the digital FPIN-based prototyping platform [1].

#### 2.1 Active Reconfigurable Platform [1]

The WaferIC shown in Fig. 1.3 is the core of an active reconfigurable platform [1]. Component uICs are to be placed on the surface of the WaferIC. The building block of the WaferIC is called a *unit cell*. Each unit cell contains an array of $4 \times 4$ CIOs, I/O buffers, routing and control circuitries, and a multiplexer-based crossbar. This crossbar routes incoming signals to one of the CIOs, belonging to itself or to other unit cells.
When a signal is propagated through the WaferIC, it is routed from unit cell to unit cell until it reaches the destination CIO and the corresponding uIC pin [1]. The dimension of each unit cell is $560 \,\mu\text{m} \times 560 \,\mu\text{m}$ [1]. These cells are tiled within a reticle, and the WaferIC is built from repetition of this reticle across the entire wafer.

#### 2.1.1 WaferNet

Each unit cell is connected to an embedded digital FPIN, called WaferNet, as shown in Fig. 2.1, that can be configured to connect any two unit cells, whatever their positions are. In other words, any two CIOs belonging to any two unit cells can be connected by the WaferNet. Each unit cell has connections with the unit cells that are 1, 2, 4, 8, 16, and 32 unit cells away in all four directions, as shown in Fig. 2.1. A uIC pin (or solder ball) can be in contact with several CIOs. The WaferIC detects and maps the contacted pins, and a netlist is generated according to the required connections. Then the WaferNet is configured to provide all required connections between the uICs according to the netlist [1].

Figure 2.1 WaferNet showing the connections between neighboring 1, 2, 4, 8, 16 and 32 in all directions.

The routing of signals through the WaferNet, from one unit cell to another, is accomplished by digital multiplexer-based crossbars. Each CIO has its own buffers. If a CIO is to operate as an input, then the respective I/O buffer is configured accordingly and this buffer receives the signal from a uIC and propagates it to the embedded crossbar of the local unit cell to which it belongs. The crossbar routes the signal to the link pointing toward the destination CIO's unit cell. Since the signal path can make a "jump" of only 1, 2, 4, 8, 16, or 32 unit cells at a time, it usually takes several jumps in all four directions for the signal to reach some arbitrary destination unit cell. At the destination cell, the CIO buffer is configured as an output buffer.
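
The jump rule described above can be illustrated with a short sketch. It assumes a defect-free grid addressed by integer (column, row) unit-cell coordinates and a greedy, dimension-ordered choice of jumps. The names `greedy_jumps` and `route` are hypothetical, the result is not claimed to be the shortest possible path, and the actual WaferNet routing algorithm of [1] is not reproduced here.

```python
# Link lengths available from every unit cell, as stated in the text (in unit cells).
JUMPS = (32, 16, 8, 4, 2, 1)


def greedy_jumps(offset):
    """Split a signed 1-D offset (in unit cells) into signed jumps, largest first."""
    sign = 1 if offset >= 0 else -1
    remaining = abs(offset)
    hops = []
    for jump in JUMPS:
        # Reuse the same link length while it still fits (needed beyond 63 cells).
        while remaining >= jump:
            hops.append(sign * jump)
            remaining -= jump
    return hops


def route(src, dst):
    """Hypothetical dimension-ordered route: all x jumps first, then all y jumps."""
    dx = dst[0] - src[0]
    dy = dst[1] - src[1]
    return [("x", h) for h in greedy_jumps(dx)] + [("y", h) for h in greedy_jumps(dy)]


if __name__ == "__main__":
    hops = route((0, 0), (45, -3))
    print(hops)
    print("number of jumps:", len(hops))
```

In this example an offset of 45 cells in x and 3 cells in the negative y direction takes six jumps, which is consistent with the remark that reaching an arbitrary destination usually takes several jumps.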
All I/O buffers must be explicitly configured, either as an input or an output buffer. Normally this configuration is done before "testing" or "prototyping" with uICs. After CIOs are configured, the state remains unchanged during the entire period of testing or prototyping operation.

#### 2.1.2 Silicon Area Constraints of WaferIC

The functional architecture of the unit cell is depicted in Fig. 2.2. Any additional feature must fit within the unit cell's dimension of $560 \,\mu\text{m} \times 560 \,\mu\text{m}$, which translates into an area of $313600 \,\mu\text{m}^2$. This area is already overstretched by the CIO buffers, crossbar multiplexers, configuration registers of the JTAG chain, and other control circuitry. Thus, any additional circuitry must be very compact, as something else needs to be compressed accordingly. Table 2.1 shows the area usage of each block of a unit cell in a test-chip that was previously fabricated in TowerJazz's<sup>1</sup> 0.18 µm CMOS technology. From the experience gathered during the design of this test-chip, it appears that any additional feature can consume at most 2-3% of the total area of the unit cell. Previous work [22] from other members of the DreamWafer team led to an analog interface circuit based on the frequency modulation of ring oscillator based voltage controlled oscillators (VCOs). The analog interface circuit occupied $4350 \,\mu\text{m}^2$, which represents $\approx 1.4\%$ of the total area of a unit cell.

Figure 2.2 Architecture of a unit cell [1] with configuration registers, CIOs and crossbar multiplexers.

An important consideration is that the interface must be integrated at the "wafer-scale". Thus, the circuits must be very robust against process variation and, if possible, defect tolerant.
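
The area figures quoted in this section can be checked with a few lines of arithmetic. The sketch below only restates numbers given in the text (the 560 µm cell side, the 2-3% budget, and the 4350 µm² analog interface of [22]); the variable names are illustrative.

```python
CELL_SIDE_UM = 560            # unit cell side, from the text
ANALOG_INTERFACE_UM2 = 4350   # area of the VCO-based analog interface of [22]

# Total unit-cell area in square micrometres.
cell_area_um2 = CELL_SIDE_UM ** 2

# Area available to an additional feature under the 2-3% rule of thumb.
budget_um2 = (cell_area_um2 * 2 // 100, cell_area_um2 * 3 // 100)

# Share of the unit cell taken by the earlier analog interface, in percent.
analog_share_pct = 100 * ANALOG_INTERFACE_UM2 / cell_area_um2

print(cell_area_um2, budget_um2, round(analog_share_pct, 2))
# prints: 313600 (6272, 9408) 1.39
```

Under these numbers, an added interface has roughly 6300 to 9400 µm² to work with, and the earlier analog interface sits well inside that range.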
Redundancy at the architectural level [1] has been used to isolate physical defects in WaferIC.

<sup>1.</sup> http://www.towerjazz.com/

Table 2.1 Area of a unit cell in a test-chip that was previously fabricated in TowerJazz's 0.18 μm CMOS technology.

| Block | Area per instance (μm²) | Instances per unit cell | Required area (μm²) |
|-----------------------------------------------|------|-----|--------|
| Configuration registers of the JTAG chain | 96 | 490 | 46944 |
| CIO buffers and power supply $^a$ | 8892 | 16 | 142272 |
| Analog interface $^b$ | 4350 | 1 | 4350 |
| Crossbar multiplexers and other logic $^c$ | _ | _ | 120000 |
| Total area | | | 313600 |

a. CIO buffers were clustered with the power supply circuit because CIOs were designed to provide power supply to uICs. The power supply comprised bandgap circuits and D/A converters used as a digitally controlled variable power supply.

b. This analog interface was introduced in [22]. It will be described in Sec. 2.3.

c. These circuits were distributed throughout the unit cell.

#### 2.2 Bi-Directional Interface

During the 1980s, as electronic systems became more complex with many peripheral connections, connecting every component directly became too complicated because it required a large number of printed circuit board (PCB) traces and general-purpose input/output (GPIO) pins, notably on microprocessors. A multi-master bi-directional protocol was required to solve this problem, where every entity can send and receive data through a "single" physical line. The Inter-Integrated Circuit (I<sup>2</sup>C) bus is such a communication standard.

#### 2.2.1 I<sup>2</sup>C Bus

The I<sup>2</sup>C protocol is a multi-master bidirectional serial bus developed by Philips [14]. This communication standard is used in various control architectures such as the System Management Bus (SMBus) [16], the Power Management Bus (PMBus) [17], the Intelligent Platform Management Interface (IPMI) [18], the Display Data Channel (DDC) [19], and the Advanced Telecom Computing Architecture (ATCA) [20]. I<sup>2</sup>C [14] uses two bidirectional open-drain (or open-collector) lines named Serial Data Line (SDA) and Serial Clock Line (SCL). Both lines have external pull-up resistors. In the I<sup>2</sup>C protocol (shown in Fig. 2.3), when a component wants to output HIGH, the output driver does not drive an explicit HIGH. Rather, the output driver releases the bus and an external common pull-up resistor pulls the bus up to $V_{\rm DD}$. When a component wants to output a LOW, it explicitly drives the bus to LOW, which works because the pull-down capability of the driver is much stronger than the pull-up resistor.

Figure 2.3 I<sup>2</sup>C Bus.

The I<sup>2</sup>C protocol has no explicit signal to specify the direction of data on the bus. Rather, some rules are embedded in the protocol, such as *clock synchronization*, *arbitration*, and *clock stretching* [14], by which an I<sup>2</sup>C driver connected to a bus "realizes" when it is allowed to write to the bus, read from the bus, or stay idle. All those rules are based on the "wired AND" property of open-drain connections: if even one of the connected drivers outputs a LOW, the bus becomes LOW. In the I<sup>2</sup>C protocol, there is no *single* master controller. It is a multi-master bus where any one of the connected components can assume the role of master and can control the direction of the data. Such a communication protocol cannot be supported by the previously reported version of the WaferIC [1].
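The "wired AND" property and the arbitration rule that relies on it can be illustrated with a short behavioural model. This is a minimal sketch under simplifying assumptions (ideal logic levels, bit-synchronous masters, no timing); the function names `bus_level` and `arbitrate` are hypothetical and are not taken from the I<sup>2</sup>C specification or from this thesis.

```python
# Behavioural sketch of an open-drain ("wired AND") line.
# Assumptions: ideal logic levels, no timing, masters are bit-synchronous.
# Function names are hypothetical.

def bus_level(released):
    """Return 1 (HIGH) only if every driver releases the line, else 0 (LOW).

    released[i] is 1 when driver i releases the line (wants HIGH)
    and 0 when driver i pulls the line LOW.
    """
    return int(all(released))


def arbitrate(bitstreams):
    """Bitwise arbitration between masters sending equal-length bit lists.

    A master that releases the line (sends 1) but reads back 0 has lost
    arbitration and stops driving. Returns (bus_bits, surviving_masters).
    """
    active = list(range(len(bitstreams)))
    bus = []
    for k in range(len(bitstreams[0])):
        level = bus_level([bitstreams[i][k] for i in active])
        bus.append(level)
        # Masters whose transmitted bit differs from the bus level back off.
        active = [i for i in active if bitstreams[i][k] == level]
    return bus, active


if __name__ == "__main__":
    bus, winners = arbitrate([[1, 0, 1, 1], [1, 0, 0, 1]])
    print("bus bits:", bus, "surviving master(s):", winners)
```

In this model the master that sends a 0 while the other sends a 1 keeps the bus, without either side needing a direction signal. This is the behaviour that any interface emulating an open-drain line has to preserve.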
The I<sup>2</sup>C bus is not a synchronous communication system. Even though each transmission between two components is controlled by a clock signal, that clock is provided by the respective master. From the perspective of the entire system, there is no master clock. Thus, any interface circuit at the CIO of the unit cell in WaferIC that mimics a "wired AND" interconnection to the outside world cannot rely on a master clock to periodically check the voltage level at the corresponding CIOs. The interface circuit must be asynchronous in the sense that, whenever an external I<sup>2</sup>C driver pulls the voltage level down to LOW, the interface must immediately detect it and send the information to the other connected interface units. This detection and transmission is not challenging. The challenge is that sending a LOW signal to the other interface units makes their corresponding CIOs LOW, and they subsequently send LOW signal(s) back. Thus, a "state-latching" phenomenon occurs: even when the external I<sup>2</sup>C driver releases the corresponding CIO, the CIO voltage might still be held down to LOW. If a master clock could have been used to control all the interface units, then some trigger could have been used to pull the entire system out of this state-latching.

Figure 2.4 Connection of P82B96/PCA9600 I<sup>2</sup>C bus extension buffers.

Thus, developing an asynchronous interface (without any master clock) that can behave like an open-drain interconnection to the external I<sup>2</sup>C drivers (uICs), while preventing "state-latching", is a challenging task. The situation is further exacerbated by the constraints of the WaferIC. Since the WaferBoard is a recent innovation, references to compatible interface circuits are not available in the literature. To the best of our knowledge, no comparable interface circuit mimicking the behaviour of an open-drain connection has been reported in the literature.
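The state-latching problem described above can be reproduced with a deliberately naive behavioural model. The sketch below is an assumption-laden illustration, not the interface proposed in this thesis: each hypothetical interface unit simply pulls its own pad LOW whenever any other pad is LOW, and the function name `settle` is invented for this example.

```python
# Naive repeater model showing "state-latching".
# Assumption: each interface unit drives its own pad LOW whenever it sees
# any OTHER pad LOW. This is NOT the circuit proposed in the thesis; it is
# the straightforward approach that fails.

def settle(external_low, unit_low):
    """Iterate the network to a fixed point.

    external_low[i]: True if the external open-drain driver on pad i pulls LOW.
    unit_low[i]:     True if interface unit i is currently pulling pad i LOW.
    Returns (pad_low, unit_low) after the network has settled.
    """
    n = len(external_low)
    unit_low = list(unit_low)
    for _ in range(2 * n + 2):
        pad_low = [external_low[i] or unit_low[i] for i in range(n)]
        new_unit_low = [
            any(pad_low[j] for j in range(n) if j != i) for i in range(n)
        ]
        if new_unit_low == unit_low:
            break
        unit_low = new_unit_low
    pad_low = [external_low[i] or unit_low[i] for i in range(n)]
    return pad_low, unit_low


if __name__ == "__main__":
    # Step 1: the external driver on pad 0 pulls LOW.
    pads, units = settle([True, False], [False, False])
    print("while pulled:  pads LOW =", pads)
    # Step 2: the external driver releases, but the units hold each other LOW.
    pads, units = settle([False, False], units)
    print("after release: pads LOW =", pads)
```

After the external driver releases, both pads remain LOW because each unit keeps seeing the LOW produced by the other one. Any practical interface must break this loop without the help of a global clock.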
The closest existing circuits that we found are the P82B96 [23] and PCA9600 [24], two commercially available $I^2C$ bus extension buffers. Even though these circuits are not equivalent to the aim of this research, they have some similarity in their use of a double interpretation of voltage levels below $0.3 V_{\rm DD}$ to avoid a state-latching phenomenon. Fig. 2.4 shows the connection of P82B96/PCA9600 I<sup>2</sup>C bus extension buffers that connect multiple isolated groups of I<sup>2</sup>C drivers. The P82B96 bus extension buffer can interface I<sup>2</sup>C logic signals to similar buses having different voltage and current levels [23]. The PCA9600 is intended to isolate I<sup>2</sup>C-bus capacitance, thereby allowing long wires with a higher loading capacitance than the I<sup>2</sup>C specification permits to be driven [24]. The PCA9600 can drive loads of up to 4000 pF. It is a higher-speed version of the P82B96. It creates a non-latching, bidirectional, logic interface between a normal I<sup>2</sup>C-bus and a range of other bus configurations with higher capacitance and different voltages. It can operate at speeds of up to 1 MHz, and the high-drive side is compatible with the I<sup>2</sup>C Fast-mode Plus (Fm+) specifications. The PCA9600 features temperature-stabilized logic voltage levels that allow interfacing with I<sup>2</sup>C-derived buses such as SMBus and PMBus, or with microprocessors that use those same transistor-transistor logic (TTL) levels.

#### 2.3 Analog Interface

The WaferBoard was primarily developed to prototype digital electronic systems. Although most communication and signal processing is nowadays done in the digital domain, data acquisition systems and power supply components are still essentially analog devices. Prototyping and testing systems comprising such devices require analog interconnectivity.
Thus, the WaferBoard must have the ability to sense the voltage level at one end point and reconfigurably route that "information" through the embedded FPIN to another end point to prototype such systems. In general, an analog interface circuit that reconfigurably routes an analog signal through a digital FPIN must comprise A/D (transmitter side) and D/A (receiver side) converters. A source IC provides an analog signal at one end point to the A/D converter (transmitter). The A/D converter transforms it into a digital format that is reconfigurably routed through the FPIN to the D/A converter (receiver side). The digital data, upon reaching the D/A converter, is transformed back into a reconstructed copy of the original analog signal and provided to the destination IC at another end point.

A solution was introduced in [22] to provide reconfigurable routing of analog signals in the WaferBoard [1]. The solution used frequency modulation of ring-oscillator-based VCOs that converted analog signals into discrete-valued pulses that could be reconfigurably routed through the FPIN. The analog signal was reconstructed (demodulated) from the discrete-valued pulses by a phase-locked loop (PLL). However, due to non-linearities in the voltage-to-frequency transfer curve of ring-oscillator-based VCOs, that solution could support input analog signals in the range of only 0.6-1.6 V for a power supply of 1.8 V and a bandwidth of 200 kHz [22]. Thus, the author of this thesis was motivated to find an alternative A/D and D/A converter solution that can overcome the drawbacks of the aforementioned analog interface within the existing constraints of the WaferBoard.

#### 2.3.1 Analog to Digital Converter

Both A/D and D/A converters can be classified into two main categories:

1. Nyquist-rate converters
2. Oversampled converters

In the first category, there exists a one-to-one correspondence between the input and output samples.
Each input sample is processed without any regard to the earlier (or later) input samples. In other words, this type of converter has no memory. Due to the one-to-one correspondence between each input sample and the corresponding output sample, the resolution of the quantizer has to be as high as possible to keep the quantization noise low. The sampling rate can be as low as the Nyquist rate, i.e., twice the bandwidth of the input signal. Due to limitations of electronic circuits, such as the finite gain-bandwidth of amplifiers and the finite roll-off of low-pass filters, the actual sampling rate must be slightly higher than this minimum value. Nyquist-rate converters require high-accuracy analog components (amplifiers, resistors, current sources or capacitors) in order to achieve acceptable linearity and accuracy. Thus, Nyquist-rate converters are often difficult to implement in scaled CMOS technology because of low supply voltages and poor transistor output impedance (due to short-channel effects) [25]. In oversampled converters, the sampling rate is higher than the Nyquist rate and each input sample is processed with regard to one or more previous samples. In other words, this type of converter has memory. Because the previous samples are taken into consideration, the resolution of the quantizer can be lower. The $\Sigma\Delta$ modulator and the $\Delta$-modulator are two types of oversampling converter that require only a low-resolution quantizer (usually 1-bit). Because of oversampling, oversampled converters can trade the extra samples for resolution in amplitude. Thus, mismatch in analog circuits can be tolerated. The use of a higher sampling rate also eliminates the need for a steep roll-off in the analog anti-aliasing filter at the input of the A/D converter, as well as in the low-pass filters of the D/A converter [26, 27].
The possibility of a 1-bit output makes the $\Sigma\Delta$ modulator and the $\Delta$-modulator particularly suitable A/D converters for reconfigurable routing of analog signals in the FPIN-based prototyping platform [1], because it obviates parallel-to-serial and serial-to-parallel conversions at the transmitting and receiving sides respectively.

#### 2.3.2 $\Delta$-Modulator

A $\Delta$-modulator is shown in Fig. 2.5(a). It utilizes an internal low-resolution quantizer (or A/D converter), a loop filter and a D/A converter in a feedback loop. Replacing the quantizer by its linear model, the corresponding z-domain (discrete-time) model of the $\Delta$-modulator is shown in Fig. 2.5(b). A $\Delta$-modulation waveform is shown in Fig. 2.6.

Figure 2.5 $\Delta$-modulator.

Figure 2.6 Waveform at various nodes of the $\Delta$-modulator in Fig. 2.5(a).

The input waveform ($u$ in Fig. 2.5(a)) is approximated by a "staircase" signal ($\hat{u}$ in Fig. 2.5(a)) by the $\Delta$-modulator. The step size of the staircase signal is fixed to a constant value $e$. The difference between $u$ and $\hat{u}$ is confined to two levels, i.e. $+e$ and $-e$. If $u$ changes too rapidly, $\hat{u}$ cannot track or "hunt" $u$ properly. Such a phenomenon is called slope overload error. By using a higher $f_s$ (or smaller $T_s$), $\hat{u}$ can be made to track $u$ properly. The equivalent mathematical condition for proper tracking or "hunting" is

$$\max\left\{\frac{du}{dt}\right\} < \frac{e}{T_s} \tag{2.1}$$

Assuming,

$$u = A\cos(2\pi f t + \theta) \tag{2.2}$$

Eq.
2.1 becomes

$$2\pi f A < \frac{e}{T_s} = e f_s \tag{2.3}$$

To avoid slope overload error, the maximum amplitude of the input sinusoidal signal has to satisfy the following inequality,

$$A_{max} < \frac{e}{2\pi} \left\{ \frac{f_s}{f} \right\} \tag{2.4}$$

If it is assumed that the quantization noise $\varepsilon$ is uniformly distributed in amplitude between $-e$ and $+e$, and that its power spectral density is uniform, then the mean power of the quantization noise is

$$P_{e\_total} = \frac{1}{2e} \int_{-e}^{e} \varepsilon^{2} \, d\varepsilon = \frac{e^{2}}{3} \tag{2.5}$$

If the $\Delta$-modulator output is passed through a reconstruction filter with a bandwidth of $B$, the in-band power of the quantization noise is

$$P_{\delta} = \frac{1}{3} \left( \frac{B}{f_s} \right) (e^2) \tag{2.6}$$

Comparing the in-band quantization noise power with the signal power, some mathematical manipulation between Eq. 2.4 and Eq. 2.6 [28] leads to the following expression of the signal-to-quantization noise ratio (SQNR) in a $\Delta$-modulator.

$$SQNR = \frac{3}{8\pi^2} \times \frac{f_s^3}{Bf^2} \tag{2.7}$$

Here,

$$\begin{aligned} B &= \text{Bandwidth of the receiving low-pass filter} \\ f_s &= \text{Sampling frequency} \\ f &= \text{Input signal frequency} \end{aligned} \tag{2.8}$$

It can be seen that the SQNR is inversely proportional to the square of the input frequency. This dependency of the SQNR of a $\Delta$-modulator on the input frequency is a disadvantage compared to the $\Sigma\Delta$ modulator. However, a $\Delta$-modulator can provide some practical advantages in terms of implementation compared to a $\Sigma\Delta$ modulator (detailed explanation in Chapter 6).

#### 2.3.3 $\Sigma\Delta$ Modulator

A $\Sigma\Delta$ modulator is shown in Fig. 2.7(a). It utilizes a feedback loop containing a loop filter, an internal low-resolution quantizer or A/D converter, and a D/A converter. The corresponding z-domain (discrete-time) model of the $\Sigma\Delta$ modulator is shown in Fig. 2.7(b).

Figure 2.7 $\Sigma\Delta$ modulator.

Analysis gives,

$$v(n) = u(n-1) + e(n) - e(n-1) \tag{2.9}$$
Thus, the output contains a delayed replica of the input signal $u$, and a differentiated version of the quantization error $e$. The differentiation of the error $e$ suppresses it at frequencies which are small compared to the sampling rate. If the loop filter has a high gain in the signal band, the in-band quantization "noise" is strongly attenuated. The output noise due to the quantization error in the $\Sigma\Delta$ modulator is,

$$q(n) = e(n) - e(n-1) \tag{2.10}$$

In the z-domain, this becomes,

$$Q(z) = (1 - z^{-1})E(z) \tag{2.11}$$

In the frequency domain, after $z$ is replaced by $e^{j2\pi fT}$, the power spectral density (PSD) of the output noise is found to be,

$$S_q(f) = (2\sin(\pi fT))^2 S_e(f) \tag{2.12}$$

Here, $T = 1/f_s$ is the sampling period, and $S_e(f)$ is the 1-sided PSD of the quantization error $e$ of the internal quantizer. For "busy" (i.e., rapidly and randomly varying) input signals, $e$ may be approximated as white noise of mean-square value $e_{rms}^2 = \Delta^2/12$, where $\Delta$ is the step size of the quantizer, and thus,

$$S_e(f) = \frac{\Delta^2}{6f_s} \tag{2.13}$$

The filtering function $1 - z^{-1}$ is called the noise transfer function (NTF). The squared magnitude of the NTF as a function of frequency is illustrated in Fig. 2.8. The NTF of the $\Sigma\Delta$ modulator is a high-pass filter function.

Figure 2.8 Noise-shaping function for the $\Sigma\Delta$ modulator shown in Fig. 2.7(a).

Since,

$$\int_0^{\frac{f_s}{2}} S_q(f)df = \int_0^{\frac{f_s}{2}} (2\sin(\pi f T))^2 S_e(f)df = 2e_{rms}^2 \tag{2.14}$$

the overall quantization noise actually increases. But as can be seen in Fig. 2.8, the "increase" occurs at frequencies near $f_s/2$, while in the low-frequency range the noise is attenuated. This process is called *noise shaping*. The accumulated noise near $f_s/2$ can be removed by digital filtering.
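The input-output relation of Eq. 2.9 can be checked numerically with a small behavioural model of a first-order, 1-bit $\Sigma\Delta$ modulator. The sketch below is an illustration under stated assumptions: an ideal delaying integrator, an ideal 1-bit quantizer with output levels $\pm 1$, and a zero initial state. The function name `first_order_sigma_delta` is hypothetical.

```python
# Behavioural sketch of a first-order, 1-bit sigma-delta modulator.
# Assumptions: ideal delaying integrator, ideal quantizer with levels +/-1,
# zero initial conditions. Names are hypothetical.

def first_order_sigma_delta(u):
    """Return (v, e): the output sequence and the quantization error sequence.

    The integrator state follows y(n) = y(n-1) + u(n-1) - v(n-1) and the
    quantizer error is defined as e(n) = v(n) - y(n).
    """
    y = 0.0
    prev_u, prev_v = 0.0, 0.0
    v, e = [], []
    for u_n in u:
        y = y + prev_u - prev_v            # delaying integrator with feedback
        v_n = 1.0 if y >= 0.0 else -1.0    # 1-bit quantizer
        v.append(v_n)
        e.append(v_n - y)
        prev_u, prev_v = u_n, v_n
    return v, e


if __name__ == "__main__":
    u = [0.3] * 2000                        # DC input inside the +/-1 range
    v, e = first_order_sigma_delta(u)
    # Check v(n) = u(n-1) + e(n) - e(n-1) for every n >= 1.
    worst = max(abs(v[n] - (u[n - 1] + e[n] - e[n - 1])) for n in range(1, len(u)))
    print("largest deviation from Eq. 2.9:", worst)
    print("average of output bits:", sum(v[1:]) / (len(v) - 1))
```

For a DC input, the average of the 1-bit output converges to the input value, which is the time-domain counterpart of the statement that the shaped noise is pushed away from low frequencies.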
Removal of the accumulated noise by digital filtering makes oversampling converters highly compatible with scaled CMOS technology, which is better suited for providing fast digital circuits than precise analog circuits. The oversampling ratio (OSR) is defined as how much faster the input analog signal is sampled in the oversampled converter than in a Nyquist-rate converter,

$$OSR = \frac{f_s}{2f_B} \tag{2.15}$$

where $f_B$ is the maximum signal frequency, i.e. the signal bandwidth. Integrating $S_q(f)$ between 0 and $f_B$, assuming $OSR\gg 1$, gives the in-band noise power to be

$$q_{rms}^2 = \frac{\pi^2 e_{rms}^2}{3(OSR)^3} \tag{2.16}$$

As expected, the in-band quantization noise decreases as the OSR increases. Doubling the OSR increases the effective number of bits (ENOB) by only about 1.5 bits. Assuming single-bit quantization, OSR = 64 entails ENOB<10. The $\Sigma\Delta$ modulator shown in Fig. 2.7(b) is called a first-order modulator because it has only one integrator (or accumulator) and one feedback branch in the loop. It is possible to have multiple integrators and feedback branches in the loop, which gives a higher-order modulator. Such higher-order modulators can provide more effective noise shaping. A third-order $\Sigma\Delta$ modulator is shown in Fig. 2.9.

Figure 2.9 A third-order $\Sigma\Delta$ modulator.

For an L<sup>th</sup>-order modulator, the in-band noise is [27],

$$q_{rms}^2 = \frac{\pi^{2L} e_{rms}^2}{(2L+1)(OSR)^{2L+1}} \tag{2.17}$$

Figs. 2.10 and 2.11 present two graphs that compare various low-pass $\Sigma\Delta$ modulators (A/D converters) reported in the literature. Fig. 2.10 compares silicon area usage plotted at various CMOS technology nodes. It is observed that the area requirement of the $\Sigma\Delta$ modulators ranges between three and seven orders of magnitude. Fig.
2.11 classifies the cited designs using the classic figure of merit (FoM) [59], defined as,

$$\text{FoM} = \frac{P}{2^{\frac{\text{SNDR}-1.76}{6.02}} \times 2 \times BW}$$

Here, $BW$ is the signal bandwidth and $P$ is the power consumption.

Figure 2.10 Comparison between $\Sigma\Delta$ modulators of given technology process node versus area ($\mu$m<sup>2</sup>).

Figure 2.11 Comparison between $\Sigma\Delta$ modulators of given technology process node versus FoM (pJ).

A primary constraint of the WaferBoard is the silicon area. From Fig. 2.10, it can be seen that the $\Sigma\Delta$ modulator proposed in [29] offers the smallest silicon footprint. However, the $\Sigma\Delta$ modulator in [29] can support a $V_{\text{P-P}}$ of only 0.4 V and 0.8 V for single-ended and differential implementations respectively. Even though it consumes a very small silicon area, it suffers from DC level shifting in the reconstructed signal. The non-linearity of the front-end voltage-to-time converter (VTC) of the $\Sigma\Delta$ modulator in [29] was a large contributor to its limited bandwidth and signal-to-noise and distortion ratio (SNDR). From Fig. 2.11, it can be seen that the $\Sigma\Delta$ modulator (A/D converter) proposed in [30] offers the smallest FoM. The $\Sigma\Delta$ modulator in [30] is an asynchronous $\Sigma\Delta$ modulator (ASDM). ASDMs can provide a low-power and compact implementation of amplitude-to-time conversion. The output of an ASDM is a continuous-time discrete-valued signal, rather than a digital one. In the target application of a digital FPIN-based prototyping platform, a dedicated 1-bit channel is available for transmission that can support the propagation of purely digital as well as continuous-time discrete-valued signals.
Considering the limited bandwidth and voltage range of the analog interface that was developed for the WaferBoard [1] and introduced in [22], A/D and D/A converters based on ASDMs or asynchronous $\Delta$-modulators (ADMs) appear to be the best solution for an analog interface for the WaferBoard. An ASDM- or ADM-based analog interface circuit can provide the following advantages:

- It has inherent real-time calibration of the analog-to-digital (or amplitude-to-time) conversion.
- It does not require any clock signal to synchronize the transmission.

Sec. 2.3.4 presents an overview of the ASDM principles of operation.

#### 2.3.4 Asynchronous $\Sigma\Delta$ Modulator

Asynchronous $\Sigma\Delta$ modulators (ASDMs) do not have any sampling operation. A theoretical analysis of the ASDM is presented in [2]. Being closed-loop nonlinear systems without any sampling operation, ASDMs cannot be transformed into equivalent z-domain linear models. As a result, ASDMs are analyzed by the describing function (DF) method [2,30,60]. ASDMs can convert continuous-time analog input signals into continuous-time discrete-valued output signals. ASDMs encode the amplitude of the input signal into the pulse-width of the output signal [2]. ASDMs can provide a very compact implementation of a high-resolution amplitude-to-time converter. Even though an ASDM neither has nor requires a sampling clock, it has an equivalent self-oscillation frequency, called the limit cycle frequency, that depends on its circuit parameters. A sufficiently high limit cycle frequency is used in the ASDM to avoid spectral overlap with the input modulating frequency [2]. An ASDM is shown in Fig. 2.12. It is similar to a continuous-time $\Sigma\Delta$ modulator (CT-$\Sigma\Delta$) except for the absence of the sampling operation. ASDMs can be used as high-precision A/D converters in applications that do not require explicit digitalization.
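The way an ASDM encodes amplitude into pulse width can be illustrated for a DC input with a closed-form calculation of the switching times. The following is a minimal sketch under explicit assumptions: a first-order low-pass loop filter with pole $p$, a comparator with switching thresholds $\pm h$, an output of $\pm 1$, and $|V| + h < 1$. These modelling choices and the function name `asdm_dc_cycle` are assumptions made for illustration; the DC relations for this loop filter are given in Eq. 2.18.

```python
import math

# Closed-form limit cycle of a first-order ASDM for a DC input.
# Assumptions (illustrative): loop filter L(w) = p / (p + jw), comparator
# thresholds at +h and -h, output levels +1 and -1, and |V| + h < 1.

def asdm_dc_cycle(V, p, h):
    """Return (duty_cycle, omega) of the output square wave for DC input V."""
    # Output at +1: the filter state relaxes from +h towards (V - 1)
    # and the comparator flips when the state reaches -h.
    t_high = math.log((1.0 - V + h) / (1.0 - V - h)) / p
    # Output at -1: the filter state relaxes from -h towards (V + 1)
    # and the comparator flips when the state reaches +h.
    t_low = math.log((1.0 + V + h) / (1.0 + V - h)) / p
    period = t_high + t_low
    return t_high / period, 2.0 * math.pi / period


if __name__ == "__main__":
    p, h = 1.0, 0.01
    for V in (-0.5, 0.0, 0.5):
        duty, omega = asdm_dc_cycle(V, p, h)
        print(f"V={V:+.1f}: 2*duty-1={2 * duty - 1:+.4f}, omega={omega:.2f} rad/s")
```

For small hysteresis, the quantity $2\alpha/T - 1$ follows the input $V$ almost exactly, while the oscillation frequency drops as $|V|$ grows. This is why the limit cycle frequency at zero input has to be chosen well above the signal band.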
Due to the equivalence of the spectrum of the modulating input signal and the low-frequency part of the spectrum of the ASDM output, the input signal can be reconstructed from the ASDM output by a simple low-pass filter. ASDM applications have been reported in the context of ADSL/VDSL line drivers [30,61], power converters [62], drivers for optical cables [63], and A/D converters [64]. As ASDMs allow a very compact and robust implementation of a high-resolution amplitude-to-time converter, they have great potential to be used as the quantizer of an A/D converter in the target application of the WaferBoard.

Figure 2.12 Asynchronous $\Sigma\Delta$ modulator.

A quantizer (1-bit comparator) with hysteresis is required to ensure oscillation for first- and second-order loop filters in an ASDM. Assuming that the output waveform of the ASDM can take values of $\pm 1$, that $h$ denotes the hysteresis of the comparator, and that the loop filter is

$$L(\omega) = \frac{p}{p + j\omega},$$

then for a stationary or DC input $v_{\rm in} = V$ (in Fig. 2.12),

$$\omega = \omega_c (1 - V^2), \quad \text{where } \omega = \frac{2\pi}{T} \text{ and } |V| < 1 \tag{2.18a}$$

$$2\frac{\alpha}{T} - 1 = V, \quad \text{where } \frac{\alpha}{T} = \text{the duty-cycle} \tag{2.18b}$$

$$\omega_c = \frac{\pi p}{2h}, \quad \text{when } V = 0 \tag{2.18c}$$

Eq. 2.18b shows that the duty cycle of the output square wave has a linear relationship with the input voltage. In fact, this relationship is true for a DC input only. For a harmonic input of frequency $\mu$, the relation between $v_{\rm in}$, the duty-cycle, and the frequency is governed by the more general equations [2],

$$v_{\rm in} - \left(2\frac{\alpha}{T} - 1\right) = \frac{2}{\pi} \sum_{n=1}^{\infty} \frac{\text{Re}L(n\omega_i)}{nL(\mu)} \sin\left(2\pi n \frac{\alpha}{T}\right) \tag{2.19a}$$

$$\sum_{n=1}^{\infty} \frac{1}{n} \sin^2\left(n\pi \frac{\alpha}{T}\right) \operatorname{Im} L(n\omega_i) = -\frac{\pi}{4}h \tag{2.19b}$$

where $v_{\rm in} = v_m \cos \mu t$ and $\omega_i$ is the instantaneous frequency of the output square wave. Solving Eq. 2.19 for $\mu = 0$ results in Eq.
2.18. However, for $\mu \neq 0$, Eq. 2.19 evolves into (Appendix A.1),

$$2\frac{\alpha}{T} - 1 = (1 + d_1)v_{\rm in} - d_3v_{\rm in}^3 \tag{2.20}$$

where,

$$d_1 = d_3 = \frac{\pi^2}{6} \frac{\text{Re}L(\omega_i)}{L(\mu)}$$

Inserting $v_{\rm in} = v_m \cos \mu t$ into Eq. 2.20 results in,

$$2\frac{\alpha}{T} - 1 = (1 + d_1)v_m \cos \mu t - d_3(v_m \cos \mu t)^3 \tag{2.21}$$

Thus, it can be seen that the process of amplitude-to-time conversion introduces a third-harmonic distortion. Since $\cos^3 \mu t = \frac{3}{4}\cos \mu t + \frac{1}{4}\cos 3\mu t$, the third-harmonic component has an amplitude of $d_3 v_m^3/4$. The distortion coefficient $\Delta_3$, defined as the ratio of the third-harmonic component to the fundamental component, is,

$$\Delta_3 = \frac{\pi^2}{24} \frac{\operatorname{Re}L(\omega_i)}{L(\mu)} v_m^2 \approx \frac{\pi^2 \mu^2}{24 \omega_i^2} v_m^2 \tag{2.22}$$

#### 2.4 Differential Interface

Differential signaling transmits the same electrical signal as a differential pair of signals, each in its own conductor. As external electromagnetic interference tends to affect both conductors similarly and the receiving end only detects the difference between the conductors, differential signaling can resist the common-mode electromagnetic coupling that affects single-ended signaling. The focus of our research was to investigate *spatially configurable* propagation paths for differential-to-single-ended conversion in a digital FPIN-based prototyping platform. Besides the research conducted by the author (and Olivier Valorge), the concept of spatially reconfigurable differential signal transmission in a digital FPIN-based prototyping platform remains unexplored in the literature.

#### 2.4.1 Compatibility with the WaferBoard

Conventional differential signaling output drivers, shown in Fig. 2.13(b), typically consist of an open-drain differential pair and a voltage-controlled current source [65]. Supporting differential signaling in the WaferBoard implies that any two complementary pins (output of the differential pair in Fig.
2.13(b)), whatever their positions on the board/platform, can be declared as being part of a differential interface. However, in practice, the spacing between two such differential pins rarely exceeds $2 \,\mathrm{mm}$ [66]. Thus, any pair of pins propagating a differential signal can be arbitrarily positioned in an oriented window of $2 \,\mathrm{mm} \times 2 \,\mathrm{mm}$. The CIO/NanoPad density in the WaferBoard is about $64 \,(8 \times 8)$ CIO/mm<sup>2</sup>. Mapping the $2 \,\mathrm{mm} \times 2 \,\mathrm{mm}$ area onto the fabric of the WaferBoard means that the differential interface must be able to drive or receive signals from any CIO within an array of $16 \times 16$ CIOs.

Figure 2.13 Differential buffer structure.

#### 2.4.2 Physical and Electrical Constraints of Differential Signaling

The spatially configurable differential interface must meet several electrical and physical constraints. First, the two differential signals must maintain their symmetry as they "propagate" through the interface (and the FPIN) from the source uIC to the destination uIC. This symmetry depends on the path taken by the two signals through the proposed interface. Asymmetry in the propagation path could induce jitter or a phase difference between the signals in a differential signal pair, which can lead to errors in the transmitted information. Very stringent jitter constraints exist for most high-speed interfaces. For example, in the PCIe transmission protocol, the maximum allowed jitter is 30% of the bit length [67], which represents a maximum jitter of 120 ps for a data rate of 2.5 Gbps. A slight length or load asymmetry between the two signal paths can cause such jitter or phase difference. Besides symmetry issues, another set of issues stems from the fact that, during the propagation of high-frequency signals, PCB traces can no longer be modelled with lumped-parameter circuit elements. Indeed, these traces behave as transmission lines.
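The two numerical constraints quoted in this section, the size of the CIO window and the jitter budget, follow from simple arithmetic. The sketch below only restates that arithmetic; the helper names `cio_window_side` and `max_jitter_ps` are hypothetical, and the inputs are the values given in the text.

```python
# Illustrative arithmetic for the differential-interface constraints.
# Helper names are hypothetical; input values are those quoted in the text.

def cio_window_side(window_side_mm, cio_per_mm):
    """Number of CIOs along one side of a square window."""
    return int(round(window_side_mm * cio_per_mm))


def max_jitter_ps(data_rate_gbps, fraction_of_bit):
    """Jitter budget in picoseconds for a given data rate and bit fraction."""
    bit_period_ps = 1000.0 / data_rate_gbps
    return fraction_of_bit * bit_period_ps


if __name__ == "__main__":
    side = cio_window_side(2.0, 8)          # 2 mm window, 8 CIOs per mm
    print(f"CIO window: {side} x {side}")
    print(f"jitter budget: {max_jitter_ps(2.5, 0.30):.0f} ps")
```

At 2.5 Gbps the bit period is 400 ps, so any path mismatch inside the interface eats directly into a budget of roughly a hundred picoseconds.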
As a result, reflection at the receiving end and attenuation become prominent in the signal characteristics. To avoid such phenomena, impedance matching is typically done at every stage of a transmission path [68].

#### CHAPTER 3 ORGANIZATION OF THESIS

#### 3.1 Organization of Thesis

The research contributions of this thesis are divided into five chapters (Chapters 4–8). Each of Chapters 4–7 presents a research article that was prepared as part of this thesis in order to complete the objectives stated in Sec. 1.4. Each of these articles develops interface circuits for programmable interconnects that can enhance the versatility of FPGAs and/or the WaferBoard. Chapter 4 includes a published paper [69] reporting the bi-directional interface and a compatible star interconnect topology. Chapter 5 includes another published paper [70] reporting measurement results from a test-chip validating the bi-directional interface and a ring-based interconnect topology. Chapter 6 includes a submitted paper [71] reporting measurement results from a test-chip validating the analog interface based on a novel circuit implementation of the asynchronous $\Delta$-modulator (ADM). Chapter 7 includes a paper [72] reporting measurement results from a test-chip validating a differential interface. Chapter 8 includes an alternative solution, improving the work presented in Chapter 7.

#### 3.2 Article-1 and Article-2

Article-1 and Article-2, constituting Chapter 4 and Chapter 5, present the open-drain interface circuit and two interconnect topologies. The open-drain interface circuit and the interconnect topologies were developed to support bi-directional communication through unidirectional digital FPINs.

#### 3.2.1 Article-1 (Chapter 4)

The main contributions of this article are:

- An open-drain interface circuit and a star interconnect topology that have been proposed by the author in reference [69] W. Hussain, Y. Savaria, and Y. Blaquiere.
An interface for the I<sup>2</sup>C protocol in the WaferBoard. In *Circuits and Systems (ISCAS)*, 2013 IEEE International Symposium on, pages 1492–1495, 2013. Connected according to the star topology, each interface unit has point-to-point communication with all the others. Point-to-point communication leads to the simplest design and minimized delays when a small number of pins need to be connected. The star interconnect topology has an interconnection complexity of $\Theta(n^2)$ for n interface units. Post-layout simulation was reported in [69].

#### 3.2.2 Article-2 (Chapter 5)

The main contributions of this article are:

- A solution is proposed to a complexity problem that stems from the fact that a $\Theta(n^2)$ complexity gets very expensive when n grows. There is also a need to overcome a limit on the value of n due to the fan-in of the unit cells of the WaferBoard [1]. Thus, a bi-directional bus emulation with an interconnect topology of $\Theta(n)$ complexity was developed and is reported in [70] W. Hussain, Y. Savaria, and Y. Blaquiere. An interface for open-drain bi-directional communication in field programmable interconnection networks. This paper was accepted for publication in IEEE Transactions on Circuits and Systems I: Regular Papers, August 2015. The $\Theta(n)$ complexity interconnect topology is structured as a ring or queue. Measurement results from a test-chip that was fabricated in a 0.13 µm CMOS technology were reported in [70]. Measurement results show that several instances of this interface circuit can be successfully interconnected through the $\Theta(n)$ complexity interconnect topology that mimics the "wired AND" property of open-drain (or open-collector) connections. A comprehensive delay model has also been developed. This model can be used to calculate the maximum number of interface circuits that can be interconnected in a network for a given communication speed.
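The difference between the two interconnection complexities can be made concrete by counting links. The sketch below is illustrative only and rests on an assumption that is not stated in the source: one unidirectional FPIN link is counted per ordered pair of units in the star (point-to-point) case, and one link per unit in the ring case. The function names are hypothetical.

```python
# Link-count sketch for the two interconnect topologies.
# Assumption (not from the source): one unidirectional link per ordered pair
# of units in the point-to-point case, one link per unit in the ring.

def star_links(n):
    """Every unit sends to every other unit: Theta(n^2) links."""
    return n * (n - 1)


def star_fan_in(n):
    """Each unit receives from all the other units."""
    return max(n - 1, 0)


def ring_links(n):
    """Each unit sends only to its successor: Theta(n) links."""
    return n if n > 1 else 0


def ring_fan_in(n):
    """Each unit receives from a single predecessor."""
    return 1 if n > 1 else 0


if __name__ == "__main__":
    for n in (2, 5, 10, 25):
        print(f"n={n:2d}: star {star_links(n):3d} links (fan-in {star_fan_in(n):2d}), "
              f"ring {ring_links(n):2d} links (fan-in {ring_fan_in(n)})")
```

Under this counting, the per-unit fan-in of the star grows with n while that of the ring stays constant, which is consistent with the fan-in limit mentioned above as a motivation for the ring.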
#### 3.3 Article-3 (Chapter 6)

Silicon area is a main constraint of the prototyping platform [1] (see Sec. 2.1.2). Thus, the A/D and D/A converters that must be embedded in any analog interface must be very compact. An asynchronous $\Delta$-modulator (ADM) can provide a very compact and robust implementation of an amplitude-to-time converter that can be used as an A/D converter in the analog interface. A novel circuit implementation of an ADM was thus proposed and developed for A/D conversion and reported in Article-3 (Chapter 6). The main contributions of this article are:

- An analog interface circuit, based on a proposed ADM, was developed to support analog signal transmission in the FPIN of a prototyping platform [1]. The analog interface circuit utilizes the proposed ADM, whose output is directly propagated through the FPIN. The analog interface is reported in [71] W. Hussain, F. Hussein, P. Desgreys, Y. Savaria, and Y. Blaquiere. An asynchronous Δ-modulator based A/D converter for an electronic system prototyping platform. Submitted to IEEE Transactions on Circuits and Systems I: Regular Papers, September 2015. The proposed ADM was fabricated in a 0.13 $\mu$m CMOS technology. Measurement results showed a signal-to-noise ratio (SNR) and an SNDR of 57 and 47 dB respectively for an input bandwidth of 2 MHz. The ADM occupies an active area of 45 $\mu$m $\times$ 22 $\mu$m. The entire A/D and D/A converter pair consumes 0.15 mA from a 3.3 V supply and occupies a total area of 45 $\mu$m $\times$ 46 $\mu$m. Comparisons of the proposed ADM with three other published competitive A/D converters were reported in [71].

#### 3.4 Article-4 (Chapter 7) and Chapter 8

CML is a high-speed differential signaling circuit topology that enables transmitting data through pins at several gigabits per second with existing CMOS technologies.
A novel spatially configurable differential interface was proposed and developed according to the physical and electrical constraints of CML signaling to support differential signaling on a digital FPIN-based electronic prototyping platform such as the WaferBoard [1]. The two main requirements of the differential interface are:

- Spatial reconfigurability to select the two differential pins of a uIC, when the uIC can be randomly placed anywhere on the electronic prototyping platform.
- Matching of the differential signal paths from the uIC pins to the input of the differential-to-single-ended converter for all possible locations of the NanoPads or CIOs on the electronic prototyping platform.

Two differential interfacing solutions were elaborated and developed according to the aforementioned two requirements:

#### 3.4.1 Article-4 (Chapter 7)

The main contributions of this article are:

The differential interface was originally developed by Olivier Valorge, a postdoctoral fellow at Polytechnique Montréal. Alternative implementation(s) of the original concept have been investigated by the author of this thesis to find a more cost-effective solution that meets the constraints of the WaferBoard [1]. The architecture of the originally proposed differential interface consists of an input stage and an output stage. The differential input stage receives the complementary differential signals from the uICs and converts them into a single-ended signal before injecting it into the digital FPIN of the WaferBoard. The differential input stage has a differential-to-single-ended converter. However, before conversion, it must be ensured that the two differential signals reach the differential-to-single-ended converter without excessive phase difference. In other words, the signal paths from the uIC pins to the input of the differential-to-single-ended converter must be adequately "matched" for all possible locations of the pins (NanoPads or CIOs) on the WaferBoard.
An H-tree structure with multiple hierarchical levels is used in the differential input stage (explained in Sec. 7.3) to maintain symmetry and to balance all possible propagation paths. Unity-gain buffer based analog multiplexers were used in each stage of the H-tree structure. The first version of the differential interface proposed by Valorge has been fabricated in a test-chip implemented using a mature 0.18 µm CMOS technology. Measurement results from the test-chip, obtained by the author, are reported in reference [72].

[72] W. Hussain, O. Valorge, Y. Savaria, and Y. Blaquiere. A novel spatially configurable differential interface for an electronic system prototyping platform. Submitted to Integration, the VLSI Journal (Elsevier), May 2015.

Measurements on the test-chip show that the configurable differential interface can operate at a speed of up to 2.5 Gbps.

#### 3.4.2 Chapter 8

The main contribution of this chapter is:

- A pass-transistor multiplexer based differential input stage that was investigated for the spatially configurable differential interface reported in [72]. Its main purpose is to provide a less costly solution for the WaferBoard. This differential input stage was also developed according to CML differential signaling specifications. The differential input stage can be implemented with standard CMOS processes and is fully compatible with a digital FPIN-based prototyping platform. This input stage can replace the input stage based on unity-gain buffer based multiplexers used in [72]. The pass-transistor multiplexer based differential input stage can support data rates of up to 2 Gbps while occupying significantly less silicon area (1/20th) compared to the input stage based on unity-gain buffer based multiplexers.

# CHAPTER 4 ARTICLE 1: AN INTERFACE FOR $I^2C$ PROTOCOL IN WAFERBOARD<sup>TM</sup>

#### Summary of the Chapter

An open-drain interface circuit, along with an interconnection topology, has been conceived by the author of this thesis.
It can support bi-directional communication (e.g. the $I^2C$ protocol) through a uni-directional network. Initially, simulation results were used to validate the concept. The idea was first introduced (and subsequently published) in a **lecture presentation** at the IEEE International Symposium on Circuits and Systems (ISCAS) in 2013. The interface unit can interconnect multiple open-drain I/Os. It was originally developed for WaferBoard<sup>TM</sup> [1]. Designed according to the specification of the $I^2C$ protocol, the interface unit can support a speed of up to 3.4 Mbit/s. **The published paper is reproduced in this chapter.**

#### Title: An Interface for I<sup>2</sup>C Protocol in WaferBoard<sup>TM</sup>

Wasim Hussain, Yves Blaquière, Yvon Savaria. (Published). Circuits and Systems (ISCAS), 2013 IEEE International Symposium on, pp. 1492–1495.

#### **Abstract**

This paper presents a circuit proposed for the DreamWafer<sup>TM</sup> technology. This circuit can interconnect several pads, also called NanoPads, in such a way that they can imitate the behavior of a "single metal line" for open-drain (or open-collector) buses compliant with the I<sup>2</sup>C protocol. Thus, multiple serial data lines (SDA) and serial clock lines (SCL) from different user ICs can be connected together on the WaferBoard<sup>TM</sup>. The interface can connect up to 25 I<sup>2</sup>C IC pins together. It can support bidirectional data transfers at up to 100 kbit/s in the Standard-mode, up to 400 kbit/s in the Fast-mode, up to 1 Mbit/s in the Fast-mode Plus, or up to 3.4 Mbit/s in the High-speed mode. The entire interface would take less than 1% of the total area of the WaferIC<sup>TM</sup>, the target system environment for which this circuit is proposed.

#### **Keywords**

Re-programmable Circuit Board, I<sup>2</sup>C protocol, Open Collector Bus, Bidirectional Bus.

#### 4.1 Introduction

The I<sup>2</sup>C protocol is a popular communication standard.
It is a bidirectional multi-master serial bus developed by Philips. I<sup>2</sup>C is used in various control architectures such as the System Management Bus (SMBus), the Power Management Bus (PMBus) [17], the Intelligent Platform Management Interface (IPMI), the Display Data Channel (DDC) and the Advanced Telecom Computing Architecture (ATCA). I<sup>2</sup>C [14] uses two bidirectional open-drain (or open-collector) lines named Serial Data Line (SDA) and Serial Clock Line (SCL). Both lines have external pull-up resistors. The I<sup>2</sup>C protocol has no explicit signal to specify the direction of data in the bus. Rather, there are some rules embedded in the protocol, like clock synchronization, arbitration and clock stretching [14] by which all the ICs connected to a bus "realize" when they are supposed to write into the bus, read from the bus or stay idle. All those rules are based on one electrical property of the bus. That is, the open-drain bus behaves as a set of "wired AND". Unlike CMOS driver logic, there is no possibility of undefined state in the bus. Indeed, no matter how many ICs are connected to the bus, if only one of them outputs a LOW on the bus, the bus will become LOW. An active reconfigurable board, called the WaferBoard, has been proposed in [1]. WaferBoard is intended to be an alternative to PCBs for providing interconnections among multiple user ICs (uICs) during testing and prototyping. WaferBoard<sup>TM</sup> is being developed to support as many types of communication standards as possible. It can already support bidirectional buses with an *explicit enable signal*. But in its current version, it cannot support bidirectional buses like those following the I<sup>2</sup>C protocol. This paper presents an interface that can interconnect several miniature pads found on top of the WaferIC, called NanoPads, through a mechanism that imitates the behavior of an I<sup>2</sup>C bus. Section-II provides a description of the WaferBoard<sup>TM</sup>. 
Section-III describes the proposed interface and Section-IV presents simulation results obtained from post-layout circuit extraction. Finally, Section-V concludes this paper by summarizing the proposed interface and suggesting some enhancements.

### 4.2 Background

In the WaferBoard<sup>TM</sup>, the active surface on which user ICs (uICs) are to be deposited is called WaferIC<sup>TM</sup>. Its surface has a dense array of very fine (tens of microns) conducting pads which are called NanoPads. Each NanoPad is connected to an internal wafer-scale interconnect network, called WaferNet<sup>TM</sup>, that can be configured to connect any two NanoPads, whatever their position is. The NanoPads are able to make contact with the solder balls or pins of uICs. With the WaferIC<sup>TM</sup>, hand-placement is sufficient, as shown in Fig. 4.1(a).

Figure 4.1 WaferIC<sup>TM</sup> with ICs deposited: (a) top view, (b) cross section view.

The building block of the WaferIC<sup>TM</sup> is called a unit cell. Each unit cell contains an array of $4 \times 4$ NanoPads, I/O buffers, routing and control circuitries, and a multiplexer-based crossbar. This crossbar routes incoming signals to one of the NanoPads belonging to itself or other unit cells. When a signal is propagated through the WaferIC<sup>TM</sup>, it is routed from unit cell to unit cell until it reaches the destination NanoPad and the corresponding uIC pin [1]. The interface proposed in this paper is to be integrated in each unit cell.

Signals are routed through the WaferNet<sup>TM</sup> by digital multiplexer-based crossbars. Each NanoPad has its own I/O buffers. If a NanoPad is to operate as an input, then the respective I/O buffer is configured accordingly and this buffer receives the signal from a uIC and propagates it to the embedded crossbar of the local unit cell to which it belongs. The crossbar routes the signal to the link pointing towards the destination NanoPad's unit cell.
At the destination cell, the NanoPad I/O buffer is configured as an output buffer. Normally this configuration is done before "testing" or "prototyping" with uICs. Bidirectional bus compatibility is provided by accommodating an enable signal which can dynamically configure the I/O buffers as input or output during testing operations. For example, the communication between a microprocessor and some off-chip memory is a bidirectional line that is controlled by a direction bit typically generated by the microprocessor itself. This direction bit can be used in the WaferIC<sup>TM</sup> to control when the corresponding NanoPads' I/O buffers are to be configured as input buffers or output buffers. By contrast, in the I<sup>2</sup>C protocol, there is no explicit direction bit or master controller. It is a multi-master bus where any connected component can assume the role of master and can control the direction of the data. Such a communication protocol cannot be supported by the existing version of the WaferIC<sup>TM</sup>.

In the I<sup>2</sup>C protocol (shown in Fig. 4.2), when a component wants to output HIGH, the output driver does not output an explicit HIGH. Rather, the output driver releases the bus, and an external common pull up resistor pulls up the bus to $V_{\rm DD}$. When a component wants to output a LOW, it explicitly drives the bus to LOW because the pull down is stronger than the pull up resistor.

Figure 4.2 I<sup>2</sup>C Bus.

# 4.3 Proposed Interface for I<sup>2</sup>C Compatibility

To emulate the behavior of an open-drain bus, whenever one of the uIC pins outputs a LOW to its corresponding NanoPad, the interface must be able to detect it and send a signal to the other NanoPads to produce a LOW. Also, when the uIC outputs a HIGH by releasing the NanoPad, the interface must detect it and send signals to the interconnected NanoPads so that they produce a HIGH. A schematic of the proposed interface is shown in Fig. 4.3(a).
Instead of a pull up resistor, a pull up pMOS is used. When such interfaces are interconnected through crossbars embedded in each unit cell, the resulting group of NanoPads emulates an open-drain bus.

First we consider the case where two such interfaces are connected through a crossbar, as shown in Fig. 4.3(b). Each interface's NAND gate will have only one signal from the other interface and the remaining inputs of the NAND gates are held at $V_{\rm DD}$. In Fig. 4.3(b) the NAND gate thus behaves as an inverter. When none of the uICs is outputting a LOW, both NanoPads will be held at $V_{\rm DD}$ by their respective pull up pMOS. Also, both NanoPads will send HIGH to each other and as a result, the internal pull down nMOS remain OFF and the NanoPads continue to be held at $V_{\rm DD}$. If one of the uICs outputs a LOW, the corresponding NanoPad becomes LOW because the output driver of the I<sup>2</sup>C uIC is stronger than the internal pull up pMOS of the interface. With the I<sup>2</sup>C protocol, the signals are often propagated through long metal lines. Thus, the drivers have to be sufficiently strong to quickly discharge the capacitance of such lines. Typical I<sup>2</sup>C drivers can sink currents of 2-3 mA [14], while the internal pull up pMOS in the proposed interface can supply less than 50 $\mu$A.

Let us assume uIC1 outputs LOW in Fig. 4.3(b) and NanoPad1 becomes LOW. Further assume that uIC2 is not outputting a LOW. In that case, since NanoPad1 is LOW, it will send LOW through the crossbar to NanoPad2 and turn on its internal pull down nMOS. Thus, NanoPad2 will also become LOW even though uIC2 is not driving it LOW. The opposite would happen if uIC2, instead of uIC1, output a LOW.

This configuration suffers from a latching problem. As described before, when NanoPad2 becomes LOW it will also send a LOW signal through the crossbar to NanoPad1. Thus, the internal pull down nMOS of NanoPad1 will also turn on.
This gives rise to the latching problem. I<sup>2</sup>C drivers output "explicit" LOWs but not "explicit" HIGHs. When a driver outputs HIGH, it will release the bus or, in the present case, the NanoPad. After outputting LOW, when uIC1 releases NanoPad1, it is supposed to be pulled up to $V_{\rm DD}$. But since the internal pull down nMOS of NanoPad1 is turned on by the LOW signal from NanoPad2, NanoPad1 may not be pulled up to $V_{\rm DD}$ when uIC1 releases it.

An apparent solution to this problem is to make the pull up pMOS strong and the internal pull down nMOS weak. However, in the I<sup>2</sup>C protocol, the maximum allowable $V_{\rm IL}$ is 0.3 $V_{\rm DD}$. If the pull up pMOS is made too strong compared to the pull down nMOS, then the voltage level of the NanoPad might not fall below 0.3 $V_{\rm DD}$.

The approach taken to solve that problem was to break the loop. This was done by giving two distinct logical interpretations to some voltages observed on the NanoPads. This is possible because the pull down drivers in chips designed according to the I<sup>2</sup>C standard are a lot stronger than the nMOS pull down we propose in Fig. 4.3(a). Indeed, by having a pull down in the WaferIC<sup>TM</sup> that is much weaker than those found in I<sup>2</sup>C chips, but still significantly stronger than the pMOS pull up, we can have a LOW value of less than 150 mV (termed $V_{\rm OL\_1}$) when the NanoPad is driven by a standard I<sup>2</sup>C compatible chip, or of the order of 250 mV (termed $V_{\rm OL\_2}$) when it is driven by the nMOS pull down in the WaferIC<sup>TM</sup>. In that case, a sensing circuit such as the one proposed in Fig. 4.3(c) can have different logical interpretations for a low voltage driven by an I<sup>2</sup>C compatible chip and one driven by our nMOS driver, even though the standard I<sup>2</sup>C bus would interpret both voltages as a logical LOW, as required by the I<sup>2</sup>C protocol.
This allows breaking the 'logical loop' that would otherwise result from the circuit in Fig. 4.3(b). The desired functionality is obtained with a biased differential amplifier where $M_1$ is approximately two times wider than $M_2$. The second differential pair ($M_{3-4}$) is used only for amplification and level shifting, to make the whole circuit robust against process variations. When the voltage at the NanoPad is below $V_{\rm OL\_1}$ or 0.15 V ($V_{\rm OL\_1}$ can be considered as the tripping voltage of the biased differential pair), the buffer will send LOW; otherwise it will send HIGH.

Figure 4.3 (a) Proposed interface. (b) Two instances of the interface interconnected together. (c) The level sensing buffer to remove the latching problem. (d) Block diagram.

Let us reconsider the circuit of Fig. 4.3(b) where the level sensing buffers are replaced by the circuit shown in Fig. 4.3(c). Assuming uIC1 outputs LOW to NanoPad1, the voltage of NanoPad1 drops to below 150 mV ($V_{\rm OL\_1}$). NanoPad1 will send a LOW signal to NanoPad2 through the WaferNet<sup>TM</sup>. As a result, the internal pull down nMOS of NanoPad2 will be turned on and the voltage of NanoPad2 will be pulled down to approximately 0.25 V ($V_{\rm OL\_2}$, assuming that the operating $V_{\rm DD}$ of WaferIC<sup>TM</sup> is 1.8 V). That voltage level will be "interpreted" as LOW by uIC2. But, since that voltage level is not below 0.15 V, NanoPad2 will not send LOW to NanoPad1. Thus, the two interconnected NanoPads imitate the behavior of an open-drain bus, even though internally the NanoPads are loop-connected through the crossbar and not by any direct metal line.

The interconnection principle of Fig. 4.3(b) can be extended to more than two NanoPads. A block diagram of the interface is shown in Fig. 4.3(d). In the case of N NanoPads, each interface must receive the $To\_crossbar$ signals from the other $N-1$ interfaces at its own $NAND\_Input\_j$ ($j = 1$ to $N-1$).
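
The loop-breaking behaviour described above can be sketched with a small behavioural model. This is only an illustration under stated assumptions, not the circuit itself: the function name `settle` and the constant `V_UIC_LOW` (a 0.10 V pad voltage when a uIC drives LOW) are hypothetical, while the 0.15 V trip point, the 0.25 V level and the 1.8 V supply are the nominal values quoted in the text. Each pad sends LOW only when its voltage is below the trip point, and each internal nMOS is driven by the NAND of the signals received from the other pads.

```python
V_DD = 1.8         # WaferIC supply assumed in the text (V)
V_TRIP = 0.15      # trip voltage of the level-sensing buffer, V_OL_1 (V)
V_UIC_LOW = 0.10   # ASSUMED pad voltage when a uIC drives LOW (below V_OL_1)
V_NMOS_LOW = 0.25  # pad voltage when only the internal nMOS pulls LOW, V_OL_2 (V)


def settle(uic_drives_low, pads=None, max_iter=10):
    """Iterate the pad voltages of N interconnected interfaces to a fixed point.

    uic_drives_low[i] is True when uIC i pulls its NanoPad LOW.
    pads optionally gives the starting pad voltages (defaults to all released).
    """
    n = len(uic_drives_low)
    pads = [V_DD] * n if pads is None else list(pads)
    for _ in range(max_iter):
        # Level-sensing buffer: sends HIGH (True) unless the pad is below V_TRIP.
        to_crossbar = [v >= V_TRIP for v in pads]
        new_pads = []
        for i in range(n):
            others = [to_crossbar[j] for j in range(n) if j != i]
            nmos_on = not all(others)  # NAND of the other N-1 signals
            if uic_drives_low[i]:
                new_pads.append(V_UIC_LOW)   # strong external driver wins
            elif nmos_on:
                new_pads.append(V_NMOS_LOW)  # weak internal pull down
            else:
                new_pads.append(V_DD)        # pMOS pull up
        if new_pads == pads:
            break
        pads = new_pads
    return pads


# uIC1 drives LOW: every other pad follows at V_OL_2, which uICs read as LOW.
driven = settle([True, False, False])
# uIC1 releases: the loop is broken and all pads return to V_DD.
released = settle([False, False, False], pads=driven)
```

In this model a pad held at 0.25 V is still a valid logic LOW for the attached uIC (it is below 0.3 $V_{\rm DD}$), but it is not re-broadcast, which is why the pads recover once the external driver releases.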
The interface can be integrated into the WaferIC<sup>TM</sup> in each unit cell that can receive 24 signals from the crossbar [73]. Thus, the maximum N this interface can support is 25, which would require a 24-input NAND gate. However, NAND gates with fewer inputs can also be used, which would reduce the maximum number of interfaces that can be supported. The complete schematic of the interface is shown in Fig. 4.4. The number of inputs in the NAND gate is a design choice. The NAND gate delay should not be a problem as the operating speed of I<sup>2</sup>C buses is always less than 5 MHz [14]. It should be mentioned that when this I<sup>2</sup>C interface is activated, the regular I/O buffers will have to be deactivated.

Figure 4.4 Full schematic of the interface.

Currently, the multiplexer used in the crossbar is a 32-to-1 multiplexer which has six unused inputs. One such input can be used for the *To\_crossbar* signal. On the other hand, 24 signals from neighboring unit cells arrive at each unit cell. Those 24 signals are not needed "individually". Rather, the NAND output of those 24 signals is needed. Thus, those 24 signals are simply tapped and input into the NAND gate of the interface.

The unused inputs of the NAND gate must be kept at $V_{\rm DD}$. That can be accomplished in two ways. One is through the unused crossbar link [73]. In that case, the unit cell from which that link originates has to be configured in such a way that it will always send a HIGH signal through that link. One drawback of such a solution is that the link will be "blocked" and cannot be used for other routing purposes. Another drawback is that, due to fabrication defects, that link might not be able to provide a HIGH signal. A second solution, which removes all the drawbacks of the first one but is more expensive, is to use 24 configuration bits to enable/disable the 24 incoming signals from the crossbar links by (N)ANDing.
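
The second option can be illustrated with a short logic sketch. It is a hedged illustration only: the function name `pull_down_enable` and the exact gating (forcing a disabled input HIGH before the NAND) are assumptions about one way the 24 configuration bits could be combined with the incoming signals, not a description of the fabricated circuit.

```python
def pull_down_enable(signals, enable_bits):
    """Return True when the internal pull down nMOS should turn on.

    signals[j]     -- logic level arriving on crossbar link j (True = HIGH)
    enable_bits[j] -- configuration bit for link j (True = link is used)

    ASSUMPTION: a disabled link is forced HIGH so that it cannot affect
    the NAND output, whatever level the link actually carries.
    """
    if len(signals) != len(enable_bits):
        raise ValueError("one configuration bit is needed per incoming signal")
    effective = [s or not en for s, en in zip(signals, enable_bits)]
    return not all(effective)  # NAND of the gated inputs


# 24 links, only links 0 and 1 belong to the emulated bus.
enables = [True, True] + [False] * 22
idle = [True, True] + [False] * 22        # unused links carry arbitrary levels
one_low = [True, False] + [False] * 22    # link 1 reports a LOW on its pad
```

With this gating, an unused or defective link no longer needs to supply a HIGH, and the link stays available for other routing.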
For connecting the interface with the sixteen NanoPads belonging to each unit cell, 16 transmission gates and a 4-to-16 decoder with 4 configuration registers will be required. Summarizing, the interface will consist of the following circuitry:

1. The schematic shown in Fig. 4.4.
2. 16 transmission gates with one 4-to-16 decoder and 4 configuration registers.
3. 24 registers for selecting the input of the NAND gate.

Figure 4.5 Simulation result.

#### 4.4 Simulation Results

The proposed interface was laid out using a 0.18 $\mu$m CMOS technology. For post-layout simulation purposes, a 24-input NAND gate was used, which implies N=25. Six such interfaces were interconnected together and simulated. A circuit simulation result is presented in Fig. 4.5. In that simulation, the uIC connected to NanoPad1 outputs a LOW, while the other uICs keep their respective NanoPads released. As a result, the voltage of NanoPad1 drops below 150 mV while the voltage levels of NanoPad2-6 drop to 245 mV (for readability, only the voltages of NanoPad1 and NanoPad2 are shown in Fig. 4.5).

Monte Carlo dc simulations were performed (1000 runs) in different process corners to investigate the effects of transistor mismatches. Lower and upper bounds of both $V_{\rm OL\_1}$ and $V_{\rm OL\_2}$ are reported in Table 4.1. There is a safety margin (minimum difference between the lower bound of $V_{\rm OL\_2}$ and the upper bound of $V_{\rm OL\_1}$) of about 63 mV to prevent latching in different process corners.

The decoder and the registers were instantiated as VHDL descriptions. The area of the synthesized circuitry is shown in Table 4.2. Each unit cell has a dimension of 560 $\mu$m × 560 $\mu$m. Thus, the proposed interface will occupy less than 1% of the area of the unit cell.

The supply voltages of I<sup>2</sup>C bus devices currently on the market range from 1.5 V to 5 V. According to the specification of the I<sup>2</sup>C protocol, the minimum $V_{\rm IH}$ is 0.7 $V_{\rm DD}$ [14].
A HIGH level of 1.8 V can meet the specification for $V_{\rm IH}$ as long as 1.8 V > 0.7 $V_{\rm DD}$. That implies that the proposed interface can operate with standard I<sup>2</sup>C bus devices as long as the $V_{\rm DD}$ of those devices is less than approximately 2.5 V. On the lower end, the interface can operate with standard I<sup>2</sup>C bus devices as long as 0.245 V < 0.3 $V_{\rm DD}$, which implies that the device $V_{\rm DD}$ must be greater than 0.8 V.

Table 4.1 Corner Simulation

| Corner | $V_{\rm OL\_1}$ lower bound (mV) | $V_{\rm OL\_1}$ upper bound (mV) | $V_{\rm OL\_2}$ lower bound (mV) | $V_{\rm OL\_2}$ upper bound (mV) |
|----------|-----|-----|-----|-----|
| NOM | 68 | 153 | 230 | 245 |
| FAST | 70 | 167 | 248 | 279 |
| SLOW | 74 | 162 | 225 | 250 |
| FASTSLOW | 71 | 157 | 223 | 247 |
| SLOWFAST | 68 | 161 | 243 | 274 |

Table 4.2 Area of the interface.

| Blocks | Area ($\mu$m$^2$) |
|--------|------|
| Schematic shown in Fig. 4.4 | 500 |
| 16 transmission gates with one 4-to-16 decoder and 4 configuration registers | 900 |
| 24 registers for selecting the input of the NAND gate | 1700 |
| Total | 3100 |

#### 4.5 Conclusion

This paper presents an interface that can support devices exploiting the $I^2C$ bus structure. It can provide a very attractive solution for supporting the $I^2C$ bus in WaferBoard<sup>TM</sup>. A drawback of the proposed interface is that it can only support $I^2C$ bus devices as long as the device $V_{\rm DD}$ is less than 2.5 V. That range can be augmented by utilizing the 3.3 V power supply which is available in every unit cell of the WaferIC<sup>TM</sup>.

To the best of our knowledge, an interface circuit that mimics the behavior obtained with a metal line for an $I^2C$ bus has not been reported yet. There are some commercial $I^2C$ bus extension buffers named P82B96 [74] and PCA9600 [75].
These two buffers have some similarities with our proposed interface in their use of a double interpretation of voltage levels below 0.3 $V_{\rm DD}$ to avoid latching problems.

# CHAPTER 5 ARTICLE 2: AN INTERFACE FOR OPEN-DRAIN BI-DIRECTIONAL COMMUNICATION IN FIELD PROGRAMMABLE INTERCONNECTION NETWORKS

#### Summary of the Chapter

This chapter presents an enhanced version of the open-drain interface circuit and the interconnection topology that were presented in Chapter 4. The enhanced interconnection topology reduces the interconnection complexity from $\Theta(n^2)$ (in Chapter 4) to $\Theta(n)$, where n is the number of interconnected open-drain I/Os. The interface unit and the interconnection topology have been validated by an on-silicon implementation in a 0.13 $\mu$m CMOS technology. Designed according to the specification of the $I^2C$ protocol, the interface unit can support $I^2C$ Fast-mode Plus with 3.4 Mbit/s. The measurement/validation results were published in September 2015 as a journal paper in IEEE Transactions on Circuits and Systems I: Regular Papers. The published paper is reproduced in this chapter.

Title: An Interface for Open-Drain Bi-Directional Communication in Field Programmable Interconnection Networks

Wasim Hussain, Yves Blaquière, Yvon Savaria. IEEE Transactions on Circuits and Systems I: Regular Papers.

#### Abstract

An open-drain interface circuit and a corresponding interconnect topology are proposed to support bi-directional communication in a field programmable interconnection network (FPIN), similar to those implemented in field programmable gate arrays (FPGAs). The proposed interface can interconnect multiple nodes in an FPIN. With that interface, the interconnection network imitates the behaviour of open-drain (or open-collector) buses (e.g., those following the I<sup>2</sup>C protocol).
Thus, multiple open-drain I/Os from external integrated circuits (ICs) can be connected together through the FPIN by the proposed interface circuit. The interface, fabricated in a 0.13 $\mu$m CMOS technology, takes 65 $\mu$m × 22 $\mu$m per pin. Test results show that several instances of this interface can be interconnected through the proposed interconnect topology. We implemented and tested the topology combining six open-drain I/Os. The interconnect has propagation delays of approximately $0.26 \cdot n + 51$ ns and $0.26 \cdot n + 94$ ns for rising and falling edge transitions respectively, when each pin has a capacitance of $15\,\mathrm{pF}$, where n is the number of interconnected interfaces. These delays and the propagation delays of the FPIN limit the maximum number of interface circuits that can be interconnected for a given communication speed (I<sup>2</sup>C Fast-mode Plus with $3.4\,\mathrm{Mbit/s}$).

#### Keywords

FPGA, active reconfigurable platform, wafer scale integration (WSI), I<sup>2</sup>C bus, open collector bus, bi-directional bus.

#### 5.1 Introduction

Field programmable interconnection networks (FPINs) are the backbone of field programmable gate arrays (FPGAs), prototyping platforms [5–7,76], and Network-on-Chip architectures [8]. Most hardware functions can be emulated in FPGAs by re-programming their embedded FPIN [77,78]. Hardware systems used for logic emulation can enhance their capability and performance by having multiple FPGAs connected together [10]. Fig. 5.1 illustrates an example where an FPIN provides programmable interconnections between endpoints (I/O or configurable logic blocks) in an FPGA.

An active reconfigurable platform was proposed in [1]. It is intended to be an alternative to PCBs for providing interconnections among multiple integrated circuits (ICs) for testing and prototyping of an electronic system.
This active reconfigurable platform can be seen as an active silicon interposer with an interconnection network that can be dynamically configured like an FPGA. The active reconfigurable platform has a uni-directional switch box based FPIN that can be programmed by the user to interconnect the component ICs. It is primarily designed to provide digital interconnection between component ICs randomly and manually deposited on its active surface. However, this platform cannot support open-drain bi-directional buses where the direction is embedded in the protocol, as found in the I<sup>2</sup>C protocol and its derivatives [13–16].

Open-drain connections have the unique ability to *simultaneously* support multiple drivers on a single *physical* node. Unlike CMOS driver logic, there is no possibility of an undefined state in open-drain connections. Indeed, no matter how many I/Os are connected to the bus, if only one of them outputs a LOW on the bus, the bus will become LOW. Open-drain connections are not advantageous inside ICs, due to their static power dissipation and relatively low speed. However, they are commonly used to interconnect several ICs, because they usually require fewer IC pins for serial communications between ICs. Multi-master bi-directional buses cannot be implemented by CMOS drivers, because having multiple CMOS drivers driving a single physical node can give rise to undefined voltage levels on the bus. By contrast, multi-master bi-directional buses can be realized by open-drain connections, e.g. I<sup>2</sup>C and its derivatives [15,16].

Figure 5.1 Generic model of an FPIN in an FPGA.

This work was motivated by the observation that FPINs based on uni-directional switch boxes cannot support open-drain bi-directional connections. This paper presents an interface for FPINs to support protocols that demand open-drain (or open-collector) connections.
The proposed interface can link multiple external signals through the FPIN, while imitating the behaviour of open-drain (or open-collector) connections. That interface allows connecting together an arbitrarily large number of pins, subject to delay limitations.

To the best of our knowledge, no comparable interface circuit mimicking the behaviour of an open-drain connection has been reported in the literature. The closest existing circuits that we found are the P82B96 [23] and PCA9600 [24], two commercial $I^2C$ bus extension buffers. Even though these circuits are not equivalent to the proposed interface, they have some similarity in their use of a double interpretation of voltage levels below $0.3\,V_{\rm DD}$ to avoid a state-latching phenomenon (explained in Sec. 5.3.2).

Sec. 5.2 provides some background on an FPIN-based active reconfigurable platform and open-drain buses. Sec. 5.3 describes the proposed interface and presents a delay model that can be used to design the interface unit according to communication speed specifications. Sec. 5.4 presents measurement results from a test-chip that was implemented. Finally, Sec. 5.5 concludes the work by summarizing our main contributions and key observations.

#### 5.2 Background

# 5.2.1 Active Reconfigurable Platform [1]

The core of the active reconfigurable platform is a wafer scale IC upon which component ICs are to be deposited. The surface of the wafer scale IC has a dense array of very fine (tens of microns) conducting pads acting as configurable I/Os (CIOs), as shown in Fig. 5.2. An FPIN is embedded in the wafer scale IC. The FPIN can be configured, similar to an FPGA, to connect any two CIOs. User specified ICs are to have physical contacts with the CIOs and communicate through the embedded FPIN. Each CIO has its own configurable I/O buffers.
If a CIO is to operate as an input, then the respective I/O buffer is configured as an input and this buffer receives the signal from a *source* IC and propagates it through the FPIN to the destination CIO. The destination CIO's I/O buffer is configured as an output buffer and it propagates the signal to the corresponding *destination* IC.

### 5.2.2 Open-drain Connection Based Communication

The I<sup>2</sup>C protocol is a popular communication standard. It is a bi-directional multi-master serial bus developed by NXP Semiconductors (formerly Philips Semiconductors). It uses open-drain connections. I<sup>2</sup>C is used in various control architectures such as the System Management Bus (SMBus), the Power Management Bus (PMBus), the Intelligent Platform Management Interface (IPMI), the Display Data Channel (DDC) and the Advanced Telecom Computing Architecture (ATCA) [13–16].

I<sup>2</sup>C uses two bi-directional open-drain (or open-collector) lines named Serial Data Line (SDA) and Serial Clock Line (SCL), shown in Fig. 5.3. SDAs and SCLs of all components are respectively connected together. Both lines have external pull-up resistors. The I<sup>2</sup>C protocol has no explicit signal to specify the direction of data transfer in the bus. Rather, there are some rules embedded in the protocol, like *clock synchronization*, *arbitration* and *clock stretching* [14], by which all the ICs connected to a bus determine when they are supposed to write into the bus, read from the bus or stay idle. All those rules are based on the "wired-AND" property of open-drain connections.

#### 5.3 Proposed Architecture of the Bi-Directional Interface

An open-drain bi-directional interface unit is proposed here by the authors. It is designed to meet the following criteria:

- Be compatible with a uni-directional switch box based FPIN. Minimize modifications to an existing FPIN, i.e. the interface circuit should be integrated at the I/Os of the FPIN;
- Imitate the behaviour of a single metal line for open-drain (or open-collector) connection where the direction of the signal is automatically detected;
- Allow interconnecting several open-drain I/Os together.

Figure 5.2 Hierarchical description of the active reconfigurable platform, from system level to configurable I/O (CIO).

Figure 5.3 Example of an I<sup>2</sup>C-bus configuration.

Each interface unit has an input and an output through which several interface units can be interconnected in a pre-defined interconnection topology. A bi-directional interface based on a star topology was previously proposed by the authors [69]. In that topology, each interface unit directly communicates with all the others. This leads to the simplest design when a small number of pins need to be connected. Direct connections also minimize delays. However, the star topology has an interconnection complexity of $\Theta(n^2)$ for n interface units. For instance, the case where five interface units are interconnected in a star topology is shown in Fig. 5.4(a). It shows that each interface unit is directly connected with the other four. In the case of the active reconfigurable platform [1], these connections are done through the FPIN. In this platform, the logic connected to a pin can receive at most 24 incoming signals through the FPIN, implying that at most 25 interface units can be interconnected together.

As a $\Theta(n^2)$ complexity becomes very expensive when n grows, and to overcome the limit on the value of n due to the fan-in of the unit cells, a topology with a $\Theta(n)$ complexity was developed and is reported in the rest of this paper. That new interconnection topology is structured as a ring, as shown in Fig. 5.4(b).

Mimicking the behaviour of an open-drain (or open-collector) connection through a digital FPIN may lead to a state-latching phenomenon. This can be explained by a minimal example of two interface circuits defining a minimal solution proposed in Sec. 5.3.1 & 5.3.2.
That minimal solution was enhanced and adapted to a star topology by the authors in [69]. In this paper, the minimal solution described in Sec. 5.3.1 & 5.3.2 is enhanced and adapted to the ring interconnection topology in Sec. 5.3.3 & 5.3.4.

#### 5.3.1 Working Principle of the Bi-Directional Interface

When a group of open-drain drivers (ODDs) is to be interconnected by an FPIN, instead of being physically connected by a wire, each ODD output has a physical connection with the BDIO node of only one interface unit. BDIO denotes the physical node that acts as the bi-directional input and output node of the interface unit. Thus, each interface unit must be able to sense the voltage on the respective ODD, in order to interpret the information it conveys and send it to the other interface units through the FPIN.

A tentative schematic of the interface unit is shown in Fig. 5.5(a). Instead of a pull-up resistor (used in I<sup>2</sup>C [14]), a pull-up pMOS is used ($V_{\rm BIAS}$ is a biasing voltage that enables the pull-up pMOS). As will be shown, when such interface units are interconnected through an FPIN, the resulting group of I/Os can emulate an open-drain bus if the *LOW Detector* and *ODD LOW Decoder* modules are suitably designed.

In order to understand the rationale of how the proposed circuit operates, let us first consider the case where only two such interface units are connected through an FPIN, as shown in Fig. 5.5(b). In that case, each interface unit's *ODD LOW Decoder* receives signals from the other interface unit through the FPIN to determine whether the other interface unit's ODD is outputting a LOW. The *LOW Detector* module detects the voltage level at its own BDIO node and sends that information to the other interface unit. When there are only two interconnected interface units, a NOT-gate can serve as the *ODD LOW Decoder* and a simple digital buffer can serve as the *LOW Detector*.
When none of the ODDs outputs LOW, the voltage levels of both BDIOs are held at $V_{DD}$ by their respective pull-up pMOS. Thus, both interface units send HIGH to each other and the respective internal pull-down nMOS remain OFF, in which case the BDIOs continue to be held at $V_{DD}$.

Standard I<sup>2</sup>C drivers can sink several milliamperes (Table 5.1). The pull-up pMOS (M<sub>PU</sub> in Fig. 5.5(a)) is sized so that the pull-up current is smaller (approximately one-third) than the pull-down current of standard open-drain drivers (e.g. the I<sup>2</sup>C protocol and its derivatives). Thus, when one of the ODDs outputs a LOW, the corresponding BDIO becomes LOW.

Figure 5.4 Each *circle* represents an interface unit circuit.

Figure 5.5 Development of the bi-directional interface unit circuit. (a) Proposed bi-directional interface. (b) Two interface units interconnected together through an FPIN. (c) The *LOW Detector* to remove the latching.

Let us assume ODD1 outputs LOW in Fig. 5.5(b) and BDIO<sub>1</sub> is made LOW. It is also assumed that ODD2 is not outputting a LOW. Since BDIO<sub>1</sub> is LOW, a LOW logic value will be sent through the FPIN to *Interface Unit-2*. That LOW is inverted to HIGH by the NOT-gate, which turns ON the internal pull-down nMOS of *Interface Unit-2*. Thus, BDIO<sub>2</sub> is made LOW, even though ODD2 is not driving it LOW. The opposite would happen if ODD2, instead of ODD1, output LOW.

#### 5.3.2 State-Latching Phenomenon

The bi-directional interface shown in Fig. 5.4 and the minimal circuit example in Fig. 5.5 suffer from a *state-latching* problem. Indeed, when BDIO<sub>2</sub> becomes LOW, it will also send a LOW signal through the FPIN to *Interface Unit-1*, and the internal pull-down nMOS of *Interface Unit-1* will also turn ON. Thus, when ODD1 turns OFF, the voltage level of BDIO<sub>1</sub> will be held LOW by the internal pull-down nMOS of *Interface Unit-1* and will not be pulled up to $V_{\rm DD}$.
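The latching sequence described above can be reproduced with a small logic-level model. The sketch below is only an illustration under simplifying assumptions: voltages are reduced to booleans, the FPIN is modelled as a one-step delay, and the names `simulate`, `pd1_on` and `pd2_on` are hypothetical and do not come from the original design.

```python
def simulate(odd1_schedule):
    """Logic-level model of two naive interface units (buffer + NOT-gate).

    odd1_schedule: iterable of booleans, True while ODD1 pulls its BDIO LOW.
    ODD2 never pulls LOW. Returns [(bdio1_low, bdio2_low), ...], one per step.
    """
    pd1_on = pd2_on = False  # internal pull-down nMOS of each unit
    trace = []
    for odd1_low in odd1_schedule:
        # A BDIO is LOW if its ODD or its internal pull-down sinks current.
        bdio1_low = odd1_low or pd1_on
        bdio2_low = pd2_on
        trace.append((bdio1_low, bdio2_low))
        # Each unit forwards its BDIO level through the FPIN; the NOT-gate at
        # the receiver turns the pull-down ON when the peer reports LOW.
        pd1_on, pd2_on = bdio2_low, bdio1_low
    return trace


# ODD1 idle, then LOW for three steps, then released for three steps.
trace = simulate([False, True, True, True, False, False, False])
for step, (b1, b2) in enumerate(trace):
    print(step, "BDIO1 LOW" if b1 else "BDIO1 HIGH", "BDIO2 LOW" if b2 else "BDIO2 HIGH")
```

In this model both BDIO nodes remain LOW after ODD1 is released, because each internal pull-down keeps the other one enabled.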
The approach taken to solve that latching problem in [69] was to break the latching loop. This was done by defining two distinct voltage levels for the LOW logic value on the BDIOs (Table 5.2). In the I<sup>2</sup>C protocol, $V_{\rm IL}$ (the allowed maximum voltage level to represent a LOW logic value) is $0.3 \times V_{\rm DD}$ [14]. At this point, we introduce two reference voltages, named $V_{\rm REF1}$ and $V_{\rm REF2}$, both of which are below $0.3 \times V_{\rm DD}$ (these two voltages are generated by a resistor divider, detailed in Fig. 5.10). When the BDIO is pulled down by an ODD, the voltage level is pulled down to a value that is below $V_{\rm REF1}$. The internal pull-down nMOS (and pull-up pMOS) is designed in such a way that when it pulls the BDIO down, the voltage level settles at $V_{\rm REF2}$, which is above $V_{\rm REF1}$. In that case, a comparator circuit such as the one proposed in Fig. 5.5(c) can distinguish between a LOW logic value driven by an ODD and one driven by the internal pull-down nMOS. However, a standard bi-directional bus would interpret both voltages as a LOW logic value, i.e. $V_{\text{REF1}} < V_{\text{REF2}} < V_{\text{IL}}$. This allows breaking the "logical loop" that would otherwise result from the circuit in Fig. 5.5(b).

The desired functionality is obtained with a differential pair ($M_{1,2,5,6,9}$), shown in Fig. 5.5(c). The second differential pair ($M_{3,4,7,8,10}$) is used only for amplification and level shifting, to make the whole circuit robust against process variations. When the voltage at the BDIO is below $V_{\text{REF1}}$ ($V_{\text{REF1}}$ can be considered as the tripping voltage of the differential pair), the *LOW Detector* will send LOW, else it will send HIGH to the other interface units.

Table 5.1 Pull-down current of open-drain buses.

| Bus | Sink current (mA) | Condition |
|-----|-------------------|-----------|
| I<sup>2</sup>C Standard-mode | 3 [14] | $V_{\rm OL} = 0.4$ V |
| I<sup>2</sup>C Fast-mode | 3 [14] | $V_{\rm OL} = 0.4$ V |
| SMBus | 4 [16] | $V_{\rm OL} = 0.4$ V |

Table 5.2 Different states with respect to the voltage level of the BDIO node.

| Logic | State | Voltage level |
|-------|-------|---------------|
| LOW | ODD LOW | $V_{\rm BDIO} < V_{\rm REF1}$ |
| LOW | Other ODD | $V_{\rm REF1} < V_{\rm BDIO} < V_{\rm REF2}$ |
| HIGH | All ODD OFF | $V_{\rm REF2} < V_{\rm BDIO}$ |

Let us reconsider the circuit of Fig. 5.5(b) where the circuit of Fig. 5.5(c) is used as the *LOW Detector*. Assuming ODD1 outputs LOW to BDIO<sub>1</sub>, the voltage of BDIO<sub>1</sub> drops below $V_{\rm REF1}$ and *Interface Unit-1* sends a LOW signal to the *ODD LOW Decoder* of *Interface Unit-2* through the FPIN. As a result, the internal pull-down nMOS of *Interface Unit-2* is turned ON and the voltage level of BDIO<sub>2</sub> is pulled down to $V_{\rm REF2}$, which is interpreted as LOW by ODD2. However, since that voltage level is not below $V_{\rm REF1}$, *Interface Unit-2* does not send LOW to *Interface Unit-1* and the internal pull-down driver of *Interface Unit-1* does not turn ON. Subsequently, when ODD1 releases BDIO<sub>1</sub>, the voltage level of BDIO<sub>1</sub> will be pulled up to $V_{\rm DD}$ without any ambiguity, and the state-latching phenomenon is avoided. Thus, the two interconnected interface units imitate the behaviour of an open-drain bus, even though internally the BDIOs are loop-connected through the FPIN rather than by a direct metal line.

#### 5.3.3 The Ring-Interconnection Network of the Bi-Directional Interface

Similar to the minimal example in Sec. 5.3.1 & 5.3.2, each interface unit in a ring (Fig. 5.4(b)) can be in one of three conditions (see Table 5.2) depending on whether:

1. the ODD directly connected to the interface drives LOW;
2. another ODD connected to an interface that is part of the same network drives LOW;
3. none of the ODDs drives its interface LOW.

Thus, the same *LOW Detector* module of Fig. 5.5(c) can be used in each interface unit to differentiate between a LOW logic value driven by an ODD and one driven by the internal pull-down nMOS. However, in a ring-interconnected topology, each interface unit can communicate with only one other interface unit if implemented as shown in Fig. 5.5. Hence, the *ODD LOW Decoder* module has to be enhanced to communicate these three conditions to the *next* interface unit in a ring. Considering the three conditions that each interface must support and communicate, at least *two* bits of information must be communicated in a digital implementation to unambiguously differentiate between the three possible conditions. A consideration that influences the solution proposed next is the fact that the prototyping platform [1] for which this is elaborated offers a very large number of configurable digital interconnects.

Figure 5.6 Development of pseudo-ring interconnection topology. Each *circle* represents an interface unit circuit and is labelled IU#.

A possible first step toward a feasible ring-structure solution is to establish two separate rings, as shown in Fig. 5.6(a). For clarity, each interface unit participating in an emulated bi-directional bus is labelled as IU#. In the proposed design, a first ring (dashed ring) could communicate whether one or more of the ODDs are outputting a LOW, while the second ring (solid ring) would act upon the information broadcast by the first ring, to propagate an internal pull-down driver activation signal accordingly. As the two rings constitute closed loops, if any ODD connected to an interface unit (Fig. 5.6(a)) outputs a LOW, assuming that all interface units are *exactly* the same, that information would be sent to the subsequent interface units and it would indefinitely 'circulate' through the two rings.
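The circulation problem of a closed ring can be illustrated with a minimal sketch. It models only a single ring of identical units, each forwarding the AND of its own detector bit and its predecessor's output, with one update per step; this simplified unit behaviour and the function names `ring_step` and `run` are assumptions made for illustration, not the circuit of Fig. 5.6(a).

```python
def ring_step(outputs, i3):
    """One synchronous update of a closed ring of identical AND-forwarding units.

    outputs[k] is the value unit k currently sends to unit k+1 (True = HIGH).
    i3[k] is False while the ODD attached to unit k pulls its BDIO LOW.
    """
    n = len(outputs)
    # Unit 0 reads the output of the last unit, which closes the ring.
    return [i3[k] and outputs[k - 1] for k in range(n)]


def run(n, low_steps, release_steps):
    """Drive unit 0 LOW for low_steps updates, then release every ODD."""
    outputs = [True] * n
    for _ in range(low_steps):
        outputs = ring_step(outputs, [False] + [True] * (n - 1))
    for _ in range(release_steps):
        outputs = ring_step(outputs, [True] * n)
    return outputs


print(run(n=5, low_steps=5, release_steps=20))  # long LOW: every output stays LOW
print(run(n=5, low_steps=1, release_steps=20))  # short LOW: one LOW keeps circulating
```

In both runs the ring never returns to the all-HIGH state after the ODD is released, which is the behaviour that motivates breaking the ring.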
Such endless circulation would give rise to a state-latching phenomenon conceptually similar to the one described in Sec. 5.3.2.

A possible second step toward a practical solution is to break the two rings, as shown in Fig. 5.6(b), to prevent this unwanted endless 'circulation'. Since the second ring is to act upon the information propagated by the first ring, the two broken rings must be connected together. That role is played by an additional interface unit, called the Master unit (labelled MU in Fig. 5.6(c)). The resulting topology, shown in Fig. 5.6(c), is called a pseudo-ring. Assuming suitable logic and interfacing circuits can be elaborated, this solution, first proposed here, would offer $\Theta(n)$ interconnection complexity and $\Theta(1)$ *ODD LOW Decoder* complexity.

In this topology, each interface unit, with the exception of the MU, is connected to an external ODD through the corresponding BDIO. The target prototyping platform is a completely regular structure, thus our objective was to come up, if possible, with a design where the MU could be derived by configuring the same logic as in the other IUs differently. This was found possible if, as in Fig. 5.6(c), IU1 and MU receive a predetermined logic value at their $I_1$ and $I_2$ inputs respectively. The dashed ring path passing through the $I_1$ and $O_1$ terminals of all interface units, from IU1 to MU, propagates the information whether one or more ODDs are outputting a LOW to their respective BDIO. The solid path passing through $I_2$ and $O_2$ forms a signal path propagating from MU to IU5 in Fig. 5.6(c). The $I_2$-$O_2$ path propagates the internal pull-down driver activation signal. MU acts as a bridge between these two signal paths. Each interface unit has an internal bit (called $I_3$) that becomes LOW when the voltage level at the respective BDIO drops below $V_{\text{REF1}}$.
The voltage level drops below $V_{\text{REF1}}$ if and only if the external ODD pulls it down, while it drops to $V_{\text{REF2}}$ if the internal driver pulls it down. The logical relations between these binary variables in each interface unit are

$$O_1 = I_3 \cdot I_1 \quad \text{and} \quad O_2 = I_1 \cdot I_2 \tag{5.1}$$

Applying Eq. 5.1 to Fig. 5.6(c), we get the logical signal flow diagram of Fig. 5.7(a). From Fig. 5.7(a), we get for any $1 \le n \le 5$ (a subscript $i,j$ denotes variable $i$ belonging to IU$j$, and a subscript MU denotes a variable belonging to module MU),

$$O_{1,n} = I_{3,1} \cdot I_{3,2} \dots I_{3,n-1} \cdot I_{3,n} \tag{5.2}$$

Figure 5.7 Logical signal flow diagram. (a) Pseudo-ring interconnection topology. (b) Modified pseudo-ring interconnection topology. (c) Signal flow of queue-interconnection topology. The *LOW Detector* module of each interface unit (IU#) is labelled LD. Each BDIO node belongs to the respective interface unit (IU) and represents a distinct physical node.

Thus, it can be seen that $I_{1,MU}$ (= $O_{1,5}$ in Fig. 5.7(a)) is the equivalent "wired-AND" logic implementation of an open-drain connection. Applying Eq. 5.1 to Fig. 5.7(a), with $I_{2,MU}$ and $I_{1,1}$ held at the predetermined HIGH value, we get for any $1 \le n \le 5$,

$$O_{2,n} = I_{3,1} \cdot I_{3,2} \dots I_{3,4} \cdot I_{3,5} \tag{5.3}$$

Thus, the $I_2$-$O_2$ path propagates the "wired-AND" logic value to all interface units and $O_2$ can be used to activate/deactivate their respective internal pull-down drivers. Eq. 5.3 also proves that when all the ODDs output a HIGH logic value to their respective BDIOs by releasing the BDIO nodes, the $I_2$-$O_2$ path will unequivocally begin to propagate a HIGH logic value and hence the aforementioned state-latching phenomenon is prevented.

The $I_2$-$O_2$ path propagates the accumulated AND of all $I_3$ and hence the AND operation with $I_1$ along the $I_2$-$O_2$ path does not change the logical value that propagates along that path (Eq. 5.3). Thus, using a digital buffer in the $I_2$-$O_2$ path would have sufficed.
However, the interface unit has been developed to be integrated in each unit cell of the active reconfigurable platform [1]. Remarkably, the same cell can also be used as the Master unit (MU in Fig. 5.7) when necessary by utilizing an unused interface unit from an unused unit cell. Hence, instead of a digital buffer, an AND-gate was used in the $I_2$-$O_2$ path.

At first glance, using MU may seem redundant, because we could have connected $O_{1,5}$ to $I_{2,1}$ directly. However, using a Master unit (MU) gives us the ability to interconnect two such networks. This allows halving the worst-case propagation delays (analysis elaborated in Sec. 5.3.4).

Figure 5.8 Logical signal flow diagram of dual-queue interconnection topology. Two individual queue networks are joined together. Each queue network has five interface units: four interface units (labelled IU#) connected to external ODDs and one *Master unit* (labelled MU). The *LOW Detector* module of each interface unit (IU#) is labelled LD.

#### 5.3.4 Queue and Dual-Queue Interconnection Topologies

The previous design outlined in Fig. 5.6(c) achieves the desired $\Theta(n)$ interconnect complexity, but the signal goes around the loop twice. This section calculates the propagation path length and hence shows how the corresponding delay can be halved.

Indeed, according to Eq. 5.3, the AND operation with $I_1$ along the $I_2$-$O_2$ path does not change the logical value that propagates along that path. The functionality would thus be preserved if the direction of signal propagation on the $I_2$-$O_2$ path is reversed to clockwise, as shown in Fig. 5.7(b). If the ring-like structure of Fig. 5.7(b) is 'unrolled', it becomes a queue, as shown in Fig. 5.7(c). This organization is called the queue interconnection topology. Similar to the pseudo-ring topology, whenever one or more ODDs output a LOW, that LOW propagates through the $I_1$-$O_1$ path and MU passes that LOW to the $I_2$-$O_2$ path.
The unused $I_2$ of MU can also be used to propagate a LOW to the $I_2$-$O_2$ path from the $I_1$-$O_1$ path of another queue network to activate the internal pull-down drivers. Hence, the unused $I_2$ and $O_1$ of MU in a queue network can be used to connect two individual queue networks together, as shown in Fig. 5.8. If one or more ODDs of Queue Network-1 output a LOW, that LOW will propagate through the $I_1$-$O_1$ path of Queue Network-1 and will then pass through MU1 to the $I_2$-$O_2$ path of Queue Network-1 and then to the $I_2$-$O_2$ path of Queue Network-2. Similarly, if one or more ODDs of Queue Network-2 output a LOW, that LOW will propagate through the $I_1$-$O_1$ path of Queue Network-2 and will pass through MU2 to the $I_2$-$O_2$ path of Queue Network-2 and then to the $I_2$-$O_2$ path of Queue Network-1. Thus, two individual $I_1$-$O_1$-$I_2$-$O_2$ signal paths are established by MU1 and MU2 that propagate LOW and HIGH to each other when necessary and hence imitate the "wired-AND" logic of an open-drain connection.

In a queue interconnection topology, the signal propagates through the entire length of the $I_1$-$O_1$ and $I_2$-$O_2$ paths (thick grey line in Fig. 5.7(c)). By contrast, in the dual-queue interconnection topology, interface units are divided equally in two groups. In this case, the signal propagates through the individual $I_1$-$O_1$ and $I_2$-$O_2$ paths only (solid and dotted thick grey lines in Fig. 5.8). After reaching MU1 in Fig. 5.8, the signal propagates simultaneously along the $I_2$-$O_2$ path of Queue Network-1 (dotted line) and the $I_2$-$O_2$ path of Queue Network-2 (solid line). Thus, the worst-case propagation delay is halved in the dual-queue interconnection topology.
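The logical behaviour of the dual-queue topology can be checked with a short sketch built only on Eq. 5.1. Several details are assumptions made for illustration: the Master units are given a constant HIGH $I_3$ (they have no ODD), the cross-connection is taken as MU1's $O_1$ feeding MU2's $I_2$ and vice versa, and the order in which the $I_2$-$O_2$ path visits the units is chosen arbitrarily since it does not affect the logic values. All function names are hypothetical.

```python
from itertools import product


def unit(i1, i2, i3):
    """ODD LOW Decoder of Eq. 5.1: returns (O1, O2). True = HIGH."""
    return i3 and i1, i1 and i2


def queue_o1(i3):
    """I1-O1 path of one queue: returns each IU's I1 and the value reaching the MU."""
    i1_values, carry = [], True  # IU1 receives a fixed HIGH at I1
    for bit in i3:
        i1_values.append(carry)
        carry, _ = unit(carry, True, bit)
    return i1_values, carry


def queue_o2(i1_values, mu_o2):
    """I2-O2 path, walked from the MU back toward IU1: returns O2 per IU."""
    o2, carry = [None] * len(i1_values), mu_o2
    for k in reversed(range(len(i1_values))):
        _, carry = unit(i1_values[k], carry, True)
        o2[k] = carry
    return o2


def dual_queue(i3_a, i3_b):
    """Two queues joined through their Master units (MU I3 assumed HIGH)."""
    i1_a, and_a = queue_o1(i3_a)
    i1_b, and_b = queue_o1(i3_b)
    # Assumed wiring: MU1.O1 feeds MU2.I2 and MU2.O1 feeds MU1.I2.
    mu1_o1, _ = unit(and_a, True, True)
    mu2_o1, _ = unit(and_b, True, True)
    _, mu1_o2 = unit(and_a, mu2_o1, True)
    _, mu2_o2 = unit(and_b, mu1_o1, True)
    return queue_o2(i1_a, mu1_o2) + queue_o2(i1_b, mu2_o2)


# Exhaustive check for 4 + 4 interface units: every O2 equals the wired-AND.
for bits in product([True, False], repeat=8):
    assert dual_queue(bits[:4], bits[4:]) == [all(bits)] * 8
print("dual-queue O2 matches the wired-AND for all 256 input patterns")
```

Under these assumptions every unit's $O_2$ equals the AND of all $I_3$ bits, so the internal drivers are released as soon as every ODD has released its BDIO node.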
#### 5.3.5 Proposed Bi-Directional Interface

Based on the previous proposals, considerations and discussions, it is now possible to propose an implementation for a bi-directional interface that can interconnect several bi-directional open-drain I/Os in a pseudo-ring, queue or dual-queue topology through an FPIN. The schematic of the interface unit is shown in Fig. 5.9.

According to Eq. 5.3, $O_2$ (or $\overline{O_2}$) propagates the "wired-AND" logic value. Hence, $O_2$ is used to activate/deactivate the *Unity-gain Buffer* in Fig. 5.9. In fact, $\overline{O_2}$ is used because the *Unity-gain Buffer* is activated when a HIGH value is applied as $BUFF_{EN}$. Upon activation, the *Unity-gain Buffer* propagates $V_{REF2}$ to the BDIO node. When deactivated, the *Unity-gain Buffer* in Fig. 5.9 outputs 3.3 V through a pull-up pMOS to the BDIO node. Hence, the *Unity-gain Buffer* acts as the internal pull-down driver as well as the pull-up pMOS. When the external ODD outputs a LOW, the voltage at the BDIO falls below $V_{\text{REF1}}$ and $I_3$ is made LOW by the *LOW Detector*. The *ODD LOW Decoder* represents the logical behaviour among $I_1$, $I_2$, $I_3$, $O_1$, and $O_2$ of the interface units shown in Figs. 5.7, 5.8 and 5.9. Hence, the interface unit of Fig. 5.9 can be interconnected in the pseudo-ring, queue or dual-queue interconnection topologies and will imitate the "wired-AND" logic of open-drain buses.

#### 5.3.6 Propagation Delay of Dual-Queue Interconnection Topology

A propagation delay model is developed for the dual-queue topology in this sub-section. Only this topology is analyzed because it has the lowest (best) propagation delay. Delay models can be developed similarly for the pseudo-ring and queue topologies.

At this point, we establish a notation system to denote delays and rise/fall times associated with various circuit components or path segments in the entire propagation path. $\tau$ is used to denote various delays and t is used to denote rise/fall times.
Subscripts have two indices. The first index denotes the logic value to which the delay corresponds. The second index denotes the interface unit or path segment to which the delay or rise/fall time belongs. For example, the worst-case propagation delays for the LOW and HIGH logic values are denoted by $\tau_{L,\text{wc}}$ and $\tau_{H,\text{wc}}$ respectively.

The worst-case signal propagation path of the dual-queue network is shown by the solid thick grey line in Fig. 5.8. The path begins at IU1 and ends at IU8 (IUn in general). The worst-case propagation delay can be divided in three delay segments:

1. The first delay segment is associated with the interface unit (IU1) to detect the voltage transition at the BDIO node and *encode* that information to be sent to other interface units. It is called the detection delay ($\tau_{L,\text{det}}$ or $\tau_{H,\text{det}}$).
2. The second delay segment is associated with the transmission of that encoded information through the $I_1$-$O_1$-$I_2$-$O_2$ path. It is called the transmission delay ($\tau_{L,\text{tr}}$ or $\tau_{H,\text{tr}}$).
3. The third delay segment is associated with the decoding of that information and subsequent activation of the internal pull-down driver of IU8. It is called the activation delay ($\tau_{L,\text{act}}$ or $\tau_{H,\text{act}}$).

Thus, the worst-case propagation delays for the dual-queue topology can be expressed as

$$\tau_{L,\text{wc}} = \tau_{L,\text{det}} + \tau_{L,\text{tr}} + \tau_{L,\text{act}} \tag{5.4a}$$

$$\tau_{H,\text{wc}} = \tau_{H,\text{det}} + \tau_{H,\text{tr}} + \tau_{H,\text{act}} \tag{5.4b}$$

Figure 5.9 Schematic of the interface unit (IU).

Table 5.3 Delays and rise/fall times of the interface circuit.

| Signal propagation path | HIGH logic value | LOW logic value |
|-------------------------|------------------|-----------------|
| $I_1 \Rightarrow O_1$ | $\tau_H^{I_1 \Rightarrow O_1}$ | $\tau_L^{I_1 \Rightarrow O_1}$ |
| $I_3 \Rightarrow O_1$ | $\tau_H^{I_3 \Rightarrow O_1}$ | $\tau_L^{I_3 \Rightarrow O_1}$ |
| $I_1 \Rightarrow O_2$ | $\tau_H^{I_1 \Rightarrow O_2}$ | $\tau_L^{I_1 \Rightarrow O_2}$ |
| $I_2 \Rightarrow O_2$ | $\tau_H^{I_2 \Rightarrow O_2}$ | $\tau_L^{I_2 \Rightarrow O_2}$ |
| $BDIO \Rightarrow I_3$ | $\tau_H^{BDIO \Rightarrow I_3}$ | $\tau_L^{BDIO \Rightarrow I_3}$ |
| $\overline{O_2} \Rightarrow BDIO$ | $t_{\rm r}^{\overline{O_2} \Rightarrow BDIO}$ | $t_{\rm f}^{\overline{O_2} \Rightarrow BDIO}$ |
| $ODD \Rightarrow BDIO$ | N/A | $t_{\rm f}^{ODD \Rightarrow BDIO}$ |
| $O_{1,2} \Rightarrow I_{1,2}$ | $\tau_{H,\text{FPIN}}$ | $\tau_{L,\text{FPIN}}$ |

Each of the aforementioned three delay segments consists of one or multiple circuit component delays. For example, when an ODD is activated, it takes some time to bring down the voltage level from HIGH to LOW. Subsequently, the *LOW Detector* (LD in Fig. 5.8) will require some time to detect the LOW logic value at the BDIO node and produce a LOW logic value at $I_3$. After that, the LOW logic value propagates through the $I_1$-$O_1$-$I_2$-$O_2$ path. This path consists of the AND-gates of the *ODD LOW Decoders*. All these AND-gate delays are categorized in Table 5.3. The definitions of these delays are gradually introduced in the following explanation.

At this point, we introduce the signal propagation path as a superscript in the delay term to denote the component to which the delay term belongs. For example, $\tau_{L,IU2}^{I_1 \Rightarrow O_1}$ denotes the LOW logic value propagation delay of the AND-gate from $I_1$ to $O_1$ in IU2. Since Table 5.3 categorizes the various circuit component delays, the second index in the subscript of the delay or rise/fall time is left empty there.

#### LOW Logic Propagation Delay

The worst-case propagation path for the LOW logic value begins from the ODD connected to IU1.
The first delay is the time ($t_{\rm f}^{ODD\Rightarrow BDIO}$) required by the ODD to bring the voltage level from HIGH to LOW at the BDIO node of IU1. $t_{\rm f}^{ODD\Rightarrow BDIO}$ is defined as the time required by the ODD to bring the voltage level of the BDIO node from $V_{\rm DD}$ to $V_{\rm REF1}$. Then the *LOW Detector* (LD in Fig. 5.10) of IU1 will require some time ($\tau_L^{BDIO\Rightarrow I_3}$) to detect the LOW logic value at the BDIO node and produce a LOW logic value at $I_3$. $\tau_L^{BDIO\Rightarrow I_3}$ is measured only between the crossing of $V_{\rm REF1}$ by the voltage of the BDIO node and the HIGH-to-LOW transition of $I_3$, because the $V_{\rm DD}$ to $V_{\rm REF1}$ transition depends on the ODD (external I<sup>2</sup>C driver). Then the LOW logic value will propagate through the AND-gate of IU1 from $I_3$ to $O_1$. Together, these three delays constitute $\tau_{L,\rm det}$.

$$\tau_{L,\text{det}} = t_{f,\text{IU1}}^{ODD \Rightarrow BDIO} + \tau_{L,\text{IU1}}^{BDIO \Rightarrow I_3} + \tau_{L,\text{IU1}}^{I_3 \Rightarrow O_1} \tag{5.5}$$

Then the LOW logic value begins to propagate from IU1 along the $I_1$-$O_1$ signal path through the FPIN to MU1, then to MU2, and then along the $I_2$-$O_2$ signal path through the FPIN to IUn (IU8 in Fig. 5.8). These delays constitute the worst-case transmission delay ($\tau_{L,\mathrm{tr}}$). Thus,

$$\tau_{L,\text{tr}} = \sum_{k=2}^{\frac{n}{2}} \tau_{L,\text{IU}k}^{I_1 \Rightarrow O_1} + \tau_{L,\text{MU}1}^{I_1 \Rightarrow O_1} + \tau_{L,\text{MU}2}^{I_2 \Rightarrow O_2} + \sum_{k=\frac{n}{2}+1}^{n} \tau_{L,\text{IU}k}^{I_2 \Rightarrow O_2} + \sum \tau_{L,\text{FPIN}} \tag{5.6}$$

Finally, after the LOW logic value reaches IU8, the internal pull-down driver of IU8 is activated and it requires some time to bring the voltage level of the corresponding BDIO node from $V_{\rm DD}$ to $V_{\rm REF2}$.
$t_{\rm f}^{\overline{O_2} \Rightarrow BDIO}$ in Table 5.3 is defined as the time needed by the internal pull-down driver to bring the voltage level of the BDIO node from $V_{\rm DD}$ to $0.3 \times V_{\rm DD}$. Thus,

$$\tau_{L,\text{act}} = t_{f,\text{IU}n}^{\overline{O_2} \Rightarrow BDIO} \tag{5.7}$$

#### HIGH Logic Propagation Delay

The worst-case propagation path for the HIGH logic value is the same as for the LOW logic value. The propagation begins with the deactivation of the ODD connected to IU1. However, in this case, the voltage of the BDIO node does not have to rise all the way from LOW to HIGH for the *LOW Detector* to detect it. In fact, the voltage level of the BDIO node only needs to rise from $\approx 0 \text{ V}$ to $V_{\text{REF1}}$ (approximately 10% of $V_{\text{DD}}$) for the *LOW Detector* to begin to detect. Hence, $\tau_H^{BDIO\Rightarrow I_3}$ in Table 5.3 is defined to include that rise time and the delay of the *LOW Detector* itself. $\tau_H^{BDIO\Rightarrow I_3}$ is the delay between the deactivation of the ODD (external I<sup>2</sup>C driver) and the corresponding LOW-to-HIGH transition of $I_3$. Then the HIGH logic value propagates through the AND-gate of IU1 from $I_3$ to $O_1$. Together, these two delays constitute $\tau_{H,\text{det}}$.

$$\tau_{H,\text{det}} = \tau_{H,\text{IU}1}^{BDIO \Rightarrow I_3} + \tau_{H,\text{IU}1}^{I_3 \Rightarrow O_1} \tag{5.8}$$

Similar to $\tau_{L,\text{tr}}$, the HIGH logic value propagates from IU1 along the $I_1$-$O_1$ signal path through the FPIN to MU1, then to MU2, and then along the $I_2$-$O_2$ signal path through the FPIN to IUn (IU8 in Fig. 5.8). These delays constitute the worst-case transmission delay ($\tau_{H,\text{tr}}$).
Thus,

$$\tau_{H,\text{tr}} = \sum_{k=2}^{\frac{n}{2}} \tau_{H,\text{IU}k}^{I_1 \Rightarrow O_1} + \tau_{H,\text{MU}1}^{I_1 \Rightarrow O_1} + \tau_{H,\text{MU}2}^{I_2 \Rightarrow O_2} + \sum_{k=\frac{n}{2}+1}^{n} \tau_{H,\text{IU}k}^{I_2 \Rightarrow O_2} + \sum \tau_{H,\text{FPIN}} \tag{5.9}$$

Finally, after the HIGH logic value reaches IUn (IU8 in Fig. 5.8), the internal pull-down driver of IUn is deactivated and it requires some time to bring the voltage level of the corresponding BDIO node from $V_{\rm REF2}$ to $V_{\rm DD}$. $t_{\rm r}^{\overline{O_2} \Rightarrow BDIO}$ is defined as the time needed by the internal pull-up pMOS driver to bring the voltage level of the BDIO node from $V_{\rm REF2}$ to $0.7 \times V_{\rm DD}$. Thus,

$$\tau_{H,\text{act}} = t_{\text{r,IU}n}^{\overline{O_2} \Rightarrow BDIO} \tag{5.10}$$

#### 5.3.7 Maximum Number of Interface Units in a Dual-Queue Interconnection Topology

In principle, an arbitrarily large number of interface units can be interconnected by the dual-queue topology. In practice, the maximum number is limited by the worst-case propagation delays of the LOW/HIGH logic values and the required communication speed of the supported open-drain protocol. The worst-case propagation delays of the LOW and HIGH logic values are equivalent to the fall and rise times, respectively, of the target communication speed specification.

From Eqs. (5.4a) and (5.5)-(5.7), the worst-case propagation delay of the LOW logic value in the dual-queue network includes the fall times of two BDIO nodes ($t_{\rm f,IU1}^{ODD\Rightarrow BDIO}$ in $\tau_{\rm L,det}$ and $t_{\rm f,IUn}^{\overline{O_2}\Rightarrow BDIO}$ in $\tau_{\rm L,act}$). From Eqs. (5.4b) and (5.8)-(5.10), the worst-case propagation delay of the HIGH logic value in the dual-queue network includes the rise time of only one BDIO node ($t_{\rm r,IUn}^{\overline{O_2}\Rightarrow BDIO}$ in $\tau_{\rm H,act}$). Thus, Eq. (5.4a) represents the critical path that puts a practical limit on the maximum BDIO node capacitance and the maximum number of interface units that can be interconnected with the dual-queue topology to support a required communication speed.

All I/Os are physically connected together in conventional I<sup>2</sup>C communication, thus the total bus capacitance is the sum of all I/O capacitances and the capacitance of the interconnecting wires. It results in a value that can get fairly large. According to the I<sup>2</sup>C specification (Fast-mode Plus), a standard value of the bus capacitance is 400-550 pF and the maximum fall-time is 120 ns [14]. However, when interconnected through the proposed bi-directional interface, each I<sup>2</sup>C driver is directly connected to the BDIO node of only one interface unit, as shown in Fig. 5.8. Hence, standard I<sup>2</sup>C drivers can achieve shorter rise/fall times. For example, if the loading capacitance of the BDIO node is one-fifth of the standard I<sup>2</sup>C bus capacitance, then standard I<sup>2</sup>C drivers (ODD) would achieve one-fifth of their normal I<sup>2</sup>C fall-time. Similarly, the internal pull-down driver, if designed according to the I<sup>2</sup>C standard, can also achieve a fall time that is a fraction of the I<sup>2</sup>C fall-time. Thus, with proper design, both $\tau_{L,\text{det}}$ and $\tau_{L,\text{act}}$ can be made equal to a pre-determined fraction of a normal fall time.

Figure 5.10 Detailed transistor-level schematic of the bi-directional interface unit and microphotograph of the die.

$\tau_{L, \text{det}}$ and $\tau_{L, \text{act}}$ represent a deterministic amount of delay because they depend only on IU1 and IUn respectively. However, $\tau_{L, \text{tr}}$ accumulates as the number of interconnected interface units increases.
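The budget argument can be made concrete with a small sketch that evaluates Eqs. (5.4a) and (5.5)-(5.7) for a dual-queue network of n units. It rests on several assumptions that are not taken from the text: every AND-gate has the same delay, every FPIN link has the same delay, one FPIN link is counted between each pair of consecutive units on the worst-case path (n + 1 links in total), and the numeric values are purely illustrative. The function and parameter names are hypothetical.

```python
def worst_case_low_delay_ps(n, t_f_odd, tau_ld, tau_and, tau_fpin, t_f_int):
    """Eq. 5.4a for a dual-queue network of n interface units (n even), in ps."""
    half = n // 2
    tau_det = t_f_odd + tau_ld + tau_and       # Eq. 5.5
    tau_tr = ((half - 1) * tau_and             # IU2 .. IU(n/2), I1 -> O1
              + tau_and + tau_and              # MU1 (I1 -> O1) and MU2 (I2 -> O2)
              + half * tau_and                 # IU(n/2+1) .. IUn, I2 -> O2
              + (n + 1) * tau_fpin)            # assumed: one FPIN hop per link
    tau_act = t_f_int                          # Eq. 5.7
    return tau_det + tau_tr + tau_act


def max_interface_units(budget_ps, delays):
    """Largest even n whose worst-case LOW delay fits the fall-time budget."""
    if delays["tau_and"] + delays["tau_fpin"] <= 0:
        raise ValueError("per-unit delay must be positive")
    n = 0
    while worst_case_low_delay_ps(n + 2, **delays) <= budget_ps:
        n += 2
    return n


# Illustrative values only (hypothetical), expressed in picoseconds.
example = dict(t_f_odd=4_000, tau_ld=2_000, tau_and=200, tau_fpin=300, t_f_int=90_000)
print(max_interface_units(120_000, example))
```

With these illustrative numbers the fixed terms use 96.2 ns of a 120 ns budget, and each additional unit on the path costs 0.5 ns, so the transmission segment determines how many units fit.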
The components associated with $\tau_{L, \text{det}}$ and $\tau_{L, \text{act}}$ can therefore be designed so that $\tau_{L, \text{det}}$ and $\tau_{L, \text{act}}$ consume a deterministic fraction of the I<sup>2</sup>C fall-time for any given communication speed, and $\tau_{L, \text{tr}}$ can consume the remaining 'unused' part of the I<sup>2</sup>C fall-time. Timing constraints thus limit the number of ODDs that can be interconnected by a set of interface units using the dual-queue topology while keeping the worst-case propagation delay less than or equal to the maximum fall-time of a regular I<sup>2</sup>C connection.

Of course, a smaller loading capacitance of the BDIO node or stronger internal drivers would result in smaller rise/fall times. It would leave more headroom for $\tau_{L,\mathrm{tr}}$ or $\tau_{H,\mathrm{tr}}$. Thus, a larger number of ODDs could be interconnected by the interface units with the dual-queue topology while meeting a given communication speed.

#### 5.4 Prototype Test-Chip and Measurement Results

The interface unit was designed to be compatible with the prototyping platform of [1]. The platform uses thick-oxide I/O FETs for the configurable I/O so that it can support ICs operating on a wide range of power supply voltages. However, the embedded FPIN is to be implemented with thin-oxide FETs (operating on a lower power supply) to leverage their high speed.

#### 5.4.1 Design Specification of the Bi-directional Interface

A detailed transistor-level schematic of the interface unit is shown in Fig. 5.10. The *LOW Detector* has a physical connection with the configurable I/O (BDIO node) and hence was designed with thick-oxide 3.3 V I/O FETs, as shown in Fig. 5.10. $I_{3A}$ and $\overline{I_{3A}}$ are 3.3 V logic signals. If the voltage level of the BDIO node falls below $V_{\text{REF1}}$, $I_{3A}$ and $\overline{I_{3A}}$ become LOW and HIGH respectively.
The interface units are to communicate among themselves through the embedded FPIN. Thus, the logic functions among $I_1$, $I_2$, $I_3$, $O_1$, and $O_2$ were implemented with 1.2 V 2.2 nm-oxide FETs, and the voltage levels of $I_{3A}$ and $\overline{I_{3A}}$ were brought down to 1.2 V by a down-converter ($M_{405-408}$ in Fig. 5.10). $I_{3A}$ and $I_3$ are logically equivalent. On the other side, $I_1$ and $I_2$ are 1.2 V logic signals. Thus, an up-converter ($M_{401-404}$ in Fig. 5.10) was used to convert $\overline{O_2}$ from a 1.2 V signal to a 3.3 V signal that is used to activate the Unity-gain Buffer in Fig. 5.10. The Unity-gain Buffer, which has a physical connection with the I/O, was designed with thick-oxide 3.3 V I/O FETs. A resistor divider was used to generate $V_{\text{REF1}}$ and $V_{\text{REF2}}$. Finally, the Unity-gain Buffer was used to propagate $V_{\text{REF2}}$ to the BDIO node.

Sec. 5.3.7 provides guidelines to use the delay model of Sec. 5.3.6 to design the various components of the interface unit to support a given communication speed. The prototype bidirectional interface was designed to support I<sup>2</sup>C Fast-mode Plus specifications (Table 5.4). The amplifier of the *Unity-gain Buffer* was designed to provide a pull-down current of 0.53 mA and a pull-up current of 1.2 mA for a loading capacitance of 15 pF. It can achieve a fall-time ($t_{\rm f}^{\overline{O_2}\Rightarrow BDIO}$ in Table 5.3) of $\approx$ 90 ns. Since the loading capacitance of 15 pF at each node is about one-thirtieth of the standard bus loading value of 400-550 pF [14], a standard I<sup>2</sup>C Fast-mode Plus driver can achieve a fall-time ($t_{\rm f}^{ODD\Rightarrow BDIO}$ in Table 5.3) of $\approx$ 4 ns. The AND-gates of the *ODD LOW Decoders* were designed to have a delay that is a fraction of a nanosecond in the target CMOS technology. With these tentative values and the delay model of Sec.
5.3.6, a few tens of such interface units can be interconnected using the dual-queue topology and the worst-case propagation delay of such a network would be less than 120 ns. Since the interface imitates the behaviour of an open-drain or open-collector bus, it can be redesigned with different parameters (e.g. different values of C<sub>b</sub>, I<sub>OL</sub>, V<sub>IL</sub>, V<sub>IH</sub>, $\tau_{H,wc}$, $\tau_{L,wc}$) for other communication speeds.

Table 5.4 Design specification of the bi-directional interface in the test-chip according to I<sup>2</sup>C Fast-mode Plus protocol.

| Parameter | Description | I<sup>2</sup>C equivalent | Value | Unit |
|---|---|---|---|---|
| $V_{DD}$ | Power supply | same | 3.3 | V |
| $C_{\rm b}$ | Capacitive load for each BDIO node <sup>a</sup> | Capacitive load of bus line | 15 | pF |
| $I_{OL}$ | LOW-level pull-down current | same | 0.53 | mA |
| $I_{PU}$ | Pull-up current ($V_{\rm OL} = 0.6$) | same | 1.2 | mA |
| $V_{IL}$ | LOW-level input voltage | same | $0.3 \times V_{DD}$ | V |
| $V_{IH}$ | HIGH-level input voltage | same | $0.7 \times V_{DD}$ | V |
| $\tau_{H,\text{wc}}$ | Worst-case propagation delay of HIGH logic value | Rise time of both SDA and SCL signals | 120 | ns |
| $\tau_{L,\text{wc}}$ | Worst-case propagation delay of LOW logic value | Fall time of both SDA and SCL signals | 120 | ns |

a.
As the test-chip is to be used to validate the concept, the BDIO node capacitance value was chosen to include the PCB trace, oscilloscope probe and connecting wire, and pad capacitances only.

# 5.4.2 Delay Characterization of the Bi-directional Interface from Post-Layout Simulation

In the test-chip, only the BDIO nodes of the interface units could be measured. Thus, only the total propagation delay between two interface units could be derived from measurements. Since not every point inside the test-chip could be measured, the individual delays of the ODD LOW Decoder and LOW Detector, as well as the rise/fall times of the Unity-gain Buffer (internal pull-down driver) and the ODD, were derived from post-layout simulations. Table 5.5 summarizes the numerical values of the various component delays and rise/fall times of the interface unit based on post-layout simulations. These values indicate that in a network comprising fewer than 10 interface units, the total propagation delay will be dominated by $\tau_H^{BDIO\Rightarrow I_3}$, $t_{\rm r}^{\overline{O_2}\Rightarrow BDIO}$, and $t_{\rm f}^{\overline{O_2}\Rightarrow BDIO}$. These three delays constitute the detection delays ($\tau_{L,{\rm det}}$ or $\tau_{H,{\rm det}}$) and the activation delays ($\tau_{L,{\rm act}}$ or $\tau_{H,{\rm act}}$). The various delays of the ODD LOW Decoder module ($\tau_H^{I_1\Rightarrow O_1}$, $\tau_L^{I_1\Rightarrow O_2}$, $\tau_H^{I_2\Rightarrow O_2}$, etc.) that constitute the transmission delay ($\tau_{L,{\rm tr}}$ or $\tau_{H,{\rm tr}}$) are almost negligible compared to the aforementioned three delays. Thus, their effect on the total propagation delay is very small.
Contributions of all these individual component delays to the total propagation delay of HIGH/LOW logic values between two interface units will be compared with measured propagation delays from the test-chip in Sec. 5.4.4.

Table 5.5 Characterization of the Interface Circuit Based on Post Layout Circuit Simulations.

| Quantity | Values | |
|---|---|---|
| Area (μm²) | 1430 | |
| Power (mA) | 1 | |
| Delay of *ODD LOW Decoder* (ns) | $\tau_H^{I_1 \Rightarrow O_1} = 0.25$ | $\tau_L^{I_1 \Rightarrow O_1} = 0.225$ |
| | $\tau_H^{I_3 \Rightarrow O_1} = 0.265$ | $\tau_L^{I_3 \Rightarrow O_1} = 0.259$ |
| | $\tau_H^{I_1 \Rightarrow O_2} = 0.254$ | $\tau_L^{I_1 \Rightarrow O_2} = 0.276$ |
| | $\tau_H^{I_2 \Rightarrow O_2} = 0.28$ | $\tau_L^{I_2 \Rightarrow O_2} = 0.246$ |

| Quantity | Symbol | BDIO load 10 pF | BDIO load 15 pF | BDIO load 20 pF |
|---|---|---|---|---|
| Delay of *LOW Detector* (ns) | $\tau_H^{BDIO\Rightarrow I_3}$ | 21 | 28 | 35 |
| | $\tau_L^{BDIO\Rightarrow I_3}$ | ≈2 | ≈2 | ≈2 |
| Rise/fall time of *Unity-gain Buffer* (ns) | $t_{\rm r}^{\overline{O_2}\Rightarrow BDIO}$ | 16 | 23 | 30 |
| | $t_{\rm f}^{\overline{O_2}\Rightarrow BDIO}$ | ≈62 | ≈91 | ≈122 |
| Fall time of ODD (ns)<sup>a</sup> | $t_{\rm f}^{ODD\Rightarrow BDIO}$ | ≈0.82 | ≈1.14 | ≈1.38 |

Replacing the right hand side of Eq. (5.4a & 5.4b) with the elaborated expressions of Eq. (5.5-5.10) gives the worst-case propagation delays of the LOW and HIGH logic values in terms of the individual component delays and rise/fall times. Subsequently injecting the corresponding values from Table 5.5 in Eq.
(5.4a & 5.4b), we get, in nanoseconds (ns):

$$\tau_{L,\text{wc}} \approx 0.26 \cdot n + 94 \tag{5.11a}$$

$$\tau_{H,\text{wc}} \approx 0.26 \cdot n + 51 \tag{5.11b}$$

when each pin (BDIO) has a load capacitance of $15\,\mathrm{pF}$ and n is the number of interconnected interface units.

a. (Table 5.5) This delay is not a characteristic of the interface unit but of the test-bench.

Figure 5.11 Dual-queue interconnection topology with 8 interface units implemented in the test-chip.

# 5.4.3 Test-chip and Test-bench Specifications

A test-chip was fabricated using IBM 0.13 µm CMOS technology. A prototype of the dual-queue interconnected network shown in Fig. 5.11, consisting of eight interface units, was fabricated in this test-chip. A photomicrograph of that test-chip is shown in Fig. 5.10. A Tektronix MDO4014-6 oscilloscope was used to observe the voltage waveforms. Tektronix TPP1000 passive probes were used. They introduce a 4 pF parasitic capacitance. In the test-chip, isolated nMOS transistors were fabricated to act as external ODD or I<sup>2</sup>C drivers designed to be compliant with the I<sup>2</sup>C Fast-mode Plus specification summarized in Table 5.1. It should be noted that these drivers are not part of the bi-directional interface units. They are part of the test-chip to facilitate the testing operation.

Measured waveform data were extracted from the oscilloscope and plotted in Fig. 5.12. They show that the dual-queue interconnected network mimics the "wired-AND" logic of an open-drain connection. The eight interface units are called IU1 to IU6 and MU1 & MU2 in Fig. 5.11. ODD3 and ODD4 are operated as I<sup>2</sup>C drivers. CTRL1, a 1.25 MHz pulse train with a pulse width of 400 ns, was applied to ODD3, shown in Fig. 5.11. CTRL2 is a similar pulse train, left-shifted by 200 ns or 90°, that was applied to ODD4, shown in Fig. 5.11. Due to the limited number of available test-chip pins, the BDIO nodes of IU1, IU2, IU5 & IU6 were not actively driven by an ODD.
Those interface units could still be assumed to be connected to open-drain drivers that never turn ON. These BDIO nodes are not loaded, but even if they were, such loading would not affect the propagation delay of the critical path (solid and dotted thick grey lines), as is apparent in Fig. 5.8.

#### 5.4.4 Measurement results from dual-queue topology with 8 interface units

Fig. 5.12 shows three successful cycles of operation of the implemented bidirectional bus. The cycle beginning at t = 1000 ns will be described in detail.

Figure 5.12 Measurement result of dual-queue interconnected network (shown in Fig. 5.11) from the test-chip.

It can be seen in Fig. 5.12 that during the interval between 1000 and 1200 ns, when only ODD4 was activated, the internal drivers of IU3 & IU1 became activated to produce a LOW logic value ($V_{\rm REF2}$ or 600 mV) at $BDIO_3$ and $BDIO_1$ respectively. During the interval between 1200 and 1400 ns, when both ODD3 and ODD4 were activated, the voltage level of both $BDIO_3$ and $BDIO_4$ was $\approx 0$ V, and the voltage level of $BDIO_1$ was at $V_{\rm REF2}$, which also corresponds to the LOW logic value. During the interval between 1400 and 1600 ns, when only ODD3 remained activated, the internal drivers of IU4 & IU1 remained activated to maintain a voltage of $V_{\rm REF2}$ or 600 mV at $BDIO_4$ and $BDIO_1$ respectively, which corresponds to LOW logic values. Finally, during the interval between 1600 and 1800 ns, when both ODD3 and ODD4 were deactivated, the internal drivers of IU3, IU4 & IU1 became deactivated to produce a voltage of 3.3 V at $BDIO_3$, $BDIO_4$ and $BDIO_1$ respectively, which corresponds to HIGH logic values. This completes a full validation cycle that begins to repeat at 1800 ns. Thus, the dual-queue interconnected bi-directional interfaces successfully mimic the "wired-AND" logic of an open-drain connection. It can be seen in Fig. 5.12 that the fall times of the nodes $BDIO_3$ and $BDIO_1$ are not equal.
This is due to different lengths of PCB traces and the corresponding loading capacitances.

It should be noted that even though two I<sup>2</sup>C drivers do not simultaneously output a LOW logic value during normal operation, they can do so when they compete to take control of the bus. I<sup>2</sup>C has an arbitration process [14] through which such contention is resolved, and that arbitration process depends on the wired-AND property of the open-drain connection. The interval between 1200 and 1600 ns demonstrates the ability of the proposed interface unit to properly support such a scenario, where two I<sup>2</sup>C drivers simultaneously output a LOW logic value (1200 to 1400 ns) and subsequently one of the drivers outputs a HIGH logic value (1400 to 1600 ns).

The total propagation path of a LOW logic value from IU4 to IU1 through MU2 and MU1 in Fig. 5.11 is shown by the thick dashed grey line. This path demonstrates the propagation of a LOW logic value from one individual queue (Queue Network-2) to the other queue (Queue Network-1). Comparing the various delays and rise/fall times in Table 5.5, $t_{\rm f}^{\overline{O_2} \Rightarrow BDIO}$ can be seen to be the largest value. From Eq. (5.4a), (5.5)-(5.7), which combine all the individual component delays and rise/fall times associated with the propagation of a LOW logic value, it can be deduced that $t_{\rm f,IU1}^{\overline{O_2} \Rightarrow BDIO}$ would account for more than 95% of the total propagation delay from IU4 to IU1. The voltage waveform of $BDIO_1$ in Fig. 5.12 supports that analysis. In Fig. 5.12 (Label-A), at t=1000 ns, after the voltage level of $BDIO_4$ is brought down to $\approx 0$ V by $ODD_4$, a LOW logic value propagates from IU4 through MU2 and MU1. It reaches IU1 within a few nanoseconds, and then the internal pull-down driver of IU1 pulls down the voltage level of the $BDIO_1$ node to $V_{\rm REF2}$ or 600 mV in $\approx 120$ ns (Label-B).
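The linear model of Eq. (5.11a) and (5.11b) can be turned into a quick delay-budget check. The following is a minimal sketch and not part of the original work: the helper names `worst_case_delay_ps` and `max_interface_units` and the use of integer picoseconds are illustrative assumptions, and the coefficients are simply those of Eq. (5.11) for a 15 pF BDIO load. It estimates the largest n for which both worst-case delays stay within the 120 ns budget of Table 5.4.

```python
# Coefficients of Eq. (5.11a)/(5.11b) for a 15 pF BDIO load, in picoseconds.
# Integers are used to avoid floating-point rounding at the budget boundary.
PER_UNIT_PS = 260        # 0.26 ns added per interface unit
OFFSET_LOW_PS = 94_000   # constant term of tau_L,wc (94 ns)
OFFSET_HIGH_PS = 51_000  # constant term of tau_H,wc (51 ns)


def worst_case_delay_ps(n, offset_ps):
    """Worst-case propagation delay (ps) for n interface units, per Eq. (5.11)."""
    return PER_UNIT_PS * n + offset_ps


def max_interface_units(budget_ps=120_000):
    """Largest n whose LOW and HIGH worst-case delays both fit the budget.

    Hypothetical helper: budget_ps defaults to the 120 ns of Table 5.4.
    Returns 0 if the budget is already exceeded by the constant term.
    """
    limits = []
    for offset in (OFFSET_LOW_PS, OFFSET_HIGH_PS):
        headroom = budget_ps - offset
        limits.append(max(headroom // PER_UNIT_PS, 0))
    return min(limits)


if __name__ == "__main__":
    n_max = max_interface_units()
    print("max interface units:", n_max)
    print("tau_L,wc at n_max (ns):",
          worst_case_delay_ps(n_max, OFFSET_LOW_PS) / 1000)
```

With these coefficients the LOW transition is the binding constraint. A heavier BDIO load would raise the constant term (Table 5.5 lists a larger Unity-gain Buffer fall time at 20 pF) and shrink the bound accordingly.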
#### 5.5 Conclusion

This paper has presented an open-drain interface circuit that can support a bi-directional bus structure using a field programmable interconnection network. An interconnection topology, called dual-queue, has been proposed. The topology has an interconnection complexity of $\Theta(n)$, where n is the number of interconnected interfaces. A delay model has been developed for the topology. The model can be used to determine the maximum number of interface units that can be interconnected to support a given communication speed. The proposed interface circuit has been fabricated in a $0.13\,\mu\text{m}$ CMOS technology and was successfully tested. The interconnection topology has been validated by measurements from the test-chip. The fabricated circuit has been designed to meet the specification of the I<sup>2</sup>C Fast-mode Plus protocol when implemented with the active reconfigurable platform of [1]. Nevertheless, it could be integrated with any FPIN or FPGA. In principle, it can support any open-drain bus with its respective reference voltages.

# Acknowledgments

This research was partly supported by Gestion Technocap, the Natural Sciences and Engineering Research Council of Canada and by the Mitacs program. The authors would like to acknowledge CMC Microsystems for the products and services that facilitated this research (CAD tools by Cadence, fabrication services using 0.13 µm CMOS technology from IBM, and packaging services). This work was partly done while one of the authors was a guest professor at COMELEC-Telecom ParisTech.

# CHAPTER 6 ARTICLE 3: AN ASYNCHRONOUS DELTA-MODULATOR BASED A/D CONVERTER FOR AN ELECTRONIC SYSTEM PROTOTYPING PLATFORM

# Summary of the Chapter

An asynchronous $\Delta$-modulator (ADM) has been developed by the author to provide an ultra-compact implementation of an A-to-D converter that can support analog signal propagation through the digital network of the WaferBoard<sup>TM</sup> [1].
An s-domain model of the ADM, developed by the author, has been used to analyze its operation. The concept of the proposed ADM has been validated by an on-silicon implementation in a 0.13 $\mu$m CMOS technology. Measurement results indicate that the proposed ADM can support an input signal bandwidth of 2 MHz and achieves measured SNR, SNDR and SFDR of 57, 47, and 54 dB respectively. A journal paper based on the measurement/validation results was submitted in September 2015 to IEEE Transactions on Circuits and Systems I: Regular Papers. **The submitted paper is reproduced in this chapter.**

Title: An Asynchronous Delta-Modulator Based A/D Converter for an Electronic System Prototyping Platform

Wasim Hussain, Hussein Fakhoury, Patricia Desgreys, Yves Blaquière, Yvon Savaria. Submitted to IEEE Transactions on Circuits and Systems I: Regular Papers in September 2015.

#### Abstract

This paper presents and validates a compact circuit-implementation of an asynchronous $\Delta$-modulator (ADM) for A/D conversion. This data converter was proposed as a means to propagate analog signals into digital interconnection networks. A detailed analysis of the A/D conversion mechanism of the proposed ADM circuit is presented. An analytical method is used to analyze and evaluate the inherent oscillation frequency of the proposed ADM circuit in terms of its circuit parameters. Due to the equivalence of the spectrum of the modulating input signal and the low-frequency spectrum of the ADM output, a simple low-pass filter can be used as a D/A converter to reconstruct the input analog signal. The proposed ADM was fabricated in a 0.13 $\mu$m CMOS technology. Measurement results showed SNR and SNDR of 57 and 47 dB respectively for an input bandwidth of 2 MHz. The ADM occupies a $45\,\mu\text{m} \times 22\,\mu\text{m}$ active area. The entire A/D and D/A converter-pair consumes $0.15\,\text{mA}$ from a $3.3\,\text{V}$ supply and occupies a $45\,\mu\text{m} \times 46\,\mu\text{m}$ area.
Compared to other similar A/D converters, the proposed ADM supports moderate signal bandwidth and medium resolution, while occupying a very small area.

### **Keywords**

A/D and D/A conversion, asynchronous $\Delta$-modulator, programmable silicon interposer.

#### 6.1 Introduction

Asynchronous $\Sigma\Delta$ modulators (ASDMs) and asynchronous $\Delta$-modulators (ADMs) can convert continuous-time analog input signals into continuous-time discrete-valued output signals. ASDMs and ADMs are asynchronous because they do not have any sampling operation. ADM implementations in scaled CMOS processes have not been reported in the recent literature. Some ASDM implementations have been reported and analyzed in the recent literature [2, 30, 31, 64]. ASDMs encode the amplitude of the input signal into the pulse-width of the output signal [2]. They can also be considered as pulse-width modulators that can provide a very compact implementation of a high-resolution amplitude-to-time converter. Amplitude-to-time transformation is accomplished by the inherent self-oscillation of ASDMs at a frequency called the limit cycle frequency. A sufficiently high limit cycle frequency is used in the ASDM to avoid spectral overlap with the input modulating frequency [2]. An ASDM is shown in Fig. 6.1. As ASDMs allow very compact and robust implementations of amplitude-to-time conversion, they have great potential to be used as simple low-power high-precision A/D converters in applications that do not require explicit sampling and quantization. Due to the equivalence of the spectrum of the modulating input signal and the low-frequency part of the spectrum of the ASDM output, the input signal can be reconstructed by a simple low-pass filter from the ASDM output.
ASDM applications have been reported in the context of ADSL/VDSL line drivers [30,61], power converters [62], [79,80], drivers for optical cables [63], A/D converters [64], [60,81,82], analog processors [83], and precoding elements to $\Sigma\Delta$ modulators [84,85].

Figure 6.1 Asynchronous $\Sigma\Delta$ modulator.

This paper presents and analyzes a very compact and robust circuit implementation of an asynchronous $\Delta$-modulator (ADM) that overcomes some practical limitations of ASDMs. Instead of having an integrator in the feedback path as is done in the standard $\Delta$-modulator (DM), the proposed ADM utilizes a low pass filter with a non-zero cutoff frequency in the feedback path. The ADM was developed to support routing of analog signals over a digital field programmable interconnect network (FPIN), embedded in an electronic system prototyping platform introduced in [1]. The same concept could be applied to FPINs found in field programmable gate arrays (FPGAs). The proposed ADM has a small capacitive input impedance and hence, it introduces no significant loading effect on the input signal at its modulating frequency. A simple passive low-pass filter can be used as a D/A converter to reconstruct the input analog signal at the receiving end.

Sec. 6.2 provides some background on FPINs, ASDMs, ADMs, and identifies limitations of related existing A/D converters. These limitations led the authors to develop the proposed ADM solution. Sec. 6.3 and Sec. 6.4 present the proposed solution and the associated circuit implementation, while Sec. 6.5 presents measured results from a test-chip that was implemented. Finally, Sec. 6.6 concludes by summarizing our main contributions and results.

# 6.2 Background

#### 6.2.1 Field Programmable Interconnection Networks

FPINs are found embedded in FPGAs, in prototyping platforms [5–7, 76], and in many Network-on-Chip architectures [8].
Hardware systems used for logic emulation can enhance their capability and performance by having programmable interconnection implemented using FPGAs. Commercial logic emulation systems, such as the REALIZER SYSTEM [9], use programmable interconnection devices between FPGAs. Emulation systems are often implemented using components called field-programmable interconnection chips (FPICs). Fig. 6.2 illustrates a generic FPIN that provides programmable interconnections between various endpoints, such as configurable logic blocks (CLBs) in FPGAs, or configurable I/Os in an electronic system prototyping platform called WaferBoard<sup>TM</sup> [1]. WaferBoard provides programmable interconnections that can link multiple component ICs [1]. It has a uni-directional, digital, switch-box-based FPIN that can be programmed by users to interconnect component ICs randomly and manually deposited by the user on its surface.

Figure 6.2 Generic FPIN model of a Field Programmable Interconnection Network.

The need for analog connections reconfigurably routed over a digital FPIN was originally conceived in the context of the WaferBoard. As it influenced the requirements for what we propose, the next section briefly describes the WaferBoard.

# 6.2.2 The Target Application: Prototyping Platform [1]

The core of the target prototyping platform is a wafer scale IC upon which component ICs are to be deposited. Its surface has a dense array of very fine (tens of microns) conducting pads acting as configurable I/Os (CIOs), shown in Fig. 6.3. A digital FPIN is embedded in the wafer scale IC. The FPIN can be configured, similar to FPGAs, to connect any CIO to any other (CIOs are the endpoints). Each CIO has its own configurable digital I/O buffers. If a CIO is to operate as an input, then its buffer is configured as an input buffer that receives the signal from a source IC and propagates it through the digital FPIN to the destination CIO.
By contrast, the destination CIO's I/O buffer is configured as an output buffer and it propagates the signal to the corresponding destination IC. The prototyping platform [1] was primarily developed to prototype digital electronic systems. However, nowadays many electronic systems are at least partly mixed-signal systems. Having the ability to reconfigurably route analog signals through the embedded FPIN can greatly improve the versatility of the WaferBoard or of any electronic system prototyping platform. This feature could also be embedded in FPGAs, acting as analog switches that propagate analog signals from some of their I/Os to any others.

Figure 6.3 Hierarchical description of the reconfigurable board to CIO.

# 6.2.3 Analog Interface Based on Asynchronous $\Sigma\Delta$ Modulation or $\Delta$ Modulation

In general, an analog interface circuit reconfigurably routing an analog signal through a digital FPIN must comprise A/D (transmitter side) and D/A (receiver side) converters. A source IC provides an analog signal to the A/D converter (transmitter). The A/D converter transforms it into a digital format that will be reconfigurably routed through the FPIN to the D/A converter (receiver side). The digital data, upon reaching the D/A converter, is transformed back into a reconstructed copy of the original analog signal and provided to the destination IC.

A solution to provide reconfigurable routing of analog signals in the prototyping platform [1] was introduced in [22]. It was based on the frequency modulation of ring oscillator based voltage controlled oscillators (VCOs) that converted analog signals into discrete-valued pulses that could be reconfigurably routed through the FPIN. A phase locked loop (PLL) was used to reconstruct (demodulate) the analog signal from the discrete-valued pulses.
However, due to non-linearities in the voltage to frequency transfer curve of ring oscillator based VCOs, that solution could support input analog signals in the range of only 0.6-1.6 V for a power supply of 1.8 V and a bandwidth of 200 kHz [22]. Thus, the authors were motivated to find an alternative A/D and D/A converter solution that can support reconfigurable routing of analog signals with wider voltage range and higher bandwidth within the existing constraints of the target application [1]. Ideally:

- compact silicon area requirement is desirable (existing area is already over utilized by the CIO buffers, crossbar multiplexers of the FPIN, and other control circuitries);
- the FPIN should not be modified;
- high linearity A/D and D/A conversions are desirable;
- A/D and D/A conversions should be single-ended;
- high input impedance is preferable;
- a robust solution compatible with wafer scale integration is desired (due to wafer scale integration, a large number of such circuits are integrated in one single wafer and defective instances of circuits cannot be arbitrarily discarded); and
- low power consumption during standby or inactive mode is desired.

Both A/D and D/A converters can be classified into two main categories:

1. Nyquist-rate converters and
2. Oversampling converters.

Nyquist-rate A/D converters require high-accuracy analog components (amplifier, comparator, resistor, current source or capacitors) in order to achieve acceptable linearity and resolution. Thus, these converters are often difficult to implement in scaled CMOS technologies because of low supply voltages and poor transistor output impedance (due to short-channel effects) [25]. In oversampling converters, the sampling rate is higher than the Nyquist rate. The $\Sigma\Delta$ modulator and the $\Delta$-modulator are two types of oversampling converter that generate a low-resolution (usually 1-bit) data stream; digital filtering is then used to produce high-resolution data at the Nyquist frequency [27].
In our target application, a dedicated 1-bit channel is available for data transmission through the FPIN. Utilizing a multi-bit Nyquist-rate A/D converter would have required parallel-to-serial and serial-to-parallel conversions at the transmitter and receiver respectively. Thus, the typical 1-bit output of the $\Sigma\Delta$ modulator and the $\Delta$-modulator makes them particularly suitable A/D converters for our targeted FPIN-based prototyping platform. In addition, scaled CMOS technology has the beneficial effect of high time resolution in the circuitry due to the increase in the intrinsic speed of the transistors. Thus, designing asynchronous implementations of the $\Sigma\Delta$ modulator (and the $\Delta$-modulator) is simplified in scaled CMOS technology [2]. An important advantage of the asynchronous $\Sigma\Delta$ modulator is the property that demodulation consists of simple low pass filtering only [2]. Such advantages can also be exploited with asynchronous $\Delta$-modulators.

#### 6.2.4 Limitations of Existing Asynchronous $\Sigma\Delta$ Modulator Implementations

This section reviews limitations of existing ASDMs and their application to A/D conversion. Fig. 6.4(a) shows the block-diagram of a generic ASDM that has been mathematically analyzed in [2] and [86]. It is called generic because it is similar to the conventional synchronous (continuous-time, discrete-time) $\Sigma\Delta$ modulators in its placement of the filter in the forward path of the ASDM loop. A practical limitation of the generic ASDM is that the sharp edges of the modulator (or hysteresis) output propagate to the input of the summing (or subtracting) amplifier and introduce non-linearity in the A/D conversion.

Figure 6.4 Different implementations of the generic asynchronous $\Sigma\Delta$ modulator. (a) Generic asynchronous $\Sigma\Delta$ modulator as conceptualized in [2]. (b) An implementation of the generic asynchronous $\Sigma\Delta$ modulator with current sources as feedback elements. (c) An implementation of the generic asynchronous $\Sigma\Delta$ modulator with differential input-output and first-order loop-filter [30].

Fig. 6.4(b) shows an implementation of the generic ASDM that utilizes current sources as feedback elements. When the hysteresis output module makes low to high (LH) or high to low (HL) transitions, the edges propagate through the $C_{gd}$ capacitances of $M_P$ and $M_N$, and momentarily deviate the inverting input ($v_{\text{virtual}}$) of the amplifier from the virtual ground, because the gain-bandwidth product of the amplifier is not high enough to maintain an ideally constant voltage at the inverting input. Utilizing current sources in the feedback also introduces possi- ASDM based on Fig. 6.4(b) was reported in [31]. The reported oscillation center frequency was only 630 Hz and that implementation could only support an input bandwidth of 30 Hz. It appears that due to a very low frequency of operation, the amplifier was able to maintain a *stable* value at the inverting input ($v_{\text{virtual}}$). A silicon implementation of a high or moderate speed ASDM based on such an architecture has not been reported yet.

Fig. 6.4(c) shows another implementation of the generic ASDM that was reported in [30]. It is based on a $g_m$-C integrator and has fully differential input-output. Due to having differential input-output, the two implementations with first and second-order filter were reported to have a relatively high SFDR and SNR of 75 dB and 70 dB respectively in [30]. However, notice that when a $G_m$-C filter is used in these asynchronous $\Sigma\Delta$-modulators, even though the entire system (comprising the transconductance amplifier, loop filter and hysteresis block) is connected in a negative feedback loop, the transconductance amplifier itself is not connected in a negative feedback loop. The differential input of that transconductance amplifier can be non-zero.
In fact, the differential input of the transconductance amplifier is equal to the input analog voltage. The transconductance amplifier ($G_m$ stage) has its own V/I conversion non-linearity. For large input voltage swing, the actual value of $G_m$ changes because the current-voltage characteristics of transistors are not linear (they approximately follow a square law of MOSFETs operating in their active region). As a result, $G_m$ stages have a limited input voltage range with acceptable linearity, and [30] could support an input voltage range of only 0.4 V, which is a small fraction of their 1.8 V power supply. Moreover, the implementation of [30] is fully differential, whereas our target application requires a single-ended input A/D converter. Finally, a wideband continuous-time $\Sigma\Delta$-modulator has been reported in [87]. Their modulator utilizes a low pass filter in the main feedback path to filter out sharp edges. This small area solution that was independently proposed by the authors of [87] is also utilized in our implementation.

#### 6.3 Proposed Asynchronous $\Delta$-Modulator

# 6.3.1 Proposed Asynchronous $\Delta$-Modulator

An asynchronous $\Delta$-modulator (ADM) shown in Fig. 6.5 is proposed by the authors to overcome the performance limiting factor (non-linearity of conversion) of the generic ASDM architecture. Instead of having the low-pass filter in the feed forward path (Fig. 6.4(a)), it is located in the feedback path. Thus, the modulated output is passed through a low-pass filter before being applied to the inverting input of the amplifier, and there is no sharp transition applied at the inverting input ($v_f$ in Fig. 6.5) of the amplifier. It achieves better conversion linearity compared to a straight implementation of the generic ASDM. Also, the analog input signal can be reconstructed from the modulated output ($v_{\text{out}\_mod}$ in Fig. 6.5) of the ADM by a passive low pass filter only.

Figure 6.5 The proposed ADM.
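The loop of Fig. 6.5 can be illustrated with a coarse behavioural simulation. The sketch below is an assumption-laden illustration, not the authors' model or circuit: it lumps the amplifier and the digital buffer into an ideal comparator followed by a pure delay, uses a forward-Euler RC filter in the feedback path, and picks hypothetical values for the time step, RC constant, buffer delay, and test tone. It only shows that the filtered feedback $v_f$ follows $v_{in}$ while the 1-bit output self-oscillates.

```python
import math
from collections import deque

VDD = 3.3          # supply voltage (V), as in the test-chip
DT = 1e-9          # simulation time step (s), assumed
RC = 50e-9         # feedback RC time constant (s), assumed
DELAY_STEPS = 5    # buffer delay in time steps, assumed
F_IN = 100e3       # test-tone frequency (Hz), assumed


def simulate_adm(n_steps=20_000):
    """Return lists (v_in, v_f, out) for a simplified ADM loop."""
    pipeline = deque([0.0] * DELAY_STEPS)  # models the buffer delay
    v_f = 0.0
    v_in_trace, v_f_trace, out_trace = [], [], []
    for k in range(n_steps):
        t = k * DT
        v_in = VDD / 2 + 0.5 * math.sin(2 * math.pi * F_IN * t)
        # Ideal comparator: request HIGH when the input exceeds the feedback.
        pipeline.append(VDD if v_in > v_f else 0.0)
        out = pipeline.popleft()           # delayed 1-bit modulator output
        # First-order RC low-pass in the feedback path (forward Euler).
        v_f += (DT / RC) * (out - v_f)
        v_in_trace.append(v_in)
        v_f_trace.append(v_f)
        out_trace.append(out)
    return v_in_trace, v_f_trace, out_trace


def count_transitions(out):
    """Number of output level changes (a proxy for self-oscillation)."""
    return sum(1 for a, b in zip(out, out[1:]) if a != b)


if __name__ == "__main__":
    v_in, v_f, out = simulate_adm()
    worst = max(abs(v_in[i] - v_f[i]) for i in range(1000, len(v_in)))
    print("output transitions:", count_transitions(out))
    print("worst tracking error after settling (V): %.3f" % worst)
```

The residual tracking error in this sketch is the ripple left by the RC filter during one loop delay; a shorter delay or a larger RC constant reduces it. The real amplifier has finite gain and bandwidth, which this simplification ignores.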
# 6.3.2 Working Principle of the Proposed Asynchronous $\Delta$ -Modulator As ASDMs and ADMs do not have any explicit sampling operation in the loop, they cannot be represented by a sampled-data model. An s-domain model of the proposed ADM implementation (Fig. 6.6) is therefore developed in this subsection to explain its operation. The digital buffer in the feed forward path acts as the *quantizer* of the ADM. This digital buffer can be modelled as a relay followed by a delay element (Fig. 6.6). An ideal relay is a gain-changing non-linear element. Such non-linear elements can be inserted in linear models by the *describing function method* and an s-domain transfer function can be used to represent such non-linear elements [88, Page 521, Appendix B]. For simplicity, the digital buffer in the feed forward path is modelled as a relay having infinite gain. The amplifier in the feed forward path of the ADM is modelled as a gain element with first-order low pass function. The resistor and capacitor in the feedback path can be modelled as a low-pass filter. By replacing the non-linear element by their equivalent transfer function [88, Page 521, Appendix B], the loop-gain $(L_{\text{Gain}} = \frac{v_f}{v_{\text{in}}}(s)|_{\text{open}}$ in Fig. 6.6) of the ADM loop is, Figure 6.6 Proposed linear s-domain model of the ADM shown in Fig. 6.5. 
$$L_{\text{Gain}} = \underbrace{\frac{G}{1 + \frac{s}{\omega_1}}}_{\text{Amplifier}} \cdot \underbrace{\frac{4V_{DD}}{2\pi\delta}\, e^{-s\tau}}_{\text{Relay and delay}} \cdot \underbrace{\frac{1}{1 + sRC}}_{\text{RC-filter}} \tag{6.1}$$

$$= \underbrace{\frac{2V_{DD}G}{\pi\delta}}_{\text{Gain}} \cdot \underbrace{\frac{1}{\left(1 + \frac{s}{\omega_1}\right)\left(1 + sRC\right)}}_{\text{Second order low pass function}} \cdot \underbrace{e^{-s\tau}}_{\text{Delay}} \tag{6.2}$$

where $2\delta$ is the peak-to-peak input swing at the ideal relay and $\tau$ is the delay of the digital buffer.

The loop-gain is a second order low pass function with extra phase-shift due to the delay element. The magnitude of the gain ($\frac{2V_{DD}G}{\pi\delta}$) in Eq. 6.2 grows without bound as $\delta$ approaches zero. Thus, this system is bound to oscillate with a continuous-time discrete-valued output signal ($v_{\text{out}\_mod}$). However, the high gain of the feedforward path will ensure that $v_{\text{f}}$ tracks $v_{\text{in}}$. The cut-off frequency of the RC-filter has to be sufficiently higher than the highest frequency in the input signal bandwidth to ensure that $v_{\text{f}}$ can track $v_{\text{in}}$ without excessive phase difference. Moreover, the cut-off frequency of the RC-filter has to be sufficiently lower than the oscillation frequency of the ADM to ensure that the sharp transitions of the ADM output ($v_{\text{out}\_mod}$) are properly *smoothed*. By applying the Barkhausen stability criterion (oscillation criterion) to Eq. 6.2, the self-oscillation frequency of the ADM can be found by solving Eq. 6.3 [89, p. 346].

$$\arctan\left(\frac{\omega}{\omega_1}\right) + \arctan\left(\omega RC\right) + \omega \tau = \pi \tag{6.3}$$

Fig. 6.7(a) shows the gain-frequency plot of the loop-gain ($L_{\text{Gain}}$). Fig. 6.7(b) shows the phase contributed by the delay, subtracted from 180° ($y_1$), for three different delays $\tau_{1,2,3}$, together with the phase lag of the second order low pass function ($y_2$) of Eq. 6.2. The intersection of the $y_1$ and $y_2$ curves in Fig. 6.7(b) (the solution of Eq. 6.3) is the oscillation frequency of the ADM. As the delay ($\tau$ in Fig. 6.6) inside the loop increases, the corresponding phase-frequency plot ($y_1$) becomes steeper and the intersection frequency moves to the left, implying a lower oscillation frequency.

(a) Gain-frequency plot of loop-gain ($L_{\text{Gain}}$). (b) Phase-frequency plot of loop-gain ($L_{\text{Gain}}$).

Figure 6.7 Evaluation of the oscillation frequency of the ASDM.

As the digital buffer behaves as a relay having infinite gain, the output ($v_{\text{out}\_mod}$) of this digital buffer becomes a continuous-time discrete-valued signal that can be propagated through a digital network. At the receiving end, $v_{\text{in}}$ can be reconstructed by passing the propagated continuous-time discrete-valued signal through a low-pass filter.

# 6.3.3 Behavioral Simulation of the Asynchronous $\Delta$-Modulator

This sub-section presents the estimated SNDR performance of the proposed ADM based on high-level simulation with Simulink<sup>®</sup>. Measurements from a working prototype FPIN for the prototyping platform [1], through which the modulated signal was to be propagated, have shown that this interconnect network can propagate digital signals at 270 Mbps [90]. As our proposed ADM converts both LH and HL transitions into LH transitions (a detailed explanation is given in Sec. 6.4.1) and then propagates them through the FPIN, the required bandwidth of the propagation path is twice the oscillation frequency of the ADM. Thus, the target oscillation frequency of the ADM was chosen to be as high as possible but comfortably lower than half of 270 Mbps.

Figure 6.8 Input signal frequency is 1 MHz.

The s-domain model of the proposed ADM (Fig. 6.6) was simulated with Simulink<sup>®</sup>.
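Eq. 6.3 has no closed-form solution, but its left-hand side increases monotonically with $\omega$, so the oscillation frequency can be found by bisection. The sketch below does this with the standard library only. The amplifier pole and buffer delay used in the example call are assumed values picked for illustration, not the values of the fabricated circuit; only $1/RC \approx 16$ Mrad/s is taken from this chapter.

```python
import math

def loop_phase_lag(omega, w1, rc, tau):
    """Left-hand side of Eq. 6.3: total phase lag of the loop at angular frequency omega."""
    return math.atan(omega / w1) + math.atan(omega * rc) + omega * tau

def oscillation_frequency(w1, rc, tau, f_lo=1.0, f_hi=10e9, iters=200):
    """Solve Eq. 6.3 for the oscillation frequency (in Hz) by bisection."""
    lo, hi = 2 * math.pi * f_lo, 2 * math.pi * f_hi
    if not (loop_phase_lag(lo, w1, rc, tau) < math.pi < loop_phase_lag(hi, w1, rc, tau)):
        raise ValueError("no solution of Eq. 6.3 inside the search interval")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if loop_phase_lag(mid, w1, rc, tau) < math.pi:
            lo = mid    # not enough lag yet, the root is at a higher frequency
        else:
            hi = mid
    return 0.5 * (lo + hi) / (2 * math.pi)

W1 = 2 * math.pi * 70e6   # amplifier pole in rad/s (assumed value)
RC = 1 / 16e6             # feedback filter time constant, 1/RC = 16 Mrad/s
for tau in (0.5e-9, 1e-9, 2e-9):   # three assumed buffer delays
    f_osc = oscillation_frequency(W1, RC, tau)
    print(f"tau = {tau * 1e9:.1f} ns -> f_osc = {f_osc / 1e6:.1f} MHz")
```

With these assumed values, a delay of 1 ns places the root close to 100 MHz, and a longer delay moves it to a lower frequency, which is the trend described for Fig. 6.7(b).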
High-level simulations with Simulink<sup>®</sup> reveal that the achieved SNDR is strongly dependent on $\omega_2$ (or $\frac{1}{RC}$ ) and fairly independent of $\omega_1$ (if $\omega_1$ is sufficiently greater than the input frequency). Fig. 6.8, as expected, shows that $\omega_2$ has to be greater than the frequency of the input signal to achieve reasonable SNDR. It can be seen that the SNDR drastically falls for high input swing if $\omega_2$ is comparable to the input frequency. $\omega_2$ was chosen to be $\approx 16 \,\mathrm{Mrad/s}$ ( $f_2 \approx 2.5 \,\mathrm{MHz}$ ) so that the proposed ADM can support input frequencies of 1-2 MHz. Higher oscillation frequency can reduce the harmonics more effectively and as a result, the ADM can achieve higher SNDR. However, if $\omega_2$ is chosen to make the oscillation frequency too high, such high frequency pulse cannot be propagated through our target FPIN. The other circuit-level parameters for the proposed ADM ( $\omega_1$ , $\tau$ ) were chosen so that the *center frequency* (equivalent to $\omega_c$ in [2] or the oscillation frequency) of the ADM was $\approx 100 \,\mathrm{MHz}$ . Fig. 6.9 shows a plot of the signal power vs. SNDR (realistic noise sources were injected into the s-domain model, according to transistor-level design). As the input swing approached the power rails, the ADM output is degraded by harmonics and the SNDR drastically degrades after the input swing reaches $\approx 80\%$ of full swing. This is similar to the harmonics introduced in the generic ASDM's output as it was derived and validated by simulation results in [2,86]. Figure 6.9 High-level simulation with Simulink® of the proposed ADM architecture model (Fig. 6.5). Input signal frequency is 1 MHz. Representative component noise sources are included but the RC-filter is ideal. It can be seen that the model of our proposed ADM architecture achieves an SNDR of $\approx 70 \,\mathrm{dB}$ for an input bandwidth of 1 MHz. 
The performance predicted here with a somewhat ideal model gives a target that the design reported later will try to approach. # 6.4 Proposed Analog Interface Circuit and Post-Layout Simulation Results # 6.4.1 ADM-based Analog Interface An analog interface circuit (Fig. 6.10), based on the proposed ADM, was developed by the authors to support reconfigurable routing of analog signals through a digital FPIN. Being a continuous-time discrete-valued signal that makes rather sharp transitions between 0 V and $V_{\rm DD}$ , the ADM output is relatively insensitive to noise in the voltage domain. However, the ADM output is sensitive to noise and parametric imperfections in the time domain during signal propagation through the FPIN. Indeed, it is subject to delay mismatches and variations that influence propagation of digital pulse trains through a chain of digital buffers. For instance, when digital waveforms propagate through digital buffers, the pulse-widths are not conserved in general. This is due to the differences of rise and fall times combined with differences in loading of successive nodes in a gate chain. In theory such mismatches can be mitigated in a regular layout if the gates in a chain are well matched. In the target applications, placement and routing of FPIN's buffers in the chain was done with automatic placement and routing, and digital signal paths can be long and subject to significant parametric variations. However, an important property is that, for any length of a chain of digital buffers, the total propagation delay does not vary significantly for any consecutive LH (or Figure 6.10 ADM-based analog interface. # HL) transitions (respectively). A solution to preserve the pulse-width during signal propagation through any chain of digital buffers is proposed in Fig. 6.11(a). In this architecture, both LH and HL transitions of the ADM output are converted into LH transitions and then propagated through the FPIN. 
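The benefit of sending only LH edges can be illustrated with a small event-based sketch. It compares sending the modulated waveform directly through a buffer chain whose LH and HL delays differ with sending one rising edge per transition and toggling a memory element at the receiver. All delay values and the function names are hypothetical; `e_h`, `e_l`, `e_th` and `e_tl` stand for the pulse-generator and toggle-element delays for each transition polarity, and the one-shot pulses are assumed never to overlap.

```python
def send_directly(edges_ps, d_lh, d_hl):
    """Edges alternate LH, HL, LH, ...; each polarity sees its own chain delay."""
    return [t + (d_lh if i % 2 == 0 else d_hl) for i, t in enumerate(edges_ps)]

def send_edge_encoded(edges_ps, d_lh, e_h, e_l, e_th, e_tl):
    """Turn every transition into a rising edge, propagate it, toggle at the receiver."""
    received = []
    for i, t in enumerate(edges_ps):
        was_lh = (i % 2 == 0)
        t_pulse = t + (e_h if was_lh else e_l)   # rising edge of the one-shot pulse
        t_arrive = t_pulse + d_lh                # only LH edges travel through the chain
        received.append(t_arrive + (e_th if was_lh else e_tl))  # toggle delay
    return received

def widths(edges_ps):
    """Intervals between consecutive edges (alternating high and low durations)."""
    return [b - a for a, b in zip(edges_ps, edges_ps[1:])]

# Hypothetical transition times of the modulated signal, in picoseconds.
edges = [0, 4000, 10000, 13000, 20000]
print("original:", widths(edges))
print("direct:  ", widths(send_directly(edges, d_lh=5200, d_hl=5000)))
print("encoded: ", widths(send_edge_encoded(edges, d_lh=5200,
                                            e_h=100, e_l=100, e_th=350, e_tl=350)))
```

When the direct path has unequal LH and HL delays, every high interval shrinks or grows by the delay difference. With edge encoding the intervals are preserved as long as the sum of the two delays is the same for both polarities; any residual mismatch shows up directly as a pulse-width error.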
The analog interface uses the ADM output and a $\tau_D$ delayed version of this output to generate a one-shot pulse for each LH and HL transition (Fig. 6.11(b)). The value of $\tau_D$ must be large enough to allow generating full-swing LH transitions so that the sensitivity to noise is minimized. Delays also vary with temperature, but on a time scale much longer than all relevant characteristics of the digital transmission line. Temperature changes and variations therefore do not introduce significant error, as consecutive LH (or HL) transitions are affected almost equally. However, it should be noted that the delay variability of the various buffer stages, due to power supply variations, can cause variability in the total propagation delay of individual LH (and HL) transitions (respectively). At the receiving end, each LH transition can be decoded by toggling the output of a memory element. For example, the propagated signal can be used to clock a T flip-flop (T-FF). As the relative timing of consecutive rising (or falling) edges is not corrupted by the FPIN digital buffers, the output of this T-FF will be a fairly accurate replica of the original ADM output, with some latency. The output of the T-FF can then be passed through a low pass filter to reconstruct the input analog signal. With reference to Fig. 6.11(b), the transmission fidelity depends on the precise values of $e_{\rm H}$, $e_{\rm L}$, $e_{\rm tH}$ and $e_{\rm tL}$. A sufficient condition to obtain good reconstruction is to have

$$e_{\rm H} + e_{\rm tH} = e_{\rm L} + e_{\rm tL} \tag{6.4}$$

where

- $e_{\rm H}$ = LH (input) to LH (output) delay of the XOR gate
- $e_{\rm L}$ = HL (input) to LH (output) delay of the XOR gate
- $e_{\rm tH}$ = clock to data (LH) delay of the T-FF
- $e_{\rm tL}$ = clock to data (HL) delay of the T-FF

(a) Block diagram of ADM-based analog interface.

Figure 6.11 Block diagram and waveform of the ADM-based analog interface.

If this condition is met, then the pulse-width of the ADM output is preserved after being propagated through the FPIN. This is obtained if the XOR gate ($e_{\rm H}$ and $e_{\rm L}$) and the T-FF ($e_{\rm tH}$ and $e_{\rm tL}$) introduce equal propagation delays for LH and HL transitions. The CMOS XOR gate is shown in Fig. 6.12. Both $e_{\rm H}$ and $e_{\rm L}$ depend only on the pMOS ($M_{3,4}$ or $M_{7,8}$) parameters. Thus, the equality of $e_{\rm H}$ and $e_{\rm L}$ can be maintained across process variations. The values of $e_{\rm tH}$ and $e_{\rm tL}$ obtained from post-layout simulations are shown in Table 6.1. The average oscillation frequency (center frequency) of the asynchronous $\Delta$-modulator is 100 MHz. Thus, the mismatch between $e_{\rm tH}$ and $e_{\rm tL}$ can introduce a duty cycle variation of $\approx$ 0.3% (100 MHz $\times$ 29.4 ps) for the slow-fast corner (worst-case scenario). For typical values, the duty cycle variation is as low as $\approx$ 0.06%. Fig. 6.13 shows the effect of the mismatch between $e_{\rm tH}$ and $e_{\rm tL}$. It can be seen that for delay mismatches of up to 10 ps, the performance of the ADM is not *visibly* degraded.

# 6.4.2 Implementation

Compatibility of the proposed ADM with the target application [1] is described in this subsection. In a previously fabricated test-chip, the prototyping platform [1] used thick-oxide I/O FETs for the configurable I/O so that it can support ICs operating on a wide range of power supply voltages, and thin-oxide FETs (operating on a lower power supply) for the embedded FPIN to leverage their high speed and low power circuits.
Accordingly, the proposed ADM was designed with thick-oxide 3.3 V I/O FETs (the detailed block diagram of the analog interface is shown in Fig. 6.14). The ADM modulates the input analog signal into 3.3 V pulses ( $v_{\text{out}\_mod\_3V3}$ in Fig. 6.14) and then converts them into 1.2 V pulse ( $v_{\text{out}\_mod}$ ). These pulses are then converted into a one-shot pulse train (by the XOR-gate) and propagated through the FPIN. A transistor level schematic of the proposed ADM block is shown in Fig. 6.15. $M_{1-14}$ constitute the amplifier and $M_{15-22}$ correspond to the digital buffers of Fig. 6.5. The amplifier used in the proposed ADM is a folded cascode amplifier. The non-inverting input of the amplifier is connected to the input analog signal. The inverting input, connected to the output of the low-pass filter, tracks the non-inverting input. A rail-to-rail input range is enabled by using complementary input pairs ( $M_{1-3} \& M_{4-6}$ ). The output of this amplifier does not need to have rail-to-rail swing. $M_{15-22}$ , acting as the digital buffer, provides the rail-to-rail output. $M_{23,24}$ (both are nMOS) convert the 3.3 V pulses to 1.2 V pulses. At the receiving side, the T-FF was designed with 1.2 V 2.2 nm-oxide FETs. The output of the T-FF is up converted from 1.2 V to 3.3 V and then passed through a low pass filter to reconstruct the input analog signal. Transistor level schematic of the 1.2 V to 3.3 V converter and the low pass filter of the receiving side are shown in Fig. 6.16. Capacitors in Fig. 6.15 and Fig. 6.16(b) were implemented with nMOS varactors, i.e. transistors with the drain/source tied to 0 V. nMOS varactors have a non-linear C-V curve. Even though MIM-capacitors or metal-metal capacitors can provide more linear C-V curves, in our target applications the higher metal layers were entirely dedicated for routing signals through the FPIN. Thus, MIM-capacitors could not be used. 
Besides, metal-metal capacitors (having thick silicon dioxide as dielectric) occupy a comparatively large area, which makes them unsuitable in our target application because of area constraints.

Figure 6.12 Schematic of the XOR gate.

Table 6.1 The values of $e_{\rm tH}$ and $e_{\rm tL}$ from post-layout simulation.

| Process | $e_{\rm tH}$ (ps) | $e_{\rm tL}$ (ps) | $\Delta t_{\text{T-FF}}$ (ps) |
|-----------|-------|-------|-------|
| Typical | 352.5 | 346.1 | 6.4 |
| Fast | 254.6 | 250.7 | 3.9 |
| Slow | 481.4 | 469.7 | 11.7 |
| Slow-Fast | 383.3 | 353.9 | 29.4 |
| Fast-Slow | 339.3 | 353.4 | -14.1 |

Figure 6.13 The effect of mismatch between $e_{\rm tH}$ and $e_{\rm tL}$ on the SNDR.

Figure 6.14 Detailed block diagram of the proposed ADM-based analog interface.

# 6.4.3 Behavioral and Post Layout Simulation of the Asynchronous $\Delta$-Modulator

Using complementary input pairs introduces variations in the differential gain ($G$) and cut-off frequency ($\omega_1$) of the amplifier over the input common-mode range. The SPICE-level model of the amplifier, shown in Fig. 6.15, was simulated to quantify the variations in $G$ and $\omega_1$. These variations were included in the high-level model of the ADM and simulated in Simulink® to evaluate their effects on the SNDR of the ADM. These variations increased the harmonics at the ADM output, which degraded its performance. Fig. 6.17 (dotted and dotted-dashed curves) compares the effects of these variations with the scenario where an ideal amplifier was used in the ADM. It can be seen that the variations in $\omega_1$ degrade the performance of the ADM by 6-10 dB when the input analog signal makes large swings. In order to quantify the impact of the varactor non-linearity, the high-level model shown in Fig.
6.5 was simulated with Simulink<sup>®</sup> by including the non-linearity of the nMOS varactors. An expanded gain<sup>1</sup> element (where the term $k_2 \cdot x^2$ is the expanded gain) was used at the output of the filter to properly represent the non-linearity of the C-V characteristics. The signal power vs. SNDR curves from simulation with Simulink<sup>®</sup> and from post-layout simulation of the proposed ADM are shown in Fig. 6.17 (dashed and solid curves). Both signal power vs. SNDR curves of Fig. 6.17 exhibit similar behaviours, therefore confirming that the degradation is due to the non-linearity of the nMOS varactors. In post-layout simulations, a sinusoidal input signal with an offset voltage of 1.65 V was used while $V_{P-P}$ was varied. For low amplitude $V_{P-P}$, the voltage at the output of the reconstruction low-pass filter remained around 1.65 V. In that voltage range, the non-linearity of the nMOS varactor is insignificant [91]. However, as the input amplitude ($V_{P-P}$) increases, the low-pass filter output approaches 0 V and the non-linearity of the nMOS varactor becomes significant in that voltage range. It causes the reconstructed signal to have harmonics and the SNDR to degrade. Even though the variations in $\omega_1$ and the modulation process of the ADM itself limit the achieved SNDR for large input swings, the non-linearity of the varactor appears to be the dominant factor in limiting the performance of the proposed ADM. Our high-level model was also simulated with the varactor's non-linearity and with/without variations in $\omega_1$. It was seen that when the varactor's non-linearity is included, including or excluding the variations in $\omega_1$ does not make any observable difference (dashed and dashed-dot-dotted curves in Fig. 6.17) in the ADM's performance, because the varactor non-linearity already causes the major part of the distortion.

1. $y = k_1 \cdot x + \overbrace{k_2 \cdot x^2}^{\text{expanded gain}}$

Figure 6.15 Schematic of the ADM & 3.3 V to 1.2 V converter (transmission side).

Figure 6.16 Schematic of the D/A converter (receiving side).

Figure 6.17 Simulation of the proposed ADM based on s-model in Simulink<sup>®</sup> and post-layout simulations in Cadence ($V_{DD}=3.3$ V) respectively. Simulink<sup>®</sup> model included the non-linearity of the filter.

### 6.5 Prototype Test-Chip and Measurement Results

A test-chip of our ADM-based analog interface was fabricated in IBM 0.13 µm CMOS technology. A micro-photograph of the test-chip is shown in Fig. 6.18. The layout is shown in Fig. 6.19. A mixed domain MDO4104-6 Tektronix oscilloscope was used for the tests. It has a 1 GHz bandwidth and a sampling rate of 5 GS/s. Passive probes (1 GHz bandwidth) were used. The active silicon area of the asynchronous $\Delta$-modulator is 43 µm × 21 µm. The total silicon area of the analog interface, including the T-FF, the reconstruction low-pass filter, and the 1.2 V to 3.3 V converter, is 43 µm × 46 µm.

# 6.5.1 Measurement Results

Fig. 6.20 shows the DC transfer characteristic curve of the ADM and low-pass filter based A/D and D/A converter-pair. A rail-to-rail 1 kHz sinusoidal signal was applied at the input ($v_{\rm in}$ in Fig. 6.14) and the corresponding output ($v_{\rm out}$ in Fig. 6.14) was measured. The two variables are plotted in the X-Y mode of the oscilloscope. Fig. 6.21(a) shows the power spectral density (PSD) of the output ($v_{\text{out}}$ in Fig. 6.14) when the input is a 2.5 $V_{\text{P-P}}$ 1 MHz sinusoidal signal. The spurious free dynamic range (SFDR) shown in Fig. 6.21(a) is 54 dB. The signal power vs. SNR and SNDR curves for various input frequencies are shown in Fig. 6.21(b).

Figure 6.18 Micro-photograph of the die (the die contained other circuits).
Even though measurement results from the test-chip validated the concept of our ADM, some degradation in the measured SNDR compared to the post-layout simulated SNDR led us to conclude that there were some unexpected noise sources inside the test-bench. In the test-chip, the same power supply was used for the analog circuits (amplifier of Fig. 6.5) and the digital circuits (digital buffer of Fig. 6.5) due to a limitation on the number of pins in the test chip. Moreover, for testing and debugging purposes, the output of the ADM ($v_{\text{out\_mod}}$ in Fig. 6.14) was connected to a pad in the test-chip through a digital output buffer. Thus, noise was injected into the power distribution network by the digital output buffer, which drives the relatively large load formed by the output pad and the test printed circuit board (PCB) capacitances. Even though on-chip de-coupling capacitors were used in the power distribution ring around the chip, none was used near the ADM or the amplifier. Thus, the generated noise affected the ADM, and the measured SNDR showed more degradation compared to the post-layout simulation. This assumption was also confirmed through post-layout simulations redone with an enriched electrical model comprising inductance and resistance to reflect power supply parasitics and the impact of the digital output buffer with equivalent loading capacitances of pad and test PCB. The resulting SNDR curve shown in Fig. 6.22 matched the measured SNDR curve with reasonable accuracy. Hence, the degradation of the measured SNDR is attributed mainly to power supply coupling. In the target application, the external digital output of the ADM is not needed. Thus, such power supply noise will not be present in actual operation. If a separate power supply had been used for the digital output buffer (or the digital output buffer had not been used in the test-chip), the measured SNDR would have shown improved performance for larger input swings that should have matched the high-level and post-layout simulation curves of Fig. 6.17.

Figure 6.19 Layout of the ADM and LPF to reconstruct the input signal.

Figure 6.20 Measured DC transfer characteristics from the test-chip ($V_{DD}=3.3$ V) of the asynchronous $\Delta$-modulator shown in Fig. 6.15.

(a) Measured power spectral density (PSD) of the reconstructed signal for $V_{\text{P-P}}=80\%$ of $V_{\text{DD}}$ (input frequency = 1 MHz). (b) SNR/SNDR versus input amplitude.

Figure 6.21 Measured noise performances from the test-chip.

# 6.5.2 Comparison with Other Published A/D Converters

To compare performances, we use the classic figure of merit (FoM) [59], defined as

$$\text{FoM} = \frac{P}{2^{\frac{\text{SNDR}-1.76}{6.02}} \times 2 \times BW} \tag{6.5}$$

with $BW$ = signal bandwidth and $P$ = power consumption.

Figure 6.22 Simulation of the proposed ADM with a *noisy power supply* to mimic the actual test-bench scenario.

a. $0.7\,\mathrm{nH}$ and $18\,\mathrm{m}\Omega$ parasitics were added in series with the power supply. $2.8\,\mathrm{nH}$ and $18\,\mathrm{m}\Omega$ were added in series with the ground node. A digital output buffer with a loading capacitance equivalent to a pad and PCB parasitics was used in simulation to mimic the actual test-bench noise.

Fig. 6.23 presents two graphs that compare various low-pass $\Sigma\Delta$ modulators with the proposed ADM in terms of silicon area usage and FoM.

Figure 6.23 Comparison between $\Sigma\Delta$ modulators and the proposed $\Delta$-modulator on technology process node versus area ($\mu m^2$) and FoM (pJ).

Table 6.2 Comparison with published compact oversampling A/D converters.

| Ref. | This work | [29] TCAS 2009 (single-ended) | [29] TCAS 2009 (differential) | [30] JSSC 2006 (1st-order) | [30] JSSC 2006 (2nd-order) | [31] TVLSI 2014 | [22] ISCAS 2012 |
|---|---|---|---|---|---|---|---|
| Technology (µm) | 0.13 | 0.18 | 0.18 | 0.18 | 0.18 | 0.13 | 0.18 |
| Filter order | 1st-order | 1st-order | 1st-order | 1st-order | 2nd-order | 1st-order | - |
| Input type | Single-ended | Single-ended | Differential | Differential | Differential | Single-ended | Single-ended |
| Supply (V) | 3.3 | 1.8 | 3 | 1.8 | 1.8 | 0.25 | 1.8 |
| Input ($V_{P-P}$) | 2.5 | 0.4 | 0.8 | 0.4 | 0.4 | - | 1 |
| $F_S$ (MHz) | Asynchronous | 140 | 87 | Asynchronous | Asynchronous | Asynchronous | Frequency modulation |
| BW (MHz) | 2 | 0.4 | 0.125 | 8 | 12 | 30 Hz | 0.2 |
| SFDR (dB) | 54 | 55 | 63 | 75 | 72 | - | - |
| SNR (dB) | 57 | 49 | 60 | 70 | 70 | 62 | - |
| SNDR (dB) | 47 | 42 | 54 | - | - | 58 | - |
| Area (mm<sup>2</sup>) | 0.00099 | 0.000375 | 0.00256 | 0.026 | 0.04 | 0.141 | 0.0036 |
| Current (mA) | 0.15 | 0.263 | 0.433 | 0.8 | 1.2 | 0.000112 | - |
| FoM (pJ) | 0.67 | 5.6 | 7.6 | 0.02 | 0.02 | 0.8 | - |

Our ADM was compared in detail with three other published compact $\Sigma\Delta$ modulators (two asynchronous $\Sigma\Delta$ modulators and one clocked $\Sigma\Delta$ modulator) in Table 6.2. The degraded measured SNDR of our ADM was used for the comparison in Table 6.2, even though post-layout simulation predicted an SNDR of $\approx$ 60 dB. Table 6.2 also includes the analog interface based on the frequency modulation of ring oscillator based VCOs that was introduced in [22], to allow a comparison with our ADM based solution. Even though ADMs cannot benefit from the oversampling ratio as effectively as ASDMs, they are simpler, require a very compact area, and support moderate signal bandwidth and medium resolution.
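As a quick check of Eq. 6.5, the FoM entry for this work can be recomputed from the other entries of Table 6.2. The sketch below assumes that the power consumption is simply the product of the listed supply voltage and supply current, which the table does not state explicitly; the function name is illustrative.

```python
def fom_pj(power_w, sndr_db, bw_hz):
    """Figure of merit of Eq. 6.5, returned in picojoules."""
    enob = (sndr_db - 1.76) / 6.02              # effective number of bits
    return power_w / (2 ** enob * 2 * bw_hz) * 1e12

# Entries of Table 6.2 for this work: 3.3 V supply, 0.15 mA, SNDR 47 dB, BW 2 MHz.
power = 3.3 * 0.15e-3   # assumes power = supply voltage x supply current
print(f"FoM = {fom_pj(power, 47.0, 2e6):.2f} pJ")
```

Under that assumption the result is about 0.68 pJ, in line with the 0.67 pJ listed in the table.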
The $\Sigma\Delta$ modulator in [29] can support an input $V_{P-P}$ of only 0.4 V and 0.8 V for the single-ended and differential implementations respectively. Even though it consumes a very small silicon area, it suffers from severe non-linearity. The non-linearity of the frontend voltage-to-time converter (VTC) of the $\Sigma\Delta$ modulator in [29] was a large contributor to its limited bandwidth and SNDR. The ASDM of [30] can support an input $V_{\text{P-P}}$ of only 0.4 V, which is a small fraction of its power supply of 1.8 V. Though the first-order and second-order implementations in [30] can support an input signal bandwidth of up to 8 MHz and 12 MHz respectively, they both consume a comparatively large silicon area. Such large silicon areas cannot be accommodated in our target application. The ASDM of [31] is an ultra-low power implementation that can support an input signal bandwidth of only 30 Hz. The ring oscillator based VCOs and the corresponding PLL based demodulator reported in [22] can support an input bandwidth of only 200 kHz and an input voltage range of only 0.6-1.6 V for a $V_{\rm DD}$ of 1.8 V. On the other hand, our proposed ADM can support an input $V_{P-P}$ of up to 2.7 V (80% of $V_{\rm DD}$) with a signal bandwidth of up to 2 MHz for a power supply of 3.3 V.

#### 6.6 Conclusion

An A/D converter based on an asynchronous $\Delta$-modulator (ADM) has been proposed. It is designed to support reconfigurable routing of analog signals through digital interconnection networks. Such networks are found in FPGAs, field programmable interconnect networks and an electronic system prototyping platform (WaferBoard) previously published by some of the authors. A prototype was fabricated in a 0.13 µm CMOS technology. It occupies a total area of 45 µm × 46 µm. Measurement results showed that the proposed ADM can support an input signal bandwidth of 2 MHz and achieves measured SNR, SNDR and SFDR of 57, 47, and 54 dB respectively.
The fabricated test chip showed that the SNDR of our proposed circuit is sensitive to power supply coupling. Different power supplies for analog and digital circuits, combined with careful power decoupling, are therefore required to match the expected SNDR of $\approx$ 60 dB predicted by post-layout simulations. The prototype was designed according to the stringent constraints (limited silicon area and bandwidth of the transmission channel) of the WaferBoard. Without those constraints, a prototype with higher bandwidth and SNDR could have been designed. Being an "amplitude-to-time converter", the proposed ADM can benefit from the higher intrinsic speed of transistors in scaled CMOS technologies.

# Acknowledgments

This research was partly supported by Gestion Technocap, the Natural Sciences and Engineering Research Council of Canada and by the Mitacs program. The authors would like to acknowledge CMC Microsystems for the products and services that facilitated this research (CAD tools by Cadence, fabrication services using 0.13 µm CMOS technology from IBM, and packaging services). This work was partly done while one of the authors was a guest professor at COMELEC-Telecom ParisTech.

#### Additional Measurement Results

Following the jury's comments, this section aims to provide additional measurement results that could not be put in the journal paper due to length restrictions. Sinusoidal inputs with frequencies of 500 kHz, 1 MHz, and 2 MHz were applied and the output signal (reconstructed input) was observed, as shown in Fig. 6.24. It can be seen that for 1 MHz and 2 MHz, the reconstructed signal begins to get attenuated. The attenuation is due to the passive low pass filter (LPF) at the receiving side (shown in Fig. 6.14). A second order low pass filter (Fig. 6.16) was used to effectively filter out high frequency components, which inevitably attenuated the signal band.
It should be kept in mind that in conventional $\Sigma\Delta$ modulation the high frequency components are removed by digital filtering. As digital filtering was not possible in our target application, a passive low pass filter was used. From the spectrum of the reconstructed signal, it can be seen that even though the 500 kHz and 1 MHz components suffer from attenuation, the 0 Hz (DC) components are not attenuated in any case.

Figure 6.24 Measurement result for input frequency of $500\,\mathrm{kHz}$, $1\,\mathrm{MHz}$, and $2\,\mathrm{MHz}$ from the test-chip.

# CHAPTER 7 ARTICLE 4: A NOVEL SPATIALLY CONFIGURABLE DIFFERENTIAL INTERFACE FOR AN ELECTRONIC SYSTEM PROTOTYPING PLATFORM

### Summary of the Chapter

The differential interface was originally developed by Olivier Valorge, a postdoctoral fellow at Polytechnique Montréal. Alternative implementation(s) of the original concept have been investigated by the author of this thesis to find a more cost effective solution that meets the constraints of the WaferBoard [1]. A test-chip has been fabricated by Olivier Valorge to prototype the differential interface. Measurements on the test-chip are reported by the author of this thesis in a submitted journal article that shows that the spatially configurable differential interface can operate at a speed of up to 2.5 Gbps. The submitted paper is reproduced in this chapter.

# Title: A Novel Spatially Configurable Differential Interface for an Electronic System Prototyping Platform

Wasim Hussain, Olivier Valorge, Yves Blaquière, Yvon Savaria. Manuscript submitted on May 20, 2015 to *Integration*, the VLSI Journal.

#### Abstract

This paper presents complete and detailed circuit design and the first experimental validation of a previously proposed spatially configurable differential interface that was designed to support current mode logic (CML) on a reconfigurable electronic system prototyping platform.
The physical and electrical constraints of CML interfaces are described, and an architecture is proposed for transmitting differential signals between two different integrated circuits (ICs) deposited on the prototyping platform surface. The proposed implementation has been validated in a test-chip using a mature 0.18 µm CMOS technology. Measurements on the test-chip show that the spatially configurable differential interface can operate at a speed of up to 2.5 Gbps. #### Keywords Reprogrammable Circuit Board, Wafer Scale Integration (WSI), Differential Signaling, Current Mode Logic. #### 7.1 Introduction In today's high-end electronic systems, higher complexity integrated circuits are being put together to provide as much performance and features as possible in one single product. This complexity is posing many challenges in different stages of product development, which are exacerbated by the short time-to-market imposed on the developers due to the competitive nature of the industry. Simulation platforms and design flows that are used to validate integrated circuits during development stage are quite mature. Hardware emulation platforms, such as the ones based on FPGAs [17,75,92] and ASICs [7,74] can support very complex integrated circuits. Nevertheless, there is no commercially available automated prototyping and testing platform for electronic systems built with integrated circuit (IC) components like microprocessors, ASICs, memories and FPGAs. Printed circuit boards (PCB) are still essentially the only technology for prototyping such systems, but design and manufacturing of complex PCBs can take from several weeks to months. Most of the electronic systems require software in addition to the hardware itself. The sooner a working hardware prototype can be provided for the software team to work on, the faster the overall product development can proceed. 
Current trends for technologically and economically viable reconfigurable system solutions include a variable combination of FPGAs and other kinds of programmable logic, application-specific instruction set processors (ASIPs), and systems implemented with coarse-grained reconfigurable hardware (different from ASIPs) [93]. An active reconfigurable board, called the WaferBoard, has been proposed in [1]. The reconfigurable board is intended to be a multi-purpose prototyping platform, which provides programmable interconnections among multiple user ICs (uICs) like ASICs, memories and FPGAs. The WaferBoard is designed to support as many types of ICs and signal interfaces as possible. One such signal interface is differential signalling, widely used in high speed data transmission. Standards currently in use for differential signalling include, for instance, LVDS (low voltage differential signalling), LVPECL (low voltage positive emitter-coupled logic), CML (current mode logic), and HSTL (high-speed transceiver logic) [21].

The example in Fig. 7.1(a) shows a basic MOS CML buffer. It includes two pull-up resistors $R_{\rm D}$, two nMOS transistors for switching and a current source $I_{\text{TAIL}}$. The voltage swing is generated by switching the current in a common-source differential pair. Since the nMOS transistors are always saturated and there are no pMOS transistors, inputs and outputs based on these circuits can operate at more than 3 Gbps, which is faster than the typical maximum speed of CMOS logic implemented with devices of comparable size driving similar loads [14]. A solution to propagate differential signals on a wafer has been proposed in [94]. Unfortunately, such approaches do not offer spatial reconfiguration and thus are incompatible with reconfigurable systems such as the WaferBoard or FPGAs.

Figure 7.1 CML structure.
This paper presents a complete and detailed circuit design and the first experimental validation of a spatially configurable CML interface originally introduced but not experimentally validated in [3,95]. The circuit design was implemented using a standard CMOS process that is fully compatible with the WaferBoard platform and which could be adapted and used in any integrated circuit with programmable I/Os such as FPGAs. Besides the research conducted by our team, that concept remains unexplored in the literature. The focus of our research was not to support only CML, but in addition to supporting conventional CMOS I/Os, to develop a means to support spatially configurable propagation paths for differential-to-single-ended conversion that can be later enhanced to accommodate other differential signaling standards in the WaferBoard or in integrated circuits with programmable I/Os. We chose CML as a representative differential signaling technique for the prototype test-chip that was designed, fabricated and tested and for which conclusive experimental results are reported for the first time in this paper. CML was chosen because of its popularity and simplicity. A prototype test-chip, that was fabricated and tested, demonstrates the feasibility of the proposed interface that could support differential signalling in the prototyping platform. Section II describes the specifications for compatibility with the WaferBoard, as well as the electrical and physical constraints imposed by differential interfaces. Section III describes the differential interface architecture and its complete and detailed circuit design. Section IV reports measured results from the test-chip implemented using a 0.18 µm CMOS technology. Finally, we conclude in Section V by summarizing our main results. 
### 7.2 Background # 7.2.1 Compatibility with WaferBoard, a Prototyping Platform for Electronic Systems The core of the WaferBoard platform upon which uICs are to be deposited is called the WaferIC<sup>TM</sup>. Its surface has a dense array of very fine (tens of microns) conducting pads, called NanoPads. Each NanoPad is connected to an internal wafer-scale interconnect network, called WaferNet<sup>TM</sup>, that can be configured to connect a NanoPad to any other NanoPad, without any conflicts among large sets of connections. Whatever the position and location of the uICs are on the WaferIC, the NanoPads are able to make contact with their solder ball pins. So hand placement of uIC is sufficient as shown in Fig. 7.2. When a uIC pin (solder ball) makes contact with several NanoPads, the WaferIC detects and maps contacted pins, and the WaferNet is then automatically configured according to connected NanoPads and the user netlist [1]. According to the spirit of the WaferBoard, supporting differential interfaces means that any two complementary pins, whatever their positions on WaferIC, could be declared as being part of a differential interface. However, in practice, the spacing between two such pins, used to propagate a differential signal, rarely exceeds $2 \,\mathrm{mm}$ [66]. Thus, any pair of pins propagating a differential signal can be arbitrarily positioned in an oriented window of $2 \,\mathrm{mm} \times 2 \,\mathrm{mm}$ , as shown in Fig. 7.3. The NanoPad density is about $64 \,(8 \times 8)$ NanoPads/mm² [1]. Mapping the $2 \,\mathrm{mm} \times 2 \,\mathrm{mm}$ area onto the WaferIC fabric means that the differential interface has to be able to drive or receive signals from any NanoPad within an array of $16 \times 16$ NanoPads. However, because of the interface's spatial configurability, such an interface is only capable of selecting any two pins from an area of $1 \,\mathrm{mm} \times 1 \,\mathrm{mm}$ , in the worst case scenario (Explained in Sec. 
8.2.1).

# 7.2.2 Physical and Electrical Constraints

The configurable differential interface has to meet several electrical and physical constraints to support CML differential signalling. First, for proper signal integrity, the two differential signals must retain their symmetry as they propagate through the interface from the source uIC to the destination uIC. This symmetry depends on the path taken by the two signals through the proposed interface. Asymmetry in the propagation paths could induce jitter or a phase difference between the signals of a differential pair, which can lead to errors in the transmitted information. Very stringent jitter constraints exist for most high-speed interfaces. For example, in the PCIe transmission protocol, 30 % of the bit length is the maximum allowed jitter [67], which represents a maximum jitter of 120 ps for a data rate of 2.5 Gbps. Such a short propagation time difference can be caused by a slight length or load asymmetry between the two signal paths.

Figure 7.2 Conceptual overview of the active reconfigurable platform.

Figure 7.3 Differential pins of user's ICs interfacing with the NanoPad array of the WaferIC (zoom of Fig. 7.2(a)) [3].

Secondly, during propagation of high frequency differential signals, the dimensions of long PCB traces become comparable to the signal wavelength. In such cases, the PCB trace can no longer be modelled with lumped parameter circuit elements. Instead, it starts behaving as a transmission line, and the voltage and current across it show wave propagation behaviour. As a result, reflection at the receiving end and attenuation become prominent in the signal characteristics. To avoid such phenomena, impedance matching is typically done in every stage of a transmission path [68]. In our intended application, *i.e.* the prototyping platform, there are no long PCB traces for propagating differential signals.
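The 120 ps jitter figure quoted earlier in this section follows directly from the bit period at 2.5 Gbps. The short sketch below reproduces that arithmetic. The function name is hypothetical, and applying the same 30 % fraction to other data rates is an assumption made only for illustration, not a statement about any other standard.

```python
def jitter_budget_ps(data_rate_gbps, fraction_of_bit=0.30):
    """Maximum allowed jitter in picoseconds for a given data rate.

    fraction_of_bit defaults to the 30 % figure quoted in the text for PCIe;
    using it at other rates is an illustrative assumption.
    """
    bit_period_ps = 1000.0 / data_rate_gbps  # 1 Gbps corresponds to 1000 ps per bit
    return fraction_of_bit * bit_period_ps


if __name__ == "__main__":
    rate = 2.5  # Gbps, the data rate discussed in the text
    print(f"bit period: {1000.0 / rate:.0f} ps")
    print(f"jitter budget: {jitter_budget_ps(rate):.0f} ps")
```

At 2.5 Gbps the bit period is 400 ps, so a 30 % allowance leaves 120 ps, which is the skew budget that the matched propagation paths of the interface must respect.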
Very much like in FPGA programmable interconnects, internal signals in the WaferIC can be propagated from any NanoPad to any others through the WaferNet, a configurable interconnect single-ended network that uses dispersive integrated wires. In that network, long interconnects are segmented by inserting repeaters at regular intervals, a standard technique for managing signal integrity and for limiting delays. The long integrated interconnects get broken down into chains of short segments connected by repeaters. Thus, even though rail-to-rail single-ended signaling is used in the WaferNet, impedance matching is not an issue there, very much as in interconnects found in FPGAs. Conventional CML output drivers (Fig. 7.1(b)) typically consist of an open-drain differential pair and a voltage-controlled current source. Thus, outputs ( $v_{\text{OUT+}}$ and $v_{\text{OUT-}}$ ) require load resistors connected to $V_{\text{DD}}$ for pull-up since the nMOS transistors are only used to drive the falling edges and some mechanism must be provided to drive rising edges. This is often implemented as load resistors provided externally, *i.e.* on the PCB, where these resistors should be placed as close to the pins as possible. The load resistors commonly used in CML signalling are $50\,\Omega$ . Thus, any configurable differential interface must have the capability to provide such load resistors to CML I/Os. On the other hand, typical input CML stages consist of differential pairs implemented with nMOS transistors (Fig. 7.1(c)). In the target environment, external CML output drivers connected to our input stages will have direct physical contact with NanoPads. Thus, each NanoPad comprises a configurable pull-up pMOS that acts as a $50\,\Omega$ active resistance. The pull-up pMOS, available in each NanoPad, acts as the pull-up resistor when the corresponding NanoPad is configured as one of the input NanoPads of the differential interface. 
The input stage for CML may [65] or may not [96] have internal pull-up resistors. If internal pull-up or biasing is not included in the external CML input stage, the aforementioned embedded pull-up pMOS in each NanoPad can be used as a pull-up resistor when the corresponding NanoPad is configured as an output NanoPad of the differential interface and connected to the external CML input stage.

# 7.3 Proposed Architecture and Circuit Implementation of the Differential Interface

The architecture of the proposed differential interface is shown in Fig. 7.4 [3]. The WaferNet is a configurable network that can only propagate "individual" digital single-ended signals, and a constraint of this work was to keep the WaferNet untouched. Thus, differential signals are converted to single-ended signals in order to propagate the information through the WaferNet. Differential signaling can hence be supported by the WaferIC without modifying the WaferNet, with only a few analog blocks added in the NanoPads. The input differential network receives the complementary differential signal from the uIC and converts it into a single-ended signal before injecting it into the WaferNet.

Figure 7.4 Architecture of the embedded differential propagation chain [3]. (Figure annotation: all the NanoPads "belonging" to one differential interface unit; only two among those provide the differential signal to the interface for propagation into the WaferNet<sup>TM</sup>.)

Figure 7.5 The input differential configurable network.

The input differential network has a differential-to-single-ended converter. However, before the conversion, it has to be ensured that the two differential signals reach the converter without excessive phase difference. In other words, the signal paths from the uIC pins to the input of the differential-to-single-ended converter must be adequately "matched" for all possible locations of the uIC on the WaferIC.
To balance all interconnections and propagate fast signals, an H-tree structure with multiple hierarchical levels is proposed, as shown in Fig. 7.5(a) and 7.5(b). The area "covered" by one H-tree determines the area over which differential signals are supported. A cell-based hierarchical approach is used to simplify the physical implementation of such a complex structure. The H-tree can be configured to select any two NanoPads, belonging to the declared differential interface unit, to receive the differential signal from the uIC and propagate the signal to the differential-to-single-ended converter. The multiplexers of each stage of the H-tree structure can be configured in high-Z mode during standby to limit their power consumption. These multiplexers are cascaded and connected with regular and symmetrical metal interconnections to balance all propagation paths from the inputs to the differential-to-single-ended converter. The analog multiplexer acts as a unity-gain buffer between the CML input signals from the uICs and the inputs of the differential-to-single-ended converter in the range of 1.2-1.8 V.

Figure 7.6 Tiling of differential interface unit. (a) Tiling of differential interface units without overlap cannot support spatial reconfigurability. (b) Tiling with overlap of adjacent interfaces allows spatial reconfigurability.

As uICs could be placed anywhere over the WaferIC, static tiling (without overlap) of the differential interface unit will not ensure that the two complementary pins will always fall within the area of one differential interface unit. Fig. 7.6(a) depicts the scenario where the input differential configurable networks are tiled without overlap. If the two NanoPads are in positions A and B, then there is no problem. But if they are in positions B and C, then they cannot be selected by one differential interface unit.
To ensure that two complementary pins fall within the area of one differential interface unit, there must be an overlap between two adjacent interface units as shown in Fig. 7.6(b). If the two relevant NanoPads are in position A and B (in Fig. 7.6(b)), then interface 1 is configured. However, if the NanoPads are in position B and C (in Fig. 7.6(b)) then interface 2 is configured. The overlapped continuous floor-plan of the entire architecture is shown in Fig. 7.7. Figure 7.7 Continuous floor plan of the architecture with overlap between adjacent interface units. Each of the two shaded rectangles can be configured as a differential interface unit. # 7.3.1 Propagation Network : WaferNet<sup>TM</sup> WaferNet is an array of multiplexer-based crossbars and interconnects, that allows routing signals in different directions across the WaferIC. The WaferNet has been experimentally validated and tested in [97]. In addition to multiplexer-based crossbars, tristate buffer-based crossbars have also been considered for implementation [73,98]. Whichever type of crossbar is used for the implementation of WaferNet, the proposed differential interface would still be compatible with it as long as the interconnection network is based on single-ended signals at both its input and output. # 7.3.2 Input Stage From the perspective of a uIC transmitting a differential signal, the terminal nodes are the contacted NanoPads. Thus, these NanoPads must provide the required $50\,\Omega$ pull-up resistor to the CML output driver of the uIC. In our proposed architecture, configurable $50\,\Omega$ pull-up resistances are embedded in each NanoPad to meet CML buffer's pullup resistor constraints. These $50\,\Omega$ resistors are made with a pull-up pMOS that can be configured either in $50\,\Omega$ or high impedance modes. The input common-mode voltage level, in the prototype test-chip implemented with a $0.18\,\mu$ m CMOS technology, was designed to be in the range of $1.5\,\mathrm{V}$ [96]. 
That common mode voltage could be adjusted to other values if the design is re-implemented to support other differential signaling standards. # 7.3.3 H-Tree Input Differential Network The first stage of the 4 level H-tree is only a 4-to-1 analog multiplexer that propagates or not an analog signal from the NanoPads to the input of stage-2, depending on the external configuration, as shown in Fig. 7.8(a). Stage-2 is composed of a 4-to-1 analog multiplexer (Fig. 7.8(b)) and a configurable differential-to-single-ended converter (Fig. 7.9(a)). Based on the external configuration, the stage-2 analog multiplexer propagates the signal from stage-1 to the input of stage-3. However, if the two NanoPads fall within one stage-2 then the differential-to-single-ended conversion will occur in that stage and is then injected into the WaferNet. The configurable differential-to-single-ended converter of the stage-2 can select from only diagonally located stage-1 multiplexers. It does not have to select spatially adjacent pairs, because that would imply the two complementary signal pins to be distant from each other by less than 0.5 mm. Once converted, the single ended signal is directly sent to the WaferNet. Each stage-3 mux-demux consists of a 4-to-1 analog multiplexer followed by a 1-to-4 analog de-multiplexer (Fig. 7.8(c)) and a configurable differential-to-single-ended converter (7.9(b)). As in stage-2, if the two NanoPads fall within one stage-3 then the differential-to-single-ended conversion will occur in that stage and is then injected into the WaferNet. However, if the two selected NanoPads fall in two different stage-3 then the differential-to-single-ended conversion will occur in stage-4 and is then injected into the WaferNet. The stage-3 mux-demux allows propagating or not analog signals from stages-2 to the inputs of stage-4. 
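The restriction of the stage-2 converter to diagonally located stage-1 multiplexers can be illustrated by enumerating the candidate pairs among the four children of one block. The sketch below is only a counting aid under the assumption that the four children sit in a 2 × 2 arrangement; the names `CHILDREN` and `is_diagonal` are hypothetical and do not come from the design.

```python
from itertools import combinations

# Positions (row, col) of the four child blocks feeding one converter,
# assumed to be laid out as a 2 x 2 arrangement.
CHILDREN = [(0, 0), (0, 1), (1, 0), (1, 1)]


def is_diagonal(a, b):
    """Two children are diagonal when they differ in both row and column."""
    return a[0] != b[0] and a[1] != b[1]


all_pairs = list(combinations(CHILDREN, 2))
diagonal_pairs = [pair for pair in all_pairs if is_diagonal(*pair)]

if __name__ == "__main__":
    print("all pairs:", len(all_pairs))
    print("diagonal pairs:", len(diagonal_pairs))
```

A converter limited to diagonal children therefore has two candidate pairs, whereas one that may pick any two of its four children has six. These counts are consistent with the per-block multipliers (× 2 and × 6) listed for the stage-2 and stage-3 converters in Table 7.2.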
The ability of stage-3 to propagate the signal from stage-2 to four stage-4 blocks allows the configurable network to define adjacent interface units with overlap in area. Thus, the differential interface unit can slide with a step of half a window, and multiple input trees can be configured in shifted overlapping windows. In such a case, any two pins can be made to fall inside the area of one interface. Unlike its counterpart in stage-2, the configurable differential-to-single-ended converter of stage-3 can select all six possible pairs ( $^4C_2$ ) from its 4 previous stages. Finally, stage-4 has only the configurable differential-to-single-ended converter (Fig. 7.9(b)) without any analog multiplexers.

Figure 7.8 Schematic of the analog multiplexers. (a) Analog multiplexer of stage-1.

Note that in general, two pins cannot fall within one stage-1 because each stage-1 consists of an array of $2 \times 2$ NanoPads and has an approximate dimension of $280 \,\mu\text{m} \times 280 \,\mu\text{m}$ [1], whereas conventional IC packages have a minimum pitch (distance between two adjacent pins) of $0.5 \,\text{mm}$ [99]. An array of $4 \times 4$ NanoPads has a dimension of $550 \,\mu\text{m} \times 550 \,\mu\text{m}$ [1] and each stage-2 covers an array of $4 \times 4$ NanoPads. Thus, two pins can fall within the area covered by one stage-2, one stage-3, or one stage-4, but not within one stage-1. This input H-tree can select any two NanoPads (or uIC pins) as long as the distance between the pins is between $0.5\,\mathrm{mm}$ and $1\,\mathrm{mm}$. Even though the differential interface area is $2\,\mathrm{mm} \times 2\,\mathrm{mm}$, it cannot select two pins separated by $2\,\mathrm{mm}$ in all possible scenarios. It can select two pins separated by a distance between $1\,\mathrm{mm}$ and $2\,\mathrm{mm}$ only in some cases.
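The coverage argument above can be checked with a small geometric sketch. It assumes NanoPads are addressed by integer grid coordinates, that one interface unit spans a 16 × 16 NanoPad window, and that overlapped units are offset by half a window (8 NanoPads) in each direction. The function names and the 64 × 64 grid size are hypothetical, chosen only for illustration.

```python
from itertools import product

WINDOW = 16          # NanoPads per side of one interface unit (16 x 16 array)
STEP = WINDOW // 2   # units slide by half a window, so neighbours overlap


def smallest_stage(p, q, origin=(0, 0)):
    """Lowest H-tree stage (1..4) of the unit at `origin` whose block holds both pads.

    Returns None when either pad lies outside the unit. A result of 1 is only
    geometric: in practice two package pins are too far apart to share a stage-1.
    """
    rel = [(x - origin[0], y - origin[1]) for x, y in (p, q)]
    if any(not (0 <= c < WINDOW) for pad in rel for c in pad):
        return None
    for stage in range(1, 5):
        side = 2 ** stage  # stage k spans a (2**k) x (2**k) block of NanoPads
        block_p = (rel[0][0] // side, rel[0][1] // side)
        block_q = (rel[1][0] // side, rel[1][1] // side)
        if block_p == block_q:
            return stage
    return None


def covering_units(p, q, grid=64, step=STEP):
    """Origins of all interface units (placed every `step` pads) containing both pads."""
    origins = product(range(0, grid - WINDOW + 1, step), repeat=2)
    return [o for o in origins if smallest_stage(p, q, o) is not None]


if __name__ == "__main__":
    print(smallest_stage((0, 0), (3, 3)))                  # conversion stage inside one unit
    print(covering_units((12, 0), (20, 0)))                # overlapped tiling
    print(covering_units((12, 0), (20, 0), step=WINDOW))   # static tiling, no overlap
    print(covering_units((7, 0), (16, 0)))                 # pads 9 positions apart
```

In this model, two pads that are 8 positions apart always share at least one overlapped unit, while the pair 9 positions apart in the last call straddles every window boundary and is covered by none. This mirrors the guaranteed $1\,\mathrm{mm}$ range and the position-dependent behaviour between $1\,\mathrm{mm}$ and $2\,\mathrm{mm}$ described in the text.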
Indeed, if two such pins are positioned on two sides of any overlap, then no single differential interface unit can be configured to include both pins or NanoPads. Positions A and C of Fig. 7.7 can correspond to such a scenario, where the distance between the two positions can be less than $2\,\mathrm{mm}$ but the pins still do not fall under the area of one single interface. On the other hand, positions B and D of Fig. 7.7 fall under one single interface even though the distance between positions B and D is the same as that between positions A and C.

Figure 7.9 Configurable differential-to-single-ended converter (the multiplexers in these two figures are digital multiplexers). (a) Differential-to-single-ended converter of stage-2. (b) Differential-to-single-ended converter of stage-3 and stage-4.

At this point, we would like to point out that even though directly applying the length matching technique might seem an easier solution than the proposed H-tree based network, we still would have needed 256-to-1 multiplexers at the center of each interface unit. Such large fan-in multiplexers are usually built as hierarchical trees of smaller fan-in multiplexers, which is a standard technique for managing large fan-in. Thus, using an H-tree physical layout for such large fan-in multiplexers did not add extra complexity. Another important point is that laying out the 4-to-1 multiplexers in an H-tree structure allowed us to have overlap (Fig. 7.7) between adjacent interface units by allowing the output of the stage-3 4-to-4 mux-demux to be connected to *four* differential-to-single-ended converters (stage-4) instead of *one*, while still maintaining hierarchical trace matching in all four cases. Thus, applying length matching alone would not have allowed us to have overlap and hierarchical trace matching simultaneously.

#### 7.3.4 Output Differential Network

An output stage is required in the differential interface to provide complementary signals to the uIC.
After the single-ended signal is propagated through the WaferNet, single-ended-to-differential conversion has to be done before transmitting the signals to the two NanoPads contacted by the two complementary uIC pins. The single-ended signal propagates along two paths routed to two output NanoPads through a configurable output network. The floor plan of the output network is similar to the input H-tree structure shown in Fig. 7.5, except that in this case the signal propagates in the opposite direction. Each stage of the output H-tree is configurable and can propagate or not a single-ended digital signal to the 4 subsequent stages. In order to decrease the network complexity and its power consumption, the configurable output H-tree is single-ended. The network stages are made with digital standard cells. One of the two single-ended signals, upon reaching the destination NanoPad that drives one side of the differential pair, is inverted based on the polarity of the connected uIC differential pin. Each NanoPad includes a 12 mA current source $M_{N1(N3)}$ , a switch to control the current source $M_{N2(N4)}$ , and an active load $M_{P1(P2)}$ shown in Fig. 7.10. The 12 mA current source induces an output voltage swing of 600 mV when switched ON.

Figure 7.10 Schematic of the interface output circuit in each NanoPad.

#### 7.4 Measured Results

A test-chip implementing the spatially configurable differential interface, shown in Fig. 7.11, was fabricated in a mature $0.18\,\mu m$ CMOS technology. In the test-chip, a 3-level H-tree was implemented; it occupies a silicon area of $520\,000\,\mu m^2$ . Post-layout simulations showed that the interface can support data rates of up to $2.5\,\mathrm{Gbps}$ under the typical process corner (Table 7.1). Isolated instances of the stage-1 and stage-2 multiplexers were implemented in the test chip to measure their characteristics.
The DC transfer characteristics of the stage-1, stage-2, and stage-1 & 2 (in cascade) analog multiplexers are shown in Fig. 7.12. The prototype test chip was designed for an input common-mode of 1.0-2.0 V, and close linearity in this range is shown in the DC transfer characteristics, which is sufficient for differential signaling.

Figure 7.11 Test-chip. (a) Layout of the test chip [3]. (b) Micro-photograph of the test chip.

Table 7.1 Characteristics of the differential interface from post-layout simulation.

| Corner | wc<sup>a</sup> | typ<sup>b</sup> | bc<sup>c</sup> |
|--------------------------|------|------|------|
| Maximum data rate (Gbps) | 1.8 | 2.5 | 3.1 |
| Current (mA) | 22.1 | 28.7 | 33.4 |

- a. Worst case: Temp.=125°C, slow nMOS, slow pMOS
- b. Typical: Temp.=25°C, typical nMOS, typical pMOS
- c. Best case: Temp.=-40°C, fast nMOS, fast pMOS

Figure 7.12 Measured DC transfer characteristics ( $V_{DD} = 3.3 \text{ V}$ ) of the analog multiplexers shown in Fig. 7.8.

Table 7.2 Areas of the differential interface and their stages for one four-stage interface unit.

| | Stage | Silicon area per unit (µm²) | Number of instances | Required silicon area (µm²) |
|---|---|---|---|---|
| Mux-demux in each stage | Stage-1 | $900$ ($30\,\mu\text{m} \times 30\,\mu\text{m}$) | $64$ ($8 \times 8$) | 57 600 |
| | Stage-2 | $1750$ ($50\,\mu\text{m} \times 35\,\mu\text{m}$) | $16$ ($4 \times 4$) | 28 000 |
| | Stage-3 | $1225$ ($35\,\mu\text{m} \times 35\,\mu\text{m}$) | $4$ ($2 \times 2$) | 4 900 |
| | Stage-4 | N/A | | |
| Differential-to-single-ended converter in each stage | Stage-1 | N/A | | |
| | Stage-2 | $1280$ ($40\,\mu\text{m} \times 32\,\mu\text{m}$) | $32$ ($(4 \times 4) \times 2$) | 40 960 |
| | Stage-3 | $1280$ ($40\,\mu\text{m} \times 32\,\mu\text{m}$) | $24$ ($(2 \times 2) \times 6$) | 30 720 |
| | Stage-4 | $1280$ ($40\,\mu\text{m} \times 32\,\mu\text{m}$) | $4$ ($(2 \times 2) \times 1$) | 5 120 |
| Total | | | | 167 300 |
| Total area covered by one interface unit | | | | 5 017 600 |
| Area usage | | | | 3.3% |

When integrated with a full-fledged WaferIC, the proposed differential architecture would take less than 3.3% of the total area, with a silicon area of $5\,017\,600\,\mu\text{m}^2$ for one interface unit (Table 7.2). The silicon area was calculated by assuming continuous floor planning of adjacent "overlapped interface units" with four stage-4 blocks. A real-time Agilent Infiniium 90000A oscilloscope, having a bandwidth of $12\,\text{GHz}$ , was used for the test-chip measurements. 1169A active probes (also with $12\,\text{GHz}$ bandwidth) were used; they can be placed directly at the chip output to avoid PCB parasitics. Measurements on the test-chip also showed that the differential interface can support data rates of up to $2.5\,\text{Gbps}$ . Eye diagrams measured at the output NanoPad for $1.25\,\text{Gbps}$ , $2.0\,\text{Gbps}$ , and $2.5\,\text{Gbps}$ are shown in Fig. 7.13.

Figure 7.13 Measured eye diagrams at different data rates from test-chip. (a) 1.25 Gbps (TimeBase=100 ps/unit, Voltage Scale=70 mV/unit). (b) 2.0 Gbps (TimeBase= $100 \, \mathrm{ps/unit}$ ). (c) 2.5 Gbps (TimeBase= $80 \, \mathrm{ps/unit}$ , Voltage Scale= $70 \, \mathrm{mV/unit}$ ).

# 7.5 Conclusion

This paper presented a complete and detailed circuit-level implementation and the first experimental validation of a previously proposed spatially configurable differential interfacing architecture to support CML signalling in the WaferBoard. Complementary pins of differential signalling can be detected over a maximum area of $2\,\mathrm{mm}\times2\,\mathrm{mm}$ ( $1\,\mathrm{mm}\times1\,\mathrm{mm}$ in the worst case scenario). The interface utilizes configurable H-tree structures for balanced input and output differential signal propagation. It also includes a configurable $50\,\Omega$ load that is compliant with standard CML interfaces. The entire interface unit consumes less than $3.3\,\%$ of the total silicon area when implemented with the WaferIC. It can support data rates of up to 2.5 Gbps with 200 mV of voltage swing under typical conditions compatible with PCIe specifications. Finally, the concept explored in this paper could be applied to any integrated circuit requiring spatial reconfiguration of a differential interface, such as FPGAs and CPLDs.

#### Acknowledgments

This research was partly supported by Gestion Technocap, the Natural Sciences and Engineering Research Council of Canada and by the Mitacs program. The authors would like to acknowledge CMC Microsystems for the products and services that facilitated this research. This work was partly done while one of the authors was a guest professor at COMELEC-Telecom ParisTech.

# CHAPTER 8 PASS-TRANSISTOR MULTIPLEXER BASED DIFFERENTIAL INPUT STAGE

This chapter presents a pass transistor multiplexer based differential input stage for the spatially configurable differential interface elaborated in Chapter 7. This differential input stage was also developed according to CML differential signalling specifications. The input stage is implementable in a standard CMOS process and is fully compatible with the WaferBoard platform. Sec. 8.1 describes the limitations of the unity-gain buffer multiplexer based input stage that was elaborated in Chapter 7. Sec. 8.2 describes the pass transistor multiplexer based input stage and Sec. 8.3 presents the simulation results that validate the concept. Finally, Sec. 8.4 summarizes the contribution.

# 8.1 Differential Interface Based on Unity-Gain Buffer Multiplexer

Unity-gain buffer based multiplexers were used in the input stage of the spatially reconfigurable differential interface in Chapter 7 and [72].
The multiplexers were cascaded and connected in an H-tree structure with regular and symmetrical metal interconnections to balance all possible paths from the CIOs/NanoPads to the differential-to-single-ended converter. The rationale behind using unity-gain buffer based multiplexer in that differential interface was to avoid signal attenuation as it propagates through each successive stage to the differential-to-single ended converter. Each unity-gain buffer based multiplexer consisted of wide nMOS and pMOS that occupied large silicon area. The entire signal propagation from the CIOs/NanoPads to the input of the differential-to-single-ended converter occurred in terms of voltage. Thus, voltage mode sensing was used in the differential-to-single ended converter. A differential pair with single-ended output was used as the differential-to-single ended converter in Chapter 7 and [72], where the propagated differential input voltage was applied at the gates of nMOS/pMOS. However, voltage-mode sensing entailed a drawback. The active reconfigurable platform used thick-oxide I/O FETs for the configurable I/O so that it can support ICs operating on a wide range of power supply voltages and the embedded FPIN is to be implemented with thin-oxide FETs (operating on a lower power supply) to leverage their high speed. The differential-to-single ended converter operated on the same power supply as the FPIN. It could not have supported much higher input common-mode voltage than the supply voltage of the differential-to-single ended converter (or FPIN) because the voltage input was applied at the gates of transistors. Using a current-mode sensing could allow to support higher common-mode input voltage than the power supply of the FPIN, as will be proposed in this chapter. 
# 8.2 Differential Input Stage based on Pass-Transistor Multiplexer An input stage based on pass transistor based multiplexers is described in this section that can be integrated with the remaining parts of the entire differential interface architecture (shown in Fig. 7.4). Similar to the unity-gain buffer multiplexer based input stage (of Chapter 7 and [72]), this input stage utilizes an H-tree structure with multiple hierarchical levels to match the differential signal paths from the uIC pins (NanoPads/CIOs) to the input of the differential-to-single-ended converter for all possible locations of the NanoPads/CIOs. The H-tree can be configured to select any two NanoPads/CIOs, belonging to the declared differential interface unit, to receive the differential signals from uICs and propagate the signals to the differential-to-single-ended converter. The pass transistor based multiplexers act as "low impedance paths" for the signals from the uICs (NanoPads/CIOs) to the inputs of the differential-to-single-ended converter. # 8.2.1 H-Tree Input Differential Network The input stage is to have two H-trees that can propagate signals from any two Nano-Pads/CIOs to the inputs of the differential-to-single-ended converter. However, it is not required to have two sets of multiplexers in each stage. Two pins cannot fall within one stage-1 because each stage-1 consists of an array of $2 \times 2$ CIOs and has an approximate dimension of $280 \,\mu\text{m} \times 280 \,\mu\text{m}$ [1], whereas conventional IC packages have a minimum pitch (distance between two adjacent pins) of $0.5 \,\text{mm}$ [99]. An array of $4 \times 4$ CIOs has a dimension of $560 \,\mu\text{m} \times 560 \,\mu\text{m}$ [1] and each stage-2 covers an array of $4 \times 4$ CIOs. Thus, two pins can indeed fall within the area covered by one stage-2 (or by one stage-3 or one stage-4). Thus, stage-2, 3 & 4 have two multiplexers as shown in Fig. 8.1(a). 
Each stage of the 4-level H-tree consists of the pass transistor based 4-to-1 analog multiplexer shown in Fig. 8.1(c), which propagates (or not) the signal from the previous stage (or CIOs) to the next stage, as shown in Fig. 8.1(b). The output of the stage-4 multiplexer goes to the input of the differential-to-single-ended converter. It should be noted that stage-1, 2, 3 & 4 of the H-tree shown in Fig. 7.5(a) of Chapter 7 had only one multiplexer and hence Fig. 8.1(a) is slightly different from Fig. 7.5(a).

Figure 8.1 Architecture and floor plan of the pass transistor multiplexer based differential input stage.

From the perspective of a uIC transmitting a differential signal, the terminal nodes are the contacted NanoPads/CIOs. The full-fledged differential interface that was proposed in Chapter 7 and [72] includes an embedded configurable $50 \Omega$ pull-up pMOS in each NanoPad/CIO to meet the pull-up resistor requirement of the external CML buffer. The proposed pass transistor multiplexer based input stage (elaborated in this chapter) is compatible with such a pull-up pMOS.

As uICs could be placed anywhere over the WaferIC, static tiling (without overlap) of the differential interface unit cannot ensure that the two complementary pins will always "fall" within the area of one differential interface unit. Thus, a continuous floor plan with "overlap" between adjacent interface units, as shown in Fig. 7.7, was also utilized in the proposed input stage based on pass transistor multiplexers.

Each stage of the pass transistor based multiplexers behaves as an RC filter and does not provide any voltage or current amplification, as shown in Fig. 8.2. Thus, the signal is attenuated as it propagates from one stage to the next.
However, acting as a passive element, each stage of the pass transistor based multiplexers does not add any significant offset to the signal as it propagates, despite mismatch of the pass transistors (verified by the extensive Monte Carlo simulation results presented in Sec. 8.3). Thus, the two differential signals at the outputs of the stage-4 multiplexers remain almost 180° out of phase. A differential amplifier with differential output can be used at stage-4 to amplify the attenuated differential signal before it is propagated to the differential-to-single-ended converter or the FPIN. As stage-4 has the lowest density among all stages, having an extra amplifier in stage-4 negligibly affects the total silicon area.

Figure 8.2 Pass-transistor based model.

Figure 8.3 Differential amplifier with differential output.

A differential amplifier with differential output, used in stage-4, is shown in Fig. 8.3. It amplifies the attenuated signal propagated through the pass transistor multiplexers. Fig. 8.4 shows the transistor-level schematic of the pass transistor multiplexer based input stage and the differential amplifier with differential output. Each unity-gain buffer based multiplexer of Chapter 7 consisted of wide nMOS and pMOS transistors that occupied a large silicon area. Compared to the unity-gain buffer based multiplexers, the pass transistor based multiplexers occupy about $1/20$th of the silicon area (a comparison based on the available layout is presented in Sec. 8.3). Moreover, the cumulative attenuation can be compensated by an amplifier in stage-4. The output of this amplifier in stage-4 can be propagated to the differential-to-single-ended converter as was done in Fig. 7.5(a) of Chapter 7.
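As a rough illustration of the RC-filter behaviour described above, a single pass-transistor stage can be approximated by a first-order low-pass section. The symbols $R_{\text{on}}$ (on-resistance of the conducting pass transistor) and $C_{\text{L}}$ (lumped capacitance of the interconnect and of the OFF transistors at the stage output) are assumptions introduced here for illustration, not values taken from the layout:

$$
H(j\omega) = \frac{1}{1 + j\omega R_{\text{on}} C_{\text{L}}}, \qquad
|H(j\omega)| = \frac{1}{\sqrt{1 + (\omega R_{\text{on}} C_{\text{L}})^2}}, \qquad
f_{-3\,\text{dB}} = \frac{1}{2\pi R_{\text{on}} C_{\text{L}}}
$$

Because the stages are not buffered, the sections load each other, so the response of the four-stage path is not simply the product of four such terms. The expression only indicates why the attenuation grows with frequency and with each added stage, which is the loss that the stage-4 amplifier is meant to compensate.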
# 8.2.2 Differential-to-Single Ended Converter

As pass transistor based multiplexers act as a "low impedance path" for the signals from the uICs (NanoPads/CIOs), current-mode sensing can be utilized by the differential-to-single-ended converter. A current-mode sensing differential-to-single-ended converter is shown in Fig. 8.5. The supply voltage of the differential-to-single-ended converter is the same as that of the embedded FPIN, i.e., the n-well of $M_{3,4}$ is tied to the power supply of the embedded FPIN. $M_{3,4}$ act as common-gate amplifiers to provide a low input impedance and a high output impedance. $M_{1,2}$ act as active current mirrors to convert the differential input current into a single-ended output voltage. Current-mode sensing allows the differential-to-single-ended converter to support a higher and wider common-mode voltage at its input (source of $M_{3,4}$) compared to a voltage-mode sensing differential-to-single-ended converter such as a differential pair with single-ended output (where the input is applied at the gates of nMOS/pMOS transistors).

Figure 8.4 Schematic of the differential input stage and differential amplifier with differential output in stage-4.

Figure 8.5 Proposed differential-to-single ended converter.

Fig. 8.6 shows the transistor-level schematic of the input stage with the differential-to-single-ended converter at stage-4. It should be noted that the input stage shown in Fig. 8.6 does not utilize any differential amplifier with differential output (as shown in Fig. 8.3). The output of the stage-4 pass transistor based multiplexer is *directly* applied as input to the differential-to-single-ended converter. The differential amplifier with differential output (shown in Fig. 8.3) receives its input at the gates of $M_{1,2}$ and cannot by itself support an input common-mode voltage much higher than the supply voltage due to reliability issues.
If such a differential amplifier were used in Fig. 8.6, it would defeat the ability of the current-mode sensing differential-to-single-ended converter to support a higher input common-mode voltage.

Figure 8.6 Schematic of the differential input stage with the differential-to-single ended converter at Stage-4.

#### 8.3 Simulation Results

The pass transistor multiplexer based input stage was laid out in IBM 0.13 µm CMOS technology. The pass transistor multiplexers were designed with thick-oxide 3.3 V I/O FETs. The differential-to-single-ended converter was designed with thin-oxide 1.2 V FETs. The layout of the pass transistor multiplexer is shown in Fig. 8.7(a). The layouts of the differential amplifier with differential output and of the differential-to-single-ended converter are shown in Fig. 8.7(b) and Fig. 8.7(c) respectively. The post-layout extraction was extensively simulated to validate the feasibility and robustness of the proposed input stage. Monte Carlo transient simulations were performed to investigate the effects of transistor and interconnect parasitic variations (mismatch and process) on the robustness of the input stage and the differential-to-single-ended conversion. The silicon area of the input stage is summarized in Table 8.1.

### 8.3.1 Input stage with differential amplifier and differential output

The input stage of Fig. 8.4 was laid out. Each multiplexer stage was physically placed at an appropriate distance so that the input stage can support two differential pins that are 2 mm apart. Long metal interconnects were used to include the effect of their parasitics. The post-layout extraction was simulated to characterize the input stage. OFF nMOS transistors are marked in grey in Fig. 8.4. Post-layout simulation included the OFF nMOS transistors to account for the effect of their parasitic capacitances. As the focus of this chapter is on the feasibility of the pass transistor multiplexer based input stage, an ideal 50 $\Omega$ resistance was used in simulation to represent the external uIC CML output buffers.

Figure 8.7 Layout. (a) Pass-transistor based 4-to-1 multiplexers (schematic in Fig. 8.1(c)). (b) Differential amplifier with differential output (schematic in Fig. 8.3(b)). (c) Differential-to-single ended converter (schematic in Fig. 8.5).

Table 8.1 Silicon area of the pass transistor based input stage.

| Block | Stage | Area/unit ($\mu m^2$) | Number of instances/unit | Required area ($\mu m^2$) |
|---|---|---|---|---|
| Number of Mux in each stage (schematic in Fig. 8.1(c)) | Stage-1 | 66 | 64 | 4224 |
| | Stage-2 | 66 | 32 ($16 \times 2$) | 2112 |
| | Stage-3 | 66 | 8 ($4 \times 2$) | 528 |
| | Stage-4 | 66 | 8 ($4 \times 2$) | 528 |
| Differential-to-single ended converter (schematic in Fig. 8.5) | | 65 | 4 | 260 |
| Fully differential amplifier (schematic in Fig. 8.3) | | 35 | 4 | 140 |
| Total silicon area of *one* four-stage interface unit | | | | 7792 |
| Total covered silicon area | | | | |
| Silicon area usage in the full-fledged wafer-scale integrated circuit (WaferIC) | | | | |

The common-mode voltage applied at the input of the stage-1 multiplexer appears at the output of the stage-4 multiplexer. That voltage is applied at the gates of the nMOS transistors (M<sub>1,2</sub> in Fig. 8.4) of the differential amplifier. Thus, for reliability reasons, the common-mode voltage level cannot be much higher than the recommended power supply voltage of the differential amplifier. An input common-mode voltage of 1.0 V (applied at the input of the stage-1 multiplexer) was used in the post-layout simulation of the input stage utilizing the differential amplifier with differential output.

Fig. 8.8 shows the voltage waveforms from Monte Carlo **mismatch variation** simulations (1000 runs) of the output of the stage-4 pass transistor multiplexers (input of the differential amplifier with differential output of stage-4). A frequency of 2 GHz was used. As expected, there is *almost* no variation in the waveforms of the output of stage-4 because the pass transistor based multiplexers operate in the triode region.

Figure 8.8 **2 GHz** output ( $v_{\text{OUT4+}}$ and $v_{\text{OUT4-}}$ in Fig. 8.4) of stage-4 multiplexer from Monte Carlo **mismatch variation** (typical-typical) simulation. Input ( $v_{\text{IN-}}$ and $v_{\text{IN+}}$ in Fig. 8.4) common-mode voltage=1.0 V.

Fig. 8.9 shows the voltage waveforms from Monte Carlo **mismatch variation** simulations (1000 runs) of the output of the differential amplifier with differential output at stage-4. These waveforms show that the output of the differential amplifier with differential output at stage-4 maintains sufficient symmetry, with a reasonable 180° phase difference between the two differential signals.

Figure 8.9 **2 GHz** output ( $v_{\text{OUT+}}$ and $v_{\text{OUT-}}$ in Fig. 8.4) of stage-4 fully differential amplifier from Monte Carlo **mismatch variation** (typical-typical) simulation. Input ( $v_{\text{IN-}}$ and $v_{\text{IN+}}$ in Fig. 8.4) common-mode voltage=1.0 V.

Fig. 8.9 shows a jitter of $\approx 60 \,\mathrm{ps}$ due to **mismatch variations** of the transistors of the differential amplifier with differential output, which lies within the limit of practical standards. For example, the maximum allowed jitter in the PCIe transmission protocol is 30% of the bit length [67], which represents an allowed jitter of 120 ps for a data rate of 2.5 Gbps.
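The jitter budget quoted above follows from simple arithmetic, sketched below. The function names are hypothetical; the 30% limit, the 2.5 Gbps data rate and the ≈60 ps simulated jitter are the figures given in the text.

```python
def unit_interval_ps(data_rate_gbps: float) -> float:
    """Bit length (unit interval) in picoseconds for a data rate in Gbps."""
    return 1000.0 / data_rate_gbps


def allowed_jitter_ps(data_rate_gbps: float, fraction: float = 0.30) -> float:
    """Maximum jitter when the budget is a fraction of the bit length."""
    return fraction * unit_interval_ps(data_rate_gbps)


def jitter_margin_ps(simulated_jitter_ps: float, data_rate_gbps: float) -> float:
    """Remaining margin between the budget and the simulated jitter."""
    return allowed_jitter_ps(data_rate_gbps) - simulated_jitter_ps


if __name__ == "__main__":
    rate_gbps = 2.5
    print("bit length (ps):", unit_interval_ps(rate_gbps))
    print("allowed jitter (ps):", allowed_jitter_ps(rate_gbps))
    print("margin for 60 ps of jitter (ps):", jitter_margin_ps(60.0, rate_gbps))
```

At 2.5 Gbps the bit length is 400 ps, so the 30% budget is 120 ps and the simulated jitter of about 60 ps uses roughly half of it.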
The output of the differential amplifier with differential output of stage-4 can be propagated to a voltage-mode sensing differential-to-single-ended converter such as a differential pair with single-ended output, where the input is applied at the gates of nMOS/pMOS transistors. Such a differential-to-single-ended converter was used in the differential interface reported in Chapter 7 and [72].

Process variation is a global variation that affects both propagation paths of differential signalling and thus affects both signals equally (in the absence of other variations). Fig. 8.10 shows the voltage waveforms from Monte Carlo **process variation** simulations (1000 runs) at the output of the differential amplifier with differential output of stage-4. These waveforms show that the basic common-mode feedback (CMFB) used in Fig. 8.3 is sufficient to keep the common-mode output level of the differential amplifier with differential output of stage-4 within acceptable limits in spite of process variations. Thus, the output of the differential amplifier with differential output of stage-4 can be reliably used to drive a voltage-mode sensing differential-to-single-ended converter such as a differential pair with single-ended output.

Figure 8.10 **2 GHz** output ( $v_{\text{OUT+}}$ and $v_{\text{OUT-}}$ in Fig. 8.4) of Stage-4 fully differential amplifier from Monte Carlo **process variation** (typical-typical) simulation. Input ( $v_{\text{IN-}}$ and $v_{\text{IN+}}$ in Fig. 8.4) common-mode voltage=1.0 V.

# 8.3.2 Input stage with current mode differential-to-single-ended converter

The input stage of Fig. 8.6 was laid out. In this layout, each multiplexer stage was physically placed at an appropriate distance so that the input stage can support two differential pins that are less than 2 mm apart. Long metal interconnects were also used to include the effects of their parasitics, and the post-layout extraction was simulated to characterize the input stage. In this input stage (Fig.
8.6), the outputs of the stage-4 pass transistor multiplexers are used directly as the input of the differential-to-single-ended converter. Fig. 8.11 and Fig. 8.12 show the output voltage waveforms ( $v_{\text{OUT}}$ in Fig. 8.6) of the differential-to-single-ended converter from Monte Carlo **mismatch variation** simulations (1000 runs) for a 2 GHz signal when the differential input signal ( $v_{\text{IN-}}$ and $v_{\text{IN+}}$ in Fig. 8.6) is subject to a common-mode voltage of 1.2 V and 1.6 V respectively. For 2 GHz differential input signals, the differential-to-single-ended converter can make the correct *conversion/detection* as long as the input common-mode voltage is in the range of 1.2-1.6 V. The differential-to-single-ended converter exhibited an input offset current due to mismatch in the Monte Carlo simulations. However, the differential current swing of 8 mA associated with CML is sufficient to overcome that offset for correct conversion/detection.

Figure 8.11 **2 GHz** output ( $v_{\text{OUT}}$ in Fig. 8.6) of the differential-to-single-ended converter from Monte Carlo **mismatch variation** (typical-typical) simulation. Input ( $v_{\text{IN-}}$ and $v_{\text{IN+}}$ in Fig. 8.6) common-mode voltage=1.2 V.

Figure 8.12 **2 GHz** output ( $v_{\text{OUT}}$ in Fig. 8.6) of the differential-to-single-ended converter from Monte Carlo **mismatch variation** (typical-typical) simulation. Input ( $v_{\text{IN-}}$ and $v_{\text{IN+}}$ in Fig. 8.6) common-mode voltage=1.6 V.

Figure 8.13 **1 GHz** output ( $v_{\text{OUT}}$ in Fig. 8.6) of the differential-to-single-ended converter from Monte Carlo **mismatch variation** (typical-typical) simulation. Input ( $v_{\text{IN-}}$ and $v_{\text{IN+}}$ in Fig. 8.6) common-mode voltage=1.2 V.

Figure 8.14 **1 GHz** output ( $v_{\text{OUT}}$ in Fig. 8.6) of the differential-to-single-ended converter from Monte Carlo **mismatch variation** (typical-typical) simulation. Input ( $v_{\text{IN-}}$ and $v_{\text{IN+}}$ in Fig.
8.6) common-mode voltage=2.0 V.

When the frequency of the differential input signal is lower, the differential input current has sufficient time to overcome the inherent input offset of the differential-to-single-ended converter (Fig. 8.5) and make the correct conversion/detection. Thus, for lower input frequencies, the pass-transistor based multiplexers and the differential-to-single-ended converter can support a wider range of input common-mode voltage. Fig. 8.13 and Fig. 8.14 show the output voltage waveforms ( $v_{\text{OUT}}$ in Fig. 8.6) of the differential-to-single-ended converter from Monte Carlo **mismatch variation** simulations (1000 runs) for a 1 GHz signal when the differential input signal ( $v_{\text{IN-}}$ and $v_{\text{IN+}}$ in Fig. 8.6) is subject to a common-mode voltage of 1.2 V and 2.0 V respectively. For 1 GHz differential input signals, the differential-to-single-ended converter can make the correct conversion/detection as long as the input common-mode voltage is in the range of 1.2-2.0 V.

# 8.3.3 Comparison between the input stage based on unity-gain buffer multiplexer and the input stage based on pass-transistor multiplexer

The input stage based on unity-gain buffer multiplexers, along with the entire differential interface elaborated in Chapter 7, was fabricated in a test-chip in TSMC 0.18 µm CMOS technology. However, the unity-gain buffer multiplexers were laid out in thick-oxide 0.35 µm nMOS/pMOS and the differential-to-single-ended converter (of Chapter 7) was laid out in 0.18 µm nMOS/pMOS of TSMC's CMOS process. On the other hand, the input stage based on pass-transistor multiplexers elaborated in this chapter was laid out in IBM 0.13 µm CMOS technology and post-layout simulation was used to validate the concept. The pass-transistor multiplexers (Fig. 8.7(a)) were laid out in thick-oxide 0.35 µm nMOS/pMOS of IBM's CMOS process. The fully differential amplifier (Fig. 8.7(b)) and the differential-to-single-ended converter (Fig.
8.7(c)) were laid out in 0.13 µm thin-oxide nMOS/pMOS of IBM's CMOS process.

The input stage based on unity-gain buffer multiplexers was simulated in IBM 0.13 µm CMOS technology to characterize its performance so that a *fair* and meaningful comparison between these two input stages can be made. The unity-gain buffer based analog multiplexers shown in Fig. 8.15(a)-8.15(c) were simulated in IBM 0.13 µm CMOS technology. Three such unity-gain buffer based analog multiplexers (Fig. 8.15(a)-8.15(c)) were cascaded as shown in Fig. 8.16 to simulate the entire signal propagation path from the CIO/NanoPads to the input of the differential-to-single-ended converter of stage-4 (as was done through test-chip measurement in Chapter 7). It should be noted that stage-4 of the input stage based on pass-transistor multiplexers comprised pass-transistor multiplexers and a differential-to-single-ended converter. On the other hand, stage-4 of the input stage based on unity-gain buffer multiplexers comprised only a differential-to-single-ended converter (in Chapter 7).

Figure 8.15 Schematic of the analog multiplexers simulated in IBM $0.13\,\mu m$ CMOS technology. (a) Analog multiplexer of stage-1.

Figure 8.16 The signal path consisting of 3 multiplexer stages that was simulated in IBM 0.13 µm CMOS technology.

The schematics of these three multiplexers (shown in Fig. 8.15(a)-8.15(c)) are the same as those of the three multiplexers shown in Fig. 7.8(a)-7.8(c), but some of the transistors have different sizings because IBM 0.13 µm CMOS technology was used. After some trial-and-error simulations (in IBM CMOS technology) with different transistor sizings, the sizings of Fig. 8.15(a)-8.15(c) appeared to give the best signal integrity as the signal propagated from the CIOs/NanoPads to the input of the differential-to-single-ended converter of stage-4.

Figure 8.17 **2 GHz** output ( $v_{\text{OUT+/-}}$ in Fig. 8.16) of the stage-3 multiplexer from Monte Carlo **mismatch variation** (typical-typical) simulation. Input ( $v_{\text{IN+/-}}$ in Fig. 8.16) common-mode voltage=1.6 V and voltage swing $V_{\text{P-P}}$ =800 mV was used.

Each unity-gain buffer based analog multiplexer added random offsets to the propagated differential signals (in Fig. 8.16) due to mismatch of the transistors. Fig. 8.17 shows the output voltage waveforms ( $v_{\text{OUT+/-}}$ in Fig. 8.16) of the stage-3 multiplexer from Monte Carlo **mismatch variation** simulations (1000 runs) for a 2 GHz input frequency. An input ( $v_{\text{IN+/-}}$ in Fig. 8.16) voltage swing of $V_{\text{P-P}}$ =800 mV, which meets the CML standards [65], was used in the simulation. Comparing Fig. 8.9 and Fig. 8.17, it can be seen that a similar amount of offset (130 mV and 100 mV respectively) appears in the signal waveforms that are used as the input of the differential-to-single-ended converter in the respective input stages. Thus, it can be concluded that both input stages offer approximately similar performance in terms of signal integrity from the CIO/NanoPads to the input of the differential-to-single-ended converter.

## 8.4 Summary of Contribution

This chapter presented a spatially configurable differential input stage that can be integrated in the reconfigurable differential interfacing architecture elaborated in Chapter 7. The reconfigurable differential interfacing architecture elaborated in Chapter 7 utilized unity-gain buffer based multiplexers in its input stage. The input stage elaborated in this chapter utilized pass transistor multiplexers. A comparison is made in Table 8.2 between the input stage based on unity-gain buffer multiplexers and the input stage based on pass-transistor multiplexers. The input stage based on pass-transistor multiplexers was laid out using IBM 0.13 µm CMOS technology and post-layout simulation was used to validate the feasibility of the concept.
The pass transistor multiplexer based input stage can support data rates of up to 2 Gbps while consuming **significantly** less silicon area (the smaller circuit uses only 5% of the area occupied by the larger) compared to the input stage based on unity-gain buffer based multiplexers.

Table 8.2 Comparison between the input stage based on unity-gain buffer multiplexer and the input stage based on pass-transistor multiplexer.

| Parameter | Differential input stage based on unity-gain buffer multiplexer | Differential input stage based on pass-transistor multiplexer, with the differential amplifier with differential output shown in Fig. 8.4 | Differential input stage based on pass-transistor multiplexer, with the differential-to-single-ended converter shown in Fig. 8.6 | Comment |
|---|---|---|---|---|
| Total area of one four-stage input stage ($\mu m^2$) | 167 300 | 7792 | | Compared to unity-gain buffer based multiplexers, pass transistor based multiplexers occupied 1/20th of the silicon area footprint. |
| Speed (Gbps) | 2.5 | 2 | 1 | |
| Input common-mode range | Limited | Limited | Wider.<sup>a</sup> 1.2-2.0 V for 1 GHz. 1.2-1.6 V for 2 GHz. | Compared to unity-gain buffer based multiplexers, pass transistor based multiplexers can support a wider input common-mode range. |

a. The power supply of the differential-to-single-ended converter (or FPIN) was $1.2\,\mathrm{V}$.
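The area figures in Table 8.1 and Table 8.2 can be cross-checked with a few lines of arithmetic. The sketch below only re-adds the per-block areas and instance counts listed in Table 8.1 and compares the sum with the unity-gain buffer figure from Table 8.2; the variable names are hypothetical.

```python
# Per-block figures from Table 8.1: (area per unit in um^2, instances per unit).
BLOCKS = {
    "stage-1 mux": (66, 64),
    "stage-2 mux": (66, 32),
    "stage-3 mux": (66, 8),
    "stage-4 mux": (66, 8),
    "differential-to-single-ended converter": (65, 4),
    "fully differential amplifier": (35, 4),
}

# Total area of the unity-gain buffer based input stage, from Table 8.2 (um^2).
UNITY_GAIN_TOTAL_UM2 = 167_300


def total_area_um2(blocks: dict) -> int:
    """Sum of area-per-unit times instance count over all blocks."""
    return sum(area * count for area, count in blocks.values())


PASS_TRANSISTOR_TOTAL_UM2 = total_area_um2(BLOCKS)
AREA_RATIO = PASS_TRANSISTOR_TOTAL_UM2 / UNITY_GAIN_TOTAL_UM2

if __name__ == "__main__":
    print("pass-transistor input stage (um^2):", PASS_TRANSISTOR_TOTAL_UM2)
    print("fraction of unity-gain buffer area: {:.1%}".format(AREA_RATIO))
```

The sum reproduces the 7792 µm² entry of Table 8.2, and the ratio of about 4.7% is consistent with the "5%" and "1/20th" figures quoted in the text.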
#### CHAPTER 9 GENERAL DISCUSSION

The aim of this research was to broaden the application domains of a field programmable interconnection network (FPIN) based prototyping and emulation platform by supporting open-drain bi-directional signals, analog signals and differential signals. Three interface circuits have been elaborated and developed to that end in this thesis. These three interface circuits can support reconfigurable routing of open-drain bi-directional, analog and differential signals through a uni-directional digital FPIN. The need for such interface circuits was originally conceived in the context of the WaferBoard, a system prototyping platform.

#### 9.1 Bi-Directional Interface

An open-drain interface circuit has been developed that can support a bi-directional bus structure using a digital FPIN. A star interconnect topology with $\Theta(n^2)$ complexity and a dual-queue interconnect topology with $\Theta(n)$ complexity have been proposed, where $n$ is the number of interconnected interface units. A delay model has been developed for the dual-queue interconnect topology. The model can be used to determine the maximum number of interface units that can be interconnected to support a given communication speed. The proposed open-drain bi-directional interface circuit has been fabricated in a 0.13 µm CMOS technology and was successfully tested. The interconnection topology has been validated by measurements from the test-chip. The fabricated circuit has been designed to meet the specification of the I<sup>2</sup>C Fast-mode Plus protocol when implemented with an active reconfigurable board. Nevertheless, it could be integrated with any FPIN or FPGA. In principle, it can support any open-drain bus with its respective reference voltage. To the best of our knowledge, an interface circuit that mimics the behaviour of an open-drain connection has never been reported elsewhere.
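The difference in growth between the two topologies can be illustrated with a small sketch. Only the $\Theta(n^2)$ and $\Theta(n)$ orders come from the text; the exact link counts used below are assumptions made for illustration (one uni-directional FPIN link from every interface unit to every other unit for the star topology, and two chains of $n-1$ links for the dual-queue topology), and the function names are hypothetical.

```python
def star_links(n: int) -> int:
    """Assumed link count for the star topology: every unit drives every other unit."""
    return n * (n - 1)


def dual_queue_links(n: int) -> int:
    """Assumed link count for the dual-queue topology: two chains of n - 1 links."""
    return 2 * (n - 1)


if __name__ == "__main__":
    # Print both counts for a few sizes to show quadratic versus linear growth.
    for n in (2, 4, 8, 16):
        print(n, star_links(n), dual_queue_links(n))
```

Under these assumptions, doubling the number of interface units roughly quadruples the star link count but only doubles the dual-queue link count, which is why the linear topology scales better in terms of FPIN resources.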
### 9.2 ADM-based Analog Interface

An analog interface circuit, based on a novel circuit implementation of an asynchronous $\Delta$-modulator (ADM), has been developed by the author. It is designed to support reconfigurable routing of analog signals through digital interconnection networks. Such networks are found in FPGAs and in the electronic system prototyping platform previously introduced in [1]. A silicon prototype was fabricated in a 0.13 µm CMOS technology. It occupies a total area of $45 \,\mu\text{m} \times 46 \,\mu\text{m}$. Measurement results showed that the proposed analog interface can support an input signal bandwidth of 2 MHz and achieves a measured SNR, SNDR and spurious-free dynamic range (SFDR) of 57, 47, and 54 dB respectively. The fabricated test chip showed that the SNDR of the proposed ADM is sensitive to power supply coupling. Separate power supplies for the analog and digital circuits, combined with careful power decoupling, are therefore required to match the expected SNDR of $\approx$ 60 dB predicted by post-layout simulations. Some of the previously published A/D converters achieved a better figure of merit (FoM) and some occupied a smaller silicon area than the proposed ADM. However, when FoM and silicon area are considered *together*, the proposed ADM shows the best result.

## 9.3 Differential Signal Interface

A spatially reconfigurable differential interfacing architecture to support CML signalling in the WaferBoard has been elaborated and developed in collaboration with Olivier Valorge, a post-doctoral fellow at Polytechnique Montréal. The proposed interface utilizes configurable H-tree structures for balanced input and output differential signal propagation. Two types of input stage for the differential interface were investigated. The first one is based on unity-gain buffer based multiplexers and the second one is based on pass-transistor based multiplexers.
The first input stage has been validated by measurement results from a test-chip, while the latter has been validated by post-layout simulations. Complementary pins of differential signalling can be detected over a maximum area of $2 \, \text{mm} \times 2 \, \text{mm}$ ($1 \, \text{mm} \times 1 \, \text{mm}$ in the worst-case scenario). The input stage based on unity-gain buffer based multiplexers can support data rates of up to $2.5 \, \text{Gbps}$ with $200 \, \text{mV}$ of voltage swing under typical conditions compatible with PCIe specifications. The input stage based on pass-transistor based multiplexers can support data rates of up to $2 \, \text{Gbps}$ while occupying significantly less area (5%) compared to the other input stage.

### **CONCLUSION**

This thesis investigated and developed three interface circuits to support open-drain bi-directional signals, analog signals and differential signals through uni-directional digital FPINs.

#### The List of Articles From This Thesis

This section lists four research articles that report the aforementioned contributions discussed in Sec. 9.1, Sec. 9.2, and Sec. 9.3:

1. An interface circuit that can support open-drain interconnection based bi-directional buses (such as I<sup>2</sup>C) was proposed, implemented, and reported in:
   - A conference paper [69, W. Hussain, Y. Savaria, and Y. Blaquiere, 'An interface for the I<sup>2</sup>C protocol in the WaferBoard' *IEEE International Symposium on Circuits and Systems (ISCAS)*, 2013, pages 1492–1495, 2013.]. This paper reports on an open-drain interface circuit based on a star interconnect topology.
   - A journal paper [70, W. Hussain, Y. Savaria, and Y. Blaquiere, 'An interface for open-drain bi-directional communication in field programmable interconnection networks' accepted for publication in IEEE Transactions on Circuits and Systems I: Regular Papers, August 2015].
     This paper reports a $\Theta(n)$ complexity interconnect topology and measurement results of an open-drain interface circuit implemented in a test-chip.
2. A novel circuit implementation of an asynchronous $\Delta$-modulator (ADM) was proposed and developed for A/D conversion. This circuit was developed to support analog signal transmission in the FPIN of the WaferBoard. This contribution was reported in:
   - A journal paper [71, W. Hussain, F. Hussein, Desgreys P., Y. Savaria, and Y. Blaquiere. An asynchronous Δ-modulator based A/D converter for an electronic system prototyping platform. Submitted to IEEE Transactions on Circuits and Systems I: Regular Papers, September 2015], reporting measurement results of the proposed ADM from a test-chip.
3. A novel spatially configurable differential interface was proposed and developed to support CML differential signalling on the WaferBoard. This contribution was reported in:
   - A journal paper [72, W. Hussain, O. Valorge, Y. Savaria, and Y. Blaquiere. A novel spatially configurable differential interface for an electronic system prototyping platform. Submitted to Integration, the VLSI Journal Elsevier, May 2015]. This paper introduces analog multiplexers, acting as unity-gain buffers, used in the input stage of the differential interface. The paper reports measurement results of the proposed differential interface.
   - A pass-transistor multiplexer based differential input stage was investigated and developed by the author for the aforementioned differential interface [72]. Post-layout simulation results validated the concept (Chapter 8).

Prototypes of the three interface circuits that have been elaborated and developed in this thesis have been fabricated and successfully tested. The tests have not only verified the proposed concepts, but also given us a few additional insights into their operation.
In this thesis, step-by-step procedures for the development and elaboration of the interface circuits have been described so that anybody interested in this type of circuits and systems can utilize the concept in other domains. A possible utilization of the ASDM/ADM circuit in a conventional synchronous $\Sigma\Delta$ modulator is described in the following section.

#### **Future Work**

#### **Bi-Directional Interface**

The dual-queue interconnection topology, elaborated in this thesis, has a worst-case propagation delay of $\mathcal{O}(n)$, where $n$ is the number of interconnected interfaces. Another topology is possible in which the worst-case propagation delay can be brought down to $\approx \mathcal{O}(2 \cdot \sqrt{n})$. The interface unit elaborated in this thesis needs to be slightly modified to be compatible with such topologies. However, unless $n$ is greater than 10, such topologies do not offer significant advantages over the dual-queue interconnection topology.

#### ASDM-based Continuous-Time $\Sigma\Delta$ Modulator

Being "amplitude-to-time converters", both asynchronous $\Sigma\Delta$ modulators (ASDMs) and asynchronous $\Delta$-modulators (ADMs) can achieve higher precision by exploiting the higher intrinsic speed of transistors in scaled CMOS technologies. Such ASDMs and ADMs can operate as the internal quantizer of a synchronous CT-$\Sigma\Delta$ modulator for low-power applications (e.g., wireless communications, medical imaging, video, and instrumentation). As the "sampling" is done by pulse-width modulation in ASDMs/ADMs, quantization amounts to measuring time intervals. Phase-locked loops (PLLs) and delay-locked loops (DLLs) can be utilized for measuring time intervals by generating multi-phase clocking with sub-nanosecond resolution. Regular PLLs/DLLs are able to produce a resolution of one gate delay, which is in the range of $50 - 100 \,\mathrm{ps}$.
A synchronous CT-$\Sigma\Delta$ modulator architecture can be developed in which the quantization-noise shaping capability of the $\Sigma\Delta$ loop and the high-resolution amplitude-to-time conversion capability of ASDMs/ADMs used as internal quantizers are exploited simultaneously. The output of such ASDMs/ADMs could be quantized and fed back into the $\Sigma\Delta$ loop to noise-shape the quantization error and generate a digital bit stream. Such a synchronous CT-$\Sigma\Delta$ modulator architecture can offer two advantages:

- 1. In a synchronous CT-$\Sigma\Delta$ modulator architecture, the integrator gains depend on RC or g<sub>m</sub>C products. These products are subject to process-dependent variations, which usually make the integrator gains deviate from their designed values, resulting in reduced noise shaping or, in the worst case, an unstable system. The self-oscillation frequency of the ASDMs/ADMs used as the internal sampler can also be controlled by an RC or g<sub>m</sub>C product. In that case, the integrator gains of the loop filter and the self-oscillation frequency of the ASDMs/ADMs shift in the same direction under process variations, so the effective integrator gains remain constant. Thus, the loop-filter function (or noise-shaping transfer function) remains unchanged.
- 2. In such an architecture, the quantizer output code can have an intrinsic dynamic element matching (DEM) sequence that mitigates the unit-element mismatches of the feedback D/A converter.

#### REFERENCES

- [1] R. Norman, O. Valorge, Y. Blaquiere, E. Lepercq, Y. Basile-Bellavance, Y. El-Alaoui, R. Prytula, and Y. Savaria. An active reconfigurable circuit board. In *Circuits and Systems and TAISA Conference, 2008. NEWCAS-TAISA 2008. 2008 Joint 6th International IEEE Northeast Workshop on*, pages 351–354, June 2008.
- [2] E. Roza. Analog-to-digital conversion via duty-cycle modulation.
*Circuits and Systems II: Analog and Digital Signal Processing, IEEE Transactions on*, 44(11):907–914, Nov 1997.
- [3] O. Valorge, Y. Blaquiere, and Y. Savaria. A spatially reconfigurable fast differential interface for a wafer scale configurable platform. In *Electronics, Circuits, and Systems (ICECS), 2010 17th IEEE International Conference on*, pages 1176–1179, Dec 2010.
- [4] J.M. Rabaey, A.P. Chandrakasan, and B. Nikolic. *Digital integrated circuits: a design perspective*. Prentice Hall electronics and VLSI series. Pearson Education, 2003.
- [5] Synopsys. Zebu-Server: Billion Gate, Multi-User/Mode ASIC and SoC Emulation, March 2014.
- [6] Mentor. Veloce emulation systems, March 2014.
- [7] Cadence. Cadence Palladium series with Incisive XE software, July 2012.
- [8] Arteris. Network on chip (NoC) interconnect technology for SoCs, April 2014.
- [9] J. Varghese, M. Butts, and J. Batcheller. An efficient logic emulation system. *Very Large Scale Integration (VLSI) Systems, IEEE Transactions on*, 1(2):171–174, June 1993.
- [10] Xilinx. 7 Series FPGAs overview, March 2014.
- [11] Xilinx. Xilinx Stacked Silicon Interconnect Technology Delivers Breakthrough FPGA Capacity, Bandwidth, and Power Efficiency, May 2014.
- [12] Xilinx. 2.5D ICs: Just a Stepping Stone or a Long Term Alternative to 3D, March 2014.
- [13] A.P.M.M. Moelands and H. Schutte. Two-wire bus-system comprising a clock wire and a data wire for interconnecting a number of stations, April 11 1984. EP Patent 0,051,332.
- [14] NXP. I<sup>2</sup>C-bus specification and user manual, February 2013.
- [15] R.V. White and D. Durant. Understanding and using PMBus<sup>TM</sup> data formats. In *Applied Power Electronics Conference and Exposition, 2006. APEC '06. Twenty-First Annual IEEE*, 7 pp., March 2006.
- [16] SMBus. SMBus Specifications, March 2014.
- [17] PMBus. Introduction to the PMBus<sup>TM</sup>, October 2012.
- [18] Intel. Intelligent platform management interface (IPMI), August 2015.
- [19] Video Electronics Standards Association. Display Data Channel Command Interface (DDC/CI) Standard, August 1998.
- [20] PICMG. AdvancedTCA Base Specification, April 2015.
- [21] M.-J.E. Lee, W.J. Dally, R. Farjad-Rad, H.-T. Ng, Ramesh Senthinathan, J. Edmondson, and J. Poulton. CMOS high-speed I/Os present and future. In *Computer Design, 2003. Proceedings. 21st International Conference on*, pages 454–461, Oct 2003.
- [22] O.A.-T. Hasib, W. Andre, Y. Blaquiere, and Y. Savaria. Propagating analog signals through a fully digital network on an electronic system prototyping platform. In *Circuits and Systems (ISCAS), 2012 IEEE International Symposium on*, pages 1983–1986, 2012.
- [23] P82B96 dual bidirectional bus buffer, October 2012.
- [24] PCA9600 dual bidirectional bus buffer, October 2012.
- [25] D. Johns and K.W. Martin. *Analog integrated circuit design*. John Wiley & Sons, 1997.
- [26] S.R. Norsworthy, R. Schreier, G.C. Temes, and IEEE Circuit & Systems Society. *Delta-Sigma data converters: theory, design, and simulation*. IEEE Press, 1997.
- [27] R. Schreier and G.C. Temes. *Understanding Delta-Sigma Data Converters*. John Wiley & Sons, 2004.
- [28] D.G. Zrilic. *Circuits and Systems Based on Delta Modulation: Linear, Nonlinear and Mixed Mode Processing*. Signals and Communication Technology. Springer Berlin Heidelberg, 2006.
- [29] C.S. Taillefer and G.W. Roberts. Delta-sigma A/D conversion via time-mode signal processing. *Circuits and Systems I: Regular Papers, IEEE Transactions on*, 56(9):1908–1920, Sept 2009.
- [30] S. Ouzounov, Engel Roza, J.A. Hegt, G. van der Weide, and A.H.M. van Roermund. Analysis and design of high-performance asynchronous sigma-delta modulators with a binary quantizer. *Solid-State Circuits, IEEE Journal of*, 41(3):588–596, March 2006.
- [31] L.H.C. Ferreira and S.R. Sonkusale. A 0.25-V 28-nW 58-dB dynamic range asynchronous delta sigma modulator in 130-nm digital CMOS process.
*Very Large Scale Integration (VLSI) Systems, IEEE Transactions on*, PP(99):1–1, 2014.
- [32] Jun-Gi Jo, Jinho Noh, and Changsik Yoo. A 20-MHz bandwidth continuous-time sigma-delta modulator with jitter immunity improved full clock period SCR (FSCR) DAC and high-speed DWA. *Solid-State Circuits, IEEE Journal of*, 46(11):2469–2477, 2011.
- [33] M. Ortmanns, F. Gerfers, and Y. Manoli. A continuous-time sigma-delta modulator with switched capacitor controlled current mode feedback. In *Solid-State Circuits Conference, 2003. ESSCIRC '03. Proceedings of the 29th European*, pages 249–252, Sept 2003.
- [34] R.T. Baird and T.S. Fiez. A low oversampling ratio 14-b 500-kHz ΔΣ ADC with a self-calibrated multibit DAC. *Solid-State Circuits, IEEE Journal of*, 31(3):312–320, Mar 1996.
- [35] A.M. Marques, V. Peluso, M.S.J. Steyaert, and Willy Sansen. A 15-b resolution 2-MHz Nyquist rate ΔΣ ADC in a 1-μm CMOS technology. *Solid-State Circuits, IEEE Journal of*, 33(7):1065–1075, Jul 1998.
- [36] S. Rabii and B.A. Wooley. A 1.8-V digital-audio sigma-delta modulator in 0.8-μm CMOS. *Solid-State Circuits, IEEE Journal of*, 32(6):783–796, 1997.
- [37] F. Medeiro, B. Perez-Verdu, and A. Rodriguez-Vazquez. A 13-bit 2.2-MS/s 55-mW multibit cascade ΣΔ modulator in CMOS 0.7-μm single-poly technology. *Solid-State Circuits, IEEE Journal of*, 34(6):748–760, Jun 1999.
- [38] Y. Geerts, M.S.J. Steyaert, and Willy Sansen. A high-performance multibit ΔΣ CMOS ADC. *Solid-State Circuits, IEEE Journal of*, 35(12):1829–1840, Dec 2000.
- [39] T.L. Brooks, D.H. Robertson, D.F. Kelly, A. Del Muro, and S.W. Harston. A cascaded ΔΣ pipeline A/D converter with 1.25 MHz signal bandwidth and 89 dB SNR. *Solid-State Circuits, IEEE Journal of*, 32(12):1896–1906, Dec 1997.
- [40] R. Naiknaware and T. Fiez. 142 dB ΔΣ ADC with a 100 nV LSB in a 3 V CMOS process. In *Custom Integrated Circuits Conference, 2000. CICC. Proceedings of the IEEE 2000*, pages 5–8, 2000.
- [41] K.
Vleugels, S. Rabii, and B.A. Wooley. A 2.5-V sigma-delta modulator for broadband communication applications. *Solid-State Circuits, IEEE Journal of*, 36(12):1887–1889, Dec 2001.
- [42] F. Gerfers, M. Ortmanns, and Y. Manoli. A 1.5-V 12-bit power-efficient continuous-time third-order ΣΔ modulator. *Solid-State Circuits, IEEE Journal of*, 38(8):1343–1352, 2003.
- [43] J. Grilo, I. Galton, K. Wang, and R.G. Montemayor. A 12-mW ADC delta-sigma modulator with 80 dB of dynamic range integrated in a single-chip Bluetooth transceiver. *Solid-State Circuits, IEEE Journal of*, 37(3):271–278, 2002.
- [44] Yong-In Park, S. Karthikeyan, Wern Ming Koe, Zhongnong Jiang, and Tiak-Chean Tan. A 16-bit, 5 MHz multi-bit sigma-delta ADC using adaptively randomized DWA. In *Custom Integrated Circuits Conference, 2003. Proceedings of the IEEE 2003*, pages 115–118, Sept 2003.
- [45] A. Prasad, A. Chokhawala, K. Thompson, and J. Melanson. A 120 dB 300 mW stereo audio A/D converter with 110 dB THD+N. In *Solid-State Circuits Conference, 2004. ESSCIRC 2004. Proceeding of the 30th European*, pages 191–194, 2004.
- [46] Gil-Cho Ahn, Dong-Young Chang, M.E. Brown, N. Ozaki, H. Youra, K. Yamamura, K. Hamashita, K. Takasuka, G.C. Temes, and Un-Ku Moon. A 0.6-V 82-dB delta-sigma audio ADC using switched-RC integrators. *Solid-State Circuits, IEEE Journal of*, 40(12):2398–2407, 2005.
- [47] Khiem Nguyen, R. Adams, K. Sweetland, and Huaijin Chen. A 106-dB SNR hybrid oversampling analog-to-digital converter for digital audio. *Solid-State Circuits, IEEE Journal of*, 40(12):2408–2415, Dec 2005.
- [48] C.B. Wang, S. Ishizuka, and B.Y. Liu. A 113-dB DSD audio ADC using a density-modulated dithering scheme. *Solid-State Circuits, IEEE Journal of*, 38(1):114–119, 2003.
- [49] R. Reutemann, P. Balmelli, and Qiuting Huang. A 33 mW 14 b 2.5 MSample/s ΣΔ A/D converter in 0.25 μm digital CMOS. In Solid-State Circuits Conference, 2002. Digest of Technical Papers. ISSCC.
2002 IEEE International, volume 2, pages 252–495, Feb 2002.
- [50] A.A. Hamoui and Ken Martin. A 1.8-V 3-MS/s 13-bit ΔΣ A/D converter with pseudo data-weighted-averaging in 0.18-μm digital CMOS. In *Custom Integrated Circuits Conference, 2003. Proceedings of the IEEE 2003*, pages 119–122, 2003.
- [51] S.K. Gupta and Victor Fong. A 64-MHz clock-rate ΣΔ ADC with 88-dB SNDR and -105-dB IM3 distortion at a 1.5-MHz signal frequency. *Solid-State Circuits, IEEE Journal of*, 37(12):1653–1661, 2002.
- [52] M. Safi-Harb and G.W. Roberts. Low power delta-sigma modulator for ADSL applications in a low-voltage CMOS technology. *Circuits and Systems I: Regular Papers, IEEE Transactions on*, 52(10):2075–2089, 2005.
- [53] Ruoxin Jiang and T.S. Fiez. A 14-bit delta-sigma ADC with 8× OSR and 4-MHz conversion bandwidth in a 0.18 μm CMOS process. *Solid-State Circuits, IEEE Journal of*, 39(1):63–74, Jan 2004.
- [54] L. Dorrer, F. Kuttner, P. Greco, P. Torta, and T. Hartig. A 3-mW 74-dB SNR 2-MHz continuous-time delta-sigma ADC with a tracking ADC quantizer in 0.13-μm CMOS. *Solid-State Circuits, IEEE Journal of*, 40(12):2416–2427, 2005.
- [55] Jiang Yu and F. Maloberti. A low-power multi-bit ΣΔ modulator in 90-nm digital CMOS without DEM. *Solid-State Circuits, IEEE Journal of*, 40(12):2428–2436, Dec 2005.
- [56] M.Z. Straayer and M.H. Perrott. A 12-Bit, 10-MHz Bandwidth, Continuous-Time ΣΔ ADC With a 5-Bit, 950-MS/s VCO-Based Quantizer. *Solid-State Circuits, IEEE Journal of*, 43(4):805–814, April 2008.
- [57] U. Wismar, D. Wisland, and P. Andreani. A 0.2 V 0.44 μW 20 kHz analog to digital ΣΔ modulator with 57 fJ/conversion FoM. In *Solid-State Circuits Conference, 2006. ESSCIRC 2006. Proceedings of the 32nd European*, pages 187–190, Sept 2006.
- [58] U. Wismar, D. Wisland, and P. Andreani. A 0.2 V, 7.5 μW, 20 kHz ΣΔ modulator with 69 dB SNR in 90 nm CMOS. In Solid State Circuits Conference, 2007. ESSCIRC 2007.
33rd European, pages 206–209, Sept 2007.
- [59] R.H. Walden. Analog-to-digital converter survey and analysis. *Selected Areas in Communications, IEEE Journal on*, 17(4):539–550, Apr 1999.
- [60] S. Ouzounov, H. Hegt, and A. van Roermund. Sigma-delta modulators operating at a limit cycle. *Circuits and Systems II: Express Briefs, IEEE Transactions on*, 53(5):399–403, May 2006.
- [61] T. Piessens and M. Steyaert. Highly efficient xDSL line drivers in 0.35-μm CMOS using a self-oscillating power amplifier. *Solid-State Circuits, IEEE Journal of*, 38(1):22–29, Jan 2003.
- [62] E. Dallago and G. Sassone. Advances in high-frequency power conversion by delta-sigma modulation. *Circuits and Systems I: Fundamental Theory and Applications, IEEE Transactions on*, 44(8):712–721, Aug 1997.
- [63] L.J. Meuleman, A. van de Grijp, and E. Roza. Apparatus for modulating the output signal of a converter, May 22 1984. US Patent 4,450,564.
- [64] E. Roza. Poly-phase sigma-delta modulation. *Circuits and Systems II: Analog and Digital Signal Processing, IEEE Transactions on*, 44(11):915–923, Nov 1997.
- [65] Maxim Integrated. Introduction to LVDS, PECL, and CML, July 2000.
- [66] Xilinx. Virtex-5 packaging and pinout specifications, August 2007.
- [67] PCI-SIG. PCI Express<sup>TM</sup> jitter and BER, revision 1.0, February 2005.
- [68] D. Brooks. *Signal integrity issues and printed circuit board design*. Prentice Hall modern semiconductor design series. Prentice Hall, 2003.
- [69] W. Hussain, Y. Savaria, and Y. Blaquiere. An interface for the I<sup>2</sup>C protocol in the WaferBoard. In *Circuits and Systems (ISCAS), 2013 IEEE International Symposium on*, pages 1492–1495, 2013.
- [70] W. Hussain, Y. Savaria, and Y. Blaquiere. An interface for open-drain bi-directional communication in field programmable interconnection networks. Accepted for publication in *IEEE Transactions on Circuits and Systems I: Regular Papers*, August 2015.
- [71] W. Hussain, F. Hussein, P. Desgreys, Y.
Savaria, and Y. Blaquiere. An asynchronous Δ-modulator based A/D converter for an electronic system prototyping platform. Submitted to *IEEE Transactions on Circuits and Systems I: Regular Papers*, September 2015.
- [72] W. Hussain, O. Valorge, Y. Savaria, and Y. Blaquiere. A novel spatially configurable differential interface for an electronic system prototyping platform. Submitted to *Integration, the VLSI Journal*, Elsevier, May 2015.
- [73] R. Norman, E. Lepercq, Y. Blaquiere, O. Valorge, Y. Basile-Bellavance, R. Prytula, and Y. Savaria. An interconnection network for a novel reconfigurable circuit board. In *Circuits and Systems and TAISA Conference, 2008. NEWCAS-TAISA 2008. 2008 Joint 6th International IEEE Northeast Workshop on*, pages 129–132, June 2008.
- [74] Mentor, July 2012.
- [75] Synopsys, July 2012.
- [76] Freescale. M68HC11E family, March 2014.
- [77] Xilinx. All Programmable FPGAs and 3D ICs, July 2015.
- [78] Altera. Altera FPGAs, July 2015.
- [79] Chia-Hsi Chang, Feng-Yu Wu, and Yaow-Ming Chen. Modularized bidirectional grid-connected inverter with constant-frequency asynchronous sigma delta modulation. *Industrial Electronics, IEEE Transactions on*, 59(11):4088–4100, Nov 2012.
- [80] J.N. Kitchen, C. Chu, S. Kiaei, and B. Bakkaloglu. Combined linear and Δ-modulated switch-mode PA supply modulator for polar transmitters. *Solid-State Circuits, IEEE Journal of*, 44(2):404–413, Feb 2009.
- [81] J. Daniels, W. Dehaene, M.S.J. Steyaert, and A. Wiesbauer. A/D conversion using asynchronous delta-sigma modulation and time-to-digital conversion. *Circuits and Systems I: Regular Papers, IEEE Transactions on*, 57(9):2404–2412, Sept 2010.
- [82] B. De Vuyst and P. Rombouts. A 5-MHz 11-bit self-oscillating ΣΔ modulator with a delay-based phase shifter in 0.025 mm<sup>2</sup>. *Solid-State Circuits, IEEE Journal of*, 46(8):1919–1927, Aug 2011.
- [83] F. Colodro, A. Torralba, J.L. Mora, and J.M. Martinez-Heredia.
An analog squaring technique based on asynchronous ΣΔ modulation. *Circuits and Systems II: Express Briefs, IEEE Transactions on*, 56(8):629–633, Aug 2009.
- [84] L. Hernandez, S. Paton, and E. Prefasi. VCO-based sigma delta modulator with PWM precoding. *Electronics Letters*, 47(10):588–589, May 2011.
- [85] A. Babaie Fishani and P. Rombouts. Continuous time ΔΣ modulation with PWM precoding and binary g<sub>m</sub> blocks. *Electronics Letters*, 48(19):1187–1188, September 2012.
- [86] A. Babaie-Fishani, B. Van Keymeulen, and P. Rombouts. Analytical Expressions for the Distortion of Asynchronous ΣΔ Modulators. *Circuits and Systems II: Express Briefs, IEEE Transactions on*, 60(8):472–476, Aug 2013.
- [87] Mohsen Tamaddon and Mohammad Yavari. A wideband time-based continuous-time sigma-delta modulator with 2nd order noise-coupling based on passive elements. *International Journal of Circuit Theory and Applications*, pages n/a–n/a, 2015.
- [88] A. Gelb and W.E. Vander Velde. *Multiple-input describing functions and nonlinear system design*. McGraw-Hill electronic sciences series. McGraw-Hill, 1968.
- [89] B. Razavi. *Design of Analog CMOS Integrated Circuits*. McGraw-Hill higher education. Tata McGraw-Hill, 2002.
- [90] Etienne Lepercq, Olivier Valorge, Yan Basile-Bellavance, Nicolas Laflamme-Mayer, Yves Blaquiere, and Yvon Savaria. An interconnection network for a novel reconfigurable circuit board. In *Microsystems and Nanoelectronics Research Conference, 2009. MNRC 2009. 2nd*, pages 53–56, Oct 2009.
- [91] R.L. Bunch and S. Raman. Large-signal analysis of MOS varactors in CMOS -G<sub>m</sub> LC VCOs. *Solid-State Circuits, IEEE Journal of*, 38(8):1325–1332, Aug 2003.
- [92] BEEcube. BEE4 hardware platform, July 2012.
- [93] Lech Jozwiak, Nadia Nedjah, and Miguel Figueroa. Modern development methods and tools for embedded reconfigurable systems: A survey. *Integration, the VLSI Journal*, 43(1):1–33, 2010.
- [94] J. Balachandran, M. Kuijk, S. Brebels, G. Carchon, W. De Raedt, B.
Nauwelaers, and E. Beyne. Efficient link architecture for on-chip serial links and networks. In *System-on-Chip, 2006. International Symposium on*, pages 1–4, Nov 2006.
- [95] W. André, Y. Basile-Bellavance, Y. Blaquière, M. Bougataya, M.N. Laflamme, A. Lakhssassi, Y. Savaria, M. Sawan, and O. Valorge. Methods, apparatus and system to support large-scale micro-systems including embedded and distributed power supply, thermal regulation, multi-distributed-sensors and electrical signal propagation, March 2012. WO Patent App. PCT/CA2011/050,537.
- [96] Texas Instruments. Interfacing between LVPECL, VML, CML, and LVDS levels, December 2002.
- [97] G. Sion, Y. Blaquiere, and Y. Savaria. Defect diagnosis algorithms for a field programmable interconnect network embedded in a very large area integrated circuit. In *21st IEEE International On-Line Testing Symposium*, Greece, July 2015.
- [98] O. Valorge, A.T. Nguyen, Y. Blaquiere, R. Norman, and Y. Savaria. Digital signal propagation on a wafer-scale smart active programmable interconnect. In *Electronics, Circuits and Systems, 2008. ICECS 2008. 15th IEEE International Conference on*, pages 1059–1062, Aug 31–Sept 3, 2008.
- [99] Texas Instruments. MicroStar BGA packaging reference guide, September 2000.

## APPENDIX A

### A.1 Derivation of Third Harmonic Distortion [2]

Inserting the approximation $2\frac{\alpha}{T} - 1 = v_{\rm in}$ in Eq. 2.19,

$$v_{\rm in} - \left(2\frac{\alpha}{T} - 1\right) = \frac{2}{\pi} \sum_{n=1}^{\infty} \frac{\operatorname{Re}L(n\omega_i)}{n\operatorname{Re}L(\mu)} \sin\left\{n\pi(v_{\rm in} + 1)\right\} \tag{A.1}$$

Under the assumption that $\omega \gg p$,

$$\operatorname{Re}L(\omega) = \operatorname{Re}\frac{p}{p+j\omega} = \operatorname{Re}\frac{p^2 - jp\omega}{p^2 + \omega^2} \approx \frac{p^2}{\omega^2} \tag{A.2}$$

Eq. A.2 implies $\operatorname{Re}L(n\omega_i) \approx \operatorname{Re}L(\omega_i)/n^2$. Inserting this into Eq. A.1 gives

$$\begin{aligned}
v_{\rm in} - \left(2\frac{\alpha}{T} - 1\right) &= \frac{2\operatorname{Re}L(\omega_i)}{\pi\operatorname{Re}L(\mu)} \sum_{n=1}^{\infty} \frac{\sin\left\{n\pi(v_{\rm in} + 1)\right\}}{n^3} \\
&= \frac{2\operatorname{Re}L(\omega_i)}{\pi\operatorname{Re}L(\mu)} \frac{2\pi^3}{3} \operatorname{B}_3\left(\frac{v_{\rm in} + 1}{2}\right) \\
&= \frac{\pi^2}{6} \frac{\operatorname{Re}L(\omega_i)}{\operatorname{Re}L(\mu)} (v_{\rm in}^3 - v_{\rm in})
\end{aligned} \tag{A.3}$$

where $\operatorname{B}_3(x) = x^3 - \frac{3}{2}x^2 + \frac{1}{2}x$ is the third-order Bernoulli polynomial, so that $\operatorname{B}_3\left(\frac{v_{\rm in}+1}{2}\right) = \frac{v_{\rm in}^3 - v_{\rm in}}{8}$.

### A.2 Duty Cycle of the Proposed ASDM Output

$v_{\text{out}}$ in Fig. A.1 cannot remain at $V_{\text{DD}}$ indefinitely, because this would drive $v_f$ to $V_{\text{DD}}$; this level would then propagate through the loop and make the Schmitt trigger switch to $v_{\text{out}} = 0$. Similarly, $v_{\text{out}} = 0$ cannot be maintained indefinitely. Thus, $v_{\text{out}}$ oscillates. Assuming $v_{\text{in}}$ is a DC value and $0 < v_{\text{in}} < V_{\text{DD}}$,

$$\begin{aligned}
(v_{\text{in}} - V_{OL})A &= V_{TL} + 2h \\
(v_{\text{in}} - V_{OH})A &= V_{TL} \\
V_{OH} - V_{OL} &= \frac{2h}{A}
\end{aligned}$$

The capacitor equation when charging towards $V_{\text{DD}}$ is

$$V_C(t) = \{V_C(0) - V_{\rm DD}\} \exp\left(\frac{-t}{RC}\right) + V_{\rm DD}$$

and, when discharging, the capacitor voltage decays towards 0. During $t_L$,

$$\begin{aligned}
V_{OL} &= V_{OH} \exp\left(\frac{-t_L}{RC}\right) \\
V_{OH} - V_{OL} &= V_{OH} \left(1 - \exp\left(\frac{-t_L}{RC}\right)\right) = \frac{2h}{A}
\end{aligned}$$

$A\gg 1 \Rightarrow \left(1-\exp\left(\frac{-t_L}{RC}\right)\right) \ll 1$ and $\frac{t_L}{RC} \ll 1$. Thus,

$$V_{OH} \frac{t_L}{RC} = \frac{2h}{A} \tag{A.4}$$

During $t_H$,

$$\begin{aligned}
V_{OH} &= (V_{OL} - V_{DD}) \exp\left(\frac{-t_H}{RC}\right) + V_{DD} \\
V_{OH} - V_{OL} &= (V_{DD} - V_{OL})\left(1 - \exp\left(\frac{-t_H}{RC}\right)\right) = \frac{2h}{A}
\end{aligned}$$

Figure A.1 The ASDM used in the proposed analog interface.

Figure A.2 Waveform of the hysteresis input assuming $v_{\rm in}$ is a DC value and $0 < v_{\rm in} < V_{\rm DD}$.

$A \gg 1 \Rightarrow \left(1 - \exp\left(\frac{-t_H}{RC}\right)\right) \ll 1$ and $\frac{t_H}{RC} \ll 1$.
Thus,

$$V_{OH} - V_{OL} = \frac{t_H}{RC}(V_{DD} - V_{OL}) = \frac{2h}{A} \tag{A.5}$$

$A\gg 1 \Rightarrow V_{OH} \approx V_{OL} \approx v_{\rm in}$. Thus, from Eq. A.4 and Eq. A.5,

$$\begin{aligned}
\frac{t_H}{t_L} &= \frac{V_{OH}}{V_{DD} - V_{OL}} \approx \frac{v_{\text{in}}}{V_{DD} - v_{\text{in}}} \\
\frac{t_H}{t_L + t_H} &= \frac{v_{\text{in}}}{V_{DD}} \\
\frac{t_H}{T} &= \frac{v_{\text{in}}}{V_{DD}}
\end{aligned} \tag{A.6}$$

which shows that the duty cycle of $v_{\text{out}}$ is proportional to $v_{\text{in}}$.

Figure A.3 Simulated frequency of oscillation ($V_{DD} = 3.3\,\mathrm{V}$) of the asynchronous $\Sigma\Delta$ modulator shown in Fig. 6.15 for different $v_{in}$.

From Eq. A.5,

$$t_H = \frac{2hRC}{A} \frac{1}{V_{\rm DD} - v_{\rm in}} \tag{A.7}$$

From Eq. A.4,

$$t_L = \frac{2hRC}{A} \frac{1}{v_{\rm in}} \tag{A.8}$$

From Eq. A.7 and Eq. A.8,

$$\begin{aligned}
T &= t_L + t_H = \frac{2hRC}{A} \left[ \frac{1}{V_{\text{DD}} - v_{\text{in}}} + \frac{1}{v_{\text{in}}} \right] \\
f &= \frac{A}{2hRC} \frac{v_{\text{in}}(V_{\text{DD}} - v_{\text{in}})}{V_{\text{DD}}}
\end{aligned} \tag{A.9}$$

Eq. A.9 shows that the oscillation frequency of the ASDM is a quadratic function of the input voltage $v_{\text{in}}$. Fig. A.3 shows the simulated oscillation frequency of $v_{\text{out}}$ for $V_{\text{DD}} = 3.3\,\mathrm{V}$. The simulated frequency follows this quadratic relationship and reaches its maximum when the input voltage is around $\frac{V_{\text{DD}}}{2}$, in agreement with Eq. A.9.

## APPENDIX B Time Line

Table B.1 Timeline of the tasks leading to PhD.

## APPENDIX C Details of Test Chips

### Test chip: Bi-Directional Interface and Asynchronous $\Sigma\Delta$ Modulator

- Technology: IBM 0.13 µm CMOS
- Tape-out date: October 31, 2013
- CMC run code: 1304CG
- Test status: Tested at GRM lab
- Design name: ICGPMWUH
- Functionality: Working

Figure C.1 Test chip. (a) Micro-photograph of the die. (b) Test bench.

Figure C.2 Bonding diagram of test chip. Package type: CQFP44A.

Figure C.3 Pin assignment of the test chip.

Figure C.4 Layout of test chip.

Figure C.5 Test setup of the bi-directional interface.

Figure C.6 Test setup of the ASDM.
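As a numerical cross-check of the Appendix A.2 results, the sketch below evaluates Eq. A.7 and Eq. A.8 and compares the resulting duty cycle and frequency with Eq. A.6 and Eq. A.9. Only $V_{\text{DD}} = 3.3\,\mathrm{V}$ comes from the text; the values of $h$, $R$, $C$, and $A$ are hypothetical placeholders chosen for illustration, not the component values of the fabricated ASDM.

```python
def asdm_timing(v_in, v_dd=3.3, h=0.5, r=100e3, c=10e-12, gain=20.0):
    """Evaluate the first-order ASDM timing model of Appendix A.2.

    ASSUMPTION: h, r, c and gain are hypothetical illustration values.
    Valid only for a DC input with 0 < v_in < v_dd and gain >> 1.
    """
    if not 0.0 < v_in < v_dd:
        raise ValueError("v_in must lie strictly between 0 and v_dd")
    k = 2.0 * h * r * c / gain        # common factor 2hRC/A
    t_high = k / (v_dd - v_in)        # Eq. A.7
    t_low = k / v_in                  # Eq. A.8
    period = t_high + t_low
    return {
        "t_high": t_high,
        "t_low": t_low,
        "duty": t_high / period,      # should equal v_in / v_dd (Eq. A.6)
        "freq": 1.0 / period,         # should match Eq. A.9
    }


def freq_closed_form(v_in, v_dd=3.3, h=0.5, r=100e3, c=10e-12, gain=20.0):
    """Closed-form oscillation frequency of Eq. A.9 (same assumed values)."""
    return gain / (2.0 * h * r * c) * v_in * (v_dd - v_in) / v_dd


for v in (0.5, 1.0, 1.65, 2.3, 2.8):
    res = asdm_timing(v)
    print(f"v_in = {v:4.2f} V  duty = {res['duty']:.4f}  "
          f"f = {res['freq'] / 1e6:.2f} MHz  "
          f"(Eq. A.9: {freq_closed_form(v) / 1e6:.2f} MHz)")
```

With these placeholder values the printed duty cycle tracks $v_{\text{in}}/V_{\text{DD}}$ and the frequency peaks at $v_{\text{in}} = V_{\text{DD}}/2$, which is the behaviour described for Fig. A.3; the absolute frequencies depend entirely on the assumed component values.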