10 research outputs found
Recommended from our members
Architecting NP-Dynamic Skybridge
With the scaling of technology nodes, modern CMOS integrated circuits face severe fundamental challenges that stem from device scaling limitations, interconnection bottlenecks and increasing manufacturing complexities. These challenges drive researchers to look for revolutionary technologies beyond the end of CMOS roadmap. Towards this end, a new nanoscale 3-D computing fabric for future integrated circuits, Skybridge, has been proposed [1]. In this new fabric, core aspects from device to circuit style, connectivity, thermal management and manufacturing pathway are co-architected in a 3-D fabric-centric manner.
However, the Skybridge fabric uses only n-type transistors in a dynamic circuit style for logic and memory implementations. Therefore, it requires complicated clocking schemes to overcome signal monotonicity associated with cascading dynamic logic gates. For Skybridge’s large-scale circuits, the dynamic circuit style requires cascaded stages to be micro-pipelined, which results in large number of buffers used for storing minterms causing significant overhead in terms of area and power. Moreover, implementation of logic is limited to NAND or AND-of-NAND based logic expressions, which does not always result in compact circuits. In this work, we propose an extension of original Skybridge fabric, called NP-Dynamic-Skybridge, to solve these challenges by using both n-and p-type transistors in an innovative circuit style. Here, every stage in a given circuit is implemented by either n-type or p-type dynamic logic.
Cascading n- and p-type dynamic logic effectively avoids signal monotonicity problem, and allows combinational-like circuit implementation. This helps to simplify the clocking scheme for cascaded logics requiring only one set of global precharge and evaluate clock signals. And also it expands the degree of expressing logic enabling expressions such as NOR, OR-of-NORs, in addition to those previously mentioned. Furthermore, the number of pipeline stages is significantly reduced for a given logic function, and buffer requirements are less compared with Skybridge 3D fabric thus improving on area and power metrics. Initial evaluation for NP-Dynamic-Skybridge’s 4-bit carry look-ahead adder shows up to 2x density benefits over Skybridge 3-D fabric and at least 17% power/throughput benefit
Recommended from our members
Architecting SkyBridge-CMOS
As the scaling of CMOS approaches fundamental limits, revolutionary technology beyond the end of CMOS roadmap is essential to continue the progress and miniaturization of integrated circuits. Recent research efforts in 3-D circuit integration explore pathways of continuing the scaling by co-designing for device, circuit, connectivity, heat and manufacturing challenges in a 3-D fabric-centric manner. SkyBridge fabric is one such approach that addresses fine-grained integration in 3-D, achieves orders of magnitude benefits over projected scaled 2-D CMOS, and provides a pathway for continuing scaling beyond 2-D CMOS.
However, SkyBridge fabric utilizes only single type transistors in order to reduce manufacture complexity, which limits its circuit implementation to dynamic logic. This design choice introduces multiple challenges for SkyBridge such as high switching power consumption, susceptibility to noise, and increased complexity for clocking. In this thesis we propose a new 3-D fabric, similar in mindset to SkyBridge, but with static logic circuit implementation in order to mitigate the afore-mentioned challenges. We present an integrated framework to realize static circuits with vertical nanowires, and co-design it across all layers spanning fundamental fabric structures to large circuits. The new fabric, named as SkyBridge-CMOS, introduces new technology, structures and circuit designs to meet the additional requirements for implementing static circuits. One of the critical challenges addressed here is integrating both n-type and p-type nanowires. Molecular bonding process allows precise control between different doping regions, and novel fabric components are proposed to achieve 3-D routing between various doping regions.
Core fabric components are designed, optimized and modeled with their physical level information taken into account. Based on these basic structures we design and evaluate various logic gates, arithmetic circuits and SRAM in terms of power, area footprint and delay. A comprehensive evaluation methodology spanning material/device level to circuit level is followed. Benchmarking against 16nm 2-D CMOS shows significant improvement of up to 50X in area footprint and 9.3X in total power efficiency for low power applications, and 3X in throughput for high performance applications. Also, better noise resilience and better power efficiency can be guaranteed when compared with original SkyBridge fabrics
Scalable Reliable SD Erlang Design
This technical report presents the design of Scalable Distributed (SD) Erlang: a set of language-level changes that aims to enable Distributed Erlang to scale for server applications on commodity hardware with at most 100,000 cores. We cover a number of aspects, specifically anticipated architecture, anticipated failures, scalable data structures, and scalable computation. Other two components that guided us in the design of SD Erlang are design principles and typical Erlang applications. The design principles summarise the type of modifications we aim to allow Erlang scalability. Erlang exemplars help us to identify the main Erlang scalability issues and hypothetically validate the SD Erlang design
Simulation de profils de gravure et de dépôt à l’échelle du motif pour l’étude des procédés de microfabrication utilisant une source plasma de haute densité à basse pression
En lien avec l’avancée rapide de la réduction de la taille des motifs en microfabrication, des processus physiques négligeables à plus grande échelle deviennent dominants lorsque cette taille s’approche de l’échelle nanométrique. L’identification et une meilleure compréhension de ces différents processus sont essentielles pour améliorer le contrôle des procédés et poursuivre la «nanométrisation» des composantes électroniques. Un simulateur cellulaire à l’échelle du motif en deux dimensions s’appuyant sur les méthodes Monte-Carlo a été développé pour étudier l’évolution du profil lors de procédés de microfabrication. Le domaine de gravure est discrétisé en cellules carrées représentant la géométrie initiale du système masque-substrat. On insère les particules neutres et ioniques à l’interface du domaine de simulation en prenant compte des fonctions de distribution en énergie et en angle respectives de chacune des espèces. Le transport des particules est effectué jusqu’à la surface en tenant compte des probabilités de réflexion des ions énergétiques sur les parois ou de la réémission des particules neutres. Le modèle d’interaction particule-surface tient compte des différents mécanismes de gravure sèche telle que la pulvérisation, la gravure chimique réactive et la gravure réactive ionique. Le transport des produits de gravure est pris en compte ainsi que le dépôt menant à la croissance d’une couche mince. La validité du simulateur est vérifiée par comparaison entre les profils simulés et les observations expérimentales issues de la gravure par pulvérisation du platine par une source de plasma d’argon.With the reduction of feature dimensions, otherwise negligible processes are becoming dominant in microfabricated profile evolution. Improved understanding of these different processes is essential to improve the control of the microfabrication processes and to further decrease of the feature size. To help attaining such control, a 2D feature scale cellular simulator using Monte-Carlo techniques was developed. The calculation domain is discretized in square cells representing empty space, substrate or mask of the initial system. Neutral and ion species are inserted at simulation interface from their respective angular and energy distributions functions. Particles transport to the feature surface is calculated while taking into account ion reflection on sidewall and neutral reemission. The particles-surface interaction model includes the different etching mechanisms such as sputtering, reactive etching and reactive ion etching. Etch product transport is also taken into account as is their deposition leading to thin film growth. Simulation validity is confirmed by comparison between simulated profiles and experimental observations issued from sputtering of platinum in argon plasma source
Digital assistance design for analog systems : digital baseband for outphasing power amplifiers
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013.Cataloged from PDF version of thesis.Includes bibliographical references (p. 145-150).Digital assistance is among many aspects that can be leveraged to help analog/mixed-signal designers keep up with the technology scaling. It usually takes the form of predistorter or compensator in an analog/mixed-signal system and helps compensate the nonidealities in the system. Digital assistance takes advantage of the process scaling with faster speed and a higher level of integration. When a digital system is co-optimized with system modeling techniques, digital assistance usually becomes a key enabling block for the high performance of the overall system. This thesis presents the design of digital assistances through the digital baseband design for outphasing power amplifiers. In the digital baseband design, this thesis conveys two major points: the importance of the use of the reduced-complexity system modeling techniques, and the communications between hardware design and system modeling. These points greatly help the success in the design of the energy-efficient baseband. The first part of the baseband design is to realize the nonlinear signal processing unit required by the modulation scheme. Conventional approaches of implementing this functionality do not scale well to meet the throughput, area and energy-efficiency targets. We propose a novel fixed-point piece-wise linear approximation technique for the nonlinear function computations involved in the signal processing unit. The new technique allows us to achieve an energy and area-efficient design with a throughput of 3.4Gsamples/s. Compared to the projected previous designs, our design shows 2x improvement in energy-efficiency and 25x in area-efficiency. The second part of the baseband design devotes to the nonlinear compensator design, aiming to improve the linearity performance of the outphasing power amplifier. We first explore the feasibility of a working compensator by use of an off-line iterative solving scheme. With the confirmation that a compensator does exist, we analyze the structure of the nonlinear baseband-equivalent PA system and create a dynamical real-time compensator model. The resulting compensator provides the overall PA system with around 10dB improvement in ACPR and up to 2.5% in EVM.by Yan Li.Ph.D
Recommended from our members
Interconnect Aging—Physics to Software
Device reliability or lifetime is often non-negotiable and crucial for sensitive applications such as medical devices, autonomous vehicles and space crafts. Inevitable technology advancement (e.g. miniaturization) has added unwelcome complications and unpredictability to the aging problem. Reliability of VLSI chips is jeopardized by mass transport in metallic interconnects. Material migration is caused by electrical, mechanical and thermal phenomena, and, therefore, is a complicated process. While all aspects of material migration have been studied, a comprehensive investigation that can explain and include all those phenomena simultaneously remains unsolved. Inaccuracies in modeling and predicting aging processes in wires cause that chipmakers often overdesign interconnects. This is an undesirable and expensive approach in terms of time and cost. In modern technologies, the predicted lifetime, aging, and failure mechanisms in interconnect very often do not match the observed behaviors. Unrealistic models used in CAD tools are the main culprit of such incompatibilities. In general, two situations may occur: (1) in some cases, the models may wrongly scrutinize reliability in unfailing parts and consequently impose unnecessary design tightening and (2) in some other cases, the models may underestimate serious reliability problems causing unpredicted behaviors or catastrophic failures to occur. The existing models for reliability evaluation are usually pessimistic in case of interconnect voiding and optimistic when extrusions occur. Time-consuming and not converging reliability assessments, as well as undesired chip behaviors, are the common expensive outcome of such models.We revisit the underlying physics of aging processes in dual-damascene copper lines. We demonstrate, that the simplistic modeling is the cause of the incompatibility of the existing models. We study all three main aging processes: electromigration, thermo-migration, and stress migration and offer several comprehensive yet compact models for realistic assessment of interconnect aging. These models explain many observations that have been inexplicable for decades. Ultimately, a computer-aided design tool, RAIN, is developed based on the proposed models and is capable of assessing the reliability of industry standard complex multi-layer, multi-segment interconnect networks. This tool can be readily integrated into other verification signoffs phases such as performance, timing, and power analyses. RAIN takes as inputs: (1) interconnect design, (2) technology specifications, (3) initial stress and temperature, (4) IR drop and lifetime requirements. It analyzes and assesses reliability and delivery requirements of all nets, and provides a report on voltage limitations, thermal violations and expected lifetime. It is validated on a wide spectrum of experimental results performed on various industry benchmarks
Clock Generator Circuits for Low-Power Heterogeneous Multiprocessor Systems-on-Chip
In this work concepts and circuits for local clock generation in low-power heterogeneous multiprocessor systems-on-chip (MPSoCs) are researched and developed. The targeted systems feature a globally asynchronous locally synchronous (GALS) clocking architecture and advanced power management functionality, as for example fine-grained ultra-fast dynamic voltage and frequency scaling (DVFS). To enable this functionality compact clock generators with low chip area, low power consumption, wide output frequency range and the capability for ultra-fast frequency changes are required. They are to be instantiated individually per core.
For this purpose compact all digital phase-locked loop (ADPLL) frequency synthesizers are developed. The bang-bang ADPLL architecture is analyzed using a numerical system model and optimized for low jitter accumulation. A 65nm CMOS ADPLL is implemented, featuring a novel active current bias circuit which compensates the supply voltage and temperature sensitivity of the digitally controlled oscillator (DCO) for reduced digital tuning effort. Additionally, a 28nm ADPLL with a new ultra-fast lock-in scheme based on single-shot phase synchronization is proposed.
The core clock is generated by an open-loop method using phase-switching between multi-phase DCO clocks at a fixed frequency. This allows instantaneous core frequency changes for ultra-fast DVFS without re-locking the closed loop ADPLL. The sensitivity of the open-loop clock generator with respect to phase mismatch is analyzed analytically and a compensation technique by cross-coupled inverter buffers is proposed.
The clock generators show small area (0.0097mm2 (65nm), 0.00234mm2 (28nm)), low power consumption (2.7mW (65nm), 0.64mW (28nm)) and they provide core clock frequencies from 83MHz to 666MHz which can be changed instantaneously. The jitter performance is compliant to DDR2/DDR3 memory interface specifications.
Additionally, high-speed clocks for novel serial on-chip data transceivers are generated. The ADPLL circuits have been verified successfully by 3 testchip implementations. They enable efficient realization of future low-power MPSoCs with advanced power management functionality in deep-submicron CMOS technologies.In dieser Arbeit werden Konzepte und Schaltungen zur lokalen Takterzeugung in heterogenen Multiprozessorsystemen (MPSoCs) mit geringer Verlustleistung erforscht und entwickelt. Diese Systeme besitzen eine global-asynchrone lokal-synchrone Architektur sowie Funktionalität zum Power Management, wie z.B. das feingranulare, schnelle Skalieren von Spannung und Taktfrequenz (DVFS). Um diese Funktionalität zu realisieren werden kompakte Taktgeneratoren benötigt, welche eine kleine Chipfläche einnehmen, wenig Verlustleitung aufnehmen, einen weiten Bereich an Ausgangsfrequenzen erzeugen und diese sehr schnell ändern können.
Sie sollen individuell pro Prozessorkern integriert werden. Dazu werden kompakte volldigitale Phasenregelkreise (ADPLLs) entwickelt, wobei eine bang-bang ADPLL Architektur numerisch modelliert und für kleine Jitterakkumulation optimiert wird. Es wird eine 65nm CMOS ADPLL implementiert, welche eine neuartige Kompensationsschlatung für den digital gesteuerten Oszillator (DCO) zur Verringerung der Sensitivität bezüglich Versorgungsspannung und Temperatur beinhaltet. Zusätzlich wird eine 28nm CMOS ADPLL mit einer neuen Technik zum schnellen Einschwingen unter Nutzung eines Phasensynchronisierers realisiert. Der Prozessortakt wird durch ein neuartiges Phasenmultiplex- und Frequenzteilerverfahren erzeugt, welches es ermöglicht die Taktfrequenz sofort zu ändern um schnelles DVFS zu realisieren.
Die Sensitivität dieses Frequenzgenerators bezüglich Phasen-Mismatch wird theoretisch analysiert und durch Verwendung von kreuzgekoppelten Taktverstärkern kompensiert. Die hier entwickelten Taktgeneratoren haben eine kleine Chipfläche (0.0097mm2 (65nm), 0.00234mm2 (28nm)) und Leistungsaufnahme (2.7mW (65nm), 0.64mW (28nm)). Sie stellen Frequenzen von 83MHz bis 666MHz bereit, welche sofort geändert werden können. Die Schaltungen erfüllen die Jitterspezifikationen von DDR2/DDR3 Speicherinterfaces. Zusätzliche können schnelle Takte für neuartige serielle on-Chip
Verbindungen erzeugt werden. Die ADPLL Schaltungen wurden erfolgreich in 3 Testchips erprobt. Sie ermöglichen die effiziente Realisierung von zukünftigen MPSoCs mit Power Management in modernsten CMOS Technologien