48 research outputs found

    Clock Generator Circuits for Low-Power Heterogeneous Multiprocessor Systems-on-Chip

    In this work, concepts and circuits for local clock generation in low-power heterogeneous multiprocessor systems-on-chip (MPSoCs) are researched and developed. The targeted systems feature a globally asynchronous locally synchronous (GALS) clocking architecture and advanced power management functionality, such as fine-grained ultra-fast dynamic voltage and frequency scaling (DVFS). To enable this functionality, compact clock generators with low chip area, low power consumption, a wide output frequency range and the capability for ultra-fast frequency changes are required. They are to be instantiated individually per core. For this purpose, compact all-digital phase-locked loop (ADPLL) frequency synthesizers are developed. The bang-bang ADPLL architecture is analyzed using a numerical system model and optimized for low jitter accumulation. A 65 nm CMOS ADPLL is implemented, featuring a novel active current bias circuit which compensates the supply voltage and temperature sensitivity of the digitally controlled oscillator (DCO) for reduced digital tuning effort. Additionally, a 28 nm ADPLL with a new ultra-fast lock-in scheme based on single-shot phase synchronization is proposed. The core clock is generated by an open-loop method using phase-switching between multi-phase DCO clocks at a fixed frequency. This allows instantaneous core frequency changes for ultra-fast DVFS without re-locking the closed-loop ADPLL. The sensitivity of the open-loop clock generator to phase mismatch is analyzed analytically and a compensation technique using cross-coupled inverter buffers is proposed. The clock generators show small area (0.0097 mm² (65 nm), 0.00234 mm² (28 nm)) and low power consumption (2.7 mW (65 nm), 0.64 mW (28 nm)), and they provide core clock frequencies from 83 MHz to 666 MHz which can be changed instantaneously. The jitter performance is compliant with DDR2/DDR3 memory interface specifications.
Additionally, high-speed clocks for novel serial on-chip data transceivers are generated. The ADPLL circuits have been verified successfully by 3 testchip implementations. They enable efficient realization of future low-power MPSoCs with advanced power management functionality in deep-submicron CMOS technologies.
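
The abstract above mentions a numerical system model of the bang-bang ADPLL. As a rough illustration of what such a model can look like, here is a minimal first-order sketch. It is not the author's model: the function name, the proportional/integral gains and all numeric values are assumptions chosen only so that the loop settles, and effects such as DCO quantization, noise and cycle slips are ignored.

```python
def simulate_bang_bang(t_ref_ns, n_div, t_free_ns, kp_ns, ki_ns, steps):
    """Hypothetical behavioral model of a bang-bang PLL (times in ns).

    Once per reference cycle the binary phase detector only reports whether
    the divided DCO edge is late or early; a proportional and an integral
    path then shorten or lengthen the DCO period accordingly.
    """
    err = 0.0    # accumulated time error: divided DCO edge minus reference edge
    integ = 0.0  # integral-path correction of the DCO period
    errs, periods = [], []
    for _ in range(steps):
        late = 1 if err > 0 else -1          # bang-bang detector: sign only
        integ += ki_ns * late                # integral path
        t_dco = t_free_ns - integ - kp_ns * late  # proportional path
        err += n_div * t_dco - t_ref_ns      # error added during this cycle
        errs.append(err)
        periods.append(t_dco)
    return errs, periods


# Assumed example: 100 MHz reference, divider 4, free-running period 2.6 ns.
errs, periods = simulate_bang_bang(10.0, 4, 2.6, 0.02, 0.0001, 5000)
tail = periods[-100:]
print("mean DCO period over last 100 cycles [ns]:", sum(tail) / len(tail))
print("max |time error| over last 100 cycles [ns]:",
      max(abs(e) for e in errs[-100:]))
```

In this sketch the locked loop ends in a small limit cycle around the target period (t_ref / n_div), which is the behavior that makes jitter accumulation in bang-bang loops depend on the gain settings.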

    A RISC-V MCU with adaptive reverse body bias and ultra-low-power retention mode in 22 nm FD-SOI

    We present a low-power, energy-efficient 32-bit RISC-V microprocessor unit (MCU) in 22 nm FD-SOI. It achieves ultra-low leakage, even at high temperatures, by using an adaptive reverse body biasing aware sign-off approach, a low-power optimized physical implementation, and custom SRAM macros with retention mode. We demonstrate the robustness of the chip with measurements over the full industrial temperature range, from -40 °C to 125 °C. Our results match the state of the art (SOTA) with 4.8 µW/MHz at 50 MHz in active mode and surpass the SOTA in ultra-low-power retention mode.
    Comment: accepted at ISOCC 202
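
For readers less used to the µW/MHz figure of merit, the short calculation below converts the numbers quoted in the abstract. It relies only on the unit identity 1 µW/MHz = 1 pJ per clock cycle; the function names are illustrative, and the result assumes the figure scales linearly with frequency at the quoted operating point.

```python
def energy_per_cycle_pj(uw_per_mhz: float) -> float:
    """1 uW/MHz = 1e-6 W / 1e6 Hz = 1e-12 J per cycle, i.e. 1 pJ/cycle."""
    return uw_per_mhz


def active_power_uw(uw_per_mhz: float, f_mhz: float) -> float:
    """Active power implied by the figure of merit at a given clock frequency."""
    return uw_per_mhz * f_mhz


# Values taken from the abstract: 4.8 uW/MHz at 50 MHz.
print("energy per cycle [pJ]:", energy_per_cycle_pj(4.8))
print("active power at 50 MHz [uW]:", active_power_uw(4.8, 50.0))
```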

    VLSI Implementation of a 2.8 Gevent/s Packet-Based AER Interface with Routing and Event Sorting Functionality

    State-of-the-art large-scale neuromorphic systems require sophisticated spike event communication between units of the neural network. We present a high-speed communication infrastructure for a waferscale neuromorphic system, based on application-specific neuromorphic communication ICs in a field-programmable gate array (FPGA)-maintained environment. The ICs implement configurable axonal delays, as required for certain types of dynamic processing or for emulating spike-based learning among distant cortical areas. Measurements are presented which show the efficacy of these delays in influencing the behavior of neuromorphic benchmarks. The specialized, dedicated address-event-representation communication in most current systems requires separate, low-bandwidth configuration channels. In contrast, the configuration of the waferscale neuromorphic system is also handled by the digital packet-based pulse channel, which transmits configuration data at the full bandwidth otherwise used for pulse transmission. The overall so-called pulse communication subgroup (ICs and FPGA) delivers an event transmission rate 25 to 50 times higher than other current neuromorphic communication infrastructures.
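
The combination of configurable axonal delays and event sorting can be pictured in software as follows. This is a purely illustrative sketch, not the hardware algorithm of the ICs: the `Event` type, the per-address delay table and the function name are all assumptions. Each event gets a due time (timestamp plus the delay configured for its address), and a priority queue releases events in due-time order.

```python
import heapq
from typing import Dict, List, NamedTuple, Tuple


class Event(NamedTuple):
    timestamp: int  # emission time in arbitrary clock ticks (assumed unit)
    address: int    # source address of the spike event


def deliver_in_order(events: List[Event],
                     delay_by_address: Dict[int, int],
                     default_delay: int = 0) -> List[Tuple[int, int]]:
    """Return (due_time, address) pairs sorted by due time.

    The sequence number breaks ties so that events with equal due times
    keep their arrival order.
    """
    heap = []
    for seq, ev in enumerate(events):
        due = ev.timestamp + delay_by_address.get(ev.address, default_delay)
        heapq.heappush(heap, (due, seq, ev))
    ordered = []
    while heap:
        due, _, ev = heapq.heappop(heap)
        ordered.append((due, ev.address))
    return ordered


# Hypothetical input: address 1 has a long delay, address 2 a short one.
spikes = [Event(0, 1), Event(1, 2), Event(2, 1)]
print(deliver_in_order(spikes, {1: 5, 2: 1}))
```

With different delays per address, an event emitted later can become due earlier, which is why a sorting stage is needed after the delays are applied.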

    A database accelerator for energy-efficient query processing and optimization

    Data processing on a continuously growing amount of information and increasing power restrictions have become a ubiquitous challenge in our world today. Besides parallel computing, a promising approach to improve the energy efficiency of current systems is to integrate specialized hardware. This paper presents a Tensilica RISC processor extended with an instruction set to accelerate basic database operators frequently used in modern database systems. The core was taped out in a 28 nm SLP CMOS technology and allows energy-efficient query processing as well as query optimization by applying selectivity estimation techniques. Our chip measurements show a 1000x energy improvement on selected database operators compared to state-of-the-art systems.

    A 16-Channel Fully Configurable Neural SoC With 1.52 μW/Ch Signal Acquisition, 2.79 μW/Ch Real-Time Spike Classifier, and 1.79 TOPS/W Deep Neural Network Accelerator in 22 nm FDSOI

    With the advent of high-density micro-electrode arrays, developing neural probes that satisfy real-time and stringent power-efficiency requirements becomes more challenging. A smart neural probe is an essential device in future neuroscientific research and medical applications. To realize such devices, we present a 22 nm FDSOI SoC with complex on-chip real-time data processing and training for neural signal analysis. It consists of a digitally-assisted 16-channel analog front-end with 1.52 μW/Ch, dedicated bio-processing accelerators for spike detection and classification with 2.79 μW/Ch, and a 125 MHz RISC-V CPU, utilizing adaptive body biasing at 0.5 V with a supporting 1.79 TOPS/W MAC array. The proposed SoC shows a proof-of-concept of how to realize a high-level integration of various on-chip accelerators to satisfy the neural probe requirements for modern applications.
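
The per-channel figures in the abstract can be turned into a rough power budget with a few lines of arithmetic. The sketch below assumes that all 16 channels run the front-end and the spike classifier at the same time, which the abstract does not state, and it leaves out the CPU because no power figure is given for it. The TOPS/W conversion uses only the unit identity 1 TOPS/W = 1 operation per pJ.

```python
CHANNELS = 16
FRONT_END_UW_PER_CH = 1.52   # analog front-end, from the abstract
CLASSIFIER_UW_PER_CH = 2.79  # spike detection and classification, from the abstract
MAC_TOPS_PER_W = 1.79        # MAC array efficiency, from the abstract


def total_uw(per_channel_uw: float, channels: int = CHANNELS) -> float:
    """Total power if every channel is active simultaneously (assumption)."""
    return per_channel_uw * channels


def pj_per_operation(tops_per_w: float) -> float:
    """1 TOPS/W = 1e12 operations per joule, so energy per operation is 1/x pJ."""
    return 1.0 / tops_per_w


front_end = total_uw(FRONT_END_UW_PER_CH)
classifier = total_uw(CLASSIFIER_UW_PER_CH)
print("front-end total [uW]:", round(front_end, 2))
print("classifier total [uW]:", round(classifier, 2))
print("acquisition + classification [uW]:", round(front_end + classifier, 2))
print("MAC array energy per operation [pJ]:", round(pj_per_operation(MAC_TOPS_PER_W), 3))
```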

    Alignment, orientation, and Coulomb explosion of difluoroiodobenzene studied with the pixel imaging mass spectrometry (PImMS) camera

    Citation: Amini, K., Boll, R., Lauer, A., Burt, M., Lee, J. W. L., Christensen, L., . . . Rolles, D. (2017). Alignment, orientation, and Coulomb explosion of difluoroiodobenzene studied with the pixel imaging mass spectrometry (PImMS) camera. Journal of Chemical Physics, 147(1). doi:10.1063/1.4982220
    Laser-induced adiabatic alignment and mixed-field orientation of 2,6-difluoroiodobenzene (C6H3F2I) molecules are probed by Coulomb explosion imaging following either near-infrared strong-field ionization or extreme-ultraviolet multi-photon inner-shell ionization using free-electron laser pulses. The resulting photoelectrons and fragment ions are captured by a double-sided velocity map imaging spectrometer and projected onto two position-sensitive detectors. The ion side of the spectrometer is equipped with a pixel imaging mass spectrometry camera, a time-stamping pixelated detector that can record the hit positions and arrival times of up to four ions per pixel per acquisition cycle. Thus, the time-of-flight trace and ion momentum distributions for all fragments can be recorded simultaneously. We show that we can obtain a high degree of one- and three-dimensional alignment and mixed-field orientation and compare the Coulomb explosion process induced at both wavelengths. © 2017 Author(s)