Search CORE

6,496 research outputs found

Variant X-Tree Clock Distribution Network and Its Performance Evaluations

Author
Publication venue: 'Institute of Electronics, Information and Communications Engineers (IEICE)'
Publication date: 10/01/2007
Field of study

Locally-Stable Macromodels of Integrated Digital Devices for Multimedia Applications

Author: Canavero Flavio
Maio Ivano Adolfo
Siviero Claudio
Stievano Igor Simone
Publication venue: IEEE
Publication date: 01/01/2008
Field of study

This paper addresses the development of accurate and efficient behavioral models of digital integrated circuits for the assessment of high-speed systems. Device models are based on suitable parametric expressions estimated from port transient responses and are effective at system level, where the quality of functional signals and the impact of supply noise need to be simulated. A potential limitation of some state-of-the-art modeling techniques resides in hidden instabilities manifesting themselves in the use of models, without being evident in the building phase of the same models. This contribution compares three recently-proposed model structures, and selects the local-linear state-space modeling technique as an optimal candidate for the signal integrity assessment of data links. In fact, this technique combines a simple verification of the local stability of models with a limited model size and an easy implementation in commercial simulation tools. An application of the proposed methodology to a real problem involving commercial devices and a data-link of a wireless device demonstrates the validity of this approac

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Near-Memory Address Translation

Author: Falsafi Babak
Jevdjic Djordje
Picorel Javier
Publication venue
Publication date: 21/08/2017
Field of study

Memory and logic integration on the same chip is becoming increasingly cost effective, creating the opportunity to offload data-intensive functionality to processing units placed inside memory chips. The introduction of memory-side processing units (MPUs) into conventional systems faces virtual memory as the first big showstopper: without efficient hardware support for address translation MPUs have highly limited applicability. Unfortunately, conventional translation mechanisms fall short of providing fast translations as contemporary memories exceed the reach of TLBs, making expensive page walks common. In this paper, we are the first to show that the historically important flexibility to map any virtual page to any page frame is unnecessary in today's servers. We find that while limiting the associativity of the virtual-to-physical mapping incurs no penalty, it can break the translate-then-fetch serialization if combined with careful data placement in the MPU's memory, allowing for translation and data fetch to proceed independently and in parallel. We propose the Distributed Inverted Page Table (DIPTA), a near-memory structure in which the smallest memory partition keeps the translation information for its data share, ensuring that the translation completes together with the data fetch. DIPTA completely eliminates the performance overhead of translation, achieving speedups of up to 3.81x and 2.13x over conventional translation using 4KB and 1GB pages respectively.Comment: 15 pages, 9 figure

arXiv.org e-Print Archive

XNOR Neural Engine: a Hardware Accelerator IP for 21.6 fJ/op Binary Neural Network Inference

Author: Benini Luca
Conti Francesco
Schiavone Pasquale Davide
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018
Field of study

Binary Neural Networks (BNNs) are promising to deliver accuracy comparable to conventional deep neural networks at a fraction of the cost in terms of memory and energy. In this paper, we introduce the XNOR Neural Engine (XNE), a fully digital configurable hardware accelerator IP for BNNs, integrated within a microcontroller unit (MCU) equipped with an autonomous I/O subsystem and hybrid SRAM / standard cell memory. The XNE is able to fully compute convolutional and dense layers in autonomy or in cooperation with the core in the MCU to realize more complex behaviors. We show post-synthesis results in 65nm and 22nm technology for the XNE IP and post-layout results in 22nm for the full MCU indicating that this system can drop the energy cost per binary operation to 21.6fJ per operation at 0.4V, and at the same time is flexible and performant enough to execute state-of-the-art BNN topologies such as ResNet-34 in less than 2.2mJ per frame at 8.9 fps.Comment: 11 pages, 8 figures, 2 tables, 3 listings. Accepted for presentation at CODES'18 and for publication in IEEE Transactions on Computer-Aided Design of Circuits and Systems (TCAD) as part of the ESWEEK-TCAD special issu

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna