1,785 research outputs found

Recommended from our members

Silicon compilation

Author: Dutt Nikil D.
Gajski Daniel D.
Pangrle Barry M.
Publication venue: eScholarship, University of California
Publication date: 01/01/1987

Silicon compilation is a term used for many different purposes. In this paper, we define silicon compilation as a mapping from some higher-level description into layout. We outline the basic issues in structural and behavioral silicon compilation and some possible solutions to those issues. Finally, we define the concept of an intelligent silicon compiler, in which the compiler evaluates the quality of the generated design and attempts to improve it if it is not satisfactory.

eScholarship - University of California

Timing models for high-level synthesis

Author: Chaiyakul Viraphol
Gajski Daniel D.
Wu Allen C.H.
Publication venue: eScholarship, University of California
Publication date: 31/10/1991

In this paper, we describe a timing model for clock estimation during high-level synthesis. To obtain realistic timing estimates, the proposed model considers all delay elements, including datapath, control, and wire delays, and several technology factors, such as layout architecture, technology mapping, buffer insertion, and loading effects. The experimental results show that this model can provide much better estimates than previous models. This model is well suited for automatic and interactive synthesis as well as feedback-driven synthesis, where performance metrics must be rapidly and incrementally calculated.

eScholarship - University of California

Baseband processor for IEEE 802.11a standard with embedded BIST

Author: Grass Eckhard
Jagdhold Ulrich
Krstic Milos
Maharatna Koushik
Troya Alfonso
Publication date: 01/01/2004

In this paper, results of an IEEE 802.11a-compliant low-power baseband processor implementation are presented. The detailed structure of the baseband processor and its constituent blocks is given. A design-for-testability strategy based on Built-In Self-Test (BIST) is proposed. Finally, implementation results and power estimates are reported.

Southampton (e-Prints Soton)

SLAM : an automated structure to layout synthesis system

Author: Gajski Daniel
Wu Allen C.H.
Publication venue: eScholarship, University of California
Publication date: 07/11/1989

SLAM is a structure-to-layout synthesis system. It incorporates parameterisable bit-sliced and glue-logic generators to produce high-density layout. In this paper, we describe a sliced layout architecture and the SLAM system. In addition, we present partitioning algorithms for generating the floorplan for such an architecture. The algorithms partition the netlist into component sets best suited for different layout styles, such as bit-sliced or strip-oriented logic. Each group is partitioned further into clusters to achieve better area utilization. Several experiments demonstrate that highly dense layouts can be achieved by using these algorithms with the sliced layout architecture.

eScholarship - University of California

A High performance and low cost hardware architecture for H.264 transform and quantization algorithms

Author: Hamzaoğlu İlker
Taşdizen Özgür
Publication date: 01/09/2005

In this paper, we present a high-performance and low-cost hardware architecture for real-time implementation of the forward transform and quantization and inverse transform and quantization algorithms used in the H.264 / MPEG-4 Part 10 video coding standard. The hardware architecture is based on a reconfigurable datapath with only one multiplier. This hardware is designed to be used as part of a complete low-power H.264 video coding system for portable applications. The proposed architecture is implemented in Verilog HDL. The Verilog RTL code is verified to work at 81 MHz in a Xilinx Virtex II FPGA and at 210 MHz in a 0.18 µm ASIC implementation. The FPGA and ASIC implementations can code 27 and 70 VGA frames (640x480) per second, respectively.
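As a quick consistency check on the figures quoted above, the two clock rates and frame rates imply the same cycle budget per frame. The sketch below is only arithmetic on the numbers in the abstract; the function names are hypothetical, and the per-macroblock figure assumes every clock cycle is spent on the frame and uses the standard 16x16 H.264 macroblock size.

```python
def cycles_per_frame(clock_hz: int, frames_per_second: int) -> int:
    """Clock cycles available per coded frame at a given clock and frame rate."""
    return clock_hz // frames_per_second


def macroblocks_per_frame(width: int, height: int, mb_size: int = 16) -> int:
    """Number of mb_size x mb_size macroblocks in a frame (dimensions assumed divisible)."""
    return (width // mb_size) * (height // mb_size)


# Figures taken from the abstract: 81 MHz -> 27 fps (FPGA), 210 MHz -> 70 fps (ASIC).
fpga_budget = cycles_per_frame(81_000_000, 27)
asic_budget = cycles_per_frame(210_000_000, 70)

# VGA frame, 16x16 macroblocks (assumption stated in the lead-in).
vga_macroblocks = macroblocks_per_frame(640, 480)
budget_per_macroblock = fpga_budget // vga_macroblocks

print(fpga_budget, asic_budget, vga_macroblocks, budget_per_macroblock)
```

Both implementations come out at the same number of cycles per VGA frame, which suggests the reported frame rates scale directly with clock frequency.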

CiteSeerX

Sabanci University Research Database

XNOR Neural Engine: a Hardware Accelerator IP for 21.6 fJ/op Binary Neural Network Inference

Author: Benini Luca
Conti Francesco
Schiavone Pasquale Davide
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018

Binary Neural Networks (BNNs) promise accuracy comparable to conventional deep neural networks at a fraction of the cost in terms of memory and energy. In this paper, we introduce the XNOR Neural Engine (XNE), a fully digital configurable hardware accelerator IP for BNNs, integrated within a microcontroller unit (MCU) equipped with an autonomous I/O subsystem and hybrid SRAM / standard cell memory. The XNE is able to fully compute convolutional and dense layers autonomously or in cooperation with the core in the MCU to realize more complex behaviors. We show post-synthesis results in 65nm and 22nm technology for the XNE IP and post-layout results in 22nm for the full MCU, indicating that this system can drop the energy cost per binary operation to 21.6fJ per operation at 0.4V, and at the same time is flexible and performant enough to execute state-of-the-art BNN topologies such as ResNet-34 in less than 2.2mJ per frame at 8.9 fps.

Comment: 11 pages, 8 figures, 2 tables, 3 listings. Accepted for presentation at CODES'18 and for publication in IEEE Transactions on Computer-Aided Design of Circuits and Systems (TCAD) as part of the ESWEEK-TCAD special issue
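The abstract does not spell out the arithmetic behind a "binary operation". A common formulation in the BNN literature, shown here as a minimal sketch and not as a description of the XNE datapath, replaces a dot product of +1/-1 vectors with an XNOR followed by a population count. The encoding (bit set means +1) and the helper names are assumptions made for illustration.

```python
def encode(values):
    """Pack a list of +1/-1 values into an int; bit i is set when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two n-element +1/-1 vectors given in packed-bit form.

    XNOR marks the positions where the two vectors agree; each agreement
    contributes +1 and each disagreement -1, so dot = 2 * agreements - n.
    """
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR, restricted to the n valid bits
    matches = bin(agree).count("1")     # population count
    return 2 * matches - n


a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
result = binary_dot(encode(a), encode(b), len(a))
reference = sum(x * y for x, y in zip(a, b))
print(result, reference)
```

The packed form is what makes the operation cheap in hardware: one XNOR gate per weight and an adder tree for the count, with no multipliers.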

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Desynchronization: Synthesis of asynchronous circuits from synchronous specifications

Author: Kondratyev Alex
Sotiriou Christos
Cortadella Jordi
Lavagno Luciano
Publication date: 01/01/2006

Asynchronous implementation techniques, which measure logic delays at run time and activate registers accordingly, are inherently more robust than their synchronous counterparts, which estimate worst-case delays at design time and constrain the clock cycle accordingly. De-synchronization is a new paradigm to automate the design of asynchronous circuits from synchronous specifications, thus permitting widespread adoption of asynchronicity without requiring special design skills or tools. In this paper, we first study different protocols for de-synchronization and formally prove their correctness, using techniques originally developed for distributed deployment of synchronous language specifications. We also provide a taxonomy of existing protocols for asynchronous latch controllers, covering in particular the four-phase handshake protocols devised in the literature for micro-pipelines. We then propose a new controller which exhibits provably maximal concurrency, and analyze the performance of desynchronized circuits with respect to the original synchronous optimized implementation. We finally demonstrate the feasibility and effectiveness of our approach by showing its application to a set of real designs, including a complete implementation of the DLX microprocessor architecture.

CiteSeerX

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino