6 research outputs found

    A Multi-Format Floating-Point Multiplier for Power-Efficient Operations

    Get PDF

    Improved 64-bit Radix-16 Booth Multiplier Based on Partial Product Array Height Reduction

    Get PDF
    In this paper we describe an optimization for binary radix-16 (modified) Booth recoded multipliers to reduce the maximum height of the partial product columns to ceil(n/4) for n = 64-bit unsigned operands. This is in contrast to the conventional maximum height of ceil((n + 1)/4). Therefore a reduction of one unit in the maximum height is achieved. This reduction may add flexibility during the design of the pipelined multiplier to meet the design goals, it may allow further optimizations of the partial product array reduction stage in terms of area/delay/power and/or may allow additional addends to be included in the partial product array without increasing the delay. The method can be extended to Booth recoded radix-8 multipliers, signed multi- pliers, combined signed/unsigned multipliers, and other values of n

    Exploring abstract interfaces in system-on-chip integration

    Get PDF
    Modern mobile devices are marvels of computation. They can encode high defnition video, processing and compressing over 350MB/s of image data in real time. They have no trouble driving displays with as much resolution as a full laptop, and smartphone manufacturers boast of running games with console quality graphics. Mobile devices pack all of this computational power into a 12\ handheld package by integrating a number of specialized hardware accelerators (IP) along with conventional CPU and GPUs in a system on chip (SoC). Unfortunately, creating these specialized systems is becoming increasingly expensive. Since hardware accelerators come from a number of different sources and design cycles, different accelerator blocks will often contain incompatible hardware interfaces. Therefore, a large portion of SoC design cost comes in the form of designers manually interfacing each accelerator into a system. This work includes everything from building custom logic to wire up a block, to developing the drivers and API needed to take advantage of the hardware. My research focuses on generating these interfaces, including the physical hardware used to tie IP blocks into a system and the associated software collateral. Leveraging recent trends such as High Level Synthesis and other hardware generator methodologies, I propose an IP interface abstraction and parameterization designed to describe the interface of most current IP blocks. By encoding this knowledge at a higher level of abstraction, I am able to construct and demonstrate a hardware generator that maps an interface protocol description into synthesizable register transfer language (RTL), and that can automatically create hardware bridges between different interconnect standards. iv To ease the integration of the next generation of IP blocks-blocks that are automatically generated based of of user specification. I propose a set of interface primitives. \hen integrated into an IP generator, these primitives can automatically generate an interface that my interface system can tie to the rest of the system. I also demonstrate how the information stored in these types of primitives can be used to automatically generate a low level software driver that manages access to the IP blocks. Finally, I show how the simulation environment provided with an IP generator can be used to provide a domain appropriate application programming interface (API) to drive the software. Using an image signal processor generator as my platform, I demonstrate the construction of a map between the simulation software and hardware driver that enables a full one-button flow from algorithm development to applications running on specialized hardware within a working system

    FPU designs with NEM relays

    Get PDF
    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013.This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.Includes bibliographical references (pages 71-74).Nano-electromechanical (NEM) relays are an alternative to CMOS transistors as the fabric of digital circuits. Circuits with NEM relays offer energy-efficiency benefits over CMOS since they have zero leakage power and are strategically designed to maintain throughput that is competitive with CMOS despite their slow actuation times. The floating-point unit (FPU) is the most complex arithmetic unit in a computational system. This thesis investigates if the energy-efficiency promise of NEM relays demonstrated before on smaller circuit blocks holds for complex computational structures such as the FPU. The energy, performance, and area trade-offs of FPU designs with NEM relays are examined and compared with that of state-of-the-art CMOS designs in an equivalent scaled process. Circuits that are critical path bottlenecks, including primarily the leading zero detector (LZD) and leading zero anticipator (LZA) blocks, are carefully identified and optimized for low latency and device count. We manage to drop the NEM relay FPU latency from 71 mechanical delays in a CMOS-style implementation to 16 mechanical delays in a NEM relay pass-logic style implementation. The FPU designed with NEM relays features 15x lower energy per operation compared to CMOS.by Sumit Dutta.S.M
    corecore