42 research outputs found

    Self-timed design in GaAs - case study of a high-speed parallel multiplier

    Journal Article. Abstract: The problems with synchronous designs at high clock frequencies have been well documented. This makes an asynchronous approach attractive for high-speed technologies like GaAs. We investigate the issues involved by describing the design of a parallel multiplier that can be part of a floating-point multiplier. We first present a new architecture called the partial array of arrays (PAA) that is more regular than a partial tree approach while having the same latency. We then show how this architecture can be used in a self-timed implementation in the style of micropipelines. We next describe how we can design the final carry-propagate adder using a new precharged logic family in GaAs that we developed as part of this project. We conclude with some general observations on doing asynchronous design in GaAs.

    Serial-data computation in VLSI


    DESIGN OF THE OPTIMIZED FUSED-ADD MULTIPLY (FAM) OPERATOR BY USING MODIFIED BOOTH

    Complex arithmetic operations are widely used in Digital Signal Processing (DSP) applications. In this work, we focus on optimizing the design of the fused Add-Multiply (FAM) operator to increase performance. We investigate techniques to implement the direct recoding of the sum of two numbers in its Modified Booth (MB) form. We introduce a structured and efficient recoding technique and explore three different schemes by incorporating them in FAM designs. Compared with FAM designs that use existing recoding schemes, the proposed technique yields considerable reductions in the critical delay and hardware complexity of the FAM unit. The FAM architecture is implemented in the Verilog Hardware Description Language and synthesized with the Xilinx ISE tool. In the proposed design, we focus on AM units which implement the operation Z = X·(A + B). The conventional design of the AM operator requires that its inputs A and B are first driven to an adder and then the input X and the sum Y = A + B are driven to a multiplier in order to get Z. The drawback of using an adder is that it inserts a significant delay in the critical path of the AM. As there are carry signals to be propagated inside the adder, the critical path depends on the bit-width of the inputs. In order to decrease this delay, an SPST adder can be used, which, however, increases the area occupation and the power dissipation. An optimized design of the AM operator is based on the fusion of the adder and the MB encoding unit into a single data path block by direct recoding of the sum to its MB representation. The fused Add-Multiply (FAM) component contains only one adder at the end (the final adder of the parallel multiplier). As a result, significant area savings are observed and the critical path delay of the recoding process is reduced and decoupled from the bit-width of its inputs. In this work, we present a new technique for direct recoding of two numbers in the MB representation of their sum.
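    As a rough illustration of the conventional (non-fused) add-multiply path described in this abstract, the sketch below first forms the sum with an ordinary addition and only then recodes it into radix-4 Modified Booth digits. It is not the fused recoder the abstract proposes; the function names `mb_digits` and `add_multiply` and the `n_bits` parameter are hypothetical and chosen only for this example.

```python
def mb_digits(y: int, n_bits: int) -> list[int]:
    """Radix-4 Modified Booth digits (each in {-2..2}) of a two's-complement
    value y that fits in n_bits, least significant digit first."""
    if n_bits % 2:
        n_bits += 1  # sign-extend to an even width (assumption of this sketch)
    bits = y & ((1 << n_bits) - 1)  # two's-complement bit pattern of y
    digits = []
    prev = 0  # the implicit bit y_{-1} = 0
    for j in range(n_bits // 2):
        b0 = (bits >> (2 * j)) & 1
        b1 = (bits >> (2 * j + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def add_multiply(x: int, a: int, b: int, n_bits: int) -> int:
    """Conventional AM path: add first, then recode the sum and accumulate
    the partial products x * digit * 4**j."""
    y = a + b  # this addition is the carry-propagating step on the critical path
    digits = mb_digits(y, n_bits + 1)  # the sum may need one extra bit
    return sum(d * x * 4 ** j for j, d in enumerate(digits))


print(mb_digits(2, 4))
print(add_multiply(7, 5, -3, 8))
```

    Because the recoding here starts only after `a + b` has been fully computed, the sketch makes the dependency on the adder explicit; the fused approach in the abstract aims to remove exactly that dependency.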

    HIGH-LEVEL OPTIMIZATION TECHNIQUES FOR LOW-POWER MODIFIED BOOTH MULTIPLIER DESIGN OF FPGA


    Increasing rendering performance of graphics hardware

    Graphics Processing Unit (GPU) performance is increasing faster than central processing unit (CPU) performance. This growth is driven by performance improvements that can be divided into the following three categories: algorithmic improvements, architectural improvements, and circuit-level improvements. In this dissertation I present techniques that improve the rendering performance of graphics hardware, measured in speed, power consumption, or image quality, in each of these three areas. At the algorithmic level, I introduce a method for using graphics hardware to rapidly and efficiently generate summed-area tables, which are data structures that hold pre-computed two-dimensional integrals of subsets of a given image, and present several novel rendering techniques that take advantage of summed-area tables to produce dynamic, high-quality images at interactive frame rates. These techniques improve the visual quality of images rendered on current commodity GPUs without requiring modifications to the underlying hardware or architecture. At the architectural level, I propose modifications to the architecture of current GPUs that add conditional streaming capabilities. I describe a novel GPU-based ray-tracing algorithm that takes advantage of conditional output streams to reduce the memory bandwidth requirements by over an order of magnitude compared to previous techniques. At the circuit level, I propose a compute-on-demand paradigm for the design of high-speed and energy-efficient graphics components. The goal of the compute-on-demand paradigm is to perform computation at the bit level only when needed. The compute-on-demand paradigm exploits the data-dependent nature of computation, and thereby obtains speed and energy improvements by optimizing designs for the common case. This approach is illustrated with the design of a high-speed Z-comparator that is implemented using asynchronous logic.
Asynchronous or "clockless" circuits were chosen for my implementations since they allow for data-dependent completion times and reduced power consumption by disabling inactive components. The resulting circuit-level implementation runs over 1.5 times faster while dissipating only 25% of the energy of a comparable synchronous comparator for the average case. Also at the circuit level, I introduce a novel implementation of counterflow pipelining, which allows two streams of data to flow in opposite directions within the same pipeline without the need for complex arbitration. The advantages of this implementation are demonstrated by the design of a high-speed asynchronous Booth multiplier. While both the comparator and the multiplier are useful components of a graphics pipeline, the objective of this work was to propose the new design paradigm as a promising alternative to current graphics hardware design practices.
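    The summed-area tables mentioned in this abstract can be illustrated with a small CPU-side sketch. This is not the GPU generation method from the dissertation; it only shows what the data structure stores and why any axis-aligned box sum then costs four lookups. The names `summed_area_table` and `box_sum` are hypothetical.

```python
def summed_area_table(img: list[list[float]]) -> list[list[float]]:
    """Return a (h+1) x (w+1) table where sat[r][c] is the sum of all
    img values in rows 0..r-1 and columns 0..c-1."""
    h, w = len(img), len(img[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        for c in range(w):
            # inclusion-exclusion over the three already-computed neighbours
            sat[r + 1][c + 1] = img[r][c] + sat[r][c + 1] + sat[r + 1][c] - sat[r][c]
    return sat


def box_sum(sat: list[list[float]], r0: int, c0: int, r1: int, c1: int) -> float:
    """Sum of the image over rows r0..r1-1 and columns c0..c1-1,
    using four table lookups regardless of the box size."""
    return sat[r1][c1] - sat[r0][c1] - sat[r1][c0] + sat[r0][c0]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
table = summed_area_table(image)
print(box_sum(table, 1, 1, 3, 3))
```

    The extra row and column of zeros is a design choice of this sketch that avoids special-casing boxes touching the image border.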

    The Fifth NASA Symposium on VLSI Design

    The fifth annual NASA Symposium on VLSI Design had 13 sessions including Radiation Effects, Architectures, Mixed Signal, Design Techniques, Fault Testing, Synthesis, Signal Processing, and other Featured Presentations. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The presentations share insights into next generation advances that will serve as a basis for future VLSI design

    VLSI architectures for public key cryptology


    VLSI architectures for high speed Fourier transform processing


    Microprocessor energy characterization and optimization through fast, accurate, and flexible simulation

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2001. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 99-102). Energy dissipation is emerging as a key constraint for both high-performance and embedded microprocessor designs, requiring computer architects to consider energy in addition to performance when evaluating design decisions. A major limitation is the general difficulty in analyzing the energy impact of architectural and microarchitectural features without constructing detailed implementations and running slow simulations. This thesis first describes the design of a fast, accurate, and flexible circuit simulation tool which enables transition-sensitive studies of microprocessor energy consumption that would otherwise be impossible or impractical. With a simulation infrastructure in place, various optimizations are implemented that target the entire datapath and cache energy consumption. The individual energy optimizations are analyzed in detail, and the microprocessor design is characterized using various energy breakdowns and studies of the bit correlation between data values. This work shows that a few relatively simple energy-saving techniques can have a large impact in the implementation of an energy-efficient microprocessor. By fully characterizing the energy usage, this thesis establishes a coherent vision of microprocessor energy consumption, and serves as a basis and motivation for further energy optimizations. By Ronny Krashinsky. S.M.