487 research outputs found

    Pipelined Two-Operand Modular Adders

    Get PDF
    Pipelined two-operand modular adder (TOMA) is one of basic components used in digital signal processing (DSP) systems that use the residue number system (RNS). Such modular adders are used in binary/residue and residue/binary converters, residue multipliers and scalers as well as within residue processing channels. The design of pipelined TOMAs is usually obtained by inserting an appriopriate number of latch layers inside a nonpipelined TOMA structure. Hence their area is also determined by the number of latches and the delay by the number of latch layers. In this paper we propose a new pipelined TOMA that is based on a new TOMA, that has the smaller area and smaller delay than other known structures. Comparisons are made using data from the very large scale of integration (VLSI) standard cell library

    Pipelining Saturated Accumulation

    Get PDF
    Aggressive pipelining and spatial parallelism allow integrated circuits (e.g., custom VLSI, ASICs, and FPGAs) to achieve high throughput on many Digital Signal Processing applications. However, cyclic data dependencies in the computation can limit parallelism and reduce the efficiency and speed of an implementation. Saturated accumulation is an important example where such a cycle limits the throughput of signal processing applications. We show how to reformulate saturated addition as an associative operation so that we can use a parallel-prefix calculation to perform saturated accumulation at any data rate supported by the device. This allows us, for example, to design a 16-bit saturated accumulator which can operate at 280 MHz on a Xilinx Spartan-3(XC3S-5000-4) FPGA, the maximum frequency supported by the component's DCM

    Design and Analysis of Low Power Hybrid Braun Multiplier using Ladner Fischer Adder

    Get PDF
    Multiplier is important in many DSP systems and in many hardware blocks. Multiplier are used in various DSP application like digital filtering, digital communication. This needs parallel array multiplier to attain high speed for execution and better performance. A specific array multiplier is implemented known as Braun design. Braun multiplier is the one which is a kind of parallel multiplier. It contains different CSA count of AND gates. Braun multiplier employing Ripple Carry Adder is developed here having high speed PPA. It will reduce the delay and implemented using Tanner EDA tool

    ANALYZING THE PERFORMANCE OF CARRY TREE ADDERS BASED ON FPGA’S

    Get PDF
    In this paper carry tree adders are known to have the best performance in VLSI designs. However, this performance advantage does not translate directly into FPGA implementations due to constraints on logic block configurations and routing overhead. This paper investigates three types of carry-tree adders (the Kogge-Stone, sparse Kogge-Stone, and spanning tree adder) and compares them to the simple Ripple Carry Adder (RCA) and Carry Skip Adder (CSA). These designs of varied bit-widths were implemented on a Xilinx Spartan 3E FPGA and delay measurements were made with a high-performance logic analyzer. Due to the presence of a fast carry-chain, the RCA designs exhibit better delay performance up to 128 bits. The carry-tree adders are expected to have a speed advantage over the RCA as bit widths approach 256

    Efficient modular arithmetic units for low power cryptographic applications

    Get PDF
    The demand for high security in energy constrained devices such as mobiles and PDAs is growing rapidly. This leads to the need for efficient design of cryptographic algorithms which offer data integrity, authentication, non-repudiation and confidentiality of the encrypted data and communication channels. The public key cryptography is an ideal choice for data integrity, authentication and non-repudiation whereas the private key cryptography ensures the confidentiality of the data transmitted. The latter has an extremely high encryption speed but it has certain limitations which make it unsuitable for use in certain applications. Numerous public key cryptographic algorithms are available in the literature which comprise modular arithmetic modules such as modular addition, multiplication, inversion and exponentiation. Recently, numerous cryptographic algorithms have been proposed based on modular arithmetic which are scalable, do word based operations and efficient in various aspects. The modular arithmetic modules play a crucial role in the overall performance of the cryptographic processor. Hence, better results can be obtained by designing efficient arithmetic modules such as modular addition, multiplication, exponentiation and squaring. This thesis is organized into three papers, describes the efficient implementation of modular arithmetic units, application of these modules in International Data Encryption Algorithm (IDEA). Second paper describes the IDEA algorithm implementation using the existing techniques and using the proposed efficient modular units. The third paper describes the fault tolerant design of a modular unit which has online self-checking capability --Abstract, page iv

    Internet of Things Based Reconfigurable SIMD Processor for High-Speed End Devices in FPGA

    Get PDF
    This research article proposed the reconfigurable Single Instruction Multi Data (SIMD) processor design to speed up the accelerated computing task in IoT operations. Single Instruction Multi Data models leverage the parallel real source to speed up computing accelerated tasks. It proposes the utilization of reconfigurable Kogge Stone-dependent hybrid adder structures, now referred to as KS-CPA, in which reconfiguration occurs during the addition operation. The Least Significant Bits (LSB) are processed using a carry propagate adder, while the Most Significant Bits (MSB) are computed using the Kogge Stone adder. Depending on the data width and device-accessible energy resources, the hybrid configuration of the adder offers the 4-bit, 8-bit, and 16-bit addition. The adder form is identified by a shift in the configuration of its Carry Look-ahead and then by a Kogge Stone Adder (KSA). Throughout the activity, the KS-CLA crossbreed configuration is used to attain the fastest speed and low energy usage. The effectiveness, including its proposed hybrid adder, is evaluated by looking at the speed, energy, and area parameters, including a suitable area use during rapid applications in which both less delay and low power adders are required. Considering these, we are structuring an IoT processor that can be reconfigured to gain from SIMD. We have demonstrated that our hybrid adder-enhanced processor saves energy up to 13% and reduces 27% latency. The proposed 16 and 32-bit adders will boost time, power, and Area Delay Product (ADP) by almost 18-24% and 13-19% respectively

    A Novel VLSI Design On CSKA Of Binary Tree Adder With Compaq Area And High Throughput

    Get PDF
    Addition is one of the most basic operations performed in all computing units, including microprocessors and digital signal processors. It is also a basic unit utilized in various complicated algorithms of multiplication and division. Efficient implementation of an adder circuit usually revolves around reducing the cost to propagate the carry between successive bit positions. Multi-operand adders are important arithmetic design blocks especially in the addition of partial products of hardware multipliers. The multi-operand adders (MOAs) are widely used in the modern low-power and high-speed portable very-large-scale integration systems for image and signal processing applications such as digital filters, transforms, convolution neural network architecture. Hence, a new high-speed and area efficient adder architecture is proposed using pre-compute bitwise addition followed by carry prefix computation logic to perform the three-operand binary addition that consumes substantially less area, low power and drastically reduces the adder delay. Further, this project is enhanced by using Modified carry bypass adder to further reduce more density and latency constraints. Modified carry skip adder introduces simple and low complex carry skip logic to reduce parameters constraints. In this proposal work, designed binary tree adder (BTA) is analyzed to find the possibilities for area minimization. Based on the analysis, critical path of carry is taken into the new logic implementation and the corresponding design of CSKP are proposed for the BTA with AOI, OAI

    Review of high-speed phase accumulator for direct digital frequency synthesizer

    Get PDF
    A review of high-speed pipelined phase accumulator (PA) is proposed in this paper. The detail explanation of ideas, methods and techniques used in previous researches to improve the PA throughput designs were surveyed. The Brent–Kung (BK) adder was modified in this paper to be applied in pipelined PA architecture. A comparison of different adder circuits, includes a modified BK, ripple carry adder (RCA), Kogge-Stone adder (KS) and other prefix adders were applied to architect the PA based on Pipeline technique. The presented pipelined PA design circuit with multiple frequency control word (FCW) and different adders were coded Verilog hardware description language (HDL) code, compiled and verified with field programmable gate array (FPGA) kit platform. The comparison result shows that the modified BK adder has fast performances. The shifted clocking technique is utilized in the proposed pipelined PA circuit to reduce the unwanted repetitive D-flip flop (DFF) registers (coming from the pipeline technique), while preserving the high speed

    High-Performance Design of a 4-Bit Carry Look-Ahead Adder in Static CMOS Logic

    Get PDF
    Design of a 4-bit Carry Look-Ahead (CLA) process in static CMOS logic has been presented. CLA architecture proposed in this work computes carry-out terms without using carry-propagate and carry-generate signals which are used in conventional static CMOS (C-CMOS) 4-bit CLA adder. Performance parameters of the proposed 4-bit CLA architecture have been simulated and validated by comparing with the conventional design using Cadence design toolset in 45 nm technology. The designs were compared in terms of average power consumption, propagation delay and power delay product (PDP). The proposed 4-bit CLA topology obtained 34.53 % improvement in speed, 4.84 % improvement in power consumption and 37.696 % improvement in PDP while the source voltage was 1.0 V. Hence, as per acquired simulation results, the proposed 4-bit CLA structure is proven to be an excellent alternative to the conventional design for data-path design in modern high-performance processors.Design of a 4-bit Carry Look-Ahead (CLA) process in static CMOS logic has been presented. CLA architecture proposed in this work computes carry-out terms without using carry-propagate and carry-generate signals which are used in conventional static CMOS (C-CMOS) 4-bit CLA adder. Performance parameters of the proposed 4-bit CLA architecture has been simulated and validated by comparing with the conventional design using Cadence design toolset in 45 nm technology. The designs were compared in terms of average power consumption, propagation delay and power delay product (PDP). The proposed 4-bit CLA topology obtained 26.67 % improvement in speed, 5.966 % improvement in power consumption and 31.06 % improvement in PDP while the source voltage was 1.0 V. Hence, as per acquired simulation results, the proposed 4-bit CLA structure is proven to be an excellent alternative to the conventional design for data-path design in modern high-performance processors

    Design of ALU and Cache Memory for an 8 bit ALU

    Get PDF
    The design of an ALU and a Cache memory for use in a high performance processor was examined in this thesis. Advanced architectures employing increased parallelism were analyzed to minimize the number of execution cycles needed for 8 bit integer arithmetic operations. In addition to the arithmetic unit, an optimized SRAM memory cell was designed to be used as cache memory and as fast Look Up Table. The ALU consists of stand alone units for bit parallel computation of basic integer arithmetic operations. Addition and subtraction were performed using Kogge Stone parallel prefix hardware operating at 330MHz. A high performance multiplier was built using Radix 4 Modified Booth Encoder (MBE) and a Wallace Tree summation array. The multiplier requires single clock cycle for 8 bit integer multiplication and operates at a maximum frequency of 100MHz. Multiplicative division hardware was built for executing both integer division and square root. The division hardware computes 8-bit division and square root in 4 clock cycles. Multiplier forms the basic building block of all these functional units, making high level of resource sharing feasible with this architecture. The optimal operating frequency for the arithmetic unit is 70MHz. A 6T CMOS SRAM cell measuring 90 µm2 was designed using minimum size transistors. The layout allows for horizontal overlap resulting in effective area of 76 µm2 for an 8x8 array. By substituting equivalent bit line capacitance of P4 L1 Cache, the memory was simulated to have a read time of 3.27ns. An optimized set of test vectors were identified to enable high fault coverage without the need for any additional test circuitry. Sixteen test cases were identified that would toggle all the nodes and provide all possible inputs to the sub units of the multiplier. A correlation based semi automatic method was investigated to facilitate test case identification for large multipliers. This method of testability eliminates performance and area overhead associated with conventional testability hardware. Bottom up design methodology was employed for the design. The performance and area metrics are presented along with estimated power consumption. A set of Monte Carlo analysis was carried out to ensure the dependability of the design under process variations as well as fluctuations in operating conditions. The arithmetic unit was found to require a total die area of 2mm2 (approx.) in 0.35 micron process
    • …
    corecore