97 research outputs found

    A New VLSI Architecture of Parallel Multiplier–Accumulator Based on Radix-2 Modified Booth Algorithm

    In this paper, we propose a new multiplier-and-accumulator (MAC) architecture for high-speed arithmetic. Performance is improved by combining multiplication with accumulation and devising a hybrid type of carry save adder (CSA). Because the accumulator, which has the largest delay in the MAC, is merged into the CSA, overall performance is elevated. The proposed CSA tree uses a 1’s-complement-based radix-2 modified Booth’s algorithm (MBA) and a modified array for sign extension in order to increase the bit density of the operands. The CSA propagates the carries to the least significant bits of the partial products and generates the least significant bits in advance, which decreases the number of input bits of the final adder. In addition, the proposed MAC accumulates the intermediate results as sum and carry bits instead of as the output of the final adder, which makes it possible to optimize the pipeline scheme. The proposed architecture was synthesized with 250, 180, 130, and 90 nm standard CMOS libraries. Based on theoretical and experimental estimation, we analyzed results such as the amount of hardware resources, delay, and pipelining scheme. We used Sakurai’s alpha power law for the delay modeling. The proposed MAC showed properties superior to the standard design in many ways, and twice the performance of previous research at a similar clock frequency. We expect that the proposed MAC can be adapted to various fields requiring high performance, such as signal processing.
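As a rough illustration of the idea of accumulating intermediate results as sum and carry bits rather than as the output of a final adder, the following Python sketch models a carry-save accumulation. The function names and the 32-bit default width are assumptions made for illustration, and the product is computed with Python's `*` operator as a stand-in for a Booth-encoded partial-product tree, so this is a behavioural sketch and not the authors' architecture.

```python
def carry_save_add(s, c, x, width):
    """One 3:2 compression step: three words in, a sum word and a carry word out."""
    mask = (1 << width) - 1
    new_sum = (s ^ c ^ x) & mask                               # bitwise sum, no carry ripple
    new_carry = (((s & c) | (s & x) | (c & x)) << 1) & mask    # majority bits, shifted left
    return new_sum, new_carry


def mac_carry_save(pairs, width=32):
    """Accumulate a*b over all pairs, resolving carries only once at the end."""
    mask = (1 << width) - 1
    s, c = 0, 0
    for a, b in pairs:
        product = (a * b) & mask      # stand-in for the multiplier's partial-product tree
        s, c = carry_save_add(s, c, product, width)
    return (s + c) & mask             # the single carry-propagating "final adder"
```

For example, `mac_carry_save([(3, 4), (5, 6), (7, 8)])` returns `98`. The result is the two's-complement bit pattern modulo `2**width`; the carry-propagating addition happens once, after the loop, instead of once per accumulation step.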

    VLSI ARCHITECTURE OF PARALLEL MULTIPLIER–ACCUMULATOR BASED ON RADIX-2 MODIFIED BOOTH ALGORITHM

    A new multiplier-and-accumulator (MAC) architecture for high-speed arithmetic is presented. Performance is improved by combining multiplication with accumulation and devising a hybrid type of carry save adder (CSA). Because the accumulator, which has the largest delay in the MAC, is merged into the CSA, overall performance is elevated. The proposed CSA tree uses a 1’s-complement-based radix-2 modified Booth’s algorithm (MBA) and a modified array for sign extension in order to increase the bit density of the operands. The proposed MAC showed properties superior to the standard design in many ways, and twice the performance of previous research at a similar clock frequency. We expect that the proposed MAC can be adapted to various fields requiring high performance, such as signal processing.

    A Modified Architecture of Multiplier and Accumulator Using Spurious Power Suppression Technique

    A high-speed, low-power multiplier-and-accumulator (MAC) unit is a foremost requirement of today’s VLSI systems and digital signal processing (DSP) applications such as FFT, finite impulse response filters, and convolution. In this modified architecture, radix-4 modified Booth encoding (MBE) is used to produce the partial products. Multiplication and accumulation are combined using a hybrid type of carry save adder (CSA), which improves performance. A carry-lookahead adder is inserted in the CSA tree to reduce the number of bits in the final adder. In Booth multiplication, some portion of the data may be zero when two numbers are multiplied, and power is reduced by ignoring that portion. For this purpose, the Spurious Power Suppression Technique (SPST) is used to remove the useless portion of the data in the addition process. In this modified architecture, the overall process produces the result in three stages. The modified MAC operation is coded in Verilog and simulated using Xilinx 12.1.
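To make the partial-product step concrete, here is a minimal Python sketch of radix-4 modified Booth recoding in its textbook form. The function names are hypothetical, and the sketch covers only the recoding arithmetic, not the CSA tree or the SPST logic described in the abstract.

```python
def booth_radix4_digits(y, width):
    """Recode a width-bit two's-complement multiplier into radix-4 Booth digits.

    Each digit is in {-2, -1, 0, 1, 2}; digit k carries weight 4**k.
    """
    assert width % 2 == 0, "this sketch assumes an even bit width"
    y &= (1 << width) - 1
    digits = []
    prev = 0                                   # implicit bit y[-1] = 0
    for i in range(0, width, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)      # -2*y[i+1] + y[i] + y[i-1]
        prev = b1
    return digits


def booth_multiply(x, y, width):
    """Sum the width/2 partial products selected by the Booth digits."""
    return sum(d * x * 4 ** k for k, d in enumerate(booth_radix4_digits(y, width)))
```

For instance, `booth_radix4_digits(7, 4)` gives `[-1, 2]`, since 7 = -1 + 2·4. A 4-bit multiplier therefore needs two partial products instead of four, each of which is 0, ±x, or ±2x.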

    A high-performance inner-product processor for real and complex numbers.

    A novel, high-performance fixed-point inner-product processor based on a redundant binary number system is investigated in this dissertation. This scheme decreases the number of partial products to 50%, while achieving better speed and area performance, as well as providing pipeline extension opportunities. When modified Booth coding is used, partial products are reduced by almost 75%, thereby significantly reducing the multiplier addition depth. The design is applicable to digital signal and image processing applications that require real and/or complex-number inner-product arithmetic, such as digital filters, correlation and convolution. This design is well suited for VLSI implementation and can also be embedded as an inner-product core inside a general-purpose or DSP FPGA-based processor. Dynamic control of the computing structure permits different computations, such as a variety of real and complex-number inner-product computations, parallel multiplication for real and complex numbers, and real and complex-number division. The same structure can also be controlled to accept redundant binary number inputs for multiplication and inner-product computations. An improved 2's-complement to redundant binary converter is also presented.

    Improving the Hardware Performance of Arithmetic Circuits using Approximate Computing

    An application that can produce a useful result despite some level of computational error is said to be error resilient. Approximate computing can be applied to error resilient applications by intentionally introducing error to the computation in order to improve performance, and it has been shown that approximation is especially well-suited for application in arithmetic computing hardware. In this thesis, novel approximate arithmetic architectures are proposed for three different operations, namely multiplication, division, and the multiply accumulate (MAC) operation. For all designs, accuracy is evaluated in terms of mean relative error distance (MRED) and normalized mean error distance (NMED), while hardware performance is reported in terms of critical path delay, area, and power consumption. Three approximate Booth multipliers (ABM-M1, ABM-M2, ABM-M3) are designed in which two novel inexact partial product generators are used to reduce the dimensions of the partial product matrix. The proposed multipliers are compared to other state-of-the-art designs in terms of both accuracy and hardware performance, and are found to reduce power consumption by up to 56% when compared to the exact multiplier. The function of the multipliers is verified in several image processing applications. Two approximate restoring dividers (AXRD-M1, AXRD-M2) are proposed along with a novel inexact restoring divider cell. In the first divider, the conventional cells are replaced with the proposed inexact cells in several columns. The second divider computes only a subset of the trial subtractions, after which the divisor and partial remainder are rounded and encoded so that they may be used to estimate the remaining quotient bits. The proposed dividers are evaluated for accuracy and hardware performance alongside several benchmarking designs, and their function is verified using change detection and foreground extraction applications. 
An approximate MAC unit is presented in which the multiplication is implemented using a modified version of ABM-M3. The delay is reduced by using a fused architecture in which the accumulator value is summed as part of the multiplier compression. The accuracy and hardware savings of the MAC unit are measured against several works from the literature, and the design is utilized in a number of convolution operations.
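The two accuracy metrics named in this abstract can be illustrated with a short Python sketch that evaluates them exhaustively for a small unsigned multiplier. The definitions used here are the commonly cited ones (MRED as the mean of |approx − exact| / |exact| over non-zero exact results, NMED as the mean error distance divided by the maximum exact output) and are assumed, not taken from the thesis. The `approx_mul` function is a deliberately simple, hypothetical approximate multiplier and is not one of the ABM designs.

```python
from itertools import product


def approx_mul(a, b, drop=1):
    """Hypothetical approximate multiplier: clears the `drop` low bits of each operand."""
    keep = ~((1 << drop) - 1)
    return (a & keep) * (b & keep)


def error_metrics(approx, bits=4):
    """Return (MRED, NMED) of `approx` over all unsigned `bits`-wide operand pairs."""
    error_distances, relative_errors = [], []
    for a, b in product(range(1 << bits), repeat=2):
        exact = a * b
        ed = abs(approx(a, b) - exact)
        error_distances.append(ed)
        if exact != 0:                       # relative error is undefined for exact == 0
            relative_errors.append(ed / exact)
    max_output = ((1 << bits) - 1) ** 2      # largest exact product
    mred = sum(relative_errors) / len(relative_errors)
    nmed = (sum(error_distances) / len(error_distances)) / max_output
    return mred, nmed
```

Calling `error_metrics(approx_mul)` returns the pair of metrics for the toy design, and an exact multiplier passed as `lambda a, b: a * b` yields `(0.0, 0.0)`. Exhaustive enumeration is only practical for small widths; for wider operands a random sample of inputs would typically be used instead.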

    Serial-data computation in VLSI

    Design of a novel X-section architecture for FX-correlator in large interferometers : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Engineering at Massey University, Auckland, New Zealand

    Figures 2-12 and 2-17 are re-used under CC BY-NC 4.0 International & CC 3.0 Unported Licences respectively. Published journal papers I-III in the Appendices were removed because they are subject to copyright restrictions. In large radio-interferometers it is considerably challenging to perform signal correlations at input data-rates of over 11 Tbps, which involves vast amounts of storage, memory bandwidth and computational hardware. The primary objective of this research work is to reduce the memory-access and design complexity in matrix architectural Big Data processing of the complex X-section of an FX-correlator employed in large array radio-telescopes. This thesis presents a dedicated correlator-system-multiplier-and-accumulator (CoSMAC) cell architecture based on the real input samples from antenna arrays, which produces two 16-bit complex multiplications in the same clock cycle. The novel correlator cell optimization is achieved by utilizing the flipped mirror relationship between Discrete Fourier transform (DFT) samples owing to the symmetry and periodicity of the DFT coefficient vectors. The proposed CoSMAC structure is extended to build a new processing element (PE) which calculates both cross-correlation visibilities and auto-correlation functions simultaneously. Further, a novel mathematical model and a hardware design are derived to calculate two visibilities per baseline for quadrature signals (IQ sampled signals, where I is the in-phase signal and Q is the 90-degree phase-shifted signal), named Processing Element for IQ sampled signals (PE_IQ). These three proposed dedicated correlator cells minimise the number of visibility calculations in a baseline. The design methodology also targets the optimisation of the multiplier size in order to further reduce the power and area in the CoSMAC, PE and PE_IQ. 
Various fast and efficient multiplier algorithms are compared and combined to achieve a novel multiplier named Modified-Booth-Wallace-Multiplier, which is implemented in the CoSMAC and PE cells. The dedicated multiplier is designed mainly to target area and power optimisations without degrading performance. The conventional complex-multiplier-and-accumulators (CMACs) employed to perform the complex multiplications are replaced with these dedicated ASIC correlator cells, along with the optimized multipliers, to reduce the overall power and area requirements in a matrix correlator architecture. The proposed architecture lowers the number of ASIC processor cells required to calculate the overall baselines in an interferometer by eliminating the redundant cells. Hence the new matrix architectural minimization is very effective in reducing the hardware complexity by nearly 50% without affecting the overall speed and performance of very large interferometers like the Square Kilometre Array (SKA).
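The "flipped mirror relationship" mentioned in this abstract can be illustrated with the standard DFT property that, for real input samples, bin N−k is the complex conjugate of bin k. The Python sketch below checks that property and the corresponding mirror relation for a cross-correlation product. The function names are hypothetical, and how the CoSMAC cell actually exploits the symmetry in hardware is not described here, so this shows only the underlying mathematics.

```python
import cmath


def dft(x):
    """Direct O(N^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def rebuild_from_half(spectrum):
    """Rebuild a real signal's full spectrum from bins 0..N/2 via X[N-k] = conj(X[k])."""
    n = len(spectrum)
    half = spectrum[: n // 2 + 1]
    mirrored = [half[n - k].conjugate() for k in range(n // 2 + 1, n)]
    return list(half) + mirrored


def visibility(xa, xb, k):
    """Cross-correlation product of two antenna spectra at bin k."""
    return xa[k] * xb[k].conjugate()
```

With two real sample vectors `a` and `b`, `visibility(dft(a), dft(b), N - k)` equals the conjugate of `visibility(dft(a), dft(b), k)` up to rounding, so the products for the upper half of the bins carry no independent information.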

    Investigating the VLSI Characterization of Parallel Signed Multipliers for RNS Applications Using FPGAs

    Signed multiplication is a complex arithmetic operation, which is reflected in its relatively high signal propagation delay, high power dissipation, and large area requirement. The effective performance of high-reliability applications such as cryptography, Residue Number System (RNS) and digital signal processing (DSP) depends mainly on the performance of their arithmetic circuits. Using RNS instead of the constraining binary representation is a promising technique in VLSI systems, and the multiplier is the basic building block of such systems. In this paper, we analyze signed modified Baugh-Wooley multiplier and modified Booth encoding (MBE) multiplier logic and synthesize both on the best-suited application platform. The analysis covers delay, the number of logic elements required, the number of signal transitions for a particular sample input, and power consumption for both multipliers. The multipliers are described in Verilog HDL and simulated using two different simulators, namely Xilinx ISIM and Altera Quartus II. For a comparative study, both multipliers are then synthesized with Xilinx Virtex 7 XCV2000T-2FLG1925 and Altera Cyclone II EP2C35F672C6, and the same parameters are evaluated. Booth recoding provides an overall advantage of 9.691% in area and approximately 43% in delay compared with the modified Baugh-Wooley multiplier implemented using FPGA technology.

    A Study on Efficient Designs of Approximate Arithmetic Circuits

    Approximate computing is a popular field in which accuracy is traded for energy. It can benefit applications such as multimedia, mobile computing and machine learning, which are inherently error resilient. Error introduced in these applications is, to a certain degree, beyond human perception. This flexibility can be exploited to design area-, delay- and power-efficient architectures. However, care must be taken over how approximation compromises the correctness of results. This research work aims to provide approximate hardware architectures, with their error metrics and design metrics analyzed, and to study their effects in image processing applications. First, we study and propose unsigned array multipliers based on probability statistics and with approximate 4-2 compressors, full adders and half adders. This work deals with a new design approach for approximation of multipliers. The partial products of the multiplier are altered to introduce varying probability terms. Logic complexity of approximation is varied for the accumulation of altered partial products based on their probability. The proposed approximation is utilized in two variants of 16-bit multipliers. Synthesis results reveal that the two proposed multipliers achieve power savings of 72% and 38% respectively compared to an exact multiplier. They have better precision when compared to existing approximate multipliers. Mean relative error distance (MRED) figures are as low as 7.6% and 0.02% for the proposed approximate multipliers, which are better than the previous state-of-the-art works. Performance of the proposed multipliers is evaluated with a geometric mean filtering application, where one of the proposed models achieves the highest peak signal to noise ratio (PSNR). Second, approximation is proposed for signed Booth multiplication. Approximation is introduced in partial product generation and partial product accumulation circuits. 
In this work, three multipliers (ABM-M1, ABM-M2, and ABM-M3) are proposed in which the modified Booth algorithm is approximated. In all three designs, approximate Booth partial product generators are designed with different variations of approximation. The approximations are performed by reducing the logic complexity of the Booth partial product generator, and the accumulation of partial products is slightly modified to improve circuit performance. Compared to the exact Booth multiplier, ABM-M1 achieves up to 15% reduction in power consumption with an MRED value of 7.9 × 10⁻⁴. ABM-M2 has power savings of up to 60% with an MRED of 1.1 × 10⁻¹. ABM-M3 has power savings of up to 50% with an MRED of 3.4 × 10⁻³. Compared to existing approximate Booth multipliers, the proposed multipliers ABM-M1 and ABM-M3 achieve up to a 41% reduction in power consumption while exhibiting very similar error metrics. Image multiplication and matrix multiplication are used as case studies to illustrate the high performance of the proposed approximate multipliers. Third, distributed arithmetic based sum of products units approximation is analyzed. Sum of products units are key elements in many digital signal processing applications. Three approximate sum of products models which are based on distributed arithmetic are proposed. They are designed for different levels of accuracy. First model of approximate sum of products achieves an improvement up to 64% on area and 70% on power, when compared to conventional unit. Other two models provide an improvement of 32% and 48% on area and 54% and 58% on power, respectively, with a reduced error rate compared to the first model. Third model achieves MRED and normalized mean error distance (NMED) as low as 0.05% and 0.009%. Performance of approximate units is evaluated with a noisy image smoothing application, where the proposed models are capable of achieving higher PSNR than existing state of the art techniques. 
Fourth, approximation is applied to the division architecture. Two approximation models are proposed for the restoring divider. In the first design, approximation is performed at circuit level, where approximate divider cells are utilized in place of exact ones by simplifying the logic equations. In the second model, the restoring divider is analyzed strategically and the number of restoring divider cells is reduced by finding the portions of the divisor and dividend with significant information. An approximation factor p is used in both designs. In model 1, the design with p=8 has a 58% reduction in both area and power consumption compared to the exact design, with a Q-MRED of 1.909 × 10⁻² and a Q-NMED of 0.449 × 10⁻². The second model with an approximation factor p=4 has 54% area savings and 62% power savings compared to the exact design. The proposed models are found to have better error metrics compared to existing designs, with better performance at similar error values. A change detection image processing application is used for real-time assessment of the proposed and existing approximate dividers, and one of the models achieves a PSNR of 54.27 dB.
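For readers unfamiliar with the baseline that these approximate designs modify, the following Python sketch shows exact unsigned restoring division, in which each iteration performs one trial subtraction and restores the remainder when the trial is negative. The function name and interface are assumptions for illustration; the approximate cells and the factor p from the abstract are not modelled.

```python
def restoring_divide(dividend, divisor, bits):
    """Exact unsigned restoring division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    remainder, quotient = 0, 0
    for i in range(bits - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        trial = remainder - divisor          # trial subtraction
        if trial >= 0:
            remainder = trial                # keep the result, quotient bit = 1
            quotient |= 1 << i
        # Otherwise the old remainder is kept ("restored"), quotient bit = 0.
    return quotient, remainder
```

As a usage example, `restoring_divide(100, 7, 8)` returns `(14, 2)`. Each loop iteration corresponds to one row of divider cells in an array implementation, which is where an approximate design could simplify or skip hardware.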