
    On High-Performance Parallel Fixed-Point Decimal Multiplier Designs

    High-performance, area-efficient hardware implementation of decimal multiplication is preferred over slow software emulation in a number of key scientific and financial application areas, where the errors caused by converting decimal numbers into approximate binary representations are not acceptable. Multi-digit parallel decimal multipliers involve two major stages: (i) the partial product generation (PPG) stage, where decimal partial products are determined by selecting the appropriate pre-computed multiples of the multiplicand, followed by (ii) the partial product accumulation (PPA) stage, where all the partial products are shifted and then added together to obtain the final product. In this thesis, we propose a parallel architecture for fixed-point decimal multiplication based on the 8421-5421 BCD representation. In essence, we apply a hybrid 8421-5421 recoding scheme to simplify the computation logic of the PPG stage. In the subsequent PPA stage, the generated partial products are accumulated using 8421 carry-lookahead adders (CLAs) organized as a tree; this organization is a significant departure from the traditional carry-save adder (CSA) based approach, which suffers from the extra recoding logic and/or addition circuits it requires. In addition to the proposed 8421-5421-based decimal multiplier, we also propose a 4221-based decimal multiplier built upon a novel full adder for 4221 BCD codes; in this design, expensive 4221-to-8421 conversions are no longer needed, and as a result, the operands of the 4221 multiplier can be represented directly in 4221 BCD. The proposed 16x16 decimal multipliers are compared against the best-known existing decimal multiplier designs in terms of delay and delay-area product in TSMC 90 nm technology. The evaluation results confirm that the proposed 8421-5421 multiplier achieves the lowest delay and is the most time-area efficient design among all existing hardware-based BCD multipliers.
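    The two-stage structure described above (PPG followed by tree-based PPA) can be illustrated with a small behavioral model. This is a minimal Python sketch under stated assumptions, not the thesis's design: digits are plain integers rather than 8421, 5421, or 4221 bit patterns; the helper names (`bcd_add`, `generate_partial_products`, `accumulate_tree`, `decimal_multiply`) are hypothetical; all ten multiples of the multiplicand are precomputed by repeated addition, whereas the hybrid 8421-5421 recoding is aimed at simplifying exactly that selection logic; and a digit-serial decimal adder stands in for each 8421 CLA.

```python
from typing import List

Digits = List[int]  # little-endian decimal digits, each in 0..9

def to_digits(n: int, width: int) -> Digits:
    """Split a non-negative integer into `width` decimal digits, least significant first."""
    return [(n // 10 ** i) % 10 for i in range(width)]

def from_digits(d: Digits) -> int:
    return sum(x * 10 ** i for i, x in enumerate(d))

def bcd_add(a: Digits, b: Digits) -> Digits:
    """Decimal carry-propagate addition; a behavioral stand-in for one 8421 CLA."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    out: Digits = []
    carry = 0
    for x, y in zip(a, b):
        s = x + y + carry
        # 8421 hardware detects s > 9 and adds 6 to the 4-bit sum; here we subtract 10 directly.
        carry = 1 if s > 9 else 0
        out.append(s - 10 if carry else s)
    if carry:
        out.append(1)
    return out

def generate_partial_products(x: Digits, y: Digits) -> List[Digits]:
    """PPG (hypothetical model): precompute 0X..9X once, then select and shift one multiple per multiplier digit."""
    multiples: List[Digits] = [[0]]
    for _ in range(9):
        multiples.append(bcd_add(multiples[-1], x))
    return [[0] * i + multiples[d] for i, d in enumerate(y)]

def accumulate_tree(pps: List[Digits]) -> Digits:
    """PPA: add partial products pairwise, one tree level per loop iteration."""
    while len(pps) > 1:
        level = [bcd_add(pps[i], pps[i + 1]) for i in range(0, len(pps) - 1, 2)]
        if len(pps) % 2:  # an odd partial product passes through to the next level
            level.append(pps[-1])
        pps = level
    return pps[0]

def decimal_multiply(a: int, b: int, width: int = 16) -> int:
    x, y = to_digits(a, width), to_digits(b, width)
    return from_digits(accumulate_tree(generate_partial_products(x, y)))
```

    For example, `decimal_multiply(1234, 5678)` returns `7006652`. The pairwise loop in `accumulate_tree` mirrors a tree of adders: with 16 multiplier digits, the 16 partial products are reduced in four adder levels.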

    Analysis and implementation of decimal arithmetic hardware in nanometer CMOS technology

    Scope and Method of Study: In today's society, decimal arithmetic is growing considerably in importance given its relevance to financial and commercial applications. Performing decimal calculations on binary hardware significantly impacts performance, mainly because most systems use software to emulate decimal arithmetic. Dedicated decimal hardware, on the other hand, promises to improve performance by two or three orders of magnitude. The fundamental building blocks of binary arithmetic are studied and applied to the development of decimal arithmetic hardware. New findings are contrasted with existing implementations and validated through extensive simulation. Findings and Conclusions: New architectures and a significant study of decimal arithmetic were developed and implemented. The proposed architectures include a floating-point comparator compliant with the then-current IEEE 754 revision draft, a study of decimal division, partial product reduction schemes using decimal compressor trees, and a decimal multiplier using advanced partial product generation techniques. The results of each hardware implementation in nanometer technologies are weighed against existing proposals and show improvements in area, delay, and power.
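    To make the comparator's task concrete: in the decimal formats introduced by the IEEE 754 revision (standardized as IEEE 754-2008), a finite value is (-1)^s × c × 10^q, so one number can have several encodings with different exponents (a cohort), and a comparator must order values rather than raw fields. The following Python sketch is a behavioral reference only, assuming finite operands (no NaN or infinity handling); the `DecFloat` type and the function names are hypothetical, and hardware designs do not align coefficients by unbounded multiplication as this model does.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DecFloat:
    """Hypothetical model of a finite decimal value (-1)**sign * coeff * 10**exp."""
    sign: int   # 0 for positive, 1 for negative
    coeff: int  # non-negative integer coefficient (significand)
    exp: int    # power-of-ten exponent

def compare_magnitude(a: DecFloat, b: DecFloat) -> int:
    """Compare |a| and |b| by scaling both coefficients to the smaller exponent."""
    e = min(a.exp, b.exp)
    ca = a.coeff * 10 ** (a.exp - e)
    cb = b.coeff * 10 ** (b.exp - e)
    return (ca > cb) - (ca < cb)

def compare(a: DecFloat, b: DecFloat) -> int:
    """Return -1, 0 or 1 as a < b, a == b or a > b; +0 and -0 compare equal."""
    if a.coeff == 0 and b.coeff == 0:
        return 0
    if a.sign != b.sign:
        # When the operands are not both zero, the positively signed one is larger.
        return 1 if a.sign == 0 else -1
    m = compare_magnitude(a, b)
    return -m if a.sign else m  # a larger magnitude means a smaller value when negative
```

    For instance, `compare(DecFloat(0, 10, 0), DecFloat(0, 1, 1))` returns `0`, because 10 × 10^0 and 1 × 10^1 belong to the same cohort.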

    DESIGN OF ON-LINE DECIMAL MULTIPLIER

    Max Operation in Statistical Static Timing Analysis on the Non-Gaussian Variation Sources for VLSI Circuits

    As CMOS technology continues to scale down, process variation introduces significant uncertainty into the power and performance of VLSI circuits and significantly affects their reliability. If this uncertainty is not properly handled, it may become the bottleneck of CMOS technology improvement. Consequently, deterministic analysis is no longer conservative and may result in either overestimation or underestimation of circuit delay. Static timing analysis (STA) is a deterministic way of computing the delay imposed by a circuit's design and layout. It is based on a predetermined set of possible process-variation events, also called the corners of the circuit. Although STA is an excellent tool, current trends in process scaling have imposed significant difficulties on it. Therefore, another tool is needed to resolve these problems, and statistical static timing analysis (SSTA) has become a frontier research topic in recent years for combating such variation effects. There are two types of SSTA methods: path-based and block-based. The goal of SSTA is to parameterize the timing characteristics of the timing graph as a function of the underlying sources of process variation, which are modeled as random variables. By performing SSTA, designers can obtain the timing distribution (yield) and its sensitivity to various process parameters. Such information is of tremendous value both for timing sign-off and for design optimization toward robustness and high profit margins. Block-based SSTA has been the most efficient SSTA method in recent years. In block-based SSTA, there are two major atomic operations: max and add. The add operation is simple; the max operation, however, is much more complex. There are two main challenges in SSTA. The first is topological correlation, which emerges from reconvergent paths: paths that originate from a common node and converge again at another (reconvergent) node. Such correlation complicates the max operation. The second challenge is spatial correlation, which arises from device proximity on the die and complicates the modeling of delay and arrival time. This dissertation presents a statistical nonlinear, non-normal canonical form of the timing delay model that accounts for process variation. It focuses on four aspects: (1) statistical timing modeling and analysis; (2) high-level circuit synthesis with system-level statistical static timing analysis; (3) architectural implementations of the atomic operations (max and add); and (4) design methodology. For statistical timing modeling and analysis, we first present an efficient and accurate SSTA flow for a nonlinear cell delay model with non-Gaussian variation sources. To achieve system-level SSTA, we apply statistical timing analysis to the high-level synthesis flow and develop a yield-driven synthesis framework so that the impact of process variations is taken into account during high-level synthesis. For the architectural implementation, we present a vector thread architecture for the max operator that minimizes delay and variation. We then present a comparative analysis using the ISCAS benchmark circuit suites. In the last part of this dissertation, an SSTA design methodology is presented.
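    The add and max operations of block-based SSTA can be sketched for the widely used first-order Gaussian canonical form, with max approximated by Clark's moment-matching formulas. This is a swapped-in standard technique, not the dissertation's nonlinear, non-Gaussian model, which targets cases this approximation does not capture. In the sketch, covariance between two arrival times comes only from the shared global sensitivities, which is how the canonical form carries correlation through the timing graph. The `Canonical` type and the function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List

def _pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def _cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

@dataclass
class Canonical:
    """Hypothetical first-order canonical delay: mean + sum(sens[i] * X_i) + rand * R,
    where the X_i (shared global sources) and R (private) are independent N(0, 1)."""
    mean: float
    sens: List[float]
    rand: float

    def variance(self) -> float:
        return sum(s * s for s in self.sens) + self.rand ** 2

def stat_add(a: Canonical, b: Canonical) -> Canonical:
    # Means and shared sensitivities add; independent private parts combine in quadrature.
    return Canonical(a.mean + b.mean,
                     [x + y for x, y in zip(a.sens, b.sens)],
                     math.hypot(a.rand, b.rand))

def stat_max(a: Canonical, b: Canonical) -> Canonical:
    # Clark's approximation of max(A, B), projected back onto the canonical form.
    cov = sum(x * y for x, y in zip(a.sens, b.sens))  # only shared sources correlate
    theta2 = a.variance() + b.variance() - 2.0 * cov  # variance of A - B
    if theta2 <= 1e-18:
        # A - B is (nearly) deterministic, so the larger mean always dominates.
        return a if a.mean >= b.mean else b
    theta = math.sqrt(theta2)
    alpha = (a.mean - b.mean) / theta
    t = _cdf(alpha)  # tightness probability P(A > B)
    mean = a.mean * t + b.mean * (1.0 - t) + theta * _pdf(alpha)
    second = ((a.variance() + a.mean ** 2) * t
              + (b.variance() + b.mean ** 2) * (1.0 - t)
              + (a.mean + b.mean) * theta * _pdf(alpha))
    var = max(second - mean ** 2, 0.0)
    sens = [t * x + (1.0 - t) * y for x, y in zip(a.sens, b.sens)]
    # The private term absorbs whatever variance the blended sensitivities do not explain.
    rand = math.sqrt(max(var - sum(s * s for s in sens), 0.0))
    return Canonical(mean, sens, rand)
```

    For two independent unit-variance delays with mean 10, such as `Canonical(10.0, [1.0], 0.0)` and `Canonical(10.0, [0.0], 1.0)`, `stat_max` returns a mean of 10 + 1/√π ≈ 10.564, which matches the exact expected maximum of two i.i.d. Gaussians. When both operands depend on the same source with equal sensitivity, the difference is deterministic and the larger operand is returned unchanged.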