979 research outputs found

    Design of Synchronous Section-Carry Based Carry Lookahead Adders with Improved Figure of Merit

    Full text link
    The section-carry based carry lookahead adder (SCBCLA) architecture was proposed as an efficient alternative to the conventional carry lookahead adder (CCLA) architecture for the physical implementation of computer arithmetic. In previous related works, self-timed SCBCLA architectures and synchronous SCBCLA architectures were realized using standard cells and FPGAs. In this work, we deal with improved realizations of synchronous SCBCLA architectures designed in a semi-custom fashion using standard cells. The improvement is quantified in terms of a figure of merit (FOM), where the FOM is defined as the inverse product of power, delay and area. Since power, delay and area of digital designs are desirable to be minimized, the FOM is desirable to be maximized. Starting from an efficient conventional carry lookahead generator, we show how an optimized section-carry based carry lookahead generator is realized. In comparison with our recent work dealing with standard cells based implementation of SCBCLAs to perform 32-bit addition of two binary operands, we show in this work that with improved section-carry based carry lookahead generators, the resulting SCBCLAs exhibit significant improvements in FOM. Compared to the earlier optimized hybrid SCBCLA, the proposed optimized hybrid SCBCLA improves the FOM by 88.3%. Even the optimized hybrid CCLA features improvement in FOM by 77.3% over the earlier optimized hybrid CCLA. However, the proposed optimized hybrid SCBCLA is still the winner and has a better FOM than the currently optimized hybrid CCLA by 15.3%. All the CCLAs and SCBCLAs are implemented to realize 32-bit dual-operand binary addition using a 32/28nm CMOS process.Comment: arXiv admin note: text overlap with arXiv:1603.0796

    Approximate Ripple Carry and Carry Lookahead Adders - A Comparative Analysis

    Full text link
    Approximate ripple carry adders (RCAs) and carry lookahead adders (CLAs) are presented which are compared with accurate RCAs and CLAs for performing a 32-bit addition. The accurate and approximate RCAs and CLAs are implemented using a 32/28nm CMOS process. Approximations ranging from 4- to 20-bits are considered for the less significant adder bit positions. The simulation results show that approximate RCAs report reductions in the power-delay product (PDP) ranging from 19.5% to 82% than the accurate RCA for approximation sizes varying from 4- to 20-bits. Also, approximate CLAs report reductions in PDP ranging from 16.7% to 74.2% than the accurate CLA for approximation sizes varying from 4- to 20-bits. On average, for the approximation sizes considered, it is observed that approximate CLAs achieve a 46.5% reduction in PDP compared to the approximate RCAs. Hence, approximate CLAs are preferable over approximate RCAs for the low power implementation of approximate computer arithmetic

    ASIC-based Implementation of Synchronous Section-Carry Based Carry Lookahead Adders

    Full text link
    The section-carry based carry lookahead adder (SCBCLA) topology was proposed as an improved high-speed alternative to the conventional carry lookahead adder (CCLA) topology in previous works. Self-timed and FPGA-based implementations of SCBCLAs and CCLAs were considered earlier, and it was found that SCBCLAs could help in delay reduction i.e. pave the way for improved speed compared to CCLAs at the expense of some increase in area and/or power parameters. In this work, we consider semi-custom ASIC-based implementations of different variants of SCBCLAs and CCLAs to perform 32-bit dual-operand addition. Based on the simulation results for 32-bit dual-operand addition obtained by targeting a high-end 32/28nm CMOS process, it is found that an optimized SCBCLA architecture reports a 9.8% improvement in figure-of-merit (FOM) compared to an optimized CCLA architecture, where the FOM is defined as the inverse of the product of power, delay, and area. It is generally inferred from the simulations that the SCBCLA architecture could be more beneficial compared to the CCLA architecture in terms of the design metrics whilst benefitting a variety of computer arithmetic operations involving dual-operand and/or multi-operand additions. Also, it is observed that heterogeneous CLA architectures tend to fare well compared to homogeneous CLA architectures, as substantiated by the simulation results.Comment: in the Book, Recent Advances in Circuits, Systems, Signal Processing and Communications, Included in ISI/SCI Web of Science and Web of Knowledge, Proceedings of 10th International Conference on Circuits, Systems, Signal and Telecommunications, pp. 58-64, 2016, Barcelona, Spai

    Designing high-speed, low-power full adder cells based on carbon nanotube technology

    Full text link
    This article presents novel high speed and low power full adder cells based on carbon nanotube field effect transistor (CNFET). Four full adder cells are proposed in this article. First one (named CN9P4G) and second one (CN9P8GBUFF) utilizes 13 and 17 CNFETs respectively. Third design that we named CN10PFS uses only 10 transistors and is full swing. Finally, CN8P10G uses 18 transistors and divided into two modules, causing Sum and Cout signals are produced in a parallel manner. All inputs have been used straight, without inverting. These designs also used the special feature of CNFET that is controlling the threshold voltage by adjusting the diameters of CNFETs to achieve the best performance and right voltage levels. All simulation performed using Synopsys HSPICE software and the proposed designs are compared to other classical and modern CMOS and CNFET-based full adder cells in terms of delay, power consumption and power delay product.Comment: 13 Pages, 13 Figures, 2 Table

    Asynchronous Early Output Dual-Bit Full Adders Based on Homogeneous and Heterogeneous Delay-Insensitive Data Encoding

    Get PDF
    This paper presents the designs of asynchronous early output dual-bit full adders without and with redundant logic (implicit) corresponding to homogeneous and heterogeneous delay-insensitive data encoding. For homogeneous delay-insensitive data encoding only dual-rail i.e. 1-of-2 code is used, and for heterogeneous delay-insensitive data encoding 1-of-2 and 1-of-4 codes are used. The 4-phase return-to-zero protocol is used for handshaking. To demonstrate the merits of the proposed dual-bit full adder designs, 32-bit ripple carry adders (RCAs) are constructed comprising dual-bit full adders. The proposed dual-bit full adders based 32-bit RCAs incorporating redundant logic feature reduced latency and area compared to their non-redundant counterparts with no accompanying power penalty. In comparison with the weakly indicating 32-bit RCA constructed using homogeneously encoded dual-bit full adders containing redundant logic, the early output 32-bit RCA comprising the proposed homogeneously encoded dual-bit full adders with redundant logic reports corresponding reductions in latency and area by 22.2% and 15.1% with no associated power penalty. On the other hand, the early output 32-bit RCA constructed using the proposed heterogeneously encoded dual-bit full adder which incorporates redundant logic reports respective decreases in latency and area than the weakly indicating 32-bit RCA that consists of heterogeneously encoded dual-bit full adders with redundant logic by 21.5% and 21.3% with nil power overhead. The simulation results obtained are based on a 32/28nm CMOS process technology

    Asynchronous Ripple Carry Adder based on Area Optimized Early Output Dual-Bit Full Adder

    Full text link
    This technical note presents the design of a new area optimized asynchronous early output dual-bit full adder (DBFA). An asynchronous ripple carry adder (RCA) is constructed based on the new asynchronous DBFAs and existing asynchronous early output single-bit full adders (SBFAs). The asynchronous DBFAs and SBFAs incorporate redundant logic and are encoded using the delay-insensitive dual-rail code (i.e. homogeneous data encoding) and follow a 4-phase return-to-zero handshaking. Compared to the previous asynchronous RCAs involving DBFAs and SBFAs, which are based on homogeneous or heterogeneous delay-insensitive data encodings and which correspond to different timing models, the early output asynchronous RCA incorporating the proposed DBFAs and/or SBFAs is found to result in reduced area for the dual-operand addition operation and feature significantly less latency than the asynchronous RCAs which consist of only SBFAs. The proposed asynchronous DBFA requires 28.6% less silicon than the previously reported asynchronous DBFA. For a 32-bit asynchronous RCA, utilizing 2 stages of SBFAs in the least significant positions and 15 stages of DBFAs in the more significant positions leads to optimization in the latency. The new early output 32-bit asynchronous RCA containing DBFAs and SBFAs reports the following optimizations in design metrics over its counterparts: i) 18.8% reduction in area than a previously reported 32-bit early output asynchronous RCA which also has 15 stages of DBFAs and 2 stages of SBFAs, ii) 29.4% reduction in latency than a 32-bit early output asynchronous RCA containing only SBFAs.Comment: 12 pages. arXiv admin note: substantial text overlap with arXiv:1706.04487, arXiv:1704.0761

    Weighted p-bits for FPGA implementation of probabilistic circuits

    Full text link
    Probabilistic spin logic (PSL) is a recently proposed computing paradigm based on unstable stochastic units called probabilistic bits (p-bits) that can be correlated to form probabilistic circuits (p-circuits). These p-circuits can be used to solve problems of optimization, inference and also to implement precise Boolean functions in an "inverted" mode, where a given Boolean circuit can operate in reverse to find the input combinations that are consistent with a given output. In this paper we present a scalable FPGA implementation of such invertible p-circuits. We implement a "weighted" p-bit that combines stochastic units with localized memory structures. We also present a generalized tile of weighted p-bits to which a large class of problems beyond invertible Boolean logic can be mapped, and how invertibility can be applied to interesting problems such as the NP-complete Subset Sum Problem by solving a small instance of this problem in hardware

    Soft Realization: a Bio-inspired Implementation Paradigm

    Full text link
    Researchers traditionally solve the computational problems through rigorous and deterministic algorithms called as Hard Computing. These precise algorithms have widely been realized using digital technology as an inherently reliable and accurate implementation platform, either in hardware or software forms. This rigid form of implementation which we refer as Hard Realization relies on strict algorithmic accuracy constraints dictated to digital design engineers. Hard realization admits paying as much as necessary implementation costs to preserve computation precision and determinism throughout all the design and implementation steps. Despite its prior accomplishments, this conventional paradigm has encountered serious challenges with today's emerging applications and implementation technologies. Unlike traditional hard computing, the emerging soft and bio-inspired algorithms do not rely on fully precise and deterministic computation. Moreover, the incoming nanotechnologies face increasing reliability issues that prevent them from being efficiently exploited in hard realization of applications. This article examines Soft Realization, a novel bio-inspired approach to design and implementation of an important category of applications noticing the internal brain structure. The proposed paradigm mitigates major weaknesses of hard realization by (1) alleviating incompatibilities with today's soft and bio-inspired algorithms such as artificial neural networks, fuzzy systems, and human sense signal processing applications, and (2) resolving the destructive inconsistency with unreliable nanotechnologies. Our experimental results on a set of well-known soft applications implemented using the proposed soft realization paradigm in both reliable and unreliable technologies indicate that significant energy, delay, and area savings can be obtained compared to the conventional implementation.Comment: The Imprecise (Approximate) computing and Relaxed Fault Tolerance concept are some but not all instances of the Soft Realization. The soft realization and imprecise computing are first introduced around 2005 as H.R. Mahdiani Phd Thesis proposal. The first imprecise computing paper is published in 2010. This manuscript is written in 2012, submitted to Nature in 2017 and rejected by the editor

    Optimized Configurable Architectures for Scalable Soft-Input Soft-Output MIMO Detectors with 256-QAM

    Full text link
    This paper presents an optimized low-complexity and high-throughput multiple-input multiple-output (MIMO) signal detector core for detecting spatially-multiplexed data streams. The core architecture supports various layer configurations up to 4, while achieving near-optimal performance, as well as configurable modulation constellations up to 256-QAM on each layer. The core is capable of operating as a soft-input soft-output log-likelihood ratio (LLR) MIMO detector which can be used in the context of iterative detection and decoding. High area-efficiency is achieved via algorithmic and architectural optimizations performed at two levels. First, distance computations and slicing operations for an optimal 2-layer maximum a posteriori (MAP) MIMO detector are optimized to eliminate the use of multipliers and reduce the overhead of slicing in the presence of soft-input LLRs. We show that distances can be easily computed using elementary addition operations, while optimal slicing is done via efficient comparisons with soft decision boundaries, resulting in a simple feed-forward pipelined architecture. Second, to support more layers, an efficient channel decomposition scheme is presented that reduces the detection of multiple layers into multiple 2-layer detection subproblems, which map onto the 2-layer core with a slight modification using a distance accumulation stage and a post-LLR processing stage. Various architectures are accordingly developed to achieve a desired detection throughput and run-time reconfigurability by time-multiplexing of one or more component cores. The proposed core is applied as well to design an optimal multi-user MIMO detector for LTE. The core occupies an area of 1.58MGE and achieves a throughput of 733 Mbps for 256-QAM when synthesized in 90 nm CMOS

    Superconductor Digital Electronics: Scalability and Energy Efficiency Issues

    Full text link
    Superconductor digital electronics using Josephson junctions as ultrafast switches and magnetic-flux encoding of information was proposed over 30 years ago as a sub-terahertz clock frequency alternative to semiconductor electronics based on complementary metal-oxide-semiconductor (CMOS) transistors. Recently, interest in developing superconductor electronics has been renewed due to a search for energy saving solutions in applications related to high-performance computing. The current state of superconductor electronics and fabrication processes are reviewed in order to evaluate whether this electronics is scalable to a very large scale integration (VLSI) required to achieve computation complexities comparable to CMOS processors. A fully planarized process at MIT Lincoln Laboratory, perhaps the most advanced process developed so far for superconductor electronics, is used as an example. The process has nine superconducting layers: eight Nb wiring layers with the minimum feature size of 350 nm, and a thin superconducting layer for making compact high-kinetic-inductance bias inductors. All circuit layers are fully planarized using chemical mechanical planarization (CMP) of SiO2 interlayer dielectric. The physical limitations imposed on the circuit density by Josephson junctions, circuit inductors, shunt and bias resistors, etc., are discussed. Energy dissipation in superconducting circuits is also reviewed in order to estimate whether this technology, which requires cryogenic refrigeration, can be energy efficient. Fabrication process development required for increasing the density of superconductor digital circuits by a factor of ten and achieving densities above 10^7 Josephson junctions per cm^2 is described.Comment: 20 pages, 14 figures, 153 references, submitted to Low Temperature Physic
    • …
    corecore