15,804 research outputs found

    Indicating Asynchronous Array Multipliers

    Full text link
    Multiplication is an important arithmetic operation that is frequently encountered in microprocessing and digital signal processing applications, and multiplication is physically realized using a multiplier. This paper discusses the physical implementation of many indicating asynchronous array multipliers, which are inherently elastic and modular and are robust to timing, process and parametric variations. We consider the physical realization of many indicating asynchronous array multipliers using a 32/28nm CMOS technology. The weak-indication array multipliers comprise strong-indication or weak-indication full adders, and strong-indication 2-input AND functions to realize the partial products. The multipliers were synthesized in a semi-custom ASIC design style using standard library cells including a custom-designed 2-input C-element. 4x4 and 8x8 multiplication operations were considered for the physical implementations. The 4-phase return-to-zero (RTZ) and the 4-phase return-to-one (RTO) handshake protocols were utilized for data communication, and the delay-insensitive dual-rail code was used for data encoding. Among several weak-indication array multipliers, a weak-indication array multiplier utilizing a biased weak-indication full adder and the strong-indication 2-input AND function is found to have reduced cycle time and power-cycle time product with respect to RTZ and RTO handshaking for 4x4 and 8x8 multiplications. Further, the 4-phase RTO handshaking is found to be preferable to the 4-phase RTZ handshaking for achieving enhanced optimizations of the design metrics.Comment: arXiv admin note: text overlap with arXiv:1903.0943

    On signalling over through-silicon via (TSV) interconnects in 3-D integrated circuits.

    Get PDF
    This paper discusses signal integrity (SI) issues and signalling techniques for Through Silicon Via (TSV) interconnects in 3-D Integrated Circuits (ICs). Field-solver extracted parasitics of TSVs have been employed in Spice simulations to investigate the effect of each parasitic component on performance metrics such as delay and crosstalk and identify a reduced-order electrical model that captures all relevant effects. We show that in dense TSV structures voltage-mode (VM) signalling does not lend itself to achieving high data-rates, and that current-mode (CM) signalling is more effective for high throughput signalling as well as jitter reduction. Data rates, energy consumption and coupled noise for the different signalling modes are extracted

    Area/latency optimized early output asynchronous full adders and relative-timed ripple carry adders

    Get PDF
    This article presents two area/latency optimized gate level asynchronous full adder designs which correspond to early output logic. The proposed full adders are constructed using the delay-insensitive dual-rail code and adhere to the four-phase return-to-zero handshaking. For an asynchronous ripple carry adder (RCA) constructed using the proposed early output full adders, the relative-timing assumption becomes necessary and the inherent advantages of the relative-timed RCA are: (1) computation with valid inputs, i.e., forward latency is data-dependent, and (2) computation with spacer inputs involves a bare minimum constant reverse latency of just one full adder delay, thus resulting in the optimal cycle time. With respect to different 32-bit RCA implementations, and in comparison with the optimized strong-indication, weak-indication, and early output full adder designs, one of the proposed early output full adders achieves respective reductions in latency by 67.8, 12.3 and 6.1 %, while the other proposed early output full adder achieves corresponding reductions in area by 32.6, 24.6 and 6.9 %, with practically no power penalty. Further, the proposed early output full adders based asynchronous RCAs enable minimum reductions in cycle time by 83.4, 15, and 8.8 % when considering carry-propagation over the entire RCA width of 32-bits, and maximum reductions in cycle time by 97.5, 27.4, and 22.4 % for the consideration of a typical carry chain length of 4 full adder stages, when compared to the least of the cycle time estimates of various strong-indication, weak-indication, and early output asynchronous RCAs of similar size. All the asynchronous full adders and RCAs were realized using standard cells in a semi-custom design fashion based on a 32/28 nm CMOS process technology

    Xampling: Signal Acquisition and Processing in Union of Subspaces

    Full text link
    We introduce Xampling, a unified framework for signal acquisition and processing of signals in a union of subspaces. The main functions of this framework are two. Analog compression that narrows down the input bandwidth prior to sampling with commercial devices. A nonlinear algorithm then detects the input subspace prior to conventional signal processing. A representative union model of spectrally-sparse signals serves as a test-case to study these Xampling functions. We adopt three metrics for the choice of analog compression: robustness to model mismatch, required hardware accuracy and software complexities. We conduct a comprehensive comparison between two sub-Nyquist acquisition strategies for spectrally-sparse signals, the random demodulator and the modulated wideband converter (MWC), in terms of these metrics and draw operative conclusions regarding the choice of analog compression. We then address lowrate signal processing and develop an algorithm for that purpose that enables convenient signal processing at sub-Nyquist rates from samples obtained by the MWC. We conclude by showing that a variety of other sampling approaches for different union classes fit nicely into our framework.Comment: 16 pages, 9 figures, submitted to IEEE for possible publicatio

    autoAx: An Automatic Design Space Exploration and Circuit Building Methodology utilizing Libraries of Approximate Components

    Full text link
    Approximate computing is an emerging paradigm for developing highly energy-efficient computing systems such as various accelerators. In the literature, many libraries of elementary approximate circuits have already been proposed to simplify the design process of approximate accelerators. Because these libraries contain from tens to thousands of approximate implementations for a single arithmetic operation it is intractable to find an optimal combination of approximate circuits in the library even for an application consisting of a few operations. An open problem is "how to effectively combine circuits from these libraries to construct complex approximate accelerators". This paper proposes a novel methodology for searching, selecting and combining the most suitable approximate circuits from a set of available libraries to generate an approximate accelerator for a given application. To enable fast design space generation and exploration, the methodology utilizes machine learning techniques to create computational models estimating the overall quality of processing and hardware cost without performing full synthesis at the accelerator level. Using the methodology, we construct hundreds of approximate accelerators (for a Sobel edge detector) showing different but relevant tradeoffs between the quality of processing and hardware cost and identify a corresponding Pareto-frontier. Furthermore, when searching for approximate implementations of a generic Gaussian filter consisting of 17 arithmetic operations, the proposed approach allows us to identify approximately 10310^3 highly important implementations from 102310^{23} possible solutions in a few hours, while the exhaustive search would take four months on a high-end processor.Comment: Accepted for publication at the Design Automation Conference 2019 (DAC'19), Las Vegas, Nevada, US

    Energy Detection UWB Receiver Design using a Multi-resolution VHDL-AMS Description

    Get PDF
    Ultra Wide Band (UWB) impulse radio systems are appealing for location-aware applications. There is a growing interest in the design of UWB transceivers with reduced complexity and power consumption. Non-coherent approaches for the design of the receiver based on energy detection schemes seem suitable to this aim and have been adopted in the project the preliminary results of which are reported in this paper. The objective is the design of a UWB receiver with a top-down methodology, starting from Matlab-like models and refining the description down to the final transistor level. This goal will be achieved with an integrated use of VHDL for the digital blocks and VHDL-AMS for the mixed-signal and analog circuits. Coherent results are obtained using VHDL-AMS and Matlab. However, the CPU time cost strongly depends on the description used in the VHDL-AMS models. In order to show the functionality of the UWB architecture, the receiver most critical functions are simulated showing results in good agreement with the expectations
    • 

    corecore