1,793 research outputs found

    Yield-driven power-delay-optimal CMOS full-adder design complying with automotive product specifications of PVT variations and NBTI degradations

    Get PDF
    We present the detailed results of the application of mathematical optimization algorithms to transistor sizing in a full-adder cell design, to obtain the maximum expected fabrication yield. The approach takes into account all the fabrication process parameter variations specified in an industrial PDK, in addition to operating condition range and NBTI aging. The final design solutions present transistor sizing, which depart from intuitive transistor sizing criteria and show dramatic yield improvements, which have been verified by Monte Carlo SPICE analysis

    Limits on Fundamental Limits to Computation

    Full text link
    An indispensable part of our lives, computing has also become essential to industries and governments. Steady improvements in computer hardware have been supported by periodic doubling of transistor densities in integrated circuits over the last fifty years. Such Moore scaling now requires increasingly heroic efforts, stimulating research in alternative hardware and stirring controversy. To help evaluate emerging technologies and enrich our understanding of integrated-circuit scaling, we review fundamental limits to computation: in manufacturing, energy, physical space, design and verification effort, and algorithms. To outline what is achievable in principle and in practice, we recall how some limits were circumvented, compare loose and tight limits. We also point out that engineering difficulties encountered by emerging technologies may indicate yet-unknown limits.Comment: 15 pages, 4 figures, 1 tabl

    Why area might reduce power in nanoscale CMOS

    Get PDF
    In this paper we explore the relationship between power and area. By exploiting parallelism (and thus using more area) one can reduce the switching frequency allowing a reduction in VDD which results in a reduction in power. Under a scaling regime which allows threshold voltage to increase as VDD decreases we find that dynamic and subthreshold power loss in CMOS exhibit a dependence on area proportional to A(s-3)s/ while gate leakage power ? A(s-6)s/, and short circuit power ? A(s-8)s/. Thus, with the large number of devices at our disposal we can exploit techniques such as spatial computing, tailoring the program directly to the hardware, to overcome the negative effects of scaling. The value of s describes the effectiveness of the technique for a particular circuit and/or algorithm - for circuits that exhibit a value of s =3, power will be a constant or reducing function of area. We briefly speculate on how s might be influenced by a move to nanoscale technology

    Current-Mode Techniques for the Implementation of Continuous- and Discrete-Time Cellular Neural Networks

    Get PDF
    This paper presents a unified, comprehensive approach to the design of continuous-time (CT) and discrete-time (DT) cellular neural networks (CNN) using CMOS current-mode analog techniques. The net input signals are currents instead of voltages as presented in previous approaches, thus avoiding the need for current-to-voltage dedicated interfaces in image processing tasks with photosensor devices. Outputs may be either currents or voltages. Cell design relies on exploitation of current mirror properties for the efficient implementation of both linear and nonlinear analog operators. These cells are simpler and easier to design than those found in previously reported CT and DT-CNN devices. Basic design issues are covered, together with discussions on the influence of nonidealities and advanced circuit design issues as well as design for manufacturability considerations associated with statistical analysis. Three prototypes have been designed for l.6-pm n-well CMOS technologies. One is discrete-time and can be reconfigured via local logic for noise removal, feature extraction (borders and edges), shadow detection, hole filling, and connected component detection (CCD) on a rectangular grid with unity neighborhood radius. The other two prototypes are continuous-time and fixed template: one for CCD and other for noise removal. Experimental results are given illustrating performance of these prototypes

    System-on-chip Computing and Interconnection Architectures for Telecommunications and Signal Processing

    Get PDF
    This dissertation proposes novel architectures and design techniques targeting SoC building blocks for telecommunications and signal processing applications. Hardware implementation of Low-Density Parity-Check decoders is approached at both the algorithmic and the architecture level. Low-Density Parity-Check codes are a promising coding scheme for future communication standards due to their outstanding error correction performance. This work proposes a methodology for analyzing effects of finite precision arithmetic on error correction performance and hardware complexity. The methodology is throughout employed for co-designing the decoder. First, a low-complexity check node based on the P-output decoding principle is designed and characterized on a CMOS standard-cells library. Results demonstrate implementation loss below 0.2 dB down to BER of 10^{-8} and a saving in complexity up to 59% with respect to other works in recent literature. High-throughput and low-latency issues are addressed with modified single-phase decoding schedules. A new "memory-aware" schedule is proposed requiring down to 20% of memory with respect to the traditional two-phase flooding decoding. Additionally, throughput is doubled and logic complexity reduced of 12%. These advantages are traded-off with error correction performance, thus making the solution attractive only for long codes, as those adopted in the DVB-S2 standard. The "layered decoding" principle is extended to those codes not specifically conceived for this technique. Proposed architectures exhibit complexity savings in the order of 40% for both area and power consumption figures, while implementation loss is smaller than 0.05 dB. Most modern communication standards employ Orthogonal Frequency Division Multiplexing as part of their physical layer. The core of OFDM is the Fast Fourier Transform and its inverse in charge of symbols (de)modulation. Requirements on throughput and energy efficiency call for FFT hardware implementation, while ubiquity of FFT suggests the design of parametric, re-configurable and re-usable IP hardware macrocells. In this context, this thesis describes an FFT/IFFT core compiler particularly suited for implementation of OFDM communication systems. The tool employs an accuracy-driven configuration engine which automatically profiles the internal arithmetic and generates a core with minimum operands bit-width and thus minimum circuit complexity. The engine performs a closed-loop optimization over three different internal arithmetic models (fixed-point, block floating-point and convergent block floating-point) using the numerical accuracy budget given by the user as a reference point. The flexibility and re-usability of the proposed macrocell are illustrated through several case studies which encompass all current state-of-the-art OFDM communications standards (WLAN, WMAN, xDSL, DVB-T/H, DAB and UWB). Implementations results are presented for two deep sub-micron standard-cells libraries (65 and 90 nm) and commercially available FPGA devices. Compared with other FFT core compilers, the proposed environment produces macrocells with lower circuit complexity and same system level performance (throughput, transform size and numerical accuracy). The final part of this dissertation focuses on the Network-on-Chip design paradigm whose goal is building scalable communication infrastructures connecting hundreds of core. A low-complexity link architecture for mesochronous on-chip communication is discussed. The link enables skew constraint looseness in the clock tree synthesis, frequency speed-up, power consumption reduction and faster back-end turnarounds. The proposed architecture reaches a maximum clock frequency of 1 GHz on 65 nm low-leakage CMOS standard-cells library. In a complex test case with a full-blown NoC infrastructure, the link overhead is only 3% of chip area and 0.5% of leakage power consumption. Finally, a new methodology, named metacoding, is proposed. Metacoding generates correct-by-construction technology independent RTL codebases for NoC building blocks. The RTL coding phase is abstracted and modeled with an Object Oriented framework, integrated within a commercial tool for IP packaging (Synopsys CoreTools suite). Compared with traditional coding styles based on pre-processor directives, metacoding produces 65% smaller codebases and reduces the configurations to verify up to three orders of magnitude

    Motion estimation and CABAC VLSI co-processors for real-time high-quality H.264/AVC video coding

    Get PDF
    Real-time and high-quality video coding is gaining a wide interest in the research and industrial community for different applications. H.264/AVC, a recent standard for high performance video coding, can be successfully exploited in several scenarios including digital video broadcasting, high-definition TV and DVD-based systems, which require to sustain up to tens of Mbits/s. To that purpose this paper proposes optimized architectures for H.264/AVC most critical tasks, Motion estimation and context adaptive binary arithmetic coding. Post synthesis results on sub-micron CMOS standard-cells technologies show that the proposed architectures can actually process in real-time 720 × 480 video sequences at 30 frames/s and grant more than 50 Mbits/s. The achieved circuit complexity and power consumption budgets are suitable for their integration in complex VLSI multimedia systems based either on AHB bus centric on-chip communication system or on novel Network-on-Chip (NoC) infrastructures for MPSoC (Multi-Processor System on Chip

    Multi-objective Pareto front and particle swarm optimization algorithms for power dissipation reduction in microprocessors

    Get PDF
    The progress of microelectronics making possible higher integration densities, and a considerable development of on-board systems are currently undergoing, this growth comes up against a limiting factor of power dissipation. Higher power dissipation will cause an immediate spread of generated heat which causes thermal problems. Consequently, the system's total consumed energy will increase as the system temperature increase. High temperatures in microprocessors and large thermal energy of computer systems produce huge problems of system confidence, performance, and cooling expenses. Power consumed by processors are mainly due to the increase in number of cores and the clock frequency, which is dissipated in the form of heat and causes thermal challenges for chip designers. As the microprocessor’s performance has increased remarkably in Nano-meter technology, power dissipation is becoming non-negligible. To solve this problem, this article addresses power dissipation reduction issues for high performance processors using multi-objective Pareto front (PF), and particle swarm optimization (PSO) algorithms to achieve power dissipation as a prior computation that reduces the real delay of a target microprocessor unit. Simulation is verified the conceptual fundamentals and optimization of joint body and supply voltages (Vth-VDD) which showing satisfactory findings
    • 

    corecore