
    Power Reductions with Energy Recovery Using Resonant Topologies

    The problem of power density in systems-on-chip (SoCs) and processors has worsened recently, resulting in high cooling costs and reliability issues. One of the largest components of power consumption is the low-skew clock distribution network (CDN), which drives a large load capacitance. The CDN can consume as much as 70% of the total dynamic power, which is lost as heat and requires elaborate sensing and cooling mechanisms. To mitigate this, resonant clocking has been utilized in several applications over the past decade. An improved energy-recovering reconfigurable generalized series resonance (GSR) solution with all the critical support circuitry is developed in this work. This LC resonant clock driver is shown to save about 50% driver power (>40% overall) on a 22nm process node and has 50% less skew than a non-resonant driver at 2GHz. It can operate down to 0.2GHz to support other energy-saving techniques like dynamic voltage and frequency scaling (DVFS). As an example, GSR can be configured for the simpler pulse series resonance (PSR) operation to enable further power saving for double data rate (DDR) applications, by using de-skewing latches instead of flip-flop banks. A PSR-based subsystem with 40% savings in clocking power and 40% driver active area reduction is demonstrated. This new resonant driver generates tracking pulses at each transition of the clock for dual-edge operation across DVFS. PSR clocking is designed to drive explicit-pulsed latches with negative setup time. Simulations using 45nm IBM/PTM device and interconnect technology models, clocking 1024 flip-flops, show the reductions compared to non-resonant clocking. A DVFS range from 2GHz/1.3V to 200MHz/0.5V is obtained. The PSR frequency is set >3× the clock rate, needing only 1/10th the inductance of prior-art LC resonance schemes. The skew reductions are achieved without needing to increase the interconnect widths, owing to negative setup times.
Applications in data circuits are also shown with a 90nm example. Parallel resonant and split-driver non-resonant configurations are also derived from GSR. Tradeoffs in timing performance versus power, based on theoretical analysis, are compared for the first time and verified. This enables synthesis of an optimal topology for a given application from the GSR
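
The link between the higher PSR frequency and the smaller inductor can be illustrated with the ideal LC resonance relation f = 1/(2π√(LC)). The following is a minimal sketch under stated assumptions, not the design flow of the work itself: the 10 pF clock load and the helper name `resonant_inductance` are hypothetical and chosen only for illustration, and losses are ignored. It suggests that resonating a fixed load at 3× the frequency needs 1/9 of the inductance, which is consistent with the roughly 1/10th figure quoted for a PSR frequency set above 3× the clock rate.

```python
import math


def resonant_inductance(freq_hz: float, cap_f: float) -> float:
    """Inductance that resonates with cap_f at freq_hz (ideal, lossless LC)."""
    return 1.0 / ((2.0 * math.pi * freq_hz) ** 2 * cap_f)


C_LOAD = 10e-12  # assumed clock load capacitance (hypothetical value)
F_CLK = 2e9      # 2GHz clock rate quoted in the abstract

l_at_clock = resonant_inductance(F_CLK, C_LOAD)      # resonance at the clock rate
l_at_3x = resonant_inductance(3 * F_CLK, C_LOAD)     # resonance at 3x the clock rate
ratio = l_at_3x / l_at_clock                         # independent of C_LOAD

print(f"L at f_clk:   {l_at_clock * 1e9:.4f} nH")
print(f"L at 3*f_clk: {l_at_3x * 1e9:.4f} nH")
print(f"inductance ratio: {ratio:.4f}")
```

Because inductance scales as 1/f² for a fixed capacitance, the ratio does not depend on the assumed load value.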

    Energy Efficient Spintronic Device for Neuromorphic Computation

    Future computing will require significant development in new computing device paradigms. This is motivated by CMOS devices reaching their technological limits, the need for non-von Neumann architectures, and the energy constraints of wearable technologies and embedded processors. The first device proposal, an energy-efficient voltage-controlled domain wall (DW) device for implementing an artificial neuron and synapse, is analyzed using micromagnetic modeling. By controlling the domain wall motion using spin transfer or spin orbit torques, in association with voltage-generated strain control of perpendicular magnetic anisotropy in the presence of the Dzyaloshinskii-Moriya interaction (DMI), different positions of the domain wall are realized in the free layer of a magnetic tunnel junction to program different synaptic weights. Additionally, an artificial neuron can be realized by combining this DW device with a CMOS buffer. The second neuromorphic device proposal is inspired by the brain. The membrane potential of many neurons oscillates in a subthreshold damped fashion, and the neurons fire when excited by an input frequency that nearly equals their eigenfrequency. We investigate the theoretical implementation of such “resonate-and-fire” neurons by utilizing the magnetization dynamics of a fixed magnetic skyrmion based free layer of a magnetic tunnel junction (MTJ). Voltage control of magnetic anisotropy or voltage-generated strain results in expansion and shrinking of a skyrmion core that mimics the subthreshold oscillation. Finally, we show that such resonate-and-fire neurons have potential application in coupled nanomagnetic oscillator based associative memory arrays

    Energy-Efficient Neural Network Architectures

    Emerging systems for artificial intelligence (AI) are expected to rely on deep neural networks (DNNs) to achieve high accuracy for a broad variety of applications, including computer vision, robotics, and speech recognition. Due to the rapid growth of network size and depth, however, DNNs typically result in high computational costs and introduce considerable power and performance overheads. Dedicated chip architectures that implement DNNs with high energy efficiency are essential for adding intelligence to interactive edge devices, enabling them to complete increasingly sophisticated tasks by extending battery life. They are also vital for improving performance in cloud servers that support demanding AI computations. This dissertation focuses on architectures and circuit technologies for designing energy-efficient neural network accelerators. First, a deep-learning processor is presented for achieving ultra-low power operation. Using a heterogeneous architecture that includes a low-power always-on front-end and a selectively-enabled high-performance back-end, the processor dynamically adjusts computational resources at runtime to support conditional execution in neural networks and meet performance targets with increased energy efficiency. Featuring a reconfigurable datapath and a memory architecture optimized for energy efficiency, the processor supports multilevel dynamic activation of neural network segments, performing object detection tasks with 5.3x lower energy consumption in comparison with a static execution baseline. Fabricated in 40nm CMOS, the processor test-chip dissipates 0.23mW at 5.3 fps. It demonstrates energy scalability up to 28.6 TOPS/W and can be configured to run a variety of workloads, including severely power-constrained ones such as always-on monitoring in mobile applications.
To further improve the energy efficiency of the proposed heterogeneous architecture, a new charge-recovery logic family, called zero-short-circuit current (ZSCC) logic, is proposed to decrease the power consumption of the always-on front-end. By relying on dedicated circuit topologies and a four-phase clocking scheme, ZSCC operates with significantly reduced short-circuit currents, realizing order-of-magnitude power savings at relatively low clock frequencies (on the order of a few MHz). The efficiency and applicability of ZSCC are demonstrated through an ANSI S1.11 1/3 octave filter bank chip for binaural hearing aids with two microphones per ear. Fabricated in a 65nm CMOS process, this charge-recovery chip consumes 13.8µW with a 1.75MHz clock frequency, achieving 9.7x power reduction per input in comparison with a 40nm monophonic single-input chip that represents the published state of the art. The ability of ZSCC to further increase the energy efficiency of the heterogeneous neural network architecture is demonstrated through the design and evaluation of a ZSCC-based front-end. Simulation results show 17x power reduction compared with a conventional static CMOS implementation of the same architecture.
PhD, Electrical and Computer Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/147614/1/hsiwu_1.pd

    ULTRA–LOW POWER STRAINTRONIC NANOMAGNETIC COMPUTING WITH SAW WAVES: AN EXPERIMENTAL STUDY OF SAW INDUCED MAGNETIZATION SWITCHING AND PROPERTIES OF MAGNETIC NANOSTRUCTURES

    A recent International Technology Roadmap for Semiconductors (ITRS) report (2.0, 2015 edition) has shown that Moore’s law is unlikely to hold beyond 2028. There is a need for alternate devices to replace CMOS-based devices if further miniaturization and high energy efficiency are desired. The goal of this dissertation is to experimentally demonstrate the feasibility of nanomagnetic memory and logic devices that can be clocked with acoustic waves in an extremely energy-efficient manner. While clocking nanomagnetic logic by stressing the magnetostrictive layer of a multiferroic logic element with an electric field applied across the piezoelectric layer is known to be an extremely energy-efficient clocking scheme, stressing every nanomagnet separately requires individual contacts to each one of them, which would necessitate cumbersome lithography. On the other hand, if all nanomagnets are stressed simultaneously with a global voltage, it will eliminate the need for individual contacts, but such a global clock makes the architecture non-pipelined (the next input bit cannot be written until the previous bit has completely propagated through the chain) and therefore unacceptably slow and error-prone. Use of a global acoustic wave, which has in-built granularity, would offer the best of both worlds. As the crest and the trough propagate in space with a velocity, nanomagnets that find themselves at a crest are stressed in tension while those in the trough are compressed. All other magnets are relaxed (no stress). Thus, the magnets are not all stressed simultaneously but are clocked in a sequential manner, even though the clocking agent is global. Finally, the acoustic wave energy is distributed over the billions of nanomagnets it clocks, which results in an extremely small energy cost per bit per nanomagnet.
In summary, acoustic clocking of nanomagnets can lead to extremely energy-efficient nanomagnetic computing devices while also eliminating the need for complex lithography. The dissertation work focuses on the following two topics: 1. Acoustic waves, generated by IDTs fabricated on a piezoelectric lithium niobate substrate, can be utilized to manipulate the magnetization states in elliptical Co nanomagnets. The magnetization switches from its initial single-domain state to a vortex state after SAW stress cycles propagate through the nanomagnets. The vortex states are stable and the magnetization remains in this state until it is ‘reset’ by an external magnetic field. 2. Acoustic waves can also be utilized to induce 180° magnetization switching in dipole-coupled elliptical Co nanomagnets. The magnetization switches from its initial single-domain ‘up’ state to a single-domain ‘down’ state after SAW tensile/compressive stress cycles propagate through the nanomagnets. The switched state is stable and non-volatile. These results show the effective implementation of a Boolean NOT gate. Ultimately, the advantage of this technology is that it could also perform higher-order information processing (not discussed here) while consuming extremely low power. Finally, while we have demonstrated acoustically clocked nanomagnetic memory and logic schemes with Co nanomagnets, materials with higher magnetostriction (such as FeGa) may ultimately improve the switching reliability of such devices. With this in mind, we prepared and studied FeGa films using a ferromagnetic resonance (FMR) technique to extract properties of importance to magnetization dynamics in such materials, which could have higher magnetoelastic coupling than either Co or Ni

    Modelling, Simulation and Verification of 4-phase Adiabatic Logic Design: A VHDL-Based Approach

    The design and functional verification of 4-phase adiabatic logic implementations take longer because of the complexity of synchronizing the power-clock phases. Additionally, as the adiabatic system scales, the time spent debugging errors increases, thus increasing the overall design and verification time. This paper proposes a VHDL-based modelling approach for reducing the design and verification time of 4-phase adiabatic logic systems. The proposed approach can detect functional errors, allowing the designer to correct them at an early design stage, leading to a substantial reduction of the design and debugging time. The originality of this approach lies in the realization of the trapezoidal power-clock using function declarations for the four periods, namely Evaluation (E), Hold (H), Recovery (R) and Idle (I). The four periods are defined in a VHDL package, followed by a library design which contains the behavioral VHDL model of the adiabatic NOT/BUF logic gate. Finally, this library is used to model and verify the structural VHDL representations of a 4-phase 2-bit ring-counter and a 3-bit up-down counter, as design examples to demonstrate the practicality of the proposed approach
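
The four periods of a trapezoidal power-clock can be sketched behaviorally as below. This is an illustrative Python model, not the paper's VHDL package: the function name `power_clock`, the equal quarter-period durations, the normalized supply `vdd`, and the quarter-period lag between successive power-clocks (selected by `shift`) are all assumptions made for the example.

```python
PHASES = ("E", "H", "R", "I")  # Evaluation, Hold, Recovery, Idle


def power_clock(t, period=1.0, vdd=1.0, shift=0):
    """Return (period_name, voltage) of a trapezoidal power-clock at time t.

    shift picks one of four power-clocks; each is assumed to lag the
    previous one by a quarter of the clock period.
    """
    quarter = period / 4.0
    local = (t - shift * quarter) % period      # time within the current cycle
    idx = min(int(local // quarter), 3)         # which of the four periods
    frac = (local - idx * quarter) / quarter    # progress within that period
    if idx == 0:        # Evaluation: ramp up from 0 to vdd
        voltage = vdd * frac
    elif idx == 1:      # Hold: stay at vdd
        voltage = vdd
    elif idx == 2:      # Recovery: ramp down from vdd to 0
        voltage = vdd * (1.0 - frac)
    else:               # Idle: stay at 0
        voltage = 0.0
    return PHASES[idx], voltage


# Sample the four assumed power-clocks at the midpoint of each quarter period.
for t in (0.125, 0.375, 0.625, 0.875):
    print(t, [power_clock(t, shift=k) for k in range(4)])
```

Sampling all four shifted clocks at the same instant makes it easy to check that, under these assumptions, each one is in a different period at any given time.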

    Energy efficient implementation of multi-phase quasi-adiabatic Cyclic Redundancy Check in near field communication

    Ultra-low power operation in power-limited portable devices (e.g. cell phones and smartcards) is paramount. Conventional CMOS circuits consume high energy. The adiabatic logic technique has the potential to enable energy-efficient operation. In this paper, a multi-phase quasi-adiabatic implementation of a 16-bit Cyclic Redundancy Check (CRC) is proposed, compliant with the ISO/IEC-14443 standard for contactless smart cards. The design is scalable in the number of CRC bits, and all generator polynomials and initial load values can be accommodated. The CRC design is used as a vehicle to evaluate a range of adiabatic logic styles and power-clock strategies. The effects of voltage scaling and variations in Process-Voltage-Temperature (PVT) are also investigated, providing an insight into the robustness of adiabatic logic styles. PFAL and IECRL designs using a 4-phase power-clock are shown to be both the most energy-efficient and the most robust designs
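
For reference, the 16-bit CRC itself can be sketched in software as a bit-serial shift register, which mirrors the structure a hardware implementation would follow. This is a plain Python illustration and says nothing about the adiabatic circuit: the function names are hypothetical, and the parameters (polynomial x^16 + x^12 + x^5 + 1 processed LSB-first, initial value 0x6363 for CRC_A, initial value 0xFFFF with inverted output for CRC_B) are the values commonly cited for ISO/IEC 14443-3 and should be treated as assumptions to check against the standard.

```python
def crc16_reflected(data: bytes, init: int, xorout: int = 0x0000) -> int:
    """Bit-serial CRC-16 for x^16 + x^12 + x^5 + 1, processed LSB-first.

    init is the initial register load value; xorout is applied at the end.
    """
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0x8408  # 0x8408 is 0x1021 bit-reversed
            else:
                crc >>= 1
    return crc ^ xorout


def crc_a(data: bytes) -> int:
    """CRC_A as commonly described for ISO/IEC 14443-3 Type A (assumed parameters)."""
    return crc16_reflected(data, init=0x6363)


def crc_b(data: bytes) -> int:
    """CRC_B as commonly described for ISO/IEC 14443-3 Type B (assumed parameters)."""
    return crc16_reflected(data, init=0xFFFF, xorout=0xFFFF)


message = bytes([0x12, 0x34])
checksum = crc_a(message)
# Append the CRC least-significant byte first, then re-check the whole frame.
frame = message + bytes([checksum & 0xFF, checksum >> 8])
print(hex(checksum), crc_a(frame) == 0)
```

Keeping the initial value and final XOR as parameters reflects the abstract's point that different initial load values can be accommodated by the same register structure.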