235 research outputs found
Low Power SoC Design
The design of Low Power Systems-on-Chips (SoC) in very deep submicron technologies becomes a very complex task that has to bridge very high level system description with low-level considerations due to technology defaults and variations and increasing system and circuit complexity. This paper describes the major low-level issues, such as dynamic and static power consumption, temperature, technology variations, interconnect, DFM, reliability and yield, and their impact on high-level design, such as the design of multi-Vdd, fault-tolerant, redundant or adaptive chip architectures. Some very low power System-on-Chip (SoC) will be presented in three domains: wireless sensor networks, vision sensors and mobile TV
Recommended from our members
A Process Variation Tolerant Self-Compensation Sense Amplifier Design
As we move under the aegis of the Moore\u27s law, we have to deal with its darker side with problems like leakage and short channel effects. Once we go beyond 45nm regime process variations also have emerged as a significant design concern.Embedded memories uses sense amplifier for fast sensing and typically, sense amplifiers uses pair of matched transistors in a positive feedback environment. A small difference in voltage level of applied input signals to these matched transistors is amplified and the resulting logic signals are latched. Intra die variation causes mismatch between the sense transistors that should ideally be identical structures. Yield loss due to device and process variations has never been so critical to cause failure in circuits. Due to growth in size of embedded SRAMs as well as usage of sense amplifier based signaling techniques, process variations in sense amplifiers leads to significant loss of yield for that we need to come up with process variation tolerant circuit styles and new devices. In this work impact of transistor mismatch due to process variations on sense amplifier is evaluated and this problem is stated. For the solution of the problem a novel self compensation scheme on sense amplifiers is presented on different technology nodes up to 32nm on conventional bulk MOSFET technology. Our results show that the self compensation technique in the conventional bulk MOSFET latch type sense amplifier not just gives improvement in the yield but also leads to improvement in performance for latch type sense amplifiers. Lithography related CD variations, fluctuations in dopant density, oxide thickness and parametric variations of devices are identified as a major challenge to the classical bulk type MOSFET. With the emerging nanoscale devices, SIA roadmap identifies FinFETs as a candidate for post-planar end-of-roadmap CMOS device. With current technology scaling issues and with conventional bulk type MOSFET on 32nm node our technique can easily be applied to Double Gate devices. In this work, we also develop the model of Double Gate MOSFET through 3D Device Simulator Damocles and TCAD simulator. We propose a FinFET based process variation tolerant sense amplifier design that exploits the back gate of FinFET devices for dynamic compensation against process variations. Results from statistical simulation show that the proposed dynamic compensation is highly effective in restoring yield at a level comparable to that of sense amplifiers without process variations. We created the 32nm double gate models generated from Damocles 3-D device simulations [25] and Taurus Device Simulator available commercially from Synopsys [47] and use them in the nominal latch type sense amplifier design and on the Independent Gate Self Compensation Sense Amplifier Design (IGSSA) to compare the yield and performance benefits of sense amplifier design on FinFET technology over the conventional bulk type CMOS based sense amplifier on 32nm technology node effective in restoring yield at a level comparable to that of sense amplifiers without process variations. We created the 32nm double gate models generated from Damocles 3-D device simulations [25] and Taurus Device Simulator available commercially from Synopsys [47] and use them in the nominal latch type sense amplifier design and on the Independent Gate Self Compensation Sense Amplifier Design (IGSSA) to compare the yield and performance benefits of sense amplifier design on FinFET technology over the conventional bulk type CMOS based sense amplifier on 32nm technology node
CAD Techniques for Robust FPGA Design Under Variability
The imperfections in the semiconductor fabrication process and uncertainty in operating environment of VLSI circuits have emerged as critical challenges for the semiconductor industry. These are generally termed as process and environment variations, which lead to uncertainty in
performance and unreliable operation of the circuits. These problems have been
further aggravated in scaled nanometer technologies due to increased process
variations and reduced operating voltage.
Several techniques have been proposed recently for designing digital VLSI circuits
under variability. However, most of them have targeted ASICs and custom designs.
The flexibility of reconfiguration and unknown end application in FPGAs
make design under variability different for FPGAs compared to
ASICs and custom designs, and the techniques proposed for ASICs and custom designs cannot be directly applied
to FPGAs. An important design consideration is to minimize the modifications in architecture and circuit
to reduce the cost of changing the existing FPGA architecture and circuit.
The focus of this work can be divided into three principal categories, which are, improving
timing yield under process variations, improving power yield under process variations and improving the voltage profile
in the FPGA power grid.
The work on timing yield improvement proposes routing architecture enhancements along with CAD techniques to
improve the timing yield of FPGA designs. The work on power yield improvement for FPGAs selects a low power dual-Vdd FPGA design
as the baseline FPGA architecture for developing power yield enhancement techniques. It proposes CAD techniques to improve the
power yield of FPGAs. A mathematical programming technique is proposed to determine the parameters
of the buffers in the interconnect such as the sizes of the transistors and threshold voltage of the transistors, all
within constraints, such that the leakage variability is minimized under delay constraints.
Two CAD techniques are investigated and proposed to improve the supply voltage profile of
the power grids in FPGAs. The first technique is a place and route technique and the second technique
is a logic clustering technique to reduce IR-drops and spatial variation of supply voltage in the power grid
Characterization and mitigation of process variation in digital circuits and systems
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.Cataloged from PDF version of thesis.Includes bibliographical references (p. 155-166).Process variation threatens to negate a whole generation of scaling in advanced process technologies due to performance and power spreads of greater than 30-50%. Mitigating this impact requires a thorough understanding of the variation sources, magnitudes and spatial components at the device, circuit and architectural levels. This thesis explores the impacts of variation at each of these levels and evaluates techniques to alleviate them in the context of digital circuits and systems. At the device level, we propose isolation and measurement of variation in the intrinsic threshold voltage of a MOSFET using sub-threshold leakage currents. Analysis of the measured data, from a test-chip implemented on a 0. 18[mu]m CMOS process, indicates that variation in MOSFET threshold voltage is a truly random process dependent only on device dimensions. Further decomposition of the observed variation reveals no systematic within-die variation components nor any spatial correlation. A second test-chip capable of characterizing spatial variation in digital circuits is developed and implemented in a 90nm triple-well CMOS process. Measured variation results show that the within-die component of variation is small at high voltages but is an increasing fraction of the total variation as power-supply voltage decreases. Once again, the data shows no evidence of within-die spatial correlation and only weak systematic components. Evaluation of adaptive body-biasing and voltage scaling as variation mitigation techniques proves voltage scaling is more effective in performance modification with reduced impact to idle power compared to body-biasing.(cont.) Finally, the addition of power-supply voltages in a massively parallel multicore processor is explored to reduce the energy required to cope with process variation. An analytic optimization framework is developed and analyzed; using a custom simulation methodology, total energy of a hypothetical 1K-core processor based on the RAW core is reduced by 6-16% with the addition of only a single voltage. Analysis of yield versus required energy demonstrates that a combination of disabling poor-performing cores and additional power-supply voltages results in an optimal trade-off between performance and energy.by Nigel Anthony Drego.Ph.D
Recommended from our members
Very-Large-Scale-Integration Circuit Techniques in Internet-of-Things Applications
Heading towards the era of Internet-of-things (IoT) means both opportunity and challenge for the circuit-design community. In a system where billions of devices are equipped with the ability to sense, compute, communicate with each other and perform tasks in a coordinated manner, security and power management are among the most critical challenges.
Physically unclonable function (PUF) emerges as an important security primitive in hardware-security applications; it provides an object-specific physical identifier hidden within the intrinsic device variations, which is hard to expose and reproduce by adversaries. Yet, designing a compact PUF robust to noise, temperature and voltage remains a challenge.
This thesis presents a novel PUF design approach based on a pair of ultra-compact analog circuits whose output is proportional to absolute temperature. The proposed approach is demonstrated through two works: (1) an ultra-compact and robust PUF based on voltage-compensated proportional-to-absolute-temperature voltage generators that occupies 8.3× less area than the previous work with the similar robustness and twice the robustness of the previously most compact PUF design and (2) a technique to transform a 6T-SRAM array into a robust analog PUF with minimal overhead. In this work, similar circuit topology is used to transform a preexisting on-chip SRAM into a PUF, which further reduces the area in (1) with no robustness penalty.
In this thesis, we also explore techniques for power management circuit design.
Energy harvesting is an essential functionality in an IoT sensor node, where battery replacement is cost-prohibitive or impractical. Yet, existing energy-harvesting power management units (EH PMU) suffer from efficiency loss in the two-step voltage conversion: harvester-to-battery and battery-to-load. We propose an EH PMU architecture with hybrid energy storage, where a capacitor is introduced in addition to the battery to serve as an intermediate energy buffer to minimize the battery involvement in the system energy flow. Test-case measurements show as much as a 2.2× improvement in the end-to-end energy efficiency.
In contrast, with the drastically reduced power consumption of IoT nodes that operates in the sub-threshold regime, adaptive dynamic voltage scaling (DVS) for supply-voltage margin removal, fully on-chip integration and high power conversion efficiency (PCE) are required in PMU designs. We present a PMU–load co-design based on a fully integrated switched-capacitor DC-DC converter (SC-DC) and hybrid error/replica-based regulation for a fully digital PMU control. The PMU is integrated with a neural spike processor (NSP) that achieves a record-low power consumption of 0.61 µW for 96 channels. A tunable replica circuit is added to assist the error regulation and prevent loss of regulation. With automatic energy-robustness co-optimization, the PMU can set the SC-DC’s optimal conversion ratio and switching frequency. The PMU achieves a PCE of 77.7% (72.2%) at VIN = 0.6 V (1 V) and at the NSP’s margin-free operating point
ULTRA ENERGY-EFFICIENT SUB-/NEAR-THRESHOLD COMPUTING: PLATFORM AND METHODOLOGY
Ph.DDOCTOR OF PHILOSOPH
Robust Design of Variation-Sensitive Digital Circuits
The nano-age has already begun, where typical feature dimensions are smaller than 100nm. The operating frequency is expected to increase up to
12 GHz, and a single chip will contain over 12 billion transistors in 2020, as given by the International Technology Roadmap for Semiconductors
(ITRS) initiative. ITRS also predicts that the scaling of CMOS devices and process technology, as it is known today, will become much more
difficult as the industry advances towards the 16nm technology node and further. This aggressive scaling of CMOS technology has pushed the
devices to their physical limits. Design goals are governed by several factors other than power, performance and area such as process
variations, radiation induced soft errors, and aging degradation mechanisms. These new design challenges have a strong impact on the parametric
yield of nanometer digital circuits and also result in functional yield losses in variation-sensitive digital circuits such as Static Random
Access Memory (SRAM) and flip-flops. Moreover, sub-threshold SRAM and flip-flops circuits, which are aggravated by the strong demand for lower
power consumption, show larger sensitivity to these challenges which reduces their robustness and yield. Accordingly, it is not surprising that
the ITRS considers variability and reliability as the most challenging obstacles for nanometer digital circuits robust design.
Soft errors are considered one of the main reliability and robustness concerns in SRAM arrays in sub-100nm technologies due to low operating
voltage, small node capacitance, and high packing density. The SRAM arrays soft errors immunity is also affected by process variations. We
develop statistical design-oriented soft errors immunity variations models for super-threshold and sub-threshold SRAM cells accounting for
die-to-die variations and within-die variations. This work provides new design insights and highlights the important design knobs that can be
used to reduce the SRAM cells soft errors immunity variations. The developed models are scalable, bias dependent, and only require the
knowledge of easily measurable parameters. This makes them useful in early design exploration, circuit optimization as well as technology
prediction. The derived models are verified using Monte Carlo SPICE simulations, referring to an industrial hardware-calibrated 65nm CMOS
technology.
The demand for higher performance leads to very deep pipelining which means that hundreds of thousands of flip-flops are required to control
the data flow under strict timing constraints. A violation of the timing constraints at a flip-flop can result in latching incorrect data
causing the overall system to malfunction. In addition, the flip-flops power dissipation represents a considerable fraction of the total power
dissipation. Sub-threshold flip-flops are considered the most energy efficient solution for low power applications in which, performance is of
secondary importance. Accordingly, statistical gate sizing is conducted to different flip-flops topologies for timing yield improvement of
super-threshold flip-flops and power yield improvement of sub-threshold flip-flops. Following that, a comparative analysis between these
flip-flops topologies considering the required overhead for yield improvement is performed. This comparative analysis provides useful
recommendations that help flip-flops designers on selecting the best flip-flops topology that satisfies their system specifications while
taking the process variations impact and robustness requirements into account.
Adaptive Body Bias (ABB) allows the tuning of the transistor threshold voltage, Vt, by controlling the transistor body voltage. A forward
body bias reduces Vt, increasing the device speed at the expense of increased leakage power. Alternatively, a reverse body bias increases
Vt, reducing the leakage power but slowing the device. Therefore, the impact of process variations is mitigated by speeding up slow and
less leaky devices or slowing down devices that are fast and highly leaky. Practically, the implementation of the ABB is desirable to bias each
device in a design independently, to mitigate within-die variations. However, supplying so many separate voltages inside a die results in a
large area overhead. On the other hand, using the same body bias for all devices on the same die limits its capability to compensate for
within-die variations. Thus, the granularity level of the ABB scheme is a trade-off between the within-die variations compensation capability
and the associated area overhead. This work introduces new ABB circuits that exhibit lower area overhead by a factor of 143X than that of
previous ABB circuits. In addition, these ABB circuits are resolution free since no digital-to-analog converters or analog-to-digital
converters are required on their implementations. These ABB circuits are adopted to high performance critical paths, emulating a real
microprocessor architecture, for process variations compensation and also adopted to SRAM arrays, for Negative Bias Temperature Instability
(NBTI) aging and process variations compensation. The effectiveness of the new ABB circuits is verified by post layout simulation results and
test chip measurements using triple-well 65nm CMOS technology.
The highly capacitive nodes of wide fan-in dynamic circuits and SRAM bitlines limit the performance of these circuits. In addition, process
variations mitigation by statistical gate sizing increases this capacitance further and fails in achieving the target yield improvement. We
propose new negative capacitance circuits that reduce the overall parasitic capacitance of these highly capacitive nodes. These negative
capacitance circuits are adopted to wide fan-in dynamic circuits for timing yield improvement up to 99.87% and to SRAM arrays for read access
yield improvement up to 100%. The area and power overheads of these new negative capacitance circuits are amortized over the large die area of
the microprocessor and the SRAM array. The effectiveness of the new negative capacitance circuits is verified by post layout simulation results
and test chip measurements using 65nm CMOS technology
Microarchitectural Low-Power Design Techniques for Embedded Microprocessors
With the omnipresence of embedded processing in all forms of electronics today, there is a strong trend towards wireless, battery-powered, portable embedded systems which have to operate under stringent energy constraints. Consequently, low power consumption and high energy efficiency have emerged as the two key criteria for embedded microprocessor design. In this thesis we present a range of microarchitectural low-power design techniques which enable the increase of performance for embedded microprocessors and/or the reduction of energy consumption, e.g., through voltage scaling. In the context of cryptographic applications, we explore the effectiveness of instruction set extensions (ISEs) for a range of different cryptographic hash functions (SHA-3 candidates) on a 16-bit microcontroller architecture (PIC24). Specifically, we demonstrate the effectiveness of light-weight ISEs based on lookup table integration and microcoded instructions using finite state machines for operand and address generation. On-node processing in autonomous wireless sensor node devices requires deeply embedded cores with extremely low power consumption. To address this need, we present TamaRISC, a custom-designed ISA with a corresponding ultra-low-power microarchitecture implementation. The TamaRISC architecture is employed in conjunction with an ISE and standard cell memories to design a sub-threshold capable processor system targeted at compressed sensing applications. We furthermore employ TamaRISC in a hybrid SIMD/MIMD multi-core architecture targeted at moderate to high processing requirements (> 1 MOPS). A range of different microarchitectural techniques for efficient memory organization are presented. Specifically, we introduce a configurable data memory mapping technique for private and shared access, as well as instruction broadcast together with synchronized code execution based on checkpointing. We then study an inherent suboptimality due to the worst-case design principle in synchronous circuits, and introduce the concept of dynamic timing margins. We show that dynamic timing margins exist in microprocessor circuits, and that these margins are to a large extent state-dependent and that they are correlated to the sequences of instruction types which are executed within the processor pipeline. To perform this analysis we propose a circuit/processor characterization flow and tool called dynamic timing analysis. Moreover, this flow is employed in order to devise a high-level instruction set simulation environment for impact-evaluation of timing errors on application performance. The presented approach improves the state of the art significantly in terms of simulation accuracy through the use of statistical fault injection. The dynamic timing margins in microprocessors are then systematically exploited for throughput improvements or energy reductions via our proposed instruction-based dynamic clock adjustment (DCA) technique. To this end, we introduce a 6-stage 32-bit microprocessor with cycle-by-cycle DCA. Besides a comprehensive design flow and simulation environment for evaluation of the DCA approach, we additionally present a silicon prototype of a DCA-enabled OpenRISC microarchitecture fabricated in 28 nm FD-SOI CMOS. The test chip includes a suitable clock generation unit which allows for cycle-by-cycle DCA over a wide range with fine granularity at frequencies exceeding 1 GHz. Measurement results of speedups and power reductions are provided
Choose-Your-Own Adventure: A Lightweight, High-Performance Approach To Defect And Variation Mitigation In Reconfigurable Logic
For field-programmable gate arrays (FPGAs), fine-grained pre-computed alternative configurations, combined with simple test-based selection, produce limited per-chip specialization to counter yield loss, increased delay, and increased energy costs that come from fabrication defects and variation. This lightweight approach achieves much of the benefit of knowledge-based full specialization while reducing to practical, palatable levels the computational, testing, and load-time costs that obstruct the application of the knowledge-based approach. In practice this may more than double the power-limited computational capabilities of dies fabricated with 22nm technologies.
Contributions of this work:
• Choose-Your-own-Adventure (CYA), a novel, lightweight, scalable methodology to achieve defect and variation mitigation
• Implementation of CYA, including preparatory components (generation of diverse alternative paths) and FPGA load-time components
• Detailed performance characterization of CYA
– Comparison to conventional loading and dynamic frequency and voltage scaling (DFVS)
– Limit studies to characterize the quality of the CYA implementation and identify potential areas for further optimizatio
Energy Reduction Techniques to Increase Battery Life for Electronic Sensor Nodes
Preserving battery life in duty-cycled sensor nodes requires minimizing energy use in the active region. Lowering the power supply of CMOS gates down into sub-threshold mode is a good way to decrease energy. In this work, a unique technique to control the current in CMOS gates to reliably operate them in sub-threshold mode is described. Compared to the current state-of-theart for running digital gates in the sub-threshold regime, this work is often superior in its lack of complexity and in reduced variance in delay caused by process variations. In addition to presenting the design considerations, a demonstration of a complete digital design flow is given using the custom gates. An AES128 encryption/decryption engine is designed using the aforementioned digital flow in a commercial 180nm process. The resulting design has a ratio of maximum to minimum frequency variation over corners of only 50% with a 0.3V power supply where the same ratio with standard CMOS gates biased under the same supply voltage is 5600%. In addition, the custom gates are used to design a Wallace tree multiplier in an SOI 45nm process that is fully functional with an optimum energy power supply level of 0.34V with a typical operating frequency of 8 MHz having a variation over corners of 80%.
For a proof of concept memory chip designed in this work, the architecture uses a logiccompatible CMOS process particularly suitable for embedded applications. The differential pair construct causes the read and refresh power to be independent of any process parameter including the within-die threshold voltage. The current stop feature keeps the read voltage transition low to further minimize read power. The bit cell operates in both single bit BASE2 and multi-bit BASE4 modes. An expression for the read signal was verified with bit cell simulations. These simulations also compare the performance impact of threshold voltage variance in the architecture with a standard gain cell. A DRAM bit cell array was fabricated in the XFab 180nm CMOS process. Measured waveforms closely match theoretical results obtained from a system simulation. The silicon retention time was measured at room temperature and is greater than 150 ms in BASE2 mode and greater than 75 ms in BASE4 mode. 180nm, 25C analysis predicts 0.8uW/Mbit refresh power at 630 MHz, the lowest in the literature. Further: the memory bit cell architecture presented here has a refresh power delay product several times lower than any other published architecture.
The current controlled memory architecture in this work improves or overcomes the drawbacks of the 1T1C and gain cell memory architectures. A current controlled memory design was fabricated as a 131K bit array in an 180nm process to provide silicon proof. The bit cell configuration with shared read and write bit cells gives effectively two memory banks. The grouping of rows together into common source domains allows only two opamps to control the current in all the bit cells across the whole chip. The sense amplifiers have a globally controlled switching threshold point and keep their static power in the nano-amp range. The bit cells can operate either in BASE2 or BASE4 mode and the read bit line transitions are reduced with a current stop construct. Parts were received from the foundry in an 84-pin PLCC and were tested at a number of locations on the die. They proved to be fully functional in BASE4. The silicon retention time was measured at room temperature and was greater than four seconds
- …