This special section consists of six articles covering a range of important issues: the minimum-energy point of nanometer MOSFET circuits, NBTI-aware power gating (NBTI: negative bias temperature instability), low-power behavioral synthesis, memory architecture for 3D integration, virtual machine management for green computing, and energy-efficient firmware updates for networked embedded systems.
Nanometer MOSFET Effects on Minimum-Energy Point
Lowering the supply voltage reduces dynamic energy per operation quadratically. However, CMOS technology scaling has increased leakage current, short-channel effects, and process variation. Because lowering the supply voltage also lengthens gate delay, each operation takes longer to complete, and the leakage energy accumulated over this longer execution time grows. Therefore, there exists an optimum supply voltage that achieves the minimum energy per operation [Calhoun et al. 2005]. This minimum-energy point concept is widely applied in leakage-aware Dynamic Voltage Scaling (DVS) [Calhoun and Chandrakasan 2006; Zhai et al. 2005].
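The tradeoff can be made concrete with a first-order subthreshold model of the kind used in minimum-energy analyses. The sketch below is only an illustration: it takes dynamic energy as C·V², and leakage energy as leakage current × V × operation time, with the operation time proportional to C·V/I_on. In subthreshold, I_leak/I_on falls as exp(-V/(n·v_T)), so leakage energy per operation scales as C·V²·exp(-V/(n·v_T)). The constants `C_EFF`, `K_LEAK`, and `N_SLOPE`, and the 0.1 V lower sweep limit, are hypothetical normalized values, not data from the article.

```python
import math

# Hypothetical, normalized model parameters (not taken from the article).
C_EFF = 1.0        # effective switched capacitance per operation
K_LEAK = 1000.0    # leakage-energy prefactor relative to C_EFF (assumed)
N_SLOPE = 1.5      # subthreshold slope factor n
V_THERMAL = 0.026  # thermal voltage kT/q at about 300 K, in volts


def energy_per_op(vdd):
    """Return (dynamic, leakage, total) energy per operation, normalized."""
    e_dyn = C_EFF * vdd ** 2
    # Leakage energy = I_leak * V * t_op with t_op ~ C * V / I_on. In subthreshold,
    # I_leak / I_on ~ exp(-V / (n * v_T)), giving C * V^2 * exp(-V / (n * v_T)).
    e_leak = C_EFF * K_LEAK * vdd ** 2 * math.exp(-vdd / (N_SLOPE * V_THERMAL))
    return e_dyn, e_leak, e_dyn + e_leak


def minimum_energy_point(v_lo=0.1, v_hi=1.2, steps=1100):
    """Brute-force sweep of V_DD over [v_lo, v_hi]; returns (v_opt, e_opt).

    v_lo stands in for the lowest supply at which the logic still works;
    this simple model does not capture functional failure by itself.
    """
    best_v, best_e = None, math.inf
    for i in range(steps + 1):
        v = v_lo + (v_hi - v_lo) * i / steps
        e = energy_per_op(v)[2]
        if e < best_e:
            best_v, best_e = v, e
    return best_v, best_e


v_opt, e_opt = minimum_energy_point()
print(f"minimum-energy V_DD ~ {v_opt:.3f} V, energy {e_opt:.3f} (normalized)")
```

Above the optimum, dynamic energy dominates and falls as V_DD is lowered; below it, leakage energy per operation rises steeply and soon dominates.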
Beyond high-performance applications, there is strong demand for ultra-low-power applications fabricated in nanometer CMOS technologies, such as Radio-Frequency IDentification (RFID) tags, biomedical devices, and sensor networks [Kwong et al. 2009; Kaul et al. 2010; Pu et al. 2009]. Subthreshold logic achieves ultra-low-power operation at the cost of speed [Soeleman and Roy 1999]. For subthreshold logic, operating at the minimum-energy point is especially important, because gate delays are long and leakage accounts for a large share of the total power.
It is widely known that variability cannot be neglected in nanometer technologies, especially threshold voltage variation due to Random Dopant Fluctuations (RDF). Consequently, variability strongly affects how the minimum-energy operating point scales [Bol et al. 2009]. The article entitled "Nanometer MOSFET Effects on the Minimum-Energy Point of Sub-45nm Subthreshold Logic - Mitigation at Technology and Circuit Levels" analyzes the effects that increase the minimum energy per operation in nanometer technologies. It also proposes two directions for mitigating these effects at the 45nm node to keep the minimum-energy operating point under control: optimal MOSFET selection for circuit designers, and fully depleted Silicon-On-Insulator (SOI) for technology developers. The results show that using low-threshold-voltage, medium-gate-length MOSFETs in a 45nm technology yields a 35% reduction in minimum energy per operation compared with baseline MOSFETs.
NBTI and NBTI-Aware Design
Time-dependent degradation of semiconductor devices is now much more difficult to handle than process variation, because the deviation cannot be detected at the manufacturing stage but develops later. Bias Temperature Instability (BTI) degrades the gate oxide, causing the threshold voltage to drift over time. This drift in turn changes performance and leakage power. Negative BTI (NBTI) affects PMOS transistors, while Positive BTI (PBTI) affects NMOS devices. A PMOS transistor degrades due to NBTI while its gate input is at logic "0" (negative gate-to-source bias), and partly recovers while the input is at logic "1". Because this recovery is limited, NBTI eventually causes irreversible effects that raise reliability issues. Since PMOS transistors in current technologies are mostly operated under negative bias, NBTI has a more direct impact on reliability than PBTI. NBTI increases the magnitude of the threshold voltage over time, which decreases drain current and transconductance. The degradation exhibits a logarithmic dependence on time, and experimental data report threshold voltage variations of about 10-15% per year.
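To illustrate what a logarithmic time dependence implies, the sketch below uses a hypothetical law ΔV_th = A·ln(1 + p·t/t0), where p is the fraction of time the PMOS gate input sits at logic "0". Treating the effective stress time as p·t is a crude assumption that folds partial recovery into the duty cycle; `A_VOLTS` and `T0_SECONDS` are made-up constants chosen so that a permanently stressed device drifts by about 13% of a 0.4 V threshold after one year, within the range quoted above.

```python
import math

A_VOLTS = 0.009      # hypothetical fitting constant, volts
T0_SECONDS = 1e5     # hypothetical time constant, seconds
YEAR = 365 * 24 * 3600


def delta_vth(p_stress, t_seconds):
    """Threshold-voltage shift (V) under a hypothetical logarithmic NBTI law.

    p_stress: fraction of time the PMOS gate input is at logic "0" (stress).
    """
    return A_VOLTS * math.log1p(p_stress * t_seconds / T0_SECONDS)


for p in (1.0, 0.5, 0.1):
    for years in (1, 2, 10):
        shift = delta_vth(p, years * YEAR)
        print(f"p={p:.1f}  {years:2d} y  dVth={shift * 1000:5.1f} mV  ({shift / 0.4:.1%} of 0.4 V)")
```

Because of the logarithm, most of the shift appears early: extending the stress from one year to two adds far less than the first year did.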
In practice, some PMOS transistors commonly see highly biased inputs, that is, inputs that stay at logic "0" most of the time, and these transistors degrade very quickly [Abella et al. 2007]. NBTI-induced aging can therefore be limited by keeping most circuit signals at logic "1", through proper implementation of the local functions and/or by isolating nodes with a high probability of being "0" during technology mapping [Kumar et al. 2007]. In standby mode, applying minimum-leakage input vectors can both minimize circuit performance degradation and maintain the maximum leakage reduction [Wang et al. 2007].
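Finding nodes with a high zero probability starts from signal probabilities. The sketch below propagates P(signal = 1) through a small gate-level netlist under the usual simplifying assumption that gate inputs are statistically independent (which ignores reconvergent fanout), then flags nodes whose zero probability exceeds a threshold. The netlist format, the function names, and the 0.8 threshold are hypothetical; this is not the procedure of [Kumar et al. 2007].

```python
import math


def signal_probabilities(gates, primary):
    """Propagate P(signal == 1), assuming independent gate inputs.

    gates: list of (output, op, inputs) in topological order.
    primary: primary input -> P(input == 1).
    """
    p1 = dict(primary)
    for out, op, ins in gates:
        probs = [p1[i] for i in ins]
        if op == "NOT":
            p1[out] = 1.0 - probs[0]
        elif op == "AND":
            p1[out] = math.prod(probs)
        elif op == "NAND":
            p1[out] = 1.0 - math.prod(probs)
        elif op == "OR":
            p1[out] = 1.0 - math.prod(1.0 - p for p in probs)
        elif op == "NOR":
            p1[out] = math.prod(1.0 - p for p in probs)
        else:
            raise ValueError(f"unsupported gate {op}")
    return p1


def nbti_critical_nodes(p1, threshold=0.8):
    """Nodes that are "0" more often than the (hypothetical) threshold; PMOS
    transistors driven by them are under NBTI stress most of the time."""
    return sorted(n for n, p in p1.items() if 1.0 - p > threshold)


primary = {"a": 0.5, "b": 0.5, "c": 0.5}
netlist = [("n1", "AND", ["a", "b"]), ("n2", "AND", ["n1", "c"]), ("n3", "NOR", ["a", "c"])]
probs = signal_probabilities(netlist, primary)
print(nbti_critical_nodes(probs))  # ['n2']
```

The three-input AND chain drives n2 to "0" 87.5% of the time, which makes it the natural candidate for restructuring or isolation during mapping.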
NBTI also affects the reliability of sequential circuits. It tightens the setup and hold timing constraints imposed on flip-flops, and different types of flip-flops exhibit different levels of susceptibility to NBTI-induced changes in their setup/hold times. A dedicated NBTI-aware transistor sizing technique can minimize the NBTI effect on the timing characteristics of flip-flops [Abrishami et al. 2008]. Memory-like structures, in turn, have a special NBTI characteristic. Because a bit cell consists of two cross-coupled inverters, one of the inverters always has a logic "0" at its input, so its PMOS transistor is under negative bias and NBTI degradation is unavoidable. Since PMOS degradation cannot be eliminated in memory structures, the next best case occurs when the output of each inverter is "0" 50% of the time [Abrishami et al. 2008].
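The 50% condition can be checked with simple bookkeeping. In the sketch below, `history` records the value of storage node Q in each time slot; the PMOS of the inverter driven by Q is stressed while Q is "0", and the PMOS of the inverter driven by QB while Q is "1". The periodic inversion scheme (storing the complement in alternate epochs, with the logical value tracked elsewhere) is a hypothetical illustration of how a workload that always stores "1" could be balanced, not a technique attributed to [Abrishami et al. 2008].

```python
def pmos_stress(history):
    """Fraction of time each PMOS in a 6T cell is under NBTI stress.

    history: stored value of node Q (0 or 1) in each time slot.
    """
    n = len(history)
    ones = sum(history)
    return {
        "pmos_gate_Q": (n - ones) / n,   # stressed while Q == 0
        "pmos_gate_QB": ones / n,        # stressed while QB == 0, i.e. Q == 1
    }


def with_periodic_inversion(history, period):
    """Hypothetical scheme: store the complement during every other epoch of
    `period` slots; the logical value is assumed to be tracked separately."""
    return [bit ^ ((t // period) % 2) for t, bit in enumerate(history)]


always_one = [1] * 8
print(pmos_stress(always_one))                                  # {'pmos_gate_Q': 0.0, 'pmos_gate_QB': 1.0}
print(pmos_stress(with_periodic_inversion(always_one, period=2)))  # {'pmos_gate_Q': 0.5, 'pmos_gate_QB': 0.5}
```

Without balancing, one PMOS is stressed all the time; with the inversion, the worst-case duty cycle drops to the 50% floor.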
The article entitled "NBTI-Aware Clustered Power Gating" proposes a new solution for jointly optimizing power consumption (static power, in particular) and aging, using power gating to mitigate NBTI. The authors observe that power gating, as conventionally implemented for leakage reduction, naturally reduces NBTI effects as well: when a circuit is power-gated in a standby state, it is intrinsically immune to NBTI-induced aging. However, although power gating is always advantageous from the aging perspective, its actual effectiveness in terms of performance depends on how it is implemented.
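A back-of-the-envelope model shows why standby helps. Suppose, as a simplifying assumption, that NBTI drift depends only on accumulated stress time p·t, for instance through a hypothetical logarithmic law ΔV_th = A·ln(1 + p·t/t0). Then the time to reach a failure threshold scales as 1/p, and gating the circuit off for a fraction g of its life stretches lifetime by 1/(1 - g). The constants and function names below are hypothetical, and the model ignores the delay penalty introduced by the power-gating implementation.

```python
import math


def nbti_lifetime(p_stress, dvth_max=0.04, a=0.009, t0=1e5):
    """Seconds until dVth = a * ln(1 + p * t / t0) reaches dvth_max.

    All constants are hypothetical; p_stress is the fraction of time the
    PMOS gate is at logic "0" while the circuit is powered.
    """
    if p_stress <= 0:
        return math.inf  # never stressed, never reaches the threshold
    return t0 * math.expm1(dvth_max / a) / p_stress


def lifetime_extension(p_stress, gated_fraction):
    """Lifetime ratio when power gating removes stress for gated_fraction of the time."""
    return nbti_lifetime(p_stress * (1.0 - gated_fraction)) / nbti_lifetime(p_stress)


for g in (0.0, 0.25, 0.5, 0.75):
    print(f"gated {g:.0%} of the time -> lifetime x{lifetime_extension(0.6, g):.2f}")
```

Under this model the extension depends only on the gated fraction, not on the fitting constants; real designs must weigh it against the performance cost of the sleep transistors.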
The authors develop a clustered power gating strategy that allows better control of the initial delay degradation. They analyze the tradeoffs with an automated tool that leverages standard EDA tools in an industry-strength design flow, using an industrial 45nm technology. Results on a set of standard benchmarks show average lifetime extensions of about 2.2X while saving, on average, 87% of leakage power with respect to the non-power-gated circuit.
Low-Power Behavioral Synthesis
Many low-power techniques in RTL synthesis rely on Observability Don't-Care (ODC) conditions, which identify when the value of a signal cannot affect any output and hence which operations in a Boolean network are unnecessary [De Micheli 1994; Devadas et al. 1994; Hassoun and Sasao 2002]. ODC conditions are widely used to eliminate unnecessary operations in digital logic, and in behavioral synthesis they help produce power-friendly RTL descriptions. Operation gating explicitly adds a predicate to an operation based on its ODC condition, so that RTL power optimization techniques such as clock gating can avoid the operation when its result is not needed.
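As a minimal illustration of operation gating, consider y = s ? a·b : c. The product a·b is unobservable whenever s = 0, so the ODC condition of the multiplier is s = 0 and its gating predicate is s = 1. The sketch below, with a hypothetical `evaluate` function and input stream, simulates the datapath cycle by cycle with and without the predicate and counts how often the multiplier actually fires; in hardware, the predicate would instead drive, for example, a clock-gating enable.

```python
def evaluate(stream, gated):
    """Simulate y = s ? a*b : c over a stream of (s, a, b, c) tuples.

    With gating, the multiplier runs only when its predicate (s == 1) holds;
    otherwise its result is unobservable and the operation is skipped.
    Returns (outputs, number_of_multiplications).
    """
    outputs, mults = [], 0
    for s, a, b, c in stream:
        if gated and not s:
            product = None  # multiplier idle this cycle
        else:
            product = a * b
            mults += 1
        outputs.append(product if s else c)
    return outputs, mults


stream = [(1, 2, 3, 9), (0, 4, 5, 7), (0, 1, 1, 8), (1, 6, 7, 0)]
print(evaluate(stream, gated=False))  # ([6, 7, 8, 42], 4)
print(evaluate(stream, gated=True))   # ([6, 7, 8, 42], 2)
```

The outputs are identical, but the gated version performs only the multiplications whose results are observed.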
The problem of computing ODC conditions in a sequential RTL model has been approached in a number of ways. The article entitled "Behavior-Level Observability Analysis for Operation Gating in Low-Power Behavioral Synthesis" introduces the concept of behavior-level observability and its approximations in the context of behavioral synthesis. It also describes an efficient procedure to compute an approximated behavior-level observability of every operation in a dataflow graph. Unlike previous techniques, which work at the bit level in Boolean networks, the proposed method performs the analysis at the word level, and thus greatly reduces computation overhead at the cost of a reasonable approximation.
The proposed algorithm exploits the observability-masking nature of some Boolean operations, as well as of the select operation, and allows certain other forms of knowledge, once uncovered, to be taken into account for stronger observability conditions. The approximation is proved optimal for (acyclic) dataflow graphs when non-Boolean operations other than select are treated as black boxes. The behavior-level observability conditions obtained by the algorithm can guide the operation scheduler to make operation gating more effective; the authors report an average 33.9% reduction in total power.
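The backward propagation of observability through select, Boolean, and black-box operations can be sketched on a small acyclic dataflow graph. The code below is a simplified illustration, not the article's algorithm: conditions are kept as disjunctions of conjunctions of literals (signal, value) over 1-bit control signals, select and 1-bit Boolean AND/OR mask their data inputs, and every other operation is a black box whose inputs simply inherit its observability. The graph encoding and names are hypothetical, and no logic minimization beyond dropping contradictory terms is attempted.

```python
TRUE = [frozenset()]   # DNF with one empty conjunction: always observable
FALSE = []             # DNF with no terms: never observable


def conj(dnf, literal):
    """AND a literal (signal, value) into every term, dropping contradictions."""
    sig, val = literal
    return [term | {literal} for term in dnf if (sig, 1 - val) not in term]


def disj(a, b):
    """OR two DNFs; any empty term makes the whole condition TRUE."""
    result = list(a) + [t for t in b if t not in a]
    return list(TRUE) if frozenset() in result else result


def observability(graph, order, outputs):
    """graph: node -> (op, inputs); order: topological order (inputs first)."""
    obs = {node: list(FALSE) for node in graph}
    for out in outputs:
        obs[out] = list(TRUE)
    for node in reversed(order):          # every consumer is visited first
        op, ins = graph[node]
        cond = obs[node]
        if not cond:
            continue                      # never observable: nothing to propagate
        if op == "select":                # select(s, t, f) = s ? t : f
            s, t, f = ins
            obs[s] = disj(obs[s], cond)
            obs[t] = disj(obs[t], conj(cond, (s, 1)))
            obs[f] = disj(obs[f], conj(cond, (s, 0)))
        elif op in ("and", "or"):         # 1-bit Boolean ops mask each other
            x, y = ins
            noncontrolling = 1 if op == "and" else 0
            obs[x] = disj(obs[x], conj(cond, (y, noncontrolling)))
            obs[y] = disj(obs[y], conj(cond, (x, noncontrolling)))
        else:                             # black box (e.g. "mul"); inputs inherit
            for i in ins:
                obs[i] = disj(obs[i], cond)
    return obs


graph = {
    "a": ("input", []), "b": ("input", []), "c": ("input", []), "s": ("input", []),
    "m": ("mul", ["a", "b"]),
    "y": ("select", ["s", "m", "c"]),
}
obs = observability(graph, ["a", "b", "c", "s", "m", "y"], ["y"])
print(obs["m"])  # [frozenset({('s', 1)})]: the multiplier can be gated when s == 0
```

A scheduler that uses this condition would try to compute s before m, so that the predicate is available in time to gate the multiplier.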
3D Architectures
In the past, migrating to a new silicon technology node brought advantages in many design criteria. However, silicon scaling will eventually come to an end as feature sizes reach physical limits. A promising means to sustain scaling for some time is 3D integration, that is, utilizing the third dimension in the form of, for example, stacked silicon dies with direct vertical interconnects [Xie et al. 2006; Loh et al. 2007]. 3D architectures promise advantages in several key architectural concerns, such as decreased on-chip communication latency (since the layer-to-layer distance is very short), very high integration density, and an additional dimension that opens new opportunities for routing signals on a chip. However, disadvantages also exist. The most important is heat dissipation, since the number of transistors per unit of surface area, and hence the thermal density, is far larger than in conventional 2D architectures. At present, the first 3D architectures have been successfully built, and CAD tool chains are increasingly being developed to address the specific problems associated with 3D integration and to exploit its advantages.
The article entitled "Low-Power Hypercube Divided Memory FFT Engine Using 3D Integration" aims at exploiting the architectural advantages of 3D integration. It presents an FFT (Fast Fourier Transform) processor with a hypercube memory architecture that reduces power consumption and latency. The architecture is a memory-on-logic design using TSVs (Through-Silicon Vias). The basic idea is to divide the whole memory into smaller memories in order to avoid the long wires typically associated with large memories; the small memory blocks can then be accessed simultaneously. As a result, the authors report benefits in almost all relevant design aspects compared with a 2D implementation. In particular, memory access energy is significantly reduced, the total wire length is less than half, and the whole architecture fits onto a significantly smaller die area.
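One reason a hypercube suits an FFT is visible in the radix-2 butterfly pattern: at stage s, element i is combined with element i XOR 2^s, so butterfly partners always differ in exactly one index bit. If, purely as a hypothetical mapping, the top d index bits select one of 2^d memory blocks, every butterfly reads either from a single block or from two blocks whose IDs differ in one bit, that is, hypercube neighbors. The sketch checks this property for a power-of-two FFT size; it does not reproduce the article's actual memory organization.

```python
def butterfly_pairs(n):
    """Index pairs combined at each stage of an iterative radix-2 FFT of size n
    (n must be a power of two)."""
    stages = n.bit_length() - 1              # log2(n)
    return [[(i, i ^ (1 << s)) for i in range(n) if not i & (1 << s)]
            for s in range(stages)]


def memory_of(i, n, d):
    """Hypothetical mapping: the top d bits of index i pick one of 2**d memories."""
    return i >> (n.bit_length() - 1 - d)


def hamming(x, y):
    return bin(x ^ y).count("1")


N, D = 16, 2                                  # 16-point FFT, 4 memory blocks
for stage, pairs in enumerate(butterfly_pairs(N)):
    dists = {hamming(memory_of(i, N, D), memory_of(j, N, D)) for i, j in pairs}
    print(f"stage {stage}: memory-ID distances {sorted(dists)}")
```

Early stages stay inside one block, and later stages only ever talk to a single neighboring block, so no long, global wiring is needed between memories.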
Green Computing and Green Virtualization
Low power consumption is no longer a critical design consideration for battery-operated systems only. A report by the Environmental Protection Agency (EPA) states that the energy consumption of servers and data centers has doubled in the past five years and is expected to nearly double again in the next five years, to more than 100 billion kWh at a cost of about $7.4 billion annually [Pakbaznia and Pedram 2009]. Reducing the power of data centers is therefore crucial, even though they are powered by the grid. A number of different techniques are currently employed to reduce the energy cost and power density of data centers.
Because of the high power density of data centers, about 30% of a data center's total energy cost goes to cooling [Pelley et al. 2009; Rasmussen 2007]. The hot-aisle/cold-aisle layout, now common practice, is one attempt to improve the cooling efficiency of data centers [Patel et al. 2002]. About 10-15% of the power is lost in power distribution and conversion within the data center [Pelley et al. 2009]. DC power delivery systems are viable, can improve efficiency by 20% or more over current AC delivery systems, can be more reliable, and potentially cost less in the long run [Ton et al. 2008].
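The two percentages above imply a rough bound on how much of a facility's energy actually reaches the servers. The sketch assumes, for illustration only, that both figures are fractions of total facility energy and that everything else is IT load; it then computes the IT share and the corresponding Power Usage Effectiveness (PUE, total facility energy divided by IT equipment energy, a metric defined by The Green Grid).

```python
def it_share_and_pue(cooling_frac, distribution_loss_frac):
    """Return (IT share of facility energy, PUE) under the stated assumptions.

    Both arguments are fractions of total facility energy; the remainder is
    assumed to be consumed by IT equipment (lighting etc. are ignored).
    """
    it_frac = 1.0 - cooling_frac - distribution_loss_frac
    if it_frac <= 0:
        raise ValueError("fractions leave no energy for IT equipment")
    return it_frac, 1.0 / it_frac


for loss in (0.10, 0.15):
    it, pue = it_share_and_pue(0.30, loss)
    print(f"distribution loss {loss:.0%}: IT share {it:.0%}, PUE ~ {pue:.2f}")
```

With 30% for cooling, this gives an IT share of roughly 55-60%, or a PUE of about 1.7-1.8, which is why both cooling and power delivery are targets for optimization.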
Recently, scheduling and consolidation have turned out to be efficient data center power management schemes that require no infrastructure investment or modification. There are numerous prior works on temperature-aware task scheduling [Moore et al. 2005; Tang et al. 2008], which can reduce server temperature and thus save cooling cost. Server consolidation, which assigns incoming tasks to the minimum number of active servers in the data center and shuts down unused servers, is another approach to reducing data center power [Intel 2006]. Consolidating virtual machines (VMs) originally distributed across many different servers onto a minimum number of servers enables effective power management.
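Assigning work to the fewest active servers is a bin-packing problem. As an illustration (not the method of [Intel 2006] or of vGreen), the sketch below applies the classic first-fit-decreasing heuristic to hypothetical VM CPU demands, expressed as integer percentages of one server's capacity so that the arithmetic stays exact; servers left without VMs could then be shut down.

```python
def consolidate(demands, capacity=100):
    """First-fit decreasing: place each VM, largest first, on the first
    active server with enough spare capacity; open a new server otherwise.

    demands: VM name -> CPU demand (same units as capacity).
    Returns a list of servers, each a list of VM names.
    """
    servers = []                                   # [spare_capacity, [vm, ...]]
    for vm, need in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        if need > capacity:
            raise ValueError(f"{vm} does not fit on any server")
        for server in servers:
            if server[0] >= need:
                server[0] -= need
                server[1].append(vm)
                break
        else:                                      # no active server fits
            servers.append([capacity - need, [vm]])
    return [vms for _, vms in servers]


demands = {"vm1": 50, "vm2": 50, "vm3": 40, "vm4": 30, "vm5": 30}
print(consolidate(demands))  # [['vm1', 'vm2'], ['vm3', 'vm4', 'vm5']]
```

Five VMs thus fit on two servers instead of five; real consolidation must also respect memory, I/O, and performance constraints, which this sketch ignores.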
The article entitled "vGreen: A System for Energy-Efficient Management of Virtual Machines" introduces vGreen, a multitiered software system that manages VM scheduling across physical machines with the objective of improving overall energy efficiency and performance. vGreen exploits the relationship between the architectural characteristics of a VM, such as instructions per cycle and memory accesses, and its performance and power consumption. This characterization is intended to keep VM management policies from being misled into decisions that create hot spots of activity and degrade overall performance and energy efficiency.
The article shows that vGreen can dynamically characterize VMs and accurately model their resource utilization levels. Based on these characteristics, it intelligently manages the VMs across the physical cluster by issuing VM migration or DVFS commands. This improves overall performance by up to 100% and energy efficiency by up to 55% compared with state-of-the-art VM scheduling and power management policies.
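The idea of avoiding hot spots by looking at architectural characteristics can be illustrated with a simple greedy balancer. The sketch below is not vGreen's policy: it takes a hypothetical per-VM memory-intensity metric (for example, memory accesses per thousand instructions) and places VMs, most intensive first, on the machine with the lowest accumulated intensity, a longest-processing-time style heuristic that keeps memory-heavy VMs apart.

```python
def balance_placement(intensity, n_machines):
    """Greedy placement that spreads a per-VM metric evenly over machines.

    intensity: VM name -> hypothetical memory-intensity metric.
    Returns (placement: VM -> machine index, per-machine accumulated load).
    """
    loads = [0] * n_machines
    placement = {}
    for vm, value in sorted(intensity.items(), key=lambda kv: kv[1], reverse=True):
        target = min(range(n_machines), key=lambda k: loads[k])  # first least-loaded
        placement[vm] = target
        loads[target] += value
    return placement, loads


placement, loads = balance_placement({"db": 30, "cache": 25, "web": 5, "batch": 2}, 2)
print(placement, loads)  # {'db': 0, 'cache': 1, 'web': 1, 'batch': 0} [32, 30]
```

The two memory-heavy VMs end up on different machines, which is the kind of co-location decision a characterization-aware scheduler is meant to get right.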
Networked Embedded Systems
Networked embedded systems have gained significant relevance within the last decade. Application areas are manifold and range from automotive embedded systems for safety-critical functions to wireless sensor networks that, for instance, monitor environmental data with thousands of nodes distributed over many square miles. In many networked embedded systems, low power is a prime design constraint. In the automotive field, for example, low-power ECUs (Electronic Control Units) allow a smaller battery, which reduces the weight the vehicle has to move and thereby its carbon dioxide footprint. In wireless sensor networks, on the other hand, low power consumption often determines the lifetime of the entire system, since the energy may be supplied by a primary battery. Once the battery is depleted, the system may no longer be usable, because recharging is too expensive and energy harvesting [Raghunathan and Chou 2006] is often not possible.
The article entitled "Energy-Efficient Progressive Remote Update for Flash-Based Firmware of Networked Embedded Systems" presents a novel and efficient technique for updating the firmware of networked embedded systems. Updating firmware is expensive, both in terms of the system's limited bandwidth and in terms of energy. Besides the energy consumed by the radio, this includes the energy consumed by the flash memory, which typically requires a whole page to be erased before it can be rewritten. This work therefore introduces a technique, applied when the firmware is linked, that reduces the number of caller-to-callee dependencies crossing flash page boundaries and hence reduces the number of full-page erase cycles.
The approach encompasses two techniques: (i) grouping functions with strong caller-to-callee dependencies into the same page and (ii) reordering the function calls within a page. In addition, the authors apply a sophisticated patching method to the firmware code that reduces the otherwise necessary erasing and rewriting of entire pages. As a result, the authors report significant energy savings compared with the current state of the art in flash-based firmware updates.
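Technique (i) can be sketched as greedy clustering on a weighted call graph: merge the caller and callee of the heaviest call edges first, as long as the merged group still fits in one flash page, then count the call weight that still crosses page boundaries. The function sizes, call weights, 1024-byte page size, and function names are hypothetical; the sketch ignores step (ii) and the patching method, and it is not the article's algorithm.

```python
def group_functions(sizes, calls, page_size):
    """Greedily merge caller/callee groups along the heaviest call edges first,
    as long as the merged group still fits in one flash page.

    sizes: function -> code size in bytes; calls: (caller, callee) -> weight.
    Returns a list of pages, each a list of function names.
    """
    group = {f: f for f in sizes}          # function -> group id
    members = {f: [f] for f in sizes}      # group id -> functions
    used = dict(sizes)                     # group id -> bytes used
    for (caller, callee), _ in sorted(calls.items(), key=lambda kv: kv[1], reverse=True):
        a, b = group[caller], group[callee]
        if a == b or used[a] + used[b] > page_size:
            continue                       # already together, or would overflow
        for f in members[b]:
            group[f] = a
        members[a].extend(members.pop(b))
        used[a] += used.pop(b)
    return list(members.values())


def cross_page_weight(pages, calls):
    """Total weight of calls whose caller and callee sit on different pages."""
    page_of = {f: i for i, fs in enumerate(pages) for f in fs}
    return sum(w for (u, v), w in calls.items() if page_of[u] != page_of[v])


sizes = {"main": 300, "radio": 500, "sense": 400, "log": 200, "filter": 300}
calls = {("main", "sense"): 10, ("sense", "filter"): 8, ("main", "radio"): 5,
         ("radio", "log"): 3, ("main", "log"): 1}
pages = group_functions(sizes, calls, page_size=1024)
naive = [["main", "radio"], ["sense", "log", "filter"]]   # declaration-order packing
print(pages, cross_page_weight(pages, calls), cross_page_weight(naive, calls))
```

In this toy example the cross-page call weight drops from 14 with declaration-order packing to 6, so a change in a frequently called function is less likely to force updates to calls on other pages.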
