Energy-efficient systems for information transfer and processing

Abstract

Machine learning (ML) systems are proving highly effective at tackling the data deluge of the big data era, thanks to the exponential increase in computing power. Current ML systems adopt either centralized cloud computing or distributed edge computing, and in both settings energy efficiency has been drawing increased attention. In cloud computing, data transfer due to inter-chip, inter-board, inter-shelf, and inter-rack communications (the I/O interface) within data centers is one of the dominant energy costs, and it will intensify with the growing demand for I/O bandwidth in high-performance computing. In edge computing, energy efficiency is the primary design challenge, as mobile devices have limited energy, computation, and storage resources. This challenge is exacerbated by the need to embed ML algorithms, such as convolutional neural networks (CNNs), to enable local on-device inference. This dissertation investigates techniques to address both challenges.

To address the energy efficiency challenge in data centers, this dissertation focuses on reducing the energy consumption of the I/O interface. In emerging analog-to-digital converter (ADC)-based multi-Gb/s serial link receivers, power dissipation is dominated by the ADC. ADCs in serial links conventionally employ the signal-to-noise-and-distortion ratio (SNDR) and the effective number of bits (ENOB) as performance metrics, because these are the standard for generic ADC design. This dissertation instead uses an information-based metric, the bit-error rate (BER), to design a BER-optimal ADC (BOA) for serial links. First, a theoretical analysis is developed to show when the benefits of a BOA over a conventional uniform ADC (CUA) in a serial link receiver are substantial. Second, a 4 GS/s, 4-bit on-chip ADC in a 90 nm CMOS process is designed and integrated into a 4 Gb/s serial link receiver to verify the analysis. Measured results demonstrate that a 3-bit BOA receiver outperforms a 4-bit CUA receiver at a BER < 10⁻¹² while providing 50% power savings in the ADC. In the process, it is demonstrated conclusively that BER, rather than ENOB, is the better metric when designing ADCs for serial links.
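
To make the metric distinction concrete, below is a minimal Monte Carlo sketch in Python of evaluating an ADC by the link BER it delivers rather than by its resolution. It is an illustration, not the dissertation's BOA design or its measurements: the 2-PAM signaling, single post-cursor ISI tap, noise level, and one-tap decision-feedback equalizer are all assumed here for simplicity.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000
    bits = rng.integers(0, 2, n)
    tx = 2.0 * bits - 1.0                          # 2-PAM symbols

    # Illustrative channel: one post-cursor ISI tap plus Gaussian noise.
    h0, h1 = 1.0, 0.4
    rx = h0 * tx + h1 * np.concatenate(([0.0], tx[:-1]))
    rx += rng.normal(0.0, 0.3, n)

    def uniform_adc(x, n_bits, full_scale=2.0):
        """Uniform mid-rise quantizer with 2**n_bits levels over [-FS, +FS]."""
        lsb = 2.0 * full_scale / 2 ** n_bits
        q = np.floor(x / lsb) * lsb + lsb / 2.0
        return np.clip(q, -full_scale + lsb / 2.0, full_scale - lsb / 2.0)

    def link_ber(adc_bits):
        """Empirical BER when the receiver digitizes with an adc_bits ADC."""
        y = uniform_adc(rx, adc_bits)
        prev, errs = 0.0, 0
        for i in range(n):                         # one-tap DFE-style decision
            d = 1.0 if y[i] - h1 * prev > 0.0 else -1.0
            errs += d != tx[i]
            prev = d
        return errs / n

    for b in (1, 2, 3, 4, 6):
        print(f"{b}-bit ADC -> link BER = {link_ber(b):.2e}")

In this toy link, the BER stops improving beyond a modest resolution, which is the intuition behind sizing the ADC (and its power) for BER directly rather than for ENOB.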

For the problem of resource-constrained computing at the edge, this dissertation tackles the energy-efficient implementation of ML algorithms, particularly CNNs, which have recently attracted considerable interest due to their record-breaking performance on many recognition tasks but whose implementation complexity hinders deployment on power-constrained embedded platforms. Two techniques for energy-efficient CNN design are developed. The first is a predictive CNN (PredictiveNet), which exploits the high sparsity of well-trained CNNs to bypass a large fraction of the power-dominant convolutions at runtime without modifying the CNN structure. Analysis supported by simulations is provided to justify PredictiveNet's effectiveness. When applied to the MNIST and CIFAR-10 datasets, simulation results show that PredictiveNet achieves 7.2× and 4.4× reductions in computational and representational costs, respectively, compared with a conventional CNN, and 2.5× and 1.7× reductions, respectively, compared with a state-of-the-art CNN, while incurring only a 0.02 loss in classification accuracy.
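
Below is a minimal Python sketch of the prediction idea: compute a cheap low-precision (MSB-only) partial sum first, and skip the remaining (LSB) work whenever the partial sum already predicts that the subsequent ReLU will output zero. The dot product stands in for one convolution window, and the bit split, sizes, and distributions are illustrative assumptions rather than the dissertation's exact scheme.

    import numpy as np

    rng = np.random.default_rng(1)

    def split_msb_lsb(x, total_bits=8, msb_bits=4):
        """Split non-negative fixed-point values so that x = msb + lsb."""
        step = 2.0 ** (total_bits - msb_bits)
        msb = np.floor(x / step) * step
        return msb, x - msb

    def predictive_relu_dot(w, x):
        """ReLU(w.x), computed predictively: cheap MSB partial sum first,
        skipping the LSB part whenever a zero output is already predicted."""
        x_msb, x_lsb = split_msb_lsb(x)
        s_msb = w @ x_msb                          # low-precision partial sum
        if s_msb <= 0.0:
            return 0.0, True                       # predicted zero: LSB skipped
        return max(s_msb + w @ x_lsb, 0.0), False

    # Toy layer: ReLU-activated (hence sparse) inputs, 1000 dot products.
    w = rng.normal(0.0, 1.0, 256)
    acts = np.clip(rng.normal(0.0, 64.0, (1000, 256)), 0.0, 255.0).round()

    skipped = mismatches = 0
    for x in acts:
        y, skip = predictive_relu_dot(w, x)
        skipped += skip
        mismatches += abs(y - max(w @ x, 0.0)) > 1e-9
    print(f"LSB work skipped: {skipped / 10:.1f}%  mispredictions: {mismatches / 10:.1f}%")

Mispredictions (a non-positive MSB sum for an output the full computation would have made positive) are possible; the analysis mentioned above is what justifies that they are rare enough in well-trained, highly sparse CNNs.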

The second technique is a variation-tolerant CNN architecture capable of operating in the near-threshold voltage (NTV) regime for aggressive energy efficiency. NTV computing is known to offer up to 10× energy savings but is sensitive to process, voltage, and temperature (PVT) variations, which can lead to timing errors. To realize the potential of NTV computing, this dissertation develops a new statistical error compensation (SEC) technique referred to as rank-decomposed SEC (RD-SEC). RD-SEC exploits the inherent redundancy in CNNs to handle the timing errors caused by NTV operation. When evaluated on CNNs for the MNIST and CIFAR-10 datasets, simulation results in a 45 nm CMOS process show that RD-SEC enables robust CNN operation in the NTV regime: it achieves up to 11× improvement in variation tolerance and up to 113× reduction in the standard deviation of classification accuracy, while incurring only marginal degradation in the median classification accuracy.
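
Below is a toy Python sketch of the statistical error compensation idea behind RD-SEC, with all specifics assumed for illustration: the CNN's redundancy is modeled as an approximately low-rank weight matrix, a rank-decomposed factorization supplies a low-cost estimate of each output, injected large-magnitude errors mimic NTV timing errors, and outputs that deviate too far from the estimate are replaced by it. The rank, error model, and threshold are hypothetical, not the dissertation's design.

    import numpy as np

    rng = np.random.default_rng(2)
    m, n, rank = 64, 256, 8

    # Model the redundancy premise: an approximately low-rank weight matrix.
    W = (rng.normal(0, 1, (m, rank)) @ rng.normal(0, 1, (rank, n))) / np.sqrt(rank)
    W += 0.05 * rng.normal(0, 1, (m, n))

    # Cheap estimator from a rank-`rank` factorization of W.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A, B = U[:, :rank] * S[:rank], Vt[:rank, :]    # W ≈ A @ B

    def noisy_matvec(x, p_err=0.05, err_mag=40.0):
        """Main (error-prone) path: W @ x with occasional large-magnitude
        hits mimicking NTV timing errors that flip high-order bits."""
        y = W @ x
        hit = rng.random(m) < p_err
        y[hit] += err_mag * rng.choice([-1.0, 1.0], hit.sum())
        return y

    def rd_sec_matvec(x, tau=10.0):
        """Detect-and-correct: outputs deviating too far from the low-rank
        estimate are replaced by it (statistical error compensation)."""
        y = noisy_matvec(x)
        y_est = A @ (B @ x)                        # low-cost estimate
        bad = np.abs(y - y_est) > tau
        y[bad] = y_est[bad]
        return y

    x = rng.normal(0, 1, n)
    y_ref = W @ x
    rms = lambda e: float(np.sqrt(np.mean(e ** 2)))
    print("uncorrected RMS error:", rms(noisy_matvec(x) - y_ref))
    print("corrected   RMS error:", rms(rd_sec_matvec(x) - y_ref))

Because the injected errors are large in magnitude while the estimator's residual is small, a simple threshold separates the two, which is what makes the statistical correction effective in this sketch.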