157 research outputs found

    Circuits and Systems Advances in Near Threshold Computing

    Get PDF
    Modern society is witnessing a sea change in ubiquitous computing, in which people have embraced computing systems as an indispensable part of day-to-day existence. Computation, storage, and communication abilities of smartphones, for example, have undergone monumental changes over the past decade. However, global emphasis on creating and sustaining green environments is leading to a rapid and ongoing proliferation of edge computing systems and applications. As a broad spectrum of healthcare, home, and transport applications shift to the edge of the network, near-threshold computing (NTC) is emerging as one of the promising low-power computing platforms. An NTC device sets its supply voltage close to its threshold voltage, dramatically reducing the energy consumption. Despite showing substantial promise in terms of energy efficiency, NTC is yet to see widescale commercial adoption. This is because circuits and systems operating with NTC suffer from several problems, including increased sensitivity to process variation, reliability problems, performance degradation, and security vulnerabilities, to name a few. To realize its potential, we need designs, techniques, and solutions to overcome these challenges associated with NTC circuits and systems. The readers of this book will be able to familiarize themselves with recent advances in electronics systems, focusing on near-threshold computing

    Deep in-memory computing

    Get PDF
    There is much interest in embedding data analytics into sensor-rich platforms such as wearables, biomedical devices, autonomous vehicles, robots, and Internet-of-Things to provide these with decision-making capabilities. Such platforms often need to implement machine learning (ML) algorithms under stringent energy constraints with battery-powered electronics. Especially, energy consumption in memory subsystems dominates such a system's energy efficiency. In addition, the memory access latency is a major bottleneck for overall system throughput. To address these issues in memory-intensive inference applications, this dissertation proposes deep in-memory accelerator (DIMA), which deeply embeds computation into the memory array, employing two key principles: (1) accessing and processing multiple rows of memory array at a time, and (2) embedding pitch-matched low-swing analog processing at the periphery of bitcell array. The signal-to-noise ratio (SNR) is budgeted by employing low-swing operations in both memory read and processing to exploit the application level's error immunity for aggressive energy efficiency. This dissertation first describes the system rationale underlying the DIMA's processing stages by identifying the common functional flow across a diverse set of inference algorithms. Based on the analysis, this dissertation presents a multi-functional DIMA to support four algorithms: support vector machine (SVM), template matching (TM), k-nearest neighbor (k-NN), and matched filter. The circuit and architectural level design techniques and guidelines are provided to address the challenges in achieving multi-functionality. A prototype integrated circuit (IC) of a multi-functional DIMA was fabricated with a 16 KB SRAM array in a 65 nm CMOS process. Measurement results show up to 5.6X and 5.8X energy and delay reductions leading to 31X energy delay product (EDP) reduction with negligible (<1%) accuracy degradation as compared to the conventional 8-b fixed-point digital implementation optimally designed for each algorithm. Then, DIMA also has been applied to more complex algorithms: (1) convolutional neural network (CNN), (2) sparse distributed memory (SDM), and (3) random forest (RF). System-level simulations of CNN using circuit behavioral models in a 45 nm SOI CMOS demonstrate that high probability (>0.99) of handwritten digit recognition can be achieved using the MNIST database, along with a 24.5X reduced EDP, a 5.0X reduced energy, and a 4.9X higher throughput as compared to the conventional system. The DIMA-based SDM architecture also achieves up to 25X and 12X delay and energy reductions, respectively, over conventional SDM with negligible accuracy degradation (within 0.4%) for 16X16 binary-pixel image classification. A DIMA-based RF was realized as a prototype IC with a 16 KB SRAM array in a 65 nm process. To the best of our knowledge, this is the first IC realization of an RF algorithm. The measurement results show that the prototype achieves a 6.8X lower EDP compared to a conventional design at the same accuracy (94%) for an eight-class traffic sign recognition problem. The multi-functional DIMA and extension to other algorithms naturally motivated us to consider a programmable DIMA instruction set architecture (ISA), namely MATI. This dissertation explores a synergistic combination of the instruction set, architecture and circuit design to achieve the programmability without losing DIMA's energy and throughput benefits. Employing silicon-validated energy, delay and behavioral models of deep in-memory components, we demonstrate that MATI is able to realize nine ML benchmarks while incurring negligible overhead in energy (< 0.1%), and area (4.5%), and in throughput, over a fixed four-function DIMA. In this process, MATI is able to simultaneously achieve enhancements in both energy (2.5X to 5.5X) and throughput (1.4X to 3.4X) for an overall EDP improvement of up to 12.6X over fixed-function digital architectures

    Design Techniques of Energy Efficient PLL for Enhanced Noise and Lock Performance

    Get PDF
    Phase locked loops(PLLs)are vital building blocks of communication sys-tems whose performance dictates the quality of communication.The design of PLL to o_er superior performance is the prime objective of this research.It is desirable for the PLL to have fast locking,low noise,low reference spur,wide lock range,low power consumption consuming less silicon area.To achieve these performance parameters simultaneously in a PLL being a challenging task is taken up as a scope of the present work.A comprehensive study of the performance linked PLL components along with their design challenges is made in this report.The phase noise which is directly related to the dead zone of the PLL is minimized using an e_cient phase frequency detector(PFD)in this thesis.Here a voltage variable delay element is inserted in the reset path of the PFD to reduce the dead zone.An adaptive PFD architecture is also proposed to have a low noise and fast PLL simultaneously.In this work,before locking a fast PFD and in the locked state a low noise PFD operates to dictate the phase di_erence of the reference and feedback signals.To reduce the reference spur,a novel charge pump architecture is proposed which eventually reduces the lock time up to a great extent.In this charge pump a single current source is employed to reduce the output current mis-match and transmission gates are used to reduce the non ideal e_ects.Besides this,the fabrication process variations have a predominant e_ect on the PLL performance,which is directly linked to the locking capability.This necessitates a manufacturing process variation tolerant design of the PLL.In this work an e_cient multi-objective optimization method is also applied to at-tain multiple optimal performance objectives.The major performances under consideration are lock time,phase noise,lock range and power consumption

    Continuous-Time and Companding Digital Signal Processors Using Adaptivity and Asynchronous Techniques

    Get PDF
    The fully synchronous approach has been the norm for digital signal processors (DSPs) for many decades. Due to its simplicity, the classical DSP structure has been used in many applications. However, due to its rigid discrete-time operation, a classical DSP has limited efficiency or inadequate resolution for some emerging applications, such as processing of multimedia and biological signals. This thesis proposes fundamentally new approaches to designing DSPs, which are different from the classical scheme. The defining characteristic of all new DSPs examined in this thesis is the notion of "adaptivity" or "adaptability." Adaptive DSPs dynamically change their behavior to adjust to some property of their input stream, for example the rate of change of the input. This thesis presents both enhancements to existing adaptive DSPs, as well as new adaptive DSPs. The main class of DSPs that are examined throughout the thesis are continuous-time (CT) DSPs. CT DSPs are clock-less and event-driven; they naturally adapt their activity and power consumption to the rate of their inputs. The absence of a clock also provides a complete avoidance of aliasing in the frequency domain, hence improved signal fidelity. The core of this thesis deals with the complete and systematic design of a truly general-purpose CT DSP. A scalable design methodology for CT DSPs is presented. This leads to the main contribution of this thesis, namely a new CT DSP chip. This chip is the first general-purpose CT DSP chip, able to process many different classes of CT and synchronous signals. The chip has the property of handling various types of signals, i.e. various different digital modulations, both synchronous and asynchronous, without requiring any reconfiguration; such property is presented for the first time CT DSPs and is impossible for classical DSPs. As opposed to previous CT DSPs, which were limited to using only one type of digital format, and whose design was hard to scale for different bandwidths and bit-widths, this chip has a formal, robust and scalable design, due to the systematic usage of asynchronous design techniques. The second contribution of this thesis is a complete methodology to design adaptive delay lines. In particular, it is shown how to make the granularity, i.e. the number of stages, adaptive in a real-time delay line. Adaptive granularity brings about a significant improvement in the line's power consumption, up to 70% as reported by simulations on two design examples. This enhancement can have a direct large power impact on any CT DSP, since a delay line consumes the majority of a CT DSP's power. The robust methodology presented in this thesis allows safe dynamic reconfiguration of the line's granularity, on-the-fly and according to the input traffic. As a final contribution, the thesis also examines two additional DSPs: one operating the CT domain and one using the companding technique. The former operates only on level-crossing samples; the proposed methodology shows a potential for high-quality outputs by using a complex interpolation function. Finally, a companding DSP is presented for MPEG audio. Companding DSPs adapt their dynamic range to the amplitude of their input; the resulting can offer high-quality outputs even for small inputs. By applying companding to MPEG DSPs, it is shown how the DSP distortion can be made almost inaudible, without requiring complex arithmetic hardware

    Communication and energy delivery architectures for personal medical devices

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012.Cataloged from PDF version of thesis.Includes bibliographical references (p. 219-232).Advances in sensor technologies and integrated electronics are revolutionizing how humans access and receive healthcare. However, many envisioned wearable or implantable systems are not deployable in practice due to high energy consumption and anatomically-limited size constraints, necessitating large form-factors for external devices, or eventual surgical re-implantation procedures for in-vivo applications. Since communication and energy-management sub-systems often dominate the power budgets of personal biomedical devices, this thesis explores alternative usecases, system architectures, and circuit solutions to reduce their energy burden. For wearable applications, a system-on-chip is designed that both communicates and delivers power over an eTextiles network. The transmitter and receiver front-ends are at least an order of magnitude more efficient than conventional body-area networks. For implantable applications, two separate systems are proposed that avoid reimplantation requirements. The first system extracts energy from the endocochlear potential, an electrochemical gradient found naturally within the inner-ear of mammals, in order to power a wireless sensor. Since extractable energy levels are limited, novel sensing, communication, and energy management solutions are proposed that leverage duty-cycling to achieve enabling power consumptions that are at least an order of magnitude lower than previous work. Clinical measurements show the first system demonstrated to sustain itself with a mammalian-generated electrochemical potential operating as the only source of energy into the system. The second system leverages the essentially unlimited number of re-charge cycles offered by ultracapacitors. To ease patient usability, a rapid wireless capacitor charging architecture is proposed that employs a multi-tapped secondary inductive coil to provide charging times that are significantly faster than conventional approaches.by Patrick Philip Mercier.Ph.D
    corecore