111 research outputs found

    NEMsCAM: A novel CAM cell based on nano-electro-mechanical switch and CMOS for energy efficient TLBs

    Get PDF
    In this paper we propose a novel Content Addressable Memory (CAM) cell, NEMsCAM, based on both Nano-electro-mechanical (NEM) switches and CMOS technologies. The memory component of the proposed CAM cell is designed with two complementary non-volatile NEM switches and located on top of the CMOS-based comparison component. As a use case for the NEMsCAM cell, we design first-level data and instruction Translation Lookaside Buffers (TLBs) with 16nm CMOS technology at 2GHz. The simulations show that the NEMsCAM TLB reduces the energy consumption per search operation (by 27%), write operation (by 41.9%) and standby mode (by 53.9%), and the area (by 40.5%) compared to a CMOS-only TLB with minimal performance overhead.We thank all anonymous reviewers for their insightful comments. This work is supported in part by the European Union (FEDER funds) under contract TIN2012-34557, and the European Union’s Seventh Framework Programme (FP7/2007-2013) under the ParaDIME project (GA no. 318693)Postprint (author's final draft

    Performance Analysis of Nanoelectromechanical Relay-Based Field-Programmable Gate Arrays

    Get PDF
    The energy consumption of field-programmable gate arrays (FPGA) is dominated by leakage currents and dynamic energy associated with programmable interconnect. An FPGA built entirely from nanoelectromechanical (NEM) relays can effectively eliminate leakage energy losses, reduce the interconnect dynamic energy, operate at temperatures &gt;225 °C and tolerate radiation doses in excess of 100 Mrad, while hybrid FPGAs comprising both complementary metal-oxide-semiconductor (CMOS) transistors and NEM relays (NEM-CMOS) have the potential to realize improvements in performance and energy efficiency. Large-scale integration of NEM relays, however, poses a significant engineering challenge due to the presence of moving parts. We discuss the design of FPGAs utilizing NEM relays based on a heterogeneous 3-D integration scheme, and carry out a scaling study to quantify key metrics related to performance and energy efficiency in both NEM-only and NEM-CMOS FPGAs. We show how the integration scheme has a profound effect on these metrics by changing the length of global wires. The scaling regime beyond which net performance and energy benefits is seen in NEM-CMOS over a baseline 90 nm CMOS technology is defined by an effective relay beam length of 0.5 μm , on-resistance of 200 kΩ , and a via pitch of 0.4 μm , all achievable with existing process technology. For ultra-low energy applications that are not performance critical, NEM-only FPGAs can provide close to 15× improvement in energy efficiency.QC 20180412</p

    An Energy-Efficient Design Paradigm for a Memory Cell Based on Novel Nanoelectromechanical Switches

    Get PDF
    In this chapter, we explain NEMsCAM cell, a new content-addressable memory (CAM) cell, which is designed based on both CMOS technologies and nanoelectromechanical (NEM) switches. The memory part of NEMsCAM is designed with two complementary nonvolatile NEM switches and located on top of the CMOS-based comparison component. As a use case, we evaluate first-level instruction and data translation lookaside buffers (TLBs) with 16 nm CMOS technology at 2 GHz. The simulation results demonstrate that the NEMsCAM TLB reduces the energy consumption per search operation (by 27%), standby mode (by 53.9%), write operation (by 41.9%), and the area (by 40.5%) compared to a CMOS-only TLB with minimal performance overhead

    Stochastic-Based Computing with Emerging Spin-Based Device Technologies

    Get PDF
    In this dissertation, analog and emerging device physics is explored to provide a technology platform to design new bio-inspired system and novel architecture. With CMOS approaching the nano-scaling, their physics limits in feature size. Therefore, their physical device characteristics will pose severe challenges to constructing robust digital circuitry. Unlike transistor defects due to fabrication imperfection, quantum-related switching uncertainties will seriously increase their susceptibility to noise, thus rendering the traditional thinking and logic design techniques inadequate. Therefore, the trend of current research objectives is to create a non-Boolean high-level computational model and map it directly to the unique operational properties of new, power efficient, nanoscale devices. The focus of this research is based on two-fold: 1) Investigation of the physical hysteresis switching behaviors of domain wall device. We analyze phenomenon of domain wall device and identify hysteresis behavior with current range. We proposed the Domain-Wall-Motion-based (DWM) NCL circuit that achieves approximately 30x and 8x improvements in energy efficiency and chip layout area, respectively, over its equivalent CMOS design, while maintaining similar delay performance for a one bit full adder. 2) Investigation of the physical stochastic switching behaviors of Mag- netic Tunnel Junction (MTJ) device. With analyzing of stochastic switching behaviors of MTJ, we proposed an innovative stochastic-based architecture for implementing artificial neural network (S-ANN) with both magnetic tunneling junction (MTJ) and domain wall motion (DWM) devices, which enables efficient computing at an ultra-low voltage. For a well-known pattern recognition task, our mixed-model HSPICE simulation results have shown that a 34-neuron S-ANN implementation, when compared with its deterministic-based ANN counterparts implemented with digital and analog CMOS circuits, achieves more than 1.5 ~ 2 orders of magnitude lower energy consumption and 2 ~ 2.5 orders of magnitude less hidden layer chip area

    High-performance hardware accelerators for image processing in space applications

    Get PDF
    Mars is a hard place to reach. While there have been many notable success stories in getting probes to the Red Planet, the historical record is full of bad news. The success rate for actually landing on the Martian surface is even worse, roughly 30%. This low success rate must be mainly credited to the Mars environment characteristics. In the Mars atmosphere strong winds frequently breath. This phenomena usually modifies the lander descending trajectory diverging it from the target one. Moreover, the Mars surface is not the best place where performing a safe land. It is pitched by many and close craters and huge stones, and characterized by huge mountains and hills (e.g., Olympus Mons is 648 km in diameter and 27 km tall). For these reasons a mission failure due to a landing in huge craters, on big stones or on part of the surface characterized by a high slope is highly probable. In the last years, all space agencies have increased their research efforts in order to enhance the success rate of Mars missions. In particular, the two hottest research topics are: the active debris removal and the guided landing on Mars. The former aims at finding new methods to remove space debris exploiting unmanned spacecrafts. These must be able to autonomously: detect a debris, analyses it, in order to extract its characteristics in terms of weight, speed and dimension, and, eventually, rendezvous with it. In order to perform these tasks, the spacecraft must have high vision capabilities. In other words, it must be able to take pictures and process them with very complex image processing algorithms in order to detect, track and analyse the debris. The latter aims at increasing the landing point precision (i.e., landing ellipse) on Mars. Future space-missions will increasingly adopt Video Based Navigation systems to assist the entry, descent and landing (EDL) phase of space modules (e.g., spacecrafts), enhancing the precision of automatic EDL navigation systems. For instance, recent space exploration missions, e.g., Spirity, Oppurtunity, and Curiosity, made use of an EDL procedure aiming at following a fixed and precomputed descending trajectory to reach a precise landing point. This approach guarantees a maximum landing point precision of 20 km. By comparing this data with the Mars environment characteristics, it is possible to understand how the mission failure probability still remains really high. A very challenging problem is to design an autonomous-guided EDL system able to even more reduce the landing ellipse, guaranteeing to avoid the landing in dangerous area of Mars surface (e.g., huge craters or big stones) that could lead to the mission failure. The autonomous behaviour of the system is mandatory since a manual driven approach is not feasible due to the distance between Earth and Mars. Since this distance varies from 56 to 100 million of km approximately due to the orbit eccentricity, even if a signal transmission at the light speed could be possible, in the best case the transmission time would be around 31 minutes, exceeding so the overall duration of the EDL phase. In both applications, algorithms must guarantee self-adaptability to the environmental conditions. Since the Mars (and in general the space) harsh conditions are difficult to be predicted at design time, these algorithms must be able to automatically tune the internal parameters depending on the current conditions. Moreover, real-time performances are another key factor. Since a software implementation of these computational intensive tasks cannot reach the required performances, these algorithms must be accelerated via hardware. For this reasons, this thesis presents my research work done on advanced image processing algorithms for space applications and the associated hardware accelerators. My research activity has been focused on both the algorithm and their hardware implementations. Concerning the first aspect, I mainly focused my research effort to integrate self-adaptability features in the existing algorithms. While concerning the second, I studied and validated a methodology to efficiently develop, verify and validate hardware components aimed at accelerating video-based applications. This approach allowed me to develop and test high performance hardware accelerators that strongly overcome the performances of the actual state-of-the-art implementations. The thesis is organized in four main chapters. Chapter 2 starts with a brief introduction about the story of digital image processing. The main content of this chapter is the description of space missions in which digital image processing has a key role. A major effort has been spent on the missions in which my research activity has a substantial impact. In particular, for these missions, this chapter deeply analizes and evaluates the state-of-the-art approaches and algorithms. Chapter 3 analyzes and compares the two technologies used to implement high performances hardware accelerators, i.e., Application Specific Integrated Circuits (ASICs) and Field Programmable Gate Arrays (FPGAs). Thanks to this information the reader may understand the main reasons behind the decision of space agencies to exploit FPGAs instead of ASICs for high-performance hardware accelerators in space missions, even if FPGAs are more sensible to Single Event Upsets (i.e., transient error induced on hardware component by alpha particles and solar radiation in space). Moreover, this chapter deeply describes the three available space-grade FPGA technologies (i.e., One-time Programmable, Flash-based, and SRAM-based), and the main fault-mitigation techniques against SEUs that are mandatory for employing space-grade FPGAs in actual missions. Chapter 4 describes one of the main contribution of my research work: a library of high-performance hardware accelerators for image processing in space applications. The basic idea behind this library is to offer to designers a set of validated hardware components able to strongly speed up the basic image processing operations commonly used in an image processing chain. In other words, these components can be directly used as elementary building blocks to easily create a complex image processing system, without wasting time in the debug and validation phase. This library groups the proposed hardware accelerators in IP-core families. The components contained in a same family share the same provided functionality and input/output interface. This harmonization in the I/O interface enables to substitute, inside a complex image processing system, components of the same family without requiring modifications to the system communication infrastructure. In addition to the analysis of the internal architecture of the proposed components, another important aspect of this chapter is the methodology used to develop, verify and validate the proposed high performance image processing hardware accelerators. This methodology involves the usage of different programming and hardware description languages in order to support the designer from the algorithm modelling up to the hardware implementation and validation. Chapter 5 presents the proposed complex image processing systems. In particular, it exploits a set of actual case studies, associated with the most recent space agency needs, to show how the hardware accelerator components can be assembled to build a complex image processing system. In addition to the hardware accelerators contained in the library, the described complex system embeds innovative ad-hoc hardware components and software routines able to provide high performance and self-adaptable image processing functionalities. To prove the benefits of the proposed methodology, each case study is concluded with a comparison with the current state-of-the-art implementations, highlighting the benefits in terms of performances and self-adaptability to the environmental conditions

    Architectures and EDA for 3D FPGAs

    Get PDF
    Master'sMASTER OF ENGINEERIN

    Non-invasive power gating techniques for bursty computation workloads using micro-electro-mechanical relays

    Get PDF
    PhD ThesisElectrostatically-actuated Micro-Electro-Mechanical/Nano-Electro- Mechanical (MEM/NEM) relays are promising devices overcoming the energy-efficiency limitations of CMOS transistors. Many exploratory research projects are currently under way investigating the mechanical, electrical and logical characteristics of MEM/NEM relays. One particular issue that this work addresses is the need for a scalable and accurate physical model of the MEM/NEM switches that can be plugged into the standard EDA software. The existing models are accurate and detailed but they suffer from the convergence problem. This problem requires finding ad-hoc workarounds and significantly impacts the designer’s productivity. In this thesis we propose a new simplified Verilog-AMS model. To test scalability of the proposed model we cross-checked it against our analysis of a range of benchmark circuits. Results show that, compared to standard models, the proposed model is sufficiently accurate with an average of 6% error and can handle larger designs without divergence. This thesis also investigates the modelling, designing and optimization of various MEM/NEM switches using 3D Finite Element Analysis (FEA) performed by the COMSOL multiphysics simulation tool. An extensive parametric sweep simulation is performed to study the energy-latency trade-offs of MEM/NEM relays. To accurately simulate MEMS/NEMS-based digital circuits, a Verilog-AMS model is proposed based on the evaluated parameters obtained from the multiphysics simulation tool. This allows an accurate calibration of the MEM/NEM relays with a significant reduction in simulation speed compared to that of 3D FEA exercised on COMSOL tool. The effectiveness of two power gating approaches in asynchronous micropipelines is also investigated using MEM/NEM switches and sleep transistors in reducing idle power dissipation with a particular target throughput. Sleep transistors are traditionally used to power gate idle circuits, however, these transistors have fundamental limitations in their effectiveness. Alternatively, MEM/NEM relays with zero leakage current can achieve greater energy savings under a certain data rate and design architecture. An asynchronous FIR filter 4 phase bundled data handshake protocol is presented. Implementation is accomplished in 90nm technology node and simulation exercised at various data rates and design complexities. It was demonstrated that our proposed approach offers 69% energy improvements at a data rate 1KHz compared to 39% of the previous work. The current trends for greater heterogeneity in future Systems-on- Chip (SoC) do not only concern their functionality but also their timing and power aspects. The increasing diversity of timing and power supply conditions, and associated concurrently operating modes, within an SoC calls for more efficient power delivery networks (PDN) for battery operated devices. This is especially important for systems with mixed duty cycling, where some parts are required to work regularly with low-throughput while other parts are activated spontaneously, i.e. in bursts. To improve their reaction time vs energy efficiency, this work proposes to incorporate a power-switching network based on MEM relays to switch the SoC power-performance state (PPS) into an active mode while eliminating the leakage current when it is idle. Results show that even with today0s large and high pull-in voltages, a MEM-relay-based power switching network (PSN) can achieve a 1000x savings in energy compared to its CMOS counterpart for low duty cycle. A simple case of optimising an on-chip charge pump required to switch-on the relay has been investigated and its energy-latency overhead has been evaluated. Heterogeneous many-core systems are increasingly being employed in modern embedded platforms for high throughput at low energy cost considerations. These applications typically exhibit bursty workloads that provide opportunities to minimize system energy. CMOS-based power gating circuitry, typically consisting of sleep transistors, is used as an effective technique for idle energy reduction in such applications. However, these transistors contribute high leakage current when driving large capacitive loads, making effective energy minimization challenging. This thesis proposes a novel MEMS-based idle energy control approach. Core to this approach is an integrated sleep mode management based on the performance-energy states and bursty workloads indicated by the performance counters. A number of PARSEC benchmark applications are used as case studies of bursty workloads, including CPU- and memory- intensive ones. These applications are exercised on an Exynos 5422 heterogeneous many-core platform, engineered with a performance counter facilities, showing 55.5% energy savings compared with an on-demand governor. Furthermore, an extensive trade-off analysis demonstrates the comparative advantages of the MEMS-based controller, including zero-leakage current and non-invasive implementations suitable for commercial off-the-shelf systems.Higher committee of education development in Iraq (HCED

    DEVELOPMENT OF NANO/MICROELECTROMECHANICAL SYSTEM (N/MEMS) SWITCHES

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Design of an Embedded Fluorescence Imaging System for Implantable Optical Neural Recording

    Get PDF
    The brain is the most complex and least understood biological system known to man. New imaging techniques are providing scientists with an entirely new perspec- tive on the study of the functional brain at a neural circuit level, enabling in-depth understanding of both physiological processes and animal models of neurological and psychiatric diseases which currently lack e↵ective treatments. These new tools come at the cost of meeting the challenges associated with the miniaturization of the hard- ware for in vivo recording

    Signaling in 3-D integrated circuits, benefits and challenges

    Get PDF
    Three-dimensional (3-D) or vertical integration is a design and packaging paradigm that can mitigate many of the increasing challenges related to the design of modern integrated systems. 3-D circuits have recently been at the spotlight, since these circuits provide a potent approach to enhance the performance and integrate diverse functions within amulti-plane stack. Clock networks consume a great portion of the power dissipated in a circuit. Therefore, designing a low-power clock network in synchronous circuits is an important task. This requirement is stricter for 3-D circuits due to the increased power densities. Synchronization issues can be more challenging for 3-D circuits since a clock path can spread across several planes with different physical and electrical characteristics. Consequently, designing low power clock networks for 3-D circuits is an important issue. Resonant clock networks are considered efficient low-power alternatives to conventional clock distribution schemes. These networks utilize additional inductive circuits to reduce power while delivering a full swing clock signal to the sink nodes. In this research, a design method to apply resonant clocking to synthesized clock trees is proposed. Manufacturing processes for 3-D circuits include some additional steps as compared to standard CMOS processes which makes 3-D circuits more susceptible to manufacturing defects and lowers the overall yield of the bonded 3-D stack. Testing is another complicated task for 3-D ICs, where pre-bond test is a prerequisite. Pre-bond testability, in turn, presents new challenges to 3-D clock network design primarily due to the incomplete clock distribution networks prior to the bonding of the planes. A design methodology of resonant 3-D clock networks that support wireless pre-bond testing is introduced. To efficiently address this issue, inductive links are exploited to wirelessly transmit the clock signal to the disjoint resonant clock networks. The inductors comprising the LC tanks are used as the receiver circuit for the links, essentially eliminating the need for additional circuits and/or interconnect resources during pre-bond test. Recent FPGAs are quite complex circuits which provide reconfigurablity at the cost of lower performance and higher power consumption as compared to ASIC circuits. Exploiting a large number of programmable switches, routing structures are mainly responsible for performance degradation in FPAGs. Employing 3-D technology can providemore efficient switches which drastically improve the performance and reduce the power consumption of the FPGA. RRAM switches are one of the most promising candidates to improve the FPGA routing architecture thanks to their low on-resistance and non-volatility. Along with the configurable switches, buffers are the other important element of the FPGAs routing structure. Different characteristics of RRAM switches change the properties of signal paths in RRAM-based FPGAs. The on resistance of RRAMswitches is considerably lower than CMOS pass gate switches which results in lower RC delay for RRAM-based routing paths. This different nature in critical path and signal delay in turn affect the need for intermediate buffers. Thus the buffer allocation should be reconsidered. In the last part of this research, the effect of intermediate buffers on signal propagation delay is studied and a modified buffer allocation scheme for RRAM-based FPGA routing path is proposed
    corecore