44 research outputs found

    Harnessing noise to enhance robustness vs. efficiency trade-off in machine learning

    Get PDF
    While deep nets have achieved human-comparable accuracy in various classification tasks, they fall short significantly in terms of the robustness and cost metrics. For example, tiny engineered corruptions in deep net inputs can reduce their accuracy to zero. Furthermore, deep nets also require millions of trainable parameters, resulting in significant training and inference costs. These robustness and cost challenges are well recognized today. In response, there have been a plethora of works focusing on improving either the accuracy vs. robustness trade-off, or the accuracy vs. cost trade-off. However, simultaneous consideration of accuracy, robustness, and cost metrics is largely absent today, in part, because far fewer works have explored the robustness vs. cost trade-off. This dissertation aims to fill this gap by focusing explicitly on the robustness vs. cost trade-off in the presence of data noise, as well as hardware noise. Specifically, we explore how to harness the noise in order to enhance this trade-off. We characterize and improve robustness vs. cost trade-offs across diverse problem settings, ranging from beyond-CMOS hardware implementations of machine learning (ML) classifiers to efficient training of deep nets that are robust to multiple types of corruptions in their inputs. This dissertation can be roughly divided into two part, one focusing on hardware noise and the other on data noise. In the first part, we start by focusing on harnessing noise in spintronic hardware implementations, where the logic gates become error prone when operated at lower switching energy/delay. We propose techniques to shape the resulting hardware noise distribution and to efficiently compensate it at the system-level output. As a result, we observe 1000x improvement intolerance to gate-level switching error rates, while keeping the area/energy overhead of compensation circuits to as low as 15%. These robustness enhancements further enable 3× reduction in iso-throughput energy consumption of a binary ML classifier employed for EEG-based seizure detection. Building on this work, we propose spintronic channel networks, exponential decay of spin current to efficiently realize multi-bit dot product computation. We employ error-prone nanomagnets as efficient stochastic slicers biased by spin currents proportional to the likelihood of the classification decision. We achieve 112x-to-22.5x and 14x-to-2.5x higher energy-efficiency over conventional spin-based and 20 nm CMOS designs, respectively, when realizing 10-to-100-dimensional binary classifiers. Furthermore, we also consider the impact of hardware noise originated from process variations and readout circuits in in-memory computing implementations employing non-volatile resistive crossbar arrays. Based on our analysis, we identify design configurations achieving the highest signal-to-noise ratio (SNR), and further estimate how such robustness trades off with the array energy consumption. In the second part, we switch gears to improve the robustness vs. cost trade-off for deep nets in the presence of data noise. Specifically, we focus on the impact of adversarial perturbations in the deep nets inputs. We propose and validate the hypotheses about orientations of dominant subspaces of adversarial perturbations. We demonstrate how changes in the curvature of decision boundary of the deep nets affects the orientations of the adversarial perturbations. Based on these insights we demonstrate how shaped noise can be introduced as a feature to enhance robustness vs. cost trade-off in deep nets. Specifically, we propose shaped noise augmented processing (SNAP), a method to efficiently train deep nets that are robust to multiple types of adversarial perturbations, simultaneously. SNAP prepends a deep net with a shaped noise augmentation layer whose distribution is learned along with the network parameters using any established robust training framework. Based on extensive comparisons with nine state-of-the-art (SOTA) robust training frameworks, we show that SNAP achieves the best robustness vs. training cost trade-off. In particular, it enables 4x reduction in the training cost compared to the SOTA approach published just this last year. Furthermore, thanks to the computational simplicity of SNAP, it is the first technique of its kind that is scalable to large datasets, such as ImageNet

    Chemical Composition of Polymer Surfaces Imaged by Atomic ForceMicroscopy and Complementary Approaches

    Get PDF
    In this article we review the recent developments in the field of high resolution lateral mapping of the surface chemical composition of polymers by atomic force microscopy (AFM) and other complementary imaging techniques. The different AFM approaches toward nanometer scale mapping with chemical sensitivity based on chemical force microscopy (CFM) are discussed as a means to unravel, for instance, the lateral distribution of surface chemistry, the stability of various types of functional groups in various environments, or the interactions with controlled functional groups at the tip surface. The applicability and current limitations of CFM, which allows one to image chemical functional group distributions with a resolution in principle down to the 10–20 nm scale, are critically discussed. In addition, complementary imaging techniques are briefly reviewed and compared to the AFM-based techniques. The complementary approaches comprise various spectroscopies (infrared and Raman), secondary ion mass spectrometry (SIMS), matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS), X-ray photoelectron spectroscopy (XPS or ESCA), and near-field optical techniques used for imaging

    Architectural Solutions for NanoMagnet Logic

    Get PDF
    The successful era of CMOS technology is coming to an end. The limit on minimum fabrication dimensions of transistors and the increasing leakage power hinder the technological scaling that has characterized the last decades. In several different ways, this problem has been addressed changing the architectures implemented in CMOS, adopting parallel processors and thus increasing the throughput at the same operating frequency. However, architectural alternatives cannot be the definitive answer to a continuous increase in performance dictated by Moore’s law. This problem must be addressed from a technological point of view. Several alternative technologies that could substitute CMOS in next years are currently under study. Among them, magnetic technologies such as NanoMagnet Logic (NML) are interesting because they do not dissipate any leakage power. More- over, magnets have memory capability, so it is possible to merge logic and memory in the same device. However, magnetic circuits, and NML in this specific research, have also some important drawbacks that need to be addressed: first, the circuit clock frequency is limited to 100 MHz, to avoid errors in data propagation; second, there is a connection between circuit layout and timing, and in particular, longer wires will have longer latency. These drawbacks are intrinsic to the technology and for this reason they cannot be avoided. The only chance is to limit their impact from an architectural point of view. The first step followed in the research path of this thesis is indeed the choice and optimization of architectures able to deal with the problems of NML. Systolic Ar- rays are identified as an ideal solution for this technology, because they are regular structures with local interconnections that limit the long latency of wires; more- over they are composed of several Processing Elements that work in parallel, thus exploit parallelization to increase throughput (limiting the impact of the low clock frequency). Through the analysis of Systolic Arrays for NML, several possible im- provements have been identified and addressed: 1) it has been defined a rigorous way to increase throughput with interleaving, providing equations that allow to esti- mate the number of operations to be interleaved and the rules to provide inputs; 2) a latency insensitive circuit has been designed, that exploits a data communication protocol between processing elements to avoid data synchronization problems. This feature has been exploited to design a latency insensitive Systolic Array that is able to execute the Floyd-Steinberg dithering algorithm. All the improvements presented in this framework apply to Systolic Arrays implemented in any technology. So, they can also be exploited to increase performance of today’s CMOS parallel circuits. This research path is presented in Chapter 3. While Systolic Arrays are an interesting solution for NML, their usage could be quite limited because they are normally application-specific. The second re- search path addresses this problem. A Reconfigurable Systolic Array is presented, that can be programmed to execute several algorithms. This architecture has been tested implementing many algorithms, including FIR and IIR filters, Discrete Cosine Transform and Matrix Multiplication. This research path is presented in Chapter 4. In common Von Neumann architectures, the logic part of the circuit and the memory one are separated. Today bus communication between logic and memory represents the bottleneck of the system. This problem is addressed presenting Logic- In-Memory (LIM), an architecture where memory elements are merged in logic ones. This research path aims at defining a real LIM architectures. This has been done in two steps. The first step is represented by an architecture composed of three layers: memory, routing and logic. In the second step instead the routing plane is no more present, and its features are inherited by the memory plane. In this solution, a pyramidal memory model is used, where memories near logic elements contain the most probably used data, and other memory layers contain the remaining data and instruction set. This circuit has been tested with odd-even sort algorithms and it has been benchmarked against GPUs and ASIC. This research path is presented in Chapter 5. MagnetoElastic NML (ME-NML) is a technological improvement of the NML principle, proposed by researchers of Politecnico di Torino, where the clock system is based on the induced stretch of a piezoelectric substrate when a voltage is ap- plied to its boundaries. The main advantage of this solution is that it consumes much less power than the classic clock implementation. This technology has not yet been investigated from an architectural point of view and considering complex circuits. In this research field, a standard methodology for the design of ME-NML circuits has been proposed. It is based on a Standard Cell Library and an enhanced VHDL model. The effectiveness of this methodology has been proved designing a Galois Field Multiplier. Moreover the serial-parallel trade-off in ME-NML has been investigated, designing three different solutions for the Multiply and Accumulate structure. This research path is presented in Chapter 6. While ME-NML is an extremely interesting technology, it needs to be combined with other faster technologies to have a real competitive system. Signal interfaces between NML and other technologies (mainly CMOS) have been rarely presented in literature. A mixed-technology multiplexer is designed and presented as the basis for a CMOS to NML interface. The reverse interface (from ME-NML to CMOS) is instead based on a sensing circuit for the Faraday effect: a change in the polarization of a magnet induces an electric field that can be used to generate an input signal for a CMOS circuit. This research path is presented in Chapter 7. The research work presented in this thesis represents a fundamental milestone in the path towards nanotechnologies. The most important achievement is the de- sign and simulation of complex circuits with NML, benchmarking this technology with real application examples. The characterization of a technology considering complex functions is a major step to be performed and that has not yet been ad- dressed in literature for NML. Indeed, only in this way it is possible to intercept in advance any weakness of NanoMagnet Logic that cannot be discovered consid- ering only small circuits. Moreover, the architectural improvements introduced in this thesis, although technology-driven, can be actually applied to any technology. We have demonstrated the advantages that can derive applying them to CMOS cir- cuits. This thesis represents therefore a major step in two directions: the first is the enhancement of NML technology; the second is a general improvement of parallel architectures and the development of the new Logic-In-Memory paradigm

    Ancient and historical systems

    Get PDF

    Doctor of Philosophy

    Get PDF
    dissertationPhotonic integration circuits (PICs) have received overwhelming attention in the past few decades due to various advantages over electronic circuits including absence of Joule effect and huge bandwidth. The most significant problem obstructing their commercial application is the integration density, which is largely determined by a signal wavelength that is in the order of microns. In this dissertation, we are focused on enhancing the integration density of PICs to warrant their practical applications. In general, we believe there are three ways to boost the integration density. The first is to downscale the dimension of individual integrated optical component. As an example, we have experimentally demonstrated an integrated optical diode with footprint 3 Ã- 3 m2, an integrated polarization beamsplitter with footprint 2.4 Ã- 2.4 m2, and a waveguide bend with effective bend radius as small as 0.65 m. All these devices offer the smallest footprint when compared to their alternatives. A second option to increase integration density is to combine the function of multiple devices into a single compact device. To illustrate the point, we have experimentally shown an integrated mode-converting polarization beamsplitter, and a free-space to waveguide coupler and polarization beamsplitter. Two distinct functionalities are offered in one single device without significantly sacrificing the footprint. A third option for enhancing integration density is to decrease the spacing between the individual devices. For this case, we have experimentally demonstrated an integrated cloak for nonresonant (waveguide) and resonant (microring-resonator) devices. Neighboring devices are totally invisible to each other even if they are separated as small as /2 apart. Inverse design algorithm is employed in demonstrating all of our devices. The basic premise is that, via nanofabrication, we can locally engineer the refractive index to achieve unique functionalities that are otherwise impossible. A nonlinear optimization algorithm is used to find the best permittivity distribution and a focused ion beam is used to define the fine nanostructures. Our future work lies in demonstrating active nanophotonic devices with compact footprint and high efficiency. Broadband and efficient silicon modulators, and all-optical and high-efficiency switches are envisioned with our design algorithm

    ANALYSIS AND MODULATION OF MOLECULAR QUANTUM-DOT CELLULAR AUTOMATA (QCA) DEVICES

    Get PDF
    Field-Coupled nanocomputing (FCN) paradigms offer fundamentally new approaches for digital computing without involving current transistors. Such paradigms perform computations using local field interactions between nanoscale building blocks which are organized with purposes. Among several FCN paradigms currently under active investigation, the Molecular Quantum-dot Cellular Automata (MQCA) is found to be the most promising and its unique features make it attractive as a candidate for post-CMOS nanocomputing. MQCA is based on electrostatic interactions among quantum cells with nanometer scale eliminating the need of charge transportation, hence its energy consumption is significantly decreased. Meanwhile it also possesses the potential of high throughput if efficient pipelining of information propagation is introduced. This could be realized adopting external clock signals which precisely control the adiabatic switching and direction of data flow in MQCA circuits. In this work, in order to model MQCA as electronic devices and analyze its information propagation with clock taken into account, an effective algorithm based on ab-initio simulations and modelling of molecular interactions has been applied in presence of a proposed clock mechanism for MQCA, including the binary wire, the wire bus and the majority voter. The quantitative results generated depict compelling clocked information propagation phenomena of MQCA devices and most importantly, provide crucial feedback for future MQCA experimental implementation

    Efficient Neuromorphic Computing Enabled by Spin-Transfer Torque: Devices, Circuits and Systems

    Get PDF
    Present day computers expend orders of magnitude more computational resources to perform various cognitive and perception related tasks that humans routinely perform everyday. This has recently resulted in a seismic shift in the field of computation where research efforts are being directed to develop a neurocomputer that attempts to mimic the human brain by nanoelectronic components and thereby harness its efficiency in recognition problems. Bridging the gap between neuroscience and nanoelectronics, this thesis demonstrates the encoding of biological neural and synaptic functionalities in the underlying physics of electron spin. Description of various spin-transfer torque mechanisms that can be potentially utilized for realizing neuro-mimetic device structures is provided. A cross-layer perspective extending from the device to the circuit and system level is presented to envision the design of an All-Spin neuromorphic processor enabled with on-chip learning functionalities. Device-circuit-algorithm co-simulation framework calibrated to experimental results suggest that such All-Spin neuromorphic systems can potentially achieve almost two orders of magnitude energy improvement in comparison to state-of-the-art CMOS implementations

    Variability and reliability analysis of carbon nanotube technology in the presence of manufacturing imperfections

    Get PDF
    In 1925, Lilienfeld patented the basic principle of field effect transistor (FET). Thirty-four years later, Kahng and Atalla invented the MOSFET. Since that time, it has become the most widely used type of transistor in Integrated Circuits (ICs) and then the most important device in the electronics industry. Progress in the field for at least the last 40 years has followed an exponential behavior in accordance with Moore¿s Law. That is, in order to achieve higher densities and performance at lower power consumption, MOS devices have been scaled down. But this aggressive scaling down of the physical dimensions of MOSFETs has required the introduction of a wide variety of innovative factors to ensure that they could still be properly manufactured. Transistors have expe- rienced an amazing journey in the last 10 years starting with strained channel CMOS transistors at 90nm, carrying on the introduction of the high-k/metal-gate silicon CMOS transistors at 45nm until the use of the multiple-gate transistor architectures at 22nm and at recently achieved 14nm technology node. But, what technology will be able to produce sub-10nm transistors? Different novel materials and devices are being investigated. As an extension and enhancement to current MOSFETs some promising devices are n-type III-V and p-type Germanium FETs, Nanowire and Tunnel FETs, Graphene FETs and Carbon Nanotube FETs. Also, non-conventional FETs and other charge-based information carrier devices and alternative information processing devices are being studied. This thesis is focused on carbon nanotube technology as a possible option for sub-10nm transistors. In recent years, carbon nanotubes (CNTs) have been attracting considerable attention in the field of nanotechnology. They are considered to be a promising substitute for silicon channel because of their small size, unusual geometry (1D structure), and extraordinary electronic properties, including excellent carrier mobility and quasi-ballistic transport. In the same way, carbon nanotube field-effect transistors (CNFETs) could be potential substitutes for MOSFETs. Ideal CNFETs (meaning all CNTs in the transistor behave as semiconductors, have the same diameter and doping level, and are aligned and well-positioned) are predicted to be 5x faster than silicon CMOS, while consuming the same power. However, nowadays CNFETs are also affected by manufacturing variability, and several significant challenges must be overcome before these benefits can be achieved. Certain CNFET manufacturing imperfections, such as CNT diameter and doping variations, mispositioned and misaligned CNTs, high metal-CNT contact resistance, the presence of metallic CNTs (m-CNTs), and CNT density variations, can affect CNFET performance and reliability and must be addressed. The main objective of this thesis is to analyze the impact of the current CNFET manufacturing challenges on multi-channel CNFET performance from the point of view of variability and reliability and at different levels, device and circuit level. Assuming that CNFETs are not ideal or non-homogeneous because of today CNFET manufacturing imperfections, we propose a methodology of analysis that based on a CNFET ideal compact model is able to simulate heterogeneous or non-ideal CNFETs; that is, transistors with different number of tubes that have different diameters, are not uniformly spaced, have different source/drain doping levels, and, most importantly, are made up not only of semiconducting CNTs but also metallic ones. This method will allow us to analyze how CNT-specific variations affect CNFET device characteristics and parameters and CNFET digital circuit performance. Furthermore, we also derive a CNFET failure model and propose an alternative technique based on fault-tolerant architectures to deal with the presence of m-CNTs, one of the main causes of failure in CNFET circuits

    Investigation of Molecular FCN for Beyond-CMOS: Technology, design, and modeling for nanocomputing

    Get PDF
    L'abstract è presente nell'allegato / the abstract is in the attachmen

    Energy-Efficient System Architectures for Intermittently-Powered IoT Devices

    Get PDF
    Various industry forecasts project that, by 2020, there will be around 50 billion devices connected to the Internet of Things (IoT), helping to engineer new solutions to societal-scale problems such as healthcare, energy conservation, transportation, etc. Most of these devices will be wireless due to the expense, inconvenience, or in some cases, the sheer infeasibility of wiring them. With no cord for power and limited space for a battery, powering these devices for operating in a set-and-forget mode (i.e., achieve several months to possibly years of unattended operation) becomes a daunting challenge. Environmental energy harvesting (where the system powers itself using energy that it scavenges from its operating environment) has been shown to be a promising and viable option for powering these IoT devices. However, ambient energy sources (such as vibration, wind, RF signals) are often minuscule, unreliable, and intermittent in nature, which can lead to frequent intervals of power loss. Performing computations reliably in the face of such power supply interruptions is challenging
    corecore