19 research outputs found

    Always-On 674uW @ 4GOP/s Error Resilient Binary Neural Networks with Aggressive SRAM Voltage Scaling on a 22nm IoT End-Node

    Full text link
    Binary Neural Networks (BNNs) have been shown to be robust to random bit-level noise, making aggressive voltage scaling attractive as a power-saving technique for both logic and SRAMs. In this work, we introduce the first fully programmable IoT end-node system-on-chip (SoC) capable of executing software-defined, hardware-accelerated BNNs at ultra-low voltage. Our SoC exploits a hybrid memory scheme where error-vulnerable SRAMs are complemented by reliable standard-cell memories to safely store critical data under aggressive voltage scaling. On a prototype in 22nm FDX technology, we demonstrate that both the logic and SRAM voltage can be dropped to 0.5Vwithout any accuracy penalty on a BNN trained for the CIFAR-10 dataset, improving energy efficiency by 2.2X w.r.t. nominal conditions. Furthermore, we show that the supply voltage can be dropped to 0.42V (50% of nominal) while keeping more than99% of the nominal accuracy (with a bit error rate ~1/1000). In this operating point, our prototype performs 4Gop/s (15.4Inference/s on the CIFAR-10 dataset) by computing up to 13binary ops per pJ, achieving 22.8 Inference/s/mW while keeping within a peak power envelope of 674uW - low enough to enable always-on operation in ultra-low power smart cameras, long-lifetime environmental sensors, and insect-sized pico-drones.Comment: Submitted to ISICAS2020 journal special issu

    Design techniques to enhance low-power wireless communication soc with reconfigurability and wake up radio

    Get PDF
    Nowadays, Internet of things applications are increasing, and each end-node has more demanding requirements such as energy efficiency and speed. The thesis proposes a heterogeneous elaboration unit for smart power applications, that consists of an ultra-low-power microcontroller coupled with a small (around 1k equivalent gates) soft-core of embedded FPGA. This digital system is implemented in 90-nm BCD technology of STMicroelectronics, and through the analysis presented in this thesis proves to have good performance in terms of power consumption and latency. The idea is to increase the system performance exploiting the embedded FPGA to managing smart power tasks. For the intended applications, a remarkable computational load is not required, it is just required the implementation of simple finite state machines, since they are event-driven applications. In this way, while the microcontroller deals with other system computations such as high-level communications, the eFPGA can efficiently manage smart power applications. An added value of the proposed elaboration unit is that a soft-core approach is applied to the whole digital system including the eFPGA, and hence, it is portable to different technologies. On the other hand, the configurability improvement has a straightforward drawback of about a 20–27% area overhead. The eFPGA usage to manage smart power applications, allows the system to reduce the required energy per task from about 400 to around 800 times compared to a processor implementation. The eFPGA utilization improves also the latency performance of the system reaching from 8 to 145 times less latency in terms of clock cycles. The thesis also introduces the architecture of a nano-watt wake-up radio integrated circuit implemented in 90-nm BCD technology of STMicroelectronics. The wake-up radio is an auxiliary always-on radio for medium-range applications that allows the IoT end-nodes to drastically reduce the power consumption during the node idle-listening communication phase

    Low-Power Reconfigurable Sensing Circuitry for the Internet-of-Things Paradigm

    Get PDF
    With ubiquitous wireless communication via Wi-Fi and nascent 5th Generation mobile communications, more devices -- both smart and traditionally dumb -- will be interconnected than ever before. This burgeoning trend is referred to as the Internet-of-Things. These new sensing opportunities place a larger burden on the underlying circuitry that must operate on finite battery power and/or within energy-constrained environments. New developments of low-power reconfigurable analog sensing platforms like field-programmable analog arrays (FPAAs) present an attractive sensing solution by processing data in the analog domain while staying flexible in design. This work addresses some of the contemporary challenges of low-power wireless sensing via traditional application-specific sensing and with FPAAs. A large emphasis is placed on furthering the development of FPAAs by making them more accessible to designers without a strong integrated-circuit background -- much like FPGAs have done for digital designers

    An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics

    Full text link
    Near-sensor data analytics is a promising direction for IoT endpoints, as it minimizes energy spent on communication and reduces network load - but it also poses security concerns, as valuable data is stored or sent over the network at various stages of the analytics pipeline. Using encryption to protect sensitive data at the boundary of the on-chip analytics engine is a way to address data security issues. To cope with the combined workload of analytics and encryption in a tight power envelope, we propose Fulmine, a System-on-Chip based on a tightly-coupled multi-core cluster augmented with specialized blocks for compute-intensive data processing and encryption functions, supporting software programmability for regular computing tasks. The Fulmine SoC, fabricated in 65nm technology, consumes less than 20mW on average at 0.8V achieving an efficiency of up to 70pJ/B in encryption, 50pJ/px in convolution, or up to 25MIPS/mW in software. As a strong argument for real-life flexible application of our platform, we show experimental results for three secure analytics use cases: secure autonomous aerial surveillance with a state-of-the-art deep CNN consuming 3.16pJ per equivalent RISC op; local CNN-based face detection with secured remote recognition in 5.74pJ/op; and seizure detection with encrypted data collection from EEG within 12.7pJ/op.Comment: 15 pages, 12 figures, accepted for publication to the IEEE Transactions on Circuits and Systems - I: Regular Paper

    Integrated Circuits and Systems for Smart Sensory Applications

    Get PDF
    Connected intelligent sensing reshapes our society by empowering people with increasing new ways of mutual interactions. As integration technologies keep their scaling roadmap, the horizon of sensory applications is rapidly widening, thanks to myriad light-weight low-power or, in same cases even self-powered, smart devices with high-connectivity capabilities. CMOS integrated circuits technology is the best candidate to supply the required smartness and to pioneer these emerging sensory systems. As a result, new challenges are arising around the design of these integrated circuits and systems for sensory applications in terms of low-power edge computing, power management strategies, low-range wireless communications, integration with sensing devices. In this Special Issue recent advances in application-specific integrated circuits (ASIC) and systems for smart sensory applications in the following five emerging topics: (I) dedicated short-range communications transceivers; (II) digital smart sensors, (III) implantable neural interfaces, (IV) Power Management Strategies in wireless sensor nodes and (V) neuromorphic hardware

    A High Performance Advanced Encryption Standard (AES) Encrypted On-Chip Bus Architecture for Internet-of-Things (IoT) System-on-Chips (SoC)

    Get PDF
    With industry expectations of billions of Internet-connected things, commonly referred to as the IoT, we see a growing demand for high-performance on-chip bus architectures with the following attributes: small scale, low energy, high security, and highly configurable structures for integration, verification, and performance estimation. Our research thus mainly focuses on addressing these key problems and finding the balance among all these requirements that often work against each other. First of all, we proposed a low-cost and low-power System-on-Chips (SoCs) architecture (IBUS) that can frame data transfers differently. The IBUS protocol provides two novel transfer modes – the block and state modes, and is also backward compatible with the conventional linear mode. In order to evaluate the bus performance automatically and accurately, we also proposed an evaluation methodology based on the standard circuit design flow. Experimental results show that the IBUS based design uses the least hardware resource and reduces energy consumption to a half of an AMBA Advanced High-Performance Bus (AHB) and Advanced eXensible Interface (AXI). Additionally, the valid bandwidth of the IBUS based design is 2.3 and 1.6 times, respectively, compared with the AHB and AXI based implementations. As IoT advances, privacy and security issues become top tier concerns in addition to the high performance requirement of embedded chips. To leverage limited resources for tiny size chips and overhead cost for complex security mechanisms, we further proposed an advanced IBUS architecture to provide a structural support for the block-based AES algorithm. Our results show that the IBUS based AES-encrypted design costs less in terms of hardware resource and dynamic energy (60.2%), and achieves higher throughput (x1.6) compared with AXI. Effectively dealing with the automation in design and verification for mixed-signal integrated circuits is a critical problem, particularly when the bus architecture is new. Therefore, we further proposed a configurable and synthesizable IBUS design methodology. The flexible structure, together with bus wrappers, direct memory access (DMA), AES engine, memory controller, several mixed-signal verification intellectual properties (VIPs), and bus performance models (BPMs), forms the basic for integrated circuit design, allowing engineers to integrate application-specific modules and other peripherals to create complex SoCs

    Floating-Gate Design and Linearization for Reconfigurable Analog Signal Processing

    Get PDF
    Analog and mixed-signal integrated circuits have found a place in modern electronics design as a viable alternative to digital pre-processing. With metrics that boast high accuracy and low power consumption, analog pre-processing has opened the door to low-power state-monitoring systems when it is utilized in place of a power-hungry digital signal-processing stage. However, the complicated design process required by analog and mixed-signal systems has been a barrier to broader applications. The implementation of floating-gate transistors has begun to pave the way for a more reasonable approach to analog design. Floating-gate technology has widespread use in the digital domain. Analog and mixed-signal use of floating-gate transistors has only become a rising field of study in recent years. Analog floating gates allow for low-power implementation of mixed-signal systems, such as the field-programmable analog array, while simultaneously opening the door to complex signal-processing techniques. The field-programmable analog array, which leverages floating-gate technologies, is demonstrated as a reliable replacement to signal-processing tasks previously only solved by custom design. Living in an analog world demands the constant use and refinement of analog signal processing for the purpose of interfacing with digital systems. This work offers a comprehensive look at utilizing floating-gate transistors as the core element for analog signal-processing tasks. This work demonstrates the floating gate\u27s merit in large reconfigurable array-driven systems and in smaller-scale implementations, such as linearization techniques for oscillators and analog-to-digital converters. A study on analog floating-gate reliability is complemented with a temperature compensation scheme for implementing these systems in ever-changing, realistic environments

    Brain fame:From FPGA to heterogeneous acceleration of brain simulations

    Get PDF
    Among the various methods in neuroscience for understanding brain function, in-silico simulations have been gaining popularity. Advances in neuroscience and engineering led to the creation of mathematical models of networks that do not simply mimic biological behaviour in an abstract fashion but emulate its in significant detail, even to the level of its biophysical properties. Such an example is the Spiking Neural Network (SNN) that can model a variety of additional behavioural features, like encoding data and adapting according to a spike train`s amplitude, frequency and general precise pattern of arrival of spiking events on a neuron. As a result, SNNs have higher explanatory power than their predecessors, thus brain simulations based on SNNs become an attractive topic to explore. In-silico simulations of SNNs can have beneficial results not only for neuroscience research but breakthroughs can also potentially benefit medical, computing and A.I. research. SNNs, though, computationally depending workloads that traditional computing might not be able to cover. Thus, the use of High Performance Computing (HPC) platforms in this application domain becomes desirable. This dissertation explores the topic of HPC-based in-silico brain simulations. Initially, the effort focuses on custom hardware accelerators, due to their potential in providing real-time performance alongside support for large-scale non-real-time experiments and specifically Field Programmable Gate Arrays (FPGAs). The nature of FPGA-based accelerators provides specific benefits against other similar paradigms like Application Specific Integrated Circuit (ASIC) designs.Firstly, we explore the general characteristics of typical SNNs model types to identify their computational requirements in relation to their explanatory strength. We also identify major design characteristics in model development that can directly affect its performance and behaviour when ported to an HPC platform. Subsequently, a detailed literature review is made on FPGA-based SNN implementations. The HPC porting effort begins with the implementation of an extended-Hodgkin-Huxley model of the Inferior-olivary nucleus featuring advanced connectivity. The model is quite demanding and complex enough to act as a realistic benchmark for HPC implementations, while also being scientifically relevant in its own right. FPGA development shows promising performance results not only when doing custom designs but also using High-level synthesis (HLS) toolflows that significantly reduce development time. FPGAs have proven suitable for small-scale embedded-HPC uses as well. The various efforts, though, reveal a very specific weakness of FPGA development that has less to do with the silicon itself and more with its programming environment. The FPGA tools are very inaccessible to non-experts, thus any acceleration effort would require the engineer (and the FPGA development time) to be in the critical path of the research process. An important question to be answered is how the FPGA platform would compare to other popular software-based HPC solutions such as GPU- and CPU-based platforms. A detailed comparison of the best FPGA implementation with GPU and manycore-CPU ports of the same benchmark is conducted. The comparison and evaluation shows that, when it comes to real-time performance, FPGAs have a clear advantage. But for non-real-time, large scale simulations, there is no single platform that can optimally support the complete range of experiments that could be conducted with the inferior olive model. The comparison makes a clear case for BrainFrame, a platform that supports heterogeneous HPC substrates. This dissertation, thus, concludes with the proposal of the BrainFrame system. The proof-of-concept design supports standard and extended Hodgkin-Huxley models, , such as the original inferior-olive model. The system integrates a GPU-, CPU- and FPGA-based HPC back-end while also using a standard neuroscientific language front-end (PyNN) that can score best-in-class performance, alleviate some of the development hurdles and make it far more user-friendly for the typical model developer. Additionally, the multi-node potential of the platform is being explored. BrainFrame provides both a powerful heterogeneous platform for acceleration and also a front-end familiar to the neuroscientist
    corecore