
    Approximate Computing Survey, Part I: Terminology and Software & Hardware Approximation Techniques

    The rapid growth of demanding applications in domains such as multimedia processing and machine learning has marked a new era for edge and cloud computing. These applications involve massive data and compute-intensive tasks, and thus, typical computing paradigms in embedded systems and data centers are stressed to meet the worldwide demand for high performance. Concurrently, the semiconductor field has, over the last 15 years, established power as a first-class design concern. As a result, the computing-systems community is forced to find alternative design approaches that facilitate high-performance and/or power-efficient computing. Among the examined solutions, Approximate Computing has attracted ever-increasing interest, with research works applying approximations across the entire traditional computing stack, i.e., at the software, hardware, and architectural levels. Over the last decade, a plethora of approximation techniques has been proposed in software (programs, frameworks, compilers, runtimes, languages), hardware (circuits, accelerators), and architectures (processors, memories). The current article is Part I of our comprehensive survey on Approximate Computing: it reviews its motivation, terminology, and principles, and it classifies and presents the technical details of the state-of-the-art software and hardware approximation techniques. Comment: Under review at ACM Computing Surveys.
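    To make the software end of this concrete, one classic source-level technique covered by approximation surveys is loop perforation. The sketch below is a hypothetical illustration, not code from the article:

        # Loop perforation: execute only every k-th iteration and accept a
        # bounded accuracy loss in exchange for a proportional work reduction.
        def mean_exact(xs):
            return sum(xs) / len(xs)

        def mean_perforated(xs, skip=3):
            sampled = xs[::skip]          # skip 2 of every 3 iterations
            return sum(sampled) / len(sampled)

        data = [float(i % 100) for i in range(1_000_000)]
        print(mean_exact(data), mean_perforated(data))  # both ~49.5, 3x less work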

    Center for Aeronautics and Space Information Sciences

    This report summarizes the research done during 1991/92 under the Center for Aeronautics and Space Information Sciences (CASIS) program. The topics covered are computer architecture, networking, and neural nets.

    Circuit design and analysis for on-FPGA communication systems

    On-chip communication has emerged as a prominent subject in Very-Large-Scale-Integration (VLSI) design, as the trend of technology scaling favours logic more than interconnect. Interconnect often dictates the system performance, and research into new methodologies and system architectures that deliver high-performance communication services across the chip is therefore mandatory. The interconnect challenge is exacerbated in the Field-Programmable Gate Array (FPGA), a class of integrated circuit whose hardware can be programmed post-fabrication. Communication across an FPGA deteriorates as a result of interconnect scaling, and the programmable fabrics, switches, and the specific routing architecture introduce additional latency and bandwidth degradation, further hindering intra-chip communication performance. Past research efforts mainly focused on optimizing logic elements and functional units in FPGAs; communication over the programmable interconnect received little attention and is inadequately understood. This thesis is among the first to research on-chip communication systems that are built on top of programmable fabrics, and it proposes methodologies to maximize the interconnect throughput performance. There are three major contributions in this thesis: (i) an analysis of on-chip interconnect fringing, which degrades the bandwidth of communication channels due to routing congestion in reconfigurable architectures; (ii) a new analogue wave signalling scheme that significantly improves the interconnect throughput by exploiting the fundamental electrical characteristics of the reconfigurable interconnect structures, and can potentially mitigate the interconnect scaling challenges; and (iii) a novel Dynamic Programming (DP)-network that provides adaptive routing in network-on-chip (NoC) systems, performing runtime optimization for route planning and dynamic routing that effectively utilizes the in-silicon bandwidth. This thesis explores a new horizon in reconfigurable system design, in which new methodologies and concepts are proposed to enhance the on-FPGA communication throughput performance that is of vital importance in new technology processes.
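    The DP-network idea can be pictured as a Bellman-style relaxation run across the routers. The following is a hedged software sketch of that principle, not the thesis's hardware implementation:

        # Dynamic-programming route planning on a 2D NoC mesh: iterate a
        # cost-to-destination relaxation to a fixed point, then forward each
        # packet to the neighbour with the lowest remaining cost.
        def dp_routes(width, height, link_cost, dest):
            """link_cost maps (node, neighbour) -> latency/congestion cost."""
            nodes = [(x, y) for x in range(width) for y in range(height)]

            def neighbours(n):
                x, y = n
                cand = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
                return [c for c in cand if 0 <= c[0] < width and 0 <= c[1] < height]

            cost = {n: float("inf") for n in nodes}
            cost[dest] = 0.0
            for _ in range(width * height):      # enough sweeps to converge
                for n in nodes:
                    for m in neighbours(n):
                        cost[n] = min(cost[n], link_cost.get((n, m), 1.0) + cost[m])
            return cost

        # a congested link gets cost 5; the relaxation routes around it
        cost = dp_routes(4, 4, {((0, 0), (1, 0)): 5.0}, dest=(3, 3))
        print(cost[(0, 0)])                      # 6.0: detour via (0, 1)

    Re-running the relaxation with freshly measured link costs is what makes the routing adaptive at runtime.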

    Containing the Nanometer “Pandora-Box”: Cross-Layer Design Techniques for Variation Aware Low Power Systems

    The demand for richer multimedia services, multifunctional portable devices, and high data rates can only be envisioned thanks to the improvement in semiconductor technology. Unfortunately, sub-90 nm process nodes uncover the nanometer Pandora-box, exposing the barriers of technology scaling: parameter variations, which threaten the correct operation of circuits, and increased energy consumption, which limits the operational lifetime of today’s systems. The contradictory design requirements for low power and system robustness constitute one of the most challenging design problems of today. The design efforts are further complicated by the heterogeneous types of designs (logic, memory, mixed-signal) that are included in today’s complex systems and are characterized by different design requirements. This paper presents an overview of techniques at various levels of design abstraction that lead to low-power and variation-aware logic, memory, and mixed-signal circuits and can potentially assist in meeting the strict power budgets and yield/quality requirements of future systems.
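    The tension the paper describes follows directly from the standard first-order CMOS power model (a textbook relation quoted here for context, not derived in the paper):

        % total power = dynamic switching power + static leakage power
        \[
          P \approx \alpha\, C\, V_{dd}^{2}\, f \;+\; V_{dd}\, I_{leak}
        \]
        % Lowering V_dd cuts dynamic energy quadratically, which is why voltage
        % scaling is the favourite low-power knob; but it also shrinks noise
        % margins, making circuits more sensitive to parameter variations.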

    The "MIND" Scalable PIM Architecture

    MIND (Memory, Intelligence, and Network Device) is an advanced parallel computer architecture for high-performance computing and scalable embedded processing. It is a Processor-in-Memory (PIM) architecture integrating both DRAM bit cells and CMOS logic devices on the same silicon die. MIND is multicore, with multiple memory/processor nodes on each chip, and supports global shared memory across systems of MIND components. MIND is distinguished from other PIM architectures in that it incorporates mechanisms for efficient support of a global parallel execution model based on the semantics of message-driven multithreaded split-transaction processing. MIND is designed to operate either in conjunction with other conventional microprocessors or in standalone arrays of like devices. It also incorporates mechanisms for fault tolerance, real-time execution, and active power management. This paper describes the major elements and operational methods of the MIND architecture.
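    The execution model named here (message-driven, multithreaded, split-transaction) can be illustrated with a small software analogy. This is a conceptual sketch under assumed semantics, not MIND's actual mechanism:

        # Message-driven execution: a parcel arriving at the node that owns
        # the data spawns a lightweight handler there; a remote read is a
        # split transaction (request and reply are separate one-way parcels).
        import threading, queue

        class Node:
            def __init__(self, name, memory):
                self.name, self.memory = name, dict(memory)
                self.inbox = queue.Queue()

            def send(self, handler, *args):
                self.inbox.put((handler, args))    # sender never blocks

            def serve(self, n_parcels):
                for _ in range(n_parcels):         # bounded loop for the demo
                    handler, args = self.inbox.get()
                    t = threading.Thread(target=handler, args=(self,) + args)
                    t.start(); t.join()

        def remote_read(owner, key, requester):
            # request half: runs at the owning node, then sends the reply
            requester.send(deliver, key, owner.memory[key])

        def deliver(node, key, value):
            print(f"{node.name} received {key} = {value}")

        a, b = Node("A", {}), Node("B", {"x": 42})
        b.send(remote_read, "x", a)    # A's read request lands in B's inbox
        b.serve(1)                     # B handles the request, replies to A
        a.serve(1)                     # prints: A received x = 42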

    Designing energy-efficient computing systems using equalization and machine learning

    As technology scaling slows down in the nanometer CMOS regime and mobile computing becomes more ubiquitous, designing energy-efficient hardware for mobile systems is becoming increasingly critical and challenging. Although various approaches, like near-threshold computing (NTC) and aggressive voltage scaling with shadow latches, have been proposed to get the most out of limited battery life, there is still no “silver bullet” for the increasing power-performance demands of mobile systems. Moreover, given that a mobile system may operate in a variety of environmental conditions, such as different temperatures, and face varying performance requirements, there is a growing need for designing tunable/reconfigurable systems in order to achieve energy-efficient operation. In this work we propose to address the energy-efficiency problem of mobile systems using two different approaches: circuit tunability and distributed adaptive algorithms. Inspired by communication systems, we developed feedback-equalization-based digital logic that changes the threshold of its gates based on the input pattern. We showed that feedback equalization in static complementary CMOS logic enabled up to 20% reduction in energy dissipation while maintaining the performance metrics. We also achieved a 30% reduction in energy dissipation for pass-transistor logic (PTL) with equalization while maintaining performance. In addition, we proposed a mechanism that leverages feedback equalization techniques to achieve near-optimal operation of static complementary CMOS logic blocks over the entire voltage range from near-threshold supply voltage to nominal supply voltage. Using energy-delay product (EDP) as a metric, we analyzed the use of the feedback equalizer as part of various sequential computational blocks. Our analysis shows that for near-threshold voltage operation, when equalization is used, the operating frequency improves by up to 30% while the energy increase is less than 15%, for an overall EDP reduction of ≈10%. We also observe an EDP reduction of close to 5% across the entire above-threshold voltage range. On the distributed adaptive algorithm front, we explored energy-efficient hardware implementations of machine learning algorithms. We proposed an adaptive classifier that leverages the wide variability in data complexity to enable energy-efficient data classification operations for mobile systems. Our approach takes advantage of varying classification hardness across data to dynamically allocate resources and improve energy efficiency. On average, our adaptive classifier is ≈100× more energy efficient but has a ≈1% higher error rate than a complex radial basis function classifier, and is ≈10× less energy efficient but has a ≈40% lower error rate than a simple linear classifier, across a wide range of classification data sets. We also developed a field of groves (FoG) implementation of random forests (RF) that achieves accuracy comparable to Convolutional Neural Networks (CNN) and Support Vector Machines (SVM) under tight energy budgets. The FoG architecture takes advantage of the fact that in random forests a small portion of the weak classifiers (decision trees) might be sufficient to achieve high statistical performance. By dividing the random forest into smaller forests (groves) and conditionally executing the rest of the forest, FoG is able to achieve much higher energy-efficiency levels for comparable error rates. We also take advantage of the distributed nature of the FoG to achieve a high level of parallelism. Our evaluation shows that at maximum achievable accuracies FoG consumes ≈1.48×, ≈24×, ≈2.5×, and ≈34.7× lower energy per classification compared to conventional RF, SVM-RBF, Multi-Layer Perceptron Network (MLP), and CNN, respectively. FoG is 6.5× less energy efficient than SVM-LR but achieves 18% higher accuracy on average across all considered datasets.
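    The conditional grove execution described above lends itself to a compact sketch. The following hypothetical Python illustrates the principle (it is not the paper's implementation, and the trees here are toy stumps):

        # Field-of-groves voting: evaluate a small grove first and wake the
        # remaining groves only when the running vote is not yet decisive,
        # saving energy on easy inputs.
        from collections import Counter

        def fog_classify(groves, x, margin=0.3):
            """groves: list of lists of trees; each tree maps x -> class label."""
            votes, total = Counter(), 0
            for grove in groves:
                for tree in grove:
                    votes[tree(x)] += 1
                    total += 1
                ranked = votes.most_common(2)
                lead = ranked[0][1] - (ranked[1][1] if len(ranked) > 1 else 0)
                if lead / total >= margin:       # decisive: skip the rest
                    return ranked[0][0], total   # total = trees actually run
            return votes.most_common(1)[0][0], total

        stump = lambda t: (lambda x: int(x > t))           # toy decision tree
        groves = [[stump(0.4), stump(0.5)], [stump(0.3), stump(0.45), stump(0.6)]]
        print(fog_classify(groves, 0.9))   # easy input: (1, 2), second grove skipped
        print(fog_classify(groves, 0.42))  # hard input: all 5 trees consulted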

    Homeostatic Fault Tolerance in Spiking Neural Networks: A Dynamic Hardware Perspective

    Fault tolerance is a remarkable feature of biological systems, and their self-repair capability influences modern electronic systems. In this paper, we propose a novel plastic neural network model which establishes homeostasis in a spiking neural network. Combining this plasticity with inspiration from inhibitory interneurons, we develop a fault-resilient robotic controller, implemented on an FPGA, that performs an obstacle-avoidance task. We demonstrate the proposed methodology on a spiking neural network implemented on a Xilinx Artix-7 FPGA. The system is able to maintain stable firing (tolerance ±10%) with a loss of up to 75% of the original synaptic inputs to a neuron. Our repair mechanism has minimal hardware overhead, with a tuning circuit (repair unit) that consumes only three slices per neuron to implement a threshold-voltage-based homeostatic fault-tolerant unit. The overall architecture has a minimal impact on power consumption and, therefore, supports scalable implementations. This paper opens a novel way of implementing the behavior of natural fault-tolerant systems in hardware, establishing homeostatic self-repair behavior.
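    The threshold-based homeostatic repair can be illustrated with a tiny simulation. This is a hedged sketch of the principle, with made-up parameters rather than the paper's circuit:

        # Homeostasis: nudge the firing threshold so the mean firing rate
        # tracks a set point, even after most synaptic inputs are lost.
        import random

        def firing_rate(n_synapses, steps=5000, target=0.2, eta=0.01):
            threshold, fires = 1.0, 0
            weights = [0.05] * n_synapses
            for _ in range(steps):
                # each synapse delivers a spike with probability 0.5
                drive = sum(w for w in weights if random.random() < 0.5)
                fired = drive > threshold
                fires += fired
                # repair rule: firing above target raises the threshold,
                # firing below target lowers it
                threshold += eta * (float(fired) - target)
            return fires / steps

        random.seed(1)
        print(firing_rate(40))   # healthy neuron: rate near 0.2
        print(firing_rate(10))   # 75% synapse loss: rate still near 0.2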

    A Heterogeneous System Architecture for Low-Power Wireless Sensor Nodes in Compute-Intensive Distributed Applications

    Wireless Sensor Networks (WSNs) combine embedded sensing and processing capabilities with a wireless communication infrastructure, thus supporting distributed monitoring applications. WSNs have been investigated for more than three decades, and recent social and industrial developments such as home automation or the Internet of Things have increased the commercial relevance of this key technology. The communication bandwidth of the sensor nodes is limited by the transmission medium and the restricted energy budget of the nodes. To still keep up with ever-increasing sensor counts and sampling rates, the basic data acquisition and collection capabilities of WSNs have been extended with decentralized smart feature extraction and data aggregation algorithms. Energy-efficient processing elements are thus required to meet the ever-growing compute demands of the WSN motes within the available energy budget. The Hardware-Accelerated Low Power Mote (HaLoMote) is proposed and evaluated in this thesis to address the requirements of compute-intensive WSN applications. It is a heterogeneous system architecture that combines a Field-Programmable Gate Array (FPGA) for hardware-accelerated data aggregation with an IEEE 802.15.4-based Radio Frequency System-on-Chip for network management and top-level control of the applications. To properly support Dynamic Power Management (DPM) on the HaLoMote, a Microsemi IGLOO FPGA with non-volatile configuration storage was chosen for a prototype implementation, called the Hardware-Accelerated Low Energy Wireless Embedded Sensor Node (HaLOEWEn). As for every multi-processor architecture, inter-processor communication and coordination strongly influence the efficiency of the HaLoMote. Therefore, a generic communication framework is proposed in this thesis. It is tightly coupled with the DPM strategy of the HaLoMote, which supports fast transitions between active and idle modes. Low-power sleep periods can thus be scheduled within every sampling cycle, even for sampling rates of hundreds of hertz. In addition to the development of the heterogeneous system architecture, this thesis focuses on the energy-consumption trade-off between wireless data transmission and in-sensor data aggregation. The HaLOEWEn is compared with typical software processors in terms of runtime and energy efficiency in the context of three monitoring applications. The building blocks of these applications comprise hardware-accelerated digital signal processing primitives, lossless data compression, a precise wireless time-synchronization protocol, and a transceiver scheduling for contention-free information flooding from multiple sources to all network nodes. Most of these concepts are applicable to similar distributed monitoring applications with in-sensor data aggregation. A Structural Health Monitoring (SHM) application is used for the system-level evaluation of the HaLoMote concept. The Random Decrement Technique (RDT) is a particular SHM data aggregation algorithm, which determines the free-decay response of the monitored structure for subsequent modal identification. The hardware-accelerated RDT executed on a HaLOEWEn mote requires only 43% of the energy that a recent ARM Cortex-M based microcontroller consumes for this algorithm. The functionality of the overall WSN-based SHM system is shown with a laboratory-scale demonstrator. Compared to reference data acquired by a wire-bound laboratory measurement system, the HaLOEWEn network captures the structural information relevant to the SHM application with less than 1% deviation.
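    Since the RDT is the computational core of the evaluation, a brief sketch helps. The textbook level-crossing form is shown below in hypothetical Python; the thesis implements an FPGA-accelerated variant, which is not reproduced here:

        # Random Decrement Technique: average all signal segments that start
        # at an upward crossing of a trigger level; the random excitation
        # averages out and an estimate of the free-decay response remains.
        import numpy as np

        def random_decrement(x, trigger_level, seg_len):
            starts = np.where((x[:-1] < trigger_level) & (x[1:] >= trigger_level))[0]
            starts = starts[starts + seg_len <= len(x)]
            return np.mean([x[s:s + seg_len] for s in starts], axis=0)

        # stand-in for ambient structural vibration: a lightly damped
        # second-order system driven by white noise
        rng = np.random.default_rng(0)
        x = np.zeros(20000)
        e = rng.standard_normal(x.size)
        for n in range(2, x.size):
            x[n] = 1.8 * x[n - 1] - 0.97 * x[n - 2] + e[n]

        decay = random_decrement(x, trigger_level=x.std(), seg_len=256)
        # 'decay' now approximates the structure's free-decay response, from
        # which modal parameters (frequency, damping) can be identified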

    Circuits and Systems Advances in Near Threshold Computing

    Modern society is witnessing a sea change in ubiquitous computing, in which people have embraced computing systems as an indispensable part of day-to-day existence. Computation, storage, and communication abilities of smartphones, for example, have undergone monumental changes over the past decade. At the same time, the global emphasis on creating and sustaining green environments is leading to a rapid and ongoing proliferation of edge computing systems and applications. As a broad spectrum of healthcare, home, and transport applications shifts to the edge of the network, near-threshold computing (NTC) is emerging as one of the most promising low-power computing platforms. An NTC device sets its supply voltage close to its threshold voltage, dramatically reducing energy consumption. Despite showing substantial promise in terms of energy efficiency, NTC is yet to see widespread commercial adoption, because circuits and systems operating near threshold suffer from several problems, including increased sensitivity to process variation, reliability problems, performance degradation, and security vulnerabilities, to name a few. To realize its potential, we need designs, techniques, and solutions that overcome these challenges of NTC circuits and systems. The readers of this book will be able to familiarize themselves with recent advances in electronics systems, focusing on near-threshold computing.