
    Designing energy-efficient computing systems using equalization and machine learning

    As technology scaling slows down in the nanometer CMOS regime and mobile computing becomes more ubiquitous, designing energy-efficient hardware for mobile systems is becoming increasingly critical and challenging. Although various approaches, such as near-threshold computing (NTC) and aggressive voltage scaling with shadow latches, have been proposed to get the most out of limited battery life, there is still no “silver bullet” for meeting the increasing power-performance demands of mobile systems. Moreover, given that a mobile system may operate under a variety of environmental conditions (e.g., different temperatures) and varying performance requirements, there is a growing need for tunable/reconfigurable systems that achieve energy-efficient operation. In this work we address the energy-efficiency problem of mobile systems using two different approaches: circuit tunability and distributed adaptive algorithms. Inspired by communication systems, we developed feedback-equalization-based digital logic that changes the switching threshold of its gates based on the input pattern. We showed that feedback equalization in static complementary CMOS logic enables up to 20% reduction in energy dissipation while maintaining the performance metrics. We also achieved a 30% reduction in energy dissipation for pass-transistor logic (PTL) with equalization while maintaining performance. In addition, we proposed a mechanism that leverages feedback equalization techniques to achieve near-optimal operation of static complementary CMOS logic blocks over the entire voltage range, from near-threshold to nominal supply voltage. Using the energy-delay product (EDP) as a metric, we analyzed the use of the feedback equalizer as part of various sequential computational blocks. Our analysis shows that for near-threshold operation, equalization improves the operating frequency by up to 30% while increasing energy by less than 15%, for an overall EDP reduction of ≈10%. We also observe an EDP reduction of close to 5% across the entire above-threshold voltage range.

    On the distributed adaptive algorithm front, we explored energy-efficient hardware implementations of machine learning algorithms. We proposed an adaptive classifier that leverages the wide variability in data complexity to enable energy-efficient data classification for mobile systems. Our approach takes advantage of varying classification hardness across data to dynamically allocate resources and improve energy efficiency. On average, across a wide range of classification data sets, our adaptive classifier is ≈100× more energy efficient with an ≈1% higher error rate than a complex radial basis function classifier, and ≈10× less energy efficient with an ≈40% lower error rate than a simple linear classifier. We also developed a field of groves (FoG) implementation of random forests (RF) that achieves accuracy comparable to Convolutional Neural Networks (CNN) and Support Vector Machines (SVM) under tight energy budgets. The FoG architecture takes advantage of the fact that in random forests a small portion of the weak classifiers (decision trees) may be sufficient to achieve high statistical performance. By dividing the random forest into smaller forests (groves) and conditionally executing the rest of the forest, FoG achieves much higher energy efficiency for comparable error rates. We also exploit the distributed nature of FoG to achieve a high degree of parallelism. Our evaluation shows that at maximum achievable accuracies FoG consumes ≈1.48×, ≈24×, ≈2.5×, and ≈34.7× lower energy per classification than conventional RF, SVM-RBF, Multi-Layer Perceptron (MLP), and CNN, respectively. FoG is 6.5× less energy efficient than SVM-LR but achieves 18% higher accuracy on average across all considered data sets.
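
    The adaptive classification idea above lends itself to a compact software illustration: run a cheap classifier first and escalate only low-confidence samples to an expensive one. The Python sketch below, assuming scikit-learn, a binary task, and an illustrative confidence margin, is a minimal model of that pattern, not the authors' exact hardware design.

        import numpy as np
        from sklearn.svm import LinearSVC, SVC

        def train_adaptive(X, y):
            cheap = LinearSVC().fit(X, y)         # low-energy first stage
            costly = SVC(kernel="rbf").fit(X, y)  # high-accuracy fallback stage
            return cheap, costly

        def classify_adaptive(cheap, costly, X, margin=0.5):
            scores = cheap.decision_function(X)   # signed distance from the hyperplane
            easy = np.abs(scores) >= margin       # confident -> stop early, save energy
            out = np.where(scores >= 0, cheap.classes_[1], cheap.classes_[0])
            if (~easy).any():                     # escalate only the hard samples
                out[~easy] = costly.predict(X[~easy])
            return out

    The energy saving comes from how rarely the costly stage runs: if most samples clear the margin, the average cost per classification approaches that of the linear stage alone.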

    A Survey of Prediction and Classification Techniques in Multicore Processor Systems

    In multicore processor systems, being able to accurately predict the future provides new optimization opportunities that could not otherwise be exploited. For example, an oracle able to predict a certain application's behavior running on a smart phone could direct the power manager to switch to appropriate dynamic voltage and frequency scaling modes that guarantee minimum levels of desired performance while saving energy and thereby prolonging battery life. Using predictions enables systems to become proactive rather than continue to operate in a reactive manner. This prediction-based proactive approach has become increasingly popular in the design and optimization of integrated circuits and of multicore processor systems. Prediction has evolved from simple forecasting to sophisticated machine-learning-based prediction and classification that learns from existing data, employs data mining, and predicts future behavior; this can be exploited by novel optimization techniques spanning all layers of the computing stack. In this survey paper, we present a discussion of the most popular techniques for prediction and classification in the general context of computing systems, with emphasis on multicore processors. The paper is far from comprehensive, but it will help the reader interested in employing prediction in the optimization of multicore processor systems.
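
    As a concrete, if simplified, instance of the proactive approach the survey describes, the Python sketch below pairs a lightweight workload predictor with a DVFS mode selector. The EWMA predictor, the frequency table, and the selection rule are illustrative assumptions; the surveyed literature covers far richer learned models.

        FREQ_LEVELS_MHZ = [400, 800, 1200, 1600]  # hypothetical DVFS operating points

        def ewma_predict(history, alpha=0.3):
            """Predict next-interval CPU utilization (0..1) from past samples."""
            pred = history[0]
            for u in history[1:]:
                pred = alpha * u + (1 - alpha) * pred
            return pred

        def pick_frequency(predicted_util):
            """Proactively pick the lowest frequency expected to cover demand."""
            demand = predicted_util * FREQ_LEVELS_MHZ[-1]  # required throughput in MHz
            for f in FREQ_LEVELS_MHZ:
                if demand <= f:
                    return f
            return FREQ_LEVELS_MHZ[-1]

        pick_frequency(ewma_predict([0.2, 0.3, 0.6]))  # -> 800 (MHz)

    A reactive governor would instead respond only after utilization had already changed; the predictor lets the manager lower the frequency before an idle phase begins.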

    Energy-Efficient HOG-based Object Detection at 1080HD 60 fps with Multi-Scale Support

    In this paper, we present a real-time and energy-efficient multi-scale object detector using Histogram of Oriented Gradients (HOG) features and Support Vector Machine (SVM) classification. Parallel detectors with balanced workloads are used to enable processing of multiple scales and to increase throughput, such that voltage scaling can be applied to reduce energy consumption. Image pre-processing is also introduced to further reduce the power and area cost of generating the image scales. The design operates on 1080HD video at 60 fps in real time with a clock rate of 270 MHz and consumes 45.3 mW (0.36 nJ/pixel) based on post-layout simulations. The ASIC has an area of 490 kgates and 0.538 Mbit of on-chip memory in a 45 nm SOI CMOS process.
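
    For readers unfamiliar with the pipeline being accelerated, the following Python sketch (using scikit-image) scores sliding windows of one image scale with a linear SVM; the window size, stride, and trained weights w and b are placeholder assumptions. The ASIC instead runs balanced parallel detectors, one per scale, to sustain 1080HD at 60 fps.

        import numpy as np
        from skimage.feature import hog

        def detect(frame, w, b, win=(128, 64), stride=8, thresh=0.0):
            """Score every window of one grayscale image scale with a linear SVM."""
            H, W = frame.shape
            hits = []
            for y in range(0, H - win[0] + 1, stride):
                for x in range(0, W - win[1] + 1, stride):
                    feat = hog(frame[y:y + win[0], x:x + win[1]],
                               orientations=9, pixels_per_cell=(8, 8),
                               cells_per_block=(2, 2))      # HOG descriptor
                    if np.dot(w, feat) + b > thresh:        # linear SVM decision
                        hits.append((x, y))
            return hits

    In software, multi-scale support means repeating this loop over downscaled copies of the frame; in the chip, those scales are distributed across the parallel detectors so voltage and frequency can be lowered.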

    FPGA-Based Cascade Support Vector Machine with Integrated Training

    Machine learning algorithms allow us to reason about and analyze large amounts of data. The support vector machine (SVM) is one popular learning algorithm that has been applied to a broad range of applications. To this end, hardware-based SVM processors are very appealing due to their improved runtime and energy efficiency. This research proposes an FPGA-based parallel support vector machine processor capable of processing multi-dimensional data sets. The proposed FPGA SVM is based on the cascade SVM algorithm, which is leveraged to allow efficient parallel processing of data on the FPGA platform, yielding significant gains in processing efficiency.
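
    The cascade structure that makes the design parallel-friendly can be summarized in a few lines. Below is a hedged Python sketch, assuming scikit-learn and an RBF kernel: subset SVMs (trained in parallel in hardware) pass only their support vectors up the cascade, shrinking the problem at each layer. A full cascade SVM also iterates over the cascade to test global convergence, which is omitted here.

        import numpy as np
        from sklearn.svm import SVC

        def cascade_svm(X, y, n_parts=4):
            # Split the training set; each subset is one parallel unit's workload.
            # (Assumes every subset contains examples of both classes.)
            parts = np.array_split(np.arange(len(X)), n_parts)
            while len(parts) > 1:
                merged = [np.concatenate(parts[i:i + 2])
                          for i in range(0, len(parts), 2)]
                parts = []
                for idx in merged:
                    svm = SVC(kernel="rbf").fit(X[idx], y[idx])
                    parts.append(idx[svm.support_])   # keep only support vectors
            return SVC(kernel="rbf").fit(X[parts[0]], y[parts[0]])  # final layer

    Because non-support vectors are discarded early, each cascade layer trains on a much smaller problem than a monolithic SVM would, which is what makes the FPGA mapping efficient.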

    Architectures and Design of VLSI Machine Learning Systems

    Quintillions of bytes of data are generated every day in this era of big data. Machine learning techniques are utilized to perform predictive analysis on these data, to reveal hidden relationships and dependencies, and to predict outcomes and behaviors. The obtained predictive models are used to interpret the existing data and predict new data. Nowadays, most machine learning algorithms are realized by software programs running on general-purpose processors, which usually takes a huge amount of CPU time and incurs prohibitively high energy consumption. In comparison, a dedicated hardware design is usually much more efficient in terms of runtime and energy consumption. The objective of this dissertation is therefore to develop efficient hardware architectures for mainstream machine learning algorithms, providing a promising solution to the runtime and energy bottlenecks of machine learning applications. Mapping complex machine learning algorithms to efficient hardware architectures is, however, a challenging task: many important design decisions must be made during hardware development to achieve efficient tradeoffs.

    In this dissertation, a parallel digital VLSI architecture for combined SVM training and classification is proposed. For the first time, cascade SVM, a powerful training algorithm, is leveraged to significantly improve the scalability of hardware-based SVM training and to develop an efficient parallel VLSI architecture. The parallel SVM processors provide a significant training-time speedup and energy reduction compared with the software SVM algorithm running on a general-purpose CPU. Furthermore, a liquid state machine based neuromorphic learning processor with integrated training and recognition is proposed. A novel theoretical measure of computational power is introduced to facilitate fast design space exploration of the recurrent reservoir, and three low-power techniques are proposed to improve energy efficiency. Meanwhile, a 2-layer spiking neural network with global inhibition is realized in silicon. In addition, we present an architectural design exploration of a brain-inspired digital neuromorphic processor with a memristive synaptic crossbar array, and highlight several synaptic memory access styles. Various analog-to-digital converter schemes are investigated to provide new insights into the tradeoff between hardware cost and energy consumption.
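
    One of the design points mentioned above, the memristive synaptic crossbar with its ADC readout, can be modeled behaviorally in a few lines. The Python sketch below is an abstraction under assumed resolution and current range, not a circuit model: the analog array computes the matrix-vector product "for free" as currents summing on the bit lines, and the ADC then quantizes the result.

        import numpy as np

        def crossbar_mvm(G, v, adc_bits=6, i_max=1.0):
            """G: conductance matrix (synaptic weights), v: input voltage vector."""
            i = G @ v                                   # Kirchhoff current summation
            levels = 2 ** adc_bits - 1
            q = np.clip(np.round(i / i_max * levels), 0, levels)  # ADC quantization
            return q * i_max / levels                   # digital readout

    A coarser ADC cuts energy and area but injects quantization error into every synaptic read, which is precisely the hardware-cost versus energy tradeoff the dissertation's ADC-scheme exploration targets.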

    Emerging Security Threats in Modern Digital Computing Systems: A Power Management Perspective

    The design of computing systems, from pocket-sized smartphones to massive cloud-based data centers, faces one common daunting challenge: minimizing power consumption. In this effort, the power management sector is undergoing a rapid and profound transformation to promote clean and energy-proportional computing. At the hardware end of system design, there is a proliferation of specialized, feature-rich, and complex power management hardware components. Similarly, in the software design layer, complex power management suites are growing rapidly. Concurrent to this development, there has been an upsurge in the integration of third-party components to counter the pressures of shorter time-to-market. These trends collectively raise serious concerns about the trust and security of power management solutions. In recent times, problems such as overheating, performance degradation, and poor battery life have dogged the mobile device market, including the infamous recall of the Samsung Note 7. A power outage in the data center of a major airline left innumerable passengers stranded, with thousands of canceled flights costing over 100 million dollars. This research examines whether such events of unintentional reliability failure can be replicated through targeted attacks that exploit security loopholes in the complex power management infrastructure of a computing system. At its core, this research answers an imminent research question: how can system designers ensure secure and reliable operation of third-party power management units? Specifically, this work investigates possible attack vectors and novel non-invasive detection and defense mechanisms to safeguard systems against malicious power attacks. Through a joint exploration of the threat model and techniques to seamlessly detect and protect against power attacks, this project can have a lasting impact by enabling the design of secure and cost-effective next-generation hardware platforms.

    TinyVers: A Tiny Versatile System-on-chip with State-Retentive eMRAM for ML Inference at the Extreme Edge

    Extreme edge devices or Internet-of-Things nodes require both ultra-low-power always-on processing and the ability to do on-demand sampling and processing. Moreover, support for IoT applications like voice recognition, machine monitoring, etc., requires the ability to execute a wide range of ML workloads. This brings challenges in hardware design to build flexible processors operating in the ultra-low-power regime. This paper presents TinyVers, a tiny versatile ultra-low-power ML system-on-chip that enables enhanced intelligence at the extreme edge. TinyVers exploits dataflow reconfiguration to enable multi-modal support and aggressive on-chip power management for duty-cycling to enable smart sensing applications. The SoC combines a RISC-V host processor, a 17 TOPS/W dataflow-reconfigurable ML accelerator, a 1.7 μW deep-sleep wake-up controller, and an eMRAM for boot code and ML parameter retention. The SoC can perform up to 17.6 GOPS while achieving a power consumption range from 1.7 μW to 20 mW. Multiple ML workloads aimed at diverse applications are mapped onto the SoC to showcase its flexibility and efficiency. All the models achieve 1-2 TOPS/W of energy efficiency with power consumption below 230 μW in continuous operation. In a duty-cycling use case for machine monitoring, this power is reduced to below 10 μW. (Accepted in the IEEE Journal of Solid-State Circuits.)
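
    The duty-cycling claim is easy to sanity-check with the figures quoted in the abstract. The sketch below assumes, purely for illustration, a ~3% active duty cycle against ≈230 μW continuous-inference power and the 1.7 μW deep-sleep floor; the actual duty cycle of the machine-monitoring use case is not stated here.

        def avg_power_uw(p_active=230.0, p_sleep=1.7, duty=0.03):
            """Average power when the SoC wakes for only a fraction of the time."""
            return duty * p_active + (1 - duty) * p_sleep

        print(avg_power_uw())  # ~8.5 uW, consistent with the sub-10 uW figure

    This is why the deep-sleep wake-up controller and state-retentive eMRAM matter: the sleep-state power, not the inference power, dominates the average at low duty cycles.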

    An Efficient Classification of Hyperspectral Remotely Sensed Data Using Support Vector Machine

    This work presents an efficient hardware architecture of a Support Vector Machine (SVM) for the classification of hyperspectral remotely sensed data using the High-Level Synthesis (HLS) method. The high classification time and power consumption of traditional classification of remotely sensed data are the main motivation for this work; the presented architecture helps classify remotely sensed data in real time so that immediate action can be taken during a natural disaster. An embedded SVM is designed and implemented on a Zynq SoC for classification of hyperspectral images. The remotely sensed data sets are tested on different platforms and the performance is compared with existing works. The novelty of the proposed work lies in extending the HLS-based FPGA implementation to an onboard classification system for remote sensing. The experimental results for a selected data set with different classes show that our Zynq 7000 implementation has a delay of 11.26 µs and a power consumption of 1.7 W, which is substantially better than other Field Programmable Gate Array (FPGA) implementations using a Hardware Description Language (HDL) and Central Processing Unit (CPU) implementations.
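
    As a rough picture of the kernel an HLS flow would map onto the Zynq fabric, the Python sketch below mimics a fixed-point linear-SVM decision over one hyperspectral pixel vector; the Q8.8 format, the linear kernel, and the loop structure are assumptions standing in for the paper's HLS C source, which is not reproduced in the abstract.

        import numpy as np

        def to_q8_8(x):
            """Quantize real values to Q8.8 fixed point (8 fractional bits)."""
            return np.round(np.asarray(x) * 256).astype(np.int64)

        def svm_decide(alphas_q, svs_q, b_q, pixel_q):
            """Linear-kernel SVM decision using only integer MACs and shifts."""
            acc = np.int64(0)
            for a, sv in zip(alphas_q, svs_q):
                acc += a * ((sv * pixel_q).sum() >> 8)  # Q8.8 dot product per SV
            return (acc >> 8) + b_q > 0                 # sign gives the class

    Integer multiply-accumulates and shifts of this kind map directly onto FPGA DSP slices, which is the usual reason an HLS implementation beats a CPU on both latency and power.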