2,732 research outputs found

    An approximate randomization-based neural network with dedicated digital architecture for energy-constrained devices

    Get PDF
    Variable energy constraints affect the implementations of neural networks on battery-operated embedded systems. This paper describes a learning algorithm for randomization-based neural networks with hard-limit activation functions. The approach adopts a novel cost function that balances accuracy and network complexity during training. From an energyspecific perspective, the new learning strategy allows to adjust, dynamically and in real time, the number of operations during the network’s forward phase. The proposed learning scheme leads to efficient predictors supported by digital architectures. The resulting digital architecture can switch to approximate computing at run time, in compliance with the available energy budget. Experiments on 10 real-world prediction testbeds confirmed the effectiveness of the learning scheme. Additional tests on limited-resource devices supported the implementation efficiency of the overall design approac

    Embedded Machine Learning: Emphasis on Hardware Accelerators and Approximate Computing for Tactile Data Processing

    Get PDF
    Machine Learning (ML) a subset of Artificial Intelligence (AI) is driving the industrial and technological revolution of the present and future. We envision a world with smart devices that are able to mimic human behavior (sense, process, and act) and perform tasks that at one time we thought could only be carried out by humans. The vision is to achieve such a level of intelligence with affordable, power-efficient, and fast hardware platforms. However, embedding machine learning algorithms in many application domains such as the internet of things (IoT), prostheses, robotics, and wearable devices is an ongoing challenge. A challenge that is controlled by the computational complexity of ML algorithms, the performance/availability of hardware platforms, and the application\u2019s budget (power constraint, real-time operation, etc.). In this dissertation, we focus on the design and implementation of efficient ML algorithms to handle the aforementioned challenges. First, we apply Approximate Computing Techniques (ACTs) to reduce the computational complexity of ML algorithms. Then, we design custom Hardware Accelerators to improve the performance of the implementation within a specified budget. Finally, a tactile data processing application is adopted for the validation of the proposed exact and approximate embedded machine learning accelerators. The dissertation starts with the introduction of the various ML algorithms used for tactile data processing. These algorithms are assessed in terms of their computational complexity and the available hardware platforms which could be used for implementation. Afterward, a survey on the existing approximate computing techniques and hardware accelerators design methodologies is presented. Based on the findings of the survey, an approach for applying algorithmic-level ACTs on machine learning algorithms is provided. Then three novel hardware accelerators are proposed: (1) k-Nearest Neighbor (kNN) based on a selection-based sorter, (2) Tensorial Support Vector Machine (TSVM) based on Shallow Neural Networks, and (3) Hybrid Precision Binary Convolution Neural Network (BCNN). The three accelerators offer a real-time classification with monumental reductions in the hardware resources and power consumption compared to existing implementations targeting the same tactile data processing application on FPGA. Moreover, the approximate accelerators maintain a high classification accuracy with a loss of at most 5%

    Approximate Computing Survey, Part II: Application-Specific & Architectural Approximation Techniques and Applications

    Full text link
    The challenging deployment of compute-intensive applications from domains such Artificial Intelligence (AI) and Digital Signal Processing (DSP), forces the community of computing systems to explore new design approaches. Approximate Computing appears as an emerging solution, allowing to tune the quality of results in the design of a system in order to improve the energy efficiency and/or performance. This radical paradigm shift has attracted interest from both academia and industry, resulting in significant research on approximation techniques and methodologies at different design layers (from system down to integrated circuits). Motivated by the wide appeal of Approximate Computing over the last 10 years, we conduct a two-part survey to cover key aspects (e.g., terminology and applications) and review the state-of-the art approximation techniques from all layers of the traditional computing stack. In Part II of our survey, we classify and present the technical details of application-specific and architectural approximation techniques, which both target the design of resource-efficient processors/accelerators & systems. Moreover, we present a detailed analysis of the application spectrum of Approximate Computing and discuss open challenges and future directions.Comment: Under Review at ACM Computing Survey

    기계학습 시스템 설계를 위한 방법

    Get PDF
    학위논문 (박사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2016. 2. 최기영.Machine learning has been paid attention because intelligence such as recognition, decision making, and recommendation is a helpful utility in industrial, medical, transportation, entertainment systems, and others that human need to interact with. As machine learning techniques are extensively applied to various areas, the needs for more robust algorithms and more efficient hardware have been increased. In order to develop an efficient machine learning system, we have researched from high-level algorithm down to low-level hardware logicthe main focus of our work is on ensemble machine learning and stochastic computing (SC). The first work is to combine multiple components, i.e., multiple feature extractors (FE) and multiple classifiers in the aspect of pattern recognition. Ensemble of multiple components is one of challenging approaches for constructing a more accurate classifier. It can handle difficult problems where a single classifier easily makes a wrong decision due to lack of training or parameter optimization. Combining the decisions of participating classifiers statistically reduces the risk of wrong decision. We suggest a hierarchical ensemble framework of multiple feature extractors and multiple classifiers (MFMC). The second work is to construct efficient hardware building blocks for machine learning in order to reduce system complexity and generate high area- and energy-efficient logic, where we exploit the property of machine learning systems that does not require accurate computations. We select stochastic computing (SC), which is an alternative paradigm to conventional binary arithmetic computing. SC can boost efficiency in terms of area, power, and error tolerance, while relaxing the accuracy of computation. The third work is to combine both machine learning and stochastic computing, where we select deep learning. This work presents an efficient DNN design with stochastic computing. Observing that directly adopting stochastic computing to DNN has some challenges including random error fluctuation, range limitation, and overhead in accumulation, we address these problems by removing near-zero weights, applying weight-scaling, and integrating the activation function with the accumulator. The approach allows an easy implementation of early decision termination with a fixed hardware design by exploiting the progressive precision characteristics of stochastic computing, which was not easy with existing approaches. Experimental results show that our approach outperforms the conventional binary logic in terms of gate area, latency, and power consumption.1. Introduction 1 1.1 Hierarchical Ensemble Learning Framework 1 1.2 Hardware Building Block for Machine Learning By Using Stochastic Computing 1 1.2.1 Dynamic energy-accuracy trade-off using stochastic computing in deep neural networks 5 2. A Design Framework for Hierarchical Ensemble of Multiple Feature Extractors and Multiple Classifiers 7 2.1 Introduction 7 2.2 Related work 9 2.3 Proposed hierarchical ensemble system 12 2.3.1 Local Mapping Block and Global Mapping Block 12 2.3.2 Complexity comparison according to composition of LMB 15 2.3.3 Motivation for differentiating local and global mappings17 2.3.4 Reinforcement learning for LMB 19 2.3.5 Construction of Bayesian network from GMB 24 2.4 Experimental results 32 2.4.1 Measure of effectiveness for WMV and RL 33 2.4.2 Pedestrian detection dataset 35 2.4.3 Comparison between GMB and AdaBoost 41 2.4.4 UCI Multiple Features dataset 42 2.4.5 LMB selection 44 2.4.6 Discussion 45 2.5 Conclusion 46 3. Synthesis of Efficient Stochastic Logic for Many-Variable Expressions 49 3.1 Introduction 49 3.2 Related Work 52 3.3 SC Logic Synthesis for Multivariate Expressions 54 3.3.1 Probabilistic Logic 55 3.3.2 Definitions 58 3.3.3 Overview of the Proposed Method 60 3.3.4 Direct Synthesis VS. Kernel-based Synthesis 60 3.3.5 SC Kernel 63 3.3.6 Prime SC Kernel 65 3.3.7 iSC Kernel 68 3.3.8 Relationship Between iSC Kernels 70 3.3.9 Hybrid Scheme 75 3.3.10 Cost Function 76 3.3.11 SC Synthesis Algorithm 78 3.4 Experimental Results 82 3.4.1 Performance of SC Logic Synthesis Algorithm 83 3.4.2 Quality of Synthesis Results 84 3.4.3 Comparison of Accuracy 89 3.5 Conclusion 90 4. An Energy-Efficient Random Number Generator for Stochastic Circuits 91 4.1 Introduction 91 4.2 II. Background 92 4.2.1 Preliminaries 92 4.2.2 Shortcomings of Conventional Approaches 93 4.3 III. Proposed Stochastic Number Generator 96 4.3.1 Overview of the Proposed SNG 96 4.3.2 Even-distribution Encoding 96 4.3.3 Inter-group Randomization 98 4.3.4 Proposed Building Block for Bit Shuffling 100 4.3.5 Intra-group Randomization 102 4.4 Experimental Results 103 4.4.1 Accuracy of Generated Stochastic Bit Stream 104 4.4.2 Area, Delay, Power, Energy and SCC Average 104 4.4.3 Energy Efficiency When Operated under Maximal Precision 105 4.5 Conclusion 106 5. Approximate De-randomizer for Stochastic Circuits 107 5.1 Introduction 107 5.2 Proposed Approximate Parallel Counter 108 5.2.1 Analysis for Gate Count in 1-layer Approximate PC 109 5.2.2 Analysis for Error in 1-layer Approximate PC 110 5.3 Experimental Results 111 5.4 Conclusion 112 6. Dynamic Energy-Accuracy Trade-off Using Stochastic Computing in Deep Neural Networks 113 6.1 Introduction 113 6.2 Background 115 6.4 DNN Using Stochastic Circuit 117 6.4.1 Overview of the Proposed DNN using SC 117 6.4.2 Removing Near-Zero Weights 119 6.4.3 Applying Weight Scaling 120 6.4.4 Activation Function with Accumulation 121 6.5 Early Decision Termination 125 6.5.1 Moving Average Tracking Output Trends 126 6.6 Experimental Results 127 6.6.1 Accuracy of DNN Using SC 128 6.6.2 Effectiveness of Early Decision Termination 129 6.6.3 Comparison of Synthesis Results 130 6.7 Conclusion 132 7. Conclusion 134 Bibliography 136 요약(국문초록) 144Docto
    corecore