657 research outputs found

    FINN: A Framework for Fast, Scalable Binarized Neural Network Inference

    Full text link
    Research has shown that convolutional neural networks contain significant redundancy, and high classification accuracy can be obtained even when weights and activations are reduced from floating point to binary values. In this paper, we present FINN, a framework for building fast and flexible FPGA accelerators using a flexible heterogeneous streaming architecture. By utilizing a novel set of optimizations that enable efficient mapping of binarized neural networks to hardware, we implement fully connected, convolutional and pooling layers, with per-layer compute resources being tailored to user-provided throughput requirements. On a ZC706 embedded FPGA platform drawing less than 25 W total system power, we demonstrate up to 12.3 million image classifications per second with 0.31 {\mu}s latency on the MNIST dataset with 95.8% accuracy, and 21906 image classifications per second with 283 {\mu}s latency on the CIFAR-10 and SVHN datasets with respectively 80.1% and 94.9% accuracy. To the best of our knowledge, ours are the fastest classification rates reported to date on these benchmarks.Comment: To appear in the 25th International Symposium on Field-Programmable Gate Arrays, February 201

    Hardware Implementations of a Deep Learning Approach to Optimal Configuration of Reconfigurable Intelligence Surfaces

    Get PDF
    Reconfigurable intelligent surfaces (RIS) offer the potential to customize the radio propagation environment for wireless networks, and will be a key element for 6G communications. However, due to the unique constraints in these systems, the optimization problems associated to RIS configuration are challenging to solve. This paper illustrates a new approach to the RIS configuration problem, based on the use of artificial intelligence (AI) and deep learning (DL) algorithms. Concretely, a custom convolutional neural network (CNN) intended for edge computing is presented, and implementations on different representative edge devices are compared, including the use of commercial AI-oriented devices and a field-programmable gate array (FPGA) platform. This FPGA option provides the best performance, with x20 performance increase over the closest FP32, GPU-accelerated option, and almost x3 performance advantage when compared with the INT8-quantized, TPU-accelerated implementation. More noticeably, this is achieved even when high-level synthesis (HLS) tools are used and no custom accelerators are developed. At the same time, the inherent reconfigurability of FPGAs opens a new field for their use as enabler hardware in RIS applications.This work is part of the project TED2021-129938B-I00, funded by MCIN/AEI/10.13039/501100011033 and by the European Union NextGenerationEU/PRTR

    Energy-efficient and real-time wearable for wellbeing-monitoring IoT system based on SoC-FPGA

    Get PDF
    Wearable devices used for personal monitoring applications have been improved over the last decades. However, these devices are limited in terms of size, processing capability and power consumption. This paper proposes an efficient hardware/software embedded system for monitoring bio-signals in real time, including a heart rate calculator using PPG and an emotion classifier from EEG. The system is suitable for outpatient clinic applications requiring data transfers to external medical staff. The proposed solution contributes with an effective alternative to the traditional approach of processing bio-signals offline by proposing a SoC-FPGA based system that is able to fully process the signals locally at the node. Two sub-systems were developed targeting a Zynq 7010 device and integrating custom hardware IP cores that accelerate the processing of the most complex tasks. The PPG sub-system implements an autocorrelation peak detection algorithm to calculate heart rate values. The EEG sub-system consists of a KNN emotion classifier of preprocessed EEG features. This work overcomes the processing limitations of microcontrollers and general-purpose units, presenting a scalable and autonomous wearable solution with high processing capability and real-time response.info:eu-repo/semantics/publishedVersio

    Embedded Machine Learning: Emphasis on Hardware Accelerators and Approximate Computing for Tactile Data Processing

    Get PDF
    Machine Learning (ML) a subset of Artificial Intelligence (AI) is driving the industrial and technological revolution of the present and future. We envision a world with smart devices that are able to mimic human behavior (sense, process, and act) and perform tasks that at one time we thought could only be carried out by humans. The vision is to achieve such a level of intelligence with affordable, power-efficient, and fast hardware platforms. However, embedding machine learning algorithms in many application domains such as the internet of things (IoT), prostheses, robotics, and wearable devices is an ongoing challenge. A challenge that is controlled by the computational complexity of ML algorithms, the performance/availability of hardware platforms, and the application\u2019s budget (power constraint, real-time operation, etc.). In this dissertation, we focus on the design and implementation of efficient ML algorithms to handle the aforementioned challenges. First, we apply Approximate Computing Techniques (ACTs) to reduce the computational complexity of ML algorithms. Then, we design custom Hardware Accelerators to improve the performance of the implementation within a specified budget. Finally, a tactile data processing application is adopted for the validation of the proposed exact and approximate embedded machine learning accelerators. The dissertation starts with the introduction of the various ML algorithms used for tactile data processing. These algorithms are assessed in terms of their computational complexity and the available hardware platforms which could be used for implementation. Afterward, a survey on the existing approximate computing techniques and hardware accelerators design methodologies is presented. Based on the findings of the survey, an approach for applying algorithmic-level ACTs on machine learning algorithms is provided. Then three novel hardware accelerators are proposed: (1) k-Nearest Neighbor (kNN) based on a selection-based sorter, (2) Tensorial Support Vector Machine (TSVM) based on Shallow Neural Networks, and (3) Hybrid Precision Binary Convolution Neural Network (BCNN). The three accelerators offer a real-time classification with monumental reductions in the hardware resources and power consumption compared to existing implementations targeting the same tactile data processing application on FPGA. Moreover, the approximate accelerators maintain a high classification accuracy with a loss of at most 5%

    Evolutionary optimization of neural networks with heterogeneous computation: study and implementation

    Full text link
    In the optimization of artificial neural networks (ANNs) via evolutionary algorithms and the implementation of the necessary training for the objective function, there is often a trade-off between efficiency and flexibility. Pure software solutions on general-purpose processors tend to be slow because they do not take advantage of the inherent parallelism, whereas hardware realizations usually rely on optimizations that reduce the range of applicable network topologies, or they attempt to increase processing efficiency by means of low-precision data representation. This paper presents, first of all, a study that shows the need of heterogeneous platform (CPU–GPU–FPGA) to accelerate the optimization of ANNs using genetic algorithms and, secondly, an implementation of a platform based on embedded systems with hardware accelerators implemented in Field Pro-grammable Gate Array (FPGA). The implementation of the individuals on a remote low-cost Altera FPGA allowed us to obtain a 3x–4x acceleration compared with a 2.83 GHz Intel Xeon Quad-Core and 6x–7x compared with a 2.2 GHz AMD Opteron Quad-Core 2354.The translation of this paper was funded by the Universitat Politecnica de Valencia, Spain.Fe, JD.; Aliaga Varea, RJ.; Gadea Gironés, R. (2015). Evolutionary optimization of neural networks with heterogeneous computation: study and implementation. The Journal of Supercomputing. 71(8):2944-2962. doi:10.1007/s11227-015-1419-7S29442962718Farmahini-Farahani A, Vakili S, Fakhraie SM, Safari S, Lucas C (2010) Parallel scalable hardware implementation of asynchronous discrete particle swarm optimization. Eng Appl Artif Intell 23(2):177–187Curteanu S, Cartwright H (2011) Neural networks applied in chemistry. i. Determination of the optimal topology of multilayer perceptron neural networks. J Chemom 25(10):527–549. doi: 10.1002/cem.1401Islam MM, Sattar MA, Amin MF, Yao X, Murase K (2009) A new adaptive merging and growing algorithm for designing artificial neural networks. Ieee Trans Syst Man Cybern Part B-Cybern 39(3):705–722Han KH, Kim JH (2004) Quantum-inspired evolutionary algorithms with a new termination criterion, h-epsilon gate, and two-phase scheme. Ieee Trans Evol Comput 8(2):156–169Leung FHF, Lam HK, Ling SH, Tam PKS (2003) Tuning of the structure and parameters of a neural network using an improved genetic algorithm. Ieee Trans Neural Netw 14(1):79–88Tsai JT, Chou JH, Liu TK (2006) Tuning the structure and parameters of a neural network by using hybrid taguchi-genetic algorithm. Ieee Trans Neural Netw 17(1):69–80Ludermir TB, Yamazaki A, Zanchettin C (2006) An optimization methodology for neural network weights and architectures. Ieee Trans Neural Netw 17(6):1452–1459Palmes PP, Hayasaka T, Usui S (2005) Mutation-based genetic neural network. Trans Neural Netw 16(3):587–600. doi: 10.1109/TNN.2005.844858Mu T, Jiang J, Wang Y, Goulermas JY (2012) Adaptive data embedding framework for multiclass classification. Ieee Trans Neural Netw Learn Syst 23(8):1291–1303Lu T-C, Yu G-R, Juang J-C (2013) Quantum-based algorithm for optimizing artificial neural networks. IEEE Trans Neural Netw Lear Syst 24(8):1266–1278Yao X (1999) Evolving artificial neural networks. Proc Ieee 87(9):1423–1447Yao X, Liu Y (1997) A new evolutionary system for evolving artificial neural networks. Ieee Trans Neural Netw 8(3):694–713Mateo F, Sovilj D, Gadea-Gironés R (2010) Approximate k-NN delta test minimization method using genetic algorithms: application to time series. NEUROCOMPUTING 73(10–12, Sp):2017–2029Hawkins S, He H, Williams G, Baxter R (2002) Outlier detection using replicator neural networks. In: Proceedings of the 5th international conference and data warehousing and knowledge discovery. DaWaK02, pp 170–180Fe J, Aliaga RJ, Gironés RG (2013) Experimental platform for accelerate the training of anns with genetic algorithm and embedded system on fpga. In: IWINAC (2), pp 413–420Prechelt L (1994) Proben1—a set of neural network benchmark problems and benchmarking rules. Technical reportAbbass HA (2002) An evolutionary artificial neural networks approach for breast cancer diagnosis. Artif Intell Med 25:265–281Ahmad F, Isa NAM, Hussain Z, Sulaiman SN (2013) A genetic algorithm-based multi-objective optimization of an artificial neural network classifier for breast cancer diagnosis. Neural Comput Appl 23(5):1427–1435Sankaradas M, Jakkula V, Cadambi S, Chakradhar S, Durdanovic I, Cosatto E, Graf H (2009) A massively parallel coprocessor for convolutional neural networks. In: Application-specific systems, architectures and processors, 2009. ASAP 2009. 20th IEEE international conference on, July, pp 53–60Prado R, Melo J, Oliveira J, Neto A (2012) Fpga based implementation of a fuzzy neural network modular architecture for embedded systems. In: Neural networks (IJCNN), The 2012 international joint conference on, June, pp 1–7Çavuşlu M, Karakuzu C, Sahin S, Yakut M (2011) Neural network training based on fpga with floating point number format and its performance. Neural Comput Appl 20:195–202. doi: 10.1007/s00521-010-0423-3Wu G-D, Zhu Z-W, Lin B-W (2011) Reconfigurable back propagation based neural network architecture. In: Integrated circuits (ISIC), 2011 13th international symposium on, Dec, pp 67–70Pinjare SL, Kumar A (2012) Implementation of neural network back propagation training algorithm on fpga. Int J Comput Appl 52(6): 1–7, August, published by Foundation of Computer Science, New York, USAhttp://www.altera.comAliaga R, Gadea R, Colom R, Cerda J, Ferrando N, Herrero V (2009) A mixed hardware–software approach to flexible artificial neural network training on fpga. In: Systems, architectures, modeling, and simulation, 2009. SAMOS ’09. International symposium on, July, pp 1–8http://www.matlab.co

    Recent Advances in Embedded Computing, Intelligence and Applications

    Get PDF
    The latest proliferation of Internet of Things deployments and edge computing combined with artificial intelligence has led to new exciting application scenarios, where embedded digital devices are essential enablers. Moreover, new powerful and efficient devices are appearing to cope with workloads formerly reserved for the cloud, such as deep learning. These devices allow processing close to where data are generated, avoiding bottlenecks due to communication limitations. The efficient integration of hardware, software and artificial intelligence capabilities deployed in real sensing contexts empowers the edge intelligence paradigm, which will ultimately contribute to the fostering of the offloading processing functionalities to the edge. In this Special Issue, researchers have contributed nine peer-reviewed papers covering a wide range of topics in the area of edge intelligence. Among them are hardware-accelerated implementations of deep neural networks, IoT platforms for extreme edge computing, neuro-evolvable and neuromorphic machine learning, and embedded recommender systems
    • …
    corecore