169 research outputs found

    Evolutionary optimization of neural networks with heterogeneous computation: study and implementation

    Full text link
    In the optimization of artificial neural networks (ANNs) via evolutionary algorithms and the implementation of the necessary training for the objective function, there is often a trade-off between efficiency and flexibility. Pure software solutions on general-purpose processors tend to be slow because they do not take advantage of the inherent parallelism, whereas hardware realizations usually rely on optimizations that reduce the range of applicable network topologies, or they attempt to increase processing efficiency by means of low-precision data representation. This paper presents, first of all, a study that shows the need of heterogeneous platform (CPU–GPU–FPGA) to accelerate the optimization of ANNs using genetic algorithms and, secondly, an implementation of a platform based on embedded systems with hardware accelerators implemented in Field Pro-grammable Gate Array (FPGA). The implementation of the individuals on a remote low-cost Altera FPGA allowed us to obtain a 3x–4x acceleration compared with a 2.83 GHz Intel Xeon Quad-Core and 6x–7x compared with a 2.2 GHz AMD Opteron Quad-Core 2354.The translation of this paper was funded by the Universitat Politecnica de Valencia, Spain.Fe, JD.; Aliaga Varea, RJ.; Gadea GironĂ©s, R. (2015). Evolutionary optimization of neural networks with heterogeneous computation: study and implementation. The Journal of Supercomputing. 71(8):2944-2962. doi:10.1007/s11227-015-1419-7S29442962718Farmahini-Farahani A, Vakili S, Fakhraie SM, Safari S, Lucas C (2010) Parallel scalable hardware implementation of asynchronous discrete particle swarm optimization. Eng Appl Artif Intell 23(2):177–187Curteanu S, Cartwright H (2011) Neural networks applied in chemistry. i. Determination of the optimal topology of multilayer perceptron neural networks. J Chemom 25(10):527–549. doi: 10.1002/cem.1401Islam MM, Sattar MA, Amin MF, Yao X, Murase K (2009) A new adaptive merging and growing algorithm for designing artificial neural networks. Ieee Trans Syst Man Cybern Part B-Cybern 39(3):705–722Han KH, Kim JH (2004) Quantum-inspired evolutionary algorithms with a new termination criterion, h-epsilon gate, and two-phase scheme. Ieee Trans Evol Comput 8(2):156–169Leung FHF, Lam HK, Ling SH, Tam PKS (2003) Tuning of the structure and parameters of a neural network using an improved genetic algorithm. Ieee Trans Neural Netw 14(1):79–88Tsai JT, Chou JH, Liu TK (2006) Tuning the structure and parameters of a neural network by using hybrid taguchi-genetic algorithm. Ieee Trans Neural Netw 17(1):69–80Ludermir TB, Yamazaki A, Zanchettin C (2006) An optimization methodology for neural network weights and architectures. Ieee Trans Neural Netw 17(6):1452–1459Palmes PP, Hayasaka T, Usui S (2005) Mutation-based genetic neural network. Trans Neural Netw 16(3):587–600. doi: 10.1109/TNN.2005.844858Mu T, Jiang J, Wang Y, Goulermas JY (2012) Adaptive data embedding framework for multiclass classification. Ieee Trans Neural Netw Learn Syst 23(8):1291–1303Lu T-C, Yu G-R, Juang J-C (2013) Quantum-based algorithm for optimizing artificial neural networks. IEEE Trans Neural Netw Lear Syst 24(8):1266–1278Yao X (1999) Evolving artificial neural networks. Proc Ieee 87(9):1423–1447Yao X, Liu Y (1997) A new evolutionary system for evolving artificial neural networks. Ieee Trans Neural Netw 8(3):694–713Mateo F, Sovilj D, Gadea-GironĂ©s R (2010) Approximate k-NN delta test minimization method using genetic algorithms: application to time series. NEUROCOMPUTING 73(10–12, Sp):2017–2029Hawkins S, He H, Williams G, Baxter R (2002) Outlier detection using replicator neural networks. In: Proceedings of the 5th international conference and data warehousing and knowledge discovery. DaWaK02, pp 170–180Fe J, Aliaga RJ, GironĂ©s RG (2013) Experimental platform for accelerate the training of anns with genetic algorithm and embedded system on fpga. In: IWINAC (2), pp 413–420Prechelt L (1994) Proben1—a set of neural network benchmark problems and benchmarking rules. Technical reportAbbass HA (2002) An evolutionary artificial neural networks approach for breast cancer diagnosis. Artif Intell Med 25:265–281Ahmad F, Isa NAM, Hussain Z, Sulaiman SN (2013) A genetic algorithm-based multi-objective optimization of an artificial neural network classifier for breast cancer diagnosis. Neural Comput Appl 23(5):1427–1435Sankaradas M, Jakkula V, Cadambi S, Chakradhar S, Durdanovic I, Cosatto E, Graf H (2009) A massively parallel coprocessor for convolutional neural networks. In: Application-specific systems, architectures and processors, 2009. ASAP 2009. 20th IEEE international conference on, July, pp 53–60Prado R, Melo J, Oliveira J, Neto A (2012) Fpga based implementation of a fuzzy neural network modular architecture for embedded systems. In: Neural networks (IJCNN), The 2012 international joint conference on, June, pp 1–7ÇavuƟlu M, Karakuzu C, Sahin S, Yakut M (2011) Neural network training based on fpga with floating point number format and its performance. Neural Comput Appl 20:195–202. doi: 10.1007/s00521-010-0423-3Wu G-D, Zhu Z-W, Lin B-W (2011) Reconfigurable back propagation based neural network architecture. In: Integrated circuits (ISIC), 2011 13th international symposium on, Dec, pp 67–70Pinjare SL, Kumar A (2012) Implementation of neural network back propagation training algorithm on fpga. Int J Comput Appl 52(6): 1–7, August, published by Foundation of Computer Science, New York, USAhttp://www.altera.comAliaga R, Gadea R, Colom R, Cerda J, Ferrando N, Herrero V (2009) A mixed hardware–software approach to flexible artificial neural network training on fpga. In: Systems, architectures, modeling, and simulation, 2009. SAMOS ’09. International symposium on, July, pp 1–8http://www.matlab.co

    A system to predict the S&P 500 using a bio-inspired algorithm

    Get PDF
    The goal of this research was to develop an algorithmic system capable of predicting the directional trend of the S&P 500 financial index. The approach I have taken was inspired by the biology of the human retina. Extensive research has been published attempting to predict different financial markets using historical data, testing on an in-sample and trend basis with many employing sophisticated mathematical techniques. In reviewing and evaluating these in-sample methodologies, it became evident that this approach was unable to achieve sufficiently reliable prediction performance for commercial exploitation. For these reasons, I moved to an out-of-sample strategy and am able to predict tomorrow’s (t+1) directional trend of the S&P 500 at 55.1%. The key elements that underpin my bio-inspired out-of-sample system are: Identification of 51 financial market data (FMD) inputs, including other indices, currency pairs, swap rates, that affect the 500 component companies of the S&P 500. The use of an extensive historical data set, comprising the actual daily closing prices of the chosen 51 FMD inputs and S&P 500. The ability to compute this large data set in a time frame of less than 24 hours. The data set was fed into a linear regression algorithm to determine the predicted value of tomorrow’s (t+1) S&P 500 closing price. This process was initially carried out in MatLab which proved the concept of my approach, but (3) above was not met. In order to successfully meet the requirement of handling such a large data set to complete the prediction target on time, I decided to adopt a novel graphics processing unit (GPU) based computational architecture. Through extensive optimisation of my GPU engine, I was able to achieve a sufficient speed up of 150x to meet (3). In achieving my optimum directional trend of 55.1%, an extensive range of tests exploring a number of trade offs were carried out using an 8 year data set. The results I have obtained will form the basis of a commercial investment fund. It should be noted that my algorithm uses financial data of the past 60-days, and as such would not be able to predict rapid market changes such as a stock market crash

    Review : Deep learning in electron microscopy

    Get PDF
    Deep learning is transforming most areas of science and technology, including electron microscopy. This review paper offers a practical perspective aimed at developers with limited familiarity. For context, we review popular applications of deep learning in electron microscopy. Following, we discuss hardware and software needed to get started with deep learning and interface with electron microscopes. We then review neural network components, popular architectures, and their optimization. Finally, we discuss future directions of deep learning in electron microscopy

    FPGA Acceleration of Domain-specific Kernels via High-Level Synthesis

    Get PDF
    L'abstract Ăš presente nell'allegato / the abstract is in the attachmen

    Optimization of Deep Neural Networks Using SoCs with OpenCL

    Full text link
    [EN] In the optimization of deep neural networks (DNNs) via evolutionary algorithms (EAs) and the implementation of the training necessary for the creation of the objective function, there is often a trade-off between efficiency and flexibility. Pure software solutions implemented on general-purpose processors tend to be slow because they do not take advantage of the inherent parallelism of these devices, whereas hardware realizations based on heterogeneous platforms (combining central processing units (CPUs), graphics processing units (GPUs) and/or field-programmable gate arrays (FPGAs)) are designed based on different solutions using methodologies supported by different languages and using very different implementation criteria. This paper first presents a study that demonstrates the need for a heterogeneous (CPU-GPU-FPGA) platform to accelerate the optimization of artificial neural networks (ANNs) using genetic algorithms. Second, the paper presents implementations of the calculations related to the individuals evaluated in such an algorithm on different (CPU- and FPGA-based) platforms, but with the same source files written in OpenCL. The implementation of individuals on remote, low-cost FPGA systems on a chip (SoCs) is found to enable the achievement of good efficiency in terms of performance per watt.This research was funded by Spanish Agency of Research grant number FPA2016-78595-C3-3-R.Gadea Gironés, R.; Colom Palero, RJ.; Herrero Bosch, V. (2018). Optimization of Deep Neural Networks Using SoCs with OpenCL. Sensors. 18(5). https://doi.org/10.3390/s18051384S18

    Energy efficient enabling technologies for semantic video processing on mobile devices

    Get PDF
    Semantic object-based processing will play an increasingly important role in future multimedia systems due to the ubiquity of digital multimedia capture/playback technologies and increasing storage capacity. Although the object based paradigm has many undeniable benefits, numerous technical challenges remain before the applications becomes pervasive, particularly on computational constrained mobile devices. A fundamental issue is the ill-posed problem of semantic object segmentation. Furthermore, on battery powered mobile computing devices, the additional algorithmic complexity of semantic object based processing compared to conventional video processing is highly undesirable both from a real-time operation and battery life perspective. This thesis attempts to tackle these issues by firstly constraining the solution space and focusing on the human face as a primary semantic concept of use to users of mobile devices. A novel face detection algorithm is proposed, which from the outset was designed to be amenable to be offloaded from the host microprocessor to dedicated hardware, thereby providing real-time performance and reducing power consumption. The algorithm uses an Artificial Neural Network (ANN), whose topology and weights are evolved via a genetic algorithm (GA). The computational burden of the ANN evaluation is offloaded to a dedicated hardware accelerator, which is capable of processing any evolved network topology. Efficient arithmetic circuitry, which leverages modified Booth recoding, column compressors and carry save adders, is adopted throughout the design. To tackle the increased computational costs associated with object tracking or object based shape encoding, a novel energy efficient binary motion estimation architecture is proposed. Energy is reduced in the proposed motion estimation architecture by minimising the redundant operations inherent in the binary data. Both architectures are shown to compare favourable with the relevant prior art

    Autonomously Reconfigurable Artificial Neural Network on a Chip

    Get PDF
    Artificial neural network (ANN), an established bio-inspired computing paradigm, has proved very effective in a variety of real-world problems and particularly useful for various emerging biomedical applications using specialized ANN hardware. Unfortunately, these ANN-based systems are increasingly vulnerable to both transient and permanent faults due to unrelenting advances in CMOS technology scaling, which sometimes can be catastrophic. The considerable resource and energy consumption and the lack of dynamic adaptability make conventional fault-tolerant techniques unsuitable for future portable medical solutions. Inspired by the self-healing and self-recovery mechanisms of human nervous system, this research seeks to address reliability issues of ANN-based hardware by proposing an Autonomously Reconfigurable Artificial Neural Network (ARANN) architectural framework. Leveraging the homogeneous structural characteristics of neural networks, ARANN is capable of adapting its structures and operations, both algorithmically and microarchitecturally, to react to unexpected neuron failures. Specifically, we propose three key techniques --- Distributed ANN, Decoupled Virtual-to-Physical Neuron Mapping, and Dual-Layer Synchronization --- to achieve cost-effective structural adaptation and ensure accurate system recovery. Moreover, an ARANN-enabled self-optimizing workflow is presented to adaptively explore a "Pareto-optimal" neural network structure for a given application, on the fly. Implemented and demonstrated on a Virtex-5 FPGA, ARANN can cover and adapt 93% chip area (neurons) with less than 1% chip overhead and O(n) reconfiguration latency. A detailed performance analysis has been completed based on various recovery scenarios

    Recent Advances in Embedded Computing, Intelligence and Applications

    Get PDF
    The latest proliferation of Internet of Things deployments and edge computing combined with artificial intelligence has led to new exciting application scenarios, where embedded digital devices are essential enablers. Moreover, new powerful and efficient devices are appearing to cope with workloads formerly reserved for the cloud, such as deep learning. These devices allow processing close to where data are generated, avoiding bottlenecks due to communication limitations. The efficient integration of hardware, software and artificial intelligence capabilities deployed in real sensing contexts empowers the edge intelligence paradigm, which will ultimately contribute to the fostering of the offloading processing functionalities to the edge. In this Special Issue, researchers have contributed nine peer-reviewed papers covering a wide range of topics in the area of edge intelligence. Among them are hardware-accelerated implementations of deep neural networks, IoT platforms for extreme edge computing, neuro-evolvable and neuromorphic machine learning, and embedded recommender systems
    • 

    corecore