    Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis

    Deep Neural Networks (DNNs) are becoming an important tool in modern computing applications. Accelerating their training is a major challenge, and techniques range from distributed algorithms to low-level circuit design. In this survey, we describe the problem from a theoretical perspective, followed by approaches for its parallelization. We present trends in DNN architectures and the resulting implications for parallelization strategies. We then review and model the different types of concurrency in DNNs: from the single operator, through parallelism in network inference and training, to distributed deep learning. We discuss asynchronous stochastic optimization, distributed system architectures, communication schemes, and neural architecture search. Based on those approaches, we extrapolate potential directions for parallelism in deep learning.
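
    One of the concurrency types the abstract lists, parallelism in training, is often realized as synchronous data parallelism: each worker computes a gradient on its own shard of the batch, the gradients are averaged, and every worker applies the same update. The sketch below is a minimal single-process illustration under stated assumptions, not code from the survey. The toy linear model, the function names (`gradient`, `split`, `allreduce_mean`, `data_parallel_step`) and all hyperparameters are hypothetical, and the averaging function only stands in for a real all-reduce.

```python
"""Synchronous data-parallel gradient descent on a toy 1-D linear model (illustrative)."""


def gradient(w, b, shard):
    """Mean-squared-error gradient of y ~ w*x + b over one shard of (x, y) pairs."""
    n = len(shard)
    gw = sum(2.0 * (w * x + b - y) * x for x, y in shard) / n
    gb = sum(2.0 * (w * x + b - y) for x, y in shard) / n
    return gw, gb


def split(data, workers):
    """Split data into equal contiguous shards (assumes len(data) % workers == 0)."""
    size = len(data) // workers
    return [data[i * size:(i + 1) * size] for i in range(workers)]


def allreduce_mean(grads):
    """Stand-in for an all-reduce: average the per-worker gradients component-wise."""
    k = len(grads)
    return tuple(sum(g[i] for g in grads) / k for i in range(len(grads[0])))


def data_parallel_step(w, b, data, workers, lr):
    """One synchronous step: local gradients, averaging, identical update everywhere."""
    local = [gradient(w, b, shard) for shard in split(data, workers)]
    gw, gb = allreduce_mean(local)
    return w - lr * gw, b - lr * gb


# Toy data drawn from y = 3x + 1 (an assumption made for the example).
data = [(float(x), 3.0 * x + 1.0) for x in range(8)]
w, b = 0.0, 0.0
for _ in range(2000):
    w, b = data_parallel_step(w, b, data, workers=4, lr=0.01)
```

    With equally sized shards, the averaged shard gradients equal the full-batch gradient, so this synchronous scheme follows the same trajectory as sequential full-batch training. Asynchronous variants give up that equivalence in exchange for less waiting.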

    Intrinsically Evolvable Artificial Neural Networks

    Dedicated hardware implementations of neural networks promise to provide faster, lower-power operation when compared to software implementations executing on processors. Unfortunately, most custom hardware implementations do not support intrinsic training of these networks on-chip. The training is typically done using offline software simulations, and the obtained network is synthesized and targeted to the hardware offline. The FPGA design presented here facilitates on-chip intrinsic training of artificial neural networks. Block-based neural networks (BbNNs), the type of artificial neural network implemented here, are grid-based networks of neuron blocks. These networks are trained using genetic algorithms to simultaneously optimize the network structure and the internal synaptic parameters. The design supports online structure and parameter updates, and is an intrinsically evolvable BbNN platform supporting functional-level hardware evolution. Functional-level evolvable hardware (EHW) uses evolutionary algorithms to evolve interconnections and internal parameters of functional modules in reconfigurable computing systems such as FPGAs. Functional modules can be any hardware modules such as multipliers, adders, and trigonometric functions. In the implementation presented, the functional module is a neuron block. The designed platform is suitable for applications in dynamic environments, and can be adapted and retrained online. The online training capability is demonstrated using a case study. A performance characterization model for reconfigurable computing (RC) implementations of BbNNs is also presented.
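
    The idea of a genetic algorithm that optimizes structure and synaptic parameters at the same time can be sketched in software as follows. This is a hedged illustration only: it does not model the BbNN grid or the FPGA platform. A tiny 2-2-1 tanh network stands in for the evolved network, a bit mask over its connections stands in for the structure, and the XOR task, the function names and all rates are assumptions chosen for the example.

```python
import math
import random

# Toy task (assumption): the XOR truth table.
XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
N_PARAMS = 9  # 4 input->hidden weights, 2 hidden biases, 2 hidden->output weights, 1 output bias


def forward(genome, x):
    """Evaluate a 2-2-1 tanh network; masked-out connections contribute zero."""
    mask, weights = genome
    p = [m * w for m, w in zip(mask, weights)]
    h0 = math.tanh(p[0] * x[0] + p[1] * x[1] + p[4])
    h1 = math.tanh(p[2] * x[0] + p[3] * x[1] + p[5])
    return p[6] * h0 + p[7] * h1 + p[8]


def error(genome):
    """Sum of squared errors on the XOR table (lower is better)."""
    return sum((forward(genome, x) - y) ** 2 for x, y in XOR)


def random_genome(rng):
    """A genome pairs a structure mask (0/1 per connection) with real-valued weights."""
    mask = [rng.randint(0, 1) for _ in range(N_PARAMS)]
    weights = [rng.uniform(-2.0, 2.0) for _ in range(N_PARAMS)]
    return mask, weights


def mutate(genome, rng, flip_rate=0.1, sigma=0.3):
    """Flip structure bits and perturb weights in the same mutation step."""
    mask, weights = genome
    new_mask = [1 - m if rng.random() < flip_rate else m for m in mask]
    new_weights = [w + rng.gauss(0.0, sigma) for w in weights]
    return new_mask, new_weights


def crossover(a, b, rng):
    """Uniform crossover: each gene (mask bit plus weight) comes from one parent."""
    picks = [rng.random() < 0.5 for _ in range(N_PARAMS)]
    mask = [a[0][i] if p else b[0][i] for i, p in enumerate(picks)]
    weights = [a[1][i] if p else b[1][i] for i, p in enumerate(picks)]
    return mask, weights


def evolve(generations=60, pop_size=30, elite=2, seed=0):
    """Return the best error per generation and the final best genome."""
    rng = random.Random(seed)
    pop = sorted((random_genome(rng) for _ in range(pop_size)), key=error)
    history = [error(pop[0])]
    for _ in range(generations):
        parents = pop[: pop_size // 2]  # truncation selection
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        pop = sorted(pop[:elite] + children, key=error)  # elites survive unchanged
        history.append(error(pop[0]))
    return history, pop[0]


history, best = evolve()
```

    Because the elite individuals are copied unchanged into the next generation, the best error recorded in `history` never increases. Whether the search actually solves the task depends on the seed and the rates chosen.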

    Evolutionary optimization of neural networks with heterogeneous computation: study and implementation

    In the optimization of artificial neural networks (ANNs) via evolutionary algorithms and the implementation of the necessary training for the objective function, there is often a trade-off between efficiency and flexibility. Pure software solutions on general-purpose processors tend to be slow because they do not take advantage of the inherent parallelism, whereas hardware realizations usually rely on optimizations that reduce the range of applicable network topologies, or they attempt to increase processing efficiency by means of low-precision data representation. This paper presents, first, a study that shows the need for a heterogeneous platform (CPU–GPU–FPGA) to accelerate the optimization of ANNs using genetic algorithms and, second, an implementation of a platform based on embedded systems with hardware accelerators implemented in a Field Programmable Gate Array (FPGA). Implementing the individuals on a remote low-cost Altera FPGA yielded a 3x–4x acceleration compared with a 2.83 GHz Intel Xeon Quad-Core and 6x–7x compared with a 2.2 GHz AMD Opteron Quad-Core 2354. The translation of this paper was funded by the Universitat Politecnica de Valencia, Spain.
    Fe, JD.; Aliaga Varea, RJ.; Gadea Gironés, R. (2015). Evolutionary optimization of neural networks with heterogeneous computation: study and implementation. The Journal of Supercomputing. 71(8):2944-2962. doi:10.1007/s11227-015-1419-7

    General Purpose Computing on Graphics Processing Units for Accelerated Deep Learning in Neural Networks

    Graphics processing units (GPUs) contain a significant number of cores relative to central processing units (CPUs), allowing them to handle high levels of parallelization in multithreading. A general-purpose GPU (GPGPU) is a GPU whose threads and memory are repurposed at the software level to leverage the multithreading made possible by the GPU’s hardware, which makes it an extremely strong platform for intensive computing; there is no hardware difference between GPUs and GPGPUs. Deep learning is one such example of intensive computing that is best implemented on a GPGPU, as the GPU’s structure of a grid of blocks, each containing processing threads, can handle the immense number of necessary calculations in parallel. A convolutional neural network (CNN) created for financial data analysis shows this advantage in the runtime of the training and testing of a neural network.
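
    The grid-of-blocks structure mentioned in the abstract can be illustrated with a small sequential emulation. In CUDA-style one-dimensional indexing, a thread's global element index is its block index times the block size plus its thread index within the block. The sketch below reproduces only that index arithmetic in plain Python. It runs on the CPU, one "thread" after another, and the names `launch`, `global_index` and `saxpy_kernel` are hypothetical helpers, not part of any GPU library or of the paper's CNN.

```python
def global_index(block_idx, block_dim, thread_idx):
    """Flat element index handled by one thread (1-D CUDA-style indexing)."""
    return block_idx * block_dim + thread_idx


def launch(kernel, grid_dim, block_dim, *args):
    """Sequentially emulate a 1-D launch: every (block, thread) pair runs the kernel once."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def saxpy_kernel(block_idx, block_dim, thread_idx, a, x, y, out):
    """Each emulated thread computes one element of out = a * x + y."""
    i = global_index(block_idx, block_dim, thread_idx)
    if i < len(x):  # guard: the last block may hold more threads than remaining elements
        out[i] = a * x[i] + y[i]


n = 10
x = [float(i) for i in range(n)]
y = [1.0] * n
out = [0.0] * n
block_dim = 4
grid_dim = (n + block_dim - 1) // block_dim  # ceiling division so all n elements are covered
launch(saxpy_kernel, grid_dim, block_dim, 2.0, x, y, out)
```

    On a real GPU the two loops in `launch` do not exist: the hardware schedules the blocks and their threads concurrently, which is where the speedup for the many independent multiply-accumulate operations in a neural network comes from.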
