Search CORE

881 research outputs found

Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis

Author: Ben-Nun Tal
Hoefler Torsten
Publication venue
Publication date: 15/09/2018
Field of study

Deep Neural Networks (DNNs) are becoming an important tool in modern computing applications. Accelerating their training is a major challenge and techniques range from distributed algorithms to low-level circuit design. In this survey, we describe the problem from a theoretical perspective, followed by approaches for its parallelization. We present trends in DNN architectures and the resulting implications on parallelization strategies. We then review and model the different types of concurrency in DNNs: from the single operator, through parallelism in network inference and training, to distributed deep learning. We discuss asynchronous stochastic optimization, distributed system architectures, communication schemes, and neural architecture search. Based on those approaches, we extrapolate potential directions for parallelism in deep learning

arXiv.org e-Print Archive

Repository for Publications and Research Data

Intrinsically Evolvable Artificial Neural Networks

Author: Merchant Saumil Girish
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/08/2007
Field of study

Dedicated hardware implementations of neural networks promise to provide faster, lower power operation when compared to software implementations executing on processors. Unfortunately, most custom hardware implementations do not support intrinsic training of these networks on-chip. The training is typically done using offline software simulations and the obtained network is synthesized and targeted to the hardware offline. The FPGA design presented here facilitates on-chip intrinsic training of artificial neural networks. Block-based neural networks (BbNN), the type of artificial neural networks implemented here, are grid-based networks neuron blocks. These networks are trained using genetic algorithms to simultaneously optimize the network structure and the internal synaptic parameters. The design supports online structure and parameter updates, and is an intrinsically evolvable BbNN platform supporting functional-level hardware evolution. Functional-level evolvable hardware (EHW) uses evolutionary algorithms to evolve interconnections and internal parameters of functional modules in reconfigurable computing systems such as FPGAs. Functional modules can be any hardware modules such as multipliers, adders, and trigonometric functions. In the implementation presented, the functional module is a neuron block. The designed platform is suitable for applications in dynamic environments, and can be adapted and retrained online. The online training capability has been demonstrated using a case study. A performance characterization model for RC implementations of BbNNs has also been presented

University of Tennessee, Knoxville: Trace

Evolutionary optimization of neural networks with heterogeneous computation: study and implementation

Author: Aliaga Varea Ramón José
Fe Jorge Deolindo
Gadea Gironés Rafael
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 11/04/2015
Field of study

In the optimization of artificial neural networks (ANNs) via evolutionary algorithms and the implementation of the necessary training for the objective function, there is often a trade-off between efficiency and flexibility. Pure software solutions on general-purpose processors tend to be slow because they do not take advantage of the inherent parallelism, whereas hardware realizations usually rely on optimizations that reduce the range of applicable network topologies, or they attempt to increase processing efficiency by means of low-precision data representation. This paper presents, first of all, a study that shows the need of heterogeneous platform (CPU–GPU–FPGA) to accelerate the optimization of ANNs using genetic algorithms and, secondly, an implementation of a platform based on embedded systems with hardware accelerators implemented in Field Pro-grammable Gate Array (FPGA). The implementation of the individuals on a remote low-cost Altera FPGA allowed us to obtain a 3x–4x acceleration compared with a 2.83 GHz Intel Xeon Quad-Core and 6x–7x compared with a 2.2 GHz AMD Opteron Quad-Core 2354.The translation of this paper was funded by the Universitat Politecnica de Valencia, Spain.Fe, JD.; Aliaga Varea, RJ.; Gadea Gironés, R. (2015). Evolutionary optimization of neural networks with heterogeneous computation: study and implementation. The Journal of Supercomputing. 71(8):2944-2962. doi:10.1007/s11227-015-1419-7S29442962718Farmahini-Farahani A, Vakili S, Fakhraie SM, Safari S, Lucas C (2010) Parallel scalable hardware implementation of asynchronous discrete particle swarm optimization. Eng Appl Artif Intell 23(2):177–187Curteanu S, Cartwright H (2011) Neural networks applied in chemistry. i. Determination of the optimal topology of multilayer perceptron neural networks. J Chemom 25(10):527–549. doi: 10.1002/cem.1401Islam MM, Sattar MA, Amin MF, Yao X, Murase K (2009) A new adaptive merging and growing algorithm for designing artificial neural networks. Ieee Trans Syst Man Cybern Part B-Cybern 39(3):705–722Han KH, Kim JH (2004) Quantum-inspired evolutionary algorithms with a new termination criterion, h-epsilon gate, and two-phase scheme. Ieee Trans Evol Comput 8(2):156–169Leung FHF, Lam HK, Ling SH, Tam PKS (2003) Tuning of the structure and parameters of a neural network using an improved genetic algorithm. Ieee Trans Neural Netw 14(1):79–88Tsai JT, Chou JH, Liu TK (2006) Tuning the structure and parameters of a neural network by using hybrid taguchi-genetic algorithm. Ieee Trans Neural Netw 17(1):69–80Ludermir TB, Yamazaki A, Zanchettin C (2006) An optimization methodology for neural network weights and architectures. Ieee Trans Neural Netw 17(6):1452–1459Palmes PP, Hayasaka T, Usui S (2005) Mutation-based genetic neural network. Trans Neural Netw 16(3):587–600. doi: 10.1109/TNN.2005.844858Mu T, Jiang J, Wang Y, Goulermas JY (2012) Adaptive data embedding framework for multiclass classification. Ieee Trans Neural Netw Learn Syst 23(8):1291–1303Lu T-C, Yu G-R, Juang J-C (2013) Quantum-based algorithm for optimizing artificial neural networks. IEEE Trans Neural Netw Lear Syst 24(8):1266–1278Yao X (1999) Evolving artificial neural networks. Proc Ieee 87(9):1423–1447Yao X, Liu Y (1997) A new evolutionary system for evolving artificial neural networks. Ieee Trans Neural Netw 8(3):694–713Mateo F, Sovilj D, Gadea-Gironés R (2010) Approximate k-NN delta test minimization method using genetic algorithms: application to time series. NEUROCOMPUTING 73(10–12, Sp):2017–2029Hawkins S, He H, Williams G, Baxter R (2002) Outlier detection using replicator neural networks. In: Proceedings of the 5th international conference and data warehousing and knowledge discovery. DaWaK02, pp 170–180Fe J, Aliaga RJ, Gironés RG (2013) Experimental platform for accelerate the training of anns with genetic algorithm and embedded system on fpga. In: IWINAC (2), pp 413–420Prechelt L (1994) Proben1—a set of neural network benchmark problems and benchmarking rules. Technical reportAbbass HA (2002) An evolutionary artificial neural networks approach for breast cancer diagnosis. Artif Intell Med 25:265–281Ahmad F, Isa NAM, Hussain Z, Sulaiman SN (2013) A genetic algorithm-based multi-objective optimization of an artificial neural network classifier for breast cancer diagnosis. Neural Comput Appl 23(5):1427–1435Sankaradas M, Jakkula V, Cadambi S, Chakradhar S, Durdanovic I, Cosatto E, Graf H (2009) A massively parallel coprocessor for convolutional neural networks. In: Application-specific systems, architectures and processors, 2009. ASAP 2009. 20th IEEE international conference on, July, pp 53–60Prado R, Melo J, Oliveira J, Neto A (2012) Fpga based implementation of a fuzzy neural network modular architecture for embedded systems. In: Neural networks (IJCNN), The 2012 international joint conference on, June, pp 1–7Çavuşlu M, Karakuzu C, Sahin S, Yakut M (2011) Neural network training based on fpga with floating point number format and its performance. Neural Comput Appl 20:195–202. doi: 10.1007/s00521-010-0423-3Wu G-D, Zhu Z-W, Lin B-W (2011) Reconfigurable back propagation based neural network architecture. In: Integrated circuits (ISIC), 2011 13th international symposium on, Dec, pp 67–70Pinjare SL, Kumar A (2012) Implementation of neural network back propagation training algorithm on fpga. Int J Comput Appl 52(6): 1–7, August, published by Foundation of Computer Science, New York, USAhttp://www.altera.comAliaga R, Gadea R, Colom R, Cerda J, Ferrando N, Herrero V (2009) A mixed hardware–software approach to flexible artificial neural network training on fpga. In: Systems, architectures, modeling, and simulation, 2009. SAMOS ’09. International symposium on, July, pp 1–8http://www.matlab.co

Crossref

RiuNet

Recommended from our members

Computing resources sensitive parallelization of neural neworks for large scale diabetes data modelling, diagnosis and prediction

Author: Qi Hao
Publication venue: Brunel University School of Engineering and Design PhD Theses
Publication date: 01/01/2011
Field of study

This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.Diabetes has become one of the most severe deceases due to an increasing number of diabetes patients globally. A large amount of digital data on diabetes has been collected through various channels. How to utilize these data sets to help doctors to make a decision on diagnosis, treatment and prediction of diabetic patients poses many challenges to the research community. The thesis investigates mathematical models with a focus on neural networks for large scale diabetes data modelling and analysis by utilizing modern computing technologies such as grid computing and cloud computing. These computing technologies provide users with an inexpensive way to have access to extensive computing resources over the Internet for solving data and computationally intensive problems. This thesis evaluates the performance of seven representative machine learning techniques in classification of diabetes data and the results show that neural network produces the best accuracy in classification but incurs high overhead in data training. As a result, the thesis develops MRNN, a parallel neural network model based on the MapReduce programming model which has become an enabling technology in support of data intensive applications in the clouds. By partitioning the diabetic data set into a number of equally sized data blocks, the workload in training is distributed among a number of computing nodes for speedup in data training. MRNN is first evaluated in small scale experimental environments using 12 mappers and subsequently is evaluated in large scale simulated environments using up to 1000 mappers. Both the experimental and simulations results have shown the effectiveness of MRNN in classification, and its high scalability in data training. MapReduce does not have a sophisticated job scheduling scheme for heterogonous computing environments in which the computing nodes may have varied computing capabilities. For this purpose, this thesis develops a load balancing scheme based on genetic algorithms with an aim to balance the training workload among heterogeneous computing nodes. The nodes with more computing capacities will receive more MapReduce jobs for execution. Divisible load theory is employed to guide the evolutionary process of the genetic algorithm with an aim to achieve fast convergence. The proposed load balancing scheme is evaluated in large scale simulated MapReduce environments with varied levels of heterogeneity using different sizes of data sets. All the results show that the genetic algorithm based load balancing scheme significantly reduce the makespan in job execution in comparison with the time consumed without load balancing.This work is funded by the EPSRC and China Market Association

Brunel University Research Archive

General Purpose Computing on Graphics Processing Units for Accelerated Deep Learning in Neural Networks

Author: Helmick Conor
Publication venue: Scholars Crossing
Publication date: 01/04/2022
Field of study

Graphics processing units (GPUs) contain a significant number of cores relative to central processing units (CPUs), allowing them to handle high levels of parallelization in multithreading. A general-purpose GPU (GPGPU) is a GPU that has its threads and memory repurposed on a software level to leverage the multithreading made possible by the GPU’s hardware, and thus is an extremely strong platform for intense computing – there is no hardware difference between GPUs and GPGPUs. Deep learning is one such example of intense computing that is best implemented on a GPGPU, as its hardware structure of a grid of blocks, each containing processing threads, can handle the immense number of necessary calculations in parallel. A convolutional neural network (CNN) created for financial data analysis shows this advantage in the runtime of the training and testing of a neural network

Liberty University Digital Commons