Performance modelling for scalable deep learning
Performance modelling for scalable deep learning is essential for quantifying the
efficiency of large parallel workloads. Performance models provide run-time estimates
by modelling various aspects of an application on a target system, and designing them
requires comprehensive analysis in order to achieve accuracy. Limitations of current
performance models include poor explainability of the computation time of a neural
network's internal processes and applicability restricted to particular architectures.
Existing performance models for deep learning fall broadly into two methodologies:
analytical modelling and empirical modelling. Analytical modelling takes a transparent
approach, converting the internal mechanisms of the model or application into a
mathematical model that corresponds to the goals of the system. Empirical modelling
predicts outcomes based on observation and experimentation, characterizing algorithm
performance from sample data, and is a good alternative to analytical modelling.
However, both approaches have limitations, such as poor explainability of the
computation time of a neural network's internal processes and poor generalisation. To
address these issues, the analytical and empirical approaches were hybridized, leading
to a novel generic performance model that provides a general expression of a deep
neural network framework in a distributed environment and allows accurate performance
analysis and prediction.
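As a concrete illustration of the analytical style of modelling, the sketch below decomposes the per-iteration time of synchronous data-parallel training into a computation term and a gradient all-reduce term. The decomposition, the alpha-beta communication cost, and every parameter name are assumptions chosen for illustration, not the thesis's actual expression.

```python
# Illustrative analytical model for one synchronous-SGD iteration on
# n_gpus devices.  The decomposition t_iter = t_compute + t_comm and
# all parameter values are illustrative assumptions, not measured data.

def iteration_time(batch_size: float, n_gpus: int,
                   t_flop: float, flops_per_sample: float,
                   alpha: float, beta: float, grad_bytes: float) -> float:
    """Estimate per-iteration time for synchronous data-parallel SGD.

    t_flop:           seconds per floating-point operation on one GPU
    flops_per_sample: forward+backward work for a single sample
    alpha, beta:      latency (s) and inverse bandwidth (s/byte) of the
                      all-reduce, in the classic alpha-beta cost model
    grad_bytes:       size of the gradient exchanged each iteration
    """
    # Computation scales with the local (per-GPU) share of the batch.
    t_compute = (batch_size / n_gpus) * flops_per_sample * t_flop
    # Ring all-reduce cost grows slowly with the number of GPUs.
    t_comm = alpha * (n_gpus - 1) + beta * grad_bytes * 2 * (n_gpus - 1) / n_gpus
    return t_compute + t_comm


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        t = iteration_time(batch_size=256, n_gpus=n, t_flop=1e-12,
                           flops_per_sample=1e9, alpha=5e-6,
                           beta=1e-9, grad_bytes=1e8)
        print(f"{n} GPUs: {t * 1e3:.2f} ms/iteration")
```

A model of this transparent form exposes exactly which intrinsic parameters (model size, per-sample work) and extrinsic scaling factors (GPU count, interconnect bandwidth) drive the computing time.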
The contributions can be summarized as follows:
In the initial study, a comprehensive literature review led to the development of a performance
model based on synchronous stochastic gradient descent (S-SGD) for analysing
the execution time performance of deep learning frameworks in a multi-GPU environment.
This model’s evaluation involved three deep learning models (Convolutional Neural Network (CNN), Autoencoder (AE), and Multilayer Perceptron (MLP)), implemented in three popular deep learning frameworks (MXNet, Chainer, and TensorFlow) respectively, following an analytical approach.
Additionally, a generic expression for the performance model was formulated, considering intrinsic parameters and extrinsic scaling factors that affect computing time in a distributed environment. This formulation posed a global optimization problem whose cost function depends on unknown constants within the generic expression; differential evolution was used to identify the best-fitting values, matching experimentally determined computation times. To enhance the accuracy and stability of the performance model, regularization techniques were applied.
Lastly, the proposed generic performance model was evaluated experimentally in a real-world application. The results provided valuable insights into the influence of hyperparameters on performance, demonstrating the robustness and applicability of the performance model for understanding and optimizing model behavior.
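The fitting step described above can be sketched as follows: a minimal differential evolution loop (DE/rand/1/bin) that searches for the unknown constants of a performance-model expression so that its predictions match measured computation times, with an L2 regularization term for stability. The model form `t(n) = c0 + c1/n + c2*n`, the bounds, and the synthetic "measurements" are assumptions for illustration, not the thesis's actual expression or data.

```python
import random

# Hypothetical generic expression: per-iteration time as a function of
# GPU count n, with unknown constants c = (c0, c1, c2).  Illustrative only.
def model(c, n_gpus):
    c0, c1, c2 = c
    return c0 + c1 / n_gpus + c2 * n_gpus

def cost(c, data, lam=1e-3):
    # Squared error against measurements plus an L2 penalty on the
    # constants, mirroring the regularization used for stability.
    err = sum((model(c, n) - t) ** 2 for n, t in data)
    return err + lam * sum(ci * ci for ci in c)

def differential_evolution(data, bounds, pop_size=30, gens=200,
                           f=0.6, cr=0.9, seed=0):
    """Minimal DE/rand/1/bin global optimizer over box bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(pop_size)]
    for _ in range(gens):
        for i in range(pop_size):
            # Pick three distinct vectors other than the current one.
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            # Binomial crossover of the mutant a + f*(b - c), clipped to bounds.
            trial = [
                pop[i][d] if rng.random() > cr
                else min(max(a[d] + f * (b[d] - c[d]), bounds[d][0]),
                         bounds[d][1])
                for d in range(dim)
            ]
            if cost(trial, data) < cost(pop[i], data):
                pop[i] = trial
    return min(pop, key=lambda p: cost(p, data))

if __name__ == "__main__":
    # Synthetic "measured" times generated from known constants.
    truth = (0.05, 0.40, 0.01)
    data = [(n, model(truth, n)) for n in (1, 2, 4, 8, 16)]
    best = differential_evolution(data, bounds=[(0.0, 1.0)] * 3)
    print("fitted constants:", [round(b, 3) for b in best])
```

In practice a library optimizer (e.g. SciPy's `differential_evolution`) would replace the hand-rolled loop; the point here is only the structure of the fit: model expression, regularized cost, and a population-based global search over the unknown constants.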