    Dynamic GPU Energy Optimization for Machine Learning Training Workloads

    GPUs are widely used to accelerate machine learning training. As machine learning models grow larger, they take longer to train, which in turn increases GPU energy consumption. This paper presents GPOEO, an online GPU energy optimization framework for machine learning training workloads. GPOEO dynamically determines the optimal energy configuration by employing a set of novel techniques for online measurement, multi-objective prediction modeling, and search optimization. To characterize the target workload's behavior, GPOEO uses GPU performance counters. To reduce the profiling overhead, it uses an analytical model to detect changes in the training iteration and reprofiles the performance counters only when an iteration shift is detected. GPOEO then uses multi-objective models based on gradient boosting, together with a local search algorithm, to find a trade-off between execution time and energy consumption. We evaluate GPOEO by applying it to 71 machine learning workloads from two AI benchmark suites on an NVIDIA RTX3080Ti GPU. Compared with the NVIDIA default scheduling strategy, GPOEO delivers a mean energy saving of 16.2% with an average execution time increase of 5.1%.
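The abstract does not spell out the search procedure, so the following is only a minimal sketch of how a local search over GPU clock settings might trade execution time against energy. Everything named here is an assumption for illustration: the clock list, the `toy_time` and `toy_energy` functions (stand-ins for the paper's gradient-boosting predictors), the `max_slowdown` constraint, and the choice of the highest clock as the baseline. It is not GPOEO's actual objective or algorithm.

```python
from typing import Callable, Sequence


def local_search(
    clocks: Sequence[int],
    predict_energy: Callable[[int], float],
    predict_time: Callable[[int], float],
    start_index: int,
    max_slowdown: float = 0.05,
) -> int:
    """Hill-climb over an ascending list of clocks and return the chosen clock.

    Assumption: the highest clock is the baseline, and a setting is only
    acceptable if its predicted slowdown stays within max_slowdown.
    """
    baseline_time = predict_time(clocks[-1])

    def cost(index: int) -> float:
        clock = clocks[index]
        slowdown = predict_time(clock) / baseline_time - 1.0
        if slowdown > max_slowdown:
            return float("inf")  # violates the time constraint
        return predict_energy(clock)

    current = start_index
    while True:
        # Only the two adjacent clock settings are considered at each step.
        neighbours = [i for i in (current - 1, current + 1) if 0 <= i < len(clocks)]
        best = min(neighbours, key=cost)
        if cost(best) >= cost(current):
            return clocks[current]  # no neighbour improves: local optimum
        current = best


# Hypothetical clock settings in MHz, lowest to highest.
CLOCKS = [900, 1050, 1200, 1350, 1500, 1650, 1800]


def toy_time(clock: int) -> float:
    """Toy model: half of the runtime scales inversely with the clock."""
    return 0.5 + 0.5 * 1800 / clock


def toy_energy(clock: int) -> float:
    """Toy model: power has a static part plus a cubic dynamic part."""
    power = 100 + 200 * (clock / 1800) ** 3
    return power * toy_time(clock)


if __name__ == "__main__":
    print(local_search(CLOCKS, toy_energy, toy_time, start_index=len(CLOCKS) - 1))
```

With these toy models and the default 5% slowdown limit, the search starts at 1800 MHz and settles on 1650 MHz; relaxing `max_slowdown` to 0.2 lets it descend to 1350 MHz. In a real system the two toy functions would be replaced by learned predictors fed with performance-counter features.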