2 research outputs found

    AutoScale: Optimizing Energy Efficiency of End-to-End Edge Inference under Stochastic Variance

    Full text link
    Deep learning inference is increasingly run at the edge. As the programming and system stack support becomes mature, it enables acceleration opportunities within a mobile system, where the system performance envelope is scaled up with a plethora of programmable co-processors. Thus, intelligent services designed for mobile users can choose between running inference on the CPU or any of the co-processors on the mobile system, or exploiting connected systems, such as the cloud or a nearby, locally connected system. By doing so, the services can scale out the performance and increase the energy efficiency of edge mobile systems. This gives rise to a new challenge - deciding when inference should run where. Such execution scaling decision becomes more complicated with the stochastic nature of mobile-cloud execution, where signal strength variations of the wireless networks and resource interference can significantly affect real-time inference performance and system energy efficiency. To enable accurate, energy-efficient deep learning inference at the edge, this paper proposes AutoScale. AutoScale is an adaptive and light-weight execution scaling engine built upon the custom-designed reinforcement learning algorithm. It continuously learns and selects the most energy-efficient inference execution target by taking into account characteristics of neural networks and available systems in the collaborative cloud-edge execution environment while adapting to the stochastic runtime variance. Real system implementation and evaluation, considering realistic execution scenarios, demonstrate an average of 9.8 and 1.6 times energy efficiency improvement for DNN edge inference over the baseline mobile CPU and cloud offloading, while meeting the real-time performance and accuracy requirement

    Modern Multicore CPUs are not Energy Proportional: Opportunity for Bi-objective Optimization for Performance and Energy

    Full text link
    Energy proportionality is the key design goal followed by architects of modern multicore CPUs. One of its implications is that optimization of an application for performance will also optimize it for energy. In this work, we show that energy proportionality does not hold true for multicore CPUs. This finding creates the opportunity for bi-objective optimization of applications for performance and energy. We propose and study the first application-level method for bi-objective optimization of multithreaded data-parallel applications for performance and energy. The method uses two decision variables, the number of identical multithreaded kernels (threadgroups) executing the application and the number of threads in each threadgroup, with the workload always partitioned equally between the threadgroups. We experimentally demonstrate the efficiency of the method using four highly optimized multithreaded data-parallel applications, 2D fast Fourier transform based on FFTW and Intel MKL, and dense matrix-matrix multiplication using OpenBLAS and Intel MKL. Four modern multicore CPUs are used in the experiments. The experiments show that optimization for performance alone results in the increase in dynamic energy consumption by up to 89% and optimization for dynamic energy alone degrades the performance by up to 49%. By solving the bi-objective optimization problem, the method determines up to 11 globally Pareto-optimal solutions. Finally, we propose a qualitative dynamic energy model employing performance monitoring counters as parameters, which we use to explain the discovered energy nonproportionality and the Pareto-optimal solutions determined by our method. The model shows that the energy nonproportionality in our case is due to the activity of the data translation lookaside buffer (dTLB), which is disproportionately energy expensive
    corecore