Heterogeneous platforms play an increasingly important role in modern computer
systems. They combine high performance with low power consumption. From mobiles
to supercomputers, we see an increasing number of computer systems that are
heterogeneous.
The most well-known heterogeneous system, CPU+GPU platforms have been widely
used in recent years. As they become more mainstream, serving multiple tasks from
multiple users is an emerging challenge. A good scheduler can greatly improve performance.
However, indiscriminately allocating tasks based on availability leads to poor
performance. As modern GPUs have a large number of hardware resources, most tasks
cannot efficiently utilize all of them. Concurrent task execution on GPU is a promising
solution, however, indiscriminately running tasks in parallel causes a slowdown.
This thesis focuses on scheduling OpenCL kernels. A runtime framework is developed
to determine where to schedule OpenCL kernels. It predicts the best-fit device by
using a machine learning-based classifier, then schedules the kernels accordingly to either
CPU or GPU. To improve GPU utilization, a kernel merging approach is proposed.
Kernels are merged if their predicted co-execution can provide better performance than
sequential execution. A machine learning based classifier is developed to find the best
kernel pairs for co-execution on GPU. Finally, a runtime framework is developed to
schedule kernels separately on either CPU or GPU, and run kernels in pairs if their
co-execution can improve performance. The approaches developed in this thesis significantly
improve system performance and outperform all existing techniques