Modern computing systems increasingly adopt heterogeneous designs that combine multiple, diverse processor architectures in a single system. These designs offer the potential for high performance at reduced power consumption, but they require advanced resource management and workload scheduling across the available processors.
Programmability frameworks, such as OpenCL and CUDA, enable resource management and workload scheduling on heterogeneous systems. These frameworks assign full control of resource allocation and scheduling to the application. This design adequately serves dedicated, single-application systems, but it introduces significant challenges in multi-tasking environments, where multiple users and applications compete for access to system resources.
This thesis addresses these challenges and presents three major contributions that enable efficient multi-tasking on heterogeneous systems. The contributions are compatible with existing systems, remain portable across vendors, and require no application changes or recompilation.
The first contribution of this thesis is an optimization technique that reduces host-device communication overhead for OpenCL applications. It does so without modifying or recompiling the application source code, and it is portable across platforms. This work improves the efficiency and performance of the diverse application workloads found on multi-tasking systems.
The second contribution is the design and implementation of a secure, user-space virtualization layer that integrates a system's accelerator resources with the standard multi-tasking and user-space virtualization facilities of the commodity Linux OS. It enables fine-grained sharing of mixed-vendor accelerator resources, targets the heterogeneous systems found in data-center nodes, and requires no modification to the OS, OpenCL, or applications.
Lastly, the third contribution is a technique and software infrastructure that enable control over resource sharing on accelerators while supporting software-managed scheduling on them. The infrastructure remains transparent to existing systems and applications and requires no modification or recompilation. It enforces fair accelerator sharing, which is required for multi-tasking.