    A scheduling theory framework for GPU tasks efficient execution

    Concurrent execution of tasks in GPUs can reduce the computation time of a workload by overlapping data transfer and execution commands. However, it is difficult to implement an efficient run-time scheduler that minimizes the workload makespan, as many execution orderings must be evaluated. In this paper, we employ scheduling theory to build a model that takes into account the device capabilities, workload characteristics, constraints, and objective functions. In our model, GPU task scheduling is reformulated as a flow shop scheduling problem, which allows us to apply and compare well-known methods already developed in the operations research field. In addition, we develop a new heuristic, specifically focused on executing GPU commands, that achieves better scheduling results than previous techniques. Finally, a comprehensive evaluation, showing the suitability and robustness of this new approach, is conducted on three different NVIDIA architectures (Kepler, Maxwell, and Pascal).
    Funding: Project TIN2016-0920R, Universidad de Málaga (Campus de Excelencia Internacional Andalucía Tech), and the NVIDIA Corporation donation program.
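    The flow shop reformulation can be made concrete with a small sketch. Assuming each GPU task passes through three stages in order, a host-to-device copy, a kernel execution, and a device-to-host copy, with one engine per stage, the scheduling problem is a three-machine permutation flow shop whose objective is the makespan. The Python sketch below (not the paper's heuristic; the task durations are invented for illustration) evaluates the flow shop recurrence and brute-forces the best ordering for a tiny workload:

        from itertools import permutations

        def makespan(order, durations):
            """Completion time of the last task when tasks run in `order`.

            durations[j][s] is the time task j spends on stage s. Each
            stage serves one task at a time and every task visits the
            stages in order, giving the classic flow shop recurrence
            C[j][s] = max(C[j-1][s], C[j][s-1]) + durations[j][s].
            """
            n_stages = len(durations[0])
            finish = [0.0] * n_stages      # previous task's finish per stage
            for j in order:
                for s in range(n_stages):
                    prev_stage = finish[s - 1] if s > 0 else 0.0
                    finish[s] = max(finish[s], prev_stage) + durations[j][s]
            return finish[-1]

        # Hypothetical per-task durations (ms): [HtoD copy, kernel, DtoH copy]
        tasks = [[2, 5, 1], [4, 2, 3], [1, 6, 2], [3, 3, 3]]

        # Exhaustive search over orderings is only feasible for tiny
        # workloads; growth is factorial.
        best = min(permutations(range(len(tasks))),
                   key=lambda order: makespan(order, tasks))
        print("best order:", best, "makespan:", makespan(best, tasks))

    The factorial blow-up of this brute force is exactly the difficulty the abstract cites: a run-time scheduler cannot afford to evaluate many execution orderings, which is why the paper turns to operations research heuristics instead.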

    Run Time Approximation of Non-blocking Service Rates for Streaming Systems

    Stream processing is a compute paradigm that promises safe and efficient parallelism. Modern big-data problems are often well suited to stream processing's throughput-oriented nature. Realizing efficient stream processing requires monitoring and optimization of multiple communications links. Most techniques to optimize these links use queueing network models or network flow models, which require some idea of the actual execution rate of each independent compute kernel within the system. What we want to know is how fast each kernel can process data, independent of other communicating kernels. This is known as the "service rate" of the kernel within the queueing literature. Current approaches to divining service rates are static. Modern workloads, however, are often dynamic. Shared cloud systems also present applications with highly dynamic execution environments (multiple users, hardware migration, etc.). It is therefore desirable to continuously re-tune an application during run time (online) in response to changing conditions. Our approach enables online service rate monitoring under most conditions, obviating the need to rely on steady-state predictions for what are probably non-steady-state phenomena. First, some of the difficulties associated with online service rate determination are examined. Second, the algorithm to approximate the online non-blocking service rate is described. Lastly, the algorithm is implemented within the open source RaftLib framework for validation, using a simple microbenchmark as well as two full streaming applications.
    Comment: technical report
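    The abstract only outlines the algorithm, but the core idea, sampling a kernel's throughput only while it is neither starved by an empty input queue nor back-pressured by a full output queue, can be sketched in Python as below. The class, the exponential-moving-average smoothing, and the queue interface are illustrative assumptions, not RaftLib's actual API (RaftLib itself is a C++ framework):

        import queue
        import time

        class ServiceRateMonitor:
            """Exponentially smoothed items-per-second estimate, built
            only from samples taken while the kernel was non-blocking."""

            def __init__(self, alpha=0.1):
                self.alpha = alpha          # smoothing factor (assumed value)
                self.rate = None            # estimate, items per second

            def observe(self, items, seconds):
                if seconds <= 0.0:
                    return                  # below timer resolution; skip sample
                sample = items / seconds
                self.rate = (sample if self.rate is None
                             else self.rate + self.alpha * (sample - self.rate))

        def step(kernel, in_q, out_q, monitor):
            """Run one item through the kernel, timing only the compute.
            Returns False when the kernel would block, so no sample is
            taken while it is starved (empty input) or back-pressured
            (full output)."""
            if in_q.empty() or out_q.full():
                return False
            item = in_q.get()
            t0 = time.perf_counter()
            result = kernel(item)           # the kernel's actual work
            monitor.observe(1, time.perf_counter() - t0)
            out_q.put(result)
            return True

        if __name__ == "__main__":
            in_q, out_q = queue.Queue(), queue.Queue()  # unbounded for the demo
            for x in range(10000):
                in_q.put(x)
            mon = ServiceRateMonitor()
            while step(lambda v: v * v, in_q, out_q, mon):
                pass
            print(f"estimated service rate: {mon.rate:,.0f} items/s")

    Smoothing the online samples, rather than fitting a steady-state queueing model offline, is what lets the estimate track the dynamic, non-steady-state conditions the abstract emphasizes.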
    • ā€¦
    corecore