5 research outputs found

    DeepPlace: Learning to Place Applications in Multi-Tenant Clusters

    Large multi-tenant production clusters must handle a wide variety of jobs and applications with complex resource usage characteristics. Manually creating placement rules that decide which applications should be co-located is non-trivial and sub-optimal. In this paper, we present DeepPlace, a scheduler that learns to exploit the temporal resource usage patterns of applications using Deep Reinforcement Learning (Deep RL) to reduce resource competition across jobs running on the same machine while at the same time optimizing overall cluster utilization. Comment: APSys 201
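    The core idea of contention-aware placement can be illustrated without the RL machinery: score each candidate machine by how badly the incoming job's temporal usage pattern overlaps with what is already running there. The sketch below is a hypothetical greedy baseline, not DeepPlace's actual policy; all names and the capacity model are assumptions for illustration.

```python
import numpy as np

def place_job(job_usage, machine_usages, capacity=1.0):
    """Pick the machine whose temporal usage pattern least overlaps with
    the incoming job's, i.e. minimise the peak combined utilisation.
    Usage vectors hold per-time-slot CPU-usage fractions."""
    best, best_peak = None, float("inf")
    for m, usage in enumerate(machine_usages):
        peak = float(np.max(usage + job_usage))
        if peak <= capacity and peak < best_peak:
            best, best_peak = m, peak
    return best  # None if no machine can host the job

# A job that is busy late fits best on a machine that is busy early.
job = np.array([0.1, 0.1, 0.6, 0.6])          # usage over 4 time slots
machines = [np.array([0.7, 0.7, 0.2, 0.1]),   # busy early
            np.array([0.2, 0.3, 0.7, 0.7])]   # busy late
print(place_job(job, machines))  # → 0
```

    A learned policy replaces this hand-written score with one trained against a reward that penalises contention and rewards utilisation.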

    Local memory-aware kernel perforation

    Many applications provide inherent resilience to some amount of error and can potentially trade accuracy for performance by using approximate computing. Applications running on GPUs often use local memory to minimize the number of global memory accesses and to speed up execution. Local memory can also be very useful for improving the way approximate computation is performed, e.g., by improving the quality of approximation with data reconstruction techniques. This paper introduces local memory-aware perforation techniques specifically designed for the acceleration and approximation of GPU kernels. We propose a local memory-aware kernel perforation technique that first skips the loading of parts of the input data from global memory, and later uses reconstruction techniques on local memory to reach higher accuracy while achieving performance similar to state-of-the-art techniques. Experiments show that our approach is able to accelerate the execution of a variety of applications from 1.6× to 3× while introducing an average error of 6%, which is much smaller than that of other approaches. Results further show how strongly the error depends on the input data and application scenario, as well as the impact of local memory tuning and different parameter configurations.
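    The load-skip-then-reconstruct idea can be sketched on the CPU: load only every k-th input element (standing in for the skipped global-memory reads) and reconstruct the gaps before computing. This is a minimal illustration with nearest-neighbour replication as the reconstruction step; the paper's GPU kernels and local-memory reconstruction schemes are more elaborate, and all names here are assumptions.

```python
import numpy as np

def perforated_sum(data, skip=2):
    """Sum an array while 'loading' only every `skip`-th element,
    reconstructing the skipped values by nearest-neighbour replication."""
    loaded = data[::skip]                          # elements actually loaded
    reconstructed = np.repeat(loaded, skip)[:len(data)]
    return reconstructed.sum()

rng = np.random.default_rng(0)
x = rng.random(1_000_000)
exact = x.sum()
approx = perforated_sum(x)
print(abs(approx - exact) / exact)  # relative error, typically well under 1%
```

    Better reconstruction (e.g. linear interpolation between loaded neighbours) trades a little extra local-memory work for lower error, which is exactly the knob the paper tunes.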

    On Performance Optimization and Quality Control for Approximate-Communication-Enabled Networks-on-Chip

    For many applications that can tolerate some error, approximate computing is a design paradigm that trades application output accuracy for reduced computation/communication effort, yielding performance and energy benefits. Since networks-on-chip (NoCs) are one of the major contributors to system performance and power consumption, the underlying communication can be approximated to achieve time/energy improvements. However, performing approximation blindly causes unacceptable quality loss. In this article, first, an optimization problem to maximize NoC performance is formulated under the constraint of the application's quality requirement, and the application quality loss is studied. Second, a congestion-aware quality control method is proposed to improve system performance by aggressively dropping network data, based on flow prediction and a lightweight heuristic. In the experiments, two recent approximation methods for NoCs are augmented with our proposed control method and compared with their original versions. Experimental results show that our proposed method can speed up execution by as much as 29.42% over the two state-of-the-art works.
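    The congestion-aware quality control described above boils down to a gated drop decision: drop an approximable packet only when the link is predicted to be congested and the accumulated quality loss still fits the application's budget. The sketch below is a hypothetical illustration of that decision rule, not the paper's heuristic; all names, thresholds, and the loss model are assumptions.

```python
def should_drop(predicted_load, quality_loss, *, load_threshold=0.8,
                quality_budget=0.05, accumulated_loss=0.0):
    """Drop an approximable packet only when the link is predicted to be
    congested AND the accumulated quality loss stays within budget."""
    congested = predicted_load > load_threshold
    within_budget = accumulated_loss + quality_loss <= quality_budget
    return congested and within_budget

# Congested link, cheap packet: drop it to relieve the NoC.
print(should_drop(0.9, 0.01))                          # → True
# Uncongested link: keep the data even though dropping would be "cheap".
print(should_drop(0.5, 0.01))                          # → False
# Quality budget exhausted: keep the data to protect output quality.
print(should_drop(0.9, 0.01, accumulated_loss=0.05))   # → False
```

    Gating on predicted rather than observed congestion is what lets the controller act before buffers fill, which is where the speedup comes from.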