Deep learning has recently been an area of intense research. However, as a
compute-intensive task, deep learning relies heavily on the amount of available
GPU memory, which is usually expensive and scarce. Although several works have
proposed dynamic GPU memory management, these approaches are hard to apply to
systems with multiple dynamic workloads, such as in-database machine learning
systems.
In this paper, we present TENSILE, a method that manages GPU memory at tensor
granularity to reduce peak GPU memory usage while accounting for multiple
dynamic workloads. TENSILE addresses the cold-start and cross-iteration
scheduling problems found in previous works. We implement TENSILE on a deep
learning framework that we built ourselves and evaluate its performance. The
experimental results show that TENSILE saves more GPU memory with less extra
time overhead than prior works, in both single and multiple dynamic workload
scenarios.