SECURITY-AWARE DATA MANAGEMENT AND PERFORMANCE OPTIMIZATION STRATEGIES FOR CLOUD STORAGE SYSTEMS
Ph.D. (Doctor of Philosophy)
Baechi: Fast Device Placement of Machine Learning Graphs
Machine learning graphs (or models) can be challenging or impossible to train
when devices have limited memory or models are large. To split the
model across devices, learning-based approaches are still popular. While these
result in model placements that train fast on data (i.e., low step times),
learning-based model-parallelism is time-consuming, taking many hours or days
to create a placement plan of operators on devices. We present the Baechi
system, the first to adopt an algorithmic approach to the placement problem for
running machine learning training graphs on small clusters of
memory-constrained devices. We integrate our implementation of Baechi into two
popular open-source learning frameworks: TensorFlow and PyTorch. Our
experimental results using GPUs show that: (i) Baechi generates placement plans
654x to 206Kx faster than state-of-the-art learning-based approaches, and (ii)
the step (training) time of Baechi-placed models is comparable to expert
placements in PyTorch, and at most 6.2% worse than expert placements in TensorFlow. We
prove mathematically that our two algorithms are within a constant factor of
the optimal. Our work shows that, compared to learning-based approaches,
algorithmic approaches can face different challenges in adapting to machine
learning systems, but they also offer proven bounds and significant
performance benefits.
Comment: Extended version of SoCC 2020 paper:
https://dl.acm.org/doi/10.1145/3419111.342130
Proceedings of the First International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2014): Porto, Portugal
Proceedings of: First International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2014). Porto (Portugal), August 27-28, 2014
Memory optimization techniques for embedded systems
Embedded systems have become ubiquitous, and as a result optimizing the design and performance of programs that run on these systems remains a significant challenge for the computer systems research community. This dissertation addresses several key problems in the optimization of programs for embedded systems that use digital signal processors (DSPs) as the core processor. Chapter 2 develops an efficient and effective algorithm that constructs a worm partition graph by finding the longest worm available at each step while maintaining the legality of scheduling. Proper assignment of offsets to variables in embedded DSPs plays a key role in determining the execution time and the amount of program memory needed. Chapter 3 proposes a new approach that introduces a weight adjustment function and shows experimentally that its results are at least as good as, and slightly better than, those of previous work. Our solutions address several problems, such as handling fragmented paths resulting from graph-based solutions, dealing with modify registers, and making effective use of multiple address registers. In addition to offset assignment, address register allocation is important for embedded DSPs. Chapter 4 develops a lower bound and an algorithm that can eliminate the explicit use of address register instructions in loops with array references. For loop computations, the scheduling of computations and the associated memory requirements are closely interrelated. In Chapter 5, we develop a general framework for studying the trade-off between scheduling and storage requirements in nested loops that access multi-dimensional arrays. Tiling has long been used to improve the memory performance of loops. Previously, only a sufficient condition for the legality of tiling was known. While it was conjectured that the sufficient condition would also become necessary for large enough tiles, there had been no precise characterization of what is large enough.
Chapter 6 develops a new framework for characterizing tiling by viewing tiles as points on a lattice. This also leads to the development of conditions under which the legality condition for tiling is both necessary and sufficient.