Big Data and Large-scale Data Analytics: Efficiency of Sustainable Scalability and Security of Centralized Clouds and Edge Deployment Architectures
One of the significant shifts of the next-generation computing technologies will certainly be in
the development of Big Data (BD) deployment architectures. Apache Hadoop, the BD
landmark, has evolved into a widely deployed BD operating system. Its new features include
a federation structure and many associated frameworks, which give Hadoop 3.x the
maturity to serve different markets. This dissertation addresses two leading issues in
exploiting BD and large-scale data analytics on the Hadoop platform: (i) scalability, which
directly affects system performance and overall throughput, addressed with portable Docker
containers; and (ii) security, which spreads the adoption of data protection practices among
practitioners, addressed with access controls. The main contributions of this thesis are an
Enhanced MapReduce Environment (EME), an OPportunistic and Elastic Resource Allocation
(OPERA) scheduler, a BD Federation Access Broker (BDFAB), and a Secure Intelligent
Transportation System (SITS) with a multi-tier architecture for streaming data to the cloud.
EasyFL: A Low-code Federated Learning Platform For Dummies
Academia and industry have developed several platforms to support the popular
privacy-preserving distributed learning method -- Federated Learning (FL).
However, these platforms are complex to use and require a deep understanding of
FL, which imposes high barriers to entry for beginners, limits the productivity
of researchers, and compromises deployment efficiency. In this paper, we
propose the first low-code FL platform, EasyFL, to enable users with various
levels of expertise to experiment and prototype FL applications with little
coding. We achieve this goal while ensuring great flexibility and extensibility
for customization by unifying simple API design, modular design, and granular
training flow abstraction. With only a few lines of code, EasyFL empowers users
with many out-of-the-box functionalities to accelerate experimentation and
deployment. These practical functionalities are heterogeneity simulation,
comprehensive tracking, distributed training optimization, and seamless
deployment. They are proposed based on challenges identified in the proposed FL
life cycle. Compared with other platforms, EasyFL not only requires just three
lines of code (at least 10x fewer) to build a vanilla FL application but also
incurs lower training overhead. Besides, our evaluations demonstrate that
EasyFL expedites distributed training by 1.5x and improves the efficiency
of deployment. We believe that EasyFL will increase the productivity of
researchers and democratize FL to wider audiences.
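The vanilla FL loop that such a platform abstracts away is federated averaging: clients train locally on their own shards and a server averages the resulting models. A minimal, self-contained sketch of that loop (all names here are illustrative, not EasyFL's API) for a one-parameter linear model:

```python
# Minimal federated-averaging (FedAvg) sketch: each client takes a local
# gradient step on its own shard, and the server averages the client
# models. Illustrative only; this is not the EasyFL API.

def local_update(weights, shard, lr=0.1):
    # One gradient-descent step for a 1-D linear model y = w * x.
    w = weights
    grad = sum(2 * (w * x - y) * x for x, y in shard) / len(shard)
    return w - lr * grad

def federated_round(weights, shards):
    # Every client starts from the same global weights; the server
    # then averages the client models (vanilla FedAvg).
    client_models = [local_update(weights, s) for s in shards]
    return sum(client_models) / len(client_models)

# Two clients whose data follows y = 2x; repeated rounds move w toward 2.
shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
w = 0.0
for _ in range(50):
    w = federated_round(w, shards)
```

A low-code platform wraps exactly this structure (plus communication, tracking, and deployment) behind a few API calls, which is what makes a three-line application possible.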
Horus: Interference-Aware and Prediction-Based Scheduling in Deep Learning Systems
To accelerate the training of Deep Learning (DL) models, clusters of machines equipped with hardware accelerators such as GPUs are leveraged to reduce execution time. State-of-the-art resource managers are needed to increase GPU utilization and maximize throughput. While co-locating DL jobs on the same GPU has been shown to be effective, this can incur interference causing slowdown. In this article we propose Horus: an interference-aware and prediction-based resource manager for DL systems. Horus proactively predicts GPU utilization of heterogeneous DL jobs extrapolated from the DL model's computation graph features, removing the need for online profiling and isolated reserved GPUs. Through micro-benchmarks and job co-location combinations across heterogeneous GPU hardware, we identify GPU utilization as a general proxy metric to determine good placement decisions, in contrast to current approaches which reserve isolated GPUs to perform online profiling and directly measure GPU utilization for each unique submitted job. Our approach promotes high resource utilization and makespan reduction; via real-world experimentation and large-scale trace-driven simulation, we demonstrate that Horus outperforms other DL resource managers by up to 61.5 percent for GPU resource utilization, 23.7–30.7 percent for makespan reduction, and 68.3 percent in job wait time reduction.
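The idea of prediction-based, interference-aware placement can be sketched in a few lines: predict each job's GPU utilization offline from model features, then co-locate jobs only while the summed prediction stays under a cap. The feature weights and the 90 percent cap below are illustrative assumptions, not values from the paper:

```python
# Sketch of interference-aware placement in the spirit of Horus: jobs
# carry GPU-utilization estimates predicted from computation-graph
# features, and a greedy first-fit packer co-locates jobs on a GPU only
# while total predicted utilization stays under a cap. All constants
# are hypothetical.

def predict_utilization(features, weights, bias=5.0):
    # Linear model over graph features (e.g. conv op count, GFLOPs).
    return bias + sum(weights[k] * v for k, v in features.items())

def place_jobs(jobs, num_gpus, cap=90.0):
    # Greedy first-fit, largest jobs first; jobs that fit nowhere wait.
    loads = [0.0] * num_gpus
    placement = {}
    for name, util in sorted(jobs.items(), key=lambda kv: -kv[1]):
        for g in range(num_gpus):
            if loads[g] + util <= cap:
                loads[g] += util
                placement[name] = g
                break
        else:
            placement[name] = None  # stays in the queue
    return placement, loads

weights = {"conv_ops": 0.8, "gflops": 0.5}
jobs = {
    "resnet": predict_utilization({"conv_ops": 50, "gflops": 60}, weights),
    "lstm": predict_utilization({"conv_ops": 5, "gflops": 20}, weights),
    "mlp": predict_utilization({"conv_ops": 2, "gflops": 10}, weights),
}
placement, loads = place_jobs(jobs, num_gpus=2)
```

The point of the proxy metric is visible here: placement needs only the predicted number, so no GPU has to be reserved for online profiling.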
System Abstractions for Scalable Application Development at the Edge
Recent years have witnessed an explosive growth of Internet of Things (IoT) devices, which collect or generate huge amounts of data. Given diverse device capabilities and application requirements, data processing takes place across a range of settings, from on-device to a nearby edge server/cloud and remote cloud. Consequently, edge-cloud coordination has been studied extensively from the perspectives of job placement, scheduling and joint optimization. Typical approaches focus on performance optimization for individual applications. This often requires domain knowledge of the applications, but also leads to application-specific solutions. Application development and deployment over diverse scenarios thus incur repetitive manual efforts. There are two overarching challenges to provide system-level support for application development at the edge. First, there is inherent heterogeneity at the device hardware level. The execution settings may range from a small cluster as an edge cloud to on-device inference on embedded devices, differing in hardware capability and programming environments. Further, application performance requirements vary significantly, making it even more difficult to map different applications to already heterogeneous hardware. Second, there are trends towards incorporating edge and cloud and multi-modal data. Together, these add further dimensions to the design space and increase the complexity significantly. In this thesis, we propose a novel framework to simplify application development and deployment over a continuum of edge to cloud. Our framework provides key connections between different dimensions of design considerations, corresponding to the application abstraction, data abstraction and resource management abstraction respectively. First, our framework masks hardware heterogeneity with abstract resource types through containerization, and abstracts away the application processing pipelines into generic flow graphs. 
Further, our framework supports a notion of degradable computing for application scenarios at the edge that are driven by multimodal sensory input. Next, as video analytics is the killer app of edge computing, we include a generic data management service between video query systems and a video store to organize video data at the edge. We propose a video data unit abstraction based on a notion of distance between objects in the video, quantifying the semantic similarity among video data. Last, considering concurrent application execution, our framework supports multi-application offloading with device-centric control, with a userspace scheduler service that wraps over the operating system scheduler.
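The "generic flow graph" application abstraction can be illustrated with a tiny runtime: a pipeline is a DAG of named stages, and the runtime evaluates stages in dependency order, feeding each stage the outputs of its upstream stages. Stage names and functions below are hypothetical, not the thesis framework's API:

```python
# Sketch of a flow-graph application abstraction: stages maps names to
# functions, edges maps a stage to its upstream stages, and the runtime
# memoizes results while evaluating in dependency order. Illustrative
# only; not the actual framework API.

def run_flow_graph(stages, edges, source):
    results = {}
    def evaluate(name):
        if name not in results:
            inputs = [evaluate(up) for up in edges.get(name, [])]
            # Stages with no upstream read from the external source.
            results[name] = stages[name](*inputs) if inputs else stages[name](source)
        return results[name]
    for name in stages:
        evaluate(name)
    return results

# A toy video-analytics pipeline: decode -> detect -> count.
stages = {
    "decode": lambda frame: frame.lower(),
    "detect": lambda img: [w for w in img.split() if w == "car"],
    "count": lambda objs: len(objs),
}
edges = {"detect": ["decode"], "count": ["detect"]}
out = run_flow_graph(stages, edges, "Car bike CAR")
```

Because applications are expressed as such graphs rather than device-specific code, the same pipeline can be mapped onto different hardware by a resource manager, which is the point of the abstraction.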
Dynamic Optimization Techniques for Resource-Efficient Execution of Distributed Machine Learning
Doctoral dissertation, Seoul National University, Department of Computer Science and Engineering, February 2020.
Machine Learning (ML) systems are widely used to extract insights from data. Ever increasing dataset sizes and model complexity gave rise to many efforts towards efficient distributed machine learning systems. One of the popular approaches to support large-scale data and complicated models is the parameter server (PS) approach. In this approach, a training job runs with distributed worker and server tasks, where workers iteratively compute gradients to update the global model parameters that are kept in servers.
To improve the PS system performance, this dissertation proposes two solutions that automatically optimize resource efficiency and system performance. First, we propose a solution that optimizes the resource configuration and workload partitioning of distributed ML training on a PS system. To find the best configuration, we build an Optimizer based on a cost model that works with online metrics. To efficiently apply the Optimizer's decisions, we design our runtime to be elastic, performing reconfiguration in the background with minimal overhead.
The second solution optimizes the scheduling of resources and tasks of multiple ML training jobs in a shared cluster. Specifically, we co-locate jobs with complementary resource use to increase resource utilization, while executing their tasks in fine-grained units to avoid resource contention. To alleviate memory pressure from co-located jobs, we enable dynamic spill/reload of data, which adaptively changes the ratio of data between disk and memory.
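The dynamic spill/reload mechanism can be sketched as an LRU store with a resizable memory budget: when co-located jobs shrink the budget, least-recently-used partitions spill to disk and are reloaded on access. The store below simulates disk with a dict; class and method names are illustrative:

```python
# Sketch of dynamic spill/reload under memory pressure. Partitions live
# in memory up to a budget; excess partitions spill to (simulated) disk
# in LRU order and are reloaded transparently on access. Illustrative
# only; a real system would serialize to local disk.
from collections import OrderedDict

class SpillStore:
    def __init__(self, budget):
        self.budget = budget          # max partitions kept in memory
        self.memory = OrderedDict()   # partition id -> data, LRU order
        self.disk = {}

    def put(self, pid, data):
        self.memory[pid] = data
        self.memory.move_to_end(pid)
        self._enforce()

    def get(self, pid):
        if pid not in self.memory:    # reload from disk on a miss
            self.memory[pid] = self.disk.pop(pid)
        self.memory.move_to_end(pid)
        self._enforce()
        return self.memory[pid]

    def resize(self, budget):
        # Called when co-located jobs change the disk/memory ratio.
        self.budget = budget
        self._enforce()

    def _enforce(self):
        while len(self.memory) > self.budget:
            pid, data = self.memory.popitem(last=False)  # evict LRU
            self.disk[pid] = data

store = SpillStore(budget=2)
for i in range(3):
    store.put(i, f"partition-{i}")    # third put spills partition 0
```

Adapting the ratio then reduces to calling `resize` as the scheduler re-divides memory among co-located jobs.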
We build a working system that implements our approaches. The above two solutions are implemented in the same system and share the runtime part that can dynamically migrate jobs between machines and reallocate machine resources. We evaluate our system with popular ML applications to verify the effectiveness of our solutions.
Chapter1. Introduction 1
1.1 Distributed Machine Learning on Parameter Servers 1
1.2 Automating System Configuration of Distributed Machine Learning 2
1.3 Scheduling of Multiple Distributed Machine Learning Jobs 3
1.4 Contributions 5
1.5 Dissertation Structure 6
Chapter2. Background 7
Chapter3. Automating System Configuration of Distributed Machine Learning 10
3.1 System Configuration Challenges 11
3.2 Finding Good System Configuration 13
3.2.1 Cost Model 13
3.2.2 Cost Formulation 15
3.2.3 Optimization 16
3.3 Cruise 18
3.3.1 Optimizer 19
3.3.2 Elastic Runtime 21
3.4 Evaluation 26
3.4.1 Experimental Setup 26
3.4.2 Finding Baselines with Grid Search 28
3.4.3 Optimization in the Homogeneous Environment 28
3.4.4 Utilizing Opportunistic Resources 30
3.4.5 Optimization in the Heterogeneous Environment 31
3.4.6 Reconfiguration Speed 32
3.5 Related Work 33
3.6 Summary 34
Chapter4. A Scheduling Framework Optimized for Multiple Distributed Machine Learning Jobs 36
4.1 Resource Under-utilization Problems in PS ML Training 37
4.2 Harmony Overview 42
4.3 Multiplexing ML Jobs 43
4.3.1 Fine-grained Execution with Subtasks 44
4.3.2 Dynamic Grouping of Jobs 45
4.3.3 Dynamic Data Reloading 54
4.4 Evaluation 56
4.4.1 Baselines 56
4.4.2 Experimental Setup 57
4.4.3 Performance Comparison 59
4.4.4 Performance Breakdown 59
4.4.5 Workload Sensitivity Analysis 61
4.4.6 Accuracy of the Performance Model 63
4.4.7 Performance and Scalability of the Scheduling Algorithm 64
4.4.8 Dynamic Data Reloading 66
4.5 Discussion 67
4.6 Related Work 67
4.7 Summary 70
Chapter5. Conclusion 71
5.1 Summary 71
5.2 Future Work 71
5.2.1 Other Communication Architecture Support 71
5.2.2 Deep Learning & GPU Resource Support 72
Abstract (in Korean) 81
Autonomy and Intelligence in the Computing Continuum: Challenges, Enablers, and Future Directions for Orchestration
Future AI applications require performance, reliability and privacy that the
existing, cloud-dependent system architectures cannot provide. In this article,
we study orchestration in the device-edge-cloud continuum, and focus on AI for
edge, that is, the AI methods used in resource orchestration. We claim that to
support the constantly growing requirements of intelligent applications in the
device-edge-cloud computing continuum, resource orchestration needs to embrace
edge AI and emphasize local autonomy and intelligence. To justify the claim, we
provide a general definition for continuum orchestration, and look at how
current and emerging orchestration paradigms are suitable for the computing
continuum. We describe certain major emerging research themes that may affect
future orchestration, and provide an early vision of an orchestration paradigm
that embraces those research themes. Finally, we survey current key edge AI
methods and look at how they may contribute to fulfilling the vision of
future continuum orchestration.
Rise of the Planet of Serverless Computing: A Systematic Review
Serverless computing is an emerging cloud computing paradigm, being adopted to develop a wide range of software applications.
It allows developers to focus on the application logic at the granularity of functions, thereby freeing developers from tedious and
error-prone infrastructure management. Meanwhile, its unique characteristics pose new challenges to the development and deployment
of serverless-based applications. To tackle these challenges, enormous research efforts have been devoted. This paper provides a
comprehensive literature review to characterize the current research state of serverless computing. Specifically, this paper covers 164
papers on 17 research directions of serverless computing, including performance optimization, programming framework, application
migration, multi-cloud development, testing and debugging, etc. It also derives research trends, focus, and commonly-used platforms
for serverless computing, as well as promising research opportunities.
Efficient Resource Management for Deep Learning Clusters
Deep Learning (DL) is gaining rapid popularity in various domains, such as computer vision, speech recognition, etc. With the increasing demands, large clusters have been built to develop DL models (i.e., data preparation and model training). DL jobs have some unique features ranging from their hardware requirements to execution patterns. However, the resource management techniques applied in existing DL clusters have not yet been adapted to those new features, which leads to resource inefficiency and hurts the performance of DL jobs.
We observed three major challenges brought by DL jobs. First, data preparation jobs, which prepare training datasets from a large volume of raw data, are memory intensive. DL clusters often over-allocate memory resource to those jobs for protecting their performance, which causes memory underutilization in DL clusters. Second, the execution time of a DL training job is often unknown before job completion. Without such information, existing cluster schedulers are unable to minimize the average Job Completion Time (JCT) of those jobs. Third, model aggregations in Distributed Deep Learning (DDL) training are often assigned a fixed group of CPUs. However, a large portion of those CPUs are wasted because the bursty model aggregations cannot saturate them all the time.
In this thesis, we propose a suite of techniques to eliminate the mismatches between DL jobs and resource management in DL clusters. First, we bring the idea of memory disaggregation to enhance the memory utilization of DL clusters. The unused memory in data preparation jobs is exposed as remote memory to other machines that are running out of local memory. Second, we design a two-dimensional attained-service-based scheduler to optimize the average JCT of DL training jobs. This scheduler takes the temporal and spatial characteristics of DL training jobs into consideration and can efficiently schedule them without knowing their execution time. Third, we define a shared model aggregation service to reduce the CPU cost of DDL training. Using this service, model aggregations from different DDL training jobs are carefully packed together and use the same group of CPUs in a time-sharing manner. With these techniques, we demonstrate that huge improvements in resource efficiency and job performance can be obtained when the cluster's resource management matches with the features of DL jobs.
PhD thesis, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/169955/1/jcgu_1.pd
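The two-dimensional attained-service idea can be made concrete: a job's attained service is the product of the GPUs it holds (space) and the time it has run (time), and the scheduler favors the job with the least attained service among those that fit, so short jobs tend to finish early without anyone knowing execution times up front. Field names below are illustrative:

```python
# Sketch of least-attained-service scheduling in two dimensions
# (GPUs held x seconds run). No execution-time knowledge is needed:
# a job that has consumed little service so far is assumed likely to
# be short and is scheduled first. Illustrative field names.

def attained_service(job):
    # Space (GPU count) times time (seconds run so far).
    return job["gpus"] * job["runtime"]

def pick_next(queue, free_gpus):
    # Least-attained-service first, among jobs that fit on free GPUs.
    fitting = [j for j in queue if j["gpus"] <= free_gpus]
    return min(fitting, key=attained_service, default=None)

queue = [
    {"name": "old_big", "gpus": 8, "runtime": 3600},
    {"name": "young_small", "gpus": 2, "runtime": 600},
    {"name": "fresh", "gpus": 4, "runtime": 0},
]
job = pick_next(queue, free_gpus=8)
```

A freshly submitted job has zero attained service and is therefore tried first, which is how the policy approximates shortest-job-first without execution-time estimates.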