Big Data and Large-scale Data Analytics: Efficiency of Sustainable Scalability and Security of Centralized Clouds and Edge Deployment Architectures
One of the significant shifts of the next-generation computing technologies will certainly be in
the development of Big Data (BD) deployment architectures. Apache Hadoop, the BD
landmark, has evolved into a widely deployed BD operating system. Its new features include
a federation structure and many associated frameworks, which give Hadoop 3.x the
maturity to serve different markets. This dissertation addresses two leading issues in
exploiting BD and large-scale data analytics on the Hadoop platform: (i) scalability, which
directly affects system performance and overall throughput, addressed with portable Docker
containers; and (ii) security, which spreads the adoption of data protection practices among
practitioners, addressed with access controls. The main contributions of this thesis are an
Enhanced MapReduce Environment (EME), an OPportunistic and Elastic Resource Allocation
(OPERA) scheduler, a BD Federation Access Broker (BDFAB), and a Secure Intelligent
Transportation System (SITS) with a multi-tier architecture for streaming data to the cloud.
EasyFL: A Low-code Federated Learning Platform For Dummies
Academia and industry have developed several platforms to support the popular
privacy-preserving distributed learning method -- Federated Learning (FL).
However, these platforms are complex to use and require a deep understanding of
FL, which imposes high barriers to entry for beginners, limits the productivity
of researchers, and compromises deployment efficiency. In this paper, we
propose the first low-code FL platform, EasyFL, to enable users with various
levels of expertise to experiment and prototype FL applications with little
coding. We achieve this goal while ensuring great flexibility and extensibility
for customization by unifying simple API design, modular design, and granular
training flow abstraction. With only a few lines of code, EasyFL empowers users
with many out-of-the-box functionalities to accelerate experimentation and
deployment. These practical functionalities are heterogeneity simulation,
comprehensive tracking, distributed training optimization, and seamless
deployment. They are proposed based on challenges identified in the proposed FL
life cycle. Compared with other platforms, EasyFL not only requires just three
lines of code (at least 10x fewer) to build a vanilla FL application but also
incurs lower training overhead. Besides, our evaluations demonstrate that
EasyFL expedites distributed training by 1.5x and improves the efficiency
of deployment. We believe that EasyFL will increase the productivity of
researchers and democratize FL to wider audiences.
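The vanilla FL loop that such a platform abstracts away is federated averaging: clients train locally on their own shards and a server averages the resulting models. A minimal, self-contained sketch of that loop (all names here are illustrative, not EasyFL's API) for a one-parameter linear model:

```python
# Minimal federated-averaging (FedAvg) sketch: each client takes a local
# gradient step on its own shard, and the server averages the client
# models. Illustrative only; this is not the EasyFL API.

def local_update(weights, shard, lr=0.1):
    # One gradient-descent step for a 1-D linear model y = w * x.
    w = weights
    grad = sum(2 * (w * x - y) * x for x, y in shard) / len(shard)
    return w - lr * grad

def federated_round(weights, shards):
    # Every client starts from the same global weights; the server
    # then averages the client models (vanilla FedAvg).
    client_models = [local_update(weights, s) for s in shards]
    return sum(client_models) / len(client_models)

# Two clients whose data follows y = 2x; repeated rounds move w toward 2.
shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
w = 0.0
for _ in range(50):
    w = federated_round(w, shards)
```

A low-code platform wraps exactly this structure (plus communication, tracking, and deployment) behind a few API calls, which is what makes a three-line application possible.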
Horus: Interference-Aware and Prediction-Based Scheduling in Deep Learning Systems
To accelerate the training of Deep Learning (DL) models, clusters of machines equipped with hardware accelerators such as GPUs are leveraged to reduce execution time. State-of-the-art resource managers are needed to increase GPU utilization and maximize throughput. While co-locating DL jobs on the same GPU has been shown to be effective, this can incur interference causing slowdown. In this article we propose Horus: an interference-aware and prediction-based resource manager for DL systems. Horus proactively predicts GPU utilization of heterogeneous DL jobs extrapolated from the DL model's computation graph features, removing the need for online profiling and isolated reserved GPUs. Through micro-benchmarks and job co-location combinations across heterogeneous GPU hardware, we identify GPU utilization as a general proxy metric to determine good placement decisions, in contrast to current approaches which reserve isolated GPUs to perform online profiling and directly measure GPU utilization for each unique submitted job. Our approach promotes high resource utilization and makespan reduction; via real-world experimentation and large-scale trace-driven simulation, we demonstrate that Horus outperforms other DL resource managers by up to 61.5 percent for GPU resource utilization, 23.7–30.7 percent for makespan reduction, and 68.3 percent in job wait time reduction.
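The idea of prediction-based, interference-aware placement can be sketched in a few lines: predict each job's GPU utilization offline from model features, then co-locate jobs only while the summed prediction stays under a cap. The feature weights and the 90 percent cap below are illustrative assumptions, not values from the paper:

```python
# Sketch of interference-aware placement in the spirit of Horus: jobs
# carry GPU-utilization estimates predicted from computation-graph
# features, and a greedy first-fit packer co-locates jobs on a GPU only
# while total predicted utilization stays under a cap. All constants
# are hypothetical.

def predict_utilization(features, weights, bias=5.0):
    # Linear model over graph features (e.g. conv op count, GFLOPs).
    return bias + sum(weights[k] * v for k, v in features.items())

def place_jobs(jobs, num_gpus, cap=90.0):
    # Greedy first-fit, largest jobs first; jobs that fit nowhere wait.
    loads = [0.0] * num_gpus
    placement = {}
    for name, util in sorted(jobs.items(), key=lambda kv: -kv[1]):
        for g in range(num_gpus):
            if loads[g] + util <= cap:
                loads[g] += util
                placement[name] = g
                break
        else:
            placement[name] = None  # stays in the queue
    return placement, loads

weights = {"conv_ops": 0.8, "gflops": 0.5}
jobs = {
    "resnet": predict_utilization({"conv_ops": 50, "gflops": 60}, weights),
    "lstm": predict_utilization({"conv_ops": 5, "gflops": 20}, weights),
    "mlp": predict_utilization({"conv_ops": 2, "gflops": 10}, weights),
}
placement, loads = place_jobs(jobs, num_gpus=2)
```

The point of the proxy metric is visible here: placement needs only the predicted number, so no GPU has to be reserved for online profiling.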
System Abstractions for Scalable Application Development at the Edge
Recent years have witnessed an explosive growth of Internet of Things (IoT) devices, which collect or generate huge amounts of data. Given diverse device capabilities and application requirements, data processing takes place across a range of settings, from on-device to a nearby edge server/cloud and remote cloud. Consequently, edge-cloud coordination has been studied extensively from the perspectives of job placement, scheduling and joint optimization. Typical approaches focus on performance optimization for individual applications. This often requires domain knowledge of the applications, but also leads to application-specific solutions. Application development and deployment over diverse scenarios thus incur repetitive manual efforts. There are two overarching challenges to provide system-level support for application development at the edge. First, there is inherent heterogeneity at the device hardware level. The execution settings may range from a small cluster as an edge cloud to on-device inference on embedded devices, differing in hardware capability and programming environments. Further, application performance requirements vary significantly, making it even more difficult to map different applications to already heterogeneous hardware. Second, there are trends towards incorporating edge and cloud and multi-modal data. Together, these add further dimensions to the design space and increase the complexity significantly. In this thesis, we propose a novel framework to simplify application development and deployment over a continuum of edge to cloud. Our framework provides key connections between different dimensions of design considerations, corresponding to the application abstraction, data abstraction and resource management abstraction respectively. First, our framework masks hardware heterogeneity with abstract resource types through containerization, and abstracts away the application processing pipelines into generic flow graphs. 
Further, our framework supports a notion of degradable computing for application scenarios at the edge that are driven by multimodal sensory input. Next, as video analytics is the killer app of edge computing, we include a generic data management service between video query systems and a video store to organize video data at the edge. We propose a video data unit abstraction based on a notion of distance between objects in the video, quantifying the semantic similarity among video data. Last, considering concurrent application execution, our framework supports multi-application offloading with device-centric control, with a userspace scheduler service that wraps over the operating system scheduler.
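The "generic flow graph" application abstraction can be illustrated with a tiny runtime: a pipeline is a DAG of named stages, and the runtime evaluates stages in dependency order, feeding each stage the outputs of its upstream stages. Stage names and functions below are hypothetical, not the thesis framework's API:

```python
# Sketch of a flow-graph application abstraction: stages maps names to
# functions, edges maps a stage to its upstream stages, and the runtime
# memoizes results while evaluating in dependency order. Illustrative
# only; not the actual framework API.

def run_flow_graph(stages, edges, source):
    results = {}
    def evaluate(name):
        if name not in results:
            inputs = [evaluate(up) for up in edges.get(name, [])]
            # Stages with no upstream read from the external source.
            results[name] = stages[name](*inputs) if inputs else stages[name](source)
        return results[name]
    for name in stages:
        evaluate(name)
    return results

# A toy video-analytics pipeline: decode -> detect -> count.
stages = {
    "decode": lambda frame: frame.lower(),
    "detect": lambda img: [w for w in img.split() if w == "car"],
    "count": lambda objs: len(objs),
}
edges = {"detect": ["decode"], "count": ["detect"]}
out = run_flow_graph(stages, edges, "Car bike CAR")
```

Because applications are expressed as such graphs rather than device-specific code, the same pipeline can be mapped onto different hardware by a resource manager, which is the point of the abstraction.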
Dynamic Optimization Techniques for Resource-Efficient Execution of Distributed Machine Learning
Doctoral dissertation, Seoul National University, Department of Computer Science and Engineering, February 2020.
Machine Learning (ML) systems are widely used to extract insights from data. Ever increasing dataset sizes and model complexity gave rise to many efforts towards efficient distributed machine learning systems. One of the popular approaches to support large-scale data and complicated models is the parameter server (PS) approach. In this approach, a training job runs with distributed worker and server tasks, where workers iteratively compute gradients to update the global model parameters that are kept in servers.
To improve the PS system performance, this dissertation proposes two solutions that automatically optimize resource efficiency and system performance. First, we propose a solution that optimizes the resource configuration and workload partitioning of distributed ML training on a PS system. To find the best configuration, we build an Optimizer based on a cost model that works with online metrics. To efficiently apply the Optimizer's decisions, we design our runtime to be elastic, performing reconfiguration in the background with minimal overhead.
The second solution optimizes the scheduling of resources and tasks of multiple ML training jobs in a shared cluster. Specifically, we co-locate jobs with complementary resource use to increase resource utilization, while executing their tasks in fine-grained units to avoid resource contention. To alleviate memory pressure from co-located jobs, we enable dynamic spill/reload of data, which adaptively changes the ratio of data between disk and memory.
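The dynamic spill/reload mechanism can be sketched as an LRU store with a resizable memory budget: when co-located jobs shrink the budget, least-recently-used partitions spill to disk and are reloaded on access. The store below simulates disk with a dict; class and method names are illustrative:

```python
# Sketch of dynamic spill/reload under memory pressure. Partitions live
# in memory up to a budget; excess partitions spill to (simulated) disk
# in LRU order and are reloaded transparently on access. Illustrative
# only; a real system would serialize to local disk.
from collections import OrderedDict

class SpillStore:
    def __init__(self, budget):
        self.budget = budget          # max partitions kept in memory
        self.memory = OrderedDict()   # partition id -> data, LRU order
        self.disk = {}

    def put(self, pid, data):
        self.memory[pid] = data
        self.memory.move_to_end(pid)
        self._enforce()

    def get(self, pid):
        if pid not in self.memory:    # reload from disk on a miss
            self.memory[pid] = self.disk.pop(pid)
        self.memory.move_to_end(pid)
        self._enforce()
        return self.memory[pid]

    def resize(self, budget):
        # Called when co-located jobs change the disk/memory ratio.
        self.budget = budget
        self._enforce()

    def _enforce(self):
        while len(self.memory) > self.budget:
            pid, data = self.memory.popitem(last=False)  # evict LRU
            self.disk[pid] = data

store = SpillStore(budget=2)
for i in range(3):
    store.put(i, f"partition-{i}")    # third put spills partition 0
```

Adapting the ratio then reduces to calling `resize` as the scheduler re-divides memory among co-located jobs.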
We build a working system that implements our approaches. The above two solutions are implemented in the same system and share the runtime part that can dynamically migrate jobs between machines and reallocate machine resources. We evaluate our system with popular ML applications to verify the effectiveness of our solutions.
Chapter1. Introduction 1
1.1 Distributed Machine Learning on Parameter Servers 1
1.2 Automating System Configuration of Distributed Machine Learning 2
1.3 Scheduling of Multiple Distributed Machine Learning Jobs 3
1.4 Contributions 5
1.5 Dissertation Structure 6
Chapter2. Background 7
Chapter3. Automating System Configuration of Distributed Machine Learning 10
3.1 System Configuration Challenges 11
3.2 Finding Good System Configuration 13
3.2.1 Cost Model 13
3.2.2 Cost Formulation 15
3.2.3 Optimization 16
3.3 Cruise 18
3.3.1 Optimizer 19
3.3.2 Elastic Runtime 21
3.4 Evaluation 26
3.4.1 Experimental Setup 26
3.4.2 Finding Baselines with Grid Search 28
3.4.3 Optimization in the Homogeneous Environment 28
3.4.4 Utilizing Opportunistic Resources 30
3.4.5 Optimization in the Heterogeneous Environment 31
3.4.6 Reconfiguration Speed 32
3.5 Related Work 33
3.6 Summary 34
Chapter4. A Scheduling Framework Optimized for Multiple Distributed Machine Learning Jobs 36
4.1 Resource Under-utilization Problems in PS ML Training 37
4.2 Harmony Overview 42
4.3 Multiplexing ML Jobs 43
4.3.1 Fine-grained Execution with Subtasks 44
4.3.2 Dynamic Grouping of Jobs 45
4.3.3 Dynamic Data Reloading 54
4.4 Evaluation 56
4.4.1 Baselines 56
4.4.2 Experimental Setup 57
4.4.3 Performance Comparison 59
4.4.4 Performance Breakdown 59
4.4.5 Workload Sensitivity Analysis 61
4.4.6 Accuracy of the Performance Model 63
4.4.7 Performance and Scalability of the Scheduling Algorithm 64
4.4.8 Dynamic Data Reloading 66
4.5 Discussion 67
4.6 Related Work 67
4.7 Summary 70
Chapter5. Conclusion 71
5.1 Summary 71
5.2 Future Work 71
5.2.1 Other Communication Architecture Support 71
5.2.2 Deep Learning & GPU Resource Support 72
Abstract (in Korean) 81
Autonomy and Intelligence in the Computing Continuum: Challenges, Enablers, and Future Directions for Orchestration
Future AI applications require performance, reliability and privacy that the
existing, cloud-dependent system architectures cannot provide. In this article,
we study orchestration in the device-edge-cloud continuum, and focus on AI for
edge, that is, the AI methods used in resource orchestration. We claim that to
support the constantly growing requirements of intelligent applications in the
device-edge-cloud computing continuum, resource orchestration needs to embrace
edge AI and emphasize local autonomy and intelligence. To justify the claim, we
provide a general definition for continuum orchestration, and look at how
current and emerging orchestration paradigms are suitable for the computing
continuum. We describe certain major emerging research themes that may affect
future orchestration, and provide an early vision of an orchestration paradigm
that embraces those research themes. Finally, we survey current key edge AI
methods and look at how they may contribute to fulfilling the vision of
future continuum orchestration.
Rise of the Planet of Serverless Computing: A Systematic Review
Serverless computing is an emerging cloud computing paradigm, being adopted to develop a wide range of software applications.
It allows developers to focus on the application logic at the granularity of functions, thereby freeing developers from tedious and
error-prone infrastructure management. Meanwhile, its unique characteristics pose new challenges to the development and deployment
of serverless-based applications. To tackle these challenges, enormous research efforts have been devoted. This paper provides a
comprehensive literature review to characterize the current research state of serverless computing. Specifically, this paper covers 164
papers on 17 research directions of serverless computing, including performance optimization, programming framework, application
migration, multi-cloud development, testing and debugging, etc. It also derives research trends, focus, and commonly-used platforms
for serverless computing, as well as promising research opportunities.
Efficient Resource Management for Deep Learning Clusters
Deep Learning (DL) is gaining rapid popularity in various domains, such as computer vision, speech recognition, etc. With the increasing demands, large clusters have been built to develop DL models (i.e., data preparation and model training). DL jobs have some unique features ranging from their hardware requirements to execution patterns. However, the resource management techniques applied in existing DL clusters have not yet been adapted to those new features, which leads to resource inefficiency and hurts the performance of DL jobs.
We observed three major challenges brought by DL jobs. First, data preparation jobs, which prepare training datasets from a large volume of raw data, are memory intensive. DL clusters often over-allocate memory resource to those jobs for protecting their performance, which causes memory underutilization in DL clusters. Second, the execution time of a DL training job is often unknown before job completion. Without such information, existing cluster schedulers are unable to minimize the average Job Completion Time (JCT) of those jobs. Third, model aggregations in Distributed Deep Learning (DDL) training are often assigned a fixed group of CPUs. However, a large portion of those CPUs are wasted because the bursty model aggregations cannot saturate them all the time.
In this thesis, we propose a suite of techniques to eliminate the mismatches between DL jobs and resource management in DL clusters. First, we bring the idea of memory disaggregation to enhance the memory utilization of DL clusters. The unused memory in data preparation jobs is exposed as remote memory to other machines that are running out of local memory. Second, we design a two-dimensional attained-service-based scheduler to optimize the average JCT of DL training jobs. This scheduler takes the temporal and spatial characteristics of DL training jobs into consideration and can efficiently schedule them without knowing their execution time. Third, we define a shared model aggregation service to reduce the CPU cost of DDL training. Using this service, model aggregations from different DDL training jobs are carefully packed together and use the same group of CPUs in a time-sharing manner. With these techniques, we demonstrate that huge improvements in resource efficiency and job performance can be obtained when the cluster's resource management matches with the features of DL jobs.
PhD thesis, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/169955/1/jcgu_1.pd
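The two-dimensional attained-service idea can be made concrete: a job's attained service is the product of the GPUs it holds (space) and the time it has run (time), and the scheduler favors the job with the least attained service among those that fit, so short jobs tend to finish early without anyone knowing execution times up front. Field names below are illustrative:

```python
# Sketch of least-attained-service scheduling in two dimensions
# (GPUs held x seconds run). No execution-time knowledge is needed:
# a job that has consumed little service so far is assumed likely to
# be short and is scheduled first. Illustrative field names.

def attained_service(job):
    # Space (GPU count) times time (seconds run so far).
    return job["gpus"] * job["runtime"]

def pick_next(queue, free_gpus):
    # Least-attained-service first, among jobs that fit on free GPUs.
    fitting = [j for j in queue if j["gpus"] <= free_gpus]
    return min(fitting, key=attained_service, default=None)

queue = [
    {"name": "old_big", "gpus": 8, "runtime": 3600},
    {"name": "young_small", "gpus": 2, "runtime": 600},
    {"name": "fresh", "gpus": 4, "runtime": 0},
]
job = pick_next(queue, free_gpus=8)
```

A freshly submitted job has zero attained service and is therefore tried first, which is how the policy approximates shortest-job-first without execution-time estimates.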