1,011 research outputs found

    OmniForce: On Human-Centered, Large Model Empowered and Cloud-Edge Collaborative AutoML System

    Full text link
    Automated machine learning (AutoML) seeks to build ML models with minimal human effort. While considerable research has been conducted on AutoML in general, aiming to take humans out of the loop when building artificial intelligence (AI) applications, little work has examined how AutoML can work well in open-environment scenarios such as the process of training and updating large models, industrial supply chains, or the industrial metaverse, where people often face open-loop problems during the search process: they must continuously collect data, update data and models, satisfy the requirements of the development and deployment environment, support massive numbers of devices, modify evaluation metrics, and so on. Addressing the open-environment issue with purely data-driven approaches requires considerable data, computing resources, and effort from dedicated data engineers, making current AutoML systems and platforms inefficient and computationally intractable. Human-computer interaction is a practical and feasible way to tackle the problem of open-environment AI. In this paper, we introduce OmniForce, a human-centered AutoML (HAML) system that yields both human-assisted ML and ML-assisted human techniques, to put an AutoML system into practice and build adaptive AI in open-environment scenarios. Specifically, we present OmniForce in terms of ML version management; pipeline-driven development and deployment collaborations; a flexible search strategy framework; and widely provisioned and crowdsourced application algorithms, including large models. Furthermore, the (large) models constructed by OmniForce can be automatically turned into remote services in a few minutes; this process is dubbed model as a service (MaaS). Experimental results obtained in multiple search spaces and real-world use cases demonstrate the efficacy and efficiency of OmniForce.
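    The model-as-a-service step, exposing a trained model as a remote prediction endpoint, can be illustrated with a minimal sketch; the web framework, the /predict route, and the stand-in model below are assumptions made for illustration, not OmniForce's actual MaaS machinery.

```python
# Minimal sketch of "model as a service": wrap a trained model behind an
# HTTP endpoint. Flask, the /predict route, and the RandomForest stand-in
# are illustrative choices, not the OmniForce implementation.
from flask import Flask, request, jsonify
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

app = Flask(__name__)

# Stand-in for a model produced by an AutoML search.
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]   # e.g. [[5.1, 3.5, 1.4, 0.2]]
    return jsonify(predictions=model.predict(features).tolist())

if __name__ == "__main__":
    app.run(port=8080)   # POST JSON to http://localhost:8080/predict
```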

    Workload Prediction for Efficient Performance Isolation and System Reliability

    Get PDF
    In large-scale distributed systems, such as multi-tier storage systems and cloud data centers, resource sharing among workloads brings multiple benefits while introducing many performance challenges. The key to effective workload multiplexing is accurate workload prediction. This thesis focuses on how to capture the salient characteristics of real-world workloads to develop workload prediction methods and to drive scheduling and resource allocation policies, in order to achieve efficient and timely resource isolation among applications. In a multi-tier storage system, high-priority user work is often multiplexed with low-priority background work, raising the challenge of striking a balance between maintaining user performance and maximizing the amount of finished background work. In this thesis, we propose two resource isolation policies based on different workload prediction methods: one Markovian-model-based and the other neural-network-based. These policies aim, via workload prediction, to discover opportune times to schedule background work with minimal impact on user performance. Trace-driven simulations verify the efficiency of the two proposed resource isolation policies. The Markovian-model-based policy successfully schedules background work during appropriate periods with small impact on user performance. The neural-network-based policy adaptively schedules user and background work, consistently meeting both performance requirements. This thesis also proposes an accurate yet efficient neural-network-based prediction method for data center usage series, called PRACTISE. Unlike traditional neural networks for time-series prediction, PRACTISE selects the most informative features from past observations of the time series itself. Testing on a large set of usage series from production data centers demonstrates the accuracy (e.g., prediction error) and efficiency (e.g., time cost) of PRACTISE. Accurate usage prediction also enables proactive resource management in highly virtualized cloud data centers. In this thesis, we analyze performance tickets in cloud data centers and propose an active sizing algorithm, named ATM, that predicts usage workloads and re-allocates capacity to workloads to avoid VM performance tickets. Moreover, driven by cheap prediction of usage tails, we also present TailGuard, which dynamically clones VMs among co-located boxes in order to efficiently reduce performance violations of physical boxes in cloud data centers.
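    As a rough illustration of the kind of neural-network-based usage prediction described above, the sketch below builds lag features from a synthetic usage series and fits a small regressor; the chosen lags, network size, and data are placeholders, not the PRACTISE method itself.

```python
# Conceptual sketch: predict a data-center usage series from a few selected
# past observations (lag features). The lags, model size, and synthetic
# series are illustrative assumptions, not the PRACTISE implementation.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Synthetic CPU-usage series with a daily period of 288 five-minute samples.
usage = 50 + 20 * np.sin(np.arange(2000) * 2 * np.pi / 288) + rng.normal(0, 3, 2000)

lags = [1, 2, 3, 288]            # recent samples plus the same slot one day earlier
X = np.column_stack([usage[max(lags) - l : -l] for l in lags])
y = usage[max(lags):]

split = int(0.8 * len(y))        # train on the first 80%, test on the rest
model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
model.fit(X[:split], y[:split])

pred = model.predict(X[split:])
print("mean absolute error:", np.mean(np.abs(pred - y[split:])))
```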

    Progressive Analytics: A Computation Paradigm for Exploratory Data Analysis

    Get PDF
    Exploring data requires a fast feedback loop from the analyst to the system, with a latency below about 10 seconds because of human cognitive limitations. When data becomes large or analysis becomes complex, sequential computations can no longer be completed in a few seconds and data exploration is severely hampered. This article describes a novel computation paradigm called Progressive Computation for Data Analysis, or more concisely Progressive Analytics, which provides a low-latency guarantee at the programming-language level by performing computations in a progressive fashion. Moving progressive computation to the language level relieves programmers of exploratory data analysis systems from implementing the whole analytics pipeline in a progressive way from scratch, streamlining the implementation of scalable exploratory data analysis systems. This article describes the new paradigm through a prototype implementation called ProgressiVis, and explains the requirements it implies through examples.
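    The core idea, returning a usable partial result within a latency budget and refining it as more data is processed, can be sketched with a toy progressive aggregate; the chunking scheme and function names below are illustrative and are not the ProgressiVis API.

```python
# Toy illustration of progressive computation: a mean that is refined chunk
# by chunk, so a partial answer is always available within a small latency
# budget. Conceptual sketch only; not the ProgressiVis API.
import numpy as np

def progressive_mean(values, chunk_size=100_000):
    """Yield an improving estimate of the mean after each processed chunk."""
    total, count = 0.0, 0
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        total += chunk.sum()
        count += len(chunk)
        yield total / count          # partial result the UI can render now

data = np.random.default_rng(1).normal(loc=10.0, scale=2.0, size=5_000_000)
for i, estimate in enumerate(progressive_mean(data), start=1):
    print(f"after chunk {i}: mean ≈ {estimate:.4f}")
```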

    Programming and parallelising applications for distributed infrastructures

    Get PDF
    The last decade has witnessed unprecedented changes in parallel and distributed infrastructures. Due to the diminished gains in processor performance from increasing clock frequency, manufacturers have moved from uniprocessor architectures to multicores; as a result, clusters of computers have incorporated such new CPU designs. Furthermore, the ever-growing need of scientific applications for computing and storage capabilities has motivated the appearance of grids: geographically-distributed, multi-domain infrastructures based on sharing of resources to accomplish large and complex tasks. More recently, clouds have emerged by combining virtualisation technologies, service-orientation and business models to deliver IT resources on demand over the Internet. The size and complexity of these new infrastructures pose a challenge for programmers to exploit them. On the one hand, some of the difficulties are inherent to concurrent and distributed programming themselves, e.g. dealing with thread creation and synchronisation, messaging, data partitioning and transfer, etc. On the other hand, other issues are related to the singularities of each scenario, like the heterogeneity of Grid middleware and resources or the risk of vendor lock-in when writing an application for a particular Cloud provider. In the face of such a challenge, programming productivity - understood as a tradeoff between programmability and performance - has become crucial for software developers. There is a strong need for high-productivity programming models and languages, which should provide simple means for writing parallel and distributed applications that can run on current infrastructures without sacrificing performance. In that sense, this thesis contributes Java StarSs, a programming model and runtime system for developing and parallelising Java applications on distributed infrastructures. The model has two key features: first, the user programs in a fully-sequential, standard-Java fashion - no parallel construct, API call or pragma must be included in the application code; second, it is completely infrastructure-unaware, i.e. programs do not contain any details about deployment or resource management, so that the same application can run on different infrastructures with no changes. The only requirement for the user is to select the application tasks, which are the model's unit of parallelism. Tasks can be either regular Java methods or web service operations, and they can handle any data type supported by the Java language, namely files, objects, arrays and primitives. For the sake of the simplicity of the model, Java StarSs shifts the burden of parallelisation from the programmer to the runtime system. The runtime is responsible for modifying the original application to make it create asynchronous tasks and synchronise data accesses from the main program. Moreover, the implicit inter-task concurrency is automatically discovered as the application executes, thanks to a data dependency detection mechanism that supports all the Java data types. This thesis provides a fairly comprehensive evaluation of Java StarSs in three different distributed scenarios: Grid, Cluster and Cloud. For each of them, a runtime system was designed and implemented to exploit its particular characteristics as well as to address its issues, while keeping the infrastructure unawareness of the programming model. The evaluation compares Java StarSs against state-of-the-art solutions, both in terms of programmability and performance, and demonstrates how the model can bring remarkable productivity to programmers of parallel distributed applications.
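    A rough conceptual analogue of this task-based model, written in Python purely for illustration, is sketched below: plain sequential calls are intercepted by a tiny "runtime" that records which data each task reads and writes and derives the implicit dependency graph. The decorator, the read/write annotations, and the task names are invented for this sketch; Java StarSs itself operates on unmodified sequential Java code and is not shown here.

```python
# Conceptual analogue (illustration only, not the Java StarSs API): the code
# is written as plain sequential calls, while a tiny "runtime" records which
# data each task reads/writes and derives the implicit dependency graph.
from collections import defaultdict

last_writer = {}                  # data name -> id of the task that last wrote it
dependencies = defaultdict(set)   # task id -> ids of tasks it must wait for
task_counter = 0

def task(reads=(), writes=()):
    """Mark a function as a task and track its data dependencies."""
    def decorate(fn):
        def wrapper(*args, **kwargs):
            global task_counter
            task_counter += 1
            tid = f"{fn.__name__}#{task_counter}"
            for name in reads:                    # read-after-write dependency
                if name in last_writer:
                    dependencies[tid].add(last_writer[name])
            for name in writes:
                last_writer[name] = tid
            return fn(*args, **kwargs)            # run eagerly; a real runtime would defer
        return wrapper
    return decorate

@task(reads=("a", "b"), writes=("c",))
def add(a, b):
    return a + b

@task(reads=("c",), writes=("d",))
def scale(c):
    return 2 * c

add(1, 2); add(3, 4); scale(10)
print(dict(dependencies))   # scale#3 waits for add#2, the last writer of "c"
```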

    Hyper-Parameter Optimization for Deep Learning Using a Stage Tree

    Get PDF
    Thesis (Master's) -- Graduate School of Seoul National University: College of Engineering, Department of Computer Science and Engineering, August 2021. Advisor: Byung-Gon Chun.

    Hyper-parameter optimization is an indispensable step for pushing the performance of a deep learning model to its limits. A study, i.e., a hyper-parameter optimization job, consists of a very large number of deep learning training jobs with different hyper-parameter values, each of which is called a trial. Because so many training runs are required, a study is computation-heavy and can take anywhere from a few hours to several weeks. This work shows that the hyper-parameter sequences of the trials derived from a single hyper-parameter optimization job share common prefixes. Based on this observation, we propose a new system named Hippo. Hippo finds the common prefixes among trials and reuses their computation results, greatly reducing the total amount of computation. Whereas existing hyper-parameter optimization systems train every trial from scratch, Hippo splits a given hyper-parameter sequence into small units called stages and merges identical stages into a stage tree. Hippo records the full state of the current hyper-parameter optimization study in an internal data structure called the search plan, and minimizes the overall completion time with a critical-path-based scheduler. Hippo can execute not only a single study at a time but also multiple studies concurrently. Across several models and hyper-parameter optimization algorithms, Hippo reduces the overall completion time by up to 2.76× and GPU hours by up to 4.81× compared with existing hyper-parameter optimization systems; when multiple studies are executed concurrently, completion time can be reduced by up to 4.81× and GPU hours by up to 6.77×.

    Hyper-parameter optimization is crucial for pushing the accuracy of a deep learning model to its limits. A hyper-parameter optimization job, referred to as a study, involves numerous trials of training a model using different training knobs, and therefore is very computation-heavy, typically taking hours and days to finish. We observe that trials issued from hyper-parameter optimization algorithms often share common hyper-parameter sequence prefixes. Based on this observation, we propose TreeML, a hyper-parameter optimization system that reuses computation across trials to reduce the overall amount of computation significantly. Instead of treating each trial independently as in existing hyper-parameter optimization systems, TreeML breaks down the hyper-parameter sequences into stages and merges common stages to form a tree of stages (a stage tree). TreeML maintains an internal data structure, the search plan, to manage the current status and history of a study, and employs a critical-path-based scheduler to minimize the overall study completion time. TreeML is applicable not only to single studies, but to multi-study scenarios as well. Evaluations show that TreeML's stage-based execution strategy outperforms trial-based methods for several models and hyper-parameter optimization algorithms, reducing end-to-end training time by up to 2.76× (3.53×) and GPU-hours by up to 4.81× (6.77×), for single (multiple) studies.

    Table of contents: Abstract; 1 Introduction; 2 Background and Motivation; 2.1 Hyper-Parameter Optimization; 2.2 Challenges of Sharing Computations in Hyper-Parameter Optimization Jobs; 3 Stage Tree; 4 TreeML System Design; 4.1 Overview; 4.2 Search Plan; 4.2.1 Search Plan Data Structure; 4.2.2 Search Plan Database; 4.3 Scheduler; 5 Implementation; 5.1 Data Pipeline; 5.2 Cost Estimator; 5.3 Client Library; 6 Evaluation; 6.1 Single Study; 6.2 Multiple Studies; 6.3 Scheduler Comparison; 7 Related Work; 8 Conclusion; A Appendix; A.1 Search Space; Acknowledgements; Abstract (in Korean).
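    The stage-tree idea, merging trials whose hyper-parameter sequences share a prefix so each shared stage is trained only once, can be sketched as a small trie over stage keys; the stage encoding and trial values below are made up for illustration and are not TreeML's actual data structures.

```python
# Sketch of the stage-tree idea: trials whose hyper-parameter sequences share
# a prefix are merged so each shared stage is trained once. The stage keys
# and trials below are illustrative, not TreeML/Hippo's actual structures.
class StageNode:
    def __init__(self, stage):
        self.stage = stage        # a hashable stage key, e.g. ("lr", 0.1)
        self.children = {}

root = StageNode(stage=None)

def insert_trial(stages):
    """Merge one trial's stage sequence into the tree, reusing shared prefixes."""
    node = root
    for stage in stages:
        node = node.children.setdefault(stage, StageNode(stage))

trials = [
    [("lr", 0.1), ("momentum", 0.9), ("decay", 1e-4)],
    [("lr", 0.1), ("momentum", 0.9), ("decay", 1e-3)],   # shares the first two stages
    [("lr", 0.1), ("momentum", 0.5), ("decay", 1e-4)],   # shares only the first stage
]
for t in trials:
    insert_trial(t)

def count_nodes(node):
    """Number of distinct stages that actually need to be trained."""
    return (node is not root) + sum(count_nodes(c) for c in node.children.values())

print("stages if every trial trains independently:", sum(len(t) for t in trials))  # 9
print("stages with the stage tree:", count_nodes(root))                            # 6
```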