1,011 research outputs found

    OmniForce: On Human-Centered, Large Model Empowered and Cloud-Edge Collaborative AutoML System

    Full text link
    Automated machine learning (AutoML) seeks to build ML models with minimal human effort. While considerable research has been conducted on AutoML in general, aiming to take humans out of the loop when building artificial intelligence (AI) applications, little work has examined how AutoML can work well in open-environment scenarios such as the process of training and updating large models, industrial supply chains, or the industrial metaverse, where people often face open-loop problems during the search process: they must continuously collect data, update data and models, satisfy the requirements of the development and deployment environment, support massive numbers of devices, modify evaluation metrics, and so on. Addressing the open-environment issue with purely data-driven approaches requires considerable data, computing resources, and effort from dedicated data engineers, making current AutoML systems and platforms inefficient and computationally intractable. Human-computer interaction is a practical and feasible way to tackle the problem of open-environment AI. In this paper, we introduce OmniForce, a human-centered AutoML (HAML) system that yields both human-assisted ML and ML-assisted human techniques, to put an AutoML system into practice and build adaptive AI in open-environment scenarios. Specifically, we present OmniForce in terms of ML version management; pipeline-driven development and deployment collaborations; a flexible search strategy framework; and widely provisioned and crowdsourced application algorithms, including large models. Furthermore, the (large) models constructed by OmniForce can be automatically turned into remote services in a few minutes; this process is dubbed model as a service (MaaS). Experimental results obtained in multiple search spaces and real-world use cases demonstrate the efficacy and efficiency of OmniForce.
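    The model-as-a-service step, exposing a trained model as a remote prediction endpoint, can be illustrated with a minimal sketch; the web framework, the /predict route, and the stand-in model below are assumptions made for illustration, not OmniForce's actual MaaS machinery.

```python
# Minimal sketch of "model as a service": wrap a trained model behind an
# HTTP endpoint. Flask, the /predict route, and the RandomForest stand-in
# are illustrative choices, not the OmniForce implementation.
from flask import Flask, request, jsonify
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

app = Flask(__name__)

# Stand-in for a model produced by an AutoML search.
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]   # e.g. [[5.1, 3.5, 1.4, 0.2]]
    return jsonify(predictions=model.predict(features).tolist())

if __name__ == "__main__":
    app.run(port=8080)   # POST JSON to http://localhost:8080/predict
```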

    Workload Prediction for Efficient Performance Isolation and System Reliability

    Get PDF
    In large-scale distributed systems, such as multi-tier storage systems and cloud data centers, resource sharing among workloads brings multiple benefits while introducing many performance challenges. The key to effective workload multiplexing is accurate workload prediction. This thesis focuses on how to capture the salient characteristics of real-world workloads to develop workload prediction methods and to drive scheduling and resource allocation policies, in order to achieve efficient and timely resource isolation among applications. In a multi-tier storage system, high-priority user work is often multiplexed with low-priority background work, raising the challenge of striking a balance between maintaining user performance and maximizing the amount of finished background work. In this thesis, we propose two resource isolation policies based on different workload prediction methods: one Markovian-model-based and the other neural-network-based. These policies aim, via workload prediction, to discover opportune times to schedule background work with minimal impact on user performance. Trace-driven simulations verify the efficiency of the two proposed resource isolation policies. The Markovian-model-based policy successfully schedules background work during appropriate periods with small impact on user performance. The neural-network-based policy adaptively schedules user and background work, consistently meeting both performance requirements. This thesis also proposes an accurate yet efficient neural-network-based prediction method for data center usage series, called PRACTISE. Unlike traditional neural networks for time-series prediction, PRACTISE selects the most informative features from past observations of the time series itself. Testing on a large set of usage series from production data centers demonstrates the accuracy (e.g., prediction error) and efficiency (e.g., time cost) of PRACTISE. Accurate usage prediction also enables proactive resource management in highly virtualized cloud data centers. In this thesis, we analyze performance tickets in cloud data centers and propose an active sizing algorithm, named ATM, that predicts usage workloads and re-allocates capacity to workloads to avoid VM performance tickets. Moreover, driven by cheap prediction of usage tails, we also present TailGuard, which dynamically clones VMs among co-located boxes in order to efficiently reduce performance violations of physical boxes in cloud data centers.
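    As a rough illustration of the kind of neural-network-based usage prediction described above, the sketch below builds lag features from a synthetic usage series and fits a small regressor; the chosen lags, network size, and data are placeholders, not the PRACTISE method itself.

```python
# Conceptual sketch: predict a data-center usage series from a few selected
# past observations (lag features). The lags, model size, and synthetic
# series are illustrative assumptions, not the PRACTISE implementation.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Synthetic CPU-usage series with a daily period of 288 five-minute samples.
usage = 50 + 20 * np.sin(np.arange(2000) * 2 * np.pi / 288) + rng.normal(0, 3, 2000)

lags = [1, 2, 3, 288]            # recent samples plus the same slot one day earlier
X = np.column_stack([usage[max(lags) - l : -l] for l in lags])
y = usage[max(lags):]

split = int(0.8 * len(y))        # train on the first 80%, test on the rest
model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
model.fit(X[:split], y[:split])

pred = model.predict(X[split:])
print("mean absolute error:", np.mean(np.abs(pred - y[split:])))
```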

    Progressive Analytics: A Computation Paradigm for Exploratory Data Analysis

    Get PDF
    Exploring data requires a fast feedback loop from the analyst to the system, with a latency below about 10 seconds because of human cognitive limitations. When data becomes large or analysis becomes complex, sequential computations can no longer be completed in a few seconds and data exploration is severely hampered. This article describes a novel computation paradigm called Progressive Computation for Data Analysis, or more concisely Progressive Analytics, which provides a low-latency guarantee at the programming-language level by performing computations in a progressive fashion. Moving progressive computation to the language level relieves programmers of exploratory data analysis systems from implementing the whole analytics pipeline in a progressive way from scratch, streamlining the implementation of scalable exploratory data analysis systems. This article describes the new paradigm through a prototype implementation called ProgressiVis, and explains the requirements it implies through examples.
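    The core idea, returning a usable partial result within a latency budget and refining it as more data is processed, can be sketched with a toy progressive aggregate; the chunking scheme and function names below are illustrative and are not the ProgressiVis API.

```python
# Toy illustration of progressive computation: a mean that is refined chunk
# by chunk, so a partial answer is always available within a small latency
# budget. Conceptual sketch only; not the ProgressiVis API.
import numpy as np

def progressive_mean(values, chunk_size=100_000):
    """Yield an improving estimate of the mean after each processed chunk."""
    total, count = 0.0, 0
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        total += chunk.sum()
        count += len(chunk)
        yield total / count          # partial result the UI can render now

data = np.random.default_rng(1).normal(loc=10.0, scale=2.0, size=5_000_000)
for i, estimate in enumerate(progressive_mean(data), start=1):
    print(f"after chunk {i}: mean ≈ {estimate:.4f}")
```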

    Programming and parallelising applications for distributed infrastructures

    Get PDF
    The last decade has witnessed unprecedented changes in parallel and distributed infrastructures. Due to the diminished gains in processor performance from increasing clock frequency, manufacturers have moved from uniprocessor architectures to multicores; as a result, clusters of computers have incorporated such new CPU designs. Furthermore, the ever-growing need of scientific applications for computing and storage capabilities has motivated the appearance of grids: geographically-distributed, multi-domain infrastructures based on sharing of resources to accomplish large and complex tasks. More recently, clouds have emerged by combining virtualisation technologies, service-orientation and business models to deliver IT resources on demand over the Internet. The size and complexity of these new infrastructures pose a challenge for programmers to exploit them. On the one hand, some of the difficulties are inherent to concurrent and distributed programming themselves, e.g. dealing with thread creation and synchronisation, messaging, data partitioning and transfer, etc. On the other hand, other issues are related to the singularities of each scenario, like the heterogeneity of Grid middleware and resources or the risk of vendor lock-in when writing an application for a particular Cloud provider. In the face of such a challenge, programming productivity - understood as a tradeoff between programmability and performance - has become crucial for software developers. There is a strong need for high-productivity programming models and languages, which should provide simple means for writing parallel and distributed applications that can run on current infrastructures without sacrificing performance. In that sense, this thesis contributes Java StarSs, a programming model and runtime system for developing and parallelising Java applications on distributed infrastructures. The model has two key features: first, the user programs in a fully-sequential, standard-Java fashion - no parallel construct, API call or pragma must be included in the application code; second, it is completely infrastructure-unaware, i.e. programs do not contain any details about deployment or resource management, so that the same application can run on different infrastructures with no changes. The only requirement for the user is to select the application tasks, which are the model's unit of parallelism. Tasks can be either regular Java methods or web service operations, and they can handle any data type supported by the Java language, namely files, objects, arrays and primitives. For the sake of the simplicity of the model, Java StarSs shifts the burden of parallelisation from the programmer to the runtime system. The runtime is responsible for modifying the original application to make it create asynchronous tasks and synchronise data accesses from the main program. Moreover, the implicit inter-task concurrency is automatically discovered as the application executes, thanks to a data dependency detection mechanism that supports all the Java data types. This thesis provides a fairly comprehensive evaluation of Java StarSs in three different distributed scenarios: Grid, Cluster and Cloud. For each of them, a runtime system was designed and implemented to exploit its particular characteristics as well as to address its issues, while keeping the infrastructure unawareness of the programming model. The evaluation compares Java StarSs against state-of-the-art solutions, both in terms of programmability and performance, and demonstrates how the model can bring remarkable productivity to programmers of parallel distributed applications.
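    A rough conceptual analogue of this task-based model, written in Python purely for illustration, is sketched below: plain sequential calls are intercepted by a tiny "runtime" that records which data each task reads and writes and derives the implicit dependency graph. The decorator, the read/write annotations, and the task names are invented for this sketch; Java StarSs itself operates on unmodified sequential Java code and is not shown here.

```python
# Conceptual analogue (illustration only, not the Java StarSs API): the code
# is written as plain sequential calls, while a tiny "runtime" records which
# data each task reads/writes and derives the implicit dependency graph.
from collections import defaultdict

last_writer = {}                  # data name -> id of the task that last wrote it
dependencies = defaultdict(set)   # task id -> ids of tasks it must wait for
task_counter = 0

def task(reads=(), writes=()):
    """Mark a function as a task and track its data dependencies."""
    def decorate(fn):
        def wrapper(*args, **kwargs):
            global task_counter
            task_counter += 1
            tid = f"{fn.__name__}#{task_counter}"
            for name in reads:                    # read-after-write dependency
                if name in last_writer:
                    dependencies[tid].add(last_writer[name])
            for name in writes:
                last_writer[name] = tid
            return fn(*args, **kwargs)            # run eagerly; a real runtime would defer
        return wrapper
    return decorate

@task(reads=("a", "b"), writes=("c",))
def add(a, b):
    return a + b

@task(reads=("c",), writes=("d",))
def scale(c):
    return 2 * c

add(1, 2); add(3, 4); scale(10)
print(dict(dependencies))   # scale#3 waits for add#2, the last writer of "c"
```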

    Hyper-Parameter Optimization for Deep Learning Using a Stage Tree

    Get PDF
    Thesis (Master's) -- Graduate School of Seoul National University: College of Engineering, Department of Computer Science and Engineering, August 2021. Advisor: Byung-Gon Chun.

    Hyper-parameter optimization is an indispensable step for pushing the performance of a deep learning model to its limits. A study, i.e., a hyper-parameter optimization job, consists of a very large number of deep learning training jobs with different hyper-parameter values, each of which is called a trial. Because so many training runs are required, a study is computation-heavy and can take anywhere from a few hours to several weeks. This work shows that the hyper-parameter sequences of the trials derived from a single hyper-parameter optimization job share common prefixes. Based on this observation, we propose a new system named Hippo. Hippo finds the common prefixes among trials and reuses their computation results, greatly reducing the total amount of computation. Whereas existing hyper-parameter optimization systems train every trial from scratch, Hippo splits a given hyper-parameter sequence into small units called stages and merges identical stages into a stage tree. Hippo records the full state of the current hyper-parameter optimization study in an internal data structure called the search plan, and minimizes the overall completion time with a critical-path-based scheduler. Hippo can execute not only a single study at a time but also multiple studies concurrently. Across several models and hyper-parameter optimization algorithms, Hippo reduces the overall completion time by up to 2.76× and GPU hours by up to 4.81× compared with existing hyper-parameter optimization systems; when multiple studies are executed concurrently, completion time can be reduced by up to 4.81× and GPU hours by up to 6.77×.

    Hyper-parameter optimization is crucial for pushing the accuracy of a deep learning model to its limits. A hyper-parameter optimization job, referred to as a study, involves numerous trials of training a model using different training knobs, and therefore is very computation-heavy, typically taking hours and days to finish. We observe that trials issued from hyper-parameter optimization algorithms often share common hyper-parameter sequence prefixes. Based on this observation, we propose TreeML, a hyper-parameter optimization system that reuses computation across trials to reduce the overall amount of computation significantly. Instead of treating each trial independently as in existing hyper-parameter optimization systems, TreeML breaks down the hyper-parameter sequences into stages and merges common stages to form a tree of stages (a stage tree). TreeML maintains an internal data structure, the search plan, to manage the current status and history of a study, and employs a critical-path-based scheduler to minimize the overall study completion time. TreeML is applicable not only to single studies, but to multi-study scenarios as well. Evaluations show that TreeML's stage-based execution strategy outperforms trial-based methods for several models and hyper-parameter optimization algorithms, reducing end-to-end training time by up to 2.76× (3.53×) and GPU-hours by up to 4.81× (6.77×), for single (multiple) studies.

    Table of contents: Abstract; 1 Introduction; 2 Background and Motivation; 2.1 Hyper-Parameter Optimization; 2.2 Challenges of Sharing Computations in Hyper-Parameter Optimization Jobs; 3 Stage Tree; 4 TreeML System Design; 4.1 Overview; 4.2 Search Plan; 4.2.1 Search Plan Data Structure; 4.2.2 Search Plan Database; 4.3 Scheduler; 5 Implementation; 5.1 Data Pipeline; 5.2 Cost Estimator; 5.3 Client Library; 6 Evaluation; 6.1 Single Study; 6.2 Multiple Studies; 6.3 Scheduler Comparison; 7 Related Work; 8 Conclusion; A Appendix; A.1 Search Space; Acknowledgements; Abstract (in Korean).
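    The stage-tree idea, merging trials whose hyper-parameter sequences share a prefix so each shared stage is trained only once, can be sketched as a small trie over stage keys; the stage encoding and trial values below are made up for illustration and are not TreeML's actual data structures.

```python
# Sketch of the stage-tree idea: trials whose hyper-parameter sequences share
# a prefix are merged so each shared stage is trained once. The stage keys
# and trials below are illustrative, not TreeML/Hippo's actual structures.
class StageNode:
    def __init__(self, stage):
        self.stage = stage        # a hashable stage key, e.g. ("lr", 0.1)
        self.children = {}

root = StageNode(stage=None)

def insert_trial(stages):
    """Merge one trial's stage sequence into the tree, reusing shared prefixes."""
    node = root
    for stage in stages:
        node = node.children.setdefault(stage, StageNode(stage))

trials = [
    [("lr", 0.1), ("momentum", 0.9), ("decay", 1e-4)],
    [("lr", 0.1), ("momentum", 0.9), ("decay", 1e-3)],   # shares the first two stages
    [("lr", 0.1), ("momentum", 0.5), ("decay", 1e-4)],   # shares only the first stage
]
for t in trials:
    insert_trial(t)

def count_nodes(node):
    """Number of distinct stages that actually need to be trained."""
    return (node is not root) + sum(count_nodes(c) for c in node.children.values())

print("stages if every trial trains independently:", sum(len(t) for t in trials))  # 9
print("stages with the stage tree:", count_nodes(root))                            # 6
```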