1,923 research outputs found

    Intelligent Resource Scheduling at Scale: a Machine Learning Perspective

    Get PDF
    Resource scheduling in a computing system addresses the problem of packing tasks with multi-dimensional resource requirements and non-functional constraints. The exhibited heterogeneity of workload and server characteristics in Cloud-scale or Internet-scale systems is adding further complexity and new challenges to the problem. Compared with,,,, existing solutions based on ad-hoc heuristics, Machine Learning (ML) has the potential to improve further the efficiency of resource management in large-scale systems. In this paper we,,,, will describe and discuss how ML could be used to understand automatically both workloads and environments, and to help to cope with scheduling-related challenges such as consolidating co-located workloads, handling resource requests, guaranteeing application's QoSs, and mitigating tailed stragglers. We will introduce a generalized ML-based solution to large-scale resource scheduling and demonstrate its effectiveness through a case study that deals with performance-centric node classification and straggler mitigation. We believe that an MLbased method will help to achieve architectural optimization and efficiency improvement

    Market-Based Resourse Management for Many-Core Systems

    Get PDF
    101 σ.Αντικείμενο της διπλωματικής αποτελεί η μελέτη και η ανάπτυξη μιας κλιμακώσιμης και κατανεμημένης πλατφόρμας (framework) διαχείρισης πόρων σε χρόνο εκτέλεσης για συστήματα πολλαπλών πυρήνων. Σε αυτήν την πλατφόρμα η διαχείριση πόρων είναι βασισμένη σε μοντέλα, τα οποία είναι εμπνευσμένα από την οικονομία. Παρουσιάζεται ένας διαχειριστής πόρων, ο οποίος προσφέρει ένα περιβάλλον διαχείρισης πόρων και εφαρμογών καθ ́ όλη τη διάρκεια ζωής τους, στο οποίο η κατανομή και δρομολόγηση των εφαρμογών στους πόρους πραγματοποιείται με αλγόριθμους βασισμένους σε κανόνες αγοράς. Η αποδοτικότητα κάθε μοντέλου αξιολογείται βάσει της πτώσης της αξιοπιστίας των πόρων (μετρική MTTF-Mean Time To Failure).The purpose of this diploma thesis is the design and development of a scalable and distributed run-time resource management framework for Many-core systems. In this framework, resource management is based on economy-inspired models. The presented resource management framework offers an environment that manages both application tasks and resources at run-time, matches and distributes application tasks across resources with algorithms which are based on market principles. The efficiency of each model is evaluated with respect to resource reliability degradation (metric MTTF-Mean Time to Failure).Θεμιστοκλής Γ. Μελισσάρη

    Proceedings Work-In-Progress Session of the 13th Real-Time and Embedded Technology and Applications Symposium

    Get PDF
    The Work-In-Progress session of the 13th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS\u2707) presents papers describing contributions both to state of the art and state of the practice in the broad field of real-time and embedded systems. The 17 accepted papers were selected from 19 submissions. This proceedings is also available as Washington University in St. Louis Technical Report WUCSE-2007-17, at http://www.cse.seas.wustl.edu/Research/FileDownload.asp?733. Special thanks go to the General Chairs – Steve Goddard and Steve Liu and Program Chairs - Scott Brandt and Frank Mueller for their support and guidance

    Resource Management and Scheduling for Big Data Applications in Cloud Computing Environments

    Get PDF
    This chapter presents software architectures of the big data processing platforms. It will provide an in-depth knowledge on resource management techniques involved while deploying big data processing systems on cloud environment. It starts from the very basics and gradually introduce the core components of resource management which we have divided in multiple layers. It covers the state-of-art practices and researches done in SLA-based resource management with a specific focus on the job scheduling mechanisms.Comment: 27 pages, 9 figure

    Enforcing trustworthy cloud SLA with witnesses: A game theory–based model using smart contracts

    Get PDF
    There lacks trust between the cloud customer and provider to enforce traditional cloud SLA (Service Level Agreement) where the blockchain technique seems a promising solution. However, current explorations still face challenges to prove that the off-chain SLO (Service Level Objective) violations really happen before recorded into the on-chain transactions. In this paper, a witness model is proposed implemented with smart contracts to solve this trust issue. The introduced role, “Witness”, gains rewards as an incentive for performing the SLO violation report, and the payoff function is carefully designed in a way that the witness has to tell the truth, for maximizing the rewards. This fact that the witness has to be honest is analyzed and proved using the Nash Equilibrium principle of game theory. For ensuring the chosen witnesses are random and independent, an unbiased selection algorithm is proposed to avoid possible collusions. An auditing mechanism is also introduced to detect potential malicious witnesses. Specifically, we define three types of malicious behaviors and propose quantitative indicators to audit and detect these behaviors. Moreover, experimental studies based on Ethereum blockchain demonstrate the proposed model is feasible, and indicate that the performance, ie, transaction fee, of each interface follows the design expectations

    Private Law: Mineral Rights

    Get PDF

    Applications of agent architectures to decision support in distributed simulation and training systems

    Get PDF
    This work develops the approach and presents the results of a new model for applying intelligent agents to complex distributed interactive simulation for command and control. In the framework of tactical command, control communications, computers and intelligence (C4I), software agents provide a novel approach for efficient decision support and distributed interactive mission training. An agent-based architecture for decision support is designed, implemented and is applied in a distributed interactive simulation to significantly enhance the command and control training during simulated exercises. The architecture is based on monitoring, evaluation, and advice agents, which cooperate to provide alternatives to the dec ision-maker in a time and resource constrained environment. The architecture is implemented and tested within the context of an AWACS Weapons Director trainer tool. The foundation of the work required a wide range of preliminary research topics to be covered, including real-time systems, resource allocation, agent-based computing, decision support systems, and distributed interactive simulations. The major contribution of our work is the construction of a multi-agent architecture and its application to an operational decision support system for command and control interactive simulation. The architectural design for the multi-agent system was drafted in the first stage of the work. In the next stage rules of engagement, objective and cost functions were determined in the AWACS (Airforce command and control) decision support domain. Finally, the multi-agent architecture was implemented and evaluated inside a distributed interactive simulation test-bed for AWACS Vv\u27Ds. The evaluation process combined individual and team use of the decision support system to improve the performance results of WD trainees. The decision support system is designed and implemented a distributed architecture for performance-oriented management of software agents. The approach provides new agent interaction protocols and utilizes agent performance monitoring and remote synchronization mechanisms. This multi-agent architecture enables direct and indirect agent communication as well as dynamic hierarchical agent coordination. Inter-agent communications use predefined interfaces, protocols, and open channels with specified ontology and semantics. Services can be requested and responses with results received over such communication modes. Both traditional (functional) parameters and nonfunctional (e.g. QoS, deadline, etc.) requirements and captured in service requests

    Multi-Agent Systems

    Get PDF
    This Special Issue ""Multi-Agent Systems"" gathers original research articles reporting results on the steadily growing area of agent-oriented computing and multi-agent systems technologies. After more than 20 years of academic research on multi-agent systems (MASs), in fact, agent-oriented models and technologies have been promoted as the most suitable candidates for the design and development of distributed and intelligent applications in complex and dynamic environments. With respect to both their quality and range, the papers in this Special Issue already represent a meaningful sample of the most recent advancements in the field of agent-oriented models and technologies. In particular, the 17 contributions cover agent-based modeling and simulation, situated multi-agent systems, socio-technical multi-agent systems, and semantic technologies applied to multi-agent systems. In fact, it is surprising to witness how such a limited portion of MAS research already highlights the most relevant usage of agent-based models and technologies, as well as their most appreciated characteristics. We are thus confident that the readers of Applied Sciences will be able to appreciate the growing role that MASs will play in the design and development of the next generation of complex intelligent systems. This Special Issue has been converted into a yearly series, for which a new call for papers is already available at the Applied Sciences journal’s website: https://www.mdpi.com/journal/applsci/special_issues/Multi-Agent_Systems_2019

    Performance-Aware Speculative Resource Oversubscription for Large-Scale Clusters

    Get PDF
    It is a long-standing challenge to achieve a high degree of resource utilization in cluster scheduling. Resource oversubscription has become a common practice in improving resource utilization and cost reduction. However, current centralized approaches to oversubscription suffer from the issue with resource mismatch and fail to take into account other performance requirements, e.g., tail latency. In this article we present ROSE, a new resource management platform capable of conducting performance-aware resource oversubscription. ROSE allows latency-sensitive long-running applications (LRAs) to co-exist with computation-intensive batch jobs. Instead of waiting for resource allocation to be confirmed by the centralized scheduler, job managers in ROSE can independently request to launch speculative tasks within specific machines according to their suitability for oversubscription. Node agents of those machines can however, avoid any excessive resource oversubscription by means of a mechanism for admission control using multi-resource threshold control and performance-aware resource throttle. Experiments show that in case of mixed co-location of batch jobs and latency-sensitive LRAs, the CPU utilization and the disk utilization can reach 56.34 and 43.49 percent, respectively, but the 95th percentile of read latency in YCSB workloads only increases by 5.4 percent against the case of executing the LRAs alone
    corecore