Search CORE

2,993 research outputs found

Learning Scheduling Algorithms for Data Processing Clusters

Author: Abadi Martín
Addanki Ravichandra
Dai Hanjun
Finn Chelsea
Ghodsi Ali
Gog Ionel
Grandl Robert
Greensmith Evan
Hindman Benjamin
Kingma Diederik P
Mao Hongzi
Marcus Ryan
Mirhoseini Azalia
Pinto Lerrel
Schulman John
Spark Apache
Sutton S.
Weaver Lex
Zaharia Matei
Publication date: 21/08/2019

Efficiently scheduling data processing jobs on distributed compute clusters requires complex algorithms. Current systems, however, use simple generalized heuristics and ignore workload characteristics, since developing and tuning a scheduling policy for each workload is infeasible. In this paper, we show that modern machine learning techniques can generate highly efficient policies automatically. Decima uses reinforcement learning (RL) and neural networks to learn workload-specific scheduling algorithms without any human instruction beyond a high-level objective such as minimizing average job completion time. Off-the-shelf RL techniques, however, cannot handle the complexity and scale of the scheduling problem. To build Decima, we had to develop new representations for jobs' dependency graphs, design scalable RL models, and invent RL training methods for dealing with continuous stochastic job arrivals. Our prototype integration with Spark on a 25-node cluster shows that Decima improves the average job completion time over hand-tuned scheduling heuristics by at least 21%, achieving up to 2x improvement during periods of high cluster load.
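The objective named in the abstract, average job completion time, can be illustrated with a toy calculation. This is a minimal sketch under strong assumptions (a single executor, all jobs arriving at time zero, known durations), not Decima's method; the function name `average_jct` and the job durations are hypothetical. It only shows why the order chosen by a scheduling policy changes the objective.

```python
def average_jct(durations):
    """Average completion time when jobs run back to back in the given order.

    Assumption for illustration: one executor, all jobs arrive at time 0.
    """
    clock = 0.0   # time at which the current job finishes
    total = 0.0   # sum of completion times so far
    for d in durations:
        clock += d
        total += clock
    return total / len(durations)


jobs = [8.0, 1.0, 3.0]            # hypothetical job durations
fifo = average_jct(jobs)          # run in arrival order
sjf = average_jct(sorted(jobs))   # run shortest job first
print(f"FIFO: {fifo:.2f}, shortest-first: {sjf:.2f}")
```

Running the short jobs first lowers the average because fewer jobs wait behind the long one; a learned policy has to make the same kind of trade-off without knowing the workload in advance.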

arXiv.org e-Print Archive

Crossref

DSpace@MIT

Adaptive planning for distributed systems using goal accomplishment tracking

Author: Lee K
Mann G
Small N
Publication venue: Australian Computer Society
Publication date: 01/01/2015

Goal accomplishment tracking is the process of monitoring the progress of a task or series of tasks towards completing a goal. Goal accomplishment tracking is used to monitor goal progress in a variety of domains, including workflow processing, teleoperation and industrial manufacturing. Practically, it involves the constant monitoring of task execution, analysis of this data to determine the task progress and notification of interested parties. This information is usually used in a passive way to observe goal progress. However, responding to this information may prevent goal failures. In addition, responding proactively in an opportunistic way can also lead to goals being completed faster. This paper proposes an architecture to support the adaptive planning of tasks for fault tolerance or opportunistic task execution based on goal accomplishment tracking. It argues that dramatically increased performance can be gained by monitoring task execution and altering plans dynamically.

CiteSeerX

Deakin Research Online

Nottingham Trent Institutional Repository (IRep)

Research Repository

Lifelong Multi-Agent Path Finding in Large-Scale Warehouses

Author: Durham Joseph W.
Kiesel Scott
Koenig Sven
Kumar T. K. Satish
Li Jiaoyang
Tinka Andrew
Publication date: 12/03/2021

Multi-Agent Path Finding (MAPF) is the problem of moving a team of agents to their goal locations without collisions. In this paper, we study the lifelong variant of MAPF, where agents are constantly engaged with new goal locations, such as in large-scale automated warehouses. We propose a new framework Rolling-Horizon Collision Resolution (RHCR) for solving lifelong MAPF by decomposing the problem into a sequence of Windowed MAPF instances, where a Windowed MAPF solver resolves collisions among the paths of the agents only within a bounded time horizon and ignores collisions beyond it. RHCR is particularly well suited to generating pliable plans that adapt to continually arriving new goal locations. We empirically evaluate RHCR with a variety of MAPF solvers and show that it can produce high-quality solutions for up to 1,000 agents (= 38.9% of the empty cells on the map) for simulated warehouse instances, significantly outperforming existing work. Comment: Published at AAAI 202
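The windowing idea in the abstract (resolve collisions only within a bounded time horizon) can be sketched as a conflict check that ignores everything past the window. This is an illustrative sketch, not the RHCR implementation: the function names, the path representation (one grid cell per timestep) and the rule that an agent waits at its last cell are all assumptions made for the example.

```python
from itertools import combinations


def position_at(path, t):
    """Cell occupied at timestep t; assume an agent waits at its last cell."""
    return path[min(t, len(path) - 1)]


def windowed_conflicts(paths, window):
    """List conflicts between agent paths at timesteps 0..window only.

    A vertex conflict is two agents in the same cell at the same timestep.
    An edge conflict is two agents swapping cells between t-1 and t.
    Anything after `window` is deliberately ignored.
    """
    conflicts = []
    for (a, pa), (b, pb) in combinations(enumerate(paths), 2):
        for t in range(window + 1):
            if position_at(pa, t) == position_at(pb, t):
                conflicts.append(("vertex", a, b, t))
            elif (t > 0
                  and position_at(pa, t - 1) == position_at(pb, t)
                  and position_at(pb, t - 1) == position_at(pa, t)):
                conflicts.append(("edge", a, b, t))
    return conflicts


# Two hypothetical agents that meet in cell (0, 3) at timestep 3.
paths = [
    [(0, 0), (0, 1), (0, 2), (0, 3)],
    [(1, 1), (1, 2), (1, 3), (0, 3)],
]
print(windowed_conflicts(paths, window=2))  # the meeting lies beyond the window
print(windowed_conflicts(paths, window=3))  # the meeting is inside the window
```

With a window of 2 the later meeting is left unresolved on purpose; in a rolling-horizon scheme it would be handled at a later replanning step, once it falls inside the window.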

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

A Framework for an adaptive grid scheduling: an organizational perspective

Author: Ghédira Khaled
Hanachi Chihab
Thabet Ines
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

Grid systems are complex computational organizations made of several interacting components evolving in an unpredictable and dynamic environment. In such a context, scheduling is a key component and should be adaptive to face the numerous disturbances of the grid while guaranteeing its robustness and efficiency. Much existing work remains at a low level, focusing on the scheduling component taken individually. However, considering scheduling adaptiveness at a macro level with an organizational view, through its interactions with the other components, is also important. Following this view, in this paper we model a grid system as an agent-based organization and scheduling as a cooperative activity. Indeed, agent technology provides high-level organizational concepts (groups, roles, commitments, interaction protocols) to structure, coordinate and ease the adaptation of distributed systems efficiently. More precisely, we make the following contributions. We provide a grid conceptual model that identifies the concepts and entities involved in the cooperative scheduling activity. This model is then used to define a typology of adaptation including perturbing events and actions to undertake in order to adapt. Then, we provide an organizational model, based on the Agent Group Role (AGR) meta-model of Ferber, to support adaptive scheduling at the organizational level. Finally, a simulator and an experimental evaluation have been realized to demonstrate the feasibility of our approach.

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

Toulouse Capitole Publications

Toulouse 1 Capitole Publications

Self-adaptation for energy efficiency in software systems

Author: Alizadeh Moghaddam F.
Publication date: 01/01/2019

International Migration, Integration and Social Cohesion online publications

Agent-based Service Reconfiguration for Dynamic and Evolvable Systems

Author: Nelson Ricardo Martins Rodrigues
Publication date: 16/12/2019

Repositório Aberto da Universidade do Porto

Conceptual multi-agent system design for distributed scheduling systems

Author: Alves Filipe
Leitão Paulo
Pereira Ana I.
Rocha Ana Maria A.C.
Publication date: 01/01/2023

With the progressive increase in the complexity of dynamic environments, systems require evolutionary configuration and optimization to meet the increased demand. In this sense, any change in the conditions of systems or products may require distributed scheduling and resource allocation of more elementary services. Centralized approaches can become bottlenecks and are complex to adapt, especially in case of unexpected events. Thus, multi-agent systems (MAS) can use their automatic and autonomous behaviour to improve the distribution of task effort and support scheduling decision-making. In addition, MAS are able to obtain quick solutions through cooperation and smart control by agents, empowered by their coordination and interoperability. By leveraging an architecture that benefits from collaboration with distributed artificial intelligence, an approach based on a conceptual MAS design is proposed. It allows distributed and intelligent management to promote technological innovation in basic concepts of society, for more sustainable everyday applications in domains with emerging needs, such as manufacturing and healthcare scheduling systems. This work has been supported by FCT - Fundação para a Ciência e a Tecnologia within the R&D Units Projects Scope: UIDB/00319/2020 and UIDB/05757/2020. Filipe Alves is supported by FCT Doctorate Grant Reference SFRH/BD/143745/2019.

Biblioteca Digital do IPB