
    Building collaboration in multi-agent systems using reinforcement learning

    © Springer Nature Switzerland AG 2018. This paper presents a proof-of-concept study demonstrating the viability of building collaboration among multiple agents through a standard Q-learning algorithm embedded in particle swarm optimisation. Collaboration among the agents is achieved via competition: the agents are expected to balance their actions so that none of them drifts away from the team and none intrudes on a fellow neighbour's territory. Particles are equipped with Q-learning for self-training, learning how to act as members of a swarm and how to produce collaborative/collective behaviours. The experimental results support the proposed idea, suggesting that substantive collaboration can be built via the proposed learning algorithm.
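    The abstract gives no implementation details, but the core idea, each particle carrying its own Q-table and learning to keep a collaborative distance to the swarm, can be sketched roughly as below. The state bins, reward band, step sizes and hyperparameters are illustrative assumptions, not the authors' design.

```python
import random

# Illustrative sketch: each particle carries its own Q-table and learns to
# keep a collaborative distance -- close enough not to drift from the team,
# far enough not to invade a neighbour's territory. All constants assumed.

ACTIONS = [-0.5, 0.0, 0.5]             # step left, stay, step right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

def state_of(pos, centre):
    d = pos - centre                   # signed distance to the swarm centre
    if d < -2.0:
        return "far_left"
    if d > 2.0:
        return "far_right"
    return "inside"

def reward_of(pos, centre, others):
    if abs(pos - centre) > 2.0:                 # drifted away from the team
        return -1.0
    if min(abs(pos - o) for o in others) < 0.3: # invaded a neighbour's territory
        return -1.0
    return 1.0

particles = [random.uniform(-5.0, 5.0) for _ in range(10)]
q_tables = [dict() for _ in particles]          # one learner per particle

for episode in range(500):
    centre = sum(particles) / len(particles)
    for i in range(len(particles)):
        pos, q = particles[i], q_tables[i]
        s = state_of(pos, centre)
        if random.random() < EPSILON:           # epsilon-greedy selection
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: q.get((s, act), 0.0))
        new_pos = pos + a
        others = [p for j, p in enumerate(particles) if j != i]
        r = reward_of(new_pos, centre, others)
        s2 = state_of(new_pos, centre)
        best_next = max(q.get((s2, act), 0.0) for act in ACTIONS)
        old = q.get((s, a), 0.0)
        q[(s, a)] = old + ALPHA * (r + GAMMA * best_next - old)  # Q-learning update
        particles[i] = new_pos
```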

    Application of Reinforcement Learning to Multi-Agent Production Scheduling

    Reinforcement learning (RL) has received attention in recent years from agent-based researchers because it can be applied to problems where autonomous agents learn to select proper actions for achieving their goals based on interactions with their environment. Each time an agent performs an action, the environment's response, as indicated by its new state, is used by the agent to reward or penalize its action. The agent's goal is to maximize the total amount of reward it receives over the long run. Although there have been several successful examples demonstrating the usefulness of RL, its application to manufacturing systems has not been fully explored. The objective of this research is to develop a set of guidelines for applying the Q-learning algorithm to enable an individual agent to develop a decision-making policy for use in agent-based production scheduling applications such as dispatching rule selection and job routing. For the dispatching rule selection problem, a single machine agent employs the Q-learning algorithm to develop a decision-making policy for selecting the appropriate dispatching rule from among three given rules. In the job routing problem, a simulated job shop system is used to examine the implementation of the Q-learning algorithm by job agents making routing decisions in such an environment. Two factorial experiment designs are carried out to study the settings used to apply Q-learning to the single machine dispatching rule selection problem and the job routing problem. This study not only investigates the main effects of this Q-learning application but also provides recommendations for factor settings and useful guidelines for future applications of Q-learning to agent-based production scheduling.
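    The dispatching-rule-selection setup lends itself to a small sketch: the machine agent's actions are the three candidate rules and the reward comes back from the shop simulation. The queue-length state, the stand-in reward function and all constants below are assumptions for illustration, not the study's factor settings.

```python
import random

# Illustrative sketch of a machine agent choosing among three dispatching
# rules with Q-learning. The queue-length state, the stand-in shop simulation
# and all constants are assumptions, not the study's actual settings.

RULES = ["SPT", "EDD", "FIFO"]  # shortest processing time / earliest due date / first-in-first-out
ALPHA, GAMMA, EPSILON = 0.1, 0.8, 0.15

def queue_state(queue_len):
    return "short" if queue_len <= 3 else "long"

def simulate_period(rule, queue_len):
    """Stand-in for one scheduling period of the shop simulation. SPT is made
    to pay off on long queues and FIFO on short ones, purely so the agent has
    something to learn. Returns (reward, next queue length)."""
    reward = {"SPT": 0.6, "EDD": 0.5, "FIFO": 0.4}[rule]
    if queue_len > 3 and rule == "SPT":
        reward += 0.3
    if queue_len <= 3 and rule == "FIFO":
        reward += 0.3
    return reward + random.uniform(-0.1, 0.1), max(0, queue_len + random.randint(-2, 2))

Q = {(s, r): 0.0 for s in ("short", "long") for r in RULES}
queue_len = 5
for step in range(2000):
    s = queue_state(queue_len)
    rule = (random.choice(RULES) if random.random() < EPSILON
            else max(RULES, key=lambda r: Q[(s, r)]))
    reward, queue_len = simulate_period(rule, queue_len)
    s2 = queue_state(queue_len)
    Q[(s, rule)] += ALPHA * (reward + GAMMA * max(Q[(s2, r)] for r in RULES) - Q[(s, rule)])

# Learned policy: which rule the agent prefers in each state.
print({state: max(RULES, key=lambda r: Q[(state, r)]) for state in ("short", "long")})
```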

    A holonic approach to dynamic manufacturing scheduling

    Manufacturing scheduling is a complex combinatorial problem, particularly in distributed and dynamic environments. This paper presents a holonic approach to manufacturing scheduling, in which the scheduling functions are distributed among several entities that combine their calculation power and local optimization capability. In this scheduling and control approach, the objective is to achieve fast and dynamic re-scheduling using a scheduling mechanism that evolves dynamically to combine centralized and distributed strategies, improving its responsiveness to emergence instead of relying on the complex, optimized scheduling algorithms found in traditional approaches.
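    The paper describes an architecture rather than code, but the distributed side of such a mechanism often resembles contract-net-style allocation: each entity bids with its local estimate and a job goes to the best bidder, so a disturbance only triggers re-bidding of the affected jobs. The sketch below is a generic illustration of that pattern, not the paper's holonic implementation.

```python
# Generic contract-net-style allocation, as a rough illustration of how
# scheduling can be distributed among entities with local optimization
# capability. This is an assumed pattern, not the paper's holonic design.

class MachineHolon:
    def __init__(self, name, speed):
        self.name, self.speed, self.busy_until = name, speed, 0.0

    def bid(self, work):
        # Local estimate: the earliest time this machine could finish the job.
        return self.busy_until + work / self.speed

    def accept(self, work):
        self.busy_until = self.bid(work)

def allocate(jobs, machines):
    """Announce each job; the holon with the earliest completion-time bid
    wins it. Re-running this over the remaining jobs after a disturbance
    gives fast, local re-scheduling instead of a global re-computation."""
    schedule = {}
    for job, work in jobs.items():
        winner = min(machines, key=lambda m: m.bid(work))
        winner.accept(work)
        schedule[job] = (winner.name, winner.busy_until)
    return schedule

machines = [MachineHolon("M1", 1.0), MachineHolon("M2", 1.5)]
print(allocate({"J1": 3.0, "J2": 4.5, "J3": 2.0}, machines))
```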

    Incorporating Memory and Learning Mechanisms Into Meta-RaPS

    Due to the rapidly increasing dimensionality and complexity of real-life problems, it has become more difficult to find optimal solutions using only exact mathematical methods. The need to find near-optimal solutions in an acceptable amount of time drives the development of more sophisticated approaches. Metaheuristics are one answer to this challenge, but a more powerful answer might be reached by incorporating intelligence into them. Meta-RaPS (Metaheuristic for Randomized Priority Search) is a metaheuristic that creates high-quality solutions for discrete optimization problems. It is proposed that incorporating memory and learning mechanisms into Meta-RaPS, which is currently classified as a memoryless metaheuristic, can help the algorithm produce higher-quality results. The proposed Meta-RaPS versions were created by taking different perspectives on learning. The first approach is Estimation of Distribution Algorithms (EDA), a stochastic learning technique that creates a probability distribution for each decision variable to generate new solutions. The second Meta-RaPS version was developed using a machine learning algorithm, Q-learning, which has been successfully applied to optimization problems whose output is a sequence of actions. In the third Meta-RaPS version, Path Relinking (PR) was implemented as a post-optimization method in which the algorithm learns good attributes by memorizing the best solutions and follows them to reach better ones. The fourth proposed version of Meta-RaPS presented another form of learning: the ability to adaptively tune parameters. The efficiency of these approaches motivated us to redesign Meta-RaPS by removing the improvement phase and adding a more sophisticated Path Relinking method. The new Meta-RaPS could solve even the largest problems in much less time while maintaining solution quality. To evaluate their performance, all introduced versions were tested on the 0-1 Multidimensional Knapsack Problem (MKP). Among the proposed algorithms, Meta-RaPS PR and Meta-RaPS Q-learning showed the best and worst performance, respectively. Nevertheless, all of them outperformed other approaches to the 0-1 MKP in the literature.
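    The Meta-RaPS construction phase itself is simple to state: at each step the best-priority element is chosen with some probability, otherwise a random element is drawn from those within a restriction percentage of the best priority. A rough sketch on the 0-1 knapsack is below; the parameter values and the value/weight priority rule are illustrative, and the learning extensions studied here (EDA, Q-learning, path relinking) would sit on top of this construction loop.

```python
import random

# Minimal sketch of the Meta-RaPS construction phase on a 0-1 knapsack.
# The parameter values and the value/weight priority rule are illustrative.

P_PRIORITY = 0.6     # probability of taking the best-priority item
RESTRICTION = 0.2    # otherwise sample among items within 20% of the best

def construct(values, weights, capacity):
    remaining = set(range(len(values)))
    load, solution = 0, []
    while True:
        feasible = [i for i in remaining if load + weights[i] <= capacity]
        if not feasible:
            break
        prio = {i: values[i] / weights[i] for i in feasible}
        best = max(prio.values())
        if random.random() < P_PRIORITY:
            pick = max(feasible, key=prio.get)          # greedy choice
        else:                                           # restricted random choice
            candidates = [i for i in feasible if prio[i] >= (1 - RESTRICTION) * best]
            pick = random.choice(candidates)
        solution.append(pick)
        load += weights[pick]
        remaining.remove(pick)
    return solution

values  = [10, 40, 30, 50, 35]
weights = [ 5, 14,  8, 15, 10]
best = max((construct(values, weights, 30) for _ in range(200)),
           key=lambda s: sum(values[i] for i in s))
print(sorted(best), sum(values[i] for i in best))
```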

    Reactive approach for automating exploration and exploitation in ant colony optimization

    Ant colony optimization (ACO) algorithms can be used to solve NP-hard problems. Exploration and exploitation are the main mechanisms controlling search within ACO. Reactive search is an alternative technique for maintaining the dynamism of these mechanisms. However, the ACO-based reactive search technique has three problems. First, the memory model that records previously searched regions does not completely transfer neighborhood structures to the next iteration, which leads to arbitrary restarts and premature local search. Second, the exploration indicator is not robust to differences in the magnitudes of the distance matrices for the current population. Third, the parameter control techniques that utilize exploration indicators in their feedback process do not consider the problem of indicator robustness. A reactive ant colony optimization (RACO) algorithm is proposed to overcome these limitations of reactive search. RACO consists of three main components. The first is a reactive max-min ant system algorithm for recording neighborhood structures. The second is a statistical machine learning mechanism named ACOustic that produces a robust exploration indicator. The third is an ACO-based adaptive parameter selection algorithm that solves the parameterization problem by relying on quality, exploration and unified criteria in assigning rewards to promising parameters. The performance of RACO is evaluated on traveling salesman and quadratic assignment problems and compared with eight metaheuristic techniques in terms of success rate, the Wilcoxon signed-rank test, the Chi-square test and relative percentage deviation. Experimental results showed that RACO is superior to the eight metaheuristic techniques, confirming that RACO can serve as a new direction for solving optimization problems. RACO can be used to provide a dynamic exploration and exploitation mechanism, set a parameter value that allows an efficient search, describe the amount of exploration an ACO algorithm performs, and detect stagnation situations.
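    The abstract does not spell out RACO's components, but the reactive pattern it builds on can be sketched: compute an exploration indicator over the colony's current tours and adapt a parameter when exploration collapses. In the sketch below, a simple tour-diversity measure stands in for the ACOustic indicator and the pheromone evaporation rate is the adapted parameter; the stagnating "colony" and all constants are assumptions for illustration.

```python
import itertools, random

# Rough sketch of a reactive exploration/exploitation loop: a tour-diversity
# measure stands in for the ACOustic indicator, and the evaporation rate is
# adapted when the indicator signals stagnation. All constants are assumed.

def tour_diversity(tours):
    """Mean fraction of differing positions across all tour pairs, in [0, 1]."""
    pairs = list(itertools.combinations(tours, 2))
    diff = sum(sum(a != b for a, b in zip(t1, t2)) / len(t1) for t1, t2 in pairs)
    return diff / len(pairs)

n, ants = 20, 10
rho = 0.1                              # evaporation rate, adapted reactively
for iteration in range(100):
    # Stand-in colony: diverse random tours early, near-identical tours later,
    # imitating a search that converges and then stagnates.
    if iteration < 60:
        tours = [random.sample(range(n), n) for _ in range(ants)]
    else:
        tours = [list(range(n)) for _ in range(ants)]
    indicator = tour_diversity(tours)
    if indicator < 0.2:                # stagnation: push back toward exploration
        rho = min(0.9, rho * 1.5)
    else:                              # healthy diversity: drift to exploitation
        rho = max(0.05, rho * 0.95)
print("final evaporation rate:", round(rho, 3))
```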

    Advances in Reinforcement Learning

    Reinforcement Learning (RL) is a very dynamic area in terms of both theory and application. This book brings together many different aspects of current research in the several fields associated with RL, an area that has been growing rapidly and producing a wide variety of learning algorithms for different applications. Across 24 chapters, it covers a broad variety of topics in RL and their application in autonomous systems. A set of chapters provides a general overview of RL, while the others focus mostly on applications of RL paradigms: Game Theory, Multi-Agent Theory, Robotics, Networking Technologies, Vehicular Navigation, Medicine and Industrial Logistics.