
    Building collaboration in multi-agent systems using reinforcement learning

    © Springer Nature Switzerland AG 2018. This paper presents a proof-of-concept study demonstrating the viability of building collaboration among multiple agents through a standard Q-learning algorithm embedded in particle swarm optimisation. Collaboration among the agents is achieved via competition: the agents are expected to balance their actions so that none of them drifts away from the team and none intrudes on a fellow neighbour's territory. Particles are equipped with Q-learning for self-training, learning how to act as members of a swarm and how to produce collaborative/collective behaviours. The experimental results support the proposed idea, suggesting that substantive collaboration can be built via the proposed learning algorithm.
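    The abstract gives no implementation details, but the core idea, each particle carrying its own Q-table and learning to keep a collaborative distance to the swarm, can be sketched roughly as below. The state bins, reward band, step sizes and hyperparameters are illustrative assumptions, not the authors' design.

```python
import random

# Illustrative sketch: each particle carries its own Q-table and learns to
# keep a collaborative distance -- close enough not to drift from the team,
# far enough not to invade a neighbour's territory. All constants assumed.

ACTIONS = [-0.5, 0.0, 0.5]             # step left, stay, step right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

def state_of(pos, centre):
    d = pos - centre                   # signed distance to the swarm centre
    if d < -2.0:
        return "far_left"
    if d > 2.0:
        return "far_right"
    return "inside"

def reward_of(pos, centre, others):
    if abs(pos - centre) > 2.0:                 # drifted away from the team
        return -1.0
    if min(abs(pos - o) for o in others) < 0.3: # invaded a neighbour's territory
        return -1.0
    return 1.0

particles = [random.uniform(-5.0, 5.0) for _ in range(10)]
q_tables = [dict() for _ in particles]          # one learner per particle

for episode in range(500):
    centre = sum(particles) / len(particles)
    for i in range(len(particles)):
        pos, q = particles[i], q_tables[i]
        s = state_of(pos, centre)
        if random.random() < EPSILON:           # epsilon-greedy selection
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: q.get((s, act), 0.0))
        new_pos = pos + a
        others = [p for j, p in enumerate(particles) if j != i]
        r = reward_of(new_pos, centre, others)
        s2 = state_of(new_pos, centre)
        best_next = max(q.get((s2, act), 0.0) for act in ACTIONS)
        old = q.get((s, a), 0.0)
        q[(s, a)] = old + ALPHA * (r + GAMMA * best_next - old)  # Q-learning update
        particles[i] = new_pos
```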

    Application of Reinforcement Learning to Multi-Agent Production Scheduling

    Reinforcement learning (RL) has received attention in recent years from agent-based researchers because it can be applied to problems where autonomous agents learn to select proper actions for achieving their goals based on interactions with their environment. Each time an agent performs an action, the environment's response, as indicated by its new state, is used by the agent to reward or penalize its action. The agent's goal is to maximize the total amount of reward it receives over the long run. Although there have been several successful examples demonstrating the usefulness of RL, its application to manufacturing systems has not been fully explored. The objective of this research is to develop a set of guidelines for applying the Q-learning algorithm to enable an individual agent to develop a decision-making policy for use in agent-based production scheduling applications such as dispatching rule selection and job routing. For the dispatching rule selection problem, a single machine agent employs the Q-learning algorithm to develop a decision-making policy for selecting the appropriate dispatching rule from among three given rules. In the job routing problem, a simulated job shop system is used to examine the implementation of the Q-learning algorithm by job agents making routing decisions in such an environment. Two factorial experiment designs are carried out to study the settings used to apply Q-learning to the single machine dispatching rule selection problem and the job routing problem. This study not only investigates the main effects of this Q-learning application but also provides recommendations for factor settings and useful guidelines for future applications of Q-learning to agent-based production scheduling.
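    The dispatching-rule-selection setup lends itself to a small sketch: the machine agent's actions are the three candidate rules and the reward comes back from the shop simulation. The queue-length state, the stand-in reward function and all constants below are assumptions for illustration, not the study's factor settings.

```python
import random

# Illustrative sketch of a machine agent choosing among three dispatching
# rules with Q-learning. The queue-length state, the stand-in shop simulation
# and all constants are assumptions, not the study's actual settings.

RULES = ["SPT", "EDD", "FIFO"]  # shortest processing time / earliest due date / first-in-first-out
ALPHA, GAMMA, EPSILON = 0.1, 0.8, 0.15

def queue_state(queue_len):
    return "short" if queue_len <= 3 else "long"

def simulate_period(rule, queue_len):
    """Stand-in for one scheduling period of the shop simulation. SPT is made
    to pay off on long queues and FIFO on short ones, purely so the agent has
    something to learn. Returns (reward, next queue length)."""
    reward = {"SPT": 0.6, "EDD": 0.5, "FIFO": 0.4}[rule]
    if queue_len > 3 and rule == "SPT":
        reward += 0.3
    if queue_len <= 3 and rule == "FIFO":
        reward += 0.3
    return reward + random.uniform(-0.1, 0.1), max(0, queue_len + random.randint(-2, 2))

Q = {(s, r): 0.0 for s in ("short", "long") for r in RULES}
queue_len = 5
for step in range(2000):
    s = queue_state(queue_len)
    rule = (random.choice(RULES) if random.random() < EPSILON
            else max(RULES, key=lambda r: Q[(s, r)]))
    reward, queue_len = simulate_period(rule, queue_len)
    s2 = queue_state(queue_len)
    Q[(s, rule)] += ALPHA * (reward + GAMMA * max(Q[(s2, r)] for r in RULES) - Q[(s, rule)])

# Learned policy: which rule the agent prefers in each state.
print({state: max(RULES, key=lambda r: Q[(state, r)]) for state in ("short", "long")})
```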

    A holonic approach to dynamic manufacturing scheduling

    Manufacturing scheduling is a complex combinatorial problem, particularly in distributed and dynamic environments. This paper presents a holonic approach to manufacturing scheduling, in which the scheduling functions are distributed among several entities that combine their calculation power and local optimization capability. In this scheduling and control approach, the objective is to achieve fast and dynamic re-scheduling using a scheduling mechanism that evolves dynamically to combine centralized and distributed strategies, improving its responsiveness to emergence instead of relying on the complex, optimized scheduling algorithms found in traditional approaches.
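    The paper describes an architecture rather than code, but the distributed side of such a mechanism often resembles contract-net-style allocation: each entity bids with its local estimate and a job goes to the best bidder, so a disturbance only triggers re-bidding of the affected jobs. The sketch below is a generic illustration of that pattern, not the paper's holonic implementation.

```python
# Generic contract-net-style allocation, as a rough illustration of how
# scheduling can be distributed among entities with local optimization
# capability. This is an assumed pattern, not the paper's holonic design.

class MachineHolon:
    def __init__(self, name, speed):
        self.name, self.speed, self.busy_until = name, speed, 0.0

    def bid(self, work):
        # Local estimate: the earliest time this machine could finish the job.
        return self.busy_until + work / self.speed

    def accept(self, work):
        self.busy_until = self.bid(work)

def allocate(jobs, machines):
    """Announce each job; the holon with the earliest completion-time bid
    wins it. Re-running this over the remaining jobs after a disturbance
    gives fast, local re-scheduling instead of a global re-computation."""
    schedule = {}
    for job, work in jobs.items():
        winner = min(machines, key=lambda m: m.bid(work))
        winner.accept(work)
        schedule[job] = (winner.name, winner.busy_until)
    return schedule

machines = [MachineHolon("M1", 1.0), MachineHolon("M2", 1.5)]
print(allocate({"J1": 3.0, "J2": 4.5, "J3": 2.0}, machines))
```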

    Incorporating Memory and Learning Mechanisms Into Meta-RaPS

    Due to the rapidly increasing dimensionality and complexity of real-life problems, it has become more difficult to find optimal solutions using only exact mathematical methods. The need to find near-optimal solutions in an acceptable amount of time drives the development of more sophisticated approaches. Metaheuristics are one answer to this challenge, but a more powerful answer might be reached by incorporating intelligence into them. Meta-RaPS (Metaheuristic for Randomized Priority Search) is a metaheuristic that creates high-quality solutions for discrete optimization problems. It is proposed that incorporating memory and learning mechanisms into Meta-RaPS, which is currently classified as a memoryless metaheuristic, can help the algorithm produce higher-quality results. The proposed Meta-RaPS versions were created by taking different perspectives on learning. The first approach is Estimation of Distribution Algorithms (EDA), a stochastic learning technique that creates a probability distribution for each decision variable to generate new solutions. The second Meta-RaPS version was developed using a machine learning algorithm, Q-learning, which has been successfully applied to optimization problems whose output is a sequence of actions. In the third Meta-RaPS version, Path Relinking (PR) was implemented as a post-optimization method in which the algorithm learns good attributes by memorizing the best solutions and follows them to reach better ones. The fourth proposed version of Meta-RaPS presented another form of learning: the ability to adaptively tune parameters. The efficiency of these approaches motivated us to redesign Meta-RaPS by removing the improvement phase and adding a more sophisticated Path Relinking method. The new Meta-RaPS could solve even the largest problems in much less time while maintaining solution quality. To evaluate their performance, all introduced versions were tested on the 0-1 Multidimensional Knapsack Problem (MKP). Among the proposed algorithms, Meta-RaPS PR and Meta-RaPS Q-learning showed the best and worst performance, respectively. Nevertheless, all of them outperformed other approaches to the 0-1 MKP in the literature.
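    The Meta-RaPS construction phase itself is simple to state: at each step the best-priority element is chosen with some probability, otherwise a random element is drawn from those within a restriction percentage of the best priority. A rough sketch on the 0-1 knapsack is below; the parameter values and the value/weight priority rule are illustrative, and the learning extensions studied here (EDA, Q-learning, path relinking) would sit on top of this construction loop.

```python
import random

# Minimal sketch of the Meta-RaPS construction phase on a 0-1 knapsack.
# The parameter values and the value/weight priority rule are illustrative.

P_PRIORITY = 0.6     # probability of taking the best-priority item
RESTRICTION = 0.2    # otherwise sample among items within 20% of the best

def construct(values, weights, capacity):
    remaining = set(range(len(values)))
    load, solution = 0, []
    while True:
        feasible = [i for i in remaining if load + weights[i] <= capacity]
        if not feasible:
            break
        prio = {i: values[i] / weights[i] for i in feasible}
        best = max(prio.values())
        if random.random() < P_PRIORITY:
            pick = max(feasible, key=prio.get)          # greedy choice
        else:                                           # restricted random choice
            candidates = [i for i in feasible if prio[i] >= (1 - RESTRICTION) * best]
            pick = random.choice(candidates)
        solution.append(pick)
        load += weights[pick]
        remaining.remove(pick)
    return solution

values  = [10, 40, 30, 50, 35]
weights = [ 5, 14,  8, 15, 10]
best = max((construct(values, weights, 30) for _ in range(200)),
           key=lambda s: sum(values[i] for i in s))
print(sorted(best), sum(values[i] for i in best))
```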

    Reactive approach for automating exploration and exploitation in ant colony optimization

    Ant colony optimization (ACO) algorithms can be used to solve NP-hard problems. Exploration and exploitation are the main mechanisms controlling search within ACO. Reactive search is an alternative technique for maintaining the dynamism of these mechanisms. However, the ACO-based reactive search technique has three problems. First, the memory model that records previously searched regions does not completely transfer neighborhood structures to the next iteration, which leads to arbitrary restarts and premature local search. Second, the exploration indicator is not robust to differences in the magnitudes of the distance matrices for the current population. Third, the parameter control techniques that utilize exploration indicators in their feedback process do not consider the problem of indicator robustness. A reactive ant colony optimization (RACO) algorithm is proposed to overcome these limitations of reactive search. RACO consists of three main components. The first is a reactive max-min ant system algorithm for recording neighborhood structures. The second is a statistical machine learning mechanism named ACOustic that produces a robust exploration indicator. The third is an ACO-based adaptive parameter selection algorithm that solves the parameterization problem by relying on quality, exploration and unified criteria in assigning rewards to promising parameters. The performance of RACO is evaluated on traveling salesman and quadratic assignment problems and compared with eight metaheuristic techniques in terms of success rate, the Wilcoxon signed-rank test, the Chi-square test and relative percentage deviation. Experimental results showed that RACO is superior to the eight metaheuristic techniques, confirming that RACO can serve as a new direction for solving optimization problems. RACO can be used to provide a dynamic exploration and exploitation mechanism, set a parameter value that allows an efficient search, describe the amount of exploration an ACO algorithm performs, and detect stagnation situations.
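    The abstract does not spell out RACO's components, but the reactive pattern it builds on can be sketched: compute an exploration indicator over the colony's current tours and adapt a parameter when exploration collapses. In the sketch below, a simple tour-diversity measure stands in for the ACOustic indicator and the pheromone evaporation rate is the adapted parameter; the stagnating "colony" and all constants are assumptions for illustration.

```python
import itertools, random

# Rough sketch of a reactive exploration/exploitation loop: a tour-diversity
# measure stands in for the ACOustic indicator, and the evaporation rate is
# adapted when the indicator signals stagnation. All constants are assumed.

def tour_diversity(tours):
    """Mean fraction of differing positions across all tour pairs, in [0, 1]."""
    pairs = list(itertools.combinations(tours, 2))
    diff = sum(sum(a != b for a, b in zip(t1, t2)) / len(t1) for t1, t2 in pairs)
    return diff / len(pairs)

n, ants = 20, 10
rho = 0.1                              # evaporation rate, adapted reactively
for iteration in range(100):
    # Stand-in colony: diverse random tours early, near-identical tours later,
    # imitating a search that converges and then stagnates.
    if iteration < 60:
        tours = [random.sample(range(n), n) for _ in range(ants)]
    else:
        tours = [list(range(n)) for _ in range(ants)]
    indicator = tour_diversity(tours)
    if indicator < 0.2:                # stagnation: push back toward exploration
        rho = min(0.9, rho * 1.5)
    else:                              # healthy diversity: drift to exploitation
        rho = max(0.05, rho * 0.95)
print("final evaporation rate:", round(rho, 3))
```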

    Advances in Reinforcement Learning

    Reinforcement Learning (RL) is a very dynamic area in terms of both theory and application. This book brings together many different aspects of current research in the several fields associated with RL, an area that has been growing rapidly and producing a wide variety of learning algorithms for different applications. Across 24 chapters, it covers a broad variety of topics in RL and their application in autonomous systems. A set of chapters provides a general overview of RL, while the others focus mostly on applications of RL paradigms: Game Theory, Multi-Agent Theory, Robotics, Networking Technologies, Vehicular Navigation, Medicine and Industrial Logistics.