
    Steering approaches to Pareto-optimal multiobjective reinforcement learning

    For reinforcement learning tasks with multiple objectives, it may be advantageous to learn stochastic or non-stationary policies. This paper investigates two novel algorithms for learning non-stationary policies which produce Pareto-optimal behaviour (w-steering and Q-steering), by extending prior work based on the concept of geometric steering. Empirical results demonstrate that both new algorithms offer substantial performance improvements over stationary deterministic policies, while Q-steering significantly outperforms w-steering when the agent has no information about recurrent states within the environment. It is further demonstrated that Q-steering can be used interactively by providing a human decision-maker with a visualisation of the Pareto front and allowing them to adjust the agent’s target point during learning. To demonstrate broader applicability, the use of Q-steering in combination with function approximation is also illustrated on a task involving control of local battery storage for a residential solar power system.

    Dynamic multi-objective optimisation using deep reinforcement learning: benchmark, algorithm and an application to identify vulnerable zones based on water quality

    Dynamic multi-objective optimisation problems (DMOPs) pose a great challenge to the reinforcement learning (RL) research area because their objective functions, constraints and problem parameters may change over time. This study aims to identify the gaps in existing multi-objective optimisation benchmarks for dynamic environments in RL settings. To this end, a dynamic multi-objective testbed has been created as a modified version of the conventional deep-sea treasure (DST) hunt testbed. The modified testbed captures the changing aspects of a dynamic environment, with changes occurring as a function of time. To the authors’ knowledge, this is the first dynamic multi-objective testbed for RL research, especially for deep reinforcement learning. In addition, a generic algorithm is proposed to solve the multi-objective optimisation problem in a dynamic constrained environment; it maintains equilibrium by mapping different objectives simultaneously to provide the best compromise solution close to the true Pareto front (PF). As a proof of concept, the developed algorithm has been used to build an expert system for a real-world scenario, using a Markov decision process to identify vulnerable zones based on water quality resilience in São Paulo, Brazil. The outcome of the implementation reveals that the proposed parity-Q deep Q network (PQDQN) algorithm is an efficient way to optimise decisions in a dynamic environment. Moreover, the results show that the PQDQN algorithm performs better than other state-of-the-art solutions in both the simulated and the real-world scenario.

    Design of Complex Engineered Systems Using Multiagent Coordination

    This thesis is the combination of two research publications working toward a unified strategy in which the design of complex engineered systems can be completed using a multiagent coordination approach. Current engineered system modeling techniques segment large complex models into multiple groups to be simulated independently. These methods restrict the evaluations of such complex systems, as their failure properties are typically unknown until they are experienced in operation. In an effort to help engineers to design complex engineered systems, this research proposes that a distributed yet non-legislated approach can be used in the design processes by splitting up the overall system into specific teams. The approach specifically hypothesizes that multiagent credit assignment can be used to effectively determine how to properly incentivize subsystem designers so that the global set of system-level objectives can be achieved. The first publication presents a multiagent systems based approach for designing a self-replicating robotic manufacturing factory in space. The simulation in this work is able to present the coordination of the agents during the construction of the factory as the parameters of the learning algorithm are changed. The results show the advantage of using a learning algorithm to design a large system. The second publication presents a hybrid approach to design complex engineered systems, providing a method in which design decisions can be reconciled without the need for either detailed interaction models or external legislating mechanisms. The results of this paper demonstrate that a team of autonomous agents using a cooperative coevolutionary algorithm can effectively design a complex engineered system. Each publication utilized a system model to illustrate and simulate the methods and potential results. 
    By designing complex systems with a multiagent coordination approach, a design methodology can be developed in an effort to reduce design uncertainty and provide mechanisms through which the system level impact of decisions can be estimated without explicitly modeling such interactions.