118 research outputs found

    Vehicle Dispatching and Routing of On-Demand Intercity Ride-Pooling Services: A Multi-Agent Hierarchical Reinforcement Learning Approach

    The integrated development of city clusters has given rise to an increasing demand for intercity travel. Intercity ride-pooling services show considerable potential for upgrading traditional intercity bus services through demand-responsive enhancements. Nevertheless, their online operation suffers from inherent complexities arising from the coupling of vehicle resource allocation among cities and pooled-ride vehicle routing. To tackle these challenges, this study proposes a two-level framework designed to facilitate online fleet management. Specifically, a novel multi-agent feudal reinforcement learning model is proposed at the upper level of the framework to cooperatively assign idle vehicles to different intercity lines, while the lower level updates the routes of vehicles using an adaptive large neighborhood search heuristic. Numerical studies based on a realistic dataset of Xiamen and its surrounding cities in China show that the proposed framework effectively mitigates supply and demand imbalances and achieves significant improvement in both the average daily system profit and the order fulfillment ratio.

    Optimal energy management for a grid-tied solar PV-battery microgrid: A reinforcement learning approach

    There has been a shift towards energy sustainability in recent years, and this shift should continue. The steady growth of energy demand driven by population growth, together with heightened concern about the amount of anthropogenic gases released into the atmosphere and the deployment of advanced grid technologies, has spurred the penetration of renewable energy resources (RERs) at different locations and scales in the power grid. As a result, the energy system is moving away from the centralized paradigm of large, controllable power plants and toward a decentralized network based on renewables. Microgrids, either grid-connected or islanded, provide a key solution for integrating RERs, load demand flexibility, and energy storage systems within this framework. Nonetheless, renewable energy resources such as solar and wind energy can be extremely stochastic because they are weather dependent. These resources, coupled with load demand uncertainties, lead to random variations on both the generation and load sides, thus challenging optimal energy management. This thesis develops an optimal energy management system (EMS) for a grid-tied solar PV-battery microgrid. The goal of the EMS is to minimize operational costs (the cost of power exchange with the utility and the battery wear cost) while still respecting network constraints, which ensure that grid violations are avoided. A reinforcement learning (RL) approach is proposed to minimize the operational cost of the microgrid under this stochastic setting. RL is a reward-motivated optimization technique derived from how animals learn to optimize their behaviour in new environments. Unlike conventional model-based optimization approaches, RL does not need an explicit model of the system being optimized to reach optimal solutions. The EMS is modelled as a Markov Decision Process (MDP), defined by its state, action, and reward function.
Two RL algorithms, namely the conventional Q-learning algorithm and the deep Q-network algorithm, are developed, and their efficacy in performing optimal energy management for the designed system is evaluated in this thesis. First, the energy management problem is expressed as a sequential decision-making process, after which two algorithms, a trading and a non-trading algorithm, are developed. In the trading case, the microgrid's excess energy can be sold back to the utility to increase revenue, while in the non-trading case constraining rules are embedded in the designed EMS to ensure that no excess energy is sold back to the utility. Then a Q-learning algorithm is developed to minimize the operational cost of the microgrid under unknown future information. Finally, to evaluate the performance of the proposed EMS, a comparison study between a trading-case EMS model and a non-trading case is performed using a typical commercial load curve and PV generation profile over a 24-hour horizon. Numerical simulation results indicated that the algorithm learned to select an optimized energy schedule that minimizes energy cost (the cost of power purchased from the utility based on the time-varying tariff, plus the battery wear cost) in both summer and winter case studies. Comparing the operational costs of the non-trading and trading EMS models, however, the trading model reduced cost by 4.033% in the summer season and 2.199% in the winter season. Second, a deep Q-network (DQN) method that uses recent learning algorithm enhancements, including experience replay and a target network, is developed to learn the system uncertainties, including load demand, grid prices, and volatile power supply from the renewables, and to solve the optimal energy management problem.
Unlike the Q-learning method, which updates the Q-function using a lookup table (which limits its scalability and overall performance in stochastic optimization), the DQN method uses a deep neural network that approximates the Q-function via statistical regression. The performance of the proposed method is evaluated with differently fluctuating load profiles, i.e., slow, medium, and fast. Simulation results substantiated the efficacy of the proposed method: the algorithm learned from experience to raise the battery state of charge and to shift loads from one time instance to another, thus supporting the utility grid in reducing aggregate peak load. Furthermore, the performance of the proposed DQN approach was compared to the conventional Q-learning algorithm in terms of achieving a minimum global cost. Simulation results showed that the DQN algorithm outperformed the conventional Q-learning approach, reducing system operational costs by 15%, 24%, and 26% for the slow, medium, and fast fluctuating load profiles, respectively, in the studied cases.
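
The lookup-table update that the abstract contrasts with DQN can be sketched as follows. This is a minimal illustration of generic tabular Q-learning, not the thesis's implementation: the state labels, the action names (`charge`, `discharge`, `idle`), the use of negative cost as reward, and the values of `alpha` and `gamma` are all assumptions made for the example.

```python
import random


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.95):
    """Apply one tabular Q-learning step and return the new Q(s, a).

    Update rule: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    Unseen (state, action) pairs default to 0.0 in the lookup table `q`.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))


# Hypothetical usage: reward is the negative of an assumed hourly cost.
ACTIONS = ["charge", "discharge", "idle"]  # assumed action set
q_table = {}
rng = random.Random(0)
action = epsilon_greedy(q_table, "hour_0", ACTIONS, epsilon=0.1, rng=rng)
q_learning_update(q_table, "hour_0", action, reward=-1.0,
                  next_state="hour_1", actions=ACTIONS)
```

Because every (state, action) pair needs its own table entry, the table grows with the product of the state and action counts, which is the scalability limit the abstract attributes to the lookup-table approach.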

    Advances in Computational Intelligence Applications in the Mining Industry

    This book captures advancements in the applications of computational intelligence (artificial intelligence, machine learning, etc.) to problems in the mineral and mining industries. The papers present the state of the art in four broad categories: mine operations, mine planning, mine safety, and advances in the sciences, primarily in image processing applications. Authors in the book include both researchers and industry practitioners.

    Deep neural networks in the cloud: Review, applications, challenges and research directions

    Deep neural networks (DNNs) are currently being deployed as machine learning technology in a wide range of important real-world applications. DNNs consist of a huge number of parameters that require millions of floating-point operations (FLOPs) to be executed in both learning and prediction modes. A more effective method is to implement DNNs in a cloud computing system equipped with centralized servers and data storage sub-systems with high-speed and high-performance computing capabilities. This paper presents an up-to-date survey of current state-of-the-art DNNs deployed for cloud computing. Various DNN complexities associated with different architectures are presented and discussed alongside the necessity of using cloud computing. We also present an extensive overview of different cloud computing platforms for the deployment of DNNs and discuss them in detail. Moreover, DNN applications already deployed in cloud computing systems are reviewed to demonstrate the advantages of using cloud computing for DNNs. The paper emphasizes the challenges of deploying DNNs in cloud computing systems and provides guidance on enhancing current and new deployments. Acknowledged: the EGIA project (KK-2022/00119) and the Consolidated Research Group MATHMODE (IT1456-22).

    Modeling Mutual Influence in Multi-Agent Reinforcement Learning

    In multi-agent systems (MAS), agents rarely act in isolation but tend to achieve their goals through interactions with other agents. To achieve their ultimate goals, individual agents should actively evaluate the impact of other agents' behaviors on themselves before they decide which actions to take. The impacts are reciprocal, and it is of great interest to model the mutual influence of agents on one another when they are observing the environment or taking actions in it. In this thesis, assuming that the agents are aware of each other's existence and their potential impact on themselves, I develop novel multi-agent reinforcement learning (MARL) methods that can measure the mutual influence between agents to shape learning. The first part of this thesis outlines the framework of recursive reasoning in deep multi-agent reinforcement learning. I hypothesize that it is beneficial for each agent to consider how other agents react to its behavior. I start from Probabilistic Recursive Reasoning (PR2) using level-1 reasoning and adopt variational Bayes methods to approximate the opponents' conditional policies. Each agent shapes its individual Q-value by marginalizing the conditional policies in the joint Q-value and finding the best response to improve its policy. I further extend PR2 to Generalized Recursive Reasoning (GR2) with different hierarchical levels of rationality. GR2 enables agents to possess various levels of thinking ability, thereby allowing higher-level agents to best respond to less sophisticated learners. The first part of the thesis shows that reducing the joint Q-value to an individual Q-value via explicit recursive reasoning benefits learning. In the second part of the thesis, in reverse, I measure the mutual influence by approximating the joint Q-value based on the individual Q-values.
I establish Q-DPP, an extension of the Determinantal Point Process (DPP) with partition constraints, and apply it to multi-agent learning as a function approximator for the centralized value function. An attractive property of Q-DPP is that when it reaches the optimum value, it offers a natural factorization of the centralized value function, representing both quality (maximizing reward) and diversity (different behaviors). In the third part of the thesis, I depart from action-level mutual influence and build a policy-space meta-game to analyze the relationships between agents' adaptive policies. I present a Multi-Agent Trust Region Learning (MATRL) algorithm that augments single-agent trust region policy optimization with a weak stable fixed point approximated by the policy-space meta-game. The algorithm aims to find a game-theoretic mechanism for adjusting the policy optimization steps so that the learning of all agents is driven toward the stable point.

    On the Combination of Game-Theoretic Learning and Multi Model Adaptive Filters

    This paper casts the coordination of a team of robots within the framework of game-theoretic learning algorithms. In particular, a novel variant of fictitious play is proposed that uses multi-model adaptive filters to estimate other players' strategies. The proposed algorithm can be used as a coordination mechanism between players when they must make decisions under uncertainty. Each player chooses an action after taking into account the actions of the other players as well as the uncertainty. Uncertainty can arise either from noisy observations or from the presence of various types of other players. In addition, in contrast to other game-theoretic and heuristic algorithms for distributed optimisation, it is not necessary to find the optimal parameters a priori. Various parameter values can be used initially as inputs to different models, so the resulting decisions aggregate the results of all the parameter values. Simulations are used to test the performance of the proposed methodology against other game-theoretic learning algorithms.
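
For orientation, classical fictitious play, the baseline that the paper's variant builds on, can be sketched as below. This is not the proposed algorithm: the multi-model adaptive filter is replaced here by plain empirical action counts, and the two-player setting, the payoff matrices, and the uniform initial counts are assumptions chosen for illustration.

```python
def best_response(payoff, opponent_counts):
    """Return the action index maximizing expected payoff against the
    opponent's empirical action frequencies.

    payoff[i][j] is this player's payoff for own action i and opponent action j.
    Ties are broken in favour of the lowest action index.
    """
    total = sum(opponent_counts)
    beliefs = [c / total for c in opponent_counts]
    expected = [sum(p * b for p, b in zip(row, beliefs)) for row in payoff]
    return max(range(len(expected)), key=expected.__getitem__)


def fictitious_play(payoff_a, payoff_b, rounds=50):
    """Run two-player fictitious play and return the list of joint actions.

    Each player keeps counts of the other's past actions (initialized to 1,
    an assumed uniform prior) and best-responds to the implied frequencies.
    """
    counts_a = [1] * len(payoff_a)  # how often A has played each action
    counts_b = [1] * len(payoff_b)  # how often B has played each action
    history = []
    for _ in range(rounds):
        a = best_response(payoff_a, counts_b)
        b = best_response(payoff_b, counts_a)
        counts_a[a] += 1
        counts_b[b] += 1
        history.append((a, b))
    return history


# Hypothetical 2x2 coordination game: both players prefer matching on action 0.
coordination = [[2, 0], [0, 1]]
joint_actions = fictitious_play(coordination, coordination, rounds=10)
```

The count-based belief is the component that the abstract says is replaced by multi-model adaptive filters, so that several parameter settings can be tracked at once rather than tuned beforehand.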

    Structures for Sophisticated Behaviour: Feudal Hierarchies and World Models

    This thesis explores structured, reward-based behaviour in artificial agents and in animals. In Part I we investigate how reinforcement learning agents can learn to cooperate. Drawing inspiration from the hierarchical organisation of human societies, we propose the framework of Feudal Multi-agent Hierarchies (FMH), in which coordination of many agents is facilitated by a manager agent. We outline the structure of FMH and demonstrate its potential for decentralised learning and control. We show that, given an adequate set of subgoals from which to choose, FMH performs, and particularly scales, substantially better than cooperative approaches that use shared rewards. We next investigate training FMH in simulation to solve a complex information gathering task. Our approach introduces a ‘Centralised Policy Actor-Critic’ (CPAC) and an alteration to the conventional multi-agent policy gradient, which allows one multi-agent system to advise the training of another. We further exploit this idea for communicating agents with shared rewards and demonstrate its efficacy. In Part II we examine how animals discover and exploit underlying statistical structure in their environments, even when such structure is difficult to learn and use. By analysing behavioural data from an extended experiment with rats, we show that such hidden structure can indeed be learned, but also that subjects suffer from imperfections in their ability to infer their current state. We account for their behaviour using a Hidden Markov Model, in which recent observations are integrated imperfectly with evidence from the past. We find that over the course of training, subjects learn to track their progress through the task more accurately, a change that our model largely attributes to the more reliable integration of past evidence.