10,878 research outputs found

    Distributed Adaptive Reinforcement Learning: A Method for Optimal Routing

    Full text link
    In this paper, a learning-based optimal transportation algorithm for autonomous taxis and ridesharing vehicles is presented. The goal is to design a mechanism that solves the routing problem for multiple autonomous vehicles and multiple customers in order to maximize the transportation company's profit. As a result, each vehicle selects the customer whose request maximizes the company's profit in the long run. To solve this problem, the system is modeled as a Markov Decision Process (MDP) using past customer data. Solving the defined MDP yields a centralized high-level planning recommendation, and this offline solution is used as an initial value for real-time learning. Then, a distributed SARSA reinforcement learning algorithm is proposed to capture model errors and environment changes, such as variations in customer distributions in each area, traffic, and fares, thereby providing optimal routing policies in real time. Vehicles, or agents, use only their local information and interaction, such as current passenger requests and estimates of neighbors' tasks and their optimal actions, to obtain the optimal policies in a distributed fashion. An optimal adaptive rate is introduced to make the distributed SARSA algorithm capable of adapting to changes in the environment and tracking the time-varying optimal policies. Furthermore, a game-theory-based task assignment algorithm is proposed, in which each agent uses the optimal policies and their values from distributed SARSA to select its customer from the set of locally available requests in a distributed manner. Finally, customer data provided by the City of Chicago are used to validate the proposed algorithms.
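
    The core on-policy update here is SARSA with a learning rate that decays but never vanishes, so value estimates can keep tracking a drifting environment. Below is a minimal tabular sketch; the state/action sizes, the rate schedule adaptive_rate, and the exploration constant are illustrative assumptions, not the paper's exact design.

        import numpy as np

        # Minimal tabular SARSA with a time-varying (adaptive) learning rate.
        # State/action encoding and constants are assumptions for illustration.
        n_states, n_actions = 100, 5
        Q = np.zeros((n_states, n_actions))
        gamma, eps = 0.95, 0.1

        def adaptive_rate(t, c=1.0):
            # A generic decaying-but-bounded rate; the paper derives an
            # optimal adaptive rate so agents can track time-varying policies.
            return max(c / (1.0 + 0.01 * t), 0.05)

        def eps_greedy(s, rng):
            if rng.random() < eps:
                return int(rng.integers(n_actions))
            return int(np.argmax(Q[s]))

        def sarsa_step(s, a, r, s_next, a_next, t):
            # On-policy TD update: target uses the action actually taken next.
            alpha = adaptive_rate(t)
            td_error = r + gamma * Q[s_next, a_next] - Q[s, a]
            Q[s, a] += alpha * td_error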

    Scaling Configuration of Energy Harvesting Sensors with Reinforcement Learning

    Full text link
    With the advent of the Internet of Things (IoT), an increasing number of energy harvesting methods are being used to supplement or supplant battery-based sensors. Energy harvesting sensors need to be configured according to the application, hardware, and environmental conditions to maximize their usefulness. Today, sensor configuration is either manual or based on heuristics, requiring valuable domain expertise. Reinforcement learning (RL) is a promising approach to automate configuration and efficiently scale IoT deployments, but it has not yet been adopted in practice. We propose solutions to bridge this gap: shorten the training phase of RL so that nodes are operational within a short time after deployment, and reduce the computational requirements to scale to large deployments. We focus on configuring the sampling rate of indoor solar-panel-based energy harvesting sensors. We created a simulator based on 3 months of data collected from 5 sensor nodes subject to different lighting conditions. Our simulation results show that RL can effectively learn energy availability patterns and configure the sampling rate of the sensor nodes to maximize the sensing data while ensuring that energy storage is not depleted. The nodes can be operational within the first day by using our methods. We show that it is possible to reduce the number of RL policies by using a single policy for nodes that share similar lighting conditions. Comment: 7 pages, 5 figures
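
    A hypothetical sketch of the kind of agent this describes: tabular Q-learning that picks a sampling rate from a discrete set, given a coarsely binned battery level and hour of day. The state discretization, rate set, and reward shaping are assumptions for illustration; the paper's exact formulation may differ.

        import numpy as np

        rates_hz = [0.1, 0.5, 1.0, 2.0]          # candidate sampling rates
        n_battery_bins, n_hours = 10, 24
        Q = np.zeros((n_battery_bins, n_hours, len(rates_hz)))
        alpha, gamma = 0.1, 0.9

        def reward(samples_taken, battery_depleted):
            # Reward more sensing data; heavily penalize draining storage.
            return samples_taken - (100.0 if battery_depleted else 0.0)

        def update(state, action, r, next_state):
            # One Q-learning step over the (battery bin, hour) state.
            b, h = state
            nb, nh = next_state
            target = r + gamma * Q[nb, nh].max()
            Q[b, h, action] += alpha * (target - Q[b, h, action])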

    A unified decision making framework for supply and demand management in microgrid networks

    Full text link
    This paper considers two important problems, one on the supply side and one on the demand side, and studies both in a unified framework. On the supply side, we study the problem of energy sharing among microgrids, with the goal of maximizing the profit obtained from selling power while not deviating much from customer demand. Under a shortage of power, the problem instead becomes one of deciding the amount of power to be bought under dynamically varying prices. On the demand side, we consider the problem of optimally scheduling time-adjustable demand, i.e., loads with flexible time windows in which they can be scheduled. While previous works have treated these two problems in isolation, we combine them and provide a unified Markov decision process (MDP) framework. We then apply the Q-learning algorithm, a popular model-free reinforcement learning technique, to obtain the optimal policy. Through simulations, we show that the policy obtained by solving our MDP model provides more profit to the microgrids.
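
    An illustrative Q-learning loop for a unified microgrid MDP of this shape. The state (net-surplus bin, price bin), the joint action set (sell surplus, buy power, defer a flexible load), and the placeholder environment are assumptions made for this sketch, not the paper's model.

        import numpy as np

        rng = np.random.default_rng(0)
        n_surplus_bins, n_price_bins = 8, 5
        actions = ["sell", "buy", "defer_load"]
        Q = np.zeros((n_surplus_bins, n_price_bins, len(actions)))
        alpha, gamma, eps = 0.1, 0.95, 0.1

        def step(state, action):
            # Placeholder environment: a real simulator would model prices,
            # demand, and load deadlines. Returns (reward, next_state).
            next_state = (int(rng.integers(n_surplus_bins)),
                          int(rng.integers(n_price_bins)))
            return float(rng.normal()), next_state

        state = (0, 0)
        for t in range(10_000):
            s, p = state
            a = int(rng.integers(len(actions))) if rng.random() < eps \
                else int(np.argmax(Q[s, p]))
            r, (s2, p2) = step(state, a)
            Q[s, p, a] += alpha * (r + gamma * Q[s2, p2].max() - Q[s, p, a])
            state = (s2, p2)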

    AdaPool: A Diurnal-Adaptive Fleet Management Framework using Model-Free Deep Reinforcement Learning and Change Point Detection

    Full text link
    This paper introduces an adaptive, model-free deep reinforcement learning approach that can recognize and adapt to diurnal patterns in a ride-sharing environment with car-pooling. Deep reinforcement learning (RL) suffers from catastrophic forgetting because it is agnostic to the timescale of changes in the distribution of experiences. Although RL algorithms are guaranteed to converge to optimal policies in Markov decision processes (MDPs), this guarantee holds only for static environments, a very restrictive assumption. In many real-world problems such as ride-sharing and traffic control, we are dealing with highly dynamic environments, where RL methods yield only sub-optimal decisions. To mitigate this problem in highly dynamic environments, we (1) adopt an online Dirichlet change point detection (ODCP) algorithm to detect changes in the distribution of experiences, and (2) develop a Deep Q-Network (DQN) agent that is capable of recognizing diurnal patterns and making informed dispatching decisions according to the changes in the underlying environment. Rather than fixing patterns by time of week, the proposed approach automatically detects that the MDP has changed and uses the results of the new model. In addition to the adaptation logic in dispatching, this paper also proposes a dynamic, demand-aware vehicle-passenger matching and route planning framework that generates optimal routes for each vehicle based on online demand, vehicle capacities, and locations. Evaluation on the New York City Taxi public dataset shows the effectiveness of our approach in improving fleet utilization, where less than 50% of the fleet is utilized to serve up to 90% of the requests, while maximizing profits and minimizing idle times. Comment: arXiv admin note: text overlap with arXiv:2010.0175
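
    A control-flow sketch of change-point-triggered adaptation as described above. The detector interface and the per-regime model pool are illustrative assumptions; ODCP itself and the DQN architecture are not reproduced here, and a fuller version might match a detected regime against previously seen ones instead of always starting fresh.

        class ChangePointDetector:
            def update(self, experience) -> bool:
                """Return True when the experience distribution shifts."""
                raise NotImplementedError  # stand-in for online Dirichlet CPD

        def run(env, make_dqn, detector, n_steps):
            models = [make_dqn()]          # one model per detected regime
            current = models[0]
            obs = env.reset()
            for _ in range(n_steps):
                action = current.act(obs)
                next_obs, reward, done = env.step(action)
                exp = (obs, action, reward, next_obs)
                if detector.update(exp):
                    # Distribution shift: switch to a model for the new regime.
                    models.append(make_dqn())
                    current = models[-1]
                current.train_on(exp)
                obs = env.reset() if done else next_obs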

    Discriminative Data-driven Self-adaptive Fraud Control Decision System with Incomplete Information

    Full text link
    While E-commerce has been growing explosively and online shopping has become popular and even dominant in the present era, online transaction fraud control has drawn considerable attention in business practice and academic research. Conventional fraud control considers mainly the interactions of the two major decision parties involved, i.e., merchants and fraudsters, to make fraud classification decisions, without paying much attention to the dynamic looping effects arising from decisions made by other profit-related parties. This paper proposes a novel fraud control framework that can quantify the interactive effects of decisions made by different parties and can adjust fraud control strategies using data analytics, artificial intelligence, and dynamic optimization techniques. Three control models, Naive, Myopic, and Prospective Control, were developed based on the availability of data attributes and the level of label maturity. The proposed models are purely data-driven and self-adaptive in real time. A field test on Microsoft's real online transaction data suggested that the new systems could sizably improve the company's profit.

    CoRide: Joint Order Dispatching and Fleet Management for Multi-Scale Ride-Hailing Platforms

    Get PDF
    How to optimally dispatch orders to vehicles and how to trade off immediate against future returns are fundamental questions for a typical ride-hailing platform. We model ride-hailing as a large-scale parallel ranking problem and study the joint decision-making task of order dispatching and fleet management in online ride-hailing platforms. This task brings unique challenges in four aspects. First, to allow a huge number of vehicles to act and learn efficiently and robustly, we treat each region cell as an agent and build a multi-agent reinforcement learning framework. Second, to coordinate agents from different regions toward long-term benefits, we leverage the geographical hierarchy of the region grids to perform hierarchical reinforcement learning. Third, to deal with the heterogeneous and variable action space of joint order dispatching and fleet management, we design the action as a ranking weight vector used to rank and select a specific order or a fleet management destination in a unified formulation. Fourth, to support multi-scale ride-hailing platforms, we conduct the decision-making process hierarchically, where a multi-head attention mechanism is utilized to incorporate the impacts of neighboring agents and capture the key agent at each scale. The novel framework is named CoRide. Extensive experiments based on real-world data from multiple cities, as well as analytic synthetic data, demonstrate that CoRide provides superior performance in terms of platform revenue and user experience in the task of city-wide hybrid order dispatching and fleet management over strong baselines. Comment: CIKM 201
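
    The unifying idea in the third point is that one action type, a ranking weight vector, covers both order dispatching and repositioning. A minimal sketch, with assumed feature names and dimensions: each candidate (an order or a fleet-management destination) gets a feature vector, and the agent's emitted weights rank them.

        import numpy as np

        def rank_candidates(w, candidate_features):
            """Score candidates by dot product; return indices, best first."""
            scores = candidate_features @ w
            return np.argsort(-scores)

        # Example: 4 candidates with 3 features each
        # (e.g., price, ETA, match quality -- assumed for illustration).
        w = np.array([1.0, -0.5, 0.8])        # action produced by the policy
        feats = np.random.default_rng(0).normal(size=(4, 3))
        order = rank_candidates(w, feats)
        chosen = order[0]                      # dispatch/reposition choice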

    Reinforcement Learning-based Application Autoscaling in the Cloud: A Survey

    Full text link
    Reinforcement Learning (RL) has demonstrated great potential for automatically solving decision-making problems in complex, uncertain environments. RL offers a computational approach to learning through interaction in an environment with stochastic behavior, where agents take actions to maximize cumulative short-term and long-term rewards. Some of the most impressive results have been achieved in games such as Go and Starcraft 2, where agents exhibited superhuman performance, which led to RL's gradual adoption in many other domains, including Cloud Computing. RL therefore appears to be a promising approach for autoscaling in the Cloud, since it can learn resource management policies that are transparent (no human intervention), dynamic (no static plans), and adaptable (constantly updated). These are three important distinctive aspects compared with other widely used autoscaling policies that are defined in an ad-hoc way or statically computed, as in solutions based on meta-heuristics. Autoscaling exploits Cloud elasticity to optimize the execution of applications according to given optimization criteria, which requires deciding when and how to scale computational resources up or down, and how to assign them to the upcoming processing workload. Such actions have to be taken considering that the Cloud is a dynamic and uncertain environment. Motivated by this, many works apply RL to the autoscaling problem in the Cloud. In this work, we exhaustively survey those proposals from major venues and uniformly compare them based on a set of proposed taxonomies. We also discuss open problems and prospective research in the area. Comment: 40 pages, 9 figures
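
    To make the "when and how to scale" decision concrete, here is a minimal autoscaling MDP in the generic shape many surveyed works use: state = (binned workload, replica count), actions = scale down / hold / scale up, reward trading SLA violations against cost. All constants are illustrative assumptions, not taken from any specific surveyed paper.

        import numpy as np

        MAX_REPLICAS, LOAD_BINS = 10, 8
        ACTIONS = (-1, 0, +1)              # remove, keep, add one replica
        Q = np.zeros((LOAD_BINS, MAX_REPLICAS + 1, len(ACTIONS)))
        alpha, gamma = 0.1, 0.95

        def reward(load_bin, replicas):
            sla_penalty = max(0, load_bin - replicas)   # under-provisioning
            cost = 0.1 * replicas                       # pay per replica
            return -(sla_penalty + cost)

        def q_update(s, a_idx, r, s2):
            # One Q-learning step over (load bin, replica count) states.
            (l, n), (l2, n2) = s, s2
            Q[l, n, a_idx] += alpha * (r + gamma * Q[l2, n2].max()
                                       - Q[l, n, a_idx])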

    Group Behavior Learning in Multi-Agent Systems Based on Social Interaction Among Agents

    Get PDF
    Research on multi-agent systems in which autonomous agents learn cooperative behavior has been the subject of rising expectations in recent years. We aim to generate group behavior in multi-agent systems whose agents have a high level of autonomous learning ability, like that of human beings, acquiring cooperative behavior through social interaction between agents. Sharing environment states among agents can improve their cooperative ability, and including changes in the environment's state in the shared information improves it further. On this basis, we use reward redistribution among agents to reinforce group behavior, and we propose a method for constructing a multi-agent system with an autonomous group creation ability, which strengthens the cooperative behavior of the group as social agents.
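
    One simple way reward redistribution can work, sketched under assumptions of my own (the sharing fraction and the even split are not taken from the paper): each agent contributes a fraction of its individual reward to a pool that is divided equally, so individually successful behavior also reinforces the group.

        import numpy as np

        def redistribute(rewards, share=0.3):
            """Mix individual rewards with an evenly split shared pool."""
            rewards = np.asarray(rewards, dtype=float)
            pool = share * rewards.sum()
            return (1.0 - share) * rewards + pool / len(rewards)

        # Agent 0 earned most of the reward; after redistribution every agent
        # is partially reinforced, encouraging cooperative behavior.
        print(redistribute([10.0, 0.0, 2.0]))   # -> [8.2, 1.2, 2.6]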

    On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models

    Full text link
    This paper addresses the general problem of reinforcement learning (RL) in partially observable environments. In 2013, our large RL recurrent neural networks (RNNs) learned from scratch to drive simulated cars from high-dimensional video input. However, real brains are more powerful in many ways. In particular, they learn a predictive model of their initially unknown environment, and somehow use it for abstract (e.g., hierarchical) planning and reasoning. Guided by algorithmic information theory, we describe RNN-based AIs (RNNAIs) designed to do the same. Such an RNNAI can be trained on never-ending sequences of tasks, some of them provided by the user, others invented by the RNNAI itself in a curious, playful fashion, to improve its RNN-based world model. Unlike our previous model-building RNN-based RL machines dating back to 1990, the RNNAI learns to actively query its model for abstract reasoning and planning and decision making, essentially "learning to think." The basic ideas of this report can be applied to many other cases where one RNN-like system exploits the algorithmic information content of another. They are taken from a grant proposal submitted in Fall 2014, and also explain concepts such as "mirror neurons." Experimental results will be described in separate papers. Comment: 36 pages, 1 figure. arXiv admin note: substantial text overlap with arXiv:1404.782
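
    A conceptual sketch of a controller C querying a learned recurrent world model M: C feeds M candidate action sequences, M rolls each forward in latent space, and C keeps the sequence with the best predicted return. The interfaces below are assumptions for illustration; the paper's C/M coupling via algorithmic information theory is considerably richer than this rollout-based planner.

        import numpy as np

        class WorldModel:
            def init_state(self):
                return np.zeros(16)            # latent RNN state
            def step(self, h, action):
                # Stand-in for one RNN step:
                # returns (next latent state, predicted reward).
                h2 = np.tanh(h + action)
                return h2, float(h2.mean())

        def plan(model, h0, candidate_plans):
            """Pick the action sequence with the highest predicted return."""
            best_plan, best_ret = None, -np.inf
            for seq in candidate_plans:
                h, ret = h0, 0.0
                for a in seq:
                    h, r = model.step(h, a)
                    ret += r
                if ret > best_ret:
                    best_plan, best_ret = seq, ret
            return best_plan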