Distributed Adaptive Reinforcement Learning: A Method for Optimal Routing
In this paper, a learning-based optimal transportation algorithm for
autonomous taxis and ridesharing vehicles is presented. The goal is to design a
mechanism to solve the routing problem for multiple autonomous vehicles and
multiple customers in order to maximize the transportation company's profit. As
a result, each vehicle selects the customer whose request maximizes the
company's profit in the long run. To solve this problem, the system is modeled
as a Markov Decision Process (MDP) using past customer data. By solving the
defined MDP, a centralized high-level planning recommendation is obtained,
where this offline solution is used as an initial value for the real-time
learning. Then, a distributed SARSA reinforcement learning algorithm is
proposed to capture the model errors and the environment changes, such as
variations in customer distributions in each area, traffic, and fares, thereby
providing optimal routing policies in real-time. Vehicles, or agents, use only
their local information and interaction, such as current passenger requests and
estimates of neighbors' tasks and their optimal actions, to obtain the optimal
policies in a distributed fashion. An optimal adaptive rate is introduced to
make the distributed SARSA algorithm capable of adapting to changes in the
environment and tracking the time-varying optimal policies. Furthermore, a
game-theory-based task assignment algorithm is proposed, where each agent uses
the optimal policies and their values from distributed SARSA to select its
customer from the set of local available requests in a distributed manner.
Finally, the customer data provided by the city of Chicago is used to validate
the proposed algorithms.
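The distributed SARSA algorithm above builds on the standard tabular SARSA update. A minimal single-agent sketch (the distributed, multi-vehicle machinery and the adaptive rate from the paper are omitted; all names and values are illustrative):

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.95):
    """One tabular SARSA step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    td_target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])

# Toy example: a vehicle in zone 0 accepts request 1 and earns a fare of 1.0.
Q = defaultdict(float)
sarsa_update(Q, s=0, a=1, r=1.0, s_next=1, a_next=0)
```

The paper replaces the fixed `alpha` with an optimal adaptive rate so the value estimates can keep tracking a time-varying environment.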
Scaling Configuration of Energy Harvesting Sensors with Reinforcement Learning
With the advent of the Internet of Things (IoT), an increasing number of
energy harvesting methods are being used to supplement or supplant
battery-based sensors. Energy harvesting sensors need to be configured according to the
application, hardware, and environmental conditions to maximize their
usefulness. As of today, the configuration of sensors is either manual or
heuristics based, requiring valuable domain expertise. Reinforcement learning
(RL) is a promising approach to automate configuration and efficiently scale
IoT deployments, but it is not yet adopted in practice. We propose solutions to
bridge this gap: reduce the training phase of RL so that nodes are operational
within a short time after deployment and reduce the computational requirements
to scale to large deployments. We focus on configuration of the sampling rate
of indoor solar panel based energy harvesting sensors. We created a simulator
based on 3 months of data collected from 5 sensor nodes subject to different
lighting conditions. Our simulation results show that RL can effectively learn
energy availability patterns and configure the sampling rate of the sensor
nodes to maximize the sensing data while ensuring that energy storage is not
depleted. The nodes can be operational within the first day by using our
methods. We show that it is possible to reduce the number of RL policies by
using a single policy for nodes that share similar lighting conditions.
Comment: 7 pages, 5 figures
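A rough illustration of how such a configuration agent might look, as a hedged tabular sketch: the state is a discretized energy-storage level, the action is a sampling rate, and the reward is the number of samples taken (a depletion penalty would be added in practice). The rate set and all names are assumptions, not the paper's implementation:

```python
RATES = (1, 5, 10)  # candidate sampling rates (samples/hour) -- illustrative

def q_update(Q, storage, rate, reward, storage_next, alpha=0.2, gamma=0.9):
    """Tabular Q-learning update for the sampling-rate configuration agent."""
    best_next = max(Q.get((storage_next, r), 0.0) for r in RATES)
    q = Q.get((storage, rate), 0.0)
    Q[(storage, rate)] = q + alpha * (reward + gamma * best_next - q)

def best_rate(Q, storage):
    """Greedy sampling rate for a given (discretized) storage level."""
    return max(RATES, key=lambda r: Q.get((storage, r), 0.0))

Q = {}
q_update(Q, storage=2, rate=5, reward=5.0, storage_next=2)
```

Sharing one such Q-table across nodes with similar lighting conditions is what lets the number of policies shrink as the deployment scales.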
A unified decision making framework for supply and demand management in microgrid networks
This paper considers two important problems -- one on the supply side and one on
the demand side -- and studies both in a unified framework. On the supply
side, we study the problem of energy sharing among microgrids with the goal of
maximizing profit obtained from selling power while at the same time not
deviating much from the customer demand. On the other hand, under shortage of
power, this problem becomes one of deciding the amount of power to be bought
with dynamically varying prices. On the demand side, we consider the problem of
optimally scheduling time-adjustable demand, i.e., loads with flexible
time windows in which they can be scheduled. While previous works have treated
these two problems in isolation, we combine these problems together and provide
a unified Markov decision process (MDP) framework for these problems. We then
apply the Q-learning algorithm, a popular model-free reinforcement learning
technique, to obtain the optimal policy. Through simulations, we show that the
policy obtained by solving our MDP model provides more profit to the
microgrids.
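One way to picture the unified framework is a single MDP whose state carries both supply- and demand-side information and whose action set mixes the two kinds of decisions. The sketch below is an assumed illustration, not the paper's exact formulation; all field and action names are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GridState:
    """Joint supply/demand state (discretized; field names are illustrative)."""
    surplus_kwh: int    # net power surplus; negative means a shortage
    price_level: int    # discretized market price
    pending_loads: int  # time-adjustable loads awaiting a schedule

ACTIONS = ("sell", "buy", "schedule_load", "defer_load")

def greedy_action(Q, state):
    """Pick the action with the highest learned Q-value for this joint state."""
    return max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))

s = GridState(surplus_kwh=2, price_level=1, pending_loads=3)
Q = {(s, "sell"): 1.5}  # e.g. learned: with a surplus and a decent price, sell
action = greedy_action(Q, s)
```

Because supply and demand decisions live in one action set, a single Q-learning agent can trade them off rather than optimizing each side in isolation.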
AdaPool: A Diurnal-Adaptive Fleet Management Framework using Model-Free Deep Reinforcement Learning and Change Point Detection
This paper introduces an adaptive model-free deep reinforcement learning approach
that can recognize and adapt to the diurnal patterns in the ride-sharing environment
with car-pooling. Deep Reinforcement Learning (RL) suffers from catastrophic
forgetting due to being agnostic to the timescale of changes in the
distribution of experiences. Although RL algorithms are guaranteed to converge
to optimal policies in Markov decision processes (MDPs), this only holds in the
presence of static environments. However, this assumption is very restrictive.
In many real-world problems like ride-sharing, traffic control, etc., we are
dealing with highly dynamic environments, where RL methods yield only
sub-optimal decisions. To mitigate this problem in highly dynamic environments,
we (1) adopt an online Dirichlet change point detection (ODCP) algorithm to
detect the changes in the distribution of experiences, (2) develop a Deep Q
Network (DQN) agent that is capable of recognizing diurnal patterns and making
informed dispatching decisions according to the changes in the underlying
environment. Rather than fixing patterns by time of week, the proposed approach
automatically detects that the MDP has changed, and uses the results of the new
model. In addition to the adaptation logic in dispatching, this paper also
proposes a dynamic, demand-aware vehicle-passenger matching and route planning
framework that dynamically generates optimal routes for each vehicle based on
online demand, vehicle capacities, and locations. Evaluation on New York City
Taxi public dataset shows the effectiveness of our approach in improving the
fleet utilization, where less than 50% of the fleet is utilized to serve up to
90% of the requests, while maximizing profits and minimizing idle times.
Comment: arXiv admin note: text overlap with arXiv:2010.0175
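The change-point idea can be illustrated with a much simpler stand-in for ODCP: compare the reward statistics of a recent window of experiences against an older one and flag a regime change when they drift apart. This is a crude, assumption-laden sketch, not the paper's Dirichlet-based detector:

```python
import statistics

def change_detected(window_old, window_new, threshold=1.0):
    """Flag a change when the mean of the new experience window drifts from the
    old one by more than `threshold` pooled standard deviations (illustrative)."""
    mu_old = statistics.mean(window_old)
    mu_new = statistics.mean(window_new)
    sd = statistics.pstdev(list(window_old) + list(window_new)) or 1e-9
    return abs(mu_new - mu_old) / sd > threshold

# On a flagged change, the agent would switch to (or retrain) a model for the
# new regime instead of continuing to act on stale Q-values.
shifted = change_detected([1.0, 1.1, 0.9, 1.0], [5.0, 5.2, 4.9, 5.1])
stable = change_detected([1.0, 1.1, 0.9, 1.0], [1.0, 0.9, 1.1, 1.0])
```

The detector's only job is to decide *when* the MDP has changed; the DQN agent then handles *what* to do under the new regime.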
Discriminative Data-driven Self-adaptive Fraud Control Decision System with Incomplete Information
While E-commerce has been growing explosively and online shopping has become
popular and even dominant in the present era, online transaction fraud control
has drawn considerable attention in business practice and academic research.
Conventional fraud control considers mainly the interactions of two major
involved decision parties, i.e. merchants and fraudsters, to make fraud
classification decisions, without paying much attention to the dynamic looping effects
arising from the decisions made by other profit-related parties. This paper
proposes a novel fraud control framework that can quantify interactive effects
of decisions made by different parties and can adjust fraud control strategies
using data analytics, artificial intelligence, and dynamic optimization
techniques. Three control models, Naive, Myopic and Prospective Controls, were
developed based on the availability of data attributes and levels of label
maturity. The proposed models are purely data-driven and self-adaptive in a
real-time manner. A field test on real Microsoft online transaction data
suggested that the new systems could sizably improve the company's profit.
CoRide: Joint Order Dispatching and Fleet Management for Multi-Scale Ride-Hailing Platforms
How to optimally dispatch orders to vehicles and how to trade off between
immediate and future returns are fundamental questions for a typical
ride-hailing platform. We model ride-hailing as a large-scale parallel ranking
problem and study the joint decision-making task of order dispatching and fleet
management in online ride-hailing platforms. This task brings unique challenges
in the following four aspects. First, to facilitate a huge number of vehicles
to act and learn efficiently and robustly, we treat each region cell as an
agent and build a multi-agent reinforcement learning framework. Second, to
coordinate the agents from different regions to achieve long-term benefits, we
leverage the geographical hierarchy of the region grids to perform hierarchical
reinforcement learning. Third, to deal with the heterogeneous and variant
action space for joint order dispatching and fleet management, we design the
action as the ranking weight vector to rank and select the specific order or
the fleet management destination in a unified formulation. Fourth, to support
the multi-scale ride-hailing platform, we conduct the decision-making process
hierarchically, using a multi-head attention mechanism to incorporate the
impacts of neighboring agents and to capture the key agent at each scale. The
resulting framework is named CoRide. Extensive experiments based on real-world
data from multiple cities, as well as analytic synthetic data, demonstrate that
CoRide provides superior performance in terms of platform revenue and user
experience in the task of city-wide hybrid order dispatching and fleet
management over strong baselines.
Comment: CIKM 201
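The "action as a ranking weight vector" idea can be sketched as follows: the agent emits a weight vector, each candidate (an open order or a repositioning destination) is scored by a dot product with its features, and candidates are served best-first. The feature layout (fare, pickup distance, destination value) is an assumption for illustration:

```python
def rank_candidates(weights, candidates):
    """Sort (name, features) candidates best-first by dot(weights, features)."""
    def score(item):
        _, features = item
        return sum(w * f for w, f in zip(weights, features))
    return sorted(candidates, key=score, reverse=True)

# Weight vector favoring high fares and penalizing long pickup distances.
w = (1.0, -0.5, 0.2)   # (fare, pickup_distance, destination_value) -- assumed
orders = [("A", (10.0, 2.0, 1.0)), ("B", (8.0, 0.5, 3.0))]
ranked = rank_candidates(w, orders)
```

Because both order dispatching and fleet-management repositioning reduce to the same ranking operation, one action space covers both tasks, which is the point of the unified formulation.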
Reinforcement Learning-based Application Autoscaling in the Cloud: A Survey
Reinforcement Learning (RL) has demonstrated a great potential for
automatically solving decision-making problems in complex uncertain
environments. RL proposes a computational approach that allows learning through
interaction in an environment with stochastic behavior, where agents take
actions to maximize some cumulative short-term and long-term rewards. Some of
the most impressive results have been shown in game playing, where agents
exhibited superhuman performance in games like Go or StarCraft II, which led to
its gradual adoption in many other domains, including Cloud Computing.
Therefore, RL appears as a promising approach for Autoscaling in the Cloud since it
is possible to learn transparent (with no human intervention), dynamic (no
static plans), and adaptable (constantly updated) resource management policies
to execute applications. These are three important distinctive aspects to
consider in comparison with other widely used autoscaling policies that are
defined in an ad-hoc way or statically computed as in solutions based on
meta-heuristics. Autoscaling exploits the Cloud elasticity to optimize the
execution of applications according to given optimization criteria, which
demands deciding when and how to scale computational resources up or down, and
how to assign them to the upcoming processing workload. Such actions have to be
taken considering that the Cloud is a dynamic and uncertain environment.
Motivated by this, many works apply RL to the autoscaling problem in the Cloud.
In this work, we survey exhaustively those proposals from major venues, and
uniformly compare them based on a set of proposed taxonomies. We also discuss
open problems and prospective research in the area.
Comment: 40 pages, 9 figures
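A toy tabular sketch makes the autoscaling-as-RL framing concrete: the state pairs a discretized load level with the current replica count, and the actions scale the application up, down, or hold. All names and the Q-table contents are assumptions for illustration, not drawn from any surveyed system:

```python
import random

ACTIONS = (-1, 0, 1)  # remove a replica, hold, add a replica

def autoscale_step(Q, load_bucket, replicas, eps=0.1, rng=random):
    """Epsilon-greedy scaling decision from a learned Q-table."""
    state = (load_bucket, replicas)
    if rng.random() < eps:
        action = rng.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))
    return max(1, replicas + action)  # never scale below one replica

# Suppose the agent has learned that high load with 2 replicas calls for +1.
Q = {((3, 2), 1): 5.0}
replicas = autoscale_step(Q, load_bucket=3, replicas=2, eps=0.0)
```

The epsilon term is what makes such a policy "constantly updated" in the survey's sense: occasional exploration lets the agent keep adapting as the workload drifts.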
Group Behavior Learning in Multi-Agent Systems Based on Social Interaction Among Agents
Research on multi-agent systems in which autonomous agents learn cooperative
behavior has been the subject of rising expectations in recent years. We aim to
generate group behavior among multiple agents that have high levels of
autonomous learning ability, like that of human beings, acquiring cooperative
behavior through social interaction between agents. Sharing environment states
among agents can improve their cooperative ability, as can sharing the changing
state of the environment. On this basis, we use reward redistribution among
agents to reinforce group behavior, and we propose a method for constructing a
multi-agent system with an autonomous group creation ability, which strengthens
the cooperative behavior of the group as social agents.
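Reward redistribution can be illustrated with a minimal mixing rule: each agent keeps part of its individual reward and receives a share of the group mean, so cooperative outcomes are reinforced for every member. The mixing rule and the `share` parameter are illustrative assumptions, not the paper's exact scheme:

```python
def redistribute(rewards, share=0.5):
    """Mix each agent's reward with the group mean; `share` in [0, 1] is the
    weight on the group term (share=0 keeps rewards fully individual)."""
    group_mean = sum(rewards) / len(rewards)
    return [(1 - share) * r + share * group_mean for r in rewards]

# Agent 0 earned 4.0, agent 1 earned 0.0; after mixing, both share the success.
mixed = redistribute([4.0, 0.0], share=0.5)
```

Feeding the mixed rewards (rather than the raw individual ones) into each agent's learning update is what couples the agents' objectives and encourages group behavior.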
On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models
This paper addresses the general problem of reinforcement learning (RL) in
partially observable environments. In 2013, our large RL recurrent neural
networks (RNNs) learned from scratch to drive simulated cars from
high-dimensional video input. However, real brains are more powerful in many
ways. In particular, they learn a predictive model of their initially unknown
environment, and somehow use it for abstract (e.g., hierarchical) planning and
reasoning. Guided by algorithmic information theory, we describe RNN-based AIs
(RNNAIs) designed to do the same. Such an RNNAI can be trained on never-ending
sequences of tasks, some of them provided by the user, others invented by the
RNNAI itself in a curious, playful fashion, to improve its RNN-based world
model. Unlike our previous model-building RNN-based RL machines dating back to
1990, the RNNAI learns to actively query its model for abstract reasoning and
planning and decision making, essentially "learning to think." The basic ideas
of this report can be applied to many other cases where one RNN-like system
exploits the algorithmic information content of another. They are taken from a
grant proposal submitted in Fall 2014, and also explain concepts such as
"mirror neurons." Experimental results will be described in separate papers.
Comment: 36 pages, 1 figure. arXiv admin note: substantial text overlap with arXiv:1404.782