A brief guide to multi-objective reinforcement learning and planning (JAAMAS track)
Real-world sequential decision-making tasks are usually complex, and require trade-offs between multiple - often conflicting - objectives. However, the majority of research in reinforcement learning (RL) and decision-theoretic planning assumes a single objective, or that multiple objectives can be handled via a predefined weighted sum over the objectives. Such approaches may oversimplify the underlying problem and produce suboptimal results. This extended abstract outlines the limitations of using a semi-blind iterative process to solve multi-objective decision-making problems. Our extended paper [4] serves as a guide for the application of explicitly multi-objective methods to difficult problems. © 2023 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.
A Data-driven Pricing Scheme for Optimal Routing through Artificial Currencies
Mobility systems often suffer from a high price of anarchy due to the uncontrolled behavior of selfish users. This may result in societal costs that are significantly higher than what could be achieved by a centralized system-optimal controller. Monetary tolling schemes can effectively align the behavior of selfish users with the system optimum. Yet they inevitably discriminate among the population in terms of income. Artificial currencies were recently presented as an effective alternative that can achieve the same performance, whilst guaranteeing fairness among the population. However, those studies were based on behavioral models that may differ from practical implementations. This paper presents a data-driven approach to automatically adapt artificial-currency tolls within repetitive-game settings. We first consider a parallel-arc setting whereby users commute on a daily basis from a unique origin to a unique destination, choosing a route in exchange for an artificial-currency price or reward while accounting for the impact of the other users' choices on travel discomfort. Second, we devise a model-based reinforcement learning controller that autonomously learns the optimal pricing policy by interacting with the proposed framework, using the closeness of the observed aggregate flows to a desired system-optimal distribution as a reward function. Our numerical results show that the proposed data-driven pricing scheme can effectively align the users' flows with the system optimum, significantly reducing the societal costs with respect to the uncontrolled flows (by about 15% and 25%, depending on the scenario), and respond to environmental changes in a robust and efficient manner.
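The repetitive-game idea above can be sketched in miniature. The network, cost functions, parameter values, and the simple feedback rule below are all hypothetical stand-ins for the paper's model-based RL controller; they only illustrate the loop of observing aggregate flows and adapting the artificial-currency price toward a desired split:

```python
import math

def logit_flows(price, n_users=1000, beta=0.1):
    """Aggregate flow on route 0 of a hypothetical two-route parallel arc.
    Route 0 charges `price` in artificial currency; route 1 pays it as a
    reward. Discomfort grows with flow; choices follow a logit rule."""
    f0 = n_users / 2
    for _ in range(100):  # damped fixed-point iteration to a steady flow
        c0 = 10 + 0.02 * f0 + price                # discomfort + price
        c1 = 15 + 0.02 * (n_users - f0) - price    # discomfort - reward
        p0 = 1 / (1 + math.exp(beta * (c0 - c1)))  # logit share of route 0
        f0 = 0.9 * f0 + 0.1 * n_users * p0
    return f0

def learn_price(target_f0, steps=200, lr=0.1):
    """Toy pricing controller: repeatedly observe aggregate flows and
    nudge the toll until they track a desired (assumed system-optimal)
    split. Stands in for the model-based RL policy of the paper."""
    price = 0.0
    for _ in range(steps):
        observed = logit_flows(price)
        price += lr * (observed - target_f0) / 100  # raise toll if over-used
    return price
```

With these assumed parameters, a target split of 400 users on the priced route is reached by a positive toll of a few currency units; the point is only the closed loop, not the numbers.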
Dynamic congestion pricing in within-day and day-to-day network equilibrium models
This dissertation explores two kinds of dynamic pricing models which react to within-day and day-to-day variation in traffic. Traffic patterns vary within each day due to supply-side uncertainty caused by non-recurring sources of congestion such as incidents, poor weather, and temporary bottlenecks. On the other hand, significant day-to-day variations in traffic patterns arise from the stochastic route choices of travelers who are not fully rational. Using slightly different assumptions, we analyze network performance in these two scenarios and demonstrate the advantages of dynamic pricing over static tolls. In both cases, traffic networks are characterized by a set of stochastic states. We seek optimal tolls that are a function of the network states, which evolve within each day or across days.
In the within-day equilibrium models, travelers are assumed to be completely rational and have knowledge of stochastic link-states, which have different delay functions. At every node, travelers observe the link-states of downstream links and select the next node to minimize their expected travel times. Collectively, such behavior leads to an equilibrium, which is also referred to as user equilibrium with recourse, in which all used routing policies have equal and minimal expected travel time. In this dissertation, we improve the system performance of the equilibrium flows using state-dependent marginal link tolls. These tolls address externalities associated with non-recurring congestion just as static marginal tolls in regular traffic assignment reflect externalities related to recurring congestion.
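The state-dependent marginal tolls described above follow the classical marginal-cost principle: charge each traveler the extra delay they impose on everyone else on the link. A minimal single-link sketch using the standard BPR delay function (parameter values are illustrative; the dissertation's tolls are computed per link-state rather than for a single deterministic link):

```python
def bpr_delay(x, t0=10.0, cap=1000.0, alpha=0.15, beta=4):
    """Standard BPR link delay: t(x) = t0 * (1 + alpha * (x/cap)^beta)."""
    return t0 * (1 + alpha * (x / cap) ** beta)

def marginal_toll(x, t0=10.0, cap=1000.0, alpha=0.15, beta=4):
    """Marginal-cost toll tau(x) = x * t'(x): the delay one additional
    traveler imposes on the x travelers already using the link."""
    dt_dx = t0 * alpha * beta * x ** (beta - 1) / cap ** beta
    return x * dt_dx
```

At capacity (x = 1000), delay is 10 * 1.15 = 11.5 and the marginal toll is 1000 * 0.006 = 6.0 time units converted to money.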
The set of tolls that improve system performance are not necessarily unique. Hence, in order to make the concept of tolling more acceptable to the public, we explore alternate pricing mechanisms that optimize social welfare and also collect the least amount of revenue in expectation. This minimum revenue toll model is formulated as a linear program whose inputs are derived from the solution to a novel reformulation of the user equilibrium with recourse problem.
We also study day-to-day dynamic models which, unlike traditional equilibrium approaches, capture the fluctuations or stochasticity in traffic due to route choice uncertainty. Travelers' decisions are modeled using route choice dynamics, such as the logit choice protocol, that depend on historic network conditions. The evolution of the system is modeled as a stochastic process and its steady state is used to characterize the network performance. The objective of pricing in this context is to set dynamic tolls that depend on the state of the network on previous day(s) such that the expected total system travel time is minimized. This problem is formulated as an average-cost Markov decision process. Approximation methods are suggested to improve computational tractability.
The day-to-day pricing models are extended to instances in which closed form dynamics are unavailable or unfit to represent travelers' choices. In such cases, we apply Q-learning in which the route choices may be simulated off-line or can be observed through experimentation in an online setting. The off-line methods were found to be promising and can be used in conjunction with complex discrete choice models that predict travel behavior with greater accuracy.
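A toy version of this day-to-day Q-learning setup can make the structure concrete. Everything below is hypothetical: a two-route network with logit day-to-day dynamics, yesterday's flow bin as the state, a small toll menu as the action set, and negative total travel time as the reward. A discounted criterion is used as a simple, common stand-in for the average-cost objective:

```python
import math
import random

random.seed(0)

TOLLS = [0.0, 1.0, 2.0, 3.0]  # hypothetical toll menu (actions)
N, BINS = 100, 10             # commuters, flow-state bins

def next_flow(flow0, toll, beta=0.5):
    """Logit day-to-day dynamics: each commuter picks route 0 with a
    probability driven by yesterday's travel times plus today's toll."""
    c0 = 10 + 0.1 * flow0 + toll
    c1 = 12 + 0.1 * (N - flow0)
    p0 = 1 / (1 + math.exp(beta * (c0 - c1)))
    return sum(random.random() < p0 for _ in range(N))

def total_time(flow0):
    """Total system travel time for a given route-0 flow."""
    return flow0 * (10 + 0.1 * flow0) + (N - flow0) * (12 + 0.1 * (N - flow0))

def q_learning(days=20000, alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular Q-learning over observed day-to-day transitions:
    state = yesterday's flow bin, reward = -total system travel time."""
    Q = [[0.0] * len(TOLLS) for _ in range(BINS)]
    s, flow = BINS // 2, N // 2
    for _ in range(days):
        a = random.randrange(len(TOLLS)) if random.random() < eps \
            else max(range(len(TOLLS)), key=lambda i: Q[s][i])
        flow = next_flow(flow, TOLLS[a])
        s2 = min(flow * BINS // (N + 1), BINS - 1)
        r = -total_time(flow)
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2
    return Q
```

The same loop works whether `next_flow` is a simulator (off-line) or observed field data (online experimentation), which is the distinction the abstract draws.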
Overall, the findings in this dissertation highlight the pitfalls of using static tolls in the presence of different types of stochasticity and make a strong case for employing dynamic state-dependent tolls to improve system efficiency.
Agent-Based Modeling and Simulation for the Bus-Corridor Problem in a Many-to-One Mass Transit System
With the growing problem of urban traffic congestion, departure time choice is becoming a more important factor for commuters. Using multiagent modeling and the Bush-Mosteller reinforcement learning model, we simulated the day-to-day evolution of commuters' departure time choice on a many-to-one mass transit system during the morning peak period. We first verified the model by comparison with traditional analytical methods, and then investigated the formation process of the departure time equilibrium. Given the validity of the model, some initial assumptions were relaxed and two groups of experiments were carried out considering commuters' heterogeneity and memory limitations. The results showed that heterogeneous commuters' departure time distribution is broader and has a lower peak at equilibrium, and that different people behave in different patterns. When each commuter has a limited memory, fluctuations exist in the evolutionary dynamics of the system, and hence an ideal equilibrium can hardly be reached. This research is helpful in acquiring a better understanding of commuters' departure time choice and commuting equilibrium during the peak period; the approach also provides an effective way to explore the formation and evolution of complicated traffic phenomena.
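The Bush-Mosteller scheme named above adjusts each agent's choice probability toward the chosen option after a positive payoff and away from it after a negative one. A minimal sketch for a binary departure-time choice (the learning rate and payoff scaling are hypothetical):

```python
def bush_mosteller(p, chosen, payoff, alpha=0.1):
    """One Bush-Mosteller update for a binary choice.
    p: current probability of option 0; payoff scaled to [-1, 1]."""
    pc = p if chosen == 0 else 1 - p  # probability of the chosen option
    if payoff >= 0:
        pc = pc + alpha * payoff * (1 - pc)  # reinforce toward 1
    else:
        pc = pc + alpha * payoff * pc        # inhibit toward 0
    return pc if chosen == 0 else 1 - pc
```

Iterating this update across commuters and days, with payoffs derived from experienced crowding and schedule delay, yields the day-to-day evolution the abstract simulates.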
Harnessing Big Data for the Sharing Economy in Smart Cities
Motivated by the imbalance between demand (i.e., passenger requests) and supply (i.e., available vehicles) in the ride-hailing market and severe traffic congestion faced by modern cities, this dissertation aims to improve the efficiency of the sharing economy by building an agent-based methodological framework for optimal decision-making of distributed agents (e.g., autonomous shared vehicles), including passenger-seeking and route choice. Furthermore, noticing that city planners can impact the behavior of agents via some operational measures such as congestion pricing and signal control, this dissertation investigates the overall bilevel problem that involves the decision-making process of both distributed agents (i.e., the lower level) and central city planners (i.e., the upper level).
First of all, for the task of passenger-seeking, this dissertation proposes a model-based Markov decision process (MDP) approach to incorporate distinct features of e-hailing drivers. The modified MDP approach is found to outperform the baseline (i.e., the local hotspot strategy) in terms of both the rate of return and the utilization rate. Although the modified MDP approach is set up in the single-agent setting, we extend its applicability to multi-agent scenarios by a dynamic adjustment strategy of the order matching probability, which is able to partially capture the competition among agents. Furthermore, noticing that the reward function is commonly assumed to be known a priori, this dissertation unveils the underlying reward function of the overall e-hailing driver population (i.e., 44,000 Didi drivers in Beijing) through an inverse reinforcement learning method, which paves the way for future research on discovering the underlying reward mechanism in a complex and dynamic ride-hailing market.
To better incorporate the competition among agents, this dissertation develops a model-free mean-field multi-agent actor-critic algorithm for multi-driver passenger-seeking. A bilevel optimization model is then formulated with the upper level as a reward design mechanism and the lower level as a multi-agent system. We use the developed mean-field multi-agent actor-critic algorithm to solve for the optimal passenger-seeking policies of distributed agents in the lower level and Bayesian optimization to solve for the optimal control of upper-level city planners. The bilevel optimization model is applied to a real-world large-scale multi-class taxi driver repositioning task with congestion pricing as the upper-level control. Results show that the derived optimal toll charge efficiently improves the objective of city planners.
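The bilevel structure described above can be sketched in miniature. Here a hypothetical closed-form lower level stands in for the multi-agent system (which in the dissertation is solved by the mean-field actor-critic), and a plain search over candidate tolls stands in for Bayesian optimization; both share the same evaluate-and-optimize structure:

```python
def lower_level(toll):
    """Stand-in for the lower-level multi-agent system: returns the
    planner's objective (e.g., negative total travel time) under a
    given toll. A real instance would run the learned repositioning
    policies to equilibrium; this closed form is purely illustrative."""
    return 10.0 - (toll - 2.5) ** 2  # assumed optimum at toll = 2.5

def upper_level_search(candidates):
    """Planner's loop: evaluate each candidate control and keep the
    best. Bayesian optimization chooses candidates adaptively instead
    of exhaustively, but the bilevel coupling is the same."""
    return max(candidates, key=lower_level)
```

Each upper-level evaluation is expensive because it requires solving the lower level to convergence, which is precisely why a sample-efficient method such as Bayesian optimization is used in the dissertation.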
With agents knowing where to go (i.e., passenger-seeking), this dissertation then applies the bilevel optimization model to the research question of how to get there (i.e., route choice). Different from the task of passenger-seeking, where the action space is always fixed-dimensional, the problem of a variable action set emerges in the task of route choice. Therefore, a flow-dependent deep Q-learning algorithm is proposed to efficiently derive the optimal policies for multi-commodity multi-class agents. We demonstrate the effect of two countermeasures, namely tolling and signal control, on the behavior of travelers and show that the systematic objective of city planners can be optimized by a proper control.
A practical guide to multi-objective reinforcement learning and planning
Real-world sequential decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying problem and hence produce suboptimal results. This paper serves as a guide to the application of multi-objective methods to difficult problems, and is aimed at researchers who are already familiar with single-objective reinforcement learning and planning methods who wish to adopt a multi-objective perspective on their research, as well as practitioners who encounter multi-objective decision problems in practice. It identifies the factors that may influence the nature of the desired solution, and illustrates by example how these influence the design of multi-objective decision-making systems for complex problems. © 2022, The Author(s).