96 research outputs found
Manifold Representations for Continuous-State Reinforcement Learning
Reinforcement learning (RL) has shown itself to be an effective paradigm for solving optimal control problems with a finite number of states. Generalizing RL techniques to problems with a continuous state space has proven a difficult task. We present an approach to modeling the RL value function using a manifold representation. By explicitly modeling the topology of the value function domain, traditional problems with discontinuities and resolution can be addressed without resorting to complex function approximators. We describe how manifold techniques can be applied to value-function approximation, and present methods for constructing manifold representations in both batch and online settings. We present empirical results demonstrating the effectiveness of our approach.
Lipschitzness Is All You Need To Tame Off-policy Generative Adversarial Imitation Learning
Despite the recent success of reinforcement learning in various domains,
these approaches remain, for the most part, deterringly sensitive to
hyper-parameters and are often riddled with essential engineering feats
allowing their success. We consider the case of off-policy generative
adversarial imitation learning, and perform an in-depth review, qualitative and
quantitative, of the method. We show that forcing the learned reward function
to be locally Lipschitz-continuous is a sine qua non condition for the method to
perform well. We then study the effects of this necessary condition and provide
several theoretical results involving the local Lipschitzness of the
state-value function. We complement these guarantees with empirical evidence
attesting to the strong positive effect that the consistent satisfaction of the
Lipschitzness constraint on the reward has on imitation performance. Finally,
we tackle a generic pessimistic reward preconditioning add-on spawning a large
class of reward shaping methods, which makes the base method it is plugged into
provably more robust, as shown in several additional theoretical guarantees. We
then discuss these through a fine-grained lens and share our insights.
Crucially, the guarantees derived and reported in this work are valid for any
reward satisfying the Lipschitzness condition; nothing is specific to
imitation. As such, these may be of independent interest.
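For reference, the standard textbook notion of local Lipschitz continuity that the abstract relies on can be written as follows. This is a sketch of the general definition only; the symbols $r$, $x$, $U_x$, and $L_x$ and the choice of norm are assumptions made for illustration, and the abstract does not state the exact formulation used in the paper.

```latex
% A function r : X -> R is locally Lipschitz-continuous if every point x
% has a neighborhood U_x and a constant L_x >= 0 (both may depend on x)
% such that the bound below holds for all y, z in U_x.
\[
  \forall x \in \mathcal{X},\ \exists\, U_x \ni x,\ \exists\, L_x \ge 0 :\quad
  |r(y) - r(z)| \le L_x \,\lVert y - z \rVert
  \quad \text{for all } y, z \in U_x .
\]
```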
Efficient Ridesharing Order Dispatching with Mean Field Multi-Agent Reinforcement Learning
A fundamental question in any peer-to-peer ridesharing system is how to, both
effectively and efficiently, dispatch users' ride requests to the right driver
in real time. Traditional rule-based solutions usually work on a simplified
problem setting, which requires a sophisticated hand-crafted weight design for
either centralized authority control or decentralized multi-agent scheduling
systems. Although recent approaches have used reinforcement learning to provide
centralized combinatorial optimization algorithms with informative weight
values, their single-agent setting can hardly model the complex interactions
between drivers and orders. In this paper, we address the order dispatching
problem using multi-agent reinforcement learning (MARL), which follows the
distributed nature of the peer-to-peer ridesharing problem and possesses the
ability to capture the stochastic demand-supply dynamics in large-scale
ridesharing scenarios. Being more reliable than centralized approaches, our
proposed MARL solutions could also support fully distributed execution through
recent advances in the Internet of Vehicles (IoV) and the Vehicle-to-Network
(V2N). Furthermore, we adopt the mean field approximation to simplify the local
interactions by taking an average action among neighborhoods. The mean field
approximation is capable of globally capturing dynamic demand-supply variations
by propagating many local interactions between agents and the environment. Our
extensive experiments have shown the significant improvements of MARL order
dispatching algorithms over several strong baselines on the gross merchandise
volume (GMV) and order response rate measures. Besides, the simulated
experiments with real data have also justified that our solution can alleviate
the supply-demand gap during the rush hours, thus possessing the capability of
reducing traffic congestion.
Comment: 11 pages, 9 figures
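The "average action among neighborhoods" mentioned in the abstract can be illustrated with a minimal sketch. It assumes discrete actions encoded as one-hot vectors and a uniform average over each agent's neighbors, which is a common way to set up a mean field approximation but is not confirmed by the abstract. The function name `mean_neighbor_action`, the agent identifiers, and the neighbor lists are all hypothetical.

```python
from typing import Dict, List


def mean_neighbor_action(
    actions: Dict[str, int],
    neighbors: Dict[str, List[str]],
    n_actions: int,
) -> Dict[str, List[float]]:
    """Average the one-hot encoded actions of each agent's neighbors.

    Assumption: uniform weighting over neighbors; an agent with no
    neighbors gets an all-zero mean action.
    """
    mean_actions: Dict[str, List[float]] = {}
    for agent, nbrs in neighbors.items():
        avg = [0.0] * n_actions
        for other in nbrs:
            # Adding 1 at the chosen action index accumulates one-hot vectors.
            avg[actions[other]] += 1.0
        if nbrs:
            avg = [v / len(nbrs) for v in avg]
        mean_actions[agent] = avg
    return mean_actions


# Hypothetical example: three drivers, three possible dispatch actions.
actions = {"d1": 0, "d2": 2, "d3": 2}
neighbors = {"d1": ["d2", "d3"], "d2": ["d1", "d3"], "d3": []}
mean_actions = mean_neighbor_action(actions, neighbors, n_actions=3)
```

In a setup like this, each agent's value estimate would depend on its own action and on the single averaged vector rather than on every neighbor's action separately, which is what keeps the interaction model small as the number of agents grows.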
Reinforcement learning for sequential decision-making: a data driven approach for finance
This work presents a variety of reinforcement learning applications to the
domain of finance. It consists of two parts. The first one presents a technical
overview of the basic concepts in machine learning, which are required
to understand and work with the reinforcement learning paradigm and are
shared among the domains of applications. Chapter 1 outlines the fundamental
principle of machine learning reasoning before introducing the neural
network model as a central component of every algorithm presented in this
work. Chapter 2 introduces the idea of reinforcement learning from its roots,
focusing on the mathematical formalism generally employed in every application.
We focus on integrating the reinforcement learning framework with the
neural network, and we explain their critical role in the field's development.
After the technical part, we present our original contribution, articulated
in three different essays. The narrative line follows the idea of introducing
the use of varying reinforcement learning algorithms through a trading application
(Brini and Tantari, 2021) in Chapter 3. Then in Chapter 4 we
focus on one of the presented reinforcement learning algorithms and aim at
improving its performance and scalability in solving the trading problem by
leveraging prior knowledge of the setting. In Chapter 5 of the second part,
we use the same reinforcement learning algorithm to solve the problem of
exchanging liquidity in a system of banks that can borrow and lend money,
highlighting the flexibility and the effectiveness of the reinforcement learning
paradigm in the broad financial domain. We conclude with some remarks
and ideas for further research in reinforcement learning applied to finance.
The Role of Machine Learning in Knowledge-Based Response-Adapted Radiotherapy
With the continuous increase in radiotherapy patient-specific data from multimodality imaging and biotechnology molecular sources, knowledge-based response-adapted radiotherapy (KBR-ART) is emerging as a vital area for radiation oncology personalized treatment. In KBR-ART, planned dose distributions can be modified based on observed cues in patients’ clinical, geometric, and physiological parameters. In this paper, we present current developments in the field of adaptive radiotherapy (ART), the progression toward KBR-ART, and examine several applications of static and dynamic machine learning approaches for realizing the KBR-ART framework potentials in maximizing tumor control and minimizing side effects with respect to individual radiotherapy patients. Specifically, three questions required for the realization of KBR-ART are addressed: (1) what knowledge is needed; (2) how to estimate RT outcomes accurately; and (3) how to adapt optimally. Different machine learning algorithms for KBR-ART application shall be discussed and contrasted. Representative examples of different KBR-ART stages are also visited.