96 research outputs found

Manifold Representations for Continuous-State Reinforcement Learning

Author: Glaubius Robert
Smart William D.
Publication venue: Washington University Open Scholarship
Publication date: 01/05/2005

Reinforcement learning (RL) has shown itself to be an effective paradigm for solving optimal control problems with a finite number of states. Generalizing RL techniques to problems with a continuous state space has proven a difficult task. We present an approach to modeling the RL value function using a manifold representation. By explicitly modeling the topology of the value function domain, traditional problems with discontinuities and resolution can be addressed without resorting to complex function approximators. We describe how manifold techniques can be applied to value-function approximation, and present methods for constructing manifold representations in both batch and online settings. We present empirical results demonstrating the effectiveness of our approach.

Washington University St. Louis: Open Scholarship

Lipschitzness Is All You Need To Tame Off-policy Generative Adversarial Imitation Learning

Author: Blondé Lionel
Kalousis Alexandros
Strasser Pablo
Publication date: 03/07/2021

Despite the recent success of reinforcement learning in various domains, these approaches remain, for the most part, deterringly sensitive to hyper-parameters and are often riddled with essential engineering feats allowing their success. We consider the case of off-policy generative adversarial imitation learning, and perform an in-depth review, qualitative and quantitative, of the method. We show that forcing the learned reward function to be locally Lipschitz-continuous is a sine qua non condition for the method to perform well. We then study the effects of this necessary condition and provide several theoretical results involving the local Lipschitzness of the state-value function. We complement these guarantees with empirical evidence attesting to the strong positive effect that the consistent satisfaction of the Lipschitzness constraint on the reward has on imitation performance. Finally, we tackle a generic pessimistic reward preconditioning add-on spawning a large class of reward shaping methods, which makes the base method it is plugged into provably more robust, as shown in several additional theoretical guarantees. We then discuss these through a fine-grained lens and share our insights. Crucially, the guarantees derived and reported in this work are valid for any reward satisfying the Lipschitzness condition; nothing is specific to imitation. As such, these may be of independent interest.
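
For readers unfamiliar with the condition named in the abstract above, the standard textbook definition of local Lipschitz continuity can be written as follows. The symbols here (reward $r$, state-action space $\mathcal{S} \times \mathcal{A}$, neighborhood $U$, constant $\delta_U$) are notation assumed for illustration and are not taken from the paper itself.

```latex
% A reward r : S x A -> R is locally Lipschitz-continuous if every point
% has a neighborhood on which r is Lipschitz with some finite constant.
\forall (s, a) \in \mathcal{S} \times \mathcal{A},\;
\exists\, U \ni (s, a) \text{ open},\;
\exists\, \delta_U \ge 0 :
\quad
\lvert r(x) - r(y) \rvert \;\le\; \delta_U \,\lVert x - y \rVert
\quad \forall\, x, y \in U .
```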

arXiv.org e-Print Archive

Hes-so: ArODES Open Archive (University of Applied Sciences and Arts Western Switzerland / Haute école spécialisée de Suisse occidentale / FH Westschweiz)

PubMed Central

Efficient Ridesharing Order Dispatching with Mean Field Multi-Agent Reinforcement Learning

Author: Alshamsi Aamena
Busoniu Lucian
Foerster Jakob
Gupta Jayesh K
Jaakkola Tommi
Kingma Diederik P
Li Ziru
Lowe Ryan
Silver David
Sutton Richard S
Tesauro Gerald
Wang Zhaodong
Yang Yaodong
Publication date: 31/01/2019

A fundamental question in any peer-to-peer ridesharing system is how to dispatch users' ride requests to the right driver in real time, both effectively and efficiently. Traditional rule-based solutions usually work on a simplified problem setting, which requires a sophisticated hand-crafted weight design for either centralized authority control or decentralized multi-agent scheduling systems. Although recent approaches have used reinforcement learning to provide centralized combinatorial optimization algorithms with informative weight values, their single-agent setting can hardly model the complex interactions between drivers and orders. In this paper, we address the order dispatching problem using multi-agent reinforcement learning (MARL), which follows the distributed nature of the peer-to-peer ridesharing problem and possesses the ability to capture the stochastic demand-supply dynamics in large-scale ridesharing scenarios. Being more reliable than centralized approaches, our proposed MARL solutions could also support fully distributed execution through recent advances in the Internet of Vehicles (IoV) and the Vehicle-to-Network (V2N). Furthermore, we adopt the mean field approximation to simplify the local interactions by taking an average action among neighborhoods. The mean field approximation is capable of globally capturing dynamic demand-supply variations by propagating many local interactions between agents and the environment. Our extensive experiments have shown the significant improvements of MARL order dispatching algorithms over several strong baselines on the gross merchandise volume (GMV) and order response rate measures. Besides, the simulated experiments with real data have also justified that our solution can alleviate the supply-demand gap during the rush hours, thus possessing the capability of reducing traffic congestion. Comment: 11 pages, 9 figures
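
The "average action among neighborhoods" mentioned in the abstract above can be illustrated with a minimal sketch. It assumes discrete actions encoded as one-hot vectors and a fixed neighbor map; the names `one_hot`, `mean_action`, `actions`, and `neighbors` are hypothetical and are not taken from the paper, whose actual formulation may differ.

```python
from typing import Dict, List


def one_hot(action: int, num_actions: int) -> List[float]:
    """Encode a discrete action index as a one-hot vector."""
    return [1.0 if a == action else 0.0 for a in range(num_actions)]


def mean_action(
    agent: int,
    actions: Dict[int, int],
    neighbors: Dict[int, List[int]],
    num_actions: int,
) -> List[float]:
    """Average the one-hot actions of an agent's neighbors.

    Returns a zero vector when the agent has no neighbors
    (an assumption made for this sketch).
    """
    nbrs = neighbors.get(agent, [])
    if not nbrs:
        return [0.0] * num_actions
    total = [0.0] * num_actions
    for j in nbrs:
        for k, v in enumerate(one_hot(actions[j], num_actions)):
            total[k] += v
    return [t / len(nbrs) for t in total]


# Hypothetical example: agent 0 has neighbors 1 and 2,
# which chose actions 0 and 1 out of 3 possible actions.
example_actions = {0: 2, 1: 0, 2: 1}
example_neighbors = {0: [1, 2]}
example_mean = mean_action(0, example_actions, example_neighbors, 3)
```

In a mean-field setup of this kind, each agent would condition its value estimate on its own action and this averaged vector rather than on every neighbor's action individually, which keeps the input size constant as the number of neighbors grows.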

arXiv.org e-Print Archive

Crossref

Reinforcement learning for sequential decision-making: a data driven approach for finance

Author: BRINI Alessio
Publication venue: 'Scuola Normale Superiore - Edizioni della Normale'
Publication date: 12/09/2022

This work presents a variety of reinforcement learning applications to the domain of finance. It is composed of two parts. The first one represents a technical overview of the basic concepts in machine learning, which are required to understand and work with the reinforcement learning paradigm and are shared among the domains of applications. Chapter 1 outlines the fundamental principle of machine learning reasoning before introducing the neural network model as a central component of every algorithm presented in this work. Chapter 2 introduces the idea of reinforcement learning from its roots, focusing on the mathematical formalism generally employed in every application. We focus on integrating the reinforcement learning framework with the neural network, and we explain their critical role in the field's development. After the technical part, we present our original contribution, articulated in three different essays. The narrative line follows the idea of introducing the use of varying reinforcement learning algorithms through a trading application (Brini and Tantari, 2021) in Chapter 3. Then in Chapter 4 we focus on one of the presented reinforcement learning algorithms and aim at improving its performance and scalability in solving the trading problem by leveraging prior knowledge of the setting. In Chapter 5 of the second part, we use the same reinforcement learning algorithm to solve the problem of exchanging liquidity in a system of banks that can borrow and lend money, highlighting the flexibility and the effectiveness of the reinforcement learning paradigm in the broad financial domain. We conclude with some remarks and ideas for further research in reinforcement learning applied to finance.

Archivio istituzionale della Ricerca - Scuola Normale Superiore

The Role of Machine Learning in Knowledge-Based Response-Adapted Radiotherapy

Author: Huan-Hsin Tseng
Issam El Naqa
Randall K. Ten Haken
Yi Luo
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2018

With the continuous increase in radiotherapy patient-specific data from multimodality imaging and biotechnology molecular sources, knowledge-based response-adapted radiotherapy (KBR-ART) is emerging as a vital area for radiation oncology personalized treatment. In KBR-ART, planned dose distributions can be modified based on observed cues in patients’ clinical, geometric, and physiological parameters. In this paper, we present current developments in the field of adaptive radiotherapy (ART) and the progression toward KBR-ART, and examine several applications of static and dynamic machine learning approaches for realizing the KBR-ART framework's potential in maximizing tumor control and minimizing side effects with respect to individual radiotherapy patients. Specifically, three questions required for the realization of KBR-ART are addressed: (1) what knowledge is needed; (2) how to estimate RT outcomes accurately; and (3) how to adapt optimally. Different machine learning algorithms for KBR-ART application shall be discussed and contrasted. Representative examples of different KBR-ART stages are also visited.

Directory of Open Access Journals

Frontiers - Publisher Connector