Budgeted Reinforcement Learning in Continuous State Space
A Budgeted Markov Decision Process (BMDP) is an extension of a Markov
Decision Process to critical applications requiring safety constraints. It
relies on a notion of risk implemented in the shape of a cost signal
constrained to lie below an adjustable threshold. So far, BMDPs could only
be solved in the case of finite state spaces with known dynamics. This work
extends the state of the art to continuous state spaces and unknown
dynamics. We show that the solution to a BMDP is a fixed point of a novel
Budgeted Bellman Optimality operator. This observation allows us to introduce
natural extensions of Deep Reinforcement Learning algorithms to address
large-scale BMDPs. We validate our approach on two simulated applications:
spoken dialogue and autonomous driving. Comment: N. Carrara and E. Leurent contributed equally.
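The abstract's central object, the Budgeted Bellman Optimality operator, greedily trades reward against a cost budget. Below is a minimal Python sketch of one such backup over a finite set of candidate actions; the function names and the purely greedy (deterministic) action selection are illustrative simplifications, since the paper's operator also optimises over stochastic action mixtures.

```python
# Minimal sketch of one budgeted Bellman backup over a finite action set.
# Names (q_reward, q_cost, budget) are illustrative; the paper's operator
# also optimises over stochastic mixtures of actions, which this omits.
import numpy as np

def budgeted_greedy(q_reward, q_cost, budget):
    """Pick the highest-reward action among those whose expected
    cumulative cost stays within the remaining budget."""
    feasible = np.flatnonzero(q_cost <= budget)
    if feasible.size == 0:                      # no safe action: minimise cost
        return int(np.argmin(q_cost))
    return int(feasible[np.argmax(q_reward[feasible])])

def budgeted_backup(reward, cost, gamma, q_reward_next, q_cost_next, budget):
    """One budgeted-Bellman-style backup for a single (state, budget) pair."""
    a = budgeted_greedy(q_reward_next, q_cost_next, budget)
    target_r = reward + gamma * q_reward_next[a]
    target_c = cost + gamma * q_cost_next[a]
    return target_r, target_c

# Example: three candidate next actions, remaining risk budget of 0.3.
print(budgeted_backup(1.0, 0.1, 0.99,
                      np.array([2.0, 3.0, 5.0]),
                      np.array([0.1, 0.25, 0.9]),
                      budget=0.3))
```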
Offline Deep Reinforcement Learning for Dynamic Pricing of Consumer Credit
We introduce a method for pricing consumer credit using recent advances in offline deep reinforcement learning. The approach relies on a static dataset and, unlike commonly used pricing approaches, requires no assumptions on the functional form of demand. Using both real and synthetic data on consumer credit applications, we demonstrate that our approach, based on the Conservative Q-Learning algorithm, is capable of learning an effective personalized pricing policy without any online interaction or price experimentation. In particular, using historical data on online auto loan applications, we estimate an increase in expected profit of 21% with a less than 15% average change in prices relative to the original pricing policy.
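As a rough illustration of the offline approach described above, the sketch below shows a Conservative Q-Learning style loss for a contextual pricing problem with a discretised price grid; the network size, the bandit-style (single-step) target, and the penalty weight alpha are assumptions, not the paper's exact setup.

```python
# Minimal sketch of a CQL-style loss on a logged pricing dataset:
# a TD/regression term plus a conservative penalty that pushes down
# Q-values on unseen prices and up on the logged (taken) price.
import torch
import torch.nn as nn

n_features, n_prices, alpha = 8, 10, 1.0   # illustrative sizes and penalty weight

q_net = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(), nn.Linear(64, n_prices))

def cql_loss(states, actions, rewards):
    q_all = q_net(states)                                   # Q(s, .) for every price
    q_taken = q_all.gather(1, actions.unsqueeze(1)).squeeze(1)
    td_loss = ((q_taken - rewards) ** 2).mean()             # bandit-style target
    # Conservative term: logsumexp over all prices minus the logged price's Q.
    conservative = (torch.logsumexp(q_all, dim=1) - q_taken).mean()
    return td_loss + alpha * conservative

# Example batch drawn at random in place of a historical loan-application dataset.
states = torch.randn(32, n_features)
actions = torch.randint(0, n_prices, (32,))
rewards = torch.randn(32)
cql_loss(states, actions, rewards).backward()
```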
Automatic Deduction Path Learning via Reinforcement Learning with Environmental Correction
Automatic bill payment is an important part of business operations in fintech
companies. Deduction has mainly been performed either on the total amount or by
heuristic search that divides the bill into smaller parts to deduct as much as
possible. This article proposes an end-to-end approach that automatically learns
optimal deduction paths (deduction amounts in order), which reduces
the cost of manual path design and maximizes the amount of successful
deduction. Specifically, in view of the large search space of the paths and the
extreme sparsity of historical successful deduction records, we propose a deep
hierarchical reinforcement learning approach which abstracts the action into a
two-level hierarchical space: an upper agent that determines the number of
steps of deductions each day and a lower agent that decides the amount of
deduction at each step. In this way, the action space is structured by prior
knowledge and the exploration space is reduced. Moreover, the inherent
incompleteness of business information makes the environment only partially
observable: the deducted amounts indicate merely lower bounds on the
available account balance. To address this, we formulate the problem
as a partially observable Markov decision process (POMDP) and employ an
environment correction algorithm based on the characteristics of the business.
In the world's largest electronic payment business, we have verified the
effectiveness of this scheme offline and deployed it online to serve millions
of users.
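A minimal sketch of the two-level action decomposition described above is given below; the random placeholder policies, the even-split heuristic, and the hidden-balance simulator are purely illustrative stand-ins for the learned agents and the real payment environment.

```python
# Minimal sketch of the hierarchy: an upper policy picks how many deduction
# steps to attempt in a day, a lower policy picks each step's amount. Both
# policies are placeholders, and the hidden balance stands in for the
# partially observable environment (successful deductions only reveal
# lower bounds on it).
import random

def upper_policy(bill_remaining):
    # Illustrative: choose 1-3 deduction steps for the day.
    return random.randint(1, 3)

def lower_policy(bill_remaining, step, n_steps):
    # Illustrative: split the remaining bill roughly evenly across steps.
    return bill_remaining / (n_steps - step)

def run_one_day(bill_remaining, hidden_balance):
    n_steps = upper_policy(bill_remaining)
    deducted = 0.0
    for step in range(n_steps):
        amount = lower_policy(bill_remaining - deducted, step, n_steps)
        if amount <= hidden_balance - deducted:   # success only if balance covers it
            deducted += amount                    # observed: a lower bound on balance
    return deducted

print(run_one_day(bill_remaining=100.0, hidden_balance=random.uniform(0, 100)))
```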
Physical Deep Reinforcement Learning Towards Safety Guarantee
Deep reinforcement learning (DRL) has achieved tremendous success in many
complex decision-making tasks of autonomous systems with high-dimensional state
and/or action spaces. However, safety and stability still remain major
concerns that hinder the application of DRL to safety-critical autonomous
systems. To address these concerns, we propose the Phy-DRL: a physical deep
reinforcement learning framework. The Phy-DRL is novel in two architectural
designs: i) a Lyapunov-like reward, and ii) residual control (i.e., the integration
of physics-model-based control and data-driven control). Together, the
physical reward and residual control endow the Phy-DRL with (mathematically)
provable safety and stability guarantees. Through experiments on the inverted
pendulum, we show that the Phy-DRL features guaranteed safety and stability and
enhanced robustness, while offering remarkably accelerated training and
enlarged reward. Comment: Working paper.
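The two architectural ingredients named above can be sketched in a few lines; in the code below the linear dynamics, the feedback gain K, and the Lyapunov matrix P are illustrative placeholders rather than the paper's inverted-pendulum setup.

```python
# Minimal sketch of the two Phy-DRL ingredients on a toy linear system:
# (i) a residual action that adds a learned correction to a physics-model-based
# feedback term, and (ii) a Lyapunov-like reward that pays for decreasing a
# quadratic function s^T P s of the state.
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])      # toy discrete-time dynamics
B = np.array([[0.0], [0.1]])
K = np.array([[2.0, 1.5]])                   # physics-model-based feedback gain
P = np.eye(2)                                # Lyapunov matrix (assumed given)

def residual_action(state, learned_residual):
    # Data-driven residual added on top of the model-based term -K s.
    return -K @ state + learned_residual

def lyapunov_like_reward(state, next_state):
    # Reward is the decrease of the Lyapunov-like function V(s) = s^T P s.
    return (state.T @ P @ state - next_state.T @ P @ next_state).item()

state = np.array([[0.5], [-0.2]])
action = residual_action(state, learned_residual=np.array([[0.05]]))
next_state = A @ state + B @ action
print(lyapunov_like_reward(state, next_state))
```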