MQLV: Optimal Policy of Money Management in Retail Banking with Q-Learning
Reinforcement learning has become one of the best approaches to train a
computer game emulator capable of human-level performance. In a reinforcement
learning approach, an optimal value function is learned across a set of
actions, or decisions, that leads to a set of states giving different rewards,
with the objective of maximizing the overall reward. A policy assigns an
expected return to each state-action pair. We call a policy optimal when its
value function is optimal. QLBS, the Q-Learner in the Black-Scholes(-Merton)
Worlds, applies reinforcement learning concepts, and notably the popular
Q-learning algorithm, to the financial stochastic model of Black, Scholes and
Merton. It is, however, specifically optimized for geometric Brownian motion
and vanilla options. Its range of application is, therefore, limited to
vanilla option pricing within financial markets. We
propose MQLV, Modified Q-Learner for the Vasicek model, a new reinforcement
learning approach that determines the optimal policy of money management based
on the aggregated financial transactions of the clients. It unlocks new
frontiers to establish personalized credit card limits or to fulfill bank loan
applications, targeting the retail banking industry. MQLV extends the
simulation to mean-reverting stochastic diffusion processes and uses a
digital function, a Heaviside step function expressed in its discrete form, to
estimate the probability of a future event such as a payment default. In our
experiments, we first show the similarities between a set of historical
financial transactions and Vasicek generated transactions and, then, we
underline the potential of MQLV on generated Monte Carlo simulations. Finally,
MQLV is the first Q-learning Vasicek-based methodology addressing transparent
decision-making processes in retail banking.
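The mean-reverting Vasicek dynamics and the discrete Heaviside payoff described above can be sketched with a Monte Carlo estimate. This is a minimal illustration, not the paper's MQLV method: the parameter values (kappa, theta, sigma, the default threshold) and the Euler-Maruyama discretization are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_vasicek(x0, kappa, theta, sigma, dt, n_steps, n_paths, rng):
    """Euler-Maruyama simulation of the mean-reverting Vasicek process
    dx_t = kappa * (theta - x_t) dt + sigma dW_t."""
    paths = np.empty((n_paths, n_steps + 1))
    paths[:, 0] = x0
    for t in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        paths[:, t + 1] = (paths[:, t]
                           + kappa * (theta - paths[:, t]) * dt
                           + sigma * dw)
    return paths

# Illustrative parameters (assumptions, not values from the paper).
paths = simulate_vasicek(x0=1.0, kappa=0.8, theta=1.0, sigma=0.3,
                         dt=1 / 252, n_steps=252, n_paths=10_000, rng=rng)

# Digital (Heaviside) payoff: fraction of paths whose terminal value falls
# below a threshold, i.e. a Monte Carlo estimate of the probability of a
# future event such as a payment default.
threshold = 0.5
default_prob = np.mean(np.heaviside(threshold - paths[:, -1], 0.0))
```

Averaging the Heaviside indicator over paths is what turns the simulated diffusion into a probability estimate for the event of interest.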
Improving Search through A3C Reinforcement Learning based Conversational Agent
We develop a reinforcement learning based search assistant that assists
users through a set of actions and a sequence of interactions to enable them
to realize their intent. Our approach caters to subjective search, where the
user is seeking digital assets such as images, which is fundamentally different
from tasks with objective and limited search modalities. Labeled
conversational data is generally not available in such search tasks and
training the agent through human interactions can be time-consuming. We
propose a stochastic virtual user that impersonates a real user and can be
used to sample user behavior efficiently to train the agent, accelerating its
bootstrapping. We develop an A3C-based, context-preserving architecture that
enables the agent to provide contextual assistance to the user. We compare the
A3C agent with Q-learning and evaluate its performance on
average rewards and state values it obtains with the virtual user in validation
episodes. Our experiments show that the agent learns to achieve higher rewards
and better states.

Comment: 17 pages, 7 figures
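The idea of bootstrapping an agent against a stochastic virtual user can be sketched with the paper's Q-learning baseline. Everything here is a toy assumption, not the paper's A3C architecture: the five-state session, the "helpful action" rule inside `virtual_user_step`, and all hyperparameters are hypothetical, chosen only to show a virtual user generating training interactions.

```python
import random

# Hypothetical toy session: N_STATES stages toward fulfilling the user's
# intent; the agent picks one of N_ACTIONS assistance actions per turn.
N_STATES, N_ACTIONS = 5, 2

def virtual_user_step(state, action, rng):
    """Stochastic virtual user: the session advances with higher
    probability when the agent picks the (assumed) helpful action."""
    p_progress = 0.8 if action == state % N_ACTIONS else 0.3
    next_state = min(state + 1, N_STATES - 1) if rng.random() < p_progress else state
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.05  # small cost per extra interaction
    return next_state, reward, done

def train_q_learning(episodes=2000, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning trained entirely on sampled virtual-user turns."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection.
            if rng.random() < eps:
                action = rng.randrange(N_ACTIONS)
            else:
                action = max(range(N_ACTIONS), key=lambda a: q[state][a])
            next_state, reward, done = virtual_user_step(state, action, rng)
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q = train_q_learning()
```

Because the virtual user is cheap to sample, the agent can run thousands of such episodes before ever meeting a real user, which is the bootstrapping advantage the abstract describes.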