Advances in machine learning algorithms for financial risk management
In this thesis, three novel machine learning techniques are introduced to address distinct
yet interrelated challenges in financial risk management. These approaches
collectively offer a comprehensive strategy, beginning with the precise classification of credit
risks, advancing through the nuanced forecasting of financial asset volatility, and ending
with the strategic optimisation of financial asset portfolios.
Firstly, a Hybrid Dual-Resampling and Cost-Sensitive technique is proposed to combat the prevalent issue of class imbalance in financial datasets, particularly in credit risk assessment. The core of the approach is the construction of heuristically balanced datasets. A resampling technique based on Gaussian mixture modelling generates synthetic minority-class samples from the minority-class data, while k-means clustering is concurrently applied to the majority class. Feature selection is then performed using the Extra Trees ensemble technique, and a cost-sensitive logistic regression model is applied to predict the probability of default on the heuristically balanced datasets. The results underscore the effectiveness of the proposed technique, which outperforms other imbalanced-data preprocessing approaches. This
advancement in credit risk classification lays a solid foundation for understanding individual
financial behaviours, a crucial first step in the broader context of financial risk management.
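To make the pipeline concrete, here is a minimal sketch of the four stages described above using scikit-learn; the sampling sizes, cluster counts, and misclassification costs are illustrative assumptions, not the thesis's tuned values.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.mixture import GaussianMixture

def hybrid_resample_fit(X, y, n_features=10, seed=0):
    X_min, X_maj = X[y == 1], X[y == 0]

    # 1) Oversample the minority class: fit a Gaussian mixture to the
    #    minority data and draw synthetic samples from it.
    gmm = GaussianMixture(n_components=3, random_state=seed).fit(X_min)
    n_new = max(len(X_maj) // 2 - len(X_min), 1)   # illustrative target size
    X_syn, _ = gmm.sample(n_new)

    # 2) Undersample the majority class: k-means cluster centres serve as
    #    representative prototypes of the majority distribution.
    km = KMeans(n_clusters=len(X_maj) // 2, n_init=10, random_state=seed)
    km.fit(X_maj)
    X_bal = np.vstack([X_min, X_syn, km.cluster_centers_])
    y_bal = np.hstack([np.ones(len(X_min) + len(X_syn)),
                       np.zeros(len(km.cluster_centers_))])

    # 3) Feature selection via Extra Trees importances.
    et = ExtraTreesClassifier(n_estimators=200, random_state=seed).fit(X_bal, y_bal)
    top = np.argsort(et.feature_importances_)[-n_features:]

    # 4) Cost-sensitive logistic regression: a higher weight on the default
    #    class penalises missed defaults more than false alarms.
    clf = LogisticRegression(class_weight={0: 1.0, 1: 5.0}, max_iter=1000)
    clf.fit(X_bal[:, top], y_bal)
    return clf, top
```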
Building on this foundation, the thesis then explores the forecasting of financial asset volatility, a critical aspect of understanding market dynamics. A novel model that combines a
Triple Discriminator Generative Adversarial Network with a continuous wavelet transform
is proposed. The model decomposes volatility time series into signal-like and noise-like frequency components, allowing the separate detection and monitoring of non-stationary volatility data. The network comprises a wavelet transform component consisting of continuous and inverse wavelet transforms, an auto-encoder component made up of encoder and decoder networks, and a Generative Adversarial Network consisting of triple Discriminator and Generator networks. The Generative Adversarial Network is trained with an ensemble of losses: an unsupervised loss derived from the adversarial component, a supervised loss, and a reconstruction loss. Data from nine financial assets are
employed to demonstrate the effectiveness of the proposed model. This approach not only
enhances our understanding of market fluctuations but also bridges the gap between individual credit risk assessment and macro-level market analysis.
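As an illustration of the decomposition step only, the following sketch splits a volatility series into signal-like and noise-like components with a continuous wavelet transform (PyWavelets). The scale cut-off and the crude sum-over-scales inverse are assumptions for illustration; the model's auto-encoder and triple-discriminator GAN are omitted.

```python
import numpy as np
import pywt

def cwt_split(vol, scales=np.arange(1, 65), scale_cut=8, wavelet="morl"):
    coef, _ = pywt.cwt(vol, scales, wavelet)     # shape: (n_scales, n_times)
    # Fine scales capture high-frequency, noise-like behaviour; coarse
    # scales capture the slower, signal-like component.
    noise_like = coef[scales < scale_cut].sum(axis=0)
    signal_like = coef[scales >= scale_cut].sum(axis=0)
    # Summing over scales is only a crude stand-in for a proper inverse CWT
    # (it ignores the per-scale normalisation of the reconstruction formula).
    return signal_like, noise_like

vol = np.abs(np.random.default_rng(0).standard_normal(512))  # toy volatility series
signal_like, noise_like = cwt_split(vol)
```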
Finally, the thesis concludes with a novel technique for portfolio optimisation: a model-free reinforcement learning strategy that takes historical Low, High, and Close prices of assets as input and outputs asset weights. A deep Capsule Network is employed to simulate the investment strategy, which involves the reallocation of the different assets to maximise the expected return
on investment based on deep reinforcement learning. To provide more learning stability in
an online training process, a Markov Differential Sharpe Ratio reward function has been
proposed as the reinforcement learning objective function. Additionally, a Multi-Memory Weight Reservoir has been introduced to facilitate the learning process and the optimisation of computed asset weights, helping to sequentially re-balance the portfolio throughout a specified trading period. Incorporating the insights gained from volatility forecasting into this strategy highlights the interconnected nature of financial markets. Comparative experiments with other models demonstrate that the proposed technique achieves
superior results based on risk-adjusted reward performance measures.
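For reference, the differential Sharpe ratio underlying such a reward function has a standard online form due to Moody and Saffell; the sketch below implements that classic version, not the thesis's Markov variant or its weight reservoir.

```python
class DifferentialSharpeReward:
    """Online differential Sharpe ratio (Moody & Saffell) as an RL reward."""

    def __init__(self, eta=0.01):
        self.eta = eta   # decay rate of the exponential moment estimates
        self.A = 0.0     # running estimate of the first moment of returns
        self.B = 0.0     # running estimate of the second moment of returns

    def step(self, r):
        """Reward for the latest portfolio return r."""
        dA, dB = r - self.A, r * r - self.B
        denom = (self.B - self.A ** 2) ** 1.5
        dsr = (self.B * dA - 0.5 * self.A * dB) / denom if denom > 1e-12 else 0.0
        self.A += self.eta * dA   # update moments after computing the reward
        self.B += self.eta * dB
        return dsr
```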
In a nutshell, this thesis not only addresses individual challenges in financial risk management but also integrates them into a comprehensive framework: from enhancing the accuracy of credit risk classification, through improved forecasting and understanding of market volatility, to the optimisation of investment strategies. These methodologies collectively show the potential of machine learning to improve financial risk management.
Reinforcement Learning with General Utilities: Simpler Variance Reduction and Large State-Action Space
We consider the reinforcement learning (RL) problem with general utilities, which consists in maximizing a function of the state-action occupancy measure.
Beyond the standard cumulative reward RL setting, this problem includes as
particular cases constrained RL, pure exploration and learning from
demonstrations among others. For this problem, we propose a simpler single-loop
parameter-free normalized policy gradient algorithm. Implementing a recursive
momentum variance reduction mechanism, our algorithm achieves $\tilde{\mathcal{O}}(\epsilon^{-3})$ and $\tilde{\mathcal{O}}(\epsilon^{-2})$ sample complexities for $\epsilon$-first-order stationarity and $\epsilon$-global optimality respectively, under adequate assumptions. We further address the setting of large finite state-action spaces via linear function approximation of the occupancy measure and show a $\tilde{\mathcal{O}}(\epsilon^{-4})$ sample complexity for a simple policy gradient method with a linear regression subroutine.
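A hedged sketch of the core update: a normalized policy gradient step driven by STORM-style recursive momentum. The stochastic gradient oracle `grad_estimate` is a hypothetical stand-in for the paper's occupancy-measure gradient estimator, and the fixed step size is illustrative.

```python
import numpy as np

def normalized_pg_momentum(grad_estimate, theta, steps=1000, lr=0.01, beta=0.9):
    d = grad_estimate(theta)   # initial gradient direction
    for _ in range(steps):
        # Normalized step: the update magnitude does not depend on the
        # gradient scale, in the spirit of the parameter-free normalization.
        theta_next = theta - lr * d / (np.linalg.norm(d) + 1e-12)
        # Recursive momentum (STORM-style): correct the previous direction
        # with a fresh gradient difference; the paper evaluates both
        # gradients on the same sample, which this sketch only approximates.
        d = grad_estimate(theta_next) + (1.0 - beta) * (d - grad_estimate(theta))
        theta = theta_next
    return theta
```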
A Survey of Imitation Learning: Algorithms, Recent Developments, and Challenges
In recent years, the development of robotics and artificial intelligence (AI)
systems has been nothing short of remarkable. As these systems continue to
evolve, they are being utilized in increasingly complex and unstructured
environments, such as autonomous driving, aerial robotics, and natural language
processing. As a consequence, programming their behaviors manually or defining
their behavior through reward functions (as done in reinforcement learning
(RL)) has become exceedingly difficult. This is because such environments
require a high degree of flexibility and adaptability, making it challenging to
specify an optimal set of rules or reward signals that can account for all
possible situations. In such environments, learning from an expert's behavior
through imitation is often more appealing. This is where imitation learning
(IL) comes into play - a process where desired behavior is learned by imitating
an expert's behavior, which is provided through demonstrations.
This paper aims to provide an introduction to IL and an overview of its
underlying assumptions and approaches. It also offers a detailed description of
recent advances and emerging areas of research in the field. Additionally, the
paper discusses how researchers have addressed common challenges associated
with IL and provides potential directions for future research. Overall, the
goal of the paper is to provide a comprehensive guide to the growing field of
IL in robotics and AI.
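As a concrete entry point to the field the survey covers, behavioral cloning (the simplest IL algorithm) reduces imitation to supervised learning on expert state-action pairs. A minimal PyTorch sketch on synthetic stand-in demonstrations:

```python
import torch
import torch.nn as nn

states = torch.randn(1024, 8)          # stand-in expert states
expert_actions = torch.randn(1024, 2)  # stand-in expert actions

# A small policy network mapping states to continuous actions.
policy = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for epoch in range(50):
    # Regress the policy's actions onto the expert's demonstrated actions.
    loss = nn.functional.mse_loss(policy(states), expert_actions)
    opt.zero_grad()
    loss.backward()
    opt.step()
```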
Sample-efficient model-based reinforcement learning for quantum control
We propose a model-based reinforcement learning (RL) approach for noisy time-dependent gate optimization with reduced sample complexity over model-free RL. Sample complexity is defined as the number of controller interactions with the physical system. Leveraging an inductive bias, inspired by recent advances in neural ordinary differential equations (ODEs), we use an autodifferentiable ODE, parametrized by a learnable Hamiltonian ansatz, to represent the model approximating the environment, whose time-dependent part, including the control, is fully known. Control alongside Hamiltonian learning of continuous time-independent parameters is addressed through interactions with the system. We demonstrate an order of magnitude advantage in sample complexity of our method over standard model-free RL in preparing some standard unitary gates with closed and open system dynamics, in realistic computational experiments incorporating single-shot measurements, arbitrary Hilbert space truncations, and uncertainty in Hamiltonian parameters. Also, the learned Hamiltonian can be leveraged by existing control methods like GRAPE (gradient ascent pulse engineering) for further gradient-based optimization with the controllers found by RL as initializations. Our algorithm, which we apply to nitrogen vacancy (NV) centers and transmons, is well suited for controlling partially characterized one- and two-qubit systems.
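To illustrate the modelling machinery, here is a minimal sketch of an autodifferentiable ODE with a learnable Hamiltonian parameter for a single qubit, optimized by gradient descent through the integrator. The ansatz, control schedule, Euler integrator, and toy state-transfer objective are simplifying assumptions; the paper fits its model to measured data and supports open-system dynamics.

```python
import torch

sz = torch.tensor([[1, 0], [0, -1]], dtype=torch.complex64)  # Pauli-Z
sx = torch.tensor([[0, 1], [1, 0]], dtype=torch.complex64)   # Pauli-X

def evolve(theta_z, controls, psi, dt=0.01):
    """Euler-integrate dpsi/dt = -i H(t) psi with H = theta_z*sz + c(t)*sx."""
    for c in controls:
        H = theta_z * sz + c * sx
        psi = psi - 1j * dt * (H @ psi)        # explicit Euler step
        psi = psi / torch.linalg.norm(psi)     # counteract integration drift
    return psi

theta_z = torch.tensor(0.5, requires_grad=True)  # learnable Hamiltonian term
controls = torch.linspace(0.0, 1.0, 100)         # known time-dependent control
psi0 = torch.tensor([[1.0 + 0j], [0.0 + 0j]])
target = torch.tensor([[0.0 + 0j], [1.0 + 0j]])

opt = torch.optim.Adam([theta_z], lr=0.05)
for step in range(100):
    psi_T = evolve(theta_z, controls, psi0)
    # Toy objective: infidelity between the evolved and target states.
    infidelity = 1 - (target.conj().T @ psi_T).abs().squeeze() ** 2
    opt.zero_grad()
    infidelity.backward()   # gradients flow through every ODE step
    opt.step()
```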
Harnessing Mixed Offline Reinforcement Learning Datasets via Trajectory Weighting
Most offline reinforcement learning (RL) algorithms return a target policy
maximizing a trade-off between (1) the expected performance gain over the
behavior policy that collected the dataset, and (2) the risk stemming from the
out-of-distribution-ness of the induced state-action occupancy. It follows that
the performance of the target policy is strongly related to the performance of
the behavior policy and, thus, the trajectory return distribution of the
dataset. We show that in mixed datasets consisting of mostly low-return
trajectories and minor high-return trajectories, state-of-the-art offline RL
algorithms are overly restrained by low-return trajectories and fail to exploit
high-performing trajectories to the fullest. To overcome this issue, we show
that, in deterministic MDPs with stochastic initial states, the dataset
sampling can be re-weighted to induce an artificial dataset whose behavior
policy has a higher return. This re-weighted sampling strategy may be combined
with any offline RL algorithm. We further analyze that the opportunity for
performance improvement over the behavior policy correlates with the
positive-sided variance of the returns of the trajectories in the dataset. We
empirically show that while CQL, IQL, and TD3+BC achieve only a part of this
potential policy improvement, these same algorithms combined with our
reweighted sampling strategy fully exploit the dataset. Furthermore, we
empirically demonstrate that, despite its theoretical limitation, the approach
may still be efficient in stochastic environments. The code is available at
https://github.com/Improbable-AI/harness-offline-rl
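One plausible instantiation of such re-weighted sampling (the paper's exact scheme may differ) is to draw trajectories with Boltzmann weights on their returns, which up-weights the rare high-return trajectories before handing transitions to any offline RL algorithm:

```python
import numpy as np

def reweighted_indices(returns, batch_size, temperature=1.0, rng=None):
    """Sample trajectory indices proportionally to exp(return / temperature)."""
    rng = rng or np.random.default_rng()
    r = np.asarray(returns, dtype=np.float64)
    logits = (r - r.max()) / temperature            # numerically stable softmax
    probs = np.exp(logits) / np.exp(logits).sum()
    return rng.choice(len(r), size=batch_size, p=probs)

# Usage: mostly low-return trajectories plus one high-return outlier; the
# outlier dominates the behavior policy induced by the resampled dataset.
returns = [3.0, 2.5, 2.8, 95.0, 3.1]
batch = reweighted_indices(returns, batch_size=4)
```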
Reinforcement learning in large state action spaces
Reinforcement learning (RL) is a promising framework for training intelligent agents which learn to optimize long-term utility by directly interacting with the environment. Creating RL methods which scale to large state-action spaces is a critical problem towards ensuring real-world deployment of RL systems. However, several challenges limit the applicability of RL to large-scale settings. These include difficulties with exploration, low sample efficiency, computational intractability, task constraints like decentralization, and a lack of guarantees about important properties like performance, generalization, and robustness in potentially unseen scenarios.
This thesis is motivated towards bridging the aforementioned gap. We propose several principled algorithms and frameworks for studying and addressing the above challenges in RL. The proposed methods cover a wide range of RL settings (single- and multi-agent systems (MAS) with all the variations in the latter, prediction and control, model-based and model-free methods, value-based and policy-based methods). In this work we propose the first results on several different problems: e.g. tensorization of the Bellman equation, which allows exponential sample efficiency gains (Chapter 4); provable suboptimality arising from structural constraints in MAS (Chapter 3); combinatorial generalization results in cooperative MAS (Chapter 5); generalization results on observation shifts (Chapter 7); and learning deterministic policies in a probabilistic RL framework (Chapter 6). Our algorithms exhibit provably enhanced performance and sample efficiency along with better scalability. Additionally, we also shed light on generalization aspects of the agents under different frameworks. These properties have been driven by the use of several advanced tools (e.g. statistical machine learning, state abstraction, variational inference, tensor theory).
In summary, the contributions in this thesis significantly advance progress towards making RL agents ready for large-scale, real-world applications.
Learning Latent Representations to Co-Adapt to Humans
When robots interact with humans in homes, roads, or factories, the human's
behavior often changes in response to the robot. Non-stationary humans are
challenging for robot learners: actions the robot has learned to coordinate
with the original human may fail after the human adapts to the robot. In this
paper we introduce an algorithmic formalism that enables robots (i.e., ego
agents) to co-adapt alongside dynamic humans (i.e., other agents) using only
the robot's low-level states, actions, and rewards. A core challenge is that
humans not only react to the robot's behavior, but the way in which humans
react inevitably changes both over time and between users. To deal with this
challenge, our insight is that -- instead of building an exact model of the
human -- robots can learn and reason over high-level representations of the
human's policy and policy dynamics. Applying this insight we develop RILI:
Robustly Influencing Latent Intent. RILI first embeds low-level robot
observations into predictions of the human's latent strategy and strategy
dynamics. Next, RILI harnesses these predictions to select actions that
influence the adaptive human towards advantageous, high reward behaviors over
repeated interactions. We demonstrate that -- given RILI's measured performance
with users sampled from an underlying distribution -- we can probabilistically
bound RILI's expected performance across new humans sampled from the same
distribution. Our simulated experiments compare RILI to state-of-the-art
representation and reinforcement learning baselines, and show that RILI better
learns to coordinate with imperfect, noisy, and time-varying agents. Finally,
we conduct two user studies where RILI co-adapts alongside actual humans in a
game of tag and a tower-building task. See videos of our user studies here:
https://youtu.be/WYGO5amDXb
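A hedged sketch of the representation idea: encode the previous interaction into a latent prediction of the human's strategy and condition the robot's policy on it. Network sizes, the latent dimension, and the interaction summary are illustrative; RILI's actual training losses are omitted.

```python
import torch
import torch.nn as nn

class LatentStrategyAgent(nn.Module):
    def __init__(self, obs_dim=10, act_dim=2, latent_dim=4):
        super().__init__()
        # Encoder: summary of the previous interaction -> predicted latent
        # strategy (and, implicitly, its dynamics) for the next interaction.
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
        # Policy: current state + predicted latent -> robot action.
        self.policy = nn.Sequential(
            nn.Linear(obs_dim + latent_dim, 64), nn.ReLU(),
            nn.Linear(64, act_dim), nn.Tanh())

    def forward(self, prev_interaction, state):
        z = self.encoder(prev_interaction)   # predicted human strategy
        return self.policy(torch.cat([state, z], dim=-1))

agent = LatentStrategyAgent()
prev = torch.randn(1, 10)    # flattened states/actions/rewards of last round
state = torch.randn(1, 10)
action = agent(prev, state)
```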
Some Supervision Required: Incorporating Oracle Policies in Reinforcement Learning via Epistemic Uncertainty Metrics
An inherent problem of reinforcement learning is performing exploration of an
environment through random actions, of which a large portion can be
unproductive. Instead, exploration can be improved by initializing the learning
policy with an existing (previously learned or hard-coded) oracle policy,
offline data, or demonstrations. In the case of using an oracle policy, it can
be unclear how best to incorporate the oracle policy's experience into the
learning policy in a way that maximizes learning sample efficiency. In this
paper, we propose a method termed Critic Confidence Guided Exploration (CCGE)
for incorporating such an oracle policy into standard actor-critic
reinforcement learning algorithms. More specifically, CCGE takes in the oracle
policy's actions as suggestions and incorporates this information into the
learning scheme when uncertainty is high, while ignoring it when the
uncertainty is low. CCGE is agnostic to methods of estimating uncertainty, and
we show that it is equally effective with two different techniques.
Empirically, we evaluate the effect of CCGE on various benchmark reinforcement
learning tasks, and show that this idea can lead to improved sample efficiency
and final performance. Furthermore, when evaluated on sparse reward
environments, CCGE is able to perform competitively against adjacent algorithms
that also leverage an oracle policy. Our experiments show that it is possible
to utilize uncertainty as a heuristic to guide exploration using an oracle in
reinforcement learning. We expect that this will inspire more research in this
direction, where various heuristics are used to determine the direction of
guidance provided to learning.
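A minimal sketch of the uncertainty gate at the heart of CCGE, using disagreement across an ensemble of critics as the epistemic uncertainty estimate; the ensemble choice and threshold are illustrative assumptions, since CCGE is agnostic to the uncertainty estimator.

```python
import torch

def select_action(state, policy, oracle, critics, threshold=0.1):
    """Follow the oracle when the critics' epistemic uncertainty is high."""
    a_policy = policy(state)
    # Disagreement (standard deviation) of Q-estimates across the critic
    # ensemble serves as a proxy for epistemic uncertainty.
    qs = torch.stack([q(state, a_policy) for q in critics])
    uncertainty = qs.std(dim=0).mean()
    if uncertainty > threshold:
        return oracle(state)   # take the oracle's suggestion
    return a_policy            # trust the learned policy
```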
Policy Gradient Algorithms for Robust MDPs with Non-Rectangular Uncertainty Sets
We propose a policy gradient algorithm for robust infinite-horizon Markov
Decision Processes (MDPs) with non-rectangular uncertainty sets, thereby
addressing an open challenge in the robust MDP literature. Indeed, uncertainty
sets that display statistical optimality properties and make optimal use of
limited data often fail to be rectangular. Unfortunately, the corresponding
robust MDPs cannot be solved with dynamic programming techniques and are in
fact provably intractable. This prompts us to develop a projected Langevin
dynamics algorithm tailored to the robust policy evaluation problem, which
offers global optimality guarantees. We also propose a deterministic policy
gradient method that solves the robust policy evaluation problem approximately,
and we prove that the approximation error scales with a new measure of
non-rectangularity of the uncertainty set. Numerical experiments showcase that
our projected Langevin dynamics algorithm can escape local optima, while
algorithms tailored to rectangular uncertainty fail to do so.
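For intuition, a generic projected Langevin dynamics loop looks as follows; the toy objective and the crude simplex renormalisation (not an exact Euclidean projection) are placeholders for the paper's robust policy evaluation quantities.

```python
import numpy as np

def projected_langevin(grad, project, x0, steps=5000, lr=1e-3, beta=10.0,
                       rng=None):
    """Iterate x <- Proj(x - lr * grad(x) + sqrt(2 * lr / beta) * noise)."""
    rng = rng or np.random.default_rng()
    x = project(np.asarray(x0, dtype=np.float64))
    for _ in range(steps):
        # The injected Gaussian noise lets the iterate escape local optima,
        # unlike plain projected gradient descent.
        noise = rng.standard_normal(x.shape)
        x = project(x - lr * grad(x) + np.sqrt(2 * lr / beta) * noise)
    return x

# Usage on a toy nonconvex problem over the probability simplex; clipping
# and renormalising is a crude stand-in for an exact simplex projection.
grad = lambda x: 2 * x * np.cos(np.sum(x ** 2))
project = lambda x: np.maximum(x, 1e-12) / np.maximum(x, 1e-12).sum()
x_star = projected_langevin(grad, project, x0=np.ones(5) / 5)
```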