Non-parametric online market regime detection and regime clustering for multidimensional and path-dependent data structures
In this work we present a non-parametric online market regime detection
method for multidimensional data structures using a path-wise two-sample test
derived from a maximum mean discrepancy-based similarity metric on path space
that uses rough path signatures as a feature map. The latter similarity metric
has been developed and applied as a discriminator in recent generative models
for small data environments, and has been optimised here to the setting where
the size of new incoming data is particularly small, for faster reactivity.
On the same principles, we also present a path-wise method for regime
clustering which extends our previous work. The presented regime clustering
techniques were designed as ex-ante market analysis tools that can identify
periods of approximately similar market activity, but the new results also
apply to path-wise, high-dimensional, and non-Markovian settings, as well as
to data structures that exhibit autocorrelation.
We demonstrate our clustering tools on easily verifiable synthetic datasets
of increasing complexity, and also show how the outlined regime detection
techniques can be used as fast on-line automatic regime change detectors or as
outlier detection tools, including a fully automated pipeline. Finally, we
apply the fine-tuned algorithms to real-world historical data including
high-dimensional baskets of equities and the recent price evolution of crypto
assets, and we show that our methodology swiftly and accurately indicated
historical periods of market turmoil.
Comment: 65 pages, 52 figures
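The detection statistic described above, a maximum mean discrepancy computed on signature features of paths, can be sketched in a few lines. This is a minimal illustration under simplifying assumptions (a level-2 signature computed directly from path increments and a plain linear kernel on the features), not the paper's optimised similarity metric; the function names and the synthetic "regimes" below are hypothetical.

```python
import numpy as np

def sig_level2(path):
    """Truncated (level-2) signature features of a piecewise-linear path.
    path: (T, d) array. Returns concatenated level-1 and level-2 terms."""
    dX = np.diff(path, axis=0)            # increments, shape (T-1, d)
    s1 = dX.sum(axis=0)                   # level 1: total increment
    run = np.cumsum(dX, axis=0) - dX      # partial sum of increments before each step
    s2 = run.T @ dX + 0.5 * dX.T @ dX     # level 2: iterated integrals
    return np.concatenate([s1, s2.ravel()])

def mmd2(paths_a, paths_b):
    """Squared MMD between two samples of paths, linear kernel on signatures."""
    fa = np.mean([sig_level2(p) for p in paths_a], axis=0)
    fb = np.mean([sig_level2(p) for p in paths_b], axis=0)
    return float(np.sum((fa - fb) ** 2))

# Hypothetical synthetic data: a low- and a high-volatility "regime"
rng = np.random.default_rng(0)
calm = [np.cumsum(0.1 * rng.standard_normal((100, 2)), axis=0) for _ in range(40)]
wild = [np.cumsum(1.0 * rng.standard_normal((100, 2)), axis=0) for _ in range(40)]
calm2 = [np.cumsum(0.1 * rng.standard_normal((100, 2)), axis=0) for _ in range(40)]
cross, within = mmd2(calm, wild), mmd2(calm, calm2)
```

With a linear kernel, the squared MMD reduces to the distance between mean feature vectors, which keeps the update cheap when new incoming data arrives in small batches.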
Active Coverage for PAC Reinforcement Learning
Collecting and leveraging data with good coverage properties plays a crucial
role in different aspects of reinforcement learning (RL), including reward-free
exploration and offline learning. However, the notion of "good coverage" really
depends on the application at hand, as data suitable for one context may not be
so for another. In this paper, we formalize the problem of active coverage in
episodic Markov decision processes (MDPs), where the goal is to interact with
the environment so as to fulfill given sampling requirements. This framework is
sufficiently flexible to specify any desired coverage property, making it
applicable to any problem that involves online exploration. Our main
contribution is an instance-dependent lower bound on the sample complexity of
active coverage and a simple game-theoretic algorithm, CovGame, that nearly
matches it. We then show that CovGame can be used as a building block to solve
different PAC RL tasks. In particular, we obtain a simple algorithm for PAC
reward-free exploration with an instance-dependent sample complexity that, in
certain MDPs which are "easy to explore", is lower than the minimax one. By
further coupling this exploration algorithm with a new technique to do implicit
eliminations in policy space, we obtain a computationally-efficient algorithm
for best-policy identification whose instance-dependent sample complexity
scales with gaps between policy values.
Comment: Accepted at COLT 202
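The notion of "sampling requirements" can be made concrete with a toy sketch. The code below is not CovGame (which selects exploration policies game-theoretically); it simply runs a uniform policy in a small episodic MDP until every state-action visit count meets a requirement b(s, a). The function name and the toy MDP are illustrative assumptions.

```python
import numpy as np

def naive_active_coverage(P, b, horizon, rng, max_episodes=10_000):
    """Collect episodes until every state-action count meets b[s, a].
    P: transition tensor P[s, a, s'] of an episodic MDP with initial state 0.
    A stand-in for the sampling-requirement formulation, not CovGame."""
    S, A, _ = P.shape
    counts = np.zeros((S, A), dtype=int)
    for ep in range(max_episodes):
        s = 0                                  # fixed initial state
        for _ in range(horizon):
            a = rng.integers(A)                # uniform exploration policy
            counts[s, a] += 1
            s = rng.choice(S, p=P[s, a])
        if np.all(counts >= b):                # requirements fulfilled
            return counts, ep + 1
    raise RuntimeError("requirements not met within the episode budget")

# Toy instance: 2 states, 2 actions, every action flips a fair coin
P = np.full((2, 2, 2), 0.5)
b = np.full((2, 2), 5)                         # require 5 samples per (s, a)
counts, n_eps = naive_active_coverage(P, b, horizon=10, rng=np.random.default_rng(1))
```

An adaptive algorithm would instead steer towards under-sampled pairs, which is what yields the instance-dependent sample complexity discussed above.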
Peak Estimation of Time Delay Systems using Occupation Measures
This work proposes a method to compute the maximum value obtained by a state
function along trajectories of a Delay Differential Equation (DDE). An example
of this task is finding the maximum number of infected people in an epidemic
model with a nonzero incubation period. The variables of this peak estimation
problem include the stopping time and the original history (restricted to a
class of admissible histories). The original nonconvex DDE peak estimation
problem is approximated by an infinite-dimensional Linear Program (LP) in
occupation measures, inspired by existing measure-based methods in peak
estimation and optimal control. This LP is approximated from above by a
sequence of Semidefinite Programs (SDPs) through the moment-Sum of Squares
(SOS) hierarchy. Effectiveness of this scheme in providing peak estimates for
DDEs is demonstrated with the provided examples.
Comment: 34 pages, 14 figures, 3 tables
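For orientation, in the delay-free (ODE) case the measure LP alluded to above typically takes the following form in the peak-estimation literature; the DDE version in the paper additionally accounts for the admissible history class. The notation here ($\mu_0$, $\mu$, $\mu_p$ for the initial, occupation, and peak measures, with dynamics $\dot{x} = f(t,x)$) is generic rather than necessarily the paper's.

```latex
p^{\ast} \;=\; \sup_{\mu_0,\,\mu,\,\mu_p \,\geq\, 0} \;\int p \, d\mu_p
\quad \text{s.t.} \quad
\int v \, d\mu_p \;=\; \int v(0,\cdot) \, d\mu_0
\;+\; \int \big( \partial_t v + f \cdot \nabla_x v \big) \, d\mu
\qquad \forall\, v \in C^1,
```

with $\mu_0$ supported on the admissible initial conditions. Truncating the test functions $v$ to polynomials of bounded degree yields the moment-SOS hierarchy of SDPs mentioned above.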
Beam scanning by liquid-crystal biasing in a modified SIW structure
A fixed-frequency beam-scanning 1D antenna based on Liquid Crystals (LCs) is designed for application in 2D scanning with lateral alignment. The 2D array environment imposes full decoupling of adjacent 1D antennas, which often conflicts with the LC requirement of DC biasing: the proposed design accommodates both. The LC medium is placed inside a Substrate Integrated Waveguide (SIW) modified to work as a Groove Gap Waveguide, with radiating slots etched on the upper broad wall, so that it radiates as a Leaky-Wave Antenna (LWA). This allows effective application of the DC bias voltage needed for tuning the LCs. At the same time, the RF field remains laterally confined, making it possible to lay several antennas in parallel and achieve 2D beam scanning. The design is validated by simulation employing the actual properties of a commercial LC medium.
Reinforcement learning in large state action spaces
Reinforcement learning (RL) is a promising framework for training intelligent agents which learn to optimize long-term utility by directly interacting with the environment. Creating RL methods which scale to large state-action spaces is a critical problem for ensuring real-world deployment of RL systems. However, several challenges limit the applicability of RL to large-scale settings. These include difficulties with exploration, low sample efficiency, computational intractability, task constraints like decentralization, and a lack of guarantees about important properties like performance, generalization, and robustness in potentially unseen scenarios.
This thesis is motivated by bridging the aforementioned gap. We propose several principled algorithms and frameworks for studying and addressing the above challenges in RL. The proposed methods cover a wide range of RL settings (single- and multi-agent systems (MAS) with all the variations in the latter, prediction and control, model-based and model-free methods, value-based and policy-based methods). In this work we present the first results on several different problems, e.g. tensorization of the Bellman equation, which allows exponential sample-efficiency gains (Chapter 4); provable suboptimality arising from structural constraints in MAS (Chapter 3); combinatorial generalization results in cooperative MAS (Chapter 5); generalization results on observation shifts (Chapter 7); and learning deterministic policies in a probabilistic RL framework (Chapter 6). Our algorithms exhibit provably enhanced performance and sample efficiency along with better scalability. Additionally, we shed light on generalization aspects of the agents under different frameworks. These properties have been driven by the use of several advanced tools (e.g. statistical machine learning, state abstraction, variational inference, tensor theory).
In summary, the contributions in this thesis significantly advance progress towards making RL agents ready for large-scale, real-world applications.
Machine learning approach towards predicting turbulent fluid flow using convolutional neural networks
In this thesis, we present a novel method for predicting turbulent fluid flow through an array of obstacles using convolutional neural networks. In recent years, machine learning has exploded in popularity due to its ability to create accurate data-driven models and the abundance of available data. In an attempt to understand the characteristics of turbulent fluid flow, we utilise a novel convolutional autoencoder neural network to predict the first ten POD modes of turbulent fluid flow. We find that the model predicts the first two POD modes well, although with less accuracy for the remaining eight POD modes. In addition, we find that the ML-predicted POD modes are accurate enough to reconstruct turbulent flow that adequately captures the large-scale details of the original simulation.
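As background for the POD modes mentioned above: the modes themselves are conventionally extracted from snapshot data by a singular value decomposition. The sketch below is that standard construction, not the thesis's autoencoder; the two-pattern synthetic flow is a hypothetical stand-in for simulation data.

```python
import numpy as np

def pod_modes(snapshots, n_modes):
    """Proper Orthogonal Decomposition via SVD.
    snapshots: (n_time, n_space) matrix of flow-field snapshots.
    Returns (modes, energy): the leading spatial modes and the fraction
    of fluctuation energy each captures."""
    fluct = snapshots - snapshots.mean(axis=0)        # remove the mean flow
    _, s, vt = np.linalg.svd(fluct, full_matrices=False)
    energy = s**2 / np.sum(s**2)                      # energy fractions
    return vt[:n_modes], energy[:n_modes]

# Hypothetical synthetic data: two spatial patterns with random amplitudes
rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 64)
amps = rng.standard_normal((200, 2)) * [3.0, 1.0]
snaps = amps @ np.stack([np.sin(x), np.sin(2 * x)])   # (n_time, n_space)
modes, energy = pod_modes(snaps, n_modes=2)
```

Here the two planted patterns account for essentially all fluctuation energy, so the first two modes recover their span; in real turbulence data the energy decays gradually, which is why truncating at ten modes is a modelling choice.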
Propagation of chaos for mean field Schr\"odinger problems
In this work, we study the mean field Schr\"odinger problem from a purely
probabilistic point of view by exploiting its connection to stochastic control
theory for McKean-Vlasov diffusions. Our main result shows that the mean field
Schr\"odinger problem arises as the limit of ``standard'' Schr\"odinger
problems over interacting particles. Due to the stochastic maximum principle
and a suitable penalization procedure, the result follows as a consequence of
novel (quantitative) propagation of chaos results for forward-backward
particle systems. The approach described in the paper seems flexible enough to
address other questions in the theory. For instance, our stochastic control
technique further allows us to solve the mean field Schr\"odinger problem and
characterize its solution, the mean field Schr\"odinger bridge, by a
forward-backward planning equation.
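As standard background (not the paper's exact statement): the classical Schrödinger problem over a reference path measure $R$ with prescribed initial and terminal marginals $\mu_0, \mu_T$ reads

```latex
\mathcal{S}(\mu_0, \mu_T) \;=\;
\inf \Big\{ \, H(P \,\|\, R) \;:\;
P \circ X_0^{-1} = \mu_0, \;\; P \circ X_T^{-1} = \mu_T \, \Big\},
```

where $H(\cdot \,\|\, \cdot)$ denotes relative entropy and $R$ is, e.g., the Wiener measure. In the mean field version studied above, the reference dynamics depend on the law of the process itself, which is where the connection to stochastic control of McKean-Vlasov diffusions enters.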
Convex Optimization-based Policy Adaptation to Compensate for Distributional Shifts
Many real-world systems often involve physical components or operating
environments with highly nonlinear and uncertain dynamics. A number of
different control algorithms can be used to design optimal controllers for such
systems, assuming a reasonably high-fidelity model of the actual system.
However, the assumptions made on the stochastic dynamics of the model when
designing the optimal controller may no longer be valid when the system is
deployed in the real world. The problem addressed by this paper is the
following: suppose we obtain an optimal trajectory by solving a control problem
in the training environment, how do we ensure that the real-world system
trajectory tracks this optimal trajectory with a minimal amount of error in a
deployment environment? In other words, we want to learn how we can adapt an
optimal trained policy to distribution shifts in the environment. Distribution
shifts are problematic in safety-critical systems, where a trained policy may
lead to unsafe outcomes during deployment. We show that this problem can be
cast as a nonlinear optimization problem that could be solved using heuristic
methods such as particle swarm optimization (PSO). However, if we instead
consider a convex relaxation of this problem, we can learn policies that track
the optimal trajectory with much better error performance, and faster
computation times. We demonstrate the efficacy of our approach on tracking an
optimal path using a Dubins car model, and collision avoidance using both a
linear and a nonlinear model for adaptive cruise control.
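The contrast drawn above, heuristic search versus a convex formulation, can be illustrated on the simplest possible instance: for known linear deployment dynamics, choosing controls so the state tracks a reference trajectory is an unconstrained convex least-squares problem with an exact solution. This sketch is an illustrative assumption, not the paper's relaxation (whose dynamics are nonlinear and uncertain); the function name is hypothetical.

```python
import numpy as np

def track_controls(A, B, x0, x_ref):
    """Controls minimising sum_t ||x_t - x_ref[t]||^2 for x_{t+1} = A x_t + B u_t.
    Convex least squares in the stacked controls, solved in closed form.
    x_ref: (T, n) reference states for t = 1..T."""
    T, n = x_ref.shape
    m = B.shape[1]
    # free response and input-to-state map: x_t = A^t x0 + sum_k A^{t-1-k} B u_k
    free = np.stack([np.linalg.matrix_power(A, t) @ x0 for t in range(1, T + 1)])
    G = np.zeros((T * n, T * m))
    for t in range(1, T + 1):
        for k in range(t):
            G[(t - 1) * n:t * n, k * m:(k + 1) * m] = np.linalg.matrix_power(A, t - 1 - k) @ B
    rhs = (x_ref - free).ravel()
    u = np.linalg.lstsq(G, rhs, rcond=None)[0]
    return u.reshape(T, m)

# Toy check: generate a reachable reference with known controls, then recover them
A = np.array([[1.0, 1.0], [0.0, 1.0]])        # discrete double integrator
B = np.array([[0.0], [1.0]])
x0 = np.zeros(2)
rng = np.random.default_rng(0)
u_true = rng.standard_normal((10, 1))
xs, x = [], x0
for u_t in u_true:
    x = A @ x + B @ u_t
    xs.append(x)
u = track_controls(A, B, x0, np.stack(xs))
```

Because the problem is a least-squares program, the solution is global and cheap to compute, which is the kind of advantage over heuristic methods like PSO that the abstract points to.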
A hybrid quantum algorithm to detect conical intersections
Conical intersections are topologically protected crossings between the
potential energy surfaces of a molecular Hamiltonian, known to play an
important role in chemical processes such as photoisomerization and
non-radiative relaxation. They are characterized by a non-zero Berry phase,
which is a topological invariant defined on a closed path in atomic coordinate
space, taking the value $\pi$ when the path encircles the intersection
manifold. In this work, we show that for real molecular Hamiltonians, the Berry
phase can be obtained by tracing a local optimum of a variational ansatz along
the chosen path and estimating the overlap between the initial and final state
with a control-free Hadamard test. Moreover, by discretizing the path into
$N$ points, we can use single Newton-Raphson steps to update our state
non-variationally. Finally, since the Berry phase can only take two discrete
values (0 or $\pi$), our procedure succeeds even for a cumulative error bounded
by a constant; this allows us to bound the total sampling cost and to readily
verify the success of the procedure. We demonstrate numerically the application
of our algorithm on small toy models of the formaldimine molecule
(\ce{H2C=NH}).
Comment: 15 + 10 pages, 4 figures
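The gauge-fixed state tracking described above can be mimicked classically on a two-level toy model: trace the real ground state of $H(\theta)$ around a closed loop, keep successive eigenvectors aligned, and read off the sign of the final overlap. The model Hamiltonian below is an assumption for illustration, not the formaldimine ansatz.

```python
import numpy as np

def berry_phase_sign(hamiltonian, n_points=200):
    """Track the real ground state of hamiltonian(theta) around theta in
    [0, 2*pi] and return the overlap <psi(0)|psi(2*pi)>: -1 signals Berry
    phase pi, +1 signals Berry phase 0 (real Hamiltonians only)."""
    thetas = np.linspace(0.0, 2 * np.pi, n_points)
    _, vecs = np.linalg.eigh(hamiltonian(thetas[0]))
    psi0 = psi = vecs[:, 0]                 # ground state at the start
    for th in thetas[1:]:
        _, vecs = np.linalg.eigh(hamiltonian(th))
        v = vecs[:, 0]
        if psi @ v < 0:                     # gauge-fix: keep the path continuous
            v = -v
        psi = v
    return float(psi0 @ psi)

# Toy loop encircling the degeneracy of H(theta) = cos(theta) Z + sin(theta) X
Z = np.diag([1.0, -1.0])
X = np.array([[0.0, 1.0], [1.0, 0.0]])
overlap = berry_phase_sign(lambda th: np.cos(th) * Z + np.sin(th) * X)
```

An overlap of $-1$ corresponds to Berry phase $\pi$, i.e. the loop encircles the intersection; a loop that avoids the degeneracy returns $+1$. The discreteness of the outcome is what makes the quantum procedure robust to a bounded cumulative error.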