
    Non-parametric online market regime detection and regime clustering for multidimensional and path-dependent data structures

    In this work we present a non-parametric online market regime detection method for multidimensional data structures using a path-wise two-sample test derived from a maximum mean discrepancy-based similarity metric on path space that uses rough path signatures as a feature map. This similarity metric has been developed and applied as a discriminator in recent generative models for small data environments, and has been optimised here for the setting where the size of new incoming data is particularly small, for faster reactivity. On the same principles, we also present a path-wise method for regime clustering which extends our previous work. The presented regime clustering techniques were designed as ex-ante market analysis tools that can identify periods of approximately similar market activity, but the new results also apply to path-wise, high-dimensional, and non-Markovian settings as well as to data structures that exhibit autocorrelation. We demonstrate our clustering tools on easily verifiable synthetic datasets of increasing complexity, and also show how the outlined regime detection techniques can be used as fast online automatic regime change detectors or as outlier detection tools, including a fully automated pipeline. Finally, we apply the fine-tuned algorithms to real-world historical data including high-dimensional baskets of equities and the recent price evolution of crypto assets, and we show that our methodology swiftly and accurately indicated historical periods of market turmoil. Comment: 65 pages, 52 figures
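The two-sample test described in this abstract can be illustrated with a minimal sketch. This is not the authors' method: it assumes each path window has already been flattened into a fixed-length vector and swaps the signature-based feature map for a plain Gaussian (RBF) kernel. The function names `rbf_kernel`, `mmd2`, and `permutation_p_value`, and the `bandwidth` parameter, are hypothetical and chosen for illustration only.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b.

    Assumption: an RBF kernel stands in for the signature kernel of the paper.
    """
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def mmd2(x, y, bandwidth=1.0):
    """Biased estimate of the squared maximum mean discrepancy."""
    return (rbf_kernel(x, x, bandwidth).mean()
            + rbf_kernel(y, y, bandwidth).mean()
            - 2.0 * rbf_kernel(x, y, bandwidth).mean())


def permutation_p_value(x, y, n_perm=200, bandwidth=1.0, seed=0):
    """Permutation test: how often does a random relabelling of the pooled
    samples give an MMD at least as large as the observed one?"""
    rng = np.random.default_rng(seed)
    observed = mmd2(x, y, bandwidth)
    pooled = np.vstack([x, y])
    n = len(x)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        if mmd2(pooled[idx[:n]], pooled[idx[n:]], bandwidth) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)
```

A small p-value suggests that the newly arrived windows `y` come from a different regime than the reference windows `x`; an online detector would repeat the test as each new batch arrives.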

    Active Coverage for PAC Reinforcement Learning

    Collecting and leveraging data with good coverage properties plays a crucial role in different aspects of reinforcement learning (RL), including reward-free exploration and offline learning. However, the notion of "good coverage" depends on the application at hand, as data suitable for one context may not be so for another. In this paper, we formalize the problem of active coverage in episodic Markov decision processes (MDPs), where the goal is to interact with the environment so as to fulfill given sampling requirements. This framework is sufficiently flexible to specify any desired coverage property, making it applicable to any problem that involves online exploration. Our main contribution is an instance-dependent lower bound on the sample complexity of active coverage and a simple game-theoretic algorithm, CovGame, that nearly matches it. We then show that CovGame can be used as a building block to solve different PAC RL tasks. In particular, we obtain a simple algorithm for PAC reward-free exploration with an instance-dependent sample complexity that, in certain MDPs which are "easy to explore", is lower than the minimax one. By further coupling this exploration algorithm with a new technique to do implicit eliminations in policy space, we obtain a computationally efficient algorithm for best-policy identification whose instance-dependent sample complexity scales with gaps between policy values. Comment: Accepted at COLT 202

    Peak Estimation of Time Delay Systems using Occupation Measures

    This work proposes a method to compute the maximum value attained by a state function along trajectories of a Delay Differential Equation (DDE). An example of this task is finding the maximum number of infected people in an epidemic model with a nonzero incubation period. The variables of this peak estimation problem include the stopping time and the original history (restricted to a class of admissible histories). The original nonconvex DDE peak estimation problem is approximated by an infinite-dimensional Linear Program (LP) in occupation measures, inspired by existing measure-based methods in peak estimation and optimal control. This LP is approximated from above by a sequence of Semidefinite Programs (SDPs) through the moment-Sum of Squares (SOS) hierarchy. The effectiveness of this scheme in providing peak estimates for DDEs is demonstrated with examples. Comment: 34 pages, 14 figures, 3 tables

    Beam scanning by liquid-crystal biasing in a modified SIW structure

    A fixed-frequency beam-scanning 1D antenna based on Liquid Crystals (LCs) is designed for application in 2D scanning with lateral alignment. The 2D array environment imposes full decoupling of adjacent 1D antennas, which often conflicts with the LC requirement of DC biasing: the proposed design accommodates both. The LC medium is placed inside a Substrate Integrated Waveguide (SIW) modified to work as a Groove Gap Waveguide, with radiating slots etched on the upper broad wall, that radiates as a Leaky-Wave Antenna (LWA). This allows effective application of the DC bias voltage needed for tuning the LCs. At the same time, the RF field remains laterally confined, making it possible to lay several antennas in parallel and achieve 2D beam scanning. The design is validated by simulation employing the actual properties of a commercial LC medium.

    Reinforcement learning in large state action spaces

    Reinforcement learning (RL) is a promising framework for training intelligent agents which learn to optimize long-term utility by directly interacting with the environment. Creating RL methods which scale to large state-action spaces is a critical problem for ensuring real-world deployment of RL systems. However, several challenges limit the applicability of RL to large-scale settings. These include difficulties with exploration, low sample efficiency, computational intractability, task constraints like decentralization, and lack of guarantees about important properties like performance, generalization and robustness in potentially unseen scenarios. This thesis aims to bridge this gap. We propose several principled algorithms and frameworks for studying and addressing the above challenges in RL. The proposed methods cover a wide range of RL settings (single and multi-agent systems (MAS) with all the variations in the latter, prediction and control, model-based and model-free methods, value-based and policy-based methods). In this work we propose the first results on several different problems: e.g. tensorization of the Bellman equation which allows exponential sample efficiency gains (Chapter 4), provable suboptimality arising from structural constraints in MAS (Chapter 3), combinatorial generalization results in cooperative MAS (Chapter 5), generalization results on observation shifts (Chapter 7), and learning deterministic policies in a probabilistic RL framework (Chapter 6). Our algorithms exhibit provably enhanced performance and sample efficiency along with better scalability. Additionally, we shed light on generalization aspects of the agents under different frameworks. These properties have been driven by the use of several advanced tools (e.g. statistical machine learning, state abstraction, variational inference, tensor theory).
In summary, the contributions in this thesis significantly advance progress towards making RL agents ready for large-scale, real-world applications.

    Machine learning approach towards predicting turbulent fluid flow using convolutional neural networks

    In this thesis, we present a novel method, based on convolutional neural networks, for predicting turbulent fluid flow through an array of obstacles. In recent years, machine learning has exploded in popularity due to its ability to create accurate data-driven models and the abundance of available data. In an attempt to understand the characteristics of turbulent fluid flow, we utilise a novel convolutional autoencoder neural network to predict the first ten POD modes of turbulent fluid flow. We find that the model predicts the first two POD modes well, but the remaining eight POD modes with less accuracy. In addition, we find that the ML-predicted POD modes are accurate enough to be used to reconstruct turbulent flow that adequately captures the large-scale details of the original simulation.
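For readers unfamiliar with POD (proper orthogonal decomposition) modes, the sketch below shows one common way to obtain them from a snapshot matrix via the singular value decomposition, and how a flow field is reconstructed from the leading modes. It is a minimal illustration, not the thesis's pipeline: the thesis predicts the modes with a convolutional autoencoder, whereas here they are computed directly. The function names `pod_modes` and `reconstruct` and the snapshot layout (one column per time step) are assumptions.

```python
import numpy as np


def pod_modes(snapshots, n_modes):
    """Compute the leading POD modes of a snapshot matrix.

    snapshots: array of shape (n_points, n_times), one column per time step
    (an assumed layout). Returns the temporal mean, the first n_modes spatial
    modes (as columns), and their singular values.
    """
    mean = snapshots.mean(axis=1, keepdims=True)
    fluctuations = snapshots - mean
    u, s, _ = np.linalg.svd(fluctuations, full_matrices=False)
    return mean, u[:, :n_modes], s[:n_modes]


def reconstruct(snapshots, mean, modes):
    """Project the fluctuations onto the modes and rebuild the field."""
    coefficients = modes.T @ (snapshots - mean)
    return mean + modes @ coefficients
```

Keeping only the first few modes retains the most energetic, large-scale structures, which is why a reconstruction from predicted modes can capture the large-scale details of a simulation even when higher modes are less accurate.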

    Propagation of chaos for mean field Schrödinger problems

    In this work, we study the mean field Schrödinger problem from a purely probabilistic point of view by exploiting its connection to stochastic control theory for McKean-Vlasov diffusions. Our main result shows that the mean field Schrödinger problem arises as the limit of "standard" Schrödinger problems over interacting particles. Due to the stochastic maximum principle and a suitable penalization procedure, the result follows as a consequence of novel (quantitative) propagation of chaos results for forward-backward particle systems. The approach described in the paper seems flexible enough to address other questions in the theory. For instance, our stochastic control technique further allows us to solve the mean field Schrödinger problem and characterize its solution, the mean field Schrödinger bridge, by a forward-backward planning equation.

    Convex Optimization-based Policy Adaptation to Compensate for Distributional Shifts

    Many real-world systems involve physical components or operating environments with highly nonlinear and uncertain dynamics. A number of different control algorithms can be used to design optimal controllers for such systems, assuming a reasonably high-fidelity model of the actual system. However, the assumptions made on the stochastic dynamics of the model when designing the optimal controller may no longer be valid when the system is deployed in the real world. The problem addressed by this paper is the following: suppose we obtain an optimal trajectory by solving a control problem in the training environment; how do we ensure that the real-world system trajectory tracks this optimal trajectory with minimal error in a deployment environment? In other words, we want to learn how we can adapt an optimal trained policy to distribution shifts in the environment. Distribution shifts are problematic in safety-critical systems, where a trained policy may lead to unsafe outcomes during deployment. We show that this problem can be cast as a nonlinear optimization problem that could be solved using a heuristic method such as particle swarm optimization (PSO). However, if we instead consider a convex relaxation of this problem, we can learn policies that track the optimal trajectory with much better error performance and faster computation times. We demonstrate the efficacy of our approach on tracking an optimal path using a Dubins car model, and collision avoidance using both a linear and nonlinear model for adaptive cruise control.

    Scaling up integrated photonic reservoirs towards low-power high-bandwidth computing


    A hybrid quantum algorithm to detect conical intersections

    Conical intersections are topologically protected crossings between the potential energy surfaces of a molecular Hamiltonian, known to play an important role in chemical processes such as photoisomerization and non-radiative relaxation. They are characterized by a non-zero Berry phase, which is a topological invariant defined on a closed path in atomic coordinate space, taking the value π when the path encircles the intersection manifold. In this work, we show that for real molecular Hamiltonians, the Berry phase can be obtained by tracing a local optimum of a variational ansatz along the chosen path and estimating the overlap between the initial and final state with a control-free Hadamard test. Moreover, by discretizing the path into N points, we can use N single Newton-Raphson steps to update our state non-variationally. Finally, since the Berry phase can only take two discrete values (0 or π), our procedure succeeds even for a cumulative error bounded by a constant; this allows us to bound the total sampling cost and to readily verify the success of the procedure. We demonstrate numerically the application of our algorithm on small toy models of the formaldimine molecule (H2C=NH). Comment: 15 + 10 pages, 4 figures
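The idea of reading the Berry phase from the overlap between the initial and final state can be illustrated classically on a toy model. The sketch below is not the paper's quantum algorithm: it assumes a hypothetical real two-level Hamiltonian with a conical intersection at the origin, and replaces the variational ansatz, Newton-Raphson updates, and Hadamard test with exact diagonalization at each of the discretized path points. The names `hamiltonian`, `berry_phase`, and `circle` are illustrative.

```python
import numpy as np


def hamiltonian(x, y):
    """Toy real 2x2 Hamiltonian (an assumption, not a molecular model).

    Its two energy surfaces touch only at (x, y) = (0, 0).
    """
    return np.array([[x, y], [y, -x]], dtype=float)


def berry_phase(path):
    """Trace the real ground state along a closed path and return 0 or pi.

    path: sequence of (x, y) points whose last point coincides with the first.
    At each step the sign of the new ground state is fixed so that it has
    positive overlap with the previous one; the sign of the final overlap
    with the initial state then decides the phase.
    """
    _, vecs = np.linalg.eigh(hamiltonian(*path[0]))
    initial = vecs[:, 0]
    state = initial.copy()
    for x, y in path[1:]:
        _, vecs = np.linalg.eigh(hamiltonian(x, y))
        candidate = vecs[:, 0]
        if candidate @ state < 0:
            candidate = -candidate
        state = candidate
    return np.pi if state @ initial < 0 else 0.0


def circle(center, radius, n_points=50):
    """Closed circular path discretized into n_points steps."""
    theta = np.linspace(0.0, 2.0 * np.pi, n_points + 1)
    return [(center[0] + radius * np.cos(t), center[1] + radius * np.sin(t))
            for t in theta]
```

For example, `berry_phase(circle((0.0, 0.0), 1.0))` follows a loop around the intersection, while `berry_phase(circle((3.0, 0.0), 1.0))` follows a loop that does not enclose it. Because the result is decided by the sign of a single overlap, moderate errors along the path do not change the outcome, which mirrors the robustness argument in the abstract.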