
    Reinforcement Learning Curricula as Interpolations between Task Distributions

    In the last decade, the increased availability of powerful computing machinery has led to an increasingly widespread application of machine learning methods. Machine learning has been particularly successful when large models, typically neural networks with an ever-increasing number of parameters, can leverage vast amounts of data to make predictions. While reinforcement learning (RL) has been no exception to this development, a distinguishing feature of RL is its well-known exploration-exploitation trade-off, whose optimal solution – while possible to model as a partially observable Markov decision process – evades computation in all but the simplest problems. Consequently, it seems unsurprising that notable demonstrations of reinforcement learning, such as an RL-based Go agent (AlphaGo) by DeepMind beating the professional Go player Lee Sedol, relied both on the availability of massive computing capabilities and on specific forms of regularization that facilitate learning. In the case of AlphaGo, this regularization came in the form of self-play, enabling learning by interacting with gradually more proficient opponents. In this thesis, we develop techniques that, similarly to the self-play of AlphaGo, improve the learning performance of RL agents by training on sequences of increasingly complex tasks. These task sequences are typically called curricula and are known to side-step problems such as slow learning or convergence to poor behavior that may occur when learning directly in complicated tasks. The algorithms we develop in this thesis create curricula by minimizing distances or divergences between probability distributions of learning tasks, generating interpolations between an initial distribution of easy learning tasks and a target task distribution.
Apart from improving the learning performance of RL agents in experiments, developing methods that realize curricula as interpolations between task distributions results in a nuanced picture of key aspects of successful reinforcement learning curricula. In Chapter 1, we start this thesis by introducing required reinforcement learning notation and then motivating curriculum reinforcement learning from the perspective of continuation methods for non-linear optimization. Similar to curricula for reinforcement learning agents, continuation methods have been used in non-linear optimization to solve challenging optimization problems. This similarity provides an intuition about the effect of the curricula we aim to generate and their limits. In Chapter 2, we transfer the concept of self-paced learning, initially proposed in the supervised learning community, to the problem of RL, showing that an automated curriculum generation for RL agents can be motivated by a regularized RL objective. This regularized RL objective implies generating a curriculum as a sequence of task distributions that trade off the expected agent performance against similarity to a specified distribution of target tasks. This view on curriculum RL contrasts with existing approaches, as it motivates curricula via a regularized RL objective instead of generating them from a set of assumptions about an optimal curriculum. In experiments, we show that an approximate implementation of the aforementioned curriculum – which restricts the interpolating task distribution to a Gaussian – results in improved learning performance compared to regular reinforcement learning, matching or surpassing the performance of existing curriculum-based methods. Subsequently, Chapter 3 builds on the intuition of curricula as sequences of interpolating task distributions established in Chapter 2.
Motivated by using more flexible task distribution representations, we show how parametric assumptions play a crucial role in the empirical success of the previous approach and subsequently uncover key ingredients that enable the generation of meaningful curricula without assuming a parametric model of the task distributions. One major ingredient is an explicit notion of task similarity via a distance function between two Markov decision processes. We turn to optimal transport theory, allowing for flexible particle-based representations of the task distributions while properly considering the newly introduced metric structure of the task space. Combined with other improvements to our first method, such as a more aggressive restriction of the curriculum to tasks that are not too hard for the agent, the resulting approach delivers consistently high learning performance in multiple experiments. In the final Chapter 4, we apply the refined method of Chapter 3 to a trajectory-tracking task, in which we task an RL agent to follow a three-dimensional reference trajectory with the tip of an inverted pendulum mounted on a Barrett Whole Arm Manipulator. Access to only positional information results in a partially observable system that, paired with its inherent instability, underactuation, and non-trivial kinematic structure, presents a challenge for modern reinforcement learning algorithms, which we tackle via curricula. The technically infinite-dimensional task space of target trajectories allows us to probe the developed curriculum learning method for flaws that have not surfaced in the rather low-dimensional experiments of the previous chapters. Through an improved optimization scheme that better respects the non-Euclidean structure of target trajectories, we reliably generate curricula of trajectories to be tracked, resulting in faster and more robust learning compared to an RL baseline that does not exploit this form of structured learning.
The learned policy matches the performance of an optimal control baseline on the real system, demonstrating the potential of curriculum RL to jointly learn state estimation and control for non-linear tracking tasks. In summary, this thesis introduces a perspective on reinforcement learning curricula as interpolations between task distributions. The methods developed under this perspective enjoy a precise formulation as optimization problems and deliver empirical benefits throughout experiments. Building upon this precise formulation may allow future work to advance the formal understanding of reinforcement learning curricula and, with that, enable the solution of challenging decision-making and control problems with reinforcement learning.
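The interpolation idea at the heart of the abstract can be sketched in a few lines. The toy below is an assumption-laden illustration, not the thesis's algorithm: it uses 1-D Gaussian task distributions, drops the expected-agent-performance term entirely, and keeps only a KL-divergence trust region on each update while interpolating from an easy initial task distribution toward a target one (all function names and parameter values are invented for the sketch):

```python
import numpy as np

def kl_gauss(mu_p, var_p, mu_q, var_q):
    """KL(N(mu_p, var_p) || N(mu_q, var_q)) for 1-D Gaussians."""
    return 0.5 * (np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def curriculum(mu0, var0, mu_target, var_target, kl_step=0.05, max_iters=200):
    """Interpolate toward the target distribution in KL-bounded stages."""
    mu, var = mu0, var0
    stages = [(mu, var)]
    for _ in range(max_iters):
        # Take the largest step along the straight interpolation toward the
        # target whose KL from the current stage stays inside the trust
        # region; the step size is found by bisection.
        lo, hi = 0.0, 1.0
        for _ in range(40):
            mid = 0.5 * (lo + hi)
            mu_m = (1 - mid) * mu + mid * mu_target
            var_m = (1 - mid) * var + mid * var_target
            if kl_gauss(mu_m, var_m, mu, var) <= kl_step:
                lo = mid
            else:
                hi = mid
        mu = (1 - lo) * mu + lo * mu_target
        var = (1 - lo) * var + lo * var_target
        stages.append((mu, var))
        if abs(mu - mu_target) < 1e-6 and abs(var - var_target) < 1e-6:
            break
    return stages

stages = curriculum(mu0=0.0, var0=1.0, mu_target=5.0, var_target=0.25)
print(len(stages), stages[-1])
```

Each stage stays within a fixed KL divergence of its predecessor, which is the sense in which such a curriculum interpolates gradually between task distributions rather than jumping straight to the target tasks.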

    Computing Perfect Stationary Equilibria in Stochastic Games

    The notion of stationary equilibrium is one of the most crucial solution concepts in stochastic games. However, a stochastic game can have multiple stationary equilibria, some of which may be unstable or counterintuitive. As a refinement of stationary equilibrium, we extend the concept of perfect equilibrium in strategic games to stochastic games and formulate the notion of perfect stationary equilibrium (PeSE). To further promote its applications, we develop a differentiable homotopy method to compute such an equilibrium. We incorporate vanishing logarithmic barrier terms into the payoff functions, thereby constituting a logarithmic-barrier stochastic game. This barrier game yields a continuously differentiable homotopy system. To reduce the number of variables in the homotopy system, we eliminate the Bellman equations through a substitution of variables and derive an equivalent system. We use the equivalent system to establish the existence of a smooth path, which starts from an arbitrary totally mixed strategy profile and ends at a PeSE. Extensive numerical experiments further affirm the effectiveness and efficiency of the method.
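The role of a vanishing logarithmic barrier can be illustrated on a toy problem (a hypothetical sketch, not the paper's homotopy system for stochastic games; the objective, starting point, and schedule below are invented): we minimise f(x, y) = (x - 2)² + (y + 1)² subject to x, y ≥ 0 by tracing the central path of f - t·(log x + log y) as the barrier weight t shrinks to zero:

```python
import numpy as np

def newton_step(z, t):
    x, y = z
    # Gradient and diagonal Hessian of f(x, y) - t*(log x + log y).
    g = np.array([2.0 * (x - 2.0) - t / x, 2.0 * (y + 1.0) - t / y])
    h = np.array([2.0 + t / x**2, 2.0 + t / y**2])
    step = g / h
    alpha = 1.0                       # damp the step to stay strictly feasible
    while np.any(z - alpha * step <= 0):
        alpha *= 0.5
    return z - alpha * step

def central_path(t0=1.0, t_min=1e-10, shrink=0.5):
    z = np.array([1.0, 1.0])          # strictly positive starting point
    t = t0
    while t > t_min:
        for _ in range(50):           # Newton corrector at fixed barrier weight
            z = newton_step(z, t)
        t *= shrink                   # let the barrier vanish gradually
    return z

z = central_path()
print(z)  # approaches the constrained minimiser (2, 0)
```

The barrier keeps every iterate in the interior of the feasible region, and the path of minimisers converges to the constrained solution as t → 0; the paper applies the same principle to the payoffs of a stochastic game rather than a scalar objective.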

    LIPIcs, Volume 258, SoCG 2023, Complete Volume

    LIPIcs, Volume 258, SoCG 2023, Complete Volume

    Solution of nonlinear system of equations through homotopy path

    The paper aims to show the equivalence between the nonlinear complementarity problem and a system of nonlinear equations. We propose a homotopy method with a vector parameter λ for finding the solution of the nonlinear complementarity problem through a system of nonlinear equations. We propose a smooth and bounded homotopy path to obtain a solution of the system of nonlinear equations under some conditions. An oligopolistic market equilibrium problem is considered to show the effectiveness of the proposed homotopy continuation method. Keywords: Nonlinear complementarity problem, system of nonlinear equations, homotopy function with vector parameter, bounded smooth curve, oligopolistic market equilibrium. Comment: arXiv admin note: text overlap with arXiv:2209.0038
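The basic mechanics of homotopy continuation can be sketched as follows. This is a generic scalar-parameter Newton homotopy on an invented two-equation system (the paper's construction uses a vector parameter λ and targets complementarity problems): we deform H(x, t) = F(x) - (1 - t)·F(x₀) from t = 0, where x₀ is the exact root, to t = 1, where H = F, correcting with Newton's method at each step:

```python
import numpy as np

def F(x):
    """The square system to solve: a circle intersected with a line."""
    return np.array([x[0]**2 + x[1]**2 - 4.0, x[0] - x[1]])

def J_F(x):
    """Jacobian of F."""
    return np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])

def solve_by_homotopy(x_start, steps=50, newton_iters=5):
    F0 = F(x_start)
    x = x_start.astype(float).copy()
    for k in range(1, steps + 1):
        t = k / steps                  # march the homotopy parameter to 1
        for _ in range(newton_iters):  # Newton corrector on H(x, t) = 0
            H = F(x) - (1.0 - t) * F0
            x = x - np.linalg.solve(J_F(x), H)
    return x

root = solve_by_homotopy(np.array([1.0, 1.0]))
print(root)  # close to (sqrt(2), sqrt(2))
```

Because the starting point is a root of H(·, 0) by construction, the method trades the hard problem F(x) = 0 for a sequence of easy, warm-started Newton solves along the path, which is the principle the paper's bounded homotopy path makes rigorous.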

    Computer Science for Continuous Data: Survey, Vision, Theory, and Practice of a Computer Analysis System

    Building on George Boole's work, Logic provides a rigorous foundation for the powerful tools in Computer Science that underlie today's ubiquitous processing of discrete data, such as strings or graphs. Concerning continuous data, Alan Turing had already applied "his" machines to formalize and study the processing of real numbers: an aspect of his oeuvre that we transform from theory to practice. The present essay surveys the state of the art and envisions the future of Computer Science for continuous data: natively, beyond brute-force discretization, based on, guided by, and extending classical discrete Computer Science, as a bridge between Pure and Applied Mathematics.

    New Directions for Contact Integrators

    Contact integrators are a family of geometric numerical schemes which guarantee the conservation of the contact structure. In this work we review the construction of both the variational and Hamiltonian versions of these methods. We illustrate some of the advantages of geometric integration in the dissipative setting by focusing on models inspired by recent studies in celestial mechanics and cosmology. Comment: To appear as Chapter 24 in GSI 2021, Springer LNCS 1282
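For intuition, the contact Hamiltonian equations that such integrators discretize can be simulated directly. The sketch below is a plain semi-implicit scheme on an invented example, not one of the structure-preserving integrators reviewed in the work: for the contact Hamiltonian H(q, p, s) = p²/2 + q²/2 + γs, the contact equations dq/dt = ∂H/∂p, dp/dt = -∂H/∂q - p·∂H/∂s, ds/dt = p·∂H/∂p - H reproduce the damped harmonic oscillator q″ + γq′ + q = 0:

```python
import numpy as np

def integrate(q=1.0, p=0.0, s=0.0, gamma=0.1, dt=1e-3, n_steps=20000):
    """Semi-implicit integration of the contact flow of H = p^2/2 + q^2/2 + gamma*s.

    The terms multiplying gamma are treated implicitly, which keeps the
    discrete damping stable; this is an illustrative scheme, not the
    variational or Hamiltonian contact integrators of the paper.
    """
    traj = []
    for _ in range(n_steps):
        p = (p - dt * q) / (1.0 + dt * gamma)   # dp/dt = -q - gamma*p
        q = q + dt * p                          # dq/dt = p
        # ds/dt = p^2 - H = p^2/2 - q^2/2 - gamma*s
        s = (s + dt * (0.5 * p**2 - 0.5 * q**2)) / (1.0 + dt * gamma)
        traj.append((q, p, s))
    return np.array(traj)

traj = integrate()
energy = 0.5 * (traj[:, 0] ** 2 + traj[:, 1] ** 2)
print(energy[0], energy[-1])  # mechanical energy decays under the damping
```

The mechanical energy decays at the rate set by γ, illustrating the dissipative setting in which contact geometry, unlike symplectic geometry, is the natural structure to preserve.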

    Applications of monodromy in solving polynomial systems

    Polynomial systems of equations that occur in applications frequently have a special structure. Part of that structure can be captured by an associated Galois/monodromy group. This makes numerical homotopy continuation methods that exploit this monodromy action an attractive choice for solving these systems; by contrast, other symbolic-numeric techniques do not generally see this structure. Naturally, there are trade-offs when monodromy is chosen over other methods. Nevertheless, there is a growing literature demonstrating that the trade can be worthwhile in practice. In this thesis, we consider a framework for efficient monodromy computation which rivals the state-of-the-art in homotopy continuation methods. We show how its implementation in the package MonodromySolver can be used to efficiently solve challenging systems of polynomial equations. Among many applications, we apply monodromy to computer vision---specifically, the study and classification of minimal problems used in RANSAC-based 3D reconstruction pipelines. As a byproduct of numerically computing their Galois/monodromy groups, we observe that several of these problems have a decomposition into algebraic subproblems. Although precise knowledge of such a decomposition is hard to obtain in general, we determine it in some novel cases.
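The monodromy action itself can be seen in a toy example (vastly simplified relative to MonodromySolver, with all names and the loop discretization invented for illustration): the two solutions of x² = c are permuted when the parameter c travels once around the branch point c = 0. Tracking one root numerically along the loop c(θ) = e^{iθ} returns the other root:

```python
import numpy as np

def track_root(x, c_path, newton_iters=8):
    """Follow a root of x**2 - c = 0 as c moves along c_path."""
    for c in c_path:
        for _ in range(newton_iters):
            x = x - (x * x - c) / (2 * x)   # Newton step for x**2 - c = 0
    return x

# One full loop of the parameter c around the branch point c = 0.
thetas = np.linspace(0.0, 2.0 * np.pi, 400)
c_path = np.exp(1j * thetas)

start = 1.0 + 0j                  # one of the two roots of x**2 = 1
end = track_root(start, c_path)
print(end)  # approximately -1: the loop swapped the two roots
```

Starting instead from -1 returns +1, so this loop acts as the transposition of the two solutions; collecting the permutations induced by many such loops is how monodromy methods populate (and exploit the structure of) a system's full solution set.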

    Essays on strategic trading

    This dissertation discusses various aspects of strategic trading using both analytical modeling and numerical methods. Strategic trading, in short, encompasses models of trading, most notably models of optimal execution and portfolio selection, in which one seeks to rigorously consider various---both explicit and implicit---costs stemming from the act of trading itself. The strategic trading approach, rooted in the market microstructure literature, contrasts with many classical finance models in which markets are assumed to be frictionless and traders can, for the most part, take prices as given. Introducing trading costs to dynamic models of financial markets tends to complicate matters. First, the objectives of the traders become more nuanced, since overtrading now leads to poor outcomes due to increased trading costs. Second, when trades affect prices and there are multiple traders in the market, the traders start to behave in a more calculated fashion, taking into account both their own objectives and the perceived actions of others. Acknowledging this strategic behavior is especially important when the traders are asymmetrically informed. These new features allow the models discussed to better reflect aspects of real-world trading, for instance, intraday trading patterns, and enable one to ask and answer new questions, for instance, related to the interactions between different traders. To efficiently analyze the models put forth, numerical methods must be utilized. This is, as is to be expected, the price one must pay for added complexity. However, it also opens an opportunity to have a closer look at the numerical approaches themselves. This opportunity is capitalized on, and various novel computational procedures influenced by the growing field of numerical real algebraic geometry are introduced and employed.
These procedures are utilizable beyond the scope of this dissertation and enable one to sharpen the analysis of dynamic equilibrium models.
This dissertation examines strategic trading using both analytical and numerical methods. Models of strategic trading, in particular optimal trade execution and portfolio selection, seek to account precisely for the explicit and implicit costs arising from trading itself. This distinguishes strategic trading models from classical frictionless models. Accounting for costs in the dynamic analysis of financial markets complicates the models. First, traders' objectives become more nuanced, because overly active trading leads to high trading costs and poor returns. Second, the assumption that traders' chosen actions affect prices leads to game-theoretic behavior when there are several traders in the market. Accounting for this behavior is of primary importance when information is distributed asymmetrically among traders. Owing to these features, the models discussed in this dissertation allow a more precise analysis of abstracted financial markets than before, for instance with respect to intraday trading. In addition, the models make it possible to answer new questions, such as what the mutual interactions of traders look like in dynamic markets. Numerical methods are employed to analyze the complex models, which opens an opportunity for a more detailed examination of these methods; this opportunity is seized by considering computational solutions from a fresh perspective that draws on numerical real algebraic geometry. The novel computational solutions presented in the dissertation are widely applicable and make it possible to sharpen the analysis of dynamic equilibrium models.

    An Interior-Point Path-Following Method to Compute Stationary Equilibria in Stochastic Games

    Subgame perfect equilibrium in stationary strategies (SSPE) is the most important solution concept used in applications of stochastic games, which makes it imperative to develop efficient numerical methods to compute an SSPE. For this purpose, this paper develops an interior-point path-following method (IPM), which remedies a number of issues with the existing method called the stochastic linear tracing procedure (SLTP). The homotopy system of IPM is derived from the optimality conditions of an artificial barrier game, whose objective function is a combination of the original payoff function and a logarithmic term. Unlike SLTP, the starting stationary strategy profile can be arbitrarily chosen, and IPM does not need switching between different systems of equations. The use of a perturbation term makes IPM applicable to all stochastic games, whereas SLTP only works for a generic stochastic game. A transformation of variables reduces the number of equations and variables by roughly one half. Numerical results show that our method is more than three times as efficient as SLTP.