
    Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient

    Offline reinforcement learning, which aims to optimize sequential decision-making strategies from historical data, has been applied extensively in real-life applications. State-of-the-art algorithms usually leverage powerful function approximators (e.g., neural networks) to alleviate the sample-complexity hurdle and achieve better empirical performance. Despite these successes, a systematic understanding of the statistical complexity of function approximation remains lacking. Toward bridging this gap, we take a step by considering offline reinforcement learning with differentiable function class approximation (DFA). This function class naturally incorporates a wide range of models with nonlinear/nonconvex structure. Most importantly, we show that offline RL with differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning (PFQL) algorithm, and our results provide a theoretical basis for understanding a variety of practical heuristics that rely on fitted-Q-iteration-style designs. In addition, we further improve our guarantee with a tighter instance-dependent characterization. We hope our work draws interest in studying reinforcement learning with differentiable function approximation beyond the scope of current research.
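
    For intuition only, here is a minimal numpy sketch of a pessimistic fitted Q-iteration loop in the spirit of PFQL. It uses the linear-features uncertainty bonus beta*sqrt(phi' Sigma^-1 phi) as a stand-in for the paper's differentiable-function-class analysis; the feature map phi, the dataset format, and the bonus form are all assumptions of this sketch, not the authors' construction.

        # Hypothetical PFQL-style sketch: fitted Q-iteration with a pessimism
        # penalty. phi(s, a) -> 1-D feature array and the bonus form are
        # illustrative stand-ins, not the paper's construction.
        import numpy as np

        def pfql(data, phi, n_actions, n_iters, beta=1.0, reg=1e-3):
            """data: list of (s, a, r, s_next) transitions."""
            d = phi(data[0][0], data[0][1]).shape[0]
            Phi = np.array([phi(s, a) for s, a, _, _ in data])  # n x d design matrix
            Sigma = Phi.T @ Phi + reg * np.eye(d)               # regularized covariance
            Sigma_inv = np.linalg.inv(Sigma)
            w = np.zeros(d)                                     # current Q weights

            def pessimistic_value(s_next):
                # max over actions of (Q estimate minus an uncertainty penalty)
                return max(
                    phi(s_next, a) @ w
                    - beta * np.sqrt(phi(s_next, a) @ Sigma_inv @ phi(s_next, a))
                    for a in range(n_actions)
                )

            for _ in range(n_iters):                            # fitted-Q regression passes
                y = np.array([r + pessimistic_value(s2) for _, _, r, s2 in data])
                w = np.linalg.solve(Sigma, Phi.T @ y)           # ridge least-squares fit
            return w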

    Reduction of complexity and variational data assimilation

    Plenary invited lecture. Reduced basis methods belong to a class of model-reduction approaches for approximating the solutions of mathematical models that arise in many fields of research, in decision making, and in data assimilation. These approaches make it possible to tackle, in close to real time, problems that a priori require a large number of computations, by formalizing two steps: an "offline stage", a preparation step that is quite costly, and an "online stage" that is used on demand and is very cheap. The strategy exploits the fact that the solutions of interest belong to a family, a manifold, parametrized by input coefficients, shapes, or stochastic data, that has small complexity. The complexity is measured by a quantity such as the Kolmogorov width which, when small, formalizes the fact that low-dimensional vector spaces provide a good approximation of the elements of the manifold. We review the fundamental background and state results proving that this dimension is small for a large class of problems of interest, then use this fact to propose approximation strategies in various cases, depending on the knowledge we have of the solution we want to approximate: explicit, through values at points or through outputs evaluated from the solution; or implicit, through the partial differential equation it satisfies. We also present a strategy for the case where a mix of the above kinds of information is available, leading to new efficient approaches in data assimilation and data mining. The theory of the numerical analysis (a priori and a posteriori) of these approaches is presented together with results of numerical simulations. This work was done in close collaboration with A. T. Patera (MIT, Cambridge) and has benefited from collaboration with A. Buffa (IAN, Pavia), R. Chakir (IFSTAR, Paris), Y. Chen (U. of Massachusetts, Dartmouth), Y. Hesthaven (EPFL, Lausanne), E. Lovgren (Simula, Oslo), O. Mula (UPMC, Paris), NC Nguyen (MIT, Cambridge), J. Pen (MIT, Cambridge), C. Prud'homme (U. Strasbourg), J. Rodriguez (U. Santiago de Compostela), E. M. Ronquist (U. Trondheim), B. Stamm (UPMC, Paris), G. Turinici (Dauphine, Paris), M. Yano (MIT, Cambridge).
    Universidad de Málaga, Campus de Excelencia Internacional Andalucía Tech; lecture series of the university's own research plan (plan propio de investigación UMA).
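
    A minimal sketch of the offline/online split under simplifying assumptions: POD via SVD stands in for the greedy reduced-basis construction, and solve_full and assemble_system are hypothetical high-fidelity routines; the greedy parameter selection and a posteriori error estimators discussed in the lecture are omitted.

        # Illustrative offline/online split for a reduced-basis-style surrogate.
        # POD via SVD stands in for the greedy basis construction; solve_full
        # and assemble_system are assumed expensive high-fidelity routines.
        import numpy as np

        def offline_stage(solve_full, training_params, n_basis):
            """Costly preparation: build a small basis from high-fidelity snapshots."""
            snapshots = np.column_stack([solve_full(mu) for mu in training_params])
            U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
            return U[:, :n_basis]           # basis of a low-dimensional subspace

        def online_stage(basis, assemble_system, mu):
            """Cheap on-demand solve: Galerkin projection onto the reduced space."""
            A, b = assemble_system(mu)      # full-order operator and right-hand side
            A_r = basis.T @ A @ basis       # reduced n_basis x n_basis system
            b_r = basis.T @ b
            return basis @ np.linalg.solve(A_r, b_r)   # lift back to full space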

    Rigorous Born Approximation and beyond for the Spin-Boson Model

    Within the lowest-order Born approximation, we present an exact calculation of the time dynamics of the spin-boson model in the ohmic regime. We observe non-Markovian effects at zero temperature that scale with the system-bath coupling strength and cause qualitative changes in the evolution of coherence at intermediate times, of the order of the oscillation period. These changes could significantly affect the performance of these systems as qubits. In the biased case, we find a prompt loss of coherence at these intermediate times, whose decay rate is set by $\sqrt{\alpha}$, where $\alpha$ is the coupling strength to the environment. We also explore the calculation of the next-order Born approximation: we show that, at the expense of very large computational complexity, interesting physical quantities can be rigorously computed at fourth order using computer algebra, presented completely in an accompanying Mathematica file. We compute the $O(\alpha)$ corrections to the long-time behavior of the system density matrix; the result is identical to the reduced density matrix of the equilibrium state to the same order in $\alpha$. All these calculations indicate precision experimental tests that could confirm or refute the validity of the spin-boson model in a variety of systems.
    Comment: Greatly extended version of the short paper cond-mat/0304118. The accompanying Mathematica notebook fop5.nb, available in Source, is an essential part of this work; it gives full details of the fourth-order Born calculation summarized in the text. fop5.nb is prepared in arXiv style (available from Wolfram Research).
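
    For reference, the standard form of the spin-boson Hamiltonian and ohmic spectral density that the abstract presupposes, written in one common convention (factors of 2 and $\pi$ vary between papers, and the paper's own normalizations may differ):

        H = \frac{\epsilon}{2}\,\sigma_z + \frac{\Delta}{2}\,\sigma_x
            + \sum_k \omega_k\, b_k^\dagger b_k
            + \frac{\sigma_z}{2} \sum_k \lambda_k \bigl(b_k + b_k^\dagger\bigr),
        \qquad
        J(\omega) \equiv \pi \sum_k \lambda_k^2\, \delta(\omega - \omega_k)
                  = 2\pi\,\alpha\,\omega\, e^{-\omega/\omega_c} \quad \text{(ohmic)},

    with $\epsilon$ the bias, $\Delta$ the tunneling amplitude, $\omega_c$ a high-frequency cutoff, and $\alpha$ the dimensionless coupling appearing in the decay rates above.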

    FREDE: Linear-Space Anytime Graph Embeddings

    Low-dimensional representations, or embeddings, of a graph's nodes facilitate data mining tasks. Known embedding methods explicitly or implicitly rely on a similarity measure among nodes. As the similarity matrix is quadratic, a tradeoff between space complexity and embedding quality arises; past research initially opted for heuristics and linear-transform factorizations, which allow for linear space but compromise on quality, while recent research has proposed a quadratic-space solution as a viable option too. In this paper we observe that embedding methods effectively aim to preserve the covariance among the rows of a similarity matrix, and raise the question: is there a method that combines (i) linear space complexity, (ii) a nonlinear transform as its basis, and (iii) nontrivial quality guarantees? We answer this question in the affirmative with FREDE (FREquent Directions Embedding), a sketching-based method that iteratively improves on quality while processing rows of the similarity matrix individually; thereby, it provides, at any iteration, column-covariance approximation guarantees that are, in due course, almost indistinguishable from those of the optimal row-covariance approximation by SVD. Our experimental evaluation on networks of varying sizes shows that FREDE performs as well as SVD and competitively against current state-of-the-art methods in diverse data mining tasks, even when it derives an embedding based on only 10% of node similarities.
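
    The sketching primitive behind FREDE is Frequent Directions (Liberty, 2013). Below is a minimal, textbook-style implementation that re-sketches on every row, so it is slow but faithful to the guarantee $\|A^T A - B^T B\|_2 \le \|A\|_F^2 / \ell$; FREDE's choice of similarity rows and its embedding postprocessing are not reproduced here.

        # Minimal Frequent Directions sketch: stream rows of a tall n x d
        # matrix A and maintain an ell x d sketch B (assumes ell <= d).
        import numpy as np

        def frequent_directions(rows, ell):
            """rows: iterable of length-d vectors; returns an ell x d sketch B."""
            rows = iter(rows)
            first = np.asarray(next(rows))
            B = np.zeros((ell, first.shape[0]))
            B[-1] = first
            for row in rows:
                # Shrink: zero out the smallest direction, freeing the last row.
                _, s, Vt = np.linalg.svd(B, full_matrices=False)
                delta = s[-1] ** 2                      # squared smallest singular value
                s = np.sqrt(np.maximum(s**2 - delta, 0.0))
                B = s[:, None] * Vt                     # last row is now all zeros
                B[-1] = row                             # insert the incoming row
            return B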

    Personalized Purchase Prediction of Market Baskets with Wasserstein-Based Sequence Matching

    Personalization in marketing aims at improving the shopping experience of customers by tailoring services to individuals. In order to achieve this, businesses must be able to make personalized predictions regarding the next purchase. That is, one must forecast the exact list of items that will comprise the next purchase, i.e., the so-called market basket. Despite its relevance to firm operations, this problem has received surprisingly little attention in prior research, largely due to its inherent complexity. In fact, state-of-the-art approaches are limited to intuitive decision rules for pattern extraction. However, the simplicity of the pre-coded rules impedes performance, since decision rules operate in an autoregressive fashion: the rules can only make inferences from past purchases of a single customer, without taking into account the knowledge transfer that takes place between customers. In contrast, our research overcomes the limitations of pre-set rules by contributing a novel predictor of market baskets from sequential purchase histories: our predictions are based on similarity matching in order to identify similar purchase habits among the complete shopping histories of all customers. Our contributions are as follows: (1) We propose similarity matching based on subsequential dynamic time warping (SDTW) as a novel predictor of market baskets. Thereby, we can effectively identify cross-customer patterns. (2) We leverage the Wasserstein distance for measuring the similarity among embedded purchase histories. (3) We develop a fast approximation algorithm for computing a lower bound of the Wasserstein distance in our setting. An extensive series of computational experiments demonstrates the effectiveness of our approach: in identifying the exact market basket, it outperforms state-of-the-art decision rules from the literature by a factor of 4.0 in accuracy.
    Comment: Accepted for oral presentation at the 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2019).
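
    A toy sketch of the matching idea, under strong simplifications: baskets are modeled as sets of scalar item embeddings, so the Wasserstein-1 distance has a simple quantile form, and classic DTW stands in for the paper's subsequential variant (SDTW); the lower-bound approximation algorithm is omitted.

        # Toy sketch: DTW over basket sequences with a Wasserstein ground
        # distance. Baskets are lists of scalar item embeddings; the paper's
        # SDTW variant and its lower-bound speedup are not reproduced.
        import numpy as np

        def wasserstein_1d(a, b, n_quantiles=101):
            """W1 between the empirical distributions of two 1-D samples."""
            qs = np.linspace(0.0, 1.0, n_quantiles)
            return np.mean(np.abs(np.quantile(a, qs) - np.quantile(b, qs)))

        def dtw(seq_a, seq_b, dist=wasserstein_1d):
            """Classic O(nm) DTW with a pluggable basket-to-basket distance."""
            n, m = len(seq_a), len(seq_b)
            D = np.full((n + 1, m + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    c = dist(seq_a[i - 1], seq_b[j - 1])
                    D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
            return D[n, m]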

    A Polynomial-Time Algorithm for 1/3-Approximate Nash Equilibria in Bimatrix Games

    Since the celebrated PPAD-completeness result for Nash equilibria in bimatrix games, a long line of research has focused on polynomial-time algorithms that compute ε-approximate Nash equilibria. Finding the best possible approximation guarantee achievable in polynomial time has been a fundamental and non-trivial pursuit in settling the complexity of approximate equilibria. Despite a significant amount of effort, the algorithm of Tsaknakis and Spirakis [Tsaknakis and Spirakis, 2008], with an approximation guarantee of (0.3393+δ), has remained the state of the art over the last 15 years. In this paper, we propose a new refinement of the Tsaknakis-Spirakis algorithm, resulting in a polynomial-time algorithm that computes a (1/3+δ)-Nash equilibrium, for any constant δ > 0. The main idea of our approach is to go beyond the use of convex combinations of primal and dual strategies, as defined in the optimization framework of [Tsaknakis and Spirakis, 2008], and enrich the pool of strategies from which we build the strategy profiles that we output in certain bottleneck cases of the algorithm.
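
    As context for what the guarantee means: a profile (x, y) is an ε-approximate Nash equilibrium of a bimatrix game (R, C) with payoffs normalized to [0, 1] when neither player can gain more than ε by deviating. The snippet below evaluates that ε for a given profile; it is a definition-level check, not the Tsaknakis-Spirakis descent itself.

        # Regret of a mixed profile (x, y) in a bimatrix game (R, C) with
        # payoffs in [0, 1]; (x, y) is an eps-approximate Nash equilibrium
        # iff this value is at most eps.
        import numpy as np

        def approximation_guarantee(R, C, x, y):
            row_regret = np.max(R @ y) - x @ R @ y   # best deviation, row player
            col_regret = np.max(x @ C) - x @ C @ y   # best deviation, column player
            return max(row_regret, col_regret)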