5,036 research outputs found

    Optimization frameworks and sensitivity analysis of Stackelberg mean-field games

    This paper proposes and studies a class of discrete-time, finite-time-horizon Stackelberg mean-field games with one leader and an infinite number of identical, indistinguishable followers. In this game, the leader's objective is to maximize her reward against the worst-case cost over all possible ε-Nash equilibria among the followers. A new analytical paradigm is established by showing the equivalence between this Stackelberg mean-field game and a minimax optimization problem. This optimization framework facilitates studying, both analytically and numerically, the set of Nash equilibria of the game, and leads to sensitivity and robustness analysis of the game value. In particular, under model uncertainty, the leader's game value suffers non-vanishing sub-optimality even as the perturbed model converges to the true model. To obtain a near-optimal solution, the leader needs to be more pessimistic in anticipation of model errors, and adopts a relaxed version of the original Stackelberg game.
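    The leader's worst-case objective can be illustrated with a toy finite-action sketch (the payoff matrix and all names here are hypothetical, not from the paper): the leader picks the action whose worst follower-equilibrium response is best.

```python
import numpy as np

# Hypothetical toy illustration of the leader's minimax problem: maximize
# the leader's reward under the worst-case follower response (grid search).
def leader_value(reward, leader_actions):
    # reward[i, j] = leader's payoff when she plays a_i and followers settle on b_j
    worst_case = reward.min(axis=1)        # worst follower response per leader action
    best = int(np.argmax(worst_case))      # robust (maximin) leader action
    return leader_actions[best], worst_case[best]

leader_actions = np.linspace(0.0, 1.0, 11)
follower_actions = np.linspace(0.0, 1.0, 11)
# toy reward: the leader likes large a, but the followers' response b penalizes her
reward = np.array([[a - 0.5 * a * b for b in follower_actions] for a in leader_actions])
a_star, val = leader_value(reward, leader_actions)
```

    In this toy payoff the worst case for any a is b = 1, giving 0.5a, so the robust leader action is a = 1 with value 0.5; the paper's actual minimax problem is over equilibrium distributions rather than a finite grid.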

    Linear Quadratic Reinforcement Learning: Sublinear Regret in the Episodic Continuous-Time Framework

    In this paper we study a continuous-time linear-quadratic reinforcement learning problem in an episodic setting. We first show that naïve discretization and piecewise approximation with discrete-time RL algorithms yields linear regret with respect to the number of learning episodes N. We then propose an algorithm with continuous-time controls based on a regularized least-squares estimation, and establish a sublinear regret bound of order Õ(√N). The analysis consists of two parts: a parameter estimation error, which relies on properties of sub-exponential random variables and double stochastic integrals; and a perturbation analysis, which establishes the robustness of the associated continuous-time Riccati equation by exploiting its regularity property.
    Comment: 25 pages
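    The regularized least-squares step can be sketched in its discrete, finite-sample form (a minimal sketch with made-up dynamics; the paper's continuous-time estimator replaces the sums below with stochastic integrals):

```python
import numpy as np

# Hypothetical ridge-regression sketch: estimate unknown linear-dynamics
# parameters theta from sampled regressors Z and noisy responses y.
def ridge_estimate(Z, y, lam=0.1):
    """Solve (Z^T Z + lam * I) theta = Z^T y."""
    d = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(d), Z.T @ y)

rng = np.random.default_rng(0)
theta_true = np.array([0.8, -0.3])                 # unknown drift parameters
Z = rng.normal(size=(500, 2))                      # (state, control) samples
y = Z @ theta_true + 0.01 * rng.normal(size=500)   # noisy responses
theta_hat = ridge_estimate(Z, y)
```

    The regularization term lam * I keeps the normal equations well-conditioned early in learning, when few episodes have been observed.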

    0-π qubit in one Josephson junction

    Quantum states are usually fragile, which makes quantum computation less stable than classical computation. Quantum error-correction codes can protect quantum states but need a large number of physical qubits to encode a single logical qubit. Alternatively, protection at the hardware level has recently been developed to maintain the coherence of quantum information by exploiting symmetries; however, it generally comes at the cost of increased device complexity. In this work, we show that hardware-level protection can be achieved without increasing the complexity of the devices. The interplay between spin-orbit coupling and Zeeman splitting in the semiconductor allows us to tune the Josephson coupling through the spin degree of freedom of Cooper pairs, the hallmark of superconducting spintronics. This leads to the implementation of a parity-protected 0-π superconducting qubit with only one highly transparent superconductor-semiconductor Josephson junction, which makes our proposal immune to various fabrication imperfections.
    Comment: 5 pages, 4 figures

    A General Framework for Learning Mean-Field Games

    This paper presents a general mean-field game (GMFG) framework for simultaneous learning and decision-making in stochastic games with a large population. It first establishes the existence of a unique Nash equilibrium for this GMFG, and demonstrates that naively combining reinforcement learning with the fixed-point approach of classical MFGs yields unstable algorithms. It then proposes value-based and policy-based reinforcement learning algorithms (GMF-V and GMF-P, respectively) with smoothed policies, together with analysis of their convergence properties and computational complexities. Experiments on an equilibrium product-pricing problem demonstrate that GMF-V-Q and GMF-P-TRPO, two specific instantiations of GMF-V and GMF-P with Q-learning and TRPO, respectively, are both efficient and robust in the GMFG setting. Moreover, their performance is superior in convergence speed, accuracy, and stability when compared with existing algorithms for multi-agent reinforcement learning in the N-player setting.
    Comment: 43 pages, 7 figures. arXiv admin note: substantial text overlap with arXiv:1901.0958
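    The fixed-point approach with smoothed policies can be sketched on a toy congestion game (everything below is a hypothetical illustration; the paper's GMF-V/GMF-P algorithms replace the exact action values with learned ones via Q-learning or TRPO):

```python
import numpy as np

# Hypothetical fixed-point loop for a mean-field game: alternate
# (i) a smoothed (softmax) best response to the current population
# distribution and (ii) a damped update of that distribution.
def softmax(q, temp=1.0):
    z = np.exp((q - q.max()) / temp)   # shift by max for numerical stability
    return z / z.sum()

def mfg_fixed_point(q_of_mu, mu0, iters=200):
    mu = mu0.copy()
    for _ in range(iters):
        q = q_of_mu(mu)                # action values given the population
        policy = softmax(q)            # smoothed best response
        mu = 0.5 * mu + 0.5 * policy   # damped population update
    return mu

# toy congestion game: an action is less attractive the more crowded it is
q_of_mu = lambda mu: -mu
mu_star = mfg_fixed_point(q_of_mu, np.array([0.5, 0.3, 0.2]))
```

    In this toy game the unique equilibrium is the uniform distribution; the softmax smoothing and damping are what keep the iteration from the instability the paper attributes to the naive fixed-point combination.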

    Violations of the weak cosmic censorship conjecture in the higher dimensional f(R) black holes with pressure

    We adopt the energy-momentum relation of charged particles to study the laws of thermodynamics and the weak cosmic censorship conjecture for D-dimensional f(R) AdS black holes in different phase spaces under charged-particle absorption. In the normal phase space, both the laws of thermodynamics and the weak cosmic censorship conjecture turn out to be valid. In the extended phase space, the first law of thermodynamics holds, but the second law is violated. More interestingly, the weak cosmic censorship conjecture is shown to be violated only for higher-dimensional near-extremal f(R) AdS black holes. In addition, the magnitudes of the violations of both the second law and the weak cosmic censorship conjecture depend on the charge Q, the constant scalar curvature f'(R_0), the AdS radius l, the dimension parameter p, and their variations.
    Comment: Accepted by Eur. Phys. J.