
    A Regularized Opponent Model with Maximum Entropy Objective

    In a single-agent setting, reinforcement learning (RL) tasks can be cast as an inference problem by introducing a binary random variable o, which stands for "optimality". In this paper, we redefine the binary random variable o in the multi-agent setting and formalize multi-agent reinforcement learning (MARL) as probabilistic inference. We derive a variational lower bound on the likelihood of achieving optimality and name it the Regularized Opponent Model with Maximum Entropy Objective (ROMMEO). From ROMMEO, we present a novel perspective on opponent modeling and show, theoretically and empirically, how it can improve the performance of training agents in cooperative games. To optimize ROMMEO, we first introduce a tabular Q-iteration method, ROMMEO-Q, with a proof of convergence. We extend the exact algorithm to complex environments by proposing an approximate version, ROMMEO-AC. We evaluate these two algorithms on a challenging iterated matrix game and a differential game, respectively, and show that they can outperform strong MARL baselines. Comment: Accepted to International Joint Conference on Artificial Intelligence (IJCAI 2019)

    $B \to X_s \gamma$, $X_s l^+ l^-$ decays and constraints on the mass insertion parameters in the MSSM

    In this paper, we study the upper bounds on the mass insertion parameters $(\delta^{q}_{AB})_{ij}$ in the minimal supersymmetric standard model (MSSM). We find that the measured branching ratio of the $B \to X_s l^+ l^-$ decay can help to improve the upper bounds on the mass insertion parameters $(\delta^{u,d}_{AB})_{3j,i3}$. Some regions allowed by the data on $Br(B \to X_s \gamma)$ are excluded by the requirement of a SM-like $C_{7\gamma}(m_b)$ imposed by the data on $Br(B \to X_s l^+ l^-)$. Comment: 16 pages, 5 eps figure files, typos removed

    5-Fluoro-1H-indole-3-carboxylic acid

    In the title compound, C9H6FNO2, the carboxyl group is twisted slightly away from the indole-ring plane [dihedral angle = 7.39 (10)°]. In the crystal, carboxyl inversion dimers linked by pairs of O—H⋯O hydrogen bonds generate $R_2^2(8)$ loops and N—H⋯O hydrogen bonds connect the dimers into (10) sheets.