9,573 research outputs found

    f-Divergence constrained policy improvement

    To ensure stability of learning, state-of-the-art generalized policy iteration algorithms augment the policy improvement step with a trust region constraint bounding the information loss. The size of the trust region is commonly determined by the Kullback-Leibler (KL) divergence, which not only captures the notion of distance well but also yields closed-form solutions. In this paper, we consider a more general class of f-divergences and derive the corresponding policy update rules. The generic solution is expressed through the derivative of the convex conjugate function to f and includes the KL solution as a special case. Within the class of f-divergences, we further focus on a one-parameter family of α-divergences to study the effects of the choice of divergence on policy improvement. Previously known as well as new policy updates emerge for different values of α. We show that every type of policy update comes with a compatible policy evaluation resulting from the chosen f-divergence. Interestingly, mean-squared Bellman error minimization is closely related to policy evaluation with the Pearson χ²-divergence penalty, while the KL divergence results in the soft-max policy update and a log-sum-exp critic. We carry out an asymptotic analysis of the solutions for different values of α and demonstrate the effects of using different divergence functions on a multi-armed bandit problem and on standard reinforcement learning problems.
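As a concrete illustration of the KL special case mentioned above (the soft-max policy update), here is a minimal sketch; the trust-region temperature `eta` and the three-armed bandit setup are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Minimal sketch of the KL special case of an f-divergence-constrained
# policy update: the improved policy is the old policy reweighted by
# exp(Q / eta), i.e. a soft-max over action values. The temperature
# `eta` (trust-region size) and the 3-armed bandit are illustrative.

def kl_policy_update(pi_old, q_values, eta):
    """Soft-max policy improvement under a KL trust-region constraint."""
    logits = np.log(pi_old) + q_values / eta
    logits -= logits.max()              # subtract max for numerical stability
    pi_new = np.exp(logits)
    return pi_new / pi_new.sum()

pi = np.full(3, 1.0 / 3.0)              # start from a uniform policy
q = np.array([1.0, 0.5, 0.0])           # estimated action values
pi_new = kl_policy_update(pi, q, eta=1.0)
# the best arm gains probability mass; a smaller eta gives a greedier update
```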

    Chronological Inversion Method for the Dirac Matrix in Hybrid Monte Carlo

    In Hybrid Monte Carlo simulations for full QCD, the gauge fields evolve smoothly as a function of Molecular Dynamics time. Here we investigate improved methods of estimating the trial or starting solutions for the Dirac matrix inversion as superpositions of a chronological sequence of solutions in the recent past. By taking as the trial solution the vector which minimizes the residual in the linear space spanned by the past solutions, the number of conjugate gradient iterations per unit MD time is decreased by at least a factor of 2. Extensions of this basic approach to precondition the conjugate gradient iterations are also discussed.
    Comment: 35 pages, 18 EPS figures. A new "preconditioning" method, derived from the Chronological Inversion, is described. Some new figures are appended. Some reorganization of the material has taken place.
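The minimal-residual starting guess described above can be sketched as follows; the small SPD test matrix is an illustrative stand-in for the (much larger) Dirac matrix, and all names are assumptions for this sketch.

```python
import numpy as np

# Sketch of the chronological starting-guess idea: given past solutions
# x_1..x_k of A x = b along the trajectory, take as trial vector the
# combination V c (columns of V are the past solutions) minimising
# ||b - A V c||, found by a small least-squares solve.

def chronological_guess(A, b, past_solutions):
    V = np.column_stack(past_solutions)       # basis of past solutions
    c, *_ = np.linalg.lstsq(A @ V, b, rcond=None)
    return V @ c                              # minimal-residual trial vector

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)                   # SPD stand-in for the Dirac operator
xs = [rng.standard_normal(6) for _ in range(3)]
b = A @ (0.5 * xs[0] + 0.3 * xs[1])           # rhs close to the span of past solutions
x0 = chronological_guess(A, b, xs)
# the subsequent CG solve then starts from a much smaller residual
```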

    Learning Output Kernels for Multi-Task Problems

    Simultaneously solving multiple related learning tasks is beneficial under a variety of circumstances, but the prior knowledge necessary to correctly model task relationships is rarely available in practice. In this paper, we develop a novel kernel-based multi-task learning technique that automatically reveals structural inter-task relationships. Building on the framework of output kernel learning (OKL), we introduce a method that jointly learns multiple functions and a low-rank multi-task kernel by solving a non-convex regularization problem. Optimization is carried out via a block coordinate descent strategy, where each subproblem is solved using suitable conjugate gradient (CG) type iterative methods for linear operator equations. The effectiveness of the proposed approach is demonstrated on pharmacological and collaborative filtering data.
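The alternating structure of block coordinate descent with CG inner solves can be sketched on a toy quadratic; this is not the OKL functional, and all dimensions and names below are illustrative assumptions.

```python
import numpy as np

# Toy sketch of block coordinate descent where each block subproblem is
# solved with conjugate gradient (CG). The objective is a plain
# least-squares problem split into two blocks, not the OKL functional;
# it only illustrates the alternating optimisation structure.

def cg(A, b, tol=1e-12, max_iter=200):
    """Conjugate gradient for a symmetric positive definite A."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# minimise ||A1 u + A2 v - y||^2 by alternating CG solves of the
# normal equations for one block while the other block is held fixed
rng = np.random.default_rng(1)
A1 = rng.standard_normal((12, 4))
A2 = rng.standard_normal((12, 4))
y = rng.standard_normal(12)
u = np.zeros(4)
v = np.zeros(4)
for _ in range(200):
    u = cg(A1.T @ A1, A1.T @ (y - A2 @ v))
    v = cg(A2.T @ A2, A2.T @ (y - A1 @ u))
```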

    Preconditioned Locally Harmonic Residual Method for Computing Interior Eigenpairs of Certain Classes of Hermitian Matrices

    We propose a Preconditioned Locally Harmonic Residual (PLHR) method for computing several interior eigenpairs of a generalized Hermitian eigenvalue problem, without traditional spectral transformations, matrix factorizations, or inversions. PLHR is based on a short-term recurrence that is easily extended to a block form for computing eigenpairs simultaneously. PLHR can take advantage of Hermitian positive definite preconditioning, e.g., based on an approximate inverse of the absolute value of a shifted matrix, introduced in [SISC, 35 (2013), pp. A696-A718]. Our numerical experiments demonstrate that PLHR is efficient and robust for certain classes of large-scale interior eigenvalue problems involving Laplacian and Hamiltonian operators, especially if memory requirements are tight.
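As background, the classical harmonic Rayleigh-Ritz extraction (a standard ingredient behind locally harmonic methods for interior eigenpairs; this is not the full PLHR iteration) can be sketched as follows; the matrix, shift, and subspace are illustrative assumptions.

```python
import numpy as np

# Sketch of harmonic Rayleigh-Ritz extraction: eigenvalues of a
# symmetric A near a shift sigma are approximated from a trial
# subspace V without factorising (A - sigma*I). This is NOT the full
# PLHR iteration; the diagonal matrix and shift below are illustrative.

def harmonic_ritz(A, V, sigma):
    """Harmonic Ritz values of symmetric A near sigma, from subspace V."""
    W = A @ V - sigma * V
    # projected problem: (W^T V)^{-1} (W^T W) y = theta y,
    # and the harmonic Ritz values are sigma + theta
    theta, Y = np.linalg.eig(np.linalg.solve(W.T @ V, W.T @ W))
    return sigma + np.real(theta), V @ np.real(Y)

A = np.diag([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
V = np.eye(6)[:, [1, 3]]          # exact eigenvectors for eigenvalues 2 and 4
vals, vecs = harmonic_ritz(A, V, sigma=3.5)
# here V spans exact eigenvectors, so sorted(vals) recovers 2 and 4 exactly
```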

    A Novel Antenna Selection Scheme for Spatially Correlated Massive MIMO Uplinks with Imperfect Channel Estimation

    We propose a new antenna selection scheme for a massive MIMO system with a single user terminal and a base station with a large number of antennas. We consider a practical scenario with realistic correlation among the antennas and imperfect channel estimation at the receiver side. The proposed scheme exploits the sparsity of the channel matrix to select a limited number of antennas effectively. To this end, we compute a sparse channel matrix by minimising the mean squared error. This optimisation problem is then solved by the well-known orthogonal matching pursuit algorithm. Widely used models for spatial correlation among the antennas and channel estimation errors are considered in this work. Simulation results demonstrate that, when the impacts of spatial correlation and imperfect channel estimation are taken into account, the proposed scheme can significantly reduce the complexity of the receiver without degrading the system performance compared to maximum ratio combining.
    Comment: in Proc. IEEE 81st Vehicular Technology Conference (VTC), May 2015, 6 pages, 5 figures
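The greedy selection step can be sketched with a minimal orthogonal matching pursuit; the orthonormal toy channel matrix and dimensions below are illustrative assumptions, not the correlated MIMO channel model of the paper.

```python
import numpy as np

# Minimal orthogonal matching pursuit (OMP) sketch of the greedy sparse
# recovery used to pick a few effective antennas: repeatedly select the
# column of H most correlated with the residual, then refit the chosen
# columns by least squares. The orthonormal toy "channel" is illustrative.

def omp(H, y, n_select):
    residual = y.copy()
    chosen = []
    coeffs = np.zeros(0)
    for _ in range(n_select):
        corr = np.abs(H.T @ residual)
        corr[chosen] = -np.inf               # never pick a column twice
        chosen.append(int(np.argmax(corr)))
        Hs = H[:, chosen]
        coeffs, *_ = np.linalg.lstsq(Hs, y, rcond=None)
        residual = y - Hs @ coeffs
    return chosen, coeffs

rng = np.random.default_rng(2)
H, _ = np.linalg.qr(rng.standard_normal((10, 8)))   # toy channel, orthonormal columns
y = 2.0 * H[:, 2] - 1.0 * H[:, 5]                   # signal from "antennas" 2 and 5
chosen, coeffs = omp(H, y, n_select=2)
# chosen == [2, 5] here: the two active columns are identified exactly
```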

    The principle of indirect elimination

    The principle of indirect elimination states that an algorithm for solving discretized differential equations can be used to identify its own bad-converging modes. When the number of bad-converging modes of the algorithm is not too large, the modes thus identified can be used to strongly improve the convergence. The method presented here is applicable to any standard algorithm such as Conjugate Gradient, relaxation or multigrid. An example from theoretical physics, the Dirac equation in the presence of almost-zero modes arising from instantons, is studied. Using the principle, bad-converging modes are removed efficiently. Applied locally, the principle is one of the main ingredients of the Iteratively Smoothing Unigrid algorithm.
    Comment: 16 pages, LaTeX-style espart (elsevier preprint style). Three .eps-figures are now added with the figure command
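The removal step can be sketched as a deflation, assuming the bad-converging modes have already been identified as orthonormal eigenvectors; the tiny-eigenvalue test matrix below is an illustrative stand-in for the almost-zero instanton modes, not the method of the paper in full.

```python
import numpy as np

# Sketch of eliminating identified bad-converging modes: solve for them
# directly in a small subspace and run the standard solver only on the
# deflated remainder. The 1e-8 eigenvalue stands in for an almost-zero
# instanton mode that would otherwise stall plain CG.

def cg(A, b, tol=1e-12, max_iter=100):
    """Conjugate gradient for a symmetric positive definite A."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def deflated_solve(A, b, bad_modes):
    V = np.column_stack(bad_modes)                 # identified slow modes (orthonormal)
    E = V.T @ A @ V
    x_slow = V @ np.linalg.solve(E, V.T @ b)       # exact solve on the slow modes
    b_fast = b - V @ (V.T @ b)                     # deflated right-hand side
    return x_slow + cg(A, b_fast)                  # CG now sees a benign spectrum

A = np.diag([1e-8, 1.0, 2.0, 3.0, 4.0])           # one almost-zero mode
b = np.ones(5)
x = deflated_solve(A, b, bad_modes=[np.eye(5)[:, 0]])
```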