f-Divergence constrained policy improvement
To ensure stability of learning, state-of-the-art generalized policy
iteration algorithms augment the policy improvement step with a trust region
constraint bounding the information loss. The size of the trust region is
commonly determined by the Kullback-Leibler (KL) divergence, which not only
captures the notion of distance well but also yields closed-form solutions. In
this paper, we consider a more general class of f-divergences and derive the
corresponding policy update rules. The generic solution is expressed through
the derivative of the convex conjugate function to f and includes the KL
solution as a special case. Within the class of f-divergences, we further focus
on a one-parameter family of α-divergences to study the effects of the
choice of divergence on policy improvement. Previously known as well as new
policy updates emerge for different values of α. We show that every type
of policy update comes with a compatible policy evaluation resulting from the
chosen f-divergence. Interestingly, the mean-squared Bellman error minimization
is closely related to policy evaluation with the Pearson χ²-divergence
penalty, while the KL divergence results in the soft-max policy update and a
log-sum-exp critic. We carry out asymptotic analysis of the solutions for
different values of α and demonstrate the effects of using different
divergence functions on a multi-armed bandit problem and on standard
reinforcement learning problems.
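To make the contrast concrete, the following minimal sketch (not the authors' code: the temperature eta, the bisection loop, and the toy bandit values are illustrative assumptions) contrasts the two updates the abstract singles out, the soft-max reweighting induced by the KL divergence and the affine, clipped reweighting induced by the Pearson χ² divergence:

```python
import numpy as np

def kl_update(pi, q, eta):
    """KL-regularized improvement: the familiar soft-max reweighting."""
    w = pi * np.exp((q - q.max()) / eta)        # shift q for numerical stability
    return w / w.sum()

def pearson_chi2_update(pi, q, eta):
    """Pearson chi^2-regularized improvement: an affine reweighting,
    clipped at zero; the multiplier lam is found by bisection so that
    the clipped weights sum to one."""
    lo, hi = q.min() - eta, q.max() + eta       # brackets: sum >= 2 and sum = 0
    for _ in range(100):
        lam = 0.5 * (lo + hi)
        w = np.maximum(pi * (1.0 + (q - lam) / eta), 0.0)
        lo, hi = (lam, hi) if w.sum() > 1.0 else (lo, lam)
    return w / w.sum()

# Toy 4-armed bandit: the soft-max merely downweights the worst arm,
# while the chi^2 update drives it exactly to zero.
pi0 = np.full(4, 0.25)
q = np.array([1.0, 0.5, 0.0, -0.5])
print(kl_update(pi0, q, eta=0.5))
print(pearson_chi2_update(pi0, q, eta=0.5))
```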
Chronological Inversion Method for the Dirac Matrix in Hybrid Monte Carlo
In Hybrid Monte Carlo simulations for full QCD, the gauge fields evolve
smoothly as a function of Molecular Dynamics time. Here we investigate improved
methods of estimating the trial or starting solutions for the Dirac matrix
inversion as superpositions of a chronological sequence of solutions in the
recent past. By taking as the trial solution the vector which minimizes the
residual in the linear space spanned by the past solutions, the number of
conjugate gradient iterations per unit MD time is decreased by at least a
factor of 2. Extensions of this basic approach to precondition the conjugate
gradient iterations are also discussed.
Comment: 35 pages, 18 EPS figures. A new "preconditioning" method, derived from
the Chronological Inversion, is described. Some new figures are appended.
Some reorganization of the material has taken place.
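The core trick is a small least-squares problem. A sketch, assuming a generic linear system in place of the actual lattice Dirac normal equations (the function names and the SciPy call are illustrative, not the simulation code):

```python
import numpy as np
from scipy.sparse.linalg import cg

def chronological_guess(A, b, past):
    """Trial solution minimizing the residual ||b - A x|| over the linear
    space spanned by a chronological sequence of past solutions."""
    X = np.column_stack(past)                   # n x m, m = history length
    AX = A @ X
    c, *_ = np.linalg.lstsq(AX, b, rcond=None)  # small m-dimensional fit
    return X @ c                                # starting vector for CG

# Hypothetical usage at the next Molecular Dynamics step:
# x0 = chronological_guess(A, b, history)
# x, info = cg(A, b, x0=x0)                    # fewer iterations than x0 = 0
# history = (history + [x])[-m_max:]           # keep a bounded history window
```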
Learning Output Kernels for Multi-Task Problems
Simultaneously solving multiple related learning tasks is beneficial under a
variety of circumstances, but the prior knowledge necessary to correctly model
task relationships is rarely available in practice. In this paper, we develop a
novel kernel-based multi-task learning technique that automatically reveals
structural inter-task relationships. Building on the framework of output
kernel learning (OKL), we introduce a method that jointly learns multiple
functions and a low-rank multi-task kernel by solving a non-convex
regularization problem. Optimization is carried out via a block coordinate
descent strategy, where each subproblem is solved using suitable conjugate
gradient (CG) type iterative methods for linear operator equations. The
effectiveness of the proposed approach is demonstrated on pharmacological and
collaborative filtering data.
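Schematically, such a block coordinate descent might look as follows (this is not the authors' algorithm: the L-step below is an illustrative stand-in, and lam, rank, and the eigenvalue truncation are assumptions; the C-step illustrates the kind of Sylvester-type linear operator equation, here K C L + lam C = Y, that CG-type methods handle):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def okl_bcd(K, Y, lam, rank, iters=10):
    """Schematic block coordinate descent for output kernel learning.

    C-step: solve K C L + lam C = Y by CG; the vectorized operator is
    L (kron) K + lam I, symmetric positive definite whenever K and L are
    positive semidefinite kernels. L-step (illustrative): re-estimate a
    rank-`rank` output kernel from the current coefficients.
    """
    n, T = Y.shape
    L, C = np.eye(T), np.zeros((n, T))
    for _ in range(iters):
        op = LinearOperator(
            (n * T, n * T),
            matvec=lambda v, L=L: (K @ v.reshape(n, T) @ L
                                   + lam * v.reshape(n, T)).ravel())
        sol, _ = cg(op, Y.ravel(), x0=C.ravel())
        C = sol.reshape(n, T)
        M = C.T @ K @ C                          # PSD surrogate for task similarity
        w, V = np.linalg.eigh(M)
        w[:-rank] = 0.0                          # keep only the top `rank` modes
        L = (V * w) @ V.T                        # low-rank output kernel
    return C, L
```

Note that the matrix L (kron) K is never formed; CG only needs the matrix-vector product, which is exactly why CG-type iterative methods for linear operator equations fit these subproblems.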
Preconditioned Locally Harmonic Residual Method for Computing Interior Eigenpairs of Certain Classes of Hermitian Matrices
We propose a Preconditioned Locally Harmonic Residual (PLHR) method for
computing several interior eigenpairs of a generalized Hermitian eigenvalue
problem, without traditional spectral transformations, matrix factorizations,
or inversions. PLHR is based on a short-term recurrence, easily extended to a
block form, computing eigenpairs simultaneously. PLHR can take advantage of
Hermitian positive definite preconditioning, e.g., based on an approximate
inverse of an absolute value of a shifted matrix, introduced in [SISC, 35
(2013), pp. A696-A718]. Our numerical experiments demonstrate that PLHR is
efficient and robust for certain classes of large-scale interior eigenvalue
problems, involving Laplacian and Hamiltonian operators, especially if memory
requirements are tight.
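A heavily simplified, single-vector analogue conveys the flavour (this is not the PLHR recurrence itself: the contents of the trial subspace, the direction update, and the stopping rule are assumptions made for illustration):

```python
import numpy as np
from scipy.linalg import eig, orth

def plhr_like(A, B, T, sigma, x0, iters=200, tol=1e-8):
    """Sketch: an eigenpair of A x = theta B x nearest the shift sigma.

    T is an SPD preconditioner callable (e.g., an approximate inverse of
    |A - sigma B|). Each step spans a small trial subspace from the
    iterate, the preconditioned residual, and the previous direction,
    then extracts a new iterate by harmonic Rayleigh-Ritz at sigma;
    no factorization or inversion of A - sigma B is required.
    """
    x = x0 / np.linalg.norm(x0)
    p = None
    for _ in range(iters):
        theta = (x.conj() @ (A @ x)) / (x.conj() @ (B @ x))
        r = A @ x - theta * (B @ x)
        if np.linalg.norm(r) < tol:
            break
        cols = [x, T(r)] + ([p] if p is not None else [])
        V = orth(np.column_stack(cols))          # orthonormal trial basis
        W = A @ V - sigma * (B @ V)
        # Harmonic Rayleigh-Ritz: W^H W y = mu W^H B V y, theta = sigma + mu
        mu, Y = eig(W.conj().T @ W, W.conj().T @ (B @ V))
        y = Y[:, int(np.argmin(np.abs(mu)))]     # pair nearest the shift
        x_new = V @ y
        x_new /= np.linalg.norm(x_new)
        p, x = x_new - x * (x.conj() @ x_new), x_new
    return theta, x
```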
A Novel Antenna Selection Scheme for Spatially Correlated Massive MIMO Uplinks with Imperfect Channel Estimation
We propose a new antenna selection scheme for a massive MIMO system with a
single user terminal and a base station with a large number of antennas. We
consider a practical scenario with realistic correlation among the
antennas and imperfect channel estimation at the receiver. The proposed
scheme exploits the sparsity of the channel matrix for the effective selection
of a limited number of antennas. To this end, we compute a sparse channel
matrix by minimising the mean squared error. This optimisation problem is then
solved by the well-known orthogonal matching pursuit algorithm. Widely used
models for spatial correlation among the antennas and channel estimation errors
are considered in this work. Simulation results demonstrate that when the
impacts of spatial correlation and imperfect channel estimation are taken into
account, the proposed scheme can significantly reduce the complexity of the
receiver without degrading the system performance compared to maximum
ratio combining.
Comment: in Proc. IEEE 81st Vehicular Technology Conference (VTC), May 2015, 6
pages, 5 figures.
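The greedy solver at the heart of the scheme is standard, so a generic sketch is easy to give (the toy dimensions and the identification of dictionary columns with base station antennas are assumptions for illustration, not the paper's exact system model):

```python
import numpy as np

def omp(Phi, y, k):
    """Orthogonal matching pursuit: greedily add the column of Phi most
    correlated with the residual, re-fitting all selected coefficients
    by least squares after every pick."""
    residual, support, coef = y.copy(), [], None
    for _ in range(k):
        corr = np.abs(Phi.T @ residual)
        corr[support] = 0.0                      # never pick a column twice
        support.append(int(np.argmax(corr)))
        sub = Phi[:, support]
        coef, *_ = np.linalg.lstsq(sub, y, rcond=None)
        residual = y - sub @ coef
    return support, coef

# Toy use as an antenna selector: 64 columns stand for 64 base station
# antennas; the support of the k-sparse fit marks the antennas to keep.
rng = np.random.default_rng(1)
Phi = rng.standard_normal((8, 64))               # 8 observations x 64 antennas
y = rng.standard_normal(8)
selected, _ = omp(Phi, y, k=6)
print(sorted(selected))                          # indices of retained antennas
```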
The principle of indirect elimination
The principle of indirect elimination states that an algorithm for solving
discretized differential equations can be used to identify its own
bad-converging modes. When the number of bad-converging modes of the algorithm
is not too large, the modes thus identified can be used to strongly improve the
convergence. The method presented here is applicable to any standard algorithm
like Conjugate Gradient, relaxation or multigrid. An example from theoretical
physics, the Dirac equation in the presence of almost-zero modes arising from
instantons, is studied. Using the principle, bad-converging modes are removed
efficiently. Applied locally, the principle is one of the main ingredients of
the Iteratively Smoothing Unigrid algorithm.
Comment: 16 pages, LaTeX-style elsart (Elsevier preprint style). Three
.eps-figures are now added with the figure command.
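Read concretely, the principle can be sketched as follows (every parameter here is illustrative, and damped Richardson relaxation stands in for the base solver; the paper's actual setting is the Dirac equation, not a generic symmetric positive definite system):

```python
import numpy as np
from scipy.sparse.linalg import cg

def relax(A, b, x, n, omega):
    """n damped Richardson sweeps: the error component along an
    eigenvector with eigenvalue lam decays like (1 - omega*lam)**n."""
    for _ in range(n):
        x = x + omega * (b - A @ x)
    return x

def find_bad_modes(A, n_modes, n_sweeps=200, seed=0):
    """The principle in action: run the solver on A x = 0 from random
    starts; well-converging components die out quickly, so what survives
    approximates the solver's own bad-converging (low) modes."""
    omega = 1.0 / np.linalg.norm(A, 2)           # step size: safe for SPD A
    rng = np.random.default_rng(seed)
    dim = A.shape[0]
    raw = [relax(A, np.zeros(dim), rng.standard_normal(dim), n_sweeps, omega)
           for _ in range(n_modes)]
    return np.linalg.qr(np.column_stack(raw))[0]  # orthonormalized modes

def solve_with_elimination(A, b, V):
    """Eliminate the identified modes exactly via a small Galerkin solve,
    then let the iterative solver handle the well-conditioned remainder."""
    x0 = V @ np.linalg.solve(V.T @ (A @ V), V.T @ b)
    x, _ = cg(A, b, x0=x0)                       # residual starts orthogonal to V
    return x
```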