f-Divergence constrained policy improvement
To ensure stability of learning, state-of-the-art generalized policy
iteration algorithms augment the policy improvement step with a trust region
constraint bounding the information loss. The size of the trust region is
commonly determined by the Kullback-Leibler (KL) divergence, which not only
captures the notion of distance well but also yields closed-form solutions. In
this paper, we consider a more general class of f-divergences and derive the
corresponding policy update rules. The generic solution is expressed through
the derivative of the convex conjugate function to f and includes the KL
solution as a special case. Within the class of f-divergences, we further focus
on a one-parameter family of α-divergences to study the effects of the
choice of divergence on policy improvement. Previously known as well as new
policy updates emerge for different values of α. We show that every type
of policy update comes with a compatible policy evaluation resulting from the
chosen f-divergence. Interestingly, the mean-squared Bellman error minimization
is closely related to policy evaluation with the Pearson χ²-divergence
penalty, while the KL divergence results in the soft-max policy update and a
log-sum-exp critic. We carry out asymptotic analysis of the solutions for
different values of α and demonstrate the effects of using different
divergence functions on a multi-armed bandit problem and on standard
reinforcement learning problems.
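To make the contrast concrete, the following minimal sketch (not the authors' code: the temperature eta, the bisection loop, and the toy bandit values are illustrative assumptions) contrasts the two updates the abstract singles out, the soft-max reweighting induced by the KL divergence and the affine, clipped reweighting induced by the Pearson χ² divergence:

```python
import numpy as np

def kl_update(pi, q, eta):
    """KL-regularized improvement: the familiar soft-max reweighting."""
    w = pi * np.exp((q - q.max()) / eta)        # shift q for numerical stability
    return w / w.sum()

def pearson_chi2_update(pi, q, eta):
    """Pearson chi^2-regularized improvement: an affine reweighting,
    clipped at zero; the multiplier lam is found by bisection so that
    the clipped weights sum to one."""
    lo, hi = q.min() - eta, q.max() + eta       # brackets: sum >= 2 and sum = 0
    for _ in range(100):
        lam = 0.5 * (lo + hi)
        w = np.maximum(pi * (1.0 + (q - lam) / eta), 0.0)
        lo, hi = (lam, hi) if w.sum() > 1.0 else (lo, lam)
    return w / w.sum()

# Toy 4-armed bandit: the soft-max merely downweights the worst arm,
# while the chi^2 update drives it exactly to zero.
pi0 = np.full(4, 0.25)
q = np.array([1.0, 0.5, 0.0, -0.5])
print(kl_update(pi0, q, eta=0.5))
print(pearson_chi2_update(pi0, q, eta=0.5))
```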
Chronological Inversion Method for the Dirac Matrix in Hybrid Monte Carlo
In Hybrid Monte Carlo simulations for full QCD, the gauge fields evolve
smoothly as a function of Molecular Dynamics time. Here we investigate improved
methods of estimating the trial or starting solutions for the Dirac matrix
inversion as superpositions of a chronological sequence of solutions in the
recent past. By taking as the trial solution the vector which minimizes the
residual in the linear space spanned by the past solutions, the number of
conjugate gradient iterations per unit MD time is decreased by at least a
factor of 2. Extensions of this basic approach to precondition the conjugate
gradient iterations are also discussed.
Comment: 35 pages, 18 EPS figures. A new "preconditioning" method, derived from
the Chronological Inversion, is described. Some new figures are appended.
Some reorganization of the material has taken place.
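The core trick is a small least-squares problem. A sketch, assuming a generic linear system in place of the actual lattice Dirac normal equations (the function names and the SciPy call are illustrative, not the simulation code):

```python
import numpy as np
from scipy.sparse.linalg import cg

def chronological_guess(A, b, past):
    """Trial solution minimizing the residual ||b - A x|| over the linear
    space spanned by a chronological sequence of past solutions."""
    X = np.column_stack(past)                   # n x m, m = history length
    AX = A @ X
    c, *_ = np.linalg.lstsq(AX, b, rcond=None)  # small m-dimensional fit
    return X @ c                                # starting vector for CG

# Hypothetical usage at the next Molecular Dynamics step:
# x0 = chronological_guess(A, b, history)
# x, info = cg(A, b, x0=x0)                    # fewer iterations than x0 = 0
# history = (history + [x])[-m_max:]           # keep a bounded history window
```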
Learning Output Kernels for Multi-Task Problems
Simultaneously solving multiple related learning tasks is beneficial under a
variety of circumstances, but the prior knowledge necessary to correctly model
task relationships is rarely available in practice. In this paper, we develop a
novel kernel-based multi-task learning technique that automatically reveals
structural inter-task relationships. Building on the framework of output
kernel learning (OKL), we introduce a method that jointly learns multiple
functions and a low-rank multi-task kernel by solving a non-convex
regularization problem. Optimization is carried out via a block coordinate
descent strategy, where each subproblem is solved using suitable conjugate
gradient (CG) type iterative methods for linear operator equations. The
effectiveness of the proposed approach is demonstrated on pharmacological and
collaborative filtering data.
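Schematically, such a block coordinate descent might look as follows (this is not the authors' algorithm: the L-step below is an illustrative stand-in, and lam, rank, and the eigenvalue truncation are assumptions; the C-step illustrates the kind of Sylvester-type linear operator equation, here K C L + lam C = Y, that CG-type methods handle):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def okl_bcd(K, Y, lam, rank, iters=10):
    """Schematic block coordinate descent for output kernel learning.

    C-step: solve K C L + lam C = Y by CG; the vectorized operator is
    L (kron) K + lam I, symmetric positive definite whenever K and L are
    positive semidefinite kernels. L-step (illustrative): re-estimate a
    rank-`rank` output kernel from the current coefficients.
    """
    n, T = Y.shape
    L, C = np.eye(T), np.zeros((n, T))
    for _ in range(iters):
        op = LinearOperator(
            (n * T, n * T),
            matvec=lambda v, L=L: (K @ v.reshape(n, T) @ L
                                   + lam * v.reshape(n, T)).ravel())
        sol, _ = cg(op, Y.ravel(), x0=C.ravel())
        C = sol.reshape(n, T)
        M = C.T @ K @ C                          # PSD surrogate for task similarity
        w, V = np.linalg.eigh(M)
        w[:-rank] = 0.0                          # keep only the top `rank` modes
        L = (V * w) @ V.T                        # low-rank output kernel
    return C, L
```

Note that the matrix L (kron) K is never formed; CG only needs the matrix-vector product, which is exactly why CG-type iterative methods for linear operator equations fit these subproblems.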
Preconditioned Locally Harmonic Residual Method for Computing Interior Eigenpairs of Certain Classes of Hermitian Matrices
We propose a Preconditioned Locally Harmonic Residual (PLHR) method for
computing several interior eigenpairs of a generalized Hermitian eigenvalue
problem, without traditional spectral transformations, matrix factorizations,
or inversions. PLHR is based on a short-term recurrence, easily extended to a
block form, computing eigenpairs simultaneously. PLHR can take advantage of
Hermitian positive definite preconditioning, e.g., based on an approximate
inverse of an absolute value of a shifted matrix, introduced in [SISC, 35
(2013), pp. A696-A718]. Our numerical experiments demonstrate that PLHR is
efficient and robust for certain classes of large-scale interior eigenvalue
problems, involving Laplacian and Hamiltonian operators, especially if memory
requirements are tight.
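A heavily simplified, single-vector analogue conveys the flavour (this is not the PLHR recurrence itself: the contents of the trial subspace, the direction update, and the stopping rule are assumptions made for illustration):

```python
import numpy as np
from scipy.linalg import eig, orth

def plhr_like(A, B, T, sigma, x0, iters=200, tol=1e-8):
    """Sketch: an eigenpair of A x = theta B x nearest the shift sigma.

    T is an SPD preconditioner callable (e.g., an approximate inverse of
    |A - sigma B|). Each step spans a small trial subspace from the
    iterate, the preconditioned residual, and the previous direction,
    then extracts a new iterate by harmonic Rayleigh-Ritz at sigma;
    no factorization or inversion of A - sigma B is required.
    """
    x = x0 / np.linalg.norm(x0)
    p = None
    for _ in range(iters):
        theta = (x.conj() @ (A @ x)) / (x.conj() @ (B @ x))
        r = A @ x - theta * (B @ x)
        if np.linalg.norm(r) < tol:
            break
        cols = [x, T(r)] + ([p] if p is not None else [])
        V = orth(np.column_stack(cols))          # orthonormal trial basis
        W = A @ V - sigma * (B @ V)
        # Harmonic Rayleigh-Ritz: W^H W y = mu W^H B V y, theta = sigma + mu
        mu, Y = eig(W.conj().T @ W, W.conj().T @ (B @ V))
        y = Y[:, int(np.argmin(np.abs(mu)))]     # pair nearest the shift
        x_new = V @ y
        x_new /= np.linalg.norm(x_new)
        p, x = x_new - x * (x.conj() @ x_new), x_new
    return theta, x
```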
A Novel Antenna Selection Scheme for Spatially Correlated Massive MIMO Uplinks with Imperfect Channel Estimation
We propose a new antenna selection scheme for a massive MIMO system with a
single user terminal and a base station with a large number of antennas. We
consider a practical scenario with realistic correlation among the
antennas and imperfect channel estimation at the receiver. The proposed
scheme exploits the sparsity of the channel matrix for the effective selection
of a limited number of antennas. To this end, we compute a sparse channel
matrix by minimising the mean squared error. This optimisation problem is then
solved by the well-known orthogonal matching pursuit algorithm. Widely used
models for spatial correlation among the antennas and channel estimation errors
are considered in this work. Simulation results demonstrate that when the
impacts of spatial correlation and imperfect channel estimation are taken into
account, the proposed scheme can significantly reduce the complexity of the
receiver without degrading the system performance compared to maximum
ratio combining.
Comment: in Proc. IEEE 81st Vehicular Technology Conference (VTC), May 2015, 6
pages, 5 figures.
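The greedy solver at the heart of the scheme is standard, so a generic sketch is easy to give (the toy dimensions and the identification of dictionary columns with base station antennas are assumptions for illustration, not the paper's exact system model):

```python
import numpy as np

def omp(Phi, y, k):
    """Orthogonal matching pursuit: greedily add the column of Phi most
    correlated with the residual, re-fitting all selected coefficients
    by least squares after every pick."""
    residual, support, coef = y.copy(), [], None
    for _ in range(k):
        corr = np.abs(Phi.T @ residual)
        corr[support] = 0.0                      # never pick a column twice
        support.append(int(np.argmax(corr)))
        sub = Phi[:, support]
        coef, *_ = np.linalg.lstsq(sub, y, rcond=None)
        residual = y - sub @ coef
    return support, coef

# Toy use as an antenna selector: 64 columns stand for 64 base station
# antennas; the support of the k-sparse fit marks the antennas to keep.
rng = np.random.default_rng(1)
Phi = rng.standard_normal((8, 64))               # 8 observations x 64 antennas
y = rng.standard_normal(8)
selected, _ = omp(Phi, y, k=6)
print(sorted(selected))                          # indices of retained antennas
```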
The principle of indirect elimination
The principle of indirect elimination states that an algorithm for solving
discretized differential equations can be used to identify its own
bad-converging modes. When the number of bad-converging modes of the algorithm
is not too large, the modes thus identified can be used to strongly improve the
convergence. The method presented here is applicable to any standard algorithm
like Conjugate Gradient, relaxation or multigrid. An example from theoretical
physics, the Dirac equation in the presence of almost-zero modes arising from
instantons, is studied. Using the principle, bad-converging modes are removed
efficiently. Applied locally, the principle is one of the main ingredients of
the Iteratively Smoothing Unigrid algorithm.
Comment: 16 pages, LaTeX-style elsart (Elsevier preprint style). Three
.eps-figures are now added with the figure command.
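Read concretely, the principle can be sketched as follows (every parameter here is illustrative, and damped Richardson relaxation stands in for the base solver; the paper's actual setting is the Dirac equation, not a generic symmetric positive definite system):

```python
import numpy as np
from scipy.sparse.linalg import cg

def relax(A, b, x, n, omega):
    """n damped Richardson sweeps: the error component along an
    eigenvector with eigenvalue lam decays like (1 - omega*lam)**n."""
    for _ in range(n):
        x = x + omega * (b - A @ x)
    return x

def find_bad_modes(A, n_modes, n_sweeps=200, seed=0):
    """The principle in action: run the solver on A x = 0 from random
    starts; well-converging components die out quickly, so what survives
    approximates the solver's own bad-converging (low) modes."""
    omega = 1.0 / np.linalg.norm(A, 2)           # step size: safe for SPD A
    rng = np.random.default_rng(seed)
    dim = A.shape[0]
    raw = [relax(A, np.zeros(dim), rng.standard_normal(dim), n_sweeps, omega)
           for _ in range(n_modes)]
    return np.linalg.qr(np.column_stack(raw))[0]  # orthonormalized modes

def solve_with_elimination(A, b, V):
    """Eliminate the identified modes exactly via a small Galerkin solve,
    then let the iterative solver handle the well-conditioned remainder."""
    x0 = V @ np.linalg.solve(V.T @ (A @ V), V.T @ b)
    x, _ = cg(A, b, x0=x0)                       # residual starts orthogonal to V
    return x
```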