Iteration entropy
We apply a common measure of randomness, the entropy, in the context of
iterated functions on a finite set with n elements. For a permutation, it turns
out that this entropy is asymptotically (for a growing number of iterations)
close to \log_2(n) minus the entropy of the vector of its cycle lengths. For
general functions, a similar approximation holds.
Comment: In Version 2, the definition of iteration entropy is modified by subtracting log_2(n) from it. This simplifies some expressions.
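The cycle-length entropy that the asymptotic result compares against can be sketched directly: decompose a permutation into cycles and take the Shannon entropy of the normalized cycle-length vector. This is a minimal illustration of that quantity, not the paper's full iteration-entropy computation.

```python
from math import log2

def cycle_lengths(perm):
    """Cycle lengths of a permutation given as a list: perm[i] is the image of i."""
    n = len(perm)
    seen = [False] * n
    lengths = []
    for start in range(n):
        if seen[start]:
            continue
        length, i = 0, start
        while not seen[i]:
            seen[i] = True
            i = perm[i]
            length += 1
        lengths.append(length)
    return lengths

def cycle_length_entropy(perm):
    """Shannon entropy (bits) of the normalized cycle-length vector (l_1/n, ..., l_k/n)."""
    n = len(perm)
    return -sum((l / n) * log2(l / n) for l in cycle_lengths(perm))

# A permutation of 8 elements with cycles (0 1 2 3)(4 5)(6)(7): lengths 4, 2, 1, 1.
perm = [1, 2, 3, 0, 5, 4, 6, 7]
print(cycle_lengths(perm))         # [4, 2, 1, 1]
print(cycle_length_entropy(perm))  # entropy of (1/2, 1/4, 1/8, 1/8) = 1.75 bits
```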
Relative Entropy Regularized Policy Iteration
We present an off-policy actor-critic algorithm for Reinforcement Learning
(RL) that combines ideas from gradient-free optimization via stochastic search
with a learned action-value function. The result is a simple procedure consisting
of three steps: i) policy evaluation by estimating a parametric action-value
function; ii) policy improvement via the estimation of a local non-parametric
policy; and iii) generalization by fitting a parametric policy. Each step can
be implemented in different ways, giving rise to several algorithm variants.
Our algorithm draws on connections to the existing literature on black-box
optimization and 'RL as inference', and it can be seen either as an extension
of the Maximum a Posteriori Policy Optimisation algorithm (MPO) [Abdolmaleki et
al., 2018a], or as an extension of Trust Region Covariance Matrix Adaptation
Evolutionary Strategy (CMA-ES) [Abdolmaleki et al., 2017b; Hansen et al., 1997]
to a policy iteration scheme. Our comparison on 31 continuous control tasks
from the parkour suite [Heess et al., 2017], the DeepMind control suite [Tassa et al.,
2018], and OpenAI Gym [Brockman et al., 2016], with diverse properties, a limited
amount of compute, and a single set of hyperparameters, demonstrates the
effectiveness of our method and yields state-of-the-art results. Videos summarizing
the results can be found at goo.gl/HtvJKR.
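The three-step loop (policy evaluation, local non-parametric improvement, parametric fitting) can be sketched on a toy one-state problem. Everything here is an illustrative assumption: the true reward stands in for a learned Q-function, the Gaussian policy and the temperature `eta` of the exponential reweighting (the relative-entropy-regularized step) are chosen for the demo, and no claim is made that this matches the paper's exact update rules.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-state continuous bandit: reward peaks at action a* = 2.0.
def reward(a):
    return -(a - 2.0) ** 2

# Parametric Gaussian policy: (mean, std).
mean, std = 0.0, 2.0
eta = 0.5  # temperature of the entropy-regularized improvement step (assumed)

for it in range(50):
    # i) "policy evaluation": here the known reward stands in for a learned Q.
    actions = rng.normal(mean, std, size=256)
    q = reward(actions)
    # ii) local non-parametric policy: reweight samples by exp(Q / eta).
    w = np.exp((q - q.max()) / eta)
    w /= w.sum()
    # iii) generalization: fit the parametric Gaussian to the weighted samples
    #      (weighted maximum likelihood).
    mean = float(np.sum(w * actions))
    std = max(float(np.sqrt(np.sum(w * (actions - mean) ** 2))), 1e-2)

print(round(mean, 2))  # the policy mean ends up near the optimum 2.0
```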
Primal-Dual Entropy Based Interior-Point Algorithms for Linear Optimization
We propose a family of search directions based on primal-dual entropy in the
context of interior-point methods for linear optimization. We show that by
using entropy based search directions in the predictor step of a
predictor-corrector algorithm together with a homogeneous self-dual embedding,
we can achieve the current best iteration complexity bound for linear
optimization. Then, we focus on some wide neighborhood algorithms and show that
in our family of entropy based search directions, we can find the best search
direction and step size combination by performing a plane search at each
iteration. For this purpose, we propose a heuristic plane search algorithm as
well as an exact one. Finally, we perform computational experiments to study
the performance of entropy-based search directions in wide neighborhoods of the
central path, with and without utilizing the plane search algorithms.
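The paper's exact plane search is not reproduced here; the following is a generic grid-based sketch of the idea of searching over combinations of two directions while staying in the interior. The merit function and directions are hypothetical placeholders, not the paper's primal-dual entropy objective.

```python
import numpy as np

def plane_search(x, d1, d2, merit, alphas, betas):
    """Grid-based plane search: pick coefficients (alpha, beta) for the
    combined direction alpha*d1 + beta*d2 minimizing a merit function,
    subject to staying strictly positive (an interior-point requirement)."""
    best, best_val = None, np.inf
    for a in alphas:
        for b in betas:
            x_new = x + a * d1 + b * d2
            if np.all(x_new > 0):  # remain in the interior
                val = merit(x_new)
                if val < best_val:
                    best, best_val = (a, b), val
    return best, best_val

# Hypothetical example: two candidate directions and an entropy-like merit,
# chosen purely for illustration.
x = np.array([1.0, 1.0])
d1 = np.array([-0.5, 0.2])
d2 = np.array([0.1, -0.4])
merit = lambda v: float(np.sum(v * np.log(v)))
grid = np.linspace(0.0, 1.0, 21)
coeffs, val = plane_search(x, d1, d2, merit, grid, grid)
```

Since the zero combination is in the grid, the search can never do worse than the current iterate; an exact plane search would replace the grid with an analytic minimization over the same two-dimensional plane.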
Off-Policy Actor-Critic in an Ensemble: Achieving Maximum General Entropy and Effective Environment Exploration in Deep Reinforcement Learning
We propose a new policy iteration theory as an important extension of soft
policy iteration and Soft Actor-Critic (SAC), one of the most efficient
model-free algorithms for deep reinforcement learning. Supported by the new theory,
arbitrary entropy measures that generalize Shannon entropy, such as Tsallis
entropy and Renyi entropy, can be utilized to properly randomize action
selection while fulfilling the goal of maximizing expected long-term rewards.
Our theory gives rise to two new algorithms, i.e., Tsallis entropy
Actor-Critic (TAC) and Renyi entropy Actor-Critic (RAC). Theoretical analysis
shows that these algorithms can be more effective than SAC. Moreover, they pave
the way for us to develop a new Ensemble Actor-Critic (EAC) algorithm in this
paper that features the use of a bootstrap mechanism for deep environment
exploration as well as a new value-function based mechanism for high-level
action selection. Empirically we show that TAC, RAC and EAC can achieve
state-of-the-art performance on a range of benchmark control tasks,
outperforming SAC and several cutting-edge learning algorithms in terms of both
sample efficiency and effectiveness.
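The claim that Tsallis and Rényi entropy generalize Shannon entropy is easy to verify numerically: both families recover the Shannon entropy as their parameter approaches 1. A small sketch (the example distribution is arbitrary):

```python
import numpy as np

def shannon(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def tsallis(p, q):
    """Tsallis entropy; recovers Shannon entropy in the limit q -> 1."""
    if q == 1.0:
        return shannon(p)
    return float((1.0 - np.sum(p ** q)) / (q - 1.0))

def renyi(p, alpha):
    """Renyi entropy; recovers Shannon entropy in the limit alpha -> 1."""
    if alpha == 1.0:
        return shannon(p)
    return float(np.log(np.sum(p ** alpha)) / (1.0 - alpha))

p = np.array([0.7, 0.2, 0.1])
# Both families approach the Shannon entropy as their parameter approaches 1.
print(shannon(p), tsallis(p, 1.001), renyi(p, 1.001))
```

In the actor-critic setting, the entropy term is applied to the action distribution; tuning q or alpha changes how aggressively action selection is randomized.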
Stochastic Backward Euler: An Implicit Gradient Descent Algorithm for k-means Clustering
In this paper, we propose an implicit gradient descent algorithm for the
classic k-means problem. The implicit gradient step, or backward Euler, is
solved via a stochastic fixed-point iteration, in which we randomly sample a
mini-batch gradient in every iteration. It is the average of the fixed-point
trajectory that is carried over to the next gradient step. We draw connections
between the proposed stochastic backward Euler and the recent entropy
stochastic gradient descent (Entropy-SGD) for improving the training of deep
neural networks. Numerical experiments on various synthetic and real datasets
show that the proposed algorithm provides better clustering results than
standard k-means algorithms, in the sense that it decreases the objective
function, and is much more robust to initialization.
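The scheme described above can be sketched as follows: each backward-Euler step C_new = C - gamma * grad f(C_new) is solved by iterating the fixed-point map with fresh mini-batch gradients, and the average of the fixed-point trajectory is carried over. The step size, inner iteration count, batch size, and synthetic data are all illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(1)

def minibatch_grad(C, Xb):
    """k-means gradient w.r.t. centroids on a mini-batch: for each centroid,
    the sum of (c_j - x_i) over the points currently assigned to it."""
    d2 = ((Xb[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(1)
    G = np.zeros_like(C)
    for j in range(C.shape[0]):
        pts = Xb[assign == j]
        if len(pts):
            G[j] = (C[j] - pts).sum(0)
    return G

def sbe_step(C, X, gamma=0.01, inner=20, batch=64):
    """One backward-Euler step C_new = C - gamma * grad f(C_new), solved by a
    stochastic fixed-point iteration; the average of the fixed-point
    trajectory is what gets carried over to the next step."""
    Z, avg = C.copy(), np.zeros_like(C)
    for _ in range(inner):
        idx = rng.integers(0, len(X), batch)
        Z = C - gamma * minibatch_grad(Z, X[idx])
        avg += Z
    return avg / inner

# Two well-separated synthetic 2-D clusters centered at (0, 0) and (3, 3).
X = np.vstack([rng.normal(0.0, 0.3, (100, 2)),
               rng.normal(3.0, 0.3, (100, 2))])
C = np.array([[1.0, 1.0], [2.0, 2.0]])  # initialization between the clusters
for _ in range(30):
    C = sbe_step(C, X)
print(C.round(2))  # the centroids end up near the two cluster centers
```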
Another Monte Carlo Renormalization Group Algorithm
A Monte Carlo Renormalization Group algorithm is used on the Ising model to
derive critical exponents and the critical temperature. The algorithm is based
on a minimum relative entropy iteration developed previously to derive
potentials from equilibrium configurations. This previous algorithm is modified
to derive useful information in an RG iteration. The method is applied in
several dimensions with limited success. Required accuracy has not been
achieved, but the method is interesting.
Comment: 9 pages, 3 ps figures
Minimal entropy approximation for cellular automata
We present a method for construction of approximate orbits of measures under
the action of cellular automata which is complementary to the local structure
theory. The local structure theory is based on the idea of Bayesian extension,
that is, construction of a probability measure consistent with given block
probabilities and maximizing entropy. If instead of maximizing entropy one
minimizes it, one can develop another method for the construction of approximate
orbits, at the heart of which is the iteration of finite-dimensional maps,
called minimal entropy maps. We present numerical evidence that minimal entropy
approximation sometimes spectacularly outperforms the local structure theory in
characterizing properties of cellular automata. The density response curve for
elementary CA rule 26 is used to illustrate this claim.
Comment: 19 pages, 3 figures
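The density being tracked can be reproduced in spirit by simulating elementary CA rule 26 directly and measuring the fraction of ones after some number of iterations. This sketch shows only plain simulation, not the minimal entropy approximation itself; the lattice size and step count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def eca_step(state, rule):
    """One step of an elementary cellular automaton with periodic boundaries."""
    left = np.roll(state, 1)
    right = np.roll(state, -1)
    idx = 4 * left + 2 * state + right    # neighborhood as a 3-bit number
    table = (rule >> np.arange(8)) & 1    # Wolfram rule table: bit i -> output
    return table[idx]

def density_after(rule, p0, n=10_000, steps=100):
    """Density of ones after `steps` iterations from a Bernoulli(p0) initial state."""
    state = (rng.random(n) < p0).astype(int)
    for _ in range(steps):
        state = eca_step(state, rule)
    return state.mean()

print(density_after(26, 0.5))
```

Sweeping the initial density `p0` over [0, 1] and plotting the resulting density gives the density response curve for the rule.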
Sparse Markov Decision Processes with Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning
In this paper, a sparse Markov decision process (MDP) with novel causal
sparse Tsallis entropy regularization is proposed. The proposed policy
regularization induces a sparse and multi-modal optimal policy distribution of
a sparse MDP. The full mathematical analysis of the proposed sparse MDP is
provided. We first analyze the optimality condition of a sparse MDP. Then, we
propose a sparse value iteration method which solves a sparse MDP and then
prove the convergence and optimality of sparse value iteration using the Banach
fixed point theorem. The proposed sparse MDP is compared to soft MDPs which
utilize causal entropy regularization. We show that the performance error of a
sparse MDP has a constant bound, while the error of a soft MDP increases
logarithmically with respect to the number of actions, where this performance
error is caused by the introduced regularization term. In experiments, we apply
sparse MDPs to reinforcement learning problems. The proposed method outperforms
existing methods in terms of convergence speed and performance.
Comment: 15 pages, 9 figures
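One concrete way to see a sparse, multi-modal policy of the kind described above is the sparsemax operator (Euclidean projection of a score vector onto the probability simplex), which is known to arise from Tsallis-entropy regularization with q = 2. The Q-values below are hypothetical; this illustrates the sparsity pattern, not the paper's full sparse value iteration.

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex (sparsemax).
    Unlike softmax, it assigns exactly zero probability to sufficiently
    low-scoring actions, yielding a sparse policy."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]
    cssv = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cssv       # actions kept in the support
    k_star = k[support][-1]
    tau = (cssv[k_star - 1] - 1) / k_star   # threshold
    return np.maximum(z - tau, 0.0)

q_values = np.array([2.0, 1.9, -1.0, -2.0])
policy = sparsemax(q_values)
print(policy)  # only the two near-optimal actions get nonzero probability
```

A softmax policy over the same Q-values would give every action positive probability; the sparse policy truncates the clearly suboptimal ones, which is the behavior the bounded performance error above refers to.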
Experimental Study of Entropy Convergence of Ant Colony Optimization
Ant colony optimization (ACO) has been applied to the field of combinatorial
optimization widely. However, the convergence theory of ACO has rarely been
studied under general conditions. In this paper, the authors try to find
evidence that entropy is related to the convergence of ACO, especially to the
estimation of the minimum number of iterations needed for convergence. Entropy
offers a potentially new viewpoint for studying ACO convergence under general
conditions.
Key Words: Ant Colony Optimization, Convergence of ACO, Entropy
Comment: 21 pages, 8 figures
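The entropy-as-convergence-indicator idea can be sketched with a toy pheromone model: track the Shannon entropy of the selection probabilities as the colony concentrates. The single-decision setting, quality vector, evaporation rate, and deposit rule are all invented for illustration; real ACO constructs solutions over a graph.

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# Toy ACO on a single decision with 5 options; option 0 has the best quality.
quality = np.array([1.0, 0.5, 0.4, 0.3, 0.2])
tau = np.ones(5)          # pheromone levels
rho = 0.1                 # evaporation rate
entropies = []
for it in range(200):
    p = tau / tau.sum()   # selection probabilities from pheromone
    entropies.append(entropy(p))
    ants = rng.choice(5, size=10, p=p)
    tau *= (1 - rho)      # evaporation
    for a in ants:
        tau[a] += rho * quality[a]  # quality-proportional deposit

# The entropy of the selection distribution falls as the colony converges.
print(round(entropies[0], 3), round(entropies[-1], 3))
```

Watching when this entropy drops below a threshold is one way to estimate the minimum iteration number of convergence that the abstract refers to.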
Stopping Criteria for Iterative Decoding based on Mutual Information
In this paper we investigate stopping criteria for iterative decoding from a
mutual information perspective. We introduce new iteration stopping rules based
on an approximation of the mutual information between encoded bits and decoder
soft output. The first type of stopping rule sets a threshold value directly on
the approximated mutual information for terminating decoding. The threshold can
be adjusted according to the expected bit error rate. The second one adopts a
strategy similar to that of the well-known cross-entropy stopping rule by
applying a fixed threshold on the ratio of a simple metric obtained after each
iteration to that of the first iteration. Compared with several well-known
stopping rules, the new methods achieve higher efficiency.
Comment: The Asilomar Conference on Signals, Systems, and Computers, Monterey,
CA, Nov., 201
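The first type of rule can be sketched with one common LLR-based approximation of the mutual information between encoded bits and decoder soft output, the time average I ≈ 1 − E[log2(1 + e^(−|L|))]. Both this particular approximation and the threshold value are assumptions for illustration; the paper's exact metric may differ.

```python
import numpy as np

def mutual_info_approx(llrs):
    """Approximate mutual information (bits per bit) between encoded bits and
    decoder soft output, from LLR magnitudes: a common time-average formula."""
    return float(np.mean(1.0 - np.log2(1.0 + np.exp(-np.abs(llrs)))))

def should_stop(llrs, threshold=0.99):
    """First-type stopping rule: terminate decoding once the approximated
    mutual information exceeds a threshold tied to the target bit error rate."""
    return mutual_info_approx(llrs) >= threshold

weak = np.array([0.1, -0.2, 0.3, -0.1])      # early iteration: unreliable LLRs
strong = np.array([12.0, -15.0, 9.0, -11.0])  # late iteration: confident LLRs
print(should_stop(weak), should_stop(strong))
```

The second-type rule would instead track the ratio of such a per-iteration metric to its value at the first iteration and compare that ratio against a fixed threshold.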