Iteration entropy
We apply a common measure of randomness, the entropy, in the context of
iterated functions on a finite set with n elements. For a permutation, it turns
out that this entropy is asymptotically (for a growing number of iterations)
close to \log_2(n) minus the entropy of the vector of its cycle lengths. For
general functions, a similar approximation holds.
Comment: In Version 2, the definition of iteration entropy is modified by subtracting log_2(n) from it. This simplifies some expressions.
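The cycle-length entropy that the asymptotic result compares against can be sketched directly: decompose a permutation into cycles and take the Shannon entropy of the normalized cycle-length vector. This is a minimal illustration of that quantity, not the paper's full iteration-entropy computation.

```python
from math import log2

def cycle_lengths(perm):
    """Cycle lengths of a permutation given as a list: perm[i] is the image of i."""
    n = len(perm)
    seen = [False] * n
    lengths = []
    for start in range(n):
        if seen[start]:
            continue
        length, i = 0, start
        while not seen[i]:
            seen[i] = True
            i = perm[i]
            length += 1
        lengths.append(length)
    return lengths

def cycle_length_entropy(perm):
    """Shannon entropy (bits) of the normalized cycle-length vector (l_1/n, ..., l_k/n)."""
    n = len(perm)
    return -sum((l / n) * log2(l / n) for l in cycle_lengths(perm))

# A permutation of 8 elements with cycles (0 1 2 3)(4 5)(6)(7): lengths 4, 2, 1, 1.
perm = [1, 2, 3, 0, 5, 4, 6, 7]
print(cycle_lengths(perm))         # [4, 2, 1, 1]
print(cycle_length_entropy(perm))  # entropy of (1/2, 1/4, 1/8, 1/8) = 1.75 bits
```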
Relative Entropy Regularized Policy Iteration
We present an off-policy actor-critic algorithm for Reinforcement Learning
(RL) that combines ideas from gradient-free optimization via stochastic search
with a learned action-value function. The result is a simple procedure consisting
of three steps: i) policy evaluation by estimating a parametric action-value
function; ii) policy improvement via the estimation of a local non-parametric
policy; and iii) generalization by fitting a parametric policy. Each step can
be implemented in different ways, giving rise to several algorithm variants.
Our algorithm draws on connections to the existing literature on black-box
optimization and 'RL as inference', and it can be seen either as an extension
of the Maximum a Posteriori Policy Optimisation algorithm (MPO) [Abdolmaleki et
al., 2018a], or as an extension of Trust Region Covariance Matrix Adaptation
Evolutionary Strategy (CMA-ES) [Abdolmaleki et al., 2017b; Hansen et al., 1997]
to a policy iteration scheme. Our comparison on 31 continuous control tasks
from the parkour suite [Heess et al., 2017], the DeepMind control suite [Tassa et al.,
2018], and OpenAI Gym [Brockman et al., 2016], with diverse properties, a limited
amount of compute, and a single set of hyperparameters, demonstrates the
effectiveness of our method and yields state-of-the-art results. Videos summarizing
the results can be found at goo.gl/HtvJKR.
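The three-step loop (policy evaluation, local non-parametric improvement, parametric fitting) can be sketched on a toy one-state problem. Everything here is an illustrative assumption: the true reward stands in for a learned Q-function, the Gaussian policy and the temperature `eta` of the exponential reweighting (the relative-entropy-regularized step) are chosen for the demo, and no claim is made that this matches the paper's exact update rules.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-state continuous bandit: reward peaks at action a* = 2.0.
def reward(a):
    return -(a - 2.0) ** 2

# Parametric Gaussian policy: (mean, std).
mean, std = 0.0, 2.0
eta = 0.5  # temperature of the entropy-regularized improvement step (assumed)

for it in range(50):
    # i) "policy evaluation": here the known reward stands in for a learned Q.
    actions = rng.normal(mean, std, size=256)
    q = reward(actions)
    # ii) local non-parametric policy: reweight samples by exp(Q / eta).
    w = np.exp((q - q.max()) / eta)
    w /= w.sum()
    # iii) generalization: fit the parametric Gaussian to the weighted samples
    #      (weighted maximum likelihood).
    mean = float(np.sum(w * actions))
    std = max(float(np.sqrt(np.sum(w * (actions - mean) ** 2))), 1e-2)

print(round(mean, 2))  # the policy mean ends up near the optimum 2.0
```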
Primal-Dual Entropy Based Interior-Point Algorithms for Linear Optimization
We propose a family of search directions based on primal-dual entropy in the
context of interior-point methods for linear optimization. We show that by
using entropy based search directions in the predictor step of a
predictor-corrector algorithm together with a homogeneous self-dual embedding,
we can achieve the current best iteration complexity bound for linear
optimization. Then, we focus on some wide neighborhood algorithms and show that
in our family of entropy based search directions, we can find the best search
direction and step size combination by performing a plane search at each
iteration. For this purpose, we propose a heuristic plane search algorithm as
well as an exact one. Finally, we perform computational experiments to study
the performance of entropy-based search directions in wide neighborhoods of the
central path, with and without utilizing the plane search algorithms.
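The paper's exact plane search is not reproduced here; the following is a generic grid-based sketch of the idea of searching over combinations of two directions while staying in the interior. The merit function and directions are hypothetical placeholders, not the paper's primal-dual entropy objective.

```python
import numpy as np

def plane_search(x, d1, d2, merit, alphas, betas):
    """Grid-based plane search: pick coefficients (alpha, beta) for the
    combined direction alpha*d1 + beta*d2 minimizing a merit function,
    subject to staying strictly positive (an interior-point requirement)."""
    best, best_val = None, np.inf
    for a in alphas:
        for b in betas:
            x_new = x + a * d1 + b * d2
            if np.all(x_new > 0):  # remain in the interior
                val = merit(x_new)
                if val < best_val:
                    best, best_val = (a, b), val
    return best, best_val

# Hypothetical example: two candidate directions and an entropy-like merit,
# chosen purely for illustration.
x = np.array([1.0, 1.0])
d1 = np.array([-0.5, 0.2])
d2 = np.array([0.1, -0.4])
merit = lambda v: float(np.sum(v * np.log(v)))
grid = np.linspace(0.0, 1.0, 21)
coeffs, val = plane_search(x, d1, d2, merit, grid, grid)
```

Since the zero combination is in the grid, the search can never do worse than the current iterate; an exact plane search would replace the grid with an analytic minimization over the same two-dimensional plane.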
Off-Policy Actor-Critic in an Ensemble: Achieving Maximum General Entropy and Effective Environment Exploration in Deep Reinforcement Learning
We propose a new policy iteration theory as an important extension of soft
policy iteration and Soft Actor-Critic (SAC), one of the most efficient
model-free algorithms for deep reinforcement learning. Supported by the new theory,
arbitrary entropy measures that generalize Shannon entropy, such as Tsallis
entropy and Renyi entropy, can be utilized to properly randomize action
selection while fulfilling the goal of maximizing expected long-term rewards.
Our theory gives rise to two new algorithms, i.e., Tsallis entropy
Actor-Critic (TAC) and Renyi entropy Actor-Critic (RAC). Theoretical analysis
shows that these algorithms can be more effective than SAC. Moreover, they pave
the way for us to develop a new Ensemble Actor-Critic (EAC) algorithm in this
paper that features the use of a bootstrap mechanism for deep environment
exploration as well as a new value-function based mechanism for high-level
action selection. Empirically we show that TAC, RAC and EAC can achieve
state-of-the-art performance on a range of benchmark control tasks,
outperforming SAC and several cutting-edge learning algorithms in terms of both
sample efficiency and effectiveness.
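The claim that Tsallis and Rényi entropy generalize Shannon entropy is easy to verify numerically: both families recover the Shannon entropy as their parameter approaches 1. A small sketch (the example distribution is arbitrary):

```python
import numpy as np

def shannon(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def tsallis(p, q):
    """Tsallis entropy; recovers Shannon entropy in the limit q -> 1."""
    if q == 1.0:
        return shannon(p)
    return float((1.0 - np.sum(p ** q)) / (q - 1.0))

def renyi(p, alpha):
    """Renyi entropy; recovers Shannon entropy in the limit alpha -> 1."""
    if alpha == 1.0:
        return shannon(p)
    return float(np.log(np.sum(p ** alpha)) / (1.0 - alpha))

p = np.array([0.7, 0.2, 0.1])
# Both families approach the Shannon entropy as their parameter approaches 1.
print(shannon(p), tsallis(p, 1.001), renyi(p, 1.001))
```

In the actor-critic setting, the entropy term is applied to the action distribution; tuning q or alpha changes how aggressively action selection is randomized.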
Stochastic Backward Euler: An Implicit Gradient Descent Algorithm for k-means Clustering
In this paper, we propose an implicit gradient descent algorithm for the
classic k-means problem. The implicit gradient step, or backward Euler, is
solved via a stochastic fixed-point iteration, in which we randomly sample a
mini-batch gradient in every iteration. It is the average of the fixed-point
trajectory that is carried over to the next gradient step. We draw connections
between the proposed stochastic backward Euler and the recent entropy
stochastic gradient descent (Entropy-SGD) for improving the training of deep
neural networks. Numerical experiments on various synthetic and real datasets
show that the proposed algorithm provides better clustering results than
standard k-means algorithms, in the sense that it decreases the objective
function, and is much more robust to initialization.
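The scheme described above can be sketched as follows: each backward-Euler step C_new = C - gamma * grad f(C_new) is solved by iterating the fixed-point map with fresh mini-batch gradients, and the average of the fixed-point trajectory is carried over. The step size, inner iteration count, batch size, and synthetic data are all illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(1)

def minibatch_grad(C, Xb):
    """k-means gradient w.r.t. centroids on a mini-batch: for each centroid,
    the sum of (c_j - x_i) over the points currently assigned to it."""
    d2 = ((Xb[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(1)
    G = np.zeros_like(C)
    for j in range(C.shape[0]):
        pts = Xb[assign == j]
        if len(pts):
            G[j] = (C[j] - pts).sum(0)
    return G

def sbe_step(C, X, gamma=0.01, inner=20, batch=64):
    """One backward-Euler step C_new = C - gamma * grad f(C_new), solved by a
    stochastic fixed-point iteration; the average of the fixed-point
    trajectory is what gets carried over to the next step."""
    Z, avg = C.copy(), np.zeros_like(C)
    for _ in range(inner):
        idx = rng.integers(0, len(X), batch)
        Z = C - gamma * minibatch_grad(Z, X[idx])
        avg += Z
    return avg / inner

# Two well-separated synthetic 2-D clusters centered at (0, 0) and (3, 3).
X = np.vstack([rng.normal(0.0, 0.3, (100, 2)),
               rng.normal(3.0, 0.3, (100, 2))])
C = np.array([[1.0, 1.0], [2.0, 2.0]])  # initialization between the clusters
for _ in range(30):
    C = sbe_step(C, X)
print(C.round(2))  # the centroids end up near the two cluster centers
```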
Another Monte Carlo Renormalization Group Algorithm
A Monte Carlo Renormalization Group algorithm is used on the Ising model to
derive critical exponents and the critical temperature. The algorithm is based
on a minimum relative entropy iteration developed previously to derive
potentials from equilibrium configurations. This previous algorithm is modified
to derive useful information in an RG iteration. The method is applied in
several dimensions with limited success. Required accuracy has not been
achieved, but the method is interesting.
Comment: 9 pages, 3 ps figures
Minimal entropy approximation for cellular automata
We present a method for construction of approximate orbits of measures under
the action of cellular automata which is complementary to the local structure
theory. The local structure theory is based on the idea of Bayesian extension,
that is, construction of a probability measure consistent with given block
probabilities and maximizing entropy. If instead of maximizing entropy one
minimizes it, one can develop another method for the construction of approximate
orbits, at the heart of which is the iteration of finite-dimensional maps,
called minimal entropy maps. We present numerical evidence that minimal entropy
approximation sometimes spectacularly outperforms the local structure theory in
characterizing properties of cellular automata. The density response curve for
elementary CA rule 26 is used to illustrate this claim.
Comment: 19 pages, 3 figures
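The density being tracked can be reproduced in spirit by simulating elementary CA rule 26 directly and measuring the fraction of ones after some number of iterations. This sketch shows only plain simulation, not the minimal entropy approximation itself; the lattice size and step count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def eca_step(state, rule):
    """One step of an elementary cellular automaton with periodic boundaries."""
    left = np.roll(state, 1)
    right = np.roll(state, -1)
    idx = 4 * left + 2 * state + right    # neighborhood as a 3-bit number
    table = (rule >> np.arange(8)) & 1    # Wolfram rule table: bit i -> output
    return table[idx]

def density_after(rule, p0, n=10_000, steps=100):
    """Density of ones after `steps` iterations from a Bernoulli(p0) initial state."""
    state = (rng.random(n) < p0).astype(int)
    for _ in range(steps):
        state = eca_step(state, rule)
    return state.mean()

print(density_after(26, 0.5))
```

Sweeping the initial density `p0` over [0, 1] and plotting the resulting density gives the density response curve for the rule.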
Sparse Markov Decision Processes with Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning
In this paper, a sparse Markov decision process (MDP) with novel causal
sparse Tsallis entropy regularization is proposed. The proposed policy
regularization induces a sparse and multi-modal optimal policy distribution of
a sparse MDP. The full mathematical analysis of the proposed sparse MDP is
provided. We first analyze the optimality condition of a sparse MDP. Then, we
propose a sparse value iteration method which solves a sparse MDP and then
prove the convergence and optimality of sparse value iteration using the Banach
fixed point theorem. The proposed sparse MDP is compared to soft MDPs which
utilize causal entropy regularization. We show that the performance error of a
sparse MDP has a constant bound, while the error of a soft MDP increases
logarithmically with respect to the number of actions, where this performance
error is caused by the introduced regularization term. In experiments, we apply
sparse MDPs to reinforcement learning problems. The proposed method outperforms
existing methods in terms of convergence speed and performance.
Comment: 15 pages, 9 figures
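One concrete way to see a sparse, multi-modal policy of the kind described above is the sparsemax operator (Euclidean projection of a score vector onto the probability simplex), which is known to arise from Tsallis-entropy regularization with q = 2. The Q-values below are hypothetical; this illustrates the sparsity pattern, not the paper's full sparse value iteration.

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex (sparsemax).
    Unlike softmax, it assigns exactly zero probability to sufficiently
    low-scoring actions, yielding a sparse policy."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]
    cssv = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cssv       # actions kept in the support
    k_star = k[support][-1]
    tau = (cssv[k_star - 1] - 1) / k_star   # threshold
    return np.maximum(z - tau, 0.0)

q_values = np.array([2.0, 1.9, -1.0, -2.0])
policy = sparsemax(q_values)
print(policy)  # only the two near-optimal actions get nonzero probability
```

A softmax policy over the same Q-values would give every action positive probability; the sparse policy truncates the clearly suboptimal ones, which is the behavior the bounded performance error above refers to.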
Experimental Study of Entropy Convergence of Ant Colony Optimization
Ant colony optimization (ACO) has been applied to the field of combinatorial
optimization widely. However, the convergence theory of ACO has rarely been
studied under general conditions. In this paper, the authors try to find
evidence that entropy is related to the convergence of ACO, especially to the
estimation of the minimum number of iterations needed for convergence. Entropy
offers a potentially new viewpoint for studying ACO convergence under general
conditions.
Key Words: Ant Colony Optimization, Convergence of ACO, Entropy
Comment: 21 pages, 8 figures
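The entropy-as-convergence-indicator idea can be sketched with a toy pheromone model: track the Shannon entropy of the selection probabilities as the colony concentrates. The single-decision setting, quality vector, evaporation rate, and deposit rule are all invented for illustration; real ACO constructs solutions over a graph.

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# Toy ACO on a single decision with 5 options; option 0 has the best quality.
quality = np.array([1.0, 0.5, 0.4, 0.3, 0.2])
tau = np.ones(5)          # pheromone levels
rho = 0.1                 # evaporation rate
entropies = []
for it in range(200):
    p = tau / tau.sum()   # selection probabilities from pheromone
    entropies.append(entropy(p))
    ants = rng.choice(5, size=10, p=p)
    tau *= (1 - rho)      # evaporation
    for a in ants:
        tau[a] += rho * quality[a]  # quality-proportional deposit

# The entropy of the selection distribution falls as the colony converges.
print(round(entropies[0], 3), round(entropies[-1], 3))
```

Watching when this entropy drops below a threshold is one way to estimate the minimum iteration number of convergence that the abstract refers to.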
Stopping Criteria for Iterative Decoding based on Mutual Information
In this paper we investigate stopping criteria for iterative decoding from a
mutual information perspective. We introduce new iteration stopping rules based
on an approximation of the mutual information between encoded bits and decoder
soft output. The first type of stopping rule sets a threshold value directly on
the approximated mutual information for terminating decoding. The threshold can
be adjusted according to the expected bit error rate. The second one adopts a
strategy similar to that of the well-known cross-entropy stopping rule by
applying a fixed threshold on the ratio of a simple metric obtained after each
iteration to that of the first iteration. Compared with several well-known
stopping rules, the new methods achieve higher efficiency.
Comment: The Asilomar Conference on Signals, Systems, and Computers, Monterey,
CA, Nov., 201
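The first type of rule can be sketched with one common LLR-based approximation of the mutual information between encoded bits and decoder soft output, the time average I ≈ 1 − E[log2(1 + e^(−|L|))]. Both this particular approximation and the threshold value are assumptions for illustration; the paper's exact metric may differ.

```python
import numpy as np

def mutual_info_approx(llrs):
    """Approximate mutual information (bits per bit) between encoded bits and
    decoder soft output, from LLR magnitudes: a common time-average formula."""
    return float(np.mean(1.0 - np.log2(1.0 + np.exp(-np.abs(llrs)))))

def should_stop(llrs, threshold=0.99):
    """First-type stopping rule: terminate decoding once the approximated
    mutual information exceeds a threshold tied to the target bit error rate."""
    return mutual_info_approx(llrs) >= threshold

weak = np.array([0.1, -0.2, 0.3, -0.1])      # early iteration: unreliable LLRs
strong = np.array([12.0, -15.0, 9.0, -11.0])  # late iteration: confident LLRs
print(should_stop(weak), should_stop(strong))
```

The second-type rule would instead track the ratio of such a per-iteration metric to its value at the first iteration and compare that ratio against a fixed threshold.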