State-Augmentation Transformations for Risk-Sensitive Reinforcement Learning
In the MDP framework, the general reward function takes three arguments
(current state, action, and successor state), but it is often simplified to a
function of two arguments (current state and action). The former is called a
transition-based reward function, whereas the latter is called a state-based
reward function. When the objective involves only the expected cumulative
reward, this simplification works perfectly. However, when the objective is
risk-sensitive, this simplification leads to an incorrect value. We present
state-augmentation transformations (SATs), which preserve the reward sequences
as well as the reward distributions and the optimal policy in risk-sensitive
reinforcement learning. In risk-sensitive scenarios, we first prove that, for
every MDP with a stochastic transition-based reward function, there exists an
MDP with a deterministic state-based reward function such that, for any given
(randomized) policy for the first MDP, there exists a corresponding policy for
the second MDP under which both Markov reward processes share the same reward
sequence. Second, using an inventory control problem, we illustrate two
situations that require the proposed SATs: applying Q-learning (or other
learning methods) to MDPs with transition-based reward functions, and applying
methods designed for Markov processes with deterministic state-based reward
functions to Markov processes with general reward
functions. We show the advantage of the SATs by taking Value-at-Risk as an
example: a risk measure defined on the reward distribution itself, rather than
on summary statistics (such as mean and variance) of that distribution. We
illustrate the error in the reward-distribution estimate that results from the
direct use of Q-learning, and show how the SATs enable a variance formula to
work on Markov processes with general reward functions.
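The point about Value-at-Risk being a quantile of the whole cumulative-reward distribution, rather than a function of its mean or variance, can be illustrated with a small Monte Carlo sketch. The two-state MDP, the uniformly random policy, and the 5% level below are illustrative assumptions, not taken from the paper:

```python
import random

def simulate_episode(horizon=10, seed=None):
    # Toy two-state MDP with a *stochastic transition-based* reward:
    # the reward depends on (state, action, next_state) and is random.
    rng = random.Random(seed)
    state, total = 0, 0.0
    for _ in range(horizon):
        action = rng.choice([0, 1])          # uniformly random policy
        next_state = rng.choice([0, 1])      # symmetric transitions
        # r(s, a, s'): deterministic part plus Gaussian noise
        total += (1.0 if state == next_state else -1.0) \
                 + 0.1 * action + rng.gauss(0.0, 0.5)
        state = next_state
    return total

def value_at_risk(samples, alpha=0.05):
    # Empirical VaR at level alpha: the alpha-quantile of the
    # cumulative-reward distribution (naive order-statistic estimate).
    ordered = sorted(samples)
    return ordered[int(alpha * len(ordered))]

returns = [simulate_episode(seed=i) for i in range(10_000)]
print("mean reward:", sum(returns) / len(returns))
print("VaR at 5%:", value_at_risk(returns, 0.05))
```

Two reward distributions can share a mean and variance yet differ in their 5% quantile, which is why a risk-sensitive objective needs the distribution itself.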
q-Wiener and ({\alpha},q)-Ornstein--Uhlenbeck processes. A generalization of known processes
We collect results scattered through the literature, and prove some new
properties, of two Markov processes that in many ways resemble the Wiener and
Ornstein--Uhlenbeck processes. Although the processes considered in this paper
were defined either in the non-commutative probability context or through
quadratic harnesses, we define them once more as, so to speak, 'continuous-time'
generalizations of a simple, symmetric, discrete-time process satisfying simple
conditions imposed on the form of its first two conditional moments. The
finite-dimensional distributions of the first one (say X=(X_{t})_{t\geq 0},
called q-Wiener) depend on one parameter q\in(-1,1], and those of the second one
(say Y=(Y_{t})_{t\in\mathbb{R}}, called ({\alpha},q)-Ornstein--Uhlenbeck) on two
parameters ({\alpha},q)\in(0,\infty)\times(-1,1]. The first one resembles the
Wiener process in the sense that for q=1 it is the Wiener process, and moreover
that for |q|<1 and all n\geq 1, the processes t^{n/2}H_{n}(X_{t}/\sqrt{t}|q),
where (H_{n})_{n\geq 0} are the so-called q-Hermite polynomials, are
martingales. However, it neither has independent increments nor admits a
continuous sample-path modification. The second one resembles the
Ornstein--Uhlenbeck process. For q=1 it is a classical OU process. For |q|<1 it
is also stationary, with correlation function equal to exp(-{\alpha}|t-s|), and
has many properties resembling those of its classical version. We think that
these processes are fascinating objects to study, posing many interesting open
questions.
Scaling Properties of Parallelized Multicanonical Simulations
We implemented a parallel version of the multicanonical algorithm and applied
it to a variety of systems with phase transitions of first and second order.
The parallelization relies on independent equilibrium simulations that only
communicate when the multicanonical weight function is updated. That way, the
Markov chains efficiently sample the temporary distributions allowing for good
estimations of consecutive weight functions.
The systems investigated range from the well known Ising and Potts spin
systems to bead-spring polymers. We estimate the speedup with increasing number
of parallel processes. Overall, the parallelization is shown to scale quite
well. In the case of multicanonical simulations of the q-state Potts model and
multimagnetic simulations of the Ising model, the optimal performance is
limited due to emerging barriers. (Contribution to the Proceedings of "Recent
Developments in Computer Simulational Studies in Condensed Matter Physics
2013".)
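The communication pattern described in the abstract, independent equilibrium walkers that synchronize only when the weight function is updated, can be sketched as follows. For brevity the walkers run sequentially here, and the one-dimensional energy ladder, the number of bins, and the simple ln W <- ln W - ln H flat-histogram update are illustrative assumptions, not the paper's exact scheme:

```python
import math
import random

def run_walker(weights, n_steps, rng):
    # One independent multicanonical walker on energy levels
    # E in {0, ..., len(weights)-1}. Metropolis acceptance uses the
    # *current* weight function; only the histogram is returned.
    E = rng.randrange(len(weights))
    hist = [0] * len(weights)
    for _ in range(n_steps):
        E_new = max(0, min(len(weights) - 1, E + rng.choice([-1, 1])))
        if rng.random() < math.exp(weights[E_new] - weights[E]):
            E = E_new
        hist[E] += 1
    return hist

def update_weights(weights, histograms):
    # The only synchronization point: merge all walkers' histograms,
    # then push the weights toward a flat histogram (ln W <- ln W - ln H).
    merged = [sum(h) for h in zip(*histograms)]
    return [w - math.log(max(h, 1)) for w, h in zip(weights, merged)]

rng = random.Random(0)
weights = [0.0] * 16                 # ln W(E), start from uniform weights
for _ in range(5):                   # weight iterations
    hists = [run_walker(weights, 2000, rng) for _ in range(4)]  # 4 "parallel" walkers
    weights = update_weights(weights, hists)
```

Because the walkers never exchange configurations, a real implementation can distribute them over processes with essentially no communication between weight updates, which is the source of the scaling the paper measures.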
On a two-parameter bivariate kernel built of q-ultraspherical polynomials and other Lancaster-type expansions of bivariate distributions
Our most important result concerns the positivity of certain kernels built of
the so-called q-ultraspherical polynomials. Since at first sight this result
appears primarily important for those working in orthogonal polynomials, series
theory, and the so-called quantum polynomials, it might have a limited number
of interested researchers. That is why we put our result into a broader
context. We recall the theory of Hilbert--Schmidt operators, Lancaster
expansions and their applications in mathematical statistics, and bivariate
distributions absolutely continuous with respect to the product of their
marginal distributions, leading to the generation of Markov processes with
polynomial conditional moments (the main representative of such processes is
the famous Wiener process).
The L2-cutoff for reversible Markov processes
We consider the problem of proving the existence of an L2-cutoff for families of ergodic Markov processes started from given initial distributions and associated with reversible (more generally, normal) Markov semigroups. This includes classical examples such as families of finite reversible Markov chains and Brownian motion on compact Riemannian manifolds. We give conditions that are equivalent to the existence of an L2-cutoff and describe the L2-cutoff time in terms of the spectral decomposition. This is illustrated by several examples including the Ehrenfest process and the biased (p,q)-random walk on the non-negative integers, both started from an arbitrary point.
Convexity of quantum \chi^2-divergence
The quantum \chi^2-divergence has recently been introduced and applied to
quantum channels (quantum Markov processes). In contrast to the classical
setting, the quantum \chi^2-divergence is not unique but depends on the choice
of quantum statistics. In the reference [11] a special one-parameter family of
quantum \chi^2_\alpha(\rho,\sigma)-divergences for density matrices was
studied, and it was established that they are convex functions in (\rho,\sigma)
for parameter values \alpha\in[0,1], thus mirroring the classical theorem for
the \chi^2(p,q)-divergence for probability distributions (p,q). We prove that
any quantum \chi^2-divergence is a convex function in its two arguments.
Quasi-stationary distributions
This paper contains a survey of results related to quasi-stationary distributions, which arise in the setting of stochastic dynamical systems that eventually evanesce, and which may be useful in describing the long-term behaviour of such systems before evanescence. We are concerned mainly with continuous-time Markov chains over a finite or countably infinite state space, since these processes most often arise in applications, but will make reference to results for other processes where appropriate. Next to giving a historical account of the subject, we review the most important results on the existence and identification of quasi-stationary distributions for general Markov chains, and give special attention to birth-death processes and related models. Results on the question of whether a quasi-stationary distribution, given its existence, is indeed a good descriptor of the long-term behaviour of a system before evanescence, are reviewed as well. The paper is concluded with a summary of recent developments in numerical and approximation methods.
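For a finite chain, the quasi-stationary distribution surveyed above is the normalized left Perron eigenvector of the transition matrix restricted to the transient (non-absorbed) states. A minimal power-iteration sketch, where the 3-state birth-death example with absorption at 0 is an illustrative assumption:

```python
def qsd_power_iteration(P, n_iter=2000):
    # P: sub-stochastic transition matrix restricted to the transient
    # states (row sums < 1 where absorption is possible). The QSD is
    # the normalized left eigenvector for the top eigenvalue, found by
    # iterating m <- m P and renormalizing.
    n = len(P)
    m = [1.0 / n] * n
    for _ in range(n_iter):
        m = [sum(m[i] * P[i][j] for i in range(n)) for j in range(n)]
        total = sum(m)
        m = [x / total for x in m]
    return m

# Birth-death chain on {1, 2, 3}, absorbed at 0 from state 1.
P = [[0.0, 0.5, 0.0],   # from 1: absorbed w.p. 0.5, up w.p. 0.5
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.5]]
print(qsd_power_iteration(P))
```

Conditioned on survival, the chain's distribution converges to this eigenvector, which is exactly the "long-term behaviour before evanescence" the survey discusses.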
Timed Comparisons of Semi-Markov Processes
Semi-Markov processes are Markovian processes in which the firing time of the
transitions is modelled by probability distributions over the positive reals,
interpreted as the probability of firing a transition at a certain moment in
time. In this paper we consider the trace-based semantics of semi-Markov
processes, and investigate the question of how to compare two semi-Markov
processes with respect to their time-dependent behaviour. To this end, we
introduce the relation of being "faster than" between processes and study its
algorithmic complexity. Through a connection to probabilistic automata we
obtain hardness results showing in particular that this relation is
undecidable. However, we present an additive approximation algorithm for a
time-bounded variant of the faster-than problem over semi-Markov processes with
slow residence-time functions, and a coNP algorithm for the exact faster-than
problem over unambiguous semi-Markov processes.
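In the degenerate case of a single transition, a time-bounded "faster than" check reduces to pointwise dominance of the firing-time CDFs up to the bound. A toy sketch of that intuition; the exponential residence times and the grid discretization are assumptions for illustration, not the paper's algorithm:

```python
import math

def exp_cdf(rate):
    # CDF of an exponential firing time with the given rate.
    return lambda t: 1.0 - math.exp(-rate * t)

def faster_than_up_to(cdf_a, cdf_b, time_bound, n_points=1000):
    # Illustrative time-bounded check for single-transition processes:
    # A is "faster than" B up to the bound if, at every sampled time t,
    # A has fired with at least the probability that B has.
    return all(
        cdf_a(time_bound * k / n_points) >= cdf_b(time_bound * k / n_points)
        for k in range(n_points + 1)
    )

print(faster_than_up_to(exp_cdf(2.0), exp_cdf(1.0), 10.0))  # rate 2 fires sooner
```

The hardness results in the paper show that once branching and traces are involved, no such simple pointwise check can decide the full relation.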
Entropy: The Markov Ordering Approach
The focus of this article is on entropy and Markov processes. We study the
properties of functionals which are invariant with respect to monotonic
transformations and analyze two invariant "additivity" properties: (i)
existence of a monotonic transformation which makes the functional additive
with respect to the joining of independent systems and (ii) existence of a
monotonic transformation which makes the functional additive with respect to
the partitioning of the space of states. All Lyapunov functionals for Markov
chains which have properties (i) and (ii) are derived. We describe the most
general ordering of the distribution space, with respect to which all
continuous-time Markov processes are monotonic (the {\em Markov order}). The
solution differs significantly from the ordering given by the inequality of
entropy growth. For inference, this approach results in a convex compact set of
conditionally "most random" distributions.
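A concrete instance of a Lyapunov functional of the kind discussed in this abstract: relative entropy to the equilibrium distribution is non-increasing along any Markov chain. The 2-state chain below is an illustrative assumption:

```python
import math

def relative_entropy(p, q):
    # Kullback-Leibler divergence D(p||q): the classical Lyapunov
    # functional for a Markov chain with equilibrium distribution q.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def step(p, P):
    # One step of the chain acting on a distribution: p <- p P.
    n = len(p)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]

P = [[0.9, 0.1],
     [0.2, 0.8]]              # equilibrium distribution: (2/3, 1/3)
pi = [2 / 3, 1 / 3]
p = [0.1, 0.9]                # start far from equilibrium
divergences = []
for _ in range(6):
    divergences.append(relative_entropy(p, pi))
    p = step(p, P)
print(divergences)            # non-increasing along the chain
```

The abstract's point is that monotone transformations of such functionals, and the Markov order they induce, go strictly beyond this single entropy-growth inequality.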