
    State-Augmentation Transformations for Risk-Sensitive Reinforcement Learning

    In the MDP framework, the general reward function takes three arguments (current state, action, and successor state), but it is often simplified to a function of two arguments (current state and action). The former is called a transition-based reward function, whereas the latter is called a state-based reward function. When the objective involves only the expected cumulative reward, this simplification works perfectly; when the objective is risk-sensitive, however, it leads to an incorrect value. We present state-augmentation transformations (SATs), which preserve the reward sequences, the reward distributions, and the optimal policy in risk-sensitive reinforcement learning. In risk-sensitive scenarios, we first prove that for every MDP with a stochastic transition-based reward function there exists an MDP with a deterministic state-based reward function, such that for any given (randomized) policy for the first MDP there exists a corresponding policy for the second MDP under which both Markov reward processes share the same reward sequence. Second, we illustrate two situations in an inventory control problem that require the proposed SATs: using Q-learning (or other learning methods) on MDPs with transition-based reward functions, and using methods designed for Markov processes with deterministic state-based reward functions on Markov processes with general reward functions. We show the advantage of the SATs by considering Value-at-Risk as an example, a risk measure defined on the reward distribution itself rather than on summary statistics (such as mean and variance) of that distribution. We illustrate the error in the reward distribution estimated by direct use of Q-learning, and show how the SATs enable a variance formula to work on Markov processes with general reward functions.
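
    The construction can be pictured with a minimal Python sketch of such a state augmentation, assuming a finite MDP given as dictionaries; this is an illustration of the general idea, not the authors' code, and all names are invented for the example.

        import itertools

        def augment(states, actions, P, R):
            """Fold the stochastic transition-based reward into the state, so the
            reward becomes a deterministic function of the (augmented) state alone.

            P[(s, a, s2)] -> transition probability
            R[(s, a, s2)] -> dict mapping each possible reward value to its probability
            """
            reward_values = sorted({r for d in R.values() for r in d})
            # Augmented state = (original state, reward received on entering it).
            aug_states = list(itertools.product(states, reward_values))
            aug_P = {}
            for (s, r), a in itertools.product(aug_states, actions):
                for s2 in states:
                    for r2, pr in R.get((s, a, s2), {}).items():
                        # The augmented transition draws the reward jointly
                        # with the successor state.
                        p = P.get((s, a, s2), 0.0) * pr
                        if p > 0.0:
                            aug_P[((s, r), a, (s2, r2))] = p
            # Deterministic, state-based reward on the augmented chain.
            aug_R = {x: x[1] for x in aug_states}
            return aug_states, aug_P, aug_R

    Because the reward sequence along any trajectory of the augmented chain reproduces the original one, distribution-based risk measures such as Value-at-Risk can be estimated on the augmented process directly.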

    q-Wiener and (α,q)-Ornstein-Uhlenbeck processes. A generalization of known processes

    We collect properties, scattered through the literature, and prove some new ones, of two Markov processes that in many ways resemble the Wiener and Ornstein-Uhlenbeck processes. Although the processes considered in this paper were defined either in the non-commutative probability context or through quadratic harnesses, we define them once more as, so to say, 'continuous-time' generalizations of a simple, symmetric, discrete-time process satisfying simple conditions on the form of its first two conditional moments. The finite-dimensional distributions of the first process (say $X=(X_t)_{t\ge 0}$, called q-Wiener) depend on one parameter $q\in(-1,1]$, and those of the second (say $Y=(Y_t)_{t\in\mathbb{R}}$, called (α,q)-Ornstein-Uhlenbeck) on two parameters $(\alpha,q)\in(0,\infty)\times(-1,1]$. The first resembles the Wiener process in the sense that for $q=1$ it is the Wiener process, but also in that for $|q|<1$ and every $n\ge 1$ the processes $t^{n/2}H_n(X_t/\sqrt{t}\,|\,q)$, where $(H_n)_{n\ge 0}$ are the so-called q-Hermite polynomials, are martingales. However, it neither has independent increments nor admits a continuous sample-path modification. The second resembles the Ornstein-Uhlenbeck process: for $q=1$ it is a classical OU process, while for $|q|<1$ it is still stationary with correlation function $\exp(-\alpha|t-s|)$ and has many properties resembling those of its classical counterpart. We think that these processes are fascinating objects to study, posing many interesting open questions. Comment: 25 pages
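
    Spelled out, the martingale property quoted above reads as follows (a transcription in display form, with $\mathcal{F}_s$ denoting the natural filtration of $X$):

        % For |q| < 1 and every n >= 1, with (H_n) the q-Hermite polynomials:
        \[
          \mathbb{E}\left[\, t^{n/2} H_n\bigl(X_t/\sqrt{t} \,\big|\, q\bigr) \,\middle|\, \mathcal{F}_s \right]
          = s^{n/2} H_n\bigl(X_s/\sqrt{s} \,\big|\, q\bigr),
          \qquad 0 < s \le t .
        \]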

    Scaling Properties of Parallelized Multicanonical Simulations

    We implemented a parallel version of the multicanonical algorithm and applied it to a variety of systems with phase transitions of first and second order. The parallelization relies on independent equilibrium simulations that only communicate when the multicanonical weight function is updated. That way, the Markov chains efficiently sample the temporary distributions, allowing for good estimations of consecutive weight functions. The systems investigated range from the well-known Ising and Potts spin systems to bead-spring polymers. We estimate the speedup with increasing number of parallel processes. Overall, the parallelization is shown to scale quite well. In the case of multicanonical simulations of the q-state Potts model ($q\ge 6$) and multimagnetic simulations of the Ising model, the optimal performance is limited due to emerging barriers. Comment: Contribution to the Proceedings of "Recent Developments in Computer Simulational Studies in Condensed Matter Physics 2013"
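
    The update step described above (independent walkers that communicate only when the weight function is refreshed) can be sketched in a few lines of Python; the recursion below is the standard multicanonical weight update, shown for orientation rather than as the paper's implementation.

        import numpy as np

        def update_weights(log_w, walker_histograms):
            """One multicanonical iteration: merge the energy histograms collected
            by the independent parallel walkers, then suppress the weight of
            frequently visited energies so that sampling flattens out.
            """
            merged = np.sum(walker_histograms, axis=0)   # the only communication step
            visited = merged > 0
            log_w = log_w.copy()
            # Standard recursion: w_{n+1}(E) = w_n(E) / H_n(E), in log form.
            log_w[visited] -= np.log(merged[visited])
            return log_w - log_w.max()                   # normalize for stability

    Between updates, each walker runs an ordinary Metropolis simulation with acceptance probability min(1, exp(log_w[E_new] - log_w[E_old])), which is what keeps the communication cost of the parallelization low.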

    On a two-parameter bivariate kernel built of q-ultraspherical polynomials and other Lancaster-type expansions of bivariate distributions

    Our most important result concerns the positivity of certain kernels built of the so-called q-ultraspherical polynomials. Since this result appears at first sight to be of interest primarily to those working in orthogonal polynomials, q-series theory and the so-called quantum polynomials, it might attract a limited number of researchers. That is why we put our result into a broader context. We recall the theory of Hilbert-Schmidt operators, Lancaster expansions and their applications in mathematical statistics, that is, bivariate distributions absolutely continuous with respect to the product of their marginal distributions, which lead to the generation of Markov processes with polynomial conditional moments (the main representative of such processes being the famous Wiener process).
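
    For orientation, a Lancaster-type expansion writes a bivariate density in the following standard form (the general scheme, not the paper's specific q-ultraspherical kernel):

        % Marginal densities f, g with orthonormal polynomial systems (p_n), (q_n):
        \[
          h(x,y) \;=\; f(x)\, g(y) \sum_{n \ge 0} c_n\, p_n(x)\, q_n(y), \qquad c_0 = 1,
        \]
        % h is a genuine joint density exactly when the kernel (the series) is
        % nonnegative -- the positivity question addressed in the paper.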

    The L2-cutoff for reversible Markov processes

    We consider the problem of proving the existence of an L2-cutoff for families of ergodic Markov processes started from given initial distributions and associated with reversible (more generally, normal) Markov semigroups. This includes classical examples such as families of finite reversible Markov chains and Brownian motion on compact Riemannian manifolds. We give conditions that are equivalent to the existence of an L2-cutoff and describe the L2-cutoff time in terms of the spectral decomposition. This is illustrated by several examples, including the Ehrenfest process and the biased (p,q)-random walk on the non-negative integers, both started from an arbitrary point.
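
    In the finite reversible setting the relevant quantity has a familiar spectral form; the standard identity below (stated for orientation, with the chain started at x) shows why the cutoff time is governed by the spectral decomposition:

        % Eigenvalues 0 = lambda_0 < lambda_1 <= lambda_2 <= ... of the generator,
        % eigenfunctions phi_i orthonormal in L^2(pi):
        \[
          \bigl\| h_t^x - 1 \bigr\|_{L^2(\pi)}^2 \;=\; \sum_{i \ge 1} e^{-2\lambda_i t}\, \phi_i(x)^2 ,
        \]
        % an L2-cutoff occurs when this sum falls from large to small values over
        % a window that is negligible compared with the cutoff time itself.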

    Convexity of quantum χ²-divergence

    The quantum $\chi^2$-divergence has recently been introduced and applied to quantum channels (quantum Markov processes). In contrast to the classical setting, the quantum $\chi^2$-divergence is not unique but depends on the choice of quantum statistics. In the reference [11] a special one-parameter family of quantum $\chi^2_\alpha(\rho,\sigma)$-divergences for density matrices was studied, and it was established that they are convex functions in $(\rho,\sigma)$ for parameter values $\alpha\in[0,1]$, thus mirroring the classical theorem for the $\chi^2(p,q)$-divergence for probability distributions $(p,q)$. We prove that any quantum $\chi^2$-divergence is a convex function in its two arguments. Comment: Proof clarified, typos corrected
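
    For comparison, here is the classical divergence being mirrored, together with one common way the one-parameter quantum family is written in the related literature (the latter is an assumption for illustration, not quoted from this paper's text):

        % Classical chi-square divergence, jointly convex in (p, q):
        \[
          \chi^2(p,q) \;=\; \sum_i \frac{(p_i - q_i)^2}{q_i} .
        \]
        % A common form of the alpha-family for density matrices rho, sigma:
        \[
          \chi_\alpha^2(\rho,\sigma)
          \;=\; \operatorname{Tr}\!\left[ (\rho - \sigma)\, \sigma^{-\alpha}\, (\rho - \sigma)\, \sigma^{\alpha - 1} \right],
          \qquad \alpha \in [0,1].
        \]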

    Quasi-stationary distributions

    This paper contains a survey of results related to quasi-stationary distributions, which arise in the setting of stochastic dynamical systems that eventually evanesce, and which may be useful in describing the long-term behaviour of such systems before evanescence. We are concerned mainly with continuous-time Markov chains over a finite or countably infinite state space, since these processes most often arise in applications, but will make reference to results for other processes where appropriate. In addition to giving a historical account of the subject, we review the most important results on the existence and identification of quasi-stationary distributions for general Markov chains, and give special attention to birth-death processes and related models. Results on the question of whether a quasi-stationary distribution, given its existence, is indeed a good descriptor of the long-term behaviour of a system before evanescence are reviewed as well. The paper is concluded with a summary of recent developments in numerical and approximation methods.
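
    For readers new to the subject, the defining property is worth recording (the standard definition, stated in the survey's setting of a process with evanescence time T):

        % nu is a quasi-stationary distribution if conditioning on survival
        % leaves it invariant: for all measurable A and all t >= 0,
        \[
          \mathbb{P}_\nu\bigl( X_t \in A \,\bigm|\, T > t \bigr) \;=\; \nu(A).
        \]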

    Timed Comparisons of Semi-Markov Processes

    Semi-Markov processes are Markovian processes in which the firing time of a transition is modelled by a probability distribution over the positive reals, interpreted as the probability of firing the transition at a certain moment in time. In this paper we consider the trace-based semantics of semi-Markov processes and investigate the question of how to compare two semi-Markov processes with respect to their time-dependent behaviour. To this end, we introduce the relation of being "faster than" between processes and study its algorithmic complexity. Through a connection to probabilistic automata we obtain hardness results, showing in particular that this relation is undecidable. However, we present an additive approximation algorithm for a time-bounded variant of the faster-than problem over semi-Markov processes with slow residence-time functions, and a coNP algorithm for the exact faster-than problem over unambiguous semi-Markov processes.
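
    The abstract does not spell the relation out; one natural trace-based formalisation consistent with it (an assumption for illustration, not a quotation from the paper) is:

        % s is "faster than" s' if, for every finite trace w of actions and every
        % time bound t, s is at least as likely to produce w within time t:
        \[
          s \preceq s' \quad\Longleftrightarrow\quad
          \forall w,\; \forall t \ge 0:\;
          \Pr\nolimits_{s'}\bigl( w \text{ within } t \bigr) \,\le\, \Pr\nolimits_{s}\bigl( w \text{ within } t \bigr).
        \]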

    Entropy: The Markov Ordering Approach

    The focus of this article is on entropy and Markov processes. We study the properties of functionals which are invariant with respect to monotonic transformations and analyze two invariant "additivity" properties: (i) existence of a monotonic transformation which makes the functional additive with respect to the joining of independent systems, and (ii) existence of a monotonic transformation which makes the functional additive with respect to the partitioning of the space of states. All Lyapunov functionals for Markov chains which have properties (i) and (ii) are derived. We describe the most general ordering of the distribution space with respect to which all continuous-time Markov processes are monotonic (the "Markov order"). The solution differs significantly from the ordering given by the inequality of entropy growth. For inference, this approach results in a convex compact set of conditionally "most random" distributions. Comment: 50 pages, 4 figures, postprint version. A more detailed discussion of the various entropy additivity properties and of separation of variables for independent subsystems in the MaxEnt problem has been added in Section 4.2. The bibliography has been extended.
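
    A concrete instance of additivity property (i) is the Rényi entropy: the functional $\sum_i p_i^\alpha$ is not itself additive over independent systems, but its logarithmic (monotonic) transform is (a standard example, included for orientation):

        % For independent distributions p and q the power sum factorizes,
        % so the transformed functional is additive:
        \[
          H_\alpha(p \otimes q)
          \;=\; \frac{1}{1-\alpha} \log \sum_{i,j} (p_i\, q_j)^\alpha
          \;=\; H_\alpha(p) + H_\alpha(q), \qquad \alpha \ne 1 .
        \]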