
State-Augmentation Transformations for Risk-Sensitive Reinforcement Learning

Authors: Ma Shuai, Yu Jia Yuan
Publication date: 29/11/2018

In the MDP framework, the general reward function takes three arguments (current state, action, and successor state), but it is often simplified to a function of two arguments (current state and action). The former is called a transition-based reward function, whereas the latter is called a state-based reward function. When the objective involves only the expected cumulative reward, this simplification works perfectly. However, when the objective is risk-sensitive, it leads to an incorrect value. We present state-augmentation transformations (SATs), which preserve the reward sequences as well as the reward distributions and the optimal policy in risk-sensitive reinforcement learning. First, we prove that, in risk-sensitive scenarios, for every MDP with a stochastic transition-based reward function, there exists an MDP with a deterministic state-based reward function such that, for any given (randomized) policy for the first MDP, there exists a corresponding policy for the second MDP under which both Markov reward processes share the same reward sequence. Second, using an inventory control problem, we illustrate two situations that require the proposed SATs. One is applying Q-learning (or other learning methods) to MDPs with transition-based reward functions; the other is applying methods designed for Markov processes with deterministic state-based reward functions to Markov processes with general reward functions. We show the advantage of the SATs by considering Value-at-Risk as an example, which is a risk measure on the reward distribution itself rather than on summary measures (such as mean and variance) of the distribution. We illustrate the error in the reward distribution estimation that results from the direct use of Q-learning, and show how the SATs enable a variance formula to work on Markov processes with general reward functions.
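The abstract's central claim (that the state-based simplification keeps the mean but distorts the reward distribution) can be illustrated with a toy example. The following is a minimal sketch, not the authors' construction: the one-step process, the state names `s0`, `a`, `b`, and all helper functions are hypothetical and chosen only for illustration. It compares the reward distribution under a transition-based reward, under its expected-value simplification, and under a simple augmentation in which the pair (state, successor) becomes the new state.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical one-step process: from "s0" the chain moves to "a" or "b".
P = {"a": Fraction(1, 2), "b": Fraction(1, 2)}        # P(s' | s0)
r_transition = {"a": Fraction(0), "b": Fraction(2)}   # r(s0, s')


def distribution(pairs):
    """Collapse (probability, reward) pairs into a reward -> probability map."""
    dist = defaultdict(Fraction)
    for p, r in pairs:
        dist[r] += p
    return dict(dist)


def mean(dist):
    return sum(p * r for r, p in dist.items())


def variance(dist):
    m = mean(dist)
    return sum(p * (r - m) ** 2 for r, p in dist.items())


# Reward distribution under the transition-based reward.
true_dist = distribution((P[s2], r_transition[s2]) for s2 in P)

# Simplification: replace r(s0, s') by its expectation over s'.
r_state = sum(P[s2] * r_transition[s2] for s2 in P)
simplified_dist = distribution((P[s2], r_state) for s2 in P)

# Augmentation (illustrative): the new state is the pair (s0, s'),
# and the reward is a deterministic function of that augmented state.
aug_P = {("s0", s2): P[s2] for s2 in P}
aug_r = {("s0", s2): r_transition[s2] for s2 in P}
augmented_dist = distribution((aug_P[x], aug_r[x]) for x in aug_P)

print("mean:    ", mean(true_dist), mean(simplified_dist), mean(augmented_dist))
print("variance:", variance(true_dist), variance(simplified_dist), variance(augmented_dist))
```

In this sketch the three means agree, but the simplified reward has zero variance, whereas the augmented process reproduces the original distribution exactly. Any risk measure that depends on the distribution beyond its mean would therefore be misestimated under the simplification.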

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Triple Derivations and Triple Homomorphisms of Perfect Lie Superalgebras

Authors: Chen Liangyun, Ma Yao, Zhou Jia
Publication venue: 'Elsevier BV'
Publication date: 06/12/2016

In this paper, we study triple derivations and triple homomorphisms of perfect Lie superalgebras over a commutative ring $R$. It is proved that, if the base ring contains $\frac{1}{2}$ and $L$ is a perfect Lie superalgebra with zero center, then every triple derivation of $L$ is a derivation, and every triple derivation of the derivation algebra $\mathrm{Der}(L)$ is an inner derivation. For Lie superalgebras $L$ and $L'$ over a commutative ring $R$, the notion of a triple homomorphism from $L$ to $L'$ is introduced. We prove that, under certain assumptions, homomorphisms, anti-homomorphisms, and sums of homomorphisms and anti-homomorphisms are all triple homomorphisms. Comment: 12 pages in Indagationes Mathematicae, 201

arXiv.org e-Print Archive

CiteSeerX

Exclusive Decay of $1^{--}$ Quarkonia and $B_c$ Meson into a Lepton Pair Combined with Two Pions

Authors: Ma J. P., Xu Jia-Sheng
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 13/05/2002

We study the exclusive decay of $J/\Psi$, $\Upsilon$ and $B_c$ into a lepton pair combined with two pions in two kinematic regions. One is specified by the two pions having large momenta but a small invariant mass. The other is specified by the two pions having small momenta. In both cases we find that, in the heavy quark limit, the decay amplitude takes a factorized form, in which the nonperturbative effect related to the heavy meson is represented by an NRQCD matrix element. The nonperturbative effects related to the two pions are represented by some universal functions characterizing the conversion of gluons into the pions. Using models for these universal functions and chiral perturbation theory, we are able to obtain numerical predictions for the decay widths. Our numerical results show that the decay of $J/\Psi$ is of order $10^{-5}$ with reasonable cuts and can be observed at BES II and the proposed BES III and CLEO-C. For other decays the branching ratio may be too small to be measured. Comment: 19 pages, LaTeX 2e file, 12 EPS figures (included). Replaced with version to appear in Eur. Phys. J. C, published online: 8 May 200

arXiv.org e-Print Archive

Crossref

Invariants and K-spectrums of local theta lifts

Authors: Loke Hung Yean, Ma Jia-jun
Publication venue: 'Wiley'
Publication date: 05/06/2014

Let $(G,G')$ be a type I irreducible reductive dual pair in $\mathrm{Sp}(W_{\mathbb{R}})$. We assume that $(G,G')$ is in the stable range where $G$ is the smaller member. Let $K$ and $K'$ be maximal compact subgroups of $G$ and $G'$ respectively. Let $\mathfrak{g} = \mathfrak{k} \oplus \mathfrak{p}$ and $\mathfrak{g}' = \mathfrak{k}' \oplus \mathfrak{p}'$ be the complexified Cartan decompositions of the Lie algebras of $G$ and $G'$ respectively. Let $\widetilde{K}$ and $\widetilde{K}'$ be the inverse images of $K$ and $K'$ in the metaplectic double cover $\widetilde{\mathrm{Sp}}(W_\mathbb{R})$ of $\mathrm{Sp}(W_\mathbb{R})$. Let $\rho$ be a genuine irreducible $(\mathfrak{g},\widetilde{K})$-module. Our first main result is that if $\rho$ is unitarizable, then except for one special case, the full local theta lift $\rho' = \Theta(\rho)$ is equal to the local theta lift $\theta(\rho)$. Thus, excluding the special case, the full theta lift $\rho'$ is an irreducible and unitarizable $(\mathfrak{g}',\widetilde{K}')$-module. Our second main result is that the associated variety and the associated cycle of $\rho'$ are the theta lifts of the associated variety and the associated cycle of the contragredient representation $\rho^*$ respectively. Finally, we obtain some interesting $(\mathfrak{g},\widetilde{K})$-modules whose $\widetilde{K}$-spectrums are isomorphic to the spaces of global sections of some vector bundles on some nilpotent $K_\mathbb{C}$-orbits in $\mathfrak{p}^*$.

arXiv.org e-Print Archive

ScholarBank@NUS