315 research outputs found

    Model-free reinforcement learning for stochastic parity games

    This paper investigates the use of model-free reinforcement learning to compute the optimal value in two-player stochastic games with parity objectives. In this setting, two decision makers, player Min and player Max, compete on a finite game arena - a stochastic game graph with unknown but fixed probability distributions - to minimize and maximize, respectively, the probability of satisfying a parity objective. We give a reduction from stochastic parity games to a family of stochastic reachability games with a parameter Δ, such that the value of a stochastic parity game equals the limit of the values of the corresponding simple stochastic games as the parameter Δ tends to 0. Since this reduction does not require the knowledge of the probabilistic transition structure of the underlying game arena, model-free reinforcement learning algorithms, such as minimax Q-learning, can be used to approximate the value and mutual best-response strategies for both players in the underlying stochastic parity game. We also present a streamlined reduction from 1½-player parity games to reachability games that avoids recourse to nondeterminism. Finally, we report on the experimental evaluations of both reductions.

    Calculations of giant magnetoresistance in Fe/Cr trilayers using layer potentials determined from ab initio methods

    The ab initio full-potential linearized augmented plane-wave method explicitly designed for the slab geometry was employed to elucidate the physical origin of the layer potentials for the trilayers nFe/3Cr/nFe(001), where n is the number of Fe monolayers. The thickness of the transition-metal ferromagnet was varied from n=1 up to n=8, while the spacer thickness was fixed at 3 monolayers. The calculated potentials were inserted into the Fuchs-Sondheimer formalism in order to calculate the giant magnetoresistance (GMR) ratio. The predicted GMR ratio was compared with experiment, and the oscillatory behavior of the GMR as a function of the ferromagnetic layer thickness was discussed in the context of the layer potentials. The reported results confirm that the interface monolayers play a dominant role in the intrinsic GMR. Comment: 17 pages, 7 figures, 3 tables. Accepted in J. Phys.: Cond. Matter

    A phenomenological approach to the simulation of metabolism and proliferation dynamics of large tumour cell populations

    A major goal of modern computational biology is to simulate the collective behaviour of large cell populations starting from the intricate web of molecular interactions occurring at the microscopic level. In this paper we describe a simplified model of cell metabolism, growth and proliferation, suitable for inclusion in a multicell simulator, now under development (Chignola R and Milotti E 2004 Physica A 338 261-6). Nutrients regulate the proliferation dynamics of tumour cells, which adapt their behaviour to respond to changes in the biochemical composition of the environment. This modeling of nutrient metabolism and cell cycle at a mesoscopic scale leads to a continuous flow of information between the two disparate spatiotemporal scales of molecular and cellular dynamics that can be simulated with modern computers and tested experimentally. Comment: 58 pages, 7 figures, 3 tables, pdf only

    Reward Shaping for Reinforcement Learning with Omega-Regular Objectives

    Recently, good-for-MDPs automata (Büchi automata with a restricted form of nondeterminism) have been successfully exploited for model-free reinforcement learning; this class of automata subsumes good-for-games automata and the most widespread class of limit-deterministic automata. The foundation of using these Büchi automata is that the Büchi condition can, for good-for-MDPs automata, be translated to reachability. The drawback of this translation is that the rewards are, on average, reaped very late, which requires long episodes during the learning process. We devise a new reward shaping approach that overcomes this issue. We show that the resulting model is equivalent to a discounted payoff objective with a biased discount that simplifies and improves on prior work in this direction.

    Environmental contaminants as etiologic factors for diabetes.

    For both type 1 and type 2 diabetes mellitus, the rates have been increasing in the United States and elsewhere; rates vary widely by country, and genetic factors account for less than half of new cases. These observations suggest environmental factors cause both type 1 and type 2 diabetes. Occupational exposures have been associated with increased risk of diabetes. In addition, recent data suggest that toxic substances in the environment, other than infectious agents or exposures that stimulate an immune response, are associated with the occurrence of these diseases. We reviewed the epidemiologic data that addressed whether environmental contaminants might cause type 1 or type 2 diabetes. For type 1 diabetes, higher intake of nitrates, nitrites, and N-nitroso compounds, as well as higher serum levels of polychlorinated biphenyls, have been associated with increased risk. Overall, however, the data were limited or inconsistent. With respect to type 2 diabetes, data on arsenic and 2,3,7,8-tetrachlorodibenzo-p-dioxin relative to risk were suggestive of a direct association but were inconclusive. The occupational data suggested that more data on exposure to N-nitroso compounds, arsenic, dioxins, talc, and straight oil machining fluids in relation to diabetes would be useful. Although environmental factors other than contaminants may account for the majority of type 1 and type 2 diabetes, the etiologic role of several contaminants and occupational exposures deserves further study.

    Optimal Control for Multi-mode Systems with Discrete Costs

    This paper studies optimal time-bounded control in multi-mode systems with discrete costs. Multi-mode systems are an important subclass of linear hybrid systems, in which there are no guards on transitions and all invariants are global. Each state has a continuous cost attached to it, which is linear in the sojourn time, while a discrete cost is attached to each transition taken. We show that an optimal control for this model can be computed in NEXPTIME and approximated in PSPACE. We also show that the one-dimensional case is simpler: although the problem is NP-complete (and in LOGSPACE for an infinite time horizon), we develop an FPTAS for finding an approximate solution. Comment: extended version of a FORMATS 2017 paper

    The Complexity of Nash Equilibria in Stochastic Multiplayer Games

    We analyse the computational complexity of finding Nash equilibria in turn-based stochastic multiplayer games with omega-regular objectives. We show that restricting the search space to equilibria whose payoffs fall into a certain interval may lead to undecidability. In particular, we prove that the following problem is undecidable: Given a game G, does there exist a Nash equilibrium of G where Player 0 wins with probability 1? Moreover, this problem remains undecidable when restricted to pure strategies or (pure) strategies with finite memory. One way to obtain a decidable variant of the problem is to restrict the strategies to be positional or stationary. For the complexity of these two problems, we obtain a common lower bound of NP and upper bounds of NP and PSPACE, respectively. Finally, we single out a special case of the general problem that, in many cases, admits an efficient solution. In particular, we prove that deciding the existence of an equilibrium in which each player either wins or loses with probability 1 can be done in polynomial time for games where the objective of each player is given by a parity condition with a bounded number of priorities.

    Stochastic Context-Free Grammars, Regular Languages, and Newton's Method

    We study the problem of computing the probability that a given stochastic context-free grammar (SCFG), G, generates a string in a given regular language L(D) (given by a DFA, D). This basic problem has a number of applications in statistical natural language processing, and it is also a key necessary step towards quantitative ω-regular model checking of stochastic context-free processes (equivalently, 1-exit recursive Markov chains, or stateless probabilistic pushdown processes). We show that the probability that G generates a string in L(D) can be computed to within arbitrary desired precision in polynomial time (in the standard Turing model of computation), under a rather mild assumption about the SCFG, G, and with no extra assumption about D. We show that this assumption is satisfied for SCFGs whose rule probabilities are learned via the well-known inside-outside (EM) algorithm for maximum-likelihood estimation (a standard method for constructing SCFGs in statistical NLP and biological sequence analysis). Thus, for these SCFGs the algorithm always runs in P-time.