2,007 research outputs found

    Online Supplement to “Efficient Simulation Resource Sharing and Allocation for Selecting the Best”

    This is the online supplement to the article by the same authors, "Efficient Simulation Resource Sharing and Allocation for Selecting the Best," published in the IEEE Transactions on Automatic Control.

    Online Appendix for “Gradient-Based Myopic Allocation Policy: An Efficient Sampling Procedure in a Low-Confidence Scenario”

    This is the online appendix, which contains theoretical and numerical supplements: technical details and three additional numerical examples that could not fit in the main body because of the journal's page limits for a technical note. The abstract for the main body is as follows: In this note, we study a simulation optimization problem of selecting the alternative with the best performance from a finite set, or a so-called ranking and selection problem, in a special low-confidence scenario. The most popular sampling allocation procedures in ranking and selection do not perform well in this scenario, because they all ignore certain induced correlations that significantly affect the probability of correct selection in this scenario. We propose a gradient-based myopic allocation policy (G-MAP) that takes the induced correlations into account, reflecting a trade-off between the induced correlation and the two factors (mean-variance) found in the optimal computing budget allocation formula. Numerical experiments substantiate the efficiency of the new procedure in the low-confidence scenario. This work was supported in part by the National Science Foundation (NSF) under Grants CMMI-0856256, CMMI-1362303, and CMMI-1434419, by the National Natural Science Foundation of China (NSFC) under Grant 71571048, by the Air Force Office of Scientific Research (AFOSR) under Grant FA9550-15-10050, and by the Science and Technology Agency of Sichuan Province under Grant 2014GZX0002.

    Maximum Entropy Heterogeneous-Agent Mirror Learning

    Multi-agent reinforcement learning (MARL) has been shown to be effective for cooperative games in recent years. However, existing state-of-the-art methods face challenges related to sample inefficiency, brittleness regarding hyperparameters, and the risk of converging to a suboptimal Nash Equilibrium. To resolve these issues, in this paper, we propose a novel theoretical framework, named Maximum Entropy Heterogeneous-Agent Mirror Learning (MEHAML), that leverages the maximum entropy principle to design maximum entropy MARL actor-critic algorithms. We prove that algorithms derived from the MEHAML framework enjoy the desired properties of monotonic improvement of the joint maximum entropy objective and convergence to quantal response equilibrium (QRE). The practicality of MEHAML is demonstrated by developing a MEHAML extension of the widely used RL algorithm, HASAC (for soft actor-critic), which shows significant improvements in exploration and robustness on three challenging benchmarks: Multi-Agent MuJoCo, StarCraftII, and Google Research Football. Our results show that HASAC outperforms strong baseline methods such as HATD3, HAPPO, QMIX, and MAPPO, thereby establishing the new state of the art. See our project page at https://sites.google.com/view/mehaml

    Application of Perturbation Analysis to the Design and Analysis of Control Charts

    The design of control charts in statistical quality control addresses the optimal selection of design parameters such as the sampling frequency and the control limits, and includes sensitivity analysis with respect to system parameters such as the various process parameters and the economic costs of sampling. The advent of more complicated control chart schemes has necessitated the use of Monte Carlo simulation in the design process, particularly in the evaluation of performance measures such as average run length. In this paper, we apply perturbation analysis to derive gradient estimators that can be used in gradient-based optimization algorithms and in sensitivity analysis when Monte Carlo simulation is employed. We illustrate the technique on a simple Shewhart control chart and on a more complicated control chart that includes the exponentially weighted moving average control chart as a special case.

    Sensitivity Analysis for Monte Carlo Simulation of Option Pricing

    Corrections to published article; additional tables for numerical results. Monte Carlo simulation is one alternative for analyzing options markets when the assumptions of simpler analytical models are violated. We introduce techniques for the sensitivity analysis of option pricing that can be carried out efficiently within the simulation. In particular, using these techniques, a single run of the simulation would often provide not only an estimate of the option value but also estimates of the sensitivities of the option value to various parameters of the model. Both European and American options are considered, starting with simple analytically tractable models to present the idea and proceeding to more complicated examples. We then propose an approach for the pricing of options with early exercise features by incorporating the gradient estimates in an iterative stochastic approximation algorithm. The procedure is illustrated in a simple example estimating the option value of an American call. Numerical results indicate that the additional computational effort required over that required to estimate a European option is relatively small.

    A New Insight into the Role of CART in Cocaine Reward: Involvement of CaMKII and Inhibitory G-Protein Coupled Receptor Signaling

    Cocaine- and amphetamine-regulated transcript (CART) peptides are neuropeptides that are expressed in brain regions associated with reward, such as the nucleus accumbens (NAc), and play a role in cocaine reward. Injection of CART into the NAc can inhibit the behavioral effects of cocaine, and injecting CART into the ventral tegmental area (VTA) reduces cocaine-seeking behavior. However, the exact mechanism of these effects is not clear. Recent research has demonstrated that Ca2+/calmodulin-dependent protein kinase II (CaMKII) and inhibitory G-protein coupled receptor (GPCR) signaling are involved in the mechanism of the effect of CART on cocaine reward. Hence, we review the role of CaMKII and inhibitory GPCR signaling in the effect of CART on cocaine reward and provide a new insight into the mechanism of that effect. In this article, we first review the biological function of CART and discuss the role of CART in cocaine reward. Then, we focus on the role of CaMKII and inhibitory GPCR signaling in cocaine reward. Furthermore, we discuss how CaMKII and inhibitory GPCR signaling are involved in the mechanistic action of CART in cocaine reward. Finally, we give our opinions on future directions of research on the role of CaMKII and inhibitory GPCR signaling in the effect of CART on cocaine reward.