
    Actor-critic algorithms

    Abstract. In this article, we propose and analyze a class of actor-critic algorithms. These are two-time-scale algorithms in which the critic uses temporal difference learning with a linearly parameterized approximation architecture, and the actor is updated in an approximate gradient direction, based on information provided by the critic. We show that the features for the critic should ideally span a subspace prescribed by the choice of parameterization of the actor. We study actor-critic algorithms for Markov decision processes with Polish state and action spaces. We state and prove two results regarding their convergence.
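
    The two-time-scale structure described above can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the abstract. It uses a hypothetical two-state, two-action toy MDP rather than Polish spaces, and a softmax (Boltzmann) actor. The critic uses average-reward TD(0) rather than a general TD(lambda), and the step sizes are constant, with beta much smaller than alpha. All names (STEP, psi, features, actor_critic) are hypothetical. The span condition is read as requiring the critic's features to include the actor's score grad_theta log pi(a|s), plus a one-hot state baseline. The actor then moves along Q_w(s, a) * psi(s, a). This is an illustration, not the authors' algorithm.

```python
import math
import random

# Hypothetical toy MDP (not from the paper): STEP[s][a] = (reward, next_state).
STEP = {0: {0: (0.0, 0), 1: (1.0, 1)},
        1: {0: (2.0, 0), 1: (0.0, 1)}}
N_S, N_A = 2, 2


def policy(theta, s):
    """Softmax actor: pi(a|s) is proportional to exp(theta[s][a])."""
    m = max(theta[s])
    exps = [math.exp(t - m) for t in theta[s]]
    z = sum(exps)
    return [e / z for e in exps]


def psi(theta, s, a):
    """Actor score grad_theta log pi(a|s), flattened to length N_S * N_A."""
    pi = policy(theta, s)
    g = [0.0] * (N_S * N_A)
    for b in range(N_A):
        g[s * N_A + b] = (1.0 if b == a else 0.0) - pi[b]
    return g


def features(theta, s, a):
    """Critic features: psi (so their span contains psi) plus a one-hot state baseline."""
    return psi(theta, s, a) + [1.0 if i == s else 0.0 for i in range(N_S)]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def actor_critic(steps=30000, alpha=0.05, beta=0.005, seed=0):
    rng = random.Random(seed)
    theta = [[0.0] * N_A for _ in range(N_S)]
    w = [0.0] * (N_S * N_A + N_S)   # linear critic weights
    rho = 0.0                        # running average-reward estimate
    s = 0
    a = rng.choices(range(N_A), weights=policy(theta, s))[0]
    for _ in range(steps):
        r, s_next = STEP[s][a]
        a_next = rng.choices(range(N_A), weights=policy(theta, s_next))[0]
        phi = features(theta, s, a)
        q = dot(w, phi)
        # Average-reward TD(0) error for Q_w(s, a) = w . phi(s, a).
        delta = r - rho + dot(w, features(theta, s_next, a_next)) - q
        rho += alpha * (r - rho)
        # Fast time scale: critic update.
        w = [wi + alpha * delta * fi for wi, fi in zip(w, phi)]
        # Slow time scale (beta << alpha): actor step along Q_w(s, a) * psi(s, a).
        g = psi(theta, s, a)
        for b in range(N_A):
            theta[s][b] += beta * q * g[s * N_A + b]
        s, a = s_next, a_next
    return theta, rho


theta, rho = actor_critic()
print(policy(theta, 0), policy(theta, 1), rho)
```

    In this toy MDP, the score features together with the state baseline span every Q-table. The critic is therefore effectively exact, up to an additive constant that the actor's update ignores in expectation. The policy should drift toward taking action 1 in state 0 and action 0 in state 1, which yields an average reward of 1.5.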

    On De Finetti Coherence and Kolmogorov Probability

    This article addresses the problem of existence of a countably additive probability measure in the sense of Kolmogorov that is consistent with a probability assignment to a family of sets which is coherent in the sense of De Finetti. Key words: probability assignment, coherence condition, subjective probability, countably additive probability. This work was done while visiting the Laboratory for Information and Decision Systems, Massachusetts Institute of Technology. This research was supported by Grant No. III 5(12)/96-ET from the Department of Science and Technology, Government of India, and by the U.S. Army Research Office under the MURI Grant: Data Fusion in Large Arrays of Microsensors DAAD19-00-1-0466.

    Language support for multi agent reinforcement learning

    Software Engineering must increasingly address the issues of complexity and uncertainty that arise when systems are to be deployed into a dynamic software ecosystem. There is also interest in using digital twins of systems in order to design, adapt and control them when faced with such issues. The use of multi-agent systems in combination with reinforcement learning is an approach that will allow software to intelligently adapt to respond to changes in the environment. This paper proposes a language extension that encapsulates learning-based agents and system building operations and shows how it is implemented in ESL. The paper includes examples of the key features and describes the application of agent-based learning implemented in ESL to a real-world supply chain.