3 research outputs found

    Approximate information state based convergence analysis of recurrent Q-learning

    Full text link
    In spite of the large literature on reinforcement learning (RL) algorithms for partially observable Markov decision processes (POMDPs), a complete theoretical understanding is still lacking. In a partially observable setting, the history of data available to the agent increases over time so most practical algorithms either truncate the history to a finite window or compress it using a recurrent neural network leading to an agent state that is non-Markovian. In this paper, it is shown that in spite of the lack of the Markov property, recurrent Q-learning (RQL) converges in the tabular setting. Moreover, it is shown that the quality of the converged limit depends on the quality of the representation which is quantified in terms of what is known as an approximate information state (AIS). Based on this characterization of the approximation error, a variant of RQL with AIS losses is presented. This variant performs better than a strong baseline for RQL that does not use AIS losses. It is demonstrated that there is a strong correlation between the performance of RQL over time and the loss associated with the AIS representation.Comment: 25 pages, 6 figure

    Designing Trustworthy Autonomous Systems

    Get PDF
    The design of autonomous systems is challenging and ensuring their trustworthiness can have different meanings, such as i) ensuring consistency and completeness of the requirements by a correct elicitation and formalization process; ii) ensuring that requirements are correctly mapped to system implementations so that any system behaviors never violate its requirements; iii) maximizing the reuse of available components and subsystems in order to cope with the design complexity; and iv) ensuring correct coordination of the system with its environment.Several techniques have been proposed over the years to cope with specific problems. However, a holistic design framework that, leveraging on existing tools and methodologies, practically helps the analysis and design of autonomous systems is still missing. This thesis explores the problem of building trustworthy autonomous systems from different angles. We have analyzed how current approaches of formal verification can provide assurances: 1) to the requirement corpora itself by formalizing requirements with assume/guarantee contracts to detect incompleteness and conflicts; 2) to the reward function used to then train the system so that the requirements do not get misinterpreted; 3) to the execution of the system by run-time monitoring and enforcing certain invariants; 4) to the coordination of the system with other external entities in a system of system scenario and 5) to system behaviors by automatically synthesize a policy which is correct
    corecore