427 research outputs found

    The exponential cost optimality for finite horizon semi-Markov decision processes

    This paper considers an exponential cost optimality problem for finite horizon semi-Markov decision processes (SMDPs). The objective is to calculate an optimal policy with minimal exponential costs over the full set of policies in a finite horizon. First, under the standard regular and compact-continuity conditions, we establish the optimality equation, prove that the value function is its unique solution, and prove the existence of an optimal policy by using the minimum nonnegative solution approach. Second, we establish a new value iteration algorithm to calculate both the value function and the ϵ-optimal policy. Finally, we give a computable machine maintenance system to illustrate the convergence of the algorithm.
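The abstract above does not spell out its value iteration algorithm, so the following is only a minimal sketch of the general idea under a much simpler model: a discrete-time, finite-horizon MDP with an exponential (risk-sensitive) cost criterion, solved by backward induction. It is not the paper's SMDP algorithm. The function name, the `risk` parameter, and the two-state maintenance numbers used in the usage note are hypothetical.

```python
import math


def exponential_cost_backward_induction(P, cost, horizon, risk=1.0):
    """Backward induction for a discrete-time MDP with exponential cost.

    Assumption (not from the paper): time is discrete and the criterion is
    E[exp(risk * sum of one-step costs)] over `horizon` steps.

    P[a][s][s2] : probability of moving from s to s2 under action a
    cost[s][a]  : one-step cost of taking action a in state s
    Returns (V, policy) with V[t][s] the minimal expected exponential cost
    from step t in state s, and policy[t][s] a minimizing action.
    """
    n_states = len(cost)
    n_actions = len(cost[0])
    # Terminal condition: no cost remains, so exp(0) = 1.
    V = [[1.0] * n_states for _ in range(horizon + 1)]
    policy = [[0] * n_states for _ in range(horizon)]
    for t in range(horizon - 1, -1, -1):
        for s in range(n_states):
            best_a, best_v = 0, math.inf
            for a in range(n_actions):
                # Multiplicative Bellman update for the exponential criterion.
                cont = sum(P[a][s][s2] * V[t + 1][s2] for s2 in range(n_states))
                q = math.exp(risk * cost[s][a]) * cont
                if q < best_v:
                    best_a, best_v = a, q
            V[t][s] = best_v
            policy[t][s] = best_a
    return V, policy
```

As a usage illustration with made-up numbers, take states 0 (working) and 1 (broken) and actions 0 (run) and 1 (repair), with `P = [[[0.9, 0.1], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]]]` and `cost = [[0.0, 1.0], [2.0, 1.0]]`. Calling `exponential_cost_backward_induction(P, cost, horizon=1)` selects "run" in the working state and "repair" in the broken state.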

    Value Functions are Control Barrier Functions: Verification of Safe Policies using Control Theory

    Guaranteeing safe behaviour of reinforcement learning (RL) policies poses significant challenges for safety-critical applications, despite RL's generality and scalability. To address this, we propose a new approach to apply verification methods from control theory to learned value functions. By analyzing task structures for safety preservation, we formalize original theorems that establish links between value functions and control barrier functions. Further, we propose novel metrics for verifying value functions in safe control tasks and practical implementation details to improve learning. Our work presents a novel method for certificate learning, which unlocks a diversity of verification techniques from control theory for RL policies, and marks a significant step towards a formal framework for the general, scalable, and verifiable design of RL-based control systems.

    Analysis and Design of Vehicle Platooning Operations on Mixed-Traffic Highways

    Platooning of connected and autonomous vehicles (CAVs) has a significant potential for throughput improvement. However, the interaction between CAVs and non-CAVs may limit the practically attainable improvement due to platooning. To better understand and address this limitation, we introduce a new fluid model of mixed-autonomy traffic flow and use this model to analyze and design platoon coordination strategies. We propose a tandem-link fluid model that considers randomly arriving platoons sharing highway capacity with non-CAVs. We derive verifiable conditions for stability of the fluid model by analyzing an underlying M/D/1 queuing process and establishing a Foster-Lyapunov drift condition for the fluid model. These stability conditions enable a quantitative analysis of highway throughput under various scenarios. The model is useful for designing platoon coordination strategies that maximize throughput and minimize delay. Such coordination strategies are provably optimal in the fluid model and are practically relevant. We also validate our results using standard macroscopic (cell transmission model, CTM) and microscopic (Simulation for Urban Mobility, SUMO) simulation models.
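The abstract does not state the paper's stability conditions for the tandem-link fluid model, so the sketch below only illustrates the textbook single-server M/D/1 queue that it mentions: Poisson arrivals at rate λ, a deterministic service time D, stability when ρ = λD < 1, and the Pollaczek-Khinchine mean wait for zero service-time variance. The function name and return keys are hypothetical.

```python
def md1_metrics(arrival_rate, service_time):
    """Steady-state metrics of a textbook M/D/1 queue.

    Assumption (not from the paper): a single server, Poisson arrivals at
    `arrival_rate`, and a fixed `service_time` per customer.
    """
    rho = arrival_rate * service_time  # server utilization
    if rho >= 1.0:
        # The queue has no steady state when arrivals meet or exceed capacity.
        raise ValueError("unstable queue: utilization must be below 1")
    # Pollaczek-Khinchine mean waiting time with zero service-time variance.
    mean_wait = rho * service_time / (2.0 * (1.0 - rho))
    # Little's law gives the mean number waiting in the queue.
    mean_queue_length = arrival_rate * mean_wait
    return {
        "utilization": rho,
        "mean_wait": mean_wait,
        "mean_queue_length": mean_queue_length,
    }
```

For example, `md1_metrics(0.5, 1.0)` gives a utilization of 0.5, a mean wait of 0.5 time units, and a mean queue length of 0.25.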