
    Budgeted Reinforcement Learning in Continuous State Space

    A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints. It relies on a notion of risk implemented in the form of a cost signal constrained to lie below an adjustable threshold. So far, BMDPs could only be solved in the case of finite state spaces with known dynamics. This work extends the state of the art to environments with continuous state spaces and unknown dynamics. We show that the solution to a BMDP is a fixed point of a novel Budgeted Bellman Optimality operator. This observation allows us to introduce natural extensions of Deep Reinforcement Learning algorithms to address large-scale BMDPs. We validate our approach on two simulated applications: spoken dialogue and autonomous driving.
    Comment: N. Carrara and E. Leurent contributed equally.
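    For context, a budgeted MDP augments the usual return objective with a cost constraint. A minimal sketch of the constrained objective, in assumed notation (not necessarily the paper's own):

        \max_{\pi} \; \mathbb{E}_{\pi}\Big[\sum_{t \ge 0} \gamma^{t} R(s_t, a_t)\Big]
        \quad \text{s.t.} \quad
        \mathbb{E}_{\pi}\Big[\sum_{t \ge 0} \gamma^{t} C(s_t, a_t)\Big] \le \beta,

    where R is the reward, C is the cost signal, and \beta is the adjustable budget threshold; the budgeted Bellman operator then acts on value functions defined over budget-augmented states (s, \beta).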

    Markov Decision Processes with Applications in Wireless Sensor Networks: A Survey

    Wireless sensor networks (WSNs) consist of autonomous and resource-limited devices. The devices cooperate to monitor one or more physical phenomena within an area of interest. WSNs operate as stochastic systems because of randomness in the monitored environments. For long service time and low maintenance cost, WSNs require adaptive and robust methods to address data exchange, topology formation, resource and power optimization, sensing coverage and object detection, and security challenges. In these problems, sensor nodes must make optimized decisions from a set of accessible strategies to achieve design goals. This survey reviews numerous applications of the Markov decision process (MDP) framework, a powerful decision-making tool for developing adaptive algorithms and protocols for WSNs. Furthermore, various solution methods are discussed and compared to serve as a guide for using MDPs in WSNs.
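    As a reference point for the survey, the standard MDP formalism in textbook notation (not specific to this survey): an MDP is a tuple (S, A, P, R, \gamma), and an optimal policy satisfies the Bellman optimality equation

        V^{*}(s) = \max_{a \in A} \Big[ R(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V^{*}(s') \Big],

    where P(s' \mid s, a) is the transition probability, R(s, a) the immediate reward, and \gamma \in [0, 1) the discount factor.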

    Asymmetric Shocks, Long-term Bonds and Sovereign Default

    We present a sovereign default model with asymmetric shocks and long-term bonds, and solve the model using discrete state dynamic programming. As a result, our model matches the Argentinean economy over the period 1993Q1-2001Q4 quite well. We show that our model can match the high default frequency, high debt/output ratio and other cyclical features, such as the countercyclical interest rate and trade balance, observed in emerging countries. Moreover, with asymmetric shocks we are able to match a high sovereign spread level and low spread volatility simultaneously in one model, a combination that until now has not been well captured. As another contribution, we propose a simulation-based approach to approximate the transition function of output shocks between finite states, an indispensable step in discrete state dynamic programming. Compared to Tauchen’s method, our approach is very flexible in transforming various econometric models into finite-state transition functions, so it can be widely used to simulate different kinds of discrete state shocks.
    Keywords: Sovereign Default; Asymmetric Shocks; Transition Function; Long-term Bonds
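    A minimal sketch of the simulation-based idea: simulate a long path of the shock process, discretize it into bins, and estimate the transition matrix from observed bin-to-bin moves. The AR(1) specification and equal-probability bins below are illustrative assumptions, not the paper's exact econometric model.

        import numpy as np

        def simulated_transition_matrix(rho=0.9, sigma=0.02, n_states=11,
                                        n_periods=500_000, seed=0):
            """Estimate a finite-state transition matrix by simulating a
            shock process and counting bin-to-bin transitions."""
            rng = np.random.default_rng(seed)
            # Simulate a long path of an (assumed) AR(1) output shock.
            y = np.zeros(n_periods)
            for t in range(1, n_periods):
                y[t] = rho * y[t - 1] + sigma * rng.standard_normal()
            # Discretize into equal-probability bins via empirical quantiles.
            edges = np.quantile(y, np.linspace(0.0, 1.0, n_states + 1))
            states = np.clip(np.searchsorted(edges, y, side="right") - 1,
                             0, n_states - 1)
            # Count observed transitions and normalize each row.
            counts = np.zeros((n_states, n_states))
            np.add.at(counts, (states[:-1], states[1:]), 1)
            return counts / counts.sum(axis=1, keepdims=True)

    Unlike Tauchen-style quadrature, nothing here depends on the shock process being AR(1); any process that can be simulated can be discretized the same way.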

    Optimal Interdiction of Unreactive Markovian Evaders

    The interdiction problem arises in a variety of areas including military logistics, infectious disease control, and counter-terrorism. In the typical formulation of network interdiction, the task of the interdictor is to find a set of edges in a weighted network such that the removal of those edges would maximally increase the cost to an evader of traveling on a path through the network. Our work is motivated by cases in which the evader has incomplete information about the network or lacks planning time or computational power; e.g., when authorities set up roadblocks to catch bank robbers, the criminals do not know all the roadblock locations or the best path to use for their escape. We introduce a model of network interdiction in which the motion of one or more evaders is described by Markov processes and the evaders are assumed not to react to interdiction decisions. The interdiction objective is to find an edge set of size B that maximizes the probability of capturing the evaders. We prove that, like the standard least-cost formulation for deterministic motion, this interdiction problem is NP-hard. But unlike that problem, our interdiction problem is submodular, so the optimal solution can be approximated within a factor of 1-1/e using a greedy algorithm. Additionally, we exploit submodularity through a priority evaluation strategy that eliminates the linear complexity scaling in the number of network edges and speeds up the solution by orders of magnitude. Taken together, these results bring closer the goal of finding realistic solutions to the interdiction problem on global-scale networks.
    Comment: Accepted at the Sixth International Conference on Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems (CPAIOR 2009).
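    A minimal sketch of greedy selection with lazy (priority) re-evaluation of marginal gains, which is valid because the objective is submodular; the capture-probability objective is abstracted into a caller-supplied function and all names are illustrative:

        import heapq

        def lazy_greedy(edges, objective, budget):
            """Select `budget` edges maximizing a submodular set function
            `objective`, re-scoring marginal gains lazily via a max-heap."""
            chosen = []
            value = objective(chosen)
            # Heap entries: (-marginal_gain, tiebreak index, round evaluated, edge).
            heap = [(-(objective([e]) - value), i, 0, e)
                    for i, e in enumerate(edges)]
            heapq.heapify(heap)
            for rnd in range(1, budget + 1):
                while True:
                    neg_gain, i, when, e = heapq.heappop(heap)
                    if when == rnd:
                        # Gain is up to date: greedily take this edge.
                        chosen.append(e)
                        value += -neg_gain
                        break
                    # Stale entry: recompute its marginal gain and push back.
                    # Submodularity guarantees gains only shrink, so an entry
                    # still on top after re-scoring is truly the current best.
                    gain = objective(chosen + [e]) - value
                    heapq.heappush(heap, (-gain, i, rnd, e))
            return chosen

    Because most edges' gains never need re-scoring in a given round, this avoids evaluating the objective on every edge every iteration, which is the speedup the priority evaluation strategy refers to.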

    Modelling the Effect of Policy Reform on Structural Change in Irish Farming

    End of project report. The Mid Term Review (MTR) of the Common Agricultural Policy (CAP) has allowed for the decoupling of all direct payments from production from 2005 onwards; until then, most direct payments were coupled to production, requiring farmers to produce specific products in order to claim support. After decoupling, farmers will receive a payment regardless of production as long as their farmland is maintained in accordance with good agricultural practices. Direct payments to farmers have been an integral part of the CAP since the 1992 Mac Sharry reforms. Throughout the 1990s, market prices for farm produce declined, generally in line with policy, while costs of production continued to increase. Meanwhile, direct payments increased in value, deepening farmers’ reliance on this source of income. Furthermore, farmers adapted farming practices to maximise their receipt of direct payments, leading to a culture of ‘farming the subsidy’. By 1997, 100 per cent of family farm income on cattle and tillage farms in Ireland was derived from direct payments, meaning that on average the market-based revenue was insufficient to cover total costs.

    Risk Aversion in Finite Markov Decision Processes Using Total Cost Criteria and Average Value at Risk

    In this paper we present an algorithm to compute risk-averse policies in Markov decision processes (MDPs) when the total cost criterion is used together with the average value at risk (AVaR) metric. Risk-averse policies are needed when large deviations from the expected behavior may have detrimental effects, and conventional MDP algorithms usually ignore this aspect. We provide conditions on the structure of the underlying MDP ensuring that approximations for the exact problem can be derived and solved efficiently. Our findings are novel inasmuch as average value at risk has not previously been considered in association with the total cost criterion. Our method is demonstrated in a rapid deployment scenario, in which a robot must reach a target location within a temporal deadline and increased speed is associated with an increased probability of failure. We demonstrate that the proposed algorithm not only produces a risk-averse policy reducing the probability of exceeding the expected temporal deadline, but also provides the statistical distribution of costs, thus offering a valuable analysis tool.
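    For concreteness, under one common convention the average value at risk of a cost at level alpha is the expected cost over the worst alpha-fraction of outcomes. A minimal empirical sketch on sampled costs (illustrative, not the paper's implementation):

        import numpy as np

        def avar(costs, alpha=0.1):
            """Empirical average value at risk: the mean of the worst
            alpha-fraction of sampled costs (a tail expectation)."""
            costs = np.sort(np.asarray(costs, dtype=float))
            k = max(1, int(np.ceil(alpha * len(costs))))
            return costs[-k:].mean()

        # Example: mean of the worst 5% of 10,000 sampled trajectory costs.
        # avar(np.random.default_rng(0).exponential(1.0, 10_000), alpha=0.05)

    Because it summarizes a tail of the cost distribution rather than its mean, minimizing AVaR penalizes exactly the large deviations the abstract is concerned with.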