Budgeted Reinforcement Learning in Continuous State Space
A Budgeted Markov Decision Process (BMDP) is an extension of a Markov
Decision Process to critical applications requiring safety constraints. It
relies on a notion of risk implemented in the shape of a cost signal
constrained to lie below an adjustable threshold. So far, BMDPs could only
be solved in the case of finite state spaces with known dynamics. This work
extends the state of the art to continuous-space environments and unknown
dynamics. We show that the solution to a BMDP is a fixed point of a novel
Budgeted Bellman Optimality operator. This observation allows us to introduce
natural extensions of Deep Reinforcement Learning algorithms to address
large-scale BMDPs. We validate our approach on two simulated applications:
spoken dialogue and autonomous driving.
Comment: N. Carrara and E. Leurent contributed equally.
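As an illustration of the constrained setting the abstract describes (not the paper's method): if one discretizes both the state space and the remaining risk budget, a budgeted variant of value iteration can be sketched in which an action is admissible only while its cost fits the remaining budget. The grid-based budget augmentation below is a simplifying assumption; the paper's Budgeted Bellman Optimality operator handles continuous budgets.

```python
import numpy as np

def budgeted_value_iteration(P, R, C, budgets, gamma=0.9, n_iter=200):
    """Toy budgeted value iteration on a finite MDP with the state
    augmented by a discretized remaining budget (a simplification of
    the continuous-budget setting in the paper).
    P[s, a, s'] : transition probabilities
    R[s, a]     : reward signal
    C[s, a]     : cost signal charged against the budget
    budgets     : sorted array of discrete budget levels
    """
    nS, nA, _ = P.shape
    nB = len(budgets)
    V = np.zeros((nS, nB))
    for _ in range(n_iter):
        V_new = np.full((nS, nB), -1e9)  # -1e9 marks infeasible cells
        for s in range(nS):
            for b, beta in enumerate(budgets):
                vals = []
                for a in range(nA):
                    if C[s, a] > beta:
                        continue  # action's cost exceeds remaining budget
                    # snap the remaining budget down to the grid
                    nb = np.searchsorted(budgets, beta - C[s, a],
                                         side="right") - 1
                    vals.append(R[s, a] + gamma * P[s, a] @ V[:, nb])
                if vals:
                    V_new[s, b] = max(vals)
        V = V_new
    return V
```

With a zero-cost "safe" action always available, the zero-budget slice of the value function reduces to the value of the safest policy, while larger budgets admit costlier, higher-reward actions.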
Markov Decision Processes with Applications in Wireless Sensor Networks: A Survey
Wireless sensor networks (WSNs) consist of autonomous and resource-limited
devices. The devices cooperate to monitor one or more physical phenomena within
an area of interest. WSNs operate as stochastic systems because of randomness
in the monitored environments. For long service time and low maintenance cost,
WSNs require adaptive and robust methods to address data exchange, topology
formulation, resource and power optimization, sensing coverage and object
detection, and security challenges. In these problems, sensor nodes are to make
optimized decisions from a set of accessible strategies to achieve design
goals. This survey reviews numerous applications of the Markov decision process
(MDP) framework, a powerful decision-making tool to develop adaptive algorithms
and protocols for WSNs. Furthermore, various solution methods are discussed and
compared to serve as a guide for using MDPs in WSNs.
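The baseline solution method against which the surveyed approaches are typically compared is value iteration on a finite MDP. A minimal sketch (the sensor-node interpretation in the test below, with sleep/transmit actions, is a hypothetical toy instance, not taken from the survey):

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Standard value iteration for a finite MDP.
    P[s, a, s'] : transition probabilities
    R[s, a]     : expected one-step rewards
    Returns the optimal value function and a greedy policy."""
    nS, nA, _ = P.shape
    V = np.zeros(nS)
    while True:
        Q = R + gamma * P @ V          # Q[s, a]; P @ V sums over s'
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new
```

For WSN-scale problems the survey also discusses approximate methods, since exact value iteration scales poorly with the number of nodes.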
Asymmetric Shocks, Long-term Bonds and Sovereign Default
We present a sovereign default model with asymmetric shocks and long-term bonds, and solve the model using discrete state dynamic programming. As a result, our model matches the Argentinean economy over the period 1993Q1-2001Q4 quite well. We show that our model can match the high default frequency, high debt/output ratio and other cyclical features, such as the countercyclical interest rate and trade balance, observed in emerging countries. Moreover, with asymmetric shocks we are able to match a high sovereign spread level and low spread volatility simultaneously in one model, which until now has not been well achieved. As another contribution of our paper, we propose a simulation-based approach to approximate the transition function of output shocks between finite states, which is an indispensable step in discrete state dynamic programming. Compared to Tauchen's method, our approach is very flexible in transforming various econometric models into finite state transition functions, so it can be widely used to simulate different kinds of discrete state shocks.
Keywords: Sovereign Default; Asymmetric Shocks; Transition Function; Long-term Bonds
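A simulation-based discretization of the kind the abstract describes can be sketched as follows; the AR(1) shock process and the quantile binning are illustrative assumptions, not the paper's exact specification. The point of flexibility over Tauchen's method is that any process one can simulate can be discretized the same way.

```python
import numpy as np

def simulated_transition_matrix(rho=0.9, sigma=0.02, n_states=5,
                                n_periods=200_000, seed=0):
    """Approximate a finite-state transition function by simulation:
    draw a long path of the shock process, discretize it into quantile
    bins, and count bin-to-bin transitions. The AR(1) specification
    y_t = rho * y_{t-1} + eps_t is a stand-in; any simulable
    econometric model works identically."""
    rng = np.random.default_rng(seed)
    y = np.zeros(n_periods)
    for t in range(1, n_periods):
        y[t] = rho * y[t - 1] + rng.normal(0.0, sigma)
    # interior quantiles of the simulated path serve as bin edges
    edges = np.quantile(y, np.linspace(0, 1, n_states + 1)[1:-1])
    idx = np.digitize(y, edges)
    # count transitions between consecutive bins, then row-normalize
    T = np.zeros((n_states, n_states))
    for a, b in zip(idx[:-1], idx[1:]):
        T[a, b] += 1
    T /= T.sum(axis=1, keepdims=True)
    return T
```

For a persistent process (rho close to 1) the estimated matrix is diagonally dominant, as expected: the shock tends to stay in its current bin.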
Optimal Interdiction of Unreactive Markovian Evaders
The interdiction problem arises in a variety of areas including military
logistics, infectious disease control, and counter-terrorism. In the typical
formulation of network interdiction, the task of the interdictor is to find a
set of edges in a weighted network such that the removal of those edges would
maximally increase the cost to an evader of traveling on a path through the
network.
Our work is motivated by cases in which the evader has incomplete information
about the network or lacks planning time or computational power, e.g. when
authorities set up roadblocks to catch bank robbers, the criminals do not know
all the roadblock locations or the best path to use for their escape.
We introduce a model of network interdiction in which the motion of one or
more evaders is described by Markov processes and the evaders are assumed not
to react to interdiction decisions. The interdiction objective is to find an
edge set of size B that maximizes the probability of capturing the evaders.
We prove that, like the standard least-cost formulation for deterministic
motion, this interdiction problem is NP-hard. Unlike that problem, however,
ours is submodular, and the optimal solution can be approximated within a
factor of 1-1/e using a greedy algorithm. Additionally, we exploit
submodularity through a priority evaluation strategy that eliminates the linear
complexity scaling in the number of network edges and speeds up the solution by
orders of magnitude. Taken together, the results bring closer the goal of
finding realistic solutions to the interdiction problem on global-scale
networks.
Comment: Accepted at the Sixth International Conference on Integration of AI
and OR Techniques in Constraint Programming for Combinatorial Optimization
Problems (CPAIOR 2009).
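The greedy algorithm with the 1-1/e guarantee can be sketched on a toy stand-in for the paper's objective: represent the evader's Markovian motion by sampled paths, and let f(S) be the fraction of paths that cross an interdicted edge in S, which is a monotone submodular coverage function. The path-sampling representation is a simplifying assumption, not the paper's exact formulation.

```python
def greedy_interdiction(paths, edges, B):
    """Greedy (1 - 1/e)-approximation for maximizing a monotone
    submodular capture objective under a cardinality budget B.
    paths : sampled evader trajectories, each a sequence of edge labels
    edges : candidate edges available for interdiction
    """
    def capture_prob(S):
        # fraction of sampled paths that hit at least one edge in S
        return sum(1 for p in paths if S & set(p)) / len(paths)

    chosen = set()
    for _ in range(B):
        # pick the edge with the largest marginal gain in capture prob.
        best = max((e for e in edges if e not in chosen),
                   key=lambda e: capture_prob(chosen | {e}),
                   default=None)
        if best is None:
            break
        chosen.add(best)
    return chosen, capture_prob(chosen)
```

The priority-evaluation speedup mentioned in the abstract (lazy evaluation of marginal gains) would replace the inner `max` with a priority queue; it is omitted here for brevity.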
Modelling the Effect of Policy Reform on Structural Change in Irish Farming
End of project report.
The Mid Term Review (MTR) of the Common Agricultural Policy (CAP) has allowed for the decoupling of all direct payments from production from 2005 onwards; until then, most direct payments were coupled to production, requiring farmers to produce specific products in order to claim support. After decoupling, farmers will receive a payment regardless of production, as long as their farm land is maintained in accordance with good agricultural practice. Direct payments to farmers have been an integral part of the CAP since the 1992 Mac Sharry reforms. Throughout the 1990s, market prices for farm produce generally declined in line with policy, while costs of production continued to increase. Meanwhile, direct payments increased in value, increasing farmers' reliance on this source of income. Furthermore, farmers adapted their farming practices to maximise their receipt of direct payments, leading to a culture of 'farming the subsidy'. By 1997, 100 per cent of family farm income on cattle and tillage farms in Ireland was derived from direct payments, meaning that on average market-based revenue was insufficient to cover total costs.
Risk Aversion in Finite Markov Decision Processes Using Total Cost Criteria and Average Value at Risk
In this paper we present an algorithm to compute risk averse policies in
Markov Decision Processes (MDP) when the total cost criterion is used together
with the average value at risk (AVaR) metric. Risk averse policies are needed
when large deviations from the expected behavior may have detrimental effects,
and conventional MDP algorithms usually ignore this aspect. We provide
conditions for the structure of the underlying MDP ensuring that approximations
for the exact problem can be derived and solved efficiently. Our findings are
novel inasmuch as average value at risk has not previously been considered in
association with the total cost criterion. Our method is demonstrated in a
rapid deployment scenario, whereby a robot is tasked with the objective of
reaching a target location within a temporal deadline where increased speed is
associated with increased probability of failure. We demonstrate that the
proposed algorithm not only produces a risk averse policy reducing the
probability of exceeding the expected temporal deadline, but also provides the
statistical distribution of costs, thus offering a valuable analysis tool.
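The risk metric the abstract pairs with the total cost criterion can be illustrated on a discrete cost distribution; the sketch below computes AVaR via the Rockafellar-Uryasev form and is not the paper's MDP-specific policy algorithm.

```python
import numpy as np

def average_value_at_risk(costs, probs, alpha=0.1):
    """Average value at risk (AVaR, also called CVaR) of a discrete
    cost distribution: the expected cost over the worst alpha-fraction
    of outcomes. Uses
        AVaR_alpha = VaR + E[(X - VaR)^+] / alpha,
    which handles probability atoms correctly."""
    order = np.argsort(costs)
    c, p = np.asarray(costs)[order], np.asarray(probs)[order]
    cum = np.cumsum(p)
    # VaR is the (1 - alpha)-quantile of the cost distribution
    var = c[np.searchsorted(cum, 1.0 - alpha)]
    avar = var + np.sum(p * np.maximum(c - var, 0.0)) / alpha
    return var, avar
```

For the deadline scenario in the abstract, `costs` would be sampled total traversal times of a policy, and minimizing AVaR penalizes the slow tail rather than only the mean.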