Budgeted Reinforcement Learning in Continuous State Space
A Budgeted Markov Decision Process (BMDP) is an extension of a Markov
Decision Process to critical applications requiring safety constraints. It
relies on a notion of risk implemented in the shape of a cost signal
constrained to lie below an adjustable threshold. So far, BMDPs could only
be solved in the case of finite state spaces with known dynamics. This work
extends the state of the art to continuous-space environments and unknown
dynamics. We show that the solution to a BMDP is a fixed point of a novel
Budgeted Bellman Optimality operator. This observation allows us to introduce
natural extensions of Deep Reinforcement Learning algorithms to address
large-scale BMDPs. We validate our approach on two simulated applications:
spoken dialogue and autonomous driving.
Comment: N. Carrara and E. Leurent contributed equally.
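As an illustration of the constrained setting the abstract describes (not the paper's method): if one discretizes both the state space and the remaining risk budget, a budgeted variant of value iteration can be sketched in which an action is admissible only while its cost fits the remaining budget. The grid-based budget augmentation below is a simplifying assumption; the paper's Budgeted Bellman Optimality operator handles continuous budgets.

```python
import numpy as np

def budgeted_value_iteration(P, R, C, budgets, gamma=0.9, n_iter=200):
    """Toy budgeted value iteration on a finite MDP with the state
    augmented by a discretized remaining budget (a simplification of
    the continuous-budget setting in the paper).
    P[s, a, s'] : transition probabilities
    R[s, a]     : reward signal
    C[s, a]     : cost signal charged against the budget
    budgets     : sorted array of discrete budget levels
    """
    nS, nA, _ = P.shape
    nB = len(budgets)
    V = np.zeros((nS, nB))
    for _ in range(n_iter):
        V_new = np.full((nS, nB), -1e9)  # -1e9 marks infeasible cells
        for s in range(nS):
            for b, beta in enumerate(budgets):
                vals = []
                for a in range(nA):
                    if C[s, a] > beta:
                        continue  # action's cost exceeds remaining budget
                    # snap the remaining budget down to the grid
                    nb = np.searchsorted(budgets, beta - C[s, a],
                                         side="right") - 1
                    vals.append(R[s, a] + gamma * P[s, a] @ V[:, nb])
                if vals:
                    V_new[s, b] = max(vals)
        V = V_new
    return V
```

With a zero-cost "safe" action always available, the zero-budget slice of the value function reduces to the value of the safest policy, while larger budgets admit costlier, higher-reward actions.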
Markov Decision Processes with Applications in Wireless Sensor Networks: A Survey
Wireless sensor networks (WSNs) consist of autonomous and resource-limited
devices. The devices cooperate to monitor one or more physical phenomena within
an area of interest. WSNs operate as stochastic systems because of randomness
in the monitored environments. For long service time and low maintenance cost,
WSNs require adaptive and robust methods to address data exchange, topology
formulation, resource and power optimization, sensing coverage and object
detection, and security challenges. In these problems, sensor nodes are to make
optimized decisions from a set of accessible strategies to achieve design
goals. This survey reviews numerous applications of the Markov decision process
(MDP) framework, a powerful decision-making tool to develop adaptive algorithms
and protocols for WSNs. Furthermore, various solution methods are discussed and
compared to serve as a guide for using MDPs in WSNs.
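The baseline solution method against which the surveyed approaches are typically compared is value iteration on a finite MDP. A minimal sketch (the sensor-node interpretation in the test below, with sleep/transmit actions, is a hypothetical toy instance, not taken from the survey):

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Standard value iteration for a finite MDP.
    P[s, a, s'] : transition probabilities
    R[s, a]     : expected one-step rewards
    Returns the optimal value function and a greedy policy."""
    nS, nA, _ = P.shape
    V = np.zeros(nS)
    while True:
        Q = R + gamma * P @ V          # Q[s, a]; P @ V sums over s'
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new
```

For WSN-scale problems the survey also discusses approximate methods, since exact value iteration scales poorly with the number of nodes.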
Asymmetric Shocks, Long-term Bonds and Sovereign Default
We present a sovereign default model with asymmetric shocks and long-term bonds, and solve the model using discrete state dynamic programming. As a result, our model matches the Argentinean economy over the period 1993Q1-2001Q4 quite well. We show that our model can match the high default frequency, high debt/output ratio and other cyclical features, such as the countercyclical interest rate and trade balance, observed in emerging countries. Moreover, with asymmetric shocks we are able to match a high sovereign spread level and low spread volatility simultaneously in one model, which until now has not been well achieved. As another contribution of our paper, we propose a simulation-based approach to approximate the transition function of output shocks between finite states, which is an indispensable step in discrete state dynamic programming. Compared to Tauchen's method, our approach is very flexible in transforming various econometric models into finite state transition functions, so it can be widely used to simulate different kinds of discrete state shocks.
Keywords: Sovereign Default; Asymmetric Shocks; Transition Function; Long-term Bonds
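A simulation-based discretization of the kind the abstract describes can be sketched as follows; the AR(1) shock process and the quantile binning are illustrative assumptions, not the paper's exact specification. The point of flexibility over Tauchen's method is that any process one can simulate can be discretized the same way.

```python
import numpy as np

def simulated_transition_matrix(rho=0.9, sigma=0.02, n_states=5,
                                n_periods=200_000, seed=0):
    """Approximate a finite-state transition function by simulation:
    draw a long path of the shock process, discretize it into quantile
    bins, and count bin-to-bin transitions. The AR(1) specification
    y_t = rho * y_{t-1} + eps_t is a stand-in; any simulable
    econometric model works identically."""
    rng = np.random.default_rng(seed)
    y = np.zeros(n_periods)
    for t in range(1, n_periods):
        y[t] = rho * y[t - 1] + rng.normal(0.0, sigma)
    # interior quantiles of the simulated path serve as bin edges
    edges = np.quantile(y, np.linspace(0, 1, n_states + 1)[1:-1])
    idx = np.digitize(y, edges)
    # count transitions between consecutive bins, then row-normalize
    T = np.zeros((n_states, n_states))
    for a, b in zip(idx[:-1], idx[1:]):
        T[a, b] += 1
    T /= T.sum(axis=1, keepdims=True)
    return T
```

For a persistent process (rho close to 1) the estimated matrix is diagonally dominant, as expected: the shock tends to stay in its current bin.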
Optimal Interdiction of Unreactive Markovian Evaders
The interdiction problem arises in a variety of areas including military
logistics, infectious disease control, and counter-terrorism. In the typical
formulation of network interdiction, the task of the interdictor is to find a
set of edges in a weighted network such that the removal of those edges would
maximally increase the cost to an evader of traveling on a path through the
network.
Our work is motivated by cases in which the evader has incomplete information
about the network or lacks planning time or computational power, e.g. when
authorities set up roadblocks to catch bank robbers, the criminals do not know
all the roadblock locations or the best path to use for their escape.
We introduce a model of network interdiction in which the motion of one or
more evaders is described by Markov processes and the evaders are assumed not
to react to interdiction decisions. The interdiction objective is to find an
edge set of size B that maximizes the probability of capturing the evaders.
We prove that, like the standard least-cost formulation for deterministic
motion, this interdiction problem is NP-hard. Unlike that problem, however,
ours is submodular, and the optimal solution can be approximated within a
factor of 1-1/e using a greedy algorithm. Additionally, we exploit
submodularity through a priority evaluation strategy that eliminates the linear
complexity scaling in the number of network edges and speeds up the solution by
orders of magnitude. Taken together, the results bring closer the goal of
finding realistic solutions to the interdiction problem on global-scale
networks.
Comment: Accepted at the Sixth International Conference on Integration of AI
and OR Techniques in Constraint Programming for Combinatorial Optimization
Problems (CPAIOR 2009).
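The greedy algorithm with the 1-1/e guarantee can be sketched on a toy stand-in for the paper's objective: represent the evader's Markovian motion by sampled paths, and let f(S) be the fraction of paths that cross an interdicted edge in S, which is a monotone submodular coverage function. The path-sampling representation is a simplifying assumption, not the paper's exact formulation.

```python
def greedy_interdiction(paths, edges, B):
    """Greedy (1 - 1/e)-approximation for maximizing a monotone
    submodular capture objective under a cardinality budget B.
    paths : sampled evader trajectories, each a sequence of edge labels
    edges : candidate edges available for interdiction
    """
    def capture_prob(S):
        # fraction of sampled paths that hit at least one edge in S
        return sum(1 for p in paths if S & set(p)) / len(paths)

    chosen = set()
    for _ in range(B):
        # pick the edge with the largest marginal gain in capture prob.
        best = max((e for e in edges if e not in chosen),
                   key=lambda e: capture_prob(chosen | {e}),
                   default=None)
        if best is None:
            break
        chosen.add(best)
    return chosen, capture_prob(chosen)
```

The priority-evaluation speedup mentioned in the abstract (lazy evaluation of marginal gains) would replace the inner `max` with a priority queue; it is omitted here for brevity.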
Modelling the Effect of Policy Reform on Structural Change in Irish Farming
End of project report.
The Mid Term Review (MTR) of the Common Agricultural Policy (CAP) has allowed for the decoupling of all direct payments from production from 2005 onwards; until then, most direct payments were coupled to production, requiring farmers to produce specific products in order to claim support. After decoupling, farmers will receive a payment regardless of production, as long as their farm land is maintained in accordance with good agricultural practice. Direct payments to farmers have been an integral part of the CAP since the 1992 Mac Sharry reforms. Throughout the 1990s, market prices for farm produce generally declined in line with policy, while costs of production continued to increase. Meanwhile, direct payments increased in value, increasing farmers' reliance on this source of income. Furthermore, farmers adapted their farming practices to maximise their receipt of direct payments, leading to a culture of 'farming the subsidy'. By 1997, 100 per cent of family farm income on cattle and tillage farms in Ireland was derived from direct payments, meaning that on average market-based revenue was insufficient to cover total costs.
Risk Aversion in Finite Markov Decision Processes Using Total Cost Criteria and Average Value at Risk
In this paper we present an algorithm to compute risk averse policies in
Markov Decision Processes (MDP) when the total cost criterion is used together
with the average value at risk (AVaR) metric. Risk averse policies are needed
when large deviations from the expected behavior may have detrimental effects,
and conventional MDP algorithms usually ignore this aspect. We provide
conditions for the structure of the underlying MDP ensuring that approximations
for the exact problem can be derived and solved efficiently. Our findings are
novel inasmuch as average value at risk has not previously been considered in
association with the total cost criterion. Our method is demonstrated in a
rapid deployment scenario, whereby a robot is tasked with the objective of
reaching a target location within a temporal deadline where increased speed is
associated with increased probability of failure. We demonstrate that the
proposed algorithm not only produces a risk averse policy reducing the
probability of exceeding the expected temporal deadline, but also provides the
statistical distribution of costs, thus offering a valuable analysis tool.
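The risk metric the abstract pairs with the total cost criterion can be illustrated on a discrete cost distribution; the sketch below computes AVaR via the Rockafellar-Uryasev form and is not the paper's MDP-specific policy algorithm.

```python
import numpy as np

def average_value_at_risk(costs, probs, alpha=0.1):
    """Average value at risk (AVaR, also called CVaR) of a discrete
    cost distribution: the expected cost over the worst alpha-fraction
    of outcomes. Uses
        AVaR_alpha = VaR + E[(X - VaR)^+] / alpha,
    which handles probability atoms correctly."""
    order = np.argsort(costs)
    c, p = np.asarray(costs)[order], np.asarray(probs)[order]
    cum = np.cumsum(p)
    # VaR is the (1 - alpha)-quantile of the cost distribution
    var = c[np.searchsorted(cum, 1.0 - alpha)]
    avar = var + np.sum(p * np.maximum(c - var, 0.0)) / alpha
    return var, avar
```

For the deadline scenario in the abstract, `costs` would be sampled total traversal times of a policy, and minimizing AVaR penalizes the slow tail rather than only the mean.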