Batch Policy Learning under Constraints
When learning policies for real-world domains, two important questions arise:
(i) how to efficiently use pre-collected off-policy, non-optimal behavior data;
and (ii) how to mediate among different competing objectives and constraints.
We thus study the problem of batch policy learning under multiple constraints,
and offer a systematic solution. We first propose a flexible meta-algorithm
that admits any batch reinforcement learning and online learning procedure as
subroutines. We then present a specific algorithmic instantiation and provide
performance guarantees for the main objective and all constraints. To certify
constraint satisfaction, we propose a new and simple method for off-policy
policy evaluation (OPE) and derive PAC-style bounds. Our algorithm achieves
strong empirical results in different domains, including in a challenging
problem of simulated car driving subject to multiple constraints such as lane
keeping and smooth driving. We also show experimentally that our OPE method
outperforms other popular OPE techniques on a standalone basis, especially in a
high-dimensional setting.
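One common way to formalize a problem of this shape is sketched below. The abstract gives no notation, so the symbols are assumptions made for illustration: $C(\pi)$ for the main objective cost of policy $\pi$, $g_i(\pi)$ for the $i$-th constraint cost, $\tau_i$ for its threshold, and $\lambda_i$ for a Lagrange multiplier. Under this reading, a batch reinforcement learning subroutine could play the minimizing role over $\pi$, and an online learning subroutine could update $\lambda$.

```latex
\begin{aligned}
&\min_{\pi} \; C(\pi) \quad \text{subject to} \quad g_i(\pi) \le \tau_i, \quad i = 1, \dots, m, \\
&L(\pi, \lambda) = C(\pi) + \sum_{i=1}^{m} \lambda_i \bigl( g_i(\pi) - \tau_i \bigr), \qquad \lambda_i \ge 0, \\
&\min_{\pi} \, \max_{\lambda \ge 0} \, L(\pi, \lambda).
\end{aligned}
```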
Sensor networks and distributed CSP: communication, computation and complexity
We introduce SensorDCSP, a naturally distributed benchmark based on a real-world application that arises in the context of networked distributed systems. To study the performance of Distributed CSP (DisCSP) algorithms in a truly distributed setting, we use a discrete-event network simulator, which allows us to model the impact of different network traffic conditions on the performance of the algorithms. We consider two complete DisCSP algorithms, asynchronous backtracking (ABT) and asynchronous weak commitment search (AWC), and compare their performance on both satisfiable and unsatisfiable instances of SensorDCSP. We found that random delays (due to network traffic or, in some cases, actively introduced by the agents) combined with a dynamic decentralized restart strategy can improve the performance of DisCSP algorithms. In addition, we introduce GSensorDCSP, a plane-embedded version of SensorDCSP that is closely related to various real-life dynamic tracking systems. We study this benchmark domain both analytically and empirically. In particular, this benchmark allows us to study the attractiveness of solution repairing for solving a sequence of DisCSPs that represent the dynamic tracking of a set of moving objects. This work was supported in part by AFOSR (F49620-01-1-0076, Intelligent Information Systems Institute and MURI F49620-01-1-0361), CICYT (TIC2001-1577-C03-03 and TIC2003-00950), DARPA (F30602-00-2-0530), an NSF CAREER award (IIS-9734128), and an Alfred P. Sloan Research Fellowship. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the US Government.
Sequential Randomized Algorithms for Convex Optimization in the Presence of Uncertainty
In this paper, we propose new sequential randomized algorithms for convex
optimization problems in the presence of uncertainty. A rigorous analysis of
the theoretical properties of the solutions obtained by these algorithms, for
full constraint satisfaction and partial constraint satisfaction, respectively,
is given. The proposed methods extend the applicability of existing
randomized methods to real-world applications involving a large number
of design variables. Since the proposed approach does not provide a priori
bounds on the sample complexity, extensive numerical simulations, dealing with
an application to hard-disk drive servo design, are provided. These simulations
demonstrate the effectiveness of the proposed solution.
Comment: 18 pages, Submitted for publication to IEEE Transactions on Automatic
Control
Rational Deployment of CSP Heuristics
Heuristics are crucial tools in decreasing search effort in varied fields of
AI. In order to be effective, a heuristic must be efficient to compute, as well
as provide useful information to the search algorithm. However, some well-known
heuristics which do well in reducing backtracking are so heavy that the gain of
deploying them in a search algorithm might be outweighed by their overhead.
We propose a rational metareasoning approach to decide when to deploy
heuristics, using CSP backtracking search as a case study. In particular, a
value of information approach is taken to adaptive deployment of solution-count
estimation heuristics for value ordering. Empirical results show that indeed
the proposed mechanism successfully balances the tradeoff between decreasing
backtracking and heuristic computational overhead, resulting in a significant
overall search time reduction.
Comment: 7 pages, 2 figures, to appear in IJCAI-2011, http://www.ijcai.org
Volume of the steady-state space of financial flows in a monetary stock-flow-consistent model
We show that a steady-state stock-flow consistent macro-economic model can be
represented as a Constraint Satisfaction Problem (CSP). The set of solutions is
a polytope, whose volume depends on the constraints applied and reveals the
potential fragility of the economic circuit, with no need to study the dynamics.
Several methods to compute the volume are compared, inspired by operations
research methods and the analysis of metabolic networks, both exact and
approximate. We also introduce a random transaction matrix, and study the
particular case of linear flows with respect to money stocks.
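As a rough illustration of an approximate volume computation, the sketch below estimates the volume of a polytope {x : A x <= b} by rejection sampling inside a bounding box. This is a generic Monte Carlo technique, not necessarily one of the methods compared in the paper; the function name `estimate_polytope_volume`, its parameters, and the toy triangle are all assumptions made for this example.

```python
import random


def estimate_polytope_volume(A, b, lower, upper, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the volume of {x : A x <= b} within a box.

    A is a list of constraint rows, b the right-hand sides, and
    lower/upper the per-coordinate bounds of a box containing the polytope.
    (Hypothetical helper for illustration only.)
    """
    rng = random.Random(seed)

    # Volume of the bounding box we sample from.
    box_volume = 1.0
    for lo, hi in zip(lower, upper):
        box_volume *= hi - lo

    hits = 0
    for _ in range(n_samples):
        x = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        # A sample counts only if it satisfies every linear constraint.
        if all(
            sum(a_ij * x_j for a_ij, x_j in zip(row, x)) <= b_i
            for row, b_i in zip(A, b)
        ):
            hits += 1

    # Fraction of accepted samples times the box volume.
    return box_volume * hits / n_samples


# Toy polytope: x >= 0, y >= 0 (from the box) and x + y <= 1.
# Its exact area is 0.5.
A = [[1.0, 1.0]]
b = [1.0]
volume = estimate_polytope_volume(A, b, lower=[0.0, 0.0], upper=[1.0, 1.0])
print(f"estimated volume: {volume:.3f}")
```

Rejection sampling degrades quickly as the dimension grows, because the polytope occupies a vanishing fraction of its bounding box, which is one reason exact and more refined approximate methods are of interest.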
Experimental evaluation of preprocessing algorithms for constraint satisfaction problems
This paper presents an experimental evaluation of two orthogonal schemes for preprocessing constraint satisfaction problems (CSPs). The first of these schemes involves a class of local consistency techniques that includes directional arc consistency, directional path consistency, and adaptive consistency. The other scheme concerns the prearrangement of variables in a linear order to facilitate an efficient search. In the first series of experiments, we evaluated the effect of each of the local consistency techniques on backtracking and its common enhancement, backjumping. Surprisingly, although adaptive consistency has the best worst-case complexity bounds, we have found that it exhibits the worst performance, unless the constraint graph is very sparse. Directional arc consistency (followed by either backjumping or backtracking) and backjumping (without any preprocessing) outperformed all other techniques; moreover, the former dominated the latter in computationally intensive situations. The second series of experiments suggests that maximum cardinality and minimum width are the best pre-ordering (i.e., static ordering) strategies, while dynamic search rearrangement is superior to all the pre-orderings studied.
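To make the first scheme concrete, here is a minimal sketch of directional arc consistency followed by plain backtracking along a fixed variable order. It is an illustration under stated assumptions, not the authors' implementation: the function names, the constraint encoding (a dict keyed by `(earlier, later)` variable pairs mapping to a predicate), and the toy instance are all hypothetical.

```python
import operator


def directional_arc_consistency(order, domains, constraints):
    """Prune domains so each value of an earlier variable has a support
    in every constrained later variable (later variables processed first).

    constraints maps (u, v), with u before v in `order`, to a predicate
    allowed(value_u, value_v).
    """
    position = {var: i for i, var in enumerate(order)}
    pruned = {var: list(vals) for var, vals in domains.items()}
    for later in reversed(order):
        for (u, v), allowed in constraints.items():
            if v == later and position[u] < position[later]:
                # Keep only values of u supported by some value of `later`.
                pruned[u] = [
                    a for a in pruned[u]
                    if any(allowed(a, c) for c in pruned[later])
                ]
    return pruned


def backtrack(order, domains, constraints, assignment=None):
    """Chronological backtracking that assigns variables in `order`."""
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(order):
        return dict(assignment)
    var = order[len(assignment)]
    for value in domains[var]:
        # Check the new value against all earlier, already assigned variables.
        consistent = all(
            allowed(assignment[u], value)
            for (u, v), allowed in constraints.items()
            if v == var and u in assignment
        )
        if consistent:
            assignment[var] = value
            result = backtrack(order, domains, constraints, assignment)
            if result is not None:
                return result
            del assignment[var]
    return None


# Toy instance: a chain A - B - C with "not equal" constraints.
order = ["A", "B", "C"]
domains = {"A": [1, 2], "B": [1], "C": [1, 2]}
constraints = {("A", "B"): operator.ne, ("B", "C"): operator.ne}

pruned = directional_arc_consistency(order, domains, constraints)
solution = backtrack(order, pruned, constraints)
print(pruned)
print(solution)
```

In this toy chain the preprocessing removes the value 1 from A before search starts, so the search never has to retract its first assignment. On tree-structured constraint graphs this kind of pruning is known to make search along the same order backtrack-free; on general graphs it only reduces, and does not eliminate, backtracking.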