47,711 research outputs found

    Batch Policy Learning under Constraints

    Get PDF
    When learning policies for real-world domains, two important questions arise: (i) how to efficiently use pre-collected off-policy, non-optimal behavior data; and (ii) how to mediate among different competing objectives and constraints. We thus study the problem of batch policy learning under multiple constraints, and offer a systematic solution. We first propose a flexible meta-algorithm that admits any batch reinforcement learning and online learning procedure as subroutines. We then present a specific algorithmic instantiation and provide performance guarantees for the main objective and all constraints. To certify constraint satisfaction, we propose a new and simple method for off-policy policy evaluation (OPE) and derive PAC-style bounds. Our algorithm achieves strong empirical results in different domains, including in a challenging problem of simulated car driving subject to multiple constraints such as lane keeping and smooth driving. We also show experimentally that our OPE method outperforms other popular OPE techniques on a standalone basis, especially in a high-dimensional setting

    Sensor networks and distributed CSP: communication, computation and complexity

    Get PDF
    We introduce SensorDCSP, a naturally distributed benchmark based on a real-world application that arises in the context of networked distributed systems. In order to study the performance of Distributed CSP (DisCSP) algorithms in a truly distributed setting, we use a discrete-event network simulator, which allows us to model the impact of different network traffic conditions on the performance of the algorithms. We consider two complete DisCSP algorithms: asynchronous backtracking (ABT) and asynchronous weak commitment search (AWC), and perform performance comparison for these algorithms on both satisfiable and unsatisfiable instances of SensorDCSP. We found that random delays (due to network traffic or in some cases actively introduced by the agents) combined with a dynamic decentralized restart strategy can improve the performance of DisCSP algorithms. In addition, we introduce GSensorDCSP, a plain-embedded version of SensorDCSP that is closely related to various real-life dynamic tracking systems. We perform both analytical and empirical study of this benchmark domain. In particular, this benchmark allows us to study the attractiveness of solution repairing for solving a sequence of DisCSPs that represent the dynamic tracking of a set of moving objects.This work was supported in part by AFOSR (F49620-01-1-0076, Intelligent Information Systems Institute and MURI F49620-01-1-0361), CICYT (TIC2001-1577-C03-03 and TIC2003-00950), DARPA (F30602-00-2- 0530), an NSF CAREER award (IIS-9734128), and an Alfred P. Sloan Research Fellowship. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the US Government

    Sequential Randomized Algorithms for Convex Optimization in the Presence of Uncertainty

    Full text link
    In this paper, we propose new sequential randomized algorithms for convex optimization problems in the presence of uncertainty. A rigorous analysis of the theoretical properties of the solutions obtained by these algorithms, for full constraint satisfaction and partial constraint satisfaction, respectively, is given. The proposed methods allow to enlarge the applicability of the existing randomized methods to real-world applications involving a large number of design variables. Since the proposed approach does not provide a priori bounds on the sample complexity, extensive numerical simulations, dealing with an application to hard-disk drive servo design, are provided. These simulations testify the goodness of the proposed solution.Comment: 18 pages, Submitted for publication to IEEE Transactions on Automatic Contro

    Rational Deployment of CSP Heuristics

    Full text link
    Heuristics are crucial tools in decreasing search effort in varied fields of AI. In order to be effective, a heuristic must be efficient to compute, as well as provide useful information to the search algorithm. However, some well-known heuristics which do well in reducing backtracking are so heavy that the gain of deploying them in a search algorithm might be outweighed by their overhead. We propose a rational metareasoning approach to decide when to deploy heuristics, using CSP backtracking search as a case study. In particular, a value of information approach is taken to adaptive deployment of solution-count estimation heuristics for value ordering. Empirical results show that indeed the proposed mechanism successfully balances the tradeoff between decreasing backtracking and heuristic computational overhead, resulting in a significant overall search time reduction.Comment: 7 pages, 2 figures, to appear in IJCAI-2011, http://www.ijcai.org

    Volume of the steady-state space of financial flows in a monetary stock-flow-consistent model

    Full text link
    We show that a steady-state stock-flow consistent macro-economic model can be represented as a Constraint Satisfaction Problem (CSP).The set of solutions is a polytope, which volume depends on the constraintsapplied and reveals the potential fragility of the economic circuit,with no need to study the dynamics. Several methods to compute the volume are compared, inspired by operations research methods and theanalysis of metabolic networks, both exact and approximate.We also introduce a random transaction matrix, and study the particularcase of linear flows with respect to money stocks
    • …
    corecore