38 research outputs found

    Steps toward self-aware networks


    Feed-Forward Learning: Fast Reinforcement Learning of Controllers


    Unified Inter and Intra Options Learning Using Policy Gradient Methods

    Abstract. Temporally extended actions (or macro-actions) have proven useful for speeding up planning and learning, adding robustness, and building prior knowledge into AI systems. The options framework, as introduced in Sutton, Precup and Singh (1999), provides a natural way to incorporate macro-actions into reinforcement learning. In the subgoals approach, learning is divided into two phases: first learning each option with a prescribed subgoal, and then learning to compose the learned options. In this paper we offer a unified framework for concurrent inter- and intra-option learning. To that end, we propose a modular parameterization of intra-option policies together with option termination conditions and the option selection policy (inter options), and show that these three decision components may be viewed as a unified policy over an augmented state-action space, to which standard policy gradient algorithms may be applied. We identify the basis functions that apply to each of these decision components, and show that they possess a useful orthogonality property that allows the natural gradient to be computed independently for each component. We further outline the extension of the suggested framework to several levels of options hierarchy, and conclude with a brief illustrative example.
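
    The sketch below, in Python, illustrates the core idea of the abstract: option selection (inter-option), intra-option action choice, and option termination are treated as one policy over an augmented state-option space and trained with a plain REINFORCE-style gradient. It is not the paper's implementation: the toy chain environment, tabular softmax/sigmoid parameterizations, reward values, and learning rate are illustrative assumptions, and the paper works with general basis functions and natural gradients rather than the vanilla gradient used here.

        import numpy as np

        rng = np.random.default_rng(0)
        n_states, n_actions, n_options = 5, 2, 2

        # Three parameter blocks, one per decision component.  Their score
        # functions touch disjoint parameters, which is what allows each
        # component's gradient to be handled separately.
        theta_mu   = np.zeros((n_states, n_options))             # option selection (inter)
        theta_pi   = np.zeros((n_options, n_states, n_actions))  # intra-option policies
        theta_beta = np.zeros((n_options, n_states))             # termination conditions

        def softmax(x):
            z = np.exp(x - x.max())
            return z / z.sum()

        # Toy chain MDP (assumed): action 1 moves right; reaching the last state ends the episode.
        def env_reset():
            return 0

        def env_step(s, a):
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            done = (s_next == n_states - 1)
            return s_next, (1.0 if done else -0.01), done

        def sample_episode(horizon=50):
            """Roll out the augmented policy, accumulating score-function terms."""
            s = env_reset()
            p_o = softmax(theta_mu[s])
            o = rng.choice(n_options, p=p_o)
            g = {"mu": np.zeros_like(theta_mu),
                 "pi": np.zeros_like(theta_pi),
                 "beta": np.zeros_like(theta_beta)}
            g["mu"][s] -= p_o
            g["mu"][s, o] += 1.0
            ret = 0.0
            for _ in range(horizon):
                p_a = softmax(theta_pi[o, s])                    # intra-option action choice
                a = rng.choice(n_actions, p=p_a)
                g["pi"][o, s] -= p_a
                g["pi"][o, s, a] += 1.0
                s, r, done = env_step(s, a)
                ret += r
                if done:
                    break
                p_term = 1.0 / (1.0 + np.exp(-theta_beta[o, s])) # termination decision
                terminate = rng.random() < p_term
                g["beta"][o, s] += (1.0 - p_term) if terminate else -p_term
                if terminate:                                    # pick a new option (inter)
                    p_o = softmax(theta_mu[s])
                    o = rng.choice(n_options, p=p_o)
                    g["mu"][s] -= p_o
                    g["mu"][s, o] += 1.0
            return ret, g

        def train(episodes=300, lr=0.05):
            global theta_mu, theta_pi, theta_beta
            for _ in range(episodes):
                ret, g = sample_episode()
                theta_mu   += lr * ret * g["mu"]
                theta_pi   += lr * ret * g["pi"]
                theta_beta += lr * ret * g["beta"]

        train()
        print(softmax(theta_mu[0]))   # learned option preferences in the start state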

    An optimal stopping strategy for online calibration in local search

    This paper formalizes the problem of choosing online the number of explorations in a local search algorithm as a last-success problem. In this family of stochastic problems the events of interest belong to two categories (success or failure) and the objective is to predict when the last success will take place. The application to a local search setting is immediate if we identify a success with the detection of a new local optimum. Being able to predict when the last optimum will be found yields a computational gain by reducing the number of iterations carried out in the neighborhood of the current solution. The paper proposes a new algorithm for online calibration of the number of iterations during exploration and assesses it on a set of continuous optimisation tasks.
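
    As a point of reference for the last-success formulation, the sketch below implements the classical odds-algorithm stopping rule (Bruss' odds theorem): sum the odds p_k/(1-p_k) backwards from the final iteration until they first reach 1, then stop at the first success observed at or after that index. The per-iteration success probabilities, the neighbor/cost functions, and the local-search wrapper are illustrative assumptions; the paper's own online calibration procedure may differ.

        def odds_threshold(p):
            """1-based index s: accumulate odds p_k/(1-p_k) backwards until the sum reaches 1."""
            total, s = 0.0, 1
            for k in range(len(p), 0, -1):
                pk = p[k - 1]
                total += pk / (1.0 - pk)
                if total >= 1.0:
                    s = k
                    break
            return s

        def explore_with_stopping(neighbor_fn, cost_fn, x0, p, rng):
            """Neighborhood exploration that stops at the first improvement found
            at or beyond the odds threshold (a proxy for the last new optimum)."""
            s = odds_threshold(p)
            best, best_cost = x0, cost_fn(x0)
            for k in range(1, len(p) + 1):
                cand = neighbor_fn(best, rng)
                c = cost_fn(cand)
                if c < best_cost:                 # "success": a new local optimum
                    best, best_cost = cand, c
                    if k >= s:                    # first success past the threshold: stop
                        break
            return best, best_cost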

    TCP Modification Robust to Packet Reordering in Ant Routing Networks


    Recursive Least-Squares Learning with Eligibility Traces

    In the framework of Markov Decision Processes, we consider the problem of learning a linear approximation of the value function of some fixed policy from one trajectory, possibly generated by some other policy. We describe a systematic approach for adapting the on-policy least-squares learning algorithms of the literature (LSTD [5], LSPE [15], FPKF [7] and GPTD [8]/KTD [10]) to off-policy learning with eligibility traces. This leads to two known algorithms, LSTD(λ) and LSPE(λ) [21], and suggests new extensions of FPKF and GPTD/KTD. We describe their recursive implementation, discuss their convergence properties, and illustrate their behavior experimentally. Overall, our study suggests that the state-of-the-art LSTD(λ) [21] remains the best least-squares algorithm.
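
    A minimal sketch of what a recursive implementation of off-policy LSTD(λ) can look like: the running matrix inverse is maintained with a Sherman-Morrison update and the eligibility trace is scaled by the importance-sampling ratio. The trace-update convention, the initialization constant, and the default hyperparameters below are assumptions for illustration; the paper's exact formulation may differ.

        import numpy as np

        class RecursiveLSTDLambda:
            def __init__(self, n_features, lam=0.9, gamma=0.99, epsilon=1e-2):
                self.lam, self.gamma = lam, gamma
                self.C = np.eye(n_features) / epsilon   # running inverse of the A matrix
                self.b = np.zeros(n_features)
                self.z = np.zeros(n_features)           # eligibility trace

            def update(self, phi, reward, phi_next, rho=1.0):
                """One transition: features phi -> phi_next, importance ratio rho."""
                self.z = rho * (self.gamma * self.lam * self.z + phi)
                d = phi - self.gamma * phi_next         # TD feature difference
                Cz = self.C @ self.z
                denom = 1.0 + d @ Cz
                self.C -= np.outer(Cz, d @ self.C) / denom   # Sherman-Morrison step
                self.b += self.z * reward

            def theta(self):
                """Current weight vector of the linear value approximation."""
                return self.C @ self.b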

    Tracking in reinforcement learning

    Abstract. Reinforcement learning induces non-stationarity at several levels. Adaptation to non-stationary environments is of course a desirable feature of any practical RL algorithm. Yet, even if the environment of the learning agent can be considered stationary, generalized policy iteration frameworks, because of the interleaving of learning and control, produce non-stationarity of the evaluated policy and thus of its value function. Tracking the optimal solution instead of trying to converge to it is therefore preferable. In this paper, we propose to handle this tracking issue with a Kalman-based temporal difference framework. Complexity and convergence analyses are studied. Empirical investigations of its ability to handle non-stationarity are finally provided.
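
    The sketch below illustrates the tracking idea in the simplest linear setting: the value-function weights are modeled as a hidden state following a random walk, and each reward is a noisy linear observation through the TD feature difference, so a standard Kalman filter keeps tracking the weights as they drift instead of converging to a fixed point. The paper's Kalman-based TD framework is more general; the observation model and noise levels here are illustrative assumptions.

        import numpy as np

        class KalmanTD:
            def __init__(self, n_features, gamma=0.95, process_var=1e-4, obs_var=1.0):
                self.gamma = gamma
                self.theta = np.zeros(n_features)            # weight estimate
                self.P = np.eye(n_features)                  # weight covariance
                self.Q = process_var * np.eye(n_features)    # random-walk (tracking) noise
                self.R = obs_var                             # observation noise

            def update(self, phi, reward, phi_next):
                # prediction step: random-walk evolution of the weights
                self.P = self.P + self.Q
                # observation model: reward ~ h^T theta + noise, h the TD feature difference
                h = phi - self.gamma * phi_next
                innovation = reward - h @ self.theta
                s = h @ self.P @ h + self.R                  # innovation variance
                k = self.P @ h / s                           # Kalman gain
                self.theta = self.theta + k * innovation
                self.P = self.P - np.outer(k, h @ self.P)
                return innovation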

    Rural township of Toodyay, Western Australia, February 1919 /

    Title devised by cataloguer from accompanying information. Part of the collection: Michael Terry collection of negatives of his expeditions and travels, 1918-1971. Condition: spotting. Also available as a photograph: PIC Album 367. Also available online at: http://nla.gov.au/nla.pic-vn6248152

    Decision-theoretic control of planetary rovers

    Planetary rovers are small unmanned vehicles equipped with cameras and a variety of sensors used for scientific experiments. They must operate under tight constraints on resources such as operation time, power, storage capacity, and communication bandwidth. Moreover, the rover's limited computational resources restrict the complexity of on-line planning and scheduling. We describe two decision-theoretic approaches to maximizing the productivity of planetary rovers: one based on adaptive planning and the other on hierarchical reinforcement learning.
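
    As a toy illustration of decision-theoretic control under a resource budget (not the approaches developed in the paper), the sketch below solves a tiny MDP whose state is the remaining energy: each candidate activity has an assumed success probability, science reward, and energy cost, and value iteration yields the activity selection that maximizes expected science return.

        import numpy as np

        ENERGY_LEVELS = 11                     # 0..10 units of remaining energy (assumed)
        # name -> (success probability, science reward on success, energy cost); all illustrative
        TASKS = {"image": (0.95, 1.0, 1),
                 "spectrometry": (0.8, 3.0, 3),
                 "drive_and_sample": (0.6, 6.0, 5)}

        def value_iteration(tol=1e-9):
            """Expected science return achievable from each energy level."""
            V = np.zeros(ENERGY_LEVELS)
            while True:
                V_new = np.zeros_like(V)
                for e in range(ENERGY_LEVELS):
                    best = 0.0                              # idling ends the plan
                    for p, reward, cost in TASKS.values():
                        if cost <= e:                       # energy is spent whether or not the task succeeds
                            best = max(best, p * reward + V[e - cost])
                    V_new[e] = best
                if np.max(np.abs(V_new - V)) < tol:
                    return V_new
                V = V_new

        def best_task(V, e):
            """Greedy activity choice given the value of the remaining energy."""
            options = {name: p * r + V[e - c]
                       for name, (p, r, c) in TASKS.items() if c <= e}
            return max(options, key=options.get) if options else "idle"

        V = value_iteration()
        print(best_task(V, 10), V[10])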