4 research outputs found

    A classification-based approach to the optimal control of affine switched systems

    This paper deals with the optimal control of discrete-time switched systems, characterized by a finite set of operating modes, each associated with given affine dynamics. The objective is to design the switching law so as to minimize an infinite-horizon expected cost that penalizes frequent switching. The optimal switching law is computed off-line, which allows efficient online operation of the control via a state feedback policy. The latter associates a mode with each state and, as such, can be viewed as a classifier. To train such a classifier-type controller, one first needs to generate a set of training data in the form of optimal state-mode pairs. In the considered setting, this involves solving a Mixed Integer Quadratic Programming (MIQP) problem for each pair. A key feature of the proposed approach is the use of a classification method that provides guarantees on the generalization properties of the classifier. The approach is tested on a multi-room heating control problem.
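    The off-line/online pipeline described above (generate optimal state-mode pairs, train a classifier, then use it as a state feedback switching law) can be illustrated with a minimal Python sketch. Here solve_optimal_mode() is a hypothetical placeholder for the per-state MIQP, and an SVC is used only as a generic stand-in for the paper's classifier with generalization guarantees; the state dimensions and threshold are purely illustrative.

        # Minimal sketch, assuming a hypothetical solve_optimal_mode() in place of
        # the per-state MIQP and an SVC as a stand-in classifier.
        import numpy as np
        from sklearn.svm import SVC

        def solve_optimal_mode(x):
            """Hypothetical placeholder for the MIQP solved at training state x."""
            return int(x[0] > 0.5)   # toy rule: switch to mode 1 above a threshold

        # Off-line phase: sample states and compute optimal state-mode pairs.
        rng = np.random.default_rng(0)
        states = rng.uniform(0.0, 1.0, size=(500, 2))           # e.g. two room temperatures
        modes = np.array([solve_optimal_mode(x) for x in states])

        # Train the classifier-type controller on the optimal pairs.
        controller = SVC(kernel="rbf").fit(states, modes)

        # Online phase: the state feedback switching law is a single prediction.
        current_state = np.array([[0.7, 0.3]])
        print("switch to mode:", controller.predict(current_state)[0])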

    Directed exploration of policy space using support vector classifiers

    Summarization: Good policies in reinforcement learning problems typically exhibit significant structure. Several recent learning approaches based on the approximate policy iteration scheme suggest the use of classifiers for capturing this structure and representing policies compactly. Nevertheless, the space of possible policies, even under such structured representations, is huge and needs to be explored carefully to avoid the computationally expensive simulations (rollouts) needed to probe the improved policy and obtain training samples at various points over the state space. Regarding rollouts as a scarce resource, we propose a method for directed exploration of policy space using support vector classifiers. We use a collection of binary support vector classifiers to represent policies, whereby each of these classifiers corresponds to a single action and captures the parts of the state space where this action dominates over the other actions. After an initial training phase with rollouts uniformly distributed over the entire state space, we use the support vectors of the classifiers to identify the critical parts of the state space, those containing boundaries between different action choices in the represented policy. The policy is subsequently improved by probing the state space only at points around the support vectors, distributed perpendicularly to the separating boundary. This directed focus on critical parts of the state space iteratively leads to the gradual refinement and improvement of the underlying policy and delivers excellent control policies in only a few iterations with a conservative use of rollouts. We demonstrate the proposed approach on three standard reinforcement learning domains: inverted pendulum, mountain car, and acrobot. Presented at: 2011 IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
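    The directed-exploration loop above can be sketched roughly as follows. A linear kernel is used purely so that the boundary normal is directly available as clf.coef_; rollout_value() is a hypothetical stand-in for the Monte Carlo rollouts, and the probe step sizes are illustrative, not values from the paper.

        # Sketch of directed exploration around support vectors.
        import numpy as np
        from sklearn.svm import SVC

        rng = np.random.default_rng(1)
        ACTIONS = [0, 1]

        def rollout_value(state, action):
            """Hypothetical rollout estimate of an action's return from a state."""
            return -(state[0] - 0.5) ** 2 if action == 0 else -(state[0] - 0.7) ** 2

        # Initial phase: rollouts uniformly distributed over the state space.
        states = rng.uniform(0.0, 1.0, size=(200, 2))
        best = np.array([max(ACTIONS, key=lambda a: rollout_value(s, a)) for s in states])

        # One binary support vector classifier per action represents the policy.
        policy = {a: SVC(kernel="linear").fit(states, (best == a).astype(int))
                  for a in ACTIONS}

        # Directed phase: probe only around support vectors, perpendicular to the border.
        probes = []
        for clf in policy.values():
            normal = clf.coef_[0] / np.linalg.norm(clf.coef_[0])
            for sv in clf.support_vectors_:
                for step in (-0.05, 0.05):         # small steps across the boundary
                    probes.append(sv + step * normal)
        probes = np.array(probes)                   # rollouts here refine the policy next iteration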

    Directed policy search using relevance vector machines

    Summarization: Several recent learning approaches based on approximate policy iteration suggest the use of classifiers for representing policies compactly. The space of possible policies, even under such structured representations, is huge and must be searched carefully to avoid computationally expensive policy simulations (rollouts). In our recent work, we proposed a method for directed exploration of policy space using support vector classifiers, whereby rollouts are directed to states around the boundaries between different action choices indicated by the separating hyperplanes in the represented policies. While effective, this method suffers from the growing number of support vectors in the underlying classifiers as the number of training examples increases. In this paper, we propose an alternative method for directed policy search based on relevance vector machines. Relevance vector machines are used both for classification (to represent a policy) and regression (to approximate the corresponding relative action advantage function). Exploiting the internal structure of the regressor, we guide the probing of the state space only to critical areas corresponding to changes of action dominance in the underlying policy. This directed focus on critical parts of the state space iteratively leads to refinement and improvement of the underlying policy and delivers excellent control policies in only a few iterations, while the small number of relevance vectors yields significant computational time savings. We demonstrate the proposed approach and compare it with our previous method on standard reinforcement learning domains. Presented at: 2012 IEEE International Conference on Tools with Artificial Intelligence.
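    The regression half of the method, approximating the relative action advantage and directing rollouts to where it changes sign, might look roughly like the sketch below. Relevance vector machines are not part of scikit-learn, so a Gaussian process regressor is substituted here purely as an illustrative stand-in for the RVM regressor; relative_advantage() is a hypothetical rollout-based estimate.

        # Sketch: regress A(s) = Q(s, a1) - Q(s, a0) and probe near its zero crossings.
        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor

        rng = np.random.default_rng(2)

        def relative_advantage(state):
            """Hypothetical rollout-based estimate of Q(s, a1) - Q(s, a0)."""
            return np.sin(3.0 * state[0]) - 0.2

        states = rng.uniform(0.0, np.pi, size=(60, 1))
        advantages = np.array([relative_advantage(s) for s in states])

        # Stand-in for the relevance vector regressor used in the paper.
        regressor = GaussianProcessRegressor().fit(states, advantages)

        # Sign changes of the predicted advantage mark changes of action dominance;
        # the next batch of rollouts is focused around these critical states.
        grid = np.linspace(0.0, np.pi, 400).reshape(-1, 1)
        pred = regressor.predict(grid)
        crossings = grid[:-1][np.sign(pred[:-1]) != np.sign(pred[1:])]
        print("probe around states:", crossings.ravel())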

    Directed policy search for decision making using relevance vector machines

    Publication in a scientific journal. Summarization: Several recent learning approaches in decision making under uncertainty suggest the use of classifiers for representing policies compactly. The space of possible policies, even under such structured representations, is huge and must be searched carefully to avoid computationally expensive policy simulations (rollouts). In our recent work, we proposed a method for directed exploration of policy space using support vector classifiers, whereby rollouts are directed to states around the boundaries between different action choices indicated by the separating hyperplanes in the represented policies. While effective, this method suffers from the growing number of support vectors in the underlying classifiers as the number of training examples increases. In this paper, we propose an alternative method for directed policy search based on relevance vector machines. Relevance vector machines are used both for classification (to represent a policy) and regression (to approximate the corresponding relative action advantage function). Classification is enhanced by anomaly detection for accurate policy representation. Exploiting the internal structure of the regressor, we guide the probing of the state space only to critical areas corresponding to changes of action dominance in the underlying policy. This directed focus on critical parts of the state space iteratively leads to refinement and improvement of the underlying policy and delivers excellent control policies in only a few iterations, while the small number of relevance vectors yields significant computational time savings. We demonstrate the proposed approach and compare it with our previous method on standard reinforcement learning domains (inverted pendulum and mountain car). Published in: International Journal on Artificial Intelligence Tools.
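    The anomaly-detection enhancement mentioned above, i.e. trusting the policy classifier only on states that resemble the training data, might be sketched as follows. OneClassSVM is used here only as an illustrative novelty detector, and the toy states and labels are not from the paper.

        # Sketch: combine a policy classifier with a novelty detector.
        import numpy as np
        from sklearn.svm import SVC, OneClassSVM

        rng = np.random.default_rng(3)
        states = rng.uniform(-1.0, 1.0, size=(300, 2))             # e.g. angle, velocity
        actions = (states[:, 0] + states[:, 1] > 0).astype(int)    # toy policy labels

        policy = SVC(kernel="rbf").fit(states, actions)
        detector = OneClassSVM(nu=0.05).fit(states)

        def act(state):
            state = np.asarray(state).reshape(1, -1)
            if detector.predict(state)[0] == -1:       # flagged as anomalous
                return None                            # fall back / schedule rollouts here
            return int(policy.predict(state)[0])

        print(act([0.2, 0.1]))    # in-distribution: action from the classifier
        print(act([5.0, 5.0]))    # far outside the training data: flagged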