
    A stochastic approximation algorithm for stochastic semidefinite programming

    Motivated by applications to multi-antenna wireless networks, we propose a distributed and asynchronous algorithm for stochastic semidefinite programming. This algorithm is a stochastic approximation of a continuous-time matrix exponential scheme regularized by the addition of an entropy-like term to the problem's objective function. We show that the resulting algorithm converges almost surely to an $\varepsilon$-approximation of the optimal solution, requiring only an unbiased estimate of the gradient of the problem's stochastic objective. When applied to throughput maximization in wireless multiple-input and multiple-output (MIMO) systems, the proposed algorithm retains its convergence properties under a wide array of mobility impediments such as user update asynchronicities, random delays and/or ergodically changing channels. Our theoretical analysis is complemented by extensive numerical simulations which illustrate the robustness and scalability of the proposed method in realistic network conditions. Comment: 25 pages, 4 figures
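
To make the matrix exponential idea concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a toy problem (maximize tr(AX) over positive semidefinite matrices with unit trace), omits the entropy-like regularization and the distributed, asynchronous aspects, and uses hypothetical names (`exp_map`, `matrix_exponential_learning`, `noisy_grad`) and an assumed 1/sqrt(n) step size. It only illustrates the pattern of aggregating unbiased gradient estimates and mapping them back to the feasible set through a normalized matrix exponential.

```python
import numpy as np

def exp_map(Y):
    """Return exp(Y) / tr(exp(Y)) for a symmetric matrix Y."""
    eigvals, eigvecs = np.linalg.eigh(Y)
    weights = np.exp(eigvals - eigvals.max())   # shift avoids overflow
    weights /= weights.sum()                    # normalizes the trace to 1
    return (eigvecs * weights) @ eigvecs.T      # V diag(weights) V^T

def matrix_exponential_learning(noisy_grad, dim, steps=2000):
    """Aggregate noisy gradients in Y and play X = exp(Y) / tr(exp(Y))."""
    Y = np.zeros((dim, dim))          # running aggregate of gradient estimates
    X = exp_map(Y)                    # starts at the maximally mixed point I/dim
    for n in range(1, steps + 1):
        G = noisy_grad(X)             # unbiased estimate of the gradient at X
        Y += (G + G.T) / (2.0 * np.sqrt(n))   # symmetrize, vanishing step size
        X = exp_map(Y)
    return X
```

For example, with `A = np.diag([3.0, 2.0, 1.0])` and `noisy_grad = lambda X: A + noise`, the iterate stays positive semidefinite with unit trace by construction and concentrates on the leading eigenvector of `A`.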

    Linear combination of one-step predictive information with an external reward in an episodic policy gradient setting: a critical analysis

    One of the main challenges in the field of embodied artificial intelligence is the open-ended autonomous learning of complex behaviours. Our approach is to use task-independent, information-driven intrinsic motivation(s) to support task-dependent learning. The work presented here is a preliminary step in which we investigate the predictive information (the mutual information of the past and future of the sensor stream) as an intrinsic drive, ideally supporting any kind of task acquisition. Previous experiments have shown that the predictive information (PI) is a good candidate to support autonomous, open-ended learning of complex behaviours, because a maximisation of the PI corresponds to an exploration of morphology- and environment-dependent behavioural regularities. The idea is that these regularities can then be exploited in order to solve any given task. Three different experiments are presented, and their results lead to the conclusion that the linear combination of the one-step PI with an external reward function is not generally recommended in an episodic policy gradient setting. Only for hard tasks can a great speed-up be achieved, at the cost of a loss in asymptotic performance.
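
As a rough illustration of the quantity discussed above, the one-step predictive information I(S_t; S_{t+1}) of a discrete sensor stream can be estimated from pair counts. This is a plug-in estimator written for illustration under the assumption of a discretized stream. It is not the estimator used in the paper, and the function name is hypothetical.

```python
from collections import Counter
from math import log2

def one_step_predictive_information(stream):
    """Plug-in estimate of I(S_t; S_{t+1}) in bits for a discrete sequence."""
    pairs = list(zip(stream[:-1], stream[1:]))    # consecutive (past, future) pairs
    n = len(pairs)
    joint = Counter(pairs)                        # counts of (s, s_next)
    past = Counter(s for s, _ in pairs)           # marginal counts of s
    future = Counter(s for _, s in pairs)         # marginal counts of s_next
    # Sum of p(s, s') * log2( p(s, s') / (p(s) p(s')) ), written with raw counts.
    return sum(
        (c / n) * log2(c * n / (past[s] * future[s_next]))
        for (s, s_next), c in joint.items()
    )
```

A strictly alternating binary stream yields close to one bit, since the next symbol is fully determined by the current one, while a constant stream yields zero.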

    RISE-Based Integrated Motion Control of Autonomous Ground Vehicles With Asymptotic Prescribed Performance

    This article investigates integrated lane-keeping and roll control for autonomous ground vehicles (AGVs), considering transient performance and system disturbances. The robust integral of the sign of error (RISE) control strategy is proposed to achieve lane keeping with rollover prevention by guaranteeing the asymptotic stability of the closed-loop system, attenuating systematic disturbances, and maintaining the controlled states within the prescribed performance boundaries. Three contributions are made in this article: 1) a new prescribed performance function (PPF) that does not require accurate initial errors is proposed to guarantee that the tracking errors remain within the predefined asymptotic boundaries; 2) a modified neural network (NN) estimator which requires fewer adaptively updated parameters is proposed to approximate the unknown vertical dynamics; and 3) an improved RISE control based on the PPF is proposed to achieve the integrated control objective, which analytically guarantees both controller continuity and closed-loop asymptotic stability by integrating the signum error function. The overall system stability is proved with a Lyapunov function. The controller's effectiveness and robustness are finally verified by comparative simulations using two representative driving maneuvers, based on the high-fidelity CarSim-Simulink simulation.
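
For orientation, the conventional exponential prescribed performance function found in the literature is shown below. This is the textbook form, not the new PPF proposed in the article, and the symbols $\rho_0$, $\rho_\infty$, $\kappa$ and $e(t)$ are notation assumed here.

```latex
\rho(t) = (\rho_0 - \rho_\infty)\, e^{-\kappa t} + \rho_\infty,
\qquad \rho_0 > \rho_\infty > 0,\ \kappa > 0,
\qquad -\rho(t) < e(t) < \rho(t) \quad \forall\, t \ge 0 .
```

The bound can only hold if $|e(0)| < \rho_0$, so $\rho_0$ must be chosen with knowledge of the initial tracking error. This is the kind of dependence that a PPF not requiring accurate initial errors is meant to avoid.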

    Feedback control by online learning an inverse model

    A model, predictor, or error estimator is often used by a feedback controller to control a plant. Creating such a model is difficult when the plant exhibits nonlinear behavior. In this paper, a novel online learning control framework is proposed that does not require explicit knowledge about the plant. This framework uses two learning modules, one for creating an inverse model, and the other for actually controlling the plant. Except for their inputs, they are identical. The inverse model learns from the exploration performed by the not yet fully trained controller, while the actual controller is based on the currently learned model. The proposed framework allows fast online learning of an accurate controller. The controller can be applied to a broad range of tasks with different dynamic characteristics. We validate this claim by applying our control framework to several control tasks: 1) the heating tank problem (slow nonlinear dynamics); 2) flight pitch control (slow linear dynamics); and 3) the balancing problem of a double inverted pendulum (fast linear and nonlinear dynamics). The results of these experiments show that fast learning and accurate control can be achieved. Furthermore, a comparison is made with some classical control approaches, and observations concerning convergence and stability are made.
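
The two-module idea can be sketched on a toy problem. Everything below is an assumption made for illustration rather than the paper's setup: a scalar linear plant y[t+1] = a*y[t] + b*u[t] that is unknown to the learner, a linear inverse model trained by normalized least mean squares, Gaussian exploration noise, and a square-wave reference. The inverse model and the controller share the same weights and differ only in their second input (observed next output versus desired next output).

```python
import numpy as np

def run_inverse_model_control(steps=3000, lr=0.5, noise_std=0.1, seed=0):
    rng = np.random.default_rng(seed)
    a, b = 0.8, 0.5          # hypothetical plant parameters, hidden from the learner
    w = np.zeros(2)          # weights shared by inverse model and controller
    y = 0.0
    errors = []
    for t in range(steps):
        r = 1.0 if (t // 50) % 2 == 0 else -1.0   # square-wave reference
        # Controller: the inverse model evaluated at the desired next output,
        # plus exploration noise so the learner keeps seeing informative data.
        u = w @ np.array([y, r]) + noise_std * rng.standard_normal()
        y_next = a * y + b * u
        # Inverse model: same weights, trained to reproduce u from the
        # observed transition (y, y_next) with a normalized LMS step.
        phi = np.array([y, y_next])
        w += lr * (u - w @ phi) * phi / (1e-8 + phi @ phi)
        errors.append(abs(r - y_next))
        y = y_next
    return w, np.array(errors)
```

For this plant the exact inverse is u = (y_next - a*y) / b, so the weights should approach [-1.6, 2.0], and the remaining tracking error is then due to the exploration noise alone.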