
    Robot brains

    The brain hosts complex networks of neurons that are responsible for the behavior in humans and animals that we generally call intelligent. It is not easy to give an exact definition of intelligence; for the purpose of this talk it will suffice to say that we refer to intelligence as a collection of capacities for abstract thought, reasoning, planning, problem solving, the use of language, and above all the capability to learn from experience and to adapt to changing conditions. Learning and adaptation are important subjects in the more technical part of this talk. Robots do not have brains; instead, their behavior is programmed at the design stage. Although robotics has already made a tremendous impact on several specific industries, of which the automotive industry is probably the best-known example, we expect that in the near future robots will fundamentally change many other areas of our activity. I argue that robotics will become an essential technology for addressing pressing issues facing mankind, such as the future shortage of manual labor due to aging populations in developed countries, or the depletion of resources in accessible locations. However, I also argue that for robotics to become widespread, future robots must be endowed with control systems that exhibit a much higher degree of intelligence than the systems currently available, and in this talk I will propose some directions toward achieving this goal.

    Fuzzy relational classifier trained by fuzzy clustering


    Learning state representation for deep actor-critic control

    Deep Neural Networks (DNNs) can be used as function approximators in Reinforcement Learning (RL). One advantage of DNNs is that they can cope with large input dimensions. Instead of relying on feature engineering to lower the input dimension, DNNs can extract the features from raw observations. The drawback of this end-to-end learning is that it usually requires a large amount of data, which for real-world control applications is not always available. In this paper, a new algorithm, Model Learning Deep Deterministic Policy Gradient (ML-DDPG), is proposed that combines RL with state representation learning, i.e., learning a mapping from an input vector to a state before solving the RL task. The ML-DDPG algorithm uses a concept we call predictive priors to learn a model network, which is subsequently used to pre-train the first layer of the actor and critic networks. Simulation results show that ML-DDPG can learn reasonable continuous control policies from high-dimensional observations that also contain task-irrelevant information. Furthermore, in some cases, this approach significantly improves the final performance in comparison to end-to-end learning.
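
    The paper's exact architecture is not reproduced here, but the pre-training idea can be sketched in PyTorch as follows: a model network encodes the raw observation into a learned state and is trained to predict quantities in the spirit of the predictive priors (here, the next learned state and the reward); its encoder then initializes the first layer of the actor and critic. All layer sizes, losses, and names (`ModelNetwork`, `OBS_DIM`, etc.) are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

OBS_DIM, STATE_DIM, ACT_DIM = 100, 16, 4  # illustrative dimensions

class ModelNetwork(nn.Module):
    """Maps raw observations to a learned state and predicts one step ahead."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(OBS_DIM, STATE_DIM)               # o_t -> s_t
        self.predictor = nn.Linear(STATE_DIM + ACT_DIM, STATE_DIM + 1)

    def forward(self, obs, act):
        s = torch.tanh(self.encoder(obs))
        out = self.predictor(torch.cat([s, act], dim=-1))
        return s, out[..., :STATE_DIM], out[..., -1]  # s_t, predicted s_{t+1}, predicted r

model = ModelNetwork()

def model_loss(obs, act, next_obs, rew):
    # Train the model network to predict the next learned state and the
    # reward (a stand-in for the paper's "predictive priors").
    _, s_next_pred, r_pred = model(obs, act)
    with torch.no_grad():
        s_next = torch.tanh(model.encoder(next_obs))               # target state
    return (nn.functional.mse_loss(s_next_pred, s_next)
            + nn.functional.mse_loss(r_pred, rew))

# After training, the encoder weights initialize the first layer of the
# actor and critic (both assumed to start with nn.Linear(OBS_DIM, STATE_DIM)).
actor_first_layer = nn.Linear(OBS_DIM, STATE_DIM)
actor_first_layer.load_state_dict(model.encoder.state_dict())
```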

    Visual Navigation in Real-World Indoor Environments Using End-to-End Deep Reinforcement Learning

    Visual navigation is essential for many applications in robotics, from manipulation through mobile robotics to automated driving. Deep reinforcement learning (DRL) provides an elegant map-free approach integrating image processing, localization, and planning in one module, which can be trained and therefore optimized for a given environment. However, to date, DRL-based visual navigation has been validated exclusively in simulation, where the simulator provides information that is not available in the real world, e.g., the robot's position or segmentation masks. This precludes the use of the learned policy on a real robot. Therefore, we present a novel approach that enables a direct deployment of the trained policy on real robots. We have designed a new powerful simulator capable of domain randomization. To facilitate the training, we propose visual auxiliary tasks and a tailored reward scheme. The policy is fine-tuned on images collected from real-world environments. We have evaluated the method on a mobile robot in a real office environment. The training took approximately 30 hours on a single GPU. In 30 navigation experiments, the robot reached a 0.3-meter neighbourhood of the goal in more than 86.7% of cases. This result makes the proposed method directly applicable to tasks like mobile manipulation.
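
    The abstract does not specify the network in detail; the sketch below only illustrates the general pattern of a DRL navigation policy with visual auxiliary heads sharing one convolutional backbone. The architecture, the choice of auxiliary task, and all sizes are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class NavNet(nn.Module):
    """Shared visual backbone with policy, value, and auxiliary heads."""
    def __init__(self, n_actions=4):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():                        # infer flattened feature size
            feat = self.backbone(torch.zeros(1, 3, 84, 84)).shape[1]
        self.policy = nn.Linear(feat, n_actions)     # action logits
        self.value = nn.Linear(feat, 1)              # state-value estimate
        self.aux = nn.Linear(feat, 64)               # auxiliary visual prediction head

    def forward(self, img):
        h = self.backbone(img)
        return self.policy(h), self.value(h), self.aux(h)

# The auxiliary head is trained on a supervised visual task alongside the RL
# loss, which regularizes the shared features during training.
```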

    Optimal control via reinforcement learning with symbolic policy approximation

    Model-based reinforcement learning (RL) algorithms can be used to derive optimal control laws for nonlinear dynamic systems. With continuous-valued state and input variables, RL algorithms have to rely on function approximators to represent the value function and policy mappings. This paper addresses the problem of finding a smooth policy based on the value function represented by means of a basis-function approximator. We first show that policies derived directly from the value function or represented explicitly by the same type of approximator lead to inferior control performance, manifested by non-smooth control signals and steady-state errors. We then propose a novel method to construct a smooth policy represented by an analytic equation, obtained by means of symbolic regression. The proposed method is illustrated on a reference-tracking problem of a 1-DOF robot arm operating under the influence of gravity. The results show that the analytic control law performs at least as well as the original numerically approximated policy, while it leads to much smoother control signals. In addition, the analytic function is readable (as opposed to black-box approximators) and can be used in further analysis and synthesis of the closed loop.
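
    As a minimal sketch of this pipeline (not the authors' implementation): greedy actions are computed at sampled states by maximizing the Bellman right-hand side, and a symbolic regressor then fits an analytic policy to those state-action pairs. The model `f`, reward `r`, discount, input grid, and the gplearn settings are all illustrative assumptions.

```python
import numpy as np
from gplearn.genetic import SymbolicRegressor

gamma = 0.99
u_grid = np.linspace(-2.0, 2.0, 41)        # admissible inputs (assumed)

def greedy_action(x, V, f, r):
    """Input maximizing the Bellman right-hand side at state x."""
    rhs = [r(x, u) + gamma * V(f(x, u)) for u in u_grid]
    return u_grid[int(np.argmax(rhs))]

def fit_symbolic_policy(states, V, f, r):
    """Fit an analytic policy u(x) to greedy state-action pairs."""
    actions = np.array([greedy_action(x, V, f, r) for x in states])
    sr = SymbolicRegressor(population_size=1000, generations=30,
                           function_set=('add', 'sub', 'mul', 'sin'),
                           parsimony_coefficient=0.01, random_state=0)
    sr.fit(np.asarray(states), actions)
    return sr                              # sr._program holds the analytic expression
```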

    Selecting Informative Data Samples for Model Learning Through Symbolic Regression

    Continual model learning for nonlinear dynamic systems, such as autonomous robots, presents several challenges. First, it tends to be computationally expensive as the amount of data collected by the robot quickly grows in time. Second, the model accuracy is impaired when data from repetitive motions prevail in the training set and outweigh scarcer samples that also capture interesting properties of the system. It is not known in advance which samples will be useful for model learning. Therefore, effective methods need to be employed to select informative training samples from the continuous data stream collected by the robot. Existing literature does not give any guidelines as to which of the available sample-selection methods are suitable for such a task. In this paper, we compare five sample-selection methods, including a novel method using the model prediction error. We integrate these methods into a model learning framework based on symbolic regression, which allows for learning accurate models in the form of analytic equations. Unlike the currently popular data-hungry deep learning methods, symbolic regression is able to build models even from very small training data sets. We demonstrate the approach on two real robots: the TurtleBot mobile robot and the Parrot Bebop drone. The results show that an accurate model can be constructed even from training sets as small as 24 samples. Informed sample-selection techniques based on prediction error and model variance clearly outperform uninformed methods, such as sequential or random selection.
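
    A minimal sketch of the prediction-error criterion (one of the five compared methods), under assumed buffer size, threshold, and model interface: a streamed sample enters the small training buffer only if the current model predicts it poorly, replacing the sample the model already explains best.

```python
import numpy as np

BUFFER_SIZE = 24          # matches the smallest training set reported
ERROR_THRESHOLD = 0.05    # illustrative value

def maybe_add_sample(buffer, x_new, y_new, model):
    """buffer: list of (x, y) pairs; model.predict(x) -> y estimate."""
    err = np.linalg.norm(model.predict(x_new) - y_new)
    if err < ERROR_THRESHOLD:
        return False                          # already well explained, skip
    if len(buffer) < BUFFER_SIZE:
        buffer.append((x_new, y_new))
    else:
        # Replace the best-predicted (least informative) buffered sample.
        errs = [np.linalg.norm(model.predict(x) - y) for x, y in buffer]
        buffer[int(np.argmin(errs))] = (x_new, y_new)
    return True
```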

    Symbolic method for deriving policy in reinforcement learning

    This paper addresses the problem of deriving a policy from the value function in the context of reinforcement learning in continuous state and input spaces. We propose a novel method based on genetic programming to construct a symbolic function, which serves as a proxy to the value function and from which a continuous policy is derived. The symbolic proxy function is constructed such that it maximizes the number of correct choices of the control input for a set of selected states. Maximization methods can then be used to derive a control policy that performs better than the policy derived from the original approximate value function. The method was experimentally evaluated on two control problems with continuous spaces, pendulum swing-up and magnetic manipulation, and compared to a standard policy derivation method using the value function approximation. The results show that the proposed method and its variants outperform the standard method.
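
    The genetic-programming loop itself is omitted, but the fitness described above can be sketched as follows: a candidate proxy is scored by how often the input it selects (by maximizing the Bellman right-hand side) agrees with the input selected under the original approximate value function. The dynamics `f`, reward `r`, discount, and input grid are assumptions for illustration.

```python
import numpy as np

gamma = 0.95
u_grid = np.linspace(-1.0, 1.0, 21)     # candidate inputs (assumed)

def best_input_index(value_fn, x, f, r):
    """Index of the input maximizing the Bellman right-hand side."""
    return int(np.argmax([r(x, u) + gamma * value_fn(f(x, u))
                          for u in u_grid]))

def proxy_fitness(V_proxy, V, states, f, r):
    """Fraction of selected states where the proxy picks the same input as V."""
    correct = sum(best_input_index(V_proxy, x, f, r) == best_input_index(V, x, f, r)
                  for x in states)
    return correct / len(states)
```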

    Proxy functions for Approximate Reinforcement Learning

    Approximate Reinforcement Learning (RL) is a method to solve sequential decision-making and dynamic control problems in an optimal way. This paper addresses RL for continuous state spaces in which the control policy is derived from an approximate value function (V-function). The standard approach to deriving a policy through the V-function is analogous to hill climbing: at each state the RL agent chooses the control input that maximizes the right-hand side of the Bellman equation. Although theoretically optimal, the actual control performance of this method is heavily influenced by the local smoothness of the V-function; a lack of smoothness results in undesired closed-loop behavior with input chattering or limit cycles. To circumvent these problems, this paper provides a method based on Symbolic Regression to generate a locally smooth proxy to the V-function. The proposed method has been evaluated on two nonlinear control benchmarks: pendulum swing-up and magnetic manipulation. The new method has been compared with the standard policy derivation technique using the approximate V-function, and the results show that the proposed approach outperforms the standard one with respect to the cumulative return.
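
    Once a smooth proxy is available, the same hill-climbing derivation u*(x) = argmax_u [r(x, u) + gamma V(f(x, u))] can be carried out by continuous optimization instead of a coarse grid; the smoothness of the proxy is what keeps the resulting input signal free of chattering. A sketch, with `f`, `r`, the discount, and the input bounds as assumptions:

```python
from scipy.optimize import minimize_scalar

gamma = 0.99

def policy_from_proxy(x, V_proxy, f, r, u_min=-3.0, u_max=3.0):
    """Derive the input at state x by maximizing the Bellman right-hand side."""
    res = minimize_scalar(lambda u: -(r(x, u) + gamma * V_proxy(f(x, u))),
                          bounds=(u_min, u_max), method='bounded')
    return res.x
```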

    Integrated dynamic modelling and multivariable control of HVAC components

    The field of energy efficiency in buildings offers challenging opportunities from a control point of view. Heating, Ventilation and Air-Conditioning (HVAC) units in buildings must be accurately controlled to ensure occupant comfort and reduce energy consumption. While existing HVAC models consist of only one or a few HVAC components, this work involves the development of a complete HVAC model for one thermal zone. In addition, a novel multivariable control strategy based on PI auto-tuning is proposed by combining the aforementioned model with optimization of the desired (time-varying) equilibria. One of the advantages of the proposed PI strategy is the use of time-varying input equilibria and zone setpoint temperature, which can lead to substantial energy savings. A comparison with a baseline control strategy with a constant setpoint temperature is presented; the results show good tracking performance and improved energy efficiency in terms of HVAC energy consumption.
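
    The idea of PI control around time-varying equilibria can be sketched as follows: at each step the optimizer supplies a setpoint T_ref(k) and the corresponding input equilibrium u_eq(k), and the PI controller only corrects deviations around that equilibrium. Gains, limits, and names are illustrative assumptions, not the paper's tuning.

```python
class EquilibriumPI:
    """PI controller acting around a time-varying input equilibrium."""

    def __init__(self, kp, ki, dt, u_min=0.0, u_max=1.0):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.u_min, self.u_max = u_min, u_max
        self.integral = 0.0

    def step(self, T_ref, T_zone, u_eq):
        e = T_ref - T_zone                                  # tracking error
        self.integral += e * self.dt
        u = u_eq + self.kp * e + self.ki * self.integral    # feedforward + PI
        u_sat = min(max(u, self.u_min), self.u_max)
        if u != u_sat:                                      # simple anti-windup
            self.integral -= e * self.dt
        return u_sat
```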