
    Neural Laplace Control for Continuous-time Delayed Systems

    Many real-world offline reinforcement learning (RL) problems involve continuous-time environments with delays. Such environments have two distinctive features: first, the state x(t) is observed at irregular time intervals, and second, the current action a(t) affects only the future state x(t + g), with an unknown delay g > 0. A prime example of such an environment is satellite control, where the communication link between Earth and a satellite causes irregular observations and delays. Existing offline RL algorithms have succeeded in environments with either irregularly observed states or known delays. However, environments involving both irregular observations in time and unknown delays remain an open and challenging problem. To this end, we propose Neural Laplace Control, a continuous-time model-based offline RL method that combines a Neural Laplace dynamics model with a model predictive control (MPC) planner and can learn from an offline dataset sampled at irregular time intervals from an environment with an inherent, unknown, constant delay. We show experimentally that, on continuous-time delayed environments, it achieves near-expert policy performance.
    Comment: Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS) 2023, Valencia, Spain. PMLR: Volume 206. Copyright 2023 by the author(s).
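    The abstract turns on two points: an action a(t) reaches the state only after the delay g, and an MPC planner chooses actions by rolling a dynamics model forward. The Python sketch below illustrates only that interaction and is not Neural Laplace Control: the learned Neural Laplace model is replaced by a known toy plant dx/dt = -x + u(t - g), observations are taken on a regular grid, the delay is assumed known, and the planner searches over constant action sequences on a grid. All names and constants (DT, DELAY_STEPS, the action bounds) are hypothetical.

```python
from collections import deque

import numpy as np

DT = 0.05         # integration step in seconds (assumed)
DELAY_STEPS = 4   # delay g = DELAY_STEPS * DT, assumed known to the planner here


def step(x, u_effective):
    """One Euler step of the toy plant dx/dt = -x + u(t - g)."""
    return x + DT * (-x + u_effective)


def plan_cost(x0, pending, plan, target):
    """Roll out the model: in-flight actions act first, then the candidate plan."""
    x, cost = x0, 0.0
    for u in list(pending) + list(plan):
        x = step(x, u)
        cost += (x - target) ** 2
    return cost


def mpc_action(x, pending, target=1.0, horizon=10):
    """Receding-horizon choice over constant action sequences on a grid."""
    levels = np.linspace(-2.0, 2.0, 81)  # candidate action levels (assumed bounds)
    costs = [plan_cost(x, pending, [u] * horizon, target) for u in levels]
    return float(levels[int(np.argmin(costs))])


def run(n_steps=100):
    x = 0.0
    pending = deque([0.0] * DELAY_STEPS)  # issued actions not yet affecting x
    for _ in range(n_steps):
        u = mpc_action(x, pending)        # plan from the current observation
        x = step(x, pending.popleft())    # the oldest issued action reaches the plant
        pending.append(u)                 # the new action joins the delay queue
    return x


print(f"final state: {run():.3f}")
```

    Keeping the queue of issued but not yet effective actions inside the rollout is what lets the planner account for the delay; a planner that ignored it would tend to keep pushing after enough input was already in flight.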

    Hopping, Landing, and Balancing with Springs

    This work investigates the interaction between a planar double-pendulum robot and springs, where the lower body (the leg) has been modified to include a spring-loaded passive prismatic joint. The thesis explores the mechanical advantage of adding a spring to the robot in hopping, landing, and balancing by formulating the motion problem as a boundary value problem, and it provides a control strategy for these scenarios. It also analyses the robustness of the developed controller to uncertain spring parameters and provides an observer that estimates these parameters while the robot performs a tracking task. Finally, it studies how well IMUs perform under bouncing conditions, which is critical for the proper operation of a hopping or running legged robot.
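    The abstract mentions an observer that estimates uncertain spring parameters during a tracking task but does not describe it. As a hedged illustration of online parameter estimation only, the Python sketch below swaps in recursive least squares (RLS), a standard technique that is not necessarily the thesis's observer, and assumes a linear spring F = k(l0 - l) with measured length l and spring force F. The stiffness, rest length, and noise level are made-up values.

```python
import numpy as np


def rls_spring(lengths, forces, forgetting=1.0):
    """Recursive least squares for the linear spring model F = k * (l0 - l).

    The model is rewritten as F = [1, -l] @ [k * l0, k] so it is linear in theta.
    """
    theta = np.zeros(2)        # running estimate of [k * l0, k]
    P = np.eye(2) * 1e6        # large initial covariance: almost no prior information
    for l, f in zip(lengths, forces):
        phi = np.array([1.0, -l])
        gain = P @ phi / (forgetting + phi @ P @ phi)
        theta = theta + gain * (f - phi @ theta)        # correct by the prediction error
        P = (P - np.outer(gain, phi @ P)) / forgetting  # covariance update
    k = theta[1]
    return k, theta[0] / k     # stiffness k and rest length l0


# Synthetic measurements from a hypothetical spring (k = 2000 N/m, l0 = 0.3 m).
rng = np.random.default_rng(0)
lengths = rng.uniform(0.2, 0.3, size=200)
forces = 2000.0 * (0.3 - lengths) + rng.normal(0.0, 1.0, size=200)
k_hat, l0_hat = rls_spring(lengths, forces)
print(f"k ~ {k_hat:.0f} N/m, l0 ~ {l0_hat:.4f} m")
```

    Because the estimate is updated one sample at a time, the same loop could run alongside a controller; a forgetting factor below 1.0 would let it track slowly drifting parameters.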

    METHODOLOGY AND ANALYSIS FOR EFFICIENT CUSTOM ARCHITECTURE DESIGN USING MACHINE LEARNING

    Machine learning algorithms, especially Deep Neural Networks (DNNs), have revolutionized computing in the last decade. Along with these computational advances, DNNs bring an unprecedented appetite for compute and parallel processing. Computer architects have risen to the challenge by creating novel custom architectures called accelerators. However, given the rapid pace of algorithmic development, accelerator architects are playing catch-up, churning out optimized designs each time new algorithmic changes are published. The accelerator design cycle is also expensive: it requires multiple iterations of design space optimization and expertise in both digital design and the workload domain. It is therefore imperative to build scalable and flexible architectures that adapt well to a variety of workloads. Equally important is developing tools and design methodologies that lower design-time overheads so that subsequent design iterations are fast and sustainable. This thesis takes a three-pronged approach to address these problems and push the frontiers of the DNN accelerator design process. First, the thesis describes a now-popular cycle-accurate DNN accelerator simulator, built to obtain detailed metrics as fast as possible. It also presents a detailed analytical model that helps the designer understand the interactions between workload and architecture parameters; the information from the model can be used directly to prune the design search space and converge faster. Second, the thesis details a couple of flexible yet scalable DNN accelerator architectures.
    Finally, this thesis uses machine learning to capture the design space of DNN accelerators and trains a model to predict optimum configurations when queried with workload parameters and design constraints. The novelty of this work is that it systematically formulates traditional design optimization as a machine learning problem and describes the quality and components of a model that works well across various architecture design tasks. Ph.D.
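    The last contribution frames accelerator design-space exploration as supervised learning: label workloads with their best configuration, then train a model to predict that configuration for new queries. The Python sketch below is a toy version of that formulation under loudly stated assumptions: the workload is a GEMM of size (m, n, k), the design constraint is a fixed budget of 256 processing elements, the cycle count toy_cycles is a hypothetical stand-in rather than the thesis's analytical model or simulator, and a 1-nearest-neighbour classifier stands in for the learned model.

```python
import math

import numpy as np

# Candidate array shapes under a fixed budget of 256 processing elements (assumed).
SHAPES = [(4, 64), (8, 32), (16, 16), (32, 8), (64, 4)]


def toy_cycles(m, n, k, rows, cols):
    """Hypothetical cost: number of output tiles times (reduction length + fill/drain)."""
    return math.ceil(m / rows) * math.ceil(n / cols) * (k + rows + cols)


def oracle_label(m, n, k):
    """Exhaustive search: index of the cheapest shape for a GEMM of size (m, n, k)."""
    costs = [toy_cycles(m, n, k, r, c) for r, c in SHAPES]
    return int(np.argmin(costs))


def features(m, n, k):
    return np.log2([m, n, k])  # log-scale workload dimensions


# Build a training set by labelling randomly sampled workloads with the oracle.
rng = np.random.default_rng(0)
train_x, train_y = [], []
for _ in range(500):
    m, n, k = (int(2 ** rng.integers(2, 11)) for _ in range(3))
    train_x.append(features(m, n, k))
    train_y.append(oracle_label(m, n, k))
train_x = np.array(train_x)
train_y = np.array(train_y)


def predict_shape(m, n, k):
    """1-nearest-neighbour classifier standing in for the learned model."""
    dists = np.linalg.norm(train_x - features(m, n, k), axis=1)
    return SHAPES[train_y[int(np.argmin(dists))]]


print(predict_shape(16, 16, 64), SHAPES[oracle_label(16, 16, 64)])
```

    The expensive search runs once, offline, to produce labels; afterwards a query costs only a model evaluation, which is the trade-off the formulation aims for.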

    Control and safety of fully actuated and underactuated nonlinear systems: from adaptation to robustness to optimality

    The state-of-the-art quadratic program-based control Lyapunov-control barrier function (QP-CLBF) approach is a powerful way to balance safety and stability in a pointwise optimal fashion. Under this approach, however, modeling inaccuracies may degrade closed-loop performance and cause violations of safety-critical constraints. This thesis extends the recently developed QP-CLBF by deriving five novel robust quadratic program-based adaptive control approaches for fully actuated and underactuated nonlinear systems, with a view toward adapting to unknown parameters, being robust to unmodeled dynamics and disturbances, keeping the system within safe sets, and remaining optimal in a pointwise fashion. Simulation and quantitative results demonstrate the superiority of the proposed approaches over the baseline methods. Ph.D.
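    The QP-CLBF approach solves a small quadratic program at every instant. The Python sketch below shows only its safety half, a minimum-norm control barrier function (CBF) filter with a single affine constraint, for which the QP has a closed-form solution; it omits the CLF constraint and all of the thesis's adaptive and robust extensions. The example system (a 1-D single integrator with safe set x <= x_max), the nominal controller, and the gain alpha are assumptions for illustration.

```python
import numpy as np


def cbf_filter(u_nom, lf_h, lg_h, h, alpha=1.0):
    """Closed-form solution of min ||u - u_nom||^2 s.t. lf_h + lg_h @ u + alpha * h >= 0."""
    u_nom = np.asarray(u_nom, dtype=float)
    lg_h = np.asarray(lg_h, dtype=float)
    slack = lf_h + lg_h @ u_nom + alpha * h
    if slack >= 0.0:
        return u_nom                                # nominal input is already safe
    return u_nom - slack * lg_h / (lg_h @ lg_h)     # project onto the constraint boundary


def simulate(steps=1000, dt=0.01, x_max=1.0, x_goal=2.0):
    """Single integrator dx/dt = u whose nominal controller aims past the safe limit."""
    x, trajectory = 0.0, []
    for _ in range(steps):
        u_nom = np.array([2.0 * (x_goal - x)])      # nominal controller drives toward x_goal
        h = x_max - x                               # h >= 0 defines the safe set x <= x_max
        u = cbf_filter(u_nom, lf_h=0.0, lg_h=[-1.0], h=h)
        x += dt * u[0]                              # Euler step of dx/dt = u
        trajectory.append(x)
    return trajectory


traj = simulate()
print(f"max x = {max(traj):.4f}, final x = {traj[-1]:.4f}")
```

    With one constraint the QP reduces to projecting the nominal input onto a half-space, so no solver is needed; once a CLF constraint is added as well, a numerical QP solver would normally be used.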

    A survey of preference-based reinforcement learning methods

    Reinforcement learning (RL) techniques optimize the accumulated long-term reward of a suitably chosen reward function. However, designing such a reward function often requires a lot of task-specific prior knowledge. The designer needs to consider different objectives that influence not only the learned behavior but also the learning progress. To alleviate these issues, preference-based reinforcement learning (PbRL) algorithms have been proposed that can learn directly from an expert's preferences instead of a hand-designed numeric reward. PbRL has gained traction in recent years due to its ability to resolve the reward shaping problem, its ability to learn from non-numeric rewards, and the possibility of reducing the dependence on expert knowledge. We provide a unified framework for PbRL that describes the task formally and points out the different design principles that affect the evaluation task for the human as well as the computational complexity. The design principles include the type of feedback that is assumed, the representation that is learned to capture the preferences, the optimization problem that has to be solved, and how the exploration/exploitation problem is tackled. Furthermore, we point out shortcomings of current algorithms, propose open research questions, and briefly survey practical tasks that have been solved using PbRL.
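    One common way to learn a reward from pairwise preferences, and one instance of the "representation that is learned to capture the preferences" design choice, is a Bradley-Terry (logistic) model over trajectory returns. The Python sketch below fits a linear reward r(s) = w^T phi(s) by gradient ascent on the preference log-likelihood. It assumes each trajectory is summarized by its summed feature vector and uses synthetic preferences generated from a made-up w_true; it is not taken from the survey.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_reward(feat_a, feat_b, prefs, lr=0.1, iters=2000):
    """Maximum-likelihood reward weights under a Bradley-Terry preference model.

    P(trajectory a preferred over b) = sigmoid(w @ f_a - w @ f_b), where f is the
    summed feature vector of a trajectory and the reward is linear in the features.
    """
    diff = feat_a - feat_b                        # (n_pairs, n_features)
    w = np.zeros(diff.shape[1])
    for _ in range(iters):
        p = sigmoid(diff @ w)                     # predicted P(a preferred over b)
        grad = diff.T @ (prefs - p) / len(prefs)  # gradient of mean log-likelihood
        w += lr * grad                            # gradient ascent step
    return w


# Synthetic preferences from a hypothetical "true" reward (for illustration only).
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
feat_a = rng.normal(size=(2000, 3))               # stand-ins for summed trajectory features
feat_b = rng.normal(size=(2000, 3))
p_true = sigmoid((feat_a - feat_b) @ w_true)
prefs = (rng.uniform(size=2000) < p_true).astype(float)  # 1.0 means "a preferred"

w_hat = fit_reward(feat_a, feat_b, prefs)
cosine = w_hat @ w_true / (np.linalg.norm(w_hat) * np.linalg.norm(w_true))
print(f"estimated weights: {np.round(w_hat, 2)}, cosine similarity: {cosine:.3f}")
```

    The learner never sees a numeric reward, only which trajectory of each pair was preferred, yet it recovers a reward that ranks trajectories the same way; a policy could then be optimized against w_hat with any standard RL method.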