
    Physics-informed reinforcement learning via probabilistic co-adjustment functions

    Reinforcement learning of real-world tasks is highly data-inefficient, and extensive simulation-based modelling has become the dominant approach for training systems. However, in human-robot interaction and many other real-world settings, no single model fits all cases, owing to differences between individual instances of the system (e.g. different people) or to necessary oversimplifications in the simulation models. This leaves two options: (1) learning the individual system's dynamics approximately from data, which requires data-intensive training, or (2) using a complete digital twin of each instance, which may not be realisable in many cases. We introduce co-kriging adjustment (CKA) and ridge regression adjustment (RRA) as novel ways to combine the advantages of both approaches. Our adjustment methods are based on an auto-regressive AR1 co-kriging model that we integrate with GP priors. This yields a data- and simulation-efficient way of using simplistic simulation models (e.g., a simple two-link model) and rapidly adapting them to individual instances (e.g., the biomechanics of individual people). Using CKA and RRA, we obtain more accurate uncertainty quantification of the entire system's dynamics than pure GP-based and AR1 methods. We demonstrate the efficiency of co-kriging adjustment with an interpretable reinforcement learning control example: learning to control a biomechanical human arm using only a two-link arm simulation model (offline part) and CKA derived from a small amount of interaction data (on the fly, online). Our method provides an efficient, uncertainty-aware way to apply reinforcement learning to complex real-world systems for which only imperfect simulation models exist.
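A minimal sketch of the AR1 co-kriging idea underlying CKA, assuming the classic autoregressive form in which the real system is modelled as a scaled simulator output plus a GP discrepancy, y(x) ≈ ρ·f_sim(x) + δ(x). This is not the authors' CKA or RRA: here ρ is fitted by ordinary least squares, the RBF kernel hyperparameters are fixed rather than estimated, and the toy functions `f_sim` and `f_real` are hypothetical stand-ins for the two-link simulator and the real arm.

```python
import numpy as np

def rbf(a, b, length=0.5, var=1.0):
    # Squared-exponential kernel between two 1-D input arrays
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def fit_ar1_adjustment(x, y, f_sim, noise=1e-2, length=0.5, var=1.0):
    """Fit y(x) ~ rho * f_sim(x) + delta(x) with delta ~ GP(0, k).

    Simplification (assumption): rho is fitted by least squares and the
    kernel hyperparameters are fixed instead of maximising the likelihood.
    """
    s = f_sim(x)
    rho = float(s @ y / (s @ s))           # scale applied to the simulator
    r = y - rho * s                        # discrepancy left for the GP
    K = rbf(x, x, length, var) + noise * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, r))

    def predict(xq):
        Ks = rbf(xq, x, length, var)
        mean = rho * f_sim(xq) + Ks @ alpha          # adjusted prediction
        v = np.linalg.solve(L, Ks.T)
        variance = var - np.sum(v ** 2, axis=0)      # posterior variance of delta
        return mean, np.maximum(variance, 0.0)

    return rho, predict

# Toy usage: a crude simulator and a hypothetical "real" system
f_sim = np.sin
f_real = lambda x: 1.2 * np.sin(x) + 0.3 * x
x_train = np.linspace(0.0, 3.0, 8)
rho, predict = fit_ar1_adjustment(x_train, f_real(x_train), f_sim)
mean, variance = predict(np.array([1.5]))
```

Because the GP only has to model the residual δ rather than the full dynamics, a handful of real samples can be enough, and the posterior variance returns towards the prior far from the data, giving a simple form of uncertainty awareness.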

    Adaptive and learning-based formation control of swarm robots

    Autonomous aerial and wheeled mobile robots play a major role in tasks such as search and rescue, transportation, monitoring, and inspection. However, these operations face several open challenges, including robust autonomy and adaptive coordination under changing environments and operating conditions, particularly for swarm robots with limited communication and perception capabilities. Furthermore, the computational complexity increases exponentially with the number of robots in the swarm. This thesis examines two aspects of the formation control problem. On the one hand, we investigate how formations can be achieved by swarm robots with limited communication and perception (e.g., the Crazyflie nano quadrotor). On the other hand, we explore human-swarm interaction (HSI) and different shared-control mechanisms between humans and swarm robots (e.g., BristleBot) for artistic creation. In particular, we combine bio-inspired techniques (i.e., flocking, foraging) with learning-based control strategies (using artificial neural networks) for adaptive control of multi-robot systems. We first review how learning-based control and networked dynamical systems can be used to assign distributed and decentralized policies to individual robots such that the desired formation emerges from their collective behavior. We then present a novel flocking control method for UAV swarms using deep reinforcement learning. We formulate the flocking formation problem as a partially observable Markov decision process (POMDP) and consider a leader-follower configuration, in which consensus among all UAVs is used to train a shared control policy and each UAV acts on the local information it collects. In addition, to avoid collisions among UAVs and guarantee flocking and navigation, we design a reward function that combines global flocking maintenance, a mutual reward, and a collision penalty.
We adapt deep deterministic policy gradient (DDPG) with centralized training and decentralized execution to obtain the flocking control policy, using actor-critic networks and a global state-space matrix. In the context of swarm robotics in the arts, we investigate how the formation paradigm can serve as an interaction modality through which artists can use swarms aesthetically. In particular, we explore particle swarm optimization (PSO) and random walks to control the communication among a team of robots with swarming behavior for musical creation.
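One way the three reward terms named above could be combined, as a hedged sketch: the abstract names the terms but not their form, so the distance-to-leader flocking term, the nearest-neighbour spacing used for the mutual term, and all weights and radii (`d_des`, `d_safe`, `w_flock`, `w_mutual`, `collision_penalty`) are assumptions for illustration only.

```python
import numpy as np

def flocking_rewards(pos, leader, d_des=1.0, d_safe=0.3,
                     w_flock=1.0, w_mutual=0.5, collision_penalty=10.0):
    """Per-UAV reward: flocking maintenance + mutual term + collision penalty.

    pos: (N, 2) follower positions with N >= 2; leader: (2,) leader position.
    All weights and distances are hypothetical placeholders.
    """
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)              # ignore self-distances
    nearest = dist.min(axis=1)

    # Global flocking maintenance: stay close to the leader
    r_flock = -w_flock * np.linalg.norm(pos - leader, axis=1)
    # Mutual term: keep the nearest neighbour at the desired spacing
    r_mutual = -w_mutual * np.abs(nearest - d_des)
    # Collision penalty when any neighbour is inside the safety radius
    r_coll = -collision_penalty * (nearest < d_safe)
    return r_flock + r_mutual + r_coll

# Three followers; agents 0 and 2 are dangerously close to each other
pos = np.array([[0.0, 0.0], [1.0, 0.0], [0.1, 0.1]])
leader = np.array([0.5, 0.0])
r = flocking_rewards(pos, leader)
```

The function assumes at least two followers; with a single agent the nearest-neighbour distance is infinite and the mutual term is undefined.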

    A brief review of neural networks based learning and control and their applications for robots

    As an imitation of biological nervous systems, neural networks (NNs), which are characterized by powerful learning ability, have been employed in a wide range of applications, such as control of complex nonlinear systems, optimization, system identification, and pattern recognition. This article gives a brief review of state-of-the-art NNs for complex nonlinear systems. Recent progress in NNs, in both theoretical developments and practical applications, is investigated and surveyed. Specifically, NN-based robot learning and control applications are further reviewed, including NN-based robot manipulator control, NN-based human-robot interaction, and NN-based behavior recognition and generation.

    Using wireless sensors and networks program for chemical particle propagation mapping and chemical source localization

    Chemical source localization is a challenging research problem with extensive applications in areas such as anti-terrorism and military operations, the oil and gas industry, and environmental engineering. This dissertation uses wireless sensors and sensor networks to map chemical particle propagation and to localize chemical sources. First, the chemical particle propagation map is built using interpolation and extrapolation methods: interpolation estimates the chemical particle paths between the sensors, and extrapolation estimates them beyond the sensors, so that together they cover the whole area under consideration. Second, a sensor fusion algorithm is proposed. It smooths the chemical particle paths by integrating the values of additional sensors and updating the model parameters. The parameter updates involve sensor fusion between chemical sensors and wind sensors at the same positions, and sensor fusion among sensors at different positions. This algorithm improves the accuracy and efficiency of chemical particle mapping. Finally, a reasoning system is implemented to detect the chemical source within the region once the chemical particle propagation map has been built. This control scheme dynamically analyzes the sensor data and guides the search toward the source. The novel chemical propagation modelling algorithm is implemented in MATLAB. Compared with results from the computational fluid dynamics (CFD) software COMSOL, the algorithm achieves the same level of accuracy while requiring less computation time and memory. Simulations and experiments on detecting a chemical source in both indoor and outdoor environments are presented.
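The interpolation step can be illustrated with inverse-distance weighting (IDW), a standard spatial interpolation technique swapped in here for illustration; the dissertation does not say which interpolation it uses, and the sensor layout, readings, and the function name `idw_map` are hypothetical. Taking the grid cell with the highest interpolated concentration then gives a naive source estimate.

```python
import numpy as np

def idw_map(sensor_xy, readings, grid_xy, power=2.0, eps=1e-12):
    """Inverse-distance-weighted concentration estimate at each grid point."""
    d = np.linalg.norm(grid_xy[:, None, :] - sensor_xy[None, :, :], axis=-1)
    w = 1.0 / (d + eps) ** power        # closer sensors get larger weights
    return (w @ readings) / w.sum(axis=1)

# Hypothetical layout: four sensors at the corners of a 4 x 4 area
sensors = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0]])
readings = np.array([1.0, 5.0, 2.0, 3.0])
xs = np.linspace(0.0, 4.0, 5)
grid = np.array([[x, y] for x in xs for y in xs])
conc = idw_map(sensors, readings, grid)
source_guess = grid[np.argmax(conc)]   # naive estimate: highest concentration
```

IDW is a convex combination of the readings, so it never predicts values above the largest reading and cannot extrapolate beyond the sensors; this is one reason a separate extrapolation step, as described above, would be needed.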

    INTELLIGENT VISION-BASED NAVIGATION SYSTEM

    This thesis presents a complete vision-based navigation system that can plan and follow an obstacle-avoiding path to a desired destination on the basis of an internal map updated with information gathered from its visual sensor. For vision-based self-localization, the system uses new floor-edge-specific filters for detecting floor edges and their pose, a new algorithm for determining the orientation of the robot, and a new procedure for selecting the initial positions in the self-localization procedure. Self-localization is based on matching visually detected features with those stored in a prior map. For planning, the system demonstrates for the first time a real-world application of the neural-resistive grid method to robot navigation. The neural-resistive grid is modified with a new connectivity scheme that represents the collision-free space of a robot with finite dimensions via divergent connections between the spatial memory layer and the neural-resistive grid layer. A new control system is proposed, based on a Smith Predictor architecture modified for navigation applications and for the intermittent, delayed feedback typical of artificial vision. A receding horizon control strategy is implemented using Normalised Radial Basis Function nets as path encoders, to ensure continuous motion during the delay between measurements. The system is tested in a simplified environment where an obstacle placed anywhere is detected visually and integrated into the path planning process. The results show the validity of the control concept and the crucial importance of a robust vision-based self-localization process.
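A sketch of how a normalised RBF net can encode a path as a continuous function of a path parameter s, so that a reference point is available at any moment between delayed visual measurements. The number of centres, the width, and the least-squares fit to waypoints are assumptions; the thesis does not specify how its path encoders are trained, and the Smith Predictor part is not shown.

```python
import numpy as np

def nrbf_features(s, centers, width):
    """Normalised Gaussian RBF activations; each row sums to one."""
    g = np.exp(-0.5 * ((s[:, None] - centers[None, :]) / width) ** 2)
    return g / g.sum(axis=1, keepdims=True)

def fit_path(s, waypoints, centers, width):
    """Least-squares output weights mapping path parameter s to (x, y)."""
    phi = nrbf_features(s, centers, width)
    weights, *_ = np.linalg.lstsq(phi, waypoints, rcond=None)
    return weights

def eval_path(s, weights, centers, width):
    """Continuous reference positions for any s in [0, 1]."""
    return nrbf_features(s, centers, width) @ weights

# Hypothetical waypoints along a parabolic path
s_train = np.linspace(0.0, 1.0, 20)
waypoints = np.column_stack([s_train, s_train ** 2])
centers = np.linspace(0.0, 1.0, 8)
width = 0.1
weights = fit_path(s_train, waypoints, centers, width)
mid = eval_path(np.array([0.5]), weights, centers, width)[0]
```

Normalisation makes the activations a partition of unity, so the encoded path stays a weighted average of the output weights and behaves smoothly between centres.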

    Hierarchical generative modelling for autonomous robots

    Humans can produce complex whole-body motions when interacting with their surroundings by planning, executing and combining individual limb movements. We investigated this fundamental aspect of motor control in the setting of autonomous robotic operations. We approach this problem with hierarchical generative modelling equipped with multi-level planning for autonomous task completion, which mimics the deep temporal architecture of human motor control. Here, temporal depth refers to the nested time scales at which successive levels of a forward or generative model unfold; for example, delivering an object requires a global plan to contextualise the fast coordination of multiple local limb movements. This separation of temporal scales is also relevant to robotics and control. Specifically, to achieve versatile sensorimotor control, it is advantageous to structure the planning and low-level motor control of individual limbs hierarchically. We use numerical and physical simulation to conduct experiments and to establish the efficacy of this formulation. Using a hierarchical generative model, we show how a humanoid robot can autonomously complete a complex task that necessitates a holistic use of locomotion, manipulation, and grasping. Specifically, we demonstrate that a humanoid robot can retrieve and transport a box, open and walk through a door to reach its destination, and approach and kick a football, while showing robust performance in the presence of body damage and ground irregularities. Our findings demonstrate the effectiveness of human-inspired motor control algorithms, and our method provides a viable hierarchical architecture for the autonomous completion of challenging goal-directed tasks.
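A toy illustration of the nested time scales described above, assuming the plan is already given as a list of subgoals: a slow level revisits the subgoal only every `slow_period` steps, while a fast level tracks the current subgoal at every step. This is not the authors' hierarchical generative model; it only shows two control levels running at different temporal depths, and every name and parameter is hypothetical.

```python
import numpy as np

def run_hierarchy(plan, start, slow_period=10, gain=0.3, steps=100, tol=0.1):
    """Toy two-timescale controller: slow subgoal selection, fast tracking."""
    plan = np.asarray(plan, dtype=float)
    pos = np.asarray(start, dtype=float)
    goal_idx = 0
    trace = [pos.copy()]
    for t in range(steps):
        # Slow level: only every `slow_period` steps, move on to the next
        # subgoal if the current one has been reached within `tol`.
        if t % slow_period == 0 and np.linalg.norm(pos - plan[goal_idx]) < tol:
            goal_idx = min(goal_idx + 1, len(plan) - 1)
        # Fast level: proportional step towards the current subgoal.
        pos = pos + gain * (plan[goal_idx] - pos)
        trace.append(pos.copy())
    return np.array(trace), goal_idx

# Hypothetical plan: three subgoals visited in order
plan = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
trace, final_goal = run_hierarchy(plan, [0.0, 0.0], steps=200)
```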

    Structured machine learning models for robustness against different factors of variability in robot control

    An important feature of human sensorimotor skills is our ability to learn to reuse them across different environmental contexts, in part due to our understanding of the attributes of variability in these environments. This thesis explores how the structure of models used within learning for robot control could similarly help autonomous robots cope with variability, hence achieving skill generalisation. The overarching approach is to develop modular architectures that judiciously combine different forms of inductive bias for learning. In particular, we consider how models and policies should be structured in order to achieve robust behaviour in the face of different factors of variation (in the environment, in objects, and in other internal parameters of a policy), with the end goal of more robust, accurate and data-efficient skill acquisition and adaptation. At a high level, variability in skill is determined by variations in the constraints presented by the external environment, and by task-specific perturbations that affect the specification of optimal action. A typical example of environmental perturbation would be variation in lighting and illumination, affecting the noise characteristics of perception. An example of task perturbations would be variation in object geometry, mass or friction, and in the specification of costs associated with speed or smoothness of execution. We counteract these factors of variation by exploring three forms of structuring: utilising separate data sets curated according to the relevant factor of variation, building neural network models that incorporate this factorisation into the very structure of the networks, and learning structured loss functions. The thesis comprises four projects exploring this theme within robotics planning and prediction tasks. First, in the setting of trajectory prediction in crowded scenes, we explore a modular architecture for learning static and dynamic environmental structure.
We show that factorising the prediction problem into these individual representations allows for robust and label-efficient forward modelling, and relaxes the need for full model re-training in new environments. This modularity explicitly allows for a more flexible and interpretable adaptation of trajectory prediction models when using pre-trained state-of-the-art models. We show that this results in more efficient motion prediction and allows for performance comparable to state-of-the-art supervised 2D trajectory prediction. Next, in the domain of contact-rich robotic manipulation, we consider a modular architecture that combines model-free learning from demonstration, in particular dynamic movement primitives (DMPs), with modern model-free reinforcement learning (RL), using both on-policy and off-policy approaches. We show that factorising the skill learning problem into skill acquisition and error correction, through policy adaptation strategies such as residual learning, can help improve the overall performance of policies in contact-rich manipulation. Our empirical evaluation demonstrates how best to do this with DMPs, and we propose “residual Learning from Demonstration” (rLfD), a framework that combines DMPs with RL to learn a residual correction policy. Our evaluations, performed both in simulation and on a physical system, suggest that applying residual learning directly in task space and operating on the full pose of the robot can significantly improve the overall performance of DMPs. We show that rLfD offers a solution that is gentle on the joints and improves the task success and generalisation of DMPs. Moreover, our study shows that the extracted correction policies can be transferred to different geometries and frictions through few-shot task adaptation.
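A one-dimensional sketch of the residual idea in rLfD, assuming the common Ijspeert-style discrete DMP (transformation system τż = K(g − x) − Dz + (g − x0)·f(s), τẋ = z, canonical system τṡ = −α_s·s) with a correction term added to the DMP acceleration. The thesis applies residual learning in task space on the full robot pose with a learned RL policy; here `residual` is a hypothetical callable, and the basis widths and gains are illustrative choices.

```python
import numpy as np

def dmp_rollout(x0, g, weights, residual=None, tau=1.0, dt=0.01, T=1.0,
                K=100.0, alpha_s=4.0):
    """Euler rollout of a 1-D discrete DMP with an optional residual term.

    residual(x, z, s) -> float is a hypothetical stand-in for a learned
    correction policy.
    """
    D = 2.0 * np.sqrt(K)                                   # critical damping
    weights = np.asarray(weights, dtype=float)
    n = len(weights)
    centers = np.exp(-alpha_s * np.linspace(0.0, 1.0, n))  # centres in phase s
    widths = np.full(n, float(n) ** 2)                     # illustrative widths
    x, z, s = float(x0), 0.0, 1.0
    traj = [x]
    for _ in range(int(round(T / dt))):
        psi = np.exp(-widths * (s - centers) ** 2)
        f = (psi @ weights) / (psi.sum() + 1e-10) * s * (g - x0)  # forcing term
        z_dot = (K * (g - x) - D * z + f) / tau            # transformation system
        if residual is not None:
            z_dot += residual(x, z, s)                     # residual correction
        x_dot = z / tau
        s_dot = -alpha_s * s / tau                         # canonical system
        x, z, s = x + x_dot * dt, z + z_dot * dt, s + s_dot * dt
        traj.append(x)
    return np.array(traj)

# Untrained primitive (zero weights) with and without a constant residual
plain = dmp_rollout(0.0, 1.0, np.zeros(10))
shifted = dmp_rollout(0.0, 1.0, np.zeros(10), residual=lambda x, z, s: 10.0)
```

With zero forcing weights the rollout converges to the goal, while a constant residual of 10 shifts the equilibrium by 10/K = 0.1, showing how a correction policy can change the primitive's behaviour without refitting the primitive itself.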
Third, we employ meta-learning to learn time-invariant reward functions, wherein both the objectives of a task (i.e., the reward functions) and the policy for performing that task optimally are learnt simultaneously. We propose a novel inverse reinforcement learning (IRL) formulation that allows us to 1) vary the length of execution by learning time-invariant costs, and 2) relax the temporal alignment requirements for learning from demonstration. We apply our method to two different types of cost formulations and evaluate their performance in the context of learning reward functions for simulated placement and peg-in-hole tasks executed on a 7-DoF KUKA IIWA arm. Our results show that our approach enables learning temporally invariant rewards from misaligned demonstrations that also generalise spatially to out-of-distribution tasks. Finally, we employ our observations to evaluate adversarial robustness in the context of transfer learning from a source network trained on CIFAR-100 to a target network trained on CIFAR-10. Specifically, we study the effects of using robust optimisation in the source and target networks. This allows us to identify transfer learning strategies under which adversarial defences are successfully retained, in addition to revealing potential vulnerabilities. We study the extent to which adversarially robust features can preserve their defence properties against black-box and white-box attacks under three different transfer learning strategies. Our empirical evaluations give insight into how well adversarial robustness generalises under transfer learning.