51 research outputs found

    Sim-to-Real Transfer of Robotic Control with Dynamics Randomization

    Simulations are attractive environments for training agents, as they provide an abundant source of data and alleviate certain safety concerns during the training process. But the behaviours developed by agents in simulation are often specific to the characteristics of the simulator. Due to modeling error, strategies that are successful in simulation may not transfer to their real-world counterparts. In this paper, we demonstrate a simple method to bridge this "reality gap". By randomizing the dynamics of the simulator during training, we are able to develop policies that are capable of adapting to very different dynamics, including ones that differ significantly from the dynamics on which the policies were trained. This adaptivity enables the policies to generalize to the dynamics of the real world without any training on the physical system. Our approach is demonstrated on an object-pushing task using a robotic arm. Despite being trained exclusively in simulation, our policies are able to maintain a similar level of performance when deployed on a real robot, reliably moving an object to a desired location from random initial configurations. We explore the impact of various design decisions and show that the resulting policies are robust to significant calibration error.
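    The per-episode randomization step described above can be sketched as drawing a fresh set of simulator dynamics parameters before each rollout. The parameter names and ranges below are illustrative assumptions, not the paper's actual randomization set.

```python
import random

# Hypothetical randomization ranges; the paper's actual parameter set and
# bounds are not reproduced here, so these names and values are illustrative.
PARAM_RANGES = {
    "link_mass_scale": (0.5, 1.5),     # multiplier on nominal link masses
    "joint_damping": (0.1, 2.0),       # N*m*s/rad
    "friction_coeff": (0.5, 1.2),      # tabletop friction
    "control_latency_s": (0.0, 0.04),  # actuation delay in seconds
}

def sample_dynamics(rng):
    """Draw one set of simulator dynamics for the next training episode."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}

rng = random.Random(0)
episode_dynamics = sample_dynamics(rng)
# Before each rollout the simulator would be reconfigured with these values,
# so the policy never trains against a single fixed set of dynamics.
```

    Because a new draw happens every episode, the policy only ever sees the family of dynamics, which is what forces the adaptive behaviour the abstract describes.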

    Overcoming Exploration in Reinforcement Learning with Demonstrations

    Exploration in environments with sparse rewards has been a persistent problem in reinforcement learning (RL). Many tasks are natural to specify with a sparse reward, and manually shaping a reward function can result in suboptimal performance. However, finding a non-zero reward becomes exponentially more difficult as the task horizon or action dimensionality grows. This puts many real-world tasks out of practical reach of RL methods. In this work, we use demonstrations to overcome the exploration problem and successfully learn to perform long-horizon, multi-step robotics tasks with continuous control, such as stacking blocks with a robot arm. Our method, which builds on top of Deep Deterministic Policy Gradients and Hindsight Experience Replay, provides an order-of-magnitude speedup over RL on simulated robotics tasks. It is simple to implement and makes only the additional assumption that we can collect a small set of demonstrations. Furthermore, our method is able to solve tasks not solvable by either RL or behavior cloning alone, and often ends up outperforming the demonstrator policy. Comment: 8 pages, ICRA 201
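    One simple way to inject demonstrations into off-policy RL is to keep them in the replay buffer and draw a fixed fraction of every minibatch from them. The sketch below illustrates that idea only; it is not the paper's exact DDPG + HER mechanism, and the fraction and buffer layout are assumptions.

```python
import random

class MixedReplayBuffer:
    """Keeps a fixed demonstration set alongside agent experience and draws
    a fixed fraction of every minibatch from the demonstrations."""

    def __init__(self, demos, demo_fraction=0.25):
        self.demos = list(demos)   # demonstration transitions, never evicted
        self.agent = []            # agent-collected transitions
        self.demo_fraction = demo_fraction

    def add(self, transition):
        self.agent.append(transition)

    def sample(self, batch_size, rng):
        n_demo = max(1, int(batch_size * self.demo_fraction))
        batch = rng.choices(self.demos, k=n_demo)  # sampled with replacement
        if self.agent:
            batch += rng.choices(self.agent, k=batch_size - n_demo)
        return batch

rng = random.Random(1)
buf = MixedReplayBuffer(demos=[("demo", i) for i in range(5)])
for i in range(20):
    buf.add(("agent", i))
batch = buf.sample(8, rng)
```

    Seeding updates with demonstration transitions gives the learner non-zero-reward experience from the start, which is what sidesteps the sparse-reward exploration problem the abstract describes.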

    Asymmetric Actor Critic for Image-Based Robot Learning

    Deep reinforcement learning (RL) has proven a powerful technique in many sequential decision-making domains. However, robotics poses many challenges for RL; most notably, training on a physical system can be expensive and dangerous, which has sparked significant interest in learning control policies using a physics simulator. While several recent works have shown promising results in transferring policies trained in simulation to the real world, they often do not fully utilize the advantage of working with a simulator. In this work, we exploit the full state observability in the simulator to train better policies which take as input only partial observations (RGBD images). We do this by employing an actor-critic training algorithm in which the critic is trained on full states while the actor (or policy) gets rendered images as input. We show experimentally on a range of simulated tasks that using these asymmetric inputs significantly improves performance. Finally, we combine this method with domain randomization and show real robot experiments for several tasks like picking, pushing, and moving a block. We achieve this simulation-to-real-world transfer without training on any real-world data. Comment: Videos of experiments can be found at http://www.goo.gl/b57WT
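    The asymmetry described above comes down to the two networks' input signatures: the critic consumes the low-dimensional full state that is available only in simulation, while the actor sees just the rendered observation. The stub functions below are illustrative placeholders, not the paper's architectures.

```python
# Stub networks illustrating the asymmetric wiring only: the critic is given
# the full simulator state, while the actor sees just the rendered observation.

def actor(image_obs):
    """Policy: rendered (flattened) observation -> action. Stub."""
    return [sum(image_obs) / len(image_obs)]

def critic(full_state, action):
    """Q-function: (full state, action) -> scalar value. Stub."""
    return sum(full_state) + sum(action)

full_state = [0.1, -0.2, 0.3]   # e.g. object pose + joint angles (sim only)
image_obs = [0.5] * 16          # e.g. flattened RGBD pixels

action = actor(image_obs)             # the actor never sees the full state
q_value = critic(full_state, action)  # the critic exploits full observability
```

    Since the critic is discarded at deployment, only the image-conditioned actor needs to run on the real robot, which is why the full state never has to be observable outside simulation.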

    Expression of proteins associated with therapy resistance in rhabdomyosarcoma and neuroblastoma tumour cells

    The activity of multidrug resistance (MDR) proteins in tumour cells is associated with increased resistance to therapy and, in consequence, with decreased effectiveness of chemotherapy. The majority of MDR molecules belong to the family of ABC (ATP-binding cassette) transporters. Neuroblastoma (NBL) and rhabdomyosarcoma (RMS) are common solid tumours of childhood. The response to therapy is better in NBL and worse in RMS, but still unsatisfactory despite surgery and aggressive chemotherapy. Immunohistochemical staining for p-gp (p-glycoprotein), MRP1 (multidrug resistance-associated protein 1), BCRP (breast cancer resistance protein) and LRP (lung resistance protein) expression was performed in primary tumour sections of NBL (10 cases) and RMS (10 cases). Different patterns of MDR expression were noted in NBL and RMS. In NBL, MRP1 was expressed in all studied tumours, p-gp and BCRP in only 3 of 10 tumours, and LRP in 4 cases. A combination of more than one protein was noted in the majority of NBL tumours. In RMS, the expression of 3 or 4 MDR proteins was noted in 9 cases. The high expression of an MDR protein profile in RMS suggests various mechanisms acting simultaneously, which might explain chemotherapy resistance and the low percentage of long-term survival in this tumour.

    Domain Randomization and Generative Models for Robotic Grasping

    Deep learning-based robotic grasping has made significant progress thanks to algorithmic improvements and increased data availability. However, state-of-the-art models are often trained on as few as hundreds or thousands of unique object instances, and as a result generalization can be a challenge. In this work, we explore a novel data generation pipeline for training a deep neural network to perform grasp planning that applies the idea of domain randomization to object synthesis. We generate millions of unique, unrealistic procedurally generated objects, and train a deep neural network to perform grasp planning on these objects. Since the distribution of successful grasps for a given object can be highly multimodal, we propose an autoregressive grasp planning model that maps sensor inputs of a scene to a probability distribution over possible grasps. This model allows us to sample grasps efficiently at test time (or avoid sampling entirely). We evaluate our model architecture and data generation pipeline in simulation and the real world. We find we can achieve a >90% success rate on previously unseen realistic objects at test time in simulation, despite having only been trained on random objects. We also demonstrate an 80% success rate on real-world grasp attempts despite having only been trained on random simulated objects. Comment: 8 pages, 11 figures. Submitted to 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2018).
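    The autoregressive sampling idea can be sketched as drawing each grasp coordinate in turn from a distribution conditioned on the coordinates chosen so far. The uniform conditional below is a stand-in for the paper's learned, sensor-conditioned model; dimensions and bin counts are illustrative assumptions.

```python
import random

def conditional_distribution(prefix, n_bins):
    """p(x_i | x_<i): stub returning a uniform distribution regardless of the
    prefix; the learned model would also condition on the sensor input."""
    return [1.0 / n_bins] * n_bins

def sample_grasp(rng, n_dims=4, n_bins=8):
    """Draw one grasp by sampling each coordinate in turn, conditioned on the
    coordinates already chosen (autoregressive factorization)."""
    grasp = []
    for _ in range(n_dims):
        probs = conditional_distribution(grasp, n_bins)
        bin_idx = rng.choices(range(n_bins), weights=probs, k=1)[0]
        grasp.append(bin_idx / (n_bins - 1))  # map bin index into [0, 1]
    return grasp

rng = random.Random(2)
grasp = sample_grasp(rng)
```

    Factorizing the grasp distribution this way is what lets a single model represent the highly multimodal set of successful grasps while still supporting cheap sampling at test time.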

    “Pi of the Sky” off-line experiment with GLORIA

    GLORIA is the first free and open-access network of robotic telescopes in the world. Based on the Web 2.0 environment, amateur and professional users can do research in astronomy by observing with robotic telescopes and/or by analyzing data acquired with GLORIA or from other free-access databases. The GLORIA project develops free standards, protocols and tools for controlling robotic telescopes and related instrumentation, for scheduling observations in the telescope network, and for conducting so-called off-line experiments based on the analysis of astronomical data. This contribution summarizes the implementation of and results from the first research-level off-line demonstrator experiment implemented in GLORIA, which was based on data collected with the “Pi of the Sky” telescope in Chile.

    Evaluation of the In vitro cytotoxic activity of caffeic acid derivatives and liposomal formulation against pancreatic cancer cell lines

    Pancreatic cancer belongs to the most aggressive group of cancers, with a very poor prognosis. Therefore, there is an important need to find more potent drugs that could deliver an improved therapeutic approach. In the current study we searched for selective and effective caffeic acid derivatives. For this purpose, we analyzed twelve compounds and evaluated their in vitro cytotoxic activity against two human pancreatic cancer cell lines, along with a normal fibroblast cell line as control, using the classic MTT assay. Six of the twelve tested caffeic acid derivatives showed the desired effect. To improve the therapeutic efficacy of such active compounds, we developed a formulation in which caffeic acid derivative (7) was encapsulated into liposomes composed of soybean phosphatidylcholine and DSPE-PEG2000. Subsequently, we analyzed the properties of this formulation in terms of basic physical parameters (such as size, zeta potential, stability at 4 °C and morphology), hemolytic and cytotoxic activity, and cellular uptake. Overall, the liposomal formulation was found to be stable and non-hemolytic, and it showed activity against pancreatic cancer cells (IC50 19.44 µM and 24.3 µM towards AsPC1 and BxPC3 cells, respectively) with less toxicity against normal fibroblasts. This could represent a promising alternative to currently available treatment options.

    Dead-time measurement with the two-source method – optimization of the measurement time division

    The article presents an analysis of dead-time measurement for a non-paralyzable detector using the two-source method. The optimum division of the count-rate measurement time between the two-source measurement and the single-source measurement was determined. The results can be used to optimize dead-time measurement in systems that count photons or particles.
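    The two-source estimate the article optimizes admits a simple numerical check. The sketch below assumes the standard first-order formula for a non-paralyzable detector with background neglected, tau ≈ (n1 + n2 − n12) / (2·n1·n2); the article's optimized split of measurement time is not reproduced here.

```python
def dead_time_two_source(n1, n2, n12):
    """First-order two-source estimate of a non-paralyzable detector's dead
    time (background neglected): n1, n2 are the observed count rates with
    each source alone and n12 the observed rate with both sources present."""
    return (n1 + n2 - n12) / (2.0 * n1 * n2)

def observed_rate(true_rate, tau):
    """Non-paralyzable dead-time model: m = N / (1 + N * tau)."""
    return true_rate / (1.0 + true_rate * tau)

# Forward-simulate a detector with a known dead time and recover it.
tau_true = 1e-6                    # seconds
N1, N2 = 20000.0, 30000.0          # true source rates, counts/s
n1 = observed_rate(N1, tau_true)
n2 = observed_rate(N2, tau_true)
n12 = observed_rate(N1 + N2, tau_true)
tau_est = dead_time_two_source(n1, n2, n12)  # close to tau_true
```

    Because the formula is only first-order in the losses, the recovered value deviates slightly from the true dead time at high rates, which is one reason the split of measurement time between the runs is worth optimizing.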