52 research outputs found
Sim-to-Real Transfer of Robotic Control with Dynamics Randomization
Simulations are attractive environments for training agents as they provide
an abundant source of data and alleviate certain safety concerns during the
training process. But the behaviours developed by agents in simulation are
often specific to the characteristics of the simulator. Due to modeling error,
strategies that are successful in simulation may not transfer to their real
world counterparts. In this paper, we demonstrate a simple method to bridge
this "reality gap". By randomizing the dynamics of the simulator during
training, we are able to develop policies that are capable of adapting to very
different dynamics, including ones that differ significantly from the dynamics
on which the policies were trained. This adaptivity enables the policies to
generalize to the dynamics of the real world without any training on the
physical system. Our approach is demonstrated on an object pushing task using a
robotic arm. Despite being trained exclusively in simulation, our policies are
able to maintain a similar level of performance when deployed on a real robot,
reliably moving an object to a desired location from random initial
configurations. We explore the impact of various design decisions and show that
the resulting policies are robust to significant calibration error.
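The core idea of the abstract above, re-sampling the simulator's dynamics for every training episode, can be sketched in a few lines. This is an illustrative toy only, not the authors' setup: the 1-D point mass, the parameter names and ranges in `PARAM_RANGES`, and the fixed PD controller standing in for a learned policy are all assumptions made for the example.

```python
import random

# Hypothetical parameter ranges, chosen only for illustration.
PARAM_RANGES = {
    "mass": (0.5, 2.0),      # kg
    "friction": (0.1, 1.0),  # viscous damping coefficient
}


def sample_dynamics(rng):
    """Draw one set of simulator dynamics uniformly from PARAM_RANGES."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}


def step(pos, vel, force, dyn, dt=0.01):
    """One semi-implicit Euler step of a 1-D point mass with viscous friction."""
    acc = (force - dyn["friction"] * vel) / dyn["mass"]
    vel = vel + acc * dt
    pos = pos + vel * dt
    return pos, vel


def pd_policy(pos, vel, target):
    """Fixed PD controller; a stand-in for the policy being trained."""
    return 20.0 * (target - pos) - 5.0 * vel


def run_episode(policy, dyn, target=1.0, steps=200):
    """Roll out the policy under the given dynamics; return final distance to target."""
    pos, vel = 0.0, 0.0
    for _ in range(steps):
        force = policy(pos, vel, target)
        pos, vel = step(pos, vel, force, dyn)
    return abs(target - pos)


def training_errors(n_episodes=5, seed=0):
    """Evaluate the policy over episodes, each with freshly randomized dynamics."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_episodes):
        dyn = sample_dynamics(rng)  # new dynamics every episode
        errors.append(run_episode(pd_policy, dyn))
    return errors
```

In an actual training loop, the final error (or a reward derived from it) would drive a policy update after each episode; here the policy is fixed so the sketch stays self-contained.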
Overcoming Exploration in Reinforcement Learning with Demonstrations
Exploration in environments with sparse rewards has been a persistent problem
in reinforcement learning (RL). Many tasks are natural to specify with a sparse
reward, and manually shaping a reward function can result in suboptimal
performance. However, finding a non-zero reward is exponentially more difficult
with increasing task horizon or action dimensionality. This puts many
real-world tasks out of practical reach of RL methods. In this work, we use
demonstrations to overcome the exploration problem and successfully learn to
perform long-horizon, multi-step robotics tasks with continuous control such as
stacking blocks with a robot arm. Our method, which builds on top of Deep
Deterministic Policy Gradients and Hindsight Experience Replay, provides an
order of magnitude of speedup over RL on simulated robotics tasks. It is simple
to implement and makes only the additional assumption that we can collect a
small set of demonstrations. Furthermore, our method is able to solve tasks not
solvable by either RL or behavior cloning alone, and often ends up
outperforming the demonstrator policy.
Comment: 8 pages, ICRA 201
Asymmetric Actor Critic for Image-Based Robot Learning
Deep reinforcement learning (RL) has proven a powerful technique in many
sequential decision making domains. However, robotics poses many challenges for
RL; most notably, training on a physical system can be expensive and dangerous,
which has sparked significant interest in learning control policies using a
physics simulator. While several recent works have shown promising results in
transferring policies trained in simulation to the real world, they often do
not fully utilize the advantage of working with a simulator. In this work, we
exploit the full state observability in the simulator to train better policies
which take as input only partial observations (RGBD images). We do this by
employing an actor-critic training algorithm in which the critic is trained on
full states while the actor (or policy) gets rendered images as input. We show
experimentally on a range of simulated tasks that using these asymmetric inputs
significantly improves performance. Finally, we combine this method with domain
randomization and show real robot experiments for several tasks like picking,
pushing, and moving a block. We achieve this simulation to real world transfer
without training on any real world data.
Comment: Videos of experiments can be found at http://www.goo.gl/b57WT
Expression of proteins associated with therapy resistance in rhabdomyosarcoma and neuroblastoma tumour cells
The activity of multidrug resistance (MDR) proteins in tumour cells is associated
with an increased resistance to therapy and in consequence with a decreased
effectiveness of chemotherapy. The majority of MDR molecules belong to a family
of ABC (ATP binding cassette) transporters. Neuroblastoma (NBL) and
rhabdomyosarcoma (RMS) are common solid tumours of childhood. The response
to therapy is better in NBL, worse in RMS, but still unsatisfactory despite surgery
and aggressive chemotherapy. The immunohistochemical staining for p-gp
(p-glycoprotein), MRP1 (multidrug resistance associated protein 1), BCRP (breast
cancer resistance protein) and LRP (lung resistance protein) expression was
performed in primary tumour sections of NBL (10 cases) and RMS (10 cases).
A different pattern of MDR expression was noted in NBL and RMS. In NBL,
MRP1 was expressed in all studied tumours, p-gp and BCRP in only 3 out of 10
tumours, and LRP in 4 cases. The combination of more than one protein was noted in
the majority of NBL tumours. In RMS, the expression of 3 or 4 MDR proteins was
noted in 9 cases. The high expression of an MDR protein profile in RMS suggests
various mechanisms acting simultaneously, which might explain chemotherapy
resistance and a low percentage of long-term survival in this tumour.
Domain Randomization and Generative Models for Robotic Grasping
Deep learning-based robotic grasping has made significant progress thanks to
algorithmic improvements and increased data availability. However,
state-of-the-art models are often trained on as few as hundreds or thousands of
unique object instances, and as a result generalization can be a challenge.
In this work, we explore a novel data generation pipeline for training a deep
neural network to perform grasp planning that applies the idea of domain
randomization to object synthesis. We generate millions of unique, unrealistic
procedurally generated objects, and train a deep neural network to perform
grasp planning on these objects.
Since the distribution of successful grasps for a given object can be highly
multimodal, we propose an autoregressive grasp planning model that maps sensor
inputs of a scene to a probability distribution over possible grasps. This
model allows us to sample grasps efficiently at test time (or avoid sampling
entirely).
We evaluate our model architecture and data generation pipeline in simulation
and the real world. We find we can achieve a 90% success rate on previously
unseen realistic objects at test time in simulation despite having only been
trained on random objects. We also demonstrate an 80% success rate on
real-world grasp attempts despite having only been trained on random simulated
objects.
Comment: 8 pages, 11 figures. Submitted to 2018 IEEE/RSJ International
Conference on Intelligent Robots and Systems (IROS 2018)
“PI OF THE SKY” OFF-LINE EXPERIMENT WITH GLORIA
GLORIA is the first free and open-access network of robotic telescopes in the world. Based on the Web 2.0 environment, amateur and professional users can do research in astronomy by observing with robotic telescopes, and/or analyzing data acquired with GLORIA, or from other free access databases. The GLORIA project develops free standards, protocols and tools for controlling Robotic Telescopes and related instrumentation, for scheduling observations in the telescope network, and for conducting so-called off-line experiments based on the analysis of astronomical data. This contribution summarizes the implementation and results from the first research level off-line demonstrator experiment implemented in GLORIA, which was based on data collected with the “Pi of the Sky” telescope in Chile.
Evaluation of the In vitro cytotoxic activity of caffeic acid derivatives and liposomal formulation against pancreatic cancer cell lines
Pancreatic cancer belongs to the most aggressive group of cancers, with very poor prognosis. Therefore, there is an important need to find more potent drugs that could deliver an improved therapeutic approach. In the current study we searched for selective and effective caffeic acid derivatives. For this purpose, we analyzed twelve compounds and evaluated their in vitro cytotoxic activity against two human pancreatic cancer cell lines, along with a control, normal fibroblast cell line, by the classic MTT assay. Six out of twelve tested caffeic acid derivatives showed a desirable effect. To improve the therapeutic efficacy of such active compounds, we developed a formulation where caffeic acid derivative (7) was encapsulated into liposomes composed of soybean phosphatidylcholine and DSPE-PEG2000. Subsequently, we analyzed the properties of this formulation in terms of basic physical parameters (such as size, zeta potential, stability at 4 °C and morphology), hemolytic and cytotoxic activity and cellular uptake. Overall, the liposomal formulation was found to be stable, non-hemolytic and had activity against pancreatic cancer cells (IC50 19.44 µM and 24.3 µM, towards AsPC1 and BxPC3 cells, respectively) with less toxicity against normal fibroblasts. This could represent a promising alternative to currently available treatment options.
DEAD TIME MEASUREMENT BY THE TWO-SOURCE METHOD – OPTIMIZATION OF THE MEASUREMENT TIME DIVISION
The article presents an analysis of dead time measurement by the two-source method for a non-paralyzable detector. It determines the optimal division of the count-rate measurement time between the single-source and two-source measurements. The results can be used to optimize dead time measurement in systems that count photons or particles.
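For orientation, the two-source estimate for a non-paralyzable detector can be sketched as below. This is not the article's time-division optimization. It is only the standard textbook relation, under the assumption of negligible background, in which the true rates add (n1 + n2 = n12) and each measured rate obeys m = n / (1 + n·τ). The function names are hypothetical.

```python
import math


def measured_rate(true_rate, tau):
    """Non-paralyzable detector model: measured rate for a given true rate."""
    return true_rate / (1.0 + true_rate * tau)


def dead_time_two_source(m1, m2, m12):
    """Estimate dead time tau from measured count rates.

    m1, m2 : rates measured with source 1 and source 2 alone
    m12    : rate measured with both sources together
    Assumes zero background (an assumption of this sketch).
    """
    if not (m12 > m1 > 0 and m12 > m2 > 0):
        raise ValueError("expected m12 > m1 > 0 and m12 > m2 > 0")
    prod = m1 * m2
    root = math.sqrt(prod * (m12 - m1) * (m12 - m2))
    return (prod - root) / (prod * m12)
```

A quick self-check is to generate the three measured rates from known true rates and a known τ with `measured_rate`, then confirm that `dead_time_two_source` recovers that τ.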