46 research outputs found
Towards practical reinforcement learning for tokamak magnetic control
Reinforcement learning (RL) has shown promising results for real-time control
systems, including the domain of plasma magnetic control. However, there are
still significant drawbacks compared to traditional feedback control approaches
for magnetic confinement. In this work, we address key drawbacks of the RL
method; achieving higher control accuracy for desired plasma properties,
reducing the steady-state error, and decreasing the required time to learn new
tasks. We build on top of \cite{degrave2022magnetic}, and present algorithmic
improvements to the agent architecture and training procedure. We present
simulation results that show up to 65\% improvement in shape accuracy, achieve
substantial reduction in the long-term bias of the plasma current, and
additionally reduce the training time required to learn new tasks by a factor
of 3 or more. We present new experiments using the upgraded RL-based
controllers on the TCV tokamak, which validate the simulation results achieved,
and point the way towards routinely achieving accurate discharges using the RL
approach
Robotic Table Tennis: A Case Study into a High Speed Learning System
We present a deep-dive into a real-world robotic learning system that, in
previous work, was shown to be capable of hundreds of table tennis rallies with
a human and has the ability to precisely return the ball to desired targets.
This system puts together a highly optimized perception subsystem, a high-speed
low-latency robot controller, a simulation paradigm that can prevent damage in
the real world and also train policies for zero-shot transfer, and automated
real world environment resets that enable autonomous training and evaluation on
physical robots. We complement a complete system description, including
numerous design decisions that are typically not widely disseminated, with a
collection of studies that clarify the importance of mitigating various sources
of latency, accounting for training and deployment distribution shifts,
robustness of the perception system, sensitivity to policy hyper-parameters,
and choice of action space. A video demonstrating the components of the system
and details of experimental results can be found at
https://youtu.be/uFcnWjB42I0.Comment: Published and presented at Robotics: Science and Systems (RSS2023