Safe Deep Policy Adaptation
A critical goal of autonomy and artificial intelligence is enabling
autonomous robots to rapidly adapt in dynamic and uncertain environments.
Classic adaptive control and safe control provide stability and safety
guarantees but are limited to specific system classes. In contrast, policy
adaptation based on reinforcement learning (RL) offers versatility and
generalizability but presents safety and robustness challenges. We propose
SafeDPA, a novel RL and control framework that simultaneously tackles the
problems of policy adaptation and safe reinforcement learning. SafeDPA jointly
learns an adaptive policy and dynamics models in simulation, predicts environment
configurations, and fine-tunes the dynamics models with few-shot real-world data. A
safety filter based on Control Barrier Functions (CBFs) is then placed on top of the
RL policy to ensure safety during real-world deployment. We provide theoretical
safety guarantees for SafeDPA and show its robustness against learning errors
and additional perturbations. Comprehensive experiments on
(1) classic control problems (Inverted Pendulum), (2) simulation benchmarks
(Safety Gym), and (3) a real-world agile robotics platform (RC Car) demonstrate
the superiority of SafeDPA over state-of-the-art baselines in both safety and
task performance. In particular, SafeDPA shows notable generalizability,
achieving a 300% increase in safety rate over the baselines under unseen
disturbances in real-world experiments.
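A minimal sketch of the kind of CBF safety filter described above, not the authors' implementation: it assumes known control-affine dynamics x_dot = f(x) + g(x)u and a hand-crafted barrier function h, and projects the RL policy's action onto the set satisfying the standard CBF condition via a small quadratic program. The dynamics, barrier, and helper names below are illustrative assumptions.

import numpy as np
import cvxpy as cp

def cbf_filter(u_rl, x, f, g, h, grad_h, alpha=1.0):
    """Project the RL action u_rl onto the set of actions satisfying the CBF
    condition grad_h(x) @ (f(x) + g(x) u) >= -alpha * h(x), changing it as
    little as possible."""
    u = cp.Variable(len(u_rl))
    constraint = grad_h(x) @ (f(x) + g(x) @ u) >= -alpha * h(x)
    prob = cp.Problem(cp.Minimize(cp.sum_squares(u - u_rl)), [constraint])
    prob.solve()
    return u.value

# Toy double integrator: state x = [position, velocity], keep position <= 1.
f = lambda x: np.array([x[1], 0.0])
g = lambda x: np.array([[0.0], [1.0]])
h = lambda x: 1.0 - x[0] - 0.5 * x[1] ** 2   # hand-crafted barrier (assumption)
grad_h = lambda x: np.array([-1.0, -x[1]])

x = np.array([0.8, 0.5])
u_nominal = np.array([2.0])                  # aggressive action from the RL policy
print(cbf_filter(u_nominal, x, f, g, h, grad_h))  # filtered (braking) action

Near the constraint boundary the filter overrides the nominal action with a decelerating one; far from the boundary the constraint is inactive and the RL action passes through unchanged.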
Safety Index Synthesis via Sum-of-Squares Programming
Control systems often need to satisfy strict safety requirements. A safety
index provides a convenient way to evaluate the safety level of a system and to
derive the resulting safe control policies. However, designing safety index
functions under control limits is difficult and requires a great deal of
expert knowledge. This paper proposes a framework for synthesizing the safety
index for general control systems using sum-of-squares programming. We first
show that ensuring the non-emptiness of the set of safe controls on the safe
set boundary is equivalent to a local manifold positiveness problem. We then
prove that this problem is equivalent to sum-of-squares programming via the
Positivstellensatz of algebraic geometry. We validate the proposed method on
robot arms with different degrees of freedom and ground vehicles. The results
show that the synthesized safety index guarantees safety and that our method
remains effective even for high-dimensional robot systems.
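The core step above is certifying polynomial positivity through sum-of-squares (SOS) programming. The following minimal sketch, not the paper's synthesis tool, checks whether a univariate quartic is SOS by searching for a positive semidefinite Gram matrix with cvxpy; the monomial basis and solver choice are assumptions made for illustration.

import cvxpy as cp

def is_sos_deg4(c):
    """Check whether c[0] + c[1] x + ... + c[4] x^4 is a sum of squares by
    finding a PSD Gram matrix Q with p(x) = z(x)^T Q z(x), z(x) = [1, x, x^2]."""
    Q = cp.Variable((3, 3), symmetric=True)
    constraints = [
        Q >> 0,                       # positive semidefiniteness
        Q[0, 0] == c[0],              # match the x^0 coefficient
        2 * Q[0, 1] == c[1],          # x^1
        2 * Q[0, 2] + Q[1, 1] == c[2],# x^2
        2 * Q[1, 2] == c[3],          # x^3
        Q[2, 2] == c[4],              # x^4
    ]
    prob = cp.Problem(cp.Minimize(0), constraints)
    prob.solve(solver=cp.SCS)
    return prob.status in ("optimal", "optimal_inaccurate")

# (x^2 - 1)^2 = 1 - 2x^2 + x^4 is SOS; x^3 is not.
print(is_sos_deg4([1.0, 0.0, -2.0, 0.0, 1.0]))  # True
print(is_sos_deg4([0.0, 0.0, 0.0, 1.0, 0.0]))   # False (infeasible SDP)

Multivariate problems and the manifold positiveness certificates from the Positivstellensatz follow the same pattern with larger monomial bases and additional multiplier polynomials.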
Learning Human-to-Humanoid Real-Time Whole-Body Teleoperation
We present Human to Humanoid (H2O), a reinforcement learning (RL) based
framework that enables real-time whole-body teleoperation of a full-sized
humanoid robot with only an RGB camera. To create a large-scale retargeted
motion dataset of human movements for humanoid robots, we propose a scalable
"sim-to-data" process to filter and pick feasible motions using a privileged
motion imitator. Afterwards, we train a robust real-time humanoid motion
imitator in simulation using these refined motions and transfer it to the real
humanoid robot in a zero-shot manner. We successfully achieve teleoperation of
dynamic whole-body motions in real-world scenarios, including walking, back
jumping, kicking, turning, waving, pushing, and boxing. To the best of our
knowledge, this is the first demonstration of learning-based real-time
whole-body humanoid teleoperation.
Project website: https://human2humanoid.com
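A rough sketch of the "sim-to-data" filtering idea described above: replay each retargeted clip with a privileged motion imitator in simulation and keep only clips it can track within a tolerance. The helper name, error metric, and threshold below are illustrative assumptions, not the authors' code.

import numpy as np

def filter_feasible_motions(clips, rollout_privileged_imitator, err_threshold=0.15):
    """Keep only retargeted motion clips that a privileged imitator tracks in
    simulation with mean per-frame tracking error below `err_threshold`.
    `rollout_privileged_imitator(clip)` is assumed to return an array of
    per-frame tracking errors (e.g., mean keypoint distance in meters)."""
    feasible = []
    for clip in clips:
        errors = rollout_privileged_imitator(clip)
        if np.mean(errors) < err_threshold:
            feasible.append(clip)
    return feasible

The resulting refined motion set is then what the real-time imitator is trained on before zero-shot transfer to hardware.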
Robust estimation of bacterial cell count from optical density
Optical density (OD) is widely used to estimate the density of cells in liquid culture, but it cannot be compared between instruments without a standardized calibration protocol and is challenging to relate to actual cell count. We address this with an interlaboratory study comparing three simple, low-cost, and highly accessible OD calibration protocols across 244 laboratories, applied to eight strains of constitutive GFP-expressing E. coli. Based on our results, we recommend calibrating OD to estimated cell count using serial dilution of silica microspheres. This approach produces highly precise calibration (95.5% of residuals <1.2-fold), is easily assessed for quality control, and also characterizes the instrument's effective linear range. It can further be combined with fluorescence calibration to obtain units of Molecules of Equivalent Fluorescein (MEFL) per cell, allowing direct comparison and data fusion with flow cytometry measurements: in our study, fluorescence per cell measurements showed only a 1.07-fold mean difference between plate reader and flow cytometry data.
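As a minimal sketch of the microsphere-based calibration described above (not the study's published analysis pipeline), one can fit a proportionality factor between background-subtracted OD readings of a microsphere serial dilution and the known particle counts, then use that factor to convert a sample OD into an estimated cell count. All numbers below are made-up placeholders.

import numpy as np

# Known particle counts of a 2-fold serial dilution of silica microspheres and
# the corresponding background-subtracted OD readings (placeholder values).
particles = np.array([3.0e8, 1.5e8, 7.5e7, 3.75e7, 1.875e7])
od_readings = np.array([0.92, 0.47, 0.24, 0.12, 0.06])

# Least-squares fit of particles ~ k * OD through the origin (within the
# instrument's effective linear range).
k = np.sum(particles * od_readings) / np.sum(od_readings ** 2)

def od_to_cell_count(od_sample, od_blank):
    """Convert a sample OD reading to an estimated cell count using the
    microsphere-derived calibration factor k."""
    return k * (od_sample - od_blank)

print(f"calibration factor: {k:.3e} particles per OD unit")
print(f"estimated cells: {od_to_cell_count(0.35, 0.04):.3e}")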
AutoCost: Evolving Intrinsic Cost for Zero-Violation Reinforcement Learning
Safety is a critical hurdle that limits the application of deep reinforcement learning to real-world control tasks. To this end, constrained reinforcement learning leverages cost functions to improve safety in constrained Markov decision processes. However, constrained methods fail to achieve zero violation even when the cost limit is zero. This paper analyzes the reason for this failure and suggests that a proper cost function plays an important role in constrained RL. Inspired by this analysis, we propose AutoCost, a simple yet effective framework that automatically searches for cost functions that help constrained RL achieve zero-violation performance. We validate the proposed method and the searched cost function on the safety benchmark Safety Gym. We compare augmented agents, which use our cost function to provide additive intrinsic costs to a Lagrangian-based policy learner and a constrained-optimization policy learner, against baseline agents that use the same policy learners but only extrinsic costs. Results show that the converged policies with intrinsic costs achieve zero constraint violation in all environments, with task performance comparable to the baselines.
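A rough sketch of the kind of search loop described above, with hypothetical helpers: `train_constrained_rl` and the Gaussian-perturbation mutation are assumptions, not AutoCost's actual search operators. Candidate intrinsic-cost parameters are scored by training a constrained RL agent with extrinsic plus intrinsic cost, and candidates with zero violations and the highest return are kept.

import numpy as np

def evolve_intrinsic_cost(train_constrained_rl, theta_dim, pop_size=8,
                          generations=10, sigma=0.1, rng=None):
    """Toy evolutionary search over intrinsic-cost parameters theta.
    `train_constrained_rl(theta)` is assumed to train a constrained RL agent
    with cost = extrinsic + intrinsic(theta) and return (episode_return,
    num_violations)."""
    rng = rng or np.random.default_rng(0)
    best_theta = rng.normal(size=theta_dim)
    best_score = -np.inf
    for _ in range(generations):
        candidates = best_theta + sigma * rng.normal(size=(pop_size, theta_dim))
        for theta in candidates:
            ep_return, violations = train_constrained_rl(theta)
            # Prefer zero-violation candidates; break ties by task return.
            score = ep_return if violations == 0 else -np.inf
            if score > best_score:
                best_score, best_theta = score, theta
    return best_theta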
Probabilistic Safeguard for Reinforcement Learning Using Safety Index Guided Gaussian Process Models
Safety is one of the biggest concerns in applying reinforcement learning (RL)
to the physical world. At its core, it is challenging to ensure that RL agents
persistently satisfy a hard state constraint without white-box or black-box
dynamics models. This paper presents an integrated model learning and safe
control framework to safeguard any agent, where its dynamics are learned as
Gaussian processes. The proposed theory provides (i) a novel method to
construct an offline dataset for model learning that best achieves safety
requirements; (ii) a parameterization rule for the safety index to ensure the
existence of safe control; (iii) a safety guarantee in terms of probabilistic
forward invariance when the model is learned using the aforementioned dataset.
Simulation results show that our framework guarantees almost zero safety
violation on various continuous control tasks.
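A minimal sketch of the kind of GP-based safety check described above (not the paper's exact construction): fit a Gaussian process to observed transitions with scikit-learn, then use the predictive mean plus a confidence margin to conservatively test whether a candidate action keeps the safety index from increasing. The kernel, margin, and safety-index form are illustrative assumptions.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Offline dataset mapping (state, action) to the next safety-index value
# (toy 1D example with placeholder dynamics).
X = np.random.default_rng(0).uniform(-1, 1, size=(200, 2))   # columns: [state, action]
phi_next = 0.5 * X[:, 0] + 0.3 * X[:, 1]

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X, phi_next)

def action_probably_safe(state, action, phi_now, beta=2.0):
    """Conservatively check that the predicted next safety index
    (mean + beta * std) does not exceed the current value, i.e. the index
    is non-increasing with high probability."""
    mean, std = gp.predict(np.array([[state, action]]), return_std=True)
    return mean[0] + beta * std[0] <= phi_now

print(action_probably_safe(state=0.4, action=-0.5, phi_now=0.1))

The confidence scaling beta is what turns the learned model into a probabilistic forward-invariance argument: the larger beta is, the more conservative the safeguard becomes.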
State-wise Safe Reinforcement Learning: A Survey
Despite the tremendous success of Reinforcement Learning (RL) algorithms in
simulation environments, applying RL to real-world applications still faces
many challenges. A major concern is safety, in other words, constraint
satisfaction. State-wise constraints are among the most common constraints in
real-world applications and among the most challenging constraints in Safe RL.
Enforcing state-wise constraints is essential for many challenging tasks such
as autonomous driving and robot manipulation. This paper provides a
comprehensive review of existing approaches that address state-wise constraints
in RL. Under the framework of State-wise Constrained Markov Decision Process
(SCMDP), we will discuss the connections, differences, and trade-offs of
existing approaches in terms of (i) safety guarantee and scalability, (ii)
safety and reward performance, and (iii) safety after convergence and during
training. We also summarize limitations of current methods and discuss
potential future directions.
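To make the distinction concrete, here is a small illustrative check (not taken from the survey itself) contrasting the usual CMDP constraint on the cumulative cost with a state-wise constraint that must hold at every step of a trajectory; the cost values and thresholds are placeholders.

import numpy as np

costs = np.array([0.0, 0.2, 0.0, 0.9, 0.0, 0.1])   # per-step costs of one trajectory

# Classic CMDP-style constraint: total (or expected) cost stays below a budget d.
d = 1.5
cmdp_satisfied = costs.sum() <= d

# State-wise constraint: the cost must stay below w at every single step.
w = 0.5
statewise_satisfied = np.all(costs <= w)

print(cmdp_satisfied, statewise_satisfied)   # True, False: the spike at step 3 violates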
A New Framework of the EAP System in Semiconductor Manufacturing Internet of Things
In modern semiconductor manufacturing, the computer-integrated manufacturing system plays an essential role in automation and relies on a large number of software systems. Among them, the equipment automation program (EAP) is one of the fundamental systems supporting the interconnection of various types of equipment. In the traditional EAP, the communication and logic models are tightly coupled, so any exception in the EAP may bring it down and leave no equipment reachable. Additionally, a traditional EAP can handle only a small number of manufacturing tools, and the growing number of tools in a semiconductor fab makes the investment in EAP prohibitive. Fabs therefore urgently need to solve these problems of the traditional EAP. To do so, this work designs a new framework for a distributed EAP system, adopting new technologies to enhance the usability and stability of the EAP. This design philosophy also makes the distributed EAP system more compatible and extensible, and the system can be upgraded as communication and big data technologies advance. Experiments are carried out to verify the stability of the designed distributed EAP system.
Safe Reinforcement Learning via Hierarchical Adaptive Chance-Constraint Safeguards
Ensuring safety in Reinforcement Learning (RL), typically framed as a
Constrained Markov Decision Process (CMDP), is crucial for real-world
exploration applications. Current approaches to handling CMDPs struggle to
balance optimality and feasibility, as direct optimization methods cannot
ensure state-wise in-training safety, and projection-based methods correct
actions inefficiently through lengthy iterations. To address these challenges,
we propose Adaptive Chance-constrained Safeguards (ACS), an adaptive,
model-free safe RL algorithm using the safety recovery rate as a surrogate
chance constraint to iteratively ensure safety during exploration and after
achieving convergence. Theoretical analysis indicates that the relaxed
probabilistic constraint is sufficient to guarantee forward invariance of the
safe set. Extensive experiments on both simulated and real-world
safety-critical tasks demonstrate its effectiveness in enforcing safety (nearly
zero violations) while preserving optimality (+23.8%), robustness, and fast
response in stochastic real-world settings.
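As a rough illustration of the surrogate chance constraint described above (hypothetical helpers, not the ACS implementation): estimate the empirical safety recovery rate from recent rollouts as the fraction of constraint-violating excursions that return to the safe set within a fixed number of steps, and trigger the safeguard whenever that rate drops below a target level.

import numpy as np

def recovery_rate(safe_flags, horizon=10):
    """Estimate the safety recovery rate from a boolean per-step trajectory
    `safe_flags` (True = state is in the safe set): the fraction of unsafe
    excursions that return to the safe set within `horizon` steps."""
    recovered, excursions = 0, 0
    t, T = 0, len(safe_flags)
    while t < T:
        if not safe_flags[t]:
            excursions += 1
            window = safe_flags[t + 1 : t + 1 + horizon]
            if any(window):
                recovered += 1
            # Skip to the end of this unsafe excursion.
            while t < T and not safe_flags[t]:
                t += 1
        else:
            t += 1
    return 1.0 if excursions == 0 else recovered / excursions

flags = [True, True, False, False, True, True, False, True]
rate = recovery_rate(flags, horizon=3)
print(rate, "-> trigger safeguard" if rate < 0.9 else "-> keep exploring")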