Towards Safe Reinforcement Learning via Constraining Conditional Value-at-Risk
Though deep reinforcement learning (DRL) has obtained substantial success, it
may encounter catastrophic failures due to the intrinsic uncertainty of both
transition and observation. Most existing methods for safe reinforcement
learning can handle only transition disturbance or only observation disturbance,
since these two kinds of disturbance affect different parts of the agent;
moreover, the popular worst-case return may lead to overly pessimistic policies.
To address these issues, we first theoretically prove that the performance
degradation under transition disturbance and observation disturbance depends on
a novel metric of Value Function Range (VFR), which corresponds to the gap in
the value function between the best state and the worst state. Based on the
analysis, we adopt conditional value-at-risk (CVaR) as an assessment of risk
and propose a novel reinforcement learning algorithm,
CVaR-Proximal-Policy-Optimization (CPPO), which formalizes the risk-sensitive
constrained optimization problem by keeping its CVaR under a given threshold.
Experimental results show that CPPO achieves a higher cumulative reward and is
more robust against both observation and transition disturbances on a series of
continuous control tasks in MuJoCo.
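To make the CVaR constraint in the abstract above concrete, here is a minimal sketch of an empirical CVaR estimate over sampled episode returns. It is an illustration under stated assumptions, not the CPPO algorithm itself: the loss is assumed to be the negative return, CVaR is estimated as the plain average of the worst (1 - alpha) fraction of losses, and the names `empirical_cvar` and `satisfies_cvar_constraint` are hypothetical.

```python
import math


def empirical_cvar(losses, alpha=0.9):
    """Average of the worst (1 - alpha) fraction of losses.

    Assumption: a simple tail-average estimator; the paper may use a
    different estimator or sign convention.
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    ordered = sorted(losses, reverse=True)  # largest (worst) losses first
    # Keep at least one sample so the tail is never empty.
    k = max(1, math.ceil((1.0 - alpha) * len(ordered)))
    tail = ordered[:k]
    return sum(tail) / len(tail)


def satisfies_cvar_constraint(returns, alpha, threshold):
    """Check whether the CVaR of the loss (negative return) stays under a threshold."""
    losses = [-r for r in returns]  # assumption: loss = -return
    return empirical_cvar(losses, alpha) <= threshold


if __name__ == "__main__":
    sampled_returns = [10.0, 8.0, -4.0, -6.0]  # made-up episode returns
    print(empirical_cvar([-r for r in sampled_returns], alpha=0.5))
    print(satisfies_cvar_constraint(sampled_returns, alpha=0.5, threshold=5.0))
```

Unlike a worst-case criterion, which looks only at the single worst sample, this tail average weighs every sample in the worst fraction, which is one reason CVaR tends to be less pessimistic.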
A Thermoplastic Elastomer Belt Based Robotic Gripper
Novel robotic grippers have attracted increasing interest recently because of
their ability to adapt to a variety of circumstances and their powerful
functionality. Unlike traditional grippers, whose fingers are made of mechanical
components, novel robotic grippers are typically built from new
structures and materials using new manufacturing processes. In this paper, a
robotic gripper with an external frame and an internal net made of thermoplastic
elastomer belts is proposed. The gripper grasps objects using the friction
between the net and the objects. It grips adaptively through a
flexible contact surface. Stress simulation is used to explore the
relationship between the normal stress on the net and the deformation of the net.
Experiments are conducted on a variety of objects to measure the force needed
to reliably grip and hold the object. Test results show that the gripper can
successfully grip objects of varying shapes, dimensions, and textures. The
gripper shows promise for grasping fragile objects in industry or in the
field, and for grasping marine organisms without hurting them.
Task Aware Dreamer for Task Generalization in Reinforcement Learning
A long-standing goal of reinforcement learning is to acquire agents that can
learn on training tasks and generalize well to unseen tasks that may share
similar dynamics but have different reward functions. A general challenge is to
quantitatively measure the similarities between these different tasks, which is
vital for analyzing the task distribution and further designing algorithms with
stronger generalization. To address this, we present a novel metric named Task
Distribution Relevance (TDR) via optimal Q functions of different tasks to
capture the relevance of the task distribution quantitatively. In the case of
tasks with a high TDR, i.e., the tasks differ significantly, we show that the
Markovian policies cannot differentiate them, leading to poor performance.
Based on this insight, we encode all historical information into policies for
distinguishing different tasks and propose Task Aware Dreamer (TAD), which
extends world models into our reward-informed world models to capture invariant
latent features over different tasks. In TAD, we calculate the corresponding
variational lower bound of the data log-likelihood, including a novel term to
distinguish different tasks via states, to optimize reward-informed world
models. Extensive experiments on both image-based and state-based control
tasks demonstrate that TAD significantly improves performance when
handling different tasks simultaneously, especially those with high TDR,
and generalizes well to unseen tasks.
Atomic Ramsey interferometry with S- and D-band in a triangular optical lattice
Ramsey interferometers have wide applications in science and engineering.
Compared with the traditional interferometer based on internal states, the
interferometer with external quantum states has advantages in some applications
for quantum simulation and precision measurement. Here, we develop, for the
first time, a Ramsey interferometer with Bloch states in the S- and D-bands of a
triangular optical lattice. The key to realizing this interferometer in a
two-dimensionally coupled lattice is that we use the shortcut method to
construct the pulses. We observe clear Ramsey fringes and analyze the
decoherence mechanism of the fringes. Further, we design an echo pulse
between the S- and D-bands, which significantly improves the coherence time. This
Ramsey interferometer in the dimensionally coupled lattice has potential
applications in the quantum simulation of topological physics, frustrated
effects, and motional qubit manipulation.
Computationally Efficient Approximations Using Adaptive Weighting Coefficients for Solving Structural Optimization Problems
With the rapid development of advanced manufacturing technologies and the high demand for innovative lightweight constructions that mitigate environmental and economic impacts, design optimization has attracted increasing attention in many engineering fields, such as civil, structural, aerospace, automotive, and energy engineering. For nonconvex nonlinear constrained optimization problems with continuous variables, evaluating the fitness and constraint functions by means of finite element simulations can be extremely expensive. To address this problem with algorithms that offer sufficient accuracy at lower computational cost, an extended multipoint approximation method (EMAM) and an adaptive weighting-coefficient strategy are proposed to seek the optimum efficiently by integrating metamodels with sequential quadratic programming (SQP). The developed EMAM stems from the principle of polynomial approximation and assimilates the advantages of Taylor's expansion for improving the suboptimal continuous solution. Results on four well-established engineering problems demonstrate the superiority of the proposed EMAM over other methods (e.g., particle swarm optimization, the firefly algorithm, the genetic algorithm, other metaheuristic methods, and other metamodeling techniques) in terms of computational efficiency and accuracy. The developed EMAM reduces the number of simulations during the design phase and provides a wealth of information that helps designers tailor the parameters for optimal solutions efficiently in simulation-based engineering optimization problems.
Removal of antibiotics from black water by a membrane filtration-visible light photocatalytic system
To address the problem of pollution caused by antibiotics in black water, we synthesized membranes containing g-C3N4/TiO2 photocatalysts and tested them for the removal of sulfamethoxazole and tetracycline in pure water and in black water. We compared the basic membrane filtration and photocatalytic performance of the g-C3N4/TiO2 and PVDF membranes, and investigated the influencing factors and application aspects of membrane filtration-photocatalytic systems for antibiotic removal. The anti-fouling performance and reusability of the g-C3N4/TiO2 membranes were investigated by evaluating the fouling reversibility of the photocatalytic membranes. The results showed that g-C3N4/TiO2 significantly improved the porosity, hydrophilicity, and permeability of the membranes. The PgT-3 (PVDF/g-C3N4/TiO2) membrane with 0.03 wt% of g-C3N4/TiO2 had the best overall performance, with removal efficiencies of 72.8% and 63.9% for sulfamethoxazole and tetracycline, respectively. A neutral or weakly acidic solution (pH = 5.0-7.0) is favorable for the removal of both antibiotics studied. The complex composition of black water increased the adsorption load on the membrane and inhibited the photocatalysis of the g-C3N4/TiO2 membrane. The absorption of visible light by g-C3N4 accelerates the electron transfer rate and promotes the separation of electrons from holes. The oxidation-active species h+ produced in the system plays an important role in the removal of sulfamethoxazole and tetracycline.
Study on Crystal Growth of Tobermorite Synthesized by Calcium Silicate Slag and Silica Fume
To make high-value use of the secondary solid waste calcium silicate slag (CSS) generated when alumina is extracted from fly ash, tobermorite was synthesized in this paper from CSS and silica fume (SF) at different hydrothermal synthesis times. The hydrothermal synthesis was evaluated by means of XRD, SEM, EDS, and micropore analysis, and the results are discussed. The results indicate that β-dicalcium silicate, the primary phase in the CSS, partially hydrates at the beginning of hydrothermal synthesis to form mesh-like crystal C-S-H (calcium-rich) and calcium hydroxide. It then reacts with SF to form yarn-like crystal C-S-H (silicon-rich), which further grows into large flake-like crystal C-S-H (silicon-rich) at 3 h. When the synthesis time is 4 h, β-dicalcium silicate completely hydrates, and crystal C-S-H (calcium-rich) and calcium hydroxide further react with large flake-like crystal C-S-H (silicon-rich) to generate medium flake-like tobermorite. As the time increases, the hydrothermally synthesized crystals grow in the order of medium flake-like tobermorite, small flake-like tobermorite, strip flake-like tobermorite, fibrous-like tobermorite, and spindle-like tobermorite, and the average pore volume (APV), average pore diameter (APD), and specific surface area (SSA) show a trend of first decreasing, then increasing, and then decreasing again. Meanwhile, strip flake-like tobermorite with a higher APV, APD, and SSA can be synthesized at 6 h.
Hydration Mechanisms of Alkali-Activated Cementitious Materials with Ternary Solid Waste Composition
With a view to the eco-friendly and efficient utilization of three kinds of solid waste, namely calcium silicate slag (CSS), fly ash (FA), and blast-furnace slag (BFS), alkali-activated cementitious composite materials were prepared from these three waste products with varying contents of sodium silicate solution. The hydration mechanisms of the cementitious materials were analyzed by X-ray diffraction, Fourier-transform infrared spectroscopy, scanning electron microscopy, and energy dispersive spectroscopy. The results show that the composite is a binary cementitious system composed of C(N)-A-S-H and C-S-H. Si and Al minerals in FA and BFS are depolymerized to form the Q0 structure of SiO4 and AlO4. Meanwhile, β-dicalcium silicate in CSS hydrates to form C-S-H and Ca(OH)2. Part of the Ca(OH)2 reacts with the Q0 structure of AlO4 and SiO4 to produce lawsonite and wairakite, which have a low degree of polymerization of the Si-O and Al-O bonds. With the participation of Na+, part of the Ca(OH)2 reacts with the Q0 structure of AlO4 and the Q3 structure of SiO4, the latter coming from the sodium silicate solution. When the sodium silicate content is 9.2%, the macro properties of the composites effectively reach saturation. The compressive strengths of composites with 9.2% sodium silicate were 23.7 and 35.9 MPa after curing for 7 and 28 days, respectively.