31 research outputs found

    Towards Safe Reinforcement Learning via Constraining Conditional Value-at-Risk

    Though deep reinforcement learning (DRL) has achieved substantial success, it may encounter catastrophic failures due to the intrinsic uncertainty of both transition and observation. Most existing methods for safe reinforcement learning can handle only one of transition disturbance or observation disturbance, since these two kinds of disturbance affect different parts of the agent; besides, the popular worst-case return may lead to overly pessimistic policies. To address these issues, we first theoretically prove that the performance degradation under transition disturbance and observation disturbance depends on a novel metric, the Value Function Range (VFR), which corresponds to the gap in the value function between the best state and the worst state. Based on this analysis, we adopt conditional value-at-risk (CVaR) as a risk measure and propose a novel reinforcement learning algorithm, CVaR-Proximal-Policy-Optimization (CPPO), which formalizes the risk-sensitive constrained optimization problem by keeping its CVaR under a given threshold. Experimental results show that CPPO achieves a higher cumulative reward and is more robust against both observation and transition disturbances on a series of continuous control tasks in MuJoCo.
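
    As a rough illustration of the risk measure named in this abstract, the sketch below estimates CVaR from a sample of losses and checks it against a threshold. It is not the authors' CPPO implementation: the function names, the convention that a higher loss is worse, the confidence level `alpha`, and the threshold value are all assumptions made for the example.

```python
import numpy as np


def empirical_cvar(losses, alpha=0.9):
    """Return (VaR, CVaR) of a loss sample at confidence level alpha.

    Assumed convention: higher loss is worse, so CVaR is the mean of the
    losses at or above the alpha-quantile (the value-at-risk).
    """
    losses = np.asarray(losses, dtype=float)
    var = np.quantile(losses, alpha)   # value-at-risk at level alpha
    tail = losses[losses >= var]       # worst-case tail of the sample
    return var, tail.mean()


def satisfies_cvar_constraint(losses, alpha, threshold):
    """Check the hypothetical constraint CVaR_alpha(loss) <= threshold."""
    _, cvar = empirical_cvar(losses, alpha)
    return cvar <= threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical per-episode losses (e.g. negative returns).
    sample = rng.normal(loc=0.0, scale=1.0, size=1000)
    var, cvar = empirical_cvar(sample, alpha=0.9)
    print(f"VaR={var:.3f}  CVaR={cvar:.3f}")
    print("constraint met:", satisfies_cvar_constraint(sample, 0.9, threshold=2.0))
```

    Unlike a worst-case criterion, which looks only at the single worst outcome, this averages over the whole tail beyond the quantile, which is why the abstract contrasts CVaR with overly pessimistic worst-case policies.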

    A Thermoplastic Elastomer Belt Based Robotic Gripper

    Novel robotic grippers have attracted increasing interest recently because of their ability to adapt to a variety of circumstances and their powerful functionality. Unlike traditional grippers, whose fingers are made of mechanical components, novel robotic grippers are typically built from novel structures and materials using novel manufacturing processes. In this paper, a novel robotic gripper with an external frame and an internal net made of thermoplastic elastomer belts is proposed. The gripper grasps objects using the friction between the net and the objects, and its flexible contact surface allows adaptive gripping. Stress simulation is used to explore the relationship between the normal stress on the net and the deformation of the net. Experiments are conducted on a variety of objects to measure the force needed to reliably grip and hold each object. Test results show that the gripper can successfully grip objects with varying shapes, dimensions, and textures. The gripper is promising for grasping fragile objects in industry or in the field, and for grasping marine organisms without hurting them.

    Task Aware Dreamer for Task Generalization in Reinforcement Learning

    A long-standing goal of reinforcement learning is to acquire agents that can learn on training tasks and generalize well to unseen tasks that may share similar dynamics but have different reward functions. A general challenge is to quantitatively measure the similarities between these different tasks, which is vital for analyzing the task distribution and further designing algorithms with stronger generalization. To address this, we present a novel metric named Task Distribution Relevance (TDR), defined via the optimal Q functions of different tasks, to capture the relevance of the task distribution quantitatively. In the case of tasks with a high TDR, i.e., tasks that differ significantly, we show that Markovian policies cannot differentiate them, leading to poor performance. Based on this insight, we encode all historical information into policies for distinguishing different tasks and propose Task Aware Dreamer (TAD), which extends world models into our reward-informed world models to capture invariant latent features over different tasks. In TAD, we calculate the corresponding variational lower bound of the data log-likelihood, including a novel term to distinguish different tasks via states, to optimize reward-informed world models. Extensive experiments in both image-based and state-based control tasks demonstrate that TAD can significantly improve the performance of handling different tasks simultaneously, especially for those with high TDR, and that it generalizes strongly to unseen tasks.

    Atomic Ramsey interferometry with S- and D-band in a triangular optical lattice

    Ramsey interferometers have wide applications in science and engineering. Compared with the traditional interferometer based on internal states, the interferometer with external quantum states has advantages in some applications for quantum simulation and precision measurement. Here, we develop Ramsey interferometry with Bloch states in the S- and D-bands of a triangular optical lattice for the first time. The key to realizing this interferometer in a two-dimensionally coupled lattice is that we use the shortcut method to construct the π/2 pulse. We observe clear Ramsey fringes and analyze the decoherence mechanism of the fringes. Further, we design an echo π pulse between the S- and D-bands, which significantly improves the coherence time. This Ramsey interferometer in the dimensionally coupled lattice has potential applications in quantum simulations of topological physics, frustrated effects, and motional qubit manipulation.

    Computationally Efficient Approximations Using Adaptive Weighting Coefficients for Solving Structural Optimization Problems

    With the rapid development of advanced manufacturing technologies and high demand for innovative lightweight constructions to mitigate environmental and economic impacts, design optimization has attracted increasing attention in many engineering subjects, such as civil, structural, aerospace, automotive, and energy engineering. For nonconvex nonlinear constrained optimization problems with continuous variables, evaluations of the fitness and constraint functions by means of finite element simulations can be extremely expensive. To address this problem with algorithms of sufficient accuracy and lower computational cost, an extended multipoint approximation method (EMAM) and an adaptive weighting-coefficient strategy are proposed to efficiently seek the optimum by integrating metamodels with sequential quadratic programming (SQP). The developed EMAM stems from the principle of polynomial approximation and assimilates the advantages of Taylor's expansion for improving the suboptimal continuous solution. Results on four well-established engineering problems demonstrate the superiority of the proposed EMAM over other evolutionary algorithms (e.g., particle swarm optimization technique, firefly algorithm, genetic algorithm, metaheuristic methods, and other metamodeling techniques) in terms of computational efficiency and accuracy. The developed EMAM reduces the number of simulations during the design phase and provides a wealth of information for designers to effectively tailor the parameters for optimal solutions in simulation-based engineering optimization problems.

    Subpixel edge estimation from Grayscale images

    Bibliography: p. 77-80

    Removal of antibiotics from black water by a membrane filtration-visible light photocatalytic system

    To address the problem of pollution caused by antibiotics in black water, we synthesized membranes containing g-C3N4/TiO2 photocatalysts and tested them for the removal of sulfamethoxazole and tetracycline in pure water and in black water. We compared the basic membrane filtration and photocatalytic performance of the g-C3N4/TiO2 and PVDF membranes, and investigated the influencing factors and application aspects of membrane filtration-photocatalytic systems for antibiotic removal. The anti-fouling performance and reusability of the g-C3N4/TiO2 membranes were investigated by evaluating the fouling reversibility of the photocatalytic membranes. The results showed that g-C3N4/TiO2 significantly improved the porosity, hydrophilicity, and permeability of the membranes. The PgT-3 (PVDF/g-C3N4/TiO2) membrane with 0.03 wt% of g-C3N4/TiO2 had the best overall performance, with removal efficiencies of 72.8 % and 63.9 % for sulfamethoxazole and tetracycline, respectively. A neutral or weakly acidic solution (pH = 5.0-7.0) is favorable for the removal of both antibiotics studied. The complex composition of black water increased the adsorption load on the membrane and inhibited the photocatalysis of the g-C3N4/TiO2 membrane. The absorption of visible light by g-C3N4 accelerates the electron transfer rate and promotes the separation of electrons from holes. The oxidation-active species h+ produced in the system plays an important role in the removal of sulfamethoxazole and tetracycline.

    Study on Crystal Growth of Tobermorite Synthesized by Calcium Silicate Slag and Silica Fume

    To achieve high-value utilization of the secondary solid waste calcium silicate slag (CSS) generated during the extraction of alumina from fly ash, tobermorite was synthesized in this paper using CSS and silica fume (SF) at different hydrothermal synthesis times. The hydrothermal synthesis was evaluated by means of XRD, SEM, EDS, and micropore analysis, and the results are discussed. The results indicate that β-dicalcium silicate, the primary phase in the CSS, partially hydrates at the beginning of hydrothermal synthesis to form mesh-like crystal C-S-H (calcium-rich) and calcium hydroxide. It then reacts with SF to form yarn-like crystal C-S-H (silicon-rich), which further grows into large flake-like crystal C-S-H (silicon-rich) at 3 h. When the synthesis time is 4 h, β-dicalcium silicate completely hydrates, and crystal C-S-H (calcium-rich) and calcium hydroxide further react with the large flake-like crystal C-S-H (silicon-rich) to generate medium flake-like tobermorite. With increasing time, the crystals grow in the order of medium flake-like tobermorite, small flake-like tobermorite, strip flake-like tobermorite, fibrous-like tobermorite, and spindle-like tobermorite, and the average pore volume (APV), average pore diameter (APD), and specific surface area (SSA) first decrease, then increase, and then decrease again. Meanwhile, strip flake-like tobermorite with a higher APV, APD, and SSA can be synthesized at 6 h.

    Hydration Mechanisms of Alkali-Activated Cementitious Materials with Ternary Solid Waste Composition

    To make eco-friendly and efficient use of three kinds of solid waste, namely calcium silicate slag (CSS), fly ash (FA), and blast-furnace slag (BFS), alkali-activated cementitious composite materials were prepared from these three waste products with varying contents of sodium silicate solution. The hydration mechanisms of the cementitious materials were analyzed by X-ray diffraction, Fourier-transform infrared spectroscopy, scanning electron microscopy, and energy dispersive spectroscopy. The results show that the composite is a binary cementitious system composed of C(N)-A-S-H and C-S-H. Si and Al minerals in FA and BFS are depolymerized to form the Q0 structure of SiO4 and AlO4. Meanwhile, β-dicalcium silicate in CSS hydrates to form C-S-H and Ca(OH)2. Part of the Ca(OH)2 reacts with the Q0 structure of AlO4 and SiO4 to produce lawsonite and wairakite with a low polymerization degree of the Si-O and Al-O bonds. With the participation of Na+, part of the Ca(OH)2 reacts with the Q0 structure of AlO4 and the Q3 structure of SiO4, which comes from the sodium silicate solution. When the sodium silicate content is 9.2%, the macro properties of the composites effectively reach saturation. The compressive strength of composites with 9.2% sodium silicate was 23.7 and 35.9 MPa after curing for 7 and 28 days, respectively.