Backward Imitation and Forward Reinforcement Learning via Bi-directional Model Rollouts
Traditional model-based reinforcement learning (RL) methods generate forward
rollout traces using the learnt dynamics model to reduce interactions with the
real environment. A recent model-based RL method learns a
backward model that specifies the conditional probability of the previous state
given the previous action and the current state to additionally generate
backward rollout trajectories. However, in this type of model-based method, the
samples derived from backward rollouts and those from forward rollouts are
simply aggregated to optimize the policy via a model-free RL
algorithm, which may decrease both the sample efficiency and the convergence
rate. This is because such an approach ignores the fact that backward rollout
traces are often generated starting from some high-value states and are
certainly more instructive for improving the agent's behavior. In this
paper, we propose the backward imitation and forward reinforcement learning
(BIFRL) framework where the agent treats backward rollout traces as expert
demonstrations for the imitation of excellent behaviors, and then collects
forward rollout transitions for policy reinforcement. Consequently, BIFRL
empowers the agent to both reach and explore from high-value states in a
more efficient manner, and further reduces real-environment interactions, making it
potentially more suitable for real-robot learning. Moreover, a
value-regularized generative adversarial network is introduced to augment the
valuable states which are infrequently encountered by the agent. Theoretically, we
provide the condition under which BIFRL is superior to the baseline methods.
Experimentally, we demonstrate that BIFRL achieves better sample efficiency
and competitive asymptotic performance on various MuJoCo
locomotion tasks compared with state-of-the-art model-based methods.
Comment: Accepted by IROS202
The global solution of the minimal surface flow and translating surfaces
In this paper, we study evolved surfaces over convex planar domains which are
evolving by the minimal surface flow Here, we specify the
angle of contact of the evolved surface to the boundary cylinder. The
interesting question is to find translating solitons of the form where . Under an angle condition, we can prove the
a priori estimate holds for the translating solitons (i.e., translators),
which implies that the solitons exist. We can prove under a suitable condition on
that there is a global solution of the minimal surface flow. Then we show,
provided the soliton exists, that the global solutions converge to some
translator.
Comment: 16 pages
Laxity-Aware Scalable Reinforcement Learning for HVAC Control
Demand flexibility plays a vital role in maintaining grid balance, reducing
peak demand, and lowering customers' energy bills. Given their highly shiftable
load and significant contribution to a building's energy consumption, Heating,
Ventilation, and Air Conditioning (HVAC) systems can provide valuable demand
flexibility to the power systems by adjusting their energy consumption in
response to electricity price and power system needs. To exploit this
flexibility in both operation time and power, it is imperative to accurately
model and aggregate the load flexibility of a large population of HVAC systems
and to design effective control algorithms. In this paper, we tackle the
curse of dimensionality issue in modeling and control by utilizing the concept
of laxity to quantify the urgency level of each HVAC operation request. We
further propose a two-level approach to address energy optimization for a large
population of HVAC systems. The lower level uses an aggregator that collects
HVAC load laxity information and applies the least-laxity-first (LLF) rule to allocate
real-time power to individual HVAC systems based on the controller's total
power. Due to the complex and uncertain nature of HVAC systems, we leverage a
reinforcement learning (RL)-based controller to schedule the total power based
on the aggregated laxity information and electricity price. We evaluate the
temperature control and energy cost saving performance of a large-scale group
of HVAC systems in both single-zone and multi-zone scenarios, under varying
climate and electricity market conditions. The experimental results indicate that
the proposed approach outperforms the centralized methods in the majority of test
scenarios, and performs comparably to the model-based method in some scenarios.
Comment: In Submission
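The least-laxity-first allocation step described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Request` fields, the name `allocate_llf`, and the greedy "serve the smallest laxity first until the budget runs out" policy are assumptions made for illustration, and the abstract does not define how laxity is computed for an HVAC unit.

```python
from dataclasses import dataclass


@dataclass
class Request:
    unit_id: str
    laxity: float     # hypothetical urgency measure: smaller means more urgent
    demand_kw: float  # power the unit would draw if fully served


def allocate_llf(requests, total_power_kw):
    """Greedy least-laxity-first allocation under a total power budget."""
    allocation = {r.unit_id: 0.0 for r in requests}
    remaining = total_power_kw
    # Serve requests in ascending order of laxity (most urgent first).
    for r in sorted(requests, key=lambda r: r.laxity):
        if remaining <= 0:
            break
        granted = min(r.demand_kw, remaining)
        allocation[r.unit_id] = granted
        remaining -= granted
    return allocation
```

In this sketch the total budget passed as `total_power_kw` stands in for the power scheduled by the upper-level controller; units that are not reached before the budget is exhausted receive zero power for that step.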
Adjustable Robust Reinforcement Learning for Online 3D Bin Packing
Designing effective policies for the online 3D bin packing problem (3D-BPP)
has been a long-standing challenge, primarily due to the unpredictable nature
of incoming box sequences and stringent physical constraints. While current
deep reinforcement learning (DRL) methods for online 3D-BPP have shown
promising results in optimizing average performance over an underlying box
sequence distribution, they often fail in real-world settings where some
worst-case scenarios can materialize. Standard robust DRL algorithms tend to
overly prioritize optimizing the worst-case performance at the expense of
performance under the nominal problem instance distribution. To address these
issues, we first introduce a permutation-based attacker to investigate the
practical robustness of both DRL-based and heuristic methods proposed for
solving online 3D-BPP. Then, we propose an adjustable robust reinforcement
learning (AR2L) framework that allows efficient adjustment of robustness
weights to achieve the desired balance of the policy's performance in average
and worst-case environments. Specifically, we formulate the objective function
as a weighted sum of expected and worst-case returns, and derive a lower
performance bound by relating it to the return under mixture dynamics. To
realize this lower bound, we adopt an iterative procedure that searches for the
associated mixture dynamics and improves the corresponding policy. We integrate
this procedure into two popular robust adversarial algorithms to develop the
exact and approximate AR2L algorithms. Experiments demonstrate that AR2L is
versatile in the sense that it improves policy robustness while maintaining an
acceptable level of performance for the nominal case.
Comment: Accepted to NeurIPS202
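The "weighted sum of expected and worst-case returns" mentioned in the abstract above can be made concrete with a small Python sketch. Everything here is an assumption for illustration: the function name `weighted_objective`, the single weight `alpha`, and the use of the minimum over sampled adversarial returns as a stand-in for the worst-case return. The abstract does not give the exact form used in AR2L.

```python
from statistics import mean


def weighted_objective(nominal_returns, adversarial_returns, alpha):
    """Combine average-case and worst-case performance with weight alpha.

    nominal_returns: episode returns under the nominal box distribution.
    adversarial_returns: episode returns under perturbed box sequences.
    alpha: hypothetical robustness weight (0 ignores the worst case).
    """
    expected = mean(nominal_returns)
    # Assumption: the worst case is approximated by the lowest sampled return.
    worst = min(adversarial_returns)
    return expected + alpha * worst
```

Raising `alpha` in this sketch shifts the objective toward the worst-case term, which mirrors the adjustable trade-off between average and worst-case performance that the abstract describes.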
On the Jets Induced by a Cavitation Bubble Near a Cylinder
The dynamics of cavitation bubbles in the vicinity of a solid cylinder or
fibre are seen in water treatment, demolition and/or cleaning of composite
materials, as well as bio-medical scenarios such as ultrasound-induced bubbles
near the tubular structures in the body. When the bubble collapses near the
surface, violent fluid jets may be generated. Understanding whether these jets
occur and predicting their directions -- departing or approaching the solid
surface -- is crucial for assessing their potential impact on the solid phase.
However, the criteria for classifying the onset and directions of the jets
created by cavitation near the curved surface of a cylinder have not been
established. In this research, we present models to predict the occurrence and
directions of the jet in such scenarios. The onset criteria and the
direction(s) of the jets are dictated by the bubble stand-off distance and the
cylinder diameter. Our models are validated by comprehensive experiments. The
results not only predict the jetting behaviour but can also serve as guidelines for
designing and controlling the jets when a cavitation bubble collapses near a
cylinder, whether for protective or destructive purposes.
Distance-rank Aware Sequential Reward Learning for Inverse Reinforcement Learning with Sub-optimal Demonstrations
Inverse reinforcement learning (IRL) aims to explicitly infer an underlying
reward function based on collected expert demonstrations. Considering that
obtaining expert demonstrations can be costly, the focus of current IRL
techniques is on learning a better-than-demonstrator policy using a reward
function derived from sub-optimal demonstrations. However, existing IRL
algorithms primarily tackle the challenge of trajectory ranking ambiguity when
learning the reward function. They overlook the crucial role of considering the
degree of difference between trajectories in terms of their returns, which is
essential for further removing reward ambiguity. Additionally, it is important
to note that the reward of a single transition is heavily influenced by the
context information within the trajectory. To address these issues, we
introduce the Distance-rank Aware Sequential Reward Learning (DRASRL)
framework. Unlike existing approaches, DRASRL takes into account both the
ranking of trajectories and the degrees of dissimilarity between them to
collaboratively eliminate reward ambiguity when learning a sequence of
contextually informed reward signals. Specifically, we leverage the distance
between policies, from which the trajectories are generated, as a measure to
quantify the degree of difference between trajectories. This distance-aware
information is then used to infer embeddings in the representation space for
reward learning, employing the contrastive learning technique. Meanwhile, we
integrate the pairwise ranking loss function to incorporate ranking information
into the latent features. Moreover, we resort to the Transformer architecture
to capture the contextual dependencies within the trajectories in the latent
space, leading to more accurate reward estimation. Through extensive
experimentation, our DRASRL framework demonstrates significant performance
improvements over previous SOTA methods.
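As a rough illustration of a pairwise ranking loss over trajectory returns, the sketch below uses the common logistic (Bradley-Terry style) form. This is an assumption: the abstract above does not state which ranking loss DRASRL uses, and the function name and arguments are hypothetical.

```python
import math


def pairwise_ranking_loss(return_worse, return_better):
    """Logistic ranking loss for one ordered pair of predicted returns.

    Equals -log(sigmoid(return_better - return_worse)): small when the
    trajectory ranked higher also gets the higher predicted return.
    """
    diff = return_worse - return_better
    # Numerically stable softplus(diff) = log(1 + exp(diff)).
    return max(diff, 0.0) + math.log1p(math.exp(-abs(diff)))
```

When both predicted returns are equal, this loss evaluates to log 2, and it decreases toward zero as the predicted return of the better-ranked trajectory exceeds that of the worse-ranked one.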
Structural analysis of a novel rabbit monoclonal antibody R53 targeting an epitope in HIV-1 gp120 C4 region critical for receptor and co-receptor binding
The fourth conserved region (C4) in the HIV-1 envelope glycoprotein (Env) gp120 is a structural element that is important for its function, as it binds to both the receptor CD4 and the co-receptor CCR5/CXCR4. It has long been known that this region is highly immunogenic and that it harbors B-cell as well as T-cell epitopes. It is the target of a number of antibodies found in animal studies, which are called CD4-blockers. However, the mechanism by which the virus shields itself from such antibody responses is not known. Here, using a novel anti-C4 rabbit monoclonal antibody, R53, we determined the crystal structure of R53 in complex with its epitope peptide. Our data show that although the epitope of R53 covers a highly conserved sequence (433)AMYAPPI(439), it is in the gp120 trimer and in the CD4-bound conformation. Our results suggest a masking mechanism to explain how HIV-1 protects this critical region from the human immune system.
Trace element zinc and skin disorders
Zinc is an essential trace element and an important constituent of proteins and other biological molecules. It has many biological functions, including antioxidant activity, maintenance of skin and mucous membrane integrity, and the promotion of various enzymatic and transcriptional responses. The skin has the third-highest zinc content in the body. Zinc deficiency can lead to a range of skin diseases. Besides acrodermatitis enteropathica, a rare genetic zinc deficiency disorder, zinc deficiency has also been reported in other diseases. In recent years, zinc supplementation has been widely used for various skin conditions, including infectious diseases (viral warts, genital herpes, cutaneous leishmaniasis, leprosy), inflammatory diseases (hidradenitis suppurativa, acne vulgaris, rosacea, eczematous dermatitis, seborrheic dermatitis, psoriasis, Behcet's disease, oral lichen planus), pigmentary diseases (vitiligo, melasma), tumor-associated diseases (basal cell carcinoma), endocrine and metabolic diseases (necrolytic migratory erythema, necrolytic acral erythema), hair diseases (alopecia), and so on. We reviewed the literature on zinc application in dermatology to provide a reference for its better use.
Rabbit anti-HIV-1 monoclonal antibodies raised by immunization can mimic the antigen-binding modes of antibodies derived from HIV-1-infected humans
The rabbit is a commonly used animal model in studying antibody responses in HIV/AIDS vaccine development. However, no rabbit monoclonal antibodies (MAbs) have been developed previously to study the epitope-specific antibody responses against HIV-1 envelope (Env) glycoproteins, and little is known about how the rabbit immune system can mimic the human immune system in eliciting such antibodies. Here we present structural analyses of two rabbit MAbs, R56 and R20, against the third variable region (V3) of HIV-1 gp120. R56 recognizes the well-studied immunogenic region in the V3 crown, while R20 targets a less-studied region at the C terminus of V3. By comparing the Fab/epitope complex structures of these two antibodies raised by immunization with those of the corresponding human antibodies derived from patients chronically infected with HIV-1, we found that rabbit antibodies can recognize immunogenic regions of gp120 and mimic the binding modes of human antibodies. This result can provide new insight into the use of the rabbit as an animal model in AIDS vaccine development.