18 research outputs found

    Hundreds Guide Millions: Adaptive Offline Reinforcement Learning with Expert Guidance

    Full text link
    Offline reinforcement learning (RL) optimizes the policy on a previously collected dataset without any interactions with the environment, yet usually suffers from the distributional shift problem. To mitigate this issue, a typical solution is to impose a policy constraint on a policy improvement objective. However, existing methods generally adopt a "one-size-fits-all" practice, i.e., keeping only a single improvement-constraint balance for all the samples in a mini-batch or even the entire offline dataset. In this work, we argue that different samples should be treated with different policy constraint intensities. Based on this idea, a novel plug-in approach named Guided Offline RL (GORL) is proposed. GORL employs a guiding network, along with only a few expert demonstrations, to adaptively determine the relative importance of the policy improvement and policy constraint for every sample. We theoretically prove that the guidance provided by our method is rational and near-optimal. Extensive experiments on various environments suggest that GORL can be easily installed on most offline RL algorithms with statistically significant performance improvements.
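    The per-sample trade-off the abstract describes can be sketched generically. This is an illustration of a weighted improvement-versus-constraint objective under assumed inputs, not GORL's actual loss; the function name, the inputs, and the weights are all hypothetical, and in GORL the weights would come from the guiding network rather than being hand-set:

```python
import numpy as np

def weighted_offline_loss(q_values, bc_errors, weights):
    """Per-sample balance between policy improvement (maximize Q)
    and policy constraint (minimize behavior-cloning error).
    weights[i] near 1 favors improvement for sample i; near 0
    favors the constraint. All names here are illustrative."""
    return np.mean(-weights * q_values + (1.0 - weights) * bc_errors)

# A fixed balance (the "one-size-fits-all" practice) uses the same
# weight for every sample; an adaptive scheme varies it per sample.
q = np.array([1.0, 2.0, 0.5])     # hypothetical Q-value estimates
bc = np.array([0.2, 0.1, 0.9])    # hypothetical constraint penalties
fixed = weighted_offline_loss(q, bc, np.full(3, 0.5))
adaptive = weighted_offline_loss(q, bc, np.array([0.9, 0.8, 0.1]))
```

    The adaptive weighting pushes improvement on samples the guidance trusts and falls back to the constraint elsewhere, which is the behavior the abstract argues for.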

    Boosting Offline Reinforcement Learning with Action Preference Query

    Full text link
    Training practical agents usually involves offline and online reinforcement learning (RL) to balance the policy's performance and interaction costs. In particular, online fine-tuning has become a commonly used method to correct the erroneous estimates of out-of-distribution data learned in the offline training phase. However, even limited online interactions can be inaccessible or catastrophic for high-stakes scenarios like healthcare and autonomous driving. In this work, we introduce an interaction-free training scheme dubbed Offline-with-Action-Preferences (OAP). The main insight is that, compared to online fine-tuning, querying the preferences between pre-collected and learned actions can be equally or even more helpful to the erroneous estimate problem. By adaptively encouraging or suppressing the policy constraint according to action preferences, OAP can distinguish overestimation from beneficial policy improvement and thus attains a more accurate evaluation of unseen data. Theoretically, we prove a lower bound on the behavior policy's performance improvement brought by OAP. Moreover, comprehensive experiments on the D4RL benchmark and state-of-the-art algorithms demonstrate that OAP yields higher (29% on average) scores, especially on challenging AntMaze tasks (98% higher). Comment: International Conference on Machine Learning 2023

    Avalon's Game of Thoughts: Battle Against Deception through Recursive Contemplation

    Full text link
    Recent breakthroughs in large language models (LLMs) have brought remarkable success in the field of LLM-as-Agent. Nevertheless, a prevalent assumption is that the information processed by LLMs is consistently honest, neglecting the pervasive deceptive or misleading information in human society and AI-generated content. This oversight makes LLMs susceptible to malicious manipulations, potentially resulting in detrimental outcomes. This study utilizes the intricate Avalon game as a testbed to explore LLMs' potential in deceptive environments. Avalon, full of misinformation and requiring sophisticated logic, manifests as a "Game-of-Thoughts". Inspired by the efficacy of humans' recursive thinking and perspective-taking in the Avalon game, we introduce a novel framework, Recursive Contemplation (ReCon), to enhance LLMs' ability to identify and counteract deceptive information. ReCon combines formulation and refinement contemplation processes; formulation contemplation produces initial thoughts and speech, while refinement contemplation further polishes them. Additionally, we incorporate first-order and second-order perspective transitions into these processes respectively. Specifically, the first-order allows an LLM agent to infer others' mental states, and the second-order involves understanding how others perceive the agent's mental state. After integrating ReCon with different LLMs, extensive experimental results from the Avalon game indicate its efficacy in aiding LLMs to discern and maneuver around deceptive information without extra fine-tuning and data. Finally, we offer a possible explanation for the efficacy of ReCon and explore the current limitations of LLMs in terms of safety, reasoning, speaking style, and format, potentially furnishing insights for subsequent research. Comment: 40 pages

    Train Once, Get a Family: State-Adaptive Balances for Offline-to-Online Reinforcement Learning

    Full text link
    Offline-to-online reinforcement learning (RL) is a training paradigm that combines pre-training on a pre-collected dataset with fine-tuning in an online environment. However, the incorporation of online fine-tuning can intensify the well-known distributional shift problem. Existing solutions tackle this problem by imposing a policy constraint on the policy improvement objective in both offline and online learning. They typically advocate a single balance between policy improvement and constraints across diverse data collections. This one-size-fits-all manner may not optimally leverage each collected sample due to the significant variation in data quality across different states. To this end, we introduce Family Offline-to-Online RL (FamO2O), a simple yet effective framework that empowers existing algorithms to determine state-adaptive improvement-constraint balances. FamO2O utilizes a universal model to train a family of policies with different improvement/constraint intensities, and a balance model to select a suitable policy for each state. Theoretically, we prove that state-adaptive balances are necessary for achieving a higher policy performance upper bound. Empirically, extensive experiments show that FamO2O offers a statistically significant improvement over various existing methods, achieving state-of-the-art performance on the D4RL benchmark. Code is available at https://github.com/LeapLabTHU/FamO2O. Comment: NeurIPS 2023 spotlight. 24 pages, 13 figures

    Multiple influence of immune cells in the bone metastatic cancer microenvironment on tumors

    Get PDF
    Bone is a common organ for solid tumor metastasis. Malignant bone tumors become insensitive to systemic therapy after colonization, followed by poor prognosis and a high relapse rate. Immune and bone cells in situ constitute a unique immune microenvironment, which plays a crucial role in the context of bone metastasis. This review first focuses on lymphatic cells in bone metastatic cancer, including their function in tumor dissemination, invasion, growth and possible cytotoxicity-induced eradication. Subsequently, we examine myeloid cells, namely macrophages, myeloid-derived suppressor cells, dendritic cells, and megakaryocytes, evaluating their interaction with cytotoxic T lymphocytes and contribution to bone metastasis. As important components of skeletal tissue, osteoclasts and osteoblasts derived from bone marrow stromal cells, engaging in a 'vicious cycle', accelerate osteolytic bone metastasis. We also explain the concept of tumor dormancy and investigate the underlying role of the immune microenvironment in it. Additionally, a thorough review of emerging treatments for bone metastatic malignancy in clinical research, especially immunotherapy, is presented, indicating current challenges and opportunities in the research and development of bone metastasis therapies.

    Field monitoring and numerical analysis on piled-raft foundation : case study

    No full text
    This thesis presents the results of a detailed back-analysis, using three-dimensional finite-element analysis, of an instrumented piled-raft foundation at a monitoring site. The piled-raft foundation is a composite foundation structure consisting of piles, raft and surrounding soils acting as a whole system. To check the reliability of the soil taking load under the raft and to obtain a reasonable value of the load proportion taken by piles for the soil conditions in Hong Kong, a piled-raft foundation was partially instrumented at the monitoring site. The pile head loading, the raft-soil contact pressure over specified areas and the settlement at the raft top at selected locations were monitored. Comparisons of overall settlement, differential settlements and the load carried by the piles show reasonably good agreement. This is followed by 3D finite-element modeling of the entire piled-raft foundation at the monitored site; the analysis includes a pile-soil slip interface model. The numerical analysis is performed to give insights into (1) the load transfer behavior of the piled-raft foundation and (2) the effects of pile reduction on the pile load ratio. Combining the observations from site monitoring and the results of the numerical analysis, the proportion of load shared between piles and raft is revealed to be 7:3. A lower limit of 0.67 for the pile ratio is proposed for the site after a parametric study in which piles were removed strategically. In spite of the settlement-reducing purpose of the piles, the design of piled-raft foundations still concentrates on providing adequate axial capacity, with the settlement requirement treated as a secondary issue. The significance of the study is that it provides factual evidence of the soil taking load under the raft, and the economic benefit of the piled-raft foundation, as a reduction in piles will save more than 2 million of the construction budget. published_or_final_version. Civil Engineering. Master of Philosophy

    Tiling a strip with triangles

    No full text
    Abstract: In this paper, we introduce the tilings of a 2×n "triangular strip" with triangles. These tilings have connections with Fibonacci numbers, Pell numbers, and other known sequences. We derive several different recurrences, establish some properties of these numbers, give a refined count for these tilings (i.e., by the number and type of triangles used), and establish several properties of these refined counts.
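    The abstract does not state the paper's actual tiling recurrences, but the sequences it names are both second-order linear recurrences, which can be sketched with one generic generator (the function name and initial terms shown are illustrative):

```python
def linear_recurrence(a, b, coeffs, n):
    """First n terms of x[k] = c1*x[k-1] + c2*x[k-2],
    starting from the two initial terms a and b."""
    c1, c2 = coeffs
    seq = [a, b]
    while len(seq) < n:
        seq.append(c1 * seq[-1] + c2 * seq[-2])
    return seq[:n]

# Fibonacci: F(k) = F(k-1) + F(k-2)
fib = linear_recurrence(1, 1, (1, 1), 8)   # 1, 1, 2, 3, 5, 8, 13, 21
# Pell: P(k) = 2*P(k-1) + P(k-2)
pell = linear_recurrence(1, 2, (2, 1), 8)  # 1, 2, 5, 12, 29, 70, 169, 408
```

    Refined tiling counts of the kind the abstract mentions are typically obtained by carrying extra state (number and type of triangles used) through a recurrence of this same shape.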

    Effectiveness and safety of 99Tc-methylene diphosphonate as a disease-modifying anti-rheumatic drug (DMARD) in combination with conventional synthetic (cs) DMARDs in the treatment of rheumatoid arthritis: A systematic review and meta-analysis of 34 randomized controlled trials

    No full text
    Background: Technetium [99Tc] methylene diphosphonate injection (99Tc-MDP) is widely used for the treatment of rheumatoid arthritis (RA), but there is still insufficient evidence for its application. Through meta-analysis and systematic review, this study aimed to evaluate the effectiveness and safety of 99Tc-MDP in combination with conventional synthetic disease-modifying anti-rheumatic drugs (csDMARDs) for RA. Methods: This study was registered on PROSPERO in advance (CRD42021220780). A systematic search was conducted in PubMed, Embase, the Cochrane Library, and multiple international public databases from their inception to April 2023 to identify clinical randomized controlled trials exploring the use of 99Tc-MDP combined with csDMARDs in the treatment of RA. Each outcome was subjected to meta-analysis, and the quality of evidence was assessed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement. The American College of Rheumatology's 50%/70% response criteria (ACR50/70) scores were utilized as the primary effectiveness outcomes, and risks were measured by assessing the rates of adverse events (AEs). Moreover, secondary efficacy outcomes were evaluated, including the Disease Activity Score 28 (DAS28) and bone mineral density (BMD) as joint function indicators, and the erythrocyte sedimentation rate (ESR) and interleukin-17 (IL-17) as inflammatory indicators. Results: In this meta-analysis, a total of 34 studies (2296 patients) were included out of 1149 retrieved studies. The summarized results showed that the treatment group receiving the combination of 99Tc-MDP and csDMARDs had significantly higher ACR50 (RR = 1.32, 95% CI: 1.13–1.55, P = 0.0004) and ACR70 (RR = 1.40, 95% CI: 1.07–1.82, P = 0.01) scores than the control group receiving csDMARDs alone. In addition, the overall incidence of AEs was lower with the combination of 99Tc-MDP and csDMARDs than with csDMARDs alone (RR = 0.75, 95% CI: 0.60–0.93, P = 0.009), but the possibility of phlebitis was higher in the treatment group (RR = 4.15, 95% CI: 1.04–16.50, P = 0.04). The combination of 99Tc-MDP and csDMARDs also had advantages over csDMARDs alone in improving DAS28 (WMD = 1.56, 95% CI: 0.86–2.25, P < 0.0001), BMD (SMD = 1.12, 95% CI: 0.46–1.78, P = 0.0008), ESR (SMD = 0.71, 95% CI: 0.45–0.97, P < 0.00001), and IL-17 (WMD = 5.82, 95% CI: 3.86–7.77, P < 0.00001). However, the above results might have been influenced by the 99Tc-MDP dosage, csDMARD category, and treatment duration. Combining methotrexate and leflunomide, administering continuous treatment for 24 weeks, or using 3 sets of 99Tc-MDP doses (16.5 mg) may be the optimal 99Tc-MDP treatment plan for RA. Conclusion: Compared with csDMARD therapy alone, the combination therapy with 99Tc-MDP is more effective for RA patients and is associated with a lower overall incidence of adverse events, although the possibility of phlebitis was higher. However, due to the inherent limitations of the included RCTs, high-quality clinical trials are still needed to further assess the effectiveness and safety of this combination therapy.
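    The pooled effect sizes quoted in this abstract (e.g. RR = 1.32, 95% CI 1.13–1.55) rest on the standard log risk-ratio method for a single study's 2x2 table; a minimal sketch (the event counts below are hypothetical, not taken from the review's data):

```python
import math

def risk_ratio_ci(events_trt, n_trt, events_ctl, n_ctl, z=1.96):
    """Risk ratio with a z-based confidence interval, using the
    standard log-RR standard error for a 2x2 table."""
    rr = (events_trt / n_trt) / (events_ctl / n_ctl)
    se = math.sqrt(1/events_trt - 1/n_trt + 1/events_ctl - 1/n_ctl)
    lo = math.exp(math.log(rr) - z * se)
    hi = math.exp(math.log(rr) + z * se)
    return rr, lo, hi

# Hypothetical counts: 30/100 responders on combination therapy
# vs. 20/100 on csDMARDs alone.
rr, lo, hi = risk_ratio_ci(30, 100, 20, 100)
```

    A meta-analysis then pools such per-study log-RRs (e.g. by inverse-variance weighting) to obtain the summary RR and CI reported above.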

    Space–Time Analysis of Vehicle Theft Patterns in Shanghai, China

    No full text
    To identify and compare the space–time patterns of vehicle thefts and the effects of associated environmental factors, this paper conducts a case study of the Pudong New Area (PNA), a major urban district in Shanghai, China's largest city. Geographic information system (GIS)-based analysis indicated that there was a stable pattern of vehicle theft over time. Hotspots of vehicle theft across different time periods were identified. These data provide clues for how law enforcement can prioritize the deployment of limited patrol and investigative resources. Vehicle thefts, especially those of non-motor vehicles, tend to be concentrated in the central-western portion of the PNA, which has experienced a dramatic rate of urbanization and has a high concentration of people and vehicles. Important factors contributing to vehicle thefts include a highly mobile and transitory population, high population density, and high traffic volume.

    On the spectral sidebands' evolution of mode-locked fiber lasers

    No full text
    Funding: This work was supported by the National Natural Science Foundation of China (Grant No. 62275060) and the Natural Science Foundation of Heilongjiang Province (Grant Nos. LH2023F029 and LH2019F012). Peer reviewed. Postprint