633 research outputs found

Boosting Continuous Control with Consistency Policy

Authors: Chen Yuhui, Li Haoran, Zhao Dongbin
Publication date: 10/10/2023

Due to its training stability and strong expressiveness, the diffusion model has attracted considerable attention in offline reinforcement learning. However, it also brings several challenges: 1) the need for a large number of diffusion steps makes diffusion-model-based methods time-inefficient and limits their application in real-time control; 2) how to achieve policy improvement with accurate guidance for a diffusion-model-based policy is still an open problem. Inspired by the consistency model, we propose a novel time-efficient method named Consistency Policy with Q-Learning (CPQL), which derives an action from noise in a single step. By establishing a mapping from the reverse diffusion trajectories to the desired policy, we simultaneously address the issues of time efficiency and inaccurate guidance when updating a diffusion-model-based policy with the learned Q-function. We demonstrate that CPQL can achieve policy improvement with accurate guidance for offline reinforcement learning and can be seamlessly extended to online RL tasks. Experimental results indicate that CPQL achieves new state-of-the-art performance on 11 offline and 21 online tasks, improving inference speed by nearly 45 times compared to Diffusion-QL. We will release our code later.
Comment: 18 pages, 9 page

arXiv.org e-Print Archive

Fumarase mediates transcriptional response to nutrient stress

Authors: Qin Zhao, Yuhui Jiang
Publication venue: Shared Science Publishers OG
Publication date: 01/10/2017

A limited nutrient supply normally causes cell growth arrest. Our recent study (Nat Cell Biol. (7):833-843) shows that fumarase (FH), a key enzyme responsible for the conversion between fumarate and malate in the tricarboxylic acid cycle, plays an important role in the cellular response to nutrient condition

Directory of Open Access Journals

RAIN: Your Language Models Can Align Themselves without Finetuning

Authors: Li Yuhui, Wei Fangyun, Zhang Chao, Zhang Hongyang, Zhao Jinjing
Publication date: 13/09/2023

Large language models (LLMs) often demonstrate inconsistencies with human preferences. Previous research gathered human preference data and then aligned the pre-trained models using reinforcement learning or instruction tuning, the so-called finetuning step. In contrast, aligning frozen LLMs without any extra data is more appealing. This work explores the potential of the latter setting. We discover that by integrating self-evaluation and rewind mechanisms, unaligned LLMs can directly produce responses consistent with human preferences via self-boosting. We introduce a novel inference method, Rewindable Auto-regressive INference (RAIN), that allows pre-trained LLMs to evaluate their own generation and use the evaluation results to guide backward rewind and forward generation for AI safety. Notably, RAIN operates without the need for extra data for model alignment and abstains from any training, gradient computation, or parameter updates; during the self-evaluation phase, the model receives guidance on which human preference to align with through a fixed-template prompt, eliminating the need to modify the initial prompt. Experimental results evaluated by GPT-4 and humans demonstrate the effectiveness of RAIN: on the HH dataset, RAIN improves the harmlessness rate of LLaMA 30B over vanilla inference from 82% to 97%, while maintaining the helpfulness rate. Under the leading adversarial attack llm-attacks on Vicuna 33B, RAIN establishes a new defense baseline by reducing the attack success rate from 94% to 19%.

arXiv.org e-Print Archive

DiGAN breakthrough: advancing diabetic data analysis with innovative GAN-based imbalance correction techniques

Authors: Deng Yuhui, Liu Xinhui, Liu Xinzhi, Wu Jingjin, Yue Zhiyi, Zhao Puyang, Zhao Qianyu
Publication date: 15/04/2024

In the rapidly evolving field of medical diagnostics, the challenge of imbalanced datasets, particularly in diabetes classification, calls for innovative solutions. The study introduces DiGAN, an approach that uses Generative Adversarial Networks (GANs) for diabetes data analysis. Departing from traditional methods, DiGAN applies GANs, typically seen in image processing, to diabetes data. This novel application is complemented by integrating the unsupervised Laplacian Score for feature selection. The approach not only surpasses the limitations of existing techniques but also sets a new benchmark in classification accuracy with a 90% weighted F1-score, an improvement of over 20% compared to conventional methods. Additionally, DiGAN demonstrates superior performance over popular SMOTE-based methods in handling extremely imbalanced datasets. This research, focusing on the integrated use of Laplacian Score, GAN, and Random Forest, offers an effective solution to the long-standing data imbalance issue in medical diagnostics.

LSE Research Online

Automated Design of Metaheuristic Algorithms: A Survey

Authors: Cheng Shi, Duan Qiqi, Shi Yuhui, Yan Bai, Zhao Qi
Publication date: 13/11/2023

Metaheuristics have achieved great success in academia and practice because their search logic can be applied to any problem with an available solution representation, solution quality evaluation, and certain notions of locality. Manually designing metaheuristic algorithms for solving a target problem is criticized for being laborious, error-prone, and requiring intensive specialized knowledge. This gives rise to increasing interest in the automated design of metaheuristic algorithms. With computing power to fully explore potential design choices, automated design could reach and even surpass human-level design and could make high-performance algorithms accessible to a much wider range of researchers and practitioners. This paper presents a broad picture of the automated design of metaheuristic algorithms by surveying the common grounds and representative techniques in terms of design space, design strategies, performance evaluation strategies, and target problems in this field.

arXiv.org e-Print Archive