Boosting Continuous Control with Consistency Policy
Owing to its training stability and strong expressiveness, the diffusion model has
attracted considerable attention in offline reinforcement learning. However,
it also brings several challenges: 1) the large number of diffusion steps
required makes diffusion-model-based methods time-inefficient and
limits their use in real-time control; 2) how to achieve policy
improvement with accurate guidance for a diffusion-model-based policy is still an
open problem. Inspired by the consistency model, we propose a novel
time-efficient method named Consistency Policy with Q-Learning (CPQL), which
derives an action from noise in a single step. By establishing a mapping from the
reverse diffusion trajectories to the desired policy, we simultaneously address
the issues of time efficiency and inaccurate guidance when updating a
diffusion-model-based policy with the learned Q-function. We demonstrate that CPQL can
achieve policy improvement with accurate guidance for offline reinforcement
learning, and can be seamlessly extended to online RL tasks. Experimental
results indicate that CPQL achieves new state-of-the-art performance on 11
offline and 21 online tasks, significantly improving inference speed by nearly
45 times compared to Diffusion-QL. We will release our code later.
Comment: 18 pages, 9 page
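The single-step generation described above can be illustrated with the consistency-model parameterization of Song et al. (2023), in which a network F is wrapped as f(x, t) = c_skip(t)·x + c_out(t)·F(x, t) so that the boundary condition f(x, ε) = x holds exactly. The following is a minimal NumPy sketch under stated assumptions: the constants SIGMA_DATA, EPS and T_MAX, the toy linear map standing in for F, conditioning on the state by concatenation, and clipping actions to [-1, 1] are hypothetical choices, not CPQL's released implementation, and the Q-learning training objective is omitted entirely.

```python
import numpy as np

# Assumed constants (values from the consistency-model paper; CPQL may differ).
SIGMA_DATA = 0.5
EPS = 0.002    # smallest diffusion time: f(x, EPS) must return x unchanged
T_MAX = 80.0   # largest diffusion time, where sampling starts


def c_skip(t):
    # Equals 1 at t = EPS, so the skip path passes x straight through there.
    return SIGMA_DATA**2 / ((t - EPS) ** 2 + SIGMA_DATA**2)


def c_out(t):
    # Equals 0 at t = EPS, so the network output is switched off there.
    return SIGMA_DATA * (t - EPS) / np.sqrt(SIGMA_DATA**2 + t**2)


class ToyConsistencyPolicy:
    """Hypothetical stand-in: F is a random linear map of (state, noisy action, t)."""

    def __init__(self, state_dim, action_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(action_dim, state_dim + action_dim + 1))

    def f(self, state, x, t):
        # Consistency function: skip connection plus scaled network output.
        inp = np.concatenate([state, x, [t]])
        return c_skip(t) * x + c_out(t) * (self.W @ inp)

    def act(self, state, rng):
        # Single-step sampling: draw x_T ~ N(0, T_MAX^2 I), map it to an action once.
        x_T = rng.normal(scale=T_MAX, size=self.W.shape[0])
        return np.clip(self.f(state, x_T, T_MAX), -1.0, 1.0)  # assumed action bounds
```

For example, `ToyConsistencyPolicy(3, 2).act(np.zeros(3), np.random.default_rng(1))` returns a 2-dimensional action after one network call, in contrast with the many denoising calls of a standard diffusion sampler.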
Fumarase mediates transcriptional response to nutrient stress
A limited nutrient supply normally causes cell growth arrest. Our recent study (Nat Cell Biol. (7):833-843) shows that fumarase (FH), a key enzyme responsible for the interconversion of fumarate and malate in the tricarboxylic acid cycle, plays an important role in the cellular response to nutrient conditions
RAIN: Your Language Models Can Align Themselves without Finetuning
Large language models (LLMs) often demonstrate inconsistencies with human
preferences. Previous research gathered human preference data and then aligned
the pre-trained models using reinforcement learning or instruction tuning, the
so-called finetuning step. In contrast, aligning frozen LLMs without any extra
data is more appealing. This work explores the potential of the latter setting.
We discover that by integrating self-evaluation and rewind mechanisms,
unaligned LLMs can directly produce responses consistent with human preferences
via self-boosting. We introduce a novel inference method, Rewindable
Auto-regressive INference (RAIN), that allows pre-trained LLMs to evaluate
their own generation and use the evaluation results to guide backward rewind
and forward generation for AI safety. Notably, RAIN operates without the need
for extra data for model alignment and abstains from any training, gradient
computation, or parameter updates; during the self-evaluation phase, the model
receives guidance on which human preference to align with through a
fixed-template prompt, eliminating the need to modify the initial prompt.
Experimental results evaluated by GPT-4 and humans demonstrate the
effectiveness of RAIN: on the HH dataset, RAIN improves the harmlessness rate
of LLaMA 30B over vanilla inference from 82% to 97%, while maintaining the
helpfulness rate. Under the leading adversarial attack llm-attacks on Vicuna
33B, RAIN establishes a new defense baseline by reducing the attack success
rate from 94% to 19%
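The self-evaluate-and-rewind idea can be illustrated with a deliberately simplified loop: generate a candidate segment, let the model score its own continuation, and rewind (discard the segment and regenerate from the same prefix) when the score is too low. This is not RAIN's actual search procedure, which the abstract does not detail; the callables `propose` and `self_evaluate`, the acceptance threshold and the retry limit are all hypothetical. In practice `self_evaluate` would wrap the model's own judgment, elicited through a fixed-template prompt as the abstract describes.

```python
from typing import Callable, List


def generate_with_rewind(
    propose: Callable[[List[str]], List[str]],
    self_evaluate: Callable[[List[str]], float],
    max_segments: int = 5,
    max_tries: int = 4,
    threshold: float = 0.5,
) -> List[str]:
    """Simplified generate / self-evaluate / rewind loop (hypothetical, not RAIN's search)."""
    output: List[str] = []
    for _ in range(max_segments):
        best_segment: List[str] = []
        best_score = float("-inf")
        for _ in range(max_tries):
            candidate = propose(output)                # forward generation from the prefix
            score = self_evaluate(output + candidate)  # the model scores its own text
            if score > best_score:
                best_segment, best_score = candidate, score
            if score >= threshold:
                break  # accept this segment and move forward
            # Otherwise rewind: drop the candidate and regenerate from the same prefix.
        output.extend(best_segment)  # fall back to the best-scoring attempt
    return output
```

With a stub `propose` that alternates between an "unsafe" and a "safe" segment, and an evaluator that scores only the safe one highly, the loop rewinds past every unsafe segment and keeps only safe ones, all without touching any model parameters.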
DiGAN breakthrough: advancing diabetic data analysis with innovative GAN-based imbalance correction techniques
In medical diagnostics, imbalanced datasets pose a persistent challenge, particularly in diabetes classification. This study introduces DiGAN, an approach that uses Generative Adversarial Networks (GANs) to correct class imbalance in diabetes data. Whereas GANs are typically applied in image processing, DiGAN applies them to diabetes data and combines them with the unsupervised Laplacian Score for feature selection. The approach reaches a 90% weighted F1-score, an improvement of over 20% compared to conventional methods, and outperforms popular SMOTE-based methods on extremely imbalanced datasets. By integrating the Laplacian Score, GAN-based imbalance correction, and a Random Forest classifier, this research offers an effective solution to the long-standing data imbalance issue in medical diagnostics
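The abstract names the unsupervised Laplacian Score as its feature selector. In the standard definition (He, Cai and Niyogi, 2005), a k-nearest-neighbour graph with heat-kernel weights S is built over the samples, D = diag(S·1) and L = D − S, and each feature f is scored as (f̃ᵀ L f̃) / (f̃ᵀ D f̃), where f̃ is f minus its degree-weighted mean; lower scores mark features that better preserve local neighbourhood structure. The NumPy sketch below follows that definition; the neighbourhood size k and the heat-kernel width heuristic are assumptions, since the abstract does not give DiGAN's settings.

```python
import numpy as np


def laplacian_score(X, k=5, t=None):
    """Laplacian Score of each feature (column) of X; lower means the feature
    better preserves the local neighbourhood structure of the samples."""
    n, m = X.shape
    # Pairwise squared Euclidean distances between samples (rows).
    sq = np.sum(X**2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    if t is None:
        t = np.mean(dist2) or 1.0  # assumed heat-kernel width heuristic
    # k-nearest-neighbour graph (self excluded) with heat-kernel weights.
    masked = dist2.copy()
    np.fill_diagonal(masked, np.inf)
    neighbours = np.argsort(masked, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = neighbours.ravel()
    S = np.zeros((n, n))
    S[rows, cols] = np.exp(-dist2[rows, cols] / t)
    S = np.maximum(S, S.T)  # symmetrise: i ~ j if either is a neighbour of the other
    d = S.sum(axis=1)
    D = np.diag(d)
    L = D - S  # graph Laplacian
    scores = np.empty(m)
    for r in range(m):
        f = X[:, r]
        f_tilde = f - (f @ d) / d.sum()  # subtract the degree-weighted mean
        denom = f_tilde @ D @ f_tilde
        scores[r] = (f_tilde @ L @ f_tilde) / denom if denom > 0 else np.inf
    return scores
```

Features would then be ranked with `np.argsort(scores)` and the lowest-scoring ones kept before resampling and classification; how many features DiGAN retains is not stated in the abstract.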
Automated Design of Metaheuristic Algorithms: A Survey
Metaheuristics have achieved great success in academia and practice because
their search logic can be applied to any problem with available solution
representation, solution quality evaluation, and certain notions of locality.
Manually designing metaheuristic algorithms for solving a target problem is
criticized for being laborious, error-prone, and requiring intensive
specialized knowledge. This gives rise to increasing interest in automated
design of metaheuristic algorithms. With enough computing power to fully explore
potential design choices, automated design could match or even surpass
human-level design and could make high-performance algorithms accessible to a
much wider range of researchers and practitioners. This paper presents a broad
picture of the automated design of metaheuristic algorithms by surveying
the common ground and representative techniques in terms of design space,
design strategies, performance evaluation strategies, and target problems in
this field
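The survey's decomposition into design space, design strategy, performance evaluation, and target problem can be made concrete with a toy example. The sketch below assumes a (1+1) evolutionary algorithm on the OneMax problem as the metaheuristic and target, a hypothetical two-choice design space (a mutation-rate factor and whether to accept equal-fitness offspring), exhaustive enumeration as the design strategy, and mean final fitness over a few seeds as the performance measure. None of these choices come from the paper; they are illustrative only.

```python
import itertools
import random
from statistics import mean


def one_max(bits):
    """Toy target problem: count the ones in a bit string."""
    return sum(bits)


def one_plus_one_ea(n, rate_factor, accept_equal, budget, seed):
    """(1+1) EA on OneMax; returns the best fitness reached within `budget` offspring."""
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(n)]
    f_parent = one_max(parent)
    p = rate_factor / n  # per-bit mutation probability
    for _ in range(budget):
        child = [b ^ (rng.random() < p) for b in parent]  # flip each bit with prob. p
        f_child = one_max(child)
        if f_child > f_parent or (accept_equal and f_child == f_parent):
            parent, f_parent = child, f_child
        if f_parent == n:
            break
    return f_parent


# Hypothetical design space: two design choices of the (1+1) EA.
DESIGN_SPACE = {
    "rate_factor": [0.5, 1.0, 2.0, 4.0],
    "accept_equal": [False, True],
}


def automated_design(n=50, budget=500, seeds=range(5)):
    """Design strategy: exhaustive search; evaluation: mean final fitness over seeds."""
    best_config, best_perf = None, float("-inf")
    for values in itertools.product(*DESIGN_SPACE.values()):
        config = dict(zip(DESIGN_SPACE.keys(), values))
        perf = mean(one_plus_one_ea(n, budget=budget, seed=s, **config) for s in seeds)
        if perf > best_perf:
            best_config, best_perf = config, perf
    return best_config, best_perf
```

Swapping the exhaustive loop for random search, racing, or model-based configuration changes only the design strategy while the design space and evaluation stay fixed, which mirrors the separation the survey draws between these components.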