29 research outputs found

    Detoxify Language Model Step-by-Step

    Full text link
    Detoxification for LLMs is challenging because it requires models to avoid generating harmful content while preserving their generation capability. To ensure the safety of generations, previous detoxification methods detoxify models by changing the data distribution or by constraining generation from different aspects in a single-step manner. However, these approaches dramatically affect the generation quality of LLMs, e.g., discourse coherence and semantic consistency, since language models tend to continue along the toxic prompt while detoxification methods work in the opposite direction. To resolve this conflict, we decompose the detoxification process into sub-steps, concentrating detoxification in the input stage so that the subsequent continual generation is conditioned on a non-toxic prompt. Besides, we also calibrate the strong reasoning ability of LLMs by designing a Detox-Chain that connects the above sub-steps in an orderly manner, allowing LLMs to detoxify text step by step. Automatic and human evaluation on two benchmarks reveals that, after training with Detox-Chain, six LLMs scaling from 1B to 33B obtain significant improvements in both detoxification and generation. Our code and data are available at https://github.com/CODINNLG/Detox-CoT. Warning: examples in the paper may contain uncensored offensive content.
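    The decomposition described in the abstract can be illustrated with a minimal sketch: detoxify the prompt first, then let the model continue from the cleaned prompt. This is only a toy illustration of the idea, not the authors' Detox-Chain implementation; the helpers detect_toxic_spans, rewrite_prompt, and generate_continuation are hypothetical placeholders.

```python
# Toy sketch of step-by-step detoxification: input-stage cleanup, then continuation.
# NOT the Detox-Chain implementation; all helpers below are hypothetical stand-ins.
from typing import List, Tuple


def detect_toxic_spans(prompt: str) -> List[Tuple[int, int]]:
    """Hypothetical toxicity detector: return (start, end) character spans."""
    # A real system would call a trained classifier; here we flag a toy word list.
    spans = []
    for word in ("idiot", "stupid"):
        idx = prompt.lower().find(word)
        if idx != -1:
            spans.append((idx, idx + len(word)))
    return spans


def rewrite_prompt(prompt: str, spans: List[Tuple[int, int]]) -> str:
    """Replace toxic spans with neutral placeholders (input-stage detoxification)."""
    for start, end in sorted(spans, reverse=True):
        prompt = prompt[:start] + "[neutral]" + prompt[end:]
    return prompt


def generate_continuation(prompt: str) -> str:
    """Stand-in for an LLM call; generation is now conditioned on a non-toxic prompt."""
    return prompt + " ... (model continuation)"


def detox_chain(prompt: str) -> str:
    # Sub-step 1: concentrate detoxification in the input stage.
    clean_prompt = rewrite_prompt(prompt, detect_toxic_spans(prompt))
    # Sub-step 2: continue generation from the detoxified prompt.
    return generate_continuation(clean_prompt)


if __name__ == "__main__":
    print(detox_chain("You are a stupid example prompt"))
```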

    Graph Sampling-based Meta-Learning for Molecular Property Prediction

    Full text link
    Molecular properties are usually observed for only a limited number of samples, so researchers have treated property prediction as a few-shot problem. One important fact ignored by prior works is that each molecule can be recorded with several different properties simultaneously. To effectively exploit the many-to-many correlations between molecules and properties, we propose a Graph Sampling-based Meta-learning (GS-Meta) framework for few-shot molecular property prediction. First, we construct a Molecule-Property relation Graph (MPG): molecules and properties are nodes, and property labels decide the edges. Second, to exploit the topological information of the MPG, we reformulate an episode in meta-learning as a subgraph of the MPG containing a target property node, molecule nodes, and auxiliary property nodes. Third, since episodes in the form of subgraphs are no longer independent of each other, we propose to schedule the subgraph sampling process with a contrastive loss function that considers the consistency and discrimination of subgraphs. Extensive experiments on 5 commonly used benchmarks show that GS-Meta consistently outperforms state-of-the-art methods by 5.71%-6.93% in ROC-AUC and verify the effectiveness of each proposed module. Our code is available at https://github.com/HICAI-ZJU/GS-Meta. Comment: Accepted by IJCAI 2023.
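    The sketch below shows, under stated assumptions, how a Molecule-Property relation Graph might be built from labels and how an episode could be drawn as a subgraph around a target property. It is a toy illustration only, not the GS-Meta implementation; the label dictionary and the sampling heuristic are made up for the example.

```python
# Toy Molecule-Property relation Graph (MPG) and subgraph-episode sampling.
# Illustrative only; node layout and sampling heuristic are assumptions.
import random
from collections import defaultdict

# Toy labels: labels[molecule][property] in {0, 1}; an edge exists when a label is recorded.
labels = {
    "mol_a": {"prop_1": 1, "prop_2": 0},
    "mol_b": {"prop_1": 0, "prop_3": 1},
    "mol_c": {"prop_2": 1, "prop_3": 0},
}

# Build the bipartite MPG: molecule and property nodes, label-decided edges.
edges = defaultdict(set)
for mol, props in labels.items():
    for prop in props:
        edges[mol].add(prop)
        edges[prop].add(mol)


def sample_episode(target_prop: str, n_aux: int = 1):
    """Sample one episode as an MPG subgraph: the target property node, its
    molecule nodes, and a few auxiliary property nodes reachable through them."""
    mols = list(edges[target_prop])
    aux = {p for m in mols for p in edges[m] if p != target_prop}
    aux = random.sample(sorted(aux), min(n_aux, len(aux)))
    return {"target": target_prop, "molecules": mols, "auxiliary": aux}


print(sample_episode("prop_1"))
```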

    Learning Invariant Molecular Representation in Latent Discrete Space

    Full text link
    Molecular representation learning lays the foundation for drug discovery. However, existing methods suffer from poor out-of-distribution (OOD) generalization, particularly when training and test data originate from different environments. To address this issue, we propose a new framework for learning molecular representations that are invariant and robust to distribution shifts. Specifically, we propose a strategy called "first-encoding-then-separation" to identify invariant molecular features in the latent space, which deviates from conventional practice. Prior to the separation step, we introduce a residual vector quantization module that mitigates over-fitting to the training data distribution while preserving the expressivity of the encoder. Furthermore, we design a task-agnostic self-supervised learning objective that encourages precise invariance identification, which makes our method widely applicable to a variety of tasks, such as regression and multi-label classification. Extensive experiments on 18 real-world molecular datasets demonstrate that our model achieves stronger generalization than state-of-the-art baselines under various distribution shifts. Our code is available at https://github.com/HICAI-ZJU/iMoLD.
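    The following is a hedged NumPy sketch of the "first-encoding-then-separation" idea with a residual vector quantization step in between. It is not the iMoLD implementation: the codebook sizes, the two-stage residual scheme, the random-projection "encoder", and the fixed half/half split are all assumptions made for illustration.

```python
# Toy "first-encoding-then-separation" with residual vector quantization (RVQ).
# Illustrative only; all sizes and the separation rule are assumptions.
import numpy as np

rng = np.random.default_rng(0)
latent_dim, codebook_size, n_stages = 8, 16, 2

# One codebook per residual stage.
codebooks = [rng.normal(size=(codebook_size, latent_dim)) for _ in range(n_stages)]


def residual_quantize(z: np.ndarray) -> np.ndarray:
    """Quantize z stage by stage: each codebook encodes the residual left by the previous one."""
    quantized, residual = np.zeros_like(z), z.copy()
    for cb in codebooks:
        idx = np.argmin(np.linalg.norm(cb - residual, axis=1))  # nearest code
        quantized += cb[idx]
        residual = z - quantized
    return quantized


# "First encoding": a random projection stands in for a trained molecular encoder.
encoder = rng.normal(size=(latent_dim, latent_dim))
z = encoder @ rng.normal(size=latent_dim)

# "Then separation": split the quantized latent into invariant and spurious parts
# (a fixed half/half split here; the paper learns this separation).
z_q = residual_quantize(z)
z_invariant, z_spurious = z_q[: latent_dim // 2], z_q[latent_dim // 2:]
print(z_invariant, z_spurious)
```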

    Robust active contours for fast image segmentation

    No full text

    Stability of distribution network with large-scale PV penetration under off-grid operation

    No full text
    Large-scale photovoltaic (PV) penetration reduces system damping and causes stability problems in off-grid distribution systems. The single-machine equivalent method is typically used to simplify the full-order model by ignoring the differences among the PVs; however, this results in substantial errors: the greater the differences between the PV control models, the larger the calculation error, to the point where the stability of the PV system can no longer be judged. This study verified that the oscillation mode of a PV in the subsynchronous frequency band is linearly correlated with its operating conditions and control parameters. Accordingly, the authors proposed dividing the oscillation-mode interval by taking the maximum and minimum PV operating conditions as the boundaries in order to assess the stability of the system. To account for differences in the operating conditions of the PVs, the average power was used to dynamically aggregate the PV farm and determine whether the system is at risk of instability. Finally, the effectiveness of the proposed method was verified through case studies on a 12-PV off-grid distribution system and a large-scale PV farm system. The results show that the proposed method determines the stability of the distribution system accurately and is significant for developing strategies to enhance the stability of off-grid distribution systems.
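    A small numeric sketch of the interval idea: if the subsynchronous oscillation mode varies roughly linearly with the operating condition, evaluating it at the maximum and minimum PV operating points bounds the mode for the whole plant, and an average-power equivalent places the aggregated system inside that interval. The linear map and all coefficients below are made-up illustrative values, not results from the paper.

```python
# Toy interval check for the subsynchronous oscillation mode of a PV plant.
# The linear damping model and its coefficients are illustrative assumptions.

def oscillation_damping(p_out: float) -> float:
    """Assumed linear map from PV output power (p.u.) to the mode's damping ratio."""
    return 0.08 - 0.05 * p_out  # illustrative coefficients, not from the paper


pv_outputs = [0.3, 0.5, 0.7, 0.9]            # per-unit operating points of the PVs

# Interval boundaries from the extreme (minimum and maximum) operating conditions.
damping_interval = sorted((oscillation_damping(min(pv_outputs)),
                           oscillation_damping(max(pv_outputs))))

# Average-power aggregation of the plant to a single equivalent PV.
p_avg = sum(pv_outputs) / len(pv_outputs)
aggregated_damping = oscillation_damping(p_avg)

stable = damping_interval[0] > 0.0           # instability risk if the interval reaches zero
print(damping_interval, aggregated_damping, "stable" if stable else "instability risk")
```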