946 research outputs found

Adversarial Preference Optimization

Author: Cheng Pengyu
Dai Yong
Du Nan
Li Jian
Yang Yifan
Publication date: 14/11/2023

Human preference alignment is a crucial training step to improve the interaction quality of large language models (LLMs). Existing alignment methods depend on manually annotated preference data to guide the LLM optimization directions. However, in practice, continuously updating LLMs creates a distribution gap between model-generated samples and human-preferred responses, which hinders model fine-tuning efficiency. To mitigate this issue, previous methods require additional preference annotation on generated samples to adapt to the shifted distribution, which consumes a large amount of annotation resources. Targeting more efficient human preference optimization, we propose an adversarial preference optimization (APO) framework, where the LLM agent and the preference model update alternately via a min-max game. Without additional annotation, our APO method can self-adapt to the generation distribution gap through the adversarial learning process. In experiments, we empirically verify the effectiveness of APO in improving the LLM's helpfulness and harmlessness compared with rejection sampling baselines. Comment: In proces
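The alternating min-max updates mentioned in the abstract can be illustrated with a toy sketch. This is not the APO objective or the authors' code: the function f(x, y) = 0.5*x**2 + x*y - 0.5*y**2, the name `alternating_min_max`, the learning rate, and the step count are all assumptions chosen only to show two players taking turns, one descending and one ascending the same objective.

```python
def alternating_min_max(x=1.0, y=1.0, lr=0.1, steps=200):
    """Alternate gradient steps on the toy game f(x, y) = 0.5*x**2 + x*y - 0.5*y**2.

    x plays the minimizing role and y the maximizing role. Both the objective
    and the hyperparameters are illustrative assumptions, not taken from APO.
    """
    for _ in range(steps):
        x -= lr * (x + y)  # minimizing player: descend along df/dx = x + y
        y += lr * (x - y)  # maximizing player: ascend along df/dy = x - y (uses updated x)
    return x, y


x_final, y_final = alternating_min_max()
print(x_final, y_final)  # both values approach the saddle point at (0, 0)
```

In this toy game the objective is convex in x and concave in y, so the alternating updates settle at the saddle point; real adversarial training between a language model and a preference model has no such guarantee.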

arXiv.org e-Print Archive

Everyone Deserves A Reward: Learning Customized Human Preferences

Author: Bai Ke
Cheng Pengyu
Dai Yong
Du Nan
Xie Jiawen
Publication date: 06/09/2023

Reward models (RMs) are crucial in aligning large language models (LLMs) with human preferences for improving interaction quality. However, the real world is pluralistic, which leads to diversified human preferences based on different religions, politics, cultures, etc. Moreover, each individual can have their own unique preferences on various topics. Neglecting the diversity of human preferences, current LLM training processes use only a general reward model, which is unsatisfactory for customized or personalized application scenarios. To explore customized preference learning, we collect a domain-specific preference (DSP) dataset, which contains preferred responses to each given query from four practical domains. In addition, from the perspective of data efficiency, we propose a three-stage customized RM learning scheme, whose effectiveness is empirically verified on both general preference datasets and our DSP set. Furthermore, we test multiple training and data strategies on the three learning stages and find several ways to better preserve the general preference ability while training the customized RMs, especially general preference enrichment and customized preference imitation learning. The DSP dataset and code are available at https://github.com/Linear95/DSP

arXiv.org e-Print Archive

Recommended from our members

A ribose-functionalized NAD+ with unexpected high activity and selectivity for protein poly-ADP-ribosylation.

Author: Chen Jingwen
Cheng Qinqin
Dai Zhefu
Evdokimov Nikolai M
Lam Albert T
Louie Stan G
Lu Yanran
Pei Hua
Zhang Xiao-Nan
Zhang Yong
Publication venue: eScholarship, University of California
Publication date: 13/09/2019

Nicotinamide adenine dinucleotide (NAD+)-dependent ADP-ribosylation plays important roles in physiology and pathophysiology. It has been challenging to study this key type of enzymatic post-translational modification, in particular protein poly-ADP-ribosylation (PARylation). Here we explore chemical and chemoenzymatic synthesis of NAD+ analogues with ribose functionalized by terminal alkyne and azido groups. Our results demonstrate that azido substitution at the 3'-OH of nicotinamide riboside enables enzymatic synthesis of an NAD+ analogue with high efficiency and yields. Notably, the generated 3'-azido NAD+ exhibits unexpectedly high activity and specificity for protein PARylation catalyzed by human poly-ADP-ribose polymerase 1 (PARP1) and PARP2. Moreover, its derived poly-ADP-ribose polymers show increased resistance to human poly(ADP-ribose) glycohydrolase-mediated degradation. These unique properties lead to enhanced labeling of protein PARylation by 3'-azido NAD+ in cellular contexts and facilitate direct visualization and labeling of mitochondrial protein PARylation. The 3'-azido NAD+ provides an important tool for studying cellular PARylation.

eScholarship - University of California

Measuring the beaming angle of GRB 030329 by fitting the rebrightenings in its multiband afterglow

Author: Cheng
Dai
Dai
Frail
Gao
Greiner
Huang
King
Kumar
Lipkin
Matheson
Nakar
Panaitescu
Piran
Rees
Ricker
Sari
Sari
Sari
Schlegel
Si-Wei Kong
Stanek
Van der Horst
Vanderspek
Wang
Waxman
Wei
Wei Deng
Yong-Feng Huang
Zeh
Zeh
Zhang
Publication venue: IOP Publishing
Publication date: 18/06/2010

Multiple rebrightenings have been observed in the multiband afterglow of GRB 030329. In particular, a marked and quick rebrightening occurred at about t ~ 1.2 × 10^5 s. Energy injection from late and slow shells seems to be the best interpretation for these rebrightenings. Usually it is assumed that the energy is injected into the whole external shock. However, in the case of GRB 030329, the rebrightenings are so quick that the usual treatment fails to give a satisfactory fit to the observed light curves. Since these late/slow shells coast freely in the wake of the external shock, they should be cold and may not expand laterally. The energy injection should then occur only at the central region of the external shock. Considering this effect, we numerically re-fit the quick rebrightenings observed in GRB 030329. By doing this, we were able to derive the beaming angle of the energy injection process. Our result, with a relative residual of only 5% - 10% during the major rebrightening, is better than any previous modeling. The derived energy injection angle is about 0.035. We assume that these late shells are ejected by the central engine via the same mechanism as the early shells that produce the prompt gamma-ray burst. The main difference is that their velocities are much slower, so that they catch up with the external shock very late and manifest as the observed quick rebrightenings. If this is true, then the derived energy injection angle can give a good measure of the beaming angle of the prompt gamma-ray emission. Our study may provide a novel method to measure the beaming angle of gamma-ray bursts. Comment: 8 pages, 6 figures. Accepted by RAA (Research in Astronomy and Astrophysics)

arXiv.org e-Print Archive

Crossref

Bis(2,2′-bi-1H-imidazole-κ²N³,N³′)bis(dimethyl sulfoxide-κO)copper(II) bis(tetrafluoridoborate)

Author: Aminou
Burrows
Cun-Lin Zhang
Dai
Ding
Gruia
Jin
Li-Jun Xu
Li-Na Cui
Qiong-Hua Jin
Sheldrick
Yang
Yong-Cheng Dai
Publication venue: International Union of Crystallography
Publication date: 01/09/2010

In the title copper(II) salt, [Cu(C6H6N4)2(C2H6OS)2](BF4)2, the Jahn–Teller distorted octahedral coordination sphere of copper is formed from four 2,2′-bi-1H-imidazole N atoms and two dimethyl sulfoxide O atoms. The Cu atom lies on a center of inversion. N—H⋯O and N—H⋯F hydrogen bonds give rise to a one-dimensional structure. The BF4− anion is disordered over two sites in a 0.671 (10):0.329 (10) ratio.

Crossref

Directory of Open Access Journals

PubMed Central

Sugar transporter engineering in yeast to enable simultaneous co-utilization of sugars prevalent in cellulosic hydrolysates

Author: Cheng Ming-Hsun
Dai Degaulle
Jin Yong-Su
Kang Nam-Kyu
Kim Junnyeon
Kuanyshev Nurzhan
Singh Vijay
Publication venue: ECI Digital Archives
Publication date: 01/10/2023

Engineering Conferences International

Wave energy conversion by an array of oscillating water columns deployed along a long-flexible floating breakwater

Author: Cheng Yong
Dai Saishuai
Du Weiming
Incecik Atilla
Yuan Zhiming
Publication date: 31/03/2024

Large-scale spatial configurations combining Wave Energy Converters (WECs) and coastal wave-attenuating facilities have the potential to exploit marine renewable energy sustainably. In this study, an integrated concept of multiple Oscillating Water Columns (OWCs) and a very long floating breakwater is introduced. Associated energy extraction, gap resonance and hydroelastic interaction problems are examined. A coupled numerical simulation methodology, consisting of a Finite Volume Method (FVM)-based solver and a Finite Element Method (FEM) solver, is developed to investigate the strongly coupled fluid-structure problem. The fluid-structure information is matched in real time, and the flexible modes of the floating breakwater are obtained by imposing a restrained beam inside the pontoon. The time-domain model is validated against both simulated and measured data. Extensive parametric studies indicate that energy conversion conflicts with wave attenuation in determining the along-shore number of OWCs. The highest energy conversion in medium-period and long-period waves is observed in the OWCs near the end and middle locations, respectively. In addition, the constructive resonant gap effect between the OWCs and the breakwater can amplify the peaks of energy conversion efficiency and leads to a sudden collapse in the transmission coefficient curves. With an increased sidewall draft, OWCs closer to the oblique incident direction generate stronger piston-type and sloshing oscillations. Additionally, compared with a rigid breakwater, the elastic deformation of the breakwater plays a destructive role in wave energy conversion, which is attributed to the out-of-phase interference of multi-mode radiated waves.

University of Strathclyde Institutional Repository