Adversarial Preference Optimization
Human preference alignment is a crucial training step to improve the
interaction quality of large language models (LLMs). Existing alignment methods
depend on manually annotated preference data to guide the LLM optimization
directions. However, in practice, continuously updating LLMs creates a
distribution gap between model-generated samples and human-preferred responses,
which hinders model fine-tuning efficiency. To mitigate this issue, previous
methods require additional preference annotation on generated samples to adapt
to the shifted distribution, which consumes a large amount of annotation
resources. Targeting more efficient human preference optimization, we propose
an adversarial preference optimization (APO) framework, where the LLM agent and
the preference model update alternately via a min-max game. Without
additional annotation, our APO method can adapt itself to the
generation distribution gap through the adversarial learning process. In
experiments, we empirically verify the effectiveness of APO in improving the LLM's
helpfulness and harmlessness compared with rejection sampling baselines. Comment: In proces
Everyone Deserves A Reward: Learning Customized Human Preferences
Reward models (RMs) are crucial in aligning large language models (LLMs) with
human preferences for improving interaction quality. However, the real world is
pluralistic, which leads to diversified human preferences based on different
religions, politics, cultures, etc. Moreover, each individual can have their
own unique preferences on various topics. Neglecting the diversity of human
preferences, current LLM training processes only use a general reward model,
which is unsatisfactory for customized or personalized application
scenarios. To explore customized preference learning, we collect a
domain-specific preference (DSP) dataset, which contains preferred responses to
each given query from four practical domains. In addition, from the perspective of
data efficiency, we propose a three-stage customized RM learning scheme, whose
effectiveness is empirically verified on both general preference datasets and
our DSP set. Furthermore, we test multiple training and data strategies on the
three learning stages, and find several ways to better preserve the
general preference ability while training the customized RMs, especially
general preference enrichment and customized preference imitation learning. The
DSP dataset and code are available at https://github.com/Linear95/DSP
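As background for the reward-model (RM) training mentioned in this abstract, the following is a minimal sketch of the pairwise ranking loss commonly used to fit reward models to preference pairs (the Bradley-Terry formulation). It is an assumption for illustration only: the abstract does not state which loss the authors use, and the function name `pairwise_preference_loss` is hypothetical.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood that the chosen response is preferred.

    Under a Bradley-Terry model, P(chosen > rejected) = sigmoid(r_c - r_r),
    so the loss is -log(sigmoid(r_c - r_r)). This is an illustrative
    assumption, not the loss confirmed by the abstract above.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow in exp.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # The loss shrinks as the RM scores the preferred response higher.
    for margin in (-2.0, 0.0, 2.0):
        print(margin, pairwise_preference_loss(margin, 0.0))
```

The loss equals log 2 when both responses receive the same score and decreases monotonically as the margin in favor of the preferred response grows.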
A ribose-functionalized NAD+ with unexpected high activity and selectivity for protein poly-ADP-ribosylation.
Nicotinamide adenine dinucleotide (NAD+)-dependent ADP-ribosylation plays important roles in physiology and pathophysiology. It has been challenging to study this key type of enzymatic post-translational modification, in particular protein poly-ADP-ribosylation (PARylation). Here we explore chemical and chemoenzymatic synthesis of NAD+ analogues with ribose functionalized by terminal alkyne and azido groups. Our results demonstrate that azido substitution at the 3'-OH of nicotinamide riboside enables enzymatic synthesis of an NAD+ analogue with high efficiency and yields. Notably, the generated 3'-azido NAD+ exhibits unexpectedly high activity and specificity for protein PARylation catalyzed by human poly-ADP-ribose polymerase 1 (PARP1) and PARP2. Moreover, its derived poly-ADP-ribose polymers show increased resistance to human poly(ADP-ribose) glycohydrolase-mediated degradation. These unique properties lead to enhanced labeling of protein PARylation by 3'-azido NAD+ in cellular contexts and facilitate direct visualization and labeling of mitochondrial protein PARylation. The 3'-azido NAD+ provides an important tool for studying cellular PARylation.
Measuring the beaming angle of GRB 030329 by fitting the rebrightenings in its multiband afterglow
Multiple rebrightenings have been observed in the multiband afterglow of GRB
030329. In particular, a marked and quick rebrightening occurred at about
t ~ 1.2 × 10^5 s. Energy injection from late and slow shells seems to be the best
interpretation for these rebrightenings. Usually it is assumed that the energy
is injected into the whole external shock. However, in the case of GRB 030329,
the rebrightenings are so quick that the usual consideration fails to give a
satisfactory fit to the observed light curves. Actually, since these late/slow
shells coast freely in the wake of the external shock, they should be cold and
may not expand laterally. The energy injection then should only occur at the
central region of the external shock. Considering this effect, we numerically
re-fit the quick rebrightenings observed in GRB 030329. By doing this, we were
able to derive the beaming angle of the energy injection process. Our result,
with a relative residual of only 5% - 10% during the major rebrightening, is
better than any previous modeling. The derived energy injection angle is about
0.035. We assume that these late shells are ejected by the central engine via
the same mechanism as those early shells that produce the prompt gamma-ray
burst. The main difference is that their velocities are much slower, so that
they catch up with the external shock very late and manifest as the observed
quick rebrightenings. If this is true, then the derived energy injection
angle can give a good measure of the beaming angle of the prompt gamma-ray
emission. Our study may provide a novel method to measure the beaming
angle of gamma-ray bursts. Comment: 8 pages, 6 figures, has been accepted by RAA (Research in Astronomy
and Astrophysics)
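To put the two numbers quoted in this abstract on a more familiar scale, here is a small arithmetic sketch. It assumes the quoted angle of 0.035 is a jet half-opening angle in radians (the abstract gives no unit) and that the outflow is a two-sided cone. The constant and function names are hypothetical.

```python
import math

T_REBRIGHTENING_S = 1.2e5   # rebrightening time quoted in the abstract, in seconds
THETA_INJECTION = 0.035     # quoted injection angle; ASSUMED to be in radians

SECONDS_PER_DAY = 86400.0


def seconds_to_days(t_seconds: float) -> float:
    """Convert an observer time from seconds to days."""
    return t_seconds / SECONDS_PER_DAY


def beaming_fraction(half_angle_rad: float) -> float:
    """Fraction of the sky covered by two opposite cones of the given half-angle.

    Two cones subtend 2 * 2*pi*(1 - cos(theta)) out of 4*pi steradians,
    which simplifies to 1 - cos(theta).
    """
    return 1.0 - math.cos(half_angle_rad)


if __name__ == "__main__":
    print(f"rebrightening time: {seconds_to_days(T_REBRIGHTENING_S):.2f} days")
    print(f"injection angle:    {math.degrees(THETA_INJECTION):.2f} degrees")
    print(f"beaming fraction:   {beaming_fraction(THETA_INJECTION):.2e}")
```

Under these assumptions the rebrightening falls a little under a day and a half after the burst, and the angle corresponds to roughly two degrees.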
Bis(2,2′-bi-1H-imidazole-κ²N³,N³′)bis(dimethyl sulfoxide-κO)copper(II) bis(tetrafluoridoborate)
In the title copper(II) salt, [Cu(C6H6N4)2(C2H6OS)2](BF4)2, the Jahn–Teller distorted octahedral coordination sphere of copper is formed from four 2,2′-bi-1H-imidazole N atoms and two dimethyl sulfoxide O atoms. The Cu atom lies on a center of inversion. N—H⋯O and N—H⋯F hydrogen bonds give rise to a one-dimensional structure. The BF4− anion is disordered over two sites in a 0.671 (10):0.329 (10) ratio.
Sugar transporter engineering in yeast to enable simultaneous co-utilization of sugars prevalent in cellulosic hydrolysates
Wave energy conversion by an array of oscillating water columns deployed along a long-flexible floating breakwater
Large-scale spatial configurations combining Wave Energy Converters (WECs) and coastal wave-attenuating facilities have the potential to exploit marine renewable energy sustainably. In this study, an integrated concept of multiple Oscillating Water Columns (OWCs) and a very long floating breakwater is introduced. The associated energy extraction, gap resonance and hydroelastic interaction problems are examined. A coupled numerical simulation methodology, consisting of a Finite Volume Method (FVM)-based solver and a Finite Element Method (FEM) solver, is developed to investigate the strongly coupled fluid-structure problem. The fluid-structure information is matched in real time, and the flexible modes of the floating breakwater are obtained by imposing a restrained beam inside the pontoon. The time-domain model is validated against both simulated and measured data. Extensive parametric studies indicate that energy conversion conflicts with wave attenuation when determining the along-shore number of OWCs. The highest energy conversion in medium-period and long-period waves is observed in the OWCs near the end and middle locations, respectively. In addition, the constructive resonant gap effect between the OWCs and the breakwater can amplify the peaks of energy conversion efficiency and leads to a sudden collapse in the transmission coefficient curves. With an increased sidewall draft, OWCs closer to the oblique incident direction generate stronger piston-type and sloshing oscillations. Finally, compared with a rigid breakwater, the elastic deformation of the breakwater plays a destructive role in wave energy conversion, which is attributed to the out-of-phase interference of multi-mode radiated waves.