
    Adversarial Preference Optimization

    Human preference alignment is a crucial training step for improving the interaction quality of large language models (LLMs). Existing alignment methods depend on manually annotated preference data to guide the LLM optimization directions. However, in practice, continuously updating LLMs creates a distribution gap between model-generated samples and human-preferred responses, which hinders model fine-tuning efficiency. To mitigate this issue, previous methods require additional preference annotation on generated samples to adapt to the shifted distribution, which consumes a large amount of annotation resources. Targeting more efficient human preference optimization, we propose an adversarial preference optimization (APO) framework, where the LLM agent and the preference model update alternately via a min-max game. Without additional annotation, our APO method can adapt itself to the generation distribution gap through the adversarial learning process. In experiments, we empirically verify the effectiveness of APO in improving the LLM's helpfulness and harmlessness compared with rejection sampling baselines. Comment: In proces

    Everyone Deserves A Reward: Learning Customized Human Preferences

    Reward models (RMs) are crucial in aligning large language models (LLMs) with human preferences for improving interaction quality. However, the real world is pluralistic, which leads to diversified human preferences based on different religions, politics, cultures, etc. Moreover, each individual can have their own unique preferences on various topics. Neglecting the diversity of human preferences, current LLM training processes use only a general reward model, which is unsatisfactory for customized or personalized application scenarios. To explore customized preference learning, we collect a domain-specific preference (DSP) dataset, which contains preferred responses to each given query from four practical domains. In addition, from the perspective of data efficiency, we propose a three-stage customized RM learning scheme, whose effectiveness is empirically verified on both general preference datasets and our DSP set. Furthermore, we test multiple training and data strategies on the three learning stages, and find several ways to better preserve the general preference ability while training the customized RMs, especially general preference enrichment and customized preference imitation learning. The DSP dataset and code are available at https://github.com/Linear95/DSP

    Measuring the beaming angle of GRB 030329 by fitting the rebrightenings in its multiband afterglow

    Multiple rebrightenings have been observed in the multiband afterglow of GRB 030329. In particular, a marked and quick rebrightening occurred at about t ~ 1.2 × 10^5 s. Energy injection from late and slow shells seems to be the best interpretation for these rebrightenings. Usually it is assumed that the energy is injected into the whole external shock. However, in the case of GRB 030329, the rebrightenings are so quick that the usual treatment fails to give a satisfactory fit to the observed light curves. In fact, since these late/slow shells coast freely in the wake of the external shock, they should be cold and may not expand laterally. The energy injection should then occur only at the central region of the external shock. Considering this effect, we numerically re-fit the quick rebrightenings observed in GRB 030329. By doing this, we were able to derive the beaming angle of the energy injection process. Our result, with a relative residual of only 5% - 10% during the major rebrightening, is better than any previous modeling. The derived energy injection angle is about 0.035. We assume that these late shells are ejected by the central engine via the same mechanism as the early shells that produce the prompt gamma-ray burst. The main difference is that their velocities are much slower, so that they catch up with the external shock very late and manifest as the observed quick rebrightenings. If this is true, then the derived energy injection angle can give a good measure of the beaming angle of the prompt gamma-ray emission. Our study may provide a novel method to measure the beaming angle of gamma-ray bursts. Comment: 8 pages, 6 figures, Has been accepted by RAA (Research in Astronomy and Astrophysics

    Bis(2,2′-bi-1H-imidazole-κ²N³,N³′)bis(dimethyl sulfoxide-κO)copper(II) bis(tetrafluoridoborate)

    In the title copper(II) salt, [Cu(C6H6N4)2(C2H6OS)2](BF4)2, the Jahn–Teller distorted octahedral coordination sphere of copper is formed from four 2,2′-bi-1H-imidazole N atoms and two dimethyl sulfoxide O atoms. The Cu atom lies on a center of inversion. N—H⋯O and N—H⋯F hydrogen bonds give rise to a one-dimensional structure. The BF4− anion is disordered over two sites in a 0.671(10):0.329(10) ratio.

    Wave energy conversion by an array of oscillating water columns deployed along a long-flexible floating breakwater

    Large-scale spatial configurations combining Wave Energy Converters (WECs) and coastal wave-attenuating facilities have the potential to exploit marine renewable energy sustainably. In this study, an integrated concept of multiple Oscillating Water Columns (OWCs) and a very long floating breakwater is introduced. The associated energy extraction, gap resonance and hydroelastic interaction problems are examined. A coupled numerical simulation methodology, consisting of a Finite Volume Method (FVM)-based solver and a Finite Element Method (FEM) solver, is developed to investigate the strongly coupled fluid-structure problem. The fluid-structure information is matched in real time, and the flexible modes of the floating breakwater are obtained by imposing a restrained beam inside the pontoon. The time-domain model is validated against both simulated and measured data. Extensive parametric studies indicate that energy conversion conflicts with wave attenuation when determining the along-shore number of OWCs. The highest energy conversion in medium-period and long-period waves is observed in the OWCs near the end and middle locations, respectively. In addition, the constructive resonant gap effect between the OWCs and the breakwater can amplify the peaks of energy conversion efficiency and leads to a sudden collapse in the transmission coefficient curves. With an increased sidewall draft, OWCs closer to the oblique incident direction generate stronger piston-type and sloshing oscillations. Additionally, compared with a rigid breakwater, the elastic deformation of the breakwater plays a destructive role in wave energy conversion, which is attributed to the out-of-phase interference of multi-mode radiated waves.