281 research outputs found

    Language Model Alignment with Elastic Reset

    Finetuning language models with reinforcement learning (RL), e.g. from human feedback (HF), is a prominent method for alignment. But optimizing against a reward model can improve on reward while degrading performance in other areas, a phenomenon known as reward hacking, alignment tax, or language drift. First, we argue that commonly used test metrics are insufficient and instead measure how different algorithms trade off between reward and drift. The standard method modifies the reward with a Kullback-Leibler (KL) penalty between the online and initial model. We propose Elastic Reset, a new algorithm that achieves higher reward with less drift without explicitly modifying the training objective. We periodically reset the online model to an exponential moving average (EMA) of itself, then reset the EMA model to the initial model. Through the use of an EMA, our model recovers quickly after resets and achieves higher reward with less drift in the same number of steps. We demonstrate that fine-tuning language models with Elastic Reset leads to state-of-the-art performance on a small-scale pivot-translation benchmark, outperforms all baselines in a medium-scale RLHF-like IMDB mock sentiment task, and leads to a more performant and more aligned technical QA chatbot with LLaMA-7B. Code available at github.com/mnoukhov/elastic-reset. Comment: Published at NeurIPS 202
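
    The reset schedule described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation (their code is at the repository named above); it assumes parameters are a plain dictionary of floats, that the EMA is updated after every training step, and that `train_step`, `reset_every`, and `decay` are hypothetical names introduced only for illustration.

```python
from typing import Callable, Dict

Params = Dict[str, float]


def ema_update(ema: Params, online: Params, decay: float) -> Params:
    """Return a new EMA that moves each parameter toward the online value."""
    return {k: decay * ema[k] + (1.0 - decay) * online[k] for k in ema}


def elastic_reset_training(
    initial: Params,
    train_step: Callable[[Params], Params],  # hypothetical: one RL update
    num_steps: int,
    reset_every: int,  # hypothetical name for the reset period
    decay: float = 0.99,  # assumed EMA decay, not taken from the abstract
) -> Params:
    """Train while periodically resetting online -> EMA and EMA -> initial."""
    online = dict(initial)
    ema = dict(initial)
    for step in range(1, num_steps + 1):
        online = train_step(online)
        # Assumption: the EMA tracks the online model after every step.
        ema = ema_update(ema, online, decay)
        if step % reset_every == 0:
            online = dict(ema)   # reset the online model to its EMA
            ema = dict(initial)  # then reset the EMA to the initial model
    return online
```

    For example, with `train_step = lambda p: {k: v + 1.0 for k, v in p.items()}`, the call `elastic_reset_training({"w": 0.0}, train_step, num_steps=2, reset_every=2, decay=0.5)` pulls the online parameter back from 2.0 toward the initial value at the reset.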

    INTRACELLULAR TARGETS OF SPHINGOSINE-1-PHOSPHATE

    The bioactive lipid mediator sphingosine-1-phosphate (S1P) has emerged as a key regulator of a variety of important physiological functions, including cell growth, cell survival, cell motility, angiogenesis, lymphocyte trafficking, and mast cell function. S1P is formed by two different sphingosine kinases (SphKs) and binds to a family of five differentially expressed G-protein coupled receptors (S1PRs). The majority of research to date has focused on the activation of these receptors, but there is compelling evidence to suggest that S1P exerts intracellular functions independent of S1PRs. However, no bona fide intracellular targets of S1P have been identified. In my dissertation, I have identified a novel intracellular binding protein for S1P. This finding has important implications for the pleiotropic actions of S1P.

    Integration of the End Cap TEC+ of the CMS Silicon Strip Tracker

    The silicon strip tracker of the CMS experiment was completed and inserted into the CMS detector in late 2007. The largest sub-system of the tracker is its end cap system, comprising two large end caps (TEC), each containing 3200 silicon strip modules. To ease construction, the end caps feature a modular design: groups of about 20 silicon modules are placed on sub-assemblies called petals, and these self-contained elements are then mounted into the TEC support structures. Each end cap consists of 144 petals, and the insertion of these petals into the end cap structure is referred to as TEC integration. The two end caps were integrated independently in Aachen (TEC+) and at CERN (TEC-). This note deals with the integration of TEC+, describing procedures for end cap integration and for quality control during testing of integrated sections of the end cap, and presenting results from the testing.

    Over-communicate no more: Situated RL agents learn concise communication protocols

    While it is known that communication facilitates cooperation in multi-agent settings, it is unclear how to design artificial agents that can learn to effectively and efficiently communicate with each other. Much research on communication emergence uses reinforcement learning (RL) and explores unsituated communication in one-step referential tasks: the tasks are not temporally interactive and lack the time pressures typically present in natural communication. In these settings, agents may successfully learn to communicate, but they do not learn to exchange information concisely: they tend towards over-communication and an inefficient encoding. Here, we explore situated communication in a multi-step task, where the acting agent has to forgo an environmental action to communicate. Thus, we impose an opportunity cost on communication and mimic the real-world pressure of passing time. We compare communication emergence under this pressure against learning to communicate with a cost on articulation effort, implemented as a per-message penalty (fixed and progressively increasing). We find that while all tested pressures can disincentivise over-communication, situated communication does it most effectively and, unlike the cost on effort, does not negatively impact emergence. Implementing an opportunity cost on communication in a temporally extended environment is a step towards embodiment, and might be a pre-condition for incentivising efficient, human-like communication.
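
    The two pressures contrasted in this abstract can be sketched in a toy setting. This is only an illustration under stated assumptions, not the environment used in the paper: the one-dimensional corridor, the action names `"move"` and `"talk"`, and both function names are hypothetical.

```python
from typing import List


def per_message_penalty_return(
    task_reward: float, num_messages: int, penalty: float
) -> float:
    """Cost on articulation effort: subtract a fixed penalty per message."""
    return task_reward - penalty * num_messages


def situated_episode(actions: List[str], goal: int, max_steps: int) -> bool:
    """Opportunity cost: a 'talk' step uses up a turn without moving.

    Returns True if the agent reaches `goal` within `max_steps` turns.
    """
    position = 0
    for action in actions[:max_steps]:
        if action == "move":
            position += 1
        # "talk" leaves the position unchanged, but the turn is still spent.
        if position >= goal:
            return True
    return False
```

    In the first function, communication is discouraged only through the reward signal. In the second, no explicit penalty exists: every turn spent talking is a turn not spent acting, so over-communication can make the time limit unreachable.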

    Search for Neutral Heavy Leptons Produced in Z Decays

    Weak isosinglet Neutral Heavy Leptons ($\nu_m$) have been searched for using data collected by the DELPHI detector corresponding to $3.3\times 10^{6}$ hadronic $Z^0$ decays at LEP1. Four separate searches have been performed, for short-lived $\nu_m$ production giving monojet or acollinear jet topologies, and for long-lived $\nu_m$ giving detectable secondary vertices or calorimeter clusters. No indication of the existence of these particles has been found, leading to an upper limit for the branching ratio $BR(Z^0 \rightarrow \nu_m \overline{\nu})$ of about $1.3\times 10^{-6}$ at 95% confidence level for $\nu_m$ masses between 3.5 and 50 GeV/$c^2$. Outside this range the limit weakens rapidly with the $\nu_m$ mass. The results are also interpreted in terms of limits for the single production of excited neutrinos.

    First Measurement of the Strange Quark Asymmetry at the $Z^0$ Peak
