
    The Transient Nature of Emergent In-Context Learning in Transformers

    Transformer neural networks can exhibit a surprising capacity for in-context learning (ICL) despite not being explicitly trained for it. Prior work has provided a deeper understanding of how ICL emerges in transformers, e.g. through the lens of mechanistic interpretability, Bayesian inference, or by examining the distributional properties of training data. However, in each of these cases, ICL is treated largely as a persistent phenomenon; namely, once ICL emerges, it is assumed to persist asymptotically. Here, we show that the emergence of ICL during transformer training is, in fact, often transient. We train transformers on synthetic data designed so that both ICL and in-weights learning (IWL) strategies can lead to correct predictions. We find that ICL first emerges, then disappears and gives way to IWL, all while the training loss decreases, indicating an asymptotic preference for IWL. The transient nature of ICL is observed in transformers across a range of model sizes and datasets, raising the question of how much to "overtrain" transformers when seeking compact, cheaper-to-run models. We find that L2 regularization may offer a path to more persistent ICL that removes the need for early stopping based on ICL-style validation tasks. Finally, we present initial evidence that ICL transience may be caused by competition between ICL and IWL circuits.
    Comment: 19 pages, 16 figures

    Confronting Reward Model Overoptimization with Constrained RLHF

    Large language models are typically aligned with human preferences by optimizing reward models (RMs) fitted to human feedback. However, human preferences are multi-faceted, and it is increasingly common to derive reward from a composition of simpler reward models, each of which captures a different aspect of language quality. This itself presents a challenge, as it is difficult to weight these component RMs appropriately when combining them. Compounding this difficulty, because any RM is only a proxy for human evaluation, this process is vulnerable to overoptimization, wherein past a certain point, accumulating higher reward is associated with worse human ratings. In this paper, we perform, to our knowledge, the first study of overoptimization in composite RMs, showing that correlation between component RMs has a significant effect on the locations of these overoptimization points. We then introduce an approach to address this issue that uses constrained reinforcement learning to prevent the agent from exceeding each RM's threshold of usefulness. Our method addresses the problem of weighting component RMs by learning dynamic weights, naturally expressed by Lagrange multipliers. As a result, each RM stays within the range at which it is an effective proxy, improving evaluation performance. Finally, we introduce an adaptive method using gradient-free optimization to identify and optimize towards these points during a single run.
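    To illustrate how Lagrange multipliers can act as dynamic weights on component RMs, here is a minimal toy sketch using projected gradient descent-ascent. Everything in it is a hypothetical assumption, not the paper's setup: the "policy" is a plain parameter vector, each component reward is simply r_i(theta) = theta[i], a quadratic penalty stands in for a KL-to-reference term, and the thresholds tau are made up. A real constrained RLHF system would estimate these gradients from sampled rollouts rather than compute them exactly.

```python
"""Toy projected gradient descent-ascent for a constrained composite reward.

Hypothetical setup (not from the paper): parameters theta in R^n, component
rewards r_i(theta) = theta[i], a quadratic penalty standing in for a KL term,
and usefulness thresholds tau_i that each r_i should not exceed.
"""


def constrained_weights(tau, lr=0.1, steps=2000):
    n = len(tau)
    theta = [0.0] * n  # hypothetical policy parameters
    lam = [0.0] * n    # one Lagrange multiplier per component RM
    for _ in range(steps):
        rewards = theta[:]  # r_i(theta) = theta[i] in this toy model
        # Ascent direction for theta on the Lagrangian
        #   L = sum_i r_i - 0.5 * ||theta||^2 - sum_i lam_i * (r_i - tau_i),
        # so each component reward carries an effective weight of (1 - lam_i).
        grad_theta = [(1.0 - lam[i]) - theta[i] for i in range(n)]
        # Projected descent on lam: it grows while r_i exceeds tau_i and
        # shrinks (never below zero) while r_i stays under its threshold.
        lam = [max(0.0, lam[i] + lr * (rewards[i] - tau[i])) for i in range(n)]
        theta = [theta[i] + lr * grad_theta[i] for i in range(n)]
    return theta, lam


theta, lam = constrained_weights(tau=[0.5, 2.0])
print([round(t, 3) for t in theta], [round(l, 3) for l in lam])
```

    Without constraints this toy objective peaks at theta = [1, 1]. Because the first threshold (0.5) is binding, its multiplier settles near 0.5, halving that RM's effective weight, while the second threshold (2.0) is never reached and its multiplier stays at zero.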

    Significance of Four Methionine Sulfoxide Reductases in Staphylococcus aureus

    Staphylococcus aureus is a major human pathogen, and the emergence of antibiotic resistance in clinical staphylococcal isolates raises concerns about our ability to control these infections. Cell wall-active antibiotics cause elevated synthesis of methionine sulfoxide reductases (Msrs: MsrA1 and MsrB) in S. aureus. MsrA and MsrB enzymes reduce, respectively, the S-epimers and R-epimers of methionine sulfoxide that are generated under oxidative stress. The S. aureus chromosome carries three msrA genes (msrA1, msrA2 and msrA3) and one msrB gene. To understand the precise physiological roles of Msr proteins in S. aureus, mutations in the msrA1, msrA2, msrA3 and msrB genes were created by site-directed mutagenesis. These mutations were combined to create a triple msrA (msrA1, msrA2 and msrA3) mutant and a quadruple msrAB (msrA1, msrA2, msrA3 and msrB) mutant. These mutants were used to determine the roles of Msr proteins in staphylococcal growth, antibiotic resistance, adherence to human lung epithelial cells, pigment production, and survival in mice relative to the wild-type strains. MsrA1-deficient strains were sensitive to oxidative stress conditions, less pigmented and less adherent to human lung epithelial cells, and showed reduced survival in mouse tissues. In contrast, MsrB-deficient strains were resistant to oxidants and were highly pigmented. Lack of MsrA2 and MsrA3 caused no apparent growth defect in S. aureus. In complementation experiments with the triple and quadruple mutants, MsrA1, and not MsrB, was found to be critical for adherence and phagocytic resistance of S. aureus. Overall, the data suggest that MsrA1 may be an important virulence factor and that MsrB probably plays a balancing role, countering the effect of MsrA1 in S. aureus.
    This work was supported in part by a Warner/Fermaturo grant and A.T. Still University Board of Trustees Research Funds, by grant 1R15AI090680-01 from the National Institutes of Health to VKS, and by grants from the Kirksville College of Osteopathic Medicine Biomedical Sciences Graduate Program to TRJ and KRB. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

    Conditional Allocation of Control Rights in Venture Capital Finance

    When a young entrepreneurial firm matures, it is often necessary to replace the founding entrepreneur with a professional manager. This replacement decision can be affected by the private benefits of control enjoyed by the entrepreneur, which give rise to a conflict of interest between the entrepreneur and the venture capitalist. We show that a combination of convertible securities and contingent control rights can be used to resolve this conflict efficiently. This contractual arrangement is frequently observed in venture capital finance.

    Seizures, ataxia and parvalbumin-expressing interneurons respond to selenium supply in Selenop-deficient mice

    Mice with constitutive disruption of the Selenop gene have been key to delineating the importance of selenoproteins in neurobiology. However, the phenotype of this mouse model is exquisitely dependent on selenium supply and on the timing of selenium supplementation. Combining biochemical, histological, and behavioral methods, we tested the hypothesis that parvalbumin-expressing interneurons in the primary somatosensory cortex and hippocampus depend on dietary selenium availability in Selenop−/− mice. Selenop-deficient mice kept on an adequate selenium diet (0.15 mg/kg, i.e. the recommended dietary allowance, RDA) developed ataxia, tremor, and hyperexcitability between 4 and 5 weeks of age. Video-electroencephalography demonstrated epileptic seizures in Selenop−/− mice fed the RDA diet, while Selenop+/− heterozygous mice behaved normally. Both neurological phenotypes, hyperexcitability/seizures and ataxia/dystonia, were successfully prevented by selenium supplementation from birth or by transgenic expression of human SELENOP under a hepatocyte-specific promoter. Selenium supplementation with 10 μM selenite in the drinking water on top of the RDA diet increased the activity of glutathione peroxidase in the brains of Selenop−/− mice to control levels. The effects of selenium supplementation on the neurological phenotypes were dose- and time-dependent. Selenium supplementation after weaning was apparently too late to prevent ataxia/dystonia, while selenium withdrawal from rescued Selenop−/− mice eventually resulted in ataxia. We conclude that SELENOP expression is essential for preserving interneuron survival under limiting Se supply, while SELENOP appears dispensable under sufficiently high Se status.

    Main-Belt Comet P/2012 T1 (PANSTARRS)

    We present initial results from observations and numerical analyses aimed at characterizing main-belt comet P/2012 T1 (PANSTARRS). Optical monitoring observations were made between October 2012 and February 2013 using the University of Hawaii 2.2 m telescope, the Keck I telescope, the Baade and Clay Magellan telescopes, Faulkes Telescope South, the Perkins Telescope at Lowell Observatory, and the Southern Astrophysical Research (SOAR) telescope. The object's intrinsic brightness approximately doubles from the time of its discovery in early October until mid-November and then decreases by ~60% between late December and early February, similar to the photometric behavior exhibited by several other main-belt comets and unlike that exhibited by the disrupted asteroid (596) Scheila. We also used Keck to conduct spectroscopic searches for CN emission, as well as for absorption at 0.7 microns that could indicate the presence of hydrated minerals, finding an upper limit on the CN production rate of Q(CN) < 1.5×10²³ mol/s, from which we infer a water production rate of Q(H2O) < 5×10²⁵ mol/s, and no evidence of the presence of hydrated minerals. Numerical simulations indicate that P/2012 T1 is largely dynamically stable for >100 Myr and is unlikely to be a recently implanted interloper from the outer solar system, while a search for potential asteroid family associations reveals that it is dynamically linked to the ~155 Myr-old Lixiaohua asteroid family.
    Comment: 15 pages, 4 figures, accepted for publication in ApJ Letters
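    For readers used to working in magnitudes, the brightness changes quoted above can be converted with the standard relation Δm = −2.5 log10(F_new / F_old). The sketch below also takes the ratio of the two quoted production-rate limits; that ratio is simply what the reported numbers imply and is not necessarily the exact conversion factor the authors used. The helper name `delta_mag` is illustrative.

```python
import math


def delta_mag(flux_ratio):
    """Magnitude change for a flux ratio F_new / F_old (negative = brighter)."""
    return -2.5 * math.log10(flux_ratio)


# Brightening: intrinsic flux roughly doubles (~0.75 mag brighter)
print(round(delta_mag(2.0), 2))
# Fading: a ~60% decrease leaves ~40% of the flux (~1 mag fainter)
print(round(delta_mag(0.4), 2))

# Ratio of the quoted upper limits Q(CN) / Q(H2O)
q_cn_limit = 1.5e23   # mol/s
q_h2o_limit = 5e25    # mol/s
print(f"{q_cn_limit / q_h2o_limit:.1%}")
```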

    Variation in MSRA Modifies Risk of Neonatal Intestinal Obstruction in Cystic Fibrosis

    Meconium ileus (MI), a life-threatening intestinal obstruction due to meconium with abnormal protein content, occurs in approximately 15 percent of neonates with cystic fibrosis (CF). Analysis of twins with CF demonstrates that MI is a highly heritable trait, indicating that genetic modifiers are largely responsible for this complication. Here, we performed regional family-based association analysis of a locus that had previously been linked to MI and found that SNP haplotypes 5′ to and within the MSRA gene were associated with MI (P = 1.99×10⁻⁵ to 1.08×10⁻⁶; Bonferroni P = 0.057 to 3.1×10⁻³). The haplotype with the lowest P value showed association with MI in an independent sample of 1,335 unrelated CF patients (OR = 0.72, 95% CI [0.53–0.98], P = 0.04). Intestinal obstruction at the time of weaning was decreased in CF mice with Msra null alleles compared to those with wild-type Msra, resulting in a significant improvement in survival (P = 1.2×10⁻⁴). Similar levels of goblet cell hyperplasia were observed in the ilea of the Cftr−/− and Cftr−/−Msra−/− mice. Modulation of MSRA, an antioxidant shown to preserve the activity of enzymes, may influence proteolysis in the developing intestine of the CF fetus, thereby altering the incidence of obstruction in the newborn period. Identification of MSRA as a modifier of MI provides new insight into the biologic mechanism of neonatal intestinal obstruction caused by loss of CFTR function.
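    The reported statistics can be sanity-checked with a little arithmetic. Bonferroni correction sets p_adj = min(1, m × p) for m tests, so each (nominal, corrected) pair implies a test count m ≈ p_adj / p; the abstract does not state m, and the small disagreement between the two pairs is consistent with rounding of the reported values. The odds-ratio check assumes, as a common but unconfirmed choice, that the 95% CI was computed on the log-odds scale, in which case the point estimate should sit near the geometric mean of the bounds. The helper name `implied_test_count` is illustrative.

```python
import math


def implied_test_count(nominal_p, bonferroni_p):
    """Bonferroni: p_adj = min(1, m * p), so for p_adj < 1, m = p_adj / p."""
    return bonferroni_p / nominal_p


# (nominal P, Bonferroni P) pairs quoted in the abstract
pairs = [(1.99e-5, 0.057), (1.08e-6, 3.1e-3)]
for p, p_adj in pairs:
    print(f"P = {p:.3g}: implied number of tests ~ {implied_test_count(p, p_adj):.0f}")

# If the CI is symmetric on the log-odds scale (an assumption), the OR point
# estimate should be close to the geometric mean of the interval bounds.
ci_low, ci_high = 0.53, 0.98
print(round(math.sqrt(ci_low * ci_high), 2))  # 0.72, matching the reported OR
```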

    Root canal morphology of primary maxillary second molars: a micro-computed tomography analysis

    Aim: Successful endodontic treatment of primary teeth requires comprehensive knowledge and understanding of root canal morphology. The purpose of this study was to investigate the root canal configurations of primary maxillary second molars using micro-computed tomography. Methods: Extracted human primary maxillary second molars (n = 57) were scanned using micro-computed tomography and reconstructed to produce three-dimensional models. Each root canal system was analysed qualitatively according to Vertucci's classification. Results: Of the sample, 22.8% (n = 13) presented with fusion of the disto-buccal and palatal roots; of these, Type V was the most prevalent classification. For teeth with three separate roots (n = 44), the most common root canal type was Type I for the palatal canal (100%) and the disto-buccal canal (77.3%), and Type V for the mesio-buccal canal (36.4%). Overall, 7% (n = 4) of mesio-buccal canals were 'unclassifiable'. Conclusion: The root canal systems of primary maxillary second molars were not only complex but also showed a range of configurations that may contribute to unfavourable clinical outcomes after endodontic treatment.
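    The percentages in the Results can be cross-checked against the stated sample sizes (57 teeth in total, 13 with fused roots and 44 with three separate roots). The counts printed by `implied_count` are back-calculated from the rounded percentages for illustration only; they are not reported in the abstract. Both helper names are hypothetical.

```python
def pct(k, n):
    """Percentage of k out of n, rounded to one decimal place."""
    return round(100 * k / n, 1)


def implied_count(percent, n):
    """Whole-number count implied by a rounded percentage of n (illustrative)."""
    return round(percent / 100 * n)


print(pct(13, 57))              # fused disto-buccal/palatal roots, % of all teeth
print(pct(4, 57))               # 'unclassifiable' mesio-buccal canals, % of all teeth
print(implied_count(77.3, 44))  # disto-buccal canals of the most common type, 3-rooted teeth
print(implied_count(36.4, 44))  # Type V mesio-buccal canals, 3-rooted teeth
print(13 + 44 == 57)            # fused + three-rooted teeth account for the whole sample
```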

    Hyperthyroidism and human chorionic gonadotrophin production in gestational trophoblastic disease

    Background: Gestational trophoblastic disease (GTD) is a rare complication of pregnancy, ranging from molar pregnancy to choriocarcinoma. Patients with persistent disease require treatment with chemotherapy. For the vast majority, prognosis is excellent. Occasionally, GTD is complicated by hyperthyroidism, which may require treatment. This is thought to occur because of molecular mimicry between human chorionic gonadotrophin (HCG) and thyroid-stimulating hormone (TSH), and hence cross-reactivity of HCG with the TSH receptor. Hyperthyroidism usually resolves as the GTD is successfully treated and HCG levels correspondingly normalise. Methods: This paper reviews cases of GTD treated over a 5-year period at one of the three UK centres and identifies the prevalence of hyperthyroidism in this population. Four cases with clinical hyperthyroidism are discussed. Results: On review of the 196 patients with gestational trophoblastic neoplasia treated with chemotherapy in Sheffield since 2005, 14 (7%) had biochemical hyperthyroidism. Of these, four had evidence of clinical hyperthyroidism. Conclusion: Concomitant biochemical thyroid disease in patients with GTD is relatively common, and measurement of thyroid function in patients with persistent GTD is, therefore, important. The development of hyperthyroidism is largely influenced by the level of HCG and the disease burden, and usually settles with treatment of the persistent GTD. However, in rare cases, the thyroid stimulation can have potentially life-threatening consequences.