5,145 research outputs found

    Task interruption

    Emergent inabilities? Inverse scaling over the course of pretraining

    Does inverse scaling only occur as a function of model size, or can it also occur over the course of training? We carry out an exploratory study investigating whether the performance of language models on specific tasks can decrease (while general performance remains high) during training on the language modeling task. We find 8 tasks on which Pythia 12B (Biderman et al., 2023) shows decreased performance over the course of training. Five of these tasks (TruthfulQA-MC1, TruthfulQA-MC2, Hindsight Neglect, Memo Trap, and Pattern Match Suppression) additionally show a consistent relationship whereby larger language models show a greater decrease in performance the more they are trained, despite showing standard (positive) scaling overall. This highlights the importance of testing performance at all relevant benchmarks any time models are trained on additional data, even if their overall performance improves. Comment: Accepted to Findings of EMNLP 202

    Late Albian adaptive radiation in the calcareous nannofossil genus Eiffellithus

    Do language models make human-like predictions about the coreferents of Italian anaphoric zero pronouns?

    Some languages allow arguments to be omitted in certain contexts. Yet human language comprehenders reliably infer the intended referents of these zero pronouns, in part because they construct expectations about which referents are more likely. We ask whether Neural Language Models also extract the same expectations. We test whether 12 contemporary language models display expectations that reflect human behavior when exposed to sentences with zero pronouns from five behavioral experiments conducted in Italian by Carminati (2005). We find that three models - XGLM 2.9B, 4.5B, and 7.5B - capture the human behavior from all the experiments, with others successfully modeling some of the results. This result suggests that human expectations about coreference can be derived from exposure to language, and also indicates features of language models that allow them to better reflect human behavior. Comment: Accepted at COLING 202

    Does clinical management improve outcomes following self-harm? Results from the multicentre study of self-harm in England

    Background: Evidence to guide clinical management of self-harm is sparse, trials have recruited selected samples, and psychological treatments that are suggested in guidelines may not be available in routine practice. Aims: To examine how the management that patients receive in hospital relates to subsequent outcome. Methods: We identified episodes of self-harm presenting to three UK centres (Derby, Manchester, Oxford) over a 10-year period (2000 to 2009). We used established data collection systems to investigate the relationship between four aspects of management (psychosocial assessment, medical admission, psychiatric admission, referral for specialist mental health follow-up) and repetition of self-harm within 12 months, adjusted for differences in baseline demographic and clinical characteristics. Results: 35,938 individuals presented with self-harm during the study period. In two of the three centres, receiving a psychosocial assessment was associated with a 40% lower risk of repetition, Hazard Ratios (95% CIs): Centre A 0.99 (0.90–1.09); Centre B 0.59 (0.48–0.74); Centre C 0.59 (0.52–0.68). There was little indication that the apparent protective effects were mediated through referral and follow-up arrangements. The association between psychosocial assessment and a reduced risk of repetition appeared to be least evident in those from the most deprived areas. Conclusion: These findings add to the growing body of evidence that thorough assessment is central to the management of self-harm, but further work is needed to elucidate the possible mechanisms and explore the effects in different clinical subgroups.

    A Bit of a Problem: Measurement Disparities in Dataset Sizes Across Languages

    How should text dataset sizes be compared across languages? Even for content-matched (parallel) corpora, UTF-8 encoded text can require a dramatically different number of bytes for different languages. In our work, we define the byte premium between two languages as the ratio of bytes used to encode content-matched text in those languages. We compute byte premiums for 1155 languages, and we use linear regressions to estimate byte premiums for other languages. We release a tool to obtain byte premiums for any two languages, enabling comparisons of dataset sizes across languages for more equitable multilingual model development and data practices.
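    The byte premium definition above can be illustrated with a minimal sketch. This is not the authors' released tool: the function names `utf8_size` and `byte_premium` are hypothetical, and the two short strings are a toy stand-in for content-matched corpora, assumed here to be rough translations of each other.

```python
def utf8_size(text: str) -> int:
    """Number of bytes needed to store `text` in UTF-8."""
    return len(text.encode("utf-8"))


def byte_premium(text_a: str, text_b: str) -> float:
    """Ratio of UTF-8 bytes for text_a to UTF-8 bytes for text_b.

    Both texts are assumed to carry the same content (parallel text).
    """
    size_b = utf8_size(text_b)
    if size_b == 0:
        raise ValueError("text_b must not be empty")
    return utf8_size(text_a) / size_b


# Toy parallel pair (illustrative only, not data from the paper).
english = "Hello, world"
russian = "Привет, мир"

# Cyrillic letters take 2 bytes each in UTF-8, ASCII characters take 1.
premium_ru_en = byte_premium(russian, english)
print(f"{utf8_size(russian)} bytes vs {utf8_size(english)} bytes, "
      f"premium = {premium_ru_en:.2f}")
```

    A meaningful estimate would need large parallel corpora rather than a single sentence pair; the sketch only shows how the ratio is formed.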

    Unraveling the Relation Between Reading Comprehension and Print Exposure

    The purpose of this study was to test the directionality of influence between reading comprehension (RC) and print exposure (PE), thereby estimating genetic and environmental effects of this relation. The sample consisted of 910 twins in fourth through ninth grades (mean age = 12.33 years, SD = 1.41) from the Florida Twin Project on Reading, Behavior, and Environment. Using a direction-of-causation model in a twin design, results supported a direction of influence running from RC to PE. This relation was underpinned by genetic and environmental factors of RC as well as PE. Implications for reading education are discussed.

    Chemically defined culture media: rational recipes or witches' brew?

    A rational approach to study cells, tissues or even organs is to isolate them from the body and bring them into a controlled, and therefore reproducible, environment. In vivo, cells are surrounded by the extracellular matrix, and the body fluids nourish them. In vitro, these fluids are replaced by culture media. In the early days of tissue culture, tissue was cultured in a drop of clotted lymph. The early-day natural nutrient media have gradually become replaced by media of a more defined composition, culminating in the advent of completely defined culture media. Biomedical Reviews 1996; 6: 111-119