    How Predictable Are Large Language Model Capabilities? A Case Study on BIG-bench

    We investigate the predictability of large language model (LLM) capabilities: given records of past experiments using different model families, numbers of parameters, tasks, and numbers of in-context examples, can we accurately predict LLM performance on new experiment configurations? Answering this question has practical implications for LLM users (e.g., deciding which models to try), developers (e.g., prioritizing evaluation on representative tasks), and the research community (e.g., identifying hard-to-predict capabilities that warrant further investigation). We study the performance prediction problem on experiment records from BIG-bench. On a random train-test split, an MLP-based predictor achieves RMSE below 5%, demonstrating the presence of learnable patterns within the experiment records. Further, we formulate the problem of searching for "small-bench," an informative subset of BIG-bench tasks from which the performance of the full set can be maximally recovered, and find a subset as informative for evaluating new model families as BIG-bench Hard, while being 3x smaller

    Estimating Large Language Model Capabilities without Labeled Test Data

    Large Language Models (LLMs) have exhibited an impressive ability to perform in-context learning (ICL) from only a few examples, but the success of ICL varies widely from task to task. Thus, it is important to quickly determine whether ICL is applicable to a new task, but directly evaluating ICL accuracy can be expensive in situations where test data is expensive to annotate -- the exact situations where ICL is most appealing. In this paper, we propose the task of ICL accuracy estimation, in which we predict the accuracy of an LLM when doing in-context learning on a new task given only unlabeled data for that task. To perform ICL accuracy estimation, we propose a method that trains a meta-model using LLM confidence scores as features. We compare our method to several strong accuracy estimation baselines on a new benchmark that covers 4 LLMs and 3 task collections. On average, the meta-model improves over all baselines and achieves the same estimation performance as directly evaluating on 40 labeled test examples per task, across the total 12 settings. We encourage future work to improve on our methods and evaluate on our ICL accuracy estimation benchmark to deepen our understanding of when ICL works. Comment: 14 pages, 4 figures

    Using historical tropical cyclone climate datasets to examine wind speed recurrence for coastal Australia

    Likelihood estimates of extreme winds, including those from tropical cyclones (TCs), at certain locations are used to inform wind load standards for structural design. Here, wind speed average recurrence intervals (ARIs) determined from TC climate data dating back to the 1970s in two quantile–quantile adjusted reanalysis datasets (ERA5 and BARRA [1990]), and best-track observations for context, were compared with Standardized ARIs (AS/NZS) across seven tropical and two subtropical Australian inland coastal regions. The novelty of this work lies in determining TC-wind speed ARIs from a range of datasets that are not typically used to evaluate this metric. Inherent differences between the data used to determine the Standard ARIs (large sample sizes allow for larger extrapolations; GEV function) and TC data ARIs (smaller sample size and less certain data; the more asymptotic Lognormal/Weibull functions are used) led to the use of different extreme value functions. Results indicated that although these are two distinct ways of determining design wind speeds, when they are considered equivalent, there was a moderate reproduction of the ARI curves with respect to the Standard in both reanalysis datasets, suggesting that similar analyses using climate model products can provide useful information on these types of metrics with some caveats. Trends in TC wind strength affecting coastal Australia were also analyzed, indicating a potential slight downtrend in tropical West coast TC wind strength and slight uptrend for tropical East coast TC wind strength, noting considerable uncertainty given the short time period and limitations of data quality including over longer time periods. Such trends are not only limited to the relationship between TC intensity and anthropogenic warming, but also to regional changes in TC frequency and track direction. This could lead to significant trends emerging in regional Australian TC wind gust strength before several decades of warming have occurred. It is hoped that climate models can provide both a longer-term and a more homogeneous base for these types of evaluations and subsequent projections with respect to climate change simulations. © 2022, Crown

    Western North Pacific Tropical Cyclone Tracks in CMIP5 Models: Statistical Assessment Using a Model-Independent Detection and Tracking Scheme

    Past studies have shown that tropical cyclone (TC) projection results can be sensitive to different types of TC tracking schemes, and that the relative adjustments of detection criteria to accommodate different models may not necessarily provide a consistent platform for comparison of projection results. Here, future climate projections of TC activity in the western North Pacific basin (WNP, defined from 0°–50°N and 100°E–180°) are assessed with a model-independent detection and tracking scheme. This scheme is applied to models from phase 5 of the Coupled Model Intercomparison Project (CMIP5) forced under the historical and representative concentration pathway 8.5 (RCP8.5) conditions. TC tracks from the observed records and independent models are analyzed simultaneously with a curve-clustering algorithm, allowing observed and model tracks to be projected onto the same set of clusters (k = 9). Four of the nine clusters were projected to undergo significant changes in TC frequency. Straight-moving TCs in the South China Sea were projected to significantly decrease. Projected increases in TC frequency were found poleward of 20°N and east of 160°E, consistent with changes in ascending motion, as well as vertical wind shear and relative humidity respectively. Projections of TC track exposure indicated significant reductions for southern China and the Philippines and significant increases for the Korean peninsula and Japan, although very few model TCs reached the latter subtropical regions in comparison to the observations. The use of a fundamentally different detection methodology that overcomes the detector/tracker bias gives increased certainty to projections as best as low-resolution simulations can offer. © 2019 American Meteorological Society

    Borcherds symmetries in M-theory

    It is well known but rather mysterious that root spaces of the $E_k$ Lie groups appear in the second integral cohomology of regular, complex, compact, del Pezzo surfaces. The corresponding groups act on the scalar fields (0-forms) of toroidal compactifications of M theory. Their Borel subgroups are actually subgroups of supergroups of finite dimension over the Grassmann algebra of differential forms on spacetime that have been shown to preserve the self-duality equation obeyed by all bosonic form-fields of the theory. We show here that the corresponding duality superalgebras are nothing but Borcherds superalgebras truncated by the above choice of Grassmann coefficients. The full Borcherds' root lattices are the second integral cohomology of the del Pezzo surfaces. Our choice of simple roots uses the anti-canonical form and its known orthogonal complement. Another result is the determination of del Pezzo surfaces associated to other string and field theory models. Dimensional reduction on $T^k$ corresponds to blow-up of $k$ points in general position with respect to each other. All theories of the Magic triangle that reduce to the $E_n$ sigma model in three dimensions correspond to singular del Pezzo surfaces with $A_{8-n}$ (normal) singularity at a point. The case of type I and heterotic theories if one drops their gauge sector corresponds to non-normal (singular along a curve) del Pezzo's. We comment on previous encounters with Borcherds algebras at the end of the paper. Comment: 30 pages. Besides expository improvements, we exclude by hand real fermionic simple roots when they would naively arise

    Embedding PbS Quantum Dots (QDs) in Pb-Halide Perovskite Matrices: QD Surface Chemistry and Antisolvent Effects on QD Dispersion and Confinement Properties

    Hybrid materials of metal chalcogenide colloidal quantum dots (QDs) embedded in metal halide perovskites (MHPs) have led to composites with synergistic properties. Here, we investigate how QD size, surface chemistry, and MHP film formation methods affect the resulting optoelectronic properties of QD/MHP “dot-in-matrix” systems. We monitor the QD absorption and photoluminescence throughout synthesis, ligand exchange, and transfer into the MHP ink, and we characterize the final QD/MHP films via electron microscopy and transient absorption. In addition, we are the first to globally map how PbS QDs are distributed on the micrometer scale within these dot-in-matrix systems, using three-dimensional (3D) tomography time-of-flight secondary ion mass spectrometry. The surface chemistry imparted during synthesis directly affects the optical properties of the dot-in-matrix composites. Pb-halide passivation leads to QD/MHP dot-in-matrix samples with optical properties that are well-described by a theoretical model, based on a Type I finite-barrier heterostructure between the PbS QD and the MHP matrix. Samples without Pb-halide passivation show complicated size-dependent behavior, indicating a transition from a Type I heterostructure between the PbS QD wells and MHP barriers for small-sized QDs to PbS QDs that are electronically decoupled from the MHP matrix for larger QDs. Furthermore, the choice in perovskite antisolvent crystallization method leads to a difference in the spatial QD distribution within the perovskite matrix, differences in carrier lifetime, and photoluminescence shifts of up to 180 meV for PbS in methylammonium lead iodide. This work establishes an understanding of such emerging synergistic systems relevant for technologies such as photovoltaics, infrared emitters and detectors, and other unexplored technological applications

    Parabens as Urinary Biomarkers of Exposure in Humans

    BACKGROUND: Parabens appear frequently as antimicrobial preservatives in cosmetic products, in pharmaceuticals, and in food and beverage processing. In vivo and in vitro studies have revealed weak estrogenic activity of some parabens. Widespread use has raised concerns about the potential human health risks associated with paraben exposure. OBJECTIVES: Assessing human exposure to parabens usually involves measuring in urine the conjugated or free species of parabens or their metabolites. In animals, parabens are mostly hydrolyzed to p-hydroxybenzoic acid and excreted in the urine as conjugates. Still, monitoring urinary concentrations of p-hydroxybenzoic acid is not necessarily the best way to assess exposure to parabens. p-Hydroxybenzoic acid is a nonspecific biomarker, and the varying estrogenic bioactivities of parabens require specific biomarkers. Therefore, we evaluated the use of free and conjugated parent parabens as new biomarkers for human exposure to these compounds. RESULTS: We measured the urinary concentrations of methyl, ethyl, n-propyl, butyl (n- and iso-), and benzyl parabens in a demographically diverse group of 100 anonymous adults. We detected methyl and n-propyl parabens at the highest median concentrations (43.9 ng/mL and 9.05 ng/mL, respectively) in nearly all (> 96%) of the samples. We also detected other parabens in more than half of the samples (ethyl, 58%; butyl, 69%). Most important, however, we found that parabens in urine appear predominantly in their conjugated forms. CONCLUSIONS: The results, demonstrating the presence of urinary conjugates of parabens in humans, suggest that such conjugated parabens could be used as exposure biomarkers. Additionally, the fact that conjugates appear to be the main urinary products of parabens may be important for risk assessment

    T Cell Costimulation through CD28 Depends on Induction of the Bcl-xγ Isoform: Analysis of Bcl-xγ–deficient Mice

    The molecular basis of CD28-dependent costimulation of T cells is poorly understood. Bcl-xγ is a member of the Bcl-x family whose expression is restricted to activated T cells and requires CD28-dependent ligation for full expression. We report that Bcl-xγ–deficient (Bcl-xγ−/−) T cells display defective proliferative and cytokine responses to CD28-dependent costimulatory signals, impaired memory responses to proteolipid protein peptide (PLP), and do not develop PLP-induced experimental autoimmune encephalomyelitis (EAE). In contrast, enforced expression of Bcl-xγ largely replaces the requirement for B7-dependent ligation of CD28. These findings identify the Bcl-xγ cytosolic protein as an essential downstream link in the CD28-dependent signaling pathway that underlies T cell costimulation