
    Levinson's theorem for graphs

    We prove an analog of Levinson's theorem for scattering on a weighted (m+1)-vertex graph with a semi-infinite path attached to one of its vertices. In particular, we show that the number of bound states in such a scattering problem is equal to m minus half the winding number of the phase of the reflection coefficient (where each so-called half-bound state is counted as half a bound state).
    Comment: 10 pages, 1 figure; v2: minor correction
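
    The counting relation stated above can be written compactly as follows. This is only a sketch in notation of our own choosing (N_b for the number of bound states, N_h for the number of half-bound states, R for the reflection coefficient), not the paper's formulation.

```latex
% Sketch of the stated relation, in our own notation:
%   N_b = number of bound states, N_h = number of half-bound states,
%   R   = reflection coefficient, w = winding number of arg R.
\[
  N_b + \frac{1}{2} N_h \;=\; m \;-\; \frac{1}{2}\, w ,
  \qquad
  w \;=\; \frac{1}{2\pi}\, \Delta \arg R .
\]
```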

    Tokenization counts: the impact of tokenization on arithmetic in frontier LLMs

    Tokenization, the division of input text into input tokens, is an often overlooked aspect of the large language model (LLM) pipeline and could be a source of useful or harmful inductive biases. Historically, LLMs have relied on byte pair encoding without regard to specific input domains. With the increased use of LLMs for reasoning, various number-specific tokenization schemes have been adopted: popular models like LLaMA and PaLM opt for single-digit tokenization, while GPT-3.5 and GPT-4 have separate tokens for every 1-, 2-, and 3-digit number. In this work, we study the effect this choice has on numerical reasoning through the use of arithmetic tasks. We consider left-to-right and right-to-left tokenization for GPT-3.5 and GPT-4, finding that right-to-left tokenization (enforced by comma-separating numbers at inference time) leads to substantially improved performance. Furthermore, we find that model errors under standard left-to-right tokenization follow stereotyped error patterns, suggesting that model computations are systematic rather than approximate. We show that the model can convert between tokenizations easily, allowing chain-of-thought-inspired approaches to recover performance on left-to-right tokenized inputs. We also find that the gap between tokenization directions decreases as models are scaled, possibly indicating that larger models are better able to override this tokenization-dependent inductive bias. In summary, our work presents the first study of how number tokenization choices lead to differences in model performance on arithmetic tasks, accompanied by a thorough analysis of error patterns. We hope this work inspires practitioners to more carefully ablate number-tokenization choices when working towards general models of numerical reasoning.
    Comment: 21 pages, 18 figures
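
    The comma-separation trick mentioned above (forcing right-to-left digit grouping at inference time) can be sketched as follows; the function name and grouping size are our own illustrative choices, not taken from the paper.

```python
def comma_separate(number: int, group: int = 3) -> str:
    """Insert commas every `group` digits from the right, so the number is
    split into right-aligned groups (e.g. 1234567 -> '1,234,567') rather than
    whatever left-to-right chunks the tokenizer would otherwise produce."""
    digits = str(number)
    groups = []
    while digits:                        # collect fixed-size groups from the right
        groups.append(digits[-group:])
        digits = digits[:-group]
    return ",".join(reversed(groups))

# Example: rewrite an arithmetic prompt before sending it to the model.
a, b = 123456, 7890
prompt = f"What is {comma_separate(a)} + {comma_separate(b)}?"
print(prompt)  # What is 123,456 + 7,890?
```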

    A Neural Architecture for Designing Truthful and Efficient Auctions

    Auctions are protocols for allocating goods to buyers who have preferences over them, and for collecting payments in return. Economists have invested significant effort in designing auction rules that result in allocations of the goods that are desirable for the group as a whole. However, in settings where participants' valuations of the items on sale are their private information, buyers may misreport their preferences to maximize their own utility, and such misreports hinder the auctioneer's ability to allocate goods to those who want them most; the rules of the auction must therefore deter misreporting. Manual auction design has yielded excellent mechanisms for specific settings, but requires significant effort when tackling new domains. We propose a deep-learning-based approach to automatically design auctions in a wide variety of domains, shifting the design work from human to machine. We assume that participants' valuations for the items for sale are independently sampled from an unknown but fixed distribution. Our system receives a dataset of such valuation samples and outputs an auction rule encoding the desired incentive structure. We focus on producing truthful and efficient auctions that minimize the economic burden on participants. We evaluate the auctions designed by our framework on well-studied domains, such as multi-unit and combinatorial auctions, showing that they outperform known auction designs in terms of the economic burden placed on participants.
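
    As background for what "truthful and efficient" means in this context, the classical single-item Vickrey (second-price) auction has both properties: bidding one's true value is a dominant strategy, and the item goes to the bidder who values it most. The sketch below illustrates that baseline only; it is not the paper's learned mechanism.

```python
def vickrey_auction(bids: dict[str, float]) -> tuple[str, float]:
    """Single-item second-price auction: the highest bidder wins and pays the
    second-highest bid. Truthful (no incentive to misreport one's value) and
    efficient (the item goes to whoever values it most)."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    payment = ranked[1][1] if len(ranked) > 1 else 0.0
    return winner, payment

winner, payment = vickrey_auction({"alice": 7.0, "bob": 5.5, "carol": 6.2})
print(winner, payment)  # alice 6.2
```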

    Confronting Reward Model Overoptimization with Constrained RLHF

    Large language models are typically aligned with human preferences by optimizing reward models (RMs) fitted to human feedback. However, human preferences are multi-faceted, and it is increasingly common to derive reward from a composition of simpler reward models, each of which captures a different aspect of language quality. This itself presents a challenge, as it is difficult to appropriately weight these component RMs when combining them. Compounding this difficulty, because any RM is only a proxy for human evaluation, this process is vulnerable to overoptimization, wherein past a certain point, accumulating higher reward is associated with worse human ratings. In this paper, we perform, to our knowledge, the first study on overoptimization in composite RMs, showing that correlation between component RMs has a significant effect on the locations of these points. We then introduce an approach to this issue that uses constrained reinforcement learning to prevent the agent from exceeding each RM's threshold of usefulness. Our method addresses the problem of weighting component RMs by learning dynamic weights, naturally expressed by Lagrange multipliers. As a result, each RM stays within the range in which it is an effective proxy, improving evaluation performance. Finally, we introduce an adaptive method using gradient-free optimization to identify and optimize towards these points during a single run.
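
    A rough sketch of the constrained approach described above, assuming per-component RM scores and per-RM usefulness thresholds are available; the function names and the dual update rule are illustrative, not the paper's implementation.

```python
import numpy as np

def lagrangian_reward(rm_scores: np.ndarray, thresholds: np.ndarray,
                      lambdas: np.ndarray) -> float:
    """Combine component RM scores using dynamic weights (Lagrange multipliers).
    Each constraint keeps an RM's reward from exceeding the point past which it
    stops being a useful proxy; exceeding that threshold is penalised."""
    violations = rm_scores - thresholds          # > 0 once an RM is over-optimized
    return float(rm_scores.sum() - lambdas @ violations)

def dual_update(lambdas: np.ndarray, rm_scores: np.ndarray,
                thresholds: np.ndarray, lr: float = 0.01) -> np.ndarray:
    """Projected gradient ascent on the multipliers:
    lambda_i <- max(0, lambda_i + lr * (rm_score_i - threshold_i))."""
    return np.maximum(0.0, lambdas + lr * (rm_scores - thresholds))
```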

    Melting Pot 2.0

    Multi-agent artificial intelligence research promises a path to developing intelligent technologies that are more human-like and more human-compatible than those produced by "solipsistic" approaches, which do not consider interactions between agents. Melting Pot is a research tool developed to facilitate work on multi-agent artificial intelligence; it provides an evaluation protocol that measures generalization to novel social partners in a set of canonical test scenarios. Each scenario pairs a physical environment (a "substrate") with a reference set of co-players (a "background population") to create a social situation with substantial interdependence between the individuals involved. For instance, some scenarios were inspired by institutional-economics-based accounts of natural resource management and public-good-provision dilemmas; others were inspired by considerations from evolutionary biology, game theory, and artificial life. Melting Pot aims to cover a maximally diverse set of interdependencies and incentives. It includes the commonly studied extreme cases of perfectly competitive (zero-sum) motivations and perfectly cooperative (shared-reward) motivations, but does not stop there. As in real life, a clear majority of scenarios in Melting Pot have mixed incentives: they are neither purely competitive nor purely cooperative, and thus demand that successful agents be able to navigate the resulting ambiguity. Here we describe Melting Pot 2.0, which revises and expands on Melting Pot. We also introduce support for scenarios with asymmetric roles and explain how to integrate them into the evaluation protocol. This report also contains: (1) details of all substrates and scenarios; and (2) a complete description of all baseline algorithms and results. Our intention is for it to serve as a reference for researchers using Melting Pot 2.0.
    Comment: 59 pages, 54 figures. arXiv admin note: text overlap with arXiv:2107.0685
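
    The scenario structure described above (a substrate paired with a background population, plus roles in Melting Pot 2.0) could be summarized with a small data structure like the following. This is a hypothetical sketch for illustration only, not the meltingpot library's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Scenario:
    """Hypothetical sketch (not the meltingpot library's API): a test scenario
    pairs a physical environment with a fixed background population of
    co-players and, in Melting Pot 2.0, may assign asymmetric roles."""
    substrate: str                        # name of the physical environment
    background_population: list[str]      # pretrained co-player policies
    focal_roles: list[str] = field(default_factory=list)  # roles for agents under test

example = Scenario(
    substrate="commons_harvest",          # illustrative name only
    background_population=["cooperator_bot", "defector_bot"],
    focal_roles=["default", "default"],
)
```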

    Quality of care for hypertension in the United States

    BACKGROUND: Despite heavy recent emphasis on blood pressure (BP) control, many patients fail to meet widely accepted goals. While access and adherence to therapy certainly play a role, another potential explanation is poor quality of essential care processes (QC). Yet little is known about the relationship between QC and BP control.
    METHODS: We assessed QC in 12 U.S. communities by reviewing the medical records of a randomly selected group of patients for the two years preceding our study. We included patients with either a diagnosis of hypertension or two visits with BPs of ≥140/90 in their medical records. We used 28 process indicators based on explicit evidence to assess QC. The indicators covered a broad spectrum of care and were developed through a modified Delphi method. We considered patients who received all indicated care to have optimal QC. We defined control of hypertension as BP < 140/90 in the most recent reading.
    RESULTS: Of 1,953 hypertensive patients, only 57% received optimal care and 42% had controlled hypertension. Patients who had received optimal care were more likely to have their BP under control at the end of the study (45% vs. 35%, p = .0006). Patients were more likely to receive optimal care if they were over age 50 (76% vs. 63%, p < .0001), had diabetes (77% vs. 71%, p = .0038), coronary artery disease (87% vs. 69%, p < .0001), or hyperlipidemia (80% vs. 68%, p < .0001), and did not smoke (73% vs. 66%, p = .0005).
    CONCLUSIONS: Higher QC for hypertensive patients is associated with better BP control. Younger patients without cardiac risk factors are at greatest risk for poor care. Quality measurement systems like the one presented in this study can guide future quality improvement efforts.