Levinson's theorem for graphs
We prove an analog of Levinson's theorem for scattering on a weighted
(m+1)-vertex graph with a semi-infinite path attached to one of its vertices.
In particular, we show that the number of bound states in such a scattering
problem is equal to m minus half the winding number of the phase of the
reflection coefficient (where each so-called half-bound state is counted as
half a bound state).
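
In symbols (the notation below is assumed for illustration and is not fixed by
the abstract), writing N_b for the number of bound states, N_h for the number of
half-bound states, and w for the winding number of the phase of the reflection
coefficient, the claimed relation reads:

    % LaTeX sketch; N_b, N_h, and w are assumed notation, not the paper's.
    N_b + \tfrac{1}{2}\,N_h \;=\; m \;-\; \tfrac{1}{2}\,w
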
Tokenization counts: the impact of tokenization on arithmetic in frontier LLMs
Tokenization, the division of input text into input tokens, is an often
overlooked aspect of the large language model (LLM) pipeline and could be the
source of useful or harmful inductive biases. Historically, LLMs have relied on
byte pair encoding, without regard to specific input domains. With the increased
use of LLMs for reasoning, various number-specific tokenization schemes have
been adopted, with popular models like LLaMa and PaLM opting for single-digit
tokenization while GPT-3.5 and GPT-4 have separate tokens for each 1-, 2-, and
3-digit number. In this work, we study the effect this choice has on numerical
reasoning through the use of arithmetic tasks. We consider left-to-right and
right-to-left tokenization for GPT-3.5 and -4, finding that right-to-left
tokenization (enforced by comma-separating numbers at inference time) leads to
substantially improved performance. Furthermore, we find that model errors under
standard left-to-right tokenization follow stereotyped error patterns,
suggesting that model computations are systematic rather than approximate. We
show that the model is able to convert between tokenizations easily, thus
allowing chain-of-thought-inspired approaches to recover performance on
left-to-right tokenized inputs. We also find that the gap between tokenization
directions shrinks as models are scaled up, possibly indicating that larger
models are better able to override this tokenization-dependent inductive bias.
In summary, we present the first study of how number tokenization
choices lead to differences in model performance on arithmetic tasks,
accompanied by a thorough analysis of error patterns. We hope this work
inspires practitioners to more carefully ablate number tokenization-related
choices when working towards general models of numerical reasoning.
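
As a concrete illustration of the comma-separation trick mentioned above, the
sketch below (plain Python, not the paper's code; the prompt wording is an
assumption) shows how inserting thousands separators forces a tokenizer with
1-to-3-digit tokens to split each number into right-aligned chunks.

    # Illustrative sketch, not the paper's code: comma-separating a number
    # groups its digits in threes from the right, so a tokenizer with 1-3
    # digit number tokens must split it right-to-left.
    def comma_separate(n: int) -> str:
        """1234567 -> '1,234,567'"""
        return f"{n:,}"

    def arithmetic_prompt(a: int, b: int) -> str:
        # Hypothetical prompt wording, used only for this example.
        return f"What is {comma_separate(a)} + {comma_separate(b)}?"

    print(arithmetic_prompt(1234567, 89012))
    # -> What is 1,234,567 + 89,012?
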
A Neural Architecture for Designing Truthful and Efficient Auctions
Auctions are protocols to allocate goods to buyers who have preferences over
them, and collect payments in return. Economists have invested significant
effort in designing auction rules that result in allocations of the goods that
are desirable for the group as a whole. However, for settings where
participants' valuations of the items on sale are their private information,
the rules of the auction must deter buyers from misreporting their preferences
in pursuit of their own utility, since misreported preferences hinder
the auctioneer's ability to allocate goods to those who want them most.
Manual auction design has yielded excellent mechanisms for specific settings,
but requires significant effort when tackling new domains. We propose a deep
learning based approach to automatically design auctions in a wide variety of
domains, shifting the design work from human to machine. We assume that
participants' valuations for the items for sale are independently sampled from
an unknown but fixed distribution. Our system receives a dataset consisting of
such valuation samples, and outputs an auction rule encoding the desired
incentive structure. We focus on producing truthful and efficient auctions that
minimize the economic burden on participants. We evaluate the auctions designed
by our framework on well-studied domains, such as multi-unit and combinatorial
auctions, showing that they outperform known auction designs in terms of the
economic burden placed on participants.
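
For reference, the sketch below (an assumption for illustration, not the paper's
learned mechanism) shows the single-item Vickrey auction, the canonical truthful
and efficient rule that such a learned design should recover in the simplest
setting.

    import numpy as np

    # Vickrey (second-price) auction: allocate to the highest bidder and
    # charge the second-highest bid. Truthful bidding is a dominant strategy,
    # and the resulting allocation is efficient.
    def vickrey_auction(bids: np.ndarray):
        winner = int(np.argmax(bids))
        payment = float(np.partition(bids, -2)[-2])  # second-highest bid
        return winner, payment

    bids = np.array([3.0, 7.5, 5.2])  # hypothetical reported valuations
    print(vickrey_auction(bids))      # -> (1, 5.2)
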
Confronting Reward Model Overoptimization with Constrained RLHF
Large language models are typically aligned with human preferences by
optimizing against reward models (RMs) fitted to human feedback. However,
human preferences are multi-faceted, and it is increasingly common to derive
reward from a composition of simpler reward models, each of which captures a
different aspect of language quality. This itself presents a challenge, as it
is difficult to appropriately weight these component RMs when combining them.
Compounding this difficulty, because any RM is only a proxy for human
evaluation, this process is vulnerable to overoptimization, wherein
past a certain point, accumulating higher reward is associated with worse human
ratings. In this paper, we perform, to our knowledge, the first study on
overoptimization in composite RMs, showing that correlation between component
RMs has a significant effect on the locations of these points. We then
introduce an approach to solve this issue using constrained reinforcement
learning as a means of preventing the agent from exceeding each RM's threshold
of usefulness. Our method addresses the problem of weighting component RMs by
learning dynamic weights, naturally expressed by Lagrange multipliers. As a
result, each RM stays within the range at which it is an effective proxy,
improving evaluation performance. Finally, we introduce an adaptive method
using gradient-free optimization to identify and optimize towards these points
during a single run.
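
A minimal sketch of the constrained formulation described above, with one
Lagrange multiplier per component RM acting as a dynamic weight (the thresholds,
symbols, and update rule here are assumptions for illustration, not the paper's
implementation):

    import numpy as np

    def lagrangian(task_reward, rm_values, thresholds, lambdas):
        # L = r_task - sum_i lambda_i * (r_i - tau_i): each component RM r_i is
        # constrained to stay at or below its proxy-usefulness threshold tau_i.
        return task_reward - np.sum(lambdas * (rm_values - thresholds))

    def dual_ascent_step(rm_values, thresholds, lambdas, lr=0.01):
        # Raise lambda_i when RM_i exceeds its threshold, lower it otherwise;
        # project back onto lambda_i >= 0.
        return np.maximum(0.0, lambdas + lr * (rm_values - thresholds))

    rm_values = np.array([0.9, 0.4])    # hypothetical component RM scores
    thresholds = np.array([0.7, 0.6])   # hypothetical usefulness thresholds
    lambdas = np.zeros(2)
    for _ in range(100):
        lambdas = dual_ascent_step(rm_values, thresholds, lambdas)
    print(lambdas)  # weight on the over-threshold RM grows; the other stays 0
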
Melting Pot 2.0
Multi-agent artificial intelligence research promises a path to develop
intelligent technologies that are more human-like and more human-compatible
than those produced by "solipsistic" approaches, which do not consider
interactions between agents. Melting Pot is a research tool developed to
facilitate work on multi-agent artificial intelligence, and provides an
evaluation protocol that measures generalization to novel social partners in a
set of canonical test scenarios. Each scenario pairs a physical environment (a
"substrate") with a reference set of co-players (a "background population"), to
create a social situation with substantial interdependence between the
individuals involved. For instance, some scenarios were inspired by
institutional-economics-based accounts of natural resource management and
public-good-provision dilemmas. Others were inspired by considerations from
evolutionary biology, game theory, and artificial life. Melting Pot aims to
cover a maximally diverse set of interdependencies and incentives. It includes
the commonly-studied extreme cases of perfectly-competitive (zero-sum)
motivations and perfectly-cooperative (shared-reward) motivations, but does not
stop there. As in real life, a clear majority of scenarios in Melting Pot
have mixed incentives. They are neither purely competitive nor purely
cooperative, and thus demand that successful agents be able to navigate the resulting
ambiguity. Here we describe Melting Pot 2.0, which revises and expands on
Melting Pot. We also introduce support for scenarios with asymmetric roles, and
explain how to integrate them into the evaluation protocol. This report also
contains: (1) details of all substrates and scenarios; (2) a complete
description of all baseline algorithms and results. Our intention is for it to
serve as a reference for researchers using Melting Pot 2.0.
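
The sketch below paraphrases the evaluation protocol described above using
placeholder interfaces (it is not the Melting Pot API): a scenario pairs a
substrate with a fixed background population, and only the focal agents under
evaluation contribute to the score.

    # Hypothetical interfaces for illustration only; not the Melting Pot API.
    def evaluate_scenario(substrate, focal_agents, background_agents, episodes=5):
        n_focal = len(focal_agents)
        returns = []
        for _ in range(episodes):
            observations = substrate.reset()            # one observation per player
            done, focal_return = False, 0.0
            while not done:
                players = focal_agents + background_agents
                actions = [p.act(o) for p, o in zip(players, observations)]
                observations, rewards, done = substrate.step(actions)
                focal_return += sum(rewards[:n_focal])  # score focal agents only
            returns.append(focal_return)
        return sum(returns) / len(returns)              # mean focal return
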
Quality of care for hypertension in the United States
BACKGROUND: Despite heavy recent emphasis on blood pressure (BP) control, many patients fail to meet widely accepted goals. While access and adherence to therapy certainly play a role, another potential explanation is poor quality of essential care processes (QC). Yet little is known about the relationship between QC and BP control.
METHODS: We assessed QC in 12 U.S. communities by reviewing the medical records of a randomly selected group of patients for the two years preceding our study. We included patients with either a diagnosis of hypertension or two visits with BPs of ≥140/90 in their medical records. We used 28 process indicators based on explicit evidence to assess QC. The indicators covered a broad spectrum of care and were developed through a modified Delphi method. We considered patients who received all indicated care to have optimal QC. We defined control of hypertension as BP < 140/90 in the most recent reading.
RESULTS: Of 1,953 hypertensive patients, only 57% received optimal care and 42% had controlled hypertension. Patients who had received optimal care were more likely to have their BP under control at the end of the study (45% vs. 35%, p = .0006). Patients were more likely to receive optimal care if they were over age 50 (76% vs. 63%, p < .0001), had diabetes (77% vs. 71%, p = .0038), coronary artery disease (87% vs. 69%, p < .0001), or hyperlipidemia (80% vs. 68%, p < .0001), and did not smoke (73% vs. 66%, p = .0005).
CONCLUSIONS: Higher QC for hypertensive patients is associated with better BP control. Younger patients without cardiac risk factors are at greatest risk for poor care. Quality measurement systems like the one presented in this study can guide future quality improvement efforts.
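
A minimal sketch of the two outcome definitions used above (the data layout is
hypothetical): "optimal care" means every indicated process indicator was
delivered, and "controlled" means the most recent BP reading is below 140/90.

    def received_optimal_care(indicated: list[bool], received: list[bool]) -> bool:
        # Optimal care: every indicated process was actually delivered.
        return all(r for i, r in zip(indicated, received) if i)

    def hypertension_controlled(most_recent_bp: tuple[int, int]) -> bool:
        systolic, diastolic = most_recent_bp
        return systolic < 140 and diastolic < 90

    print(received_optimal_care([True, True, False], [True, True, False]))  # True
    print(hypertension_controlled((138, 85)))                               # True
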