The Optimal Double Bubble for Density $r^p$
In 1993 Foisy et al. proved that the optimal Euclidean planar double
bubble---the least-perimeter way to enclose and separate two given areas---is
three circular arcs meeting at 120 degrees. We consider the plane with density
$r^p$, joining the surge of research on manifolds with density after their
appearance in Perelman's 2006 proof of the Poincar\'e Conjecture. Dahlberg et
al. proved that the best single bubble in the plane with density $r^p$ is a
circle through the origin. We conjecture that the best double bubble is the
Euclidean solution with one of the vertices at the origin, for which we have
verified equilibrium (first variation or "first derivative" zero). To prove the
exterior of the minimizer connected, it would suffice to show that least
perimeter is increasing as a function of the prescribed areas. We give the
first direct proof of such monotonicity for the Euclidean case. Such arguments
were important in the 2002 Annals proof of the double bubble conjecture in
Euclidean 3-space.
Comment: 15 pages, 10 figures
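For readers outside this area, the weighted notions behind the abstract can be sketched in generic notation (a standard formulation, not necessarily the paper's own): for a density $f$ on the plane, perimeter and area are weighted by $f$, and equilibrium ("first variation zero") means each arc has constant generalized curvature $\kappa_f$.

```latex
% Weighted perimeter and area for a density f > 0 on the plane:
P_f(\partial\Omega) = \int_{\partial\Omega} f \, ds, \qquad
A_f(\Omega) = \int_{\Omega} f \, dA.
% Equilibrium (first-variation) condition along each boundary arc:
\kappa_f \;=\; \kappa \;-\; \frac{\partial}{\partial n} \log f
\;=\; \text{constant}.
```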
In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax
In-context learning (ICL) is now a common method for supervising large
language models (LLMs): given labeled examples in the input context, the LLM
learns to perform the task without weight updates. Despite ICL's prevalence and
utility, we understand little about whether models supervised in this manner
represent the underlying structure of their tasks, rather than superficial
heuristics that only generalize to identically distributed examples. In this
study, we investigate the robustness of LLMs supervised via ICL using the test
case of sensitivity to syntax, which is a prerequisite for robust language
understanding. Our experiments are based on two simple and well-controlled
syntactic transformation tasks, where correct out-of-distribution
generalization requires an accurate syntactic analysis of the input. We further
investigate whether out-of-distribution generalization can be improved via
chain-of-thought prompting, where the model is provided with a sequence of
intermediate computation steps that illustrate how the task ought to be
performed. In experiments with models from the GPT, PaLM, and Llama 2 families,
we find large variance across LMs on this fundamental linguistic phenomenon,
and that the variance is explained more by the composition of the pre-training
corpus and supervision methods than by model size. In particular, we find
evidence that models pre-trained on code generalize better, and benefit to a
greater extent from chain-of-thought prompting.
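The setup described above can be sketched as a prompt. The task and examples here are hypothetical stand-ins (declarative-to-question formation, a classic syntactic transformation); the paper's actual tasks and prompt formats may differ.

```python
# Minimal sketch of an in-context-learning (ICL) prompt for a syntactic
# transformation task: declarative sentence -> yes/no question.
# Robust generalization requires moving the MAIN-clause auxiliary,
# not simply the first auxiliary that appears in the sentence.
demonstrations = [
    ("the dog is barking", "is the dog barking?"),
    ("the cats are sleeping", "are the cats sleeping?"),
]

# Out-of-distribution probe: a relative clause places a distractor
# auxiliary ("is") before the main-clause auxiliary ("can").
query = "the dog that is barking can run"

prompt = "\n".join(f"Input: {s}\nOutput: {q}" for s, q in demonstrations)
prompt += f"\nInput: {query}\nOutput:"
print(prompt)
```

A model relying on the superficial "move the first auxiliary" heuristic would answer "is the dog that barking can run?", while a syntax-sensitive model would front "can".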
Microwave Properties of Ice-Phase Hydrometeors for Radar and Radiometers: Sensitivity to Model Assumptions
A simplified framework is presented for assessing the qualitative sensitivities of computed microwave properties, satellite brightness temperatures, and radar reflectivities to assumptions concerning the physical properties of ice-phase hydrometeors. Properties considered included the shape parameter of a gamma size distribution, the melted-equivalent mass median diameter D0, the particle density, the dielectric mixing formula, and the choice of complex index of refraction for ice. We examine these properties at selected radiometer frequencies of 18.7, 36.5, 89.0, and 150.0 GHz, and radar frequencies of 2.8, 13.4, 35.6, and 94.0 GHz, consistent with existing and planned remote sensing instruments. Passive and active microwave observables of ice particles are found to be extremely sensitive to the melted-equivalent mass median diameter D0 of the size distribution. Similarly large sensitivities are found for variations in the ice volume fraction whenever the geometric mass median diameter exceeds approximately 1/8th of the wavelength. At 94 GHz the two-way path-integrated attenuation is potentially large for dense compact particles. The distribution shape parameter mu has a relatively weak effect on any observable: less than 1-2 K in brightness temperature and up to 2.7 dB difference in the effective radar reflectivity. Reversal of the roles of ice and air in the Maxwell Garnett dielectric mixing formula leads to a significant change in both microwave brightness temperature (10 K) and radar reflectivity (2 dB). The choice of Warren (1984) or Warren and Brandt (2008) for the complex index of refraction of ice can produce a 3%-4% change in the brightness temperature depression.
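The ice/air role reversal mentioned above comes from the asymmetry of the Maxwell Garnett mixing rule, which treats one constituent as the host matrix and the other as spherical inclusions. A minimal sketch with illustrative permittivity values (the numbers are not taken from the paper):

```python
# Maxwell Garnett effective permittivity for a two-component mixture:
# spherical inclusions of permittivity eps_incl, volume fraction f,
# embedded in a host matrix of permittivity eps_matrix.
def maxwell_garnett(eps_incl, eps_matrix, f):
    num = eps_incl + 2 * eps_matrix + 2 * f * (eps_incl - eps_matrix)
    den = eps_incl + 2 * eps_matrix - f * (eps_incl - eps_matrix)
    return eps_matrix * num / den

eps_ice = 3.15 + 0.002j  # illustrative value for ice at microwave frequencies
eps_air = 1.0 + 0.0j
f_ice = 0.2              # ice volume fraction of the particle

# Ice inclusions in an air matrix vs. air inclusions in an ice matrix
# describe the same bulk composition but give different results:
eps_ice_in_air = maxwell_garnett(eps_ice, eps_air, f_ice)
eps_air_in_ice = maxwell_garnett(eps_air, eps_ice, 1.0 - f_ice)
print(eps_ice_in_air, eps_air_in_ice)
```

The formula is not symmetric under exchanging matrix and inclusion, which is the origin of the brightness-temperature and reflectivity differences the abstract reports.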
The Impact of Depth and Width on Transformer Language Model Generalization
To process novel sentences, language models (LMs) must generalize
compositionally -- combine familiar elements in new ways. What aspects of a
model's structure promote compositional generalization? Focusing on
transformers, we test the hypothesis, motivated by recent theoretical and
empirical work, that transformers generalize more compositionally when they are
deeper (have more layers). Because simply adding layers increases the total
number of parameters, confounding depth and size, we construct three classes of
models which trade off depth for width such that the total number of parameters
is kept constant (41M, 134M and 374M parameters). We pretrain all models as LMs
and fine-tune them on tasks that test for compositional generalization. We
report three main conclusions: (1) after fine-tuning, deeper models generalize
better out-of-distribution than shallower models do, but the relative benefit
of additional layers diminishes rapidly; (2) within each family, deeper models
show better language modeling performance, but returns are similarly
diminishing; (3) the benefits of depth for compositional generalization cannot
be attributed solely to better performance on language modeling or on
in-distribution data.
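The depth-for-width trade-off described above can be sketched with a back-of-the-envelope parameter count, assuming roughly 12 * d_model^2 parameters per transformer layer (4*d^2 for the attention projections plus 8*d^2 for a feed-forward block with 4x expansion) and ignoring embeddings; the paper's exact architectures may differ.

```python
import math

def params_per_layer(d_model):
    # 4*d^2 for Q, K, V, and output projections + 8*d^2 for a 4x FFN
    return 12 * d_model ** 2

def width_for_budget(total_params, n_layers):
    # Choose d_model so that n_layers * 12 * d_model^2 ~= total_params
    return round(math.sqrt(total_params / (12 * n_layers)))

budget = 41_000_000  # one of the paper's three parameter classes
for n_layers in (6, 12, 24):
    d = width_for_budget(budget, n_layers)
    total = n_layers * params_per_layer(d)
    print(f"{n_layers:>2} layers, d_model={d:>4}, ~{total:,} params")
```

Doubling depth while shrinking width by a factor of sqrt(2) keeps the total roughly constant, which is how depth can be varied without confounding it with model size.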
Multisensor Observation and Simulation of Snowfall During the 2003 Wakasa Bay Field Experiment
This research seeks to assess and improve the accuracy of microphysical assumptions used in satellite passive microwave radiative transfer models and retrieval algorithms by exploiting complementary observations from satellite radiometers, such as TRMM/AMSR-E/GPM, and coincident aircraft instruments, such as the next-generation precipitation radar (PR-2). We focus in particular on aircraft data obtained during the Wakasa Bay field experiment in Japan in 2003, pertaining to surface snowfall events. The observations of vertical profiles of reflectivity and Doppler-derived fall speeds are used in conjunction with the radiometric measurements to identify 1-D profiles of precipitation particle types, sizes, and concentrations that are consistent with the observations.
Debate Helps Supervise Unreliable Experts
As AI systems are used to answer more difficult questions and potentially
help create new knowledge, judging the truthfulness of their outputs becomes
more difficult and more important. How can we supervise unreliable experts,
which have access to the truth but may not accurately report it, to give
answers that are systematically true and don't just superficially seem true,
when the supervisor can't tell the difference between the two on their own? In
this work, we show that debate between two unreliable experts can help a
non-expert judge more reliably identify the truth. We collect a dataset of
human-written debates on hard reading comprehension questions where the judge
has not read the source passage, only ever seeing expert arguments and short
quotes selectively revealed by 'expert' debaters who have access to the
passage. In our debates, one expert argues for the correct answer, and the
other for an incorrect answer. Comparing debate to a baseline we call
consultancy, where a single expert argues for only one answer which is correct
half of the time, we find that debate performs significantly better, with 84%
judge accuracy compared to consultancy's 74%. Debates are also more efficient,
being 68% of the length of consultancies. By comparing human to AI debaters, we
find evidence that with more skilled (in this case, human) debaters, the
performance of debate goes up but the performance of consultancy goes down. Our
error analysis also supports this trend, with 46% of errors in human debate
attributable to mistakes by the honest debater (which should go away with
increased skill); whereas 52% of errors in human consultancy are due to
debaters obfuscating the relevant evidence from the judge (which should become
worse with increased skill). Overall, these results show that debate is a
promising approach for supervising increasingly capable but potentially
unreliable AI systems.
Comment: 84 pages, 13 footnotes, 5 figures, 4 tables, 28 debate transcripts;
data and code at
https://github.com/julianmichael/debate/tree/2023-nyu-experiment