Are Emergent Abilities of Large Language Models a Mirage?
Recent work claims that large language models display emergent abilities:
abilities absent in smaller-scale models but present in larger-scale
models. What makes emergent abilities intriguing is two-fold: their sharpness,
transitioning seemingly instantaneously from not present to present, and their
unpredictability, appearing at seemingly unforeseeable model scales. Here, we
present an alternative explanation for emergent abilities: that for a
particular task and model family, when analyzing fixed model outputs, one can
choose a metric which leads to the inference of an emergent ability or another
metric which does not. Thus, our alternative suggests that existing claims of
emergent abilities are creations of the researcher's analyses, not fundamental
changes in model behavior on specific tasks with scale. We present our
explanation in a simple mathematical model, then test it in three complementary
ways: we (1) make, test and confirm three predictions on the effect of metric
choice using the InstructGPT/GPT-3 family on tasks with claimed emergent
abilities, (2) make, test and confirm two predictions about metric choices in a
meta-analysis of emergent abilities on BIG-Bench, and (3) show how similar
metric decisions suggest apparent emergent abilities on vision tasks in diverse
deep network architectures (convolutional, autoencoder, transformers). In all
three analyses, we find strong supporting evidence that emergent abilities may
not be a fundamental property of scaling AI models.
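The sharpness argument can be illustrated with a minimal numerical sketch, assuming (as a simplification) an independent per-token success probability p and a length-L target sequence; the function names and specific numbers here are illustrative, not taken from the paper:

```python
def per_token_accuracy(p_token: float) -> float:
    # A "linear" metric that credits partial success: it improves
    # smoothly as the model's per-token accuracy improves.
    return p_token

def exact_match_accuracy(p_token: float, seq_len: int) -> float:
    # A nonlinear all-or-nothing metric: under independence, the chance
    # of getting every one of seq_len tokens right is p^L, which stays
    # near zero until p is very high, then rises sharply.
    return p_token ** seq_len

# Smooth per-token improvement can look like a sudden "emergent" jump
# when scored with exact match over a long sequence (here L = 30).
for p in [0.80, 0.85, 0.90, 0.95, 0.99]:
    print(f"p={p:.2f}  per-token={per_token_accuracy(p):.2f}  "
          f"exact-match(L=30)={exact_match_accuracy(p, 30):.4f}")
```

With L = 30, exact-match accuracy is essentially zero at p = 0.80 but around 0.74 at p = 0.99, even though the underlying per-token metric changed smoothly: the apparent discontinuity comes from the metric, not the model.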
Latent Multimodal Functional Graphical Model Estimation
Joint multimodal functional data acquisition, where functional data from
multiple modes are measured simultaneously from the same subject, has emerged
as an exciting modern approach enabled by recent engineering breakthroughs in
the neurological and biological sciences. One prominent motivation to acquire
such data is to enable new discoveries of the underlying connectivity by
combining multimodal signals. Despite the scientific interest, there remains a
gap in principled statistical methods for estimating the graph underlying
multimodal functional data. To this end, we propose a new integrative framework
that models the data generation process and identifies operators mapping from
the observation space to the latent space. We then develop an estimator that
simultaneously estimates the transformation operators and the latent graph.
This estimator is based on the partial correlation operator, which we
rigorously extend from the multivariate to the functional setting. Our
procedure is provably efficient, with the estimator converging to a stationary
point with quantifiable statistical error. Furthermore, we show recovery of the
latent graph under mild conditions. Our work is applied to analyze
simultaneously acquired multimodal brain imaging data where the graph indicates
functional connectivity of the brain. We present simulation and empirical
results that support the benefits of joint estimation.
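The multivariate version of the partial correlation quantity that the paper extends to the functional setting can be sketched as follows; this is the standard Gaussian graphical model identity (edges correspond to nonzero entries of the precision matrix), not the paper's functional estimator:

```python
import numpy as np

def partial_correlations(cov: np.ndarray) -> np.ndarray:
    # For a multivariate Gaussian, the partial correlation between
    # variables i and j given all the others is
    #   -Theta_ij / sqrt(Theta_ii * Theta_jj),
    # where Theta = cov^{-1} is the precision matrix. A zero partial
    # correlation means no edge between i and j in the graph.
    theta = np.linalg.inv(cov)
    d = np.sqrt(np.diag(theta))
    pcorr = -theta / np.outer(d, d)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr

# Chain graph X1 - X2 - X3: X1 and X3 are marginally correlated (0.25)
# but conditionally independent given X2, so their partial correlation is 0.
cov = np.array([[1.0, 0.5, 0.25],
                [0.5, 1.0, 0.5],
                [0.25, 0.5, 1.0]])
print(np.round(partial_correlations(cov), 3))
```

The functional extension in the paper replaces these matrix entries with operators between function spaces, but the graph-recovery principle is the same: conditional independence shows up as a vanishing partial correlation.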
Pairwise Ranking Losses of Click-Through Rates Prediction for Welfare Maximization in Ad Auctions
We study the design of loss functions for click-through rates (CTR) to
optimize (social) welfare in advertising auctions. Existing works either only
focus on CTR predictions without consideration of business objectives (e.g.,
welfare) in auctions or assume that the distribution over the participants'
expected cost-per-impression (eCPM) is known a priori, then use various
additional assumptions on the parametric form of the distribution to derive
loss functions for predicting CTRs. In this work, we bring back the welfare
objectives of ad auctions into CTR predictions and propose a novel weighted
rankloss to train the CTR model. Compared to existing literature, our approach
provides a provable guarantee on welfare but without assumptions on the eCPMs'
distribution while also avoiding the intractability of naively applying
existing learning-to-rank methods. Further, we propose a theoretically
justifiable technique for calibrating the losses using labels generated from a
teacher network, only assuming that the teacher network has bounded
generalization error. Finally, we demonstrate the advantages of the proposed
loss on synthetic and real-world data.

Comment: 25 pages, 6 figures
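The general shape of a weighted pairwise rank loss can be sketched as below; this is a generic weighted pairwise logistic loss for illustration only, with hypothetical names, and is not the specific rankloss or eCPM weighting derived in the paper:

```python
import numpy as np

def weighted_pairwise_rank_loss(scores, clicks, weights):
    # For every (clicked, unclicked) pair, penalize the clicked item
    # being scored below the unclicked one via softplus(-margin),
    # weighting each pair by a welfare-motivated term (e.g. eCPM)
    # attached to the clicked item.
    pos = scores[clicks == 1]          # scores of clicked impressions
    neg = scores[clicks == 0]          # scores of unclicked impressions
    w = weights[clicks == 1]           # per-positive pair weights
    margins = pos[:, None] - neg[None, :]
    return float(np.mean(w[:, None] * np.log1p(np.exp(-margins))))

scores = np.array([2.0, 0.5, 1.0, -0.3])
clicks = np.array([1, 0, 1, 0])
weights = np.array([3.0, 1.0, 2.0, 1.0])
print(weighted_pairwise_rank_loss(scores, clicks, weights))
```

The loss shrinks as clicked items are ranked above unclicked ones, and the weights let high-welfare pairs dominate training, which is the general mechanism by which ranking-style losses can fold an auction objective into CTR model training.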