Few-Shot Recalibration of Language Models
Recent work has uncovered promising ways to extract well-calibrated
confidence estimates from language models (LMs), where the model's confidence
score reflects how likely it is to be correct. However, while LMs may appear
well-calibrated over broad distributions, this often hides significant
miscalibration within narrower slices (e.g., systemic over-confidence in math
can balance out systemic under-confidence in history, yielding perfect
calibration in aggregate). To attain well-calibrated confidence estimates for
any slice of a distribution, we propose a new framework for few-shot
slice-specific recalibration. Specifically, we train a recalibration model that
takes in a few unlabeled examples from any given slice and predicts a curve
that remaps confidence scores to be more accurate for that slice. Our trained
model can recalibrate for arbitrary new slices, without using any labeled data
from that slice. This enables us to identify domain-specific confidence
thresholds above which the LM's predictions can be trusted, and below which it
should abstain. Experiments show that our few-shot recalibrator consistently
outperforms existing calibration methods, for instance improving calibration
error for PaLM2-Large on MMLU by 16%, as compared to temperature scaling.
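To make the interface concrete, here is a minimal Python sketch of the idea described in the abstract, not the authors' implementation: the trained recalibrator's output is stood in for by a hand-written placeholder curve (`bin_edges`, `remapped`), which is then applied to raw confidence scores and used to derive a slice-specific abstention threshold.

```python
# Minimal sketch (not the paper's code): a trained few-shot recalibrator would
# map a handful of unlabeled slice examples to a remapping curve; here that
# curve is a hand-written placeholder for illustration.
import numpy as np

def apply_curve(raw_conf, bin_edges, remapped):
    """Remap a raw confidence score through the predicted curve."""
    return float(np.interp(raw_conf, bin_edges, remapped))

# Hypothetical recalibrator output for a "math" slice on which the base LM is
# systematically over-confident: the curve pulls scores down by 0.15.
bin_edges = np.linspace(0.0, 1.0, 11)
remapped = np.clip(bin_edges - 0.15, 0.0, 1.0)

raw = 0.92                                        # base LM's raw confidence
adjusted = apply_curve(raw, bin_edges, remapped)  # ~0.77 after remapping

# Slice-specific abstention: trust predictions only above the smallest raw
# confidence whose remapped value clears a target of 0.75.
target = 0.75
threshold = bin_edges[np.searchsorted(remapped, target)]
print(f"adjusted={adjusted:.2f}, abstain below raw confidence {threshold:.2f}")
```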
The privileging of English language use in academia: critical reflections from an international doctoral seminar
In this article, we, a Canadian team of doctoral researchers, reflected on our journey during an International Doctoral Research Seminar held in Beijing in 2015. As five doctoral students and two academics, we met with our doctoral colleagues from academic institutions in Brisbane (Australia) and Beijing (China). Although we did not discuss or negotiate which language we would be using in China, we were confronted with our assumption that English would be used, and that some of the participants had a lower level of English competency than expected. It was apparent that this assumption of English language use privileged some (i.e., the Canadian and Australian teams) while disadvantaging others (i.e., the Chinese team). This confrontation brought up questions and concerns about equity in participation. As a result, this article chronicles the Canadian team reflecting on the International Doctoral Research Seminar, including our privilege of using English, and our coming to the position of wanting to create a more inclusive space for all participants to engage equitably in this international collaboration. As such, our reflections in this article focus on the domination of English as a lingua franca in academic spaces, in addition to how we decided to facilitate a transcultural space in which all participants could be included.
Critical Reflections in International Contexts: PolyEthnographic Accounts of an International Doctoral Research Seminar
As the world becomes more globally interconnected, international partnerships, including those within higher education, have increased. In an exemplar of these international partnerships from an academic standpoint, selected doctoral students and faculty from Australian, Chinese, and Canadian universities participated in an International Doctoral Research Seminar held in China in December 2015. The objective of this seminar was to foster academic debate regarding educational reform. A critical by-product of this seminar was the meaning made by the participants from this experience. This paper reviews the critical polyethnographic reflections of the Canadian participants on three salient and influential topics: the role of culture, power dynamics, and organizational systems, all in relation to this international academic partnership experience. These reflections have ramifications for future programs, specifically for enhancing the international development of doctoral students under the broader umbrella of international academic partnerships.
Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP
Retrieval-augmented in-context learning has emerged as a powerful approach
for addressing knowledge-intensive tasks using frozen language models (LM) and
retrieval models (RM). Existing work has combined these in simple
"retrieve-then-read" pipelines in which the RM retrieves passages that are
inserted into the LM prompt. To begin to fully realize the potential of frozen
LMs and RMs, we propose Demonstrate-Search-Predict (DSP), a framework that
relies on passing natural language texts in sophisticated pipelines between an
LM and an RM. DSP can express high-level programs that bootstrap pipeline-aware
demonstrations, search for relevant passages, and generate grounded
predictions, systematically breaking down problems into small transformations
that the LM and RM can handle more reliably. We have written novel DSP programs
for answering questions in open-domain, multi-hop, and conversational settings,
establishing in early evaluations new state-of-the-art in-context learning
results and delivering 37-200%, 8-40%, and 80-290% relative gains against
vanilla LMs, a standard retrieve-then-read pipeline, and a contemporaneous
self-ask pipeline, respectively.
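As an illustration of what such a program looks like, here is a short Python sketch of a DSP-style pipeline. The `rm_search` and `lm_generate` stubs are placeholders for a frozen retriever and language model, not the framework's actual API, and the three stages are deliberately simplified.

```python
# Illustrative DSP-style pipeline (placeholder stubs, not the framework's API):
# natural-language text is passed between a frozen retriever and a frozen LM
# through three stages: demonstrate, search, predict.
from typing import List

def rm_search(query: str, k: int = 3) -> List[str]:
    """Stand-in for a frozen retrieval model returning top-k passages."""
    return [f"<passage {i} for: {query}>" for i in range(k)]

def lm_generate(prompt: str) -> str:
    """Stand-in for a frozen language model completing a prompt."""
    return "<LM completion>"

def demonstrate(train_questions: List[str]) -> str:
    """Bootstrap pipeline-aware demonstrations by running the same
    search-then-predict steps over a few training questions."""
    demos = []
    for q in train_questions:
        context = rm_search(q)
        answer = lm_generate(f"Context: {context}\nQ: {q}\nA:")
        demos.append(f"Context: {context}\nQ: {q}\nA: {answer}")
    return "\n\n".join(demos)

def search(question: str) -> List[str]:
    """Multi-hop search: let the LM write a follow-up query from the
    first round of passages, then retrieve again."""
    first_hop = rm_search(question)
    followup = lm_generate(f"Context: {first_hop}\nFollow-up query for: {question}")
    return first_hop + rm_search(followup)

def predict(question: str, demos: str) -> str:
    """Generate a grounded answer conditioned on demos and retrieved passages."""
    prompt = f"{demos}\n\nContext: {search(question)}\nQ: {question}\nA:"
    return lm_generate(prompt)

# Usage: compose the three stages into an open-domain QA program.
demos = demonstrate(["Who wrote Middlemarch?"])
print(predict("Which novel by the author of Middlemarch came last?", demos))
```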
Contrastive Decoding: Open-ended Text Generation as Optimization
Likelihood, although useful as a training loss, is a poor search objective
for guiding open-ended generation from language models (LMs). Existing
generation algorithms must avoid both unlikely strings, which are incoherent,
and highly likely ones, which are short and repetitive. We propose contrastive
decoding (CD), a more reliable search objective that returns the difference
between likelihood under a large LM (called the expert, e.g. OPT-13b) and a
small LM (called the amateur, e.g. OPT-125m). CD is inspired by the fact that
the failures of larger LMs are even more prevalent in smaller LMs, and that
this difference signals exactly which texts should be preferred. CD requires
zero training, and produces higher quality text than decoding from the larger
LM alone. It also generalizes across model types (OPT and GPT2) and
significantly outperforms four strong decoding algorithms in automatic and
human evaluations.
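The scoring rule itself is simple enough to sketch. Below is a minimal greedy contrastive-decoding loop using Hugging Face transformers; the OPT checkpoints named here are illustrative (a smaller expert than the paper's OPT-13b, and any large/small pair sharing a tokenizer would do), and the alpha cutoff approximates the paper's plausibility constraint.

```python
# Minimal greedy contrastive decoding sketch: score each candidate token by
# log p_expert - log p_amateur, restricted to tokens the expert finds plausible.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

expert_name, amateur_name = "facebook/opt-1.3b", "facebook/opt-125m"
tok = AutoTokenizer.from_pretrained(expert_name)          # shared OPT tokenizer
expert = AutoModelForCausalLM.from_pretrained(expert_name).eval()
amateur = AutoModelForCausalLM.from_pretrained(amateur_name).eval()

@torch.no_grad()
def cd_generate(prompt: str, max_new_tokens: int = 40, alpha: float = 0.1) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        log_e = expert(ids).logits[0, -1].log_softmax(-1)   # expert next-token log-probs
        log_a = amateur(ids).logits[0, -1].log_softmax(-1)  # amateur next-token log-probs
        # Plausibility constraint: drop tokens whose expert probability is
        # below alpha times the most likely token's probability.
        implausible = log_e < log_e.max() + math.log(alpha)
        score = log_e - log_a                               # contrastive objective
        score[implausible] = float("-inf")
        ids = torch.cat([ids, score.argmax().view(1, 1)], dim=-1)
    return tok.decode(ids[0], skip_special_tokens=True)

print(cd_generate("Barack Obama was born in"))
```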