    Few-Shot Recalibration of Language Models

    Recent work has uncovered promising ways to extract well-calibrated confidence estimates from language models (LMs), where the model's confidence score reflects how likely it is to be correct. However, while LMs may appear well-calibrated over broad distributions, this often hides significant miscalibration within narrower slices (e.g., systematic over-confidence in math can balance out systematic under-confidence in history, yielding perfect calibration in aggregate). To attain well-calibrated confidence estimates for any slice of a distribution, we propose a new framework for few-shot slice-specific recalibration. Specifically, we train a recalibration model that takes in a few unlabeled examples from any given slice and predicts a curve that remaps confidence scores to be more accurate for that slice. Our trained model can recalibrate for arbitrary new slices without using any labeled data from those slices. This enables us to identify domain-specific confidence thresholds above which the LM's predictions can be trusted, and below which it should abstain. Experiments show that our few-shot recalibrator consistently outperforms existing calibration methods, for instance improving calibration error for PaLM2-Large on MMLU by 16% compared to temperature scaling.
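    The mechanism is straightforward to picture: the recalibrator reads a few unlabeled examples from a slice and outputs a curve that remaps raw confidence to slice-calibrated confidence, from which a trust/abstain threshold follows. Below is a minimal sketch of that apply-and-threshold step, assuming the predicted curve is represented as monotone piecewise-linear knots; the knot values and the `apply_curve` helper are illustrative, not the paper's implementation.

    ```python
    import numpy as np

    # Sketch of applying a predicted recalibration curve. A trained
    # recalibrator would predict the knots from a few unlabeled slice
    # examples; the values below are made up to illustrate correcting
    # over-confidence on a math slice.

    def apply_curve(raw_conf, knots_x, knots_y):
        """Remap raw confidence through a monotone piecewise-linear curve."""
        return float(np.interp(raw_conf, knots_x, knots_y))

    knots_x = [0.0, 0.5, 0.8, 1.0]  # raw LM confidence
    knots_y = [0.0, 0.3, 0.6, 1.0]  # slice-calibrated confidence

    calibrated = apply_curve(0.8, knots_x, knots_y)  # 0.8 -> 0.6
    threshold = 0.5  # slice-specific trust threshold
    print("answer" if calibrated >= threshold else "abstain")
    ```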

    The privileging of English language use in academia: critical reflections from an international doctoral seminar

    In this article, we, a Canadian team of doctoral researchers, reflected on our journey during an International Doctoral Research Seminar held in Beijing in 2015. As five doctoral students and two academics, we met with our doctoral colleagues from academic institutions in Brisbane (Australia) and Beijing (China). Although we did not discuss or negotiate which language we would use in China, we were confronted with our assumption that English would be used, and with the fact that some of the participants had a lower level of English competency than expected. It was apparent that this assumption of English language use privileged some (i.e., the Canadian and Australian teams) while disadvantaging others (i.e., the Chinese team). This confrontation raised questions and concerns about equity in participation. As a result, this article chronicles the Canadian team's reflections on the International Doctoral Research Seminar, including our privilege in using English and our coming to the position of wanting to create a more inclusive space for all participants to engage equitably in this international collaboration. As such, our reflections in this article focused on the domination of English as a lingua franca in academic spaces, in addition to how we decided to facilitate a transcultural space in which all participants could be included.

    Critical Reflections in International Contexts: PolyEthnographic Accounts of an International Doctoral Research Seminar

    As the world becomes more globally interconnected, international partnerships, including those within higher education, have increased. In one exemplar of such partnerships, selected doctoral students and faculty from Australian, Chinese, and Canadian universities participated in an International Doctoral Research Seminar held in China in December 2015. The objective of the seminar was academic debate on educational reform. A critical by-product of the seminar was the meaning its participants made of the experience. This paper reviews the critical polyethnographic reflections of the Canadian participants on three salient and influential topics: the role of culture, power dynamics, and organizational systems, all in relation to this international academic partnership experience. These reflections have ramifications for future programs, specifically for enhancing the international development of doctoral students under the broader umbrella of international academic partnerships.

    Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP

    Retrieval-augmented in-context learning has emerged as a powerful approach for addressing knowledge-intensive tasks using frozen language models (LMs) and retrieval models (RMs). Existing work has combined these in simple "retrieve-then-read" pipelines in which the RM retrieves passages that are inserted into the LM prompt. To begin to fully realize the potential of frozen LMs and RMs, we propose Demonstrate-Search-Predict (DSP), a framework that relies on passing natural language texts in sophisticated pipelines between an LM and an RM. DSP can express high-level programs that bootstrap pipeline-aware demonstrations, search for relevant passages, and generate grounded predictions, systematically breaking down problems into small transformations that the LM and RM can handle more reliably. We have written novel DSP programs for answering questions in open-domain, multi-hop, and conversational settings, establishing new state-of-the-art in-context learning results in early evaluations and delivering 37-200%, 8-40%, and 80-290% relative gains against vanilla LMs, a standard retrieve-then-read pipeline, and a contemporaneous self-ask pipeline, respectively.
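    As a rough illustration of how a DSP-style program differs from retrieve-then-read, the sketch below passes text back and forth between an LM and an RM over multiple hops. The `lm` and `rm` callables are stand-ins (not the released DSP API), and the Demonstrate stage, which bootstraps pipeline-aware demonstrations, is omitted for brevity.

    ```python
    # Assumed placeholder interfaces: lm(prompt) -> str generates text,
    # rm(query, k) -> list[str] returns k retrieved passages.

    def retrieve_then_read(question, lm, rm):
        """Baseline: retrieve once, insert passages into the prompt, answer."""
        passages = rm(question, k=3)
        return lm("\n".join(passages) + f"\nQuestion: {question}\nAnswer:")

    def dsp_style_multihop(question, lm, rm, hops=2):
        """DSP-style Search + Predict: the LM writes each hop's search query
        from the passages gathered so far, then grounds its final answer."""
        context, query = [], question
        for _ in range(hops):
            context += rm(query, k=2)
            query = lm("\n".join(context) +
                       f"\nQuestion: {question}\nNext search query:")
        return lm("\n".join(context) + f"\nQuestion: {question}\nAnswer:")
    ```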

    Contrastive Decoding: Open-ended Text Generation as Optimization

    Likelihood, although useful as a training loss, is a poor search objective for guiding open-ended generation from language models (LMs). Existing generation algorithms must avoid both unlikely strings, which are incoherent, and highly likely ones, which are short and repetitive. We propose contrastive decoding (CD), a more reliable search objective that returns the difference between likelihood under a large LM (called the expert, e.g., OPT-13b) and a small LM (called the amateur, e.g., OPT-125m). CD is inspired by the fact that the failures of larger LMs are even more prevalent in smaller LMs, and that this difference signals exactly which texts should be preferred. CD requires zero training, and produces higher quality text than decoding from the larger LM alone. It also generalizes across model types (OPT and GPT2) and significantly outperforms four strong decoding algorithms in automatic and human evaluations.
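    The scoring rule lends itself to a compact sketch: at each step, compute the expert's and amateur's next-token log-probabilities and pick the token that maximizes their difference, restricted to tokens the expert itself finds plausible. The snippet below uses a small GPT-2 pair for runnability and a simplified plausibility cutoff (alpha = 0.1); it is a greedy toy, not the authors' implementation.

    ```python
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Toy greedy contrastive decoding: expert = gpt2-large, amateur = gpt2
    # (the paper uses larger pairs such as OPT-13b / OPT-125m).
    tok = AutoTokenizer.from_pretrained("gpt2")
    expert = AutoModelForCausalLM.from_pretrained("gpt2-large").eval()
    amateur = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    ids = tok("The history of jazz begins", return_tensors="pt").input_ids
    alpha = 0.1  # plausibility cutoff relative to the expert's best token
    with torch.no_grad():
        for _ in range(30):
            e = torch.log_softmax(expert(ids).logits[0, -1], dim=-1)
            a = torch.log_softmax(amateur(ids).logits[0, -1], dim=-1)
            # Drop tokens the expert deems implausible, then pick the token
            # maximizing the expert-amateur log-probability gap.
            implausible = e < e.max() + torch.log(torch.tensor(alpha))
            score = (e - a).masked_fill(implausible, float("-inf"))
            ids = torch.cat([ids, score.argmax().view(1, 1)], dim=1)

    print(tok.decode(ids[0]))
    ```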