
    Teaching Large Language Models to Self-Debug

    Large language models (LLMs) have achieved impressive performance on code generation. However, for complex programming tasks, generating the correct solution in one go becomes challenging, thus some prior works have designed program repair approaches to improve code generation performance. In this work, we propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations. In particular, we demonstrate that Self-Debugging can teach the large language model to perform rubber duck debugging; i.e., without any feedback on the code correctness or error messages, the model is able to identify its mistakes by explaining the generated code in natural language. Self-Debugging achieves the state-of-the-art performance on several code generation benchmarks, including the Spider dataset for text-to-SQL generation, TransCoder for C++-to-Python translation, and MBPP for text-to-Python generation. On the Spider benchmark where there are no unit tests to verify the correctness of predictions, Self-Debugging with code explanation consistently improves the baseline by 2-3%, and improves the prediction accuracy on problems of the hardest label by 9%. On TransCoder and MBPP where unit tests are available, Self-Debugging improves the baseline accuracy by up to 12%. Meanwhile, by leveraging feedback messages and reusing failed predictions, Self-Debugging notably improves sample efficiency, and can match or outperform baseline models that generate more than 10x candidate programs.
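    The generate-explain-repair loop described above can be pictured with a short sketch, assuming a hypothetical llm() completion helper and plain Python unit tests; the paper's actual few-shot demonstration prompts are not reproduced here.

```python
# Minimal sketch of a Self-Debugging-style loop (hypothetical llm() helper,
# illustrative prompts). The model generates code, explains it in natural
# language (rubber duck debugging), and revises it when unit tests fail.

def llm(prompt: str) -> str:
    """Placeholder for a large-language-model completion call."""
    raise NotImplementedError

def run_unit_tests(code: str, tests: list[str]) -> tuple[bool, str]:
    """Execute the candidate program against unit tests; return (passed, feedback)."""
    namespace: dict = {}
    try:
        exec(code, namespace)
        for test in tests:
            exec(test, namespace)
        return True, "all tests passed"
    except Exception as exc:  # collect the error message as feedback
        return False, repr(exc)

def self_debug(task: str, tests: list[str], max_turns: int = 3) -> str:
    code = llm(f"Write a Python function for this task:\n{task}")
    for _ in range(max_turns):
        passed, feedback = run_unit_tests(code, tests)
        if passed:
            break
        # Rubber-duck step: ask the model to explain its own code, then revise
        # it using the explanation and the execution feedback.
        explanation = llm(f"Explain this code line by line:\n{code}")
        code = llm(
            f"Task: {task}\nCode:\n{code}\nExplanation:\n{explanation}\n"
            f"Test feedback: {feedback}\nRewrite the code to fix the problem."
        )
    return code
```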

    Large Language Models Can Be Easily Distracted by Irrelevant Context

    Large language models have achieved impressive performance on various natural language processing tasks. However, so far they have been evaluated primarily on benchmarks where all information in the input context is relevant for solving the task. In this work, we investigate the distractibility of large language models, i.e., how the model problem-solving accuracy can be influenced by irrelevant context. In particular, we introduce Grade-School Math with Irrelevant Context (GSM-IC), an arithmetic reasoning dataset with irrelevant information in the problem description. We use this benchmark to measure the distractibility of cutting-edge prompting techniques for large language models, and find that the model performance is dramatically decreased when irrelevant information is included. We also identify several approaches for mitigating this deficiency, such as decoding with self-consistency and adding to the prompt an instruction that tells the language model to ignore the irrelevant information.
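    The two mitigations mentioned above can be sketched as follows, assuming a hypothetical llm_sample() helper; the exact instruction wording and prompt format in the paper may differ.

```python
# Sketch of two mitigations for irrelevant context (hypothetical llm_sample()
# helper): an explicit "ignore irrelevant information" instruction, plus
# self-consistency decoding (sample several reasoning paths, majority-vote).

from collections import Counter
import re

def llm_sample(prompt: str, temperature: float = 0.7) -> str:
    """Placeholder for a sampled LLM completion."""
    raise NotImplementedError

IGNORE_INSTRUCTION = "Feel free to ignore irrelevant information given in the question."

def extract_answer(completion: str) -> str:
    """Naive extraction of the final number from an arithmetic answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return numbers[-1] if numbers else ""

def answer_with_self_consistency(problem: str, n_samples: int = 10) -> str:
    prompt = f"{IGNORE_INSTRUCTION}\n\nQ: {problem}\nA: Let's think step by step."
    answers = [extract_answer(llm_sample(prompt)) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]  # majority vote over samples
```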

    Understanding Haemophilus parasuis infection in porcine spleen through a transcriptomics approach

    Background: Haemophilus parasuis (HPS) is an important swine pathogen that causes Glässer's disease, which is characterized by fibrinous polyserositis, meningitis and arthritis. The molecular mechanisms that underlie the pathogenesis of the disease remain poorly understood, particularly the resistance of the porcine immune system to HPS invasion. In this study, we investigated the global changes in gene expression in the spleen following HPS infection using the Affymetrix Porcine Genechip™. Results: A total of 931 differentially expressed (DE) transcripts were identified in the porcine spleen 7 days after HPS infection; of these, 92 unique genes showed differential expression patterns based on analysis using BLASTX and Gene Ontology. The DE genes involved in the immune response included genes for inflammasomes (RETN, S100A8, S100A9, S100A12), adhesion molecules (CLDN3, CSPG2, CD44, LGALS8), transcription factors (ZBTB16, SLC39A14, CEBPD, CEBPB), acute-phase proteins and complement (SAA1, LTF, HP, C3), differentiation genes for epithelial cells and keratinocytes (TGM1, MS4A8B, CSTA), and genes related to antigen processing and presentation (HLA-B, HLA-DRB1). Further immunostimulation analyses indicated that mRNA levels of S100A8, S100A9, and S100A12 in porcine PK-15 cells increased within 48 h and were sustained after administration of lipopolysaccharide (LPS) and Poly(I:C), respectively. In addition, mapping of DE genes to porcine health trait QTL regions showed that 70 genes were distributed in 7 different known porcine QTL regions. Finally, 10 DE genes were validated by quantitative PCR. Conclusion: Our findings demonstrate previously unrecognized changes in gene transcription that are associated with HPS infection in vivo, and many potential cascades identified in the study clearly merit further investigation. Our data provide new clues to the nature of the immune response in mammals, and we have identified candidate genes that are related to resistance to HPS.

    Compositional Semantic Parsing with Large Language Models

    Humans can reason compositionally when presented with new tasks. Previous research shows that appropriate prompting techniques enable large language models (LLMs) to solve artificial compositional generalization tasks such as SCAN. In this work, we identify additional challenges in more realistic semantic parsing tasks with larger vocabulary and refine these prompting techniques to address them. Our best method is based on least-to-most prompting: it decomposes the problem using prompting-based syntactic parsing, then uses this decomposition to select appropriate exemplars and to sequentially generate the semantic parse. This method allows us to set a new state of the art for CFQ while requiring only 1% of the training data used by traditional approaches. Due to the general nature of our approach, we expect similar efforts will lead to new results in other tasks and domains, especially for knowledge-intensive applications.
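    The decompose-then-parse idea can be sketched as below, assuming a hypothetical llm() helper and a toy exemplar-selection heuristic; the actual CFQ prompts and syntactic decomposition are considerably more involved.

```python
# Sketch of least-to-most prompting for semantic parsing (hypothetical llm()
# helper, toy exemplar selection): decompose the question, pick relevant
# exemplars, then build the parse sequentially from the subproblems.

def llm(prompt: str) -> str:
    """Placeholder for an LLM completion call."""
    raise NotImplementedError

def select_exemplars(subproblems: list[str], pool: list[tuple[str, str]], k: int = 4):
    """Pick exemplars whose questions share the most words with the subproblems."""
    words = set(" ".join(subproblems).lower().split())
    ranked = sorted(pool, key=lambda ex: -len(words & set(ex[0].lower().split())))
    return ranked[:k]

def least_to_most_parse(question: str, exemplar_pool: list[tuple[str, str]]) -> str:
    # Step 1: prompting-based decomposition into simpler subquestions.
    decomposition = llm(f"Decompose this question into simpler subquestions:\n{question}")
    subproblems = [line.strip() for line in decomposition.splitlines() if line.strip()]
    # Step 2: choose exemplars that are relevant to the decomposition.
    exemplars = select_exemplars(subproblems, exemplar_pool)
    shots = "\n\n".join(f"Q: {q}\nParse: {p}" for q, p in exemplars)
    # Step 3: sequentially extend the semantic parse, one subproblem at a time.
    parse = ""
    for sub in subproblems:
        parse = llm(f"{shots}\n\nQ: {sub}\nPartial parse so far: {parse}\nParse:")
    return parse
```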

    Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models

    We present Step-Back Prompting, a simple prompting technique that enables LLMs to do abstractions to derive high-level concepts and first principles from instances containing specific details. Using the concepts and principles to guide the reasoning steps, LLMs significantly improve their abilities in following a correct reasoning path towards the solution. We conduct experiments of Step-Back Prompting with PaLM-2L models and observe substantial performance gains on a wide range of challenging reasoning-intensive tasks including STEM, Knowledge QA, and Multi-Hop Reasoning. For instance, Step-Back Prompting improves PaLM-2L performance on MMLU Physics and Chemistry by 7% and 11%, TimeQA by 27%, and MuSiQue by 7%.
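    A compact sketch of the two stages follows, assuming a hypothetical llm() helper; the step-back exemplars used in the paper are task-specific and not reproduced here.

```python
# Sketch of Step-Back Prompting (hypothetical llm() helper): first abstract the
# question into a higher-level one, then reason about the original question
# using the retrieved concepts and first principles.

def llm(prompt: str) -> str:
    """Placeholder for an LLM completion call."""
    raise NotImplementedError

def step_back_answer(question: str) -> str:
    # Abstraction step: ask for the underlying concept or first principle.
    step_back_question = llm(
        "What high-level concept or principle is this question about?\n"
        f"Question: {question}\nStep-back question:"
    )
    principles = llm(f"Answer concisely: {step_back_question}")
    # Reasoning step: answer the original question grounded in those principles.
    return llm(
        f"Principles: {principles}\n"
        "Using these principles, answer the original question step by step.\n"
        f"Question: {question}\nAnswer:"
    )
```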

    Universal Self-Consistency for Large Language Model Generation

    Self-consistency with chain-of-thought prompting (CoT) has demonstrated remarkable performance gains on various challenging tasks, by utilizing multiple reasoning paths sampled from large language models (LLMs). However, self-consistency relies on the answer extraction process to aggregate multiple solutions, which is not applicable to free-form answers. In this work, we propose Universal Self-Consistency (USC), which leverages LLMs themselves to select the most consistent answer among multiple candidates. We evaluate USC on a variety of benchmarks, including mathematical reasoning, code generation, long-context summarization, and open-ended question answering. On open-ended generation tasks where the original self-consistency method is not applicable, USC effectively utilizes multiple samples and improves the performance. For mathematical reasoning, USC matches the standard self-consistency performance without requiring the answer formats to be similar. Finally, without access to execution results, USC also matches the execution-based voting performance on code generation.
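    The selection step can be sketched as below, assuming hypothetical llm() and llm_sample() helpers; the concrete USC selection prompt in the paper may be worded differently.

```python
# Sketch of Universal Self-Consistency (hypothetical llm()/llm_sample()
# helpers): instead of extracting and voting on final answers, the LLM itself
# is asked to pick the most consistent response among the sampled candidates.

def llm(prompt: str) -> str:
    """Placeholder for a greedy LLM completion call."""
    raise NotImplementedError

def llm_sample(prompt: str) -> str:
    """Placeholder for a sampled (temperature > 0) LLM completion call."""
    raise NotImplementedError

def universal_self_consistency(task_prompt: str, n_samples: int = 8) -> str:
    responses = [llm_sample(task_prompt) for _ in range(n_samples)]
    numbered = "\n\n".join(f"Response {i + 1}:\n{r}" for i, r in enumerate(responses))
    choice = llm(
        f"{numbered}\n\n"
        "Select the most consistent response based on majority consensus. "
        "Reply with the response number only."
    )
    # Fall back to the first sample if the selection cannot be parsed.
    digits = "".join(ch for ch in choice if ch.isdigit())
    index = int(digits) - 1 if digits else 0
    return responses[index] if 0 <= index < len(responses) else responses[0]
```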

    The Emerging Roles of the RNA Binding Protein QKI in Cardiovascular Development and Function

    RNA binding proteins (RBPs) have a broad biological and physiological function and are critical in regulating pre-mRNA posttranscriptional processing, intracellular migration, and mRNA stability. QKI, also known as Quaking, is a member of the signal transduction and activation of RNA (STAR) family, which also belongs to the heterogeneous nuclear ribonucleoprotein K- (hnRNP K-) homology domain protein family. There are three major alternatively spliced isoforms, QKI-5, QKI-6, and QKI-7, differing in carboxy-terminal domains. They share a common RNA binding property, but each isoform can regulate pre-mRNA splicing, transportation or stability differently in a unique cell type-specific manner. Previously, QKI has been known for its important role in contributing to neurological disorders. A series of recent work has further demonstrated that QKI has important roles in much broader biological systems, such as cardiovascular development, monocyte to macrophage differentiation, bone metabolism, and cancer progression. In this mini-review, we will focus on discussing the emerging roles of QKI in regulating cardiac and vascular development and function and its potential link to cardiovascular pathophysiology.

    Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models

    Sparse Mixture-of-Experts (MoE) is a neural architecture design that can be utilized to add learnable parameters to Large Language Models (LLMs) without increasing inference cost. Instruction tuning is a technique for training LLMs to follow instructions. We advocate combining these two approaches, as we find that MoE models benefit more from instruction tuning than dense models. In particular, we conduct empirical studies across three experimental setups: (i) Direct finetuning on individual downstream tasks devoid of instruction tuning; (ii) Instruction tuning followed by in-context few-shot or zero-shot generalization on downstream tasks; and (iii) Instruction tuning supplemented by further finetuning on individual downstream tasks. In the first scenario, MoE models overall underperform dense models of identical computational capacity. This narrative, however, dramatically changes with the introduction of instruction tuning (second and third scenarios), used independently or in conjunction with task-specific finetuning. Our most powerful model, FLAN-MOE-32B, surpasses the performance of FLAN-PALM-62B on four benchmark tasks, while using only a third of the FLOPs. The advancements embodied by FLAN-MOE inspire a reevaluation of the design principles of large-scale, high-performance language models in the framework of task-agnostic learning.
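    For intuition about the architecture side, a minimal sketch of a sparse MoE feed-forward layer with top-2 routing follows; the sizes and routing details are illustrative and not the FLAN-MOE configuration.

```python
# Minimal numpy sketch of a sparse Mixture-of-Experts feed-forward layer with
# top-2 routing: parameters grow with the number of experts, but each token
# only activates a few of them, keeping per-token compute roughly constant.

import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, n_experts, top_k = 16, 64, 8, 2               # illustrative sizes

W_router = rng.normal(size=(d_model, n_experts)) * 0.02      # routing weights
W_in = rng.normal(size=(n_experts, d_model, d_ff)) * 0.02    # expert FFN, layer 1
W_out = rng.normal(size=(n_experts, d_ff, d_model)) * 0.02   # expert FFN, layer 2

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def moe_layer(tokens: np.ndarray) -> np.ndarray:
    """tokens: (n_tokens, d_model) -> (n_tokens, d_model)."""
    gate_probs = softmax(tokens @ W_router)                  # (n_tokens, n_experts)
    top_experts = np.argsort(-gate_probs, axis=-1)[:, :top_k]
    output = np.zeros_like(tokens)
    for t, token in enumerate(tokens):
        for e in top_experts[t]:                             # only the top-k experts run
            hidden = np.maximum(token @ W_in[e], 0.0)        # expert FFN with ReLU
            output[t] += gate_probs[t, e] * (hidden @ W_out[e])
    return output

print(moe_layer(rng.normal(size=(4, d_model))).shape)        # -> (4, 16)
```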