
    Teaching Large Language Models to Self-Debug

    Large language models (LLMs) have achieved impressive performance on code generation. However, for complex programming tasks, generating the correct solution in one go becomes challenging, thus some prior works have designed program repair approaches to improve code generation performance. In this work, we propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations. In particular, we demonstrate that Self-Debugging can teach the large language model to perform rubber duck debugging; i.e., without any feedback on the code correctness or error messages, the model is able to identify its mistakes by explaining the generated code in natural language. Self-Debugging achieves the state-of-the-art performance on several code generation benchmarks, including the Spider dataset for text-to-SQL generation, TransCoder for C++-to-Python translation, and MBPP for text-to-Python generation. On the Spider benchmark where there are no unit tests to verify the correctness of predictions, Self-Debugging with code explanation consistently improves the baseline by 2-3%, and improves the prediction accuracy on problems of the hardest label by 9%. On TransCoder and MBPP where unit tests are available, Self-Debugging improves the baseline accuracy by up to 12%. Meanwhile, by leveraging feedback messages and reusing failed predictions, Self-Debugging notably improves sample efficiency, and can match or outperform baseline models that generate more than 10x candidate programs.
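    The generate-explain-repair loop described above can be pictured with a short sketch, assuming a hypothetical llm() completion helper and plain Python unit tests; the paper's actual few-shot demonstration prompts are not reproduced here.

```python
# Minimal sketch of a Self-Debugging-style loop (hypothetical llm() helper,
# illustrative prompts). The model generates code, explains it in natural
# language (rubber duck debugging), and revises it when unit tests fail.

def llm(prompt: str) -> str:
    """Placeholder for a large-language-model completion call."""
    raise NotImplementedError

def run_unit_tests(code: str, tests: list[str]) -> tuple[bool, str]:
    """Execute the candidate program against unit tests; return (passed, feedback)."""
    namespace: dict = {}
    try:
        exec(code, namespace)
        for test in tests:
            exec(test, namespace)
        return True, "all tests passed"
    except Exception as exc:  # collect the error message as feedback
        return False, repr(exc)

def self_debug(task: str, tests: list[str], max_turns: int = 3) -> str:
    code = llm(f"Write a Python function for this task:\n{task}")
    for _ in range(max_turns):
        passed, feedback = run_unit_tests(code, tests)
        if passed:
            break
        # Rubber-duck step: ask the model to explain its own code, then revise
        # it using the explanation and the execution feedback.
        explanation = llm(f"Explain this code line by line:\n{code}")
        code = llm(
            f"Task: {task}\nCode:\n{code}\nExplanation:\n{explanation}\n"
            f"Test feedback: {feedback}\nRewrite the code to fix the problem."
        )
    return code
```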

    Large Language Models Can Be Easily Distracted by Irrelevant Context

    Large language models have achieved impressive performance on various natural language processing tasks. However, so far they have been evaluated primarily on benchmarks where all information in the input context is relevant for solving the task. In this work, we investigate the distractibility of large language models, i.e., how the model problem-solving accuracy can be influenced by irrelevant context. In particular, we introduce Grade-School Math with Irrelevant Context (GSM-IC), an arithmetic reasoning dataset with irrelevant information in the problem description. We use this benchmark to measure the distractibility of cutting-edge prompting techniques for large language models, and find that the model performance is dramatically decreased when irrelevant information is included. We also identify several approaches for mitigating this deficiency, such as decoding with self-consistency and adding to the prompt an instruction that tells the language model to ignore the irrelevant information.
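    The two mitigations mentioned above can be sketched as follows, assuming a hypothetical llm_sample() helper; the exact instruction wording and prompt format in the paper may differ.

```python
# Sketch of two mitigations for irrelevant context (hypothetical llm_sample()
# helper): an explicit "ignore irrelevant information" instruction, plus
# self-consistency decoding (sample several reasoning paths, majority-vote).

from collections import Counter
import re

def llm_sample(prompt: str, temperature: float = 0.7) -> str:
    """Placeholder for a sampled LLM completion."""
    raise NotImplementedError

IGNORE_INSTRUCTION = "Feel free to ignore irrelevant information given in the question."

def extract_answer(completion: str) -> str:
    """Naive extraction of the final number from an arithmetic answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return numbers[-1] if numbers else ""

def answer_with_self_consistency(problem: str, n_samples: int = 10) -> str:
    prompt = f"{IGNORE_INSTRUCTION}\n\nQ: {problem}\nA: Let's think step by step."
    answers = [extract_answer(llm_sample(prompt)) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]  # majority vote over samples
```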

    Understanding Haemophilus parasuis infection in porcine spleen through a transcriptomics approach

    Background: Haemophilus parasuis (HPS) is an important swine pathogen that causes Glässer's disease, which is characterized by fibrinous polyserositis, meningitis and arthritis. The molecular mechanisms that underlie the pathogenesis of the disease remain poorly understood, particularly the resistance of the porcine immune system to HPS invasion. In this study, we investigated the global changes in gene expression in the spleen following HPS infection using the Affymetrix Porcine Genechip™. Results: A total of 931 differentially expressed (DE) transcripts were identified in the porcine spleen 7 days after HPS infection; of these, 92 unique genes showed differential expression patterns based on analysis using BLASTX and Gene Ontology. The DE genes involved in the immune response included genes for inflammasomes (RETN, S100A8, S100A9, S100A12), adhesion molecules (CLDN3, CSPG2, CD44, LGALS8), transcription factors (ZBTB16, SLC39A14, CEBPD, CEBPB), acute-phase proteins and complement (SAA1, LTF, HP, C3), differentiation genes for epithelial cells and keratinocytes (TGM1, MS4A8B, CSTA), and genes related to antigen processing and presentation (HLA-B, HLA-DRB1). Further immunostimulation analyses indicated that mRNA levels of S100A8, S100A9, and S100A12 in porcine PK-15 cells increased within 48 h and were sustained after administration of lipopolysaccharide (LPS) and Poly(I:C), respectively. In addition, mapping of DE genes to porcine health trait QTL regions showed that 70 genes were distributed in 7 different known porcine QTL regions. Finally, 10 DE genes were validated by quantitative PCR. Conclusion: Our findings demonstrate previously unrecognized changes in gene transcription that are associated with HPS infection in vivo, and many potential cascades identified in the study clearly merit further investigation. Our data provide new clues to the nature of the immune response in mammals, and we have identified candidate genes that are related to resistance to HPS.

    Compositional Semantic Parsing with Large Language Models

    Humans can reason compositionally when presented with new tasks. Previous research shows that appropriate prompting techniques enable large language models (LLMs) to solve artificial compositional generalization tasks such as SCAN. In this work, we identify additional challenges in more realistic semantic parsing tasks with larger vocabulary and refine these prompting techniques to address them. Our best method is based on least-to-most prompting: it decomposes the problem using prompting-based syntactic parsing, then uses this decomposition to select appropriate exemplars and to sequentially generate the semantic parse. This method allows us to set a new state of the art for CFQ while requiring only 1% of the training data used by traditional approaches. Due to the general nature of our approach, we expect similar efforts will lead to new results in other tasks and domains, especially for knowledge-intensive applications.
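    The decompose-then-parse idea can be sketched as below, assuming a hypothetical llm() helper and a toy exemplar-selection heuristic; the actual CFQ prompts and syntactic decomposition are considerably more involved.

```python
# Sketch of least-to-most prompting for semantic parsing (hypothetical llm()
# helper, toy exemplar selection): decompose the question, pick relevant
# exemplars, then build the parse sequentially from the subproblems.

def llm(prompt: str) -> str:
    """Placeholder for an LLM completion call."""
    raise NotImplementedError

def select_exemplars(subproblems: list[str], pool: list[tuple[str, str]], k: int = 4):
    """Pick exemplars whose questions share the most words with the subproblems."""
    words = set(" ".join(subproblems).lower().split())
    ranked = sorted(pool, key=lambda ex: -len(words & set(ex[0].lower().split())))
    return ranked[:k]

def least_to_most_parse(question: str, exemplar_pool: list[tuple[str, str]]) -> str:
    # Step 1: prompting-based decomposition into simpler subquestions.
    decomposition = llm(f"Decompose this question into simpler subquestions:\n{question}")
    subproblems = [line.strip() for line in decomposition.splitlines() if line.strip()]
    # Step 2: choose exemplars that are relevant to the decomposition.
    exemplars = select_exemplars(subproblems, exemplar_pool)
    shots = "\n\n".join(f"Q: {q}\nParse: {p}" for q, p in exemplars)
    # Step 3: sequentially extend the semantic parse, one subproblem at a time.
    parse = ""
    for sub in subproblems:
        parse = llm(f"{shots}\n\nQ: {sub}\nPartial parse so far: {parse}\nParse:")
    return parse
```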

    Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models

    We present Step-Back Prompting, a simple prompting technique that enables LLMs to do abstractions to derive high-level concepts and first principles from instances containing specific details. Using the concepts and principles to guide the reasoning steps, LLMs significantly improve their abilities in following a correct reasoning path towards the solution. We conduct experiments of Step-Back Prompting with PaLM-2L models and observe substantial performance gains on a wide range of challenging reasoning-intensive tasks including STEM, Knowledge QA, and Multi-Hop Reasoning. For instance, Step-Back Prompting improves PaLM-2L performance on MMLU Physics and Chemistry by 7% and 11%, TimeQA by 27%, and MuSiQue by 7%.
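    A compact sketch of the two stages follows, assuming a hypothetical llm() helper; the step-back exemplars used in the paper are task-specific and not reproduced here.

```python
# Sketch of Step-Back Prompting (hypothetical llm() helper): first abstract the
# question into a higher-level one, then reason about the original question
# using the retrieved concepts and first principles.

def llm(prompt: str) -> str:
    """Placeholder for an LLM completion call."""
    raise NotImplementedError

def step_back_answer(question: str) -> str:
    # Abstraction step: ask for the underlying concept or first principle.
    step_back_question = llm(
        "What high-level concept or principle is this question about?\n"
        f"Question: {question}\nStep-back question:"
    )
    principles = llm(f"Answer concisely: {step_back_question}")
    # Reasoning step: answer the original question grounded in those principles.
    return llm(
        f"Principles: {principles}\n"
        "Using these principles, answer the original question step by step.\n"
        f"Question: {question}\nAnswer:"
    )
```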

    Universal Self-Consistency for Large Language Model Generation

    Self-consistency with chain-of-thought prompting (CoT) has demonstrated remarkable performance gains on various challenging tasks, by utilizing multiple reasoning paths sampled from large language models (LLMs). However, self-consistency relies on the answer extraction process to aggregate multiple solutions, which is not applicable to free-form answers. In this work, we propose Universal Self-Consistency (USC), which leverages LLMs themselves to select the most consistent answer among multiple candidates. We evaluate USC on a variety of benchmarks, including mathematical reasoning, code generation, long-context summarization, and open-ended question answering. On open-ended generation tasks where the original self-consistency method is not applicable, USC effectively utilizes multiple samples and improves the performance. For mathematical reasoning, USC matches the standard self-consistency performance without requiring the answer formats to be similar. Finally, without access to execution results, USC also matches the execution-based voting performance on code generation.
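    The selection step can be sketched as below, assuming hypothetical llm() and llm_sample() helpers; the concrete USC selection prompt in the paper may be worded differently.

```python
# Sketch of Universal Self-Consistency (hypothetical llm()/llm_sample()
# helpers): instead of extracting and voting on final answers, the LLM itself
# is asked to pick the most consistent response among the sampled candidates.

def llm(prompt: str) -> str:
    """Placeholder for a greedy LLM completion call."""
    raise NotImplementedError

def llm_sample(prompt: str) -> str:
    """Placeholder for a sampled (temperature > 0) LLM completion call."""
    raise NotImplementedError

def universal_self_consistency(task_prompt: str, n_samples: int = 8) -> str:
    responses = [llm_sample(task_prompt) for _ in range(n_samples)]
    numbered = "\n\n".join(f"Response {i + 1}:\n{r}" for i, r in enumerate(responses))
    choice = llm(
        f"{numbered}\n\n"
        "Select the most consistent response based on majority consensus. "
        "Reply with the response number only."
    )
    # Fall back to the first sample if the selection cannot be parsed.
    digits = "".join(ch for ch in choice if ch.isdigit())
    index = int(digits) - 1 if digits else 0
    return responses[index] if 0 <= index < len(responses) else responses[0]
```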

    The Emerging Roles of the RNA Binding Protein QKI in Cardiovascular Development and Function

    RNA binding proteins (RBPs) have a broad biological and physiological function and are critical in regulating pre-mRNA posttranscriptional processing, intracellular migration, and mRNA stability. QKI, also known as Quaking, is a member of the signal transduction and activation of RNA (STAR) family, which also belongs to the heterogeneous nuclear ribonucleoprotein K- (hnRNP K-) homology domain protein family. There are three major alternatively spliced isoforms, QKI-5, QKI-6, and QKI-7, differing in carboxy-terminal domains. They share a common RNA binding property, but each isoform can regulate pre-mRNA splicing, transportation or stability differently in a unique cell type-specific manner. Previously, QKI has been known for its important role in contributing to neurological disorders. A series of recent work has further demonstrated that QKI has important roles in much broader biological systems, such as cardiovascular development, monocyte to macrophage differentiation, bone metabolism, and cancer progression. In this mini-review, we will focus on discussing the emerging roles of QKI in regulating cardiac and vascular development and function and its potential link to cardiovascular pathophysiology.

    Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models

    Sparse Mixture-of-Experts (MoE) is a neural architecture design that can be utilized to add learnable parameters to Large Language Models (LLMs) without increasing inference cost. Instruction tuning is a technique for training LLMs to follow instructions. We advocate combining these two approaches, as we find that MoE models benefit more from instruction tuning than dense models. In particular, we conduct empirical studies across three experimental setups: (i) Direct finetuning on individual downstream tasks devoid of instruction tuning; (ii) Instruction tuning followed by in-context few-shot or zero-shot generalization on downstream tasks; and (iii) Instruction tuning supplemented by further finetuning on individual downstream tasks. In the first scenario, MoE models overall underperform dense models of identical computational capacity. This narrative, however, dramatically changes with the introduction of instruction tuning (second and third scenarios), used independently or in conjunction with task-specific finetuning. Our most powerful model, FLAN-MOE-32B, surpasses the performance of FLAN-PALM-62B on four benchmark tasks, while using only a third of the FLOPs. The advancements embodied by FLAN-MOE inspire a reevaluation of the design principles of large-scale, high-performance language models in the framework of task-agnostic learning.
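    For intuition about the architecture side, a minimal sketch of a sparse MoE feed-forward layer with top-2 routing follows; the sizes and routing details are illustrative and not the FLAN-MOE configuration.

```python
# Minimal numpy sketch of a sparse Mixture-of-Experts feed-forward layer with
# top-2 routing: parameters grow with the number of experts, but each token
# only activates a few of them, keeping per-token compute roughly constant.

import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, n_experts, top_k = 16, 64, 8, 2               # illustrative sizes

W_router = rng.normal(size=(d_model, n_experts)) * 0.02      # routing weights
W_in = rng.normal(size=(n_experts, d_model, d_ff)) * 0.02    # expert FFN, layer 1
W_out = rng.normal(size=(n_experts, d_ff, d_model)) * 0.02   # expert FFN, layer 2

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def moe_layer(tokens: np.ndarray) -> np.ndarray:
    """tokens: (n_tokens, d_model) -> (n_tokens, d_model)."""
    gate_probs = softmax(tokens @ W_router)                  # (n_tokens, n_experts)
    top_experts = np.argsort(-gate_probs, axis=-1)[:, :top_k]
    output = np.zeros_like(tokens)
    for t, token in enumerate(tokens):
        for e in top_experts[t]:                             # only the top-k experts run
            hidden = np.maximum(token @ W_in[e], 0.0)        # expert FFN with ReLU
            output[t] += gate_probs[t, e] * (hidden @ W_out[e])
    return output

print(moe_layer(rng.normal(size=(4, d_model))).shape)        # -> (4, 16)
```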