
    The Materials Science Procedural Text Corpus: Annotating Materials Synthesis Procedures with Shallow Semantic Structures

    Materials science literature contains millions of materials synthesis procedures described in unstructured natural language text. Large-scale analysis of these synthesis procedures would facilitate deeper scientific understanding of materials synthesis and enable automated synthesis planning. Such analysis requires extracting structured representations of synthesis procedures from the raw text as a first step. To facilitate the training and evaluation of synthesis extraction models, we introduce a dataset of 230 synthesis procedures annotated by domain experts with labeled graphs that express the semantics of the synthesis sentences. The nodes in this graph are synthesis operations and their typed arguments, and labeled edges specify relations between the nodes. We describe this new resource in detail and highlight some specific challenges to annotating scientific text with shallow semantic structure. We make the corpus available to the community to promote further research and development of scientific information extraction systems. Comment: Accepted as a long paper at the Linguistic Annotation Workshop (LAW) at ACL 201
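
    The labeled-graph representation described above can be illustrated with a small sketch: operation nodes connect to typed argument nodes through labeled edges. The node types and edge labels below are invented for illustration and are not the corpus's actual tag set.

```python
# Hypothetical labeled graph for one synthesis sentence:
# "Heat the TiO2 powder at 500 C for 2 h"

def make_node(node_id, text, node_type):
    """A graph node: a text span typed as an operation or an argument."""
    return {"id": node_id, "text": text, "type": node_type}

def make_edge(src, dst, label):
    """A labeled edge from an operation node to one of its arguments."""
    return {"src": src, "dst": dst, "label": label}

nodes = [
    make_node(0, "Heat", "Operation"),
    make_node(1, "TiO2 powder", "Material"),
    make_node(2, "500 C", "Condition"),
    make_node(3, "2 h", "Condition"),
]
edges = [
    make_edge(0, 1, "Recipe-target"),
    make_edge(0, 2, "Condition-of"),
    make_edge(0, 3, "Condition-of"),
]

def arguments_of(op_id, nodes, edges):
    """Return (edge label, argument text) pairs for one operation node."""
    by_id = {n["id"]: n for n in nodes}
    return [(e["label"], by_id[e["dst"]]["text"])
            for e in edges if e["src"] == op_id]
```

    Structured records like these are what large-scale analysis or automated synthesis planning would consume downstream.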

    Diverse Retrieval-Augmented In-Context Learning for Dialogue State Tracking

    There has been significant interest in zero- and few-shot learning for dialogue state tracking (DST) due to the high cost of collecting and annotating task-oriented dialogues. Recent work has demonstrated that in-context learning requires very little data and zero parameter updates, and even outperforms trained methods in the few-shot setting (Hu et al. 2022). We propose RefPyDST, which advances the state of the art with three improvements to in-context learning for DST. First, we formulate DST as a Python programming task, explicitly modeling language coreference as variable reference in Python. Second, since in-context learning depends heavily on the context examples, we propose a method to retrieve a diverse set of relevant examples to improve performance. Finally, we introduce a novel re-weighting method during decoding that takes into account the probabilities of competing surface forms and produces a more accurate dialogue state prediction. We evaluate our approach using MultiWOZ and achieve state-of-the-art multi-domain joint-goal accuracy in zero- and few-shot settings. Comment: 14 pages, 2 figures, to appear in Findings of the ACL 202
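
    As a rough illustration of the first idea, dialogue state can be written as Python assignments, so that a coreferring mention ("the hotel") becomes a variable reference rather than a string to re-resolve. The classes and slot names below are a hypothetical sketch in the MultiWOZ style, not the paper's actual prompt format.

```python
# Illustrative only: dialogue state as Python objects, with coreference
# expressed as variable reference.

class Hotel:
    def __init__(self, area=None, stars=None):
        self.area, self.stars = area, stars

class Taxi:
    def __init__(self, destination=None):
        self.destination = destination

# Turn 1: "I want a 4-star hotel in the north."
hotel = Hotel(area="north", stars=4)

# Turn 2: "Book me a taxi to the hotel."
# "the hotel" is resolved by referencing the existing variable.
taxi = Taxi(destination=hotel)

def flatten_state(hotel, taxi):
    """Flatten the Python objects back into slot-value pairs."""
    return {
        "hotel-area": hotel.area,
        "hotel-stars": hotel.stars,
        "taxi-destination": "hotel" if taxi.destination is hotel
                            else taxi.destination,
    }
```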

    Forming Trees with Treeformers

    Popular models such as Transformers and LSTMs use tokens as their unit of information. That is, each token is encoded into a vector representation, and those vectors are used directly in computation. However, humans frequently consider spans of tokens (i.e., phrases) rather than their constituent tokens. In this paper we introduce Treeformer, an architecture inspired by the CKY algorithm and the Transformer, which learns a composition operator and a pooling function in order to construct hierarchical encodings for phrases and sentences. Our extensive experiments demonstrate the benefits of incorporating hierarchical structure into the Transformer and show significant improvements over a baseline Transformer in machine translation, abstractive summarization, and various natural language understanding tasks.
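
    A toy version of the CKY-inspired idea might look as follows: a chart is filled bottom-up, each span's encoding is built by composing the encodings of its two sub-spans, and the result is pooled over all split points. The composition here is a fixed average standing in for Treeformer's learned operator.

```python
import numpy as np

def compose(left, right):
    """Stand-in for a learned composition operator over two span vectors."""
    return 0.5 * (left + right)

def cky_encode(token_vecs):
    """CKY-style chart over token vectors; returns the sentence encoding."""
    n = len(token_vecs)
    chart = {}                           # (i, j) -> vector for span [i, j)
    for i, v in enumerate(token_vecs):
        chart[(i, i + 1)] = v
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            # one candidate per split point, then pool (elementwise max)
            candidates = [compose(chart[(i, k)], chart[(k, j)])
                          for k in range(i + 1, j)]
            chart[(i, j)] = np.max(candidates, axis=0)
    return chart[(0, n)]
```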

    Task Contamination: Language Models May Not Be Few-Shot Anymore

    Large language models (LLMs) offer impressive performance on various zero-shot and few-shot tasks. However, their success in zero-shot and few-shot settings may be affected by task contamination, a potential limitation that has not been thoroughly examined. This paper investigates how the zero-shot and few-shot performance of LLMs has changed chronologically over time. Utilizing GPT-3 series models and several other recent open-source LLMs, and controlling for dataset difficulty, we find that LLMs perform surprisingly better on datasets released before their training data creation date than on datasets released after. This strongly indicates that, for many LLMs, task contamination exists in zero-shot and few-shot evaluation on datasets released prior to the LLMs' training data creation date. Additionally, we utilize training data inspection, task example extraction, and a membership inference attack, which reveal further evidence of task contamination. Importantly, we find that for classification tasks with no possibility of task contamination, LLMs rarely demonstrate statistically significant improvements over simple majority baselines, in both zero- and few-shot settings. Comment: Accepted by AAAI 202
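
    The chronological analysis can be sketched in a few lines: bucket evaluation datasets by release date relative to the model's training-data cutoff and compare aggregate performance. The cutoff date, dataset names, and scores below are fabricated for illustration.

```python
from datetime import date

CUTOFF = date(2021, 9, 1)  # hypothetical training-data creation date

results = [
    {"dataset": "A", "released": date(2019, 5, 1), "accuracy": 0.82},
    {"dataset": "B", "released": date(2020, 1, 1), "accuracy": 0.78},
    {"dataset": "C", "released": date(2022, 3, 1), "accuracy": 0.55},
    {"dataset": "D", "released": date(2023, 6, 1), "accuracy": 0.51},
]

def mean_accuracy_by_cutoff(results, cutoff):
    """Mean accuracy on datasets released before vs. after the cutoff."""
    before = [r["accuracy"] for r in results if r["released"] < cutoff]
    after = [r["accuracy"] for r in results if r["released"] >= cutoff]
    return sum(before) / len(before), sum(after) / len(after)
```

    A large before/after gap is consistent with task contamination, though on its own it is evidence rather than proof, which is why the paper adds training data inspection and a membership inference attack.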

    Does the "most sinfully decadent cake ever" taste good? Answering Yes/No Questions from Figurative Contexts

    Figurative language is commonplace in natural language, and while it makes communication memorable and creative, it can be difficult to understand. In this work, we investigate the robustness of Question Answering (QA) models on figurative text. Yes/no questions, in particular, are a useful probe of the figurative language understanding capabilities of large language models. We propose FigurativeQA, a set of 1000 yes/no questions with figurative and non-figurative contexts, extracted from the domains of restaurant and product reviews. We show that state-of-the-art BERT-based QA models exhibit an average performance drop of up to 15 percentage points when answering questions from figurative contexts, as compared to non-figurative ones. While models like GPT-3 and ChatGPT are better at handling figurative texts, we show that further performance gains can be achieved by automatically simplifying the figurative contexts into their non-figurative (literal) counterparts. We find that the best overall model is ChatGPT with chain-of-thought prompting to generate non-figurative contexts. Our work provides a promising direction for building more robust QA models with figurative language understanding capabilities. Comment: Accepted at RANLP 202
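
    The simplify-then-answer strategy can be caricatured as a two-stage pipeline; both stages below are hard-coded stand-ins for the LLM calls the paper actually uses, included only to show the data flow.

```python
# Stage 1: rewrite figurative phrases into literal paraphrases
# (a lookup table here; an LLM rewrite in the paper).
FIGURATIVE_TO_LITERAL = {
    "the most sinfully decadent cake ever": "an extremely rich, delicious cake",
    "this phone is a brick": "this phone is heavy and bulky",
}

def simplify(context):
    for figurative, literal in FIGURATIVE_TO_LITERAL.items():
        context = context.replace(figurative, literal)
    return context

# Stage 2: answer yes/no from the (now literal) context
# (keyword matching here; a QA model in the paper).
def answer_yes_no(context, question):
    positive = {"delicious", "tasty", "great"}
    return "yes" if any(word in context for word in positive) else "no"

literal = simplify("They serve the most sinfully decadent cake ever.")
```

    The toy QA stage answers correctly only after simplification, which mirrors the paper's finding that literal rewrites recover much of the lost accuracy.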

    Structural Prediction and Mutational Analysis of the Gifsy-1 Xis Protein

    Background: The Gifsy-1 phage integrates into the Salmonella Typhimurium chromosome via an integrase-mediated, site-specific recombination mechanism. Excision of the Gifsy-1 phage requires three proteins: the Gifsy-1 integrase (Int), the Gifsy-1 excisionase (Xis) protein, and the host-encoded Integration Host Factor (IHF). The Gifsy-1 xis gene encodes the 94-residue Gifsy-1 excisionase protein, which has a molecular weight of 11.2 kDa and a pI of 10.2. Electrophoretic Mobility Shift Assays (EMSA) suggested that at least one region of the protein is responsible for protein-DNA interactions with a tripartite DNA binding site composed of three direct imperfect repeats. Results: Here we have undertaken experiments to dissect and model the structural motifs of Gifsy-1 Xis necessary for its observed DNA binding activity. Diethyl sulfate (DES) mutagenesis and mutagenic PCR techniques were used to generate Gifsy-1 xis mutants. Mutant Xis proteins that lacked activity in vivo were purified and tested by EMSA for binding to the Gifsy-1 Xis attP attachment site. Results from the mutagenesis experiments and EMSA were compared to results of structural predictions and sequence analyses. Conclusion: Sequence comparisons revealed evidence for three distinct structural motifs in the Gifsy-1 Xis protein. Multiple sequence alignments revealed unexpected homologies between the Gifsy-1 Xis protein and two distinct subsets of polynucleotide binding proteins. Our data may suggest a role for Gifsy-1 Xis in the regulation of Gifsy-1 phage excision beyond that of DNA binding, and possible interactions with the Gifsy-1 Int protein.

    Understanding the Role of Optimization in Double Descent

    The phenomenon of model-wise double descent, where the test error peaks and then falls as the model size increases, is an interesting topic that has attracted the attention of researchers due to the striking observed gap between theory and practice (Belkin et al., 2018). Additionally, while double descent has been observed in various tasks and architectures, the peak of double descent can sometimes be noticeably absent or diminished, even without explicit regularization such as weight decay or early stopping. In this paper, we investigate this intriguing phenomenon from the optimization perspective and propose a simple optimization-based explanation for why double descent sometimes occurs weakly or not at all. To the best of our knowledge, we are the first to demonstrate that many disparate factors contributing to model-wise double descent (initialization, normalization, batch size, learning rate, optimization algorithm) are unified from the viewpoint of optimization: model-wise double descent is observed if and only if the optimizer can find a sufficiently low-loss minimum. These factors directly affect the condition number of the optimization problem or the optimizer, and thus affect the final minimum found by the optimizer, reducing or increasing the height of the double descent peak. We conduct a series of controlled experiments on random feature models and two-layer neural networks under various optimization settings, demonstrating this optimization-based unified view. Our results suggest the following implication: double descent is unlikely to be a problem for real-world machine learning setups. Additionally, our results help explain the gap between the weak double descent peaks seen in practice and the strong peaks observable in carefully designed setups. Comment: NeurIPS 2023 Workshop on Optimization for Machine Learning

    The Logic of AMR: Practical, Unified, Graph-Based Sentence Semantics for NLP

    The Abstract Meaning Representation formalism is rapidly emerging as an important practical form of structured sentence semantics which, thanks to the availability of large-scale annotated corpora, has potential as a convergence point for NLP research. This tutorial unmasks the design philosophy, data creation process, and existing algorithms for AMR semantics. It is intended for anyone interested in working with AMR data, including parsing text into AMRs, generating text from AMRs, and applying AMRs to tasks such as machine translation and summarization. The goals of this tutorial are twofold. First, it will describe the nature and design principles behind the representation and demonstrate that it can be practical for annotation. In Part I: The AMR Formalism, participants will be coached in the basics of annotation so that, when working with AMR data in the future, they will appreciate the benefits and limitations of the process by which it was created. Second, the tutorial will survey the state of the art for computation with AMRs. Part II: Algorithms and Applications will focus on the task of parsing English text into AMR graphs, which requires algorithms for alignment, structured prediction, and statistical learning. The tutorial will also address graph grammar formalisms that have recently been developed, and future applications such as AMR-based machine translation and summarization. Participants with laptops are encouraged to bring them to the tutorial.
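
    For readers new to the formalism, a standard textbook-style AMR example (not taken from the tutorial itself) shows the PENMAN notation and the key graph property, reentrancy: the analysis of "The boy wants to go" follows common AMR conventions.

```python
# "The boy wants to go" in PENMAN notation: variable b fills a role twice.
AMR_PENMAN = """
(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go-02
            :ARG0 b))
""".strip()

# The same graph as (source, role, target) triples.
AMR_TRIPLES = [
    ("w", "instance", "want-01"),
    ("b", "instance", "boy"),
    ("g", "instance", "go-02"),
    ("w", "ARG0", "b"),   # the boy is the wanter
    ("w", "ARG1", "g"),   # what is wanted: the going
    ("g", "ARG0", "b"),   # the boy is also the goer (reentrancy)
]

def reentrant_variables(triples):
    """Variables appearing as the target of more than one role edge;
    these reentrancies make AMRs graphs rather than trees."""
    from collections import Counter
    targets = Counter(t for _, r, t in triples if r != "instance")
    return {v for v, c in targets.items() if c > 1}
```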