The Materials Science Procedural Text Corpus: Annotating Materials Synthesis Procedures with Shallow Semantic Structures
Materials science literature contains millions of materials synthesis
procedures described in unstructured natural language text. Large-scale
analysis of these synthesis procedures would facilitate deeper scientific
understanding of materials synthesis and enable automated synthesis planning.
Such analysis requires extracting structured representations of synthesis
procedures from the raw text as a first step. To facilitate the training and
evaluation of synthesis extraction models, we introduce a dataset of 230
synthesis procedures annotated by domain experts with labeled graphs that
express the semantics of the synthesis sentences. The nodes in these graphs are
synthesis operations and their typed arguments, and labeled edges specify
relations between the nodes. We describe this new resource in detail and
highlight some specific challenges to annotating scientific text with shallow
semantic structure. We make the corpus available to the community to promote
further research and development of scientific information extraction systems.Comment: Accepted as a long paper at the Linguistic Annotation Workshop (LAW)
at ACL 201
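The annotation scheme described above pairs operation nodes with typed-argument nodes connected by labeled edges. As a minimal sketch of such a shallow semantic graph (the node and edge labels here are illustrative assumptions, not the corpus's actual label inventory):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Node:
    node_id: int
    text: str
    label: str  # e.g. "Operation" or a typed-argument label

@dataclass(frozen=True)
class Edge:
    src: int
    dst: int
    relation: str

@dataclass
class SynthesisGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def arguments_of(self, op_id: int):
        """Return (relation, node) pairs attached to an operation node."""
        by_id = {n.node_id: n for n in self.nodes}
        return [(e.relation, by_id[e.dst]) for e in self.edges if e.src == op_id]

# "Heat the mixture at 800 C" -> one operation with two typed arguments
g = SynthesisGraph()
g.nodes += [Node(0, "Heat", "Operation"),
            Node(1, "mixture", "Material"),
            Node(2, "800 C", "Condition")]
g.edges += [Edge(0, 1, "recipe_target"), Edge(0, 2, "condition_of")]

print(g.arguments_of(0))
```

A graph like this makes downstream queries ("which conditions attach to which operation?") a simple edge lookup rather than a text-matching problem.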
Diverse Retrieval-Augmented In-Context Learning for Dialogue State Tracking
There has been significant interest in zero and few-shot learning for
dialogue state tracking (DST) due to the high cost of collecting and annotating
task-oriented dialogues. Recent work has demonstrated that in-context learning
requires very little data and zero parameter updates, and even outperforms
trained methods in the few-shot setting (Hu et al. 2022). We propose RefPyDST,
which advances the state of the art in in-context learning for DST with three
contributions. First, we formulate DST as a Python programming task,
explicitly modeling language coreference as variable reference in Python.
Second, since in-context learning depends highly on the context examples, we
propose a method to retrieve a diverse set of relevant examples to improve
performance. Finally, we introduce a novel re-weighting method during decoding
that takes into account probabilities of competing surface forms, and produces
a more accurate dialogue state prediction. We evaluate our approach using
MultiWOZ and achieve state-of-the-art multi-domain joint-goal accuracy in zero
and few-shot settings.
Comment: 14 pages, 2 figures, to appear in Findings of the ACL 202
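The first contribution, casting DST as Python programming, can be illustrated with a toy example (this is not the paper's actual prompt format): each slot update becomes an assignment, and a coreferent mention such as "the same area" becomes a variable reference to an earlier value.

```python
# Toy illustration of dialogue state tracking as Python code.
state = {}

def update(domain: str, slot: str, value):
    state[f"{domain}-{slot}"] = value

# User: "Find me a cheap restaurant in the centre."
update("restaurant", "pricerange", "cheap")
update("restaurant", "area", "centre")

# User: "Also book a hotel in the same area."
# Coreference is resolved as a reference to the earlier value:
update("hotel", "area", state["restaurant-area"])

print(state)
```

Expressing coreference as variable reference makes the link between the two mentions explicit in the predicted program, rather than leaving it implicit in a string-valued slot.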
Forming Trees with Treeformers
Popular models such as Transformers and LSTMs use tokens as their unit of
information. That is, each token is encoded into a vector representation, and
those vectors are used directly in a computation. However, humans frequently
consider spans of tokens (i.e., phrases) instead of their constituent tokens.
In this paper we introduce Treeformer, an architecture inspired by the CKY
algorithm and the Transformer, which learns a composition operator and pooling
function in order to construct hierarchical encodings for phrases and
sentences. Our extensive experiments demonstrate the benefits of incorporating
a hierarchical structure into the Transformer, and show significant
improvements compared to a baseline Transformer in machine translation,
abstractive summarization, and various natural language understanding tasks.
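The CKY-style idea of building span encodings bottom-up can be sketched as follows. This is an illustrative chart construction, not the Treeformer architecture itself: "composition" is a toy vector average and "pooling" picks the best-scoring split, standing in for the learned operators the abstract describes.

```python
def compose(left, right):
    # Toy composition operator: element-wise average of child encodings.
    return [(a + b) / 2 for a, b in zip(left, right)]

def score(vec):
    # Stand-in for a learned scoring function.
    return sum(vec)

def encode_spans(token_vecs):
    """CKY-style chart: encode every span by its best-scoring split."""
    n = len(token_vecs)
    chart = {(i, i): token_vecs[i] for i in range(n)}
    for length in range(1, n):
        for i in range(n - length):
            j = i + length
            candidates = [compose(chart[(i, k)], chart[(k + 1, j)])
                          for k in range(i, j)]
            chart[(i, j)] = max(candidates, key=score)
    return chart

chart = encode_spans([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
print(chart[(0, 2)])  # encoding of the whole sentence span
```

The chart gives every phrase (span) its own encoding, which is exactly the hierarchical structure the architecture injects into the Transformer.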
Task Contamination: Language Models May Not Be Few-Shot Anymore
Large language models (LLMs) offer impressive performance in various
zero-shot and few-shot tasks. However, their success in zero-shot and few-shot
settings may be affected by task contamination, a potential limitation that has
not been thoroughly examined. This paper investigates how zero-shot and
few-shot performance of LLMs has changed over time. Utilizing
GPT-3 series models and several other recent open-sourced LLMs, and controlling
for dataset difficulty, we find that LLMs perform surprisingly better on
datasets released before their training data creation date than on datasets
released after it. This strongly indicates that, for many LLMs, there exists task
contamination on zero-shot and few-shot evaluation for datasets released prior
to the LLMs' training data creation date. Additionally, we utilize training
data inspection, task example extraction, and a membership inference attack,
which reveal further evidence of task contamination. Importantly, we find that
for classification tasks with no possibility of task contamination, LLMs rarely
demonstrate statistically significant improvements over simple majority
baselines, in both zero and few-shot settings.
Comment: Accepted by AAAI 202
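The chronological analysis described above can be sketched as a simple before/after comparison against a training cutoff, alongside a majority-class baseline. All dataset names, years, and accuracies below are invented for illustration.

```python
from statistics import mean

CUTOFF = 2021  # hypothetical training-data creation year

results = [
    # (dataset, release_year, model_acc, majority_baseline_acc)
    ("taskA", 2019, 0.82, 0.55),
    ("taskB", 2020, 0.79, 0.50),
    ("taskC", 2022, 0.58, 0.52),
    ("taskD", 2023, 0.54, 0.51),
]

before = [acc for _, yr, acc, _ in results if yr < CUTOFF]
after = [acc for _, yr, acc, _ in results if yr >= CUTOFF]

print(f"mean acc before cutoff: {mean(before):.2f}")
print(f"mean acc after cutoff:  {mean(after):.2f}")

# A large before/after gap at matched difficulty is consistent with task
# contamination rather than genuine zero/few-shot ability.
print(f"gap: {mean(before) - mean(after):.2f}")
```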
Does the "most sinfully decadent cake ever" taste good? Answering Yes/No Questions from Figurative Contexts
Figurative language is commonplace in natural language, and while making
communication memorable and creative, can be difficult to understand. In this
work, we investigate the robustness of Question Answering (QA) models on
figurative text. Yes/no questions, in particular, are a useful probe of
figurative language understanding capabilities of large language models. We
propose FigurativeQA, a set of 1000 yes/no questions with figurative and
non-figurative contexts, extracted from the domains of restaurant and product
reviews. We show that state-of-the-art BERT-based QA models exhibit an average
performance drop of up to 15 percentage points when answering questions from figurative
contexts, as compared to non-figurative ones. While models like GPT-3 and
ChatGPT are better at handling figurative texts, we show that further
performance gains can be achieved by automatically simplifying the figurative
contexts into their non-figurative (literal) counterparts. We find that the
best overall model is ChatGPT with chain-of-thought prompting to generate
non-figurative contexts. Our work provides a promising direction for building
more robust QA models with figurative language understanding capabilities.
Comment: Accepted at RANLP 202
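The simplify-then-answer strategy described above can be sketched as a two-step pipeline. The `llm` function here is a placeholder stub with canned responses; a real system would call a language model such as ChatGPT at both steps.

```python
def llm(prompt: str) -> str:
    # Stub: a real implementation would query a language model here.
    canned = {
        "simplify": "The cake tastes extremely good and rich.",
        "answer": "yes",
    }
    return canned["simplify"] if "literal" in prompt else canned["answer"]

def answer_figurative(context: str, question: str) -> str:
    # Step 1: rewrite the figurative context into a literal paraphrase.
    literal = llm(f"Rewrite in literal language: {context}")
    # Step 2: answer the yes/no question against the literal context.
    return llm(f"Context: {literal}\nQuestion: {question}\nAnswer yes or no:")

print(answer_figurative("The most sinfully decadent cake ever!",
                        "Does the cake taste good?"))
```

Decoupling paraphrase from answering is what lets a literal-text QA model recover accuracy on figurative inputs.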
Structural Prediction and Mutational Analysis of the Gifsy-1 Xis Protein
Background: The Gifsy-1 phage integrates into the Salmonella Typhimurium chromosome via an integrase-mediated, site-specific recombination mechanism. Excision of the Gifsy-1 phage requires three proteins: the Gifsy-1 integrase (Int), the Gifsy-1 excisionase (Xis) protein, and host-encoded Integration Host Factor (IHF). The Gifsy-1 xis gene encodes the 94-residue Gifsy-1 excisionase protein, which has a molecular weight of 11.2 kDa and a pI of 10.2. Electrophoretic Mobility Shift Assays (EMSA) suggested at least one region of the protein is responsible for protein-DNA interactions with a tripartite DNA binding site composed of three direct imperfect repeats.
Results: Here we have undertaken experiments to dissect and model the structural motifs of Gifsy-1 Xis necessary for its observed DNA binding activity. Diethyl sulfate (DES) mutagenesis and mutagenic PCR techniques were used to generate Gifsy-1 xis mutants. Mutant Xis proteins that lacked activity in vivo were purified and tested by EMSA for binding to the Gifsy-1 Xis attP attachment site. Results from mutagenesis experiments and EMSA were compared to results of structural predictions and sequence analyses.
Conclusion: Sequence comparisons revealed evidence for three distinct structural motifs in the Gifsy-1 Xis protein. Multiple sequence alignments revealed unexpected homologies between the Gifsy-1 Xis protein and two distinct subsets of polynucleotide binding proteins. Our data may suggest a role for Gifsy-1 Xis in the regulation of Gifsy-1 phage excision beyond that of DNA binding and possible interactions with the Gifsy-1 Int protein.
Understanding the Role of Optimization in Double Descent
The phenomenon of model-wise double descent, where the test error peaks and
then reduces as the model size increases, is an interesting topic that has
attracted the attention of researchers due to the striking observed gap between
theory and practice (Belkin et al., 2018). Additionally, while double
descent has been observed in various tasks and architectures, the peak of
double descent can sometimes be noticeably absent or diminished, even without
explicit regularization, such as weight decay and early stopping. In this
paper, we investigate this intriguing phenomenon from the optimization
perspective and propose a simple optimization-based explanation for why double
descent sometimes occurs weakly or not at all. To the best of our knowledge, we
are the first to demonstrate that many disparate factors contributing to
model-wise double descent (initialization, normalization, batch size, learning
rate, optimization algorithm) are unified from the viewpoint of optimization:
model-wise double descent is observed if and only if the optimizer can find a
sufficiently low-loss minimum. These factors directly affect the condition
number of the optimization problem or the optimizer and thus affect the final
minimum found by the optimizer, reducing or increasing the height of the double
descent peak. We conduct a series of controlled experiments on random feature
models and two-layer neural networks under various optimization settings,
demonstrating this optimization-based unified view. Our results suggest the
following implication: Double descent is unlikely to be a problem for
real-world machine learning setups. Additionally, our results help explain the
gap between weak double descent peaks in practice and strong peaks observable
in carefully designed setups.
Comment: NeurIPS 2023 Workshop on Optimization for Machine Learning
The Logic of AMR: Practical, Unified, Graph-Based Sentence Semantics for NLP
The Abstract Meaning Representation formalism is rapidly emerging as an important practical form of structured sentence semantics which, thanks to the availability of large-scale annotated corpora, has potential as a convergence point for NLP research. This tutorial unmasks the design philosophy, data creation process, and existing algorithms for AMR semantics. It is intended for anyone interested in working with AMR data, including parsing text into AMRs, generating text from AMRs, and applying AMRs to tasks such as machine translation and summarization. The goals of this tutorial are twofold. First, it will describe the nature and design principles behind the representation, and demonstrate that it can be practical for annotation. In Part I: The AMR Formalism, participants will be coached in the basics of annotation so that, when working with AMR data in the future, they will appreciate the benefits and limitations of the process by which it was created. Second, the tutorial will survey the state of the art for computation with AMRs. Part II: Algorithms and Applications will focus on the task of parsing English text into AMR graphs, which requires algorithms for alignment, for structured prediction, and for statistical learning. The tutorial will also address graph grammar formalisms that have been recently developed, and future applications such as AMR-based machine translation and summarization. Participants with laptops are encouraged to bring them to the tutorial.
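What makes AMRs graphs rather than trees is re-entrancy: one variable can fill roles under several concepts. A minimal sketch, using the standard AMR example "The boy wants to go" encoded as a plain Python dictionary (the dictionary encoding is an illustrative assumption, not a standard AMR serialization):

```python
from collections import Counter

# AMR for "The boy wants to go": variable b (the boy) is ARG0 of
# both want-01 and go-01.
amr = {
    "w": {"concept": "want-01", "ARG0": "b", "ARG1": "g"},
    "b": {"concept": "boy"},
    "g": {"concept": "go-01", "ARG0": "b"},
}

def reentrant_variables(graph):
    """Variables referenced as an argument by more than one node."""
    refs = Counter(v for node in graph.values()
                   for role, v in node.items()
                   if role != "concept")
    return {var for var, n in refs.items() if n > 1}

print(reentrant_variables(amr))  # the boy participates in both events
```

It is precisely these shared variables that force AMR parsers to use graph (not tree) structured prediction, which is why Part II's alignment and structured-prediction algorithms differ from constituency parsing.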
