New Resources and Perspectives for Biomedical Event Extraction
Event extraction is a major focus of recent work in biomedical information extraction. Despite substantial advances, many challenges remain for reliable automatic extraction of events from text. We introduce a new biomedical event extraction resource consisting of analyses automatically created by systems participating in the recent BioNLP Shared Task (ST) 2011. By providing for the first time the outputs of a broad set of state-of-the-art event extraction systems, this resource opens many new opportunities for studying aspects of event extraction, from the identification of common errors to the study of effective approaches to combining the strengths of systems. We demonstrate these opportunities through a multi-system analysis on three BioNLP ST 2011 main tasks, focusing on events that none of the systems can successfully extract. We further argue for new perspectives on the performance evaluation of domain event extraction systems, considering a document-level, 'off-the-page' representation and evaluation to complement the mention-level evaluations pursued in most recent work.
Learning to Reason with Adaptive Computation
Multi-hop inference is necessary for machine learning systems to successfully solve tasks such as Recognising Textual Entailment and Machine Reading. In this work, we demonstrate the effectiveness of adaptive computation for learning the number of inference steps required for examples of different complexity, and show that learning the correct number of inference steps is difficult. We introduce the first model for this setting involving Adaptive Computation Time; it provides a small performance benefit over a similar model without an adaptive component, while enabling considerable insight into the reasoning process of the model.
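As context for the adaptive-computation idea, here is a minimal PyTorch sketch of an ACT-style halting loop (after Graves, 2016): a sigmoid unit emits a halting probability at each step, and computation stops once the accumulated probability crosses a budget. The GRU cell, threshold, and step limit are illustrative assumptions, not the paper's exact model.

import torch
import torch.nn as nn

class ACTCell(nn.Module):
    # Adaptive-Computation-Time-style wrapper around a recurrent cell:
    # a sigmoid halting unit decides how many inference steps to spend
    # on each example in the batch.
    def __init__(self, hidden_size, max_steps=10, epsilon=0.01):
        super().__init__()
        self.cell = nn.GRUCell(hidden_size, hidden_size)
        self.halt = nn.Linear(hidden_size, 1)  # per-step halting probability
        self.max_steps = max_steps
        self.epsilon = epsilon

    def forward(self, x, h):
        total = torch.zeros(x.size(0), device=x.device)  # halting mass so far
        output = torch.zeros_like(h)                     # halting-weighted state
        for _ in range(self.max_steps):
            h = self.cell(x, h)
            p = torch.sigmoid(self.halt(h)).squeeze(-1)
            # If adding p would cross the budget, spend the remainder instead.
            crossing = (total + p >= 1.0 - self.epsilon).float()
            w = (1.0 - crossing) * p + crossing * (1.0 - total)
            output = output + w.unsqueeze(-1) * h
            total = total + w
            if bool((total >= 1.0 - self.epsilon).all()):
                break
        return output

For a batch of eight examples, ACTCell(64)(torch.randn(8, 64), torch.zeros(8, 64)) returns the halting-weighted hidden state; easy examples halt early, while harder ones use more of the step budget.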
brat: a Web-based Tool for NLP-Assisted Text Annotation
We introduce the brat rapid annotation tool (BRAT), an intuitive web-based tool for text annotation supported by Natural Language Processing (NLP) technology. BRAT has been developed for rich structured annotation for a variety of NLP tasks and aims to support manual curation efforts and increase annotator productivity using NLP techniques. We discuss several case studies of real-world annotation projects using pre-release versions of BRAT and present an evaluation of annotation assisted by semantic class disambiguation on a multi-category entity mention annotation task, showing a 15% decrease in total annotation time. BRAT is available under an open-source license.
Using Natural Language Explanations to Improve Robustness of In-context Learning
Recent studies have demonstrated that large language models (LLMs) can excel in many tasks via in-context learning (ICL). However, recent work shows that ICL-prompted models tend to produce inaccurate results when presented with adversarial inputs. In this work, we investigate whether augmenting ICL with natural language explanations (NLEs) improves the robustness of LLMs on adversarial datasets covering natural language inference and paraphrase identification. We prompt LLMs with a small set of human-generated NLEs to produce further NLEs, yielding more accurate results than both a zero-shot ICL setting and using only human-generated NLEs. Our results on five popular LLMs (GPT3.5-turbo, Llama2, Vicuna, Zephyr, and Mistral) show that our approach yields over a 6% improvement over baseline approaches across eight adversarial datasets: HANS, ISCS, NaN, ST, PICD, PISP, ANLI, and PAWS. Furthermore, previous studies have demonstrated that prompt selection strategies significantly enhance ICL on in-distribution test sets. However, our findings reveal that these strategies do not match the efficacy of our approach for robustness evaluations, resulting in an accuracy drop of 8% compared to the proposed approach.
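To make the setup concrete, the sketch below assembles an NLE-augmented ICL prompt: each demonstration pairs a label with a natural language explanation, and the model is asked to answer (and explain) the new example. The demonstration texts and template are invented placeholders, not the paper's prompts.

# Build an in-context prompt whose demonstrations carry natural language
# explanations alongside their labels.
demonstrations = [
    {
        "premise": "A man is playing a guitar on stage.",
        "hypothesis": "A person is performing music.",
        "label": "entailment",
        "explanation": "Playing a guitar on stage is a form of performing music.",
    },
]

def build_prompt(demos, premise, hypothesis):
    parts = []
    for d in demos:
        parts.append(
            f"Premise: {d['premise']}\n"
            f"Hypothesis: {d['hypothesis']}\n"
            f"Answer: {d['label']}\n"
            f"Explanation: {d['explanation']}\n"
        )
    # The trailing "Answer:" invites the model to label and then explain.
    parts.append(f"Premise: {premise}\nHypothesis: {hypothesis}\nAnswer:")
    return "\n".join(parts)

print(build_prompt(demonstrations,
                   "A dog runs through the snow.",
                   "An animal is outside."))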
Learning to Generate Textual Data
To learn text understanding models with millions of parameters one needs massive amounts of data. In this work, we argue that generating data can compensate for this need. While defining generic data generators is difficult, we propose to allow generators to be 'weakly' specified, in the sense that a set of parameters controls how the data is generated. Consider for example generators where the example templates, grammar, and/or vocabulary is determined by this set of parameters. Instead of manually tuning these parameters, we learn them from the limited training data at our disposal. To achieve this, we derive an efficient algorithm called GENERE that jointly estimates the parameters of the model and the undetermined generation parameters. We illustrate its benefits by learning to solve math exam questions using a highly parametrized sequence-to-sequence neural network.
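As a toy illustration of a 'weakly' specified generator, the sketch below fixes a handful of question templates by hand and exposes their sampling weights as free parameters; a GENERE-style procedure would estimate those weights from the limited training data rather than leave them hand-set. The templates and numeric ranges are invented for illustration.

import random

# Templates and vocabulary are fixed; the weights over templates are the
# undetermined generation parameters to be learned.
TEMPLATES = [
    ("What is {a} plus {b}?", lambda a, b: a + b),
    ("What is {a} minus {b}?", lambda a, b: a - b),
    ("What is {a} times {b}?", lambda a, b: a * b),
]

def generate(theta, rng=random):
    # Sample one (question, answer) pair; theta weights the templates.
    template, fn = rng.choices(TEMPLATES, weights=theta, k=1)[0]
    a, b = rng.randint(1, 20), rng.randint(1, 20)
    return template.format(a=a, b=b), fn(a, b)

theta = [0.5, 0.3, 0.2]  # hand-set here; learning would adjust these weights
for _ in range(3):
    print(generate(theta))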
Neural architectures for fine-grained entity type classification
In this work, we investigate several neural network architectures for fine-grained entity type classification and make three key contributions. Despite being a natural comparison and addition, previous work on attentive neural architectures has not considered hand-crafted features; we combine these with learnt features and establish that they complement each other. Additionally, through quantitative analysis we establish that the attention mechanism learns to attend over syntactic heads and the phrase containing the mention, both of which are known to be strong hand-crafted features for our task. We introduce parameter sharing between labels through a hierarchical encoding method, whose low-dimensional projections show clear clusters for each type hierarchy. Lastly, despite using the same evaluation dataset, the literature frequently compares models trained using different data. We demonstrate that the choice of training data has a drastic impact on performance, which decreases by as much as 9.85% loose micro F1 score for a previously proposed method. Despite this discrepancy, our best model achieves state-of-the-art results with 75.36% loose micro F1 score on the well-established FIGER (GOLD) dataset, and we report the best results for models trained using publicly available data on the OntoNotes dataset, with 64.93% loose micro F1 score.
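The hierarchical label-sharing idea can be sketched as follows: represent each fine-grained type as the sum of embeddings for the nodes on its hierarchy path, so that, for example, /person/artist and /person/athlete share the /person component. The label set, dimensionality, and summation scheme below are assumptions for illustration, not the paper's exact encoding.

import torch
import torch.nn as nn

LABELS = ["/person", "/person/artist", "/person/athlete", "/organization"]

class HierarchicalLabelEncoder(nn.Module):
    # One embedding per hierarchy node; a label's vector sums its path.
    def __init__(self, labels, dim=64):
        super().__init__()
        nodes = sorted({p for l in labels for p in self._prefixes(l)})
        self.node_index = {n: i for i, n in enumerate(nodes)}
        self.node_emb = nn.Embedding(len(nodes), dim)
        self.labels = labels

    @staticmethod
    def _prefixes(label):
        parts = label.strip("/").split("/")
        return ["/" + "/".join(parts[:i + 1]) for i in range(len(parts))]

    def forward(self):
        rows = []
        for label in self.labels:
            idx = torch.tensor([self.node_index[p] for p in self._prefixes(label)])
            rows.append(self.node_emb(idx).sum(dim=0))
        return torch.stack(rows)  # (num_labels, dim) label embedding matrix

print(HierarchicalLabelEncoder(LABELS)().shape)  # torch.Size([4, 64])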
Controllable Abstractive Dialogue Summarization with Sketch Supervision
In this paper, we aim to improve abstractive dialogue summarization quality and, at the same time, enable granularity control. Our model has two primary components: 1) a two-stage generation strategy that first produces a preliminary summary sketch serving as the basis for the final summary; this sketch provides a weakly supervised signal in the form of pseudo-labeled interrogative pronoun categories and key phrases extracted using a constituency parser; and 2) a simple strategy to control the granularity of the final summary, whereby our model can automatically determine or control the number of generated summary sentences for a given dialogue by predicting and highlighting different text spans from the source text. Our model achieves state-of-the-art performance on the largest dialogue summarization corpus, SAMSum, with a ROUGE-L score as high as 50.79. In addition, we conduct a case study and show competitive human evaluation results and controllability with respect to human-annotated summaries.
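As a rough illustration of the granularity-control interface, the sketch below keeps the k highest-scoring source spans, with the final summary containing one sentence per retained span; the spans, scores, and selection rule are placeholders for the model's learned components.

def control_granularity(spans, scores, k):
    # Keep the k highest-scoring spans, preserving source order; k then
    # determines how many summary sentences are generated.
    ranked = sorted(range(len(spans)), key=lambda i: -scores[i])[:k]
    return [spans[i] for i in sorted(ranked)]

spans = ["A asks B about the meeting", "B confirms 3pm", "they discuss lunch"]
scores = [0.9, 0.7, 0.2]
print(control_granularity(spans, scores, k=2))
# ['A asks B about the meeting', 'B confirms 3pm']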
An Efficient Memory-Augmented Transformer for Knowledge-Intensive NLP Tasks
Access to external knowledge is essential for many natural language processing tasks, such as question answering and dialogue. Existing methods often rely on a parametric model that stores knowledge in its parameters, or use a retrieval-augmented model that has access to an external knowledge source. Parametric and retrieval-augmented models have complementary strengths in terms of computational efficiency and predictive accuracy. To combine the strengths of both approaches, we propose the Efficient Memory-Augmented Transformer (EMAT): it encodes external knowledge into a key-value memory and exploits fast maximum inner product search for memory querying. We also introduce pre-training tasks that allow EMAT to encode informative key-value representations, and to learn an implicit strategy to integrate multiple memory slots into the transformer. Experiments on various knowledge-intensive tasks such as question answering and dialogue datasets show that simply augmenting parametric models (T5-base) using our method produces more accurate results (e.g., 25.8 → 44.3 EM on NQ) while retaining a high throughput (e.g., 1000 queries/s on NQ). Compared to retrieval-augmented models, EMAT runs substantially faster across the board and produces more accurate results on WoW and ELI5.
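The retrieval primitive EMAT builds on can be sketched in plain NumPy: embed queries and memory keys in the same space and fetch value slots by maximum inner product search. The random vectors and exact (non-approximate) search below are simplifications; EMAT's keys and values are learned transformer representations.

import numpy as np

rng = np.random.default_rng(0)
d, n = 128, 10_000
keys = rng.standard_normal((n, d)).astype(np.float32)    # memory keys
values = rng.standard_normal((n, d)).astype(np.float32)  # memory value slots

def query_memory(q, top_k=4):
    # Exact maximum inner product search; ANN indexes scale this up.
    scores = keys @ q
    idx = np.argpartition(-scores, top_k)[:top_k]
    idx = idx[np.argsort(-scores[idx])]  # order the survivors by score
    return values[idx], scores[idx]

q = rng.standard_normal(d).astype(np.float32)
slots, slot_scores = query_memory(q)
print(slots.shape)  # (4, 128): memory slots the transformer would integrate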
Towards machine-assisted meta-studies: the Hubble constant
We present an approach for automatic extraction of measured values from the astrophysical literature, using the Hubble constant for our pilot study. Our rules-based model, a classical technique in natural language processing, has successfully extracted 298 measurements of the Hubble constant, with uncertainties, from the 208,541 available arXiv astrophysics papers. We have also created an artificial neural network classifier to identify papers in arXiv which report novel measurements. From the analysis of our results we find that reporting measurements with uncertainties and the correct units is critical information when distinguishing novel measurements in free text. Our results correctly highlight the current tension in measurements of the Hubble constant and recover the 3.5σ discrepancy, demonstrating that the tool presented in this paper is useful for meta-studies of astrophysical measurements from a large number of publications.
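In the spirit of the rules-based pilot, a single extraction rule might look like the regular expression below, which captures a value, its uncertainty, and (optionally) its units; the pattern and example sentence are simplified assumptions, not the authors' actual rule set.

import re

H0_PATTERN = re.compile(
    r"H[_ ]?0\s*=\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?:\\pm|±|\+/-)\s*"
    r"(?P<error>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>km\s*/?\s*s(?:\^?-1)?\s*/?\s*Mpc(?:\^?-1)?)?"
)

text = "We measure H0 = 73.2 ± 1.3 km/s/Mpc from Cepheid-calibrated SNe Ia."
for m in H0_PATTERN.finditer(text):
    print(m.group("value"), m.group("error"), m.group("unit"))
# prints: 73.2 1.3 km/s/Mpc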