Arc-swift: A Novel Transition System for Dependency Parsing
Transition-based dependency parsers often need sequences of local shift and
reduce operations to produce certain attachments. Correct individual decisions
hence require global information about the sentence context, and mistakes cause
error propagation. This paper proposes a novel transition system, arc-swift,
that enables direct attachments between tokens farther apart with a single
transition. This allows the parser to leverage lexical information more
directly in transition decisions. Hence, arc-swift can achieve significantly
better performance with a very small beam size. Our parsers reduce error by
3.7--7.6% relative to those using existing transition systems on the Penn
Treebank dependency parsing task and English Universal Dependencies.
Comment: Accepted at ACL 2017
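The abstract's key idea can be made concrete with a toy simulation. The sketch below is a minimal, assumed reading of the arc-swift system: LeftArc[k] and RightArc[k] act on the k-th topmost stack token directly, so a distant attachment takes one transition instead of a sequence of shifts and reduces. The class name, state layout, and the omission of the paper's preconditions are illustrative assumptions, not the paper's exact formulation.

```python
# Toy arc-swift-style transition state: parameterized arc transitions reach
# the k-th topmost stack token in a single step.

class ArcSwiftState:
    def __init__(self, tokens):
        self.stack = []               # indices of tokens awaiting attachment
        self.buffer = list(tokens)    # indices of unread tokens
        self.arcs = []                # (head, dependent) pairs

    def shift(self):
        self.stack.append(self.buffer.pop(0))

    def left_arc(self, k):
        # buffer front heads the k-th topmost stack token;
        # the top k stack tokens are popped in one transition
        dep = self.stack[-k]
        self.arcs.append((self.buffer[0], dep))
        del self.stack[-k:]

    def right_arc(self, k):
        # k-th topmost stack token heads the buffer front, which then shifts
        head = self.stack[-k]
        dep = self.buffer.pop(0)
        self.arcs.append((head, dep))
        self.stack.append(dep)

state = ArcSwiftState([0, 1, 2, 3])
state.shift()          # stack=[0], buffer=[1, 2, 3]
state.left_arc(1)      # arc (1, 0) in a single transition
state.shift()          # stack=[1], buffer=[2, 3]
state.right_arc(1)     # arc (1, 2); token 2 shifted onto the stack
```

Because the arc transition carries the distance k, the decision can condition directly on the lexical pair being attached, which is what the abstract credits for the small required beam size.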
Learning Dynamic Feature Selection for Fast Sequential Prediction
We present paired learning and inference algorithms for significantly
reducing computation and increasing speed of the vector dot products in the
classifiers that are at the heart of many NLP components. This is accomplished
by partitioning the features into a sequence of templates which are ordered
such that high confidence can often be reached using only a small fraction of
all features. Parameter estimation is arranged to maximize accuracy and early
confidence in this sequence. Our approach is simpler and better suited to NLP
than other related cascade methods. We present experiments in left-to-right
part-of-speech tagging, named entity recognition, and transition-based
dependency parsing. On the typical benchmarking datasets we can preserve POS
tagging accuracy above 97% and parsing LAS above 88.5% both with over a
five-fold reduction in run-time, and NER F1 above 88 with more than 2x increase
in speed.
Comment: Appears in The 53rd Annual Meeting of the Association for
Computational Linguistics, Beijing, China, July 2015
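A toy sketch of the idea as described in the abstract: features are grouped into an ordered sequence of templates, partial dot products accumulate template by template, and scoring stops early once the top-two class margin clears a confidence threshold. The function name, weights, thresholds, and template groupings below are invented for illustration and are not the paper's learned values.

```python
# Template-ordered scoring with early stopping: most linear-classifier cost
# is the dot product, so skipping low-value templates saves most of the work.
import numpy as np

def classify_with_early_stopping(template_feats, weights, thresholds):
    """template_feats: per-template arrays of active feature indices,
    ordered most-informative first; weights: [n_features, n_classes];
    thresholds: top-two margin required to stop after each template."""
    scores = np.zeros(weights.shape[1])
    used = 0
    for used, (feats, tau) in enumerate(zip(template_feats, thresholds), 1):
        scores += weights[feats].sum(axis=0)      # partial dot product
        top2 = np.sort(scores)[-2:]
        if top2[1] - top2[0] >= tau:              # confident: skip the rest
            break
    return int(np.argmax(scores)), used

W = np.zeros((20, 3))
W[0, 1] = 5.0                       # toy weight: feature 0 favors class 1
feats = [np.array([0, 1]), np.array([5]), np.array([9, 12])]
label, used = classify_with_early_stopping(feats, W, thresholds=[1.0, 1.0, 0.0])
# the first template already clears the margin: label == 1, used == 1
```

Training the weights to be accurate *and* confident early in the template order is the "paired learning" half of the method; this sketch shows only the inference half.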
Parallel Natural Language Parsing: From Analysis to Speedup
Electrical Engineering, Mathematics and Computer Science
Reproducibility and Generalization of a Relation Extraction System for Gene-Disease Associations
Biomedical literature is a rich source of information on Gene-Disease Associations
(GDAs) that could help physicians in assessing clinical decisions and improve patient
care. GDAs are publicly available in databases containing relationships between
gene/miRNA expression and related diseases such as specific types of cancer.
Most of these resources, such as DisGeNET, miR2Disease and BioXpress, also
include manually curated data from publications. Human annotations are expensive
and cannot scale to the huge amount of data available in scientific literature (e.g.,
biomedical abstracts). Therefore, developing automated tools to identify GDAs is
getting traction in the community. Such systems employ Relation Extraction (RE)
techniques to extract information on gene/microRNA expression in diseases from
text. Once an automated text-mining tool has been developed, it can be tested on
human annotated data or it can be compared to state-of-the-art systems.
In this work we reproduce DEXTER, a system to automatically extract Gene-
Disease Associations (GDAs) from biomedical abstracts. The goal is to provide a
benchmark for future work on Relation Extraction (RE), enabling researchers
to test and compare their results.
The implemented version of DEXTER is available in the following git repository:
https://github.com/mntlra/DEXTER
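To make the kind of RE technique the abstract describes concrete, here is a hedged, toy pattern-based extractor that pulls (gene, expression direction, disease) triples from sentences. The patterns, entity lists, and output format are invented for this sketch and do not reflect DEXTER's actual rules.

```python
# Toy rule-based GDA extraction: match a known gene, an expression keyword,
# and a known disease in one sentence-level pattern.
import re

GENES = {"TP53", "BRCA1", "miR-21"}
DISEASES = {"breast cancer", "lung cancer"}

disease_pat = "|".join(map(re.escape, sorted(DISEASES)))
PATTERN = re.compile(
    rf"(?P<gene>\S+) (?:is|was) (?P<dir>overexpressed|underexpressed) in "
    rf"(?P<disease>{disease_pat})",
    re.IGNORECASE)

def extract_gdas(sentence):
    triples = []
    for m in PATTERN.finditer(sentence):
        gene = m.group("gene")
        disease = m.group("disease").lower()
        if gene in GENES and disease in DISEASES:   # entity gazetteer check
            triples.append((gene, m.group("dir").lower(), disease))
    return triples

triples = extract_gdas("miR-21 is overexpressed in breast cancer tissues.")
# -> [("miR-21", "overexpressed", "breast cancer")]
```

Real systems layer many such patterns over proper entity recognition and syntactic analysis; the point here is only the shape of the task: text in, typed relation triples out, evaluated against human-annotated data.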
Machine Learning Models for Efficient and Robust Natural Language Processing
Natural language processing (NLP) has come of age. For example, semantic role labeling (SRL), which automatically annotates sentences with a labeled graph representing who did what to whom, has in the past ten years seen nearly 40% reduction in error, bringing it to useful accuracy. As a result, a myriad of practitioners now want to deploy NLP systems on billions of documents across many domains. However, state-of-the-art NLP systems are typically not optimized for cross-domain robustness nor computational efficiency. In this dissertation I develop machine learning methods to facilitate fast and robust inference across many common NLP tasks.
First, I describe paired learning and inference algorithms for dynamic feature selection, which accelerate inference in linear classifiers, the heart of the fastest NLP models, by 5-10 times. I then present iterated dilated convolutional neural networks (ID-CNNs), a distinct combination of network structure, parameter sharing and training procedures that increase inference speed by 14-20 times with accuracy matching bidirectional LSTMs, the most accurate models for NLP sequence labeling. Finally, I describe linguistically-informed self-attention (LISA), a neural network model that combines multi-head self-attention with multi-task learning to facilitate improved generalization to new domains. We show that incorporating linguistic structure in this way leads to substantial improvements over the previous state-of-the-art (syntax-free) neural network models for SRL, especially when evaluating out-of-domain. I conclude with a brief discussion of potential future directions stemming from my thesis work.
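The ID-CNN idea mentioned above can be sketched in a few lines: a short stack of dilated 1-D convolutions with exponentially growing dilation, reapplied with shared weights so the receptive field grows without adding parameters. The per-dimension "weights" (no channel mixing), sizes, and impulse input below are simplifying assumptions for illustration, not the thesis model.

```python
# Toy iterated dilated convolution: dilations (1, 2, 4) widen context
# exponentially; iterating the block with shared weights widens it further
# at zero parameter cost.
import numpy as np

def dilated_conv1d(x, w, dilation):
    """x: [seq_len, dim]; w: [kernel, dim], applied per dimension;
    zero padding at the edges preserves sequence length."""
    seq_len, _ = x.shape
    k = w.shape[0]
    center = k // 2
    out = np.zeros_like(x)
    for i in range(seq_len):
        for j in range(k):
            src = i + dilation * (j - center)
            if 0 <= src < seq_len:
                out[i] += w[j] * x[src]
    return out

def id_cnn(x, w, dilations=(1, 2, 4), iterations=2):
    # the same dilation block is iterated with the same weights w
    for _ in range(iterations):
        for d in dilations:
            x = np.maximum(dilated_conv1d(x, w, d), 0.0)   # ReLU
    return x

x = np.zeros((16, 1))
x[8] = 1.0                    # a single impulse in the middle of the sequence
w = np.ones((3, 1))
y = id_cnn(x, w)              # the impulse now influences the whole sequence
```

Feeding a single impulse through shows the point: one block of kernel-3 convolutions at dilations 1, 2 and 4 spreads context by 7 positions per side, and a second shared-weight iteration doubles that, which is how ID-CNNs match the broad context of a bidirectional LSTM while staying parallelizable.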