1,360 research outputs found
Improving Syntactic Parsing of Clinical Text Using Domain Knowledge
Syntactic parsing is one of the fundamental tasks of Natural Language Processing (NLP). However, few studies have explored syntactic parsing in the medical domain. This dissertation systematically investigated different methods to improve the performance of syntactic parsing of clinical text, including (1) Constructing two clinical treebanks of discharge summaries and progress notes by developing annotation guidelines that handle missing elements in clinical sentences; (2) Retraining four state-of-the-art parsers, including the Stanford parser, Berkeley parser, Charniak parser, and Bikel parser, using clinical treebanks, and comparing their performance to identify better parsing approaches; and (3) Developing new methods to reduce syntactic ambiguity caused by Prepositional Phrase (PP) attachment and coordination using semantic information.
Our evaluation showed that clinical treebanks greatly improved the performance of existing parsers. The Berkeley parser achieved the best F-1 score of 86.39% on the MiPACQ treebank. For PP attachment, our proposed methods improved the accuracies of PP attachment by 2.35% on the MiPACQ corpus and 1.77% on the I2b2 corpus. For coordination, our method achieved a precision of 94.9% and a precision of 90.3% for the MiPACQ and i2b2 corpus, respectively. To further demonstrate the effectiveness of the improved parsing approaches, we applied outputs of our parsers to two external NLP tasks: semantic role labeling and temporal relation extraction. The experimental results showed that performance of both tasks’ was improved by using the parse tree information from our optimized parsers, with an improvement of 3.26% in F-measure for semantic role labelling and an improvement of 1.5% in F-measure for temporal relation extraction
Citation Function and Polarity Classification in Biomedical Papers
The traditional reference evaluation method treats all citations equally. However, a citation can serve various functions. It may reflect the citing paper author’s motivation as well as his/her true attitude towards the cited paper. Investigating such information can be achieved through citation content analysis.
This thesis develops an 8-category classification scheme on citation function and polarity to help understand what role a citation played in scientific papers. A biomedical citation corpus is annotated with this scheme and experimented with supervised machine learning methods. Several types of features that capture the characteristics of citation sentences are extracted by natural language processing techniques to serve as the inputs of automatic classifiers. The importance of cue phrases in citation classification is also addressed and discussed
Graph Convolutional Networks for Text Classification
Text classification is an important and classical problem in natural language
processing. There have been a number of studies that applied convolutional
neural networks (convolution on regular grid, e.g., sequence) to
classification. However, only a limited number of studies have explored the
more flexible graph convolutional neural networks (convolution on non-grid,
e.g., arbitrary graph) for the task. In this work, we propose to use graph
convolutional networks for text classification. We build a single text graph
for a corpus based on word co-occurrence and document word relations, then
learn a Text Graph Convolutional Network (Text GCN) for the corpus. Our Text
GCN is initialized with one-hot representation for word and document, it then
jointly learns the embeddings for both words and documents, as supervised by
the known class labels for documents. Our experimental results on multiple
benchmark datasets demonstrate that a vanilla Text GCN without any external
word embeddings or knowledge outperforms state-of-the-art methods for text
classification. On the other hand, Text GCN also learns predictive word and
document embeddings. In addition, experimental results show that the
improvement of Text GCN over state-of-the-art comparison methods become more
prominent as we lower the percentage of training data, suggesting the
robustness of Text GCN to less training data in text classification.Comment: Accepted by 33rd AAAI Conference on Artificial Intelligence (AAAI
2019
- …