Performance of the Charniak-Lease parser on biological text using different training corpora
POS tagging is used as the first step in many NLP workflows, although the accuracy of tag assignment frequently goes unchecked. We hypothesize that changing the training corpora for a parser will affect its POS tagging of a target corpus. To this end, we train the Charniak-Lease parser on the WSJ corpus and two biomedical corpora and evaluate its output against MedPost, a POS tagger with a reported 97% accuracy on biomedical text. Our findings indicate that using biomedical training corpora significantly improves performance, but that minor differences in the biomedical training corpora have a significant effect on the correctness of POS tagging. Specifically, the tagging of hyphenated words and verbs was affected. This work suggests that the choice of training corpora is crucial to domain-targeted NLP analysis.
A Novel Neural Network Model for Joint POS Tagging and Graph-based Dependency Parsing
We present a novel neural network model that learns POS tagging and
graph-based dependency parsing jointly. Our model uses bidirectional LSTMs to
learn feature representations shared for both POS tagging and dependency
parsing tasks, thus handling the feature-engineering problem. Our extensive
experiments, on 19 languages from the Universal Dependencies project, show that
our model outperforms the state-of-the-art neural network-based
Stack-propagation model for joint POS tagging and transition-based dependency
parsing, resulting in a new state of the art. Our code is open-source and
available together with pre-trained models at:
https://github.com/datquocnguyen/jPTDP
Comment: v2: also include universal POS tagging, UAS and LAS accuracies w.r.t
gold-standard segmentation on Universal Dependencies 2.0 - CoNLL 2017 shared
task test data; in CoNLL 201
Semantic Tagging with Deep Residual Networks
We propose a novel semantic tagging task, sem-tagging, tailored for the
purpose of multilingual semantic parsing, and present the first tagger using
deep residual networks (ResNets). Our tagger uses both word and character
representations and includes a novel residual bypass architecture. We evaluate
the tagset both intrinsically on the new task of semantic tagging, as well as
on Part-of-Speech (POS) tagging. Our system, consisting of a ResNet and an
auxiliary loss function predicting our semantic tags, significantly outperforms
prior results on English Universal Dependencies POS tagging (95.71% accuracy on
UD v1.2 and 95.67% accuracy on UD v1.3).
Comment: COLING 2016, camera-ready version
An improved neural network model for joint POS tagging and dependency parsing
We propose a novel neural network model for joint part-of-speech (POS)
tagging and dependency parsing. Our model extends the well-known BIST
graph-based dependency parser (Kiperwasser and Goldberg, 2016) by incorporating
a BiLSTM-based tagging component to produce automatically predicted POS tags
for the parser. On the benchmark English Penn treebank, our model obtains
strong UAS and LAS scores at 94.51% and 92.87%, respectively, producing 1.5+%
absolute improvements to the BIST graph-based parser, and also obtaining a
state-of-the-art POS tagging accuracy at 97.97%. Furthermore, experimental
results on parsing 61 "big" Universal Dependencies treebanks from raw texts
show that our model outperforms the baseline UDPipe (Straka and Straková,
2017) with 0.8% higher average POS tagging score and 3.6% higher average LAS
score. In addition, with our model, we also obtain state-of-the-art downstream
task scores for biomedical event extraction and opinion analysis applications.
Our code is available together with all pre-trained models at:
https://github.com/datquocnguyen/jPTDP
Comment: 11 pages; In Proceedings of the CoNLL 2018 Shared Task: Multilingual
Parsing from Raw Text to Universal Dependencies, to appear
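
Several of the abstracts above report UAS and LAS scores. As a rough illustration of what those numbers measure, the sketch below computes both for a single sentence: UAS counts tokens whose predicted head is correct, and LAS counts tokens whose head and dependency label are both correct. The function name `attachment_scores` and the `(head, label)` tuple layout are assumptions made for this example; they are not taken from jPTDP or from the CoNLL evaluation scripts, which also handle details such as tokenization mismatches that are ignored here.

```python
from typing import List, Tuple

# One token's dependency arc: (index of its head, dependency label).
# This layout is a hypothetical choice for the example.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as fractions in [0, 1] for aligned token lists."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted arcs must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty sentence")

    # UAS: the predicted head matches the gold head.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS: both the head and the label match.
    las_hits = sum(g == p for g, p in zip(gold, pred))

    n = len(gold)
    return uas_hits / n, las_hits / n


# Toy four-token sentence: one wrong label, one wrong head.
gold_arcs = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "nmod")]
pred_arcs = [(2, "nsubj"), (0, "root"), (2, "iobj"), (2, "nmod")]
uas, las = attachment_scores(gold_arcs, pred_arcs)
print(f"UAS = {uas:.2%}, LAS = {las:.2%}")
```

Reported figures such as those quoted above are normally aggregated over all tokens in a test set rather than averaged per sentence, so a corpus-level version would sum the hit counts across sentences before dividing.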