An improved neural network model for joint POS tagging and dependency parsing
We propose a novel neural network model for joint part-of-speech (POS)
tagging and dependency parsing. Our model extends the well-known BIST
graph-based dependency parser (Kiperwasser and Goldberg, 2016) by incorporating
a BiLSTM-based tagging component to produce automatically predicted POS tags
for the parser. On the benchmark English Penn treebank, our model obtains
strong UAS and LAS scores of 94.51% and 92.87%, respectively, an absolute
improvement of more than 1.5% over the BIST graph-based parser, and a
state-of-the-art POS tagging accuracy of 97.97%. Furthermore, experimental
results on parsing 61 "big" Universal Dependencies treebanks from raw texts
show that our model outperforms the baseline UDPipe (Straka and Straková,
2017) with 0.8% higher average POS tagging score and 3.6% higher average LAS
score. In addition, with our model, we also obtain state-of-the-art downstream
task scores for biomedical event extraction and opinion analysis applications.
Our code is available together with all pre-trained models at:
https://github.com/datquocnguyen/jPTDP
Comment: 11 pages; In Proceedings of the CoNLL 2018 Shared Task: Multilingual
Parsing from Raw Text to Universal Dependencies, to appear
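The UAS and LAS figures quoted in this abstract are the standard unlabeled and labeled attachment scores. As a minimal sketch of how they are usually computed (the function name, the toy sentence, and the tuple layout are illustrative assumptions, not taken from the jPTDP code), UAS counts tokens whose predicted head is correct, while LAS additionally requires the correct relation label:

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) for one or more sentences.

    gold, pred: lists of (head_index, relation_label) tuples, one per token.
    This layout is a hypothetical convenience, not a treebank file format.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of tokens")
    if not gold:
        raise ValueError("at least one token is required")
    n = len(gold)
    # UAS: the predicted head matches the gold head.
    correct_heads = sum(1 for (gh, _), (ph, _) in zip(gold, pred) if gh == ph)
    # LAS: both the head and the relation label match.
    correct_both = sum(1 for g, p in zip(gold, pred) if g == p)
    return correct_heads / n, correct_both / n


# Toy four-token sentence (made up for illustration).
gold = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "det"), (2, "iobj")]
uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.2%}, LAS = {las:.2%}")
```

Shared-task evaluations typically apply further conventions (for example, how punctuation or tokenization mismatches are handled), which this sketch does not model.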
Towards a Semantic Lexicon for Biological Language Processing
This paper explores the use of the resources in the National Library of Medicine's
Unified Medical Language System (UMLS) for the construction of a lexicon useful
for processing texts in the field of molecular biology. A lexicon is constructed from
overlapping terms in the UMLS SPECIALIST lexicon and the UMLS Metathesaurus
to obtain both morphosyntactic and semantic information for terms, and the coverage
of a domain corpus is assessed. Over 77% of tokens in the domain corpus are found in
the constructed lexicon, validating the lexicon's coverage of the most frequent terms
in the domain and indicating that the constructed lexicon is potentially an important
resource for biological text processing.
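The coverage figure in this abstract is a token-level proportion: the share of corpus tokens that appear in the lexicon. A minimal sketch of that calculation follows; the function name, the toy data, and the case-insensitive matching are assumptions made for illustration, and the abstract does not say how tokens were normalized before lookup.

```python
def token_coverage(tokens, lexicon):
    """Fraction of corpus tokens (not distinct types) found in the lexicon."""
    if not tokens:
        raise ValueError("the corpus must contain at least one token")
    # Assumption: matching is case-insensitive.
    entries = {entry.lower() for entry in lexicon}
    covered = sum(1 for token in tokens if token.lower() in entries)
    return covered / len(tokens)


# Toy corpus and lexicon (made up for illustration).
corpus_tokens = "the protein binds the receptor and activates kinase signalling".split()
toy_lexicon = {"the", "protein", "binds", "receptor", "and", "kinase"}
coverage = token_coverage(corpus_tokens, toy_lexicon)
print(f"{coverage:.1%} of tokens are covered")
```

Because the count is over tokens rather than types, frequent words dominate the result, which is consistent with the abstract's remark about coverage of the most frequent terms.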
A Framework to Adjust Dependency Measure Estimates for Chance
Estimating the strength of dependency between two variables is fundamental
for exploratory analysis and many other applications in data mining. For
example, non-linear dependencies between two continuous variables can be
explored with the Maximal Information Coefficient (MIC), and categorical
variables that are dependent on the target class are selected using Gini gain
in random forests. Nonetheless, because dependency measures are estimated on
finite samples, the interpretability of their quantification and the accuracy
when ranking dependencies become challenging. Dependency estimates are not
equal to 0 when variables are independent, cannot be compared if computed on
different sample sizes, and are inflated by chance on variables with more
categories. In this paper, we propose a framework to adjust dependency measure
estimates on finite samples. Our adjustments, which are simple and applicable
to any dependency measure, are helpful in improving interpretability when
quantifying dependency and in improving accuracy on the task of ranking
dependencies. In particular, we demonstrate that our approach enhances the
interpretability of MIC when used as a proxy for the amount of noise between
variables, and improves accuracy when ranking variables during the splitting
procedure in random forests.
Comment: In Proceedings of the 2016 SIAM International Conference on Data
Mining
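To illustrate the problem this abstract describes, the sketch below estimates a dependency measure on a finite sample, approximates its expected value under independence by permuting one variable, and subtracts that baseline. This is one generic form of chance correction and is not necessarily the adjustment the paper proposes. The function names, the choice of plug-in mutual information as the measure, and the number of permutations are all assumptions.

```python
import math
import random
from collections import Counter


def mutual_information(x, y):
    """Plug-in (empirical) mutual information, in nats, of two categorical sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    count_x, count_y, count_xy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log(c * n / (count_x[a] * count_y[b]))
        for (a, b), c in count_xy.items()
    )


def adjust_for_chance(measure, x, y, n_permutations=200, seed=0):
    """Return (raw, expected_under_independence, raw - expected).

    The expectation is approximated by shuffling y, which breaks any
    dependency on x while keeping both marginal distributions fixed.
    """
    rng = random.Random(seed)
    raw = measure(x, y)
    shuffled = list(y)
    null_values = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        null_values.append(measure(x, shuffled))
    expected = sum(null_values) / n_permutations
    return raw, expected, raw - expected


# Independent toy variables: the raw estimate is still above zero.
rng = random.Random(1)
x = [rng.randrange(2) for _ in range(100)]
y_indep = [rng.randrange(10) for _ in range(100)]
raw, expected, adjusted = adjust_for_chance(mutual_information, x, y_indep)
print(f"raw={raw:.4f} expected={expected:.4f} adjusted={adjusted:.4f}")
```

On independent data the raw estimate is positive purely because the sample is finite, and the permutation baseline grows with the number of categories, which is the inflation the abstract refers to. Subtracting the baseline moves the estimate for independent variables back towards zero.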
From POS tagging to dependency parsing for biomedical event extraction
Background: Given the importance of relation or event extraction from
biomedical research publications to support knowledge capture and synthesis,
and the strong dependency of approaches to this information extraction task on
syntactic information, it is valuable to understand which approaches to
syntactic processing of biomedical text have the highest performance. Results:
We perform an empirical study comparing state-of-the-art traditional
feature-based and neural network-based models for two core natural language
processing tasks of part-of-speech (POS) tagging and dependency parsing on two
benchmark biomedical corpora, GENIA and CRAFT. To the best of our knowledge,
there is no recent work making such comparisons in the biomedical context;
specifically, no detailed analysis of neural models on this data is available.
Experimental results show that, in general, the neural models outperform the
feature-based models on both corpora. We
also perform a task-oriented evaluation to investigate the influences of these
models in a downstream application on biomedical event extraction, and show
that better intrinsic parsing performance does not always imply better
extrinsic event extraction performance. Conclusion: We have presented a
detailed empirical study comparing traditional feature-based and neural
network-based models for POS tagging and dependency parsing in the biomedical
context, and also investigated the influence of parser selection for a
biomedical event extraction downstream task. Availability of data and material:
We make the retrained models available at
https://github.com/datquocnguyen/BioPosDep
Comment: Accepted for publication in BMC Bioinformatics