Scientific Information Extraction with Semi-supervised Neural Tagging
This paper addresses the problem of extracting keyphrases from scientific
articles and categorizing them as corresponding to a task, process, or
material. We cast the problem as sequence tagging and introduce semi-supervised
methods to a neural tagging model, which builds on recent advances in named
entity recognition. Since annotated training data is scarce in this domain, we
introduce a graph-based semi-supervised algorithm together with a data
selection scheme to leverage unannotated articles. Both inductive and
transductive semi-supervised learning strategies outperform the state of the
art in information extraction on SemEval 2017 Task 10 (ScienceIE).
Comment: accepted by EMNLP 201
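
To make the sequence-tagging formulation above concrete, the sketch below converts keyphrase spans into per-token tags using the three categories named in the abstract (Task, Process, Material). The BIO tagging scheme, the function name `spans_to_bio`, and the example sentence are assumptions made for illustration; the abstract does not state which tag encoding the authors use.

```python
from typing import List, Tuple


def spans_to_bio(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end, label) token spans into BIO tags.

    `end` is exclusive. BIO is an assumed encoding, not necessarily
    the one used in the paper.
    """
    tags = ["O"] * len(tokens)  # every token starts outside any keyphrase
    for start, end, label in spans:
        tags[start] = f"B-{label}"  # first token of the keyphrase
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # remaining tokens of the keyphrase
    return tags


# Hypothetical sentence and annotations, for illustration only.
tokens = ["We", "apply", "gradient", "descent", "to", "train", "silicon", "models"]
spans = [(2, 4, "Process"), (6, 7, "Material")]
bio_tags = spans_to_bio(tokens, spans)

for token, tag in zip(tokens, bio_tags):
    print(f"{token}\t{tag}")
```

Once keyphrases are encoded this way, extraction and categorization reduce to predicting one tag per token, which is the kind of task a neural tagger from named entity recognition can be trained on.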
Cross Language Text Classification via Subspace Co-Regularized Multi-View Learning
In many multilingual text classification problems, the documents in different
languages often share the same set of categories. To reduce the labeling cost
of training a classification model for each individual language, it is
important to transfer the label knowledge gained from one language to another
language by conducting cross language classification. In this paper we develop
a novel subspace co-regularized multi-view learning method for cross language
text classification. This method is built on parallel corpora produced by
machine translation. It jointly minimizes the training error of each classifier
in each language while penalizing the distance between the subspace
representations of parallel documents. Our empirical study on a large set of
cross language text classification tasks shows the proposed method consistently
outperforms a number of inductive methods, domain adaptation methods, and
multi-view learning methods.
Comment: Appears in Proceedings of the 29th International Conference on
Machine Learning (ICML 2012)
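
The joint objective described above, a training error per language plus a penalty on the distance between the subspace representations of parallel documents, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' formulation: the linear projections `P_src` and `P_tgt`, the squared loss, the squared Euclidean distance, the weight `lam`, and all variable names are hypothetical choices that the abstract does not specify.

```python
import numpy as np


def co_regularized_objective(X_src, X_tgt, y, P_src, P_tgt, w_src, w_tgt, lam=1.0):
    """Per-language training error plus a subspace-distance penalty.

    Row i of X_src and row i of X_tgt are assumed to be a parallel
    document pair (e.g. a document and its machine translation),
    so both views share the label y[i].
    """
    Z_src = X_src @ P_src  # subspace representation, source-language view
    Z_tgt = X_tgt @ P_tgt  # subspace representation, target-language view

    # Training error of each classifier (squared loss is an assumption).
    loss_src = np.mean((Z_src @ w_src - y) ** 2)
    loss_tgt = np.mean((Z_tgt @ w_tgt - y) ** 2)

    # Penalize the distance between representations of parallel documents.
    penalty = np.mean(np.sum((Z_src - Z_tgt) ** 2, axis=1))

    return loss_src + loss_tgt + lam * penalty


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X_src = rng.normal(size=(5, 4))   # 5 documents, 4 source-language features
X_tgt = rng.normal(size=(5, 6))   # same 5 documents, 6 target-language features
y = np.array([1.0, -1.0, 1.0, 1.0, -1.0])
P_src = rng.normal(size=(4, 2))   # project each view into a shared 2-d subspace
P_tgt = rng.normal(size=(6, 2))
w_src = rng.normal(size=2)
w_tgt = rng.normal(size=2)

objective = co_regularized_objective(X_src, X_tgt, y, P_src, P_tgt, w_src, w_tgt, lam=1.0)
print(objective)
```

In this sketch, a larger `lam` pulls the two views of each parallel document closer together in the shared subspace, which is what would let label information from one language inform the classifier for the other.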