351 research outputs found
Editorial for the First Workshop on Mining Scientific Papers: Computational Linguistics and Bibliometrics
The workshop "Mining Scientific Papers: Computational Linguistics and
Bibliometrics" (CLBib 2015), co-located with the 15th International Society of
Scientometrics and Informetrics Conference (ISSI 2015), brought together
researchers in Bibliometrics and Computational Linguistics in order to study
the ways Bibliometrics can benefit from large-scale text analytics and sense
mining of scientific papers, thus exploring the interdisciplinarity of
Bibliometrics and Natural Language Processing (NLP). The goals of the workshop
were to answer questions like: How can we enhance author network analysis and
Bibliometrics using data obtained by text analytics? What insights can NLP
provide on the structure of scientific writing, on citation networks, and on
in-text citation analysis? This workshop is the first step to foster the
reflection on the interdisciplinarity and the benefits that the two disciplines
Bibliometrics and Natural Language Processing can drive from it.Comment: 4 pages, Workshop on Mining Scientific Papers: Computational
Linguistics and Bibliometrics at ISSI 201
Semi-Supervised Learning for Neural Keyphrase Generation
We study the problem of generating keyphrases that summarize the key points
for a given document. While sequence-to-sequence (seq2seq) models have achieved
remarkable performance on this task (Meng et al., 2017), model training often
relies on large amounts of labeled data, which is only applicable to
resource-rich domains. In this paper, we propose semi-supervised keyphrase
generation methods by leveraging both labeled data and large-scale unlabeled
samples for learning. Two strategies are proposed. First, unlabeled documents
are first tagged with synthetic keyphrases obtained from unsupervised keyphrase
extraction methods or a selflearning algorithm, and then combined with labeled
samples for training. Furthermore, we investigate a multi-task learning
framework to jointly learn to generate keyphrases as well as the titles of the
articles. Experimental results show that our semi-supervised learning-based
methods outperform a state-of-the-art model trained with labeled data only.Comment: To appear in EMNLP 2018 (12 pages, 7 figures, 6 tables
Evaluating anaphora and coreference resolution to improve automatic keyphrase extraction
In this paper we analyze the effectiveness of using linguistic knowledge from coreference and
anaphora resolution for improving the performance for supervised keyphrase extraction. In order
to verify the impact of these features, we de\ufb01ne a baseline keyphrase extraction system and
evaluate its performance on a standard dataset using different machine learning algorithms. Then,
we consider new sets of features by adding combinations of the linguistic features we propose
and we evaluate the new performance of the system. We also use anaphora and coreference
resolution to transform the documents, trying to simulate the cohesion process performed by the
human mind. We found that our approach has a slightly positive impact on the performance of
automatic keyphrase extraction, in particular when considering the ranking of the results
Keywords at Work: Investigating Keyword Extraction in Social Media Applications
This dissertation examines a long-standing problem in Natural Language Processing (NLP) -- keyword extraction -- from a new angle. We investigate how keyword extraction can be formulated on social media data, such as emails, product reviews, student discussions, and student statements of purpose. We design novel graph-based features for supervised and unsupervised keyword extraction from emails, and use the resulting system with success to uncover patterns in a new dataset -- student statements of purpose. Furthermore, the system is used with new features on the problem of usage expression extraction from product reviews, where we obtain interesting insights. The system while used on student discussions, uncover new and exciting patterns.
While each of the above problems is conceptually distinct, they share two key common elements -- keywords and social data. Social data can be messy, hard-to-interpret, and not easily amenable to existing NLP resources. We show that our system is robust enough in the face of such challenges to discover useful and important patterns. We also show that the problem definition of keyword extraction itself can be expanded to accommodate new and challenging research questions and datasets.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/145929/1/lahiri_1.pd
Keyphrase Generation: A Multi-Aspect Survey
Extractive keyphrase generation research has been around since the nineties,
but the more advanced abstractive approach based on the encoder-decoder
framework and sequence-to-sequence learning has been explored only recently. In
fact, more than a dozen of abstractive methods have been proposed in the last
three years, producing meaningful keyphrases and achieving state-of-the-art
scores. In this survey, we examine various aspects of the extractive keyphrase
generation methods and focus mostly on the more recent abstractive methods that
are based on neural networks. We pay particular attention to the mechanisms
that have driven the perfection of the later. A huge collection of scientific
article metadata and the corresponding keyphrases is created and released for
the research community. We also present various keyphrase generation and text
summarization research patterns and trends of the last two decades.Comment: 10 pages, 5 tables. Published in proceedings of FRUCT 2019, the 25th
Conference of the Open Innovations Association FRUCT, Helsinki, Finlan
- …