Supervised Topical Key Phrase Extraction of News Stories using Crowdsourcing, Light Filtering and Co-reference Normalization
Fast and effective automated indexing is critical for search and personalized
services. Key phrases that consist of one or more words and represent the main
concepts of the document are often used for the purpose of indexing. In this
paper, we investigate the use of additional semantic features and
pre-processing steps to improve automatic key phrase extraction. These features
include the use of signal words and Freebase categories. Some of these features
lead to significant improvements in the accuracy of the results. We also
experimented with two forms of document pre-processing that we call light
filtering and co-reference normalization. Light filtering removes sentences
that are judged peripheral to the document's main content.
Co-reference normalization unifies several written forms of the same named
entity into a unique form. We also needed a "Gold Standard" - a set of labeled
documents for training and evaluation. While the subjective nature of key
phrase selection precludes a true "Gold Standard", we used Amazon's Mechanical
Turk service to obtain a useful approximation. Our data indicates that the
biggest improvements in performance were due to shallow semantic features, news
categories, and rhetorical signals (nDCG 78.47% vs. 68.93%). The inclusion of
deeper semantic features such as Freebase sub-categories was not beneficial by
itself, but in combination with pre-processing, did cause slight improvements
in the nDCG scores.

Comment: In 8th International Conference on Language Resources and Evaluation (LREC 2012)
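The reported gains (nDCG 78.47% vs. 68.93%) use standard normalized discounted cumulative gain. A minimal sketch of the metric, with helper names of our own choosing:

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: relevance at rank i is discounted by log2(i + 2),
    # so phrases ranked lower contribute less.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(relevances):
    # Normalize by the DCG of the ideal (descending-relevance) ranking,
    # yielding a score in [0, 1].
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Toy data: graded relevance of a ranked list of extracted key phrases.
score = ndcg([3, 2, 3, 0, 1, 2])
```

A perfectly ordered ranking scores 1.0; swapping relevant and irrelevant phrases near the top of the list lowers the score most.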
Dependency Grammar Induction with Neural Lexicalization and Big Training Data
We study the impact of big models (in terms of the degree of lexicalization)
and big data (in terms of the training corpus size) on dependency grammar
induction. We experimented with L-DMV, a lexicalized version of Dependency
Model with Valence and L-NDMV, our lexicalized extension of the Neural
Dependency Model with Valence. We find that L-DMV only benefits from very small
degrees of lexicalization and moderate sizes of training corpora. L-NDMV can
benefit from big training data and greater degrees of lexicalization,
especially when enhanced with good model initialization, and it achieves a
result that is competitive with the current state of the art.

Comment: EMNLP 2017
Large-Scale Goodness Polarity Lexicons for Community Question Answering
We transfer a key idea from the field of sentiment analysis to a new domain:
community question answering (cQA). The cQA task we are interested in is the
following: given a question and a thread of comments, we want to re-rank the
comments so that the ones that are good answers to the question would be ranked
higher than the bad ones. We notice that good vs. bad comments use specific
vocabulary and that one can often predict the goodness/badness of a comment
even ignoring the question, based on the comment contents only. This leads us
to the idea of building a good/bad polarity lexicon by analogy to the
positive/negative sentiment polarity lexicons, commonly used in sentiment
analysis. In particular, we use pointwise mutual information in order to build
large-scale goodness polarity lexicons in a semi-supervised manner starting
with a small number of initial seeds. The evaluation results show an
improvement of 0.7 MAP points absolute over a very strong baseline and
state-of-the-art performance on SemEval-2016 Task 3.

Comment: SIGIR '17, August 07-11, 2017, Shinjuku, Tokyo, Japan; Community Question Answering; Goodness polarity lexicons; Sentiment Analysis
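The lexicon-building step can be illustrated with pointwise mutual information. This is a minimal, fully supervised sketch (the paper proceeds semi-supervised from a small set of seed words); the function and variable names are our own, not the authors':

```python
import math
from collections import Counter

def goodness_lexicon(comments, min_count=2):
    """Score each word by PMI with the "good" label minus PMI with "bad".

    `comments` is a list of (tokens, label) pairs, label in {"good", "bad"}.
    Positive scores suggest answer-like vocabulary; negative, the opposite.
    """
    word_label = Counter()  # (word, label) co-occurrence counts
    word = Counter()        # word marginal counts
    label = Counter()       # label marginal counts (per token)
    total = 0
    for tokens, lab in comments:
        for tok in tokens:
            word_label[(tok, lab)] += 1
            word[tok] += 1
            label[lab] += 1
            total += 1
    scores = {}
    for w, n in word.items():
        if n < min_count:
            continue  # skip rare words with unreliable statistics
        pmi = {}
        for lab in ("good", "bad"):
            joint = word_label[(w, lab)]
            if joint == 0 or label[lab] == 0:
                pmi[lab] = 0.0  # crude smoothing: absent pair => no association
            else:
                # PMI(w, lab) = log2( P(w, lab) / (P(w) * P(lab)) )
                pmi[lab] = math.log2(joint * total / (n * label[lab]))
        scores[w] = pmi["good"] - pmi["bad"]
    return scores
```

Words with strongly positive scores form the goodness lexicon, which can then be fed as features into the comment re-ranker.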