Theme-weighted Ranking of Keywords from Text Documents using Phrase Embeddings
Keyword extraction is a fundamental task in natural language processing that
facilitates mapping of documents to a concise set of representative single and
multi-word phrases. Keywords from text documents are primarily extracted using
supervised and unsupervised approaches. In this paper, we present an
unsupervised technique that combines a theme-weighted personalized
PageRank algorithm with neural phrase embeddings to extract and rank
keywords. We also introduce an efficient way of processing text documents and
training phrase embeddings using existing techniques. We share an evaluation
dataset, derived from an existing dataset, that is used to choose the
underlying embedding model. Ranked keyword extraction is evaluated
on two benchmark datasets comprising short abstracts (Inspec) and
long scientific papers (SemEval 2010), and the technique is shown to produce
better results than state-of-the-art systems.
Comment: preprint for paper accepted in Proceedings of 1st IEEE International
Conference on Multimedia Information Processing and Retrieva
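The abstract above names the technique but does not spell out how it is computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that the "theme" is the mean of the candidate phrase embeddings, that each phrase's restart probability is proportional to its cosine similarity to that theme, and that candidates are linked by a co-occurrence graph. The function name `theme_weighted_pagerank`, the toy graph, and the toy vectors are all hypothetical.

```python
import numpy as np


def theme_weighted_pagerank(adjacency, phrase_vecs, damping=0.85, tol=1e-10, max_iter=500):
    """Score graph nodes with PageRank whose restart distribution favors
    phrases similar to a document theme vector (assumed: mean embedding)."""
    adjacency = np.asarray(adjacency, dtype=float)
    phrase_vecs = np.asarray(phrase_vecs, dtype=float)
    n = adjacency.shape[0]

    # Assumption: the theme vector is the average of all candidate phrase vectors.
    theme = phrase_vecs.mean(axis=0)

    # Cosine similarity of each phrase to the theme, clipped to be non-negative.
    norms = np.linalg.norm(phrase_vecs, axis=1) * np.linalg.norm(theme)
    safe_norms = np.where(norms == 0, 1.0, norms)
    sims = np.clip(phrase_vecs @ theme / safe_norms, 0.0, None)
    if sims.sum() > 0:
        personalization = sims / sims.sum()
    else:
        personalization = np.full(n, 1.0 / n)

    # Column-stochastic transition matrix; a node with no edges
    # redistributes its mass according to the personalization vector.
    col_sums = adjacency.sum(axis=0)
    safe_sums = np.where(col_sums == 0, 1.0, col_sums)
    transition = np.where(col_sums > 0, adjacency / safe_sums, personalization[:, None])

    # Power iteration until the L1 change falls below the tolerance.
    scores = personalization.copy()
    for _ in range(max_iter):
        updated = damping * transition @ scores + (1.0 - damping) * personalization
        converged = np.abs(updated - scores).sum() < tol
        scores = updated
        if converged:
            break
    return scores


# Toy example: three candidate phrases that all co-occur with each other.
toy_graph = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
toy_vecs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # third phrase is off-theme
toy_scores = theme_weighted_pagerank(toy_graph, toy_vecs)
ranking = np.argsort(-toy_scores)  # indices of phrases, best first
```

In this toy case the two phrases aligned with the theme end up ranked above the off-theme one, even though all three have identical graph connectivity; the difference comes entirely from the restart distribution.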
#MeTooMA: Multi-Aspect Annotations of Tweets Related to the MeToo Movement
In this paper, we present a dataset containing 9,973 tweets related to the
MeToo movement that were manually annotated for five different linguistic
aspects: relevance, stance, hate speech, sarcasm, and dialogue acts. We present
a detailed account of the data collection and annotation processes. The
annotations have high inter-annotator agreement (Krippendorff's alpha of 0.79 to 0.93)
due to the domain expertise of the annotators and clear annotation
instructions. We analyze the data in terms of geographical distribution, label
correlations, and keywords. Lastly, we present some potential use cases of this
dataset. We expect this dataset to be of great interest to psycholinguists,
socio-linguists, and computational linguists to study the discursive space of
digitally mobilized social movements on sensitive issues like sexual
harassment.
Comment: Preprint of paper accepted at ICWSM 202
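The agreement range quoted in the abstract above is presumably Krippendorff's alpha. As a rough illustration of how such a figure is obtained for categorical labels, here is a minimal sketch of the nominal-data form of the coefficient. It is not the authors' evaluation code; the function name `krippendorff_alpha_nominal` and the toy labels are hypothetical, and `None` is assumed to mark a missing annotation.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is a sequence of items; each item is the sequence of labels
    that the annotators assigned to it (None = annotator skipped the item).
    """
    # Coincidence matrix: every ordered pair of labels from different
    # annotators on the same item contributes 1 / (m - 1), where m is the
    # number of labels the item received.
    coincidences = Counter()
    for labels in units:
        present = [label for label in labels if label is not None]
        m = len(present)
        if m < 2:
            continue  # items with fewer than two labels are not pairable
        for a, b in permutations(present, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per category and overall number of pairable values.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if n <= 1 or expected == 0:
        return float("nan")  # undefined when only one category occurs
    # alpha = 1 - D_o / D_e, with D_o = observed / n
    # and D_e = expected / (n * (n - 1)).
    return 1.0 - observed * (n - 1) / expected


# Toy example: two annotators, four tweets, one disagreement.
toy_units = [("rel", "rel"), ("rel", "irr"), ("irr", "irr"), ("irr", "irr")]
alpha = krippendorff_alpha_nominal(toy_units)
```

A value of 1 indicates perfect agreement and a value near 0 indicates agreement no better than chance, which is why figures in the 0.79 to 0.93 range are read as strong agreement.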