
2 research outputs found

Global Entity Ranking Across Multiple Languages

Authors: Bhattacharyya Prantik, Spasojevic Nemanja
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 17/03/2017

We present work on building a global long-tailed ranking of entities across multiple languages using Wikipedia and Freebase knowledge bases. We identify multiple features and build a model to rank entities using a ground-truth dataset of more than 10 thousand labels. The final system ranks 27 million entities with 75% precision and 48% F1 score. We provide performance evaluation and empirical evidence of the quality of ranking across languages, and open the final ranked lists for future research.

Comment: 2 Pages, 1 Figure, 2 Tables, WWW2017 Companion

arXiv.org e-Print Archive

Lithium NLP: A System for Rich Information Extraction from Noisy User Generated Text on Social Media

Authors: Bhargava Preeti, Hu Guoning, Spasojevic Nemanja
Publication date: 13/07/2017

In this paper, we describe the Lithium Natural Language Processing (NLP) system, a resource-constrained, high-throughput and language-agnostic system for information extraction from noisy user generated text on social media. Lithium NLP extracts a rich set of information including entities, topics, hashtags and sentiment from text. We discuss several real-world applications of the system currently incorporated in Lithium products. We also compare our system with existing commercial and academic NLP systems in terms of performance, information extracted and languages supported. We show that Lithium NLP is on par with, and in some cases outperforms, state-of-the-art commercial NLP systems.

Comment: 9 pages, 6 figures, 2 tables, EMNLP 2017 Workshop on Noisy User Generated Text (WNUT)

arXiv.org e-Print Archive