Complex Word Identification: Challenges in Data Annotation and System Performance

Malmasi, Shervin; Paetzold, Gustavo; Specia, Lucia; Zampieri, Marcos

Authors: Shervin Malmasi, Gustavo Paetzold, Lucia Specia, Marcos Zampieri
Publication date: 13 October 2017

Abstract

This paper revisits the problem of complex word identification (CWI), following up on the SemEval CWI shared task. We use ensemble classifiers to investigate how well computational methods can discriminate between complex and non-complex words. Furthermore, we analyze the classification performance to understand what makes lexical complexity challenging. Our findings show that most systems performed poorly on the SemEval CWI dataset, and that one reason is the way in which human annotation was performed.

Comment: Proceedings of the 4th Workshop on NLP Techniques for Educational Applications (NLPTEA 2017)

Available Versions

ZENODO

oai:zenodo.org:1040837

Last time updated on 05/01/2018