This paper revisits the problem of complex word identification (CWI),
following up on the SemEval CWI shared task. We use ensemble classifiers to
investigate how well computational methods can discriminate between complex and
non-complex words. Furthermore, we analyze the classification performance to
understand what makes lexical complexity challenging. Our findings show that
most systems performed poorly on the SemEval CWI dataset, and one of the
reasons for that is the way in which human annotation was performed.

Comment: Proceedings of the 4th Workshop on NLP Techniques for Educational
Applications (NLPTEA 2017)