
    Lexical Complexity Prediction with Assembly Models

    Tuning the complexity of one's writing is essential to presenting ideas in a logical, intuitive manner to audiences. This paper describes a system submitted by team BigGreen to LCP 2021 for predicting the lexical complexity of English words in a given context. We assemble a feature engineering-based model and a deep neural network model with an underlying Transformer architecture based on BERT. While BERT itself performs competitively, our feature engineering-based model helps in extreme cases, e.g., separating instances of easy and neutral difficulty. Our handcrafted features comprise a breadth of lexical, semantic, syntactic, and novel phonetic measures. Visualizations of BERT attention maps offer insight into potential features that Transformer models may implicitly learn when fine-tuned for lexical complexity prediction. Our assembly technique performs reasonably well at predicting the complexities of single words, and we demonstrate how such techniques can be harnessed to perform well on multi-word expressions (MWEs) too.

    archer at SemEval-2021 Task 1: Contextualising Lexical Complexity.

    Evaluating the complexity of a target word in a sentential context is the aim of the Lexical Complexity Prediction task at SemEval-2021. This paper presents the system created to assess the lexical complexity of single words, combining linguistic and psycholinguistic variables in a set of experiments involving random forest and XGBoost regressors. Beyond encoding out-of-context information about the lemma, we implemented features based on pre-trained language models to model the target word's in-context complexity.

    Augmenting the CoAST system with automated text simplification

    Proper comprehension of academic texts is important for students in higher education. The CoAST platform is a virtual learning environment that endeavours to improve reading comprehension by augmenting theoretically and lexically complex texts with helpful annotations provided by a teacher. This thesis extends the CoAST system and introduces machine learning models that assist the teacher with identifying complex terminology and with writing annotations, by providing relevant definitions for a given word or phrase. A deep learning model is implemented to retrieve definitions for words or phrases of arbitrary length. This model surpasses previous work on the task of definition modelling when evaluated on various automated benchmarks. We investigate the task of complex word identification, producing two convolution-based models that predict the complexity of words and two-word phrases in a context-dependent manner. These models were submitted as part of the Lexical Complexity Prediction 2021 shared task, and showed results in a comparable range to that of other submissions. Both of these models are integrated into the CoAST system and evaluated through an online study. When selecting complex words from a document, the teacher's selections shared a sizeable overlap with the system's predictions. Results suggest that the technologies introduced in this work would benefit students and teachers using the CoAST system.