5 research outputs found

    Machine Learning for Readability Assessment and Text Simplification in Crisis Communication: A Systematic Review

    In the age of social media, crisis managers can interact with citizens in a variety of ways. Since machine learning has already been used to classify messages from the population, the question is whether such technologies can also play a role in the creation of messages from crisis managers to the population. This paper presents exploratory research on selected machine learning solutions for crisis communication. We present systematic literature reviews of readability assessment and text simplification. Our research suggests that readability assessment has the potential for effective use in crisis communication, but that sufficient training data is lacking. The same applies to text simplification, where an exact assessment is only partly possible due to unreliable or non-existent training data and validation measures.

    Text Simplifier Based on Syntax Analysis

    This bachelor’s thesis gives an overview of text simplification, focusing mainly on syntactic simplification. Syntactic simplification has been addressed in numerous studies for English; the thesis applies those results to Estonian. The aim of the thesis was to create a web-based text simplification application whose main simplification method is sentence splitting, i.e. dividing compound sentences into simple sentences. For syntax analysis, the simplifier uses the Estonian natural language processing toolkit EstNLTK.

    An Automatic Modern Standard Arabic Text Simplification System: A Corpus-Based Approach

    This thesis brings together an overview of Text Readability (TR) and Text Simplification (TS) with an application of both to Modern Standard Arabic (MSA). It presents our findings on using automatic TR and TS tools to teach MSA, along with challenges, limitations, and recommendations for enhancing the TR and TS models. Reading is one of the most vital tasks that provide language input for communication and comprehension skills. It has been shown that long sentences, connected sentences, embedded phrases, passive voice, non-standard word orders, and infrequent words can increase text difficulty for people with low literacy levels, as well as for second language learners. The thesis compares the use of sentence embeddings of different types (fastText, mBERT, XLM-R and Arabic-BERT), as well as traditional language features such as POS tags, dependency trees, readability scores and frequency lists for language learners. The 3-way CEFR (Common European Framework of Reference for Languages proficiency levels) classification achieves an F-1 of 0.80 with Arabic-BERT and 0.75 with XLM-R, and the regression task achieves a Spearman correlation of 0.71. The binary difficulty classifier reaches an F-1 of 0.94, and the sentence-pair semantic similarity classifier reaches an F-1 of 0.98. TS is an NLP task aiming to reduce the linguistic complexity of a text while maintaining its meaning and original information (Siddharthan, 2002; Camacho Collados, 2013; Saggion, 2017). The simplification study experimented with two approaches: (i) a classification approach and (ii) a generative approach. It then evaluated the effectiveness of these methods using the BERTScore (Zhang et al., 2020) evaluation metric. Measured by BERTScore, the simple sentences produced by the mT5 model achieved precision 0.72, recall 0.68 and F-1 0.70, while the combination of Arabic-BERT and fastText achieved precision 0.97, recall 0.97 and F-1 0.97.
    To reiterate, this research demonstrated the effectiveness of a corpus-based method combined with the extraction of extensive linguistic features via the latest NLP techniques. It provides insights that can be of use in various Arabic corpus studies and NLP tasks such as translation for educational purposes.