
    Can Language Models Identify Wikipedia Articles with Readability and Style Issues?

    Wikipedia is frequently criticised for poor readability and style. In this article, we investigate using GPT-2, a neural language model, to identify poorly written text in Wikipedia by ranking documents by their perplexity. We evaluate the properties of this ranking using human assessments of text quality, including readability, narrativity, and language use. We demonstrate that GPT-2 perplexity scores correlate moderately to strongly with narrativity, but only weakly with reading comprehension scores. Importantly, the model is sensitive to even small improvements to a text, such as those made in Wikipedia edits. We conclude by highlighting that Wikipedia's featured articles, counter-intuitively, contain the text with the highest perplexity scores. However, these examples highlight many of the complexities that need to be resolved before such an approach can be used in practice.
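
    The perplexity ranking this abstract describes can be reproduced in a few lines. Below is a minimal sketch, assuming the Hugging Face transformers implementation of GPT-2 (the paper does not specify its tooling); the example documents and the truncation to the model's context window are illustrative assumptions.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    # Passing labels == input_ids makes the model return the mean
    # token-level cross-entropy; exponentiating gives perplexity.
    ids = tokenizer(text, return_tensors="pt", truncation=True,
                    max_length=model.config.n_positions)["input_ids"]
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

# Hypothetical documents; higher perplexity is read as poorer writing.
docs = {
    "well-edited": "The committee approved the budget after a brief debate.",
    "scrambled": "Budget the committee after approved debate brief a the.",
}
for name, ppl in sorted(((n, perplexity(t)) for n, t in docs.items()),
                        key=lambda kv: kv[1]):
    print(f"{ppl:8.1f}  {name}")
```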

    Examining the Role of Linguistic Flexibility in the Text Production Process

    A commonly held belief among educators, researchers, and students is that high-quality texts are easier to read than low-quality texts because they contain more engaging narrative and story-like elements. These assumptions, however, have typically failed to find support in the writing literature: although narrative elements may sometimes be associated with high-quality writing, the majority of research suggests that higher-quality writing is associated with decreased levels of text narrativity and readability. One potential explanation for this conflicting evidence lies in the situational influence of text elements on writing quality. In other words, the frequency of specific linguistic or rhetorical text elements alone may not be consistently indicative of essay quality; rather, these effects may be largely driven by individual differences in students' ability to leverage the benefits of these elements in appropriate contexts. This dissertation presents the hypothesis that writing proficiency is associated with an individual's flexible use of text properties, rather than simply the consistent use of a particular set of properties. Across three experiments, the dissertation relies on a combination of natural language processing and dynamic methodologies to examine the role of linguistic flexibility in the text production process. Overall, these studies provide important insights into the role of flexibility in writing skill and lay a strong foundation for future research and educational interventions.

    Sentiment and Sentence Similarity as Predictors of Integrated and Independent L2 Writing Performance

    This study aimed to utilize sentiment and sentence similarity analyses, two Natural Language Processing techniques, to see whether and how well they could predict L2 writing performance under integrated and independent task conditions. The data sources were an integrated L2 writing corpus of 185 literary analysis essays and an independent L2 writing corpus of 500 argumentative essays, both compiled in higher education contexts. Both essay groups were scored between 0 and 100. Two Python libraries, TextBlob and spaCy, were used to generate sentiment and sentence similarity data. Using sentiment (polarity and subjectivity) and sentence similarity variables, regression models were built and 95% prediction intervals were compared for the integrated and independent corpora. The results showed that integrated L2 writing performance could be predicted by subjectivity and sentence similarity, whereas only subjectivity predicted independent L2 writing performance. The prediction interval of subjectivity for the independent writing model was narrower than the corresponding interval for integrated writing. These results show that sentiment and sentence similarity algorithms can generate complementary data to improve more complex multivariate L2 writing performance prediction models.
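
    For readers unfamiliar with these two libraries, here is a minimal sketch of the pipeline described above: TextBlob supplies polarity and subjectivity, spaCy supplies sentence similarity, and statsmodels fits the regression and produces 95% prediction intervals. The toy essays, the scores, and the choice of the en_core_web_md model are assumptions for illustration, not the study's data.

```python
import numpy as np
import spacy
import statsmodels.api as sm
from textblob import TextBlob

nlp = spacy.load("en_core_web_md")  # medium model ships word vectors

def essay_features(text: str) -> list[float]:
    blob = TextBlob(text)
    sents = list(nlp(text).sents)
    # Mean cosine similarity between adjacent sentences (0.0 if only one).
    sims = [a.similarity(b) for a, b in zip(sents, sents[1:])] or [0.0]
    return [blob.sentiment.polarity,
            blob.sentiment.subjectivity,
            float(np.mean(sims))]

# Toy corpus and scores standing in for the 0-100-scored essays.
essays = [
    "The film was wonderful. Its story moved me deeply.",
    "The plot was dull. I disliked the ending strongly.",
    "This essay argues a point. The point is argued here.",
    "Great book overall. The characters felt very real.",
    "It was fine. Nothing about it stood out to me.",
    "Terrible pacing. The scenes dragged on without purpose.",
]
scores = np.array([85.0, 60.0, 40.0, 80.0, 55.0, 30.0])

X = sm.add_constant(np.array([essay_features(e) for e in essays]))
fit = sm.OLS(scores, X).fit()

# 95% prediction interval for a new, unscored essay.
new = sm.add_constant(np.array([essay_features("A fine film. I liked it.")]),
                      has_constant="add")
frame = fit.get_prediction(new).summary_frame(alpha=0.05)
print(frame[["mean", "obs_ci_lower", "obs_ci_upper"]])
```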

    Applications of Text Analysis Tools for Spoken Response Grading


    A Statistical Approach to Automatic Essay Scoring

    Taking into consideration the escalating need for testing writing ability and the potential of Automatic Essay Scoring (AES) to support writing instruction and evaluation, the aim of the present study is to explore the relationship between stylometric indices widely used in AES systems and the degree of sophistication of learner essays, as captured by the scores of expert human raters. The data analyzed were obtained from a recently organized public AES competition and comprise persuasive essays written in the context of public schools in the United States. The stylometric information taken into consideration focuses mainly on measures of cohesion, lexical diversity, and syntactic sophistication. Results indicate a clear relationship between quantifiable features of learners' written responses and the impression they made on expert raters. This observation reinforces the importance of pursuing further experimentation in AES, which would yield significant educational and social benefits.
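
    As a concrete illustration of what such stylometric indices look like, the sketch below computes three simple proxies in plain Python: a type-token ratio for lexical diversity, mean sentence length as a crude syntactic measure, and adjacent-sentence lexical overlap as a cohesion signal. The study's actual feature set is richer; these proxies and the toy essay are assumptions for demonstration only.

```python
import re

def stylometric_indices(text: str) -> dict:
    sents = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    tokens = re.findall(r"[a-zA-Z']+", text.lower())
    sent_tokens = [set(re.findall(r"[a-zA-Z']+", s.lower())) for s in sents]
    # Cohesion proxy: mean Jaccard overlap between adjacent sentences.
    overlaps = [len(a & b) / max(1, len(a | b))
                for a, b in zip(sent_tokens, sent_tokens[1:])]
    return {
        "lexical_diversity": len(set(tokens)) / max(1, len(tokens)),  # TTR
        "mean_sentence_length": len(tokens) / max(1, len(sents)),
        "adjacent_overlap": sum(overlaps) / max(1, len(overlaps)),
    }

essay = ("Testing writing at scale is demanding. Automatic scoring can "
         "support raters. Such systems rely on measurable text features.")
print(stylometric_indices(essay))
```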

    Defining and Assessing Critical Thinking: toward an automatic analysis of HiEd students’ written texts

    The main goal of this PhD thesis is to test, through two empirical studies, the reliability of a method aimed at automatically assessing manifestations of Critical Thinking (CT) in Higher Education students' written texts. The empirical studies were based on a critical review aimed at proposing a new classification for systematising different CT definitions and their related theoretical approaches. The review also investigates the relationship between the adopted CT definitions and CT assessment methods, and it highlights the need to focus on open-ended measures for CT assessment and to develop automatic tools based on Natural Language Processing (NLP) techniques to overcome the current limitations of open-ended measures, such as reliability and scoring costs. Based on a rubric developed and implemented by the Center for Museum Studies – Roma Tre University (CDM) research group for the evaluation and analysis of CT levels within open-ended answers (Poce, 2017), an NLP prototype for the automatic measurement of CT indicators was designed. The first empirical study, carried out on a group of 66 university teachers, showed satisfactory reliability levels for the CT evaluation rubric, while the evaluation carried out by the prototype was not yet sufficiently reliable. The results were used to understand how and under what conditions the model works better. The second empirical investigation aimed to understand which NLP features are most associated with six CT sub-dimensions as assessed by human raters in essays written in Italian. The study used a corpus of 103 pre-post essays by students who attended a Master's Degree module in "Experimental Education and School Assessment". Within the module, two activities were proposed to stimulate students' CT: Open Educational Resources (OER) assessment (mandatory and online) and OER design (optional and blended). The essays were assessed both by expert evaluators, considering six CT sub-dimensions, and by an algorithm that automatically calculates different kinds of NLP features. The study shows positive internal reliability and medium-to-high inter-coder agreement in the expert evaluation. Students' CT levels improved significantly in the post-test. Three NLP indicators correlate significantly with the CT total score: corpus length, syntax complexity, and an adapted measure of term frequency–inverse document frequency (tf-idf).
    The results collected during this PhD have both theoretical and practical implications for CT research and assessment. From a theoretical perspective, the thesis shows unexplored similarities among different CT traditions, perspectives, and study methods; these similarities could be exploited to open an interdisciplinary dialogue among experts and build a shared understanding of CT. Automatic assessment methods can enhance the use of open-ended measures for CT assessment, especially in online teaching, since they can support teachers and researchers in dealing with the growing volume of linguistic data produced within educational platforms (e.g. Learning Management Systems). To this end, it is pivotal to develop automatic methods for the evaluation of large amounts of data that would be impossible to analyse manually, providing teachers and evaluators with support for monitoring and assessing the competences students demonstrate online.
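
    To make the three reported indicators concrete, the sketch below computes rough analogues: a token count for corpus length, mean dependency-tree depth as a syntax-complexity proxy, and a mean tf-idf weight via scikit-learn. The Italian spaCy pipeline (it_core_news_sm), the depth-based proxy, and the tf-idf aggregation are assumptions, since the thesis's exact formulas are not given here.

```python
import numpy as np
import spacy
from sklearn.feature_extraction.text import TfidfVectorizer

nlp = spacy.load("it_core_news_sm")  # Italian pipeline, matching the essays' language

def tree_depth(token) -> int:
    # Depth of the dependency subtree rooted at this token.
    return 1 + max((tree_depth(c) for c in token.children), default=0)

def indicators(essay: str, corpus: list[str]) -> dict:
    doc = nlp(essay)
    depths = [tree_depth(s.root) for s in doc.sents]
    row = TfidfVectorizer().fit(corpus).transform([essay])
    return {
        "corpus_length": len([t for t in doc if not t.is_space]),
        "syntax_complexity": float(np.mean(depths)),  # mean parse-tree depth
        "mean_tfidf": float(row.sum() / max(1, row.nnz)),
    }

# Tiny illustrative corpus; the study used 103 pre-post student essays.
corpus = ["Il pensiero critico richiede analisi e valutazione delle fonti.",
          "Gli studenti valutano risorse educative aperte in modo critico."]
print(indicators(corpus[0], corpus))
```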