
1,899 research outputs found

A robust methodology for automated essay grading

Author: Fazal Anhar
Publication venue: Curtin University
Publication date: 01/01/2013

None of the available automated essay grading systems can be used to grade essays according to the National Assessment Program – Literacy and Numeracy (NAPLAN) analytic scoring rubric used in Australia. This thesis is a humble effort to address this limitation. The objective of this thesis is to develop a robust methodology for automatically grading essays based on the NAPLAN rubric by using heuristics and rules based on the English language and neural network modelling.

espace@Curtin

An automated essay evaluation system using natural language processing and sentiment analysis

Author: Gunakimath Suryakanth Sharvani
Guruvyas Kadagathur Raghavendra Rao
Janardhan Acharya Jeevan
Patil Pranav Prashantha
Sadanand Vijaya Shetty
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/12/2022

An automated essay evaluation system is a machine-based approach that uses a long short-term memory (LSTM) model to award grades to essays written in English. Natural language processing (NLP) is used to extract feature representations from the essays. The LSTM network learns from the extracted features and generates parameters for testing and validation. The main objectives of the research include proposing and training an LSTM model using a dataset of manually graded essays with scores. Sentiment analysis is performed to classify the sentiment of the essay as positive, negative or neutral. The Twitter sample dataset is used to build a sentiment classifier that analyzes the sentiment based on the student's approach towards a topic. Additionally, each essay is checked for syntactical errors and for plagiarism to assess the novelty of the essay. The overall grade is calculated based on the quality of the essay, the number of syntactic errors, the percentage of plagiarism found and the sentiment of the essay. The corrected essay is provided as feedback to the students. This essay grading model achieved an average quadratic weighted kappa (QWK) score of 0.911, with 99.4% accuracy for the sentiment analysis classifier.
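
The abstract above reports agreement as quadratic weighted kappa (QWK). As a minimal sketch of how that metric is usually computed, and not the authors' own implementation, the function below compares two lists of integer scores. The function name, argument names and the score range are assumptions made for illustration.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Agreement between two integer score lists; 1.0 means perfect agreement.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n_levels = max_score - min_score + 1
    n_items = len(rater_a)
    if n_levels < 2 or n_items == 0:
        return 1.0  # degenerate case: nothing to disagree about

    observed = Counter(zip(rater_a, rater_b))  # joint counts of (score_a, score_b)
    hist_a = Counter(rater_a)                  # marginal counts for rater A
    hist_b = Counter(rater_b)                  # marginal counts for rater B

    numerator = 0.0    # weighted observed disagreement
    denominator = 0.0  # weighted disagreement expected by chance
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            weight = (i - j) ** 2 / (n_levels - 1) ** 2
            expected = hist_a[i] * hist_b[j] / n_items
            numerator += weight * observed[(i, j)]
            denominator += weight * expected

    if denominator == 0:
        return 1.0  # both raters used a single identical score
    return 1.0 - numerator / denominator
```

Identical score lists give 1.0, while systematically reversed scores give a negative value, which is why QWK is often preferred over plain accuracy for ordinal grades.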

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Institute of Advanced Engineering and Science


Off-topic response detection for spontaneous spoken English assessment

Author: Gales MJF
Knill KM
Malinin A
Van Dalen RC
Wang Y
Publication venue: 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Long Papers
Publication date: 01/01/2016

Automatic spoken language assessment systems are becoming increasingly important to meet the demand for English second language learning. This is a challenging task due to the high error rates of even state-of-the-art non-native speech recognition. Consequently, current systems primarily assess fluency and pronunciation. However, content assessment is essential for full automation. As a first stage, it is important to judge whether the speaker responds on topic to test questions designed to elicit spontaneous speech. Standard approaches to off-topic response detection assess similarity between the response and question based on bag-of-words representations. An alternative framework based on Recurrent Neural Network Language Models (RNNLM) is proposed in this paper. The RNNLM is adapted to the topic of each test question. It learns to associate example responses to questions with points in a topic space constructed using these example responses. Classification is done by ranking the topic-conditional posterior probabilities of a response. Unlike standard methods, the RNNLMs associate a broad range of responses with each topic, incorporate sequence information and scale better with additional training data. In experiments conducted on data from the Business Language Testing Service (BULATS), this approach outperforms standard approaches.
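
The "standard approach" mentioned in this abstract, comparing question and response through bag-of-words representations, can be sketched as follows. This is an illustrative baseline only, not the RNNLM method the paper proposes; the tokenizer, the function names and the 0.1 threshold are all assumptions.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count word occurrences (simple assumed tokenizer)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(bag_a, bag_b):
    """Cosine of the angle between two word-count vectors; 0.0 if either is empty."""
    dot = sum(count * bag_b[word] for word, count in bag_a.items())
    norm_a = math.sqrt(sum(c * c for c in bag_a.values()))
    norm_b = math.sqrt(sum(c * c for c in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_off_topic(question, response, threshold=0.1):
    """Flag a response whose similarity to the question falls below the threshold.

    The threshold value is a placeholder; a real system would tune it on held-out data.
    """
    similarity = cosine_similarity(bag_of_words(question), bag_of_words(response))
    return similarity < threshold
```

Because this baseline ignores word order and only rewards literal word overlap, it illustrates the limitations the abstract attributes to standard methods.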

Apollo (Cambridge)

Proceedings of the First European Workshop on Latent Semantic Analysis in Technology Enhanced Learning

Author: Kalz Marco
Koper Rob
Van Bruggen Jan
Wild Fridolin
Publication date: 28/03/2007

Latent Semantic Analysis (LSA) has been successfully deployed in various educational applications to enrich learning and teaching with information technology. The primary goal of the workshop is to bring together experts in the field to share knowledge gained from the scattered research on latent semantic analysis in educational applications, in particular from the context of the IST projects Cooper, iCamp, TENCompetence and ProLearn.

Open University of the Netherlands Research Portal


Modelling text meta-properties in automated text scoring for non-native English writing

Author: Zhang Meng
Publication venue: University of Cambridge
Publication date: 16/07/2019

Automated text scoring (ATS) is the task of automatically scoring a text based on some given grading criteria. This thesis focuses on ATS in the context of free-text writing exams aimed at learners of English as a foreign language (EFL). The benefit of an ATS system is primarily to provide instant and consistent feedback to language learners, and service reliability also forms a crucial part of an ATS system. Based on previous work, we investigated only partially explored meta-properties in text and integrated them into a machine-learning-based ATS model across multiple datasets. In most previous work, the proposed models implicitly assume that texts produced by learners in an exam are written independently. However, this is not true for exams where learners are required to compose multiple texts. We hence explicitly informed our model which texts are written by the same learner, which boosts model performance in most cases. We used three intra-exam properties within the same exam, namely prompt, genre and task, as a starting point, and we showed that explicitly modelling these properties via frustratingly easy domain adaptation (FEDA) can positively affect model performance in some cases. Furthermore, modelling multiple intra-exam properties together is better than modelling any single property individually or no property in four out of five test sets. We studied how to utilise and combine learners' responses from multiple writing exams. We also proposed a new variant of the transfer-learning ATS model which mitigates the drawbacks of previous work. This variant first builds a ranking model across multiple datasets via FEDA, and the ranking score of each text predicted by the ranking model is used as an extra feature in the baseline model. This variant gives an improvement over a baseline model on the development sets in terms of root-mean-square error.
Furthermore, the transfer-learning model utilising multiple datasets tuned on each development set is always better than the baseline model on the corresponding test set. We found that different datasets favour different meta-properties. We therefore combined all the models looking at different meta-properties using ensemble learning. Compared to the baseline model, the combined model has a statistically significant improvement on all the test sets in terms of root-mean-square error based on a permutation test. The Institute for Automated Language Teaching and Assessment
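
The abstract above relies on frustratingly easy domain adaptation (FEDA). A minimal sketch of the commonly described feature-augmentation idea follows: each feature vector is copied once into a shared block and once into a block reserved for its own domain, with zeros elsewhere. The thesis's actual features and domains are not given in the abstract, so the function name and the example domains below are assumptions.

```python
def feda_augment(features, domain, domains):
    """Return [shared copy] + one block per domain, non-zero only for `domain`.

    Hypothetical helper for illustration; `domains` fixes the block order.
    """
    if domain not in domains:
        raise ValueError(f"unknown domain: {domain!r}")

    augmented = list(features)  # shared block, seen by every domain
    for candidate in domains:
        if candidate == domain:
            augmented.extend(features)                 # domain-specific copy
        else:
            augmented.extend([0.0] * len(features))    # other domains stay zero
    return augmented


# Example with two assumed genres treated as domains.
example = feda_augment([1.0, 2.0], "letter", ["essay", "letter"])
```

A linear model trained on the augmented vectors can then learn weights shared across domains in the first block and domain-specific corrections in the remaining blocks.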

Apollo (Cambridge)

Automated scholarly paper review: Technologies and challenges

Author: Chen Yidong
Lin Jialiang
Shi Xiaodong
Song Jiaxin
Zhou Zhangping
Publication date: 27/04/2022

Peer review is a widely accepted mechanism for research evaluation, playing a pivotal role in scholarly publishing. However, criticisms have long been leveled at this mechanism, mostly because of its inefficiency and subjectivity. Recent years have seen the application of artificial intelligence (AI) in assisting the peer review process. Nonetheless, with the involvement of humans, such limitations remain inevitable. In this review paper, we propose the concept and pipeline of automated scholarly paper review (ASPR) and review the relevant literature and technologies for achieving a full-scale computerized review process. On the basis of the review and discussion, we conclude that there is already corresponding research and implementation at each stage of ASPR. We further look into the challenges in ASPR with the existing technologies. The major difficulties lie in imperfect document parsing and representation, inadequate data, defective human-computer interaction and flawed deep logical reasoning. Moreover, we discuss the possible moral and ethical issues and point out the future directions of ASPR. In the foreseeable future, ASPR and peer review will coexist in a reinforcing manner before ASPR is able to fully undertake the reviewing workload from humans.

arXiv.org e-Print Archive

Exploring Automated Essay Scoring Models for Multiple Corpora and Topical Component Extraction from Student Essays

Author: Zhang Haoran
Publication date: 03/05/2021

Since it is widely accepted that human essay grading is labor-intensive, automatic scoring methods have drawn more attention. Automatic scoring reduces reliance on human effort and subjectivity over time and has commercial benefits for standardized aptitude tests. Automated essay scoring can be defined as a method for grading student essays that aims for high agreement with human graders, if they exist, and requires no human effort during the process. This research mainly focuses on improving existing Automated Essay Scoring (AES) models with different technologies. We present three different scoring models for grading two corpora: the Response to Text Assessment (RTA) and the Automated Student Assessment Prize (ASAP). First, a traditional machine learning model that extracts features based on semantic similarity measurement is employed for grading the RTA task. Second, a neural network model with the co-attention mechanism is used for grading source-based writing tasks. Third, we propose a hybrid model integrating the neural network model with hand-crafted features. Experiments show that the feature-based model outperforms its baseline, but a stand-alone neural network model significantly outperforms the feature-based model. Additionally, a hybrid model integrating the neural network model and hand-crafted features outperforms its baselines, especially in a cross-prompt experimental setting. We also present two investigations of using the intermediate output of the neural network model for keyword and key phrase extraction from student essays and the source article. Experiments show that keywords and key phrases extracted by our models support the feature-based AES model, and human effort can be relieved by using automated essay quality signals during the training process.

D-Scholarship@Pitt

When Automated Assessment Meets Automated Content Generation: Examining Text Quality in the Era of GPTs

Author: Abbasi Ahmed
Bevilacqua Marialena
Gan Yi
Oketch Kezia
Qin Ruiyang
Stamey Will
Yang Kai
Zhang Xinyuan
Publication date: 25/09/2023

The use of machine learning (ML) models to assess and score textual data has become increasingly pervasive in an array of contexts including natural language processing, information retrieval, search and recommendation, and credibility assessment of online content. A significant disruption at the intersection of ML and text is the arrival of text-generating large language models such as generative pre-trained transformers (GPTs). We empirically assess the differences in how ML-based scoring models trained on human content assess the quality of content generated by humans versus GPTs. To do so, we propose an analysis framework that encompasses essay scoring ML models, human and ML-generated essays, and a statistical model that parsimoniously considers the impact of type of respondent, prompt genre, and the ML model used for assessment. A rich testbed is utilized that encompasses 18,460 human-generated and GPT-based essays. Results of our benchmark analysis reveal that transformer pretrained language models (PLMs) more accurately score human essay quality as compared to CNN/RNN and feature-based ML methods. Interestingly, we find that the transformer PLMs tend to score GPT-generated text 10-15% higher on average, relative to human-authored documents. Conversely, traditional deep learning and feature-based ML models score human text considerably higher. Further analysis reveals that although the transformer PLMs are exclusively fine-tuned on human text, they more prominently attend to certain tokens appearing only in GPT-generated text, possibly due to familiarity/overlap in pre-training. Our framework and results have implications for text classification settings where automated scoring of text is likely to be disrupted by generative AI. Comment: Data available at: https://github.com/nd-hal/automated-ML-scoring-versus-generatio

arXiv.org e-Print Archive

Constrained multi-task learning for automated essay scoring

Author: Briscoe T
Cummins R
Zhang M
Publication venue: 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Long Papers
Publication date: 01/01/2016

Supervised machine learning models for automated essay scoring (AES) usually require substantial task-specific training data in order to make accurate predictions for a particular writing task. This limitation hinders their utility, and consequently their deployment in real-world settings. In this paper, we overcome this shortcoming using a constrained multi-task pairwise-preference learning approach that enables the data from multiple tasks to be combined effectively. Furthermore, contrary to some recent research, we show that high-performance AES systems can be built with little or no task-specific training data. We perform a detailed study of our approach on a publicly available dataset in scenarios where we have varying amounts of task-specific training data and in scenarios where the number of tasks increases. This is the author accepted manuscript. The final version is available from the Association for Computational Linguistics at http://acl2016.org/index.php?article_id=71
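
Pairwise-preference learning, as named in this abstract, trains a ranker on pairs of essays rather than on absolute scores. The sketch below shows one way such training pairs could be built from several tasks. The abstract does not spell out the constraint, so this sketch assumes it means that only essays from the same task are compared; the function name and the data layout are likewise assumptions.

```python
from itertools import combinations


def within_task_preference_pairs(essays):
    """Build (difference_vector, label) pairs from (task_id, score, features) tuples.

    Assumed constraint: essays are paired only when they share a task_id,
    because scores from different tasks may not be on a comparable scale.
    """
    pairs = []
    for (task_a, score_a, feats_a), (task_b, score_b, feats_b) in combinations(essays, 2):
        if task_a != task_b or score_a == score_b:
            continue  # skip cross-task pairs and ties
        difference = [a - b for a, b in zip(feats_a, feats_b)]
        label = 1 if score_a > score_b else -1  # +1 means the first essay is preferred
        pairs.append((difference, label))
    return pairs


# Example with hypothetical data: two essays from task "t1", one from task "t2".
example_pairs = within_task_preference_pairs([
    ("t1", 3, [4.0, 2.0]),
    ("t1", 1, [1.0, 1.0]),
    ("t2", 5, [9.0, 9.0]),
])
```

The resulting difference vectors can be fed to any binary linear classifier, whose weights then define a ranking function shared across tasks.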

Crossref

Apollo (Cambridge)