1,899 research outputs found

    A robust methodology for automated essay grading

    Get PDF
    None of the available automated essay grading systems can grade essays according to the National Assessment Program – Literacy and Numeracy (NAPLAN) analytic scoring rubric used in Australia. This thesis is an effort to address this limitation. The objective of this thesis is to develop a robust methodology for automatically grading essays based on the NAPLAN rubric, using heuristics and rules based on the English language together with neural network modelling.
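
    As a rough illustration of the kind of approach described, the sketch below feeds a handful of heuristic, rule-based surface features into a small neural network regressor for a single rubric criterion; the feature set, network size and toy data are assumptions for illustration, not the thesis's actual methodology.

```python
# Minimal sketch: heuristic features -> small neural network predicting one
# rubric criterion. Feature choices, network size and data are illustrative
# assumptions, not the thesis's actual design.
import re
import numpy as np
from sklearn.neural_network import MLPRegressor

def heuristic_features(essay: str) -> np.ndarray:
    """Simple rule-based surface features of the kind an analytic rubric might use."""
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    words = essay.split()
    return np.array([
        len(words),                                            # essay length
        len(sentences),                                        # sentence count
        np.mean([len(w) for w in words]) if words else 0.0,    # average word length
        essay.count(","),                                      # rough proxy for clause complexity
    ])

# Toy essays and per-criterion scores; a real corpus would carry NAPLAN annotations.
essays = [
    "The quick brown fox jumps over the lazy dog. It was fast, agile and clever.",
    "dog run fast",
]
criterion_scores = np.array([4.0, 1.0])   # e.g. one rubric criterion on a 0-5 scale

X = np.vstack([heuristic_features(e) for e in essays])
model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
model.fit(X, criterion_scores)             # in practice, one regressor per criterion
print(model.predict(X))
```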

    An automated essay evaluation system using natural language processing and sentiment analysis

    Get PDF
    An automated essay evaluation system is a machine-based approach that leverages a long short-term memory (LSTM) model to award grades to essays written in English. Natural language processing (NLP) is used to extract feature representations from the essays. The LSTM network learns from the extracted features and generates parameters for testing and validation. The main objectives of the research include proposing and training an LSTM model using a dataset of manually graded essays with scores. Sentiment analysis is performed to classify the sentiment of each essay as positive, negative or neutral. A Twitter sample dataset is used to build a sentiment classifier that analyzes the sentiment based on the student's approach to a topic. Additionally, each essay is subjected to syntactic error detection as well as a plagiarism check to assess its novelty. The overall grade is calculated from the quality of the essay, the number of syntactic errors, the percentage of plagiarism found and the sentiment of the essay. The corrected essay is provided as feedback to the students. This essay grading model achieved an average quadratic weighted kappa (QWK) score of 0.911, with 99.4% accuracy for the sentiment analysis classifier.
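
    The agreement metric reported above is quadratic weighted kappa (QWK). A small, self-contained implementation over integer grades is sketched below; the grade range is a parameter and the example data are toy values, not the paper's evaluation set.

```python
# Quadratic weighted kappa (QWK): agreement between human and machine grades,
# with disagreements penalized by the squared distance between grades.
import numpy as np

def quadratic_weighted_kappa(human, machine, min_grade, max_grade):
    n = max_grade - min_grade + 1
    observed = np.zeros((n, n))
    for h, m in zip(human, machine):
        observed[h - min_grade, m - min_grade] += 1

    # Expected agreement under independence of the two graders' histograms.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

    # Quadratic disagreement weights.
    weights = np.array([[(i - j) ** 2 for j in range(n)] for i in range(n)],
                       dtype=float) / (n - 1) ** 2

    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

# Toy check: perfect agreement yields 1.0.
print(quadratic_weighted_kappa([1, 2, 3, 4], [1, 2, 3, 4], min_grade=1, max_grade=4))
```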

    Proceedings of the First European Workshop on Latent Semantic Analysis in Technology Enhanced Learning

    Get PDF
    Latent Semantic Analysis (LSA) has been successfully deployed in various educational applications to enrich learning and teaching with information technology. The primary goal of the workshop is to bring together experts in the field in order to share knowledge gained within the scattered research about latent semantic analysis in educational applications, in particular from the context of the IST projects Cooper, iCamp, TenCompetence and ProLearn.
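
    For readers unfamiliar with the technique, LSA reduces a term-document matrix to a low-dimensional latent space in which semantically related texts lie close together. A minimal sketch follows, using scikit-learn as an assumed toolchain and a toy corpus; it is not drawn from the workshop papers.

```python
# Minimal LSA sketch: TF-IDF term-document matrix reduced by truncated SVD.
# Toolchain (scikit-learn) and corpus are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "latent semantic analysis for learning content",
    "semantic similarity between student essays",
    "scheduling lectures in the new building",
]

tfidf = TfidfVectorizer().fit_transform(docs)
lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

# Documents on related topics end up close together in the latent space.
print(cosine_similarity(lsa[:1], lsa[1:]))
```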

    Automated scholarly paper review: Technologies and challenges

    Full text link
    Peer review is a widely accepted mechanism for research evaluation, playing a pivotal role in scholarly publishing. However, criticisms have long been leveled at this mechanism, mostly because of its inefficiency and subjectivity. Recent years have seen the application of artificial intelligence (AI) in assisting the peer review process. Nonetheless, with the involvement of humans, such limitations remain inevitable. In this review paper, we propose the concept and pipeline of automated scholarly paper review (ASPR) and review the relevant literature and technologies for achieving a full-scale computerized review process. On the basis of the review and discussion, we conclude that there is already corresponding research and implementation at each stage of ASPR. We further look into the challenges in ASPR with the existing technologies. The major difficulties lie in imperfect document parsing and representation, inadequate data, defective human-computer interaction and flawed deep logical reasoning. Moreover, we discuss the possible moral and ethical issues and point out the future directions of ASPR. In the foreseeable future, ASPR and peer review will coexist in a reinforcing manner before ASPR is able to fully undertake the reviewing workload from humans.
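
    The ASPR pipeline described above (document parsing, representation, assessment, review generation) can be skeletonized roughly as below; the stage names, data structures and placeholder scoring are assumptions for illustration, not the paper's proposed implementation.

```python
# Rough skeleton of an automated scholarly paper review (ASPR) pipeline:
# parse -> represent -> assess -> generate review. All stages are placeholders.
from dataclasses import dataclass, field

@dataclass
class ParsedPaper:
    title: str
    abstract: str
    sections: dict = field(default_factory=dict)   # section name -> text

def parse_document(raw_text: str) -> ParsedPaper:
    """Stand-in for PDF/LaTeX parsing, a stage the paper flags as imperfect."""
    first_line, _, rest = raw_text.partition("\n")
    return ParsedPaper(title=first_line.strip(), abstract=rest.strip())

def represent(paper: ParsedPaper) -> dict:
    """Stand-in document representation (here, trivial surface statistics)."""
    return {"abstract_length": len(paper.abstract.split())}

def assess(features: dict) -> dict:
    """Stand-in assessment; a real system would use trained models."""
    return {"clarity": min(5, features["abstract_length"] // 30)}

def generate_review(scores: dict) -> str:
    return "; ".join(f"{criterion}: {score}/5" for criterion, score in scores.items())

raw = "A Study of X\n" + "This paper investigates automated review of manuscripts. " * 20
paper = parse_document(raw)
print(generate_review(assess(represent(paper))))
```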

    Exploring Automated Essay Scoring Models for Multiple Corpora and Topical Component Extraction from Student Essays

    Get PDF
    Since it is widely accepted that human essay grading is labor-intensive, automated scoring methods have drawn increasing attention. They reduce reliance on human effort and subjectivity over time and have commercial benefits for standardized aptitude tests. Automated essay scoring can be defined as a method for grading student essays that achieves high agreement with human graders, where such grades exist, and requires no human effort during the process. This research mainly focuses on improving existing Automated Essay Scoring (AES) models with different technologies. We present three different scoring models for grading two corpora: the Response to Text Assessment (RTA) and the Automated Student Assessment Prize (ASAP). First, a traditional machine learning model that extracts features based on semantic similarity measurement is employed for grading the RTA task. Second, a neural network model with a co-attention mechanism is used for grading source-based writing tasks. Third, we propose a hybrid model integrating the neural network model with hand-crafted features. Experiments show that the feature-based model outperforms its baseline, but a stand-alone neural network model significantly outperforms the feature-based model. Additionally, the hybrid model integrating the neural network model and hand-crafted features outperforms its baselines, especially in a cross-prompt experimental setting. We also present two investigations of using the intermediate output of the neural network model to extract keywords and key phrases from student essays and the source article. Experiments show that keywords and key phrases extracted by our models support the feature-based AES model, and that human effort can be reduced by using automated essay quality signals during the training process.
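
    The hybrid idea, concatenating a learned essay representation with hand-crafted features before a final scoring layer, might look roughly like the PyTorch sketch below; the LSTM encoder, dimensions and toy inputs are assumptions rather than the thesis's exact models.

```python
# Sketch of a hybrid AES scorer: a learned essay encoding concatenated with
# hand-crafted features before a final regression layer. Encoder choice and
# dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class HybridScorer(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=64, hidden=128, n_handcrafted=12):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden + n_handcrafted, 1)

    def forward(self, token_ids, handcrafted):
        # token_ids: (batch, seq_len); handcrafted: (batch, n_handcrafted)
        embedded = self.embed(token_ids)
        _, (h_n, _) = self.encoder(embedded)     # final hidden state as the essay vector
        essay_vec = h_n[-1]                      # (batch, hidden)
        combined = torch.cat([essay_vec, handcrafted], dim=1)
        return torch.sigmoid(self.head(combined)).squeeze(-1)   # score in [0, 1]

model = HybridScorer()
tokens = torch.randint(1, 10000, (2, 50))    # two toy essays of 50 token ids each
features = torch.rand(2, 12)                 # matching hand-crafted feature vectors
print(model(tokens, features))
```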

    When Automated Assessment Meets Automated Content Generation: Examining Text Quality in the Era of GPTs

    Full text link
    The use of machine learning (ML) models to assess and score textual data has become increasingly pervasive in an array of contexts including natural language processing, information retrieval, search and recommendation, and credibility assessment of online content. A significant disruption at the intersection of ML and text are text-generating large language models such as generative pre-trained transformers (GPTs). We empirically assess the differences in how ML-based scoring models trained on human content assess the quality of content generated by humans versus GPTs. To do so, we propose an analysis framework that encompasses essay-scoring ML models, human- and ML-generated essays, and a statistical model that parsimoniously considers the impact of type of respondent, prompt genre, and the ML model used for assessment. A rich testbed is utilized that encompasses 18,460 human-generated and GPT-based essays. Results of our benchmark analysis reveal that transformer pretrained language models (PLMs) more accurately score human essay quality as compared to CNN/RNN and feature-based ML methods. Interestingly, we find that the transformer PLMs tend to score GPT-generated text 10-15% higher on average, relative to human-authored documents. Conversely, traditional deep learning and feature-based ML models score human text considerably higher. Further analysis reveals that although the transformer PLMs are exclusively fine-tuned on human text, they more prominently attend to certain tokens appearing only in GPT-generated text, possibly due to familiarity/overlap in pre-training. Our framework and results have implications for text classification settings where automated scoring of text is likely to be disrupted by generative AI. Comment: Data available at: https://github.com/nd-hal/automated-ML-scoring-versus-generatio
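
    A minimal sketch of the kind of transformer-PLM scorer benchmarked here, fine-tuned as a single-output regressor with Hugging Face Transformers, is shown below; the model choice, hyperparameters and toy labels are assumptions, not the paper's benchmark setup.

```python
# Sketch of fine-tuning a transformer PLM as an essay-quality regressor.
# Model, hyperparameters and toy data are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# num_labels=1 with problem_type="regression" gives a single-output MSE head.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=1, problem_type="regression")

essays = ["A clearly argued essay supported with evidence.", "short essay no detail"]
labels = torch.tensor([[0.9], [0.3]])   # normalized quality scores (toy values)

batch = tokenizer(essays, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)   # MSE loss under the regression head
outputs.loss.backward()
optimizer.step()
print(float(outputs.loss))
```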

    Constrained multi-task learning for automated essay scoring

    Get PDF
    Supervised machine learning models for automated essay scoring (AES) usually require substantial task-specific training data in order to make accurate predictions for a particular writing task. This limitation hinders their utility, and consequently their deployment in real-world settings. In this paper, we overcome this shortcoming using a constrained multi-task pairwise-preference learning approach that enables the data from multiple tasks to be combined effectively. Furthermore, contrary to some recent research, we show that high-performance AES systems can be built with little or no task-specific training data. We perform a detailed study of our approach on a publicly available dataset in scenarios where we have varying amounts of task-specific training data and in scenarios where the number of tasks increases. This is the author accepted manuscript. The final version is available from the Association for Computational Linguistics at http://acl2016.org/index.php?article_id=71
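
    The pairwise-preference idea, learning to rank pairs of essays rather than predict absolute scores so that data from different writing tasks can be pooled, can be sketched with a margin ranking loss; the linear scorer and random toy features below are illustrative stand-ins, not the paper's constrained formulation.

```python
# Sketch of pairwise preference learning for AES: train on ordered pairs
# (better essay, worse essay) with a margin ranking loss, so examples from
# different writing tasks can be pooled. Scorer and toy data are illustrative.
import torch
import torch.nn as nn

scorer = nn.Linear(8, 1)                      # maps a feature vector to a score
loss_fn = nn.MarginRankingLoss(margin=0.1)
optimizer = torch.optim.SGD(scorer.parameters(), lr=0.05)

# Toy pairs: each row pairs features of a higher-ranked and a lower-ranked essay.
better = torch.rand(32, 8) + 0.2
worse = torch.rand(32, 8)
target = torch.ones(32)                       # +1 means the first input should rank higher

for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(scorer(better).squeeze(-1), scorer(worse).squeeze(-1), target)
    loss.backward()
    optimizer.step()

print(float(loss))
```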