468 research outputs found

    Automated Essay Evaluation Using Natural Language Processing and Machine Learning

    The goal of automated essay evaluation is to assign grades to essays and provide feedback using computers. Automated evaluation is increasingly being used in classrooms and online exams. The aim of this project is to develop machine learning models for performing automated essay scoring and to evaluate their performance. In this research, a publicly available essay dataset was used to train and test the efficacy of the adopted techniques. Natural language processing techniques were used to extract features from the essays in the dataset. Three different existing machine learning algorithms were applied to the chosen dataset. The data was divided into two parts: training data and testing data. The inter-rater reliability and performance of these models were compared with each other and with human graders. Among the three machine learning models, the random forest performed the best in terms of agreement with human scorers, as it achieved the lowest mean absolute error on the test dataset.
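    A minimal sketch of the kind of pipeline this abstract outlines: simple NLP features extracted from each essay, a random forest regressor trained on a train/test split, and agreement with human scores measured by mean absolute error. The feature set, the file name essays.csv, and its columns are illustrative assumptions, not details from the study.

```python
# Hedged sketch: surface features -> random forest -> mean absolute error.
# The features and the CSV layout (columns "essay", "score") are assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

def extract_features(text: str) -> list[float]:
    words = text.split()
    sentences = [s for s in text.split(".") if s.strip()]
    return [
        len(words),                                        # essay length
        len(set(words)) / max(len(words), 1),              # lexical diversity
        sum(len(w) for w in words) / max(len(words), 1),   # average word length
        len(words) / max(len(sentences), 1),               # average sentence length
    ]

df = pd.read_csv("essays.csv")                             # hypothetical data file
X = [extract_features(t) for t in df["essay"]]
y = df["score"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```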

    Towards Better Support for Machine-Assisted Human Grading of Short-Text Answers

    This paper aims at tools that help teachers grade short-text answers submitted by students. While many published approaches to short-text answer grading target a fully automated process that suggests a grading result, we focus on supporting the teacher. The goal is to assist the human grader and improve transparency rather than to replace the human with an oracle. The paper provides a literature overview of the numerous short-text answer grading approaches proposed over the years. It then presents two novel approaches (answer completeness and natural variability) and evaluates them on published exam data and several assessments collected at our university.

    DeepEval: An Integrated Framework for the Evaluation of Student Responses in Dialogue Based Intelligent Tutoring Systems

    The automatic assessment of student answers is a critical component of an Intelligent Tutoring System (ITS) because accurate assessment of student input is needed in order to provide effective feedback that leads to learning. This is a very challenging task because it requires natural language understanding capabilities. The process involves various components, such as concept identification, co-reference resolution, and ellipsis handling. As part of this thesis, we thoroughly analyzed a set of student responses obtained from an experiment with the intelligent tutoring system DeepTutor, in which college students interacted with the tutor to solve conceptual physics problems, designed an automatic answer assessment framework (DeepEval), and evaluated the framework after implementing several important components. To evaluate our system, we annotated 618 responses from 41 students for correctness. Our system performs better than the typical similarity calculation method. We also discuss various issues in automatic answer evaluation.
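    For context, a minimal sketch of the "typical similarity calculation method" that the abstract uses as a point of comparison: score a student response by its cosine similarity to a reference answer and threshold the result. The TF-IDF representation and the threshold value are assumptions for illustration; this is not the DeepEval framework itself.

```python
# Hedged baseline sketch: similarity-thresholded correctness, not DeepEval.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def assess(student_answer: str, reference_answer: str, threshold: float = 0.5) -> bool:
    # Vectorize both texts with TF-IDF fitted on just this pair.
    tfidf = TfidfVectorizer().fit_transform([student_answer, reference_answer])
    similarity = cosine_similarity(tfidf[0], tfidf[1])[0, 0]
    return similarity >= threshold

print(assess("Only the gravitational force acts on the ball after it is released.",
             "After release, the net force on the ball is gravity alone."))
```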

    A rubric based approach towards Automated Essay Grading: focusing on high level content issues and ideas

    Assessment of a student’s work is by no means an easy task. Even if the student response is in the form of multiple choice answers, manually marking those answer sheets is a task that most teachers regard as rather tedious. The development of an automated method to grade these essays was thus an inevitable step. This thesis proposes a novel approach towards Automated Essay Grading through the use of various concepts found within the field of Narratology. Through a review of the literature, several methods by which essays are graded were identified, together with some of their problems. Mainly, the issues and challenges that plague AEG systems were that those following the statistical approach needed a way to deal with more implicit features of free text, while other systems that did manage this were highly dependent on the type of student response, required pre-knowledge of the subject domain, and demanded more computational power. It was also found that while narrative essays are one of the main ways in which a student might showcase his/her mastery of the English language, no system thus far has attempted to incorporate narrative concepts into analysing these types of free-text responses. It was decided that the proposed solution would be centred on the detection of Events, which was in turn used to determine the score an essay receives under the criteria of Audience, Ideas, Character and Setting, and Cohesion, as defined by the NAPLAN rubric. From the results gathered from experiments conducted on the four criteria mentioned above, it was concluded that detecting Events, as they occur within a narrative-type story, does bear a relation to the score an essay receives. All experiments achieved an average F-measure score of 0.65 and above, while exact agreement rates were no lower than 70%. Chi-squared and paired t-test values all indicated that there was insufficient evidence of any significant difference between the scores generated by the computer and those of the human markers.
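    A small sketch of the agreement checks this abstract reports, exact agreement between machine and human rubric scores plus a paired t-test; the score lists are made up for illustration (the chi-squared check is omitted here).

```python
# Hedged sketch of the evaluation statistics mentioned above.
from scipy.stats import ttest_rel

human_scores   = [3, 4, 2, 5, 3, 4, 2, 3]   # hypothetical rubric scores per essay
machine_scores = [3, 4, 3, 5, 3, 4, 2, 2]   # hypothetical system scores

exact_agreement = sum(h == m for h, m in zip(human_scores, machine_scores)) / len(human_scores)
t_stat, p_value = ttest_rel(human_scores, machine_scores)

print(f"exact agreement: {exact_agreement:.0%}")
print(f"paired t-test: t={t_stat:.2f}, p={p_value:.3f}")  # a large p suggests no significant difference
```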

    Automatic essay scoring for discussion forum in online learning based on semantic and keyword similarities

    Purpose – The authors constructed an automatic essay scoring (AES) model for a discussion forum and compared its results with scores given by human evaluators. This research proposes essay scoring based on two parameters, semantic and keyword similarities, using pre-trained SentenceTransformers models that produce the best vector embeddings. Combining these parameters is used to optimize the model and increase accuracy. Design/methodology/approach – The development of the model in this study is divided into seven stages: (1) data collection, (2) data pre-processing, (3) selection of pre-trained SentenceTransformers models, (4) semantic similarity (sentence pairs), (5) keyword similarity, (6) final score calculation and (7) model evaluation. Findings – The paraphrase-multilingual-MiniLM-L12-v2 and distilbert-base-multilingual-cased-v1 models obtained the highest scores in a comparison of 11 pre-trained multilingual SentenceTransformers models on Indonesian data (Dhini and Girsang, 2023). Both multilingual models were adopted in this study. The combination of the two parameters is obtained by comparing the keyword extraction responses with the rubric keywords. Based on the experimental results, the proposed combination can increase the evaluation results by 0.2. Originality/value – This study uses discussion forum data from a general biology course taught online at the open university in the 2020.2 and 2021.2 semesters. Grading of forum discussions is still manual. In this study, the authors created a model that automatically scores discussion forum posts, which are essays, based on the lecturer's answers and rubrics.
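    A minimal sketch of the two-parameter scoring idea described above: semantic similarity from one of the named SentenceTransformers models combined with keyword overlap against rubric keywords. The equal weighting, the substring keyword matching, and the sample inputs are assumptions for illustration.

```python
# Hedged sketch: semantic similarity + keyword similarity -> final score.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def semantic_similarity(student_answer: str, reference_answer: str) -> float:
    emb = model.encode([student_answer, reference_answer], convert_to_tensor=True)
    return float(util.cos_sim(emb[0], emb[1]))

def keyword_similarity(student_answer: str, rubric_keywords: list[str]) -> float:
    text = student_answer.lower()
    hits = sum(1 for kw in rubric_keywords if kw.lower() in text)
    return hits / max(len(rubric_keywords), 1)

def final_score(student_answer: str, reference_answer: str, rubric_keywords: list[str]) -> float:
    # Equal weighting of the two parameters is an assumption, not the paper's formula.
    return 0.5 * semantic_similarity(student_answer, reference_answer) \
         + 0.5 * keyword_similarity(student_answer, rubric_keywords)

print(final_score("Photosynthesis converts light energy into chemical energy in chloroplasts.",
                  "Plants use light to produce chemical energy through photosynthesis.",
                  ["photosynthesis", "light", "chemical energy"]))
```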

    ASSESSING STUDENT LEARNING WITH AUTOMATED TEXT PROCESSING TECHNIQUES

    Research on distance learning and on computer-aided grading has developed in parallel. Little work has been done in the past to join the two areas to solve the problem of automated learning assessment in virtual classrooms. This paper presents a model for learning assessment that uses an automated text processing technique to analyze class messages, with an emphasis on the course topics produced in an online class. It is suggested that students should be evaluated on many dimensions, including learning artifacts such as submitted course work and class participation. Taking all these grading criteria into consideration, we design a model that combines three grading factors for evaluating the performance of students in the class: the quality of course work, the quantity of effort, and the activeness of participation. These three main items are measured on the basis of keyword contribution, message length, and message count, and a score is derived from the class messages to evaluate students’ performance. An assessment model is then constructed from these three measures to compute a performance indicator score for each student. The experiment shows that there is a high correlation between the performance indicator scores and the actual grades assigned by instructors. The rank orders of students by performance indicator score and by actual grade are highly correlated as well. Evidence from the experiment shows that the computer grader can be a valuable supplementary teaching and grading tool for distance learning instructors.
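    A minimal sketch of the three-factor performance indicator described above, combining keyword contribution, message length, and message count, followed by a rank correlation against instructor grades. The keyword list, the weights, and the sample data are illustrative assumptions.

```python
# Hedged sketch of a three-factor performance indicator for forum messages.
from scipy.stats import spearmanr

course_keywords = {"regression", "overfitting", "validation"}   # hypothetical topic keywords

def performance_indicator(messages: list[str]) -> float:
    words = [w.lower().strip(".,!?") for m in messages for w in m.split()]
    keyword_hits = sum(1 for w in words if w in course_keywords)   # quality of course work
    total_length = len(words)                                       # quantity of effort
    message_count = len(messages)                                   # activeness of participation
    return keyword_hits + 0.1 * total_length + message_count        # weights are illustrative assumptions

students = {
    "A": ["Overfitting shows up when validation error starts rising.", "Try cross validation."],
    "B": ["I agree with the previous post."],
    "C": ["Regression minimizes squared error.", "Validation data estimates generalization.", "Nice point!"],
}
scores = {name: performance_indicator(msgs) for name, msgs in students.items()}
instructor_grades = {"A": 90, "B": 70, "C": 85}                     # hypothetical instructor grades

rho, _ = spearmanr([scores[n] for n in students], [instructor_grades[n] for n in students])
print(scores, "rank correlation with instructor grades:", rho)
```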

    Parallel universes and parallel measures: estimating the reliability of test results


    Robo-Teaching?: Automated Essay Scoring and K-12 Writing Pedagogy

    This paper will examine the current state, and future, of AES in secondary literacy education through a review of current research on the topic. An analysis of the history of assessment will seek to explain why AES systems have gained such popularity within high-stakes assessment, and how the use of AES in secondary-education high-stakes testing affects pedagogy. This paper will also look into the reliability and validity issues that arise when using AES as a means of scoring essays. Finally, this paper explores some ways that AES can be used effectively within the K-12 writing classroom, rather than solely in high-stakes assessments. Ultimately, this paper will assert that AES is not the right tool for use in high-stakes assessments, but can be beneficial within the high school English classroom as a means of formative assessment.

    Using Ontology-Based Approaches to Representing Speech Transcripts for Automated Speech Scoring

    Text representation is the process of transforming text into formats that computer systems can use for subsequent information-related tasks such as text classification. Representing text faces two main challenges: meaningfulness of representation and unknown terms. Research has shown evidence that these challenges can be resolved by using the rich semantics in ontologies. This study aims to address these challenges by using ontology-based representation and unknown-term reasoning approaches in the context of content scoring of speech, which is a less explored area compared to common ones such as categorizing text corpora (e.g. 20 Newsgroups and Reuters). From the perspective of language assessment, the increasing number of language learners taking second language tests makes automatic scoring an attractive alternative to human scoring for delivering rapid and objective scores of written and spoken test responses. This study focuses on the speaking section of second language tests and investigates ontology-based approaches to speech scoring. Most previous automated speech scoring systems for spontaneous responses of test takers assess speech primarily using acoustic features such as fluency and pronunciation, while text features are less involved and exploited. As content is an integral part of speech, the study is motivated by the lack of rich text features in speech scoring and is designed to examine the effects of different text features on scoring performance. A central question of the study is how speech transcript content can be represented in a manner appropriate for speech scoring. Previously used approaches from essay and speech scoring systems include bag-of-words and latent semantic analysis representations, which are adopted as baselines in this study; the experimental approaches are ontology-based, which can help improve the meaningfulness of representation units and estimate the importance of unknown terms. Two general-domain ontologies, WordNet and Wikipedia, are used respectively for the ontology-based representations. In addition to the comparison between representation approaches, the author analyzes which parameter option leads to the best performance within a particular representation. The experimental results show that, on average, ontology-based representations slightly enhance speech scoring performance on all measurements when combined with the bag-of-words representation; reasoning about unknown terms can increase performance on one measurement (cos.w4) but decrease it on others. Due to the small data size, the significance test (t-test) shows that the enhancement of ontology-based representations is inconclusive. The contributions of the study include: 1) it examines the effects of different representation approaches on speech scoring tasks; 2) it enhances the understanding of the mechanisms of representation approaches and their parameter options via in-depth analysis; 3) the representation methodology and framework can be applied to other tasks such as automatic essay scoring.
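    One way such an ontology-enriched representation could look, as a hedged sketch: each transcript token is expanded with its WordNet synonyms so that semantically related transcripts share more representation units. Using only the first synset and a plain whitespace tokenizer are simplifying assumptions, not the thesis's exact method.

```python
# Hedged sketch of a WordNet-enriched bag-of-words representation.
from collections import Counter
from nltk.corpus import wordnet as wn   # requires: nltk.download("wordnet")

def enriched_bow(transcript: str) -> Counter:
    counts = Counter()
    for token in transcript.lower().split():
        counts[token] += 1
        synsets = wn.synsets(token)
        if synsets:                                   # known term: add synonyms from the first synset
            for lemma in synsets[0].lemma_names():
                counts[lemma.lower()] += 1
    return counts

a = enriched_bow("the car accelerates quickly")
b = enriched_bow("the automobile speeds up fast")
print(set(a) & set(b))   # the ontology expansion increases the overlap between the two transcripts
```

    The same idea extends to unknown-term handling: a term absent from the training vocabulary can still contribute through the ontology concepts it maps to.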
