
    Comparative Study of Different Techniques for Automatic Evaluation of English Text Essays

    Automated essay evaluation continues to attract considerable interest because of its educational and commercial importance, as well as the related research challenges in the natural language processing field. Compared with a human evaluator, who requires more time and whose judgment can vary with mood, automated essay evaluation lowers the cost of human resources and delivers results and feedback directly and promptly. This paper focuses on the automated evaluation of English text essays, comparing various algorithms and techniques applied to datasets of different sizes and essays of different lengths; the performance of the algorithms was assessed using several metrics. The results reveal that the performance of each technique is affected by the size of the dataset and the length of the essays. Finally, as a future research direction, building a standard dataset containing different types of question-answer pairs would make it possible to compare the performance of different techniques fairly.
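    The comparisons this survey reports rest on agreement metrics between machine and human scores. Below is a minimal sketch, with invented score arrays, of two metrics widely used for this purpose in automated essay evaluation (and by several papers listed here): Pearson correlation and quadratic weighted kappa (QWK).

```python
# Minimal sketch of common AES evaluation metrics. The score arrays are
# illustrative placeholders, not data from any paper in this listing.
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

human_scores = [2, 3, 4, 4, 1, 5, 3, 2, 4, 5]    # hypothetical human grades
machine_scores = [2, 3, 3, 4, 2, 5, 3, 2, 5, 4]  # hypothetical system grades

r, _ = pearsonr(human_scores, machine_scores)
qwk = cohen_kappa_score(human_scores, machine_scores, weights="quadratic")

print(f"Pearson r: {r:.3f}")
print(f"Quadratic weighted kappa: {qwk:.3f}")
```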

    On the Effects of Combining Latent Semantic Analysis with Other Natural Language Processing Techniques for the Assessment of Open-Ended Questions

    This article presents the combination of Latent Semantic Analysis (LSA) with other natural language processing techniques (stemming, removal of closed-class words, and word sense disambiguation) to improve the automatic assessment of students' free-text answers. The combined schema has been tested in the experimental framework provided by the free-text Computer Assisted Assessment (CAA) system called Atenea (Alfonseca & Pérez, 2004). This system is able to ask the student an open-ended question, chosen randomly or according to the student's profile, and then assign a score to the answer. The results show that for all datasets in which the NLP techniques were combined with LSA, the Pearson correlation between the scores given by Atenea and the scores given by the teachers for the same set of questions improves. We believe that this is due to the complementarity between LSA, which works at a shallow semantic level, and the rest of the NLP techniques used in Atenea, which are more focused on the lexical and syntactic levels.
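    A minimal sketch of the kind of pipeline the abstract describes: answers are preprocessed (here only stopword removal stands in for the combined NLP techniques), projected into an LSA space via truncated SVD, and scored by cosine similarity to a reference answer. The reference answer, student answers, and similarity-to-grade mapping are all invented; this is not Atenea's actual implementation.

```python
# LSA-based free-text answer scoring sketch under the assumptions above.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

reference = "photosynthesis converts light energy into chemical energy in plants"
answers = [
    "plants turn light into chemical energy through photosynthesis",
    "the mitochondria is the powerhouse of the cell",
]

# Stopword removal stands in for the NLP preprocessing combined with LSA.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform([reference] + answers)

# LSA: truncated SVD over the term-document matrix.
lsa = TruncatedSVD(n_components=2, random_state=0)
X_lsa = lsa.fit_transform(X)

# Score each answer by cosine similarity to the reference in LSA space,
# then map similarity to a 0-10 grade (a deliberately simplistic mapping).
sims = cosine_similarity(X_lsa[:1], X_lsa[1:])[0]
for answer, sim in zip(answers, sims):
    print(f"{sim: .2f} -> grade {max(0.0, sim) * 10:.1f} | {answer}")
```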

    An automated essay evaluation system using natural language processing and sentiment analysis

    An automated essay evaluation system is a machine-based approach that leverages a long short-term memory (LSTM) model to award grades to essays written in English. Natural language processing (NLP) is used to extract feature representations from the essays. The LSTM network learns from the extracted features and generates parameters for testing and validation. The main objectives of the research include proposing and training an LSTM model using a dataset of manually graded essays with scores. Sentiment analysis is performed to classify the sentiment of the essay as positive, negative, or neutral. A Twitter sample dataset is used to build the sentiment classifier, which analyzes sentiment based on the student's approach towards a topic. Additionally, each essay is checked for syntactical errors and run through a plagiarism check to assess its novelty. The overall grade is calculated from the quality of the essay, the number of syntactic errors, the percentage of plagiarism found, and the sentiment of the essay. The corrected essay is provided as feedback to the students. This essay grading model achieved an average quadratic weighted kappa (QWK) score of 0.911, with 99.4% accuracy for the sentiment analysis classifier.
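    A minimal sketch of an LSTM score regressor of the sort the abstract describes, using Keras. The vocabulary size, sequence length, architecture, and training data are placeholders, not the paper's model or its dataset of manually graded essays.

```python
# LSTM essay-score regression sketch under the assumptions stated above.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

VOCAB_SIZE, MAX_LEN = 5000, 300  # assumed vocabulary size and essay length

model = Sequential([
    Embedding(VOCAB_SIZE, 64),      # learned word-feature representations
    LSTM(64),                       # encodes the essay as a sequence
    Dense(1, activation="linear"),  # regression head: predicted essay score
])
model.compile(optimizer="adam", loss="mse")

# Hypothetical integer-encoded essays and normalized human scores (0-1).
X = np.random.randint(1, VOCAB_SIZE, size=(32, MAX_LEN))
y = np.random.rand(32)
model.fit(X, y, epochs=1, verbose=0)
```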

    Using Ontology-Based Approaches to Representing Speech Transcripts for Automated Speech Scoring

    Text representation is a process of transforming text into formats that computer systems can use for subsequent information-related tasks such as text classification. Representing text faces two main challenges: meaningfulness of representation and unknown terms. Research has shown evidence that these challenges can be resolved by using the rich semantics in ontologies. This study aims to address these challenges by using ontology-based representation and unknown-term reasoning approaches in the context of content scoring of speech, which is a less explored area compared to common ones such as categorizing text corpora (e.g., 20 Newsgroups and Reuters). From the perspective of language assessment, the increasing number of language learners taking second-language tests makes automatic scoring an attractive alternative to human scoring for delivering rapid and objective scores of written and spoken test responses. This study focuses on the speaking section of second-language tests and investigates ontology-based approaches to speech scoring. Most previous automated speech scoring systems for spontaneous responses of test takers assess speech primarily using acoustic features such as fluency and pronunciation, while text features are less involved and exploited. As content is an integral part of speech, the study is motivated by the lack of rich text features in speech scoring and is designed to examine the effects of different text features on scoring performance. A central question of the study is how speech transcript content can be represented in a means appropriate for speech scoring. Previously used approaches from essay and speech scoring systems include bag-of-words and latent semantic analysis representations, which are adopted as baselines in this study; the experimental approaches are ontology-based, which can help improve the meaningfulness of representation units and estimate the importance of unknown terms. Two general-domain ontologies, WordNet and Wikipedia, are used for the ontology-based representations. In addition to comparing representation approaches, the author analyzes which parameter option leads to the best performance within a particular representation. The experimental results show that, on average, ontology-based representations slightly enhance speech scoring performance on all measurements when combined with the bag-of-words representation; reasoning about unknown terms can increase performance on one measurement (cos.w4) but decrease others. Due to the small data size, the significance test (t-test) shows that the enhancement from ontology-based representations is inconclusive. The contributions of the study include: 1) it examines the effects of different representation approaches on speech scoring tasks; 2) it enhances the understanding of the mechanisms of representation approaches and their parameter options via in-depth analysis; 3) the representation methodology and framework can be applied to other tasks such as automatic essay scoring.
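    A minimal sketch of one ontology-based idea the study investigates: augmenting a bag-of-words transcript representation with WordNet synonyms so that related but unseen terms still contribute to content similarity. The transcripts are invented, and the expansion policy (two senses per token) is an arbitrary choice, not the thesis's method; run nltk.download("wordnet") once beforehand.

```python
# WordNet-augmented bag-of-words similarity sketch, under the
# assumptions stated above.
from nltk.corpus import wordnet as wn
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def expand_with_synonyms(text):
    """Append WordNet synonyms of each token to the token list."""
    tokens = text.lower().split()
    expanded = list(tokens)
    for tok in tokens:
        for syn in wn.synsets(tok)[:2]:  # cap senses per token for brevity
            expanded += [lem.name().replace("_", " ") for lem in syn.lemmas()]
    return " ".join(expanded)

# Invented transcripts: a high-scoring reference and a test response.
reference = "the lecture covered photosynthesis and plant biology"
response = "the talk was about how plants make food from sunlight"

docs = [expand_with_synonyms(reference), expand_with_synonyms(response)]
bow = CountVectorizer().fit_transform(docs)
print("content similarity:", cosine_similarity(bow[0], bow[1])[0, 0])
```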

    DeepEval: An Integrated Framework for the Evaluation of Student Responses in Dialogue Based Intelligent Tutoring Systems

    The automatic assessment of student answers is one of the critical components of an Intelligent Tutoring System (ITS), because accurate assessment of student input is needed in order to provide effective feedback that leads to learning. This is a very challenging task because it requires natural language understanding capabilities. The process requires various components: concept identification, co-reference resolution, ellipsis handling, etc. As part of this thesis, we thoroughly analyzed a set of student responses obtained from an experiment with the intelligent tutoring system DeepTutor, in which college students interacted with the tutor to solve conceptual physics problems; designed an automatic answer assessment framework (DeepEval); and evaluated the framework after implementing several important components. To evaluate our system, we annotated 618 responses from 41 students for correctness. Our system performs better than the typical similarity-calculation method. We also discuss various issues in automatic answer evaluation.
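    A minimal sketch of the "typical similarity calculation method" the framework is compared against: grade a student response by its TF-IDF cosine similarity to a reference answer and threshold the result into correct vs. incorrect. The physics answers and the 0.5 threshold are invented.

```python
# Baseline similarity-based answer assessment sketch; all specifics
# (texts, threshold) are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reference = "the net force on the ball is zero so its velocity stays constant"
student = "since no net force acts on it the ball keeps moving at the same speed"

tfidf = TfidfVectorizer().fit([reference, student])
vecs = tfidf.transform([reference, student])
sim = cosine_similarity(vecs[0], vecs[1])[0, 0]

THRESHOLD = 0.5  # assumed decision boundary
verdict = "correct" if sim >= THRESHOLD else "incorrect"
print(f"similarity: {sim:.3f} -> {verdict}")
```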

    Automatic coding of short text responses via clustering in educational assessment

    Automatic coding of short text responses opens new doors in assessment. We implemented and integrated baseline methods of natural language processing and statistical modelling by means of software components that are available under open licenses. The accuracy of automatic text coding is demonstrated using data collected in the Programme for International Student Assessment (PISA) 2012 in Germany. Free-text responses to 10 items, with Formula responses in total, were analyzed. We further examined the effect of different methods, parameter values, and sample sizes on the performance of the implemented system. The system reached fair-to-good, and up to excellent, agreement with human codings (Formula). Especially items that are solved by naming specific semantic concepts appeared to be properly coded. The system performed equally well with Formula, and somewhat poorer, but still acceptably, down to Formula. Based on our findings, we discuss potential innovations for assessment that are enabled by automatic coding of short text responses. (DIPF/Orig.)
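    A minimal sketch of the kind of pipeline the abstract outlines: cluster short free-text responses with TF-IDF and k-means, assign each cluster the majority human code, and measure agreement with the human codings via Cohen's kappa. The responses and codes are invented, not PISA data, and the actual system's components differ.

```python
# Clustering-based automatic coding sketch under the assumptions above.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import cohen_kappa_score

responses = [
    "the graph shows prices rising", "prices go up over time",
    "it is about temperature", "temperature changes are shown",
]
human_codes = np.array([1, 1, 0, 0])  # hypothetical correct/incorrect codes

X = TfidfVectorizer().fit_transform(responses)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Assign each cluster the majority human code among its members.
predicted = np.empty_like(human_codes)
for c in np.unique(clusters):
    members = clusters == c
    predicted[members] = np.bincount(human_codes[members]).argmax()

print("kappa vs. human codings:", cohen_kappa_score(human_codes, predicted))
```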

    Review of Handbook Of Automated Essay Evaluation: Current Applications And New Directions


    Optimization of Window Size for Calculating Semantic Coherence Within an Essay

    Over the last fifty years, as the field of automated essay evaluation has progressed, several approaches have been proposed. Automated essay evaluation focuses primarily on three aspects: style, content, and semantics. The style and content attributes have received the most attention, while the semantics attribute has received less. To measure semantics, a smaller portion of the essay (a window) is chosen, and the essay is broken into smaller segments using this window. The goal of this work is to determine an appropriate window size for measuring semantic coherence between different parts of the essay with greater precision.
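    A minimal sketch of window-based semantic coherence: split an essay into fixed-size sentence windows, represent each window with TF-IDF, and average the cosine similarity of adjacent windows; sweeping the window size mirrors the optimization the paper describes. The essay text is a placeholder, and TF-IDF stands in for whatever semantic representation the paper actually uses.

```python
# Sliding-window coherence sketch under the assumptions stated above.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def coherence(sentences, window_size):
    """Mean cosine similarity between adjacent sentence windows."""
    windows = [" ".join(sentences[i:i + window_size])
               for i in range(0, len(sentences), window_size)]
    if len(windows) < 2:
        return float("nan")
    X = TfidfVectorizer().fit_transform(windows)
    sims = [cosine_similarity(X[i], X[i + 1])[0, 0]
            for i in range(len(windows) - 1)]
    return float(np.mean(sims))

essay = ("Solar power is renewable. It reduces emissions. Panels are cheap now. "
         "Wind power also helps. Turbines need open land. Both cut fossil fuel use.")
sentences = [s.strip() for s in essay.split(".") if s.strip()]

for w in (1, 2, 3):  # candidate window sizes to compare
    print(f"window size {w}: coherence {coherence(sentences, w):.3f}")
```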