
    Development of an Automated Scoring Model Using SentenceTransformers for Discussion Forums in Online Learning Environments

    Due to the limited availability of public datasets, research on automatic essay scoring in Indonesian has been restrained and has yielded suboptimal accuracy. The main goal of an essay scoring system is to reduce grading time, since grading is usually done manually by human judges. This study uses a discussion forum in online learning to generate automated essay scores by comparing student responses with the lecturer's rubric. A pre-trained SentenceTransformers model, capable of producing high-quality sentence embeddings, was proposed to capture the semantic similarity between the responses and the lecturer's rubric, and the effectiveness of monolingual and multilingual models was compared. This research aims to determine each model's effectiveness and to identify the most appropriate model for Automated Essay Scoring (AES) framed as a paired-sentence Natural Language Processing task. The distiluse-base-multilingual-cased-v1 model, evaluated with the Pearson correlation method, obtained the highest performance: a correlation of 0.63 and a mean absolute error (MAE) of 0.70, indicating that the overall predictions improve on earlier regression-task research.
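    A minimal sketch of this scoring setup, using the sentence-transformers library and the distiluse-base-multilingual-cased-v1 model named above; the rubric, the response, and the mapping from cosine similarity to a grade are illustrative assumptions, not the authors' data or scoring scale.

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

# The multilingual model reported as best-performing in the abstract.
model = SentenceTransformer("distiluse-base-multilingual-cased-v1")

def score_response(response: str, rubric: str, max_score: float = 100.0) -> float:
    """Score a forum response by semantic similarity to the lecturer's rubric.

    The similarity-to-grade mapping below is an assumption for illustration;
    the paper evaluates predictions against human grades with Pearson r and MAE.
    """
    embeddings = model.encode([response, rubric], convert_to_tensor=True)
    similarity = util.cos_sim(embeddings[0], embeddings[1]).item()  # in [-1, 1]
    return max(0.0, similarity) * max_score

# Hypothetical rubric and student response.
rubric = "The answer should define overfitting and name one way to mitigate it."
response = "Overfitting means the model memorises training data; dropout can help."
print(f"predicted score: {score_response(response, rubric):.1f}")
```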

    Automatic Short Answer Grading Using Transformers

    Assessment of short natural language answers is a prevailing trend in any educational environment. It helps teachers better understand the successes and failures of their students. Other question types, such as multiple-choice or fill-in-the-gap questions, do not provide adequate clues for exhaustively evaluating students' proficiency. They are nevertheless common means of student evaluation, especially in Massive Open Online Courses (MOOCs), largely because they are easy to grade automatically. By contrast, understanding and manually marking short answers is a more challenging and time-consuming task, especially as the number of students in a class grows. Automatic Short Answer Grading, usually abbreviated ASAG, is a solution well suited to this problem. In this thesis, we concentrate on classification-based ASAG with nominal grades such as correct or incorrect. We propose a reference-based approach built on a deep learning model, trained on four state-of-the-art ASAG datasets, namely SemEval-2013 (SciEntBank and BEETLE), Dt-grade, and a Biology dataset. Our approach is based on the BERT Base (cased and uncased) and XLNET Base (cased) models. Our secondary analysis examines how GLUE (General Language Understanding Evaluation) tasks, such as question answering, entailment, paraphrase identification, and semantic textual similarity (STS) analysis, strengthen the ASAG task on the SciEntBank dataset. We show that transformer-based language models such as BERT and XLNET outperform or equal state-of-the-art feature-based approaches. We further show that the performance of our BERT model increases substantially when it is first fine-tuned on an entailment task such as the GLUE MNLI dataset and then on the ASAG task, compared with the other GLUE models.
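    A sketch of the reference-based setup described above, assuming a Hugging Face-style sentence-pair classifier: the reference answer and the student answer are fed to BERT as a pair and classified into nominal grades. The bert-base-cased checkpoint, the fresh two-label head, and the example pair are illustrative assumptions; the thesis additionally fine-tunes on MNLI before the ASAG task.

```python
# pip install transformers torch
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# bert-base-cased stands in for the thesis's BERT variants; a checkpoint first
# fine-tuned on an entailment task (e.g. MNLI) would be loaded the same way.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=2  # nominal grades: 0 = incorrect, 1 = correct
)

reference = "Photosynthesis converts light energy into chemical energy."
student = "Plants turn sunlight into chemical energy that they can store."

# Encode the (reference answer, student answer) pair; the tokenizer joins the
# two segments with [SEP] so BERT can compare them directly.
inputs = tokenizer(reference, student, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print("predicted grade:", logits.argmax(dim=-1).item())
```

    In practice the classification head would first be trained on labeled ASAG pairs; the untrained head here produces arbitrary output and only illustrates the input/output shape of the approach.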

    Advancement Auto-Assessment of Students' Knowledge States from Natural Language Input

    Knowledge assessment is a key element in adaptive instructional systems, and in Intelligent Tutoring Systems (ITSs) in particular, because fully adaptive tutoring presupposes accurate assessment. It is a challenging research problem, however, as numerous factors affect the estimation of a student's knowledge state, such as the difficulty level of a problem or the time spent solving it. In this work, we tackle the problem from three perspectives: assessing students' prior knowledge, assessing students' natural language short and long responses, and knowledge tracing.

    Prior knowledge assessment is an important component of knowledge assessment, as it facilitates adapting the instruction from the very beginning, i.e., when the student starts interacting with the (computer) tutor. Grouping students into groups with similar mental models and patterns of prior knowledge allows the system to select the right level of scaffolding for each group. While this does not adapt instruction to each individual learner, adapting to groups of students based on a limited number of prior knowledge levels decreases the authoring costs of the tutoring system. To identify or cluster students based on their prior knowledge, we have employed effective clustering algorithms, as sketched below.

    Automatically assessing open-ended student responses is another challenging aspect of knowledge assessment in ITSs. In dialogue-based ITSs, the main interaction between the learner and the system is natural language dialogue, in which students freely respond to various system prompts or initiate dialogue moves in mixed-initiative dialogue systems. Assessing freely generated student responses in such contexts is challenging because students can express the same idea in different ways, owing to different individual style preferences and varied cognitive abilities. To address this task, we have proposed several novel deep learning models, as they are capable of capturing rich, high-level semantic features of text.

    Knowledge tracing (KT) is an important type of knowledge assessment that consists of tracking students' mastery of knowledge over time and predicting their future performance. Despite the state-of-the-art results of deep learning on this task, existing methods have many limitations; for instance, most ignore pertinent information (e.g., prior knowledge) that could enhance knowledge tracing. Working toward this objective, we have proposed a generic deep learning framework that accounts for students' engagement level, question difficulty, and question semantics, and uses a time series model called the Temporal Convolutional Network to predict future performance.

    The auto-assessment methods presented in this dissertation should enable better estimates of learners' knowledge states and, in turn, better adaptive scaffolding, leading to more effective tutoring and better learning gains. They should also enable more scalable development and deployment of ITSs across topics and domains, for the benefit of learners of all ages and backgrounds.
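    A minimal sketch of the prior-knowledge grouping step, with k-means standing in for the unspecified "effective clustering algorithms"; the random pretest matrix and the choice of three knowledge levels are illustrative assumptions, not the dissertation's data or configuration.

```python
# pip install scikit-learn numpy
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical pretest data: one row per student, one column per pretest item.
rng = np.random.default_rng(0)
pretest = rng.uniform(0.0, 1.0, size=(30, 5))

# Cluster students into three prior-knowledge groups (an assumed number);
# each group would then receive its own level of scaffolding.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(pretest)
for group in range(3):
    members = np.where(kmeans.labels_ == group)[0]
    print(f"group {group}: {len(members)} students, "
          f"mean pretest score {pretest[members].mean():.2f}")
```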

    The Core Technology behind and beyond ChatGPT: A Comprehensive Review of Language Models in Educational Research

    ChatGPT has garnered significant attention within the education industry. Given that the core technology behind ChatGPT is the language model, this study critically reviews related publications and suggests future directions for language models in educational research. We address three questions: i) what is the core technology behind ChatGPT, ii) what is the state of knowledge of related research, and iii) what are the potential research directions. A critical review of related publications was conducted to evaluate the current state of knowledge of language models in educational research. In addition, we suggest a purpose-oriented guiding framework for future research on language models in education. Our study promptly responds to the concerns that ChatGPT has raised in the education industry and offers the industry a comprehensive and systematic overview of the related technologies. We believe this is the first study to systematically review the state of knowledge of language models in educational research.

    Inspecting Spoken Language Understanding from Kids for Basic Math Learning at Home

    Enriching the quality of early childhood education with interactive math-learning-at-home systems, empowered by recent advances in conversational AI technologies, is slowly becoming a reality. With this motivation, we implement a multimodal dialogue system to support play-based learning experiences at home, guiding kids to master basic math concepts. This work explores the Spoken Language Understanding (SLU) pipeline within a task-oriented dialogue system developed for Kid Space, with cascading Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU) components evaluated on our home-deployment data of kids going through gamified math learning activities. We validate the advantages of a multi-task architecture for NLU and experiment with a diverse set of pretrained language representations for Intent Recognition and Entity Extraction tasks in the math learning domain. To recognize kids' speech in realistic home environments, we investigate several ASR systems, including the commercial Google Cloud and the latest open-source Whisper solutions with varying model sizes. We evaluate the SLU pipeline by testing our best-performing NLU models on noisy ASR output to inspect the challenges of understanding children for math learning in authentic homes.
    Comment: Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA) at ACL 2023
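    A sketch of the cascading SLU pipeline evaluated above: open-source Whisper transcribes a child's utterance, and the text is passed to an NLU stage. The audio file and the keyword-rule intent classifier are placeholders for illustration only; the paper's multi-task NLU models and home-deployment recordings are not public, so nothing here reproduces its actual components.

```python
# pip install openai-whisper
import whisper

# "base" is one of the Whisper model sizes compared in the paper.
asr_model = whisper.load_model("base")

def classify_intent(utterance: str) -> str:
    """Placeholder for the paper's multi-task NLU (intent + entity) model;
    a trivial keyword rule stands in purely for illustration."""
    has_number = any(tok.isdigit() for tok in utterance.split())
    return "answer_number" if has_number else "other"

# Hypothetical recording of a kid answering a math prompt at home.
result = asr_model.transcribe("kid_answer.wav")
text = result["text"]
print("ASR output:", text)
print("intent:", classify_intent(text))
```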