
    DeepEval: An Integrated Framework for the Evaluation of Student Responses in Dialogue Based Intelligent Tutoring Systems

    The automatic assessment of student answers is one of the critical components of an Intelligent Tutoring System (ITS), because accurate assessment of student input is needed in order to provide effective feedback that leads to learning. It is also a very challenging task, because it requires natural language understanding capabilities: the process involves several components, including concept identification, co-reference resolution, and ellipsis handling. As part of this thesis, we thoroughly analyzed a set of student responses obtained from an experiment with the intelligent tutoring system DeepTutor, in which college students interacted with the tutor to solve conceptual physics problems; designed an automatic answer assessment framework (DeepEval); and evaluated the framework after implementing several important components. To evaluate our system, we annotated 618 responses from 41 students for correctness. Our system performs better than the typical similarity calculation method. We also discuss various issues in automatic answer evaluation.
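
    To make the baseline concrete, here is a minimal sketch of the kind of "typical similarity calculation" such a framework is compared against: scoring a student response by its cosine similarity to a reference answer, with a tuned threshold deciding correctness. The TF-IDF representation, the 0.5 threshold, and the example answers are illustrative assumptions, not DeepEval's actual pipeline, which adds components such as co-reference resolution and ellipsis handling.

    ```python
    # Baseline sketch: similarity between a student response and a reference
    # answer decides correctness. Threshold and examples are assumptions.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def similarity_score(student_answer: str, reference_answer: str) -> float:
        """Cosine similarity between TF-IDF vectors of the two answers."""
        vectorizer = TfidfVectorizer()
        vectors = vectorizer.fit_transform([student_answer, reference_answer])
        return cosine_similarity(vectors[0], vectors[1])[0, 0]

    def is_correct(student_answer: str, reference_answer: str,
                   threshold: float = 0.5) -> bool:
        """Label a response correct if its similarity clears a tuned threshold."""
        return similarity_score(student_answer, reference_answer) >= threshold

    if __name__ == "__main__":
        ref = "The net force on the ball is gravity acting downward."
        print(is_correct("Gravity is the only force, pointing down.", ref))
    ```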

    Exploring Generative AI assisted feedback writing for students' written responses to a physics conceptual question with prompt engineering and few-shot learning

    Instructor feedback plays a critical role in students' development of conceptual understanding and reasoning skills. However, grading student written responses and providing personalized feedback can take a substantial amount of time. In this study, we explore using GPT-3.5 to write feedback on student written responses to conceptual questions, using prompt engineering and few-shot learning techniques. In stage one, we used a small portion (n=20) of the student responses to one conceptual question to iteratively refine our prompting of GPT. Four of the responses, paired with human-written feedback, were included in the prompt as examples for GPT. We tasked GPT with generating feedback for the other 16 responses, and we refined the prompt over several iterations. In stage two, we gave four student researchers the 16 responses along with two versions of feedback, one written by the authors and the other by GPT. Students were asked to rate the correctness and usefulness of each feedback and to indicate which one was generated by GPT. The results showed that students tended to rate the human and GPT feedback equally on correctness, but they all rated the GPT feedback as more useful. Additionally, the success rates of identifying GPT's feedback were low, ranging from 0.1 to 0.6. In stage three, we tasked GPT with generating feedback for the rest of the student responses (n=65). The feedback was rated by four instructors based on the extent of modification needed if they were to give it to students. All the instructors rated approximately 70% of the feedback statements as needing only minor or no modification. This study demonstrates the feasibility of using Generative AI as an assistant for generating feedback on student written responses from only a relatively small number of examples. AI assistance can be one solution for substantially reducing the time spent grading student written responses.
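
    As a rough illustration of the stage-one setup, the sketch below assembles a few-shot prompt from (response, human feedback) example pairs and asks GPT-3.5 for feedback on a new response via the OpenAI chat API. The question, example pairs, and prompt wording are hypothetical stand-ins, not the authors' actual prompt.

    ```python
    # Hedged sketch of few-shot feedback generation with GPT-3.5.
    # Question, examples, and prompt wording are illustrative assumptions.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    SYSTEM_PROMPT = (
        "You are a physics instructor. Given a student's written response to "
        "the conceptual question below, write brief, encouraging feedback "
        "that addresses the correctness of their reasoning.\n"
        "Question: A ball is thrown straight up. What forces act on it at the top?"
    )

    # Few-shot examples pairing student responses with human-written feedback.
    FEW_SHOT = [
        ("Gravity and the force of the throw.",
         "Good start identifying gravity, but the 'force of the throw' ends "
         "at release; at the top only gravity (and negligible drag) acts."),
        ("Only gravity acts on it.",
         "Correct! After release the only significant force is gravity."),
    ]

    def generate_feedback(student_response: str) -> str:
        messages = [{"role": "system", "content": SYSTEM_PROMPT}]
        for response, feedback in FEW_SHOT:
            messages.append({"role": "user", "content": response})
            messages.append({"role": "assistant", "content": feedback})
        messages.append({"role": "user", "content": student_response})
        completion = client.chat.completions.create(
            model="gpt-3.5-turbo", messages=messages, temperature=0.7
        )
        return completion.choices[0].message.content

    print(generate_feedback("The throw force is still pushing it up at the top."))
    ```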

    Semantic Matching Evaluation: Optimizing Models for Agreement Between Humans and AutoTutor

    The goal of this thesis is to evaluate the answers that students give to questions asked by an intelligent tutoring system (ITS) on electronics, called ElectronixTutor. One learning resource of ElectronixTutor is AutoTutor, an instructional module that helps students learn by holding a conversation in natural language. The semantic relatedness between a student’s verbal input and an ideal answer is a salient feature for assessing the student’s performance in AutoTutor, and inaccurate assessment of these verbal contributions creates problems in AutoTutor’s adaptation to the student. Therefore, this thesis evaluated the quality of semantic matches between student input and the expected responses in AutoTutor, which assesses student verbal input with a combination of Latent Semantic Analysis (LSA) and Regular Expressions (RegEx). Analyzing response-expectation pairings and comparing computer scoring with human judge ratings allowed us to examine human-computer agreement both overall and on a per-item basis. Aggregate analyses of these data let us observe the overall agreement between subject-matter experts and the AutoTutor system, while item analyses let us observe variation between items and interactions between human and computer assessment conditions at various threshold levels (i.e., stringent, intermediate, lenient). As expected, the RegEx and LSA scores showed a positive relationship, ρ(5202) = .471. Additionally, F1 agreement (the harmonic mean of precision and recall) between the computer and humans was similar to agreement between humans; in some cases, the computer-human and human-human F1 scores differed by as little as .006.
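
    A highly simplified sketch of the two signals being evaluated: a RegEx match against key phrasing, plus an LSA cosine similarity (SVD over a TF-IDF term-document matrix) between student input and an expectation, gated by a threshold. The tiny corpus, pattern, and threshold values below are assumptions; AutoTutor's actual LSA space and thresholds differ.

    ```python
    # Simplified RegEx + LSA semantic match, in the spirit of AutoTutor's
    # assessment signals. Corpus, pattern, and thresholds are assumptions.
    import re
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD

    # LSA = SVD over a term-document matrix; a tiny stand-in corpus here.
    corpus = [
        "current flows from the positive terminal through the resistor",
        "voltage across parallel resistors is equal",
        "the capacitor charges until it reaches the supply voltage",
    ]
    tfidf = TfidfVectorizer().fit(corpus)
    lsa = TruncatedSVD(n_components=2, random_state=0).fit(tfidf.transform(corpus))

    def lsa_similarity(student: str, expectation: str) -> float:
        a, b = lsa.transform(tfidf.transform([student, expectation]))
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    def regex_match(student: str, pattern: str) -> bool:
        return re.search(pattern, student, re.IGNORECASE) is not None

    # Cutoffs analogous to the lenient / intermediate / stringent levels.
    THRESHOLDS = {"lenient": 0.3, "intermediate": 0.5, "stringent": 0.7}

    def matches(student, expectation, pattern, level="intermediate"):
        return regex_match(student, pattern) or \
            lsa_similarity(student, expectation) >= THRESHOLDS[level]

    print(matches("current goes through the resistor from plus",
                  "current flows from the positive terminal through the resistor",
                  r"current .* (through|across) .* resistor"))
    ```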

    Advancement Auto-Assessment of Students' Knowledge States from Natural Language Input

    Knowledge assessment is a key element in adaptive instructional systems, and in Intelligent Tutoring Systems in particular, because fully adaptive tutoring presupposes accurate assessment. However, this is a challenging research problem, as numerous factors affect the estimation of a student's knowledge state, such as the difficulty level of the problem and the time spent solving it. In this research work, we tackle the problem from three perspectives: assessing students' prior knowledge, assessing short and long natural language student responses, and knowledge tracing.

    Prior knowledge assessment is an important component of knowledge assessment, as it facilitates the adaptation of the instruction from the very beginning, i.e., when the student starts interacting with the (computer) tutor. Grouping students into groups with similar mental models and patterns of prior knowledge allows the system to select the right level of scaffolding for each group. While not adapting instruction to each individual learner, adapting to groups of students based on a limited number of prior knowledge levels has the advantage of decreasing the authoring costs of the tutoring system. To identify or cluster students based on their prior knowledge, we have employed effective clustering algorithms (see the sketch after this abstract).

    Automatically assessing open-ended student responses is another challenging aspect of knowledge assessment in ITSs. In dialogue-based ITSs, the main interaction between the learner and the system is natural language dialogue, in which students freely respond to various system prompts or initiate dialogue moves in mixed-initiative dialogue systems. Assessing freely generated student responses in such contexts is challenging, as students can express the same idea in different ways owing to different individual style preferences and varied cognitive abilities. To address this task, we have proposed several novel deep learning models, as they are capable of capturing rich, high-level semantic features of text.

    Knowledge tracing (KT) is an important type of knowledge assessment that consists of tracking students' mastery of knowledge over time and predicting their future performance. Despite the state-of-the-art results of deep learning on this task, existing approaches have many limitations. For instance, most of the proposed methods ignore pertinent information (e.g., prior knowledge) that could enhance knowledge tracing capability and performance. Working toward this objective, we have proposed a generic deep learning framework that accounts for students' engagement level, question difficulty, and question semantics, and uses a time series model called the Temporal Convolutional Network for future performance prediction.

    The advanced auto-assessment methods presented in this dissertation should enable better estimates of learners' knowledge states and, in turn, better adaptive scaffolding, which should lead to more effective tutoring and better learning gains for students. Furthermore, the proposed methods should enable more scalable development and deployment of ITSs across topics and domains, for the benefit of learners of all ages and backgrounds.
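
    As a minimal sketch of the prior-knowledge grouping step, the snippet below clusters students by their pretest item scores with k-means, so that a tutor could select scaffolding per group. The binary score encoding, k=3, and the data are illustrative assumptions; the dissertation evaluates clustering algorithms more broadly.

    ```python
    # Sketch: group students into prior-knowledge clusters from pretest scores.
    # Encoding, k=3, and data are illustrative assumptions.
    import numpy as np
    from sklearn.cluster import KMeans

    # Each row: one student's pretest item scores (1 = correct, 0 = incorrect).
    pretest = np.array([
        [1, 1, 1, 0, 1],
        [1, 1, 0, 1, 1],
        [0, 0, 1, 0, 0],
        [0, 1, 0, 0, 0],
        [1, 0, 0, 0, 1],
        [0, 0, 0, 1, 0],
    ])

    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(pretest)
    for student, group in enumerate(kmeans.labels_):
        print(f"student {student} -> prior-knowledge group {group}")
    ```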

    TOWARDS BUILDING INTELLIGENT COLLABORATIVE PROBLEM SOLVING SYSTEMS

    Historically, Collaborative Problem Solving (CPS) systems have focused mostly on Human-Computer Interaction (HCI) issues, such as providing a good communication experience among participants. Intelligent Tutoring Systems (ITS), in contrast, focus on HCI issues as well as on leveraging Artificial Intelligence (AI) techniques in their intelligent agents. This dissertation seeks to narrow the gap between CPS systems and ITSs by adopting methods used in ITS research. To move towards this goal, we focus on analyzing interactions with textual input in online learning systems such as DeepTutor and Virtual Internships (VI) to understand their semantics and underlying intents.

    To address the problem of assessing student-generated short text, this research first explores data-driven machine learning models coupled with expert-generated as well as general text analysis features. Second, it explores a method that utilizes knowledge graph embeddings for assessing student answers in an ITS. Finally, it explores a method using only standard reference examples generated by a human teacher; such a method is useful when a new system has just been deployed and no student data are yet available.

    To handle negation in tutorial dialogue, this research explored a Long Short-Term Memory (LSTM) based method. The advantage of this method is that it requires no human-engineered features and performs comparably to other models that use them.

    Another important analysis in this research was identifying the speech acts in the conversation utterances of multiple players in VI. Among various models, a neural network model trained with noisy labels performed best at categorizing the speech acts of the utterances.

    Learners' professional skill development in VI is characterized by the distribution of SKIVE elements, the components of epistemic frames. Inferring the population distribution of these elements can help to assess learners' skill development. This research used a Markov method to infer the population distribution of SKIVE elements, namely the stationary distribution of the elements.

    While studying these various aspects of interaction in our targeted learning systems, our broader aim is to replace the human mentor or tutor with an intelligent agent. Introducing an intelligent agent in place of a human helps to reduce cost as well as scale up the system.
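
    The stationary-distribution step lends itself to a short worked example: given a transition matrix over SKIVE elements, the stationary distribution is the left eigenvector of the matrix with eigenvalue 1, normalized to sum to 1. The 3x3 matrix below is an illustrative assumption, not data from VI.

    ```python
    # Sketch: stationary distribution of a Markov chain over SKIVE elements.
    # The transition matrix is an illustrative assumption.
    import numpy as np

    # Rows sum to 1: P[i, j] = probability of moving from element i to element j.
    P = np.array([
        [0.6, 0.3, 0.1],
        [0.2, 0.5, 0.3],
        [0.1, 0.4, 0.5],
    ])

    eigenvalues, eigenvectors = np.linalg.eig(P.T)  # left eigenvectors of P
    idx = np.argmin(np.abs(eigenvalues - 1.0))      # eigenvalue closest to 1
    stationary = np.real(eigenvectors[:, idx])
    stationary /= stationary.sum()                  # normalize to a distribution

    print("stationary distribution:", stationary)   # pi satisfies pi @ P == pi
    assert np.allclose(stationary @ P, stationary)
    ```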

    Composing Measures for Computing Text Similarity

    We present a comprehensive study of computing similarity between texts. We start from the observation that while the concept of similarity is well grounded in psychology, text similarity is much less well-defined in the natural language processing community. We thus define the notion of text similarity and distinguish it from related tasks such as textual entailment and near-duplicate detection. We then identify multiple text dimensions, i.e., characteristics inherent to texts that can be used to judge text similarity, for which we provide empirical evidence. We discuss state-of-the-art text similarity measures previously proposed in the literature before continuing with a thorough discussion of common evaluation metrics and datasets. Based on this analysis, we devise an architecture which combines text similarity measures in a unified classification framework. We apply our system in two evaluation settings, in which it consistently outperforms prior work and competing systems: (a) an intrinsic evaluation in the context of the Semantic Textual Similarity Task of the Semantic Evaluation (SemEval) exercises, and (b) an extrinsic evaluation for the detection of text reuse. As a basis for future work, we introduce DKPro Similarity, an open source software package which streamlines the development of text similarity measures and complete experimental setups.
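
    To illustrate the composition idea, the sketch below treats each text similarity measure as one feature and lets a classifier learn how to combine them. The two toy measures, the tiny training set, and the logistic regression are stand-ins; DKPro Similarity implements the actual measures and pipelines in Java, and the paper's system uses a much richer feature set.

    ```python
    # Sketch: compose similarity measures as features of a learned classifier.
    # Measures, data, and model choice are illustrative stand-ins.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def jaccard(a: str, b: str) -> float:
        """Content-dimension stand-in: word overlap."""
        sa, sb = set(a.lower().split()), set(b.lower().split())
        return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

    def length_ratio(a: str, b: str) -> float:
        """Structure-dimension stand-in: relative length."""
        return min(len(a), len(b)) / max(len(a), len(b), 1)

    pairs = [("a cat sat on the mat", "the cat is on the mat", 1),
             ("a cat sat on the mat", "stock markets fell sharply", 0),
             ("he bought a used car", "he purchased a used car", 1),
             ("he bought a used car", "rain is expected tonight", 0)]

    X = np.array([[jaccard(a, b), length_ratio(a, b)] for a, b, _ in pairs])
    y = np.array([label for _, _, label in pairs])

    clf = LogisticRegression().fit(X, y)  # learns weights over the measures
    print(clf.predict([[jaccard("dogs bark loudly", "the dog barked"), 0.8]]))
    ```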