Search CORE

9 research outputs found

Introducing a framework to assess newly created questions with Natural Language Processing

Author: A Abyaa
CD Manning
GH Mc Laughlin
J Bergstra
J Verhagen
R Flesch
RC Atkinson
RK Hambleton
X Wang
Y Mao
Z Huang
Publication date: 01/01/2020

Statistical models such as those derived from Item Response Theory (IRT) enable the assessment of students on a specific subject, which can be useful for several purposes (e.g., learning path customization, drop-out prediction). However, the questions have to be assessed as well and, although IRT can estimate the characteristics of questions that have already been answered by several students, it cannot be used on newly generated questions. In this paper, we propose a framework to train and evaluate models for estimating the difficulty and discrimination of newly created Multiple Choice Questions by extracting meaningful features from the text of the question and of the possible choices. We implement one model using this framework and test it on a real-world dataset provided by CloudAcademy, showing that it outperforms previously proposed models, reducing the RMSE by 6.7% for difficulty estimation and by 10.8% for discrimination estimation. We also present the results of an ablation study performed to support our choice of features and to show the effects of different characteristics of the questions' text on difficulty and discrimination. Comment: Accepted at the International Conference of Artificial Intelligence in Education.

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Politecnico di Milano

Knowledge Tracing: A Review of Available Technologies

Author: Dai Miao
Du Xu
Hung Jui-Long
Li Hao
Tang Hengtao
Publication venue: The Aquila Digital Community
Publication date: 01/10/2021

As a student modeling technique, knowledge tracing is widely used by intelligent tutoring systems to infer and trace an individual's knowledge state during the learning process. In recent years, various models have been proposed to obtain accurate and easy-to-interpret results. To make sense of the wide knowledge tracing (KT) modeling landscape, this paper conducts a systematic review that provides a detailed and nuanced discussion of relevant KT techniques from the perspective of assumptions, data, and algorithms. The results show that most existing KT models consider only a fragment of the assumptions that relate to the knowledge components within items and to students' cognitive processes. Almost all types of KT models take "quiz data" as input, although such data are insufficient to give a clear picture of students' learning process. Dynamic Bayesian networks, logistic regression, and deep learning are the main algorithms used by the various knowledge tracing models. Based on an analysis of the reviewed works, some open issues are identified and potential future research directions are discussed.

Aquila Digital Community (University of Southern Mississippi, USM)

R2DE: a NLP approach to estimating IRT parameters of newly generated questions

Author: Chen Penghe
Ding Xinyi
Huang Zhenya
Manning Christopher
Su Yu
Wang Zhiwei
Yaneva Victoria
Publication date: 01/01/2020

The main objective of exams is to assess students' expertise on a specific subject. Such expertise, also referred to as skill or knowledge level, can then be leveraged in different ways (e.g., to assign a grade to the students, to understand whether a student might need some support, etc.). Similarly, the questions appearing in the exams have to be assessed in some way before being used to evaluate students. Standard approaches to question assessment are either subjective (e.g., assessment by human experts) or introduce a long delay in the process of question generation (e.g., pretesting with real students). In this work we introduce R2DE (a Regressor for Difficulty and Discrimination Estimation), a model capable of assessing newly generated multiple-choice questions by looking at the text of the question and the text of the possible choices. In particular, it can estimate the difficulty and the discrimination of each question, as they are defined in Item Response Theory. We also present the results of extensive experiments carried out on a real-world, large-scale dataset coming from an e-learning platform, showing that our model can be used to perform an initial assessment of newly created questions and ease some of the problems that arise in question generation.
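For readers unfamiliar with the two parameters named in this abstract, the following is a minimal sketch of the standard two-parameter logistic (2PL) item response function from Item Response Theory. It is an illustration only: the abstract does not state which IRT variant the authors use, and the function and variable names here are hypothetical.

```python
import math


def prob_correct(theta: float, difficulty: float, discrimination: float) -> float:
    """2PL item response function (assumed variant, not taken from the paper).

    theta          -- student ability
    difficulty     -- ability level at which P(correct) is 0.5
    discrimination -- slope of the curve around the difficulty
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


# A student whose ability equals the item's difficulty answers correctly
# with probability 0.5, whatever the discrimination.
p_equal = prob_correct(theta=0.0, difficulty=0.0, discrimination=1.5)

# For a student above the difficulty, a more discriminating item yields a
# higher probability of a correct answer (the curve is steeper).
p_flat = prob_correct(theta=1.0, difficulty=0.0, discrimination=0.5)
p_steep = prob_correct(theta=1.0, difficulty=0.0, discrimination=2.0)
```

Under this model, estimating difficulty and discrimination from question text amounts to predicting the two curve parameters before any student responses exist.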

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

Crossref

A Systematic Review of Deep Learning Approaches to Educational Data Mining

Author: Hernández-Blanco Antonio
Herrera-Flores Boris
Navarro Colorado Borja
Tomás David
Publication venue: Hindawi Limited
Publication date: 01/01/2019

Educational Data Mining (EDM) is a research field that focuses on the application of data mining, machine learning, and statistical methods to detect patterns in large collections of educational data. Different machine learning techniques have been applied in this field over the years, but only recently has Deep Learning gained increasing attention in the educational domain. Deep Learning is a machine learning method based on neural network architectures with multiple layers of processing units, which has been successfully applied to a broad set of problems in the areas of image recognition and natural language processing. This paper surveys the research carried out on Deep Learning techniques applied to EDM, from its origins to the present day. The main goals of this study are to identify the EDM tasks that have benefited from Deep Learning and those that remain to be explored, to describe the main datasets used, to provide an overview of the key concepts, main architectures, and configurations of Deep Learning and its applications to EDM, and to discuss the current state of the art and future directions in this area of research.

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Directory of Open Access Journals

Fraud detection for online banking for scalable and distributed data

Author: Haq Ikram
Publication venue: Federation University Australia
Publication date: 01/01/2020

Online fraud causes billions of dollars in losses for banks. Therefore, online banking fraud detection is an important field of study. However, there are many challenges in conducting research in fraud detection. One constraint is that bank datasets are unavailable for research, or that the data lack attributes with the required characteristics. Numeric data usually provides better performance for machine learning algorithms. Most transaction data, however, have categorical or nominal features as well. Moreover, some platforms such as Apache Spark recognize only numeric data. So there is a need for techniques such as One-hot encoding (OHE) to transform categorical features into numerical features; however, OHE has challenges, including the sparseness of the transformed data and the fact that the distinct values of an attribute are not always known in advance. Efficient feature engineering can improve an algorithm's performance but usually requires detailed domain knowledge to identify the correct features. Techniques like Ripple Down Rules (RDR) are suitable for fraud detection because of their low maintenance and incremental learning features. However, high classification accuracy on mixed datasets, especially for scalable data, is challenging. Evaluation of RDR on distributed platforms is also challenging, as it is not available on these platforms. The thesis proposes the following solutions to these challenges:
• We developed a technique, Highly Correlated Rule Based Uniformly Distribution (HCRUD), to generate highly correlated, rule-based, uniformly distributed synthetic data.
• We developed a technique, One-hot Encoded Extended Compact (OHE-EC), to transform categorical features into numeric features by compacting sparse data even if not all distinct values are known.
• We developed a technique, Feature Engineering and Compact Unified Expressions (FECUE), to improve model efficiency through feature engineering where the domain of the data is not known in advance.
• A Unified Expression RDR fraud detection technique (UE-RDR) for Big data has been proposed and evaluated on the Spark platform.
Empirical tests were executed on a multi-node Hadoop cluster using well-known classifiers on bank data, synthetic bank datasets, and publicly available datasets from the UCI repository. These evaluations demonstrated substantial improvements in terms of classification accuracy, ruleset compactness, and execution speed.
Doctor of Philosophy
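The two OHE difficulties named in this abstract (sparse output and category values that are not known in advance) can be seen in a minimal sketch of plain one-hot encoding. This is not the thesis's OHE-EC technique, which the abstract does not describe; the reserved "unseen" column is one common workaround, and all names and sample values below are hypothetical.

```python
from typing import Dict, List, Sequence


def fit_vocabulary(values: Sequence[str]) -> Dict[str, int]:
    """Map each distinct category seen during fitting to a column index."""
    vocab: Dict[str, int] = {}
    for value in values:
        if value not in vocab:
            vocab[value] = len(vocab)
    return vocab


def one_hot(value: str, vocab: Dict[str, int]) -> List[int]:
    """Encode one value; the last column is reserved for unseen categories."""
    row = [0] * (len(vocab) + 1)
    row[vocab.get(value, len(vocab))] = 1
    return row


# Hypothetical transaction-channel values, used only for illustration.
vocab = fit_vocabulary(["atm", "online", "pos", "online"])
encoded_known = one_hot("online", vocab)   # category seen during fitting
encoded_unseen = one_hot("wire", vocab)    # category that appears only later
```

Each row holds a single 1 and is otherwise zeros, so the width grows with the number of distinct values; this is the sparseness the abstract refers to.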

Federation ResearchOnline

Incorporating Rich Features into Deep Knowledge Tracing

Author: Zhang Liang
Publication venue: Worcester Polytechnic Institute - Gordon Library
Publication date: 14/04/2017

The desire to follow student learning within intelligent tutoring systems in near real time has led to the development of several models that anticipate the correctness of the next item as students work through an assignment. Such models have included Bayesian Knowledge Tracing (BKT), Performance Factors Analysis (PFA), and more recently, with developments in Deep Learning, Deep Knowledge Tracing (DKT). The DKT model, based on a recurrent neural network, exhibited promising results in [PBH+15]. Thus far, however, the model has only considered the knowledge components of the problems and correctness as input, neglecting the breadth of other features collected by computer-based learning platforms. This work seeks to improve upon the DKT model by incorporating more features at the problem level and student level. With this higher-dimensional input, an adaptation of the original DKT model structure is also proposed, incorporating an Autoencoder network layer that converts the input into a low-dimensional feature vector to reduce both the resource requirement and the time needed to train. Experimental results show that our adapted DKT model, which includes more combinations of features, can effectively improve accuracy.
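As background for the baseline named in this abstract, here is a minimal sketch of the standard Bayesian Knowledge Tracing update for a single knowledge component. It does not reproduce the DKT or Autoencoder models of the thesis, and the parameter values are arbitrary assumptions chosen for illustration.

```python
def bkt_update(p_known: float, correct: bool, p_learn: float = 0.2,
               p_guess: float = 0.2, p_slip: float = 0.1) -> float:
    """One BKT step: condition on the observed answer, then apply learning.

    The default parameter values are illustrative assumptions, not fitted.
    """
    if correct:
        numerator = p_known * (1.0 - p_slip)
        denominator = numerator + (1.0 - p_known) * p_guess
    else:
        numerator = p_known * p_slip
        denominator = numerator + (1.0 - p_known) * (1.0 - p_guess)
    posterior = numerator / denominator
    # The student may also learn the skill between opportunities.
    return posterior + (1.0 - posterior) * p_learn


def predict_correct(p_known: float, p_guess: float = 0.2,
                    p_slip: float = 0.1) -> float:
    """Probability that the next answer is correct under BKT."""
    return p_known * (1.0 - p_slip) + (1.0 - p_known) * p_guess


prior = 0.5
after_correct = bkt_update(prior, correct=True)
after_wrong = bkt_update(prior, correct=False)
```

A correct answer raises the estimated mastery and a wrong answer lowers it; `predict_correct` turns that estimate into the next-item prediction that models of this family are evaluated on.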

Digital WPI