Discovery with Models: A Case Study on Carelessness in Computer-based Science Inquiry

Abstract

In recent years, an increasing number of analyses in Learning Analytics and Educational Data Mining (EDM) have adopted a "Discovery with Models" approach, in which an existing model is used as a key component of a new EDM/analytics analysis. This article presents a theoretical discussion of the emergence of discovery with models, its potential to enhance research on learning and learners, and key lessons learned about how discovery with models can be conducted validly and effectively. We illustrate these issues through a case study in which discovery with models was used to investigate a form of disengaged behavior, carelessness, in the context of middle school computer-based science inquiry. Carelessness has been acknowledged as a problem in education since as early as the 1920s, and with the increasing use of high-stakes testing, its costs can be even higher. For instance, within computer-based learning environments careless errors can result in reduced educational effectiveness, with students continuing to receive material they have already mastered. Despite the importance of this problem, it has received minimal research attention, in part due to difficulties in operationalizing carelessness as a construct. Building from theory on carelessness and a Bayesian framework for knowledge modeling, we use machine-learned detectors to predict carelessness within authentic use of a computer-based learning environment. We then use a discovery with models approach to link these validated carelessness measures to survey data, studying the correlations between the prevalence of carelessness and student goal orientation. Carelessness, in this context, refers to incorrect answers given by a student on material that the student should be able to answer correctly (Rodriguez-Fornells & Maydeu-Olivares, 2000).

The application of discovery with models involves two main phases. First, a model of a construct is developed using machine learning or knowledge engineering techniques, and is then validated, as discussed below. Second, this validated model is applied to data and used as a component in another analysis: for example, to identify outliers through model predictions; to examine which variables best predict the modeled construct; to find relationships between the construct and other variables using correlations, predictions, association rules, causal relationships, or other methods; or to study the contexts in which the construct occurs, including its prevalence across domains, systems, or populations.

One essential question to pose prior to a discovery with models analysis is whether the model adopted is valid, both overall and for the specific situation in which it is being used. Ideally, a model should be validated using an approach such as cross-validation, where the model is repeatedly trained on one portion of the data and tested on a different portion, with model predictions compared to appropriate external measures, for example assessments made by humans with acceptably high inter-rater reliability, such as field observations of student behavior for gaming the system. Even after validating in this fashion, validity should be re-considered if the model is used for a substantially different population or context than the one in which it was developed.
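To make these two phases concrete, the following minimal Python sketch illustrates the general workflow under assumed data: it cross-validates a carelessness detector against human-coded labels (Cohen's kappa) and then correlates its per-student predictions with a goal orientation survey score. The file names, column names, and classifier are hypothetical placeholders, not the detectors developed in this article.

    # Sketch of the two phases of discovery with models, under assumed data:
    # "actions.csv" holds one row per student action with hand-coded labels,
    # and "survey.csv" holds one goal-orientation score per student.
    import pandas as pd
    from scipy.stats import spearmanr
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_predict, GroupKFold
    from sklearn.metrics import cohen_kappa_score

    actions = pd.read_csv("actions.csv")          # hypothetical log features + labels
    features = actions[["response_time", "prior_knowledge", "help_use"]]
    labels = actions["careless_label"]            # human-coded ground truth
    groups = actions["student_id"]                # keep each student in one fold

    # Phase 1: train and validate the detector with student-level cross-validation,
    # comparing held-out predictions to the human labels via Cohen's kappa.
    detector = RandomForestClassifier(n_estimators=200, random_state=0)
    preds = cross_val_predict(detector, features, labels,
                              cv=GroupKFold(n_splits=5), groups=groups)
    print("kappa vs. human coders:", cohen_kappa_score(labels, preds))

    # Phase 2: apply the validated detector's predictions in a secondary analysis,
    # here correlating each student's carelessness rate with a survey measure.
    actions["careless_pred"] = preds
    per_student = actions.groupby("student_id")["careless_pred"].mean()
    survey = pd.read_csv("survey.csv").set_index("student_id")["goal_orientation"]
    merged = pd.concat([per_student, survey], axis=1).dropna()
    rho, p = spearmanr(merged["careless_pred"], merged["goal_orientation"])
    print(f"Spearman rho={rho:.2f}, p={p:.3f}")

The student-level cross-validation (GroupKFold) reflects the validation concern discussed above: a detector should be tested on students it was not trained on before its predictions are used in a secondary analysis.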
An alternative approach is to use a simpler knowledge-engineered definition, rationally deriving a function or rule that is then applied to the data. In this case, the model can be inferred to have face validity. However, knowledge-engineered models often produce different results than machine learning-based models, for example in the case of gaming the system: research studying whether the student or the content is a better predictor of gaming the system reached different conclusions depending on which model was applied (cf. Baker, 2007a).
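As an illustration of what a knowledge-engineered definition might look like in contrast to a machine-learned detector, the sketch below applies a rationally derived rule directly to the data. The rule and column names are hypothetical and drawn loosely from the Bayesian knowledge-modeling idea above; they are not a rule from this article or the cited studies.

    # Sketch of a knowledge-engineered model: a rationally derived rule applied
    # directly to the data, rather than a detector fit by machine learning.
    # Column names are hypothetical; the rule is illustrative only.
    import pandas as pd

    actions = pd.read_csv("actions.csv")

    def careless_by_rule(row, knowledge_threshold=0.95):
        # Flag an error as careless when the student's estimated probability of
        # knowing the skill (e.g., from Bayesian Knowledge Tracing) is very high.
        return int(row["correct"] == 0 and row["p_know"] >= knowledge_threshold)

    actions["careless_rule"] = actions.apply(careless_by_rule, axis=1)
    print("rule-based carelessness rate:", actions["careless_rule"].mean())

Comparing the labels produced by such a rule with a machine-learned detector's predictions on the same data is one simple way to see the divergence between the two modeling approaches discussed above.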
