
    Towards Interpretable Deep Learning Models for Knowledge Tracing

    As an important technique for modeling the knowledge states of learners, traditional knowledge tracing (KT) models have been widely used to support intelligent tutoring systems and MOOC platforms. Driven by rapid advances in deep learning, deep neural networks have recently been adopted to design new KT models that achieve better prediction performance. However, the lack of interpretability of these models has seriously impeded their practical application, because their outputs and working mechanisms are obscured by an opaque decision process and complex inner structures. We therefore propose a post-hoc method to tackle the interpretability issue for deep learning based knowledge tracing (DLKT) models. Specifically, we apply the layer-wise relevance propagation (LRP) method to interpret an RNN-based DLKT model by backpropagating relevance from the model's output layer to its input layer. The experimental results show the feasibility of using the LRP method to interpret the DLKT model's predictions, and partially validate the computed relevance scores at both the question level and the concept level. We believe this is a solid step towards fully interpreting DLKT models and promoting their practical application in the education domain.
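    A minimal sketch of the relevance-backpropagation idea for dense layers, in plain Python so it runs without dependencies. It uses the LRP epsilon rule, one common LRP variant; the abstract does not say which rule the authors used, and the multiplicative gates of an RNN-based DLKT model need additional handling that is not shown here. The toy network, its weights, and the helper names are hypothetical.

```python
from typing import List, Sequence


def dense_forward(x: Sequence[float], w: List[List[float]], b: Sequence[float]) -> List[float]:
    """Pre-activations z_j = sum_i x_i * w[i][j] + b_j (w is indexed [input][output])."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]


def lrp_epsilon(x, w, b, relevance_out, eps=1e-6):
    """Redistribute output relevance to the layer inputs with the LRP epsilon rule:
    R_i = sum_j (x_i * w_ij) / (z_j + eps * sign(z_j)) * R_j
    """
    z = dense_forward(x, w, b)
    relevance_in = [0.0] * len(x)
    for j, z_j in enumerate(z):
        stabilized = z_j + (eps if z_j >= 0 else -eps)  # avoids division by ~0
        for i, x_i in enumerate(x):
            relevance_in[i] += x_i * w[i][j] / stabilized * relevance_out[j]
    return relevance_in


# Hypothetical toy network: 3 inputs -> 2 ReLU units -> 1 output score.
x = [1.0, 0.5, -0.2]
w1 = [[0.4, -0.3], [0.2, 0.8], [-0.5, 0.1]]
b1 = [0.0, 0.0]
w2 = [[1.0], [0.5]]
b2 = [0.0]

hidden = [max(0.0, z) for z in dense_forward(x, w1, b1)]
output = dense_forward(hidden, w2, b2)

# Start from the output score and propagate relevance back layer by layer;
# the ReLU passes relevance through unchanged (inactive units carry none).
relevance_hidden = lrp_epsilon(hidden, w2, b2, output)
relevance_x = lrp_epsilon(x, w1, b1, relevance_hidden)
print(output, relevance_x)
```

    With zero biases and a small epsilon, the input relevances sum (approximately) to the output score. This conservation property is what lets LRP scores be read as a decomposition of a prediction across its inputs.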

    The effects of individual differences and linguistic features on reading comprehension of health-related texts

    Background. Relatively little attention has been paid to whether, or how, reader characteristics or the linguistic properties of a text predict reading comprehension of health-related information. In addition, there is little evidence for the utility of any of the writing guidelines promulgated by the National Health Service (NHS) to improve the comprehension of health information. Nonetheless, some previous research suggests that health-related texts could be adapted for different groups of users to optimise understanding. Thus, existing knowledge has important limitations, which raise concerns with potentially far-reaching practical implications. To address these concerns, I investigated how variation in individual differences and in text features predicts the comprehension of health-related texts, examining how the effects of textual features may differ for different kinds of readers. Method. The focus of this thesis is Study 3, in which I investigated the predictors of tested comprehension, but I also report preliminary studies in which I examined the readability (Study 1) and the perceived comprehension (Study 2) of samples of health-related texts. In the primary study (Study 3), I used Bayesian mixed-effects models to analyse the factors that affect the accuracy of responses to questions probing the comprehension of a sample of health-related texts. I measured variation in cognitive abilities among 200 participants, to capture the effects of individual differences, as well as variation in the linguistic features of texts, to capture the effects of text structure and content. Results. I found that tested comprehension was less likely to be accurate among older participants. However, comprehension accuracy was greater given higher levels of education, health literacy, and English language proficiency.
In addition, self-rated perceived comprehension predicted comprehension, but only in the absence of other predictors related to individual differences. Variation in text features, including readability estimates, did not predict comprehension accuracy, and there was no evidence that text features modulated the effects of individual differences. Discussion. Text features did not modulate the effects of individual differences on comprehension accuracy in any meaningful way. This suggests that adapting health-related texts to different groups of the population may be of limited practical value. Implications. Individual differences matter substantially for comprehension. Ideally, therefore, end-users' understanding of health-related texts should be tested, and interventions to aid readers, such as those with relatively low health literacy, could be used to improve comprehension of health texts. In the absence of sensitive measures of reader characteristics, and when testing understanding is not possible, end-user evaluations of health-related texts may serve as a useful proxy for tested comprehension. However, given the reported evidence, looking for text effects, and guidance focusing on text effects, seems less useful. Consequently, the effectiveness of designing health-related texts according to the NHS's text-writing guidelines is likely to be limited.

    Student Modeling in Intelligent Tutoring Systems

    After decades of development, Intelligent Tutoring Systems (ITSs) have become a common learning environment for learners across various domains and academic levels. ITSs are computer systems designed to provide instruction and immediate feedback customized to individual students, without requiring the intervention of human instructors. All ITSs share the same goal: to provide tutorial services that support learning. Since learning is a very complex process, it is not surprising that a range of technologies and methodologies from different fields is employed. Student modeling is a pivotal technique used in ITSs. The student model observes student behaviors in the tutor and creates a quantitative representation of the student properties needed to customize instruction, respond effectively, engage students' interest, and promote learning. In this dissertation, I focus on the following aspects of student modeling. Part I: Student Knowledge: Parameter Interpretation. Student modeling is widely used to obtain scientific insights into how people learn. Student models typically produce semantically meaningful parameter estimates, such as how quickly students learn a skill on average. It is therefore fundamental that these parameter estimates be interpretable and plausible. My work includes automatically generating data-suggested Dirichlet priors for the Bayesian Knowledge Tracing model in order to obtain more plausible parameter estimates. I also proposed, implemented, and evaluated an approach that generates multiple Dirichlet priors to improve parameter plausibility, accommodating the assumption that there are subsets of skills that students learn similarly. Part II: Student Performance: Student Performance Prediction. Accurately predicting student performance is one of the most common evaluations of student models and one of the most desired features of an ITS.
The task, however, is very challenging, particularly predicting a student's response on an individual problem in the tutor. I analyzed the components of two common student models to determine which aspects provide predictive power in classifying student performance. I found that modeling the student's overall knowledge improved predictive accuracy. I also presented an approach that, rather than assuming students are drawn from a single distribution, models multiple distributions of student performance to improve the model's accuracy. Part III: Wheel-spinning: Student Future Failure in Mastery Learning. One drawback of the mastery learning framework is that it can leave a student stuck attempting to learn a skill he or she is unable to master. We refer to this phenomenon, in which students are given practice without improvement, as wheel-spinning. I analyzed student wheel-spinning across different tutoring systems and estimated the scope of the problem. To investigate the negative consequences of wheel-spinning for students, I investigated the relationships between wheel-spinning and two other constructs of interest: efficiency of learning and "gaming the system". In addition, I designed a generic model of wheel-spinning that uses features easily obtained by most ITSs. The model generalizes well to unseen students, classifying mastery and wheel-spinning problems with high accuracy. When used as a detector, the model can detect wheel-spinning at an early stage with satisfactory precision and recall.
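    For readers unfamiliar with Bayesian Knowledge Tracing, the sketch below shows the standard four-parameter update whose estimates (for example, the learning probability, i.e. how quickly a skill is learned) the Dirichlet priors in Part I are meant to keep plausible. It is an illustration only: the prior-generation method is not reproduced, the parameter values are hypothetical, and the 0.95 mastery threshold is a common convention rather than one stated in the abstract.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BKTParams:
    p_init: float     # P(L0): probability the skill is known before practice
    p_transit: float  # P(T): probability of learning at each opportunity
    p_guess: float    # P(G): probability of a correct answer without the skill
    p_slip: float     # P(S): probability of an error despite knowing the skill


def predict_correct(p_known: float, params: BKTParams) -> float:
    """Probability of a correct response given the current knowledge estimate."""
    return p_known * (1 - params.p_slip) + (1 - p_known) * params.p_guess


def update(p_known: float, correct: bool, params: BKTParams) -> float:
    """Bayesian posterior given one observed response, followed by the learning step."""
    if correct:
        num = p_known * (1 - params.p_slip)
        den = num + (1 - p_known) * params.p_guess
    else:
        num = p_known * params.p_slip
        den = num + (1 - p_known) * (1 - params.p_guess)
    posterior = num / den
    return posterior + (1 - posterior) * params.p_transit


def trace(responses: List[bool], params: BKTParams) -> List[float]:
    """Knowledge estimate after each practice opportunity."""
    p = params.p_init
    history = []
    for r in responses:
        p = update(p, r, params)
        history.append(p)
    return history


MASTERY_THRESHOLD = 0.95  # assumed convention, not taken from the dissertation

params = BKTParams(p_init=0.3, p_transit=0.1, p_guess=0.2, p_slip=0.1)
history = trace([True, True, False, True, True, True], params)
mastered_at = next((i for i, p in enumerate(history) if p >= MASTERY_THRESHOLD), None)
print([round(p, 3) for p in history], mastered_at)
```

    A student whose estimate never crosses the threshold after many opportunities is the kind of case the wheel-spinning analysis in Part III targets; the exact wheel-spinning criterion used in the dissertation is not given in the abstract.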

    Neural Cognitive Diagnosis for Intelligent Education Systems

    Cognitive diagnosis is a fundamental issue in intelligent education that aims to discover students' proficiency levels on specific knowledge concepts. Existing approaches usually mine linear interactions in the student exercising process through manually designed functions (e.g., the logistic function), which are not sufficient for capturing complex relations between students and exercises. In this paper, we propose a general Neural Cognitive Diagnosis (NeuralCD) framework, which incorporates neural networks to learn the complex exercising interactions and obtain diagnosis results that are both accurate and interpretable. Specifically, we project students and exercises to factor vectors and leverage multiple neural layers to model their interactions, where the monotonicity assumption is applied to ensure the interpretability of both factors. Furthermore, we propose two implementations of NeuralCD that differ in how the concepts required by each exercise are specified: NeuralCDM, which uses a traditional Q-matrix, and the improved NeuralCDM+, which exploits the rich text content of exercises. Extensive experimental results on real-world datasets show the effectiveness of the NeuralCD framework in terms of both accuracy and interpretability.
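    A rough sketch of how a NeuralCDM-style interaction can respect the monotonicity assumption. It assumes, as we understand the NeuralCDM formulation, that the input to the interaction layers is the exercise's Q-matrix row multiplied element-wise by the difference between student proficiency and exercise difficulty, scaled by exercise discrimination, and that monotonicity is enforced by keeping interaction-layer weights non-negative. The helper names, layer sizes, and weights are hypothetical, and training is omitted.

```python
import math
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight rows, biases)


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def monotone_layer(inputs: Sequence[float], layer: Layer) -> List[float]:
    """Sigmoid layer whose weights are clipped to be non-negative."""
    weights, biases = layer
    return [
        sigmoid(sum(max(0.0, w) * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def neural_cd_predict(proficiency, difficulty, discrimination, q_row, layers):
    """Probability of a correct answer (hypothetical helper).

    proficiency, difficulty: per-concept values in (0, 1)
    discrimination: exercise discrimination in (0, 1)
    q_row: the exercise's Q-matrix row (1 = concept required, 0 = not)
    """
    h = [q * (s - d) * discrimination for q, s, d in zip(q_row, proficiency, difficulty)]
    for layer in layers:
        h = monotone_layer(h, layer)
    return h[0]


layers = [
    ([[0.5, 0.2, 0.3], [0.1, 0.4, 0.6]], [0.0, 0.0]),  # 3 concepts -> 2 hidden units
    ([[0.7, 0.9]], [-0.5]),                           # 2 hidden units -> 1 output
]
difficulty = [0.4, 0.4, 0.5]
q_row = [1, 0, 1]  # the exercise requires concepts 0 and 2 only

base = neural_cd_predict([0.3, 0.5, 0.7], difficulty, 0.8, q_row, layers)
more_skilled = neural_cd_predict([0.6, 0.5, 0.7], difficulty, 0.8, q_row, layers)
unrelated_change = neural_cd_predict([0.3, 0.9, 0.7], difficulty, 0.8, q_row, layers)
print(base, more_skilled, unrelated_change)
```

    Because every layer is a sigmoid over non-negative weights, raising a student's proficiency on a concept the exercise requires can only raise the predicted probability of a correct answer, while concepts masked out by the Q-matrix row have no effect.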

    Studies in Analytical Chemistry and Chemical Education. Part 1: Characterization of Complex Organics By Raman Spectroscopy and Gas Chromatography. Part 2: Differential Item Functioning on Multiple-choice General Chemistry Assessments

    PART 1: CHARACTERIZATION OF COMPLEX ORGANICS BY RAMAN SPECTROSCOPY AND GAS CHROMATOGRAPHY. The analytical chemistry component of this thesis focused on instrumentation and methods to address challenges in art conservation, particularly the identification, quantitation, and reactivity of a set of representative varnishes and their degradation products. Methods for characterizing varnishes are of great interest to art conservators seeking to restore artwork more accurately. A database was created as a means to identify and quantify the composition of aged varnishes. Fourier Transform (FT)-Raman spectroscopy was used to study common organic acids found in varnishes. The database included nine short-chain carboxylic acids, four dicarboxylic acids, and six medium-to-long-chain fatty acids. Four varnish samples (Linseed Oil, Tung Oil, Dammar, and Mastic) were studied as well. Through visual comparison and fingerprint analysis, compounds in the Raman spectral database were recognized as components of the varnish samples. Singular Value Decomposition (SVD) was conducted to determine how well the database represented the unknown varnish samples. SVD was applied to the 19 standards collected in building the database. To reduce the amount of data, seven singular values were retained. These seven singular values were then used to model the unknowns: Linseed Oil, Tung Oil, Dammar, and Mastic. The root-mean-square (RMS) errors for the unknowns were 0.08, 0.13, 0.21, and 0.21 Raman intensity units for Linseed Oil, Tung Oil, Dammar, and Mastic, respectively. Expressed relative to the largest peak in each unknown spectrum, the relative RMS errors are 1.7%, 1.7%, 4.9%, and 6.4%, respectively. A method based on Gas Chromatography (GC) was developed to characterize carboxylic acids formed as a result of varnish degradation.
In this method, a headspace solid-phase microextraction (SPME) approach was optimized in which a 75 µm carboxen-polydimethylsiloxane (CAR/PDMS) SPME fiber was used to analyze monocarboxylic acids. For quantitative determinations, the injection port was operated in splitless mode and held at 250 °C for 1.0 min to desorb the analytes from the SPME fiber. After the initial minute, the injector was switched to a 1:100 split ratio. In the temperature program, the oven was initially held at 30 °C for 1 min, then ramped at 25 °C/min to 200 °C, where it was held for 1 min, giving a total run time of 8.80 min. The pulsed flame photometric detector (PFPD) was held at 200 °C for the entire run with a 0.5 ms gate delay, and the gate width was set to 20.0 ms. The monocarboxylic acids studied were formic, acetic, propanoic, butyric, valeric, and caproic acid. A linear relationship was observed between the number of carbons in the carboxylic acid and the retention time (y = 0.75x + 1.55, R² = 0.95). Acetic acid was quantified by calibration using a first-order regression fit, which yielded y = 0.29x + 0.92 (R² = 0.95). A second-order model gave a better fit: y = 0.0025x² - 0.0016x + 5.9 (R² = 0.99). An ageing chamber was designed, fabricated, and tested as a means of better understanding the decomposition of varnishes over time as a function of temperature, humidity, and ultraviolet light. The goal in developing the ageing chamber was to demonstrate that it may be possible to create artificial Standard Reference Materials (SRMs) that resemble authentically aged varnishes. This is possible because the chamber is incorporated directly into a GC oven, where temperature, UV radiation, humidity levels, and pollutants can be precisely controlled and carefully monitored.
The GC method for carboxylic acids described above was developed to aid in measuring carboxylic acid fragments that could arise from the ageing process. Promising results showed the Raman intensity increasing as the sample aged. PART 2: DIFFERENTIAL ITEM FUNCTIONING ON MULTIPLE-CHOICE GENERAL CHEMISTRY ASSESSMENTS. Over the past 30 years, there has been a plethora of studies on gender differences. Some of the earlier studies found that male students typically outperform female students in visual-spatial and quantitative abilities, whereas female students outperform male students in verbal abilities. Later studies confirmed that female students still tended to outperform male students in verbal abilities, while the gap in science and mathematics (the latter as an extension of visual-spatial and quantitative abilities) closed greatly. During this same period, more female students entered the science, technology, engineering, and mathematics (STEM) fields. In 1966, only 25% of all STEM bachelor's degrees were obtained by female students, whereas by 2010 that percentage had grown to 50%. In chemistry specifically, 49.9% of bachelor's degrees were earned by women, compared with 18.5% in 1966 (ref. 1). Because assessments make up a large share of a student's overall course grade, it is imperative that they be valid and unbiased. One way to evaluate this is Differential Item Functioning (DIF) analysis. DIF occurs when subgroups of equal ability perform statistically differently on an assessment item, even though students matched on ability would be expected to have an equal probability of answering it correctly. Because students' ability is difficult to determine, subgroups are often matched on their proficiency or on the score they received on an assessment. This dissertation focused on four main questions.
The first question focused on identifying items that exhibited DIF. The second was to determine whether DIF was real, i.e., whether it persisted regardless of the set of students or the matching criteria used. The third focused on determining the causes of DIF by cloning the items by content and by construct (format). Lastly, it was hypothesized that DIF arises in part from students' problem-solving processes, which were examined through the students' use of incorrect heuristics. Data for the first part of the study were collected from two American Chemical Society Examinations Institute (ACS-EI) trial tests (Form A and Form B) given to students who had completed one term of general chemistry. These data were analyzed using the Mantel-Haenszel statistic to determine which items exhibited possible DIF. Along with the Mantel-Haenszel statistic, a two-stage DIF analysis (ref. 2) was conducted. Of the 140 items, 33 exhibited DIF. On Form A, 14 items exhibited DIF: seven favored male students and seven favored female students. On Form B, 19 items exhibited DIF: 11 favored female students and eight favored male students. The items with the highest probability of DIF were cloned and included on hourly examinations. These items were examined for DIF persistence against both stages of the two-stage analysis and other relevant measures of proficiency. As more results were collected, patterns emerged for persistent DIF items. On the 24 hourly examinations included in this analysis, there were a total of 687 items, of which 33 (5%) had a significant Mantel-Haenszel statistic and thus exhibited persistent DIF. Of those 33 items, 15 were flagged with persistent DIF favoring female students and 18 with persistent DIF favoring male students.
On the three standardized examinations, there were a total of 140 items, of which 19 (14%) had a significant Mantel-Haenszel statistic and thus exhibited persistent DIF. Of those 19 items, two favored female students and 17 favored male students. In addition, certain content areas and item formats were found to favor one gender. Over six semesters of testing, the content areas that consistently showed DIF favoring male students were measurement (density), greatest/least number of atoms, limiting reagents, the ideal gas equation, and crystal structures; the content areas favoring female students were nomenclature and molecular orbital theory. The formats that tended to favor male students were visual-spatial, reasoning, and computation; the format that favored female students was specific chemical knowledge. Cloning these items showed that, for certain items, the content and/or the format were possible causes of persistent DIF. Lastly, semi-structured interviews were conducted, which suggested that for seven items DIF may have arisen because one subgroup used an incorrect heuristic. These items were in the content areas of measurement (density), greatest/least number of atoms, stoichiometry (general), and crystal structures. Additionally, the visual-spatial, reasoning, and computation formats of these items could also have contributed to the observed results.

    References
    1. S&E Degrees: 1966-2010. National Center for Science and Engineering Statistics. http://www.nsf.gov/statistics/nsf11316/content.cfm?pub_id=4062&id=2 (accessed May 26).
    2. Zenisky, A. L.; Hambleton, R. K. Detection of Differential Item Functioning in Large-Scale State Assessments: A Study Evaluating a Two-Stage Approach. Educational and Psychological Measurement 2003a, 63 (1), 51-64.
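    The Mantel-Haenszel procedure named in Part 2 compares a reference group and a focal group within strata of matched ability. The sketch below computes, for one item, the common odds ratio, its ETS delta transformation (MH D-DIF = -2.35 ln α), and the continuity-corrected MH chi-square. The counts are invented for illustration, which gender serves as the reference group is an assumption, and the dissertation's two-stage procedure and flagging rules are not reproduced.

```python
import math
from typing import List, Tuple

# One 2x2 table per ability stratum k:
# (A_k, B_k, C_k, D_k) = (reference correct, reference incorrect,
#                         focal correct, focal incorrect)
Stratum = Tuple[int, int, int, int]


def mantel_haenszel(strata: List[Stratum]) -> Tuple[float, float, float]:
    """Return (common odds ratio alpha_MH, MH D-DIF, MH chi-square) for one item."""
    num = den = 0.0
    sum_a = sum_expected = sum_var = 0.0
    for a, b, c, d in strata:
        t = a + b + c + d
        if t < 2:
            continue  # a stratum with fewer than two examinees carries no information
        num += a * d / t
        den += b * c / t
        n_ref, n_focal = a + b, c + d
        m_correct, m_incorrect = a + c, b + d
        sum_a += a
        sum_expected += n_ref * m_correct / t  # E(A_k) under no DIF
        sum_var += n_ref * n_focal * m_correct * m_incorrect / (t * t * (t - 1))
    if den == 0 or num == 0 or sum_var == 0:
        raise ValueError("degenerate tables: odds ratio or variance undefined")
    alpha = num / den
    delta = -2.35 * math.log(alpha)  # ETS delta scale
    chi2 = (abs(sum_a - sum_expected) - 0.5) ** 2 / sum_var  # continuity-corrected
    return alpha, delta, chi2


no_dif = [(10, 10, 10, 10), (20, 5, 20, 5)]
favors_reference = [(30, 10, 15, 25), (40, 10, 20, 30)]

result_equal = mantel_haenszel(no_dif)
result_biased = mantel_haenszel(favors_reference)
for name, (alpha, delta, chi2) in [("no DIF", result_equal), ("favors reference", result_biased)]:
    print(f"{name}: alpha={alpha:.3f}, MH D-DIF={delta:.2f}, chi2={chi2:.2f}, flagged={chi2 > 3.841}")
```

    A common odds ratio above 1 (a negative MH D-DIF) means the reference group outperforms matched focal-group students on the item; a chi-square above 3.841 is significant at the .05 level with one degree of freedom.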

    Psychometrics in Practice at RCEC

    This volume covers a broad range of topics: from combining generalizability theory and item response theory in psychometrics to ideas for an integrated formative use of data-driven decision making, assessment for learning, and diagnostic testing. A number of chapters address computerized (adaptive) and classification testing. Other chapters treat the quality of testing in a general sense, while for topics such as maintaining standards or testing writing ability, the quality of testing is dealt with more specifically. All authors are connected to RCEC as researchers. They present one of their current research topics and provide some insight into the focus of RCEC. The topics were selected and edited so that the book would be of special interest to educational researchers, psychometricians, and practitioners in educational assessment.