Sampling for network motif detection and estimation of Q-matrix and learning trajectories in DINA model

Abstract

Monte Carlo methods provide tools for statistical inference on models that are difficult or impossible to compute analytically, and they are widely used in areas of applied statistics such as bioinformatics and psychometrics. This thesis develops several sampling algorithms to address open issues in network analysis and educational assessment. The first problem we investigate is network motif detection. Network motifs are substructures that appear significantly more often in a given network than in random networks. Motif detection is crucial for discovering new characteristics in biological, developmental, and social networks. We propose a novel sequential importance sampling strategy to estimate subgraph frequencies and detect network motifs. The method samples subgraphs sequentially, node by node, using a carefully chosen proposal distribution. It generates subgraphs from a distribution close to uniform and performs better than competing methods. We apply the method to four real-world networks and demonstrate strong performance in practical examples. The other two issues relate to educational measurement in psychometrics. Cognitive diagnosis models (CDMs) are partially ordered latent class models that classify students into skill mastery profiles. In educational assessment, these models help researchers analyze students' mastery of skills and their learning process based on their responses to test items. The deterministic inputs, noisy "AND" gate (DINA) model is a popular psychometric model for cognitive diagnosis. We investigate the estimation of the Q-matrix in the DINA model. The Q-matrix is a binary matrix that maps each test item to its required attributes. We propose a Bayesian framework for estimating the DINA Q-matrix. The proposed algorithms ensure that the estimated Q-matrices always satisfy the identifiability constraints.
We present Monte Carlo simulations to support the accuracy of parameter recovery and apply our algorithms to Tatsuoka's fraction-subtraction dataset. The last project concerns the recovery of the learning process. The increasing presence of electronic and online learning resources presents challenges and opportunities for psychometric techniques that can assist in the measurement of abilities and even hasten their mastery. CDMs can assist in carefully navigating the training and assessment of these skills in e-learning applications. We propose a class of CDMs for modeling changes in attributes, which we refer to as learning trajectories. We focus on the development of Bayesian procedures for estimating the parameters of a first-order hidden Markov model and apply the developed model to a spatial rotation experimental intervention.
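As a point of reference for the DINA model and the role of the Q-matrix described above, the following is a minimal sketch of the standard DINA item response function, in which a student answers an item correctly with probability 1 - slip when they have mastered every attribute the item requires, and with probability guess otherwise. It is not the estimation algorithm developed in the thesis; the function name `dina_success_prob`, the parameter names, and the toy values are assumptions made for illustration only.

```python
import numpy as np

def dina_success_prob(alpha, Q, slip, guess):
    """Return P(X_ij = 1 | alpha_i) under the standard DINA model.

    alpha : (N, K) binary array, attribute mastery profile per student
    Q     : (J, K) binary array, attributes required by each item
    slip  : (J,) array, slipping probability per item
    guess : (J,) array, guessing probability per item
    """
    alpha = np.asarray(alpha)
    Q = np.asarray(Q)
    slip = np.asarray(slip, dtype=float)
    guess = np.asarray(guess, dtype=float)
    # eta[i, j] = 1 when student i has mastered every attribute item j requires:
    # the count of required attributes mastered equals the count required.
    eta = (alpha @ Q.T == Q.sum(axis=1)).astype(float)
    # "AND" gate followed by noise: 1 - slip if eta = 1, guess if eta = 0.
    return (1.0 - slip) * eta + guess * (1.0 - eta)

# Toy example (hypothetical values): 3 students, 2 items, 2 attributes.
alpha = [[1, 1], [1, 0], [0, 0]]
Q = [[1, 0], [1, 1]]
probs = dina_success_prob(alpha, Q, slip=[0.1, 0.2], guess=[0.2, 0.25])
```

In this toy example the second student has mastered only the first attribute, so they succeed on the first item with probability 0.9 but can only guess on the second item, which requires both attributes.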
