Statistical analysis of Q-matrix based diagnostic classification models
Diagnostic classification models (DCMs) have recently gained prominence in educational assessment, psychiatric evaluation, and many other disciplines. Central to the model specification is the so-called Q-matrix, which provides a qualitative specification of the item-attribute relationship. In this article, we develop theories on the identifiability of the Q-matrix under the DINA and DINO models. We further propose an estimation procedure for the Q-matrix through regularized maximum likelihood. The applicability of this procedure is not limited to the DINA or DINO model; it can be applied to essentially all Q-matrix based DCMs. Simulation studies show that the proposed method recovers the true Q-matrix with high probability. Furthermore, two case studies are presented. The first is a dataset on fraction subtraction (educational application), and the second is a subsample of the National Epidemiological Survey on Alcohol and Related Conditions concerning social anxiety disorder (psychiatric application).
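The DINA and DINO models differ only in how a Q-matrix row turns an attribute profile into an ideal (error-free) response: DINA is conjunctive (every required attribute must be mastered), DINO is disjunctive (one required attribute suffices), and item-level slipping and guessing parameters s_j and g_j turn ideal responses into response probabilities. The Python sketch below illustrates these standard item response functions and the resulting likelihood of one response vector. The function names are hypothetical, and this is not the authors' regularized estimation procedure, which the abstract describes only as regularized maximum likelihood over the Q-matrix.

```python
from typing import List, Sequence


def ideal_response(alpha: Sequence[int], q_row: Sequence[int], model: str = "DINA") -> int:
    """Ideal (error-free) response of a respondent with attribute profile `alpha`
    to an item whose Q-matrix row is `q_row`."""
    required = [a for a, q in zip(alpha, q_row) if q == 1]
    if model == "DINA":
        # conjunctive: every required attribute must be mastered
        return int(all(a == 1 for a in required))
    if model == "DINO":
        # disjunctive: at least one required attribute must be mastered
        return int(any(a == 1 for a in required))
    raise ValueError("model must be 'DINA' or 'DINO'")


def correct_probs(alpha, Q, slip, guess, model="DINA") -> List[float]:
    """P(X_j = 1 | alpha) = 1 - s_j if the ideal response is 1, else g_j."""
    return [
        (1 - s) if ideal_response(alpha, q_row, model) else g
        for q_row, s, g in zip(Q, slip, guess)
    ]


def response_likelihood(x, alpha, Q, slip, guess, model="DINA") -> float:
    """Likelihood of a binary response vector x given profile alpha,
    assuming responses are conditionally independent given alpha."""
    lik = 1.0
    for x_j, p_j in zip(x, correct_probs(alpha, Q, slip, guess, model)):
        lik *= p_j if x_j == 1 else 1 - p_j
    return lik
```

For example, with Q = [[1, 0], [0, 1], [1, 1]] and profile [1, 0], the DINA ideal responses are [1, 0, 0] while the DINO ideal responses are [1, 0, 1], since under DINO mastering attribute 1 alone already suffices for the third item.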
Identifiability of Cognitive Diagnosis Models with Polytomous Responses
Cognitive Diagnosis Models (CDMs) are a powerful statistical and psychometric tool for researchers and practitioners to learn fine-grained diagnostic information about respondents' latent attributes. There has been growing interest in the use of CDMs for polytomous response data, as items with multiple response options become more widely used. As with many latent variable models, the identifiability of CDMs is critical for accurate parameter estimation and valid statistical inference. However, existing identifiability results focus primarily on binary response models and have not adequately addressed the identifiability of CDMs with polytomous responses. This paper addresses this gap by presenting sufficient and necessary conditions for the identifiability of the widely used DINA model with polytomous responses, with the aim of providing a comprehensive understanding of the identifiability of CDMs with polytomous responses and informing future research in this field.
Statistical Inference and Experimental Design for Q-matrix Based Cognitive Diagnosis Models
There has been growing interest in recent years in using cognitive diagnosis models for diagnostic measurement, i.e., classification according to multiple discrete latent traits. The Q-matrix, an incidence matrix specifying the presence or absence of a relationship between each item in the assessment and each latent attribute, is central to many of these models. Important applications include educational and psychological testing; in education, for example, demand has been driven by a recent focus on skills-based evaluation. However, compared to more traditional models from classical test theory and item response theory, cognitive diagnosis models are relatively undeveloped and suffer from several issues that limit their applicability. This thesis examines several issues related to statistical inference and experimental design for Q-matrix based cognitive diagnosis models.
We begin by considering one of the main statistical issues affecting the practical use of Q-matrix based cognitive diagnosis models: identifiability. In statistical models, identifiability is a prerequisite for most common statistical inferences, including parameter estimation and hypothesis testing. With Q-matrix based cognitive diagnosis models, identifiability also affects the classification of respondents according to their latent traits. We first examine the identifiability of model parameters, presenting necessary and sufficient conditions for identifiability in several settings.
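One widely used identifiability-related condition from the DINA literature, not necessarily one of the thesis's own conditions, is completeness: the Q-matrix must contain, up to a row permutation, the K × K identity matrix, so that each attribute is measured alone by some item. Under DINA, completeness holds exactly when all 2^K attribute profiles produce distinct ideal response vectors. The Python sketch below (hypothetical helper names) checks completeness directly and, for small K, confirms it by brute-force enumeration of profiles.

```python
from itertools import product
from typing import Sequence, Tuple

Matrix = Sequence[Sequence[int]]


def dina_ideal_vector(alpha: Sequence[int], Q: Matrix) -> Tuple[int, ...]:
    # eta_j = 1 iff alpha masters every attribute required by item j
    return tuple(int(all(a == 1 for a, q in zip(alpha, row) if q == 1)) for row in Q)


def is_complete(Q: Matrix) -> bool:
    # complete = every unit vector e_k appears as some row of Q
    K = len(Q[0])
    rows = {tuple(row) for row in Q}
    return all(tuple(int(i == k) for i in range(K)) in rows for k in range(K))


def all_profiles_distinguishable(Q: Matrix) -> bool:
    # brute force over all 2^K profiles; only feasible for small K
    K = len(Q[0])
    patterns = {dina_ideal_vector(alpha, Q) for alpha in product((0, 1), repeat=K)}
    return len(patterns) == 2 ** K
```

For instance, Q = [[1, 0], [1, 1]] is incomplete: the profiles (0, 0) and (0, 1) both yield the ideal response vector (0, 0), so no DINA test built on this Q-matrix can separate them.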
Depending on the area of application and the researcher's degree of control over the experimental design, fulfilling these identifiability conditions may be difficult. The second part of this thesis proposes new methods for parameter estimation and respondent classification for use with non-identifiable models. In addition, our framework allows consistent estimation of the severity of the non-identifiability problem, measured as the proportion of the population affected by it. The implications of this measure for the design of diagnostic assessments are also discussed.
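Measuring the severity of non-identifiability as a population proportion can be illustrated with a simple proxy, which is an assumption of this sketch rather than the thesis's estimator: under DINA with a known Q-matrix, attribute profiles that share the same ideal response vector cannot be told apart, so the population mass sitting in such shared classes indicates how many respondents are affected. The profile proportions `pi` and the function names below are hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, Sequence, Tuple

Profile = Tuple[int, ...]


def dina_ideal_vector(alpha: Profile, Q: Sequence[Sequence[int]]) -> Tuple[int, ...]:
    # eta_j = 1 iff alpha masters every attribute required by item j
    return tuple(int(all(a == 1 for a, q in zip(alpha, row) if q == 1)) for row in Q)


def affected_proportion(Q: Sequence[Sequence[int]], pi: Dict[Profile, float]) -> float:
    """Total proportion of the population whose attribute profile shares its
    ideal response vector with at least one other profile (proxy measure)."""
    K = len(Q[0])
    classes = defaultdict(list)  # ideal response vector -> profiles producing it
    for alpha in product((0, 1), repeat=K):
        classes[dina_ideal_vector(alpha, Q)].append(alpha)
    return sum(
        pi.get(alpha, 0.0)
        for members in classes.values()
        if len(members) > 1
        for alpha in members
    )
```

With the incomplete Q = [[1, 0], [1, 1]] and pi = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}, only the class {(0, 0), (0, 1)} is shared, so the proxy is 0.1 + 0.2 = 0.3; for a complete Q-matrix it is 0.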
Statistical Analysis of Structured Latent Attribute Models
In modern psychological and biomedical research with diagnostic purposes, scientists often formulate the key task as inferring fine-grained latent information under structural constraints. These structural constraints usually come from domain experts' prior knowledge or insight. The emerging family of Structured Latent Attribute Models (SLAMs) accommodates these modeling needs and has received substantial attention in psychology, education, and epidemiology. SLAMs bring exciting opportunities and unique challenges. In particular, with high-dimensional discrete latent attributes and structural constraints encoded by a structural matrix, one needs to balance the gain in the model's explanatory power and interpretability against the difficulty of understanding and handling the complex model structure. This dissertation studies such a family of structured latent attribute models from theoretical, methodological, and computational perspectives. On the theoretical front, we present identifiability results that advance the theoretical knowledge of how the structural matrix influences the estimability of SLAMs. The new identifiability conditions guide real-world practices of designing diagnostic tests and also lay the foundation for drawing valid statistical conclusions. On the methodology side, we propose a statistically consistent penalized likelihood approach to selecting significant latent patterns in the population in high dimensions. Computationally, we develop scalable algorithms to simultaneously recover both the structural matrix and the dependence structure of the latent attributes in ultrahigh-dimensional scenarios. These developments explore an exponentially large model space involving many discrete latent variables, and they address the estimation and computation challenges of high-dimensional SLAMs arising from large-scale scientific measurements.
The application of the proposed methodology to the data from international educational assessments reveals meaningful knowledge structures of the student population.
PhD, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/155196/1/yuqigu_1.pd
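Selecting significant latent patterns starts from estimating the proportion of each of the 2^K attribute patterns. The Python sketch below shows only the plain, unpenalized EM update for these proportions under a DINA-type model, assuming the structural matrix (Q) and item parameters are known. The selection penalty described above is deliberately omitted, all names are hypothetical, and enumerating every pattern as done here is precisely what becomes infeasible in the high-dimensional settings the dissertation targets.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Profile = Tuple[int, ...]


def dina_prob(alpha: Profile, q_row: Sequence[int], s: float, g: float) -> float:
    # P(correct) = 1 - slip if all required attributes are mastered, else guess
    eta = all(a == 1 for a, q in zip(alpha, q_row) if q == 1)
    return 1 - s if eta else g


def em_pattern_proportions(
    X: List[List[int]],
    Q: List[List[int]],
    slip: List[float],
    guess: List[float],
    n_iter: int = 100,
) -> Dict[Profile, float]:
    K = len(Q[0])
    profiles = list(product((0, 1), repeat=K))
    # P(x_i | alpha) for every respondent i and pattern alpha (item params fixed)
    lik = []
    for x in X:
        row = []
        for alpha in profiles:
            value = 1.0
            for x_j, q_row, s, g in zip(x, Q, slip, guess):
                p = dina_prob(alpha, q_row, s, g)
                value *= p if x_j == 1 else 1 - p
            row.append(value)
        lik.append(row)
    pi = [1.0 / len(profiles)] * len(profiles)  # uniform starting proportions
    for _ in range(n_iter):
        expected = [0.0] * len(profiles)
        for row in lik:
            # E-step: posterior membership of this respondent over patterns
            w = [p * v for p, v in zip(pi, row)]
            total = sum(w)
            for c, w_c in enumerate(w):
                expected[c] += w_c / total
        # M-step: new proportions are the average posterior memberships
        pi = [e / len(X) for e in expected]
    return dict(zip(profiles, pi))
```

A penalized variant would modify the M-step so that patterns with small expected counts are driven to zero; how the dissertation does this is not specified here.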
Detecting stochastic dominance for poset-valued random variables as an example of linear programming on closure systems
In this paper we develop a linear programming method for detecting stochastic dominance for random variables with values in a partially ordered set (poset), based on the upset characterization of stochastic dominance. The proposed detection procedure is based on a descriptively interpretable statistic, namely the maximal probability difference of an upset. We show how our method is related to the general task of maximizing a linear function on a closure system. Since closure systems can be described via their valid formal implications, we can use ingredients of formal concept analysis here. We also address the question of inference via resampling and via conservative bounds given by Vapnik-Chervonenkis theory, which also permits an adequate pruning of the envisaged closure system, allowing regularization of the test statistic (at the price of less conceptual rigor). We illustrate the developed methods by applying them to a variety of data examples, specifically multivariate inequality analysis, item impact and differential item functioning in item response theory, and the analysis of distributional differences in spatial statistics. The power of regularization is illustrated with a data example in the context of cognitive diagnosis models.
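The upset characterization can be made concrete on a tiny poset: X is stochastically smaller than Y exactly when P(X ∈ U) ≤ P(Y ∈ U) for every upset U, so the statistic max over upsets U of P(X ∈ U) − P(Y ∈ U) is zero precisely when Y dominates X, and a maximizing upset shows where dominance fails. The paper computes this maximum by linear programming on a closure system; the Python sketch below swaps that for brute-force enumeration of all subsets, which is only feasible for a handful of elements, and its function names are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Hashable, List, Sequence, Tuple


def upsets(elements: Sequence[Hashable], leq: Callable[[Hashable, Hashable], bool]) -> List[FrozenSet]:
    """All upsets (sets closed upward under leq) of a finite poset, by brute force."""
    found = []
    for r in range(len(elements) + 1):
        for subset in combinations(elements, r):
            s = set(subset)
            # upward closed: x in s and x <= y imply y in s
            if all(y in s for x in s for y in elements if leq(x, y)):
                found.append(frozenset(s))
    return found


def max_upset_difference(p_x: Dict, p_y: Dict, elements, leq) -> Tuple[float, FrozenSet]:
    """max over upsets U of P(X in U) - P(Y in U), with a maximizing upset.
    The empty set is always an upset, so the maximum is >= 0."""
    best, best_u = 0.0, frozenset()
    for u in upsets(elements, leq):
        diff = sum(p_x.get(e, 0.0) for e in u) - sum(p_y.get(e, 0.0) for e in u)
        if diff > best:
            best, best_u = diff, u
    return best, best_u
```

With the componentwise order on {0, 1}^2, a uniform X and a Y concentrated on (1, 1) give a statistic of 0 in one direction (Y dominates X) and 0.75 in the other, attained at the upset {(1, 1)}.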