Formal and Informal Model Selection with Incomplete Data
Model selection and assessment with incomplete data pose challenges in
addition to the ones encountered with complete data. There are two main reasons
for this. First, many models describe characteristics of the complete data, in
spite of the fact that only an incomplete subset is observed. Direct comparison
between model and data is then less than straightforward. Second, many commonly
used models are more sensitive to assumptions than in the complete-data
situation and some of their properties vanish when they are fitted to
incomplete, unbalanced data. These and other issues are brought forward using
two key examples, one of a continuous and one of a categorical nature. We argue
that model assessment ought to consist of two parts: (i) assessment of a
model's fit to the observed data and (ii) assessment of the sensitivity of
inferences to unverifiable assumptions, that is, to how a model describes the
unobserved data given the observed ones.
Comment: Published at http://dx.doi.org/10.1214/07-STS253 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
The Use of Loglinear Models for Assessing Differential Item Functioning Across Manifest and Latent Examinee Groups
Loglinear latent class models are used to detect differential item functioning (DIF). These models are formulated in such a manner that the attribute to be assessed may be continuous, as in a Rasch model, or categorical, as in Latent Class Mastery models. Further, an item may exhibit DIF with respect to a manifest grouping variable, a latent grouping variable, or both. Likelihood-ratio tests for assessing the presence of various types of DIF are described, and these methods are illustrated through the analysis of a "real world" data set.
Graphical Markov models: overview
We describe how graphical Markov models started to emerge in the last 40
years, based on three essential concepts that had been developed independently
more than a century ago. Sequences of joint or single regressions and their
regression graphs are singled out as being best suited for analyzing
longitudinal data and for tracing developmental pathways. Interpretations are
illustrated using two sets of data and some of the more recent, important
results for sequences of regressions are summarized.
Comment: 22 pages, 9 figures
Network Psychometrics
This chapter provides a general introduction to network modeling in
psychometrics. The chapter starts with an introduction to the statistical model
formulation of pairwise Markov random fields (PMRF), followed by an
introduction of the PMRF suitable for binary data: the Ising model. The Ising
model is a model used in ferromagnetism to explain phase transitions in a field
of particles. Following the description of the Ising model in statistical
physics, the chapter continues to show that the Ising model is closely related
to models used in psychometrics. The Ising model can be shown to be equivalent
to certain kinds of logistic regression models, loglinear models and
multi-dimensional item response theory (MIRT) models. The equivalence between
the Ising model and the MIRT model puts standard psychometrics in a new light
and leads to a strikingly different interpretation of well-known latent
variable models. The chapter gives an overview of methods that can be used to
estimate the Ising model, and concludes with a discussion on the interpretation
of latent variables given the equivalence between the Ising model and MIRT.
Comment: In Irwing, P., Hughes, D., and Booth, T. (2018). The Wiley Handbook
of Psychometric Testing, 2 Volume Set: A Multidisciplinary Reference on
Survey, Scale and Test Development. New York: Wiley.
Latent class analysis variable selection
We propose a method for selecting variables in latent class analysis, which is the most common model-based clustering method for discrete data. The method assesses a variable's usefulness for clustering by comparing two models, given the clustering variables already selected. In one model the variable contributes information about cluster allocation beyond that contained in the already selected variables, and in the other model it does not. A headlong search algorithm is used to explore the model space and select clustering variables. In simulated datasets we found that the method selected the correct clustering variables, and also led to improvements in classification performance and in accuracy of the choice of the number of classes. In two real datasets, our method discovered the same group structure with fewer variables. In a dataset from the International HapMap Project consisting of 639 single nucleotide polymorphisms (SNPs) from 210 members of different groups, our method discovered the same group structure with a much smaller number of SNPs.
A survey of statistical network models
Networks are ubiquitous in science and have become a focal point for
discussion in everyday life. Formal statistical models for the analysis of
network data have emerged as a major topic of interest in diverse areas of
study, and most of these involve a form of graphical representation.
Probability models on graphs date back to 1959. Along with empirical studies in
social psychology and sociology from the 1960s, these early works generated an
active network community and a substantial literature in the 1970s. This effort
moved into the statistical literature in the late 1970s and 1980s, and the past
decade has seen a burgeoning network literature in statistical physics and
computer science. The growth of the World Wide Web and the emergence of online
networking communities such as Facebook, MySpace, and LinkedIn, and a host of
more specialized professional network communities have intensified interest in
the study of networks and network data. Our goal in this review is to provide
the reader with an entry point to this burgeoning literature. We begin with an
overview of the historical development of statistical network modeling and then
we introduce a number of examples that have been studied in the network
literature. Our subsequent discussion focuses on a number of prominent static
and dynamic network models and their interconnections. We emphasize formal
model descriptions, and pay special attention to the interpretation of
parameters and their estimation. We end with a description of some open
problems and challenges for machine learning and statistics.
Comment: 96 pages, 14 figures, 333 references