    Estimation of Models in a Rasch Family for Polytomous Items and Multiple Latent Variables

    The Rasch family of models considered in this paper includes models for polytomous items and multiple correlated latent traits, as well as for dichotomous items and a single latent variable. An R package is described that computes estimates of parameters and robust standard errors of a class of log-linear-by-linear association (LLLA) models, which are derived from a Rasch family of models. The LLLA models are special cases of log-linear models with bivariate interactions. Maximum likelihood estimation of LLLA models in this form is limited to relatively small problems; however, pseudo-likelihood estimation overcomes this limitation. Maximizing the pseudo-likelihood function is achieved by maximizing the likelihood of a single conditional multinomial logistic regression model. The parameter estimates are asymptotically normal and consistent. Based on our simulation studies, the pseudo-likelihood and maximum likelihood estimates of the parameters of LLLA models are nearly identical and the loss of efficiency is negligible. Recovery of parameters of Rasch models fit to simulated data is excellent.
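    A minimal sketch of the pseudo-likelihood idea for dichotomous items and a single latent variable (the simulated data, variable names, and model specification below are illustrative assumptions, not the described package's interface): each item is modeled conditionally on the person's rest score, and stacking these conditionals lets a single logistic regression maximize the pseudo-likelihood.

        # R sketch: pseudo-likelihood estimation of a Rasch-type LLLA model
        # for dichotomous items via one conditional logistic regression.
        set.seed(1)
        n <- 500; J <- 6
        theta <- rnorm(n)                      # latent trait
        b     <- seq(-1, 1, length.out = J)    # item difficulties
        X <- matrix(rbinom(n * J, 1, plogis(outer(theta, b, "-"))), n, J)

        # Long format: one row per person-item pair; the predictor is the
        # rest score (sum of the person's responses to the other items).
        long <- do.call(rbind, lapply(seq_len(J), function(j) {
          data.frame(y = X[, j], item = j,
                     rest = rowSums(X[, -j, drop = FALSE]))
        }))
        long$item <- factor(long$item)

        # Maximizing this likelihood maximizes the pseudo-likelihood of the
        # LLLA model: item intercepts act as (negative) difficulties and the
        # rest-score slope captures the association induced by the trait.
        fit <- glm(y ~ item + rest, family = binomial, data = long)
        summary(fit)

    Because the stacked conditionals for one person are not independent, the naive standard errors from such a fit are not valid, which is one reason robust standard errors (e.g., of the sandwich type) are needed, as the abstract notes.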

    Using Response Times for Modeling Missing Responses in Large-Scale Assessments

    Examinees differ in how they interact with assessments. In low-stakes large-scale assessments (LSAs), missing responses are an obvious example of such differences. Understanding the underlying mechanisms is paramount for making appropriate decisions on how to deal with missing responses in data analysis and for drawing valid inferences on examinee competencies. Against this background, the present work aims at providing approaches for a nuanced modeling and understanding of test-taking behavior associated with the occurrence of missing responses in LSAs. These approaches are aimed at a) improving the treatment of missing responses in LSAs, b) supporting a better understanding of missingness mechanisms in particular and examinee test-taking behavior in general, and c) considering differences in test-taking behavior underlying missing responses when drawing inferences about examinee competencies. To that end, the present work leverages the additional information contained in response times and integrates research on modeling missing responses with research on modeling response times associated with observed responses. By documenting the lengths of interactions, response times contain valuable information on how examinees interact with assessments and may as such critically contribute to understanding the processes underlying both observed and missing responses.

    This work presents four modeling approaches that focus on different aspects and mechanisms of missing responses. The first two approaches focus on modeling not-reached items; the latter two aim at modeling omitted items. The first approach employs the framework for the joint modeling of speed and ability by van der Linden (2007) for modeling the mechanism underlying not-reached items due to a lack of working speed. On the basis of both theoretical considerations and a comprehensive simulation study, it is argued that, by accounting for differences in speed, this framework is well suited for modeling the mechanism underlying not-reached items that arise from a lack of speed. In assessing empirical test-level response times, however, it is also illustrated that some examinees quit the assessment before reaching the end of the test or before being forced to stop working due to a time limit. Building on these results, the second approach of this work aims at disentangling and jointly modeling multiple mechanisms underlying not-reached items. Employing information on response times, not-reached items due to lack of speed are distinguished from not-reached items due to quitting. The former is modeled by considering examinee speed. Quitting behavior - defined as stopping work before the time limit is reached while there are still unanswered items - is modeled as a survival process, with the item position at which examinees are most likely to quit being governed by their test endurance, conceptualized as a third latent variable besides speed and ability.

    The third approach presented in this work focuses on jointly modeling omission behavior and response behavior, thus providing a better understanding of how these two types of behavior differ. To do so, the approach extends the framework for jointly modeling speed and ability with a model component for the omission process and introduces the concept of different speed levels on which examinees operate when generating responses and when omitting items. This approach supports a more nuanced understanding of both the missingness mechanism underlying omissions and examinee pacing behavior by assessing whether examinees employ different pacing strategies when generating responses or omitting items. The fourth approach builds on previous theoretical work relating omitted responses to examinee disengagement and provides a model-based approach that allows for identifying and modeling examinee disengagement in terms of both omission and guessing behavior. Disengagement is identified at the item-by-examinee level by employing a mixture modeling approach that allows for different data-generating processes underlying item responses and omissions as well as different distributions of response times associated with engaged and disengaged behavior. The item-by-examinee mixing proportions themselves are modeled as a function of additional person and item parameters. This allows relating disengagement to ability and speed as well as identifying items that are likely to evoke disengaged test-taking behavior.

    The approaches presented in this work are tested and illustrated by a) evaluating their statistical performance under conditions typically encountered in LSAs by means of comprehensive simulation studies, b) illustrating their advances over previously developed approaches, and c) applying them to real data from major LSAs, thereby illustrating their potential for understanding examinee test-taking behavior in general and missingness mechanisms in particular. The potential of the approaches developed in this work for deepening the understanding of results from LSAs is discussed, and implications for the improvement of assessment procedures - ranging from construction and administration to analysis, interpretation, and reporting - are derived. Limitations of the proposed approaches are discussed and suggestions for future research are provided.
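    For reference, a compact statement of the van der Linden (2007) hierarchical framework that the first approach builds on (the notation here is assumed, and the response model is shown in its simplest logistic form):

        % Lognormal model for the response time of person i on item j,
        % with time intensity beta_j, speed tau_i, and time precision alpha_j
        \log T_{ij} \sim N(\beta_j - \tau_i, \; \alpha_j^{-2})

        % Item response model, e.g. a Rasch/2PL-type model for ability theta_i
        P(X_{ij} = 1 \mid \theta_i) = \frac{\exp\{a_j(\theta_i - b_j)\}}{1 + \exp\{a_j(\theta_i - b_j)\}}

        % Second level: ability and speed are jointly (bivariate) normal,
        % which is what links response accuracy to response times
        (\theta_i, \tau_i)^\top \sim N(\mu, \Sigma)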

    The Role of Life Satisfaction in Predicting Youth Violence and Offending: A Prospective Examination

    Life satisfaction in adolescence has been shown to protect against numerous negative outcomes (e.g., substance use, sexual risk-taking), but limited work has directly explored the relationship between life satisfaction and youth violence and offending. As such, we conducted a prospective assessment to explore this relationship among community (n = 334) and at-risk (n = 99) youth. Findings suggest life satisfaction is significantly associated with decreased offending and violence within both samples and adds incremental value above established risk factors in predicting violent and total offending among community youth. Furthermore, moderation analyses indicate that the protective value of life satisfaction is greater for youth with high callous–unemotional traits. Mediation analyses suggest that youth who are unsatisfied with their lives may seek out substance use, in turn elevating their risk of offending. Together, these findings indicate that efforts to improve overall life satisfaction may help prevent adolescent offending. However, future research is needed.
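    A minimal sketch of the kind of moderation test described above (the data frame and variable names are hypothetical, not the authors' actual measures): the protective effect of life satisfaction is allowed to differ by callous–unemotional traits via an interaction term.

        # R sketch: offending regressed on life satisfaction, callous-unemotional
        # (CU) traits, their interaction, and example covariates; 'dat' is assumed.
        fit <- lm(offending ~ life_sat * cu_traits + age + sex, data = dat)
        summary(fit)  # the life_sat:cu_traits coefficient carries the moderation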

    The Role of Psychopathic Features and Developmental Risk Factors in Trajectories of Physical Intimate Partner Violence

    Objective: Limited research has examined the association between different dimensions of psychopathy and membership in trajectories of physical intimate partner violence (IPV) while also considering developmental precursors. Thus, the current study examined the role of adolescent unidimensional, interpersonal-affective, and lifestyle-antisocial psychopathic features and developmental risk factors in trajectories of physical IPV in young adulthood. Method: Data were derived from 885 male offenders who participated in the Pathways to Desistance Study and were assessed using the Psychopathy Checklist: Youth Version (PCL:YV). Results: Semi-parametric group-based modeling identified three trajectories of physical IPV from ages 18 through 25: (a) a no physical IPV trajectory (70.5%, n = 624), (b) a low-level physical IPV trajectory (21.9%, n = 194), and (c) a high-level decreasing physical IPV trajectory (7.6%, n = 67). In multinomial logistic regression models controlling for exposure to violence, substance abuse, and peer delinquency, PCL:YV Total scores were associated with an increased likelihood of membership in the low-level and high-level physical IPV trajectories compared to the no physical IPV trajectory. In addition, Factor 2 scores (lifestyle-antisocial features) were associated with an increased likelihood of membership in the high-level decreasing physical IPV trajectory compared to the no physical IPV trajectory. Factor 1 scores (interpersonal-affective features) were unrelated to trajectory group assignment. Conclusions: Psychopathic features in adolescents should be considered in prevention and intervention strategies targeting physical IPV.
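    A minimal sketch of the multinomial regression step described above (variable names are hypothetical; the trajectory labels would come from the group-based trajectory model): trajectory membership is regressed on PCL:YV scores while controlling for the listed covariates.

        # R sketch using nnet::multinom; a data frame 'dat' with a 'trajectory'
        # factor (none / low / high_decreasing) and predictor columns is assumed.
        library(nnet)
        dat$trajectory <- relevel(factor(dat$trajectory), ref = "none")
        fit <- multinom(trajectory ~ pclyv_total + exposure_violence +
                          substance_abuse + peer_delinquency, data = dat)
        exp(coef(fit))  # relative-risk ratios versus the no-IPV trajectory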

    Measuring Job Satisfaction with Rating Scales: Problems and Remedies

    Job satisfaction is an aspect of cognitive well-being and one of the standard indicators of quality of life. A job satisfaction measure is included in several national panel surveys. The assessment of job satisfaction with a precise and valid measure is a prerequisite for obtaining accurate analysis results and drawing valid conclusions. However, an inadequately designed response format can impair the way respondents answer the questions, and there is reason to suspect that the 11-point rating scale commonly used in national panel surveys for assessing cognitive well-being could be a problem. Respondents may be overwhelmed by the large number of response categories and, therefore, cope with the increased response burden by using response styles (e.g., overusing particular response categories) and other types of inappropriate category use (e.g., careless responses or ignoring irrelevant or unclear categories). Consequently, the data provided by panel surveys may be of reduced quality. Thus, the research in the present dissertation aimed first to investigate whether an 11-point rating scale is adequate for a valid assessment of job satisfaction, one of the relevant life domains. Due to the lack of evidence on this question, the second aim was to examine the performance of mixed polytomous item response theory (IRT) models when applied to detect inappropriate category use under data conditions typical of panel surveys with a job satisfaction measure. The third aim was to study whether a rating scale with fewer response categories may be better suited for measuring job satisfaction. In addition, the fourth aim was to describe the personal profiles of response-style users in terms of personality traits, cognitive ability, socio-demographic variables, and contextual factors. It is important to identify these profiles because a person’s use of a specific response style can occur consistently across different traits and rating scales and is therefore considered a type of disposition.

    To examine the adequacy of an 11-point rating scale, we explored patterns of category use in the data on job satisfaction provided by the Household, Income and Labour Dynamics in Australia (HILDA) survey (first wave, n = 7,036). For this purpose, mixed polytomous IRT models were applied. The analyses showed that most respondents (60%) overused the extreme response categories (i.e., adopted an extreme response style [ERS]) or the two lowest and two highest categories (i.e., adopted a so-called semi-extreme response style [semi-ERS]), whereas others demonstrated more appropriate response behavior (a so-called differential response style [DRS]). Moreover, all respondents ignored many response categories, especially those who exhibited the ERS and semi-ERS. These findings emphasize the limited adequacy of a long rating scale for assessing job satisfaction, given the widespread presence of inappropriate category use. Generally speaking, an 11-point rating scale does not allow one to assess fine-grained differences between respondents in their levels of job satisfaction, as intended by the developers of panel surveys. Instead, this rating scale seems to overburden respondents with superfluous response categories and to evoke response styles through the difficulty respondents experience in determining the meaning of such fine-grained categories. To conclude, a rating scale with fewer response categories may be preferable. To address the second aim, a Monte Carlo simulation study was conducted.
    It included two models: the mixed partial credit model (mPCM; Rost, 1997) and the restricted mixed generalized partial credit model (rmGPCM; based on the GPCM, Muraki, 1997, and the mGPCM, von Davier & Yamamoto, 2004). These models are suitable for detecting patterns of inappropriate category use. The latter model is more complex and includes freely estimated item discrimination parameters (which are, however, restricted to be class-invariant). In particular, the simulation study focused on identifying the sample size required for a proper application of these models. In addition, we investigated which information criteria (AIC, BIC, CAIC, AIC3, and SABIC) are effective for model selection. The analyses showed that both models performed appropriately with at least 2,500 observations; further increasing the sample size yielded more accurate parameter and standard error estimates. Overall, the simulation study revealed that the mPCM performed slightly better than the rmGPCM. However, both models showed estimation problems when category frequencies were low, leading to inaccurate estimates. For the recommended sample size, the AIC3 and the SABIC were the most suitable information criteria; for large sample sizes (at least 4,500 cases), the BIC and CAIC were effective. The AIC, however, was insufficiently accurate.

    For the third aim, an experimental study with a between-subjects design and randomization was conducted to compare the performance of two short rating scales (with 4 and 6 response categories) with that of a long rating scale (11 response categories) with regard to the presence of inappropriate category use and reliability (N = 6,999 employees from the USA). For this purpose, the multidimensional mixed polytomous IRT model was applied. Notably, the results from the simulation study were used at the preparation stage of this study (e.g., regarding the minimum sample size required within an experimental condition). Overall, when the rating scale was short, both the proportion of respondents who used a specific response style and the number of ignored response categories were reduced, indicating less bias in data collected with short rating scales. This finding supports the suggestion that some respondents use response styles as an adjustment strategy in response to an inadequately large number of response categories. Interestingly, the same response styles were present regardless of rating scale length, suggesting that optimizing rating scale length can only partly prevent inappropriate category use; apparently, a proportion of respondents use a particular response style because of dispositions.

    To attain the fourth aim, the personal profiles of respondents who used a particular response style were investigated with two datasets: (i) a small set of potential predictors available in the HILDA survey (socio-demographic variables and job-related factors); and (ii) several relevant scales and variables (personality traits, cognitive ability, socio-demographic variables, and job-related factors) that were collected in the experimental study specifically for this purpose. For both datasets, the assignment of respondents to latent classes indicating different response styles served as the outcome variable, and the analyses were conducted using multinomial logistic regressions. Accordingly, the findings obtained from the first dataset provide the response-format-specific characteristics of response-style users (for the 11-point rating scale).
    By contrast, the second analysis made it possible to distinguish general predictors, which explained the use of a particular response style regardless of rating scale length, from response-format-specific predictors, which explained the occurrence of a response style for a certain rating scale. Specifically, the general predictors of ERS use included a high level of general self-efficacy and self-perceived job autonomy; for non-ERS use (a tendency to avoid extreme categories), a low need for cognition was the general predictor. This indicates that response styles can be caused by dispositions and therefore can hardly be prevented by optimizing the features of a rating scale. The predictors specific to a particular response format included socio-demographic variables, cognitive abilities, and certain job-related factors, suggesting that the profiles of respondents who used a particular response style vary depending on the rating scale administered to collect the data. Presumably, these groups of predictors primarily characterize respondents who are inclined to use response styles as an adjustment strategy in response to an inadequately designed rating scale.

    In sum, an 11-point rating scale was shown to have serious shortcomings, including a high proportion of respondents with response styles and many ignored response categories. This rating scale is therefore of limited adequacy for a valid assessment of job satisfaction (and other aspects of cognitive well-being). By contrast, the 4- and 6-point rating scales showed superior performance with regard to the presence of inappropriate category use: fewer respondents used response styles and almost no response categories were redundant. Thus, these shorter rating scales are more adequate for this purpose. Generally, shorter rating scales eliminated the inappropriate category use that is primarily measure-dependent. Nevertheless, the same response styles were present in the data regardless of rating scale length, suggesting that stable dispositions may be another major cause of response styles. Some of these personal characteristics were identified as general predictors; for example, ERS use could be explained by a high level of general self-efficacy and self-perceived job autonomy. Therefore, optimizing the rating scale alone may not be sufficient to eliminate effects caused by the consistent use of response styles, and statistical approaches for controlling the effects of response styles should be applied in this case. A promising approach for dealing with inappropriate category use is the use of mixed polytomous IRT models.
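    For reference, a compact statement of the mixed partial credit model underlying these analyses (notation assumed; the rmGPCM additionally multiplies each term by a class-invariant item discrimination a_j): within latent class g, responses follow a partial credit model with class-specific thresholds, and classes enter as mixing proportions.

        % Within class g, for item j with categories x = 0, ..., m_j
        P(X_{ij} = x \mid \theta_i, g) =
          \frac{\exp\{\sum_{k=1}^{x} (\theta_i - \delta_{jk|g})\}}
               {1 + \sum_{h=1}^{m_j} \exp\{\sum_{k=1}^{h} (\theta_i - \delta_{jk|g})\}},
          \quad \text{with the empty sum for } x = 0 \text{ taken as } 0.

        % Marginally over the G classes, with mixing proportions \pi_g
        P(X_{ij} = x \mid \theta_i) = \sum_{g=1}^{G} \pi_g \, P(X_{ij} = x \mid \theta_i, g)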

    Different Approaches to Covariate Inclusion in the Mixture Rasch Model

    The present dissertation project investigates different approaches to adding covariates to mixture item response theory (IRT) models and the impact of covariate inclusion on model fitting. Mixture IRT models serve as an important methodology for tackling several key psychometric issues in test development, including detecting latent differential item functioning (DIF). A Monte Carlo simulation study is conducted in which data generated according to a two-class mixture Rasch model (MRM) with both dichotomous and continuous covariates are fitted to several MRMs with misspecified covariates to examine the effects of covariate inclusion on model parameter estimation. In addition, both complete response data and incomplete response data with different types of missingness are considered in the present study in order to simulate practical assessment settings. Parameter estimation is carried out within a Bayesian framework via Markov chain Monte Carlo (MCMC) algorithms. Two empirical examples using the Programme for International Student Assessment (PISA) 2009 U.S. reading assessment data are presented to demonstrate the impact of different specifications of covariate effects for an MRM in real applications.
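    A compact sketch of a two-class mixture Rasch model with covariates in the class-membership part (notation assumed; covariate effects can be specified in other parts of the model as well): item responses follow a Rasch model with class-specific difficulties, and person covariates shift the log-odds of class membership.

        % Rasch model with class-specific item difficulties b_{j|g}, g = 1, 2
        P(X_{ij} = 1 \mid \theta_i, g) = \frac{\exp(\theta_i - b_{j|g})}{1 + \exp(\theta_i - b_{j|g})}

        % Class membership as a logistic function of person covariates z_i
        P(g_i = 2 \mid z_i) = \frac{\exp(\gamma_0 + \gamma^\top z_i)}{1 + \exp(\gamma_0 + \gamma^\top z_i)}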