17 research outputs found

    A New Investigation of Fake Resistance of a Multidimensional Forced-Choice Measure: An Application of Differential Item/Test Functioning

    Get PDF
    To address faking issues associated with Likert-type personality measures, multidimensional forced-choice (MFC) measures have recently emerged as important components of personnel assessment systems. Despite various efforts to investigate the fake resistance of MFC measures, previous research has mainly focused on scale mean differences between honest and faking conditions. Given recent psychometric advancements in MFC measurement (e.g., Stark et al., 2005; Brown & Maydeu-Olivares, 2011; Joo et al., 2019; Lee et al., 2019), there is a need to investigate the fake resistance of MFC measures through a new methodological lens. This research investigates the fake resistance of MFC measures using recently proposed differential item functioning (DIF) and differential test functioning (DTF) methodologies for MFC measures (Lee, Joo, & Stark, 2020). Overall, our results show that MFC measures are more fake resistant than Likert-type measures at both the item and test levels. However, MFC measures may still be susceptible to faking if they include many mixed blocks, that is, blocks consisting of both positively and negatively keyed statements. Future research may need to find an optimal strategy for designing mixed blocks in MFC measures that satisfies the goals of validity and scoring accuracy. Practical implications and limitations are discussed in the paper.
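
    As a rough illustration of the differential test functioning idea, a generic signed DTF quantity in the spirit of DFIT-type indices (shown here only for orientation; not necessarily the exact Lee, Joo, & Stark, 2020, statistic) compares expected test scores computed from honest-condition and faking-condition calibrations:

        \mathrm{DTF} = \int \left[ T_F(\theta) - T_H(\theta) \right] f(\theta)\, d\theta, \qquad T_g(\theta) = \sum_{i=1}^{n} E_g\!\left[ X_i \mid \theta \right],

    where T_H and T_F are the test characteristic functions under the two conditions and f(\theta) is the latent trait density; DIF is the analogous comparison for a single item or block.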

    Robustness of the Within- and Between-Series Estimators to Non-Normal Multiple-Baseline Studies: A Monte Carlo Study

    Get PDF
    In single-case research, the multiple-baseline (MB) design is the most widely used design in practical settings. It provides the opportunity to estimate the treatment effect based not only on within-series comparisons of treatment-phase to baseline-phase observations, but also on time-specific between-series comparisons of observations from participants who have started treatment with observations from participants still in baseline. In MB studies, the average treatment effect and the variation of these effects across multiple participants can be estimated using various statistical modeling methods. Recently, two types of statistical modeling methods were proposed for analyzing MB studies: (a) the within-series model and (b) the between-series model. The within-series model is a typical two-level multilevel modeling approach analyzing the measurement occasions within a participant, whereas the between-series model is an alternative modeling approach analyzing participants’ measurement occasions at particular time points, where some participants are in the baseline phase and others are in the treatment phase. Parameters of both the within- and between-series models are generally estimated with restricted maximum likelihood (REML) estimation, and REML was developed under the assumption of normality (Hox et al., 2010; Raudenbush & Bryk, 2002). However, in practical educational and psychological settings, observed data cannot always be assumed to be normal. Therefore, the purpose of this study is to investigate the robustness of analyzing MB studies with the within- and between-series models when level-1 errors are non-normal. A Monte Carlo study was conducted under conditions where level-1 errors were generated from non-normal distributions in which the skewness and kurtosis of the distribution were manipulated. Four statistical approaches were considered for comparison based on theoretical and/or empirical rationales. The approaches were defined by crossing two analytic decisions: (a) whether to use a within- or between-series estimate of the effect and (b) whether to use REML estimation with the Kenward-Roger adjustment for inferences or Bayesian estimation and inference. The accuracy of parameter estimation, statistical power, and Type I error rates were systematically analyzed. The results of the study showed that the within- and between-series models are robust to non-normality of the level-1 errors. Both models estimated the treatment effect accurately, and statistical inferences were acceptable. REML and Bayesian estimation also produced similar results. Applications and implications for applied and methodology researchers are discussed based on the findings of the study.
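
    For orientation, a minimal within-series specification for an MB study, written here as a standard two-level interrupted time-series model (an illustrative sketch, not necessarily the exact models compared in the study), is

        Level 1:  y_{ij} = \beta_{0j} + \beta_{1j} D_{ij} + e_{ij}
        Level 2:  \beta_{0j} = \gamma_{00} + u_{0j}, \qquad \beta_{1j} = \gamma_{10} + u_{1j}

    where D_{ij} indicates whether occasion i for participant j falls in the treatment phase, \gamma_{10} is the average treatment effect, and u_{1j} captures between-participant variation in that effect; the between-series model instead contrasts treated and not-yet-treated participants at common time points. The normality assumption stressed in this study concerns the level-1 errors e_{ij}.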

    Evaluating Anchor-Item Designs for Concurrent Calibration With the GGUM

    No full text
    Concurrent calibration using anchor items has proven to be an effective alternative to separate calibration and linking for developing large item banks, which are needed to support continuous testing. In principle, anchor-item designs and estimation methods that have proven effective with dominance item response theory (IRT) models, such as the 3PL model, should also lead to accurate parameter recovery with ideal point IRT models, but surprisingly little research has been devoted to this issue. This study therefore had two purposes: (a) to develop software for concurrent calibration with what is now the most widely used ideal point model, the generalized graded unfolding model (GGUM); and (b) to compare the efficacy of different GGUM anchor-item designs and develop empirically based guidelines for practitioners. A Monte Carlo study was conducted to compare the efficacy of three anchor-item designs in vertical and horizontal linking scenarios. The authors found that a block-interlaced design provided the best parameter recovery in nearly all conditions. The implications of these findings for concurrent calibration with the GGUM and practical recommendations for pretest designs involving ideal point computer adaptive testing (CAT) applications are discussed.
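
    For reference, the GGUM item response function being calibrated (in the usual Roberts, Donoghue, & Laughlin, 2000, parameterization; quoted here only for context) is

        P(Z_i = z \mid \theta_j) = \frac{\exp\{\alpha_i[z(\theta_j - \delta_i) - \sum_{k=0}^{z}\tau_{ik}]\} + \exp\{\alpha_i[(M - z)(\theta_j - \delta_i) - \sum_{k=0}^{z}\tau_{ik}]\}}{\sum_{w=0}^{C}\left(\exp\{\alpha_i[w(\theta_j - \delta_i) - \sum_{k=0}^{w}\tau_{ik}]\} + \exp\{\alpha_i[(M - w)(\theta_j - \delta_i) - \sum_{k=0}^{w}\tau_{ik}]\}\right)}

    with z = 0, \ldots, C, M = 2C + 1, and \tau_{i0} = 0; anchor-item designs must therefore recover the discrimination \alpha_i, location \delta_i, and threshold \tau_{ik} parameters on a common metric across calibration groups.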

    Linking Methods for the Zinnes–Griggs Pairwise Preference IRT Model

    No full text
    Forced-choice item response theory (IRT) models are increasingly being used as a way of reducing response biases in noncognitive research and operational testing contexts. As applications have increased, there has been a growing need for methods to link parameters estimated in different examinee groups as a prelude to measurement equivalence testing. This study compared four linking methods for the Zinnes and Griggs (ZG) pairwise preference ideal point model. A Monte Carlo simulation compared test characteristic curve (TCC) linking, item characteristic curve (ICC) linking, mean/mean (M/M) linking, and mean/sigma (M/S) linking. The results indicated that ICC linking and the simpler M/M and M/S methods performed better than TCC linking, and there were no substantial differences among the top three approaches. In addition, in the absence of possible contamination of the common (anchor) item subset due to differential item functioning, five items should be adequate for estimating the metric transformation coefficients. Our article presents the necessary equations for ZG linking and provides recommendations for practitioners who may be interested in developing and using pairwise preference measures for research and selection purposes.
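
    As a sketch of what the metric transformation involves (a generic linear linking setup with mean/sigma coefficients computed from common stimulus locations; stated as an illustration, not as the article's exact equations):

        \theta^{*} = A\,\theta + B, \qquad \mu_{s}^{*} = A\,\mu_{s} + B
        A = \frac{s(\mu_{\mathrm{ref}})}{s(\mu_{\mathrm{focal}})}, \qquad B = \bar{\mu}_{\mathrm{ref}} - A\,\bar{\mu}_{\mathrm{focal}}

    where the means and standard deviations are taken over the common (anchor) stimulus location parameters estimated in each group; characteristic curve methods such as TCC and ICC linking instead choose A and B to minimize discrepancies between the groups' test or item characteristic curves over the anchor set.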

    GGUM-RANK Statement and Person Parameter Estimation With Multidimensional Forced Choice Triplets

    No full text
    Historically, multidimensional forced choice (MFC) measures have been criticized because conventional scoring methods can lead to ipsativity problems that render scores unsuitable for interindividual comparisons. However, with the recent advent of item response theory (IRT) scoring methods that yield normative information, MFC measures are surging in popularity and becoming important components in high-stakes evaluation settings. This article aims to add to burgeoning methodological advances in MFC measurement by focusing on statement and person parameter recovery for the GGUM-RANK (generalized graded unfolding-RANK) IRT model. A Markov chain Monte Carlo (MCMC) algorithm was developed for estimating GGUM-RANK statement and person parameters directly from MFC rank responses. Simulation studies examined how the psychometric properties of the statements composing MFC items, test length, and sample size influenced statement and person parameter estimation, and explored the benefits of measurement using MFC triplets relative to pairs. To demonstrate this methodology, an empirical validity study was then conducted using an MFC triplet personality measure. The results and implications of these studies for future research and practice are discussed.

    Item Parameter Estimation With the General Hyperbolic Cosine Ideal Point IRT Model

    No full text
    Over the last decade, researchers have come to recognize the benefits of ideal point item response theory (IRT) models for noncognitive measurement. Although most applied studies have utilized the Generalized Graded Unfolding Model (GGUM), many other models have been developed. Most notably, David Andrich and colleagues published a series of papers comparing dominance and ideal point measurement perspectives, and they proposed ideal point models for dichotomous and polytomous single-stimulus responses, known as the Hyperbolic Cosine Model (HCM) and the General Hyperbolic Cosine Model (GHCM), respectively. These models have item response functions resembling the GGUM and its more constrained forms, but they are mathematically simpler. Despite the apparent impact of Andrich’s work on ensuing investigations, the HCM and GHCM have been largely overlooked by applied researchers. This may stem from questions about the compatibility of the parameter metric with other ideal point estimation and model-data fit software, or from the seemingly unrealistic parameter estimates sometimes produced by the original joint maximum likelihood (JML) estimation software. Given the growing list of ideal point applications and variations in sample and scale characteristics, the authors believe these HCMs warrant renewed consideration. To address this need and overcome potential JML estimation difficulties, this study developed a marginal maximum likelihood (MML) estimation algorithm for the GHCM and explored parameter estimation requirements in a Monte Carlo study manipulating sample size, scale length, and data types. The authors found that a sample size of 400 was adequate for parameter estimation and, in accordance with GGUM studies, estimation was superior in polytomous conditions.
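
    For context, the dichotomous hyperbolic cosine model has the single-peaked response function (in Andrich and Luo's parameterization, quoted here only as background)

        P(x_{ni} = 1 \mid \theta_n) = \frac{\exp(\lambda_i)}{\exp(\lambda_i) + 2\cosh(\theta_n - \delta_i)}

    where \delta_i is the statement location, \lambda_i is the unit parameter related to the latitude of acceptance, and the 2\cosh(\theta_n - \delta_i) term makes the probability of agreement decline as a person's location moves away from the statement in either direction; the GHCM extends this form to polytomous responses.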

    MCMC Z-G: An IRT Computer Program for Forced-Choice Noncognitive Measurement

    No full text
    In recent years, there has been a surge of interest in measuring noncognitive constructs in educational and managerial/organizational settings. For the most part, these noncognitive constructs have been and continue to be measured using Likert-type (ordinal response) scales, which are susceptible to several types of response distortion. To deal with these response biases, researchers have proposed using the forced-choice format, which requires respondents or raters to evaluate cognitive, affective, or behavioral descriptors presented in blocks of two or more. The workhorse for this measurement endeavor is the item response theory (IRT) model developed by Zinnes and Griggs (Z-G), which was first used as the basis for a computerized adaptive rating scale (CARS) and then extended by many organizational scientists. However, applications of the Z-G model outside of organizational contexts have been limited, primarily due to the lack of publicly available software for parameter estimation. This research effort addressed that need by developing a Markov chain Monte Carlo (MCMC) estimation program, called MCMC Z-G, which uses a Metropolis-Hastings-within-Gibbs algorithm to simultaneously estimate Z-G item and person parameters. The publicly available MCMC Z-G program runs on both Mac OS® and Windows® platforms.
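
    As a rough sketch of the estimation strategy, the following is a generic random-walk Metropolis-Hastings-within-Gibbs skeleton in Python that alternates person and item updates. The likelihood is a stand-in logistic placeholder, not the Zinnes-Griggs response function, and none of this is the actual MCMC Z-G code; it only illustrates the algorithmic pattern named in the abstract.

    import numpy as np

    rng = np.random.default_rng(0)

    def log_lik(resp, theta, mu_a, mu_b):
        # Placeholder log-likelihood for pairwise preference data:
        # logistic in the difference of distances to the two stimulus
        # locations (NOT the Zinnes-Griggs model).
        d = np.abs(theta - mu_b) - np.abs(theta - mu_a)   # positive when closer to A
        p = 1.0 / (1.0 + np.exp(-d))
        return resp * np.log(p) + (1 - resp) * np.log(1 - p)

    def mh_step(current, log_post, step=0.3):
        # One random-walk Metropolis-Hastings update of a scalar parameter.
        proposal = current + rng.normal(0.0, step)
        if np.log(rng.uniform()) < log_post(proposal) - log_post(current):
            return proposal
        return current

    # Toy data: J persons responding to I pairwise-preference items (hypothetical).
    J, I, iters = 50, 10, 200
    theta = rng.normal(size=J)                  # person parameters (current values)
    mu = rng.normal(size=(I, 2))                # stimulus locations (A, B) per item
    resp = rng.integers(0, 2, size=(J, I))      # 1 = stimulus A preferred

    for it in range(iters):
        # Gibbs sweep 1: update each person parameter given the item parameters.
        for j in range(J):
            lp = lambda t: np.sum(log_lik(resp[j], t, mu[:, 0], mu[:, 1])) - 0.5 * t**2
            theta[j] = mh_step(theta[j], lp)
        # Gibbs sweep 2: update each stimulus location given the person parameters.
        for i in range(I):
            for s in range(2):
                def lp(m, i=i, s=s):
                    trial = mu[i].copy()
                    trial[s] = m
                    return np.sum(log_lik(resp[:, i], theta, trial[0], trial[1])) - 0.5 * m**2
                mu[i, s] = mh_step(mu[i, s], lp)

    A real implementation would substitute the Z-G response function for the placeholder likelihood and add burn-in, step-size tuning, and convergence diagnostics.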

    Approaches for Specifying the Level-1 Error Structure When Synthesizing Single-Case Data

    No full text
    Multilevel modeling has been utilized for combining single-case experimental design (SCED) data assuming simple level-1 error structures. The purpose of this study is to compare various multilevel analysis approaches for handling potential complexity in the level-1 error structure within SCED data, including approaches assuming simple and complex error structures (heterogeneous, autocorrelated, or both) and approaches using fit indices to select between alternative error structures. A Monte Carlo study was conducted to empirically validate the suggested multilevel modeling approaches. Results indicate that each approach led to fixed effect estimates with little to no bias and that inferences for the fixed effects were frequently accurate, particularly when a simple homogeneous level-1 error structure or a first-order autoregressive structure was assumed and the inferences were based on the Kenward-Roger method. Practical implications and recommendations are discussed.
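
    As a concrete example of the level-1 complexity at issue (standard formulations given for illustration, not an exhaustive list of the structures compared), a first-order autoregressive structure with possibly heterogeneous phase variances can be written as

        e_{ij} = \rho\, e_{(i-1)j} + \nu_{ij}, \qquad \nu_{ij} \sim N\!\left(0, \sigma^{2}_{\mathrm{baseline}}\right) \ \text{or} \ N\!\left(0, \sigma^{2}_{\mathrm{treatment}}\right)

    where \rho is the lag-1 autocorrelation; allowing the two phase variances to differ gives the heterogeneous structure, and setting \rho = 0 with a single \sigma^{2} recovers the simple homogeneous level-1 structure assumed in earlier syntheses.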

    The impact of response-guided baseline phase extensions on treatment effect estimates

    No full text
    BACKGROUND: When developmental disabilities researchers use multiple-baseline designs, they are encouraged to delay the start of an intervention until the baseline stabilizes or until preceding cases have responded to intervention. Using ongoing visual analyses to guide the timing of the start of the intervention can help to resolve potential ambiguities in the graphical display; however, these forms of response-guided experimentation have been criticized as a potential source of bias in treatment effect estimation and inference. AIMS AND METHODS: Monte Carlo simulations were used to examine the bias and precision of average treatment effect estimates obtained from multilevel models of four-case multiple-baseline studies with series lengths that varied from 19 to 49 observations per case. We varied the size of the average treatment effect, the factors used to guide intervention decisions (baseline stability, response to intervention, both, or neither), and whether the ongoing analysis was masked or not. RESULTS: None of the methods of responding to the data led to appreciable bias in the treatment effect estimates. Furthermore, as timing-of-intervention decisions became responsive to more factors, baselines became longer and treatment effect estimates became more precise. CONCLUSIONS: Although the study was conducted under limited conditions, the response-guided practices did not lead to substantial bias. By extending baseline phases, they reduced estimation error and thus improved the treatment effect estimates obtained from multilevel models.