    On the Potential Mismatch between the Function of the Bayes Factor and Researchers’ Expectations

    The aim of this study is to investigate whether there is a potential mismatch between the usability of a statistical tool and psychology researchers’ expectations of it. Bayesian statistics is often promoted as an ideal substitute for frequentist statistics because it is said to align better with researchers’ expectations and needs. A particular instance of this is the proposal to replace Null Hypothesis Significance Testing (NHST) with Null Hypothesis Bayesian Testing (NHBT) using the Bayes factor. In this paper, we study to what extent the usability of NHBT matches researchers’ expectations of it. First, a study of the reporting practices in 73 psychological publications was carried out. Eight Questionable Reporting and Interpreting Practices (QRIPs) were found to occur more than once when practitioners applied NHBT. Specifically, our analysis provides insight into possible mismatches and their occurrence frequencies. A follow-up survey study was then conducted to assess such mismatches. The sample (N = 108) consisted of psychology researchers, experts in methodology (and/or statistics), and applied researchers in fields other than psychology. The data show that discrepancies exist among the participants. Interpreting the Bayes factor as posterior odds and failing to acknowledge the relative nature of the evidence the Bayes factor expresses are arguably the most concerning. The results of the paper suggest that a shift of statistical paradigm cannot solve the problem of misinterpretation altogether if users are not well acquainted with the tools.
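
    The mismatch flagged as most concerning has a precise formulation: the Bayes factor equals the posterior odds only when the prior odds equal 1. As a worked equation (notation ours, not the paper's), in LaTeX:

        \underbrace{\frac{P(H_1 \mid D)}{P(H_0 \mid D)}}_{\text{posterior odds}}
        \;=\;
        \underbrace{\frac{p(D \mid H_1)}{p(D \mid H_0)}}_{\text{Bayes factor } \mathrm{BF}_{10}}
        \times
        \underbrace{\frac{P(H_1)}{P(H_0)}}_{\text{prior odds}}

    For instance, BF10 = 10 combined with prior odds of 1/100 yields posterior odds of 1/10: the data favor H1 tenfold relative to H0, yet H0 remains the more probable hypothesis.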

    On the white, the black, and the many shades of gray in between: Our reply to Van Ravenzwaaij and Wagenmakers (2021)

    In 2019 we published an article (Tendeiro & Kiers, 2019) in Psychological Methods on null hypothesis Bayesian testing and its workhorse, the Bayes factor. Recently, van Ravenzwaaij and Wagenmakers (2021) offered a response to our piece, also in this journal. Although we welcome their thought-provoking remarks on our article, we concluded that there were too many "issues" in van Ravenzwaaij and Wagenmakers (2021) that warranted a rebuttal. In this article we both defend the main premises of our original article and put the contribution of van Ravenzwaaij and Wagenmakers (2021) under critical appraisal. Our hope is that this exchange between scholars contributes decisively toward a better understanding among psychologists of null hypothesis Bayesian testing in general and of the Bayes factor in particular. (PsycInfo Database Record (c) 2022 APA, all rights reserved)

    The Crit coefficient in Mokken scale analysis: A simulation study and an application in quality-of-life research

    PURPOSE: In Mokken scaling, the Crit index was proposed, and is sometimes used, as evidence of violations (or the lack thereof) of some common model assumptions. The main goal of our study was twofold: to make the formulation of the Crit index explicit and accessible, and to investigate its distribution under various measurement conditions. METHODS: We conducted two simulation studies in the context of dichotomously scored item responses. We manipulated the type of assumption violation, the proportion of violating items, sample size, and quality. False positive rates and power to detect assumption violations were our main outcome variables. Furthermore, we applied the Crit coefficient in a Mokken scale analysis of responses to the General Health Questionnaire (GHQ-12), a self-administered questionnaire for assessing current mental health. RESULTS: We found that the false positive rates of Crit were close to the nominal rate in most conditions, and that the power to detect misfit depended on the sample size, type of violation, and number of assumption-violating items. Overall, in small samples Crit lacked the power to detect misfit, and in larger samples power differed considerably depending on the type of violation and the proportion of misfitting items. Furthermore, our empirical example showed that even in large samples the Crit index may fail to detect assumption violations. DISCUSSION: Even in large samples, the Crit coefficient showed limited usefulness for detecting moderate and severe violations of monotonicity. Our findings are relevant to researchers and practitioners who use Mokken scaling for scale and questionnaire construction and revision. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s11136-021-02924-z.
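
    The Crit index aggregates several diagnostics of this kind, and its exact weighting is spelled out in the paper rather than reproduced here. As a minimal illustration of the raw ingredient it builds on, the following Python sketch counts decreases in item rest-score regressions, the basic check of manifest monotonicity for dichotomous items (the function name and the simple minimum-group-size rule are our assumptions; production analyses typically use the R package mokken, which also merges sparse rest-score groups):

        import numpy as np

        def count_monotonicity_decreases(X, min_group=50):
            """Count decreases in item rest-score regressions.

            X: (n_persons, n_items) matrix of dichotomous (0/1) scores.
            For each item, persons are grouped by rest score (total score
            minus the item itself); under monotonicity the proportion of
            1-scores should not decrease as the rest score increases.
            """
            decreases = []
            for j in range(X.shape[1]):
                rest = X.sum(axis=1) - X[:, j]
                prev, count = -np.inf, 0
                for r in np.unique(rest):        # rest scores, ascending
                    group = X[rest == r, j]
                    if group.size < min_group:   # skip sparse groups
                        continue
                    p = group.mean()
                    if p < prev:                 # observed decrease
                        count += 1
                    prev = p
                decreases.append(count)
            return decreases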

    The Use of Nonparametric Item Response Theory to Explore Data Quality

    The aim of this chapter is to provide insight into a number of commonly used nonparametric item response theory (NIRT) methods and to show how these methods can be used to describe and explore the psychometric quality of questionnaires used in patient-reported outcome measurement and, more generally, typical performance measurement (personality, mood, health-related constructs). NIRT is an extremely valuable tool for preliminary data analysis and for evaluating whether item response data are acceptable for parametric IRT modeling. This is particularly useful in the field of typical performance measurement, where the construct being measured is often very different from that in maximum performance measurement (education, intelligence; see Chapter 1 of this handbook). Our basic premise is that there are no “best tools” or “best models” and that the usefulness of psychometric modeling depends on the specific aims of the instrument (questionnaire, test) being used. Most important, however, is that it should be clear to a researcher how sensitive a specific method (for example, DETECT or Mokken scaling) is to the assumptions being investigated. The NIRT literature is not always clear about this, and in this chapter we try to clarify some of these ambiguities.
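
    One concrete example of such preliminary exploration is Loevinger's H, the scalability coefficient at the heart of Mokken scaling. The sketch below is a minimal Python implementation for dichotomous data, assuming complete responses and distinct item popularities; it is our own simplification, not the chapter's code:

        import numpy as np

        def loevinger_H(X):
            """Scale-level Loevinger H for a (n_persons, n_items) 0/1 matrix.

            A Guttman error is endorsing the less popular item of a pair
            while rejecting the more popular one; H compares the observed
            number of such errors with the number expected under marginal
            independence of the items.
            """
            n, k = X.shape
            p = X.mean(axis=0)          # item popularities
            order = np.argsort(-p)      # most popular (easiest) first
            Xs, ps = X[:, order], p[order]
            F = E = 0.0
            for i in range(k - 1):
                for j in range(i + 1, k):   # item i is the more popular
                    F += np.sum((Xs[:, i] == 0) & (Xs[:, j] == 1))
                    E += n * (1 - ps[i]) * ps[j]
            return 1.0 - F / E

    Values of H near 0 indicate that the items barely scale together; Mokken's usual rules of thumb label a scale weak from H = .3, moderate from .4, and strong from .5.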

    Direct transformations yielding the knight's move pattern in 3x3x3 arrays

    Three-way arrays (or tensors) can be regarded as extensions of traditional two-way data matrices with a third dimension. Studying the algebraic properties of such arrays is relevant, for example, to the Tucker three-way PCA method, which generalizes principal component analysis to three-way data. One important algebraic property of arrays concerns the possibility of transformation to simplicity. An array is said to be transformed to a simple form when it can be manipulated by a sequence of invertible operations such that the vast majority of its entries become zero. This paper shows how 3 × 3 × 3 arrays, whether symmetric or nonsymmetric, can be transformed to a simple form with 18 of their 27 entries equal to zero. We call this simple form the “knight's move pattern” due to a loose resemblance to the moves of a knight in a game of chess. The pattern was previously examined by Kiers, Ten Berge, and Rocci. It is shown how the knight's move pattern can be found by means of a numeric–algebraic procedure based on Gröbner bases. This approach appears to work almost surely for randomly generated arrays, whether symmetric or nonsymmetric.
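
    To make the notion of invertible operations concrete: a 3 × 3 × 3 array A = (a_ijk) is transformed by multiplying it along each of its three modes by a nonsingular matrix, as in the Tucker model. In standard multilinear notation (symbols ours, not the paper's), in LaTeX:

        b_{pqr} \;=\; \sum_{i=1}^{3}\sum_{j=1}^{3}\sum_{k=1}^{3} s_{pi}\, t_{qj}\, u_{rk}\, a_{ijk},
        \qquad S=(s_{pi}),\ T=(t_{qj}),\ U=(u_{rk}) \ \text{nonsingular}.

    The knight's move pattern then amounts to choosing S, T, and U such that 18 of the 27 entries b_{pqr} vanish; the Gröbner-basis procedure solves the resulting system of polynomial equations in the entries of these matrices.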

    On the Practical Consequences of Misfit in Mokken Scaling

    Mokken scale analysis is a popular method for evaluating the psychometric quality of clinical and personality questionnaires and of their individual items. Although many empirical papers report on the extent to which sets of items form Mokken scales, less attention has been paid to the effects of violating commonly used rules of thumb. In this study, the authors investigated the practical consequences of retaining or removing items whose psychometric properties do not comply with these rules of thumb. Using simulated data, they concluded that items with low scalability had some influence on the reliability of test scores, person ordering and selection, and criterion-related validity estimates. Removing the misfitting items from the scale had, in general, a small effect on these outcomes. Although important outcome variables were fairly robust against scale violations in some conditions, the authors conclude that researchers should not rely exclusively on algorithms that allow automatic selection of items. In particular, content validity must be taken into account to build sensible psychometric instruments.
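
    The rules of thumb at issue can be made concrete: items with item-level scalability H_i below .30 are conventionally considered unscalable in Mokken scaling. The following Python sketch shows how one might mimic such an automated selection step and gauge its effect on reliability; it is a toy illustration under our own simplifications, not the authors' simulation code:

        import numpy as np

        def item_H(X):
            """Item-level Loevinger H_i for a (n_persons, n_items) 0/1 matrix."""
            n, k = X.shape
            p = X.mean(axis=0)
            F, E = np.zeros(k), np.zeros(k)
            for i in range(k):
                for j in range(k):
                    if i == j:
                        continue
                    # 'easy' is the more popular item of the pair
                    easy, hard = (i, j) if p[i] >= p[j] else (j, i)
                    F[i] += np.sum((X[:, easy] == 0) & (X[:, hard] == 1))
                    E[i] += n * (1 - p[easy]) * p[hard]
            return 1.0 - F / E

        def cronbach_alpha(X):
            """Cronbach's alpha of the sum score."""
            k = X.shape[1]
            item_var = X.var(axis=0, ddof=1).sum()
            total_var = X.sum(axis=1).var(ddof=1)
            return k / (k - 1) * (1 - item_var / total_var)

        # Keep items meeting the conventional H_i >= .30 rule of thumb and
        # compare the reliability of the full and the reduced scale:
        # keep = item_H(X) >= 0.30
        # print(cronbach_alpha(X), cronbach_alpha(X[:, keep]))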