Transparency and Validity in Coding Open-Ended Data for Quantitative Analysis

Abstract

Open-ended data are rich sources of information in psychological research, but reporting practices differ substantially. Here, we assess current reporting practices for the quantitative coding of open-ended data, provide strategies for making such coding more valid and reliable, and investigate questionable research practices in this area. First, we systematically examined articles in four top psychology journals and found that 21% included open-ended data coded by humans; however, only 36% of these reported sufficient detail about the coding process. We propose guidelines for transparently reporting the quantitative coding of open-ended data, informed by concerns about replicability, content validity, and statistical validity. We identify several practices that researchers can report, such as how units of analysis and categories were determined, whether there was a gold-standard coder, whether the test phase was masked and predetermined, and whether there were multiple test phases. Our data simulations indicate that a common statistic for testing the reliability of coded open-ended data, Cohen's kappa (κ), can become inflated when researchers use repeated test phases or manipulate categories, for example by adding a missing-data category. To facilitate transparent and valid coding of open-ended data, we provide a pre-registration template that can be adapted for different types of studies.
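
As a rough illustration of the inflation mechanism described above, the Python sketch below (not the authors' simulation code; the coder data are hypothetical) computes Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), for two coders, then appends blank responses that both coders trivially label "missing" and recomputes κ:

    from collections import Counter
    import random

    def cohens_kappa(coder_a, coder_b):
        """Cohen's kappa for two coders' category labels."""
        n = len(coder_a)
        # Observed agreement: proportion of units both coders label identically.
        p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
        # Chance agreement: product of the coders' marginal proportions per category.
        marg_a, marg_b = Counter(coder_a), Counter(coder_b)
        p_e = sum(marg_a[c] * marg_b[c] for c in marg_a) / n**2
        return (p_o - p_e) / (1 - p_e)

    # Hypothetical coding of 100 responses into four categories, with
    # coder B agreeing with coder A on roughly 60% of units.
    random.seed(1)
    a = [random.randrange(4) for _ in range(100)]
    b = [x if random.random() < 0.6 else random.randrange(4) for x in a]
    print(round(cohens_kappa(a, b), 2))

    # Append 50 blank responses that both coders trivially code as
    # "missing"; the easy agreement on this category inflates kappa.
    a2 = a + ["missing"] * 50
    b2 = b + ["missing"] * 50
    print(round(cohens_kappa(a2, b2), 2))

Because κ only partially corrects for chance agreement on the new category, near-perfect agreement on trivially codable "missing" units raises the second κ above the first, even though agreement on substantive categories is unchanged.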
