
    Exploratory Mediation Analysis with Many Potential Mediators

    Social and behavioral scientists are increasingly employing technologies such as fMRI, smartphones, and gene sequencing, which yield 'high-dimensional' datasets with more columns than rows. There is increasing interest, but little substantive theory, regarding the role the variables in these data play in known processes. This necessitates exploratory mediation analysis, for which structural equation modeling is the benchmark method. However, this method cannot perform mediation analysis with more variables than observations. One option is to run a series of univariate mediation models, which incorrectly assumes independence of the mediators. Another option is regularization, but the available implementations may lead to high false-positive rates. In this paper, we develop a hybrid approach which combines components of both filtering and regularization: the 'Coordinate-wise Mediation Filter'. It performs filtering conditional on the other selected mediators. We show through simulation that it improves performance over existing methods. Finally, we provide an empirical example showing how our method may be used for epigenetic research. Comment: R code and package are available online as supplementary material at https://github.com/vankesteren/cmfilter and https://github.com/vankesteren/ema_simulation
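    A minimal sketch of the idea of filtering mediators conditional on the already-selected set. This is an illustrative toy, not the authors' cmfilter package: the data, the 0.3 threshold, and the product-of-coefficients score are all made up for the example.

    ```python
    # Toy coordinate-wise mediator selection (illustrative only).
    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 100, 10
    x = rng.normal(size=n)                   # exposure
    M = rng.normal(size=(n, p))              # candidate mediators
    M[:, 0] += 0.8 * x                       # mediator 0 lies on the causal path
    y = 0.7 * M[:, 0] + rng.normal(size=n)   # outcome transmitted through mediator 0

    def mediation_score(j, selected):
        """Product-of-coefficients |a_j * b_j|, with b_j estimated
        conditional on the currently selected mediators."""
        a = np.linalg.lstsq(np.c_[np.ones(n), x], M[:, j], rcond=None)[0][1]
        cols = sorted(set(selected) | {j})
        design = np.c_[np.ones(n), x, M[:, cols]]
        coefs = np.linalg.lstsq(design, y, rcond=None)[0]
        b = coefs[2 + cols.index(j)]
        return abs(a * b)

    # One coordinate-wise sweep: keep a mediator if its conditional score passes a threshold.
    selected = []
    for j in range(p):
        if mediation_score(j, selected) > 0.3:
            selected.append(j)

    print(selected)
    ```

    The point of conditioning on `selected` inside the score is exactly what distinguishes this from a series of univariate mediation models, which would evaluate each mediator in isolation.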

    Evaluating the Construct Validity of Text Embeddings with Application to Survey Questions

    Text embedding models from Natural Language Processing can map text data (e.g. words, sentences, documents) to supposedly meaningful numerical representations (a.k.a. text embeddings). While such models are increasingly applied in social science research, one important issue is often not addressed: the extent to which these embeddings are valid representations of constructs relevant for social science research. We therefore propose the use of the classic construct validity framework to evaluate the validity of text embeddings. We show how this framework can be adapted to the opaque and high-dimensional nature of text embeddings, with application to survey questions. We include several popular text embedding methods (e.g. fastText, GloVe, BERT, Sentence-BERT, Universal Sentence Encoder) in our construct validity analyses. We find evidence of convergent and discriminant validity in some cases. We also show that embeddings can be used to predict respondents' answers to completely new survey questions. Furthermore, BERT-based embedding techniques and the Universal Sentence Encoder provide more valid representations of survey questions than do others. Our results thus highlight the necessity to examine the construct validity of text embeddings before deploying them in social science research. Comment: Under review
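    A toy illustration of what convergent and discriminant validity checks on embeddings could look like. The three-dimensional vectors below are invented for the example; in practice they would come from a model such as Sentence-BERT, and the analysis in the paper is more involved than a single cosine similarity.

    ```python
    # Convergent vs. discriminant validity on made-up embedding vectors.
    import numpy as np

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    # Two paraphrased survey questions about the same construct (trust),
    # plus one question about an unrelated construct (income).
    emb_trust_a = np.array([0.9, 0.1, 0.0])
    emb_trust_b = np.array([0.8, 0.2, 0.1])
    emb_income  = np.array([0.0, 0.1, 0.9])

    convergent   = cosine(emb_trust_a, emb_trust_b)   # same construct: should be high
    discriminant = cosine(emb_trust_a, emb_income)    # different construct: should be low
    print(round(convergent, 2), round(discriminant, 2))
    ```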

    Estimating stochastic survey response errors using the multitrait‐multierror model

    From Wiley via Jisc Publications Router. History: received 2018-09-17, revisions received 2021-01-26, accepted 2021-05-30, published electronically 2021-10-12. Article version: VoR. Publication status: Published. Funder: ESRC National Centre for Research Methods, University of Southampton (http://dx.doi.org/10.13039/501100000613); Grant: R121711. Abstract: Surveys are well known to contain response errors of different types simultaneously, including acquiescence, social desirability, common method variance and random error. Nevertheless, most methods developed to estimate and correct for such errors consider only a single error source at a time. Consequently, estimation of response errors is inefficient, their relative importance is unknown and the optimal question format may not be discoverable. To remedy this situation, we demonstrate how multiple types of errors can be estimated concurrently with the recently introduced 'multitrait‐multierror' (MTME) approach. MTME combines the theory of design of experiments with latent variable modelling to estimate response error variances of different error types simultaneously. This allows researchers to evaluate which errors are most impactful, and aids in the discovery of optimal question formats. We apply this approach to six survey items measuring attitudes towards immigrants that are commonly used across public opinion studies, using representative data from the United Kingdom.
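    A rough sketch of the variance-decomposition intuition behind modelling multiple error types at once. This simulates the data-generating side only, with invented variance values; it is not the MTME estimator, which fits these components with a latent variable model under an experimental design.

    ```python
    # Simulate a survey response as trait + systematic error + random error,
    # and check that the component variances add up (toy illustration).
    import numpy as np

    rng = np.random.default_rng(1)
    n = 10_000
    trait        = rng.normal(0, 1.0, n)   # true attitude
    acquiescence = rng.normal(0, 0.5, n)   # systematic (method) error
    noise        = rng.normal(0, 0.3, n)   # random error
    response = trait + acquiescence + noise

    total = response.var()
    # With independent components, the variances should add up:
    expected = 1.0**2 + 0.5**2 + 0.3**2
    print(round(total, 2), round(expected, 2))
    ```

    An estimator that ignores one component would silently fold its variance into another, which is the inefficiency the abstract describes.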

    Achieving Fair Inference Using Error-Prone Outcomes

    Recently, an increasing amount of research has focused on methods to assess and account for fairness criteria when predicting ground-truth targets in supervised learning. However, recent literature has shown that prediction unfairness can potentially arise due to measurement error when target labels are error-prone. In this study we demonstrate that existing methods to assess and calibrate fairness criteria do not extend to the true target variable of interest when an error-prone proxy target is used. As a solution to this problem, we suggest a framework that combines two existing fields of research: fair ML methods, such as those found in the counterfactual fairness literature, and measurement models found in the statistical literature. First, we discuss these approaches and how they can be combined to form our framework. We also show that, in a healthcare decision problem, a latent variable model to account for measurement error removes the unfairness detected previously.
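    A toy simulation of the core problem (all numbers invented): a predictor can look fair when evaluated against an error-prone proxy label while erring far more often for one group on the true outcome, once label noise differs by group.

    ```python
    # Fairness on a proxy label vs. the true label under group-dependent noise.
    import numpy as np

    rng = np.random.default_rng(2)
    n = 50_000
    group = rng.integers(0, 2, n)        # protected attribute
    y_true = rng.random(n) < 0.5         # true outcome, identical rate in both groups

    # Proxy label: group 1's recorded labels are flipped far more often.
    flip = np.where(group == 1, 0.3, 0.05)
    y_proxy = np.where(rng.random(n) < flip, ~y_true, y_true)

    # A "predictor" that simply reads off the proxy label.
    pred = y_proxy

    # Demographic parity looks satisfied on the proxy...
    proxy_rate = [pred[group == g].mean() for g in (0, 1)]
    # ...but accuracy against the truth differs sharply by group.
    true_acc = [(pred == y_true)[group == g].mean() for g in (0, 1)]
    print([round(r, 2) for r in proxy_rate], [round(a, 2) for a in true_acc])
    ```

    Correcting this is where a measurement (latent variable) model for the unobserved true label comes in.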

    Flexible Extensions to Structural Equation Models Using Computation Graphs

    Structural equation modeling (SEM) is being applied to ever more complex data types and questions, often requiring extensions such as regularization or novel fitting functions. To extend SEM, researchers currently need to completely reformulate SEM and its optimization algorithm, a challenging and time-consuming task. In this paper, we introduce the computation graph for SEM, and show that this approach can extend SEM without the need for bespoke software development. We show that both existing and novel SEM improvements follow naturally. To demonstrate, we introduce three SEM extensions: least absolute deviation estimation, Bayesian LASSO optimization, and sparse high-dimensional mediation analysis. We provide an implementation of SEM in PyTorch, popular software in the machine learning community, to accelerate development of structural equation models adequate for modern-day data and research questions.
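    A small sketch of the kind of swap a computation-graph formulation makes easy: replacing a squared-loss fitting function with least absolute deviations and letting (sub)gradient descent do the optimization. This toy uses plain numpy on a single-parameter regression, not the paper's PyTorch SEM implementation; the data and step size are invented.

    ```python
    # Least absolute deviation estimation of a slope via subgradient descent.
    import numpy as np

    rng = np.random.default_rng(3)
    n = 2_000
    x = rng.normal(size=n)
    y = 2.0 * x + rng.standard_t(df=2, size=n)   # heavy-tailed noise favours LAD

    beta = 0.0
    lr = 0.05
    for _ in range(500):
        resid = y - beta * x
        grad = -(np.sign(resid) * x).mean()      # subgradient of mean |resid|
        beta -= lr * grad
    print(beta)
    ```

    With squared loss only the `grad` line would change, which is the flexibility argument: the graph structure and optimizer stay the same while the fitting function is swapped out.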

    Evaluating the construct validity of text embeddings with application to survey questions

    Text embedding models from Natural Language Processing can map text data (e.g. words, sentences, documents) to meaningful numerical representations (a.k.a. text embeddings). While such models are increasingly applied in social science research, one important issue is often not addressed: the extent to which these embeddings are high-quality representations of the information they are meant to encode. We view this quality evaluation problem from a measurement validity perspective, and propose the use of the classic construct validity framework to evaluate the quality of text embeddings. First, we describe how this framework can be adapted to the opaque and high-dimensional nature of text embeddings. Second, we apply our adapted framework to an example where we compare the validity of survey question representations across text embedding models.