Exploratory Mediation Analysis with Many Potential Mediators
Social and behavioral scientists are increasingly employing technologies such
as fMRI, smartphones, and gene sequencing, which yield 'high-dimensional'
datasets with more columns than rows. There is increasing interest, but little
substantive theory, in the role the variables in these data play in known
processes. This necessitates exploratory mediation analysis, for which
structural equation modeling is the benchmark method. However, this method
cannot perform mediation analysis with more variables than observations. One
option is to run a series of univariate mediation models, which incorrectly
assumes independence of the mediators. Another option is regularization, but
the available implementations may lead to high false positive rates. In this
paper, we develop a hybrid approach that uses components of both filtering and
regularization methods: the 'Coordinate-wise Mediation Filter'. It performs filtering
conditional on the other selected mediators. We show through simulation that it
improves performance over existing methods. Finally, we provide an empirical
example, showing how our method may be used for epigenetic research.
Comment: R code and package are available online as supplementary material at
https://github.com/vankesteren/cmfilter and
https://github.com/vankesteren/ema_simulation
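The linked repositories provide the authors' R implementation. Purely as an illustration of the coordinate-wise idea described above, and not the authors' actual algorithm, a minimal Python sketch might look like this (the threshold `tau` and the plain least-squares path estimates are simplifying assumptions):

```python
import numpy as np

def cmf_select(x, M, y, tau=0.1, max_iter=20):
    """Toy coordinate-wise mediation filter (illustrative only).

    Mediator j is kept when the product of its x -> m_j path (a_j)
    and its m_j -> y path (b_j), the latter estimated conditional on
    the other currently selected mediators, exceeds the threshold tau.
    """
    n, p = M.shape
    selected = set()
    for _ in range(max_iter):
        previous = set(selected)
        for j in range(p):
            others = sorted(selected - {j})
            # a-path: regress m_j on x (with intercept)
            Xa = np.column_stack([np.ones(n), x])
            a_j = np.linalg.lstsq(Xa, M[:, j], rcond=None)[0][1]
            # b-path: regress y on x, m_j, and the other selected mediators
            Xb = np.column_stack([np.ones(n), x, M[:, j], M[:, others]])
            b_j = np.linalg.lstsq(Xb, y, rcond=None)[0][2]
            if abs(a_j * b_j) > tau:
                selected.add(j)
            else:
                selected.discard(j)
        if selected == previous:  # no change in a full sweep: converged
            break
    return sorted(selected)
```

Because each inclusion decision conditions on the current selection, correlated mediators are filtered jointly rather than one univariate model at a time.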
The Expected Parameter Change (EPC) for local dependence assessment in binary data latent class models
Binary data latent class models crucially assume local independence,
violations of which can seriously bias the results. We present two tools for
monitoring local dependence in binary data latent class models: the "Expected
Parameter Change" (EPC) and a generalized EPC, estimating the substantive size
and direction of possible local dependencies. The asymptotic and finite sample
behavior of the measures is studied, and two applications, to the U.S. Census
estimation of Hispanic ethnicity and to medical experts' ratings of X-rays,
demonstrate their value in arriving at a model that balances realism and
parsimony.
Comment: R code implementing our proposal and including both example datasets is available online as supplementary material.
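The EPC is, at heart, a one-step Newton (score) update for a parameter that the restricted model fixes: EPC ≈ score / information, evaluated at the restricted estimates. As a hedged illustration on the simplest possible case, the sketch below computes a naive EPC for a log-odds-ratio dependence parameter fixed at zero in a 2×2 table fit under independence (unlike the paper's generalized EPC, it ignores re-estimation of the other parameters):

```python
import numpy as np

def loglik(lmbda, alpha, beta, n):
    # Log-linear model for a 2x2 table:
    # log pi_ij = alpha*i + beta*j + lmbda*i*j - log Z
    logits = np.array([[0.0, beta], [alpha, alpha + beta + lmbda]])
    probs = np.exp(logits) / np.exp(logits).sum()
    return float((n * np.log(probs)).sum())

def epc_for_lambda(n, h=1e-4):
    # Restricted (independence) fit: alpha, beta from the margins, lmbda fixed at 0
    tot = n.sum()
    p1, p2 = n[1].sum() / tot, n[:, 1].sum() / tot
    alpha, beta = np.log(p1 / (1 - p1)), np.log(p2 / (1 - p2))
    # Score and observed information for lmbda at 0, via central differences
    score = (loglik(h, alpha, beta, n) - loglik(-h, alpha, beta, n)) / (2 * h)
    info = -(loglik(h, alpha, beta, n) - 2 * loglik(0.0, alpha, beta, n)
             + loglik(-h, alpha, beta, n)) / h**2
    return score / info  # one Newton step: the expected parameter change
```

For a strongly dependent table the EPC comes out clearly nonzero, flagging the fixed-at-zero local independence restriction as substantively violated, in size and direction, before any model re-fitting.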
Evaluating the Construct Validity of Text Embeddings with Application to Survey Questions
Text embedding models from Natural Language Processing can map text data
(e.g. words, sentences, documents) to supposedly meaningful numerical
representations (a.k.a. text embeddings). While such models are increasingly
applied in social science research, one important issue is often not addressed:
the extent to which these embeddings are valid representations of constructs
relevant for social science research. We therefore propose the use of the
classic construct validity framework to evaluate the validity of text
embeddings. We show how this framework can be adapted to the opaque and
high-dimensional nature of text embeddings, with application to survey
questions. We include several popular text embedding methods (e.g. fastText,
GloVe, BERT, Sentence-BERT, Universal Sentence Encoder) in our construct
validity analyses. We find evidence of convergent and discriminant validity in
some cases. We also show that embeddings can be used to predict respondents'
answers to completely new survey questions. Furthermore, BERT-based embedding
techniques and the Universal Sentence Encoder provide more valid
representations of survey questions than do others. Our results thus highlight
the necessity to examine the construct validity of text embeddings before
deploying them in social science research.
Comment: Under review.
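One way to make the convergent/discriminant distinction concrete for embeddings is to compare average within-construct and between-construct similarities of the embedded questions, in the spirit of a multitrait correlation matrix. The sketch below assumes hypothetical precomputed embedding vectors and is agnostic about which embedding model produced them:

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two embedding vectors
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def validity_summary(embeddings, construct):
    """Mean within- vs. between-construct cosine similarity.

    embeddings: dict mapping question -> vector
    construct:  dict mapping question -> construct label
    High within-construct similarity suggests convergent validity;
    low between-construct similarity suggests discriminant validity.
    """
    questions = list(embeddings)
    within, between = [], []
    for i, a in enumerate(questions):
        for b in questions[i + 1:]:
            sim = cosine(embeddings[a], embeddings[b])
            (within if construct[a] == construct[b] else between).append(sim)
    return float(np.mean(within)), float(np.mean(between))
```

A large gap between the two averages points in the direction the paper tests; the paper's full construct validity analyses are of course richer than this pairwise summary.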
Differential privacy and social science: An urgent puzzle
Accessing and combining large amounts of data is important for quantitative social scientists, but increasing amounts of data also increase privacy risks. To mitigate these risks, important players in official statistics, academia, and business see a solution in the concept of differential privacy. In this opinion piece, we ask how differential privacy can benefit from social-scientific insights, and, conversely, how differential privacy is likely to transform social science. First, we put differential privacy in the larger context of social science. We argue that the discussion on implementing differential privacy has been clouded by incompatible subjective beliefs about risk, each perspective having merit for different data types. Moreover, we point out existing social-scientific insights that suggest limitations to the premises of differential privacy as a data protection approach. Second, we examine the likely consequences for social science if differential privacy is widely implemented. Clearly, workflows must change, and common social science data collection will become more costly. However, in addition to data protection, differential privacy may bring other positive side effects. These could solve some issues social scientists currently struggle with, such as p-hacking, data peeking, or overfitting; after all, differential privacy is basically a robust method to analyze data. We conclude that, in the discussion around privacy risks and data protection, a large number of disciplines must band together to solve this urgent puzzle of our time, including social science, computer science, ethics, law, and statistics, as well as public and private policy.
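For readers unfamiliar with the mechanics, the canonical Laplace mechanism shows the basic accuracy-for-privacy trade the piece discusses: noise calibrated to a query's sensitivity and a privacy budget epsilon. This is the standard textbook mechanism, not any specific implementation debated in official statistics:

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Release true_value with epsilon-differential privacy.

    Adds Laplace noise with scale = sensitivity / epsilon; for a
    counting query (e.g. "how many respondents answered yes?") the
    sensitivity is 1, since one person changes the count by at most 1.
    """
    scale = sensitivity / epsilon
    return true_value + rng.laplace(0.0, scale)
```

Each single release is noisy, but the mechanism is unbiased, which is one reason differentially private analyses can still support valid, if less efficient, inference.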
Estimating stochastic survey response errors using the multitrait‐multierror model
Surveys are well known to contain response errors of different types, including acquiescence, social desirability, common method variance and random error simultaneously. Nevertheless, a single error source at a time is all that most methods developed to estimate and correct for such errors consider in practice. Consequently, estimation of response errors is inefficient, their relative importance is unknown and the optimal question format may not be discoverable. To remedy this situation, we demonstrate how multiple types of errors can be estimated concurrently with the recently introduced ‘multitrait‐multierror’ (MTME) approach. MTME combines the theory of design of experiments with latent variable modelling to estimate response error variances of different error types simultaneously. This allows researchers to evaluate which errors are most impactful, and aids in the discovery of optimal question formats. We apply this approach using representative data from the United Kingdom to six survey items measuring attitudes towards immigrants that are commonly used across public opinion studies.
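To see why several error types must be modelled simultaneously, consider a deliberately simplified caricature of the design-of-experiments idea. This is a toy method-of-moments decomposition under a hypothetical design (one replicated agree-worded item plus one reversed-worded item), not the MTME latent variable model itself:

```python
import numpy as np

def decompose(y1, y1r, y2):
    """Toy variance decomposition from a reversed-wording design.

    y1, y1r: replicate measurements of an agree-worded item,
             modelled as trait + acquiescence + random error
    y2:      a reversed-worded measurement: -trait + acquiescence + error
    """
    c_rep = np.cov(y1, y1r)[0, 1]   # = V_trait + V_acquiescence
    c_rev = np.cov(y1, y2)[0, 1]    # = -V_trait + V_acquiescence
    v_trait = (c_rep - c_rev) / 2
    v_acq = (c_rep + c_rev) / 2
    v_err = np.var(y1, ddof=1) - v_trait - v_acq
    return v_trait, v_acq, v_err
```

With a single wording, trait and acquiescence variance are confounded; it is the experimental variation in question format that makes the error variances separable, which is the core idea the MTME approach generalizes.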