198,491 research outputs found

    A Note on the Control Function Approach with an Instrumental Variable and a Binary Outcome

    Unobserved confounding is a well-known threat to causal inference in non-experimental studies. The instrumental variable design can, under certain conditions, be used to recover an unbiased estimator of a treatment effect even if unobserved confounding cannot be ruled out with certainty. For continuous outcomes, two-stage least squares is the most common instrumental variable estimator used in epidemiologic applications. For a rare binary outcome, an analogous linear-logistic two-stage procedure can be used. Alternatively, a control function approach is sometimes used, which entails entering the residual from the first-stage linear model as a covariate in a second-stage logistic regression of the outcome on the treatment. Both strategies for binary response have previously been formally justified only for continuous exposure, which has impeded widespread use of the approach outside of this setting. In this note, we consider the important setting of binary exposure in the context of a binary outcome. We provide an alternative motivation for the control function approach which is appropriate for binary exposure, thus establishing simple conditions under which the approach may be used for instrumental variable estimation when the outcome is rare. In the proposed approach, the first-stage regression involves a logistic model of the exposure conditional on the instrumental variable, and the second-stage regression is a logistic regression of the outcome on the exposure adjusting for the first-stage residual. In the event of a non-rare outcome, we recommend replacing the second-stage logistic model with a risk ratio regression.
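The two-stage procedure described in this abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the simulated data, the variable names (`z` for the instrument, `a` for the exposure, `y` for the outcome, `u` for the unmeasured confounder), and the hand-rolled Newton-Raphson fitter `fit_logistic` are all hypothetical, and the residual is assumed to be the raw difference between the observed exposure and its fitted probability.

```python
import numpy as np

def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-t))

def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        grad = X.T @ (y - p)                          # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # observed information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def control_function_logit(z, a, y):
    """Stage 1: logistic model of exposure on instrument.
    Stage 2: logistic model of outcome on exposure plus the stage-1 residual."""
    ones = np.ones(len(z))
    X1 = np.column_stack([ones, z])
    gamma = fit_logistic(X1, a)
    resid = a - expit(X1 @ gamma)                     # assumed raw residual
    X2 = np.column_stack([ones, a, resid])
    beta = fit_logistic(X2, y)
    return gamma, beta, resid

# Hypothetical simulated data: binary instrument, binary exposure, rare binary outcome.
rng = np.random.default_rng(0)
n = 20000
z = rng.binomial(1, 0.5, size=n).astype(float)
u = rng.normal(size=n)                                # unmeasured confounder
a = rng.binomial(1, expit(-0.5 + 1.5 * z + u)).astype(float)
y = rng.binomial(1, expit(-4.0 + 0.7 * a + u)).astype(float)

gamma, beta, resid = control_function_logit(z, a, y)
print("first-stage coefficients (intercept, z):", gamma)
print("second-stage coefficients (intercept, a, residual):", beta)
```

The coefficient on `a` in the second stage is the quantity of interest; whether it has a causal interpretation depends on the conditions set out in the paper, which this sketch does not check.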

    Causal Inference with Two-Stage Logistic Regression - Accuracy, Precision, and Application

    Two-stage predictor substitution (2SPS) and two-stage residual inclusion (2SRI) are two approaches to instrumental variable (IV) analysis. While 2SPS and 2SRI with linear models are well-studied methods of causal inference, the properties of 2SPS and 2SRI for logistic binary outcomes have not been thoroughly studied. We study the bias and variance properties of 2SPS and 2SRI for a logistic outcome model so that we can apply these IV approaches to the causal inference of binary outcomes. We also propose and implement an extension of the generalized structural mean model originally developed for a randomized trial. We first present closed-form expressions of the asymptotic bias for the causal odds ratio from both the 2SPS and 2SRI approaches. Our closed-form bias results show that the 2SPS logistic regression generates asymptotically biased estimates of this causal odds ratio when there is no unmeasured confounding and that this bias increases with increasing unmeasured confounding. The 2SRI logistic regression is asymptotically unbiased when there is no unmeasured confounding, but when there is unmeasured confounding, there is bias and it increases with increasing unmeasured confounding. In the second part, we propose a sandwich variance estimator for logistic regression under both the 2SPS and 2SRI approaches; the variance estimator is adjusted for the fact that the estimates from the first-stage regression are included as covariates in the second-stage regression. The simulation results show that the adjusted estimates are consistent with the observed variance while the naive estimates without the adjustments are biased. This study also shows that the 2SRI method has a larger variance than the 2SPS method. Lastly, we compare the 2SPS and 2SRI logistic regression with the generalized structural mean model (GSMM). Our simulation results show that the GSMM is an unbiased estimator of the complier-average causal effect (CACE) and has the least variance among the three approaches. We apply these three methods to the analysis of the GPRD database on the antidiabetic effect of bezafibrate.
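For reference, the two second-stage specifications compared in this abstract can be written as below. The notation is an assumption introduced here for illustration ($Z$ for the instrument, $A$ for the exposure, $Y$ for the binary outcome, and a generic first-stage link $g$); it is not taken from the abstract.

```latex
\begin{align*}
\text{First stage:}\quad & \hat{A} = g^{-1}(\hat{\alpha}_0 + \hat{\alpha}_1 Z), \qquad \hat{R} = A - \hat{A} \\
\text{2SPS:}\quad & \operatorname{logit} P(Y = 1 \mid \hat{A}) = \beta_0 + \beta_1 \hat{A} \\
\text{2SRI:}\quad & \operatorname{logit} P(Y = 1 \mid A, \hat{R}) = \beta_0 + \beta_1 A + \beta_2 \hat{R}
\end{align*}
```

In both cases $\beta_1$ is the coefficient read as the log odds ratio for the exposure. 2SPS replaces the exposure with its first-stage prediction, while 2SRI keeps the observed exposure and adds the first-stage residual as a covariate.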

    Novel personalized pathway-based metabolomics models reveal key metabolic pathways for breast cancer diagnosis

    Comparison of logistic regression, SVM and random forest performance in the plasma training data set. Table S2. Pathway significance and relative log fold changes in our metabolomics data and TCGA breast cancer RNA-Seq data. Table S3. Detected metabolites and their differential test results among the two models. a All-stage diagnosis model. b Early-stage diagnosis model. Table S4. Single-variate logistic analysis of metabolites or pathways selected as features in the metabolite-based or pathway-based early-stage diagnosis model. Table S5. Comparison of pathway features in the full-size (101 input pathways) and half-size (51 input pathways) pathway-based early-stage diagnosis models. (DOCX 34 kb)

    Using brand knowledge to predict beer brand preference and loyalty for samples of new frequent users in Perth and Beijing

    This study tests a model of Brand Knowledge and Brand Equity of brands of beer on new and frequent users in two populations that differ in their stage of the beer product life cycle and culture. Using Multiple Logistic Regression (MLR) and Binomial Logistic Regression (BLR), models based on the respondents' Brand Knowledge are able to correctly identify Chinese respondents’ preferred brand of beer 56% of the time, while correctly identifying 77% of respondents in an Australian sample when three top brands are tested. The model could further identify 67% of those that stay or switch in both the Australian and the Chinese samples.

    A two-stage hybrid model by using artificial neural networks as feature construction algorithms

    We propose a two-stage hybrid approach with neural networks as feature construction algorithms for bankcard response classification. The hybrid model uses a very simple neural network structure as the feature construction tool in the first stage; the newly created features are then used as additional input variables in logistic regression in the second stage. The model is compared with the traditional one-stage model in credit customer response classification. It is observed that the proposed two-stage model outperforms the one-stage model in terms of accuracy, the area under the ROC curve, and the KS statistic. By creating new features with the neural network technique, the underlying nonlinear relationships between variables are identified. Furthermore, by using a very simple neural network structure, the model could overcome the drawbacks of neural networks in terms of their long training time, complex topology, and limited interpretability.
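The two-stage idea in this abstract can be sketched as follows. This is an illustrative toy, not the paper's model: the network size, the tanh activation, the gradient-descent training loops, the synthetic data, and all function names are assumptions made here, and accuracy is measured in-sample only.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def train_tiny_net(X, y, n_hidden=3, lr=0.5, n_epochs=2000, seed=0):
    """Stage 1: one hidden tanh layer trained by full-batch gradient descent on log-loss.
    Only the hidden-layer weights are returned; they define the constructed features."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(scale=0.5, size=n_hidden)
    b2 = 0.0
    n = len(y)
    for _ in range(n_epochs):
        H = np.tanh(X @ W1 + b1)
        p = sigmoid(H @ w2 + b2)
        d_out = (p - y) / n                         # gradient w.r.t. the output logit
        d_hid = np.outer(d_out, w2) * (1.0 - H**2)  # backpropagated through tanh
        w2 -= lr * (H.T @ d_out)
        b2 -= lr * d_out.sum()
        W1 -= lr * (X.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    return W1, b1

def fit_logistic_gd(X, y, lr=0.5, n_epochs=3000):
    """Stage 2: logistic regression by gradient descent; an intercept is added here."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_epochs):
        beta -= lr * Xb.T @ (sigmoid(Xb @ beta) - y) / len(y)
    return beta

def accuracy(X, y, beta):
    Xb = np.column_stack([np.ones(len(X)), X])
    return float(np.mean((sigmoid(Xb @ beta) >= 0.5) == (y == 1)))

# Hypothetical data with a nonlinear signal that a linear logit cannot capture directly.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)

# One-stage baseline: logistic regression on the raw inputs.
beta_one = fit_logistic_gd(X, y)
acc_one = accuracy(X, y, beta_one)

# Two-stage model: hidden activations appended as additional input variables.
W1, b1 = train_tiny_net(X, y)
X_aug = np.column_stack([X, np.tanh(X @ W1 + b1)])
beta_two = fit_logistic_gd(X_aug, y)
acc_two = accuracy(X_aug, y, beta_two)

print("one-stage in-sample accuracy:", acc_one)
print("two-stage in-sample accuracy:", acc_two)
```

A real comparison would use held-out data and the additional metrics named in the abstract (area under the ROC curve and the KS statistic).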

    Two-stage Regression for Treatment Effect Estimation.

    Two-stage regression is a common tool for instrumental variable analysis in applied research. This dissertation introduces additional uses of two-stage models that enable researchers to make inferences that are unavailable in one-stage models. The first part of the dissertation explores recent methodology that examines whether the predicted response to control affects the magnitude of the treatment effect. The method is a two-stage variation of the Peters-Belson method, studying the interaction between the treatment effect and a prognostic score, i.e. an outcome regression that was fitted to control observations and then extrapolated into the treatment group. The dissertation expands this method into a two-stage regression method termed “Peters-Belson with Prognostic Heterogeneity” (PBPH), which addresses propagation of error from the first stage to the second. A non-standard construction based on stacked estimating equations tests hypotheses about the treatment-prognostic score interaction, deriving confidence intervals by inverting families of tests. These tests combine characteristics of Wald and generalized score tests, improving the small-sample coverage of a similar Wald method and the power of comparable generalized score tests. Following this, the dissertation enhances the PBPH methodology, addressing complications that the applied researcher may encounter. The method is adapted to accommodate generalized linear models (as opposed to linear models only) at the first stage. Further adaptations accommodate designs assigning study subjects to treatment conditions by cluster, such as cluster randomized trials. A simulation study clarifies sample size requirements’ dependency on the complexity of the first stage model, culminating in a theoretically motivated rule of thumb for the maximum number of first stage regressors as a function of n. The final chapter examines treatment effect estimation with a binary response. 
Simulations reveal that using two-stage regression models sheds light on whether a treatment effect is linear on the logit scale (logistic regression) or linear on the probability scale (linear regression). Often, matching is performed on observational data to aid in treatment effect estimation. Conditional logistic regression is a typical approach to matched data with binary response; however, we show that two-stage regression in this setting offers benefits not available to a one-stage conditional logistic model. PhD, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/135742/1/jerrick_1.pd
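The prognostic-score idea in this abstract (fit an outcome regression on control observations, extrapolate it into the treatment group, then examine how the treatment effect varies with the predicted response) can be sketched as below. This is a simplified reading under stated assumptions, not the dissertation's PBPH procedure: the second-stage form (regressing the treated units' observed-minus-predicted outcome on the prognostic score), the simulated data, and all names are hypothetical. The sketch returns point estimates only and does not address the propagation of first-stage error that PBPH is designed to handle.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients; X must already contain an intercept column."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def peters_belson_two_stage(X, treat, y):
    Xd = np.column_stack([np.ones(len(y)), X])
    ctrl = treat == 0
    trt = treat == 1
    # Stage 1: outcome regression fitted to control observations only.
    beta_c = ols(Xd[ctrl], y[ctrl])
    # Prognostic score: predicted response to control, extrapolated to every unit.
    score = Xd @ beta_c
    # Stage 2 (assumed form): among treated units, regress observed minus predicted
    # outcome on the prognostic score. tau is the intercept, eta the interaction slope.
    D = np.column_stack([np.ones(int(trt.sum())), score[trt]])
    tau, eta = ols(D, y[trt] - score[trt])
    return beta_c, tau, eta

# Hypothetical simulation in which the treatment effect grows with the control response.
rng = np.random.default_rng(1)
n = 4000
X = rng.normal(size=(n, 2))
treat = rng.binomial(1, 0.5, size=n)
y0 = 1.0 + X @ np.array([1.0, -0.5])      # expected response under control
effect = 0.5 + 0.3 * y0                   # true effect depends on the control response
y = y0 + treat * effect + rng.normal(size=n)

beta_c, tau, eta = peters_belson_two_stage(X, treat, y)
print("stage-1 coefficients:", beta_c)
print("tau (baseline effect):", tau, " eta (interaction with score):", eta)
```

Naive standard errors from the second-stage fit would ignore the uncertainty in `score`, which is the problem the stacked-estimating-equations construction in the dissertation is meant to solve.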

    Factors that Influence Farmers’ Participation in Soil and Water Conservations in Farta Woreda, South Gondar Zone of Amhara National Regional State, Ethiopia

    Soil and water conservation activities have an impact on food security. However, the extent of participation in soil and water conservation depends on the extent of farmers’ participation. The objective of this study was to examine factors that influence farmers’ participation in soil and water conservation activities. Primary and secondary data were used for this research, and primary data were collected using a pre-tested structured questionnaire. A two-stage sampling procedure was employed to select sample households. In the first stage, 6 sample kebeles were selected out of 41 kebeles, and in the second stage, 381 sample households were selected from 8230 households through simple random sampling. Descriptive statistics and a binary logistic regression model were employed to identify factors that affect farmers’ participation in soil and water conservation. The results of the binary logit model revealed that variables such as family size, education level, livestock holding and size of cultivated land were significant factors that affect farmers’ participation in soil and water conservation activities. Keywords: Participation, Soil and Water Conservation, Binary Logistic Regression DOI: 10.7176/JNSR/12-15-01 Publication date: August 31st 202
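The two-stage sampling design described in this abstract can be sketched as below, using the counts it reports. The abstract does not say how the 8230 households are distributed across kebeles, so the sampling frame here is a hypothetical one: household IDs are assigned round-robin to the selected kebeles purely for illustration, and the function name and seed are likewise assumptions.

```python
import random

N_KEBELES = 41              # kebeles in the study area (from the abstract)
N_SAMPLE_KEBELES = 6        # kebeles selected in stage 1
N_HOUSEHOLDS = 8230         # households in the stage-2 frame
N_SAMPLE_HOUSEHOLDS = 381   # households selected in stage 2

def two_stage_sample(seed=0):
    rng = random.Random(seed)
    # Stage 1: draw kebeles without replacement.
    kebeles = list(range(1, N_KEBELES + 1))
    chosen = sorted(rng.sample(kebeles, N_SAMPLE_KEBELES))
    # Hypothetical frame: (household_id, kebele) pairs, assigned round-robin.
    frame = [(hh_id, chosen[hh_id % N_SAMPLE_KEBELES]) for hh_id in range(N_HOUSEHOLDS)]
    # Stage 2: simple random sample of households from the pooled frame.
    households = rng.sample(frame, N_SAMPLE_HOUSEHOLDS)
    return chosen, households

chosen_kebeles, sampled_households = two_stage_sample()
print("selected kebeles:", chosen_kebeles)
print("number of sampled households:", len(sampled_households))
```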

    Factors that Influence Farmers’ Participation in Soil and Water Conservations in Farta Woreda, South Gondar Zone of Amhara National Regional State, Ethiopia

    Soil and water conservation activities have an impact on food security. However, the extent of participation in soil and water conservation depends on the extent of farmers’ participation. The objective of this study was to examine factors that influence farmers’ participation in soil and water conservation activities. Primary and secondary data were used for this research, and primary data were collected using a pre-tested structured questionnaire. A two-stage sampling procedure was employed to select sample households. In the first stage, 6 sample kebeles were selected out of 41 kebeles, and in the second stage, 381 sample households were selected from 8230 households through simple random sampling. Descriptive statistics and a binary logistic regression model were employed to identify factors that affect farmers’ participation in soil and water conservation. The results of the binary logit model revealed that variables such as family size, education level, livestock holding and size of cultivated land were significant factors that affect farmers’ participation in soil and water conservation activities. Keywords: Participation, Soil and Water Conservation, Binary Logistic Regression DOI: 10.7176/JBAH/13-2-05 Publication date: January 31st 202