    Deductive semiparametric estimation in Double-Sampling Designs with application to PEPFAR

    Non-ignorable dropout is common in studies with long follow-up time, and it can bias study results unless handled carefully. A double-sampling design allocates additional resources to pursue a subsample of the dropouts and find out their outcomes, which can address potential biases due to non-ignorable dropout. It is desirable to construct semiparametric estimators for the double-sampling design because of their robustness properties. However, obtaining such semiparametric estimators remains a challenge because it requires the analytic form of the efficient influence function (EIF), the derivation of which can be ad hoc and difficult for the double-sampling design. Recent work has shown how the derivation of the EIF can be made deductive and computerizable using the functional derivative representation of the EIF in nonparametric models. This approach, however, requires deriving the mixture of a continuous distribution and a point mass, which can itself be challenging for complicated problems such as the double-sampling design. We propose semiparametric estimators for the survival probability in double-sampling designs by generalizing the deductive and computerizable estimation approach. In particular, we propose to build the semiparametric estimators based on a discretized support structure, which approximates the possibly continuous observed data distribution and circumvents the derivation of the mixture distribution. Our approach is deductive in the sense that it is expected to produce semiparametric locally efficient estimators within finite steps without knowledge of the EIF. We apply the proposed estimators to estimating the mortality rate in a double-sampling design component of the President's Emergency Plan for AIDS Relief (PEPFAR) program. We evaluate the impact of double-sampling selection criteria on the mortality rate estimates.

    Discussions

    Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/111979/1/j.1751-5823.2011.00145.x.pd

    Semiparametric theory and empirical processes in causal inference

    In this paper we review important aspects of semiparametric theory and empirical processes that arise in causal inference problems. We begin with a brief introduction to the general problem of causal inference, and go on to discuss estimation and inference for causal effects under semiparametric models, which allow parts of the data-generating process to be unrestricted if they are not of particular interest (i.e., nuisance functions). These models are very useful in causal problems because the outcome process is often complex and difficult to model, and there may only be information available about the treatment process (at best). Semiparametric theory gives a framework for benchmarking efficiency and constructing estimators in such settings. In the second part of the paper we discuss empirical process theory, which provides powerful tools for understanding the asymptotic behavior of semiparametric estimators that depend on flexible nonparametric estimators of nuisance functions. These tools are crucial for incorporating machine learning and other modern methods into causal inference analyses. We conclude by examining related extensions and future directions for work in semiparametric causal inference.

    Response-dependent two-phase sampling designs

    This is the peer reviewed version of the following article: McIsaac, M. A. and Cook, R. J. (2014), Response-dependent two-phase sampling designs for biomarker studies. Can J Statistics, 42: 268–284. doi: 10.1002/cjs.11207, which has been published in final form at http://onlinelibrary.wiley.com/doi/10.1002/cjs.11207/references. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for self-archiving.

    Two-phase sampling designs are developed and investigated for use in the context of a rheumatology study where interest lies in the association between a biomarker with an expensive assay and disease progression. We derive optimal phase-II stratum-specific sampling probabilities for analyses from parametric maximum likelihood (ML), mean score (MS), inverse probability weighted (IPW), and augmented IPW (AIPW) estimating equations. The easy-to-implement optimally efficient design for the MS estimator is found to be asymptotically optimal for the IPW and AIPW estimators we consider, and is shown to result in efficiency gains over balanced and simple random sampling even when analyses are likelihood-based. We further demonstrate the robustness of this optimal design and show that it results in very efficient estimation even when the model or parameters used in its derivation are misspecified.

    Natural Sciences and Engineering Research Council of Canada (RGPIN 155849); Canadian Institutes for Health Research (FRN 13887
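As a rough illustration of the inverse probability weighting idea mentioned in the abstract above, the sketch below computes a normalised (Hajek-type) IPW mean of a variable that is observed only for units selected into phase II. It is not the estimator or the design from the article: the function name `ipw_mean`, the record layout, and the toy sampling probabilities are all assumptions made for this example, and the sampling probabilities are taken as known rather than optimised.

```python
def ipw_mean(records, sampling_prob):
    """Normalised IPW mean from a two-phase sample (illustrative only).

    records: iterable of (stratum, selected, value) tuples; `value` is
             used only when `selected` is True (hypothetical layout).
    sampling_prob: dict mapping stratum -> known phase-II selection
             probability, assumed strictly positive.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for stratum, selected, value in records:
        if not selected:
            continue  # the expensive measurement is missing by design
        weight = 1.0 / sampling_prob[stratum]  # inverse probability weight
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("no phase-II units were selected")
    return weighted_sum / weight_total


# Toy data: stratum "A" is subsampled at 50%, stratum "B" fully sampled.
toy_records = [
    ("A", True, 1.0),
    ("A", True, 3.0),
    ("A", False, None),
    ("A", False, None),
    ("B", True, 10.0),
    ("B", True, 10.0),
]
toy_probs = {"A": 0.5, "B": 1.0}
estimate = ipw_mean(toy_records, toy_probs)
```

Each selected unit in stratum "A" stands in for two phase-I units, so its value is counted twice as heavily as a unit from the fully sampled stratum "B".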

    Augmented two-step estimating equations with nuisance functionals and complex survey data

    Statistical inference in the presence of nuisance functionals with complex survey data is an important topic in social and economic studies. The Gini index, Lorenz curves and quantile shares are among the commonly encountered examples. The nuisance functionals are usually handled by a plug-in nonparametric estimator and the main inferential procedure can be carried out through a two-step generalized empirical likelihood method. Unfortunately, the resulting inference is not efficient and the nonparametric version of Wilks' theorem breaks down even under simple random sampling. We propose an augmented estimating equations method with nuisance functionals and complex surveys. The second-step augmented estimating functions obey the Neyman orthogonality condition and automatically handle the impact of the first-step plug-in estimator, and the resulting estimator of the main parameters of interest is invariant to the first-step method. More importantly, the generalized empirical likelihood based Wilks' theorem holds for the main parameters of interest under the design-based framework for commonly used survey designs, and the maximum generalized empirical likelihood estimators achieve the semiparametric efficiency bound. Performances of the proposed methods are demonstrated through simulation studies and an application using the dataset from the New York City Social Indicators Survey.

    Comment: 43 pages