
    Informed strategies for multivariate missing data

    Joint modelling (JM) and fully conditional specification (FCS) are two widely used strategies for imputing multivariate missing data. JM involves specifying a multivariate distribution for the missing data and drawing imputations from their conditional distributions. The FCS approach specifies the distribution for each partially observed variable conditional on all other variables. The main advantage of FCS over JM is that FCS allows for tremendous flexibility in multivariate model design. However, there are often extra structures in the missing data that FCS cannot model properly in practice. Moreover, it is challenging to preserve the relations among multiple variables when performing the imputation on a variable-by-variable basis. This thesis aims to develop hybrid imputation, which provides a strategy to specify hybrids of JM and FCS. To achieve this goal, I propose different solutions to missing data problems when applying FCS is not optimal. In chapter 2, I first discuss some general methods to impute squares. I improve the polynomial combination method and compare it with the substantive model compatible fully conditional specification method. Finally, I summarise the properties of both approaches. In chapter 3, I develop multivariate predictive mean matching, which allows simultaneous imputation of multiple missing variables. I combine the methodology of univariate predictive mean matching and canonical regression analysis. The advantage of this imputation method is the preservation of relations among a set of missing variables. Finally, I show the potential scenarios where multivariate predictive mean matching could be used and discuss the limitations. In chapter 4, I develop the hybrid imputation method to estimate individual treatment effects. The idea is that by imputing unobserved outcomes, we can calculate the differences between potential outcomes under different treatment conditions.
However, there is a problem: the data contain no information about the correlation between potential outcomes. The proposed hybrid imputation method specifies the partial correlation and performs a sensitivity analysis to overcome this problem. Finally, I demonstrate the validity of the proposed hybrid imputation method and show how to apply it in practice. In chapter 5, I investigate the compatibility of FCS when the priors for the conditional models are informative. Many authors have illustrated the compatibility property of FCS when the priors for the conditional models are non-informative. However, the compatibility property in the case of informative priors has not received much attention. I demonstrate that FCS under the normal linear model with an informative inverse-gamma prior is compatible with a joint distribution and provide the corresponding normal inverse-Wishart prior distribution for the joint distribution. In chapter 6, I develop a novel strategy to diagnose multiple imputation models based on posterior predictive checking. The general idea is that if the imputation model is congenial to the substantive model, the expected value of the observed data lies in the centre of the corresponding posterior predictive distributions. By applying the proposed diagnostic method, the researcher can compare the 'over-imputed' data with the observed data and evaluate the fit of the imputation model.
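The variable-by-variable cycling that distinguishes FCS from JM can be sketched in a few lines. This is a minimal illustration with simulated bivariate data and simple stochastic linear regression as the conditional model; all names and the convergence settings are illustrative, not the thesis's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate bivariate data and impose missingness (MCAR) on each column.
n = 200
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(scale=0.8, size=n)
data = np.column_stack([x, y])
mask = rng.random(data.shape) < 0.2            # True = missing
obs = np.where(mask, np.nan, data)

# FCS: initialise missing entries with column means, then cycle through the
# variables, re-imputing each one conditional on the other.
imp = np.where(mask, np.nanmean(obs, axis=0), obs)
for _ in range(10):                            # a fixed number of cycles
    for j in (0, 1):
        k = 1 - j
        beta = np.polyfit(imp[:, k], imp[:, j], 1)   # conditional linear model
        pred = np.polyval(beta, imp[:, k])
        resid_sd = np.std(imp[:, j] - pred)
        # Draw imputations from the conditional (stochastic regression draw).
        imp[mask[:, j], j] = pred[mask[:, j]] + rng.normal(
            scale=resid_sd, size=mask[:, j].sum())

print(np.corrcoef(imp[:, 0], imp[:, 1])[0, 1])
```

A JM approach would instead draw both variables jointly from one multivariate model; the hybrid strategies in the thesis combine the two.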

    Scene Graph Generation with External Knowledge and Image Reconstruction

    Scene graph generation has received growing attention with the advancements in image understanding tasks such as object detection, attributes and relationship prediction, etc. However, existing datasets are biased in terms of object and relationship labels, or often come with noisy and missing annotations, which makes the development of a reliable scene graph prediction model very challenging. In this paper, we propose a novel scene graph generation algorithm with external knowledge and image reconstruction loss to overcome these dataset issues. In particular, we extract commonsense knowledge from the external knowledge base to refine object and phrase features for improving generalizability in scene graph generation. To address the bias of noisy object annotations, we introduce an auxiliary image reconstruction path to regularize the scene graph generation network. Extensive experiments show that our framework can generate better scene graphs, achieving the state-of-the-art performance on two benchmark datasets: Visual Relationship Detection and Visual Genome. Comment: 10 pages, 5 figures, Accepted in CVPR 201

    A note on imputing squares via polynomial combination approach

    The polynomial combination (PC) method, proposed by Vink and Van Buuren, is a hot-deck multiple imputation method for imputation models containing squared terms. The method yields unbiased regression estimates and preserves the quadratic relationships in the imputed data for both MCAR and MAR mechanisms. However, Vink and Van Buuren never studied the coverage rate of the PC method. This paper investigates the coverage of the nominal 95% confidence intervals for the polynomial combination method and improves the algorithm to avoid the perfect prediction issue. We also compare the original and the improved PC method to the substantive model compatible fully conditional specification method proposed by Bartlett et al. and elucidate the characteristics of the two imputation methods.
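The core PC idea can be sketched as follows: treat the combination z = b1*x + b2*x^2 as a single quantity, impute z by predictive mean matching, then back-solve the quadratic for x so that x and x^2 stay mutually consistent. This is a toy sketch under simplifying assumptions; in particular, the root here is chosen at random, whereas the actual PC method selects the root with an estimated probability, and the donor pool size is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: y depends on x and x^2; a third of x is set missing (MCAR).
n = 500
x_full = rng.normal(size=n)
y = 1.0 + 2.0 * x_full + 0.5 * x_full**2 + rng.normal(scale=0.5, size=n)
miss = rng.random(n) < 0.3
x = x_full.copy()
x[miss] = np.nan

# Fit y ~ 1 + x + x^2 on the complete cases.
cc = ~miss
A = np.column_stack([np.ones(cc.sum()), x[cc], x[cc]**2])
b = np.linalg.lstsq(A, y[cc], rcond=None)[0]        # [b0, b1, b2]

# The polynomial combination z = b1*x + b2*x^2, known for complete cases.
z_cc = b[1] * x[cc] + b[2] * x[cc]**2

# Impute z via predictive mean matching on y, then solve
# b2*x^2 + b1*x - z = 0 for x (so x^2 follows deterministically).
for i in np.where(miss)[0]:
    donors = np.argsort(np.abs(y[cc] - y[i]))[:5]   # 5 nearest donors on y
    z_i = z_cc[donors[rng.integers(5)]]
    disc = b[1]**2 + 4 * b[2] * z_i
    if disc < 0:                                    # no real root: use the vertex
        x[i] = -b[1] / (2 * b[2])
    else:                                           # random root (a simplification)
        x[i] = (-b[1] + rng.choice([-1.0, 1.0]) * np.sqrt(disc)) / (2 * b[2])

print(np.isnan(x).sum())
```

Because x^2 is reconstructed from the same root, the imputed values cannot violate the quadratic relation, which is the property the abstract highlights.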

    Graphical and numerical diagnostic tools to assess multiple imputation models by posterior predictive checking

    Missing data are often dealt with by multiple imputation. A crucial part of the multiple imputation process is selecting sensible models to generate plausible values for incomplete data. A method based on posterior predictive checking is proposed to diagnose imputation models. To assess the congeniality of imputation models, the proposed diagnostic method compares the observed data with their replicates generated under the corresponding posterior predictive distributions. If the imputation model is congenial with the substantive model, the observed data are expected to be located in the centre of the corresponding posterior predictive distributions. Simulation and application studies are designed to investigate the proposed diagnostic method for parametric and semi-parametric imputation approaches, continuous and discrete incomplete variables, and univariate and multivariate missingness patterns. The results show the validity of the proposed diagnostic method.
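The "observed data should sit in the centre of their posterior predictive distributions" idea can be illustrated with a posterior predictive p-value. This sketch uses a simple normal model and a bootstrap approximation to the posterior in place of the paper's actual machinery; the sample sizes and replicate count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)

# Observed data from a simple normal model.
y_obs = rng.normal(loc=5.0, scale=2.0, size=100)

# Draw replicated datasets from an (approximate) posterior predictive:
# bootstrap the data, refit mean/sd, and simulate new observations.
n_rep = 1000
reps = np.empty((n_rep, y_obs.size))
for r in range(n_rep):
    boot = rng.choice(y_obs, size=y_obs.size, replace=True)
    reps[r] = rng.normal(boot.mean(), boot.std(ddof=1), size=y_obs.size)

# Posterior predictive p-value per observation: the fraction of replicates
# exceeding the observed value. Under a well-fitting (congenial) model these
# spread around 0.5; values piled near 0 or 1 flag a misspecified model.
ppp = (reps > y_obs).mean(axis=0)
print(ppp.mean())
```

Graphical versions of the same check plot each observed value against its replicate distribution, which matches the diagnostic plots the abstract describes.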

    Kruppel-Like Factor 4-Dependent Staufen1-Mediated mRNA Decay Regulates Cortical Neurogenesis

    Kruppel-like factor 4 (Klf4) is a zinc-finger-containing protein that plays a critical role in diverse cellular physiology. While most of these functions are attributed to its role as a transcription factor, it is postulated that Klf4 may play a role other than transcriptional regulation. Here we demonstrate that Klf4 loss in neural progenitor cells (NPCs) leads to increased neurogenesis and reduced self-renewal in mice. In addition, Klf4 interacts with the RNA-binding protein Staufen1 (Stau1) and the RNA helicase Ddx5/17. They function together as a complex to maintain NPC self-renewal. We report that Klf4 promotes Stau1 recruitment to the 3′-untranslated region of neurogenesis-associated mRNAs, increasing Stau1-mediated mRNA decay (SMD) of these transcripts. Stau1 depletion abrogated SMD of target mRNAs and rescued neurogenesis defects in Klf4-overexpressing NPCs. Furthermore, Ddx5/17 knockdown significantly blocked Klf4-mediated mRNA degradation. Our results highlight a novel molecular mechanism underlying the stability of neurogenesis-associated mRNAs controlled by the Klf4/Ddx5/17/Stau1 axis during mammalian corticogenesis.

    How to relate potential outcomes: Estimating individual treatment effects under a given specified partial correlation

    In most medical research, the average treatment effect is used to evaluate a treatment’s performance. However, precision medicine requires knowledge of individual treatment effects: What is the difference between a unit’s measurement under treatment and control conditions? In most treatment effect studies, such answers are not possible because the outcomes under both experimental conditions are not jointly observed. This makes the problem of causal inference a missing data problem. We propose to solve this problem by 
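The approach of imputing the unobserved counterfactual under a user-specified correlation can be sketched with a bivariate-normal conditional draw. The simulated trial, the moment estimates, and the plain correlation used here are illustrative simplifications; the actual method specifies a partial correlation within a fuller model and treats it as a sensitivity parameter.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated trial: each unit reveals only one of its two potential outcomes.
n = 1000
treat = rng.random(n) < 0.5
y0 = rng.normal(10, 3, size=n)                  # control potential outcome
y1 = y0 + 2 + rng.normal(0, 1, size=n)          # treated potential outcome
y0_obs = np.where(treat, np.nan, y0)
y1_obs = np.where(treat, y1, np.nan)

# Marginal moments are estimable from each arm separately.
m0, s0 = np.nanmean(y0_obs), np.nanstd(y0_obs)
m1, s1 = np.nanmean(y1_obs), np.nanstd(y1_obs)

def impute_ite(rho):
    """Impute each missing potential outcome from the bivariate-normal
    conditional with a user-specified correlation rho (the sensitivity
    parameter the data cannot identify)."""
    y1_imp = y1_obs.copy()
    miss1 = np.isnan(y1_imp)
    y1_imp[miss1] = (m1 + rho * s1 / s0 * (y0_obs[miss1] - m0)
                     + rng.normal(0, s1 * np.sqrt(1 - rho**2), miss1.sum()))
    y0_imp = y0_obs.copy()
    miss0 = np.isnan(y0_imp)
    y0_imp[miss0] = (m0 + rho * s0 / s1 * (y1_obs[miss0] - m1)
                     + rng.normal(0, s0 * np.sqrt(1 - rho**2), miss0.sum()))
    return y1_imp - y0_imp                      # individual treatment effects

# Sensitivity analysis: the mean ITE is stable across rho,
# but the spread of the individual effects depends on it.
for rho in (0.0, 0.5, 0.9):
    ite = impute_ite(rho)
    print(rho, round(ite.mean(), 2), round(ite.std(), 2))
```

This is exactly why a sensitivity analysis over the correlation is needed: the average effect is identified, but the distribution of individual effects is not.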

    A blended distance to define "people-like-me"

    Curve matching is a prediction technique that relies on predictive mean matching, which matches donors that are most similar to a target based on the predictive distance. Even though this approach leads to high prediction accuracy, the predictive distance may make matches look unconvincing, as the profiles of the matched donors can substantially differ from the profile of the target. To counterbalance this, similarity between the curves of the donors and the target can be taken into account by combining the predictive distance with the Mahalanobis distance into a `blended distance' measure. The properties of this measure are evaluated in two simulation studies. Simulation study I evaluates the performance of the blended distance under different data-generating conditions. The results show that blending towards the Mahalanobis distance leads to worse performance in terms of bias, coverage, and predictive power. Simulation study II evaluates the blended metric in a setting where a single value is imputed. The results show that a property of blending is the bias-variance trade-off. Giving more weight to the Mahalanobis distance leads to less variance in the imputations, but less accuracy as well. The main conclusion is that the high prediction accuracy achieved with the predictive distance necessitates accepting variability in the profiles of the donors.
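The blending described above can be sketched as a weighted combination of the two distances. The simulated donors, the coefficients, and the unit-standard-deviation scaling used to put the two distances on a common scale are illustrative assumptions, not the metric from the studies.

```python
import numpy as np

rng = np.random.default_rng(4)

# Donors with profile covariates X and a predicted outcome yhat.
n, p = 500, 3
coef = np.array([1.0, 0.5, -0.5])
X = rng.normal(size=(n, p))
yhat = X @ coef + rng.normal(scale=0.3, size=n)

target_x = np.array([0.2, -0.1, 0.4])
target_yhat = target_x @ coef

# Predictive distance: gap between predicted outcomes.
d_pred = np.abs(yhat - target_yhat)

# Mahalanobis distance: profile similarity under the covariate covariance.
S_inv = np.linalg.inv(np.cov(X, rowvar=False))
diff = X - target_x
d_mah = np.sqrt(np.einsum('ij,jk,ik->i', diff, S_inv, diff))

def blended(w):
    """w = 0: purely predictive; w = 1: purely Mahalanobis.
    Each distance is scaled to unit sd so the blend weight is interpretable."""
    return (1 - w) * d_pred / d_pred.std() + w * d_mah / d_mah.std()

# The donor pool shifts as the blend moves toward profile similarity.
for w in (0.0, 0.5, 1.0):
    print(w, np.argsort(blended(w))[:10])
```

Moving w toward 1 trades prediction accuracy for donors whose profiles look like the target, which is the bias-variance trade-off the abstract reports.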