
    Mixed Matrix Completion in Complex Survey Sampling under Heterogeneous Missingness

    Modern surveys with large sample sizes and growing mixed-type questionnaires require robust and scalable analysis methods. In this work, we consider recovering a mixed dataframe matrix, obtained by complex survey sampling, with entries following different canonical exponential distributions and subject to heterogeneous missingness. To tackle this challenging task, we propose a two-stage procedure: in the first stage, we model the entry-wise missing mechanism by logistic regression, and in the second stage, we complete the target parameter matrix by maximizing a weighted log-likelihood with a low-rank constraint. We propose a fast and scalable estimation algorithm that achieves sublinear convergence, and the upper bound for the estimation error of the proposed method is rigorously derived. Experimental results support our theoretical claims, and the proposed estimator shows its merits compared to other existing methods. The proposed method is applied to analyze the National Health and Nutrition Examination Survey data. Comment: Journal of Computational and Graphical Statistics, 202
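
One generic way to write down the two-stage procedure described in this abstract is sketched below. The notation is an assumption made for illustration and is not taken from the paper: $S$ is the sample, $w_i$ a survey weight, $r_{ij}$ a response indicator, $x_i$ a covariate vector, $\hat{\pi}_{ij}$ the fitted response probability, $g_j$ the cumulant function of the exponential family assumed for column $j$, and $r$ a rank bound.

```latex
% Stage 1 (assumed form): entry-wise logistic model for the response indicator
\operatorname{logit} \Pr(r_{ij} = 1 \mid x_i) = x_i^{\top} \beta_j ,
\qquad
\hat{\pi}_{ij} = \frac{\exp(x_i^{\top} \hat{\beta}_j)}{1 + \exp(x_i^{\top} \hat{\beta}_j)} .

% Stage 2 (assumed form): weighted log-likelihood under a low-rank constraint
\hat{\Theta} = \arg\max_{\operatorname{rank}(\Theta) \le r}
\sum_{i \in S} \sum_{j} \frac{w_i \, r_{ij}}{\hat{\pi}_{ij}}
\bigl\{ y_{ij} \theta_{ij} - g_j(\theta_{ij}) \bigr\} .
```

Dividing by $\hat{\pi}_{ij}$ up-weights observed entries that were unlikely to be observed, which is the usual inverse-probability device for heterogeneous missingness. The paper's exact objective and constraint may differ from this sketch.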

    Topics in bootstrap methods for survey sampling and spatially balanced design

    This dissertation consists of three parts. In the first part, we propose new bootstrap methods for three commonly used sampling designs: Poisson sampling, simple random sampling, and probability-proportional-to-size sampling. We show that the proposed bootstrap methods are second-order accurate and easy to implement in practice. Two simulation studies compare the proposed bootstrap methods with the Wald method, and the proposed bootstrap methods outperform the Wald method in terms of coverage rate. It is well known that a spatially balanced sample, which spreads well over the study domain, can improve estimation efficiency under dependent settings. In the second part, we propose a block bootstrap method to estimate the variance and make inference based on a sample generated by a one-per-stratum sampling design. We show the validity of the block bootstrap method and compare it theoretically with another commonly used sampling design. A simulation study shows that the block bootstrap method provides a valid variance estimator and valid inference for the one-per-stratum sampling design. Although there is much research on spatially balanced sampling designs, few studies discuss spatio-temporal balanced sampling designs. In the third part, we propose a spatio-temporal balanced sampling design to generate annual samples such that the sample for each year is spatially balanced and the sample combined over consecutive years is also spatially balanced. We also propose a design-based variance estimator for the estimates of annual status and annual change. The proposed sampling design is used in the National Resources Inventory rangeland on-site survey, and it shows that the proposed design performs better than the current design and estimators.
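
To make the Wald-versus-bootstrap comparison in the first part concrete, here is a minimal sketch that computes both intervals for a sample mean. It uses a plain with-replacement percentile bootstrap and ignores the finite population correction, so it is not the second-order accurate method the dissertation proposes. The function names and the toy data are hypothetical.

```python
import random
import statistics


def wald_interval(sample, z=1.96):
    """Normal-approximation (Wald) interval for the mean of `sample`."""
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / n ** 0.5  # standard error of the mean
    return mean - z * se, mean + z * se


def percentile_bootstrap_interval(sample, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean (naive resampling)."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    n = len(sample)
    # Resample n units with replacement and record each replicate mean.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_boot)
    )
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical right-skewed sample, where the two intervals tend to differ.
sample = [1.2, 0.4, 3.1, 0.9, 7.5, 0.3, 2.2, 1.1, 0.6, 4.8]
wald = wald_interval(sample)
boot = percentile_bootstrap_interval(sample)
print("Wald:     ", wald)
print("Bootstrap:", boot)
```

The Wald interval is symmetric around the sample mean by construction, while the percentile interval can follow the skewness of the resampled means. That asymmetry is one reason bootstrap intervals are often studied for their coverage in small samples.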

    Oral GS-441524 derivatives: Next-generation inhibitors of SARS‐CoV‐2 RNA‐dependent RNA polymerase

    GS-441524, an RNA‐dependent RNA polymerase (RdRp) inhibitor, is a 1′-CN-substituted adenine C-nucleoside analog with broad-spectrum antiviral activity. However, the low oral bioavailability of GS‐441524 poses a challenge to its anti-SARS-CoV-2 efficacy. Remdesivir, the intravenously administered version (version 1.0) of GS-441524, is the first FDA-approved agent for SARS-CoV-2 treatment. However, clinical trials have presented conflicting evidence on the value of remdesivir in COVID-19. Therefore, oral GS-441524 derivatives (VV116, ATV006, and GS-621763; version 2.0, targeting highly conserved viral RdRp) could be considered as game-changers in treating COVID-19 because oral administration has the potential to maximize clinical benefits, including decreased duration of COVID-19 and reduced post-acute sequelae of SARS-CoV-2 infection, as well as limited side effects such as hepatic accumulation. This review summarizes the current research related to the oral derivatives of GS-441524, and provides important insights into the potential factors underlying the controversial observations regarding the clinical efficacy of remdesivir; overall, it offers an effective launching pad for developing an oral version of GS-441524

    An adaptive weighting algorithm for accurate radio tomographic image in the environment with multipath and WiFi interference

    Radio frequency device-free localization based on wireless sensor networks has proved feasible in buildings. With this technique, a target can be located from the changes in received signal strength caused by the moving object. However, the accuracy of many such systems deteriorates seriously in environments with WiFi and multipath interference. State-of-the-art methods do not efficiently solve the WiFi and multipath interference problems at the same time. In this article, we propose and evaluate an adaptive weighting radio tomographic imaging algorithm to improve the accuracy of radio frequency device-free localization in environments with multipath and varying intensities of WiFi interference. Field experiments show that our approach outperforms state-of-the-art radio frequency device-free localization systems in environments with multipath and WiFi interference.

    Probability Weighted Clustered Coefficients Regression Models in Complex Survey Sampling

    Regression analysis is commonly conducted in survey sampling. However, existing methods fail when the relationships vary across different areas or domains. In this paper, we propose a unified framework to study the group-wise covariate effect under complex survey sampling based on pairwise penalties, and the associated objective function is solved by the alternating direction method of multipliers. Theoretical properties of the proposed method are investigated under some generality conditions. Numerical experiments demonstrate the superiority of the proposed method in terms of identifying groups and estimation efficiency for both linear regression models and logistic regression models. Comment: 35 pages, 2 figures

    Multiple bias-calibration for adjusting selection bias of non-probability samples using data integration

    Valid statistical inference is challenging when the sample is subject to unknown selection bias. Data integration can be used to correct for selection bias when we have a parallel probability sample from the same population with some common measurements. How to model and estimate the selection probability or the propensity score (PS) of a non-probability sample using an independent probability sample is the challenging part of the data integration. We approach this difficult problem by employing multiple candidate models for PS combined with empirical likelihood. By incorporating multiple propensity score models into the internal bias calibration constraint in the empirical likelihood setup, the selection bias can be eliminated so long as the multiple candidate models contain a true PS model. The bias calibration constraint under the multiple PS models is called multiple bias calibration. Multiple PS models can include both missing-at-random and missing-not-at-random models. Asymptotic properties are discussed, and some limited simulation studies are presented to compare the proposed method with some existing competitors. Plasmode simulation studies using the Culture & Community in a Time of Crisis dataset demonstrate the practical usage and advantages of the proposed method.

    Sampling techniques for big data analysis in finite population inference

    In analyzing big data for finite population inference, it is critical to adjust for the selection bias in the big data. In this paper, we propose two methods of reducing the selection bias associated with the big data sample. The first method uses a version of inverse sampling by incorporating auxiliary information from external sources, and the second one borrows the idea of data integration by combining the big data sample with an independent probability sample. Two simulation studies show that the proposed methods are unbiased and have better coverage rates than their alternatives. In addition, the proposed methods are easy to implement in practice. Comment: 24 pages, 3 tables
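
As a small illustration of how external auxiliary information can reduce selection bias, the sketch below reweights a biased sample to known population stratum sizes. This is ordinary post-stratification, used here as a simpler stand-in; it is not the inverse sampling or data integration estimator proposed in the paper. The function name, the strata, and the numbers are hypothetical.

```python
from collections import defaultdict


def poststratified_mean(sample, population_counts):
    """Estimate a population mean from a selection-biased sample.

    sample: list of (stratum, y) pairs from the big data sample.
    population_counts: dict mapping stratum -> known population size,
        assumed to come from an external source such as a census.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for stratum, y in sample:
        sums[stratum] += y
        counts[stratum] += 1

    total = sum(population_counts.values())
    estimate = 0.0
    for stratum, size in population_counts.items():
        if counts[stratum] == 0:
            raise ValueError(f"no sample units in stratum {stratum!r}")
        # Weight each stratum mean by its known population share.
        estimate += (size / total) * (sums[stratum] / counts[stratum])
    return estimate


# Toy data: "young" units are heavily over-represented in the sample.
sample = [("young", 10.0)] * 8 + [("old", 20.0)] * 2
population_counts = {"young": 500, "old": 500}

naive = sum(y for _, y in sample) / len(sample)
adjusted = poststratified_mean(sample, population_counts)
print(naive, adjusted)  # prints 12.0 15.0
```

The naive mean is pulled toward the over-represented stratum, while the adjusted estimate restores the known 50/50 population split. The correction only works to the extent that selection is unrelated to the outcome within each stratum.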