
    Statistical Methods for Analyzing Randomized Trials and Brain Imaging Data

    My thesis work focuses on developing reliable and innovative statistical methods for improving analyses of biomedical data. I address two types of questions: improving the precision of randomized clinical trials and identifying brain networks from brain imaging. For randomized clinical trials, we proved the statistical validity of two commonly used methods for improving precision while remaining robust to misspecification of the models used in the analysis. We demonstrated our results by re-analyzing completed randomized trials and showed that these two methods can achieve substantial precision gains. For brain imaging, we proposed a consistent estimator for the brain networks that are common across people. Applied to a motor-task functional magnetic resonance imaging data set, our estimator identifies meaningful brain networks that are consistent with the current scientific understanding of motor networks.

    Analysis of Covariance (ANCOVA) in Randomized Trials: More Precision, Less Conditional Bias, and Valid Confidence Intervals, Without Model Assumptions

    Covariate adjustment in the randomized trial context refers to an estimator of the average treatment effect that adjusts for chance imbalances between study arms in baseline variables (called “covariates”). The baseline variables could include, e.g., age, sex, disease severity, and biomarkers. According to two surveys of clinical trial reports, there is confusion about the statistical properties of covariate adjustment. We focus on the ANCOVA estimator, which involves fitting a linear model for the outcome given the treatment arm and baseline variables, and on trials with equal probability of assignment to treatment and control. We prove the following new (to the best of our knowledge) robustness property of ANCOVA under arbitrary model misspecification: not only is the ANCOVA point estimate consistent (as proved by Yang and Tsiatis (2001)) but so is its standard error. This implies that confidence intervals and hypothesis tests conducted as if the linear model were correct are still valid even when the linear model is arbitrarily misspecified, e.g., when the baseline variables are nonlinearly related to the outcome or there is treatment effect heterogeneity. We also give a simple, robust formula for the variance reduction (equivalently, sample size reduction) from using ANCOVA. By re-analyzing completed randomized trials for mild cognitive impairment, schizophrenia, and depression, we demonstrate how ANCOVA can reduce variance, reduce bias conditional on chance imbalance, and increase power even when by chance there is perfect balance across arms in the baseline variables.
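
    A minimal simulation sketch of the ANCOVA estimator described above (simulated data and variable names are ours, not the authors' code): the outcome is generated nonlinearly in the baseline covariate, so the linear working model is misspecified, yet by the result above the treatment coefficient and its model-based standard error remain valid.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(0)
        n = 2000
        arm = rng.integers(0, 2, n)        # 1:1 randomization to treatment vs. control
        x = rng.normal(size=n)             # baseline covariate
        # Nonlinear outcome model: the linear working model below is misspecified.
        y = 0.5 * arm + np.sin(2 * x) + x ** 2 + rng.normal(size=n)

        # ANCOVA: linear model for the outcome given treatment arm and baseline variable.
        fit = sm.OLS(y, sm.add_constant(np.column_stack([arm, x]))).fit()
        print(fit.params[1], fit.bse[1])   # treatment effect estimate and model-based SE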

    A Differential Effect Approach to Partial Identification of Treatment Effects

    We consider identification and inference for the average treatment effect and the heterogeneous treatment effect conditional on observable covariates in the presence of unmeasured confounding. Since point identification of these effects is not achievable without strong assumptions, we obtain bounds on both the average and the heterogeneous treatment effect by leveraging differential effects, a tool that allows using a second treatment to learn about the effect of the first. The differential effect is the effect of using one treatment in lieu of the other, and it can be identified in some observational studies in which treatments are not randomly assigned to units and differences in outcomes may be due to biased assignments rather than treatment effects. With differential effects, we develop a flexible and easy-to-implement semiparametric framework to estimate the bounds and establish asymptotic properties over the support for conducting statistical inference. We also provide conditions under which the causal estimands are point identified within the proposed framework. The proposed method is examined in a simulation study and in two case studies using data from the National Health and Nutrition Examination Survey and the Youth Risk Behavior Surveillance System.
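
    As a hedged sketch of the differential-effect idea (the potential-outcome notation here is ours, not necessarily the authors'): writing $Y(z_1, z_2)$ for the outcome under levels $z_1$ and $z_2$ of the two treatments, the differential effect is the contrast of using one treatment in lieu of the other,

        $\Delta \;=\; E\big[\,Y(1,0) - Y(0,1)\,\big],$

    which, per the abstract, can be identified in some observational studies even when individual effects such as $E[Y(1,0)-Y(0,0)]$ are not, and can then be leveraged to bound the average and heterogeneous treatment effects.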

    Improved Hardness of Approximating k-Clique under ETH

    In this paper, we prove that assuming the exponential time hypothesis (ETH), there is no $f(k)\cdot n^{k^{o(1/\log\log k)}}$-time algorithm that can decide whether an $n$-vertex graph contains a clique of size $k$ or contains no clique of size $k/2$, and no FPT algorithm can decide whether an input graph has a clique of size $k$ or no clique of size $k/f(k)$, where $f(k)$ is some function in $k^{1-o(1)}$. Our results significantly improve the previous works [Lin21, LRSW22]. The crux of our proof is a framework to construct gap-producing reductions for the $k$-Clique problem. More precisely, we show that given an error-correcting code $C:\Sigma_1^k\to\Sigma_2^{k'}$ that is locally testable and smooth locally decodable in the parallel setting, one can construct a reduction which, on input a graph $G$, outputs a graph $G'$ in $(k')^{O(1)}\cdot n^{O(\log|\Sigma_2|/\log|\Sigma_1|)}$ time such that:
    • If $G$ has a clique of size $k$, then $G'$ has a clique of size $K$, where $K = (k')^{O(1)}$.
    • If $G$ has no clique of size $k$, then $G'$ has no clique of size $(1-\varepsilon)\cdot K$ for some constant $\varepsilon\in(0,1)$.
    We then construct such a code with $k'=k^{\Theta(\log\log k)}$ and $|\Sigma_2|=|\Sigma_1|^{k^{0.54}}$, establishing the hardness results above. Our code generalizes the derivative code [WY07] to the case with a super-constant order of derivatives.
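
    As a back-of-the-envelope reading of these parameters (our arithmetic, not a claim taken from the paper): since $\log|\Sigma_2|/\log|\Sigma_1| = k^{0.54}$, the reduction above runs in time

        $(k')^{O(1)}\cdot n^{O(\log|\Sigma_2|/\log|\Sigma_1|)} \;=\; (k')^{O(1)}\cdot n^{O(k^{0.54})},$

    and the gap instances it produces have clique size $K = (k')^{O(1)} = k^{\Theta(\log\log k)}$.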

    A Mechanism Study of Vortex-Tool Drainage Gas Recovery in Gas Wells

    Liquid loading of gas wells is an important issue in the deep exploitation of natural gas. Vortex drainage technology has good prospects because the tool and its installation are simple and the technique is environmentally friendly and efficient. However, the mechanism of vortex drainage and a theory of the associated fluid motion are still missing. Therefore, in order to better understand the downhole flow field, verify the drainage mechanism, and select the best working conditions, this study established a three-dimensional structural model of vortex tools and carried out numerical simulations based on computational fluid dynamics and the mixture model of multiphase flow in Fluent. By monitoring the liquid content at the wellhead and its radial distribution, and by observing the state of the gas-liquid flow and the path lines, the study analyzed the influence of the vortex tool on the gas-well flow field. The study reveals the working mechanism of vortex tools, facilitating understanding of the nature of the vortex drainage process, guiding the selection of preferred process conditions, and providing a theoretical basis for the application and dynamic simulation of vortex drainage technology.
    Key words: liquid loading of gas wells; vortex drainage; multiphase flow; numerical simulation

    Model-robust and efficient covariate adjustment for cluster-randomized experiments

    Cluster-randomized experiments are increasingly used to evaluate interventions in routine practice conditions, and researchers often adopt model-based methods with covariate adjustment in the statistical analyses. However, the validity of model-based covariate adjustment is unclear when the working models are misspecified, leading to ambiguity of estimands and risk of bias. In this article, we first adapt two conventional model-based methods, generalized estimating equations and linear mixed models, with weighted g-computation to achieve robust inference for cluster-average and individual-average treatment effects. To further overcome the limitations of model-based covariate adjustment methods, we propose an efficient estimator for each estimand that allows for flexible covariate adjustment and additionally addresses cluster size variation dependent on treatment assignment and other cluster characteristics. Such cluster size variations often occur post-randomization and, if ignored, can lead to bias of model-based estimators. For our proposed efficient covariate-adjusted estimator, we prove that when the nuisance functions are consistently estimated by machine learning algorithms, the estimator is consistent, asymptotically normal, and efficient. When the nuisance functions are estimated via parametric working models, the estimator is triply robust. Simulation studies and analyses of three real-world cluster-randomized experiments demonstrate that the proposed methods are superior to existing alternatives.
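
    A minimal sketch of the g-computation step described above (toy data and variable names are ours; this is not the authors' efficient estimator, which additionally handles flexible nuisance estimation and informative cluster sizes): fit an outcome working model, predict both potential outcomes for every individual, then either average within clusters first (cluster-average estimand) or pool individuals directly (individual-average estimand).

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(1)
        m = 100                                        # number of clusters
        sizes = rng.integers(5, 50, m)                 # varying cluster sizes
        df = pd.DataFrame({
            "cluster": np.repeat(np.arange(m), sizes),
            "arm": np.repeat(rng.integers(0, 2, m), sizes),  # cluster-level randomization
        })
        df["x"] = rng.normal(size=len(df))
        df["y"] = 0.3 * df["arm"] + df["x"] + rng.normal(size=len(df))

        fit = smf.ols("y ~ arm + x", data=df).fit()    # outcome working model

        # G-computation: predict each individual's outcome under both arms.
        y1 = fit.predict(df.assign(arm=1))
        y0 = fit.predict(df.assign(arm=0))

        # Individual-average treatment effect: pool all individuals.
        iate = (y1 - y0).mean()
        # Cluster-average treatment effect: average within clusters, then across clusters.
        cate = (y1 - y0).groupby(df["cluster"]).mean().mean()
        print(iate, cate)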

    Model-Robust Inference for Clinical Trials that Improve Precision by Stratified Randomization and Adjustment for Additional Baseline Variables

    We focus on estimating the average treatment effect in clinical trials that involve stratified randomization, which is commonly used. It is important to understand the large-sample properties of estimators that adjust for stratum variables (those used in the randomization procedure) and additional baseline variables, since this can lead to substantial gains in precision and power. Surprisingly, to the best of our knowledge, this is an open problem. It was only recently that a simpler problem was solved by Bugni et al. (2018) for the case with no additional baseline variables, continuous outcomes, the analysis of covariance (ANCOVA) estimator, and no missing data. We generalize their results in three directions. First, in addition to continuous outcomes, we handle binary and time-to-event outcomes; this broadens the applicability of the results. Second, we allow adjustment for an additional, preplanned set of baseline variables, which can improve precision. Third, we handle missing outcomes under the missing at random assumption. We prove that a wide class of estimators is asymptotically normally distributed under stratified randomization and has equal or smaller asymptotic variance than under simple randomization. For each estimator in this class, we give a consistent variance estimator. This is important in order to fully capitalize on the combined precision gains from stratified randomization and adjustment for additional baseline variables. The above results also hold for the biased-coin covariate-adaptive design. We demonstrate our results using completed trial data sets of treatments for substance use disorder, where adjustment for additional baseline variables brings substantial variance reduction.
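
    A toy sketch of the class of adjusted estimators studied above (variable names are ours; the paper's consistent variance estimator is not reproduced here): ANCOVA that adjusts both for the stratum variable used in the randomization and for an additional preplanned baseline variable.

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(2)
        n = 1000
        stratum = rng.integers(0, 4, n)          # variable used in the stratified randomization
        arm = np.zeros(n, dtype=int)
        for s in range(4):                       # 1:1 assignment within each stratum
            idx = rng.permutation(np.flatnonzero(stratum == s))
            arm[idx[: len(idx) // 2]] = 1
        x = rng.normal(size=n)                   # additional preplanned baseline variable
        y = 0.4 * arm + 0.5 * stratum + x + rng.normal(size=n)
        df = pd.DataFrame({"y": y, "arm": arm, "stratum": stratum, "x": x})

        # Adjust for stratum indicators and the additional baseline variable.
        fit = smf.ols("y ~ arm + C(stratum) + x", data=df).fit()
        print(fit.params["arm"])                 # adjusted treatment effect estimate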

    On the mixed-model analysis of covariance in cluster-randomized trials

    In the analyses of cluster-randomized trials, a standard approach for covariate adjustment and handling within-cluster correlations is the mixed-model analysis of covariance (ANCOVA). The mixed-model ANCOVA makes stringent assumptions, including normality, linearity, and a compound symmetric correlation structure, which may be challenging to verify and may not hold in practice. When mixed-model ANCOVA assumptions are violated, the validity and efficiency of the model-based inference for the average treatment effect are currently unclear. In this article, we prove that the mixed-model ANCOVA estimator for the average treatment effect is consistent and asymptotically normal under arbitrary misspecification of its working model. Under equal randomization, we further show that the model-based variance estimator for the mixed-model ANCOVA estimator remains consistent, clarifying that the confidence interval given by standard software is asymptotically valid even under model misspecification. Beyond robustness, we also provide a caveat that covariate adjustment via mixed-model ANCOVA may lead to precision loss compared to no adjustment when the covariance structure is misspecified, and describe when a cluster-level ANCOVA becomes more efficient. These results hold under both simple and stratified randomization, and are further illustrated via simulations as well as analyses of three cluster-randomized trials.
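
    A minimal sketch of the mixed-model ANCOVA described above (simulated data and names are ours): a linear mixed model with a random cluster intercept, whose treatment coefficient and model-based standard error are, per the result above, asymptotically valid even though the working model is misspecified.

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(3)
        m = 60                                       # number of clusters
        sizes = rng.integers(10, 30, m)
        df = pd.DataFrame({
            "cluster": np.repeat(np.arange(m), sizes),
            "arm": np.repeat(rng.integers(0, 2, m), sizes),  # cluster-level assignment
        })
        df["x"] = rng.normal(size=len(df))
        u = np.repeat(rng.normal(scale=0.5, size=m), sizes)  # cluster random effects
        # Outcome is nonlinear in x, so the linear working model is misspecified.
        df["y"] = 0.3 * df["arm"] + np.sin(df["x"]) + u + rng.normal(size=len(df))

        # Mixed-model ANCOVA: random intercept induces compound symmetry within clusters.
        fit = smf.mixedlm("y ~ arm + x", data=df, groups=df["cluster"]).fit()
        print(fit.params["arm"], fit.bse["arm"])     # estimate and model-based SE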

    Semiparametric partial common principal component analysis for covariance matrices

    We consider the problem of jointly modeling multiple covariance matrices by partial common principal component analysis (PCPCA), which assumes a proportion of eigenvectors to be shared across covariance matrices and the rest to be individual-specific. This paper proposes consistent estimators of the shared eigenvectors in the PCPCA as the number of matrices or the number of samples used to estimate each matrix goes to infinity. We prove such asymptotic results without making any assumptions on the ranks of the eigenvalues that are associated with the shared eigenvectors. When the number of samples goes to infinity, our results do not require the data to be Gaussian distributed. Furthermore, this paper introduces a sequential testing procedure to identify the number of shared eigenvectors in the PCPCA. In simulation studies, our method shows higher accuracy in estimating the shared eigenvectors than competing methods. Applied to a motor-task functional magnetic resonance imaging data set, our estimator identifies meaningful brain networks that are consistent with the current scientific understanding of motor networks during a motor paradigm.
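
    A toy illustration of the partial common principal component setup (a naive check, not the paper's consistent estimator or its sequential test): two covariance matrices are constructed to share one eigenvector; since a vector that is an eigenvector of both matrices is also an eigenvector of their sum, we eigendecompose the sum and flag the directions that are approximately eigenvectors of each matrix individually.

        import numpy as np

        rng = np.random.default_rng(4)
        p = 5
        Q = np.linalg.qr(rng.normal(size=(p, p)))[0]         # orthonormal eigenbasis for S1
        Q2 = Q.copy()                                        # share column 0 only
        R = np.linalg.qr(rng.normal(size=(p - 1, p - 1)))[0]
        Q2[:, 1:] = Q[:, 1:] @ R                             # rotate the remaining directions
        S1 = Q @ np.diag([5.0, 4.0, 3.0, 2.0, 1.0]) @ Q.T
        S2 = Q2 @ np.diag([3.0, 2.5, 2.0, 1.5, 1.0]) @ Q2.T

        # A shared eigenvector of S1 and S2 is also an eigenvector of S1 + S2.
        _, V = np.linalg.eigh(S1 + S2)
        for v in V.T:
            # v is an eigenvector of S iff S @ v is parallel to v.
            if all(np.linalg.norm(S @ v - (v @ S @ v) * v) < 1e-8 for S in (S1, S2)):
                print(np.round(v, 3))                        # the common eigenvector, up to sign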