Nested Partially-Latent Class Models for Dependent Binary Data; Estimating Disease Etiology
The Pneumonia Etiology Research for Child Health (PERCH) study seeks to use
modern measurement technology to infer the causes of pneumonia for which
gold-standard evidence is unavailable. The paper describes a latent variable
model designed to infer from case-control data the etiology distribution for
the population of cases, and for an individual case given his or her
measurements. We assume each observation is drawn from a mixture model for
which each component represents one cause or disease class. The model addresses
a major limitation of the traditional latent class approach by taking account
of residual dependence among the multivariate binary outcomes given disease
class, thereby reducing estimation bias, retaining efficiency, and offering
more valid inference. Such "local dependence" within a single subject is induced in the model
by nesting latent subclasses within each disease class. Measurement precision
and covariation can be estimated using the control sample for whom the class is
known. In a Bayesian framework, we use stick-breaking priors on the subclass
indicators for model-averaged inference across different numbers of subclasses.
Assessment of model fit and individual diagnosis are done using posterior
samples drawn by Gibbs sampling. We demonstrate the utility of the method on
simulated data and on the motivating PERCH data. Comment: 30 pages with 5 figures and 1 table; 1 appendix with 4 figures and 1 table.
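The stick-breaking priors on the subclass indicators mentioned above can be illustrated with a short sketch. This is a generic truncated stick-breaking construction (the function name and the truncation level K are illustrative choices, not the paper's exact prior specification):

```python
import numpy as np

def stick_breaking_weights(alpha, K, rng):
    """Draw K truncated stick-breaking weights: v_k ~ Beta(1, alpha),
    pi_k = v_k * prod_{j<k} (1 - v_j). The last break is set to 1 so the
    final weight absorbs the remaining stick and the weights sum to one."""
    v = rng.beta(1.0, alpha, size=K)
    v[-1] = 1.0  # truncation: last subclass takes the rest of the stick
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    return v * remaining

rng = np.random.default_rng(0)
pi = stick_breaking_weights(alpha=1.0, K=5, rng=rng)
print(pi, pi.sum())  # a probability vector over 5 subclasses; sums to 1
```

Smaller values of alpha concentrate mass on the first few subclasses, which is what lets the prior effectively average over models with different numbers of active subclasses.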
ddtlcm: An R package for overcoming weak separation in Bayesian latent class analysis via tree-regularization
Traditional applications of latent class models (LCMs) often focus on
scenarios where a set of unobserved classes are well-defined and easily
distinguishable. However, in numerous real-world applications, these classes
are weakly separated and difficult to distinguish, creating significant
numerical challenges. To address these issues, we have developed an R package
ddtlcm that provides comprehensive analysis and visualization tools designed to
enhance the robustness and interpretability of LCMs in the presence of weak
class separation, particularly useful for small sample sizes. This package
implements a tree-regularized Bayesian LCM that borrows statistical strength
across latent classes to make better estimates from limited data. A Shiny app
has also been developed to improve user interactivity. In this paper, we
showcase a typical analysis pipeline with simulated data using ddtlcm. All
software has been made publicly available on CRAN and GitHub.
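The numerical challenge of weak class separation is easy to see in a toy computation. The sketch below (not the ddtlcm API; all names here are illustrative) evaluates posterior class membership under a standard two-class LCM whose item-response probabilities are nearly identical across classes:

```python
import numpy as np

# Two latent classes with weakly separated item-response probabilities:
theta = np.array([[0.50, 0.55, 0.45],   # class 1: P(item positive)
                  [0.55, 0.50, 0.50]])  # class 2: nearly the same
prior = np.array([0.5, 0.5])

def posterior_class_probs(y, theta, prior):
    """P(class | binary responses y) under a standard LCM likelihood
    with conditionally independent items."""
    lik = np.prod(theta**y * (1 - theta)**(1 - y), axis=1)
    post = prior * lik
    return post / post.sum()

post = posterior_class_probs(np.array([1, 0, 1]), theta, prior)
print(post)  # stays close to [0.5, 0.5]: the data barely separate the classes
```

With posteriors this flat, small samples give noisy, unstable class assignments; the tree regularization described in the abstract is one way to stabilize estimation in exactly this regime.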
A robust test for the stationarity assumption in sequential decision making
Reinforcement learning (RL) is a powerful technique that allows an autonomous agent to learn an optimal policy to maximize the expected return. The optimality of various RL algorithms relies on the stationarity assumption, which requires time-invariant state transition and reward functions. However, deviations from stationarity over extended periods often occur in real-world applications like robotics control, health care and digital marketing, resulting in suboptimal policies learned under stationarity assumptions. In this paper, we propose a model-based doubly robust procedure for testing the stationarity assumption and detecting change points in offline RL settings with a certain degree of homogeneity. Our proposed testing procedure is robust to model misspecifications and can effectively control type-I error while achieving high statistical power, especially in high-dimensional settings. Extensive comparative simulations and a real-world interventional mobile health example illustrate the advantages of our method in detecting change points and optimizing long-term rewards in high-dimensional, non-stationary environments.
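The basic idea of change-point detection for a non-stationary reward process can be sketched with a simple mean-shift scan statistic. This is a generic CUSUM-style illustration, not the paper's model-based doubly robust procedure:

```python
import numpy as np

def cusum_changepoint(x):
    """For each candidate split t, compare the mean before and after,
    standardized by the split sizes; return the split with the largest
    statistic. A generic mean-shift scan, for illustration only."""
    n = len(x)
    best_t, best_stat = None, -np.inf
    for t in range(2, n - 1):
        gap = abs(x[:t].mean() - x[t:].mean())
        stat = gap * np.sqrt(t * (n - t) / n)
        if stat > best_stat:
            best_t, best_stat = t, stat
    return best_t, best_stat

rng = np.random.default_rng(1)
# Simulated rewards with a mean shift at time 100:
x = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(1.5, 1.0, 100)])
t_hat, stat = cusum_changepoint(x)
print(t_hat)  # estimated change point, near the true shift at 100
```

In practice the test statistic would be calibrated against a null distribution (e.g., by bootstrap) to control type-I error, which is one of the properties the abstract emphasizes.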
Partially-Latent Class Models (pLCM) for Case-Control Studies of Childhood Pneumonia Etiology
In population studies on the etiology of disease, one goal is the estimation
of the fraction of cases attributable to each of several causes. For example,
pneumonia is a clinical diagnosis of lung infection that may be caused by
viral, bacterial, fungal, or other pathogens. The study of pneumonia etiology
is challenging because directly sampling from the lung to identify the
etiologic pathogen is not standard clinical practice in most settings. Instead,
measurements from multiple peripheral specimens are made. This paper introduces
the statistical methodology designed for estimating the population etiology
distribution and the individual etiology probabilities in the Pneumonia
Etiology Research for Child Health (PERCH) study of 9,500 children at 7 sites
around the world. We formulate the scientific problem in statistical terms as
estimating the mixing weights and latent class indicators under a
partially-latent class model (pLCM) that combines heterogeneous measurements
with different error rates obtained from a case-control study. We introduce the
pLCM as an extension of the latent class model. We also introduce graphical
displays of the population data and inferred latent-class frequencies. The
methods are tested with simulated data, and then applied to PERCH data. The
paper closes with a brief description of extensions of the pLCM to the
regression setting and to the case where conditional independence among the
measures is relaxed. Comment: 25 pages, 4 figures, 1 supplementary material.
Weakly-supervised Multi-output Regression via Correlated Gaussian Processes
Multi-output regression seeks to infer multiple latent functions using data
from multiple groups/sources while accounting for potential between-group
similarities. In this paper, we consider multi-output regression under a
weakly-supervised setting where a subset of data points from multiple groups
are unlabeled. We use dependent Gaussian processes for multiple outputs
constructed by convolutions with shared latent processes. We introduce
hyperpriors for the multinomial probabilities of the unobserved labels and
optimize the hyperparameters, which we show improves estimation. We derive two
variational bounds: (i) a modified variational bound for fast and stable
convergence in model inference, (ii) a scalable variational bound that is
amenable to stochastic optimization. We use experiments on synthetic and
real-world data to show that the proposed model outperforms state-of-the-art
models with more accurate estimation of multiple latent functions and
unobserved labels.
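The way a shared latent process induces dependence between outputs can be shown with a minimal sketch. This uses a shared GP draw plus independent components rather than the paper's full convolution construction, and all coefficients are illustrative:

```python
import numpy as np

def rbf_kernel(x, lengthscale=1.0):
    """Squared-exponential kernel matrix over a 1-D input grid."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

rng = np.random.default_rng(2)
x = np.linspace(0.0, 5.0, 50)
K = rbf_kernel(x) + 1e-8 * np.eye(50)  # jitter for numerical stability
L = np.linalg.cholesky(K)

# A shared latent process u couples the two outputs; each output also has
# its own independent GP component (a simplified stand-in for outputs
# built by convolving shared latent processes).
u = L @ rng.standard_normal(50)
f1 = 0.9 * u + 0.3 * (L @ rng.standard_normal(50))
f2 = -0.8 * u + 0.3 * (L @ rng.standard_normal(50))
print(np.corrcoef(f1, f2)[0, 1])  # strongly negative cross-output correlation
```

Because the coupling is through a latent function rather than through labels, labeled points in one group can inform predictions in another, which is what makes the weakly-supervised setting tractable.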
Deductive Derivation and Computerization of Compatible Semiparametric Efficient Estimation
Researchers often seek robust inference for a parameter through semiparametric estimation. Efficient semiparametric estimation currently requires theoretical derivation of the efficient influence function (EIF), which can be a challenging and time-consuming task. If this task can be computerized, it can save substantial human effort, which can be transferred, for example, to the design of new studies. Although the EIF is, in principle, a derivative, simple numerical differentiation to calculate the EIF by a computer masks the EIF's functional dependence on the parameter of interest. For this reason, the standard approach to obtaining the EIF has been the theoretical construction of the space of scores under all possible parametric submodels. This process currently depends on the correctness of conjectures about these spaces, and the correct verification of such conjectures. The correct guessing of such conjectures, though successful in some problems, is a nondeductive process, i.e., it is not guaranteed to succeed (e.g., it is not computerizable), and the verification of conjectures is generally susceptible to mistakes. We propose a method that can deductively produce semiparametric locally efficient estimators. The proposed method is computerizable, meaning that it does not need either conjecturing for, or otherwise theoretically deriving, the functional form of the EIF, and is guaranteed to produce the result. The method is demonstrated through an example.
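The point that the EIF is "in principle, a derivative" can be made concrete for the simplest possible functional, the mean, whose influence function is known to be x − μ. Perturbing a weighted empirical distribution toward a point mass and differentiating numerically recovers it (a toy illustration, not the paper's deductive procedure):

```python
import numpy as np

def mean_functional(weights, data):
    """The target parameter as a functional of a weighted empirical
    distribution: T(P_w) = sum_i w_i * x_i."""
    return np.sum(weights * data)

def numerical_if(data, i, eps=1e-6):
    """Gateaux derivative of the mean at the empirical distribution,
    in the direction of a point mass at data[i]:
    d/d eps T((1 - eps) P_n + eps delta_{x_i}) at eps = 0."""
    n = len(data)
    w = np.full(n, 1.0 / n)
    w_pert = (1.0 - eps) * w
    w_pert[i] += eps
    return (mean_functional(w_pert, data) - mean_functional(w, data)) / eps

data = np.array([1.0, 2.0, 6.0])  # mean mu = 3
# Known influence function of the mean: IF(x) = x - mu.
print([round(numerical_if(data, i), 4) for i in range(3)])  # ≈ [-2, -1, 3]
```

For the mean the functional is linear in the weights, so the numerical derivative is exact; the difficulty the abstract describes is that for general parameters this pointwise derivative alone does not expose the EIF's functional form, which is what the proposed deductive method addresses.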
Dynamic Survival Transformers for Causal Inference with Electronic Health Records
In medicine, researchers often seek to infer the effects of a given treatment
on patients' outcomes. However, the standard methods for causal survival
analysis make simplistic assumptions about the data-generating process and
cannot capture complex interactions among patient covariates. We introduce the
Dynamic Survival Transformer (DynST), a deep survival model that trains on
electronic health records (EHRs). Unlike previous transformers used in survival
analysis, DynST can make use of time-varying information to predict evolving
survival probabilities. We derive a semi-synthetic EHR dataset from MIMIC-III
to show that DynST can accurately estimate the causal effect of a treatment
intervention on restricted mean survival time (RMST). We demonstrate that DynST
achieves better predictive and causal estimation than two alternative models. Comment: Accepted to the NeurIPS 2022 Workshop on Learning from Time Series for Health.
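The estimand in this abstract, restricted mean survival time, is simply the area under the survival curve up to a horizon tau. A minimal sketch for a step-function (Kaplan-Meier style) curve, with invented numbers and nothing DynST-specific:

```python
import numpy as np

def rmst(times, surv, tau):
    """Restricted mean survival time: area under a right-continuous step
    survival curve S(t) from 0 to tau. `surv[k]` is S(t) just after the
    drop at `times[k]`; S(t) = 1 before the first event time."""
    times = np.asarray(times, dtype=float)
    surv = np.asarray(surv, dtype=float)
    keep = times <= tau
    t = np.concatenate([[0.0], times[keep], [tau]])
    s = np.concatenate([[1.0], surv[keep]])
    return float(np.sum(s * np.diff(t)))

# Survival drops to 0.8 at t=1, 0.5 at t=2, 0.2 at t=4:
times = [1.0, 2.0, 4.0]
surv = [0.8, 0.5, 0.2]
print(rmst(times, surv, tau=3.0))  # 1*1.0 + 1*0.8 + 1*0.5 = 2.3
```

A causal effect on RMST, as estimated in the paper, is then a contrast of this quantity between the treated and untreated survival curves at the same tau.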