Nested Partially-Latent Class Models for Dependent Binary Data; Estimating Disease Etiology
The Pneumonia Etiology Research for Child Health (PERCH) study seeks to use
modern measurement technology to infer the causes of pneumonia for which
gold-standard evidence is unavailable. The paper describes a latent variable
model designed to infer from case-control data the etiology distribution for
the population of cases, and for an individual case given his or her
measurements. We assume each observation is drawn from a mixture model for
which each component represents one cause or disease class. The model addresses
a major limitation of the traditional latent class approach by taking account
of residual dependence among the multivariate binary outcomes given disease
class, thereby reducing estimation bias, retaining efficiency, and offering
more valid inference. Such "local dependence" within a single subject is induced in the model
by nesting latent subclasses within each disease class. Measurement precision
and covariation can be estimated using the control sample for whom the class is
known. In a Bayesian framework, we use stick-breaking priors on the subclass
indicators for model-averaged inference across different numbers of subclasses.
Assessment of model fit and individual diagnosis are done using posterior
samples drawn by Gibbs sampling. We demonstrate the utility of the method on
simulated data and on the motivating PERCH data. Comment: 30 pages with 5 figures and 1 table; 1 appendix with 4 figures and 1 table.
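The stick-breaking priors on the subclass indicators mentioned above can be illustrated with a short sketch. This is a generic truncated stick-breaking construction (the function name and the truncation level K are illustrative choices, not the paper's exact prior specification):

```python
import numpy as np

def stick_breaking_weights(alpha, K, rng):
    """Draw K truncated stick-breaking weights: v_k ~ Beta(1, alpha),
    pi_k = v_k * prod_{j<k} (1 - v_j). The last break is set to 1 so the
    final weight absorbs the remaining stick and the weights sum to one."""
    v = rng.beta(1.0, alpha, size=K)
    v[-1] = 1.0  # truncation: last subclass takes the rest of the stick
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    return v * remaining

rng = np.random.default_rng(0)
pi = stick_breaking_weights(alpha=1.0, K=5, rng=rng)
print(pi, pi.sum())  # a probability vector over 5 subclasses; sums to 1
```

Smaller values of alpha concentrate mass on the first few subclasses, which is what lets the prior effectively average over models with different numbers of active subclasses.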
ddtlcm: An R package for overcoming weak separation in Bayesian latent class analysis via tree-regularization
Traditional applications of latent class models (LCMs) often focus on
scenarios where a set of unobserved classes are well-defined and easily
distinguishable. However, in numerous real-world applications, these classes
are weakly separated and difficult to distinguish, creating significant
numerical challenges. To address these issues, we have developed an R package
ddtlcm that provides comprehensive analysis and visualization tools designed to
enhance the robustness and interpretability of LCMs in the presence of weak
class separation, particularly useful for small sample sizes. This package
implements a tree-regularized Bayesian LCM that borrows statistical strength
across latent classes to make better estimates from limited data. A Shiny app
has also been developed to improve user interactivity. In this paper, we
showcase a typical analysis pipeline with simulated data using ddtlcm. All
software has been made publicly available on CRAN and GitHub.
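The numerical challenge of weak class separation is easy to see in a toy computation. The sketch below (not the ddtlcm API; all names here are illustrative) evaluates posterior class membership under a standard two-class LCM whose item-response probabilities are nearly identical across classes:

```python
import numpy as np

# Two latent classes with weakly separated item-response probabilities:
theta = np.array([[0.50, 0.55, 0.45],   # class 1: P(item positive)
                  [0.55, 0.50, 0.50]])  # class 2: nearly the same
prior = np.array([0.5, 0.5])

def posterior_class_probs(y, theta, prior):
    """P(class | binary responses y) under a standard LCM likelihood
    with conditionally independent items."""
    lik = np.prod(theta**y * (1 - theta)**(1 - y), axis=1)
    post = prior * lik
    return post / post.sum()

post = posterior_class_probs(np.array([1, 0, 1]), theta, prior)
print(post)  # stays close to [0.5, 0.5]: the data barely separate the classes
```

With posteriors this flat, small samples give noisy, unstable class assignments; the tree regularization described in the abstract is one way to stabilize estimation in exactly this regime.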
A robust test for the stationarity assumption in sequential decision making
Reinforcement learning (RL) is a powerful technique that allows an autonomous agent to learn an optimal policy to maximize the expected return. The optimality of various RL algorithms relies on the stationarity assumption, which requires time-invariant state transition and reward functions. However, deviations from stationarity over extended periods often occur in real-world applications like robotics control, health care and digital marketing, resulting in suboptimal policies learned under stationarity assumptions. In this paper, we propose a model-based doubly robust procedure for testing the stationarity assumption and detecting change points in offline RL settings with a certain degree of homogeneity. Our proposed testing procedure is robust to model misspecifications and can effectively control type-I error while achieving high statistical power, especially in high-dimensional settings. Extensive comparative simulations and a real-world interventional mobile health example illustrate the advantages of our method in detecting change points and optimizing long-term rewards in high-dimensional, non-stationary environments.
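The basic idea of change-point detection for a non-stationary reward process can be sketched with a simple mean-shift scan statistic. This is a generic CUSUM-style illustration, not the paper's model-based doubly robust procedure:

```python
import numpy as np

def cusum_changepoint(x):
    """For each candidate split t, compare the mean before and after,
    standardized by the split sizes; return the split with the largest
    statistic. A generic mean-shift scan, for illustration only."""
    n = len(x)
    best_t, best_stat = None, -np.inf
    for t in range(2, n - 1):
        gap = abs(x[:t].mean() - x[t:].mean())
        stat = gap * np.sqrt(t * (n - t) / n)
        if stat > best_stat:
            best_t, best_stat = t, stat
    return best_t, best_stat

rng = np.random.default_rng(1)
# Simulated rewards with a mean shift at time 100:
x = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(1.5, 1.0, 100)])
t_hat, stat = cusum_changepoint(x)
print(t_hat)  # estimated change point, near the true shift at 100
```

In practice the test statistic would be calibrated against a null distribution (e.g., by bootstrap) to control type-I error, which is one of the properties the abstract emphasizes.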
Partially-Latent Class Models (pLCM) for Case-Control Studies of Childhood Pneumonia Etiology
In population studies on the etiology of disease, one goal is the estimation
of the fraction of cases attributable to each of several causes. For example,
pneumonia is a clinical diagnosis of lung infection that may be caused by
viral, bacterial, fungal, or other pathogens. The study of pneumonia etiology
is challenging because directly sampling from the lung to identify the
etiologic pathogen is not standard clinical practice in most settings. Instead,
measurements from multiple peripheral specimens are made. This paper introduces
the statistical methodology designed for estimating the population etiology
distribution and the individual etiology probabilities in the Pneumonia
Etiology Research for Child Health (PERCH) study of 9,500 children at 7 sites
around the world. We formulate the scientific problem in statistical terms as
estimating the mixing weights and latent class indicators under a
partially-latent class model (pLCM) that combines heterogeneous measurements
with different error rates obtained from a case-control study. We introduce the
pLCM as an extension of the latent class model. We also introduce graphical
displays of the population data and inferred latent-class frequencies. The
methods are tested with simulated data, and then applied to PERCH data. The
paper closes with a brief description of extensions of the pLCM to the
regression setting and to the case where conditional independence among the
measures is relaxed. Comment: 25 pages, 4 figures, 1 supplementary material.
Weakly-supervised Multi-output Regression via Correlated Gaussian Processes
Multi-output regression seeks to infer multiple latent functions using data
from multiple groups/sources while accounting for potential between-group
similarities. In this paper, we consider multi-output regression under a
weakly-supervised setting where a subset of data points from multiple groups
are unlabeled. We use dependent Gaussian processes for multiple outputs
constructed by convolutions with shared latent processes. We introduce
hyperpriors for the multinomial probabilities of the unobserved labels and
optimize the hyperparameters, which we show improves estimation. We derive two
variational bounds: (i) a modified variational bound for fast and stable
convergence in model inference, (ii) a scalable variational bound that is
amenable to stochastic optimization. We use experiments on synthetic and
real-world data to show that the proposed model outperforms state-of-the-art
models with more accurate estimation of multiple latent functions and
unobserved labels.
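The way a shared latent process induces dependence between outputs can be shown with a minimal sketch. This uses a shared GP draw plus independent components rather than the paper's full convolution construction, and all coefficients are illustrative:

```python
import numpy as np

def rbf_kernel(x, lengthscale=1.0):
    """Squared-exponential kernel matrix over a 1-D input grid."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

rng = np.random.default_rng(2)
x = np.linspace(0.0, 5.0, 50)
K = rbf_kernel(x) + 1e-8 * np.eye(50)  # jitter for numerical stability
L = np.linalg.cholesky(K)

# A shared latent process u couples the two outputs; each output also has
# its own independent GP component (a simplified stand-in for outputs
# built by convolving shared latent processes).
u = L @ rng.standard_normal(50)
f1 = 0.9 * u + 0.3 * (L @ rng.standard_normal(50))
f2 = -0.8 * u + 0.3 * (L @ rng.standard_normal(50))
print(np.corrcoef(f1, f2)[0, 1])  # strongly negative cross-output correlation
```

Because the coupling is through a latent function rather than through labels, labeled points in one group can inform predictions in another, which is what makes the weakly-supervised setting tractable.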
Deductive Derivation and Computerization of Compatible Semiparametric Efficient Estimation
Researchers often seek robust inference for a parameter through semiparametric estimation. Efficient semiparametric estimation currently requires theoretical derivation of the efficient influence function (EIF), which can be a challenging and time-consuming task. If this task can be computerized, it can save substantial human effort, which can be transferred, for example, to the design of new studies. Although the EIF is, in principle, a derivative, simple numerical differentiation to calculate the EIF by a computer masks the EIF's functional dependence on the parameter of interest. For this reason, the standard approach to obtaining the EIF has been the theoretical construction of the space of scores under all possible parametric submodels. This process currently depends on the correctness of conjectures about these spaces, and the correct verification of such conjectures. The correct guessing of such conjectures, though successful in some problems, is a nondeductive process, i.e., it is not guaranteed to succeed (e.g., it is not computerizable), and the verification of conjectures is generally susceptible to mistakes. We propose a method that can deductively produce semiparametric locally efficient estimators. The proposed method is computerizable, meaning that it does not need either conjecturing for, or otherwise theoretically deriving, the functional form of the EIF, and is guaranteed to produce the result. The method is demonstrated through an example.
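The point that the EIF is "in principle, a derivative" can be made concrete for the simplest possible functional, the mean, whose influence function is known to be x − μ. Perturbing a weighted empirical distribution toward a point mass and differentiating numerically recovers it (a toy illustration, not the paper's deductive procedure):

```python
import numpy as np

def mean_functional(weights, data):
    """The target parameter as a functional of a weighted empirical
    distribution: T(P_w) = sum_i w_i * x_i."""
    return np.sum(weights * data)

def numerical_if(data, i, eps=1e-6):
    """Gateaux derivative of the mean at the empirical distribution,
    in the direction of a point mass at data[i]:
    d/d eps T((1 - eps) P_n + eps delta_{x_i}) at eps = 0."""
    n = len(data)
    w = np.full(n, 1.0 / n)
    w_pert = (1.0 - eps) * w
    w_pert[i] += eps
    return (mean_functional(w_pert, data) - mean_functional(w, data)) / eps

data = np.array([1.0, 2.0, 6.0])  # mean mu = 3
# Known influence function of the mean: IF(x) = x - mu.
print([round(numerical_if(data, i), 4) for i in range(3)])  # ≈ [-2, -1, 3]
```

For the mean the functional is linear in the weights, so the numerical derivative is exact; the difficulty the abstract describes is that for general parameters this pointwise derivative alone does not expose the EIF's functional form, which is what the proposed deductive method addresses.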
Dynamic Survival Transformers for Causal Inference with Electronic Health Records
In medicine, researchers often seek to infer the effects of a given treatment
on patients' outcomes. However, the standard methods for causal survival
analysis make simplistic assumptions about the data-generating process and
cannot capture complex interactions among patient covariates. We introduce the
Dynamic Survival Transformer (DynST), a deep survival model that trains on
electronic health records (EHRs). Unlike previous transformers used in survival
analysis, DynST can make use of time-varying information to predict evolving
survival probabilities. We derive a semi-synthetic EHR dataset from MIMIC-III
to show that DynST can accurately estimate the causal effect of a treatment
intervention on restricted mean survival time (RMST). We demonstrate that DynST
achieves better predictive and causal estimation than two alternative models. Comment: Accepted to the NeurIPS 2022 Workshop on Learning from Time Series for Health.
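The estimand in this abstract, restricted mean survival time, is simply the area under the survival curve up to a horizon tau. A minimal sketch for a step-function (Kaplan-Meier style) curve, with invented numbers and nothing DynST-specific:

```python
import numpy as np

def rmst(times, surv, tau):
    """Restricted mean survival time: area under a right-continuous step
    survival curve S(t) from 0 to tau. `surv[k]` is S(t) just after the
    drop at `times[k]`; S(t) = 1 before the first event time."""
    times = np.asarray(times, dtype=float)
    surv = np.asarray(surv, dtype=float)
    keep = times <= tau
    t = np.concatenate([[0.0], times[keep], [tau]])
    s = np.concatenate([[1.0], surv[keep]])
    return float(np.sum(s * np.diff(t)))

# Survival drops to 0.8 at t=1, 0.5 at t=2, 0.2 at t=4:
times = [1.0, 2.0, 4.0]
surv = [0.8, 0.5, 0.2]
print(rmst(times, surv, tau=3.0))  # 1*1.0 + 1*0.8 + 1*0.5 = 2.3
```

A causal effect on RMST, as estimated in the paper, is then a contrast of this quantity between the treated and untreated survival curves at the same tau.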