40 research outputs found

    Multi-study R-learner for Heterogeneous Treatment Effect Estimation

    We propose a general class of algorithms for estimating heterogeneous treatment effects across multiple studies. Our approach, called the multi-study R-learner, generalizes the R-learner to account for between-study heterogeneity and achieves cross-study robustness of confounding adjustment. The multi-study R-learner is flexible in its ability to incorporate many machine learning techniques for estimating heterogeneous treatment effects, nuisance functions, and membership probabilities. We show that the multi-study R-learner treatment effect estimator is asymptotically normal within the series estimation framework. Moreover, we illustrate via realistic cancer data experiments that our approach results in lower estimation error than the R-learner as between-study heterogeneity increases.
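For orientation, the single-study R-learner that this abstract generalizes can be sketched in a few lines. This is not the multi-study method and not the authors' code: it assumes one covariate, a linear treatment effect tau(x) = a + b*x, and simple least-squares lines for both nuisance functions (outcome mean and propensity). The function names `fit_line` and `r_learner_1d` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y ~ a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r_learner_1d(x, w, y):
    """Return (a, b) so that the treatment effect tau(x) is approximated by a + b*x.

    x: covariate values, w: treatment indicators (0.0 or 1.0), y: outcomes.
    """
    n = len(x)
    y_res = [0.0] * n
    w_res = [0.0] * n
    # Two-fold cross-fitting: nuisances for one fold are fit on the other fold.
    for fold in (0, 1):
        train = [i for i in range(n) if i % 2 != fold]
        test = [i for i in range(n) if i % 2 == fold]
        xt = [x[i] for i in train]
        am, bm = fit_line(xt, [y[i] for i in train])  # m(x) = E[Y | X = x]
        ae, be = fit_line(xt, [w[i] for i in train])  # e(x) = P(W = 1 | X = x), linear approximation
        for i in test:
            y_res[i] = y[i] - (am + bm * x[i])
            w_res[i] = w[i] - (ae + be * x[i])
    # R-loss: sum_i (y_res_i - w_res_i * (a + b * x_i))^2, minimized in closed form
    # by solving the 2x2 normal equations with Cramer's rule.
    z1 = w_res
    z2 = [wr * xi for wr, xi in zip(w_res, x)]
    s11 = sum(u * u for u in z1)
    s12 = sum(u * v for u, v in zip(z1, z2))
    s22 = sum(v * v for v in z2)
    t1 = sum(u * r for u, r in zip(z1, y_res))
    t2 = sum(v * r for v, r in zip(z2, y_res))
    det = s11 * s22 - s12 * s12
    return (t1 * s22 - t2 * s12) / det, (s11 * t2 - s12 * t1) / det
```

In practice the two nuisance fits would typically be replaced by flexible machine learning models; the residual-on-residual structure of the final step is the part that carries over.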

    Bayesian Nonparametric Ordination for the Analysis of Microbial Communities.

    Human microbiome studies use sequencing technologies to measure the abundance of bacterial species or Operational Taxonomic Units (OTUs) in samples of biological material. Typically the data are organized in contingency tables with OTU counts across heterogeneous biological samples. In the microbial ecology community, ordination methods are frequently used to investigate latent factors or clusters that capture and describe variations of OTU counts across biological samples. It remains important to evaluate how uncertainty in estimates of each biological sample's microbial distribution propagates to ordination analyses, including visualization of clusters and projections of biological samples on low dimensional spaces. We propose a Bayesian analysis for dependent distributions to endow frequently used ordinations with estimates of uncertainty. A Bayesian nonparametric prior for dependent normalized random measures is constructed, which is marginally equivalent to the normalized generalized Gamma process, a well-known prior for nonparametric analyses. In our prior, the dependence and similarity between microbial distributions is represented by latent factors that concentrate in a low dimensional space. We use a shrinkage prior to tune the dimensionality of the latent factors. The resulting posterior samples of model parameters can be used to evaluate uncertainty in analyses routinely applied in microbiome studies. Specifically, by combining them with multivariate data analysis techniques we can visualize credible regions in ecological ordination plots. The characteristics of the proposed model are illustrated through a simulation study and applications in two microbiome datasets. B. Ren is supported by the National Science Foundation under Grant No. DMS-1042785. S. Favaro is supported by the European Research Council (ERC) through StG N-BNP 306406. L. Trippa has been supported by the Claudia Adams Barr Program in Innovative Basic Cancer Research. S. Holmes was supported by the NIH grant R01AI112401.

    Biogeography of the Intestinal Mucosal and Lumenal Microbiome in the Rhesus Macaque

    Summary: The gut microbiome is widely studied by fecal sampling, but the extent to which stool reflects the commensal composition at intestinal sites is poorly understood. We investigated this relationship in rhesus macaques by 16S sequencing feces and paired lumenal and mucosal samples from ten sites distal to the jejunum. Stool composition correlated highly with the colonic lumen and mucosa and moderately with the distal small intestine. The mucosal microbiota varied most based on location and was enriched in oxygen-tolerant taxa (e.g., Helicobacter and Treponema), while the lumenal microbiota showed inter-individual variation and obligate anaerobe enrichment (e.g., Firmicutes). This mucosal and lumenal community variability corresponded to functional differences, such as nutrient availability. Additionally, Helicobacter, Faecalibacterium, and Lactobacillus levels in stool were highly predictive of their abundance at most other gut sites. These results quantify the composition and biogeographic relationships between gut microbial communities in macaques and support fecal sampling for translational studies.

    Multivariable association discovery in population-scale meta-omics studies.

    It is challenging to associate features such as human health outcomes, diet, environmental conditions, or other metadata to microbial community measurements, due in part to their quantitative properties. Microbiome multi-omics are typically noisy, sparse (zero-inflated), high-dimensional, extremely non-normal, and often in the form of count or compositional measurements. Here we introduce an optimized combination of novel and established methodology to assess multivariable association of microbial community features with complex metadata in population-scale observational studies. Our approach, MaAsLin 2 (Microbiome Multivariable Associations with Linear Models), uses generalized linear and mixed models to accommodate a wide variety of modern epidemiological studies, including cross-sectional and longitudinal designs, as well as a variety of data types (e.g., counts and relative abundances) with or without covariates and repeated measurements. To construct this method, we conducted a large-scale evaluation of a broad range of scenarios under which straightforward identification of meta-omics associations can be challenging. These simulation studies reveal that MaAsLin 2's linear model preserves statistical power in the presence of repeated measures and multiple covariates, while accounting for the nuances of meta-omics features and controlling false discovery. We also applied MaAsLin 2 to a microbial multi-omics dataset from the Integrative Human Microbiome (HMP2) project which, in addition to reproducing established results, revealed a unique, integrated landscape of inflammatory bowel diseases (IBD) across multiple time points and omics profiles.
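The abstract mentions controlling false discovery across many per-feature tests without naming the correction. As a hedged illustration only, the sketch below assumes the Benjamini-Hochberg procedure, a common choice for this kind of analysis; it is not taken from the MaAsLin 2 code, and the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q
```

A feature would then be reported as associated with a metadata variable when its q-value falls below a chosen threshold.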