241 research outputs found

Data and Statistical Methods To Analyze the Human Microbiome

Author: Waldron Levi
Publication venue: CUNY Academic Works
Publication date: 01/03/2018

The Waldron lab for computational biostatistics bridges the areas of cancer genomics and microbiome studies for public health, developing methods to exploit publicly available data resources and to integrate -omics studies.

City University of New York

Directory of Open Access Journals

Bayesian nonparametric cross-study validation of prediction methods

Author: Huttenhower Curtis
Parmigiani Giovanni
Trippa Lorenzo
Waldron Levi
Publication venue: CUNY Academic Works
Publication date: 01/01/2015

We consider comparisons of statistical learning algorithms using multiple data sets, via leave-one-in cross-study validation: each of the algorithms is trained on one data set; the resulting model is then validated on each remaining data set. This poses two statistical challenges that need to be addressed simultaneously. The first is the assessment of study heterogeneity, with the aim of identifying a subset of studies within which algorithm comparisons can be reliably carried out. The second is the comparison of algorithms using the ensemble of data sets. We address both problems by integrating clustering and model comparison. We formulate a Bayesian model for the array of cross-study validation statistics, which defines clusters of studies with similar properties and provides the basis for meaningful algorithm comparison in the presence of study heterogeneity. We illustrate our approach through simulations involving studies with varying severity of systematic errors, and in the context of medical prognosis for patients diagnosed with cancer, using high-throughput measurements of the transcriptional activity of the tumor’s genes

arXiv.org e-Print Archive

City University of New York

Crossref

Report on emerging technologies for translational bioinformatics: a symposium on gene expression profiling for archival tissues

Author: Huttenhower Curtis
Parmigiani Giovanni
Simpson Peter
Waldron Levi
Publication venue: BioMed Central
Publication date: 01/01/2012

Background: With over 20 million formalin-fixed, paraffin-embedded (FFPE) tissue samples archived each year in the United States alone, archival tissues remain a vast and under-utilized resource in the genomic study of cancer. Technologies have recently been introduced for whole-transcriptome amplification and microarray analysis of degraded mRNA fragments from FFPE samples, and studies of these platforms have only recently begun to enter the published literature

Harvard University - DASH

Springer - Publisher Connector

PubMed Central

University of Queensland eSpace

Lineage-specific interface proteins match up the cell cycle and differentiation in embryo stem cells.

Author: Brunak Søren
Quattrone Alessandro
Re Angela
Waldron Levi
Workman Christopher
Publication venue: Elsevier BV
Publication date: 01/01/2014

The shortage of molecular information on cell cycle changes along embryonic stem cell (ESC) differentiation prompts an in silico approach, which may provide a novel way to identify candidate genes or mechanisms acting in coordinating the two programs. We analyzed germ layer specific gene expression changes during the cell cycle and ESC differentiation by combining four human cell cycle transcriptome profiles with thirteen in vitro human ESC differentiation studies. To detect cross-talk mechanisms we then integrated the transcriptome data that displayed differential regulation with protein interaction data. A new class of non-transcriptionally regulated genes was identified, encoding proteins which interact systematically with proteins corresponding to genes regulated during the cell cycle or cell differentiation, and which therefore can be seen as interface proteins coordinating the two programs. Functional analysis gathered insights in fate-specific candidates of interface functionalities. The non-transcriptionally regulated interface proteins were found to be highly regulated by post-translational ubiquitylation modification, which may synchronize the transition between cell proliferation and differentiation in ESCs

City University of New York

Elsevier - Publisher Connector

Crossref

Directory of Open Access Journals

Copenhagen University Research Information System

Online Research Database In Technology

Assessment of statistical methods from single cell, bulk RNA-seq, and metagenomics applied to microbiome data

Author: Calgaro Matteo
Risso Davide
Romualdi Chiara
Vitulo Nicola
Waldron Levi
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2020

Background: The correct identification of differentially abundant microbial taxa between experimental conditions is a methodological and computational challenge. Recent work has produced methods to deal with the high sparsity and compositionality characteristic of microbiome data, but independent benchmarks comparing these to alternatives developed for RNA-seq data analysis are lacking. Results: We compare methods developed for single-cell and bulk RNA-seq, and specifically for microbiome data, in terms of suitability of distributional assumptions, ability to control false discoveries, concordance, power, and correct identification of differentially abundant genera. We benchmark these methods using 100 manually curated datasets from 16S and whole metagenome shotgun sequencing. Conclusions: The multivariate and compositional methods developed specifically for microbiome analysis did not outperform univariate methods developed for differential expression analysis of RNA-seq data. We recommend a careful exploratory data analysis prior to application of any inferential model and we present a framework to help scientists make an informed choice of analysis methods in a dataset-specific manner

Catalogo dei prodotti della ricerca

Archivio istituzionale della ricerca - Università di Padova

Inferring random change point from left-censored longitudinal data by segmented mechanistic nonlinear models, with application in HIV surveillance study

Author: Braunstein Sarah L.
Nash Denis
Robertson McKaylee
Waldron Levi
Zhang Hongbin
Publication date: 02/08/2022

The primary goal of public health efforts to control HIV epidemics is to diagnose and treat people with HIV infection as soon as possible after seroconversion. The timing of initiation of antiretroviral therapy (ART) treatment after HIV diagnosis is, therefore, a critical population-level indicator that can be used to measure the effectiveness of public health programs and policies at local and national levels. However, population-based data on ART initiation are unavailable because ART initiation and prescription are typically measured indirectly by public health departments (e.g., with viral suppression as a proxy). In this paper, we present a random change-point model to infer the time of ART initiation utilizing routinely reported individual-level HIV viral load from an HIV surveillance system. To deal with the left-censoring and the nonlinear trajectory of viral load data, we formulate a flexible segmented nonlinear mixed effects model and propose a Stochastic version of EM (StEM) algorithm, coupled with a Gibbs sampler for the inference. We apply the method to a random subset of HIV surveillance data to infer the timing of ART initiation since diagnosis and to gain additional insights into the viral load dynamics. Simulation studies are also performed to evaluate the properties of the proposed method

arXiv.org e-Print Archive

Metagenomic microbial community profiling using unique clade-specific marker genes

Author: Ballarini Annalisa
Huttenhower Curtis
Jousson Olivier
Narasimhan Vagheesh
Segata Nicola
Waldron Levi D.
Publication venue: Springer Science and Business Media LLC
Publication date: 11/12/2013

Metagenomic shotgun sequencing data can identify microbes populating a microbial community and their proportions, but existing taxonomic profiling methods are inefficient for increasingly large datasets. We present an approach that uses clade-specific marker genes to unambiguously assign reads to microbial clades more accurately and >50× faster than current approaches. We validated MetaPhlAn on terabases of short reads and provide the largest metagenomic profiling to date of the human gu

Harvard University - DASH

Cross-study validation for the assessment of prediction algorithms

Author: Bernau Christoph
Boulesteix Anne-Laure
Huttenhower Curtis
Parmigiani Giovanni
Riester Markus
Trippa Lorenzo
Waldron Levi
Publication venue: Oxford University Press (OUP)
Publication date: 11/06/2014

Motivation: Numerous competing algorithms for prediction in high-dimensional settings have been developed in the statistical and machine-learning literature. Learning algorithms and the prediction models they generate are typically evaluated on the basis of cross-validation error estimates in a few exemplary datasets. However, in most applications, the ultimate goal of prediction modeling is to provide accurate predictions for independent samples obtained in different settings. Cross-validation within exemplary datasets may not adequately reflect performance in the broader application context. Methods: We develop and implement a systematic approach to ‘cross-study validation’, to replace or supplement conventional cross-validation when evaluating high-dimensional prediction models in independent datasets. We illustrate it via simulations and in a collection of eight estrogen-receptor positive breast cancer microarray gene-expression datasets, where the objective is predicting distant metastasis-free survival (DMFS). We computed the C-index for all pairwise combinations of training and validation datasets. We evaluate several alternatives for summarizing the pairwise validation statistics, and compare these to conventional cross-validation. Results: Our data-driven simulations and our application to survival prediction with eight breast cancer microarray datasets suggest that standard cross-validation produces inflated discrimination accuracy for all algorithms considered, when compared to cross-study validation. Furthermore, the ranking of learning algorithms differs, suggesting that algorithms performing best in cross-validation may be suboptimal when evaluated through independent validation. Availability: The survHD: Survival in High Dimensions package (http://www.bitbucket.org/lwaldron/survhd) will be made available through Bioconductor. Supplementary information: Supplementary data are available at Bioinformatics online

City University of New York

Crossref

Harvard University - DASH

PubMed Central

Metagenomic biomarker discovery and explanation

Author: Garrett Wendy S.
Gevers Dirk
Huttenhower Curtis
Izard Jacques
Miropolsky Larisa
Segata Nicola
Waldron Levi
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2011

This study describes and validates a new method for metagenomic biomarker discovery by way of class comparison, tests of biological consistency and effect size estimation. This addresses the challenge of finding organisms, genes, or pathways that consistently explain the differences between two or more microbial communities, which is a central problem to the study of metagenomics. We extensively validate our method on several microbiomes and a convenient online interface for the method is provided at http://huttenhower.sph.harvard.edu/lefse/. Funding: National Institute of Dental and Craniofacial Research (U.S.) (grant DE017106); National Institutes of Health (U.S.) (NIH grant AI078942); Burroughs Wellcome Fund; National Institutes of Health (U.S.) (NIH 1R01HG005969)

DSpace@MIT

Crossref

DigitalCommons@University of Nebraska

Springer - Publisher Connector

PubMed Central