1,483 research outputs found

Robust prediction of clinical outcomes using cytometry data.

Author: Butte Atul J
Glicksberg Benjamin S
Hu Zicheng
Publication venue: eScholarship, University of California
Publication date: 01/04/2019

Motivation: Flow cytometry and mass cytometry are widely used to diagnose diseases and to predict clinical outcomes. When associating clinical features with cytometry data, traditional analysis methods require cell gating as an intermediate step, leading to information loss and susceptibility to batch effects. Here, we wish to explore an alternative approach that predicts clinical features from cytometry data without the cell-gating step. We also wish to test if such a gating-free approach increases the accuracy and robustness of the prediction. Results: We propose a novel strategy (CytoDx) to predict clinical outcomes using cytometry data without cell gating. Applying CytoDx to real-world datasets allows us to predict multiple types of clinical features. In particular, CytoDx is able to predict the response to influenza vaccine using highly heterogeneous datasets, demonstrating that it is not only accurate but also robust to batch effects and cytometry platforms. Availability and implementation: CytoDx is available as an R package on Bioconductor (bioconductor.org/packages/CytoDx). Data and scripts for reproducing the results are available on bitbucket.org/zichenghu_ucsf/cytodx_study_code/downloads. Supplementary information: Supplementary data are available at Bioinformatics online.

eScholarship - University of California

Accuracy of medical billing data against the electronic health record in the measurement of colorectal cancer screening rates.

Author: Avila Patrick
Butte Atul J
Glicksberg Benjamin S
Harding-Theobald Emily
Rudrapatna Vivek A
Wang Connie
Publication venue: eScholarship, University of California
Publication date: 01/03/2020

Objective: Medical billing data are an attractive source of secondary analysis because of their ease of use and potential to answer population-health questions with statistical power. Although these datasets have known susceptibilities to biases, the degree to which they can distort the assessment of quality measures such as colorectal cancer screening rates is not widely appreciated, nor are their causes and possible solutions. Methods: Using a billing code database derived from our institution's electronic health records, we estimated the colorectal cancer screening rate of average-risk patients aged 50-74 years seen in primary care or gastroenterology clinic in 2016-2017. A sample of 200 records (150 unscreened, 50 screened) was drawn to quantify the accuracy against manual review. Results: Out of 4611 patients, an analysis of billing data suggested a 61% screening rate, an estimate that matches the estimate by the Centers for Disease Control. Manual review revealed a positive predictive value of 96% (86%-100%), negative predictive value of 21% (15%-29%) and a corrected screening rate of 85% (81%-90%). Most false negatives occurred due to examinations performed outside the scope of the database (both within and outside of our institution), but 21% of false negatives fell within the database's scope. False positives occurred due to incomplete examinations and inadequate bowel preparation. Reasons for screening failure include ordered but incomplete examinations (48%), lack of or incorrect documentation by primary care (29%) including incorrect screening intervals (13%) and patients declining screening (13%). Conclusions: Billing databases are prone to substantial bias that may go undetected even in the presence of confirmatory external estimates. Caution is recommended when performing population-level inference from these data. We propose several solutions to improve the use of these data for the assessment of healthcare quality.

eScholarship - University of California

Quantifying the relationship between co-expression, co-regulation and gene function

Author: Allocco Dominic J
Butte Atul J
Kohane Isaac S
Publication venue: BioMed Central
Publication date: 01/01/2004

BACKGROUND: It is thought that genes with similar patterns of mRNA expression and genes with similar functions are likely to be regulated via the same mechanisms. It has been difficult to quantitatively test these hypotheses on a large scale because there has been no general way of determining whether genes share a common regulatory mechanism. Here we use data from a recent genome wide binding analysis in combination with mRNA expression data and existing functional annotations to quantify the likelihood that genes with varying degrees of similarity in mRNA expression profile or function will be bound by a common transcription factor. RESULTS: Genes with strongly correlated mRNA expression profiles are more likely to have their promoter regions bound by a common transcription factor. This effect is present only at relatively high levels of expression similarity. In order for two genes to have a greater than 50% chance of sharing a common transcription factor binder, the correlation between their expression profiles (across the 611 microarrays used in our study) must be greater than 0.84. Genes with similar functional annotations are also more likely to be bound by a common transcription factor. Combining mRNA expression data with functional annotation results in a better predictive model than using either data source alone. CONCLUSIONS: We demonstrate how mRNA expression data and functional annotations can be used together to estimate the probability that genes share a common regulatory mechanism. Existing microarray data and known functional annotations are sufficient to identify only a relatively small percentage of co-regulated genes
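The 0.84 threshold quoted in this abstract refers to the correlation between two genes' expression profiles. The following is a minimal sketch of such a check, assuming Pearson correlation is the measure (the abstract does not name it) and using made-up toy profiles; the function names `pearson` and `exceeds_threshold` are hypothetical and are not taken from the paper.

```python
import math

# Threshold quoted in the abstract for a >50% chance of a shared
# transcription factor binder (measured there across 611 microarrays).
CORRELATION_THRESHOLD = 0.84


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


def exceeds_threshold(x, y, threshold=CORRELATION_THRESHOLD):
    """True if the two profiles correlate more strongly than the threshold."""
    return pearson(x, y) > threshold


# Toy profiles (invented values, for illustration only).
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [1.1, 2.1, 2.9, 4.2, 4.8]   # tracks gene_a closely
gene_c = [3.0, 1.0, 4.0, 1.0, 5.0]   # only loosely related to gene_a
```

With these toy values, `exceeds_threshold(gene_a, gene_b)` holds while `exceeds_threshold(gene_a, gene_c)` does not; real profiles would span the full set of arrays rather than five points.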

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Systematic survey reveals general applicability of "guilt-by-association" within gene coexpression networks

Author: Butte Atul J
Kohane Isaac S
Wolfe Cecily J
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: Biological processes are carried out by coordinated modules of interacting molecules. As clustering methods demonstrate that genes with similar expression display increased likelihood of being associated with a common functional module, networks of coexpressed genes provide one framework for assigning gene function. This has informed the guilt-by-association (GBA) heuristic, widely invoked in functional genomics. Yet although the idea of GBA is accepted, the breadth of GBA applicability is uncertain. RESULTS: We developed methods to systematically explore the breadth of GBA across a large and varied corpus of expression data to answer the following question: To what extent is the GBA heuristic broadly applicable to the transcriptome and conversely how broadly is GBA captured by a priori knowledge represented in the Gene Ontology (GO)? Our study provides an investigation of the functional organization of five coexpression networks using data from three mammalian organisms. Our method calculates a probabilistic score between each gene and each Gene Ontology category that reflects coexpression enrichment of a GO module. For each GO category we use Receiver Operating Curves to assess whether these probabilistic scores reflect GBA. This methodology applied to five different coexpression networks demonstrates that the signature of guilt-by-association is ubiquitous and reproducible and that the GBA heuristic is broadly applicable across the population of nine hundred Gene Ontology categories. We also demonstrate the existence of highly reproducible patterns of coexpression between some pairs of GO categories. CONCLUSION: We conclude that GBA has universal value and that transcriptional control may be more modular than previously realized. 
Our analyses also suggest that methodologies combining coexpression measurements across multiple genes in a biologically-defined module can aid in characterizing gene function or in characterizing whether pairs of functions operate together
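The abstract above evaluates per-category scores with receiver operating characteristic curves. As a rough illustration only, the sketch below computes the area under such a curve for one category from gene scores and membership labels, using the pairwise-ranking form of the AUC. The function name `auc` and the toy scores are assumptions; the paper's actual scoring method is not reproduced here.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: one score per gene for a single category.
    labels: truthy where the gene is annotated to that category.
    Returns the fraction of (member, non-member) pairs in which the
    member scores higher, counting ties as one half.
    """
    members = [s for s, is_member in zip(scores, labels) if is_member]
    others = [s for s, is_member in zip(scores, labels) if not is_member]
    if not members or not others:
        raise ValueError("need at least one member and one non-member")
    wins = 0.0
    for m in members:
        for o in others:
            if m > o:
                wins += 1.0
            elif m == o:
                wins += 0.5
    return wins / (len(members) * len(others))


# Toy example (invented values): four genes, the first two annotated.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 1, 0, 0]
```

An AUC near 0.5 would indicate that the scores carry no information about membership in the category, while values near 1.0 indicate that annotated genes consistently score higher.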

Harvard University - DASH

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

ROMOP: a light-weight R package for interfacing with OMOP-formatted electronic health record data.

Author: Butte Atul J
Datta Debajyoti
Frazier Remi
Giangreco Nicholas
Glicksberg Benjamin S
Larsen Rick
Lee Nelson
Oskotsky Boris
Rudrapatna Vivek
Tatonetti Nicholas P
Thangaraj Phyllis M
Publication venue: eScholarship, University of California
Publication date: 01/04/2019

Objectives: Electronic health record (EHR) data are increasingly used for biomedical discoveries. The nature of the data, however, requires expertise in both data science and EHR structure. The Observational Medical Outcomes Partnership (OMOP) common data model (CDM) standardizes the language and structure of EHR data to promote interoperability of EHR data for research. While the OMOP CDM is valuable and more attuned to research purposes, it still requires extensive domain knowledge to utilize effectively, potentially limiting more widespread adoption of EHR data for research and quality improvement. Materials and methods: We have created ROMOP: an R package for direct interfacing with EHR data in the OMOP CDM format. Results: ROMOP streamlines typical EHR-related data processes. Its functions include exploration of data types, extraction and summarization of patient clinical and demographic data, and patient searches using any CDM vocabulary concept. Conclusion: ROMOP is freely available under the Massachusetts Institute of Technology (MIT) license and can be obtained from GitHub (http://github.com/BenGlicksberg/ROMOP). We detail instructions for setup and use in the Supplementary Materials. Additionally, we provide a public sandbox server containing synthesized clinical data for users to explore OMOP data and ROMOP (http://romop.ucsf.edu).

eScholarship - University of California

Protected Health Information filter (Philter): accurately and securely de-identifying free-text clinical notes.

Author: Butte Atul J
Fan Xuancheng
Glicksberg Benjamin S
Goldstein Theodore
Ludwig Dana
Muenzen Kathleen
Norgeot Beau
Oskotsky Boris
Peterson Thomas A
Rutenberg Eugenia
Schenk Gundolf
Schmajuk Gabriela
Sirota Marina
Yazdany Jinoos
Publication venue: eScholarship, University of California
Publication date: 01/01/2020

There is a great and growing need to ascertain what exactly is the state of a patient, in terms of disease progression, actual care practices, pathology, adverse events, and much more, beyond the paucity of data available in structured medical record data. Ascertaining these harder-to-reach data elements is now critical for the accurate phenotyping of complex traits, detection of adverse outcomes, efficacy of off-label drug use, and longitudinal patient surveillance. Clinical notes often contain the most detailed and relevant digital information about individual patients, the nuances of their diseases, the treatment strategies selected by physicians, and the resulting outcomes. However, notes remain largely unused for research because they contain Protected Health Information (PHI), which is synonymous with individually identifying data. Previous clinical note de-identification approaches have been rigid and still too inaccurate to see any substantial real-world use, primarily because they have been trained on medical text corpora that were too small. To build a new de-identification tool, we created the largest manually annotated clinical note corpus for PHI and developed a customizable open-source de-identification software called Philter ("Protected Health Information filter"). Here we describe the design and evaluation of Philter, and show how it offers substantial real-world improvements over prior methods.

eScholarship - University of California

Percent Fat Mass Increases with Recovery, But Does Not Vary According to Dietary Therapy in Young Malian Children Treated for Moderate Acute Malnutrition.

Author: Ackatia-Armah
Arsenault
Bahwere
Black
Boutton
Brown
Brozek
Butte
Butte
Christine M McDonald
Christopher P Duggan
de Onis
Dicko
Fabiansen
Fabiansen
Fjeld
Fomon
Global Nutrition Cluster: MAM Task Force
Golden
Golden
Kabir
Kenneth H Brown
Ministère de la Santé République du Mali
National Agricultural Library
Radhakrishna
Robert S Ackatia-Armah
Roland Kupka
Seydou Doumbia
Skau
UNICEF
UNICEF
UNICEF/WHO/World Bank Group
Wells
WHO
WHO/UNICEF
Publication venue: eScholarship, University of California
Publication date: 01/06/2019

Background: Moderate acute malnutrition (MAM) affects 34.1 million children globally. Treatment effectiveness is generally determined by the amount and rate of weight gain. Body composition (BC) assessment provides more detailed information on nutritional stores and the type of tissue accrual than traditional weight measurements alone. Objective: The aim of this study was to compare the change in percentage fat mass (%FM) and other BC parameters among young Malian children with MAM according to receipt of 1 of 4 dietary supplements, and recovery status at the end of the 12-wk intervention period. Methods: BC was assessed using the deuterium oxide dilution method in a subgroup of 286 children aged 6-35 mo who participated in a 12-wk community-based, cluster-randomized effectiveness trial of 4 dietary supplements for the treatment of MAM: 1) lipid-based, ready-to-use supplementary food (RUSF); 2) special corn-soy blend "plus plus" (CSB++); 3) locally processed, fortified flour (MI); or 4) locally milled flours plus oil, sugar, and micronutrient powder (LMF). Multivariate linear regression modeling was used to evaluate change in BC parameters by treatment group and recovery status. Results: Mean ± SD %FM at baseline was 28.6% ± 5.32%. Change in %FM did not vary between groups. Children who received RUSF vs. MI gained more (mean; 95% CI) weight (1.43; 1.13, 1.74 kg compared with 0.84; 0.66, 1.03 kg; P = 0.02), FM (0.70; 0.45, 0.96 kg compared with 0.20; 0.05, 0.36 kg; P = 0.01), and weight-for-length z score (1.23; 0.79, 1.54 compared with 0.49; 0.34, 0.71; P = 0.03). Children who recovered from MAM exhibited greater increases in all BC parameters, including %FM, than children who did not recover. Conclusions: In this study population, children had higher than expected %FM at baseline. There were no differences in %FM change between groups. International BC reference data are needed to assess the utility of BC assessment in community-based management of acute malnutrition programs. 
This trial was registered at clinicaltrials.gov as NCT01015950

Crossref

eScholarship - University of California

Reconstruction of metabolic networks from high-throughput metabolite profiling data: in silico analysis of red blood cell metabolism

Author: Barbagallo M.
Butte A.J.
C. J. UNKEFER
Frenkel E.
G. S. ESCOLA
Garay R.
I. NEMENMAN
Jacobasch G.
Katz L.
Kemp G.
Kirk K.
M. E. WALL
Mendes P.
P. J. UNKEFER
Rose I.
Steinhauser D.
W. S. HLAVACEK
Walser M.
Wang K.
Publication venue: Wiley
Publication date: 13/06/2007

We investigate the ability of algorithms developed for reverse engineering of transcriptional regulatory networks to reconstruct metabolic networks from high-throughput metabolite profiling data. For this, we generate synthetic metabolic profiles for benchmarking purposes based on a well-established model for red blood cell metabolism. A variety of data sets is generated, accounting for different properties of real metabolic networks, such as experimental noise, metabolite correlations, and temporal dynamics. These data sets are made available online. We apply ARACNE, a mainstream transcriptional networks reverse engineering algorithm, to these data sets and observe performance comparable to that obtained in the transcriptional domain, for which the algorithm was originally designed. Comment: 14 pages, 3 figures. Presented at the DIMACS Workshop on Dialogue on Reverse Engineering Assessment and Methods (DREAM), Sep 200
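ARACNE, the algorithm applied in this abstract, is known for pruning a mutual-information network with the data processing inequality: within any fully connected triplet, the weakest edge is treated as an indirect interaction and dropped. The sketch below illustrates only that pruning step on a toy table of mutual-information values. It is not the ARACNE implementation; the function name `dpi_prune`, the form of the `tolerance` parameter, and the metabolite labels and numbers are assumptions made for illustration.

```python
from itertools import combinations


def dpi_prune(mi, tolerance=0.0):
    """Drop the weakest edge of every fully connected triplet.

    mi: dict mapping frozenset({a, b}) -> mutual information estimate.
    tolerance: assumed slack in [0, 1); an edge is dropped only if it is
        weaker than (1 - tolerance) times both other edges of the triplet.
    All triplets are judged against the original values, so the result
    does not depend on the order in which triplets are visited.
    """
    nodes = sorted({node for pair in mi for node in pair})
    to_remove = set()
    for a, b, c in combinations(nodes, 3):
        edges = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if not all(edge in mi for edge in edges):
            continue  # not a closed triangle, nothing to prune
        weakest = min(edges, key=lambda edge: mi[edge])
        strongest_two = [edge for edge in edges if edge != weakest]
        limit = min(mi[edge] for edge in strongest_two) * (1.0 - tolerance)
        if mi[weakest] < limit:
            to_remove.add(weakest)
    return {edge: value for edge, value in mi.items() if edge not in to_remove}


# Toy values (invented): a chain A - B - C with a weak indirect A - C link.
toy_mi = {
    frozenset(("A", "B")): 0.9,
    frozenset(("B", "C")): 0.8,
    frozenset(("A", "C")): 0.3,
}
pruned = dpi_prune(toy_mi)
```

In this toy case the A - C edge is removed and the two chain edges remain, which is the behaviour the inequality is meant to produce when A influences C only through B.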

arXiv.org e-Print Archive

Crossref

Reproducibility of gene expression across generations of Affymetrix microarrays

Author: Beggs Alan H
Butte Atul J
Haslett Judith N
Kohane Isaac S
Kunkel Louis M
Nimgaonkar Ashish
Sanoudou Despina
Publication venue: BioMed Central
Publication date: 01/01/2003

BACKGROUND: The development of large-scale gene expression profiling technologies is rapidly changing the norms of biological investigation. But the rapid pace of change itself presents challenges. Commercial microarrays are regularly modified to incorporate new genes and improved target sequences. Although the ability to compare datasets across generations is crucial for any long-term research project, to date no means to allow such comparisons have been developed. In this study the reproducibility of gene expression levels across two generations of Affymetrix GeneChips® (HuGeneFL and HG-U95A) was measured. RESULTS: Correlation coefficients were computed for gene expression values across chip generations based on different measures of similarity. Comparing the absolute calls assigned to the individual probe sets across the generations found them to be largely unchanged. CONCLUSION: We show that experimental replicates are highly reproducible, but that reproducibility across generations depends on the degree of similarity of the probe sets and the expression level of the corresponding transcript.

Harvard University - DASH

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship - University of California