Robust prediction of clinical outcomes using cytometry data.
Motivation: Flow cytometry and mass cytometry are widely used to diagnose diseases and to predict clinical outcomes. When clinical features are associated with cytometry data, traditional analysis methods require cell gating as an intermediate step, which leads to information loss and susceptibility to batch effects. Here, we explore an alternative approach that predicts clinical features from cytometry data without the cell-gating step, and we test whether such a gating-free approach increases the accuracy and robustness of the prediction. Results: We propose a novel strategy (CytoDx) to predict clinical outcomes from cytometry data without cell gating. Applying CytoDx to real-world datasets allows us to predict multiple types of clinical features. In particular, CytoDx predicts the response to influenza vaccine using highly heterogeneous datasets, demonstrating that it is not only accurate but also robust to batch effects and to differences between cytometry platforms. Availability and implementation: CytoDx is available as an R package on Bioconductor (bioconductor.org/packages/CytoDx). Data and scripts for reproducing the results are available at bitbucket.org/zichenghu_ucsf/cytodx_study_code/downloads. Supplementary information: Supplementary data are available at Bioinformatics online.
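The abstract does not describe CytoDx's model, so the following is only a minimal sketch of the gating-free idea under our own assumptions: a single marker per cell, every cell labeled with its sample's outcome, an ordinary least-squares cell-level model, and a sample score equal to the mean of its cells' predictions. The function names are hypothetical and are not the CytoDx package API; a real analysis would use many markers and a more robust model.

```python
from statistics import mean


def fit_cell_model(samples, labels):
    """Fit y = a + b * x by ordinary least squares on pooled cells.

    samples: sample id -> list of per-cell marker intensities (one marker).
    labels:  sample id -> numeric outcome (e.g. 1 = responder, 0 = not).
    No gating: every cell simply inherits its sample's label.
    """
    xs, ys = [], []
    for sid, cells in samples.items():
        xs.extend(cells)
        ys.extend([labels[sid]] * len(cells))
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def score_sample(model, cells):
    """Sample-level score: the mean of the cell-level predictions."""
    intercept, slope = model
    return mean(intercept + slope * x for x in cells)


if __name__ == "__main__":
    train = {"s1": [1.0, 2.0, 3.0], "s2": [7.0, 8.0, 9.0]}
    outcome = {"s1": 0, "s2": 1}
    model = fit_cell_model(train, outcome)
    print(score_sample(model, [8.0, 9.0, 10.0]))  # high-marker sample
    print(score_sample(model, [0.0, 1.0, 2.0]))   # low-marker sample
```

Because no cell is ever assigned to a gate, no population-defining thresholds have to be chosen by hand; whether that yields robustness to batch effects is the empirical question the study addresses.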
Accuracy of medical billing data against the electronic health record in the measurement of colorectal cancer screening rates.
Objective: Medical billing data are an attractive source for secondary analysis because they are easy to use and can answer population-health questions with statistical power. Although these datasets have known susceptibilities to bias, the degree to which they can distort the assessment of quality measures such as colorectal cancer screening rates is not widely appreciated, nor are the causes of this distortion or its possible solutions. Methods: Using a billing code database derived from our institution's electronic health records, we estimated the colorectal cancer screening rate of average-risk patients aged 50-74 years seen in primary care or gastroenterology clinics in 2016-2017. We sampled 200 records (150 unscreened, 50 screened) to quantify the accuracy of the billing data against manual review. Results: Among 4611 patients, an analysis of billing data suggested a 61% screening rate, an estimate that matches that of the Centers for Disease Control. Manual review revealed a positive predictive value of 96% (86%-100%), a negative predictive value of 21% (15%-29%) and a corrected screening rate of 85% (81%-90%). Most false negatives occurred because examinations were performed outside the scope of the database (both within and outside our institution), but 21% of false negatives fell within the database's scope. False positives arose from incomplete examinations and inadequate bowel preparation. Reasons for screening failure included ordered but incomplete examinations (48%), lack of or incorrect documentation by primary care (29%), including incorrect screening intervals (13%), and patients declining screening (13%). Conclusions: Billing databases are prone to substantial bias that may go undetected even when external estimates appear to confirm the results. Caution is recommended when performing population-level inference from these data. We propose several solutions to improve the use of these data for assessing healthcare quality.
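A corrected screening rate can be thought of as reweighting each billing stratum by how often manual review confirmed it. Here is a minimal sketch of that standard PPV/NPV correction, assuming the validation sample is representative of each billing stratum. The counts are hypothetical, and because the abstract does not give the study's exact correction procedure, plugging its reported figures into this sketch need not reproduce the 85% estimate.

```python
def predictive_values(tp, fp, tn, fn):
    """PPV and NPV of billing data against manual chart review.

    tp: billing screened,   chart screened
    fp: billing screened,   chart not screened
    tn: billing unscreened, chart not screened
    fn: billing unscreened, chart screened
    """
    return tp / (tp + fp), tn / (tn + fn)


def corrected_rate(observed_rate, ppv, npv):
    """Expected true screening rate: billing positives are truly screened
    with probability PPV, billing negatives with probability 1 - NPV."""
    return observed_rate * ppv + (1 - observed_rate) * (1 - npv)


# Hypothetical validation counts (not the study's data).
ppv, npv = predictive_values(tp=45, fp=5, tn=30, fn=120)
print(ppv, npv)
print(corrected_rate(0.5, ppv, npv))
```

A low NPV pushes the corrected rate well above the billing-only rate, which is the pattern the abstract reports.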
Quantifying the relationship between co-expression, co-regulation and gene function
BACKGROUND: It is thought that genes with similar patterns of mRNA expression and genes with similar functions are likely to be regulated via the same mechanisms. It has been difficult to test these hypotheses quantitatively on a large scale because there has been no general way of determining whether genes share a common regulatory mechanism. Here we use data from a recent genome-wide binding analysis in combination with mRNA expression data and existing functional annotations to quantify the likelihood that genes with varying degrees of similarity in mRNA expression profile or function will be bound by a common transcription factor. RESULTS: Genes with strongly correlated mRNA expression profiles are more likely to have their promoter regions bound by a common transcription factor. This effect is present only at relatively high levels of expression similarity. For two genes to have a greater than 50% chance of sharing a common transcription factor binder, the correlation between their expression profiles (across the 611 microarrays used in our study) must be greater than 0.84. Genes with similar functional annotations are also more likely to be bound by a common transcription factor. Combining mRNA expression data with functional annotation results in a better predictive model than using either data source alone. CONCLUSIONS: We demonstrate how mRNA expression data and functional annotations can be used together to estimate the probability that genes share a common regulatory mechanism. Existing microarray data and known functional annotations are sufficient to identify only a relatively small percentage of co-regulated genes.
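To make the threshold concrete, the sketch below computes the Pearson correlation between expression profiles and flags gene pairs above 0.84. It only applies the cutoff reported here; the study derived that value from its own 611 microarrays and binding data, so the threshold is treated as a parameter rather than a general constant, and the gene names and profiles are made up.

```python
from itertools import combinations
from math import sqrt


def pearson(u, v):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def likely_coregulated(profiles, threshold=0.84):
    """Gene pairs whose profiles correlate above `threshold`
    (0.84 is the study's 50% cutoff, specific to its data)."""
    return [
        (g1, g2)
        for g1, g2 in combinations(sorted(profiles), 2)
        if pearson(profiles[g1], profiles[g2]) > threshold
    ]


# Hypothetical profiles across four arrays.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
print(likely_coregulated(profiles))  # [('geneA', 'geneB')]
```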
Systematic survey reveals general applicability of "guilt-by-association" within gene coexpression networks
BACKGROUND: Biological processes are carried out by coordinated modules of interacting molecules. As clustering methods demonstrate that genes with similar expression display an increased likelihood of being associated with a common functional module, networks of coexpressed genes provide one framework for assigning gene function. This has informed the guilt-by-association (GBA) heuristic, widely invoked in functional genomics. Yet although the idea of GBA is accepted, the breadth of its applicability is uncertain. RESULTS: We developed methods to systematically explore the breadth of GBA across a large and varied corpus of expression data to answer the following question: To what extent is the GBA heuristic broadly applicable to the transcriptome, and conversely, how broadly is GBA captured by a priori knowledge represented in the Gene Ontology (GO)? Our study investigates the functional organization of five coexpression networks using data from three mammalian organisms. Our method calculates a probabilistic score between each gene and each GO category that reflects coexpression enrichment of a GO module. For each GO category, we use receiver operating characteristic (ROC) curves to assess whether these probabilistic scores reflect GBA. Applied to five different coexpression networks, this methodology demonstrates that the signature of guilt-by-association is ubiquitous and reproducible and that the GBA heuristic is broadly applicable across the population of nine hundred Gene Ontology categories. We also demonstrate the existence of highly reproducible patterns of coexpression between some pairs of GO categories. CONCLUSION: We conclude that GBA has universal value and that transcriptional control may be more modular than previously realized.
Our analyses also suggest that methodologies combining coexpression measurements across multiple genes in a biologically defined module can aid in characterizing gene function or in characterizing whether pairs of functions operate together.
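The abstract does not define its probabilistic gene-to-category score, so the sketch below substitutes simple neighbor voting (the fraction of a gene's coexpression neighbors annotated to the category) and evaluates it with the area under the ROC curve, computed as a Mann-Whitney statistic. An AUROC near 0.5 means no GBA signal; near 1.0 means annotated genes sit in annotated neighborhoods. The toy network is hypothetical.

```python
def neighbor_vote_scores(network, annotated):
    """Score each gene by the fraction of its coexpression neighbors
    that belong to one GO category.

    network:   gene -> set of coexpressed neighbor genes (no self-loops,
               so a gene's own annotation never enters its score)
    annotated: set of genes annotated to the category
    """
    return {
        gene: len(nbrs & annotated) / len(nbrs) if nbrs else 0.0
        for gene, nbrs in network.items()
    }


def auroc(scores, positives):
    """Area under the ROC curve as a Mann-Whitney statistic (ties count 1/2)."""
    pos = [s for g, s in scores.items() if g in positives]
    neg = [s for g, s in scores.items() if g not in positives]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical network: a, b, c form one annotated module; d, e, f another.
network = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
go_category = {"a", "b", "c"}
print(auroc(neighbor_vote_scores(network, go_category), go_category))  # 1.0
```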
ROMOP: a light-weight R package for interfacing with OMOP-formatted electronic health record data.
Objectives: Electronic health record (EHR) data are increasingly used for biomedical discoveries. The nature of the data, however, requires expertise in both data science and EHR structure. The Observational Medical Outcomes Partnership (OMOP) common data model (CDM) standardizes the language and structure of EHR data to promote interoperability of EHR data for research. While the OMOP CDM is valuable and more attuned to research purposes, it still requires extensive domain knowledge to use effectively, potentially limiting wider adoption of EHR data for research and quality improvement. Materials and methods: We have created ROMOP, an R package for direct interfacing with EHR data in the OMOP CDM format. Results: ROMOP streamlines typical EHR-related data processes. Its functions include exploration of data types, extraction and summarization of patients' clinical and demographic data, and patient searches using any CDM vocabulary concept. Conclusion: ROMOP is freely available under the Massachusetts Institute of Technology (MIT) license and can be obtained from GitHub (http://github.com/BenGlicksberg/ROMOP). We detail instructions for setup and use in the Supplementary Materials. Additionally, we provide a public sandbox server containing synthesized clinical data for users to explore OMOP data and ROMOP (http://romop.ucsf.edu).
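ROMOP itself is an R package. As an illustration outside R of the kind of concept-based patient search it streamlines, here is a minimal Python/SQLite sketch over a tiny subset of the OMOP CDM `person` and `condition_occurrence` tables. This is not ROMOP's API, the real CDM tables have many more columns, and the concept IDs are hypothetical placeholders.

```python
import sqlite3


def build_demo_db():
    """In-memory database with a tiny subset of OMOP CDM columns."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE person (
            person_id     INTEGER PRIMARY KEY,
            year_of_birth INTEGER
        );
        CREATE TABLE condition_occurrence (
            condition_occurrence_id INTEGER PRIMARY KEY,
            person_id               INTEGER REFERENCES person (person_id),
            condition_concept_id    INTEGER,
            condition_start_date    TEXT
        );
        """
    )
    con.executemany("INSERT INTO person VALUES (?, ?)",
                    [(1, 1960), (2, 1975), (3, 1990)])
    # Concept IDs 1001 and 1002 are hypothetical placeholders.
    con.executemany(
        "INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)",
        [(1, 1, 1001, "2017-01-05"), (2, 2, 1002, "2017-03-10"),
         (3, 3, 1001, "2018-07-22"), (4, 1, 1001, "2018-02-01")],
    )
    return con


def patients_with_concept(con, concept_id):
    """Return (person_id, year_of_birth) for everyone with the condition concept."""
    return con.execute(
        """
        SELECT DISTINCT p.person_id, p.year_of_birth
        FROM person AS p
        JOIN condition_occurrence AS co ON co.person_id = p.person_id
        WHERE co.condition_concept_id = ?
        ORDER BY p.person_id
        """,
        (concept_id,),
    ).fetchall()


con = build_demo_db()
print(patients_with_concept(con, 1001))  # [(1, 1960), (3, 1990)]
```

Because every CDM-conformant database shares these table and column names, the same query runs unchanged across sites, which is the interoperability the abstract attributes to the OMOP CDM.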
Protected Health Information filter (Philter): accurately and securely de-identifying free-text clinical notes.
There is a great and growing need to ascertain the exact state of a patient, in terms of disease progression, actual care practices, pathology, adverse events and much more, beyond the limited data available in structured medical records. Ascertaining these harder-to-reach data elements is now critical for the accurate phenotyping of complex traits, detection of adverse outcomes, assessment of the efficacy of off-label drug use, and longitudinal patient surveillance. Clinical notes often contain the most detailed and relevant digital information about individual patients, the nuances of their diseases, the treatment strategies selected by physicians, and the resulting outcomes. However, notes remain largely unused for research because they contain Protected Health Information (PHI), which is synonymous with individually identifying data. Previous clinical note de-identification approaches have been rigid and still too inaccurate to see substantial real-world use, primarily because they were trained on medical text corpora that were too small. To build a new de-identification tool, we created the largest manually annotated clinical note corpus for PHI and developed customizable open-source de-identification software called Philter ("Protected Health Information filter"). Here we describe the design and evaluation of Philter and show how it offers substantial real-world improvements over prior methods.
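Philter's design is described in the paper rather than in this abstract, so the following is not its implementation. It is a toy sketch of pattern-based redaction, one ingredient that rule-based de-identifiers commonly use, with hypothetical patterns for dates, phone numbers and medical record numbers. Patterns like these miss names and many free-form identifiers, which is why real tools need far broader coverage and evaluation against annotated corpora.

```python
import re

# Hypothetical patterns for a few PHI categories; illustrative only.
PHI_PATTERNS = [
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
    ("PHONE", re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")),
    ("MRN", re.compile(r"\bMRN[:#]?\s*\d+\b", re.IGNORECASE)),
]


def redact(note):
    """Replace each pattern match with a bracketed category tag."""
    for label, pattern in PHI_PATTERNS:
        note = pattern.sub(f"[{label}]", note)
    return note


print(redact("Seen 03/14/2017, MRN: 123456, call 415-555-0100."))
# Seen [DATE], [MRN], call [PHONE].
```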
Percent Fat Mass Increases with Recovery, But Does Not Vary According to Dietary Therapy in Young Malian Children Treated for Moderate Acute Malnutrition.
Background: Moderate acute malnutrition (MAM) affects 34.1 million children globally. Treatment effectiveness is generally determined by the amount and rate of weight gain. Body composition (BC) assessment provides more detailed information on nutritional stores and the type of tissue accrued than traditional weight measurements alone. Objective: The aim of this study was to compare the change in percentage fat mass (%FM) and other BC parameters among young Malian children with MAM according to receipt of 1 of 4 dietary supplements and to recovery status at the end of the 12-wk intervention period. Methods: BC was assessed using the deuterium oxide dilution method in a subgroup of 286 children aged 6-35 mo who participated in a 12-wk community-based, cluster-randomized effectiveness trial of 4 dietary supplements for the treatment of MAM: 1) lipid-based, ready-to-use supplementary food (RUSF); 2) special corn-soy blend "plus plus" (CSB++); 3) locally processed, fortified flour (MI); or 4) locally milled flours plus oil, sugar, and micronutrient powder (LMF). Multivariate linear regression modeling was used to evaluate change in BC parameters by treatment group and recovery status. Results: Mean ± SD %FM at baseline was 28.6% ± 5.32%. Change in %FM did not vary between groups. Children who received RUSF vs. MI gained more (mean; 95% CI) weight (1.43; 1.13, 1.74 kg compared with 0.84; 0.66, 1.03 kg; P = 0.02), FM (0.70; 0.45, 0.96 kg compared with 0.20; 0.05, 0.36 kg; P = 0.01), and weight-for-length z score (1.23; 0.79, 1.54 compared with 0.49; 0.34, 0.71; P = 0.03). Children who recovered from MAM exhibited greater increases in all BC parameters, including %FM, than children who did not recover. Conclusions: In this study population, children had higher than expected %FM at baseline. There were no differences in %FM change between groups. International BC reference data are needed to assess the utility of BC assessment in community-based management of acute malnutrition programs.
This trial was registered at clinicaltrials.gov as NCT01015950.
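Deuterium oxide dilution measures total body water (TBW); a common two-compartment model then treats fat-free mass (FFM) as TBW divided by an assumed hydration fraction and fat mass as the remainder of body weight. The sketch below shows that arithmetic only. The hydration value of 0.79 is an illustrative assumption (in children it is age-specific), the study's own constants are not given in the abstract, and deriving TBW from isotope enrichment involves corrections not shown here.

```python
def body_composition(weight_kg, tbw_kg, hydration=0.79):
    """Two-compartment body composition from total body water (TBW).

    hydration: assumed water fraction of fat-free mass (illustrative value;
    age-specific in young children and not taken from the study).
    """
    ffm_kg = tbw_kg / hydration     # fat-free mass
    fm_kg = weight_kg - ffm_kg      # fat mass is the remainder of body weight
    return {"ffm_kg": ffm_kg, "fm_kg": fm_kg, "pct_fm": 100 * fm_kg / weight_kg}


# Hypothetical child: 8.0 kg body weight, 4.5 kg TBW from isotope dilution.
print(round(body_composition(8.0, 4.5)["pct_fm"], 1))  # 28.8
```

The result is sensitive to the hydration constant, which is one reason the abstract calls for international BC reference data.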
Reconstruction of metabolic networks from high-throughput metabolite profiling data: in silico analysis of red blood cell metabolism
We investigate the ability of algorithms developed for reverse engineering of transcriptional regulatory networks to reconstruct metabolic networks from high-throughput metabolite profiling data. For this, we generate synthetic metabolic profiles for benchmarking purposes based on a well-established model of red blood cell metabolism. A variety of data sets is generated, accounting for different properties of real metabolic networks, such as experimental noise, metabolite correlations, and temporal dynamics. These data sets are made available online. We apply ARACNE, a mainstream algorithm for reverse engineering transcriptional networks, to these data sets and observe performance comparable to that obtained in the transcriptional domain, for which the algorithm was originally designed.
Comment: 14 pages, 3 figures. Presented at the DIMACS Workshop on Dialogue on Reverse Engineering Assessment and Methods (DREAM), Sep 200
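ARACNE is commonly described as estimating pairwise mutual information (MI) and then pruning indirect edges with the data processing inequality (DPI): in every fully connected triplet, the weakest edge is treated as indirect. The sketch below implements only that pruning step on a precomputed MI table; MI estimation is omitted, and the tolerance handling is our assumption rather than the reference implementation's.

```python
from itertools import combinations


def aracne_dpi(mi, threshold=0.0, tolerance=0.0):
    """Prune a mutual-information network with the data processing inequality.

    mi: dict mapping frozenset({x, y}) -> mutual information between x and y.
    Edges at or below `threshold` are dropped first. In every fully connected
    triplet the weakest edge is marked as indirect if it is weaker than the
    other two by more than a relative `tolerance`; marked edges are removed
    only after all triplets have been examined.
    """
    edges = {e: v for e, v in mi.items() if v > threshold}
    nodes = sorted(set().union(*edges))
    marked = set()
    for a, b, c in combinations(nodes, 3):
        trio = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(e in edges for e in trio):
            weakest = min(trio, key=edges.get)
            next_weakest = min(edges[e] for e in trio if e != weakest)
            if edges[weakest] < next_weakest * (1 - tolerance):
                marked.add(weakest)
    return {e: v for e, v in edges.items() if e not in marked}


# Hypothetical MI values: A-C is explained by the chain A-B-C.
mi = {
    frozenset(("A", "B")): 0.9,
    frozenset(("B", "C")): 0.8,
    frozenset(("A", "C")): 0.3,
    frozenset(("C", "D")): 0.05,
}
print(sorted(tuple(sorted(e)) for e in aracne_dpi(mi, threshold=0.1)))
# [('A', 'B'), ('B', 'C')]
```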
Reproducibility of gene expression across generations of Affymetrix microarrays
BACKGROUND: The development of large-scale gene expression profiling technologies is rapidly changing the norms of biological investigation, but the rapid pace of change itself presents challenges. Commercial microarrays are regularly modified to incorporate new genes and improved target sequences. Although the ability to compare datasets across generations is crucial for any long-term research project, no means of making such comparisons has been developed to date. In this study, we measured the reproducibility of gene expression levels across two generations of Affymetrix GeneChips® (HuGeneFL and HG-U95A). RESULTS: Correlation coefficients were computed for gene expression values across chip generations based on different measures of similarity. A comparison of the absolute calls assigned to individual probe sets across the generations found them largely unchanged. CONCLUSION: We show that experimental replicates are highly reproducible, but that reproducibility across generations depends on the degree of similarity of the probe sets and the expression level of the corresponding transcript.
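One simple way to quantify "largely unchanged" absolute calls is the fraction of matched probe sets whose Present/Marginal/Absent call agrees across generations. The sketch below assumes a precomputed mapping between old and new probe set IDs; the IDs and calls are hypothetical, and since the study found that reproducibility depends on probe-set similarity, a real analysis would likely stratify this fraction by sequence overlap.

```python
def call_concordance(calls_old, calls_new, probe_map):
    """Fraction of matched probe sets whose absolute call is unchanged.

    calls_old, calls_new: probe set id -> absolute call ('P', 'M' or 'A').
    probe_map: old-generation probe set id -> matched new-generation id
               (a hypothetical, precomputed mapping).
    """
    pairs = [
        (calls_old[old], calls_new[new])
        for old, new in probe_map.items()
        if old in calls_old and new in calls_new
    ]
    if not pairs:
        raise ValueError("no matched probe sets")
    return sum(a == b for a, b in pairs) / len(pairs)


old = {"fl_1": "P", "fl_2": "A", "fl_3": "P", "fl_4": "M"}
new = {"u95_1": "P", "u95_2": "A", "u95_3": "A", "u95_4": "M"}
mapping = {f"fl_{i}": f"u95_{i}" for i in range(1, 5)}
print(call_concordance(old, new, mapping))  # 0.75
```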