82 research outputs found

    A graph theoretic approach to testing associations between disparate sources of functional genomic data

    The last few years have seen the advent of high-throughput technologies to analyze various properties of the transcriptome and proteome of several organisms. The congruency of these different data sources, or lack thereof, can shed light on the mechanisms that govern cellular function. A central challenge for bioinformatics research is to develop a unified framework for combining the multiple sources of functional genomics information and testing associations between them, thus obtaining a robust and integrated view of the underlying biology. We present a graph theoretic approach to test the significance of the association between multiple disparate sources of functional genomics data by proposing two statistical tests, namely edge permutation and node label permutation tests. We demonstrate the use of the proposed tests by finding significant association between a Gene Ontology-derived predictome and data obtained from mRNA expression and phenotypic experiments for Saccharomyces cerevisiae. Moreover, we employ the graph theoretic framework to recast a surprising discrepancy presented in Giaever et al. (2002) between gene expression and knockout phenotype, using expression data from a different set of experiments
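A node label permutation test of the kind named in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the test statistic, so the count of shared undirected edges is assumed here, and the function name `node_label_permutation_test` and its parameters are hypothetical rather than taken from the paper.

```python
import random


def node_label_permutation_test(nodes, edges_a, edges_b, n_perm=1000, seed=0):
    """Permutation p-value for the edge overlap between two graphs.

    Assumption (not from the abstract): the statistic is the number of
    undirected edges shared by the two graphs, which are defined on the
    same node set.
    """
    rng = random.Random(seed)
    # Store undirected edges as frozensets so (u, v) equals (v, u).
    a = {frozenset(e) for e in edges_a}
    b = {frozenset(e) for e in edges_b}
    observed = len(a & b)

    nodes = list(nodes)
    hits = 0
    for _ in range(n_perm):
        # Randomly relabel the nodes of graph B; its topology is unchanged.
        shuffled = nodes[:]
        rng.shuffle(shuffled)
        relabel = dict(zip(nodes, shuffled))
        permuted = {frozenset(relabel[u] for u in e) for e in b}
        if len(a & permuted) >= observed:
            hits += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (hits + 1) / (n_perm + 1)
    return observed, p_value


# Toy example: two identical path graphs on six nodes.
path = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
observed, p_value = node_label_permutation_test(range(6), path, path)
print(observed, p_value)
```

Permuting node labels keeps each graph's structure intact and breaks only the correspondence between the two data sources. An edge permutation test would instead randomize which node pairs are connected.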

    Sample size and statistical power considerations in high-dimensionality data settings: a comparative study of classification algorithms

    Background: Data generated using ‘omics’ technologies are characterized by high dimensionality, where the number of features measured per subject vastly exceeds the number of subjects in the study. In this paper, we consider issues relevant in the design of biomedical studies in which the goal is the discovery of a subset of features and an associated algorithm that can predict a binary outcome, such as disease status. We compare the performance of four commonly used classifiers (K-Nearest Neighbors, Prediction Analysis for Microarrays, Random Forests and Support Vector Machines) in high-dimensionality data settings. We evaluate the effects of varying levels of signal-to-noise ratio in the dataset, imbalance in class distribution and choice of metric for quantifying performance of the classifier. To guide study design, we present a summary of the key characteristics of ‘omics’ data profiled in several human or animal model experiments utilizing high-content mass spectrometry and multiplexed immunoassay based techniques. Results: The analysis of data from seven ‘omics’ studies revealed that the average magnitude of effect size observed in human studies was markedly lower when compared to that in animal studies. The data measured in human studies were characterized by higher biological variation and the presence of outliers. The results from simulation studies indicated that the classifier Prediction Analysis for Microarrays (PAM) had the highest power when the class conditional feature distributions were Gaussian and outcome distributions were balanced. Random Forests was optimal when feature distributions were skewed and when class distributions were unbalanced. We provide a free open-source R statistical software library (MVpower) that implements the simulation strategy proposed in this paper. Conclusion: No single classifier had optimal performance under all settings. Simulation studies provide useful guidance for the design of biomedical studies involving high-dimensionality data

    Immunological investigations in tuberculous ascites

    Cell-mediated immunity was assessed in seven patients with bacteriologically and/or histologically confirmed tuberculous ascites. Eight non-tuberculous ascites patients were included as controls. Anti-PPD antibody levels were also estimated by ELISA. Macrophages from tuberculous ascitic fluid showed increased production of H2O2 when compared with ascitic fluid macrophages from controls. The proliferative response of lymphocytes to PPD antigen was greater in ascitic fluid than in peripheral blood in tuberculous patients, while the responses were reversed in control patients. Tuberculous ascitic fluid had higher levels of anti-PPD antibodies than ascitic fluid from controls, though their levels in peripheral blood were similar in the two groups. It is concluded that the results provide support to the concept of immunologic localization

    Bayesian variable selection for high dimensional predictors and self-reported outcomes

    BACKGROUND: The onset of silent diseases such as type 2 diabetes is often registered through self-report in large prospective cohorts. Self-reported outcomes are cost-effective; however, they are subject to error. Diagnosis of silent events may also occur through the use of imperfect laboratory-based diagnostic tests. In this paper, we describe an approach for variable selection in high dimensional datasets for settings in which the outcome is observed with error. METHODS: We adapt the spike and slab Bayesian Variable Selection approach in the context of error-prone, self-reported outcomes. The performance of the proposed approach is studied through simulation studies. An illustrative application is included using data from the Women's Health Initiative SNP Health Association Resource, which includes extensive genotypic (> 900,000 SNPs) and phenotypic data on 9,873 African American and Hispanic American women. RESULTS: Simulation studies show improved sensitivity of our proposed method when compared to a naive approach that ignores error in the self-reported outcomes. Application of the proposed method resulted in discovery of several single nucleotide polymorphisms (SNPs) that are associated with risk of type 2 diabetes in a dataset of 9,873 African American and Hispanic participants in the Women's Health Initiative. There was little overlap among the top ranking SNPs associated with type 2 diabetes risk between the racial groups, adding support to previous observations in the literature of disease associated genetic loci that are often not generalizable across race/ethnicity populations. The adapted Bayesian variable selection algorithm is implemented in R. The source code for the simulations is available in the Supplement. CONCLUSIONS: Variable selection accuracy is reduced when the outcome is ascertained by error-prone self-reports. For this setting, our proposed algorithm has improved variable selection performance when compared to approaches that neglect to account for the error-prone nature of self-reports

    Caffeinated Coffee, Decaffeinated Coffee and Endometrial Cancer Risk: A Prospective Cohort Study among US Postmenopausal Women

    There is plausible biological evidence as well as epidemiologic evidence to suggest coffee consumption may lower endometrial cancer risk. We evaluated the associations between self-reported total coffee, caffeinated coffee and decaffeinated coffee, and endometrial cancer risk using the Women’s Health Initiative Observational Study Research Materials obtained from the National Heart, Lung, and Blood Institute Biological Specimen and Data Repository Coordinating Center. Our primary analyses included 45,696 women and 427 incident endometrial cancer cases, diagnosed over a total of 342,927 person-years of follow-up. We used Cox proportional hazards models to evaluate coffee consumption and endometrial cancer risk. Overall, we did not find an association between coffee consumption and endometrial cancer risk. Compared to non-daily drinkers (none or <1 cup/day), the multivariable adjusted hazard ratios for women who drank ≥4 cups/day were 0.86 (95% confidence interval (CI) 0.63, 1.18) for total coffee, 0.89 (95% CI 0.63, 1.27) for caffeinated coffee, and 0.51 (95% CI 0.25, 1.03) for decaffeinated coffee. In subgroup analyses by body mass index (BMI) there were no associations among normal-weight and overweight women for total coffee and caffeinated coffee. However, among obese women, compared to the referent group (none or <1 cup/day), the hazard ratios for women who drank ≥2 cups/day were 0.72 (95% CI 0.50, 1.04) for total coffee and 0.66 (95% CI 0.45, 0.97) for caffeinated coffee. For decaffeinated coffee, hazard ratios for women who drank ≥2 cups/day were 0.67 (95% CI 0.43, 1.06), 0.93 (95% CI 0.55, 1.58) and 0.80 (95% CI 0.49, 1.30) for normal-weight, overweight and obese women, respectively. Our study suggests that caffeinated coffee consumption may be associated with lower endometrial cancer risk among obese postmenopausal women, but the association with decaffeinated coffee remains unclear

    Bayesian Variable Selection Methods for Matched Case-Control Studies

    Matched case-control designs are currently used in many biomedical applications. To ensure high efficiency and statistical power in identifying features that best discriminate cases from controls, it is important to account for the use of matched designs. However, in the setting of high dimensional data, few variable selection methods account for matching. Bayesian approaches to variable selection have several advantages, including the fact that such approaches visit a wider range of model subsets. In this paper, we propose a variable selection method to account for case-control matching in a Bayesian context and apply it using simulation studies, a matched brain imaging study conducted at Massachusetts General Hospital, and a matched cardiovascular biomarker study conducted by the High Risk Plaque Initiative

    Marginal structural models for the estimation of the risk of Diabetes Mellitus in the presence of elevated depressive symptoms and antidepressant medication use in the Women's Health Initiative observational and clinical trial cohorts

    BACKGROUND: We evaluate the combined effect of the presence of elevated depressive symptoms and antidepressant medication use with respect to risk of type 2 diabetes among approximately 120,000 women enrolled in the Women's Health Initiative (WHI), and compare several different statistical models appropriate for causal inference in non-randomized settings. METHODS: Data were analyzed for 52,326 women in the Women's Health Initiative Clinical Trials (CT) Cohort and 68,169 women in the Observational Study (OS) Cohort after exclusions. We included follow-up to 2005, resulting in a median duration of 7.6 years of follow-up after enrollment. Results from three multivariable Cox models were compared to those from marginal structural models that included time-varying measures of antidepressant medication use, presence of elevated depressive symptoms and BMI, while adjusting for potential confounders including age, ethnicity, education, minutes of recreational physical activity per week, total energy intake, hormone therapy use, family history of diabetes and smoking status. RESULTS: Our results are consistent with previous studies examining the relationship of antidepressant medication use and risk of type 2 diabetes. All models showed a significant increase in diabetes risk for those taking antidepressants. The Cox Proportional Hazards models using baseline covariates showed the lowest increase in risk, with hazard ratios of 1.19 (95% CI 1.06 - 1.35) and 1.14 (95% CI 1.01 - 1.30) in the OS and CT, respectively. Hazard ratios from marginal structural models comparing antidepressant users to non-users were 1.35 (95% CI 1.21 - 1.51) and 1.27 (95% CI 1.13 - 1.43) in the WHI OS and CT, respectively; however, differences among estimates from traditional Cox models and marginal structural models were not statistically significant in either cohort. One explanation is that time-dependent confounding was not a substantial factor in these data; however, other explanations exist. Unadjusted Cox Proportional Hazards models showed that women with elevated depressive symptoms had a significant increase in diabetes risk that remained after adjustment for confounders. However, this association missed the threshold for statistical significance in propensity score adjusted and marginal structural models. CONCLUSIONS: Results from the multiple approaches provide further evidence of an increase in risk of type 2 diabetes for those on antidepressants

    White blood cell DNA methylation and risk of breast cancer in the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO)

    Background: Several studies have suggested that global DNA methylation in circulating white blood cells (WBC) is associated with breast cancer risk. Methods: To address conflicting results and concerns that the findings for WBC DNA methylation in some prior studies may reflect disease effects, we evaluated the relationship between global levels of WBC DNA methylation and breast cancer risk in a case-control study nested within the Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO) cohort. A total of 428 invasive breast cancer cases and 419 controls, frequency matched on age at entry (55–59, 60–64, 65–69, ≥70 years), year of entry (on/before September 30, 1997, on/after October 1, 1997) and period of DNA extraction (previously extracted, newly extracted) were included. The ratio of 5-methyl-2’ deoxycytidine [5-mdC] to 2’-deoxyguanine [dG], assuming [dG] = [5-mdC] + [2’-deoxycytidine [dC]] (%5-mdC), was determined by liquid chromatography-electrospray ionization-tandem mass spectrometry, an especially accurate method for assessing total genomic DNA methylation. Results: Odds ratio (OR) estimates and 95% confidence intervals (CI) for breast cancer risk adjusted for age at entry, year of entry, and period of DNA extraction, were 1.0 (referent), 0.89 (95% CI, 0.6–1.3), 0.88 (95% CI, 0.6–1.3), and 0.84 (95% CI, 0.6–1.2) for women in the highest compared to lowest quartile levels of %5-mdC (p for trend = .39). Effects did not meaningfully vary by time elapsed from WBC collection to diagnosis. Discussion: These results do not support the hypothesis that global DNA hypomethylation in WBC DNA is associated with increased breast cancer risk prior to the appearance of clinical disease
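The %5-mdC measure described in the abstract above can be written out as a formula. This is a sketch that uses only the stated assumption [dG] = [5-mdC] + [dC]; the factor of 100 is assumed from the percent notation and is not given explicitly in the abstract.

```latex
\%\,5\text{-mdC}
  = 100 \times \frac{[5\text{-mdC}]}{[\mathrm{dG}]}
  = 100 \times \frac{[5\text{-mdC}]}{[5\text{-mdC}] + [\mathrm{dC}]}
```

Under that assumption the measure is the percentage of all cytosine residues that are methylated.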