3,761 research outputs found

    Mining Pure, Strict Epistatic Interactions from High-Dimensional Datasets: Ameliorating the Curse of Dimensionality

    Get PDF
    Background: The interaction between loci to affect phenotype is called epistasis. It is strict epistasis if no proper subset of the interacting loci exhibits a marginal effect. For many diseases, it is likely that unknown epistatic interactions affect disease susceptibility. A difficulty when mining epistatic interactions from high-dimensional datasets concerns the curse of dimensionality. There are too many combinations of SNPs to perform an exhaustive search. A method that could locate strict epistasis without an exhaustive search can be considered the brass ring of methods for analyzing high-dimensional datasets. Methodology/Findings: A SNP pattern is a Bayesian network representing SNP-disease relationships. The Bayesian score for a SNP pattern is the probability of the data given the pattern, and has been used to learn SNP patterns. We identified a bound for the score of a SNP pattern. The bound provides an upper limit on the Bayesian score of any pattern that could be obtained by expanding a given pattern. We felt that the bound might enable the data to say something about the promise of expanding a 1-SNP pattern even when there are no marginal effects. We tested the bound using simulated datasets and semi-synthetic high-dimensional datasets obtained from GWAS datasets. We found that the bound was able to dramatically reduce the search time for strict epistasis. Using an Alzheimer's dataset, we showed that it is possible to discover an interaction involving the APOE gene based on its score because of its large marginal effect, but that the bound is most effective at discovering interactions without marginal effects. Conclusions/Significance: We conclude that the bound appears to ameliorate the curse of dimensionality in high-dimensional datasets. This is a very consequential result and could be pivotal in our efforts to reveal the dark matter of genetic disease risk from high-dimensional datasets. © 2012 Jiang, Neapolitan

    Modeling and analysis of disease and risk factors through learning Bayesian networks from observational data

    Full text link
    This paper focuses on identification of the relationships between a disease and its potential risk factors using Bayesian networks in an epidemiologic study, with the emphasis on integrating medical domain knowledge and statistical data analysis. An integrated approach is developed to identify the risk factors associated with patients' occupational histories and is demonstrated using real-world data. This approach includes several steps. First, raw data are preprocessed into a format that is acceptable to the learning algorithms of Bayesian networks. Some important considerations are discussed to address the uniqueness of the data and the challenges of the learning. Second, a Bayesian network is learned from the preprocessed data set by integrating medical domain knowledge and generic learning algorithms. Third, the relationships revealed by the Bayesian network are used for risk factor analysis, including identification of a group of people who share certain common characteristics and have a relatively high probability of developing the disease, and prediction of a person's risk of developing the disease given information on his/her occupational history. Copyright © 2007 John Wiley & Sons, Ltd.Peer Reviewedhttp://deepblue.lib.umich.edu/bitstream/2027.42/58076/1/893_ftp.pd

    A Comprehensive Scoping Review of Bayesian Networks in Healthcare: Past, Present and Future

    Full text link
    No comprehensive review of Bayesian networks (BNs) in healthcare has been published in the past, making it difficult to organize the research contributions in the present and identify challenges and neglected areas that need to be addressed in the future. This unique and novel scoping review of BNs in healthcare provides an analytical framework for comprehensively characterizing the domain and its current state. The review shows that: (1) BNs in healthcare are not used to their full potential; (2) a generic BN development process is lacking; (3) limitations exists in the way BNs in healthcare are presented in the literature, which impacts understanding, consensus towards systematic methodologies, practice and adoption of BNs; and (4) a gap exists between having an accurate BN and a useful BN that impacts clinical practice. This review empowers researchers and clinicians with an analytical framework and findings that will enable understanding of the need to address the problems of restricted aims of BNs, ad hoc BN development methods, and the lack of BN adoption in practice. To map the way forward, the paper proposes future research directions and makes recommendations regarding BN development methods and adoption in practice

    Bayesian networks for disease diagnosis: What are they, who has used them and how?

    Full text link
    A Bayesian network (BN) is a probabilistic graph based on Bayes' theorem, used to show dependencies or cause-and-effect relationships between variables. They are widely applied in diagnostic processes since they allow the incorporation of medical knowledge to the model while expressing uncertainty in terms of probability. This systematic review presents the state of the art in the applications of BNs in medicine in general and in the diagnosis and prognosis of diseases in particular. Indexed articles from the last 40 years were included. The studies generally used the typical measures of diagnostic and prognostic accuracy: sensitivity, specificity, accuracy, precision, and the area under the ROC curve. Overall, we found that disease diagnosis and prognosis based on BNs can be successfully used to model complex medical problems that require reasoning under conditions of uncertainty.Comment: 22 pages, 5 figures, 1 table, Student PhD first pape

    Bayesian Networks with Expert Elicitation as Applicable to Student Retention in Institutional Research

    Get PDF
    The application of Bayesian networks within the field of institutional research is explored through the development of a Bayesian network used to predict first- to second-year retention of undergraduates. A hybrid approach to model development is employed, in which formal elicitation of subject-matter expertise is combined with machine learning in designing model structure and specification of model parameters. Subject-matter experts include two academic advisors at a small, private liberal arts college in the southeast, and the data used in machine learning include six years of historical student-related information (i.e., demographic, admissions, academic, and financial) on 1,438 first-year students. Netica 5.12, a software package designed for constructing Bayesian networks, is used for building and validating the model. Evaluation of the resulting model’s predictive capabilities is examined, as well as analyses of sensitivity, internal validity, and model complexity. Additionally, the utility of using Bayesian networks within institutional research and higher education is discussed. The importance of comprehensive evaluation is highlighted, due to the study’s inclusion of an unbalanced data set. Best practices and experiences with expert elicitation are also noted, including recommendations for use of formal elicitation frameworks and careful consideration of operating definitions. Academic preparation and financial need risk profile are identified as key variables related to retention, and the need for enhanced data collection surrounding such variables is also revealed. For example, the experts emphasize study skills as an important predictor of retention while noting the absence of collection of quantitative data related to measuring students’ study skills. Finally, the importance and value of the model development process is stressed, as stakeholders are required to articulate, define, discuss, and evaluate model components, assumptions, and results

    Computational Hybrid Systems for Identifying Prognostic Gene Markers of Lung Cancer

    Get PDF
    Lung cancer is the most fatal cancer around the world. Current lung cancer prognosis and treatment is based on tumor stage population statistics and could not reliably assess the risk for developing recurrence in individual patients. Biomarkers enable treatment options to be tailored to individual patients based on their tumor molecular characteristics. To date, there is no clinically applied molecular prognostic model for lung cancer. Statistics and feature selection methods identify gene candidates by ranking the association between gene expression and disease outcome, but do not account for the interactions among genes. Computational network methods could model interactions, but have not been used for gene selection due to computational inefficiency. Moreover, the curse of dimensionality in human genome data imposes more computational challenges to these methods.;We proposed two hybrid systems for the identification of prognostic gene signatures for lung cancer using gene expressions measured with DNA microarray. The first hybrid system combined t-tests, Statistical Analysis of Microarray (SAM), and Relief feature selections in multiple gene filtering layers. This combinatorial system identified a 12-gene signature with better prognostic performance than published signatures in treatment selection for stage I and II patients (log-rank P\u3c0.04, Kaplan-Meier analyses). The 12-gene signature is a more significant prognostic factor (hazard ratio=4.19, 95% CI: [2.08, 8.46], P\u3c0.00006) than other clinical covariates. The signature genes were found to be involved in tumorigenesis in functional pathway analyses.;The second proposed system employed a novel computational network model, i.e., implication networks based on prediction logic. This network-based system utilizes gene coexpression networks and concurrent coregulation with signaling pathways for biomarker identification. The first application of the system modeled disease-mediated genome-wide coexpression networks. The entire genomic space were extensively explored and 21 gene signatures were discovered with better prognostic performance than all published signatures in stage I patients not receiving chemotherapy (hazard ratio\u3e1, CPE\u3e0.5, P \u3c 0.05). These signatures could potentially be used for selecting patients for adjuvant chemotherapy. The second application of the system modeled the smoking-mediated coexpression networks and identified a smoking-associated 7-gene signature. The 7-gene signature generated significant prognostication specific to smoking lung cancer patients (log-rank P\u3c0.05, Kaplan-Meier analyses), with implications in diagnostic screening of lung cancer risk in smokers (overall accuracy=74%, P\u3c0.006). The coexpression patterns derived from the implication networks in both applications were successfully validated with molecular interactions reported in the literature (FDR\u3c0.1).;Our studies demonstrated that hybrid systems with multiple gene selection layers outperform traditional methods. Moreover, implication networks could efficiently model genome-scale disease-mediated coexpression networks and crosstalk with signaling pathways, leading to the identification of clinically important gene signatures
    • …
    corecore