587 research outputs found

    Bayesian Network Modeling and Inference in Plant Gene Networks And Analysis of Sequencing and Imaging Data

    Scientific and technological advances have made curing, preventing, or managing all diseases a goal that seems within reach. The approach to manipulating biological systems is multifaceted. This dissertation focuses on two problems that pose fundamental challenges in developing methods to control biological systems: the first is modeling complex interactions in biological systems; the second is the faithful representation and analysis of biological data obtained from scientific equipment. The first part of this dissertation discusses modeling and inference in gene networks, and Bayesian inference. We then describe the application of Bayesian network modeling to represent interactions among genes and to integrate gene expression data in order to identify potential points of intervention in the gene network. We conclude with a summary of evolving directions for modeling gene interactions. The second part focuses on taming biological data to obtain actionable insights. We introduce the challenges in representing and analyzing high-throughput sequencing data and proceed to describe the analysis of imaging data in the dynamic environment of cancer cells. We then discuss the analysis of high-throughput RNA sequencing data to pinpoint genes that behave differently under monitored experimental conditions. Finally, we address the problem of deciphering and quantifying gene-level activity from epifluorescent imaging data.

    Analysis for warning factors of type 2 diabetes mellitus complications with Markov blanket based on a Bayesian network model

    Background and objective: Type 2 diabetes mellitus (T2DM) complications seriously affect quality of life and cannot be cured completely, so action should be taken for prevention and self-management. Analysis of warning factors is beneficial for patients and has been the focus of some previous studies. Those studies generally used professional medical test factors or the complete set of factors for prediction and prevention, which is inconvenient and impractical for patient self-management. With this in mind, this study built a Bayesian network (BN) model, from the perspective of diabetic patients' self-management and prevention, to predict six complications of T2DM using selected warning factors that patients can obtain from a medical examination. Furthermore, the model was analyzed to explore the relationships between physiological variables and T2DM complications, as well as among the complications themselves. The model aims to help patients with T2DM self-manage and prevent complications. Methods: The dataset was collected from a data center called the National Health Clinical Center between 1st January 2009 and 31st December 2009. After preprocessing and imputing the data, a BN model merging expert knowledge was built with bootstrap resampling and the Tabu search algorithm. The Markov blanket (MB) was used to select the warning factors and predict T2DM complications. Moreover, a Bayesian network without prior information (BN-wopi) model, with both structure and parameters learned using 10-fold cross-validation, was added for a fair comparison with other classifiers learned using 10-fold cross-validation. The warning factors were selected according to the structure learned in each fold and were then used for prediction. Finally, the performance of the two BN models using warning factors was compared with that of a Naïve Bayes model, a Random Forest model, and a C5.0 Decision Tree model, all of which used all features for prediction. In addition, the validation parameters of the proposed model were compared with those in existing studies that used other variables in clinical or biomedical data to predict T2DM complications. Results: Experimental results indicated that the BN models using warning factors performed statistically better than their counterparts using all other variables in predicting T2DM complications. In addition, compared with other experiments, the proposed BN model was effective and significant in predicting diabetic nephropathy (DN) (AUC: 0.831), diabetic foot (DF) (AUC: 0.905), diabetic macrovascular complications (DMV) (AUC: 0.753), and diabetic ketoacidosis (DK) (AUC: 0.877) with the selected warning factors. Conclusions: The warning factors of DN, DF, DMV, and DK selected by MB in this research might help predict certain T2DM complications effectively, and the proposed BN model might be used as a general tool for prevention, monitoring, and self-management.
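
As a rough illustration of the Markov blanket selection mentioned in the abstract above: in a Bayesian network, the Markov blanket of a node consists of its parents, its children, and the other parents of its children. The sketch below computes it for a toy directed acyclic graph. The network, the variable names, and the `markov_blanket` helper are hypothetical and are not taken from the study; the study's actual structure was learned from data.

```python
def markov_blanket(parents, node):
    """Return the Markov blanket of `node` in a DAG.

    `parents` maps every node to the set of its parent nodes.
    The blanket is: parents + children + other parents of the children.
    """
    # Children are the nodes that list `node` among their parents.
    children = {child for child, ps in parents.items() if node in ps}
    # Co-parents ("spouses") are all parents of those children.
    spouses = {p for child in children for p in parents[child]}
    return (set(parents.get(node, ())) | children | spouses) - {node}


# Hypothetical toy network (illustrative names only, not the study's variables).
toy_parents = {
    "age": set(),
    "hba1c": {"age"},
    "blood_pressure": set(),
    "nephropathy": {"hba1c", "blood_pressure"},
    "infection": set(),
    "proteinuria": {"nephropathy", "infection"},
}

blanket = markov_blanket(toy_parents, "nephropathy")
print(sorted(blanket))
```

In this toy graph, `age` is excluded from the blanket because it influences `nephropathy` only through `hba1c`; this is the sense in which a Markov blanket can act as a feature selector for a target variable.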

    Statistical modelling of cardiovascular disease patients using Bayesian approaches

    This study focuses on statistical modelling of cardiovascular disease (CVD) patients in Malaysia. A secondary dataset from the National Cardiovascular Disease Database-Acute Coronary Syndrome (NCVD-ACS) registry for the years 2006 to 2013 is utilised. Studies have shown that CVD affects males and females differently. Thus, a gender-specific analysis of the risk factors and mortality among ST-Elevation Myocardial Infarction (STEMI) patients is needed. Initially, this study performed a standard multivariate logistic analysis, with the aims of identifying risk factors associated with mortality for each gender and comparing differences, if any, among STEMI patients. The results showed that gender differences existed among STEMI patients. Even though females share the same risk factors as males, some risk factors relate only to females, which may have increased their tendency to develop CVD and their risk of mortality. An important contribution of this analysis is that it gives an understanding of possible gender-based differences in baseline characteristics, risk factors, treatments, and outcomes, which will help cardiac care specialists improve the current management of patients with CVD. Next, Bayesian analysis is proposed to develop a prognostic model for STEMI patients. A Bayesian Markov Chain Monte Carlo (MCMC) simulation approach is applied. In addition, the parameter estimates from the proposed Bayesian model and the frequentist model are compared. The results showed that the proposed Bayesian modelling can deal correctly with the probabilities and provides parameter estimates of the posterior distribution that have natural clinical interpretations. In doing so, several programming codes for Bayesian model development and convergence diagnostics were developed in the Just Another Gibbs Sampler (JAGS) software through its R interface. In the final part of this study, a graphical probabilistic model framework defined using a Bayesian Network (BN) is proposed to identify and interpret the dependence structure between the predictors and health outcomes of STEMI patients. Two learning processes are involved in obtaining the BN model from the data, namely structural learning and parameter learning. From the structural learning, 25 and 20 arcs were considered significant for the males' and females' BN respectively. A few variables, namely Killip class, renal disease, and age group, were classified as key predictors because they were the most influential variables directly associated with the outcome of patients' status. Moreover, conditional probabilities for each feature were obtained. The novelty of this study is that it provides an indication of the strength of each arc in the network by exploiting the bootstrap resampling method in the structural learning. A graphical model is developed in which the relationships can be displayed in diagrammatic form and the cause-effect relationships can be illustrated. An important implication of this model is that it identifies dependencies based on the different features of the variables. It can also include expert knowledge to improve predictability for data-driven research when information or resources regarding the variables are limited.

    Regularized Machine Learning in the Genetic Prediction of Complex Traits

    Compared to univariate analysis of genome-wide association (GWA) studies, machine learning-based models have been shown to provide improved means of learning the multilocus panels of genetic variants and their interactions that are most predictive of complex phenotypic traits. Many applications of predictive modeling rely on effective variable selection, often implemented through model regularization, which penalizes the model complexity and enables predictions in individuals outside of the training dataset. However, the different regularization approaches may also lead to considerable differences, especially in the number of genetic variants needed for maximal predictive accuracy, as illustrated here in examples from both disease classification and quantitative trait prediction. We also highlight the potential pitfalls of the regularized machine learning models, related to issues such as model overfitting to the training data, which may lead to over-optimistic prediction results, as well as identifiability of the predictive variants, which is important in many medical applications. While genetic risk prediction for human diseases is used as a motivating use case, we argue that these models are also widely applicable in nonhuman applications, such as animal and plant breeding, where accurate genotype-to-phenotype modeling is needed. Finally, we discuss some key future advances, open questions and challenges in this developing field, when moving toward low-frequency variants and cross-phenotype interactions.
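
To make concrete why different regularization approaches can select very different numbers of variants, the sketch below contrasts L1 (lasso) and L2 (ridge) shrinkage in the textbook special case of an orthonormal design, where both have closed forms applied to the unpenalized coefficient estimates. This is a simplification chosen for illustration, not the method of the article; the coefficient values, the penalty `lam`, and the function names are all hypothetical.

```python
def soft_threshold(beta, lam):
    """Lasso solution for one coefficient under an orthonormal design:
    minimizes 0.5 * (b - beta)**2 + lam * abs(b)."""
    if beta > lam:
        return beta - lam
    if beta < -lam:
        return beta + lam
    return 0.0  # small effects are set exactly to zero


def ridge_shrink(beta, lam):
    """Ridge solution for one coefficient under an orthonormal design:
    minimizes 0.5 * (b - beta)**2 + 0.5 * lam * b**2."""
    return beta / (1.0 + lam)  # shrinks toward zero but never reaches it


# Hypothetical unpenalized effect estimates for five variants.
ols = [2.0, -0.3, 0.05, 1.1, -0.8]
lam = 0.5  # hypothetical penalty strength

lasso = [soft_threshold(b, lam) for b in ols]
ridge = [ridge_shrink(b, lam) for b in ols]

n_lasso = sum(1 for b in lasso if b != 0.0)
n_ridge = sum(1 for b in ridge if b != 0.0)
print("variants kept by L1:", n_lasso)
print("variants kept by L2:", n_ridge)
```

With these toy values the L1 penalty discards the two weakest variants while the L2 penalty keeps all five with reduced magnitudes, which is one reason the number of variants in a final model depends on the penalty chosen.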

    DomainRBF: a Bayesian regression approach to the prioritization of candidate domains for complex diseases

    BACKGROUND: Domains are basic units of proteins, and thus exploring associations between protein domains and human inherited diseases will greatly improve our understanding of the pathogenesis of human complex diseases and further benefit the medical prevention, diagnosis, and treatment of these diseases. Within a given domain-domain interaction network, we make the assumption that similarities of disease phenotypes can be explained using proximities of domains associated with such diseases. Based on this assumption, we propose a Bayesian regression approach named domainRBF (domain Rank with Bayes Factor) to prioritize candidate domains for human complex diseases. RESULTS: Using a compiled dataset containing 1,614 associations between 671 domains and 1,145 disease phenotypes, we demonstrate the effectiveness of the proposed approach through three large-scale leave-one-out cross-validation experiments (random control, simulated linkage interval, and genome-wide scan), evaluated in terms of three criteria (precision, mean rank ratio, and AUC score). We further show, through a series of permutation tests, that the proposed approach is robust to the parameters involved and to the underlying domain-domain interaction network. Having assessed the validity of this approach, we show the possibility of ab initio inference of domain-disease associations and gene-disease associations, and we illustrate the strong agreement between our inferences and the evidence from genome-wide association studies for four common diseases (type 1 diabetes, type 2 diabetes, Crohn's disease, and breast cancer). Finally, we provide a pre-calculated genome-wide landscape of associations between 5,490 protein domains and 5,080 human diseases and offer free access to this resource. CONCLUSIONS: The proposed approach effectively ranks susceptible domains among the top of the candidates, and it is robust to the parameters involved. The ab initio inference of domain-disease associations shows strong agreement with the evidence provided by genome-wide association studies. The predicted landscape provides a comprehensive understanding of associations between domains and human diseases.

    ASSESSMENT OF RISK SCORES FOR THE PREDICTION AND DETECTION OF TYPE 2 DIABETES MELLITUS IN CLINICAL SETTINGS

    Health and sociological indicators confirm that life expectancy is increasing, and with it the years that patients have to live with chronic diseases and co-morbidities. Type 2 Diabetes is one of the most common chronic diseases, especially linked to overweight and to ages over sixty. As a metabolic disease, Type 2 Diabetes affects multiple organs by causing damage to blood vessels and the nervous system at micro and macro scale. Mortality of subjects with diabetes is three times higher than the mortality of subjects with other chronic diseases. On the one hand, the management of diabetes focuses on keeping blood glucose levels under a threshold through the prescription of anti-diabetic drugs and a combination of healthy food habits and moderate physical activity. Recent studies have demonstrated the effectiveness of new strategies to delay and even prevent the onset of Type 2 Diabetes through a combination of active and healthy lifestyle in cohorts of mid- to high-risk subjects. On the other hand, prospective research has been conducted on large population groups to build risk scores that aim to provide a rule for classifying patients according to their odds of developing the disease. Currently there are more than two hundred models and risk scores for doing this, but few have been properly evaluated in external groups and, to date, none of them has been tested in a population-based study. The research study presented in this doctoral thesis strives to use externally validated risk scores for the prediction and detection of Type 2 Diabetes on a population database in Hospital La Fe (Valencia, Spain). The study hypothesis is that the integration of existing prediction and detection risk scores into Electronic Health Records increases the early detection of high-risk cases. 
To evaluate this hypothesis, three studies on the clinical, user, and technology dimensions were conducted to assess the extent to which the models and the hospital are ready to exploit such models to identify high-risk groups and drive efficient preventive strategies. The findings presented in this thesis suggest that Electronic Health Records are not prepared to feed risk models on a massive scale. Some of the evaluated models showed good classification performance, which, together with the good acceptance of web-based tools and the acceptable technical performance of the information and communication technology system, suggests that after some work these models can effectively drive a new paradigm of active screening for Type 2 Diabetes. Martínez Millana, A. (2017). ASSESSMENT OF RISK SCORES FOR THE PREDICTION AND DETECTION OF TYPE 2 DIABETES MELLITUS IN CLINICAL SETTINGS [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/86209

    From genetical genomics to systems genetics: potential applications in quantitative genomics and animal breeding

    This article reviews methods of integration of transcriptomics (and equally proteomics and metabolomics), genetics, and genomics in the form of systems genetics into existing genome analyses and their potential use in animal breeding and quantitative genomic modeling of complex traits. Genetical genomics or the expression quantitative trait loci (eQTL) mapping method and key findings in this research are reviewed. Various procedures and potential uses of eQTL mapping, global linkage clustering, and systems genetics are illustrated using actual analysis on recombinant inbred lines of mice with data on gene expression (for diabetes- and obesity-related genes), pathway, and single nucleotide polymorphism (SNP) linkage maps. Experimental and bioinformatics difficulties and possible solutions are discussed. The main uses of this systems genetics approach in quantitative genomics were shown to be in refinement of the identified QTL, candidate gene and SNP discovery, understanding gene-environment and gene-gene interactions, detection of candidate regulator genes/eQTL, discriminating multiple QTL/eQTL, and detection of pleiotropic QTL/eQTL, in addition to its use in reconstructing regulatory networks. The potential uses in animal breeding are direct selection on heritable gene expression measures, termed "expression-assisted selection," and genetical genomic selection of both QTL and eQTL based on breeding values of the respective genes, termed "expression-assisted evaluation."

    Research Evaluation 2000-2010: Department of Mathematical Sciences
