183 research outputs found

    A comparison of genomic profiles of complex diseases under different models

    Background: Various approaches are used to predict individual risk of polygenic diseases from genome-wide association study data. Because the diseases investigated, the data sets used, and the testing procedures differ substantially, it is difficult to assess which models are best suited to this task. Results: We compared different approaches for seven complex diseases provided by the Wellcome Trust Case Control Consortium (WTCCC) under a within-study validation approach. Risk models were inferred using a variety of learning machines and assumptions about the underlying genetic model, including a haplotype-based approach with different haplotype lengths and different association-level thresholds for choosing the loci included in the predictive model. In accordance with previous work, our results generally showed low accuracy considering disease heritability and population prevalence. However, the boosting algorithm returned a predictive area under the ROC curve (AUC) of 0.8805 for Type 1 diabetes (T1D) and 0.8087 for rheumatoid arthritis. Both values clearly exceed the AUC obtained by the other approaches and the 0.75 minimum required for a disease to be successfully tested on a sample at risk, which makes boosting a promising approach. Its good performance seems to be related to its robustness to redundant data, which genome-wide data sets contain because of linkage disequilibrium. Conclusions: In view of our results, the boosting approach may be suitable for modeling individual predisposition to Type 1 diabetes and rheumatoid arthritis based on genome-wide data and should be considered for more in-depth research. This work was supported by the Spanish Secretary of Research, Development and Innovation [TIN2010-20900-C04-1]; the Spanish Health Institute Carlos III [PI13/02714] and [PI13/01527]; and the Andalusian Research Program under project P08-TIC-03717, with the help of the European Regional Development Fund (ERDF).
The authors are very grateful to the reviewers, as they believe that their comments have helped to substantially improve the quality of the paper.
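
To make the AUC figures in the abstract above concrete, here is a minimal sketch of how an area under the ROC curve can be computed from risk scores, using the rank-based (Mann-Whitney) formulation. The function name `roc_auc` and the toy labels and scores are illustrative assumptions; this is not the study's code, and the numbers are not WTCCC data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen case receives a higher
    score than a randomly chosen control, with ties counted as one half.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy, made-up risk scores (assumption, not study data): 1 = case, 0 = control.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

The pairwise loop is quadratic in the number of samples and is only meant to show what the statistic measures; on real cohorts a library routine would normally be used instead.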

    Decision trees to evaluate the risk of developing multiple sclerosis

    Introduction: Multiple sclerosis (MS) is a persistent neurological condition impacting the central nervous system (CNS). The precise cause of multiple sclerosis is still uncertain; however, it is thought to arise from a blend of genetic and environmental factors. MS diagnosis includes assessing medical history, conducting neurological exams, performing magnetic resonance imaging (MRI) scans, and analyzing cerebrospinal fluid. While there is currently no cure for MS, numerous treatments exist to address symptoms, slow disease progression, and enhance the quality of life for individuals with MS. Methods: This paper introduces a novel machine learning (ML) algorithm based on decision trees to create a predictive tool for assessing the likelihood of MS development. It combines prevalent demographic risk factors, specifically gender, with crucial immunogenetic risk markers, such as the alleles responsible for human leukocyte antigen (HLA) class I molecules and the killer immunoglobulin-like receptor (KIR) genes responsible for natural killer lymphocyte receptors. Results: The study included 619 healthy controls and 299 patients affected by MS, all of whom originated from Sardinia. The gender feature was disregarded because of its substantial bias in influencing the classification outcomes. Considering immunogenetic risk markers alone, the algorithm correctly identifies 73.24% of MS patients and 66.07% of individuals without the disease. Discussion: Given its notable performance, this system has the potential to support clinicians in monitoring the relatives of MS patients and identifying individuals who are at an increased risk of developing the disease.
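
The two percentages reported in the abstract above are a sensitivity and a specificity. As a hedged illustration of how such figures relate to cohort sizes, the sketch below recomputes them from counts. The cohort sizes (299 patients, 619 controls) come from the abstract; the counts of correctly classified individuals (219 and 409) are back-calculated from the reported percentages and are an assumption, not figures stated by the authors. The function names are likewise illustrative.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of affected individuals that the classifier flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of unaffected individuals that the classifier clears."""
    return true_neg / (true_neg + false_pos)


n_patients = 299   # from the abstract
n_controls = 619   # from the abstract
true_pos = 219     # ASSUMPTION: back-calculated from 73.24% of 299
true_neg = 409     # ASSUMPTION: back-calculated from 66.07% of 619

sens = sensitivity(true_pos, n_patients - true_pos)
spec = specificity(true_neg, n_controls - true_neg)
balanced = (sens + spec) / 2  # balanced accuracy weights both classes equally

print(f"sensitivity = {sens:.2%}")
print(f"specificity = {spec:.2%}")
print(f"balanced accuracy = {balanced:.2%}")
```

Balanced accuracy is shown because the cohort has roughly twice as many controls as patients, so plain accuracy would be dominated by the control group.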

    Cancer risk prediction with whole exome sequencing and machine learning

    Accurate cancer risk and survival time prediction are important problems in personalized medicine, where disease diagnosis and prognosis are tuned to individuals based on their genetic material. Cancer risk prediction supports an informed decision about regular screening, which helps to detect disease at an early stage and therefore increases the probability of successful treatment. Cancer risk prediction is a challenging problem: lifestyle, environment, family history, and genetic predisposition are some of the factors that influence disease onset. Cancer risk prediction based on predisposing genetic variants has been studied extensively. Most studies have examined the predictive ability of variants in known mutated genes for specific cancers. However, previous studies have not explored the predictive ability of collective genomic variants from whole-exome sequencing data. It is crucial to train a model on one study and predict on another related, independent study to ensure that the predictive model generalizes to other datasets. Survival time prediction allows patients and physicians to evaluate treatment feasibility and helps chart treatment plans. Many studies have concluded that clinicians are inaccurate and often optimistic in predicting patients’ survival time; the need for automated survival time prediction from genomic and medical imaging data is therefore growing. For cancer risk prediction, this dissertation explores how ranking genomic variants in whole-exome sequencing data with univariate feature selection methods affects the predictive capability of machine learning classifiers. The dissertation performs cross-study experiments in chronic lymphocytic leukemia, glioma, and kidney cancers, which show that the top-ranked variants achieve better accuracy than the full set of genomic variants.
For survival time prediction, many studies have devised 3D convolutional neural networks (CNNs) to improve the accuracy of classifying glioma patients into survival categories from structural magnetic resonance imaging (MRI) volumes. This dissertation proposes a new multi-path convolutional neural network with SNP and demographic features to predict glioblastoma survival groups with a one-year threshold that improves upon existing machine learning methods. The dissertation also proposes a multi-path neural network system to predict glioblastoma survival categories with a 14-year threshold from a heterogeneous combination of genomic variations, messenger ribonucleic acid (mRNA) expressions, 3D post-contrast T1 MRI volumes, and 2D post-contrast T1 MRI modality scans that show the malignancy. In 10-fold cross-validation, the mean accuracy of the proposed network with handpicked 2D MRI slices (that manifest the tumor), mRNA expressions, and SNPs slightly improves upon each data source individually.
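
The variant-ranking step mentioned in the abstract above can be sketched as follows. The abstract does not say which univariate statistic was used, so the Pearson chi-square test on a 2x2 carrier-by-status table is an assumption chosen for illustration. The names `chi2_2x2`, `rank_variants`, and the toy variants are hypothetical.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a constant row or column carries no association signal
    return n * (a * d - b * c) ** 2 / denom


def rank_variants(genotypes, labels, top_k):
    """Score each variant on its own against case status and keep the top_k.

    genotypes: dict mapping variant name -> list of 0/1 carrier flags per sample
    labels:    list of 0/1 case status per sample, in the same order
    """
    scored = []
    for name, carrier in genotypes.items():
        pairs = list(zip(carrier, labels))
        a = pairs.count((1, 1))  # carrier, case
        b = pairs.count((1, 0))  # carrier, control
        c = pairs.count((0, 1))  # non-carrier, case
        d = pairs.count((0, 0))  # non-carrier, control
        scored.append((chi2_2x2(a, b, c, d), name))
    # Highest statistic first; ties broken by name for a deterministic order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:top_k]]


# Toy, made-up data (assumption): four cases followed by four controls.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
genotypes = {
    "variant_A": [1, 1, 1, 1, 0, 0, 0, 0],
    "variant_B": [1, 0, 1, 0, 1, 0, 1, 0],
    "variant_C": [1, 1, 1, 0, 1, 0, 0, 0],
}
print(rank_variants(genotypes, labels, top_k=2))
```

In a cross-study setting the ranking would be computed on the training study only, and the selected variants then used to fit and evaluate a classifier on the independent study.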

    Using the structure of genome data in the design of deep neural networks for predicting amyotrophic lateral sclerosis from genotype

    Motivation: Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disease caused by aberrations in the genome. While several disease-causing variants have been identified, a major part of heritability remains unexplained. ALS is believed to have a complex genetic basis where non-additive combinations of variants constitute disease, which cannot be picked up using the linear models employed in classical genotype-phenotype association studies. Deep learning, on the other hand, is highly promising for identifying such complex relations. We therefore developed a deep-learning-based approach for the classification of ALS patients versus healthy individuals from the Dutch cohort of the Project MinE dataset. Based on the recent insight that regulatory regions harbor the majority of disease-associated variants, we employ a two-step approach: first, promoter regions that are likely associated with ALS are identified; second, individuals are classified based on their genotype in the selected genomic regions. Both steps employ a deep convolutional neural network. The network architecture accounts for the structure of genome data by applying convolution only to parts of the data where this makes sense from a genomics perspective. Results: Our approach identifies potentially ALS-associated promoter regions and generally outperforms other classification methods. Test results support the hypothesis that non-additive combinations of variants contribute to ALS. The architectures and protocols developed are tailored toward processing population-scale, whole-genome data. We consider this a relevant first step toward deep-learning-assisted genotype-phenotype association in whole-genome-sized data.

    Statistical methods for genetic association studies with response-selective sampling designs

    This dissertation describes new statistical methods designed to improve the power of genetic association studies. Of particular interest are studies with a response-selective sampling design, i.e. case-control studies of unrelated individuals and case-control studies of family members. The statistical methods presented in this dissertation (a) take advantage of information available in the distribution of the covariates in case-control studies by modeling the ascertainment process; (b) incorporate information from both family-based studies and case-control studies of unrelated individuals; (c) use "richer" models of the relationship between genetic variants and phenotypes, compared to models used in standard genetic association studies; and (d) integrate different types of data, such as genomic, epigenomic, transcriptomic and environmental information. Together, these methods will improve the ability of the genetics community to identify the genetic basis of complex human phenotypes.

    Exosomal MicroRNA Signatures in Central Nervous System Diseases

    During the last decade there has been growing interest in studying extracellular vesicles, in particular exosomes and their miRNA contents. Exosomes are released by almost all cell types. They are packed with specific information, are stable against degradation processes, are small and flexible enough to cross the blood-brain barrier (BBB), and are readily found in biological fluids including blood. MicroRNAs (miRNAs) are involved in nearly every cellular process and play a regulatory role in central nervous system (CNS) associated diseases. Accordingly, exosomal miRNAs could be ideal biomarkers to measure CNS disease activity and treatment response. In this thesis, the aim was to establish a robust protocol to investigate whether the differential expression of serum exosomal miRNA can be used as a biomarker for the accurate diagnosis of the CNS diseases multiple sclerosis (MS) and glioblastoma multiforme (GBM), as well as for the monitoring of disease progression and treatment response. Exosomes were purified from serum and their RNA contents profiled using high-throughput sequencing. In my first study, I profiled exosome-associated miRNAs in serum samples from MS patients and identified distinct biomarkers for the diagnosis of MS and identification of the disease subtype. In my second study, I investigated the effect of treatment in MS patients. I hypothesised that the deregulation of serum exosomal miRNAs is associated with the efficacy of therapy and is predictive of MS activity phases. Finally, I studied serum exosomal miRNA profiles to discover diagnostic biomarkers for GBM, and to demonstrate the applicability of my protocol to other neurological diseases. Taken together, my results demonstrate the exceptional utility of serum exosomal miRNA profiles as a blood-based biomarker to diagnose CNS-associated diseases, using a robust and easily reproducible protocol.

    Pharmacogénomique de la sclérose en plaques : méthodes et applications

    The field of genetics is rapidly expanding and evolving.
As more and more is understood about the genetics of complex human traits, a natural question arises as to how these findings can be translated into everyday medical practice. While a little more than a decade ago sequencing the entire human genome was achieved by the largest international scientific collaboration ever undertaken in biology, today it is not far-fetched to expect that in the near future obtaining the genetic profile of each patient may become routine medical practice. Pharmacogenomics, a blend of pharmacology and genomics, aims to determine the most suitable treatment for each patient as a function of his or her genetic makeup. Pharmacogenomic studies have increasingly provided evidence that there are gains to be achieved by incorporating genetic information when determining the optimal treatment choice for a patient. The case of warfarin, an anticoagulant, has often been considered one of the most motivating success stories for pursuing this type of study. The success of such studies, as well as the need for them, however, depends on a multitude of factors and varies greatly across traits. The objective of this thesis is to evaluate the current state of the art for multiple sclerosis (MS), a debilitating neurological disorder affecting primarily young adults. To date, no cure exists for MS, but a number of disease-modifying therapies have been approved with varying degrees of efficacy and toxicity. So far, little is known about the genetic factors that influence response to treatment in MS patients. Moreover, even if such factors are known a priori, evaluating and proving their utility at the clinical level is not as straightforward as one may be inclined to think. In this thesis, we highlight why the road to translating such findings into medical practice remains rough and challenging. In particular, relying on the association and prediction studies that we have conducted, we expose the design and limitations of each and discuss model choice in each context.
Specifically, we conducted a single-marker association analysis of response to interferon-beta in MS patients. We compared single-marker to multi-marker models in the context of association and also in that of prediction, using both real and simulated datasets. Different approaches to multi-marker modeling exist. We focused on polygenic score analyses and Bayesian estimation methods and evaluated several of the properties of these modeling approaches. Our findings showed that, in the context of association, the more complex and computationally heavy multi-marker models that have recently been advocated may offer little, if any, benefit over classical single-marker association analysis. On the other hand, multi-marker models that take into account the effect of many markers simultaneously clearly appear better suited to predicting genetic risk. Nevertheless, focusing on polygenic score analyses, we demonstrated that many factors, such as the study sample size and the heritability of the trait, influence the predictive performance of a model. Pharmacogenomic studies may revolutionize patient care. However, amid all the excitement about the promise that they hold, in the concluding part of this thesis we also address the social, ethical and economic issues that they raise.
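
The polygenic score analyses referred to in the abstract above rest on a simple idea: an individual's score is a weighted sum of risk-allele counts across many markers. The sketch below shows that computation under stated assumptions. The function name `polygenic_score`, the per-marker weights, and the allele dosages are all hypothetical and are not taken from the thesis.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages for one individual.

    dosages: number of copies (0, 1 or 2) of the effect allele at each marker
    weights: per-marker effect sizes, e.g. estimated in a separate training sample
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must cover the same markers")
    return sum(w * d for w, d in zip(weights, dosages))


# Hypothetical effect sizes for three markers (assumption, not thesis results).
weights = [0.2, -0.1, 0.05]

# Hypothetical genotypes for two individuals.
individual_1 = [2, 1, 0]
individual_2 = [0, 2, 1]

for name, dosages in [("individual_1", individual_1), ("individual_2", individual_2)]:
    print(name, round(polygenic_score(dosages, weights), 3))
```

Because the weights are themselves estimates, their precision depends on the size of the training sample and on the heritability of the trait, which is consistent with the abstract's point that both factors influence predictive performance.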